Textbook: since this section is inexplicably missing from this year's textbook, we'll make do with these notes, and the chapters handed out in class.
Note: A large part of this we will undoubtedly be deferring to the next session.
Number of host computers on the Internet:
year | hosts |
---|---|
1981 | 213 |
1987 | 28,000 |
1992 | 1,100,000 |
1997 | 16,000,000 |
Jan 1999 | 43,200,000 |
Jan 2003 | 171,638,297 |
Jan 2007 | 433,193,199 |
How does my message get through the Internet?
Also, I can send pictures and movies (and viruses) as attachments. How does that
work?
A bit is a single binary digit, either 0 or 1.
Not the only way to build hardware, but the simplest.
Think:
electricity through wire = 1
no electricity through wire = 0
Bits are so small that they're inconvenient. So: a byte is a binary number contained in 8 bits:
00000000(2) = 0(10)
11111111(2) = 255(10)
A byte can represent/stand for/symbolize 256 things.
(A kilobyte is 2^10 = 1,024 bytes.
A megabyte is 2^20 = 1,048,576 bytes.)
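To make the byte arithmetic concrete, here is a small Python sketch that converts between decimal and 8-bit binary. The helper names to_bits and from_bits are made up for these notes.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Write a non-negative integer as a binary string padded to `width` bits."""
    return format(value, f"0{width}b")


def from_bits(bits: str) -> int:
    """Read a string of 0s and 1s as a base-2 number."""
    return int(bits, 2)


print(to_bits(0), to_bits(255))      # smallest and largest one-byte values
print(from_bits("11111111"))
print(2 ** 8, 2 ** 10, 2 ** 20)      # things per byte, bytes per KB, bytes per MB
```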
Integers are represented fairly directly as binary numbers in
N bytes,
with N depending on how new the computer is.
The only tricky part is representing negative integers;
two's complement is the name of the dominant method.
For our purposes, just think of using one bit to represent
negative.
So one byte can hold -127 to +127 (true two's complement squeezes in one extra value, giving -128 to +127).
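For the curious, here is a minimal sketch of what two's complement does with one byte. The function names are hypothetical, and the width of 8 bits is just the one-byte case discussed above.

```python
def twos_complement_bits(value: int, width: int = 8) -> str:
    """Encode a signed integer as a two's complement bit string."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {width} bits")
    # Masking maps negative values onto the upper half of the unsigned range.
    return format(value & ((1 << width) - 1), f"0{width}b")


def from_twos_complement(bits: str) -> int:
    """Decode a two's complement bit string back to a signed integer."""
    raw = int(bits, 2)
    if bits[0] == "1":            # leading 1 means the number is negative
        raw -= 1 << len(bits)
    return raw


print(twos_complement_bits(-1), twos_complement_bits(-128), twos_complement_bits(127))
```

Notice that the leading bit still tells you the sign, which is why "one bit to represent negative" is a reasonable mental model.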
(As an aside, there are some situations where you need really big integers, and some languages like Lisp support that.)
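So 3.14 in 32 bits (one sign bit, eight exponent bits, and 23 fraction bits, assuming the standard IEEE 754 single-precision layout) is
0 10000000 10010001111010111000011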
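One way to check a bit pattern like this is to ask Python's struct module for the 32-bit encoding of a number. This sketch assumes the IEEE 754 single-precision format, since these notes do not name the format; float32_bits is a made-up helper.

```python
import struct


def float32_bits(x: float) -> str:
    """Return the sign, exponent, and fraction bits of x as a 32-bit float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an integer
    bits = format(raw, "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(float32_bits(3.14))
print(float32_bits(1.0))
```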
Characters are translated into bits using the ASCII standard (American Standard Code for Information Interchange - not that it matters).
letter | binary code | decimal |
---|---|---|
'A' | 01000001 | 64 + 1 = 65 |
'B' | 01000010 | 64 + 2 = 66 |
: | : | : |
'Z' | 01011010 | 64 + 26 = 90 |
And so on, to include lower case letters, punctuation, etc. So
01001001 00100000 01100001 01101101 00101110
is the string "I am."
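The translation between characters and bits is easy to try out. A minimal sketch, with made-up helper names text_to_bits and bits_to_text:

```python
def text_to_bits(text: str) -> str:
    """Encode text as ASCII and show each byte as 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Turn space-separated 8-bit groups back into ASCII text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


print(text_to_bits("I am."))
print(bits_to_text("01001001 00100000 01100001 01101101 00101110"))
```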
The main point: it's all just a bunch of bytes!
On one level... The Internet is a network of wires connected with computers.
On another... The Internet gives programs the capability to communicate between computers.
We'll see how the Internet bridges this gap.
To simplify the matter, we split the bridge into four layers: physical, internetwork (IP), transport (TCP), and application.
Each layer adds a header explaining how to handle the message at its level:
+----------+--------------+---------------+--------------------+
| physical | internetwork | transport     | application        |
| header   | header (IP)  | header (TCP)  | message (eg, HTTP) |
+----------+--------------+---------------+--------------------+
<--- front
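The wrapping can be pictured with a toy sketch in which each "header" is only a text label. Real headers are binary and carry addresses, ports, and so on; the names LAYERS, encapsulate, and decapsulate are invented for illustration.

```python
# Headers are added innermost first, so the physical header ends up at the front.
LAYERS = ["transport (TCP)", "internetwork (IP)", "physical"]


def encapsulate(message: str) -> str:
    """Sender side: each layer puts its header in front of what it was given."""
    packet = message
    for layer in LAYERS:
        packet = f"[{layer} header]" + packet
    return packet


def decapsulate(packet: str) -> str:
    """Receiver side: each layer strips its own header, outermost first."""
    for layer in reversed(LAYERS):
        prefix = f"[{layer} header]"
        if not packet.startswith(prefix):
            raise ValueError(f"expected {prefix}")
        packet = packet[len(prefix):]
    return packet


print(encapsulate("GET /index.html"))
```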
Of the physical layer we will say little: It magically sends a message across a single network.
The IP software must figure out where a packet should go, and how to get it there.
Machines have two names: a mnemonic name (composed of words) for humans to remember:
jasmine.bh.andrew.cmu.edu
and a 4-byte numerical IP address that is really used by the machines:
128.2.124.152
The IP address is used to describe the destination of a message. The first two numbers, 128.2, indicate CMU's network domain, cmu.edu.
(The hierarchical naming in both the mnemonic and numerical forms is very clever, but not essential to examine for our purposes.)
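To see that the dotted address really is just four bytes, here is a short sketch using Python's standard ipaddress module. Writing CMU's network as 128.2.0.0/16 is an assumption based on the "first two numbers" remark above.

```python
import ipaddress

address = ipaddress.IPv4Address("128.2.124.152")
cmu_network = ipaddress.IPv4Network("128.2.0.0/16")  # assumed: first two bytes fixed

print(list(address.packed))     # the four bytes, one number per byte
print(int(address))             # the same 32 bits read as one big integer
print(address in cmu_network)   # does the address start with 128.2?
```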
In order to send a message, the computer must first convert the mnemonic name into the real numerical IP address. This is called name resolution.
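Since the Internet is big and always changing, the computer contacts a domain name server (DNS) to resolve the IP address. In principle, to find jasmine.bh.andrew.cmu.edu, your computer would ask the .edu name server about cmu.edu, then ask cmu.edu's name server about andrew.cmu.edu, and so on down the hierarchy until it reaches a server that knows jasmine's address.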
Of course, in practice this would make the .edu (or .com!) domain name
server really busy, and take a long time for each name
resolution.
So in practice the system uses caches: each computer in the
chain stores the IP addresses it sees. This saves time and network
traffic, and allows names to be resolved quickly most of the time.
But if you're the first person in a while to try to contact a
webserver in Zanzibar, it will take noticeably longer for the DNS to
resolve the name.
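The effect of caching can be sketched with a toy resolver. Here a plain dictionary stands in for the slow chain of name servers, and the class name CachingResolver is made up; real caches also forget entries after a while, which this sketch omits.

```python
class CachingResolver:
    """Toy resolver that remembers answers so repeat lookups skip the slow path."""

    def __init__(self, upstream: dict):
        self.upstream = upstream      # stand-in for asking the chain of name servers
        self.cache = {}
        self.upstream_queries = 0     # how many times we had to take the slow path

    def resolve(self, name: str) -> str:
        if name not in self.cache:
            self.upstream_queries += 1
            self.cache[name] = self.upstream[name]
        return self.cache[name]


resolver = CachingResolver({"jasmine.bh.andrew.cmu.edu": "128.2.124.152"})
for _ in range(3):
    print(resolver.resolve("jasmine.bh.andrew.cmu.edu"))
print("slow lookups:", resolver.upstream_queries)
```

Only the first lookup takes the slow path; the other two are answered from the cache.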
Who is in charge of domain names? Look here.
The gateway computers have routing tables that tell them where non-local messages go. Consider the following relatively simple case:
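network 10.?.?.?
    |
gateway (10.0.0.5 on network 10, 20.0.0.5 on network 20)
    |
network 20.?.?.?
    |
gateway (20.0.0.6 on network 20, 30.0.0.6 on network 30)
    |
network 30.?.?.?
    |
gateway (30.0.0.7 on network 30, 40.0.0.7 toward the rest of the Internet)
    |
rest of Internet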
The routing table of the middle gateway above might look something like this:
if destination is: | then route to: |
10.?.?.? | 20.0.0.5 |
20.?.?.? | local destination |
30.?.?.? | local destination |
else | 30.0.0.7 |
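The table above translates almost directly into code. This is a simplified sketch that matches only on the first number of the address, as the table does; real routers match on bit prefixes. ROUTES, DEFAULT_ROUTE, and next_hop are names invented here.

```python
# Routing table of the middle gateway, keyed by the first number of the address.
ROUTES = {
    10: "20.0.0.5",
    20: "local destination",
    30: "local destination",
}
DEFAULT_ROUTE = "30.0.0.7"   # the "else" row


def next_hop(destination: str) -> str:
    """Look up where the middle gateway should send a packet."""
    first_number = int(destination.split(".")[0])
    return ROUTES.get(first_number, DEFAULT_ROUTE)


for dest in ("10.1.2.3", "30.0.0.99", "128.2.124.152"):
    print(dest, "->", next_hop(dest))
```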
These routing tables need to evolve over time. Periodically, gateways tell their neighbors about the best routes they know. If the recipient decides it needs to update its routing table, it tells its neighbors.
As mentioned before, gateways do not guarantee delivery, only best-effort. They frequently drop packets, for a number of reasons:
IP gives us
Since more than one program on a single computer might want to use the Internet, each program reserves a port when it wants to communicate. There are 65,536 port numbers (0 to 65,535), so a port number fits in two bytes.
When a program establishes a TCP connection, it sends its port number, so that the other program knows how to find it to respond (its numerical internet address is already in the IP header).
A server is a program waiting for connections on a computer with a port reserved. Common servers have certain well-known ports reserved for them, so that other programs can easily find them and send them messages:
port | protocol |
---|---|
21 | FTP |
25 | SMTP |
53 | DOMAIN |
80 | HTTP |
1530 | FishNet |
A client reserves a port on its own computer and sends messages to the server by sending messages to the server's port-computer combination. Then the server can respond by sending messages to the client's port-computer combination. And they talk.
(Notice there's nothing wrong with a server or client talking to multiple programs using the same port.)
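Here is a minimal sketch of a client and server talking through ports, using Python's standard socket module. Both ends run on one machine (127.0.0.1) purely for demonstration, and the server asks for port 0 so the operating system picks any free port. The function names serve_once and demo are made up.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server: wait for one connection and reply with the message in upper case."""
    conn, _client_address = listener.accept()   # address is the client's (computer, port)
    with conn:
        conn.sendall(conn.recv(1024).upper())


def demo() -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))          # port 0: let the OS choose a free port
        listener.listen(1)
        server_port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        # Client: the OS reserves a port for it automatically when it connects.
        with socket.create_connection(("127.0.0.1", server_port)) as client:
            client.sendall(b"hello")
            reply = client.recv(1024)
        worker.join()
    return reply


print(demo())
```

A well-known server would bind to its fixed port (80 for HTTP, say) instead of port 0, so that clients know where to find it.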
The basic approach to providing reliable delivery is straightforward, but things get complicated in order to be efficient.
The receiver sends an acknowledgement message (ACK) when it receives some data. If the sender doesn't get an ACK soon enough, it resends the data.
The packets in one connection are numbered in order to allow the receiver to be sure it has them all, and in the right order.
One challenging problem is deciding how long to wait before giving up on acknowledgements.
Let's skip that part.
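The resend-until-acknowledged idea can be simulated without a real network. In this sketch the "network" is a set of scripted losses, and timeouts are not modeled at all: a lost attempt simply counts as a timeout. The function name and the (sequence number, attempt) convention are inventions for these notes.

```python
def stop_and_wait(packets, lost_attempts):
    """Send packets one at a time, resending any whose attempt is lost.

    lost_attempts is a set of (sequence_number, attempt) pairs that the
    pretend network drops; every other attempt is delivered and ACKed.
    """
    received, transmissions = [], 0
    for seq, data in enumerate(packets):
        attempt = 0
        while True:
            transmissions += 1
            if (seq, attempt) not in lost_attempts:
                received.append((seq, data))   # receiver gets it and sends an ACK
                break
            attempt += 1                       # no ACK in time, so resend
    return received, transmissions


received, sent = stop_and_wait(["a", "b", "c"], {(1, 0), (1, 1)})
print(received, "using", sent, "transmissions")
```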
Our simple acknowledgement protocol is very slow, like a bucket brigade with only one bucket. A sliding window speeds things up by letting the sender have several packets in flight before their acknowledgements come back:
     +-------------------+
+----|----+----+----+----|----+----+----+----+----+----+
| 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 |
+----|----+----+----+----|----+----+----+----+----+----+
     +-------------------+
A sliding window of size 4
The window size corresponds to the number of buckets.
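To get a feel for why more buckets help, here is a deliberately simplified sketch: it assumes no packets are lost and that all ACKs for a burst arrive together, so the window jumps forward a whole burst at a time. (A real window slides one packet per ACK.) The function name is made up.

```python
def send_with_window(num_packets: int, window_size: int) -> list:
    """Return the bursts sent, one burst per round trip, assuming no loss."""
    bursts = []
    base = 1                                   # first packet not yet acknowledged
    while base <= num_packets:
        burst = list(range(base, min(base + window_size, num_packets + 1)))
        bursts.append(burst)
        base += len(burst)                     # all ACKs arrived; slide the window
    return bursts


print(len(send_with_window(11, 1)), "round trips with one bucket")
print(len(send_with_window(11, 4)), "round trips with four buckets")
```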
byte 0     source port
byte 2     destination port
byte 4     sequence number (tells which segment is sent)
byte 8     acknowledgement number (tells which segment has been received)
byte 12    header length
byte 12.5  ignore
byte 14    desired window size
byte 16    ignore
byte 20    options
byte ??    application message
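The layout above can be read with Python's struct module. This is a sketch, not a full TCP implementation: it pulls out only the fields named above and skips the ones marked "ignore". The sample segment is fabricated for the demonstration, and parse_tcp_header is a made-up name.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Extract the fields listed above from the front of a TCP segment."""
    # "!" means network byte order; H is a 2-byte and I a 4-byte unsigned integer.
    src, dst, seq, ack, offset_and_flags, window = struct.unpack("!HHIIHH", segment[:16])
    header_length = (offset_and_flags >> 12) * 4   # top 4 bits count 4-byte words
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence_number": seq,
        "ack_number": ack,
        "header_length": header_length,
        "window_size": window,
        "message": segment[header_length:],
    }


# A fabricated 20-byte header (no options) followed by an application message.
sample = struct.pack("!HHIIHHHH", 49152, 80, 1000, 2000, (5 << 12) | 0x018, 4096, 0, 0) + b"GET /"
print(parse_tcp_header(sample))
```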
http://avrim.pc.cs.cmu.edu/index.html
This indicates to the browser that it should use HTTP to request the file /index.html from avrim.pc.cs.cmu.edu. So the browser uses TCP to open a connection to port 80 on that machine (since that is HTTP's well-known port number).
Once the connection is open, it sends the following message to the web server there:
GET /index.html HTTP/1.1
Accept: text/html
(This ends with a blank line.)
The server responds with a message like the following message, and then closes the connection:
HTTP/1.0 200 Document follows
Server: CERN/3.0A
Date: Mon, 11 Jan 1999 03:22:42 GMT
Content-Type: text/html
Content-Length: 115
Last-Modified: Mon, 11 Jan 1999 03:17:24 GMT

<p>I'm <tt>avrim.pc.cs.cmu.edu</tt>; my primary user is <a href=http://www.cburch.com/>Carl Burch</a>.</p>
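Since HTTP messages are just lines of text, building a request and picking apart a response takes only a few lines. This sketch works on strings and never touches the network; build_request and parse_response are invented helper names, and the short response used below is a cut-down stand-in for the one shown above.

```python
def build_request(path: str, headers: dict) -> str:
    """Build a GET request; lines end with CR LF and a blank line ends the request."""
    lines = [f"GET {path} HTTP/1.1"] + [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw: str):
    """Split a response into status code, reason, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")      # the blank line separates headers from body
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), reason, headers, body


print(repr(build_request("/index.html", {"Accept": "text/html"})))
print(parse_response("HTTP/1.0 200 Document follows\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"))
```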
The HTML (HyperText Markup Language) encoding seen here is not part of the network protocols, but rather a well-designed way of embedding addresses etc. invisibly into text.
Suppose I'm spot@cburch.com working on avrim.pc.cs.cmu.edu and I tell my email program to send email to burch@andrew.cmu.edu. It uses TCP on avrim to open a connection to port 25 on andrew.cmu.edu. (In the exchange below, lines that begin with a three-digit code come from andrew; the other lines are sent from avrim.)
First andrew responds with a welcome message (220 codes let any program reading this know that it's a welcome message):
220-andrew.cmu.edu ESMTP Sendmail 8.8.5/8.8.2
220-Mis-identifying the sender of mail is an abuse of computing facilities
220 ESMTP spoken here
helo avrim.pc.cs.cmu.edu
250 andrew.cmu.edu Hello AVRIM.PC.CS.CMU.EDU [128.2.185.114], pleased to meet you
mail from: spot@cburch.com
250 spot@cburch.com... Sender ok
rcpt to: burch@andrew.cmu.edu
250 burch@andrew.cmu.edu... Recipient ok
data
354 Enter mail, end with "." on a line by itself
Arf, arf!
.
250 XAA21092 Message accepted for delivery
quit
221 andrew.cmu.edu closing connection
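The numeric codes exist so that programs can follow the conversation without understanding the English. A minimal sketch of how a mail program might read one reply line; parse_reply_line is a made-up name. A hyphen right after the code means more lines of the same reply follow, and a space marks the last line.

```python
def parse_reply_line(line: str):
    """Split an SMTP reply line into (code, more_lines_follow, text)."""
    code = int(line[:3])
    more_lines_follow = line[3:4] == "-"   # "220-..." continues, "220 ..." is the last line
    return code, more_lines_follow, line[4:]


for line in ("220-andrew.cmu.edu ESMTP Sendmail 8.8.5/8.8.2",
             "220 ESMTP spoken here",
             "250 XAA21092 Message accepted for delivery"):
    print(parse_reply_line(line))
```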