1 03-Transport


1.1 Audio-recording

1.2 Opening thought

“Now, if someone tries to monopolize the Web,
for example pushes proprietary variations on network protocols,
then that would make me unhappy.”
- Tim Berners-Lee

1.2.1 Internet shutdowns

https://www.accessnow.org/keepiton/ (good animation no longer posted).
Review methods.

1.2.2 An interesting story in 2022


1.2.3 A partial solution

Encrypt as many layers as you can,
ideally, from the IP-layer, all the way up,
via IPSec.

1.3 Review

Review by glancing back now at the section on encapsulation and layering:

1.4 Introduction


* Chapter 3.1-3.2

* http://intronetworks.cs.luc.edu/current2/uhtml/intro.html#transport
* https://www.computer-networking.info/1st/html/transport/principles.html
* https://www.computer-networking.info/2nd/html/protocols/transport.html
* https://en.wikipedia.org/wiki/Transport_layer

Application usage of transport layer protocols

Transport layer is more than TCP and UDP

1.4.1 Relationship between layers


Application and network layers are end-end
Note: Transport facilitates process-to-process connections.

1.4.2 Overview

1.4.3 Services

Transport layer services are conveyed to an application via a programming interface to the transport layer protocols.

Required core services:

* Locally extending host-to-host delivery to process-to-process delivery is called transport-layer multiplexing and de-multiplexing
* Ports can provide multiple endpoints on a single node.
* For example, the name on a postal address is a kind of multiplexing, and distinguishes between different recipients of the same location.
* Computer applications will each listen for information on their own ports, which enables the use of more than one network service at the same time.
* For example, you could run a web server and a mail server on the same host, with the same IP address.
* This service is part of the transport layer in the TCP/IP model, but of the session layer in the OSI model (the more unrealistic academic one).

Optional services:

Connection-oriented communication:
It is normally easier for an application to interpret a connection as a data stream rather than having to deal with the underlying connection-less models, such as the datagram model of the User Datagram Protocol (UDP) and of the Internet Protocol (IP).

Same order delivery:
* The network layer doesn’t generally guarantee that packets of data will arrive in the same order that they were sent, but often this is a desirable feature.
* This is usually done through the use of segment numbering, with the receiver passing them to the application in order.
* This can cause head-of-line blocking.

Reliability:
* Packets may be lost during transport due to network congestion and errors.
* By means of an error detection code, such as a checksum, the transport protocol may check that the data is not corrupted, and verify correct receipt by sending an ACK or NACK message to the sender.
* Automatic repeat request schemes may be used to re-transmit lost or corrupted data.

Flow control:
* The rate of data transmission between two nodes must sometimes be managed to prevent a fast sender from transmitting more data than can be supported by the receiving data buffer, causing a buffer overrun.
* This can also be used to improve efficiency by reducing buffer under-run.

Congestion avoidance:
* Congestion control can control traffic entry into a telecommunications network, so as to avoid congestive collapse by attempting to avoid over-subscription of any of the processing or link capabilities of the intermediate nodes and networks and taking resource reducing steps, such as reducing the rate of sending packets.
* For example, automatic repeat requests may keep the network in a congested state, because every re-transmission adds still more traffic.
* This situation can be avoided by adding congestion avoidance to the flow control, including slow-start.
* This keeps the bandwidth consumption at a low level in the beginning of the transmission, or after packet re-transmission.

1.5 Ports


1.5.1 Port numbers

1.5.2 Use in URLs

1.5.3 Scanning open ports


+++++++++++++++++++++++++++++++++ Cahoot-03-0

nmap web.mst.edu
Watch this with Wireshark

nmap does extensive port-scanning. There are many types of port-scan:

* TCP scan
* SYN scan
* UDP scan
* ACK scan
* Window scan
* FIN scan
* and more…

1.5.4 Inversion of source and destination port numbers


1.5.5 Connection Multiplexing and De-multiplexing


Multiplexing at the transport layer refers merely to the mixing and un-mixing of traffic with different (port, IP) tuples for source and destination addresses.

Transport-layer multiplexing and de-multiplexing

Web server and two clients
While the destination (IP, port) tuples are the same, the source (IP, port) tuples differ for each client.

Multiplex at sender:

Handle data from multiple sockets,
add a transport header with the source and destination port numbers
(the source and destination IP addresses travel in the network-layer header),
used for de-multiplexing at the destination,
and on response back to the source.

De-multiplex at receiver:

Use header info to deliver received segments to the correct process, and the correct socket within the process.

Transport headers generally carry:

Source and destination port numbers in each segment

1.6 UDP

I heard a great UDP joke the other day….
You might not get it.


* Chapter 3.3-3.4

* http://intronetworks.cs.luc.edu/current2/uhtml/udp.html
* https://www.computer-networking.info/1st/html/transport/udp.html
* https://www.computer-networking.info/2nd/html/protocols/udp.html

* https://en.wikipedia.org/wiki/User_Datagram_Protocol
* https://en.wikipedia.org/wiki/Datagram_Transport_Layer_Security (UDP encryption)
* https://tools.ietf.org/html/rfc768

1.6.1 Recall application usage


1.6.2 Connectionless de-multiplexing

Recall: when creating a datagram to send into a UDP socket, the sender must specify the destination IP address and the destination port number.
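The following Python sketch illustrates connectionless de-multiplexing over the loopback interface, using only the standard `socket` module; the function name `udp_demux_demo` is hypothetical. Every datagram names its destination (IP, port) explicitly in `sendto`, all of them land in the single server socket bound to that pair, and `recvfrom` reports each sender's (IP, port) so a reply could be addressed back.

```python
import socket

def udp_demux_demo(messages):
    """Send each message from a separate client socket to ONE server socket.

    Returns (received, client_ports): what the server saw, as
    (payload, (src_ip, src_port)) pairs, and each client's ephemeral port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
    server.settimeout(2.0)
    server_addr = server.getsockname()  # the (dest IP, dest port) clients target

    received, client_ports = [], []
    try:
        for msg in messages:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
                # No handshake: each datagram carries its destination explicitly.
                client.sendto(msg, server_addr)
                client_ports.append(client.getsockname()[1])  # ephemeral source port
                data, src = server.recvfrom(2048)
                received.append((data, src))
    finally:
        server.close()
    return received, client_ports


received, ports = udp_demux_demo([b"hello", b"world"])
for data, src in received:
    print(data, "from", src)
```

Port 0 asks the operating system for any free port, so the sketch does not depend on a particular port being available; both clients reach the same server socket even though their source ports differ.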

1.6.3 Overview

The best thing about UDP jokes is that I don’t care if you get them or not.

1.6.4 Features

I had another funny UDP joke to tell,
but I lost it somewhere…

1.6.5 UDP Segment structure



UDP segment structure:
* The UDP header contains four fields:
* a 16-bit source port
* a 16-bit destination port
* a 16-bit length field
* a 16-bit checksum
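As a sketch of this layout, the four fields can be packed and unpacked with Python's `struct` module; the format `!HHHH` means four unsigned 16-bit integers in network (big-endian) byte order. The helper names and the port numbers in the usage lines are illustrative assumptions, and the checksum is left at 0 ("not computed"), which only IPv4 permits.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, dest port, length, checksum

def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend the 8-byte UDP header; checksum 0 means 'not computed' (IPv4 only)."""
    length = UDP_HEADER.size + len(payload)   # header + data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram):
    """Split a datagram into its four header fields and the payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }

dgram = build_udp_datagram(50000, 53, b"query")
print(parse_udp_datagram(dgram))
```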

Source port number:
* sender’s port; should be assumed to be the port to reply to if needed.
* If the source host is the client, the port number is likely to be an ephemeral port number.
* If the source host is the server, the port number is likely to be a well-known port number.

Destination port number:
* Identifies the receiver's port; this field is required.
* Similar to the source port number: if the client is the destination host, the port number will likely be an ephemeral port number, and if the destination host is the server, the port number will likely be a well-known port number.

Length:
* Specifies the length in bytes of the UDP header and UDP data.
* The minimum length is 8 bytes because that is the length of the header.
* The maximum data length, imposed by the underlying IPv4 protocol, is 65,507 bytes (65,535 - 8 byte UDP header - 20 byte IP header).

Checksum:
* May be used for error-checking of the header and data.
* This field is optional in IPv4, and mandatory in IPv6.
* The field carries all-zeros if unused.


I’d make another joke about UDP,
but I don’t know if anyone’s actually listening…

1.6.6 Checksum

Goal: detect “errors” (e.g., flipped bits) in the transmitted segment.

1.6.7 UDP checksum for IPv4

* You may wonder why UDP provides a checksum in the first place, as many link layer protocols (including the popular Ethernet protocol) also provide error checking.
* There is no guarantee that all the links between source and destination provide error checking
* one of the links may use a link-layer protocol that does not provide error checking.
* Even if segments are correctly transferred across a link, it’s possible that bit errors could be introduced when a segment is stored in a router’s memory.
* Given that neither link-by-link reliability nor in-memory error detection is guaranteed, UDP must provide error detection at the transport layer, on an end-end basis, if the end-end data transfer service is to provide error detection.
* The checksum at this level is an example of the celebrated end-end principle in system design [Saltzer 1984], which states that, since certain functionality (error detection, in this case) must be implemented on an end-end basis anyway:
* “functions placed at the lower levels may be redundant or of little value when compared to the cost of providing them at the higher level.”
* UDP does not do anything to recover from an error.
* Some implementations of UDP simply discard the damaged segment; others pass the damaged segment to the application with a warning.

Pseudo-header (which the checksum also covers) and checksum computation:
* The source and destination addresses are those in the IPv4 header.
* The protocol is that for UDP (see List of IP protocol numbers): 17 (0x11).
* The UDP length field is the length of the UDP header and data. The field data stands for the transmitted data.
* UDP checksum computation is optional for IPv4.
* If a checksum is not used it should be set to the value zero.
* Checksum uses the 1s complement: https://en.wikipedia.org/wiki/Ones%27_complement
* The checksum is the 1s complement of the 1s complement sum of all the 16-bit words in the segment, so that the 1s complement sum of the same words plus the checksum is all 1s (1111111111111111).
* The ones’ complement of a binary number is defined as the value obtained by inverting all the bits in the binary representation of the number (swapping 0s for 1s and vice versa).
* The ones’ complement of the number then behaves like the negative of the original number in some arithmetic operations.

As an example, suppose that we have the following three 16-bit words:

The sum of first two of these 16-bit words is

Adding the third word to the above sum gives
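The arithmetic can be reproduced with a short Python sketch. The three words below are illustrative values chosen for this sketch; the key step is the end-around carry, where any carry out of the top bit is added back into the lowest bit.

```python
def ones_complement_sum16(words):
    """Add 16-bit words using one's complement (end-around carry) addition."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back into bit 0
    return total

def checksum16(words):
    """Checksum = bitwise complement of the one's complement sum (16 bits)."""
    return ~ones_complement_sum16(words) & 0xFFFF

words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
total = ones_complement_sum16(words)
check = checksum16(words)
print(f"sum      = {total:016b}")   # 0100101011000010
print(f"checksum = {check:016b}")   # 1011010100111101
# Receiver side: data words plus checksum sum to all 1s if nothing changed.
print(f"verify   = {ones_complement_sum16(words + [check]):016b}")  # 1111111111111111
```

Here the first two words sum to 1011101110110101 with no carry; adding the third overflows 16 bits, and that carry is wrapped around into the lowest bit before complementing.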

The checksum of the UDP segment is computed over:
* a pseudo header containing
* the source IP address,
* the destination IP address and
* a 32-bit field whose most significant byte is 0, whose second byte is 17 (the protocol number for UDP), and whose lower two bytes hold the length of the UDP segment
* the entire UDP segment, including its header
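Putting the pieces together, here is a Python sketch (helper names are hypothetical; the addresses come from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24). It builds the 12-byte IPv4 pseudo-header, computes the UDP checksum over pseudo-header, header and data with the checksum field zeroed, and then verifies the result the way a receiver would.

```python
import ipaddress
import struct

def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd length with a zero byte (not transmitted)
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF

def udp_pseudo_header_ipv4(src_ip, dst_ip, udp_length):
    """12 bytes: source IP, destination IP, zero byte, protocol 17, UDP length."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 17, udp_length))

def build_udp_ipv4(src_ip, dst_ip, src_port, dst_port, payload):
    """Return a UDP datagram (header + payload) with its checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum = 0 while summing
    csum = internet_checksum(udp_pseudo_header_ipv4(src_ip, dst_ip, length) + header + payload)
    if csum == 0:
        csum = 0xFFFF    # RFC 768: a computed zero is transmitted as all ones
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload

def udp_checksum_ok(src_ip, dst_ip, datagram):
    """Receiver check (assumes the checksum field is in use): the sum over
    pseudo-header + datagram is all 1s, so its complement is 0."""
    pseudo = udp_pseudo_header_ipv4(src_ip, dst_ip, len(datagram))
    return internet_checksum(pseudo + datagram) == 0

dgram = build_udp_ipv4("192.0.2.1", "198.51.100.7", 50000, 53, b"hi!")
print(udp_checksum_ok("192.0.2.1", "198.51.100.7", dgram))      # True
corrupted = dgram[:-1] + bytes([dgram[-1] ^ 0x01])              # flip one bit
print(udp_checksum_ok("192.0.2.1", "198.51.100.7", corrupted))  # False
# Intact bytes, but checked against a different destination IP: also fails.
print(udp_checksum_ok("192.0.2.1", "198.51.100.8", dgram))      # False
```

The last check fails even though the datagram bytes are untouched, because the receiver's pseudo-header uses a different destination address than the sender's did.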

Why does the UDP checksum pseudo-header include the IP addresses, which normally belong to the network layer?

1.6.8 UDP checksum for IPv6

+++++++++++++++++++++++ Cahoot-03-1

Are any of the others disadvantages?
What if we assume there are bad network citizens?

1.6.9 Why bother with UDP

UDP is well suited for certain applications

I once told an NTP joke.
The timing was perfect…

The punchline often arrives before the set-up.
Do you know the problem with UDP jokes?

1.6.10 Reliable transfer over UDP

+++++++++++++++++++++++ Cahoot-03-2

1.6.11 Security for UDP?

1.7 Wireshark and tools

nc (netcat) can also serve to explore UDP functionality (-u selects UDP; nc expects a host and a port):
$ nc -u serverhost port

Re-demonstrate with Wireshark:
socket_UDP_server.py, socket_UDP_client.py, nc -u, nslookup

Let’s begin to think about TCP:

Re-demonstrate with Wireshark:
socket_TCP_server_mt.py, socket_TCP_client_mt.py, ncat, traceroute, web browser

Some things to observe:
* Analyze > follow > TCP stream
* Statistics > flow graph > TCP selection
* Notice
* Handshake
* Transfer
* Closing connection

1.8 Reliable data transfer


Reliable data transfer
How can the black-framed boxes above be implemented?

1.8.1 General mechanisms for reliability

Checksum
* Used to detect bit errors in a transmitted packet.

Timer (timeout)
* Used to timeout/re-transmit a packet, possibly because the packet (or its ACK) was lost within the channel.
* Because timeouts can occur when a packet is delayed but not lost (premature timeout), or when a packet has been received by the receiver but the receiver-to-sender ACK has been lost, duplicate copies of a packet may be received by a receiver.

Sequence number
* Used for sequential numbering of packets of data flowing from sender to receiver.
* Gaps in the sequence numbers of received packets allow the receiver to detect a lost packet.
* Packets with duplicate sequence numbers allow the receiver to detect duplicate copies of a packet.

Acknowledgment
* Used by the receiver to tell the sender that a packet or set of packets has been received correctly.
* Acknowledgments will typically carry the sequence number of the packet or packets being acknowledged. Acknowledgments may be individual or cumulative, depending on the protocol.

Negative acknowledgment
* Used by the receiver to tell the sender that a packet has not been received correctly.
* Negative acknowledgments will typically carry the sequence number of the packet that was not received correctly.

Window, pipelining
* Note: These two are more like speed-enhancing mechanisms for compensating for reliability mechanisms.
* The sender may be restricted to sending only packets with sequence numbers that fall within a given range.
* By allowing multiple packets to be transmitted but not yet acknowledged, sender utilization can be increased over a stop-and-wait mode of operation.
* We’ll see shortly that the window size may be set on the basis of the receiver’s ability to receive and buffer messages, or the level of congestion in the network, or both.
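The mechanisms above combine into the simplest reliable protocol: stop-and-wait with a 1-bit (alternating) sequence number. The Python sketch below is a toy simulation, not TCP. The channel loses exactly the transmissions listed in `drop` (an assumption that makes runs reproducible), a lost packet or lost ACK stands in for a timer expiry, and corruption, premature timeouts and reordering are not modeled, so any ACK that arrives is taken to match the current packet.

```python
def stop_and_wait(messages, drop):
    """Toy stop-and-wait (alternating-bit) simulation over a lossy channel.

    `drop` is a set of channel-use indices (every data packet and every ACK
    counts, starting at 0) that the channel loses. Returns the list of
    messages delivered to the receiving application and the number of
    data-packet transmissions the sender made.
    """
    uses = 0  # how many packets/ACKs have been put on the channel so far

    def channel_delivers():
        nonlocal uses
        arrived = uses not in drop
        uses += 1
        return arrived

    delivered = []
    sends = 0
    send_seq = 0      # sender: sequence bit of the current packet
    expected_seq = 0  # receiver: sequence bit it expects next
    for msg in messages:
        while True:
            sends += 1
            if channel_delivers():                # data packet reaches receiver
                if send_seq == expected_seq:      # new data: deliver exactly once
                    delivered.append(msg)
                    expected_seq ^= 1
                # A duplicate (its earlier ACK was lost) is not delivered again,
                # but the receiver still ACKs it.
                if channel_delivers():            # ACK reaches sender
                    break
            # No ACK arrived: the timer expires and the sender re-transmits.
        send_seq ^= 1                             # next message uses the other bit
    return delivered, sends


print(stop_and_wait(["a", "b", "c"], drop=set()))    # (['a', 'b', 'c'], 3)
print(stop_and_wait(["a", "b", "c"], drop={1, 2}))   # (['a', 'b', 'c'], 5)
```

With `drop={1, 2}`, the first ACK and then the first re-transmission are lost; the second re-transmission reaches the receiver as a duplicate, which the sequence bit lets it discard while still re-ACKing, so every message is delivered exactly once after 5 data transmissions.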

More to come in protocol operation (below).

This is about how the next part covering TCP will feel:

1.9 TCP

“Hi, I’d like to hear a TCP joke”
“Hello, would you like to hear a TCP joke”
“Yes, I’d like to hear a TCP joke”
“OK, I’ll tell you a TCP joke”
“OK, I’ll hear a TCP joke”
“Are you ready to hear a TCP joke”
“Yes, I’m ready to hear a TCP joke”
“OK, I’m about to send a TCP joke. It will last 10 seconds, it has two characters, it doesn’t have a setting, it ends with a punchline”
“Ok, I’m ready to hear a TCP joke that will last 10 seconds, has two characters, has no explicit setting and ends with a punchline”
“I’m sorry, your connection has timed out,
… Hello, Would you like to hear a TCP joke”


* Chapter 3.5-3.8

* http://intronetworks.cs.luc.edu/current2/uhtml/
* http://intronetworks.cs.luc.edu/current2/uhtml/tcpA.html (read)
* http://intronetworks.cs.luc.edu/current2/uhtml/tcpB.html
* http://intronetworks.cs.luc.edu/current2/uhtml/reno.html
* http://intronetworks.cs.luc.edu/current2/uhtml/dynamicsA.html
* http://intronetworks.cs.luc.edu/current2/uhtml/dynamicsB.html
* http://intronetworks.cs.luc.edu/current2/uhtml/newtcps.html
* https://www.computer-networking.info/1st/html/transport/tcp.html
* https://www.computer-networking.info/2nd/html/protocols/tcp.html

* https://en.wikipedia.org/wiki/Transmission_Control_Protocol

* https://tools.ietf.org/html/rfc7414
* RFCs: 793, 1122, 1323, 2018, 2581
* https://media.pearsoncmg.com/aw/ecs_kurose_compnetwork_7/cw/content/interactiveanimations/go-back-n-protocol/index.html (go-back-n)
* https://media.pearsoncmg.com/aw/ecs_kurose_compnetwork_7/cw/content/interactiveanimations/selective-repeat-protocol/index.html (selective-repeat)

1.9.1 Recall application layer socket usage

TCP socket:

1.9.2 Connection oriented de-multiplexing

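demux: the receiver uses all four values of the 4-tuple (source IP address, source port number, destination IP address, destination port number) to direct the segment to the appropriate socket.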
* server host may support many simultaneous TCP sockets:
* each socket identified by its own 4-tuple
* web servers have different sockets for each connecting client
* non-persistent HTTP will have different socket for each request
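The difference between UDP and TCP de-multiplexing can be sketched as two lookup tables. This is a toy Python model with made-up socket names and documentation addresses. A real kernel also keeps listening sockets, keyed only by destination (IP, port), to catch new connection requests (SYNs); that part is omitted here.

```python
# UDP: a socket is found by (destination IP, destination port) only.
udp_sockets = {("198.51.100.7", 53): "dns_socket"}

# TCP: each connected socket is found by the full 4-tuple
# (source IP, source port, destination IP, destination port).
tcp_sockets = {
    ("192.0.2.1", 50000, "198.51.100.7", 80): "conn_A",
    ("192.0.2.1", 50001, "198.51.100.7", 80): "conn_B",    # same host, other source port
    ("203.0.113.9", 50000, "198.51.100.7", 80): "conn_C",  # other host, same source port
}

def demux(protocol, src_ip, src_port, dst_ip, dst_port):
    """Return the name of the socket that should receive this segment, or None."""
    if protocol == "udp":
        return udp_sockets.get((dst_ip, dst_port))   # source is ignored
    if protocol == "tcp":
        return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))
    raise ValueError(f"unknown protocol: {protocol}")

print(demux("udp", "192.0.2.1", 50000, "198.51.100.7", 53))    # dns_socket
print(demux("udp", "203.0.113.9", 40000, "198.51.100.7", 53))  # dns_socket (same socket)
print(demux("tcp", "192.0.2.1", 50001, "198.51.100.7", 80))    # conn_B
print(demux("tcp", "203.0.113.9", 50000, "198.51.100.7", 80))  # conn_C
```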

UDP looked like this:

TCP looks like this (e.g., Web server and two clients):

1.9.3 Overview

* Transmission Control Protocol is a connection-oriented protocol.
* This means that it requires handshaking to set up end-to-end communications.
* Once a TCP connection is made via a 3-way handshake, an application sends data simply by writing to that connection.
* No further application-level addressing is needed.
* TCP connections are managed by the operating-system kernel, not by the application.
* As part of TCP connection establishment, both sides of the connection will initialize many TCP state variables in memory.

Reliability:
* Manages message acknowledgment, re-transmission, and timeout.
* Multiple attempts to deliver the message are made.
* If part of a message gets lost along the way, the sender re-transmits the lost part.
* There will be either no missing data, or, in case of multiple timeouts, the connection is dropped.
* TCP numbers the data (each byte gets a sequence number), keeps track of which segments are lost, and re-transmits them after a timeout.
* It holds early-arriving out-of-order packets for delivery in the correct sequence at a later time.
* Arriving data is acknowledged by the receiver (acknowledgments are cumulative, so one ACK may cover several segments).
* Timeout and re-transmission occur when an acknowledgment isn't received by the sender within a given time.

Ordered data transfer:
* If two messages are sent over a connection in sequence, the first message will reach the receiving application first.
* When data segments arrive in the wrong order, TCP buffers delay the out-of-order data until all data can be properly re-ordered and delivered to the application.
* The destination host re-arranges received packets according to sequence number

Heavyweight:
* TCP requires three packets to set up a socket connection, before any user data can be sent.
* TCP handles reliability and congestion control.

Streaming:
* Data is read as a byte stream:
* From the perspective of the application, no distinguishing indications are transmitted to signal message (segment) boundaries.
* An application using TCP can write 1 byte at a time, or 100 kB at a time.
* TCP will buffer and/or divide up the data into appropriate sized packets.

Re-transmission of lost packets:
* Any parts of the cumulative stream not acknowledged are re-transmitted

Error-free data transfer:
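* Corrupted segments are detected by the checksum and discarded (and later re-transmitted), so the data delivered to the application matches what was sent, up to the limits of the 16-bit checksum.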

Flow control:
* Limits the rate at which a sender transfers data, so that a fast sender does not overrun the receiver.
* The receiver continually advertises to the sender how much data it can receive.
* Controlled by a sliding window parameter
* When the receiving host’s buffer fills, the next acknowledgment contains a 0 in the window size, to stop transfer and allow the data in the buffer to be processed.
* Thus, sender will not overwhelm receiver’s buffers.
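A receiver-side view of flow control, as a minimal Python sketch: the advertised window is simply the free space left in the receive buffer. The class and method names are hypothetical, and the buffer size is an arbitrary example.

```python
class ReceiveBuffer:
    """Hypothetical model of a TCP receive buffer and its advertised window."""

    def __init__(self, capacity):
        self.capacity = capacity  # total receive buffer size in bytes
        self.buffered = 0         # bytes received but not yet read by the application

    def window(self):
        # Advertised window = free space left in the receive buffer.
        return self.capacity - self.buffered

    def receive(self, nbytes):
        # A well-behaved sender never sends more than the advertised window.
        if nbytes > self.window():
            raise ValueError("sender exceeded the advertised window")
        self.buffered += nbytes
        return self.window()      # the value carried in the next ACK

    def app_read(self, nbytes):
        # The application consumes data, freeing buffer space.
        taken = min(nbytes, self.buffered)
        self.buffered -= taken
        return taken


buf = ReceiveBuffer(4096)
print(buf.receive(3000))   # 1096
print(buf.receive(1096))   # 0  -> the sender must pause
buf.app_read(2048)
print(buf.window())        # 2048 -> the sender may resume
```

Once the window reaches 0, real TCP senders periodically send small zero-window probes (driven by the persist timer) so they learn when the window reopens; the sketch leaves that out.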

Congestion control:
* Self-throttles so that competing connections share the available bandwidth fairly.

Point-to-point:
* Always between a single sender and a single receiver (not multiple).

Full duplex
* Once a connection is set up, user data may be sent bi-directionally over the connection.
* If there is a TCP connection between Process A on one host and Process B on another host, then application-layer data can flow from Process A to Process B at the same time as application-layer data flows from Process B to Process A.

1.9.4 Buffers

+++++++++++++++++++++++ Cahoot-03-3

1.9.5 Source and destination

What identifies a TCP stream?

1.9.6 TCP Segment structure

Sequence number (32 bits), acknowledgment number (32 bits), and window (16 bits)
* Used to provide a reliable data transfer, using a window-based protocol.
* In a TCP bytestream, each byte of the stream consumes one sequence number.

Segment structure details (also observe this in Wireshark, and compare side-by-side):
Source port (16 bits)
* Identifies the sending port.

Destination port (16 bits)
* Identifies the receiving port.

Sequence number (32 bits)
* For reliability
* Has a dual role:
* If the SYN flag is set (1), then this is the initial sequence number.
* The sequence number of the actual first data byte and the acknowledged number in the corresponding ACK are then this sequence number plus 1.
* If the SYN flag is clear (0), then this is the accumulated sequence number of the first data byte of this segment for the current session.

Acknowledgment number (32 bits)
* For reliability
* If the ACK flag is set then the value of this field is the next sequence number that the sender of the ACK is expecting.
* This acknowledges receipt of all prior bytes (if any).
* The first ACK sent by each end acknowledges the other end’s initial sequence number itself, but no data.

Data offset (4 bits)
* Specifies the size of the TCP header in 32-bit words.
* The minimum size header is 5 words and the maximum is 15 words thus giving the minimum size of 20 bytes and maximum of 60 bytes, allowing for up to 40 bytes of options in the header.
* This field gets its name from the fact that it is also the offset from the start of the TCP segment to the actual data.

Reserved (3 bits)
* For future use and should be set to zero.

Flags (9 bits) (aka Control bits)
* Contains 9 1-bit flags
* NS (1 bit): ECN-nonce - concealment protection (experimental: see RFC 3540).
* CWR (1 bit):
* Congestion Window Reduced (CWR) flag is set by the sending host to indicate that it received a TCP segment with the ECE flag set and had responded in congestion control mechanism (added to header by RFC 3168).
* ECE (1 bit):
* ECN-Echo has a dual role, depending on the value of the SYN flag.
* It indicates:
* If the SYN flag is set (1), that the TCP peer is ECN capable.
* If the SYN flag is clear (0), that a packet with Congestion Experienced flag set (ECN=11) in the IP header was received during normal transmission (added to header by RFC 3168).
* This serves as an indication of network congestion (or impending congestion) to the TCP sender.
* URG (1 bit):
* indicates that the Urgent pointer field is significant
* ACK (1 bit):
* indicates that the Acknowledgment field is significant.
* All packets after the initial SYN packet sent by the client should have this flag set.
* PSH (1 bit):
* Push function.
* Asks to push the buffered data to the receiving application.
* In practice TCP implementations do not allow TCP users to indicate when the PSH flag should be set and thus there are few real utilizations of this flag.
* RST (1 bit):
* Reset the connection
* SYN (1 bit):
* Synchronize sequence numbers.
* Only the first packet sent from each end should have this flag set.
* Some other flags and fields change meaning based on this flag, and some are only valid when it is set, and others when it is clear.
* FIN (1 bit):
* Finish
* Last packet from sender.

Window size (16 bits)
* For flow control
* The size of the receive window, which specifies the number of window size units (by default, bytes) (beyond the segment identified by the sequence number in the acknowledgment field) that the sender of this segment is currently willing to receive (see Flow control and Window Scaling).

Checksum (16 bits)
* The 16-bit checksum field is used for error-checking of the header, the payload, and a pseudo-header.
* The pseudo-header consists of the source IP address, the destination IP address, the protocol number for TCP (0x0006), and the length of the TCP header including payload (in bytes).

Urgent pointer (16 bits)
* if the URG flag is set, then this 16-bit field is an offset from the sequence number indicating the last urgent data byte.
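The fixed 20-byte part of the header can be decoded with Python's `struct` module, as a minimal sketch: the format `!HHIIHHHH` covers source port through urgent pointer, the top 4 bits of the 16-bit word after the acknowledgment number are the data offset, and its low 9 bits are the flags. The function name and the hand-made sample segment are illustrative; checksum validation is not attempted.

```python
import struct

TCP_FIXED = struct.Struct("!HHIIHHHH")  # the 20-byte header before any options
# Flag bits from least significant upward: FIN is bit 0, ..., NS is bit 8.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed TCP header fields; options and payload are returned raw."""
    (src_port, dst_port, seq, ack, offset_and_flags,
     window, checksum, urgent) = TCP_FIXED.unpack_from(segment)
    header_len = (offset_and_flags >> 12) * 4   # data offset is in 32-bit words
    flag_bits = offset_and_flags & 0x01FF       # low 9 bits (3 reserved bits skipped)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": [name for bit, name in enumerate(FLAG_NAMES) if (flag_bits >> bit) & 1],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "options": segment[TCP_FIXED.size:header_len],
        "payload": segment[header_len:],
    }

# A hand-made SYN: data offset 5 (no options), only the SYN bit (0x002) set.
syn = TCP_FIXED.pack(50000, 80, 1000, 0, (5 << 12) | 0x002, 65535, 0, 0)
print(parse_tcp_header(syn)["flags"])   # ['SYN']
```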

Options (variable, 0 to 320 bits, divisible by 32)
* The length of this field is determined by the data offset field. Options have up to three fields: Option-Kind (1 byte), Option-Length (1 byte), Option-Data (variable). The Option-Kind field indicates the type of option, and is the only field that is not optional. Depending on what kind of option we are dealing with, the next two fields may be set: the Option-Length field indicates the total length of the option, and the Option-Data field contains the value of the option, if applicable. For example, an Option-Kind byte of 0x01 indicates that this is a No-Op option used only for padding, and does not have an Option-Length or Option-Data byte following it. An Option-Kind byte of 0 is the End Of Options option, and is also only one byte. An Option-Kind byte of 0x02 indicates that this is the Maximum Segment Size option, and will be followed by a byte specifying the length of the MSS field (should be 0x04). This length is the total length of the given options field, including Option-Kind and Option-Length bytes. So while the MSS value is typically expressed in two bytes, the length of the field will be 4 bytes (+2 bytes of kind and length). In short, an MSS option field with a value of 0x05B4 will show up as (0x02 0x04 0x05B4) in the TCP options section. Some options may only be sent when SYN is set; they are indicated below as [SYN]. Option-Kind and standard lengths given as (Option-Kind,Option-Length).
* 0 (8 bits): End of options list
* 1 (8 bits): No operation (NOP, Padding) This may be used to align option fields on 32-bit boundaries for better performance.
* 2,4,SS (32 bits): Maximum segment size (see maximum segment size) [SYN]
* 3,3,S (24 bits): Window scale (see window scaling for details) [SYN]
* 4,2 (16 bits): Selective Acknowledgment permitted. [SYN] (See selective acknowledgments for details)
* 5,N,BBBB,EEEE,… (variable bits, N is either 10, 18, 26, or 34): Selective Acknowledgment (SACK). These first two bytes are followed by a list of 1 to 4 blocks being selectively acknowledged, specified as 32-bit begin/end pointers.
* 8,10,TTTT,EEEE (80 bits)- Timestamp and echo of previous timestamp (see TCP timestamps for details)
* The remaining options are historical, obsolete, experimental, not yet standardized, or unassigned. Option number assignments are maintained by the IANA.

* The TCP header padding is used to ensure that the TCP header ends, and data begins, on a 32 bit boundary. The padding is composed of zeros.
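The Option-Kind / Option-Length / Option-Data walk described above can be sketched in Python as follows (the function name is hypothetical). Kinds 0 and 1 are single bytes; every other kind is assumed to carry a length byte that counts the kind and length bytes themselves. The sample bytes imitate typical SYN options (MSS 1460, i.e. 0x05B4, SACK permitted, a timestamp, one NOP for padding, window scale 7); they are made-up values, not taken from a capture.

```python
def parse_tcp_options(options: bytes):
    """Return a list of (kind, data) pairs; NOP and End-of-options are skipped."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                    # End of options list: the rest is padding
            break
        if kind == 1:                    # NOP: one byte, used for 32-bit alignment
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError(f"option kind {kind} is missing its length byte")
        length = options[i + 1]          # counts the kind and length bytes too
        if length < 2 or i + length > len(options):
            raise ValueError(f"bad length {length} for option kind {kind}")
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed

# MSS=1460, SACK permitted, timestamp (TSval=123456, TSecr=0), NOP, window scale 7.
syn_options = bytes.fromhex("020405b4" "0402" "080a" "0001e240" "00000000" "01" "030307")
parsed = parse_tcp_options(syn_options)
print([kind for kind, _ in parsed])           # [2, 4, 8, 3]
print(int.from_bytes(parsed[0][1], "big"))    # 1460
```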

Options: Robustness

The robustness principle (good advice in networking and in life…):

The handling of the TCP options by TCP implementations is one of the many applications of the robustness principle, which is usually attributed to Jon Postel, and is often quoted as:

“Be liberal in what you accept, and conservative in what you send” RFC 1122

In other words:
Even when you have the capacity to understand complexity and variety, speak simply.
Can you accurately explain a complex topic to someone who does not have full mastery of your native language?

Checksum computation

TCP checksum also has extra IP information added:
* Minor layering mix-up
* Long ago, TCP and IP were not independent layers (hence “TCP/IP”).

When TCP runs over IPv4

The method used to compute the checksum is defined in RFC 793:
* The checksum field is the 16 bit one’s complement of the one’s complement sum of all 16-bit words in the header and text.
* If a segment contains an odd number of header and text octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes.
* The pad is not transmitted as part of the segment.
* While computing the checksum, the checksum field itself is replaced with zeros.
* To summarize:
* After appropriate padding, all 16-bit words are added using one’s complement arithmetic.
* The sum is then bitwise complemented and inserted as the checksum field.
* The source and destination addresses are those of the IPv4 header.
* The Protocol value is 6 for TCP (from a standard list of IP protocol numbers).
* The TCP length field is the length of the TCP header and data (measured in octets).

When TCP runs over IPv6

The method used to compute the checksum is changed, as per RFC 2460:
* Any transport or other upper-layer protocol that includes the addresses from the IP header in its checksum computation must be modified for use over IPv6, to include the 128-bit IPv6 addresses instead of 32-bit IPv4 addresses.
Source address:
* the one in the IPv6 header

Destination address:
* The final destination
* If the IPv6 packet doesn’t contain a Routing header, TCP uses the destination address in the IPv6 header
* Otherwise, at the originating node, it uses the address in the last element of the Routing header, and, at the receiving node, it uses the destination address in the IPv6 header.

TCP length:
* the length of the TCP header and data

Next Header:
* the protocol value for TCP (6)
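For comparison with the 12-byte IPv4 pseudo-header, here is a minimal Python sketch of the 40-byte IPv6 pseudo-header (the function name is hypothetical; the addresses come from the 2001:db8::/32 documentation prefix): 16-byte source, 16-byte destination, a 32-bit upper-layer length, 3 zero bytes, and the Next Header value 6. The same one's complement sum is then taken over this pseudo-header plus the TCP segment.

```python
import ipaddress
import struct

def tcp_pseudo_header_ipv6(src: str, dst: str, tcp_length: int) -> bytes:
    """Build the 40-byte IPv6 pseudo-header used in the TCP checksum.

    dst should be the final destination (see the Routing header rule above).
    """
    return (ipaddress.IPv6Address(src).packed        # 16 bytes
            + ipaddress.IPv6Address(dst).packed      # 16 bytes
            + struct.pack("!I3xB", tcp_length, 6))   # 32-bit length, 3 zero bytes, Next Header 6

ph = tcp_pseudo_header_ipv6("2001:db8::1", "2001:db8::2", tcp_length=20)
print(len(ph))   # 40
```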

1.9.7 TCP breaks data into chunks

More to come below (protocol operation).

1.9.8 Protocol operation

TCP protocol operations may be divided into three phases:
1. Connection establishment: connections must be properly established in a multi-step handshake process.
2. Data transfer: begins only once the connection is established.
3. Connection termination: after data transmission is completed, it closes established virtual circuits and releases all allocated resources.

Traces to visit:
* *_socket_TCP*.py in Wireshark
* Notice
* TCP connection setup
* Connection teardown
* Wireshark trace a real connection downloading a file
* Follow > TCP stream (or right click on packet)
* Statistics > flow graph > TCP selection
* Statistics > TCP stream graph > multiple options

Connection management

Follow protocol, lest your rock-climbing partner fall to their death…
Climber: “On belay?” (Are you ready to belay me?)
Belayer: “Belay on.” (Slack is gone and I’m ready.)
Climber: “Climbing.” (I’m going to climb now.)
Belayer: “Climb on.” (I’m ready for you to climb.)

Connection establishment

Show below in Wireshark, with relative and real sequence numbers; compare side-by-side.

First, if the server is NOT accepting connections:
Will be relevant later (nmap uses this).

Second, if the server IS accepting connections:
* To establish a connection, TCP uses a three-way handshake.
* Before a client attempts to connect with a server, the server must first bind to and listen at a port to open it up for connections: this is called a passive open.
* Once the passive open is established, a client may initiate an active open.
* To establish a connection, the three-way (or 3-step) handshake occurs:

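  1. SYN: the client sends a SYN segment carrying its initial sequence number, say A.
  2. SYN-ACK: the server replies with a SYN-ACK whose acknowledgment number is A+1 and whose sequence number is the server's own initial sequence number, say B.
  3. ACK: the client replies with an ACK whose sequence number is A+1 and whose acknowledgment number is B+1.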

In full detail:
* A TCP connection is established by using a three-way handshake.
* The connection establishment phase uses the sequence number, the acknowledgment number and the SYN flag.
* When a TCP connection is established, the two communicating hosts negotiate the initial sequence number to be used in both directions of the connection.
* For this, each TCP entity maintains a 32-bit counter, which is supposed to be incremented by one at least every 4 microseconds and after each connection establishment.
* When a client host wants to open a TCP connection with a server host, it creates a TCP segment with:
* the SYN flag set
* the sequence number set to the current value of the 32-bit counter of the client host’s TCP entity
* Upon reception of this segment (which is often called a SYN segment), the server host replies with a segment containing:
* the SYN flag set
* the sequence number set to the current value of the 32-bit counter of the server host’s TCP entity
* the ACK flag set
* the acknowledgment number set to the sequence number of the received SYN segment incremented by 1 (mod 2^32).
* When a TCP entity sends a segment having x+1 as acknowledgment number, this indicates that it has received all data up to and including sequence number x and that it is expecting data having sequence number x+1.
* As the SYN flag was set in a segment having sequence number x, this implies that setting the SYN flag in a segment consumes one sequence number.
* This segment is often called a SYN+ACK segment.
* The acknowledgment confirms to the client that the server has correctly received the SYN segment.
* The sequence number of the SYN+ACK segment is used by the server host to verify that the client has received the segment.
* Upon reception of the SYN+ACK segment, the client host replies with a segment containing:
* the ACK flag set
* the acknowledgment number set to the sequence number of the received SYN+ACK segment incremented by 1 (mod 2^32)
* At this point, the TCP connection is open and both the client and the server are allowed to send TCP segments containing data.
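The numbering rules above fit in a few lines of Python. In this sketch the ISNs are arbitrary example values (real stacks choose them unpredictably) and the segment tuples are an invented representation; the point is that each SYN consumes one sequence number and that the arithmetic wraps modulo 2^32.

```python
SEQ_SPACE = 2 ** 32   # sequence and acknowledgment numbers wrap modulo 2^32

def three_way_handshake(client_isn: int, server_isn: int):
    """Return (flags, seq, ack) for SYN, SYN+ACK and ACK; ack=None means no ACK flag."""
    syn = ("SYN", client_isn, None)
    # The client's SYN consumed one sequence number, so the server expects client_isn + 1.
    syn_ack = ("SYN+ACK", server_isn, (client_isn + 1) % SEQ_SPACE)
    # Likewise the server's SYN consumed one number on its side.
    ack = ("ACK", (client_isn + 1) % SEQ_SPACE, (server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]

# A client ISN at the very top of the space shows the wrap-around.
for segment in three_way_handshake(client_isn=2**32 - 1, server_isn=1000):
    print(segment)
# ('SYN', 4294967295, None)
# ('SYN+ACK', 1000, 0)
# ('ACK', 0, 1001)
```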

Simultaneous establishment of a TCP connection (rare, weird)
Connection termination

Show this in Wireshark, with relative and real sequence numbers; compare side-by-side.

When a connection ends, the “resources” (that is, the buffers and variables) in the hosts are de-allocated.


* A’s FIN is, in effect, a promise to B not to send any more.
* However, A must still be prepared to receive data from B, hence the optional data shown in the diagram.
* A good example of this occurs when A is sending a stream of data to B to be sorted.
* A sends FIN to indicate that it is done sending, and only then does B sort the data and begin sending it back to A. This can be generated by running the following command on A:
* $ cat thefile | ssh B sort
* That said, B-to-A data following A’s FIN, as in the diagram, is relatively uncommon.
* In the diagram above, A sends a FIN to B and receives an ACK, and then, later, B sends a FIN to A and receives an ACK.
* This essentially amounts to two separate two-way closure handshakes.

In the absence of the optional data from B to A after A sends its FIN, the closing sequence reduces to the left-hand diagram below:

Example


Num    A sends                         B sends
1      SYN, seq=0
2                                      SYN+ACK, seq=0, ack=1 (expecting)
3      ACK, seq=1, ack=1 (ACK of SYN)
4      “abc”, seq=1, ack=1
5                                      ACK, seq=1, ack=4
6      “defg”, seq=4, ack=1
7                                      seq=1, ack=8
8      “foobar”, seq=8, ack=1
9                                      seq=1, ack=14, “hello”
10     seq=14, ack=6, “goodbye”
11,12  seq=21, ack=6, FIN              seq=6, ack=21 (ACK of “goodbye”; crossing packets)
13                                     seq=6, ack=22 (ACK of FIN)
14                                     seq=6, ack=22, FIN
15     seq=22, ack=7 (ACK of FIN)
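You can check the table by recomputing the relative sequence numbers from the payloads alone. Each payload byte consumes one sequence number, SYN and FIN consume one each, and a pure ACK consumes none. This sketch replays each side's segments in the order sent. The function name is our own.

```python
def replay(segments: list[tuple[bytes, bool, bool]]) -> tuple[list[int], int]:
    """For one sender's (payload, syn, fin) segments, return each relative seq and the next seq."""
    seqs, next_seq = [], 0
    for payload, syn, fin in segments:
        seqs.append(next_seq)
        # Payload bytes, SYN and FIN each consume sequence numbers; a bare ACK consumes none
        next_seq += len(payload) + int(syn) + int(fin)
    return seqs, next_seq


# A's segments: rows 1, 3, 4, 6, 8, 10, 11, 15 of the table
a_seqs, a_next = replay([(b"", True, False), (b"", False, False), (b"abc", False, False),
                         (b"defg", False, False), (b"foobar", False, False),
                         (b"goodbye", False, False), (b"", False, True), (b"", False, False)])
# B's segments: rows 2, 5, 7, 9, 12, 13, 14
b_seqs, b_next = replay([(b"", True, False), (b"", False, False), (b"", False, False),
                         (b"hello", False, False), (b"", False, False), (b"", False, False),
                         (b"", False, True)])
print(a_seqs, a_next)  # → [0, 1, 1, 4, 8, 14, 21, 22] 22  (22 is B's final ack)
print(b_seqs, b_next)  # → [0, 1, 1, 1, 6, 6, 6] 7  (7 is A's final ack)
```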

Recall: the real sequence numbers are large 32-bit values chosen per connection. The numbers above are relative to each side’s initial sequence number, which is how Wireshark displays them by default.

State machine representation

During the lifetime of a TCP connection, the local end-point undergoes a series of state changes:

States of the client

States of the server

++++++++++++++++++++ Cahoot-03-4

Trace Wireshark

Download this:

Upload this:
http://gaia.cs.umass.edu/wireshark-labs/TCP-wireshark-file1.html

List of states

State machine: establishment

State machine: termination

Starting from the Established state, there are two main paths through the FSM below.

State machine diagram overview

Both establishment and termination:


For more detail, see:
http://www.medianet.kent.edu/techreports/TR2005-07-22-tcp-EFSM.pdf

Transmission Control Block (TCB)

This is what is kept associated with the “table” of all current streams, in RAM, for both the client and the server:
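Here is a minimal sketch of what one TCB might hold, written as a Python dataclass. The field names loosely follow RFC 793’s SND.UNA, SND.NXT and RCV.NXT variables. The addresses are made up. cwnd and ssthresh are the congestion-control variables discussed later. Real stacks keep much more state than this.

```python
from dataclasses import dataclass, field


@dataclass
class TCB:
    """Simplified Transmission Control Block (illustrative only)."""
    local: tuple[str, int]            # (IP address, port)
    remote: tuple[str, int]
    state: str = "CLOSED"             # e.g. LISTEN, SYN-SENT, ESTABLISHED, FIN-WAIT-1, ...
    snd_una: int = 0                  # SND.UNA: oldest unacknowledged sequence number
    snd_nxt: int = 0                  # SND.NXT: next sequence number to send
    snd_wnd: int = 0                  # SND.WND: window advertised by the peer
    rcv_nxt: int = 0                  # RCV.NXT: next sequence number expected
    rcv_wnd: int = 0                  # RCV.WND: window we advertise
    cwnd: int = 0                     # congestion-control state (added after RFC 793)
    ssthresh: int = 0
    send_buffer: bytearray = field(default_factory=bytearray)
    recv_buffer: bytearray = field(default_factory=bytearray)

    def bytes_in_flight(self) -> int:
        """Data sent but not yet acknowledged (modulo the 32-bit sequence space)."""
        return (self.snd_nxt - self.snd_una) % 2**32


# The per-host "table": incoming segments are demultiplexed by their 4-tuple
connections: dict[tuple, TCB] = {}

tcb = TCB(local=("10.0.0.2", 80), remote=("10.0.0.7", 51000), state="ESTABLISHED",
          snd_una=1000, snd_nxt=4000)
connections[(tcb.local, tcb.remote)] = tcb
print(connections[(("10.0.0.2", 80), ("10.0.0.7", 51000))].bytes_in_flight())  # → 3000
```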

+++++++++++++++++++++ Cahoot-03-5

Side note: nmap


Our discussion above has assumed that both the client and server are prepared to communicate,
i.e., that the server is listening on the port to which the client sends its SYN segment.
What if a host receives a mis-matched TCP segment,
whose port numbers or source IP address do not match any existing sockets in the host?
For example, suppose a host receives a TCP SYN packet with destination port 80,
but the host is not accepting connections on port 80
(that is, it is not running a Web server on port 80).
Then the host will send a special reset segment to the source.
This TCP segment has the RST flag bit set to 1.
Thus, when a host sends a reset segment, it is telling the source:
“I don’t have a socket for that segment. Please do not resend the segment.”
When a host receives a UDP packet whose destination port number doesn’t match any open UDP socket,
the host often sends a special ICMP datagram (an ICMP “port unreachable” message).
Now that we have a good understanding of TCP connection management,
let’s revisit the nmap port-scanning tool and examine more closely how it works.

To explore a specific TCP port, say port 6789, on a target host,
nmap will send a TCP SYN segment with destination port 6789 to that host.
There are three possible outcomes:

  1. The source host receives a TCP SYNACK segment from the target host. Since this means that an application is running with TCP port 6789 on the target host, nmap returns “open.”

  2. The source host receives a TCP RST segment from the target host. This means that the SYN segment reached the target host, but the target host is not running an application with TCP port 6789. But the attacker at least knows that the segments destined to the host at port 6789 are not blocked by any firewall on the path between source and target hosts. (Firewalls are discussed in Chapter 8.)

  3. The source receives nothing. This likely means that the SYN segment was blocked by an intervening firewall and never reached the target host.
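The three outcomes can be sketched with an ordinary Python socket. Note the swap in technique. nmap’s SYN scan (-sS) crafts raw SYN segments and never completes the handshake. connect() completes the handshake, which corresponds to nmap’s connect scan (-sT). The function name and the 1-second timeout are our own choices.

```python
import socket


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a TCP port using the three outcomes above, via a full connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))      # SYN -> SYN+ACK -> ACK completed
            return "open"
        except ConnectionRefusedError:   # the target answered our SYN with RST
            return "closed"
        except socket.timeout:           # no answer at all: likely dropped by a firewall
            return "filtered"
```

For example, probe("127.0.0.1", 22) returns "open" if an SSH server is listening locally, or "closed" if the local kernel answers with RST.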

Nmap is a powerful tool, which can “case the joint” not only for open TCP ports, but also for open UDP ports, for firewalls and their configurations, and even for the versions of applications and operating systems.
Much of this is done by manipulating TCP connection-management or other administrative segments, for example:

SYN scan, ACK scan, FIN scan, window scan

Data transfer

Green box above in TCP’s FSM.

For actual data-carrying packets, a complex set of protocol details exists.

History

The first designs did not have all the features we’re covering,
which were slowly added over time:
Now, there are many variations on TCP:
https://en.wikipedia.org/wiki/TCP_congestion_control#Algorithms

Basics

Maximizing transfer - sending many packets optimistically and rapidly
Reliability and ordered transfer - whole packet loss
Error detection - corrupt packets
Flow control - don’t overwhelm the receiver
Congestion control - dealing with queues and full pipes

Maximizing data transfer

Stop and wait versus pipelining
Pipelining requires that we think about pipelined acknowledgments and pipelined reliability.
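The payoff of pipelining can be estimated with the usual utilization formula U = N·(L/R) / (RTT + L/R). Here N packets of L bits each are sent at link rate R before the first ACK returns, and N = 1 is stop-and-wait. The numbers below (1 Gbps, 1000-byte packets, 30 ms RTT) are illustrative assumptions.

```python
def utilization(rtt_s: float, packet_bits: int, rate_bps: float, window: int = 1) -> float:
    """Fraction of time the sender is busy, for `window` packets per round trip."""
    transmit_s = packet_bits / rate_bps             # L / R: time to push one packet out
    return min(1.0, window * transmit_s / (rtt_s + transmit_s))


stop_and_wait = utilization(0.030, 8000, 1e9)            # about 0.00027
pipelined = utilization(0.030, 8000, 1e9, window=3)      # three times better
full_pipe = utilization(0.030, 8000, 1e9, window=4000)   # enough packets to fill the pipe
print(f"{stop_and_wait:.5f} {pipelined:.5f} {full_pipe:.1f}")
```

Stop-and-wait leaves this link idle over 99.9% of the time. Roughly 3750 packets must be outstanding at once before the sender is kept busy.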
Receive window:

Reliability



Reliability defines protocols for whole packet loss.
Reliability is not defined as dealing with corrupted packets,
which is handled by error detection below.

Two mechanisms for detecting lost packets:
1. Duplicate acknowledgment based re-transmission (dup-ack)
2. Timeout based re-transmission

Reminder: Sequence and acknowledgment numbers
TCP acknowledgments are cumulative and positive.
We only acknowledge up to the last packet of the contiguous, in-order chain of packets received from the start.
The acknowledgment number itself is the next sequence number the receiver expects.

Make sure you get this point before continuing!
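Here is a sketch of the cumulative rule from the receiver's side: it computes the acknowledgment number from the (seq, length) segments the receiver holds. It ignores sequence-number wraparound, and the byte numbers are made up.

```python
def cumulative_ack(rcv_nxt: int, segments: list[tuple[int, int]]) -> int:
    """Advance the next-expected byte over every segment contiguous with it."""
    for seq, length in sorted(segments):
        if seq > rcv_nxt:
            break                              # a hole: nothing beyond it can be acknowledged yet
        rcv_nxt = max(rcv_nxt, seq + length)   # max() also tolerates duplicates and overlaps
    return rcv_nxt


# Bytes 1-200 and 301-400 arrived; bytes 201-300 were lost
print(cumulative_ack(1, [(1, 100), (101, 100), (301, 100)]))  # → 201
```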

Duplicate acknowledgment based re-transmission (dup-ack):
If a single packet (say packet 127) in a stream is lost,
then the receiver cannot acknowledge packets above 127, because it uses cumulative ACKs.
Hence the receiver acknowledges packet 126 again on the receipt of each further data packet.
This duplicate acknowledgment is used as a signal for packet loss.
That is, if the sender receives three duplicate acknowledgments,
it re-transmits the oldest un-acknowledged packet (here, packet 127).
A threshold of three is used,
because the network may reorder packets causing duplicate acknowledgments.
This threshold avoids spurious re-transmissions due to reordering,
though it is still possible.
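The sender side of this rule can be sketched as a counter over incoming acknowledgment numbers. As in TCP, an ACK number here names the next byte expected, so the repeated value marks where the lost data starts. The ACK streams are invented for illustration.

```python
def fast_retransmits(acks: list[int], threshold: int = 3) -> list[int]:
    """Return the sequence numbers a sender would re-transmit after `threshold` duplicate ACKs."""
    resend = []
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == threshold:        # third duplicate: treat as a loss signal
                resend.append(ack)       # resend the segment starting at the ACKed byte
        else:
            last_ack, dups = ack, 0      # a new cumulative ACK resets the count
    return resend


# The segment starting at byte 201 is lost; each later arrival re-ACKs 201
print(fast_retransmits([101, 201, 201, 201, 201, 601]))  # → [201]
print(fast_retransmits([101, 201, 201, 201]))            # → [] (only two duplicates)
```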

Indirectly “asking for segment-2”,
is actually just an ack of segment-1:
When pipelining, if we just lose one packet,
do we have to re-transmit everything,
or just the lost packet?

Sometimes, selective acknowledgments (SACKs) are used,
providing more explicit feedback on which packets have been received.
This improves TCP’s ability to re-transmit the right packets.
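The following sketch shows how a receiver could summarize out-of-order data as SACK blocks. Each block is a [left edge, right edge) byte range above the cumulative ACK. For readability the blocks are sorted by sequence number. RFC 2018 actually asks that the block containing the most recently received segment be listed first, and the TCP option has room for only a few blocks.

```python
def sack_blocks(rcv_nxt: int, segments: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge out-of-order (seq, length) segments above rcv_nxt into contiguous ranges."""
    blocks: list[list[int]] = []
    for seq, length in sorted(segments):
        if seq <= rcv_nxt:
            continue                                  # already covered by the cumulative ACK
        end = seq + length
        if blocks and seq <= blocks[-1][1]:
            blocks[-1][1] = max(blocks[-1][1], end)   # touches or overlaps the previous range
        else:
            blocks.append([seq, end])
    return [(left, right) for left, right in blocks]


# Cumulative ACK is 201; bytes 301-500 and 601-700 are buffered
print(sack_blocks(201, [(301, 100), (401, 100), (601, 100)]))  # → [(301, 501), (601, 701)]
```

With this feedback the sender can re-transmit only bytes 201-300 and 501-600, instead of everything after 201.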

++++++++++++++++++++++++++ Cahoot-03-6

Timeout based re-transmission

Re-transmission due to a lost acknowledgment:

Cumulative acknowledgment avoids re-transmitting a segment whose ACK was lost:

What is the definition of control in CS/engineering?

Round Trip Time (RTT) and timeout duration:
First, to decide when a packet has probably been lost,
one needs to know a reasonable timeout threshold.
How do we compute a round-trip time (RTT)?
One good way is an exponentially weighted moving average (EWMA).
Whenever a packet is sent, the sender sets a timer.
The timer is set to a conservative prediction of when that packet will be ack’ed.
If the sender does not receive an ACK by then,
then it transmits that packet again.

The timer is reset every time the sender receives an acknowledgment.
This means that the re-transmit timer expires only when the sender has received no acknowledgment for a long time.

Further, in the case a re-transmit timer has expired and no acknowledgment is received,
the next timer is set to twice the previous value (up to a certain threshold).
This helps defend against a man-in-the-middle denial of service attack,
that tries to fool the sender into making so many re-transmissions that the receiver is overwhelmed.
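The timeout computation can be sketched with the formulas standardized in RFC 6298. SRTT and RTTVAR are EWMAs with gains 1/8 and 1/4. RTO = SRTT + 4·RTTVAR, rounded up to a 1-second minimum and doubled on each expiry (exponential backoff). The 60-second cap is an assumption: RFC 6298 only requires that any maximum be at least 60 seconds. Clock granularity is ignored.

```python
class RtoEstimator:
    """Smoothed RTT and retransmission timeout (RTO), following the RFC 6298 formulas."""
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, min_rto: float = 1.0, max_rto: float = 60.0):
        self.min_rto, self.max_rto = min_rto, max_rto
        self.srtt = None          # smoothed round-trip time (seconds)
        self.rttvar = None        # smoothed mean deviation of the RTT
        self.rto = 1.0            # initial RTO before any measurement

    def on_sample(self, r: float) -> float:
        if self.srtt is None:                      # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = min(self.max_rto, max(self.min_rto, self.srtt + self.K * self.rttvar))
        return self.rto

    def on_timeout(self) -> float:
        self.rto = min(self.max_rto, 2 * self.rto)   # back off: double the timer, up to the cap
        return self.rto


est = RtoEstimator(min_rto=0.0)   # disable the 1 s floor to see the raw arithmetic
print(est.on_sample(0.100))       # about 0.3 = 0.1 + 4 * 0.05
print(est.on_sample(0.200))       # about 0.3625 (SRTT 0.1125, RTTVAR 0.0625)
print(est.on_timeout())           # about 0.725 after one expiry
```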

If the sender infers that data has been lost in the network,
using one of the two techniques described above,
then it re-transmits the data.
Such real-time changing of parameters based on the environment creates momentum,
feedback, and a dynamical distributed process,
especially when interacting with other TCP connections.

Error detection

When reviewing the TCP header,
we talked about computing a checksum:
Split the bit-string into 16-bit words, add them using one’s-complement arithmetic (wrapping any carry back into the low bit), and invert the result:
See checksum above, for packet corruption, not loss.
The TCP checksum is a weak check by modern standards
(statistically OK at detecting errors).
Data Link Layers with high bit error rates may require additional link error correction/detection capabilities.
The weak checksum is partially compensated for by the common use of a CRC or better integrity check at layer 2,
below both TCP and IP, such as is used in PPP or the Ethernet frame.
However, this does not mean that the 16-bit TCP checksum is redundant:
Remarkably, introduction of errors in packets between CRC-protected hops is common,
but the end-to-end 16-bit TCP checksum catches most of these simple errors.
This is the end-to-end principle at work.

A concrete example:
Receiver must do the same thing as sender, re-compute,
and compare the value to the checksum the sender sent:
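The sender and receiver sides of the checksum can be sketched in Python. This follows the one’s-complement sum described in RFC 1071 and reproduces that RFC’s worked example. A real TCP checksum also covers a pseudo-header (source and destination IP addresses, protocol, and TCP length), which this sketch omits.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16) # wrap the carry back in (end-around carry)
    return ~total & 0xFFFF                       # invert


segment = bytes.fromhex("0001f203f4f5f6f7")       # example bytes from RFC 1071
csum = internet_checksum(segment)
print(hex(csum))                                  # → 0x220d

# Receiver: the data plus the transmitted checksum sums to 0xFFFF,
# so recomputing over both yields 0 when nothing was corrupted.
print(internet_checksum(segment + csum.to_bytes(2, "big")))  # → 0
```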

+++++++++++++++++++++ Cahoot-03-7

Flow control

Broad principles:

As a consequence of pipelining,
we must consider the rate at which we send data,
for our receiver’s ability to process it.
TCP provides flow control by having the sender maintain a variable called the receive window.

Receive window in header above,
receive buffer in TCP block.

The receive window provides the sender an idea of how much free buffer space is available at the receiver.
Because TCP is full-duplex, the sender at each side of the connection maintains a distinct receive window.
In each TCP segment, the receiver specifies in the receive window field
the amount of additional received data, in bytes,
that it is willing to buffer for the connection.
The sending host can send only up to that amount of data,
before it must wait for an acknowledgment and window update from the receiving host.
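The receive-window bookkeeping can be sketched with two small functions. The variable names are illustrative (modeled on the LastByteRcvd/LastByteRead style used in some textbooks), and sequence wraparound is ignored.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver: free space = buffer size minus data received but not yet read by the app."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable(last_byte_sent: int, last_byte_acked: int, rwnd: int) -> int:
    """Sender: bytes it may still send without exceeding the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_window(rcv_buffer=65536, last_byte_rcvd=40000, last_byte_read=10000)
print(rwnd)                                                               # → 35536
print(sendable(last_byte_sent=50000, last_byte_acked=30000, rwnd=rwnd))   # → 15536
```

If the receiving application stops reading, the advertised window shrinks to 0 and the sender must pause.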

TCP sequence numbers and receive windows behave very much like a clock.
The receive window shifts each time the receiver receives and acknowledges a new segment of data.
Once it runs out of sequence numbers, the sequence number loops back to 0.
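Because the sequence space is a circle, comparisons must be done modulo 2^32 rather than with a plain <. A common sketch treats a as earlier than b if b lies less than half the space ahead of a. The helper names are our own.

```python
SEQ_MOD = 2 ** 32


def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n, wrapping past 2**32 - 1 back to 0."""
    return (seq + n) % SEQ_MOD


def seq_before(a: int, b: int) -> bool:
    """True if a precedes b on the circle (meaningful while they are < 2**31 apart)."""
    return 0 < (b - a) % SEQ_MOD < 2 ** 31


print(seq_add(2**32 - 1, 2))         # → 1: the counter wrapped past 0
print(seq_before(2**32 - 16, 5))     # → True, even though 2**32 - 16 > 5
print(seq_before(5, 2**32 - 16))     # → False
```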

The great thing about TCP jokes is that you always get them.
The problem with TCP jokes is that I’ll keep retelling them slower until you get them!

Congestion control

As a consequence of pipelining,
we must consider the rate at which we send data,
for our network’s ability to relay it.

A full topic on its own:




High level idea:
Q: How fast should sender send data, to not overwhelm the network?
A: ??

Q: How do we control the rate of sending data?
A: by controlling:
the number of packets out on the wire, un-acked,
and how rapidly we increase this number over time.

What is the definition of control in CS/engineering?

A quick preview (more detail to come):
cwnd is under fine-grained, step-wise control.
ssthresh is a threshold that operates at a coarser, larger, intermittent timescale.

Q: How do we know we’ve overwhelmed the network?
A: by observing:
* duplicate acks
* timeouts
* Explicit Congestion Notification (ECN) flag in TCP header (more rare)

Q: Which indicates worse congestion?
A: A point to ponder, more below.

Transmission Control Block (TCB) data:

* Fine control of sending rate, modulated to adjust speed.
* A congestion window (cwnd) is maintained by the sender.
* cwnd is one of the factors that determines the number of bytes that can be outstanding at any time.
* It thus determines the rate at which sender sends data.
* This is not to be confused with the sliding window size which is maintained by the receiver, for the purpose of determining how much data the receiver’s buffers can handle.
* Instead, reducing the congestion window is a means of reducing the degree to which a link between the sender and the receiver is overloaded with too much traffic.
* It is reduced after observing evidence of congestion.
* When a connection is set up, the congestion window, a value maintained independently at each host, is set to a small multiple of the MSS allowed on that connection.

* A value adjusted only intermittently
* It is treated as an occasional set-point for modifying cwnd.
* A slow-start threshold (ssthresh) determines how long a ramp-up in speed occurs.

Both cwnd and ssthresh can be modulated over time, to control sending rate.
cwnd is fine-grained control.
ssthresh is an occasional resetting threshold.

Most modern implementations of TCP contain four intertwined parts:
1. slow-start
2. congestion avoidance
3. fast re-transmit (this is just 3xdup-ack)
4. fast recovery (in some versions only, varies)

* Slow start
* Begin increasing cwnd, with an exponential increase in sending rate (slow start).
* Continue slow start until cwnd reaches the slow-start threshold (ssthresh).
* Congestion avoidance
* After cwnd increased to reach ssthresh, transition to congestion avoidance.
* In congestion avoidance, congestion window (cwnd) is dictated by an additive increase multiplicative decrease (AIMD) approach.
* Additive linear increase
* If all segments are received, and the acknowledgments reach the sender on time, then cwnd grows by MSS × (MSS / cwnd) on each new acknowledgment received, i.e., by about one MSS per round-trip time.
* Multiplicative decrease
* If a timeout occurs, we do a multiplicative decrease:
* Congestion window is reset to 1 MSS.
* ssthresh is set to half the congestion window size before the timeout.
* After a timeout, slow start is initiated again, until cwnd reaches the ssthresh.

  1. Slow start
  2. Congestion avoidance

* Linear growth of cwnd, until a loss, then an exponential reduction.
* Additive increase, multiplicative decrease (AIMD).
* Sawtooth is common behavior for a closed-loop control algorithm.

Types of congestion:

Mild congestion (dup-ack)
* TCP considers that the network is lightly congested if it receives three duplicate acknowledgements and performs a fast retransmit.
* If the fast retransmit is successful, this implies that only one segment has been lost.
* In this case, TCP performs multiplicative decrease and the congestion window is divided by 2.
* The slow-start threshold is set to the new value of the congestion window.

Severe congestion (timeout)
* TCP considers that the network is severely congested when its retransmission timer expires.
* In this case, TCP retransmits the first segment, sets the slow-start threshold to 50% of the congestion window.
* The congestion window is reset to its initial value, and TCP performs a slow-start.
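The sawtooth can be reproduced with a round-by-round sketch. The sketch makes these assumptions:
* cwnd is counted in MSS and updated once per RTT (real TCP updates it on every ACK).
* Slow start doubles cwnd but does not overshoot ssthresh.
* Three duplicate ACKs halve cwnd, as in the mild-congestion case.
* A timeout resets cwnd to 1 MSS, as in the severe case.

```python
def cwnd_trace(rounds: int, ssthresh: int,
               dupack_rounds=frozenset(), timeout_rounds=frozenset()) -> list[int]:
    """cwnd (in MSS) at the start of each RTT round, for a given schedule of loss events."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in timeout_rounds:              # severe congestion
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                         # back to slow start
        elif r in dupack_rounds:             # mild congestion
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                  # halve, then keep growing linearly
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return trace


print(cwnd_trace(12, ssthresh=8, dupack_rounds={8}))   # → [1, 2, 4, 8, 9, 10, 11, 12, 13, 6, 7, 8]
print(cwnd_trace(12, ssthresh=8, timeout_rounds={8}))  # → [1, 2, 4, 8, 9, 10, 11, 12, 13, 1, 2, 4]
```

The first trace shows the additive-increase, multiplicative-decrease sawtooth. The second shows how much harder a timeout cuts the sending rate.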

  1. Fast re-transmit (this is just the dup-ack above).
  2. Fast recovery (“New Reno” version)

Summary

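Such behavior could be expressed in Python-like code (a sketch: sizes are in bytes, swin is the initial window limit, and retransmit() only logs):

class RenoSender:
    def __init__(self, mss, swin):
        # Initialisation
        self.mss = mss
        self.cwnd = mss
        self.ssthresh = swin
        self.snd_una = 0          # oldest unacknowledged sequence number
        self.dupacks = 0

    def on_ack(self, ack):
        if ack > self.snd_una:    # new ack, no congestion
            self.snd_una = ack
            self.dupacks = 0
            if self.cwnd < self.ssthresh:
                # slow-start: increase cwnd quickly
                # (+1 MSS per ack doubles cwnd every rtt)
                self.cwnd += self.mss
            else:
                # congestion avoidance: increase cwnd slowly
                # (about +1 MSS every rtt)
                self.cwnd += self.mss * self.mss // self.cwnd
        elif ack == self.snd_una:  # duplicate acknowledgement
            self.dupacks += 1
            if self.dupacks == 3:  # fast retransmit
                self.retransmit(self.snd_una)
                self.ssthresh = max(self.cwnd // 2, 2 * self.mss)
                self.cwnd = self.ssthresh
        # else: ack for old segment, ignored

    def on_timeout(self):
        # Expiration of the retransmission timer
        self.retransmit(self.snd_una)   # retransmit first lost segment
        self.ssthresh = max(self.cwnd // 2, 2 * self.mss)
        self.cwnd = self.mss

    def retransmit(self, seq):
        print("retransmit segment starting at", seq)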

This is a summary of many of the above mechanisms illustrated in FSM form:
Note: pseudocode and FSM are not the same implementation.

+++++++++++++++++++++ Cahoot-03-8

Maximum segment size

This is an oddball field in TCP,
which is related to layers below (data-link).
* The maximum segment size (MSS) is the largest amount of data, specified in bytes, that a TCP entity is willing to receive in a single segment.
* Typically the MSS is announced by each side.
* MSS announcement is also often called “MSS negotiation”, though that’s a misnomer.
* Two completely independent values of MSS are permitted for the two directions of data flow in a TCP connection.

Why define a max size at the TCP layer?

IP constraints (one layer down from transport)
* IP fragmentation (more later) can lead to packet loss and excessive re-transmissions.
* Thus, to optimize performance, each TCP entity’s MSS should be set small enough to avoid fragmentation.
* When the TCP connection is established, the MSS is announced in the MSS option of the SYN segments.

Why is there IP fragmentation?

Data-link constraints (two layers down from transport)
* Each entity is connected to a data-link layer (the next one down).
* There are many heterogeneous data-link networks between them.
* Each data link layer defines a maximum transmission unit (MTU), above which, it will not send.

* The initial MSS value is derived from the maximum transmission unit (MTU) size of the data-link layer network each TCP entity is attached to.

After initialization
Dynamically adjust the MSS.
TCP senders can use path MTU discovery to infer the minimum data-link layer MTU along the network path between the sender and receiver.
This avoids IP fragmentation within the network.
Less fragmentation reduces loss.
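The MSS follows from the MTU by subtracting the minimum IP and TCP header sizes: 20 bytes each for IPv4 and TCP, and 40 bytes for an IPv6 header. Over a whole path, the smallest link MTU is what matters. Real path MTU discovery learns that minimum from ICMP “fragmentation needed” or “packet too big” errors. This sketch simply assumes the link MTUs are known.

```python
IPV4_HEADER, IPV6_HEADER, TCP_HEADER = 20, 40, 20   # minimum sizes in bytes, no options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one data-link frame of size `mtu`."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER


def path_mss(link_mtus: list[int], ipv6: bool = False) -> int:
    """MSS that avoids fragmentation on every hop: use the smallest MTU on the path."""
    return mss_for_mtu(min(link_mtus), ipv6)


print(mss_for_mtu(1500))                 # → 1460 (Ethernet, IPv4)
print(mss_for_mtu(1500, ipv6=True))      # → 1440
print(path_mss([1500, 1492, 1500]))      # → 1452 (a 1492-byte hop, e.g. PPPoE, limits the path)
```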


1.10 Security at the transport layer

1.10.1 Port scans

Recall nmap?
Can you hide which ports are open?
Make a secret code to reveal them?

1.10.2 UDP Vulnerabilities

UDP packets can easily be spoofed for amplification, redirection, or flooding attacks.
It is easy to send UDP packets with spoofed source IP address.
The attacker sends a small message to a server, with spoofed source address,
and the server then responds to the spoofed address with a much larger response message.
This creates a larger volume of traffic to the victim,
than the attacker would be able to generate directly.

One approach is for the server to limit the size of its response,
ideally to the size of the client’s request,
until it has been able to verify that the client actually receives the packets
sent to its claimed IP address. QUIC uses this approach: until the client’s address is validated, a server may send at most three times the number of bytes it has received.
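The anti-amplification rule can be sketched as a byte budget per unvalidated client address. RFC 9000 sets the factor at 3. The stricter “same size as the request” rule corresponds to factor=1. The class and method names are hypothetical.

```python
class AmplificationLimiter:
    """Caps what a server sends to an unvalidated address at factor x bytes received."""

    def __init__(self, factor: int = 3):
        self.factor = factor
        self.received = 0
        self.sent = 0
        self.validated = False   # set True once the client proves it receives our packets

    def on_receive(self, nbytes: int) -> None:
        self.received += nbytes

    def try_send(self, nbytes: int) -> bool:
        """Send only if the address is validated or the budget allows it."""
        if not self.validated and self.sent + nbytes > self.factor * self.received:
            return False         # hold the data until more arrives or validation completes
        self.sent += nbytes
        return True


limiter = AmplificationLimiter()
limiter.on_receive(1200)              # one small request from a (possibly spoofed) address
print(limiter.try_send(3000))         # → True: within 3 x 1200 = 3600 bytes
print(limiter.try_send(1000))         # → False: would exceed the budget
limiter.validated = True
print(limiter.try_send(1000))         # → True: no limit once the address is validated
```

With this cap, a spoofed request can never make the server send the victim more than three times the attacker’s own traffic.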

1.10.3 TCP Vulnerabilities

http://seclab.cs.sunysb.edu/sekar/papers/netattacks.pdf

DoS via SYN flood attack

During three-way handshake,
in response to a received SYN,
the server allocates and initializes connection variables and buffers.
The server then sends a SYNACK in response,
and awaits an ACK segment from the client.
If the client does not send an ACK,
to complete the third step of this 3-way handshake,
eventually, often after a minute or more,
the server will terminate the half-open connection,
and reclaim the allocated resources.
Attacker(s) send a large number of TCP SYN segments,
without completing the third handshake step.
With this deluge of SYN segments,
the server’s connection resources become exhausted,
as they are allocated (but never used!) for half-open connections.
Legitimate clients are then denied service.
PUSH and ACK floods are used in other variants of flood attacks.
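A toy model shows why a SYN flood needs surprisingly little bandwidth. Suppose the backlog has C half-open slots and each stays reserved for T seconds. Then only about C/T spoofed SYNs per second keep it full. The capacity of 128 and the 60-second timeout below are illustrative assumptions, not any particular OS’s defaults.

```python
from collections import deque


class HalfOpenBacklog:
    """Toy SYN backlog: each SYN holds a slot until its handshake times out."""

    def __init__(self, capacity: int, timeout: float):
        self.capacity = capacity
        self.timeout = timeout
        self.expiry = deque()     # expiry times, oldest first (the timeout is constant)

    def on_syn(self, now: float) -> bool:
        """Return True if the SYN gets a slot (and a SYN+ACK), False if it is dropped."""
        while self.expiry and self.expiry[0] <= now:
            self.expiry.popleft()         # reclaim half-open entries that timed out
        if len(self.expiry) >= self.capacity:
            return False
        self.expiry.append(now + self.timeout)
        return True


backlog = HalfOpenBacklog(capacity=128, timeout=60.0)
# Attacker: 3 spoofed SYNs per second for a minute, never completing the handshake
accepted = sum(backlog.on_syn(i / 3) for i in range(180))
print(accepted)                      # → 128: the table is full after about 42 seconds
print(backlog.on_syn(59.5))          # → False: a legitimate client is refused
```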

Some servers attempt to detect syn-floods and fake connections.

Sockstress

Sockstress is an attack method similar to the floods above.
With a syn flood, the server may time out each attempted connection.
To avoid this protective defense,
Sockstress provides more convincing evidence of a real connection.
Sockstress is a user-land TCP socket stress framework,
that can complete arbitrary numbers of open sockets,
without incurring the typical overhead of tracking state.
Once the socket is established,
it is capable of sending TCP attacks,
that target specific types of kernel and system resources such as:
Counters, Timers, and Memory Pools.

Example sub-variant:
One way to convince the server that each connection is real, and so prolong its timeouts,
is to complete a fake connection and then advertise a receive window of 0.

Connection hijacking: TCP sequence prediction attack

Imagine an attacker who is on the same LAN, but is not the gateway router.
Assuming the attacker does not control the full infrastructure (as an ISP would),
and thus can not be a perfect MITM,
the attacker wants to send fake packets to a victim.
The attacker hopes to correctly guess the sequence number to be used by the sending host.
If they can do this, they may be able to send counterfeit packets to the receiving host.
These packets will seem to originate from the real sending host.
The attacker may issue packets using the same source IP address as a host (spoofed).
By monitoring the traffic before an attack is mounted,
the malicious host can figure out the correct sequence number.
After the IP address and the correct sequence number are known,
it is a race between the attacker and the trusted host to get the correct packet sent.
The attacker may DoS its victim (the trusted host) to win this race.
Once the attacker has control over the connection,
they are able to send counterfeit packets without getting a response.
If an attacker can cause delivery of counterfeit packets of this sort,
they may be able to cause various sorts of mischief, including:
the injection into an existing TCP connection, of data of the attacker’s choosing,
and the premature closure of an existing TCP connection,
by the injection of counterfeit packets with the RST bit set.
(TCP reset attack).
A TCP reset attack is easier if attacker is the ISP, why?
An attacker who is able to eavesdrop a TCP session,
and redirect packets, can hijack a TCP connection.
To do so, the attacker learns the sequence number from the ongoing communication,
and forges a false segment that looks like the next segment in the stream.
Such a simple hijack can result in one packet being erroneously accepted at one end.
When the receiving host acknowledges the extra segment,
to the other side of the connection,
synchronization is lost.
Hijacking might be combined with Address Resolution Protocol (ARP) spoofing,
or routing attacks that allow taking control of the packet flow,
so as to get permanent control of the hijacked TCP connection.
These are lower level, which we have not covered yet.
Partial fix:
Initial sequence number is now chosen at random.
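RFC 6528 describes one way to pick initial sequence numbers that are hard to guess but still advance with time: ISN = M + F(local IP, local port, remote IP, remote port, secret key). M is a timer ticking about every 4 microseconds (the 32-bit counter described for connection establishment), and F is a keyed pseudorandom function. This sketch swaps in HMAC-SHA-256 for F. The secret key and function name are our own.

```python
import hashlib
import hmac
import os
import time
from typing import Optional

SECRET_KEY = os.urandom(16)     # per-boot secret, never sent on the wire


def initial_sequence_number(local_ip: str, local_port: int,
                            remote_ip: str, remote_port: int,
                            now_us: Optional[int] = None) -> int:
    """ISN = M + F(4-tuple, secret) mod 2**32, in the style of RFC 6528."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000   # current time in microseconds
    m = now_us // 4                            # timer: +1 every 4 microseconds
    four_tuple = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hmac.new(SECRET_KEY, four_tuple, hashlib.sha256).digest()
    f = int.from_bytes(digest[:4], "big")      # 32-bit keyed offset for this 4-tuple
    return (m + f) % 2**32


print(initial_sequence_number("10.0.0.2", 51000, "10.0.0.7", 80))
```

An off-path attacker who does not know SECRET_KEY cannot predict F. Successive connections on the same 4-tuple still get increasing ISNs.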
ARP spoofing (lower layer) we’ll cover later,
as an extension to this attack.

TCP veto

An attacker who can eavesdrop,
and predict the size of the next packet to be sent,
can cause the receiver to accept a malicious payload,
without disrupting the existing connection.
The attacker injects a malicious packet with the sequence number,
and a payload size of the next expected packet.
When the legitimate packet is ultimately received,
it is found to have the same sequence number and length as a packet already received,
and is silently dropped as a normal duplicate packet,
and the legitimate packet is “vetoed” by the malicious packet.
Unlike in connection hijacking,
the connection is never de-synchronized,
and communication continues as normal,
after the malicious payload is accepted.
TCP veto gives the attacker less control over the communication,
but makes the attack particularly resistant to detection.
https://ieeexplore.ieee.org/document/6497785

TCP reset attack

In most packets the reset RST bit is set to 0, and has no effect.
However, if the RST bit is set to 1,
then it indicates to the receiving computer that:
the computer should immediately stop using the TCP connection.
Thus, it should not send any more packets,
and discard any further packets it receives,
with headers indicating they belong to that connection.
A TCP reset kills a TCP connection instantly.
It is possible for a 3rd computer to monitor the TCP packets on the connection,
and then send a “forged” packet containing a TCP reset to one or both endpoints.
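How hard is a blind reset? Under the original RFC 793 rule, a RST is accepted if its sequence number falls anywhere in the receiver’s window. A spoofer who already knows both addresses and both ports therefore needs only one guess per window-sized slice of the 2^32 sequence space. RFC 5961 later tightened this to an exact match on the next expected sequence number. The function names below are our own.

```python
SEQ_MOD = 2 ** 32


def rst_in_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RFC 793-style check: RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND (mod 2**32)."""
    return (seg_seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def blind_rst_attempts(rcv_wnd: int) -> int:
    """Spoofed RSTs needed to cover every window position, one guess per window (ceiling)."""
    return -(-SEQ_MOD // rcv_wnd)


print(rst_in_window(1500, rcv_nxt=1000, rcv_wnd=65535))    # → True: anywhere in the window works
print(rst_in_window(70000, rcv_nxt=1000, rcv_wnd=65535))   # → False: outside the window
print(blind_rst_attempts(65535))                           # → 65538
```

Larger receive windows make blind guessing easier. An on-path observer, such as an ISP or a censor, needs no guessing at all.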

Who performs this attack?
* Those with physical access to the wires/waves.
* The Great Firewall of China and Iranian internet censorship mechanisms are known to use TCP reset attack to interfere with and block connections, as a method to carry out Internet censorship.
* Comcast did this to people too:
* In late 2007, Comcast began using forged TCP resets to cripple peer-to-peer and certain groupware applications on their customers’ computers.
* https://en.wikipedia.org/wiki/TCP_reset_attack#Comcast_Controversy

1.10.4 TLS


1.11 Other protocols

1.11.1 uTP

1.12 Raw socket programming

Random examples of the hard way:

1.12.1 Background


1.12.2 UDP raw sockets

Show how these behave in Wireshark:

Basics

Run the UDP server from the last lecture page before this.

A Packet sniffer for UDP (server)

We can spoof the source port using this header.
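The UDP header is just four 16-bit fields, so a program with raw-socket access can write any source port it likes. This sketch only builds the header bytes with Python’s struct module and does not send anything. Sending would need a raw socket and, typically, root privileges. A checksum of 0 means “no checksum” for UDP over IPv4.

```python
import struct


def udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack the 8-byte UDP header: source port, destination port, length, checksum."""
    length = 8 + len(payload)                # header plus data, in bytes
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)   # network byte order


header = udp_header(src_port=53, dst_port=40000, payload=b"hello")   # claims to come from port 53
print(header.hex())                          # → 00359c40000d0000
```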

1.12.3 TCP raw sockets

Can we fake a source IP address using just the TCP or UDP transport layer header?
Does the IP address go into the TCP header anywhere?
Do we have to fake any more headers to spoof a source IP address?
What is the difference between faking a DNS reply (as we did in UDP),
and faking a source IP address?

++++++++++++ Cahoot-03-9

Basics

++++++++++++ Cahoot-03-10

SYN flood

1.13 Spoofing network packets

In python (the easy way):

Scapy is really a nice library!
We’ll do an assignment with this upcoming.

Next: 04-NetworkData.html