1 02-Application



1.1 Audio-recording

1.2 Opening thought

“Web users ultimately want to get at data quickly and easily.
They don’t care as much about attractive sites and pretty design.”

- Tim Berners-Lee

Is this assumption currently true?
Was it ever true?
Is it true for some people at least?

1.3 Reading

1.4 Overview

1.5 Network applications

End system communication at application layer
02-Application/app_layer00.png

Goal: write programs that:
* run on (different) end systems
* communicate over network
* e.g., web server software communicates with browser software

No need (or desire) to write software for network-core devices:
* network-core devices do not (and SHOULD not) run user applications!
* applications on end systems allow for rapid app development, propagation, and innovation
* this is changing somewhat, which could impede the development of new protocols

Client-server versus peer-to-peer (P2P)
02-Application/app_layer01.png

1.5.1 Client-server architecture

02-Application/client-serrver.png

1.5.1.1 Server

1.5.1.2 Client

1.5.2 P2P architecture

++++++++++++++ Cahoot-02-1

Discussion question:
* If p2p works well, why has it not become the norm?
* What are some other pros/cons of p2p architectures?

1.5.3 Process communication via sockets

This is a preview of the next layer down, the transport layer
02-Application/app_layer02.png

1.5.3.1 Processes

client process: process that initiates communication
server process: process that waits to be contacted

aside: applications with P2P architectures have both client processes and server processes

1.5.3.2 Sockets

1.5.3.3 Addresses

Q: does the IP address of a host, on which a process runs, suffice for identifying the process?
A: no, many processes can be running on one host

For example, to send HTTP message to gaia.cs.umass.edu web server:
IP address: 128.119.245.12
port number: 80

Hosts are often identified by IP address
* In IPv4, an IP address is a 32-bit quantity uniquely identifying the host.
* In IPv6, addresses are 128 bits.
* We have run out of unallocated IPv4 addresses…
* The sending process must also identify the receiving process (more specifically, the receiving socket) running in the host.
* This information is needed because in general a host could be running many network applications.
* A destination port number serves this purpose.
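Here is a minimal Python sketch showing that one host (one IP address) can hold several sockets, each told apart by its port number. It uses the loopback address 127.0.0.1 as a stand-in for the host. Binding to port 0 asks the operating system for any free port, so the actual port numbers will vary from run to run.

```python
import socket

# Two server sockets on the same host (same IP address: the loopback interface)
sock_a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock_b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Port 0 means "let the operating system pick any free port"
sock_a.bind(("127.0.0.1", 0))
sock_b.bind(("127.0.0.1", 0))

addr_a = sock_a.getsockname()   # (IP address, port number)
addr_b = sock_b.getsockname()
print(addr_a, addr_b)

# Same IP address, different ports: the port identifies which socket gets the data
sock_a.close()
sock_b.close()
```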

++++++++++++++ Cahoot-02-2

1.5.4 Preview of transport services

data integrity
* some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
* other apps (e.g., audio) can tolerate some loss

timing
* some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”
* some apps (e.g., file download) tolerate delay

throughput
* some apps (e.g., multimedia) require minimum amount of throughput to be “effective”
* other apps (“elastic apps”) make use of whatever throughput they get

security
* encryption, data integrity, …

1.5.4.1 Multiple types of service:

Unreliable datagram (UDP)
02-Application/service.png

Byte-stream (TCP)
02-Application/stream.png

Service requirements for applications?
02-Application/app_layer03.png

Applications usually choose between TCP and UDP
* TCP and UDP are transport-layer protocols
* They are employed by most application-layer programs
* Other transport-layer, or pseudo-transport-layer, protocols exist:
* SCTP (stream control transmission protocol), SSU (I2P app), DCCP, RUDP, UDP-lite, etc.
* An application designer could design their own transport-layer protocol, since the transport layer and everything above it run on end hosts, as opposed to network infrastructure.
* It could be built into the kernel of end-host operating systems, and exposed to languages as a new socket type
* It could also just be designed into the application layer, rather than getting a new transport protocol built into the kernel of an OS.
* UDP lets you build new things!

1.5.4.1.1 TCP

Connection-oriented service
* TCP has the client and server exchange transport-layer control information with each other, before the application-level messages begin to flow.
* This so-called handshaking procedure alerts the client and server, allowing them to prepare for an onslaught of packets.
* After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes.
* The connection is a full-duplex connection, in that the two processes can both send messages to each other over the connection at the same time, bi-directionally
* When the application finishes sending messages, it must tear down the connection.

TCP has a Reliable data transfer service
* The communicating processes can rely on TCP to deliver all data sent without error and in the proper order.

TCP also includes a congestion-control mechanism
* The TCP congestion-control mechanism throttles a sending process (client or server) when the network is congested between sender and receiver.

Summary
* reliable transport: between sending and receiving process
* flow control: sender won’t overwhelm receiver
* congestion control: throttle sender when network overloaded
* does not provide: timing, minimum throughput guarantee, security
* connection-oriented: setup required between client and server processes
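The TCP properties above can be seen in a short local experiment. This minimal Python 3.8+ sketch runs both ends on the loopback interface (an assumption for the demo): `connect()` and `accept()` complete only after the handshake, the reply flows back over the same full-duplex connection, and the two separate `sendall()` calls arrive as one in-order byte stream with no message boundaries.

```python
import socket
import threading

# Server side: listen on an OS-chosen port on the loopback interface
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def serve_one():
    # accept() returns a new socket once a client has completed the handshake
    conn, _ = server.accept()
    with conn:
        data = b""
        while chunk := conn.recv(1024):   # read until the client closes its side
            data += chunk
        conn.sendall(data.upper())        # reply over the same connection

t = threading.Thread(target=serve_one)
t.start()

# Client side: the handshake happens before any application data flows
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello, ")
    client.sendall(b"tcp")                # two sends, but one continuous byte stream
    client.shutdown(socket.SHUT_WR)       # tell the server we are done sending
    reply = b""
    while chunk := client.recv(1024):
        reply += chunk

t.join()
server.close()
print(reply)  # b'HELLO, TCP'
```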

1.5.4.1.2 UDP

UDP service:
* unreliable data transfer between sending and receiving process
* does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup
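For contrast with TCP, here is a minimal UDP sketch, again using the loopback interface as an assumed stand-in for two hosts. There is no `listen()`, `accept()`, or `connect()`. Each `sendto()` is an independent datagram, and each `recvfrom()` returns exactly one datagram, so message boundaries are preserved. On a real network either datagram could be lost or reordered, and UDP itself would do nothing about it.

```python
import socket

# Receiver: bound to an OS-chosen port; no connection setup at all
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
addr = receiver.getsockname()

# Sender: each sendto() is addressed individually and sent immediately
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"first", addr)
sender.sendto(b"second", addr)

# One recvfrom() per datagram; the timeout avoids blocking forever on a lost one
receiver.settimeout(2.0)
msgs = [receiver.recvfrom(2048)[0] for _ in range(2)]

sender.close()
receiver.close()
print(msgs)
```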

Discussion questions:
* Why bother with UDP?
* Given TCP, why does UDP exist?

++++++++++++++ Cahoot-02-3

1.5.4.2 Encryption

https://en.wikipedia.org/wiki/Transport_Layer_Security

Neither base TCP nor UDP provides any encryption!

Transport Layer Security (TLS), an enhancement for TCP, provides:
1. encryption,
2. data integrity, and
3. end-point authentication.

The great thing about a TLS joke,
is that you can tell if it’s not the original…

Transport layer protocols used
02-Application/app_layer04.png

Tunneling:
Inner-most -> Outer-more… -> Outer-most
Application -> TLS -> TCP -> IP -> MAC -> Ethernet -> Physical
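The nesting above maps directly onto Python's standard `socket` and `ssl` modules. In this minimal sketch, the TCP socket is created first, TLS wraps it, and the application writes plain HTTP text that TLS encrypts before TCP ever sees it. The function name `https_status_line` is hypothetical. The live call is commented out because it needs network access, so only the default TLS settings are exercised here.

```python
import socket
import ssl

def https_status_line(host: str, port: int = 443) -> str:
    """Application -> TLS -> TCP: send a GET over TLS and return the status line."""
    context = ssl.create_default_context()  # verifies the certificate and hostname
    # TCP layer: an ordinary stream socket (the TCP handshake happens here)
    with socket.create_connection((host, port), timeout=5) as tcp_sock:
        # TLS layer: wraps the TCP socket (the TLS handshake happens here)
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Application layer: plain HTTP text, encrypted by TLS before TCP sees it
            tls_sock.sendall(f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
            data = b""
            while b"\r\n" not in data:      # read just enough for the status line
                chunk = tls_sock.recv(4096)
                if not chunk:
                    break
                data += chunk
    return data.split(b"\r\n", 1)[0].decode("ascii", "replace")

# The default context refuses servers it cannot authenticate
ctx = ssl.create_default_context()
print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)  # True True
# print(https_status_line("hackware.ru"))  # needs network access
```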

More detail here:
* 05-Security.html
* ../../Security/Content/12a-AppliedCryptoSystems.html

1.5.5 Application-layer protocols

An application-layer protocol defines how an application’s processes, running on different end systems, pass messages to each other, including:
* The types of messages exchanged, for example, request messages and response messages
* The syntax of the various message types, such as the fields in the message and how the fields are delineated
* The semantics of the fields
* meaning of the information in the fields
* Rules for determining when and how a process sends messages, responds to messages, and changes state.

open protocols:
defined in RFCs
allows for interoperability
e.g., HTTP, SMTP
proprietary protocols:
e.g., Skype (used to be open, fun story)

1.6 Web and HTTP

1.6.1 Web protocol example

Observe HTTP with Wireshark:

$ nc -C info.cern.ch 80
GET / HTTP/1.1
Host: info.cern.ch

$ ncat -C hackware.ru 80
GET / HTTP/1.0
Host: hackware.ru

Trace HTTP conversation in Wireshark
Observe that each packet has headers from multiple layers

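These must be typed exactly, or they will not work! End each request by pressing Enter on an empty line; that blank line marks the end of the HTTP headers.
The ncat -C option sends CRLF line endings: $ man ncat to read more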

Encrypted option:

$ ncat -C --ssl hackware.ru 443
GET / HTTP/1.0
Host: hackware.ru
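What `nc -C` sends can be reproduced byte for byte in Python. In this minimal sketch, every line ends in CRLF and an empty line ends the headers. The `Connection: close` header is an addition not in the manual example, included so that the server hangs up after replying and the read loop can stop at end-of-stream. The helper names `build_request` and `fetch` are hypothetical, and the network call is commented out.

```python
import socket

def build_request(host: str, path: str = "/", version: str = "HTTP/1.1") -> bytes:
    """Build the bytes that typing this request into `nc -C` would send."""
    lines = [
        f"GET {path} {version}",
        f"Host: {host}",
        "Connection: close",   # assumption: ask the server to close after replying
        "",                    # the empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")   # -C: every line ends in CRLF

def fetch(host: str, port: int = 80) -> bytes:
    """Send the request over plain TCP and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request(host))
        response = b""
        while chunk := sock.recv(4096):
            response += chunk
    return response

req = build_request("info.cern.ch")
print(req)
# b'GET / HTTP/1.1\r\nHost: info.cern.ch\r\nConnection: close\r\n\r\n'
# print(fetch("info.cern.ch").decode("latin-1"))  # needs network access
```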

1.6.2 Web server and clients

02-Application/app_layer05.png
02-Application/httpreqresp.png

Web pages
* A Web page (also called a document) consists of objects.
* An object is simply a file such as an HTML file, a JPEG image, a Java applet (lol…), or a video clip that is addressable by a single URL.
* If a Web page contains HTML text and five JPEG images, then the Web page has six objects: the base HTML file plus the five images.
* The base HTML file references the other objects in the page with the objects’ URLs.
* Each URL has two components:
* the hostname of the server that houses the object and
* the object’s path name.
* For example, the URL
http://www.someSchool.edu/someDepartment/picture.gif
has www.someSchool.edu for a hostname and
/someDepartment/picture.gif for a path name.
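Python's standard `urllib.parse.urlsplit` performs the same hostname/path split. One detail to watch: `netloc` keeps the hostname exactly as written, while the `hostname` attribute is lowercased, since hostnames are case-insensitive.

```python
from urllib.parse import urlsplit

url = "http://www.someSchool.edu/someDepartment/picture.gif"
parts = urlsplit(url)

print(parts.netloc)    # www.someSchool.edu            (the hostname, as written)
print(parts.path)      # /someDepartment/picture.gif   (the path name)
print(parts.hostname)  # www.someschool.edu            (lowercased)
```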

02-Application/webpage.png
http://www.w3.org/MarkUp/ defines the standard

HTTP/1.x and HTTP/2 use TCP (not UDP); HTTP/3 instead runs over QUIC, which uses UDP
* The HTTP client first initiates a TCP connection with the server.
* Once the connection is established, the browser and the server processes access TCP through their socket interfaces.
* The server sends requested files to clients without storing any state information about them, so HTTP is a stateless protocol.

1.6.3 Persistence

HTTP can choose either:
* Each request/response pair sent over a separate TCP connection (non-persistent connections), or
* All of the requests and their corresponding responses sent over the same TCP connection (persistent connections)

HTTP sequence
Suppose a page consists of a base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server: http://www.someSchool.edu/someDepartment/home.index
1. HTTP client process initiates a TCP connection to the server www.someSchool.edu on port number 80, which is the default port number for HTTP. Associated with the TCP connection, there will be a socket at the client and a socket at the server.
2. HTTP client sends an HTTP request message to the server via its socket. The request message includes the path name /someDepartment/home.index.
3. HTTP server process receives the request message via its socket, retrieves the object /someDepartment/home.index from its storage (RAM or disk), encapsulates the object in an HTTP response message, and sends the response message to the client via its socket.
4. HTTP server process tells TCP to close the TCP connection. (But TCP doesn’t actually terminate the connection until it knows for sure that the client has received the response message intact.)
5. HTTP client receives the response message. The TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file from the response message, examines the HTML file, and finds references to the 10 JPEG objects.
6. The first four steps are then repeated for each of the referenced JPEG objects.

Time to fill a request
02-Application/app_layer06.png

non-persistent HTTP
* at most one object sent over TCP connection
* connection then closed
* downloading multiple objects requires multiple connections

persistent HTTP
* multiple objects can be sent over single TCP connection between client, server

Disadvantages of non-persistent connections
* First, a brand-new connection must be established and maintained for each requested object.
* For each of these connections, TCP buffers must be allocated and TCP variables must be kept in both the client and server.
* Each object suffers a delivery delay of two RTTs: one RTT to establish the TCP connection and one RTT to request and receive an object.

Persistent connections
* With persistent connections, the server leaves the TCP connection open after sending a response.
* Subsequent requests and responses between the same client and server can be sent over the same connection.
* Multiple Web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection.
* Requests for objects can be made back-to-back, without waiting for replies to pending requests (pipelining).
* Typically, the HTTP server closes a connection when it isn’t used for a certain time (a configurable timeout interval).
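The difference between the two modes can be counted directly. This minimal sketch starts a throwaway HTTP/1.1 server from Python's standard `http.server` on the loopback interface, then fetches the 11-object page from the example above twice: once with a new connection per object (`Connection: close`) and once over a single persistent connection. The handler, the `fetch_all` helper, and the object paths are hypothetical stand-ins. The server counts how many TCP connections it accepted.

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets the server keep connections open
    connections = 0                 # TCP connections accepted so far

    def setup(self):
        super().setup()
        Handler.connections += 1    # setup() runs once per accepted TCP connection

    def do_GET(self):
        body = f"object {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

def fetch_all(port, paths, persistent):
    """Fetch every path; return the bodies and how many TCP connections were used."""
    Handler.connections = 0
    bodies = []
    shared = http.client.HTTPConnection("127.0.0.1", port)
    for path in paths:
        if persistent:
            shared.request("GET", path)             # reuses the open connection
            bodies.append(shared.getresponse().read())
        else:
            conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", path, headers={"Connection": "close"})
            bodies.append(conn.getresponse().read())
            conn.close()                            # one object per connection
    shared.close()
    return bodies, Handler.connections

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

paths = ["/home.index"] + [f"/img{i}.jpg" for i in range(1, 11)]  # 11 objects
bodies_np, conns_np = fetch_all(port, paths, persistent=False)
bodies_p, conns_p = fetch_all(port, paths, persistent=True)
print(conns_np, conns_p)   # 11 1
server.shutdown()
server.server_close()
```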

persistent HTTP:
* server leaves connection open after sending response
* subsequent HTTP messages between same client/server sent over open connection
* client sends requests as soon as it encounters a referenced object
* as little as one RTT for all the referenced objects
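The RTT counts above can be turned into rough arithmetic for the 11-object page (base HTML file plus 10 JPEGs). This sketch assumes that non-persistent connections are opened one after another (no parallel connections), and it ignores transmission and server processing time. The function name `page_load_rtts` is hypothetical.

```python
def page_load_rtts(n_objects: int, mode: str) -> int:
    """Rough page-load time in RTTs, ignoring transmission time.

    n_objects counts the base HTML file plus every referenced object.
    """
    referenced = n_objects - 1
    if mode == "non-persistent":
        # every object: 1 RTT for the TCP handshake + 1 RTT for request/response
        return 2 * n_objects
    if mode == "persistent":
        # one handshake, then one request/response RTT per object, one at a time
        return 1 + n_objects
    if mode == "pipelined":
        # handshake + base file, then all referenced objects requested back-to-back
        return 2 + (1 if referenced > 0 else 0)
    raise ValueError(f"unknown mode: {mode}")

for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, page_load_rtts(11, mode))
# non-persistent 22
# persistent 12
# pipelined 3
```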

++++++++++++++ Cahoot-02-4

1.6.4 Message format

two types of HTTP messages:
1. request,
2. response

HTTP request message:
ASCII (human-readable format)

1.6.4.1 HTTP request message

GET /somedir/page.html HTTP/1.1
Host: www.mst.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: en

General request
02-Application/app_layer07.png
sp=space; cr=carriage return; lf=line feed

Example:

GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n
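Because the request is plain ASCII delimited by CRLF, a rough parser takes only a few lines. This is a minimal sketch for well-formed HTTP/1.x request heads. It does not handle obsolete folded headers or repeated header names, and the shortened request is adapted from the example above.

```python
raw = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www-net.cs.umass.edu\r\n"
    "Accept-Language: en-us,en;q=0.5\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)

def parse_request(text):
    """Split an HTTP/1.x request into request-line fields, headers, and body."""
    head, _, body = text.partition("\r\n\r\n")      # blank line ends the headers
    request_line, *header_lines = head.split("\r\n")
    method, url, version = request_line.split(" ")  # fields separated by sp
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")        # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, url, version, headers, body

method, url, version, headers, body = parse_request(raw)
print(method, url, version)   # GET /index.html HTTP/1.1
print(headers["host"])        # www-net.cs.umass.edu
```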

1.6.4.2 HTTP Response Message

HTTP/1.1 200 OK
Connection: close
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data ...
The entity body is the
meat of the message,
it contains the requested object itself)

General reply
02-Application/app_layer08.png
sp=space; cr=carriage return; lf=line feed

Example:

HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data data data data data ...
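On the client side, the status line and the `Content-Length` header are what tell the client how to read the reply. This minimal sketch parses a hypothetical short response (the 13-byte body is made up for the example). It splits the status line into version, code, and phrase, then uses `Content-Length` to know where the entity body ends.

```python
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
)

def parse_response(data):
    """Return (version, status code, phrase, headers, body) for an HTTP/1.x reply."""
    head, _, rest = data.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)   # phrase may contain spaces
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", len(rest)))
    return version, int(code), phrase, headers, rest[:length]  # body is raw bytes

version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase)   # HTTP/1.1 200 OK
print(body)                    # b'<p>hello</p>\n'
```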

Server responses
The status code appears in the first line of the server-to-client response message.

The best thing about 404 jokes is …
wait, damnit, it’s around here somewhere…

418: I’m a teapot
* A “real” joke built into the protocol
* https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418
* https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol
* https://save418.com/

Old example: Open TCP connection, send GET request

telnet cis.poly.edu 80
GET /~ross/ HTTP/1.1
Host: cis.poly.edu

E.g.,
02-Application/Http_request_telnet.png

Note: ncat has generally replaced telnet, though they both still work

nc -C info.cern.ch 80
GET / HTTP/1.1
Host: info.cern.ch

1.6.4.3 Uploading form input

POST method:
* web page often includes form input
* input is uploaded to the server in the entity body (i.e., the message body, not the request line)
URL method:
* uses GET method
* input is uploaded in URL field of request line:
www.somesite.com/animalsearch?monkeys&banana
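The same form input can travel either way. This minimal sketch uses Python's `urllib.parse` and assumes `key=value` pairs, which is the encoding HTML forms normally use. The field names `animal` and `food` are hypothetical, chosen to mirror the example URL above. The input goes once into the GET request line and once into a POST entity body, and the server can recover the same fields from both.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

form = {"animal": "monkeys", "food": "banana"}   # hypothetical form fields
query = urlencode(form)                          # 'animal=monkeys&food=banana'

# URL method: the input is part of the request line of a GET
get_line = f"GET /animalsearch?{query} HTTP/1.1"

# POST method: the same input is carried in the entity body
post_message = (
    "POST /animalsearch HTTP/1.1\r\n"
    "Host: www.somesite.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(query)}\r\n"
    "\r\n"
    + query
)

# Server side: recover the fields from either message
from_get = parse_qs(urlsplit(get_line.split(" ")[1]).query)
from_post = parse_qs(post_message.split("\r\n\r\n", 1)[1])
print(from_get)   # {'animal': ['monkeys'], 'food': ['banana']}
```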

We’ll demo this later!

1.6.5 Method types

Demonstrate:
* More wireshark HTTP examples in detail
* http://info.cern.ch/
* via nc
* via a browser that does not generate junk traffic:
* epiphany or surf or qutebrowser
* Record it in Wireshark
* Identify HTTP headers, match them to fields

1.6.6 Cookies

Many Web sites use cookies.
Four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site

Example:
* Susan always accesses the Internet from her PC
* visits a specific e-commerce site for the first time
* when the initial HTTP request arrives at the site, the site creates:
* unique ID
* entry in backend database for ID
02-Application/app_layer09.png
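The four cookie components can be sketched with Python's standard `http.cookies.SimpleCookie`. In this minimal sketch, a dictionary stands in for the site's back-end database, and the cookie name `id` and the helper functions are hypothetical. The server issues a unique ID in a `Set-Cookie` header. The browser would store it and send it back in a `Cookie` header, and the server then uses it to find Susan's state.

```python
import uuid
from http.cookies import SimpleCookie

backend_db = {}   # stands in for the web site's back-end database

def handle_first_visit():
    """Server: create a unique ID, a database entry, and a Set-Cookie header line."""
    user_id = uuid.uuid4().hex
    backend_db[user_id] = {"cart": []}
    cookie = SimpleCookie()
    cookie["id"] = user_id
    return cookie.output(header="Set-Cookie:")   # e.g. 'Set-Cookie: id=3f2a...'

def handle_later_visit(cookie_header_value, item):
    """Server: use the ID from the client's Cookie header to find its state."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    state = backend_db[cookie["id"].value]
    state["cart"].append(item)
    return state

set_cookie = handle_first_visit()
# Browser: keeps 'id=...' in its cookie file and sends it back as 'Cookie: id=...'
cookie_value = set_cookie.split(": ", 1)[1]
print(handle_later_visit(cookie_value, "book"))   # {'cart': ['book']}
```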

what cookies can be used for:
* authorization
* shopping carts
* recommendations
* user session state (Web e-mail)
* tracking

how to keep “state”:
* protocol endpoints: maintain state at sender/receiver over multiple transactions
* cookies: http messages carry state

cookies and privacy:
* cookies permit sites to learn a lot about you
* you may supply name and e-mail to sites

1.6.7 Caching

Proxy server can cache
goal: satisfy client request without involving origin server

Bottleneck
02-Application/app_layer11.png

Caching helps bottleneck
02-Application/app_layer12.png
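The idea behind a proxy cache fits in a toy class. In this minimal sketch, `fetch_from_origin` is a hypothetical callable standing in for a real HTTP request to the origin server, and the fixed `max_age` freshness limit is an assumption. Real caches decide freshness from response headers and revalidate with conditional GETs such as If-Modified-Since. Repeated requests are served locally, so only the first one crosses the bottleneck link.

```python
import time

class ProxyCache:
    """Toy web cache: answer repeat requests without contacting the origin server."""

    def __init__(self, fetch_from_origin, max_age=60.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin   # hypothetical: url -> bytes
        self.max_age = max_age                       # assumed freshness limit (s)
        self.clock = clock
        self.store = {}                              # url -> (time stored, object)

    def get(self, url):
        entry = self.store.get(url)
        if entry is not None and self.clock() - entry[0] < self.max_age:
            return entry[1]                          # hit: origin not involved
        body = self.fetch_from_origin(url)           # miss or stale: ask origin
        self.store[url] = (self.clock(), body)
        return body

origin_requests = []

def fake_origin(url):
    origin_requests.append(url)
    return f"contents of {url}".encode()

cache = ProxyCache(fake_origin)
for _ in range(3):
    page = cache.get("http://www.someSchool.edu/someDepartment/home.index")
print(len(origin_requests))   # 1: only the first request reached the origin
```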

Demos
* Briefly show (not link) $ Webserver.py
* Run it, show visiting in browser http://localhost:6789

1.7 QUIC

Some interesting reading:
* QUIC (Google’s web protocol on top of UDP)
* http://intronetworks.cs.luc.edu/current2/uhtml/udp.html#quic
* https://en.wikipedia.org/wiki/QUIC
* https://daniel.haxx.se/blog/2018/11/11/http-3/
* https://blog.cloudflare.com/the-road-to-quic/
* Both innovative, and breaks federated interoperability. Pros and Cons

I received this HTTP 200 joke.
It was OK…

++++++++++++++ Cahoot-02-5

1.7.1 Demo first Wireshark lab on HTTP myself in class

Simple file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file1.html
GET, OK, etc.

Refreshing a cached page
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file2.html
IF-MODIFIED-SINCE, refresh, re-sent?

Large file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file3.html
Initial HTTP GET, TCP segments, how many HTTP OK, when?

Multiple-parts
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file4.html
Notice the image retrieval.
What entity is responsible for requesting the multiple objects in a page, when?

“Secure” web-page with login
http://gaia.cs.umass.edu/wireshark-labs/protected_pages/HTTP-wireshark-file5.html
username: wireshark-students
password: network
auth field of request

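#!/usr/bin/python3

import base64

# Authorization header value captured in Wireshark (base64 of "username:password")
coded_string = "d2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms="
print(base64.b64decode(coded_string).decode())  # wireshark-students:network

#!/bin/bash

echo -n d2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms= | base64 -d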
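Going the other way shows what the browser puts in the auth field, assuming the page uses HTTP Basic authentication. The header value is just the scheme name `Basic` followed by base64 of `username:password`. Anyone who captures the packet can reverse it, because encoding is not encryption. The helper name `basic_auth_header` is hypothetical.

```python
import base64

def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value: base64("username:password")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

header = basic_auth_header("wireshark-students", "network")
print("Authorization:", header)
# Authorization: Basic d2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms=

# Anyone who sees the header can undo it
print(base64.b64decode(header.split()[1]).decode())   # wireshark-students:network
```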

Let’s think about http, privacy, and security in various scenarios:
https://www.eff.org/pages/tor-and-https

Before continuing, review TCP, UDP, and multi-threading below!

1.8 FTP

02-Application/app_layer13.png

FTP control and data connections
* FTP uses two parallel TCP connections to transfer a file, a control connection and a data connection.
* The control connection is used for sending control information between the two hosts, information such as user identification, password, commands to change remote directory, and commands to “put” and “get” files.
* The data connection is used to actually send a file.
* FTP is said to send its control information out-of-band.
* HTTP, by contrast, sends request and response header lines over the same TCP connection that carries the transferred file itself; this is called in-band.

02-Application/app_layer14.png

FTP sequence
* When a user starts an FTP session with a remote host, the client side of FTP (user) first initiates a control TCP connection with the server side (remote host) on server port number 21.
* client side of FTP sends the user identification and password over this control connection.
* client side of FTP also sends, over the control connection, commands to change the remote directory.
* When the server side receives a command for a file transfer over the control connection (either to, or from, the remote host), the server side initiates a TCP data connection to the client side.
* FTP sends exactly one file over the data connection and then closes the data connection.
* If, during the same session, the user wants to transfer another file, FTP opens another data connection.
* Control connection remains open throughout the duration of the user session, but a new data connection is created for each file transferred within a session (data connections are non-persistent).

FTP requests
* Commands, from client to server, and replies, from server to client, are sent across the control connection in 7-bit ASCII format.
* In order to delineate successive commands, a carriage return and line feed end each command.
* Each command consists of four uppercase ASCII characters, some with optional arguments:
* USER username: Used to send the user identification to the server.
* PASS password: Used to send the user password to the server.
* LIST: Used to ask the server to send back a list of all the files in the current remote directory. The list of files is sent over a (new and non-persistent) data connection rather than the control TCP connection.
* RETR filename: Used to retrieve (that is, get) a file from the current directory of the remote host. This command causes the remote host to initiate a data connection and to send the requested file over the data connection.
* STOR filename: Used to store (that is, put) a file into the current directory of the remote host.

FTP replies: Some typical replies, along with their possible messages, are as follows:
* 331 Username OK, password required
* 125 Data connection already open; transfer starting
* 425 Can’t open data connection
* 452 Error writing file
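The first digit of an FTP reply code tells the client what kind of reply it is (RFC 959), so a client can react without understanding the free-text message. This minimal sketch classifies the replies listed above. It ignores multi-line replies, where the code is followed by `-` instead of a space, and the helper name `classify_reply` is hypothetical.

```python
# Meaning of the first digit of an FTP reply code (RFC 959)
FIRST_DIGIT = {
    "1": "positive preliminary (action started, expect another reply)",
    "2": "positive completion",
    "3": "positive intermediate (more information needed)",
    "4": "transient negative (try again later)",
    "5": "permanent negative",
}

def classify_reply(line):
    """Split a single-line FTP reply such as '331 Username OK, password required'."""
    code, _, text = line.partition(" ")
    return int(code), FIRST_DIGIT[code[0]], text

for reply in ["331 Username OK, password required",
              "125 Data connection already open; transfer starting",
              "425 Can't open data connection",
              "452 Error writing file"]:
    print(classify_reply(reply))
```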

Demonstrate:
* [ ] Find an open ftp site, watch connection with wireshark
* With sftp, do we see any application layer protocol details with Wireshark?

1.9 E-mail

1.9.1 Email protocol example

Observe SMTP with Wireshark (does any of this show up in Wireshark?)

ncat -C smtp.zoho.com 587

Does any of the application layer information show in wireshark here?
ncat --ssl -C smtp.zoho.com 465
HELO web.site, MAIL FROM, RCPT TO, DATA, QUIT

Observe POP

ncat --ssl -C pop.zoho.com 995
user bob
pass password
list

1.9.2 Overview: SMTP, mail servers, mail user agents

02-Application/app_layer15.png
Notes:
* user-agent is local, but also remote.
* user-agent used to be on remote machine
* then, real mail user-agents
* then user-agent back on remote machine
* Mail server is a messy multi-part aggregate of things in software

How many people still use a real, local, MUA?

1.9.3 SMTP

Electronic Mail: SMTP
[RFC 2821, since obsoleted by RFC 5321]

Alice sends a message to Bob
02-Application/app_layer16.png

Basic process
1. Alice invokes her user agent for e-mail, provides Bob’s e-mail address (for example, bob@someschool.edu), composes a message, and instructs the user agent to send the message.
2. Alice’s user agent sends the message to her mail server, where it is placed in a message queue.
3. The client side of SMTP, running on Alice’s mail server, sees the message in the message queue. It opens a TCP connection to an SMTP server, running on Bob’s mail server.
4. After some initial SMTP handshaking, the SMTP client sends Alice’s message into the TCP connection.
5. At Bob’s mail server, the server side of SMTP receives the message. Bob’s mail server then places the message in Bob’s mailbox.
6. Bob invokes his user agent to read the message at his convenience.

Example SMTP transcript
Hostname of the client is crepes.fr
Hostname of the server is server.edu

S: 220 server.edu
C: EHLO crepes.fr // a nicer HELO
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection

Another SMTP example
02-Application/smtp.png

1.9.3.1 If login is required

base64 encoding is required for username and password:
https://en.wikipedia.org/wiki/Base64

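C: AUTH LOGIN
S: 334 VXNlcm5hbWU6
C: yourusernameinb64encoding
S: 334 UGFzc3dvcmQ6
C: yourpasswordinb64encoding
S: 235 Authentication successful

(VXNlcm5hbWU6 and UGFzc3dvcmQ6 are the base64 encodings of “Username:” and “Password:”.)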

To get base64 encoding of a string:

# encode in bash
$ echo -n 'string' | base64

# decode in bash
$ echo -n c3RyaW5n | base64 -d
# In python:
>>> import base64
>>> base64.b64encode('string'.encode())
>>> base64.b64decode('c3RyaW5n')

then, you can proceed sending:

C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection

1.9.3.2 Notes

SMTP uses persistent connections
SMTP requires message (header and body) to be in 7-bit ASCII
SMTP server uses CRLF.CRLF to determine end of message
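A natural question is what happens if the message body itself contains a line holding just a period. SMTP handles this with dot-stuffing (RFC 5321): the client adds an extra `.` to any body line that starts with `.`, and the server strips it again. This minimal sketch prepares a body for the DATA phase. The helper name `smtp_data_payload` is hypothetical.

```python
def smtp_data_payload(body):
    """Prepare a message body for the SMTP DATA phase.

    Lines are sent with CRLF endings, a line starting with '.' gets an
    extra '.', and the message ends with CRLF.CRLF.
    """
    out = []
    for line in body.splitlines():
        if line.startswith("."):
            line = "." + line          # dot-stuffing: cannot be mistaken for the end
        out.append(line)
    # .encode("ascii") fails on non-ASCII text, matching SMTP's 7-bit rule
    return ("\r\n".join(out) + "\r\n.\r\n").encode("ascii")

payload = smtp_data_payload("Do you like ketchup?\nHow about pickles?\n.hidden line")
print(payload)
# b'Do you like ketchup?\r\nHow about pickles?\r\n..hidden line\r\n.\r\n'
```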

comparison with HTTP:
* HTTP: pull
* SMTP: push
* both have ASCII command/response interaction, status codes
* HTTP: each object encapsulated in its own response message
* SMTP: multiple objects sent in multipart message

1.9.4 Message format

Message header
* Header containing peripheral information that precedes the body of the message itself.
* The header lines and the body of the message are separated by a blank line (CRLF).
* RFC 5322 specifies the exact format for mail header lines as well as their semantic interpretations.
* As with HTTP, each header line contains readable text, consisting of a keyword followed by a colon followed by a value.
* Some of the keywords are required and others are optional.
* Every header must have a From: header line and a To: header line; a header may include a Subject: header line as well as other optional header lines.

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.
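Python's standard `email` package builds messages in this format. In this minimal sketch, `set_content` also adds MIME headers (`MIME-Version`, `Content-Type`, `Content-Transfer-Encoding`) automatically. Serializing with `policy.SMTP` produces the CRLF line endings used on the wire, and a blank line separates the header block from the body.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "Searching for the meaning of life."
msg.set_content("Do you like ketchup?\nHow about pickles?\n")

# Serialize with CRLF line endings, as the message travels over SMTP
wire = msg.as_bytes(policy=policy.SMTP)
header_block, body = wire.split(b"\r\n\r\n", 1)   # blank line: end of header
print(header_block.decode())
print(body.decode())
```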

Show: Open an email in Mutt/raw to illustrate headers, MIME, multipart

1.9.5 Access protocols

Email protocols and direction of communication
When sent an email by Alice, how does a recipient like Bob, running a user agent on his local PC, obtain his messages, which are sitting in a mail server within Bob’s mail provider?
02-Application/app_layer17.png
* Post Office Protocol—Version 3 (POP3)
* Internet Message Access Protocol (IMAP)
* HTTP

1.9.5.1 POP3

C: client
S: server
ncat mailServer 110
S: +OK POP3 server ready
C: user bob
S: +OK
C: pass hungry
S: +OK user successfully logged on
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off
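The reply to `list` in the transcript above is a multi-line response, one `number size` pair per line, ended by a line holding a single `.`. This minimal sketch turns those lines into a mapping from message number to size in octets. The helper name `parse_pop3_list` is hypothetical, and the input is the reply from the transcript.

```python
def parse_pop3_list(lines):
    """Parse a POP3 LIST reply into {message number: size in octets}."""
    sizes = {}
    for line in lines:
        if line == ".":              # a lone '.' ends a multi-line POP3 response
            break
        number, size = line.split()
        sizes[int(number)] = int(size)
    return sizes

print(parse_pop3_list(["1 498", "2 912", "."]))   # {1: 498, 2: 912}
```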

Another POP3 example
02-Application/pop3.png

1.9.5.2 IMAP

++++++++++++++ Cahoot-02-6

1.10 Domain name resolution (DNS)

Discussion question:
At first guess, would you think the internet has a kill-switch, like it might in a Hollywood movie?
If it did, what might the consequences be?
On businesses?
On people?
In the USA?
In China?
In Russia?
In Kazakhstan?
etc.

02-Application/DNS-Server.png

“The Domain Name Server (DNS) is the Achilles heel of the Web.
The important thing is that it’s managed responsibly.”

-Tim Berners-Lee

1.10.1 Basic idea

People: many identifiers:
* SSN, name, passport #, etc.,

Internet hosts, routers:
* IP address (32 bit for IPv4) - used for addressing datagrams
* Easy to remember “name”, e.g., www.yahoo.com - used by humans

The big questions:
* How to securely and fairly map between IP address and name, and vice versa?
* What happens if you want a name someone else has?
* What happens if someone wants the name you have?
* Is the entity with the name really the remote entity?
* Are security and fairness opposed?
* Are both opposed to policing the space, to censorship?

1.10.2 DNS: the problem

(double-meaning intended)

1.10.3 Basics of DNS: host aliasing

Web browser example:
1. User machine runs the client side of the DNS application.
2. A web browser extracts the hostname, www.someschool.edu, from a URL entered by the user, and passes the hostname to the client side of the DNS application.
3. The local DNS client sends a query containing the hostname to a somewhat-remote DNS server (or a chain of such servers).
4. The DNS client eventually receives a reply, which includes the IP address for the hostname.
5. Once the web browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address.

Domain Name System:
* is a distributed database implemented in a hierarchy of many name servers
* application-layer protocol:
* hosts, name servers communicate to resolve names (address/name translation)
* clients request name/address bindings
* This is a core Internet function.
* an example of keeping the complexity at the network’s “edge”, where it can be maintained.

1.10.4 DNS: services, structure

1.10.5 History of DNS

Discussion Question:
* What are several reasons an entity might want to steal a network name?
* Would you guess that all such purposes are bad?

02-Application/dns_process.png

1.10.6 Partial hierarchy of DNS servers

02-Application/app_layer18.png
DNS is just a pyramid scheme…

client wants IP for www.amazon.com; 1st approximation:
* client queries public root server (directly or indirectly) to find .com DNS server
* client queries .com DNS server to get amazon.com’s public-facing DNS server
* client queries amazon.com’s public-facing DNS server to get IP address for www.amazon.com
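The three steps above amount to following referrals down the hierarchy. This toy sketch replaces the real servers with in-memory tables. All server names and the final address (taken from the 192.0.2.0/24 documentation range) are made up. At each server it looks for the longest suffix of the name that the server knows about and either follows the referral or returns the answer.

```python
# Toy stand-ins for the DNS hierarchy; every name and address here is hypothetical
SERVERS = {
    "root":        {"com.": ("referral", "com-tld")},
    "com-tld":     {"amazon.com.": ("referral", "amazon-auth")},
    "amazon-auth": {"www.amazon.com.": ("answer", "192.0.2.10")},
}

def resolve_iteratively(name):
    """Start at the root and follow referrals until some server gives an answer."""
    name = name.rstrip(".") + "."
    labels = name.split(".")[:-1]                  # ['www', 'amazon', 'com']
    server, path = "root", []
    while True:
        # Longest suffix first: 'www.amazon.com.', then 'amazon.com.', then 'com.'
        for i in range(len(labels)):
            suffix = ".".join(labels[i:]) + "."
            if suffix in SERVERS[server]:
                kind, value = SERVERS[server][suffix]
                break
        else:
            raise LookupError(f"{server} has no data for {name}")
        path.append(server)
        if kind == "answer":
            return value, path
        server = value                             # referral: ask the next server down

print(resolve_iteratively("www.amazon.com"))
# ('192.0.2.10', ['root', 'com-tld', 'amazon-auth'])
```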

1.10.6.1 Root DNS servers

https://en.wikipedia.org/wiki/Root_name_server
https://en.wikipedia.org/wiki/DNS_root_zone
02-Application/app_layer19.png
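* root servers are contacted by a local name server that cannot resolve a name from its own cache
* root servers do not look up the full name themselves; they reply with a referral to the TLD servers for the name’s top-level domain
* the local name server follows the referrals down to an authoritative server and gets the mapping
* the local name server returns the mapping to the client (and caches it)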

1.10.6.2 Top-level domain (TLD) servers

https://en.wikipedia.org/wiki/Top-level_domain
* responsible for com, org, net, edu, aero, jobs, museum, and all top-level country domains, e.g.: uk, fr, ca, jp
* Verisign maintains servers for .com TLD
* Educause for .edu TLD

If mst.eu is available…
What fun things could we do with that…?

Ask: How can one “be” an EU resident on the internet?

1.10.6.3 Authoritative DNS servers

1.10.6.4 Local DNS name server / Resolver

02-Application/dns_resolve.jpg

1.10.7 DNS query example

Show: some Wireshark observations of nslookup for various types of record (overview this time, more detail again lower).

  1. Visit https://mst.edu with web browser

  2. Make a manual query using command line tools

#!/bin/bash

nslookup mst.edu
nslookup www.mst.edu

dig mst.edu
dig www.mst.edu

whois mst.edu
whois icann.org

# What are the authoritative servers?
nslookup -type=NS mst.edu

# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu

  3. Use python to query the same
#!/usr/bin/python3

import socket

print(socket.gethostbyname('mst.edu'))

https://en.wikipedia.org/wiki/WHOIS
WHOIS going to tell us a Domain Name joke?

1.10.8 DNS server interaction

02-Application/dnslevels.png

Standard iterated query
Some host at cis.poly.edu wants IP address for gaia.cs.umass.edu
02-Application/app_layer20.png
Iterated query:
Contacted server replies with name of server to contact.
“I don’t know this name,
but ask this other server who is responsible for knowing,
or is responsible for asking some server that is.”

Recursive queries
02-Application/app_layer21.png
Recursive query:
Puts burden of name resolution on contacted name server.
Heavy load at upper levels of hierarchy?

Q:
Can one’s own machine just do the query to root, TLD, and authoritative?
Why bother with the institutional resolver?

A:
Yes, if you set up your own DNS server (easy).
Just install bind, and configure it.
It’s just extra functionality not built into every client.

++++++++++++++ Cahoot-02-7

DNS caching, updating records
02-Application/DNS_Architecture.png
* once (any) name server / resolver learns mapping, it caches the mapping for a while
* cache entries timeout (disappear) after some Time To Live (TTL)
* TLD server information is typically cached in local name servers
* thus root name servers visited less
* cached entries may be out-of-date (best effort name-to-address translation!)
* if name host changes IP address, may not be known Internet-wide until all TTLs expire
* update/notify mechanisms proposed IETF standard
* RFC 2136
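The caching rules above can be sketched as a small class. In this minimal sketch, the clock is injected so the example is deterministic, and the record for www.mst.edu uses a hypothetical documentation-range address. An entry is served until its TTL runs out and then disappears, which is also why a changed address may not be seen everywhere until old TTLs expire.

```python
class DnsCache:
    """Minimal resolver cache: each record is kept until its TTL expires."""

    def __init__(self, clock):
        self.clock = clock            # callable returning the current time in seconds
        self.entries = {}             # (name, record type) -> (value, expiry time)

    def put(self, name, rtype, value, ttl):
        self.entries[(name.lower(), rtype)] = (value, self.clock() + ttl)

    def get(self, name, rtype):
        key = (name.lower(), rtype)
        entry = self.entries.get(key)
        if entry is None:
            return None               # not cached: must ask other servers
        value, expires = entry
        if self.clock() >= expires:
            del self.entries[key]     # TTL expired: the entry disappears
            return None
        return value                  # may be out of date if the owner changed it

now = [0.0]                           # fake clock, advanced by hand
dns_cache = DnsCache(clock=lambda: now[0])
dns_cache.put("www.mst.edu", "A", "192.0.2.7", ttl=300)   # hypothetical address
first = dns_cache.get("www.mst.edu", "A")
now[0] = 301
second = dns_cache.get("www.mst.edu", "A")
print(first, second)                  # 192.0.2.7 None
```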

DNS is at the root of many internet problems…

1.10.9 DNS protocol, message format

https://en.wikipedia.org/wiki/Domain_Name_System#DNS_message_format

Query and reply messages, both with same overall message format

Message header
* identification:
* 16 bit number for query,
* reply to query uses same number as query
* flags:
* query or reply
* recursion desired
* recursion available
* reply is authoritative
02-Application/app_layer22.png

The header of DNS messages is composed of 12 bytes and its structure is shown in the figure above.

The QR flag is set to 0 in DNS queries and 1 in DNS answers.

The Opcode is used to specify the type of query.
For instance, a standard query is when a client sends a name, and the server returns the corresponding data.
An update request is when the client sends a name, and new data, and the server then updates its database.

The AA bit is set when the server that sent the response has authority for the domain name found in the question section.
In the original DNS deployments, two types of servers were considered: authoritative servers and non-authoritative servers.
The authoritative servers are managed by the system administrators responsible for a given domain.
They always store the most recent information about a domain.
Non-authoritative servers are servers or resolvers that store DNS information about external domains without being managed by the owners of a domain.
They may thus provide answers that are out of date.
From a security point of view, the authoritative bit is not an absolute indication about the validity of an answer.

Ask: Is this secure?
It uses UDP; what does this imply?

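The TC (truncation) bit is set when the response did not fit in the message, e.g., when it exceeds the 512-byte limit of DNS over UDP without the EDNS extension.
Where TC is set, the partial RRSet that would not completely fit may be left in the response.
When a DNS client receives a reply with TC set, it should ignore that response and query again, using a mechanism, such as a TCP connection, that permits larger replies.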

The RD (recursion desired) bit is set by a client when it sends a query to a resolver.
Such a query is said to be recursive because the resolver will recurse through the DNS hierarchy to retrieve the answer on behalf of the client.
In the past, all resolvers were configured to perform recursive queries on behalf of any Internet host.
However, this exposes the resolvers to several security risks.
The simplest one is that the resolver could become overloaded by having too many recursive queries to process.
As of this writing, most resolvers only allow recursive queries from clients belonging to their company or network and discard all other recursive queries.

The RA bit indicates whether the server supports recursion.

The RCODE is used to distinguish between different types of errors. See RFC 1035 for additional details.

The last four fields (QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT) give the number of entries in the Question, Answer, Authority and Additional sections of the DNS message.
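To make the 12-byte layout concrete, here is a sketch that packs a query header plus one question and decodes the flag bits with Python's `struct` module. It follows the RFC 1035 bit layout: QR, a 4-bit Opcode, AA, TC, RD, RA, 3 reserved Z bits, and a 4-bit RCODE. The query ID `0x1234` and the helper names are arbitrary choices for illustration. The bytes returned by `build_query` could be sent to a resolver with a UDP socket's `sendto(query, (resolver_ip, 53))`.

```python
import struct

HEADER = struct.Struct("!6H")  # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT = 12 bytes


def build_query(name, qtype=1, query_id=0x1234, recursion_desired=True):
    """Wire-format query with one question (qtype 1 = A, class 1 = IN)."""
    flags = 0x0100 if recursion_desired else 0x0000   # QR=0 (query), Opcode=0, RD bit
    header = HEADER.pack(query_id, flags, 1, 0, 0, 0)  # one question, no records
    # QNAME: each label prefixed by its length, terminated by the zero-length root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(message):
    """Decode the 12-byte header into the fields described above."""
    ident, flags, qdcount, ancount, nscount, arcount = HEADER.unpack_from(message)
    return {
        "id": ident,
        "qr": flags >> 15,             # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,
        "tc": (flags >> 9) & 1,
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "rcode": flags & 0xF,          # 0 = no error, 3 = NXDOMAIN
        "counts": (qdcount, ancount, nscount, arcount),
    }


query = build_query("www.example.com")
print(len(query))            # 33: 12-byte header + 17-byte QNAME + 4 bytes QTYPE/QCLASS
print(parse_header(query))
```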

1.10.10 DNS records (RR)

Name indicates the name of the node to which this resource record pertains.

The two-byte Type field indicates the type of resource record.

The Class field was used to support the use of the DNS in environments other than the Internet.

The TTL field indicates the lifetime of the Resource Record in seconds.
This field is set by the server that returns an answer and indicates for how long a client or a resolver can store the Resource Record inside its cache.
A long TTL indicates a stable RR.
Some companies use short TTL values for mobile hosts and also for popular servers.
For example, a web hosting company that wants to spread the load over a pool of a hundred servers can configure its nameservers to return different answers to different clients.
If each answer has a small TTL, the clients will be forced to send DNS queries regularly.
The nameserver will reply to these queries by supplying the address of the least loaded server.

The RDLength field is the length of the RData field that contains the information of the type specified in the Type field.
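Here is a sketch of one A record in wire format, using the field order just described: Name, Type, Class, TTL, RDLength, RData. It assumes no name compression. Real responses usually replace repeated names with 2-byte pointers, which this toy decoder does not handle. The function names are hypothetical.

```python
import socket
import struct


def encode_name(name):
    """Domain name as length-prefixed labels, ending with the zero-length root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def encode_a_record(name, ipv4, ttl):
    """Pack one A resource record: Name, Type, Class, TTL, RDLength, RData."""
    rdata = socket.inet_aton(ipv4)                        # 4 bytes for an IPv4 address
    fixed = struct.pack("!HHIH", 1, 1, ttl, len(rdata))   # Type=A(1), Class=IN(1)
    return encode_name(name) + fixed + rdata


def decode_a_record(record):
    """Inverse of encode_a_record (assumes the name is not compressed)."""
    labels, pos = [], 0
    while record[pos] != 0:
        length = record[pos]
        labels.append(record[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1                                              # skip the root label
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", record, pos)
    pos += 10                                             # 2 + 2 + 4 + 2 bytes
    rdata = record[pos:pos + rdlength]
    return ".".join(labels), rtype, rclass, ttl, socket.inet_ntoa(rdata)


rec = encode_a_record("www.example.com", "192.0.2.10", 300)
print(len(rec))               # 31
print(decode_a_record(rec))   # ('www.example.com', 1, 1, 300, '192.0.2.10')
```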

Several types of DNS RR are used in practice.
* The A type is used to encode the IPv4 address that corresponds to the specified name.
* The AAAA type is used to encode the IPv6 address that corresponds to the specified name.
* An NS record contains the name of the DNS server that is responsible for a given domain.

type=A
* name is hostname
* value is IP address

type=NS
* name is domain (e.g., foo.com)
* value is hostname of authoritative name server for this domain

type=CNAME
* name is alias name for some “canonical” (the real) name
* www.ibm.com is really servereast.backup2.ibm.com
* value is canonical name
* CNAME (or canonical names) are used to define aliases.
* For example, www.example.com could be a CNAME for pc12.example.com, which is the actual name of the server on which the web server for www.example.com runs.

type=MX
* value is name of mailserver associated with name

There are more record types (summary of commonly used):
https://en.wikipedia.org/wiki/List_of_DNS_record_types
* Address Mapping record (A Record) also known as a DNS host record, stores a hostname and its corresponding IPv4 address.
* IP Version 6 Address record (AAAA Record) stores a hostname and its corresponding IPv6 address.
* Canonical Name record (CNAME Record) can be used to alias a hostname to another hostname. When a DNS client requests a record that contains a CNAME, which points to another hostname, the DNS resolution process is repeated with the new hostname.
* Mail exchanger record (MX Record) specifies an SMTP email server for the domain, used to route outgoing emails to an email server.
* Name Server records (NS Record) specify that a DNS Zone, such as “example.com”, is delegated to a specific Authoritative Name Server, and provide the name of that name server.
* Reverse-lookup Pointer records (PTR Record) allow a DNS resolver to provide an IP address and receive a hostname (reverse DNS lookup).
* Certificate record (CERT Record) stores encryption certificates PKIX, SPKI, PGP, and so on.
* Service Location (SRV Record) a service location record, like MX but for other communication protocols.
* Text Record (TXT Record) typically carries machine-readable data such as opportunistic encryption, sender policy framework, DKIM, DMARC, etc.
* Start of Authority (SOA Record) this record appears at the beginning of a DNS zone file, and indicates the Authoritative Name Server for the current DNS zone, contact details for the domain administrator, domain serial number, and information on how frequently DNS information for this zone should be refreshed.

02-Application/dns_iterative_resolve.png

+++++++++++++++++ Cahoot-02-8

1.10.11 Wireshark protocol details

Show: some Wireshark observations of nslookup for various types of record, this time in detail about the fields.

  1. Visit https://mst.edu with web browser

  2. Make a manual query using command line tools

#!/bin/bash

nslookup mst.edu
nslookup www.mst.edu

dig mst.edu
dig www.mst.edu

whois mst.edu
whois icann.org

# What are the authoritative servers?
nslookup -type=NS mst.edu

# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu

  3. Use Python to query the same
#!/usr/bin/python3

import socket

print(socket.gethostbyname('mst.edu'))
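`socket.gethostbyname` returns only a single IPv4 address. A slightly fuller sketch uses `socket.getaddrinfo`, which asks the OS resolver (and thus your configured DNS server) for both IPv4 and IPv6 results. The helper name `resolve_all` is hypothetical.

```python
import socket


def resolve_all(host):
    """Every distinct IPv4/IPv6 address the OS resolver returns for host."""
    addresses = set()
    # proto=IPPROTO_TCP avoids one duplicate entry per socket type (TCP/UDP/raw).
    for _family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
        host, None, proto=socket.IPPROTO_TCP
    ):
        addresses.add(sockaddr[0])     # first element is the IP for both families
    return sorted(addresses)


# e.g. print(resolve_all("mst.edu")); needs network access, and results vary over time
print(resolve_all("127.0.0.1"))        # numeric hosts come back without a DNS query
```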

1.10.12 Inserting records into DNS

02-Application/dns_lookup_v_registration.png

1.10.13 Reverse DNS

https://en.wikipedia.org/wiki/Reverse_DNS_lookup
* The DNS is mainly used to find the IP address that corresponds to a given name.
* However, it is sometimes useful to obtain the name that corresponds to an IP address.
* This is done by using the PTR (pointer) RR.
* The RData part of a PTR RR contains the name, while the Name part of the RR contains the IP address encoded in the in-addr.arpa domain.
* IPv4 addresses are encoded in the in-addr.arpa domain by reversing the order of the four decimal numbers (octets) that compose the dotted decimal representation of the address.
* For example, consider IPv4 address 192.0.2.11.
* The hostname associated with this address can be found by requesting the PTR RR that corresponds to 11.2.0.192.in-addr.arpa.
* A similar solution is used to support IPv6 addresses, see RFC 3596.
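The in-addr.arpa encoding is a simple string transformation. This sketch builds it by hand and cross-checks it against the `reverse_pointer` attribute from the standard library's `ipaddress` module, which also produces the nibble-reversed ip6.arpa form for IPv6 (RFC 3596). The helper name `reverse_name` is hypothetical. Actually issuing the PTR query through the OS resolver is what `socket.gethostbyaddr()` does.

```python
import ipaddress


def reverse_name(ipv4):
    """PTR query name for an IPv4 address: reverse the four octets, append in-addr.arpa."""
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")   # validates the address first
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_name("192.0.2.11"))                            # 11.2.0.192.in-addr.arpa
print(ipaddress.ip_address("192.0.2.11").reverse_pointer)     # same name, from the stdlib
print(ipaddress.ip_address("2001:db8::1").reverse_pointer)    # 32 reversed hex nibbles + ip6.arpa
```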

1.10.14 Attacks on DNS

https://en.wikipedia.org/wiki/Domain_Name_System#Security_issues
https://en.wikipedia.org/wiki/Domain_Name_System#Privacy_and_tracking_issues
02-Application/dns_abuse.jpg

1.10.14.1 Specific attacks

DDoS bandwidth-flooding attack
An attacker could attempt to send to each DNS root server a deluge of packets,
so many that the majority of legitimate DNS queries never get answered.
Bombard root servers with traffic.
This has not really been successful to date.

Defenses include:
Traffic filtering.
Local DNS servers cache IPs of TLD servers,
allowing root server bypass.

Bombarding TLD servers is potentially more dangerous.
02-Application/dns_flood.jpg

Man-in-the-middle attack
The attacker intercepts queries from hosts and returns bogus replies.
https://en.wikipedia.org/wiki/DNS_hijacking
(show in class)
02-Application/dns_spoof.jpg

DNS poisoning attack
The attacker sends bogus replies to a DNS server,
which is making outgoing requests itself,
tricking the server into accepting bogus records into its cache.
Send bogus replies to DNS server, which caches
https://en.wikipedia.org/wiki/DNS_spoofing
(show in class)
02-Application/dns_poison.jpg

DNS redirection
Another important DNS attack is not an attack on the DNS service, per se,
but instead exploits the DNS infrastructure,
to launch a DDoS attack against a targeted host.
Attacker sends DNS queries to many authoritative DNS servers,
with each query having the spoofed source address of the targeted host.
The DNS servers then send their replies directly to the targeted host.

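Exploit DNS for DDoS:
send queries whose spoofed source address is the target's IP.
This often relies on amplification: small queries that trigger much larger replies, so the target receives far more traffic than the attacker sends.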

DNS as exfiltration / infiltration / tunneling
One can sneak data through DNS requests or replies.
02-Application/dns_tunneling.jpg
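To see why tunneling works, and what defenders look for, here is a sketch of the encoding half. Bytes are base32-encoded, because DNS names are case-insensitive and limited to letters, digits and hyphens. They are then split into labels of at most 63 characters under a domain whose authoritative server the sender controls. The domain `exfil.example.com`, the `<seq>.<chunk>.<domain>` layout, and the 30-character chunk size are all hypothetical choices. The long, random-looking labels this produces are the kind of pattern DNS monitoring tends to flag.

```python
import base64

MAX_LABEL = 63  # maximum length of one DNS label (RFC 1035)


def encode_as_query_names(data, domain, chunk=30):
    """Hide bytes in query names of the (hypothetical) form <seq>.<base32 chunk>.<domain>."""
    if not 0 < chunk <= MAX_LABEL:
        raise ValueError("chunk must fit in one DNS label")
    encoded = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    return [
        f"{seq}.{encoded[start:start + chunk]}.{domain}"
        for seq, start in enumerate(range(0, len(encoded), chunk))
    ]


def decode_query_names(names, domain):
    """What the receiving authoritative server would do with the queries it logs."""
    suffix = "." + domain
    ordered = sorted(names, key=lambda n: int(n.split(".", 1)[0]))  # may arrive out of order
    encoded = "".join(n[: -len(suffix)].split(".", 1)[1] for n in ordered).upper()
    encoded += "=" * (-len(encoded) % 8)    # restore the base32 padding stripped above
    return base64.b32decode(encoded)


names = encode_as_query_names(b"hello from inside", "exfil.example.com")
for name in names:
    print(name)
print(decode_query_names(names, "exfil.example.com"))
```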

+++++++++++++++++ Cahoot-02-9

1.10.15 DNS protections

Ways to avoid those attacks:

Just encrypt the connections to the server:
https://en.wikipedia.org/wiki/DNS_over_HTTPS
https://en.wikipedia.org/wiki/DNS_over_TLS

Tor/VPN/Proxy (privacy, but also some security).

Cryptographic signatures on DNS messages
https://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions
https://en.wikipedia.org/wiki/DNS-based_Authentication_of_Named_Entities
https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization
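As one concrete example of the encrypted option, RFC 8484 (DNS over HTTPS) carries the ordinary wire-format DNS message inside HTTPS. With the GET method, the message is base64url-encoded without padding into a `dns` query parameter, and the RFC suggests a DNS ID of 0 so that responses cache well. The resolver URL below is a placeholder, not a real service, and the helper names are hypothetical.

```python
import base64
import struct


def wire_query(name, qtype=1):
    """Plain DNS query bytes: ID 0 (as RFC 8484 suggests), RD set, one IN question."""
    header = struct.pack("!6H", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(resolver_url, name, qtype=1):
    """RFC 8484 GET form: base64url of the wire query, '=' padding removed."""
    encoded = base64.urlsafe_b64encode(wire_query(name, qtype)).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


# Placeholder resolver URL (hypothetical):
print(doh_get_url("https://doh.example/dns-query", "www.example.com"))
# Fetching that URL over HTTPS with the header "Accept: application/dns-message"
# would return a binary DNS response, parsed exactly like one received over UDP.
```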

1.10.16 Who controls DNS?

The Lord of the DNS

One DNS to rule them all,
One DNS to find them,
One DNS to bring them all,
and in the darkness bind them…
02-Application/sauron.webp
(i.e., a big boring Sauron committee…)
https://en.wikipedia.org/wiki/ICANN
https://en.wikipedia.org/wiki/ICANN#Criticism

https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority

https://en.wikipedia.org/wiki/OpenNIC
(permitted to be an open alternative)
02-Application/dns_overview.png

+++++++++++++++++ Cahoot-02-10

1.11 Alternative name systems

Fellowship of the DNS…

Fair, robust, distributed, decentralized, non-exploitable name resolution,
is a bit of a https://en.wikipedia.org/wiki/Catch-22_(logic)
and a really difficult problem to solve…

Discussion question:
What might a reliable distributed solution look like?
Might they come with their own exploits and problems?
Might a p2p system end up even more dictatorially problematic than DNS?
(e.g., Mr. Robot’s Evil Corp cryptocurrency)?

GNU name system
https://gnunet.org/gns
https://lsd.gnunet.org/lsd0001/
https://news.ycombinator.com/item?id=30154830
(discuss proposal to replace DNS!)
ICANN discussion of using GNUnet instead of DNS: https://icann.zoom.us/rec/play/znYwyZWPwrNraKqiZCLwOkHp_NITBj0QdhMpIrZPTrJumDRxIaecB8DHAygsgO-8PxQKkYx5ESGj6pBl.vZAWJHZoGeNyX9R4?startTime=1572978711000&_x_zm_rtaid=M4Wj53e3QXyaUK9nI6hiQg.1644387258044.8569edd15b9c2bafee5b5a283ad9fa90&_x_zm_rhtaid=108

I2P web-of-trust name system
https://geti2p.net/en/docs/naming
(web of trust based)

Crypto-currency-based
https://ens.domains/
https://docs.ens.domains/en/latest/introduction.html

https://www.namecoin.org/
https://en.wikipedia.org/wiki/Namecoin

1.12 Hosting a site

Do you need to buy a name to host a site on clearnet?
Do you need to buy a static IP to host a site on clearnet,
or does a dynamic IP suffice?
What about dynamic DNS?
https://en.wikipedia.org/wiki/Dynamic_DNS
Do you need to buy an actual machine? A virtual one?
Do you need to buy an HTTPS certificate?
Do you need to buy anything else?

What about overlay layers or darknets for simple free hosting?
https://en.wikipedia.org/wiki/I2P
https://en.wikipedia.org/wiki/Tor_(anonymity_network)
Can one circumvent DNS editing as a censorship technique?
Can one block sites at all with common darknets?

What is the easiest way to set up an independent site on your own hardware, or a VPS you rent?
Static websites:
https://onionshare.org/
http://lldan5gahapx5k7iafb3s4ikijc4ni7gx5iywdflkba5y2ezyg6sjgyd.onion/

sudo dnf install tor
pip3 install --upgrade onionshare-cli --user
echo "cool publicly accessible website" >index.html
onionshare-cli --website --public index.html

You could even host a website like this on your phone,
in under 10 minutes:
https://onionshare.org/mobile/#download
Anywhere that had an internet connection,
you could leave your phone plugged in and host a website there…

https://medium.com/axon-technologies/hosting-anonymous-website-on-tor-network-3a82394d7a01
Interactive backend easily possible with tor process and Apache.

https://geti2p.net/en/faq#myI2P%20Site

1.13 P2P

Today:
Theoretical difficulties with P2P and their solutions (general).
An overview of protocols and services provided by P2P overlay applications (general).
High level protocol specification for an example P2P application (BitTorrent).

1.13.1 A variety of protocols

There are many P2P protocols.
BitTorrent is just one we will review today.

1.13.2 P2P vs. Client server

02-Application/p2p00.png
* no always-on server
* arbitrary end systems directly communicate
* peers are intermittently connected and change IP addresses

Examples:
* file distribution (BitTorrent)
* Streaming (KanKan)
* VoIP (Skype - partial, funny story of history)

02-Application/p2p01.png

File distribution problem: Client server vs. P2P

Upload/download capacity is limited resource!

Question:
How much time to distribute file (size F),
from one server, to N peers?
02-Application/app_layer23.png

1.13.2.1 Client-server time to distribute file

Let’s first determine the distribution time for the client-server architecture,
which we denote by Dcs. In the client-server architecture, none of the peers aids in
distributing the file. We make the following observations:

The server must transmit one copy of the file to each of the N peers.
Thus, the server must transmit N * F bits.
Since the server’s upload rate is us,
the time to distribute the file must be at least (N * F) / us

Server transmission
Must sequentially send (upload) a number (N) of file (F) copies:

us = server upload

time to send one copy:
F / us

time to send N copies:
(N * F) / us

Let dmin denote the download rate of the peer with the lowest download rate,
that is, dmin = min{d1, d2, …, dN}.
The peer with the lowest download rate
cannot obtain all F bits of the file in less than F / dmin seconds.
Thus the minimum distribution time is at least F / dmin.
However, this bound alone will almost never be the actual time,
as the server must distribute to many peers.

Client: each client must download file copy
* dmin = min client download rate
* min client download time: F / dmin

Time to distribute F to N clients using client-server approach:
Dcs >= max{ (N * F) / us, F / dmin }
The first term increases linearly with N.

Question: how much time to distribute file (size F) from one server to N peers?
02-Application/app_layer23.png

1.13.2.2 P2P time to distribute file

At the beginning of the distribution, only the server has the file.
To get this file into the community of peers, the server must send each bit of the file at least once into its access link.
Thus, the minimum distribution time is at least F / us
Unlike the client-server scheme, a bit sent once by the server may not have to be sent by the server again, as the peers may redistribute the bit among themselves.

Server transmission
Must upload at least one copy.
Time to send one copy:
F / us

As with the client-server architecture, the peer with the lowest download rate cannot obtain all F bits of the file in less than F / dmin seconds.
Thus the minimum distribution time is at least F / dmin.
Unlike with the client-server model, with P2P this bound could actually be (and often is) the limiting term, because the server's upload bandwidth is no longer the only source of bits.

Client: each client must download file copy
* min client download time: F / dmin

The total upload capacity of the system as a whole is equal to the upload rate of the server plus the upload rates of each of the individual peers, that is:
utotal = us + u1 + … + uN
The system must deliver (upload) F bits to each of the N peers, thus delivering a total of N * F bits.
This cannot be done at a rate faster than utotal.
Thus, the minimum distribution time is also at least (N * F) / (us + u1 + … + uN).

Clients: in aggregate, the N clients must download N * F bits
* max upload rate (limiting max aggregate download rate) is us + sum(ui)

Time to distribute F to N clients using P2P approach:
DP2P >= max{ F / us, F / dmin, (N * F) / (us + sum(ui)) }
The numerator of the last term increases with N,
but so does the denominator,
since each peer provides service capacity.

1.13.2.3 Distribution time for P2P vs. Client-server

02-Application/app_layer24.png
Net client upload rate = u
F / u = 1 hour
us = 10u
dmin >= us

P2P vs Client server
For the P2P architecture, the minimal distribution time is always less than
or equal to the distribution time of the client-server architecture.
With these parameters, it also never exceeds a fixed duration (F / u = 1 hour), no matter how large N grows!
Applications with the P2P architecture can be self-scaling.
This scalability is a direct consequence of peers being re-distributors,
as well as consumers of bits.
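The two lower bounds are easy to evaluate directly. This sketch plugs in the figure's parameters. It chooses units (an assumption) so that u = 1 and F = 1, which makes F / u = 1 hour and every result come out in hours. It also assumes dmin = us, the smallest value the figure allows, and that every peer uploads at rate u.

```python
def d_client_server(N, F, us, dmin):
    """Client-server lower bound: max{ N*F/us, F/dmin }."""
    return max(N * F / us, F / dmin)


def d_p2p(N, F, us, dmin, peer_uploads):
    """P2P lower bound: max{ F/us, F/dmin, N*F/(us + sum(ui)) }."""
    return max(F / us, F / dmin, N * F / (us + sum(peer_uploads)))


u, F = 1.0, 1.0          # F / u = 1 hour, so times below are in hours
us = 10 * u              # server uploads 10x faster than a peer
dmin = us                # assumption: dmin = us, the smallest value with dmin >= us

for N in (5, 10, 30, 100):
    cs = d_client_server(N, F, us, dmin)
    p2p = d_p2p(N, F, us, dmin, [u] * N)   # every peer uploads at rate u
    print(f"N={N:3d}  client-server={cs:5.2f} h  P2P={p2p:4.2f} h")
```

For N = 30, for example, the client-server bound is 3 hours while the P2P bound is 0.75 hours. As N grows, the P2P value N / (10 + N) approaches 1 hour but never reaches it.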

1.13.3 BitTorrent overview

Standard protocol,
many clients (Vuze, BiglyBT, I2P-Snark, the official BitTorrent client, etc.),
and versions of tracker software (some server-based trackers).

File divided into 256 KB chunks (or another equal size).
Peers in torrent send/receive file chunks.
02-Application/app_layer25.png
Tracker:
tracks peers participating in torrent (or DHT);
runs their own choice of tracker software.
Used to be only a server-side operation, now also can be P2P!

Torrent:
meta-data and group of peers exchanging chunks of a file.

Client:
uploads and downloads files; runs their own client torrent software

Process:
Alice arrives, chooses a torrent, and using the torrent meta-data,
obtains a list of peers from tracker server (or distributed tracker),
and finally begins exchanging file chunks with peers in torrent.

++++++++++++++++ Cahoot-02-11

1.13.3.1 Overview of process

Peer joining torrent:
* new peer has no chunks, but will accumulate them over time from other peers
* registers with tracker (server or distributed) to get list of peers who have the torrent of interest, connects to subset of peers (“neighbors”)

After joining:
* while downloading, peer uploads chunks to other peers
* peer may change peers, with whom it exchanges chunks

Churn:
* peers may come and go
* once peer has entire file, it may (selfishly) leave or (altruistically) remain in torrent

Requesting chunks:
* at any given time, different peers have different subsets of file chunks
* periodically, Alice asks each peer for list of chunks that they have
* Alice requests missing chunks from peers, rarest first
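The "rarest first" rule can be sketched as a sort over the chunk lists Alice collects from her neighbors. The data layout (a dict from peer name to the set of chunk indices that peer advertises) and the tie-break by chunk index are assumptions for illustration. Real clients add refinements that this sketch leaves out.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks Alice is missing so the least-replicated ones come first.

    my_chunks: set of chunk indices Alice already has.
    neighbor_chunks: dict peer -> set of chunk indices that peer advertises.
    """
    copies = Counter()
    for chunks in neighbor_chunks.values():
        copies.update(chunks)                      # how many neighbors hold each chunk
    missing = [c for c in copies if c not in my_chunks]
    return sorted(missing, key=lambda c: (copies[c], c))  # fewest copies first


neighbors = {"bob": {0, 1, 2}, "carol": {0, 2, 3}, "dave": {0, 2}}
print(rarest_first({0}, neighbors))   # [1, 3, 2]
```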

Ask: Why rarest first?

Discussion question:
Why not just be a leech (download but not contribute)?
How might you design a protocol with incentives?
What might an incentive look like?
Should you build incentives into protocols?
Do people follow incentives?

1.13.3.2 BitTorrent participation incentive

How do we put a kink in the wires of those who don’t contribute enough,
slowing down their transfers,
to encourage every peer to reciprocate?

Sending chunks: tit-for-tat incentives
https://en.wikipedia.org/wiki/Tit_for_tat
* Alice sends chunks to those four peers currently sending her chunks at highest rate, rewarding them with more data.
* other peers are choked by Alice (do not receive chunks from her)
* re-evaluate top 4 every 10 secs
* every 30 secs: randomly select another peer, starts sending chunks
* “optimistically un-choke” this peer, in hope that the new peer reciprocates
* newly chosen peer may join top 4
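The choking rule above can be sketched as one function that Alice would call every 10 seconds. The rates dict, the `rotate` flag for the 30-second optimistic unchoke, and the function name are hypothetical. Real clients measure rates over sliding windows and handle more special cases, e.g., when seeding.

```python
import random


def choose_unchoked(download_rates, current_optimistic=None, rotate=False, rng=random):
    """Pick which peers Alice uploads to (hypothetical helper).

    download_rates: dict peer -> rate at which that peer currently sends to Alice.
    Call every 10 s; pass rotate=True every 30 s to pick a new optimistic unchoke.
    Returns (set of unchoked peers, optimistic peer or None).
    """
    top4 = sorted(download_rates, key=download_rates.get, reverse=True)[:4]
    others = [p for p in download_rates if p not in top4]   # currently choked peers
    optimistic = current_optimistic
    if rotate or optimistic not in others:
        # Optimistically unchoke a random choked peer, hoping it reciprocates.
        optimistic = rng.choice(others) if others else None
    unchoked = set(top4)
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked, optimistic


rates = {"bob": 50, "carol": 40, "dave": 30, "erin": 20, "frank": 10, "grace": 5}
unchoked, optimistic = choose_unchoked(rates, rng=random.Random(0))
print(sorted(unchoked), optimistic)
```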

Overview:
(1) Alice “optimistically un-chokes” a new participant, Bob, in hopes that he reciprocates
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers

All this results in higher upload rates, finding better trading partners, and getting the file faster!

Sharing is caring…

1.13.3.2.1 Problem 0: How to get people to share

Due to risk or costs in internet speed or throughput,
individuals could potentially download, but not upload.

1.13.3.2.2 Solution 0: Participation incentives

An interesting read, game theory in software design and CompSci:
http://bittorrent.org/bittorrentecon.pdf

1.13.4 Distributed Hash Table (DHT)

General (not BitTorrent specific)

++++++++++++++++ Cahoot-02-12

Review: dictionaries, maps, and hash tables

Simple database with (key, value) pairs:
key: human name;
value: social security number
02-Application/pasted_image.png

key = hash(original key)
02-Application/pasted_image001.png
02-Application/hash_table.png
* O(1) average lookup time, regardless of the size of the data.
* Can store large sparse key-space in smaller array with constant access time

Note: There are potentially two distributed databases (or merged into one) in some p2p networks:
1. Routing table for overlay network peers, who are defined by their addresses
2. Database of torrents: addresses/peers

1.13.4.1 Distributed database

It’s easy to keep a database on a server,
but how do we increase the censorship resistance and robustness?

1.13.4.1.1 Problem 1: how to keep a database everywhere?

DHT
02-Application/DHT.png

1.13.4.1.2 Solution 1: distributed hash table (DHT)

Problem
* How to lookup which user is storing a particular hashed key?

Solution
* each peer only aware of immediate successor and predecessor.
* circular DHT (a)
* Only index forward neighbors
* on average N/2 messages, i.e., O(N), are needed to resolve a query when there are N peers
02-Application/app_layer26.png
(b) Storing indices of more neighbors increases messaging efficiency, and increases storage overhead
* Each peer keeps track of IP addresses of predecessor, successor, short cuts.
* A balance of connections: space versus time
* DHT can be designed so that both the number of neighbors per peer as well as the average number of messages per query is O(log N), where N is the number of peers.
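A circular DHT like the one described can be simulated in a few lines. This sketch uses an assumed example ring with identifiers 0 to 15 and peers 1, 3, 4, 5, 8, 10, 12 and 15. It assumes the usual convention that a key belongs to its closest successor, meaning the first peer ID >= key, wrapping around the ring. It models only case (a), where each peer forwards to its immediate successor. The function names are hypothetical.

```python
import bisect

PEERS = [1, 3, 4, 5, 8, 10, 12, 15]   # assumed example ring, IDs in 0..15


def responsible_peer(peer_ids, key):
    """Closest successor of key: first peer ID >= key, wrapping past the top of the ring."""
    ids = sorted(peer_ids)
    i = bisect.bisect_left(ids, key)
    return ids[i % len(ids)]


def lookup_successor_only(peer_ids, start, key):
    """Forward the query one successor at a time; return (responsible peer, messages sent)."""
    ids = sorted(peer_ids)
    target = responsible_peer(ids, key)
    current, messages = start, 0
    while current != target:
        current = ids[(ids.index(current) + 1) % len(ids)]   # hand off to the successor
        messages += 1
    return target, messages


print(lookup_successor_only(PEERS, start=1, key=11))   # (12, 6): 1->3->4->5->8->10->12
```

Averaged over starting peers and keys, the message count grows linearly with N. The shortcuts in (b) are meant to reduce exactly this cost.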

1.13.4.2 Peer churn

Peers come and go, and the network must adapt.
02-Application/app_layer26.png

1.13.4.2.1 Problem 2: peers turn over

Example, peer 5 abruptly leaves, or is disconnected

1.13.4.2.2 Solution 2: synchronization procedure

handling peer churn:
* peers may come and go (churn)
* each peer knows address of its two successors
* each peer periodically pings its two successors to check aliveness
* if immediate successor leaves, choose next successor as new immediate successor

example: peer 5 abruptly leaves
* peer 4 detects peer 5’s departure; makes 8 its immediate successor
* 4 asks 8 who its immediate successor is; makes 8’s immediate successor its second successor.
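The repair step for this example can be sketched as follows. The `alive` and `ask_successor` callables are hypothetical stand-ins for the ping and for the "who is your immediate successor?" message. Only the case covered in the notes, where the immediate successor leaves, is handled.

```python
def repair_successors(successors, alive, ask_successor):
    """Return a peer's updated (first, second) successors after a periodic ping.

    successors: (s1, s2), the two successors this peer currently tracks.
    alive(peer) -> bool stands in for the ping; ask_successor(peer) -> that
    peer's immediate successor stands in for the query message.
    """
    s1, s2 = successors
    if not alive(s1):
        s1 = s2                   # the second successor becomes the immediate one
        s2 = ask_successor(s1)    # and its immediate successor becomes our second
    return s1, s2


# Peer 4 tracks (5, 8); peer 5 abruptly leaves; peer 8's immediate successor is 10.
immediate_successor = {8: 10, 10: 12}
print(repair_successors((5, 8), alive=lambda p: p != 5,
                        ask_successor=immediate_successor.get))   # (8, 10)
```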

1.13.5 BitTorrent protocols

https://en.wikipedia.org/wiki/Bittorrent
https://wiki.wireshark.org/BitTorrent
http://bittorrent.org/beps/bep_0000.html

1.13.5.0.1 Main protocol and torrent files

https://www.bittorrent.org/beps/bep_0003.html (show in class)
https://en.wikipedia.org/wiki/Torrent_file (show in class)
https://en.wikipedia.org/wiki/Magnet_URI_scheme
Show a real torrent file, map to specifications.
For example, https://ftp.qubes-os.org/iso/Qubes-R4.1.0-x86_64.torrent
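Torrent files are bencoded (BEP 3). Integers look like `i42e`, byte strings like `4:spam`, lists like `l...e`, and dictionaries like `d...e`. A minimal decoder, enough to inspect a file such as the Qubes torrent above, might look like the sketch below. It is a teaching sketch, not a hardened parser, and the function names are hypothetical.

```python
def bdecode(data):
    """Decode a bencoded byte string (BEP 3) into Python ints, bytes, lists and dicts."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data, i):
    c = data[i:i + 1]
    if c == b"i":                                   # integer: i<number>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                                   # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                                   # dict: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():                                 # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte {c!r} at offset {i}")


print(bdecode(b"d3:cow3:moo4:spam4:eggse"))   # {b'cow': b'moo', b'spam': b'eggs'}
# For a real file (path assumed):
# meta = bdecode(open("Qubes-R4.1.0-x86_64.torrent", "rb").read())
# print(sorted(meta), meta[b"info"][b"name"])
```

The torrent's info hash, which identifies it to trackers and peers, is the SHA-1 of the bencoded `info` dictionary exactly as it appears in the file.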

1.13.5.0.2 DHT server-less tracker protocol extension
1.13.5.0.3 Transport layer protocols used by BitTorrent

BitTorrent protocol: two main transport level choices

  1. Option 1: BitTorrent originally used TCP as its transport protocol.
  2. Option 2: UDP-based Micro Transport Protocol, called uTP.
0       4       8               16              24              32
+-------+-------+---------------+---------------+---------------+
| type  | ver   | extension     | connection_id                 |
+-------+-------+---------------+---------------+---------------+
| timestamp_microseconds                                        |
+---------------+---------------+---------------+---------------+
| timestamp_difference_microseconds                             |
+---------------+---------------+---------------+---------------+
| wnd_size                                                      |
+---------------+---------------+---------------+---------------+
| seq_nr                        | ack_nr                        |
+---------------+---------------+---------------+---------------+
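The 20-byte uTP header above maps directly onto a `struct` format string. In BEP 29, `type` occupies the high 4 bits of the first byte and `ver` (currently 1) occupies the low 4 bits. The packet type names below also come from BEP 29, while the helper names are hypothetical.

```python
import struct
from collections import namedtuple

UTP_HEADER = struct.Struct("!BBHIIIHH")   # 1+1+2+4+4+4+2+2 = 20 bytes, big-endian
UtpHeader = namedtuple("UtpHeader", [
    "type", "ver", "extension", "connection_id",
    "timestamp_microseconds", "timestamp_difference_microseconds",
    "wnd_size", "seq_nr", "ack_nr",
])
TYPE_NAMES = {0: "ST_DATA", 1: "ST_FIN", 2: "ST_STATE", 3: "ST_RESET", 4: "ST_SYN"}


def parse_utp_header(packet):
    """Split the first 20 bytes of a uTP packet into the fields of the diagram."""
    type_ver, *rest = UTP_HEADER.unpack_from(packet)
    return UtpHeader(type_ver >> 4, type_ver & 0x0F, *rest)   # type: high nibble, ver: low


def build_utp_header(h):
    """Inverse of parse_utp_header."""
    return UTP_HEADER.pack((h.type << 4) | h.ver, h.extension, h.connection_id,
                           h.timestamp_microseconds, h.timestamp_difference_microseconds,
                           h.wnd_size, h.seq_nr, h.ack_nr)


syn = UtpHeader(type=4, ver=1, extension=0, connection_id=12345,
                timestamp_microseconds=1000, timestamp_difference_microseconds=0,
                wnd_size=65536, seq_nr=1, ack_nr=0)
raw = build_utp_header(syn)
print(len(raw), TYPE_NAMES[parse_utp_header(raw).type])   # 20 ST_SYN
```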
1.13.5.0.4 BT video streaming (P2P)
02-Application/vuze-hp.png

1.13.5.1 BitTorrent Protocol

Show/demo: Wireshark while downloading a Linux ISO with Transmission

1.13.6 P2P security

1.13.6.1 Problems:

1.13.6.2 Solutions:

1.14 Socket programming

Goal
Learn how to build client/server applications that communicate using sockets.

Socket:
A tunnel between application processes, in an end-to-end transport protocol

Two primary socket types for two transport services exist.
UDP is an unreliable, lightweight datagram service.
TCP is a reliable, heavier, byte-stream, connection oriented service.

1.14.1 Example

Application example we’ll put in code, in order:
* Client
  * inputs a line of characters (data) from the keyboard, and
  * sends the data to server
* Server
  * receives the data,
  * converts the characters to uppercase, and
  * sends the modified data to client
* Client
  * receives modified data, and
  * displays it as a printed line on the screen

1.14.1.1 UDP

UDP involves no persistent “connection” between a client and server.
No handshaking occurs before sending data.
A sender explicitly attaches a destination IP address and port number to each packet.
A receiver extracts the sender’s IP address and port number from each received packet.
Transmitted data may be lost.
Transmitted data may be received out-of-order.
UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server.

UDP socket code:
* 02-Application/socket_01_UDP_server.py
* 02-Application/socket_01_UDP_client.py
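Independently of the course files above, here is a minimal self-contained sketch of the same uppercase exchange over UDP, with server and client in one script. It binds to port 0 (an assumption), so the OS picks a free port and it won't clash with a server already on 6789. The server runs in a background thread, and the function names are hypothetical.

```python
import socket
import threading


def udp_uppercase_server(sock):
    """Answer one datagram: upper-case it and send it back to whoever sent it."""
    data, client_address = sock.recvfrom(2048)       # sender's (IP, port) arrives with the data
    sock.sendto(data.decode().upper().encode(), client_address)


def udp_client(server_address, message):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(2.0)                          # UDP may lose packets; don't wait forever
        sock.sendto(message.encode(), server_address) # no handshake: address attached per send
        reply, _ = sock.recvfrom(2048)
        return reply.decode()


if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))                     # port 0: let the OS pick a free port
    threading.Thread(target=udp_uppercase_server, args=(server,), daemon=True).start()
    print(udp_client(server.getsockname(), "hello udp"))   # HELLO UDP
    server.close()
```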

Demonstrate:
0. Run in background:
python3 socket_01_UDP_server.py

  1. Show Wireshark watching the client and server code:
    sudo wireshark &

  2. Connect with:

python3 socket_01_UDP_client.py
nc -uC 127.0.0.1 6789
man nc # ncat can send UDP packets too!

Show how nc or multiple python clients can block

02-Application/app_layer27.png

+++++++++++++++++++++++++++++++++ Cahoot-02-13

1.14.1.2 TCP

A server process must first be running.
The server must have created a TCP socket,
that welcomes a client’s contact.
A client contacts a server.
The client specifies an IP address and port number of a server process.
The client uses that address to create a TCP socket.
The client’s TCP socket establishes a connection to the server.
When contacted by a client on the welcoming socket,
the server’s TCP socket creates a secondary new socket,
for the server process to communicate with that particular client.
This allows server to talk with multiple clients.
Source port numbers distinguish different clients.
TCP provides a reliable, in-order, byte-stream transfer between a client and server.

TCP socket code:
Example 1:
* 02-Application/socket_02_TCP_server.py
* 02-Application/socket_02_TCP_client.py

Example 2:
* 02-Application/socket_02_TCP_server2.py
* 02-Application/socket_02_TCP_client2.py
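Here is the same uppercase exchange over TCP, again as a hypothetical self-contained sketch rather than the course's socket_02 files. Note the two server-side sockets: the welcoming socket that calls `listen()`, and the per-client socket that `accept()` returns. The port-0 binding is an assumption, made so that the OS picks a free port.

```python
import socket
import threading


def tcp_uppercase_server(welcome_sock):
    """Accept one client on the welcoming socket and serve it on a new connection socket."""
    conn, client_address = welcome_sock.accept()      # new socket dedicated to this client
    with conn:
        # A single recv() is enough for this short demo; TCP is a byte stream,
        # so real code must loop until it has a complete message.
        data = conn.recv(1024)
        conn.sendall(data.decode().upper().encode())


def tcp_client(server_address, message):
    with socket.create_connection(server_address, timeout=2.0) as sock:  # handshake happens here
        sock.sendall(message.encode())
        return sock.recv(1024).decode()


if __name__ == "__main__":
    welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcome.bind(("127.0.0.1", 0))                    # port 0: let the OS pick a free port
    welcome.listen(1)
    threading.Thread(target=tcp_uppercase_server, args=(welcome,), daemon=True).start()
    print(tcp_client(welcome.getsockname(), "hello tcp"))   # HELLO TCP
    welcome.close()
```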

Demonstrate:
0. Run a server:
python3 socket_02_TCP_server.py

  1. Show Wireshark watching the client and server code:
    sudo wireshark &

  2. Connect with

python3 socket_02_TCP_client.py
nc -C 127.0.0.1 6789

  3. Show currently connected sockets with:
# new option
man ss
ss
# old, ss is better
man netstat
netstat -an
# another option
man lsof
lsof -i -n

02-Application/app_layer28.png
The term “port” is not the same idea or definition as the term “socket”.

Socket is an instance object dually created both:
within a requesting application, and
within an operating system for that requesting application.

Port is a designation dually configured both:
as a field in the transport-layer headers, in actual packets, and
in the OS’s kernel networking core, and firewall configuration.
The OS’s kernel uses port numbers to deliver (demultiplex) each arriving packet to the right socket, and thus to the right application.

Step by step:
02-Application/app_layer29.png

+++++++++++++++++++++++++++++++++ Cahoot-02-14

1.14.2 Concurrency in Python

https://realpython.com/python-concurrency/

1.14.2.1 Global interpreter lock

https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
https://wiki.python.org/moin/GlobalInterpreterLock
https://realpython.com/python-gil/
Don’t like the GIL?
PyPy (a JIT-compiled Python) can speed up pure-Python code, though it also has a GIL:
https://www.pypy.org
https://realpython.com/pypy-faster-python/

You can use either:
multithreading
multiprocessing
asyncio

When should you use each?
multithreading to deal with simple blocking (lets I/O waits overlap, but no real speed-up for CPU-bound code).
multiprocessing to run over multiple cores (speed-up for CPU-bound code).
asyncio to deal with more complex or larger-scale needs (often many blocking I/O operations at once).
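For comparison with the threading approach, here is a minimal asyncio sketch of the uppercase server. It uses one thread and one event loop, plus a coroutine per client that yields whenever it waits on I/O. The port-0 binding, the line-based protocol, and the three demo clients are illustrative assumptions.

```python
import asyncio


async def handle_client(reader, writer):
    """One coroutine per client; awaiting I/O lets the event loop serve others meanwhile."""
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)  # port 0: OS picks
    host, port = server.sockets[0].getsockname()[:2]

    async def client(text):
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(text.encode() + b"\n")
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply.decode().strip()

    async with server:
        # Three clients served concurrently by a single thread.
        return await asyncio.gather(client("a"), client("b"), client("c"))


if __name__ == "__main__":
    print(asyncio.run(main()))   # ['A', 'B', 'C']
```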

1.14.2.2 Multithreading

See my code now:
* 02-Application/thread_00_none.py
* 02-Application/thread_01_unrolled.py
* 02-Application/thread_02_fake.py
* 02-Application/thread_03_storage.py

Show multi-threaded examples now:
* 02-Application/socket_04_TCP_server_mt.py
* 02-Application/socket_04_TCP_client_mt.py

Now, nc does not block the server from serving other clients:

python3 socket_04_TCP_server_mt.py &
nc -C 127.0.0.1 50002
python3 socket_04_TCP_client_mt.py
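The standard library can also do the thread-per-client bookkeeping for you: `socketserver.ThreadingTCPServer` starts a new thread for each accepted connection. This sketch is an alternative to, not a copy of, the course's socket_04 files. It keeps one idle connection open, like an `nc` session nobody types into, to show that a second client is still served. Port 0 and the line-based protocol are assumptions, and the names `UppercaseHandler` and `start_server` are hypothetical.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread for each client, so an idle client blocks no one else."""

    def handle(self):
        for line in self.rfile:                      # one request per line, until the client closes
            self.wfile.write(line.upper())


def start_server():
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UppercaseHandler)
    server.daemon_threads = True                     # handler threads won't block shutdown
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    address = server.server_address
    idle = socket.create_connection(address)          # connected, but never sends anything
    with socket.create_connection(address, timeout=2.0) as active:
        active.sendall(b"still served\n")
        print(active.recv(1024).decode().strip())     # STILL SERVED
    idle.close()
    server.shutdown()
    server.server_close()
```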

1.14.3 General socket functionality

https://docs.python.org/3/library/socket.html
https://realpython.com/python-sockets/

Let’s review some program-internal functions.

1.14.3.1 socket.socket.bind

>>> help(socket.socket.bind)
bind(...)
bind(address)
Bind the socket to a local address.
For IP sockets, the address is a pair (host, port);
the host must refer to the local host.
For raw packet sockets the address is a tuple
(ifname, proto [,pkttype [,hatype [,addr]]])

socket.socket.bind takes a tuple: (hostname or IP, port)
https://serverfault.com/questions/78048/whats-the-difference-between-ip-address-0-0-0-0-and-127-0-0-1
What are valid hostname or IP addresses to use?

The use of the term “local” above is ambiguous.
Q: What does it mean here, operationally?
A: That the IP address being bound is assigned to an interface managed by your operating system!
More to come on interfaces when we cover the network layer:
../Content/04-NetworkData.html

""
defaults to all traffic to the machine.
It is the same as 0.0.0.0 for IPv4.
For IPv6, the wildcard address is written "::".

0.0.0.0
which also listens to all traffic on the machine
(0.0.0.0 means various different things in different contexts).
https://www.rfc-editor.org/rfc/rfc1122#page-29 section 3.2.1.3
(a) { 0, 0 }
This host on this network.
MUST NOT be sent,
except as a source address as part of an initialization procedure,
by which the host learns its own IP address.
See also Section 3.3.6 for a non-standard use of {0,0}.
https://www.rfc-editor.org/rfc/rfc5735#section-3
0.0.0.0/8 - Addresses in this block refer to source hosts on “this” network.
Address 0.0.0.0/32 may be used as a source address for this host on this network;
other addresses within 0.0.0.0/8 may be used to refer to specified hosts on this network ([RFC1122], Section 3.2.1.3).
Despite the standard, 0.0.0.6 for example, won’t bind in python3.

<hostname>
https://docs.python.org/3/library/socket.html
If you use a hostname in the host portion of IPv4/v6 socket address,
the program may show a nondeterministic behavior,
as Python uses the first address returned from the DNS resolution.
The socket address will be resolved differently into an actual IPv4/v6 address,
depending on the results from DNS resolution and/or the host configuration.
For deterministic behavior use a numeric address in host portion.
On my Fedora machine, it resolves to 127.0.0.1.
Hostname is a shallow alias, implemented via checking /etc/hosts.

127.0.0.1 through 127.255.255.254 (CIDR notation: 127.0.0.0/8)
https://www.rfc-editor.org/rfc/rfc5735#section-3
127.0.0.0/8 - This block is assigned for use as the Internet host loopback address.
A datagram sent by a higher-level protocol to an address anywhere within this block loops back inside the host.
This is ordinarily implemented using only 127.0.0.1/32 for loopback.
As described in [RFC1122], Section 3.2.1.3,
addresses within the entire 127.0.0.0/8 block do not legitimately appear on any network anywhere.
These addresses reach your local machine only.
You can bind 127.0.0.4 (or any other address in the range),
but that socket will then be reachable only at that IP.
An unbound sending socket in Python uses 127.0.0.1 as its source IP
when sending to any loopback address (the OS, not Python, chooses this source address).
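A minimal sketch of both points, assuming Linux, where the whole 127.0.0.0/8 block is routed to the loopback interface (other systems, such as macOS, may configure only 127.0.0.1 and refuse the bind). The server binds 127.0.0.4; the client never calls bind(), so the OS picks its source address and port.

```python
import socket

# Server: bound to 127.0.0.4 only, so it is reachable only at that exact address.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.4", 0))    # port 0: let the OS pick a free port
server.settimeout(2.0)           # don't hang forever if something goes wrong
server_addr = server.getsockname()

# Client: not bound explicitly, so the OS chooses its source IP and port.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello", server_addr)

data, sender = server.recvfrom(1024)
print(server_addr, data, sender)  # on Linux the sender IP is typically 127.0.0.1

client.close()
server.close()
```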

A LAN-only IP address
10.0.0.0 - 10.255.255.255 (10.0.0.0/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168.0.0/16 prefix)
https://datatracker.ietf.org/doc/html/rfc1918
These IP ranges are reserved for private networks (LANs),
as opposed to public, globally routable IPs,
localhost IPs, etc.
If one of your interfaces is assigned an IP in this range,
then you can bind your socket to that IP.
If no interface in your OS has such an address,
then you cannot bind the socket to it in Python either.
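Checking whether an address falls in one of the three RFC 1918 blocks listed above is a simple prefix test. This sketch uses Python's standard ipaddress module; the helper is_rfc1918 and the sample addresses are illustrative choices, not part of any standard API.

```python
import ipaddress

# The three RFC 1918 private blocks listed above.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr: str) -> bool:
    """Hypothetical helper: True if addr lies inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)

for a in ["10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.0.10", "8.8.8.8"]:
    print(a, is_rfc1918(a))
```

Note that 172.32.0.1 is outside 172.16.0.0/12, which is easy to get wrong when reading the range by eye.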

A public, globally routable IP address
More or less anything not listed in the special-purpose registries below:
https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml
https://en.wikipedia.org/wiki/IPv4#Special-use_addresses
If one of your interfaces is assigned an IP in this range,
then you can bind your socket to that IP.
If no interface in your OS has such an address,
then you cannot bind the socket to it in Python either.
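The categories in this section can be told apart with properties of Python's standard ipaddress module. This is a rough sketch: the helper classify and its labels are hypothetical, and is_private in Python covers the IANA special-purpose entries generally, not only RFC 1918, so it is labeled "private/special-use" here. The order of checks matters because 0.0.0.0 and 127.0.0.1 also count as is_private.

```python
import ipaddress

def classify(addr: str) -> str:
    """Hypothetical helper: roughly bucket an IP address into the categories above."""
    ip = ipaddress.ip_address(addr)
    if ip.is_unspecified:
        return "wildcard (all interfaces)"
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private/special-use"
    if ip.is_global:
        return "public, globally routable"
    return "other special-use"

for a in ["0.0.0.0", "127.0.0.1", "192.168.1.1", "8.8.8.8"]:
    print(a, "->", classify(a))
```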

Which should you choose?

If you’re debugging locally,
then use 127.0.0.1.

If you are lazy,
then use "" or 0.0.0.0,
at the cost of exposing the socket on every interface.

If you want more security,
then consider binding the specific IP
of one interface on your machine.
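One way to follow this advice without editing code is to make the bind address a command-line option that defaults to loopback. This is a sketch, not a prescribed pattern: the --host and --port option names and the helper make_server are hypothetical, and parse_args([]) is used only so the demo runs without real command-line arguments.

```python
import argparse
import socket

def make_server(host: str, port: int) -> socket.socket:
    """Hypothetical helper: create a UDP socket bound to (host, port)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((host, port))
    return s

parser = argparse.ArgumentParser(description="toy UDP server")
# Default to loopback while debugging; pass --host "" (or 0.0.0.0) to listen on
# every interface, or one interface's IP to limit exposure.
parser.add_argument("--host", default="127.0.0.1")
parser.add_argument("--port", type=int, default=0)  # 0 lets the OS pick a free port

args = parser.parse_args([])  # demo only; a real script would call parse_args()
sock = make_server(args.host, args.port)
print("listening on", sock.getsockname())
sock.close()
```

Defaulting to the most restrictive choice means forgetting the flag fails safe: the server is simply unreachable from other machines.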

1.14.3.2 POSIX sockets

Below, we illustrate state diagrams for UDP and TCP sockets.
These are standard POSIX sockets,
also known as BSD or Berkeley sockets.
https://en.wikipedia.org/wiki/Berkeley_sockets
Many languages provide socket APIs modeled on the BSD sockets API of the C language.
Python's socket module also follows the API below.

Discussion question:
What is the value of having a POSIX standard?
What is the value of specifying the socket API itself as part of POSIX?
https://en.wikipedia.org/wiki/POSIX

1.14.3.2.1 UDP
02-Application/UDP-socket-programming.png
02-Application/UDP-socket1.png
02-Application/UDP-socket2.png
02-Application/UDP-socket3.png
02-Application/UDP-socket4.png
02-Application/UDP-sockets.jpg
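A minimal sketch of the UDP call sequence in the diagrams above, using Python's standard socket module. Running client and server in one script on 127.0.0.1, using port 0, and replying with the upper-cased message are assumptions made to keep the demo self-contained; this works single-threaded because the OS buffers the datagram until recvfrom() is called.

```python
import socket

# Server: socket() -> bind() -> recvfrom() -> sendto()
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))      # port 0: the OS picks a free port for this demo
server.settimeout(2.0)             # avoid hanging forever if a datagram is lost
server_addr = server.getsockname()

# Client: socket() -> sendto() -> recvfrom(); no connect(), listen(), or accept()
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"hello udp", server_addr)

# The server learns the client's address from each datagram and replies to it.
message, client_addr = server.recvfrom(2048)
server.sendto(message.upper(), client_addr)

reply, _ = client.recvfrom(2048)
print(reply)  # b'HELLO UDP'

client.close()
server.close()
```

Note that every sendto() names its destination explicitly: UDP keeps no connection, so the server needs the client_addr returned by recvfrom() in order to reply.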
1.14.3.2.2 TCP

The overview
02-Application/sockets-ethernet-interface.webp

The states:
02-Application/sockets-tcp-flow.webp

02-Application/socket_BSDflow.png
02-Application/socket_Berkeley_SOCKET.jpg
02-Application/socket_diag.png
02-Application/socket_review.gif
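A minimal sketch of the TCP call sequence in the diagrams above, using Python's standard socket and threading modules. Running the server in a thread of the same script, binding 127.0.0.1 with port 0, serving a single client, and upper-casing the reply are all assumptions made so the demo is self-contained. A single recv() is enough for this tiny message on loopback; real code would loop until it has the whole message.

```python
import socket
import threading

# Server: socket() -> bind() -> listen() -> accept() -> recv() -> sendall() -> close()
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))    # port 0: the OS picks a free port for this demo
listener.listen(1)                 # welcoming socket; backlog of 1 is enough here
server_addr = listener.getsockname()

def serve_one() -> None:
    conn, addr = listener.accept()  # blocks until a client connects; returns a NEW socket
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

t = threading.Thread(target=serve_one, daemon=True)
t.start()

# Client: socket() -> connect() -> sendall() -> recv() -> close()
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.settimeout(2.0)
    client.connect(server_addr)     # the OS performs TCP's handshake here
    client.sendall(b"hello tcp")
    reply = client.recv(1024)

t.join(timeout=2.0)
listener.close()
print(reply)  # b'HELLO TCP'
```

Unlike UDP, the server ends up with two sockets: the listening socket that only accepts, and the per-client socket returned by accept() that carries the data.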

Looking ahead to what we cover next:
TCP's actual internal FSM (finite state machine) is much more detailed than this!
The images above show only the high-level API.
We'll dive into the FSM next:

Next: 03-Transport.html