“Web users ultimately want to get at data quickly and
easily.
They don’t care as much about attractive sites and pretty
design.”
- Tim Berners-Lee
Is this assumption currently true?
Was it ever true?
Is it true for some people at least?
End system communication at application layer
Goal: write programs that:
* run on (different) end systems
* communicate over network
* e.g., web server software communicates with browser software
No need (or desire) to write software for network-core devices:
* network-core devices do not (and SHOULD not) run user
applications!
* applications on end systems allow for rapid app development,
propagation, innovation
* this is changing somewhat, which could impede the development of new
protocols
Client-server versus peer-to-peer (P2P)
++++++++++++++ Cahoot-02-1
Discussion question:
* If p2p works well, why has it not become the norm?
* What are some other pros/cons of p2p architectures?
This is a hint of the next layer, transport
client process: process that initiates communication
server process: process that waits to be contacted
aside: applications with P2P architectures have both client processes and server processes
Q: does the IP address of a host, on which a process runs, suffice
for identifying the process?
A: no, many processes can be running on one host
For example, to send HTTP message to gaia.cs.umass.edu web
server:
IP address: 128.119.245.12
port number: 80
Hosts are often identified by IP address
* An IP address is a 32-bit quantity uniquely identifying
the host in IPv4.
* Addresses are 128 bits in IPv6.
* We ran out of ipv4 addresses…
* The sending process must also identify the receiving process (more
specifically, the receiving socket) running in the host.
* This information is needed because in general a host could be running
many network applications.
* A destination port number serves this purpose.
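A minimal sketch of the point above, assuming only Python's standard `socket` module and the loopback interface: two sockets on the same host share one IP address but receive distinct port numbers, so the (IP, port) pair is what identifies each receiving socket. The function name `two_sockets_one_host` is made up for illustration.

```python
import socket


def two_sockets_one_host():
    """Bind two TCP sockets on the same host and return their addresses."""
    sockets = []
    addresses = []
    for _ in range(2):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Port 0 asks the OS to pick any free port on the loopback address.
        s.bind(("127.0.0.1", 0))
        sockets.append(s)
        addresses.append(s.getsockname())  # (ip, port) tuple
    # Both sockets were open at the same time, so their ports must differ.
    for s in sockets:
        s.close()
    return addresses


if __name__ == "__main__":
    first, second = two_sockets_one_host()
    print(first, second)  # same IP address, different port numbers
```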
++++++++++++++ Cahoot-02-2
data integrity
* some apps (e.g., file transfer, web transactions) require 100%
reliable data transfer
* other apps (e.g., audio) can tolerate some loss
timing
* some apps (e.g., Internet telephony, interactive games) require low
delay to be “effective”
* some apps (e.g., file download) tolerate delay
throughput
* some apps (e.g., multimedia) require minimum amount of throughput to
be “effective”
* other apps (“elastic apps”) make use of whatever throughput they
get
security
* encryption, data integrity, …
Unreliable datagram (UDP)
Byte-stream (TCP)
Service requirements for applications?
Applications usually choose between TCP and
UDP
* TCP and UDP are transport layer
* Employed by most application layer programs
* Other Transport layer, or pseudo-transport layer protocols
exist.
* SCTP (stream control transmission protocol), SSU (I2P app), DCCP,
RUDP, UDP-lite, etc.
* An application designer could design their own transport layer
protocol, since Transport layer and up runs on end hosts, as opposed to
network infrastructure.
* Could build into the core/kernel of end operating systems and
languages as new socket type
* Could also just design the features into the application layer, rather
than actually get a transport protocol built into the kernel of an
OS.
* UDP lets you build new things!
Connection-oriented service
* TCP has the client and server exchange transport-layer control
information with each other, before the application-level messages begin
to flow.
* This so-called handshaking procedure alerts the client and server,
allowing them to prepare for an onslaught of packets.
* After the handshaking phase, a TCP connection is said to exist between
the sockets of the two processes.
* The connection is a full-duplex connection, in that the two processes
can both send messages to each other over the connection at the same
time, bi-directionally
* When the application finishes sending messages, it must tear down the
connection.
TCP has a Reliable data transfer service
* The communicating processes can rely on TCP to deliver all data sent
without error and in the proper order.
TCP also includes a congestion-control
mechanism
* The TCP congestion-control mechanism throttles a sending process
(client or server) when the network is congested between sender and
receiver.
Summary
* reliable transport: between sending and receiving
process
* flow control: sender won’t overwhelm receiver
* congestion control: throttle sender when network
overloaded
* does not provide: timing, minimum throughput
guarantee, security
* connection-oriented: setup required between client
and server processes
UDP service:
* unreliable data transfer between sending and receiving process
* does not provide: reliability, flow control, congestion control,
timing, throughput guarantee, security, or connection setup
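As a rough illustration of "no connection setup", the sketch below sends one UDP datagram over the loopback interface using Python's standard `socket` module. The helper name `udp_send_once` is hypothetical. On loopback the datagram will almost always arrive, but UDP itself makes no such promise, which is why the receiver uses a timeout.

```python
import socket


def udp_send_once(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # OS picks a free port
    receiver.settimeout(2.0)          # UDP may drop data; do not wait forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake: the very first packet already carries application data.
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_send_once(b"hello over UDP"))
```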
Discussion questions:
* Why bother with UDP?
* With TCP, why is there a UDP?
++++++++++++++ Cahoot-02-3
https://en.wikipedia.org/wiki/Transport_Layer_Security
Neither base TCP nor UDP provides any encryption!
TLS, an enhancement layered on top of TCP, provides:
1. encryption,
2. data integrity, and
3. end-point authentication.
The great thing about a TLS joke,
is that you can tell if it’s not the original…
Transport layer protocols used
Tunneling:
Inner-most -> Outer-more… -> Outer-most
Application -> TLS -> TCP -> IP -> MAC -> Ethernet ->
Physical
More detail here:
* 05-Security.html
* ../../Security/Content/12a-AppliedCryptoSystems.html
An application-layer protocol defines how an
application’s processes, running on different end systems, pass messages
to each other, for example:
* The types of messages exchanged, for example, request
messages and response messages
* The syntax of the various message types, such as the
fields in the message and how the fields are delineated
* The semantics of the fields
* meaning of the information in the fields
* Rules for determining when and how a process sends
messages and responds to messages, and change state.
open protocols:
defined in RFCs
allows for interoperability
e.g., HTTP, SMTP
proprietary protocols:
e.g., Skype (used to be open, fun story)
Observe HTTP with Wireshark:
Type the request lines after the connection opens, and end with an empty line:
$ nc -C info.cern.ch 80
GET / HTTP/1.1
Host: info.cern.ch
$ ncat -C hackware.ru 80
GET / HTTP/1.0
Host: hackware.ru
Trace HTTP conversation in Wireshark
Observe each packet has headers from multiple layers
These must be typed exactly, or they will not work!
The ncat -C option sends CRLF line endings; see $ man ncat to read more.
Encrypted option:
$ ncat -C --ssl hackware.ru 443
GET / HTTP/1.0
Host: hackware.ru
Web pages
* A Web page (also called a document) consists of objects.
* An object is simply a file such as an HTML file, a JPEG image, a Java
applet (lol…), or a video clip that is addressable by a single
URL.
* If a Web page contains HTML text and five JPEG images, then the Web
page has six objects: the base HTML file plus the five images.
* The base HTML file references the other objects in the page with the
objects’ URLs.
* Each URL has two components:
* the hostname of the server that houses the object
and
* the object’s path name.
* For example, the URL
http://www.someSchool.edu/someDepartment/picture.gif
has www.someSchool.edu
for a hostname
and
/someDepartment/picture.gif
for a path
name.
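The two URL components can be pulled apart with Python's standard `urllib.parse` module. This is only a sketch; the wrapper name `split_url` is an assumption made for the example.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (hostname, path) for a URL, as described in the notes."""
    parts = urlsplit(url)
    # netloc is the host portion (plus a port, if one is written in the URL).
    return parts.netloc, parts.path


host, path = split_url("http://www.someSchool.edu/someDepartment/picture.gif")
print(host)  # www.someSchool.edu
print(path)  # /someDepartment/picture.gif
```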
http://www.w3.org/MarkUp/ defines the standard
HTTP/1 and HTTP/2 use TCP (not UDP)
* The HTTP client first initiates a TCP connection with the
server.
* Once the connection is established, the browser and the server
processes access TCP through their socket interfaces.
* The server sends requested files to clients without storing any state
information about the client, so HTTP is called a stateless protocol.
HTTP can choose either:
* Each request/response pair sent over a separate TCP connection
(non-persistent connections), or
* All of the requests and their corresponding responses sent over the
same TCP connection (persistent connections)
HTTP sequence
A base HTML file and 10 JPEG images, and that all 11 of these objects
reside on the same server:
http://www.someSchool.edu/someDepartment/home.index
1. HTTP client process initiates a TCP connection to the server
www.someSchool.edu on port number 80, which is the default port
number for HTTP. Associated with the TCP connection, there will be a
socket at the client and a socket at the server.
2. HTTP client sends an HTTP request message to the server via its
socket. The request message includes the path name
/someDepartment/home.index.
3. HTTP server process receives the request message via its socket,
retrieves the object /someDepartment/home.index from its storage (RAM or
disk), encapsulates the object in an HTTP response message, and sends
the response message to the client via its socket.
4. HTTP server process tells TCP to close the TCP connection. (But TCP
doesn’t actually terminate the connection until it knows for sure that
the client has received the response message intact.)
5. HTTP client receives the response message. The TCP connection
terminates. The message indicates that the encapsulated object is an
HTML file. The client extracts the file from the response message,
examines the HTML file, and finds references to the 10 JPEG
objects.
6. The first four steps are then repeated for each of the referenced JPEG
objects.
Time to fill a request
non-persistent HTTP
* at most one object sent over TCP connection
* connection then closed
* downloading multiple objects requires multiple connections
persistent HTTP
* multiple objects can be sent over single TCP connection between
client, server
Disadvantages of non-persistent connections
* First, a brand-new connection must be established and maintained for
each requested object.
* For each of these connections, TCP buffers must be allocated and TCP
variables must be kept in both the client and server.
* Each object suffers a delivery delay of two RTTs: one RTT to establish
the TCP connection and one RTT to request and receive an object.
Persistent connections
* With persistent connections, the server leaves the TCP connection open
after sending a response.
* Subsequent requests and responses between the same client and server
can be sent over the same connection.
* Multiple Web pages residing on the same server can be sent from the
server to the same client over a single persistent TCP connection.
* Requests for objects can be made back-to-back, without waiting for
replies to pending requests (pipelining).
* Typically, the HTTP server closes a connection when it isn’t used for
a certain time (a configurable timeout interval).
persistent HTTP:
* server leaves connection open after sending response
* subsequent HTTP messages between same client/server sent over open
connection
* client sends requests as soon as it encounters a referenced
object
* as little as one RTT for all the referenced objects
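A back-of-the-envelope comparison of the two modes, using the base file plus 10 images from the HTTP sequence example. This sketch counts round trips only: it ignores file transmission time and assumes the non-persistent client fetches objects one after another rather than over parallel connections. The function names are made up.

```python
def non_persistent_rtts(num_objects):
    """Each object costs 1 RTT for TCP setup + 1 RTT for request/response."""
    return 2 * num_objects


def persistent_pipelined_rtts(num_referenced):
    """1 RTT for TCP setup + 1 RTT for the base file, then as little as
    1 more RTT for all referenced objects requested back-to-back."""
    return 2 + (1 if num_referenced > 0 else 0)


# Base HTML file plus 10 JPEG images, all on the same server.
print(non_persistent_rtts(11))        # 22
print(persistent_pipelined_rtts(10))  # 3
```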
++++++++++++++ Cahoot-02-4
two types of HTTP messages:
1. request,
2. response
HTTP request message:
ASCII (human-readable format)
GET /somedir/page.html HTTP/1.1
Host: www.mst.edu
Connection: close
User-agent: Mozilla/5.0
Accept-language: en
General request
sp=space; cr=carriage return; lf=line feed
Example:
GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n
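The request format above can be assembled by hand. The following is a minimal sketch (the helper `build_request` is hypothetical, not part of any library) that joins a request line and header lines with CRLF and ends the header section with an empty line.

```python
def build_request(method, path, headers):
    """Build the bytes of an HTTP/1.1 request with no entity body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra CRLF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/index.html", {"Host": "www-net.cs.umass.edu"})
print(request)
# b'GET /index.html HTTP/1.1\r\nHost: www-net.cs.umass.edu\r\n\r\n'
```

These are the same bytes that nc -C sends when the lines are typed by hand.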
HTTP/1.1 200 OK
Connection: close
Date: Tue, 09 Aug 2011 15:44:04 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Tue, 09 Aug 2011 15:11:03 GMT
Content-Length: 6821
Content-Type: text/html
(data data data data data ...
The entity body is the
meat of the message,
it contains the requested object itself)
General reply
sp=space; cr=carriage return; lf=line feed
Example:
HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data data data data data ...
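Going the other way, a response can be split into status line, header lines, and entity body. This is a simplified sketch under the assumption that the whole response is already in memory; the name `parse_response` is made up, and real clients must also handle chunked bodies and partial reads.

```python
def parse_response(raw: bytes):
    """Split an HTTP response into (version, code, reason, headers, body)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 4\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"data")
print(parse_response(raw))
```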
Server responses
Status code appears in 1st line in server-to-client response
message.
The best thing about 404 jokes is …
wait, damnit, it’s around here somewhere…
418: I’m a teapot
* A “real” joke built into the protocol
* https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418
*
https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol
* https://save418.com/
Old example: Open TCP connection, send GET request
telnet cis.poly.edu 80
GET /~ross/ HTTP/1.1
Host: cis.poly.edu
E.g.,
Note: ncat has generally replaced telnet, though they both still work
nc -C info.cern.ch 80
GET / HTTP/1.1
Host: info.cern.ch
POST method:
* web page often includes form input
* input is uploaded to server in entity body (i.e., message part of
packet)
URL method:
* uses GET method
* input is uploaded in URL field of request line:
www.somesite.com/animalsearch?monkeys&banana
We’ll demo this later!
Demonstrate:
* More wireshark HTTP examples in detail
* http://info.cern.ch/
* via nc
* via a browser that does not generate junk traffic:
* epiphany, surf, or qutebrowser
* Record it in Wireshark
* Identify HTTP headers, match them to fields
Many Web sites use cookies.
Four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site
Example:
* Susan always accesses the Internet from her PC
* visits a specific e-commerce site for the first time
* when the initial HTTP request arrives at the site, the site creates:
* unique ID
* entry in backend database for ID
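The header lines in components 1 and 2 can be sketched with Python's standard `http.cookies` module. The ID value 1678 and both helper names are assumptions made for illustration; a real site would also keep the ID in its back-end database.

```python
from http.cookies import SimpleCookie


def make_set_cookie(user_id):
    """Server side: header line for the first HTTP response."""
    cookie = SimpleCookie()
    cookie["id"] = user_id
    return cookie.output(header="Set-Cookie:")


def read_cookie(cookie_header_value):
    """Server side: recover the ID from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    return cookie["id"].value


print(make_set_cookie("1678"))  # Set-Cookie: id=1678
print(read_cookie("id=1678"))   # 1678
```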
what cookies can be used for:
* authorization
* shopping carts
* recommendations
* user session state (Web e-mail)
* tracking
how to keep “state”:
* protocol endpoints: maintain state at sender/receiver over multiple
transactions
* cookies: http messages carry state
cookies and privacy:
* cookies permit sites to learn a lot about you
* you may supply name and e-mail to sites
Proxy server can cache
goal: satisfy client request without involving origin server
Bottleneck
Caching helps bottleneck
Demos
* Briefly show (not link) $ Webserver.py
* Run it, show visiting in browser
http://localhost:6789
Some interesting reading:
* QUIC (Google’s web protocol on top of UDP)
* http://intronetworks.cs.luc.edu/current2/uhtml/udp.html#quic
* https://en.wikipedia.org/wiki/QUIC
* https://daniel.haxx.se/blog/2018/11/11/http-3/
* https://blog.cloudflare.com/the-road-to-quic/
* Both innovative, and breaks federated interoperability. Pros and
Cons
I received this HTTP 200 joke.
It was OK…
++++++++++++++ Cahoot-02-5
Simple file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file1.html
GET, OK, etc.
Refreshing a cached page
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file2.html
IF-MODIFIED-SINCE, refresh, re-sent?
Large file
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file3.html
Initial HTTP GET, TCP segments, how many HTTP OK, when?
Multiple-parts
http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file4.html
Notice the image retrieval.
What entity is responsible for requesting the multiple objects in a
page, when?
“Secure” web-page with login
http://gaia.cs.umass.edu/wireshark-labs/protected_pages/HTTP-wireshark-file5.html
username: wireshark-students
password: network
auth field of request
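#!/usr/bin/python3
import base64

# Base64 is an encoding, not encryption: anyone who captures it can decode it.
coded_string = "d2lyZXNoYXJrLXN0dWRlbnRz"
print(base64.b64decode(coded_string).decode())  # wireshark-students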
Let’s think about http, privacy, and security in various
scenarios:
https://www.eff.org/pages/tor-and-https
Before continuing, review TCP, UDP, and multi-threading below!
https://en.wikipedia.org/wiki/File_Transfer_Protocol
https://tools.ietf.org/html/rfc2428 (FTP)
File Transfer Protocol
transfer file to/from remote host
client/server model
ftp: RFC 959
ftp server: port 21
FTP control and data connections
* FTP uses two parallel TCP connections to transfer a file, a control
connection and a data connection.
* The control connection is used for sending control
information between the two hosts, information such as user
identification, password, commands to change remote directory, and
commands to “put” and “get” files.
* The data connection is used to actually send a
file.
* FTP is said to send its control information
out-of-band.
* HTTP sends request and response header lines into the same TCP
connection that carries the transferred file itself, named
in-band.
FTP sequence
* When a user starts an FTP session with a remote host, the client side
of FTP (user) first initiates a control TCP connection with the server
side (remote host) on server port number 21.
* client side of FTP sends the user identification and password over
this control connection.
* client side of FTP also sends, over the control connection, commands
to change the remote directory.
* When the server side receives a command for a file transfer over the
control connection (either to, or from, the remote host), the server
side initiates a TCP data connection to the client side.
* FTP sends exactly one file over the data connection and then closes
the data connection.
* If, during the same session, the user wants to transfer another file,
FTP opens another data connection.
* Control connection remains open throughout the duration of the user
session, but a new data connection is created for each file transferred
within a session (data connections are non-persistent).
FTP requests
* Commands, from client to server, and replies, from server to client,
are sent across the control connection in 7-bit ASCII format.
* In order to delineate successive commands, a carriage
return and line feed end each command.
* Each command consists of four uppercase ASCII characters, some with
optional arguments:
* USER username: Used to send the user identification
to the server.
* PASS password: Used to send the user password to the
server.
* LIST: Used to ask the server to send back a list of
all the files in the current remote directory. The list of files is sent
over a (new and non-persistent) data connection rather than the control
TCP connection.
* RETR filename: Used to retrieve (that is, get) a file
from the current directory of the remote host. This command causes the
remote host to initiate a data connection and to send the requested file
over the data connection.
* STOR filename: Used to store (that is, put) a file
into the current directory of the remote host.
FTP replies
Some typical replies, along with their possible
messages, are as follows:
* 331 Username OK, password required
* 125 Data connection already open; transfer starting
* 425 Can’t open data connection
* 452 Error writing file
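The command and reply formats on the control connection are simple enough to sketch directly. Both helper names below are hypothetical, and the reply parser assumes single-line replies only (multi-line replies are not handled).

```python
def build_command(verb, argument=None):
    """Format an FTP command line: ASCII text ended by CR LF."""
    line = verb if argument is None else f"{verb} {argument}"
    return (line + "\r\n").encode("ascii")


def parse_reply(line: bytes):
    """Split a single-line reply into (three-digit code, message)."""
    text = line.decode("ascii").rstrip("\r\n")
    code, _, message = text.partition(" ")
    return int(code), message


print(build_command("USER", "bob"))  # b'USER bob\r\n'
print(parse_reply(b"331 Username OK, password required\r\n"))
# (331, 'Username OK, password required')
```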
Demonstrate:
* [ ] Find an open ftp site, watch connection with wireshark
* With sftp, do we see any application layer protocol details with
Wireshark?
Observe SMTP with wireshark (does any of this show in wireshark)
ncat -C smtp.zoho.com 587
Does any of the application layer information show in wireshark
here?
ncat --ssl -C smtp.zoho.com 465
HELO web.site, MAIL FROM, RCPT TO, DATA, QUIT
Observe POP
ncat --ssl -C pop.zoho.com 995
user bob
pass password
list
Notes:
* user-agent is local, but also remote.
* user-agent used to be on remote machine
* then, real mail user-agents
* then user-agent back on remote machine
* Mail server is a messy multi-part aggregate of things in software
How many people still use a real, local, MUA?
Electronic Mail: SMTP
[RFC 5321, which obsoletes RFC 2821]
Alice sends a message to Bob
Basic process
1. Alice invokes her user agent for e-mail, provides Bob’s e-mail
address (for example, bob@someschool.edu), composes a message, and
instructs the user agent to send the message.
2. Alice’s user agent sends the message to her mail server, where it is
placed in a message queue.
3. The client side of SMTP, running on Alice’s mail server, sees the
message in the message queue. It opens a TCP connection to an SMTP
server, running on Bob’s mail server.
4. After some initial SMTP handshaking, the SMTP client sends Alice’s
message into the TCP connection.
5. At Bob’s mail server, the server side of SMTP receives the message.
Bob’s mail server then places the message in Bob’s mailbox.
6. Bob invokes his user agent to read the message at his
convenience.
Example SMTP transcript
Hostname of the client is crepes.fr
Hostname of the server is server.edu
S: 220 server.edu
C: EHLO crepes.fr // a nicer HELO
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection
Another SMTP example
base64 encoding is required for username and password:
https://en.wikipedia.org/wiki/Base64
c: AUTH LOGIN
s: 334 VXNlcm5hbWU6
c: yourusernameinb64encoding
s: 334 UGFzc3dvcmQ6
c: yourpasswordinb64encoding
To get base64 encoding of a string:
# encode in bash
$ echo -n 'string' | base64
# decode in bash
$ echo -n 'c3RyaW5n' | base64 -d
# In python:
>>> import base64
>>> base64.b64encode('string'.encode())
>>> base64.b64decode('c3RyaW5n')
then, you can proceed sending:
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@server.edu>
S: 250 bob@server.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 server.edu closing connection
SMTP uses persistent connections
SMTP requires message (header and body) to be in 7-bit ASCII
SMTP server uses CRLF.CRLF to determine end of message
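A small sketch of how the CRLF.CRLF marker frames the message after DATA. The helper names are made up, and the sketch deliberately skips one detail of real SMTP: a body line that itself starts with "." has to be escaped by the client, which is not handled here.

```python
END_OF_DATA = b"\r\n.\r\n"  # CRLF . CRLF


def frame_message(lines):
    """Client side: join the message lines and append the end marker."""
    return "\r\n".join(lines).encode("ascii") + END_OF_DATA


def split_message(stream: bytes):
    """Server side: return (message lines, bytes that follow the marker)."""
    body, marker, rest = stream.partition(END_OF_DATA)
    if not marker:
        raise ValueError("end-of-data marker not received yet")
    return body.decode("ascii").split("\r\n"), rest


wire = frame_message(["Do you like ketchup?", "How about pickles?"])
print(wire)
# b'Do you like ketchup?\r\nHow about pickles?\r\n.\r\n'
print(split_message(wire + b"QUIT\r\n"))
```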
comparison with HTTP:
* HTTP: pull
* SMTP: push
* both have ASCII command/response interaction, status codes
* HTTP: each object encapsulated in its own response message
* SMTP: multiple objects sent in multipart message
Message header
* Header containing peripheral information that precedes the body of the
message itself.
* The header lines and the body of the message are separated by a blank
line (CRLF).
* RFC 5322 specifies the exact format for mail header lines as well as
their semantic interpretations.
* As with HTTP, each header line contains readable text, consisting of a
keyword followed by a colon followed by a value.
* Some of the keywords are required and others are optional.
* Every header must have a From: header line and a To: header line; a
header may include a Subject: header line as well as other optional
header lines.
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.
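The header/body split can be checked with Python's standard `email` package. This sketch reuses the header lines above; the body text is borrowed from the earlier SMTP transcript purely as filler.

```python
from email import message_from_string

RAW_MESSAGE = (
    "From: alice@crepes.fr\r\n"
    "To: bob@hamburger.edu\r\n"
    "Subject: Searching for the meaning of life.\r\n"
    "\r\n"                      # blank line: end of header, start of body
    "Do you like ketchup?\r\n"
)

message = message_from_string(RAW_MESSAGE)
print(message["From"])     # alice@crepes.fr
print(message["To"])       # bob@hamburger.edu
print(message["Subject"])  # Searching for the meaning of life.
print(message.get_payload().strip())
```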
Show: Open an email in Mutt/raw to illustrate headers, MIME, multipart
Email protocols and direction of communication
When sent an email by Alice, how does a recipient like Bob, running a
user agent on his local PC, obtain his messages, which are sitting in a
mail server within Bob’s mail provider?
* Post Office Protocol—Version 3 (POP3)
* Internet Message Access Protocol (IMAP)
* HTTP
C: client
S: server
ncat mailServer 110
S: +OK POP3 server ready
C: user bob
S: +OK
C: pass hungry
S: +OK user successfully logged on
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off
Another POP3 example
++++++++++++++ Cahoot-02-6
Discussion question:
At first guess, would you think the internet has a kill-switch, like it
might in a Hollywood movie?
If it did, what might the consequences be?
On businesses?
On people?
In the USA?
In China?
In Russia?
In Kazakhstan?
etc.
“The Domain Name Server (DNS) is the Achilles heel of the
Web.
The important thing is that it’s managed
responsibly.”
-Tim Berners-Lee
People: many identifiers:
* SSN, name, passport #, etc.,
Internet hosts, routers:
* IP address (32 bit for IPv4) - used for addressing datagrams
* Easy to remember “name”, e.g., www.yahoo.com - used by
humans
The big questions:
* How to securely and fairly map between IP address and
name, and vice versa ?
* What happens if you want a name someone else has?
* What happens if someone wants the name you have?
* Is the entity with the name really the remote entity?
* Are security and fairness opposed?
* Are both opposed to policing the space, to censorship?
(double-meaning intended)
Web browser example:
1. User machine runs the client side of the DNS application.
2. A web browser extracts the hostname, www.someschool.edu, from
a URL entered by the user, and passes the hostname to the client side of
the DNS application.
3. The local DNS client sends a query containing the hostname to a
somewhat-remote DNS server (or a chain of such servers).
4. The DNS client eventually receives a reply, which includes the IP
address for the hostname.
5. Once the web browser receives the IP address from DNS, it can
initiate a TCP connection to the HTTP server process located at port 80
at that IP address.
Domain Name System:
* is a distributed database implemented in a hierarchy of many name
servers
* application-layer protocol:
* hosts, name servers communicate to resolve names (address/name
translation)
* clients request name/address bindings
* This is a core Internet function.
* an example of keeping the complexity at network’s “edge”, where it can
be maintained.
Discussion Question:
* What are several reasons an entity might want to steal a network
name?
* Would you guess that all such purposes are bad?
DNS is just a pyramid scheme…
client wants IP for www.amazon.com; 1st approximation:
* client queries public root server (directly or indirectly) to find
.com DNS server
* client queries .com DNS server to get amazon.com’s public-facing DNS
server
* client queries amazon.com’s public-facing DNS server to get IP address
for www.amazon.com
https://en.wikipedia.org/wiki/Root_name_server
https://en.wikipedia.org/wiki/DNS_root_zone
* root servers are contacted by local name server that can not resolve
name
* root contacts authoritative name server if name mapping not
known
* gets mapping
* returns mapping to local name server
https://en.wikipedia.org/wiki/Top-level_domain
* TLD servers are responsible for com, org, net, edu, aero, jobs, museum, and all
top-level country domains, e.g.: uk, fr, ca, jp
* Verisign maintains servers for the .com TLD
* Educause for the .edu TLD
If https://mst.eu is available…
What fun things could we do with that…?
Ask: How can one “be” an EU resident on the internet?
Show: some Wireshark observations of nslookup for various types of record (overview this time, more detail again lower).
Visit https://mst.edu with web browser
Make a manual query using command line tools
#!/bin/bash
nslookup mst.edu
nslookup www.mst.edu
dig mst.edu
dig www.mst.edu
whois mst.edu
whois icann.org
# What are the authoritative servers?
nslookup -type=NS mst.edu
# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu
https://en.wikipedia.org/wiki/WHOIS
WHOIS going to tell us a Domain Name joke?
Standard iterated query
Some host at cis.poly.edu
wants IP address for
gaia.cs.umass.edu
Iterated query:
Contacted server replies with name of server to contact.
“I don’t know this name,
but ask this other server who is responsible for knowing,
or is responsible for asking some server that is.”
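The referral chain can be imitated with a toy, in-memory model. Everything in the `SERVERS` table is invented for illustration (the server labels are not real host names); only the gaia.cs.umass.edu address is taken from the earlier HTTP example. Real resolvers exchange DNS messages over the network instead of reading a dictionary.

```python
# Toy name-server tables: each server either knows the answer ("records")
# or knows which server to ask next ("referrals").
SERVERS = {
    "root": {"referrals": {"edu": "tld-edu"}},
    "tld-edu": {"referrals": {"umass.edu": "auth-umass"}},
    "auth-umass": {"records": {"gaia.cs.umass.edu": "128.119.245.12"}},
}


def resolve_iteratively(name, start="root"):
    """Follow referrals from server to server; return (address, servers asked)."""
    server = start
    asked = [server]
    while True:
        table = SERVERS[server]
        if name in table.get("records", {}):
            return table["records"][name], asked
        for suffix, next_server in table.get("referrals", {}).items():
            if name == suffix or name.endswith("." + suffix):
                # "I don't know this name, but ask this other server."
                server = next_server
                asked.append(server)
                break
        else:
            raise LookupError(f"{server} has no referral for {name}")


print(resolve_iteratively("gaia.cs.umass.edu"))
# ('128.119.245.12', ['root', 'tld-edu', 'auth-umass'])
```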
Recursive queries
Recursive query:
Puts burden of name resolution on contacted name server.
Heavy load at upper levels of hierarchy?
Q:
Can one’s own machine just do the query to root, TLD, and
authoritative?
Why bother with the institutional resolver?
A:
Yes, if you set up your own DNS server (easy).
Just install bind and configure it.
It’s just extra functionality not built into every client.
++++++++++++++ Cahoot-02-7
DNS caching, updating records
* once (any) name server / resolver learns mapping, it caches the
mapping for a while
* cache entries timeout (disappear) after some Time To Live (TTL)
* TLD server information is typically cached in local name servers
* thus root name servers visited less
* cached entries may be out-of-date (best effort name-to-address
translation!)
* if name host changes IP address, may not be known Internet-wide until
all TTLs expire
* update/notify mechanisms proposed IETF standard
* RFC 2136
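A minimal sketch of TTL-based caching, assuming a simple dictionary keyed by name. The class name `DnsCache` and the example address are made up; the clock is injectable so that expiry can be demonstrated without actually waiting.

```python
import time


class DnsCache:
    """Cache name-to-value mappings until their TTL (in seconds) runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry timed out: forget it so the next lookup asks a server.
            del self._entries[name]
            return None
        return value


fake_now = [0.0]
cache = DnsCache(clock=lambda: fake_now[0])
cache.put("mst.edu", "192.0.2.1", ttl=60)  # address is a documentation example
print(cache.get("mst.edu"))  # 192.0.2.1
fake_now[0] = 61.0
print(cache.get("mst.edu"))  # None
```

Until the TTL expires the cache keeps answering with the stored value, even if the real record has changed in the meantime.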
DNS is at the root of many internet problems…
https://en.wikipedia.org/wiki/Domain_Name_System#DNS_message_format
Query and reply messages, both with same overall message format
Message header
* identification:
* 16 bit number for query,
* reply to query uses same number as query
* flags:
* query or reply
* recursion desired
* recursion available
* reply is authoritative
The header of DNS messages is composed of 12 bytes and its structure is shown in the figure below.
The QR flag is set to 0 in DNS queries and 1 in DNS answers.
The Opcode is used to specify the type of
query.
For instance, a standard query is when a client sends a name, and the
server returns the corresponding data.
An update request is when the client sends a name, and new data, and the
server then updates its database.
The AA bit is set, when the server that sent the
response has authority for the domain name found in the question
section.
In the original DNS deployments, two types of servers were considered :
authoritative servers and non-authoritative servers.
The authoritative servers are managed by the system administrators
responsible for a given domain.
They always store the most recent information about a domain.
Non-authoritative servers are servers or resolvers that store DNS
information about external domains without being managed by the owners
of a domain.
They may thus provide answers that are out of date.
From a security point of view, the authoritative bit is not an absolute
indication about the validity of an answer.
Ask: Is this secure?
It uses UDP; what does this imply?
Where TC is set, the partial RRSet that would not
completely fit may be left in the response.
When a DNS client receives a reply with TC set, it should ignore that
response, and query again, using a mechanism, such as a TCP connection,
that will permit larger replies.
The RD (recursion desired) bit is set by a client
when it sends a query to a resolver.
Such a query is said to be recursive because the resolver will recurse
through the DNS hierarchy to retrieve the answer on behalf of the
client.
In the past, all resolvers were configured to perform recursive queries
on behalf of any Internet host.
However, this exposes the resolvers to several security risks.
The simplest one is that the resolver could become overloaded by having
too many recursive queries to process.
As of this writing, most resolvers only allow recursive queries from
clients belonging to their company or network and discard all other
recursive queries.
The RA bit indicates whether the server supports recursion.
The RCODE is used to distinguish between different types of errors. See RFC 1035 for additional details.
The last four fields indicate the size of the Question, Answer, Authority and Additional sections of the DNS message.
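The 12-byte header can be packed and unpacked with Python's `struct` module. This is a sketch based on the field layout in RFC 1035 (ID, a 16-bit flags word, then four 16-bit counts); the function names and the query ID 0x1234 are arbitrary choices.

```python
import struct

HEADER_FORMAT = "!HHHHHH"  # network byte order, six 16-bit fields = 12 bytes


def build_query_header(query_id, recursion_desired=True, question_count=1):
    """Header for a standard query: QR=0, Opcode=0, optional RD bit."""
    flags = 0x0100 if recursion_desired else 0  # RD is bit 8 of the flags word
    return struct.pack(HEADER_FORMAT, query_id, flags, question_count, 0, 0, 0)


def parse_header(data):
    """Decode the first 12 bytes of a DNS message into named fields."""
    ident, flags, qd, an, ns, ar = struct.unpack(HEADER_FORMAT, data[:12])
    return {
        "id": ident,
        "qr": (flags >> 15) & 1,       # 0 = query, 1 = answer
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,       # authoritative answer
        "tc": (flags >> 9) & 1,        # truncated
        "rd": (flags >> 8) & 1,        # recursion desired
        "ra": (flags >> 7) & 1,        # recursion available
        "rcode": flags & 0xF,
        "counts": (qd, an, ns, ar),    # Question, Answer, Authority, Additional
    }


header = build_query_header(0x1234)
print(len(header))  # 12
print(parse_header(header))
```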
The last four sections of the DNS message contain Resource Records (RR).
All RRs have the same top level format shown in the figure below.
DNS: distributed database storing resource records (RR)
RR format: (name, value, type, ttl)
NAME
TYPE
CLASS
TTL
RDLENGTH
RDATA
Name indicates the name of the node to which this resource record pertains.
The two bytes Type field indicates the type of resource record.
The Class field was used to support the utilization of the DNS in other environments than the Internet.
The TTL field indicates the lifetime of the Resource
Record in seconds.
This field is set by the server that returns an answer and indicates for
how long a client or a resolver can store the Resource Record inside its
cache.
A long TTL indicates a stable RR.
Some companies use short TTL values for mobile hosts and also for
popular servers.
For example, a web hosting company that wants to spread the load over a
pool of a hundred servers can configure its nameservers to return
different answers to different clients.
If each answer has a small TTL, the clients will be forced to send DNS
queries regularly.
The nameserver will reply to these queries by supplying the address of
the least loaded server.
The RDLength field is the length of the RData field that contains the information of the type specified in the Type field.
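To make the NAME, TYPE, CLASS, TTL, RDLENGTH, RDATA layout concrete, here is a minimal sketch that encodes and decodes a single A record in wire format. It assumes uncompressed names (real DNS messages often use name compression, which this ignores), and the helper names are hypothetical.

```python
import socket
import struct

TYPE_A = 1    # IPv4 address record
CLASS_IN = 1  # the Internet class


def encode_name(name):
    """Encode 'www.example.com' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def encode_a_record(name, ipv4, ttl):
    """Build NAME, TYPE, CLASS, TTL, RDLENGTH, RDATA for an A record."""
    rdata = socket.inet_aton(ipv4)  # the 4-byte IPv4 address
    fixed = struct.pack("!HHIH", TYPE_A, CLASS_IN, ttl, len(rdata))
    return encode_name(name) + fixed + rdata


def decode_a_record(data):
    """Inverse of encode_a_record (uncompressed names only)."""
    labels, pos = [], 0
    while data[pos] != 0:
        length = data[pos]
        labels.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1  # skip the terminating zero byte of NAME
    rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", data[pos:pos + 10])
    rdata = data[pos + 10:pos + 10 + rdlength]
    return ".".join(labels), rtype, rclass, ttl, socket.inet_ntoa(rdata)
```

For example, `encode_a_record("www.example.com", "192.0.2.11", 300)` yields a record whose TTL tells a cache it may keep the answer for five minutes.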
Several types of DNS RR are used in practice.
* The A type is used to encode the IPv4 address that
corresponds to the specified name.
* The AAAA type is used to encode the IPv6 address that
corresponds to the specified name.
* A NS record contains the name of the DNS server that
is responsible for a given domain.
type=A
* name is hostname
* value is IP address
type=NS
* name is domain (e.g., foo.com)
* value is hostname of authoritative name server for this domain
type=CNAME
* name is alias name for some “canonical” (the real) name
* www.ibm.com is really servereast.backup2.ibm.com
* value is canonical name
* CNAME (or canonical names) are used to define aliases.
* For example, www.example.com could be a CNAME for
pc12.example.com, which is the actual name of the server on which the web
server for www.example.com runs.
type=MX
* value is name of mailserver associated with name
There are more record types (summary of commonly
used):
https://en.wikipedia.org/wiki/List_of_DNS_record_types
* Address Mapping record (A Record) also known as a DNS
host record, stores a hostname and its corresponding IPv4 address.
* IP Version 6 Address record (AAAA Record) stores a
hostname and its corresponding IPv6 address.
* Canonical Name record (CNAME Record) can be used to
alias a hostname to another hostname. When a DNS client requests a
record that contains a CNAME, which points to another hostname, the DNS
resolution process is repeated with the new hostname.
* Mail exchanger record (MX Record) specifies an SMTP
email server for the domain, used to route outgoing emails to an email
server.
* Name Server record (NS Record) specifies that a DNS
Zone, such as “example.com”, is delegated to a specific Authoritative
Name Server, and provides the name of that name server.
* Reverse-lookup Pointer records (PTR Record) allows a
DNS resolver to provide an IP address and receive a hostname (reverse
DNS lookup).
* Certificate record (CERT Record) stores
certificates (PKIX, SPKI, PGP, and so on).
* Service Location (SRV Record) a service location
record, like MX but for other communication protocols.
* Text Record (TXT Record) typically carries
machine-readable data such as opportunistic encryption, sender policy
framework, DKIM, DMARC, etc.
* Start of Authority (SOA Record) this record appears
at the beginning of a DNS zone file, and indicates the Authoritative
Name Server for the current DNS zone, contact details for the domain
administrator, domain serial number, and information on how frequently
DNS information for this zone should be refreshed.
+++++++++++++++++ Cahoot-02-8
Show: some Wireshark observations of nslookup for various types of record, this time in detail about the fields.
Visit https://mst.edu with web browser
Make a manual query using command line tools
#!/bin/bash
nslookup mst.edu
nslookup www.mst.edu
dig mst.edu
dig www.mst.edu
whois mst.edu
whois icann.org
# What are the authoritative servers?
nslookup -type=NS mst.edu
# What do the authoritative servers say?
nslookup mst.edu ns-1.mst.edu
https://en.wikipedia.org/wiki/Reverse_DNS_lookup
* The DNS is mainly used to find the IP address that corresponds to a
given name.
* However, it is sometimes useful to obtain the name that corresponds to
an IP address.
* This is done by using the PTR (pointer) RR.
* The RData part of a PTR RR contains the name while the Name part of
the RR contains the IP address encoded in the in-addr.arpa domain.
* IPv4 addresses are encoded in the in-addr.arpa domain by reversing the four
numbers that compose the dotted decimal representation of the
address.
* For example, consider IPv4 address 192.0.2.11.
* The hostname associated to this address can be found by requesting the
PTR RR that corresponds to 11.2.0.192.in-addr.arpa.
* A similar solution is used to support IPv6 addresses, see RFC
3596.
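The reversed-octet encoding above can be checked with Python's standard `ipaddress` module, whose `reverse_pointer` attribute produces the name to query. The hand-written variant is only a sketch to show the reversal itself; both function names are hypothetical.

```python
import ipaddress


def ptr_name(address):
    """Return the domain name to query for the PTR record of an IP address."""
    return ipaddress.ip_address(address).reverse_pointer


def ptr_name_by_hand(ipv4):
    """Same result for IPv4, built by reversing the dotted-decimal numbers."""
    octets = ipv4.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(ptr_name("192.0.2.11"))
print(ptr_name("2001:db8::1"))  # IPv6 uses nibbles under ip6.arpa
```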
https://en.wikipedia.org/wiki/Domain_Name_System#Security_issues
https://en.wikipedia.org/wiki/Domain_Name_System#Privacy_and_tracking_issues
DDoS bandwidth-flooding attack
An attacker could attempt to send to each DNS root server a deluge of
packets,
so many that the majority of legitimate DNS queries never get
answered.
Bombard root servers with traffic.
This has not really been successful to date.
Defenses include:
Traffic filtering.
Local DNS servers cache IPs of TLD servers,
allowing root server bypass.
Bombarding TLD servers is potentially more dangerous.
Man-in-the-middle attack
The attacker intercepts queries from hosts and returns bogus
replies.
https://en.wikipedia.org/wiki/DNS_hijacking
(show in class)
DNS poisoning attack
The attacker sends bogus replies to a DNS server,
who is making outgoing requests itself,
tricking the server into accepting bogus records into its cache.
Send bogus replies to DNS server, which caches
https://en.wikipedia.org/wiki/DNS_spoofing
(show in class)
DNS redirection
Another important DNS attack is not an attack on the DNS service, per
se,
but instead exploits the DNS infrastructure,
to launch a DDoS attack against a targeted host.
Attacker sends DNS queries to many authoritative DNS servers,
with each query having the spoofed source address of the targeted
host.
The DNS servers then send their replies directly to the targeted
host.
Exploit DNS for DDoS:
send queries whose spoofed source address is the target’s IP.
This often requires amplification (replies much larger than the queries).
DNS as exfiltration / infiltration / tunneling
One can sneak data through DNS requests or replies.
+++++++++++++++++ Cahoot-02-9
Ways to avoid those attacks:
Just encrypt the connections to the server:
https://en.wikipedia.org/wiki/DNS_over_HTTPS
https://en.wikipedia.org/wiki/DNS_over_TLS
Tor/VPN/Proxy (privacy, but also some security).
Cryptographic signatures on DNS messages
https://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions
https://en.wikipedia.org/wiki/DNS-based_Authentication_of_Named_Entities
https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization
The Lord of the DNS
One DNS to rule them all,
One DNS to find them,
One DNS to bring them all,
and in the darkness bind them…
(i.e., a big boring Sauron committee…)
https://en.wikipedia.org/wiki/ICANN
https://en.wikipedia.org/wiki/ICANN#Criticism
https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority
https://en.wikipedia.org/wiki/OpenNIC
(permitted to be an open alternative)
+++++++++++++++++ Cahoot-02-10
Fellowship of the DNS…
Fair, robust, distributed, decentralized, non-exploitable name
resolution,
is a bit of a Catch-22 (https://en.wikipedia.org/wiki/Catch-22_(logic)),
and a really difficult problem to solve…
Discussion question:
What might a reliable distributed solution look like?
Might they come with their own exploits and problems?
Might a p2p system end up even more dictatorially problematic than
DNS?
(e.g., Mr. Robot’s Evil Corp cryptocurrency)?
GNU name system
https://gnunet.org/gns
https://lsd.gnunet.org/lsd0001/
https://news.ycombinator.com/item?id=30154830
(discuss proposal to replace DNS!)
ICANN discussion of using GNUnet instead of DNS:
https://icann.zoom.us/rec/play/znYwyZWPwrNraKqiZCLwOkHp_NITBj0QdhMpIrZPTrJumDRxIaecB8DHAygsgO-8PxQKkYx5ESGj6pBl.vZAWJHZoGeNyX9R4?startTime=1572978711000&_x_zm_rtaid=M4Wj53e3QXyaUK9nI6hiQg.1644387258044.8569edd15b9c2bafee5b5a283ad9fa90&_x_zm_rhtaid=108
I2P web-of-trust name system
https://geti2p.net/en/docs/naming
(web of trust based)
Crypto-currency-based
https://ens.domains/
https://docs.ens.domains/en/latest/introduction.html
https://www.namecoin.org/
https://en.wikipedia.org/wiki/Namecoin
Do you need to buy a name to host a site on clearnet?
Do you need to buy a static IP to host a site on clearnet,
or does a dynamic IP suffice?
What about dynamic DNS?
https://en.wikipedia.org/wiki/Dynamic_DNS
Do you need to buy an actual machine? A virtual one?
Do you need to buy an HTTPS certificate?
Do you need to buy anything else?
What about overlay layers or darknets for simple free hosting?
https://en.wikipedia.org/wiki/I2P
https://en.wikipedia.org/wiki/Tor_(anonymity_network)
Can one circumvent DNS editing as a censorship technique?
Can one block sites at all with common darknets?
What is the easiest way to set up an independent site on your own
hardware, or a VPS you rent?
Static websites:
https://onionshare.org/
http://lldan5gahapx5k7iafb3s4ikijc4ni7gx5iywdflkba5y2ezyg6sjgyd.onion/
sudo dnf install tor
pip3 install --upgrade onionshare-cli --user
echo "cool publicly accessible website" >index.html
onionshare-cli --website --public index.html
You could even host a website like this on your phone,
in under 10 minutes:
https://onionshare.org/mobile/#download
Anywhere that had an internet connection,
you could leave your phone plugged in and host a website there…
https://medium.com/axon-technologies/hosting-anonymous-website-on-tor-network-3a82394d7a01
Interactive backend easily possible with tor process and Apache.
https://geti2p.net/en/faq#myI2P%20Site
Today:
Theoretical difficulties with P2P and their solutions (general).
An overview of protocols and services provided by P2P overlay
applications (general).
High level protocol specification for an example P2P application
(BitTorrent).
There are many P2P protocols.
BitTorrent is just one we will review today.
* no always-on server
* arbitrary end systems directly communicate
* peers are intermittently connected and change IP addresses
Examples:
* file distribution (BitTorrent)
* Streaming (KanKan)
* VoIP (Skype - partial, funny story of history)
File distribution problem: Client server vs. P2P
Upload/download capacity is limited resource!
Question:
How much time to distribute file (size F),
from one server, to N peers?
Let’s first determine the distribution time for the client-server
architecture, which we denote by Dcs.
In the client-server architecture, none of the peers aids in
distributing the file.
We make the following observations:
The server must transmit one copy of the file to each of the N
peers.
Thus, the server must transmit N * F bits.
Since the server’s upload rate is us,
the time to distribute the file must be at least (N * F) /
us
Server transmission
Must sequentially send (upload) a number (N) of file (F) copies:
us = server upload
time to send one copy:
F / us
time to send N copies:
(N * F) / us
Let dmin denote the download rate of the peer with the lowest
download rate,
that is, dmin = min{d1, d2, …, dN}.
The peer with the lowest download rate,
cannot obtain all F bits of the file in less than F / dmin
seconds.
Thus the minimum distribution time is at least F / dmin
That however, will almost never be the real time,
as the server must distribute to many peers.
Client: each client must download file copy
* dmin = min client download rate
* min client download time: F / dmin
Time to distribute F to N clients using client-server
approach:
Dcs >= max{ (N * F) / us, F / dmin }
The first term, (N * F) / us, increases linearly with N.
Question: how much time to distribute file (size F)
from one server to N peers?
At the beginning of the distribution, only the server has the
file.
To get this file into the community of peers, the server must send each
bit of the file at least once into its access link.
Thus, the minimum distribution time is at least F / us
Unlike the client-server scheme, a bit sent once by the server may not
have to be sent by the server again, as the peers may redistribute the
bit among themselves.
Server transmission
Must upload at least one copy.
Time to send one copy:
F / us
As with the client-server architecture, the peer with the lowest
download rate cannot obtain all F bits of the file in less than
F / dmin seconds.
Thus the minimum distribution time is at least F / dmin
Unlike with the client-server model, with p2p, this could actually be (and
often is) the server’s bandwidth contribution.
Client: each client must download file copy
* min client download time: F / dmin
The total upload capacity of the system as a whole is equal to the
upload rate of the server plus the upload rates of each of the
individual peers, that is:
utotal = us + u1 + … +
uN
The system must deliver (upload) F bits to each of the N peers, thus
delivering a total of N * F bits.
This cannot be done at a rate faster than utotal.
Thus, the minimum distribution time is also at least (N * F) /
(us + u1 + … + uN).
Clients: as an aggregate, the N peers must download N * F bits
* max upload rate (limiting max download rate) is us + sum(ui)
Time to distribute F to N clients using P2P
approach:
DP2P >= max{ F / us, F / dmin, (N * F) / (us + sum(ui)) }
The numerator of the last term increases linearly with N,
but so does its denominator,
since each peer adds service capacity.
Net client upload rate = u
F / u = 1 hour
us = 10u
dmin >= us
P2P vs Client server
For the P2P architecture, the minimal distribution time is never greater
than that of the client-server architecture.
It also stays below a fixed duration (F / u, one hour with the values above),
no matter how large the number of peers N grows!
Applications with the P2P architecture can be self-scaling.
This scalability is a direct consequence of peers being
re-distributors,
as well as consumers of bits.
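The two lower bounds can be compared numerically with a few lines of Python. This is only a sketch: it uses the parameter values listed above (F / u = 1 hour, us = 10u) and additionally assumes that every peer uploads at the same rate u and that dmin equals us, the smallest value the constraint allows.

```python
def d_client_server(n, f, u_s, d_min):
    """Lower bound on the client-server distribution time."""
    return max(n * f / u_s, f / d_min)


def d_p2p(n, f, u_s, d_min, peer_uploads):
    """Lower bound on the P2P distribution time; peer_uploads lists u_1..u_N."""
    return max(f / u_s, f / d_min, n * f / (u_s + sum(peer_uploads)))


# Units are chosen so that F / u = 1 hour.
u = 1.0        # upload rate of every peer (assumed identical)
f = 1.0        # file size
u_s = 10 * u   # server upload rate
d_min = u_s    # slowest download rate (assumed equal to u_s)

for n in (5, 10, 20, 30):
    cs = d_client_server(n, f, u_s, d_min)
    p2p = d_p2p(n, f, u_s, d_min, [u] * n)
    print(f"N={n:2d}  client-server={cs:.2f} h  P2P={p2p:.2f} h")
```

The client-server bound grows without limit as N increases, while the P2P bound approaches, but never reaches, one hour.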
Standard protocol,
many clients (Vuze, BiglyBT, I2P-Snark, Bittorrent-official, etc.),
and versions of tracker software (some server-based trackers).
File divided into 256 KB chunks (or other equal size).
Peers in torrent send/receive file chunks.
Tracker:
tracks peers participating in torrent (or DHT);
runs their own choice of tracker software.
Used to be only a server-side operation, now also can be P2P!
Torrent:
meta-data and group of peers exchanging chunks of a file.
Client:
uploads and downloads files; runs their own client torrent software
Process:
Alice arrives, chooses a torrent, and using the torrent meta-data,
obtains a list of peers from tracker server (or distributed
tracker),
and finally begins exchanging file chunks with peers in torrent.
++++++++++++++++ Cahoot-02-11
Peer joining torrent:
* new peer has no chunks, but will accumulate them over time from other
peers
* registers with tracker (server or distributed) to get list of peers
who have the torrent of interest, connects to subset of peers
(“neighbors”)
After joining:
* while downloading, peer uploads chunks to other peers
* peer may change peers, with whom it exchanges chunks
Churn:
* peers may come and go
* once peer has entire file, it may (selfishly) leave or
(altruistically) remain in torrent
Requesting chunks:
* at any given time, different peers have different subsets of file
chunks
* periodically, Alice asks each peer for list of chunks that they
have
* Alice requests missing chunks from peers, rarest first
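A rarest-first ordering could be sketched as follows. This is an illustrative simplification rather than what any particular client does: chunk availability is counted only among Alice's current neighbors, and the data structures (`my_chunks`, `neighbor_chunks`) are hypothetical.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks Alice is missing, rarest among her neighbors first.

    my_chunks: set of chunk indices Alice already holds.
    neighbor_chunks: dict mapping peer name -> set of chunk indices it holds.
    """
    availability = Counter()
    for chunks in neighbor_chunks.values():
        availability.update(chunks)
    missing = [c for c in availability if c not in my_chunks]
    # Sort by how many neighbors hold the chunk, then by index to break ties.
    return sorted(missing, key=lambda c: (availability[c], c))


neighbors = {"bob": {0, 1, 2}, "carol": {1, 2}, "dave": {2, 3}}
print(rarest_first({0}, neighbors))
```

Here chunk 3 is held by a single neighbor, so it is requested before chunks 1 and 2.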
Ask: Why rarest first?
Discussion question:
Why not just be a leech (download but not contribute)?
How might you design a protocol with incentives?
What might an incentive look like?
Should you build incentives into protocols?
Do people follow incentives?
How do we put a kink in the wires of those who don’t contribute
enough,
slowing down their transfers,
to encourage every peer to reciprocate?
Sending chunks: tit-for-tat incentives
https://en.wikipedia.org/wiki/Tit_for_tat
* Alice sends chunks to those four peers currently sending her chunks at
highest rate, rewarding them with more data.
* other peers are choked by Alice (do not receive chunks from her)
* re-evaluate top 4 every 10 secs
* every 30 secs: randomly select another peer, starts sending
chunks
* “optimistically un-choke” this peer, in hope that the new peer
reciprocates
* newly chosen peer may join top 4
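The selection rule above might look like this in code. It is a sketch under simplifying assumptions: both decisions are made in one call, whereas the protocol described above re-evaluates the top four every 10 seconds and the optimistic choice every 30 seconds. The function name and the `upload_rates_to_me` mapping are hypothetical.

```python
import random


def choose_unchoked(upload_rates_to_me, top_n=4, rng=random):
    """Pick peers to send chunks to: the top_n fastest senders plus one random peer.

    upload_rates_to_me: dict mapping peer -> rate at which it is sending to us.
    Returns (top_peers, optimistic_peer); optimistic_peer is None when every
    peer is already in the top group.
    """
    ranked = sorted(upload_rates_to_me, key=upload_rates_to_me.get, reverse=True)
    top_peers = ranked[:top_n]
    choked = ranked[top_n:]
    # Optimistic un-choke: give one choked peer a chance to reciprocate.
    optimistic = rng.choice(choked) if choked else None
    return top_peers, optimistic


rates = {"a": 50, "b": 10, "c": 40, "d": 30, "e": 20, "f": 0}
print(choose_unchoked(rates))
```

A peer such as "f", which currently sends nothing, can only receive chunks through the optimistic slot, which is what discourages leeching.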
Overview:
(1) Alice “optimistically un-chokes” a new participant, Bob, in hopes
that he reciprocates
(2) Alice becomes one of Bob’s top-four providers; Bob
reciprocates
(3) Bob becomes one of Alice’s top-four providers
All this results in a higher upload rate, finding better trading partners, and getting the file faster!
Sharing is caring…
Due to risk or costs in internet speed or throughput,
individuals could potentially download, but not upload.
An interesting read, game theory in software design and
CompSci:
http://bittorrent.org/bittorrentecon.pdf
General (not BitTorrent specific)
++++++++++++++++ Cahoot-02-12
Review: dictionaries, maps, and hash tables
Simple database with (key, value) pairs:
key: human name;
value: social security number
key = hash(original key)
* O(1) average-case lookup, regardless of the size of the data.
* Can store large sparse key-space in smaller array with constant access
time
Note: There are potentially two distributed
databases (or merged into one) in some p2p networks:
1. Routing table for overlay network peers, who are defined by their
addresses
2. Database of torrents: addresses/peers
It’s easy to keep a database on a server,
but how do we increase the censorship resistance and robustness?
(ComputerNetworkingEssentials.pdf, 128.17.123.38)
DHT
Problem
* How to lookup which user is storing a particular hashed key?
Solution
* each peer only aware of immediate successor and predecessor.
* circular DHT (a)
* Only index forward neighbors
* on average about N/2 messages, that is O(N), are needed to resolve a query
when there are N peers
(b) Storing indices of more neighbors increases messaging efficiency,
and increases storage overhead
* Each peer keeps track of IP addresses of predecessor, successor, short
cuts.
* A balance of connections: space versus time
* DHT can be designed so that both the number of neighbors per peer as
well as the average number of messages per query is O(log N), where N is
the number of peers.
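The simplest circular DHT, where each peer forwards a query only to its immediate successor, can be sketched as below. The assignment rule (a key belongs to its closest successor on the ring) and the peer identifiers are assumptions chosen for illustration; they are not stated in the notes above.

```python
def responsible_peer(key_id, peer_ids):
    """Return the peer that stores key_id: its closest successor on the ring."""
    ring = sorted(peer_ids)
    for peer in ring:
        if peer >= key_id:
            return peer
    return ring[0]  # wrap around past the largest identifier


def lookup_hops(start_peer, key_id, peer_ids):
    """Count messages when each peer forwards only to its immediate successor."""
    ring = sorted(peer_ids)
    target = responsible_peer(key_id, ring)
    index = ring.index(start_peer)
    hops = 0
    while ring[index] != target:
        index = (index + 1) % len(ring)  # forward the query to the successor
        hops += 1
    return hops


peers = [1, 3, 4, 5, 8, 10, 12, 15]  # hypothetical identifiers
print(responsible_peer(11, peers), lookup_hops(1, 11, peers))
```

With shortcuts to more distant peers, the loop could skip several positions per message, which is how the O(log N) designs reduce the hop count at the cost of a larger routing table.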
Peers come and go, and the network must adapt.
Example, peer 5 abruptly leaves, or is disconnected
handling peer churn:
* peers may come and go (churn)
* each peer knows address of its two successors
* each peer periodically pings its two successors to check aliveness
* if immediate successor leaves, choose next successor as new immediate
successor
example: peer 5 abruptly leaves
* peer 4 detects peer 5’s departure; makes 8 its immediate
successor
* 4 asks 8 who its immediate successor is; makes 8’s immediate successor
its second successor.
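The repair step in this example can be modelled with a small in-memory ring. This is a toy sketch: a "ping" is just a check of an `alive` flag, the identifiers other than 4, 5 and 8 are made up, and only the case of a failed immediate successor is handled (a peer whose second successor disappears would need a similar refresh).

```python
class Peer:
    def __init__(self, ident):
        self.ident = ident
        self.alive = True
        self.first = None   # immediate successor
        self.second = None  # the successor after that

    def check_successors(self):
        """Run one round of the periodic aliveness check."""
        if not self.first.alive:            # the ping to the immediate successor failed
            self.first = self.second        # promote the second successor
            self.second = self.first.first  # ask it who its own successor is


def build_ring(idents):
    """Create peers and wire up each peer's two successors."""
    peers = [Peer(i) for i in sorted(idents)]
    n = len(peers)
    for k, p in enumerate(peers):
        p.first = peers[(k + 1) % n]
        p.second = peers[(k + 2) % n]
    return {p.ident: p for p in peers}


ring = build_ring([1, 3, 4, 5, 8, 10, 12, 15])
ring[5].alive = False           # peer 5 abruptly leaves
ring[4].check_successors()
print(ring[4].first.ident, ring[4].second.ident)
```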
https://en.wikipedia.org/wiki/Bittorrent
https://wiki.wireshark.org/BitTorrent
http://bittorrent.org/beps/bep_0000.html
https://www.bittorrent.org/beps/bep_0003.html (show in class)
https://en.wikipedia.org/wiki/Torrent_file (show in class)
https://en.wikipedia.org/wiki/Magnet_URI_scheme
Show a real torrent file, map to specifications.
For example,
https://ftp.qubes-os.org/iso/Qubes-R4.1.0-x86_64.torrent
BitTorrent protocol: two main transport level choices
https://www.bittorrent.org/beps/bep_0029.html
The motivation for uTP is for BitTorrent clients to not disrupt internet connections, while still utilizing the unused bandwidth fully.
When using regular TCP connections, BitTorrent quickly fills up the send buffer, adding multiple seconds delay to all interactive traffic.
More detail on this when we get to the details of TCP buffers (next major topic is transport layer).
Show with torrent download (bt-utp header):
0 4 8 16 24 32
+-------+-------+---------------+---------------+---------------+
| type | ver | extension | connection_id |
+-------+-------+---------------+---------------+---------------+
| timestamp_microseconds |
+---------------+---------------+---------------+---------------+
| timestamp_difference_microseconds |
+---------------+---------------+---------------+---------------+
| wnd_size |
+---------------+---------------+---------------+---------------+
| seq_nr | ack_nr |
+---------------+---------------+---------------+---------------+
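When looking at a capture, it can help to decode this 20-byte header by hand. The sketch below follows the field layout in the diagram; the packet-type names are taken from BEP 29, while the function and constant names are hypothetical.

```python
import struct

# type/ver (1 byte), extension (1), connection_id (2), two timestamps (4 + 4),
# wnd_size (4), seq_nr (2), ack_nr (2): 20 bytes, big-endian.
UTP_HEADER = struct.Struct("!BBHIIIHH")
PACKET_TYPES = {0: "ST_DATA", 1: "ST_FIN", 2: "ST_STATE", 3: "ST_RESET", 4: "ST_SYN"}


def parse_utp_header(data):
    """Decode the fixed uTP header from the start of a UDP payload."""
    (type_ver, extension, connection_id, timestamp, timestamp_diff,
     wnd_size, seq_nr, ack_nr) = UTP_HEADER.unpack(data[:UTP_HEADER.size])
    return {
        "type": PACKET_TYPES.get(type_ver >> 4, "unknown"),  # high 4 bits
        "ver": type_ver & 0x0F,                               # low 4 bits
        "extension": extension,
        "connection_id": connection_id,
        "timestamp_microseconds": timestamp,
        "timestamp_difference_microseconds": timestamp_diff,
        "wnd_size": wnd_size,
        "seq_nr": seq_nr,
        "ack_nr": ack_nr,
    }


# A made-up SYN packet: type 4, version 1, so the first byte is 0x41.
sample = UTP_HEADER.pack(0x41, 0, 1234, 1000, 0, 65535, 1, 0)
print(parse_utp_header(sample))
```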
Show/demo: Wireshark downloading Linux ISO with transmission
Goal
Learn how to build client/server applications that communicate using
sockets.
Socket:
A tunnel between application processes, in an end-to-end transport
protocol
Two primary socket types for two transport services exist.
UDP is an unreliable, lightweight datagram
service.
TCP is a reliable, heavier, byte-stream,
connection-oriented service.
Application example we’ll put in code, in
order:
* Client
* inputs a line of characters (data) from the keyboard, and
* sends the data to server
* Server
* receives the data,
* converts the characters to uppercase, and
* sends the modified data to client
* Client
* receives modified data, and
* displays it as a printed line on the screen
UDP involves no persistent “connection” between a client and
server.
No handshaking occurs before sending data.
A sender explicitly attaches a destination IP address and port number to
each packet.
A receiver extracts the sender’s IP address and port number from each
received packet.
Transmitted data may be lost.
Transmitted data may be received out-of-order.
UDP provides unreliable transfer of groups of bytes (“datagrams”)
between client and server.
UDP socket code:
* 02-Application/socket_01_UDP_server.py
* 02-Application/socket_01_UDP_client.py
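For reading along without the course files, here is a minimal sketch of the uppercase application over UDP. It is not the content of the files listed above: to stay self-contained it runs a one-shot server in a background thread on the loopback address, lets the OS pick the port, and uses hypothetical function names.

```python
import socket
import threading


def udp_uppercase_server(sock):
    """Answer one datagram with its uppercase version, then return."""
    data, client_address = sock.recvfrom(2048)  # the sender's (IP, port) arrives with the data
    sock.sendto(data.upper(), client_address)   # the reply is addressed explicitly


def udp_roundtrip(message):
    """Start a one-shot server on loopback and send it `message`."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2.0)
    server.bind(("127.0.0.1", 0))               # port 0: let the OS pick a free port
    server_address = server.getsockname()
    worker = threading.Thread(target=udp_uppercase_server, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)                      # UDP may lose data; do not wait forever
    client.sendto(message.encode(), server_address)  # no handshake before sending
    reply, _ = client.recvfrom(2048)
    worker.join()
    client.close()
    server.close()
    return reply.decode()


print(udp_roundtrip("hello over udp"))
```

Note that neither side calls `connect` or `accept`: every `sendto` names its destination, which is the "no persistent connection" property described above.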
Demonstrate:
0. Run in background:
python3 socket_01_UDP_server.py
Show Wireshark watching the client and server code:
sudo wireshark &
Connect with:
python3 socket_01_UDP_client.py
nc -uC 127.0.0.1 6789
man nc # ncat can send UDP packets too!
Show how nc
or multiple python clients can block
+++++++++++++++++++++++++++++++++ Cahoot-02-13
A server process must first be running.
The server must have created a TCP socket,
that welcomes a client’s contact.
A client contacts a server.
The client specifies an IP address and port number of a server
process.
The client uses that address to create a TCP socket.
The client’s TCP socket establishes a connection to the server.
When contacted by a client on the welcoming socket,
the server’s TCP socket creates a secondary new socket,
for the server process to communicate with that particular client.
This allows server to talk with multiple clients.
Source port numbers distinguish different clients.
TCP provides a reliable, in-order, byte-stream transfer between a client
and server.
TCP socket code:
Example 1:
* 02-Application/socket_02_TCP_server.py
* 02-Application/socket_02_TCP_client.py
Example 2:
* 02-Application/socket_02_TCP_server2.py
* 02-Application/socket_02_TCP_client2.py
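The same uppercase application over TCP can be sketched as follows. Again, this is not the content of the files listed above: the server runs in a background thread on loopback with an OS-chosen port, handles a single client, and the function names are hypothetical.

```python
import socket
import threading


def tcp_uppercase_server(welcoming_socket):
    """Accept one client, uppercase one message, then close the connection."""
    # accept() returns a new socket dedicated to this particular client.
    connection, client_address = welcoming_socket.accept()
    with connection:
        data = connection.recv(2048)
        connection.sendall(data.upper())


def tcp_roundtrip(message):
    """Start a one-shot server on loopback, connect to it, and exchange one message."""
    welcoming = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcoming.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    welcoming.listen(1)
    server_address = welcoming.getsockname()
    worker = threading.Thread(target=tcp_uppercase_server, args=(welcoming,))
    worker.start()

    # The client names the server's IP and port; the handshake happens here.
    with socket.create_connection(server_address, timeout=2.0) as client:
        client.sendall(message.encode())
        reply = client.recv(2048)
    worker.join()
    welcoming.close()
    return reply.decode()


print(tcp_roundtrip("hello over tcp"))
```

Because TCP is a byte stream, a single `recv` is only guaranteed to return some of the bytes; it is sufficient here because the message is short, but a longer exchange would need a loop or a framing rule.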
Demonstrate:
0. Run a server:
python3 socket_02_TCP_server.py
Show Wireshark watching the client and server code:
sudo wireshark &
Connect with
python3 socket_02_TCP_client.py
nc -C 127.0.0.1 6789
# new option
man ss
ss
# old, ss is better
man netstat
netstat -an
# another option
man lsof
lsof -i -n
The term “port” is not the same idea or definition as the term
“socket”.
Socket is an instance object dually created both:
within a requesting application, and
within an operating system for that requesting application.
Port is a designation dually configured both:
as field in the transport-layer headers, in actual packets, and
in the OS’s kernel networking core, and firewall configuration.
The OS’s kernel routes packets to the application.
Step by step:
+++++++++++++++++++++++++++++++++ Cahoot-02-14
https://realpython.com/python-concurrency/
https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
https://wiki.python.org/moin/GlobalInterpreterLock
https://realpython.com/python-gil/
Don’t like CPython’s speed under the GIL?
Perhaps try PyPy (it also has a GIL, but is often faster):
https://www.pypy.org
https://realpython.com/pypy-faster-python/
You can use either:
multithreading
multiprocessing
asyncio
When should you use each?
multithreading to deal with simple blocking (no real speed up).
multiprocessing to run over multiple cores (speed up).
asyncio to deal with more complex or larger-scale needs (often
blocking).
See my code now:
* 02-Application/thread_00_none.py
* 02-Application/thread_01_unrolled.py
* 02-Application/thread_02_fake.py
* 02-Application/thread_03_storage.py
Show multi-threaded examples now:
* 02-Application/socket_04_TCP_server_mt.py
* 02-Application/socket_04_TCP_client_mt.py
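A thread-per-connection server might be sketched as below. This is an illustration and not the content of the files listed above: the server accepts a fixed number of clients so that the demo terminates, and all names are hypothetical.

```python
import socket
import threading


def handle_client(connection):
    """Serve one client until it closes its side of the connection."""
    with connection:
        while True:
            data = connection.recv(2048)
            if not data:              # empty bytes: the client closed the connection
                break
            connection.sendall(data.upper())


def serve(welcoming_socket, max_clients):
    """Accept max_clients connections, giving each its own thread."""
    workers = []
    for _ in range(max_clients):
        connection, _address = welcoming_socket.accept()
        worker = threading.Thread(target=handle_client, args=(connection,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def demo():
    welcoming = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcoming.bind(("127.0.0.1", 0))
    welcoming.listen()
    address = welcoming.getsockname()
    server = threading.Thread(target=serve, args=(welcoming, 2))
    server.start()

    # The first client connects and stays idle; the second is still served.
    idle = socket.create_connection(address, timeout=2.0)
    busy = socket.create_connection(address, timeout=2.0)
    busy.sendall(b"second client")
    reply = busy.recv(2048)
    busy.close()
    idle.close()
    server.join()
    welcoming.close()
    return reply.decode()


print(demo())
```

With a single-threaded server, the idle first client would hold the only `recv` loop and the second client would wait; here each blocking `recv` sits in its own thread.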
Now, nc
does not block the server from other
clients:
python3 socket_04_TCP_server_mt.py &
nc -C 127.0.0.1 50002
python3 socket_04_TCP_client_mt.py
https://docs.python.org/3/library/socket.html
https://realpython.com/python-sockets/
Let’s review some program-internal functions.
>>> help(socket.socket.bind)
bind(...)
bind(address)
Bind the socket to a local address.
For IP sockets, the address is a pair (host, port);
the host must refer to the local host.
For raw packet sockets the address is a tuple
(ifname, proto [,pkttype [,hatype [,addr]]])
socket.socket.bind
takes a tuple: (hostname or IP,
port)
https://serverfault.com/questions/78048/whats-the-difference-between-ip-address-0-0-0-0-and-127-0-0-1
What are valid hostname or IP addresses to use?
The use of the term “local” above is ambiguous.
Q: What does it mean here, operationally?
A: That the IP address being bound is assigned to an interface managed
by your operating system!
More to come on interfaces when we cover the network layer:
../Content/04-NetworkData.html
""
defaults to all traffic to the machine.
It is the same as 0.0.0.0 for IPv4.
It’s easier for IPv6.
0.0.0.0
which also listens to all traffic on the machine
(0.0.0.0 means various different things in different contexts).
https://www.rfc-editor.org/rfc/rfc1122#page-29 section 3.2.1.3
(a) { 0, 0 }
This host on this network.
MUST NOT be sent,
except as a source address as part of an initialization procedure,
by which the host learns its own IP address.
See also Section 3.3.6 for a non-standard use of {0,0}.
https://www.rfc-editor.org/rfc/rfc5735#section-3
0.0.0.0/8 - Addresses in this block refer to source hosts on “this”
network.
Address 0.0.0.0/32 may be used as a source address for this host on this
network;
other addresses within 0.0.0.0/8 may be used to refer to specified hosts
on this network ([RFC1122], Section 3.2.1.3).
Despite the standard, 0.0.0.6 for example, won’t bind in python3.
<hostname>
https://docs.python.org/3/library/socket.html
If you use a hostname in the host portion of IPv4/v6 socket
address,
the program may show a nondeterministic behavior,
as Python uses the first address returned from the DNS resolution.
The socket address will be resolved differently into an actual IPv4/v6
address,
depending on the results from DNS resolution and/or the host
configuration.
For deterministic behavior use a numeric address in host portion.
On my Fedora machine, it resolves to 127.0.0.1.
Hostname is a shallow alias, implemented via checking: /etc/hosts
127.0.0.1 through 127.255.255.254
(CIDR notation:
127.0.0.0/8)
https://www.rfc-editor.org/rfc/rfc5735#section-3
127.0.0.0/8 - This block is assigned for use as the Internet host
loopback address.
A datagram sent by a higher-level protocol to an address anywhere within
this block loops back inside the host.
This is ordinarily implemented using only 127.0.0.1/32 for
loopback.
As described in [RFC1122], Section 3.2.1.3,
addresses within the entire 127.0.0.0/8 block do not legitimately appear
on any network anywhere.
Your local machine only.
You can use 127.0.0.4 (or whatever in the range),
but that socket will only be reachable on that IP.
Python’s sending socket defaults to 127.0.0.1 as the sending IP,
when sending to any localhost address.
A LAN-only IP address
10.0.0.0 - 10.255.255.255 (10.0.0.0/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168.0.0/16 prefix)
https://datatracker.ietf.org/doc/html/rfc1918
These IP ranges are declared as LAN IPs,
as opposed to public, globally routable IPs,
or to localhost IPs, etc.
If one of your interfaces has an IP in these ranges assigned,
then you can bind a socket to that IP.
If no interface in your OS has that IP,
then you cannot bind the socket in python either.
A public, globally routable IP address
More-or-less anything not in the below list:
https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml
https://en.wikipedia.org/wiki/IPv4#Special-use_addresses
If one of your interfaces has a public IP assigned,
then you can bind a socket to that IP.
If no interface in your OS has that IP,
then you cannot bind the socket in python either.
Which should you choose?
If you’re debugging locally,
then use 127.0.0.1.
If you are lazy,
then use “” or 0.0.0.0
If you want more security,
then consider using a specific IP,
of an interface on your machine.
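A quick way to explore which addresses are "local" on a given machine is simply to try binding them. The sketch below does that with a hypothetical helper, `can_bind`; the addresses in the loop are examples only, and the results depend on the operating system and its interface configuration (192.0.2.1 is a documentation address that is normally not assigned to any interface).

```python
import socket


def can_bind(host, port=0):
    """Return True if a TCP socket can be bound to (host, port) on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))  # port 0 lets the OS choose any free port
        except OSError:
            return False
        return True


for host in ("127.0.0.1", "", "0.0.0.0", "127.0.0.4", "192.0.2.1"):
    print(repr(host), can_bind(host))
```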
Below, we illustrate state diagrams for UDP and TCP sockets.
These are standard POSIX sockets,
also known as BSD or Berkeley sockets.
https://en.wikipedia.org/wiki/Berkeley_sockets
Many languages use similar BSD sockets to those in the C language.
Python’s sockets also follow the API below.
Discussion question:
What is the value of having a POSIX standard?
What is the value of specifying the socket API itself as part of
POSIX?
https://en.wikipedia.org/wiki/POSIX
The overview
The states:
To think ahead to what we’re covering next,
TCP’s actual internal FSM is much more detailed than this!
These images below are just the high-level API.
We’ll dive into that now:
Next: 03-Transport.html