## Cryptography on the 'Net

Books that cover this material well are
• Applied Cryptography, Bruce Schneier (excellent)
• SSL and TLS, Eric Rescorla (also excellent)
• SSH, the Secure Shell, Barrett and Silverman

#### 1. Background

• With all these packets flying around - including credit cards, bank transfers, you name it - we'd like some security, eh?
• Model
• To even discuss what you mean by "security" you need to define what kind of problem you're worried about.
• The situation for the internet is commonly described in these terms. Two people want to communicate securely via a channel (the net) that someone else (anyone who owns the computers between them) can affect. In crypto circles, the players are given names to identify their roles; here they are Alice, Bob, and the eavesdropper Eve.
```
Alice -------- ( Eve )-------- Bob
```
• Eve may just listen (called a "passive" attack) or actually modify (called "active") the messages passing between Alice and Bob. We assume that Eve can see or modify everything on this connection.
• Encryption
• The answer is to encrypt the message. The picture becomes
```
--------------------------              --------------------------
| Alice                  |----( Eve )---| Bob                    |
| encrypt(Hello) = x5!$b |              | decrypt(x5!$b) = Hello |
--------------------------              --------------------------
```
so that the true message is hidden from Eve.
• Goals
There are several things we'd like our cryptography to accomplish, though we'll need an assortment of tricky techniques to manage all of them.
• Confidentiality - Eve should not be able to read the message.
• Authentication - Eve should not be able to pretend to be Alice.
• Integrity - Eve should not be able to modify the message.
• Secrets
• Bob and Alice must share some kind of secret knowledge that Eve does not have.
• Secret Methods vs Secret Keys
• In the old days, people often tried to keep the encryption method itself secret. (Security through obscurity.) But this has several problems.
• It's hard to know if the method is any good if it hasn't been well studied.
• You'd need a different method for each pair of people. And most of us aren't mathematicians.
• These days there are a variety of known methods (called algorithms or ciphers), each using secret keys. You can only encrypt and decrypt if you know both the cipher and the key.
• How to share and manage the secrets can be tricky. Key management and "trust management" systems are whole topics of their own. See for example KeyNote http://www.cis.upenn.edu/~keynote/ and RFC2704.

#### 2. Toolbox

• There has been a tremendous amount of work in this field over the last few decades, producing many, many methods with different strengths and weaknesses.
• Ciphers come in several varieties.

• One way hash function
• Goes by many names: message digest, digital fingerprint, cryptographic checksum.
• Short (one line) random-looking string calculated from a file.
• Cannot go backwards; cannot easily find files that produce a given digest.
• Math: something akin to breaking the file into 256-bit chunks and then XOR-ing them all together. (Though that's a bit too simple; it only gives one of the two properties above.)
• Two commonly used algorithms are
• MD5 (Message Digest version 5)
• digest is 16 bytes = 32 hex chars = 128 bits
• Been around for some time. Some weaker versions (MD4) have been broken.
• SHA-1 (Secure Hash Algorithm 1).
• digest is 20 bytes = 40 hex chars = 160 bits
• Newer, generally considered stronger. Designed by the NSA, which gives some people pause. SHA-1 is sort-of a souped up MD5.
• Command line example on bob:
```
$ openssl md5 filename
$ openssl sha1 filename
```
Change one character in the file, and try again.
See "man dgst", which describes the various "openssl dgst" functions, as well as "man md5sum", which describes a utility to generate and check these digests.
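The same experiment can be sketched in Python with the standard hashlib module. The sample strings and the xor_fold helper are made up for illustration; xor_fold is the "too simple" XOR idea from above, included only to show why it fails the second property.

```python
import hashlib

original = b"Pay Bob 100 dollars"
tampered = b"Pay Bob 900 dollars"  # one character changed

for name, func in (("md5", hashlib.md5), ("sha1", hashlib.sha1)):
    d1 = func(original).hexdigest()
    d2 = func(tampered).hexdigest()
    print(name, "digest is", len(d1), "hex chars")
    print("  original:", d1)
    print("  tampered:", d2)  # looks unrelated to the first digest


def xor_fold(data: bytes, width: int = 32) -> bytes:
    """Toy 'digest': XOR all 256-bit (32-byte) chunks together. NOT secure."""
    out = bytearray(width)
    for i, byte in enumerate(data):
        out[i % width] ^= byte
    return bytes(out)


# Swapping two chunks leaves the XOR unchanged, so it is trivial to find a
# different file with the same "digest".
chunk_a = b"A" * 32
chunk_b = b"B" * 32
print(xor_fold(chunk_a + chunk_b) == xor_fold(chunk_b + chunk_a))
```

A real digest function mixes the chunks in an order-dependent way, which is what makes finding a second file with the same digest hard.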

• Symmetric key
• Same key used for encryption and decryption, which means both Alice and Bob must know it.
• The popular algorithms are computationally fast.
• Two kinds: block (64 bit or larger chunks done at a time) and stream (bits or bytes of message done sequentially). The distinction isn't quite as clear as you might think, since the block ciphers are usually run in CBC (Cipher Block Chaining) mode.
If M[n] is the n'th message block, C[n] is the corresponding encrypted block, E() is the encryption function, and ^ is the XOR operator, then the CBC method is to set C[n] = E( C[n-1] ^ M[n] ). CBC also requires a random IV (Initialization Vector) to play the role of C[0] for the first block.
• With symmetric keys, and a really good cipher, a brute-force attack needs to try on average half of the 2**(key_length) possible keys to break in. If you could make a biological computer whose 10**14 cells (a few tons) could each try a million (10**6) keys per second, then searching all the 128-bit keys would take about 2**128 / (10**20/sec * 3*10**7 sec/year) = 10**11 years = 100 billion years, which is about 8 times the age of the universe.
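The estimate above is easy to redo for other key lengths. A small sketch; the age-of-the-universe figure is an assumed round number (about 1.4*10**10 years), and the machine is the hypothetical one just described.

```python
CELLS = 10**14                 # cells in the hypothetical biological computer
TRIES_PER_CELL_PER_SEC = 10**6
SECONDS_PER_YEAR = 3 * 10**7
AGE_OF_UNIVERSE_YEARS = 1.4e10  # assumption: roughly 14 billion years


def years_to_search(key_bits: int) -> float:
    """Years needed to try every key of the given length."""
    keys_per_year = CELLS * TRIES_PER_CELL_PER_SEC * SECONDS_PER_YEAR
    return 2**key_bits / keys_per_year


for bits in (56, 128, 256):
    years = years_to_search(bits)
    print(f"{bits}-bit key: {years:.2e} years, "
          f"{years / AGE_OF_UNIVERSE_YEARS:.2e} universe lifetimes")
```

On average the right key turns up after half the search, so halve these numbers for the expected time.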
• Some popular algorithms are
• DES (Data Encryption Standard)
• 56-bit key, block encryption, lots of swapping and XOR-ing of bits.
• worldwide standard for 20 years. Probably can be broken now by folks like the NSA, though who knows for sure. Developed by IBM with input from the NSA, which has provoked many discussions of whether or not a back door was left in.
• Passwords on unix systems are kept as digests calculated with a variation of DES called crypt(3).
• 3DES or tripleDES
• DES done three times, with three different keys.
• Considered safer (though of course slower) than DES.
• AES (Advanced Encryption Standard)
• 128, 192, or 256 bit keys.
• After a 4 year process, NIST chose Rijndael as its cipher for government use.
• IDEA (International Data Encryption Algorithm)
• 128-bit key. patented.
• part of the well known PGP (Pretty Good Privacy)
• About twice as fast as DES.
• Considered stronger than DES.
See "man enc" on bob for many more.
• From the command line
```
$ openssl des -in lecture.html -out lecture.des \
    -K 1234567812345678 -iv abcdabcdabcdabcd
$ openssl des -d -in lecture.des -out \
    lecture-decoded.html -K 1234567812345678 -iv abcdabcdabcdabcd
```
Note that here the key is given as a 16 hex char (64-bit) string; DES uses 56 of those bits and treats the remaining 8 as parity. (Alternatively, you can provide a password from which the key is derived.)
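The CBC rule C[n] = E( C[n-1] ^ M[n] ) described earlier can be sketched in a few lines. The block cipher here is a deliberately trivial stand-in invented for this example (it is not DES and offers no security); only the chaining logic is the point.

```python
import os

BLOCK = 8  # bytes per block, the same block size DES uses


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for E(): add the key bytewise, then rotate left one byte."""
    mixed = bytes((b + k) % 256 for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt_block."""
    mixed = block[-1:] + block[:-1]
    return bytes((b - k) % 256 for b, k in zip(mixed, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key: bytes, iv: bytes, message: bytes) -> bytes:
    assert len(message) % BLOCK == 0, "pad the message to whole blocks first"
    prev, out = iv, []
    for i in range(0, len(message), BLOCK):
        prev = toy_encrypt_block(key, xor(prev, message[i:i + BLOCK]))
        out.append(prev)  # C[n] becomes the chaining value for block n+1
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(prev, toy_decrypt_block(key, block)))
        prev = block
    return b"".join(out)


key = b"k3y!k3y!"
iv = os.urandom(BLOCK)
message = b"SAMEBLOK" * 2  # two identical plaintext blocks
ciphertext = cbc_encrypt(key, iv, message)
print(ciphertext[:BLOCK].hex(), ciphertext[BLOCK:].hex())
print(cbc_decrypt(key, iv, ciphertext))
```

Because each block is XOR-ed with the previous ciphertext block before encryption, the two identical plaintext blocks come out as different ciphertext blocks.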

• Public/Private key (or "asymmetric key")
• One key is private to an individual, never sent out to anyone. This private secret provides both security and authentication.
• Another key is known to either everyone, or to at least Bob. This is the "public" key.
• These methods are 100 to 1000 times slower than symmetric encryption. Therefore, no one uses them to encrypt the whole message.
• Instead, they're used for exchanging symmetric keys and proving your identity.
• Two of the algorithms are

• RSA (Rivest, Shamir, Adleman)
• By far the most popular public/private algorithm.
• Here's how it works.
1. Alice picks two large primes p and q.
2. She calculates n = p*q.
3. She chooses any e (for encrypt) such that e and (p-1)*(q-1) are relatively prime.
4. She then computes (extended Euclid's algorithm) a number d (for decrypt) with the property that e*d = 1 mod (p-1)(q-1).
5. She tells the whole world her values for n and e; that's her public key.
6. She keeps d hidden as her secret private key.
7. To encode a message for Alice, the sender breaks it into numbers m[i] smaller than n. Then
encode (anyone, with the public key): c[i] = m[i]**e mod n
decode (Alice, with her private key): m[i] = c[i]**d mod n
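Step 4 is the only step that needs a real algorithm. Here is a minimal sketch of the extended Euclid computation; the function names are mine, and the primes are the small ones used in the worked example that follows.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
        old_t, t = t, old_t - quotient * t
    return old_r, old_s, old_t


def private_exponent(e: int, p: int, q: int) -> int:
    """Find d with e*d = 1 mod (p-1)*(q-1)."""
    phi = (p - 1) * (q - 1)
    g, s, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e shares a factor with (p-1)*(q-1); pick another e")
    return s % phi


d = private_exponent(79, 47, 71)
print(d, (79 * d) % ((47 - 1) * (71 - 1)))
```

The second number printed is e*d mod (p-1)*(q-1), which should be 1 if d is right.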
• An example using small numbers might help. (In real applications these numbers are hundreds of digits long.)
1. Choose p=47, q=71.
2. n = 47*71 = 3337.
3. Choose (randomly) e = 79.
It (among many others) has the required property that 79 and (p-1)*(q-1) = 3220 do not have any factors in common.
4. Then we can find d = 1019. (Check: 1019*79 mod 3220 = 1. For numbers this small we can find d by brute force search.)
5. Publish public (n,e) = (3337, 79).
6. Keep d=1019 secret.
7. Say the message to be sent is 6882326879666683.
8. Break it into three digit (smaller than n) blocks,
9. (m1, m2, m3, ...) = ( 688, 232, 687, ...)
10. Encryption with public key:
688**79 mod 3337 = 1570 = c1
232**79 mod 3337 = 2756 = c2, and so on
11. The coded message is thus 1570 2756 ...
12. The decryption with private key gives:
1570**1019 mod 3337 = 688 = m1, and so on.
(Numbers from Schneier's Applied Cryptography, pg 468.)
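The small-number example can be checked directly with Python's built-in three-argument pow(). This is just a verification sketch; the modular-inverse form pow(e, -1, phi) needs Python 3.8 or later.

```python
p, q, e = 47, 71, 79
n = p * q                      # 3337
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)            # modular inverse of e (Python 3.8+)

message = "6882326879666683"
# Break the digit string into three-digit blocks, each smaller than n.
blocks = [int(message[i:i + 3]) for i in range(0, len(message), 3)]

cipher = [pow(m, e, n) for m in blocks]      # encrypt with public (n, e)
decoded = [pow(c, d, n) for c in cipher]     # decrypt with private d

print("d       =", d)
print("blocks  =", blocks)
print("cipher  =", cipher)
print("decoded =", decoded)
```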
• Note that there are mathematical shortcuts to doing these big powers, else this wouldn't be practical. For example, to find a**16 mod n we do not do (a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a) mod n, but instead find ((((a**2 mod n)**2 mod n)**2 mod n)**2 mod n), which takes only four multiplications (log base 2 of 16) and whose intermediate results never get bigger than n**2.
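For exponents that are not powers of two, the same squaring trick is combined with an occasional extra multiplication. A minimal sketch of this square-and-multiply idea (Python's built-in pow(a, b, n) already does the equivalent; mod_pow is only for illustration):

```python
def mod_pow(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent mod modulus by repeated squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                 # this bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent >>= 1
    return result


print(mod_pow(688, 79, 3337))
print(mod_pow(688, 79, 3337) == pow(688, 79, 3337))
```

The loop runs once per bit of the exponent, so a 1024-bit exponent costs on the order of a couple of thousand multiplications rather than 2**1024 of them.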
• In practice, the convention is to use e=65537. Since this is relatively small, encoding is relatively fast.
• Chosen-plaintext attacks.
• One problem with public/private key systems is that since the encoding key is public, the attacker can try to work backwards. Given a coded message, she can encode all possible plaintext messages to see if one matches the code. For short messages, this may well be faster than trying all the keys. Symmetric algorithms don't have this particular weakness.
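With the toy key from the example above, this guessing attack takes a few lines. A sketch under the assumption that Eve knows the plaintext is one of the three-digit blocks 0 to 999; guess_plaintext is a name made up here.

```python
def guess_plaintext(ciphertext: int, e: int, n: int, candidates):
    """Encrypt every candidate with the PUBLIC key and look for a match."""
    for m in candidates:
        if pow(m, e, n) == ciphertext:
            return m
    return None


# Eve sees the coded block 1570 and knows only the public key (3337, 79).
recovered = guess_plaintext(1570, 79, 3337, range(1000))
print(recovered)
```

Eve never touches the private key d; she only needs the set of plausible messages to be small.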
• Here's what some keys look like, created with the command line "openssl" utility. (Another good command line tool for managing public/private keys, encryption, and signing is GnuPG.)
```
[msie@bob rsa-example]$ openssl genrsa -out private-1024.pem 1024
Generating RSA private key, 1024 bit long modulus
.........................................++++++
....++++++
e is 65537 (0x10001)

[msie@bob rsa-example]$ cat private-1024.pem
-----BEGIN RSA PRIVATE KEY-----
MIICXQIBAAKBgQDG4QTKUK6WbYvnaY1tXcCtUmfOW2eTnldcgSq4/t+ZVsWU5wmD
Zcq6ncOGu8pkooyYQeReOuifSye6GMQudfD4ej9PKhDrH7gQcQJk2vV01ojPlYG5
t5O6Coi7TvQy29waIOjThAqtg4H7Le4w0f4g0MJMlyOs3TKnz0qFx5jIqQIDAQAB
AoGAEjhDLBXAKN/YVVcCMebI5BgMkoclMgzri/n5ZAFVksK0TzPrVzJYJEiXxRwn
KpkJsFk5Brj23sEP3qiuMGN1s+R/zhCZpkk0A9rHVL7hHS+m51J+4jezbndcBIf5
4wIisa5H7qVQHSuR0iHclXV87oyDv2axNKJsgXE54vUIeEECQQDqSz3+AXrQLKX2
+0kQJFSQ8tprT9i81FKUkjeqqAbqcdoApPeFrah9ImFUh1+Cv/MO2QewMSbhO20+
VmfkwjMFAkEA2U3W02KLpeq1vfFMTjvjsV8VVvTKWzXYlsYU9ANjitcSNKmj1B59
yjYVg39n/XIyIVKsqjRishFUPvI7m734VQJBAKpenG2gVdYbIXQ/thlu0a+1aO6v
2UM2gfZXfPMzzBOfRo9BZlxmsyaLYYs+BU3mlrAtUVHl7AfMVtwFqPbH4KECQQC7
ch2hchwsHu5uzjqYMakTU4XA4J+9VhFi3bMtWc7/8M3Ph5W+YB750vVz3O8C/QKp
I/u1RkLsf25AbgtlKNWRAkAhNgH25llu4yKICbh1EWRpBPIsb9E3GR5gaSSC4q2W
Scmp7BEAcZdv51snJOGwbNlyPsz/xz8rcreKAsAkl+xi
-----END RSA PRIVATE KEY-----

[msie@bob rsa-example]$ openssl rsa -in private-1024.pem -pubout -out public.pem
writing RSA key
[msie@bob rsa-example]$ cat public.pem
-----BEGIN PUBLIC KEY-----
W2eTnldcgSq4/t+ZVsWU5wmDZcq6ncOGu8pkooyYQeReOuifSye6GMQudfD4ej9P
KhDrH7gQcQJk2vV01ojPlYG5t5O6Coi7TvQy29waIOjThAqtg4H7Le4w0f4g0MJM
lyOs3TKnz0qFx5jIqQIDAQAB
-----END PUBLIC KEY-----

# This will show the prime numbers p and q explicitly:
[msie@bob rsa-example]$ openssl rsa -in private-1024.pem -text
```
• More on keys
• A "Certificate" is someone's public key signed by a trusted party. Those whose job it is to be trusted, and whose public keys come installed in browsers, are called Certificate Authorities.
• There are whole books (The Open-source PKI Book, http://ospkibook.sourceforge.net/docs/OSPKI-2.4.6/OSPKI/ospki-book.htm) on Public Key Infrastructures, i.e. how to deal with keys.
• Private keys are often stored encrypted, with a user pass-phrase needed to access them. The reason is that the user can remember a 7 letter pass phrase better than he can remember a 128-bit random number.
• There are also a variety of file formats for storing the keys, X.509 being the most overarching, and PEM (Privacy-Enhanced Mail) being one of the more common.
• The security of RSA public keys depends on how hard it is to factor n. This is hard, but the numbers are, after all, public. (Symmetric keys are secret, and so they can be shorter and still resist equivalent brute force attacks.) Algorithms exist to factor such numbers without trying all of the roughly sqrt(n) candidate factors, and such mathematics is getting better all the time. Who knows how this trend will go. Schneier says "If you want your keys to remain secure for 20 years, 1024 bits is likely too short."
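To see why factoring breaks everything, here is a sketch that recovers the private exponent of the toy key from the earlier example using nothing but the public (n, e). Trial division is the naive method and only works because n is tiny; the function names are mine, and math.isqrt and pow(e, -1, m) need Python 3.8 or later.

```python
from math import isqrt


def trial_factor(n: int):
    """Find a factorization n = f * (n // f) by trying candidates up to sqrt(n)."""
    if n % 2 == 0:
        return 2, n // 2
    for f in range(3, isqrt(n) + 1, 2):
        if n % f == 0:
            return f, n // f
    raise ValueError("no factor found; n is prime")


def recover_private_exponent(n: int, e: int) -> int:
    """Redo Alice's key generation once p and q are known."""
    p, q = trial_factor(n)
    return pow(e, -1, (p - 1) * (q - 1))


print(trial_factor(3337))
print(recover_private_exponent(3337, 79))
```

For a 1024-bit n the loop would need around 2**512 steps, which is why the cleverer factoring algorithms mentioned above are what actually matter.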

• Diffie-Hellman
• Another way for Alice and Bob to get a shared secret, without *any* prior communication or shared knowledge, and without Eve being able to learn it.
• First method of its kind historically.
• It uses number theory and mod arithmetic:
1. Alice picks a prime p and a base g, and sends them to Bob; these can be public. (Say 47 and 3.)
2. Alice picks secret x (8); Bob picks secret y (10).
3. Alice sends Bob A=(g**x mod p) which is 28 here.
4. Bob sends Alice B=(g**y mod p) which is 17 here.
5. Alice computes SharedSecret = (B**x mod p) which gives 4.
6. Bob computes SharedSecret = (A**y mod p) which gives 4.
7. The same number because (g**x)**y = (g**y)**x!
8. For big numbers (hundreds of digits), Eve is left in the dark.
(Numbers from Tanenbaum Computer Networks.)
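The exchange with these small numbers can be checked in a few lines. Note that each side raises the value it received from the other side to its own secret power. In real use x and y would be large random numbers rather than fixed constants.

```python
p, g = 47, 3        # public parameters
x, y = 8, 10        # Alice's and Bob's secrets (never sent)

A = pow(g, x, p)    # Alice -> Bob
B = pow(g, y, p)    # Bob -> Alice

alice_secret = pow(B, x, p)   # Alice uses what Bob sent plus her own x
bob_secret = pow(A, y, p)     # Bob uses what Alice sent plus his own y

print(A, B, alice_secret, bob_secret)
```

Eve sees p, g, A and B on the wire, but neither x nor y.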
• Specified as an option in the SSL key-exchange standard, though in practice nearly everyone uses RSA.
• Eve can, however, successfully run a "man-in-the-middle" attack by intercepting all messages and numbers and substituting her own. She never learns the shared secret of Bob-Alice, but she never needs to. Instead she generates a shared Bob-Eve secret and a shared Eve-Alice secret, which Bob and Alice unknowingly use to encode their traffic. Eve decodes everything, re-codes it, and sends it along to the other party. The only way to beat attacks like this is for Bob and Alice to have some prior mutual secret, or to use some trusted authority to verify who they are (i.e. certificate authorities).
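The attack is easy to act out with the same small numbers. In this sketch Eve's secret z = 5 is an arbitrary made-up value; she replaces both public values in transit with her own.

```python
p, g = 47, 3
x, y = 8, 10          # Alice's and Bob's secrets
z = 5                 # Eve's secret (hypothetical value for this sketch)

A = pow(g, x, p)      # Alice sends this, but Eve intercepts it
B = pow(g, y, p)      # Bob sends this, but Eve intercepts it
E = pow(g, z, p)      # Eve forwards her own value to both sides instead

alice_key = pow(E, x, p)       # Alice thinks this is shared with Bob
bob_key = pow(E, y, p)         # Bob thinks this is shared with Alice

eve_alice_key = pow(A, z, p)   # Eve's copy of Alice's key
eve_bob_key = pow(B, z, p)     # Eve's copy of Bob's key

print(alice_key == eve_alice_key, bob_key == eve_bob_key)
print(alice_key == bob_key)
```

Alice and Bob each share a key with Eve, not with each other, and nothing in the bare exchange lets them notice.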

#### aside 1: using ssh with public/private keys

```
Here's the recipe, as of OpenSSH_3.6.1p2, SSH protocols 1.5/2.0, OpenSSL 0x0090701f
for logging in between two computers (call them "laptop.m.edu" and "remote.m.edu")

First we create public/private keys on laptop.m.edu,
which will be tied to the user and host given
in the shell environment variables.

on laptop.m.edu :
$ echo creating key for $LOGNAME@$HOSTNAME
creating key for who@laptop.m.edu
$ mkdir ~/.ssh/
$ cd ~/.ssh/
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/laptop/who/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /laptop/who/.ssh/id_rsa.
Your public key has been saved in /laptop/who/.ssh/id_rsa.pub.
The key fingerprint is:
7e:c6:f9:db:fc:38:4d:e9:73:82:e6:92:7b:59:c0:bd who@laptop.m.edu

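If a passphrase is given, then that phrase must be entered
each time the key is used, which is less convenient
but safer.  Be clear that without a passphrase,
anyone who can read your private key file can log in to your
remote accounts.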

Next, copy the public part of this into the file authorized_keys.
If this is the only key you'll use, you can just do

$ cp id_rsa.pub authorized_keys

Finally, copy that authorized_keys file to the remote machine that
you want to log in to:

$ scp authorized_keys who@remote.m.edu:.ssh/authorized_keys

That's it.  Now you should be able to do

who@laptop$ ssh remote.m.edu
who@remote$

Another way to do this is to encode your private key with a passphrase
but use an "agent" program to store it during your login session.
With this approach, you need to type the passphrase once after
you login to your laptop - then the running agent knows how to
unlock your private key when it needs to.

What's going on in any of these systems
is that ssh uses your private key (in laptop:.ssh/id_rsa)
to craft a message and send it to the remote machine.  There it uses
the corresponding public key in authorized_keys to see if you can get in.

The files look like this

$ cat id_rsa
-----BEGIN RSA PRIVATE KEY-----
MIICXAIBAAKBgQDFEp7NPmh4WSXvPY0yrCEA5uVZmidIEZrH/0VOCp5ZITJCD9mc
uqXu/ZYFtysiFSE5yTXsjz8UPcPbM6RHQnpyrTvVycg4yyeY+AKmCL1cLGsutWV+
s1M+wmUfBWUSgRKFopHDFxVerCUmJSMakW+pRpf9jVmOGYtdYyFilPZqWwIBIwKB
gD3v6MQpjow5Rm/C4zvPspniKtMEkAC1E2NtfC54Xab7zfeBUwVfO3b/PcdIMiCn
jhl5wH2MGyOeYiBSDw8U5KclZYlliIOZ/ZpxHCQ/5JfwVD/FZGGCXc/iekSCIvJK
dCwmvuXRSNndCRuJG8y/aEXfYM//qel7is7Bt3JZk0f7AkEA7RaUNExJI7KOLXV3
C6ubFzc6FDkPX5W123ajiakbak1993jacklaiej1001kbGy22mqXBsRK+DXQFf/R
VjXIuQJBANTK6YD4HmBlZTQcXo7dUgE/Kbi+fuN71oBhYKzrTivpAbddIB66WWgl
RNK7KcP8/NP5XFkbisbKb818yFBJGbMCQQDfilE4n7KsoQm11ScZocy+HiDRPR0f
m7ghx+1CCfgvGxUvA5Bu/uxnLbUKJIttixLN73h0GCltrxSnByR12vBzAkEAi9XM
pTVVylFCgVR4pwZ3t60qC7BiAzQfPmvgcaHxipHN7YZW758HjZTVdImfGmRroT13
Mz38HDvjEgGK8u41dQJBAKKKfp8LJS0R9BOuF0oWxhFZR4QbySnSUCnQqYVG8rGY
tgTLBzBtK0c9nYJEx9JEmY2SYg/fZiDiB3UN0Cb16/U=
-----END RSA PRIVATE KEY-----

$ cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAxRKezT5oeFkl7z2NMqwhAOblWZonSBGax/9FTgqeWSEyQg/Z
nLql7v2WBbcrIhUhOck17I8/FD3D2zOkR0J6cq071cnIOMsnmPgCpgi9XCxrLrVlfrNTPsJlHwVlEoEShaKR
wlJiUjGpFvqUaX/Y1ZjhmLXWMhYpT2als= who@laptop.m.edu

```

#### aside 2 - GnuPG

For encrypting email and for command line use, I tend to use gpg, the GNU Privacy Guard, successor to pgp (Pretty Good Privacy). Besides implementing the various algorithms, it maintains a database of keys in ~user/.gnupg and can fetch public keys from standard servers; for example, see http://www.keyserver.net/en/ or subkeys.pgp.net. My email GUI client (Mozilla Thunderbird) uses the command line "gpg" to handle encryption, so much of this can happen through a GUI.

See http://www.gnupg.org/(en)/documentation/howtos.html and http://www.gnupg.org/gph/en/manual.html

Here's how to do a few things from the command line.

• generating a key: gpg --gen-key, and then answer all the questions. (Sample: in "milo" account, passphrase = "silk, velvet")
• listing keys: gpg --list-keys
• editing your key (needs passphrase): gpg --edit-key UID
• fetching keys from a remote server: gpg --search mahoney@marlboro.edu (default server on cs is subkeys.pgp.net; see for example www.keyserver.net/en/)
• uploading to a server: gpg --send-keys
• exporting public keys: gpg --export UID > public.txt
• encrypting and signing a file: you need the public key of the recipient

#### 3. Abstract Protocols

• In practice all three of these algorithms - digests, symmetric encryption, public key - are used together to satisfy confidentiality, integrity, and authentication. A typical scheme runs something like this.
1. Alice generates a random session key K. She encrypts her long message M using 3DES with that key, i.e. C=3DES(K,M). She creates a digest of the message D with SHA-1.
2. Alice gets Bob's public key from a trusted authority Trent.
3. Alice creates a MAC (Message Authentication Code) by encoding a short block of (D, K, timestamp) with first her own private key, and then Bob's public key.
4. Alice then wraps the whole caboodle up with a short note as to what it is, and sends it to Bob. The whole thing looks something like this:
```
header : "Here's something from Alice, encoded with ...."
MAC    : RSA(BobPublicKey, RSA(AlicePrivateKey, "D,K,time"))
Code   : 3DES(K,M)

```
5. Bob gets the package, and he (or his software) figures out what to do.
6. First he gets Alice's public key from Trent.
7. Then he decodes the MAC with his own private key, which only he has - thus only he can read the message.
8. Then he decodes the MAC again with Alice's public key - thus ensuring that only Alice could have sent it.
9. Finally, with the key K in hand, he decodes the message with 3DES.
10. To check that the message arrived intact and unaltered, he computes the message digest D with SHA-1 and compares with the one Alice sent along.
11. To make sure that this isn't something that Eve recorded from last week and is sending again, he checks the encoded time stamp.
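Here is a toy end-to-end sketch of a scheme like this one, using only the Python standard library. Several things are simplifications or assumptions of mine rather than part of the scheme above: the RSA keys are built from small well-known Mersenne primes (far too small for real use), a SHA-256 keystream XOR stands in for 3DES, and instead of nesting the two RSA operations Alice wraps K with Bob's public key and separately signs (D, timestamp) with her private key.

```python
import hashlib
import secrets
import time

E = 65537


def make_keypair(p: int, q: int):
    """Return (public, private) toy RSA keys as (n, exponent) pairs."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # Python 3.8+
    return (n, E), (n, d)


alice_pub, alice_priv = make_keypair(2**89 - 1, 2**107 - 1)
bob_pub, bob_priv = make_keypair(2**61 - 1, 2**89 - 1)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in for 3DES(K, M): XOR with a SHA-256 counter keystream."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def signed_value(digest: bytes, stamp: int) -> int:
    """The number Alice signs: a hash binding digest D and the timestamp."""
    return int.from_bytes(hashlib.sha1(digest + stamp.to_bytes(8, "big")).digest(), "big")


def alice_send(message: bytes) -> dict:
    k = secrets.token_bytes(16)                       # random session key K
    stamp = int(time.time())
    digest = hashlib.sha1(message).digest()           # D
    n_b, e_b = bob_pub
    n_a, d_a = alice_priv
    return {
        "wrapped_key": pow(int.from_bytes(k, "big"), e_b, n_b),  # only Bob can unwrap
        "signature": pow(signed_value(digest, stamp), d_a, n_a),  # only Alice can make
        "time": stamp,
        "code": keystream_xor(k, message),
    }


def bob_receive(package: dict, max_age: int = 300) -> bytes:
    n_b, d_b = bob_priv
    n_a, e_a = alice_pub
    k = pow(package["wrapped_key"], d_b, n_b).to_bytes(16, "big")
    message = keystream_xor(k, package["code"])
    digest = hashlib.sha1(message).digest()           # recompute D
    if pow(package["signature"], e_a, n_a) != signed_value(digest, package["time"]):
        raise ValueError("bad signature, or message altered in transit")
    if abs(time.time() - package["time"]) > max_age:
        raise ValueError("stale timestamp: possible replay")
    return message


print(bob_receive(alice_send(b"Meet me at noon.")))
```

Each check in bob_receive corresponds to one of the goals: unwrapping K gives confidentiality, the signature gives authentication, the recomputed digest gives integrity, and the timestamp guards against replay.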
• The basic ideas of these security protocols are
1. Pick (or negotiate) a suite of algorithms.
2. Exchange or somehow agree on a session key to be used for symmetric encryption.
3. Use prior secrets or a trusted authority of some kind (i.e. public keys database, certificate authorities) to establish identities.
4. Ensure message integrity with digests.
5. Avoid replay attacks with timestamps.
6. Encode the bulk of the message or stream with a fast, symmetric cipher.
• Many, many variations depending on the requirements - bandwidth available, computing power, level of security required, convenience, shared secrets and their installation, ...
• Other protocols can accomplish related tasks - zero knowledge proofs, split secrets, blind signatures, multiple keys, and many others.
• A chain is only as strong as its weakest link, and these protocols are no exception. Moreover, the points of failure are usually not the ones you are looking at.
• If the government really wants to see your data, it's cheaper and easier to put in a camera to watch your terminal than to break the 256-bit encryption on your traffic.
• Few people break 56-bit keys. Many more people bribe vendors to give them credit card numbers from their database.

#### 4. SSH - Application level encryption

• SSH is a client/server application protocol that provides an encrypted, endpoint-authenticating, data-integrity-preserving channel, typically used for launching other remote applications.
• Warning: SSH comes in two versions, v1 and v2, which are totally different protocols. Further, there are significant differences between the two different implementations of the v1 and v2 standards, a commercial one (SSH Communications) and an open source one (OpenSSH). What I'll be saying is oriented mainly towards ssh.com's Unix v2 implementations.
• Operations over the channel
```
ssh server
ssh -l user server
# equivalently ssh user@server
# lots of other flags available, e.g. ssh -d [debug level]; try ssh -h
# escape from session with line-initial ~ (try ~?)
```
• Remote command execution
```
ssh server cmd
```
• Remote file transfer
```
# an ftp-ish client
sftp server
# a cp-ish client
scp from-file to-file
# which takes various flags: -p preserve attributes/timestamps,
# -r recursive; either from-file or to-file may be remote;
# file specification takes the form user@host:path, which allows
# for lovely things like (from A as user u1):
scp u2@B:foo/fup.html u3@C:
```
• Port forwarding. In its typical use ssh is a standard client/server application: a client talks, from some dynamically assigned tcp port, to a server at tcp/22. However ssh is also able to forward connections from a second port on the client end (say some arbitrary dynamic port), through the basic ssh connection, and on to a second port at the ssh server end, say to a POP server. Port forwarding turns out to be very useful for securing services and for firewall traversal. More on this later, probably, when we do firewalls.

• SSH has been developed largely as a replacement for:
• Ftp and telnet, file transfer and remote login protocols that involve sending passwords and data over the wire in plaintext.
• The Unix r-commands, rcp, rsh, rlogin that do respectively file transfer, remote command execution, remote login. The r-programs do away with passwords, but involve a weak form of authentication. A given host, as we touched on briefly in session 1 in connection with TCP session hijacking, can define other machines to be trusted. It does so by entering their hostnames in certain configuration files (globally, /etc/hosts.equiv, per user, ~/.rhosts). A user u on a remote trusted machine would then be able to connect to the trusting machine as local user u without having to supply a password. The main weakness in this scheme has to do with the form of authentication it uses: a request comes in to the trusting machine; it extracts the source IP address, does a reverse lookup to find the corresponding domain name, and compares the result to entries in the config files. This procedure is susceptible to IP spoofing and subversion of the DNS system, both of which turn out to be reasonably easy to do - those protocols have few security features.

• Under the hood
• SSH provides the core security services encryption, authentication, and integrity. It authenticates the server using public key cryptographic methods; it does bulk data encryption using symmetrical algorithms, leaving the choice of these to be negotiated between client and server; and it provides data integrity by way of the usual hash functions.

In all these respects, as we'll see, SSH resembles SSL. However there are important differences in the ways the two protocols operate. For example, in SSH the server is authenticated using public key cryptography, but this doesn't (yet) involve the use of X.509 certificates (the SSH v2 protocol allows for a server to pass a certificate to a client, but at this point all going implementations deal with the raw keys). Further, SSH has a much more extensive apparatus for dealing with client authentication than does SSL: in SSL, client authentication is optional and in practice rarely performed; when it is done, it amounts simply to a transfer of a digital certificate; SSH sessions, on the other hand, almost invariably include client authentication, and the protocol offers a number of modes of authentication.

• SSH v2 involves three sub-protocols--a transport protocol that handles connection setup, server authentication, and algorithm negotiation; an authentication protocol that handles user authentication; and a connection protocol that deals with the multiple data and control streams that pass through the channel. Here's the basic structure of the transport protocol, seen from the point of view of a connection-initiating sequence:
• After a generic TCP connect, the two sides exchange version strings (especially important when there are large populations of both v1- and v2-speakers floating around).
• Switch to a binary protocol. The packet format is going to look like this:
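```
uint32                                    packet_length
byte                                      padding_length
byte[packet_length - padding_length - 1]  payload
byte[padding_length]                      random padding
byte[MAC_length]                          MAC (i.e., hash)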
```
• Now start negotiating, and in particular we need to figure out what key exchange method to use, so each side sends a SSH_MSG_KEXINIT message. You might think this negotiation would take awhile--we're always going to be worried about how many round-trips the protocol consumes, right?--but in practice there's only one key exchange method defined--Diffie-Hellman with SHA-1 hashing--and the standard allows you to guess, i.e., to follow up your initial offer immediately with a packet built for the key exchange method you think the other party will agree to. Setting the first_kex_packet_follows flag in your initial offer means "the thing that follows isn't me sneezing, it's an answer to what I think you're going to say".

The various strings in the SSH_MSG_KEXINIT message consist of defined string constants for the various algorithms, comma-separated, and listed in decreasing order of preference. If both parties have the same algorithm in a given category ranked highest they use that; otherwise they use the algorithm most highly ranked by the client that the server knows how to do (and that's consistent with other algorithms selected). So here's what the initial offer looks like:

```
byte     SSH_MSG_KEXINIT
string   kex_algorithms
string   encryption_algorithms_client_to_server
string   encryption_algorithms_server_to_client
string   mac_algorithms_client_to_server
string   mac_algorithms_server_to_client
string   compression_algorithms_client_to_server
string   compression_algorithms_server_to_client
string   languages_client_to_server
string   languages_server_to_client
boolean  first_kex_packet_follows
uint32   reserved
```
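The selection rule described above (take the client's most preferred algorithm that the server also supports) can be sketched as follows. The function name is made up; the algorithm names in the example are ordinary SSH cipher names used only as sample data.

```python
from typing import Optional


def choose_algorithm(client_prefs: str, server_prefs: str) -> Optional[str]:
    """Pick one algorithm from two comma-separated preference lists.

    Walk the client's list in order of preference and return the first
    entry the server also offers. If both sides rank the same algorithm
    first, that is naturally the one returned.
    """
    server_set = set(server_prefs.split(","))
    for name in client_prefs.split(","):
        if name in server_set:
            return name
    return None  # no common algorithm: the connection cannot proceed


print(choose_algorithm("aes128-cbc,3des-cbc,blowfish-cbc", "3des-cbc,aes128-cbc"))
print(choose_algorithm("blowfish-cbc", "3des-cbc"))
```

The same function would be applied separately to each category in the message (encryption client-to-server, MAC server-to-client, and so on). This sketch ignores the caveat about staying consistent with the other algorithms selected.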
• Now guess, i.e., start in with Diffie-Hellman calculations. In what follows, V_C = the client's version string, V_S = the server's; I_C = the payload of the client's SSH_MSG_KEXINIT message, I_S the server's; K_S = the server's public key. The Diffie-Hellman operators g and p are defined in the standard, i.e., they're known to both parties in advance.
• Client
• Pick random x
• Send
```
byte  SSH_MSG_KEXDH_INIT
mpint e (= g**x mod p)
```
• Server (having received client's message)
• Pick random y, compute f = g**y mod p
• Compute
• K = e**y mod p (K is the Diffie-Hellman shared secret)
• H = SHA-1(V_C || V_S || I_C || I_S || K_S || e || f || K)
• Generate a signature s on H using server's private key
• Send
```
byte   SSH_MSG_KEXDH_REPLY
string K_S (this is the pub key)
mpint  f
string s
```
• Client
• Verify that K_S is really the host key for the server (should have a copy in our local store - this is how we avoid the man-in-the-middle attacks, and verify we know who we're talking to)
• Compute
• K = f**x mod p (client's copy of shared secret)
• H (same calculation as server performed)
• Verify s (i.e., we should be able to get H back by decrypting s with K_S; if we can do this we've verified the server)
• Client and server now generate the various session keys from K and H (the "shared secret", as we'll see too in the case of SSL, is really just input to further manipulation that produces the actual keys, the further manipulation consisting of hashing K, H, and some literal values defined in the standards documents)
• The next messages exchanged are a pair of SSH_MSG_NEWKEYs--i.e., each side announces it's about to start communicating using the keys and algorithms just negotiated
• All this is before it's even really got going!
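The key exchange walk-through above can be acted out in a short script. Everything concrete here is an assumption for illustration: p and g are toy values rather than the ones fixed in the standard, the version strings and KEXINIT payloads are placeholders, the host key is a toy RSA key built from Mersenne primes, and the helper encodings are only modelled on SSH's length-prefixed string and mpint formats.

```python
import hashlib
import secrets

P, G = 47, 3  # toy Diffie-Hellman parameters (the real ones are very large)

# Toy RSA host key for the server (hypothetical, far too small for real use).
HOST_E = 65537
_HP, _HQ = 2**89 - 1, 2**107 - 1
HOST_N = _HP * _HQ
HOST_D = pow(HOST_E, -1, (_HP - 1) * (_HQ - 1))  # Python 3.8+


def ssh_string(data: bytes) -> bytes:
    """Length-prefixed byte string."""
    return len(data).to_bytes(4, "big") + data


def mpint(n: int) -> bytes:
    """Non-negative integer, big-endian, with room for a leading sign bit."""
    if n == 0:
        return ssh_string(b"")
    return ssh_string(n.to_bytes(n.bit_length() // 8 + 1, "big"))


def exchange_hash(v_c, v_s, i_c, i_s, k_s, e, f, k) -> bytes:
    """H = SHA-1(V_C || V_S || I_C || I_S || K_S || e || f || K)."""
    parts = [ssh_string(v_c), ssh_string(v_s), ssh_string(i_c),
             ssh_string(i_s), ssh_string(k_s), mpint(e), mpint(f), mpint(k)]
    return hashlib.sha1(b"".join(parts)).digest()


V_C, V_S = b"SSH-2.0-toy_client", b"SSH-2.0-toy_server"      # placeholders
I_C, I_S = b"client-kexinit-payload", b"server-kexinit-payload"
K_S = HOST_N.to_bytes(32, "big")                              # public host key blob

# Client: pick random x, send e.
x = secrets.randbelow(P - 3) + 2
e = pow(G, x, P)

# Server: pick random y, compute f, K, H, and sign H with the host key.
y = secrets.randbelow(P - 3) + 2
f = pow(G, y, P)
k_server = pow(e, y, P)
h_server = exchange_hash(V_C, V_S, I_C, I_S, K_S, e, f, k_server)
s = pow(int.from_bytes(h_server, "big"), HOST_D, HOST_N)

# Client: compute its own K and H, then check the signature against K_S.
k_client = pow(f, x, P)
h_client = exchange_hash(V_C, V_S, I_C, I_S, K_S, e, f, k_client)
verified = pow(s, HOST_E, HOST_N) == int.from_bytes(h_client, "big")

print(k_client == k_server, h_client == h_server, verified)
```

Because H covers both version strings and both negotiation payloads, a signature over H also vouches for the negotiation that came before it.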

At this point the session can move in various directions. Most commonly the client will request an operation involving one of the other sub-protocols--and most commonly of all will move to client authentication. The form of this request is

```
byte   SSH_MSG_SERVICE_REQUEST
string service name ("ssh-userauth" | "ssh-connection")
```
If the server's prepared to proceed, it responds with a SSH_MSG_SERVICE_ACCEPT.

• Client authentication. SSH can do various forms of authentication. Without going into the implementation of these methods, they include:
• Password. The client authenticates against the password database on the SSH server. The difference between this and non-SSH remote login, e.g. via telnet, is that the password travels encrypted from client to server (i.e., when we reach this point, we're already talking across an encrypted channel)
• Trusted host.
• Every machine has a public/private key pair it can use to prove its identity, by the fact that each computer stores copies of the public key of every other computer it connects to. The protocol has each side prove to the other that it is who it's supposed to be. These public key databases are either installed by hand, or created the first time a computer is contacted. (And hopefully that wasn't the time Eve was being Evil.)
• (Per user) public key
• User generates a key pair using ssh-keygen. The private key is kept at the client end, stored in a file under ~/.ssh2, and is pointed to from ~/.ssh2/identification with an entry of the form
```
IdKey id_dsa_1024_a
```
...where id_dsa_1024_a is the name of the file holding the private key. A copy of the public key is moved to the user's account on the server (under ~/.ssh2 again), and pointed to from ~/.ssh2/authorization with an entry of the form
```
Key id_dsa_1024_a.pub
```
• When the user wants to connect to the server, the SSH client sends a copy of the public key half of the key pair it wants to authenticate with; the server checks to see that this key is authorized to connect to the account, then encrypts a challenge message with that key and returns the encrypted challenge to the client; the client decrypts the message (with the private key), hashes it, and returns the hash to the server; the server then compares this value to its own hash of the challenge. This sequence assures the server that the entity presenting the public key and asking for authorization in fact holds the corresponding private key.

Observe that there are some subtleties here--for example the client hashes the decrypted challenge (rather than returning it as is) to avoid the possibility that the challenge was actually something cleverly but maliciously chosen by the server--e.g. the (henceforth non-repudiable) plaintext "I hereby bequeath all my worldly possessions to the owner of server S", or perhaps a message previously encrypted with the user's public key.
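The challenge-response sequence just described can be sketched with a toy RSA key. The key below is built from two Mersenne primes purely for illustration, and the function names are mine; a real implementation would use full-size keys and the exact message formats from the protocol.

```python
import hashlib
import secrets

E = 65537
_P, _Q = 2**61 - 1, 2**89 - 1
N = _P * _Q                              # user's public modulus
D = pow(E, -1, (_P - 1) * (_Q - 1))      # user's private exponent (Python 3.8+)


def server_make_challenge(pub_n: int, pub_e: int):
    """Server: pick a random challenge and encrypt it with the user's public key."""
    challenge = secrets.randbelow(pub_n - 2) + 2
    return challenge, pow(challenge, pub_e, pub_n)


def client_respond(encrypted: int, priv_n: int, priv_d: int) -> str:
    """Client: decrypt with the private key, then return only a hash of the result."""
    challenge = pow(encrypted, priv_d, priv_n)
    return hashlib.sha1(str(challenge).encode()).hexdigest()


def server_check(challenge: int, response: str) -> bool:
    """Server: compare the response with its own hash of the challenge."""
    expected = hashlib.sha1(str(challenge).encode()).hexdigest()
    return secrets.compare_digest(expected, response)


challenge, encrypted = server_make_challenge(N, E)
print(server_check(challenge, client_respond(encrypted, N, D)))      # key holder
print(server_check(challenge, client_respond(encrypted, N, 3)))      # impostor
```

Since the client only ever returns a hash, the server never receives a raw private-key operation on a value of its own choosing.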

• Port forwarding
• An SSH connection involves, minimally, a TCP connection between a client at a dynamic port and a server at port 22. It may also involve a kind of virtual connection between another pair of ports--one on the client end, one on the server end--that goes through the basic SSH connection.

Let's say you're on A and you'd like to access a POP server on B--you'd like to, but you're worried about your POP username and password going in plaintext over the network. Let's say also that SSH facilities exist on both A and B. What port forwarding lets you do is to pass the POP connection through the SSH one. Three things are required:

• At the server end, the SSH server will make a TCP connection to the POP server
• At the client end, the SSH client will make a connection to the SSH server; it will also listen for connections on a second (arbitrary) TCP port (let's say 7773).
• Now when you want to read mail from A, you tell your mail client to connect to port 7773 on A. But that port is really owned by your SSH client; it forwards stuff on to the SSH server, which forwards it again to the POP server. Return traffic goes back through the same chain.
For example, if from A you say
```  ssh -L 7773:localhost:111 B
```
...where -L means "I want local forwarding" (vs. -R, "I want remote forwarding", which we'll get to in a second), and where the triple "7773:localhost:111" indicates, in order: a local entry point for an application client; the service provider (n.b. "localhost" here doesn't mean A, the machine initiating the SSH connection, rather it's relative to B, the machine receiving it); and the relevant port on the service provider. What gets set up is the following arrangement: the application client on A connects from a dynamic port to the SSH client's listening port 7773 (as specified on the command line); the SSH client connects from another dynamic port on A to the SSH server at the well-known port 22 on B; and the SSH server connects from a dynamic port on B to the service at port 111 on B.

This command still creates an ssh session on B, just as it would without the -L flag. It just has a side effect as well - the two ends of the connection listen for local connections on their respective computers, and pass that stuff back and forth as well.

The result is that on A, "telnet localhost 7773" will actually ring port 111 on B.
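To make the plumbing concrete, here is a rough sketch of just the relay part of local forwarding: a listener on the loopback interface that copies bytes to and from some target. It is not SSH (nothing is encrypted, and there is no second hop through an SSH server); the function names are invented for the example.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src reaches end-of-file."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)   # pass the end-of-file along
        except OSError:
            pass


def start_forwarder(target_host, target_port, listen_port=0):
    """Listen on 127.0.0.1 and relay each incoming connection to the target.

    listen_port=0 lets the OS pick a free port. Returns the listening socket.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", listen_port))
    listener.listen()

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return   # listener was closed
            upstream = socket.create_connection((target_host, target_port))
            # One thread per direction: client -> target and target -> client.
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

In the real thing the "target" side of this relay is the encrypted SSH connection to B, and the SSH server on B performs the matching relay to port 111.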

• Port forwarding with the -L flag is "local" forwarding. Conversely we have "remote" port forwarding with -R. If for example you're on C and you say
```  ssh -R 15658:localhost:80 D
```
...then the SSH client on C will
• Set up a TCP connection to the webserver on C (port 80; with -R, "localhost" is relative to the machine running the SSH client)
• Make an SSH connection to the SSH server on D, and tell it to grab port 15658 and send through anything that comes in to that port
The result is that any web client pointed at http://D:15658/ will see the website governed by the server on C.
• There is a caveat about who's allowed to connect to those high ports. Go back to the previous "local" case: we open a port on A, 7773, and tell our mail client on A that if it connects to 7773, it'll be magically--and securely--transported to the mail server. Fine, but that port 7773, that's just a regular old TCP port, so could a mail client on any other host connect to (A, 7773) and get to the mail server?

The answer turns out to be no, by default, but this can be overridden. By default, the application client has to be on the same machine as the SSH client. The way this is enforced has to do with the way network sockets are allocated. When you say "gimme TCP port 7773", by default a socket is "bound", for that port, on all your machine's network interfaces. It is possible however to restrict that binding, and that's what's done in the case of port forwarding--you get the port, but it's only on the (machine-internal) "loopback" interface associated with the IP address 127.0.0.1. This means that someone trying to reach this port will have to be able to do so at the destination address 127.0.0.1--and only another process on the same machine can do that.

The default behavior can be overridden in a user's config file (~/.ssh2/ssh2_config) with an entry like

```  GatewayPorts yes
```
With such an entry in place, the application client and SSH client can communicate over any IP link.
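The difference between the two behaviours comes down to the address handed to bind(). A minimal sketch, assuming a made-up helper name and a `gateway_ports` flag that merely mimics the config option (this is not how any particular SSH client is written):

```python
import socket


def open_forwarding_listener(port=0, gateway_ports=False):
    """Open a listening TCP socket the way a forwarding client might.

    gateway_ports=False binds to the loopback address only, so only local
    processes can connect; gateway_ports=True binds to every interface.
    port=0 asks the OS for any free port.
    """
    host = "0.0.0.0" if gateway_ports else "127.0.0.1"
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    return sock
```

Calling `getsockname()` on the returned socket shows which address it is bound to; tools like netstat show the same thing from outside the process.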
• Port forwarding can also be extended on the SSH server side (to take the case of "local" port forwarding). To stick with our first example, when we say
```  ssh -L 7773:localhost:111 B
```
..."localhost" tells B that the destination server is on B itself, at port 111. But we can insert any destination we like here. For example
```  ssh -L 7773:www.pardons-r-us.com:80 B
```
means
• listen locally on 7773
• Make an SSH connection to B
• Send anything that comes in to 7773 over to B
• Have B send anything that comes through the pipe on to pardons-r-us
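The three-part forwarding argument can be pulled apart mechanically. The helper below is a hypothetical illustration of how the triple reads, not code from any SSH implementation; it handles only the simple "port:host:port" form used in these notes.

```python
def parse_forward_spec(spec: str):
    """Split 'listen_port:dest_host:dest_port' into (int, str, int)."""
    listen_port, dest_host, dest_port = spec.split(":")
    return int(listen_port), dest_host, int(dest_port)


def describe_local_forward(spec: str, ssh_server: str):
    """Return the steps implied by 'ssh -L <spec> <ssh_server>' as plain text."""
    listen_port, dest_host, dest_port = parse_forward_spec(spec)
    return [
        f"listen locally on {listen_port}",
        f"make an SSH connection to {ssh_server}",
        f"send anything that comes in to {listen_port} over to {ssh_server}",
        f"have {ssh_server} connect to {dest_host} port {dest_port} and pass it on",
    ]


if __name__ == "__main__":
    for step in describe_local_forward("7773:www.pardons-r-us.com:80", "B"):
        print(step)
```

Note that the destination host name is resolved from the SSH server's point of view, which is why "localhost" in the earlier example meant B rather than A.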
• Uses of extended port forwarding
• Firewall piercing. Let's say you're on J and you want to get to some service on L, let's say POP again, tcp/111. But what if there's a firewall K between J and L, blocking tcp/111? Sounds like trouble, but if K is a bastion host supporting SSH, you may be able to do
```  ssh -L 5159:L:111 K
```
...and point your local mail client at port 5159.

• And here's an example that involves extending port forwarding from both the client side and the server side. Imagine you've got a network that has a webserver intended for organization-internal use. Clients on the network can access the webserver directly, but at the edge of your network you've got a firewall that blocks access to port 80. Then all of a sudden you sprout a branch office in, say, Port Moresby. You'd like those guys to be able to access the webserver, but the firewall gets in the way. However if the local (home office, I mean) firewall either provides SSH service or will pass SSH traffic through to a device behind it that does, we can extend the previous example as follows:

On PM-firewall:

• Turn on GatewayPorts
• Do: ssh -L 4488:W:80 L
• Do something, e.g. with TCP wrappers, to ensure that only branch office clients access PM-firewall at 4488
And inform Port Moresby users that the URL for the home webserver is http://PM-firewall:4488/

This example is artificial and restricted in a number of ways--starting with the fact that in the case of web service you'd probably just run an SSL-based server at the home end, together with some form of client authentication. But there are other services, and the overall setup is in any case instructive.

Observe that

• We have two unprotected networks--unprotected by SSH--at either end of the arrangement: there's nothing to ensure the privacy, integrity, etc., etc. of communications between the browsers and PM-firewall, similarly no protection between L and W.
• But we're not worried about this because by hypothesis those are our networks. Either our users are angels and we have nothing to worry about from them, or they're not and we do. Either way, we can take whatever additional security precautions we like in these zones.
• This is not however the case in the intervening Internet--there we control nothing, and we have reason to believe not everything and everyone is angelic. But we don't worry about this either because we have a tunnel, an encrypted, authenticating, integrity-checking pipe through the Internet, whose two endpoints are on our routers.
• This sort of structure captures the essential idea behind "virtual private networks".

#### 5. SSL/TLS - Socket level encryption

• History. Netscape essentially just dreamed SSL up. Not capriciously of course--they did so at a conjuncture in web history when a mechanism for providing secure web transactions was badly needed. SSL has evolved considerably since its original incarnation--evolved in its technical underpinnings, changed names (from SSL to TLS), changed stewardship (it's now defined in a set of RFCs and its further development will be under the aegis of an IETF working group). A transition from SSLv3 to TLS is currently underway in deployed web software.

• Characterization. It's a software layer between the application and TCP. It offers services to the layer above (it provides an encrypted channel, integrity checks, authentication of endpoints; from an application programming point of view, think of this as involving a replacement of regular old TCP socket calls with SSL ones) and it relies on services from the layer below (in particular it counts on TCP giving reliable transport). Note in particular that while SSH is primarily a command-line application, SSL is primarily a library toolkit for application developers. (Though it does come with command line tools, like the "openssl" I was using on the command line earlier.)

• A walk through a basic SSL session.
• SSL defines a set of control and data-bearing message types and subtypes, and specifies how those messages can be sequenced to form meaningful interactions.
• The pieces are the same as we've already seen: exchange keys, prove identities, and then use a session encryption key for the rest. One difference is that typically only the server proves its identity to the client. This is motivated by browser/server interactions, since you need to know it's the Amazon you're giving your credit card to, but all Amazon wants is your card number.
• A typical SSL engagement looks something like this (C>S = client to server, S>C = server to client):
• C>S: SYN
• S>C: SYN + ACK of client SYN
• C>S: ACK of server SYN
• C>S: Handshake:ClientHello, carrying cipher_suites (offer of supported cipher clusters, represented as two-byte codes; the client is offering different combinations of signing algorithms, digest algorithms, and so on); random (4 bytes GMT + 28 bytes random); version; session_id (first time through the client leaves this blank); compression_method (the only one defined is null...)
• S>C: Handshake:ServerHello (contains essentially the same fields as the ClientHello message. Server fills in session_id, for uses we'll see shortly. What's in cipher_suite are the algorithms the client and server will actually use--i.e., the server decides)
• S>C: Handshake:Certificate (here's my X.509 certificate; see the example below)
• S>C: Handshake:ServerHelloDone
• C>S: Handshake:ClientKeyExchange (client selects pre_master_secret (= two bytes designating version + 46 bytes of random), encrypts it with the server's public key (having verified the server's certificate and extracted the key) and sends; each side now turns the pre_master_secret into the master_secret, and then turns the master_secret into a set of session keys; see below for further details)
• C>S: ChangeCipherSpec (my next message will use the encryptions we agreed on)
• C>S: Handshake:Finished. The client now sends a digest of all the previous handshake messages so the server can verify the integrity of the messages received; the point is to prevent the possibility of an attacker injecting bogus handshake messages. In the case of SSLv3, the contents of the Finished message are an MD5 hash followed by a SHA-1 hash; here's how the MD5 hash is produced:
```md5_hash = md5(master_secret + pad2 +
               md5(concatenated_handshake_messages + sender + master_secret + pad1))
where sender is a constant, different for client and server
pad1 = 48 0x36 bytes
pad2 = 48 0x5c bytes
```
• S>C: ChangeCipherSpec (my next message will use the encryptions we agreed on)
• S>C: Handshake:Finished (here's a digest of what I just said)
• C>S: ApplicationData (blah blah blah)
• S>C: ApplicationData (yak yak yak)
• Whichever side closes first sends Alert: warning, close_notify (the point of which is to prevent a truncation attack, i.e., an attacker inserting a premature FIN; they can't insert a bogus close_notify because of the integrity checks built into SSL), followed by its FIN
• The other side answers with its own Alert: warning, close_notify, an ACK of FIN, and its own FIN
• The first side finishes with an ACK of FIN
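The MD5 half of the SSLv3 Finished computation given above translates almost directly into code. A small sketch, assuming the sender constants from the SSLv3 specification ("CLNT" and "SRVR") and using placeholder byte strings for the master secret and handshake transcript:

```python
import hashlib

PAD1 = b"\x36" * 48          # pad1: 48 bytes of 0x36
PAD2 = b"\x5c" * 48          # pad2: 48 bytes of 0x5c
SENDER_CLIENT = b"CLNT"      # 0x434C4E54
SENDER_SERVER = b"SRVR"      # 0x53525652


def finished_md5(master_secret: bytes, handshake_messages: bytes, sender: bytes) -> bytes:
    """MD5 part of the SSLv3 Finished message (a nested, keyed hash)."""
    inner = hashlib.md5(handshake_messages + sender + master_secret + PAD1).digest()
    return hashlib.md5(master_secret + PAD2 + inner).digest()


if __name__ == "__main__":
    secret = b"\x01" * 48                      # placeholder master_secret
    transcript = b"ClientHello...ServerHello"  # placeholder handshake bytes
    print(finished_md5(secret, transcript, SENDER_CLIENT).hex())
    print(finished_md5(secret, transcript, SENDER_SERVER).hex())
```

Because the sender constant differs, the client's and server's Finished values differ even over the same transcript, so one side's message cannot simply be reflected back at it.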
• Example X.509v3 certificate
```[root@at ssl.crt]# openssl x509 -noout -text -in server.crt
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1 (0x1)
        Signature Algorithm: md5WithRSAEncryption
        Issuer: C=XY, ST=Snake Desert, L=Snake Town, O=Snake Oil, Ltd, OU=Certificate Authority, CN=Snake Oil CA/Email=ca@snakeoil.dom
        Validity
            Not Before: Feb 14 19:03:52 2001 GMT
            Not After : Feb 14 19:03:52 2002 GMT
        Subject: C=XY, ST=Snake Desert, L=Snake Town, O=Snake Oil, Ltd, OU=Webserver Team, CN=at.marlboro.edu/Email=root@at.marlboro.edu
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
                Modulus (1024 bit):
                    00:c1:b1:97:57:c8:ed:cc:2f:aa:c7:6e:42:db:24:
                    99:d4:67:9c:d7:6a:f3:6d:cb:37:54:ea:8e:20:8f:
                    df:8f:97:75:12:da:91:d4:49:86:e4:fd:d8:a9:cf:
                    0b:7e:41:b5:8c:c7:a4:f0:3b:ac:db:68:65:86:40:
                    d0:08:86:14:21:d7:46:a6:2d:e8:18:97:59:11:60:
                    1d:96:c2:cc:d5:91:48:ee:a1:a2:d7:c9:8d:9f:92:
                    96:b4:d0:2f:a3:c3:4a:36:ed:c9:ce:09:c9:2a:53:
                    10:c7:55:56:b4:1f:17:73:d1
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                email:root@at.marlboro.edu
            Netscape Comment:
                mod_ssl generated test server certificate
            Netscape Cert Type:
                SSL Server
    Signature Algorithm: md5WithRSAEncryption
        83:c3:af:99:8b:2e:21:5d:88:ee:c0:7b:4c:5d:89:83:25:37:
        96:87:ba:ea:73:3b:04:5a:d4:07:75:e6:8f:93:cd:3b:f7:01:
        8a:f9:5e:96:04:65:75:73:24:20:7a:23:89:f9:a8:e3:7f:85:
        b8:85:5b:5b:56:3f:be:c9:83:62:1f:37:95:80:10:5b:19:8a:
        03:d0:60:cd:0d:66:91:c5:ba:f5:5a:65:99:6f:32:ec:7a:72:
        7c:d4:aa:32:67:73:d1:1d:7f:eb:c1:be:c2:85:87:89:7f:90:
        ec:dc
[root@at ssl.crt]#
```
• SSL record protocol
• All SSL messages--control and data--are carried in a record format
• Messages, records, segments: an SSL implementation may put multiple messages in one record. The chunk of application data that goes into a record might correspond to a single write() call (or an implementation might buffer). There may be a one-to-one correspondence between records and TCP segments (or an implementation might try to cram multiple records into one segment).
• record format:
• type (application_data | alert | handshake | change_cipher_spec)
• version
• length
• data
• the form of the data will depend on its type: a change_cipher_spec message takes up a single byte; an alert consists of a "level" (warning | fatal) and a "value" and takes up two bytes; handshakes have a four-byte header (one byte of type, three of length) and a body determined by the type
• in all cases except handshake messages prior to Finished during an initial set up, the data section of a record will include the actual contents and a digest of those contents (and possibly some padding out to the block size of encryption algorithm), and the digest, the contents, and the padding are then encrypted
```data = encrypted(contents +
                 MAC +
                 padding)
```
• in the case of TLS, the MAC is computed as
```MAC = hmac_hash(mac_signing_secret,
sequence_number +
type +
version +
data_length +
contents)
```
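The TLS MAC formula above maps onto Python's hmac module fairly directly. A sketch under some assumptions: field widths follow TLS 1.0 (8-byte sequence number, 1-byte type, 2-byte version, 2-byte length), the hash defaults to SHA-1, and the function name and sample values are invented for the example.

```python
import hashlib
import hmac
import struct


def tls_record_mac(mac_secret: bytes, sequence_number: int, content_type: int,
                   version: tuple, contents: bytes, hash_name: str = "sha1") -> bytes:
    """HMAC over sequence_number + type + version + data_length + contents."""
    header = struct.pack(
        "!QBBBH",
        sequence_number,     # 64-bit counter, never sent on the wire
        content_type,        # e.g. 23 = application_data
        version[0],          # major version
        version[1],          # minor version
        len(contents),       # length of the unencrypted contents
    )
    return hmac.new(mac_secret, header + contents, hash_name).digest()


if __name__ == "__main__":
    mac = tls_record_mac(b"k" * 20, 0, 23, (3, 1), b"GET / HTTP/1.0\r\n\r\n")
    print(mac.hex())
```

Including the sequence number means an attacker can't replay or reorder otherwise valid records: the same contents produce a different MAC at a different position in the stream.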
• Further SSL
• Resumption
• SSL distinguishes between connections (mapping one-to-one to TCP connections) and (virtual) sessions. By retaining state information at the session level, certain performance optimizations are possible.
• A given interaction between a client and server--say a shopping-cart sequence--may involve multiple TCP connections (things don't have to be like that, but the fact of the matter is that browsers do parallelize requests through multiple connections, and servers do have connection timeouts and caps on the number of requests that can be sent through a given connection). We might want to protect all the connections, but the SSL handshake is extremely expensive.
• Session resumption allows a client and server to reuse a master secret from a prior interaction, and avoid most of the expense of the handshake. During an initial SSL setup, the client leaves the session_id field in the ClientHello message blank. The server fills this in in the answering ServerHello. Client and server cache this value, along with the associated master_secret and negotiated cipher suite. Next time through, the client reuses the session_id (or to be more precise, the client sets up a new TCP connection, and then asks itself if it has a cached session_id previously negotiated with that server; if so it uses it). If the server agrees to the resumption it returns a ServerHello with the same session_id value, and then immediately does a ChangeCipherSpec, short-circuiting the rest of the handshake. Both sides now have to recompute the encryption keys etc. from the cached master_secret and the random values passed in this most recent ClientHello/ServerHello pair.
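The server's side of the resumption decision amounts to a cache lookup. Here is a toy model of it; the class and function names are hypothetical, and `full_handshake` stands in for the whole expensive key exchange (it is just a callable returning a master secret and cipher suite).

```python
import os


class SessionCache:
    """Maps session_id -> (master_secret, cipher_suite)."""

    def __init__(self):
        self._sessions = {}

    def store(self, master_secret: bytes, cipher_suite: str) -> bytes:
        session_id = os.urandom(32)            # server-chosen identifier
        self._sessions[session_id] = (master_secret, cipher_suite)
        return session_id

    def lookup(self, session_id: bytes):
        return self._sessions.get(session_id)


def answer_client_hello(cache, client_session_id, full_handshake):
    """Return (session_id, master_secret, cipher_suite, resumed)."""
    cached = cache.lookup(client_session_id) if client_session_id else None
    if cached is not None:
        # Resumption: echo the same session_id and skip the key exchange.
        master_secret, cipher_suite = cached
        return client_session_id, master_secret, cipher_suite, True
    # Blank or unknown session_id: do the full handshake and cache the result.
    master_secret, cipher_suite = full_handshake()
    session_id = cache.store(master_secret, cipher_suite)
    return session_id, master_secret, cipher_suite, False
```

Even on the resumed path, fresh session keys still have to be derived from the cached master_secret plus the new client and server random values; the sketch leaves that step out.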
• Performance
• Rescorla reports that resumption can involve a 20x speedup over the normal handshake. The gain is mainly seen on the server side (since the server no longer has to do a private key decrypt of the pre_master_secret), so this translates directly to increased server throughput, i.e., an ability to handle more connections/second.
• Cache handling is not trivial
• Sizing. Too large a cache and cache search becomes the overall bottleneck in the system.
• Access arbitration.
• In the case of Apache, "the webserver" consists of multiple connection-handling processes. These independent processes do not naturally share a memory store. And yet connections from the same client may reach several of those processes, so we'd really like to build such a store to maximize cache hits. There are various approaches to doing this: files and file locking, shared memory segments...
• The situation is even more complicated when "the webserver" is a whole farm of machines, and when any given request could reach any one of them. Here it's even harder to come up with a unified session cache, but see for example Cox and Thorpe's Apachecon 2000 paper proposing a dedicated session cache server that would sit behind a whole server farm.
• Client authentication
• In its most common mode SSL involves only server authentication (the client typically identifying itself in the flow of application data--e.g. credit card coming as part of an HTML form submission), but it can do client authentication as well. We'll see that this leads to some complications in the case of HTTP over SSL, but for the moment here are the mechanics (C>S = client to server, S>C = server to client):
• C>S: ClientHello
• S>C: ServerHello
• S>C: Certificate
• S>C: CertificateRequest
• S>C: ServerHelloDone
• C>S: Certificate
• C>S: ClientKeyExchange
• C>S: CertificateVerify (client signs something to verify its identity; why doesn't the server need to do this?)
• Rehandshake
• It's possible for either client or server to ask for a renegotiation in mid-connection. One scenario for doing so would be a server wanting to move to stronger ciphers when a client jumps to an especially sensitive area of the website.
• The client can initiate a rehandshake simply by sending a new ClientHello (and client and server then proceed through a whole new handshake). The server can do so by sending a HelloRequest, to which the client responds with a ClientHello and we go through the usual business.
• Performance
• The bottom line is that SSL is expensive--expensive in terms of packets, of round trips, of computational time.

Note also that once you've done encryption, you can't do any compression - the signal should look random at that point. (The SSL protocol does include some provisions for doing compression first, but apparently no one uses them.)

For a point of comparison, here's the start of a simple (non-SSL) HTTP request for a text file between a client and a server on the same LAN. Notice that it takes six packets and just over three ms from receipt of client SYN to the point at which the server pushes the first data packet out the door (we're looking at this from the point of view of tcpdump running on the server; with some of the tcpdump output excised to preserve sanity):

```# 1: client SYN
10:00:03.799212 mdhcp244.marlboro.edu.1071 > at.marlboro.edu.www: S
# 2: server SYN + ACK-of-client-SYN
10:00:03.799276 at.marlboro.edu.www > mdhcp244.marlboro.edu.1071: S
# 3: client ACK-of-server-SYN
10:00:03.800850 mdhcp244.marlboro.edu.1071 > at.marlboro.edu.www: . ack 1
# 4: client http request
10:00:03.801705 mdhcp244.marlboro.edu.1071 > at.marlboro.edu.www: P 1:296(295)
# 5: server ACK of request
10:00:03.801753 at.marlboro.edu.www > mdhcp244.marlboro.edu.1071: . ack 296
# 6: server's first data packet
10:00:03.802412 at.marlboro.edu.www > mdhcp244.marlboro.edu.1071: P 1:1449(1448) ack 296
```
• By contrast, here's the output of the ssldump tool for an HTTPS request for the same object (I've inserted markers to show the mapping to the underlying TCP segments). The first timestamp for each message is the absolute offset from the conclusion of TCP setup; the second is the relative offset from the time of the last SSL event. In this case it takes 13 packets to reach the first one bearing application data, and the process takes just over 31 ms, roughly an order of magnitude longer than the plain HTTP request. The most striking fact is that half the overall time is taken up in the interval between the server's receipt of ClientKeyExchange and the server's send of ChangeCipherSpec. It's in this interval that the client's computing the session keys from the pre_master_secret and that the server's doing the same--having however first to do a private key decrypt on the pre_master_secret. In fact on first principles it's precisely such private key signings and decrypts that are likely to soak up the bulk of computational time.
```*****PACKETS 1-3*****
New TCP connection #1: mdhcp244.marlboro.edu(1081) <-> at.marlboro.edu(443)
*****PACKET 4*****
1 1  0.0022 (0.0022)  C>S SSLv2 compatible client hello
Version 3.0
cipher suites
SSL2_CK_RC4
SSL2_CK_RC4_EXPORT40
SSL2_CK_RC2
SSL2_CK_RC2_EXPORT40
SSL2_CK_DES
SSL2_CK_3DES
SSL_RSA_WITH_RC4_128_MD5
Unknown value 0xfeff
SSL_RSA_WITH_3DES_EDE_CBC_SHA
Unknown value 0xfefe
SSL_RSA_WITH_DES_CBC_SHA
SSL_RSA_EXPORT1024_WITH_RC4_56_SHA
SSL_RSA_EXPORT1024_WITH_DES_CBC_SHA
SSL_RSA_EXPORT_WITH_RC4_40_MD5
SSL_RSA_EXPORT_WITH_RC2_CBC_40_MD5
*****PACKET 6*****
1 2  0.0034 (0.0012)  S>CV3.0(74)  Handshake
ServerHello
Version 3.0
random[32]=
3a 91 44 35 a0 ef 3a 08 a5 31 6d 0c 41 88 32 51
28 af f9 d9 33 61 98 23 59 4a 0f 82 eb 2d b4 d0
session_id[32]=
5f 14 ec 6b d0 05 e8 31 f4 d5 fc ab 12 13 a2 4c
fe 1e b5 02 61 69 47 ec 67 78 e3 24 37 bb b5 a7
cipherSuite         SSL_RSA_WITH_RC4_128_MD5
compressionMethod                   NULL
1 3  0.0034 (0.0000)  S>CV3.0(841)  Handshake
Certificate
1 4  0.0034 (0.0000)  S>CV3.0(4)  Handshake
ServerHelloDone
*****PACKET 8*****
1 5  0.0118 (0.0083)  C>SV3.0(132)  Handshake
ClientKeyExchange
EncryptedPreMasterSecret[128]=
67 ee bf 34 28 ff 1a 8f 1b dd d2 4c e1 8e 78 9e
57 24 45 1e a4 5b c9 02 53 ea 42 7f 1c d7 a9 d3
8f e2 03 ae b9 d8 00 8e e4 d9 c7 eb 03 b8 d9 6a
ee 7c 3c bb 9e 88 57 fc 01 06 4b ef 78 cf c1 b9
5a 8e a4 c8 35 e2 37 ca bc 6b 79 ab be fd d6 43
dc a1 1d cf 24 bb b2 20 85 1e f1 f8 e8 62 96 63
52 c1 07 32 f2 cb 90 9f 59 4d a5 7f 40 32 ee 08
f9 ef 10 ce e0 e4 0a ca 4b d8 8a 24 9e e3 b1 65
*****PACKET 10*****
1 6  0.0229 (0.0110)  C>SV3.0(1)  ChangeCipherSpec
1 7  0.0229 (0.0000)  C>SV3.0(56)  Handshake
*****PACKET 11*****
1 8  0.0267 (0.0038)  S>CV3.0(1)  ChangeCipherSpec
1 9  0.0267 (0.0000)  S>CV3.0(56)  Handshake
*****PACKET 12*****
1 10 0.0298 (0.0030)  C>SV3.0(310)  application_data
*****PACKET 13*****
1 11 0.0312 (0.0014)  S>CV3.0(3004)  application_data
```
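The "half the overall time" claim can be checked against the absolute offsets in the trace. A few lines of arithmetic, with the numbers copied from the ssldump output above (the variable names are mine):

```python
# Absolute offsets in seconds, read off the ssldump trace.
t_client_key_exchange = 0.0118        # ClientKeyExchange seen at the server
t_server_change_cipher_spec = 0.0267  # server sends ChangeCipherSpec
t_first_server_data = 0.0312          # first application_data from the server

interval = t_server_change_cipher_spec - t_client_key_exchange
fraction = interval / t_first_server_data

print(f"interval: {interval * 1000:.1f} ms")
print(f"share of total: {fraction:.0%}")
```

The interval comes out at a little under 15 ms of the roughly 31 ms total, i.e. just under half.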
• The extent to which pre_master_secret decryption time dominates is more clear in the following trace from Rescorla (using instrumented openssl programs, and eliminating network latency; n.b. units = ms here):
```Client                                 Server
0.07 Write client_hello
                                       0.07 Write server_hello
                                       0.20 Write certificate (three certs)
                                       0.00 Write server_hello_done
2.54 Write client_key_exchange
     (of which 2.26 encrypt_premaster)
0.15 Write finished
                                       31.02 Read client_key_exchange
                                             (of which 30.79 decrypt_premaster)
                                       0.12 Write finished
```
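Adding up the figures in Rescorla's trace shows how lopsided the cost is. The numbers below are copied from the trace (in ms); the dictionary layout is just a convenience for the example.

```python
# Per-step times in milliseconds, from the trace above.
client_ms = {
    "write client_hello": 0.07,
    "write client_key_exchange": 2.54,   # includes 2.26 encrypt_premaster
    "write finished": 0.15,
}
server_ms = {
    "write server_hello": 0.07,
    "write certificate": 0.20,
    "write server_hello_done": 0.00,
    "read client_key_exchange": 31.02,   # includes 30.79 decrypt_premaster
    "write finished": 0.12,
}
decrypt_premaster_ms = 30.79

total_ms = sum(client_ms.values()) + sum(server_ms.values())
share = decrypt_premaster_ms / total_ms

print(f"total handshake work: {total_ms:.2f} ms")
print(f"decrypt_premaster share: {share:.0%}")
```

The single private key decrypt on the server accounts for about nine tenths of all the time measured, which is why resumption (which skips it) pays off so well.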
• HTTP over SSL
• The governing conventions for HTTP over SSL are first that URLs to be accessed in this way are distinguished by the scheme "https"; and second, that the well-known port for the service is TCP/443. That is, presented by the user with a request for https://www.secrets.com/, the browser will translate the name of the server to an IP address, initiate a TCP connection to that address at port 443, and then build an SSL session on top of the TCP connection.
• Complications
• Proxies. SSL is fundamentally incompatible with traditional web proxying. In the standard proxy model, the proxy acts on behalf of the real client--it receives an HTTP request from the client and then mirrors it to the origin server as if it were coming from the proxy itself. However under SSL the proxy's in no position to do this because the request is encrypted.

If the proxy is part of a firewall and client access to the Internet has to pass through the proxy, the client can resort to a special HTTP request method, CONNECT: the client sends a CONNECT request to the proxy specifying the intended SSL server ("CONNECT www.secrets.com:443 HTTP/1.1"); the proxy opens a TCP connection to that server at 443 and also returns an HTTP 200 (= everything's ok) response to the client; and the client then opens an SSL negotiation--received by the proxy, but passed straight through to the true origin server.

Notice that if the proxy is really part of a firewall--i.e., if it's part of some sort of filtering system--the CONNECT mechanism has the (un)fortunate consequence that the proxy administrator no longer really knows what's passing through the proxy--there'd be no way to tell that the encrypted application traffic was HTTP (as opposed to some other protocol; no way to tell even if outbound connections were limited to destination port 443, since we have no idea what other people are actually running at any given port); there'd be no way to tell that it was SSL traffic at all (of course if you looked at it and it was going to port 23 and looked like telnet traffic you'd be suspicious; but if it started with a reasonable facsimile of an (unencrypted) SSL handshake and then turned to Greek, it'd be nearly impossible to tell true SSL traffic from bogus)

Notice also that the proxy is well-situated for a man-in-the-middle attack. It could--if the client were careless enough to accept a wildcard certificate--terminate the client's SSL messages locally, replicate them to the server over a second SSL connection, and examine the client's plaintext transmissions between the two SSL pipes.
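The CONNECT exchange described above is plain text until the tunnel is up. A small sketch of the two pieces a client needs, with invented function names; it only formats and parses bytes, and leaves the sockets and the SSL negotiation itself out.

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """The request a client sends to the proxy to ask for a tunnel."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """True if the proxy's status line carries a 200 status code."""
    status_line = response_head.split(b"\r\n", 1)[0].split()
    return len(status_line) >= 2 and status_line[1] == b"200"


if __name__ == "__main__":
    print(build_connect_request("www.secrets.com"))
    print(tunnel_established(b"HTTP/1.1 200 Connection established\r\n\r\n"))
```

Once `tunnel_established` would return true, everything the client writes is relayed blindly by the proxy, and the first thing it writes is the ClientHello.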

• Virtual hosts. It's common to want to provide web service for many domains on a single machine (put the other way around, it'd be terribly inefficient, given the number of domains wanting service, to insist on one machine per domain). The difficulty has been that the HTTP request, as it reaches a server, only names a local resource, i.e., only provides a pathname. How's the server to know whether "GET /foo.html HTTP/1.0" refers to a file in the document tree for www.a.com or www.b.com?

The way this is solved in more recent HTTP implementations is by the use of a special Host header that accompanies the HTTP request, where "GET /foo.html HTTP/1.0" followed by "Host: www.a.com" disambiguates which foo is at issue (of course, the web-serving software also needs to know to read the Host header...).

But this scheme doesn't work under SSL. It doesn't work because the SSL-based server hasn't seen the HTTP request at the point it needs to send a (domain-identifying) certificate: a client makes a TCP connection and then sends a ClientHello; the next thing the server needs to do is to return a Certificate message... but should it send the one for www.a.com or www.b.com? It has no clue, because the client still hasn't made an HTTP request. Solutions:

• Use IP-based virtual hosting instead: associate each domain with a distinct IP, overload the network interface with multiple IPs, check the destination IP on the packet's coming from the client. This was in fact the common way of doing virtual hosting before the Host header came along. It has the disadvantage of consuming one IP address per domain, and we know that IP addresses are a scarce commodity...
• Allow wildcards in certificate subject names
• Turn back the clock, and put the server's domain name in ClientHello
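For reference, name-based virtual hosting as described above boils down to reading the Host header out of the request. A minimal sketch with hypothetical document roots and a made-up function name:

```python
# Hypothetical mapping from domain name to document tree.
DOCROOTS = {
    "www.a.com": "/var/www/a",
    "www.b.com": "/var/www/b",
}


def select_docroot(request: bytes, default: str = "/var/www/default") -> str:
    """Pick a document root from the Host header of a raw HTTP request."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:          # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().split(":")[0].lower()   # drop any :port
            return DOCROOTS.get(host, default)
    return default                                # no Host header at all


if __name__ == "__main__":
    print(select_docroot(b"GET /foo.html HTTP/1.0\r\nHost: www.a.com\r\n\r\n"))
```

The catch under SSL is visible in the function's signature: it needs the request bytes, and those only arrive (encrypted) after the server has already had to choose and send a certificate.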
• Client authentication.
• We've seen that it's uncommon but quite possible for client authentication to take place during an SSL handshake. What causes difficulties in the web context is the idea that we might want to do client authentication selectively, i.e., only when clients attempt to access certain resources. And the source of the difficulty is again the way in which SSL encapsulates the HTTP transaction: we don't know, when our opportunity to ask for client authentication arises (early in the handshake), whether they're going to ask for anything we need to authenticate on.
• Solutions
• Always authenticate (but this will be obnoxious for users)
• Start with a non-client-authenticating handshake, and then if they want something restricted immediately do a HelloRequest to initiate a new handshake
• Notice that in this case we wouldn't want to roll back the clock and stick the whole URL in the (unencrypted) ClientHello 'cause it may contain sensitive information.
• The traditional way of combining plain HTTP and HTTP over SSL has been to run the two services at separate ports. RFC 2817 defines a way to run both on one port, using the Upgrade header defined in HTTP 1.1. The idea is that the client would send a request with an "Upgrade: TLS/1.0" header; the server, if agreeable, would respond with the same, and the client would proceed with a ClientHello (alternatively, the server might initiate an upgrade on receipt of a request for a restricted resource by returning a special error (426), and the client could repeat its request, including an Upgrade header, and things would proceed as under the client-initiated scenario). Upgrade-based HTTP over SSL, at this point, runs exclusively on blackboards.

#### Trying this out

Choose at least one of the following cryptographic tasks.
1. Symmetric Encryption:
Use any utility you like (openssl, PGP, GnuPG, ...) to encode and decode a text file with a random symmetric 128-bit key of your choice using the RC5 cipher. You could use the command line utility openssl; "man openssl" and "man enc" will give you notes on that utility - or download and install a tool like GnuPG. (http://www.gnupg.org/). Verify that your cipher text is correct by decoding it.
2. Public/Private Keys:
Create a public/private RSA key pair for yourself, and use it to create a signed digest of these lecture notes, that is, a digest (say SHA-1) of the notes, encrypted with your private key (so that anyone holding your public key can check it). GnuPG is probably the easiest tool to use for this, though I haven't looked hard for others.
3. Message Digests
Most serious software packages available for download also come with digest checks, to ensure that you have the real version. For example, the .md5 files in ftp://ftp.openssl.org/source/. Download one of these packages and verify that the digest matches the archive. From the command line on bob, "md5sum" or "openssl" can be used to do so; type "man md5sum" for details. Then unpack the source, modify one character, recompress it, find the new digest, and verify that it no longer matches the original. How much did the digest change?
4. Playing with the math.
Create your own examples with small numbers of either the Diffie-Hellman or RSA algorithm, along the lines of what's in the notes up above. Work your way through the steps, and do out all the calculations either by hand or with a short program in a language of your choice.

Jim Mahoney <mahoney@marlboro.edu>