Comp 527: Assignment 4: Measuring RC4

Due date: Thursday, March 4, 11:59pm (get some sleep!)

This is a group assignment. Groups should have exactly two members. I will permit at most one group with three members.

Objective

The objective of this assignment is to do some simple cryptanalysis of the RC4 encryption algorithm. RC4 is a stream cipher: it internally generates a stream of pseudorandom bits (the keystream), which is then XOR-ed with the plaintext. The secret key initializes this random number generator. However, on closer inspection there appears to be a weakness in RC4: perhaps two keys that are very similar will result in similar numbers coming out of RC4. In this assignment you will implement RC4, make some differential measurements of randomness for related keys, and graph your results.

Details

You are going to first implement the RC4 algorithm. Since I want this page to be visible world-wide and to avoid crypto export issues, I won't put the algorithm here. You can find some code at the "RC4 in three lines of Perl" site (which also includes a full, readable C version). There will also be a handout in class.

This assignment is a form of related-key cryptanalysis. With a stream cipher like RC4, toggling exactly one bit in the key should toggle about 50% of the output bits. If you look closely at the key setup, it doesn't look like this is the case: single-bit differences in the key appear to result in fairly small differences in the internal state of RC4. After a sufficient amount of output, this small change will eventually propagate, and only then would you expect 50% of the output bits to toggle.
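One way to check this intuition is to count how many of the 256 entries of RC4's internal permutation S differ after key setup when a single key bit is toggled. The following is a minimal Python sketch of the standard RC4 key-scheduling step (Python is our choice; the assignment does not prescribe a language). The helper names and the bit-numbering convention (bit 0 is the most significant bit of key byte 0) are assumptions for illustration.

```python
import random

def rc4_ksa(key):
    """RC4 key-scheduling algorithm: return the 256-entry permutation S."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S

def toggle_bit(key, bit):
    """Copy of key with one bit flipped (bit 0 = MSB of byte 0, an assumed convention)."""
    out = bytearray(key)
    out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)

def state_difference(key, bit):
    """Number of positions of S that differ after key setup when `bit` is toggled."""
    a = rc4_ksa(key)
    b = rc4_ksa(toggle_bit(key, bit))
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    rng = random.Random(527)                              # arbitrary fixed seed
    key = bytes(rng.getrandbits(8) for _ in range(256))   # 2048-bit key
    for bit in (0, 1024, 2047):
        print(bit, state_difference(key, bit))
```

Toggling a bit in the last byte of a 256-byte key can change at most three entries of S, because only the final swap of the key schedule is affected. Bits earlier in the key may produce larger counts, so it is worth checking several positions.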

Implement the cipher

Don't worry about XOR-ing the output bits with a message you're going to keep secret. We're only going to study the random bits. Make sure you can accept keys of length up to 2048 bits.
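For reference, here is a minimal Python sketch of the widely published RC4 algorithm: the key-scheduling algorithm (KSA) followed by the pseudo-random generation algorithm (PRGA). As this section suggests, it only produces keystream bytes and never touches a message. It accepts keys of 1 to 256 bytes (8 to 2048 bits); the function name `rc4_keystream` is a hypothetical choice.

```python
def rc4_keystream(key, n):
    """Return the first n bytes of the RC4 keystream for `key` (1 to 256 bytes)."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 keys must be 1 to 256 bytes (8 to 2048 bits)")
    # Key-scheduling algorithm (KSA): permute S under control of the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): one output byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)
```

As a sanity check, `rc4_keystream(b"Key", 9).hex()` should match the commonly cited test vector `eb9f7781b734ca72a7`; XOR-ing those bytes with the ASCII string "Plaintext" gives `bbf316e8d940af0ad3`.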

Capture the output bits from two runs and XOR them

Generate a random key of 2048 bits (Unix's random(3C) is good enough for now). Collect some output from RC4. Toggle one or more of the key bits. Reinitialize RC4 and collect the output again. XOR them together. This gives you the difference of the two streams. If RC4 were "perfect", this difference stream would be "perfectly" random.
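The capture-and-XOR step might look like the following Python sketch. It uses Python's `random` module (a Mersenne Twister, also not a cryptographic generator) in place of random(3C), in the same "good enough for now" spirit. The helpers `toggle_random_bits` and `differential_stream` are hypothetical names. The toggled bit positions are drawn without repetition so that toggling k bits really changes k distinct key bits.

```python
import random

def rc4_keystream(key, n):
    """First n bytes of RC4 keystream (KSA, then PRGA)."""
    S, j = list(range(256)), 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def toggle_random_bits(key, k, rng):
    """Flip k distinct, randomly chosen bits of key (bit 0 = MSB of byte 0)."""
    out = bytearray(key)
    for bit in rng.sample(range(len(key) * 8), k):
        out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)

def differential_stream(n_bytes, k, rng):
    """XOR of n_bytes of output from a random 2048-bit key and from a k-bit variant."""
    key = bytes(rng.getrandbits(8) for _ in range(256))
    other = toggle_random_bits(key, k, rng)
    a = rc4_keystream(key, n_bytes)
    b = rc4_keystream(other, n_bytes)
    return bytes(x ^ y for x, y in zip(a, b))
```

For example, `differential_stream(32, 1, random.Random(1))` returns 32 difference bytes for a single toggled key bit. With k = 0 the two keys are identical, so the difference stream is all zero bytes.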

Analyze the differential output bits for randomness

There are many tests for randomness of numbers. You're going to implement a simple frequency counting test. Create an array of counters whose length is a power of two (say, 256; with 2^k counters you look at k-bit windows). Now, say the test data is a sequence of bits b0 b1 b2 b3 ... bN. You can treat each overlapping window of 8 bits (b0..b7, b1..b8, b2..b9, and so on) as a number and increment the corresponding counter in the array. If the data is random, you would expect the counters to be approximately equal when you're done. If the two original bitstreams were very similar, their XOR is mostly zero bits, so you would expect the counters for windows with lots of zeros (such as 00000000) to have higher values than the counters for windows with lots of ones.
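Here is a sketch of the overlapping-window counter. It assumes bits are read most-significant-bit first within each byte, because the assignment does not fix a bit order. With width w it uses 2^w counters, so `width=8` gives the 256 counters suggested above. An input of B bytes yields 8B - w + 1 windows. The name `window_counts` is hypothetical.

```python
def window_counts(data, width=8):
    """Count every overlapping `width`-bit window of data (bits read MSB first)."""
    counts = [0] * (1 << width)
    mask = (1 << width) - 1
    value = 0   # the most recent `width` bits, as an integer
    seen = 0    # how many bits have been shifted in so far
    for byte in data:
        for shift in range(7, -1, -1):
            value = ((value << 1) | ((byte >> shift) & 1)) & mask
            seen += 1
            if seen >= width:          # only count once a full window is available
                counts[value] += 1
    return counts
```

For instance, two zero bytes produce 9 windows, all landing in counter 0. The bytes `01 80` (bits 00000001 10000000) spread their 9 windows over counters 1, 3, 6, 12, 24, 48, 96, 192 and 128.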

You can compute a numerical measure of the randomness like so:

N = number of samples (windows counted)
C = number of counters
D = standard deviation of counter values
R = (D * C) / N
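The formula translates directly into code. This sketch takes N to be the total number of windows counted and uses the population standard deviation for D. The assignment does not say which standard deviation to use; the sample standard deviation would give slightly larger values. The function name `randomness` is hypothetical.

```python
import statistics

def randomness(counts):
    """R = (D * C) / N, with D taken as the population standard deviation (an assumption)."""
    n = sum(counts)        # N: number of samples (windows) counted
    c = len(counts)        # C: number of counters
    if n == 0:
        raise ValueError("no samples were counted")
    d = statistics.pstdev(counts)   # D
    return d * c / n
```

Perfectly even counters give R = 0. Putting all samples in a single counter gives the largest possible value, sqrt(C - 1), which is about 15.97 for 256 counters.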

The closer the randomness (R) is to zero, the more random the data. You might consider using different numbers of counters and seeing whether your randomness measure changes very much.
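A short derivation helps calibrate what "close to zero" means. It assumes the population standard deviation and C counters c_1, ..., c_C summing to N. If every sample falls into one counter (for example, a difference stream of all zero bits), R reaches its maximum. The second case assumes samples are independent and uniformly distributed. Overlapping windows are not strictly independent, so treat the resulting baseline as a rough guide only. It is not zero for any finite N.

```latex
D^2 = \frac{1}{C}\sum_{i=1}^{C}\left(c_i - \frac{N}{C}\right)^2, \qquad R = \frac{D\,C}{N}

% Worst case: all N samples land in a single counter
D^2 = \frac{1}{C}\left[\left(N - \frac{N}{C}\right)^2 + (C-1)\left(\frac{N}{C}\right)^2\right]
    = \frac{N^2 (C-1)}{C^2}
\quad\Longrightarrow\quad R = \sqrt{C-1}

% Independent uniform samples: c_i \sim \mathrm{Binomial}(N, 1/C)
\mathbb{E}\!\left[D^2\right] = \frac{N}{C}\left(1 - \frac{1}{C}\right) = \frac{N (C-1)}{C^2}
\quad\Longrightarrow\quad R \approx \sqrt{\frac{C-1}{N}}
```

With C = 256 the worst case is about 15.97. For a 2-byte output (N = 9 windows) the random baseline is about 5.3, while for 1024 bytes (N = 8185 windows) it is about 0.18. This is one reason why shorter outputs should produce higher curves even when the cipher behaves well.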

Run this randomness test lots of times and collect the results

Measure the randomness on outputs ranging from short to long (i.e., 2 bytes, 4 bytes, 8 bytes, 32 bytes, 128 bytes, and 1024 bytes). For each output length, you need to consider the effect of toggling one bit in the key, two bits in the key, and so forth up through 32 bits. For each (output length, number of toggled bits) pair, you should make at least 20 measurements of the randomness and average the results. Your results will be more accurate if you make several thousand measurements for each pair.
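Putting the pieces together, the measurement loop might look like this Python sketch. It repeats the earlier helpers so that it runs on its own. All function names and the default seed are hypothetical. Each trial draws a fresh random 2048-bit key, which is one reasonable reading of the assignment rather than a stated requirement.

```python
import random
import statistics

def rc4_keystream(key, n):
    """First n bytes of RC4 keystream (KSA, then PRGA)."""
    S, j = list(range(256)), 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def toggle_random_bits(key, k, rng):
    """Flip k distinct randomly chosen bits (bit 0 = MSB of byte 0)."""
    out = bytearray(key)
    for bit in rng.sample(range(len(key) * 8), k):
        out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)

def window_counts(data, width=8):
    """Counters for every overlapping width-bit window, bits read MSB first."""
    counts, mask, value, seen = [0] * (1 << width), (1 << width) - 1, 0, 0
    for byte in data:
        for shift in range(7, -1, -1):
            value = ((value << 1) | ((byte >> shift) & 1)) & mask
            seen += 1
            if seen >= width:
                counts[value] += 1
    return counts

def randomness(counts):
    """R = (D * C) / N with the population standard deviation for D."""
    return statistics.pstdev(counts) * len(counts) / sum(counts)

def average_randomness(n_bytes, k, trials, rng, width=8):
    """Average R over `trials` fresh random 2048-bit keys with k toggled bits."""
    total = 0.0
    for _ in range(trials):
        key = bytes(rng.getrandbits(8) for _ in range(256))
        other = toggle_random_bits(key, k, rng)
        diff = bytes(x ^ y for x, y in zip(rc4_keystream(key, n_bytes),
                                          rc4_keystream(other, n_bytes)))
        total += randomness(window_counts(diff, width))
    return total / trials

def run_experiment(lengths=(2, 4, 8, 32, 128, 1024), max_bits=32, trials=20, seed=527):
    """Map (output length in bytes, number of toggled bits) -> average R."""
    rng = random.Random(seed)
    return {(n, k): average_randomness(n, k, trials, rng)
            for n in lengths for k in range(1, max_bits + 1)}

if __name__ == "__main__":
    results = run_experiment(lengths=(2, 32), max_bits=4, trials=20)
    for (n, k), r in sorted(results.items()):
        print(f"{n:5d} bytes  {k:2d} bits  R = {r:.3f}")
```

The full default run (six lengths, 32 bit counts, 20 trials) is slow-ish in pure Python but manageable. Raising `trials` toward the several thousand the assignment recommends is where a C implementation starts to pay off.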

Graph the results

You should produce a graph with one line for each output length. The Y axis is the randomness (higher values imply less randomness). The X axis is the number of bits you toggled. You would expect each line to start somewhere above zero and, as you toggle more bits, approach zero quickly. You would also expect to see higher values for the lines corresponding to shorter runs of data.
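To get the averages into gnuplot or a spreadsheet, one option is a whitespace-separated table with the number of toggled bits in the first column and one column per output length. This sketch assumes the results are a dictionary keyed by (output length, toggled bits), as in a measurement loop like the one described above. The header line starts with `#` so that gnuplot treats it as a comment. The function name `format_table` is hypothetical.

```python
def format_table(results, lengths, max_bits):
    """Rows: toggled bits, then the average R for each output length."""
    rows = ["# bits " + " ".join(f"{n}B" for n in lengths)]
    for k in range(1, max_bits + 1):
        rows.append(f"{k} " + " ".join(f"{results[(n, k)]:.4f}" for n in lengths))
    return "\n".join(rows) + "\n"
```

Writing the returned string to a file (for example, a hypothetical `rc4.dat`) gives one column per line of the graph. Each column after the first is then plotted against the first.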

Discuss

How many bits need to be toggled (on average) before the differential bitstream looks random? This tells you something about the maximum useful key length for RC4. If you use a 128-bit RC4 key, each key bit is replicated 16 times in the 2048-bit internal key (since 2048 / 128 = 16). If toggling 16 bits isn't enough to generate a random differential output, then that would imply 128-bit keys are not usefully strong.
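The replication argument can be checked directly. RC4's key schedule reads key byte `i % keylen` at step i, which is equivalent to expanding the key into a 256-byte array. This Python sketch counts how many bits of that expanded array change when one bit of the original key is toggled. The names `expanded_key` and `expanded_bit_difference` are hypothetical.

```python
def expanded_key(key):
    """The 256-byte array RC4's key schedule effectively reads: key[i % len(key)]."""
    return bytes(key[i % len(key)] for i in range(256))

def expanded_bit_difference(key, bit):
    """Bits of the expanded key that change when one bit of `key` is toggled."""
    flipped = bytearray(key)
    flipped[bit // 8] ^= 0x80 >> (bit % 8)   # bit 0 = MSB of byte 0 (assumed)
    a, b = expanded_key(key), expanded_key(bytes(flipped))
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

For a 16-byte (128-bit) key, every single-bit toggle changes exactly 16 bits of the expanded key. For a 2048-bit key it changes exactly one. When 256 is not a multiple of the key length, the count depends on the byte: a 40-bit key gives 52 for bits in the first byte and 51 for the others.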

If a vendor wished to ship a product using RC4 to encrypt short messages (50 bytes max), perhaps they should throw out some of the initial output from RC4 to let the key bits mix properly. How much?
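Throwing out initial output just means running the generator for extra steps before using any bytes, a variant often called RC4-drop[n]. This Python sketch takes the number of discarded bytes as a parameter, since choosing that number is exactly the question posed here. The function name `rc4_drop_keystream` is hypothetical.

```python
def rc4_drop_keystream(key, n, drop):
    """Return n RC4 keystream bytes after discarding the first `drop` output bytes."""
    S, j = list(range(256)), 0
    for i in range(256):                       # standard key schedule
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for step in range(drop + n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        if step >= drop:                       # the first `drop` outputs are generated but discarded
            out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)
```

A 50-byte message would then be XOR-ed with `rc4_drop_keystream(key, 50, d)` for whatever d your measurements suggest. With `drop=0` the output is ordinary RC4.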

Another possible analysis...

This work is not required for credit. It's for adventurous students looking to learn more about RC4.

If you want to pursue this problem further, here are some other things you might consider examining. Rather than randomness tests, you might observe that two RC4 systems initialized with similar keys may initially generate identical values but will eventually diverge. Try to measure the average length (in bytes) of identical output as a function of the number of bits you change in the key.
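A sketch of this measurement in Python follows, reusing the same RC4 and bit-toggling helpers so that it runs on its own. The names and the `limit` parameter are hypothetical. `limit` caps how many bytes are compared, so any pair of streams that stays identical longer than that is counted as `limit`.

```python
import random

def rc4_keystream(key, n):
    """First n bytes of RC4 keystream (KSA, then PRGA)."""
    S, j = list(range(256)), 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def toggle_random_bits(key, k, rng):
    """Flip k distinct randomly chosen bits (bit 0 = MSB of byte 0)."""
    out = bytearray(key)
    for bit in rng.sample(range(len(key) * 8), k):
        out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)

def identical_prefix_length(a, b):
    """Number of leading bytes on which two streams agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def average_identical_prefix(k, trials, limit, rng):
    """Average identical-output length (capped at `limit` bytes) for k toggled key bits."""
    total = 0
    for _ in range(trials):
        key = bytes(rng.getrandbits(8) for _ in range(256))
        other = toggle_random_bits(key, k, rng)
        total += identical_prefix_length(rc4_keystream(key, limit),
                                         rc4_keystream(other, limit))
    return total / trials

if __name__ == "__main__":
    rng = random.Random(527)
    for k in (1, 2, 4, 8, 16, 32):
        print(k, average_identical_prefix(k, trials=200, limit=64, rng=rng))
```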

While you can measure that value, you may also be able to derive it probabilistically. See what you can do.

Submitting Your Results

You can use either Unix tools like gnuplot or a spreadsheet like Excel to generate your graph. Turn it into a GIF, put it on a Web page, and e-mail the URL to comp527.

Credits

This assignment grew out of a discussion between myself and David Wagner. I hope you like it.

Dan Wallach, CS Department, Rice University

Last modified: Fri Feb 26 17:37:17 CST 1999