Comp 527: Assignment 4: Measuring RC4
Due date: Thursday, March 4, 11:59pm (get some sleep!)
This is a group assignment. Groups should have exactly two
members. I will permit at most one group with three members.
Objective
The objective of this assignment is to do some simple cryptanalysis
of the RC4 encryption algorithm. RC4 is a stream
cipher: it internally generates a stream
of pseudorandom bits, which is then XOR-ed with the plaintext. The secret key
initializes the pseudorandom number generator.
However, on inspection, RC4 appears to have a weakness: two keys
that are very similar may produce similar output streams.
In this assignment you will implement RC4,
make some differential measurements of randomness for related keys,
and graph your results.
Details
You are going to first implement the RC4 algorithm. Since I want this
page to be visible world-wide and avoid crypto export issues, I won't
put the algorithm here. You can find some code at the RC4 in three lines of
Perl site (which also includes a full, readable C version). There
will also be a handout in class.
This assignment is a form of related-key cryptanalysis. With
a stream cipher like RC4, toggling exactly one bit in the key
should toggle about 50% of the output bits.
If you look closely at the key setup, it doesn't look like this is
the case. It appears that single-bit differences in the key will
result in fairly small differences in the internal state of RC4.
After a sufficient amount of output, this small change will eventually
propagate and then you would expect 50% of the output bits to toggle.
- Implement the cipher
- Don't worry about XOR-ing the output bits with a message you're
going to keep secret. We're only going to study the random bits.
Make sure you can accept keys of length up to 2048 bits.
- Capture the output bits from two runs and XOR them
- Generate a random key of 2048 bits (Unix's random(3C)
is good enough for now). Collect
some output from RC4. Toggle one or more of the key bits.
Reinitialize RC4 and collect the output again. XOR them together. This gives
you the difference of the two streams. If RC4 were
"perfect", this difference stream would be "perfectly" random.
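The capture-and-XOR step above can be sketched in Python as follows. This is only an illustration under stated assumptions: `keystream(key, n_bytes)` is a placeholder for your own RC4 implementation (deliberately not included here), and the helper names `random_key`, `toggle_bits`, `xor_streams`, and `differential` are hypothetical, not part of the assignment.

```python
import random

def random_key(n_bits=2048, rng=random):
    """Generate a random key of n_bits (assumed to be a multiple of 8)."""
    return bytes(rng.getrandbits(8) for _ in range(n_bits // 8))

def toggle_bits(key, positions):
    """Return a copy of key with the bits at the given bit positions flipped."""
    out = bytearray(key)
    for p in positions:
        out[p // 8] ^= 1 << (p % 8)
    return bytes(out)

def xor_streams(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("streams must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def differential(keystream, key, n_toggle, n_bytes, rng=random):
    """Difference stream for key and a copy with n_toggle distinct bits flipped.

    keystream(key, n_bytes) -> bytes is assumed to be your own RC4:
    it must reinitialize from the key on every call.
    """
    positions = rng.sample(range(len(key) * 8), n_toggle)
    related = toggle_bits(key, positions)
    return xor_streams(keystream(key, n_bytes), keystream(related, n_bytes))
```

A call such as `differential(my_rc4, random_key(), 1, 128)` would then give 128 bytes of difference stream for a single toggled key bit, where `my_rc4` is whatever you named your cipher function.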
- Analyze the differential output bits for randomness
- There are many tests for randomness of numbers. You're going to
implement a simple frequency counting test. Create an array of
counters of length equal to a power of two (say, 256). Now, say
the test data is a sequence of bits b0, b1, b2, b3, ..., bN. You can look
at each overlapping sequence of 8 bits (b0..b7, b1..b8, b2..b9, ...) as a number
and increment the appropriate counter in the array. When you're
done you would expect that the counters would be approximately
equal. If the two original bitstreams were very similar, the
difference stream is mostly zeros, so you would expect the counters
for values with many zero bits to be higher than the counters for
values with many one bits.
You can compute a numerical measure of the randomness like so:
- N = number of samples
- C = number of counters
- D = standard deviation of counter values
- R = (D * C) / N
The closer the randomness (R) is to zero, the more random the data. You
might consider using different numbers of counters and see if your
randomness measure changes very much.
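The frequency counting test and the measure R = (D * C) / N described above might look like the following Python sketch. Two details are assumptions, since the text does not fix them: bits are read most significant bit first within each byte, and D is taken as the population standard deviation. The function names are hypothetical.

```python
import statistics

def bits_of(data):
    """Flatten bytes into a list of bits (assumed order: MSB first)."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def randomness(data, window=8):
    """Return R = (D * C) / N for a sliding window of `window` bits.

    C = 2**window counters, N = number of windows sampled,
    D = population standard deviation of the counters (an assumption).
    """
    bits = bits_of(data)
    n_samples = len(bits) - window + 1
    if n_samples < 1:
        raise ValueError("need at least `window` bits of data")
    counters = [0] * (1 << window)
    mask = (1 << window) - 1
    value = 0
    for i, bit in enumerate(bits):
        # Shift the new bit in and keep only the last `window` bits.
        value = ((value << 1) | bit) & mask
        if i >= window - 1:
            counters[value] += 1
    d = statistics.pstdev(counters)
    return d * len(counters) / n_samples
```

With these assumptions, a constant difference stream (for example, all zeros from two identical outputs) puts every sample in one counter and gives R = sqrt(C - 1), about 15.97 for 256 counters, which is a handy upper reference point when reading your graph. Changing `window` changes the number of counters.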
- Run this randomness test lots of times and collect the results
- Measure the randomness on outputs ranging from short through
long (e.g., 2 bytes, 4 bytes, 8 bytes, 32 bytes, 128 bytes, 1024 bytes).
For each of these, you need to consider the effect of
toggling one bit in the key, two bits in the key, and so forth through
32 bits. For each of these pairs, you should make at least 20
measurements of the randomness and average the results. Your results
will be more accurate if you make several thousand measurements for
each pair.
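One way to organize the repeated measurements is sketched below in Python. It assumes you supply a function `measure(n_toggle, n_bytes)` that performs one complete trial (fresh random key, toggled copy, XOR of the two outputs, randomness value); that function and the name `run_grid` are hypothetical.

```python
import statistics

OUTPUT_LENGTHS = [2, 4, 8, 32, 128, 1024]  # bytes, as listed in the text
TOGGLE_COUNTS = range(1, 33)               # 1 through 32 toggled key bits

def run_grid(measure, trials=20, lengths=OUTPUT_LENGTHS, toggles=TOGGLE_COUNTS):
    """Average measure(n_toggle, n_bytes) over `trials` runs for each pair.

    Returns {n_bytes: [mean R for each toggle count]}, i.e. one list
    per line on the final graph.
    """
    results = {}
    for n_bytes in lengths:
        results[n_bytes] = [
            statistics.fmean([measure(n_toggle, n_bytes) for _ in range(trials)])
            for n_toggle in toggles
        ]
    return results
```

Raising `trials` from 20 to several thousand only changes one argument, so it is easy to start small while debugging and scale up for the final graph.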
- Graph the results
- You should produce a graph with one line for each output length.
The Y axis is the randomness (higher values imply less randomness).
The X axis is the number of bits you toggled. You would expect each
line to start somewhere above zero and, as you toggle more bits,
approach zero quickly. You would also expect to see higher values for
the lines corresponding to shorter runs of data.
- Discuss
- How many bits need to be toggled (on average) before the differential
bitstream looks random? This tells you something about the maximum
useful key length for RC4. If you use a 128-bit RC4 key, each key bit is
replicated 16 times in the internal key. If toggling 16 bits isn't enough
to generate a random differential output, then that would imply 128-bit keys
are not usefully strong.
If a vendor wished to ship a product using RC4 to encrypt short messages
(50 bytes max), perhaps they should throw out some of the initial output
from RC4 to let the key bits mix properly. How much?
- Another possible analysis...
- This work is not required for credit. It's for adventurous
students looking to learn more about RC4.
If you want to pursue this problem further, here are some other things
you might consider examining. Rather than randomness tests, you might observe
that two RC4 systems initialized with similar keys may initially
generate identical values but will eventually diverge. Try to measure
the average length (in bytes) of identical output as a function of the
number of bits you change in the key.
While you can measure that value, you may also be able to derive
it probabilistically. See what you can do.
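For the optional analysis, the quantity to measure is the number of leading bytes on which two output streams agree. A minimal Python sketch, with a hypothetical function name:

```python
def identical_prefix_length(a, b):
    """Number of leading bytes on which streams a and b agree."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch found within the shorter stream.
    return min(len(a), len(b))
```

If the result equals the length of the collected output, the streams had not yet diverged, so collect more output before averaging.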
Submitting Your Results
You can either use Unix tools like gnuplot or a spreadsheet like
Excel to generate your graph. Turn this into a GIF, put it
on a Web page, and e-mail comp527
with a URL.
Credits
This assignment grew out of a discussion between David Wagner
and me.
I hope you like it.
Dan Wallach,
CS Department,
Rice University
Last modified: Fri Feb 26 17:37:17 CST 1999