Comp 527: Assignment 3: Mobile Code

Due date: Friday, February 26, 1pm

As in assignment 2, you should work on this assignment by yourself. Please cite your sources.

Question 1: Mobile Code vs. Fast Networks

One of the most commonly cited benefits of mobile code is that it helps reduce network bandwidth consumption. If a Java applet understands the latest compressed audio or video format and can run at close to hardware speeds, Web sites can use any content type they want rather than only the content types supported intrinsically by Microsoft or Netscape. Now assume we replace the current Internet with some beautiful high-bandwidth system, providing 100 Mbit/s (Fast Ethernet speeds) between any two machines at any time.

What different classes of applications will still benefit from mobile code?

Question 2: Digital Music Rights Management

The Recording Industry Association of America (RIAA) is deeply concerned about illegal music copying. Their current solution, implemented in DAT decks and home CD writers, is called SCMS (Serial Copy Management System). There are basically two bits on the media. One bit says whether the song was recorded by the end-user (and thus has no copy protection). The other bit says whether it's an original or a copy. Their goal is to allow you to make first-generation copies but not second-generation copies.
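To make the two-bit scheme concrete, here is a minimal sketch of the copy decision as described above. The names `ScmsFlags` and `digital_copy` are hypothetical, and the sketch follows only the description in this handout, not the actual SCMS bit encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScmsFlags:
    # Hypothetical field names; the real on-media encoding is not given here.
    user_recorded: bool  # True: recorded by the end-user, so no copy protection
    is_copy: bool        # True: this medium is a copy rather than an original


def digital_copy(source: ScmsFlags) -> Optional[ScmsFlags]:
    """Return the flags written to the new copy, or None if copying is refused."""
    if source.user_recorded:
        # Unprotected material may be copied through any number of generations.
        return ScmsFlags(user_recorded=True, is_copy=True)
    if not source.is_copy:
        # Protected original: allow a first-generation copy and mark it as a copy.
        return ScmsFlags(user_recorded=False, is_copy=True)
    # Protected copy: refuse to make a second-generation copy.
    return None
```

Note that the whole scheme depends on the recorder honoring the bits, which is exactly what a general-purpose computer with a CD-ROM writer need not do.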

As usual, the Internet, the MP3 compression standard, and computer CD-ROM writers have changed all the rules. The RIAA wants a strong solution to their problems. Consumers want to have their music. Discuss how their problems could be addressed. Do they need trusted hardware? Can mobile code help?

For lots of discussion and pointers to resources, a good place to start is MP3.com.

Question 3: Type Checking vs. Page Tables

Java has four different kinds of protection labels on object fields and methods: private, protected, public, and a default label sometimes called package-scope. A private variable may only be accessed by code running in the same class. A package-scoped variable may be accessed by code running in any class inside the same package (e.g., java.lang). Public variables may be accessed by anybody. Protected variables are somewhat funny, being accessible either by any subclass or by any class within the same package. For the purposes of this question, you may ignore the protected label and focus on the other three. Likewise, you may ignore static fields and methods.
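The three labels that matter for this question can be summarized as a single access predicate. The sketch below is only a restatement of the rules above; the function names `may_access` and `package_of` are hypothetical, classes are identified by fully qualified name, and nested classes are ignored.

```python
def package_of(qualified_name: str) -> str:
    """Return the package part of a name such as "java.lang.String"."""
    # A name with no dot belongs to the unnamed (default) package, "".
    return qualified_name.rpartition(".")[0]


def may_access(label: str, accessor: str, owner: str) -> bool:
    """Decide whether code in class `accessor` may touch a member of class `owner`.

    `label` is one of "public", "package", or "private"; the protected
    label is deliberately left out, as the question allows.
    """
    if label == "public":
        return True
    if label == "package":
        return package_of(accessor) == package_of(owner)
    if label == "private":
        return accessor == owner
    raise ValueError(f"unsupported label: {label!r}")
```

For example, `may_access("package", "java.lang.Integer", "java.lang.String")` is true, while the same call with an accessor in `java.util` is false.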

The other rule is that in order to access a member of a class instance (i.e., something allocated by using the new operation), you must already have a reference to the instance. In this respect, a class instance reference acts as a capability to access the instance.

As we've seen, a sufficiently crafty adversary might be able to find a bug in the type checker and use this to override the above rules. One proposed solution is to use the hardware page tables instead of a type checker to enforce program safety. Describe exactly how you might safely support the following Java primitive operations:

new: allocate an object of some type, returning a reference
invoke: call a method on some class, passing arguments and returning a value
getfield: read a variable from a class instance
putfield: write a variable to a class instance

Would you trap every memory access in the kernel? Could you represent an object reference as a direct memory address or would you need a descriptor table of some kind? Could you optimize the cases for public and package-scoped variables by keeping more pages mapped for read/write? How much trusted code must run in kernel mode vs. user mode? If you could rearrange how objects are located in memory, could you use this as an optimization?

You may assume that your kernel supports an operation like Solaris mprotect(2) and that kernel trap handlers can emulate the behavior the trapped instruction should have caused and then resume execution in the user code.
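To illustrate what "a descriptor table of some kind" might mean, here is a toy user-level simulation, not a model answer. The `Monitor` class is hypothetical and stands in for trusted kernel code; every call to it models a trap. It checks only the private and public labels, it does not model page tables or mprotect at all, and the `accessor` argument is assumed to be established by trusted code rather than supplied by the caller.

```python
import secrets


class Monitor:
    """Stands in for trusted code; each method call models a trap into it."""

    def __init__(self):
        # handle -> (owner class name, {field name: (label, value)})
        self._table = {}

    def new(self, owner, fields):
        """Allocate an object and return an opaque, hard-to-guess reference."""
        handle = secrets.randbits(64)
        while handle in self._table:
            handle = secrets.randbits(64)
        self._table[handle] = (owner, dict(fields))
        return handle

    def _checked_fields(self, handle, field, accessor):
        # The handle acts as a capability: no valid handle, no access.
        if handle not in self._table:
            raise PermissionError("forged or stale reference")
        owner, fields = self._table[handle]
        label, _ = fields[field]
        if label == "private" and accessor != owner:
            raise PermissionError(f"{accessor} may not touch {owner}.{field}")
        return fields

    def getfield(self, handle, field, accessor):
        return self._checked_fields(handle, field, accessor)[field][1]

    def putfield(self, handle, field, value, accessor):
        fields = self._checked_fields(handle, field, accessor)
        fields[field] = (fields[field][0], value)
```

Trapping on every access, as this sketch does, is the slow extreme of the design space; the questions above ask where you could safely avoid the trap.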

Question 4: Watermarking a Dictionary

The Collberg / Thomborson paper describes a number of different techniques for installing watermarks in code, both statically modifying code and dynamically modifying data structures. For this question, you're going to design a watermarked dictionary system.

There are a number of different data structures that can act as efficient dictionaries, allowing storage and lookup of key/value pairs. These data structures also allow you to iterate over all their key/value pairs but make no guarantees about the order in which the pairs will arrive.

If this were a real system, your data structure would be linked with an interpreter allowing the dictionary to be loaded with new words, queried, and listed out (in no particular order, for this exercise...). When the system vendor wants to check a watermark, they will start with an empty dictionary, load a sample set of words, and list it back out again. Depending on your software watermark, the order of the list should be different, but all other queries should continue to function normally.
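The checking procedure described above (start empty, load a sample, list it back out) can be sketched as follows. This is only an illustration of the workflow, not a suggested answer. `SaltedDictionary` and `listing_order` are hypothetical names, and the design is an assumption: a toy chained hash table with a fixed number of buckets, whose hidden salt stands in for the watermark and determines the listing order.

```python
import hashlib


class SaltedDictionary:
    """Toy chained hash table whose listing order depends on a hidden salt."""

    def __init__(self, salt: bytes, buckets: int = 64):
        self._salt = salt
        self._buckets = [[] for _ in range(buckets)]

    def _index(self, key: str) -> int:
        # hashlib gives a hash that is stable across runs, unlike built-in hash().
        digest = hashlib.sha256(self._salt + key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self._buckets[self._index(key)]:
            if existing == key:
                return value
        raise KeyError(key)

    def items(self):
        # Listing walks the buckets in index order, so the salt fixes the order.
        return [pair for bucket in self._buckets for pair in bucket]


def listing_order(dictionary, sample_words):
    """Vendor's check: load a known sample into an empty dictionary, list it."""
    for word in sample_words:
        dictionary.put(word, len(word))
    return [key for key, _ in dictionary.items()]
```

Two copies built with different salts answer every `get` identically but list the same sample in different orders, which is the property the vendor's check relies on. How many bits such an ordering carries, and how easily an adversary could disturb it, is left to your own design.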

Please choose any appropriate data structure (perhaps one you learned in a CS theory class) and modify it to support some kind of software watermark. Present pseudocode for the algorithm and highlight watermark-relevant portions. Say which techniques from Collberg / Thomborson you plan to use and which techniques you don't feel are applicable.

Your new data structure should be queryable in O(log N) expected time (or better) if it has N nodes in it. You may use randomized data structures (although this will be tricky). Generic unbalanced trees are not acceptable.

How many bits of watermarking can you hide in your algorithm? What would an adversary need to do to remove your watermark bits? If an adversary manages to corrupt some of your bits, will the rest still be detectable?

Dan Wallach, CS Department, Rice University

Last modified: Fri Feb 19 03:15:22 CST 1999