Assignment #10: Serialization and Compression

In the last assignment you read the dictionary file. In the case of an applet, that meant an 800K download each time the applet is loaded. In this assignment you will write a program that will convert the 800K dictionary file dict.txt into a much smaller file named dict.ser.gzip (pick a name of your choosing), which will be about 75K, but contain all the information you need.

The basic idea is that instead of reading the dictionary and forming the Hashtable every time you run the program, you will write a data preparation program that forms the Hashtable once and then writes it out to a file in serialized form. Your applet will then only need to make a call to readObject to reconstruct the Hashtable. And since download speed depends on the size of the file, you might as well compress it.

The Data Preparation Program

The data preparation program simply reads the original dict.txt file and forms the Hashtable. I presume you already have code to do this from the last assignment. Suppose you have the Hashtable in a variable named ht. Then:
  1. Get a FileOutputStream object for dict.ser.gzip, fout
  2. Wrap that in a BufferedOutputStream object, bout
  3. Wrap that in a GZIPOutputStream object, gout
  4. Wrap that in an ObjectOutputStream object, oout
  5. Make the call oout.writeObject( ht )
You will notice that the output file is actually quite large, and that writing it takes a long time. The important optimization that you need to do before writing is to step through the Hashtable and remove all entries in which the value is a Vector of length one (i.e. words that have no anagrams). When you do that, the Hashtable will be much, much smaller, and so will the output file. (Depending on how you wrote your program, it is possible that this implicitly changes the behavior of your applet when the user types in a word that is not in the dictionary, but let's not worry about that.)
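
The five steps and the pruning pass above can be sketched roughly as follows. This is only an illustration, not the required solution: the class name DictWriter and its method names are made up here, and it assumes the Hashtable maps a String key to a Vector of String words, which may not match the types in your own program.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;
import java.util.Vector;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the names are illustrative only.
class DictWriter {

    // Remove every entry whose Vector holds exactly one word (no anagrams).
    // Removal goes through the iterator so the table is not modified behind its back.
    static void pruneSingletons(Hashtable<String, Vector<String>> ht) {
        Iterator<Map.Entry<String, Vector<String>>> it = ht.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().size() == 1) {
                it.remove();
            }
        }
    }

    // Steps 1-5: FileOutputStream -> BufferedOutputStream -> GZIPOutputStream
    // -> ObjectOutputStream, then a single writeObject call.
    static void write(Hashtable<String, Vector<String>> ht, String fileName)
            throws IOException {
        try (ObjectOutputStream oout = new ObjectOutputStream(
                new GZIPOutputStream(
                        new BufferedOutputStream(
                                new FileOutputStream(fileName))))) {
            oout.writeObject(ht);
        } // closing oout also finishes the gzip data and closes the file
    }
}
```

A preparation program would call pruneSingletons(ht) first and then write(ht, "dict.ser.gzip"). Closing the outermost stream matters: if the GZIPOutputStream is never closed or finished, the compressed file is likely to be truncated.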

The Applet

The applet is unchanged, except that it now gets the compressed serialized file instead of the text dictionary. Use four InputStreams, wrapped in parallel to the preparation program's output streams, with the FileInputStream replaced by the input stream of the serialized dictionary's URL (that is, the URL's stream wrapped in a BufferedInputStream, then a GZIPInputStream, then an ObjectInputStream). You can then have the Hashtable in a single call to readObject.
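
The reading side might look something like the sketch below. The class name DictReader is hypothetical, and the cast assumes the file was written from a Hashtable that maps String keys to Vectors of Strings; adjust the types to whatever your preparation program actually wrote.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.net.URL;
import java.util.Hashtable;
import java.util.Vector;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the names are illustrative only.
class DictReader {

    // Mirror of the writer: raw stream -> BufferedInputStream
    // -> GZIPInputStream -> ObjectInputStream, then one readObject call.
    static Hashtable<String, Vector<String>> read(InputStream raw)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream oin = new ObjectInputStream(
                new GZIPInputStream(
                        new BufferedInputStream(raw)))) {
            // The cast is unchecked: it trusts that the file holds the expected type.
            @SuppressWarnings("unchecked")
            Hashtable<String, Vector<String>> ht =
                    (Hashtable<String, Vector<String>>) oin.readObject();
            return ht;
        }
    }

    // In the applet, the raw stream comes from the dictionary's URL
    // rather than from a FileInputStream.
    static Hashtable<String, Vector<String>> read(URL url)
            throws IOException, ClassNotFoundException {
        return read(url.openStream());
    }
}
```

Note that the wrapping order is the reverse of the order in which the data passes through the streams on the way out: the bytes are decompressed first and only then deserialized.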

Deployment

Your applet must be deployed on a live public web site. If you have a solix account, you can do this by placing your assign10.html file, .class files and compressed dictionary in a directory named www. The URL will then be http://www.fiu.edu/~yourname/assign10.html. You must make sure that all the files in your www directory are readable (from inside www, do chmod 0644 *) and that the directory itself is usable (from inside www, do chmod 0755 .). solix does have a Java compiler, so if you upload your html and java sources via ftp, you can compile there. Alternatively, you can upload your html and class files, but make sure the class files are transferred in binary mode by your ftp program. If you have an Internet Service Provider that allows you to do web pages, you can place your applet there.

Applet Parameters

In order to make the applet more general, instead of hard-coding the name of the compressed serialized dictionary file, it should be an applet parameter. This means your HTML page will have an extra <PARAM> tag, and your Java code will make a call to getParameter.
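
One possible way to turn the parameter value into a URL is sketched below. Everything named here is an assumption made for illustration: the parameter name "dictionary", the class DictParam, and the fallback to dict.ser.gzip when the parameter is missing are not part of the assignment. The helper deliberately takes plain arguments so it can be tried outside a browser; in the applet you would pass it getDocumentBase() and the result of getParameter.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class; the names are illustrative only.
class DictParam {

    // Assumed parameter name; the HTML page would then contain something like
    //   <PARAM NAME="dictionary" VALUE="dict.ser.gzip">
    static final String PARAM_NAME = "dictionary";

    // Assumed fallback used when the page supplies no value.
    static final String DEFAULT_FILE = "dict.ser.gzip";

    // Resolve the dictionary file name relative to the page's URL.
    // In an applet: resolve(getDocumentBase(), getParameter(PARAM_NAME)).
    static URL resolve(URL base, String paramValue)
            throws URISyntaxException, MalformedURLException {
        String name = (paramValue == null || paramValue.trim().isEmpty())
                ? DEFAULT_FILE
                : paramValue.trim();
        return base.toURI().resolve(name).toURL();
    }
}
```

getParameter returns null when the page has no matching <PARAM> tag, which is why the sketch checks for null before using the value.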

What to Submit

Submit your source code (both Java and HTML) and sample output (via a screen snapshot). Provide the URL of your applet.