Assignment #10: Serialization and Compression
In the last assignment you read the dictionary file. In the case of an
applet, that meant an 800K download each time the applet is loaded. In
this assignment you will write a program that converts the 800K dictionary
file dict.txt into a much smaller file named, for example, dict.ser.gzip
(the name is your choice). The new file will be about 75K, but will still
contain all the information you need.
The basic idea is that instead of reading the dictionary every time
you run the program, and then forming the Hashtable, you will
write a data preparing program that forms the Hashtable once,
and then writes it out to a file in serialized form. Your applet will then
only need to make a call to readObject to reconstruct the Hashtable.
And since download speed depends on the size of the file, you might as
well compress it.
The Data Preparation Program
The data preparation program simply reads the original dict.txt
file and forms the Hashtable. I presume you already have code
to do this from the last assignment. Let us suppose you have the Hashtable
in an ht variable. Then
- Get a FileOutputStream object for dict.ser.gzip, fout
- Wrap that in a BufferedOutputStream object, bout
- Wrap that in a GZIPOutputStream object, gout
- Wrap that in an ObjectOutputStream object, oout
- Make the call oout.writeObject( ht )
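The steps above can be sketched as follows. This is only a minimal sketch: the class name DictWriter and the method writeDictionary are hypothetical, and it assumes the Hashtable maps a String key to a Vector of String words, which may not match how you built yours in the last assignment.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Hashtable;
import java.util.Vector;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; the key and value types are assumptions.
final class DictWriter {
    // Writes ht to fileName as a gzip-compressed serialized object.
    public static void writeDictionary(Hashtable<String, Vector<String>> ht,
                                       String fileName) throws IOException {
        // The streams are closed in reverse order when the block exits.
        // Closing oout flushes it and finishes the gzip data.
        try (FileOutputStream fout = new FileOutputStream(fileName);
             BufferedOutputStream bout = new BufferedOutputStream(fout);
             GZIPOutputStream gout = new GZIPOutputStream(bout);
             ObjectOutputStream oout = new ObjectOutputStream(gout)) {
            oout.writeObject(ht);
        }
    }
}
```

A call such as DictWriter.writeDictionary(ht, "dict.ser.gzip") would then produce the output file. Closing the streams matters here: if the GZIPOutputStream is never closed or finished, the file is likely to be truncated and unreadable.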
You will notice that the output file is actually quite large, and that
writing it takes a long time. The important optimization that you need to do before
writing is to step through the Hashtable and remove all entries
in which the value is a vector of length one (i.e. words that have no anagrams).
When you do that, the Hashtable will be much, much smaller, and
so will the output file. (Depending on how you wrote your program, it is
possible that this implicitly changes the behavior of your applet when
the user types in a word that is not in the dictionary, but let's not worry
about that.)
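One possible way to do the pruning is sketched below. The class name DictPruner and the method removeSingletons are hypothetical, and the sketch again assumes a Hashtable from String keys to Vector values. It removes entries through the entry set's iterator, because removing entries from the Hashtable directly while stepping through it is not safe.

```java
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;
import java.util.Vector;

// Hypothetical helper; the key and value types are assumptions.
final class DictPruner {
    // Removes every entry whose Vector holds exactly one word.
    // Returns the number of entries removed.
    public static int removeSingletons(Hashtable<String, Vector<String>> ht) {
        int removed = 0;
        Iterator<Map.Entry<String, Vector<String>>> it = ht.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Vector<String>> entry = it.next();
            if (entry.getValue().size() == 1) {
                it.remove(); // safe removal during iteration
                removed++;
            }
        }
        return removed;
    }
}
```

Run this on ht before the call to writeObject, so that only words with at least one anagram are serialized.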
The Applet
The applet is unchanged, except that it now reads the compressed serialized
file. Use four InputStreams, wrapped to mirror the preparation
program's output streams, with the FileInputStream replaced
by the input stream of the serialized dictionary's URL.
You can then have the Hashtable in a single
call to readObject.
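The reading side might look like the sketch below. The class name DictReader and the method readDictionary are hypothetical, and the cast assumes the file was written from a Hashtable of String keys and Vector values. The applet would obtain the URL itself; here it is simply passed in.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.net.URL;
import java.util.Hashtable;
import java.util.Vector;
import java.util.zip.GZIPInputStream;

// Hypothetical helper; the key and value types are assumptions.
final class DictReader {
    // Reads a gzip-compressed serialized Hashtable from dictUrl.
    @SuppressWarnings("unchecked")
    public static Hashtable<String, Vector<String>> readDictionary(URL dictUrl)
            throws IOException, ClassNotFoundException {
        // Mirror of the writer: URL stream -> buffer -> gunzip -> objects.
        try (InputStream uin = dictUrl.openStream();
             BufferedInputStream bin = new BufferedInputStream(uin);
             GZIPInputStream gin = new GZIPInputStream(bin);
             ObjectInputStream oin = new ObjectInputStream(gin)) {
            return (Hashtable<String, Vector<String>>) oin.readObject();
        }
    }
}
```

The order of the wrappers matters: the bytes must be decompressed before the ObjectInputStream sees them, so the GZIPInputStream sits inside the ObjectInputStream, just as the GZIPOutputStream sat inside the ObjectOutputStream when writing.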
Deployment
Your applet must be deployed on a live public web site.
If you have a solix account, you can do this by placing your
assign10.html file,
.class files and compressed dictionary in
a directory named www.
The URL will then be http://www.fiu.edu/~yourname/assign10.html.
You must make sure that all your files are readable
(do chmod 0644 * in your www directory),
and that the directory itself is accessible
(do chmod 0755 . in your www directory).
solix does have a Java compiler, so if you upload
your HTML and Java sources via ftp, you can compile there.
Alternatively, you can upload your HTML and class files, but make
sure the class files are transferred in binary mode by your ftp program.
If you have an Internet Service Provider that allows you to host web pages,
you can place your applet there.
Applet Parameters
In order to make the applet more general, instead of hard-coding the
name of the compressed serialized dictionary file, it should
be an applet parameter.
This means your HTML page will have an extra
<PARAM> tag, and your Java code will
make a call to getParameter.
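As an illustration, the HTML page could carry a tag such as <PARAM NAME="dictionary" VALUE="dict.ser.gzip"> inside the applet tag, where the parameter name "dictionary" is only an assumed choice. The sketch below shows one way to turn the parameter value into a URL. The class name DictLocator and the method resolve are hypothetical; in the applet you might call it as DictLocator.resolve(getCodeBase(), getParameter("dictionary")), assuming the dictionary sits in the same directory as the class files.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper for locating the dictionary named by an applet parameter.
final class DictLocator {
    // Resolves paramValue (e.g. "dict.ser.gzip") relative to base.
    // getParameter returns null when the PARAM tag is missing, so a null
    // value is rejected here with a readable message.
    public static URL resolve(URL base, String paramValue)
            throws URISyntaxException, MalformedURLException {
        if (paramValue == null) {
            throw new IllegalArgumentException("dictionary parameter is not set");
        }
        return base.toURI().resolve(paramValue).toURL();
    }
}
```

With this arrangement, switching to a differently named dictionary file only requires editing the HTML page, not recompiling the applet.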
What to Submit
Submit your source code (both Java and HTML)
and sample output (via a screen snapshot).
Provide the URL of your applet.