Assignment #2: Indexing a Textbook

Generate an index for a book.

The Input

The input file is specified as a command-line argument (use args).

The input file consists of a set of index entries. Each line consists of the string IX:, followed by an index entry name enclosed in braces, followed by a page number that is enclosed in braces. Each ! in an index entry name represets a sub-level. A |( represents the start of a range and a |) represents the end of the range. Occasionally, this will be the same page, and in that case, only output a single page number. Otherwise, do not collapse or expand ranges on your own. As an example, here's a sample input:

IX: {Series|(}        	{2}
IX: {Series!geometric|(}        	{4}
IX: {Euler's constant}       	{4}
IX: {Series!geometric|)}        	{4}
IX: {Series!arithmetic|(}        	{4}
IX: {Series!arithmetic|)}        	{5}
IX: {Series!harmonic|(}        	{5}
IX: {Euler's constant}       	{5}
IX: {Series!harmonic|)}        	{5}
IX: {Series|)}        	{5}
The corresponding output is
Euler's constant: 4, 5
Series: 2-5
   arithmetic: 4-5
   geometric: 4
   harmonic: 5

General output (if there is any) goes to System.out. General errors/warnings goes to System.err, and also index.log. The generated index goes to a file named index.txt.

Strategy

Use a TreeMap to map the index entry to a List that stores information about the page numbers.

This problem is a slightly more involved variation of the concordance problem that I will discuss in class. Here, each entry in the List is not an Integer, but rather is an entity that stores a page number and an indication of whether the page number starts a range, finishes a range, or is on its own.

Ignore case distinctions so that the sort is case-independent. Also, make sure that the ! (exclamation point) compares as less than any other printable character (which is not what the default does). Otherwise, you will have subentries in the wrong place. To do this, define a function object that performs a case-insensitive comparison (operating on strings that have each ! replaced with a Unicode character that is lower in the character set (e.g. '\0')) and pass it to the TreeMap constructor.

You may assume that every line of the input is properly formatted. You must verify that |( and |) are properly matched, and print a warning message if they are not. Any strange-looking characters on the line can be accepted as part of the index entry (they are likely text-formatting commands).

Occasionally, there will be a sub-entry even though there is no main entry (or even a sub-sub-entry with no sub-entry and main entry). You will need to detect instances in which this occurs and fabricate a main entry. The easiest way of doing that is in the final output routine.

Handling the see also entries is extra credit. Look in the index of a real book to see how "see also" lines are typically processed. Generally, the page number goes to the main entry, and the see also entry is not numbered. In order to get points for extra credit you must have a program that works correctly for the required portion, and you must clearly state that you have implemented the extra credit as part of your documentation.

What To Submit

Submit complete source code and generate an index for my first textbook. Note: there are a few indexing errors.