Assignment #5: Sorting Program

[Modified Exercise 8.28]: Write a simple sorting utility, sort. The sort command takes a file name as a parameter. The file contains a sequence of lines; each line contains fields. By default, the fields are separated by white space (space or tab, in any amount), but the -t option allows you to use any one character as a field separator. By default fields are to be interpreted as strings, but the -n option allows you to interpret a field as a (double) number. Finally, by default lines are to be sorted as if the entire line was a single string (breaking ties arbitrarily). However, the -k option can be used to specify that the sort is performed using a specified field.

Options may appear in any order. The file name may appear anywhere; if it is not present, use cin. You may assume a limit of nine fields. The sorted output is written to cout.

Examples

If the command is
sort -n -t: -k2 fileName
and the data file fileName is
10:20:30:40
20:100:40:30
30:40:10:20
Then the output is sorted on the basis of the second field, which is interpreted as a number, and is thus
10:20:30:40
30:40:10:20
20:100:40:30
Without the -n option, the output would be
20:100:40:30
10:20:30:40
30:40:10:20
Without any options, each line would be viewed as a giant string. The resulting output would be
10:20:30:40
20:100:40:30
30:40:10:20

Unusual Circumstances

If the file doesn't open, a line does not have enough fields, or a field that is to be a number can't be a number, then you have to do something reasonable. I will accept anything reasonable, but you have to document the behavior. You also must perform other checks, including verification that the command-line arguments make sense. For instance, a character must follow -t and -k.

Algorithms

You already have the quicksort algorithm in the online code and I expect you to use it UNMODIFIED. The idea is that quicksort is a template and can work with anything. So define a class that stores the complete line (as a String) and the key, on which the sort is based. Since the key could be either a double or a String, the logical declaration is:
template <class Comparable>
struct Line
{
    String     theLine;
    Comparable theKey;

    int operator< ( const Line & rhs ) const
      { return theKey < rhs.theKey; }
};
At this point, you need to hook everything together, and much of the work is parsing the command-line arguments and then the lines. I will sketch some of the ideas in class on Monday, but that will be all that I have to say.

C++ Stuff

Submission Details

I will provide four good test cases of various sizes, using various options, and two additional test cases that are degenerate. Look for them on November 9.
  1. Submit all your source code, and the results of running the program on the test cases.
  2. You must also provide a "man" page that explains what your program does. In particular, tell me how you handle degenerate cases. Try man sort to see what the Unix page looks like; yours should be more complete. I expect a professional job -- type it!
  3. You must tell me if the bad operator[] for strings was a problem, and if so, how you worked around it.

The man page, and operator[] decision are important components of your grade.

Extra Credit

  1. The data movement of all the strings can be excessive. Use pointers in the Line class to avoid the extra cost.
  2. Efficiently add a -r option that reverses the order of the comparisons. Thus the sort proceeds from largest to smallest.
  3. Efficiently add a -f that makes the sort case-insensitive.