Assignment #4: Improving the Kevin Bacon Algorithm

You will make two (relatively minor) modifications to assignment #3. On ocelot this makes the program three times as fast, finishing everything in 27 seconds. On my Pentium 300, a similar 3x speedup was observed, resulting in a total runtime of about 68 seconds. You will also experiment with various other optimizations to see if they make any difference.

Using C Style I/O

It is generally the case that the C++ stream library is significantly slower than the C I/O library. This would only be an issue if I/O consumed a significant amount of the program's processing time. In our case, it does. In C, a file stream is represented with FILE objects. And in C, for efficiency, these types of objects are not passed directly; instead we use FILE *. C-style I/O does not interact with the string library. Instead, either you use character-at-a-time I/O, or you use C-style strings (arrays of characters, with NULL terminators). Here is an example of printing any file in C++ using C library routines.
#include <cstdio>
using namespace std;

void printFile( const string & fileName )
{
  static const int MAX_LINE_LEN = 300;
  static char str[ MAX_LINE_LEN + 2 ];  // leave room for '\n' and '\0'

  FILE *fp = fopen( fileName.c_str( ), "r" );
  if( fp == NULL )
    return;

  while( fgets( str, MAX_LINE_LEN, fp ) )
    cout << str;   // newline included in string if read in input
  fclose( fp );
}
To seamlessly substitute new code for old, what you want to do is to write the following routine:
bool getline( FILE *fin, string & oneLine );
and then open a file using fopen instead of an ifstream, and call this version of getline in place of the getline you have. For this code to be reusable outside of this assignment, it would be nice if you dealt with the possibility of lines longer than the limit you specify in the call to fgets; this is extra credit. You can do this because if a line is too long, only the specified number of characters will be read, and the newline character will not be placed in str. You could then concatenate the results of successive calls to fgets.

Using A Different Map

The STL map is more general than what you need because it stores the items in key-sorted order. An alternative is a hash table implementation of the map.
For seamless replacement, implement a new map using the following interface:

unsigned int hash( const string & key );

class HashMap
{
  public:
    HashMap( int cap = 101 );
    int & operator[] ( const string & k );
    int size( ) const;

  private:
    void put( const string & k, int v );
    void addNewPair( const string & k, int v, int pos ); // pos computed already by location
    void rehash( );
 
    int location( const string & k ) const;

    vector<string> keys;   // the keys
    vector<int>    vals;   // the values
    int items;             // number of keys/values in map
    int TABLE_SIZE;        // capacity of the map

    enum { INVALID_VALUE = -1 };  // This value signals empty key/value pair
};

Of course, you want to be careful to avoid excessive calls to hash and you should pay careful attention to optimizing your implementation as much as possible, since this is one of the critical parts of the running time. You do not have to use separate compilation for this class.

Other Possible Improvements

What To Submit

Submit your code, sample output, and timing information. Include a summary of how performance changed as you made code changes, including at lesat a fe of the above possibilities. Probably this means a one-page writeup. Finish the writeup ahead of time; it is part of the grade and I will be very unimpressed if you scribble something on your output.