Assignment #4: Improving the Kevin Bacon Algorithm
You will make two (relatively minor) modifications to assignment #3. On
ocelot this makes the program three times as fast, finishing everything
in 27 seconds. On my Pentium 300, a similar 3x speedup was observed, resulting
in a total runtime of about 68 seconds. You will also experiment with various
other optimizations to see if they make any difference.
Using C Style I/O
It is generally the case that the C++ stream library is significantly slower
than the C I/O library. This would only be an issue if I/O consumed a significant
amount of the program's processing time. In our case, it does. In C, a
file stream is represented with FILE objects. And in C, for efficiency,
these types of objects are not passed directly; instead we use FILE
*. C-style I/O does not interact with the string library. Instead,
either you use character-at-a-time I/O, or you use C-style strings (arrays
of characters, with NULL terminators). Here is an example of printing any
file in C++ using C library routines.
#include <cstdio>
using namespace std;
void printFile( const string & fileName )
{
static const int MAX_LINE_LEN = 300;
static char str[ MAX_LINE_LEN + 2 ]; // leave room
for '\n' and '\0'
FILE *fp = fopen( fileName.c_str( ), "r" );
if( fp == NULL )
return;
while( fgets( str, MAX_LINE_LEN, fp ) )
cout << str; // newline included
in string if read in input
fclose( fp );
}
To seamlessly substitute new code for old, what you want to do is to
write the following routine:
bool getline( FILE *fin, string & oneLine );
and then open a file using fopen instead of an ifstream,
and call this version of getline in place of the getline
you have. For this code to be reusable outside of this assignment, it would
be nice if you dealt with the possibility of lines longer than the limit
you specify in the call to fgets; this is extra credit. You can
do this because if a line is too long, only the specified number of characters
will be read, and the newline character will not be placed in str.
You could then concatenate the results of successive calls to fgets.
Using A Different Map
The STL map is more general than what you need because it stores the items
in key-sorted order. An alternative is a hash table implementation of the
map.
For seamless replacement, implement a new map using the following interface:
unsigned int hash( const string & key );
class HashMap
{
public:
HashMap( int cap = 101 );
int & operator[] ( const string & k
);
int size( ) const;
private:
void put( const string & k, int v );
void addNewPair( const string & k, int v,
int pos ); // pos computed already by location
void rehash( );
int location( const string & k ) const;
vector<string> keys; // the keys
vector<int> vals;
// the values
int items;
// number of keys/values in map
int TABLE_SIZE;
// capacity of the map
enum { INVALID_VALUE = -1 }; // This value
signals empty key/value pair
};
Of course, you want to be careful to avoid excessive calls to hash
and you should pay careful attention to optimizing your implementation
as much as possible, since this is one of the critical parts of the running
time. You do not have to use separate compilation for this class.
Other Possible Improvements
There are numerous other improvements that you can experiment with that
will be easy to implement. Here are some of them:
-
For the hash table, experiment with different initial capacities to avoid
rehashing.
-
For the hash table, experiment with different cutoff points for rehashing.
-
For the hash table, experiment with different hash functions; for instance,
using fewer characters.
-
In general, when accessing characters of a string str, replace
this code:
string str = ....;
for( int i = 0; i < str.length( ); i++ )
...str[i]...
with code that avoids operator[]:
string str = ...;
const char *cstr = str.c_str( );
for( int i = 0; cstr[ i ] != '\0'; i++ )
...cstr[i]...
-
Experiment with reserving vector space instead of expanding on the fly.
-
When doing string searches, experiment with variants that begin the search
at a specified position instead of position 0.
-
When doing string searches, experiment with using C-style strstr
function (in conjunction with c_str).
-
If you wrote your own routines, such as isdigit, experiment with
replacing by macros in <cctype>.
-
If you need to search for characters near the end of a string, experiment
with rfind.
What To Submit
Submit your code, sample output, and timing information. Include a summary
of how performance changed as you made code changes, including at lesat
a fe of the above possibilities. Probably this means a one-page writeup.
Finish the writeup ahead of time; it is part of the grade and I will be
very unimpressed if you scribble something on your output.