For a description of the Kevin Bacon game, follow this occasionally unreliable link or look at Exercise 14.23 in the text. Try the game a few times and see if you can find someone with a Bacon Number higher than 3. In this assignment, you will find all persons with a Bacon Number of 8 or higher. One of these persons is a former President of the United States.
This is basically a shortest path problem. After that is done, find the large Bacon numbers by scanning the bacon numbers and print out the high-numbered actors and their chains. To print out a high-numbered actor, you should use recursion. Specifically, if some actor x has a Bacon number of b, then you know that they must have been in a movie with someone, call them actor y with a Bacon number of b-1. To print out x's chain, you would print out y's chain (recursively) and then the movie that x and y had in common.
These datafiles contain approximately 571,000 actors/actresses in a total of 192,000 movies, with 2,144,000 roles. These files also list TV roles, but you must not include TV roles in your analysis.
Before you run on the large data sets, use the small (uncompressed) file sample.list to debug the basic algorithms. In this data file, there are six actors, named a, b, c, d, e, and f, who have been in movies such as X, Y, and Z.
public class Database
{
private static final class Actor
{
String name;
int data; // Bacon number
,
// determined by computeBaconNumbers
public String toString( )
{ return name; }
public int hashCode( )
{ return name.hashCode( );
}
public boolean equals( Object other )
{ return (other instanceof
Actor) &&
( (Actor) other ).name.equals( name ); }
}
// Open fileName; update the maps
public void loadFile( String fileName ) throws IOException
{ ... }
public int getActorCount( )
{ ... }
public int getMovieCount( )
{ ... }
private Map actorsToMovies = new HashMap( );
private Map moviesToActors = new HashMap( );
}
First, you will need to increase the maximum size of the Java Virtual Machine from the default of 64Meg to 224Meg. You may not increase it any higher than that. If you are running the java interpreter from the command line, the magic option is -Xmx224m. If you are using an IDE, you will have to consult their documentation --- don't ask me.
Second, you will quickly run out of memory, because if you find two movies that are the same, but are on different input lines, the default setup will create two separate (yet equal) String objects and place them in the value lists of two different actors. Since there are 2.1 million roles, but only 192,000 movies, this means that you will have ten times as many String objects as you really need. What you need to do is to make sure that each movie title is represented by a single String object, and that the maps simply store references to that single String object. There are two basic alternatives:
When you maintain the list of movies for each actor, you will want to use an ArrayList. It takes little effort to ensure that the capacity in the ArrayList is not more than is needed, and you should do so, to avoid wasting space (since there are 571,000 such array lists).
When you construct the HashMaps, you can issue a parameter (the load factor). The higher the load factor, the less space you use (at the expense of a small increase in time). You should play around with that too; the default is 0.75; you will probably want a load factor of 2.00 or 3.00.
You must be very careful to avoid doing any more work in the inner loops than you need to, including creating excessive objects. IF YOU CREATE EXCESSIVE OBJECTS, YOUR PROGRAM MAY SLOW TO A CRAWL BECAUSE ALL ITS TIME WILL BE SPENT IN THE GARBAGE COLLECTOR OR RUN OUT OF MEMORY COMPLETELY.
Runtime rt = Runtime.getRuntime( ); int vmsize = (int) rt.totalMemory( ); System.out.println( "Virtual machine size: " + vmsize );Don't forget to write in the processor speed of the AUL computer you are using. If it's somewhat fast, or provably space-efficient, you can get extra credit. You cannot receive credit for a working program if your program fails to produce a large set of actors and actresses with Bacon Numbers of 8 or higher. Note: the data you will work on and the data the Oracle uses (and the IMDB data) are all slightly out of sync. So you might not be able to exactly reproduce the results, although I was able to get a complete set of Bacon Numbers 8 and higher when I ran my program on October 23, using the Oct 17 files.
// typically invoked with Kevin Bacon as parameter // This is the basic algorithm; there's stuff you // should do to avoid looking at the same cast twice. public void computeBaconNumbers( String centralActor ) { for each actor, act, set act's Bacon number to INFINITY; set centralActor's Bacon Number to 0; enqueue( centralActor ); while ( !queue.isEmpty( ) ) { Actor current = (Actor) queue.removeFirst( ); List currentMovies = (List) actorsToMovies.get( current ); for each movie, thisMovie, in currentMovies { List sharedActors = (List) moviesToActors.get( thisMovie ); for each Actor, otherActor, in sharedActors { if( otherActor's Bacon number is INFINITY ) { set otherActor's Bacon number to current's Bacon number + 1; enqueue( otherActor ); } } } } }