Assignment #5: Kevin Bacon Game

For a description of the Kevin Bacon game, follow this occasionally unreliable link or look at Exercise 14.23 in the text. Try the game a few times and see if you can find someone with a Bacon Number higher than 3. In this assignment, you will find all persons with a Bacon Number of 8 or higher.

Strategy

This is basically a shortest path problem. After that is done, find the large Bacon numbers by scanning the bacon numbers and print out the high-numbered actors and their chains. To print out a high-numbered actor, you should use recursion. Specifically, if some actor x has a Bacon number of b, then you know that they must have been in a movie with someone, call them actor y with a Bacon number of b-1. To print out x's chain, you would print out y's chain (recursively) and then the movie that x and y had in common.

The Input Files

The input file is a single compressed file that contains a single Map<String,String[]> in which the keys are the names of actors, and the values are each an array representing the names of the movies in which the actor has a role. This file was last updated June 26, 2009. Here is a link to the file: bacon.ser.gz. This file is appx. 30Megabytes.

The datafile contains approximately 1,171,600 actors/actresses in a total of 331,500 movies, with 3,925,000 roles. Because of the large data, you may need to increase the Virtual Machine size depending on your current defaults. You may use up to 400Mbytes. You may not increase it any higher than that. If you are running the java interpreter from the command line, the magic option is -Xmx400m. If you are using an IDE, you will have to consult their documentation --- don't ask me.

Strategy

First, you must read the data file into a map that will allow you to determine all movies for any actor. Then you should create a second map that will allow you to determine all actors for any movies. You need to exercise reasonable care to avoid spending too much time or space. One idea is to have the values in that map be an ArrayList and use ensureCapacity and trimToSize judiciously. Once both maps are built, you just need to run a single-source unweighted shortest path algorithm. You will need to write this yourself; the code in the book will be useless because the graph will have too many edges. See the hints below. IF YOU CREATE EXCESSIVE OBJECTS, YOUR PROGRAM MAY SLOW TO A CRAWL BECAUSE ALL ITS TIME WILL BE SPENT IN THE GARBAGE COLLECTOR OR YOU MAY RUN OUT OF MEMORY COMPLETELY.

What to Submit

Submit complete source code and the actors/actresses with Bacon Numbers of 8 or higher. Include the complete paths for each of the actors/actresses (with shared movie titles). Also indicate how long your algorithm takes. This means how long it takes to load, and also how long it takes to run the shortest path computation (not including the output of the answer), by inserting calls to System.currentTimeMillis at appropriate points in your code, and tell me how many actors and movies there are. Also, insert this code (at the end of your program) that tells me how large the VM is:
        Runtime rt = Runtime.getRuntime( );        
        int vmsize = (int) rt.totalMemory( );
        
        System.out.println( "Virtual machine size: " + vmsize );
You cannot receive credit for a working program if your program fails to produce a large set of actors and actresses with Bacon Numbers of 8 or higher. Note: the data you will work on and the data the Oracle uses (and the IMDB data) are all slightly out of sync. So you might not be able to exactly reproduce the results, but it is definitely the case that some of the people your program will produce do indeed have a Bacon number of 8 according to the Oracle.

Additional Notes


  • The exact sizes of the data files is 30,633,090 bytes. If you download and the files are larger, then you messed up the download. Note that if you are using Windows XP, these files might be uncompressed during downloading. If so, you can use this program to recompress the file. If the files are smaller, then the download probably got interrupted before it finished and you will need to retry.
  • When reading make sure you are using BufferedInputStream. Otherwise, if you are running over a network, your program will slow to a crawl.
  • Sketch of the basic shortest path computation:
        
          // typically invoked with "Bacon, Kevin" as parameter
          // Returns a map in which keys are names of the actors,
          //   values are their bacon numbers
        public Map<String,Integer> computeBaconNumbers( String centralActor )
        {
            MAINTAIN A QUEUE
            MAINTAIN A MAP TO STORE AND RETURN BACON NUMBERS 
            MAINTAIN A SET TO AVOID LOOKING AT A MOVIE TWICE
    
            enqueue the central actor
            set the central actor's number to 0
    
            while the queue is not empty 
            {
                Remove an actor (the current actor), from the queue
                
                for each movie m in which the current actor has a role
                {         
                    if movie m has already been looked at 
                        continue
    
                    for each Actor, otherActor who appeared in m
                    {               
                        if otherActor does not yet have a Bacon number 
                        {
                            set otherActor's Bacon number to current's Bacon number + 1;
                            enqueue the otherActor
                        }
                    }
                }
            }    
    
            return the map
        }