Assignment #5: Kevin Bacon Game

Due: Dec 2, 11:59 PM

The Problem

For a description of the Kevin Bacon game, follow this occasionally unreliable link. Try the game a few times and see if you can find someone with a Bacon Number higher than 3. In this assignment, you will find the Bacon number of given actors and actresses.

The Input Files

The input file is a single compressed file that contains a single Map<String,String[]> in which the keys are the names of actors, and the values are each an array representing the names of the movies in which the actor has a role. This file was last updated June 26, 2009. Here is a link to the file: bacon.ser.gz. This zipped file is approximately 30 MB and the unzipped version is about 60 MB. It contains approximately 1,171,600 actors/actresses in a total of 331,500 movies, with 3,925,000 roles. Because of the large data, you may need to increase the Virtual Machine size depending on your current defaults. You may use up to 400 Mbytes. You may not increase it any higher than that. If you are running the java interpreter from the command line, the magic option is -Xmx400m. If you are using an IDE, you will have to consult their documentation --- don't ask me.

Strategy

This is basically a shortest path problem in an undirected, unweighted graph (breadth-first search).

First, declare a class called KevinBaconGame in package cop3530. In the constructor for this class, you must read the data file into a map Actor2Movies that will allow you to determine all movies for any actor. You will need a second map Movie2Actors that will allow you to determine all actors for any movies. I suggest using HashMaps as follows:

    HashMap <String,String[ ] > Actor2Movies;
    HashMap <String,ArrayList <String>> Movie2Actors;
You could also use a thrid map for the Bacon numbers.
  public class KevinBaconGame (String dataFileName) { // constructor
       ObjectInputStream in = new ObjectInputStream(new FileInputStream(dataFileName));

       Actor2Movies = (HashMap <String,String[]>) in.readObject();
       Movie2Actors = new HashMap <>();
      ...
  }
You need to exercise reasonable care to avoid spending too much time or space. You can use ensureCapacity and trimToSize to keep the space utilization of the Movie2Actors map in check. Here are suggestions for building the Movie2Actors map.
  for (String actor : Actor2Movies.keySet()) {
      for (String movie : Actor2Movies.get(actor)) {
          if (!Movie2Actors.containsKey(movie)) {
              Movie2Actors.put(movie, new ArrayList());
              Movie2Actors.get(movie).ensureCapacity((int)(2* roles_sz / movies_sz));
          }
          Movie2Actors.get(movie).add(actor);
       }
   }
   for (String movie : Movie2Actors.keySet()) {
       Movie2Actors.get(movie).trimToSize();
   }

You need to write a method called getBaconNumber(String s) that returns an integer Bacon number. The implementation of this method uses a single-source unweighted shortest path algorithm without explicitly constructing the graph or its adjacency list. Note that the Bacon number of an actor such as "Campbell, Neve" is 1, because she has been in a movie with Kevin Bacon. Among the others, anyone who has been in a movie with an actor/actress with Bacon number 1, will have Bacon number 2, and so on.

As in the previous assignment, the test program will be provided as a .class file, without any Java source, so it is important that you write sufficient test cases on your own to convince yourself that your logic works.

What to Submit

Submit your source code (all java files), all class files for the java files, the tester class file, and the result of running the class file that I provide. Do not attach the data file "bacon.ser" in zipped or unzipped form. All of these should be zipped and one zip file (appropriately named) should be submitted. The zip file should represent the correct directory structure of the files needed to run. When the grader runs your program, he will place the data file "bacon.ser" in the same directory as the tester class file. Make sure your program runs with that assumption. Also, insert this code (at the end of your program) that tells me how large the VM is:
        Runtime rt = Runtime.getRuntime( );        
        int vmsize = (int) rt.totalMemory( );
        
        System.out.println( "Virtual machine size: " + vmsize );

  • Sketch of the getBaconNumber shortest path algorithm:
      // Returns the bacon number of a given actor/actress
      public int getBaconNumber( String queryActor ) {
            maintain queue Q of actors
            maintain a map to store Bacon numbers of actors 
            maintain a set of movies to avoid looking at a movie twice 
    
            enqueue "Bacon, Kevin" and set his Bacon number to 0
    
            while Q is not empty {
                Remove an actor A from queue Q
                for each movie m in which A has a role          
                    if movie m has not already been looked at 
                        for each actor B who appeared in movie m {               
                            if B does not yet have a Bacon number {
                                set B's Bacon number to A's Bacon number + 1
                                enqueue B
                            }
                            if B is queryActor then return B's Bacon number
                        }
            }    
        }