Assignment #5: Word Changing

In this assignment, you will solve a generalized version of the word changing problem.

Covers

Graph algorithms, binary heaps, advanced Java programming.

The basic problem

A word can be changed to another word by a 1-character substitution. The basic problem would be to write a program that reads the dictionary and then repeatedly prompts the user for words A and B. The program determines if a word A can be transformed to a word B by a series of 1-character substitutions, and if so, outputs the corresponding sequence of words (using the fewest number of substitutions). As an example, bleed converts to blood by the sequence bleed, blend, blond, blood.

The actual problem

To make things more interesting, we will add an extension that allows two additional ways to transform words. First, you can add a letter (converting a k-character word to a k+1-character word); second you can remove a letter. However, these transformations are not as desireable as simply changing a letter. The cost of applying one of these additional transformations (the cost of changing a letter is always 1), is the length of the longer string in the transformation. In other words, converting from bend to blend costs 5, as does converting from blend to bend. Example: to convert from ask to way you might get the sequence ask, as, was, way, total cost of 7, but maybe you can do better without adding or removing letters.

Determining Adjacent Words

To determine which words are adjacent, and thus which edges are in the graph, you must use the following algorithm:

Construct a map in which the key is an integer and the value is a set of all words that have the key as the word length. In other words, group the words together.

For each word of length len, repeatedly remove one character from the word, forming a word of length len-1 and check if what forms is a word (you can use the map to find words of length len-1 and do a contains on that set). Any pairs that you find in this step also form a transormation of converting one word to another by adding a single character, so add an edge to the graph in the opposite direction. An outline of this procedure (this code is by no means complete) is as follows:

for( Map.Entry<Integer,Set<String>>  e : theMap.entrySet( ) )
{
    int theWordLen = e.getKey( );
    Set<String> theWords = e.getValues( );

      // You will need more work for the next line to handle theWordLen == 1
    Set<String> shorterWords = theMap.get( theWordLen - 1 );

    for( String s : theWords )
    {
        for( int i = 0; i < theWordLen; i++ )
        {
            String shorter = removeOneChar( s, i );
            if( shorterWords.contains( shorter ) )
            {
                ADD TWO EDGES TO THE GRAPH
            }
        }
    }
}

To find adjacent words that are the same length, and differ in only one character, use the following algorithm:

for( Map.Entry<Integer,Set<String>>  e : theMap.entrySet( ) )
{
    int theWordLen = e.getKey( );
    Set<String> theWords = e.getValues( );

    for( int i = 0; i < theWordLen; i++ )
    {
          // CREATE AN EMPTY MAP, key is String, value is List of String
          //    such that the key represents all the strings in its corresponding
          //    value with character at position i removed
        Map<String,List<String>> repMap = new TreeMap<String,List<String>>( );
    
        for( String w : theWords )
        {
            String shorter = removeOneChar( s, i );
    
            UPDATE repMap with shorter and w
        }
    
        FIND cliques of words (lists with length more than 1 in repMap) that are
            identical except for position i and ADD EDGES 
    }
}

You may combine steps two and three, but you must use the basic algorithm that involves using maps.

What To Submit

Submit all your source code and the results of running the program on pairs of words that I will specify, using a dictionary I will provide. Include in your program timing data that shows

The time to read the dictionary and build whatever internal data structures you need.
The time to process each query.