Assignment #5: Word Changing

In this assignment, you will solve a generalized version of the word changing problem.

Covers

Graph algorithms, binary heaps, advanced Java programming.

The basic problem

A word can be changed to another word by a 1-character substitution. The basic problem would be to write a program that reads the dictionary and then repeatedly prompts the user for words A and B. The program determines if a word A can be transformed to a word B by a series of 1-character substitutions, and if so, outputs the corresponding sequence of words (using the fewest number of substitutions). As an example, bleed converts to blood by the sequence bleed, blend, blond, blood.

The actual problem

To make things more interesting, we will add an extension that allows two additional ways to transform words. First, you can add a letter (converting a k-character word to a k+1-character word); second you can remove a letter. However, these transformations are not as desireable as simply changing a letter. The cost of applying one of these additional transformations (the cost of changing a letter is always 1), is the length of the longer string in the transformation. In other words, converting from bend to blend costs 5, as does converting from blend to bend. Example: to convert from ask to way you might get the sequence ask, as, was, way, total cost of 7, but maybe you can do better without adding or removing letters.

Determining Adjacent Words

To determine which words are adjacent, and thus which edges are in the graph, you must use the following algorithm:
  1. Construct a map in which the key is an integer and the value is a set of all words that have the key as the word length. In other words, group the words together.
  2. For each word of length len, repeatedly remove one character from the word, forming a word of length len-1 and check if what forms is a word (you can use the map to find words of length len-1 and do a contains on that set. Any pairs that you find in this step also form a transormation of converting one word to another by adding a single character, so add an edge to the graph in the opposite direction. An outline of this procedure (this code is by no means complete) is as follows:
    for( Map.Entry<Integer,Set<String>>  e : theMap.entrySet( ) )
    {
        int theWordLen = e.getKey( );
        Set<String> theWords = e.getValues( );
    
          // You will need more work for the next line to handle theWordLen == 1
        Set<String> shorterWords = theMap.get( theWordLen - 1 );
    
        for( String s : theWords )
        {
            for( int i = 0; i < theWordLen; i++ )
            {
                String shorter = removeOneChar( s, i );
                if( shorterWords.contains( shorter ) )
                {
                    ADD TWO EDGES TO THE GRAPH
                }
            }
        }
    }
    
  3. To find adjacent words that are the same length, and differ in only one character, use the following algorithm:
    for( Map.Entry<Integer,Set<String>>  e : theMap.entrySet( ) )
    {
        int theWordLen = e.getKey( );
        Set<String> theWords = e.getValues( );
    
        String [ ] words = new String[ theWords.size( ) ];
    
        theWords.toArray( words );
    
        for( int i = 0; i < theWordLen; i++ )
        {
            SORT words by using a comparator that ignores position i
            FIND cliques of adjacent words (in the array) that are
                identical except for position i and ADD EDGES 
        }
    }
    
    To see how this algorithm works, consider words [ "eat", "fat", "fit", "fan", "far", "for" ]. When these words are sorted, and character 0 is ignored, we get [ "fan", "far", "eat", "fat", "fit", "for" ]. In this array, "eat" and "fat" are adjacent to each other and are identical except for position 0, so edges between them are added to the graph at cost 1. Then we sort, ignoring character 1, and we get [ "eat", "fan", "far", "for", "fat", "fit" ]. In this array "far" and "for" are adjacent to each other and identical except for position 1, so edges between them are add to the graph. The same goes for "fat" and "fit". Finally, we sort ignoring position 2. In this case, we get [ "eat", "fan", "far", "fat", "fit", "for" ]. Here, the clique "fan", "far", "fat" all are adjacent and all are identical except for position 2. So we add edges between every pair of words (a total of SIX directed edges).
You may combine steps two and three, and you may reorder the loops, but you must use the basic algorithm that involves sorting words with a comparator that ignores one character in the word.

What To Submit

Submit all your source code and the results of running the program on pairs of words that I will specify, using a dictionary I will provide. Include in your program timing data that shows
  1. The time to read the dictionary and build whatever internal data structures you need.
  2. The time to process each query.