Assignment #5: Word Changing

In this assignment, you will solve a generalized version of the word changing problem.

Covers

Graph algorithms, binary heaps, lists, maps, advanced Java programming.

The basic problem

A word can be changed to another word by a 1-character substitution. The basic problem would be to write a program that reads the dictionary and then repeatedly prompts the user for words A and B. The program determines if a word A can be transformed to a word B by a series of 1-character substitutions, and if so, outputs the corresponding sequence of words (using the fewest number of substitutions). As an example, bleed converts to blood by the sequence bleed, blend, blond, blood.

The actual problem

To make things more interesting, we will add an extension that allows two additional ways to transform words. First, you can add a letter (converting a k-character word to a k+1-character word); second you can remove a letter. However, these transformations are not as desireable as simply changing a letter. Hence, the user will specify, either by command-line argument or by input from System.in (your choice), the cost of applying one of these additional transformations (the cost of changing a letter is always 1), and now your problem is to find a transformation of minimum total cost. Example: to convert from ask to way with a cost of 4 for adding or removing a letter, you might get the sequence ask, as, was, way, total cost of 9, but maybe you can do better without adding or removing letters. Observe that if the cost of an additional transformation is huge (larger than the number of words in the dictionary), then the transformation is applied only as a last resort, if no sequence of 1-character substitutions is available at all.

Algorithm

This is a weighted shortest path problem.

The vertices are words.
There is an edge of cost 1 between words of equal length that are identical except for one character.
There is an edge of cost p between a word s and a word t where t is a result of removing one letter in s. There is also an edge of cost p between t and s.

You can use the Graph class in the online code.

Computing the edges in Case 3 above is easy. For each word, remove one character in each of the possible places and see if the result is also a word. If so, add the edge of cost p between the word and the result (in both directions).

Computing the edges in Case 2 is trickier. First, create a Map in which the keys are integers representing word lengths, and the values are collections of words (of the word length given by the key). Then you can work on groups of words that have the same word length. There is an obvious quadratic algorithm that you can use at this point, but it is too slow. In class I will discuss a better way, but the idea is as follows: for each word, remove the character in the first position. Call the result the representative of the word. As was the case in the anagram program, words that are Case 2, and differ in only the first position will have the same representative. So you can use a map as you did for anagrams. Then repeat this for the second position, third position, and so on. You should avoid having all the maps active at the same time, or you might run out of memory.

What To Submit

Submit all your source code and the results of running the program on pairs of words and transformation costs that I will specify, using a dictionary I will provide. Include in your program timing data that shows

The time to read the dictionary and build whatever internal data structures you need. This should be less than 15 seconds.
The time to process each query This should be less than 1 second.