Assignment #5: Word Changing
In this assignment,
you will solve a generalized version of the word changing problem.
Covers
Graph algorithms, binary heaps, lists, maps, advanced Java programming.
The basic problem
A word can be changed to another word by a 1-character substitution.
The basic problem would be to write a program that reads the dictionary
and then repeatedly prompts the user for words A and B.
The program determines if a word A can be transformed
to a word B by a series of 1-character substitutions, and if so,
outputs the corresponding sequence of words
(using the fewest number of substitutions). As an example,
bleed converts to blood
by the sequence bleed, blend, blond, blood.
The actual problem
To make things more interesting,
we will add an extension that allows two additional ways to transform words.
First, you can add a letter (converting a k-character word
to a k+1-character word); second you can remove a letter.
However, these transformations are not as desireable as simply changing
a letter. Hence, the user will specify, either by command-line argument
or by input from System.in (your choice), the cost of applying
one of these additional transformations (the cost of changing a letter
is always 1), and now your problem is to find a transformation
of minimum total cost.
Example: to convert from ask to way with a cost of
4 for adding or removing a letter, you might get the sequence
ask, as, was, way, total cost of 9,
but maybe you can do better without adding or removing letters.
Observe that if the cost of an additional transformation is huge
(larger than the number of words in the dictionary), then the transformation
is applied only as a last resort, if no sequence of 1-character
substitutions is available at all.
Algorithm
This is a weighted shortest path problem.
- The vertices are words.
- There is an edge of cost 1 between words of equal length that are identical except for
one character.
- There is an edge of cost p between a word s and a word t where
t is a result of removing one letter in s. There is also
an edge of cost p between t and s.
You can use the Graph class in the online code.
Computing the edges in Case 3 above is easy. For each word, remove one character
in each of the possible places
and see if the result is also a word. If so, add the edge of cost p between
the word and the result (in both directions).
Computing the edges in Case 2 is trickier.
First, create a Map in which the keys are integers representing word lengths,
and the values are collections of words (of the word length given by the key).
Then you can work on groups of words that have the same word length.
There is an obvious quadratic algorithm that you can use at this point, but it is
too slow. In class I will discuss a better way, but the idea is as follows:
for each word, remove the character in the first position. Call the result
the representative of the word. As was the case in the anagram program,
words that are Case 2, and differ in only the first position will have the same
representative. So you can use a map as you did for anagrams.
Then repeat this for the second position, third position, and so on.
You should avoid having all the maps active at the same time, or you
might run out of memory.
What To Submit
Submit all your source code and the results of running the program
on pairs of words and transformation costs that
I will specify, using a dictionary I will provide.
Include in your program timing data that shows
- The time to read the dictionary and build whatever internal data
structures you need.
This should be less than 15 seconds.
- The time to process each query
This should be less than 1 second.