Assignment #4: Anagrams

Find large sets of words that are anagrams of each other. Two words are anagrams if they contain the same letters with the same frequencies. For instance, least and steal are anagrams. So are steal and stale. In fact, least, steal, tales, stale, and slate are all anagrams of each other, and together they form a large group of anagrams. You must find all large groups (five or more words) of anagrams contained in a dictionary of words: dict.txt (usual disclaimer: it's not my dictionary, and it is not feasible for me to manually purge inappropriate words).

Strategy

For each word, compute its representative. The representative is the characters of the word in sorted order. For instance, the representative for the word enraged is adeegnr. Observe that words that are anagrams of each other have the same representative. Thus the representative for grenade is also adeegnr.

If you were programming in Java, you could use a Map in which the key is a String that is a representative, and the value is a List<String> of all words that have the key as their representative. After constructing the Map, you would simply need to find all values whose Lists contain five or more words and print those Lists.

Of course, there is no Map class in C. Use a separate chaining hash table instead (sample code is in the C programming book): hash on the representative, and keep in each hash node the list of words that share that representative. You can use the following basic node structure:

/*
 * Represents a node in a singly linked list of strings
 */
struct StringListNode
{
    char *word;
    struct StringListNode *next;
};

/*
 * Represents a node in a list of a separate chaining hash table
 */
struct HashListNode
{
    char *representative;          /* key: the words' letters in sorted order */
    struct StringListNode *first;  /* list of words */
    struct HashListNode *next;     /* next node in the hash list */
};
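
To make the strategy concrete, here is a minimal sketch of how the pieces could fit together. It is only one possible approach, not the required solution. Everything other than the two structs is an assumption made for illustration: the names TABLE_SIZE, xmalloc, copy_string, make_representative, hash_string, find_node, add_word, group_size, and print_large_groups are all hypothetical, as are the table size and the hash function. The representative is computed by sorting a copy of the word with qsort.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct StringListNode { char *word; struct StringListNode *next; };
struct HashListNode {
    char *representative;
    struct StringListNode *first;
    struct HashListNode *next;
};

#define TABLE_SIZE 1009                /* hypothetical number of buckets */
static struct HashListNode *table[TABLE_SIZE];

/* malloc that exits on failure, to keep the sketch short */
static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    return p;
}

static char *copy_string(const char *s)
{
    return strcpy(xmalloc(strlen(s) + 1), s);
}

static int compare_chars(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* Return a newly allocated copy of word with its characters sorted. */
char *make_representative(const char *word)
{
    char *rep = copy_string(word);
    qsort(rep, strlen(rep), 1, compare_chars);
    return rep;
}

/* Simple string hash (an assumption; any reasonable hash would do). */
static unsigned long hash_string(const char *s)
{
    unsigned long h = 0;
    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Find the hash node for a representative, or NULL if there is none. */
static struct HashListNode *find_node(const char *rep)
{
    struct HashListNode *h = table[hash_string(rep)];
    while (h != NULL && strcmp(h->representative, rep) != 0)
        h = h->next;
    return h;
}

/* Add word to the list of words that share its representative. */
void add_word(const char *word)
{
    char *rep = make_representative(word);
    struct HashListNode *h = find_node(rep);
    struct StringListNode *w = xmalloc(sizeof *w);

    if (h == NULL) {                   /* first word with this representative */
        unsigned long i = hash_string(rep);
        h = xmalloc(sizeof *h);
        h->representative = rep;       /* the node takes ownership of rep */
        h->first = NULL;
        h->next = table[i];
        table[i] = h;
    } else {
        free(rep);
    }
    w->word = copy_string(word);
    w->next = h->first;                /* push onto the front of the word list */
    h->first = w;
}

/* Number of words stored under a representative (0 if none). */
size_t group_size(const char *rep)
{
    struct HashListNode *h = find_node(rep);
    struct StringListNode *w;
    size_t n = 0;
    for (w = (h != NULL) ? h->first : NULL; w != NULL; w = w->next)
        n++;
    return n;
}

/* Print every group that has at least min_size words, one group per line. */
void print_large_groups(size_t min_size)
{
    size_t i;
    struct HashListNode *h;
    struct StringListNode *w;
    for (i = 0; i < TABLE_SIZE; i++)
        for (h = table[i]; h != NULL; h = h->next) {
            if (group_size(h->representative) < min_size)
                continue;
            for (w = h->first; w != NULL; w = w->next)
                printf("%s ", w->word);
            printf("\n");
        }
}
```

With these helpers, a main function would call add_word once for each word read from dict.txt and then call print_large_groups(5). Because new words are pushed onto the front of each list, the words in a group print in the reverse of the order in which they were read. The sketch never frees the table, which is acceptable for a program that exits right after printing.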