Assignment #9

This assignment requires you to use several components of the Standard Library to implement a spell-checker. The amount of code that you will write is not large.

Specifications

Prompt the user for the name of a file that stores a dictionary of words. Then prompt for the name of a file that you want to spell-check. Any word that is not in the dictionary is considered to be misspelled. Output, in sorted order, each misspelled word and the line number(s) on which it occurs. If a word is misspelled more than once, it is listed once, but with several line numbers. Of course, you should verify that both files open correctly.
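As a minimal sketch of the "verify that files open correctly" requirement, the check might look like the following. The helper name openInput and the wording of the error message are assumptions made for illustration, not part of the assignment.

```cpp
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

// Hypothetical helper (the name is an assumption): tries to open `name`
// for reading. Prints a message to cerr and returns false on failure.
bool openInput(const string& name, ifstream& in) {
    in.open(name.c_str());   // c_str( ) also works with pre-C++11 compilers
    if (!in) {
        cerr << "Error: cannot open \"" << name << "\"" << endl;
        return false;
    }
    return true;
}
```

A program could call this once for the dictionary file and once for the file to be checked, and exit early if either call returns false.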

What's A Word?

For the purposes of this assignment, you will determine words as follows: The input is considered to be a sequence of tokens separated by whitespace. Any token that ends with a single period, question mark, comma, semicolon, or colon should have the punctuation removed. After doing this, any token that contains letters only is considered a word. Convert this word to lower case.

Example: For the following line
This is a test, one-half of four is 2.
The tokens are:
This
is
a
test,
one-half
of
four
is
2.
Among these, the words are:
this
is
a
test
of
four
is
"one-half" fails the rule of consisting entirely of letters, as does "2" (the token "2." with its period removed). "This" is converted to lower case, and "test," has the punctuation at the end stripped.

These are the rules, even if I've missed a few cases (like apostrophes, etc.).
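One possible way to express these rules in code is sketched below. The function name toWord is an assumption; it returns the lower-case word, or an empty string when the token is not a word.

```cpp
#include <cctype>
#include <string>
using namespace std;

// Hypothetical helper (the name is an assumption): converts a token to a
// lower-case word, or returns "" if the token is not a word.
string toWord(string token) {
    // Remove one trailing punctuation mark, if there is one.
    if (!token.empty()) {
        char last = token[token.size() - 1];
        if (last == '.' || last == '?' || last == ',' ||
            last == ';' || last == ':')
            token.erase(token.size() - 1);
    }
    // Reject the token if any character is not a letter;
    // otherwise convert it to lower case as we go.
    for (string::size_type i = 0; i < token.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(token[i]);
        if (!isalpha(c))
            return "";
        token[i] = static_cast<char>(tolower(c));
    }
    return token;   // empty if the token was only a punctuation mark
}
```

With the example line above, toWord("This") gives "this", toWord("test,") gives "test", and both toWord("one-half") and toWord("2.") give an empty string. Only one punctuation mark is removed, so a token such as "wait.." is rejected because a period remains.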

The Dictionary

The dictionary contains one word per line. A large dictionary (~800Kbytes) is available (it may take a little time to download). This dictionary was obtained from the Internet and may have inappropriate words. I apologize in advance if this is the case.

The Algorithm

Read the dictionary file and store its contents in a set<string>. Then read the data input file, one line at a time. Break the line into tokens using an istringstream object, and then write some functions to convert each token to a word (or to an empty string if it is not a word). Once you have a word, check to see if it is in the set<string> that stores the dictionary. If it is not, you will need to add it to a map<string,list<int> > that stores the misspelled words and the line numbers on which they occur. (This implies that you know the current line number.) Once everything is read, you need to step through the map and print its contents in an orderly way.
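The steps above can be sketched roughly as follows. This is one possible arrangement, not the required solution. The names Misspellings, toWord, findMisspelled, and printMisspelled are assumptions, as is the output format. The sketch also assumes that dictionary entries are stored exactly as they appear in the file, and that a line number is recorded only once even if the word is misspelled twice on that line.

```cpp
#include <cctype>
#include <iostream>
#include <list>
#include <map>
#include <set>
#include <sstream>
#include <string>
using namespace std;

typedef map<string, list<int> > Misspellings;   // word -> line numbers

// Hypothetical helper: lower-case word for a token, or "" if not a word.
string toWord(string token) {
    if (!token.empty()) {
        char last = token[token.size() - 1];
        if (last == '.' || last == '?' || last == ',' ||
            last == ';' || last == ':')
            token.erase(token.size() - 1);
    }
    for (string::size_type i = 0; i < token.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(token[i]);
        if (!isalpha(c))
            return "";
        token[i] = static_cast<char>(tolower(c));
    }
    return token;
}

// Reads the dictionary, then scans the text line by line and collects
// every word that is not in the dictionary.
Misspellings findMisspelled(istream& dictIn, istream& textIn) {
    set<string> dictionary;
    string entry;
    while (dictIn >> entry)            // one word per line
        dictionary.insert(entry);

    Misspellings result;
    string line;
    int lineNumber = 0;
    while (getline(textIn, line)) {
        ++lineNumber;
        istringstream tokens(line);    // splits the line on whitespace
        string token;
        while (tokens >> token) {
            string word = toWord(token);
            if (word.empty() || dictionary.count(word) > 0)
                continue;
            list<int>& lines = result[word];
            // Assumption: list each line number only once per word.
            if (lines.empty() || lines.back() != lineNumber)
                lines.push_back(lineNumber);
        }
    }
    return result;
}

// Prints each misspelled word followed by its line numbers.
// The map keeps its keys sorted, so the output is in sorted order.
void printMisspelled(const Misspellings& m, ostream& out) {
    for (Misspellings::const_iterator it = m.begin(); it != m.end(); ++it) {
        out << it->first << ":";
        for (list<int>::const_iterator n = it->second.begin();
             n != it->second.end(); ++n)
            out << ' ' << *n;
        out << '\n';
    }
}
```

Because both functions take stream references, they can be called with ifstream objects for the real files, or with istringstream objects for quick testing. For example, with a dictionary containing this, is, a, test, and of, the example line from "What's A Word?" reports only four, on line 1.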

Header Files

You'll need
#include <iostream>
#include <fstream>
#include <sstream>
#include <set>
#include <map>
#include <list>
#include <string>   // for string and getline( )
#include <cctype>   // contains isalpha( ), to check for letters
using namespace std;
 

What to Submit

Submit your complete source code and the results of running on the data file ch3.txt.