Spell Check

Introduction to Computer Programming

This assignment requires you to create four files: SpellChecker.h, SpellChecker.cpp, WordCounts.h and WordCounts.cpp. The class definitions should be in the .h files and the implementations of the methods in the .cpp files. Use your main.cpp file to test your implementation of your classes. You can create a project in CodeBlocks (use the console application template) to create the main.cpp file, then add the two classes to the project; CodeBlocks will create the .h and .cpp files when you add a class. CodeBlocks will compile each file for you and link them into the project executable for testing. You should NOT use #include "SpellChecker.cpp" or #include "WordCounts.cpp" in your main file to pull your code into main.
1. Once you have your code running on your virtual machine (VM), submit a zip file with main.cpp, SpellChecker.h, SpellChecker.cpp, WordCounts.h and WordCounts.cpp to the autograder COG.
Part I

In this part of the assignment, you are to create a class, SpellChecker. You will define some class data members, member methods and helper functions. The class methods will be used to check the spelling of words and assess the word count across documents. Elements of this assignment are intentionally vague; at this point in the semester, you should be able to make your own decisions about appropriate data structures for storing and looking up data, as well as defining helper functions. You can assume that your code will never be storing more than 10,000 valid or corrected words.
SpellChecker should have at least the following public members:
● string language: the name of the language this spell checker is using (e.g. "English", "Spanish", "Italian", "Hindi", ...)

SpellChecker should have at least the following private members:
● char start_marker: used for marking the beginning of an unknown word in a string.
● char end_marker: used for marking the end of an unknown word.
SpellChecker should have three constructors:
● Default constructor, the one with no arguments.
● Second constructor that takes a string for the object's language.
● Third constructor that takes a string for the object's language and two filenames as parameters. The first filename specifies the file with correctly spelled words and the second filename specifies the misspelled words with their corrections.
You will be dealing with two different files:
● The data in the first file supplies a list of correctly spelled words, one word per line:
● The data in the second file contains a list of misspelled words and their correct spellings. The word and its correction are separated by a tab character ('\t'):
It is very important you understand the format of this file. The correct spellings may have spaces in them! For example, a file that converts common texting abbreviations into words:
The constructor with the filename arguments should open the files and read them into appropriate data members of the class. To find out whether a word is a valid spelling or a misspelling, think about storing the words in a structure that makes them easy to search and access.
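The two file formats described above can be read with a loop over std::getline. The following is only a sketch under stated assumptions: the helper names loadValidWords and loadCorrections are hypothetical (the assignment does not name any helpers), the containers are one reasonable choice among several, and the helpers take a stream rather than a filename so they can be tried without any files on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper: reads one valid word per line into a set.
// Returns how many new words were added.
std::size_t loadValidWords(std::istream& in, std::unordered_set<std::string>& valid) {
    std::size_t added = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate Windows line endings
        if (line.empty()) continue;                                 // skip blank lines
        if (valid.insert(line).second) ++added;
    }
    return added;
}

// Hypothetical helper: reads "misspelling<TAB>correction" lines into a map.
// Everything after the first tab is the correction, so it may contain spaces.
std::size_t loadCorrections(std::istream& in,
                            std::unordered_map<std::string, std::string>& fixes) {
    std::size_t added = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        const std::size_t tab = line.find('\t');
        if (tab == std::string::npos || tab == 0) continue;  // skip malformed lines
        const std::string key = line.substr(0, tab);
        assert(!key.empty());                                // guaranteed because tab > 0
        fixes[key] = line.substr(tab + 1);
        ++added;
    }
    return added;
}
```

Looping on the result of std::getline, rather than testing eof() first, avoids processing a phantom empty line at the end of the file. A set gives fast membership tests for valid words, and a map from misspelling to correction gives the replacement in one lookup.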
SpellChecker should also include the following public methods:
● bool readValidWords(string filename): this method should read in a file in exactly the same way as detailed in the description of the constructor. This file will have the format specified for correctly spelled words. This method should return a boolean of whether or not the file was successfully read in. This method should add the words from the file to the list of words already contained in the object.
● bool readCorrectedWords(string filename): this method should read in a file in exactly the same way as detailed in the description of the constructor. The file will have the format specified for the wrongly spelled words and their corrected spellings. This method should return a boolean of whether or not the file was successfully read in. This method should add the words from the file to the list of words already contained in the object.
● Setters and getters for the markers to be used for unknown words.
○ The setters return true if the new marker has been accepted:
■ bool setStartMarker(char begin)
■ bool setEndMarker(char end)
○ char getStartMarker()
○ char getEndMarker()
● string repair(string sentence): repair will take in a string of multiple words, strip out all the punctuation, ignore the case and return the sentence with all misspellings replaced or marked. For example, here is what the following calls would return:
If you cannot find a word in the list of valid words or in the list of misspelled words (for instance, if the word is misspelled beyond recognition), you should just return the misspelled word with the start_marker in front and the end_marker at the end. For example, if start_marker and end_marker are both '~', the call:
● (Challenge Problem) repairFile(string input_filename, string output_filename): repairFile will process all the lines in the input file and write the repaired lines to the output file. The resulting file would still have all of the punctuation, but with individual words corrected. One way to break this down into simpler tasks would be to first check each stripped word for spelling. The second task would be to put the corrected word back into the sentence with all its punctuation.
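The two-step breakdown suggested for the challenge problem can be sketched as follows. This is an illustration under stated assumptions, not the required design: repairWord and repairLine are hypothetical free functions, a "word" is assumed to be a run of letters and digits, valid words keep their original casing, and the word lists are assumed to be stored in lowercase.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <unordered_set>

using WordSet = std::unordered_set<std::string>;
using FixMap  = std::unordered_map<std::string, std::string>;

// Step 1: check one stripped word. Lookup ignores case.
std::string repairWord(const std::string& word, const WordSet& valid,
                       const FixMap& fixes, char startMarker, char endMarker) {
    assert(!word.empty());
    std::string lower;
    for (unsigned char ch : word) lower += static_cast<char>(std::tolower(ch));
    if (valid.count(lower)) return word;            // correctly spelled: keep as typed
    const auto it = fixes.find(lower);
    if (it != fixes.end()) return it->second;       // known misspelling: use the correction
    return startMarker + word + endMarker;          // unknown: wrap in the markers
}

// Step 2: walk the line, repairing each word and copying every other
// character (spaces, punctuation) through unchanged.
std::string repairLine(const std::string& line, const WordSet& valid,
                       const FixMap& fixes, char startMarker, char endMarker) {
    std::string out;
    std::string word;
    for (unsigned char ch : line) {
        if (std::isalnum(ch)) {
            word += static_cast<char>(ch);
            continue;
        }
        if (!word.empty()) {
            out += repairWord(word, valid, fixes, startMarker, endMarker);
            word.clear();
        }
        out += static_cast<char>(ch);
    }
    if (!word.empty()) out += repairWord(word, valid, fixes, startMarker, endMarker);
    return out;
}
```

A repairFile along these lines would read the input with std::getline, call repairLine on each line, and write the result to the output stream. Note that treating only letters and digits as word characters splits contractions such as "don't" at the apostrophe; whether that is acceptable depends on the word lists in use.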

Part II

In this part of the assignment, you are to create a class, WordCounts. You will define some class data members, member methods and helper functions. The class methods will be used to keep a running count of the number of times each word is being used. You can assume that there will never be more than 10,000 unique words being counted. Your class will provide the following public methods to support counting word usage:
● void tallyWords(string sentence): This function will take in a string of multiple words, remove the punctuation, and increment the counts for all words in the string. If a word is not already in the list, add it to the list. This function is used to keep a running count of each unique word processed; that means multiple calls to the function should update the counts of the words, not replace them. If we call the function three times:
The count for the words "the" and "fox" should be 2, the count for the words "brown", "red", "blue", "cat", and "teh" should be 1.
● int getTally(string word): return the current count of the given word. If the word is not found in the current list of words, return 0.
● void resetTally(): reset all word counts to zero.
● int mostTimes(string words[], int counts[], int n): find the n most common words in the text that has been counted and return those words and counts via the arrays given as parameters. Assume the arrays are large enough to hold the number of elements requested.
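One possible shape for the four methods above is sketched here with a hash map instead of parallel arrays. Several details are assumptions rather than requirements: the class name WordTally is hypothetical, words are lowercased before counting, ties in mostTimes are broken alphabetically, and the return value of mostTimes (which the assignment leaves undefined) is taken to be the number of entries written.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class WordTally {  // hypothetical name, to avoid confusion with the required class
public:
    // Splits on whitespace, drops punctuation, lowercases, and bumps each count.
    void tallyWords(const std::string& sentence) {
        std::istringstream in(sentence);
        std::string token;
        while (in >> token) {
            std::string word;
            for (unsigned char ch : token) {
                if (!std::ispunct(ch)) word += static_cast<char>(std::tolower(ch));
            }
            if (!word.empty()) ++counts_[word];
        }
    }

    // Returns 0 for words that were never counted.
    int getTally(const std::string& word) const {
        const auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }

    // Keeps the known words but zeroes every count.
    void resetTally() {
        for (auto& entry : counts_) entry.second = 0;
    }

    // Fills the arrays with up to n most frequent words, highest count first.
    // Assumed return value: how many entries were actually written.
    int mostTimes(std::string words[], int counts[], int n) const {
        assert(n >= 0);
        typedef std::pair<std::string, int> Entry;
        std::vector<Entry> ranked(counts_.begin(), counts_.end());
        std::sort(ranked.begin(), ranked.end(), [](const Entry& a, const Entry& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;  // alphabetical tie-break (an assumption)
        });
        const int filled = std::min(n, static_cast<int>(ranked.size()));
        for (int i = 0; i < filled; ++i) {
            words[i] = ranked[i].first;
            counts[i] = ranked[i].second;
        }
        return filled;
    }

private:
    std::unordered_map<std::string, int> counts_;
};
```

Sorting a copy keeps mostTimes from disturbing the stored counts, at the cost of one temporary vector per call. With three made-up sentences such as "The brown fox.", "the red fox" and "teh blue cat", this sketch reports 2 for "the" and "fox" and 1 for each of the other words.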
main.cpp 

int main()
{
    SpellChecker sp("English", "words_alpha.txt", "incorr_words.txt");
    string str = sp.repair("hello world");
    cout << str;
    return 0;
}

SpellChecker.cpp 

#ifndef SPELLCHECKER_CPP_INCLUDED
#define SPELLCHECKER_CPP_INCLUDED
#include <iostream>
#include <string>
#include <sstream>
#include <math.h>
#include "SpellChecker.h"
using namespace std;

SpellChecker::SpellChecker(){
}

SpellChecker::SpellChecker(string listWords){
    wordList=listWords;
}

SpellChecker(string lang,fileCorrSpellIn,fileIncorrSpellIn){
    language=lang;
    ifstream fileCorrSpell;
    fileCorrSpell.open(fileCorrSpellIn);
    ifstream fileIncorrSpell;
    fileIncorrSpell.open(fileIncorrSpellIn);

SpellChecker::~SpellChecker(){
}

bool SpellChecker::readValidWords(string fileCorrSpell){
    if (fileCorrSpell.is_open()){
        while (!fileCorrSpell.eof()){
            getline(fileCorrSpell,line);
            wordList.add(line);
        }
        return true;
    }
}

bool SpellChecker::readCorrectedWords(string fileIncorrSpell){
    if (fileIncorrSpell.is_open()){
        while (!fileIncorrSpell.eof()){
            getline(fileIncorrSpell,line);
            tempString=line;
            count=0;
            for(int x=0;x<tempString.length();x++){
                count++;
                if(tempString[x]=='\t'){
                    tempAdd=tempString.substr(0,count-1);
                    wordList.add(tempAdd);
                    tempAdd=tempString.substr(count+1,tempString.length());
                    wordList.add(tempAdd);
                }
            }
        }
        return true;
    }
    fileCorrSpell.close();
    fileIncorrSpell.close();
}
}

SpellChecker::~SpellChecker(){

char SpellChecker::getStartMarker(){
    return start_marker;
}

bool SpellChecker::setStartMarker(char begin){
    start_marker=begin;
}

char SpellChecker::getEndMarker(){
    return end_marker;
}

bool SpellChecker::setEndMarker(char end){
    end_marker=end;
}

string repair(string sentence){
    wordCount=0;
    count=0;
    returnString=sentence;
    for(int i=0;i<sentence.length();i++){
        if(ispunct(sentence[i])){
            returnString[i]=' ';
        }
    }
    for(int y=0;y<returnString.length()){
        count++;
        if(returnString[y]==' '){
            tempString=substr(wordCount,count-1);
            wordCount=count+1;
            if(readValidWords(tempString)){
                newString=newString+tempString;
            }
            else{
                //ADD CORRECTED WORD TO STRING
            }
        }
    }
    return returnString;
}
}
#endif // SPELLCHECKER_CPP_INCLUDED

SpellChecker.h 

#ifndef SPELLCHECKER_H_
#define SPELLCHECKER_H_
#include <sstream>
#include <math.h>
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
using namespace std;

class SpellChecker{
public:
    SpellChecker();
    SpellChecker(string);
    SpellChecker(string, string, string);
    ~SpellChecker();
    string language;
    bool readValidWords(string filename);
    bool readCorrectedWords(string filename);
    char getStartMarker();
    bool setStartMarker(char begin);
    char getEndMarker();
    bool setEndMarker(char end);
    string repair(string sentence);
private:
    char start_marker;
    char end_marker;
    unordered_map<string, string> incorrWords;
    unordered_set<string> corrWords;
};
#endif /* SPELLCHECKER_H_ */

WordCounts.cpp 

#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include "WordCounts.h"
using namespace std;

WordCounts::WordCounts()
{
    int countedWords = 0;
}

void WordCounts::tallyWords(string sentence)
{
    istringstream iss(sentence);
    string word = "";
    string ns = "";
    while(iss >> word)
    {
        string nw = "";
        for(int a=0; a< word.length(); a++)
        {
            if(word[a] >= 'A' && word[a] <= 'Z')
            {
                nw += tolower(word[a]);
            }
            else if (word[a] >= 'a' && word[a] <= 'z')
            {
                nw += word[a];
            }
        }
        bool uw = true;
        for(int b = 0; b<countedWords + 1; b++)
        {
            if(words[b] == nw)
            {
                uw = false;
                counts[b]++;
            }
        }
        if(uw)
        {
            words[countedWords] = nw;
            counts[countedWords]= 1;
            countedWords++;
        }
    }
}

int WordCounts::getTally(string word)
{
    for(int a = 0; a < countedWords; a++)
    {
        if(word == words[a])
        {
            return counts[a];
        }
    }
}

void WordCounts::resetTally()
{
    for(int a = 0; a < countedWords; a++)
    {
        counts[a] = 0;
    }
}

int WordCounts::mostTimes(string commonWords[], int wordCounts[], int n)
{
    string words2[countedWords];
    int count2[countedWords];
    for(int a= 0; a < countedWords; a++)
    {
        words2[a] = words[a];
        count2[a] = counts[a];
    }
    for(int b=0; b < (countedWords - 1); b++)
    {
        for(int c = 0; c < countedWords; c++)
        {
            if(count2[c] < count2[c - 1])
            {
                int t = count2[c];
                string t2 = words2[b];
                count2[c] = count2[c-1];
                words2[c] = words2[c-1];
                count2[c-1] = t;
                words2[c-1] = t2;
            }
        }
    }
    for (int a = 0; a < n; a++)
    {
        commonWords[a] = words2[a];
        wordCounts[a] = count2[a];
    }
    return 0;
}

WordCounts.h 

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
using namespace std;

class WordCounts
{
public:
    void tallyWords(string sentence);
    void countWords(string sentence);
    int getTally(string word);
    void resetTally();
    int mostTimes(string words[], int counts[], int n);
    void printWords();
    WordCounts();
private:
    int countedWords;
    string words[10000];
    int counts[10000];
};

Solution 

main.cpp 

#include <iostream>
#include <string>
#include "SpellChecker.h"
#include "WordCounts.h"
using namespace std;

int main()
{
    // Build a spell checker, then load the valid words and the corrections.
    SpellChecker sp("English");
    sp.readValidWords("Testing files/VALID_WORDS_3000.txt");
    sp.readCorrectedWords("Testing files/MISSPELLED.txt");
    string str = sp.repair("hello tork!");
    cout << str << endl;

    // Tally three sentences and print the running counts.
    WordCounts wc;
    wc.tallyWords("the brown fox");
    wc.tallyWords("the red fox");
    wc.tallyWords("the red cat");
    cout << "the:" << wc.getTally("the") << endl;
    cout << "red:" << wc.getTally("red") << endl;
    cout << "fox:" << wc.getTally("fox") << endl;
    cout << "cat:" << wc.getTally("cat") << endl;
    return 0;
}

SpellChecker.cpp 

#include "SpellChecker.h"

// All constructors default both markers to '~'.
SpellChecker::SpellChecker()
{
    start_marker = '~';
    end_marker = '~';
}

SpellChecker::SpellChecker(string lang)
{
    language = lang;
    start_marker = '~';
    end_marker = '~';
}

SpellChecker::SpellChecker(string lang, string fileCorrSpell, string fileIncorrSpell)
{
    language = lang;
    start_marker = '~';
    end_marker = '~';
    readValidWords(fileCorrSpell);
    readCorrectedWords(fileIncorrSpell);
}

// Adds one valid word per line to corrWords.
// Returns false if the file could not be opened.
bool SpellChecker::readValidWords(string fileCorrSpell){
    ifstream inFile(fileCorrSpell);
    string line;
    if (!inFile.is_open())
        return false;
    // Looping on getline avoids inserting a phantom empty line at end of file.
    while (getline(inFile, line))
    {
        if (!line.empty())
            corrWords.insert(line);
    }
    inFile.close();
    return true;
}

// Adds "misspelling<TAB>correction" pairs to incorrWords.
// Returns false if the file could not be opened.
bool SpellChecker::readCorrectedWords(string fileIncorrSpell){
    ifstream inFile(fileIncorrSpell);
    string line, key, value;
    if (!inFile.is_open())
        return false;
    while (getline(inFile, line))
    {
        size_t c = line.find('\t');
        if (c == string::npos)
            continue;  // skip lines without a tab separator
        key = line.substr(0, c);
        value = line.substr(c + 1);  // the correction may contain spaces
        incorrWords[key] = value;
    }
    inFile.close();
    return true;
}

char SpellChecker::getStartMarker()
{
    return start_marker;
}

bool SpellChecker::setStartMarker(char begin)
{
    start_marker = begin;
    return true;
}

char SpellChecker::getEndMarker()
{
    return end_marker;
}

bool SpellChecker::setEndMarker(char end)
{
    end_marker = end;
    return true;
}

// Helper: returns a lowercase copy of s, used for case-insensitive lookups.
string lowercase(string s)
{
    string lower = "";
    for (size_t i = 0; i < s.length(); i++)
        lower += static_cast<char>(tolower(static_cast<unsigned char>(s[i])));
    return lower;
}

// Drops punctuation, then replaces or marks each word. Whitespace is kept.
string SpellChecker::repair(string sentence){
    stringstream repaired;
    string word = "";
    for (size_t i = 0; i < sentence.length(); i++)
    {
        unsigned char ch = static_cast<unsigned char>(sentence[i]);
        if (ispunct(ch))
        {  // do nothing: punctuation is stripped
        }
        else if (isspace(ch))
        {
            // A space ends the current word, so look it up and emit it.
            if (word.length() > 0)
            {
                string wordLower = lowercase(word);
                if (corrWords.find(wordLower) != corrWords.end())
                    repaired << word;
                else if (incorrWords.find(wordLower) != incorrWords.end())
                    repaired << incorrWords[wordLower];
                else
                    repaired << start_marker << word << end_marker;
                word = "";
            }
            repaired << sentence[i];
        }
        else
        {
            word += sentence[i];
        }
    }
    // Emit the last word if the sentence did not end in whitespace.
    if (word.length() > 0)
    {
        string wordLower = lowercase(word);
        if (corrWords.find(wordLower) != corrWords.end())
            repaired << word;
        else if (incorrWords.find(wordLower) != incorrWords.end())
            repaired << incorrWords[wordLower];
        else
            repaired << start_marker << word << end_marker;
        word = "";
    }
    return repaired.str();
}

SpellChecker.h 

#ifndef SPELLCHECKER_H_
#define SPELLCHECKER_H_
#include <fstream>
#include <string>
#include <sstream>
#include <cctype>
#include <unordered_map>
#include <unordered_set>
using namespace std;

class SpellChecker{
public:
    SpellChecker();
    SpellChecker(string);
    SpellChecker(string, string, string);
    string language;
    bool readValidWords(string filename);
    bool readCorrectedWords(string filename);
    char getStartMarker();
    bool setStartMarker(char begin);
    char getEndMarker();
    bool setEndMarker(char end);
    string repair(string sentence);
private:
    char start_marker;
    char end_marker;
    unordered_map<string, string> incorrWords;
    unordered_set<string> corrWords;
};
#endif /* SPELLCHECKER_H_ */

WordCounts.cpp 

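#include <cctype>
#include "WordCounts.h"

WordCounts::WordCounts()
{
    // Assign the member; declaring a new local here would leave it uninitialized.
    countedWords = 0;
}

void WordCounts::tallyWords(string sentence)
{
    istringstream iss(sentence);
    string word = "";
    while (iss >> word)
    {
        // Keep letters only, lowercased, so "Fox," and "fox" count as one word.
        string nw = "";
        for (size_t a = 0; a < word.length(); a++)
        {
            if (word[a] >= 'A' && word[a] <= 'Z')
            {
                nw += static_cast<char>(tolower(word[a]));
            }
            else if (word[a] >= 'a' && word[a] <= 'z')
            {
                nw += word[a];
            }
        }
        if (nw.empty())
        {
            continue;  // the token contained no letters
        }
        bool uw = true;  // stays true if the word has not been seen before
        for (int b = 0; b < countedWords; b++)
        {
            if (words[b] == nw)
            {
                counts[b]++;
                uw = false;
                break;
            }
        }
        if (uw)
        {
            words[countedWords] = nw;
            counts[countedWords] = 1;
            countedWords++;
        }
    }
}

int WordCounts::getTally(string word)
{
    for (int a = 0; a < countedWords; a++)
    {
        if (word == words[a])
        {
            return counts[a];
        }
    }
    return 0;  // word has not been counted
}

void WordCounts::resetTally()
{
    for (int a = 0; a < countedWords; a++)
    {
        counts[a] = 0;
    }
}

int WordCounts::mostTimes(string commonWords[], int wordCounts[], int n)
{
    // Insertion sort, highest count first. The member arrays are reordered in
    // place; words and counts move together, so getTally keeps working.
    for (int a = 1; a < countedWords; a++)
    {
        int tc = counts[a];
        string tw = words[a];
        int b = a - 1;
        while (b >= 0 && counts[b] < tc)
        {
            counts[b + 1] = counts[b];
            words[b + 1] = words[b];
            b = b - 1;
        }
        counts[b + 1] = tc;
        words[b + 1] = tw;
    }
    // Never copy more entries than have actually been counted.
    int filled = (n < countedWords) ? n : countedWords;
    if (filled < 0)
    {
        filled = 0;
    }
    for (int a = 0; a < filled; a++)
    {
        commonWords[a] = words[a];
        wordCounts[a] = counts[a];
    }
    // The assignment does not define the return value; this returns the
    // number of entries written to the output arrays.
    return filled;
}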

WordCounts.h 

#ifndef WORDCOUNTS_H_
#define WORDCOUNTS_H_
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
using namespace std;

class WordCounts
{
public:
    void tallyWords(string sentence);
    int getTally(string word);
    void resetTally();
    int mostTimes(string words[], int counts[], int n);
    WordCounts();
private:
    int countedWords;
    string words[10000];
    int counts[10000];
};
#endif