
CPTR-141 HW # 20 due Friday

Definition as handed out: Write a program that prompts the user to input the name of a text file and then outputs the number of words in the file. You can consider a “word” to be any text that is surrounded by whitespace (for example, a space, tab, or newline) or borders the beginning or end of the file.

For testing you can use the data.txt and hw19.txt files that are on the class web page.

Before jumping into writing code, write a pseudo-code outline of the logic for this program. Then do the coding.

Statement made during class: Turn in your pseudo-code outline in addition to the hardcopy of your program. Upload the source file to D2L.

Additional comments, information, and modification of the problem statement

The initial problem statement is interesting but leaves some unanswered questions, such as “what is a word?” Does a single character surrounded by spaces count as a word? How about a string of punctuation characters with a blank before or after? How about a number, which could be a single digit or multiple digits?

If the simplistic way is taken, meaning that words are identified only by finding white space (i.e. when isspace returns true), you end up counting isolated punctuation characters or sequences as words. I think you can do better. First a bit more information.

The isspace function returns true for these characters: space (i.e. blank), newline, carriage return, tab, vertical tab, or form feed. It returns false for punctuation characters.

Both the isalpha and isdigit functions return false for punctuation characters. Thus it is possible to distinguish between punctuation and alphanumeric characters.
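To see these classification functions side by side, here is a minimal sketch. The helper name charCategory is hypothetical, and ispunct is a standard <cctype> function that the notes above do not mention. It is only meant to illustrate how each character falls into one category.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the assignment): names the category
// of a single character using the standard <cctype> functions.
std::string charCategory(char ch) {
    // Convert to unsigned char first; passing a negative char value
    // to the <cctype> functions is undefined behavior.
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isspace(c)) return "whitespace";   // blank, newline, tab, ...
    if (std::isalpha(c)) return "alpha";        // letters
    if (std::isdigit(c)) return "digit";        // 0 through 9
    if (std::ispunct(c)) return "punctuation";  // . , ! and similar
    return "other";                             // e.g. control characters
}
```

Calling charCategory on each character read from the file is one way to check your understanding before writing the counting logic.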

THEREFORE by definition for this assignment:

a) a single punctuation mark with one or more spaces before or after it is not a word.
b) a string of punctuation marks with one or more spaces before or after it is not a word.
c) a single numeric character or sequence of numeric-only characters with one or more spaces before or after it is not a word.
d) a sequence of mixed alpha and numeric characters, such as Base16, will be considered a word.
e) a sequence of mixed alpha and numeric and punctuation, such as instream.get, would be considered a word.
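One possible reading of rules a) to e) is that a whitespace-delimited chunk of text is a word exactly when it contains at least one alphabetic character. That reading is an assumption, not something the rules state outright, and the helper name isWord is hypothetical. A minimal sketch under that assumption:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: decides whether a whitespace-delimited token
// counts as a word under rules a) to e).
// ASSUMPTION: a token is a word exactly when it contains at least one
// alphabetic character.
bool isWord(const std::string& token) {
    for (char ch : token) {
        if (std::isalpha(static_cast<unsigned char>(ch))) {
            return true;    // found a letter, so rules d) and e) apply
        }
    }
    return false;           // only digits and/or punctuation: rules a) to c)
}
```

Under this sketch Base16 and instream.get are words, while a lone period, a run of periods, and 42 are not. A token such as 3.14 (digits plus punctuation) is not covered by the rules; this sketch treats it as not a word.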

Two levels of success for this homework

The minimum, passing homework will count punctuation and numeric characters as words or parts of words. Higher-level success will implement the a to d definitions above.

Ideas for keeping track of when or if a word has been found

In programming there is the concept of a flag. A flag is a variable in which you store a value to represent some condition, or the presence or absence of some condition. Often a flag is a variable of type bool, in which case it holds true or false. For example, in the context of the current problem, assume you have a flag named WordStart of type bool and that alpha and numeric characters can be found together in a word. As you read characters from the file, you test each one for alpha or numeric, and if the current character is either, you put true in WordStart. Then you read additional characters until you find a whitespace character. When you find one, if WordStart is true you know that you just ended a word rather than passing over more whitespace or stray punctuation, so you increment the word count and put false in WordStart. If the file ends while WordStart is still true, the last word borders the end of the file and must be counted as well. The flag is how the program “remembers” what it saw before the character it is currently examining.