Characters and Strings

57:017, Computers in Introduction Engineering—Quick Review Fundamentals of Strings and Characters of Character Strings Character Handling Library String Conversion Functions Standard Input/Output Library Functions String Manipulation Functions of the String Handling Library Comparison Functions of the String Handling Library Search Functions of the String Handling Library Other Functions of the String Handling Library

Fundamentals of Strings Introduction and Characters z Introduce some functions z Characters z Easy string and character processing z Character constant z Programs can process characters, strings, lines of z represented as a character in single quotes text, and blocks of memory z 'z' represents the integer value of z z stored in ASCII format ((yone byte/character ) z These techniques used to make: z Strings z Word processors z Series of characters treated as a single unit z Can include letters, digits and special characters (*, /, $) z Page layout software z (string constant) - written in double quotes z Typesetting programs z "Hello" z etc. z Strings are arrays of characters z The type of a string is char * i.e. “pointer to char” z Value of string is the address of first character

The ASCII Character Set Strings and Characters z String declarations z Declare as a character array or a variable of type char * char color[] = "blue"; char *colorPtr = "blue"; z Remember that strings represented as character arrays end with '\0' z color has 5 eltlements z Inputting strings z Use scanf char word[10]; scanf("%s", word); z Copies input into word[] z Do not need & (because word is a pointer) z Remember to leave room in the array for '\0'

1 Chars as Ints Inputting Strings using scanf char word[10]; z Since characters are stored in one-byte ASCII representation, char *wptr; they can be considered as one-byte integers. … z C allows chars to be treated as ints and vice versa—e.g.: // This is OK as long as input string is less int main() { // than 10 characters in length int ivar; scanf((,“%s”, word); char cvar; ivar = ‘b’; // But this is not, since wptr does not have associated cvar = 97; // storage locations to hold the string printf (“ivar=% cvar=‘%c’\n”, ivar, cvar); scanf(“%s”, wptr); }

//This would be OK: wptr = word; ivar=98 cvar=‘a’ scanf(“%s”, wptr);

Character Handling Library Character Handling Library

Prototype Description

z Character handling library int isdigit( int c ) Returns true if c is a digit and false otherwise.

z Includes functions to perform useful tests and int isalpha( int c ) Returns true if c is a letter and false otherwise. manipulations of character data int isalnum( int c ) Returns true if c is a digit or a letter and false otherwise. int isxdigit( int c ) Returns true if c is a digit character and false otherwise. z Each function receives a character (an int) or EOF as an int islower( int c ) Returns true if c is a lowercase letter and false otherwise. argument int isupper( int c ) Returns true if c is an uppercase letter; false otherwise. int tolower( int c ) If c is an uppercase letter, tolower returns c as a lowercase letter. Otherwise, tolower z The following slide contains a table of all the returns the argument unchanged. int toupper( int c ) If c is a lowercase letter, toupper returns c as an uppercase letter. Otherwise, toupper functions in returns the argument unchanged. int isspace( int c ) Returns true if c is a white-space character—newline ('\n'), space (' '), form feed ('\f'), carriage return ('\r'), horizontal tab ('\t'), or vertical tab ('\v')—and false otherwise int iscntrl( int c ) Returns true if c is a control character and false otherwise. int ispunct( int c ) Returns true if c is a printing character other than a space, a digit, or a letter and false otherwise. int isprint( int c ) Returns true value if c is a printing character including space (' ') and false otherwise. int isgraph( int c ) Returns true if c is a printing character other than space (' ') and false otherwise.

1 /* Fig. 8.2: fig08_02.c 2 Using functions isdigit, isalpha, isalnum, and isxdigit */ 33 isxdigit( '7' ) ? "7 is a " : "7 is not a ", 3 #include 34 "hexadecimal digit", 4 #include 5 35 isxdigit( '$' ) ? "$ is a " : "$ is not a ", 6 int main() 36 "hexadecimal digit", 7 { 37 isxdigit( 'f' ) ? "f is a " : "f is not a ", 8 printf( "%s\n%s%s\n%s%s\n\n", "According to isdigit: ", 38 "hexadecimal digit" ); 9 isdigit( '8' ) ? "8 is a " : "8 is not a ", "digit", 39 return 0; 10 isdigit( '#' ) ? "# is a " : 40 } 11 "# is not a ", "digit" ); 12 printf( "%s\n%s%s\n%s%s\n%s%s\n%s%s\n\n", According to isdigit: 8 is a digit 13 "According to isalpha:", # is not a digit 14 isalpha( 'A' ) ? "A is a " : "A is not a ", "letter", 15 isalpha( 'b' ) ? "b is a " : "b is not a ", "letter", According to isalpha: 16 isalpp(ha( '&' ) ? "& is a " : "& is not a " , "letter" , A is a letter 17 isalpha( '4' ) ? "4 is a " : b is a letter 18 "4 is not a ", "letter" ); & is not a letter 19 printf( "%s\n%s%s\n%s%s\n%s%s\n\n", 4 is not a letter 20 "According to isalnum:", 21 isalnum( 'A' ) ? "A is a " : "A is not a ", According to isalnum: 22 "digit or a letter", A is a digit or a letter 23 isalnum( '8' ) ? "8 is a " : "8 is not a ", 8 is a digit or a letter # is not a digit or a letter 24 "digit or a letter", 25 isalnum( '#' ) ? "# is a " : "# is not a ", According to isxdigit: 26 "digit or a letter" ); F is a hexadecimal digit 27 printf( "%s\n%s%s\n%s%s\n%s%s\n%s%s\n%s%s\n", is not a hexadecimal digit 28 "According to isxdigit:", 7 is a hexadecimal digit 29 isxdigit( 'F' ) ? "F is a " : "F is not a ", $ is not a hexadecimal digit 30 "hexadecimal digit", f is a hexadecimal digit 31 isxdigit( 'J' ) ? "J is a " : "J is not a ", 32 "hexadecimal digit",

2 1 /* Fig. 8.6: fig08_06.c String Conversion Functions 2 Using atof */ 3 #include 4 #include z Conversion functions 5 z In (general utilities library) 6 int main() 7 { z Convert strings of digits to integer and floating-point values 8 double d; 9

10 d = atof( "99.0" ); Prototype Description 11 printf( "%s%.3f\n%s%.3f\n", double atof( const char *nPtr ) Converts the string nPtr to double. 12 "The string \"99.0\" converted to double is ", d, 13 "The converted value divided by 2 is ", int atoi( const char *nPtr ) Converts the string to . nPtr int 14 d / 2.0 ); long atol( const char *nPtr ) Converts the string nPtr to long int. 15 return 0; double strtod( const char *nPtr, Converts the string nPtr to double. 16 } char **endPtr ) long strtol( const char *nPtr, Converts the string nPtr to long. The string "99.0" converted to double is 99.000 char **endPtr, int base ) The converted value divided by 2 is 49.500 unsigned long strtoul( const char Converts the string nPtr to unsigned *nPtr, char **endPtr, int base ) long.

1 /* Fig. 8.13: fig08_13.c 2 Using gets and putchar */ Standard Input/Output 3 #include 4 void reverse( const char * const ); 5 int main() Library Functions 6 { 7 char sentence[ 80 ]; 8 z Functions in 10 printf( "Enter a line of text:\n" ); z Used to manipulate character and string data 11 gets( sentence ); 12 13 printf( "\nThe line printed backwards is:\n" ); Function prototype Function description 14 reverse( sentence ); 15 int getchar( void ); Inputs the next character from the standard input and 16 return 0;; returns it as an integer. 17 } 18 char *gets( char *s ); Inputs characters from the standard input into the array 19 void reverse( const char * const sPtr ) s until a newline or end-of-file character is 20 { encountered. A terminating is appended 21 if ( sPtr[ 0 ] == '\0' ) reverse calls itself using substrings of to the array. 22 return; the original string. When it reaches the 23 else { '\0' character it prints using putchar int putchar( int c ); Prints the character stored in c. 24 reverse( &sPtr[ 1 ] ); int puts( const char *s ); Prints the string s followed by a newline character. 25 putchar( sPtr[ 0 ] ); 26 } int sprintf( char *s, Equivalent to printf, except the output is stored in 27 } const char *format, ... ); the array s instead of printing it on the screen. Enter a line of text: Characters and Strings int sscanf( char *s, const Equivalent to scanf, except the input is read from the char *format, ... ); array s instead of reading it from the keyboard. The line printed backwards is: sgnirtS dna sretcarahC

A Non-recursive Version of reverse() String Manipulation Functions of the String Handling Library

In the previous example, the function reverse() was written z String handling library has functions to recursively. Here is a non-recursive version of the function: z Manipulate string data z Search strings void reverse( const char * const sPtr ) { z Tokenize strings int i=0; z Determine string length

/* fin d en d o f the s tr ing. Cou ld a lso use s tr len () */ while (sPtr[i] != '\0') i++; Function prototype Function description char *strcpy( char *s1, Copies string s2 into array s1. The value of s1 is /* print the string from end to beginnning */ const char *s2 ) returned. for (; i>=0; i--) char *strncpy( char *s1, Copies at most n characters of string s2 into array s1. putchar(sPtr[i]); const char *s2, size_t n ) The value of s1 is returned. putchar('\n'); char *strcat( char *s1, Appends string s2 to array s1. The first character of const char *s2 ) s2 overwrites the terminating null character of s1. return; The value of s1 is returned. } // end reverse function char *strncat( char *s1, Appends at most n characters of string s2 to array s1. const char *s2, size_t n ) The first character of s2 overwrites the terminating null character of s1. The value of s1 is returned.

3 Comparing Character Strings Comparing Character Strings Consider the following example: z To determine if two character strings are char array1[ ] = “abcdef”; equal, they must be compared character-by char array2[ ] = “abcdef”; character if (array1 == array2) z C has libraryyp functions to perform character printf(“the arrays compare as equal\n”); string comparisons else printf(“the arrays do not compare as equal\n); The arrays do not compare as equal

Why?? Because the values of the pointers are being compared, not the contents of the strings.

Comparison Functions of the Search Functions of the String Handling Library String Handling Library

z Comparing strings Function prototype Function description z Computer compares numeric ASCII codes of characters in string char *strchr( const char *s, Locates the first occurrence of character c in string s. If c is found, a pointer to c in int c ); z Appendix D has a list of character codes s is returned. Otherwise, a NULL pointer is returned. size_t strcspn( const char Determines and returns the length of the initial segment of string s1 consisting of *s1, const char *s2 ); characters not contained in string s2. int strcmp( const char *s1, const char *s2 ); size_ t strspn( const char Determines and returns the length of the initial segment of string s1 consisting only *s1, const char *s2 ); of characters contained in string s2. z Compares string to s1 s2 char *strpbrk( const char Locates the first occurrence in string s1 of any character in string s2. If a character

z Returns a negative number if s1 < s2, zero if s1 == s2 or a *s1, const char *s2 ); from string s2 is found, a pointer to the character in string s1 is returned. Other- wise, a NULL pointer is returned. positive number if s1 > s2 char *strrchr( const char *s, Locates the last occurrence of c in string s. If c is found, a pointer to c in string s is int c ); returned. Otherwise, a NULL pointer is returned. char *strstr( const char *s1, Locates the first occurrence in string s1 of string s2. If the string is found, a pointer int strncmp( const char *s1, const char *s2, size_t n ); const char *s2 ); to the string in s1 is returned. Otherwise, a NULL pointer is returned. z Compares up to n characters of string s1 to s2 char *strtok( char *s1, const A sequence of calls to strtok breaks string s1 into “tokens”—logical pieces such char *s2 ); as words in a line of text—separated by characters contained in string s2. The first z Returns values as above call contains s1 as the first argument, and subsequent calls to continue tokenizing the same string contain NULL as the first argument. A pointer to the current token is returned by each call. If there are no more tokens when the function is called, NULL is returned.

1 /* Fig. 8.29: fig08_29.c 2 Using strtok */ 1 /* Fig. 8.27: fig08_27.c 3 #include 4 #include 2 Using strspn */ 5 3 #include 6 int main() 4 #include 7 { 8 char string[] = "This is a sentence with 7 tokens"; 5 9 char *tokenPtr; Initialize variables 6 int main() 10 7 { 11 printf( "%s\n%s\n\n%s\n", 12 "The string to be tokenized is:", string, 8 const char *string1 = "The value is 3.14159"; Function calls Initialize variables 13 "The tokens are:" ); 9 const char *string2 = "aehi lsTuv"; 14 10 15 tokenPtr = strtok( string, " " ); 11 printf( "%s%s\n%s%s\n\n%s\n%s%u\n", Function calls 16 17 whilhile (tk( token Pt!Ptr != NULL){NULL ) { 12 "string1 = ", string1, "string2 = ", string2, And Print 18 printf( "%s\n", tokenPtr ); Print 13 "The length of the initial segment of string1", 19 tokenPtr = strtok( NULL, " " ); 14 "containing only characters from string2 = ", 20 } 21 15 strspn( string1, string2 ) ); 22 return 0; 16 return 0; 23 } 17 } The string to be tokenized is: Program Output This is a sentence with 7 tokens Program Output string1 = The value is 3.14159 string2 = aehi lsTuv The tokens are: This The length of the initial segment of string1 is containing only characters from string2 = 13 a sentence with 7 tokens

4 Other Functions of the 1 /* Fig. 8.37: fig08_37.c

String Handling Library 2 Using strerror */

3 #include z char *strerror( int errornum ); 4 #include

z Creates a system-dependent error message based on 5

errornum 6 int main()

z Returns a pointer to the string 7 { z size_t strlen( const char *s ); 8 printf( "%s\n", strerror( 2 ) ); z Returns the number of characters (before NULL) in string s 9 return 0; 10 }

No such file or directory

An Example Problem Statement z Take as input a string of characters (command line z Character and string processing example argument), and search for words as substrings z Use standard char/string library functions (fgets(), z Example: strlen(), strcpy, etc) ./a.out thisisnotatest hi z Top-down design, with divide-and-conquer is is AND step-wise refinement no at z Use a new “Unix” specific utility (“pipe”) his not ate this test

Understanding the Problem and Proposing an Refinement Algorithm/Solution z Take string input (argv[]) z Generate all substrings Input string example: test z Compare all generated substrings to a All possible (contiguous) sub-strings: “dic tionary ” to de term ine if su bs tr ing is a “real” word t, e, s, t te, es, st tes, est test

5 Refinement Refinement

For each “len”gth 1 to n (n == length of input string) z For len = 1 to strlen(input) For each character i in input string: for i = 0 to strlen(input) get each substring at position i, length “len” get_chars(input,temp,i,len)

Refinement -- Algorithm start dstart

source dest get_chars(source, destination, position, length=1 t e s t t length); start dstart z Function to pass in the source string, the source dest destination string, the position in the source length=2 t e s t e string, and the length of characters to acquire start dstart source dest

length=2 t e s t e s

#include get_chars() refinement void get_chars(char *source, char *dest, int start, int len); void get_chars(char *source, char *dest, void get_chars(char *source, char *dest, int start, int length) Int main(int argc, char* argv[]) { int start, int len) { { char temp[30]; int i = 0; int dstart = 0; int i = 0; int j = 0; destination_start = 0; for(i=start,dstart=0;dstart

6 How do we compare these Output substrings to a dictionary?

./wordFind.exe test %cat file temp = t z Directs the contents of “file” to the “standard input” -- temp = e the contents are printed to the screen temp = s temp = t z This is a mechanism that we can use to send the temp = te content s of a l arge di c tionary fil e to our program temp = es z Unix pipe -- “|” temp = st %cat file | ./testProgram.exe temp = tes z Instead of sending the contents of the file to the temp = est screen, we can “pipe” them to our program temp = test

Simple program to show how a Refinement “pipe” may work z Take string input (argv[]) #include z Generate all substrings z Compare all generated substrings to a “dictionary” to determine if substring is a “real” word int main(int argc, char* argv[]) { Becomes: z Take string input (argv[]) z Read all words of dictionary into an array char word[5]; z Generate all substrings z Compare each substring to the words in array z If equal, then print “word” that has been “found” scanf("%s",word);

printf("word = %s\n",word); }

Expand the program to handle Output a large number of words:

#define NUM_WORDS 45427 % more file #define MAX 30 // max word length test #include % cat file | ./testProgram.exe int main(int argc, char* argv[]) { char words[NUM_ WORDS][MAX]; word = test int i = 0; while(i

7 Even simpler method for Output input/output

%cat linux.words | ./readWords.exe word = Aarhus z cat is most useful when directly taking the output 0 from one program and sending it as the input to word = Aaron 1 another word = Ababa z If file already exists,use the Input Redirection Shell 2 word = aback CdCommand, < 3 z % wordFind.exe argvstuff < dictionary.txt word = abaft 4 z Can do the same for output (Output Redirection word = abandon Shell Command 5 word = abandoned z % wordFind.exe argvstuff output.txt 6 word = abandoning 7

for(length=1;length<=strlen(argv[1]);length++){ /* read in a string from the command line for(i=0;i<=strlen(argv[1])-length;i++) { and try all permutations against a dictionary */ get_chars(argv[1],temp,i,length); //printf("temp = %s\n",temp); #define NUM_WORDS 45427 #define MAX 30 // max word length for(j=0;j //printf("%s %s\n",temp,words[j]); Now it is just a matter of if(strcmp(temp,words[j])==0){ void get_chars(char *source, char *dest, int start, int len); printf("%s\n",temp); } combining the two partial int main(int argc, char* argv[]) { } } solutions: char words[NUM_WORDS][MAX]; } char temp[MAX]; } int i = 0, j=0, length=0; void get_chars(char *source, char *dest, int start, int while(i

Output Future Modifications

%cat dictionary.txt | ./wordFind.exe thisisnotatest z Modify to try all possible combinations of hi substrings: is is z the = t, h, e, th, te, he, ht, et, eh, the, teh, hte, het, no eht at z Read from file (instead of pipe/stdio) his z not Keep dictionary in memory and do multiple ate words this test

8