Characters, Character Strings, and string-manipulation functions in C
see Kernighan & Ritchie – Section 1.9, Appendix B3
Characters
Printable characters (and some non-printable ones) are represented as 8-bit numeric values
  Stored in variables of type char
C programs typically use the 7-bit ASCII character code
  Compatible with the Latin-1 and UTF-8 encodings
Character   ASCII decimal   ASCII hexadecimal   ASCII binary
A           65              0x41                0100 0001
a           97              0x61                0110 0001
B           66              0x42                0100 0010
b           98              0x62                0110 0010
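Because a char is just a small number, it can be printed as a number and used in arithmetic. The following is a minimal sketch, assuming an ASCII system; the helper names show_code() and to_upper_ascii() are hypothetical and do not come from the slides.

```c
#include <stdio.h>

/* Hypothetical helper: print a character next to its decimal and hex codes. */
void show_code(char c)
{
    printf("%c  %3d  0x%02X\n", c, c, (unsigned)(unsigned char)c);
}

/* Hypothetical helper: convert an ASCII lowercase letter to uppercase.
   Assumes ASCII, where 'a' - 'A' is 32 (97 - 65 in the table above). */
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - ('a' - 'A'));
    return c;   /* anything else is returned unchanged */
}
```

Calling show_code('A') prints the letter followed by 65 and 0x41, matching the first row of the table.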
Character Strings (Text)
C does not have a “string” type!
C has arrays of variables of its data types, including characters
Arrays must have a constant, fixed length
Text is kept in character arrays
The arrays must be as big as or bigger than the text strings
  In practice, they are almost always bigger
Strings of Characters in Character Arrays
C character strings are held in memory as ASCII values, with an ASCII 0, or null, at the end
  “Null-terminated strings”
  Also called ASCIIZ
Character arrays must be big enough to include the null character as well as the printable characters
Any extra elements in the array may be filled with more nulls, with garbage, or with remnants of previous strings
Putting a String in a Character Array
Initialize an array when you declare it:
    char bfrA[10] = "abcdefg"; // 7 letters + null; last two elements unused
    char bfrB[] = "hijklm";    // array is just big enough (6 letters + null)
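The difference between the size of the array and the length of the text in it can be checked with sizeof and strlen(). A minimal sketch, reusing the two declarations above; the function name show_sizes() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: compare array capacity with the length of the text. */
void show_sizes(void)
{
    char bfrA[10] = "abcdefg";  /* 7 letters + '\0', 2 elements to spare */
    char bfrB[]   = "hijklm";   /* compiler picks 7: 6 letters + '\0'    */

    /* sizeof counts every element; strlen() stops at the first null. */
    printf("bfrA: sizeof = %zu, strlen = %zu\n", sizeof bfrA, strlen(bfrA));
    printf("bfrB: sizeof = %zu, strlen = %zu\n", sizeof bfrB, strlen(bfrB));
}
```

For bfrA this reports a size of 10 and a length of 7; for bfrB, a size of 7 and a length of 6.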
Assign characters one-by-one:
    char bfrA[10];
    bfrA[0] = 'a';
    bfrA[1] = 'b';
    bfrA[2] = 'c';
    ⁝
    bfrA[7] = '\0';
  We'll see a better way shortly
String Input
scanf() - use “%s” for the format specifier, and supply a character array
  Amount of input can be limited with the size modifier: “%20s” will get at most 20 characters, so the array needs at least 21 elements to leave room for the null
fgets() expects a character array
  fgets() also expects the array size, to limit the amount of input text
gets() also expects a character array, but don't use it – always use fgets() instead
  gets() has no way to limit the input, so it can write past the end of the array
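A typical fgets() call passes the array, its size, and the input stream. The sketch below is one way to wrap that, under the assumption that the caller wants the trailing newline removed; read_line() and strip_newline() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replace the first newline in s, if any, with a null. */
void strip_newline(char *s)
{
    char *nl = strchr(s, '\n');
    if (nl != NULL)
        *nl = '\0';
}

/* Hypothetical helper: read one line from 'in' into buf, which has room for
   'size' chars. fgets() stores at most size - 1 characters plus the null.
   Returns 1 on success, 0 at end of input or on error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    strip_newline(buf);     /* fgets() keeps the newline; drop it */
    return 1;
}
```

A call such as read_line(stdin, bfr, sizeof bfr) lets the compiler supply the array size, so the limit stays correct if the declaration changes.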
The result is an array of characters that are valid up to the terminating null
String Output
printf(), puts(), fputs() - all expect a null-terminated character string
They will keep printing characters until they see a null
  So if you give them a character array that doesn’t contain a null, they’ll keep going - off the end of the array, and until they happen to run into a 0 byte somewhere in memory (or run out of legal memory)
Example - Caesar Cipher
Also known as "rot-N" for "rotate N characters"
  rot-13 is a common one
Simply done with character arithmetic
Use Boolean variables
Repeated inputs
Rotation count as command-line input
Solution
Sometimes side-effects are good, even necessary...
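The rotation step can be sketched with character arithmetic as follows. This is a minimal sketch assuming ASCII, not the solution from the original slide; rot_char() and rot_string() are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* Rotate one letter by n places, wrapping around within its own case.
   Assumes ASCII, where 'a'..'z' and 'A'..'Z' are contiguous. */
char rot_char(char c, int n)
{
    n = ((n % 26) + 26) % 26;           /* normalize to 0..25, even if n < 0 */
    if (islower((unsigned char)c))
        return (char)('a' + (c - 'a' + n) % 26);
    if (isupper((unsigned char)c))
        return (char)('A' + (c - 'A' + n) % 26);
    return c;                           /* digits, spaces, punctuation stay */
}

/* Rotate every letter of a null-terminated string in place. */
void rot_string(char *s, int n)
{
    for (size_t i = 0; s[i] != '\0'; i++)
        s[i] = rot_char(s[i], n);
}
```

A complete program would convert argv[1] to the rotation count (for example with atoi()) and call rot_string() on each line read with fgets() until the input runs out. Rotating by 13 twice restores the original text.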
Character-String Functions in C
Functions That Work With Character Strings
Find these in
strlen(char *src)
  report the number of "meaningful" characters
  stops counting at the first null character
strcmp(char *dest, char *src)
  compare two strings, character by character
  returns 0 if they match each other
  returns a negative value if dest comes before (is "less than") src
  returns a positive value if dest comes after src
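The C standard only guarantees the sign of strcmp()'s result, not a particular value, so portable code compares the result against 0. A minimal sketch; compare_sign() and same_text() are hypothetical helper names.

```c
#include <string.h>

/* Hypothetical helper: reduce strcmp()'s result to exactly -1, 0, or +1. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: 1 if the two strings hold the same text, else 0.
   This compares characters; a == b would compare addresses instead. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Two separate arrays that both hold "abc" make same_text() return 1, even though the arrays live at different addresses.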
note: "dest == src" would only test whether both names refer to the same array in memory, not whether the text matches
Finding a Character In a String
char *strchr(char *s, int c)
  Return the location of the first occurrence of character c in the string s (NULL if c does not occur)
  This is a pointer, not an index/offset
char *strrchr(char *s, int c)
  Return the location of the last occurrence of character c in the string s
char *strstr(char *haystack, char *needle)
  Return the location of the substring needle within the (larger) string haystack
Example: the strchr() function
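Because these functions return pointers rather than indexes, a common follow-up step is to subtract the start of the string to recover an offset. The sketch below illustrates that; the helper names first_index() and last_index() are hypothetical, and the sample string "Sam I am!" in the comments is an assumption chosen for illustration.

```c
#include <string.h>

/* Hypothetical helper: offset of the first c in s, or -1 if absent. */
long first_index(const char *s, int c)
{
    const char *p = strchr(s, c);
    return (p != NULL) ? (long)(p - s) : -1L;
}

/* Hypothetical helper: offset of the last c in s, or -1 if absent. */
long last_index(const char *s, int c)
{
    const char *p = strrchr(s, c);
    return (p != NULL) ? (long)(p - s) : -1L;
}

/* With s = "Sam I am!" (an assumed sample string):
     strchr(s, 'a')    points at "am I am!"  -> first_index is 1
     strrchr(s, 'a')   points at "am!"       -> last_index is 6
     strstr(s, "I am") points at "I am!"     -> offset 4              */
```

The returned pointer can itself be passed to printf() or puts(), since it points at a null-terminated tail of the original string.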
strrchr() – finds "am!"
Making New Strings
strncpy(char *dest, char *src, size_t n)
  size_t is an unsigned integer type, related to unsigned or unsigned long
  copy a text string from src array into dest array
  copies at most n characters
  if src string has n or more characters, the copied result will not be null terminated!
  set n <= the dest array's length to prevent buffer overflow
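One common way to cope with the missing terminator is to copy at most one character fewer than the array holds and then store the null explicitly. A minimal sketch of that idiom; copy_string() is a hypothetical helper name, and silently truncating long input is an assumption about what the caller wants.

```c
#include <string.h>

/* Hypothetical helper: copy src into dest, which has room for dest_size
   chars. The result is always null terminated; text that does not fit
   is truncated. */
void copy_string(char *dest, const char *src, size_t dest_size)
{
    if (dest_size == 0)
        return;                         /* no room even for the null */
    strncpy(dest, src, dest_size - 1);  /* leave the last element free */
    dest[dest_size - 1] = '\0';         /* guarantee termination */
}
```

With a 4-element array, copy_string(bfr, "abcdefg", sizeof bfr) leaves "abc" in bfr.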
strncat(char *dest, char *src, size_t n)
  appends at most n characters from src array to end of dest array
  the result is always null terminated, so dest needs room for its current text, up to n more characters, and the null
Putting a String in a Character Array
Assign characters to the array:
    char bfrA[10];
    /*
    bfrA[0] = 'a';
    bfrA[1] = 'b';
    bfrA[2] = 'c';
    ⁝                  Ugh
    bfrA[7] = '\0';
    */
    strncpy(bfrA, "abcdefg", 10);
  Better than character-by-character
Avoiding Buffer Overflow
Recognize, but don't use these...
  Older versions of strncpy(), strncat():
strcpy(char *dest, char *src)
  copy a text string from one array into another
  appends null character to the end
  buffer overflow occurs if src is longer than dest!
strcat(char *dest, char *src)
  appends a text string to the end of another
  also appends null character
  can also overflow the dest
Implementing String Functions
Functions that operate on strings actually work on character arrays
A function must step through each character of the arrays, checking for the null character
Typical function uses a loop to step through the array
An Implementation of “strlen()”
Prototype: unsigned strlen(char *src);
Equivalent to: unsigned strlen(char src[]);
A simple counting loop works:
    unsigned strlen(char *src)
    {
        unsigned i;
        for (i = 0; src[i] != '\0'; i++)
            ;   /* empty body: the loop header does all the work */
        return i;
    }
An Implementation of “strncpy()”
Prototype: char *strncpy(char *dest, char *src, unsigned maxlen);
Equivalent to: char *strncpy(char dest[], char src[], unsigned maxlen);
Use a loop to copy (see "man strncpy"):
    char *strncpy(char *dest, char *src, unsigned maxlen)
    {
        unsigned i;
        for (i = 0; i < maxlen && src[i] != '\0'; i++)
            dest[i] = src[i];
        for ( ; i < maxlen; i++)   // pad with nulls, if possible
            dest[i] = '\0';
        return dest;
    }
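The same step-until-null loop carries over to the other string functions. As one more illustration, here is a sketch of how a comparison in the style of strcmp() might be written; my_strcmp() is a hypothetical name, chosen so it does not clash with the library function, and this is not the library's actual source.

```c
#include <stddef.h>

/* Sketch of a strcmp()-style comparison. Walk both strings together
   until the characters differ or the first string ends. */
int my_strcmp(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i])
        i++;
    /* Compare as unsigned char so codes above 127 order consistently.
       The result is negative, zero, or positive. */
    return (unsigned char)a[i] - (unsigned char)b[i];
}
```

If the first string ends early, the loop stops on its null, which compares as less than any remaining character of the other string, so "ab" sorts before "abc".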