Characters, Character Strings, and String-Manipulation Functions in C

See Kernighan & Ritchie, Section 1.9 and Appendix B3.

Characters

- Printable characters (and some non-printable ones) are represented as 8-bit numeric values
- Stored in variables of type char
- The 7-bit ASCII character code is used; it is compatible with the Latin-1 and UTF-8 encodings

Character   ASCII decimal   ASCII hexadecimal   ASCII binary
A           65              0x41                0100 0001
a           97              0x61                0110 0001
B           66              0x42                0100 0010
b           98              0x62                0110 0010

Character Strings (Text)

- C does not have a "string" type!
- C has arrays of variables of its data types, including characters
- Arrays must have a constant, fixed length
- Text is kept in character arrays
- The arrays must be as big as or bigger than the text strings
- In practice, they are almost always bigger

Strings of Characters in Character Arrays

- C character strings are held in memory as ASCII values, with an ASCII 0, or null, at the end
  - "Null-terminated strings"
  - Also called ASCIIZ
- Character arrays must be big enough to include the null character as well as the printable characters
- Any extra elements in the array may be filled with more nulls, with garbage, or with remnants of previous strings

Putting a String in a Character Array

- Initialize an array when you declare it:

    char bfrA[10] = "abcdefg";   // 7 characters + null; last two elements unused
    char bfrB[] = "hijklm";      // array is just big enough (7 elements)

- Assign characters one-by-one (we'll see a better way shortly):

    char bfrA[10];
    bfrA[0] = 'a';
    bfrA[1] = 'b';
    bfrA[2] = 'c';
    ⁝
    bfrA[7] = '\0';

String Input

- scanf(): use "%s" for the format specifier, and supply a character array
  - Amount of input can be limited with the size modifier: "%20s" reads at most 20 characters and then stores a null, so the array needs at least 21 elements
- fgets() expects a character array
  - fgets() also expects the array size, to limit the amount of input text; it stores at most size - 1 characters plus the null
- gets() also expects a character array, but don't use it: always use fgets() instead
- The result is an array of characters that are valid up to the terminating null

String Output

- printf(), puts(), fputs(): all expect a null-terminated character string
- They will keep printing characters until they see a null
- So if you give them a character array that doesn't contain a null, they'll keep going: off the end of the array, and until they happen to run into a 0 byte somewhere in memory (or run out of legal memory)

Example - Caesar Cipher

- Also known as "rot-N" for "rotate N characters"
  - rot-13 is a common one
- Simply done with character arithmetic
- Use Boolean variables
- Repeated inputs
- Rotation count as cmd-line input

Solution

- Sometimes side-effects are good, even necessary...

Character-String Functions in C

Functions That Work With Character Strings

Find these in <string.h> …

- strlen(char *src)
  - reports the number of "meaningful" characters
  - stops counting at the first null character (the null is not counted)
- strcmp(char *dest, char *src)
  - compares two string arrays
  - returns 0 if they match each other
  - returns a negative value if dest comes before (is "less than") src
  - returns a positive value if dest comes after src
  - many implementations return -1 and +1, but only the sign is guaranteed
  - note: "dest == src" compares the two addresses, so it only tests whether both names refer to the same array, not whether the text matches

Finding a Character In a String

- char *strchr(char *s, int c)
  - Returns the location of the first occurrence of character c in the string s
  - This is a pointer, not an index/offset; it is NULL if c is not found
- char *strrchr(char *s, int c)
  - Returns the location of the last occurrence of character c in the string s
- char *strstr(char *haystack, char *needle)
  - Returns the location of the substring needle within the (larger) string haystack

Example: the strchr() function

strrchr() - finds "am!"

Making New Strings

- strncpy(char *dest, char *src, size_t n)
  - size_t is an unsigned integer type (typically unsigned or unsigned long)
  - copies a text string from the src array into the dest array
  - copies at most n characters
  - if the src string has n or more characters, the copied result will not be null-terminated!
  - set n <= the dest array's length to prevent buffer overflow
- strncat(char *dest, char *src, size_t n)
  - appends at most n characters from the src array to the end of the dest string
  - always stores a terminating null after the appended characters, so dest needs room for its current text, up to n more characters, and the null

Putting a String in a Character Array

- Assign characters to the array:

    char bfrA[10];
    /* Ugh:
    bfrA[0] = 'a';
    bfrA[1] = 'b';
    bfrA[2] = 'c';
    ⁝
    bfrA[7] = '\0';
    */
    strncpy(bfrA, "abcdefg", 10);   // better than character-by-character

Avoiding Buffer Overflow

Recognize, but don't use these... Older versions of strncpy(), strncat():

- strcpy(char *dest, char *src)
  - copies a text string from one array into another
  - appends a null character to the end
  - buffer overflow occurs if src is longer than dest!
- strcat(char *dest, char *src)
  - appends a text string to the end of another
  - also appends a null character
  - can also overflow the dest

Implementing String Functions

- Functions that operate on strings actually work on character arrays
- A function must step through each character of the arrays, checking for the null character
- A typical function uses a loop to step through the array

An Implementation of "strlen()"

- Prototype: unsigned strlen(char *src);
- Equivalent to: unsigned strlen(char src[]);
- A simple counting loop works:

    unsigned strlen(char *src)
    {
        unsigned i;
        for (i = 0; src[i] != '\0'; i++)
            ;                       /* empty body: the loop only counts */
        return i;
    }

An Implementation of "strncpy()"

- Prototype: char *strncpy(char *dest, char *src, unsigned maxlen);
- Equivalent to: char *strncpy(char dest[], char src[], unsigned maxlen);
- Use a loop to copy (see "man strncpy"):

    char *strncpy(char *dest, char *src, unsigned maxlen)
    {
        unsigned i;
        for (i = 0; i < maxlen && src[i] != '\0'; i++)
            dest[i] = src[i];
        for ( ; i < maxlen; i++)    // pad with nulls, if possible
            dest[i] = '\0';
        return dest;
    }
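The null-termination caveat for strncpy() noted above is easy to trip over, so here is a minimal sketch of one common way to guard against it. The names copy_text() and append_text() are hypothetical helpers invented for this illustration; they are not part of <string.h>. Both take the full size of the destination array and always leave a null-terminated string, truncating the text if it does not fit.

```c
#include <string.h>

/* copy_text() is a hypothetical helper, not a library function.
   It copies src into dest, where destsize is the full size of the dest
   array, and always null-terminates (truncating src if necessary). */
char *copy_text(char *dest, size_t destsize, const char *src)
{
    if (destsize == 0)
        return dest;                    /* no room even for the null */
    strncpy(dest, src, destsize - 1);   /* leave one element for the null */
    dest[destsize - 1] = '\0';          /* strncpy may not have stored one */
    return dest;
}

/* append_text() is also hypothetical. It appends as much of src as fits.
   strncat always stores a null, so n must leave room for it. */
char *append_text(char *dest, size_t destsize, const char *src)
{
    size_t used = strlen(dest);         /* dest must already hold a string */
    if (used + 1 < destsize)
        strncat(dest, src, destsize - used - 1);
    return dest;
}
```

For example, with char small[4], the call copy_text(small, sizeof small, "abcdefg") leaves the string "abc" in small. With char big[10] already holding "abcdefg", append_text(big, sizeof big, "hij") appends only "hi", because the array has room for nine characters plus the null.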

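The Caesar cipher example described earlier can be sketched with character arithmetic as follows. This is not the solution from the slides, which is not reproduced in this text; the function name rotate_text() is hypothetical, and the code assumes an ASCII-compatible character set in which the letters of each case are contiguous. The function changes the array in place, which is the kind of useful side effect the "Solution" slide alludes to.

```c
#include <ctype.h>

/* rotate_text() is a hypothetical name chosen for this sketch.
   It rotates every letter in text by n places, wrapping around the
   alphabet, and leaves all other characters unchanged. The array is
   modified in place. Assumes ASCII, where 'A'..'Z' and 'a'..'z' are
   contiguous. */
void rotate_text(char *text, int n)
{
    int i;

    n = ((n % 26) + 26) % 26;           /* map any count, even negative, to 0..25 */
    for (i = 0; text[i] != '\0'; i++) { /* stop at the terminating null */
        unsigned char c = (unsigned char)text[i];
        if (isupper(c))
            text[i] = (char)('A' + (c - 'A' + n) % 26);
        else if (islower(c))
            text[i] = (char)('a' + (c - 'a' + n) % 26);
    }
}
```

With char msg[] = "Hello, World!", calling rotate_text(msg, 13) changes msg to "Uryyb, Jbeyq!", and calling it a second time with 13 restores the original text. A complete program could take the rotation count from the command line, for example by converting argv[1] with atoi().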