Strings in C

========================================
Section #1: Strings in C (p.665)
========================================

We all know what strings are in programming, right? We use strings to represent text. And what is text but a sequence of individual characters? Hmm, sounds like an array, right? And that's precisely how strings are stored in C: they're stored in character arrays!

Different languages have different ways of determining where a string ends, but in C the end of a string is marked by a special character called "the null character." Characters are stored in memory as integer character codes (ASCII on most systems), and the null character has an integer value of zero. So another way to view strings in C is that they are null-terminated (or "zero-terminated") arrays of chars. In the context of a character array, zero has a special meaning: "end of the string." This allows us to create strings of arbitrary length, limited only by the size of the array that holds them.

Let's create a few strings on the stack and see what we get:

    auto char str1[] = {'a', 'b', 'c', 0};

Here is an array without an explicit dimension. Remember, you're allowed to do that, but only if you provide an initialization list, because the compiler takes the array size from it. Since the initialization list has four initializers, and since each element of str1 is a char, a contiguous block of four chars will be allocated for the array. It'll look something like this:

    str1 [0x120]
    +-----+-----+-----+-----+
    | 'a' | 'b' | 'c' |  0  |
    +-----+-----+-----+-----+

But that's not the only way to do it. Here are a couple of others:

    auto char str2[] = "abc";

    str2 [0x130]
    +-----+-----+-----+-----+
    | 'a' | 'b' | 'c' |  0  |
    +-----+-----+-----+-----+

    auto char str3[4] = "abc";

    str3 [0x140]
    +-----+-----+-----+-----+
    | 'a' | 'b' | 'c' |  0  |
    +-----+-----+-----+-----+

Notice that str2 and str3 are being initialized with string literals. How is that possible? It's possible because string literals are null-terminated character arrays!
When you write a string literal (a sequence of characters enclosed in a pair of double quotes), you're actually creating an unnamed character array that's allocated in the data section at compile time. C depends entirely upon static type-checking, which means that every piece of data must have a data type that is determined at compile time. So, what is the data type of a string literal? That data is sitting in the data section somewhere, which means it must have a memory address, but what is its type? Strictly speaking it's an array of char, but in almost every expression it behaves as a pointer to a char, and you should treat the chars it points to as const, because modifying a string literal is undefined behavior. Huh? That's right! In other words, when you use a string literal in an expression, it evaluates to the base address of the character array in the data section. Recall that the base address of an array is the address of the first array element, and another word for "address" is "pointer", and since the first array element is a char, that means a pointer to a char.

What happens when you do this?

    printf("%p\n", (void *)"Hello world!");

The "%p" code is used to display a memory address (the cast is there because %p expects a void pointer), so whose address is going to get displayed? It's going to be the base address of the unnamed character array in the data section that contains "Hello world!" (Try it and see for yourself!)

Okay, if that's the case, what happens here?

    auto char *str4 = "abc";

Can you visualize that? str4 is a stack variable because it has the auto storage class. It's a pointer to a char, so it can only store the address of a char. What is it being initialized with? It's being initialized with the base address of the unnamed string "abc" that's sitting out in the data section. I hope you already drew a picture of that to make it more concrete. If not, I'll give you a minute or two (don't look below this sentence)... Okay, got it?
Let's compare notes:

      STACK                          DATA
                                     [0x560]
                 +-------+           +-----+-----+-----+-----+
    str4 [0x150] | 0x560 | --------> | 'a' | 'b' | 'c' |  0  |
                 +-------+           +-----+-----+-----+-----+

See how the pointer str4 is a stack variable with its own address, but the value it contains is just the base address of the unnamed string sitting in the data section? Wow! Are you ready for more? Recall the "golden rule":

    a[i] == *(a+i)

Look familiar? Indexing into an array with the subscript operator is the same thing as adding an offset to the base address of the array to derive an address, then dereferencing that address to get what's sitting there. Since str4 contains an address, you can apply the subscript operator to it to index into the characters of the string literal:

    str4[0] == *(str4+0) == 'a'

or this:

    str4[2] == *(str4+2) == 'c'

Pretty slick stuff!

BTW, I just want to quickly mention that there are different ways to indicate the null character:

    1) you can write it as a character constant, like this: '\0'
    2) you can write it using a simple integer value of zero: 0

You may also see the predefined constant NULL used for this, but avoid it: NULL is a null pointer constant meant for pointers, and on many systems it is defined as ((void *)0), so a compiler may warn about it or reject it when it is used as a character.

========================================
Section #2: Reading Strings from stdin
========================================

Okay, we know how to create string literals in the data section using double quotes, and also how to use them to initialize character arrays. What if you want to read a string from the keyboard and store it in a character array? How would you do that? There are a few ways to do it, and each has tradeoffs. You might try this:

    auto char myStr[100];

    myStr [0x200]
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  <-- undefined content
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      [0]   [1]   [2]   . . .                                    [99]

    printf("Please enter a string of text: ");
    scanf("%s", myStr);

Here the scanf function is being used to read a string from stdin and store it in the character array. Notice that we don't need to use the address operator for myStr, because an array identifier without a subscript operator evaluates to the base address, which is a pointer to the first char in the array of chars. But if the user sees the prompt and types "C is fun!" what do we get? We get this:

    myStr [0x200]
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    | 'C' |  0  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |  ?  |
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      [0]   [1]   [2]   . . .                                    [99]

What? Where did the "fun" go? The problem here is that scanf sees whitespace as a delimiter. In other words, it uses whitespace to know when one piece of data ends and the next one begins. Since the user entered "C is fun!", scanf figured it was done extracting the string from stdin as soon as it reached that blank space, so it just left us with a string that contains "C".

How can we get an entire string, including embedded whitespace? There's a function in the standard library called fgets that will read an entire string, whitespace and all. Well, almost... The fgets function sees the newline as the delimiter, so it will copy all of your characters up to and including the newline character. Here's how it can be used:

    printf("Please enter a string of text: ");
    fgets(myStr, 100, stdin);

With that function call, you'll get this:

    myStr [0x200]
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    | 'C' | ' ' | 'i' | 's' | ' ' | 'f' | 'u' | 'n' | '!' |'\n' |  0  |  ?  |  ?  |
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      [0]   [1]   [2]   . . .                                                [99]

Notice that fgets copies the newline character into the character array.
Also, notice that the function call gets three pieces of information from our arguments:

    1) the base address of the character array
    2) the capacity of the array
    3) the input stream to read from (stdin)

This way fgets will read at most 99 characters and store them in your character array, always leaving room for the null character. This is a much safer function to use than scanf because fgets won't write beyond the allocated capacity of the destination array, assuming you gave it the correct capacity argument. With a plain "%s", scanf doesn't have that information (unless you add a field width such as "%99s"), so if you create an array of ten chars and then use scanf to read a string of 100 chars, scanf will just keep writing those chars to memory, far exceeding the allocated boundary of your array argument.

Something else to note about the fgets function: it returns a pointer to the character array argument if all goes well. However, if an error occurs, or if end-of-file is reached before any characters are read, then fgets will return a NULL pointer instead. This is how you can tell whether you successfully read a string when calling the function: you'll get a pointer returned to you that's either NULL or non-NULL.

There you have it, that's what strings are like in C. Now you can see why, in C++, when null-terminated character arrays are used, they're called cstrings, because that's where they came from!