Handout 4 Character Representation


06-11337 Introduction to Computer Science
The University of Birmingham, School of Computer Science
Autumn Semester 2002
October 21, 2002
© Achim Jung and Uday Reddy

1. History. The representation of characters in a binary alphabet has a much longer history than computers. It arose in connection with telegraphs, which were invented around 1800. A well-known example of such a representation is the Morse Code Alphabet from 1867. Computers, on the other hand, were primarily developed for numerical calculations, which is why the early machines offered only integer and floating point arithmetic (some offered only one of the two). The observation that a computer can also process non-numerical data emerged only in the 1950s. This is the reason why many concepts in the study of character representation refer to data transmission rather than data processing. Also, many developments took place in telecommunication companies such as AT&T.

2. ASCII. Most computers in the English speaking world today operate with a 7-bit representation, known as ASCII, or American Standard Code for Information Interchange. It has been standardised by the American National Standards Institute. See below for a table. If you study the character descriptions then, besides the usual everyday characters, you find things like End Transmission Block and Negative Acknowledgement, which point to the origins of the code in data transmission. In data processing some of these special symbols have assumed a different or second interpretation. I do not expect you to memorise this table. What I want you to remember is that the letters A–Z, the letters a–z, and the digits 0–9 each form a contiguous segment, separated from each other by other symbols. It is also useful to know that UNIX separates lines by the single character ‘^J’, whereas PCs use the character combination ‘^M^J’.

3. ASCII and the keyboard.
Only for some characters is it obvious how to enter them into a text using the keyboard. The characters from octal 01 to 032 are called control characters. They are generated by holding down the control key and pressing the corresponding character key. For many programs they will have a special meaning, hence the name “control character”. Other codes in the table are not accessible through the keyboard directly. Emacs allows you to enter any ASCII character in a file, by first typing ‘^Q’, then the octal code (leading zero not necessary), and finishing with ‘return’. Give it a try!

4. Latin-1. The basic unit of computer memory is the byte, which consists of 8 bits. For representing an ASCII character we only need 7 bits, and hence the highest bit is always 0 in an ASCII character representation. A number of possibilities for the remaining 128 bit-patterns have been standardised by ISO (the International Organization for Standardization). The one you will find on our UNIX-machines is called Latin-1. It contains numerous special characters from various European languages, such as ‘Ü’, ‘â’, etc. Go to http://czyborra.com/charsets/iso8859.html for a table. Our local UNIX systems understand Latin-1 codes. (For Emacs to display Latin-1 characters you may need to include the line (standard-display-european 1) in your “.emacs” file.) PCs also use ASCII but the extension to eight bits is different from Latin-1.

5. Unicode. The 256 different patterns that we can store in a byte are clearly not enough to cover the world’s different alphabets. Various efforts to define a two-byte code have been combined since 1991 in the Unicode consortium (http://www.unicode.org). Work is ongoing but a large number of character systems have been codified already. See http://charts.unicode.org for a list of tables. Observe that the patterns from 0x0000 to 0x00FF are identical with ASCII and Latin-1.

6. ASCII, Unicode and Java. Java is one of the first languages to fully embrace Unicode.
Indeed, in the first stage of compilation, every input letter of your program is translated into the corresponding Unicode. This allows programmers from other cultures to use identifier names built up from characters of their own languages. However, the keywords of the language, such as class, int, etc., are fixed. A declaration of the form

    char c;

reserves two bytes of memory. In a char-cell we can store any of the 65,536 different Unicode characters (actually, not all these possibilities are yet defined).

Unicode is a nice idea but we are still surrounded by ASCII and Latin-1 in this part of the world. For example, keyboards can’t possibly try to cover all of this code. So while entering an ASCII character into a char-cell is easily effected by a statement like

    c = 'A';

we have to have a “Unicode to ASCII” translation for other values. The basic idea for such a translation is to use escape sequences, where a special symbol (which in Java is always ‘\’) indicates that a certain number of following characters are describing some other character rather than denoting themselves. The following table lists the traditional escape sequences which are understood by C, C++ and UNIX. Java accepts most of them, but not \v, \a, \? or \xhh.

    esc seq   description       esc seq   description
    \n        newline           \r        return
    \t        tab               \v        vertical tab
    \'        quote             \"        double quote
    \\        backslash         \b        backspace
    \f        formfeed          \a        bell
    \?        question mark     \0        null character
    \ooo      octal byte        \xhh      hexadecimal byte

(Note how we recover the escape character ‘\’ itself!) Java adds to these the Unicode escapes of the format “\uhhhh” where each h stands for a hexadecimal digit. Unicode escapes can appear anywhere in a program, not just within the definition of a character or a string; try it!

7. Computing with characters. A little bit of arithmetic is permitted on char-values. This is useful for text processing. For example, if the char-cell c holds the character 'a', then c-32 will represent 'A'.
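The character arithmetic just described can be tried out directly. The following is a minimal sketch, not part of the original handout: the class name CharArithmetic and the method toUpperAscii are made up for illustration, and the method deliberately handles only the ASCII letters a–z.

```java
// Illustrative sketch; class and method names are assumptions, not from the handout.
class CharArithmetic {

    // Converts an ASCII lowercase letter to uppercase by subtracting 32.
    // Any other character is returned unchanged.
    static char toUpperAscii(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) (c - 32);   // c - 32 has type int, so cast back to char
        }
        return c;
    }

    public static void main(String[] args) {
        char c = 'a';
        System.out.println((int) c);            // prints 97, the code of 'a'
        System.out.println(toUpperAscii(c));    // prints A

        char viaEscape = '\u0041';              // Unicode escape with hexadecimal value 41
        System.out.println(viaEscape == 'A');   // prints true

        char next = (char) ('A' + 1);           // letters form a contiguous segment
        System.out.println(next);               // prints B
    }
}
```

The cast is needed because Java performs the subtraction on int values; without it the result would be the number 65 rather than the character 'A'.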
Subtracting 32 works in the same way for every other lowercase letter of the ASCII alphabet and, with a few exceptions such as ‘ß’ and ‘ÿ’, of its Latin-1 extension. Thus we can use arithmetic to switch from lowercase to uppercase letters.

8. Text compression. Universal standardised codes are extremely important for the smooth cooperation of computers across networks and for the portability of programs. In the remainder of these notes, however, I want to look at a slightly different scenario. Suppose we are asked to develop a code for a certain source of information. We assume that we know the set $A$ of symbols that the source is using. Now, in such a situation it may well be that the set $A$ has fewer than 256 elements and so the use of 8 bits for the representation of each symbol could be wasteful. As an example, consider English text. It will consist of 52 upper- and lowercase characters plus some 10 punctuation symbols. In this situation a code of 6 bits per character suffices, since $2^6 = 64 \geq 62$.

9. The source code theorem. Suppose now that we also know the relative frequency with which we will encounter each character from the information source. We can then do even better by adapting the length of the code to the frequency of the character occurring. An example of such a strategy is the Morse Code Alphabet. One might wonder whether a particular code is optimal for the given application. Surprisingly, there is a precise answer to this question. It was developed by Claude Shannon in 1948.

Suppose the relative frequency with which the symbol $a \in A$ appears is $p_a$. (So the sum of all $p_a$ will be 1.) Shannon calls the value

$$I(a) = \log_2 \frac{1}{p_a}$$

the information content of the symbol $a$. He then defines

$$\sum_{a \in A} p_a \times I(a)$$

which he calls the entropy of the data source. This is his source code theorem:

    There is a coding of the symbols in $A$ in a binary alphabet such that the average number of bits per symbol is arbitrarily close to the entropy of the source. No coding can have a bit rate smaller than the entropy.
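To get a feeling for the entropy formula, it can be evaluated numerically. The sketch below is an illustration only: the class name EntropyDemo and the two frequency tables are invented for this example and do not come from the handout. It assumes logarithms to base 2, so the result is in bits per symbol.

```java
import java.util.Locale;

// Illustrative sketch; the class name and the sample frequencies are assumptions.
class EntropyDemo {

    // Entropy in bits per symbol: the sum over all symbols of p * log2(1/p).
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pa : p) {
            if (pa > 0.0) {   // a symbol that never occurs contributes nothing
                h += pa * (Math.log(1.0 / pa) / Math.log(2.0));
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};   // four equally likely symbols
        double[] skewed  = {0.5, 0.25, 0.125, 0.125};  // one symbol dominates

        System.out.println(String.format(Locale.ROOT, "%.2f", entropy(uniform)));  // prints 2.00
        System.out.println(String.format(Locale.ROOT, "%.2f", entropy(skewed)));   // prints 1.75
    }
}
```

With four equally likely symbols the entropy is 2 bits, so a fixed 2-bit code cannot be beaten. With the skewed frequencies the entropy drops to 1.75 bits, which is the room a variable-length code can exploit.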
The average number of bits is calculated almost like the entropy. Suppose the coding of symbol $a \in A$ takes $l(a)$ many bits. Then the average length is

$$\sum_{a \in A} p_a \times l(a)$$

10. Huffman code. Four years after Shannon published his theorem, David Albert Huffman came up with a very simple practical code which is optimal among all codes which code every character separately. His method is best explained by example:

    symbol   frequency   code
    a        .5625       0
    b        .1875       10
    c        .1875       110
    d        .0625       111

(Tree diagram: c and d are grouped into a node of frequency .25; this node and b are grouped into a node of frequency .4375; this node and a are grouped into the root of frequency 1.0.)

In a first stage one constructs a binary tree by repeatedly grouping together the two entries with smallest frequency (assigning the sum to the new root), until only one node remains (which must carry the frequency 1). The branches of this tree are labelled by assigning 1 to a left branch and 0 to a right branch. The code for each symbol is obtained by collecting all labels along the path from the root to the symbol.

In this particular example, one gets an average bit rate of $0.5625 \cdot 1 + 0.1875 \cdot 2 + 0.1875 \cdot 3 + 0.0625 \cdot 3 \approx 1.69$ bits per symbol, which is a lot better than the naive code with 2 bits per character. The entropy of the system is 1.62, so it would be hard (and probably not worthwhile) to improve on this coding.

It is worthwhile to note that every sequence of 0s and 1s can be seen as a stream of Huffman coded characters (except that at the end we may have some digits left over).

11. Improving on the source code theorem. The second part of Shannon’s theorem assumes that there is no correlation between successive characters in the stream of information produced by the source.
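Returning to the Huffman construction of paragraph 10, the tree-building stage can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than a definitive implementation: the names HuffmanSketch, Node, codeLengths and averageLength are invented here, the input is assumed to contain at least one symbol, and the sketch computes only the code length of each symbol. The exact bit patterns depend on how branches are labelled and how ties (such as b and c in the example) are broken, whereas the average bit rate does not.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch; all names are assumptions, not from the handout.
class HuffmanSketch {

    // A tree node: a leaf carries a symbol, an inner node carries two children.
    static final class Node {
        final double weight;
        final char symbol;       // meaningful for leaves only
        final Node left, right;  // both null for a leaf

        Node(double weight, char symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null;
        }
    }

    // Builds the tree by repeatedly grouping the two entries of smallest
    // frequency, then returns the depth (code length) of every symbol.
    static Map<Character, Integer> codeLengths(Map<Character, Double> freq) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>((x, y) -> Double.compare(x.weight, y.weight));
        for (Map.Entry<Character, Double> e : freq.entrySet()) {
            queue.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        while (queue.size() > 1) {
            Node x = queue.poll();
            Node y = queue.poll();
            queue.add(new Node(x.weight + y.weight, '\0', x, y));  // new root gets the sum
        }
        Map<Character, Integer> lengths = new HashMap<>();
        collect(queue.poll(), 0, lengths);
        return lengths;
    }

    // Walks the tree and records the depth at which each leaf is found.
    private static void collect(Node n, int depth, Map<Character, Integer> out) {
        if (n.isLeaf()) {
            out.put(n.symbol, depth);
            return;
        }
        collect(n.left, depth + 1, out);
        collect(n.right, depth + 1, out);
    }

    // Average number of bits per symbol: the sum of frequency times code length.
    static double averageLength(Map<Character, Double> freq, Map<Character, Integer> lengths) {
        double sum = 0.0;
        for (Map.Entry<Character, Double> e : freq.entrySet()) {
            sum += e.getValue() * lengths.get(e.getKey());
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Character, Double> freq = new HashMap<>();
        freq.put('a', 0.5625);
        freq.put('b', 0.1875);
        freq.put('c', 0.1875);
        freq.put('d', 0.0625);

        Map<Character, Integer> lengths = codeLengths(freq);
        double avg = averageLength(freq, lengths);
        System.out.println(String.format(Locale.ROOT, "%.4f", avg));  // prints 1.6875
    }
}
```

The priority queue always hands back the two entries of smallest frequency, which is exactly the grouping step described in paragraph 10. For the four-symbol example the resulting average of 1.6875 bits sits just above the entropy of about 1.62 bits.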