Joint mPlane / BigFoot PhD School: DAY 1 Hadoop MapReduce: Theory and Practice

Pietro Michiardi (Eurecom)
bigfootproject.eu, ict-mplane.eu

Sources and Acks
- Jimmy Lin and Chris Dyer, “Data-Intensive Text Processing with MapReduce,” Morgan & Claypool Publishers, 2010. http://lintool.github.io/MapReduceAlgorithms/
- Tom White, “Hadoop: The Definitive Guide,” O’Reilly / Yahoo Press, 2012.
- Anand Rajaraman, Jeffrey D. Ullman, Jure Leskovec, “Mining of Massive Datasets,” Cambridge University Press, 2013.

Introduction and Motivations

What is MapReduce?
- A programming model:
  - Inspired by functional programming
  - Parallel computations on massive amounts of data
- An execution framework:
  - Designed for large-scale data processing
  - Designed to run on clusters of commodity hardware

What is this PhD School About
- DAY1: The MapReduce Programming Model
  - Principles of functional programming
  - Scalable algorithm design
- DAY2: In-depth description of Hadoop MapReduce
  - Architecture internals
  - Software components
  - Cluster deployments
- DAY3: Relational Algebra and High-Level Languages
  - Basic operators and their equivalence in MapReduce
  - Hadoop Pig and PigLatin

What is Big Data?
- Vast repositories of data
  - The Web
  - Physics
  - Astronomy
  - Finance
- Volume, Velocity, Variety
- It’s not the algorithm, it’s the data! [1]
  - More data leads to better accuracy
  - With more data, accuracy of different algorithms converges

Key Principles

Scale out, not up!
- For data-intensive workloads, a large number of commodity servers is preferred over a small number of high-end servers
  - Cost of super-computers is not linear
  - But datacenter efficiency is a difficult problem to solve [2, 4]
- Some numbers (∼ 2012):
  - Data processed by Google every day: 100+ PB
  - Data processed by Facebook every day: 10+ PB

Implications of Scaling Out
- Processing data is quick, I/O is very slow
  - 1 HDD = 75 MB/sec
  - 1000 HDDs = 75 GB/sec
- Sharing vs. Shared nothing:
  - Sharing: manage a common/global state
  - Shared nothing: independent entities, no common state
- Sharing is difficult:
  - Synchronization, deadlocks
  - Finite bandwidth to access data from SAN
  - Temporal dependencies are complicated (restarts)

Failures are the norm, not the exception
- LANL data [DSN 2006]
  - Data for 5000 machines, for 9 years
  - Hardware: 60%, Software: 20%, Network: 5%
- DRAM error analysis [Sigmetrics 2009]
  - Data for 2.5 years
  - 8% of DIMMs affected by errors
- Disk drive failure analysis [FAST 2007]
  - Utilization and temperature major causes of failures
- Amazon Web Service(s) failures [Several!]
  - Cascading effect

Implications of Failures
- Failures are part of everyday life
  - Mostly due to the scale and shared environment
- Sources of Failures
  - Hardware / Software
  - Electrical, Cooling, ...
  - Unavailability of a resource due to overload
- Failure Types
  - Permanent
  - Transient

Move Processing to the Data
- Drastic departure from high-performance computing model
  - HPC: distinction between processing nodes and storage nodes
  - HPC: CPU intensive tasks
- Data intensive workloads
  - Generally not processor demanding
  - The network becomes the bottleneck
  - MapReduce assumes processing and storage nodes to be collocated → Data Locality Principle
- Distributed filesystems are necessary

Process Data Sequentially and Avoid Random Access
- Data intensive workloads
  - Relevant datasets are too large to fit in memory
  - Such data resides on disks
- Disk performance is a bottleneck
  - Seek times for random disk access are the problem
    - Example: 1 TB DB with 10^10 100-byte records. Updating 1% of the records requires 1 month, whereas reading and rewriting the whole DB would take 1 day (from a post by Ted Dunning on the Hadoop mailing list)
  - Organize computation for sequential reads

Implications of Data Access Patterns
- MapReduce is designed for:
  - Batch processing
  - Involving (mostly) full scans of the data
- Typically, data is collected “elsewhere” and copied to the distributed filesystem
  - E.g.: Apache Flume, Hadoop Sqoop, ···
- Data-intensive applications
  - Read and process the whole Web (e.g. PageRank)
  - Read and process the whole Social Graph (e.g. LinkPrediction, a.k.a. “friend suggest”)
  - Log analysis (e.g. Network traces, Smart-meter data, ···)

Hide System-level Details
- Separate the what from the how
  - MapReduce abstracts away the “distributed” part of the system
  - Such details are handled by the framework
- BUT: In-depth knowledge of the framework is key
  - Custom data reader/writer
  - Custom data partitioning
  - Memory utilization
- Auxiliary components
  - Hadoop Pig
  - Hadoop Hive
  - Cascading/Scalding
  - ... and many many more!

Seamless Scalability
- We can define scalability along two dimensions
  - In terms of data: given twice the amount of data, the same algorithm should take no more than twice as long to run
  - In terms of resources: given a cluster twice the size, the same algorithm should take no more than half as long to run
- Embarrassingly parallel problems
  - Simple definition: independent (shared nothing) computations on fragments of the dataset
  - How to decide if a problem is embarrassingly parallel or not?
- MapReduce is a first attempt, not the final answer

The Programming Model

Functional Programming Roots
- Key feature: higher order functions
  - Functions that accept other functions as arguments
  - Map and Fold

[Figure: Illustration of map and fold.]
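
The two higher-order functions named on this slide can be tried out directly in Python. What follows is only a minimal sketch: the functions `f` and `g` and the sample list are made up for illustration, and Python's built-in `map` together with `functools.reduce` stand in for map and fold.

```python
from functools import reduce
from itertools import accumulate

def f(x):
    """Unary function: map applies it to each element independently."""
    return x * x

def g(acc, x):
    """Binary function: fold combines the accumulator with the next element."""
    return acc + x

data = [1, 2, 3, 4, 5]               # illustrative input, not from the slides

squares = list(map(f, data))         # map: apply f to every element
total = reduce(g, squares, 0)        # fold: start from the initial value 0

# accumulate exposes the intermediate accumulator values of the fold
steps = list(accumulate(squares, g, initial=0))

print(squares)
print(steps)
print(total)
```

Each call to `f` depends only on its own element, which is what makes the map step easy to parallelize; `g` carries state from one element to the next through the accumulator.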
Functional Programming Roots
- map phase:
  - Given a list, map takes as an argument a function f (that takes a single argument) and applies it to all elements in the list
- fold phase:
  - Given a list, fold takes as arguments a function g (that takes two arguments) and an initial value (an accumulator)
  - g is first applied to the initial value and the first item in the list
  - The result is stored in an intermediate variable, which is used as an input together with the next item to a second application of g
  - The process is repeated until all items in the list have been consumed
- We can view map as a transformation over a dataset
  - This transformation is specified by the function f
  - Each functional application happens in isolation
  - The application of f to each element of a dataset can be parallelized in a straightforward manner
- We can view fold as an aggregation operation
  - The aggregation is defined by the function g
  - Data locality: elements in the list must be “brought together”
  - If we can group elements of the list, the fold phase can also proceed in parallel
- Associative and commutative operations
  - Allow performance gains through local aggregation and reordering

Functional Programming and MapReduce
- Equivalence of MapReduce and Functional Programming:
  - The map of MapReduce corresponds to the map operation
  - The reduce of MapReduce corresponds to the fold operation
- The framework coordinates the map and reduce phases:
  - Grouping intermediate results happens in parallel
- In practice:
  - User-specified computation is applied (in parallel) to all input records of a dataset
  - Intermediate results are aggregated by another user-specified computation

What can we do with MapReduce?
- MapReduce “implements” a subset of functional programming
  - The programming model appears quite limited and strict
- There are several important problems that can be adapted to MapReduce
  - We will focus on illustrative cases
  - We will see in detail “design patterns”
    - How to transform a problem and its input
    - How to save memory and bandwidth in the system

Data Structures
- Key-value pairs are the basic data structure in MapReduce
  - Keys and values can be: integers, floats, strings, raw bytes
  - They can also be arbitrary data structures
- The design of MapReduce algorithms involves:
  - Imposing the key-value structure on arbitrary datasets
    - E.g.: for a collection of Web pages, input keys may be URLs and values may be the HTML content
  - In some algorithms, input keys are not used, in others they uniquely identify a record
  - Keys can be combined in complex ways to design various algorithms
- Note: there is more to it; here we only look at the input to the map function.
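
To make the key-value view concrete, here is a minimal single-process sketch of the map, group, reduce flow described above. It is not the Hadoop API: `run_mapreduce`, `map_fn`, `reduce_fn` and the example.org URLs are hypothetical names chosen for illustration, and the grouping that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from functools import reduce

def map_fn(url, html):
    """Map: emit (word, 1) per token. The input key (url) is unused here."""
    for word in html.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: fold all values of one key with addition, starting from 0."""
    return word, reduce(lambda acc, c: acc + c, counts, 0)

def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map every record, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)      # bring values with the same key together
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

# Input imposed as key-value pairs: URL as key, page content as value.
pages = [
    ("http://example.org/a", "map reduce map"),
    ("http://example.org/b", "reduce fold"),
]

word_counts = run_mapreduce(pages, map_fn, reduce_fn)
print(word_counts)
```

Because addition is associative and commutative, the partial counts for a word could be summed locally before grouping without changing the result, which is the kind of local aggregation the slides mention.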