Part V: Binary Response Data


BIO 233, Spring 2015

Western Collaborative Group Study

• Prospective study of coronary heart disease (CHD)
• Recruited 3,524 men aged 39-59 between 1960-61
  ⋆ employed at 10 companies in California
  ⋆ baseline survey at intake
  ⋆ annual surveys until December 1969
• Exclusions:
  ⋆ 78 men who were actually outside the pre-specified age range
  ⋆ 141 subjects with CHD manifest at intake
  ⋆ 106 employees at one firm that excluded itself from follow-up
  ⋆ 45 subjects who were lost to follow-up, died of non-CHD causes, or self-excluded prior to the first follow-up survey
• n = 3,154 study participants at risk for CHD

• Our primary goal is to investigate the relationship between 'behavior pattern' and risk of CHD
• Participants were categorized into one of two behavior pattern groups:
  ⋆ Type A: characterized by enhanced aggressiveness, ambitiousness, competitive drive, and a chronic sense of urgency
  ⋆ Type B: characterized by a more relaxed, non-competitive manner
• Data and documentation are available on the class website

> ## load the data
> load("WCGS_data.dat")
>
> dim(wcgs)
[1] 3154   11
> names(wcgs)
 [1] "age"    "ht"     "wt"     "sbp"    "dbp"    "chol"   "ncigs"  "behave"
 [9] "chd"    "type"   "time"

• The variables (in column order) are:

   1   age      age, years
   2   ht       height, in
   3   wt       weight, lbs
   4   sbp      systolic blood pressure, mmHg
   5   dbp      diastolic blood pressure, mmHg
   6   chol     cholesterol, mg/dL
   7   ncigs    number of cigarettes smoked per day
   8   behave   behavior type; 0/1 = B/A
   9   type     type of CHD event
  10   chd      occurrence of a CHD event during follow-up
  11   time     time post-recruitment of the CHD event, days

• Values for the 'risk factor' covariates are those measured at the intake visit
• The three CHD-related variables were measured prospectively
  ⋆ over approximately 8.5 years of follow-up

• Important note:
  ⋆ 423 men were lost to follow-up
  ⋆ 140 men died during follow-up
• For our purposes, we are going to ignore these issues and consider the binary outcome

  Y = 1 if CHD occurred during follow-up, and Y = 0 otherwise

• In the dataset, the response variable is 'chd':

> table(wcgs$chd)

   0    1
2897  257

> round(mean(wcgs$chd) * 100, 1)
[1] 8.1

• The primary exposure of interest is 'behave':

> table(wcgs$behave)

   0    1
1565 1589

> round(mean(wcgs$behave) * 100, 1)
[1] 50.4

• Cross-tabulation and exposure-specific incidence:

> table(wcgs$behave, wcgs$chd)

       0    1
  0 1486   79
  1 1411  178

> round(tapply(wcgs$chd, list(wcgs$behave), FUN=mean) * 100, 1)

   0    1
 5.0 11.2

• The probability of the occurrence of CHD during follow-up among Type B men is estimated to be 0.050
  ⋆ the expected percentage of Type B men who will develop CHD during follow-up is 5.0%
• The probability of the occurrence of CHD during follow-up among Type A men is estimated to be 0.112
  ⋆ the expected percentage of Type A men who will develop CHD during follow-up is 11.2%
• We often use the generic term 'risk' for such probabilities
• Either way, it's important to remember that these statements refer to populations of men, rather than to the individuals themselves
  ⋆ we've estimated a common, or average, risk of CHD
  ⋆ referred to as the marginal risk
  ⋆ 'marginal' in the sense that it does not condition on anything else

Contrasts

• As stated at the start, the primary goal is to investigate the relationship between behavior pattern and risk of CHD
• We've characterized the risk for each type, but the goal requires a comparison of the risks
• To perform such a comparison we need to choose a contrast
• Risk difference:
  ⋆ RD = 0.112 − 0.050 = 0.062
  ⋆ the difference in the estimated risk of CHD during follow-up between Type A and Type B men is 0.062 (or 6.2%)
  ⋆ the way in which the additional risk of CHD
of being a Type A man manifests as an absolute increase

• Relative risk:
  ⋆ RR = 0.112 / 0.050 = 2.24
  ⋆ the ratio of the estimated risk of CHD during follow-up for Type A men to the estimated risk for Type B men
  ⋆ the way in which the additional risk of CHD of being a Type A man manifests as a relative increase
• As with the interpretation of the risks themselves, these statements refer to contrasts between populations
  ⋆ the population of Type A men vs. the population of Type B men
• The contrasts are 'marginal' in the sense that we don't condition on anything else when comparing the two populations
  ⋆ i.e. we don't adjust for anything

• It is important to note that the RD and RR are related
  ⋆ the relationship depends on the value of the response probability in the 'referent' group: since RR = P(Y = 1 | X = 1) / P(Y = 1 | X = 0), we have RD = (RR − 1) × P(Y = 1 | X = 0)
• RD across different combinations of P(Y = 1 | X = 0) and RR:

                        P(Y = 1 | X = 0)
              0.01     0.05     0.10     0.20     0.50
  RR = 0.2   -0.008   -0.040   -0.08    -0.16    -0.40
  RR = 0.5   -0.005   -0.025   -0.05    -0.10    -0.25
  RR = 0.9   -0.001   -0.005   -0.01    -0.02    -0.05
  RR = 1.0    0        0        0        0        0
  RR = 1.1    0.001    0.005    0.01     0.02     0.05
  RR = 1.5    0.005    0.025    0.05     0.10     0.25
  RR = 3.0    0.020    0.100    0.20     0.40     NA
  RR = 5.0    0.040    0.200    0.40     NA       NA

• The RD may be small even if the RR is big
  ⋆ for either 'protective' or 'detrimental' effects
• When the RR is small, the RD is also small unless P(Y = 1 | X = 0) is big
  ⋆ i.e. unless the outcome is 'common'
• However, a small RR operating on a large population could correspond to a big 'public health' impact
  ⋆ this rationale is often cited in studies of air pollution
• To move beyond simple contrasts, we need a more general framework for modeling the relationship between the binary response and a vector of covariates

GLMs for binary data

• We've noted that the Bernoulli distribution is the only possible distribution for binary data
  ⋆ Y ~ Bernoulli(µ)

  f_Y(y; µ) = µ^y (1 − µ)^(1−y)

  f_Y(y; θ, φ) = exp{ yθ − log(1 + exp{θ}) }

  where

  θ = log( µ / (1 − µ) )
  a(φ) = 1
  b(θ) = log(1 + exp{θ})
  c(y, φ) = 0

• The log-likelihood is

  ℓ(β; y) = Σ_{i=1}^{n} [ y_i θ_i − b(θ_i) ]
          = Σ_{i=1}^{n} [ y_i θ_i − log(1 + exp{θ_i}) ]

  where θ_i is a function of β via

  g(µ_i) = X_i^T β    and    µ_i = exp{θ_i} / (1 + exp{θ_i})

• The score function for β_j is

  ∂ℓ(β; y)/∂β_j = Σ_{i=1}^{n} (∂µ_i/∂η_i) × X_{j,i} / [ µ_i (1 − µ_i) ] × (y_i − µ_i)

  where the expression for ∂µ_i/∂η_i depends on the choice of the link function g(·)

• Since the log-likelihood is a function of β only, the expected information matrix is given by the (p + 1) × (p + 1) matrix

  I_ββ = X^T W X

  where X is the design matrix for the model and W is a diagonal matrix with ith diagonal element

  W_i = (∂µ_i/∂η_i)² / [ µ_i (1 − µ_i) ]

Link functions

• In a GLM, the systematic component is given by

  g(µ_i) = η_i = X_i^T β

• We've noted previously that, for binary data, there are various options for the link function, including:

  linear:                  g(µ_i) = µ_i
  log:                     g(µ_i) = log(µ_i)
  logit:                   g(µ_i) = log{ µ_i / (1 − µ_i) }
  probit:                  g(µ_i) = probit(µ_i)
  complementary log-log:   g(µ_i) = log{ −log(1 − µ_i) }

Q: How do we make a choice from among these options?
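As an aside, the score and expected information given above are the ingredients of Fisher scoring, the iterative algorithm used to fit GLMs; for the canonical logit link, ∂µ_i/∂η_i = µ_i(1 − µ_i), so the score reduces to Σ_i X_{j,i}(y_i − µ_i) and W_i = µ_i(1 − µ_i). A minimal sketch (in Python rather than the R used in this course), fitting the behavior-pattern model to the 2×2 counts shown earlier:

```python
import math

# Fisher scoring (= Newton-Raphson for the canonical logit link) for the
# model logit(mu_i) = b0 + b1 * x_i, using the four covariate patterns
# from the WCGS behave-by-chd cross-tabulation:
# (x, y) -> count; x = behave (0 = Type B, 1 = Type A), y = chd
counts = {(0, 0): 1486, (0, 1): 79, (1, 0): 1411, (1, 1): 178}

beta = [0.0, 0.0]
for _ in range(25):
    score = [0.0, 0.0]                # U_j = sum_i X_{j,i} (y_i - mu_i)
    info = [[0.0, 0.0], [0.0, 0.0]]   # I = X^T W X, W_i = mu_i (1 - mu_i)
    for (x, y), n in counts.items():
        mu = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        w = mu * (1.0 - mu)
        X = (1.0, float(x))
        for j in range(2):
            score[j] += n * X[j] * (y - mu)
            for k in range(2):
                info[j][k] += n * X[j] * X[k] * w
    # one scoring step: beta <- beta + I^{-1} U (2x2 solve by hand)
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    beta = [beta[0] + (info[1][1] * score[0] - info[0][1] * score[1]) / det,
            beta[1] + (info[0][0] * score[1] - info[1][0] * score[0]) / det]

p_B = 1.0 / (1.0 + math.exp(-beta[0]))               # fitted risk, Type B
p_A = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1])))   # fitted risk, Type A
```

Because a single binary covariate saturates the model, the fitted risks reproduce the observed group proportions (about 0.050 and 0.112), and exp{β̂_1} is the sample odds ratio, about 2.37.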
A: We balance interpretability against mathematical properties
  ⋆ interpretability of the contrasts
  ⋆ mathematical properties, in terms of the fitted values being in the appropriate range

Linear (identity) link function

  µ_i = β_0 + β_1 X_i

• Interpret β_0 as the probability of response when X = 0
• Interpret β_1 as the change in the probability of response, comparing two populations whose values of X differ by 1 unit
• The contrast we are modeling is the risk difference (RD)
• As we've noted, a potential problem is that this specification of the model doesn't respect the fact that the (true) response probability is bounded

Log link function

  log(µ_i) = β_0 + β_1 X_i

• Interpret β_0 as the log of the probability of response when X = 0
  ⋆ exp{β_0} is the probability of response when X = 0
• Interpret β_1 as the change in the log of the probability of response, comparing two populations whose values of X differ by 1 unit
  ⋆ exp{β_1} is the ratio of the probability of response when X = 1 to that when X = 0
• The contrast we are modeling is the risk ratio (RR)

• As with the linear link, this choice of link function doesn't necessarily respect the fact that the (true) response probability is bounded
• We can see this explicitly by considering the inverse of the link function:

  µ_i = exp{ X_i^T β }

  which takes values on (0, ∞)

Logit link function

  logit(µ_i) = log{ µ_i / (1 − µ_i) } = X_i^T β

• The functional

  µ_i / (1 − µ_i) = P(Y_i = 1 | X_i) / P(Y_i = 0 | X_i)

  is the odds of response
• Interpret β_0 as the log of the odds of response when X = 0
  ⋆ exp{β_0} is the odds of response when X = 0
• Interpret β_1 as the change in the log of the odds of response, comparing two populations whose values of X differ by 1 unit
  ⋆ exp{β_1} is the ratio of the odds of response when X = 1 to that when X = 0
• The contrast we are modeling is the odds ratio (OR)
• Considering the inverse of the link function yields:

  µ_i = exp{ X_i^T β } / ( 1 + exp{ X_i^T β } )

  ⋆ referred to as the 'expit' function

• The expit function is the CDF of the standard logistic distribution
  ⋆ the distribution of a continuous random variable with support on (−∞, ∞)
  ⋆ whose pdf is given by

  f_X(x) = exp{−x} / ( 1 + exp{−x} )²

• The CDF (of any distribution) provides a mapping from the support of the random variable to the (0, 1) interval

  F_X(·) : (−∞, ∞) → (0, 1)

• We could use the inverse CDF of any distribution as a link function

  F_X⁻¹(·) : (0, 1) → (−∞, ∞)

  ⋆ g(·) ≡ F_X⁻¹(·) maps µ ∈ (0, 1) to (−∞, ∞)
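These mappings are easy to check numerically. A small Python sketch (illustrative only; the helper names `logit` and `expit` are ours, not from the course code):

```python
import math
from statistics import NormalDist

def logit(p):
    """The logit link: the log odds of p."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse logit; also the CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

# expit inverts the logit link: expit(logit(p)) recovers p on (0, 1)
for p in (0.050, 0.112, 0.5, 0.9):
    assert abs(expit(logit(p)) - p) < 1e-12

# the standard logistic pdf exp(-x) / (1 + exp(-x))^2 is the derivative
# of its CDF (the expit), checked here by a central difference
x, h = 0.7, 1e-6
pdf = math.exp(-x) / (1.0 + math.exp(-x)) ** 2
assert abs((expit(x + h) - expit(x - h)) / (2 * h) - pdf) < 1e-6

# any inverse CDF maps (0, 1) back to the whole real line, so it can
# serve as a link; the probit link is the standard normal's inverse CDF
probit = NormalDist().inv_cdf
assert abs(NormalDist().cdf(probit(0.112)) - 0.112) < 1e-9
```

The last assertion illustrates the closing point: the probit link is just one instance of using an inverse CDF, here the standard normal's, as the link g(·).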