Bivariate Data

Total Page:16

File Type:pdf, Size:1020Kb

Bivariate Data BIVARIATE DATA David M. Lane. et al. Introduction to Statistics : pp. 172194 ioc.pdf [email protected] ICY0006: Lecture 3 1 / 24 Descriptive statistics Descriptive statistics is quantitatively describing the main features of a collection of information. It provides simple summaries about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. It involves two kinds of analysis: Univariate analysis: describing the distribution of a single variable, including I central tendency (mean, median, and mode) I dispersion (range and quantiles of the data-set, measures of spread such as the variance and standard deviation) I shape of the distribution (skewness and kurtosis Bivariate analysis: more than one variable are involved and describing the relationship between pairs of variables. In this case, descriptive statistics include: I Cross-tabulations and contingency tables I Graphical representation via scatterplots I Quantitative measures of dependence I Descriptions of conditional distributions ioc.pdf [email protected] ICY0006: Lecture 3 2 / 24 Descriptive statistics Descriptive statistics is quantitatively describing the main features of a collection of information. It provides simple summaries about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. It involves two kinds of analysis: Univariate analysis: describing the distribution of a single variable, including I central tendency (mean, median, and mode) I dispersion (range and quantiles of the data-set, measures of spread such as the variance and standard deviation) I shape of the distribution (skewness and kurtosis Bivariate analysis: more than one variable are involved and describing the relationship between pairs of variables. In this case, descriptive statistics include: I Cross-tabulations and contingency tables I Graphical representation via scatterplots I Quantitative measures of dependence I Descriptions of conditional distributions ioc.pdf [email protected] ICY0006: Lecture 3 2 / 24 Descriptive statistics Descriptive statistics is quantitatively describing the main features of a collection of information. It provides simple summaries about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. It involves two kinds of analysis: Univariate analysis: describing the distribution of a single variable, including I central tendency (mean, median, and mode) I dispersion (range and quantiles of the data-set, measures of spread such as the variance and standard deviation) I shape of the distribution (skewness and kurtosis Bivariate analysis: more than one variable are involved and describing the relationship between pairs of variables. In this case, descriptive statistics include: I Cross-tabulations and contingency tables I Graphical representation via scatterplots I Quantitative measures of dependence I Descriptions of conditional distributions ioc.pdf [email protected] ICY0006: Lecture 3 2 / 24 Contents 1 Introduction to Bivariate Data 2 Pearson product-moment correlation 3 Variance Sum Law II 4 Computing r by R ioc.pdf [email protected] ICY0006: Lecture 3 3 / 24 Next section 1 Introduction to Bivariate Data 2 Pearson product-moment correlation 3 Variance Sum Law II 4 Computing r by R ioc.pdf [email protected] ICY0006: Lecture 3 4 / 24 Bivariate Data more than one variable Often, more than one variable is collected on each individual. In health studies the variables such as age, sex, height, weight, blood pressure, and total cholesterol are often measured on each individual. Economic studies may be interested in, among other things, personal income and years of education. Bivariate data consists of data on two variables Usually we are interested in the relationship between the variables ioc.pdf [email protected] ICY0006: Lecture 3 5 / 24 Bivariate Data more than one variable Often, more than one variable is collected on each individual. In health studies the variables such as age, sex, height, weight, blood pressure, and total cholesterol are often measured on each individual. Economic studies may be interested in, among other things, personal income and years of education. Bivariate data consists of data on two variables Usually we are interested in the relationship between the variables ioc.pdf [email protected] ICY0006: Lecture 3 5 / 24 Example: Do the people tend to marry other people of about the same age? Our experience tells us yes, but how good is the correspondence? One way to address the question is to look at pairs of ages for a sample of married couples (an excerpt from a dataset consisting of 282 pairs of spousal ages): We see that, yes, husbands and wives tend to be of about the same age, with men having a tendency to be slightly older than their wives. ioc.pdf [email protected] ICY0006: Lecture 3 6 / 24 Example: Histograms and means of spousal ages Each distribution is fairly skewed with a long right tail. Not all husbands are older than their wives; this fact is lost when we separate the variables. Also the pairing within couple is lost by separating the variables. For example, based on the means alone, we cannot say what percentage of couples has younger husbands than wives. Another example of information not available from the separate descriptions what is the average age of husbands with 45-year-old wives? Finally, we do not know the relationship between the husband's age and the wife's age.ioc.pdf [email protected] ICY0006: Lecture 3 7 / 24 Example: Histograms and means of spousal ages Each distribution is fairly skewed with a long right tail. Not all husbands are older than their wives; this fact is lost when we separate the variables. Also the pairing within couple is lost by separating the variables. For example, based on the means alone, we cannot say what percentage of couples has younger husbands than wives. Another example of information not available from the separate descriptions what is the average age of husbands with 45-year-old wives? Finally, we do not know the relationship between the husband's age and the wife's age.ioc.pdf [email protected] ICY0006: Lecture 3 7 / 24 Visualization of Bivariate Data A scatter plot displays the bivariate data in a graphical form that maintains the pairing. Scatter plots that show linear relationships between variables can dier in several ways including the slope of the line about which they cluster and how tightly the points cluster about the line. This is a scatter plot of the paired ages (all 282 pairs): ioc.pdf [email protected] ICY0006: Lecture 3 8 / 24 Scatter plot Two observations: 1 there is a strong relationship between the husband's age and the wife's age: the older the husband, the older the wife. I When one variable (Y ) increases with the second variable (X ), we say that X and Y have a positive association. I When Y decreases as X increases, we say that they have a negative association. 2 The points cluster along a straight line. When this occurs, the relationship is called a linear relationship. I There is a perfect linear relationship between two variables if a scatterplot of the points falls on a straight line. I The relationship is linear even if the points diverge from the line as long as the divergence is random rather than being systematic. ioc.pdf [email protected] ICY0006: Lecture 3 9 / 24 Linear and non-linear relationships Perfect negative relationship Non-linear relationship ioc.pdf [email protected] ICY0006: Lecture 3 10 / 24 Weak relationship and no relationship Weak positive relationship No relationship ioc.pdf [email protected] ICY0006: Lecture 3 11 / 24 Next section 1 Introduction to Bivariate Data 2 Pearson product-moment correlation 3 Variance Sum Law II 4 Computing r by R ioc.pdf [email protected] ICY0006: Lecture 3 12 / 24 Pearson's correlation coecient Pearson's correlation coecient is a statistical measure of the strength of a linear relationship between paired data. The symbol for Pearson's correlation is r when it is measured in the population andr when it is measured in a sample. (Further on, we are dealing with samples and will use r). Denition Let X = fx1;:::;xN g and Y = fy1;:::;yN g are two datasets (two samples) with means MX and MY and standard deviations sX and sY respectively, then the sample Pearson correlation coecient (or simply correlation coecient) is dened by the formula ∑(X − M )(Y − M ) r = X Y : sX sY Considering the formula of standard deviation, we obtain the formula for computing: ∑(X − M )(Y − M ) ∑XY − NM M r = X Y = X Y p 2 2 q 2 2 ∑(X − MX ) ∑(Y − MY ) 2 2 ∑X − NMX ∑Y − NMY ioc.pdf [email protected] ICY0006: Lecture 3 13 / 24 Computing Pearson's r Example ∑XY − NM M r = X Y q 2 2 2 2 ∑X − NMX ∑Y − NMY 210 − 5 · 4 · 9 210 − 180 30 r = = p = p = 0:9682458 p(96 − 5 · 42)(465 − 5 · 92) 16 · 60 960 ioc.pdf [email protected] ICY0006: Lecture 3 14 / 24 Correlation coecients ioc.pdf [email protected] ICY0006: Lecture 3 15 / 24 Properties of Pearson's r The Pearson correlation coecient is symmetric: r = cor(X ;Y ) = cor(Y ;X ). r is restricted as −1 6 r 6 1. The Pearson correlation coecient is invariant to separate changes in location and scale in the two variables. That is, we may transform X to a + bX and transform Y to c + dY , where a;b;c, and d are constants with b;d 6= 0, without changing the correlation coecient.
Recommended publications
  • Chapter 9 Descriptive Statistics for Bivariate Data
    9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to explore the distribution of the values of a single variable) in Chapters 2 and 3. In this chapter we will consider bivariate data description. That is, we will discuss descriptive methods used to explore the joint distribution of the pairs of values of a pair of variables. The joint distribution of a pair of variables is the way in which the pairs of possible values of these variables are distributed among the units in the group of interest. When we measure or observe pairs of values for a pair of variables, we want to know how the two variables behave together (the joint distribution of the two variables), as well as how each variable behaves individually (the marginal distributions of each variable). In this chapter we will restrict our attention to bivariate data description for two quan- titative variables. We will make a distinction between two types of variables. A response variable is a variable that measures the response of a unit to natural or experimental stimuli. A response variable provides us with the measurement or observation that quan- ti¯es a relevant characteristic of a unit. An explanatory variable is a variable that can be used to explain, in whole or in part, how a unit responds to natural or experimental stimuli. This terminology is clearest in the context of an experimental study. Consider an experiment where a unit is subjected to a treatment (a speci¯c combination of conditions) and the response of the unit to the treatment is recorded.
    [Show full text]
  • T-TEST Outline Hypothesis Testing Steps Bivariate Analysis
    Bivariate Analysis T-TEST Variable 1 2 LEVELS >2 LEVELS CONTINUOUS Variable 2 2 LEVELS X2 X2 t-test chi square test chi square test >2 LEVELS X2 X2 ANOVA chi square test chi square test (F-test) CONTINUOUS t-test ANOVA -Correlation (F-test) -Simple linear Regression Outline Comparison of means: t-test Hypothesis testing steps T-test is used when one variable is of a continuous nature and the other is dichotomous. T-test The t-test is used to compare the means of two groups on a given variable. Anova Examples: Difference in average blood pressure among males & females. Difference in average BMI among those who exercise and those who do not. Hypothesis testing steps Comparison of means: t-test Example 1: Identify the study objective Research question: Among university students, is there a State the null & alternative hypothesis difference between the average weight for males versus females? Select the proper test statistic Null hypothesis (Ho): μ weight males = μ weight females Calculate the test statistic Alternative hypothesis (Ha): μ weight males ≠ μ weight females Take a statistical decision based on the p-value. Statistical test: t-test Reject or accept the null hypothesis Comparison of means: t-test Comparison of means: t-test T-Test (SPSS output) If this p-value is < 0.05 then reject null hypothesis and conclude that the variances are different (accept alternative) Group Statistics and hence check this p-value for the t-test. Std. Error gender N Mean Std. Deviation Mean weight male 804 75.92 12.843 .453 If this p-value is > 0.05 then accept null hypothesis and female 1135 56.47 8.923 .265 conclude that the variances are equal and hence check this p-value for the t-test.
    [Show full text]
  • Chapter Five Multivariate Statistics
    Chapter Five Multivariate Statistics Jorge Luis Romeu IIT Research Institute Rome, NY 13440 April 23, 1999 Introduction In this chapter we treat the multivariate analysis problem, which occurs when there is more than one piece of information from each subject, and present and discuss several materials analysis real data sets. We first discuss several statistical procedures for the bivariate case: contingency tables, covariance, correlation and linear regression. They occur when both variables are either qualitative or quantitative: Then, we discuss the case when one variable is qualitative and the other quantitative, via the one way ANOVA. We then overview the general multivariate regression problem. Finally, the non-parametric case for comparison of several groups is discussed. We emphasize the assessment of all model assumptions, prior to model acceptance and use and we present some methods of detection and correction of several types of assumption violation problems. The Case of Bivariate Data Up to now, we have dealt with data sets where each observation consists of a single measurement (e.g. each observation consists of a tensile strength measurement). These are called univariate observations and the related statistical problem is known as univariate analysis. In many cases, however, each observation yields more than one piece of information (e.g. tensile strength, material thickness, surface damage). These are called multivariate observations and the statistical problem is now called multivariate analysis. Multivariate analysis is of great importance and can help us enhance our data analysis in several ways. For, coming from the same subject, multivariate measurements are often associated with each other. If we are able to model this association then we can take advantage of the situation to obtain one from the other.
    [Show full text]
  • Visualizing Data Descriptive Statistics I
    Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Anthony Tanbakuchi Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the author © 2009 (Compile date: Tue May 19 14:48:11 2009) Contents 1 Visualizing Data1 Dot plots ........ 7 1.1 Frequency distribu- Stem and leaf plots... 7 tions: univariate data.2 1.3 Visualizing univariate Standard frequency dis- qualitative data..... 8 tributions.... 3 Bar plot......... 8 Relative frequency dis- Pareto chart....... 9 tributions.... 4 Cumulative frequency 1.4 Visualizing bivariate distributions.. 5 quantitative data . 11 1.2 Visualizing univariate Scatter plots . 11 quantitative data.... 5 Time series plots . 12 Histograms....... 5 1.5 Summary . 13 1 Visualizing Data The Six Steps in Statistics 1. Information is needed about a parameter, a relationship, or a hypothesis needs to be tested. 2. A study or experiment is designed. 3. Data is collected. 4. The data is described and explored. 5. The parameter is estimated, the relationship is modeled, the hypothesis is tested. 6. Conclusions are made. 1 2 of 14 1.1 Frequency distributions: univariate data Descriptive statistics Descriptive statistics. Definition 1.1 summarize and describe collected data (sample). Contrast with inferential statistics which makes inferences and generaliza- tions about populations. Useful characteristics for summarizing data Example 1 (Heights of students in class). If we are interested in heights of students and we measure the heights of students in this class, how can we summarize the data in useful ways? What to look for in data shape are most of the data all around the same value, or are they spread out evenly between a range of values? How is the data distributed? center what is the average value of the data? variation how much do the values vary from the center? outliers are any values significantly different from the rest? Class height data Load the class data, then look at what variables are defined.
    [Show full text]
  • Package 'Biwavelet'
    Package ‘biwavelet’ May 26, 2021 Type Package Title Conduct Univariate and Bivariate Wavelet Analyses Version 0.20.21 Date 2021-05-24 Author Tarik C. Gouhier, Aslak Grinsted, Viliam Simko Maintainer Tarik C. Gouhier <[email protected]> Description This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gibert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) analyses. License GPL (>= 2) URL https://github.com/tgouhier/biwavelet BugReports https://github.com/tgouhier/biwavelet/issues LazyData yes LinkingTo Rcpp Imports fields, foreach, Rcpp (>= 0.12.2) Suggests testthat, knitr, rmarkdown, devtools RoxygenNote 7.1.1 NeedsCompilation yes Repository CRAN Date/Publication 2021-05-26 05:10:10 UTC R topics documented: biwavelet-package . .2 ar1.spectrum . .4 ar1_ma0_sim . .5 arrow ............................................6 arrow2 . .7 1 2 biwavelet-package check.data . .7 check.datum . .8 convolve2D . .9 convolve2D_typeopen . 10 enviro.data . 10 get_minroots . 11 MOTHERS . 12 phase.plot . 12 plot.biwavelet . 13 pwtc............................................. 17 rcpp_row_quantile . 19 rcpp_wt_bases_dog . 20 rcpp_wt_bases_morlet . 21 rcpp_wt_bases_paul . 22 smooth.wavelet . 23 wclust . 24 wdist . 25 wt.............................................. 26 wt.bases . 28 wt.bases.dog . 29 wt.bases.morlet . 29 wt.bases.paul . 30 wt.sig . 31 wtc.............................................. 32 wtc.sig . 34 wtc_sig_parallel . 36 xwt ............................................. 37 Index 40 biwavelet-package Conduct Univariate and Bivariate Wavelet Analyses Description This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gibert P. Compo. This package can be used to perform uni- variate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) wavelet analyses.
    [Show full text]
  • Meta4diag: Bayesian Bivariate Meta-Analysis of Diagnostic Test Studies for Routine Practice
    meta4diag: Bayesian Bivariate Meta-analysis of Diagnostic Test Studies for Routine Practice J. Guo and A. Riebler Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, PO 7491, Norway. July 8, 2016 Abstract This paper introduces the R package meta4diag for implementing Bayesian bivariate meta-analyses of diagnostic test studies. Our package meta4diag is a purpose-built front end of the R package INLA. While INLA offers full Bayesian inference for the large set of latent Gaussian models using integrated nested Laplace approximations, meta4diag extracts the features needed for bivariate meta-analysis and presents them in an intuitive way. It allows the user a straightforward model- specification and offers user-specific prior distributions. Further, the newly proposed penalised complexity prior framework is supported, which builds on prior intuitions about the behaviours of the variance and correlation parameters. Accurate posterior marginal distributions for sensitivity and specificity as well as all hyperparameters, and covariates are directly obtained without Markov chain Monte Carlo sampling. Further, univariate estimates of interest, such as odds ratios, as well as the SROC curve and other common graphics are directly available for interpretation. An in- teractive graphical user interface provides the user with the full functionality of the package without requiring any R programming. The package is available through CRAN https://cran.r-project.org/web/packages/meta4diag/ and its usage will be illustrated using three real data examples. arXiv:1512.06220v2 [stat.AP] 7 Jul 2016 1 1 Introduction A meta-analysis summarises the results from multiple studies with the purpose of finding a general trend across the studies.
    [Show full text]
  • Correlation and Regression Analysis
    OIC ACCREDITATION CERTIFICATION PROGRAMME FOR OFFICIAL STATISTICS Correlation and Regression Analysis TEXTBOOK ORGANISATION OF ISLAMIC COOPERATION STATISTICAL ECONOMIC AND SOCIAL RESEARCH AND TRAINING CENTRE FOR ISLAMIC COUNTRIES OIC ACCREDITATION CERTIFICATION PROGRAMME FOR OFFICIAL STATISTICS Correlation and Regression Analysis TEXTBOOK {{Dr. Mohamed Ahmed Zaid}} ORGANISATION OF ISLAMIC COOPERATION STATISTICAL ECONOMIC AND SOCIAL RESEARCH AND TRAINING CENTRE FOR ISLAMIC COUNTRIES © 2015 The Statistical, Economic and Social Research and Training Centre for Islamic Countries (SESRIC) Kudüs Cad. No: 9, Diplomatik Site, 06450 Oran, Ankara – Turkey Telephone +90 – 312 – 468 6172 Internet www.sesric.org E-mail [email protected] The material presented in this publication is copyrighted. The authors give the permission to view, copy download, and print the material presented that these materials are not going to be reused, on whatsoever condition, for commercial purposes. For permission to reproduce or reprint any part of this publication, please send a request with complete information to the Publication Department of SESRIC. All queries on rights and licenses should be addressed to the Statistics Department, SESRIC, at the aforementioned address. DISCLAIMER: Any views or opinions presented in this document are solely those of the author(s) and do not reflect the views of SESRIC. ISBN: xxx-xxx-xxxx-xx-x Cover design by Publication Department, SESRIC. For additional information, contact Statistics Department, SESRIC. i CONTENTS Acronyms
    [Show full text]
  • Multiple Linear Regression
    Multiple Linear Regression Mark Tranmer Mark Elliot 1 Contents Section 1: Introduction............................................................................................................... 3 Exam16 ...................................................................................................................................................... Exam11...................................................................................................................................... 4 Predicted values......................................................................................................................... 5 Residuals.................................................................................................................................... 6 Scatterplot of exam performance at 16 against exam performance at 11.................................. 6 1.3 Theory for multiple linear regression................................................................................... 7 Section 2: Worked Example using SPSS.................................................................................. 10 Section 3: Further topics.......................................................................................................... 36 Stepwise................................................................................................................................... 46 Section 4: BHPS assignment................................................................................................... 46 Reading list.............................................................................................................................
    [Show full text]
  • WHY MULTIVARIATE ANALYSIS? • Used for Analysing Complicated
    WHY MULTIVARIATE ANALYSIS? • Used for analysing complicated data sets • When there are many Independent Variables (IVs) and/or many Dependent Variables (DVs) • When IVs and DVs are correlated with one Dr. Azmi Mohd Tamil another to varying degrees Jabatan Kesihatan Masyarakat Fakulti Perubatan UKM • When need to come up with Prediction Model Based on lecture notes by Dr Azidah Hashim • Parallels greater complexity of contemporary research USING MULTIVARIATE ANALYSIS CHOICE OF APPROPRIATE STATISTICAL METHOD BASED ON: • WHICH STATISTICAL PROCEDURE TO USE? • Nature of IVs and DVs • HOW TO PERFORM CHOSEN PROCEDURE? • Investigator’s Experience • Personal Preferences • HOW TO INFER FROM RESULTS OBTAINED? • Ease of Comfort with Methods Used • Literature Review References • ANY OTHER ALTERNATIVE APPROACH? • Consultation with Statistician BASIC PREMISE I: BASIC PREMISE I: TERM USED:- Relationship between X Y X Y Independent Variable (IV) Dependent Variable (DV) Predictor Outcome Explanatory Response Eg . Smoking Lung Cancer Risk Factor Effect Age Hypertension Covariates (Continuous) Maternal ANC Birthweight Factor (Categorical) Control INDEPENDENT DEPENDENT VARIABLE VARIABLE Confounders RISK FACTOR OUTCOME Nuisance 1 BASIC PREMISE II: TERMINOLOGY: Univariate Analysis DATA • Analysis in which there is a single DV Bivariate Analysis Qualitative Quantitative • Analysis of two variables (categorical) • Wish to simply study the relationship between -Dichotomous -Continuous the variables - Polynomial -Discrete Multivariate Analysis • Simultaneously analyse multiple DVs and IVs Rough Guide to Multivariate BASIC PREMISE 111:VARIATIONS OF Methods (1) THE SAME THEME Name Xs Y THE GENERAL LINEAR MODEL: Regression and Continuous Continuous Correlation (eg. age) (eg. BP) Y = a + b1x1 + b2x2 + b3x3 …. + bixi Analysis of Variance Categorical Continuous Used in the following procedures: (ANOVA) (eg.
    [Show full text]
  • Rainbow Plots, Bagplots and Boxplots for Functional Data
    Rainbow plots, bagplots and boxplots for functional data Rob J Hyndman and Han Lin Shang Department of Econometrics and Business Statistics, Monash University, Clayton, Australia June 5, 2009 Abstract We propose new tools for visualizing large amounts of functional data in the form of smooth curves. The proposed tools include functional versions of the bagplot and boxplot, and make use of the first two robust principal component scores, Tukey's data depth and highest density regions. By-products of our graphical displays are outlier detection methods for functional data. We compare these new outlier detection methods with existing methods for detecting outliers in functional data, and show that our methods are better able to identify the outliers. An R-package containing computer code and data sets is available in the online supple- ments. Keywords: Highest density regions, Robust principal component analysis, Kernel density estima- tion, Outlier detection, Tukey's halfspace depth. 1 Introduction Functional data are becoming increasingly common in a wide range of fields, and there is a need to develop new statistical tools for analyzing such data. In this paper, we are interested in visualizing data comprising smooth curves (e.g., Ramsay & Silverman, 2005; Locantore et al., 1999). Such functional data may be age-specific mortality or fertility rates (Hyndman & Ullah, 2007), term- structured yield curves (Kargin & Onatski, 2008), spectrometry data (Reiss & Ogden, 2007), or one of the many applications described by Ramsay & Silverman (2002). Ramsay & Silverman (2005) and Ferraty & Vieu (2006) provide detailed surveys of the many parametric and nonparametric techniques for analyzing functional data. 1 Visualization methods help in the discovery of characteristics that might not have been ap- parent using mathematical models and summary statistics; and yet this area of research has not received much attention in the functional data analysis literature to date.
    [Show full text]
  • Geographic Information Systems Correlation Modeling As A
    Boise State University ScholarWorks Anthropology Graduate Projects and Theses Department of Anthropology 6-1-2011 Geographic Information Systems Correlation Modeling as a Management Tool in the Study Effects of Environmental Variables’ Effects on Cultural Resources Brian Wallace Boise State University GEOGRAPHIC INFORMATION SYSTEMS CORRELATION MODELING AS A MANAGEMENT TOOL IN THE STUDY EFFECTS OF ENVIRONMENTAL VARIABLES’ EFFECTS ON CULTURAL RESOURCES by Brian N. Wallace A Project Submitted in Partial Fulfillment Of the Requirements for the Degree of Master of Applied Anthropology Boise State University June 2011 BOISE STATE UNIVERSITY GRADUATE COLLEGE DEFENSE COMMITTEE APPROVAL of the project submitted by Brian N. Wallace The project presented by Brian N. Wallace entitled Geographic Information Systems Correlation Modeling as a Management Tool in the Study Effects of Environmental Variables’ Effects on Cultural Resources is hereby approved: ______________________________________________________ Mark G. Plew Date Advisor ______________________________________________________ John Ziker Date Committee Member ______________________________________________________ Margaret A. Streeter Date Committee Member ______________________________________________________ John R. (Jack) Pelton Date Dean of the Graduate College DEDICATION To my parents Jerry and Cheryl Wallace ii ACKNOWLEDGMENTS The author would like to express his appreciation to those individuals and entities who contributed to the completion of this thesis. Dr. Mark Plew is owed a great debt of gratitude for his invaluable advice, guidance, and friendship throughout my graduate experience. It is because of his helpful suggestions both for this paper and his mentorship over the last few years that have allowed me succeed beyond my own expectations. To the Cultural Resource and GIS staff of the Boise District of the Bureau of Land Management (BLM) whom allowed for access to the necessary data to run analyses successfully and for helpful guidance along the way.
    [Show full text]
  • Bivariate Analysis: the Statistical Analysis of the Relationship Between Two Variables
    bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ2) test for independence: A test of statistical significance used to assess the likelihood that an observed association between two variables could have occurred by chance. consistency checking: A data-cleaning procedure involving checking for unreasonable patterns of responses, such as a 12-year-old who voted in the last US presidential election. correlation coefficient: A statistical measure of the strength and direction of a linear relationship between two variables; it may vary from −1 to 0 to +1. data cleaning: The detection and correction of errors in a computer datafile that may have occurred during data collection, coding, and/or data entry. data matrix: The form of a computer datafile, with rows as cases and columns as variables; each cell represents the value of a particular variable (column) for a particular case (row). data processing: The preparation of data for analysis. descriptive statistics: Procedures for organizing and summarizing data. dummy variable: A variable or set of variable categories recoded to have values of 0 and 1. Dummy coding may be applied to nominal- or ordinal-scale variables for the purpose of regression or other numerical analysis. frequency distribution: A tabulation of the number of cases falling into each category of a variable. histogram: A graphic display in which the height of a vertical bar represents the frequency or percentage of cases in each category of an interval/ratio variable. imputation: A procedure for handling missing data in which missing values are assigned based on other information, such as the sample mean or known values of other variables.
    [Show full text]