Statistical Methods in Water Resources

Total Page:16

File Type:pdf, Size:1020Kb

Statistical Methods in Water Resources Statistical Methods in Water Resources Chapter 3 of Section A, Statistical Analysis Book 4, Hydrologic Analysis and Interpretation Discharge, in cubic meters per second 50 100 200 500 1,000 2,000 5,000 3 20 ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● 10 ● ● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● 5 ● ● ● ● ● ● ● 1 ● ● ● ● ● 2 ●●● ● 0 ● 1 ● ● ● ● 0.5 −1 ● ● ● 0.2 Concentration, in milligrams per liter in milligrams Concentration, ln(Concentration, in milligrams per liter) in milligrams ln(Concentration, −2 0.1 ● Normal quantiles −3 0.05 −3 −2 −1 0 1 2 3 ● 3 4 5 6 7 8 9 ● 1.0 ● ● ● ● ● ● ● ● ● ● ln(Discharge, in cubic meters per second) ● ● ● ● ● ● ● ● ● ● ● ● ● 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.4 ● ● ● ● ● ● Cumulative frequency Cumulative ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 0 100 200 300 400 Annual mean discharge, in cubic meters per second Techniques and Methods 4–A3 Supersedes USGS Techniques of Water-Resources Investigations, book 4, chapter A3 U.S. Department of the Interior U.S. Geological Survey Cover: Top Left: A loess smooth curve of dissolved nitrate plus nitrite concentration as a function of discharge, Iowa River, at Wapello, Iowa, water years 1990–2008 for the months of June, July, August, and September. Top Right: U.S. Geological Survey scientists Frank Engel (left) and Aaron Walsh (right) sample suspended sediment at the North Fork Toutle River near Kid Valley, Washington, upstream of a U.S. Army Corps of Engineers sediment retention structure. This gage is operated in cooperation with the U.S. Army Corps of Engineers. Photograph by Molly Wood, U.S. Geological Survey, March 7, 2018. Bottom Left: Jackson Lake Dam and USGS streamgage 13011000, Snake River near Moran, Wyoming, within Grand Teton National Park. Photograph by Kenneth M. Nolan, U.S. Geological Survey. Bottom Right: Overlay of James River annual mean discharge (1900–2015) and standard normal distribution quantile plot. Statistical Methods in Water Resources By Dennis R. Helsel, Robert M. Hirsch, Karen R. Ryberg, Stacey A. Archfield, and Edward J. Gilroy Chapter 3 of Section A, Statistical Analysis Book 4, Hydrologic Analysis and Interpretation Techniques and Methods 4–A3 Supersedes USGS Techniques of Water-Resources Investigations, book 4, chapter A3 U.S. Department of the Interior U.S. Geological Survey U.S. Department of the Interior DAVID BERNHARDT, Secretary U.S. Geological Survey James F. Reilly II, Director U.S. Geological Survey, Reston, Virginia: 2020 First release: 1992 by Elsevier, in print Revised: September 2002 by the USGS, online as Techniques of Water-Resources Investigations (TWRI), book 4, chapter A3, version 1.1 Revised: May 2020, by the USGS, online and in print, as Techniques and Methods, book 4, chapter A3 Supersedes USGS Techniques of Water-Resources Investigations (TWRI), book 4, chapter A3, version 1.1 For more information on the USGS—the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment—visit https://www.usgs.gov or call 1–888–ASK–USGS. For an overview of USGS information products, including maps, imagery, and publications, visit https://store.usgs.gov. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Although this information product, for the most part, is in the public domain, it also may contain copyrighted materials as noted in the text. Permission to reproduce copyrighted items must be secured from the copyright owner. Suggested citation: Helsel, D.R., Hirsch, R.M., Ryberg, K.R., Archfield, S.A., and Gilroy, E.J., 2020, Statistical methods in water resources: U.S. Geological Survey Techniques and Methods, book 4, chap. A3, 458 p., https://doi.org/10.3133/tm4a3. [Super- sedes USGS Techniques of Water-Resources Investigations, book 4, chap. A3, version 1.1.] Associated data for this publication: Helsel, D.R., Hirsch, R.M., Ryberg, K.R., Archfield, S.A., and Gilroy, E.J., 2020, Statistical methods in water resources—Supporting materials: U.S. Geological Survey data release, https://doi.org/10.5066/P9JWL6XR. ISSN 2328-7047 (print) ISSN 2328-7055 (online) iii Contents Chapter 1 Summarizing Univariate Data ..................................................................................................1 1.1 Characteristics of Water Resources Data .........................................................................................2 1.2 Measures of Central Tendency ............................................................................................................5 1.2.1 A Classical Measure of Central Tendency—The Arithmetic Mean...................................5 1.2.2 A Resistant Measure of Central Tendency—The Median ...................................................6 1.2.3 Other Measures of Central Tendency .....................................................................................7 1.3 Measures of Variability ..........................................................................................................................8 1.3.1 Classical Measures of Variability ............................................................................................8 1.3.2 Resistant Measures of Variability ............................................................................................9 1.3.3 The Coefficient of Variation—A Nondimensional Measure of Variability ......................10 1.4 Measures of Distribution Symmetry ..................................................................................................11 1.4.1 A Classical Measure of Symmetry—The Coefficient of Skewness .................................11 1.4.2 A Resistant Measure of Symmetry—The Quartile Skew ..................................................12 1.5 Other Resistant Measures of Symmetry ...........................................................................................12 1.6 Outliers ...................................................................................................................................................12 1.7 Transformations ....................................................................................................................................13 1.7.1 The Ladder of Powers ..............................................................................................................14 Chapter 2 Graphical Data Analysis .........................................................................................................17 2.1 Graphical Analysis of Single Datasets ..............................................................................................18 2.1.1 Histograms .................................................................................................................................18 2.1.2 Quantile Plots ............................................................................................................................20 2.1.3 Boxplots......................................................................................................................................22 2.1.4 Probability Plots ........................................................................................................................26 2.1.5 Q-Q plots as Exceedance Probability Plots ..........................................................................28 2.1.6 Deviations from a Linear Pattern on a Probability Plot ......................................................28 2.1.7 Probability Plots for Comparing Among Distributions ........................................................30 2.2 Graphical Comparisons of Two or More Datasets ..........................................................................30 2.2.1 Histograms .................................................................................................................................32 2.2.2 Dot-and-line Plots of Means and Standard Deviations .....................................................33 2.2.3 Side-by-side Boxplots ..............................................................................................................34 2.2.4 Q-Q Plots of Multiple Groups of Data ....................................................................................36 2.3 Scatterplots and Enhancements ........................................................................................................38 2.3.1 Evaluating Linearity ..................................................................................................................39 2.3.2 Evaluating Differences in Central Tendency on a Scatterplot ..........................................42 2.3.3 Evaluating Differences in Spread ..........................................................................................42 2.4 Graphs for Multivariate Data ..............................................................................................................45 2.4.1 Parallel Plots..............................................................................................................................46 2.4.2 Star Plots ....................................................................................................................................46 2.4.3 Trilinear and Piper Diagrams ..................................................................................................48 2.4.4 Scatterplot Matrix.....................................................................................................................50
Recommended publications
  • Birth Cohort Effects Among US-Born Adults Born in the 1980S: Foreshadowing Future Trends in US Obesity Prevalence
    International Journal of Obesity (2013) 37, 448–454 & 2013 Macmillan Publishers Limited All rights reserved 0307-0565/13 www.nature.com/ijo ORIGINAL ARTICLE Birth cohort effects among US-born adults born in the 1980s: foreshadowing future trends in US obesity prevalence WR Robinson1,2, KM Keyes3, RL Utz4, CL Martin1 and Y Yang2,5 BACKGROUND: Obesity prevalence stabilized in the US in the first decade of the 2000s. However, obesity prevalence may resume increasing if younger generations are more sensitive to the obesogenic environment than older generations. METHODS: We estimated cohort effects for obesity prevalence among young adults born in the 1980s. Using data collected from the National Health and Nutrition Examination Survey between 1971 and 2008, we calculated obesity for respondents aged between 2 and 74 years. We used the median polish approach to estimate smoothed age and period trends; residual non-linear deviations from age and period trends were regressed on cohort indicator variables to estimate birth cohort effects. RESULTS: After taking into account age effects and ubiquitous secular changes, cohorts born in the 1980s had increased propensity to obesity versus those born in the late 1960s. The cohort effects were 1.18 (95% CI: 1.01, 1.07) and 1.21 (95% CI: 1.02, 1.09) for the 1979–1983 and 1984–1988 birth cohorts, respectively. The effects were especially pronounced in Black males and females but appeared absent in White males. CONCLUSIONS: Our results indicate a generational divergence of obesity prevalence. Even if age-specific obesity prevalence stabilizes in those born before the 1980s, age-specific prevalence may continue to rise in the 1980s cohorts, culminating in record-high obesity prevalence as this generation enters its ages of peak obesity prevalence.
    [Show full text]
  • Fan Chart: Methodology and Its Application to Inflation Forecasting in India
    W P S (DEPR) : 5 / 2011 RBI WORKING PAPER SERIES Fan Chart: Methodology and its Application to Inflation Forecasting in India Nivedita Banerjee and Abhiman Das DEPARTMENT OF ECONOMIC AND POLICY RESEARCH MAY 2011 The Reserve Bank of India (RBI) introduced the RBI Working Papers series in May 2011. These papers present research in progress of the staff members of RBI and are disseminated to elicit comments and further debate. The views expressed in these papers are those of authors and not that of RBI. Comments and observations may please be forwarded to authors. Citation and use of such papers should take into account its provisional character. Copyright: Reserve Bank of India 2011 Fan Chart: Methodology and its Application to Inflation Forecasting in India Nivedita Banerjee 1 (Email) and Abhiman Das (Email) Abstract Ever since Bank of England first published its inflation forecasts in February 1996, the Fan Chart has been an integral part of inflation reports of central banks around the world. The Fan Chart is basically used to improve presentation: to focus attention on the whole of the forecast distribution, rather than on small changes to the central projection. However, forecast distribution is a technical concept originated from the statistical sampling distribution. In this paper, we have presented the technical details underlying the derivation of Fan Chart used in representing the uncertainty in inflation forecasts. The uncertainty occurs because of the inter-play of the macro-economic variables affecting inflation. The uncertainty in the macro-economic variables is based on their historical standard deviation of the forecast errors, but we also allow these to be subjectively adjusted.
    [Show full text]
  • Finding Basic Statistics Using Minitab 1. Put Your Data Values in One of the Columns of the Minitab Worksheet
    Finding Basic Statistics Using Minitab 1. Put your data values in one of the columns of the Minitab worksheet. 2. Add a variable name in the gray box just above the data values. 3. Click on “Stat”, then click on “Basic Statistics”, and then click on "Display Descriptive Statistics". 4. Choose the variable you want the basic statistics for click on “Select”. 5. Click on the “Statistics” box and then check the box next to each statistic you want to see, and uncheck the boxes next to those you do not what to see. 6. Click on “OK” in that window and click on “OK” in the next window. 7. The values of all the statistics you selected will appear in the Session window. Example (Navidi & Monk, Elementary Statistics, 2nd edition, #31 p. 143, 1st 4 columns): This data gives the number of tribal casinos in a sample of 16 states. 3 7 14 114 2 3 7 8 26 4 3 14 70 3 21 1 Open Minitab and enter the data under C1. The table below shows a portion of the entered data. ↓ C1 C2 Casinos 1 3 2 7 3 14 4 114 5 2 6 3 7 7 8 8 9 26 10 4 Now click on “Stat” and then choose “Basic Statistics” and “Display Descriptive Statistics”. Click in the box under “Variables:”, choose C1 from the window at left, and then click on the “Select” button. Next click on the “Statistics” button and choose which statistics you want to find. We will usually be interested in the following statistics: mean, standard error of the mean, standard deviation, minimum, maximum, range, first quartile, median, third quartile, interquartile range, mode, and skewness.
    [Show full text]
  • Continuous Random Variables
    ST 380 Probability and Statistics for the Physical Sciences Continuous Random Variables Recall: A continuous random variable X satisfies: 1 its range is the union of one or more real number intervals; 2 P(X = c) = 0 for every c in the range of X . Examples: The depth of a lake at a randomly chosen location. The pH of a random sample of effluent. The precipitation on a randomly chosen day is not a continuous random variable: its range is [0; 1), and P(X = c) = 0 for any c > 0, but P(X = 0) > 0. 1 / 12 Continuous Random Variables Probability Density Function ST 380 Probability and Statistics for the Physical Sciences Discretized Data Suppose that we measure the depth of the lake, but round the depth off to some unit. The rounded value Y is a discrete random variable; we can display its probability mass function as a bar graph, because each mass actually represents an interval of values of X . In R source("discretize.R") discretize(0.5) 2 / 12 Continuous Random Variables Probability Density Function ST 380 Probability and Statistics for the Physical Sciences As the rounding unit becomes smaller, the bar graph more accurately represents the continuous distribution: discretize(0.25) discretize(0.1) When the rounding unit is very small, the bar graph approximates a smooth function: discretize(0.01) plot(f, from = 1, to = 5) 3 / 12 Continuous Random Variables Probability Density Function ST 380 Probability and Statistics for the Physical Sciences The probability that X is between two values a and b, P(a ≤ X ≤ b), can be approximated by P(a ≤ Y ≤ b).
    [Show full text]
  • Binomial and Normal Distributions
    Probability distributions Introduction What is a probability? If I perform n experiments and a particular event occurs on r occasions, the “relative frequency” of r this event is simply . This is an experimental observation that gives us an estimate of the n probability – there will be some random variation above and below the actual probability. If I do a large number of experiments, the relative frequency gets closer to the probability. We define the probability as meaning the limit of the relative frequency as the number of experiments tends to infinity. Usually we can calculate the probability from theoretical considerations (number of beads in a bag, number of faces of a dice, tree diagram, any other kind of statistical model). What is a random variable? A random variable (r.v.) is the value that might be obtained from some kind of experiment or measurement process in which there is some random uncertainty. A discrete r.v. takes a finite number of possible values with distinct steps between them. A continuous r.v. takes an infinite number of values which vary smoothly. We talk not of the probability of getting a particular value but of the probability that a value lies between certain limits. Random variables are given names which start with a capital letter. An r.v. is a numerical value (eg. a head is not a value but the number of heads in 10 throws is an r.v.) Eg the score when throwing a dice (discrete); the air temperature at a random time and date (continuous); the age of a cat chosen at random.
    [Show full text]
  • On Median and Quartile Sets of Ordered Random Variables*
    ISSN 2590-9770 The Art of Discrete and Applied Mathematics 3 (2020) #P2.08 https://doi.org/10.26493/2590-9770.1326.9fd (Also available at http://adam-journal.eu) On median and quartile sets of ordered random variables* Iztok Banicˇ Faculty of Natural Sciences and Mathematics, University of Maribor, Koroskaˇ 160, SI-2000 Maribor, Slovenia, and Institute of Mathematics, Physics and Mechanics, Jadranska 19, SI-1000 Ljubljana, Slovenia, and Andrej Marusiˇ cˇ Institute, University of Primorska, Muzejski trg 2, SI-6000 Koper, Slovenia Janez Zerovnikˇ Faculty of Mechanical Engineering, University of Ljubljana, Askerˇ cevaˇ 6, SI-1000 Ljubljana, Slovenia, and Institute of Mathematics, Physics and Mechanics, Jadranska 19, SI-1000 Ljubljana, Slovenia Received 30 November 2018, accepted 13 August 2019, published online 21 August 2020 Abstract We give new results about the set of all medians, the set of all first quartiles and the set of all third quartiles of a finite dataset. We also give new and interesting results about rela- tionships between these sets. We also use these results to provide an elementary correctness proof of the Langford’s doubling method. Keywords: Statistics, probability, median, first quartile, third quartile, median set, first quartile set, third quartile set. Math. Subj. Class. (2020): 62-07, 60E05, 60-08, 60A05, 62A01 1 Introduction Quantiles play a fundamental role in statistics: they are the critical values used in hypoth- esis testing and interval estimation. Often they are the characteristics of distributions we usually wish to estimate. The use of quantiles as primary measure of performance has *This work was supported in part by the Slovenian Research Agency (grants J1-8155, N1-0071, P2-0248, and J1-1693.) E-mail addresses: [email protected] (Iztok Banic),ˇ [email protected] (Janez Zerovnik)ˇ cb This work is licensed under https://creativecommons.org/licenses/by/4.0/ 2 Art Discrete Appl.
    [Show full text]
  • The Upper Quartile Point: Below). the Interquartile Range (IQR)
    502 Median, quartiles, and percentiles are statistical Sample Application (§) ways to summarize the location and spread of experi- Suppose the mental data. They are a robust form of data reduc- EF0-EF1 tornadoes) are: , where hundreds or thousands of data are rep- tion 1.5, 0.8, 1.4, 1.8, 8.2, 1.0, 0.7, 0.5, 1.2 resented by several summary statistics. Find the median and interquartile range. Compare First, your data from the smallest to largest sort with the mean and standard deviation. values. This is easy to do on a computer. Each data point now has a rank associated with it, such as 1st Find the Answer: (smallest value), 2nd, 3rd, ... nth (largest value). Let x Given: data set listed above. = the value of the rth Find: q r = (1/2)·(n+1)] 0.5 Mean σ is called the median, and the data value x of this middle data point is the median value (q ). Namely, 0.5 First sort the data in ascending order: q = x for n = odd 0.5 Values ( ): 0.5, 0.7, 0.8, 1.0, 1.2, 1.4, 1.5, 1.8, 8.2 If n is an even number, there is no data point exactly in r): 1 2 3 4 5 6 7 8 9 the middle, so use the average of the 2 closest points: Middle: ^ q = 0.5· [x + x ] for n = even 0.5 +1 Thus, the median point is the 5th The median is a measure of the location or center data set, and corresponding value of that data point is of the data median = q = z = 1.2 km.
    [Show full text]
  • Median Polish with Covariate on Before and After Data
    IOSR Journal of Mathematics (IOSR-JM) e-ISSN: 2278-5728, p-ISSN: 2319-765X. Volume 12, Issue 3 Ver. IV (May. - Jun. 2016), PP 64-73 www.iosrjournals.org Median Polish with Covariate on Before and After Data Ajoge I., M.B Adam, Anwar F., and A.Y., Sadiq Abstract: The method of median polish with covariate is use for verifying the relationship between before and after treatment data. The relationship is base on the yield of grain crops for both before and after data in a classification of contingency table. The main effects such as grand, column and row effects were estimated using median polish algorithm. Logarithm transformation of data before applying median polish is done to obtain reasonable results as well as finding the relationship between before and after treatment data. The data to be transformed must be positive. The results of median polish with covariate were evaluated based on the computation and findings showed that the median polish with covariate indicated a relationship between before and after treatment data. Keywords: Before and after data, Robustness, Exploratory data analysis, Median polish with covariate, Logarithmic transformation I. Introduction This paper introduce the background of the problem by using median polish model for verifying the relationship between before and after treatment data in classification of contingency table. We first developed the algorithm of median polish, followed by sweeping of columns and rows medians. This computational procedure, as noted by Tukey (1977) operates iteratively by subtracting the median of each column from each observation in that column level, and then subtracting the median from each row from this updated table.
    [Show full text]
  • Skewness and Kurtosis 2
    A distribution is said to be 'skewed' when the mean and the median fall at different points in the distribution, and the balance (or centre of gravity) is shifted to one side or the other-to left or right. Measures of skewness tell us the direction and the extent of Skewness. In symmetrical distribution the mean, median and mode are identical. The more the mean moves away from the mode, the larger the asymmetry or skewness A frequency distribution is said to be symmetrical if the frequencies are equally distributed on both the sides of central value. A symmetrical distribution may be either bell – shaped or U shaped. 1- Bell – shaped or unimodel Symmetrical Distribution A symmetrical distribution is bell – shaped if the frequencies are first steadily rise and then steadily fall. There is only one mode and the values of mean, median and mode are equal. Mean = Median = Mode A frequency distribution is said to be skewed if the frequencies are not equally distributed on both the sides of the central value. A skewed distribution may be • Positively Skewed • Negatively Skewed • ‘L’ shaped positively skewed • ‘J’ shaped negatively skewed Positively skewed Mean ˃ Median ˃ Mode Negatively skewed Mean ˂ Median ˂ Mode ‘L’ Shaped Positively skewed Mean ˂ Mode Mean ˂ Median ‘J’ Shaped Negatively Skewed Mean ˃ Mode Mean ˃ Median In order to ascertain whether a distribution is skewed or not the following tests may be applied. Skewness is present if: The values of mean, median and mode do not coincide. When the data are plotted on a graph they do not give the normal bell shaped form i.e.
    [Show full text]
  • BIOINFORMATICS ORIGINAL PAPER Doi:10.1093/Bioinformatics/Btm145
    Vol. 23 no. 13 2007, pages 1648–1657 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm145 Data and text mining An efficient method for the detection and elimination of systematic error in high-throughput screening Vladimir Makarenkov1,*, Pablo Zentilli1, Dmytro Kevorkov1, Andrei Gagarin1, Nathalie Malo2,3 and Robert Nadon2,4 1Department d’informatique, Universite´ du Que´ bec a` Montreal, C.P.8888, s. Centre Ville, Montreal, QC, Canada, H3C 3P8, 2McGill University and Genome Quebec Innovation Centre, 740 Dr. Penfield Ave., Montreal, QC, Canada, H3A 1A4, 3Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, 1020 Pine Av. West, Montreal, QC, Canada, H3A 1A4 and 4Department of Human Genetics, McGill University, 1205 Dr. Penfield Ave., N5/13, Montreal, QC, Canada, H3A 1B1 Received on December 7, 2006; revised on February 22, 2007; accepted on April 10, 2007 Advance Access publication April 26, 2007 Associate Editor: Jonathan Wren ABSTRACT experimental artefacts that might confound with important Motivation: High-throughput screening (HTS) is an early-stage biological or chemical effects. The description of several process in drug discovery which allows thousands of chemical methods for quality control and correction of HTS data can compounds to be tested in a single study. We report a method for be found in Brideau et al. (2003), Gagarin et al. (2006a), Gunter correcting HTS data prior to the hit selection process (i.e. selection et al. (2003), Heuer et al. (2003), Heyse (2002), Kevorkov and of active compounds). The proposed correction minimizes the Makarenkov (2005), Makarenkov et al. (2006), Malo et al. impact of systematic errors which may affect the hit selection in (2006) and Zhang et al.
    [Show full text]
  • Exploratory Data Analysis
    Copyright © 2011 SAGE Publications. Not for sale, reproduction, or distribution. 530 Data Analysis, Exploratory The ultimate quantitative extreme in textual data Klingemann, H.-D., Volkens, A., Bara, J., Budge, I., & analysis uses scaling procedures borrowed from McDonald, M. (2006). Mapping policy preferences II: item response theory methods developed originally Estimates for parties, electors, and governments in in psychometrics. Both Jon Slapin and Sven-Oliver Eastern Europe, European Union and OECD Proksch’s Poisson scaling model and Burt Monroe 1990–2003. Oxford, UK: Oxford University Press. and Ko Maeda’s similar scaling method assume Laver, M., Benoit, K., & Garry, J. (2003). Extracting that word frequencies are generated by a probabi- policy positions from political texts using words as listic function driven by the author’s position on data. American Political Science Review, 97(2), some latent scale of interest and can be used to 311–331. estimate those latent positions relative to the posi- Leites, N., Bernaut, E., & Garthoff, R. L. (1951). Politburo images of Stalin. World Politics, 3, 317–339. tions of other texts. Such methods may be applied Monroe, B., & Maeda, K. (2004). Talk’s cheap: Text- to word frequency matrixes constructed from texts based estimation of rhetorical ideal-points (Working with no human decision making of any kind. The Paper). Lansing: Michigan State University. disadvantage is that while the scaled estimates Slapin, J. B., & Proksch, S.-O. (2008). A scaling model resulting from the procedure represent relative dif- for estimating time-series party positions from texts. ferences between texts, they must be interpreted if a American Journal of Political Science, 52(3), 705–722.
    [Show full text]
  • Expression Summarization Interrogate for Each Gene, Called a Probe Set
    Expression Quantification: Affy Affymetrix Genechip is an oligonucleotide array consisting of a several perfect match (PM) and their corresponding mismatch (MM) probes that interrogate for a single gene. · PM is the exact complementary sequence of the target genetic sequence, composed of 25 base pairs · MM probe, which has the same sequence with exception that the middle base (13th) position has been reversed · There are roughly 11­20 PM/MM probe pairs that Expression summarization interrogate for each gene, called a probe set Mikhail Dozmorov Fall 2016 2/36 Affymetrix Expression Array Preprocessing Affymetrix Expression Array Preprocessing Background adjustment Normalization Remove intensity contributions from optical noise and cross­ Remove array effect, make array comparable hybridization 1. Constant or linear (MAS) · so the true measurements aren't affected by neighboring measurements 2. Rank invariant (dChip) 3. Quantile (RMA) 1. PM­MM 2. PM only 3. RMA 4. GC­RMA 3/36 4/36 Affymetrix Expression Array Preprocessing Expression Index estimates Summarization Summarization Combine probe intensities into one measure per gene · Reduce the 11­20 probe intensities on each array to a single number for gene expression. 1. MAS 4.0, MAS 5.0 · The goal is to produce a measure that will serve as an 2. Li­Wong (dChip) indicator of the level of expression of a transcript using 3. RMA the PM (and possibly MM values). 5/36 6/36 Expression Index estimates Expression Index estimates Single Chip Multiple Chip · MAS 4.0 (avgDiff): no longer recommended for use due · MBEI (Li­Wong): a multiplicative model (Model based to many flaws.
    [Show full text]