Using R at the Bench: Step-By-Step Analytics for Biologists
This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book.
© 2015 Cold Spring Harbor Laboratory Press. All rights reserved.

Index

A-value, 160
Accuracy, 57
Adjusted R², 117, 123
Agglomerative clustering, 150
Alternative
  hypothesis, 79
  one-sided, 79
  two-sided, 79
Analysis of variance, 131
ANOVA, 113
ANOVA, 113, 131
  assumptions, 140
  F-test, 133
  model, 60
  one-way, 132
  two-way, 136
Association, 47
Assumption, 7, 67
  ANOVA, 140
  constant variance, 125
  normality, 125
Average, 20
Average linkage, 152

Bar plot, 24
Bayesian statistics, 145, 176
Bell curve, 33
Bias, 56
Bin, 26
Binomial
  coefficient, 32
  distribution, 31
Biological
  replication, 59
  variation, 58
BLAST, 110
Bootstrap, 64
Box plot, 28

Categorical, 60
  data, 19
  variable, 113
Causation, 48
Cause and effect, 48
Central Limit Theorem, 39, 66
Checking model assumptions, 124
Chi-square test, 85, 97
  goodness-of-fit, 98
  independence, 99
Classification, 145–147
  error, 147
  rule, 147
Clustering, 145, 148
  k-means, 153
  agglomerative, 150
  divisive, 150
  hierarchical, 150
  partitional, 150, 153
Common response, 48, 49
Complete linkage, 152
Confidence, 72
Confidence interval, 65, 113
  computing, 72
  interpretation, 67
  large sample mean, 72
  population proportion, 76
  small sample mean, 73
Confidence level, 67, 70, 72
Confounding, 48
Constant variance assumption, 125
Contingency table, 19, 60, 98
Continuous variable, 18
Correlation, 46
Correlation coefficient, 115, 116
Critical value, 70
  computing, 71
Cumulative probability, 32

Data
  categorical, 19, 24, 25
  quantitative, 19, 27, 28
  transformation, 37
Dependent variable, 51
Descriptive statistics, 17
Design of experiments, 51
Deterministic model, 51
Differential expression, 132
Discrete variable, 17
Discriminant function, 147
Discrimination, 147
Distance measure, 149
Distribution, 18, 31, 52
  binomial, 31
  center, 23
  Normal, 33
Divisive clustering, 150
Dot plot, 27
Dummy variable, 142
Dye-swap, 57, 162

E-value, 110
Effect, 58
Effect size, 61
Error
  bars, 44
  standard, 21
Estimate, 40
Euclidean distance, 149
Expected value, 20
Experimental design, 6
Explanatory variable, 51, 113
Exploratory statistics, 5

F-test, 85, 95, 122
  ANOVA, 133
Factor, 53
Factor effect, 137
False discovery rate, 169
Fisher’s exact test, 85, 91, 100, 105
Fitting a model, 54
Fold-change, 160
Frequency, 26
  relative, 25
Frequentist statistics, 145, 176

Global normalization, 161
Goodness-of-fit test, 85, 97, 98
Graphs, 24

Heteroscedastic t-test, 88
Hierarchical clustering, 150
Histogram, 26
Homoscedastic t-test, 87
Hypothesis
  alternative, 79
  null, 79
Hypothesis test, 79, 113
  assumptions, 86
  errors, 82
  five step procedure, 81
  power, 83

Independence test, 97, 99, 105
Independent variable, 51
Inference, 65
Inferential statistics, 5
Interaction effect, 137
Intercept, 55, 116
Interquartile range, 21, 24

K-means clustering, 153

Least squares regression, 115
Leverage, 127
Likelihood, 101
Likelihood ratio test, 101
Linkage, 152
  average, 152
  complete, 152
  single, 152
Loading, 157
Loess normalization, 161
Log-odds score, 102
Logistic regression, 60, 114, 120

M-value, 160
MA-plot, 160
Mahalanobis distance, 149
Matched sample, 56
Mathematical model, 51
Maximum likelihood, 145, 174, 175
Mean, 20, 23
Means plot, 137
Median, 20, 23
Microarray
  data analysis, 145, 158, 163
  dye-swap, 164
  experiment, 132
  normalization, 160
Model
  deterministic, 51
  mathematical, 51
  statistical, 51, 53
Model building, 123
Model assumptions, 7
Model parameter, 55, 113
Model selection, 59
Model utility test, 123
Modified box plot, 29
Monte Carlo, 109, 110
Multiple linear regression, 121
Multiple R-squared, 117
Multiple regression, 114
Multiple testing, 168
Multivariate model, 113

Next-generation sequencing (NGS), 169
  barcoding, 172
  base calling, 171
  bridge amplification, 170
  data analysis, 169
  experimental overview, 170
  flow cell, 170
  library size, 170
  sample preparation, 170
  sequencing depth, 171
  statistical issues, 172
NGS, data analysis, 145
Noise, 58
Non-parametric test, 103
Normal distribution, 33
  mean, 34
  parameters, 34
  percentile, 36
  shape, 33
  standard, 34
  standard deviation, 34
Normality
  assumption, 74, 125
  checking for, 37
Normalization, 160
  dye-swap, 162
  global, 161
  loess, 161
  print-tip, 162
Null hypothesis, 79

Observation, 18
Oligonucleotide array, 159
One-sample t-test, 85, 86
One-sample z-test, 85, 91
One-sided alternative, 79
One-way ANOVA, 132
  assumptions, 136
  model, 114
Ordinal variable, 17
Outlier, 21, 125, 131

p-value, 81, 83, 110
Paired t-test, 85, 88
Parameter, 52, 55
  population, 65
Partitional clustering, 150, 153
Pearson’s product moment correlation measure, 46
Percentile, 20, 36
Permutation test, 85, 88, 108
Pie chart, 25
Plot
  bar plot, 24, 30
  boxplot, 28, 30
  box plot, modified, 29
  dot plot, 27
  histogram, 26, 30
  pie chart, 25, 30
  PP-plot, 36
  QQ-plot, 36
  rankit, 37
  scatter plot, 27
  strip chart, 27
Population, 39, 53, 55, 65
  parameter, 65
  proportion, 91
Post-hoc test, 96
Posterior distribution, 177
Power, 83
Precision, 57, 71
Predictor variable, 51, 113
Principal component analysis, 145, 156
Print-tip normalization, 162
Prior distribution, 177
Probability plot, 36
Probability distribution, 31
Probe set, 159
Proportion, 91

QQ-plot, 86
Qualitative variable, 17
Quantile plot, 37
Quantitative, 60
  data, 19
  variable, 17, 113
Quantitative trait locus, 177
Quartile, 20

Random sample, 56
Random variable, 18, 30
Range, 21, 24
Rankit plot, 37
Regression, 113
  assumptions, 124
  case study, 127
  least squares, 115
  logistic, 60, 114, 120
  model building, 123
  multiple, 114, 121
  outliers, 125
  simple linear, 115
Regression model, 60
  checking assumptions, 119
Regression parameters
  hypothesis testing, 118
  interpretation, 117
Reject, 80
Relative frequency, 25
Replication, 62
  biological, 59
  technical, 59
Resampling, 62
Residual, 115
Residual standard error, 117
Response variable, 51, 113

Sample, 39, 53, 55, 65
  biased, 56
  matched, 56
  random, 56
  stratified, 56
Sample mean, 40, 41
Sample proportion, 39, 40, 91
Sample size, 58, 60, 72
  calculation, 77
Sample statistic, 43, 55
Sampling, 55
Scatter plot, 27
Scheffé test, 85, 96
Side-by-side box plot, 29
Significance, 83
Significance level, 61, 70, 81–83
Simple linear regression, 114, 115
Single linkage, 152
Slope, 55, 116
Sources of variation, 6
Spotted array, 159
Spread, 19, 24
Standard deviation, 20, 24, 43, 44
Standard error, 21, 43, 45, 70
Standard normal distribution, 34
Statistic, 55, 65
Statistical classification, 146
Statistical inference, 65, 113
Statistical model, 51, 53
Statistical significance, 83, 84
Statistical software, 8
Stratified sample, 56
Stratum, 56
Strip chart, 27
Summary statistics, 21
Symmetry, 26

t-test, 86
Table, contingency, 19
Tail probability, 34, 35
Technical replication, 59
Technical variation, 59
Test statistic, 6, 80, 81
Training data, 147
Transformation, 37
Tree diagram, 150, 154
Tukey’s test, 85, 96
Two-sample t-test, 85, 87
Two-sample z-test, 85, 93
Two-sided alternative, 79
Two-way ANOVA, 136
  model, 114
Type I error, 82
Type II error, 82

Uncertainty, 65
Univariate model, 113

Variability, 24, 61, 72
Variable, 17
  categorical, 97, 113
  continuous, 17
  dependent, 51
  discrete, 17
  explanatory, 51, 113
  factor, 17
  independent, 51
  ordinal, 17
  predictor, 51, 113
  qualitative, 17
  quantitative, 113
  random, 18, 30
  response, 51, 113
  selection, 124
  transformation, 129
Variance, 20, 24
Variation
  biological, 58
  sources of, 6
  technical, 59

Wilcoxon-Mann-Whitney test, 85, 88, 104

z-test, 91

Index of Worked Out Examples

ANOVA in microarray experiment, 132
ANOVA model for analysis of microarray data, 141

Bar plot, 24
Binomial distribution, 31
Biological and technical replication, 59

Categorical variables in regression, 142
Causation, common response, and confounding, 48
Central Limit Theorem, 42
Chi-squared test for goodness-of-fit, 99
Chi-squared test for independence, 99
Classification, 146
Computing normal percentiles, 36
Computing normal probabilities, 35
Confidence interval, 68
Confidence interval for a population proportion, 76
Confidence interval for Normal Data, 66

Likelihood, 101

Maximum likelihood estimation of a population proportion, 175
Maximum likelihood parameter estimation, 175
Microarray ANOVA enumeration, 165
Multiple linear regression, 127
Multiple testing, 168

One-sample z-test for a population proportion, 91
One- and two-sided alternatives in hypothesis testing, 79
One-way Anova, 132

Parameter estimation in dye-swap microarray experiment, 166
Permutation test, 108
Predictor and response variables, 51
Probability distribution, 30
Publication