This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book.

Index

A-value, 160 Checking model assumptions, 124 Accuracy, 57 Chi-square test, 85, 97 Adjusted R2, 117, 123 goodness-of-fit, 98 Agglomerative clustering, 150 independence, 99 Alternative Classification, 145–147 hypothesis, 79 error, 147 one-sided, 79 rule, 147 two-sided, 79 Clustering, 145, 148 Analysis of , 131 k-means, 153 ANOVA, 113 agglomerative, 150 ANOVA, 113, 131 divisive, 150 assumptions, 140 hierarchical, 150 F-test, 133 partitional, 150, 153 model, 60 Common response, 48, 49 one-way, 132 Complete linkage, 152 two-way, 136 Confidence, 72 Association, 47 Confidence interval, 65, 113 Assumption, 7, 67 computing, 72 ANOVA, 140 interpretation, 67 constant variance, 125 large sample mean, 72 normality, 125 population proportion, 76 Average, 20 small sample mean, 73 Average linkage, 152 Confidence level, 67, 70, 72 Confounding, 48 Bar plot, 24 Constant variance assumption, 125 Bayesian , 145, 176 Contingency table, 19, 60, 98 Bell curve, 33 Continuous variable, 18 Bias, 56 Correlation, 46 Bin, 26 Correlation coefficient, 115, 116 Binomial Critical value, 70 coefficient, 32 computing, 71 distribution, 31 Cumulative probability, 32 Biological replication, 59 Data variation, 58 categorical, 19, 24, 25 BLAST, 110 quantitative, 19, 27, 28 Bootstrap, 64 transformation, 37 Box plot, 28 Dependent variable, 51 Descriptive statistics, 17 Categorical, 60 Design of experiments, 51 data, 19 Deterministic model, 51 variable, 113 Differential expression, 132 Causation, 48 Discrete variable, 17 Cause and effect, 48 Discriminant function, 147 Central Limit Theorem, 39, 66 Discrimination, 147

181

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved. This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book. 182 INDEX

Distance measure, 149 errors, 82 Distribution, 18, 31, 52 five step procedure, 81 binomial, 31 power, 83 center, 23 Normal, 33 Independence test, 97, 99, 105 Divisive clustering, 150 Independent variable, 51 Dot plot, 27 Inference, 65 Dummy variable, 142 Inferential statistics, 5 Dye-swap, 57, 162 Interaction effect, 137 Intercept, 55, 116 E-value, 110 Interquartile range, 21, 24 Effect, 58 Effect size, 61 K-means clustering, 153 Error bars, 44 Least squares regression, 115 standard, 21 Leverage, 127 Estimate, 40 Likelihood, 101 Euclidean distance, 149 Likelihood ratio test, 101 Expected value, 20 Linkage, 152 Experimental design, 6 average, 152 Explanatory variable, 51, 113 complete, 152 Exploratory statistics, 5 single, 152 Loading, 157 F-test, 85, 95, 122 Loess normalization, 161 ANOVA, 133 Log-odds score, 102 Factor, 53 Logistic regression, 60, 114, 120 Factor effect, 137 M False discovery rate, 169 -value, 160 Fisher’s exact test, 85, 91, 100, 105 MA-plot, 160 Fitting a model, 54 Mahalanobis distance, 149 Fold-change, 160 Matched sample, 56 Frequency, 26 Mathematical model, 51 relative, 25 Maximum likelihood, 145, 174, 175 Frequentist statistics, 145, 176 Mean, 20, 23 Means plot, 137 Global normalization, 161 Median, 20, 23 Goodness-of-fit test, 85, 97, 98 Microarray Graphs, 24 data analysis, 145, 158, 163 dye-swap, 164 Heteroscedastic t-test, 88 experiment, 132 Hierarchical clustering, 150 normalization, 160 Histogram, 26 Model Homoscedastic t-test, 87 deterministic, 51 Hypothesis mathematical, 51 alternative, 79 statistical, 51, 53 null, 79 Model building, 123 Hypothesis test, 79, 113 Model assumptions, 7 assumptions, 86 Model parameter, 55, 113

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved. This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book. Index 183

Model selection, 59 One-way ANOVA, 132 Model utility test, 123 assumptions, 136 Modified box plot, 29 model, 114 Monte Carlo, 109, 110 Ordinal variable, 17 Multiple linear regression, 121 Outlier, 21, 125, 131 Multiple R-squared, 117 Multiple regression, 114 p-value, 81, 83, 110 Multiple testing, 168 Paired t-test, 85, 88 Multivariate model, 113 Parameter, 52, 55 population, 65 Next-generation sequencing Partitional clustering, 150, 153 (NGS), 169 Pearson’s product moment correlation barcoding, 172 measure, 46 base calling, 171 Percentile, 20, 36 bridge amplification, 170 Permutation test, 85, 88, 108 data analysis, 169 Pie chart, 25 experimental overview, 170 Plot flow cell, 170 bar plot, 24, 30 library size, 170 boxplot,28,30 sample preparation, 170 box plot, modified, 29 sequencing depth, 171 dot plot, 27 statistical issues, 172 histogram, 26, 30 NGS, data analysis, 145 pie chart, 25, 30 Noise, 58 PP-plot, 36 Non-parametric test, 103 QQ-plot, 36 , 33 rankit, 37 mean, 34 , 27 parameters, 34 strip chart, 27 percentile, 36 Population, 39, 53, 55, 65 shape, 33 parameter, 65 standard, 34 proportion, 91 standard deviation, 34 Post-hoc test, 96 Normality Posterior distribution, 177 assumption, 74, 125 Power, 83 checking for, 37 Precision, 57, 71 Normalization, 160 Predictor variable, 51, 113 dye-swap, 162 Principal component analysis, 145, 156 global, 161 Print-tip normalization, 162 loess, 161 Prior distribution, 177 print-tip, 162 Probability plot, 36 Null hypothesis, 79 , 31 Probe set, 159 Observation, 18 Proportion, 91 Oligonucleotide array, 159 One-sample t-test, 85, 86 QQ-plot, 86 One-sample z-test, 85, 91 Qualitative variable, 17 One-sided alternative, 79 Quantile plot, 37

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved. This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book. 184 INDEX

Quantitative, 60 Side-by-side box plot, 29 data, 19 Significance, 83 variable, 17, 113 Significance level, 61, 70, 81–83 Quantitative trait locus, 177 Simple linear regression, 114, 115 Quartile, 20 Single linkage, 152 Slope, 55, 116 Random sample, 56 Sources of variation, 6 Random variable, 18, 30 Spotted array, 159 Range, 21, 24 Spread, 19, 24 Rankit plot, 37 Standard deviation, 20, 24, Regression, 113 43, 44 assumptions, 124 Standard error, 21, 43, 45, 70 case study, 127 Standard normal distribution, 34 least squares, 115 Statistic, 55, 65 logistic, 60, 114, 120 Statistical classification, 146 model building, 123 Statistical inference, 65, 113 multiple, 114, 121 Statistical model, 51, 53 outliers, 125 Statistical significance, 83, 84 simple linear, 115 Statistical software, 8 Regression model, 60 Stratified sample, 56 checking assumptions, 119 Stratum, 56 Regression parameters Strip chart, 27 hypothesis testing, 118 Summary statistics, 21 interpretation, 117 Symmetry, 26 Reject, 80 Relative frequency, 25 t-test, 86 Replication, 62 Table, contingency, 19 biological, 59 Tail probability, 34, 35 technical, 59 Technical replication, 59 Resampling, 62 Technical variation, 59 Residual, 115 Test statistic, 6, 80, 81 Residual standard error, 117 Training data, 147 Response variable, 51, 113 Transformation, 37 Tree diagram, 150, 154 Sample, 39, 53, 55, 65 Tukey’s test, 85, 96 biased, 56 Two-sample t-test, 85, 87 matched, 56 Two-sample z-test, 85, 93 random, 56 Two-sided alternative, 79 stratified, 56 Two-way ANOVA, 136 Sample mean, 40, 41 model, 114 Sample proportion, 39, 40, 91 Type I error, 82 Sample size, 58, 60, 72 Type II error, 82 calculation, 77 Sample statistic, 43, 55 Uncertainty, 65 Sampling, 55 Univariate model, 113 Scatter plot, 27 Scheff´e test, 85, 96 Variability, 24, 61, 72

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved. This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book. Index 185

Variable, 17 response, 51, 113 categorical, 97, 113 selection, 124 continuous, 17 transformation, 129 dependent, 51 Variance, 20, 24 discrete, 17 Variation explanatory, 51, 113 biological, 58 factor, 17 sources of, 6 independent, 51 technical, 59 ordinal, 17 predictor, 51, 113 Wilcoxon-Mann-Whitney test, 85, 88, qualitative, 17 104 quantitative, 113 random, 18, 30 z-test, 91

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved. This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book.

Index of Worked Out Examples

ANOVA in microarray experiment, 132 Likelihood, 101 ANOVA model for analysis of microar- ray data, 141 Maximum likelihood estimation of a population proportion, 175 Bar plot, 24 Maximum likelihood parameter estima- Binomial distribution, 31 tion, 175 Biological and technical replication, 59 Microarray ANOVA enumeration, 165 Multiple linear regression, 127 Categorical variables in regression, 142 Multiple testing, 168 Causation, common response, and con- founding, 48 One-sample z-test for a population pro- Central Limit Theorem, 42 portion, 91 Chi-squared test for goodness-of-fit, 99 One- and two-sided alternatives in hy- Chi-squared test for independence, 99 pothesis testing, 79 Classification, 146 One-way Anova, 132 Computing normal percentiles, 36 Computing normal probabilities, 35 Parameter estimation in dye-swap mi- Confidence interval, 68 croarray experiment, 166 Confidence interval for a population Permutation test, 108 proportion, 76 Predictor and response variables, 51 Confidence interval for Normal Data, Probability distribution, 30 66 Publication bias, 57 Contingency table, 19, 98 Correlation and causation, 47 Random variable vs. observation, 18 Determining outliers, 21 Resampling, 62 Differential gene expression, 80 Dissimilarity measure, 149 Sample proportion, 39 Dye bias in microarray experiments, 57 Sample size calculation, 77 Sampling bias, 56 F -test, 95 Side-by-side box plot, 29 Fisher’s exact test, 107 Simple linear regression, 117 Frequentist and Bayesian QTL analy- Small sample confidence interval for a sis, 177 mean, 73 Standard deviation vs. standard error, Hierarchical clustering, 150 43 Histogram, 26 Two-sample z-test, 93 Independent sample t-test, 88 Two-Way ANOVA, 136 K-means clustering, 156 Variable types, 17 Large sample confidence interval for mean, 72 Wilcoxon-Mann-Whitney test, 105

186

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved. This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book.

Index of R Commander Commands

Anova Forward model selection, 124 one-way, 133 two-way, 137 Goodness-of-fit test, 98 aov, 133 hclust, 152 Backward model selection, 124 Help, 16 Bar plot, 26 Hierarchical clustering, 152 Binomial Histogram, 26 coefficient, 32 probability, 32 Importing data, 12 tail probability, 32 Interquartile range, 21 Box plot, 28 k-means clustering, 155 cex,14 KMeans, 155 cex.lab,14 cex.main,14 Likelihood ratio test, 102 Chi-squared lm, 116 p-value, 102 Logistic regression, 121 goodness-of-fit test, 98 test for independence, 100 main,14 chisq.test,98 Maximum, 21 choose,32 Mean, 21 col,14 Means plot, 137 Computing critical values for normal Median, 21 and t-distributions, 70 Minimum, 21 Confidence interval for a population Missing data identifier, 13 mean, 75 Model selection, 124 Confidence interval for a population Multiple linear regression, 122 proportion, 77 Contingency table, 98 Normal Converting numeric variables to fac- percentiles, 36 tors, 13 probabilities, 35 cor,46 quantiles, 36 Correlation coefficient, 46 One-sample t-test, 87 dbinom,33 One-way Anova, 133 Diagnostic plots, 125, 127 Output save, 15 Entering data, 12 plot,14 F -test, 96 Paired sample t-test, 89 Fisher’s exact test, 107 pbinom,33 fisher.test, 107 pch,14

187

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved. This is a free sample of content from Using R at the Bench: Step-by-Step Analytics for Biologists. Click here for more information or to buy the book. INDEX OF R COMMANDER COMMANDS 188

pchisq, 102 Scatter plot, 28 Percentile, 21 Script Pie chart, 26 save, 15 Plot Simple linear regression, 116 bar plot, 26 Standard deviation, 21 box plot, 28 Strip chart, 27 dot plot, 27 Summary statistics, 21 histogram, 26 pie chart, 26 t.test, 87–89 scatter plot, 28 Transformation, 129 strip chart, 27 Tukey’s post hoc test, 97 pnorm,92 Two independent sample t-test, 88 Principal component analysis, 157 Two-way Anova, 137 princomp, 157 pt,90 Variable convert, 13 qnorm,70 Variable selection, 124 QQ-plot construction, 37 Variable transformation, 129 qt,70 Variance, 21 Range, 21 wilcox.test Regression parameters, 117 ,104 Residual check for linear model, 120 Wilcoxon-Mann-Whitney test, 104 Residual plot, 125, 127 xlab,14 Save output, 15 Save R script, 15 ylab,14

© 2015 Cold Spring Harbor Laboratory Press. All rights reserved.