Glossary
Understandable Statistics, 8th edition, by Brase and Brase
Houghton Mifflin Company Boston New York
A
Addition rule rule to compute the probability that on a single trial, the event (A or B)
will occur. See Section 4.2.
Alpha (α) represents the probability of a type I error in a statistical test. See also level
of significance. See Section 9.1.
Alternate hypothesis (H1) a statistical hypothesis that is constructed in such a way that
it is the hypothesis to be accepted when the null hypothesis is rejected. See also null
hypothesis. See Section 9.1.
Analysis of variance statistical analysis of population variances used to test hypotheses
concerning means of populations. See Section 11.5.
ANOVA See Analysis of variance.
Arithmetic mean μ (population), x̄ (sample) sum of a collection of data values divided by
the number of values. Often referred to as simply the mean. See Section 3.1.
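As a concrete illustration of this definition, a minimal Python sketch (the data values below are made up):

```python
# Sample mean: sum of the data values divided by the number of values.
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

x_bar = sum(data) / len(data)
print(x_bar)  # 5.0
```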
Average any of several statistical measures used to designate the center of a collection
of data. See Section 3.1.
B
Bar graph a statistical graph that uses either vertical or horizontal bars representing
outcomes or frequencies of a data set. See Section 2.1.
Beta (β)
used in hypothesis testing to represent the probability of a type II error. See Section
9.1.
used in linear regression to represent the population slope of the least-squares line.
See Section 10.4.
Bimodal distribution a distribution having two modes. See Section 2.2.
Binomial experiment a statistical experiment with a fixed number of independent trials
n in which each trial has only two outcomes and the probability of success p remains
the same for each trial. See Section 5.2.
Binomial distribution the set of all possible outcomes of a binomial experiment and
corresponding probabilities associated with these outcomes. See Section 5.2.
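The binomial probability formula behind this definition can be sketched in a few lines of Python (the example trial counts and probability are made up):

```python
from math import comb

# P(X = k) for a binomial experiment with n independent trials and
# success probability p on each trial: C(n, k) * p**k * (1-p)**(n-k).
def binomial_pmf(n: int, k: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of exactly 3 successes in 5 trials with p = 0.5.
prob = binomial_pmf(5, 3, 0.5)
print(prob)  # 0.3125
```

The probabilities over all possible outcomes k = 0, …, n sum to 1, as a probability distribution must.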
Block used in analysis of variance, a group or collection of similar individuals. See
Section 11.6.
Box-and-whisker plot graphical representation of the spread of a data set showing
quartiles. See Section 3.4.
Boxplot See box-and-whisker plot.
C
Categorical data data that is separated into categories which are identifiable by non-
numeric properties. See Section 1.1.
Census data collected from every member in an entire population. See Section 1.3.
Central limit theorem a theorem describing the distribution of sample means. As the
sample size n increases, the distribution of sample means becomes normally
distributed with mean μ and standard deviation σ/√n. See Section 7.2.
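A small simulation makes the σ/√n claim tangible; here the population is hypothetical (uniform on [0, 1], for which σ = √(1/12)):

```python
import random
from statistics import mean, pstdev

# Central limit theorem sketch: the standard deviation of sample means
# is approximately sigma / sqrt(n). Population here is uniform on [0, 1].
random.seed(42)
n, reps = 30, 10_000
sample_means = [mean(random.random() for _ in range(n)) for _ in range(reps)]

predicted = (1 / 12) ** 0.5 / n ** 0.5   # sigma / sqrt(n)
observed = pstdev(sample_means)
print(round(predicted, 3), round(observed, 3))  # the two values are close
```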
Chebyshev’s theorem a theorem that uses the standard deviation and mean to give
information about the distribution of a collection of data. See Section 3.2.
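The theorem guarantees that at least 1 − 1/k² of any data set lies within k standard deviations of the mean (k > 1); a quick check on made-up data:

```python
from statistics import mean, pstdev

# Chebyshev's theorem: at least 1 - 1/k**2 of the values lie within
# k standard deviations of the mean, for any distribution (k > 1).
data = [2, 3, 5, 5, 6, 7, 8, 9, 11, 14]  # hypothetical data
m, s = mean(data), pstdev(data)

k = 2
within = [x for x in data if abs(x - m) <= k * s]
fraction = len(within) / len(data)
bound = 1 - 1 / k**2  # Chebyshev guarantees fraction >= bound (0.75 here)
print(fraction >= bound)  # True
```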
Chi-square distribution probability distribution of a continuous random variable first
introduced in the overview of Chapter 11.
Circle graph See pie chart.
Class boundaries upper and lower values of a class for a grouped frequency
distribution adjusted so there are no gaps between consecutive classes. See Section
2.2.
Class midpoint used in a frequency table, the value midway between the lower class
limit and the upper class limit. See Section 2.2.
Class width difference between the upper class boundary and the lower class boundary.
See Section 2.2.
Cluster sampling a sampling method using preexisting or already determined sectors,
clusters, or natural groupings. See Section 1.2.
Coefficient of determination r² used in linear regression to describe the variation in y
that can be explained using the variation in x and the least-squares regression model.
See Section 10.3.
Coefficient of variation CV the standard deviation of a data set divided by its mean;
the result is usually expressed as a percentage. See Section 3.2.
Combinations rule mathematical rule for calculating the number of different
combinations of a fixed number of distinct items. See Section 4.3.
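Python's standard library implements this rule directly; the counts below are for a made-up choice of 2 items from 5:

```python
from math import comb, perm

# Combinations rule: number of ways to choose r items from n distinct
# items when order does not matter: C(n, r) = n! / (r! * (n - r)!).
print(comb(5, 2))   # 10

# For contrast, the permutations rule (see that entry) counts ordered
# arrangements: P(n, r) = n! / (n - r)!.
print(perm(5, 2))   # 20
```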
Complement of an event set of outcomes in the sample space that are not outcomes of
the specified event. See Section 4.1.
Completely randomized design a procedure in analysis of variance in which each item
has the same chance of belonging to different categories or treatments. See Section
11.6.
Conditional probability P(A, given B) the probability that an event A occurs given
that event B has already occurred or is guaranteed to occur. See Section 4.2.
Confidence interval a computed range of values for a statistical parameter determined
from a random sample, probability distribution, and specified confidence level. See
Section 8.1.
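A minimal sketch of one common case, a 95% confidence interval for a mean when the population standard deviation σ is known (the sample figures and the critical value z = 1.96 are assumptions for illustration):

```python
from math import sqrt

# z-interval for a population mean with known sigma; data are made up.
x_bar, sigma, n = 50.0, 8.0, 64
z = 1.96  # critical value for 95% confidence

margin = z * sigma / sqrt(n)  # maximal error of estimate E
interval = (x_bar - margin, x_bar + margin)
print(interval)  # roughly (48.04, 51.96)
```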
Contingency table a table of observed frequencies. The rows correspond to one
method of classification and the columns correspond to another method of
classification. See Section 11.1.
Continuity correction an adjustment that must be made when a discrete random
variable is approximated by a continuous random variable. See Sections 6.4 and 7.3.
Continuous random variable a random variable with an infinite set of values that
correspond to points on a continuous real number interval. See Section 5.1.
Control chart a statistical chart displaying a measurement or characteristic to determine
whether or not there is statistical stability in a given process. See Sections 6.1 and
7.3.
Control group subjects in a statistical experiment who are not given a specified
treatment. See Section 1.3.
Convenience sampling a procedure for gathering data in which subjects are selected
because they are readily available. See Section 1.2.
Correlation coefficient ρ (population), r (sample) in linear regression, a statistical
measurement of the strength and direction of a relationship between two variables x
and y. See Sections 10.3 and 10.4.
Counting rule a mathematical rule which says that for a sequence of two events in
which the first event can occur n ways and the second event can occur m ways, the
events can occur together in a total of nm ways. See Section 4.3.
Critical region set of all values of a test statistic that result in the rejection of the null
hypothesis. See Section 9.1.
Critical value in hypothesis testing, numerical value(s) which separates the critical
region from the non-critical region (the region which would not lead to rejection of
the null hypothesis). See Section 9.2.
Cumulative frequency the sum of frequencies for a given class and all preceding
classes. See Section 2.2.
Cumulative frequency table a frequency table in which each class and corresponding
frequency represents cumulative data up to and including that class. See Section 2.2.
D
Data measurements or observations describing a specified characteristic. See Section
1.1.
Degrees of freedom the number of values that are free to vary after certain restrictions
have been imposed; used when a distribution (such as the t, chi-square, or F) consists
of a family of curves. See Sections 8.2, 11.1, and 11.4.
Dependent events A, B events for which the occurrence or non-occurrence of the first
event A affects the probability of the outcome or occurrence of the second event B.
See Section 4.2.
Dependent sample any sample whose values are related to or dependent upon the
values in another sample. See paired samples. See Section 9.6.
Descriptive statistics statistical methods involving the collection, organization,
summarization and presentation of data. See Section 1.1.
Discrete random variable a random variable that takes on values that can be counted.
See Section 5.1.
Distribution-free tests See nonparametric tests.
Dotplot a graph in which each data value is displayed as a point or dot against another
scale of values. See Section 2.2.
Double-blind procedure for a statistical experiment where neither the subject nor the
person conducting the experiment knows whether the subject is receiving either a
treatment or placebo. See Section 1.3.
E
Empirical rule a statistical rule that uses the mean and standard deviation to provide
information about data with a bell shaped or mound shaped distribution. See Section
6.1.
Equally likely events events that have the same probability or likelihood of occurring.
See Section 4.1.
Event one or more outcomes of a probability experiment; any subset of the sample
space. See Section 4.1.
Expected value the theoretical average of a random variable with a given probability
distribution. See Section 5.1.
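For a discrete random variable this is the sum of each value times its probability; a short sketch with a made-up distribution:

```python
# Expected value of a discrete random variable: sum of value * probability.
values = [0, 1, 2, 3]            # hypothetical outcomes
probs = [0.1, 0.3, 0.4, 0.2]     # corresponding probabilities (sum to 1)

expected = sum(v * p for v, p in zip(values, probs))
print(expected)  # approximately 1.7
```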
Experiment a treatment of objects or subjects followed by an observation or
measurement of effects on these subjects. See Section 1.3.
Explained deviation used in linear regression with data pairs (x, y), the difference
between forecasted or predicted value yp from the least-squares line and the mean of
the y values. See Section 10.3.
Explained variation used in linear regression with data pairs (x, y), the sum of the
squares of the explained deviation. See Section 10.3
Explanatory variable used in linear regression, the independent variable x in the least-
squares equation y = a + bx. See Sections 10.2 and 10.5.
Exploratory data analysis (EDA) methods of statistics investigating data using stem-
and-leaf plots, box-and-whisker plots and other strategies. See Section 2.3 and 3.4.
F
Factor from analysis of variance, a property or characteristic used to distinguish
different populations. See Section 11.6.
F distribution probability distribution of a continuous random variable named after the
English statistician Sir Ronald Fisher. See Sections 11.4, 11.5, and 11.6.
Five-number summary the minimum value, first quartile, median, third quartile, and
maximum value of a set of data values. See Section 3.4.
Frequency distribution a tabulation of raw data using classes and frequencies. See
Section 2.2.
Frequency polygon a graph displaying data using line segments connecting points with
the coordinates (class midpoint, class frequency). See Section 2.2.
Frequency table See frequency distribution.
G
Geometric distribution probability distribution of a discrete random variable giving
the probability of first occurrence of an event. See Section 5.4.
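The probability that the first success occurs on trial k is (1 − p)^(k−1) p; a short sketch using a fair die as the made-up example:

```python
# Geometric distribution: probability the first success occurs on
# trial k, with success probability p on each independent trial.
def geometric_pmf(k: int, p: float) -> float:
    return (1 - p) ** (k - 1) * p

# Example: probability the first 6 appears on the third roll of a fair die.
prob = geometric_pmf(3, 1 / 6)
print(round(prob, 4))  # 0.1157
```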
Goodness-of-fit test a chi-square test used to determine if a specified frequency
distribution fits a given pattern. See Section 11.2.
H
Histogram a graphical display of data using touching vertical bars of equal width to
represent frequencies of a distribution. See Section 2.2.
Hypothesis a statement or claim about a parameter or property of a population. See
Section 9.1.
Hypothesis test statistical technique for evaluating claims made about the parameters or
properties of population(s). See Section 9.1.
I
Independence test a chi-square test using frequency of occurrence to test the
independence of two random variables. See Section 11.1.
Independent events A, B events for which the occurrence or non-occurrence of one
event A does not affect the probability of the outcome or occurrence of the other
event B. See Section 4.2.
Independent sample any sample whose values are not related to or dependent upon the
values in another sample. See Sections 8.5 and 9.7.
Inferential statistics methods of statistics that generalize from samples to populations,
including hypothesis testing, estimation, and regression. See Section 1.1.
Influential point a point that strongly affects the position of the regression line. See
Chapter 10 Data Highlights.
Interquartile range difference between first and third quartiles. See Section 3.4.
Interval level of measurement describes data that can be arranged in order and
differences between data values are meaningful. See Section 1.1.
L
Law of large numbers as a probability experiment is repeated more and more times,
the relative frequency probability of an outcome will approach its theoretical
probability. See Section 4.1.
Least-squares criteria for a regression line, the sum of the squares of vertical
distances of the sample points from the regression line is minimized. See Section
10.2.
Least-squares line the line that satisfies the least-squares criteria for a collection of data
pairs (x, y). See Section 10.2.
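The slope and intercept of this line have closed-form formulas; a minimal sketch with made-up data pairs:

```python
from statistics import mean

# Least-squares slope and intercept for data pairs (x, y):
#   b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)**2),  a = y_bar - b*x_bar
xs = [1, 2, 3, 4, 5]   # hypothetical data pairs
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
print(a, b)  # intercept and slope of the line y = a + b*x
```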
Left-tailed test hypothesis test in which the critical region is positioned in the extreme
left portion of the probability distribution. See Section 9.1.
Level of confidence c assigned probability that a population parameter could be
contained within a confidence interval. See Section 8.1.
Level of significance α probability of a type I error in a statistical test. See Section
9.1.
Linear correlation coefficient ρ (population), r (sample) in linear regression, a statistical
measurement of the strength and direction of a relationship between two variables x
and y. Also called Pearson’s product moment correlation coefficient. See Sections
10.3 and 10.4.
Lower class limits smallest numerical value that can be included in a class of a
frequency table. See Section 2.2.
Lurking variable a variable not included in a statistical study that may influence other
variables which are included in the study. See Section 1.3.
M
Margin of error the maximal error of a 95% confidence interval. See Section 8.3.
Matched pairs data pairs from dependent samples. See paired data. See Section 9.6.
Maximum error of estimate maximal difference between a sample point estimate of a
parameter and the actual value of that parameter. See Section 8.1.
Mean μ (population), x̄ (sample) sum of data values divided by the total number of values.
See arithmetic mean and expected value. See Sections 3.1 and 5.1.
Measure of center a computed value whose purpose is to indicate the center value
associated with a collection of data. See Section 3.1.
Measure of variation any of several possible measures such as range or standard
deviation which reflect the amount of variation or spread within a set of data values.
See Section 3.2.
Median middle value of a set of data values arranged in rank order from smallest to
largest. See Section 3.1.
Mode data value that occurs most frequently (when such a value exists). See Section
3.1.
Multiple regression statistical method to analyze linear relationships involving three or
more variables. See Section 10.5.
Multiplication rule rule to compute the probability that on a single trial, the event
(A and B) will occur. See Section 4.2.
Mutually exclusive events events that cannot occur together or simultaneously. See
Section 4.2.
N
Nominal level of measurement level of measurement that characterizes data that
consists of names, labels, or categories. See Section 1.1.
Nonparametric tests statistical tests where there are no required assumptions regarding
the nature or shape of the underlying population distribution. See Section 12.1.
Normal distribution a symmetric distribution of a continuous random variable that
assumes a bell shape centered over the mean. Also called Gaussian distribution. See
Section 6.1.
Null hypothesis (H0) a statistical hypothesis that states a specific value for a parameter.
See Section 9.1.
O
Observational study a statistical study in which we do not attempt to manipulate or
modify the subjects or objects being studied. See Section 1.3.
Ogive a graph that shows cumulative frequencies. See Section 2.2.
One-way analysis of variance analysis of variance which classifies data into groups
according to a single criterion. See Section 11.5.
Ordinal level of measurement describes data that can be arranged in order. However,
differences between data values either cannot be determined or are meaningless. See
Section 1.1.
Outliers very unusual data values in the sense that they are very far above or below
most of the data. See Section 3.4.
P
Paired samples two samples that are dependent in the sense that the data values are
matched by pairing in a natural manner such as before and after studies, or by some
other feature. See dependent sample. See Section 9.6.
Parameter a numerical measure describing a characteristic of a population, such as μ or σ. See Section 7.1.
Parametric tests statistical procedures using population parameters from assumed or
given probability distributions for the purpose of testing hypotheses.
Pareto chart bar graph in which the bars are arranged in order of decreasing
frequencies. See Section 2.1.
P chart control chart regarding proportions of a specified attribute. See Section 7.3.
Pearson’s product moment correlation coefficient See linear correlation coefficient.
Percentile values that divide rank ordered data into 100 groups, with each group
containing approximately 1% of the values in the data set. See Section 3.4.
Permutations rule mathematical rule which determines the number of different ordered
arrangements of a specified number of items in a collection of distinct items. See
Section 4.3.
Pie chart a graph in the form of a circle, using wedges to show the proportion of items
with a designated characteristic. See Section 2.1.
Placebo effect the effect that occurs when a subject receives no treatment but (incorrectly)
believes he or she is in fact receiving treatment and responds favorably. See Section 1.3.
Point estimate a single numerical value computed from a sample that is an estimate of a
population parameter. See Section 8.1.
Poisson distribution a probability distribution of a discrete random variable that applies
to events occurring over specified intervals of time, distance, area, volume, etc. Also
used to approximate the binomial distribution for rare events. See Section 5.4.
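The Poisson probability of exactly k events, when the mean number of events over the interval is λ, is λ^k e^(−λ)/k!; a short sketch with a made-up mean:

```python
from math import exp, factorial

# Poisson probability of exactly k events when the mean number of
# events over the interval is lam: P(k) = lam**k * e**(-lam) / k!
def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

# Example: probability of exactly 2 events when the mean is 3.
print(round(poisson_pmf(2, 3), 4))  # 0.224
```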
Population total collection of all objects or subjects to be studied. See Section 1.1.
Power of a test (1 − β) the probability of rejecting a false null hypothesis. See Section
9.1.
Predicted values values of a response variable found by using values of explanatory
variables in a linear regression equation. See Section 10.2.
Prediction interval confidence interval corresponding to a given predicted value of the
response variable from a linear regression equation. See Section 10.2.
Probability a number between 0 and 1 corresponding to the likelihood that a given
event will occur. See Section 4.1.
Probability distribution values of a random variable with their corresponding
probabilities. See Section 5.1.
Probability histogram a histogram with outcomes presented on the horizontal axis and
corresponding probabilities presented on the vertical axis. See Section 5.1.
Probability value also called probability of chance. See P value.
P value the smallest level of significance for which the observed sample statistic tells us
to reject the null hypothesis of a hypothesis test. See Section 9.3.
Q
Qualitative data data distinguished by non-numeric characteristics. See Section 1.1.
Quantitative data data characterized by numbers which represent counts or
measurements. See Section 1.1.
Quartiles three numerical values that partition rank ordered data into four groups where
approximately 25% of the data values fall into each group. See Section 3.4. R
Randomized block design used in two-way analysis of variance, an experimental
design in which subjects fitting a designated characteristic (block) are randomly
selected to be included in the block. See Section 11.6.
Random sample a subset of n measurements from a population chosen in such a way
that every member of the population has equal chance of being selected, and every
group of n members of the population has equal chance of being selected. Also called
a simple random sample. See Section 1.2.
Random variable a variable having a single value that is a numerical outcome of a
random process. See Section 5.1.
Range a measure of variation which is the difference between the largest and smallest
values of a data set. See Section 3.2.
Rank-sum test a nonparametric statistical test using ranks and sums of ranks from
two independent samples. Also called the Mann-Whitney test. See Section 12.2.
Rank correlation coefficient See Spearman rank correlation coefficient.
Ratio level of measurement describes data that can be arranged in order in which both
differences between data values and ratios of data values are meaningful. See Section
1.1.
Regression equation an algebraic equation describing a relationship among statistical
variables. See Sections 10.2 and 10.5.
Regression line a straight line that satisfies the least-squares criteria for a collection of
data points on a scatter diagram. See Section 10.2.
Relative frequency used in a frequency distribution, the frequency of a class divided by
the total of all frequencies. See Section 2.2.
Relative frequency approximation of probability an estimation of a probability value
using sample data, frequency table, and relative frequency. See Section 4.1.
Relative frequency histogram a histogram showing classes on the horizontal axis and
corresponding relative frequency of each class on the vertical axis. See Section 2.2.
Relative frequency table a table showing classes and corresponding relative
frequencies. See Section 2.2.
Replication repetition of a probability experiment or statistical process. See Section
1.3.
Residual used in linear regression, the difference between an observed sample
response value and the corresponding predicted value from the regression equation.
See Sections 10.2 and 10.5.
Response variable in linear regression, the dependent variable y in the least-squares
equation y = a + bx. See Section 10.2.
Right-tailed test hypothesis test in which the critical region is positioned in the extreme
right portion of the probability distribution. See Section 9.1. S
Sample any subset of a population. See Section 1.1.
Sample space set of all possible outcomes of a probability experiment; the outcomes
cannot be further broken down. See Section 4.1.
Sampling distribution the probability distribution of a sample statistic such as the
probability distribution of sample means or sample proportions. See Sections 7.2 and
7.3.
Sampling error resulting from sample fluctuations, the difference between a statistical
measure computed from a sample and the value of the same statistical measure
computed from the entire population. See Section 8.1.
Scatter diagram in linear regression, a graphical display of data pairs (x, y). See
Section 10.1.
Significance level (∀) used in hypothesis testing to designate the probability of a type I
error. See Section 9.1.
Sign test a nonparametric test used to compare samples from two dependent
populations. See Section 12.1.
Simple random sample See random sample.
Simulation numerical facsimile or representation of a real-world phenomenon. See
Section 1.2.
Single factor analysis of variance See one-way analysis of variance.
Skewed left a histogram or distribution in which the left side or tail is stretched out
longer than the right. See Section 2.2.
Skewed right a histogram or distribution in which the right side or tail is stretched out
longer than the left. See Section 2.2.
Slope of least-squares line β (population), b (sample) the value b computed for the
sample least-squares line y = a + bx or the value β for the population least-squares
line y = α + βx. See Sections 10.2 and 10.4.
Spearman’s rank correlation coefficient ρs (population), rs (sample) a
nonparametric measure of the strength of relationship between two variables based
on ranks.
SS (between) used in analysis of variance, a sum of squares used to compute
the variance between groups. See Section 11.5.
SS (total) used in analysis of variance, a sum of squares used to compute the total
variance. See Section 11.5.
SS (within) used in analysis of variance, a sum of squares used to compute the variance
within a treatment group. See Section 11.5.
Standard deviation σ (population), s (sample) square root of sample variance or
population variance. See variance. See Section 3.2.
Standard error of estimate Se used in linear regression, a measure of the spread
between observed y values and corresponding predicted yp values over the entire
scatter diagram. See Section 10.2.
Standard error of the mean standard deviation of the population of all possible sample
means x̄ from samples of the same size taken from the same population. See Section
7.2.
Standard normal distribution a normal distribution for which the standard deviation is
1 and the mean is 0. See Section 6.2.
Standard score the quantity (data value – population mean) divided by population
standard deviation. Also called z score. See Section 6.2.
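This definition translates directly to code; the population figures below are made up:

```python
# z score: (data value - population mean) / population standard deviation.
def z_score(x: float, mu: float, sigma: float) -> float:
    return (x - mu) / sigma

# Example: a value of 85 from a population with mean 70 and standard
# deviation 10 lies 1.5 standard deviations above the mean.
print(z_score(85, 70, 10))  # 1.5
```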
Statistic a numerical descriptive measure of a sample. See Section 7.1.
Statistics the study of how to collect, organize, analyze, and interpret numerical
information from data. See descriptive statistics and also inferential statistics. See
Section 1.1.
Stem-and-leaf display a graphical method used to rank-order and arrange data into
groups. See Section 2.3.
Stratified sample a sampling method which divides the population into subgroups
(strata) representing homogeneous characteristics. Elements of the sample are then
selected from each stratum. See Section 1.2.
Student’s t distribution a family of symmetric, bell-shaped probability distributions
associated with small samples, first discovered by W.S. Gosset. See Section 8.2.
Symmetric distribution a distribution which can be broken into two symmetric parts
using a vertical line. The result is two halves, each of which is a mirror image of the
other. See Section 2.2.
Systematic sampling a sampling procedure in which every kth element is selected. See
Section 1.2.
T
t distribution See Student’s t distribution.
Test of independence a chi-square test using a contingency table to test for the
independence of two variables. See Section 11.1.
Test of significance See hypothesis test.
Test statistic in hypothesis testing, a statistic computed from sample data which is used
in making a decision regarding the rejection (or non-rejection) of the null hypothesis.
See Section 9.1.
Time plot a graph representing sample data that occur over a specified period of time.
See Section 2.1.
Total deviation used in linear regression with data pairs (x, y), the sum of the
explained deviation and unexplained deviation. See Section 10.3.
Total variation used in linear regression with data pairs (x, y), the sum of the explained
variation and unexplained variation. See Section 10.3.
Treatment a characteristic used to distinguish between different populations. See
analysis of variance. See Sections 11.5 and 11.6.
Treatment group in an experiment, the group of subjects receiving a specified
treatment. See Section 1.3.
Tree diagram a graphical display of the set of possible outcomes in a compound event.
See Section 4.3.
Two-tailed test hypothesis test in which the critical region is divided between extreme
left and right portions of a probability distribution. See Section 9.1.
Two-way analysis of variance analysis of variance which classifies data into groups
according to two criteria. See Section 11.6.
Type I error in hypothesis testing, the error made by rejecting the null hypothesis when
it is in fact true. See Section 9.1.
Type II error in hypothesis testing, the error made by failing to reject the null
hypothesis when it is in fact false. See Section 9.1.
U
Unexplained deviation used in linear regression with data pairs (x, y), the difference
between the y coordinate and the corresponding forecasted or predicted value yp from
the least-squares line. See Section 10.3.
Unexplained variation used in linear regression with data pairs (x, y), the sum of the
squares of the unexplained deviations. See Section 10.3.
Uniform distribution a rectangular-shaped probability distribution (or histogram). See
Section 2.2.
Upper class limits largest numerical value that can be included in a class of a frequency
table. See Section 2.2.
V
Variance a measure of data spread involving the difference between each data value
and the mean of the data set. For formula, see Section 3.2.
Variable a characteristic or attribute in a statistical study which can take on different
values. See Section 1.1.
Variance between groups used in analysis of variance, it is the computed variation
among the different samples. See Section 11.5.
W
Weighted average computed by multiplying each value of the data set by
corresponding weights and dividing by the total sum of all the weights. See Section
3.3.
Y
y-intercept α (population), a (sample) in linear regression, the y value when x = 0
from the least-squares regression equation y = a + bx (sample) or y = α + βx
(population).
Z
z score See standard score.