
Glossary

Understandable Statistics, 8th edition, by Brase and Brase

Houghton Mifflin Company Boston New York

A

Addition rule rule to compute the probability that on a single trial, the event (A or B)

will occur. See Section 4.2.

Alpha (α) represents the probability of a type I error in a statistical test. See also level

of significance. See Section 9.1.

Alternate hypothesis (H1) a statistical hypothesis that is constructed in such a way that

it is the hypothesis to be accepted when the null hypothesis is rejected. See also null

hypothesis. See Section 9.1.

Analysis of variance (ANOVA) statistical analysis used to test hypotheses concerning the means of several populations. See Section 11.5.

ANOVA See analysis of variance.

Arithmetic mean μ (population), x̄ (sample) sum of a collection of values divided by the number of values. Often referred to as simply the mean. See Section 3.1.
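For illustration, in standard notation (N the population size, n the sample size; conventional symbols, not necessarily the text's own):

    \mu = \frac{\sum x}{N}, \qquad \bar{x} = \frac{\sum x}{n}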

Average any of several statistical measures used to designate the center of a collection

of data. See Section 3.1.

B

Bar graph a statistical graph that uses either vertical or horizontal bars representing outcomes or frequencies of a distribution. See Section 2.1.

Beta (β) used in hypothesis testing to represent the probability of a type II error. See Section 9.1. Also used in linear regression to represent the population slope of the least-squares line. See Section 10.4.

Bimodal distribution a distribution having two modes. See Section 2.2.

Binomial experiment a statistical experiment with a fixed number of independent trials n in which each trial has only two outcomes and the probability of success p remains the same for each trial. See Section 5.2.

Binomial distribution the set of all possible outcomes of a binomial experiment and corresponding probabilities associated with these outcomes. See Section 5.2.
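For illustration, in standard notation (with q = 1 - p), the probability of exactly r successes in n trials is

    P(r) = \binom{n}{r} p^{r} q^{\,n-r}, \qquad r = 0, 1, \ldots, n.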

Block used in analysis of variance, a group or collection of similar individuals. See

Section 11.6.

Box-and-whisker plot graphical representation of the spread of a data set showing

quartiles. See Section 3.4.

Boxplot See box-and-whisker plot.

C

Categorical data data that is separated into categories which are identifiable by non-

numeric properties. See Section 1.1.

Census data collected from every member of an entire population. See Section 1.3.

Central limit theorem a theorem describing the distribution of sample means. As the sample size n increases, the distribution of sample means x̄ approaches a normal distribution with mean μ and standard deviation σ/√n. See Section 7.2.
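In symbols (a sketch in standard notation), for sufficiently large n the sampling distribution of x̄ is approximately

    \bar{x} \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right),

where μ and σ are the population mean and standard deviation.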

Chebyshev’s theorem a theorem that uses the standard deviation and mean to give

information about the distribution of a collection of data. See Section 3.2.
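For illustration, the theorem guarantees that for any distribution and any k > 1, the proportion of data within k standard deviations of the mean is at least

    1 - \frac{1}{k^{2}};

for example, at least 1 - 1/2² = 75% of the data lie within 2 standard deviations of the mean.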

Chi-square distribution probability distribution of a continuous random variable, first introduced in the overview of Chapter 11.

Circle graph See pie chart.

Class boundaries upper and lower values of a class for a grouped frequency distribution adjusted so there are no gaps between consecutive classes. See Section

2.2.

Class midpoint used in a frequency table, the value midway between the lower class

limit and the upper class limit. See Section 2.2.

Class width difference between the upper class boundary and the lower class boundary.

See Section 2.2.

Cluster sampling a sampling method using preexisting or already determined sectors,

clusters, or natural groupings. See Section 1.2.

Coefficient of determination r² used in linear regression to describe the variation in y

that can be explained using the variation in x and the least-squares regression model.

See Section 10.3.
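For illustration, in the notation of the explained/total variation entries of this glossary:

    r^{2} = \frac{\text{explained variation}}{\text{total variation}}.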

Coefficient of variation CV the standard deviation of a data set divided by its mean; the result is usually expressed as a percentage. See Section 3.2.
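For illustration, in standard notation for a sample with mean x̄ and standard deviation s (the population version uses μ and σ):

    CV = \frac{s}{\bar{x}} \cdot 100\%.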

Combinations rule mathematical rule for calculating the number of different

combinations of a fixed number of distinct items. See Section 4.3.
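For illustration, in standard notation the number of combinations of n distinct items taken r at a time is

    C_{n,r} = \frac{n!}{r!\,(n-r)!}.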

Complement of an event set of outcomes in the sample space that are not outcomes of

the specified event. See Section 4.1.

Completely randomized design a procedure in analysis of variance in which each item

has the same chance of belonging to different categories or treatments. See Section

11.6.

Conditional probability P(A, given B) the probability that an event A occurs given

that event B has already occurred or is guaranteed to occur. See Section 4.2.
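In symbols (standard notation, assuming P(B) > 0):

    P(A, \text{given } B) = \frac{P(A \text{ and } B)}{P(B)}.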

Confidence interval a computed interval of values for a statistical parameter determined

from a random sample, probability distribution, and specified confidence level. See

Section 8.1.

Contingency table a table of observed frequencies. The rows correspond to one

method of classification and the columns correspond to another method of

classification. See Section 11.1.

Continuity correction an adjustment that must be made when a discrete random

variable is approximated by a continuous random variable. See Sections 6.4 and 7.3.

Continuous random variable a random variable with an infinite set of values that

correspond to points on a continuous real number interval. See Section 5.1.

Control chart a statistical chart displaying a measurement or characteristic to determine

whether or not there is statistical stability in a given process. See Sections 6.1 and

7.3.

Control group subjects in a statistical experiment who are not given a specified

treatment. See Section 1.3.

Convenience sampling a procedure for gathering data in which subjects are selected

because they are readily available. See Section 1.2.

Correlation coefficient ρ (population), r (sample) in linear regression, a statistical

measurement of the strength and direction of a relationship between two variables x

and y. See Sections 10.3 and 10.4.

Counting rule a mathematical rule which says that for a sequence of two events in which the first event can occur n ways and the second event can occur m ways, the events can occur together in a total of nm ways, as illustrated below. See Section 4.3.
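For example (an illustrative case): with 3 choices of shirt and 4 choices of pants, the counting rule gives 3 × 4 = 12 possible outfits.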

Critical region set of all values of a test statistic that result in the rejection of the null

hypothesis. See Section 9.1.

Critical value in hypothesis testing, numerical value(s) which separates the critical

region from the non-critical region (the region which would not lead to rejection of

the null hypothesis). See Section 9.2.

Cumulative frequency the sum of frequencies for a given class and all preceding

classes. See Section 2.2.

Cumulative frequency table a frequency table in which each class and corresponding

frequency represents cumulative data up to and including that class. See Section 2.2.

D

Data measurements or observations describing a specified characteristic. See Section

1.1.

Degrees of freedom the number of values that are free to vary after certain restrictions

have been imposed; used when a distribution (such as the t, chi-square, or F) consists of a family of curves. See Sections 8.2, 11.1, and 11.4.

Dependent events A, B events for which the occurrence or non-occurrence of the first

event A affects the probability of the outcome or occurrence of the second event B.

See Section 4.2.

Dependent sample any sample whose values are related to or dependent upon the

values in another sample. See paired samples. See Section 9.6.

Descriptive statistics statistical methods involving the collection, organization,

summarization and presentation of data. See Section 1.1.

Discrete random variable a random variable that takes on values that can be counted.

See Section 5.1.

Distribution-free tests See nonparametric tests.

Dotplot a graph in which each data value is displayed as a point or dot against another

scale of values. See Section 2.2.

Double-blind procedure for a statistical experiment where neither the subject nor the

person conducting the experiment knows whether the subject is receiving either a

treatment or placebo. See Section 1.3.

E

Empirical rule a statistical rule that uses the mean and standard deviation to provide

information about data with a bell shaped or mound shaped distribution. See Section

6.1.
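For illustration, the rule states that for a bell-shaped distribution approximately 68% of the data lie within 1 standard deviation of the mean, approximately 95% within 2, and approximately 99.7% within 3; for example,

    P(\mu - 2\sigma < x < \mu + 2\sigma) \approx 0.95.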

Equally likely events events that have the same probability or likelihood of occurring.

See Section 4.1.

Event one or more outcomes of a probability experiment; any subset of the sample

space. See Section 4.1.

Expected value the theoretical average of a random variable with a given probability

distribution. See Section 5.1.
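In symbols (standard notation for a discrete random variable x with probability function P(x)):

    \mu = E(x) = \sum x\,P(x).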

Experiment a treatment of objects or subjects followed by an observation or

measurement of effects on these subjects. See Section 1.3.

Explained deviation used in linear regression with data pairs (x, y), the difference

between forecasted or predicted value yp from the least-squares line and the mean of

the y values. See Section 10.3.

Explained variation used in linear regression with data pairs (x, y), the sum of the

squares of the explained deviations. See Section 10.3.

Explanatory variable used in linear regression, the independent variable x in the least-

squares equation y = a + bx. See Sections 10.2 and 10.5.

Exploratory data analysis (EDA) methods of statistics investigating data using stem-

and-leaf plots, box-and-whisker plots, and other strategies. See Sections 2.3 and 3.4.

F

Factor from analysis of variance, a property or characteristic used to distinguish

different populations. See Section 11.6.

F distribution probability distribution of a continuous random variable named after the

English statistician Sir Ronald Fisher. See Sections 11.4, 11.5, and 11.6.

Five-number summary minimum value, maximum value, median, and the first and

third quartiles of a set of data values. See Section 3.4.

Frequency distribution a tabulation of raw data using classes and frequencies. See

Section 2.2.

Frequency polygon a graph displaying data using line segments connecting points with

the coordinates (class midpoint, class frequency). See Section 2.2.

Frequency table See frequency distribution.

G

Geometric distribution probability distribution of a discrete random variable giving

the probability of first occurrence of an event. See Section 5.4.

Goodness-of-fit test a chi-square test used to determine if a specified frequency

distribution fits a given pattern. See Section 11.2.

H

Histogram a graphical display of data using touching vertical bars of equal width to

represent frequencies of a distribution. See Section 2.2.

Hypothesis a statement or claim about a parameter or property of a population. See

Section 9.1.

Hypothesis test statistical technique for evaluating claims made about the parameters or properties of population(s). See Section 9.1.

I

Independence test a chi-square test using frequency of occurrence to test the

independence of two random variables. See Section 11.1.

Independent events A, B events for which the occurrence or non-occurrence of one

event A does not affect the probability of the outcome or occurrence of the other

event B. See Section 4.2.

Independent sample any sample whose values are not related to or dependent upon the

values in another sample. See Sections 8.5 and 9.7.

Inferential statistics methods of statistics that generalize from samples to populations,

including hypothesis testing, estimation, and regression. See Section 1.1.

Influential point a point that strongly affects the position of the regression line. See

Chapter 10 Data Highlights.

Interquartile range difference between first and third quartiles. See Section 3.4.

Interval level of measurement describes data that can be arranged in order and for which differences between data values are meaningful. See Section 1.1.

L

Law of large numbers as a probability experiment is repeated more and more times,

the relative frequency probability of an outcome will approach its theoretical

probability. See Section 4.1.

Least-squares criteria for a regression line, the sum of the squares of vertical

distances of the sample points from the regression line is minimized. See Section

10.2.

Least-squares line the line that satisfies the least-squares criteria for a collection of data

pairs (x, y). See Section 10.2.

Left-tailed test hypothesis test in which the critical region is positioned in the extreme

left portion of the probability distribution. See Section 9.1.

Level of confidence c assigned probability that a population parameter could be

contained within a specific confidence interval. See Section 8.1.

Level of significance α probability of a type I error in a statistical test. See Section

9.1.

Linear correlation coefficient ρ (population), r (sample) in linear regression, a statistical measurement of the strength and direction of a relationship between two variables x and y. Also called Pearson's product moment correlation coefficient. See Sections

10.3 and 10.4.

Lower class limit smallest numerical value that can be included in a class of a

frequency table. See Section 2.2.

Lurking variable a variable not included in a statistical study that may influence other

variables which are included in the study. See Section 1.3.

M

Margin of error the maximal error of a 95% confidence interval. See Section 8.3.

Matched pairs data pairs from dependent samples. See paired data. See Section 9.6.

Maximum error of estimate maximal difference between a sample point estimate of a

parameter and the actual value of that parameter. See Section 8.1.

Mean μ (population), x̄ (sample) sum of data values divided by the total number of values. See arithmetic mean and expected value. See Sections 3.1 and 5.1.

Measure of center a computed value whose purpose is to indicate the center value

associated with a collection of data. See Section 3.1.

Measure of variation any of several possible measures such as range or standard

deviation which reflect the amount of variation or spread within a set of data values.

See Section 3.2.

Median middle value of a set of data values arranged in rank order from smallest to

largest. See Section 3.1.

Mode data value that occurs most frequently (when such a value exists). See Section

3.1.

Multiple regression statistical method to analyze linear relationships involving three or

more variables. See Section 10.5.

Multiplication rule rule to compute the probability that on a single trial, the event

(A and B) will occur. See Section 4.2.

Mutually exclusive events events that cannot occur together or simultaneously. See Section 4.2.

N

Nominal level of measurement level of measurement that characterizes data that

consists of names, labels, or categories. See Section 1.1.

Nonparametric tests statistical tests where there are no required assumptions regarding

the nature or shape of the underlying population distribution. See Section 12.1.

Normal distribution a symmetric distribution of a continuous random variable that

assumes a bell shape centered over the mean. Also called Gaussian distribution. See

Section 6.1.

Null hypothesis (H0) a statistical hypothesis that states a specific value for a parameter.

See Section 9.1.

O

Observational study a statistical study in which we do not attempt to manipulate or

modify the subjects or objects being studied. See Section 1.3.

Ogive a graph that shows cumulative frequencies. See Section 2.2.

One-way analysis of variance analysis of variance which classifies data into groups

according to a single criterion. See Section 11.5.

Ordinal level of measurement describes data that can be arranged in order. However,

differences between data values either cannot be determined or are meaningless. See

Section 1.1.

Outliers very unusual data values in the sense that they are very far above or below most of the data. See Section 3.4.

P

Paired samples two samples that are dependent in the sense that the data values are

matched by pairing in a natural manner such as before and after studies, or by some

other feature. See dependent sample. See Section 9.6.

Parameter a measured characteristic of a population such as μ or σ. See Section 7.1.

Parametric tests statistical procedures using population parameters from assumed or

given probability distributions for the purpose of testing hypotheses.

Pareto chart bar graph in which the bars are arranged in order of decreasing

frequencies. See Section 2.1.

P chart control chart regarding proportions of a specified attribute. See Section 7.3.

Pearson’s product moment correlation coefficient See linear correlation coefficient.

Percentile values that divide rank ordered data into 100 groups, with each group

containing approximately 1% of the values in the data set. See Section 3.4.

Permutations rule mathematical rule which determines the number of different ordered

arrangements of a specified number of items in a collection of distinct items. See

Section 4.3.
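For illustration, in standard notation the number of ordered arrangements of n distinct items taken r at a time is

    P_{n,r} = \frac{n!}{(n-r)!}.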

Pie chart a graph in the form of a circle, using wedges to show the proportion of items

with a designated characteristic. See Section 2.1.

Placebo effect effect when a subject receives no treatment but (incorrectly) believes he

or she is in fact receiving treatment and responds favorably. See Section 1.3.

Point estimate a single numerical value computed from a sample that is an estimate of a

population parameter. See Section 8.1.

Poisson distribution a probability distribution of a discrete random variable that applies

to events occurring over specified intervals of time, distance, area, volume, etc. Also

used to approximate the binomial distribution for rare events. See Section 5.4.
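For illustration, in standard notation with λ the mean number of occurrences per interval, the probability of exactly r occurrences is

    P(r) = \frac{e^{-\lambda}\,\lambda^{r}}{r!}, \qquad r = 0, 1, 2, \ldots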

Population total collection of all objects or subjects to be studied. See Section 1.1.

Power of a test (1 - β) the probability of rejecting a false null hypothesis. See Section

9.1.

Predicted values values of a response variable found by using values of explanatory

variables in a linear regression equation. See Section 10.2.

Prediction interval confidence interval corresponding to a given predicted value of the

response variable from a linear regression equation. See Section 10.2.

Probability a number between 0 and 1 corresponding to the likelihood that a given

event will occur. See Section 4.1.

Probability distribution values of a random variable with their corresponding

probabilities. See Section 5.1.

Probability histogram a histogram with outcomes presented on the horizontal axis and

corresponding probabilities presented on the vertical axis. See Section 5.1.

Probability value also called probability of chance. See P value.

P value the smallest level of significance for which the observed sample statistic tells us to reject the null hypothesis of a hypothesis test. See Section 9.3.

Q

Qualitative data data distinguished by non-numeric characteristics. See Section 1.1.

Quantitative data data characterized by numbers which represent counts or measurements. See Section 1.1.

Quartiles three numerical values that partition rank ordered data into four groups where

approximately 25% of the data values fall into each group. See Section 3.4.

R

Randomized block design used in two-way analysis of variance, an experimental

design in which subjects fitting a designated characteristic (block) are randomly

selected to be included in the block. See Section 11.6.

Random sample a subset of n measurements from a population chosen in such a way

that every member of the population has equal chance of being selected, and every

group of n members of the population has equal chance of being selected. Also called

a simple random sample. See Section 1.2.

Random variable a variable having a single value that is a numerical outcome of a

random process. See Section 5.1.

Range a measure of variation which is the difference between the largest and smallest

values of a data set. See Section 3.2.

Rank-sum test a nonparametric statistical test using ranks and sums of ranks from independent samples. Also called the Mann-Whitney test. See Section 12.2.

Rank correlation coefficient See Spearman's rank correlation coefficient.

Ratio level of measurement describes data that can be arranged in order in which both

differences between data values and ratios of data values are meaningful. See Section

1.1.

Regression equation an algebraic equation describing a relationship among statistical

variables. See Sections 10.2 and 10.5.

Regression line a straight line that satisfies the least-squares criteria for a collection of

data points on a scatter diagram. See Section 10.2.

Relative frequency used in a frequency distribution, the frequency of a class divided by

the total of all frequencies. See Section 2.2.

Relative frequency approximation of probability an estimation of a probability value

using sample data, frequency table, and relative frequency. See Section 4.1.

Relative frequency histogram a histogram showing classes on the horizontal axis and

corresponding relative frequency of each class on the vertical axis. See Section 2.2.

Relative frequency table a table showing classes and corresponding relative

frequencies. See Section 2.2.

Replication repetition of a probability experiment or statistical process. See Section

1.3.

Residual used in linear regression, the difference between an observed sample response value and the corresponding predicted value from the regression equation. See Sections 10.2 and 10.5.

Response variable in linear regression, the dependent variable y in the least-squares equation y = a + bx. See Section 10.2.

Right-tailed test hypothesis test in which the critical region is positioned in the extreme

right portion of the probability distribution. See Section 9.1.

S

Sample any subset of a population. See Section 1.1.

Sample space set of all possible outcomes of a probability experiment; the outcomes

cannot be further broken down. See Section 4.1.

Sampling distribution the probability distribution of a sample statistic such as the

probability distribution of sample means or sample proportions. See Sections 7.2 and

7.3.

Sampling error resulting from sample fluctuations, the difference between a statistical

measure computed from a sample and the value of the same statistical measure

computed from the entire population. See Section 8.1.

Scatter diagram in linear regression, a graphical display of data pairs (x, y). See

Section 10.1.

Significance level (α) used in hypothesis testing to designate the probability of a type I

error. See Section 9.1.

Sign test a nonparametric test used to compare samples from two dependent populations. See Section 12.1.

Simple random sample See random sample.

Simulation numerical facsimile or representation of a real-world phenomenon. See

Section 1.2.

Single-factor analysis of variance See one-way analysis of variance.

Skewed left a histogram or distribution in which the left side or tail is stretched out

longer than the right. See Section 2.2.

Skewed right a histogram or distribution in which the right side or tail is stretched out

longer than the left. See Section 2.2.

Slope of least-squares line β (population), b (sample) the value b computed for the sample least-squares line y = a + bx or the value β for the population least-squares line y = α + βx. See Sections 10.2 and 10.4.
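For illustration, the sample slope and intercept of y = a + bx are commonly computed as (standard formulas in standard notation, not necessarily the text's own):

    b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^{2}}, \qquad a = \bar{y} - b\,\bar{x}.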

Spearman's rank correlation coefficient ρS (population), rS (sample) a

nonparametric measure of the strength of relationship between two variables based

on ranks.

SS (between) used in analysis of variance, a sum of squares used to compute

the variance between groups. See Section 11.5.

SS (total) used in analysis of variance, a sum of squares used to compute the total

variance. See Section 11.5.

SS (within) used in analysis of variance, a sum of squares used to compute the variance

within a treatment group. See Section 11.5.

Standard deviation σ (population), s (sample) square root of sample variance or

population variance. See variance. See Section 3.2.
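For illustration, in standard notation the sample and population standard deviations are

    s = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n - 1}}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^{2}}{N}}.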

Standard error of estimate Se used in linear regression, a measure of the spread

between observed y values and corresponding predicted yp values over the entire

scatter diagram. See Section 10.2.

Standard error of the mean standard deviation of the population of all possible sample

means x̄ from samples of the same size taken from the same population. See Section

7.2.

Standard normal distribution a normal distribution for which the standard deviation is 1 and the mean is 0. See Section 6.2.

Standard score the quantity (data value – population mean) divided by the population standard deviation. Also called z score. See Section 6.2.
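In symbols (standard notation):

    z = \frac{x - \mu}{\sigma}.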

Statistic a numerical descriptive measure of a sample. See Section 7.1.

Statistics the study of how to collect, organize, analyze, and interpret numerical

information from data. See descriptive statistics and also inferential statistics. See

Section 1.1.

Stem-and-leaf display a graphical method used to rank-order and arrange data into

groups. See Section 2.3.

Stratified sample a sampling method which divides the population into subgroups

(strata) representing homogeneous characteristics. Elements of the sample are then

selected from each stratum. See Section 1.2.

Student's t distribution a family of symmetric, bell-shaped probability distributions

associated with small samples, first discovered by W.S. Gosset. See Section 8.2.

Symmetric distribution a distribution which can be broken into two symmetric parts

using a vertical line. The result is two halves, each of which is a mirror image of the

other. See Section 2.2.

Systematic sampling a sampling procedure in which every kth element is selected. See

Section 1.2.

T

t distribution See Student’s t distribution.

Test of independence a chi-square test using a contingency table to test for the independence of two variables. See Section 11.1.

Test of significance See hypothesis test.

Test statistic in hypothesis testing, a statistic computed from sample data which is used

in making a decision regarding the rejection (or non-rejection) of the null hypothesis.

See Section 9.1.

Time plot a graph representing sample data that occur over a specified period of time.

See Section 2.1.

Total deviation used in linear regression with data pairs (x, y), the sum of the

explained deviation and unexplained deviation. See Section 10.3.

Total variation used in linear regression with data pairs (x, y), the sum of the explained

variation and unexplained variation. See Section 10.3.

Treatment a characteristic used to distinguish between different populations. See

analysis of variance. See Sections 11.5 and 11.6.

Treatment group in an experiment, the group of subjects receiving a specified

treatment. See Section 1.3.

Tree diagram a graphical display of the set of possible outcomes in a compound event.

See Section 4.3.

Two-tailed test hypothesis test in which the critical region is divided between extreme

left and right portions of a probability distribution. See Section 9.1.

Two-way analysis of variance analysis of variance which classifies data into groups

according to two criteria. See Section 11.6.

Type I error in hypothesis testing, the error made by rejecting the null hypothesis when

it is in fact true. See Section 9.1.

Type II error in hypothesis testing, the error made by failing to reject the null

hypothesis when it is in fact false. See Section 9.1.

U

Unexplained deviation used in linear regression with data pairs (x, y), the difference

between the y coordinate and the corresponding forecasted or predicted value yp from

the least-squares line. See Section 10.3.

Unexplained variation used in linear regression with data pairs (x, y), the sum of the

squares of the unexplained deviations. See Section 10.3.

Uniform distribution a rectangular-shaped probability distribution (or histogram). See

Section 2.2.

Upper class limit largest numerical value that can be included in a class of a frequency table. See Section 2.2.

V

Variance a measure of data spread involving the difference between each data value

and the mean of the data set. For formula, see Section 3.2.

Variable a characteristic or attribute in a statistical study which can take on different

values. See Section 1.1.

Variance between groups used in analysis of variance, it is the computed variation

among the different samples. See Section 11.5.

W

Weighted average an average computed by multiplying each value of the data set by its corresponding weight, summing these products, and dividing by the total sum of all the weights. See Section 3.3.

Y

y-intercept α (population), a (sample) in linear regression, the y value when x = 0 from the least-squares regression equation y = a + bx (sample) or y = α + βx (population).

Z

z score See standard score.
