Lecture 7: ANOVA, Binomial Test, Chi-square Test, and Fisher's Exact Test
GENOME 560, Spring 2012
Su-In Lee, CSE & GS
[email protected]
May 22, 2012

Goals
- ANOVA
- Binomial test
- Chi-square test
- Fisher's exact test

General Form of a t-Test

             One Sample                                   Two Sample
Statistic    $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$     $t = \frac{\bar{x} - \bar{y} - (\mu_1 - \mu_2)}{s_p \sqrt{1/m + 1/n}}$
df           $n - 1$                                      $m + n - 2$

Whirlwind Tour of One/Two-Sample Tests

                                             Type of Data
Goal                                         Gaussian            Non-Gaussian                 Binomial
Compare one group to a hypothetical value    One sample t-test   Wilcoxon test                Binomial test
Compare two paired groups                    Paired t-test       Wilcoxon test                McNemar's test
Compare two unpaired groups                  Two sample t-test   Wilcoxon-Mann-Whitney test   Chi-square or Fisher's exact test

How about 3 or more groups?

Hypothesis Tests of 3 or More Groups
- Suppose we measure a quantitative trait in a group of N individuals and also genotype a SNP in our favorite candidate gene.
- We then divide these N individuals into the 3 genotype categories to test whether the average trait value differs among genotypes.
- What statistical framework is appropriate here?
- Why not perform all pair-wise t-tests?

The F-Test
- Is the difference in the means of the groups more than background noise (= variability within groups)?

  $F = \frac{\text{Variability between groups}}{\text{Variability within groups}}$

- Numerator: summarizes the mean differences between all groups at once.
- Denominator: analogous to the pooled variance from a t-test.

Non-Parametric Alternatives
- Wilcoxon test: non-parametric analog of the one-sample t-test
- Wilcoxon-Mann-Whitney test: non-parametric analog of the two-sample t-test

Do Three Pair-wise t-Tests?
- This will increase our type I error.
- So, instead, we want to look at the pairwise differences "all at once."
- To do this, we can recognize that variance is a statistic that lets us look at more than one difference at a time.
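
To put a number on the increase in type I error, here is a minimal R sketch. It assumes the three pairwise tests are independent, which is a simplification made only for this illustration (real pairwise tests share data), and the variable names are hypothetical.

```r
# Chance of at least one false positive across m tests, each run at level alpha.
# Independence of the tests is an assumption made only for this illustration.
alpha <- 0.05
m <- choose(3, 2)              # number of pairwise comparisons among 3 groups
fwer <- 1 - (1 - alpha)^m      # familywise type I error rate
print(m)
print(fwer)
```

Under that assumption the familywise error rate is noticeably larger than the nominal 0.05, which is the motivation for a single test that looks at all groups at once.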

One-Way ANOVA
- The simplest case is also called one-way ANOVA.
- The outcome variable is the variable you're comparing.
- The factor variable is the categorical variable being used to define the groups.
- We will assume k samples (groups).
- The "one-way" is because each value is classified in exactly one way.
- ANOVA easily generalizes to more factors.

One-Way ANOVA: Null Hypothesis
- The null hypothesis is that the means are all equal: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
- The alternative hypothesis is that at least one of the means is different.

Basic Framework of ANOVA
- Want to study the effect of one or more qualitative variables on a quantitative outcome variable.
- Qualitative variables are referred to as factors, e.g., SNP.
- Characteristics that differentiate factors are referred to as levels, e.g., three genotypes of a SNP.

Assumptions of ANOVA
- Independence
- Normality
- Homogeneity of variances

Rationale of ANOVA
- Basic idea is to partition the total variation of the data into two sources:
  1. Variation within levels (groups)
  2. Variation between levels (groups)
- If $H_0$ is true, the standardized variances are equal to one another.

Partitioning Total Variation
- Recall that variation is simply the average squared deviation from the mean.
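
As a worked equation, the partition can be sketched as follows. The notation is an assumption chosen to match the lecture's later symbols: $x_{ij}$ is observation $j$ in group $i$, $n_i$ is the size of group $i$, $\bar{x}_{i.}$ is a group mean, $\bar{x}_{..}$ is the grand mean, and $SST_T$, $SST_G$, $SST_E$ are the total, group, and error sums of squares.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_{..}\right)^2}_{SST_T}
=
\underbrace{\sum_{i=1}^{k} n_i \left(\bar{x}_{i.} - \bar{x}_{..}\right)^2}_{SST_G \ (\text{between groups})}
+
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_{i.}\right)^2}_{SST_E \ (\text{within groups})}
```

The cross term vanishes because deviations from a group mean sum to zero within each group.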

Motivating Example
- A random sample of some quantitative trait was measured in individuals randomly sampled from a population.
- Genotype of a single SNP:
    AA: 82, 83, 97
    AG: 83, 78, 68
    GG: 38, 59, 55
- There are N (= 9) individuals and k (= 3) groups …

The Details
- Our Data:
    AA: 82, 83, 97    $\bar{x}_{1.} = (82 + 83 + 97) / 3 = 87.3$
    AG: 83, 78, 68    $\bar{x}_{2.} = (83 + 78 + 68) / 3 = 76.3$
    GG: 38, 59, 55    $\bar{x}_{3.} = (38 + 59 + 55) / 3 = 50.7$
- Let $x_{ij}$ denote the data from the $i$-th level (group) and $j$-th observation.
- Overall, or grand, mean is: $\bar{x}_{..} = \sum_{i=1}^{k} \sum_{j=1}^{J} \frac{x_{ij}}{N} = 71.4$
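
The group means and grand mean can be checked in R. This is a minimal sketch using the nine trait values from the lecture; the variable names (`trait`, `genotype`) are hypothetical.

```r
# Trait values from the lecture, in the order AA, AG, GG (three individuals each)
trait <- c(82, 83, 97, 83, 78, 68, 38, 59, 55)
genotype <- factor(rep(c("AA", "AG", "GG"), each = 3))

group_means <- tapply(trait, genotype, mean)  # one mean per genotype
grand_mean <- mean(trait)                     # mean over all N = 9 values

print(round(group_means, 2))
print(round(grand_mean, 2))
```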

Almost There… Calculating F Statistics
- The test statistic is the ratio of group and error mean squares: $F = \frac{MST_G}{MST_E}$
- If $H_0$ is true, $MST_G$ and $MST_E$ are expected to be equal.
- Critical value for the rejection region is $F_{\alpha,\, k-1,\, N-k}$.
- If we define $\alpha = 0.05$, then $F_{0.05,\, 2,\, 6} = 5.14$.

In Our Example
  $F = \frac{MST_G}{MST_E} = \frac{1062.1}{84.3} = 12.59$
- Since 12.59 > 5.14, we reject $H_0$ at $\alpha = 0.05$.
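
The critical value and the decision can be reproduced with R's F distribution functions. A minimal sketch, taking the observed statistic 12.59 and the group and sample sizes from the lecture; the variable names are hypothetical.

```r
k <- 3          # number of groups
N <- 9          # number of individuals
alpha <- 0.05
f_obs <- 12.59  # observed F statistic from the lecture example

# Upper-tail critical value F_{alpha, k-1, N-k}
f_crit <- qf(1 - alpha, df1 = k - 1, df2 = N - k)

# p-value: probability of an F at least this large when H0 is true
p_value <- pf(f_obs, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

reject <- f_obs > f_crit
print(f_crit)
print(p_value)
print(reject)
```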

Calculating Mean Squares
- To make the sums of squares comparable, we divide each one by its associated degrees of freedom:
    $SST_G$: $k - 1$   (3 - 1 = 2)
    $SST_E$: $N - k$   (9 - 3 = 6)
    $SST_T$: $N - 1$   (9 - 1 = 8)

In Our Example
    $MST_G = 2124.2 / 2 = 1062.1$
    $MST_E = 506 / 6 = 84.3$
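
The sums of squares and mean squares for the example can be computed step by step in R. This is a sketch under the assumption that the data are the nine trait values listed in the lecture; the variable names are hypothetical, and the last lines cross-check against R's built-in ANOVA table.

```r
trait <- c(82, 83, 97, 83, 78, 68, 38, 59, 55)
genotype <- factor(rep(c("AA", "AG", "GG"), each = 3))

N <- length(trait)
k <- nlevels(genotype)
grand_mean <- mean(trait)
group_means <- tapply(trait, genotype, mean)
group_sizes <- tapply(trait, genotype, length)

# Between-group, within-group, and total sums of squares
sst_g <- sum(group_sizes * (group_means - grand_mean)^2)
sst_e <- sum((trait - group_means[as.character(genotype)])^2)
sst_t <- sum((trait - grand_mean)^2)

# Mean squares: each sum of squares divided by its degrees of freedom
mst_g <- sst_g / (k - 1)
mst_e <- sst_e / (N - k)
f_stat <- mst_g / mst_e

# Cross-check against the ANOVA table from a linear model fit
fit <- anova(lm(trait ~ genotype))
print(c(SST_G = sst_g, SST_E = sst_e, SST_T = sst_t))
print(c(MST_G = mst_g, MST_E = mst_e, F = f_stat))
print(fit)
```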

Non-Parametric Alternative
- Kruskal-Wallis Rank Sum Test: non-parametric analog to ANOVA
- In R, kruskal.test()

Binomial Data
- Previously, given the following data, assumed to have a normal distribution:
    $x_1, x_2, \dots, x_n$
  we were wondering if the mean of the distribution is equal to a specified value $\mu_0$.
- Now, let's consider a different situation…
- Say that we have a binary outcome in each of n trials and we know how many of them succeeded.
- We are wondering whether the true success rate is likely to be p.
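
For binary outcomes, the probability of seeing exactly x successes in n trials follows the binomial formula. A minimal R sketch; the values n = 50, x = 5, p = 0.2 are borrowed from the lecture's allele-frequency example, and the variable names are hypothetical.

```r
n <- 50    # number of trials
x <- 5     # number of successes observed
p <- 0.2   # hypothesized success rate

# Binomial probability written out: C(n, x) * p^x * (1 - p)^(n - x)
prob_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# Same quantity from R's built-in binomial density
prob_dbinom <- dbinom(x, size = n, prob = p)

print(prob_formula)
print(prob_dbinom)
```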

Whirlwind Tour of One/Two-Sample Tests

                                             Type of Data
Goal                                         Gaussian            Non-Gaussian                 Binomial
Compare one group to a hypothetical value    One sample t-test   Wilcoxon test                Binomial test
Compare two paired groups                    Paired t-test       Wilcoxon test                McNemar's test
Compare two unpaired groups                  Two sample t-test   Wilcoxon-Mann-Whitney test   Chi-square or Fisher's exact test

Confidence Limits on a Proportion
- Our question is whether 0.2 is too high a frequency to observe 5 mutants (out of 50).
- In R, try:

    > binom.test(5, 50, 0.2)

            Exact binomial test

    data:  5 and 50
    number of successes = 5, number of trials = 50, p-value = 0.07883
    alternative hypothesis: true probability of success is not equal to 0.2
    95 percent confidence interval:
     0.03327509 0.21813537
    sample estimates:
    probability of success
                       0.1

Testing Equality of Binomial Proportions
- How do we test whether two populations have the same allele frequency?
- This is hard, but there is a good approximation, the chi-square ($\chi^2$) test.
- You set up a 2 x 2 table of numbers of outcomes:

                     Mutant allele   WT allele
    Population #1          5            45
    Population #2         10            35

- In fact, the chi-square test can test bigger tables: R rows by C columns.
- There is an R function chisq.test that takes a matrix as an argument.
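
One way to set up the 2 x 2 table as an R matrix is sketched below. The row and column labels and the variable names are hypothetical, chosen only to mirror the table in the lecture.

```r
# Rows are populations, columns are alleles; byrow = TRUE fills row by row
tb <- matrix(c(5, 45,
               10, 35),
             nrow = 2, byrow = TRUE,
             dimnames = list(population = c("#1", "#2"),
                             allele = c("mutant", "WT")))

# Observed mutant-allele frequency in each population
mutant_freq <- tb[, "mutant"] / rowSums(tb)

print(tb)
print(mutant_freq)
```

The two observed frequencies are the binomial proportions whose equality is being tested.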

Example
- Say that you're interested in studying a SNP on a gene associated with Thrombosis.
- Its expected allele frequency is p = 0.2 in the population.
- In a population with 50 subjects, you know that there are 5 having the mutation.
- Then, is p = 0.2 the "right" frequency?
- What is the range of p that is not going to surprise you?

Confidence Intervals and Tails of Binomials
- Confidence limits for 5 mutants out of 50 subjects:
    at p = 0.0332750, the upper tail (5 or more mutants) is 0.025
    at p = 0.21813537, the lower tail (5 or fewer mutants) is 0.025
- [Figure: binomial distributions at the two confidence limits; x-axis: number of mutants, 0 to 18 …]
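
The link between the confidence limits and the binomial tails can be checked numerically. This is a sketch that takes the limits from `binom.test` and evaluates the tail probabilities with `pbinom`; the variable names are hypothetical.

```r
x <- 5    # mutants observed
n <- 50   # subjects

ci <- binom.test(x, n, p = 0.2)$conf.int
lower <- ci[1]
upper <- ci[2]

# At the lower limit, the chance of seeing x or more mutants is 0.025
upper_tail_at_lower <- pbinom(x - 1, n, lower, lower.tail = FALSE)

# At the upper limit, the chance of seeing x or fewer mutants is 0.025
lower_tail_at_upper <- pbinom(x, n, upper)

print(c(lower = lower, upper = upper))
print(c(upper_tail_at_lower, lower_tail_at_upper))
```

Values of p between the two limits are the ones for which 5 mutants out of 50 would not be surprising at the 95 percent level.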

How To Do a Chi-Square Test?
1. Figure out the expected numbers in each class (a cell in a contingency table). For an m x n contingency table this is (row sum) x (column sum) / (total).

                     Mutant allele   WT allele   Total
    Population #1          5            45         50
    Population #2         10            35         45
    Total                 15            80         95

   "Expected" number of subjects in pop #1 AND having mutant allele:
   $95 \times \frac{50}{95} \times \frac{15}{95} = 95 \times 0.5263 \times 0.1579 = 7.8947$

How To Do a Chi-Square Test? (continued)
2. Sum over all classes: $\chi^2 = \sum_{\text{classes}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
- The number of degrees of freedom is the total number of classes, less one because the expected frequencies add up to 1, less the number of parameters you had to estimate.
- For a contingency table you in effect estimated (n − 1) column frequencies and (m − 1) row frequencies, so the degrees of freedom are [nm − (n − 1) − (m − 1) − 1], which is (n − 1)(m − 1).
- Look up the value on a chi-square table, which is the distribution of sums of (various numbers of) squares of normally distributed quantities.
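
The expected counts and degrees of freedom can be computed directly from the row and column sums. A minimal R sketch using the lecture's 2 x 2 counts; the dimnames and variable names are hypothetical.

```r
observed <- matrix(c(5, 45,
                     10, 35),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(population = c("#1", "#2"),
                                   allele = c("mutant", "WT")))

# Expected count per cell: (row sum) x (column sum) / (total)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Degrees of freedom for an m x n table: (m - 1)(n - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

print(round(expected, 3))
print(df)
```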

How To Do a Chi-Square Test?
1. Figure out the expected numbers in each class (a cell in a contingency table). For an m x n contingency table this is (row sum) x (column sum) / (total).

   "Observed" numbers:
                     Mutant allele   WT allele   Total
    Population #1          5            45         50
    Population #2         10            35         45
    Total                 15            80         95

   "Expected" number of subjects in each cell:
                     Mutant allele   WT allele   Total
    Population #1        7.895        42.105       50
    Population #2        7.105        37.895       45
    Total                 15            80         95

The Chi-Square Test
- We draw individuals and classify them in one way, and also another way.

                     Mutant allele   WT allele   Total
    Population #1          5            45         50
    Population #2         10            35         45
    Total                 15            80         95

    > tb <- matrix(c(5, 45, 10, 35), nrow = 2, ncol = 2)
    > chisq.test(tb)

            Pearson's Chi-squared test with Yates' continuity correction

    data:  tb
    X-squared = 1.8211, df = 1, p-value = 0.1772
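
For a 2 x 2 table, `chisq.test` applies Yates' continuity correction by default, so its statistic differs from the plain sum of (observed − expected)²/expected. The sketch below computes both versions by hand and compares them with R's output; the variable names are hypothetical.

```r
tb <- matrix(c(5, 45, 10, 35), nrow = 2, ncol = 2)

with_yates <- chisq.test(tb)                     # default: correct = TRUE
without_yates <- chisq.test(tb, correct = FALSE)

expected <- without_yates$expected

# Plain Pearson statistic
manual_plain <- sum((tb - expected)^2 / expected)

# Yates' correction shrinks each |observed - expected| by 0.5 before squaring
manual_yates <- sum((abs(tb - expected) - 0.5)^2 / expected)

print(c(plain = manual_plain, yates = manual_yates))
print(with_yates)
```

Filling the matrix column by column gives the transpose of the table shown in the lecture, which leaves the chi-square statistic unchanged.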

The Chi-Square Distribution
- The chi-square distribution is the distribution of the sum of squared standard normal deviates:
  $\chi^2_{df} = \sum_{i=1}^{df} Z_i^2$, where $Z_i \sim N(0, 1)$
- The expected value and variance of the chi-square: $E(x) = df$, $Var(x) = 2(df)$

The Normal Approximation
- Actually, the binomial distribution is fairly well approximated by the normal distribution.
- [Figure: the binomial distribution with 20 trials and allele frequency 0.3; the class probabilities are the open circles. x-axis: number of mutated alleles; y-axis: probability.]
- For each number of heads k, we approximate this by the area under a normal distribution with mean $np$ and variance $np(1 - p)$.
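
The quality of the approximation can be checked numerically for the case in the figure (20 trials, frequency 0.3). This sketch assumes the normal area is taken over the interval from k − 0.5 to k + 0.5, a common continuity convention that the slide does not spell out; the variable names are hypothetical.

```r
n <- 20
p <- 0.3
k <- 0:n

mu <- n * p                       # binomial mean
sigma <- sqrt(n * p * (1 - p))    # binomial standard deviation

exact <- dbinom(k, size = n, prob = p)

# Normal area over [k - 0.5, k + 0.5] (assumed continuity convention)
normal_approx <- pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma)

max_abs_error <- max(abs(exact - normal_approx))
print(round(cbind(k, exact, normal_approx), 4))
print(max_abs_error)
```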

Chi-Square Test

   "Observed" numbers:
                     Mutant allele   WT allele   Total
    Population #1          5            45         50
    Population #2         10            35         45
    Total                 15            80         95

   "Expected" numbers:
                     Mutant allele   WT allele
    Population #1        7.895        42.105
    Population #2        7.105        37.895

  $\chi^2 = \sum_{\text{classes}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(5 - 7.895)^2}{7.895} + \frac{(45 - 42.11)^2}{42.11} + \frac{(10 - 7.11)^2}{7.11} + \frac{(35 - 37.89)^2}{37.89} \approx 2.66$

  Degrees of freedom = (# rows - 1) x (# columns - 1) = 1

Critical Values
- Here are some critical values for the $\chi^2$ distribution for different degrees of freedom:

    df   Upper 95% point      df   Upper 95% point
     1        3.841           15       24.996
     2        5.991           20       31.410
     3        7.815           25       37.652
     4        9.488           30       43.773
     5       11.070           35       49.802
     6       12.592           40       55.758
     7       14.067           45       61.656
     8       15.507           50       67.505
     9       16.919           60       79.082
    10       18.307           70       90.531

- Of course, you can get the correct p values computed when you use R.
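
The critical values in the table, and a p-value for the uncorrected statistic of about 2.66 with 1 degree of freedom, can be obtained from R's chi-square functions. A minimal sketch; the variable names are hypothetical.

```r
df <- c(1:10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70)

# Upper 95% points of the chi-square distribution
critical <- qchisq(0.95, df)
print(round(cbind(df, critical), 3))

# Upper-tail p-value for the example statistic (no continuity correction)
chi_sq <- 2.66
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
print(p_value)
```

Because 2.66 is below the 1-df critical value, the difference in allele frequency between the two populations is not significant at the 0.05 level.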