<<

1 2 4 ) y

 1 m  x   2 n 1 (   m  p y n , s

  t x  Test t ‐ t

a

n

 1 of 

n  , /  x s t test 

t One SampleSample Two test

Form

test

exact

square ‐ df ANOVA Binomial Chi Fisher’s     Goals General 3 1 ‐ test Tests Chi

Binomial test Binomial Chi-square or Fisher’s exact Fisher’s

McNemar’s test McNemar’s 2012 Test,

Sample

ANOVA GS ‐

test &

Wilcoxon- Wilcoxon test Wilcoxon test Wilcoxon 2012 Mann-Whitney Mann-Whitney Spring

and CSE

22,

560, One/Two

Lee,

Type of Data Binomial In of May

Test, t-test t-test

[email protected] Su 7:

Two sample One sample Paired t-test Tour

GENOME square How groups? about 3 or more Goal Gaussian Non-Gaussian Gaussian Goal Binomial value Lecture group to a hypothetical Compare two Compare two Compare one paired groups Whirlwind unpaired groups 2 6 8

of

?

the

group Groups favorite

more

a

groups) here?

in individuals our

N in

whether

groups trait

within More

SNP test genotypes.

the these

a test?

‐ or to

t

of

appropriate

3 is

divide groups among

wise groups ps at once. ‐ of quantitative (=variability

genotype a

then pair

categories

the differs within

between all also noise

y

We

in y

framework

Analogous to pooled from a ttest. from variance pooled to Analogous

Tests

and value

measure

Summarizes the differences the mean differences Summarizes grou between all gene. genotype

perform we

Variabilit

trait 3

Variabilit Test statistical difference 

‐ background not

the

F F

individuals the

candidate into average N than Suppose What Why Is     Hypothesis The 5 7

time

a statistic

sample

a at

is

one

of

parametric ‐ pairwise

variance difference

analog non the

Test? one ‐ that at

error t

I test:

than look test

Alternatives ‐

t type

to

parametric wise ‐ recognize ‐ once.”

more

our Whitney

‐ want non at at

can sample

Pair we “all

we look

Mann Test:

two ‐

increase

us

of

this,

Parametric let will

instead, ‐ do

Three test

‐ that differences t Wilcoxon Wilcoxon This So, To analog      Non Do 3 10 12

to

the used

of

exactly

equal

in

one being all

comparing

ANOVA

are

least

k classified

factors variable you’re

μ at

Hypothesis

is

= factor

means

that more

value

=

is variable the 2

Null

to single

μ

(groups) categorical

each the =

1 that is

the μ

is :

is called

samples 0 k

H because hypothesis

also variable is generalizes

ANOVA ANOVA: variable groups

assume

way ‐ different case,

the

hypothesis will

easily

is

way outcome factor one

We Way Way null alternative 

‐ ‐ The The The define one    means Simplest ANOVA The The     One One 9 11

referred

are

qualitative

factors

variable

as

more

factors to

or

ANOVA

one outcome

SNP of referred of

a

ANOVA of

are

differentiates

effect

of

that the

quantitative of

a

genotypes

variables

on

study

SNP three

to Framework

levels

e.g., e.g., as

  variables Want Qualitative Characteristics Independence Normality Homogeneity to       Basic Assumptions 4 14 16

to

data

deviations

equal

the

of are

squared

variation

variances

average

(groups)

Variation total (groups)

levels simply

is levels

Total standardized

partition

ANOVA

within between

to

the of is variation

sources mean

true

that idea

the is Variation Variation two

another

0 1. 2. H

  into one from Basic If Recall    Rational Partitioning 13 15 th j

and was 4

. from

3 6 3 . . . 71  76 50 87 trait

   55 3 3 3 (group) / / /

 ) ) ) sampled 68 55 59 97

    level 78 59 83 th 38 i  

  ij N 83 38 82 x quantitative ( 68 ( (

the

J     j . . . randomly 2 1 3

9 11 x x x 78  K is: i

 (= 9) individuals ,  from SNP  some

N .. 83 x of 

97 68 55 data mean (= 3) groups … (= 3) groups

97 68 55

97

ANOVA single k 

83, 78, 59, 83, 78, 59, individuals a

the

83  in sample of There are are There and grand

82 or

 . denote 3 x ij Data: Details

x

AA:AG: 82, GG: 83, 38, AA:AG: 82, GG: 83, 38, random

      measured population A Genotype Our Let observation Overall,      Motivating The 5 18 20

mean

Statistics

F k ‐

error N

1, ‐ k

, and

α 5.14

F

= is 6

59 equal 2,

. group

12 of 0.05, are

 F E region

2 . 3 . Calculating ratio

MST then

84 1062 the

 and is rejection

G 0.05, E G

= AA AG GG for

α MST MST MST

There… 

statistic Example

value

Value

true F

define

is test

0 H we

Our

The If Critical If squares     In Almost 17 19

divide

we

freedom

of

degrees comparable,

Squares

squares 1062.1 8) 6) 2)

= = = =

associated

of

84.3 2

–1 –3 –1 / = Mean

6 (9 (9 (3 sum

their

/

k –1 by –1 ‐

the

k N N 2142.2 Example

506

:

: :

= = G E T one

G E make SST SST SST

Our   

each To MST MST    In Calculating 6 22 24

a

and

is likely

analog

is

have

trials to n

rate

of

distribution

each parametric

‐ success assumed

in the

of non

n

true

x data, situation…

succeeded

the .

0 Test: ..., mean

outcome

μ

, Alternative 2

them the x

Sum different following

if

of ,

value a

1 binary

whether

x a the

Rank

many

Data have

given

consider specified

wondering how

a we Wallis

‐ wondering distribution:

. to let’s p

kruskal.test()

Parametric

that were are ‐ know

ANOVA be R,

to Kruskal In Previously, We Now, Say We equal normal to we        Non Binomial 21 23 test Tests

Binomial test Binomial Chi-square or Fisher’s exact Fisher’s McNemar’s test McNemar’s Sample ‐ test Wilcoxon- Wilcoxon test Wilcoxon test Wilcoxon Mann-Whitney Mann-Whitney One/Two

Type of Data of

t-test t-test Two sample One sample Paired t-test Tour

Table

Goal Gaussian Non-Gaussian Gaussian Goal Binomial value group to a hypothetical Compare two Compare two Compare one paired groups ANOVA Whirlwind unpaired groups 7 26 28

an

‐ 5

0.2 of

as

same chi

rows

to

R

the the

0.07883 observe

equal matrix

=

to a numbers

have not tables:

Proportions

of is

value ‐ p takes

Proportion

table 50, bigger

frequency a success =

2

that

approximation, x of

test too

2

populations

trials a on a

Binomial of is can

good

two up

a

0.2 of

Mutant alleleMutant allele WT is test

chisq.test probability

set

number

true 5, interval: Limits

You

there whether

=

50) . whether

0.2) square test

function is of

but

50, test test R

chi

) success Equality 2

0.21813537 (out

50 an we χ

hypothesis: of confidence ( the successes

Population #1 #2Population 5 10 45 35 0.1

binomial hard, is

frequency?

of try: do and columns.

estimates: question is

5 C R, fact,

Exact percent mutants Our In

How This In There allele square outcomes: argument. by binom.test (5,

0.03327509 sample probability 95 alternative number data:   >     Confidence Testing 25 27

gene

there

a

of

on

that

allele

if there are are there if

you? p SNP

Tails a

know

you

expected

and surprise

Its studying to

in is 0.025

frequency?

Confidence limits for 5 mutants out of 50 subjects subjects,

going p = 0.21813537 50

not “right”

is 0.025

Intervals is mutation 0.2

Thrombosis.

with interested

p = the

the p of

with is

you’re p=0.2

p = 0.0332750 is

having

population that 5

0 1 2 3 4 5 6 7 8 9 10 7 11 6 12 5 13 4 14 2 3 15 1 0 16 17 18 … a

are frequency associated Say In Then, What     Binomials Example Confidential 8

30 32

one

the

in sums

so less

a

(row this of

of effect

1). cell in is

in

(a

5263 cell . classes, table this

1)(m

you

0 − quantities.

number of (a

 frequencies (n

8947

class . distribution

2 is table the 7

table 95 50

row 

the class

less 1) is number

Test? Test?

each 95

which

− 1, distributed

 ‐ in 1]

each (m to

total contingency

95 50

which

1) in expected) up n contingency

 −

and the -

a

x

(m contingency is (total) 95 15

normally add

table, “Expected” number of subjects in pop #1 AND having mutant allele m

n / For expected

1) of

x −

numbers

Square Square an (n

m WT WT numbers ‐ ‐ −

allele Total square freedom sum) ‐

frequencies

an For

1578 (observed [nm squares of

(total) .

estimate.

chi

frequencies 0

/ a Chi Chi

to of) For are

 expected on allele classes

column a a Mutant

expected had 95

15

 degrees sum) up 1) (column table).

classes:

2 the x of expected −

you

the

table).

numbers  freedom all

(n

Do Do value the

of

out

out

Total 15 80 95 sum) the over

(column

number

To To x (various

Population #1 #2Population 5 10 45 35 50 45 degrees estimated parameters because of The Look Figure Figure Sum (row

contingency

  1. 1. 2. contingency sum) a is    How How 29 31

in

this

cell

and

(a

table

way,

class

one

correction Test? in each

in

contingency

n them

WT

allele Total WT WT x allele Total

continuity (total)

m /

numbers “Expected” number of subjects in each cell

Square 0.1772 an Yates'

classify ‐

= )

allele 7.8957.105 42.105 37.894 Test sum) Mutant

For with

and

value c(2,2) ‐

Chi allele Mutant p

test

expected 1,

a

(column table). Total158095

df = the way.

x

squared ‐ Population #1 #2Population 5 10 45 35 50 45 Square Do

individuals ‐ out

c(5,45,10,35), Chi

Total 15 80 95 sum) 1.8211,

= To

Population #1 #2Population 5 10 45 35 50 45 another

“observed” Chi

draw

matrix( Figure tb (row

‐ contingency

Pearson's We 1. is a also squared chisq.test(tb) tb < ‐

data: X >  >  The How 9 34 36

the this

sum value

of

open

and

the

2(df)

the = ‐

df

of

mean trials =

are

variance expected well

20 square

E(x) Var(x) ‐ approximate with

  ) The and chi 1 , we with fairly

0

k, is

distribution Ν

Z ~ probabilities the

distribution:

deviates. heads

distribution is

of

class distribution

where ; Distribution 2

distribution

Number of mutated Normal Z

the normal

normal 1

 df i  a

Approximation the number

0.3,

distribution binomial 2

df

by  binomial

each under

the

standard

Square ‐ Probability the

square For ‐

area

frequency

shows Chi

Chi Normal

the squared

approximated The Actually, This allele of circles. by    The The 33 35

you

“Expected” for

2 ) when

89 . 7 3 - 37.89 (35 computed 2

distribution  2 2 ) χ 11 . values WT ‐ allele Total 7 the freedom: -

p 7.11

expected)

- of

(10 for

 expected 2 correct )

7.8957.105 42.105 37.894 11 . values the 2

degrees (observed 4

allele - Mutant 42.11 get of

 classes (45 Test can  

critical

2 2

 ) you

895 . Total 15 80 95 Values numbers some 1234 3.8415 5.9916 7.8157 9.488 15 11.0708 20 12.5929 25 24.996 14.067 30 35 31.410 15.507 40 37.652 16.919 45 43.773 49.802 50 55.758 60 61.656 67.505 79.082 df Upper 95%point df Upper 95%point

10 18.307 70 90.531 7

Degrees of freedom = (# rows-1) x (# columns-1) = 1 = (# columns-1) x = (# rows-1) of freedom Degrees 7.895 - R. (5 Population #1 #2Population 5 10 45 35 50 45

course, are

 Square use Of 2 1 ‐   Here different  Chi Critical “observed”