
Lecture 1

Introduction I: Pedigrees, genetics and probabilities

Magnus Dehli Vigeland

Statistical methods in genetic relatedness and pedigree analysis
NORBIS course, 6th – 10th of January 2020, Oslo

Outline

• Part I: Pedigrees
  – Pedigree symbols and terminology
  – Some common relationships

• Part II: Genetics
  – Terminology
    . Locus, allele, genotype, marker
  – Mendelian inheritance
    . Autosomal, X, Y

• Part III: Pedigree likelihoods
  – Motivation: Real-life problems
  – Ingredients:
    . Hardy-Weinberg equilibrium
    . Mendelian transition probabilities
  – Likelihoods by hand
  – Computer algorithms


Pedigrees: Symbols and terminology

□ = male   ○ = female

Founders: No parents included in the pedigree

Nonfounders: Parents included in the pedigree

Pedigrees: Symbols and terminology

□ = male   ○ = female

Consanguineous

Pedigrees: Symbols and terminology

Medical pedigrees:
• filled symbol = affected
• open symbol = unaffected
• symbol with a dot = carrier of disease allele

Alternative ways of drawing pedigrees

[Figure: the same pedigree drawn in three ways: Standard, Simplified, Directed acyclic graph]

Some common relationships

(and some less common...)

Cousin relationships

Full siblings

First cousins

Second cousins

Cousin relationships

First cousins once removed

Cousin relationships

Aunt - nephew

Cousin relationships

Grandaunt

Half cousin relationships

Half siblings (paternal)

Half first cousins

Half second cousins

Half cousin relationships

Half / half nephew


More complicated relationships

3/4 siblings

What about this?

Double first cousins

The connoisseur's favourite!

Quadruple half first cousins!

Genetics

• Human genome:
  – Diploid
  – 22 pairs of autosomes
  – Sex chromosomes: X and Y

• Some important terms
  – Locus
  – Allele
  – Genotype
  – Genetic markers
    . SNPs
    . microsatellites

Locus, allele, genotype

[Figure: a pair of homologous chromosomes, labelled M and F, carrying alleles A and B at the same locus; genotype: A/B]

• LOCUS = a specific place in the genome, e.g. a base pair, a gene or a region

• ALLELE = any of the alternative forms of a locus

• GENOTYPE = the set of alleles carried by an individual at a given locus

Genetic markers

• Small parts of the genome which ...
  – have known position
  – vary in the population
  – are easy to genotype

• SNPs (single nucleotide polymorphisms)
  – two alleles
  – MAF = minor allele frequency
  – usual requirement: MAF > 1%
  – very common in the genome (millions!)
  – used in medical genetics +++
  Example (five sequences differing at one position, T or G):
    ...CCGTTATATGGGC...
    ...CCGTTAGATGGGC...
    ...CCGTTATATGGGC...
    ...CCGTTATATGGGC...
    ...CCGTTAGATGGGC...

• STRs (short tandem repeats) = microsatellites
  – consecutive repeats of 2-5 bases
  – multiallelic: 5 - 50 alleles
  – allele names: # repeats
  – used in forensics
  Example (repeats of TTAG; alleles 4, 2 and 5):
    ...ACG TTAG TTAG TTAG TTAG AAC..
    ...ACG TTAG TTAG AAC..
    ...ACG TTAG TTAG TTAG TTAG TTAG AAC..
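The STR naming rule (allele name = number of repeats) can be illustrated with a minimal Python sketch. The function name `str_allele` is hypothetical, and the sketch assumes the repeat motif is known in advance and that the allele is the longest uninterrupted run of that motif; real STR nomenclature has further conventions that are not covered here.

```python
import re

def str_allele(sequence, motif):
    """Return the longest run of consecutive copies of `motif` in `sequence`.

    Spaces and dots (used only for readability on the slide) are ignored.
    """
    seq = sequence.replace(" ", "").replace(".", "").upper()
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)

# The three example sequences from the slide
examples = [
    "...ACG TTAG TTAG TTAG TTAG AAC..",
    "...ACG TTAG TTAG AAC..",
    "...ACG TTAG TTAG TTAG TTAG TTAG AAC..",
]
alleles = [str_allele(s, "TTAG") for s in examples]
print(alleles)  # [4, 2, 5]
```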


Mendelian inheritance: Autosomal (chromosomes 1-22)

Example: autosomal marker with 3 alleles: A, B, C

[Figure: pedigree with genotypes. A/A is homozygous, B/C is heterozygous; the other genotypes shown are A/B, A/C, A/B and B/C]

Probability of transmitting either allele: always 50%

Mendelian inheritance: X-linked

Example: X-linked marker with 3 alleles: A, B, C

[Figure: pedigree with genotypes. Males are hemizygous (a single allele, e.g. A); females carry two alleles (e.g. B/C). Genotypes shown: A, B/C, A/C, A/B, C, A]

• no transmission from father to son
• forced transmission from father to daughter

Mendelian inheritance: Y-linked

Example: Y-linked marker with 2 alleles: A, B

[Figure: pedigree with Y-linked genotypes A and B]

• no transmission involving females
• father-son transmission forced

Assumptions throughout (most of) this course

• Diploid species
• No cytogenetic abnormalities
• No de novo mutations

COFFEE BREAK!


Questions related to pedigrees with genotypes

• Will my child have the disease?

• Is NN the true father?

• Brothers or half brothers?

• Is NN related to this family? How?

• Predict the missing genotype?

Questions related to pedigrees with genotypes

Disease locus: alleles D and N

[Figure: pedigree with genotype labels D/N, D and ? (unknown)]

Will my child have the disease?

Questions related to pedigrees with genotypes

[Figure: pedigree with two individuals labelled 1 and 2; genotypes 11/13, -/- (untyped) and 13/18 in the top row, and 11/18 below]

Suppose:
• 11 is common
• 18 is rare

Who is the true father?

Questions related to pedigrees with genotypes

Brothers or half brothers?

Questions related to pedigrees with genotypes

[Figure: pedigree with genotypes at four markers, one row per individual]

12/14  32/40  7/11   6/21

11/14  32/40  13/13  6/25
12/16  34/40  7/7    12/21
11/16  32/41  7/13   6/25

Is this woman related to the family?

Questions related to pedigrees with genotypes

[Figure: pedigree with genotypes A/B, A/A, A/A, A/B and A/B, and one member with unknown genotype ?/?]

Can we predict the missing genotype?

• Common to all of these: The need to calculate probabilities

P( genotypes | pedigree, marker info, allele freqs, .. )

• Called the likelihood of the pedigree.

Ingredients for likelihood computations

• founder probabilities
• transition probabilities
• untyped individuals

[Figure: pedigree with genotypes A/B, A/A, A/A, A/A and A/B, and one untyped member (-/-)]

Ingredient 1: Founder probabilities

• Suppose the allele frequencies are:

$P(A) = p, \qquad P(B) = q$

• What are the frequencies of the genotypes AA, AB, BB?
• Under certain assumptions, the alleles can be treated as independent:

$P(AA) = P(A) \cdot P(A) = p^2$

$P(BB) = P(B) \cdot P(B) = q^2$

$P(AB) = P(\text{AB or BA}) = pq + qp = 2pq$ (two possible orderings!)

The Hardy-Weinberg principle

Assumptions:
• infinite population
• random mating
• no selection
• no mutations
• no migration

Hardy (1908): Shows «... using a little mathematics of the multiplication table kind»:
• allele freqs are unchanged from generation to generation
• after 1 generation the genotype freqs stay unchanged

HW equilibrium:

$P(AA) = p^2, \qquad P(AB) = 2pq, \qquad P(BB) = q^2$

[Figure: three successive generations, each with allele frequencies A: $p$, B: $q$. The genotype frequencies of AA, AB, BB are arbitrary (∗, ∗, ∗) in the first generation and $p^2$, $2pq$, $q^2$ in the following ones]
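The "multiplication table" argument can be written out in one line. The following is a short worked derivation, assuming random mating so that the offspring genotype frequencies are $p^2$, $2pq$, $q^2$; the primed symbols $p'$ and $q'$ (allele frequencies in the next generation) are notation introduced here, not taken from the slides.

```latex
\begin{align*}
p' &= P(AA) + \tfrac{1}{2} P(AB) = p^2 + \tfrac{1}{2} \cdot 2pq = p(p + q) = p \\
q' &= P(BB) + \tfrac{1}{2} P(AB) = q^2 + \tfrac{1}{2} \cdot 2pq = q(p + q) = q
\end{align*}
```

Since the allele frequencies are unchanged, the same genotype frequencies $p^2$, $2pq$, $q^2$ reappear in every later generation.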

Allele frequencies → genotype frequencies (assuming HWE):

$p_{AA} = p^2, \qquad p_{AB} = 2pq, \qquad p_{BB} = q^2$

Genotype frequencies → allele frequencies (always):

$p = p_{AA} + 0.5\, p_{AB}, \qquad q = p_{BB} + 0.5\, p_{AB}$
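The two directions of this conversion can be checked numerically. This is a minimal sketch for a biallelic marker; the function names and the example frequencies (0.3, and the non-HWE population) are made up for illustration.

```python
def hwe_genotype_freqs(p):
    """Genotype frequencies for alleles A (freq p) and B (freq 1 - p), assuming HWE."""
    q = 1.0 - p
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}

def allele_freqs(geno):
    """Allele frequencies from genotype frequencies; valid with or without HWE."""
    p = geno["AA"] + 0.5 * geno["AB"]
    q = geno["BB"] + 0.5 * geno["AB"]
    return p, q

# Round trip under HWE: allele freqs -> genotype freqs -> allele freqs
hwe = hwe_genotype_freqs(0.3)
p_back, q_back = allele_freqs(hwe)

# A hypothetical population that is not in HWE (too few heterozygotes):
# the allele frequencies are still well defined
non_hwe = {"AA": 0.25, "AB": 0.10, "BB": 0.65}
p_other, q_other = allele_freqs(non_hwe)

print(hwe, (p_back, q_back), (p_other, q_other))
```

Both populations have the same allele frequencies, but only the first has the genotype frequencies that HWE predicts. This is why the step from allele to genotype frequencies needs the HWE assumption while the reverse step does not.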

Ingredient 2: Transition probabilities $P(g_{\text{child}} \mid g_{\text{parents}})$

• Easy - follows directly from Mendel's laws!

[Figure: parents A/A and A/B with an untyped child (-/-)]

             child
parents      AA      AB      BB
AA×AA        1       0       0
AA×AB        0.5     0.5     0
AA×BB        0       1       0
AB×AA        0.5     0.5     0
AB×AB        0.25    0.5     0.25
AB×BB        0       0.5     0.5
BB×AA        0       1       0
BB×AB        0       0.5     0.5
BB×BB        0       0       1
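The table can be derived mechanically from Mendel's first law: each parent transmits either of its two alleles with probability 50%, independently of the other parent. A minimal sketch follows; the function name `transition_probs` and the tuple representation of genotypes are choices made here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def transition_probs(father, mother):
    """P(child genotype | parental genotypes) at an autosomal locus.

    Genotypes are tuples of two alleles, e.g. ("A", "B"). Each of the four
    (paternal allele, maternal allele) combinations has probability 1/4.
    """
    counts = Counter()
    for a, b in product(father, mother):
        counts[tuple(sorted((a, b)))] += 1  # unordered genotype
    return {g: Fraction(n, 4) for g, n in counts.items()}

genotypes = [("A", "A"), ("A", "B"), ("B", "B")]
table = {(f, m): transition_probs(f, m) for f in genotypes for m in genotypes}

# Print the nine rows in the same order as the table above
for (f, m), dist in table.items():
    row = [str(dist.get(g, Fraction(0))) for g in genotypes]
    print("".join(f), "x", "".join(m), row)
```

The same function works for markers with more than two alleles, for example `transition_probs(("A", "A"), ("B", "C"))`.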

Example

[Figure: trio with parents 1 (A/A) and 2 (A/B) and child 3 (A/B)]

\begin{align*}
L &= P(g_1, g_2, g_3) \\
  &= P(g_1) \cdot P(g_2) \cdot P(g_3 \mid g_1, g_2) \\
  &= P(AA) \cdot P(AB) \cdot P(AB \mid \text{parents} = AA \times AB) \\
  &= p^2 \cdot 2pq \cdot 0.5 \qquad \text{(assuming HWE!)} \\
  &= p^3 q
\end{align*}

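Example on X

[Figure: X-linked pedigree with six members. Genotypes: 1: A, 2: A/B, 3: A/B, 4: B, 5: B/B, 6: B/B]

\begin{align*}
L &= P(\text{genotypes} \mid \text{pedigree}, p, q) \\
  &= \underbrace{p}_{1} \cdot \underbrace{2pq}_{2} \cdot \underbrace{0.5}_{3} \cdot \underbrace{0.5}_{4} \cdot \underbrace{q^2}_{5} \cdot \underbrace{1}_{6} \\
  &= 0.5\, p^2 q^3
\end{align*}

(one contribution from each individual 1, ..., 6)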
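The six factors can be reproduced in code. The sketch below rests on an assumption about the pedigree structure, inferred from the factors rather than read from the original figure: 1 (male, A) and 2 (female, A/B) are founders with children 3 (daughter, A/B) and 4 (son, B); 5 (female, B/B) is a founder; 6 (daughter, B/B) is the child of 4 and 5. The function names and the numeric allele frequencies are illustrative only.

```python
from fractions import Fraction

def founder_prob(sex, genotype, freqs):
    """X-linked founder probability: males carry one allele, females two (HWE)."""
    if sex == "male":
        (a,) = genotype
        return freqs[a]
    a, b = genotype
    prob = freqs[a] * freqs[b]
    return prob if a == b else 2 * prob

def transition_prob(sex, genotype, father, mother):
    """X-linked P(child | parents).

    Sons receive nothing from the father and one maternal allele (50% each).
    Daughters receive the father's single allele (forced) plus one maternal allele.
    """
    half = Fraction(1, 2)
    if sex == "male":
        (a,) = genotype
        return sum((half for m in mother if m == a), Fraction(0))
    (pat,) = father
    return sum((half for mat in mother if sorted((pat, mat)) == sorted(genotype)),
               Fraction(0))

p, q = Fraction(3, 10), Fraction(7, 10)  # arbitrary example frequencies
freqs = {"A": p, "B": q}

# id: (sex, genotype, father id, mother id); founders have no parents (assumed structure)
pedigree = {
    1: ("male", ("A",), None, None),
    2: ("female", ("A", "B"), None, None),
    3: ("female", ("A", "B"), 1, 2),
    4: ("male", ("B",), 1, 2),
    5: ("female", ("B", "B"), None, None),
    6: ("female", ("B", "B"), 4, 5),
}

likelihood = Fraction(1)
for sex, geno, fa, mo in pedigree.values():
    if fa is None:
        likelihood *= founder_prob(sex, geno, freqs)
    else:
        likelihood *= transition_prob(sex, geno, pedigree[fa][1], pedigree[mo][1])

print(likelihood, Fraction(1, 2) * p**2 * q**3)  # the two values agree
```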

Ingredient 3: How to deal with untyped individuals

Solution: Sum over all possible genotypes for the untyped

[Figure: trio with parents 1 (A/A) and 2 (untyped, -/-) and child 3 (A/B)]

\begin{align*}
P(g_1, g_3) &= \sum_{g_2} P(g_1, g_2, g_3) = \sum_{g_2} P(g_1) \cdot P(g_2) \cdot P(g_3 \mid g_1, g_2) \\
  &= P(AA) \cdot P(AA) \cdot P(AB \mid AA \times AA) \\
  &\quad + P(AA) \cdot P(AB) \cdot P(AB \mid AA \times AB) \\
  &\quad + P(AA) \cdot P(BB) \cdot P(AB \mid AA \times BB) \\
  &= p^2 \cdot p^2 \cdot 0 + p^2 \cdot 2pq \cdot 0.5 + p^2 \cdot q^2 \cdot 1 \\
  &= p^3 q + p^2 q^2 = p^2 q\,(p + q) = p^2 q
\end{align*}

Pedigree likelihood: General formula

• Given:
  – pedigree with n individuals
  – k members are genotyped: $g_1, g_2, \ldots, g_k$

• Then:

$$P(g_1, \ldots, g_k) = \sum_{G_1} \sum_{G_2} \cdots \sum_{G_n} \underbrace{P(g_1) \cdots P(g_j)}_{\text{founders}} \cdot \underbrace{P(g_{j+1} \mid \text{par}) \cdots P(g_n \mid \text{par})}_{\text{non-founders}}$$

  $G_i$ = set of possible genotypes for individual $i$

• If everyone is typed: Only one term → easy

• Number of terms grows exponentially in #(untyped)
  – but clever algorithms exist!
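The general formula can be evaluated by brute force for small pedigrees. The sketch below assumes an autosomal marker with two alleles, HWE for founders, and no mutations; the data layout (`pedigree`, `observed`) and all function names are hypothetical. It enumerates every genotype combination, so it is exactly the exponential approach the slide warns about, not one of the clever algorithms.

```python
from fractions import Fraction
from itertools import product

AA, AB, BB = ("A", "A"), ("A", "B"), ("B", "B")
GENOTYPES = [AA, AB, BB]

def founder_prob(g, p):
    """Hardy-Weinberg genotype probability for a founder."""
    q = 1 - p
    return {AA: p * p, AB: 2 * p * q, BB: q * q}[g]

def transition_prob(child, father, mother):
    """Mendelian P(child | parents): count matching allele pairs out of four."""
    hits = sum(1 for a, b in product(father, mother)
               if tuple(sorted((a, b))) == child)
    return Fraction(hits, 4)

def likelihood(pedigree, observed, p):
    """Sum over all genotype assignments compatible with the observed genotypes.

    pedigree: {id: (father_id, mother_id)}, with (None, None) for founders.
    observed: {id: genotype} for the typed members only.
    """
    ids = list(pedigree)
    # Typed members have one possible genotype; untyped members have all three
    choices = [[observed[i]] if i in observed else GENOTYPES for i in ids]
    total = Fraction(0)
    for combo in product(*choices):
        g = dict(zip(ids, combo))
        term = Fraction(1)
        for i, (fa, mo) in pedigree.items():
            if fa is None:
                term *= founder_prob(g[i], p)
            else:
                term *= transition_prob(g[i], g[fa], g[mo])
        total += term
    return total

trio = {1: (None, None), 2: (None, None), 3: (1, 2)}
p = Fraction(3, 10)  # arbitrary example frequency
q = 1 - p

all_typed = likelihood(trio, {1: AA, 2: AB, 3: AB}, p)   # one term
one_untyped = likelihood(trio, {1: AA, 3: AB}, p)        # three terms
print(all_typed == p**3 * q, one_untyped == p**2 * q)
```

With u untyped members the loop visits 3^u combinations for a biallelic marker, which is why this approach does not scale beyond small examples.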

Computer algorithms for pedigree likelihoods

• Elston-Stewart algorithm
  – a peeling algorithm
  – linear in pedigree size!

• Lander-Green
  – based on inheritance vectors
  – hidden Markov model
  – best choice with many linked markers
  – small/medium pedigrees only

Software

• R/pedprobr
  – Part of the ped suite
  – Elston-Stewart
  – general likelihoods, genotype distributions ++

• Familias
  – GUI for forensic applications
  – Elston-Stewart
  – handles mutations, HW deviations, ++

• MERLIN
  – command line program
  – Lander-Green
  – medical applications: multipoint linkage
