© Various sources. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Computational Biology: Genomes, Networks, Evolution

MIT 6.047 / 6.878 HSPH IMI.231 HST.507

I. Administrivia

Introductions

Course Information
• All handouts, lectures, notes, etc. will be posted on the course website.

Goals for the term

Course content

Computation & Biology | Foundations & Frontiers

Course organized around bio/comp modules

Textbook / class notes / resources

 (Optional) Books for the Course

© Cambridge University Press. All rights reserved. © Wiley-Interscience. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Courtesy of The MIT Press. Used with permission.

New this year!! Book for the Course

Lectures and Scribing

Scribing details: DropBox, 6047_book, LaTeX

Sign up here if you haven’t already

Online material from last year

Lecture feedback

Homeworks and quiz

Details on Problem sets

Details on the in-class quiz

Final Project

Final Project: Original Research in Comp Bio
• It’s a team project

Final Project at a Glance

Details on the final project

Finding a research mentor / research advisor

Putting it all together

Course Grading

Why Computational Biology?

Why Computational Biology: Last year’s answers

Figure: a long stretch of raw genomic DNA sequence (A/C/G/T), repeated several times across the slide, beginning ATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA…

The components of genomes and gene regulation

Coupling each topic with foundational CS tools

Each lecture pairs a fundamental biological problem with a foundational computational tool.

Overview of the 5 modules

Challenges in Computational Biology

Example sequences: TCATGCTAT, TCGTGATAA, TGAGGATAT, TTATCATAT, TTATGATTT

Module I: Aligning and Modeling Genomes

Dynamic Programming: Align, HMMs
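Alignment by dynamic programming can be sketched as a minimal Needleman-Wunsch global alignment scorer. The scoring scheme is an illustrative assumption and is not taken from the course: match +1, mismatch -1, linear gap penalty -2. The function name `global_alignment_score` is also hypothetical. Each cell reuses the optimal scores of three smaller subproblems, which is what makes the quadratic-time table fill possible.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)
    return score[n][m]
```

For example, `global_alignment_score("ACGT", "AGT")` returns 1, from three matches and one gap. A traceback over the same table would recover the alignment itself.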

Module II: Gene expression analysis and transcripts

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Alizadeh, Ash A., Michael B. Eisen, R. Eric Davis, Chi Ma, Izidore S. Lossos, Andreas Rosenwald, Jennifer C. Boldrick et al. "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling." Nature 403, no. 6769 (2000): 503-511.

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Armstrong, Scott A. et al. "MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia." Nature Genetics 30, no. 1 (2002): 41-47.

Module III: Epigenomics and gene regulation

Motifs summarize TF sequence specificity

Starting positions ↔ Motif matrix

Profile matrix over the 8 shared motif sequence positions:

Position  1    2    3    4    5    6    7    8
A         0.1  0.3  0.1  0.2  0.2  0.4  0.3  0.1
C         0.5  0.2  0.1  0.1  0.6  0.1  0.2  0.7
G         0.2  0.2  0.6  0.5  0.1  0.2  0.2  0.1
T         0.2  0.3  0.2  0.2  0.1  0.3  0.3  0.1

• Given a profile matrix, it is easy to find starting position probabilities.
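Here is a minimal sketch of how the profile matrix yields starting position probabilities. It rests on two simplifying assumptions: a uniform prior over start positions, and no background model. Full motif finders, such as MEME-style EM, usually also divide by a background probability. The function name `start_probabilities` is hypothetical.

```python
# Profile matrix from the slide: PROFILE[base][k] = P(base at motif position k + 1)
PROFILE = {
    "A": [0.1, 0.3, 0.1, 0.2, 0.2, 0.4, 0.3, 0.1],
    "C": [0.5, 0.2, 0.1, 0.1, 0.6, 0.1, 0.2, 0.7],
    "G": [0.2, 0.2, 0.6, 0.5, 0.1, 0.2, 0.2, 0.1],
    "T": [0.2, 0.3, 0.2, 0.2, 0.1, 0.3, 0.3, 0.1],
}
WIDTH = 8


def start_probabilities(seq: str) -> list[float]:
    """P(motif starts at i | seq) for every valid start i.

    Assumes a uniform prior over starts and ignores any background model."""
    scores = []
    for i in range(len(seq) - WIDTH + 1):
        p = 1.0
        # Multiply the profile probability of each base in the window
        for k, base in enumerate(seq[i:i + WIDTH]):
            p *= PROFILE[base][k]
        scores.append(p)
    total = sum(scores)
    # Normalize so the probabilities over all start positions sum to 1
    return [s / total for s in scores]
```

Take the hypothetical sequence TTCAGGCACCTT. Almost all of the probability falls on start position 2 (0-based), where CAGGCACC matches the most likely base at most motif positions.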

Multivariate HMM for Chromatin States

Figure: DNA with a transcription start site, an enhancer, and a transcribed region.

Observed chromatin marks, called based on a Poisson distribution.
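Calling a mark from read counts with a Poisson distribution can be sketched as a one-sided test. A mark is called present in a bin when its read count is improbably high under a Poisson background with rate λ. This mirrors the idea behind ChromHMM's binarization step. The function names, the background rate in the example, and the 10^-4 threshold are all assumptions for illustration.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) computed from P(X = i - 1)
        cdf += term         # running P(X <= i)
    return max(0.0, 1.0 - cdf)


def call_mark(count: int, background_rate: float, threshold: float = 1e-4) -> bool:
    """Call a mark present in a bin if its read count is significantly above
    a Poisson background (the default threshold is an illustrative assumption)."""
    return poisson_upper_tail(count, background_rate) < threshold
```

Assume a background of 2 reads per bin. A count of 2 is then not called, while a count of 20 is.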

Most likely hidden state per bin: 1 2 3 4 6 6 6 6 6 5 5 5

Emission parameters: the high-probability chromatin marks in each of states 1 to 6 (probabilities of 0.7 to 0.9 in the figure). All probabilities are learned from the data; observations are made in 200 bp intervals.
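Finding the most likely hidden state sequence is Viterbi decoding. The sketch below assumes multivariate Bernoulli emissions, where each binarized mark is present independently given the state; this is the emission model ChromHMM uses. The two-state, two-mark parameters in the example are made up and are not the learned values from the figure. Probabilities must lie strictly between 0 and 1, because the code works in log space.

```python
import math


def viterbi(obs, init, trans, emit):
    """Most likely hidden-state path for binarized multi-mark observations.

    obs:   list of tuples of 0/1 mark calls, one tuple per genomic bin
    init:  init[s] = P(first bin is in state s)
    trans: trans[s][t] = P(next bin in state t | current bin in state s)
    emit:  emit[s][m] = P(mark m called present | state s); marks are treated
           as independent given the state (multivariate Bernoulli)
    """
    n_states = len(init)

    def log_emit(s, calls):
        # Sum of per-mark Bernoulli log-likelihoods
        return sum(math.log(emit[s][m] if c else 1.0 - emit[s][m])
                   for m, c in enumerate(calls))

    # v[s]: best log-probability of any path ending in state s at the current bin
    v = [math.log(init[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back = []  # back[b][t]: best predecessor of state t at bin b + 1
    for calls in obs[1:]:
        ptr, new_v = [], []
        for t in range(n_states):
            best_s = max(range(n_states), key=lambda s: v[s] + math.log(trans[s][t]))
            ptr.append(best_s)
            new_v.append(v[best_s] + math.log(trans[best_s][t]) + log_emit(t, calls))
        back.append(ptr)
        v = new_v
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: v[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Example with hypothetical parameters:
- `init=[0.5, 0.5]`
- `trans=[[0.8, 0.2], [0.2, 0.8]]`
- `emit=[[0.9, 0.1], [0.1, 0.9]]`
- calls `[(1, 0), (1, 0), (0, 1), (0, 1)]`

The decoded path is `[0, 0, 1, 1]`.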

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Ernst, Jason and Manolis Kellis. "Discovery and characterization of chromatin states for systematic annotation of the human genome." Nature Biotechnology 28, no. 8 (2010): 817-825.

Modules IV and V: Evolution/phylogeny

Characterizing sub-threshold variants in heart arrhythmia

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Arking, Dan E. et al. "Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization." Nature Genetics 46, no. 8 (2014): 826-836.
• Focus on sub-threshold variants (e.g. rs1743292, P = 10^-4.2)
• Trait: QRS/QT interval. (1) Large cohorts, (2) many known hits, (3) well-characterized drivers

Evidence of Neanderthal → Human


Courtesy of Luna04 on Wikipedia. License: CC BY.

Structure of genetic code → evolutionary signatures

Distance matrix → Phylogenetic tree

Pairwise distance matrix over Human, Mouse, Rat, Dog, and Cat.
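One classical way to go from a distance matrix to a tree is UPGMA (average-linkage clustering), sketched below. UPGMA assumes a roughly constant rate of evolution; neighbor joining is a common alternative that relaxes that assumption. The function name `upgma` and the distances in the example are hypothetical and only illustrate the procedure. The function returns the topology as a Newick-style string and omits branch lengths.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA (average linkage) and return the tree topology
    as a Newick-style string (branch lengths omitted)."""
    # cluster id -> (subtree string, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # distances between clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters (i < j)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        # Size-weighted average distance from the merged cluster to each other cluster
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = (f"({ti},{tj})", ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

Suppose the hypothetical distances make Mouse and Rat closest (2), then Dog and Cat (3), with Human nearer the rodents (5) than the carnivores (6). The sketch then returns `((Dog,Cat),(Human,(Mouse,Rat)))`.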

‘Peeling’ for the P(D|B,T) term
• Sites j evolve independently: $P(D \mid B,T) = \prod_j P(x^j \mid B,T)$
• Branch independence: the branch above each node $i$ contributes a factor $P(x_i \mid x_{\mathrm{parent}(i)}, t_i)$
• Marginalize over the unobserved ancestral states: $P(x^j \mid B,T) = \sum_{\text{ancestral states}} P(x_{\mathrm{root}}) \prod_{i \neq \mathrm{root}} P(x_i \mid x_{\mathrm{parent}(i)}, t_i)$

Two types of gene-tree species-tree reconciliation

Coalescence
• Coalescent models of alleles in populations
• Deal with 1-to-1 orthologs
• Estimate divergence times, pop sizes, etc.
• Models move backward in time
• Cannot cope with duplication and loss

Duplication & Loss
• DL models of genes in species
• Deal with paralogous families
• Estimate birth/death rates
• Models move forward in time
• Cannot cope with incomplete lineage sorting

Biology primer

“Central dogma” of Molecular Biology





DNA: The double helix

Image by MIT OpenCourseWare.

DNA: the molecule of heredity

Figure: DNA replicating itself. Labels: nitrogenous bases; phosphate molecule; deoxyribose sugar molecule; weak bonds between bases; sugar-phosphate backbone; old and new strands.

Image by MIT OpenCourseWare.

DNA: chemical details

• Weak hydrogen bonds hold the two strands together
• This allows low-energy opening and re-closing of the two strands
• Anti-parallel strands
• Extension runs 5′ to 3′, with the tri-phosphate coming from the newly added nucleotide
• The only pairings are: A with T, and C with G
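The pairing rules and the anti-parallel orientation together determine the partner strand. Complement every base, then read the result in reverse so it is again written 5′ to 3′. A minimal sketch (the function name is hypothetical):

```python
# Pairing rules from the slide: A pairs with T, C pairs with G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand of a DNA sequence, written 5'->3'.

    Each base is complemented, then the order is reversed because the two
    strands are anti-parallel."""
    return strand.upper().translate(COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATTACG")` returns `"CGTAAT"`, and applying it twice gives back the original strand.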

DNA: the four bases

Alignment: all species/genes share common ancestry

© Various sources. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

© Neal Olander. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Tree of Life

© Neal Olander. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Extinctions: part of life

Phylogenetic tree showing archosaurs, dinosaurs, birds, etc. through geologic time removed due to copyright restrictions.

Mammal family tree removed due to copyright restrictions.

“Central dogma” of Molecular Biology





Chromosomes inside the cell

Prokaryote: DNA organized in a single chromosome. No nucleus.
Eukaryote: DNA organized in multiple chromosomes inside a nucleus. Mitotic division.

Figures by MIT OpenCourseWare.

DNA packaging

Image removed due to copyright restrictions. Please see: Figure 8-10 from Alberts, Bruce, and Martin Raff. Essential Cell Biology. New York, NY: Garland Publishing Inc., 1997. ISBN: 0815320450.

Courtesy of the National Institutes of Health; in the public domain.

Diversity of epigenetic modifications

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Epigenomics Roadmap across 100+ tissues/cell types

Diverse epigenomic assays:
1. Histone modifications
  • H3K4me3, H3K4me1
  • H3K36me3
  • H3K27me3, H3K9me3
  • H3K27/9ac, +20 more
2. Open chromatin:
  • DNase
3. DNA methylation:
  • WGBS, RRBS, MRE/MeDIP
4. Gene expression:
  • RNA-seq, Exon Arrays

Diverse tissues and cells:
1. Adult tissues and cells (brain, muscle, heart, digestive, skin, adipose, lung, blood…)
2. Fetal tissues (brain, skeletal muscle, heart, digestive, lung, cord blood…)
3. ES cells, iPS, differentiated cells (meso/endo/ectoderm, neural, mesench, trophobl)

Art: Rae Senarighi, Richard Sandstrom
Courtesy of Macmillan Publishers Limited. Used with permission. Source: Roadmap Epigenomics Consortium et al. "Integrative analysis of 111 reference human epigenomes." Nature 518, no. 7539 (2015): 317-330.

Deep sampling of 9 reference epigenomes (e.g. IMR90)

Courtesy of Ting Wang. Used with permission. UWash Epigenome Browser, Ting Wang.
Chromatin state + RNA + DNase + 28 histone marks + WGBS + Hi-C

Enhancers: H3K4me1, H3K27ac, DNase
Promoters: H3K4me3, H3K9ac, DNase
Transcribed: H3K36me3, H3K79me2, H4K20me1
Repressed: H3K9me3, H3K27me3, DNAmethyl

Marks shown: H3K4me3, H3K4me1, H3K27ac, H3K36me3, H4K20me1, H3K79me3, H3K27me3, H3K9me3, H3K9ac, H3K18ac

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Courtesy of Broad Communications. Used with permission.

Chromatin state annotations across 127 epigenomes

Courtesy of Anshul Kundaje. Used with permission.
• Reveal epigenomic variability: enh/prom/tx/repr/het

“Central dogma” of Molecular Biology





Genes control the making of cell parts

mRNA: The messenger

Figure: DNA (ATTACGGTACCGT) undergoes replication; transcription produces the complementary RNA (UAAUGCCAUGGCA); translation produces protein.

Image by MIT OpenCourseWare.
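The RNA in the figure is the base-by-base complement of the DNA shown, with U in place of T. The minimal sketch below reproduces that example. It assumes the DNA given is the template strand, written in the same alignment as the RNA, so no reversal is applied. The function name is hypothetical.

```python
# Template DNA base -> RNA base (RNA carries U where DNA would carry T)
TEMPLATE_TO_RNA = str.maketrans("ATCG", "UAGC")


def transcribe(template: str) -> str:
    """Return the RNA complementary to a DNA template, base by base.

    Assumption: the template is written aligned to the RNA as in the figure,
    so no reversal is applied."""
    return template.upper().translate(TEMPLATE_TO_RNA)
```

`transcribe("ATTACGGTACCGT")` returns `"UAAUGCCAUGGCA"`, matching the figure.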

From DNA to RNA: Transcription

Image removed due to copyright restrictions.Please see: Figure 7-9 from Alberts, Bruce, and Martin Raff. Essential Cell Biology. New York, NY: Garland Publishing Inc., 1997. ISBN: 0815320450.

From pre-mRNA to mRNA: Splicing

Image removed due to copyright restrictions. Please see: Figure 7-16 from Alberts, Bruce, and Martin Raff. Essential Cell Biology. New York, NY: Garland Publishing Inc., 1997. ISBN: 0815320450.

RNA can be functional

RNA structure: secondary and tertiary

Courtesy of SStructView.

Splicing machinery made of RNA

Image removed due to copyright restrictions. Please see: Figure 7-16 from Alberts, Bruce, and Martin Raff. Essential Cell Biology. New York, NY: Garland Publishing Inc., 1997. ISBN: 0815320450.

“Central dogma” of Molecular Biology





Proteins carry out the cell’s functions

Figure: DNA replication, transcription to RNA, and translation to protein.
Image by MIT OpenCourseWare.

Figure: DNA double helix with its sugar-phosphate backbone and base pairs.
Image by MIT OpenCourseWare.

Figure panels (images by MIT OpenCourseWare):
• Alpha-beta horseshoe: this placental ribonuclease inhibitor is a cytosolic protein that binds extremely strongly to any ribonuclease that may leak into the cytosol. A 17-stranded parallel β-sheet is curved into an open horseshoe shape, with 16 α-helices packed against the outer surface. It doesn't form a barrel although it looks as though it should. The strands are only very slightly slanted, being nearly parallel to the central 'axis'.
• Beta-barrel: some antiparallel β-sheet domains are better described as β-barrels rather than β-sandwiches, for example streptavidin and porin. Note that some structures are intermediate between the extreme barrel and sandwich arrangements.
• Helix-turn-helix: a common motif for DNA-binding proteins that often play a regulatory role as transcription factors.

Protein building blocks

FI From RNA to protein: Translation

Figure: tRNAs and the ribosome translate the mRNA 5' AUGCCGGGUUACUAA 3', joining the amino acids Met, Pro, Gly and Tyr in codon order. Image by MIT OpenCourseWare.
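Translation reads the mRNA one codon (three bases) at a time. The sketch below reproduces the slide's example, 5' AUGCCGGGUUACUAA 3' → Met-Pro-Gly-Tyr, assuming translation starts at the first AUG and ends at the first stop codon. `CODON_TABLE` is deliberately only an excerpt of the standard genetic code (the codons this example needs plus the three stop codons), so any other codon would raise a `KeyError`.

```python
# Excerpt of the standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "AUG": "Met", "CCG": "Pro", "GGU": "Gly", "UAC": "Tyr",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def translate(mrna: str) -> list[str]:
    """Translate from the first AUG, codon by codon, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, no protein
    peptide = []
    # Step in frame; the range bound keeps only complete codons.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

print("-".join(translate("AUGCCGGGUUACUAA")))  # Met-Pro-Gly-Tyr
```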

The Genetic Code

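The genetic code assigns each of the 4^3 = 64 codons to one of 20 amino acids or to a stop signal, so most amino acids are encoded by several codons. A minimal sketch that builds the standard code from the compact one-letter string used by NCBI translation table 1 and counts codons per amino acid; the variable names are illustrative.

```python
from collections import Counter

RNA_BASES = "UCAG"
# Standard genetic code (NCBI translation table 1): one letter per codon, with
# codons enumerated by first, second, then third base in U, C, A, G order.
# "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = [first + second + third
          for first in RNA_BASES for second in RNA_BASES for third in RNA_BASES]
GENETIC_CODE = dict(zip(codons, AMINO_ACIDS))

# How many codons map to each amino acid (or to stop)?
codons_per_symbol = Counter(GENETIC_CODE.values())
print(len(GENETIC_CODE), len(codons_per_symbol))  # 64 21
print(codons_per_symbol["L"], codons_per_symbol["M"], codons_per_symbol["*"])  # 6 1 3
```

Leucine, serine and arginine each have six codons, while methionine (AUG, also the usual start codon) and tryptophan (UGG) have only one.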

Summary: The Central Dogma

Figure: DNA (inheritance; replication) → transcription → RNA (messages) → translation → protein (reactions). Image by MIT OpenCourseWare.

Cellular dynamics and regulation

How cells move through this Central Dogma





Animal/Human gene regulation: One genome → Many cell types

ACCAGTTACGACGGTCA GGGTACTGATACCCCAA ACCGTTGACCGCATTTA CAGACGGGGTTTGGGTT TTGCCCCACACAGGTAC GTTAGCTACTGGTTTAG CAATTTACCGTTACAAC GTTTACAGGGTTACGGT TGGGATTTGAAAAAAAG TTTGAGTTGGTTTTTTC ACGGTAGAACGTACCGT TACCAGTA

Images of a heart, red blood cell, and a brain removed due to copyright restrictions. Image in the public domain.

Eukaryotic Gene Regulation

Cartoon depicting eukaryotic gene regulation removed due to copyright restrictions.

Diverse roles for regulatory non-coding RNAs

• Small RNA pathways (18-21 nt)
• Long non-coding RNAs (1000s of nt, many exons)

Regulation of Gene Expression

• Promoter
• Motifs
• Transcription factors
• RNA polymerase


Predicted motif drivers of enhancer modules

• Activator and repressor motifs consistent with tissues

Courtesy of Macmillan Publishers Limited. Used with permission. Pouya Kheradpour

Network components reveal functional modules

© Cold Spring Harbor Laboratory Press. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Zeitlinger, Julia et al. "Whole-genome ChIP–chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo." Genes & Development 21, no. 4 (2007): 385-390.

Systematic motif dissection in 2000 enhancers: 5 activators and 2 repressors in 2 cell lines

Figure 1: selection of activator and repressor motifs removed due to copyright restrictions. Source: Kheradpour, Pouya et al. "Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay." Genome Research 23, no. 5 (2013): 800-811.



Kheradpour et al., Genome Research 2013

Emerging properties of regulatory networks

Figures removed due to copyright restrictions.


From … to …

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.


Courtesy of Macmillan Publishers Limited. Used with permission. Source: Benner, Steven A., and A. Michael Sismour. "Synthetic biology." 6, no. 7 (2005): 533-543.

Over-expressing a single microRNA leads to a new wing


© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Brief intro to Human Genetics

The role of genetic alterations





Brief intro to human genetics

www.genome.gov/GWAStudies www.ebi.ac.uk/fgpt/gwas/

Figure in the public domain. Created by Darryl Leja and Teri Manolio, NHGRI; Tony Burdett, Dani Welter, and Helen Parkinson, EBI.

The power and challenge of disease-association studies

Slide credit: Luke Ward, Mark Daly
• Large associated blocks with many variants: fine-mapping challenge
• No information on cell type/mechanism; most variants are non-coding
→ Epigenomic annotations help find relevant cell types / nucleotides

The power of GWAS: reveal new disease genes

© ADAM, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Courtesy of Macmillan Publishers Limited. Used with permission. Source: Cho, Judy H. "The genetics and immunopathogenesis of inflammatory bowel disease." Nature Reviews 8, no. 6 (2008): 458-466.

rs11209026 (alleles A/G)
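Each GWAS hit, such as rs11209026 above, comes from a per-variant test of whether one allele is more frequent in cases than in controls. A minimal sketch of an allelic 2×2 chi-square test with an odds ratio; the allele counts are hypothetical (they are not rs11209026 data), the function name is illustrative, and real GWAS pipelines often use regression models with covariates instead.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Allelic 2x2 association test for one variant.

    Returns (odds_ratio, chi2, p_value) using Pearson's chi-square with
    1 degree of freedom and no continuity correction.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi2 = Z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Hypothetical counts: 500 cases and 500 controls, two alleles per person.
odds_ratio, chi2, p = allelic_association(600, 400, 500, 500)
print(f"OR={odds_ratio:.2f} chi2={chi2:.1f} genome-wide={p < GENOME_WIDE_ALPHA}")
# OR=1.50 chi2=20.2 genome-wide=False
```

With these counts p is roughly 7e-6: nominally small, yet far above the genome-wide threshold, which is why association studies need very large cohorts.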

Genome-wide association in schizophrenia with 40,000 cases


Courtesy of Macmillan Publishers Limited. Used with permission. Source: Ripke, Stephan et al. "Biological insights from 108 schizophrenia-associated genetic loci." Nature 511, no. 7510 (2014): 421.

Interpreting non-coding variants

• Disease-associated SNPs are enriched for enhancers in relevant cell types
• E.g., a lupus SNP in a GM enhancer disrupts a predicted Ets1 activator motif

Mechanistic predictions for top disease-associated SNPs

Figures removed due to copyright restrictions.
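A disease SNP can act by weakening (or creating) a transcription-factor motif, as with the Ets-1 activator motif on this slide. One common way to quantify this, sketched here under stated assumptions, is to score the reference and alternative alleles against a log-odds position weight matrix (PWM) and take the difference. The matrix below is a toy built from the five example words on the "Challenges in Computational Biology" slide, not the real Ets-1 motif, with a uniform 0.25 background and a pseudocount of 1; the G>C variant and all function names are hypothetical.

```python
import math

BASES = "ACGT"

def log_odds_pwm(instances, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from aligned, equal-length motif instances.

    Each entry is log2(p(base at position) / background); the pseudocount
    gives unseen bases a finite (negative) score.
    """
    pwm = []
    for column in zip(*instances):
        total = len(column) + len(BASES) * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def score(pwm, site):
    """Sum of per-position log-odds; higher means a better motif match."""
    return sum(col[base] for col, base in zip(pwm, site))

# Toy motif (NOT the real Ets-1 matrix).
toy_pwm = log_odds_pwm(["TCATGCTAT", "TCGTGATAA", "TGAGGATAT", "TTATCATAT", "TTATGATTT"])
reference = "TCATGATAT"                            # a best-scoring site for the toy motif
alternative = reference[:4] + "C" + reference[5:]  # hypothetical G>C SNP at position 5
delta = score(toy_pwm, alternative) - score(toy_pwm, reference)
print(f"{delta:.2f}")  # -1.32
```

A negative delta (here log2(2/5)) means the alternative allele matches the motif worse, the kind of motif disruption proposed for the lupus SNP.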

• Disrupt activator Ets-1 motif → loss of GM-specific activation → loss of enhancer → loss of HLA-DRB1 expression
• Creation of repressor Gfi1 motif → gain of K562-specific repression → loss of enhancer function → loss of CCDC162 expression

Chromatin state annotations across 127 epigenomes

Figures removed due to copyright restrictions.

Reveal epigenomic variability: enh/prom/tx/repr/het (Anshul Kundaje)

Characterizing sub-threshold variants in heart arrhythmia

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Focus on sub-threshold variants (e.g. rs1743292, P = 10^-4.2)

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Arking, Dan E. et al. "Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization." Nature Genetics 46, no. 8 (2014): 826-836.

Trait: QRS/QT interval: (1) large cohorts, (2) many known hits, (3) well-characterized tissue drivers

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Roadmap Epigenomics Consortium et al. "Integrative analysis of 111 reference human epigenomes." Nature 518, no. 7539 (2015): 317-330.

Linking traits to their relevant cell/tissue types

Figure panels: ES, liver, brain, digestive, heart, T cells, B cells.

Methylation differences are a causal component of AD

Methylation probes altered in AD are enriched in AD-associated SNPs

G → M → D
G → M ← D
G → D, M

Set-wise causality testing after removing the meQTL effect: AD predictive power reduced

Uncovering the molecular basis of the top obesity gene

Lean vs. obese

• ARID5B knock-down (KD): obesity; ARID5B overexpression (OE): anti-obesity
• IRX3, IRX5 knock-down: anti-obesity phenotypes; IRX3, IRX5 overexpression: pro-obesity phenotypes
• C-to-T motif rescue: anti-obesity phenotypes; T-to-C motif disruption: pro-obesity phenotypes

Model: beige → white adipocyte development


Shift therapeutic focus from brain to adipocytes

Challenges in Computational Biology


TCATGCTAT TCGTGATAA TGAGGATAT TTATCATAT TTATGATTT
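The short words on this slide illustrate regulatory motif discovery: similar but non-identical sequences recurring across the genome. Assuming they are already aligned instances of a single motif (finding them in the first place is the hard part), a minimal sketch summarizes them by per-position base counts and a consensus sequence, then measures how far each instance is from that consensus. Function names are illustrative.

```python
from collections import Counter

# The five example words, assumed here to be aligned instances of one motif.
words = ["TCATGCTAT", "TCGTGATAA", "TGAGGATAT", "TTATCATAT", "TTATGATTT"]

def position_counts(seqs):
    """Count each base at each position of equal-length, aligned sequences."""
    return [Counter(column) for column in zip(*seqs)]

def consensus(seqs):
    """Most frequent base per position; ties are broken in A, C, G, T order."""
    return "".join(
        max("ACGT", key=lambda base: counts[base]) for counts in position_counts(seqs)
    )

def mismatches(seq, other):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(seq, other))

motif = consensus(words)
print(motif)                                  # TCATGATAT
print([mismatches(w, motif) for w in words])  # [1, 2, 2, 2, 2]
```

No single word equals the consensus, yet each is within two mismatches of it, which is why motifs are usually modeled as count or probability matrices rather than fixed strings.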
