<<

The Two Versions of the & MEDICINE_Molecular

Sequenced, yes – but decoded? We still don’t fully understand our human genetic make-up. The answer to many of its mysteries lies in the diploid of the genome, which contains two sets of : one inherited from the father and one from the mother. Margret Hoehe from the Max Planck Institute for in Berlin has sequenced both versions of the separately for the first time and, in the process, discovered that individuals are even more individual than we thought.

TEXT CATARINA PIETSCHMANN

hen Margret Hoehe ex- occurred again in particular propor- plains her work, you tions – a phenomenon that can be ex- quickly realize how log- plained only by the existence of two ical her approach is, sets of chromosomes. how obvious – so much W so that you wonder why someone else FROM FATHER didn’t do what she has done a long AND MOTHER time ago. Maybe the reason is that, when one is too close to something, it’s Even if the crosses are not always as easy to lose sight of the big picture. clearly predictable in , we have Looking through a magnifying glass, one thing in common with many oth- one might see a craggy grey landscape er : every individual has a ge- and have no idea what it is. Taking a nome that consists of two different sets few steps back, you want to shout to of chromosomes. As a result, all genes them, “Put the magnifying glass down! arise in duplicate: one on the maternal Then you’ll see that what you’re look- and the other on the pa- ing at is a full-grown .” ternal one. refer to the differ- Margret Hoehe is one of the few ent parental sequence versions of each people who did just that: took a few chromosome as . Chromosomes at the steps back to get a better view. She re- A less well known fact is that, as far moment of division: The human genome consists flected on the genetic principles, with- back as 1957, American biophysicist of two sets of chromosomes, out which ’s laws of the demonstrated in a sim- one from the father and inheritance of traits can’t be under- ple experiment that it makes a differ- one from the mother. There stood, and without which no biology ence whether two different are thus two versions of each chromosome. A special lesson is complete. Mendel crossed ho- of a are located on the same chro- method creates a mozygous that had either mosome or whether they reside on op- characteristic band pattern purple or . The plants in posite chromosomes. But Benzer’s dis- for each chromosome, so the daughter generation all had purple covery was forgotten. that the pairs can be flowers, because the gene for purple So wouldn’t it make sense to look at identified on the basis of their comparable size, form color was dominant. In the third the two versions of the genome sepa-

Photo: SPL-Agentur Focus and color. generation, purple and white flowers rately? Up to now, only “one” genome

1 | 12 MaxPlanckResearch 65 BIOLOGY & MEDICINE_Molecular Genetics

Cis configuration Trans configuration

Defective Intact Defective Defective

The distribution of mutations in a pair of chromosomes can dictate health and : In the cis configuration, two mutations arise in one and the same gene form, and the associated becomes perturbed. The second copy and the protein originating from it remain unaffected. In the trans configuration, in contrast, both gene copies are mutated and produce two damaged .

has been read, a sort of composite of done practically nothing but write In the years that followed, medicine the maternal and paternal sequences. since August to get all of the analyses clearly made progress, but the great There are historical reasons for this: down on paper. breakthrough never came. This may When the Human was Why is it important to know how change now with Margret Hoehe’s hap- carried out in the 1990s, it was almost certain mutations are distributed be- lotype approach. technically impossible to analyze the tween the two parts of the genome? The DNA alphabet consists of only two versions of the approximately three “Because, it can mean, for instance, the the letters ACGT. They stand for the billion base pairs in the human genome difference between and no can- base molecules adenine, cytosine, gua- separately. It simply would have been cer,” she says. “If there are two muta- nine and thymine. Around 90 percent too labor- and cost-intensive. tions – for example, of the BRCA1 risk of the genetic differences between peo- gene associated with – ple consist of sites where one letter of GENETIC MAKE-UP IN DUPLICATE they need not necessarily cause the dis- the – that is, one base – ease.” There are two possibilities: the was replaced by another one. The re- Habits that have become entrenched mutations affect both forms of the gene searchers also refer to these base ex- are difficult to change. “Until recently, (trans), in which case the chances of changes as single polymor- even high-ranking experts thought of developing breast cancer are very high; phisms (SNPs). “Everyone has an aver- the genome as a ‘one-fold’ object, not or they are located on only one of the age of one base exchange every 1,700 a ‘two-fold’ one,” says Margret Hoehe, two chromosomes (cis), so the person letters,” explains Hoehe. head of the , Haplo- will also have a completely “healthy” types & Genetics of Complex Disease form of the gene. CHANGES IN THE GENETIC CODE group at the Max Planck Institute in Cis or trans? The answer to this Berlin, and the first to se- question can mean the difference be- The information for the production of quence the genome of a human being tween good health or illness – and even proteins from individual amino acids – a German – entirely separately based or death. It can also raise hope. Af- is contained in the order or sequence on the haplotypes. She analyzed both ter all, a “healthy” form of the gene can of the bases. Thus, for example, the sets of chromosomes completely and at be passed on to the next generation. replacement of a cytosine by a thy- a previously unattained level of detail. A good ten years have passed since mine can lead to the use of a different The approach that she developed with and – ini- . This can result in the pro- her colleagues to do this was singled tially colleagues and later competitors duction of a malfunctioning protein. out by the journal NATURE as one of the – announced the decoding of the hu- When bases in the control regions of most promising methods of 2011. genome side by side with Bill Clin- the DNA are exchanged, biochemical “Just a minute, we’ll be ready soon. ton in the Oval Office. The end of igno- processes in the body may be blocked It must be possible ... !” Hoehe is look- rance, as Venter said. What a triumph! or accelerated. ing for space on her desk for two Once we know the genetic code, we will To sequence the genome, the re- teacups and her visitor’s notes. Not an soon also know the function of all searchers isolate DNA from white blood easy task, as the desk is almost entirely genes. We will understand the origin of cells, shred it using ultrasound, and sort covered with towers of publications and be able to heal them. So the fragments by size. The DNA snip-

and data printouts. The researcher has far, so good. pets are then fixed to a surface using Graphic: Art 4

66 MaxPlanckResearch 1 | 12 Library in the freezer: The human genome fragments inserted into bacterial DNA molecules are stored at minus 80 degrees. Each of the aluminum-foil-packed genome libraries originates from one person. The of 100 people fill an entire freezer.

small anchor molecules and copied. The reading of the base sequence is car- ried out simultaneously for millions of DNA fragments. Today, it takes just a few days to read a complete genome us- ing a second-generation machine. “The very latest machines reach a throughput of up to three hun- dred billion bases,” says Hoehe. “The early ones could manage only 900 bas- es.” Billions of these sequence reads are stored on a supercomputer. Then the work on the vast jigsaw puzzle begins: the computer compares the base se- quences of the individual snippets and overlaps them to form bigger pieces un- til the DNA is complete. But how does the software know which piece of the puzzle belongs where? “It constantly compares the base sequence of the snippets with those of the reference sequence,” explains Hoehe, “and this is the critical point.” The human genome of 2001, which all scientists still use as a reference, is an artificial product, a cocktail of the DNA of at least six different people. But each person is unique, so this reference se- quence as such doesn’t exist naturally. It doesn’t reflect the genetic code of any particular living person. This was the intention, but many experts now view it as an error, as this makes it an unre- liable roadmap for exploring the causes of health and disease. “Because some- times we don’t know whether the ex- changed bases we find occur frequent-

Photo: Nina Lüth ly or rarely in reality.” > Simultaneous sequencing of the from several super-pools

1 2

Graphic above: Sequencing of the haplotypes of a genome: (1) The DNA of the chromosomes is mechanically sheared into smaller fragments. Fragments with a length of 40,000 base pairs are inserted into bacterial transport DNA and these so-called fosmids (representing now haploid DNA fragments) are reproduced in . Approximately 1.44 million fosmids are distributed on three plates, each containing 96 wells, representing the library of an individual. Each well contains a pool of 5,000 fosmids and each pool contains, as the result of dilution, almost without exception fragments of one . For reasons of efficiency, three pools are always combined to form a super-pool. These pools are decoded using the very latest sequencing technology (2) and the individual fosmid sequences are assembled (3). The relevant base exchanges (perpendicular lines) are identified in the mixture of fragments from the paternal and maternal genome (4). This allows the individual sequences to be assigned to one of the two haplotypes and combined to form the chromosomal haplotypes through identification and tiling of overlapping fragments (5). In this way, the genetic code of the chromosome pairs is being deciphered (6). below: Bioinformaticians put their heads together: Thomas Hübsch and Gayle McEwen discuss how they can analyze particularly complex DNA sections by haplotype.

A total of around 2,800 individual hu- published in 2007, and was followed this reason, the genomes of Africans man genomes have been sequenced recently by an Aborigine genome. The contain many old, common variants, as throughout the world – all based on a genes of Archbishop Tutu from the well as rarer and newer ones. composite of the two chromosome Bantu ethnic group were analyzed last The greater variability of African sets. Craig Venter started it all when he year at the same time as the DNA of genomes is also evident in their orga- sequenced his own genome. Years lat- the members of four other South Afri- nization: “In Africans, the shortest er, he tried to separate his genome into can ethnic groups. length of the DNA blocks, in which its haplotypes, but only partially suc- According to the findings, Africans the exchanged bases are inherited over ceeded. , the co-discov- are far more variable than Europeans. generations, is approximately 10,000 erer of the DNA double helix, also Modern humans developed over 100,000 bases on average,” says Margret Hoe- knows the contents of his genome. years ago in Africa, and their genomes he. The corresponding figure in Euro-

The first genome of a Chinese man was had a long time to intermix there. For peans is between 20,000 and 120,000 Graphic: Art 4 Science; photos: Nina Lüth (2)

68 MaxPlanckResearch 1 | 12 BIOLOGY & MEDICINE_Molecular Genetics

Haplotype 1

Haplotype 2

3 4 5

6

bases and more. “Obviously, just a exchange patterns and are not cut in enable the development of the long- small group left its continent of origin, exactly the same way – for example, at awaited individualized therapy in med- Africa, to populate the rest of the world, bases 128 and 40,200, or at 14,000 and icine. The door to a completely new and few people followed them. Conse- 55,030 – the computer can easily iden- research field is being opened here. quently, was signifi- tify, during overlapping, whether a cantly reduced – a genetic bottleneck, snippet belongs to part A or B of the ge- UNKNOWN REGULATION so to speak.” nome. However, whether A originates OF GENES from the father or mother can be estab- THE AFRICAN GENOME AS lished only through further compari- Inheriting good genes – this expression A REFERENCE with at least one parent. is now taking on a new molecular In this way, it was possible to re- meaning. “It sometimes seems to me as Up to now, all human genome research solve the two versions of almost all of though my maternal and paternal has been based on the reference se- the German subject’s 17,861 genes that genes antagonize each other,” says Mar- quence, a genome with European roots. code for proteins. Of those, 90 percent gret Hoehe laughing. No, seriously, This was due to the simple fact that the arise in two different molecular forms. there’s a grain of truth in this. “You just countries involved in the 2001 Human The two chromosome sets differ in have to think about it again. Perhaps Genome Project were the US, France, around two million sites. Comparison the genes are regulated differently at Germany, England and Japan. “Bio- with the reference sequence also shows different stages in life? Perhaps one ver- markers, receptor variants, disease re- that 60 to 70 percent of the given mo- sion of the chromosome is completely gions and drug targets – none of these lecular gene forms exist as such only in switched off?” We know almost noth- can be applied to the African genome this individual. Thus, we are far more ing about this. The buffer capacity of on a one-to-one basis.” The availability individual than was believed to be the the system appears to be high. There of an African to en- case. Moreover, the scientists identified are numerous small intermediary stag- able the development of more effective 159 genes with two or more potential es between healthy and sick. “If one drugs for people with African genetic disease-predisposing mutations, 86 of doesn’t work, the other one does. But roots is thus long overdue. which occur in only one gene copy. when too many things coincide, at To be able to sequence both chro- What could the relationship be- some point, everything is derailed.” Ul- mosome sets of a human genome sep- tween these two molecular gene forms timately, it is probably a question of the arately, Margret Hoehe and her team be? Do they cooperate or do they coun- cumulative effect of all of the different had to develop a new molecular genet- teract each other? Or do they alternate? individual predispositions. ic method along with all of the neces- Which one is dominant – and why? A In order to obtain a better catalog of sary tools. An important disease arises only when one defective the diversity of genetic variants, the difference between their method and gene overrules its intact counterpart. was followed the traditional technology lies in the Hoehe is certain that a new, more in- up in 2008 by the 1,000 Genomes Proj- fact that the DNA segments are not 25 depth view of biology will arise. And it ect, the aim of which is to decode 2,500 to 40 bases long, but around 40,000. Be- could actually provide the missing key genomes. Whereas many researchers all

Graphic: Art 4 Science cause they display characteristic base to understanding the genome, and over the world are cooperating on this

1 | 12 MaxPlanckResearch 69 BIOLOGY & MEDICINE_Molecular Genetics

left: Stefanie Palczewski (left), Sabrina Schulz (center) and Eun-Kyung Suk (right) in front of a high-throughput sequencing system for DNA analysis. right: Each of the device’s two flow cells can accommodate over 600 million DNA fragments.

project and contribute partial sequenc- So she included a chapter on the inter- and methods,” she says, smiling. The es to it, Hoehe and her team, consisting individual variability of drugs in her genes for the opiate and adrenergic re- of a handful of scientists, have com- doctoral thesis. “There was an uproar,” ceptors, which she had previously stud- pletely sequenced, haplotype-resolved she recalls, and she is still infuriated by ied pharmacologically, were discovered and analyzed an entire genome in a the fact that she was forced to remove soon after that. “Using methods that matter of a few months. She is aiming the chapter. But the idea itself could would be dismissed as today, to have completed twelve further indi- not be deleted. Today, particular atten- we found the first variations, and then vidual haploid-analyzed genomes by tion is paid in medical studies to the searched the genome for regions for de- the end of the year. differences between responders and pression and .” Individuality – this theme runs non-responders. through Margret Hoehe’s life as a re- The next step was to carry out fam- A NEW APPROACH TO searcher. She studied medicine and ily studies. Hoehe wanted to examine GENOME ANALYSIS psychology in Munich and completed the possible genetic roots of the indi- her doctorate on “the effects of opi- vidual markers she had discovered for At the time, it was still believed that ates on neuroendocrine and psycho- susceptibility to diseases and the effec- only a few genes in the genome were logical parameters” in 1986. During a tiveness of drugs. To this end, she went mutated, and that these mutations study on depressive and schizophren- to the National Institute of Mental had serious effects – a mistaken belief, ic patients, she quickly noticed that Health in the US in 1987. “The human as we now know today. Very few dis- some did not respond at all to drugs; genome project was already on the ho- eases are actually due to a single gene. and neither did a proportion of the rizon.” When her boss asked her wheth- And regarding schizophrenia: In 2009, healthy subjects. “The traditional mean er she wanted to continue working in the journal NATURE published a study value method ultimately leveled out the area of neuroendocrinology or with according to which thousands of mu- all of the individual differences.” What DNA, without hesitating, she opted for tations on many genes can cause the needed to be examined, instead, is the genetic material. disease, and one of these mutations how the individual’s DNA influences “I knew absolutely nothing about it alone causes only a very slight in-

the effect of drugs. and had to teach myself the basic tools crease in the risk of developing it. The Photos: Nina Lüth (2)

70 MaxPlanckResearch 1 | 12 NEW YORK TIMES referred to the discov- sequencing of the entire haploid ge- ery as “A Pearl Harbor in schizophre- nomes of human individuals. nia research.” All of this work took a great deal of Back to the early 1990s: Hoehe soon energy and dedication. “It wasn’t easy reached the conclusion that the entire to swim against the tide for such a approach to genetic research would long time.” But that’s changing now. have to be reversed. Research should And Margret Hoehe is already focusing not start with the complex disease and on individuality again, this time from look for the associated genes; instead, the perspective of a personalized ge- it should search the genome for inter- nome project. Illness can best be stud- individual DNA sequence differences ied where it occurs – in the patient. and observe their effect on genes in vi- The project will initially focus on a . She compiled a corresponding proj- breast cancer patient whose haploid ect application and sent it to George genome will be studied in the blood Church, one of the leading genetic re- and in the tumor tissue. The two wom- searchers. He immediately liked the en have known each other for a long idea, and a short time later, Hoehe was time and are writing a joint diary on accepted as a postdoctoral researcher the project, which may one day be- With her method of sequencing a genome at Harvard Medical School. She devel- come a book. separately based on the two haplotypes, oped the Multiplex PCR Sequencing An unusual aspect of this project is Margret Hoehe has laid the foundations for technology there, which enabled the si- that the patient, a journalist with med- . multaneous reading of 20, and later 50, ical training, is involved in the study DNA snippets. That Kary B. Mullis, No- and is being given access to her genet- bel laureate and inventor of the poly- ic information. Anonymized research is merase chain reaction, had said that it generally the rule in medicine. “People GLOSSARY could “never” work was something she make their samples available to science didn’t learn until later. voluntarily. We should give them some- Chromosome Hoehe returned to Germany in thing in return for this,” says Margret The genetic material of living organisms 1995 and established a research group Hoehe. “This topic is already being hot- with a is distributed on varying numbers of chromosomes. A chromosome at the Max Delbrück Center for Molec- ly debated in the US.” Patients who consists of the thread-like DNA molecule ular Medicine in Berlin – yet another want to know and have the necessary and various proteins. The biggest chromo- adventure. She had the machines for background knowledge should also be some in humans is around 250 million the research flown in from the Machine given the information about their ge- base pairs long, while the smallest contains only around 50 million base pairs. Shop at Harvard University. She rese- nomes, following the motto: “My ge- The chromosome with the most genes quenced the µ-opiate receptor gene in nome is mine!” has around 3,000 genes, while the male 250 patients and healthy subjects, and Rumor has it that Craig Venter is has only around 100. found first haplotypes that were associ- taking prophylactic drugs since he Diploidy ated with addictive disorders. This was found out about his genome. The Ger- Diploid organisms have a double set of one of the two initial studies that man subject whose genome was se- chromosomes. For example, humans have 46 chromosomes, which occur in the form showed that it is important to consider quenced can’t do this. Because his of 23 chromosome pairs. Each homologous the haplotypes of a gene and not just blood sample came from a biobank, it chromosome pair contains one chromo- randomly selected single mutations. In was anonymized for ethical reasons. All some from the mother and one from the 2002, Margret Hoehe moved to the that is known about him is that, when father. The 23rd chromosome pair contains either two XX chromosomes (women) Max Planck Society. One of her main he gave the blood sample, he was 51 or one XX and one XY projects there was the sequencing of years old and in good health. The sci- (men). the haplotypes of the most complex re- entists would have both good and bad Haplotype gion of the human genome: the histo- news for him: he also has two muta- The two parental sequence versions of each compatiblity complex that serves as the tions in the breast cancer gene BRCA1. chromosome. Because humans have two blueprint for the immune system’s an- Fortunately, they are both on the same (variant) forms of each chromosome, the chromosome exists in two haplotypes. tibodies, and that is also rich in disease copy of the gene. This means he also

Photo: MPI for Molecular Genetics genes. This eventually gave rise to the has a completely healthy version.

1 | 12 MaxPlanckResearch 71