Interactive Computer Program: Packaging DNA into Chromosomes

Xiaoli Yang (1), Yifan Cai (1), and Charles Tseng (2)
(1) Department of Electrical and Computer Engineering, (2) Department of Biological Sciences
Purdue University Calumet, Hammond, IN, USA

Abstract - As part of the interactive program for teaching and learning genetics, the module on packaging DNA into chromosomes involves the simultaneous coordination of eyes, mind, and hands for visualization, cognitive feedback, and manipulation, respectively. Computer modeling of various chromatin structures during packaging is based on OpenTK-OpenGL on the .Net Platform, which is coupled with an inquiry-based content design to enhance the efficiency of teaching and learning. The prototype has been successfully tested in a genetics class at Purdue University Calumet. It should also be applicable to a number of undergraduate biology courses.

Keywords: DNA, Chromosomes, Modeling, Computer Program

1 Introduction

From its central role in real-life forensic investigations to being the basis of major biotechnological applications in medicine, agriculture, and the environment, DNA-based genetics is an essential discipline in the life sciences. As fascinating as the subject is, however, teaching and learning genetics has often been fraught with difficulty [1-3]. Confronted with intricate molecular structures, complex packaging schemes, and elaborate mechanisms of action, both teacher and student are frequently at a loss: the teacher in how to convey this material in a clear and understandable way, and the student in how to assimilate all the information usefully. To be sure, the abstract and intangible nature of much of the material is the source of the problem.

Traditional methods of teaching genetics, employing classroom lectures, textbook readings, homework assignments, and laboratory exercises, have not proven to be very effective [4, 5]. Recently, efforts have been made to integrate computer visualization technologies into pedagogy to enhance the learning process [6-8]. Current computer-based tools, however, do not stress cognitive feedback in their designs. The present paper describes an innovative approach to teaching and learning genetics, in which students can visualize a real-time, interactive DNA model, as well as actively control the dynamic process of packaging DNA into a compact metaphase chromosome.

The objectives of the program are to 1) develop, as part of a web-based interactive program, a DNA packaging module suitable for a wide range of college courses and 2) serve as a model for STEM (science, technology, engineering, and mathematics) education via distance learning.

2 Model Development

Models of various structures were developed based on the following system: OpenTK-OpenGL on the .Net Platform. The Open Tool Kit (OpenTK) is a free project that allows developers to use OpenGL, OpenGL|ES, OpenCL, and OpenAL APIs from a managed language (e.g. VB.NET). Features include:
• Written in cross-platform C# and usable by all managed languages (F#, Boo, VB.Net, C++/CLI).
• Consistent, strongly typed bindings, suitable for RAD development.
• Usable standing alone or integrated with Windows.Forms, GTK#, and WPF.
• Cross-platform binaries that are portable on .Net and Mono without recompilation.
• Wide platform support: Windows, Linux, and Mac OS X, with iPhone port in process.

2.1 3D model of double helix DNA

DNA is modeled as a double helix. The model is specified by l, the length of the helix, r, the radius of the helix, and w and h, the width and the thickness of one strand of the double helix, respectively (Fig. 1). These parameters generate a group of points, which are used to construct the DNA model. The points are linked together to form a sketch of the double helix. After shading the sketch, a 3D DNA model is created. The double helix model is calculated during runtime based on the equations below:

$$x = t, \qquad y = r \sin(\alpha t + \beta), \qquad z = r \cos(\alpha t + \beta), \qquad 0 \le t \le l \tag{1}$$

where t is the length variable along the x-axis and r is the radius of the helix. $\alpha$ is the angle increment, which controls the smoothness of the helix; its value was chosen experimentally to make the model smooth. $\beta$ determines the initial angle; its values were chosen experimentally to generate double helical shapes. The above equations determine the Cartesian coordinates x, y, and z in 3D space. By calculating the position of all the points, a helical line can be generated (Fig. 2, left). The quadrupling of the line (Fig. 2, right) is generated by replicating the original line four times.

Fig. 1. Parameters for a helix (perpendicular views)

Fig. 2. Left: helical line; Right: doubling of line

After further duplicating the strand with a different $\beta$ value and shading the sketch, two DNA strands of different colors are created (Fig. 3).

Fig. 3. DNA double helix structure with shading

2.2 3D model of nucleotide bases

We use line segments (cuboids) of different colors to represent DNA bases. The points that form the DNA strands (determined above) are used to calculate the points representing the line segments (bases). Assume that p1 (x1, y1, z1) and p2 (x2, y2, z2) are corresponding points on different strands generated by the same t value but different $\beta$ values, p3 (x3, y3, z3) and p4 (x4, y4, z4) are the points next to p1 and p2, respectively, w is the length of the side, pc1 is the midpoint between p1 and p2, and pc2 is the midpoint between p3 and p4 (Fig. 4).

Fig. 4. Parameters used to calculate the position and shape of bases.

Then all 8 points needed to describe a cuboid can be calculated as follows.

[Equation (2): the expressions for the eight cuboid vertices in terms of p1, p2, p3, p4, pc1, pc2, and w are not recoverable from the source text.]

The result is a complete DNA model (Fig. 5).

Fig. 5. Screen snapshot of DNA model from the program

2.3 3D model of histone octamer

The histone octamer is represented by an elongated ball, which is described by the following equations:

$$x = r \sin\varphi \cos\theta, \qquad y = r \sin\varphi \sin\theta \tag{3}$$

together with a two-case expression for z in terms of $\cos\varphi$ [the two cases of the z expression are not recoverable from the source text]. These equations determine the Cartesian coordinates x, y, and z in 3-dimensional space, where r is the radius of the histone; $\varphi$ is the angle between the diameter and the z-axis, $\varphi \in [0, \pi]$; $\theta$ is the angle between the projection of the diameter on the xy-plane and the x-axis, $\theta \in [0, 2\pi]$; and h is the height of the elongated ball (Fig. 6).

Fig. 6. Spherical coordinate system

These points generate a sketch of the histone octamer. Shading the sketch with a color completes the model (Fig. 7).
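As an illustration of how the parametric equations of Section 2 turn into point lists, the helical line of Section 2.1 can be sketched as follows. This is a minimal sketch and not the authors' OpenTK implementation: Python is used only for brevity, the function name `helix_points` is hypothetical, the symbols `alpha` (angle increment) and `beta` (initial angle) are assumed names for the two angle parameters, and every numeric value below is an arbitrary assumption rather than a value from the paper.

```python
import math

def helix_points(length, radius, alpha, beta, step=1.0):
    """Sample one helical line along the x-axis.

    Assumed form: x = t, y = r*sin(alpha*t + beta), z = r*cos(alpha*t + beta),
    with t running from 0 to `length` in increments of `step`.
    """
    points = []
    count = int(length / step)
    for i in range(count + 1):
        t = i * step
        angle = alpha * t + beta
        points.append((t, radius * math.sin(angle), radius * math.cos(angle)))
    return points

# Two strands that share every parameter except the initial angle beta.
# All values here are illustrative assumptions.
strand_a = helix_points(length=60.0, radius=1.0, alpha=math.radians(6.0), beta=0.0)
strand_b = helix_points(length=60.0, radius=1.0, alpha=math.radians(6.0),
                        beta=math.radians(135.0))

print(len(strand_a), strand_a[0])
```

Consecutive points of each list would then be linked into a sketch and shaded, as the text describes. A smaller `alpha` (or a smaller `step`) gives more points per turn and therefore a smoother curve.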

Fig. 7. Left: sketch of histone octamer; Right: shaded histone octamer

2.4 DNA wrapping - formation of core nucleosome

In this step, DNA is simplified as a line, which can be wrapped around the histone octamer. Fig. 8 shows how to calculate the position of the binding points.

Fig. 8. Calculation for DNA-protein binding position

The histone octamer is projected on the xz-plane as a circle. Assume that O is the center of the circle, P is outside the circle, r is the radius of the circle, dx and dy are the differences between O and P in x and y components, respectively, d is the distance between P and O, Pb is the binding point, $\alpha$ is the angle between line PO and the vertical line, and $\beta$ is the angle between line PO and the line PbO. Then $\alpha$ and $\beta$ can be calculated as follows:

[Equations (4) and (5): one angle is given by an arcsin expression (4) and the other by an arctan expression (5); their arguments are not recoverable from the source text.]

Finally, Pb is represented by (x, y), where x and y are given by a sine term and a cosine term, respectively, of an angle that involves 270° (6) [the full expressions are not recoverable from the source text].

When DNA binds to a histone octamer, it starts to wrap around the octamer. After Pb is determined, the points on the spiral nearest to it can be calculated (Fig. 9).

Fig. 9. Wrapping of DNA around histone octamer. Left: sketch; Right: shaded

2.5 Camera position adjustment

In the interactive module, users can view the model from different angles by dragging the mouse. Our program implemented this function by adjusting the camera position. The camera was located on the surface of a sphere with the target at the center of the sphere, so that the distance between camera and target never changes. In other words, the apparent size of the target remains the same, so that the model size does not change. The camera position (a point on the surface of the sphere) is described by $\varphi$, $\theta$, and r, where r is the radius of the sphere. While the value of r never changes, $\varphi$ and $\theta$ are variable. Changing $\varphi$ and $\theta$ changes the camera position, and they are changed by moving the mouse. Moving the mouse produces component values dx and dy along the x- and y-axis, respectively. Therefore, mapping dx and dy to $\varphi$ and $\theta$ is an effective way to adjust the camera position.

3 The Program Content Design

The design of the module focuses on inquiry-based methods with cognitive feedback and interactive experiences as important components [9]. In every section, a question is followed by observations and measurements, hands-on experiments, and conclusions. In each of the learning steps, dynamic models of DNA molecules, chromatin fibers, and metaphase chromosomes are presented for interaction through visualization, cognition, and operation. Completion of the program requires comprehension of the entire concept and thus ensures the success of the learning experience. The computer-based content is summarized below:

3.1 DNA and chromosomes in prokaryotes and eukaryotes

Inside the cell, DNA molecules are packaged, with the help of proteins, into thread-like structures called chromosomes. In prokaryotes (such as bacteria), the chromosomal DNA, when open, is often circular. The total length of a bacterial chromosomal DNA (e.g., E. coli DNA) may be a thousand times longer than the cell that contains it. Little is known about the packaging of bacterial DNA, although a few major DNA regions anchored by proteins at specific sites in the cell have been noted.

In eukaryotes (such as animals and plants), DNA molecules are linear. Each eukaryotic species has a fixed number of chromosomes. For diploid species (species with 2 sets of chromosomes, one from each parent), chromosomes are paired, so that the total number of chromosomes is always even.

In humans, for example, the father provides a set (also called a genome) of 23 different chromosomes (from sperm), while the mother provides the other set (genome) of 23 different chromosomes (from egg). Thus, each of our somatic (body) cells contains 46 chromosomes (23 pairs).

3.2 How long is our DNA?

The DNA of each human genome is about 3.2 billion ($3.2 \times 10^{9}$) deoxyribonucleotides long. Each deoxyribonucleotide is 3.4 Å (0.34 nm), making the total length of human genomic DNA $(0.34\ \text{nm})(3.2 \times 10^{9}) \approx 1$ m (meter) per genome. Since there are 2 genomes per human cell, the total length of DNA per human cell is 2 × 1 m = 2 m.

Assuming that an adult human body contains about 50 trillion ($50 \times 10^{12}$) cells, the total length of DNA in the human body is $(50 \times 10^{12})(2\ \text{m}) = 100 \times 10^{12}$ m (100 trillion meters of DNA per human). The Sun is $150 \times 10^{9}$ m (150 billion meters) from Earth. How many times can you stretch the DNA from the Earth to the Sun? $(100 \times 10^{12}) / (150 \times 10^{9}) \approx 666$ times (Fig. 10). The distance between the Earth and the Moon is about $3.84 \times 10^{8}$ m. Can you calculate the number of times your DNA can stretch from the Earth to the Moon?

Fig. 10. The total length of human DNA

3.3 Can our genomic DNA fit into a nucleus?

Let's examine the size of the human genome in the cell. In the nucleus (G1 phase) of each cell in the human body are 46 chromosomes. Since a set of 23 chromosomes constitutes a genome, there are 2 genomes per nucleus. Each genome contains approximately $3 \times 10^{9}$ base pairs, so there are $6 \times 10^{9}$ base pairs per nucleus in the G1 phase. Each nucleotide of the base pair measures approximately $3.4 \times 10^{-10}$ m in length. Therefore, the total length of DNA in each nucleus is:

$(2)(3 \times 10^{9}\ \text{nucleotides})(3.4 \times 10^{-10}\ \text{m/nucleotide}) \approx 2$ m.

However, the diameter of an average nucleus is $10 \times 10^{-6}$ m, making the total length of DNA in the nucleus 200,000 times longer than the diameter of the nucleus: $(2\ \text{m}) / (10 \times 10^{-6}\ \text{m}) = 200{,}000$.

How can such a long strand of DNA fit in such a small nucleus? Clearly, the DNA must be folded. Let us examine how much space the DNA occupies in the nucleus if it is somehow folded so that it will fit. DNA exists in a double helix, which can be approximated by a cylinder with diameter $20 \times 10^{-10}$ m. Therefore, using the unrounded length of 2.04 m, the volume the DNA occupies is:

$\pi r^{2} h = (3.14159)(10 \times 10^{-10}\ \text{m})^{2}(2.04\ \text{m}) \approx 6.4 \times 10^{-18}\ \text{m}^{3}$

Assume that the diameter of an average nucleus is approximately $10 \times 10^{-6}$ m, making the volume of the spherical nucleus:

$(4/3)\pi r^{3} = (4/3)(3.14159)(5 \times 10^{-6}\ \text{m})^{3} = 5.24 \times 10^{-16}\ \text{m}^{3}$

Consequently, the fraction of the nucleus occupied by DNA is:

$[(6.4 \times 10^{-18}\ \text{m}^{3}) / (5.24 \times 10^{-16}\ \text{m}^{3})] \times 100 = 1.22\%$

There is clearly enough room in the nucleus for the DNA and for its activities, including transcription, replication, packaging, and unpackaging. This leads to our main topic: How is the DNA packaged into chromosomes?

3.4 Levels of DNA packaging

DNA packaging can be considered at the following levels:

Level 1 - Double helix: The double helical DNA molecule has a width of about 20 Å (2 nm) (Fig. 11).

Fig. 11. Screen snapshot of double helical DNA molecule

Level 2 - 10 nm chromatin fiber: The DNA of eukaryotic cells is tightly bound to basic (positively charged) proteins known as histones. This nucleoprotein complex is called chromatin. The basic structural unit of chromatin is the nucleosome. A nucleosome consists of a small segment of DNA wrapped around histones. The core nucleosome particle consists of two molecules each of the core histones (H2A, H2B, H3, and H4), forming a histone octamer (Fig. 12), around which is wrapped approximately 146 base pairs of DNA. The core particle is stabilized by a fifth histone called H1 (also called a linker histone). DNA between the core nucleosome particles is called linker DNA. The core nucleosome particle plus the linker DNA is about 200 base pairs long, as evidenced by digestion with the enzyme micrococcal nuclease.

Fig. 12. Screen snapshot of histone octamer (blue: H3; green: H4; yellow: H2A; red: H2B)

The linear chromatin fiber at this level is about 10 nm in diameter (Fig. 13 a). This represents the state of most chromosomal DNA during interphase and is known as euchromatin. Genes that are actively transcribed (with momentary detachment of histones) are in this less condensed state.

Level 3 - 30 nm chromatin fiber: Some of the 10 nm chromatin fibers can be packed into 30 nm fibers. To do this, the H1 histones, each attached to a core nucleosome, interact with each other, turning inwards and forming a new spiral structure known as a solenoid or 30 nm chromatin fiber (Fig. 13 b). Every 6-8 nucleosomes constitute one turn of the new spiral. In this state, the chromatin is tightly packed and is referred to as heterochromatin, a state in which DNA is genetically inactive (no transcription or replication).

Higher levels of packaging - looping: The above levels of DNA packaging mainly describe the G1 phase of the cell cycle. If the cell is destined for division, the G1 phase is followed by the S phase, where a DNA molecule replicates semiconservatively to form two identical DNA molecules before being packaged into chromatin fibers and entering the G2 phase. In the early stages of mitosis (or meiosis), the two replicated DNA molecules (in the form of chromatin fibers) continue to condense, and the 30 nm fibers are folded into loops (Fig. 14).
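The back-of-the-envelope estimates in Sections 3.2 and 3.3 are easy to check numerically. The following is a minimal sketch that uses only the figures quoted in the text; the function and constant names are hypothetical, and Python is used only for convenience (it is not part of the original program).

```python
import math

# Figures quoted in Sections 3.2 and 3.3.
DNA_PER_CELL_M = 2.0      # two genomes of about 1 m each (rounded, Section 3.2)
CELLS_PER_BODY = 50e12    # about 50 trillion cells
EARTH_SUN_M = 150e9       # Earth-Sun distance in meters
EARTH_MOON_M = 3.84e8     # Earth-Moon distance in meters

def stretch_count(total_length_m, distance_m):
    """How many times a length spans a given distance."""
    return total_length_m / distance_m

def cylinder_volume(radius_m, height_m):
    return math.pi * radius_m ** 2 * height_m

def sphere_volume(radius_m):
    return 4.0 / 3.0 * math.pi * radius_m ** 3

# Section 3.2: total DNA length per human body and Earth-Sun spans.
total_dna_m = DNA_PER_CELL_M * CELLS_PER_BODY
sun_trips = stretch_count(total_dna_m, EARTH_SUN_M)

# Section 3.3: DNA as a thin cylinder inside a spherical nucleus.
dna_length_m = 2 * 3e9 * 3.4e-10                      # unrounded length per nucleus
dna_volume = cylinder_volume(10e-10, dna_length_m)    # DNA radius 10 x 10^-10 m
nucleus_volume = sphere_volume(5e-6)                  # nucleus radius 5 x 10^-6 m
fraction_percent = 100 * dna_volume / nucleus_volume

print(f"Earth-Sun spans: {sun_trips:.0f}")
print(f"DNA volume: {dna_volume:.2e} m^3")
print(f"Nucleus volume: {nucleus_volume:.2e} m^3")
print(f"Fraction of nucleus occupied: {fraction_percent:.2f}%")
```

Calling `stretch_count(total_dna_m, EARTH_MOON_M)` answers the Earth-to-Moon question posed in Section 3.2. The DNA volume rounds to the quoted $6.4 \times 10^{-18}\ \text{m}^{3}$ only when the unrounded length of 2.04 m is used; with the rounded 2 m it comes out closer to $6.3 \times 10^{-18}\ \text{m}^{3}$.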

Fig. 13 a. 10 nm chromatin fiber

Fig. 13 b. 30 nm chromatin fiber

Fig. 14. Screen snapshot of looped chromatin fibers

The chromosome reaches its highest condensed state at metaphase. Condensation is the result of the further folding of the 30 nm fibers into different loops until a 700 nm structure is reached (width of a chromatid). Therefore, a metaphase chromosome, which consists of two sister chromatids and is as thick as 1400 nm, is large enough to be clearly seen under a light microscope (Fig. 15). Two specialized structures, centromere and telomere, can be seen with special staining techniques (Fig. 16). Chromatin fibers decondense through unpackaging and stretching as the cell returns to the G1 phase.

Fig. 15. Screen snapshot of DNA in sister chromatids of metaphase chromosome

Fig. 16. Screen snapshot of metaphase chromosome with centromere and telomeres

4 Conclusions

This research uses computer technology to enhance life science education. It is part of the interactive program for genetics education [10-14]. The success of this interactive computer program relies heavily on 1) innovative content design that stimulates cognitive feedback through coordinated hands-on interactions at key points of the learning process, and 2) efficient computer programs that are capable of demonstrating complex concepts and processes. The computer learning modules involve the simultaneous coordination of eyes, mind, and hands for visualization, cognitive feedback, and manipulation, respectively. In this way, complex concepts are scaffolded, reinforcing the learning process. From a pedagogical standpoint, science is essentially taught with the scientific method, with questions, observations, experiments, and analysis. Science learned in this way is more meaningful and more memorable.

The DNA packaging module, utilizing these methods, is one of a series of modules for learning genetics that can be adopted for a number of biology courses at both college and high school levels.

5 References

[1] Tibell, L. A. E. and Rundgren, C. J. (2010) “Educational challenges of molecular life science: characteristics and implications for education and research” CBE - Life Sci. Educ. 9: 25-33.

[2] Huang, P. C. (2000) “The integrative nature of biochemistry: challenges of biochemical education in the USA” Biol. Educ. 28: 14-17.

[3] Bahar, M., A. H. Johnstone, and M. H. Hansell (1999) “Revisiting learning difficulties in biology” J. Biol. Educ. 33: 84-86.

[4] Biggs, J. (1996) “Enhancing teaching through constructive alignment” Higher Education 32: 347-364.

[5] Sheley, S. M. and T. R. Mertens (1990) “A Survey of Introductory College Genetics Courses” J. Heredity 81: 153-156.

[6] Sved, J. A. (2010) “Genetics Computer Teaching Simulation Programs: Promise and Problems” http://www.genetics.org/cgi/content/full/gentics.110.116640/DC1

[7] Lowe, R. (2004) “Interrogation of a dynamic visualization during learning” Learning and Instruction 14: 257-274.

[8] Tsui, C. Y. and D. F. Treagust (2001) “Teaching and learning reasoning in genetics with multiple external representations” www.aare.edu.au/01pap/tsu01462.htm

[9] Inquiry Based Learning: www.thirteen.org/edonline/concept2class/inquiry/

[10] Yang X., G. Rong, C. Tseng (2011) “Modeling of DNA Replication” The 2011 International Conference on Modeling, Simulation and Visualization Methods, p. 146-149, Las Vegas, July 18-21, 2011.

[11] Wu W., X. Yang, C. Tseng (2011) “Effective Algorithms for Altering Human Chromosome Shapes” The 2011 International Conference on Modeling, Simulation and Visualization Methods, p. 257-261, Las Vegas.

[12] Yang X., R. Ge, Y. Yang, H. Shen, Y. L and C. Tseng (2009) “Interactive Computer Program for Learning the Genetic Principles of Segregation and Independent Assortment through Meiosis” The 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2009), p. 5842-5845, Minneapolis.

[13] Wu W., X. Yang, B. Chen, Z. Zhao, J. Lacny and C. Tseng (2009) “Computer Based Simulation of Chromosome Abnormality” The 2009 World Congress in Computer Science, Computer Engineering and Applied Computing (WORLDCOMP 2009), p. 359-363, Las Vegas.

[14] Yang, X., D. Wen, Y. Cui, X. Cao, J. Lacny and C. Tseng (2009) “Computer Based Karyotyping” The Third International Conference on Digital Society (ICDS 2009), p. 310-315, Cancun, Mexico.