The Journal of Undergraduate Research in Natural Sciences and Mathematics California State University, Fullerton

Volume 16 | Spring 2014 Marks of a CSUF Graduate from the College of Natural Sciences and Mathematics

Graduates from the College of Natural Sciences and Mathematics:

Understand the basic concepts and principles of science and mathematics.

Are experienced in working collectively and collaborating to solve problems.

Communicate both orally and in writing with clarity, precision and confidence.

Are adept at using computers to do word processing, prepare spreadsheets and graphs, and use presentation software.

Posses skills in information retrieval using library resources and the Internet.

Have extensive laboratory/workshop/field experience where they utilize the scientific method to ask questions, formulate hypothesis, design experiments, conduct experiments, and analyze data.

Appreciate diverse cultures as a result of working side by side with many people in collaborative efforts in the classroom, laboratory and on research projects.

In many instances have had the opportunity to work individually with faculty in conducting research and independent projects. In addition to attribtes of all NSM students, these students generate original data and contribute to the reseach knowledge base.

Have had the opportunity to work with very modern, sophisticated equipment including advanced computer hardware and software. Dimensions

Dimensions: The Journal of Undergraduate Research in Natural Sciences and Mathematics is an official publication of California State University, Fullerton. Dimensions is published annually by CSUF, 800 N. St. College Blvd., Fullerton, CA 92834. Copyright ©2012 CSUF. Except as otherwise provided, Dimensions: The Journal of Undergraduate Research in Natural Sciences and Mathematics grants permission for material in this publication to be copied for use by non-profit educational institutions for scholarly or instructional purposes only, provided that 1) copies are distributed at or below cost, 2) the author and Dimensions are identified, and 3) proper notice of copyright appears on each copy. If the author retains the copyright, permission to copy must be attained directly from the author.

About the Cover

The national forests are one of California’s most beautiful features. The dimensions of California’s giant sequoias deem them to be the largest living thing on earth. The bark is extremely thick, and dynamic in depth and texture. The cover was shot at Sequoia National Forest, containing the world’s most massive trees, with widths that reach as wide as 40 feet, and reach as tall as 274 feet. Acknowledgements

Editor-In-Chief Special Thanks To Chris Baker - Geology President Mildred García, Dean David Bowman, and Assistant Dean Amy Mattern Editors for their support and dedication to Dimensions. Kim Conway - Biology Neha Ansari - Chemistry/Biochemistry O'hara Creager - Geology Kali (Duke) Chowdhury - Mathematics Phillipe Rodriguez - Physics

Advisor Amy Mattern - Assistant Dean for Student Affairs

Graphic Design Niv Ginat - Layout Editor Carose Le - Cover Designer

College of Natural Sciences & Mathematics Dr. David Bowman - Interim Dean Dr. Mark Filowitz - Associate Dean Dr. Kathryn Dickson - Chair Department of Biological Science Dr. Christopher Meyer - Chair Department of Chemistry and Biochemistry Dr. Phillip Armstrong - Chair Department of Geological Sciences Dr. Stephen Goode - Chair Department of Mathematics Dr. James M. Feagin - Chair Department of Physics

College of the Arts Dr. Joseph H. Arnold - Dean Maricela Alvarado - Interim Assistant Dean for Student Affairs John Drew - Graphic Design Professor Biology Chemistry & Biochemistry

Determinants Of Prevalence Of Bot Infestation In Development of Micelles Containing Self-Immolative 8 Thirteen-Lined Ground Squirrels In Colorado Shortgrass Steppe 40 Polymers for Applications in Drug Delivery Kim Conway Neha F. Ansari, Gregory I. Peterson, Andrew J. Boydston

Comparison of Worm Cast Application Methods on Soil The Synthesis of Anhydrous Cyclopropanol 14 Water Retention and Plant Growth 41 Jennifer Castillo Tiffany Duong

The Intrinsically Disordered Protein Stathmin Exists as an Oligomer Effects of Sorted and Unsorted Food Waste Diets on Worm 46 in Solution, as Measured by Static Light-scattering and Dipolar 19 Cast Quality and Quantity Broadening EPR Spectroscopy. Calvin Lung Ashley J. Chui, Katherina C. Chua, Jesus M. Mejia, and Michael D. Bridges

Effects Of Surface Drip Irrigation Compared To Sub 23 Surface Irrigation On The Yield Of Peppers Michaelis-Menten Kinetics of TEV Protease as Observed by Miriam Morua 47 Time-domain EPR Spectroscopy Estefania Larrosa and Michael D. Bridges

Morphological and Genetic Identification of 29 California Pipefishes (Syngnathidae) C.A. Rice, D.J. Eernisse, and K.L. Forsgren Geology

Geochemical Correlation of Basalts in Northern Deep Springs 30 Investigating Defense Responses of Nicotiana 48 Valley, California, by X-Ray Fluorescence Spectroscopy (XRF) Benthamiana Involving the 14-3-3 Gene Family Using Aaron Justin Case Virus-induced Gene Silencing (VIGS) Jennifer Spencer Possible Shell-Hash Tsunami Deposit at the Los Penasquitos Marsh, 56 San Diego County, CA. Jeremy Cordova

Paleotsumani Research at the Seal Beach Wetlands, 57 Seal Beach, CA D’lisa O. Creager, Brady P. Rhodes, Matthew E. Kirby, and Robert J. Leeper

DIMENSIONS 5 Mathematics

Michaelis-Menten Kinetics of TEV Protease as Observed by Permutation Test in Microarray Data Analysis 58 Time-domain EPR Spectroscopy 69 Mirna Dominguez Garcia, Dylan J., Kirby, Matthew E., Rhodes, Brady P., Leeper, Robert J.

Metasomatism of the Bird Spring Formation Near Slaughterhouse Mathematica and Clifford Projective Spaces 60 Springs, California 73 Andrew Halsaver Taylor Kennedy

Understanding the Orientation of the Sierra Nevada Frontal Fault Biomimetic Pattern Recognition in Cancer Detection 65 System in the Vicinity of Lone Pine and Independence, CA 78 Leonila Lagunes Garrett Mottle

Origin of Debris Flow Deposits on Starvation Canyon Fan,Death A Statistical Analysis of Orange County K9-12 Mathematics 66 Valley, California 87 Achievement Data Kelly Shaw Nathan Robertson, Soeun Park, Susan Deeb

Landslide Hazards in the Pai Valley, Northern Thailand 67 Garcia, Dylan J., Shaw, K., Rhodes, B.P., Chantraprasert, S. Physics

Petrographic and geochemical analysis of the Summit Gabbro Scattered Light Measurements for Advanced LIGO’s 68 and associated granitoids of the Kern Plateau, Southern Sierra 96 Output-Mode-Cleaner Mirrors Nevada, California Adrian Avila Alvarez Elizabeth White

Biographies 98 Dimensions 2014 Editors and Authors

DIMENSIONS 6 Determinants Of Prevalence Of Bot Fly Infestation In Thirteen-Lined Ground Squirrels In Colorado Shortgrass Steppe Department of Biological Science, California State University, Fullerton

Kim Conway Advisor: Dr. Paul Stapp

Abstract Larvae of parasitic grow inside and feed upon tissues of (Family Oestridae), for example, spend their entire larval cycle inside wildlife species and therefore depend upon healthy hosts. Bot fly mammalian hosts (Catts 1982) and benefit when their host is healthy ( sp.) larvae were discovered on thirteen-lined ground (Slansky 2007). Bot flies of the genus Cuterebra are host-specific squirrels (Ictidomys tridecemlineatus) during long-term monitoring (Catts 1967), and typically infest small North American rodents, studies in northern Colorado. Although bot flies are common including chipmunks and tree squirrels (Catts 1967; Jacobson et al. parasites of small mammals, there were no records of infestation 1961). Although bot fly infestation is usually not fatal, it can cause of this squirrel species and the species of bot fly was unknown. I energy loss, malnutrition, and secondary infection at the site of larva examined prevalence and load of bot flies in ground squirrels trapped emergence (Catts 1982; Slansky 2007). Higher loads of bot flies may in shrub and grassland habitats in spring and summer between 1999- also interfere with a host’s ability to forage, escape predators, and 2011 to determine host characteristics and environmental factors reproduce (Catts 1982; Slansky 2007). that influence patterns of infestation. I also investigated possible Host traits such as age and sex may affect the rate of bot fly effects of prescribed fires in grasslands on infestation prevalence. infestation (Jacobson et al. 1961). In addition, habitat structure may Lastly, I used molecular genetics techniques to sequence the COI influence parasitism rates; for example, Blair (1942) found that bot fly gene of preserved bot fly samples in an attempt to identify the parasitism of rodents was higher in shrub-dominated areas than in species. Infested squirrels were rarely found on shrub sites and grasslands. Prescribed burning, a common habitat management during spring trapping. Across all summers, average prevalence of technique in grasslands that alters habitat structure (Converse et al. infestation in grasslands was 7.9%. Infested squirrels had 1-7 bots, 2006), can also affect the abundance of both rodent hosts and their with 44.0% having only 1 larva. Infestation did not vary greatly with parasites. Working in Oklahoma, Boggs et al. (1991) found that bot host sex, age, or weight. Prevalence was significantly higher (33.0%) fly parasitism in small mammals was higher in unburned areas than in burned sites one year after a prescribed fire, and remained burned areas. consistently higher on burned sites than on unburned sites. My In 1999-2011, thirteen-lined ground squirrels (Ictidomys results suggest that fires may alter the environment in ways that tridecemlineatus) that were live-trapped a part of long-term increase the susceptibility of squirrels to infestation or the ability of monitoring studies in northern Colorado (Stapp et al. 2008) were flies to infest hosts. COI gene sequences revealed the bot fly species found to be infested with bot flies. Previous studies have determined to be most closely related to Cuterebra fontinella. infestation rates of bot flies on small mammals including the white-footed mouse (e.g. Clark and Kaufman 1990), gray squirrels Introduction (Jacobson et al. 1961), and chipmunks (Bergstrom 1992), but there Parasites harm their hosts by feeding on nutrients from inside the are no reports of bot fly parasitism of thirteen-lined ground squirrels, host’s body (Catts 1982). Although parasites can sometimes directly despite the fact that these squirrels are widespread and common in kill their host, many parasites require healthy hosts to survive Great Plains grasslands. In fact, a widely cited review by Catts (1982) and successfully reproduce, and therefore, may not pose a direct specifically stated that Cuterebra infest small rodents (including tree health risk unless infestation loads are high (Slansky 2007). Bot flies squirrels), but not ground squirrels. For example, none of the 179

DIMENSIONS 7 thirteen-lined ground squirrels trapped in Kansas were parasitized by Webs consisted of 12 100-m transects arranged in a spoke-like bot flies (Clark and Kaufman 1990), and none of the 46 thirteen-lined fashion, with extra-large Sherman live-traps every 20 m along each ground squirrels trapped in Alberta, Canada, had bot flies either transect and two traps placed at the center, for a total of 62 traps. (Gummer et al. 1997). Aside from a single anecdotal reference (Lugger Traps were baited with peanut butter and oats and shaded with PVC 1896, cited by Sabroksy 1986), there are no published records to to prevent mortality from heat. Traps were set at dawn and checked indicate which species of bot fly infests this squirrel. Morphological at mid-morning for four consecutive days in each session. Field traits usually can be used to identify species (Baird 1972) using a crews collected data on sex, age, weight, and physical condition of taxonomic key. However, keys for identifying bot flies are based on squirrels, including presence of parasites, and marked each morphological traits of the adult fly, which live only a short time individual with a colored Sharpie marker to distinguish recaptures once they emerge from the soil, and are very difficult to observe and from newly caught individuals. Individuals were released unharmed capture (Catts 1982). at their capture locations. I used data from squirrels only on their For my project, I sought to better understand the interactions first date of capture. Prior to analysis, juvenile and subadult squirrels between thirteen-lined ground squirrels in northern Colorado and their were combined into one age class (young-of-year; YOY). In some bot fly parasites. My specific objectives were to: 1) analyze existing data instances, field crews recorded evidence of multiple bot fly warbles from 1999-2011 on bot fly infestation of ground squirrels to identify on each host; however, because the exact numbers were not always host or environmental factors that may influence prevalence and recorded, squirrels were categorized as having no flies, one bot fly, intensity; 2) attempt to determine the species of bot fly on squirrels by or more than one bot fly. analyzing samples of the late-instar larvae collected from ground squir- I used two metrics of bot fly infestation. First, prevalence was rels using molecular genetics methods; and 3) ex tract live, late-instar the proportion of individual hosts that were infested with bot flies, larvae from ground squirrels, rear them to adulthood in the laboratory, and was calculated for different sex, age, and weight classes as the and confirm the species of bot fly using adult morphological traits. number of bot flies divided by the total number of individual hosts in a particular class. Second, for hosts that had at least one bot fly, Methods intensity was calculated as the proportion of infested hosts that had Data Analysis: one fly versus more than one fly. Intensity was calculated by dividing Information on rates of bot fly infestation of thirteen-lined ground the number of hosts that had multiple fly larvae by the total number squirrels from 1999-2011 was collected during live-trapping studies of infested hosts; i.e., with at least one. conducted as part of the Shortgrass Steppe- Long Term Ecological In late June 2013, thirteen-lined ground squirrels were live-trapped Research project the (SGS-LTER). The study site was the Central for four consecutive mornings in an attempt to capture squirrels Plans Experimental Range (CPER), approximately 14 km north of infested with bot fly larvae so that larvae could be extracted, reared Nunn, CO. Vegetation is characterized as shortgrass steppe, which to adulthood, and identified using a morphometric key. is dominated by two perennial, warm-season short grasses (blue grama Boutelloua gracilis, buffalograss Buchloe dactyloides), although some areas also have large, woody shrubs, especially four-wing saltbush Laboratory Methods: Atriplex canescens. Live-trapping was conducted in spring (May/June) Late-instar larvae were collected in July 2011. Three larvae were and summer (July) each year on six 3.14-ha webs (three grassland, dissected to obtain interior body tissue and cuticle tissue for DNA three shrub sites) from 1999-2011. In addition, squirrels were trapped extraction. Cuticle tissue was cut in 2-mm x 2-mm squares. A clear, on three additional grassland webs that were burned in the previous tube-like interior section of the larva was used, as it appeared to be autumn to examine the effect of prescribed fire on squirrel populations. the only visibly intact structure. Additional interior tissue was added Bot fly prevalence was measured at one, two, three, and four years to ensure approximately 0.025 grams of tissue was collected. DNA post-fire. One site (26NWSE) was burned in autumn 2007 and trapped was extracted from interior and cuticle tissue of three bot fly samples in July each year from 2008 to 2011. Other grassland webs that were using DNeasy Blood & Tissue kit and following the manufacturer’s burned in 2008, 2009 or 2010 were trapped only once in July 2011. recommendations (QIAGEN®, Valencia, CA).

DIMENSIONS 8 DNA was extracted following the manufacturer’s Tissue in a given year. Prevalence did not vary much by weight class, protocol, with the adjustment of incubating samples overnight and although the largest numbers of bots were on YOY (< 100 g; Fig. 3). eluting with 100 µl instead of 200 µl. Three samples were eluted a Bot fly prevalence of female adults (8.2%) was similar to that of male second time. DNA concentration (ng/µl) was measured using Nano- adults (7.0%). Likewise, prevalence rates of female YOY (7.4%) did Drop spectrophotometer. Three interior tissue samples were chosen not differ significantly from male YOY (7.1%; Fig. 4; X2 = 0.15, d.f. = 3, for further analysis because they had the highest DNA concentration P = 0.985). Intensity of infestation remained similar among different of all purified samples. sex and age classes (X2 = 2.49, d.f. = 3, P = 0.480), although adult Polymerase chain reaction (PCR) was used to amplify 657 bp of females had a higher intensity (75%) than other groups (Fig. 5). For the cytochrome oxidase subunit I (COI) gene in three interior samples hosts where the actual number of bots was recorded, the number of with the highest concentration of DNA. The COI gene is a region of larvae ranged from 1 to 7. mitochondrial DNA that is universally used in species identification. Combining all years of the prescribed fire study (2008-2011), The total reaction volume was 50 µl, including 1 µl of forward primer, bot fly prevalence was consistently higher on squirrels from the 1 µl of reverse primer, 3 µl of DNA sample, and 45 µl of Platinum three webs that were burned in 2007 than from those on unburned Taq Master Mix (Invitrogen™ Life Technologies). Samples were run grassland webs between 2008-2011 (Fig. 6; paired t = 3.91, d.f. = 3, in a BioRad t-100 Thermocycler at 94ºC for 3 min, followed by 35 P = 0.030). Combining results from webs based on the number of cycles of 94ºC for 30 s, 49ºC for 40 s, and 72ºC for 1 min, with a final years post-fire, prevalence of bot flies was significantly higher (33%) extension of 72ºC for 5 min. Genus-specific primers (LC01490f and on sites one year after being burned (Fig. 7; X2 = 11.98, d.f. = 3, P HC02198r) and species-specific sequences were provided by = 0.007). After an increase in prevalence during the first year after Dr. Brian Weigmann and Brian Cassell from North Carolina State burning, prevalence returned to the same rate (11-13%) by two years University. PCR products were visualized on a 2% agarose gel using after a fire, which was the same rate as that on unburned grassland gel electrophoresis. webs during the same time period (12%; Fig. 7). PCR products were sequenced by SEQUETECH (Mountain View, CA). Consensus sequences were assembled using bidirectional sequences using CodonCode Aligner software (CodonCode Corpo- ration) and aligned using clustW in MEGA 5.1 (Kumar et al. 2011). 0.08 Grassland 216 Phylogenetic analysis was conducted with maximum likelihood, Shrubs maximum parsimony, and neighbor-joining models using MEGA 5.1. 0.06 Results Analysis Of Long-Term Infestation Patterns 0.04 Between 1999 and 2011, the earliest date that bot fly warbles were

detected on squirrels was 16 May, but only six observations of bot fly Prevalence infestation were noted before 3 June. Prevalence of bot flies was 0.02 172 significantly lower in May and June (1.8% and 0.3%, respectively) 399 than in July (Fig. 1; X2 = 21.75, d.f. = 2, P < 0.001). In addition, squirrels 239 in grassland webs had a much higher prevalence of infestation (5.3%) 0 than those in shrub areas (Fig. 1; X2 = 16.02, d.f. = 1, P < 0.001). For Spring Summer this reason, further analyses were only conducted on data from Season grassland webs trapped in July. Long-term data (1999-2011) from grassland webs in the summer Figure 1. Prevalence of bot fly infestation in thirteen-lined ground squirrels by habitat type in long-term trapping webs in northern Colorado in spring and summer were highest in grasslands in summer (7.5%). indicated that prevalence was highest in 2008 (Fig. 2), with little Total number of hosts examined is shown above bars (Season: X2 = 21.75, d.f. = 2, P < 0.001, Habitat: variation in other years. On average, 7.5% of squirrels were infested X2 = 16.02, d.f. = 1, P < 0.001).

DIMENSIONS 9 0.30 0.80 8 0.25 5 0.60 25 0.20 22 22 25 5 0.15 0.40 1 bot 8 >1 bot 0.10 0.20 Prevalence 0.05 0.00 0.00 Adult YOY Adult YOY

1998 2000 2002 2004 2006 2008 2010 2012 Proportion of infested host Female Male Sex-Age-Class Year

Figure 2. Prevalence of bot fly infestation in thirteen-lined ground squirrels in Figure 5. Intensity of bot fly infestation in thirteen-lined ground squirrels in long-term long-term grassland trapping webs in northern Colorado during summer from grassland trapping webs in northern Colorado in the summer, based on data from 1999-2011. Number of hosts examined in a given year ranged from 31 to 101, 1999-2011. Number of infested hosts is shown above bars (X2 = 2.49, d.f. = 3, P = 0.480). with 59 hosts in 2008, the year with the highest prevalence.

0.14 33 0.12

0.10 23 161 24 233 0.08 68 65 104 0.06 18 23 23 25

Prevalence 0.04 Prevalence

0.02

0.00 <45 50 60 70 80 90 100 110 120 130 140 145+ Weight Class (Midpoint)

Figure 3. Prevalence of bot fly infestation in thirteen-lined ground squirrels in long-term grassland trapping webs in northern Colorado in summer, based on data from 1999-2011. Figure 6. Prevalence of bot fly infestation of thirteen-lined ground squirrels in three Number of hosts examined in each weight class is shown above bars. Values on the x-axis grassland trapping webs that were burned in autumn 2007, compared to three long-term are the midpoints of the 10-g weight classes. grassland webs that were never burned (2008-2011; paired t = 3.91, d.f. = 3, P = 0.030).

0.09 0.40

97 0.35 76 0.08 0.30 296 0.25 350 71 0.20 0.07 55 16 0.15 172 45

Prevalence 0.10

Prevalence 0.06 0.05 0.00 Unburned 1 2 3 4 0.05 Adult YOY Adult YOY Number of years since fire Female Male Figure 7. Prevalence of bot fly infestation in thirteen-lined ground squirrels in northern Sex-Age-Class Colorado on trapping webs in July as a function of the number of years since autumn Figure 4. Prevalence of bot fly infestation in thirteen-lined ground squirrels in the summer prescribed fires. Six sites were trapped one, two, and three years post-fire, and three sites in long-term grassland trapping webs in northern Colorado, based on data from 1999-2011. were sampled four years post-fire, from 2008-2011. Data from unburned webs were from Number of hosts examined is shown above bars (X2 = 0.15, d.f. = 3, P = 0.985). three grassland webs trapped in July from 2008-2011 (X2 = 11.98, d.f. = 3, P = 0.007).

DIMENSIONS 10 Molecular Genetics Analysis The COI consensus sequences revealed the three samples to be found lower levels of bot fly infestation of small mammals in burned 0.21% different from each other, and 4.9% different from its closest tallgrass prairie in Oklahoma. Boggs et al. (2007) argued that fire might relative C. fontinella (Fig. 8). have killed eggs and larvae belowground, or that the removal of litter by burning made the microclimate unsuitable for developing 77 Sample 1 100 Sample 2 larvae. My results suggest that, in shortgrass steppe, where there 99 Sample 3 is no significant litter layer, fires may cause environmental changes

81 Cuterebra f. fontinella that increase the susceptibility of squirrels to bot fly infestation. For Cuterebra bajensis example, burning of grasses may alter the vegetation near burrows

90 Cuterebra leupsuculi in a way that makes it easier for squirrels to come in contact with bot Cuterebra atrox fly eggs. In addition, fires may have altered the environment in a way Cephenemyia phobifer that increased bot fly survival, such as warming the soil. Having a Cuterebra austeni warmer soil temperature during the time larvae pupate may lead to 99 Cuterebra tenebrosa 100 a higher number of bot flies emerging in the spring. Lastly, autumn 69 Cuterebra emasculator fires may alter the suitability of habitat for oviposition by adult flies. Because bot fly larvae could not be identified morphologically, Figure 8. Phylogenetic tree indicating C. fontinella as the closest relative to the unidentified bot fly samples (Sample 1-3). Maximum Likelihood bootstrap consensus tree using General Time Reversible Model and molecular genetic testing of preserved larvae was used needed to discrete Gamma distribution was generated using MEGA 5.1 software. attempt to identify the species of fly. My results indicate that, with a difference of only 0.21% between sequences, the three bot fly larvae Discussion are most likely of the same species (intra-specific variations are less My study offers the first detailed description of infestation of thirteen- than 1.0%, Hebert et al 2003). Based on the maximum likelihood lined ground squirrels with bot flies. Prevalence was significantly trees, the larvae appear to be most closely related to C. fontinella, higher in summer (July) than late spring (May, June), and in grasslands although there is a 4.9% difference between the COI gene sequence than in shrub areas, which differs from the account of Clark and and that of C. fontinella. It is unclear, however, whether this difference Kaufman (1990) who found parasitism was higher in shrub areas is large enough to consider it a new species. Although the widely used than grasslands in tallgrass prairies. Prevalence did not vary much by difference in gene sequence between species is 2.0%, studies have host sex or age. Of the squirrels infested with bot flies, adult females indicated that this boundary may not be accurate in determining tended to have higher loads that other sex and age classes, although species divergence in , for which inter-specific variation can this result was not significant. Prevalence varied from 2.0% to 25.4%, range from 1.0-30.7% (Cognato 2006). Examination of sequences of with an average of 7.5%, across the 13 years of sampling in grasslands. other genes, e.g. 12S, may help to resolve the species identification. Prevalence was unusually high in 2008, although the reason for this The final part of my project was to extract live bot fly larvae from spike is unknown. April-June 2008 was the third of three consecutive squirrels, bring them back to the laboratory, and rear larvae to years with very low spring precipitation and represented a period adulthood to observe morphological traits. Trapping success in 2013 during which squirrel population numbers on the site had been was low, however, with only two squirrels captured in grasslands declining since 2006 (Fig. 2; P. Stapp, unpublished data). Moreover, areas in late June, neither of which had bot flies. Similarly, no squirrels relatively few squirrels were captured in May 2008 (7), compared to captured in 2012 had bot flies, although the number of hosts again an average of 29.5 in other years (SD=8.7, range 19-45), so available was low (nine individuals in July 2012). With so few hosts captured hosts may have been scarce. over the past two years, it is not surprising that prevalence would Prescribed burns seemed to affect prevalence of bot fly infestation, be at or near zero. It is not clear if bot flies could have switched to a with prevalence significantly higher the first year after a fire. In different, more abundant rodent host, although, to date, there are no addition, prevalence was significantly higher on three grassland records on bot flies on other rodents captured at CPER since 1994 webs that were burned than on unburned webs trapped at the same (P. Stapp, unpublished data). Future work should attempt to determine time. These results differ from those of Boggs et al. (2007), who the fate of the bot fly populations and to identify other possible hosts.

DIMENSIONS 11 Acknowledgements I thank the Faculty Development Center and Associated Students, Jeff Boettner for his extensive knowledge of bot fly biology and Inc., for funding, Mark Lindquist and Kevin Meierbachtol (SGS-LTER) support of my project, and Dr. Brian Wiegmann and Brian Cassell for for collecting larvae samples and for supervising field crews, Nicole providing primers and species sequences. I also thank Karla Flores, and Kaplan and SGS-LTER field crews for collecting data, and David whose knowledge and experience with molecular techniques made Augustine (USDA-ARS) for coordinating prescribed burns. I thank the laboratory analysis part of my project possible.

References Baird, C.R. 1972. Development of Cuterebra Ruficrus Converse, S.J., Block, W.M. and G.C. White. 2006. Small mammal (Diptera: Cuterebridae) in six species of rabbits and rodents with a population and habitat responses to forest thinning and prescribed morphological comparison of C. Ruficrus and fire, Forest Ecology and Management 228:263-273. C. Jellisoni third instars, Journal of Medical Entomology, 9:81-85. Gummer, D.L., Forbes, M.R., Bender, D.J., and R. M. R. Barclay. Bergstrom, B.J. 1992. Parapatry and encounter competition between 1997. (Diptera: Oestridae) parasitism of Ord’s kangaroo rats chipmunk (Tamias) species and the hypothesized role of parasitism, (Dipodomys ordii) at Suffield National Wildlife Area, Alberta, Am. Midl. Nat. 128:168-179. Canada, J. Parasitol. 83:601-602.

Blair, W.F. 1942. Size of home range and notes on the life history of Hebert, P.D.N, Ratnasingham, S., and J.R. de Waard. 2003. Barcoding the woodland deer-mouse and in northern animal life: cytochrome c oxidase subunit 1 divergences among Michigan, J. Mamm. 23:27-36. closely related species, Proc. R. Sond. Lond. 270:S96-S99.

Boggs, J.F., Lochmiller, R.L, McMurry, S.T., Leslie, D.M. Jr., and D.M. Jacobson, H.A., Hetrick, M.S. and D.C. Guynn. 1961. Prevalence of Engle. 1991. Cuterebra infestations in small-mammal communities as Cuterebra emasculator in squirrels in Mississippi, Journal of Wildlife influenced by herbicides and fire, J. Mamm. 72:322-327. Diseases 17:79-85.

Catts, E.P. 1967. Biology of a California rodent bot fly, Slanksy. F. 2007. /mammal associations: Effects of Cuterebrid Cuterebra latifrons, Journal of Medical Entomology 4:87-101. bot fly parasites on their hosts, Annu. Rev. Entomol. 52:17-30.

Catts, E.P. 1982. Biology of new world bot flies: Cuterebridae, Ann. Stapp, P., Van Horne, B., and M.D. Lindquist. 2008. Ecology of Rev. Entomol. 27:313-338. mammals of the shortgrass steppe. Pp. 132-180 in: Ecology of the shortgrass steppe: a long-term perspective (eds. W.K. Lauenroth and Clark, B.K. and D. W. Kaufman. 1990. Prevalence of botfly (Cuterebra I.C. Burke). Oxford Univ. Press. sp.) parasitism in populations of small mammals in eastern Kansas, Am. Midl. Nat. 124:22-30. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., and S Ku- mar. 2011. MEGA5: Molecular Evolutionary Genetics Analysis using Cognato, A. 2006. Standard percent DNA sequence difference Maximum Likelihood, Evolutionary Distance, and Maximum for insects does not predict species boundaries, J. Econ. Entomol. Parsimony Methods. Molecular Biology and Evolution 28: 2731-2739. 99:1037-1045. .

DIMENSIONS 12 Comparison of Worm Cast Application Methods on Soil Water Retention and Plant Growth Department of Biological Science, California State University, Fullerton

Tiffany Duong Advisor: Dr. Joel Abraham

Abstract Southern California experiences regular annual water deficit. have caused many political issues between California and its neighbors Because of this, conservation agriculture has been developed to help (California Department of Water Resources), (World Water Assessment California meet its increasing water needs. Conservation agriculture Programme). The cost of importing this water increases each year includes the use of soil amendments to increase the water retentive with a decreasing supply, so water conservation is becoming of utmost properties of the soil and thus, decreasing the amount of water importance (World Water Assessment Programme). Not only is most necessary for each crop yield. Some amendments, such as worm of the imported water used for agriculture, estimated to be up to casts, can accomplish this as well as increase plant productivity. 80-85%, but the state of California alone produces 7.1% of the world’s Worm casts, plant wastes processed by worms that are traditionally produce and more than half of the nation’s produce (California used as an organic fertilizer, were used in this experiment as a soil Department of Food and Agriculture), (California Department amendment in an attempt to increase the soil water retention and of Water Resources). With the predicted population increases, plant productivity. The worm casts were added to the soil in two ways: California’s water needs will only increase (World Water Assessment mulch and mixed into the soil. It was found that worm casts do Programme). For these reasons, identifying methods to conserve significantly increase soil water retention, more so when incorporated water in agricultural practices is currently the main area of focus in into the soil than used as mulch. During the plant productivity water conservation. observation based on the same treatments, it was also found that Agriculturists have combined several methods that conserve water the addition of the worm casts increase overall plant productivity, into what they refer to as conservation agriculture. Some common and for radishes, increases the root size instead of shoot size more so methods of conservation agriculture include efficient irrigation, genetic when the worm casts were used as mulch than mixed into the soil. In modification of crops, and watering crops only when needed in order conclusion, when worm casts are added to soil, they are able to not to decrease costs (Kassam et al., 2012). Soil amendments are known only increase the soil water retention, but also increase the roots of to help with the latter. Mulching has been developed in an attempt to plants, which have a direct implication to vegetation where the root prolong the times in between watering by decreasing the evaporative is the more necessary portion of the whole plant. water loss from the soil and is the process where the surface of the soil is covered by a layer of amendment which can range between varieties Introduction of substances. Popularly used soil amendments include wood chips and Southern California is a Mediterranean-type climate. For much of the inorganic substances such as rocks or pebbles (Dahiya, Ingwersen, and year, there are high evaporation rates with low levels of precipitation, Streck, 2007). One of the many organic amendments used as mulch causing a water deficit. This annual water deficit is predicted to increase are worm casts, or processed plant waste produced by worms. Typically over the next 100 years (World Water Assessment Programme). Because used as an organic fertilizer in agriculture, numerous studies have been of this annual water deficit, the water we use in California has to be done on their ability to provide plants with plenty of macro- and imported from neighboring states at progressively higher costs through micronutrients and their ability to reduce a plant’s susceptibility to a complicated system of aqueducts. It is estimated that up to 70% of parasites and diseases, but not a lot has been done to study the affect the water Californians use is imported from out of state sources which worm casts have on water retention (Norgrove and Hauser, 1999).

DIMENSIONS 13 It has been well documented that increasing the organic material the treatment pots, 35 mL each. 6 mL of water was added to each in the soil directly influences the water retention of said soil pot and were placed in a drying oven set to about 40-45 ̊C and their (Rawls et al., 2003). There are different methods of application of weights were measured every hour for 10 consecutive hours. soil amendments. The two main ways in which the amendments are added are through mulch or incorporating it into the soil. The different Experiment 2: Plant Productivity methods of application could have positive implications on plant The second portion of the project is the observation of plant productivity (Asawalam and Hauser, 2001). In conservation agriculture, productivity based on the different application methods of the casts. not only do costs need to be minimized, but the benefits need to The same four treatments were used, with twelve plant replicates be maximized. It is a constant struggle for farmers to identify the each. The radishes were grown from seed into seedlings for two correct combination of minimizing and maximizing, especially in arid weeks in large bins using regular potting soil inside a greenhouse. environments such as in Southern California (Kassam et al., 2012) The seedlings were given ample amounts of water at this stage. (Shemdoe, Van Damme, Kikula, 2009). Worm casts can aid farmers in The seedlings were then transplanted into their individual treatment finding this balance. They can not only retain water, but they can also pots and grown outside for an additional four weeks when the increase plant productivity. This study sets out to determine if the radishes achieved maturity based on radish growing instructions. use of worm casts can increase soil water retention and if method of 0.6 liter plastic pots commonly found in plant nurseries were used. application will affect the retention any differently. Also, this study The initial watering of the seedlings in their treatment pots was 80 sets out to determine if different methods of application will have an mL to account for the shock of transplantation and reduced to 40 effect on plant productivity for maximization of crop yield. mL for every watering period after. For the mixed experiment, 1.2 liters of casts and 4.8 liters of soil mixture were placed into a bucket, Methods and Materials mixed thoroughly, and 0.5 liters was placed in each treatment pot Experiment 1: Soil Water Retention along with one transplanted radish. For the mulch experiment, the The soil water retention levels were measured using gravimetric same amount of soil mixture was placed into a separate bucket, then water content methods. The evaporative rate of water from the soil divided into its treatment pots, and 0.1 liters of casts were added to mixture was measured for four different treatments each with five the surface of each pot after the addition of a transplanted radish. replicates. The treatments include an experimental control For the experimental and procedural controls, 6 liters of soil mixture (experimental) where the soil was not mixed and no casts were added, was placed in a large bucket for each control, the mixture for the a procedural control (procedural) where the soil was mixed but no procedural control was agitated, and then 0.5 liters of each mixture casts were added, the worm casts used as mulch (mulch) on top of the was added into their respective treatment pots. Again, the same soil soil, and the casts mixed (mixed) into the soil. For this portion of the mixture of sand and nutrient deficient soil was used. experiment, a 50% sand 50% nutrient deficient soil mix was used to For data collection, the radishes were removed from their ensure nothing present in the soil could affect the rate of evaporation treatment pots and rinsed gently in water to ensure maximum recovery of water other than the added worm casts. 35 mL clay pots commonly of the fine root hairs. The roots and shoots were separated, dried for found in any variety store were used as containers. All the soil mixture three days in a drying oven, and their dry biomass was recorded. as well as worm casts were dried in the oven for three days before use in the experiment. For the mixed treatment, 140 mL of the soil mix Statistical Analyses: was placed in a large mixing bowl along with 35 mL of worm casts For the water retention portion of the experiment, the data was (casts), and then divided into the treatment pots, 35 mL each. For the compiled into excel and an ANOVA test was run on the last data mulched experiment, the same amount of soil mix was placed in a point for each treatment. For the plant productivity portion of the separate mixing bowl, 28 mL were placed in each pot, and 7 mL casts experiment, a single factor ANOVA test was done on the root, shoot, were added to the surface. For both the experimental and procedural and overall biomass data. A pairwise comparison was performed on controls, 140 mL of soil mixture was placed in separate containers, the root:shoot and total biomass data to determine if the data were the soil used for the procedural control was agitated, and divided into significantly different from each other.

DIMENSIONS 14 Study Organism: For the plant productivity portion of the experiment, Raphanus Kikula, 2009). In this experiment, the increased water retention ability sativus (cherry radishes or radishes) was used. Easily found in of the sand and dirt mixture was due to the addition of the worm any variety store or gardening shop, radishes are a common casts, an organic material, when there was originally no organic household vegetable. Depending on the growing conditions, radishes material. Furthermore, it was also found that mixing the worm casts take up only six weeks to mature and are good companion crops into the soil significantly increased the soil water retention more because of this. than using the casts as mulch. Rawls, Pachepsky, Ritchie, Sobecki, and Bloodworth (2003) believed this is because the organic material Results has a water potential of its own. This means that the worm casts In the soil water retention portion of the experiment, every hour for have a different rate of evaporation than the soil mixture would. ten consecutive hours, the masses of the pots filled with a known Since the water was added after the casts were added, this could mean amount of dirt mixture and worm casts were measured. The data that the water in the mulch experiment stayed in the cast surface were recorded and it was found that the addition of worm casts to layer so all the water was also at the surface and more prone to the soil as both mulch and mixed into the soil increases the soil’s evaporation. However, when the casts were mixed into the soil, the water retention capabilities, shown in Figure 1. However, according water was drawn from the surface, deeper into the soil and wasn’t as to the data, it also shows that when the worm casts are added as readily able to evaporate, resulting in higher water retention. incorporated into the soil, it significantly increases the soil’s water For the plant productivity portion of the experiment, it was retention more than using worm casts as mulch. It was also found found that the overall biomass significantly increased upon addition that both the experimental control, where there were no worm casts of the casts. These findings correlate with Norgrove and Hauser added and no mixing of dirt, and the procedural control, where there (1999) who also demonstrated that the addition of worm casts were no worm casts added but there was mixing of the dirt, were not significantly increased plant biomass. This could be due to the significantly different from each other. increased retentive powers the soil had due to the added organic In the observation on plant productivity portion of the experiment, matter. In the experiment, the radishes were water stressed to radishes were grown in the same treatments as the soil water ensure that the difference between the treatments with and without retention portion of the experiment for four weeks. As shown in the casts could be more visible. Because of this, the radishes with the Figure 2a, it was found that no matter the method of application, casts had more water held within the soil and were able to out-perform the addition of the worm casts significantly increased plant the radishes that were not treated with worm casts. It is also possible productivity. Upon further inspection, it was observed that it was that the increased biomass could have been the result of the added the root of the plant, and not so much the shoots, that had increased macronutrients to the soil. In the experiment, the two controls had no in biomass (Figure 2b). According to the data collected and analyzed to little nutrients. The difference in biomasses between the controls using a pairwise comparison, it was found that the procedural and and the two treatments could possibly have been the addition of experimental controls were not significant from each other. It was more nutrients. This is what Norgrove and Hauser (1999) concluded also found that the mulch and mixed treatments were significantly as well. It was also found that the roots rather than the shoots were different from each other as well as from the two controls. increasing in size in the mulch and mixed treatments contrary to what previous literature has stated (Asawalam and Hauser, 2001). Conclusions They found that there was only a significant increase in dry biomass For the soil water retention portion of the experiment, it was found in the mulch experiment and determined that there was no significant that the addition of the worm casts positively and significantly increase in the mixed treatment. However, they studied maize and increased the soil water retention. This could be because worm casts cowpea plants where the edible portion of the plant is above ground are organic material. It is widely known that soil moisture and the as well as most of the plant. The roots of these plants do not play a amount of water soil is able to hold increases with increasing humus, major role in the overall biomass of the plant. The root of radishes or soil organic material (Rawls et al., 2003). One of the criteria for is where the plant stores its source of carbohydrates. In the case of high quality soil is the occurrence of humus (Shemdoe, Van Damme, this experiment, the root size increased because the plant was more

DIMENSIONS 15 easily able to obtain water from the soil and as a result, was able to photosynthesize and store more energy. These data have major implications on future farming techniques for vegetation where the root is edible. Urban farmers must maximize the use of the small plots of land they are allotted. Because of the results of these data, it would seem that urban farmers can possibly have higher yields or better results while also saving water and costs. However promising the data may seem, further research must be done on whether the biomass data are representative of the worm casts’ ability to increase soil water retention or whether the data are the result of the added macro- and micronutrients. Similarly, the soil water retention study portion of the experiment was done Figure 1. Average water loss in soil treatments over time. Mulch and mixed retained significantly in a controlled environment in small pots. Future research must be more water at 10 hours than did either control treatment (F= 21.17, df = 4, p < 0.001). done on whether or not the results would be the same if more common sized pots were to be used. Several errors could have arisen during the course of the experiment. For instance, during the water retention portion, an error could have occurred when the pots were weighed. There was a difference in mass between when the pots cooled and when they were measured right after they were removed from the oven. This error could possibly alter the results. Error could also have occurred during the plant productivity study portion as well. One error could have occurred due to the water stress put on the plants. The way in which the radishes were determined as experiencing water stress was slight wilt and depending on the amount of wilt, the amount of water the radishes were given differed. This could have resulted in too much water stress which would be detrimental to the health of Figure 2a. Mean total biomass of radishes across soil treatments. Radish biomass in mulch and the plants or it could have resulted in too little water stress and the mixed significantly greater than in control treatments (F = 14.666, df = 3, p < 0.001). true value of the worm casts was not actually determined. The last error that could have occurred was the consumption of the radish shoots by the larvae of an insect. This could have resulted in false data and without this error, it could have been that both the roots and shoots were increasing at a roughly equal rate.

Acknowledgements This project was supported by the United States Department of Agriculture. Special thanks to Dr. Abraham and Dr. Johnson, the supporting staff of the California State University Greenhouse, Ed Read and Jarrett Jones, and the staff at the Fullerton Arboretum. Figure 2b. Root: shoot of radish across soil treatments. Root: shoot in mulch significantly Without their support this project would not have been possible. greater than control treatments. Mixed root: shoot did not differ from other treatments This project was funded by USDA NIFA HSI Grant 2011-38422-30838. (F = 6.705, df = 3, p < 0.001).

DIMENSIONS 16 References Asawalam, D. O., & Hauser, S. (2001). Effects of earthworm casts on crop production. Tropical Agriculture, 78(2), 76-81.

California Department of Food and Agriculture. CDFA > STATISTICS. N.p., n.d. Web. 11 Sept. 2013. http://www.cdfa.ca.gov/Statistics/.

California Department of Water Resources. California Department of Water Resources. N.p., n.d. Web. 14 Sept. 2013. http://www.water.ca.gov/.

Dahiya, R., Ingwersen, J., & Streck, T. (2007). The effect of mulching and tillage on the water and temperature regimes of a loess soil: Experimental findings and modeling. Soil & Tillage Research, 96(1-2), 52-63. doi: 10.1016/j.still.2007.02.004

Kassam, A., Friedrich, T., Derpsch, R., Lahmar, R., Mrabet, R., Basch, G., . . . Serraj, R. (2012). Conservation agriculture in the dry Mediterranean climate. Field Crops Research, 132, 7-17. doi: 10.1016/j.fcr.2012.02.023

Norgrove, L., & Hauser, S. (1999). Effect of earthworm surface casts upon maize growth. Pedobiologia, 43(6), 720-723.

Rawls, W., Pachepsky, Y., Ritchie, J., Sobecki, T., & Bloodworth, H. (2003). Effect of soil organic carbon on soil water retention. Geoderma, 116(1-2), 61-62.

Shemdoe, R. S., Van Damme, P., & Kikula, I. S. (2009). Increasing crop yield in water scarce environments using locally available materials: An experience from semi-arid areas in Mpwapwa District, central Tanzania. Agricultural Water Management, 96(6), 963-968. doi: 10.1016/j.agwat.2009.01.007

World Water Assessment Programme І United Nations Educational, Scientific and Cultural Organization. World Water Assessment Programme І United Nations Educational, Scientific and Cultural Organization. N.p., n.d. Web. 19 Sept. 2013. http://www.unesco.org/ new/en/natural-sciences/environment/water/wwap/wwdr/.

DIMENSIONS 17 Effects of Sorted and Unsorted Food Waste Diets on Worm Cast Quality and Quantity

Department of Biological Science, California State University, Fullerton

Calvin Lung Advisor: Dr. Joel K. Abraham

Abstract Food waste has steadily increased over the past few decades, Vermicomposting is an increasingly popular alternative to food spurring the adoption of various waste diversion practices. One waste dispersal in many countries, including the USA, Japan, Australia form of waste diversion gaining in popularity is vermicomposting, and Italy.3 Vermicomposting is the use of worms, most commonly the use of worms to transform food waste into castings. Worm Eisenia fetida, to break down waste into fertilizer (worm castings).4 castings are an effective fertilizer. However, since food waste varies In the wild, E. fetida are epigeous, removing organic leaf litter, among households and over time, it is possible that the productivity vegetation and manure.5 This attribute makes E. fetida an ideal species of vermicompost systems, or the quality of the castings produced, for use in converting food waste into worm castings. are variable as well. The worms used in vermicompost systems The castings produced by the worms are valuable as soil respond poorly to some food inputs, such as citrus, onions, or amendment. Worm castings have high levels of nitrogen, phosphorus meats, so the inclusion of those foods may negatively affect the and potassium, and lead to increases in plant productivity when productivity of those systems. However, standardization of food applied to the soil.3,6 In addition, worm castings can reduce the waste streams is time consuming and thus may reduce adoption of prevalence or severity of certain pests and diseases, and are vermicomposting. In this study, I investigated whether food quality associated with increases in beneficial soil microbes.7–9 Some affects the productivity of vermicompost systems or the nutrient research shows that amending soil with worm castings helps reduce content of castings. I set up two vermicompost treatments (n = 7): heavy metal uptake in plants, increase levels of soil enzymes, and sorted and unsorted food inputs. I measured final mass of worm increase nutrient retention in the soil.10 In addition, producing castings, worm biomass, and casting nutrient content. I found no synthetic fertilizer is an extremely energy extensive process.11 By difference in the size of worm populations, casting yield, or casting replacing synthetic fertilizers with worm casts, it decreases the need quality between treatments. The results suggest that sorting food to consume so much energy. Given these benefits and the growing waste has little impact on the productivity of household- scale interest in home vermicomposting systems, it is important to know vermicomposting systems. Given the time investment needed to what factors might influence the quality of the castings produced to sort food waste streams and potential benefit of worm castings, the better understand the potential benefits. results may lead to higher adoption of vermicomposting as waste One factor that could affect casting quality is the quality of food diversion practice. waste used in the system. There is some controversy among avid worm growers about the different types of food that worms can Introduction consume.12 Most organisms have a preferred diet that will maximize Approximately, 34 million tons of food waste goes into landfills each growth and productivity. The majority of studies on vermicompost year.1 However, because landfills are compacted daily, they often worms focus on optimizing the population size of worms in a bin lack the oxygen and microbes necessary for waste decomposition.2 and the quantity of food worms can consume in a day, while other As the demand in landfills increase, the need to convert large areas studies focus on th e ability of waste to maintain a certain biomass of of land into landfills increases, leading to a decrease in land that can worm.4,13 Another variable that depends on food waste is the variability be used for beneficial purposes. in the quality of worm casting.

DIMENSIONS 18 Biomass and Population Dynamics Sampling Vermicomposting is increasingly popular for home use.3 The amount of worm casts produced is based off the amount of food However, given the variability in waste streams among households consumed by E. fetida, and as a result, biomass is a good indicator of and within households over time, there may be variation in the worm cast production. The biomass also describes how the worms nutritional quality of worm castings as a result. This study will fare in each type of environment. The biomass was measured by investigate whether or not such variation exists. filling two 400 mL beakers of substrate from alternate corners of the Sorting trash to remove potentially harmful components may bin and the worms and eggs were removed. The worms were sprayed improve the consistency of quality in worm casts.4 However, sorting with a small amount of water to remove the soil and then blotted with food takes time and energy that may outweigh any potential benefit. filter paper for the dry weight. Two worm biomass and egg samples Thus, it is important to understand what impact variation in food were taken every other week for the duration of the study. quality has on worm cast production.

This study focuses on the effects of food sorting on E. fetida 20000 population growth, casting production, and casting quality. I hypothesized that sorting food waste leads to larger populations, 18000 higher casting production, and higher casting nutrient content. 16000

14000

12000

Sorted Methods 10000 Unsorted Food waste was gathered from the CSU Fullerton Gastronome two 8000 times a week, Tuesdays and Thursdays, with food scraps with one of 6000 the day’s waste saved for Saturday. Waste was gathered and separated 4000 into two different categories. One category included unsorted Concentration (ppm) trash while waste from the other included an ideal mix of fruits and 2000 vegetables. Although worms are quite robust, there may be some 0 waste, such as meats, grains and fish, which may prevent them from Nitriate Phosphorus Potassium Magnesium producing optimum castings. As a result, the sorted category only contained food that is found in their ideal diet. Each category was Figure 1. The average concentration of nitrogen, phosphorus, potassium, and magnesium did not differ between the sorted and unsorted treatments. Orange bars represent the sorted then chopped into small pieces and was placed in its respective tray treatment group and the blue represents the unsorted treatment group. Error bars signify of soil and worms.14 standard error (p>0.05). A total of fourteen bins were used. Seven replicates were performed for each treatment and each sample required two trays. Soil Analysis The trays were placed in the shade at the Fullerton Greenhouse to At the end of the study, one cup of castings was taken from each bin. prevent desiccation and overheating. Although the age of the worms Each of the randomly selected samples was dried in an incubator factor into how much casts are produced on a daily basis, all worms for nine days at 35ºC. Four randomly selected samples from each will be obtained from one source so that there will be no significant treatment were sent to the Soil and Plant Tissue Testing Laboratory variation in relative age.15 The optimal amount of worms for each at the University of Massachusetts for soil analysis. One sample from bin was 1.60 kg worm/m2 and optimal amount of food was provided the sorted treatment was removed because it was an extreme outlier at 0.75 kg feed/kg-worm/day.4 Food waste was chopped into fine (order of magnitude difference in nutrient content). Each sample was pieces, estimated to be no larger than one cubic centimeter in size. analyzed for four key nutrients, (Phosphorous, Potassium, Magnesium 100 grams of food waste were measured into each bin three times and Nitrogen), in the samples. a week. There was at least one day in between feedings in order to The data were analyzed using StatPlus:mac V5 and Microsoft allow the worms to breakdown the food. Excel 2011 for Mac.

DIMENSIONS 19 Results There were no differences in nutrient levels (N, P, K, and M) between Encouraging sustainability, even on a small scale, plays a larger role the two treatment groups (p>0.05) (Figure 1). Over the course of ten by minimizing waste and decreasing a need for synthetic fertilizers. weeks, two out of the five measurements taken showed a significant These findings promote the ease and benefits of vermicompost. difference in the biomass of the worms (Figure 2). There was no significant difference in total worm biomass between the two treatment groups (P>0.05) (Figure 3). There was no significant difference found in the total mass of castings (p>0.05) (Figure 4).

Discussion In the two different types of diets tested, there appears to be no significant difference between the two treatments groups, in both quality and quantity of worm casts. All of the nutrients measured, nitrogen, phosphorus, potassium and magnesium levels were insignificantly different when analyzed. The biomass of the worms was also not significantly different. The culmination of these results dictates that there is no reason to separate out food types for worm Figure 2. The average biomass of each sample group over a series of weeks. Data was collected every other week. Error bars signify standard error. cast production. The two different types of diet had no effect in any of the variables measured and there were no distinct differences 9 between the two groups. 8 While there were clearly two different treatment groups in this 7 experiment, this research does not exactly quantify the difference in 6 the two treatment groups. As a result, the similar results obtained 5 in this experiment may be due to insignificant variation or quantity 4 of food. Further research should investigate if large amounts of 3 unsorted food would cause a larger difference in results e.g. changes

Worm Biomass (g) Worm 2 in microbial community. 1 As landfills fills up, it is necessary to utilize the space in a more 0 efficient manner. Previously, the main focus on waste divergent Sorted Unsorted strategies were on landfill design and optimizing the amount of Figure 3. A graph depicting the difference in average mass of worm casts 2,16 decomposition. However, another way to tackle this problem is between the two sample groups. Error bars signify confidence intervals (p>0.05). to divert waste from the landfills. Currently, only 24% of municipal solid waste is recycled and composted and vermicomposting helps in minimizing food waste in landfills.17 In addition, vermicompost has 1.8 1.6 the potential to convert waste into a novel income stream. One of 1.4 Figure 4. The average mass of the current obstacles in vermicomposting is the time and energy that castings collected over a ten-week 1.2 was previously thought to be required in managing food waste. period. Error bars signify confi- 1 Based on the results of this study, food sorting does not appear dence intervals and there were no 0.8 significant difference between the to have a measureable impact on casting production or quality. If two treatment groups (p>0.05). 0.6 true, it is likely that the additional effort put into sorting food for 0.4 household scale vermicompost systems does not yield any benefits. Mass of Castings (kg) 0.2 This may increase vermicompost adoption and participation rates of 0 households, thereby reducing load on landfill.18 Sorted Unsorted

DIMENSIONS 20 References 1. Food Recovery Challenge. (2014). at of Earthworms for Waste Management and in Other Uses – A Review. 1, 4–16 (2005). 2. Read, A. D., Hudgins, M. & Phillips, P. Aerobic landfill for test sustainable cells and waste their disposal implications. Geogr. J. 12. Vaneeckhaute, C. et al. Closing the nutrient cycle by using 167, 235–247 (2014). bio-digestion waste derivatives as synthetic fertilizer substitutes: A field experiment. Biomass and Bioenergy 55, 3. Businelli, M., Perucci, P., Patumi, M., Giusquiani, P. L. & Ash, 175–189 (2013). C. Chemical composition and enzymic activity of some worm casts. Plant Soil 80, 417–422 (1984). 13. Frederickson, J., Howell, G. & Hobson, A. M. Effect of pre-composting and vermicomposting on compost 4. Ndegwa, P. M., Thompson, S. A. & Das, K. C. Effects of stocking characteristics. Eur. J. Soil Biol. 43, S320–S326 (2007). density and feeding rate on vermicomposting of biosolids. 71, 5–12 (2000). 14. Haimi, J. & Huhta, V. Capacity of various organic residues to support adequate earthworm biomass for vermicomposting. 5. Aira, M., Monroy, F. & Domínguez, J. Microbial Ecology Eisenia Biol. Fertil. Soils 23–27 (1986). fetida (Oligochaeta : Lumbricidae) Modifies the Structure and Physiological Capabilities of Microbial Communities Carbon 15. Jain, K., Singh, J. & Gupta, S. K. Development of a modified Mineralization During Vermicomposting Improving of Pig vermireactor for efficient vermicomposting: a laboratory study. Manure. (2013). Bioresour. Technol. 90, 335–337 (2003).

6. Bachman, G. R. & Metzger, J. D. Growth of bedding plants in 16. Mba, C. Biomass and vermicompost production by the commercial potting substrate amended with vermicompost. earthworm. Rev. Biol. Trop. (1988). Bioresour. Technol. 99, 3155–61 (2008). 17. Barlaz, M. A. Carbon storage during biodegradation of 7. Pulleman, M. ., Six, J., Uyl, a., Marinissen, J. C. Y. & Jongmans, a. municipal solid waste components in laboratory-scale landfills. G. Earthworms and management affect organic matter Glob. Biochem. cycles 373–380 (1998). incorporation and microaggregate formation in agricultural soils. Appl. Soil Ecol. 29, 1–15 (2005). 18. Haaren, R. Van et al. The State of Garbage in America. Biocycle (2010). 8. Theunissen, J., Ndakidemi, P. A. & Laubscher, C. P. Potential of vermicompost produced from plant waste on the growth 19. Brehm, J. W. The Intensity of Motivation. Annu. Rev. Psychol. and nutrient status in vegetable production. Int. J. Phys. Sci. 5, 109–131 (1989). 1964–1973 (2010).

9. Nechitaylo, T. Y. et al. Effect of the earthworms Lumbricus terrestris and Aporrectodea caliginosa on bacterial diversity in soil. Microb. Ecol. 59, 574–87 (2010).

DIMENSIONS 21 Effects Of Surface Drip Irrigation Compared To Sub Surface Irrigation On The Yield Of Peppers Plants & H2O Lab Department of Biological Science, California State University, Fullerton

Mirian Morua Advisor: Dr. Jocken Schenk

Abstract Conservation of irrigation water is a major concern for urban agriculture resource centers. Appropriate irrigation models should agriculture in semi-arid regions, such as southern California. A field be researched to demonstrate to the general public how easy it is to experiment was carried out in the Fullerton Arboretum February practice efficient water management and conservation. through October 2013 to compare the efficiency of different drip About 39% of all fresh water in the United States goes to irrigation systems, surface irrigation and subsurface irrigation irrigation; of which mostly is used for Furrow irrigation. Furrow for growing arbol and poblano peppers. The subsurface irrigation irrigation is irrigation by water run in furrows between crop rows. system was buried six inches deep into the soil, while the other was Most farmers practice furrow because it can be recycled back to a placed on the soil for surface drip irrigation. It was hypothesized that major source of water to minimize water waste. However, furrow subsurface irrigation would be the most efficient system because of irrigation tends to not be uniform and labor is wasted in forming the reduced evaporation and water delivery directly to the root zone. canals, furrow ends, and borders. The lack of uniformity directly The results of this experiment revealed that the highest productivity affects the distribution of water irrigation because it can result of peppers was seen for plants under subsurface irrigation. Soil in drainage and increase of salinity in the soil. Since the 1990’s, moisture contents for subsurface irrigation varied less over time alternative methods of deficit irrigation have been used such as than those for surface irrigation. This suggests that there was less micro-jet irrigation, drip irrigation, and subsurface irrigation. English evaporation and more water retention in the soil under subsurface et al. (1990) urged the usage of deficit irrigation in order to optimize irrigation, which benefited the plant growth and productivity. Lastly, strategies to sustain water deficit and reduction of productivity yield. stomatal conductance for arbol peppers was higher under subsurface Furthermore, they suggested that increasing profits and crop irrigation, indicating lower water stress. Thus, it was concluded that production could be achieved if deficit irrigation was used because subsurface irrigation is a better system for conserving water and irrigation is limited within vegetative and late ripening stages in maintaining high pepper yields. order to optimize water use efficiency. Snyder et al. (2008) surveyed the irrigation methods used in Introduction California in 2001. The survey was mailed to 10,000 growers in Southern California is suffering from limited water supply due to high California that were randomly selected from the California evaporation and low precipitation coupled with frequent droughts Department of Food and Agriculture database. This study concluded (McDonald, 2007). It is estimated that half of the water used in that there was an increase of 15% to 31% of orchards mostly irrigated southern California is for irrigation (Baum, 2012). It is necessary to by low volume irrigation such as drip and micro-sprinkler irrigation optimize water usage especially in agricultural applications in order due to an increase of 33% from 1972 to 2002. There was also a 31% to carry the growing populations. Moreover, there are environmental increase of the use of surface irrigation methods in land use. The concerns related to irrigation including depletion of water source and authors noted that sprinkler usage has decreased in orchards and soil erosion. Therefore, it is crucial to present a low cost, efficient, vineyards but still used in vegetable crops. and resourceful irrigation model that can be used and applied in Based on Snyder et al. it was suggested that the most popular urban settings such as residential homes, school gardens, or irrigation method being used for land use was drip, surface, and

DIMENSIONS 22 micro-jet irrigation. There is a consensus that sprinkler systems are 150 seeds of each pepper variety. Each cell was filled with starting no longer used in orchard agriculture but may be used for vegetable compressed soil, sunshine#1 which contained important nutrients crops. Additional research must be made in order to assess which (i.e nitrogen) purchased from McConkeyCo in Anaheim. One seed irrigation system is more efficient in terms of productivity and water was added per cell to optimize growth of plants. Cells were kept conservation. For the purposes of this study, surface drip irrigation under fluorescent light for 24 hours at room temperature and (SDI) and subsurface irrigation (SSDI) are the two methods that are watered twice a week. Germination was seen in early February and investigated. the seed was transplanted into larger pots in early March (Fig A, SDI is commonly used for more than one land use purpose; Appendix). These pots were kept inside the greenhouse until they it can be used with orchards and vegetable crops. Surface drip began to mature. At the end of March, the pots were moved to an irrigation involves dripping of water slowly to the roots of plants on open greenhouse where they were allowed to acclimate to higher the soil surface through emitters that control water flow. SDI is the temperatures and sunlight (Figure b, Appendix). Also, the plants network of drip tape on the surface that carries a low flow of water were watered with soluble fertilizer bi-weekly to provide them under low pressure to plants. It is adaptable and changeable because with nutrients. In early April, the individual pots (200+) were moved it can be removed and manipulated to fit any crop. Another advantage outside the greenhouse to acclimate to stronger sunlight, wind, and is that it minimizes water loss through runoff. higher temperatures. Peppers (Capsicum annuum, two varieties of SSDI is the application of drip line polyethylene tubing with built peppers: chile de arbol and poblano) were transplanted at the end in emitters located beneath the soil to pump water under low pressure. of May in the Fullerton Arboretum farm site. The area being used SSDI involves temporarily buried drip tape located below or at the was 30ft by 30ft and contained 11 prepared soil beds. The distance plant roots. It involves water delivery directly to the root systems of the between the rows was 27 inches apart and treatments were divided plants; however, it is more permanent because perforated pipes are by rows. A total of 5 rows were used for each treatment as replicates installed beneath bed to control water flow and drainage. As a result it for the irrigation method. For example, 5 rows had the same set up expected to have no water loss due to evaporation, run off, or wind drift. that displayed the dripper line on the surface of the bed. The last five The purpose of this experiment was to compare these two rows obtained the dripper line 6 inches deep into the bed to display irrigation methods in order to investigate a difference between each the underground subsurface treatment. mechanism and access water management and productivity yields. The efficiency will be measured through productivity yield, stomatal Expirmental Treatment conductance, and soil moisture retention. It was hypothesized that The experiment consisted of two irrigation methods, SDI and SSDI subsurface irrigation would be the most efficient in conserving water on two different types of peppers. Each treatment was replicated (soil retention) and have highest productivity yield. four times with random distribution of pepper plants on each bed (n=20 count). Each replicate or row measured 30 feet in length and Methods 15 inches in width. Both treatments were equipped with separate Expirmental Site pipelines connecting to separate pressure regulators and pipelines The field experiment was carried out at the Fullerton Arboretum in that connected to the main water source. Each row was irrigated California State University, Fullerton. To investigate the efficiency of by a single lateral pipeline connected to the main water source. The different irrigation systems, the techline EZ (12 mm dripper line) was techline EZ was placed in the surface for the surface treatment and used under different irrigation conditions. The irrigation water source buried underground for the subsurface treatment (Figure C, Appendix). was obtained from the Arboretum’s water supply. The techline EZ was one quarter inch thin and was placed on the bed 12 inches apart. Drip irrigation pressure regulators were required in Expirmental Procedure order to ensure the operating water pressure rating for drip compo- Pepper seeds were ordered from the Chile Pepper Research Institute nents does not exceed manufacturers recommended of the University of New Mexico. Germination of seeds was done operating pressure. There were two pressure regulators used to using two multi-block sewing containers with 200 cells, used to plant separate the treatments at 45 psi. The same amount of water was

DIMENSIONS 23 delivered twice a week in the summer and once a week in the fall 300 time. The total amount of water delivered was 12 gallons.

500 s) Measurements 2 Stomatal Conductance (mmol/m2s) was collected using a steady state leaf porometer(Model SC-1,Decagon Devices Inc). Measurements 400 were collected once a week for 10 plants for each replicate (n=5) from July to September. Soil moisture content (pct) was collected by placing soil moisture probes (EC-10, Decagon Devices Inc.) at two 300 inches in depth for surface treatment, and 6.5 for subsurface treatment. Probes were installed upstream, middle, and downstream of each replicate’s’ bed for each treatment: upstream described the 200 probe closer to the balb valve, the middle probe was placed to describe the moisture in the middle of the bed. While the downstream

Mean Stomatal Conductance (mmolm 100 probe described the moisture at the end of the row near the kink, where flushing occurred. Measurements were done weekly to measure percent of water content (pct)(cm/m) in the soil to assess 0 the amount of water retained after irrigation (Model ECH2O, Poblano Arbol Poblano Arbol Decagon Devices Inc.). Lastly, productivity was measured using the Salter Brecknell scale (capacity 5 lbs). Water stress was evaluated Surface Subsurface based on the visible indications of reduced growth, delayed maturity, Figure 1: Mean stomatal conductance for arbol peppers was higher under subsurface irrigation, and reduced crop yield per treatment replicates. indicating lower water stress (m=333.1 mmol/m²s) compared to the surface irrigation value. Statistical analysis indicates significant difference between arbol peppers (p=0.033). 95% confidence intervals were established as error bars. Results For three months the pepper plants were allowed to grow and mature (mean=341.849 mmol/m2s). A t-test indicated that the values were at the Fullerton Arboretum site. Starting in July when they were not significantly different when comparing the poblano peppers of first transplanted, plants were maintained and watered at the same each treatment (surface vs. subsurface) (p=0.35). However, there was time and consistency. Based on weather reports from the Fullerton a significant difference between the arbol peppers of each treatment Municipal Airport weather station, the average humidity in Fullerton, for stomatal conductance (p=0.0325). California for the months of July through October were ranging from 43 % to 93% humid. October reached very humid conditions rising to Soil Moisture: almost 88%. During the treatment trials, weather was very dry and The results demonstrated that soil moisture retention was observed at hot temperatures thus plants were under very dry conditions. to be more consistent in the SSDI treatment because there was a trend that showed similar soil moisture retention (6in depth) Stomatal Conductance: between each replicate at upstream, middle, and downstream The mean for each replicate stomatal conductance per pepper variety placements (Figure 2). When comparing soil moisture retention in was obtained as a direct reflection for each treatment, SDI and SSDI. the SDI (2in depth) there was no obvious trend observed upstream, A high conductance mean indicated that the stoma for each pepper middle or downstream in the mean replicate values (Figure 3). variety was open more frequently for the subsurface treatment. ANOVA demonstrated there was a significant difference between all Therefore, there was a higher stomatal conductance observed the soil moisture groups for surface and subsurface values (p=0.011). for the subsurface treatment compared to the surface treatment However, the post hoc of multiple comparisons for each group at for both arbol (mean=333.11 mmol/m2s) and poblano peppers each treatment was not significantly different (p>.05).

DIMENSIONS 24 compared to the subsurface. Overall, the peppers under surface irrigation seemed to be doing better compared to the subsurface method. However, results indicated differently because overall pepper productivity was continuously higher under subsurface irrigation (224 lbs) compared to the surface irrigation (217 lbs) method throughout time (Figure 4). A t-test was used to compare the end productivity of each treatment where it showed no significant difference between them (p=0.33). The goal of this experiment was to investigate which irrigation method was better in conserving water while maintaining productivity. It was hypothesized that the subsurface irrigation would be a better choice for water conservation because the irrigation line was underground leading to reduced evaporation and

Figure 2: Soil moisture (pct) was averaged for each row to represent the surface treatment, up- higher producitivity because water was delivered under low pressure stream (blue), middle (red), and downstream (green) of the row. There is no obvious trend observed, thus no indication of similar soil moisture retention throughout the treatment. ANOVA indicates there is a significant difference between all groups in surface and subsurface sections (p=0.011).

Figure 3: The soil moisture mean was obtained for all four replicates to represent the subsurface Figure 4: Productivity was measured in pounds from July to September (t=90 days). Subsurface treatment, upstream (blue), middle (red), and downstream (green) of the rows. There is an obvious (224lbs) had a higher productivity over time compared to the surface (217lbs) treatment. Statistical trend that indicates similar water distribution. ANOVA demonstrated that there was a significant analysis from t-test between groups showed no significant difference (p=0.33). difference between all groups for surface and subsurface irrigation (p=0.011).

directly to the roots. Although surface irrigation also delivered water Productivity & Qualitative Analysis: under low pressure directly, risks of evaporation and run off had a Throughout the experiment, productivity was witnessed by observing higher chance to occur. However, it was an interesting method to physical stress of the plant, maturity, and relative growth of the investigate because it required less labor, better accessibility, and plants. When comparing methods, the surface irrigation method, water was still delivered at a close proximity to the plant. which involved the dripper line on the surface, had more flower Based on the results obtained from this study, it is recommended buds compared to the subsurface (Appendix, Figure D). In addition, that subsurface irrigation is the better system to use when insufficient surface had more arbol pepper count and stronger pepper coloration water is available because there was a more uniform soil moisture

DIMENSIONS 25 retention within this system. The results demonstrated an obvious grow, and thus reflects productivity of the plant. Although, at times trend of soil moisture content (pct) throughout each replicate and pepper plants under subsurface irrigation seemed to wither and section of the subsurface system treatment, however there was no break faster, it was due to the faster growth and higher productivity obvious trend to indicate water was delivered uniformly throughout of the plants. Thus, it was concluded that the subsurface irrigation in the surface treatment. Therefore, this suggests that subsurface system is a better method to use because it ensures conservation of irrigation allowed more water to be retained within the root systems water while promoting high productivity yields. of the plants. This ensures the plant to have water available under In addition, stomatal conductance or the state of the stoma drought conditions to prevent water stress, ensure growth and was observed in order to see how frequently the stomas were open. maturity, and provide higher productivity yields. The subsurface Results indicated that stomas were opened more frequently in plants dripper line was buried 6.5 inches below the soil surface, and root under subsurface irrigation treatment. Having plants with their systems were rooted in deep close to the dripper line. At this depth, stomas open suggests that plants are exchanging gases with the soil moisture retention was higher because of minimum evaporation environment at a higher rate. Since stomatal opening and closing is loss with this method. It was expected to find a higher percent water a result of changes in the turgor of guard cells and epidermal cells, content downstream of the row because that was the area where stomtatal state to reduced leaf water can be assumed. Tardieu and water was flushed out of the irrigation system. These observations Davies (!993), demonstrated that stomatal conductance is regulated were also reflected on the roots of the plants of each system when by leaf water status of plants in drying soil. That the opening of observed. The root system for subsurface were deep rooted compared stoma is controlled by chemical messages from dehydrating roots to shallow roots in the surface method. (Tardieu and Davies, 1993). The authors results agree with our Results compare to previous research findings because where investigation of stomatal conductance to suggest that the reason they also found higher soil moisture content in subsurface irrigation. stomatal conductance was lower in surface irrigation system was Kheira and Abdrabbo (2009) investigated different decifit irrigation because the roots may have been dehydrated due to a dryer soil. systems of corn in the nile river. The authors investigated three These findings agree with the random uniformity of the soil moisture systems of irrigations such as furrow irrigation using gated pipes content (Figure 2&3). technique, subsurface drip irrigation system, and surface drip It was concluded that subsurface irrigation is a more efficient irrigation. They concluded that subsurface irrigation was the best irrigation system compared to surface drip irrigation because it option for drought conditions because it displayed higher efficiency conserved water while maintaining high productivity of pepper yield. compared to the other irrigation systems (Kheira and Abdraddo, Based on these findings, it is recommended for areas with 2009). They compared the soil moisture distribution before irrigation high drought such as in southern California to consider using subsurface to find higher soil moisture retention at 60 cm compared to the other irrigation because it ensures plant growth and minimum evaporation treatments (Kheira and Abdraddo, 2009). Moreover, they discovered of water. that the distribution of water was more uniform for all treatments in SSDI is a good method to optimize water use because water can the subsurface for the corn irrigated with the subsurface treatment be conserved, supported by our findings of soil moisture retention. (Kheira and Abdraddo, 2009). Therefore, subsurface irrigation is a Although this system does require more labor than surface irrigation, better irrigation system because water is conserved more it makes a larger impact on water conservation and productivity. efficiently and thus a better system for high drought areas like Future studies should include measuring the resistance and southern California. transpiration rate per plant variet under irrigation methods which Due to more water being available for the plant’s subsurface would provide more direct information on the plant exchanges of root system, it was expected to find that productivity was higher gases. Measurements of leaf water potential and root water potential in plants under subsurface irrigation compared to the plants under can also be done to obtain information on water flow between the surface dripper irrigation. There was a higher productivity of pepper plants. Improvement on this experimental design can be done by yield throughout time in the subsurface system. This suggests that using the same pepper variety on each sub plot and not randomizing having higher soil retention near root systems ensures plants to the peppers to make a true replicate.

DIMENSIONS 26 References Baum-Haley (2012) Water Use Efficiency Programs Specialist Municipal MacDonald, G.M., 2007. Severe and sustained drought in southern Water District of Orange County California and the West: present conditions and insights from the past on causes and impacts. Quaternary International 173–174, 87–100. English, M. J., J. T. Musick, and V. V. N. Murty. 1990. Deficit irrigation. Ch. 17. In Management of Farm Irrigation Systems, (Eds.) ASAE. Snyder, R.L. Orang, M.N. and Matyac, J.S. 2008. Survey of irrigation methods in California in 2001. Journal of Irrigation and Drainage Kheira, A. and Abdrabbo, A. 2003. Comparison among Different Engineering. 134.1 (96-100). Irrigation Systems for Deficit Irrigated Corn in the Nile Valley. Agricultural Engineering International the CIGR Ejournal (5- 22). Tardieu, J. Davies, W. 1993. Integration on of hydraulic and chemical signaling in the control of stomatal conductance and water status of droughted plants. Journal of plant, cell and Environment. 16:4

Appendix

A: Germination and growth for C: The irrigation was set up peppers in Biology Greenhouse according to the image (right) under fluorescent light for which included the drip line, 24 hours. adaptors, balb valve, pressure regulator, and PVC

B: Growth for peppers in biolo- D: Demonstrates productivity gy greenhouse for acclimation growth of peppers from surface to the sun before transplanta- to subsurface (left to right) tion to the field. indicates differences in coloration and reveals overall growth.

DIMENSIONS 27 Morphological and Genetic Identification of California Pipefishes (Syngnathidae)

Department of Biological Science, California State University, Fullerton

C.A. Rice, D.J. Eernisse, and K.L. Forsgren

Abstract California pipefishes are highly diverse and particularly difficult to consistently identify. The widely used Miller and Lea Guide To The Coastal Fishes of California does not include all of the recognized Californian province species. We reexamined the key morphological traits emphasized in pipefish keys seeking to improve the diagnostic separation of pipefish species. Our data indicate that certain features, such as barred markings or a truncated snout, are reliable in separating three recognized species in California; kelp pipefish (Syngnathus californiensis), barred pipefish S.( auliscus), and snubnose pipefish Cosmocampus( arctus). In contrast, a combination of morphological and mitochondrial 16S and COI analyses have so far not supported three currently recognized species as distinct: the bay pipefish (Syngnathus leptorhynchus), and barcheek pipefish S.( exilis), and all could be synonymous with S. californiensis. Future work will further test these conclusions and our goal is to produce a more useful dichotomous key to California pipefishes. We expect to add more pipefish localities and species sequence comparisons and extend what we learn to also better characterize the identification of juveniles. The results from this study will be beneficial to fishery biologists working in the field to more effectively identify pipefishes.

DIMENSIONS 28 Investigating Defense Responses of Nicotiana benthamiana Involving the 14-3-3 Gene Family Using Virus-induced Gene Silencing (VIGS) Department of Biological Science, California State University, Fullerton

Jennifer Spencer

Introduction Plants possess multiple tiers of immunity that act to protect the injected effectors and ETS, plants have evolved R genes that produce plant cells against various types of invading pathogens including resistance – or R proteins. R proteins are typically classified by their bacteria, viruses, fungi, nematodes, aphids and oomycetes (Jones conserved domains, specifically the nucleotide binding (NB) and and Dangl, 2006). The first line of defense that provides plants with leucine rich repeat (LRR) domains with some R proteins containing a protection involves physical and chemical barriers to protect against coiled-coil (CC) domain or toll/interleukin-1 receptor (TIR) cytoplasmic infection. These types of barriers include the cell wall and waxy domain at their N terminus (Jones and Dangl, 2006). According to cuticle layer covered in anti-microbial compounds that surround the the guard hypothesis, R proteins indirectly recognize pathogen cell (Dangl and Jones, 2001). avirulence (Avr) proteins, termed effectors, by monitoring effectors If pathogens are able to breach this first line of defense, the cellular targets in order to activate and elicit an immune response to plants innate, or basal, immune system responds through recognition counteract their effects (Jones and Dangl, 2006). The concept known of conserved molecules associated with the pathogen known as as the gene-for-gene hypothesis is based on the idea that disease pathogen-associated molecular patterns (PAMPs) using resistance in plants is mediated through two complementary genes transmembrane pattern recognition receptors (PRRs) located on including the Avr pathogenic gene and the host plant R gene (Van the cell surface (Boller and He, 2009). PAMPs are essential to the Der Biezen and Jones, 1998). Through the elicitation of a localized ultimate survival of the pathogen, such as bacterial flagellin or viral programmed cell death (PCD), commonly known as a hypersensitive double- stranded RNA (dsRNA), and are therefore constitutively response (HR), the plant prevents the spread of infection to healthy expressed and not easily altered (Akira et al., 2006; Jones and Dangl, cells (Jones and Dangl, 2006). Recognition of Avr proteins by R 2006). These unchanging characteristics of PAMPs allow their proteins to trigger HR is known as effector-triggered immunity (ETI) continued recognition by plants. Recognition of PAMPs alerts the (Dangl and Jones, 2001). plant to the presence of pathogens and leads to PAMP-triggered The 14-3-3 gene family is highly conserved across eukaryotes immunity (PTI), which acts to arrest pathogen colonization in the and encodes multifunctional proteins acting in a range of cellular plant (Akira et al., 2006; Jones and Dangl, 2006). regulatory processes. 14-3-3 proteins have been previously shown Pathogens have evolved methods of counteracting PTI through to serve as linkers and adapters between sensing and activation proteins, commonly referred to as effectors. Bacterial pathogens in cell signaling through dimerization or by binding to functionally inject effectors directly into plant cells through a type III secretion diverse proteins that influence myriad cellular processes (Denison system (T3SS) while other pathogens use alternate methods (Sal- et al., 2011). In plant species, 14-3-3 genes have been shown to act mond and Reeves, 1993). Effectors have been shown to target PRRs in response to biotic stress in addition to abiotic stress, growth and and negatively affect downstream signaling or host vesicle transport division, metabolism, hormone pathways and responses to light in plant cells, leading to heightened virulence of the pathogen and (Denison et al., 2011). Several 14-3-3 proteins have been implicated interfering with plant PTI to cause effector-triggered susceptibility in disease resistance responses to pathogens that act as biotic stress. (ETS) (Jones and Dangl, 2006). In the case of the N protein recognizing Tobacco Mosaic Virus in To protect themselves from infection caused by pathogen Nicotiana tabacum, 14-3-3 isoforms interact with the TIR cytoplasmic

DIMENSIONS 29 domain, NBS and LRR domains of this R protein (Konagaya et al., In plants, gene expression can be blocked by the use of VIGS. VIGS 2004). Species of Arabidopsis have also demonstrated disease uses the mechanism of posttranscriptional gene silencing (PTGS), resistance caused by 14-3-3 genes in the case of the powdery mildew an endogenous defense mechanism that has evolved to protect cells fungal infection. Low expression of 14-3-3s have been associated against infection by targeting foreign RNAs for degradation with susceptibility and decreased resistance to powdery mildew (Purkayastha and Dasgupta, 2009). In PTGS, the cell recognizes fungus, while over- expression of 14-3-3 genes elicit HR potentially foreign double-stranded RNAs (dsRNAs) through the Dicer protein, due to an interaction between 14-3-3 proteins and the R protein which cleaves the dsRNAs into small interfering RNAs (siRNAs) that RPW8.2 in order to increase resistance (Denison et al., 2011). are subsequently loaded into the RNA-induced silencing complex Within Solanum lycopersicum (tomato), 14-3-3 gene products have (RISC) (Purkayastha and Dasgupta, 2009). RISC is able to recognize been shown to act as a receptor for the fungal toxin Fusicoccin (FC), foreign RNAs using siRNA complementarity to other RNA molecules and treatments with FC induced higher levels of 14-3-3 proteins and targets them for degradation by cleaving the base- paired region. (Moorhead et al., 1996). The mitogen-activated protein kinase Using VIGS, a fragment of the plant gene of interest is cloned into (MAPK) pathway has been shown to initiate a PCD or HR in tomato viral vectors to become equivalent to foreign RNA when the carrying the R protein Pto when infected with the pathogen recombinant viruses are used to infect plant tissue. The RISC complex Pseudomonas syringae (Oh et al., 2010); it was hypothesized that targets RNAs that complementary base-pair with the recombinant 14-3-3 proteins play a role in the MAPK pathway by stabilizing the virus-derived siRNAs, including both viral RNAs and endogenous MAPKKKα protein and activating down- stream signaling cascades mRNAs encoding the gene of interest produced by the cell that lead to PCD in plant tissues (Oh et al., 2010). However, it was (Purkayastha and Dasgupta, 2009). RISC will recognize the specified demonstrated that PCD induction was not dependent on 14-3-3 gene transcripts as foreign even if they are normally found protein interactions with MAPKKKα, but may help to bring MAPK endogenously in the cell and cleave them, causing their elimination. and MAPKKKα together in the cell and aid in their overall interaction Inhibition of gene expression through the destruction of targeted (Oh et al., 2010). Although some 14-3-3 proteins have demonstrated mRNAs is sometimes referred to as gene knockdown since a possible role in disease resistance to fungal or bacterial pathogens expression is greatly reduced but not eliminated (as in a gene in tomato, the 14-3-3 family of proteins has not been studied knockout) (Purkayastha and Dasgupta, 2009). Although VIGS has intensively in the context of resistance to the full spectrum of plant been attempted in other plant species, such as tomato, it has been pathogens, including viruses. highly inefficient in silencing targeted genes. However, Nicotiana Previously, an interaction between N. benthamiana 14-3-3 benthamiana has shown notably efficient gene knockdown using homologs and the Tm2-2 resistance protein from S. lycopersicum was VIGS, making it a model plant to use this type of silencing to study found biochemically (Sobhanian, 2011). Tm2-2 confers resistance to gene function (Velásquez et al., 2009). the Tobacco mosaic virus (TMV) 30K movement protein in tomato Preceding VIGS, plasmid constructs carrying the 14-3-3 genes species. The function of 14-3-3 genes in plant immunity through need to be created and validated to ensure the correct genes will be Tm2-2 will be investigated through silencing of endogenous silenced. This report will focus on the phylogeny of S. lycopersicum N. benthamiana 14-3-3 homologs in a transgenic line expressing and N. benthamiana 14-3-3 orthologs, amplification and creation immune receptor Tm2-2. N. benthamiana was chosen specifically for of the 14-3-3 homolog set and the subsequent verification of their this experiment as the genome was recently sequenced, it is thought cloning. The set of N. benthamiana 14-3-3 homologs will be used to to possess the same number of isoforms as tomato and has shown create a silencing library for gene knockdown using VIGS. By high efficiency in gene silencing using the virus-induced gene silencing investigating potential roles of 14-3-3 genes in resistance to pathogens (VIGS) method. Additionally, the Tm2-2 gene is functional within within N. benthamiana, we will be able to ultimately infer about N. benthamiana because it is in the same family as S. lycopersicum. functions of 14-3-3 genes in future experiments. To better understand what the overall function of a specific gene may be within the cell, the gene can be turned off and its function muted to observe phenotypic differences in its absence.

DIMENSIONS 30 Materials and Methods RNA Extraction Total RNA from Nicotiana benthamiana was isolated by RNA 1.25 μl 10 μM reverse primer, 1 μl DMSO and 0.5 μl Herculase II fusion extraction from leaves as outlined in the protocols of the RNeasy Plant DNA polymerase and ran in the thermocycler with the following Mini Kit (Qiagen). Briefly, leaf disks weighing an approximate total program: denaturation of the DNA for 2 minutes at 98°C, 40 cycles of 100 mg were punched out of leaves, flash frozen in liquid nitrogen of 30 seconds at 98°C, 30 seconds at 48°C and 1 minute at 72°C, and ground with a mortar. The frozen leaf powder was homogenized followed by 3 minutes at 72°C. in 450 μl buffer RLT containing beta-mercaptoethanol by vortexing Gel electrophoresis was performed at 125V using 0.8% agarose before being transferred to a spin column collection tube. The spin gels in 1X TBE buffer with 1% ethidium bromide (Fisher BioReagents) column was centrifuged for 2 minutes at 10,000 rpm and the flow- for visualization. The 100bp Low Scale DNA Ladder (Fisher Scientific) through was transferred to a clean tube. Absolute ethanol (0.5 was used for size estimation of PCR products. Pictures were taken with volumes of the leaf lysate) was added and mixed by pipetting. The the Gel Logic 100 Imaging System using an ultraviolet transilluminator. lysates were transferred to the new spin column, centrifuged for 15 DNA bands were excised with a razor blade and extracted from the seconds at 10,000 rpm and the flow-through discarded. The spin gel using the QIAquick Gel Extraction Kit (Qiagen). The gel slices were column was washed once with 700 μl buffer RWI and twice with 500 μl weighed and melted in 3 volumes buffer QG at 50°C for 10 minutes. buffer RPE with centrifugation for 15 seconds at 10,000 rpm, followed After mixing with 1 gel volume of isopropanol, the dissolved gel was by for a drying spin of 2 minutes at 10,000 rpm to remove residual transferred to a spin column, centrifuged for 1 minute at 10,000 rpm, wash solution. RNA was eluted from the column in 50 μl RNase-free and the flow-through discarded. This process was repeated once with water. The RNA concentration in ng/μl and 260 nm/280 nm ratio 500 μl QG buffer and twice with 700 μl PE wash buffer before were determined using a NanoDrop ND-1000 spectrophotometer. centrifuging an additional minute at 13,000 rpm and transferring the top of the spin column to a clean microcentrifuge tube. Lastly, DNA Primer Design was eluted from the column in 50 μl EB buffer and stored at 4°C. Forward and reverse primers were designed for ten 14-3-3 isoform sequences from Nicotiana benthamiana using the Primer 3 function from the JustBio online analysis tools (www.justbio.com) and incorporated restriction enzyme cut sites into their 5’ ends.

RT-PCR To create cDNA from the extracted RNA, reverse transcription-poly- merase chain reaction (RT-PCR) was performed using random hexamer primers and the SuperScript III Reverse Transcriptase Kit (Invitrogen). The combination of 8 μl RNA, 1 μl random hexamers (50 ng/μl) and 1 μl dNTP mix (10 mM) was incubated at 65°C for 5 minutes and chilled on ice for 1 minute. cDNA synthesis mix (2 μl 10X RT buffer, 4 μl 25 mM MgCl2, 3 μl 0.1 M DTT, 1 μl 40 U/μl RNaseOUT and 1 μl 200 U/μl SuperScript III RT) was added to the RNA/primer mixture and incubated for 10 minutes at 25°C, followed by 50 minutes at 50°C. The reverse transcription reaction was terminated by heating at 85°C for 5 minutes and then chilling on ice. RNaseH (1 unit) was used to digest RNA template at 37°C for 20 minutes. PCR was performed to amplify the targeted 14-3-3 sequences using the MJ Mini BioRad thermocycler and Herculase enzyme. The cDNA was mixed with 10 μl 5X Herculase II reaction buffer, 34.5 μl nanopure water, 0.5 μl 40 mM dNTP mix, 1.25 μl 10 μM forward primer, Table 1: Sequences, restriction enzyme sites and Tm values for all 14-3-3 isoform primers.

DIMENSIONS 31 Cloning into pGEM-T pGEM-T PLASMID ISOLATION In preparation to clone 14-3-3 genes into pGEM-T vectors, an A-tailing The isolation of plasmid DNA of pGEM-T clones containing the 14-3-3 reaction was performed to add a single adenine nucleotide to the 5’ cDNA fragments was performed as according to the directions outlined ends of the 14-3-3 PCR products. This was done by combining 7 μl in the QIAprep Spin Miniprep Kit (Qiagen). Briefly, the bacterial pellets of the PCR product from the gel extraction with 1 μl 10X Thermopol were resuspended with 250 μl P1 buffer and transferred to a sterile buffer, 0.5 μl dATPs, 1 μl Taq DNA Polymerase and 0.5 μl nanopure water microcentrifuge tube. 250 μl P2 and 300 μl N3 buffers were sequentially (New England BioLabs). The reaction mixture was incubated at 70°C for added to the bacteria, inverting between each addition, and the 30 minutes and then ligated into vector pGEM-T (Promega). The ligation lysate was centrifuged for 10 minutes at 13,000 rpm. The supernatant reaction was performed by combining 3.5 μl of the A-tailing product, was transferred to a spin column, centrifuged for 1 minute, and the 5 μl 2X ligation buffer, 1 μl pGEM-T vector and 1 μl T4 DNA ligase flow-through discarded. The spin column was washed with 500 μl PB (New England BioLabs) and incubating the mixture at 4°C overnight. buffer and twice with 500 μl PE buffer and centrifuged an additional 2 minutes for a dry spin. Plasmid DNA was eluted from the spin columns Transformation Into E-Coli in 50 μl EB buffer. The NanoDrop ND-1000 spectrophotometer was Ligation reactions were transformed into E. coli by mixing 50 μl used to determine the 260 nm/280 nm ratio as well as the DNA competent E. coli cells with 5 μl of the pGEM-T ligation reactions and concentration in ng/μl. incubating on ice for 10 minutes. Bacteria were heat shocked in a 37°C water bath for 90 seconds and snap cooled on ice for 1 minute DNA Sequencing/NCBI Blast Search before addition of 450 μl of LB to each tube and recovery at 37°C for The pGEM-T clones were sent to Eton Biosciences Inc., in San Diego, 30 minutes. The transformed E. coli were spread on LB agar plates CA for DNA sequencing. Sequence identities and similarities to containing 100 μg/mL ampicillin and 100 μg/mL tetracycline. known 14-3-3 tomato genes (TFTs) were determined using the Basic Bacterial plates were incubated at 37°C overnight. Local Alignment Search Tool (BLAST) from the National Center of A PCR screen for bacteria containing pGEM-T vector with Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov). inserts was performed on white colonies using the MJ Mini BioRad thermocycler DreamTaq program. Reactions containing 2 μl each of Phylogenetic Tree And Alignment Construction forward and reverse M13 primers, 6 μl nanopure water and 10 μl 2X NbFTT and TFT sequences were aligned for phylogenetic analysis Dream Taq polymerase (Thermo Scientific) master mix were set up using ClustalW2 to create a preliminary sequence alignment for before picking E. coli from the transformation plates, spotting onto further similarity comparison (Larkin et al., 2007). The initial Clustal a new LB agar plate containing 100 μg/mL ampicillin and 100 μg/mL alignment in conjunction with Seaview software was subsequently tetracycline and mixed into the PCR reactions using sterile toothpicks. used to construct a Parsimony phylogenetic tree for similarities in PCR reactions were performed with an initial denaturation at 98°C the amino acid sequences (Gouy et al., 2010). Lastly, the BoxShade for 3 minutes and 40 cycles as follows: 1 minute at 98°C, 15 seconds server 3.21 was used to create shaded sequence alignments to better at 98°C, 30 seconds at 50°C and 72°C for 1 minute, with an additional visualize relatedness of the NbFTTs and TFTs (www.ch.embnet.org/ 5 minutes at 72°C after the cycles ended. software/BOX_form.html). Gel electrophoresis was performed on the PCR screen products as described above. Clones containing inserts with the correct size Results predicted for the 14-3-3 isoform cDNA fragments were incubated in To investigate potential relatedness of 14-3-3 homologs in Nicotiana LB media containing 100 μg/mL ampicillin and 100 μg/mL tetracycline benthamiana to their previously studied orthologs from Solanum overnight at 37°C with shaking. Glycerol stocks were created by lycopersicum (tomato), the sequences were obtained from the adding 300 μl of 50% glycerol to 700 μl of the bacterial culture and Solgenomics Network genomic databases and a phylogenetic analysis freezing at -80°C. To isolate plasmid DNA, the remaining bacterial of the two families of proteins was conducted (Bombarely et al., 2012). cultures containing the pGEM-T 14-3-3 clones were centrifuged at Evolutionary relatedness of S. lycopersicum 14-3-3 proteins, termed 4,400 rpm for 15 minutes, the supernatant discarded and the pellets TFTs, and N. benthamiana 14-3-3 homologs is demonstrated in the were frozen before use. phylogenetic tree (Figure 1). N. benthamiana 14-3-3 homologs, referred

DIMENSIONS 32 to as NbFTTs, were given number designations based on the initial Basic PhyML ln(L)=-3981.323 sites LG 100 replic. 4 rate classes Local Alignment Search Tool (BLAST) search of the N. benthamiana 0.1 NbFTT4 genome database with the TFT protein sequences; the best matching 87 NbFTT was named based on the TFT used in the search. The oldest 58 NbFTT9 node separating the clusters indicates that the orthologs evolved from a single common ancestor forming two main clades. The branching off TFT9 100 points between the two groups show that NbFTT and TFT homologs 3, 7, 8 and 10 are most closely related to each other and paired together TFT8 in the same clade, confirming their named designations as accurate. 85 Interestingly, TFT1 and TFT2 evolved separately from NbFTT1 and NbFTT8 NbFTT2 and reside in various clades, stemming from their most recent common ancestor. TFT4 and NbFTT4 are shown to be the most distantly TFT7 related from each other of the homologs paired by their numerical 100 NbFTT7 designations, despite their initial match, and reside in separate clades. Although TFT6 and TFT9 are closely related to their N. benthamiana TFT10 100 orthologs, NbFTT6 and NbFTT9, the branch distances of the tree show 100 that NbFTT6 and NbFTT9 evolved much later than TFT6 or TFT9. 100 NbFTT10 A boxshade sequence alignment of NbFTT and TFT orthologs further demonstrated the evolutionary relatedness between the two TFT1 groups (Figure 2). TFTs and NbFTTs show common residues with each other and demonstrate blocks of amino acid similarities distributed at TFT3 73 both the N-terminal and C-terminal ends of the proteins. Additionally, a search of the conserved amino acid sequences using the BLAST 76 NbFTT3 database shows high conservation among the 14-3-3 gene superfamily as well as multiple peptide binding sites. Further supporting their TFT2 arrangement in the phylogenetic tree, TFT6, NbFTT6 and TFT5 show TFT5 groupings of amino acids around amino acid residues 7 and 61 that are 42 not shared by members of other clades. TFT3 and NbFTT3 also 71 100 NbFTT6 demonstrate similar sequence similarities not seen in other TFTs or

NbFTTs near amino acid position 34. NbFTT6 and NbFTT9 show less TFT6 conserved amino acids throughout the entirety of their sequences, starting at amino acid residue 161 for NbFTT6 and 154 for NbFTT9. As NbFTT3 39 99 similarly seen in the distance away from their most closely related orthologs in the phylogenetic tree, the differences in amino acid 100 NbFTT5 sequence of NbFTT6 and NbFTT9 further supports their more recent NbFTT1 evolution from the last common ancestors they share with TFT5 and 93 6 or NbFTT4, respectively. Additionally, many N. benthamiana 14-3-3 TFT4 proteins show large insertions in their amino acid sequences. The splice donor and splice acceptor sites surrounding introns may not be accurate as a result of the protein-coding sequences being pieced Figure 1. Phylogenetic tree demonstrating relatedness of 14-3-3 orthologs. The phylogenetic tree was constructed using Seaview software with PhyML tree function, 100 replicates and showing bootstrap values together in silico from the genomic DNA sequence and the insertions at branch nodes. S. lycopersicum 14-3-3 isoforms are indicated as TFTs and N. benthamiana 14-3-3 isoforms may not reflect real coding sequence insertions. indicated as NbFTTs.

DIMENSIONS 33 DIMENSIONS 34 Figure 2. Amino acid sequence alignment of 14-3-3 proteins. Clustal Box- Shade multiple sequence alignment of TFTs and NbFTTs using parameters of 0.5 similarities among amino acid residues and BoxShade server 3.2. Residues highlighted in black indicate identical amino acid residues and residues highlighted in grey indicate similarly charged amino acid residues. S. lycopersicum 14-3-3 isoforms are indicated as TFTs and N. benthamiana 14-3-3 isoforms indicated as NbFTTs.

DIMENSIONS 35 Figure 3. RT-PCR amplification of N. benthamiana 14-3-3 cDNA. cDNA was separated by gel electrophoresis and visualized on 0.8% agarose gels. A. NbFTT6 and NbFTT9 with bands shown around 600bp and 400bp were both extracted for cloning and sequencing. The single NbFTT7 band around 400bp was also excised. B. NbFTT8 with bands shown around 400bp and 300bp were both extracted for cloning and sequencing and single NbFTT4 band around 500bp was also excised. DNA bands were imaged with the Gel Logic 100 Imaging System, excised and purified with the Qiagen gel extraction kit.

Amplification of N. benthamiana 14-3-3 DNA. In order to create a library for use in silencing of the NbFTT homologs, isolation of 14-3-3 homolog cDNA by RT-PCR and gel extraction, the primers were designed for use in reverse transcription-polymerase cDNA fragments were cloned into the pGEM-T vector using A-tailing chain reaction (RT-PCR) to amplify NbFTT cDNA fragments from reactions, ligation reactions and transformation into Escherichia coli. N. benthamiana RNA. Primers were designed to contain restriction Transformation of the correct 14-3-3 homolog fragments needed to enzyme cut sites for subsequent cloning into the silencing vector. be validated before further use in silencing of NbFTT homologs in N. RT-PCR gel electrophoresis confirmed the amplification of cDNA of benthamiana leaves. Validation of 14-3-3 isoform clones were done 14-3-3 homologs of the expected sizes. Several 14-3-3 homologs, by using PCR screening with the primers designed for the vector and including NbFTT1, NbFTT2, NbFTT3, NbFTT5 and NbFTT10, were separated by gel electrophoresis (Figure 4). After validation from PCR previously amplified using RT-PCR, separated by gel electrophoresis screening, clones of NbFTT3 (lanes 3 and 4), NbFTT4 (lane 3), NbFTT6 and extracted for further use in cloning and creation of the silenc- (lanes 6 and 9) and NbFTT10 (lane 7) were selected for DNA sequenc- ing library. For this study, NbFTT6, NbFTT7, NbFTT9, NbFTT4 and ing for final confirmation of the correct 14-3-3 homolog insert se- NbFTT8 were reverse transcribed and amplified from Nicotiana quences. DNA sequencing confirmed cloning of fragments of NbFTT3 benthamiana leaves (Figure 3). Due to the presence of higher bands (lanes 3 and 4), NbFTT4 (lane 3) and NbFTT10 (lane 7). Several 14-3-3 around 500-600bp and lower bands around 300-400bp, the additional homologs, including NbFTT1, NbFTT2 and NbFTT5, were previous- bands were also extracted from the agarose gel in case the different ly validated and confirmed using this method. Clones of NbFTT6, sized PCR products represented alternative splice variants and the NbFTT7 and NbFTT9 were amplified in E. coli and sequenced directly PCR products were designated as NbFTT#up or NbFTT#down. Each for confirmation of correct gene fragment inserts. While most NbFTTs NbFTT RT-PCR product confirms the expression of 14-3-3 homologs were successfully amplified and cloned, DNA sequencing revealed in N. benthamiana leaves, which is promising for adequate silencing that the primers designed for the amplification of NbFTT8 did not of 14-3-3 genes in future experiments. amplify the targeted 14-3-3 gene and instead amplified another gene. New primers need to be designed to target a different region of Cloning of N. benthamiana 14-3-3 isoforms NbFTT8 in order to amplify and clone a fragment from the intended In order to complete creation the 14-3-3 homolog silencing library, sequence. Glycerol stocks of all sequenced and confirmed 14-3-3 ho- 14-3-3 cDNA fragments were cloned into the pGEM-T vector for fu- mologs were retained for further use in subcloning for VIGS studies. ture subcloning into the silencing vector. After the amplification and

DIMENSIONS 36 were named based on their retrieval from the genome database as matches to the TFT sequences, the phylogenetic tree and sequence alignment show the designated names for some NbFTT homologs did not correspond to their most closely related TFT homologs. The differences in phylogeny between TFTs and NbFTTs could be attributed to gene duplication among the NbFTTs and contribute towards functional redundancy of several NbFTT homologs. After silencing and testing each NbFTT homolog’s interaction with Tm2-2 individually in multiple N. benthamiana plants, NbFTT members of clades, including NbFTT2, NbFTT5 and NbFTT1, will be silenced together in one N. benthamiana plant. Silencing multiple homologs simultaneously would down-regulate NbFTT proteins together that are likely functionally redundant and provide a clearer assessment of the putative interaction between NbFTT homologs and Tm2-2 (Sobhanian, 2011). However, genetic duplication could have attributed to diversification of the NbFTT genes and resulted in differences in phylogeny. The incongruity between the designations of NbFTTs and their closest evolutionary match means that the N. benthamiana sequences need to be renamed accordingly in future experiments. Although NbFTT8 was not correctly amplified, the remaining nine homologs were successfully cloned into pGEM-T to partially complete the set of NbFTTs. Primers were redesigned for future amplification and cloning of NbFTT8 in order to complete cloning of all ten homologs. After completion of the set of N. benthamiana 14-3-3 homolog pGEM-T clones, the ten NbFTT homologs will be used to create a silencing library of clones in the pTV viral silencing vector. These clones will be subsequently transformed into A. tumefaciens in the future for agroinfiltration and use in VIGS experiments. Interactions between 14-3-3 homologs and the Tm2-2 resistance protein will be assessed through the presence or absence of HR when transgenic N. Figure 4. Screening of N. benthamiana 14-3-3 DNA inserts. PCR screen products separated by gel electro- benthamiana expressing Tm2-2 is challenged with the 30K movement phoresis and visualized on 0.8% agarose gels. A. NbFTT3 (lanes 3 and 4) and NbFTT6 (lanes 6 and 9) clones were selected for validation of the insert. B. NbFTT4 (lane 3) and NbFTT10 (lane 7) clones were selected for protein. Silencing of 14-3-3 homologs essential for Tm2-2 function validation of the insert. DNA bands were imaged with the Gel Logic 100 Imaging System. could potentially prevent HR in transgenic N. benthamiana in the presence of movement protein and substantiate a role for one or DISCUSSION several 14-3-3 proteins in Tm2-2-mediate immune responses. The role of tomato 14-3-3 homologs has been previously described In some species of Arabidopsis, low expression of 14-3-3 proteins in plant immunity through involvement in the MAPK pathway and a has been previously associated with susceptibility and decreased putative interaction observed between the tomato Tm2-2 resistance resistance to powdery mildew fungus (Denison et al., 2011). Conversely, protein (Denison et al., 2011) and a NbFTT homolog (Sobhanian, 2011). over-expression of Arabidopsis 14-3-3 genes elicited HR, potentially Investigating the evolutionary relationship between TFTs and NbFTTs due to an interaction between 14-3-3 proteins and the R protein, could prove beneficial in determining the potential role of NbFTTs RPW8.2, thereby increasing resistance (Denison et al., 2011). Additional in plant immunity. Although the N. benthamiana 14-3-3 homologs future experiments will address the latter of the two cases and

DIMENSIONS 37 involve over-expression of tomato 14-3-3 genes and co-expression of resistance protein. The initiation of faster, more robust HR could Tm2-2 in wild- type N. benthamiana leaves. When challenged with indicate that 14-3-3 proteins function in Tm2-2-mediated immune the 30K movement protein, differences in HR on the leaves responses. Additionally, co-immunoprecipitation experiments with over-expressing 14-3-3 genes and co-expressing Tm2-2 could further 14-3-3 proteins and the CC domain of Tm2-2 will further investigate indicate an interaction between 14-3-3 proteins and the Tm2-2 these putative interactions.

References Akira, S., Uemastu, S., and Takeuchi, O. 2006. Pathogen recognition Moorhead, G., Douglas, P., Morrice, N., Scarabel, M., Aitken, A., & and innate immunity. Cell, 124, 783-801. Mackintosh, C. 1996. Phosphorylated nitrate reductase from spinach leaves is inhibited by 14-3-3 proteins and activated by fusicoccin. Boller, T., He, S.Y. 2009. Innate Immunity in Plants: An Arms Race Current Biology, 6(9), 1104-1113. Between Pattern Recognition Receptors in Plants and Effectors in Microbial Pathogens. Science, 324, 742-744. Oh, C., Pedley, K.F. and Martin, G.B. 2010. Tomato 14-3-3 Protein 7 Positively Regulates Immunity-Associated Programmed Cell Death Bombarely, A., Rosli, H.G., Vrebalov, J., Moffett, P., Mueller, L.A. and by Enhancing Protein Abundance and Signaling Ability of MAPKKK Martin G.B. 2012. A draft genome sequence of Nicotiana benthamiana α. The Plant Cell, 22(1), 260-272. to enhance molecular plant-microbe biology research. Molecular Plant-Microbe Interactions, 25, 1523-1530. Purkayastha, A., and Dasgupta, I. 2009. Virus-induced gene silencing: A versatile tool for discovery of gene functions in plants. Plant Dangl, J.L. and Jones, J.D. 2001. Plant pathogens and integrated Physiology and Biochemistry, 47, 967–976. defense responses to infection. Nature, 18, 826-833. Ratcliff, F., Martin-Hernandez, A.M.and Baulcombe, D.C. 2001. Denison, Fiona C., Paul, Anne-Lisa, Zupanska, Agata K. and Robert Tobacco rattle virus as a vector for analysis of gene function by J. Ferl. 2011. 14-3-3 proteins in plant physiology. Seminars in Cell & silencing. Plant J., 25, 237-245. Developmental Biology, 22, 720–727. Salmond, G.P., and Reeves, P.J. 1993. Membrane traffic wardens and Gouy, M., Guindon, S. and Gascuel, O. 2010. Molecular Biology and protein secretion in Gram- negative bacteria. Trends Biochemical Evolution, 27, 221-224. Sciences, 18, 7-12.

Jones, J.D., and Dangl J.L. 2006. The Plant Immune System. Nature Sobhanian, S. 2011. Contribution of host proteins toward R-protein 444.7117, 323-29. mediated resistance in Solanaceae family. Graduate Thesis, California State University, Fullerton. Konagaya, K., Matsushita, Y., Kasahara, M., & Nyunoya, H. 2004. Members of 14-3-3 protein isoforms interacting with the resistance Velásquez, A.C., Chakravarthy, S. and Martin, G.B. 2009.Virus-induced gene product N and the elicitor of Tobacco mosaic virus. Journal Of Gene Silencing (VIGS) in Nicotiana benthamiana and Tomato. Journal General Plant Pathology, 70(4), 221-231. of Visualized Experiments, 28, 1292.

Larkin, M.A, Blackshields, G., Brown, N.P., Chenna, R., McGettigan, Van Der Biezen, E.A., and Jones, J.D.G. 1998. Plant disease-resistance P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., proteins and the gene- for-gene concept. Trends in Biochemical Thompson, J.D., Gibson, T.J. and Higgins, D.G. 2007. Bioinformatics, Sciences, 23(12), 454-456. 23, 2947-2948.

DIMENSIONS 38 Development of Micelles Containing Self-Immolative Polymers for Applications in Drug Delivery Department of Chemistry and Biochemistry, California State University, Fullerton Department of Chemistry, University of Washington

Neha F. Ansari, Gregory I. Peterson, Andrew J. Boydston

Abstract Self-immolative polymers (SIPs) have proven to be an innovative system for the controlled release of small molecules. Most SIPs have chain end triggers that once activated, can lead to a head-to-tail depolymerization of the polymer backbone. Each monomer is then released sequentially in a timely and consistent manner, concomitant with release of the output molecule (e.g., drug or reporter molecules) from the monomer. However, the SIP’s hydrophobic properties are problematic for its implementation into biological applications, thus making it necessary to further functionalize the polymer for water solubility. This can be achieved by attaching a hydrophilic polymer to the SIP. As a result, the amphiphilic diblock copolymer can form micelles which are of interest in the field of drug delivery. In this study, we examine the multiple syntheses that lead to the formation of the SIP-containing amphiphilic diblock copolymers. Through these experiments, we aspire to learn more about the creation and stability of micelles from SIP diblock copolymers and ultimately aim to use this as a platform to demonstrate thermally activated SIPs in a novel drug delivery system.

DIMENSIONS 39 Jennifer Castillo The Synthesis of Anhydrous Cyclopropanol Dr. Rogers

Abstract A strategy for the synthesis of anhydrous cyclopropanol is being developed from a commercially-available precursor. The core of the method involves protecting 1,3-dichloro-2- propanol with a suitable silylating agent such as t-butyldimethylsilyl chloride to form a precursor compound using Schlenk techniques. Subsequent 1,3-reductive elimination to close the three- membered ring, followed by removal of the protecting group under anhydrous conditions, should result in an accessible gram-scale preparative method for cyclopropanol. Formation of the protected alcohol was confirmed by infrared spectroscopy and 1H-NMR, which showed possible conformational isomers.

Introduction Cyclopropanol is not an unknown compound; however, it has not been prepared and isolated in gram-sized quantities as a pure alcohol. Main commercial use of cyclopropanol is in the formation of esters, which is used in preparations of pharmaceuticals, flavors, fragrances, and pesticides. Also, the chemical properties of cyclopropanol are more diverse than those of enols or enolates. Another useful property is that cyclopropanols are able to undergo synthetically useful The Synthesis of Anhydroustransformations Cyclopropanol where the three-carbon ring is retained or cleaved. Cleavage causes the cyclopropanols to act as equivalents of enolates or corresponding allylic derivatives.1 Department of Chemistry and Biochemistry,Many methods California to State synthesize University, cyclopropanol Fullerton have been attempted, however, these methods produced cyclopropanol containing other substituent groups. One such simple method of preparation involves the use of Grignard addition to 1,3-dichloroacetone, as shown in Equation Jennifer Castillo 2 Advisor: Dr. Rogers 1. Cyclopropanol was first synthesized in pure form accidently in 1942 by Cottle and Magrane. This method involved the reaction of epichlorohydirn with magnesium bromide, ethylmagnesium bromide, and ferric chloride.2 Abstract A strategy for the synthesis of anhydrous cyclopropanol is being developed from a commercially-available precursor. The core of the method involves protecting 1,3-dichloro-2-propanol with a suitable silylating agent such as t-butyldimethylsilyl chloride to form a precursor compound using Schlenk techniques. Subsequent 1,3-reductive elimination to close the three membered ring, followed by removal of the protecting group under anhydrous conditions, should result in an accessible gram-scale preparative method for cyclopropanol. Formation of the protected alcohol was confirmed by infrared spectroscopy and 1H-NMR, which showedModification possible of this method involves the reaction of epichlorohydrin with MgBr2 to form the conformational isomers. magnesium salt of 1-bromoModification-3-chloro of this-2 -methodpropanol. involves If the reaction salt is oftreated epichlorohydrin with a Grignard reagent in 3 the presence of ferric chloride,with MgBr 2ring to form closure the magnesium occurs. saltHowever, of 1-bromo-3-chloro-2-pro the problem- with this method is Introduction panol. If the salt is treated with a Grignard reagent in the presence of that the cyclopropanols have to avoid contact with3 acids or bases in the presence of protic Cyclopropanol is not an unknown compound; however,3 it has not ferric chloride, ring closure occurs. However, the problem with this been prepared and isolated in gram-sized quantitiessolvents. as aGerdil pure reportedmethod a further is that the modification cyclopropanols of have the to 1,3 avoid-dihalo contact-2 with-pr opanolacids method where alcohol. Main commercial use of cyclopropanoldehalogenation is in the formation is accomplishedor bases in the by presence an electrochemical of protic solvents. process3 Gerdil reported giving a furthergood yield of 4 of esters, which is used in preparations of pharmaceuticals,cyclopropanol flavors, seen inmodification equation 2. of the 1,3-dihalo-2-propanol method where fragrances, and pesticides. Also, the chemical properties of dehalogenation is accomplished by an electrochemical process givingJennifer Castillo cyclopropanol are more diverse than those of enols or enolates. good yield of cyclopropanol4 seen The in equationSynthesis 2. of Anhydrous Cyclopropanol Another useful property is that cyclopropanols are able to undergo Dr. Rogers synthetically useful transformations where the three-carbon ring is retained or cleaved. Cleavage causes the cyclopropanols to act as equivalents of enolates or corresponding allylic derivatives.1 Many methods to synthesize cyclopropanol have been attempted, however, these methods produced cyclopropanol containing other substituent groups. One such simple method of preparation involves the use of Grignard addition to 1,3-dichloroacetone, as shown in Equation 1.2 Cyclopropanol was first synthesized in pure form accidently in 1942 by Cottle and Magrane. This method involved the reaction of epichlorohydirn with magnesium bromide, ethylmagnesium bromide, and ferric chloride.2

Also, 1,3-dihalides can be converted into cyclopropanes by dehalogenation with magnesium.5 Attempts to purify cyclopropanol have been unsuccessful.DIMENSIONS Cottle and Magrane40 devised an isolation method for cyclopropanol using fractional distillation, however, they found that their experiments gave cyclopropanol fractions that contained halogen, possibly as epichlorohydrin, and fractional distillation or chemical treatment could not remove these fractions without damage to the alcohol.6 A later experiment’s sample purified by Stahl and Cottle resulted in a Zerewitinoff value that indicated an 87% content of cyclopropanol. Also, cyclopropanol’s change to propionaldehyde was obtained when the alcohol was treated with potassium carbonate.7 Furthermore, cyclopropanol readily rearranges to propionaldehyde especially in basic solution7, seen in equation 3.

Experiments conducted by Roberts and Chambers also concluded analytically pure cyclopropanol was not obtained in repeated Magrane and Cottle methods and rearranged easily to propionaldehyde.8

Another method in the formation of cylcopropanol is by cleavage of esters and ethers. Generally, basic cleavage of a cyclopropyl ester produces cyclopropanol in good yield, however Paukstelis and Kao have reported an instance where skeletal rearrangement takes precedence over the formation of alcohol.3 Preparing cyclopropanols using ethers was first developed by Schollkopf and his co-workers. The problem with this method, however, is finding an R group that can be removed without disrupting the three-membered ring.3

One of the most well-known transformations of cyclopropane derivatives is the cyclopropyl to allyl rearrangement and is widely used in the preparation of allylic compounds from halogenocyclopropanes.1

It is important to use a protecting group on the alcohol before ring closure can occur. The use of a tert-Butyldimethylsilyl group is advantageous in that it does not have a chiral center and is both stable and applicable in the protection of alcohol.9 Furthermore, silyl ethers are easily prepared, show resistance to oxidation, good thermal stability, low viscosity, and are easily recoverable from their parent compound.10

Jennifer Castillo The Synthesis of Anhydrous Cyclopropanol Jennifer Castillo Dr. Rogers The Synthesis of Anhydrous Cyclopropanol Dr. Rogers

Known physical properties of cyclopropanol include index of refraction (1.526), molar volume (50.2 3cm3), flash point (22.2 10.9 ), boiling-point (90.4 8.0 at 760 mmHg), density (1.2 0.1 g/cm3), and vapor pressure (34.4 0.3 mmHg at 25 ).11 ± ± ℃ ± ℃ The ±goal of this project is to synthesize anhydrous± cyclopropanol℃ from 1,3-dichloro-2-propanol. The approach to this project is to eliminate by-products and not allow an isomerization reaction to allyl alcohol while maintaining anhydrous conditions. Also, 1,3-dihalides can be converted intoStart cyclopropanesing with by a halogenated(22.2±10.9℃) straight, boiling-point chain hydrocarbon (90.4±8.0℃ at and 760 protecting mmHg), densit the alcohol could possibly 5 11 dehalogenation with magnesium. Attempts leadto purify to cyclopropanola reductive ring(1.2±0.1 closure g/cm3), reaction and vapor that pressure would (34.4±0.3 occur inmmHg a non at- aqueous25℃). solvent. It is have been unsuccessful. Cottle and Magrane devised an isolation Known physical properties5 of cyclopropanol include index of Also, 1,3-dihalides can be converted into cyclopropanesimportant by dehalogenation to protect the with alcohol magnesium. since there would be a competing reduction of the alcohol’s Attempts to purify cyclomethodpropanol for cyclopropanol have been using unsuccessful.fractional distillation, Cottle however, and theyMagrane refraction devised (1.526) an, molar volume (50.2±3cm3), flash point found that their experiments gave cyclopropanolacidic fractions proton that and competing intramolecular, boiling-point substitution to form epichlorohydrin., density Lastly, the isolation method for cyclopropanol using fractional distillation, however, they(22.2±10.9℃) found that their (90.4±8.0℃ at 760 mmHg) contained halogen, possibly as epichlorohydrin,removal and fractional of the protecting(1.2±0.1 group g/cm3) could, and potentially vapor pressure allow (34.4±0.3 a high mmHg yield at 25℃)of cyclopropanol..11 The experiments gave cyclopropanoldistillation or chemical fractions treatment that could contained not removeoverall halogen, these strategy fractions possibly is summarized as epichlorohydrin, below in Scheme 1. and fractional distillationwithout ordamage chemical to the alcohol.treatment6 A later could experiment’s not remove sample thesepurified fractions without damage 6 to the alcohol. A laterby Stahl experiment’s and Cottle resulted sample in a purifiedZerewitinoff by valueStahl that and indicated Cottle an resulted in a Zerewitinoff value that87% contentindicated of cyclopropanol. an 87% content Also, cyclopropanol’s of cyclopropanol. change to Also, cyclopropanol’s change to propionaldehydepropionaldehyde was obtained was obtained when when the the alcohol alcohol was was treated treated with with potassium carbonate.7 Furthermore,potassiumcarbonate.7 cyclopropanol Furthermore, readily rearrangescyclopropanol toreadily propionaldehyde rearranges especially in basic to propionaldehyde especially in basic solution7, seen in equation 3. solution7, seen in equation 3.

Experiments conductedExperiments by Roberts conducted and by Chambers Roberts and Chambersalso concluded also concluded analytically pure analytically pure cyclopropanol was not obtained in repeated Magrane cyclopropanol was not obtained in repeated Magrane and Cottle methods8 and rearranged easily and8 Cottle methods and rearranged easily to propionaldehyde. to propionaldehyde. Another method in the formation of cylcopropanol is by cleavage of esters and ethers. Generally, basic cleavage of a cyclopropyl ester Another method in theproduces formation cyclopropanol of cylcopropanol in good yield, however is by Pauksteliscleavage and of Kaoesters and ethers. Generally, basic cleavage of a cyclopropylhave reported an ester instance produces where skeletal cyclopropanol rearrangement in takesgood yield, however Paukstelis and Kao have reportedprecedence an instance over the where formation skeletal of alcohol. rearrangement3 Preparing cyclopropanols takes precedence over the formation of alcohol.using3 Preparing ethers was firstcyclopropanols developed by Schollkopfusing ethers and his w co-workers.as first developed by Schollkopf The problem with this method, however, is finding an R group that and his co-workers. The problem with this method, however, is finding3 an R group that can be can be removed without disrupting the three-membered3 ring. removed without disruptingOne of the the mostthree well-known-membered transformations ring. of cyclopropane derivatives is the cyclopropyl to allyl rearrangement and is widely used One of the most wellin- theknown preparation transformations of allylic compounds of cyclopropane from halogenocyclopropanes. derivatives1 is the cyclopropyl to allyl rearrangement andIt is is importantwidely used to use in a protecting the preparation group on theof alcoholallylic before compounds from 1 halogenocyclopropanes.ring closure can occur. The use of a tert-Butyldimethylsilyl group is advantageous in that it does not have a chiral center and is both 9 It is important to usestable a protecting and applicable group in the on protection the alcohol of alcohol. before Furthermore, ring closure can occur. The use of silyl ethers are easily prepared, show resistance to oxidation, good a tert-Butyldimethylsilylthermal group stability, is low advantageous viscosity, and are in easily that recoverableit does not from have their a chiral center and is both 9 stable and applicableparent in the compound. protection10 of alcohol. Furthermore, silyl ethers are easily prepared, show resistance to oxidation,Known physicalgood thermal properties stability, of cyclopropanol low viscosity, include index and are easily recoverable 10 from their parent compound.of refraction (1.526) , molar volume (50.2±3cm3) , flash point

DIMENSIONS 41 Jennifer Castillo Jennifer Castillo The Synthesis of Anhydrous Cyclopropanol The Synthesis of Anhydrous Cyclopropanol Dr. Rogers Dr. Rogers

Results ResultsResults Experimental Dichloromethane, 50 mL, and a magnetic stir-bar were added to a 250-mL round bottom flask. Imidazole, 3.3318 g (16.26mmol), dissolved in 10 mL of dichloromethane was added to the flask. Approximately 2.35 mL (25mmol) of 1,3-dichloro-2-propanol was added to the flask using an appropriate syringe. t-Butyldimethylsilyl chloride (TBDMSiCl), 8.1556 g (18mmol) dissolved in 10 mL of dichloromethane, was transferred under argon to an addition funnel along with an additional 30 mL of dichloromethane. The flask was placed under a slight positive pressure of argon. The TBDMSiCl solution was added drop-wise over a period of 48 hours at room temperature. Vacuum filtration was performed to separate the

precipitate. An extraction was performed three times with deionized Figure 1: IR of starting material 1,3-dichloro-2-propanol. Figure 1: IR of starting material 1,3-dichloro-2-propanol. water and magnesium sulfate was added to the organic liquid. The Figure 1: IR of starting material 1,3-dichloro-2-propanol. solution dried for 24 hours. An IR spectrum was taken, using a Perkin Elmer 1500 Series FTIR, (neat liquid on a polyethylene IR card) to check for any trace of alcohol; IR confirmed none was present. Vacuum filtration was performed to remove the drying agent. A rotary evaporator was used to remove the solvent. The product recovered was clear. An IR spectrum of product was taken as well as an 1H-NMR, which showed volatile impurities; therefore, rotovap was performed a second time. 1H-NMR of this purified product was taken, spectrum seen in Figure 3. Each 1H-NMR taken had TMS as the solvent.

DiscussionJennifer Castillo The Synthesis ofA Anhydrous white precipitate Cyclopropanol and yellowish liquid were formed during the Figure 2: IR indicating the –OH functional group is no longer present. Figure Figure 2: 2: IR IR indicating indicating the –OH the functional –OH group functional is no longer present. group is no longer present.reaction of the alcoholDr. withRogers the silylating agent. After vacuum

filtration, IR confirmed no alcohol was present in the solution, as seen in Figure 2. Also, the doublet present around 3000 cm-1 and second doublet around 1500 cm-1 represent the presence of TBDMSiCl and are consistent with those found in literature. 1H- NMR of the crude protected alcohol showed a doublet was present at 0.76 ppm with an integration number of 8.560 as shown in Figure 3. It is believed that this doublet is actually two singlets representing conformational isomers, which is consistent with restricted rotation due to the bulky t-butyl group.

FigureFigure 3: 3: NMR with awith doublet aaround doublet 0.8ppm in around crude product 0.8ppm in crude product

Experimental Dichloromethane, 50 mL, and a magnetic stir-bar were added to a 250-mL round bottom flask. DIMENSIONS 42 Imidazole, 3.3318 g (16.26mmol), dissolved in 10 mL of dichloromethane was added to the flask. Approximately 2.35 mL (25mmol) of 1,3-dichloro-2-propanol was added to the flask using an appropriate syringe. t-Butyldimethylsilyl chloride (TBDMSiCl), 8.1556 g (18mmol) dissolved in 10 mL of dichloromethane, was transferred under argon to an addition funnel along with an additional 30 mL of dichloromethane. The flask was placed under a slight positive pressure of argon. The TBDMSiCl solution was added drop-wise over a period of 48 hours at room temperature. Vacuum filtration was performed to separate the precipitate. An extraction was performed three times with deionized water and magnesium sulfate was added to the organic liquid. The solution dried for 24 hours. An IR spectrum was taken, using a Perkin Elmer 1500 Series FTIR, (neat liquid on a polyethylene IR card) to check for any trace of alcohol; IR confirmed none was present. Vacuum filtration was performed to remove the drying agent. A rotary evaporator was used to remove the solvent. The product recovered was clear. An IR spectrum of product was taken as well as an 1H-NMR, which showed volatile impurities; therefore, rotovap was performed a second time. 1H-NMR of this purified product was taken, spectrum seen in Figure 3. Each 1H-NMR taken had TMS as the solvent.

Discussion A white precipitate and yellowish liquid were formed during the reaction of the alcohol with the silylating agent. After vacuum filtration, IR confirmed no alcohol was present in the solution, as seen in Figure 2. Also, the doublet present around 3000 cm-1 and second doublet around 1500 cm-1 represent the presence of TBDMSiCl and are consistent with those found in literature. 1H- NMR of the crude protected alcohol showed a doublet was present at 0.76 ppm with an integration number of 8.560 as shown in Figure 3. It is believed that this doublet is actually two singlets representing conformational isomers, which is consistent with restricted rotation due to the bulky t-butyl group.

Jennifer Castillo The Synthesis of Anhydrous Cyclopropanol Dr. Rogers

CDCl3 QE-300 240 220 200 180 160 140 120 100 80 60 40 20 0

1211109876543210 FigureFigure 4: Literature 4: 1H-NMR Literature of 1,3-dichloro-2-propanol 1H-NMR of 1,3-dichloro-2-propanol Conclusion The remaining features of the spectrum are consistent with the A Schlenk technique resulted in the protection of the alcohol group Thetarget compound.remaining Peaks present features around 4.00, of 3.50, the and 2.00 spectrum ppm in 1,3-dichloro-2-propanol are consistent using silylating with agent the t-butyldimethylsilyl target compound . Peaks present indicate the presence of TBDMSiCl. Comparing Figures 3 and 4, chloride. Comparison between literature values and experimental arounddifferences are4.00, noticed in3.5 peak0, positions. and 2.00 ppm indicatevalues in theFTIR and presence 1H-NMR confirmed of the TBDMSiCl. protection. Future work Comparing Figures 3 and Future work will determine if the desired product was formed. including vacuum distillation, ring closure using reducing agents, and 4,Firstly, differences vacuum distillation areor chromatographic noticed purification in peak of the positions.removal of protecting group through the use of tetrabutylammoinium protected alcohol product must be done. 13C-NMR will check for fluoride in THF under anhydrous conditions can possibly form Futureimpurities. Variouswork reducing will agents determine can be used such asif magnesium, the desired anhydrous product cyclopropanol. was formed. Firstly, vacuum distillation or lithium, sodium amalgam, or zinc amalgam to form the ring closure. 13 chromatographicAfter ring closure, the protecting purificationgroup must be removed. of Removal the protected alcohol product must be done. C-NMR will checkcan be done for with tetrabutylammoniumimpurities. fluorideVarious in THF under reducing agents can be used such as magnesium, lithium, sodium anhydrous conditions. It is important to note that none of these amalgam,steps involves the useor of zinc water. amalgam to form the ring closure. After ring closure, the protecting group must be removed. Removal can be done with tetrabutylammonium fluoride in THF under anhydrous conditions. It is important to note that none of these steps involves the use of water.

Conclusion A Schlenk technique resulted in the protection of the alcohol group in 1,3-dichloro-2-propanol using silylating agent t-butyldimethylsilyl chloride. Comparison between literature values and DIMENSIONS 43 experimental values in FTIR and 1H-NMR confirmed the protection. Future work including vacuum distillation, ring closure using reducing agents, and removal of protecting group through the use of tetrabutylammoinium fluoride in THF under anhydrous conditions can possibly form anhydrous cyclopropanol.

References 1. Kulinkovich, O.G. The Chemistry of Cyclopropanols. J. Am. Chem. Soc. 2002. 103, 2597- 2632. 2. DePuy, C.H.; Mahoney L.R.. The Chemistry of Cyclopropanols. I. The Hydrolysis of Cyclopropyl Acetate and the Synthesis of Cyclopropanol. J. Am. Chem. Soc. 1963. 86 (13) 2653- 2657. 3. DePuy, C.H.; Gibson D.H. Cyclopropanol Chemistry. J.Am. Chem. Soc. 1973. 1974 (6) 605- 623. 4. Gerdil, R. Helv. Chim. Acta.1970. 53 2100. 5. Cudre, Y.; Fernandez-Zumel, M.A.; Risse, J.; Severin, K. Synthesis of Tri-fluoromethyl- Substituted Cyclopropanes via Sequential Kharasch Dehalogenation Reactions. Org. Lett. 2012. 14 (12) 3060-3063. 6. Cottle, D.L.; Magrane J.K. The Reaction of Epichlorohydrin with the Grignard Reagent. J. Am. Chem Soc. 1941. 65 483-487. References 1. Kulinkovich, O.G. The Chemistry of Cyclopropanols. J. Am. Chem. Soc. 2002. 103, 2597-2632.

2. DePuy, C.H.; Mahoney L.R.. The Chemistry of Cyclopropanols. I. The Hydrolysis of Cyclopropyl Acetate and the Synthesis of Cyclopropanol. J. Am. Chem. Soc. 1963. 86 (13) 2653-2657.

3. DePuy, C.H.; Gibson D.H. Cyclopropanol Chemistry. J.Am. Chem. Soc. 1973. 1974 (6) 605-623.

4. Gerdil, R. Helv. Chim. Acta.1970. 53 2100.

5. Cudre, Y.; Fernandez-Zumel, M.A.; Risse, J.; Severin, K. Synthesis of Tri-fluoromethyl- Substituted Cyclopropanes via Sequential Kharasch Dehalogenation Reactions. Org. Lett. 2012. 14 (12) 3060-3063.

6. Cottle, D.L.; Magrane J.K. The Reaction of Epichlorohydrin with the Grignard Reagent. J. Am. Chem Soc. 1941. 65 483-487.

7. Cottle, D.L.; Stahl G. W. The Reaction of Epichlorohydrin with the Grignard Reagent. Some Derivatives of Cyclopropanol. J. Am. Chem. Soc. 1943. 65 (9) 1782-1783.

8. Chambers, C.V.; Roberts J.D. Small Ring Compounds. VI. Cyclopropanol, Cyclopropyl Bromide and Cyclopropylamine. J. Am. Chem. Soc. 1951. 73 (7) 3176-3179.

9. Corey, E.J.; Venkateswarlu, A. Protection of Hydroxyl Groups as tert-Butyldimethylsilyl Derivatives. J. Am. Chem. Soc. 1972. 94 (17) 6190-6191.

10. Ballini, R.; Bigi, F.; Bosica G.; Maggi, R.; Righi P.; Satori, G. Protection (and Deprotection) of Functional Groups in Organic Synthesis by Heterogeneous Catalysis. Chem. Rev. 2004. 104 (1) 199-250.

11. Chemider. http://www.chemspider.com/Chemical-Structure. 109961.html (accessed 12/9/13).

DIMENSIONS 44 The Intrinsically Disordered Protein Stathmin Exists as an Oligomer in Solution, as Measured by Static Light-scattering and Dipolar Broadening EPR Spectroscopy.

Department of Chemistry and Biochemistry, California State University, Fullerton

Ashley J. Chui, Katherina C. Chua, Jesus M. Mejia, and Michael D. Bridges

Abstract Intrinsically disordered proteins (IDPs) are an interesting class of highly dynamic, typically regulatory, proteins. They lack a native three-dimensional fold, but often acquire a stable, ordered structure upon interaction with a binding partner. Various IDPs have been reported to exist in cells as ordered oligomers or disordered aggregates, often resulting in disease. In particular, stathmin is a regulatory IDP involved in the disassembly of cytoskeletal microtubules. As such, it is essential for proper cell function (i.e., processes coordinating the cell cycle, maintaining cell shape, etc.); improper regulation of stathmin activity has been linked to neurodegenerative diseases, mental disorders, and various cancers. It is thus important to study the solution-phase structure and conformational dynamics of stathmin, as they likely emulate the protein’s behavior in cellular environments. Upon obtaining preliminary Native-PAGE results that indicated multiple stathmin complexes of varying mass, we hypothesized that it may exist as an oligomer in solution, which contradicts previous observations of a purely monomeric state as seen by analytical ultracentrifugation. We thus obtained static multi-angle light scattering data for stathmin solutions of varying concentrations, which show distinct concentration-dependent variation on measured particle size, as expected from an equilibrium system of monomers and oligomers. To investigate this further, we then performed site-directed spin labeling (SDSL) electron paramagnetic resonance (EPR) spectroscopy on multiple singly- and doubly-labeled stathmin mutants. Interestingly, the resulting EPR data all exhibit ‘complex’ (multicomponent) spectra, which could be easily deconvoluted and interpreted. In each case, one spectral component exhibited the predicted high-mobility – but ordered – state of a nitroxide side chain tethered to a stable alpha helix. The other spectral component, on the other hand, was substantially broadened due to the dipolar interaction, implying the close proximity of two or more spin labels, likely due to dimerization or higher-order oligomerization. Upon dilution of the spin-labeled proteins with unlabeled wild-type stathmin, the spectral weight of the dipolar-broadened component was significantly reduced. The data collectively presented herein support our hypothesis that stathmin exists as an oligomer in solution. These results have important implications on our understanding of the conformational dynamics of this IDP and the roles that oligomerized or aggregated IDPs have in diseases.

DIMENSIONS 45 Michaelis-Menten Kinetics of TEV Protease as Observed by Time-domain EPR Spectroscopy

Department of Chemistry and Biochemistry, California State University, Fullerton

Estefania Larrosa and Michael D. Bridges

Abstract Here we report the utility of time-domain electron paramagnetic resonance (EPR) spectroscopy for the direct detection of tobacco etch virus (TEV) protease activity. Our target substrate for monitoring the action of the protease was a fusion protein construct consisting of maltose binding protein linked to the intrinsically disordered protein ‘stathmin’ by the TEV protease-specific amino acid sequence (ENLYFQG). We used site-directed mutagenesis and nitroxide spin labeling to probe local motion at specific sites in the stathmin portion of the fusion construct (i.e., at residues 3, 12, 54, 74, 91, 113, and 146). Spin labeling involves the attachment of an EPR-active nitroxide side chain to a cysteine residue and the relative mobility of the spin label is directly reported via its exhibited EPR spectrum. As such, we were able to monitor the cleavage of the fusion construct by an increase in the observed motion of the stathmin-bound nitroxide. The spectra recorded over the course of the proteolysis reaction were globally fit by using a simple Michaelis-Menten model.K m and kcat were extracted from these fits and were compared to previously published values. While theK m values (0.103 ± 0.011 mM) recorded for most mutants spin labeled in the C-terminal end or middle -1 of the protein agreed well with those in the literature, their kcat values (0.028 ± 0.005 s ) were consistently lower, approximately one-eighth of the published values. This might suggest that the intrinsically disordered protein portion of the fusion construct interacts with the protease while bound to it, interfering with the expected catalytic activity though not affecting substrate binding. Another interesting result is that the presence of the neutral-polar spin label close to the peptide ‘cut site’ itself (i.e., a nitroxide at residue 3 in stathmin) appears to dramatically increase the binding affinity of the peptide to the protease, as evidenced by a notably smaller Km value (0.022 ± 0.003 mM). This suggests that TEV protease activity is strongly dependent on not just the sequence of peptide cut-site, but also the polarity and charge make-up of nearby residues.

DIMENSIONS 46 Geochemical Correlation of Basalts in Northern Deep Springs Valley, California, by X-Ray Fluorescence Spectroscopy (XRF)

Department of Geological Sciences, California State University, Fullerton

Aaron Justin Case Advisor: Jeffrey R. Knott, Ph.D.

Abstract In northern Deep Springs Valley (DSV), between Owens Valley and Mountains occupied ancient river channels and flowed possibly as far Death Valley, California, Miocene-Pliocene-age, olivine basalts lie on southeast as Death Valley. Basalt flows found atop the White the valley floor and atop the adjacent White/Inyo Mountains to the Mountains, Deep Springs Range (DSR) and Last Chance Range (LCR) west and the Deep Springs Range to the east. Previous geologic are all mapped as the same geologic unit (Tb; McKee and Nelson, mapping shows the DSV basalt flows and the Last Chance Range (LCR) 1967; Wrucke and Corbett, 1990) presenting the possibility that one basalts found to the southeast are the same geologic unit with a flow emanated from the White Mountains and possibly reached source in the White/Inyo Mountains. The basalts in northern DSV are Death Valley, thus showing the possible ancient river trace that would offset ~400 m by the Deep Springs fault and have a K/Ar age of 10.8 Ma. connect pupfish populations. To determine if the olivine basalts found in the region are all In this study, I present geochemical analyses of the basalt flows from the same source, four samples were collected in a linear pattern of the northern DSR. These basalts are the first key link between the from west to east across northern DSV. The samples were powdered White Mountains and Death Valley. I will then compare these data and analyzed for major and trace element composition by X-Ray with basalts atop the LCR to the southwest to try and show that Fluorescence spectrometer (XRF). these are or are not the same basalt flow and potential pathway for Trace-element plots (e.g., Ba, Nb, Zr, Y, Ce, etc.) show that the pupfish dispersal. DSV basalts are similar and are likely from the same source; however, the composition of the DSV basalts are distinct from the LCR basalts. I interpret this data by presenting the DSV and LCR basalts have different sources and should not be mapped as the same geologic unit. The likely source of the DVS basalts is in the White/Inyo Range. The geochemical correlation shows that the DSV basalts flowed NW to SE in a paleochannel 10.8 Ma and that DSV did not exist at that time.

Abstract Questions exist about the possible dispersal pathways of various ancient species such as the pupfish (Knott et al., 2008; Phillips, 2008; Echelle, 2008). Early hypotheses suggested that pupfish dispersed along the ancient rivers and lakes of eastern California about 20,000 years ago (Blackwelder, 1933); however, biological studies show that this is an insufficiently short time period and a more realistic time is about 4 Ma (Smith et al., 2002). A 4,000,000 year time frame allows for plate tectonic motions to impact migration pathways. Reheis Figure 1: Eastern California/ Southern Nevada area showing mountain ranges and Sawyer (1997) hypothesized that basalt flows of the White and valleys. The box is the research area.

DIMENSIONS 47 Location Deep Springs Valley (DSV) is located about 37±1 kilometers (~23±1 miles) east of Bishop, California. DSV is a 25 kilometers long closed basin between the southern White Mountains and the northern Inyo Mountains (Figure 1). The olivine basalts crop out in the north end of DSV (Figure 2). The east side of DSV is bounded by the DSR. At the base of the DSR is the north-northeast striking, normal slip Deep Springs fault zone (Reheis and Sawyer, 1997).

Background Observations indicate that ancient basalts flowed downhill along ancient riverbeds (Dalrymple, 1963). Dalrymple (1963) determined by whole-rock K/Ar that the olivine basalts with locally scoriaceous features in the DSV and the White Mountains are 10.8±0.1 Ma. A tuff below the basalt yields an age of 10.8±0.1 Ma. McKee and Nelson (1967) mapped olivine basalt flows (Tb) in the northern DSV, DSR and LCR (Figure 3). These flows span from Figure 2: Deep Springs Valley area showing Mountain Ranges and Valleys the Inyo Mountains east to Piper Mountain in the DSR. McKee and zoomed in. The box is the research area. Nelson (1967) show the basalt offset by strands (west side down) Range is that there are no published geochemical analyses of the normal faults of the Deep Springs fault zone (Figure 4). Most of these White Mountain basalts. normal faults do not offset the older alluvium (Qoa); however, one Ormerod et al. (1988; 1991) showed that basalts of the Great north-trending normal fault produces a scarp in the Qoa. All of the Basin have a unique ratio of Zr/Ba. They used these two elements faults are buried by the younger alluvium (Qa) and alluvial fan (Qf) rather than Nb because the concentration of Nb is relatively low and deposits. Thus, the Deep Springs fault zone began sometime after 10.8 subject to analytical error. Zr and Ba are both relatively immobile Ma, but has not produced ground rupture since deposition of the Qoa. elements in basalt magmas and Ormerod et al. (1988; 1991) showed Krauskopf (1971) listed a 4.8 Ma K/Ar age for olivine basalts with that the ratio is consistent and indicative of magma source. They locally scoriaceous features (Tb); however, this basalt is actually not found that basalts with Zr/Ba ratios <0.2 are greater than 5 Ma at in the White Mountains, but across Fish Lake Valley to the northeast. the latitude of Bishop and indicate a lithospheric contamination. In Krauskopf mapped a basalt overlying rhyolite tuff near Cottonwood contrast, basalts with a Zr/Ba ratio >0.2 are younger than 5 Ma and Creek in the White Mountains, which is the likely location of Dalrymple’s indicate an asthenosphere source. (1963) 10.8 Ma basalt. Kempton et al. (1991), also working on Great Basin basalts, Reheis and Sawyer (1997) proposed that basalts of the Deep found that there was significant variation in Ce and Y concentrations. Springs/Eureka Valley area flowed from the White/Inyo Mountains They used the ratio of Ce/Y to illustrate the fractionation of rare-earth to the LCR and had the same source. They wrote “major oxide and elements in various magma sources. trace elements analyses indicate that the basalts are from the same Pluhar et al. (2005) showed that X-ray Fluorescence spectroscopy sequence of flows or at least share a common parentage” (Reheis (XRF) effectively sorted out different basalt flows of the Coso Volcanic and Sawyer 1997, pg. 284). They showed paleo-channels locations field, California, just southwest of Deep Springs Valley and part of and report the 10.8±0.1 Ma K/Ar age of Dalrymple (1963) for the the Great Basin. Many of the Coso basalts were thought to be the basalt at Piper Mountain. Correlation of the Piper Mountain basalt same; however, Pluhar et al. used Harker diagrams and plots of Rb/ with those dated in the White Mountains by Dalrymple (1963) is Nb vs. Sr/Zr to segregate different basalts from the same source. reasonable; however, McKee and Nelson (1967) mapped sedimentary Manoukian (2012) analyzed basalts from the LCR by XRF as well. rocks below the Piper Mountain basalt, not a rhyolite like Dalrymple Miller and Wrucke (1995) reported a K/Ar age of 5.5 Ma for the basalt (1963). One drawback to the correlation of basalts in the White/Inyo analyzed by Manoukian.

DIMENSIONS 48 Figure 3: The upper part isa portion of the Soldier Pass 15’ quadrangle geologic map by McKee and Nelson (1967). The lower portion is a Google Earth image of the same area. The unit Tb in the red boxed area is olivine basalt and the sample locations are shown on both. The cross section along A-A’ is shown in Figure 4.

Figure 4: Cross section of northern Deep Springs Valley by McKee and Nelson (1967).

DIMENSIONS 49 Methods I collected basalt samples (Figure 5) from four locations in northern Deep Springs Valley. Each sample will be described in the field and the location recorded using a hand-held GPS (Table 1.) I used XRF to determine the geochemical composition of each sample collected. The XRF uses x-rays to excite the electrons of the elements. When the electrons become excited, they jump an electron shell and when returning back to its original state, energy is released as light. The amount and wavelength of the light indicates the quantity and types of major and trace elements present. The data was analyzed and data reduction completed at Pomona College.

Table 1: Universal Transverse Mercator coordinates of samples collected. Coordinates were determined with a hand-held GPS receiver and are in sections 11S

AJC-DSV- AJC-DSV- AJC-DSV- AJC-DSV- 052813-1 052813-2 052813-3 052913-4 411661E 413804E 414430E 416827E 4141290N 4140912N 4141141N 4140683N

Results and Discussion Figure 5: Photograph (above) of outcrop The main results are XRF analysis of whole-rock samples (Table 2). where AJC-DSV-0528-13-2 was collected. In The data collected is relatively well behaved. One sample (AJC- the low background is the location of sample 3 with the location of sample 4 on the skyline DSV-052813-4) was randomly chosen to be run twice to measure atop Piper Mountain. At left is a close-up of accuracy. For the duplicate samples, the Relative Standard Deviation sample 2 location. (RSD) ranged from 0.0% to 254%. The highest RSD was for Ta at 491.9%. The concentrations of Ta for all five AJC samples ranged from -2.5 ppm to 2.9 ppm The negative values clearly indicate poor data, as a result, Ta was no longer considered for data analysis. Beside Ta, U and Th also had RSD results of 47% and 70%, respectively. The concentration of U ranged from 0.7 to 4.9 ppm whereas the concentration of Th ranged from 0.9 to 4.3 ppm. These Objective concentrations are very low and the range produces a mean with a I will test the following hypotheses by completing XRF analysis on high standard deviation. The explanation for the wide range of basalt samples of northern DSV: concentration of U and Th is probably related to the large atomic 1. If the olivine basalts in northern DSV are the same, then radius that may result in excessive interferences. they should all have the same geochemical composition. Excluding Ta, U and Th, the RSDs for the remaining elements 2. If the olivine basalts in northern DSV are different, then they and oxides for the duplicate samples range from 10-15% (Sm, Nd, should all have different geochemical compositions. Pb), 5-10% (Cr, Cu, Ni, Zn, Zr, Ce, La, Nb) and less than 5% for the 3. If the olivine basalts in northern DSV and the LCR have the remaining elements and oxides. In general, I regarded RSDs less than same source, then the geochemical compositions would be 10% for the duplicate analyses as acceptable and indicate that the the same proving that the olivine basalts came from the LCR. particular concentration of that element or oxide was acceptable for 4. If the olivine basalts in northern DSV and the LCR have data evaluation. different sources, then the geochemical compositions would The Total Alkali Silica (TAS) diagrams (Figure 6) distinguish igneous be different which would prove the source had a different rocks geochemically using the Silica (Si) and Total Alkali (K2O + origin such as the White Mountains located in the northwest. Na2O). The DSV samples plot in the basalt field.

DIMENSIONS 50 Table 2: X-Ray Fluorescence Data. Concentrations are in parts per million unless labeled as weight percent (%).

DIMENSIONS 51 Pluhar et al. (2005) used ratios of Rb/Nb and Sr/Zr for “lava shows that the LCR basalts plot in a separate cluster from both the fingerprinting”. A plot of Rb/Nb vs. Sr/Zr shows that the DSV basalts Coso and DSV basalts (Figure7). Plotting the LCR basalts on the Zr/ are distinct from the Coso basalts (Figure 7). The DSV basalts show Ba vs. Ce/Y diagrams (Figure 8) shows that the LCR basalts have a quite a range of Rb/Nb ratios. This is likely the result of the relatively Zr/Ba ratio >0.2. A ratio of >0.2 is attributed to mantle magmas <5 low Nb concentration as described by Ormerod et al. (1988). Ormerod Ma and a asthenosphere source. This >0.2 Zr/Ba ratio conflicts with et al. (1988; 1991) and Kempton et al. (1991) both noted that the Miller and Wrucke’s (1995) whole-rock 5.5 Ma K/Ar age. One possible unusually low concentration of Nb led to potential analytical errors. explanation is that Ormerod et al.’s observations are inconsistent. They found that ratios of Zr/Ba and Ce/Y adequately distinguished Alternatively, the K/Ar date is incorrect. The latter is more likely basalt flows of differing age and tectonic setting. considering that a preliminary 40Ar/39Ar date (sanidine) on a rhyolite A plot of Zr/Ba vs. Ce/Y shows that the 10.8 Ma DSV basalts tuff that underlies the basalt is ~3.5 Ma. have a ratio of Zr/Ba <0.2 (Figure 8). This is consistent with Ormerod The linear trend of Ce/Y shows that AJC-DSV-052813-1, -3 & -4 et al.’s hypothesis that basalts >5 Ma have Zr/Ba ratios <0.2 as a are from the same flow (Figure 8). The offset of AJC-DSV-052813-2 result of lithospheric contamination by the subducted Farallon plate. from the others suggests that this was separate flow or some process Plotting Manoukian’s XRF results on the Rb/Nb vs. Sr/Zr diagram impacted the Zr/Ba ratio as the basalt cooled.

DIMENSIONS 52 Conclusions, Significance, and Future Works Geochemical data (Zr/Ba vs. Ce/Y) show that the DSV basalts are different from the LCR basalts in a similar geomorphic position atop each range. The Zr/Ba ratio indicates that DSV basalts erupted prior to 5 Ma whereas the LCR basalts erupted after 5 Ma or after the Mendocino triple junction move north of Bishop. This different Zr/Ba ratio indicates that the DSV basalts and LCR basalts had different magma sources. Because the olivine basalts in the northern DSV are the same, then when the flows erupted, there was a topographic low (e.g. ancient river channels) that allowed the basalts to flow southeast from the White/ Inyo mountains to the Deep Springs Range. That would indicate that Deep Springs Valley and the Deep Springs Range did not exist. If this is true, another hypothesis arises in which we can calculate the minimum slip rate of the Deep Springs fault zone by taking the ~400m offset of the basalt flow and the 10.8±0.1Ma age to determine an estimated minimum 0.04 mm/yr slip rate. These data show that there are different basalts of differing age in the Deep Springs/Eureka/Last Chance region. Correlation of the DSV basalts indicates that a flow from the White Mountains traveled to at least Piper Mountain in the DSR and shows that there was a possible river channel that connected the White Mountains with the LCR to the southeast. As to whether the channel extended further east to Death Valley or turned south is unknown; however, this potential dispersal pathway is not eliminated at this time. Future work should include sampling of additional basalts found to the northwest and southeast of DSV. Some new geochronology of the basalts should also be done to improve the existing K/Ar chronology. Additional analytical data, such as mass spectrometry, should be done to improve the precision and accuracy of the geochemical character of the rocks.

Zr/Ba

DIMENSIONS 53 References Blackwelder, E., 1933, Lake Manly, an extinct lake of Death Valley: Miller R.J. and Wrucke C.T., 1995, Age, Chemistry, and Geologic Geographical Review, v. 23, p. 464-471. Implications of Tertiary Volcanic Rocks in the Last Chance Range and part of the Saline Range, Northern Death Valley Region, Califnornia. Dalrymple, G. B., 1963, Potassium-argon dates of some Cenozoic U.S. Geological Survey, Menlo Park, California 94025. ISOCHRON/ volcanic rocks of the Sierra Nevada, California: Geological Society of WEST, no. 62, May 1995. p. 30-36. America Bulletin, v. 74, p. 379–390. Ormerod, D. S., Hawkesworth, C. J., Rogers, N. W., Leeman, W. P., Echelle, A., 2008, The western North American pupfish clade and Menzies, M. A., 1988, Tectonic and magmatic transitions in the (Cyprinodontidae: Cyprinidon): Mitochondrial DNA divergence and Western Great Basin, USA: Nature, v. 333, p. 349-353. drainage history, in Reheis, M. C., Hershler, R., and Miller, D. M., eds., Late Cenozoic drainage history of the southwestern Great Basin and Ormerod, D. S., Rogers, N. W., and Hawkesworth, C. J., 1991, Melting lower Colorado River region: Geologic and Biotic perspectives, Vol- in the lithospheric mantle: Inverse modeling of alkali-olivine basalts ume Special Paper 439: Boulder, CO, Geological Society of America, from the Big Pine Volcanic Field, California: Contributions to p. 27-38. Mineralogy and Petrology, v. 108, p. 305-317.

Kempton, P. D., Fitton, J. G., Hawkesworth, C. J., and Ormerod, D. Phillips, F., 2008, Geological and hydrological history of the S., 1991, Isotopic and trace element constraints on the composition paleo-Owens River drainage since the Miocene, in Reheis, M. C., and evolution of the lithosphere beneath the southwestern United Hershler, R., and Miller, D. M., eds., Late Cenozoic drainage history States: Journal of Geophysical Research, v. 96, no. B8, p. 13,713- of teh southwestern Great Basin and lower Colorado river region: 713,735. Geologic and biotic perspectives: Boulder, Colorado, Geological Society of America Special Paper 439, p. 115-150.. Knott, J. R., Machette, M. N., Klinger, R. E., Sarna-Wojcicki, A. M., Liddicoat, J. C., Tinsley, J. C., David, B. T., and Ebbs, V. M., 2008, Pluhar et al., Glen, J.M., Monastero, F.C., Tanner, S.B., 2005, Lava Reconstructing late Pliocene - middle Pleistocene Death Valley lakes fingerprinting using paleomagmatism and innovative X-ray and river systems as a test of pupfish (Cyprinodontidae) dispersal fluoresence spectroscopy: A case study from the Coso volcanic field, hypotheses, in Reheis, M. C., Hershler, R., and Miller, D. M., eds., Late California. Vol. 6, Issue 4. Cenozoic drainage history of teh southwestern Great Basin and lower Colorado river region: Geologic and biotic perspectives: Boulder, Reheis, M. C. Sawyer, T. L. March, 1997,Late Cenozoic history and slip Colorado, Geological Society of America Special Paper 439, p. 1-26. rates of the Fish Lake Valley, Emigrant Peak, and Deep Springs fault zones, Nevada and California. Geological Society of America (GSA) : Krauskopf, K. B., 1971, Geologic map of the Mt. Barcroft Quadrangle, Boulder, CO, United States Vol. 109, Issue 3, pp. 280-299. California–Nevada: U.S. Geological Survey Geologic Quadrangle Map GQ-960, scale 1:62 500. Smith, G. R., Dowling, T. E., Gobalet, K. W., Lugaski, T., Shiozawa, D. K., and Evans, R. P., 2002, Biogeography and timing of evolutionary Manoukian, D. N., June 7, 2012, Geology of Cenozoic Deposits on top events among Great Basin fishes, in Hershler, R., Madsen, D. B., and of the Last Chance Range, Death Valley National Park, California. A Currey, D. R., eds., Great Basin Aquatic Systems History, Volume 33: Thesis Presented to the Faculty of California State University, Fullerton. Washington, D.C., Smithsonian Institution Press, p. 175-234. Department of Geological Sciences. Faculty Advisor Jeffrey R. Knott, Ph.D. Wrucke, C. T., and Corbett, K. P., 1990, Geologic map of the Last McKee, E.H. and Nelson, C.A., 1967, Geologic map of the Soldier Pass Chance Quadrangle, California: U.S. Geological Survey Open File quadrangle, California and Nevada: U.S. Geological Survey, Geologic Report 90-647-A, scale 1:62,500. Quadrangle Map GQ-654, scale 1:62,500

DIMENSIONS 54 Possible Shell-Hash Tsunami Deposit at the Los Penasquitos Marsh, San Diego County, CA.

Department of Geological Sciences, California State University, Fullerton

Jeremy Cordova Advisor: Dr. Brady Rhodes

Abstract The Los Penasquitos Marsh is one of a series of coastal wetlands between San Diego and Orange County that formed within stream valleys that flooded and filled with sediment during Holocene sea-level rise. In order to test the hypothesis that these wetlands contain prehistoric tsunami deposits, within the normal wetland sediments, 21 reconnaissance cores between 48 and 321cm in length were collected and described in the field. Nearly all the sediment cores contained a peaty layer in the top 5-25cm, underlain by dark brown and black mud, above interbedded fine-medium gray sand and mud. The stratigraphy in the cores is consistent with the complete infilling of a lagoon behind a baymouth bar of sand, a ‘drowned river valley’. Five of the cores taken, ranging from 1.0-1.5 km inland from the present day beach, intersect a distinctive 0.5 – 12.0 cm-thick shell-hash and muddy sand layer between 233 and 280cm depth. Based on this discovery we collected a 285cm long 5cm diameter piston core to further analyze this possible tsunami deposit. In this larger core the 10 cm-thick shell hash layer consists of angular shell fragments up to 3cm in size, in a muddy sandy matrix that includes the following genera: Mitrella, Venus, Spirotropis, Pecten, Nassarius, and an unidentified oyster. This fossil assemblage suggests a quiet water marine source for the shell hash debris– from the lagoon and/or offshore, not typical of low energy fine grained comparatively well sorted wetland sediments. The core was analyzed for loss on ignition (LOI) at both 550° and 950°C and magnetic susceptibility (ms). The LOI550 data is unremarkable, and the LOI950 data shows an expected spike in mass percent of carbonates within the shell-hash layer. The ms data shows low values for the lagoonal muds and sands, but a pronounced spike within the shell hash layer, where the average reading more than doubles from 2.0 to 4.6. We hypothesize that the anomalously high ms value for the shell hash layer indicates a substantial component from an offshore source, where heavier magnetic minerals accumulated seaward of the baymouth bar, but were subsequently swept back inland into the lagoon. If correct this layer may represent a large-wave event, either storm or tsunami. Three C-14 dates on shell fragments cluster between 1380-1420 yrs BP.

DIMENSIONS 55 Paleotsumani Research at the Seal Beach Wetlands, Seal Beach, CA

Department of Geological Sciences, California State University, Fullerton

D’lisa O. Creager, Brady P. Rhodes, Matthew E. Kirby, and Robert J. Leeper

Abstract The Seal Beach Marsh is located inside the Seal Beach Naval Weapons Station in north Orange County, CA. The wetland formed as a result of flooding and infilling of topographic lowlands during early Holocene sea-level rise. The Seal Beach marsh may contain a record of prehistoric tsunami and other paleoseismic data because the marsh is a low-energy depositional environment and historic anthropogenic disturbance is limited. To test if the marsh has a record of tsunami, sixteen reconnaissance gouge cores between 150 and 240 cm in length were collected and described in the field. The reconnaissance cores showed peaty organic layers interbedded with mud and sand. To investigate the stratigraphy at greater depths, a 377-cm vibracore was collected. Preliminary analyses of the vibracore show the top 15 cm is modern marsh. From 15 to 107 cm below land surface (bls), peaty mud and mud of varying thicknesses are interbedded. At 118 cm bls, a 10-cm thick sand layer covers mud at a sharp irregular contact. A 10-cm sand layer with an irregular basal contact at 137-cm bls covers peaty mud that consists of 50% organic matter. Peaty mud transitions to mud at 140 cm bls. Alternating mud and muddy sand layers of varying thickness continue to 246 cm bls. At 250 cm bls, a 2-cm thick mud layer caps a muddy peat layer. Mud at 270 cm bls extends down until a sharp irregular contact is made with a sand layer at 356 cm. A 21-cm thick sand layer marks the base of sediment recovery in the vibracore. The core was analyzed for loss on ignition (LOI) at 550°C (% total organic matter) and 950°C (% total carbonate) as well as magnetic susceptibility (CHI) at 1-cm intervals. These analyses confirm the existence of several organic-rich zones alternating with organic-poor mud. Our working hypothesis is that these peaty layers represent repeated subsidence of the marsh, perhaps related to seismic activity on the Newport-Inglewood Fault zone.

DIMENSIONS 56 Searching for a Paleotsunami Record at Seal Beach, Southern California

Department of Geological Sciences, California State University, Fullerton

Garcia, Dylan J., Kirby, Matthew E., Rhodes, Brady P., Leeper, Robert J.

Abstract There has been a substantial amount of research concerning south- calculated up to 25cm on the Pacific coast. The coarser material ern California’s potential and vulnerability to tsunami activity. tends to settle first, followed by the finer material due to wave Near-field or local tsunamis can be generated by submarine landslides deceleration (Morton et al., 2007). Grain size characterized by tsunamis and/or earthquakes. As many as 20 tsunamis have breached the coast can range from mud to boulders, but are commonly sand sized of California within the past two centuries, but none have been particles (Kawata et al., 1999). Deposits can be mixed under peat, reported since 1800 (Eisner et al., 2001). One of the earliest is thought given the wetland environment. Single or several beds of normally to be the Santa Barbara, southern California tsunami of 1812, which graded sand, mud laminae, and coral and shell debris all suggest damaged over 60km of Santa Barbara’s coast (Toppozada et al., 1981; tsunami deposits (Rhodes et al., 2011). Furthermore, many tsunami Lander et al., 1993). Southern California need only generate a deposits thin landward. M > 6.5 earthquake along one of the offshore fault lines to create a Core sediment was attained using a vibracore, which penetrated near-field tsunami (Mcculloch, 1985). Seal Beach lies above an active the subsurface to a depth of 344cm. The core was split, described, right-lateral strike-slip fault known as the Newport-Inglewood fault photographed, sampled, capped, and stored in a cold storage facility zone (NIFZ) (Vedder, 1975; Wright, 1991; Grant et al., 1999). The Seal at California State University, Fullerton. Multi-proxy methods were Beach environment is a known lowland trough and susceptible to used to describe the core including: descriptive analysis, magnetic fault zone activity causing this localized depression. Today, the susceptibility (CHI), and loss on ignition (LOI) at 550°C (% total coast is heavily developed and populated thus making tsunami organic matter) and 950°C (% total carbonate). The stratigraphic research important. record preserved in the subsurface sediments of this coastal wetland A 344cm sediment core extracted from the subsurface of does not show definitive evidence of rapid debris transport. We see the intertidal salt marsh at Seal Beach, Orange County, southern none of the indicators that characterize a tsunami deposit. Alternating California was examined to determine if there is any evidence for sand and mud layers were identified both visually and through past tsunamis. The objective of this study is to identify paleotsunami magnetic susceptibility. These sand layers do not necessarily indicate units in southern California’s marsh stratigraphy and to compare tsunamis, but perhaps a dynamic environment of materials washing the sedimentological data to that of others extracted from southern into Seal Beach. High CHI values (above 5×10−7 m3kg−1) represent California, and elsewhere. Tsunami deposits are identifiable by the a sandy or muddy magnetite-rich environment from a terrestrial types of deposits left in the sediment. High energy transportation is source, such as a beach, stream flow, or flooding. There is an unusual usually propagated by tsunamis or storms capable of carrying fairly sand intrusion feature (Unit I) between 202cm - 215cm depth. Unit large (≤0.5 m diameter) shells, coral, and sand in low energy I is an interesting interface of mud and sand, with softer overlying environments such as swamps, mangroves, and marshes. Discerning sediment surrounding a denser sand injection. This feature may the two has been the focus of several studies (Nanayama et al., 2000; have been due to subsidence caused by NIFZ activity. The structure Goff and McFadgen, 2004; Tuttle et al., 2004; Morton et al., 2007). could also be a result of liquefaction. It closely represents that of Because tsunamis can travel several hundreds of meters inland, they a sand or clastic dike, which are associated with storm events and are capable of leaving fairly thick deposits, some of which have been earthquakes. Another common feature in stratified sands are flame

DIMENSIONS 57 structures, which resemble this intrusion. These flame structures are formed from seismic activity and density variation between the stratified layers. Further research may characterize this structure and thus the causal event.

References Eisner, R.K., J.C. Borrero and C.E. Synolakis, 2001. Inundation maps Nanayama, F., Shigeno, K., Satake, K., Shimokawa, K., Koitabashi, for the State of California. In Proceedings of the International S., Miyasaka, S., Ishii, M., 2000. Sedimentary differences between Tsunami Symposium 2001 (ITS 2001) (on CDROM), NTHMP Review the 1993 Hokkaido-nansei-oki tsunami and the 1959 Miyakojima Session, R-4, Seattle, WA, 7–10 August 2001, 67-8. typhoon at Taisei, southwestern Hokkaido, northern Japan. In: Shiki, T., Cita, M.B., Gorsline, D.S. (Eds.), 15th International Goff, J., McFadgen, B.G.C.-G.C., 2004. Sedimentary differences Sedimentological Congress. Elsevier, pp. 255–264. between the 2002 Easter storm and the 15th-century Okoropunga tsunami, southeastern North Island, New Zealand. Marine Geology Rhodes B.P., M.E. Kirby, K Jankaew, M Choowong, 2011. Evidence for 204, 235–250. a mid-Holocene tsunami deposit along the Andaman coast of Thailand preserved in a mangrove environment. Mar. Geol. 282: 255-267. Grant, L. B., K. J. Mueller, E. M. Gath, H. Cheng, R. L. Edwards, R. Munro, and G. L. Kennedy, 1999. Late Quaternary uplift and Ratcliff, F., Martin-Hernandez, A.M.and Baulcombe, D.C. 2001. earthquake potential of the San Joaquin Hills, southern Los Angeles Tobacco rattle virus as a vector for analysis of gene function by basin, California, Geology 27, 1031–1034. silencing. Plant J., 25, 237-245.

Kawata, Y., B. Benson, J.C. Borrero, J.L. Borrero, H.L. Davies, W.P. Toppozada, T., C. Real, and D. Parke, 1981. Preparation of isoseismal deLange, F. Imamura, H. Letz, J. Nott, and C.E. Synolakis, 1999. maps and summaries of pre– 1900 California Eqs., CDMG # 81- Tsunami in Papua New Guinea was as Intense as First Thought, Eos 11SAC, p. 34, 136–140. Trans. AGU, 80, # 9, 101, 104-105. Tuttle, M.P., Ruffman, A.A.T., Jeter, H., 2004. Distinguishing tsunami Lander, J., P.Lockridge, and M. Kozuch (Eds.), 1993, Tsunamis from storm deposits in Eastern North America; the 1929 Grand Affecting the West Coast of the United States, 242 pp., U.S. Depart- Banks tsunami versus the 1991 Halloween storm. Seismological ment of Commerce, NOAA. Research Letters 75, 117–131.

Mcculloch, D. S., 1985. Evaluating tsunami potential. In Evaluating Vedder, J. G., 1975. Revised geologic map, structure sections and Earthquake Hazards in the Los Angeles Region (J. I. Ziony, ed) U.S. well table, San Joaquin Hills-San Juan Capistrano area, California, Geolog. Survey Prof. Paper 1360, pp. 375–413. U.S.Geol. Surv. Open-File Rept. 75-552, scale 1:24,000, 3 pp.

Morton, R.A., Gelfenbaum, G. and Jaffe, B.E. 2007. Physical criteria Wright, C., Mella, A., 1963. Modifications to the soil pattern of for distinguishing sandy tsunami and storm deposits using modern Southcentral Chile resulting from seismic and associated phenomena examples, Sedimentary Geology, 200, 184–207. during the period May to August 1960. Bulletin Seismological Society of America 53, 1367–1402.

DIMENSIONS 58 Metasomatism of the Bird Spring Formation Near Slaughterhouse Springs, California Department of Geological Sciences, California State University, Fullerton

Taylor Kennedy Advisor: Dr. Brandon Browne, Dr. William Laton

Abstract Geographical History The Bird Spring Formation has gone through two separate metamor- The geologic history of this region is best summarized by Burchfiel phism events; of faulting-induced metasomatism and contact and Davis (1977) and Miller and Wooden (1993), and their results metamorphism. A coarse grained white cream dolomite has gone will be summarized here. The underlying bedrock of the New York through metasomatism and metamorphism is compared to a white Mountains is Precambrian gneiss (Pcgn). This unit consists of both coarse grained marble with grossular garnet and epidote that has meta-igneous and metasedimentary rocks with potassium gone through metasomatism and a different grade of metamorphism. feldspar augens. Both of these are compared to a coarse grained white gray surface Unconformably overlying gneiss is the Cambrian Tapeats of marble, which has gone through metamorphism. Most samples Sandstone (Ct), which is overlain by the Cambrian Bright Angel Shale contained high amounts of calcium. Few samples contained iron (Cba), which is interbedded with calc-silicate and pelitic hornfels due or magnesium. While these units do not have high abundances of to localized metamorphism. precious elements, they still give insight into crystal formations. The The Cambrian Bonanza King Formation (Cbz) unconformable grossular garnet is a prime spot for the element vanadium. Another overlies the Bright Angel Shale, and is subdivided into the lower element, strontium, does not fit into the chemical structure of Papoose Lake member (Cbl) and the Banded Mountain member grossular garnet and is located within the other samples. These (Cbu). The lower Papoose Lake is interbeded with calcite and dolomite findings determine the chemical structures that these elements prefer. that has locally been metamorphosed to marble. Overlying the Bonanza King is the Cambiran Nopah Formation Introduction (Cn), which is a coarse-grained white dolomite unit underlain by Understanding how ore deposits develop is vital to our ability to shale known as the Dunderberg Shale Member (Cd). Atop the locate and develop mineral resources. In particular, ore deposits Nopah Formation is the Devonian Sultan Limestone (Ds), which containing precious metals like gold, copper, and lead, play a major contains a lower member that is interbedded calcite and dolomite role in our economy. In this study, I characterize the ore mineralization marble believed to be the metamorphosed equivalent of the of metasomatized Pennsylvanian Bird Spring Formation (Pbs) where Valentine Member (Dsv) found in other areas (REF). It has an upper in contact with the (1) high angle Slaughterhouse Fault and (2) Mid member called Crystal Pass (Dscp), which is a white limestone that Hills quartz monzonite of the Cretaceous Teutonia batholith in the has been metamorphosed to coarse grained marble. New York Mountains near Slaughterhouse Springs, California. Lying nicely on top of Crystal Pass is the Monte Cristo Limestone Comparing the mineralization of the Bird Springs Formation as a (Mm). It has three members, from lower to upper: Dawn (Mmd), result of faulting-induced metasomatism and contact metamorphism Anchor (Mma), and Buillion (Mmb). The Dawn and Anchor members will yield valuable insight into our understanding of how metasomatic have been recrystallized into marble, but the Anchor contains chert ore deposits are related to contrasting geological phenomena. My nodules. Buillion is calcite and marble with very few chert nodules. field area is located along the Slaughterhouse Fault in the New York The Bird Spring Formation overlies the Monte Cristo Limestone. Mountains in northeast San Bernardino California, south of Ivanpah Bird Spring is limestone and dolomite, and has been metamorphosed road from I-15(See Figure 1). to low grade marble in many places.

DIMENSIONS 59 Its exact date of deposition is unknown but it is restricted to the metamorphism in the area. This magma is now coarse grained quartz upper Mississippian to Pennsylvanian and possibly the Permian. monzonite to monzonite with phenocrysts of potassium feldspar Unconformable overlaying the Bird Spring is a calc-sillicate rock (Mp). Additionally, the Slaughterhouse Fault ripped across the area (Mcs). This unit has been proposed to have been deposited during and is slowly overturning the units. It is also responsible for the rest the Mesozoic, as well as is the next unit in the sequence: volcanic and of the metamorphism of the units. sedimentary rocks (Mmvs). Atop of these unknown units we have Following this faulting is the Tertiary volcanic rocks (Tv). This is the Mesozoic Sedimentary Rocks of Sagamore Canyon. This unit a unit of volcanic breccias that is associated with a late Miocene-Plio- correlates with the nonmarine environment of the south-eastern cene volcanic event (Burchfiel and Davis, 1977). California region. The last unit to be deposited in the New York Mountains is alluvium Finally an intrusion of magma has caused some of the units in the Quaternary; an older unit (Qoal) and a younger unit (Qal).

DIMENSIONS 60 Conclusions, Significance, and Future Works

Figure 2A. Geologic Map (U.S. Geologic Survey 1983)

Sampling Strategy and Analytical Method In order to compare the metamorphism, samples were selected from two sampling sights; one on the western flank of the Bird Spring Formation where it has been intruded by Cretaceous quartz monzonite and the other along the northern flank where the Bird Spring Formation is in contact with the Slaughterhouse fault. These samples were analyzed for mineral content by taking the average of X-ray fluorescence scans of each sample (See Table 1). While collecting, a ~1.5 square kilometers (km2) geological map of the area, mapping the contacts between rock units, the extent and style of the metamorphism and metasomatism (see Figure 2A), and geologic structures such as folds and faults (see Figure 2B) was completed.

Figure 2B. A-A` Cross Section

DIMENSIONS 61 Mineral Chemistry SFTK-1, a coarse grained white cream dolomite collected near the north end of the fault, has gone through metamorphism and metasomatism. SFTK-3, found near the north center of the fault, a white coarse grained marble with spots of green and brown, has gone through primarily metasomatism and a different grade of metamorphism. These spots gain there color from the high amounts g of grossular garnet and epidote within. SFTK-4, SFTK-5, SFTK-6 are a coarse grained white gray marble, collected along the contact between the Cretaceous Teutonia batholith and Bird Spring Formation. These samples have gone through metamorphism. The data shows that this unit does not possess a large abundance of precious elements. It also shows consistency in SFTK-4, SFTK-5 and SFTK-6(see Figure 5). These metamorphosed samples are very high in calcium as the original limestone should be. These samples are higher in most of the precious metals compared to SFTK-1. The first metamorphism event would have made Bird Spring formation similar in content. The only differences would be based on proximity to the contact. The metasomatism event only affects a small part of the Bird Spring formation making the differences in SFTK-1 and SFTK-3. It does show that SFTK-1 contains similar amounts of magnesium and calcium with very high amounts of zircon and strontium. This holds bearing to SFTK-3 that was also found near the fault. SFTK-3 hardly contains any magnesium, strontium, and zircon, but it does have all the iron and vanadium. During the metasomatism, elements like iron and vanadium relocated within the grossular garnet. Other elements like strontium did not fit into the chemical structure of grossular garnet, so it left and went to the rest of the unit instead. Both samples have undergone the same metasomatism but have very different chemical compositions. The other thing of note is how there is not any sodium in any of the samples. This most likely comes from the original rocks Table 1. Average geochemical analysis of three separate scans of samples. SFTK-2 was not able to get an accurate reading and was rejected. being limestone and not having much sodium to begin with.

Figure 5. Percent abundance in regards to major oxides

DIMENSIONS 62 Conclusion Acknowledgements The Bird Spring Formation does not contain much in the way of Thank you to California State University, Pomona and Jon Harris for precious elements, but it does show a difference in ore development the use of their XRF Machine. Thanks to my father, James Kennedy, in metasomatism and metamorphism. The different events have for keeping me company for the drive to my field area. Special thanks allowed for the elements to reallocate themselves in more favorable to my advisor Dr. Brandon Browne for continuing to work with me arrangements. Vanadium favored the grossular garnet but strontium even though he no longer needed to. Thanks to Dr. William Laton for did not. Bird Spring formation is 300 meters (m) thick and only a extra advising. small amount was sampled to determine a correlation. By analyzing different metamorphic events it could be possible to determine which elements and how they will relocate.

References Burchfiel, B., & Davis, G. (1977). Geology of the Sagamore Canyon-Slaughterhouse Spring Area, New York Mountains, California. Geological Society of America Bulletin, 88(11), 1623-1640

Miller, D., & Wooden, J. (1993). Geologic Map of the New York Mountains area, California and Nevada. U.S. Geological Survey. 93-198

Google Earth (4 Nov. 2006). Slaughterhouse Springs, Ca. Google Earth. Google. Retrieved March 25, 2013.

U.S. Geological Survey, (1983). Ivanpah Quadrangle California-San Bernardino Co. 7.5 Minute Series (Topographic).

DIMENSIONS 63 Understanding the Orientation of the Sierra Nevada Frontal Fault System in the Vicinity of Lone Pine and Independence, CA

Department of Geological Sciences, California State University, Fullerton

Garrett Mottle

Abstract Owens Valley in eastern California is a deep graben located between the Inyo and White Mountains to the east and by the Sierra Nevada Mountains to the west. The boundary between Owens Valley and the Sierra Nevada Mountains is the Sierra Nevada Frontal Fault System (SNFFS), which is a system of normal faults formed by crustal extension of the Basin and Range province. It generally is assumed in previous studies that the SNFFS normal faults dip steeply at ~60 degrees east. However, a recent study (Phillips and Majkowski, 2011) shows that the faults typically dip less than 50 degrees along the northern part of the SNFFS near Bishop. More recent work (Shagam 2011) near Independence shows that faults there dip 29-34 degrees east. My hypothesis is that faults farther south between Independence and Lone Pine dip shallowly (~30 degrees). I will test this hypothesis by mapping and measuring fault orientations of the SNFFS at George and Bairs Creeks using hand-held GPS and differential GPS devices at three fault exposures. The selected faults will have sufficient elevation exposure to allow for three-point analysis to determine their orientations in detail. Deliverables of this study will include a detailed map of a section of the SNFFS, an analysis of fault outcrop exposures to determine fault dip, and a report discussing the data and conclusions found in this study.

References Le, K., Lee, J., Owen, L.A., Finkel, R. 2007. Late Quarternary slip rates along the Sierra Nevada frontal fault zone, California: Slip partitioning across the western margin of the eastern California shear zone-Basin and Range, Geological Society of America Bulletin, vol. 119, p 240-256

Phillips, F. M., Majkowski, L., 2011. The role of low-angle normal faulting in active tectonics of the northern Owens Valley, California, Lithosphere, v. 3, p. 22-36.

Shagam, Greg. Analysis of the Sierra Nevada Frontal Fault orientation in the vicinity of Lone Pine and Independence, California. B.S. Thesis, California State University, Fullerton, 2012.

DIMENSIONS 64 Origin of Debris Flow Deposits on Starvation Canyon Fan,Death Valley, California

Department of Geological Sciences, California State University, Fullerton

Kelly Shaw Advisor: Jeffery Knott

Abstract Hunt and Mabey (1966, Fig. 48, pg. 67) mapped three debris fan boulders these flows would have traveled ≤ 16 km before being lobes on the Starvation Canyon alluvial fan in western Death Valley, deposited on the Starvation Canyon fan, with the origination point California, with an estimated volume of 8 to 25 million cubic yards. >8 km from the apex of the fan. My interpretation is that these These deposits were mapped on top of the fan indicating that they debris flows were emplaced in multiple events rather than one large were younger than the ~70 ka Qg2 alluvial fan. Tertiary volcanic rocks event. This is based on the differences in weathering of the boulders were mapped at the mouth of Starvation Canyon in the Panamint as well as the disconnected nature of the debris flow deposits. The Range by Hunt and Mabey, with faulted Cambrian and Precambrian active channel (Qg4) is comprised of similar debris flow deposits as metasedimentary rocks to the west. The Precambrian Sterling are found in the older channels. This would indicate that the same Quartzite and Johnny Formations are composed of distinctive brown processes continue to repeat, that this process has been ongoing for quartzite and purple shale. Granite at Hanaupah Canyon is 8.6 km ~70 ka and that debris flow travel may be ≤ 16 km. upstream from the piedmont. Field observations show that the northern lobe of the Qg2 deposits of the Starvation Canyon fan and consists of grusified References granite boulders at the surface. The debris flow deposits, which Hunt, C. R., Mabey, D. R., Stratigraphy and Structure, Death Valley, generally line a wash channel, are composed of 1-6 m diameter, California, 1996 varnished, but relatively unweathered, granite boulders with rare metasedimentarty boulders (<2%). The wash channel is incised through the Qg2 deposits with debris flow boulders overtopping the channel margins and resting atop the Qg2 deposits at the distal end. The southern lobe of the Starvation Canyon fan is composed of Qg3 gravels with an intervening active channel (Qg4). Both the younger and active channels are lined with large granite boulders (up to ~9 m diameter) that are unweathered and have very little, if any, varnish. Based on my field observations, Hunt and Mabey were correct in their conclusion that younger, unweathered granite boulders from debris flows are found atop the Qg2 gravel. However, my observations are that the debris flow boulders are limited to the channel and channel margins. I infer that these deposits traveled along the incised channels and did not overtop the Qg2 gravels, aside from the area at the distal end of the fan. I infer that the flow volumes were significantly lower volume than previously thought and, based on the dominance (>98%) and size (up to ~9 m) of granite

DIMENSIONS 65 Landslide Hazards in the Pai Valley, Northern Thailand

Department of Geological Sciences, California State University, Fullerton

Garcia, Dylan J., Shaw, K., Rhodes, B.P., Chantraprasert, S.

Abstract In this study we look at past mass wasting events in the Pai Valley of Northern Thailand to better understand the size and scope of previous slide events. Thailand is located in southeast Asia, sharing borders with Burma, Laos, Cambodia, Malaysia, The Andaman Sea and The Gulf of Thailand. Pai Valley is in the northernmost region of Thailand, close to the Burmese border. Research on past slides in this area is significant because of the population living on these slides. The inhabitants of Pai had an estimated population of about 2,284 as of 2006 (www.geonames.org). Residents built their homes on the suspected slide area and are dependent on the agriculture grown on the slides. A future slide event could be catastrophic for the valley, including major loss of life. It is hypothesized that in the past there was a massive debris flow that originated west of the valley. Our research shows the suspected slides are “Torrent Slides”, or “Long Slides”. These are channelized type slides discharging water and debris composed of bedrock, unconsolidated sediment or organic material and can be triggered by changes in ground water conditions or the geometry of the slope. Given the tropical setting of the area, it is presumed these slides were triggered by excessive rains or unseasonably wet periods. Through satellite imagery and on the ground mapping we determined at least two slide events and possibly a third. The mapping process involved driving and looking for locations with boulders versus bedded areas. When we determined the end of the boulder section and beginning of bedding, we plotted the location on our map. We continued until we had the boundaries of the slide determined. Clast counts were performed at a number of locations within the slide area. At each location an average of 50 clasts were counted, classified and measured. Sizes ranged from sand grain to 16 meters. The main lithologies of the clasts were granites and sandstones, this is expected if the slides originated west of the valley. There was no clear difference in weathering from one location to the next which leads us to believe these events occurred in a relatively short time frame. Future research could determine the precise origins of the slides and cosmogenic dating could be helpful in determining how old the slides are.

DIMENSIONS 66 Petrographic and geochemical analysis of the Summit Gabbro and associated granitoids of the Kern Plateau, Southern Sierra Nevada, California

Department of Geological Sciences, California State University, Fullerton

Elizabeth White Advisor: Dr. Diane Clemens-Knott

Abstract The Sierra Nevada Batholith (SNB) formed from multiple arc analyze the rock chemistry of each sample using the x-ray fluorescence magmatism events that occurred as a result of the collision between spectrometer (XRF) in order to identify the most mafic samples; and the North American and Farallon plates during the Mesozoic Era (3) analyze the compositions of minerals to determine their Mg# and (251-65 million years ago). Arc batholiths, such as the SNB, are further estimate the composition of the magma from which they important to understand because they record episodes of continental crystallized. In order to complete this last task, I will use two crustal growth. Calculating the amount of material, in this case instruments: the scanning electron microscope (SEM) and an magma, transferred from the mantle to the crust is dependent on electron probe microanalyzer (EPMA). A desired potential outcome the composition of the mantle wedge located beneath the continent. of this study is to locate gabbros with Mg#’s between Fo72-Fo65. The focus of this research is to understand the origin and formation The highest Mg# recorded to date from the eastern SNB is Fo50 of the SNB by geochemically characterizing the Summit Gabbro and (Gevedon, 2013), a value significantly lower than what would be suite from the Kern Plateau located in the southern Sierra Nevada predicted for a pristine mantle melt. Thus, geochemical Mountains, California. Gabbros are crucial to studies on mantle-wedge characterization of the Summit Gabbro may provide new information composition since they have experienced a minimal amount of about the type of mantle underlying the eastern Sierra Nevada arc chemical-evolution and represent a proxy for the chemical and is one of the first steps to determining how the western margin composition of the mantle source below the batholith. By definition of the North American continent was modified and expanded by gabbros are mafic, or silica-poor, and contain large quantities of Mesozoic arc magmatism. magnesium- and iron-rich minerals, such as olivine. The amount of chemical evolution that was experienced by a magma can be quantified using the magnesium number (Mg#), which is calculated References using the formula Mg # = [100 Mg/Mg + Fe2+]. In this formula, Mg Gevedon, Michelle, 2013, Paired oxygen and hafnium isotopic and Fe2+ are atomic proportions, which are measured using scanning analysis on zircon from gabbros: identifying potential Mesozoic electron microscopes (SEM) and electron microprobe analyzers mantle heterogeneity of the Sierra Nevada Arc: MS thesis. California (EMPA). Higher Mg#’s indicate more primitive magmas—i.e., magmas State University, Fullerton; California. Print. that have not evolved much from their original, mantle-derived composition. Therefore, gabbros with higher Mg#’s are ideal for identifying mantle-derived magmas that have not experienced much differentiation, and the geochemical compositions of these high-Mg# rocks can be used to determine whether the mantle source rock was either “enriched” or “depleted”. The methodology to complete this object includes: (1) study thin sections of each sample using a petrographic microscope to look for alteration that may have changed rock chemistry and identify minerals for SEM analysis; (2)

DIMENSIONS 67 Permutation Test in Microarray Data Analysis

Department of Mathematics, California State University, Fullerton

Mirna Dominguez Advisor: Gülhan Bourget

Abstract Microarray technology has become one of the most powerful tool to of computational difficulty, adopting these methods in microarray simultaneously study thousands of genes at once. Since microarray analysis have been slow in practice. data involves high dimensional data (i.e., number of genes are To test whether the difference in means is statistically significant, comparably higher than number of replicates), making inferences analysis of variance (ANOVA) method can be performed. If the using current statistical methods are insufficient. Despite the inability populations from which the data were sampled violate assumption(s) of the F-test to draw valid conclusions in Microarray data, it is one of of the ANOVA, then the results of the analysis may be incorrect or the common statistical methods that is used. One of the assumptions misleading. For example, if the assumption of independence of the of the classical F-test is that groups (genes) are supposed to be observations is violated, then the ANOVA is not appropriate. The test independent. However, this assumption is violated in microarray statistics used in ANOVA is the classical F-test. Since gene-gene data because gene-gene interactions are possible. In this paper, interactions can happen in nature, the F-test should not be considered we suggest performing permutation test to explore if the p-value to answer if there is a difference in means of genes under different obtained from the F-test can be improved. We consider various conditions. Permutation test can be an alternative test to consider. magnitudes of correlation among genes from no correlation to Permutation test is particularly used when the distributions of the strong correlations in a Monte Carlo study to compare the p-values data is unknown, sample sizes are small, or outliers are present [7]. of the F-test and the permutation test. Our findings show that the However, in this paper we suggest performing permutation test in lieu permutation test is preferred over the F-test. of the violation of the independency assumption in ANOVA. We run Monte Carlo Simulation studies by considering various magnitudes Introduction of correlations among genes to answer if the permutation test is Microarray experiments study gene expressions of cells, organisms, preferable over the F-test. or tissues. For example, in a cancer study healthy and diseased tissues In Section 2, we describe the data and outline the F-test and of breast is compared to identify disease causing genes. the permutation test, and in Section 3 we describe Monte Carlo Microarray data has a high dimensional data structure which Simulation study and present its findings. Finally, we draw makes challenging todraw statistical inferences [1]. Various methods conclusions in Section 4. have been proposed to answer different kinds of questions. For example, clustering and classification are two common methods to identify groups of genes that share similar functions [2,3]. These methods search for similar genes, but they do not help to identify which genes are differentially expressed under different conditions. To find differentially expressed genes, we need to perform hypothesis tests of no difference in the means of gene expressions under different conditions. Fold change, linear models, as well as Bayesian methods [4–6] are some of the statistical tests; however because

DIMENSIONS 68 3 3 in Section 3 we describe Monte Carlo Simulation study and present its findings. Finally, in Section 3 we describe Monte Carlo Simulation study and present its findings. Finally, we draw conclusions in Section 4. we draw conclusions in Section 4.

2 Methodology 2 Methodology Methodology A singleA multivariatesingle multivariate observation observation is the collection is the collection of measurements of measurements on p different variablesWe want to make inferences about the differences of the mean vectors A single multivariate observation is the collection of measurements on p different variables (genes)on taken p different from the variables same trial (genes) (array). taken If n observationsfrom the same have trial been (array). obtained, Ifn theof entirethe populations. That is, μ1 − μ2, where μi is the mean vector of (genes) takenobservations from the same have trial been (array). obtained, If n observations the entire data have set been can obtained,be the entirepopulation i(= 1, 2). We want to answer the question of μ = μ or data set can be represented in an n p matrix 1 2 represented in an n × p matrix× equivalently: Is μ − μ = 0? We need to make some assumptions to data set can be represented in an n p matrix 1 2 × provide answers to these questions. The assumptions are: X X X X 11 12 ··· 1p 1 X11 X12 X1p  X1  X21 X···22 X2p X2 X = ···   =  (1)1. The sample ' ' ' is a random sample of size X X. . .X .  X .  X1, X2 , . . . ,Xn1 21 . 22 . ..2p .  2 .  X =   ···  =     (1) n from a p-variate population with mean vector μ and  . . .. .   .   1 1  . . . .   .   covariance matrix .   Xn1 Xn2 Xnp   Xn  Σ1   ···       Xn1 Xn2 Xnp   Xn    ···    2. The sample Y'1, Y'2 , . . . ,Y'n2 is a random sample of size The row vector X represents the jth multivariate observation.  The matrix X represents j from a -variate population with mean vector and The row vector X represents the jth multivariate observation. The matrix X represents n 2 p μ2 p genes each havingj n observations. The row vector X'j represents the jth multivariate observation. The covariance matrix Σ2. p genes eachmatrix having n representsobservations. genes each having observations. Now, considerX a microarrayp experiment of n1 nand n2 sample from population 1 and ’ ' Now, considerNow, a microarray consider a experiment microarray ofexperimentn and n ofsample n 1 and from n 2 sample population 1 and 3. X’1, X’2, . . . ,X n1 is independent of Y'1, Y'2 , . . . ,Yn2. population 2, respectively. For example, population1 2 1 can represent the disease group, from population 1 and population 2, respectively. For example, population 2, respectively. For example, population 1 can represent the disease group, while populationpopulation 2 1 can can represent represent the the healthy disease group. group, Suppose while population that the expression 2 can levels of while populationrepresent 2 can the represent healthy the group. healthy Suppose group. that Suppose the expression that the expression levels of levelsFor of large samples, these assumptions are enough to make an p genes are measured and matrix representation of the population 1 and 2 are defined as p genes are measured and matrix representation of the population inference about − . However, when the sample sizes n and n p genes are measured and matrix representation of the population 1 and 2 are defined as μ1 μ2 1 2 X and 1Y and, which 2 are is defined in the form asX of and (1). Y Let, whichXij be is in the the expression form of ( level1). Let for X geneij j ofare sample small we need to have the following as- sumptions as well. X and Y, whichbe the is expression in the form level of (1). for Let geneX ij ofbe sample the expression from population level for gene . Thej of sample i from population 1. The expression levelj vectors fori sample i from population1 1 can be expression level vectors for sample from population can be ex- i from population 1. The expression level vectorsi for sample i from1 population 1 can be 5 expressed as Xi =(Xi1,...,Xip). The mean expression levels of gene j in population 1 pressed as X'i = (Xi1, . . . , Xip). The mean expression levels of gene j 1. Both populations are multivariate normal. expressed as X =(X ,...,X ). The mean expression levels of gene j in population 1 is, in populationi i1 1 is, ip 1. Both populations are multivariate normal. is, 2. Σ1 = Σ2 4 1 n1 4 X = X . 2. Σ1 = Σ2. j1 nn1 ij 1 i=1 The mean expression level vectorXj = for p genesXij. for population 1 is given by The null (H ) and alternative (H ) hypotheses are: The mean expression level vector forn1p genes for population 1 is given by 0 1 i=1 The null (H0) and alternative (Ha) hypotheses are: X¯ =(X¯ ,...,X¯ ) . Similarly, we can define these expressions for population 2. Now, we X¯ =(X¯ 1,...,X¯ p) . Similarly, we can define these expressions for population 2. Now, we 1 The pmean expression level vector for p genes for population 1 is are ready to outline the hypotheses we are interested in. H0 : μ1 − μ2 = 0 versus Ha : μ1 − μ2 = 0 are readygiven to outlineby X = the(X1 hypotheses, . . . , Xp)'. Similarly, we are interested we can define in. these expressions H : µ µ = 0 versus H : µ µ = 0, 0 1 − 2 a 1 − 2  for population 2. Now, we are ready to outline the hypotheses we are interested in. where = (μ , μ μ )' is the mean expression level of 2.1 Comparing Mean Vectors from Two Populations μ1 11 12 , . . . , 1p 2.1 Comparing Mean Vectors from Two Populationswhere µ1 =(µ11,µ12,...,µ1p) is the mean expression level of population 1, and µ2 = population 1, and μ2 = (μ21, μ22 , . . . ,μ2p)' is the mean expression level Comparing Mean Vectors from Two Populations of population 2. That is, the null and alternative hypotheses can be Consider a random sample of n1 and n2 from populations 1 and 2. The observations(µ21,µ22,...,µ on2p) is the mean expression level of population 2. That is, the null and Consider a random sample of n1 and n2 from populations 1 and 2. The observations on Consider a random sample of n 1 and n 2 from populations 1 and 2. rewritten as p variables can be arranged as follows: alternative hypotheses can be rewritten as p variablesThe canobservations be arranged on asp variables follows: can be arranged as follows:

H :(µ µ ,µ µ ,...,µ µ ) = (0, 0,...,0) Population 1: X1 , X2 ,...,Xn 1 0 11 21 12 22 1p 2p Population 1: X1 , X2 ,...,Xn 1 − − −

H :(µ µ ,µ µ ,...,µ µ ) = (0, 0,...,0) Population 2: Y1 , Y2 ,...,Yn 2 . a 11 21 12 22 1p 2p Population 2: Y1 , Y2 ,...,Yn 2 . − − − 

We want to make inferences about the differences of the mean vectors of the popula- We want to make inferences about the differences of the mean vectors2.2 of theF popula-Test DIMENSIONS 69

tions. That is, µ1 µ2, where µi is the mean vector of population i(= 1, 2). We want to tions. That is, µ1 −µ2, where µi is the mean vector of population i(= 1, 2). We want to − The classical F -test compares the means of the columns of X, and assumes that these answer the question of µ1 = µ2 or equivalently: Is µ1 µ2 = 0? We need to make some answer the question of µ1 = µ2 or equivalently: Is µ1 −µ2 = 0? We need to make some − columns are independent (univariate case). Here, we want to compare the differences of assumptions to provide answers to these questions. The assumptions are: assumptions to provide answers to these questions. The assumptions are: the p means of X and Y. To adopt the data structure from multivariate case to univariate

1. The sample X1 , X2 ,...,Xn 1 is a random sample of size n1 from a p-variatecase, we popula- consider the observations as the differences of the X and Y. That is, we compute 1. The sample X1 , X2 ,...,Xn 1 is a random sample of size n1 from a p-variate popula-

tion with mean vector µ1 and covariance matrix Σ1. Xij Yij and apply the univariate F -test on these observations. The statistic tion with mean vector µ1 and covariance matrix Σ1. −

2. The sample Y1 , Y2 ,...,Yn is a random sample of size n2 from a p-variate popula- 2. The sample Y , Y ,...,Y 2 is a random sample of size n from a p-variate popula- MST 1 2 n2 2 F = , MSE tion with mean vector µ2 and covariance matrix Σ2. tion with mean vector µ2 and covariance matrix Σ2.

where MST is the mean square for treatments (genes) and MSE is the mean square for 3. X1 , X2 ,...,Xn 1 is independent of Y1 , Y2 ,...,Yn 2 . 3. X1 , X2 ,...,Xn 1 is independent of Y1 , Y2 ,...,Yn 2 . errors, follows F distribution with p 1 and p(n + n 1) degrees of freedoms. − 1 2 − For large samples, these assumptions are enough to make an inference about µ1 µ2. For large samples, these assumptions are enough to make an inference about µ1 −µ2. − However, when the sample sizes n1 and n2 are small we need to have the following as- However, when the sample sizes n1 and n2 are small we need to have the following as- sumptions as well. sumptions as well. 5

1. Both populations are multivariate normal.

2. Σ1 = Σ2.

The null (H0) and alternative (Ha) hypotheses are:

H : µ µ = 0 versus H : µ µ = 0, 0 1 − 2 a 1 − 2 

where µ1 =(µ11,µ12,...,µ1p) is the mean expression level of population 1, and µ2 =

(µ21,µ22,...,µ2p) is the mean expression level of population 2. That is, the null and alternative hypotheses can be rewritten as

H :(µ µ ,µ µ ,...,µ µ ) = (0, 0,...,0) 0 11 − 21 12 − 22 1p − 2p

H :(µ µ ,µ µ ,...,µ µ ) = (0, 0,...,0) a 11 − 21 12 − 22 1p − 2p  7 2.2 F Test 7 F-Test The variance covariance matrices are defined as The classicalTheF -test classical compares F-test the compares means of the the means columns of the of Xcolumns, and assumes of XThe, and that variance theseThe covariance variance matrices covariance are definedmatrices as are defined as columns are independentassumes that (univariate these columns case). are Here, independent we want to(univariate compare case). the differences Here, of Σρ 00...... we want to compare the differences of thep means of X and Y. To Σρ 00... ......  the p means of X and Y. To adopt the data structure from multivariate case to univariate 0 Σ( ρ) 00... . adopt the data structure from multivariate case to univariate case,   . −  0 Σ( ρ) 00... . .  case, we considerwe consider the observations the observations as the differences as the differences of the X and of Ythe. That and is, we. That compute − 00Σρ 0 ... . X Y   .    Σ1 = Σ2 =  .   , is, we compute Xij − Yij and apply the univariate F-test on these  00Σρ 0 .... .  .  Xij Yij and apply the univariate F -test on these observations. The statistic Σ1 = Σ2 =   .00 , 7 Σ( ρ) 0 .  − observations. The statistic  .  .  −   .00Σ( ρ) 0. . . . . .   −  . . .0. .. .   . . .  . .   MST The variance covariance matrices are defined as  . . .0.  .. .   F = ,    ... MSE  ......  ......    ..   ......  ... .  Σ 00......  ρ   where MST is the mean square for treatments (genes) and MSE is the mean square for where where MST is the mean square for treatments (genes) and MSE is  where .  where 0 Σ( ρ) 00... . the mean square for errors, follows F distribution with p and −  errors, follows F distribution with p 1 and p(n1 + n2 1) degrees of freedoms.− 1  . n 2 n 1 n 2 n 1 − −  00Σ 0 1 ...ρ.  ... ρ − ρ − 1 ρ ... ( ρ) − ( ρ) − p(n + n − 1) degrees of freedoms.  ρ  − − − 1 2 Σ1 = Σ2 =1 ρ ... ρn 2 ρn 1  , 1 ρ ... ( ρ)n 2 ( ρ)n 1  . −  − .  n 2  − − n 2  .00Σ( ρ) 0 1 . ...... ρ − − − ρ −1 ...... ( ρ) −   −n 2   − n 2 − Permutation Test ρ . 1 ......  ρ −. . .. . .. .. ρ .  1 ... .... ( .ρ.) − .. .. .   . . Σρ.0. =  . .. . . . . − .  , Σ( ρ) =  . − . . . .  . Permutation test also called randomization or re-randomization test .  . . .  .    .  . − .  . .   Σρ =  .  ......  .  , Σ( ρ).=  . .  .. .. .. .  . .     n 2 .−. .  .   n 2  . .  has been proposed for a long time, but it has become practical with ......  ...ρ −  ...... .  . ρ  ( ρ) − ...  . . ρ   n 2 . .     n 2  .  − .  −  the advance of high-speed computers. Permutation test is more ρ − ... . .  nρ 1 n 2  ( ρ) −  ... .  .n 1 ρn 2    ρ −  ρ − ... ρ− 1  ( ρ) − ( ρ−) − ... ρ 1  useful when the distributions of the data are unknown, sample sizes n 1 n 2    n 1  n 2  − −  −  where ρ − ρ − ... ρ  1  ( ρ) − ( ρ) − ... ρ 1      − − −  are small, or outliers are present. The basic approach to permutation    The matrix Σρ and Σ( ρ) have dimensions n n, and the matrix Σ1 = Σ2 has dimension test that we used in the simulation follows: n 2 n 1 − n 2 × n 1 1 ρThe ... matrix ρ − Σρρ−and Σ( ρ) have dimensions1 n ρn, and ... the( matrixρ) − Σ(1 =ρ)Σ−2 has dimension − p p. −× − −  n 2 ×  n 2 Decide a test statistic. In ANOVA, the testρ statistic1 p is... thep. ... ρ − ρ 1 ...... ( ρ) − · × We considered− sample sizes of n = n =− 10 for p = 50 genes. We assume that there  . . . . .   . . . .1 2 .  F -test. Σρ =  . .. .We. considered.. .  sample, Σ( ρ) sizes=  of n. = n = 10.. for p..= 50 genes... We. assume . that there   −are 5 groups1 of 102 genes in each group, totaling of 100 genes. That is, we fixed n = 10  . .   . .  Calculate the F-test for the data, called nit F2 obs. . .   n 2 . .  · ρ − ... are. 5 groups. ofρ 10 genes in each( group,ρ) − totaling... of 100. genes.. That is,ρ we fixed n = 10   in Σρmatrix.− We assumed ρ =0, 0.1, 0.2,...,− 0.9 as various magnitudes of correlations, Repeat the following r times, where r isn a 1numbern 2 greater   n 1 n 2  · ρ − ρ − in...Σρ matrix. ρ 1 We assumed ρ=0( ,ρ0).1−, 0.2(,...,ρ) 0−.9 as... variousρ magnitudes1 of correlations, than .   and µ=−µ = (0.5−, 0.5, 0, 0,...,0).− We run 1000 data sets to test the null hypothesis at 1000   1 2  and µ = µ = (0.5, 0.5, 0, 0,...,0). We run 1000 data sets to test the null hypothesis at - Shuffle the labels of the populations (genes). 1 2 the significance level of α =0.05. We followed the permutation test approach described The matrix Σρ and Σ( ρ) have dimensions n n, and the matrix Σ1 = Σ2 has dimension − × - Calculate the -test for the shuffled data, thecall it significance . The level matrix of αΣ=0ρ and.05. Σ(- Weρ) have followed dimensions the permutation n × n, and test the approach matrix described F Fs in the previous section with r = 1000. We computed p-values, which is the probability of p p. = has dimension . × in the previousΣ1 sectionΣ2 with r = 1000.p× Wep computed p-values, which is the probability of Compute how many times Fs is greater than or equal to rejecting the null hypothesis when the null hypothesis is true, to draw conclusions about · We considered sample sizes of n1 We= n considered2 = 10 for psample= 50 genes. sizes of We n1 assume= n2 = 10 that for there p = 50 genes. Fobs, call this number s. Then calculate the p-valuerejecting as the the null hypothesis when the null hypothesis is true, to draw conclusions about We assumethe thatF there- and are the 5 permutation groups of 10 tests. genes in each group, totaling ratio of s and r. That is, p-value=ares/r 5. groups of 10 genes in each group, totaling of 100 genes. That is, we fixed n = 10 the F - and theof 100 permutation genes. That tests. is, we fixedn = 10 in Σρ matrix. We assumed · Decide if the -value can rejectin theΣ nullρ matrix. hypothesis. We assumed ρ =0ρ =, 0 .01,, 0.1,0.2,..., 0.2,0 . 9. as. , various0.9 as various magnitudes magnitudes of correlations, of correlations, and μ1 = μ2 = (0.5,0.5,0,0,...,0). We run 1000 data sets to test the null and µ1 = µ2 = (0.5, 0.5, 0, 0,...,0). We run 1000 data sets to test the null hypothesis at hypothesis at the significance level of α = 0.05. We followed the Results the significance level of α =0.05.permutation We followed test the approach permutation described test approach in the previous described section with We generated two multivariate normal distributionsin the previous by running section with r =r 1000. = 1000 We. We computed computedp-values, p-values, which which is the is probability the probability of of Monte Carlo simulation studies in R software: MVN(μ , Σ ) and rejecting the null hypothesis when the null hypothesis is true, to draw rejecting the1 null1 hypothesis when the null hypothesis is true, to draw conclusions about MVN(μ2, Σ2), each with dimension p (genes). conclusions about the F-and the permutation tests. the F - and the permutation tests. DIMENSIONS 70 8

8

Table 1. The probability of rejections for F and Permutation tests at α =0.05. Table 1. The probability of rejections for F and Permutation tests8 at ρ α =0.05. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 F -TestTable 1. The probability0.040 of 0.053 rejections 0.054 for F 0.056and Permutation 0.073ρ 0.083 tests at 0.094 0.106 0.125 0.163 Permutationα =0.05. Test0 0.037 0.1 0.051 0.2 0.055 0.3 0.059 0.4 0.073 0.5 0.081 0.6 0.093 0.7 0.104 0.8 0123 0.9 0.165 F -Test 0.040 0.053 0.054 0.056ρ 0.073 0.083 0.094 0.106Acknowledgments 0.125 0.163 Permutation Test 0.0370 0.1 0.051 0.2 0.055 0.3 0.059 0.4 0.5 0.073 0.6 0.081 0.7 0.093 0.8 0.9 0.104 0123 0.165 F -Test 0.040 0.053 0.054 0.056 0.073 0.083 0.094 0.106 0.125 0.163 We thank the California State University, Fullerton (CSUF) for Permutation Test 0.037 0.051 0.055 0.059 0.073 0.081 0.093 0.104 0123 0.165 providing us with an intramural grant to work on this project. The We consider the test statistic is valid if its p-value smaller than the chosen significance Table 1. The probability of rejections for F and Permutation tests at α = 0.05. author extend a special thanks to Dr. Gu ̈lhan Bourget for mentoring level α or the p-value lies in the (1 α)% confidence interval and allowing the opportunities to grow as undergraduate researcher. We considerWe consider the the test test statistic is valid is valid− if its p if-value its p smaller-value than smaller the chosen than significance the chosen significance We consider the test statistic is valid if its p-value smaller than the The authors also thank Dr. Bourget for her valuable comments and level α or the p-value lies in the (1 α)% confidence interval level α chosenor the p significance-value lies in level the− (1α or αthe)%p α confidence-value(1 α lies) in interval the (1 − α)% editions that helped improved the quality of this paper. α− 2 − . (2) confidence interval ±α(1 α) s α 2 − . (2) ± s α(1 α) α 2 − . (2) ± s In ourIn simulation our simulation with with α =0=0.05.05 and ands = 10000,s = 10000, the interval the (2) interval becomes (2) becomes

In our simulationIn our simulation with α with=0. α05 = and 0.05s and= 10000, s =10000 the, interval the interval (2) becomes (2) becomes References 0.05 0.95 Mehta T, Tanik M, B AD (2004) Towards sound epistemological 0.05 2 × (0.043, 0.057). ± 01000.05 ⇐⇒0.95 0.05 2 × (0.043, 0.057). foundations of statistical methods for high-dimensional biology. ± 1000 ⇐⇒ Nature Genetics 36: 943-947. When ρ = 0, the assumption0 of.05 the ANOVA0.95 is satisfied because populations are 0.05 2 × (0.043, 0.057). independent. Therefore, in± this situation,1000F -test should⇐⇒ be equal to the significance level. WhenWhenρ = ρ= 0,0, the assumption assumption of of the the ANOVA ANOVA is issatisfied satisfied because because populationsAlon areU, Barkai N, Notterman D, Gish K, Ybarra S, et al. (1999) Broad This observation is validated for tests from the table. To study the effect of the correlation populations are independent. Therefore, in this situation, F-test patterns of gene expression revealed by clustering analysis of tumor independent.among genes, Therefore, we considered in this various situation, degrees of magnitudesF -test should of correlations be equal from to small the to significance level. Whenshouldρ = be 0, equal the assumptionto the significance of the ANOVAlevel. This is observation satisfied because is validated populations and normal are colon tissues probed by oligonucleotide arrays. This observationhigh degrees. is When validated the correlation for tests is small from to the medium, table. we Tostill study suggest the using effectF -test of the correlation independent.for tests Therefore, from the in table. this To situation, study theF -test effect should of the be correlation equal to the among significance Proceedings level. of the National Academy of Sciences of the United because type I error is well protected. However, when the correlations among genes are amonggenes, genes, we considered variousvarious degreesdegrees of of magnitudes magnitudes of of correlations correlations from smallStates to of America 96: 6745-6750. This observationincreasingfrom smallF -test is to validated should high be degrees. avoided for tests to When make from anythe the kinds correlation table. of conclusion To study is small about the theto effect medium, data. of the correlation high degrees. When the correlation is small to medium, we still suggest using F -test amongWhenwe genes, still correlation wesuggest considered is medium using to variousF strong,-test permutationbecause degrees oftype test magnitudes is Ipreferred. error is well of correlations protected. from smallBrazma to A, Vilo J (2000) Gene expression data analysis. FEBS Letters becauseHowever, type I error when is the well correlations protected. However,among genes when are the increasing correlations F-test among genes480: are17-24. high degrees. When the correlation is small to medium, we still suggest using F -test should be avoided to make any kinds of conclusion about the data. increasing F -test should be avoided to make any kinds of conclusion about the data. becauseWhen type correlation I error is well is medium protected. to strong, However, permutation when the correlations test is preferred. among genesBaldi are P, Long A (2001) A bayesian framework for the analysis of When correlation is medium to strong, permutation test is preferred. increasing F -test should be avoided to make any kinds of conclusion about themicroarray data. expres- sion data: regularized t -test and statistical Conclusion inferences of gene changes. Bioinformat- ics 17: 509-519. When correlationMicroarray isexperiments medium to strong,produce permutation a large amount test of is preferred.gene expression data. The main goal is to apply appropriate statistical methods to Wang S, Ethier S (2004) A generalized likelihood ratio test to identify determine which genes function differently under different conditions differentially expressed genes from microarray data. Bioinformatics (i.e, normal lung cells versus diseased lung cells). 20: 100-104. One of the most commonly used methods to test whether the difference in means is statistically significant is the ANOVA. However, Wettenhall J, Smyth G (2004) limmagui: a graphical user interface for since genes can be correlated among themselves, the use of ANOVA linear mod- eling of microarray data. Bioinformatics 20: 3705-3706. can lead to invalid conclusions of the data. In this paper, we Good P (2000) Permutation Tests. Springer. investigated if the permutation test applied to ANOVA could improve the p-value of the classical F-test. Our findings show that we can perform F-test if the correlation among genes are small to moderate, but should be avoided if strong correlations are suspected. When correlations among genes are strong, permutation test is an alternative method. Overall, permutation test outperformed the F-test, hence we suggest to use permutation test in lieu of the F-test.

DIMENSIONS 71 Author: Andrew Halsaver Mathematica and Clifford ProjectiveAuthor: Spaces Andrew HalsaverMentor: Dr. Alfonso Agnew Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Abstract Author: AndrewAuthor: Halsaver Andrew Halsaver Author:Mathematica AndrewMathematica Halsaverand Clifford ProjectiveandAuthor: Clifford Andrew Spaces Projective Halsaver SpacesMentor: Dr.Mentor: Alfonso Agnew Dr. Alfonso Agnew Abstract Computing canonical equivalence classes (EC) in Clifford projective space Mathematica and Clifford ProjectiveMathematica Spaces and CliffordMentor: Projective Dr. Alfonso Spaces Agnew Mentor: Dr. AlfonsoAuthor: Agnew Andrew Halsaver MathematicaComputing andAbstract Clifford canonical ProjectiveAbstract(CPS) equivalence can Spaces be difficult,classes (EC) especiallyMentor: in Clifford in Dr. high projectiveAlfonso dimension. Agnew space We created our own package in Mathematica to perform such computations. By passing in certain parame- Abstract Abstract (CPS) can beComputing difficult, especially canonical in equivalence high dimension. classes We (EC) created inClifford our own projective package space Abstractin Mathematica to performtersComputing such to our computations. user-defined canonical Mathematica By equivalence passing inclasses function, certain (EC) EquivalenceClass,parame- in Clifford projective we can analyzespace Computing canonical equivalence classesComputing (EC) in canonical Clifford equivalenceprojective(CPS) can space classes be difficult,(CPS) (EC) can especially in be Clifford difficult, in projective high especially dimension. space in high We dimension. created our We own created package our own package ters to our user-defined Mathematicacanonical representatives function, EquivalenceClass, of EC in CPS. Thewe can examples analyze in this paper involve 2,0 (CPS) can be difficult, especially in high(CPS) dimension. can be difficult, We createdComputing especially ourin own Mathematica incanonical packagehigh dimension. equivalencein to Mathematica perform We classes created such to computations. (EC) perform our own in Clifford packagesuch computations. By projective passing in space certain By passing parame- in certain parame-C canonical representativesexplicity, of EC in CPS.but our The Mathematica examples in package this paper works involve with Clifford 2,0 algebras such as 3,0 in Mathematica to perform such computations.in Mathematica By passing(CPS) to perform can in certainbe suchters difficult, to computations. parame- our especially user-definedters to inBy our high Mathematica passing user-defined dimension. in certain function, Mathematica We parame- created EquivalenceClass, our function, own package EquivalenceClass, weC can analyze we can analyzeC explicity, but our Mathematicaand package1,1. Other works Clifford with algebras Clifford algebraswill be available such as in 3 our,0 package in the future. in Mathematicacanonical to perform representatives such computations. of EC in CPS. By The passing examples in certain in this parame- paper involve 2,0 ters to our user-defined Mathematicaters function, to our EquivalenceClass, user-defined Mathematica we can analyze function,canonical EquivalenceClass,C representatives we can of EC analyze in CPS. The examplesC in this paper involve 2,0 and 1,1. Other Clifford algebras will be available in our package in the future. C ters to our user-definedexplicity, but Mathematica our Mathematica function, package EquivalenceClass, works with Clifford we can algebras analyze such as 3,0 C canonical representatives of EC in CPS.canonical The examples representatives in this paperC of EC involve in CPS. The2,0 explicity, examples but in thisour Mathematica paper involve package 2,0 works with Clifford algebrasC such as 3,0 canonical representativesand 1,1.C Other of ECIntroduction Clifford in CPS. algebras The examples will be available in thisC paper in our involve package 2 in,0 the future. C explicity, but our Mathematica packageexplicity, works with but our Clifford Mathematica algebras such packageC as works 3,0 and with 1 Clifford,1. Other algebras Clifford such algebras as 3 will,0 be availableC in our package in the future. explicity,Introduction but our MathematicaC packageC works with CliffordC algebras such as 3,0 and 1,1. Other Clifford algebras willand be available 1,1. Other in our Clifford package algebras in the will future. be availableWhat in is our Clifford package Algebra? in the future. C C and . Other Clifford algebras will be available in our package in the future.C WhatC 1,1 isIntroduction Clifford Algebra?IntroductionClifford Algebra is an associative algebra equipped with a quadratic form. p,q Introduction Introduction Clifford Algebra is anFor associative example, consider algebra equippedthe quadratic with space a quadraticR . Itform. is an n dimensional real vector What is CliffordWhatAuthor: Algebra? is Clifford Andrew Algebra? Halsaver IntroductionFor example, consider thespace quadratic where spacen = pR+p,qq., Itand is hasan n adimensional non-degenerate real symmetric vector scalar product which What is Clifford Algebra?Mathematica andWhat Clifford is Clifford Projective Algebra? SpacesCliffordMentor: AlgebraClifford is Dr. an associative AlfonsoAlgebra is Agnew algebraan associative equipped algebra with equipped a quadratic with form. a quadratic form. space where n = p+q, andinduces has a non-degenerate the quadratic form: symmetricp,q scalar product which Clifford AlgebraWhat is is an Clifford associativeFor example, Algebra? algebra consider equipped the quadratic with a quadratic space R . form. It is an n dimensionalp,q real vector Clifford Algebra is an associative algebra equipped withinduces a quadratic the quadratic form. form:For example, consider the quadratic space R . It is an n dimensional real vector Abstract p,q Cliffordspace Algebra where is ann =p,q associativep+q, and algebra has a non-degenerate equipped2 with symmetric2 a quadratic2 scalar form. product2 which For example, consider the quadratic spaceFor example,R . It is consider an n dimensional the quadratic real space vectorRspace. It where is an nndimensional=xp+x q=, andx + real has... vector+ ax non-degeneratex ... symmetricx [1, scalar p. 205]. product which For example,induces consider the the quadratic quadratic form: space p,q· . It is1 an n dimensionalp − p+1 real− vector− p+q space where n = p+q, and hasComputing a non-degeneratespace canonical where symmetricn = equivalencep+q, scalar and has classes product a non-degeneratex (EC)x which= inx2 Cliffordinduces+ ... symmetric+ x theprojective2 quadraticx2R scalar space... product form:x2 which[1, p. 205]. space where n = p+·q, and1 hasFurthermore, a non-degeneratep − p+1 Clifford− symmetric− algebrap+q scalar is generally product non-commutative. which For example the Mathematicainduces the quadratic and form:(CPS) Clifford can beinduces difficult, Projective the especially quadratic in form: high Spaces dimension. We created2 our own package2 2 2 induces the quadratic form:x x = x1 + ... + xp xp2+1 ... 2xp+q2 [1, p. 205].2 in Mathematica to perform suchFurthermore, computations. Clifford By passing algebra· Clifford in is certain algebra,generallyx parame-x −= non-commutative. x21,0+, has−... +e−x1ep2 =xp+1e2 Fore1..., examplewherexp+q e1 the,e[1,2 p.is 205]. a canonical basis of 2 2 2 2 2 2 2 2 · − − − x x = x + ... + x x ... x x x =[1,x p.+ 205].... + x x ... thex algebra[1, [1, p. p.188-189]. 205].C − { } 1 ters top our user-definedp+1 Mathematicap+q · Clifford1 function,Furthermore, algebra,p − EquivalenceClass,p+12 2−,0, Clifford has− 2ep1+eq2 algebra=2 we cane2e is analyze1, generally where2 e non-commutative.1,e2 is a canonical For basis example of the Department of Mathematics,· California− − State− University, Fullertonx x = xC1 + ...Furthermore,+ xp xp+1− Clifford... xp+ algebraq { [1, is p.} generally 205]. non-commutative. For example the canonical representatives of EC inthe CPS. algebra The· [1, examples p.188-189]. in this paper− involve− − 2,0 Furthermore, Clifford algebra is generallyFurthermore, non-commutative. Clifford algebra ForClifford example is generally algebra, the non-commutative.CliffordWhat 2,0, is algebra,has a projectivee1e2 = ForC ,e space?2 has examplee1,e wheree = thee1e,ee2 , whereis a canonicale ,e basisis a canonical of basis of explicity, but our Mathematica package works with CliffordC algebras such as 2−,0 1 2 { 2 1}n 1 2 Furthermore,the Clifford algebra algebra [1, p.188-189]. is generallyA projective non-commutative.C space,3,0 denoted as− For example, is the{ theset of} lines through the origin of Clifford algebra, 2,0, has e1e2 = Clifforde2e1, where algebra,e1,e2 2,Whatis0, a has canonical ise1 ae2 projective= basise2e1, of space? wherethe algebrae1,e2 [1,is p.188-189]. a canonicalC basis of FP C and 1,1.− Other Clifford{ algebrasC} will be available− in our package{ in} then future.n+1 the algebra [1, p.188-189]. C the algebra [1, p.188-189].Clifford algebra,A projective 2,0, space, has e denoted1thee2 = vectore as2e1FP space,, where, isF thee1 set.,e A2 of simpleis lines a canonical through example the basis would origin of be of the real projective space, Andrew Halsaver WhatC is a projective1 space?− { } 2 the algebra [1, p.188-189].n+1 What. is In a projectivethis case, eachspace?n element of 0 is mapped to its respective equiva- the vector space,A projectiveF .RP A space, simple denoted example as would, is be the the set real ofRn lines projective through space, the origin of Advisor:What is Dr. aprojective Alfonso space?AgnewIntroductionWhat is a projective space?1 A projective2 space,FP denoted as ,−{ is the} set of lines1 through the origin of lencen+1 class (i.e. a line). Topologically,FP the set of EC, RP represents a circle since n WhatRP is. a In projectivethe this vector case, space?n each space, element. of AR simple0n example+1is mapped would to its be respective the real projective equiva- space, A projective space, denoted as , isA the projective set of lines space, through denoted the as origin, ofis theF set vector of lines space,−{ through} . the A simple origin1 of example2 would be the real projective space, FP 1 FP any two pointsn onF the2 same line in R belong to the same EC [2, p.2]. n+1 What is Clifford Algebra? n+1lenceA projective class (i.e.. space, In a this line). denoted case, Topologically,1 each as element, is the the of set set of of EC, lines0 RPis through mappedrepresents2 the to its origina circle respective of since equiva- the vector space, . AAuthor: simple examplethe Andrew vector would Halsaver space, be the real. A projective simpleRP example space, would. In beFP this the case,2 real each projectiveR element space, of 0 is mapped to its respective equiva- F Author: AndrewAuthor:F Halsaver Andrewn Halsaver+1 RP −{ } R 1 1 Clifford2 Algebra1 is an associativethe vectorany two algebra space,lence points class equipped on2 . the (i.e. A same simple a with line). line a example Topologically, quadratic in R belong would form. theto be the set the same of real EC, EC projective−{ [2,represents p.2].} space, a circle1 since MathematicaMathematica and Clifford Projective andRP Clifford. In Spacesthis Projective case, each elementMentor:Spaces of Dr.RRP AlfonsoMentor:.0 Inis this mapped Agnew Dr. case, Alfonso each to its element respective Agnew ofFR equiva-0 lenceis mapped class (i.e. to its a line). respective Topologically, equiva- RP the set of EC, represents a circle since MathematicaAbstract and Clifford Projective Spaces Mentor:1 Dr.Author: Alfonsop,qMethodology Andrew Agnew HalsaverMethodology2  2 RP For example,lence−{ consider class} the (i.e. quadratic aRP line).1 . In Topologically, space thisany case,R two. each It points the is−{ an element set}n on ofdimensional the EC, of sameR 1 represents line real0 is in vector mappedR belong a circle to to since its the respective2 same EC equiva- [2, p.2]. lence class (i.e. a line). Topologically, the set of EC, RP represents a circle since any twoRP points−{ on} the same1 line in R belong to the same EC [2, p.2]. ComputingMathematica canonical and equivalence Cliffordspace Projective where classes2n =(EC)p Spaces+ inq ,Clifford and haslence aprojectiveMethodology non-degenerateMentor: class (i.e. Dr. a line).Constructing Alfonso symmetric2 Topologically, Agnew scalarCPS in product theMathematica set of which EC, represents a circle since Abstract Abstract Abstractany two points on the same line in Ranybelong two points to the on same the EC same [2, line p.2]. in R belongConstructing to the same EC CPS [2, in p.2]. MathematicaRP space (CPS) can be difficult,induces especially the quadraticin high dimension. form:Author:any two AndrewWe created points Halsaver onIt the is discussed same line in in Agnew-Childress’R2 belong to the paper sameEC the [2,standard p.2]. approach to Computing canonical equivalenceAbstract classes (EC) in Clifford projective space ConstructingMethodology CPS in MathematicaMethodologyIt is discussed in Agnew-Childress’ paper the standard approach to con- MathematicaComputingour ownComputing and canonical package Clifford equivalence canonical in ProjectiveMathematica equivalence classes Spaces to (EC) perform classes in Clifford such (EC)Mentor: computations. in projective Clifford Dr. Alfonso projectivespace Agnewconstructing space projective −space overAuthor: a field. Andrew Further they Halsaver discuss how Methodology Methodology2 2 2It is discussed2 in Agnew-Childress’structing projectiven papern thespace standard over a field. approach Further to they con- discuss how to construct (CPS) can be difficult,(CPS) can especially be difficult, in high especially dimension. in high We dimension.created ourx Weownx created package=Mathematicax1 + our... + ownxp package andxp+1 CliffordConstructing... xp Projective+q CPS[1, p. inSpaces 205]. Mathematica Mentor:− Dr. Alfonso Agnew By(CPS) passing can beinComputing certain difficult, parameters especially canonical into equivalence high our user-defined dimension.· classes We (EC)Mathematica createdMethodology in− Clifford our own− projective package−to construct spaceConstructinga a projective projective CPS space space in where Mathematicawhere a a ring ring of ofk k k k matrices, , , isis used for scalars. They in Mathematicain to Mathematica performAbstract such to computations. perform such computations. By passing in certain By passing parame- inAuthor: certainstructing parame- Andrew projective HalsaverIt is discussedn space in over Agnew-Childress’ a field. Further paper they discuss the standard× how to approachconstructM to con- inConstructing Mathematica(CPS) can CPS to be perform difficult,in Mathematica such especially computations.Constructing in high dimension. By passingCPS in We inMathematica created certainAuthor: our parame- Andrew ownAuthor: package− Halsavernote AndrewIt that is discussed since Halsaveris in typically Agnew-Childress’ not a field,× paper we are the working standard with approach modules over to con- rings ters to our user-definedtersMathematica to our Mathematicafunction, user-defined and EquivalenceClass, function,Clifford Mathematica EquivalenceClass,Projective function, Furthermore,we can Spaces analyze EquivalenceClass, we Clifford can canonical analyzeMentor: algebraAbstract representatives weConstructing can is Dr.a generally analyze projective Alfonsostructing CPS non-commutative. spaceused Agnew in Mathematica projective for where scalars. a ringn They Forspace of k examplenote overkM thatmatrices, a thefield. since Further M ,is is typically used they for discuss not scalars. a field, how They to we construct are ComputingtersMathematica to ourItin is Mathematica user-defined canonical discussedMathematica and equivalence in Mathematica Clifford to Agnew-Childress’ perform and Projective classes such Cliffordfunction, (EC)computations.It paper Spaces EquivalenceClass,Projective is in discussed Clifford the standard By projective Spaces in passingMentor: Agnew-Childress’ weapproach can in space certainDr. analyzeMentor: to Alfonso con- parame- paperstructing Dr.instead Agnew the− Alfonso standard projective of vector× Agnew approach spacesn space [2, toover p.2]. con- a Their field. discussion Further they on construcing discuss how matrix to construct projec- canonical representativescanonical representativesofof ECEC in CPS. The The of ECexamples examples in CPS. Cliffordin in The this this examplespaper paper algebra, involve involve in this 2, 0 , paper has2 , 0 explicity,e1 involvee2 note=It isbute that2 discussed e 1a,since projectivewhereworkingM ineis Agnew-Childress’1,e typically space with2 is wheremodules a notcanonical a field, ring over paper basis of weringsk theare− of insteadk working standardmatrices, of with vector approach modules, is spaces used to over for con-[2, scalars. ringsp.2]. They canonicalstructing representatives projective n ofspace EC in over CPS. a field.structing The examplesFurther projective they inComputing this discussn paperspace how2 involve,0 canonical over to construct a field. 2, equivalence0 Furthera projective they classes discuss space (EC) where how inClifford to a construct ring ofprojectivekM k matrices, space , is used for scalars. They (CPS) can beters difficult, to our especially user-defined inthe high Mathematica algebra dimension. [1, p.188-189]. function,C WeC created EquivalenceClass, ourinstead own−C package ofAuthor: vector weC can{ spaces Andrew analyze} [2,tive p.2]. Halsaverspaces Their serves discussion as× the cornerstone on construcing for ourmatrix Mathematica projec-M package. explicity, but ourexplicity, MathematicaAbstract butour our Mathematica package MathematicaAbstract works package with package− Clifford works works algebras with with Clifford Clifford such asalgebras algebras 3,0 structingsuch − asas projective 3,0note thatTheirn since discussionspaceM overis typically on a field. construcing notFurther a field, matrix they we discuss areprojective working how× tospaces with construct modules serves as over rings in Mathematicaexplicity,a projectivecanonical but to our perform space Mathematicarepresentatives whereAbstract such a computations. ring package of of ECk inworksa projectivek CPS.matrices, By withThe passingC Clifford(CPS) spaceexamplesM in, is where cancertainalgebras used in be thisa for difficult, parame- ring such scalars.paper of ask especially involve They3,k0 matrices,note in2,0 high thatM dimension., since is usedM is for typically We scalars. created not They our a field, own wepackage are working with modules over rings and . Other Clifford algebras willMathematica be available and in our Clifford packageAuthor: Projective× in the Andrew future. Spaces HalsavertiveC spacesMentor: servesC× Dr.− as the Alfonso cornerstone Agnew for our Mathematica package. n+1 1,1 and 1,1. Otherand Clifford . Other Other algebras Clifford Clifford will algebrasalgebras be available willwill bebe in availableavailable our package in ourour ina package thepackage projective future. inin instead the space future.the where of cornerstone vector aC ring spaces of Thefork [2, our constructionk p.2]. matrices,Mathematica TheirM ofdiscussion, a is package. CPS used requires on for construcing scalars. that They we matrix take a projec-module, p,q , over the C Cters toComputing ournote user-defined1,explicity, that1 sincecanonicalComputing butM Mathematicais ourequivalence typically MathematicacanonicalWhat function,not is classes equivalence a field, projective packagenote EquivalenceClass, (EC) we that are works classes inspace? since working Cliffordin with MathematicaM (EC)is Clifford withprojective typicallywe in can Clifford modules algebras analyze to not space perform projective over a field, such rings aswesuch spaceinstead are 3 computations.,0 working of× vector with spaces modules By passing [2, over p.2]. in rings certain Their discussion parame- on construcing matrixC projec- Mathematica and Clifford ProjectiveC Spaces ComputingMentor: canonical Dr. Alfonso equivalence Agnewnote thatclasses sincen tive (EC) spacesis in typically Clifford servesC Clifford notprojective as the a field, cornerstone algebra. space we are Next, working for our we with Mathematica impose modules an equivalence over package.n+1 rings relation (denoted by ) on canonical(CPS)theinstead can future. representatives beand difficult, of vector . Other especially spacesof EC Clifford in [2, in CPS. p.2]. highA algebras projectiveThe Their dimension.instead examples will discussion space, be of We available in vector createdters thisdenoted on construcing topaper spaces in our our asour involve user-defined ownFP [2,The package p.2]., packagematrix is constructionM the Their in setMathematica projec- the ofdiscussion future. lines oftive a through CPS spaces function, on requires construcing serves the EquivalenceClass, originthat as the matrix ofwe cornerstone take projec- a module, we forcan our analyze p,q Mathematica, over the package. (CPS)Abstract1, can1 be(CPS) difficult, can especially be difficult, in high especially dimension. in high We dimension. created our2,0 We own created package ourn+1 own package C ∼ Introduction C n+1 instead of vectorC spaces [2, p.2]. Their0 discussion. The definition on construcing of is, matrix projec-n+1 Introductionexplicity,in MathematicaIntroductiontive but spaces our Mathematica to serves perform as the such package cornerstonethe computations. vector works space,tive for with our spaces By CliffordF Mathematica passing. servescanonical A algebras simple in as certainClifford the package. representatives such example cornerstoneas parame- algebra. would for of Next, be EC our the in Mathematica we realCPS.p,q impose projective The anexamples package. equivalence space, in this relation paper involve (denoted 2, by0 ) on Abstract in Mathematicain Mathematica to perform such to perform computations. such computations. By passing in By certain3The,The0 passing constructionconstruction parame- inC certain ofof−{ a parame-a CPS} requires requires that we we∼ take a module, module, ∼ p,q ,, over the n+1 Computing canonical1 equivalence classestive spaces(EC) n+12 in serves Clifford0C . The as the projective definition cornerstoneThe spaceof construction foris, our Mathematica of a CPS package. requires thatC we takeC a module, p,q , over the andters to 1,1 our. Other user-definedters Clifford to our user-definedMathematica algebrasRP will. Mathematica befunction, In available this case, EquivalenceClass, infunction, each our elementexplicity, package EquivalenceClass, of in we butp,qR the can our future.Clifford analyze Mathematica0n+1is we mapped canalgebra. analyze package toNext, its respective works we impose with equiva- Clifford ann+1 equivalence algebras such relation as (denoted3,0 by ) onC What is CliffordWhat Algebra? is CliffordIntroduction Algebra?TheIntroduction constructionters of to a our CPS user-defined requires thatThe Mathematica we construction take a module, function,C of a−{ CPS EquivalenceClass, } requires,Clifford over the that algebra.1Clifford weDefinition we take can∼ algebra.Next, a analyze module, we 1 impose: Next,u v weifan, and overimposeequivalence only the an if thereequivalence relation exists (denoted a relationλ ∗ (denotedsuch that byu =)v on λ . Computing canonicalCWhat equivalence is Clifford(CPS) classes Algebra? can (EC) be difficult, in Clifford especially projective in high space dimension. We createdp,qn+1 our own package p,q n+1 C ∼p,q canonical representatives of EC inlence CPS. class The (i.e. examples a line). in Topologically,and this paper 1,1The. Other involve construction theC Clifford set of2,0 EC, algebras0 ofRP. a The CPSrepresentsn will definition+1 requires be available a ofthat circle weis, insinceC∼ takeour package a module, in the future., over the ∈C ∼ Clifford AlgebraClifford is anWhat associative AlgebraClifford is Cliffordcanonical is algebra. algebra an associativeAlgebra? representatives equippedNext,canonical we algebra withimpose representatives of aequipped EC quadratic anClifford in equivalence CPS. with of algebra. form.The EC a examples quadraticin relation CPS. Next, The in (denoted form. we this examples impose paperp,q byby an involvein ) this equivalenceon on paper 2 , 0 involve relation 0 . The The 2 (denoted,definition0 definition by ofof ) is,is, on p,q (CPS) can be difficult, especiallyCliffordWhat in highAlgebra isin Clifford Mathematica dimension. is an Algebra? associative We to created perform algebra our such own equipped computations. packageC withDefinition a By quadratic2 passingC C 1−{ form. in: u certain} v Cifp,q parame- and only if there∼ exists a λ p,q∗ Csuch that u = v λ . explicity,n but+1 our Mathematicap,q packageany twop,q works points withn+1 on the Clifford same algebrasClifford line in R such algebra.belong as 3 to Next,,0 ∼ the sameweC impose EC−{ [2, an p.2].} equivalenceC relation∼∼ (denoted by ) on For example, considerFor example,Introduction the quadratic consider explicity, space the0 . quadraticR The but. definition Itexplicity, our is space anMathematican dimensional of but. ouris, It isMathematica package an realp,qn dimensional vector works0 . packageThe with definition real Clifford works vector algebras with of Cliffordis, such∼ algebras as 3,0 such as 3,0 ∈C in Mathematica to performCliffordFor example,p,q such Algebra computations.Clifford considerters is toan Algebra our theassociative user-defined quadratic By is passingR an algebra associative space Mathematica in certainp,qRequipped. algebra It parame- is function, anwith equippedn dimensionalan +1quadratic EquivalenceClass, with real aC quadratic vector we form.canC analyze ∼ and 1C,1. Other−{ Clifford} algebras will∼ be availableC −{ in our} package in the0 . future. TheDefinition∼ definition 1 : ofu is,v if andC only if there exists a λ p,q∗ such that u = v λ . space where n =spacep+q, where and hasn = ap non-degenerate+andq, and 1, has1. Other aand non-degenerate symmetric Clifford 1,1. Other scalar algebras symmetric Clifford product will be algebras which scalar availablep,qIntroduction product willp,q in be our availablewhich package in in our the package future.Definition in the future. 1 : u v if and only if there existsAuthor:Author: a λ Andrew Andrewp,q∗ such Halsaverthat Halsaveru = v λ . ters to our user-definedWhatform.spaceC Mathematicais Clifford whereForFor example, example,Algebra?nCcanonical= function,p+ considerq consider, and representatives EquivalenceClass,C has the the a non-degeneratequadratic quadratic of EC wespace space incan CPS.symmetric R analyze .. The ItC is is examplesan scalar an −{nn dimensional product} in thisDefinition which paper realvector involve 1. ∼∼ 2,0 ∼ ∈C ∈C induces the quadraticinduces form: the quadraticDefinition form: 1 : u v ifMethodology and only if thereDefinition exists a 1λ: u p,q∗v ifsuch andMathematica thatMathematica onlyu if= therev λ . exists and and CliffordC a Cliffordλ p,q∗ Projective Projectivesuch that u Spaces= Spacesv λ . Mentor:Mentor: Dr. Dr. Alfonso Alfonso Agnew Agnew canonical representativesCliffordinduces of ECspaceAlgebra the inCPS. quadratic whereexplicity, is The ann = examplesform:associative butp+ ourq, and Mathematica in algebra thishas a paper non-degenerate equipped packageinvolve withWhat works symmetric a is quadratic∼ with CliffordAuthor: Clifford scalar Algebra? form. Andrew algebras product suchwhich Halsaver as 3,0∈C dimensional real vector∼ space where n = p +q, and has2,∈C0 Definitiona 1 : u v if and only if there exists a λ p,q∗ such that u = v λ . IntroductionMathematicaIntroduction andIntroduction Clifford Projectivep,q Spaces C Mentor: Dr. Alfonso∼ AgnewC ∈C Author: Andrew Halsaver explicity, but ourFor Mathematica2 example,2 induces consider packageAuthor:22 and the worksthe quadratic Andrew1 quadratic2,12. with Other2 Constructing Clifford form:Halsaver spaceClifford algebrasR2 algebras. CPS It is suchin an will Mathematican asdimensional be 3 available,0Clifford real inAlgebra our vector package is an associative in the future. algebra equipped with a quadratic form. x x = x1 + ...non-degenerate+x xxp =xxp+1+ ...... symmetric+C x x2p+qx scalar[1,2... p. product 205].x2 which[1, p.2 induces 205]. the · − 1 x−x =−xp1 + ...p+1+ xpIt isxp+1 discussedp+q ... x inp+q Agnew-Childress’C [1, p. 205].Mathematica paper the and standard Clifford approach Projectivep,q to con- Spaces Mentor: Dr. Alfonso Agnew Mathematica and Cliffordand Projective 1,1. Other Spaces CliffordspaceWhat where is algebras· CliffordnMentor:What= willp Algebra?+ isq· be, Clifford andDr. available hasWhat− Alfonso Algebra? a non-degenerateis in− Clifford our Agnew−− package Algebra?− symmetricin− the future.For scalar example, product consider which the quadratic space R . It is an n dimensional real vector n n C quadratic form: 2 2 2 2 In Inthis this this definition, definition, definition, p,q∗ p,q∗ denotesdenotesdenotes the the the invertible invertible invertible elements elements Author: of ofof p,q p,q, Andrew and , , andand p,q p,qP HalsaverPisis the the inducesClifford the quadratic Algebra form: is anx associativex =structingx1 + ... algebra+ projectivexp equippedxp+1 n ...spacespace withxp over+ where aq quadratic a[1, field.n = p.p 205]. Further+ form.q, and they has a discuss non-degenerate how to construct symmetric scalar product which Furthermore, CliffordFurthermore, algebra Clifford is generally algebraCliffordIntroduction non-commutative. is Algebra generallyClifford is annon-commutative. Algebra associative For example is an algebra associative the For equipped example algebra with the equipped a quadratic withn form. a quadraticC C form. C C C C Furthermore,In this definition,Clifford algebra· denotes is generally thep,q− invertible non-commutative.−− elements− of ForMathematica example, and projective projective the and isis the the space Clifford space projective defined defined Projective space as as the the defined set Spacesset of of EC ECas [2, the [2, p.2]. set p.2].Mentor: of EC [2, Dr.p.2]. Alfonso Agnew Abstract For example, consider the quadraticp,q∗a projective space space. It is where an ninducesp,qdimensional a ring theof k quadratic realp,qp,qk matrices, vector form:p,qPM, is used for scalars. They CliffordIntroduction algebra,Clifford 2,0, has algebra,e1e2 = For2,0e,2example, hase1, wheree1e2 =For considereC1 example,e,e2e21, theis where a quadratic consider canonicalRe1,e2 the space basisis quadratic aR canonical of . It space is an basisCn dimensional of. It isC an n realdimensional vector real vector n C CliffordprojectiveFurthermore, algebra,− What2 space 2 is,0, Clifford Clifford{ hasdefined2e1}e algebra2 Algebra? as2= thee2 set ise1, generally of where2 EC [2,e non-commutative.p.2].1,e2 is a canonicalR× In For this basis example definition, of the p,q∗ denotesAuthor: the invertible Andrew elements Halsaver of p,q, and p,qP is the the algebra [1, p.188-189].space whereC xnspace=xp=+ wherexq1C,+ and...n has=+−px+p anote non-degenerateq,x andp+1 that− has since...{ a non-degeneratexMp symmetric+}qis typically{[1, p. scalar} 205].symmetric not a product field, scalar we which are product workingRecallRecall which with that thatC modules Agnew-Childress Agnew-Childress over rings discuss discuss the the use use of of matricesC matrices asC as scalars. scalars. As As such such Computing canonical equivalencethe algebra classesthe (EC)[1, algebra p.188-189]. in Clifford· [1, p.188-189]. projectivespace where− spacenMathematica−= p+−q, and has and a non-degenerate Clifford Projective symmetric2 Spaces scalar2 product2 Mentor: which2 Dr. Alfonso Agnew What is Clifford Algebra? Clifford algebra,Clifford Algebra2,0, has e is1e an2 = associativee2e1, where algebrae1,e equipped2 xis axprojective canonical= withx + a... quadraticspace basis+ x ofdefinedx form. as... thex set of EC[1, [2, p. p.2].205]. induces the quadraticinducesRecall the thatform: quadratic Agnew-Childressinstead form: of vector discuss spaces the use[2, p.2]. of matrices Their discussion as scalars.weRecall1 we can on can As that make construcing suchmakep Agnew-Childress use usep+1 of of matrix matrix matrix isomorphims projec-p isomorphims+ discussq the which use which of relate relatematrices to to Clifford Cliffordas scalars. algebra. algebra. Asn So, So, using using (CPS) can be difficult, especially in high dimension. We created ourinduces ownC package the quadratic− form: {p,q } · In this definition,− ∗ denotes− − the invertible elements of , and is the What isClifford a projectiveWhat Algebra space?Furthermore, is a is projective an associativethe Clifford space? algebraFor algebra algebraexample, [1, p.188-189]. equipped is considertive generally spaces with the non-commutative. serves aquadratic quadratic as the space form. cornerstoneR For. It example is for an ourn dimensional the Mathematican+1n+1 real package. vectorp,q n n p,q p,qP What iswe a can projectiven make use space? of matrix isomorphims which relate to Clifford algebra.suchRecall p,q p,q So, weto that usingtocan build build Agnew-ChildressmakeC p,q p,quseP P, weof, wematrix start start discuss by isomorphims by programming programming the use of which matrices these theseC relate matrix matrix as scalars.to isomorphisms.CliffordC isomorphisms. As such An An in Mathematica to perform such computations. By passing in2 certainp,q parame-2n 2 n 2 projective space defined as the set of EC [2, p.2]. ForA projective example, consider space,AClifford projective denoted the algebra,quadratic space,asnxFP+1x spacedenoted,space=2 is,0,x the has+ where set... ase.1+e ofIt2nx isn= lines=, an is2pxe+n throughthe2eqdimensional1,, and set where...2 of has the linesx2 ae origin2 1 non-degenerate real,e through2 vector[1, ofFurthermore,is2 ap. thecanonical2 205].2 origin symmetric Clifford basis of 2 of scalarC algebraC product is generallyC whichC non-commutative. For example the A projectiveWhat to is build a space,1 projectiveR x denotedxFP=p, space?x we1 +p asstart+1...xFP+x byx,=p is programmingx thexpp++1q set... + of...x lines thesexpx+ throughq matrix[1,... p. the isomorphisms.wex 205]. origin canexampleexample make[1, of p. use of An205]. of a of matrixa matrix matrix isomorphismn isomorphims+1 isomorphism would which wouldn be, relate be, to Clifford algebra. So, using ters to our user-defined Mathematica function,n+1 Furthermore, EquivalenceClass,np,q+1· C Clifford we canalgebrap,qP analyze−− is Thegenerally− constructionIn− thisnon-commutative.1{ definition, of} ap CPSp +1 requiresp,q∗ Fordenotes thatp the+qalgebra. we invertible take aSo, module, elements using of , top,q over build, and the p,q P weis the start by programming thespace vector where space,then =F vectorpthe+.q algebra,A and space, simple hasC [1, a examplep.188-189]. non-degenerate.induces A simple wouldn+1C the· example be symmetric quadratic the real would form: scalar projective· be− product then realspace,− whichClifford projective−− algebra,− space,− n2,+10, has e1e2 = en2e1, wherep,q e1,e2 is a canonical basis of the vectorF space,A projectiveF . A space, simple denoted example as would, is be the the setC real of lines projective through p,q Recall space, theto build origin that Agnew-Childress ofp,qP , weC startC by discuss programmingC the use these of matrices matrix as isomorphisms. scalars. As such An 1 1 example2  of a matrix2 isomorphismClifford algebra.projective wouldFP Next, be, space we impose defined an as equivalence theC set of EC relation [2, p.2].− (denoted by { ) on} canonical representativesRP induces. In ofEC this the in case, CPS. quadratic each. The InFurthermore, element this examples form:example case,1 of each R inthe Clifford this Clifford element0 paper algebrais mapped ofalgebra, involve is generally to 0 2 its, 0 is , respective mapped2has non-commutative.e 1e2 = to equiva--e its1e2 respective,the where algebra For{e equiva-1e example [1,2} is p.188-189]. aC thethese matrixC isomorphisms. An example of a matrix isomorphism RP RP . InFurthermore, this case, eachFurthermore, Clifford elementR n+1Cn+1 algebra of R Clifford2 is0 generally algebrais2 mapped2 non-commutative. is generally to its respective2 non-commutative.weexample For can equiva- examplemake of a use matrix Forthe of matrix example isomorphism isomorphims the 2,∼ 02 would,0 MatMat which be, (2, (2R, relate)R) [1, [1,to p.205] Clifford p.205], , algebra. So, using explicity, but our Mathematica package worksWhat with is a Cliffordprojectivethe vector−{ algebras space?} space, suchF asx−{1 x. }= Ax simple0+ .... The+ example1x definitionx would of... beis,x the real[1, projective p. 205]. space, C C  lence class (i.e.lence a line). classClifford Topologically,canonical (i.e. algebra, a line).Clifford basis1 the Topologically, set2of,0 algebra, ,the of has EC,algebrae1eRP2 the = p,q,[1,represents set has3 ,e0p.188-189].n2 ofe1e EC,,e−{ where= a} circle2 eprepresentsee1,e since,2p where+1is1 a a canonical circlee ,ep+ sinceq is basis a canonicaln+1would of be, basis of n lence class (i.e. a line).Clifford Topologically,C2,·0C algebra, −{2,0 1 the}2Mat RP2 set,0, (2Recall of has,2− EC,1)e1e that[1,RP2 =− p.205]represents Agnew-Childress∼e−21e, 1,2 where a circle p,qe1 discuss,e sinceto2 buildis thea canonical usep,qP of, wematrices basis start of by as programming scalars. As such these matrix isomorphisms. An A2 projectiveRP2 2. space,C In2 this denoted case,C2 each as FP− element, is the of setR−{ ofR lines0 }Whatis through mapped{ is a theprojective to} originits respectivespace? of equiva- and 1,1. Other Cliffordany two algebras points will onx the bex = availablethe samex algebra+ line... in+ in our[1,x p.188-189]. packagebelongx to in... thexsame future.C EC[1,[2, p.2 C 205].p.2]. −{ } − 1 C { wherewhere} Mat(2 Mat(2C , ,) isn) is short-hand short-hand for for a 2-dimensionala 2-dimensional matrix matrix having having real real entries. entries. C any twothe points vector1any two on space,lence thethe pointsRp same algebra classpn+1 on line. (i.e.the A [1,the in samesimple a p.188-189].R line). algebrap+belong lineq example Topologically, in [1, toR p.188-189]. thebelong wouldwe same can the tobe makeEC the theset [2,same ofuse real p.2]. EC,A of projective EC projective matrix [2,represents p.2]. isomorphims space,example space, a denoted circle of which a matrixsince as relateR R isomorphism, to is Clifford the 2,0 setMat algebra. of would lines (2, R be, through) So, [1, using p.205] the, origin of · − FFurthermore,− − CliffordDefinition algebra 1n:+1 isu generallyv if and non-commutative. onlyRPn if there exists For a λ example p,q∗ FPsuch the thatC u = v λ . 1 What iswhere a projective Mat(2, Rspace?) is short-hand2  for a p,q 2-dimensional2 to∼ build p,q matrixP , we having startn by real+1Other programmingOther entries. isomorphisms∈C isomorphisms these could matrix could be isomorphisms.be programmed programmed using An using helpful helpful properties, properties, such such as, as, Furthermore, CliffordRPWhat. algebra In is this a projectiveany case, isWhat generally two eachClifford is points space? a elementprojective non-commutative.What algebra, on the of is space?R samea projective 2,0, line0 hasis inC Fore mapped1 space?Re2 examplebelong= toethe2eto its1 the,C vector thewhere respective same space,e1 EC,e equiva-2 [2,Fis p.2]. a. canonical A simple basis example of would be the real projective space, Introduction Methodology −{n } example of1 a matrix1 isomorphismwhere would Mat(2 be,, R) is short-hand2 2,0 forMat a 2-dimensional (2, ) [1, p.205] matrix, having real entries. MethodologylenceAMethodology classA projective projective (i.e.Other a line).space, isomorphismsspace, Topologically, denoted denoted could as as C the be ,, setis programmed is the the of EC,set set of ofn lines− lines usingrepresents through through. helpful Inn this athe{ theproperties, circle case, origin} since eachwhere of such element as,Mat( of2, ) is short-hand0 is mapped for a to 2-dimensionalR its respective matrix equiva- having real Clifford algebra, 2,0, has e1e2 = Athee projective2e1 algebra, where [1, space,Ae projectivep.188-189].1,eFP2 denotedis a space,canonical as FP denotedRP, isbasisRP the as of setFP of, lines is the through set of lines the origin through ofR the originC  of C − n+1 {n+1 2 } Other isomorphisms could−{ } be p+1 programmedp+1,q+1,q+1 Mat1 Mat (2 using, (2, p,q helpfulp,q)) [1, [1, p.214] properties, p.214]. . such as, What is Clifford Algebra?Constructingthe algebra CPS [1, p.188-189].inany Mathematicathe twoorigin vector points of space,Methodology thethe on vector theF same. space, Athe line simple vector F in R example. A space, Abelong simple simple would to nexample+1 examplethe. Abe same simple thewouldlence would EC real example [2, classbe projective be p.2].the (i.e.the would real a line). space, projective beentries. Topologically, the real Other space, projective isomorphisms the setC space,C of EC, couldRP representsbe programmedC C a circle using since helpful Constructing1Constructing CPS in Mathematica CPSWhat in is Mathematica a projective2 space?F 2,0whereMat Mat(2 (2, R,)R) [1, is short-handp.205]2 , for a 2-dimensional matrix having real entries. . In this case,1 each element1 of p+1,q+1 0 Matis mapped (22 , p,q any to) its [1, two respective p.214]2 points. on equiva- the same line in belong to the same EC [2, p.2]. Clifford Algebra isIt an is associative discussed algebra inRP Agnew-Childress’real equipped projectiveRP with. Inspace, paper a this quadratic case, the . standardIn each In form.this thisR element case, case, approach each each of R element element to con-0 n isof of mapped  0 to C isis its mapped respectiveproperties, to its equiva- respective suchR as, equiva- It is discussedIt is in discussed Agnew-Childress’A in projectiveRP Agnew-Childress’C paper space,−{ the denoted} paper standard asC the1 approach standard, isR the set to approach con-of lines to through con- the origin of Mat (2, ) [1, p.214]. For example, consider theWhat quadratic is a projective spacelencep,q space?. class It is an (i.e.Constructingn dimensional a line). Topologically, CPS real in vector Mathematica the set of EC,−{FP}represents−{ a1 } circleOther sinceInIn isomorphisms a1 ageneral general case, case, could givenp+1 given,q be+1 the programmed the CPS CPS dimension dimensionp,q using helpful (n (),n), we properties, we build build a acolumn such column as, of ofk k k k structing projectivestructingnMethodologyRspace projective over alencen field.space class Furthern over (i.e.lence a athey field. line). class discuss Further Topologically, (i.e.n+1 how a line). theywhere to construct discuss Topologically, the Mat(2RP set how, of) EC, to is theshort-handconstructRP setrepresents of EC, for a a 2-dimensional circlerepresents since aC circlematrix since having realC entries. mappedstructing to projective itsIt therespective is discussed vectorn space space,equivalence in over Agnew-Childress’2 a field.. class A simple Further (i.e. a example paperline). theyR Topologically, discuss the would standard how be to the approach construct realRP projective to con- space, ×× space where n = p+q, and hasA projective a non-degenerate space,−any two denoted symmetric pointsInany a as− on general twoFP scalar the points,same is productcase, the− on line set given the ofinwhichsame linesRF thebelong throughCPS line in todimension the the2 belong same origin (n ECto), of the we [2,2 same p.2].build EC a column [2,matrices p.2].matrices of k whose whosek entries entries belong belong to to p,q p,q. Exclusively. Exclusively for for this this discussion, discussion, we we build build a projective spacea projective wheren+1 a ring space of wherek k amatrices, ring1 ofanykM two,k is pointsmatrices, used for on scalars.M the,R is same used They line2 forMethodology in scalars. R belong They to the same EC [2, p.2]. C C the vector space,ConstructingF thea. projective A set simple ofstructing CPS EC,× examplespace inRP Mathematica projective .represents where In would this a case, ringn be a spacecircle the of eachk real oversince elementkOther projectivematrices, a any field. of isomorphismstwoR FurtherM space,points, is0 used they onis mappedcouldthe for discuss scalars. beIn to programmed how a itsa general They 2(ato respectiven 2( construct+n + 1) case,× 1)using2 equiva- givenmatrix,2 helpful matrix, p+1 the,q and+1 properties, andCPS considerMat consider dimension (2 such, if ifp,qthe as, the) (n columns), [1, columns we p.214] build are. are linearly a linearly column independent, independent, of k k or or induces the quadraticnote form: that since is typically notmatrices a field, we whose are working entries× belongwith modules× to p,q over. Exclusively rings−{ } for this discussion,1 we build C  C 1 Mnote that sincenoteM thatis typicallysincelence2M is not class typically a field, (i.e.− nota we line). are a field, working Topologically,C we are with working modules the set with of over modules EC, rings overrepresents rings a circle×× since a a × RP . In this case, eachItsame element is discussed lineaa 2( projective ofinn R+ 1) in belong Agnew-Childress’02 space matrix,is to mapped the where same and a to consider ring EC itspaper [2, respective of p.2].k the if thek standardmatrices, columns equiva-Constructing approach areM, islinearly used CPSRP tomatrices for con-in independent,linearlyMathematica scalars.linearly whose dependent They dependent or entries belong(LI (LI or or LD) to LD) .p,q.. Exclusively for this discussion, we build instead of vectorinstead spacesMethodology of [2, vector p.2]. spaces TheirMethodology discussion [2,−{ p.2].} Their on construcing discussion1 onmatrix construcing× projec-2 matrix projec- C 2lence class2 (i.e.2 astructing line).instead Topologically,2 projectivenote of vector thatanyn the spacessince× twospace setMethodology pointsof over[2, EC,is p.2]. typically aon field. the Theirrepresents samea Further not discussion a line field, a they incircle weR on discuss are sincebelong construcing workingIt how is to discussed theto withp+1 matrixconstruct same,q+1 modulesIna 2( inaEC projec-nMat general+ Agnew-Childress’ [2, 1) over (2 p.2]., 2 case, ringsp,q matrix,) given [1, p.214] and paper the consider. CPS the dimension standard if the columns approach (n), we are build to linearly con- a column independent, of k ork x x =tivex1 + spaces... + x servesp xp as+1 the... cornerstonexp+q linearly[1, for p. our dependent 205]. MathematicaM (LIRP or package. LD) . C  C · −tive spaces− −tive serves spaces as the serves cornerstone2 − as the cornerstone for our Mathematica for our Mathematica package. package. Now,Now,× there there are are several severala canonical canonical forms forms which which a matrixa matrix can can take take in in order× order for for any two points onaConstructing the projective same line spaceinsteadConstructing CPS in R where in ofbelong Mathematica vector aConstructing CPS ring to spaces the ofink Mathematicasame [2,k p.2].CPS ECmatrices, [2, inTheir p.2]. MathematicaM discussion, isstructing used for on construcing scalars. projectivematriceslinearly Theyn matrixspace dependent whose projec- over entries a field. (LI orbelong Further LD) to. they p,q. discuss Exclusively how to for construct this discussion, we build × n+1 its−its columns columns to to be be LI LI or or LD. LD.C However, However, we we can can significantly significantly reduce reduce the the number number Furthermore, Clifford algebraThe construction is generally of non-commutative. aIt CPS is discussedrequiresNow, that in there For Agnew-Childress’ we example aretake several a module, the canonical paper In a, formsthe overgeneral standard whichthea case,projectiven a+1 approach matrix given space the cann+1 to CPS take where con-in dimension a order ring for of (kn),k wematrices, build a column, is used of k for scalars.k They Thenote construction thatThe sincetive constructionM of spacesItisMethodology a typically is CPS discussed serves requires of not a asIt CPS athe in isthat field, Agnew-Childress’ discussedrequires cornerstone we we take are thatp,q in working a for Agnew-Childress’ module, we our take paper with Mathematica a module, modulesp,q the, overstandard paper over thepackage. a rings the, approach 2( overn standard+ the 1) to2 con-approach matrix, and to consider con- M if the columnsDIMENSIONS are linearly independent,72 or C p,q ofNow,of cases cases there× by by areusing using× several matrix matrix canonical reductions.a reductions. forms Moreover, which Moreover,× a matrix matrix matrix can reduction reduction take in isorder is a aresult for result of of Clifford algebra, Clifford, has e algebra.e = e Next,e ,structing where we imposee projectiveits,e ancolumnsis equivalence an canonical tospace beover LIrelation basis or a LD. field. of (denoted However, Furthermatrices by they we) whose can ondiscussnoteC significantly entries that how since belongtoC construct reduceis to typically the p,q. number Exclusively not a field, for we this are working discussion, with we modules build over rings 2,0 Methodology1 2 Clifford2 1instead algebra.Clifford of vector1 Next,structing algebra.2 spaces we impose projectiveNext, [2,structing p.2]. we an Their imposen equivalence projectivespace discussion an over equivalence relationn a field. onspace construcing (denoted Further over relation a field. they by matrix(denoted discussFurther) onM projec-linearly by how theyn)+1 to discussdependenton construct how (LI to orconstruct LD) . b b C n+1 − { } TheConstructing construction− CPS of ain CPS Mathematica requires∼ that we take a module,its columnsapplying Cp,qapplying, over to elementary the beelementary LI or LD. matrices matrices However, from from welinear linear can algebra, significantly algebra, so so it it is reduce is a familiara familiar the number process process[3,[3, the algebra [1, p.188-189]. 0 . Then definition+1 a projective ofn+1 ofis, space cases where by using a ring matrix of−k reductions.k matrices,a 2(−n Moreover,+ 1), isinstead used2 matrix,matrix for of scalars. vector∼ reduction and consider spaces They∼ isa [2, if resultthe p.2]. columnsof Their discussion are linearly on independent, construcing matrix or projec- p,q p,q tive0 spaces. The serves definitiona projective as the of cornerstone spaceis, where for a our ring Mathematica of k Mk matrices, package., is used forC scalars. They C −{ } p,q ∼ Clifford0 . The algebra. definitiona projective Next, of we×is, space impose where an equivalencea ring× of k relationMk matrices, (denoteda MNow,, is by used thereb) for onare scalars. several Theycanonical forms which a matrix can take in order for ConstructingC CPS−{note in Mathematica}C that since−{applying} iselementary typicallyIt∼ is discussed not matrices a∼ field, in we fromAgnew-Childress’ arelinearly linear working× dependent algebra,tive with paper spaces modulesso× (LI it the isserves or a overLD) standard familiarof as rings. cases thep.172].p.172]. process approach cornerstone by using[3, to matrix for con- our reductions. Mathematica Moreover, package. matrix reduction is a result of noten+1M that sincenoteM is that typically since notis a typically field, we not are a working field,n we+1 with are working modules with over∼ modules rings over rings b What is a projective space? The construction p,q structing0 of. a The CPS projective definition requiresn that ofMspace weis, over take a a field. module, Further p,q they, overapplyingits discuss the columns how elementary toconstruct be LI matrices or LD. However, from linear we algebra, can significantly so it is a familiar reduce the process number[3, DefinitionIt is discussed1 : Definitionu vinstead inif and Agnew-Childress’ 1 of only: vectorp.172].u ifv thereif spaces and exists paper only [2, ap.2]. ifλ the there Their standard p,q∗ existssuch discussion a thatapproachλ u on =∗ v construcing tosuchλ . con- that u matrix= v λ . projec- n DefinitionCinstead−{ 1 : ofu } vectorinsteadv if spacesand of only vector [2, if− p.2]. there∼ spaces Their exists [2,p,q discussion p.2].a λ Their p,q∗ onCsuch discussion construcing that u on= matrixv construcing λ . InIn general, projec- general, matrix p,q p,qis projec-is non-commutative, non-commutative, so so wen we+1 have have to to distinguish distinguish between between right right A projective space, denoted as FP∼,Clifford is the set algebra. of lines∼ Next, througha projective we the impose origin space∈C an of where equivalence a ring∈CNow, of relationk therek matrices, (denotedThe are several construction by, canonical isp.172].of) used cases on of for a formsby CPS scalars. using whichrequires matrix TheyC aC matrixthat reductions. we can take take a Moreover, module, in order for matrixp,q , over reduction the is a result of structing projectivetiven spacesspace over serves a field. as the Further∼ cornerstone they discussfor our Mathematicahow to construct∈C package. M ∼ andand left left scalar scalar actions actions (i.e. (i.e. multiply multiply a matrix a matrixC by by a scaling a scaling matrix matrix with with said said scalar scalar n+1 n+1−  tiveDefinitionIn spaces general, serves 1tive: p,qu spaces asis thev non-commutative,if cornerstoneserves and only asits the if columns there for cornerstone so our× we exists toMathematica have be a forλ toLI distinguish ouror LD.∗ Mathematica package.such However, between that u wepackage.= rightv canλ . significantly reduce the number b the vector space, F .a projectiveA simple example space wherep,q would a0 ring be. The the of k realdefinitionnotek projectivematrices, that of since space,is,M,is is typically used for not scalars. a field,Clifford They we are algebra. workingp,q applyingNext, with modules we elementary impose over an rings matrices equivalence from relation linear algebra, (denoted so by it is a) familiaron process [3, 1 2 C −{ } C ∼∼M n+1 ∈Cn+1 onInon general,the the left, left, or or right,is right, non-commutative, of of the the matrix matrix itself) itself) so we [2, have[2, p.2]. p.2]. to The distinguish The∼ use use of of row between row reduction reduction right on on a a . In this case, each element of 0 isThe mapped constructionand to left its× scalar respective of a actions CPS equiva- requires (i.e. multiply thatof we a cases matrix take by a bymodule, using a scaling matrix0 . matrix The reductions., over definition with then said+1 Moreover, of scalaris,p,qn matrix+1 reduction is a result of RP note that sinceR is typically not a field,Theinstead construction we are of vector workingThe ofconstruction spaces a with CPS [2, modules requires p.2]. ofa Their over CPS that rings discussion requiresp,qwe take athatp,q on module, construcing wep.172]. take p,q a module,, matrix overC the projec- , over the −{M } 1 C −{C } andmatrix leftmatrix scalar is equivalentis∼ actions equivalentp,q (i.e. to to the multiply the action action a of matrix of using using leftby leftb a scalar scaling scalar actionaction matrix on on awith 2 a 22( saidn2(+1)n scalar+1) matrix. matrix. lence class (i.e. a line). Topologically, theClifford setDefinition of EC, algebra.on 1 the:representsu left,Next,v orifwe and right, a circle impose only of since the ifan there matrix equivalence existsapplying itself) a λ relation[2, elementary p.2]. p,q∗ such (denotedThe that usematrices ofu by= rowv from λ) reduction . C on linear onalgebra, aC so it is a familiar process [3, instead of vector spaces [2, p.2].RPClifford Theirtive∼ discussionalgebra. spacesClifford serves Next, on algebra. construcing as we the impose cornerstone Next, matrix an we equivalence∈Cimpose projec- for our an Mathematica relation equivalence (denoted package. relation by (denoted) on by ) on ×× 2 n+1  matrixn+1 is equivalent to the action of using left scalarDefinition action on 1 a: 2onu∼2( theSimilarly,nInv +1)Similarly,if left,general, and matrix. or∼ only for right, for columnp,q if column there ofis non-commutative, the reduction, exists matrix reduction, a λ itself) one one must∗[2, so mustsuch we p.2]. use have use that rightThe right tou use scalardistinguish= scalarv of λ . row action action reduction between on on a 2( an2( right on+1)n+1) a 2 2 any two points on the same line in R belong p,q to the0 same. The EC definition [2, p.2].n of+1 is, p.172]. C ∼ p,q tive spaces serves as the cornerstonep,q for our0 . TheMathematica p,q definition0 . The package. of definitionis, of is, ×∼ n+1 ∈C ×× C −{ }Similarly,C −{ forThe} column construction∼ reduction, of onea∼ CPS must requires use right that scalar we take action amatrixand module, onmatrix. leftamatrix. 2( is scalarn equivalent+1) p,q This Thisactions,2 over is istovery thevery (i.e.the convenient, action convenient,multiply of using a since matrix since left Mathematica Mathematicascalar by a scaling action can matrixon can executea 2execute with2(n row+1) said row andmatrix. scalar and column column C −{ } ∼ C × Clifford algebra. Next, we imposenIn+1 general, an equivalence p,q is non-commutative, relationonSimilarly, thereduction (denotedreduction left, for or so column× quickly.right,by we quickly. have) of reduction, on Furthermore,the to Furthermore, distinguish matrix one itself) must not between not only[2, use only p.2]. right canright can The we scalar we use use use action matrix of matrix row on reductionsreduction a reductions 2(n+1) on to to2a yield yield The constructionDefinition of a CPSmatrix. requires1Definition: u Thisv thatif is and1 very we: u onlytake convenient,v ifaif theremodule, and onlysince exists ifp,q Mathematica a thereλ, over exists p,q∗ theCsuch a canλ that execute ∗u =such rowv λ . that andu column= v λ . Methodology ∼n+1 Definition 1 : u andC v leftif and scalar∈C only actions if there (i.e. existsp,q multiply a λ a matrix p,q∗ such by∼ that a scalingu = v matrix λ . with said scalar × Clifford algebra. Next, we imposereduction an p,q quickly.equivalence0 Furthermore,. The∼ relation definition (denoted not∼ of onlyis, by can) we on use matrix∈C reductionsmatrixmatrix.equivalence∈Cequivalence is This equivalent to yield is very classes, classes, to convenient, the we action we can can sincealsoof also using determine Mathematica determine left scalar what what action can scaling executescaling on a matrix2 matrixrow2(n+1) and is is necessary matrix.column necessary to to C −{ } ∼ × Constructing CPS in Mathematica n+1 0 . The definition ofequivalenceis, classes, we can also determineon the left, what∼ or right, scaling of the matrix matrixreductionSimilarly, isperform necessary itself)perform forquickly. [2, the column the top.2]. reduction reductionFurthermore, The reduction, use process. process. of one row not must reduction only use can right we on use a scalar matrix action reductions on a 2(n to+1) yield2 C p,q −{ } ∼ × It is discussed in Agnew-Childress’ paper theperform standard theDefinition approach reduction to 1 process.: con-u v if andmatrix only is ifequivalent there exists to the a λ actionequivalencematrix. p,q∗ ofsuch using This that left classes, isu veryscalar= v weconvenient, λ action . can alsoon a since determine2 2( Mathematican+1) matrix.what scaling can execute matrix row is necessary and column to ∼ Similarly, for column reduction,∈C one must use right scalar action× on a 2(n+1) 2 structing projective n spaceDefinition over a field. 1 : u Furtherv if and they only discuss if there how exists to construct a λ p,q∗ such that u = v λ . performreduction the quickly. reduction Furthermore, process. not only can we use matrix reductions to yield − ∼ ∈C matrix. This is very convenient, since Mathematica can execute row and column× a projective space where a ring of k k matrices, M, is used for scalars. They equivalence classes, we can also determine what scaling matrix is necessary to × reduction quickly. Furthermore, not only can we use matrix reductions to yield note that since M is typically not a field, we are working with modules over rings performa.a. Recall the Recall reduction the the definition definition process. of of linear linear dependence. dependence. As As stated stated in in Goode-Annin’s Goode-Annin’s text, text, “A “A finite finite equivalence classes, we can also determine what scaling matrix is necessary to instead of vector spaces [2, p.2]. Their discussion on construcing matrix projec- nonemptynonempty set set of of vectors vectors v , v v ,, v ..., v, ..., v inin a vector a vector space spaceV Vis saidis said to to be be linearly linearly dependent dependent if if a. Recall the definition of linear dependence. As stated in Goode-Annin’s text, “A finite { 1{ 12 2 k}k} perform the reduction process. tive spaces serves as the cornerstone for our Mathematicanonempty package. set of vectors v , v , ..., v in a vector space V is said to be linearlya. Recalltherethere dependent exist the exist definition scalars scalars if c1,cc of12,c, linear ...,2, ..., ck, c dependence.notk, not all all zero, zero, such As such thatstated thatc1 v inc11 v+ Goode-Annin’s1c+2 v c22 v+2...++...c+k vck text, v =k 0.”= 0.” “A Furthermore, Furthermore, finite { 1 2 k} there exist scalarsnc+11,c2, ..., ck, not all zero, such that c1 v 1 +c2 v 2 +...+ck v knonempty= 0.”a seta Furthermore, set of set ofvectors of vectors vectors which which v 1 is, v notis2, not ..., v linearly linearlyk in dependent a vectordependent space is calledisV calledis linearly said linearly to independentbe independent linearly dependent [3, [3, p.269]. p.269]. if The construction of a CPS requires that we take a module, p,q , over the { } C a.there Recallb. existb. A Amatrix thescalars matrix definition thatc ,c that is, ..., foundisof c found linear, not by byall dependence.using zero,using a such single a single that As elementary elementaryc stated v +c in v row Goode-Annin’s+ row... operation+ operationc v = on 0.” on text, the Furthermore, the identity “A identity finite matrix matrix Clifford algebra. Next, we impose an equivalence relationa set of vectors (denoted which by is not) linearly on dependent is called linearly independent [3, p.269]. 1 2 k 1 1 2 2 k k ∼ nonemptya setis, ofis, byvectors by set definition, definition, of which vectors an is an not elementary v elementary, linearly v , ..., v matrix dependent matrixin a [3,vector [3, p.172]. is p.172]. called space linearlyV is said independent to be linearly [3, p.269]. dependent if n+1 0 . The definition of is, b. A matrix that is found by using a single elementary row operation on the identity matrix { 1 2 k} p,q a. Recall the definition of linear dependence. As stated in Goode-Annin’s text, “A finite C −{ } ∼ is, by definition, an elementary matrix [3, p.172]. thereb. A existmatrix scalars thatc is1,c found2, ..., byck, usingnot all a zero, single such elementary that c1 v 1 row+c2 operation v 2 +...+ck on v k the= 0.” identity Furthermore, matrix nonempty set of vectors v 1, v 2, ..., vis,a setk by ofin definition, vectors a vector which an space elementary isV notis saidlinearly matrix to be dependent [3, linearly p.172]. dependent is called linearly if independent [3, p.269]. Definition 1 : u v if and only if there exists a λ p,q∗ such that u = v λ . { } ∼ ∈C there exist scalars c1,c2, ..., ck, notb. all A zero, matrix such that that isc1 found v 1 +c2 by v 2 + using...+c ak v singlek = 0.” elementary Furthermore, row operation on the identity matrix a set of vectors which is not linearlyis, bydependent definition, is called an elementary linearly independent matrix [3, p.172]. [3, p.269]. b. A matrix that is found by using a single elementary row operation on the identity matrix is, by definition, an elementary matrix [3, p.172]. Author:Author: Andrew Andrew Halsaver Halsaver Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Mathematica and Clifford Projective Spaces Mentor:Author: Dr. Andrew Alfonso Halsaver Agnew Mathematica and Clifford Projective Spaces Mentor:Author: Dr. Andrew Alfonso Halsaver Agnew Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew ResultsResults Results ResultsWeWe can can take take advantage advantage of of matrix matrix representatives representatives of of Clifford Clifford elements elements in in order order toResults determine what scaling matrix is required to reduce a matrix appropriately. to determineWe can take what advantage scaling matrixof matrix is representatives required to reduce of Clifford a matrix elements appropriately. in order To do this, we look to the definition of elementary operations on matrices from toTo determine doWe this, can wetake what look advantage scaling to the matrix definitionof matrix is requiredrepresentatives of elementary to reduce operations of Clifford a matrix elementson appropriately. matrices in order from linear algebra [3, p.172]. We will work with right scalar action, but note that left Totolinear determine do this,algebra we what [3, look p.172]. scaling to the We definitionmatrix will work is requiredof with elementary right to reduce scalar operations action, a matrix on but matricesappropriately. note that from left scalar action is analogous and can be done by transposing matrices [2, p.2]. linearToscalar do algebrathis, action we is [3, look analogous p.172]. to the We and definition will can work be of donewith elementary right by transposing scalar operations action, matrices buton matrices note [2, p.2]. that from left scalarlinear algebra action is [3, analogous p.172]. We and will can work be done with byright transposing scalar action, matrices but note [2, p.2]. that left scalarConsiderConsider action is a a analogous matrix matrix with with and LD LD can columns, columns, be done byand and transposing let’s let’s consider consider matrices working working [2, p.2].with with the the scalar action is analogous and can be done by transposing2 matrices [2, p.2]. projectiveConsider line. a That matrix is let’s with suppose LD columns, we have andu let’s22, consider0 : working with the Author: Andrew Halsaver projectiveConsider line. a Thatmatrix is withlet’s suppose LD columns, we have andu∈C let’s2 consider,0 : working with the ∈C2 Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew projectiveConsider line. a That matrix is let’s with suppose LD columns, we have andu let’s2, consider0 : working with the 11 ∈C 2,0 projectiveuj = [(x line.k + λx Thatk+1)+( is let’sxk supposeλxk+1)e we1 +( haveλxku+ xk+12,0)e:2 +(λxk xk+1)e12] (1) uj = [(xk + λxk+1)+(xk λxk+1)e1 +(λxk +∈Cxk+1)e2 +(λxk xk+1)e12] (1) In a general case, given the CPS dimension (n), we build a column of where212 λ is a fixed real number,−− k = 1,3, and j = (k+1)/2. Also,−− uj = [(xk + λxk+1)+(xk λxk+1)e1 +(λxk + xk+1)e2 +(λxk xk+1)e12] (1) n uj = 21 [(xk + λxk+1)+(xk − λxk+1)e1 +(λxk + xk+1)e2 +(λxk − xk+1)e12] (1) In this definition,k × p,q∗k matricesdenotes thewhose invertible entries elements belong to of p,q ., Exclusively and p,qP foris the this whereu =assumeλ2 is[(x a fixedfor+ λx this real particular)+( number,x − λxcasek =1 that),e3,+( andx1 isλxj non-zero.=(+ xk + 1))e /As2.+( a Also, result,λx − assume xwe )e for] this (1) C C C wherej 2λ is ak fixedk real+1 number,k − k k=1+1 , 3,1 and jk =(kk++1 1)/2 2. Also,k − assumek+1 for12 this projective spacediscussion, defined as thewe setbuild of a EC [2, p.2]. matrix, and consider if the particularhave the case matrix that x representative1 is non-zero. As a result, we have the matrix representative 2(n + 1) × 2 whereparticularλ is acase fixed that realx1 number,is non-zero.k =1 As, 3, a andresult,j =( wek have+ 1)/ the2. matrixAlso, assume representative for this columns are linearly independent, or linearly dependent (LI or LD)a. whereparticularλ is case a fixed that realx number,is non-zero.k =1 As, 3, a result, and j =( wek have+ 1) the/2. matrix Also, assume representative for this Recall that Agnew-Childress discuss the use of matrices as scalars. As such particular case that x1 is non-zero. As a result, we have the matrix representative x1 λx1 we can make use of matrix isomorphims which relate to Clifford algebra. So, using particular case that x1 is non-zero. Asx1 a result,λx1 we have the matrix representative x2 λx2 n+1 n  xx12 λxλx12  p,q to build Now,p,qP , there we start are byseveral programming canonical these forms matrix which isomorphisms. a matrix can take An in  x λx  C C xx321 λxλx321 example of a matrix isomorphism would be,  x23 λx23  order for its columns to be LI or LD. However, we can significantly  x4 λx4   xx324 λxλx324 .  3 3 . reduce the number of cases by using matrix reductions. Moreover,  x3 λx3  2,0 Mat (2, ) [1, p.205],  x4 λx4 . Author: AndrewR Halsaver  4 4 . matrix reductionC  is a result of applying elementary matrices from UsingUsing column column reduction reduction on this on matrixthis x matrix4 yieldsλx4 yields Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Using column reduction on this matrix yields. where Mat(2, R)linear is short-hand algebra, for so ait 2-dimensionalis a familiar process matrixb [3, having p.172]. real entries. Using column reduction on this matrix yields Using column reduction on this matrix1010 yields Other isomorphisms could be programmed usingAuthor:Author: helpful Andrew Andrewproperties, Halsaver Halsaver such as, Author: Andrew Halsaver x2/x101 0 MathematicaMathematica and and Clifford Clifford Projective Projective Spaces Spaces n Mentor:Mentor: Dr. Dr. Alfonso Alfonso Agnew Agnew  x210/x1 0  In this definition, p,q∗ denotes the invertible elementsIn general, of p,q , is and non-commutative, p,qP is the so we have to distinguish   x32/x101 0 C p+1,qC+1 Mat (2C , p,q) [1, p.214]. xx23/x/x11 00 projective space defined as the set of EC [2, p.2].between right and left scalar actions (i.e. multiply a matrix by a scaling  x4/x1 0  C  C  xx324/x/x11 00 . In this definition, denotes the invertible elements of , and n n is the  x3/x1 0  . In this definition, p,q∗ p,q∗ denotes the invertible elements of p,qp,q, and p,qp,qPP is the   Recall that Agnew-Childress discuss the usematrixCC of matriceswith said as scalar scalars. on the As suchleft,CC or right,CC of the matrix itself) [2,  x43/x1 0  In aprojective generalprojective space case, space defined defined given as as the the the set CPS set of of EC EC dimension [2, [2, p.2]. p.2]. (n), we build a column of k k Expressing this resulting matrix as its4 1 Clifford. algebra representative, we find we can make use of matrix isomorphims whichp.2]. relate The to Clifforduse of row algebra. reduction So, using on a matrix is equivalent to the× action ExpressingExpressing this this resulting resulting matrix matrix asx 4as its/x its1 Clifford Clifford0  algebra algebra representative, representative, we find matricesRecall whose that entries Agnew-Childress belong to discuss the. Exclusively use of matrices for as scalars.this discussion, As such we build  . n+1 n Recall that Agnew-Childress discussp,q the use of matrices as scalars. As such theExpressingthe EC EC to to be be this resulting matrix as its Clifford algebra representative, we find p,q to build p,qP , we start bywewe programming can can make make use useof of of using matrix these matrix isomorphimsleft matrix isomorphims scalar isomorphisms.C whichaction which relate relate on to a to CliffordClifford An algebra. algebra. matrix. So, So, using using Similarly, for Expressingwe find this the resultingEC to be matrix1 as itsx Clifford2 algebra representative, we find a 2(nwe+ can 1) make2 use matrix, of matrix and isomorphims consider which if the relate columns to Clifford2×2( aren algebra.+1) linearly So, using independent, or 1 x2 C C n+1n+1 n n theExpressing EC to be this resulting matrix(1 + e as1)+ its Clifford(e2 e algebra12) representative, we find example of a matrix isomorphism would p,q p,q toto× build be, build p,q p,qPP,, we we start start bya by programming programming these these matrix matrix isomorphisms. isomorphisms. An An the EC to be (1 + e1)+ (e2 e12) linearlyCC dependentCcolumnC (LI or reduction, LD) . one must use right scalar action on a 2(n+1)×2 21 2xx21 − exampleCexample of of a a matrixC matrix isomorphism isomorphism would would be, be, the EC to be  12(1 + e )+ 2xx21(e −e )   (1 + e1)+ (e2 e12)  matrix. This is very convenient, since Mathematica can execute row 21 2xx21 − 2,0 MatNow, (2, thereR) [1,are p.205] several, canonical forms which a matrix can take in order for  xx23(1 + e1)+2xxx14(e2 − e12)  2 ,,02,0 MatMat (2 (2, , R)) [1, [1, p.205] p.205], ,  3 (1 + e )+ 4 (e e )  C  and columnC reductionR quickly. Furthermore, not only can we use  2 (1 + e1 )+2x1 (e2− e12 )  its columns to be LI orC LD. However, we can significantly reduce the number  2xx1 1 2xx1 2− 12   2xx31 2xx41 − . wherewhere Mat(2 Mat(2, , )) is is short-hand short-hand for for a a 2-dimensional 2-dimensional matrix matrix having having real real entries. entries.  (1 + e1)+ (e2 e12) . where Mat(2, R) is short-handof for caseswhere a 2-dimensional by Mat(2 using,RRmatrix) is matrix short-hand matrix reductions reductions. for having a 2-dimensional to real yield Moreover, entries. equivalence matrix matrix having classes, real reduction entries. we iscan a also result determine of  x3 (1 + e1)+ x4 (e2 e12)   2x1 2x1 − . b Now, consider Λ∗ such that 2x1 (1 + e1)+2x1 (e2 − e12) . applyingOtherOther elementaryisomorphisms isomorphismswhat could matrices couldscaling be be programmed programmed matrix from linear is usingnecessary using algebra, helpful helpful properties, to properties, so perform it is such a such familiar the as, as, reduction process process.[3, Now, consider Λ∗ such that2x 2x  Other isomorphisms could be programmed using helpful properties, such as,  1 1 − . p.172]. Author:Now, Andrew consider Halsaver Λ∗ such∗ that  p +1p+1,q,q+1,q+1 MatMat (2 (2, , p,q p,q)) [1, [1, p.214] p.214]. . Now, consider Λ such that 1/x λ CC  CC 1/x1 λ p+1,q+1 Mat (2, p,qMathematica) [1, p.214]. and Clifford Projective Spaces Mentor: Dr.Now, Alfonso consider Agnew Λ∗ such thatΛ∗ = 1 − x1 =0. C  In general,C is non-commutative, so we have to distinguish between right Λ∗ = 01− x1 =0. In a general case,Resultsp,q given the CPS dimension (n), we build a column of k k 1/x011 λ ,  In a general case,C given the CPS dimension (n), we build a column of k k Λ∗ =  1 − , x1 =0. and left scalar actionsWe can (i.e. take multiply advantage a matrix of matrix by a scaling representatives matrix× with×Author: of Clifford said scalar Andrew Halsaver Λ∗ = 1/x011 −λ x1 =0. matricesmatrices whose whose entries entries belong belong to to p,q p,q.. Exclusively Exclusively for for this this discussion, discussion, we we build build 01  In a general case, given the CPS dimensionMathematica (n), we and build CliffordCC a column Projective of k Spacesk Mentor: Dr. Alfonso Agnew Λ∗ =  − , x1 =0 . on theaa 2( 2(left,nn++ 1) or1) right,22 matrix, matrix, of and the and consider matrixconsiderC if if itself) the the columns columns [2, p.2]. are are linearly linearly The useindependent, independent, of row orreduction or on a  01 × elementsResults in order to determine× what scaling matrix is required to ,  matrices whose entries belong to p,q. Exclusively× for thisa a discussion, we build This matrix has a Clifford representative that is invertible. The Clifford matrixlinearlyClinearly is equivalent dependent dependent (LI to (LIthe or or LD) LD) action. . of using left scalar action on a 2 2(n+1) matrix. This matrix has a Clifford representative that is invertible. The Clifford a 2(n + 1) 2 matrix, and consider if the columnsreduce are a matrix linearly appropriately. independent, To or do this, we look× to the definition ofrepresentative is, Similarly,Now,Now, for there columnthere are are several severalreduction,We canonical canonical can one take forms forms must advantage which which use a a matrixright matrix of scalarmatrixcan can take take action inrepresentatives in order order on for for a 2(n+1) of Clifford2 elementsrepresentativeThis in order matrix is, has a Clifford representative that is invertible. The Clifford linearly dependent× (LI or LD)a. Resultselementary operations on matrices from linear algebra [3, p.172].× We This matrix has a Clifford representative that is invertible. The Clifford matrix.itsits columns columnsThis is to to very be be LI LI convenient,to or or determine LD. LD. However, However, since what we we Mathematica can can scaling significantly significantly matrixcan reduce reduce is execute the required the number number row to and reduce column a matrixrepresentative appropriately.This matrix is, has a Clifford representative that is invertible. The Clifford ofof cases cases by by using usingwill matrixWe matrix work can reductions. reductions.with take advantageright Moreover, Moreover, scalar of matrix matrixaction, matrix reductionrepresentatives reduction but note is is a athat result result of left Clifford of of scalar elements action in is order representative is,11 11 11 reduction quickly. Furthermore,To do this,we not look only to can the we definition use matrix of elementary reductionsb b tooperations yield onrepresentative matrices from is,v = +1 + 1 e1 λ (e2 + e12) (2) Now, there are several canonicalapplyingapplying forms elementary elementary which matrices a matrices matrix from from can linear linear take algebra, algebra, in order so so it it isfor is a a familiar familiar process processb [3,[3, v = +1 + 1 e1 λ (e2 + e12) (2) to determine what scaling matrix is required to reduce a matrix appropriately. 21 x11 x11 − − , its columns to be LI or LD.equivalence However,p.172].p.172]. we classes, cananalogous significantly welinear can algebraand also reducecan determine [3, be p.172]. done the number what by We transposing will scaling work matrix with matrices right is necessary scalar[2, p.2]. action, to but note that left v = 12 x11 +1 +  x11 −1 e −λ (e + e ) , (2) To do this, we look to the definition of elementary operations on matrices from v = 1  1 +1+  1 1 e1 λ (e2 + e12)  (2) scalar action is analogous and can be done by transposing matrices [2, p.2]. 2 x1 x1 − − (2) of cases by using matrix reductions.perform theIn Moreover, general, reduction matrixis process. non-commutative, reduction sois we a haveresult to distinguish of between right v = 2 x1 +1 + x1 − 1 e1 − λ (e2 + e12) , (2) In general,linear p,qp,qis algebra non-commutative, [3, p.172]. so We we will have work to distinguish with right between scalar right action, but note that left     Author: Andrew Halsaver CC b 2 x1 x1 − − , applying elementary matrices fromandand linear left left scalar scalar algebra,scalar actionsConsider actions action (i.e.so (i.e. itConsider multiplya multiply is ismatrix aanalogous familiar a a matrix matrixwitha matrix and byprocessLD by a cana scalingcolumns, scaling with be[3, matrix done matrixLD and bycolumns, with with transposinglet’s said said consider scalar scalar and matrices let’s working consider [2, p.2]. withMathematica working with andthe Clifford Projective  Spaces Mentor: Dr. Alfonso Agnew on the left, or right, of the matrix itself) [2, p.2]. The use of row reduction on a Author: Andrew Halsaver p.172]. on the left, or right, of the matrix itself) [2, p.2]. The use of row reduction on a 2 matrixmatrix is is equivalent equivalenttheConsider toprojectiveprojective to the the action action a ofmatrix ofline. using using That left That withleft scalar scalar is LD is let’s action let’s action columns, suppose onsuppose on a a2 2 and2(2( nwen+1) we+1) let’s havematrix. have matrix. consider u working2,0 : withMathematica the and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew ×× ∈C Similarly,Similarly, for for columnprojective column reduction, reduction, line. oneThat one must must is let’suse use right right suppose scalar scalarweaction action have× on on au a 2( 2(nn+1)+1)2 : 22 and has inverse, In general, p,q is non-commutative, so we have to distinguish between right 2,0×× C matrix.matrix. This This is is very very convenient, convenient,1 since since Mathematica Mathematica can can execute execute row row and∈C and column column× and has inverse, and left scalar actions (i.e. multiplya. Recall a matrix the definition by a scaling ofu linear= matrix[( dependence.x + withλx said)+( As statedscalarx inλx Goode-Annin’s)e +(λx text,+ x “A)e finite+(λx x )e ] (1) reductionreduction quickly. quickly. Furthermore, Furthermore,1 j not notk only only can cank+1 we we use use matrix matrixk reductions reductionsk+1 1 to to yield yieldk k+1 2 k k+1 12 nonempty set of vectorsu = v [(1,x v 2,+ ..., v2 λxk in)+( a vectorx λx space)Ve−is+( saidλx to+ bex linearly)e +( dependentλx x if)e ] (1)−and has inverse, on the left, or right, of the matrixequivalenceequivalence itself) [2, classes, classes, p.2].j we The{ we can can usek also also of determine determinek} row+1 reduction whatk what scaling scalingk+1 on matrix a1 matrix is isk necessary necessaryk+1 to2 to k k+1 12 (1) 1 v x1 1 1 there exist scalars c ,c , ...,2 c , not all zero, such that− c v +c v +...+c v = 0.” Furthermore,− v− = = +1 + 1 e + λ (e + e ) provided x =0 matrix is equivalent to the action ofperformperform using the left the reduction reduction scalar1 2 action process. process.k on a 2 2(n+1) matrix.1 1 2 2 k k 1 2 12 1 . where λ is a fixed real number, k =1, 3, and j =(k + 1)/2. Also, assume forv thisv 8 x1 − x1 ,  a set of vectors whichwhere is notλ is linearly a fixeddependent real× number, is calledk =1 linearly, 3, and independentj =(k + 1)/ [3,2. Also,p.269]. assume for this 1 v x1  1   1  Similarly, for column reduction, one must use rightparticular scalar action case on that a 2(xn1 +1)is non-zero.2 As a result, we have the matrix representativev− = = +1 + 1 e + λ (e + e ) provided x =0 b. A matrix that is found by using a single elementary× row operation on the identity matrix 1 2 12 1 . matrix. This is very convenient, since Mathematicaparticular can case execute that x1 rowis non-zero. and column As a result, we have the matrix representative Lookingvv back8 atx equation1 (1), u− vx1(using right-scalar action, for k = 1) the is, by definition, an elementary matrix [3, p.172]. Looking back at equation (1), u ·· v (using right-scalar action for k = 1) reduction quickly. Furthermore, nota. Recallonly the can definitiona. we Recall use ofthe linear matrixdefinition dependence. reductionsof linear Asdependence. stated to in yield Goode-Annin’sAs stated in Goode-Annin’sx1 text,λx “A1 finite text, “A finite nonempty set of first component of the EC representative reduces to, a. Recall the definition of linear dependence. As stated in Goode-Annin’sx1 λx1 text, “A finite theLooking first backcomponent at equation of the (1), EC urepresentativev (using right-scalar reduces actionto, for k = 1) the equivalence classes, we can also determinenonemptynonempty set set of whatof vectorsvectors scaling v 1 v , 1 v , 2 v , 2 ..., v , ..., v matrixk k in inin aa avector vector vector isspace necessaryspace spaceV VV isisis said said said to to to be be be linearly linearly linearlyx dependent2 dependent dependentλx2 if if there if exist scalars { {1 2 k} } x2 λx2 1 · x there exist scalars c ,c , ..., c , not all zero, such that c v +c v +...+c v = 0.” Furthermore, first component of the EC representative2 reduces to, there exist scalars c 1 ,c 1 2 , 2 ..., c k , k not all zero, suchsuchthat that c 1 v 1 1 + 1 c 2  v 2 2 + 2 ... + c k v k k k =  0.” x “ Furthermore, Furthermore,λx a set of vectors which is not (1 + e )+ (e e ). perform the reduction process. x3 λx3 3 3 1 2 12 aa set set of of vectors vectors which whichlinearly is is not notdependent linearly linearly dependent is dependent called linearly is is called called independent linearly linearly independent independent [3, p.269]. [3, [3, p.269]. p.269]. 2 2x1 −  x λxx4 λx4  b.b. A A matrix matrix that that is is found found by by using using a a single single elementary elementary row row operation operation4 on on4 the the identity. identity matrix matrix. 1 x2 b. A matrix that is found by using a single elementary row operation on the identity matrix is, by defini- (1 + e )+ (e e ). is,is, by by definition, definition, an an elementary elementary matrix matrix [3, [3, p.172]. p.172].    For k = 3, the second component of1 the EC representative2 12 reduces to, tion, an elementary matrix [3, p.172]. 2 2x1 − UsingUsing column column reduction reduction on this matrix on this yields matrix yields x x 3 (1 + e )+ 4 (e e ) a. Recall the definition of linear dependence. As stated in Goode-Annin’s text, “A finite For k = 3, the second component2x of1 the2 ECx representative2 12 . reduces to, 1010 1 1 − nonempty set of vectors v 1, v 2, ..., vk in a vector space V is said to be linearly dependent if DIMENSIONS { } x2/x1 0 x3 x4 73  x2/x1 0 As required, we have the equivalence(1 + e1)+ class (e2 e12). there exist scalars c1,c2, ..., ck, not all zero, such that c1 v 1 +c2 v 2 +...+ck v k = 0.” Furthermore,x /x 0  3 1 x /x 0 2x1 2x1 − a set of vectors which is not linearly dependent is called linearly independent [3, p.269]. 3 1  x4/x1 0 . 2 1   x4/x1 0  u  : u = [(x + λx )+(x λx )e +(λx + x )e +(λx x )e ] b. A matrix that is found by using a single elementary row operation on the identity matrix   . As∈C required,2,0 j we have2 k the equivalencek+1 k − classk+1 1 k k+1 2 k − k+1 12  −→ is, by definition, an elementary matrix [3, p.172]. Expressing this resulting matrix as its Clifford algebra representative, we find the ECExpressing to be this resulting matrix as its Clifford algebra representative,2 we find 1 1 x u 2,0 : uj = [(xk + λxk+1)+(xk λxk+1)e1 +(λxk + xk+1)e2 +(λxk xk+1)e12] the EC to be (1 + e )+ 2 (e e ) ∈C 1 2 x2 − −  −→ 1 2 12 (1 + e1)+ (e2 e12) 2 1 2x1 − x2  (1 + e1)+ (e2 e12) 2 2x1 − k +1 2 2x1 −   where k =1, 3 and j = . x3 x4   1x3 xx4 2  (1 + e1)+ (e2 e12)   (1 + e )+ 2 (e e )   2x1 x3 2x1 − x4 . (1 + e1)+1 (e22 e1212)    22x1 2x2x1 −    (1 + e1)+ (e2 e12)   1 − , k +1  2x 2x −    where k =1, 3 and j = . Now, consider Λ∗ such that  1 1 . 2   x3 x4 We are inclined(1 + e1)+ to think(e that2 e equation12)  (2) can be applied to other canon- Now, consider Λ∗ such that 1/x1 λ  2x1 2x1 −n+1 , Λ∗ = x =0. ical matrix representatives of 2,0 . We do in fact find that this is true. We 01− 1 C ,  acknowledge this with the following theorem.  1/x1 λ Λ∗ = − x1 =0. We are inclined to think thatn+1 equation (2) can be applied2(n+1) n to other canon- 01  Theorem 1: Consider u 2,0 : such that u M R × where M has  , ical matrix representatives∈C of n+1. We do in fact∈ find that this is true. We This matrix has a Clifford representative that is invertible. The CliffordL.D. columns. A priori, we haveC 2,0 representative is, acknowledge this with the following theorem. 1 This matrix has a Clifford representative that is invertible. The Clifford n+1 2(n+1) n Theoremuj = [( 1x:k Consider+ λxk+1)+(u xk 2λx,0 k:+1 such)e1 +( thatλxku+ xkM+1)e2 +(R λxk × xwherek+1)e12]M, has 1 1 1 2 ∈C−  ∈ − representativev = is, +1 + 1 e1 λ (e2 + e12) (2)L.D. columns. A priori, we have 2 x x − −  1   1  , k +1 1 1 1 where j1 = , λ is a fixed real number, and k =1, 3, 5, ..., n+1. The canonical v = +1 + 1 e1 λ (e2 + e12) uj = [(x(2)k 2+ λxk+1)+(xk λxk+1)e1 +(λxk + xk+1)e2 +(λxk xk+1)e12] , 2 x1 x1 − − equivalence2 class on u is determined− by v 2∗,0 : −     , ∈C 1 1k +1 1 wherev = j = +1, λ+is a fixed1 reale1 number,λ (e2 + e and12) k, =1whenever, 3, 5, ...,xi n=0+1., and Thei =1 canonical, 2, 3, ..., n+1 2 xi 2 xi − −  equivalence  class on u is determined by v ∗ : ∈C 2,0 Proof: 1 1 n+1 1 v = Let u +12,0+. The matrix1 e representativeλ (e + e ) of ,u wheneveris a 2(n +x 1) =02, matrixand i with=1, 2, 3, ..., n+1 2 x ∈C x − 1 − 2 12 i × linearly  dependenti  columns i written as,  Proof: n+1 Let u 2,0 . The matrix representative of u is a 2(n + 1) 2 matrix with linearly dependent∈C columns written as, × Author: Andrew Halsaver Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew

and has inverse,

1 v x1 1 1 v− = = +1 + 1 e + λ (e + e ) provided x =0 vv 8 x − x 1 2 12 1  .  1   1  , Looking back at equation (1), u v (using right-scalar action for k = 1) the first component of the EC representative· reduces to,

1 x2 (1 + e1)+ (e2 e12). 2 2x1 − For k = 3, the second component of the EC representative reduces to,

x3 x4 (1 + e1)+ (e2 e12). 2x1 2x1 − As required, we have the equivalence class Author: Andrew1 Halsaver u 2 : u = [(x + λx )+(x λx )e +(λx + x )e +(λx x )e ] Mathematica and Clifford Projective Spaces Mentor:∈C Dr.2,0 AlfonsoAuthor:j 2 Agnew Andrewk k+1 Halsaverk − k+1 1 k k+1 2 k − k+1 12  −→ Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Author: Andrew Halsaver and has inverse, Author: Andrew Halsaver 1 x2 Mathematica and CliffordMathematica Projective Spaces and CliffordMentor: Projective Dr. Spaces Alfonso AgnewMentor: Dr. Alfonso(1 + Agnewe1)+ (e2 e12) Author:2 Andrew2x1 Halsaver− k +1 v andx has1 inverse, 1   where k =1, 3 and j = . 1Mathematica1 and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew v− = = +1 + 1 e1 + λ (e2 + e12) provided xx13 =0. x4 2 vv 8 x1 v x −1 x1 1 ,   (1 + e1)+ (e2 e12)  and has inverse, and has inverse, 1  1   Author:Author: Andrew Andrew Halsaver Halsaver 2x 2x  v− = = +1 + 1 e1 + λ (e2 +e12)1 provided x1 1=0.− , MathematicaMathematica and and Clifford Clifford Projective Projectivevv Spaces Spaces8 x1 Mentor:Mentor:− x1 Dr. Dr. Alfonso Alfonso Agnew Agnew,   Lookingand back has at inverse, equation (1), u v (using  right-scalar action for k = 1) the 1 v x1 1 1 1v x1 1 · 1 v− = = +1 +firstv−1 component= =e + ofλ ( thee + EC+1e representative) +provided1 x reducese=01 + λ (e to,2 + e12) providedWe arex1 inclined=0. to think that equation (2) can be applied to other canon- vv 1 8 Looking2x 12 back at− equationx 1 (1),. u v (using right-scalar action for k = 1) the vv 8 x1 − x1 1 , 1  , n+1     1  v x1 1  1 · Author: icalAuthor: matrix Andrew representatives Andrew Halsaver Halsaver of 2,0 . We do in fact find that this is true. We andand has has inverse, inverse, vfirst− = component= 1 of the+1 ECx+ representative2 1 e1 reduces+ λ (e2 + to,e12) providedAuthor:x1 =0 Andrew. C Halsaver MathematicaMathematica and and Cliffordvv Clifford Projective8 (1 Projective+xe1)+ Spaces( Spacese2 − ex12).Mentor:Mentor:acknowledge Dr. Alfonso Dr. this Alfonso AgnewwithProof: the Agnew following theorem. Author: Andrew Halsaver Looking back at equation (1),Lookingu v (using backMathematica right-scalar at equation2 action  and(1),1 uClifford forvk2(usingx= 1) Projective− the right-scalar1  Spaces action for k,=Author:Mentor: 1) the Andrew Dr. Alfonso Halsaver Agnew · · 11 x2 Mathematica and Cliffordn+1 Projective Spaces 2(n+1) nMentor:Author: Dr.Author: Alfonso Andrew Andrew Agnew Halsaver Halsaver first component of the EC11 representativefirstvv componentForxx11 kMathematica = reduces1 3,1 the of second the to, EC andcomponent representative11 Clifford of Projectivethe reduces(1 EC + e representative1 to,)+ Spaces(e2Theoreme reduces12). Mentor: 1 to,: ConsiderLet Dr. u Alfonso  2 ,0 Agnew. :The such matrix that urepresentativeM R of× u iswhere a 2(n M+ has1) × 2 vv−− == == +1+1 ++ 11 ee ++λλ(e(e ++ee )) providedprovidedxx =0=0 ∈C  ∈ For k = 3, the secondLooking component back at equation1 of1 the EC2 (1),2 representative12u12 v (using2x1 right-scalar reducesL.D.−11 columns.. to,. actionMathematicaMathematica A priori,for k = we 1) have the and and Clifford Clifford Projective Projective Spaces Spaces Mentor:Mentor: Dr. Dr. Alfonso Alfonso Agnew Agnew vvvand88 has inverse,xx11 −Author:−xx11 Andrew Halsaver· , ,   matrix with linearly dependent columns written as, 1 and  x hasfirst inverse, component  1 of the EC representativex2  reduces to, Author: Andrew Halsaver 2 and has(1 inverse, + e )+ (e e ). Author: Andrew Halsaver Mathematica and Clifford Projective(1 + e1)+ Spaces(eFor2 ek12=). 3,Mentor: thex3 second Dr.1 component Alfonsox4 2 ofAgnew12 the EC representative1Mathematica reduces to, and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew LookingLooking2 back back at2 atx1 equationv equationand−x1 has (1), (1),1 inverse,uu2 vv(1(using(using + e1)+ right-scalar right-scalar21x1 (e−2 actione action12). for forkk== 1) 1) the the Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew 1 1 x2 uj = [(xk + λxk+1)+(xk λxk+1)e1 +(λxk + xk+1)e2 +(λxk xk+1)e12] , v− = 1 =v x1 +12x·1·1 v+ 1x1 2x11e11+−λ (e2 + e12)1 provided x1 =0. x1 λx1 Author: Andrew Halsaver firstfirst component component of ofv the− thev=v EC EC representative8 representative= x 1 +1 reduces reduces+ x(11 to,x to, +3 e1)+e + λ(e(x2e4 +e12e ).) 2provided x =0 − − For k = 3, the second componentv−1 = = of the− EC1 (1 representative+1 + e1+)+1 2 ( reducese, 12e1e+) to,λ (e2 + e12)1 provided. x1 =0. x λx For k = 3, the second component of the EC representativevv 8 x1 reducesvv  8 to,2 x− x1 12x1 −x 2 12 . Mathematica and Clifford Projective2 Spacesxx1 1 2 λxλx1 1Mentor: Dr. Alfonso Agnew 1 v x1  1 2x11  12−x1 1 − , ,    and has inverse, As required, we havev− the= equivalence= class +1 + 1  e1 + λ (e2 + ek12+1) provided x1 =0. 11 xx2 ...xx2 2 ... λxλx2 2 x AsLooking required,x back we athave equationvv xthe3 equivalence82 (1), ux1 v x(using4 class right-scalar− x1 actionwhere forj =k = 1), theλ is a fixed real number,U = and k =1, 3, 5, ..., n+1. The canonical 3 For4 k =(1 3,(1 + the +ee1)+1 second)+ ( componente(e22 ee1212).). of the EC representative reduces, to,   (1 + e1)+ Looking(Ase2 required,e back12). at we equation(1 have + e1 the)+ (1),· equivalenceu (ve2(usinge12 class). right-scalar action for2  k = 1) the  xkx1 ...λxλxk 1  ... first2 component12 of2 the ECLooking representative22xx11 back−− at equation reduces to, (1), u v (using right-scalar action for k = 1) the x...1 λx ...1 1 v x1 1 2x1 12x1 − 2x1 2x1· − · equivalence class on u is determined by v UU2∗,=0=:  u first2,0 : u componentj = [(xk + ofλx thek+1 EC)+( representativexk x3λxk+1)e1 reduces+(xλx4 k to,+ xk+1)e2 +(λxk xk+1)e12]  ...x2 x ...λx2 λx  v− = = +1 +∈C1 e1 +2 λ (firstLookinge2 + componente12) backprovided at of equation− thex1 EC=0 (1), representative. u v (using reduces right-scalar to, − action for k −→= 1) the ∈C  xxk2k λxλxk2k  vv 8 x1 − x1 1 (1 + e1)+ (e2 e12).  x1 λx1   ForForkk== 3,As 3, the the required, second second componentwe component have2 the of equivalenceof the the1 , EC EC representative representative classx2 reduces reduces· to, to,  x2(n+1)...λx...2(n+1) ...  ...  As required, we have  the equivalence  class ufirst component2,0 : uj =(1 of[( +x theek +)+2x ECλx1 k+1 representative()+(e xek 2)x. 1λxk reduces+1−)e1 +(1 to,λxk1+ xk+1)e2 +(1 λxk xk+1)e12] U =  ......  1 1 2x2 12  x2 λx2   ∈C 2 2 2x1 −1 − vx2= +1 + 1−e1 λ (e2  −→+ e12)U ,=wheneverxk x λxxi λxk=0,and i =1, 2, 3, ..., n+1 xx xx (1 + e1)+ (1(e2 + ee)+12). (e e ).   x2(2(nx+1)nk+1) λx2(λx2(n+1)nk+1). Looking back at equation (1), u v (using1 1 right-scalar33 x action424 for k = 1)2x the 1 22 x12i xi −1/xi λ−  ... ...  . 2 1 2 As required,(1(1 + + weee12)+1 have)+ the(e(e2 equivalence2 ee12121).). 21 class− x2 2x1 − Let Λ =  , assuming x =... 0. ...   u  : u = [(x + λx u)+(x · :λxuj =(1) +e[(e+(x1)+k +λxλx+k+1(xe2)+()eex12k+() λxk+1)xe1 +()eλx]k + xk+1)e2 +(λxk xk∗+1)e12] − U = i ......  first component2,0 j of thek ECk representative+1 Fork2k,0 = 3, reducesk+1 the2 second12xx11 to, componentk k+122xx112 of the−− ECk (1 representative +k+1e1)+12 ( reducese2 e12 to,). 011/xi λ  x λx  ∈C 2 ∈C − 2 2 2x1 − − 2− 2 −→x − k +1 − ∗ −→ 1/xi λ  x2(nk+1) λx2(nk+1)   For k = 3, the second11 componentx of2 the EC representative1 Proof: reduces to, LetLet Λ Λ∗ ∗ = =  − − , , assumingassuming assuming xx 2(xini+1)== 0. 0.λx2(n+1).  2 For k =(1x 3, + thee1)+ second(xwhere componente2 e12k )=1 of, 3 the and ECj = representative. reduces to, 0101 ...  ...  . AsAs required, required,1 we we have havexu the the equivalence equivalence2,0 : uj = class class[(3xk + λxk+1)+(4 xk λxk+1)e1 +(λxk +2 xk+1)ne+12 +(λxk xk+1)e12]    x23∈C x242 (1 + e1)+2x1 (−e2 −e12). Let u Multiplying, . Thek +1 matrix− we1/x seei representative thatλ  −→U Λ∗yields of uis a 2(n + 1) 2 matrix with (1 + e1)+ ((1e2For + ek12)+=). 3, the2x(e secondx3e ) component 2x x4 of the EC representative2 reduces,0 to, 1/xi λ  x2(n+1) λx2(n+1)  1 1 x2 12 12 1 x−3 wherex4k =1, 3∈C andLetj = Λ∗ = . − ,· assuming xi = 0. ×. 1 x2 2  22xx11 − 2x1 − (1 +e1)+ (1(e2 + ee)+12)linearly. (e dependente ) Multiplying, columnsLet Λ∗ = written01 we see as,− that ,U assumingΛ∗ yields xi = 0. (1 + e )+ 2(2e e )11 (1 + e1)+ x3(e2 e12) x,4 1 2 12 . Multiplying,2 we01 see that U Λ∗ yields  1 2 12  2x1  x3 22xx11 − x4 2x1 −   ·  x /x 0 2 uu 2x12,20,0: :−uujAsj== required,[([(xxk2k++λx weλxk+1 havek+1)+()+( the2xx1xk equivalencek (1λx− +λxke+1k1+1)+)e)e1 class1+(+(λx(λxe2kk++ex12xk+1k)+1)e)e22+(+(λxλxkk kx+1xk+1k+1)e)e1212] ] 1/xi λ  · ∗ 1 i ∈C 22 1 − kx+1 (1 + e1)+ (e2 −e12). Let −→ ΛMultiplying,∗ = we see ,that assuming · xyieldsi = 0. For k = 3, the second∈C component of the ECwhere representativek =1, 32x and1− reducesj = 2 to,2x. 1where−k =1, 3, and j =− .  −→ x − λx U Λ x /x 0 As required, we have(1 + thee1)+ equivalence(e2 2x class1e12)  2x1 − Multiplying, we101 see that1 U Λ∗ yields  2 i xx1/x1/xi i 00 x x We arex3 inclinedAs to required, thinkx4 that we equation have2 the (2) equivalence can be applied class to other2 canon-Multiplying, we see that U Λ∗ yields   3 4 2 1 2 2x1 − k +1 x λx · x /x 0  (1 + e1)+ x(3e2 ue12) :xu4(1= + e[(1)+x + λx(e2 )+(n+1e12x)  λx )e +( λx + x )e +(λx x )e ] 2 xx1 1 2 λxλx1 1 · 3 i xx2/x2/xi i 00 ical matrix2,0 representativesj As required,k of wek+1 have2,0 . thek We equivalence dok+1 in1 factclasswhere findk thatk+1=1 this2, 3 isand true.kj = k We+1 12.     2x1 2x1 (1−1 +1 ∈Ce1)+,2x1 (xex222Wee121 are).2x inclined1 − to think−, that equation (2) can be applied−Multiplying, to other −→ we canon- see that U Λ∗ yields x1/xi 0    2 2 x 2 C 1x  2 ...xx2 2 ... λxλx2 2 1/xi λ  x4/xi xx3/x30/xi i 00 2x1acknowledge(1(1 +u +ee1)+12)+x this1 : u with(j−e(=e22 thee3[(e12x12) followingk)+ λxk+1)+( theorem.4 xk λxn+1k+1)e1 +(λxk + xk+1)e2 +(λxk xk+1)e12] · =  x1/xi 0   2,0ical matrixu (1 representatives2,0 +:eu1j)+= [(x(ek2 of+ λxe12k+1) .)+( Wexk do inλxk fact+1)e find1 +( thatλxk + thisxk+1 is)e true.2 +(xλx Wek xλxk+1)e12]  − x /x 0 22 ∈C 22xx11 −∈C2− 1 2 − 2,0 kk−+1+1 − xk 1  −→...−...λxk 1  ......  −→0111/x/xi λλ ...2 xi x4/x .../xi  00   2x1 n+1 2x1 −C , 2(n+1) n  x1 λx1 i =  x24/xii  0  As required, we have the equivalenceTheoremWe areclass 1 inclined:acknowledge Consideru  to thinku: u thisj = that withwhere[(where: equationx suchk the+kkλx=1 following that=1k+1 (2), ,3)+(u3 and canandx theorem.Mjk bej==λx appliedk+1).e.1× to+(where otherλxk + canon-Mxk+1has)e2 +(λxkx xk+1)eλx12] −− =x1/xi 0  We are inclined to think that equation1 (2) can2 be,0 appliedx2 2,0 to other canon- R  ...2 xxk ... 2 λxλxk  0101 xk/x3 i i ...... 0  ......  xx xx∈C ∈C 2n+1  −∈ 22  − x2k  −→λx2k    x3/xi 0  L.D.n3+13 columns. A(14 priori, +4 e1)+ we have(e2 e12) n+1 2(n+1) n x1 λx1   x2/xi 0  ical matrix representatives ofical2,0 matrix(1(1. + We +ee1)+1 representatives do)+2 in fact(e(e22 finde2e12x12 of) that)−2 this,0 . is We true. do in We fact find that this is true. We  x2(n+1)...λx...2(n+1) ...  ... 1/xi λ  ...x4/xxi k/x ...i00  1 We are inclinedTheorem1 to think 1:1 Consider thatCx2 equationu (2)2,0 can: suchbe applied that u to otherMk +1 R × where Mhas......  1−/xi =λ   xx4k/x/xii  00  2 C22xx11  2We2xx11 are−− inclined, ,1 to think thatx2 equation (2) can be applied to other canon-x2 λx2    =x3/xi 0  acknowledgeu  : u this= with[(x the+ λx followingacknowledge)+(x theorem.λx this) withe (1+( + theλxe1)+ following+ x (1)(e2 +theorem.+(ee∈C)+λx12)wherex(ek =1)ee,]3) and j = ∈ .  xk x λxλxk  01−x2(n+1).../xi ...0 ... ...  2,0 j k k+1 k k+1L.D.1 columns.k Ak+1 priori,2 we1 havekn+1 k+12 1212 2   x2(2(nx+1)nk+1) λx2(λx2(n+1)nk+1) 01  ......  ∈C 2  canonical1− icalx 3matrix matrix2 representatives representativesx142x21 − of ofx  2 −, 0x . We We− do do in −→ infact fact find find that that thisk +1 this is true. We......   1/x λ   x /x 0  n+1u = [(x + λx )+(x2(nn+1)+1λxn )e +(2 λx1 + x )e 2(+(n+1)λx n x )e ] , k +1... ...  i  xk4/xi 0   Theoremj  1:k Consider (1k ++1e1)+u k(1(e +2 e:k+1 suche)+12)1 thatC (euk wheree Mk)+1k 2=1, 3× andk wherej k=+1 M12 has.   − = xx2(2(n+1)n+1)/x/xi i 00  Theorem 1: Consider u 2,0 : such2  thatacknowledgeu M thisR with−2,0 × the1 where following M2 has theorem.12 R where− k =1, 3 and j =  ......  n+1   xk/xi  0 . . WeWe are are inclined inclinedis true. to We to2x thinkthink1 acknowledge that that1 ∈C equation equation22x1 this− with (2) (2) canthe2 canx,1 following be be applied− applied theorem.∈ to to other other canon- canon- 2  x xk λxλxk  01  ......   ∈C  x3 ∈ xx34  x4 This matrixk +1 corresponds22(n+1) to2(nw+1) 2,0 :     L.D. columns. A priori, we haveL.D. columns. Au priori,j =(1 +n[( we+1n+1xek have)++ λxk+1(e)+(exnk+1) λxk+1)e1 +(λxk + xk+1)2(en2+1)+(λxn k xk+1)e12] ,x2(n+1) λx2(n+1)     ... ...  icalical matrix matrix representatives representatives of of  .1. We We do do(1 in2 +in facte fact1)+12 find find that( thate2 this thise12)where is is true. true.k =1 We We, 3 and j =  . ...... ∈C  n+1n+1  xk/xi 0   1 x2 kTheorem+1 2x 2 12:,20, Consider0 2x u − 2,0−: such that u M R × whereThis−This2 matrixM matrixhas corresponds corresponds to toww 2,0 : :  x2(n+1)/xi 0  (1 + e )+ (e e )  1 CC x3 2x11 ∈Cx4 2x, 1 − , ∈   x  e x 2e,0x e x  x2(n+1)/xi.0  1 acknowledgewhere2 this12jWe1= with are the inclined, λ followingis a to fixed think theorem. real that number, equation and (2)k =1 can, 3 be, 5, applied ..., n+1. to The other canonical canon-  x2(n+1) λx2(n+1)1  1∈C∈C1 2 2 122 ......  . 1 2 acknowledge2x1 − thisTheorem withL.D.2 the 1. columns. following A theorem. priori,(1 + e we1)+ have (e2 e12) + +   u = [(x + λx )+(x λxuj = )e[(+(xk +λxλx+k+1x)+()kex2+1kx+(1 λxkn+1+1k)xe+11 +(2x)e1λx]k, +− xk+1)e2 +(λxk xk+1)e12] ,   n+1     j  k k+1 k equivalenceicalk+1 matrix1 classwhere representativesk onkun=1k+1n+1is+1, determined32 and of jk=2,0 by.k We+1v . do12 in fact: find, that this is true. We  2xi  2xx1i1 ee12x1xx1i1 −ee22x2x2i2 x2(ee12n12+1)xx2/x2 i 0  2 Theorem− 1: Consider2 whereu j = : such−, λ thatis− a fixedu M real2∗ number,,02(2(n+1)n+1)nn andwherek =1M−, 3has, 5, ..., nThis+1. matrix The canonical corresponds to w 2,0 +: n+1 + . x Theoremx 1: ConsiderConsiderWe u are inclined1  2 ,2 0 , 0  We :: such2 to are think thatC inclined thatu 2 M toequation ∈C think R R that (2) × ×can equation wherewhere be applied M (2) hashas can to be other appliedThisThis canon- matrix tomatrix other corresponds corresponds canon- to∈C to w +2,0 : +    3 4 acknowledge this∈C∈C with the following theorem.n+1 ∈∈ 22x∈Cxi 22xxi 22xxi − 22xxi   (1 + e1)+ (e2 e12)  uj = [(xk + λxk+1)+(xk λxk+1)e1 +(n+1λxk + xk+1)e2 +(λxk xk+1)e12] , x e x i e xi e xi − i L.D.L.D. columns. columns.L.D. Aical A priori, priori, columns.k matrix+1equivalence we wehave representatives haveAical priori, matrix class we have on representatives ofu is2 determined,0 . We of do by in v fact. We find2∗,0 do: that in fact this find is true. that We this is true. We 3x1 1ne+131x1 2 e42x2 12e124x2 k2+1x1 2x1 −1 1 , 2We1 are inclinedn+1 to think− that equation2,0 ∈C2(n (2)+1) cann be appliedThis− matrix to other corresponds canon-  to w + x2,0 +: e x e x e x  where j = , λ is a fixed realwhere number,Theoremj = and 1:k, Considerλ=1is a, 3 fixed, 5,u ..., real n+1.2 number,,0 The:C such canonical and thatku=1C,M3, 5, ...,R n+1.× Thewhere canonicalM has  + 1 +1 1 2 2 12 2  v = acknowledge+1 + thisacknowledge with1∈C thee1 following thisλ ( withe2 + theorem.e the12) following,n+1whenever∈ theorem.xi =0, and i =1, 2, 3, ..., n+1  2xi ∈C2xx3i3 +ee12x1xx3i3 +−ee22x2x4i4 ee1212xx4 4   2 11 2 x 2 ical matrixx − representatives− of 2,0 . We do in fact find that this is true. We 2xi 2x+i 2x+i − 2xi L.D. columns.i A1 priori,k i+11 we have 1 C   2xi + 2xi + 2xi − 2xi   equivalence class on u isuuj determinedj== equivalence[([(xxkk++ λx byλxkv+1k+1 class)+()+(2∗x, on0xk:k u λxisλxk determined+1k+1)e)e11+(+(λxλx bynk+1k++vxxk+1k+1)e2∗),e202+(:+(nλx+1λxkk xxk+1k+1)e)2(e1212n]+1)], , n 2(n+1) n   x1 22xexi1x1 22xxei 2x2 22xxei 12−x−2 22xxi   Theoremwherevacknowledge= 1j:=Theorem Consider+1, λ thisuis 1+: a with fixedConsider2,0 the real:1∈C suchfollowingu number,e1 thatλ ( theorem.eu and2 +: sucheMk12=1) that,,R3whenever, 5u, ...,× nM+1.wherexi =0 TheM, and canonicalhasiwhere=1, 2,M3, ...,has n+1 + i + i i i  We are inclined to think22 that equation∈C (2) can−− be applied to other canon-2,0 −− R ×   x3 e1x3... e2x4 e12x4   1 2 x2i ∈Cxi − ∈C−  ∈  ∈   2xi 2xi 2xi − 2xi  Proofn+1:L.D. columns.L.D.  A priori, columns. we have A priori, wen+1 have  2(n+1) n  +x3 +e1x3 e2x4 e12x4  ical matrix representatives of 21,0uj.= We1 equivalence[( doxk Theorem+ inλx factk+11 class)+( find 1x: on thatk Consideruλxis thisk determined+1) iseu1 true.+(λx2,k We0 by+:xv suchk+1)e that22∗+(,0 : λxu k Mxk+1)Re12] , × where Mhas  + +   1 1 1 kCk+1+1 n+1 w =   2xi 2xi 2x...i ...− 2xi   (3) v = 2 +1 + 1 e−1 λ (e2 +∈Ce12) , whenever∈C xi =0−, and∈i =1, 2, 3, ..., n+1  2xi 2xi 2xi − 2xi  vacknowledge= +1 this+ withwherewhere thej1 followingj==e1 Letλ ,( theorem.e,λ2uλis+is ae a12 fixedProof fixed)2L.D.,0 ,. realwhenever: Thereal columns. number, number, matrixxi A=0 representativeand priori,and, kandk=1=1 wei, =1,3 have3, ,55,, of, ...,2 ...,,u3 n, nis+1. ...,+1. a n 2(+1 The Then + canonical 1) canonical2 matrix with  xk x3e1xk e1xe32xk+1e2x4 e12ex12k+1x4   2 xi xi − −2 xi∈C1 xi − 1  −  × ww== + + + +   (3)(3) linearly22  dependent1 columns1  writtenn+1 1 as,          n+1 uj =k +1[(xk + λxuj k=+1)+(2([(nx+1)kxk+nλxλxk+1k+1)+()e1 x+(k λxλxk +k+1x)ke+11 +()e2 λx+(kλx+kxk+1x)ke+12 +()e12λx] , k xk+1)e12] ,2xi 2xi2xxkik 2xeie1x12xk...xki 2xe−ie2x2−xk+1k2+12xxi i ee1212xxk+1k+1  Theorem 1: Considerequivalenceequivalenceu  class class: such on onuvu that=isis determined determinedu LetMu+1 by by+2,v0v.× The∗where2∗1, matrix0: :e M representativeλhas(e + e ) , whenever of u is a 2(xn=0+ 1), and2 matrixi =1, 2 with, 3, ..., n+1   2,0where j = 2 , λ is1 a∈C fixedR2 real number,−2,0 1 and k =1−2 , 3,125, ..., n+1. The canonical−i × −   ++ ++ ...   Proof: ∈CProof: 22 xi ∈ ∈Cx∈Ci − −  w =  2xi 2xi 2xi − 2xi   (3) L.D. columns. A priori, we have linearly uj = dependent[(xk +λx columnsk+1)+( x writtenk λxk as,+1)e1 +( λxk + xk+1)e2 +(λxk xk+1)e12] ,w = 2xi 2xi 2xi − 2xi   (3) equivalence classn+1k +1 on u2is determined by v −2∗,0 : −   xk e1xk ...e...2xk+1 e12xk+1   n+1 11 11 Let u 11 . The matrixk representative+1 ∈C of u is a 2(n + 1) 2 matrix with   +xk +e1xk e2xk+1 e12xk+1  Let u 2,0 . Thevv== matrix representative+1+1wherewhere++Proof j = 2 , 0 of: 1 1 u where eis e , a λ 2(λ isisλjn( aea(=+e fixedfixed+ 1)+ee real )2real,)λ matrix, number,,is wheneverwhenevernumber, a fixed with and realxandxk=0 number,=0=1, ,and,and3, 5 andi,i ...,=1=1 nk,+1.,2=12, ,33,, The,...,3 ...,, 5 n, n+1canonical ...,+1 n+1. The canonical  + +  (3) 1 ∈C ∈C 2 11 22 ×1212 i i k = 1, 3, 5,× ..., n+1. w =  2xi 2xi 2xi ...... − 2xi   (3) linearlyu = dependent[(x + λx columns)+(22x writtenlinearlyxxi iλx as,1 dependent)e +(1xxi λxi−− + columnsx −−1)ek written+1+(λx2 as,x )e ] ,     2xi e2xxi 2xi e −x 2xi   j k k+1  k k+1equivalence 1 k class k+1 onnu+12 is determinedk  k+1 by12v ∗ : x2n+1 xek1x2ne+11xk 2e22(xnk+1)+1 e1212xk2(+1n+1)   2 − v =The canonical+1whereLet equivalenceu+equivalencej =2,0 1. The e, class1λ is matrix−λ a on(e fixed2 u+ isis representativee realdetermined12 determined) , number,whenever2,0 by and of x vu i k =0is =1 a , 2(2∗ and, , 03 n :,:+5i,=1 ...,1) n, 2+1.2, 3 matrix, ..., The n+1 canonical with  + + ++   2 x ∈Cx − 2 − ∈C ∈C ×  x e x e x e x  Proof: linearlyi dependenti columns written as, 2xi 2x2xn2i+1n2+1xi 2xe1i 1x2n2+1n+1...22xxii e2 2−−x2(2(n+1)n+1)2x2ixi e1212x2(2(n.+1)n+1)  Proof:  equivalence  class on u is determined by v 2∗,0 :   ++ ++ ...   k +1 1 1 1 11 1 ∈C   2x 2x 2x − 2x   where j = , λ is a fixed realProof number,nv+1n+1=: and k +1=1, 3+, 5, ..., n+1.1 e The canonicalλ (e + e ) , whenever x =0, and i =1, 2, 3, ..., n+1  2xi i 2xi i 2xi i − 2xii . . LetLetuu 2,20,0.. The The matrix matrixv = representative representative+1 of1 of+uuisis a a2 2( 2(1nn+12e+1 1) 1) λ2(2e matrix2 matrix+ e12) with withi , whenever xi =0, and i=1x,22n,+13, ...,e n1+1x2n+1 e2x2(n+1) e12x2(n+1)   2 ∈C∈C 2 xi xi − − ××    x e x ... e x e x   linearlylinearly dependent dependent columns columns  n+1 written written1  21 as, as,xi  1 xi − −   +2n+1 1 +2n+1 2 2(n+1) 12 2(n+1) equivalence class on u is determinedLet by vu v2∗=,0 :. The matrix +1 representative+   1 e1 ofuλis(e a2 2(+ne12+) 1), whenever2 matrix withxi =0, and i =1,2, 3,2 ...,x n+1 2+x +2x 2x   ∈C∈C ×  i 2x i 2x i 2x− − i 2x.  linearly dependent2 columnsxi written as,xi − −   x2n+1 ei 1x2n+1 i e2x2(n+1) i e12x2(n+1) i . Proof: Proof  :      + +  1 1 1     Let u n+1. The matrixn+1 representative of u is a 2(n + 1) 2 matrix with 2xi 2xi 2xi − 2xi . v = +1 + 1 e1 λ (e2 + e12Proof) , whenever:2,0 Let uxi =02,,0 and. Thei =1 matrix, 2, 3, representative ..., n+1 of u is a 2(n + 1) 2 matrix with  2 xi xi − − ∈C ∈C × ×       linearly dependent linearly columns dependentn+1 written columns as, written as, Let u 2,0 . The matrix representative of u is a 2(n + 1) 2 matrix with Proof: linearly dependent∈C columns written as, × n+1 Let u 2,0 . The matrix representative of u is a 2(n + 1) 2 matrix with linearly dependent∈C columns written as, ×

DIMENSIONS 74 Author: Andrew Halsaver Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew

Abstract Computing canonical equivalence classes (EC) in Clifford projective space (CPS) can be difficult, especially in high dimension. We created our own package Author: Andrew Halsaver in Mathematica to perform such computations.Mathematica By passing and in Clifford certain parame- Projective Spaces Mentor:Author: Dr. Andrew Alfonso Halsaver Agnew Author: Andrew Halsaver Author:Author: Andrew Andrew Halsaver Halsaver Author: Andrew Halsaver ters to our user-defined Mathematica function,Mathematica EquivalenceClass, and Clifford we canProjective analyze Spaces Mentor:Author: Dr.Mathematica Andrew Alfonso Halsaver Agnew and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Author: AndrewMathematica HalsaverMathematica and and Clifford Clifford Projective ProjectiveMathematica Spaces Spaces andAuthor:Mentor: CliffordMentor: Andrew ProjectiveDr. Dr.Alfonso Halsaver Alfonso Spaces Agnew Agnew Mentor: Dr. Alfonso Agnew Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew canonical representatives of EC in CPS. The examples in this paper involveMathematica 2,0 and CliffordAuthor: Projective Andrew Spaces Halsaver Mentor: Dr. Alfonso Agnew Author: Andrew Halsaver Mathematica and Clifford Projective SpacesFurthermore,Furthermore,Mentor: we seewe thatsee Dr. that this AlfonsoC this result result Agnew is equivalentAbstract is equivalent to rightto right scalar scalar multiplication One might wonder why the first argument is passed in as a list of explicity, but our Mathematica package works with Clifford algebras such as 3,0 Mathematica and Clifford Projective Spaces Author:Mentor: Andrew Dr. Alfonso Halsaver Agnew MathematicaofFurthermore, equationmultiplication and 1Clifford bywe see of equation that Projective thisC 1 result by Spaces is equivalentMentor: to right scalar Dr. Alfonso multiplication Agnewa list.One We might have wonder set up whythe coding the first so argumentthat we can is passedhandle inexamples as a list of a and 1,1. Other Clifford algebras will be available in our package in the future. Furthermore,Computing we see thatcanonical this resultMathematica equivalence is equivalent classes andOne to (EC) Clifford might right in scalarwonder Clifford Projective multiplication why projective the Spaces first space argumentMentor: is passed Dr. in asAlfonso a list of Agnew a C Furthermore,of equation 1 we by see that this result is equivalent to right scalar multiplicationlist.of CPS We havehaving set higher up the dimensions coding so thatwhich we involves can handle more examples than one ofAuthor: CPS Andrew Halsaver Furthermore, we see that this result is equivalent to right scalar multiplicationFurthermore,of equation(CPS) we can 1 see by be thatAuthor: difficult, this result Andrew especially is equivalent Halsaver in highlist. dimension. to Weright have scalar We set created multiplication up the our coding own package so that weAuthor: can handle Andrew examples Halsaver of CPS of equation 1 by 1 1 1 having higherMathematica dimensions and which Clifford involves Projective more than Spaces one Clifford element.Mentor: Dr. Alfonso Agnew Introductionof equation 1 by MathematicaFurthermore, and we Clifford seev = that1 Projective this1 result+1of equation+ isSpaces equivalent1 in 1 Mathematica1 bye1 toMentor: rightλ (e2 scalar+ toe12 Dr.perform) multiplication Alfonso suchMathematica computations. AgnewCliffordhaving and higherelement.One By Clifford might passingdimensions We Projective wonderexpress in certain which whythis Spaces parame-involvesin the the first next more argumentMentor: example. than Dr. is one passed Alfonso Clifford in Agnew as element. a list of a v = 2 xi +1 + xi − 1 e1 − λ (e2 1+ e121) 1We express this in the next example. of equation 1 by 2 x  x ters−  to our− user-definedv =  Mathematica+1 + function,We1 eOne expresslist.λ EquivalenceClass,might We(e this+ have wondere in)the set upnext why the we example.the can coding first analyze argument so that we is can passed handle in as examples a list of of a CPS What is Clifford Algebra? 1 1 i  1 i  1 1  1 1 2 12 1 1 1 which yields: v = +1 + 1 e1v =λ (e2 +2 e12+1)xi + x1i −elist.λ Wehaving(−e have+ e higher) set up dimensions the codingAuthor: which so that involves Andrew we can more handleHalsaver than examples one Clifford of CPS element. v = +1 + 1 whiche1 λyields:(e2 + e122) x x −canonical− representatives  of EC in CPS.1  TheOne examples2 Projective might12 inwonder thisOne line paperwhy example might the involve first wonder of argument 2 why , 0 : is the passed first in argument as a list of is a passed in as a list of a Clifford Algebra is an2 associativex algebraFurthermore,x equippedwhich yields: with we see a quadratic1 that 1 thisi form. resultxk ise11 equivalentxki e2xk+1 to right2e12xk scalarx+1i  multiplicationxi − Projectivehaving− We· higher line express example dimensions this of in the which next: involvesexample.C more than one Clifford element. i p,qi − − u v = + +explicity,Mathematica but  our Mathematica and  Clifford package Projective workslist. We with haveSpaces Clifford set up algebras the coding2,Mentor:0 such so that as Dr. we 3,0 canAlfonso handle Agnew examples of CPS For example, consider the quadratic  space of. equation It is an n 1dimensional byv = j real+1 vectorxk+ ewhich1xk 1 yields:e2ex1 k+1λ (e2e12+xek12+1) • Projectiveu2 line = list. example{2, 1, We 0,C of have1/2}; 2, set0 : up the coding so that we can handle examples of CPS R 2 u x· iv = 2xi + x2ixi−+ 2x−i − 2xi . u2We = having express2, 1, 0, higher 1/2this dimensions;in the next which example. involvesC more than one Clifford element. which yields: j which yields:and 1,1. Other Clifford xk algebrase1xk will•e be2xk available+1 e12x ink+1 ourhaving package higherC indimensions the future. which involves more than one Clifford element. space wherewhichn yields:= p+q, and has a non-degenerate symmetric scalar product· whichx2xi e 2xxi e x2xi −e x2xuij v. = + + u2{ = 2,eqc2 1, 0,} 1/2= EquivalenceClass; [{u1, u2}, 2, 0, 1] k 1 k 2 kC+1 12 k+1 xk e1xk e2eqc2xk+1 =We EquivalenceClass[e12 expressxk+1 this in the nextu1, example.u2 , 2, 0, 1] xk e1xk e2xk+1 e12xk+11 u v1 = + 1 + · 2xi 2xik +12xi Projective{− 2xi line. } example of 2,0 : induces the quadratic form: whichThese yields: are precisely thej components of (3), where k =1uj ,v3,=5, ..., 2+n+1, j+= eqc2, = EquivalenceClass[We express{ u1, this} u2 in, the 2, 0,next 1] example. uj v = + + v = · +12xi + 2xi 1 2ex1i λ−(e2 2+xie12.) One might wonderkThe+1 equivalence• why theThe first class equivalence argument here is, class is passedC here inis, as a list of a · 2x 2x These2x are− precisely2x the componentsxk e1xk of (3),e2x wherek+1 ke12=1xk·+1, 3, 5,2 ...,xi 2n+1,2xi j = 2x2iProjective,− u22x =i line2,. 1, example 0, 1/2 of; { 2,0 : } i i i i u2. v x=i + xi+− Introduction− TheProjective equivalence line example class here of is, : k +1 2 2 2 and2 Thesei N. are preciselyj  the components  These of are (3), precisely where thelist. components We have of setk (3),+1• up2 where the codingk =1{, 3 so, 5 that, ..., 2 wen+1,} canCj2,= handle0 examples, of CPS x x = x + ... + x x ... x ∈ [1, p. 205].· 2xi 2xi 2xi − 2xi . u2• = eqc22, 1, = 0, EquivalenceClass[ 1/2 ; Ck +1u1, u2 , 2, 0, 1] 1 p p+1 Theseandp+qi areN precisely. the componentsk +1 of (3), where k =1, 3, 5, ..., 2n+1, j = , u2 = 2, 1, 0, 1/2Projective; line example2 of 1 : e1 These are· precisely the components− − of (3),which− where yields:∈k =1, 3, 5, ..., 2n+1, j =Theseand are, iWhat precisely. is Clifford the componentshaving Algebra? higher of (3), dimensions where2 k =1{ which, 3{, 5, ..., involves2n}+1,} morej = than{ , one Clifford} 2 element.,0+ 1 e kSince = 1, v3,is 5, invertible, ..., 2n+1, and since v Λ∗,N it follows that the equivalencek +1 classeqc2eqc2 =The EquivalenceClass[ =1+ EquivalenceClass[equivalencee• e classe u1,u1, here2 u2 u2, is, 2,, 2, 0, 0,1] 1] C 1 Theseand arei Since preciselyN. v is invertible, the componentsx andk sinceande of1x2k (3),iv  wheree2Λ.x∈k,+1 itCliffordk follows=1e12,x3k, Algebra+15 that, ...,We2 then express+1, is equivalence anj associative= this in class the, algebra next example. equipped1 u22 =with2,12 a{ 1, quadratic 0, 1/2} } ; form. 2 2+ Furthermore,and i Clifford. algebra is generally non-commutative.for u∈is determined Foruj by examplev =v, and+ thexi is chosen+ N ∗ to be the first nonzero element in theTheThe equivalence equivalence1+− classe class1 1−{ heree2 is, is,e12 } 2 2  N ∈ Since v is invertible, and since v2 Λ∗, it followsp,q thateqc2 the− equivalence= EquivalenceClass[− class u1, u2 , 2, 0, 1] ∈ for u is determined· by v,2 andxi x 2isxi chosenFor2x toi example, be− the2x firsti consider. nonzero the element quadratic in space the R . It2+ is ane1 +n dimensionale121  −→ real vector 1 e1  Clifford algebra, 2,0, has e1e2 = e2e1, whereandfirsti eSince1 columnN,e.2 vvis is ofa invertible, invertible, canonicalU as determined and and basis since since ofi by v the Λ definition ∗,, it it follows follows of column-echelonthat that the the equivalence equivalence matrices class  { 15e}2 41e12 C − { } Sincefor u isv is determined invertible, by andv, since and xvi is chosenΛ∗, it follows to be the that first the nonzero2+The equivalencee21 equivalence+ elemente12 class in class −→ the here is,1 e +  Since v is invertible, and since v Λfirst∗,∈ it column followsc of thatU as the determined equivalence by class the definitionspace where of column-echelonn = Projectivep+q, and has line matricesk a example+1 non-degenerate of symmetric: 1+e scalar1 2 e2 producte12 which1 e1 + 11 15+ee212 412 e12 the algebra [1, p.188-189].  for[3,up.147]is determined.  by v, and xi is chosen to be the first nonzero element in the 2,0  − 1 e+1 +4 + +4 .  TheseSince areclass preciselyv forcis invertible,u is thedetermined components and since byfor v ofv, andu (3),firstisΛ x determined wherei, columnis it chosen followsk =1 of to by,U that3 be,asv5•,, the the ...,determinedand2 first equivalencenx+1,i is nonzero chosenj = by the class to definition be, the firstC of nonzero column-echelon1+e1 elemente2 −e matrices12 in1− the   2 2    for u is determined by v, and xi is chosen[3 to, p.147] be the. first nonzero element in the∗ induces the quadraticu2 = form:2, 1, 0, 1/22 ; 1+e1 −e2 − e12  − 2 42  4 . first column of U as determinedAuthor:first by theAndrew column definition of Halsaverc U as of determinedcolumn-echelon by the matrices definition of column-echelon −2+ matrices1e−1 + e12   −→ 15e  411e e1 Whatfirst is a projectivecolumn of space?U as determined by theandfor definitionExamplesui iselement determined. of column-echelon in the by firstv, and columnx matricesi is chosenofU [3 as, p.147] todetermined be the.  first by nonzero the definition element{ inof the} 2+e1 + 1e12  −→  2 12 N c 2 2  151 e2 e 41+e12 + +  Mathematicac and Cliffordn Projective[3Examples, p.147]∈ Spaces.  Mentor: Dr. Alfonsoc Agnew eqc2 =2 EquivalenceClass[2 Now,2 consideru1, u2u2 ,2+ 2, 0,e2,10 1]+with2 2 e12 a matrix1+e −→1 representativee12 ee12+ + with1 linearly indepen- A projective space, denoted as , is thefirst set columncolumn-echelon of lines of throughU as determined thematrices origin by [3,[3 of, the p.147]p.147] definitionc..  of column-echelonx x = x + matrices... + x xNow,Now,{ ...consider consider x }∈C u [1, , p. 205].with a matrix  representative representative1  −15e2 withwith441 e linearlylinearly12 24 indepen-2. [3, p.147] .  FP Examples 1 p p+1 p+q 2,20 − −−1 e +4 4+ .    n+1 SinceThesec v is operations invertible, andare done since withv Λ the∗, it function followsEquivalenceClass. that· theThe equivalence equivalencedent Below class− class columns. are here− is, In− particular∈C consider these six matrices:1 1  the vector space, F . A simple example[3 would,Examplesp.147] be the. real projective space, dent columns. In particular consider2+ thesee + sixe − matrices:  −→4 4 .  These operations are doneExamples with the function EquivalenceClass. Below areindependent columns. In particular1 consider 12 these six matrices: 15e2 41e12 1 Examples Abstract 2 for twou is examples determined of bythisv, function: and xi is chosen to be the first nonzero element in the 2 2 2 . In this case, each element of 0 is mapped to its respective equiva- TheseFurthermore, operations Clifford are done algebra with is the generally functionNow,Now, non-commutative. consider EquivalenceClass. consideru u 2,0 with For2,0 Below awith matrix example a are representativematrix the representative with linearly 1 with indepen-e1 + linearly+ indepen- RP R two examples of this function: x y x y ∈C x 1y e1 00 00 00  −{ }firstExamples columnTheseExamples1 of operationsU as determined are done by with the definitionthe function of column-echelon EquivalenceClass. matrices Below1 aredent1 columns.1 In particular1 2∈C consider1 1 these six matrices: − 4 4 . lence classThese (i.e. a operationsline). Topologically, are done the with set the of EC,functionrepresents EquivalenceClass. a circle since BelowThesetwo are examplesClifford operations algebra, of this are function: done 2,0, with has e the1e2 function= xNow,edent12e1y, EquivalenceClass.1 whereconsider columns.ex1u1,e In2y particular1 is2,0 aBelowwith canonicalx considera1+ are matrixy1 basis these representative of00 six matrices: with 00 linearly indepen-00 Computing canonical equivalenceRPc classes (EC) in Clifford projective space 1+xe21 ye22 e12 00 002 2 x1 y1 x1 y1 00  2 [3,twop.147] examplesThese. operations of this function: are donetwo with examples the function of this EquivalenceClass. function:C − { ∈C} any twotwo points examples on(CPS) the ofsame this can function:be line difficult, in belong especially toThese the in high same operations dimension. EC [2, p.2]. are We done created with our the own functionthe package algebra EquivalenceClass. [1, p.188-189]. Below dent are−x columns.2 1−y2,  In particular00,  consider00 these, 2 six matrices:x1 y1,  x1 y1,  00 R Projective point example using 2,0 coefficients: x3 y3 x1 y1 ,x2 yx21Now,y1  consider, 00x1 yu1  ,x002withy2 a matrix00, 00 representative00 ,x1 withy1 linearly indepen- •Below are two examples of this function:C 2+e1 + xe121 y1  −→ x1 y1 15ex1 y4121,0e 00 00 00 in Mathematica to performExamplestwo such examples computations.Projective of this point function: By example passing in using certain2,0 parame-coefficients: x3 x2y3y2 x002 y2 00002∈C x112 yx12 y2 x1 y1 0000 x1 y1 The first argument is an ordered listWhat representing is a projective the coefficients space? (u,ux4,uy,u4 2)  x3denty3 columns. 1 ex2 In+y particular2  + x3 considery3  these x2 sixy2 matrices:  x2 y2  • C Projective point example using0 12,x021coefficients:y121x2 y2 , x1 y100 , 1 x1 y100,  00x, 1 y1 00 , x1 y1 0000 ters to our user-defined MathematicaProjectiveThe function, first argument point EquivalenceClass, example is an ordered using we list can• representingcoefficients: analyze the coefficients (u,uC,ux,u4nx3y)4y3  ,xx32 y23 − 00,x2 4y2  4x2 ,yx23. y3 00 ,x2 y2x1 y1 ,x2 y2   Methodology Theseto the operations Clifford are algebra done having with the theProjective2 function,0 form A point projectiveEquivalenceClass. example space, using denoted Below20,0 coefficients: as1are FP2 12, is the set of lines  through the origin  of         Projective point example using 2,0 coefficients:• C The first argument is an ordered listx representing2 xy42x3y4 y3 the00x coefficients3 yx32 y2 (xu002 0,uy21,u002,u 12x3) xy13 y1x2 yx2 y2x1 y100x2 y2 00x1 y1 canonical representatives of EC in CPS. TheProjective examples point in this example paper• involve using 2 , 0 coefficients:n+1 C where x1y2 x2,y1 =0x. 1Usingy1 , column x1 reductiony1 ,  onx each1 y1 of, these matrices00, yields 00 00 • C Theto the first Clifford argument algebra is anordered having the list formrepresentingthe vector space, the coefficients. A (u simple,u,u example,u2 )  would be the real projective space,          Constructing CPS in Mathematica two examplesProjective· of point this function: example using The2,0 coefficients: firstC argument is anF ordered list0 representing1 wherex2 3 12yx3x−14y the2 y4 coefficientsx2yx12 =0y2x.3Using (yu30,u column100,u2,ux122 reduction)y2 x2  ony2x3 each y3 of these00 x matrices2 y2 x yields1 y1x2 y2  The first argument is an ordered list representing the coefficients (u0,u1,u2,u12)to the1 Clifford algebraNow, consider havingthese theu form matrices 22,0 with respectively: a matrixx representativey 00 with linearly00 indepen- x y x y 00 explicity, but our Mathematica• package worksThe with first Cliffordargumentu algebras= isu 0anC+ orderedu such1e1 + as listu.2 Ine3representing2,0+ thisu12 case,e12. eachthe coefficients element of where 0x1y−is2 mappedx2y1 =02 . toUsing2 its respectivecolumn  reduction equiva- on each of these matrices1 1 yields  1 1  to the Clifford algebra having the formRP ∈CtheseRx4 matricesy4   respectively:x3 y3  , x2 y2  , x3 y3  , x2 y2  , x2 y2  , It is discussedto the Clifford in Agnew-Childress’ algebra having thepaper form theThe standard first argument approach is an to ordered con- listto representing the CliffordC the algebradent coefficients columns. having (u the In0,u particular form1,u2,u12)−{ consider} − these six1 matrices:               and 1,1. Other Clifford algebras will be available in our packageu = u0 + inu the1e1lence future.+ u2 classe2 + u (i.e.12e12 a. line). Topologically, thesewhere the matricessetx ofy EC, respectively:x3y y=03represents.Usingx2 a column circley2 since reduction00 on each ofx2 thesey2 matrices 00 yields x1 y1 C (u0, u1, u2, u12) to the Clifford algebra having the formu = u0 + u1e1n+1+whereu2e2 + 1 u  12 2 e 12 . RP 2 1  . Using column  reduction on each of these   structing projective n space over a field. Furtherto they theThe discuss Clifford last three how algebra arguments to construct having in the EquivalenceClass form represent p, q, and n in  102 . 10−  10 00 00 00 − Projective point exampleu using= u0 +u2,10ecoefficients:1 + u2e2 + u12e12. p,q  x4 y4   x3 y3   x2 y2   x3 y3   x2 y2   x2 y2  any two pointsx on1 u they=1 u same0 + u line1xe11 +where inyC1uR2ne+1matrices2thesebelongx+110yu21210x matricese112 toxyields.2yy1 the1 =0 samerespectively:these10. Using00 ECmatrices column[2,1010 p.2]. respectively: reduction000000 on each0000 of00 these00 matrices00 yields    a projective space where a ring ofuk= u0k+matrices,u1e1•+ uM2eThe2,+ is lastu used12e three12 for. scalars. arguments They in EquivalenceClassC represent p, q, and n in p,q01. − 00 00 10 10 00 Introduction × The first argument is an ordered list representingThe last three the coefficients arguments (u in,u EquivalenceClass,utheseC,u matrices) 0101, represent respectively: 00p,, q, and00n00in , n+110. 10 , 10 10 , 00 00    note that since is typically not a field, we are working with modules overu = ringsu + u e + u e + u e . x2 y2 0 001 2 n+112 00 x1 y1 p,qx1 y1 00 M The last three arguments in0 EquivalenceClass1 1 2 2 12 represent12 p, q, and, n in  ab.,  10,where01,  , x1y102 , ,00x2y1n,=0+1C10,. Using,01 column ,,00 ,00 reduction , 00 on,10 each of00 these matrices yields to theu1 = Clifford1, 1, algebra-1, -1 ; having thenThe+1 form last three arguments in EquivalenceClassp,q representabab p,01 q, and n−in0000p,q . 0101 000010 10 instead of vectorThe lastWhat spaces three is [2, Cliffordarguments p.2]. Their Algebra? in EquivalenceClass discussion on construcing represent matrixp, q, and projec-n in p,q . Methodology x3 y3 x2 Cy2 cd00 theseab matricesx2 01y2 respectively:C  00ab  x011 y1  01 u1 = {1, 1, -1, -1}; C n+1 10cdcd01 10ab 00 01100100 ab00ab 100100011001 0001 00 tive spaces serves as theClifford cornerstone Algebra for is ouran associative MathematicaTheeqc1 last algebra = three{ package. EquivalenceClass[ argumentsequipped} with in EquivalenceClass au1 quadratic, 2, 0,u1 0] =form. represent1, 1, -1,x -14p,;y q4, and ninx3 p,qy3 . x2 y2 ,  x3 y3 , x2 y2,   x2 ,y2   ,    Theeqc1 last = three EquivalenceClass[ argumentsu = u 0in{+u1 EquivalenceClassu}1,e 2,1 + 0,Constructingu 0]2e2 + u12 represente12. CPS in Mathematica  C  01 ab 00 01  00 00 10 01 10 000010 u1 = 1,p,q 1, -1, -1 ; {  }    x3 y2x3,xy2 y3x2 y310x,1 yx31 y3x3x10y31y1, x44yy2102 x,x2 y24y4 00x, 1y4x1 yx44 00y1x4 y1 00 u1 = For1, example,1, -1, -1 ; consider the quadraticThis space yieldsR . theIt isn equivalence+1 an n dimensional{ class,} u1 real=eqc11, vectorIt 1, = -1,is EquivalenceClass[ discussed -1 ; in Agnew-Childress’u1where, 2, 0,awhere 0]= papera =x−y the−x standard,by =,b= approachx−y− x,c,c toy == con- x−−y , xand, andy d =d = − x−y x y The construction of a CPS requires that we takep, q a, and module,{ n in p,q } , . over the where x1y2 x2y1 =0. Usingab column cd3 2 reduction012 3 ab on each1003 of01 these3 101 matrices ab4 2 yields002 4 01101 014 4 1 { } eqc1 = EquivalenceClass[ u1 , 2, 0, 0]{ } { } where ax=y x1 xy2−y x2 y101,bx=yx1 y2 x x−00y2 y1 ,cx1=yy200xx2 y−1y 10, andx1 yd2x=yx2 10y1 x −y 00 space where n = p+q, and has a non-degenerateu1This = 1, yields 1, -1, the -1C symmetric; equivalence scalar class, producteqc1 = whichEquivalenceClass[ u1− , 2, 0, 0]n+1 12 2−1  1 2 −2 1  12− 2 1  −1 2 2 1 Clifford algebra.eqc1 = Next, EquivalenceClass[ we impose anu1 equivalence, 2, 0, 0]The relation last three (denoted arguments by in){ EquivalenceClass on } structingThis yields represent projectivethese the equivalencep, matrices qn,{ andspace} n respectively:in class, overp,q a. field. cd Furtherx−1 y2 theyabx2 y discuss1  x how,−101y2 tox construct2 y1,abx−1 y2,x012 y1  ,01x−1 y2 x2, y1  induces the quadratic{ form:} { } ∼ 1 e1 − C   x−y abxy  x−01y x y 00 x−y 01xy  00x−y x 10y n+1 0 . The definition of is, eqc1This = yields EquivalenceClass[ the equivalence1+u1e class,, 2,e 0, 0]ea projective+ space where a ring of k kWematrices, find that3 M the2 , is scaling used2 3  matrix for scalars. required1 3  They to3 yield1  such4 reduced2 2 matrices4  is  1 4 4 1 p,q This yields the equivalence class, { }1 This2 12 yields the1 equivalencee1 class, Wewhere find thata = the scaling−cd matrix,b= requiredab− to yield,c01= such reduced− ab matrices, and d01= is − 01 C −{ } ∼ u1 = {1, 1, -1, -1}; − −  −→ 2 2 × x3 y2 x xy2y3 x y x1 y3 x xy3 y1x y x4 y2 xxy2 y4 x y  x1 y4 x xy4 y1x y  2 2 2 2 1+e1 e2 enote12 that since+ M .is10 typically not10 a field,We we find10 are1 that working1e1 the2 scaling00 with2 1  modules matrix 100 overrequired2  rings2 1 to00 yield1 such2 reduced2 1  matrices 1 is2  2 1  u1This = yields1, 1,the -1, -1 equivalence; class,  where a = − −,b= y2 − −,cy=1 − −, and d = − − x x = x1 + ... + xp eqc1xp+1 = EquivalenceClass... xp+q [1, p.[{u1}, 205].− 2,− 0, 0] −→ 12 e12 . 1+e1 e2 e12 x y + x y x y x y   x y x y  x y x y   Definition 1 : u v if and only· if there exists a−λ { − p,q∗ such− that} u = v λ . instead of vector spaces01 [2, p.2].00 Their discussion1 001e212 on2 2 construcing1 10y21 2 matrix102 1 projec-−y1 001 2 2 1 1 2 2 1 1 e1 1+e1 e2 e12 + − −  −→ − . x13yy22 x−2xy21y3 x1 y2 xx12 y13 −x3 y1 x4 y2 x2 y4− x1 y4 x4 y1 ∼ 1+e e e eqc1∈C = EquivalenceClass[+ u1 , 2, 0, 0] 1+e1 e2, e12  , + where , a =  ,−y2 ,b−=,  y1 −  ,c= − , and d = − 1 2 12 c. This definition in Goode-Annin’s{ }− − texttive actually −→ spaces1 2 definese serves2 this. asab as the “row-echelon− cornerstone− 01 −→matri- for our2 We00 Mathematica2 find that the01x package.y scalingx− y matrix00x y required− x 10y  to yield such reduced(4) matrices is Furthermore, Clifford− algebra− is −→generally2 non-commutative.2 For example the 1  . 1 x21 y2 x22 x12 y1 1 2 x1x−12y21 x2 y1 x1 y2 x2 y1 x1 y2 x2 y1 c. This definition. in1+ Goode-Annin’se1 e2 e12 text actually+ defines this as “row-echelon matri-We find that the scalingx−1 matrix−y2 x2 requiredy1 x−1 y to2 yieldx2y1 such reduced matrices(4) is ThisThisces”. yields yields Even the thoughthe equivalence equivalence we cite this class, class, definition as “column-echelon matrices”,cd we noteab that the  01  ab  −n+101  01−  − − Clifford algebra, 2,0, has e1e2 = e2e1, where e1,e2 is a− canonical− c. −→ basis ThisThe2 of definition construction2 .  in Goode-Annin’s of a CPS requires text actually that defines we take this a as module, “row-echelonx1yx22 −x 2 y1y matri-,2 x over1y2x the1x2−y1 y1  (4) C c.−definitionsces”. This Even definitionare though analogous{ in we Goode-Annin’s cite} where this definition matrices text as are actually “column-echelon transposed defines to this one matrices”, as another. “row-echelon we The note definition that matri- the     − −Cxp,q2  −x1 −  c. Thisthe definition algebra in [1, Goode-Annin’s p.188-189]. text actually defines this as “row-echelon matri-c. ThisCliffordces”. definition Even algebra. though in Goode-Annin’s we Next, cite this we definition text impose actually as an “column-echelonequivalence defines this as relationmatrices”,We “row-echelonx find1 y2 (denoted wethatxy note−x2 matri-21y they12 that byscalingxx1 the2y2y1) on matrixxxy211yy12  requiredx2 y1 to yield such reduced matrices is definitions are analogous where matrices are transposed1 e1 to one another.x3 y2 Thex2 definitiony3 x1 y3 x3 y1 x4 y2 x2x1y4 y1 − x1y4 x4 y1 ces”.states, Even “An thoughm n matrix we citeis this called definition a reduced as “column-echelon row-echelonn+1 matrix matrices”, if it satisfies we note the that following thewhich is the inverse of the matrixx−1 y2 x2−y1 ∼x.−1 y2 x2−y1  (4) c. This definition× in Goode-Annin’s1+e1 e2 textces”.e12 actually Evendefinitions though defines+0 are wewhere. this analogous The cite as thisa definition “row-echelon= definition where− matrices of as matri- “column-echelonis,,b are= transposed− matrices”, to one,c another. we=x note1 y2 The that−x definition the2 y1 , andx1 y2d =x2 y1 − ces”. Even though we cite this definition as “column-echelonstates, “An m matrices”,n matrix we is note called that a reduced the row-echelonp,q 2 2 matrix if it satisfiesx y thex followingy x y x y x y −xx2yx2y2 −y x xy1 x y y What is a projective space? definitionsconditions: are (1) analogous It is a row-echelon where− matrices− matrix. areC −→ (2) transposed Any−{ column} to. one that another. contains1 2 The a leading definition∼2 1 1 has 1 2 2 1  1 2 −x2−1y  − 2 1 2  2 1 1 (4) definitions are analogous where matrices areces”. transposed Evenn though to one× we another. cite this The definition definitiondefinitions as “column-echelonstates, are “An analogousm matrices”,n matrix where is we matrices called note− a that reduced are the transposed row-echelon to− one matrix another. if it satisfies The definition− thex following1 1 x − − A projective space, denoted as conditions:, is the (1) set It isof a lines row-echelon through matrix. the origin (2) Any of column× that contains a leadingwhich 1 is has the inverse of the matrix x12y2 x1x2 yx.11 yx11 yx2 yx2 yx1 y x y states,zerosFP everywhere“An m n matrix else.” [3,is called p.147] a reduced row-echelon matrix if it satisfies the followingwhich is the inverse of the matrix− x2 y2 1 2. 2 1 1 2 2 1 states, “An m n matrix is calledn+1 a reduced row-echelondefinitions matrixare analogous× if it satisfies where the matrices followingstates, areconditions: transposed “AnDefinitionm n to(1)matrix one It 1 is: another. is au called row-echelonv aif The reduced and definition matrix. only row-echelon if (2) there Any matrix exists column if a itthatλ satisfies contains p,q∗ thesuch afollowing leading that− 1u has= v λ . − − − (4) the vector space, . A simple examplezeros everywhere would else.” be the [3, real p.147] projective space,× We∼ find that the scaling matrix required∈C tox1 yieldy2  suchx2 y1x reduced2xy12y2 matricesx2 y1  is   × F c.conditions: This definition (1) It in is Goode-Annin’s a row-echelon matrix. textconditions: actually (2) Any (1) defines column It is athis thatrow-echelon as contains “row-echelon matrix. a leading matri- (2) 1 Any has column that contains a leading− 1 has − x2  x1 conditions:1 (1) It is a row-echelon matrix. (2)states, Any2 column “An m thatn matrix contains is called a leading a reduced 1 has row-echelonzeros everywhere matrix ifelse.” it satisfies [3, p.147] the following   x1 y1  . In this case, each element of c. This definition0 ×is in mappedGoode-Annin’s to text its actually respective defines equiva-this as “row-echelon matri- ces”. Even though − RP ces”.zerosR Even everywhere though we else.” cite [3, this p.147] definition as “column-echelon matrices”, we note that the whichy is the inverse ofy the matrix  x1 y2 x.2 y1 x1 y2 x2 y1  zeros everywhere else.” [3, p.147] conditions:we cite−{ this definition (1)} It is aas row-echelon“column-echelon1 matrix. matrices”,zeros (2) everywherewe Any note columnthat the else.” definitions that [3, contains p.147] are analogous a leading where 1 has 2 1 x1 y1x2 y2  lence class (i.e. a line). Topologically,definitions the set are of analogous EC, RP whererepresents matrices a are circle transposed since to one another. The definitionwhich is the inverse of the− matrix  . − −  zerosmatrices everywhere are transposed else.” to [3,one p.147]another. The definition states, “An matrix is called a reduced x1 y2 x2 y1 x1 y2 x2 y1 x y any two points on the same line in 2 belong to the same EC [2, p.2]. m×n − − 2 2 x y states,row-echelonR “An m matrixn matrix if it satisfies is called the following a reduced conditions: row-echelon (1) It is a row-echelon matrix if itmatrix. satisfies (2)Any the column following    1 (4)1 × x2 which is thex1 inverse of the matrix . conditions: (1) It is a row-echelon matrix. (2) Any column that contains a leading 1 has − x2 y2 that contains a leading 1 has zeros everywhere else.” [3, p.147]  x y x y x y x y    Methodology zeros everywhere else.” [3, p.147]  1 2 2 1 1 2 2 1   − −  x y Constructing CPS in Mathematica which is the inverse of the matrix 1 1 . x y DIMENSIONS 75 It is discussed in Agnew-Childress’ paper the standard approach to con-  2 2  structing projective n space over a field. Further they discuss how to construct − a projective space where a ring of k k matrices, M, is used for scalars. They × note that since M is typically not a field, we are working with modules over rings instead of vector spaces [2, p.2]. Their discussion on construcing matrix projec- tive spaces serves as the cornerstone for our Mathematica package. The construction of a CPS requires that we take a module, n+1, over the C p,q Clifford algebra. Next, we impose an equivalence relation (denoted by ) on n+1 0 . The definition of is, ∼ C p,q −{ } ∼

Definition 1 : u v if and only if there exists a λ ∗ such that u = v λ . ∼ ∈C p,q Author: Andrew Halsaver Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Author: Andrew Halsaver Author: Andrew Halsaver Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Mathematica and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew Author: Andrew Halsaver One might wonder why the first argument is passedMathematica in as a list and of Clifford a Projective Spaces Mentor: Dr. Alfonso Agnew list. We have set up the coding so that we can handle examples of CPS Author: Andrew Halsaver Now, matrix (4), as its Clifford representative, is Λ∗: havingOne higher might dimensions wonder why which the first involves argument more thanis passed one Cliffordin as a list element.Mathematica of a and Clifford Projective Spaces Mentor: Dr. Alfonso Agnew list. We have set up the coding so that we can handle examples of CPS We express this in the next example. Now, matrix (4), as its Clifford1 y representative,2 + x1 y2 is Λx∗1: y1 + x2 y1 x2 having higher dimensions which involves more than one Clifford element. Λ∗ = + − e1 e2 − e12 (5) 2 x1y2 x2y1 x1y2 x2y1 − x1y2 x2y1 − x1y2 x2y1 We express this in the next example. 1 y2 + x1 Now, matrixy2 x (4),1 − as its Cliffordy1 + x2 representative,− y1 x2 is− Λ∗: −  Projective line example of 2,0 : Λ∗ = + − e1 e2 − e12 (5) • C 2 x1y2 x2y1Thex inverse1y2 x2 ofy1 equation− x1y2 (5)x is2y1 − x1y2 x2y1 u2 = 2, 1, 0, 1/2 ; − 1 y−+ x y −x y +−x  y x Projective{ line example} of : 2 1 2 1 1 2 1 2 eqc2 = EquivalenceClass[ u1,2,0 u2 , 2, 0, 1] The inverse of equationΛ∗ = (5) is + − e1 e2 − e12 (5) • {C } 2 x1y2 x2y1 x1y2 x2y1 2(−x1xy12y2 x2xy21y)1 − x1y2 x2y1 u2The = equivalence2, 1, 0, 1/2 class; here is, − − −− −  { } (x + y ) (x y )e (x + y )e +(x y )e eqc2 = EquivalenceClass[ u1, u2 , 2, 0, 1] 2(x y 1 x 2y ) 1 2 1 2 1 2 2 1 12 { } The inverse of equation1 2 (5)2 is1− − − − The equivalence class here is, 1 e1 − + (x1 + y2) (x1 y2)e1 (x2 + y1)e2 +(x2 y1)e12 1+e e e It follows that (5) is invertible because the denominator of its inverse is nonzero. 1 2 12 2 2 − − − 2(x1y2 −x2y1) − −   The only way the denominator would be zero is when x1 = x2 = y1 = y2 = 0 and 1 1 e1 It follows that (5) is invertible because the denominator− of its inverse is nonzero. 2+e1 + e12  −→ + this cannot(x1 happen+ y2) per(x1 definitiony2)e1 ( 1.x2 + y1)e2 +(x2 y1)e12 1+e1 e2 e12  15e2 41e12 − − − − 2  1 e +2 2 + The only way the denominator would be zero is when x1 = x2 = y1 = y2 = 0 and − 1−  − 1 4 4  2+e + e  −→  this cannot. happenIt follows per definition thatBy this (5) 1. is example, invertible we because may have the denominator found the canonical of its inverse matrix is nonzero. representative 1 12   15e2 41e12  2  1 e + +  The onlyfor waythe module the denominator2 . Moreover, would be we zero believe is when thisx applies= x = fory =ny+1=. By0 and this, we Now, consider u 2 with a matrix representative− 1 4 with linearly4 By this indepen- example, we may have found2,0 the canonical matrix representative1 2 1 2,02 2,0  . this cannotpropose happen the following perC definition conjecture. 1. C ∈C   2 n+1 dent columns. In particular consider these six matrices: for the module 2,0. Moreover, we believe this applies for 2,0 . By this, we 2 C n+1 C Now, consider u 2,0 with a matrix representative withpropose linearly indepen-the followingBy conjecture.Conjecture this example, 1: Let weu may have2,0 be found defined the ascanonical follows: matrix representative ∈C ∈C n+1 dentx1 columns.y1 In particularx1 y1 considerx1 y these1 six matrices:00 00 00for the module 2 . Moreover, we believe this applies for  . By this, we Conjecture 1: Let u n+1 beC defined2,0 as follows: C 2,0 x2 y2 00 00 x1 y1 x1 y1 00propose the2,0 following1 conjecture.   ,   ,   ,   ,   ,   ∈C uj = [(xk + yk+1)+(xk yk+1)e1 +(xk+1 + yk)e2 +(xk+1 yk)e12] , xx13 yy13 xx12 yy12 x001 y1 00x2 y2 0000 00x1 y1 2 − − 1 n+1 x y 00 00 x y x y 00Conjecture 1: Let u 2,0 be defined as follows:  x24 y24 ,  x3 y3 ,  x2 y2 ,  x13 y13 ,  x12 y12 uj, = x[(2xky+2 yk+1)+(xk yk+1)e∈C1 +(xk+1 + yk)e2 +(xk+1 yk)e12] ,  x y   x y   00  x y   00 2x y  − k +1 −  3 3   2 2     2 2     1 1  where j = and k =1, 3, 5, ..., 2n + 1. Let i

Department of Mathematics, California State University, Fullerton

Leonila Lagunes Advisor: Charles H. Lee

Abstract Cancer treatments have been shown to be more effective if the cancer scientists to develop alternative methods for detection, such as is detected and treated at an early stage. Current cancer detection pattern recognition. Pattern recognition techniques such as Support methods include imaging as well as tissue and blood-sample testing. Vector Machine, Discriminatory Analysis, etc. have been used in These methods are expensive and invasive for patients, thus scientists cancer detection in the past. In this paper, we consider and develop have been driven to develop new alternatives to detect cancer. new Biomimetic Pattern Recognition techniques. Biomimetic Pattern Recognition (BPR) is a technique that constructs a hyper-dimensional (HD) geometric body by mimicking a biological DNA Microarray Data system and uses it for classification. BPR is derived from the Principle Diagnosis and treatment of cancer can be improved by characterizing of Homology-Continuity, which assumes elements of the same class gene expression levels in healthy and cancerous tissue. Gene are biologically evolved and continuously connected. In other words, expression levels can be studied through microarray technology. between any two elements of the same class, there is a gradual Microarray technology allows researchers to measure and monitor connection. These connecting branches form HD line segments or the expression levels of thousands of genes simultaneously for a hyper-surfaces. The resulting topological structure, known as a given organism [1]. DNA microarray data can be used to determine biomimetic structure, mimics a biological class. In recent years, BPR which genes are expressed at different levels between cancer-free has been successfully used in voice, facial, and iris recognition cells and cancer-containing cells [2]. Biologists gather DNA from software. In this project we developed new BPR algorithms and both cancerous and healthy cells for comparison and is tagged with classification schemes to detect specific cancers using DNA microarray red fluorescence for cancer and green for normal. DNA fragments data. We investigated the performance of the proposed BPR methods then bind to their complements in a microarray chip as a part of a on data for bladder, colon, leukemia, and liver cancers. Results indicate process called hybridization (Figure 1). A red spot indicates that the that the proposed BPR has an increase in recognition rate when gene is highly expressed in a cancer cell and minimally in a healthy compared to previous techniques. BPR has shown to be a promising cell [2]. Green signifies that the gene is minimally expressed in a approach for cancer detection using DNA microarray data. cancer cell and highly in a healthy cell. Yellow fluorescence shows that a gene is almost equally expressed in both cells. A black spot Introduction indicates that the gene is inactive in both cell [3]. A laser then scans Cancer treatments have been shown to be more effective if detected the microarray and determines the expression levels of each gene and treated at an early stage. Current cancer detection methods according to the intensity of the color and is given a numeric value. include imagining and blood-sample testing. Cancer imaging Each sample is defined as a sequence of numerical values of gene encompasses various techniques including traditional X-Rays, expression levels. In recent years, DNA microarray technology has X-Ray-based computed tomography, Magnetic Resonance Imaging, provided a promising tool to determine the diagnosis and prognosis Positron Emission Tomography, ultrasound scans, and endoscopy [1]. of different cancer types [4-7]. Current detection methods can be expensive and invasive, driving

DIMENSIONS 77 bothboth cells. cells. A A black black spot spot indicates indicates that that the the gene gene is is class.class. These These connecting connecting branches branches can can be be HD HD line line both cells. A blackboth spot cells. indicates A black that spot the gene indicates is class. that the These gene connecting is class. branchesThese connecting can be HD branches line can be HD line both cells. A black spot indicates that the gene is inactiveinactiveclass. These in in both both connecting cell cell [3]. [3]. branches A A laser laser thencan then be scans scans HD the linethe segmentssegments or or hyper-surfaces hyper-surfaces and and the the resulting resulting topo- topo- both cells. A black spot indicatesinactive thatboth the cells. in gene both A is black cellinactiveclass. [3]. spot These A in indicates laser both connecting then cell that [3]. scans the branches A gene the laser issegments then canclass. be scans These HD orhyper-surfaces the line connectingsegments branches and or hyper-surfaces the resulting can be HD topo- and line the resulting topo- bothboth cells. cells. A Ainactive black black spot spot in indicates bothindicates cell that that [3]. the the A genelaser gene is thenis class.class. scans These These the microarraymicroarray connectingsegments connecting or and and branches branches hyper-surfaces determines determines can can be be the the and HD HD expression expression the line line resulting levels levels topo- of of logicallogical structure structure forms forms a a “biological” “biological” organism, organism, inactive in both cell [3]. A lasermicroarrayinactive then scans inand both the determinesmicroarray cellsegments [3]. the A andor expression laser hyper-surfaces determines then levels scans the and of expression the thelogicalsegments resulting levels structure ortopo- of hyper-surfaces formslogical a structure “biological” and the forms resulting organism, a “biological” topo- organism, inactiveinactiveboth in in both bothcells.microarray cell cell A black [3]. [3]. and A A spot determines laser laser indicates then thenthe scans scans that expression the the the genesegmentssegments levels is class. of or oreach hyper-surfaceseach hyper-surfaceslogical These gene gene connecting structure according according and and forms thebranches the to to the the resulting resulting aintensity intensity “biological” can topo- topo- be of of HD the the color linecolor organism, and and whichwhich can can be be used used for for classification. classification. One One special special microarray and determinesboth the cells. expressioneach Amicroarray geneblack levels according spot and of indicates determineseachbothlogical to the gene cells. that intensity structure according the the A expression black gene of the forms to is spot thecolorclass. levels intensity indicatesa and “biological” These ofwhich of thatlogical the connecting the colorcan organism, gene structure be and usedis brancheswhichclass. forforms classification. can These a be“biological” connectingHDused line Onefor classification. organism, special branches can One be HDspecial line both cells. A black spotmicroarraymicroarray indicatesinactive and and thateach determines indetermines the gene both gene according cell is the the [3].class. expression expression to A the Theselaser intensity levels levels then connecting scansof of thelogicallogical thecolor branchessegments and structure structure canisiswhich given given be or forms formsHD hyper-surfacescan a a numeric numeric line bea a “biological” “biological” used value. value. for and Eachclassification. Each the organism, organism, sampleresulting sample is is One topo- defined defined special as as characteristiccharacteristic of of BPR BPR is is that that it it requires requires only only a a small small each geneboth according cells. A to black theinactive intensity spot indicates inis ofgiveneach both the gene acolor cell that numeric according [3].the and gene Ais value.inactivewhich givenlaser tois Eachthe then aclass. can numericintensity in sample be both scans These used value. cell of is the theconnectingfor defined [3]. Eachcolorsegments classification. A as sampleand laser branchescharacteristic orwhich then ishyper-surfaces One defined scans can can special be of asbe the BPRHD usedcharacteristic andsegments line is for the that classification. resulting it or requires of hyper-surfaces BPR topo- only is One that a small special it and requires the resulting only a small topo- inactive in both cell [3].eacheach A gene genemicroarray laser according according thenis given scansand to to a thedetermines the numeric the intensity intensitysegments value. the of of the expressionthe Each or color color hyper-surfaces sample and and levels iswhichwhich defined of and canlogical can as the be beaa resultingcharacteristic sequence usedsequence usedstructure for for topo- ofclassification. ofclassification. forms numerical ofnumerical BPR a “biological”is thatvalues values One One it requiresspecialspecial of of gene gene organism, only expression expression a small numbernumber of of samples samples as as opposed opposed to to traditional traditional pattern pattern bothis given cells.inactive a numeric A black in value. both spotmicroarray cell Each [3]. indicates samplea A sequenceis and laser given is determines defined thenthat a of numeric numerical scans as the theamicroarraycharacteristic sequence value.gene the expression valuessegments Each is of and ofnumericallevels sample ofclass. gene determines BPR or of hyper-surfaces expression is isThese defined values thatlogical the it requires of asexpression connecting structurenumber gene andcharacteristic onlyexpression the of levelsforms resulting asamples small branches ofofa number “biological”BPR astopo-logical opposed is that of can structure samples itto organism, requires traditionalbe HDas forms opposed only linepattern a a “biological”small to traditional organism, pattern microarray and determinesisis given given theeach aexpression a numeric numeric genea sequence according value. value. levels ofEach Each of to numerical thelogical sample sample intensity isvalues structureis defined defined of the of geneas formscolor as characteristiccharacteristic expression and a “biological”whichlevels.levels. ofnumber of can BPR BPR organism, be In In isof isrecentused recent that thatsamples for itit years, years, requires requires classification. as DNA opposedDNA only only microarray microarray a a to small smallOne traditional special technology technology pattern recognitionrecognition algorithms. algorithms. In In recent recent years, years, BPR BPR has has inactivea sequencemicroarray in of both numerical and cell determineseach values [3]. gene of Abothlevels. theaccording genea laser sequence cells.expression In expression recentA then to black theof levelsyears, numerical intensity scanslevels. spoteachnumberofDNA indicates gene In of thelogical values recentof microarray the according samples colorsegments thatof years, structure gene and the as technology to DNA opposed expressionthe genewhich forms intensity microarray or is to canhyper-surfaces aclass.recognition traditional “biological” ofnumber be the technology usedThese color of pattern for algorithms. connecting samples and organism, classification. andrecognitionwhich as theopposed Inbranches can recent resulting One algorithms. be to can years,specialused traditional be for BPRtopo- HD In classification. recentpattern line has years, One BPR special has each gene according toa thea sequence sequence intensityis given of of oflevels. numerical numericalthe a numeric color In recent and values values value. years,which ofof Each gene gene DNA can sample expression expression microarraybe usedis defined fornumber technologynumber classification. as characteristic of of samples sampleshashasrecognition One provided provided as as of special opposed opposed BPR algorithms. a a is promising promising to tothat traditional traditional it requires In tool toolrecent pattern pattern toonly to years, determine determine a small BPR the the has beenbeen used used successfully successfully in in voice voice recognition recognition [9], [9], iris iris levels. Ineach recent gene years, according DNAis given to microarray the ainactivehas intensity numericlevels. provided technology in Inof value. both therecent a color Each promisingcellhas years,isrecognition and [3].sample given provided DNA Awhichtool a laseris numeric microarray defined algorithms. ato can then promising determine bevalue. as scansused technologycharacteristic In Each tool theforthe recent sample classification. tosegmentsbeen years, determinerecognition of isusedBPR defined BPR or successfully hyper-surfaces isOne the has algorithms.that as specialbeen itcharacteristic requires in used voice In and successfully only recentrecognition the of a resulting BPRsmall years, isin [9],that BPRvoice topo- iris it has requiresrecognition only [9], a small iris is givenbothmicroarray a numeric cells. A value. blacklevels.levels. and Each spotdetermines In Ina sample sequence indicatesrecent recenthas is years, years, provideddefined ofthat numericalthe DNA DNA the as expression gene a microarray microarray promisingcharacteristic values is class. of technology technology levelstool gene These of toBPR expression determineof connecting isrecognitionrecognition thatlogical itnumber requires the branches algorithms.diagnosis algorithms.diagnosisbeen structure of only samples canused a andsmall andbe successfully In In asHD formsprognosis recent prognosisrecent opposed line years, years, ina toof of voice “biological” traditional different differentBPR BPR recognition has has cancer cancerpattern [9], organism,types types iris recognitionrecognition [10], [10], and and facial facial recognition recognition [11]. [11]. BPR BPR has providedis given a apromising numerica value. sequence tool Eachmicroarraydiagnosis tohas of determine sample numerical provided and and is thedefined prognosis determines values adiagnosis promisingabeen sequence as of ofused gene thecharacteristic different and expressiontoolsuccessfully expression of prognosis numerical to cancer determine of levels in BPR ofnumber values types voice different of is the that recognition of oflogicalrecognition genesamples itbeen cancer requires expression structureused as[9], types [10], only successfullyopposed iris formsaandrecognition smallnumber to facial traditional a in “biological” voiceof recognition [10],samples recognition pattern and as [11].organism, facial opposed[9], BPR recognition iris to traditional [11]. pattern BPR a sequenceinactive of numerical in bothhashas cell values provided provided [3].levels. of A genediagnosis laserIn a a recentpromising promising expression then years, and scans tool prognosistool DNAnumber the to to microarray determine determinesegments of of samples different technology or the the as hyper-surfaces cancer opposedbeenbeen used usedtypesrecognition to successfullytraditional successfully and[4-7].[4-7].recognition the algorithms. resulting pattern in in voice[10], voice topo- andrecognition In recognition recent facial recognitionyears, [9], [9], iris iris BPR [11]. has BPR methodsmethods include include different different constructions constructions as as well well as as eachdiagnosis genea sequence accordingand prognosis of numerical tolevels. of the different In intensity valueseach[4-7]. recentdiagnosis gene cancer of years, gene according of types and DNA theexpression prognosis[4-7]. color microarray tolevels.recognition the intensityand Innumber of recent technologydifferent [10],which of of years, the andsamples cancer color facial DNA canrecognition asand types microarrayrecognition opposed bewhichmethods usedrecognition algorithms. to cantechnologytraditional [11]. for include be BPR classification. used[10], In different patternmethods recent forrecognitionand classification. facial years, constructions include recognition algorithms. BPROne differentOnehas special as [11]. specialwell In constructions recent BPRas years, as BPR wellas has levels.microarray In recent years, anddiagnosis determinesDNAdiagnosis microarrayhas and and the provided[4-7]. prognosis expressionprognosis technology a promising of of levels different differentrecognition of toollogical cancer cancer to algorithms. determine typesstructure types recognitionrecognitionthe In forms recentbeen a “biological”[10], years, [10],usedmethods andsuccessfully and BPR facial facial include has organism, recognition recognition in different voice recognitionconstructions [11]. [11]. BPR BPR [9], as iris well as differentdifferent classification classification techniques. techniques. is given[4-7]. levels. a numeric In recent value. years,has provided DNA Eachis given microarray[4-7]. sample a apromising numeric technology is defined value. toolhasmethods toprovided Eachrecognition determine as include samplecharacteristic a promising different is the algorithms. definedbeen constructions astool used In ofcharacteristicdifferent to successfullyrecentmethods BPR determine as years, classification is well includethat of in the asBPR BPR voicedifferent it differentbeen is hasrequires recognitiontechniques. that used itclassification constructions requires successfully only [9], only iris a techniques.smallasin a small voicewell as recognition [9], iris has providedeach gene a accordingpromising[4-7].[4-7]. to tool thediagnosis intensityto determine and of theprognosis the colorbeen and of differentusedwhich successfully can cancer be used typesmethodsmethods in voice forrecognition classification. recognitioninclude includedifferent different different [10], [9], classificationOne andiris constructions constructions special facial recognition techniques. as as well well [11]. as as BPR InIn this this paper, paper, the the focuses focuses are are to to develop develop two two has provided a promisingdiagnosisa andsequence tool prognosis to determine of numerical of differentdiagnosisdifferent the valuesbeen cancer classification and of used gene prognosis types successfully expression techniques.recognition of different innumber voicedifferent [10], cancer recognitionIn of and this samples classification types facial paper, [9], as recognitionrecognition the opposediris focusesIn techniques. this to [11]. [10],paper,are traditional to BPR and develop the facial focuses pattern two recognition are to develop [11]. two BPR diagnosisisa given andsequenceprognosis a numeric of ofvalue. numerical different[4-7]. Each sample cancer values is types defined ofrecognition gene as characteristic expression [10],bothboth and cells. cells. of facialdifferentdifferent BPR A Anumber black isblackrecognitionmethods that classification classification spot spot it of requires indicatesinclude indicatessamples [11].In techniques.thisonly techniques. BPR different that that paper, a as small the the opposed theconstructions gene gene focuses is is toclass.class. are traditionalas to Thesewell These develop as connecting connecting two patternnewnew branches branches techniques techniques can can for for be be developing developing HD HD line line BPR BPR algorithms algorithms and and diagnosis and prognosis[4-7]. oflevels. different In recent cancer years, types[4-7]. DNAInrecognition microarray this paper, [10],technology the focusesmethods and facial arerecognitionnew include recognition to techniques develop differentIn this algorithms. two [11]. forpaper, constructionsnew developing BPRmethods the techniques In focuses recent include BPR as years,forwell are algorithms different developing to as developBPR constructions hasand BPRtwo algorithms as well and as [4-7]. a sequence of numerical values of gene expressionmethodsnumber includeinactiveinactive of differentsamples in in both as bothconstructions opposedInIndifferent cell this cell this [3]. paper,new paper, [3]. to classification traditional Aas techniques A the the laserwell laser focuses focuses as then then pattern for techniques. arescans are developingscans to to developthe develop the segments BPRsegments two two algorithms or or hyper-surfaces hyper-surfaces and applyapply themand them and the the to to DNAresulting DNA resulting microarray microarray topo- topo- data data for for cancer cancer de- de- levels. In[4-7]. recent years, DNAhas microarraybothboth provided cells. cells. A a A blacktechnologypromising blacknew spot spot techniquesmethods indicatestool indicates torecognition fordetermine include that that developing the thedifferent gene gene the algorithms.BPR is isbeenapply classification constructionsclass.new algorithmsclass. used them techniques These These successfully to Intechniques. and asDNA connecting connecting well recent forapplydifferent microarray developing asin them voice branchesyears, branches classification to recognition data DNABPR BPRfor can can algorithms microarray cancer be be techniques.[9], has HD HD iris de- lineand line data for cancer de- levels. In recent years, DNA microarray technologydifferentrecognition classificationmicroarraymicroarray algorithms.new techniques.new and and techniques techniques determines determines In recentInapply for for this the years,the developing developingthem paper, expression expression to BPR the DNA BPRBPR focuses has levels levelsmicroarray algorithms algorithms areof of tologicallogical data develop and and for structure structure cancer two de- forms formstection.tection. a a “biological” “biological” We We aim aim to to build build organism, organism, HD HD topological topological formations formations has provided a promisingdiagnosis toolinactiveinactive to and in indetermine bothprognosis both cellapply cell [3]. of [3]. them thedifferent different A A to laser laserbeen DNA classification cancer then then microarray used scans scans types successfully techniques.theIn the datarecognitiontection. thissegmentsapplysegments for paper, cancer We them the [10], aimin or or tode- focuseshyper-surfaces voicehyper-surfaces toDNA andtection. build facialmicroarray arerecognition HDIn We to recognition topological this develop andaim and paper, data theto the build two resulting[9], forresulting[11]. formationsthe cancerHD focuses iris BPR topological topo- topo- de- are to developformations two has provided a promising tool to determine the Inbeen this used paper,eacheach successfully gene gene theapplyapply according focuses according them them innew are voice to totechniques to totection. the DNAthe DNA recognition develop intensity intensity microarray microarray We for two aim of developing of [9], the tothe irisdata builddata color color for for HDBPRand and cancer cancer topological algorithmswhichwhich de- de- can can formations and be be used used for for(skeleton-like(skeleton-like classification. classification. structures) structures) One One special special and and pattern pattern recognition recognition [4-7].microarraymicroarray and and determines determinestection. We the the aim expressionIn expression tothis build paper,new HDlevels levels the topological techniquesof focuses ofmethods(skeleton-likelogicaltection.logical arefor formations include to developingWestructure structure develop aim structures) different to(skeleton-likenew formstwo buildforms BPR techniques constructions and HD algorithms a a “biological” “biological”topological pattern structures) for anddeveloping recognition as formations well organism, organism, and as pattern BPR algorithms recognition and diagnosisdiagnosis and prognosisand prognosis of different of cancer different typesnew techniques cancerrecognitionisis types forgiven given [10], developing atection. atection. numeric andnumericrecognition facialapply We BPRWe value. value. aim aim recognition algorithmsthem(skeleton-like to Each to Each build buildto [10], sampleDNA sample HD and[11].HD and microarray structures)topological topological is isBPR defined definedfacial data as andformations asformations recognition forcharacteristic patterncharacteristic cancer recognition de- [11]. of of BPR BPR BPR isschemes.schemes. is that that it it requires requires We We propose propose only only a a a small new newsmall approach approach to to the the PHC, PHC, eacheach gene gene according according(skeleton-like to to the thenew intensity intensity techniques structures) of of the the for colorapply color developingand and and them patterndifferentschemes.which(skeleton-like towhich BPR DNA recognition classification can We algorithmscan microarray proposebe be structures)used usedschemes.apply and techniques.afordata for new them classification. classification. for We andapproach cancer to propose pattern DNAde- to microarray aOne the Onerecognition new PHC, special special approach data for to the cancer PHC, de- [4-7]. apply themmethodsbothboth to cells. cells.aa DNA sequence include sequence A A microarray black black(skeleton-like(skeleton-like different of of spot spotnumerical numericaltection. data indicates indicates constructionsschemes. for structures) structures)We values values cancer aim that that of Weto of the theas de-gene build andgene and propose well gene gene expressionpattern HD pattern expression as is is topologicala newclass.class. recognition recognition approachnumber These Thesenumber formations connecting connecting ofto of samplesthe samples PHC, branches branchesas aswhere opposedwhere opposed elementscan elementscan to to be betraditional traditional HD HD of of theline theline pattern same patternsame class class are are topologically topologically [4-7]. isis given given a a numeric numericschemes. value. value.apply Each Each We sample them proposesamplemethods to is isDNA a defined definedtection. new include microarray approach as as Wewherecharacteristicschemes. aimcharacteristic different toIndata elements to the this build for WePHC, paper, cancer HD ofpropose constructionsofwhere BPR the BPRtopological thetection. de- same focuses is ais elements that new that We class it formationsitapproach are aim requires requires are ofas to to thetopologically develop build wellonly tosame only the HD asa two a class smallPHC, smalltopological are topologically formations tection.differentinactiveinactive We aimlevels.levels.classificationin to in build both both In Inschemes.schemes. recent recent HD cell cell topological [3].techniques. years,(skeleton-like [3]. years, We We A A DNAwherepropose propose DNA laser laser formations microarray microarrayelementsthen then structures)a a new new scans scans approach approach of technology technology the the and samesegmentssegments patternto to the theclassrecognitionrecognition PHC, PHC, recognition areor or hyper-surfaces hyper-surfacestopologically algorithms. algorithms.assembled and andassembled In In the the recent recent resulting resulting as as years, years, nodes nodes topo- topo- BPR BPR in in a a has HD hasHD space space and and are are con- con- aa sequence sequence of of numerical numericalwhere elementstection. values values ofdifferent Weof of gene gene aimthe expression same toexpression(skeleton-like build classification class HDnewassembled arenumber topologicalwherenumber techniques topologically structures) elements of ofastechniques. samplessamples formations nodes for andassembled ofdeveloping(skeleton-like inthe as aspattern aopposed opposedsame HD as BPR recognition classspace nodesto to structures) algorithmstraditional traditionalare and in topologically are a HD patterncon- and pattern space pattern and recognition are con- (skeleton-likemicroarraymicroarrayInhashas structures) this providedand providedand paper,wherewhere determines determines and the a a elements elementsschemes. promising focuses promisingpattern the theassembled expressionexpressionof areof Werecognition thetool the toolto propose samedevelop same to as to levels levelsdetermine nodes determineclass class a two new of of are are in approach topologicallyalogical topologicallylogical the the HDbeen spacebeen structureto structure the used used and PHC, successfully successfullyare forms forms con- a a “biological”nected “biological”nected in in voice voice by by recognition means recognitionmeans organism, organism, of of nearest nearest [9], [9], iris iris neighbor. neighbor. levels.levels. In In recent recent years, years,assembled DNA DNA(skeleton-like microarray asmicroarray nodes in structures) technology(skeleton-like technologyschemes. a HD spaceapplynected and Werecognitionassembledrecognition structures) and propose pattern them by are means to con-a recognition as DNA algorithms.new algorithms. nodesand ofnected microarrayapproachschemes. nearest pattern in by aIn In neighbor. HDmeans Weto recentrecognition recent data the space propose offorPHC, years, years, nearest cancerand a schemes. BPR new areBPR neighbor. de- con- approach has has We to the PHC, schemes.neweacheach We techniques gene genediagnosis proposediagnosis according according aassembled forassembled and new and developing to to prognosis approach prognosiswhere the the intensityas intensityasInnected elements nodes nodesBPR this toof of the different ofalgorithms differentof by inpaper, in thePHC, the of meansa a colortheHD colorHD cancer cancersame the spaceof andspace and and nearest focuses class types types andwhich andwhich are neighbor. are arerecognition cantopologicallyrecognition can are con- con- be be to used used develop [10], [10], for for and classification. classification.and two facial facial recognition recognition One One special special [11]. [11]. BPR BPR hashas provided provided a a promising promisingnectedschemes. by toolmeans tool to to We of determine determine nearest proposeproposewhere neighbor. a the theelements newatection. newbeennected approachbeen approachof Weused used by the aim tosuccessfullymeans successfullysame the to to build PHC,the class ofwhere nearest PHC, HD are in in elements topologicalvoice voicetopologically where neighbor. recognition recognition of elements the formations same [9], [9], of class iris iristhe are same topologically where elementsapplyisis given given them[4-7].[4-7]. a aof numeric numeric to the DNAnected samenectednew value. value. microarray class byassembled by techniques Each Eachmeans means aretopologically sample sample data of of as nearest nearestfor nodes is isfor cancer defined defined neighbor. inneighbor. developing ade- as asHDcharacteristiccharacteristic spacemethods BPR andmethods arealgorithms of of includecon- BPRinclude BPR is isdifferent that thatdifferent itand it requires requires constructions constructions only only a a small small as as well well as as diagnosisdiagnosis and and prognosis prognosiswhere of of different different elements cancer cancer ofassembled the types types same(skeleton-like asrecognitionclassrecognition nodes are topologically in structures)[10], [10],a2. HD M andassembledETHODOLOGY and space facial facial and and recognitionpattern as recognition are nodes con- recognition in [11]. [11]. a HD BPR BPR space and2.2. M areMETHODOLOGYETHODOLOGY con- assembledtection.aa sequence sequence as nodes We aim of of in numerical numerical to a build HDnected HDspace values valuesclass topological by and ofare of means geneare gene topologically con- formations ofexpression expression nearest neighbor.assemblednumbernumberdifferentdifferent of of as samples samples nodes classification classification2. as as in M opposed opposed aETHODOLOGY HD techniques.space techniques. to to traditional traditional and are pattern pattern [4-7].[4-7]. assembledapply2. MasETHODOLOGY nodes themnected in to abyschemes. HDDNA meansmethodsmethods space2. microarrayof We M nearestandincludeETHODOLOGY include propose are neighbor. differentnected con-different a newdata by approach constructions constructions for means cancer of to nearest the as as2.12.1 de- PHC, well well Data Dataneighbor. as as Sets Sets nected(skeleton-likelevels. bylevels. means In In recent recent of nearest structures) years, years, neighbor. DNA DNA andcon- microarray microarray2. pattern 2.nected M METHODOLOGYETHODOLOGY2.1 recognition technologyby technology Data means Sets ofrecognition recognitionnearest2.2.1 M DataETHODOLOGY neighbor. algorithms.SetsIn algorithms.In this this paper, paper, In In the the recent recent focuses focuses years, years, are are BPR BPRto to develop develop has has two two nected by means2.1 of nearest Datawhere Sets neighbor.differentdifferent elements classification classification of the same techniques. techniques. class are topologically schemes.hashas2.1 provided provided Data We propose Sets a a promising promisingtection. a new tool approachtool We to to aim determine determine to2.1 to the build Data PHC, the the Sets HDbeenbeen topological used usednewnew successfully successfullytechniques techniques formations for forin in developingvoice voice developing recognition recognitionTheThe BPR BPR BPR BPR algorithms algorithms[9], [9], technique technique iris iris and and introduced introduced in in this this paper paper 2.12.1 Data Data Sets Sets assembled2. METHODOLOGYThe2.In MIn asBPR thisETHODOLOGY this nodes paper, techniquepaper, in the the aThe HD focuses introduced focuses BPR space aretechnique are2. and intoM toETHODOLOGY thisdevelop developare paperintroduced con- two two in this paper wherediagnosis elements and prognosis of the same of class different are topologically cancer types recognition [10], and facial recognition [11]. BPR diagnosis2.The and METHODOLOGY prognosisBPR(skeleton-like technique of different introducedTheis structures)cancernewBPR anew generalin techniques thistechniquestechnique types paper one,recognition andforintroducedand for developingapply developing canapplypattern therefore [10],them them in this BPR BPRandrecognitionto to DNApaper be DNA facialalgorithms algorithms applied microarray microarrayisis recognition a a general to general and and data data [11].one, one, for for and BPRand cancer cancer can can de- de- therefore therefore be be applied applied to to assembled as nodes inTheThe2.1 a HDBPRMethodology DataBPR2. M space techniqueETHODOLOGY techniqueSets nected and are introduced introduced by con-The means BPR in in ofis thistechnique this nearest a paper generalpaper neighbor. introduced one, and in can this therefore paper be applied to [4-7].[4-7].is a general one, and can2.1is atherefore Data generalany Sets be one,data applied setand in can tomethodsmethods a specifiedtherefore2.1tection.tection. Data include include format. be Sets We We applied different different aim aim An to to to applicablebuild build constructions constructionsanyany HD HD data data topological topological set set as as in in well well a a formationsspecified specified formations as as format. format. An An applicable applicable 2.1 Data Sets isis a aschemes. general generalData one, one, We Sets and and propose can canapplyisapply atherefore therefore general them them a new to tobe beone, DNA DNA appliedany applied approach and microarray data microarray can to to set therefore in to data dataa the specified for be for PHC,cancer applied cancer format. de- de- to An applicable nected by means2.1 Data of nearest Sets neighbor.Theany dataBPR set technique in a specified introduceddifferentdifferent format. in classification classification this An paper applicable techniques. techniques.datadata set set should should be be a ann byby mm matrix,matrix, where wherennss any data set in a specified format.Thedata BPRtection.tection. An set techniqueapplicable should We We aim aim be introduced toa ton build buildS(skeleton-like(skeleton-likebyThe HD HD inm topologicalBPRG topologicalthis matrix, papertechnique structures) structures)n where formations formations introducedmn and ands pattern pattern in this recognition recognition paperSSn GG The BPR techniqueanyany introduced data data set setThe in in inBPR a a this specified specified technique paperany format. format. data introduced set An An2. in M applicabledata applicableaETHODOLOGY specified −in set this should− paper format. be ais AnaS general applicableby Gone,matrix, and wherecan −−s −− data set shouldwhere beis a n a generalelementsisdataby aFigureFigure generalm set one, shouldismatrix,1.theof 1. and one,DNADNAthe number be can where and a microarrayssame microarraysn thereforeS cann ofby samplestherefore classismschemes.In beIn aGschemes. process process generalthismatrix, this applied are and be paper, paper,topologicallyappliedm Wefrom from We whereone, to propose the theis propose and then tofocuses focusess− can numberisis a a− newthe the new therefore are are number number approach to approachto develop develop be of of applied to to samples samples two two the the PHC, PHC, to and and mmGG isis the the number number Figure 1. DNA microarraysFigure 1.datadata processDNAThe set set BPR shouldmicroarrays should fromS technique be be a a nGn process introduced(skeleton-likedataby(skeleton-likeby fromm setm shouldmatrix, inmatrix,− thiss structures) structures)beis− paperwhere where a thenS numbernnby and andm patternofG patternmatrix, samples recognition recognition where and mnsG is the number is a general one, and2. can M thereforeETHODOLOGYanytherefore data−is be the− set applied number in 2.1beSS a applied Data specified to of SetsGG samples to format.anynewnew data and techniques techniques Anm set− applicable sinsis −a the forspecified for number developing developing format.ofof BPRobjects, BPR objects, algorithmsAn algorithms applicable such such as as and and genes, genes, images, images, or or frame frame data. data. Figure 1. DNA microarraysFigure processreference 1. DNA from [3]. microarrays (A) DNAis the process from numberis a a generalfromcell assembled of is samples extracted. one,anyreferencereference and and data as canmof nodes setG− therefore− [3]. [3].schemes.is objects,schemes.is in the−− (A) the(A) a numberin specifiedDNA number DNA be such We We a applied frompropose HDfrom propose as ofof format.any genes, samples aobjects, awhere spacetowhere cell cell adataG a new new images,An is is elements andelementsset suchextracted. extracted. andapproach applicable approach inm as or a are of specifiedgenes,is offrame theto the tothe con- the thesame numbersame data.images, PHC, PHC,format. class class or are An are frame topologically applicabletopologically data. FigureFigure 1. 1. DNADNA microarrays microarraysFigureany process process data 1. DNA set from fromreference in microarrays a specified [3].isis the(A) the process format. number DNAnumberdata set from An of shouldofapplicable samples asamples cell be is a and extracted.andn mmbyGG ismis the thematrix, number number where n G reference [3]. (A) DNA fromof a objects, cell is extracted. such as genes,dataEachEachdataof images,setobjects, setDNA DNA should shouldHowever, segment segment orsuch The framebe S as a for BPR has hasn genes, Sdata. our −by− a aapply techniquebyapplyG corresponding correspondingpurposes, images, datam themassembled themGassembled matrix,matrix, set introducedweor to toshould DNA frameDNAapply “spot” “spot” wherewhere as ass be microarraynodes microarraythis nodesdata.in aon onn thisns techniqueS inisHowever,However, in paperby the a a data data HD HDmnumberG for forspace formatrix, spacefor cancer cancer our our of and and purposes, purposes,where de- de- are aren con- con-s we we apply apply this this technique technique reference [3]. (A) DNA from aEach cellreference is DNA extracted. segment [3].2.1 (A)Each Data has DNA DNA a Sets correspondingany from segmentofof data a objects,nected objects, cell set has is in “spot” extracted. such asuch a by corresponding specified onasmeans as genes, genes, format.whereofwhere of images,objects, images, “spot”− nearest elements elements An− on orsuch− orapplicable frame frame of−However,neighbor. of as the the genes, data. data. same same for images, class class our purposes, are are or topologically topologically frame− we− data. apply this technique referencereference [3]. [3].Figure (A) (A) 1.DNA DNADNA from from microarraysdata a a cell cell set is is should extracted. extracted.process be from a nS by misG thematrix,However, numberthethe microarray microarray where of for samplesn ours chip. chip. purposes, and (B) (B)−tection.tection.m When When− weG is apply comparingWe comparingWe the aim aim numberthis to to technique build buildgene gene HD HDtoto topological topological DNA DNA microarray microarray formations formations data. data. Each DNA segment hasEach a correspondingFigure DNAthe segment1. microarrayDNA “spot” microarrayshas on a chip. correspondingHowever, (B) processFigure Whendata for− set1. from“spot” comparing ourDNA should− purposes, on microarrayssamples beis gene a the wenS numberapply andbyisto process aassembledHowever, DNAassembledm this general G of ismatrix, techniquefrom microarray samplesthefor one, asnumber as where our nodes nodes and andto purposes,is data. DNAnectedn can themofnected ins inG objects,a numbertherefore amicroarrayis HD by HDwe bythe means applyspacemeans spacenumber such of be samples data.this of andapplied ofandas nearest nearesttechniquegenes, are are and to con- con- neighbor. neighbor.images,mG is the or number EachEach DNA DNA segment segment has has aEach a corresponding corresponding DNAis the segment number “spot” “spot”theThe has ofmicroarray onBPR on samplesa correspondingHowever,However, technique chip. andofm for(B) forobjects, introduced “spot” ourouris When the purposes, purposes, such− on numbercomparing in− as thiswe genes,we paper apply apply gene images, this this technique technique or frame data. Figure 1.theDNA microarray microarrays chip.reference process (B)referencethe When [3]. microarrayfrom comparing (A)expression [3]. DNA (A) chip. DNA from gene levels, (B) a from When cell DNAtoreference a is DNA cell comparing fromextracted.is is microarray the each extracted.[3]. number (A) cellgene DNAis data. ofG labeledofexpressiontoexpression samples from objects,DNA aany cell microarray and levels, suchlevels,data ism extracted. as setDNA DNAis genes, data. inthe from(skeleton-like froma(skeleton-like number specified images,of each each objects, cell cell or format. frame structures)is structures)is such labeled labeled An data. as applicablegenes, and and pattern images,pattern recognition recognition or frame data. Figurethethe microarray microarray 1. DNA chip.microarrays chip. (B) (B)the When When process microarray comparing comparingis from a chip.expression general gene gene (B) one, When levels,toto and DNA DNA comparing DNA can microarray microarrayframe therefore from gene eachdata. data. data. be cell nectedHowever,to appliednected DNA isG labeled by by to microarray meansfor means our of of purposes, data.nearest nearest neighbor. neighbor. we apply this technique to reference [3]. (A) DNA fromEach a cell DNAEachexpression is extracted. DNAsegment segment levels, hasof a DNA hascorrespondingobjects, a from correspondingEach such each DNAas “spot” cell genes, segmentis “spot” on labeled images,However, on hasHowever, awithorwith corresponding frame for different different our for data. purposes, our fluorescent fluorescent “spot” purposes, we onschemes. applyschemes. tags tags wenHowever, applyandthis and We We techniquehybridized hybridized thism propose proposefor technique our purposes,on ona a new new approach approach wen apply to tothis the the PHC,technique PHC, expressionreference levels, [3]. DNA (A)Each from DNA DNA each fromwith segmentcellexpression a different cellis labeled ishas extracted. levels, fluorescentanya correspondingwith data DNA different set tagsof from in objects,and a “spot” each fluorescentspecified hybridized cell such on DNA is format. as labeledtags on genes,microarray and Andata2.2 hybridizedimages,2. applicable BPRset M data. shouldETHODOLOGY Algorithm or on frame be a data. withS by Hyper-DimensionalG matrix,2.2. M whereMETHODOLOGYETHODOLOGY2.22.2 Line BPR BPRs Algorithm Algorithm with with Hyper-Dimensional Hyper-Dimensional Line Line Each DNA segment hasexpressionexpression a correspondingthe levels, levels, microarraywith “spot” DNA DNA different chip. from fromon fluorescent (B) eachHowever, each When cell cell istags comparingforis labeled labeled our and purposes, hybridized gene weto on apply DNAto this DNA microarray technique microarraythe data.the microarray microarray data.wherewhere2.2to chip. chip.elements−BPR DNAelements− Algorithm microarray of of the the samesame with data. classHyper-Dimensional class are are topologically topologically Line with differentEach DNA fluorescent segmentthe tags has microarray aand corresponding hybridizedwithFigure chip. different 1. (B) onDNAthe “spot” Whenfluorescent microarray microarraysthe on comparing microarrayHowever, tags chip. processthe and gene chip.for microarray hybridized our from (B) purposes,2.2 When chip.BPR on is comparing Algorithmwe the apply number this gene with of technique Hyper-Dimensional samplesETHODOLOGYETHODOLOGY and mG is Line the numberSegmentsSegments the microarray chip. (B)withwith When different differentexpression comparing fluorescent fluorescent levels, gene DNA tags tagsto and and from DNA hybridized hybridizeddata each microarray set2.2 cell should BPR on onis labeled data. Algorithm be a nS by withm Hyper-DimensionalG matrix,Segments2.2 where BPRns Algorithm Lineassembledassembled2.Segments2. M M2.12.1 with Data Dataas as Hyper-Dimensional nodes nodes Sets Sets in in a a HD HD space space Line and and are are con- con- the microarraythe microarray chip.expression (B) chip. Whenreference levels,the comparing DNAmicroarray [3]. (A) from gene DNAexpression chip.each from cellto DNA a2.2 is2.2 levels, cell labeled2.1 BPR BPR microarray is− DNA extracted.Data Algorithm Algorithm−Segments fromSets data. each with withof objects,cell Hyper-Dimensional Hyper-Dimensional is labeled such as genes, Line Line images, or frame data. Figure 1. DNA microarraysthethe process microarray microarray from chip. chip. isthe theSegments microarray number of chip. samples and mG is theSegments number 2.2.12.2.1 Training Training Process: Process: ByBy providing providing a a new new expression levels, DNA from eachwith celldifferentwith is different labeled fluorescent fluorescent tags and tagswith hybridized and different hybridizedSegmentsSegments on fluorescent on2.2BPR BPR tagsAlgorithm Algorithm andHowever,2.1 hybridized2.1 withwith Data2.2.1 Data for Hyper-DimensionalHyper-Dimensional SetsTraining Setsour onnectednected purposes, Process: by by2.2.1 means meansweTheThe applyTrainingBy Line BPR of ofBPRLine providing nearestthis nearest technique techniqueSegments Process: technique neighbor. neighbor. a new introducedBy introduced providing in in this a this new paper paper reference [3].expression (A) DNA levels, from DNA a cell fromEach is extracted. each DNA cell segment isof labeled objects, has a corresponding such as genes, “spot” images,2.2 on BPR2.2.1 or Algorithm frame Training data. with Process: Hyper-Dimensional2.2By BPR providing Algorithm a Line new withapproachapproach Hyper-Dimensional to to the the PHC, PHC, Line we we connect connect nodes nodes from from the the with different fluorescent tags and hybridizedthe on microarraythe2.2 microarray BPR chip. Algorithm chip. 2.2.1 with Training Hyper-DimensionaltheSegments microarray Process:The BPR chip.Bytoapproachproviding Line DNA technique microarrayThe2.2.1The to a the BPRnew BPR Training PHC,introducedapproach techniquetechnique data.isis we Process: a aconnect general general to introduced introduced the inBy one,PHC,nodes this one, providing andin inandpaper we from this this can connect can a paperthe paper new therefore therefore nodes be be from applied applied the to to Each DNA segmentwith different has a corresponding fluorescent1.2 tagsthe “spot” Biomimetic microarray and hybridized on However, Pattern chip. on (B) Recognition for When2.2 our BPR purposes,comparing Algorithm2.2.12.2.1Training we Training Training1.2 gene1.2Segmentsapproach apply Biomimetic withBiomimetic Process Process:this Process: Hyper-Dimensional to technique the Pattern Pattern PHC,ByBy providing providing Recognition weRecognition connectSegments Line a a new new nodes from the samesame class class in in a a HD HD space space by by means means of of nearest nearest the microarray chip. Segments 1.2approach Biomimetic to the Pattern PHC, Recognition we connectsame nodesisapproachis a a class general general from into the the one,a one, HDsame PHC, and andany spaceany class can we can data data connectby therefore therefore in2. set2. set means aM M inETHODOLOGY HDinETHODOLOGY nodes a abe beof specified space specified applied appliednearest from by the format. toformat. means to An of An nearest applicable applicable the1.2 microarray Biomimetic chip. Pattern (B) Whenthe Recognition1.2 microarray comparingBiomimeticexpression chip. gene Pattern levels, Recognitionto DNA DNA microarray fromSegments eachapproachapproachiscell data. a is generalto toBy labeled the the2.2.1same providing PHC, PHC,BiomimeticBiomimetic Training class one, we we a in connect connectnew and Process: a PatternHD Pattern approach can nodes nodes spaceBy Recognition thereforeRecognition providingfrom from byto meansthe the the PHC, a(BPR)(BPR) be newof we appliednearest is is connect a a neighbor.neighbor. to nodes In Infrom order order the to to build build the the HD HD topological topological 1.21.2 Biomimetic Biomimetic Pattern Pattern Recognition Recognition1.2Biomimetic Pattern PatternsameRecognition classRecognition in a HD (BPR) space is bya2.2.1 meansneighbor. Training of Innearest Process: order toBydatadata build providing2.2.1 set set theshould should Training HD a benew be topological a Process: annSS byby BymmGG providingmatrix,matrix, where wherea newnnss with different fluorescent2.2.1 TrainingBiomimetic tags Process:same andsame hybridized classPattern classapproachBy providing in in Recognition aon a to HD HD the a space space PHC,newanysameany (BPR) databy bydata we class means means connectsetis set ina2.1 in2.1 in a ofneighbor. ofa aData DataHD nodes specified nearestspecified nearest space Sets Sets from In format. format.by orderthe means An to An build ofapplicable applicable−− nearest the−− HD topological expressionBiomimetic levels, DNA Pattern from each RecognitionBiomimetic celltechnique is labeled (BPR) Pattern constructing is a Recognitionneighbor. a hyper-dimensional In (BPR) orderany isdata toasametechniquetechnique buildapproachneighbor. (HD) set class the in2.2formations, to constructingin constructingHD Ina BPRthea orderHD specified topological PHC, Algorithmspace itto iswe a a build important hyper-dimensionalby hyper-dimensional connect format. approachmeans with theisis theHyper-Dimensional theHDnodes to of number understand numberAn to nearest topological thefrom applicable (HD) (HD) PHC, of of the neighbor. samplespoints samples weformations,formations, Line connect and and andIn ordermmnodes it it is isisis importantto important thefrom the number number the to to understand understand points points and and 1.2 Biomimetic PatternBiomimetic Recognitionthe microarraytechnique PatternFigureFigure Recognitionconstructing chip.neighbor.neighbor.2.2.1 1. 1.DNADNA Training In (BPR)In a microarrays microarrays hyper-dimensional order order Process: is toa to build processbuilddataneighbor. processdataBy set set theproviding thefromshould (HD) fromshould HD HDIn order topological betopologicalformations, bea new a an tonSS buildbyby itmm istheGG importantmatrix,matrix, HD topological where where to understandnnss GG points and with different fluorescentBiomimeticBiomimetic tags1.2technique and Biomimetic Pattern Pattern hybridized constructing Recognition Recognition Patternapproach on Recognition a (BPR)hyper-dimensional (BPR) to1.2 the Biomimetic is isPHC, a a we connect Pattern (HD)samegeometric Recognitionnodesgeometric classformations, frominSegments a body body HDthe it is totospace important mimic mimic by a ato means biological biological understandThe− of− BPR nearest− system− system points technique for andfor introducedlineline segments segments in this in in paper HD HD space. space. Let Let xx bebe an an element element technique constructing a hyper-dimensionalgeometrictechnique body (HD) constructing2.2 togeometric BPRmimic formations,referencereference Algorithm a aapproach hyper-dimensional biological body it [3]. [3].isdata to with important to (A) mimic(A) thesystem set Hyper-Dimensional DNAbuild DNA PHC,same shoulda to frombiologicalfor (HD)the from understand we class HDline aconnect abe cellisformations, cellis intopological segments the system thea isa is pointsn Line extracted.HD numbernodes extracted. numberS for space and itby in from isformations, of HD ofline importantsame bym samples samplestheof space.TheofG segments means objects,objects,matrix, class BPR to Letandit and ofunderstandis in technique inx suchmimportant suchnearestm abe HDwhere HDis as anis as space. the thegenes, space pointselementintroduced genes,n numberto numbers Let understand by andimages, images,x means inbe this anor or of framepaperelement frame nearest data. data. 1.2 BiomimeticFigure 1. DNA Pattern microarraystechniquetechnique Recognition process from constructing constructing reference [3]. (A) DNA a a hyper-dimensional hyper-dimensional fromFigureFigure samea cell is extracted. 1. class 1.DNADNA inEach microarrays(HD) (HD)microarrays a DNA HD segmentformations, spaceformations, has process process a by means it it from from is is important important of nearest ton to understand understand points points and and GG n nn nn 1.2the Biomimetic microarray PatternBiomimeticgeometric chip.Biomimetic Recognition body Pattern to Pattern mimic Recognition Recognition aline biological (BPR)segmentsBiomimetic (BPR) system isa in isHDneighbor. Pattern for a space.classification.classification.neighbor.line Recognition In segments Let orderinx Inbe BPRn BPRorder to in(BPR)anandbuild HD waselement wasx to−isx space. first first build theisis abe− aintroduced a introduced HDneighbor. a general Let general the linen topologicalx HD segmentbe one, one, by by Intopological an Shoujeu Shoujeu orderand elementand in can can ton as therefore thereforein buildin well.RR theandand be be HDxx applied1 applied1xx22n topologicalbebe a ato to line line segment segment in in RR asas well. well. geometric body to mimic a biologicalclassification.geometric system body BPR forSegmentsclassification. towas Each mimicEach firstsame DNA introducedDNA a biological BPR classsegment segment was inbypoints systemfirsthas aShoujeu has HDa introduced a corresponding correspondingand space for lineoflineRof by by objects,segments2.2.1 objects, means segments Shoujeu “spot” “spot” Training1 such of such2 on in nearestonin asHD HDas Process:RHowever, genes,However, genes,space. space.and images,xBy images,1Let Letforx for2 providing ourbex our be orR a orpurposes, purposes, line frame an frame aelement element segment new data. data. we we apply apply in R this thisandas technique technique well. BiomimeticcorrespondingFigure “spot” Pattern ongeometricgeometric the 1. microarray RecognitionDNAtechniquebody bodychip. microarrays (B)to (BPR)toWhen constructing mimic mimic comparing isreferencereference a a gene biological biological processneighbor. a expression hyper-dimensional [3]. [3]. (A)systemlevels, system (A)from In DNA orderDNAn for forfrom from from to each (HD)lineline build cell a ais cell segmentsis cellsegments theformations, the is is extracted. extracted.HD number in in topological HD HDn it is space. space. ofimportant samples Letn Let x tox be understandbe andan an element elementm pointsG isn and the number DD xx xx xx classification. BPR was firsttechniqueclassification. introducedWang constructing BPR byin Shoujeu2002 was a firstin hyper-dimensional Beijing, introducedtechniquein and China constructing byx1x andShoujeu2n (HD)be was a line derivedWang aWangformations,in hyper-dimensional segmentR in inandapproach 2002The 2002x itin1 isminimumx in in2n importantbe Beijing, Beijing,toas a (HD)the well. line distance,any PHC,any toChina China segmentn formations, understand data data we and andD set set connect, inwas frominwas inR points it a a derivedis derivedasspecified nodesspecifiedx important well.to and a fromxnTheThe format.1 format.x to2 the minimumis minimumunderstand An An applicable applicable distance, distance, points and ,, from from toto a a 11 22 isis labeled with different fluorescentBiomimetic tags and Patternhybridized Wangon Recognition theEachclassification.Each microarray in DNA DNA 2002 chip (BPR). segment segment in BPR Beijing,Wang is2.2.1 was hasR athehasthe in first aneighbor. ChinaTrainingmicroarray a 2002microarray corresponding corresponding introduced inandn Process: Beijing, Inwas chip. chip. order by x derived“spot” “spot” (B)x Shoujeu (B)China be to When When ona buildon andline comparingHowever,in comparingHowever, wassegment theRR HDderivedand forfor gene topologicalx genein1 ourx our 2 The n be purposes,as purposes,to atowell. minimum lineDNA DNA The segmentwe we microarray microarray applyminimum applydistance, in this thisR data. data.techniqueDtechnique distance,as, from well. x to, from a x1x 2 is technique constructingclassification.classification. a hyper-dimensionalgeometric BPR BPR body was was1.2 (HD) firstto first Biomimetic mimic introduced introducedformations, a biological Pattern by by itShoujeu Shoujeu is Recognition important systeminin forRRof to understandandandline objects,x segmentsfrom1from1ThexBy22 bebe pointsthe theminimumproviding such a a Principle in Principleline line and HD as segment segment adistance, space. genes, new of of Homology-Continuity Homology-Continuity in in LetRDR images,x, fromasasbe well. well. anx elementto or a framex (PHC) (PHC)x is data.determineddetermined basedD based on on whether whether its its projection projection is is inside inside referenceWang intechnique 2002 [3]. in (A) constructing Beijing, DNAgeometricWang China from a infrom hyper-dimensional and2002 body a wasthe in cellto Principle Beijing, derived mimic is extracted. aChina of (HD)geometric biologicalThe Homology-Continuityexpressionexpression minimum andformations, was bodysystem levels, derivedlevels, distance, to for it mimic DNAis DNA important (PHC)lineD afrom, from segmentsbiological from eachsamedetermined eachtoxThe understandto cellcellclass in systemminimum a HD isx is1basedlabeled inx labeled space.2 pointsfor adataisdata HD on distance,determinedline Letset whether setand space should shouldx segmentsbeD by its, based an be frombe projection means element a ain1n onnxS HDS2 whether oftobyby space.is anearest insidemxmGG itsx Letmatrix,matrix, projectionis x be where where an is element insidennss WangWang in in 2002 2002 in in Beijing, Beijing,Wangthethe China China microarray microarray in and2002 andapproach was was infrom chip. chip. Beijing, derived derived the (B) to (B) Principlethe When ChinaWhenTheThe PHC, comparing minimum comparingminimumand of we Homology-Continuity was connectn deriveddistance, genedistance, gene nodestotoDD from DNA,, DNA from from (PHC)the microarray microarrayxx toto a a xx11 data.x data.x22nisis −− −− 1 2 x x 22 x x 11 geometric body to mimic a biologicalclassification. system BPR for wasline first segmentsintroduced in by HD Shoujeu space. Letin R x [8]. [8]. determinedbetoand a PHCan PHCn x 1 element x 2 assumes assumes isbe based determined a line on that that segmentwhether the the based difference difference its in projectionR non x aswhether2 between between well.x n1 is inside its el- el- projectionthethex line linex segment. segment. is insiden Let Let uu == −− bebe a a unit unit vector vector from the Principle of Homology-Continuityclassification.from the Principle BPR wasof (PHC) Homology-Continuity first introducedclassification.determinedFigure 1. byDNADNA based Shoujeu BPR (PHC) microarrays microarrays on was whether firstin R introduced processits processand projectionthex1 linefrom fromx2 bybe segment. Shoujeuis a inside lineisis Let segment the theinu numberR number= and in −R ofx of1xas samples samplesbe2 well.be a unit a line and and vector segment2mmGG1 isis the the in R number numberas well. x x 22 x x 11 1.2 Biomimeticgeometric Pattern body Recognition to mimic[8]. a biological PHCBiomimetic assumes system Pattern thatFigure forwithwith the Recognitionline differencedifferent 1. different segmentsHowever, fluorescent fluorescent between(BPR) in HD isR for el- tagsa space.tags ourneighbor. and anddetermined Let purposes, hybridized hybridizedx Inbe anorder based on elementon wethe toon applyline whether build segment.x 2 theR thisx its1 HD projection technique Let topologicalu = is inside be a unit vector Each DNA segment has a correspondingfromexpressionexpression the Principlen levels, levels,same “spot”[8]. DNAof class DNA PHC Homology-Continuityfrom inon from assumes adetermined eachHD each spacethat cell cell theisbased is by labeled labeled difference(PHC) means onn whether of between nearest its projection el- x 2 is insidex 1 2 1 −  −−  fromfrom the the Principle Principle of of Homology-Continuity Homology-Continuityin and x1 (PHC)x (PHC)2 be a linedetermined segmentThe based minimum in x 2 onasx 1 whether distance, well. its projectionD, from x is2.2to2.2 inside a BPR BPRx−x Algorithm Algorithm is with withx 2 Hyper-Dimensional Hyper-Dimensionalx 1 Line Line classification.Biomimetic BPR wasPattern first introduced RecognitionWang[8]. in 2002by PHC Shoujeu in assumes Beijing, thatR China the and difference was derived betweenn el-theementsementsthe lineR line segment. of of segment. the the same same Let Let class classu n = is is − gradually gradually bebe a1 x 2unit 2 changed. changed.x 1 vector goinggoing − from from x x 11 toto x x 2 2 andand qq == xx x x 11,, u u bebe the the [8]. PHC assumes that theWang difference inements 2002 between in of Beijing, the el- samereferencereference ChinaWangthe class line andin [3]. [3]. issegment. 2002was (A)gradually (A) derived DNAin DNA Let Beijing, from fromchanged.u =The aChina a cell minimumcell− formations,going is is andbe extracted. extracted. a wasx x 2 fromdistance, unit2 x x 1 derived1 itvectorx is importantoftoofD, objects,x objects,x The2 fromandx 1 minimum tox q such such understandto= ax as asx1 distance, genes,x genes,2x , pointsis u images, images,beD and, the from or or framex frameto a data. data.x1x2 is classification. BPR was firsttechniqueements introduced[8]. PHC of constructing the assumes by same Shoujeuements that class a hyper-dimensional the ofin isR the difference graduallyand samethethex1 microarrayclassx between microarray changed.2 be (HD) is ax 2 line gradually el-x chip.1 chip. segmentthe line changed. in segment.R1 asgoing well.2 Let u from= x − to bex 1 aand unitq vector= x x , u be the Biomimetic[8].[8]. Pattern PHC PHC assumes assumesRecognition that that(BPR) the thewithwith difference difference different is different a neighbor. between between fluorescent fluorescent el-In el- tagsorder tagsthethe andtoand line line to DNAhybridizedhybridizedbuild segment. segment. the microarray Let Leton HDonuu = topological= −− data.bebe a a unit unit−Segments vector vectorSegments x 2 1 x 1 2 1  −−  Wang in 2002the microarray in Beijing, Chinafrom chip. and the (B) was Principle derivedWhen of Homology-Continuity comparingThe minimum gene distance, (PHC)D, fromdeterminedInxInto other other a− basedx words,1 words,x22.2 on2.2x isx whetherx BPR22 thereBPR therex x 11x AlgorithmAlgorithm is is its a a projection gradualq gradual= with withx connection connection is Hyper-Dimensional Hyper-Dimensionalx inside, u − be- be-  projectionprojection Line Line − of of xx ontoonto the the line line segment.. segment.. Note Note that that Biomimeticements ofPattern the same Recognition classfromements is the gradually(BPR)geometric ofIn Principle otherthe is samea changed. words,technique of body Homology-Continuity classEachEach to there mimic isfromgoing constructing DNA DNA isgradually a the a fromThe gradualsegment segment biological Principle minimumx changed.1 (PHC)a connectionto has has ofx system2 a a Homology-Continuity correspondingand correspondingdistance,determinedgoing qfor be- = from Dline xprojection , based from “spot”“spot” segments x  1 1 , to−− u onx be onof (PHC)onbeto2 whether andthex inthe aHowever,However,ontox HD projection1xdetermined its2 space.theis projectionfor linefor Let our ourof1 segment.. based− purposes,x ispurposes, ontobe inside on anthe whetherthe Note element we we line apply thatapply itssegment. projection this this technique technique is inside Wang in 2002 in Beijing, Chinaements and of was theformations,the derivedthe sameIn microarray othermicroarray class it words, is is important chip. chip. gradually there is tox a changed.understand gradualx connectiongoing pointsq = from andx be-x 2 x x 1x 1,toprojection u x 2 and− q of=x  ontox x the1, u linebe segment.. the Note that from thetechnique Principle constructing of Homology-Continuityementsements[8]. aof of hyper-dimensionalthe PHC the same same assumes (PHC) class class that is is (HD)determined graduallythe gradually difference changed. basedchanged. between on whethergoinggoing el- fromits fromthe projection linex 11 segment.toto x is22 and insideand−Segments LetSegmentsnq u== x −x 11,x u 2bebex be1 a unit the the2.2.12.2.1 vector Training Trainingn Process: Process:x 2 x 1ByBy providing providing a a new new expressionIn other words, levels, there DNA is[8].In a gradual other PHC from words, assumes connection each there that cell is be- the a isthethe gradual difference[8].projection labeled microarray microarray PHC connection between assumes of chip. chip.x onto (B) (B) be-el- that the When Whentweentween thetheprojection line linedifference comparing comparing any segment..any segment.in two twoof x betweenand elements geneelements geneonto Note Letx −ux x the2 that el-=toto thatx be that1 line DNA DNA athe− belong belong line segment.. line microarray microarraybe segment segment. toa to unit the the Note invector samedata. samedata.− Let thatuas= well.andand− denotebedenote a unit the the vector usual usual Euclidean Euclidean norm norm and and the the hyper-dimensionalfrom the Principle(HD) geometric of Homology-Continuityclassification.tween body to any mimic two BPR elementsa (PHC)biological was first thatdetermined introduced system belongx 2 tobased byx 1 the ShoujeuNote on same whetherthat R itsandand projection  denote1−2 is the insidex the2 usualx 1usual Euclidean Euclidean normR norm and theandx 2 x 1the inner geometric bodyInIn to other other mimic words, words, a biological there there is is systemIn a a gradual other gradual for words, connection connectionlinetween there segments isbe- any be- a gradual twoinprojectionprojection HD elements space. connection of of that Letxx ontoontox belong be-be the the anprojection toline line element the segment.. segment.. same of− x onto Note Note− and the that that linedenote segment.. the usual Note·· Euclidean that− ·· norm and the [8]. PHC assumes that the differenceements of between the same el- classthe line is gradually segment. Let changed.u = −goingbe from a unitx 1 vector·tox 2 x x 21 and· q = x −approachapproachx 1, u be to theto the the PHC, PHC, we we connect connectn nodes nodes from from the the tween any[8]. PHCtwo elements assumesementstween that that belong the any ofWang difference the two to in same elements the 2002 same between class inexpressionexpression that Beijing, isnements el- belonggraduallyand Chinathe levels, levels,of to linedenote the changed.the and DNADNA segment. samex same2 wasthex 1 from from usual class derivedgoing Let each each Euclidean isandu from= gradually cell cellThedenoten is is−x minimumnorm1 labeled labeledto2.2.1 changed.be2.2.1 thex and2 a usualand unit Trainingthe Trainingdistance,q vector· Euclidean=goingn Process:x Process:D· from, fromx norm1, u x By1Byx andbeto providingto providinginner the thex 2 a andx product1x2q a ais new= new inx R ,x 1 respectively., u be the It can be shown algorithms use the same name set, a different struc- forwith classification. different BPR fluorescent was first introduced tagstween and any hybridizedby twoinShoujeu elementsand Wang onx thatx inbe belong2002 a line− toproduct segment the· same in· x inner 2 , asx respectively.1and well.productdenote in It− the can, respectively. usual be shown Euclidean It that can norm be and shown the algorithms use the same name set, a different struc- ementsclassification. of the same BPR classtweentween was is anyIn any first gradually other two two introduced elements words, elements changed. by there that that Shoujeu isgoing belong belong a gradual from to toR the the· connectionx 1.211.2 same sameto Biomimetic Biomimeticx 1·2 2and be-2.2andandq = Patternprojection Pattern BPRx denotedenote Recognitionx Recognition Algorithm1,theof uthex usualbe usualnRonto the− Euclidean Euclidean the with line normHyper-Dimensional segment..normR and andsamesame the− the Note class class that in in a a HD HD Line space space by by− means means of of nearest nearest ements of the sameIn other classfrom words, is the gradually there Principle is changed. awithwith gradualofIn Homology-Continuity different different other connectiongoing words, fluorescent· fluorescent· frominner there be-··x product tagsisto tags (PHC)projection ax andgradual andand in hybridized hybridizedRdeterminedq of,approach· connection=approach respectively.x x onto on on· basedx to theto, be- u the the It lineonnbe canPHC, PHC,projection whether thesegment.. be we we shown its connect connect of projection Notex algorithmsontothat nodes that nodes theis frominside from line use the the thesegment.. same name Note set, that a different struc- ture is constructed. Figure 3 shows an example of Wang in 2002 in Beijing, China and was derived The minimum distance, D1 ,− from2 x innertothat a x product1x2 is1 in 2.22.2, BPR BPR respectively. Algorithm Algorithm It can with with be Hyper-Dimensional Hyper-Dimensional shown algorithmsture is Line Lineconstructed. use the same Figure name 3set, shows a different an example struc- of In otherin words,Beijing, there China is aand gradualthe tweenwas microarray derived connection any two from elements1.2 be-1.2 chip. the Biomimetic Biomimeticprojection Principle that belong Pattern Pattern of Homologyx toonto Recognition theRecognitionthe same line segment..and denote Note that the usual− EuclideanR neighbor.neighbor. normx 2 andx 1 Inthe In order order to to build build the the HD HD topological topological In other words, theretween is any a[8]. gradual two PHC elements connectionassumes that thattween be- belong the anyprojection differenceBiomimeticBiomimetic tothethe two theSegments microarray microarraythat elements same of between Patternx Patternonto chip.that chip. el-and Recognition the Recognition belongthe linedenotesamesame line segment.. to segment. the(BPR) classthe(BPR) class same usual in Note in is is Let aEuclidean a a HD HDthatu =and space space norm− denote by bybe and meansture means a the the unit is usual constructed.of of vector nearest nearest Euclidean Figure norm 3 and shows the an examplen of the development of and the contrast between both from the Principle of Homology-Continuity (PHC) determined based on whether· its· projectionthat is inside SegmentsSegments x 2 x 1 x xture1 the isinner development constructed. productq<0 Figure of in andR , 3 the respectively. shows contrast an example between It can be of both shown algorithms use the same name set, a different struc- tween any-Continuity two elements (PHC) that [8]. belongPHC assumes to the same that the differenceand denotetechniquetechnique between the usual constructing constructing Euclidean· a norm a hyper-dimensional hyper-dimensional and· the x (HD) (HD)x 1 ·formations,formations, ·− q< it it is0 is important important to to− understand understand points points and and 2 tween any two elementsements that belong ofBiomimeticBiomimetic the to thesame same Pattern classPattern is Recognition Recognition graduallyand denote changed. (BPR) (BPR)x 2 thex 1 usualis is a agoing Euclideanneighbor.neighbor. from normx In1 Into order order andx 2 theand to to buildq build= x the thethe HDx HD1 development, u topological topologicalbe the of and2 the contrast2 between2 both proposed algorithms in R . The Segment Connec- [8]. PHC assumes that the difference between· el- the· line segment. Let u = −x x be1 a unit vector q<− 0 2 2 D = x 2theproposedx development1 that+ q algorithms0 ofq and in thex 2R contrast.x The1 Segment between Connec- both ture is constructed. Figure 3 shows an example of · · 2.2.1x 2 x 1 TrainingD = x Process:x x x line2.2.1By2.2.1line+ segmentsprovidingq segmentsTraining Trainingq<00−q inProcess: Process: in HD HDax new space. space.ByByx providing providing Let Letx x bebe a a2 an newan new element element elements of the same class is graduallyIntechnique othertechnique changed. words, constructing constructing thereIn othergeometric isgeometric a words, a graduala hyper-dimensional hyper-dimensional body body connection to to mimic mimic − be-− (HD) (HD) a a biological biological projectionformations,formations,2 2 system system of x it1 it2onto is for is for important important1 the line to to segment.. understand understandproposed Note points2 points algorithms that 1 and and− in R . The Segment≤ ≤ Connec-2 − tion algorithm provides a more compact structure ements of the same class is gradually changed. going from x 1 to x 2Dand= q = xx 2 xx 11, u +beq the−0 approach q− 2 x 2 to2nxn the1 PHC,≤ ≤ we connect− x nodesxproposed2 tion from algorithm algorithmsn then q> providesx in2 R x .1 a The more Segment compact Connec- structure the development of and the contrast between both classification.classification. BPR BPR was was first first introduced− introducedD = by by Shoujeuxx Shoujeu2≤ approachx x 2≤ 1 in+inqR−R toand the0and q> PHC,xqx11xx2x22xbe webe2 a connectx a1 linex line1  segment− segment nodes infrom inRR theasas well. well.x x 1 − q<0 there is a gradual connection betweentweengeometricgeometric any any two two body bodyelements elements1.2 to Biomimeticto mimic mimic that that a a belong biologicalPattern biologicalapproach to Recognition the system system samex  tox for− for thelinelineand PHC, segments segmentsdenote  weq>− connect inxthe in HD HD usualx space. space. Euclidean nodes Let Lettionx xfrom normbebe algorithm− an an and the element element the provides a more compact− structure(1) than that from the Nodalproposed Connection algorithms algorithm. in 2. The Segment Connec- In other words, there is a gradual connection be- projection1.2 Biomimetic of x onto Pattern the Recognition line segment..2 · Noten·n that −samesame2 class class1 in in≤ a a HD HD≤ space space−nn by by means means(1)tionthan of of algorithmD nearest nearest that= from providesx the Nodalx a more2 + Connectionq compact2 0 q structure algorithm.x x R classification. BPRWang wasWang first in in introduced2002 2002 in in Beijing, Beijing, by  Shoujeu− China China in andin and was wasandx and derived derivedxxx211x x22 bebe−The aThe a line line minimum minimumq> segment segmentx than2 distance, indistance, inx that1 as fromasD well.D well.,, from the from Nodalx x toto a Connection axx11xx22 isis2 algorithm.1 Namely, the2 sum1 of all minimum distances in U is belong1.2 Biomimetic to the same class. Pattern These Recognition connectingclassification. branches BPR can was be first HD introduced linesame by class Shoujeu in a HDRR  − space by means(1) of− nearestRR Let S bethanNamely, the that set from ofthe M sum the elementsof Nodal− all minimum Connection of the distances≤ algorithm.≤ in U− is tion algorithm provides a more compact structure tween any two elements that belong to the same andBiomimeticBiomimeticdenote the Pattern Pattern usual Recognition EuclideanRecognition norm (BPR) (BPR) andLet is theis a aS neighbor.beneighbor. thedetermineddetermined set In In of order orderM based basedelements to to buildon build on whether whether theof the(1) the HD HD its its projection projection topological topological is is inside insidex x 2 q> x 2 x 1 WangWang in in 2002 2002· in in Beijing, Beijing,fromfrom· the the China China Principle Principle and and wasofLetwas of Homology-Continuity Homology-Continuity derivedS derivedbe theTheThe set minimum minimum of M (PHC) (PHC)elements distance, distance, ofDD the,, from fromNamely,xtrainingx toto a a thex setx11x sumx2 and2 isis ofU allbe minimuman empty  set.distances Without in U lossis of smaller for the Segment Connection algorithm. The x 2x 2Namely,x 1x 1smaller the for sum the of Segment− all minimum Connection distances algorithm. in−U is The than that from the Nodal Connection algorithm. segmentsBiomimetic or hyper-surfaces Pattern and the Recognition resulting topo-techniquetechnique (BPR) logical[8].[8]. is constructingPHC constructingstructure PHC a assumes assumesneighbor. a a that hyper-dimensional hyper-dimensionalthat the theLet In difference differenceSorder betraining theLet between (HD)set (HD) betweentoS set of buildbe and Mformations,formations, el-the el-elementsU be setthethethe anof line lineemptyHD it itofM is is segment. segment.the important importantelements set. topological training Without Let Let to to ofu u understandset understand= loss= the and of−− U be points pointsbebe an a aunit andunit and vector vector (1) fromfrom the the Principle Principle of of Homology-Continuity Homology-Continuity (PHC) (PHC) determineddetermined based based on on whether whether its its projection projectionsmaller for is is the inside insidex Segment2x 2 x 1x 1 Connection algorithm. The algorithms are performed on each training class, training set and U be an empty set. Without loss of generality, let x−smaller−1 andalgorithms x 2 forbe the the are Segment two performed closest Connection elements on each algorithm. training The class, Namely, the sum of all minimum distances in U is geometricgeometric body body to to mimic mimic a a biological biologicaltraininggenerality, system system set forfor and letUlinexlinebeand segments ansegmentsx emptybe thex inset. in2x 2 two HDx HD1x Without1 closest space. space. loss elements Let Let ofxx bebe an an element elementLet S be the set of M elements of the formstechnique a “biological” constructing organism, a which hyper-dimensional[8]. [8].can PHC PHCbe used assumes assumes for classification. thatements thatements (HD) the the ofdifference differenceof theformations, the same same between betweenempty class class set. itel- is el-is is gradually graduallyWithoutthe importantthe line line segment.changed.loss segment.changed. of to1 generality, understand Let Letgoinggoingu2u == from from let−− x pointsalgorithms x 1 1andbetobeto a ax x 2 unit unit2 andandbeand vectorare vectortheqq= performed =twox x closestx x 11, on, u u eachbebe the the training class, hence, they produce two “biological organisms,” generality, let x 1 and x 2 be the two closestnn elementsx 2x 2 x 1x 1 in S. Remove x algorithms1andhence,nn−−x 2 from they are S performedproduceand add two them on “biological each to U training organisms,” class, smaller for the Segment Connection algorithm. The classification.classification. BPR BPR was was first first introduced introducedgenerality, byin byS Shoujeu Shoujeu. Remove let x 1 inandinx 1RRandx 2andandbex 2 thefromxx11x twox22−S−bebe closestand a a line addline elements themsegment segment to U in in RR astrainingas well. well. set and U be an empty set. Without loss of One special characteristic of BPR is thatementsements it requires of of the the sameonly sameInIn a class other classsmall other is words, is words, gradually gradually there thereelements changed. is changed. is a a gradual gradual in goinggoing. connectionRemove connection from from x x 1 be-1 be-andtoto x projectionx 2 projection 2 andfromandqq == and of ofxhence,x xaddontoontox x 1 1them, they, u uthe thebebe line produceto line the the x 1 segment.. segment..so that two “biological Note Note that that organisms,” one structure for the Training Normal class and one in S. Remove x 1 andSx 2 from S andx 1 add themx to SU so that U = U . Then, we select the next element algorithms are performed on each training class, geometric body to mimic a biologicalWang system in 2002 for in Beijing,line segments China and inwas HD derived space.TheThe Let minimum minimumbe distance, distance, an −− elementDD,, from from xxx 2hence,totoone a a xgenerality,x structure they11xx22 isis produce for let thex 1 twoand Training “biologicalx 2 be Normal the two organisms,” class closest and elements one Wangtween intween 2002 any any in two twoBeijing, elements elements China thatx that1 andin belong belong wasSso. Remove that derived to toU the the=x same1 sameand. Then,x 2 fromand weandS selectanddenotedenote add the them nextthe the usual usual element to U Euclidean Euclidean norm norm and and the the number of samples as opposed to traditionalInIn other other words,pattern words, there there recognition is is a a gradual gradual so connectionthat connectionn U = be- be- . Then, Then,projectionprojection wewe selectselect of ofx x the2x theontoonto next next the elementthe element line line segment.. segment..none x structure inin NoteS Note soso that for that that that the its Training distance Normal to the line class segments and one in for the Training Cancer class. x 2 x 1 ·· ·· 3 S onefor structure the Training for the Cancer Training class. Normal class and one hence, they produce two “biological organisms,” classification. BPR was first introducedfrom byfrom Shoujeu the the Principle Principlein of of Homology-Continuity Homology-Continuityand x1sox2 thatx 3beinU (PHC) (PHC)S a=so line thatdetermined.determinedsegment Then, its distance we selectbased based in to the onthe on linewhether whether nextas segments elementwell. its its projection projection in  is isin inside insideS. Remove x 1 and x 2 from S and add them to U tween any two elements that belongR to the same andand x denote2denote the the usual usual Euclidean EuclideanRfor the norm norm Training and and the the Cancer class. tween any two elements that belongx 3 in toS so the that same its distance to the line segments in U is minimal;x x 22 x x 11 currently U simplyx contains1 a line algorithms. In recent years, BPR has been used successfully[8]. PHC assumes in voice that thedistance difference to betweenthe·· line ·segments el-· thethe line line in segment. segment.U is minimal; Let Let uu =currently= −− Ubebe forsimply a a unit theunitso Training vector vector that U = Cancer. class. Then, we select the next element one structure for the Training Normal class and one [8]. PHC assumes that the difference x 3 betweenUin isS so minimal; that el- its currently distance toU thesimply line contains segmentsx x 22 ax x 1 in1 line x 2 Wang in 2002 in Beijing, China and was derived TheU is minimal;minimum currently distance,U simply D , contains from ax lineto a segmentx1x2 −− connectingis x 1 to x 2. Again, we remove x 3 for the Training Cancer class. recognition [9], iris recognition [10], and facial recognitionementsements of of [11]. the thesame sameBPR class classcontains is is gradually gradually aU linesegmentis changed. minimal;changed.segment connecting currently connectinggoinggoingx from from1Utosimplyx x x 21 1 . toto Again,to containsx x 22 . andAgain,and we removeqq a== we linexx removex 3 x x 11,, u u x be3 be in the theS so that its distance to the line segments in from the Principle of Homology-Continuity (PHC) determinedsegment connecting basedx 1 onto x whether2. Again, we its remove projectionx 3 from is insideS and add−− it toU in the following  fashion: methods include different constructions as well InInas other otherdifferent words, words, there there is is a afrom gradual gradual S and connection connectionsegment addfrom itS connectingto be-and be- U in addprojection projectionthe itx 1following totoUx 2in of of. Again, thexx fashion:ontoonto following we the the remove line line fashion: segment.. segment..x 3 Note NoteU is that that minimal; currently U simply contains a line from S and add it to U in the followingx 2 x fashion:1 [8]. PHC assumes that the difference betweentweentween any any twoel- two elements elementsthe line that that segment. belong belongfrom to to the theS Letand same sameu add= it toandand−U in thedenotedenotebe following a the theunit usual usual fashion:vector Euclidean Euclidean norm normsegment and andx 1 the thex connecting3 x 1 to x 2. Again, we remove x 3 classification techniques. ··x 2 x ·1·x 1 x 3 U = (2) x 1 x 3  U−=  (2) fromx 2S xand3∗ add it to U in the following fashion: ementsIn this ofpaper, the the same focuses class are to isdevelop gradually two new changed. techniques forgoing from x 1U =to x 2 and q = x x1 x 2x 3(2)xx 3∗1, u be the   x 2 x 3∗ U =  −   (2) x x developingIn other BPR words, algorithms there and is aapply gradual them to connection DNA microarray be- dataprojection of x onto the line segment..x 2 x 3∗ Notewhere thatx 3∗ is determined based on the proposed1 3 where x 3∗ is determined based on the proposed U = (2) where x 3∗ is determined based on the proposed algorithms: x 2 x 3∗ fortween cancer any de- tection. two elements We aim to that build belongHD topological to the formations same and denotewherealgorithms: thex 3∗ usualis determined Euclidean based norm on the and proposed the   ·algorithms:· Ia. Nodal Connection: Connectx x 3 to the closest algorithms:Ia. Nodal Connection: Connect x 3 to the closest where 3∗ is determined based on the proposed Ia. Nodal Connection: Connect x 3 to the closest node of the line segmentalgorithms:x 1 to x 2. In this case, Ia. Nodalnode Connection: of the line segmentConnectx 1x to3 tox 2 the. In closest this case, node of the line segment x 1 to x 2. In this case, x ∗ is either x 1 or x 2. nodex ofis the either linex segmentor x . x to x . InDIMENSIONS this3 case, Ia.78Nodal Connection: Connect x 3 to the closest x x x 3∗ 1 2 1 2 3∗ is either 1 or 2. x 1 if x 3 nodex 1 < of thex 3 linex 2 segment x to x . In this case, x 3∗ is eitherx 1x 1ifor x x23. x 1 < x 3 x 3 x 2= − − (3) 1 2 x 1 if x x 33 =x 1 < x 3 x 2 − −∗ x (3)2 if x 3 xx 2 is eitherx 3 x x 1or x . x 3 = −∗ x 1 x 2if if−x 3 x 3x 1 x (3)2< x 3 x 3x 2 x 1  − 3∗ ≤ −1 2 ∗ x 2 if x 3x 3= x 2 x 3 x −1 − ≤ − − (3)  ∗ − x ≤ 2 if −x 3 x 2 x 3 IIa.x 1 Segment Connection: Connectx 1 if x 3x 3to thex 1 < x 3 x 2 IIa. Segment Connection: − ≤ Connect− x 3 to the x 3 = − − (3) IIa. Segment Connection: Connect x 3 to the closest element of x1∗x2. In thisx 2 case,if xx 3∗3 couldx 2 x 3 x 1 IIa. Segmentclosest elementConnection: of x1x2Connect. In this case,x 3 tox 3∗ thecould  − ≤ − closest element of x1x2. In this case, x 3∗ could be x 1 , x 2 or a new element,x t (not from the closestbe x element1 , x 2 or of ax new1x2. element, In this case,x t (notx 3∗ fromcould the IIa. Segment Connection: Connect x 3 to the be x 1 , x 2 or a new element,x t (not from the original training set) on the segment from x 1 be originalx 1 , x 2 or training a new set) element, on thex t segment(not from from the x 1 closest element of x1x2. In thisFigure case, x 3.3∗ couldDevelopment of the skeleton-like original training set) on the segment from x 1 to x 2. Figure 3. Development of the skeleton-like originalto x 2. training set) on the segmentFigure from 3.x 1Development ofbe thex 1 , skeleton-likex 2 or a new element,structuresx t (not from using the two assembling algorithms. In the to x 2. structuresFigure 3. usingDevelopmentx two assembling ofx thex skeleton-like algorithms. In the to x . structures– usingIf the two projection assemblingoriginal of algorithms.3 lies training outside In set) the1 on2, theNodal segment Connection from x 1 (left), the next point is selected 2– If the projection of x 3 lies outside x1x2,structuresNodal Connection using two assembling (left), the next algorithms. point is In selected the Figure 3. Development of the skeleton-like – If the projection of x 3 lies outside x1x2, Nodal Connectionthen x 3 (left),is definedtheto nextx 2. aspoint in is Equation selected 3. based on its minimal distance to the current – If thethen projectionx 3 is defined of x 3 lies as outside in Equationx1x2, 3.Nodal∗ based Connection on its minimal(left), the distance next point to the is selected current structures using two assembling algorithms. In the then x 3 is defined as in∗ Equation 3. based onFigure its minimal 2 shows distance how the to theprojection current of x 3 structure. In the Segment Connection (right) the ∗ thenFigurex 3 2is shows defined how as the in projection Equation of 3. x 3 structure.– InIf the the Segment projection Connection of x 3 lies (right) outside thex1x2, Nodal Connection (left), the next point is selected Figure 2 shows how the∗ projection of x 3 can lie insidebased (A) on orits outside minimal (B). distance to the current Figurecan lie2 shows inside how (A) theor outside projectionstructure. (B). of Inx the Segment Connectionthen x (right)3 is the defined asnext in point Equation is the 3. closest point on the segment. 3 structure.next point In the is the Segment closest∗ Connection point on the (right) segment. the based on its minimal distance to the current can lie inside (A) or outside (B). next point– If is the the projectionclosest point of onx 3 thelies segment. inside x1x2, x can– If lie the inside projection (A) or of outsidex 3 lies (B). inside x1x2, next point is theFigure closest 2 shows point onhow the the segment. projection of 3 structure. In the Segment Connection (right) the – If the projection of x 3 lies inside x1x2, then We also defined two different extension pos- – If thethen projection of x 3 lies inside x1x2, We alsocan defined lie inside two (A)different or outside extension (B). pos- next point is the closest point on the segment. then We also defined two differentIf the extension projection pos- of x sibilities,lies inside wherex xU ,can be composed of nodes which then x 3 =sibilities,x We1 +( alsox where3– definedx 1U) can( twou ) be differentu composed(4) extension3 of nodes pos- which1 2 x 3 = x 1 +(x 3 x 1sibilities,) (u ) whereu (4)U∗can be composed−then of• nodes∗ which can connect to more than one node from S (“Multi- x 3 = x 1 +(x 3 x 1) ∗ (u ) u (4)− • ∗ can connect toU more than one node from S (“Multi- We also defined two different extension pos- x = x +(x x ) (u ) u (4) sibilities,x 2 wherex 1 can be composed of nodes which ∗ − 3 • 1 ∗x 2 3x 1 1 can connectwhere to moreu = than one− node. from S (“Multi- Lateral”), or U can contain nodes which can only where∗ u = − − . • ∗ Lateral”),x 2 x 1 or U can contain nodes whichS can only sibilities, where U can be composed of nodes which x 2 x 1 x 2 x 1 can connect − to morex than3 = onex 1 +( nodex 3 fromx 1) (“Multi-(u ) u (4) where u = − . x 2  x 1−  Lateral”),Continue or U can in contain this fashion nodes untilwhichS canhas only been connect to a one node (“Sequential”). x 2 x 1where u = − . connect to a one node∗ (“Sequential”).− • ∗ can connect to more than one node from S (“Multi-  − Continue inx this2 x 1 fashion until S has beenLateral”), or U can contain nodes which can only connect to a one node (“Sequential”). x 2 x 1 Continue in this fashion until S −has been exhausted. At the end of the algorithm,where u = the set− U. 1. Multi-lateral: The algorithmU proceeds such exhausted.Continue At in the this end fashion of the untilalgorithm,S has the been set Uconnect1. Multi-lateral: to a one node (“Sequential”).The algorithmx 2 x 1 proceeds such Lateral”), or can contain nodes which can only exhausted. At the end of the algorithm, the set U 1.willMulti-lateral: contain (M The1) segments algorithm proceeds such−  that each line segment can connect to any exhausted.will contain At the(M end1) ofsegments the algorithm, the set U 1.− Multi-lateral:thatContinue each lineThe in segment algorithm this fashion can proceeds connect until S such tohas any been connect to a one node (“Sequential”). will contain (M 1) segments − that each line segment can connect to any previously constructed line as shown in Figure −will contain (M 1) segments x that1exhausted.previouslyx each3 line Atx constructedM segment the end of can line the asconnect algorithm, shown to in anyFigure the set U 1. Multi-lateral: The algorithm proceeds such − x 1 x 3 x M previouslyU constructed= line··· as shown in Figure(5) 4. x x Ux = ··· (5) x previouslywill4.x contain constructedx (M 1) linesegments as shown in Figure that each line segment can connect to any 1 3 M x x x 2 3∗ M∗ − U = ··· x 1 2x 3 3∗ (5)x M M∗ 4.  ···  2. Sequential: The algorithmpreviously proceeds constructed such that line as shown in Figure x 2 x 3∗ U =x M∗  ······  (5) 2.4. Sequential: The algorithm proceeds such that  ··· x 2 x ∗ x ∗ 2.withSequential: at least oneThe node algorithm of the proceeds segment suchx 1 being thatx 3 an x M each line can only connect to a node that does with at least one node3 of··· theM segment being an 2. Sequential:each line canTheU = only algorithm connect proceeds··· to a node such that that does (5) 4. with at least one node of the segment being an elementeach line of the can training only connect set. Notice to a node that that whilex 2 doesx both∗ x ∗ not belong to U as shown in Figure 4. withelement at least ofone the training node of set. the Notice segment that being while an both eachnot line belong can only to U connectas shown3 to a··· in node FigureM that 4. does 2. Sequential: The algorithm proceeds such that element of the training set. Notice that while both not belong to U as shown in Figure 4. element of the training set. Notice that while both notwith belong at least to U oneas shown node of in Figurethe segment 4. being an each line can only connect to a node that does element of the training set. Notice that while both not belong to U as shown in Figure 4. bothboth cells. cells. A A black black spot spot indicates indicates thatbothboth that the thecells. cells. gene gene A A isblack black is class.class. spot spot These Theseindicates indicates connecting connecting that that the the branches gene genebranches is is canclass. canclass. be be These These HD HD line line connecting connecting branches branches can can be be HD HD line line inactiveinactive in in both both cell cell [3]. [3]. A A laser laserinactiveinactive then then scans scans in in both both the the cell cellsegmentssegments [3]. [3]. A A or or laser laser hyper-surfaces hyper-surfaces then then scans scans and and the the the thesegmentssegments resulting resulting or ortopo- topo- hyper-surfaces hyper-surfaces and and the the resulting resulting topo- topo- microarraymicroarray and and determines determines the the expression expressionmicroarray levels levels and of of determineslogicallogical structure structure the expression forms forms levels a a “biological” “biological” of logicallogical organism, organism, structure structure forms forms a a “biological” “biological” organism, organism, eacheach gene gene according according to to the the intensity intensityeach ofeach of the the gene gene color color according according and and whichwhich to to the the can can intensity intensity be be used used of of the thefor for color color classification. classification. and and which One One can special special be used for classification. One special isis given given a a numeric numeric value. value. Each Each sample sampleisis given given is is defined defined a a numeric numeric as as characteristic value.characteristic value. Each Each sample sample of of BPR BPR is is is is defined defined that that it it requires as asrequirescharacteristiccharacteristic only only a a small small of of BPR BPR is is that that it it requires requires only only a a small small aa sequence sequence of of numerical numerical values values of of geneaa gene sequence sequence expression expression of of numerical numericalnumbernumber values values of of samples samples of of gene gene as asopposed expression expressionopposed to to traditional traditionalnumbernumber of of pattern pattern samples samples as as opposed opposed to to traditional traditional pattern pattern levels.levels. In In recent recent years, years, DNA DNA microarray microarraylevels.levels. technology technology In In recent recent years, years,recognitionrecognition DNA DNA microarray microarray algorithms. algorithms. technology technology In In recent recent years,recognitionrecognition years, BPR BPR hasalgorithms. algorithms. has In In recent recent years, years, BPR BPR has has hashas provided provided a a promising promising tool tool to tohashas determine determine provided provided the the a a promising promisingbeenbeen used used tool toolsuccessfully successfully to to determine determine in in voice voice the the recognition recognitionbeenbeen used used [9], [9], successfully successfully iris iris in in voice voice recognition recognition [9], [9], iris iris diagnosisdiagnosis and and prognosis prognosisbothboth of cells. of cells. different different A A blackdiagnosisdiagnosis black cancer cancer spot spot andtypes andindicates types indicates prognosis prognosisrecognitionrecognition that that the of of the different differentgene [10], [10], gene is and and is cancer cancerclass. facial facialclass. These types types recognition recognition These connectingrecognitionrecognition connecting [11]. [11]. branches BPR BPR [10], [10], branches and and can facial facialcan be be HD recognition recognition HD line line [11]. [11]. BPR BPR [4-7].[4-7]. [4-7].[4-7]. methodsmethods include include different different constructions constructionsmethods as as well wellinclude as as different constructions as well as inactiveinactive in in both both cell cellbothboth [3]. [3]. cells. cells. A A laser A Alaser black black then then spot scansspot scans indicates indicates the thesegments thatsegments that the the or gene gene or hyper-surfaces hyper-surfaces is is class.class. These These and and the connecting connecting the resulting resulting branches topo- branches topo- can can be be HD HD line line differentdifferent classification classification techniques. techniques. differentdifferent classification classification techniques. techniques. microarraymicroarray and and determines determinesinactiveinactive the in thein expression both both expression cell cell levels [3]. [3]. levels A A of laser laseroflogicallogical then then structurescans scans structure the the formssegments formssegments a a“biological” “biological”or or hyper-surfaces hyper-surfaces organism, organism, and and the the resulting resulting topo- topo- InIn this this paper, paper, the the focuses focuses are are to to develop developInIn this this two two paper, paper, the the focuses focuses are are to to develop develop two two eacheach gene gene according accordingmicroarray tomicroarray tothe the intensity intensity and and of determines determines ofthe the color color and the the and expression expressionwhichwhich can levelscan levels be be used of of usedlogical forlogical for classification. classification. structure structure formsOne forms One special a aspecial “biological” “biological” organism, organism, newnew techniques techniques for for developing developing BPR BPRnewnew algorithms algorithms techniques techniques and and for for developing developing BPR BPR algorithms algorithms and and both cells. A black spot indicatesis givenis given that a numericthe a numeric gene value. iseacheach value.class. Eachgene gene Each accordingThese sample according sample connecting istodefined is to thedefined the intensity intensity as branches ascharacteristic ofcharacteristic of the thecan color color be HDof and and of BPR line BPRwhichwhich is thatis that can canit requires it be berequires used used only for onlyfor a classification. classification.small a small One One special special applyapply them them to to DNA DNA microarray microarray data dataapplyapply for for cancer cancer them them to to de- de- DNA DNA microarray microarray data data for for cancer cancer de- de- inactive in both cell [3]. Aa laser sequencea sequence then of scans of numerical numerical theisis given givensegments values values a a numeric numeric of or of gene hyper-surfacesgene value. value. expression expression Each Each sample sample andnumbernumber the is is resulting defined of defined of samples samples as as topo- ascharacteristiccharacteristic as opposed opposed to totraditional of of traditional BPR BPR is is pattern that that pattern it it requires requires only only a a small small tection.tection. We We aim aim to to build build HD HD topological topologicaltection.tection. formations formations We We aim aim to to build build HD HD topological topological formations formations microarray and determines thelevels. expressionlevels. In In recent recent levels years, years, ofaa sequence DNA sequencelogical DNA microarray microarray of structureof numerical numerical technology technologyforms values values a “biological” of ofrecognition generecognition gene expression expression organism, algorithms. algorithms.numbernumber In In recent of of recent samples samples years, years, as as BPR opposed opposed BPR has has to to traditional traditional pattern pattern (skeleton-like(skeleton-like structures) structures) and and pattern pattern(skeleton-like(skeleton-like recognition recognition structures) structures) and and pattern pattern recognition recognition each gene according to the intensityhashas provided of provided the color a apromising and promisinglevels.levels.which In In tool recent recenttool can to tobe years,determine years, determineused DNA DNA for the classification.microarray microarray thebeenbeen used technology technology used One successfully successfully specialrecognitionrecognition in invoice voice recognition algorithms. algorithms. recognition [9], In In [9], iris recent recent iris years, years, BPR BPR has has both cells. Aschemes. blackschemes. spot We We indicates propose propose that a a new new the approach approachgeneschemes. isschemes. to toclass. the the We WePHC, ThesePHC, propose propose connecting a a new new approach approachbranches to tocan the the be PHC, PHC, HD line is given a numeric value. Eachdiagnosis samplediagnosis is and defined and prognosis prognosis ashashascharacteristic provided ofprovided of different different a a promising promisingofcancer cancerBPR types istypes that tool tool itrecognition to torequiresrecognition determine determine only [10], [10],athe the small and andbeenbeen facial facial used used recognition successfullyrecognition successfully [11]. [11]. in in voiceBPR voice BPR recognition recognition [9], [9], iris iris inactive in bothwherewhere cell elements elements [3]. A oflaser of the the then same same scans class class theare arewhere topologically topologicallysegments elements or of hyper-surfaces the same class and are the topologically resulting topo- a sequence of numerical values[4-7].[4-7]. of gene expressiondiagnosisdiagnosisnumber and and of samplesprognosis prognosis as of opposedof different differentmethods tomethods traditional cancer cancer include include types types pattern different differentrecognitionrecognition constructions constructions [10], [10], and and as facial asfacial well well asrecognition recognition as [11]. [11]. BPR BPR microarray andassembledassembled determines as as the nodes nodes expression in in a a HD HD levels space space ofassembledassembled and andlogical are are con-as as con- structure nodes nodes in in forms a a HD HD a space space “biological” and and are are organism, con- con- levels. In recent years, DNA microarray technology[4-7].[4-7].recognition algorithms. In recentdifferentdifferent years, classification classification BPR hasmethodsmethods techniques. techniques. include include different different constructions constructions as as well well as as each gene accordingnectednected to by by the means means intensity of of nearest nearest of the color neighbor. neighbor. andnectednectedwhich by by means means can be of of used nearest nearest for neighbor. neighbor.classification. One special has provided a promising tool to determine the been used successfully in voice recognitionInIn this this paper, [9], paper, iris thedifferentdifferent the focuses focuses classification classification are are to to develop develop techniques. techniques. two two is given a numeric value. Each sample is defined as characteristic of BPR is that it requires only a small diagnosis and prognosis of different cancer types recognition [10], and facial recognitionnewnew techniques techniques [11]. for BPR for developing developingInIn this this BPR BPR paper, paper, algorithms algorithms the the focuses focuses and and are are to ton develop develop two two botha sequence cells. A of black numerical spot indicates values of that gene the expression gene is class.number These of samples connectinginner as opposed branches product to cantraditional in R be, HD respectively. pattern line It can be shown algorithms use the same name set, a different struc- [4-7]. methods include different2.2. constructionsM METHODOLOGYapplyETHODOLOGYapply them them as to towell DNA DNA asnew microarraynew microarray techniques techniques2.2. data M M dataETHODOLOGYETHODOLOGY for for for developing developingcancer cancer de- de- BPR BPR algorithms algorithms and and inactivelevels. In in recent both cellyears, [3]. DNA A microarraylaser then scanstechnology the segmentsrecognition or hyper-surfaces algorithms.that In and recent the resultingyears, BPR topo- has ture is constructed. Figure 3 shows an example of different2.12.1 classification Data Data Sets Sets techniques.tection.tection. We We aim aim2.12.1 toDatato buildDataapplyapply build SetsHD Setsthem HD them topological topological to to DNA DNA formations microarray microarray formations data data for for cancer cancer de- de- microarrayhas provided and determinesa promising the tool expression to determine levels of the logicalbeen used structure successfully forms in a voice “biological”x recognitionx organism, [9], irisq<0 the development of and the contrast between both In thisinner paper, product the focuses in (skeleton-liken(skeleton-like, are respectively. to develop structures) It structures) two cantection.tection. be and shown We Weand pattern aim aim patternalgorithms to to build recognitionbuild recognition HD HD use topological topological the same1 name formations formations set, a different struc- 2 eachdiagnosis gene according and prognosis toTheThe the BPR intensity BPRof different technique techniqueR of the cancer color introduced introduced typesand inwhich inrecognition thisThe this canpaper paperBPR be [10],technique used and for facial introduced classification. recognition− in this One [11]. paper2 special BPR2 proposed algorithms in R . The Segment Connec- new techniques for developingschemes. BPRschemes. algorithms We We propose propose and a new a new approach approachD to to= the the PHC, PHC,x 2 x 1 + q 0 q x 2 x 1 is given a numericisis athat a value. general general Each one, one, sample and and can is can defined therefore thereforen asisis be abecharacteristic a general general applied(skeleton-like applied(skeleton-like one, one, to to of and and BPR structures) structures)ture can can is is that therefore therefore constructed. it and and requires patternbe be pattern− applied applied Figure only recognition recognition a small3 to to shows≤ an example≤ − of [4-7].apply them to DNA microarrayinner productwherewhere data elements in for elementsR cancer, respectively. of de-ofmethods the the same same It include class can class arebe differentare shown topologically topologically constructionsalgorithmsx x use as the well same asq> namex set, ax different struc-tion algorithm provides a more compact structure n n schemes.schemes.n We We propose proposethe development a a new new approach approach of2 and to to the the the contrast PHC, PHC, between2 both1 a sequence of numericalanyany data data setvalues setx in ininner a of ax specified1 specified gene product expression format.inner format. in q< productanyany, An An0respectively.numberdifferent data data applicable applicable in set setR of, inclassificationsamples in respectively. It a a can specified specified be asopposed shown techniques. It format. format. can  toalgorithms be− traditional An An shown applicable applicable usealgorithms pattern the same use name the− set, same a different name set,than struc- a different that from struc- the Nodal Connection algorithm. tection. We aim to buildinnerthat HD product topologicalassembledassembled in R formationsasR,as respectively. nodes nodes in in a HDa It HD can space bespace shown and and are arealgorithmsture con- con- is constructed. use the2 same Figure name 3 shows set, a different an example(1) struc- of levels. In recentdatadata years, set set DNA should should microarray− be be a a nn 2 technologybyby m2m matrix,matrix,datadatarecognition set setwhere where where should should elementsn elementsn algorithms. be be a a nproposed of of thebyby the In same samem recent algorithmsmatrix, class class years, are are where in topologically topologically BPRR .n The has Segment Connec- D = thatxthat2 x S1S + thatq GG0 q x 2 Inn x thiss1s paper,SS the focusestureGtheture development is are constructed. is to constructed. developture ofss Figure istwo and constructed. Figure the 3 contrastshows 3 shows Figure an between an example example 3 shows bothNamely, of of an example the sum of of all minimum distances in U is (skeleton-like structures) andnected patternnected−x− − byxinner−1 by meansrecognition means product of ofassembled nearestassembled in nearestq2 x new22 x x1 techniques1q<0 forq< developing0 theproposedthe developmentBPR development algorithms algorithmsture of is andof in constructed. andR the. the contrast The contrast Segment Figure between between 3 Connec- shows bothsmaller both an example for the Segmentof Connection algorithm. The schemes. We propose D a− new= approachx x x2 1 tox11 the+ PHC, nq nectednected−q<0 0q by by means meansx 2 oftrainingx of1 nearest nearest set neighbor. neighbor. and U be an empty set.2 Without loss2 of referencereference [3]. [3]. (A) (A) DNA DNA from from a a cell cellreferencereference isdiagnosis is extracted. extracted. [3]. [3]. and (A) (A) prognosisofof DNA DNA objects, objects, from from of such such a adifferent cell cell as asinner is is genes, genes, extracted. extracted. cancer − product images,− images, types inofof orR or recognition objects, objects, frame frame,− respectively. data. data.such such2 [10],(1) as as It2 genes, andgenes,than can facial that be images, images, shown recognition fromproposed or or the framealgorithms frame Nodal [11]. algorithms data. data.proposed BPR Connection use thein algorithms2 same. algorithm. The name Segment in R set,. The a Connec- different Segment struc- Connec- D = xD− = x 2 2 +apply2 xq22 ≤ them0x 1 ≤ q to+ q DNAx −0 microarrayx q proposedtionx 2 data algorithmx 1 algorithms for cancerthe provides development de- in R a.R more The of Segment compact and the Connec- structure contrastalgorithms between are both performed on each training class, where elements of theD same=  classx x are2x 2 topologically2x 1 1+2.xq2. METHODOLOGY Mx0q>ETHODOLOGY1 q x 2 x 2x 1Namely,2q<xgenerality,1 01 the sum let x of1 and all minimumx 2 be thedistances two closest in elementsU is EachEach DNA DNA segment segment has has a a corresponding correspondingEach[4-7]. DNA “spot” “spot” segment on on However,However, has a correspondingLet for for ourS ourbe purposes, purposes,that the  “spot” set− we we− of on apply− applyM However, thiselements thismethodstection. technique technique−≤ for ofinclude We≤ our ≤ the aim−≤ purposes,different to− build−≤ we HD constructions≤ applytionthan topologicaltion algorithm−this that algorithmture technique from as isformations provideswelltion constructed. the provides algorithmas Nodal a more a Connection Figure more provides compact compact 3 shows a2 algorithm. structure more structure an compact example structure of assembled as nodes in a HD spacex x x andx 2 are con-x − xq>2 q>x 2 x 2x 2 x 1q>in (1)S.x Remove2 x 1 x and xproposedfrom S algorithmsand add them in R to.U Thehence, Segment they Connec- produce two “biological organisms,” thethe microarray microarray chip. chip. (B) (B) When When comparing comparingthethe microarray microarray gene gene chip. chip.toto DNA DNA (B) (B) When microarrayWhen microarray comparing comparingU data.2.1 data.2.1 Data  gene gene Data−D2 Sets= Sets toto  DNAdifferent DNA−x 2 microarray microarray x 1 classification2 + q−1 data. data.smaller2.2.0 M M techniques.ETHODOLOGYETHODOLOGYq for− thex 2 Segment x 1 Connectionthan2 that from algorithm. the Nodal The Connection algorithm. training set and be  an− empty set. Without(skeleton-like loss − of structures) andthanNamely,than pattern thatx 1the that the(1) from development recognition sumfrom the of the Nodalall Nodal minimum of Connection and Connection the distances contrast algorithm. algorithm. in U betweenis both nected by means of nearest neighbor.Let S bex thex 1 set of −M elements q<0 ofso≤(1) the that(1)≤ U = − . Then,tion we algorithm select the provides next element a more compactone structure structure for the Training Normal class and one x x 2In this paper,algorithmsq> the focusesx 2 arex are1 performedx 2 to develop on two each training2 class, expressionexpression levels, levels, DNA DNA from from each eachexpression cellexpression cell is is labeled labeled levels, levels, DNA DNAgenerality, from from each each let x cell cell1 and is isx labeled labeled2 Thebe The the− BPR two BPRLet  technique closest techniqueSschemes.2.122.1be Dataelements Data the2 introduced We introduced Sets set Sets propose of M in in this aelements newthis paperNamely,smallerpaper approachNamely, ofproposed for the the theto sum theNamely, Segmentsum of PHC,algorithms allof allminimum the Connection minimum sum in ofR distances all. algorithm.distances The minimum Segment in U in TheisU distancesis Connec- in U is trainingLetD setLet=S andSbeUbe thexbe2 the anset x setempty1 of− ofM+ set.qMelements Withoutelements0 q of lossx of thex of2 theSx−1 (1) than that from the Nodal Connectionfor the algorithm.Training Cancer class. withwith different different fluorescent fluorescent tags tags and and hybridizedwith hybridized different on on fluorescentin S. Removetags andx hybridizedand x from on S andnew add techniques them to U forhence, developingn 3 in theyso BPRsmaller produce that algorithms its for twodistance thesmaller “biological Segment and to for the the Connection line organisms,” Segment segments algorithm. Connection in The algorithm. The 2.22.2 BPR BPR Algorithm Algorithmtraininggenerality,training1 is with setwithis a setlet and general a2Hyper-Dimensional Hyper-Dimensionalx and generaltrainingUand beU one,bex an2.2−2.2 one, set anemptybeinner and BPRwhere BPR and theempty and can two set.U productAlgorithm Algorithm can Lineelements LinebeThe set.The therefore closestWithoutan therefore Without BPR≤ BPR inempty elementsof withR with≤ loss technique betechnique the, set.loss be respectively. Hyper-Dimensionalapplied Hyper-Dimensionalof same applied− Without ofsmalleralgorithms introduced classintroduced to to loss Itare fortion can oftopologically the are in inalgorithm be SegmentNamely,Line Linethisthis performed shown paper papertheConnection providesalgorithms onsum each of a all algorithm. more usetraining minimum the compact same class, The distances name structure set, in aU differentis struc- 2. METHODOLOGYx 1 1x xLet22 S be theq> set ofx 2MU xiselements1 minimal; of currently the U simply contains a line thethe microarray microarray chip. chip. thethe microarray microarrayso that U chip. chip.= . Then, we  select− theapplythatassembled next them element to as DNA nodes one microarray− structure in a HDalgorithms for data space the for Training and cancer arealgorithms are performed de-Normal con- ture are class on is performed andconstructed. each one training on Figure each class, training 3 shows class, an example of SegmentsSegments generality,in generality,Sx 2. Removeanyany let data letx datax generality,xand setand1 setandx inxSegmentsSegments inxbe afrom2 letspecified a thebeis specifiedx thetwoS a1 and generaland two closest format.x add closest2 format.be one,them elements the An elementstwo and Anto applicableU closest applicablecanalgorithmshence, therefore(1) elements theythanare be produce thatsmaller applied performed fromtwo for to the the on “biological NodalSegment each training Connection Connection organisms,” class, algorithm. algorithm. The training11 2 set2 andisU a generalbe an empty one,forsegment set. and the Without Training can connecting therefore loss Cancer of bex class.1 appliedto x 2. Again, to we remove x 3 2.1 Data Setsx 3 in S so that its distancex 1 to the linetection.nected segments We by aimmeans in to build of nearest HD topological neighbor.hence, they formationshence, produce they two produce “biological two organisms,” “biological organisms,” insoSin that. RemoveS.dataU Removedata= setx setin shouldxand.SS should1 Then,.andn Removex bexfrom we be2 afromanynany aselectSxSn1Sand databyandS data thebyandx addxmM set set2 nextG addmfromx them inmatrix,G in elementthemmatrix, a aS to specified specifiedandU towhere whereU addhence,oneq element developmentmmxGG of2 ofisis nearest nearestx the the1 number number of and the contrast between both from S and add it to xU inx the followingwherex 2  fashion:q< elements0 of the same class are topologicallyU = (2) EachEach DNA DNA segment segment has hasany a corresponding a datacorresponding set in a “spot” specified “spot” onU onis format.However, minimal;However, AnU 1 for currentlyis applicable for minimal;our our purposes, purposes,U currentlysimply we we− applycontainsU apply simply this this a technique line techniquecontains ahence, line− for theyx the produce Trainingx than2 thattwo Cancer from“biological class. the Nodal organisms,” Connection algorithm. BiomimeticBiomimetic Pattern Pattern Recognition Recognition (BPR) (BPR)Biomimeticreferencereference is is a a [3]. [3].neighbor. Patternneighbor. (A) (A) DNA DNA Recognition In InU fromsegment fromorder orderis minimal; ain ato cell tocell(BPR)S connecting. buildbuild Remove is is− extracted.x extracted.currently is3 the thein a xS HD1neighbor.1neighbor. HDsotoandUx that2.1topological topological2ofsimplyxof.2 Again, Data itsobjects, objects,from In In distance contains Setsorder orderS we suchand such remove to to add a as theas build build line genes,themx genes, line3 the the segments to images, images,U HD HD topological topological in or or frame frame2(1) data. data.3∗ D = 2 assembled2 n as nodes in a HDproposed spacen and algorithms are con- in R . The Segment Connec- data set should be a n segmentby tomto DNA connectingmatrix, DNAx segment2 microarray microarrayx x where11 x connecting+tonq data.x data..0 Again,xq1 to wexx inner22 remove. Again,x 1 productx we remove in ,x respectively.3 ItNamely, can be shown the sumalgorithms of all minimum use the distances same name in U set,is a different struc- techniquetechnique constructing constructingthe athe a microarray hyper-dimensional hyper-dimensional microarraytechniquetechnique chip. chip.EachEach (B) (HD)(B) constructing constructing (HD) DNA DNAWhen When segment segmentformations, comparingformations, comparing a a hyper-dimensionalhas hyper-dimensional has itgene a itasegmentSfrom iscorresponding gene iscorresponding important importantsoSand connecting thatG add toU tox (HD) (HD)innerU1“spot” understand= “spot” understand− itisx tox3 productminimal;1 formations,Uformations, onto on.1in Then,x 2 points theHowever, points.sHowever, in Again,2 currently followingRweLet≤ itand it, and select isrespectively. is weS for≤ importantfor importantbe removeU our fashion:theoursimply the− purposes, nextpurposes,x set to to It 3 containselement understand understandcan of3 tion weM beweapply shown applyR algorithmelementsaone line points points this this structurealgorithmstechnique and andtechnique of provides the for usethe a more Trainingthe same compact Normalname structure set, class a different and one struc- − −U = x 2 nected byThe means BPR(2) of technique nearestwhere neighbor.xintroduced3∗ is determined in this paper based on the proposed DNA microarraysexpressionexpression process levels, from levels, DNA DNAis from the from each number each cell cell of is samplesis labeled labeledfromS andS mandx G thatis addfromx 2 the itSU numbertoandUtrainingin add theq> it following set to xU and2 inUxthat the1 fashion:be following an empty fashion: set. Withoutfor theture Training loss is of constructed. Cancersmaller forclass. Figure the Segment 3ture shows is constructed.Connection an example algorithm. Figure of 3 shows The an example of Figuregeometricgeometric 1. body body to to mimic mimic a a biological biologicalgeometricgeometric system systemthethe body bodymicroarray microarray for for to tolineline mimic mimic chip. segments chip. segments a a (B) (B) biological biologicalfrom When inWhen in HD HDx 3 comparingand comparing systemsystem space.in space.  addS−x sosegment2 Let foritfor Let that tox gene gene3∗x x lineline itsbe connectingbein distance an segments segmentstheantoto element elementDNA following DNAx to 1 microarray in inmicroarray theto HD HD−x fashion: line2. space.Again, space. segments data. data. Let weLetthan removexx inbebe that an anx 3 element elementfrom the Nodal Connection algorithm.   x 1is ax 3 general one, andalgorithms: can(1) therefore be applied to algorithms are performed on each training class, reference [3]. (A) DNA fromwith awith cell different isdifferent extracted. fluorescent fluorescentof tags objects, tags and and suchhybridizednn hybridized asx genes,xxx on on images, or frameU = data.generality,nnnn x x let x 1 and x 2(2)nbe the two closestnn elementsthe development of and thethe contrast development between of both and the contrast between both classification.classification. BPR BPR was was first first introduced introducedclassification.classification. by byexpressionexpression Shoujeu Shoujeu BPR BPR levels,in levels,in was wasRR first first DNAand DNAand introduced introduced1 from from1 22 bebe each eachU a2.2 a by byis line2.2 line cell cell ShoujeuBPR Shoujeuminimal; segmentBPR segment isfrom is Algorithmlabeled labeled AlgorithmS currentlyinin inand inxRRRx add with1andxandasinner1as with itxU well.3 Hyper-Dimensionalwell. to11simply Hyper-Dimensional productU22 bebeinx 1 thea a containsx lineinq< line3 followingR0 segment segment, respectively.x a LineNamely, line Linefashion:x in1 in R the Itasas can sum well. well. be ofq< shown all0 minimumalgorithms distances use intheU sameis name set, a different struc- wherewhere x 3∗ isis determined determinedLet S bebased based the on setx x 12 anythe the ofx x 3 3∗proposed dataM proposedU =elements set in2. algorithms: a M specifiedETHODOLOGY of the format. An(2) applicableConnecthence,x to they the closest produce2 two “biological organisms,”2 Each DNA segment has a corresponding “spot”thethe microarray on microarraywithwithHowever, different different chip. chip.The for fluorescent fluorescent minimumour purposes, tags tags distance, weand and apply hybridized hybridizedDD, this fromU technique=U onxThe onx= toin− minimum aSx.thatx Remove xx is2 distance,x 21 andIa.xD(2)2,,from(2)Nodal from from−Sxx Connection:and toto add a a2x themxproposed2 isis to U algorithms3ture is in constructed.R .proposed The Segment Figure algorithms Connec- 3 shows in R an. example The Segment of Connec- WangWang in in 2002 2002 in in Beijing, Beijing, China China and andWang was was in derived derived 2002 in Beijing,The minimum China and distance,segment wasSegmentsSegments derivedD, connecting from= tox 2.2 a212.2to1x BPR1x BPRx∗212.is Again, Algorithm+ Algorithmxq2 D wex0=3∗n remove withq with Hyper-Dimensional Hyper-Dimensionalxsmallerx m2x2 3 xx 11 for+ the11 q22 Segment0 Linen Lineq Connectionx 2 x 1 algorithm. The algorithms: training set and U be anx 2 emptydatax 3∗ set set.3 should Withoutx 1 bex a lossS ofby G matrix, where s one structure for the Training Normal class and one the microarray chip. (B) When comparing gene to DNA microarray data.where x ∗ is determined 2.1so based that Data−U on Sets = the 1 proposed. Then,3≤− wenode≤ select− of−− the the line next segmenttion element algorithm≤x 1≤ to x provides2.− In this a case, moretion algorithmcompact structure provides a more compact structure fromfrom the the Principle Principle of of Homology-Continuity Homology-Continuityfromfrom the the (PHC) Principle Principle (PHC) determineddetermined of ofthethe Homology-Continuity Homology-Continuity microarray microarray based based onchip. chip.on whether whether3 (PHC) (PHC) its its projection projectiondetermineddeterminedx x is is insideU inside based based= x on on2 whether whetherq> x its itsx projection projectionalgorithmsx x (2) is is are inside inside performedq> x onxthe each development training class, of and the contrast between both Figure 1. DNA microarraysgenerality,from processS let2.2.1and fromx 2.2.11 and add Training Trainingx it2 be toisSegmentsSegments theUProcess: the2 Process:in two number the closestBy followingx xBy of providing elementsx providingx samples1  fashion:2 a and new a new1 m2 q thisx 2 technique for−x the1 Training Normal− class(3) and one x 3∗ is either x 1 or x 2x .2  −− approachapproach x x to to the theConnect PHC, PHC,x x we wex to connect connect the− closest nodes nodes smaller from fromx the for the the Segment Connectionsmaller for algorithm. the Segment The Connection algorithm. The InIn other otherthe words, words, microarray there there chip. is is a a gradual gradualInIn connection connection other other words, words,Segments be- be- there thereprojectionprojection is is a a gradual gradual of ofIa.x x Ia.ontoonto connectionNodal connectionnodeNodal the the of Connection: line the line Connection:trainingalgorithms:Ia. be- be- line segment.. segment..Nodal segmentprojectionprojection setanyConnectsegment and Connection: NoteConnect Note dataxU1 2 oftobe of thatx connectingthat setx3x anx 2xto3∗. inonto3onto empty In theto a this−training thespecified closestthe the set. case, closest 1 line lineto∗ Without set3 segment..2 segment.. format.. andfor Again,x U theloss2 be An Trainingweif ofNote Note an applicable remove emptyx − that3 that Cancerx set.23 Without class.x 3than lossx 1 that of from the Nodal Connection algorithm. BiomimeticBiomimetic Pattern1.2the1.2 Pattern Biomimeticmicroarray Biomimetic Recognition Recognition chip. Pattern Pattern (BPR) (B) (BPR) Recognition Recognition When isx 3 isaina comparingneighbor.S neighbor.so thatits gene In Indistance order orderto to DNA to to build the build microarray line thethe segments HD HD data. topological topological in  − ≤ (1) − x 1 ifnodex 3 of thex 1 linenode< segmentx offrom3 same thesamex S2 linex and class classto segmentx add in. in In it a a this tox HD HD1Uto case,in spacex space2 the. In following thisby by means case,means fashion:algorithms of of nearest nearest aren performed onalgorithms each training are performed class, on each training class, tweentween any any two two elements elements that that belong belongtweentween to to the any theany same same two two elements elementsandand that thatdenotedenote belong belongnode thex the3∗ to tois usual usual of the the either the Euclideansamegenerality, same Euclidean linex 1 segmentor x norm2data letnormand.andx x1 set and1 andandtodenote shoulddenote1 thex thex22.be In2 the betheConnectS thisthegenerality, a usual usual twon case,S closestEuclideanx Euclideanbyto letm thexG elements1matrix, closestand norm normMinnerx 2 and and wherebe product the the twons in closestR , respectively.Namely, elements the It sum can of be all shown minimumalgorithms distances use in theU is same name set, a different struc- techniquetechnique constructing constructingexpression a hyper-dimensional a hyper-dimensional2.2.1 levels, Trainingx DNA3 = from Process:(HD)U (HD) eachiswhere minimal;formations, cellBy formations, providing isx−∗Ia. labeled currentlyis itNodal determined is it a importantis new important Connection:U−simply based to to understandLet contains(3) understand onbe theIIa. a pointsthe line proposed pointsSegment3 set and of and Connection:elements ofConnect the x 3 to the BiomimeticBiomimetic·· Pattern Pattern∗·· Recognitionx Recognition2 ifx x 3 (BPR) (BPR)3 x 2 xx is∗ is··is ax ax either3 neighbor.neighbor.x··1x 1 or Inx In2. order order− to to build build− the the HD HD topologicalhence, topological they produce two “biologicalhence, they organisms,” produce two “biological organisms,” x 3∗ is3∗ eitherisx eitherinifxS1. Removeor13x xor2is. x2 the.x 1 < numberandx x 2 from ofx in samplesxSS.and Removex add and themxm1Gand tothatisUx the2 from numberS and add themsmaller to U for the Segment Connectionture is algorithm. constructed. The Figure 3 shows an example of geometricgeometric body body to to mimicwithFigure mimicapproach different a biological a1. biologicalDNA tofluorescent the microarrays system PHC, system tags wesegment for for andprocess connectalgorithms:line hybridized line connecting segments from1 nodes− segments ≤ node onfromx in31 into ofHD thex− HDthe12. space.training Again, line space. segment3 Let we set Letx remove2andxbe1beU antobe anxx elementclosest312 an. element Inx empty3 this element case, set. Without of x x loss. In of this case, x could techniquetechnique constructing constructing a a hyper-dimensional hyper-dimensionalx 3 = (HD) (HD)−2.2formations,x formations,1 BPR Algorithm− it it is is U important important= with(3) Hyper-Dimensional to tox understand1 understandone points points1 structure Line2 and and for the Training3∗ one Normal structure class for and the one Training Normal class and one x xn xnso1if thatif x xU x=of3x x objects,1x <1.if Then,<x x x suchx 33 wex x x asso selectx 12 genes,that x 2 x 1 WangWang in in 2002 2002 in in Beijing, Beijing,IIa. China ChinaIIa.SegmentclosestSegment and andx ∗ element was wasis Connection: either derivedConnection: derived ofthatxx1algorithms:xor2TheThe. Inx Connect2. thisminimum minimumConnect case,x 3x distance,3∗ distance,to3couldto the theto DxD2,,. from from x x toto a astructure.xx11xx22ture Inisis the is Segment constructed. Connection Figure (right) the 3 next shows point is an the exampleclosest point ofon the segment. 3 segment connecting xapproach2 x 3∗ x x 12 totox 2x − the1x x 21segment. PHC, ≤ Again, we connectingwe− connect remove nodesx 13 to x from2 . Again,− the we removefor thex Training3 − Cancerstructures class. usingthan two that assembling from the Nodalalgorithms. Connection In the algorithm. [8].[8]. PHC PHC assumes assumesexpression that that theline the difference segments levels, differenceII.a. DNA between in Segmentoriginal between HD from space. el-each training el- Connection:the cell Letthe line set) isx line labeledbe segment. on ansegment. theclosestConnect element segment Letelement Letu x =3u from toin= ofSthe−x xso−1 1closestx thatbe2.be In a itsunit this aelement distance unit case,vector vectorx toof∗ could the line segments in (1) geometric body to mimic a biological system forfromfrom the the Principle Principle of of Homology-Continuity Homology-Continuityclosestbeclosestx 1 , elementx 2 elementor (PHC) (PHC)a of newx of1xx element,21determined.determinedx In2. this Inx thisx t2 case,(notx 2x case,based1 basedx x1 from∗ xcould on on3∗Figurecould the whether whether 3. its itsDevelopment3 projection projection is is of inside inside thethe skeleton-like development of and the contrast between both 1.2 Biomimeticn Pattern Recognition fromIIa.x n1SegmentS andif Ia.same addx 3 Connection:Nodal itx class tox 1Ux− Connection:1< inin− theaxfrom33 HD followingConnect xS space2 andConnectq<– addIfx by fashion:03 theto itmeansx to3 projection thetoU the ofin closestthe nearest of followingx 3 lies fashion: outside x1x2, Namely, the sum of all minimum distances in U is with different fluorescentto x tags2. and hybridized on U is minimal; currently Ux x 22 simplyx x 11 containsLet S abe line the set of M elements2Nodal of Connection the (left), the next point is selected classification. BPR was first introducedementsements of byof the Shoujeu the same same[8].[8]. classin classPHC PHCR is assumes isand assumesgradually gradually x 1 x 2 that. that beIn changed. this changed.the a the linewhere differencecase, differenceoriginal segmentgoingbex x 3 going3∗ x could1=xis between training betweenfrom, inx determined from2R beorbex 1 as aset) el-x el-tox 1new well.1,2.2 to x ,on 2 the basedx the element,x or BPR2and2 the − or lineaand line new segment−qx a Algorithm on segment. new= segment.qx t = theelement,(notx element,x from proposed2 from−x Letwith Let1,x ux21 u,x u the uHyper-Dimensionalt = be(not=(notbe the(3) − fromthe− be thebe a a unit unit Line vector vectorproposed algorithms in . The Segment Connec- be 1 , 2 or aclosest newD = element, elementt of(notx1x from2.structures In this the1 case, usingthenx x 2x2 ∗x twox 1could1x assemblingis defined algorithms. as in Equation In the 3. R Biomimeticthe microarray Pattern Recognition chip. ∗ (BPR)x 2 is aif neighbor.x 3segmentnodex x 22 of In the connectingx order1x line−3 −+ segment toqx 1x build0tox xq1 theFigure−−3to. Again,xtraining HDx232. 3.In topological we thisx Development1 set remove case, and Ux be of an the empty skeleton-like set. Withoutbased loss of onsmaller its minimal for the distance Segment to Connection the current algorithm. The Wang in 2002 in Beijing, ChinaInIn other and other words, was words, derived there thereements isThe is a gradual a of minimum gradual thethe connection same connection original distance,– classIfalgorithms: be-the training isD be- projectiongradually,projectionoriginal fromx projection set)x  trainingto changed.ofon of ax theoriginal3 ofxx lies1 xxontosegment set)Segments2 onto outsideisgoing training on the− the the line fromfromx line ≤ segment1 set)−x segment..x2, 1x segment..x on toxto3 the− from x x Note segment.and x Note11 ≤ thatqq = that=2 from≤ x x x∗1−x x x ,1, u u x be3 tion the 3 algorithm provides a more compact structure ements of the same class isoriginalto gradually2. training changed.be set)x 1 on, x going2 thexorx segment a from newx element,11 fromx Nodalto 2x 2x 1x Connectiontandq>(notFigure fromx (left), the2x shows11 theWeDevelopmentFigure hownextbe also the pointthe defined 3. projectionDevelopment is of selected the two skeleton-like ofdifferentx of the skeleton-like extensionalgorithms pos- sibilities, are performed whereU can on each training class, technique constructing a hyper-dimensional (HD) formations, from3∗Uis=S either2 itand is addimportant1 or it to2.U tostructuresin understandFigure the2FigureUgenerality, following(2)=−− 3.using1 points3.Development two fashion: let and assemblingx 1 and ofx 2 thebe algorithms. theskeleton-like(2) two closest3 In thestructure. elements In the Segment Connection (right) the from the Principle of Homology-Continuitytweentween any any two two (PHC) elements elementsdetermined that that belong belong based to to the on thethen whether same samex IIa.3 to itsisx projectionSegmentand2. definedand denotetodenote is asConnection:x inside2 in. theConnect the Equation usual  usual− Euclideanx EuclideanConnect 3. xto2 x thex ∗ norm closest normx 3 andto and the the the− x 2 x ∗ than that from the Nodal Connection algorithm. InIn other other words, words, there there is is a a gradualIa. gradualtoNodal–x 2If. connection connection the Connection: projectionoriginal be- be- ofprojection trainingprojectionx 2.2.13 lies set)Training3 outside of of onx onto the3ontox Process:based1x segment2 the the, on line lineBy its from segment.. segment..minimalprovidingx 1 distancestructures Notebe Notea(1) new composed3 that that to using the current of two nodes assembling which algorithms.can connect In to the more than one node geometric body to mimic a biologicalx 2∗·x ·1 system·· for line segmentsx inif HDx space.x structuresNodalcan Letstructures< liexx Connectionbe inside usingx an using element two (A) two(left), assembling or assembling outside the next (B). algorithms. point algorithms. is selected In theIn the hence, they produce two “biological organisms,” the line segment. LetFigureu = 2− showsbe how a unit the vector projection of x 13 3 1 in3 S. Remove2 Figurex 1 and 3. xDevelopment2 from S and of add the themnext skeleton-like to pointU is the closest point on the segment. [8]. PHC assumes that the difference between el-tweentween any any two two elements elements that that belong belongnodex 2 x 1closest of to to the the the line element same same– segmentIf theofx xn projectionxand1=andx1xto2. Inx 2denote.denote this In of this case,x the3 the case,xlies usualx usualx∗ outsidecould Euclidean Euclideanx1x2, norm normn and and(3)Namely, the the the sum of all minimum distances in U is – If–then theIf thex projection3 projectiontoisx defined.approachboth ofLet3x of cells.3 asliesS3 A tolies inbe black outside the Equation theoutside spot PHC, setstructure.x indicates 1x we of2 3.1,−32 connectM that, Inx 1elements the the x gene nodes3 Segment− is ofclass. from Nodal the Connection These thex 1 Connection connecting (right) branches (left), the can the be next HD pointline is selected classification. BPR was first- If introduced the projection−  bywhere Shoujeu of x 3∗ 2 liesisin· · determinedoutsideR∗ and·· x x 2 1 x based2 if where, bethen ax U on3line x = 3∗ thex –isNodal segment2is definedIf proposedNodal determinedbased the Connectionx so3 projectionin Connection onthat Rx its1U basedasfrom minimal= (left),well. of (left),S on x(“Multi-.(2)3the distance Then, thelies the next proposed nextinside we Lateral”),point to select pointthex is1 currentx theselected is2 or, selected next U can element containone nodes structure which for can the Trainingonly Normal class and one ements of the same class is gradually changed. going from x to x canand lieq inside= x (A)x or∗, u outsidebe the (B). structuresx 2 using two assembling algorithms. In the 1.2 Biomimetic Pattern1 Recognition2 x 3∗ isbethen eitherx 1 1x,x 3x 12ororisxtheninactive adefined2. newx 3 in element, as bothisU in defined cell Equationx [3].t next(not A as laserx point from− in 3. then Equationx ≤ is the2 scans thex ∗ closest the− 3. segments point or onsmaller hyper-surfaces the segment. for the and Segment the resulting Connection topo- algorithm. The Wang in 2002 in Beijing, China andthenFigure was−algorithms:x 3 derived 2 showsis–training definedIfsame howThe the set projectionthe classminimum as∗ and projectionin in Equation abe of distance,algorithms: HD anx of3 empty spacelies 3.3 D outside set., bystructure.then frombased Without meansbasedx31x x on2to In, of on itsloss a the nearest itsminimalx of1 Segmentxbased minimal2 is distanceon Connectiondistance its minimal to theto the (right) currentdistance current the to the currentfor the Training Cancer class. – Ifas the in projectionEquation 3. of∗ Figurex∗3 liesmicroarray 2 inside shows andx 1howx determines2, the projection the expression of levels x  3 of in logicalSNodalsoconnect that structure Connection its to distance forms a one a (left), “biological”node to the the (“Sequential”). line next organism, segments pointWe is in selected also defined two different extension pos- In other words, there is a gradual connection be- projection of x onto the line segment..originalFigurex if 2 Notetraining showsx FigureIIa. thatx how set)Segment 2< onthe showsx the projection segmenthowConnection:x the of projection fromx 3 xConnect1 of x 3 x 3 to thealgorithms are performed on each training class, fromBiomimetic the Principle Pattern of Homology-Continuity RecognitionFigurecan (BPR) lie1 2 inside(PHC) showsis agenerality, (A)3thenneighbor. howdetermined or1x the outside3 let projectionis Inx 13 baseddefinedand (B).orderx2 on2 of tobe as whetherx buildthe3 in two Equationstructure.next the its closeststructure. projection HD point 3. elements In topological is theIn the isstructure. the Segmentinside closest SegmentDevelopmentpoint In Connection the Connection on Segment the of (right)segment. the (right) Connection skeleton-like the the (right) the tween any two elements that belong to the same and denote thethen usualcanx 3lie Euclidean= inside Ia.(A) norm Nodalor andoutside−each theConnection: genewhere (B) according x 3∗−Connectis toIa. the determined intensityNodalx(3)3 to of Connection: the based color closestU and onisFigure minimal;which theConnectbased proposed can 3. be currentlyx on used3 to its for the minimal classification.U closestsimply distance Onecontains specialsibilities, to the a line current where U can be composed of nodes which tocanx x 2 lie. if insidex can (A)x lie orclosest inside∗ outsidex element (A) (B).x or outside of xWe1x2 also (B)..x 2 Inx thisdefined1x case,= twox x ∗+( differentcouldx hence,x extension) they(u ) produce pos-u (4) two “biological organisms,” technique[8]. PHC· constructing assumes· that a the hyper-dimensional difference∗– canIf between the lie2 projectioninside (HD) el-in (A)3FigureSformations,is.the Removeoforgiven2 lineoutsidex 23 a numericshowslies segment.x it13 (B). inside isand value. how important1x Let2 Each thexfrom1xu 2projection sample=, toS understandand is− definednext add of3be point them asx 3 a points unitcharacteristic1 tois3next theU vector and3 closest point of BPR1 is point theis that closest on it requires the point segment. only aon small the segment.  node of− thealgorithms: ≤ line segment− x 1 tonodex 2. of Innextx 2 the thisx 1 line point case,segmentstructures∗ segment is thestructure. connecting closestx using1 to−x two point2 In.x In the•1 assembling thisto on Segmentx the2∗ case,. Again, segment. Connection algorithms. we remove (right) Inx 3 the the If the projectionx 1 of x lies inside x x , one structure for the Trainingcan Normal connect class to more and thanone one node from S (“Multi- geometric body to mimic a biological– If–thensystem theIf theIf projection the projection for–so projection thatlinea of sequenceU segmentsofxbe3=x lies ofx 3 of1 liesx numerical, insidex. in2lies Then, insideor HDsibilities, outsideax values1 wenew3 space.xx21, selectx of element,2 genewhere,x Letx− the expression,xU nextx betcan1(not2 an element benumber fromelement composed of the samples1. ofMulti-lateral: nodesas opposed which to traditional The patternalgorithm proceeds such that each line ements of the same class isx 3 gradually= x 1 +(– changed.x 3 x 1)cangoing(u lie) insideu fromx 2 (4)3 (A)x 1 to or outsidex 2x and q (B).1=2 xxWe alsox 1x , u definedx 2 bex 1 the two different extension pos- IIa. Segment Connection:x 3∗ is eitherlevels.nIa. Inx Connect1 recentorNodalx years,2. Connection:x DNA3 to microarray3∗ theis eitherwhereConnect technology1fromNodaluor=x n3Srecognition2to. Connectionandnext− the add closest point. algorithms. it to is (left),U thein In closestthe recent thenext following years, point point BPR on fashion: isthehas selected segment.U ∗ then − then• original∗ trainingcan connect set) on theto more segment− than from onex 2 x node1x 1for from the TrainingS (“Multi- Cancer class.Lateral”), or can contain nodes which can only classification.In other words, BPR there was first is a- introduced gradualIf the projection connectionthen by Shoujeuthen of be- x– x 33 If inliesinprojectionis theRS inside definedso projectionand that x of 1its x as 2 x distance, ofbethen inontox a Equation3 line thelies to segment linethe insidesibilities, line 3. segment..Wex segmentsin1Wex wherealsoR2, also Noteas definedU− inwell.definedcan thatWe twobe alsotwocomposedsegment differentFigure differentdefined extensionof 3.can two nodesextensionDevelopment connect different which pos- pos- to extension ofany the previously skeleton-like pos- constructed line as closestx 2x elementx 1 = x of+(xhasx1x x12 provided.if Inx this)nodex a3( case,u promising) ofx 1 theux 3∗ x 2 x 1  −  x 3 = x 1 +(x 3 previouslyx 1) −of(u objects,) constructedu such(4) as genes,line as images, shown or in frame Figure data.  − − than that from the Nodal ConnectionxContinue algorithm.1 x with3reference inat thisleast[3].x M (A) fashion one DNA node from∗ untilContinue a of cellS the ishas extracted. segment in− been this• fashion beingconnectthen∗ an untilx to3 a oneisS caneachhas defined node connect been line (“Sequential”). as can toconnect in only more Equation connect than to a one 3. to node nodea nodebased from (“Sequential”). thatS on does(“Multi- its minimal distance to the current (1) algorithm. Namely, theU sum= of allEach originalminimum··· DNA segment training distances has set) a corresponding on (5)in theU is segment smaller “spot”4. on fromHowever,x 1 for our∗ purposes, we apply this technique element of the training set.x 2 Noticex 1 that whileFigure bothFigure 2Figure showsnot 2: 3. how belongshowsDevelopment the to projection howU as ofshown the ofthe projectionx skeleton-like in Figure 4. of a point x 3 can lie outside (Left) or inside (Right) a line  Namely, the sum of all minimumexhausted. distancesx 2 inx At3∗U theis endx M∗ ofwhere theexhausted. algorithm,u = At− the the. set endU of the1. algorithm,Multi-lateral:Lateral”), the setTheU algorithm or U1. canMulti-lateral: containproceeds3 nodes suchstructure.The which algorithm canIn the onlyproceeds Segment Connection such (right) the Let S be the set of M elements of the  tothex··· microarray2.  chip. (B) When comparingx 2 x 2.1 Sequential: gene to DNAThe microarray algorithm data. proceeds such that for the Segment Connection algorithm. The algorithms are − (M 1) segment,can liestructures inside namely (A) using or outsidetwox assemblingx (B).. (Top) algorithms. Case 1 In shows the the projecting landing outside the line segment. (Bottom) training set and U be an empty set. Without loss of smaller for the Segmentwith Connection at leastwill one algorithm. contain nodeexpression of( TheM the1) segment levels,Continuesegments DNAwill being from in contain each anthisx cell fashion is labeledeach untilsegments linexS canx hasthat only been each connect lineconnect to segment a node to a1 thatone can2 that nodedoes connect each (“Sequential”). line to anynext segment point can is the connect closest to point any on the segment. – −If the projection of 3 lies outside− 1 If2, theNodal projection Connection of x (left),lies inside the nextx x point, is selected algorithmsperformed are performed on each on training each training class, class,withhence,exhausted. different they fluorescent produce At the tags endtwo and ofhybridized “biological the algorithm, on 2.2– BPR thepreviously Algorithm set U with constructed1. Hyper-DimensionalMulti-lateral:3 line aspreviously LineshownThe1 algorithm in2 constructed Figure proceeds line as such shown in Figure generality, let x 1 and x 2 be the two closest elements element of the training set. Noticethenx x thatx3 whileis definedx both as innot Equationx 1 belongx 3 3. toCaseUx asM shown 2 shows in Figure the 4. projection landing inside the line segment. In both cases, the projection is a red hence, theyorganisms,” produce twoone “biologicalstructure for organisms,” the UTraining= 1 Normalthe∗3 microarray··· class chip.M andU one= for(5) Segmentsthen···4. based on its(5) minimal distance4. to the current in S. Remove x 1 and x 2 from S and add them to U willFigure contain 2 shows(M how1) segments the projection of x that each line segment can connectWe also to defined any two different extension pos- x 2 x 3∗ −x M∗ x 2 x 3∗ 3 x structure.M∗ dashed In the line Segment and the Connection distance (right) from the the point to the line segment as a green dashed line. x 1 2.2.1 Training Process:The algorithmBy providing proceeds a new suchThe that algorithm proceeds such that so that U = . Then, we select the next element one structurethe Training for the Training Cancer Normal class. class and onecan lie inside··· (A) or outside (B). 2.···Sequential:x =x +(x previouslyx ) 2.(u constructed)Sequential:u (4) linesibilities, as shown where in FigureU can be composed of nodes which x 2 x x x approach to3 next the PHC, point1 we connectis3 theclosest1 nodes from point the on the segment. for the Training Cancer class. with at least one node ofwith the segment at1 least3 being one node anM of theeach segment∗ line can being only− an connect• toeach∗ a node line that can doescan only connect connect to to more a node than that one does node from S (“Multi- x 3 in S so that its distance to the line segments in 1.2– BiomimeticIf thePattern projectionU Recognition= of x 3 lies··· inside samex1x2 class, in(5) a HD space4. by means of nearest x 2 x ∗ x ∗ x 2 x 1   element of the training set.element Notice that of the while3 training bothM set.neighbor.where Noticenot In belongu order that= whileto to− buildU both.as the shown HD topological innot Figure belong 4. to Lateral”),U as shown or U in Figurecan contain 4. nodes which can only U is minimal; currently U simply contains a line Biomimeticthen Pattern Recognition (BPR)··· is a  x 2 2.x 1 Sequential: The algorithm proceeds such that technique constructing a hyper-dimensional (HD) formations, it is importantWe − also to understand defined pointstwo different and extension pos- segment connecting x 1 to x 2. Again, we remove x 3 with at least one node of the segmentContinue being in an this fashioneach line until canS onlyhas beenconnectconnect to a node to thata one does node (“Sequential”). geometric body to mimic a biological system for lineFigure segments 2.sibilities, shows in HD how space. wherethe projection LetUx canbe of ana be point element composed can lie outside of nodes (Left) whichor inside (Right) a line segment,Positive (TP), True Negative (TN), False Positive x 3 = x 1 +(x 3 x 1) (u ) u (4) from S and add it to U in the following fashion: Figure 2: showselement of how the trainingthe projection set. Noticeexhausted. of that anamely whilen point At the both end(Top) x 3 canCase of the 1not shows lie algorithm, belong outsidethe projectingn to theU (Left)landingas set shownU outside or in1. the inside Figure lineMulti-lateral: segment. 4. (Right) (Bottom)The Case a line algorithm2 proceeds such classification. BPR∗ was first introduced− by• Shoujeu∗ in R and canx1x2 connectbe a line tosegment more in thanR as one well. node from S (“Multi- (FP), and False Negative (FN) values with Equation Wang in 2002 in Beijing,x 2 Chinax 1 and was derivedwill containTheshows minimum( theM projection distance,1) segments landingD, from insidex theto alinex1 segment.x2 is In both cases, thethat projection each is line a red segmentdashed line can connect to any x 1 x 3 segment, namelywherex1ux=2. (Top)x − x . Case 1 shows theLateral”), projecting− or U landingcan contain outside nodes which the linecan only segment. (Bottom) from the Principle of Homology-Continuity2 1 (PHC) determinedand the distance based on from whether the point its projection to the line is segment inside as a green dashedpreviously line. constructed(6) line We as shown also in determine Figure sensitivity and specificity in U = (2)  −  x x x 2 x ∗ [8].Continue PHC assumes in that this the fashion difference until betweenS el-hasthe been line segment.connectx Let1 u tox =3 a one2− 1 nodebex M a unit (“Sequential”). vector 3 Case 2 shows the projection landing inside theU = line segment.x 2 x 1 In both cases,(5) the4. projection is a red   ements of the same class is gradually changed. going from x to x and q =···− x x , u be the Equations (7) and (8) respectively. exhausted. At the end of the algorithm, the set U 1.1 xMulti-lateral:22 x 3∗  −x TheM∗1  algorithm proceeds such where x is determined based on the proposed dashed line and the distance from the pointx  to the··· line segment as a green2. Sequential: dashedThe line. algorithm proceeds such that 3∗ willIn contain other words,(M there1) issegments a gradual connection be- projection of ontothat the each line segment.. line segment Note that can connect to any algorithms: tween any two elements− that belong to the samewith at leastand onedenote node the usual of Euclidean the segment norm and being the an eachDIMENSIONS line can only connect79 to a node that does TP + TN · · previously constructed line as shown in Figure Accuracy = (6) x x 1 x 3 x M element of the training set. Notice that while both not belong to U as shown in Figure 4. Ia. Nodal Connection: Connect 3 to the closest U = ··· (5) 4. TP + TN + FP + FN x x Figure 2:x 2showsx 3∗ how thex ∗ projection of a point x 3 can lie outside (Left) or inside (Right) a line node of the line segment 1 to 2. In this case,  ··· M  2. Sequential: The algorithm proceeds such that segment, namely x x . (Top) Case 1 shows the projectingPositive landing (TP), outside True the Negative line segment. (TN), (Bottom) False Positive x 3∗ is either x 1 or x 2. with at least one node of1 the2 segment being an each line can only connect to a node that does Case 2 shows the projection landing inside the line segment. In both cases, the projection is a red TP x 1 if x 3 x 1 < x 3 x 2 element of the training set. Notice that while both not(FP), belong and to U Falseas shown Negative in Figure (FN) 4. values with Equation Sensitivity = (7) x 3 = − − (3) ∗ x 2 if x 3 x 2 x 3 x 1 dashed line and the distance from the point to the line segment as a green dashed line. TP + FN  − ≤ − (6) We also determine sensitivity and specificity in IIa. Segment Connection: Connect x 3 to the Equations (7) and (8) respectively. closest element of x x . In this case, x could TN 1 2 3∗ Positive (TP), True Negative (TN),TP False+ TN Positive Specificity = . (8) be x 1 , x 2 or a new element,x t (not from the (FP), and False Negative= (FN) values with Equation TN + FP original training set) on the segment from x Accuracy (6) 1 Figure 3. Development of the skeleton-like (6) We also determine sensitivityTP + TN and+ specificityFP + inFN to x . Accuracy is used to calculate the percent 2 structures using two assembling algorithms. In the Equations (7) and (8) respectively. – If the projection of x 3 lies outside x1x2, Nodal Connection (left), the next point is selected TP of correctly classified samples. Sensitivity shows then x is defined as in Equation 3. TP + TN 3∗ based on its minimal distance to the current AccuracySensitivity= = (6) the(7) ability of the algorithm to correctly identify Figure 2 shows how the projection of x TP + TN +TPFP ++FNFN 3 structure. In the Segment Connection (right) the a sample as cancerous, while specificity gives the can lie inside (A) or outside (B). next point is the closest point on the segment. TP – If the projection of x 3 lies inside x1x2, Sensitivity = TN (7) ability of the algorithm to correctly identify a then SpecificityTP=+ FN . (8) We also defined two different extension pos- TN + FP sample as cancer-free. Ideally, sensitivity should be sibilities, where U can be composed of nodes which x 3 = x 1 +(x 3 x 1) (u ) u (4) TN close to 100% since, clinically, we do not want ∗ − • ∗ can connect to more than one node from S (“Multi- AccuracySpecificity is used= to calculate. (8) the percent x 2 x 1 TN + FP to identify a cancerous sample as cancer-free. An where u = x − x . Lateral”), or U can contain nodes which can only  2− 1 Figure 4.ofShows correctly the different classified possible samples. extensions. Sensitivity shows Continue in this fashion until S has been connect to a one node (“Sequential”). Accuracy is used to calculate the percent optimal method will reach close to 100% for all (Top) Thethe multilateral ability of extension the algorithm shows to how correctly some identify exhausted. At the end of the algorithm, the set U 1. Multi-lateral: The algorithm proceeds such of correctly classified samples. Sensitivity shows three of the above. will contain (M 1) segments that each line segment can connect to any nodesa canthe sample abilityconnect as ofcancerous, theto more algorithm than while to two correctly specificity nodes. identify gives the − previously constructed line as shown in Figure 2.2.2 Classification Process: Two structures x x x (Bottom)abilitya The sample sequential of as thecancerous, algorithm extension while specificity to shows correctly gives how the identify a U = 1 3 ··· M (5) 4. x x x ability of the algorithm to correctly identify a are developed from the Training algorithm one for  2 3∗ ··· M∗  2. Sequential: The algorithm proceeds such that each segmentsample ascan cancer-free. connect only Ideally, two nodes. sensitivity should be sample as cancer-free. Ideally, sensitivity should be the cancer training set (UC ) and one for the normal with at least one node of the segment being an each line can only connect to a node that does closeclose to 100% 100% since, since, clinically, clinically, we dowe not want do not want training set (UN ). The resulting structures provide a element of the training set. Notice that while both not belong to U as shown in Figure 4. toto identify identify a a cancerous cancerous sample sample as cancer-free. as cancer-free. An An Figure 4. Shows the different possible extensions. Figure 4. Shows the different possible extensions.In Figure 4,optimal we see method how willthe “Sequential” reach close to 100% Extension for all basis for classification of an arbitrary node from the (Top) The multilateral extension shows how some optimal method will reach close to 100% for all (Top) The multilateral extension shows howgrows some outwardthree ofwhile the above. the “Multilateral” can expand test set, TS. We introduce a classification technique nodes can connect to more than two nodes. three of2.2.2 the Classification above. Process: Two structures nodes can(Bottom) connect The to moresequential than extension two nodes. showsfrom how a single node into different directions. which we call “BPR Proximity” where an arbitrary are developed2.2.2 Classification from the Training Process: algorithm oneTwo for structures (Bottom) Theeach sequential segment can extension connect only shows two nodes. how The resulting skeleton-like structures, one for node is classified as part of a class depending on its arethe developed cancer training from set (U theC ) and Training one for the algorithm normal one for each segment can connect only two nodes.the cancertraining training set set(UNU).C Theand resulting one structures for the providenormal a location relative to each “skeleton”. If the distance the cancer training set (UC ) and one for the normal In Figure 4, we see how the “Sequential”training Extension setbasisUN for, classification are used of for an arbitrary the classification node from the from the node to the structure of class A is closer grows outward while the “Multilateral” can expand trainingtest set, T set. We(U introduceN ). The a resulting classification structures technique provide a of an arbitrary nodeS from the test set, TS. Two than the distance from the node to the structure of In Figure 4, wefrom see a single how node the “Sequential” into different directions. Extension basis for classification of an arbitrary node from the classificationwhich techniques we call “BPR are Proximity” introduced. where Accuracy an arbitrary class B, then the node is classified as part of class grows outward whileThe resulting the “Multilateral” skeleton-like structures, can expand one for testnode set, is classifiedT . We as introduce part of a class a classification depending on its technique for the algorithm isS calculated based on the True A. Figure 5, depicts a visual representation of the from a singlethe node cancer into training different set UC directions.and one for the normal whichlocation we relative call to“BPR each “skeleton”. Proximity” If the where distance an arbitrary training set UN , are used for the classification from the node to the structure of class A is closer The resultingof an arbitrary skeleton-like node from structures, the test set, oneTS. for Two nodethan theis classified distance from as the part node of to aclass the structure depending of on its the cancer trainingclassification set U techniquesC and one are introduced. for the normal Accuracy locationclass B, then relative the node to eachis classified “skeleton”. as part of If class the distance training set forUN the, algorithm are used is calculated for the based classification on the True fromA. Figure the node5, depicts to thea visual structure representation of class of the A is closer of an arbitrary node from the test set, TS. Two than the distance from the node to the structure of classification techniques are introduced. Accuracy class B, then the node is classified as part of class for the algorithm is calculated based on the True A. Figure 5, depicts a visual representation of the Figure 2: shows how the projection of a point x 3 can lie outside (Left) or inside (Right) a line shows how thesegment, projection namely x1x2. (Top) of a Case point 1 shows x the projectingcan lie landing outside outside the (Left) line segment. or inside (Bottom) (Right) a line Figure 2: Case 2 shows the projection landing inside3 the line segment. In both cases, the projection is a red segment, namely x1x2. (Top) Casedashed line 1 and shows the distance the from projecting the point to the landing line segment asoutside a green dashed the line. line segment. (Bottom) Case 2 shows the projection landing inside the line segment. In both cases, the projection is a red Positive (TP), True Negative (TN), False Positive dashed line and the distance from the point to the(FP), line and False segment Negative (FN) as values a green with Equation dashed line. (6) We also determine sensitivity and specificity in Equations (7) and (8) respectively. TP + TN Accuracy = (6) Positive (TP), TrueTP + TN Negative+ FP + FN (TN), False Positive TP (FP), and FalseSensitivity Negative= (FN) values(7) with Equation TP + FN (6) We also determine sensitivity and specificity in TN Specificity = . (8) Equations (7) and (8)TN + respectively.FP Accuracy is usedAccuracy to calculate is used the to percent calculate of the correctly percent classified of correctly classified samples. SensitivityTP shows+ TN samples.the SensitivityAccuracy ability of theshows algorithm the= ability to correctly of the algorithm identify to correctly (6) identifya a sample sample as as cancerous, cancerous, while whileTP specificity specificity+ TN gives +gives the FP the ability+ FN of the algorithmability ofto correctly the algorithm identify to correctlya sample identifyas cancer-free. a Ideally, sensitivitysample should as cancer-free. be close to Ideally, 100% sensitivity since, clinically, shouldTP be we do not want to identifyclose a tocancerous 100% since, sample clinically, as cancer-free. we do not An want optimal method to identify a cancerousSensitivity sample as cancer-free.= An (7) Figure 4. Shows the different possible extensions. will reachoptimal close method to 100% will for reach all three close of to the 100% above.TP for+ all FN (Top) The multilateral extension shows how some three of the above. nodes can connect to more than two nodes. Classification2.2.2 Process: Classification Process: Two structures (Bottom) The sequential extension shows how are developed from the Training algorithm oneTN for each segment can connect only two nodes.Two structures are developed from the Training algorithm one for the cancer trainingSpecificity set (UC ) and one= for the normal . (8) the cancer training set (U ) and one for the normal training set training set (UN ). TheC resulting structuresTN provide+ a FP In Figure 4, we see how the “Sequential” Extension(UN). Thebasis resulting for classification structures of anprovide arbitrary a basis node fromfor classification the of an grows outward while the “Multilateral” can expandarbitrarytest node set,AccuracyT fromS. We the introduce test is set, a classification usedTS. We introduce to technique calculate a classification the percent from a single node into different directions. techniqueof correctlywhich which we call we “BPRcall classified “BPR Proximity” Proximity” where samples. where an arbitrary an arbitrary Sensitivity node is shows The resulting skeleton-like structures, oneclassified for node as is part classified of a class as part depending of a class dependingon its location on its relative to each the cancer training set U and one for the normal location relative to each “skeleton”. If the distance C “skeleton”.the ability If the distance of the from algorithmthe node to the structure to correctly of class A identify training set UN , are used for the classification from the node to the structure of class A is closer is closer than the distance from the node to the structure of class B, of an arbitrary node from the test set, TS. Twoa samplethan the distance as cancerous, from the node to thewhile structure specificity of gives the classification techniques are introduced. Accuracythen theclass node B, thenis classified the node as is classifiedpart of class as partA. Figure of class 5, depicts a visual for the algorithm is calculated based on therepresentation TrueabilityA. Figure of of 5, the thedepicts BPR algorithma Proximity visual representation method. to Mathematically, of correctly the one identify a can write the classification rule as follows: sampleBPR as Proximity cancer-free. method. Mathematically, Ideally, one sensitivity can mean since should the average be accuracy is given in percent- BPRclose Proximitywrite to method. the 100% classification Mathematically, since, rule asone follows:clinically, can mean since the we averageages do accuracy across not differentis given want in parameters.percent- A different form write the classification rule as follows: ages across differentof generalized parameters. A mean, different like form the arithmetic mean, can- Figure 4. Shows the different possible extensions. (Top) The multilateral extension shows how some Cancer if UC x UN x Class( x ) = − ≤ of generalized− mean,(9)not like account the arithmetic for percent mean, changes can- from one metric to nodes can connect to more than two nodes. (Bottom) The sequential extension shows how to identifyCancer a if U cancerousNormalC x if UUNN x sample x UC x as cancer-free. An Class( x ) = − ≤ −− ≤ not account− for percentanother. changes We fromalso onedefine metric the to algorithm performance Figureeach 4. segmentShows can connect theonly two differentnodes. possible extensions. Normal if UN x UC x (9) optimal method − will ≤ reach− (9) another. close We also to defineas: 100% the algorithm for performance all as: (Top) The multilateral extension shows how some Figure 5. The “BPR Proximy In Figure 4, we see how the “Sequential” Extension grows outward three of the above. NH classification method. The TrueNH Pos- NH Hi nodes can connect to more than two nodes. NH Hi while the “Multilateral” can expand from a single node into different itive (TP), False PositiveP = (FP), True G Pn = (11) Gn (11) 2.2.2 Classification Process: nTwo structuresn  i=1 i=1 (Bottom)directions. The The resulting sequential skeleton-like extension structures, shows one for the how cancer Negative (TN), and False Negative  (FN) values are determinedNH  based NH  training set U and one for the normal training set U , are used are developed from the Trainingwhere algorithmHi i=1 iswhere a set of one holdoutHi i=1 for percent-is a set of holdout percent- C N on the node’s{ location} relative to { } each segment can connect only two nodes. ages. We also useages. the overall We also performance use the for overall each performance for each for the classification of an arbitrary node from the test set,T S. Two each derived “skeleton.” Red nodes the cancer training set (UC )cancerand type, oneOp, definedcancer for as: thetype, O normal, defined as: classification techniques are introduced. Accuracy for the algorithm indicate cancer samples, and blue p squares signify normal samples. The is calculated based on the TruePositive (TP), True Negative (TN), training set (UN ). The resulting structures3 provide a OlineCancer segment= structureSensitivity representsAccuracy 3 Specificity OCancer∗ = Sensitivity∗ Accuracy Specificity False Positive (FP), and False Negative (FN) values with Equation (6) the skeleton-like formation forthe ∗(12) ∗ In Figure 4, we see how the “Sequential” Extension basis for classification of an arbitrary node from the (12) We also determine sensitivity and specificity in Equations (7) and (8) cancerfor training each set. cancer type. To then determine the grows outward while the “Multilateral” can expand test set, T . We introduce aperformance classification for each methodfor technique each in order cancer to identify type. To then determine the respectively. S the optimal technique,performance we use the for Overall each methodPerfor- in order to identify mance across all cancerthe optimal types, defined technique, as, we use the Overall Perfor- from a single node into different directions. whichFigure 5. weThe “BPR call Proximy “BPR classification Proximity” where an arbitrary method. The True Positive (TP), False Positive mance across all cancer types, defined as, Figure 5. The “BPR Proximy classificationDIMENSIONS4 80 The resulting skeleton-like structures, one for node(FP), True is Negative classified (TN), and False as Negativepart of aOAll class= O(Bldr depending) O(Col) O(Leuk on) O its(Liv) method. The True Positive (TP), False Positive ∗ ∗ ∗ (FN) values are determined based on the node’s 4 (13) (FP), True Negative (TN), and False Negative OAll = O(Bldr) O(Col) O(Leuk) O(Liv) the cancer training set UC and one for the normal locationlocation relative relative to each derived to “skeleton.” each Red “skeleton”.where, Bldr Ifis Bladder, the Col distance in Colon,∗ Leuk ∗ ∗ (13) nodes indicate(FN) cancer values samples, are determined and blue squares based onis the Leukemai, node’s and Liv is Liver cancer. Equations 11 training set UN , are used for the classification fromsignify thelocation normal node samples. relative The to to line each the segment derived structure “skeleton.”- 13 help of Red determine class the optimalwhere, A is technique. Bldr closer is Bladder, Col in Colon, Leuk T structure representsnodes indicate the skeleton-like cancer samples,formation forand blue squares is Leukemai, and Liv is Liver cancer. Equations 11 of an arbitrary node from the test set, S. Two than thethesignify cancer distance trainingnormal set. samples. from The the line node segment to the-3. 13 R helpESULTS structure determine the of optimal technique. classification techniques are introduced. Accuracy classstructure B, then represents the the node skeleton-like is classified formationIn this for paper, as the part proposed of BPR class methods For a fixed numberthe of cancer genes considered, training set. a (two biomimetic construction algorithms3. and RESULTS two for the algorithm is calculated based on the True fixedA. holdout Figure percentage 5, (percentagedepicts of athe visualtotal classification representation methods) were applied of to four the differ- samples used for training), and a cancer type, the ent cancer types (bladder,In colon, this leukemia,paper, the liver), proposed BPR methods proposed BPR algorithmsFor a fixed are run number numerous of times, genes considered,generating four a data(two sets. biomimetic Each data set construction contained algorithms and two each withfixed different holdout randomly percentage picked training (percentage and caner-free of the total (normal)classification and cancerous methods) (cancer) were sam- applied to four differ- H testing sets.samples The average used accuracy for training),An (cancer and) ais cancerples. type, For bladder the ent cancer, cancer the types data set (bladder, contained colon, leukemia, liver), recorded. Dueproposed to a BPRlarge algorithmsnumber of parametric are run numerous125 samples, times, of whichgenerating 103 were four cancerous data sets. and Each22 data set contained variations (numbereach with of genes,different holdout randomly percentages, pickedwere training normal. and Datacaner-free for 6688 genes(normal) was and provided cancerous (cancer) sam- cancer types), a unified metric is needed to assess testing sets. The average accuracy AbyH (cancer reference) is [15].ples. For colon For bladder cancer, the cancer, data set the data set contained the performance of the proposed algorithms. A containedn 62 samples, of which 40 were cancerous geometric mean,recorded. Due to a large number ofand parametric 22 were normal.125 samples, Data for of2000 which genes 103 was were cancerous and 22 variations (number of genes, holdoutprovided percentages, from referencewere normal. [16]. For Data leukemia, for 6688 the genes was provided Ntypes cancer types), a unified metric is neededdata toset assess containedby 73 reference samples, of [15]. which For 48 colon were cancer, the data set GH = Ntypes AH (Cancer) (10) n the performance n of the proposed algorithms.Acute Myeloid A Leukemiacontained (AML) 62 samples, and 25 of were which 40 were cancerous  Cancer =1 Acute Lymphoblastic Leukemia (ALL). We con- geometric mean, and 22 were normal. Data for 2000 genes was is calculated the for all considered cancer sidered the AMLprovided samples to from be cancerous reference and the[16]. For leukemia, the Ntypes types, where n is number of genes considered and ALL samples to bedata normal. set contained Data for 7129 73 samples, genes of which 48 were H is the holdoutG percentage.H = Ntypes We use the geometricAH (Cancerwas) taken(10) from reference [17]. For liver cancer, the n  n Acute Myeloid Leukemia (AML) and 25 were  Cancer =1 Acute Lymphoblastic Leukemia (ALL). We con-  is calculated the for all considered cancer sidered the AML samples to be cancerous and the types, where n is number of genes considered and ALL samples to be normal. Data for 7129 genes H is the holdout percentage. We use the geometric was taken from reference [17]. For liver cancer, the BPR Proximity method. Mathematically, one can mean since the average accuracy is given in percent- write the classification rule as follows: ages across different parameters. A different form of generalized mean, like the arithmetic mean, can- Cancer if UC x UN x Class( x ) = − ≤ − not account for percent changes from one metric to Normal if UN x UC x − ≤ − (9) another. We also define the algorithm performance as:

NH P = NH GHi (11) n  n i=1   where H NH is a set of holdout percent- { i}i=1 ages. We also use the overall performance for each cancer type, Op, defined as:

O = 3 Sensitivity Accuracy Specificity Cancer ∗ ∗ (12)  for each cancer type. To then determine the performance for each method in order to identify the optimal technique, we use the Overall Perfor- mance across all cancer types, defined as, Figure 5. The “BPR Proximy classification method. The True Positive (TP), False Positive 4 (FP), True Negative (TN), and False Negative OAll = O(Bldr) O(Col) O(Leuk) O(Liv) ∗ ∗ ∗ (13) (FN) values are determined based on the node’s  location relative to each derived “skeleton.” Red where, Bldr is Bladder, Col in Colon, Leuk nodes indicate cancer samples, and blue squares is Leukemai, and Liv is Liver cancer. Equations 11 signify normal samples. The line segment - 13 help determine the optimal technique. structure represents the skeleton-like formation for the cancer training set. 3. RESULTS In this paper,Results the proposed BPR methods For a fixedFor number a fixed of number genes ofconsidered, genes considered, a fixed aholdout(two biomimeticper- In construction this paper, algorithmsthe proposed and BPR two methods (two biomimetic fixed holdout percentage (percentage of the total classification methods) were applied to four differ- centage (percentagesamples used of for the training), total samples and a cancer used for type, training), the ent and cancer a typesconstruction (bladder, colon, algorithms leukemia, and liver), two classification methods) were cancer type,proposed the proposed BPR algorithms BPR algorithms are run numerous are run times,numerousgenerating times, four dataapplied sets. to Each four datadifferent set contained cancer types (bladder, colon, leukemia, each witheach different with differentrandomly randomly picked pickedtraining training and testing and caner-freesets. The (normal)liver), and generating cancerous four (cancer) datasam- sets. Each data set contained caner-free H average accuracytesting sets. H The(cancer) average is recorded. accuracy ADuen (cancer to a large) is numberples. For bladder(normal) cancer, and the cancerous data set contained (cancer) samples. For bladder cancer, the An of parametricrecorded. variations Due to(number a large of number genes, ofholdout parametric percentages,125 samples, ofdata which set 103 contained were cancerous 125 samples, and 22 of which 103 were cancerous and variations (number of genes, holdout percentages, were normal. Data for 6688 genes was provided cancer types),cancer a types),unified a unifiedmetric is metric needed is needed to assess to assess the performanceby reference [15].22 Forwere colon normal. cancer, Data the for data 6688 set genes was provided by reference of the proposedthe performance algorithms. of A the geometric proposed mean, algorithms. A contained 62 samples,[15]. For of whichcolon 40cancer, were cancerousthe data set contained 62 samples, of which geometric mean, and 22 were normal.40 were Data cancerous for 2000 and genes 22 were was normal. Data for 2000 genes was provided from reference [16]. For leukemia, the Ntypes provided from reference [16]. For leukemia, the data set contained data set contained 73 samples, of which 48 were GH = Ntypes AH (Cancer) (10) 73 samples, of which 48 were Acute Myeloid Leukemia (AML) and 25 n  n Acute Myeloid Leukemia (AML) and 25 were Cancer=1 were Acute Lymphoblastic Leukemia (ALL). We considered the AML   Acute Lymphoblastic Leukemia (ALL). We con-  is calculated the for all considered cancer sidered the AMLsamples samples to to be be cancerous cancerous and and the the ALL samples to be normal. Data for is calculatedtypes, the where for alln consideredis number of cancer genes types, considered where and n isALL number samples to7129 be normal. genes was Data taken for 7129 from genes reference [17]. For liver cancer, the data of genes consideredH is the holdout and H percentage. is the holdout We use percentage. the geometric We wasuse takenthe from referenceset contained [17]. For 181 liver samples, cancer, of the which 105 were cancerous and 76 were BPR Proximity method. Mathematically, onegeometric can mean mean since since the averagethe average accuracy accuracy is given is in given percent- in percentages normal. Data for 5520 genes was provided by reference [18]. Table 1 write the classification rule as follows: ages across different parameters. A different form BPR Proximity method. Mathematically, oneacross can differentmean since parameters. the average accuracyA different is given form in of percent- generalized mean, summarizes the microarray data sets used in this study. BPR ProximityBPR method. Proximity Mathematically, method. Mathematically, one can mean one since can the averageofmean generalized since accuracy the mean,average is given like accuracy in the percent- arithmetic is given mean, in percent- can- Cancer if UC x UNlike x the arithmetic mean, cannot account for percent changes from Several metrics have been proposed for as- sessing the accuracy write the classificationwriteClass( the x rule) = classification as follows: rule as− follows: ≤ ages across− differentages across parameters. different A different parameters. form A different form Normal if U x U x not account for percent changes from one metric to UN − x ≤ ofU generalizedCone− x metric mean,of to generalized another. like the arithmetic mean,We also like define mean, the arithmetic can- the algorithm mean, can- performance as: of the BPR algorithms. Accuracies are calculated based on the a Cancer if UC Cancer x ifUNUC x x UN x (9) another. We also define the algorithm performance Class( x ) = Class( x ) = − ≤ C−− ≤ N − not account for percent changes from one metric to Normal if U Normal x ifU UN − x x ≤ notU accountC − x foras:not percent account changes for percent from one changes metric from to one metric to verage of 100 runs. We ran 100 times in order to obtain objective N Normal if CUN − x ≤ UC − x another. We also define the algorithm performance − ≤ −− (9) ≤ another.− We (9) alsoanother. define We the alsoalgorithm define performance the algorithm performance view of the algorithm performance with different metrics since as: as: as: NH Training and Testing sets are randomly selected. Accuracy is calculated NH Hi Pn = Gn (11) (11) NH based on the remaining test set with Equation 6. Below are results NH NHi=1 NH Hi NH PHni = NH GnHi (11) Pn = GPnnNH=  G(11)n (11) obtained as well as how optimal conditions are determined and where Hi is ai=1 set of holdout percent- i=1 i=1 i=1 { }  highest accuracy attainable for each cancer type. Table I summarizes ages. We also use theNH overall performance for each where H NH where is a setH ofi holdoutiNH=1 is a percentages. set of holdout We percent- also use the the microarray data sets used in this study. Data for liver and bladder where canceri i=1where type,is a setO{Hp, ofi} definedi holdout=1 is as: a percent- set of holdout percent- { ages.} We also{ use} the overall performance for each ages.overall We also performance useages. the We overall also for performanceuse each the overallcancer for performance type, each Op, defined for each as: has been provided by genome- www5.stanford.edu in reference [12] O cancer type, Opcancer, defined type, as:Op, defined as: 3 p and [13] respectively. Colon and leukemia data came from reference OCancer = Sensitivity Accuracy Specificity ∗ ∗ [14] and [15], respectively. Limitations of our study are discussed in 3 (12) 3 OCancer = 3 Sensitivity Accuracy Specificity (12) OCancer = SensitivityOCancerfor= eachAccuracySensitivity cancer type.Specificity∗ Accuracy To then∗ determineSpecificity the the Conclusion section of the paper. ∗ ∗ ∗ ∗ (12) performance for each method in(12) order to identify(12)  for each cancer type. To then determine the In order to develop the HD skeleton-like structure, we consid- for eachthe cancer optimalfor type. each technique, To cancer then determine type. we use To the then the Overall determine Perfor- the for each cancerperformance type. forTo then each determine method inorder the performance to identify for each ered the gene expression level of each sample as a single node in the performance formanceperformance each acrossmethod for all in eachcancer order method types,to identify indefined order as, to identify Figure 5. The “BPR Proximy classificationthe optimalmethod technique, inthe order optimal weto identify use technique, the Overallthe we optimal use Perfor- the technique, Overall Perfor- we use the space, of either cancer-containing or cancer-free cells. The Cancer method. The True Positive (TP), Falsemance PositiveOverall across Performance allmance cancer across types, across all defined cancer all as,cancer types, defined types, as,defined as, Training Set, S , and Normal Training Set, S , were randomly Figure 5. The “BPR Proximy classification 4 C N Figure 5. The(FP), “BPRFigure True Proximy 5. NegativeThe “BPR classification (TN), Proximy and False classification Negative OAll = O(Bldr) O(Col) O(Leuk) O(Liv) method. The Truemethod. Positive The (TP), True False Positive Positive (TP), False Positive ∗ ∗ ∗ selected from the total number of samples, based on a Holdout value (FN)method. values The are True determined Positive (TP), based False on the Positive node’s 4 (13) (FP), True Negative (TN), and False Negative4 OAll = 4 O(Bldr) O(Col) O(Leuk) O(Liv) (percentage of the total samples used for training alone). Holdout (FP), True Negativelocation(FP), (TN), True relative Negative and to False each (TN), Negative derived and False “skeleton.”O NegativeAll = RedO(BldrO)All =Owhere,(ColO)( BldrO()Leuk is∗ O Bladder,()ColO)(∗Liv ColO)(Leuk in Colon,) ∗ O(Liv Leuk) (13) (FN) values are determined based on the node’s ∗ ∗ ∗ ∗ ∗ ∗ (13) (FN) values arenodes(FN) determined values indicate are based cancer determined on samples, the node’s based and on blue the squares node’s is Leukemai, and Liv is Liver cancer.(13) Equations(13) 11 percentages used in this study included 33%, 50%, and 80%. The location relative to each derived “skeleton.” Red where, Bldr is Bladder, Col in Colon, Leuk location relativelocation tosignify each relative derived normal to “skeleton.” samples. each derived The Red line “skeleton.” segmentwhere, Red Bldr- 13 is help Bladder,where, determine Bldr Col is in the Bladder, Colon, optimal Leuk Col technique. in Colon, Leuk remaining samples composed the Testing sets. Given S , and S , nodes indicate cancer samples, and blue squares is Leukemai, and Liv is Liver cancer. Equations 11 C N nodes indicatestructure cancernodes indicate samples, represents cancer and the blue samples, skeleton-like squares andblue formationis Leukemai, squares for andis LivLeukemai, is Liver and cancer. Liv is Equations Liver cancer. 11 Equations 11 signify normal samples. The line segmentwhere, Bldr- 13 is help Bladder, determine Col in the Colon, optimal Leuk technique. is Leukemai, and Liv is we defined the following two techniques when choosing the training signify normal samples.signify normalthe The cancer line samples. segment training The set. line- segment 13 help determine- 13 help the optimal determine technique.3. the RESULTS optimal technique. structure representsstructure the skeleton-like represents the formation skeleton-like for formationLiver for cancer. Equations 11 - 13 help determine the optimal technique. and testing sets. structure represents the skeleton-like formation for In this paper, the proposed BPR methods the cancer trainingthe cancer set. training set. 3. RESULTS 3. RESULTS For a fixedthe cancer number training of genes set. considered, a (two biomimetic construction3. RESULTS algorithms and two In this paper, the proposed BPR methods fixed holdout percentage (percentage of theIn total this paper,classificationIn the this proposed methods) paper, BPR the were proposed methods applied BPR to four methods differ- For a fixed number of genes considered, a (two biomimetic construction algorithms and two For a fixedsamples numberFor used a of fixed for genes training), number considered, of and genes a a cancer considered,(two type, biomimetic the a ent(two construction cancer biomimetic types algorithms (bladder,construction and colon, algorithms two leukemia, and liver), two fixed holdout percentage (percentage of the total classification methods) were applied to four differ- fixed holdout percentageproposedfixed holdout BPR (percentage percentage algorithms of (percentageare the run total numerousclassification of the times, total methods)generatingclassification were four methods)applied datasets. to were four Each applieddiffer- data to set four contained differ- samples used for training), and a cancer type, the ent cancer types (bladder, colon, leukemia, liver), samples used foreachsamples training), with used different and for a training), cancer randomly type, and picked the a cancerent training type,cancer and the typescaner-freeent (bladder, cancer (normal) types colon, (bladder, leukemia, and cancerous colon, liver), leukemia, (cancer) liver), sam- DIMENSIONS 81 proposed BPR algorithmsproposed BPR are run algorithms numerous are times, run numerousH times, generating four data sets. Each data set contained testingproposed sets. BPR The algorithms average accuracyare run numerousAngenerating(cancer times,) is fourples.generating data Forsets. bladder Eachfour data data cancer, sets. set contained Each the data data set set contained each with differenteach randomly with different picked randomly training picked and caner-free training and (normal)caner-free and cancerous (normal) and (cancer) cancerous sam- (cancer) sam- recorded. Due to a large number ofH parametric 125 samples, of which 103 were cancerous and 22 testing sets. The averageH accuracy AnH (cancer) is ples. For bladder cancer, the data set contained testing sets. Thetestingvariations average sets. (number accuracy The average ofAn genes,(cancer accuracy holdout) is Anples. percentages,(cancer For) bladderis wereples. cancer, Fornormal. bladder the Data data cancer, for set 6688 contained the genes data wasset contained provided recorded. Due to a large number of parametric 125 samples, of which 103 were cancerous and 22 recorded. Duecancerrecorded. to a large types), Due number a to unified a of large metric parametric number is needed of125 parametric to samples, assess ofby125 which reference samples, 103 were of[15]. which cancerous For 103 colon were and cancer, 22 cancerous the data and set 22 variations (number of genes, holdout percentages, were normal. Data for 6688 genes was provided variations (numberthevariations performanceof genes, (number holdout of of the genes, percentages, proposed holdout algorithms.were percentages, normal. A Datacontainedwere for normal. 6688 62 samples, Data genes for was of 6688 which provided genes 40 were was cancerous provided cancer types), a unified metric is needed to assess by reference [15]. For colon cancer, the data set cancer types), ageometriccancer unified types), metric mean, a is unified needed metric to assess is neededby reference to assess [15].andby reference For 22 werecolon [15]. normal. cancer, For the Data colon data for cancer, set 2000 the genes data was set the performance of the proposed algorithms. A contained 62 samples, of which 40 were cancerous the performancethe of performance the proposed of algorithms. the proposed A algorithms.contained 62 A samples,providedcontained of fromwhich 62 samples, reference 40 were of whichcancerous [16]. 40For were leukemia, cancerous the geometric mean, Ntypes and 22 were normal. Data for 2000 genes was geometric mean,geometric mean, and 22 were normal.dataand 22set Data werecontained for normal. 2000 73 samples, Data genes for was of 2000 which genes 48 were was H Ntypes H Gn = An (Cancer) (10) provided from reference [16]. For leukemia, the  Ntypes provided fromAcuteprovided reference Myeloid from [16]. reference ForLeukemia leukemia, [16]. (AML) Forthe leukemia,and 25 were the Ntypes CancerNtypes=1 H Ntypes data set containeddata 73 set samples, contained of 73 which samples, 48 were of which 48 were H Ntypes Ntypes  H Acutedata set Lymphoblastic contained 73 Leukemiasamples, of (ALL). which We 48 were con- G = GnH = AH (Cancer) AnH (Cancer) (10) n Gn = n  An (10)(CancerAcute) Myeloid(10) Acute Leukemia Myeloid (AML) Leukemia and 25 (AML) were and 25 were  is calculatedCancer the for=1 all considered cancer sideredAcute Myeloid the AML Leukemia samples to (AML) be cancerous and 25 and were the Cancer=1 Cancer =1 Acute Lymphoblastic Leukemia (ALL). We con- types,  where n is number of genes consideredAcute Lymphoblastic and ALLAcute samples Lymphoblastic Leukemia to be (ALL). normal. Leukemia We Data con- (ALL). for 7129 We genes con-  is calculated the for all considered cancer sidered the AML samples to be cancerous and the is calculatedH is the theis holdoutcalculated for all considered percentage. the for all cancer We considered use thesidered geometric cancer the AMLwassidered samples taken the from to AML be reference cancerous samples [17]. to and be For the cancerous liver cancer, and the types, where n types,is number where ofn genesis number considered of genes and consideredALL samples and toALL be normal.samples Data to be for normal. 7129 Datagenes for 7129 genes H is the holdoutH percentage.is the holdout We percentage. use the geometric We use thewas geometric taken fromwas reference taken [17].from Forreference liver cancer, [17]. For the liver cancer, the data set contained 181 samples, of which 105 were random selection of samples from all of the raw cancerous and 76 were normal. Data for 5520 genes data amassed. Figure 6 shows the average gene was provided by reference [18]. Table 1 summarizes expression levels for each type of cancer in our the microarray data sets used in this study. study. Several metrics have been proposed for as- After testing each algorithm, we determined sessing the accuracy of the BPR algorithms. Accu- the overall perfermance with Equations 10 and 12. racies are calculated based on the average of 100 In Figure 7 we see the overall performance across runs. We ran 100 times in order to obtain objective all cancers. Specifically, we see which of the four view of the algorithm performance with different methods consistently reaches the highest perfor- metrics since Training and Testing sets are ran- mance score. In each plot, we see the dark blue bar, domly selected. Accuracy is calculated based on the representing the Multilateral Segment Connection remaining test set with Equation 6. Below are re- method, almost consistently score higher than the sults obtained as well as how optimal conditions are other methods. It is important to also note that determined and highest accuracy attainable for each between the two plots in each figure, the methods cancer type. Table I summarizes the microarray data in the Cross validation plot score higher than the sets used in this study. Data for liver and bladder ones in the Differential Mean plot. has been provided by genome- www5.stanford.edu The average accuracy for each cancer type in reference [12] and [13] respectively. Colon and using the mentioned metrics is compared to previ- leukemia data came from reference [14] and [15], ous algorithms and experiments. We limited com- respectively. Limitations of our study are discussed parison to studies using the same data sets of DNA in the Results section of the paper. microarray samples. Table 2 summarizes the overall In order to develop the HD skeleton-like performance of the proposed BPR and performance structure, we considered the gene expression level from other articles using the same DNA microarray of each sample as a single node in the space, of data. In Peterson (2004), the accuracy is calculated either cancer-containing or cancer-free cells. The using a single dominant mode of the Principal of Cancer Training Set, SC , and Normal Training Set, Orthogonal Decomposition (POD) method. Testing SN , were randomly selected from the total number was done separately with either cancer or normal of samples, based on a Holdout value (percentage of set [7]. In Abbasi (2007), an improved POD classi- the total samples used for training alone). Holdout fication method was introduced, where accuracy is percentages used in this study included 33%, 50%, determined from a combination of both cancer and and 80%. The remaining samples composed the normal sets [8]. Lee uses a multi-nodal POD along Testing sets. Given SC , and SN , we defined with Support Vector Machine, Self-Organized Map, the following two techniques when choosing the and Neural Networks to determine the accuracy training and testing sets. for each cancer type [9]. The last column shows I. Differential Mean: for each gene, g, we define the attained accuracy when using the proposed I. Differential Mean:foreachgene,g,wedefine the differential mean, than previous studies. Except in the case for liver cancer. In Table 2, the differential mean, D(g), as Biomimetic Pattern Recognition method for each D(g), as cancer type. the reported accuracy is that with the following parameters: Cross D(g)= SC (g) SN (g) . (14) | − | From Table 2validation we see the as highesttraining accuracy set selection, Multilateral as extension method, We thenWe sorted then the sorted differential the differential mean meanvalues values for all theoccurs genes at Cross Validationand Segment and Multilateral. Connection Except in construction. consideredfor allin descending the genes considered order and inchose descending the highestin thedifferential case for liver cancer, where the highest accu- order and chose the highest differential mean mean to construct a HD biomimetic structure. racy occurs with DifferentialConclusion Mean and Sequential. to construct a HD biomimetic structure. From Table 2 we see that the proposed BPR reaches In this paper we proposed a new BPR algorithm which employs a II. Cross Validation: for each gene, g, we define higher accuracies than previous studies. Except in II. Cross Validation:the cross validation, for each crossvalind, gene, g, we as define described the crossthe case validation, for liver cancer.different In Table approach 2, the to reported the PHC where elements of the same class are crossvalind,in MatLab as described 2011a Bioinformatics in MatLab 2011a toolbox. Bioinformatics This accuracy toolbox. is that withconnected the following according parameters: to nearest neighbor. This approach allows for This functionfunction generates generates randomly randomly selecting selecting a train- a train- ing setCross and validation test set. as trainingthe development set selection, of two Multilat- different biomimetic structure constructions ing set and test set. eral as extension method,(Nodal and and Segment Segment). Connection The proposed methods were applied to bladder, For each simulation, the training sets contained a in construction. For each simulation, the training sets contained a random selection colon, leukemia, and liver cancer data. We considered different number of samples from all of the raw data amassed. Figure 6 shows the of genes to test highest recognition rate on each cancer type. average gene expression levels for each type of cancer in our study. Results from Figure 7 indicate that the combination of the After testing each algorithm, we determined the overall per- Segment Connection construction and Multilateral extension yield fermance with Equations 10 and 12. In Figure 7 we see the overall higher accuracies than the other combinations examined. Table 2 performance across all cancers. Specifically, we see which of the suggests that the given combination of schemes with a 33% holdout four methods consistently reaches the highest performance score. value give a higher accuracy for each cancer type. We also determined In each plot, we see the dark blue bar, representing the Multilateral that experiments shows the new BPR algorithm has high recognition Segment Connection method, almost consistently score higher than rate when compared to previous techniques, as seen by Table 3. It the other methods. It is important to also note that between the two is important to note that the proposed BPR has higher accuracies plots in each figure, the methods in the Cross validation plot score than the previous studies for bladder, colon, and leukemia cancers. higher than the ones in the Differential Mean plot. However, it is believed that liver cancer is better detected with the The average accuracy for each cancer type using the mentioned Differential Mean and Sequential implementations. This could largely metrics is compared to previous algorithms and experiments. We be due to the fact that liver is the largest data set, as described in limited comparison to studies using the same data sets of DNA Table 1, and thus a sequential method suffices. We also note that microarray samples. Table 2 summarizes the overall performance colon cancer has the lowest accuracy. We believe this could be due to of the proposed BPR and performance from other articles using the small number of genes available. However, Biomimetic Pattern the same DNA microarray data. In Peterson (2004), the accuracy is Recognition has shown to be a promising tool for cancer detection calculated using a single dominant mode of the Principal of Orthogonal using DNA microarray data. Due to constrains, we also expanded our Decomposition (POD) method. Testing was done separately with BPR algorithm into Planar structures. For further information, please either cancer or normal set [7]. In Abbasi (2007), an improved POD refer to the thesis in the Department of Mathematics, California classification method was introduced, where accuracy is determined State University, Fullerton. from a combination of both cancer and normal sets [8]. Lee uses a multi-nodal POD along with Support Vector Machine, Self-Organized Acknowledgements Map, and Neural Networks to determine the accuracy for each cancer The author would like to thank Drs. Amy- beth Cohen and Laura type [9]. The last column shows the attained accuracy when using the Arce for their support, mentoring, and biology consultation. proposed Biomimetic Pattern Recognition method for each cancer type. Graphics and computations were generated with MatLab2011a. From Table 2 we see the highest accuracy occurs at Cross The author would like to acknowledge Dai Nguyen for help in the Validation and Multilateral. Except in the case for liver cancer, where MatLab 2011a coding. This work was supported by a Minority Access the highest accuracy occurs with Differential Mean and Sequential. to Research Careers grant to from the National Institutes of Health From Table 2 we see that the proposed BPR reaches higher accuracies (2T34GM008612-17) and the CSUF Mc- Nair Program.

DIMENSIONS 82 Table 1: Data Sets Used in Study

Table 1: DataData Sets Sets Used in Study Cancer Type Bladder Colon Leukemia Liver No. of Genes 6699 2000 Data Sets 7129 5520 CancerNo. of Type Cancer Samples Bladder103 Primary Tumor 40Colon Primary Tumor 48Leukemia Acute Myeloid 105 PrimaryLiver Tumor No. of Genes 6699 2000 Leukemia7129 5520 No.No. of of Cancer Normal Samples Samples 10322 Healthy Primary Tissue Tumor 2240 Healthy Primary Tissue Tumor 2548 Acute Acute Lymphoblastic Myeloid76 Healthy105 Primary Tissue Tumor LeukemiaLeukemia No.Table of 1: Data Normal sets used Samples in studies. 22 Healthy Tissue 22 Healthy Tissue 25 Acute Lymphoblastic 76 Healthy Tissue Leukemia

Figure 6. Graphs represent the average gene expression level for genes with highest differential mean. Includes Bladder, Colon, Leukemia, and Liver average differential gene expression.

Figure 6. Graphs represent the average gene expression level for genes with highest differential mean. FigureIncludes Bladder, 6. Graphs Colon, Leukemia, represent and Liver the average average differential gene gene expressionexpression. level for genes with highest differential mean. Includes Bladder, Colon, Leukemia, and Liver average differential gene expression.

DIMENSIONS 83 Overall Performance for All CancerTypes for Differential Mean in LS Holdout @ 33% Overall Performance for All CancerTypes for Cross Validation in LS Holdout @ 33% 105 Overall Performance for All CancerTypes for Differential Mean in LS Holdout @ 33% 105 Overall Performance for All CancerTypes for Cross Validation in LS Holdout @ 33% 105 Multi−LS Multi−Node Seq−LS Seq−Node 105 Multi−LS Multi−Node Seq−LS Seq−Node Multi−LS Multi−Node Seq−LS Seq−Node Multi−LS Multi−Node Seq−LS Seq−Node

100 100

100 100

95 95

95 95

90 90

90 90 85 85 Performance Performance

85 85

Performance 80 Performance 80

80 80 75 75

75 70 7570

10 20 30 40 50 60 70 80 90 100 200 300 400 500 600 700 800 900 1000 2000 Max 10 20 30 40 50 60 70 80 90 100 200 300 400 500 600 700 800 900 1000 2000 Max No. of Genes No. of Genes 70 70

10 20 30 40 50 60 70 80 90 100 200 300 400 500 600 700 800 900 1000 2000 Max 10 20 30 40 50 60 70 80 90 100 200 300 400 500 600 700 800 900 1000 2000 Max No. of Genes No. of Genes Figure 7. Performance of Line Segment method across all cancer types. (Left) The performance across all cancers using Differential mean as the training set selection. (Right) The performance using Cross FigureFigurevalidation. 7.7. PerformancePerformance Forof Line bothSegment of plots, method Line across the Segment x-axisall cancer types. represents method (Left) The performance acrossthe number across all all cancer of cancers genes using types. consideredDifferential (Left) mean whenas Thethe training implementing performance set selection. (Right) the acrossThe performancetechnique using Cross validation. and the For both y-axis plots, the shows x-axis represents the performance the number of genes using considered Equation when implementing 12 and the technique for all and four the y-axis methods. shows the performance All allusing cancers Equation using12 and for all Differential four methods. All methods mean were as completed the trainingwith 33% holdout set 100 selection.times. (Right) The performance using Cross validation. For both plots, themethods x-axis were represents completed the with number 33% of holdout genes 100 considered times. when implementing the technique and the y-axis shows the performance using Equation 12 and for all four methods. All Table 2: Overall Accuracy for each Method methods were completed with 33% holdout 100 times. Overall Accuracy for each Cancer Type Cancer Type TableStudies 2: Overall AccuracyBladder for eachColon Method Leukemia Liver POD Cancer & Normal Separately Peterson (2004) 60.00% N/A N/A 75.00% POD Cancer & Normal TogetherOverall Abbasi Accuracy (2007) for each64.52% Cancer TypeN/A N/A 82.30% Multi-Modal POD Lee (2010) N/A 65.35%Cancer97.30% Type 96.43% StudiesMultilateral Bladder98.95% 82.29%Colon 94.09%Leukemia 96.64%Liver Cross Validation POD Cancer & NormalSequential Separately Peterson (2004) 60.00%98.61% 82.04%N/A 94.03%N/A 96.36%75.00% Multilateral 64.52%99.89% 81.59%N/A 94.07%N/A 95.95%82.30% PODDifferential Cancer & Mean Normal Together Abbasi (2007) Multi-Modal PODSequential Lee (2010) N/A99.88% 81.24%65.35% 93.87%97.30% 97.00%96.43% Multilateral 98.95% 82.29% 94.09% 96.64% CrossTable Validation 2: Overall Accuracy for each Method Sequential 98.61% 82.04% 94.03% 96.36% Multilateral 99.89% 81.59% 94.07% 95.95% Differential Mean 4.Sequential CONCLUSION 99.88%with a 33%81.24% holdout value93.87% give a higher97.00% accuracy for each cancer type. We also determined that In this paper we proposed a new BPR al- experiments shows the new BPR algorithm has gorithm which employs a different approach to high recognition rate when compared to previous the PHC where elements of the same class are 4. CONCLUSION withtechniques, a 33% as holdout seen by value Table give 3. It a is higher important accuracy to connected according to nearest neighbor. This ap- fornote each that the cancer proposed type. BPR We has also higher determined accuracies that proach allows for the development of two differ- In this paper we proposed a new BPR al- experimentsthan the previous shows studies thenew for bladder, BPR algorithm colon, and has ent biomimetic structure constructions (Nodal and leukemia cancers. However, it is believed that liver gorithm which employs a different approach to high recognition rate when compared to previous Segment). The proposed methods were applied to cancer is better detected with the Differential Mean the PHCbladder, where colon, elements leukemia, of and the liver same cancer class data. are We techniques, as seen by Table 3. It is important to connected according to nearest neighbor. This ap- and Sequential implementations. This could largely considered different number of genes to test highest notebe due that to the the proposed fact that BPR liver has is the higher largest accuracies data proachrecognition allows for rate the on development each cancer type. of two differ- thanset, as the described previous in studies Table 1, for and bladder, thus a sequential colon, and ent biomimeticResults structure from Figure constructions 7 indicate (Nodal that the and com- leukemiamethod suffices. cancers. We However, also note it thatis believed colonDIMENSIONS cancer that liver 84 Segment).bination The of proposed the Segment methods Connection were applied construction to cancerhas the is lowest better accuracy. detected We with believe the Differential this could due Mean bladder,and colon, Multilateral leukemia, extension and liver yield cancer higher data. accuracies We andto the Sequential small number implementations. of genes available. This could However, largely than the other combinations examined. Table 2 considered different number of genes to test highest beBiomimetic due to the Pattern fact Recognition that liver has is the shown largest to be data suggests that the given combination of schemes recognition rate on each cancer type. set, as described in Table 1, and thus a sequential Results from Figure 7 indicate that the com- method suffices. We also note that colon cancer bination of the Segment Connection construction has the lowest accuracy. We believe this could due and Multilateral extension yield higher accuracies to the small number of genes available. However, than the other combinations examined. Table 2 Biomimetic Pattern Recognition has shown to be suggests that the given combination of schemes References Hoopes. L. Genetic Diagnosis: DNA Microarrays and Cancer. (Nature Wang. S., Zhao. X. Biomimetic Pattern Recognition Theory and Its Education, Scitable by Nature Education). 2008 Applications. Chinese Journal of Electronics. Vol 3. No. 3. 2004

Babu. M. Microarray Data Analysis,(Encyclopedia of Genetics, Qin. H., Wang. S., Sun. H. Biomimetic Pattern Recognition for Computational Genomics, Proteomics and Bioinformatics, Horizon Speaker-Independent Speech Recognition, Proceedings of the Press, UK). 2004 International Conference on Neural Networks and Brain. Vol. 2. (pgs 1290 - 1294). 2005 Vessey. K. Use of Microarrays to Investigate Plant Response to Stress. (GNC Training Workshop). 2008 Zhai. Y., Zeng. J., Gan . J., Xu .Y. A study of Biomimetic Pattern Recognition based Iris Recognition Method, Proceedings of the Rhodes. D.R., Yu. J. Large-scale meta-analysis of cancer microarray International Sympo- sium on Information Processing. Vol. 2. data identifies common transcrip- tional profiles of neoplastic (pgs. 71-74) 2009 transformation and progression. Proceedings of the Natural Academy of Sciences of the United States of America. 2004 Wang. Z., Mo. H., Lu. H., Wang. S. A method of Biomimetic Pattern Recognition for Face Recognition, Proceeding of the International Peterson. D., Lee. C.H. A DNA-based Pattern Recognition Technique Joint Conference on Neural Networks ). Vol. 3. (pgs. 2216 - 2221). 2003 for Cancer Detection. Proceedings of the 26th Annual International Conference of the IEEE EMBS. 2004. Chen, X., et. al., Variation in Gene Expression Pat- terns in Human Gastric Cancers, Mol Biol Cell. 2003 Aug; 14(8): 3208-15. Epub 2003 Abbasi. N., Lee. C.H. Feature Extraction Techniques on DNA Microarray Data for Cancer Detection. Proceedings of World Alon. U., et al. Broad Patterns of Gene Expression Revealed by Cluster- Congress on Bioengineering. 200 ing Analysis of Cancer and Nor- mal Colon Tissues Probed by Oligonucleotide Arrays. Proc. National Academic Science. Lee. C.B., Lee. C. H. Extended Principal of Orthogonal Decomposition Method for Cancer Detection. Proceeding of the International Journal T. R. Golub, D. K. Slonim, et al. Molecular classifi- cation of cancer: of Bioscience, Biochemistry and Bioinformatics. 2010 class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-537.

Chen, X. et al. Gene expression patterns in human liver cancers. Molecular Biology of the Cell. 2002.

DIMENSIONS 85 A Statistical Analysis of Orange County K9-12 Mathematics Achievement Data

Department of Mathematics, California State University, Fullerton

Nathan Robertson, Soeun Park, Susan Deeb Advisor: Dr. Sam Behseta, Dr. David Pagni

Abstract In this work, we consider the problem of identification of those The problem of missing values is a well-studied subject in statistically significant variables that have an effect on the performance statistical sciences (Little, 2011). Missing data problems often arise of K9-11 students in mathematics. We address this problem by in the context of collecting educational data mainly because many constructing a number of statistical linear models with mixed-effects. students withhold personal and other types of information. Also, Additionally, in order to identify the top performers, we employ students get to take a variety of math courses, not necessarily in a three types of clustering techniques, namely, the method of sequence and thus, a gap will appear in a row associated with their Kmeans, Hclust, and Mclust. We compare the precision of these records. In other words, our matrix of observations will be sparse due methods through measuring their misspecification rates via to missing observations per individuals. extensive simulation studies. Finally, we employ a Bayesian missing In this work, we shall particularly concentrate on applying value imputation methodology in order to estimate some of the Bayesian methods for missing value estimation. This is beneficial missing values associated with our data set. for multiple reasons: 1 - we contribute to the sparse literature in the field, 2 - by applying model checking techniques, we will verify the Introduction quality of our missing value estimations, and 3 – we would potentially This data set was generated through an NSF funded project titled provide a more complete picture of the underlying relationship Teachers Assisting Students to Excel in Learning Mathematics Phase between the variables of interest. II (TASEL-M2), led by professor David Pagni from the mathematics department at CSUF. As part of this extended study, California State Modeling Testing (CST) scores in mathematics were collected from a group of The original data set is a combination of surveys collected at different nearly 3000 K6-12 students whose names and personal information time periods, “waves” in order to measure students’ and teachers’ were removed to protect their privacy. An ultimate goal of this study performance throughout the different school years. Our models are is to examine students’ socio-economic and demographic variables based on data from wave 3 of the survey which was conducted in the to view their effect on mathematical achievement. spring of 2005. We employ a two-stage modeling scheme in order to first, identify Missing Values the significance of teachers’ effects, and second, quantify their In this blind data set, we had access to a number of variables such as strengths on the performance of freshmen and sophomore high school students’ math scaled scores from CST, mathematics courses taken, students in Orange County, California. The first stage borrows from the grade level progression, and mathematics raw scores for students in properties of the so-called “fixed-effect” linear modeling and is merely grade 6 to 11 from 2004-2008. Due to a steady flow of student used to demonstrate whether teachers’ performance would play any migration in and out of the program at various time points, we role on students’ mathematical achievement. The second stage is built encountered a considerable number of nonexisting values that may around the idea of utilizing multi-level regression modeling in order to be interpreted as missing at random. It is those missing values that measure the magnitude and statistical significance of teachers’ effect we aim to estimate. on students’ California Standards Test (CST) math scores.

DIMENSIONS 86

value estimations, and 3 – valuewe would estimations, potentially and provide 3 – we awould more potentiallycomplete picture provide of a the more complete picture of the f(θ,ψ|Y,M) = C x π(θ,ψ) x L(θ,ψ,|Y) underlying relationship betweenunderlying the variables relationship of interest. between the variables of interest. f(θ,ψ|Y,M) = C x π(θ,ψ) x L(θ,ψ,|Y) value estimations, and 3 – we would potentially provide a more complete picture of the

underlying relationship between the variables of interest. where, f(θ,ψ|Y,M) is the posterior distribution, C is a constant not dependent on ψ and θ, π(θ,ψ) is where, f(θ,ψ|Y,M) is the posteriorf(θ ,distribution,ψ|Y,M) = C xC π is(θ a,ψ constant) x L(θ,ψ not,|Y) dependent on ψ and θ, π(θ,ψ) is Modeling Modeling f(θ,theψ|Y,M) full prior= C x distribution π(θ,ψ) x L( onθ,ψ the,|Y)f( θparameter ,ψ|Y,M) = space, C x π (andθ,ψ )L( xθ L(,ψθ,|Y),ψ,|Y) is the complete-data likelihood The original data set is a combination of surveys collected at different the full time prior period distributions, “waves” on the in parameter space, and L(θ,ψ,|Y) is the complete-data likelihood Modeling The original data set is a combination which of surveys can be col writtenlected asat different time periods, “waves” in order to measure students’ and teachers’ performance throughoutwhere,which the different canf(θ, ψbe|Y,M) written school is the as years. posterior Our distribution, C is a constant not dependent on ψ and θ, π(θ,ψ) is The original Clusteringdata set is a combination orderof where,surveys to measure f( colθ,ψlected|Y,M) students atis differentthe’ andposterior teachers time where,distribution, period ’ performance s,f( θ“waves”,ψ|Y,M) C is throughouta isin constant the posterior not thef( dependentθdifferent ,distribution,ψ|Y,M) school= on C ψ xC andπ years.is(θ a, ψ θconstant,) πxOur( θL(,ψ θ), ψisnot,|Y) dependent on ψ and θ, π(θ,ψ) is models are based on data from wave 3 of the survey which was the cond fullucted prior in distribution the spring of on 2005. the parameter space, and L(θ,ψ,|Y) is the complete-data likelihood order to measureFrom students the larger’ and set teachers of thosemodels’ theperformance students full are prior based who distributionthroughout on were data enrolled from theon wavethedifferent in parameter either the3 of fullschool the of prior surveyspace, years. distribution andwhich Our L( θwas,ψ on,|Y) cond the is uctedtheparameter completeL( inθ, ψthe|Y)=f(Y| space, s-pringdata and θ likelihoodof)f(M|Y, L(2005.θ,ψψ,|Y) ) . is the complete-data likelihood We employ a two-stage modeling scheme in order to first,where,which identify canf(θ, ψ bethe|Y,M) written significance is the as posterior of distribution,L(θ,ψ|Y)=f(Y|f(θ ,Cψ |Y,M)isθ a)f constant(M|Y, = Cψ x) . π not(θ, ψdependent) x L(θ,ψ,|Y) on ψ and θ, π(θ,ψ) is models are basedthe two on datastudied from high wave schools 3 ofwhich the from surveyW canethe employ TASEL-M2be which written a twowas as -project, condstage ucted modeling we which in selected the scheme scanpring be inofwritten order2005. as to first, identifyf( theθ,ψ |Y,M)significance = C x πof(θ ,ψ) x L(θ,ψ,|Y) value estimations,teachers' effects,and 3 – andwe wouldsecond,teachers' potentially quantify effects, their provide strengthsand asecond, more on complete quantifythe performancethe picturetheir full strengths prior of of the freshmen distribution on the and performance on the parameter of freshmen space, and L(θ,ψ,|Y) is the complete-data likelihood We employa subset a twocontaining-stage modeling students scheme enrolled in inorder ninth to grade first, identify in 2004 the or 2005 significance However, However, of with withincomplete incomplete data, data,the full the posterior full posterior distribution distribution becomes becomes underlyingsophomore relationship high between school tstudentshe variables in Orange of interest. County, California. whichThe first canHowever, stage be where,written borrows with f( asθ ,incomplete fromψ|Y,M) the is the data,L( posteriorθ, ψthe|Y)=f(Y| full distribution, posteriorθ)f(M|Y, distributionψ C). is a constant becomes not dependent on ψ and θ, π(θ,ψ) is teachers' effects, and second, quantifysophomore their strengths high on school the performance students in Orangeof freshmenL(θwhere,, ψC|Y)=f(Y|ounty, and f( θ California.,θψ)f|Y,M)(M|Y, isψ )theThe. posterior first stageL(θ ,distribution,ψ borrows|Y)=f(Y| θfrom)f C(M|Y, is the a ψconstant ). not dependent on ψ and θ, π(θ,ψ) is propertiestotaling toof 1070the so students.-called properties"fixed We considered-effect" of the linear sothe- calledmodeling following "fixed and mathematical-effect" is merely linear used modeling theto demonstra full andprior is tedistribution merely used on to the demonstraf( parameterθ,ψ|Y,M)te = space, C x π and(θ,ψ L() xθ L(,ψ,|Y)θ,ψ,|Y) is the complete-data likelihood sophomore high school students in Orange County, California. The first stage borrowsthe full from prior the distribution f fullon(θ the,ψ|Y parameterobs,M) πspace,(θ,ψ)L( andθ,ψ L(,|Yθobs,ψ,M),|Y) is the complete-data likelihood whetherachievement teachers' variables: performance the students’ would play standardized any role on test students' scores mathematical from However, which achievement. with can incomplete be written Theffull (θ as,data,ψL( |Yθobs, ψthe,M)|Y)=f(Y| full πposterior(θ)f,ψ(M|Y,)L(θ distribution,ψ),|Y. obs,M) becomes propertiesModeling of the so-called "fixed-effect"whether linear teachers' modelingHowever, performance and with is merelyincomplete would used data,play to demonstra whichanytheHowever, fullrole can posterior teon be withstudents' written incomplete distribution asmathematical data, becomes the achievement. full posterior The distribution becomes secondgrades stage 9, 10, is and buil 11,t around theirsecond highestthe idea stage CAHSEEof utilizingis buil score,t around multi and- levelthe their idea regression overall of utilizing modeling multiwhere, - inlevel f(orderθ ,regressionψ|Y,M) to is themodeling posterior in distribution,order to C is a constant not dependent on ψ and θ, π(θ,ψ) is whetherThe original teachers' data performance set is a combination would play of anysurveys role colon lectedstudents' at differentmathematical time where periodachievement. L(θs,,ψ “waves”,|Yobs The,M) in is the likelihood due to ∝observed values. Now, the observed likelihood is measure the magnitude and statistical significance of teachers’ e whereffect on L(However, students’θ,ψ,|Ytheobs full ,M)with California prior isincomplete the distribution likelihood ffull(θ ,data,ψ|Y on obsdue the ,M)the tofull parameter ∝observed πposterior(L(θ,θψ,)L(ψ|Y)=f(Y| values.space,θ distribution,ψ,|Y obsand θNow,)f,M)(M|Y, L( θthebecomes,ψψ,|Y) )observed. is the completelikelihood-data is likelihood secondorder stage to measuremath is buil GPA. tstudents around We ’implementedthe and idea teachers ofmeasure utilizing’ threeperformance the multiclustering magnitude-level throughout techniquesregression and stat theisticalf modelingfull differentin( θorderobtained, ψsignificance|Yobs inschool,M) order by integratingwhere years.πof to(θ teacher ,ψ)L( Our θ , s ψall ’ ,|Y e missingffect obs ,M) on isvaluesstudents’ the likelihood out California of the due full tolikelihood: observed values. Now, which can be writtenffull(θ as,ψ |Yobs,M)L(θ ,ψ |Y)=f(Y|π(θ,ψ)L(θθ)f,ψ(M|Y,,|Yobsψ,M)). Standards Test (CST) mathSta scores. ndards Test (CST) math scores. obtained by integrating all missing values out of the full likelihood: measuremodels the are tomagnitude based address on data andtwo stat fromobjectives:istical wave significance 3 (a) of identifythe survey of which teacher which clusterings ’was effect cond on techniqueucted students’ in the California springthe of observed 2005. likelihood is obtained∝ by integrating all missing values where L(θ,ψ∝,|Y obs,M)However, is the likelihoodffull with(θ,ψ incomplete|Y obsdue,M) to observed πdata,(θ,ψ )L(the values.θ full,ψ,|Y posteriorobs Now,,M) the distribution observed likelihoodbecomes is Standards WTesteattains employ (CST) the amath two lowest -scores.stage misspecification modeling where scheme L( θrate,,ψ in,|Y order followedobs,M) to isfirst, theby (b)identifylikelihood applyingwhere the due significanceL( θto,ψ observed,|YoutHowever,L(obsθ of,,M)ψ of ,|Ythe values.obsis withfull ,M)the likelihood: likelihood =incompleteNow, the dueobserved data, to ∝observedthe likelihood full posterior values. is Now,distribution the observed becomes likelihood. is obtained by integrating L(θ,ψ,|Yobs all,M) missing = values out of theL(θ ,fullψ|Y)=f(Y| likelihood:θ)f(M|Y, ψ). . teachers' Clusteringeffects,that method and second, on student quantify Clusteringmathematical obtainedtheir strengths by achievementintegrating on the performance all data missing in orderobtained ofvalues freshmen to out by integratingof and the full likelihood: all missing values out of the full likelihood: where L(θ,ψ,|Y obs,M) is the likelihood!"# due!"##"$% ftofull ∝(observedθ,ψ|Yobs,M) values. !"# π( θNow,!"##"$%,ψ)L( theθ,ψ ,|Yobservedobs!"##"$%,M) likelihood is sophomoreFrom high the school larger students set of those in OrangeFrom students the C ounty,larger who were setCalifornia. of enrolled those Thestudents in eitherfirst stagewho of the were borrows Astwo enrolledin studied Rubin from in thehigh(1976), either schools ofunder the𝑓𝑓 ( twothe𝑌𝑌ffull!"# (missing θstudied, ,𝑌𝑌ψ!"##"$%|Yobs ,M)athigh| 𝜃𝜃random ) 𝑓𝑓schools (π𝑀𝑀(θ|, 𝑌𝑌ψmechanism,!"#)L( ,θ𝑌𝑌,!"##"$%ψ,|Yobs ,M)one, 𝜓𝜓) 𝑑𝑑can𝑌𝑌!"##"$% assume Clusteringstudy the trajectory of students’ performances, as well as facilitateobtained As by in integrating L(Rubinθ,ψ,|Y However,(1976),obs all,M) missing =u nder with𝑓𝑓( the𝑌𝑌 valuesincomplete missing, 𝑌𝑌 out of at data, | the𝜃𝜃random)𝑓𝑓 full(the𝑀𝑀 |likelihood:full 𝑌𝑌mechanism, posterior, 𝑌𝑌 one,distribution𝜓𝜓) 𝑑𝑑can𝑌𝑌 assume becomes. propertiesfrom of the the so TASEL-called- "fixedM2 project,-effect"from we linearthe selected TASEL modeling aL( -subsetM2θ,ψ andproject,,|Y containingobs is,M) merely we = selected studentsused to a demonstrasubsetenrolledL( containing θin,teψ ni,|Y nthobs ,M)grade students = in enrolled in ninth. grade in . From the largerfurther set of comparisons those students between who were achievement enrolled in variables. either of the two studied high where schools L( θ,ψ,|Yobs,M) is the likelihood due to ∝observed values. Now, the observed likelihood is whether teachers'2004 or 2005 performance totaling towould 10702004 play students. or any 2005 role We totaling on considered students' to 1070 mathematicalthe students. following We whereachievement.mat consideredhematical L(θ,ψ ,|Ythe Theachievementobs following,M) is the mat f(M|Y likelihood!"#hematicalobs,!"##"$% Ymissing due achievement ,toψ) ∝observed = f(M|Y!"# obs values.,!"##"$%ψ) Now, the!"##"$% observed likelihood is from the TASEL-M2 project, we selected a subset containing students enrolled !"# in !"##"$%ninth obtainedgrade in by!"# integrating!"##"$%𝑓𝑓(𝑌𝑌 all, 𝑌𝑌missingf!"##"$%full(θ, ψ|values𝜃𝜃|Y)𝑓𝑓obs(𝑀𝑀,M) out|𝑌𝑌 of π, (𝑌𝑌theθ,ψ full)L(θ likelihood:,,ψ𝜓𝜓,|Y)𝑑𝑑obs𝑌𝑌 ,M) variables: the students’ standardized test scores from grades 9,𝑓𝑓 10,( 𝑌𝑌 and, 𝑌𝑌 As11, intheir AsL(Rubin|𝜃𝜃 θ in)highest,ψ𝑓𝑓 Rubin(,|Y 𝑀𝑀(1976),obs|𝑌𝑌 ,M) (1976),, 𝑌𝑌=under underf(M|Y the!"#, 𝜓𝜓 missingobs )the𝑑𝑑,!"##"$% 𝑌𝑌Y missing at ,randomψ) at= randomf(M|Y mechanism,!"#obs, !"##"$%ψmechanism,) one can one!"##"$% assume can. 2004second or 2005 stage totaling is buil tto around 1070 students.the ideavariables: ofWe utilizing considered theAs multi instudents’ Rubin the-level following (1976),standardized regression umatnder hematicalmodeling test the scoresmissingobtainedAs inachievement infromorder at Rubin byrandom grades to integrating (1976), mechanism,9, 10, u nderalland 𝑓𝑓missing (11,the𝑌𝑌 one theirmissing, 𝑌𝑌can values highest assume at |out 𝜃𝜃random) 𝑓𝑓 of(𝑀𝑀 the| 𝑌𝑌mechanism, full, 𝑌𝑌likelihood: one, 𝜓𝜓) 𝑑𝑑can𝑌𝑌 assume

variables:measure the CAHSEEtheMethodology students’ magnitude score standardized and, and stat theiristical test overallCAHSEE significance scores math fromscore GPA. ofgrades, and teacher We their 9,implemented 10,s ’overall e ffectand 11, mathon theirthstudents’forree GPA. all highest clustering missing We California assumeimplemented values, techniques and th inree clustering!"# !"##"$% techniques in !"#∝ !"##"$% !"##"$% for all missingAs in whereRubin values, L((1976),θ and,ψ,|YL( obsuθnder,,M)ψ𝑓𝑓,|Yf(M|Y( the 𝑌𝑌obsis ,M)the missingobs, 𝑌𝑌 , likelihood =Y missing at|, 𝜃𝜃ψrandom))𝑓𝑓 due=( 𝑀𝑀f(M|Y |to 𝑌𝑌mechanism, observedobs, 𝑌𝑌,ψ) values. one, 𝜓𝜓) 𝑑𝑑can𝑌𝑌 Now, assume the observed likelihood. is Standardsorder MissingTest to(CST) address Values math two scores. objectives: order (a)to address identify two which objectives: clustering (a) technique f(M|Yidentifyobs, which Yattainsmissing clustering, ψthe) = L(lo f(M|Ywestθ,ψ technique,|Y obsobs,ψ,M)) f(M|Y = attains π,( θYthe,ψ) lo =west, ψπ()θ =) πf(M|Y(ψ). ,ψ) . CAHSEE score, and their overall math GPA. We implemented three clustering techniquesobtained in by integrating allobs missingmissing values out ofobs the full likelihood: misspecification rate, followed by (b) applying that method on student mathematical π(θ,ψ) = !"#π(θ) π!"##"$%(ψ). !"# !"##"$% !"##"$% order to addressSuppose two objectives: that the data (a) identifyaremisspecification represented which clustering as rate, an n followed xtechnique p matrix by -attains (b)a apply the ing lowest that method on student mathematical𝑓𝑓(𝑌𝑌 , 𝑌𝑌 |𝜃𝜃)𝑓𝑓(𝑀𝑀|𝑌𝑌 , 𝑌𝑌 , 𝜓𝜓)𝑑𝑑𝑌𝑌 achievement data in order to study the trajectory of students’ performancesfor all missing, as values,well asAs andfacilitate in Rubin f(M|Y (1976),obs, u!"#Yndermissing!"##"$% the,ψ ) missing= f(M|Y atobs random,ψ!"#) !"##"$% mechanism,!"##"$% one can assume misspecificationClusteringmatrix rate, with followed rows i by= 1,..., (b) achievementapply n, foranding all columns thatmissing data method j invalues,= 1,...,order on studentandp to. Let study math thefor ematical trajectory all missingSpecifically, offorAs students’values, allin missingRubin as and per performances (1976), values,Hoff (2009), u andnder𝑓𝑓, (as the 𝑌𝑌c onsiderwell missing, 𝑌𝑌 as facilitatethe at |case 𝜃𝜃random)𝑓𝑓 ( of𝑀𝑀 data| 𝑌𝑌mechanism, modelled, 𝑌𝑌 one,via𝜓𝜓) 𝑑𝑑acan 𝑌𝑌multivariate assume further comparisons between achievement variables. Specifically, as perL( Hoffθ,ψ,|Y (2009),obs,M)π ( c=θonsider, ψ) = π (theθ) πcase(ψ). of data modelled via a multivariate . achievementFrom the larger data inset order of those to study Tstudents thefurther trajectorywho were comparisons of enrolled students’ betweenin eitherperformances ofachievement the two,normal as studied wellπ variables.( θdistribution ,asψ ) high facilitate= π ( schoolsθ) π indexed (ψ). with the meanπ(θ, ψparameter) = π(θ) π θ(ψ, and). the covariance Σ. Subsequently, the M M M indicate a vector of zeros and ones such that f(M|Y obs, Ymissing,ψ) = f(M|Yobs,ψ) i = ( 1,..., p) fornormal all missing distribution values, indexed and with the mean parameter!"# !"##"$%θ, and the covariance!"# !"##"$%Σ. Subsequently,!"##"$% the furtherfrom comparisonsthe TASEL- M2between project, achievement we selected variables. a subset containing students enrolledsampling in niprobabilitynth grade ofin data for studentf(M|Y i canobs𝑓𝑓 be,( Y𝑌𝑌 missing given, 𝑌𝑌,ψ as) = f(M|Y|𝜃𝜃)𝑓𝑓obs(𝑀𝑀,ψ|𝑌𝑌) , 𝑌𝑌 , 𝜓𝜓)𝑑𝑑𝑌𝑌 M = 1 whenever the (i,j) cell is missing, and Mi,j = 0, otherwise. sampling Specifically, probability Asas of per indata Rubin Hoff for (2009),student(1976),π ( ciuθ onsidercannder,ψ) be= the π given(the θmissing) πcase( asψ). of at data random modelled mechanism, via a multivariate one can assume 2004 or 20052. i,jMethod totaling to 1070ology students. We consideredSpecifically, the followingas per Hoff mat (2009), hematical cSpecifically,onsider achievement the case as per of dataHoff modelled (2009), c onsidervia a multivariate the case of data modelled via a multivariate Also, let Y be the observed2. Method value in rowology i and column j. Withoutnormal distribution for all missingindexed values,with the and mean parameter θ, and the covariance Σ. Subsequently, the variables: the students’obs ,standardizedi,j testnormal scores distribution from grades indexed 9, 10, andwith 11, thenormalP(m their meani,{yfor highestdistributioni,j allparameter:m i,jmissing = 1}| θindexedθ values,,,Σ and) the andwith covariance the mean Σ parameter. Subsequently, θ, and the the covariance Σ. Subsequently, the 2. Methodology samplingP(mi,{ySpecifically,i,j :mprobabilityi,j = 1}|θ as, Σof )per data Hoff for student(2009), ic onsidercan f(M|Ybe given theobsπ case,( θYas,missingψ )of = data ,πψ()θ =) modelled π f(M|Y(ψ). obs ,viaψ) a multivariate CAHSEE lossscore of, andgenerality, their overall we can math assumesampling GPA. aWe missing probability implemented at random of data three mechanism for clustering studentsampling i techniquescan= probabilityf(m bei) givenf({ySpecifically, ini,j :masof i,jdata = 1}| for θ as,studentΣ )per = f(mHoff i )cani (2009),π be(θ ,givenψ) =consider π as(θ ) π( ψthe). case of data . Missing Values Missing Values normal distribution= f(m i) f({yi,j indexed :mi,j = 1}|withθ,Σ the) = mean f(m)i parameter θ, and the covariance Σ. Subsequently,. the order to address(our intent two objectives: has been to (a) demonstrate identify which that clustering in an extremely technique large attains data the lowestmodelled via a multivariate normal distribution indexed with the Suppose that the data are representedSuppose that as thean ndata x p arematrix represented - a matrixsamplingP(m as withani,{y ni,j rowsx :mprobability p i,jmatrixfor i= = 1}|all 1,…, θ-missing ,Specifically, aΣof )matrixn ,data and values, for with student as rows and per iHoffi can= 1,…, (2009),be given n!,!, and c onsideras !,! the case!!,!: !of!, !data!! modelled!,! via a multivariate Missing Values P(mi,{yi,j :mi,j = 1}|θ,Σ) P(m i,{yi,j :mi,j = 1}|θ,Σ) (𝑓𝑓(𝑦𝑦!,!, … , 𝑦𝑦!,!|𝜃𝜃, 𝛴𝛴) !!,!:!!,!!! 𝑑𝑑𝑦𝑦!,! ) misspecificationset such rate, as thefollowed one we by are (b) workingapplying with, thatT method it is nearly on studentimpossible math toematicalT =The f(m mainmeannormalSpecifically,) f({y consequence parameter :mdistribution = as 1}| per θ θof,, Σ indexedHoffand )this = f(mthe(2009),is that ) withcovariance (we𝑓𝑓 cthe(onsider 𝑦𝑦can mean, π…integrate (Σθ ,the., 𝑦𝑦ψ parameterSubsequently,) case=|𝜃𝜃 π ,out𝛴𝛴( θof)) the πdataθ(,ψ andmissing). modelledthe the 𝑑𝑑covariance data𝑦𝑦 via) .to aobtain multivariate Σ. Subsequently, the the columns j = 1,…, p. Let Micolumns = (M1,…, j =M 1,…,pf(m) indicate) f({yp. Let :m aM vectori == ( M1}| 1of,…,θ, Σzeros ) M =p )f(m and indicate)The ones maini asuch vector consequencei,j that of i,jM zerosi.j =1 ofand this ones is thatsuchi we that can. M i.jintegrate =1 out the missing data to obtain the Supposeachievement thatseek the data data out in deterministicare order represented to study patterns theas an trajectory n xfor p missingmatrix of istudents’ - values,a matrixi,j performancesi,j and with thus, rows marginalwe i, = as normal1,…, well=i f(mprobability nas, samplingi distribution) andfacilitate f({y i,j :m of probability i,j observed =indexed 1}|θ,Σ data.)with of= dataf(m theThis)i formean would student parameter allow i can us θ beto, and utilizegiven the asthecovariance simple Gibbs Σ.. Subsequently, sampling the whenever the (i,j) cell is missing,wheneverT and the M i,j(i,j =0,) cell otherwise. is missing, Also, and letP(m M Yi,jiobs,,{y =0,i,ji,j be:motherwise. thei,jsampling = observed 1}|θ Also,,Σ )probability value let Y obs,in i,j of be data the forobserved student value i can in be given as columns j = 1,…, p. Let Mi = (M1,…,Mp) indicate a vector of zeros and onesmarginal suchsampling that probability Mi.j probability =1 of observed of data data. for student This would i can!,! allowbe given! us,! toas utilize!!, !the:!!, !simple!! ! ,!Gibbs sampling further comparisons between achievement variables. algorithm to update!,! Specifically,the!, !posterior !asdistribution!,! :per!!,! !Hoff! ( 𝑓𝑓 (2009),!of,!(𝑦𝑦 θ and, … c ,onsiderΣ𝑦𝑦 along|𝜃𝜃, 𝛴𝛴 thewith) case im!,! plementingof!,! data𝑑𝑑 𝑦𝑦modelled) the Bayesian via a multivariate rowtreated i and the column problem j. Without as onerow loss of i missingand of generality,column at random). j. Without we can However, assumeloss of generality,a asalgorithm missing noted=The f(mat (towe 𝑓𝑓mainrandom (iupdate) 𝑦𝑦can f({y ,consequence assume…i,j mechanismthe:m, 𝑦𝑦 i,j posterior |= 𝜃𝜃a ,1}|missing𝛴𝛴)θ of, Σ distribution)this = at f(mis random that)i 𝑑𝑑 (𝑦𝑦we𝑓𝑓 of (mechanism) 𝑦𝑦 canθ!, !and, …integrate ,Σ𝑦𝑦 !along,! |𝜃𝜃 ,out𝛴𝛴 with) the! im missing:!plementing!! 𝑑𝑑 data𝑦𝑦!,!) .tothe obtain Bayesian the whenever the (i,j) cell is missing, and Mi,j =0, otherwise.The main Also, consequence let Yobs,i,j ofbe this themultiple observedis thatThe imputation we mainnormal value can consequence integrate in distributionscheme. out of the indexed this missing is that with data we the canto mean obtain integrate parameter the out the θ, andmissing the covariancedata to obtain Σ. Subsequently,the the (ourin Little intent (2011), has been the to mechanism demonstrate(our intent has that haslittle in been aneffect extremely to demonstrate on the large Bayesian datathat marginalmultiple setin an such extremelyimputationprobability asP(m thei,{y one large i,j scheme.of :mwe observed i,jdata are = 1}|setθ ,suchdata.Σ) asThis the would one we allow are us to utilize the simple Gibbs sampling row i and column j. Without loss of generality,marginal we probabilitycan assume of a observedmissing at data.marginal random ThisP(m mechanismprobability iwould,{yi,j :m allowi,j = of 1}| usobservedθ ,toΣ )utilize data. the Thissimple would Gibbs!,! allow sampling !us,! to utilize!! ,!the:!! ,!simple!! ! ,!Gibbs sampling 2. Methodworkingparadigmology with, explained it is nearly below. impossibleworking Indeed, with, to Little seek it is recommends out nearly deterministic impossible ignoring p algorithmatterns to seek the Thefor out to missing maindeterministic samplingupdate consequence values, the= probabilityf(m posterior pi )atternsand f({y ofi,j distribution this:mforofi,j data missingis= that1}| forθ (we , 𝑓𝑓studentΣ ofvalues,()𝑦𝑦 canθ= andf(m, …integrate i and),Σcani𝑦𝑦 along be|𝜃𝜃 ,outgiven𝛴𝛴 with) the imas missing plementing𝑑𝑑 data𝑦𝑦 ) theto obtain Bayesian the . (our intent has been to demonstrate that inalgorithm an extremely to update large the data posterior set such distribution Inas iterationthe one we of(r= +θare f(m and1) iof) Σf({y the alongi,j algorithm, :m withi,j = 1}| im θplementing,Σ) = ff(m(m))i the Bayesian . thus, we treated the problem as one of missing at random). However,algorithm as noted to update in Little the (2011),posterior distribution ofi θ and Σ along with implementing the Bayesian working with,missing it is nearly data impossiblemechanism tothus, when seek we outit treatedis deterministicjustified. the problem Note patterns that as onethe for offullmissingmarginalmultipleIn missing iterationdata values, imputationprobability at (r (r+1random) + ) and1) of scheme.of .the However, observed algorithm, (r) asdata. noted (r)This in wouldLittle (2011),allow us to !utilize,! the!,! simple Gibbs!!,!:!!,! !sampling! !,! multiple imputation scheme. multiple1) Sample imputation θ(r+1) from scheme. f(θ|Yobs ,Y(r)miss,Σ(r)); (𝑓𝑓(𝑦𝑦 , … , 𝑦𝑦 |𝜃𝜃, 𝛴𝛴) 𝑑𝑑𝑦𝑦 ) the mechanism has little effectthe mechanism on the Bayesian has little paradigm effect onexplained the Bayesian below. paradigm Indeed,(r+1)P(m i,{y Littlei,j Theexplained :m i,jmain = 1}| consequence below.(r)θ,Σ) (Indeed,r+1) of thisLittle is that!,! we can!,! integrate!! ,out!:!! ,the!!! missing!,! data to obtain the thus,Missing we treatedset Values willthe problembe the aggregate as one of missingof Y atY random)Y . However,, where Yas notedalgorithm1)2) are Sample in theLittle to θΣ (2011),update fromfromThe the f(mainf( θposteriorΣ|Y|Yobs consequence,Y,Y missdistribution,,Σθ ); )of; thisof θ is (and𝑓𝑓 that(𝑦𝑦 Σ wealong, … can, 𝑦𝑦 with integrate|𝜃𝜃, im𝛴𝛴)plementing out the the𝑑𝑑𝑦𝑦 Bayesian) recommends ignoring the missing = data( obs mec, missinghanism) when it missingis justified. Note(r+1) Thethat themain= full f(mconsequence datai) obsf({y seti,j(r) :mmiss i,jof =( r+1this 1}|) θis,Σ that) = wef(m )cani integrate out the missing data to obtain. the theSuppose mechanism that hasthe datalittle areeffect represented on the recommendsBayesian as an n xparadigm p ignoringmatrix explained - athe matrix missing below. with data rowsmultipleIn 2)Indeed, meciterationSample i =hanism 1,…, imputationLittle Σ(r(r+1)marginal n+ when, 1)and from of scheme.it the f(probabilityisΣ justified.algorithm,|Yobs ,Y miss ofNote ,observedθ(r+1) that);(r+1) the data. full Thisdata setwould allow us to utilize the simple Gibbs sampling missing data. In iteration (r + 1) of the algorithm,In3) iterationSamplemarginal Y(r(r+1(r+1)missing + probability) 1)miss of from datathe algorithm,f(Y toofmiss obtainobserved(r)|Yobs , θthe(r+1)(r) data. marginal,Σ(r+1) This) would probability allow ofus observed to utilize thedata. simple Gibbs sampling will be the aggregate of Y =will (Yobs beT, Ythemissing aggregate),(r+1 where) of Y missingY = (Y are, the(r)Y missing),(r) where data. Y algorithm are tothe update missing the data. posterior distribution of θ and Σ along with implementing the Bayesian recommendscolumns j =ignoring 1,…, p. theLet missing Mi = (M data1,…, mecMp)hanism indicate when a vector it is justified.of zerosobs andNotemissing 1)ones3) that SampleSample such the full thatθY(r+1 missingdata M) missfromi.j set=1 from f(θ|Y f(Yobsmiss,Y(r)|Ymissobs,,θΣ(r));, Σ ) !,! !,! !!,!:!!,!!! !,! 1) Sample θ from f(θ|Yobs,Y miss1) , ΣSamplealgorithm); θ (r+1) tofrom update f(θ|Y theobs ,Yposterior(r)miss,Σ ( r+1distribution); ) of θ and(𝑓𝑓 (Σ𝑦𝑦 along, … , 𝑦𝑦with|𝜃𝜃 im, 𝛴𝛴plementing) the𝑑𝑑𝑦𝑦 Bayesian) Now,Now, suppose suppose that that f(Y| θ ) isNow, the distribution suppose(r+1) that of of f(Y|all all θdata,data,) is(r) thegiven given2) distributionSample( r+1all ) associated Σ Thismultiple of fromall would unknown data,The imputationf(Σ allowmain |Ygiven ,Y consequenceus all scheme.to associated ,utilizeθ ) ;the of unknown this simple is that Gibbs we samplingcan integrate algorithm out the tomissing data to obtain the willwhenever be the aggregate the (i,j) cell of Y is = missing, (Yobs, Ymissing and M), i,jwhere =0, otherwise. Ymissing are Also, the missing let Yobs, data.i,j beIn iterationthe observed (r(r+1) + 1)value of the in algorithm,obs (r)miss (r+1) 2) Sample Σ from f(Σ|Yobs,Y miss2) ,Sampleθ multipleIn); theΣ(r+1(r+1) caseimputation) from of normalf(Σ|Y scheme.obs ,Ydistributions,(r) miss ,θ(r+1)(r) ); (r+1)the marginal posterior distributions of θ and Σ as parametersall associated θ. Also, unknown let f(M|Y, parametersparametersψ) represent θθ.. Also,Also, (r+1)the distribution let f(M|Y, ψ ) of represent M 1)3)given Sample(r+1) the theIn distribution(r+1) full theθY(r+1)marginalupdate datacase missfrom set offrom the of f(normalindexedprobabilityθ M |Yposteriorf(Y givenmiss ,Ydistributions, by|Y theobs of distribution,,θ Σfullobserved(r+1)) ;data, Σ (r+1)the set data.)ofmarginal indexed θ andThis Σ posterior wouldby along allow with distributions implementingus to utilize of θthe and simple Σ as Gibbs sampling row iNow, and column suppose j. thatWithout f(Y|θ loss) is theof generality,distribution3) Sample we of Y canall missdata,assume from given af(Y missing allmiss associated|Yobs3) signifiedat, θ Samplerandom, Σunknown in Y mechanism steps) (1) from and f(Y (2)obs have|Ymiss closed,θ , Σforms) when conjugate priors are used. That is, we the unknown parameters ψthe. unknown parameters ψ. (r+1)algorithmIn iterationmiss to (r update +miss 1) (r)of obsthe the posterior (algorithm,r+1) distribution of θ and Σ along with implementing the Bayesian parameters(our intent θthe. hasAlso, distribution been let tof(M|Y, demonstrate ofψ) M represent given that the thein fullan distribution extremelydata set indexedof large M given data by setthethe such fullunknown2)signifiedconsider Sampledataas the set aonein Σindexedmultivariate stepsthe we from Bayesianare (1) by and f( Σnormal |Y(2) multipleobs have,Y priormiss closed imputation, θon θ forms)and; an whenscheme. inverse conjugate-Wishart priors on Σ .are It is used. also Thatstraightforward is, we If there were no missing values, using Bayes theorem, one couldIn iteration write(r+1) the (r posterior + 1) of(r+1 the ) algorithm,(r+1) (r+1)(r) (r) parameters If there were no missing values,3) consider Sample usingIn Bayesa the Ymultivariatemultiple1) case Samplemisstheorem,(r+1 offrom) normalimputation θnormal f(Yonemiss distributions,fromcould|Y prior obsscheme. f(, θwrite θ(r)on|Y obsθ, Σ theand,Y the(r) posterior miss an)marginal ,inverseΣ ); posterior-Wishart distributionson Σ. It is also of straightforward θ and Σ as theworking unknown with, parameters it is nearly ψ. impossible to seek outIn deterministicthe case of normal patterns distributions, for to missing obtainIn the the thevalues, marginal marginalcase andof normal posteriordistribution(r+1) distributions, distributions of (Ymiss the|Y (r)ofobs marginal ,θθ ,andΣ()r+1 in Σ) step asposterior (3) through distributions of θ and Σ as distribution of all parametersdistribution as of all parameters as 1) Sample 2) Sample θ from Σ f( θfrom|Yobs ,Yf(Σ|Ymiss,Σ,Y ); ,θ ); thus, Ifwe there treated wereIf the there no problem missing were noas values, onemissing of using signifiedmissing values, Bayes at inusing random) theorem,steps Bayes (1). However, andone theorem, (2)could have as write one signifiednotedtoclosed obtain couldthe in posteriorforms Little inthe steps In marginalwhen (2011),iteration (1)(r+1) conjugate and distribution (r+1) (2) have of priors the closed of algorithm, are (r)(Y missobsused. forms|Y(r+1obs Thatmiss) ,whenθ,Σ )is, in conjugate we step (3) throughpriors are used. That is, we signified 2) Sample in steps Σ (1) and from (r+1)(2) f( haveΣ|Yobs closed,Y miss forms,θ )when; (r+1) conjugate(r+1) priors are used. That is, we distributionthe mechanism of all parametershas little effect as on the Bayesian paradigm explained below. consider Indeed,In a the multivariateLittleIn3) case iterationSample (r+1) of normal normalY (r + 1)missdistributions, priorof from the on algorithm,f(Y θmiss and (r+1)the|Y anobsmarginal, (r+1)θinverse,Σ posterior-Wisha) rt ondistributions Σ. It is also of straightforward θ and Σ as write the posterior distributionconsider of all a parameters multivariate as normal priorconsider on 3)θ andSample a multivariate an inverse Y miss- Wishanormal (r+1from) rt f(Y prioronmiss Σ .|Yon Itobs isθ, θandalso(r) an ,straightforwardΣ inverse(r)) -Wisha rt on Σ. It is also straightforward recommends ignoring the missing data mechanism when it is justified. Notesignifiedto obtain that the inthe stepsfull 1) marginal Sample data (1) andset distribution θ (2) havefrom closed f(ofθ (Y|Yobsmiss forms,Y|Yobsmiss ,whenθ,,ΣΣ) in) ;conjugate step (3) through priors are used. That is, we to obtain the marginal distributionto of obtain (Y miss |Ytheobs marginal,θ,Σ) in step distribution(r+1) (3) through of (Ymiss|Y(r)obs,θ3,Σ ()r+1 in) step (3) through will be the aggregate of Y = f((Yθ,ψ|Y,M), Y =), Cf( where θx, ψπ(|Y,M)θ, ψY) x= L( C θare 2x,ψ π ,|Y)the(θ,ψ missing) x L(θ ,considerdata.ψ,|Y) a multivariate2 2) SampleIn thenormalΣ case from prior of normalf( Σon|Y θobs and distributions,,Y anmiss 3inverse, θ ) ;the- Wisha marginalrt on Σ posterior. It is also distributions straightforward of θ and Σ as obs missing missing In the case of(r+1) normal distributions, the(r+1) marginal(r+1) posterior distributions of θ and Σ as 3)signified Sample in Y stepsmiss (1) from and f(Y(2)miss have|Yobs closed,θ ,formsΣ ) when conjugate priors are used. That is, we f(θ,ψ|Y,M) = C x π(θ ,ψ) x L(θ,ψNow,,|Y)f( θ ,supposeψ|Y,M) =that C xf(Y| π(θ),ψ is) xthe L( distributionθ,ψ,|Y)2 of all data, given all associatedto obtain the unknown marginal distribution of (Ymiss|Yobs,θ,Σ) in step (3) through signified consider in steps a (1)multivariate and (2) have normal closed prior forms on θwhen and anconjugate inverse- priorsWisha arert on used. Σ. It That is also is, straightforwardwe where,parameters f(θ,ψ|Y,M)where,where, θ. Also, is the f( θ let , posterior ψ |Y,M)f(M|Y, isψ distribution, )thethe represent posterior the C distribution, distribution,isdistribution a constant Cof Cnot is Mis a dependent a givenconstant constant the notfullon not ψdependent data and set θ, πindexed( onθ,ψ ψ) isand by θ, π(θ,ψ) is 3 consider 3a multivariateIn the case normal of normal prior distributions,on θ and3 an inverse the marginal-Wisha rtposterior on Σ. It distributions is also straightforward of θ and Σ as where, f(θ,ψ|Y,M) is the posterior distribution,where, f(θ,ψ |Y,M)C is thea isconstant thefullthe unknown posteriorprior not distributiondependent thedependent parametersdistribution, full prior on distribution ψ the C. and isparameter a θ constant, π ( onθ , ψ the space,) isnot parameterthe dependent and full L(priorθ space,,ψ ,|Y)distributionon ψ isand and the L( θcompleteθ, ,π ψon(,|Y)θ ,theψ )is is- datathe complete likelihoodto obtain-Indata the thelikelihood case marginal of normal distribution distributions, of (Ymiss the|Yobs marginal,θ,Σ) in stepposterior (3) through to obtainsignified the marginal in steps distribution (1) and (2) of have (Ymiss closed|Yobs,θ forms,Σ) in whenstep (3) conjugate through priors are used. That is, we the full prior distribution on the parameterthe full prior space, distribution andwhich L( θ, ψcan ,|Y)on If bethe is there parameterwritten which theparameter completewere can as no bespace, space, missing-writtendata and andlikelihood asvalues, L( θ , ψ ,|Y) using is theBayes completecomplete-data theorem,-data one likelihood likelihood could write the posteriordistributions of θ and Σ as signified3 in steps (1) and (2) have closed consider a multivariate normal prior on θ and an inverse-Wishart on Σ. It is also straightforward which can be written as which can be written distribution as which of all can parameters be written as as forms when conjugate priors are used. That is, we consider L(θ,ψ|Y)=f(Y|θL()fθ(M|Y,,ψ|Y)=f(Y|ψ). θ)f(M|Y,ψ). to obtain the marginal distribution of (Ymiss|Yobs,θ3, Σ) in step (3) through 3 L(θ,ψ|Y)=f(Y|θ)f (M|Y,ψ). L(θ,ψ|Y)=f(Y|θ)f(M|Y,ψ). However, withHowever, incomplete with data, incomplete the full posterior data,2 the distributionfull posterior becomes distribution becomes DIMENSIONS 87 However, with incomplete data, theHowever, full posterior with incompletedistribution data,becomes the full posterior distribution becomes 3 ffull(θ,ψ|Yobs,M)ffull ( θπ,(ψθ|Y,ψobs)L(,M)θ,ψ ,|Y πobs(θ,M),ψ)L( θ,ψ,|Yobs,M) ffull(θ,ψ|Yobs,M) π(θ, ψ)L(θ,ψ,|Yobsf,M) (θ ,ψ|Y ,M) π(θ,ψ)L(θ,ψ,|Y ,M) full obs obs ∝ where L(θ,ψ,|Ywhereobs,M) L( isθ ,theψ,|Y likelihoodobs,M) is thedue likelihood to ∝observed due values. to observed Now, thevalues. observed Now, likelihood the observed is likelihood is obtained by integratingobtained byall integratingmissing values all missing out of thevalues full outlikelihood: of the full likelihood: where L(θ,ψ,|Yobs,M) is the likelihoodwhere due L( toθ,ψ ∝observed,|Yobs,M) values. is the likelihoodNow, the observeddue to ∝observed likelihood values. is Now, the observed likelihood is obtained by integrating all missingobtained values out by ofintegrating the full likelihood: all missing values out of the full likelihood: L(θ,ψ,|Yobs,M)L( = θ,ψ,|Yobs,M) = . . L(θ,ψ,|Yobs,M) = L(θ, ψ,|Y ,M) = . . obs !"# !"##"$%𝑓𝑓(𝑌𝑌!"#, 𝑌𝑌!"##"$%!"# |𝜃𝜃!"##"$%)𝑓𝑓(𝑀𝑀|𝑌𝑌!"#, 𝑌𝑌!"##"$%!"##"$%, 𝜓𝜓)𝑑𝑑𝑌𝑌!"##"$% As in Rubin (1976),As in Rubinunder𝑓𝑓( (1976),the𝑌𝑌 missing, 𝑌𝑌 under at | the𝜃𝜃random)𝑓𝑓 missing(𝑀𝑀| 𝑌𝑌mechanism, at, 𝑌𝑌 random one, 𝜓𝜓mechanism,) 𝑑𝑑can𝑌𝑌 assume one can assume !"# !"##"$% !"# !"##"$%!"# !"##"$%!"##"$% !"# !"##"$% !"##"$% As in Rubin (1976), under𝑓𝑓( the𝑌𝑌 missing, 𝑌𝑌 As in at Rubin| 𝜃𝜃random)𝑓𝑓 (𝑀𝑀 (1976),| 𝑌𝑌mechanism,, 𝑌𝑌under 𝑓𝑓( the𝑌𝑌 one, 𝜓𝜓 missing), 𝑑𝑑𝑌𝑌can𝑌𝑌 assume at| 𝜃𝜃random)𝑓𝑓 (𝑀𝑀| 𝑌𝑌mechanism,, 𝑌𝑌 one, 𝜓𝜓) 𝑑𝑑can𝑌𝑌 assume f(M|Yobs, Ymissingf(M|Y,ψ) =obs f(M|Y, Ymissingobs,ψ) = f(M|Yobs,ψ) f(M|Yobs, Ymissing,ψ) = f(M|Yobs,ψ) f(M|Yobs, Ymissing,ψ) = f(M|Yobs,ψ) for all missingfor values, all missing and values, and for all missing values, and for all missing values, and π(θ,ψ) = π(θ) π(θψ,).ψ ) = π(θ) π(ψ). π(θ,ψ) = π(θ) π(ψ). π(θ,ψ) = π(θ) π(ψ). Specifically, asSpecifically, per Hoff (2009), as per c Hoffonsider (2009), the case consider of data the modelled case of data via amodelled multivariate via a multivariate Specifically, as per Hoff (2009), considerSpecifically, thenormal case as perof distribution dataHoff normalmodelled (2009), indexed distribution c viaonsider awith multivariate the theindexed case mean of with parameterdata the modelled mean θ, and parameter via the a multivariatecovariance θ, and the Σ . covarianceSubsequently, Σ. Subsequently, the the normal distribution indexed with thenormal mean distribution parametersampling θindexed, and the probability with covariancesampling the mean of probabilitydataΣ .parameter Subsequently, for student of θ ,data and i canthe forthe be student covariance given ias can Σ be. Subsequently, given as the sampling probability of data for studentsampling i can probability be given as of data for student i can be given as P(mi,{yi,j :mi,j =P(m 1}|i,{yθ,Σi,j) :mi,j = 1}|θ,Σ) = f(mi) f({yi,j :mi,j = 1}|θ,Σ) = f(m)i . P(mi,{yi,j :mi,j = 1}|θ,Σ) P(mi,{yi,j :mi,j = 1}|θ,Σ) = f(mi) f({yi,j :mi,j = 1}|θ,Σ) = f(m)i . = f(mi) f({yi,j :mi,j = 1}|θ,Σ) = f(m)=i f(mi) f({y i,j :mi,j = 1}|θ ,Σ) = f(m)i . !,! !,! !,!. !!,!: !,!!,!!! !,!!!,!:!!,!!! !,! The main consequenceThe main of consequence this is that ( weof𝑓𝑓( 𝑦𝑦thiscan, is…integrate that, 𝑦𝑦 (we|𝑓𝑓𝜃𝜃( , out𝑦𝑦𝛴𝛴can) ,the …integrate, missing𝑦𝑦 |𝜃𝜃 ,out𝛴𝛴𝑑𝑑 data)𝑦𝑦 the) to missing obtain𝑑𝑑 datathe𝑦𝑦 ) to obtain the !,! !,! (𝑓𝑓(𝑦𝑦!,!, … , 𝑦𝑦!,!|𝜃𝜃, 𝛴𝛴) ! :! !! 𝑑𝑑𝑦𝑦!,!) !,! !,! !!,!:!!,!!! !,! The main consequence of this is thatThe we main can marginal consequenceintegrate probability out ofmarginalthe this missing isof thatprobabilityobserved data (we𝑓𝑓( 𝑦𝑦tocan data.obtain, …ofintegrate ,observed𝑦𝑦 This the|𝜃𝜃 would,out𝛴𝛴) data. the allow missing This us would 𝑑𝑑 todata𝑦𝑦 utilize) allowto obtain the us simple to the utilize Gibbs the samplingsimple Gibbs sampling marginal probability of observed data.marginal This probabilitywould allowalgorithm of us observed to to utilize updatealgorithm data. the the simpleThis posteriorto wouldupdate Gibbs distribution allowthe sampling posterior us to utilizeof distribution θ and the Σ simplealong of θ withGibbs and im Σ sampling plementingalong with imtheplementing Bayesian the Bayesian algorithm to update the posterior distributionalgorithm to of update θ andmultiple Σthe along posterior imputation withmultiple distributionimplementing scheme. imputation of theθ and scheme.Bayesian Σ along with implementing the Bayesian multiple imputation scheme. multiple imputation scheme. In iteration (r +In 1) iteration of the algorithm,(r + 1) of the algorithm, (r+1) (r+1) (r) (r) (r) (r) In iteration (r + 1) of the algorithm,In iteration (r + 1)1) of Sample the algorithm, θ 1) fromSample f(θ |Yθ obs,Y frommiss ,f(Σθ|Y);obs ,Y miss,Σ ); (r+1) (r) (r) (r+1) (r+1) (r) (r+1) (r) (r+1) (r+1) 2) Sample Σ (r)2) fromSample(r) f(Σ Σ|Y ,Y from f(,θΣ|Yobs); ,Y miss,θ ); 1) Sample θ from f(θ|Yobs,Y miss1), ΣSample); θ from f(θ|Yobs,Y miss,Σ ); obs miss (r+1) (r) (r+1) (r+1) (r+1) (r+1) (r+1) (r+1) (r+1) (r+1) 3) Sample Y 3)(r)miss Sample from(r+1) Yf(Ymissmiss|Y obsfrom,θ f(Y,Σmiss|Y)obs ,θ ,Σ ) 2) Sample Σ from f(Σ|Yobs,Y miss2), θSample); Σ from f(Σ|Yobs,Y miss,θ ); (r+1) (r+1) (r+1)(r+1) (r+1) (r+1) 3) Sample Y miss from f(Ymiss|Yobs3),θ Sample,Σ Y) miss from f(Ymiss|Yobs,θ ,Σ ) In the case of normalIn the case distributions, of normal thedistributions, marginal posterior the marginal distributions posterior of distributions θ and Σ as of θ and Σ as In the case of normal distributions, In the the marginal casesignified of normalposterior in stepsdistributions, signifieddistributions (1) and in (2)thesteps of have marginalθ (1)and closed and Σ as posterior(2) forms have when closeddistributions conjugate forms ofwhen priorsθ and conjugate Σare as used. priors That are is, weused. That is, we signified in steps (1) and (2) have closedsignified forms in steps when consider(1) conjugate and (2) a multivariate have priorsconsider closed are used. a normalforms multivariate That when prior is, conjugateonwe normal θ and prior an priors inverse on areθ and- Wishaused. an Thatinversert on is,Σ.- WishaweIt is alsort on straightforward Σ. It is also straightforward consider a multivariate normal priorconsider on θ and a multivariate an inverseto obtain-Wisha normal thert marginal toprioron obtain Σ. onIt isdistributionθ the alsoand marginal anstraightforward inverse of distribution (Y-missWisha|Y obsrt,θ onof,Σ )(YΣ in. missIt step is|Y alsoobs (3),θ ,throughstraightforwardΣ) in step (3) through to obtain the marginal distribution toof obtain (Ymiss|Y theobs ,marginalθ,Σ) in step distribution (3) through of (Ymiss|Yobs,θ,Σ) in step (3) through

3 3 3 3

f(θ,ψ|Y,M) = C x π(θ,ψ) x L(θ,ψ,|Y) where, f(θ,ψ|Y,M) is the posterior distribution, C is a constant not dependent on ψ and θ, π(θ,ψ) is the full prior distribution on the parameter space, and L(θ,ψ,|Y) is the complete-data likelihood which can be written as

L(θ,ψ|Y)=f(Y|θ)f(M|Y,ψ).

However, with incomplete data, the full posterior distribution becomes

ffull(θ,ψ|Yobs,M) π(θ,ψ)L(θ,ψ,|Yobs,M) where L(θ,ψ,|Yobs,M) is the likelihood due to ∝observed values. Now, the observed likelihood is obtained by integrating all missing values out of the full likelihood:

L(θ,ψ,|Yobs,M) = .

!"# !"##"$% !"# !"##"$% !"##"$% As in Rubin (1976), under𝑓𝑓( the𝑌𝑌 missing, 𝑌𝑌 at| 𝜃𝜃random)𝑓𝑓(𝑀𝑀| 𝑌𝑌mechanism,, 𝑌𝑌 one, 𝜓𝜓) 𝑑𝑑can𝑌𝑌 assume

f(M|Yobs, Ymissing,ψ) = f(M|Yobs,ψ) for all missing values, and π(θ,ψ) = π(θ) π(ψ).

Specifically, as per Hoff (2009), consider the case of data modelled via a multivariate normal distribution indexed with the mean parameter θ, and the covariance Σ. Subsequently, the sampling probability of data for student i can be given as

P(mi,{yi,j :mi,j = 1}|θ,Σ) = f(mi) f({yi,j :mi,j = 1}|θ,Σ) = f(m)i .

!,! !,! !!,!:!!,!!! !,! The main consequence of this is that (we𝑓𝑓( 𝑦𝑦can, …integrate, 𝑦𝑦 |𝜃𝜃 ,out𝛴𝛴) the missing𝑑𝑑 data𝑦𝑦 ) to obtain the marginal probability of observed data. This would allow us to utilize the simple Gibbs sampling algorithm to update the posterior distribution of θ and Σ along with implementing the Bayesian multiple imputation scheme.

In iteration (r + 1) of the algorithm, (r+1) (r) (r) 1) Sample θ from f(θ|Yobs,Y miss,Σ ); (r+1) (r) (r+1) 2) Sample Σ from f(Σ|Yobs,Y miss,θ ); (r+1) (r+1) (r+1) 3) Sample Y miss from f(Ymiss|Yobs,θ ,Σ )

In the case of normal distributions, the marginal posterior distributions of θ and Σ as 2 signified in steps (1) and (2) have closeda multivariate forms when normal conjugate prior priors on θ areand used. an inverse-Wishart That is, we on Σ. It is where µ is the mean level of all students, and α is the uncertainty for th consider a multivariate normal prior onalso θ andstraightforward an inverse-Wisha to obtainrt on Σthe. It marginal is also straightforward distribution of the j intercept . In multilevel modeling, αjs are partially pooled toward α σ to obtain the marginal distribution of (Y miss |Y obs , θ , Σ ) inin step (3)(3) through ( Y i,miss |Y i,obs , θ , Σ ), which would beµ proportional. The model topools the teachersconditional with distribution fewer students of the toward the mean would be proportional to the conditionalmissing! distribution values given of the the observed missing values,level the more mean than and teachers the covariance with a higherparameters. number The of main students. !!! α values given the observed values, theadvantage mean𝑓𝑓 andof this the procedure covariance is that in estimating missing values, the method is borrowing parameters.3 The main advantage ofstrength this procedure from the observedis that in values. Stage 2: Su et al. (2011) have developed a multiple imputation package in R called mi. In addition estimating missing values, the method is borrowing strength from to implementing the Bayesian missing valueTaking imputation the nested method, structure this of programthis data provides set into account, the users we fitted a the observed values. with sensitivity analyses associated withtwo the level missing varying-intercept, value predictions varying-slope to the effect thatMixed-Effects the model Su et al. (2011) have developedefficiency a multiple of imputationthe predictions package can be assessed.including We aaim teacher to compare level predictor the various (see outputs page of280 this in Gelman and Hill, in R called mi. In addition to implementingsoftware the with Bayesian the results missing obtained. value 2007). This model has a single student-level predictor x (the indicator (Yi,miss|Yimputationi,obs,θ,Σ), which method, would this be program proportional provides to the the conditional users with distributionsensitivity of thefor whether the student was Asian or Hispanic), and a single teacher missing! values given the observed values, the mean and the covariance parameters. The main !!! analyses associated with the missingModeling value predictions to the effect advantage𝑓𝑓 of this procedure is that in estimating missing values, the method is borrowinglevel predictor μ (the average math GPA of students associated with

strength fromthat the observedthe efficiency values. of the predictions can be assessed. We aim to each teacher). Thus, (Yi,miss|Yi,obs,θ,Σ), which would be proportionalStage to1: the conditional distribution of the Su et compareal. (2011) the have various developed outputs a multiple of this software imputation with package the results in R obtained. called mi. In addition missing! values given the observed values, the meanAfter and removing the covariance students parameters. with missing The information, main we ended up with 1242 students who, in turn, to implementing!!! the Bayesian missing value imputation method, this program provides the users advantage𝑓𝑓 of this procedure is that in estimating weremissing nested values, within the 45 method teachers. is borrowing We consi der three different measures of math achievement,, with sensitivityModeling analyses associated with the missing value predictions to the effect that the ! strength from the observed values. taken as the response variable for our linear model. These measurements! are students’ efficiency of the predictions can be assessed. We aim to compare the various outputs of 𝑦𝑦this!~ 𝑁𝑁 𝛼𝛼! ! + 𝛽𝛽! ! 𝑥𝑥!, 𝜎𝜎! , 𝑓𝑓𝑓𝑓𝑓𝑓 𝑖𝑖 = 1, … , 1242 and 𝑗𝑗 = 1, … , 45, Su etStage al. (2011) 1: have developed a multiplemathematics imputation GPA, package CST in scoresRand called and mi CAHSEE. In𝑦𝑦and addition~𝑁𝑁 𝛼𝛼scores. + In 𝛽𝛽 this𝑥𝑥 ,stage, 𝜎𝜎 ,we would 𝑓𝑓𝑓𝑓𝑓𝑓 𝑖𝑖 =like1, …to ,see 1242 whether, and 𝑗𝑗 = 1, … , 45, software with the results obtained. ! ,, to implementingAfter the removing Bayesian students missing with value missing imputationthere information, exist method,s a teachers’ thiswe endedprogram effect. up Inprovides other words the! users, we! ! would! like! ! to investigate! the possibility of 𝑦𝑦 ~𝑁𝑁 𝛼𝛼 + 𝛽𝛽 𝑥𝑥 , 𝜎𝜎 , ! ! 𝑓𝑓𝑓𝑓𝑓𝑓 𝑖𝑖 = 1, … , 1242 and 𝑗𝑗 = 1, … , 45, with sensitivitywith analyses 1242 students associated who, with in turn,the missing werestatistical nested value significance predictionswithin 45 teachers. ofto theteachers’and effect We performancesthat the !! on their!!!! students!!!! !!’ mathematics!! learning. To andand 𝑦𝑦𝑦𝑦~~𝑁𝑁𝑁𝑁 𝛼𝛼𝛼𝛼 ++ 𝛽𝛽 𝛽𝛽 𝑥𝑥𝑥𝑥,, 𝜎𝜎 𝜎𝜎 ,, 𝑓𝑓𝑓𝑓𝑓𝑓 𝑓𝑓𝑓𝑓𝑓𝑓 𝑖𝑖 𝑖𝑖==11,,……,, 1242 1242 and and 𝑗𝑗 𝑗𝑗==11,, … …,, 45 45,, Modelingefficiency of the predictions can be assessed. Weachieve aim to thatcompare goal wethe constructvarious outputs a simple of mixedthis -effects model! !with a nested! structure in terms of consider three different measures of math achievement, taken as ! !! ! ! software with the results obtained. ! ! ! ! ! ! ! student-teacher relationship. We can express𝛼𝛼! each of the𝛾𝛾 three+ 𝛾𝛾 µμmodels 𝜎𝜎as 𝜌𝜌𝜎𝜎 𝜎𝜎 the response variable for our linear model. These measurements are ! ! ! Stage 1: ! ~𝑁𝑁 ! , ! ! ! , 𝑓𝑓𝑓𝑓𝑓𝑓 𝑗𝑗 = 1, … , 45, 𝛽𝛽! 𝛾𝛾!! + 𝛾𝛾!!µμ! 𝜌𝜌𝜎𝜎!!𝜎𝜎! 𝜎𝜎! After removingstudents’ students mathematics with missing GPA, information, CST scores we andended CAHSEE up with scores. 1242 students In this who, in𝛽𝛽 turn,! 𝛾𝛾! + 𝛾𝛾! µμ! 𝜌𝜌!𝜎𝜎 𝜎𝜎 ! 𝜎𝜎! Modeling where and 𝛼𝛼 vary by teacher𝛾𝛾 + 𝛾𝛾and!!µμ is!! the𝜎𝜎 between ! ! 𝜌𝜌𝜎𝜎 t eacher𝜎𝜎 ’s correlation parameter. were nested withinstage, 45 we teachers. would like We to consi see derwhether three theredifferent exists measures a teachers’ of math effect. achievement, In where α~ 𝑁𝑁and!! β vary! by!! teacher!,! !! and ρ !is! the between! !!,! ! 𝑓𝑓𝑓𝑓𝑓𝑓 teacher’s 𝑗𝑗 = 1, … , 45, ! ! j𝛼𝛼𝛼𝛼 j! 𝛾𝛾𝛾𝛾! ++! 𝛾𝛾𝛾𝛾 µμµμ ! !𝜎𝜎𝜎𝜎 ! 𝜌𝜌 𝜌𝜌𝜎𝜎𝜎𝜎 𝜎𝜎𝜎𝜎 ! ! ! We! ! can !𝛽𝛽express this𝛾𝛾 model+ 𝛾𝛾! !byµμ substituting!! 𝜌𝜌𝜎𝜎 𝜎𝜎 the formulas 𝜎𝜎 ! !for and into the equation taken as the responseother words, variable we wouldfor our likelinear to investigatemodel. These the measurements possibility𝑦𝑦 ~𝑁𝑁 of𝛼𝛼 are statistical +student𝑋𝑋𝑋𝑋 ,𝛼𝛼 ! s 𝜎𝜎’ correlation, 𝛽𝛽 𝑓𝑓𝑓𝑓𝑓𝑓! 𝑖𝑖 =!! 1parameter.,~~ 𝑁𝑁…𝑁𝑁, 1242, 𝜌𝜌𝑎𝑎𝑎𝑎𝑎𝑎 𝑗𝑗!!=,, 1 , … ,!45! !,! !! ,, 𝑓𝑓𝑓𝑓𝑓𝑓 𝑓𝑓𝑓𝑓𝑓𝑓 𝑗𝑗 𝑗𝑗==11,, … …,, 45 45,, Stage 1: where is student i’s mathwherefor achievement: 𝛼𝛼 and 𝛽𝛽 (GPA, vary𝛽𝛽 𝛽𝛽byCST teacher score 𝛾𝛾,and 𝛾𝛾!or! + CAHSEE +𝜌𝜌𝛾𝛾 𝛾𝛾is!! µμtheµμ between score),𝜌𝜌𝜌𝜌𝜎𝜎𝜎𝜎 𝜎𝜎𝜎𝜎 tXeacher is the 𝜎𝜎 ’ 𝜎𝜎 sdesign correlation parameter. mathematics GPA, CST scores and CAHSEE scores. In this stage, we wouldfor likewhere :towhere see whether and and vary vary by by teacher teacher and and is is the the between between t eacherteacher’’ss! correlation correlation! parameter. parameter. After removingsignificance students withof teachers’ missing information,performancesmatrix we on endedwtheirhich students’ upconsists with 1242 ofmathematics information studentsWe who, on can student’s in express Weturn, can gender,this express model ethnicity, this by modelsubstituting English by substituting languagethe formulas the forformulas 𝛼𝛼 and for 𝛽𝛽 αintoj and the equation there exists a teachers’ effect. In other words, we would𝑦𝑦 like! to investigate the possibility! 𝛼𝛼! 𝛽𝛽of! 𝜌𝜌 were nested learning.within 45 To teachers. achieve We that consi goalder we three constructproficiency, different a simple measuresand the mixed-effects level of math of Math 𝑦𝑦achievement,! they βareWe!Wej! into in, can can the !express !isexpress equation a vector this this forof model model coefficientsy : by by substituting substituting that corresponds the the formulas formulas to the for for and and into into the the equation equation for 𝑦𝑦 : i ! ! statisticaltaken as thesignificance response variableof teachers’ for ourperformances linear model. predictorson theirThese students measurementsin X and’ mathematics is fixed are for student alllearning. teachers,s’ 𝛼𝛼𝛼𝛼 To and 𝛽𝛽𝛽𝛽 is the intercept for𝜌𝜌𝜌𝜌 student i associated with𝛼𝛼 𝛽𝛽 model with a nested structure in terms of student-teacher relationship. forfor : : ! ! ! ! !! !! achieve that goal we construct a simple mixed-effects model with a nested structure ! in terms of 𝛽𝛽 ! ! ! ! ! ! 𝛼𝛼𝛼𝛼 𝛽𝛽𝛽𝛽 mathematics GPA, CST scores and CAHSEE scores.teacher In jthis. According stage, we to would Gelman like 𝑦𝑦 and to Hillsee whether (2007)!, in usual! regression! ! ! , ! ! would! come! from! ! the ! ! ! ! student-teacherWe relationship. can express We each can of express the three each models of the as three models as !! 𝑦𝑦 =! ! 𝛾𝛾 + 𝛾𝛾 µμ + 𝜂𝜂! ! + 𝛾𝛾! + 𝛾𝛾! µμ + 𝜂𝜂! ! 𝑥𝑥 + 𝜀𝜀 . there exists a teachers’ effect. In other words, weclassical would likeleast to squares investigate estimation the possibility forHere 𝑦𝑦 𝑦𝑦each ofteacher and𝛼𝛼 but in this represent model, teacher is assumed j’s random to be randomeffects on the intercept and slope of ! ! !! ! ! ! statistical significance of teachers’ performances on their students’ mathematics learning.! To ! ! ! ! 𝛼𝛼 ! ! ! ! so that student i’s CST! scores, respectively.! !! The!! error! ! term!! ! is !iid! !! ! ! and!! is assumed to be student i’s CST! scores! !𝑦𝑦 =, respectively.𝛾𝛾! +! !𝛾𝛾 µμ The+ 𝜂𝜂 error! +term𝛾𝛾 +is𝛾𝛾 iidµμ + 𝜂𝜂 and𝑥𝑥 + is 𝜀𝜀assumed. to be achieve that goal we construct a simple mixed-effects model with a nested structureHere in t erms𝛾𝛾 µμ of and 𝛾𝛾!!! µμ ! !represent!! !! !teacher! !!! !j’s random!! effects!! !!!! on the!!!! intercept!! !! and slope of independent of𝛾𝛾 µμ and 𝑦𝑦.𝛾𝛾𝑦𝑦 The==µμ 𝛾𝛾 𝛾𝛾model++𝛾𝛾𝛾𝛾 assumesµμµμ𝛼𝛼 ++𝜂𝜂 𝜂𝜂that teacher++ 𝛾𝛾𝛾𝛾 +s+’ 𝛾𝛾effect𝛾𝛾 µμµμ s! differ++𝜂𝜂𝜂𝜂 randomly𝑥𝑥𝑥𝑥 ++𝜀𝜀𝜀𝜀. .across ! independent HereofHereHere and . and !Theand model representassumes representrepresent that teacher teacherteacher teacher! j ’sj ’sj’s srandom random’random effect seffects ! effects differeffects on randomlyon theon the the intercept intercept across and and slope slope of of student-teacher relationship.! ! ! We can! express each of the three models as student i’s CST! scores, respectively. The error term 𝜀𝜀 is iid 𝑁𝑁(0, 𝜎𝜎 ) and is assumed to be 𝑦𝑦 ~𝑁𝑁 𝛼𝛼 + 𝑋𝑋𝑋𝑋, 𝜎𝜎 , 𝑓𝑓𝑓𝑓𝑓𝑓 𝑖𝑖 = 1, … , 1242, 𝑎𝑎𝑎𝑎𝑎𝑎 𝑗𝑗 = 1, …students,45, with different! ! ! ethnicities! ! ! !! but that CST scores increase linearly with teachers’ averaged where is studentwhere i ’sy math is student achievement i’s math (GPA, achievement CST score (GPA,, or CAHSEECST score, score), or Xstudentstudent is theintercept 𝛾𝛾idesign ’si’sµμ CST !CST !! scores scoresand!𝛾𝛾 slopeµμ, ,respectively. respectively. of student The The i’s CSTerror error scores, term term respectively.isis iid iid The and and error is is assumed assumed to to be be i independentGPA at the same of 𝛼𝛼 !fixed and!! ! 𝛽𝛽 rate!!!. The for allmodel!! teachers!!!! assumes. that teachers’ effects! differ randomly across matrix which consists of information on student’s gender, ethnicity, EnglishGPA language !at the same! !fixed𝛾𝛾𝛾𝛾 µμµμ rate for𝛾𝛾 𝛾𝛾all2 µμ µμteachers. ! ! CAHSEE score), X is the design matrix which consists of information term ε is iid N(0, ) and is assumedth to𝜀𝜀 be independent𝑁𝑁(0, 𝜎𝜎 ) of !α! and β . ! where is the mean level of all𝛼𝛼independent independentstudents, ~N µμ , 𝜎𝜎and of ofi, 𝑓𝑓𝑓𝑓𝑓𝑓 isand and the 𝑗𝑗 = uncertainty1. . , αThe …The, 45model model, for assumes assumesthe j intercept that that teacher teacher!! . In ss’’ effect effects!s !differ differj randomly randomlyj across across proficiency,𝑦𝑦 and the level of Math they are in, is a vector of coefficients that students correspondsAn with efficient different to! the way ethnicities! to estimate but that CST score swould increase be linearlyto use the with likelihood teacher sfunction’ averaged of y on student’s gender, ethnicity,! English language proficiency, and studentsthestudents The with with𝛼𝛼 model different different 𝛽𝛽 assumes ethnicities ethnicities that teachers’ but but that that CST CSTeffects score score differ𝜀𝜀s𝜀𝜀s increase increase randomly𝑁𝑁𝑁𝑁( (linearly0 linearly0,, 𝜎𝜎 𝜎𝜎 ))across with with teach teachererss’’ averaged averaged ! ! ! ! multilevel modeling, s areGPAand partially maximize at the pooledsame it with fixed toward !respect! !rate for. ! toThe! all each teachersmodel of these pools. parameters. teachers! with However, fewer maximizing the likelihood predictorswhere isin studentX and𝑦𝑦 ~is i𝑁𝑁’s fixed math𝛼𝛼 for+ achievement 𝑋𝑋𝑋𝑋all ,teachers, 𝜎𝜎 , (GPA, 𝑓𝑓𝑓𝑓𝑓𝑓 and 𝑖𝑖 = CST1 ,is …scorethe,! 1242intercept, or ,CAHSEE𝑎𝑎𝑎𝑎𝑎𝑎 for 𝑗𝑗 =student 1score),,and… , 45 imaximize associated X, is the designit with with 𝛼𝛼 !𝛼𝛼respect 𝛽𝛽 𝛽𝛽 toσ each of! these! parameters.!! However, maximizing the likelihood level of Math they are in, β is a vector𝛽𝛽 students of coefficientsµμ toward the that mean corresponds level function moreGPAGPA An in than atefficient theatstudents the the presenceteachers same same way𝜎𝜎 with fixed fixed with ofto differentrepeatedestimate rate arate high for forer measures𝛼𝛼all ethnicitiesall !number, teachers teachers 𝛽𝛽!, 𝑎𝑎𝑎𝑎𝑎𝑎 isof . but known. students. 𝜎𝜎 !that would toCST produce bescores to use biasedincrease the likelihood estimations. linearly function To of y teachermatrix jw. Accordinghich consists to ofGelman information and Hill on (2007) student’s, in! !gender,usual regression ethnicity,, English !wouldfunction language come fromin the the presence of repeated! measures is known to produce biased estimations. To ! to the predictors in X and is fixed 𝛼𝛼for all teachers, andα j𝛼𝛼[i] is theandelaborate, maximize thiswithAnAn isit efficientwith a efficientteachers’ multilevel respect way µμway averaged regressionto to to each estimate estimate of GPA thesemodel at theparameters. in samewhich! fixed st However, udentswould would rate havebe befor maximizingto to allrepeateduse use teachers. the the likelihood likelihoodmeasurements the likelihood function function of of y y classicalproficiency,𝑦𝑦 least an squaresd the level estimation of Math for they each are teacher in, isbut a vectorin this ofmodel coefficients, is assumed thatelaborate, corresponds to be this random is toa multilevel the regression! model! in which! students have repeated measurements intercept for student associated withStage teacher 2: . According! toand Gelman thereforeandand maximize maximize the Anregression it itefficient with with respect respectcoefficients way to to𝛼𝛼 toeach each , estimate are 𝛽𝛽 of ,of expected 𝑎𝑎𝑎𝑎𝑎𝑎 these these , 𝜎𝜎 parameters. parameters., and to carry! ! 2 would high However, However, correlationbe to usemaximizing maximizing the between the thethem likelihood likelihood and sopredictors that in X and is fixed for all iteachers, and is the interceptj 𝛼𝛼for studentandfunction thereforei associated in the the presence with regression of repeated coefficients measures are! !expected isα! !jknownβj to tocarry!! produceα high biasedcorrelation estimations. between To them and Taking the nested structure! elaborate,hence of thisfunction functionare data notthis set independent. in isin into athe the multilevel account,presence presence Therefore, regressionweof of repeated fittedrepeated wea modeltwo measures𝛼𝛼 considermeasures𝛼𝛼, , level 𝛽𝛽 𝛽𝛽in, , which 𝑎𝑎𝑎𝑎𝑎𝑎 varying 𝑎𝑎𝑎𝑎𝑎𝑎 ais istechnique known known 𝜎𝜎 𝜎𝜎students-intercept, to to known produce producehave asrepeated biased Restrictedbiased estimations.measurements estimations. Maximum To To teacher j. Accordingand Hill (2007),to Gelman in usual and Hillregression, (2007)𝛽𝛽, inαj wouldusual regression come from𝛼𝛼, the would classicalhence come are fromnotlikelihood independent. the function Therefore, of y and we maximize consider a it technique with respect known to each as Restricted of these Maximum varying! ! -slope Mixed-EffectsLikelihoodand model thereforeelaborate,elaborate, including Estimation the this this regression ais isteacher a (RMLE)a multilevel multilevel coefficients level for predictorregression regressionthe purpose are (seeexpected model model of page parameter in in to280 which whichcarryσ in Gelman estimation st highstudentsudents correlation have have in this repeated repeated betweensetting measurements measurements(1975 them DR.and classical leastleast squares squares estimation estimation for each for eachteacher teacher𝛼𝛼 but in but this in model this model,, is assumed αj is to beparameters. random However, maximizing the likelihood function in the and Hill, 2007). This model! Cox.hence has Partialanda andare single therefore nottherefore Likelihood, studentindependent. the the-level regression regression Biometrika). predictorTherefore, coefficients coefficients x RMLE we(the consider indicator isare are based expected expecteda fortechnique on whether the to to idea carry carryknown the of high highsynthesizing as correlationRestrictedcorrelation the Maximumbetween betweenjoint them them and and so that assumed to be random so that! 𝛼𝛼 presence of repeated measures is known to produce biased ! ! ! student was Asian or !Hispanic),Likelihoodlikelihood henceandhence a function Estimationsingleare are not not teacher independent. forindependent. (RMLE) level predictor forTherefore, Therefore, the purpose through µ (the we we consideraverage ofconsiderforming parameter math aconditionala technique technique estimationGPA of known inferencesknown in this as as Restricted setting Restrictedbased on(1975 theMaximum Maximum DR. where is the mean level of all𝛼𝛼 students, ~N µμ , 𝜎𝜎and, 𝑓𝑓𝑓𝑓𝑓𝑓 is the 𝑗𝑗 = uncertainty1, … , 45, for𝛼𝛼 the jlikelihoodth intercept function . estimations.In for To elaborate, thisthrough is a multilevelforming conditional regression inferences model in based on the students associated with eachCox. teacher). PartialLikelihoodLikelihood Thus,Likelihood, Estimation Estimation Biometrika). (RMLE) (RMLE) for RMLE!for t hethe purpose ispurpose based of ofon parameter parameterthe idea of estimation estimation synthesizing in in this thisthe setting jointsetting (1975 (1975 DR. DR. parameters that are not being estimated.! Due to the lack of independence in this case, we multilevel modeling, s are partially pooled toward! . The model pools teachersCox.Cox. with Partial Partial fewerwhich Likelihood, Likelihood,students𝛼𝛼!, 𝛽𝛽 have! ,Biometrika). Biometrika). 𝑎𝑎𝑎𝑎𝑎𝑎 repeated 𝜎𝜎! RMLE measurementsRMLE is is based based on andon the the therefore idea idea of of synthesizing synthesizingthe the the joint joint µμ! 𝜎𝜎! considerlikelihood a functionrecursive for estimation 𝛼𝛼 , 𝛽𝛽 , 𝑎𝑎𝑎𝑎𝑎𝑎technique 𝜎𝜎 through that would forming facilitate conditional estimating inferences a quasi based-likelihood on the students toward the mean level more than teachers! with a higher number of students. ! ! ! ! ! pafunctionrameterslikelihoodlikelihood (as that opposed arefunction function not to beingthe for for original estimated. likelihood! Due tothrough throughfunction), the lack forming forming of which independence conditional conditionalis an adjusted in inferences inferences this version case, based webasedof the on on the the 𝛼𝛼 𝛼𝛼 ~N µμ , 𝜎𝜎 , 𝑓𝑓𝑓𝑓𝑓𝑓 µμ 𝑗𝑗 = 1, … , 45, functionth (as opposed to the! original! likelihood! function), which is an adjusted version of the where is the mean level of all students, and is the uncertainty for the consideroriginalj interceptpapa rameterslikelihood rametersa recursive . In that that function estimation are are𝛼𝛼 not ,not 𝛽𝛽that being being, 𝑎𝑎𝑎𝑎𝑎𝑎technique accommodates estimated. estimated. 𝜎𝜎 that!! Due wouldDue for to tobias. facilitatethe the lack lack of ofestimating independence independence a quasi in in- thislikelihood this case, case, we we Stage 2: original likelihood function that!! accommodates!! !! for bias. DIMENSIONS 88 multilevel modeling, s are partially pooled toward! . The model pools teachersfunction considerconsiderTo with (as app opposed fewer lya a recursive recursiveRMLE, 4to the we estimation estimation original 𝛼𝛼used𝛼𝛼,, 𝛽𝛽 𝛽𝛽the, ,likelihood 𝑎𝑎𝑎𝑎𝑎𝑎 technique 𝑎𝑎𝑎𝑎𝑎𝑎Rtechnique package 𝜎𝜎 𝜎𝜎 function),that thatand would wouldwe relied which facilitate facilitate on is codesan estimating estimatingadj usteddescribed version a a quasi quasi in “Data -of-likelihoodlikelihood the Taking µμthe! nested structure of this data set into account,𝜎𝜎! we fitted a two level varyingTo- intercept,apply RMLE, we used the R package and we relied on codes described in “Data students toward the mean level more than teachers with a higher number oforiginalAnalysis students.functionfunction likelihood Using (as (as Regression opposed opposedfunction to toandthat the the Multilevel/Hierarchicalaccommodates original original likelihood likelihood for bias. function), function), Models” which which by Andrew is is an an adj adj Gelmanustedusted version version and of of the the varying-slope Mixed-Effects! model including a teacher! level predictor (see Analysispage 280 Usingin Gelman Regression and Multilevel/Hierarchical Models” by Andrew Gelman and 𝛼𝛼 µμ JenniferoriginaloriginalTo Hill. app likelihood lylikelihood RMLE, function wefunction used that thethat Raccommodates accommodates package and wefor for reliedbias. bias. on codes described in “Data and Hill, 2007). This model has a single student-level predictor x (the indicatorJennifer for whether Hill. the Stage 2: Analysis UsingToTo Regressionapp applyly RMLE, RMLE, and we weMultilevel/Hierarchical used used the the R R package package and and Models” we we relied relied by onAndrew on codes codes Gelman described described and in in “Data “Data student was Asian or Hispanic), and a single teacher level predictor µ (the average math GPA of Taking the nested structure of this data set into account, we fitted a two levelJennifer varyingAnalysisAnalysis Hill.-intercept, Using Using Regression Regression and and Multilevel/Hierarchical Multilevel/Hierarchical Models” Models” by by Andrew Andrew Gelman Gelman and and students associated with each teacher). Thus, Clustering varying-slope Mixed-Effects model including a teacher level predictor (see pageJennifer Jennifer280 in Gelman Hill. Hill.

and Hill, 2007). This model has a single student-level predictor x (the indicatorClusteringKmeans for whether method the student was Asian or Hispanic), and a single teacher level predictor µ (the averageKmeans math method GPA of Suppose ClusteringClustering we have a data set consisting of n observations of a random D-dimensional variable x. students associated with each teacher). Thus, th LetKmeans the D method-dimensional set of vectors µk for which k=1,…,K, be a prototype associated with the k 4 KKmeansmeans method method Supposecluster. We we aim have to a assign data set all consisting data points of to n clustersobservations as well of asa random a set of Dvectors-dimensional {µk} such variable that we x. th Let theSupposeSuppose D-dimensional we we have have seta a data dataof vectors set set consisting consisting µk for which of of n n observations observationsk=1,…,K, be of of a a prototypea random random D associatedD--dimensionaldimensional with variable variablethe k x x. . thth cluster.LetLet We the the aim D D--dimensional todimensional assign all set dataset of of points vectors vectors to µclustersµk kfor for which which as well k=1,…,K k=1,…,K as a set, ,be beof a vectorsa prototype prototype {µk }associated associated such that with wewith the the k k 4 cluster.cluster. We We aim aim to to assign assign all all data data points points to to clusters clusters as as well well as as a a set set of of vectors vectors { {µµk}k} such such that that we we

5

5 55

minimize the sum of squares of the distances of each data point to its closest vector µk (Bishop 2006). For each xn we assign a set of binary indicator variables rnk {0,1} where k = 1,…,K and r {0, 1}, where k = 1, . . . , K, such that x is assigned 0 or 1 dependent upon which cluster it is minimize the sumnj of squares of the distances of each datan point to its closest vector µ (Bishop assigned to. We next define a function known as a distortion measure∈ byk 2006). ∈ For each xn we assign a set of binary indicator variables rnk {0,1} where k = 1,…,K and rnj {0, 1}, where k = 1, . . . , K, such that xn is assigned 0 or 1 dependent upon which cluster it is ! ! assigned to. We next define a function known as a distortion measure∈ by ∈ ! 𝐽𝐽 = 𝑟𝑟!" 𝑥𝑥! − 𝜇𝜇! !!! !!! which represents the sum of the squares of the distances of each data point to its assigned vector ! ! µk. We wish to minimize J by assigning values to rnk for each xn. ! The Kmeans clustering method!" !allows! for us to choose the number of clusters that it will 𝐽𝐽 = 𝑟𝑟 𝑥𝑥 − 𝜇𝜇 output. Since we generated!! three! !!! different distributions, we ran Kmeans with three clusters. We which represents the sum of the squares of the distances of each data point to its assigned vector Hierarchicalmade these simulations clustering methodbased on a grand assumption that each cluster is uniquely represented by µ . We wish to minimize J by assigning values to r for each x . regression coefficients are expected to carry high correlationk between Thisa multivariate method clusters normal baseddistribution on the with distancenk distinct between meann and data covariance points. It structures. The Kmeans clustering method allows for us to choose the number of clusters that it will them and hence are not independent. Therefore, we consider a starts by clustering each data point as its own cluster. The distance output. Since we generated three different distributions, we ran Kmeans with three clusters. We Hierarchical clustering method technique known as Restricted Maximum Likelihood Estimationmade these simulationsbetween eachbased cluster on a grand is then assumption measured that and each the clustertwo points is uniquely closest represented by This method clusters based on the distance between data points. It starts by clustering each data (RMLE) for the purpose of parameter estimation in this settinga multivariate togethernormal distribution are joined withinto onedistinct cluster. mean The and process covariance is repeated structures. until the point as its own cluster. The distance between each cluster is then measured and the two points (1975 DR. Cox. Partial Likelihood, Biometrika). RMLE is based on the number of clusters is reduced to the desired amount. The algorithm closest together are joined into one cluster. The process is repeated until the number of clusters is Hierarchical2 clustering method idea of synthesizing the joint likelihood function for αj, βj, and α usedreduced in each to the step desired is amount. The algorithm used in each step is This method clusters based on the distance between data points. It starts by clustering each data through forming conditional inferences based on the parameters point as its own cluster. The distance between each cluster is then measured and the two points that are not being estimated. Due to the lack of independenceσ in closest together are joined into one cluster. The process is repeated until the number of clusters is

this case, we consider a recursive estimation technique thatreduced would to the desired amount. The algorithm used in each step is where𝐷𝐷 𝑟𝑟, 𝑠𝑠 D(r,s)= max represents𝑑𝑑 𝑖𝑖, 𝑗𝑗 , the dissimilarity between nodes r and s, and d(i,j) represents the distance facilitate estimating a quasi-likelihood function (as opposed to the where D(r,s) represents the dissimilarity between nodes r and s, and between i and j where object i is in cluster r and object j is in cluster s. original likelihood function), which is an adjusted version of the d(i,j) represents the distance between i and j where object i is in We used the Hclust function in R to cluster the data hierarchically. The Hclust command

original likelihood function that accommodates for bias. clusterrequires r thatand weobject select j is the in clusternumber s .of clusters we prefer. Since we generated three different 𝐷𝐷where𝑟𝑟, 𝑠𝑠 D(r,s)= max represents𝑑𝑑 𝑖𝑖, 𝑗𝑗 , the dissimilarity between nodes r and s, and d(i,j) represents the distance To apply RMLE, we used the R package and we relied on codes multivariateWe used normal the Hclust distributions, function in we R tochose cluster three the clusters. data hierarchically. That is, the process of hierarchical between i and j where object i is in cluster r and object j is in cluster s. described in “Data Analysis Using Regression and Multilevel/Hierar- Theclustering Hclust will command repeat until requires there that are weonly select three theclu sters.number of clusters We used the Hclust function in R to cluster the data hierarchically. The Hclust command chical Models” by Andrew Gelman and Jennifer Hill. requires that wewe select prefer. the Since number we generatedof clusters threewe prefer. different Since wemultivariate generated normal three different

multivariate normaldistributions, distributions, we chose we chose three threeclusters. clusters. That Thatis, the is, process the process of of hierarchical Model Based Clustering Method Clustering clustering willhierarchical repeat until clusteringthere are only will threerepeat clu untilsters. there are only three clusters. Model based clustering computes an approximate maximum for the classification likelihood: Kmeans Method

Suppose we have a data set consisting of n observations of a random Model Based Clustering Method Model Based Clustering Method D-dimensional variable x. Let the D-dimensional set of vectors μk Model based clustering computes an approximate maximum! for the th Model based clustering computes an approximate maximum for the classification likelihood: for which k=1,...,K, be a prototype associated with the k cluster. We classification likelihood: ! ! 𝐿𝐿!" 𝜃𝜃!, … , 𝜃𝜃!; 𝑙𝑙!, … , 𝑙𝑙! 𝑦𝑦 = 𝑓𝑓! (𝑦𝑦!|𝜃𝜃! ), aim to assign all data points to clusters as well as a set of vectors {μk} !!! th where li are labels classifying each observation. For example, li = k if yi belongs to the k such that we minimize the sum of squares of the distances of each ! minimize the sum of squaresminimize of the the distances sum of squares of each of data the point distances to its of closest each datavectorcomponent. point µk (Bishop to its Model closest based vector clustering µk (Bishop merges pairs of clusters according to the greatest increase in data point to its closest vector (Bishop 2006). 2006). 2006). μk the classification!" likelihood! ! !among! all possible !pairs.! ! The!! process begins by treating each data 𝐿𝐿 𝜃𝜃 , … , 𝜃𝜃 ; 𝑙𝑙 , … , 𝑙𝑙 𝑦𝑦 = 𝑓𝑓 (𝑦𝑦 |𝜃𝜃 ), minimize the sumForFor of squareseach xxnn ofwe the assign distances aFor set eachof of binary each xn we data indicatorindicator assign point a variables tovsetariables its of closest binary r nk vector indicator {0,1} µ wherek (Bishop vwherepointariables k = as l1,…,Ki itsarernk own labels{0,1} and cluster. classifying where When k = 1,…,K eachthe probability observation. and !!! model For inexample, the classification li = k likelihth ood is where li are labels classifying each thobservation. For example, li = k if yi belongs to the k 2006). rnj where {0, 1}, k= where1,...,K andk = 1 r , . . .{0, , K 1}, , such , where that k x ==n 1 is1,..., ,assigned .K, . .such , K, thatsuch0 or x1that dependentis assignedx is assigned upon ifmultivariate which 0y or belongs 1 dependentcluster normalto itthe is upon k with component. which equal -clustervolume Model it sphericalis based covariance clustering λI, the selection criterion is the sum- nj n component.n Modeli based clustering merges pairs of clusters according to the greatest increase in Forassigned each xn to.we Weassign next a definesetassigned of binarya function to. indicator We known next vdefineariables as a distortiona function rnk {0,1} measure known where∈ asby k a =distortion 1,…,Kof-squares and measure criterion.∈ by 0 or 1 dependent upon which cluster it is assigned to. We nextthe define classification a merges likelihood pairs ofamong clusters all accordingpossible pairs. to the The greatest process increase begins by in thetreating each data rnj {0, 1}, where∈ k = 1, . . . , K, such ∈ that xn is assigned 0 or 1 dependent upon which cluster it is function known as a distortion measure by point as its ownclassification cluster. When likelihood the probability among model all possible in the classificationpairs. The process likelih beginsood is assigned to. We next define a function known as a distortion measure∈ by multivariate normalby treating with equaleach -datavolume point spherical as its own covariance cluster. λWhenI, the selectionthe probability criterion is the sum- ∈ ! ! ! ! of-squares criterion.model in the classification likelihood is multivariate normal with ! ! 6 !" ! ! !" ! ! ! !𝐽𝐽 = 𝑟𝑟 𝑥𝑥 − 𝜇𝜇𝐽𝐽 = 𝑟𝑟 𝑥𝑥 equal-volume− 𝜇𝜇 spherical covariance λI, the selection criterion is the !!! !!! !!! !!! which represents the sumwhic ofh the represents squares ofthe the sum distances of !the squares of each of data the point distances to itssum-of-squares of assigned each data vector point criterion. to its assigned vector !" ! ! µk. We wish to minimizeµk 𝐽𝐽.J We =by assigningwish to𝑟𝑟 minimize values𝑥𝑥 − to𝜇𝜇 J rbynk forassigning each x nvalues. to rnk for each xn. !!! !!! Mclust is a function that combines6 model-based hierarchical which represents which Thethe represents sum Kmeans of the theclustering squares sum of ofThemethod the the Ksquares meansdistances allows clusteringof theforof each usdistances to methoddata choose point of allowseachthe to number itsdata for assigned us of to clusters clustering,choose vector thatthe expectation numberit will of clusters maximization that it will (EM) for Gaussian mixture µk. We wishoutput. to minimize Since we J bygenerated assigningoutput. three valuesSince different weto rgeneratednk distributions,for each three xn. wedifferent ran K meansdistributions, with three we ranclusters Kmeans. We with three clusters. We point to its assigned vector μ . We wish to minimize J by assigning Mclustmodels, is a function and Bayesian that combines Information model Criterion-based hierarchical (BIC) approximation clustering, expectation- Themade Kmeans these clustering simulations methodmade based theseallows onk a simulationsgrand for us assumption to choose based theonthat anumber eachgrand cluster assumptionmaximizationof clusters is uniquely thatthat (EM) iteach representedwill for cluster Gaussian is by uniquely mixture represented models, and by Bayesian Information Criterion (BIC) values to r for each x . (Fraley and Raftery 2002). output. Sincea multivariate we generatednk normal three distribution adi multivariatefferentn distributions, with normal distinct distribution we mean ran andKmeans withcovariance withdistinctapproximation three structures. mean clusters and (Fraley.covariance We and structures.Raftery 2002) . The Kmeans clustering method allows for us to choose the Most commonly, fk is the multivariate normal (Gaussian) density Øk, made these simulations based on a grand assumption that each cluster is uniquely representedMost commonly, by fk is the multivariate normal (Gaussian) density Øk, parameterized by its a multivariateHierarchical normal distribution clustering with method distinct mean and covariance structures. number of clusters thatHierarchical it will output. clustering Since wemethod generated threemean µk covarianceparameterized matrix ∑ byk, its mean μk covariance matrix Σk, Thisdifferent method clustersdistributions, basedThis onmethodwe theran Kmeans distance clusters withbetween based three on data theclusters. points.distance We It betweenstarts made by dataclustering points. each It starts data by clustering each data Hierarchicalpointthese clustering as its simulations own method cluster. basedpoint The ondistanceas itsa grand own between cluster. assumption each The clusterdistance that each is betweenthen cluster measured each cluster and the is twothen points measured and the two points This methodclosest clusters together based are on joined theclosest distance into together one between cluster. are data joinedThe points. process into oneIt is starts repeatedcluster. by clustering The until process the eachnumber is datarepeated of clusters until isthe number of clusters is is uniquely represented by a multivariate normal distribution with ! !! point as itsreduced own cluster. to the Thedesired distance amount.reduced between Theto the algorithm each desired cluster amount.used is inthen eachThe measured algorithmstep is and used the twoin each points step is k 1 ! ! ! ! ! distinct mean and covariance structures. 𝑒𝑒𝑒𝑒𝑒𝑒 − (𝑦𝑦 − 𝜇𝜇 ) k (𝑦𝑦 − 𝜇𝜇 ) closest together are joined into one cluster. The process is repeated until the number of clusters𝜙𝜙 is! 𝑦𝑦! 𝜇𝜇!, 𝑘𝑘 = 2 . reduced to the desired amount. The algorithm used in each step is det (2𝜋𝜋 𝑘𝑘)

Data generated by mixtures of multivariate normal densitiesDIMENSIONS are characterized by groups 𝐷𝐷where𝑟𝑟, 𝑠𝑠 D(r,s)= max represents 𝑑𝑑 𝑖𝑖, 𝑗𝑗 𝐷𝐷where, the𝑟𝑟 ,dissimilarity𝑠𝑠 D(r,s)= max represents𝑑𝑑 between𝑖𝑖, 𝑗𝑗 , the nodes dissimilarity r and s, betweenand d(i,j) nodes represents r and thes, and distance d(i,j) represents the distance 89 or clusters centered at the means µk , with increased density for points nearer the mean. between i and j where objectbetween i is i inand cluster j where r and object object i is j inis clusterin cluster r and s. object j is in cluster s. 𝐷𝐷where𝑟𝑟, 𝑠𝑠 D(r,s)= max represents𝑑𝑑We𝑖𝑖, 𝑗𝑗 used, the the dissimilarity Hclust function between in R nodes to cluster r and the s, anddata d(i,j) hierarchically. represents Thethe distanceHclust command We used the Hclust function in R to cluster the data hierarchically. The Hclust command , between i requiresand j where that objectwe select i is theinrequires cluster number that r andof we clusters object select jwethe is in prefer.number cluster Since of s. clusters we generated we prefer. three Since different we generated three different

Wemultivariate used the Hclust normal function distributions,multivariate in R to clusterwe normal chose the distributions,three data clusters.hierarchically. we That chose is, The thethree Hc process lustclusters. command of hierarchicalThat is, the process! of hierarchical! ! ! ! in which, D represents2 logdata( 𝐷𝐷and|𝑀𝑀 M) ≈ depicts2 log 𝑝𝑝the𝐷𝐷 k𝜃𝜃th possible, 𝑀𝑀 − 𝑌𝑌modellog 𝑛𝑛to= be𝐵𝐵𝐵𝐵𝐵𝐵 considered. requires thatclustering we select will the repeat number untilclustering of there clusters are will weonly repeat prefer. three until Sinceclu sters.there we are generated only three three clu differentsters. k In this setting, the corresponding surfaces of densities are ellipsoidal. Multivariate multivariate normal distributions, we chose three clusters. That is, the process of hierarchical normality facilitates clustering of data by changing shape, value, and orientation of each density clustering will repeat until there are only three clusters. (Fraley, Raftery, 1997). Another advantage of model-based clustering is the determination of Model Based ClusteringModel Method Based Clustering Method optimal number of clusters which is achieved by criteria such as the Bayesian Information Model based clusteringModel computes based an clusteringapproximate computes maximum an approximate for the classification maximum likelihood: for the classification likelihood: Criterion (BIC). The algorithm calculates the BIC for varying number of clusters, and selects the Model Based Clustering Method number of clusters for which the BIC is minimized. Model based clustering computes an approximate maximum for the classification likelihood:

! ! Misspecification Rate: Comparing the Three Methods via Simulation !" ! ! ! ! !" ! !!! We!! ! !start! ed by generating!! ! random!! data in R from three separate multivariate normal distributions 𝐿𝐿 𝜃𝜃 , … , 𝜃𝜃 ; 𝑙𝑙 , … , 𝑙𝑙! 𝐿𝐿𝑦𝑦 =𝜃𝜃 , … , 𝑓𝑓𝜃𝜃 (;𝑦𝑦𝑙𝑙 |,𝜃𝜃… ,)𝑙𝑙, 𝑦𝑦 = 𝑓𝑓 (𝑦𝑦 |𝜃𝜃 ), !!! with equally spread!!! means.th Subsequently, we useth d Kmeans, Hclust, and Mclust on the simulated where li are labels classifyingwhere leachi are observation.labels classifying For example, each observation. li = k if y iFor belongs example, to the li k= k if yi belongs to the k ! ! component. Model𝐿𝐿!" based𝜃𝜃!component., … clustering, 𝜃𝜃!; 𝑙𝑙!, … Model ,merges𝑙𝑙! 𝑦𝑦 based= pairs clustering of𝑓𝑓! clusters(𝑦𝑦!|𝜃𝜃! merges) according, data. pairs We to of the then clusters greatest calculate according increased what toin percentage the greatest of increase points f romin each clustering method was !!! misspecified thfrom the true scenarios. We then repeated this process while increasing the number where li arethe labels classification classifying likelihood eachthe observation. classification among all Forpossible likelihood example, pairs. among li The= k if processall yi possiblebelongs begins topairs. theby treatingThek process each beginsdata by treating each data component.point Model as its based own clusteringcluster.point When merges as t heits pairsprobabilityown cluster.of clusters model When according in the the probability classification to theof greatestdistributions model likelih increasein theood being classification is in sampled aslikelih wellood as increasing is the dimensionality of each underlying the classificationmultivariate likelihood normal among withmultivariate equalall possible-volume normal pairs. spherical withThe processequal covariance-volume begins λI ,spherical bythedistribution. treating selection covariance each criterion data λ I ,is the the selection sum- criterion is the sum- point as itsof own-squares cluster. criterion. When theof probability-squares criterion. model in the classification likelih ood is multivariate normal with equal-volume spherical covariance λI, the selectionTime criterion dependent is the sumclassification- of-squares criterion. We then applied the clustering methods on variables from the data set. Variables Math GPA and CST scores were paired on a yearly basis for analysis. The students were classified into different 6 groups6 based on the clustering in order to observe their mathematical achievement relative to other students throughout their high school years. 6 The count of students with the highest mathematics achieving cluster is represented by in which: where determine whether or not the students were

in( !the!,!!, !highest!) scoring! cluster.! ! ! ! ! 𝑦𝑦 This way 𝑖𝑖 , 𝑖𝑖 specifies, 𝑖𝑖 ∈ 1 the,0 student𝑖𝑖 was, 𝑖𝑖 ,clustered𝑖𝑖 in the highest scoring cluster in 9th grade while specifies that the student was not placed in the highest scoring cluster in 9th ! grade. Similarly, 𝑖𝑖 refers= 1 to the clustering placement of student in grade 10 and refers to the ! clustering placement𝑖𝑖 = 0 in grade 11. ! ! 𝑖𝑖 𝑖𝑖

7

Mclust is a function that combines model-based hierarchical clustering, expectation- maximization (EM) for Gaussian mixture models, and Bayesian Information Criterion (BIC) approximation (Fraley and Raftery 2002). Most commonly, fk is the multivariate normal (Gaussian) density Øk, parameterized by its mean µk covariance matrix ∑k,

! !! Results 1 3. Results 𝑒𝑒𝑒𝑒𝑒𝑒 − (𝑦𝑦! − 𝜇𝜇!) ! (𝑦𝑦! − 𝜇𝜇!) Missing Value ! ! ! 2 Data generated𝜙𝜙 𝑦𝑦 𝜇𝜇 by, mixtures𝑘𝑘 = of multivariate normal densities are. We implemented the multiple imputation approach of the previous det (2𝜋𝜋 𝑘𝑘) characterized by groups or clusters centeredMclust is ata functionthe meansMissing that combines , with Value modelsections- based on hierarchical the achievement clustering, data expect set containingation- the scaled math Data generated by mixtures of multivariate Mclustnormal is densities a function are that characterizedμ kcombines modelby groups-based hierarchical clustering, expectation- increased density for points nearermaximization the mean. (EM) for GaussianWe implemented mixture models, scoresthe multipleand of Bayesian3474 observations imputation Informatio acquired approachnn CriterionCriterion between (BIC)of(BIC) the the previous years 2005-2008 sections on the achievement Mclust is a functionor clusters that combines centered model at the-based means hierarchical µk , with increased clustering, density expect foration points- nearer the mean. maximization (EM) for Gaussian mixture models, and Bayesian Informatioapproximationapproximationn Criterion (Fraley(Fraley (BIC) andanddata RafteryRaftery set containing 2002)2002).. from the the scaled TASEL-M2 math project scores assembled of 3474 observationsby AnneMarie Conley acquired from between the years Most commonly, ffk isis thethe multivariatemultivariate normalnormal (Gaussian)(Gaussian) densitydensity ØØk,, parameterizedparameterized byby itsits approximation (Fraley and Raftery 2002). 2005k -2008 , from theUC Irvine.TASEL To -assessM2 project the quality assembledk of the imputation by AnneMarie technique, Conley we from UC Irvine. To Most commonly, f is the multivariate normal (Gaussian) densitymean صkk , covariance covarianceparameterized matrixmatrix by ∑itskk ,, k k assess the qualityconcentrated of the imputation on the longitudinal technique, data we associatedconcentrated with onmath the scaled longitudinal data ! ! ! ! ! mean µk covariance matrix ∑k, 2 log(𝐷𝐷|𝑀𝑀 ) ≈ 2 log 𝑝𝑝 𝐷𝐷 𝜃𝜃th , 𝑀𝑀 − 𝑌𝑌 log 𝑛𝑛 = 𝐵𝐵𝐵𝐵𝐵𝐵 in which, D represents data and Mk depicts the k possible modelth to be considered. scores in the same period. We employed the Gibbs algorithm of in which, D represents data and Mk depicts the k possibleassociated model with math scaled scores in the same period. We employed the Gibbs algorithm of In thisto setting, be considered. the corresponding In this setting, surfaces the of corresponding densities are ellipsoidal. surfacessection of 2 Multivariate with a normalsection 2model with a havingnormal modelnormal having conjugate normal priors.conjugate priors. !! !!! normality facilitates clustering of data by changing shape, value, and orientation of each density1 ! ! ! ! densities are ellipsoidal. Multivariate normality facilitates clusteringIn Figure 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 1,− weIn((𝑦𝑦 includeFigure! − 𝜇𝜇 ! ))1, wea !!missing include((𝑦𝑦! − 𝜇𝜇 !apattern)) missing graph. pattern The graph. graph The graph is comprise is d of four blocks (Fraley, Raftery, 1997). Another advantage! !! of model- based clustering!! !! is!! the determination of2 1 ! ! ! ! ! 𝜙𝜙 𝑦𝑦 𝜇𝜇 , 𝑘𝑘 = . optimal numberof data of𝑒𝑒𝑒𝑒𝑒𝑒 clusters by− changing( 𝑦𝑦which− 𝜇𝜇 shape,is) achieved value,(𝑦𝑦 −by 𝜇𝜇and criteria) orientation such as𝜙𝜙 theof (rows),each𝑦𝑦 Bayesian𝜇𝜇 density, each 𝑘𝑘Information = showing comprised the portion of four blocks of available (rows),. each observed showing data the portion in black, of available as opposed to the portion of ! ! ! 2 det ( (2𝜋𝜋 𝑘𝑘)) 𝜙𝜙 Criterion𝑦𝑦 𝜇𝜇 , (BIC).𝑘𝑘 (Fraley,= The algorithmRaftery, 1997). calculates Another the BICadvantage for. varying of model-based numbermissing of clusters, clustering data and in white.selectsobserved the data in black, as opposed to the portion of missing data det (2𝜋𝜋 𝑘𝑘) Data generated by mixtures of multivariate normal densities are characterized by groups number of clustersis the determinationfor which the BIC of optimal is minim numberized. of clusters which is achieved in white. Data generated by mixtures of multivariate normal densitiesoror are clustersclusters characterized centeredcentered by atat groups thethe meansmeans µkk ,, withwith increasedincreased densitydensity forfor pointspoints nearernearer thethe mean.mean. by criteria such as the Bayesian Information Criterion (BIC). The or clusters centered at the means µk , with increased density for points nearer the mean. Misspecificationalgorithm Rate: calculatesComparing the the BIC Three for varying Methods number via Simulation of clusters, and We started by generating random data in R from three separate multivariate normal distributions ,, selects the number of clusters for which the BIC is minimized. with equally spread means. Subsequently, we used Kmeans, , Hclust, and Mclust! on the simulated ! ! ! ! 2 loglog((𝐷𝐷||𝑀𝑀!)) ≈ 2 loglog𝑝𝑝 𝐷𝐷 𝜃𝜃thth!,,𝑀𝑀! − 𝑌𝑌! loglog𝑛𝑛 = 𝐵𝐵𝐵𝐵𝐵𝐵! data. We then calculated what percentage ofinin points which,which, f rom D representsrepresents each clustering datadata andand method Mkk depictsdepicts was thethe kk possiblepossible modelmodel toto bebe considered.considered. 2 log(𝐷𝐷|𝑀𝑀!) ≈ 2 log 𝑝𝑝 𝐷𝐷 𝜃𝜃th!, 𝑀𝑀! − 𝑌𝑌! log 𝑛𝑛 = 𝐵𝐵𝐵𝐵𝐵𝐵! in which, D represents datamisspecified and Mk depicts Misspecificationfrom the truek possible scenarios. Rate: model WeComparing tothen be repeatconsidered. theInIn ed thisthisThree this setting,setting, processMethods thethe while correspondingcorrespondingvia Simulationincreasing surfacessurfaces the number ofof densitiesdensities areare ellipsoidal.ellipsoidal. MultivariateMultivariate In this setting, theof corresponding distributionsWe being surfacesstarted sampled byof densitiesgenerating as well are as random ellipsoidal.increasingnormalitynormality data the Multivariatefacilitatesfacilitates in dimensionalit R from clusteringclustering three y separate of ofeachof datadata underlying byby changingchanging shape,shape, value,value, andand orientationorientation ofof eacheach densitydensity normality facilitates clusteringdistribution. of data multivariate by changing normal shape, value,distributions and(Fraley,(Fraley, orientation with Raftery,Raftery, equally of each 1997).1997). spread density AnotherAnother means. advantageadvantage ofof modelmodel--basedbased clusteringclustering iiss thethe determinationdetermination ofof optimaloptimal numbernumber ofof clustersclusters whichwhich isis achievedachieved byby criteriacriteria suchsuch asas thethe BayesianBayesian InformationInformation (Fraley, Raftery, 1997). Another advantageSubsequently, of model we-based used clustering Kmeans ,i Hclusts the determination, and Mclust onof the simulated optimal number of clusters which is achieved by criteria such as theCriterion Bayesian (BIC). Information The algorithm calculates the BIC for varying number of clusters, and selects the Time dependentdata. classification We then calculated what percentage of points from each Criterion (BIC). The algorithmWe then calculates applied the the clustering BIC for varying methods number on numbernumbervariables of clusters, ofof from clustersclusters and the forselectsdatafor whichwhich set. the Variables the the BICBIC isisMath minimminim GPAized.ized. and clustering method was misspecified from the true scenarios. We then number of clusters for whichCST thescores BIC were is minim pairedized. on a yearly basis for analysis. The students were classified into different groups basedrepeated on the clustering this process in order while to increasingobserveMisspecification their the mathematical number Rate: of Comparing distributions achievement the relative Three Methodsto via Simulation Misspecification Rate: otherComparing students thebeing throughout Three sampled Methods their as highwell via asschool Simulation increasingWe years. start theeded bydimensionalityby generatinggenerating randomrandom of each datadata inin RR fromfrom threethree separateseparate multivariatemultivariate normalnormal distributionsdistributions We started by generating randomThe data countunderlying in R of from students distribution.three with separate the highest multivariatewith mathematics equally normal spread achievingdistributions means. cluster Subsequently, is represented we use bydd Kmeans,, Hclust,, andand Mclust onon thethe simsimulatedulated with equally spread means. Subsequently, in which: we used Kmeans, Hclust where,data.data. and WeMclustWe thenthendetermine on calculatecalculate the sim whetherulateddd what or percentage not the students of points we refromrom eacheach clusteringclustering methodmethod waswas data. We then calculated what percentage of points from each clusteringmisspecified method fromwas the true scenarios. We then repeateded thisthis processprocess whilewhile increasingincreasing thethe numbernumber in the! ! highest! Time scoring Dependent cluster. Classification (! ,! ,! ) ! ! ! ofof !distributionsdistributions! ! beingbeing sampledsampled asas wellwell asas increasingincreasingth thethe dimensionalitdimensionalityy ooff eacheach underlyingunderlying misspecified from the true𝑦𝑦 scenarios.This wayWe then𝑖𝑖 ,repeat𝑖𝑖 specifies, 𝑖𝑖 ed∈ this 1 the, 0process student while𝑖𝑖 was, 𝑖𝑖 ,increasingclustered𝑖𝑖 in the the number highest scoring cluster in 9 We then applied the clusteringdistribution.distribution. methods on variables from the data th of distributions being sampledgrade whileas well as increasing specifies the that dimensionalit the student wasy of noteach placed underlying in the highest scoring cluster in 9 set. Variables! Math GPA and CST scores were paired on a yearly distribution. grade. Similarly, 𝑖𝑖 refers= 1 to the clustering placement of student in grade 10 and refers to the ! Time dependent classification clustering placement𝑖𝑖basis= 0 for in analysis. grade 11. The students were classified into different ! We then applied thethe clusteringclustering methodsmethods! onon variablesvariables fromfrom thethe datadata set.set. VariablesVariables MathMath GPAGPA andand Time dependent classification groups𝑖𝑖 based on the clustering in order to observe their mathematical𝑖𝑖 CST scores were paired on a yearly basis for analysis. The students were classified into different We then applied the clustering methodsachievement on variables relative from the to dataother set.CST students Variables scores throughout were Math paired GPA their on and a high yearly basis for analysis. The students were classified into different CST scores were paired on a yearly basis for analysis. The studentsgroups groupswere classified basedbased onon into thethe clusteringclusteringdifferent inin orderorder toto observeobserve theirtheir mathematicalmathematical achievemachievementent relativerelative toto school years. groups based on the clustering in order to observe their mathematicalotherother achievem studentsstudentsent throughoutthroughout relative to theirtheir highhigh schoolschool years.years. other students throughout their high schoolThe years. count of students with the Thehighest count mathematics of students withachiev the- highest mathematics achieving cluster is represented by 7 The count of students with theing highest cluster mathematics is represented achieving by cluster inin is which: which:represented by Figure 1. MP where plot used with determinedetermine theFigure mi 1 .package MP whetherwhether plot used for with oror R. the notnot Themi thepackagethe black studentsstudents for R.part The wes weblack ofrere theparts plotof the areplot are the the available data while the white available data while the whiteare are the the missing missing data. data. in which: wherewhere determine whether whetherinin the!the! !!or highesthighestor!! not not the the scoringscoring students students cluster.cluster. we werere in the ((!! ,,!! ,,!! )) !! !! !! !! !! !! th highest scoring cluster. 𝑦𝑦 This way 𝑖𝑖𝑖𝑖 ,,𝑖𝑖𝑖𝑖 specifies,specifies,𝑖𝑖𝑖𝑖 ∈ 1 thethe,,0 studentstudent𝑖𝑖 𝑖𝑖 waswas,,𝑖𝑖𝑖𝑖 , ,clusteredclustered𝑖𝑖𝑖𝑖 inin thethe highesthighest scoringscoring clustercluster inin 99th in( !the!,!!, !highest!) scoring! cluster.! ! ! ! ! 𝑦𝑦 𝑖𝑖 , 𝑖𝑖 , 𝑖𝑖 ∈ 1,0 𝑖𝑖 , 𝑖𝑖 , 𝑖𝑖 grade while specifiesth that the student was not placed in the highest scoring cluster in 9thth This way specifies the studentThis was way clustered i1 = 1 specifies in the highestgrade the student whilescoring clusterwas clustered specifies in The9 in thatscaled the the highest student math scoreswasThe notscaled ofplaced math2008 in scores theare highest on of the2008 scoring top are blockon cluster the topfollowed in block 9 followed by those by those of 2007, 2006, 2005. It is !! th th grade.grade. Similarly,Similarly, 𝑖𝑖𝑖𝑖 refers=refers1 toto tthehe clusteringclustering placementplacement ofof studentstudent inin gradegrade 1010 andand refersrefers toto thethe grade while specifies that the studentscoring was cluster not inplaced 9 grade in the while highest i = scoring 1 specifies!! cluster that inapparent 9the student from was Figureof 2007, 1 that 2006, a 2005.considerable It is apparent portion from ofFigure the 1response that a considerable variable is missing over the ! 1 𝑖𝑖𝑖𝑖 = 0 grade. Similarly, 𝑖𝑖 refers= 1 to the clustering placement of student in gradeclusteringclustering 10 and placementplacement threfers in into grade gradethe 11.11. ! not placed in the highest scoring cluster in 9 grade.!! period Similarly, of i study. portion of the response variable is missing!! over the period of study. clustering placement𝑖𝑖 = 0 in grade 11. 𝑖𝑖𝑖𝑖 2 𝑖𝑖𝑖𝑖 ! refers to the clustering placement of student! in grade 10 and refers In Figure 2, we show the posterior distribution of scaled math 𝑖𝑖 𝑖𝑖 Ini3 Figure 2, we show the posterior distribution of scaled math scores obtained from the to the clustering placement in grade 11. Gibbs sampling schemescores obtained of section from 2. the Gibbs sampling scheme of section 2.

77 DIMENSIONS 7 90

8

Histogram of mathss0 Histogram of mathss3 Histogram of mathss6 1500 1500 1500 1000 1000 1000 Frequency Frequency Frequency 500 500 500 0 0 0

200 400 600 200 400 600 200 400 600

mathss0 mathss3 mathss6

Histogram of mathss0 Histogram of mathss3 Histogram of mathss6 Histogram of mathss8 Histogram of mathss9 1500 1500 1500 1500 1500 1000 1000 1000 1000 1000 Frequency Frequency Frequency Frequency Frequency 500 500 500 500 500 0 0 0 0 0 200 400 600 200 400 600 200 400 600 200 400 600 200 400 600

mathss0 mathss3 mathss6 mathss8 mathss9

Figure 2. Histograms of the filled out data sets generated from the Gibbs Sampler method. Figure 2. Histograms of the filled out data sets generated from the Gibbs Sampler method. Histogram of mathss8 Histogram of mathss9 It is quite important to note that these distributions are realized 600 throughIt is quite borrowing important strength to note from thatthe information these distributions of the full dataare realized through borrowing strength from 1500 1500 (observedthe information and missing) of theas manifested full data (observed by the hierarchical and missing) algorithm as manifested by the hierarchical ofalgorithm section 2.1. of Finally, section in Figure 2.1. Finally, 3, we demonstrate in Figure the 3, wemean demonstrate of the mean of each of the posterior 1000 1000

500 eachdistributions of the posterior of the distributions previous ofgraph. the previous graph.

Frequency Frequency The decline in the mean scores is not surprising, especially

500 500 considering that we followed students progressively from grade 6 to grade 11 meaning that, in each wave of the study, the students 400 0 0 would have to deal with a more challenging set of mathematical 200 400 600 200 400 600 problems. The Bayesian methodology applied in this work is

mathss8 mathss9 quite robust and is computationally inexpensive. This approach mean_scaled_score 300 is considerably more reliable than ad-hoc methods such as data Figure 2. Histograms of the filled out data sets generated from the Gibbs Sampler method. deletion and estimating missing values with averages.

200 Modeling It is quite important to note that these distributions are realized through borrowing strength from As seen in Figure 4, teachers’ performances play a major role in the information of the full data (observed and missing) as manifested by the hierarchical students’ mathematics achievement. algorithm of section 2.1. Finally, in Figure 3, we demonstrate the mean of each of the posterior 0 1 2 3 4 We note that for some teachers there is a significant overlap distributions of the previous graph. between the three types of intervals of interest, while this pattern wave is easily distorted for some other teachers. Similarly, we could not establish a general pattern of preference for one measure Figure 3. A plot comparing the mean values generated by the mi package in R and the Gibbs sampler method. mi is Figure 3. Ain plot red comparing while the the mean Gibbs values generated method by theis inmi packageblue. in R and of achievement in comparison to the others. That is, for a9 given the Gibbs sampler method. mi is in red while the Gibbs method is in blue. teacher we can have students performing better in CST as opposed The decline in the mean scores is not surprising, especially consideringto that GPA we and followed CAHSEE, whereas for another teacher students performances for CAHSEE is shown to be superior. students progressively from grade 6 to grade 11 meaning that, in each wave of the study, the Figure 5 demonstrates that Asian students universally scored students would have to deal with a more challenging set of mathematical problehigherms. Theon the CST than Hispanic students. Bayesian methodology applied in this work is quite robust and is computationally inexpensive. This approach is considerably more reliable than ad-hoc methods such as data deletion and estimating missing values with averages. DIMENSIONS 91

Modeling As seen in Figure 4, teachers’ performances play a major role in students’ mathematics achievement. 9

10

FigureFigure 4. A plot 4. A of plot 12 of selected 12 selected teachers teachers (6 from (6 fromthe high the highperformance performance group group and 6and from 6 from the low the performancelow perfor- group).FigureFigure 6. A 6. plot A plot of theof the same same 12 12 selected selected teachers. teachers. The The x x-axis-axis demonstrates CSTCST scores scores taken taken from from wave wave 3 3 of the The x-axis demonstrates the percent change from the overall mean. The y-axis represents teacher id. Different colors mance group). The x-axis demonstrates the percent change from the overall mean. The y-axis represents study. The y-axis represents teacher id. Different colors are used to show pairings of ethnicity and gender. Intervals show a specific measure of achievement. The intervals are obtained by adding/subtracting one standard deviation of the study. The y-axis represents teacher id. Different colors are used to show pairings of ethnicity and teacher id. Different colors show a specific measure of achievement. The intervals are obtained by adding/ are obtained by adding/subtracting one standard deviation from the parameter estimate in the mixed-effect model. from the parameter estimate in the mix-effect model. Clearly teachers for whom there is a significant overlap gender. Intervals are obtained by adding/subtracting one standard deviation from the parameter estimate betweensubtracting the intervals one standard have demonstrated deviation from higher the consistency. parameter estimateAlso intervals in the onmix-effect the positive model. side Clearlyof the xteachers-axis can be in the mixed-effect model. for whom there is a significantperceived overlap between as effective the intervals mathematics have learning. demonstrated higher consistency. Also Mathematical achievement does not have a unique definition. As such, we see that intervals on the positive side of the x-axis can be perceived as effective mathematics learning. different measures yield different results indicating that the entire picture cannot be seen by looking at one dimension of mathematical achievement. Teachers’ effect on mathematical We note that for some teachers there is a significant overlap between the three types of intervalsachievement Mathematical cannot be ignored achievement. does not have a unique definition. As of interest, while this pattern is easily distorted for some other teachers. Similarly, we could not such, we see that different measures yield different results indicating establish a general pattern of preference for one measure of achievement in comparison to the Figure 6. A plot of the same 12 selected teachers. The x-axis demonstrates CST scores taken from wave 3 of the others. That is, for a given teacher we can have students performing better in CST as opposed to thatstudy. the The yentire-axis represents picture teacher cannot id. Different be colors seen are used by to looking show pairings at of one ethnicity dimension and gender. Intervals of Clusteringare obtained by adding/subtracting one standard deviation from the parameter estimate in the mixed-effect model. GPA and CAHSEE, whereas for another teacher students performances for CAHSEE is shown tomathematical achievement. Teachers’ effect on mathematical achieve- be superior. We noticeMathematical from Table achievement 1 that does the Kmeansnot have a method unique definition. yields a Assignificant such, we see number that of Figure 5 demonstrates that Asian students universally scored higher on the CST thanmisclassified mentdifferent cannot datameasures points be yield ignored. in different the control results case. indicating that the entire picture cannot be seen by Hispanic students. looking at one dimension of mathematical achievement. Teachers’ effect on mathematical achievementScenario cannot I be ignored. Scenario II Scenario III Clustering Centers far apart Centers closer together Centers closest

MethodWe notice3 D from6 D Table 101 that D the3 DKmeans 6 Dmethod 10 yields D a3 significantD 6 D 10 D K-meansClustering 0 24.85 14.76 .067 24.95 14.88 26.03 19.22 10.19 numberWe of notice misclassified from Table 1 that data the Kmeanspoints method in the yields control a significant case. number of Hclustmisclassified 0 data0 points in the0 control case..033 0 0 38.77 42.57 26.47 Mclust 0 0 0 0 0 0 27.63 19.8 10.08 Table 1 . MisclassificationScenario rates, I in percentages, Scenariofor each testedII scenario pertainingScenario to 3, III 6, and 10-dimensions. Centers far apart Centers closer together Centers closest It is worthMethod noting 3 Dthat sometimes6 D 10 theD Kmeans3 D method6 D would10 D misclassify3 D entire6 D distributions10 D by generatingK-means artificially 0 large24.85 or sm14.76all number.067 of clusters.24.95 Kmeans14.88 26.03performs 19.22 better in10.19 high Hclust 0 0 0 .033 0 0 38.77 42.57 26.47 dimensionsMclust as well0 as high0 er number0 of 0clusters 0that are 0close together.27.63 Additionally,19.8 10.08 under 11 Table 1. Misclassification rates, in percentages, for each tested scenario pertaining to 3, 6, and 10-dimensions. FigureFigure 5. 5. A A plotplot of of thethe same12 same12 selected selected teachers. teachers. The The x- x-axisaxis demonstrates demonstrates CST CST scores scores taken taken from from wave wave 3 of 3 the Table 1. Misclassification rates, in percentages, for each tested scenario pertaining to 3, 6, and 10-dimensions. study. The y-axis represents teacher id. Red shows the scores for Vietnamese students while blue show the scores foofr Hispanic the study. students. The y-axis The represents intervals are teacher obtained id. Redby adding/subtracting shows the scores onefor Vietnamese standard deviation students from while the blueparameter It is worth noting that sometimes the Kmeans method would misclassify entire distributions by show the scores for Hispanic students.estimate The intervals in the aremix obtained-effect model. by adding/subtracting one standard generating artificially large or small number of clusters.13 Kmeans performs better in high deviation from the parameter estimate in the mix-effect model. Itdimensions is worth as noting well as high thater number sometimes of clusters the that areKmeans close together. method Additionally, would under misclassify entire distributions by generating artificially large or small It is also worth noting that in most cases, the intervals did not overlap indicating a significant differenIt is alsoce in worthscores for noting a given that teacher. in most cases, the intervals did not overlap number of clusters. Kmeans performs13 better in high dimensions as indicatingFigure 6 a, takes significant gender into difference account and hintsin scores that there for maya given be more teacher. significant well as higher number of clusters that are close together.Additionally, differencesFigure between 6, takesthe CST gender scores for into Asian account males and and females hints compared that there to those may of be Hispanic males and females. under scenario III (see Figure 7) where underlying distributions are more significant differences between the CST scores for Asian males taken to be very close, we found Hclust consistently performing and females compared to those of Hispanic males and females. worse than the other two methods.

DIMENSIONS 92

12

scenario III (see Figure 7) where underlying distributions are taken to be very close, we found Hclust consistently performing worse than the other two methods.

scenario III (see Figure 7) where underlying distributions are taken to be very close, we found Hclust consistently performing worse than the other two methods.

Figure 7. Scenarios generated using Mclust for distributions of different means of 6-dimensional Gaussians. Scenario I is realized under the assumptions that the means of the distributions are 10 units apart.Figure Scenario 7. II is Scenariosthe case when the generated means are 4 units using apart and, Mclust finally, in for scenario distributions III, the means are takenof different to be 1 unit apart. means In the figure, of 6 each-dimensional color represents its Gaussians. own cluster. Clearly, as the means Scenariobecome closer I tois each realized other, the clustersunder are thealso closer assumptions together allowing that for more the misclassification. means of In the scenario distributions I, the distributions areare clustered 10 unit perfectlys apart. while in scenarioScenario III, the clustersII is theoverlap. case when the means are 4 units apart and, finally, in scenario III, the means are taken to be 1 unit apart. In the figure, each color represents its own cluster. Clearly, as the means becomeConclusion closer to each other, the clusters are also In general, Mclust performs quite well. However, when the In this work, we considered three approaches for the analysis of closer together allowing for more misclassification. In scenario I, the distributions are clustered perfectly while in Figure 7. Scenarios generated using Mclust for distributions of different means of 6-dimensional Gaussians. predeterminedScenario I is realized under number the assumptions of clusters that the means grows, of the distributions scenariothe computational are 10 unitIII,s apart. the Scenario clusters II is the overlap.TASEL-M2 data. Even though these methods, namely, missing data complexitycase when the means of Mclust are 4 units increasesapart and, finally, significantly. in scenario III, the means are taken to be 1 unit apart. In the imputation, modeling, and clustering are addressing different aspects figure, each color represents its own cluster. Clearly, as the means become closer to each other, the clusters are also closerA together solid allowing majority for more ofmisclassification. the students In scenario in I, our the distributions data set are wereclustered consistentlyperfectly while in of measuring the effect of explanatory variables on the response, In general, Mclust performsscenario III, the quite clusters overlap. well . However, when the predetermined number of clusters classified in the lower achieving group throughout the three years they share the common objective of identifying those students whose grows,In general, the Mclust computational performs quite well. H owevercomplexity, when the predetermined of Mclust number ofincreases clusters significantly. ofgrows, the the study. computational There complexityis a noticeable of Mclust increasesdecrease significantly. in performance between mathematical performance was significantly higher (lower) than students AA solid classifiedsolid majority majority of theas studentsbeing ofin our thethe data highest setstudents were consistentlycluster, in comparedclassifiedour data in the to setlower those were others. consistently In this process, classified we encountered in the lower a number of technical and achieving group throughout the three years of the study. There is a noticeable decrease in achievingstudentsperformance who betweengroup were students throughout not. classified We can as being observe the in thethree highestfrom cluster,yearsTable compared 2 ofthat the groupsto those study . Therecomputational is a noticeable challenges, decrease mainly stemmed in from the fact that the students who were not. We can observe from Table 2 that groups with smaller counts pertain to performancewithhigher smaller precision. counts between pertain students to higher classified precision. as being in the highestunderlying cluster, data setcompared had an extremely to those complex structure. This was the main motivation for our simulation studies. In the future, we plan to students who were not.GPA We can observeMathss from averaged Table 2 that groups with smaller counts pertain to higherTrajectory precision Count Freq.. Mean Median Standard Mean Median Standard tackle the same set of problems from a more theoretical perspective. deviation deviation 1 1 1 20 .019 3.774 3.917 .316 444.9 440.2 25.567 1 1 0 18 .017 3.488 3.5 .308 410.5 406.2 20.710 Acknowledgments 1 0 1 4 .004 2.688 2.688 .452GPA 434.7 438.7 24.160 We would Mathsslike to sincerely averaged thank: the Mathematics Department 0 1 1 7 .007 3.446 3.5 .512 422.6 426.3 9.409 Trajectory1 0 0 32 Count.030 2.463Freq. 2.464 .546Mean 375.2 Median376.2 22.660 StandardSummer ResearchMean ProgramMedian for providing Standard us with the opportunity 0 1 0 41 .038 3.328 3.375 .327 374.3 373.3 18.755 0 0 1 1 .001 2.75 2.75 NA 387.3 387.3 NA deviationand funding for this research project; deviationDr. Sam Behseta for his 0 0 0 947 .885 1.806 1.75 .738 306.1 302.7 32.890 professional guidance, sparking our interest in research; Dr. David 1 Table1 2. Time1 Dependent20 Clustering; showing.019 the count, frequency, 3.774and the mean, median, 3.917standard deviation .316 444.9 440.2 25.567 1 Tableregarding1 2. Time Math0 Dependent GPA 18and CST Clustering; scores (mathss9, showing.017 mathss10, the count, mathss11) frequency,3.488 averaged and for the grades mean, 9-3.5 11median, for each standard trajectory. .308Pagni for providing410.5 insight406.2 regarding the20.710 data set; Dr. AnneMarie deviation regarding Math GPA and CST scores (mathss9, mathss10, mathss11) averaged for grades 9-11 Conley and Nayssan Safavian for their enthusiastic direction on the forConsequently, each trajectory. data summaries in those groups can be used for a more precise estimation of the 1 parameters0 1 of the 4underlying population.004 from which this 2.688data is extracted. 2.688 .452project. We434.7 would also like438.7 to thank Selene24.160 Black, Antouneo Kassab, 0 1 Consequently,1 7 data summaries.007 in those3.446 groups can3.5 be used for a.512 Jimmy Kwon,422.6 and Calvin426.3 Pham for their9.409 contributions to the modeling 1 more0 precise0 estimation32 of the.030 parameters 14 2.463 of the underlying2.464 .546portion of 375.2this project as376.2 well as Reina22.660 Galvez for her contributions to 0 population1 0 from41 which this data.038 is extracted.3.328 3.375 .327the clustering374.3 part of the373.3 project. 18.755 0 0 1 1 .001 2.75 2.75 NA 387.3 387.3 NA 0 0 0 947 .885 1.806 1.75 .738 306.1 302.7 32.890 DIMENSIONS 93 Table 2. Time Dependent Clustering; showing the count, frequency, and the mean, median, standard deviation regarding Math GPA and CST scores (mathss9, mathss10, mathss11) averaged for grades 9-11 for each trajectory.

Consequently, data summaries in those groups can be used for a more precise estimation of the parameters of the underlying population from which this data is extracted.

14 References Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Rubin DB. (1976). Inference and missing data (with discussion). Springer, New York, NY. Biometrika Vol 63, 581-592.

Fraley, C., Murphy, T.B., Raftery, A.E., Scrucca, L. (2012). Mclust Ver- Rubin DB (1978). Multiple imputations in sample surveys. sion 4 for R: Normal Mixture Modeling for Model-Based Clustering, In Proceedings of Survey Research Section 20-34, American Statistical Classification, and Density Estimation. Seattle, WA. Association, Alexandria, VA.

Laird NM and Ware JH. (1982). Random-Effects Models for Rubin DB (1987). Multiple imputation for nonresponse in surveys. Longitudinal Data. Biometrics. Vol 38, 963—974. Wiley, New York.

Little RJA (2011) Calibrated Bayes, for statistics in general, and Sanders W, Saxton A, and Horn B. (1997). The Tenessee value-added missing data in particular (with discussion). Statistical Science. system: a quantitative outcones-based approach to educational Vol 26, No 2, 162-174. assessment. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluational measure? (137-162). Crownin Little RJA, Rubin DB (1987). Statistical Analysis with missing data. Press, Thousands Oaks, CA. Wiley, Hoboken, N.J., 1st edition. Spiegelhalter D, Thomas A, and Best N. (1997). WinBUGS: Bayesian Little RJA, Rubin DB (2002). Statistical Analysis with missing data. inference using Gibbs Sampling (Technical Report). Biostatistics Unit. Wiley, Hoboken, N.J., 2nd edition. Cambridge, UK:

Lockwood JR, McCaffrey DF, Mariano LT, Setodji C. (2007). Bayesian Su Y, Gelman A, Hill J, Yajima M. (2011) Multiple imputation with methods for scalable multivariate value-added assessment. Journal of diagnostics (mi) in R: opening windows into the black box. Journal of Educational And Behavioral Statistics. Vol 32. 125-150. statistical software. 45(2), 1-31.

Pinheiro J C and Bates DM.(2000). Mixed-Effects Models in S and Tanner MA, Wong WH. (1987) The calculation of posterior distributions S-PLUS. New York: Springer. by data augmentation (with discussion). Journal of American Statistical Association. Vol 82, 528-550.

DIMENSIONS 94 Scattered Light Measurements for Advanced LIGO’s Output-Mode-Cleaner Mirrors

Department of Physical Sciences, California State University, Fullerton

Adrian Avila Alvarez

Abstract Abstract: The Advanced Laser Interferometer Gravitational-Wave Observatory (LIGO), with sites in Livingston, LA and Hanford, WA, and its international partners, Virgo, GEO600, and KAGRA, are being built to detect gravitational waves, a phenomenon theorized by Einstein in his Theory of General Relativity (GR). By direct prediction of GR, gravitational waves are ripples in space-time that propagate at the speed of light and are created by violent astro- physical processes. The gravitational-wave detectors are all based on the Michelson interfer- ometer, which has an input laser, a beam splitter, and two perpendicular arms with mirrors at each end. However, their configurations have significantly more complexity to augment the sensitivity. Higher order spatial modes can create ‘junk light’ that decreases the shot-noise limited sensitivity of the detectors. To combat this, each LIGO detector has an Output Mode Cleaner (OMC) at its detection ‘dark port.’ Scattered light from the OMC mirrors can reduce the shot-noise limited sensitivity of the instruments, and add noise via stray and count- er-propagating light. Thus, it is important that the light scattering from the OMC mirrors in Advanced LIGO be minimal. This paper will describe measurements of the scattered light from sample Advanced LIGO OMC mirrors.

DIMENSIONS 95 Authors Adrian Avila Alvarez

Adrian is currently a junior physics major at CSUF and a member of both the Society of Advancing Chicanos and Native Americans in Science (SACNAS) and the Gravitational-Wave Physics and Astronomy Center (GWPAC). As part of GWPAC, Adrian conducts research under the supervision of Dr. Joshua Smith. As part of his research, Adrian measures how much light is being scattered on an optic’s surface in attempt to contribute towards the detection of gravitational waves, which are a thousand times smaller than the diameter of a proton. Adrian has had the opportunity to present his research findings at numerous meetings, symposiums, and conferences. In October of 2013, he presented his research at the SACNAS National Conference in San Antonio, TX. Adrian plans to continue his gravitational-wave research while pursuing a PhD program. With a passion for working with equations, gravitational-wave research, and physics, expect to read much more about Mr. Adrian Avila-Alvarez.

Neha Ansari

Neha Ansari is a third year biochemistry major minoring in print journalism. Ansari currently serves as the Chemistry and Biochemistry Section editor for Dimensions. At Cal State Fullerton, Ansari conducts research in Dr. Peter de Lijser’s organic chemistry lab where she studies the use of a carbonyl group as a nucleophile in photosensitized electron transfer reactions of oxime ethers. In the summer of 2013, Ansari was selected as an Amgen Scholar at the University of Washington, where she investigated the development of micelles in self-immolative polymers for the purpose of drug delivery in Dr. AJ Boydston’s organic chemistry lab. Ansari is a member of the ASI Board of Directors representing the College of Natural Sciences and Mathematics, the University Honors Program, and is a President’s Scholar.

Aaron Justin Case

“Aaron Justin Case is a geology enthusiast who is soon to graduate this summer with a Bachelor of Science degree Geology. He enjoys and participates in meetings and conferences to expand his knowledge and love for geology. His participation in theses conferences is indicative of his passion to learn. His research includes studies of geochemical correlation of basalts in northern Deep Springs Valley, California. Methods to correlate the basalts include X-Ray Fluorescence Spectroscopy (XRF). His research has been published in the GSA Abstracts with Programs Vol. 46, No. 5 and presented at the at the 2014 GSA Joint Rocky Mountain/Cordilleran Section Meeting. It is titled Geochemical Correlation of Basalts in Northern Deep Springs Valley, California, by X-Ray Fluorescence Spectroscopy (XRF). His research has made him proficient with the use of X-ray Fluorescence Spectroscopy, lab work, and geochemical understanding of basalt as well as associating rocks. He actively leads and arranges geology club hikes, areas includes corundum explorations in the San Gabriel Mountains and Salton Sea trips. Aaron is quite the personality and strives to succeed in his field.”

Jennifer Castillo

Jennifer Castillo is hoping to graduate this spring with a Bachelor’s degree in Chemistry with a minor in Criminal Justice. She joined Dr. Rogers’ lab to research a method to synthesize anhydrous cyclopropanol. This method involved Schlenk techniques, vacuum filtration, extraction, rotary evaporation, 1H-NMR, and infrared spectroscopy. She hopes to have a career in forensic science. The experience she gained using the different instrumentation at CSUF and in Dr. Rogers’ lab has been an extremely rewarding and beneficial to her career hopes.

DIMENSIONS 96 Katherina Chua

Katherina Chua is a senior undergraduate majoring in Biochemistry. She started her undergraduate research experience in fall of 2012 and has participated in multiple research programs on campus and off campus, including Research Careers Preparatory Program in the fall of 2011, Minority Health and Health Disparities International Research Training Program in Buenos Aires, Argentina during the summer of 2012, and UCSF Summer Research Training Program during the summer of 2013. Katherina is currently conducting research under the guidance of Dr. Michael Bridges as a Minority Access to Research Careers (MARC) scholar studying the oligomeric structure of the intrinsically disordered protein stathmin and measuring interspin distances using electron paramagnetic resonance spectroscopy. Through this research, she is able to verify that stathmin exists as an oligomer in solution via site-directed spin labeling technique. In her multitude of undergraduate research experience, Kather- ina has decided to continue her pursuit of a research career and has applied to PhD graduate schools that focus on combining pharmaceutical sciences and biophysics. After graduating this May, she will be entering these PhD programs next fall.

Ashley Chui Ashley Chui is a fourth-year undergraduate majoring in Biochemistry. She began her undergraduate experience in the spring of 2012 through the Research Careers Preparatory Program under the guidance of Dr. Michael Bridges studying the character- ization of aggregates of the intrinsically disordered protein, stathmin. She was able to exhibit the reversibility of the aggrega- tion of stathmin in its native environment as well as show that stathmin demonstrates a concentration-dependent behavior. Currently as a HHMI Two-Year Scholar, her project has evolved into the verification of the oligomerization of stathmin through a number of techniques, one being site-directed spin labeling electron paramagnetic resonance. Through Ashley’s experience in the Bridges lab, she has developed a strong interest in biophysics and would like to continue conducting research in biophysical chemistry for graduate school, which she hopes to attend in the fall of 2015.

Kim Conway

I am a graduating senior double-majoring in Biology and Health Science. I am currently working in Dr. Paul Stapp’s Vertebrate Ecology and Conservation Biology laboratory studying the prevalence of bot fly infestation in thirteen-lined ground squirrels in northern Colorado, while also attempting to identify the species of bot fly through molecular genetics techniques. I am a BURST (Biology Undergraduate Research Scholar Training) scholar at CSUF, and have been working as a veterinary assistant for 4 ½ years. I will be attending the UC Davis School of Veterinary Medicine next fall to obtain a DVM degree with a focus on wildlife medicine.

Jeremy Cordova

Jeremy Cordova is a graduating Geology student interested in structural geology and how it relates to geologic hazards and mineral deposits. For the last two years he has been working extensively in the field at Los Penasquitos Lagoon, in San Diego County, analyzing a potential paleotsunami deposit. As a member of multiple geological organizations Jeremy plans to become highly involved in his field. Graduating in the Fall of 2014 he will remain in the West working in geologic hazard assessment or the mineral industry, while pursuing a Master’s Degree in Structural Geology.

DIMENSIONS 97 D’Lisa O’hara Creager

D’lisa O’hara Creager is an undergraduate student that will be finishing up this summer with a Bachelor of Science degree in Geology. She has been working with Dr. Brady Rhodes on her senior thesis project studying paleotsunami deposits in Seal Beach, CA. Dr. Rhodes also advised her in completing a hydrogeochemical study done in Chiang Mai, Thailand during her participation in the Environmental Science Research in Thailand (ESRT) program in the summer of 2013. O’hara has contributed to the California State University, Fullerton Geology club for the past year and served as its historian. She enjoys attending conferences and has given a number of poster presentations on her research. In her spare time she enjoys playing volleyball, which she spent three years coaching at Sunny Hills High School in Fullerton, CA. O’hara plans on earning her Master of Science degree in Geology and is currently deciding on a graduate program.

Susan Deeb

Susan Deeb is a fourth year undergraduate student hoping to graduate Fall 2014 with a Bachelor of Arts degree in Mathematics as well as a minor in Physics. For the past year, she has been working on statistical analyses of TASEL-M2 data. TASEL-M2 is an NSF funded project, led by Dr. David Pagni, conducted on Orange County schools over several years. The data collected includes demographic information and mathematics scores of over 3000 K6-12 students. Susan has been working with a group of undergraduate students, under the supervision of Dr. Sam Behseta, to analyze this portion of the data. Together, they have presented their results to a group of researchers at University of California, Irvine. They have also had the opportunity to present in a small conference to mathematicians at Chapman University, as well as at SCCUR 2013 and most recently, the CSUF Research Competition, in which Susan was the lead presenter. They have recently been recognized for their research activities by the College of Natural Sciences and Mathematics at CSUF.

Mirna Dominguez

Mirna Dominguez is currently in her fourth year and graduating spring 2015 with a Bachelors in Mathematics. The most fascinating part about research for her was the one-to-one learning interaction. Under the supervision of Dr. Bourget statistical applications such data arrays were analyze through simulations; Gene-gene interactions are analyzed through comparison of methods. She is very excited to have her first publication and desires to do further research work.

Tiffany Duong

My name is Tiffany Duong and I am an undergraduate student getting my bachelors of science degree. My research is on water conservation in hopes that one day my research will be of use in the fight against California’s current water crisis. In the future after I graduate from California State University, Fullerton, I plan on pursuing my career in health sciences by getting my medi- cal degree.

DIMENSIONS 98 Dylan Garcia

Dylan Garcia is an undergraduate student working towards a B.S. in Geology. He has been working with Dr. Matthew Kirby in the Paleoclimatology and Paleotsunami Laboratory as his lab assistant and thesis student. He has been learning geochemical and sedimentological lab techniques, as well as training other students. Aside from his paleotsunami research, he has also participated in the Environmental Science Research in Thailand program as a Louis Stokes Alliance for Minority Participation student. Here, he worked with Dr. Brady Rhodes in Pai, Northern Thailand on a landslide hazards research project. Dylan is also a volunteer mentor for America on Track, a non-profit organization dedicated to helping children of prisoners. He has jump- started an organization on campus where he recruits mentors for AOT and gives STEM lectures to the mentees and mentors – he is interested in making science accessible to others through science communication. He frequently attends conferences, gives research presentations, and is currently deciding on a graduate program. “Whenever we are curious about something, we inquire about it, and we look for an answer. What we don’t realize is that we are conducting research – we are being scientists – we are exploring the new and undiscovered. Question Everything.” [email protected]

Andrew Halsaver

Andrew is a first year graduate student in the Applied Math program at CSUF, and he graduated in the Fall of 2013 at CSUF with a Cum Laude degree in Applied Math. He began his research with Dr. Agnew in the summer of 2013 where he worked on using Mathematica with Clifford Projective Spaces. Andrew’s work can be used to help Dr. Agnew’s research advance regarding Clifford Projective Spaces. Also, Andrew’s work may be useful to other areas of study such as theoretical physics. He hopes to continue his work with Dr. Agnew while studying for his Masters degree at CSUF.

Taylor Kennedy

Taylor Kennedy is a California native from Huntington Beach. His love for geology was discovered when working toward his Associate of Arts degree at Orange Coast College. Taylor is finishing his thesis and will graduate from California State University Fullerton with a Geol Sci degree this fall. Taylor also works for his teacher Professor Laton and pursues an active life performing martial arts and suffers through mudruns.

Estefania Larrosa

Estefania Larrosa is a sophomore at Santiago Canyon College. She started working at the Bridges Lab last summer as a part of the STEM^2 CSUF Summer Research Experience Internship. After that summer, Dr. Bridges was kind enough to let her stay for an additional semester and work on this kinetics project. Larrosa’s major is Biochemistry, and after she finishes her bachelor’s degree she plans to apply to medical school.

DIMENSIONS 99 Calvin Lung

Calvin Lung is an alumnus of CSU, Fullerton and has his BS in biology, with a concentration in molecular biology and biotechnology. In May 2013, Calvin received the CSUF MSTI Scholarship for Future Teachers, a scholarship given to those pursing a career in education. Working under Dr. Joel K. Abraham, Calvin’s research explored the effects of diet optimization has on worm cast quality and quantity. Eisenia fetida, a type of worm, can break down organic matter, such as food wastes, and turn them into a nutrient-rich soil amendment that plants can easily uptake. Calvin found that food sorting had no measurable impact on cast production or nutrient content. Calvin is currently taking a year off of school by working before pursing a PhD in biology.

Jesus Manuel Mejia

Jesus Manuel Mejia Ochoa is in his senior year as an undergraduate student majoring in Biochemistry. He started his research experience in fall 2013 by joining Dr. Michael Bridges’ laboratory. Jesus is currently conducting research under the guidance of Dr. Bridges which focuses on the intrinsically disordered protein stathmin, its oligomerization, and measuring interspin distanc- es using electron paramagnetic resonance spectroscopy. Through this research he was able to measure intra and inter-molecu- lar distances in stathmin and further corroborate the existence of stathmin as an oligomer in solution through the use of double mutants and site directed spin labeling. After graduating, Jesus plans on working on a research team or obtaining a teaching position, before continuing his education in a graduate program.

Miriam Morua

My name is Miriam Morua and as an undergraduate at CSUF, I joined the urban agriculture community based research experi- ence (U-ACRE) program with Dr. Abraham where I was able to develop my own independent research project with Dr. Jochen Schenk. My research focused in water management strategies where I compared the efficiency of different irrigation systems such as surface and subsurface irrigation for growing pepper plants in the Fullerton Arboretum. I was able to communicate my research in conferences such as the 2012 HACU student track conference in Washington D.C and the 2013 Latinos in Agricul- ture in El Paso, TX. Along with travel grants, I was also awarded a scholarship from the Rose Society of Saddleback Mountain Research Assistant Award in Plant Sciences in 2013. This year, I was accepted to the Watershed Management Internship (WMI) program that collaborates with the USDA and California State Universities to promote and expose students to career opportu- nities with government agencies while performing important research in conservation. Being accepted to the WMI program will help me reach my goal to obtain a master’s degree in Biology because it will directly fund my research.

Garrett Mottle

Garrett Mottle is a fifth year senior graduating this summer pursuing a Bachelor of Science degree in Geological Science. He has spent the past couple years doing undergraduate research in the eastern Sierra Nevada mountains studying normal fault orientations near Lone Pine, CA. Garrett has attended field trips such as those with the Pacific Section SEPM (Society for Sedimentary Geology). He attends professional meetings such as those with the South Coast Geological Society (SCGS). As of late, he will be attending Geology Field Camp in Dillon, MT, and then look to pursue a career as a professional geologist and/or continue his studies in geology and obtain a Master of Science Degree in Geological Science.

DIMENSIONS 100 Cristy Rice

Cristy Rice is hoping to graduate in spring 2015 with a B.S. in Biology and an emphasis in Marine Biology. She transferred here and also became a Southern California Ecosystems Research Program (SCERP) scholar in the Summer of 2012. Under the direction of Dr. Kristy Forsgren, she has worked on California pipefishes to establish a faster method of identification and deter- mined species via genetic analysis. She has presented her work at numerous symposiums and conferences and won a research grant and best poster award from the Southern California Academy of Science annual meeting. She hopes to continue on into a graduate program after she graduates and is very interested in ecology and systematics among other topics.

Nathan Robertson

Nathan Robertson is a fourth-year undergraduate student graduating this spring with a Bachelor of Arts in Mathematics and a concentration in Statistics. Nathan has spent the past two years researching mathematics achievement with Dr. Behseta. Nathan hopes to continue his studies by pursuing graduate school in Statistics.

Kelly Shaw

Kelly Shaw is an undergraduate student set to graduate in December 2014 with a Bachelor of Science Degree in Geology. Kelly has done geologic research studying mass wasting events in Thailand through the Environmental Science Research in Thailand (ESRT) program at CSUF. She has been dedicated to the Geology Department serving first as the club treasurer and currently as the club president. She is also a member of Phi Beta Delta Honor Society on campus at CSUF. Kelly regularly attends profes- sional meetings and lectures. In the past Kelly has served her community by being actively involved with the Boy Scouts of America and the Parent Teacher Association (PTA) in the Brea-Olinda School District where she held various office positions including PTA President at Brea Jr High. Kelly plans to apply to the CSUF graduate program in the spring of 2015 to work on a Master of Science Degree in Geology.

Jennifer Spencer

Working under the mentorship of Dr. Melanie Sacco, my research involves studying disease responses of the 14-3-3 gene family in Nicotiana benthamiana and Solanum lycopersicum. Through this research, we hope to substantiate a role for 14-3-3 genes in effector-triggered immunity in plants. I have conducted this research while participating as a Minority Access to Research Careers (MARC) scholar and Ronald E. McNair scholar. After my undergraduate degree, I will obtain a Ph.D. involving infectious disease biology, virology and immunology. I aspire to work for the Center for Disease Control and Prevention or the World Health Organization to research and help mitigate detrimental diseases.

DIMENSIONS 101 Elizabeth White

Elizabeth White is a senior pursuing a Bachelor of Science degree in Geology. She is set to graduate this summer after attending field camp and investigating the regional geology of Montana at the University of Montana-Western. She also will be traveling to Thailand for the Earth Science in Thailand (ESIT) class offered as a joint program between CSUF and Chiang Mai University. This fall, Elizabeth will begin her master’s degree with a focus in geochemistry. When she is not immersed in her coursework, Elizabeth can be found in the University Learning Center (ULC) where she works as a geology tutor.

DIMENSIONS 102 Editors Neha Ansari

Neha Ansari is a third year biochemistry major minoring in print journalism. Ansari currently serves as the Chemistry and Biochemistry Section editor for Dimensions. At Cal State Fullerton, Ansari conducts research in Dr. Peter de Lijser’s organic chemistry lab where she studies the use of a carbonyl group as a nucleophile in photosensitized electron transfer reactions of oxime ethers. In the summer of 2013, Ansari was selected as an Amgen Scholar at the University of Washington, where she investigated the development of micelles in self-immolative polymers for the purpose of drug delivery in Dr. AJ Boydston’s organic chemistry lab. Ansari is a member of the ASI Board of Directors representing the College of Natural Sciences and Mathematics, the University Honors Program, and is a President’s Scholar.

Chris Baker

Chris Baker (Editor-in-Chief) is a graduating Geology student. He is currently working with Dr. Phil Armstrong to determine the rates of exhumation of the southern Alaskan Mountains. Chris plans to further his education and obtain a graduate degree with emphasis in hydrogeology. Currently Chris works as a geologist for The Source Group, is involved with multiple professional societies, and serves as an officer in the South Coast Geological Society.

Kali Prasun Chowdhury

Kali Prasun Chowdhury (Duke) graduated with high honors from University of California, Riverside in two and a half years with a degree in Business Economics. Since then, he has worked in consulting for five years before embarking on his graduate career. His endeavors have led him to complete a Master of Arts in Economics, from University of California, Santa Barbara and he is in his last semester of finishing up his Master of Science in Statistics, here at California State University, Fullerton. With a curiosity and a passion for research on some of the most fundamental questions of the Standard Social Science Model, Duke’s ultimate goal is to attain his doctorate in Statistics. He wishes to apply it to the current models of human decision making, to improve predictive accuracy and inferential ability, over and above what is currently observed in such settings.

Kim Conway

I am a graduating senior double-majoring in Biology and Health Science. I am currently working in Dr. Paul Stapp’s Vertebrate Ecology and Conservation Biology laboratory studying the prevalence of bot fly infestation in thirteen-lined ground squirrels in northern Colorado, while also attempting to identify the species of bot fly through molecular genetics techniques. I am a BURST (Biology Undergraduate Research Scholar Training) scholar at CSUF, and have been working as a veterinary assistant for 4 ½ years. I will be attending the UC Davis School of Veterinary Medicine next fall to obtain a DVM degree with a focus on wildlife medicine.

DIMENSIONS 103 D’Lisa O’hara Creager

D’lisa O’hara Creager is an undergraduate student that will be finishing up this summer with a Bachelor of Science degree in Geology. She has been working with Dr. Brady Rhodes on her senior thesis project studying paleotsunami deposits in Seal Beach, CA. Dr. Rhodes also advised her in completing a hydrogeochemical study done in Chiang Mai, Thailand during her participation in the Environmental Science Research in Thailand (ESRT) program in the summer of 2013. O’hara has contributed to the California State University, Fullerton Geology club for the past year and served as its historian. She enjoys attending conferences and has given a number of poster presentations on her research. In her spare time she enjoys playing volleyball, which she spent three years coaching at Sunny Hills High School in Fullerton, CA. O’hara plans on earning her Master of Science degree in Geology and is currently deciding on a graduate program.

Niv Ginat

Niv Ginat is a graduating graphic design student from Los Angeles, California. Motivated by his love for design and hunger for knowledge, Niv draws his experience from a wide range of projects that he has worked on since embarking on his creative journey. In 2013, he was awarded the Jerry Samuelson Scholarship for his commitment and artistic potential as a graphic designer. He hopes to one day be able to carry his work with him as he travels around the world.

Carose Le

Carose Le is a graphic designer from Seattle, Washington. She is expected to graduate Fall 2014 with a Bachelor of Fine Arts degree in Graphic Design. She serves as a graphic designer for the Spring 2014 issue of CSUF’s literary magazine of the College of Communications, TUSK. Carose has previously worked as a designer for City of Fullerton Parks & Recreation and CSUF Mihaylo College of Business & Economics. On behalf of College of the Arts and the Jerry Samuelson Scholarship Foundation, Carose was selected as a Jerry Samuelson Scholar. The scholarship recipient is selected by faculty and awarded to a visual arts student who has demonstrated talent and artistic potential. She enjoys using bright and vibrant colors when compiling her use of typography, photography and graphics.

Phillipe Diego Rodriguez

Phillipe is currently a second year physics major at CSUF and part both the Honors Program and the Dan Black Physics Business Program. With an interest in both physics and business, Phillipe plans on a career in technology based new venture start ups. Phillipe is a member of many physics, business, and honors organizations on campus. In addition, he volunteers with a youth soccer organization as a coach, referee, and mentor. Phillipe is currently working with Dr. Jocelyn Read on crust interactions of neutron star binaries during inspirals and mergers as part of his Senior Honors Project. Phillipe plans to attain his Masters in Business Administration while giving back to the organizations that helped him reach his goals.

DIMENSIONS 104 DIMENSIONS 105