Unleash your inner (data) scientist: The ability and audacity to scale your science with extensible cyberinfrastructure
Nirav Merchant
The University of Arizona & iPlant Collaborative
[email protected]

Topic Coverage
• The “Big Data” and “Data Scientist” wave
• What is cyberinfrastructure (CI)
• Delivering a pragmatic CI ecosystem
• What the community has built with our CI
• Lifecycle of research and innovation
• Continuing education and learning with CI
• Future thoughts and challenges

Science Paradigms
1. A thousand years ago: science was empirical, describing natural phenomena through observation
2. Last few hundred years: a theoretical branch, using models and generalizations
3. Last few decades: a computational branch, simulating complex phenomena
4. Today: data exploration (eScience), unifying theory, experiment, and simulation
Based on the transcript of a talk given by the late Jim Gray to the National Research Council, Computer Science and Telecommunications Board, in Mountain View, CA, on January 11, 2007

The Fourth Paradigm: Data-Intensive Scientific Discovery
• Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.
• The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud computing technologies.
http://research.microsoft.com/en-us/collaboration/fourthparadigm/

The Discovery Lifecycle
[figure]
Source: The Fourth Paradigm: Data-Intensive Scientific Discovery

Evolution of X-Info
• The evolution of X-Info and Comp-X for each discipline X (e.g. Bio-Informatics, Computational-Biology)
• How to codify and represent our knowledge
• The generic problems:
  • Data ingest
  • Managing a petabyte
  • Common schema
  • How to organize it
  • How to reorganize it
  • How to share it with others
  • Query and Vis tools
  • Building and executing models
  • Integrating data and literature
  • Documenting experiments
  • Curation and long-term preservation
Source: The Fourth Paradigm: Data-Intensive Scientific Discovery

Paradigm Shift
• Classic paradigm: You produce data, analyze, interpret (end to end)
• Conventional paradigm: Consortia/centers produce data and you consume it
• New paradigm: Consortia/centers have produced data and are creating “cyberinfrastructure” to tackle the “grand challenge”

Big Data
• Extracting meaningful results from vast amounts of data (linked data)
• Big data “information assets” demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
• “Big Data” is only the beginning of extreme information management
• Big Data technology: all is not new
Attributed to Gartner Consulting

A few words about “Big Data” and “Data Science”
The 2014 Gartner Technology Hype Cycle
http://www.gartner.com/newsroom/id/2819918

Simple Formula for Success
[figure: pictorial "+ =" equation]

The Reality
• Excel, R, Perl, Python, ArcGIS, Java, Ruby, Fortran, C, C#, C++, Matlab, etc.
+
• Amazon, Azure, Rackspace, campus HPC, XSEDE, etc.
+
and lots of glue...

Simple Formula
[figure: pictorial "+ =" equation]
http://cloudtweaks.com/2011/05/the-lighter-side-of-the-cloud-data-transfer/

Rise of the “data janitors”

The relevance
• Bioinformatics has become too central to biology to be left to specialist bioinformaticians.
• Biologists are all bioinformaticians now
Lincoln Stein, Dec. 2008, http://genomebiology.com/2008/9/12/114

iPlant Collaborative: Vision
Enable life science researchers and educators to use and extend cyberinfrastructure
www.iPlantCollaborative.org

The iPlant Collaborative
We are a cyberinfrastructure
Platforms, tools, datasets
Storage and compute
Training and support
From data to discovery

The iPlant Collaborative
And a virtual organization
• Developer Expertise
• Computational Capacity
• Science Domain Expertise
• Training
• Administrative and Organization

iPlant Collaborative: CI for Scalable Science
• Facilitating the 4A’s of “Computational Thinking” approaches for Life Sciences: Abstraction, Automation, Ability and Audacity
• Allowing researchers and educators to establish and manage data-driven collaborations: supporting distributed teams and virtual organizations (VO) at global scale
• Making efficient and coordinated use of CI resources from national, regional, institutional and commercial providers: NSF XSEDE, iPlant, campus HPC and high-bandwidth connections to commercial cloud providers
• Adopting best practices from science domains where key CI challenges have been solved: Astronomy, Particle Physics, etc.
• Community driven, self-provisioning, extensible and open source: development and prioritization driven through community engagement, active engagement with CISE communities

iPlant Collaborative: Platform Philosophy
• Strive to provide the CI Lego blocks
  • Danish 'leg godt' - 'play well'
  • Also translates as 'I put together' in Latin
• If desired functionality is not available, the community can craft their own by using and extending iPlant CI components (like Lego blocks)
• Through these extensible and customized platforms, create an ecosystem of interoperable tools that benefit the broad community (and not just a few lab groups)
• Provide the tools to allow the community to manage their digital assets (cloud, HPC, etc.)
• Improve computational productivity

Who did we build it for?

iPlant: Platform for Big Data Collaborations

iPlant Collaborative: Products
Ready-to-use Platforms
Extensible Services
Established CI Components
Ease of use
Foundational Capabilities

iPlant: Cohesive Platform for Big Data lifecycle

Researchers like to share!
• User Statistics
  • ~27000 user accounts
  • 4900 users with data
  • 2600 users (53% of users with data) made at least 1 share
  • 2100 shares per user
  • 42 million files (58% shared)
  • 59 million (1.1 million/month) shares
• Community Data Statistics
  • 5 million files
  • 55 million (1.0 million/month) shares
• ~1.1PB of user-managed data
• Our users consume 5M+ SU annually and more (we graduate them to compete for their own allocations from XSEDE)

How is it being used?
• Users build their own systems (powered by iPlant components) but managed by them
• Consume specific components (a la carte, data store, Atmosphere)
• Directly use applications (DE)
• Custom design appliances (Atmosphere)
• Publish their findings (PNAS, Nature)
• Advocate use
• Create learning material and courses

iPlant CI: What is the community building?
• Many thousands of "-omes" projects manage their data and analysis
• Execute large-scale workflows (25-50TB data, million+ CPU hours)
• Data infrastructure to coordinate digitization efforts for multiple sites
• Sharing, visualizing (3D) and analyzing high-resolution microscopy images (40K x 40K) via web browser
• Learning material, new course work, custom applications

And it goes way beyond plants and life science

iPlant Collaborative: Training data scientists
• Partnership with Software Carpentry and Data Carpentry to provide best practices necessary to make efficient use of CI
• Allowing individual researchers and educators to utilize data and computational infrastructure at scale (and encounter real challenges)
• Community-contributed material (built on iPlant CI)

Applied Cyberinfrastructure Concepts (ACIC)
• Semester-long, project-based learning course: introduces fundamental concepts, tools and resources for effectively managing common tasks associated with analyzing large datasets.
• Graduate + undergraduate course working on REAL research workflows where scalability is a bottleneck
• Provide familiarity with cyberinfrastructure (CI) resources available at the University of Arizona campus, iPlant Collaborative, NSF XSEDE centers, and cloud (FutureGrid and commercial providers such as Amazon).
• Learning to apply relevant CI skills (for the final project) and developing wiki-based documentation of these best practices.
• Learning how to effectively collaborate in interdisciplinary team settings.
• Deliver a functional solution to the stakeholder

From research question to reality

Why is it valuable?
• Users are able to overcome data and computational bottlenecks
• Share data of ANY size with ANYONE
• Connect data and compute on a single platform
• Manage their data and computations regardless of scale
• Build their own apps and solutions (create their own community: iAnimal, iVirome)
• Create custom appliances

iPlant: What worked
• All major CI components have seen steady adoption (with few exceptions)
• “Think tank to do tank” transition was rapid
• Evolved to a technology proving ground
• Take research products (NSF funded) to production use for our community
• Running infrastructure is not fun; building is. Allow people to focus on science (while streamlining CI)

iPlant: What worked
• Evolution of training (Software Carpentry)
• Sharing/collaboration
• Give people an exit strategy (options) and they are happy to adopt the solution
• Provide feedback to CI component creators to improve (usability)
• Expectation management: do not expect the same experience (cable cord-cutting vs. Netflix/Hulu)

What did not work
• Managing distributed teams is harder in a VO (load balancing, enthusiasm, etc.)
• Technology lifecycle is not synchronized across all products
• Relying on multiple providers for a solution is challenging (downtimes)
• Changing/evolving needs of the community are hard to predict
• Growth of users outpaces our cloud capabilities (see tweets)

Even the tech geeks notice

Connect with iPlant!
Get an account: http://user.iplantcollaborative.org
Email us: [email protected]
Questions: http://ask.iplantcollaborative.org
Twitter: @iPlantCollab #iPlant
Facebook: facebook.com/iPlantCollab
LinkedIn: iplant.co/iPlantCollabLinkedIn
Google+: iplant.com/iPlantGooglePlus

Luck favors the brave
Analysis favors the organized.