Big Data and Computing Building a Vision for ARS Information Management

Big Data and Computing Building a Vision for ARS Information Management

Big Data and Computing Building a Vision for ARS Information Management Workshop Summary Feb. 5-6, 2013 USDA Agricultural Research Service TABLE OF CONTENTS TABLE OF CONTENTS ............................................................................................................................. 2 INTRODUCTION ....................................................................................................................................... 3 BIG DATA AND COMPUTING ................................................................................................................ 5 DEFINING RESEARCH NEEDS ............................................................................................................................................................... 5 STEPS TO ACHIEVE THE VISION: SPECIFIC RECOMMENDATIONS .................................................................................................... 6 CONCLUSIONS .......................................................................................................................................... 8 TABLE 1. SUMMARY OF RECOMMENDATIONS. ................................................................................................................................. 9 ACKNOWLEDGEMENTS...................................................................................................................................................................... 10 APPENDIX 1: EXAMPLES OF HOW IMPROVED BIG DATA CAPACITY WILL INCREASE ARS SCIENTIFIC CAPACITY .......................................................................................................................... 11 APPENDIX 2: WORKSHOP AGENDA .................................................................................................. 13 APPENDIX 3. ACRONYMS DEFINED .................................................................................................. 14 2 INTRODUCTION Box 1 On February 5-7, 2013, scientific leaders from the Agricultural The Bigness Research Service (ARS) held a workshop to identify scientific of Big Data information management needs within the agency, and to find solutions to address those needs. Workshop participants represented The phrase ‘Big Data’ scientists from all National Programs and each geographic area. obviously implies volume, but Speakers from industry, academia, and federal agencies provided IT industry analysts and information about their experiences with Big Data. Participants were consultants have long challenged to articulate a vision for ARS information management acknowledged other unconstrained by current ARS capabilities and to formulate a characteristics of data that add strategy to achieve that vision. This document outlines the commensurate challenges. The recommendations arising from that workshop. discussion remains vigorous, to the point of moving from the The ARS is a worldwide leader in agricultural research, providing a description of Big Data to that unique breadth and depth of scientific expertise to address problems of Extreme Data. of national and international importance. ARS scientists use leading- edge methods to solve problems and answer questions across a broad For the purposes of this white array of disciplines related to agriculture. The scope of research paper, Big Data is undertaken by the agency is highlighted by the ARS Mission characterized as having Statement: extreme or variable values of ARS conducts research to develop and transfer solutions to one or more of the following agricultural problems of high national priority and provide characteristics: information access and dissemination to: Volume1 (size) • ensure high-quality, safe food, and other agricultural Variety1 (structure) products • Velocity1 (acquisition rate) • assess the nutritional needs of Americans • Veracity (uncertain quality or • sustain a competitive agricultural economy • provenance) • enhance the natural resource base and the • Variability2 (in meaning) environment, and Complexity3 (in relation- • provide economic opportunities for rural citizens, • ships, sources, etc.) communities, and society as a whole. • However, the nature of the science supporting this mission is 1. Doug Laney. 2001. 3-D Data Management: Controlling Data changing rapidly. In the past, scientific methods were often labor- Volume, Velocity, and Variety. intensive and consumed many person-hours to adequately address a 2. Brian Hopkins. 2011. Blogging From single scientific question. Scientists are now generating vast amounts the IBM Big Data Symposium - Big Is More Than Just Big. 2011. Brian of high-quality data rapidly and relatively inexpensively. This Hopkins. fundamental change in the nature of science is presenting new 3. Valentin Sribar. 2011. ‘Big Data' Is challenges and demanding new approaches to maximize the value Only the Beginning of Extreme Information Management. extracted from these large and complex datasets. This dramatic growth in data volume, variety, and velocity has come to be known as Big Data (Box 1). 3 As a result of these changes, a new paradigm is emerging in science that is characterized by its data intensity. Previous methods for data collection, storage, and analysis are inadequate for handling the scale and complexity of this avalanche of new data. Consumer-oriented services such as Google Maps, Facebook, Wikipedia, and the National Oceanic and Atmospheric Administration (NOAA) weather forecasts, demonstrate the benefits that can be gained by aggregating data from multiple sources. These products have become essential resources intensively adopted by millions of people in an astonishingly brief time. Similarly, the vast data resources produced by ARS scientists could be used to describe the world in ways never imagined, but critical resources and coordination to maximize this impact are missing. A few examples of agricultural research that are enabled by, and even necessitate, such a paradigm shift in scientific information management include: • Increasing the resilience of production systems: Deriving local or regional responses to environmental stressors requires defining the complex interactions among crops, animals, soil, water, weather, and climate. A better understanding of these interactions can be gleaned, even using current data or data collection strategies, but only with an improvement in how the vast amounts of data are integrated across locations. • Improving our understanding of genotype by environment interactions: In all organisms, the environment affects phenotype, and variability in the environment can confound attempts to identify genes underlying important agronomic traits, and even human diseases. Traditional studies typically only identify genes with large effects, but many important traits are driven by multiple genes that are easily affected by the environment. Identifying these genes requires whole genome studies coupled with good environmental data for each individual. Such studies require high computing capacity for analyzing very large datasets. • Enhancing human and animal health: Human and animal nutrition, health, and well- being can be better predicted by understanding the rich interactions among microbial communities through insights gleaned from the genomes of those community members. For a more extensive discussion of these and other examples, see Appendix 1 (p. 11). Enhancements in scientific computing in the ARS are both critical and urgent. The ARS must upgrade computing and data management infrastructure to maintain its status as a world leader in the science of agriculture. The Big Data paradigm shift in science is taking place across disciplines, and action is necessary for ARS scientists to effectively perform key research missions. Enhancements to current systems are vital for maintaining the ARS as an effective, nationally-coordinated, multidisciplinary agency. Current research priorities target objectives that are unattainable without some broad, systematic improvements. ARS scientists cannot perform state of the art research without the appropriate tools, knowledge, and skills. 4 BIG DATA AND COMPUTING Participants at the Big Data Workshop expressed enthusiastic support of the worldwide leadership provided by the ARS in agricultural research and embraced the role of the agency to lead in the collection, storage, analysis, and distribution of scientific data related to agriculture (see Box 2). Workshop participants agreed that the ARS is uniquely positioned to provide long-term, stabile, durable solutions for information management challenges involving Big Data. Participants reported that urgent solutions are needed to address these technology issues to help meet scientific needs, needs that reach across agricultural research institutions and disciplines. This enhanced role for the ARS is consistent with the Big Data initiative recently announced by the White House Office of Science and Technology Policy (OSTP). Box 2 Big Data Vision Statement Scientists in the ARS and elsewhere are generating data at increasing rates. To maximize knowledge extracted from these large and diverse data sets, ARS should support scientific computing to efficiently combine disparate information for scientific discovery, and to enable the transfer of that knowledge quickly and efficiently to other scientists and to the public. The enhancements envisioned here will: • enable or enhance data exchange and analysis • create a more robust network of computer resources • encourage data standardization, and • help to develop a well-trained

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    14 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us