Big Data in Geoinformatics
Big Data: Techniques and Technologies in Geoinformatics
Edited by Hassan A. Karimi

Big data has always been a major challenge in geoinformatics, as geospatial data come in various types and formats, new geospatial data are acquired very fast, and geospatial databases are inherently very large. While there have been advances in hardware and software for handling big data, these often fall short of handling geospatial big data efficiently and effectively. Big Data: Techniques and Technologies in Geoinformatics tackles these challenges head on, integrating coverage of techniques and technologies for storing, managing, and computing geospatial big data.

Providing a perspective based on analysis of time, applications, and resources, this book familiarizes readers with geospatial applications that fall under the category of big data. It explores new trends in geospatial data collection, such as geo-crowdsourcing, and advanced data collection technologies, such as LiDAR point clouds. The book features a range of topics on big data techniques and technologies in geoinformatics, including distributed computing, geospatial data analytics, social media, and volunteered geographic information.

Features
• Explains the challenges and issues of big data in geoinformatics applications
• Discusses and analyzes the techniques, technologies, and tools for storing, managing, and computing geospatial big data
• Familiarizes readers with the advanced techniques and technologies used for geospatial big data research
• Provides insight into new opportunities offered by geospatial big data

With chapters contributed by experts in geoinformatics and in domains such as computing and engineering, the book provides an understanding of the challenges and issues of big data in geoinformatics applications.
The book is a single collection of current and emerging techniques, technologies, and tools that are needed to collect, analyze, manage, process, and visualize geospatial big data.

K20296
ISBN: 978-1-4665-8651-2
6000 Broken Sound Parkway, NW, Suite 300, Boca Raton, FL 33487
711 Third Avenue, New York, NY 10017
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN, UK
an informa business
www.crcpress.com

Big Data: Techniques and Technologies in Geoinformatics
Edited by Hassan A. Karimi

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Version Date: 20140108
International Standard Book Number-13: 978-1-4665-8655-0 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
Editor
Contributors
Chapter 1 Distributed and Parallel Computing
  Monir H. Sharker and Hassan A. Karimi
Chapter 2 GEOSS Clearinghouse: Integrating Geospatial Resources to Support the Global Earth Observation System of Systems
  Chaowei Yang, Kai Liu, Zhenlong Li, Wenwen Li, Huayi Wu, Jizhe Xia, Qunying Huang, Jing Li, Min Sun, Lizhi Miao, Nanyin Zhou, and Doug Nebert
Chapter 3 Using a Cloud Computing Environment to Process Large 3D Spatial Datasets
  Ramanathan Sugumaran, Jeffrey Burnett, and Marc P. Armstrong
Chapter 4 Building Open Environments to Meet Big Data Challenges in Earth Sciences
  Meixia Deng and Liping Di
Chapter 5 Developing Online Visualization and Analysis Services for NASA Satellite-Derived Global Precipitation Products during the Big Geospatial Data Era
  Zhong Liu, Dana Ostrenga, William Teng, and Steven Kempler
Chapter 6 Algorithmic Design Considerations for Geospatial and/or Temporal Big Data
  Terence van Zyl
Chapter 7 Machine Learning on Geospatial Big Data
  Terence van Zyl
Chapter 8 Spatial Big Data: Case Studies on Volume, Velocity, and Variety
  Michael R. Evans, Dev Oliver, Xun Zhou, and Shashi Shekhar
Chapter 9 Exploiting Big VGI to Improve Routing and Navigation Services
  Mohamed Bakillah, Johannes Lauer, Steve H.L. Liang, Alexander Zipf, Jamal Jokar Arsanjani, Amin Mobasheri, and Lukas Loos
Chapter 10 Efficient Frequent Sequence Mining on Taxi Trip Records Using Road Network Shortcuts
  Jianting Zhang
Chapter 11 Geoinformatics and Social Media: New Big Data Challenge
  Arie Croitoru, Andrew Crooks, Jacek Radzikowski, Anthony Stefanidis, Ranga R. Vatsavai, and Nicole Wayant
Chapter 12 Insights and Knowledge Discovery from Big Geospatial Data Using TMC-Pattern
  Roland Assam and Thomas Seidl
Chapter 13 Geospatial Cyberinfrastructure for Addressing the Big Data Challenges on the Worldwide Sensor Web
  Steve H.L. Liang and Chih-Yuan Huang
Chapter 14 OGC Standards and Geospatial Big Data
  Carl Reed

Preface

What is big data? Due to increased interest in this phenomenon, many recent papers and reports have focused on defining and discussing this subject. A review of these publications points to a consensus about how big data is perceived and explained. It is widely agreed that big data has three specific characteristics: volume, in terms of large-scale data storage and processing; variety, or the availability of data in different types and formats; and velocity, which refers to the fast rate of new data acquisition. These characteristics are widely referred to as the three Vs of big data. While datasets that feature only one of these Vs are still considered big, most datasets from fields such as science, engineering, and social media feature all three.

To better understand the recent surge of interest in big data, I provide here a different perspective on it. I argue that the answer to the question "What is big data?" depends on when the question is asked, what application is involved, and what computing resources are available. In other words, understanding what big data is requires an analysis of time, applications, and resources.

In light of this, I categorize the time element into three groups: past (since the introduction of computing several decades ago), near-past (within the last few years), and present (now). One way of looking at the time element is that, in general, big data in the past meant dealing with gigabyte-sized datasets; in the near-past, terabyte-sized datasets; and in the present, petabyte-sized datasets.
I also categorize the application element into three groups: scientific (data used for complex modeling, analysis, and simulation), business (data used for business analysis and modeling), and general (data used for general-purpose processing). Finally, I classify the resource element into two groups: advanced computing (specialized computing platforms)