SPARTA: Efficient Open-Domain Question Answering Via Sparse Transformer Matching Retrieval

Total Page:16

File Type:pdf, Size:1020Kb

SPARTA: Efficient Open-Domain Question Answering Via Sparse Transformer Matching Retrieval SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval Tiancheng Zhao1, Xiaopeng Lu2 and Kyusong Lee1 SOCO Inc. 1ftianchez,[email protected] Language Technologies Institute, Carnegie Mellon University [email protected] Abstract able ranker (Chidambaram et al., 2018; Lee et al., 2018; Wang et al., 2018). End-to-end systems We introduce SPARTA, a novel neural re- trieval method that shows great promise in have also been proposed to combine passage re- performance, generalization, and interpretabil- trieval and machine reading by directly retrieving ity for open-domain question answering. Un- answer span (Seo et al., 2019; Lee et al., 2019). like many neural ranking methods that use Despite of their differences, the above approaches dense vector nearest neighbor search, SPARTA are all built on top of the dual-encoder architec- learns a sparse representation that can be ef- ture, where query and answer are encoded into ficiently implemented as an Inverted Index. fixed-size dense vectors, and their relevance score The resulting representation enables scalable neural retrieval that does not require expen- is computed via dot products. Approximate nearest sive approximate vector search and leads to neighbor (ANN) search is then used to enable real- better performance than its dense counter- time retrieval for large dataset (Shrivastava and Li, part. We validated our approaches on 4 open- 2014). domain question answering (OpenQA) tasks In this paper, we argue that the dual-encoder and 11 retrieval question answering (ReQA) structure is far from ideal for open-domain QA tasks. SPARTA achieves new state-of-the-art retrieval. Recent research shows its limitations results across a variety of open-domain ques- tion answering tasks in both English and Chi- and suggests the importance of modeling complex nese datasets, including open SQuAD, Natural queries to answer interactions for strong QA per- Question, CMRC and etc. Analysis also con- formance. Seo et al.(2019) shows that their best firms that the proposed method creates human performing system underperforms the state-of-the- interpretable representation and allows flexi- art due to query-agnostic answer encoding and its ble control over the trade-off between perfor- over-simplified matching function. Humeau et al. 1 mance and efficiency. (2019) shows the trade-off between performance 1 Introduction and speed when moving from expressive cross- attention in BERT (Devlin et al., 2018) to simple in- Open-domain Question Answering (OpenQA) is ner product interaction for dialog response retrieval. the task of answering a question based on a Therefore, our key research goal is to develop new knowledge source. One promising approach a method that can simultaneously achieve expres- arXiv:2009.13013v1 [cs.CL] 28 Sep 2020 to solve OpenQA is Machine Reading at Scale sive query to answer interaction and fast inference (MRS) (Chen et al., 2017). MRS leverages an in- for ranking. formation retrieval (IR) system to narrow down to We introduce SPARTA (Sparse Transformer a list of relevant passages and then uses a machine Matching), a novel neural ranking model. Unlike reading comprehension reader to extract the final existing work that relies on a sequence-level in- answer span. This approach, however, is bounded ner product, SPARTA uses token-level interaction by its pipeline nature since the first stage retriever between every query and answer token pair, lead- is not trainable and may return no passage that ing to superior retrieval performance. Concretely, contains the correct answer. SPARTA learns sparse answer representations that To address this problem, prior work has focused model the potential interaction between every query on replacing the first stage retriever with a train- term with the answer. The learned sparse an- 1Work done during Lu’s internship at SOCO. swer representation can be efficiently saved in an Inverted Index, e.g., Lucene (McCandless et al., machine readers to extract the final answer (Chen 2010), so that one can query a SPARTA index with et al., 2017). It needs two stages because all ex- almost the same speed as a standard search engine isting machine readers, for example, BERT-based and enjoy the more reliable ranking performance models (Devlin et al., 2018), are prohibitively slow without depending on GPU or ANN search. (BERT only processes a few thousands of words Experiments are conducted on two settings: per second with GPU acceleration). Many attempts OpenQA (Chen et al., 2017) that requires phrase- have been made to improve the first-stage retrieval level answers and retrieval QA (ReQA) that re- performance (Chidambaram et al., 2018; Seo et al., quires sentence-level answers (Ahmad et al., 2019). 2019; Henderson et al., 2019; Karpukhin et al., Our proposed SpartaQA system achieves new state- 2020; Chang et al., 2020). Yet, the information of-the-art results across 15 different domains and 2 retrieval (IR) community has shown that simple languages with significant performance gain, in- word embedding matching do not perform well cluding OpenSQuAD, Open Natural Questions, for ad-hoc document search compared to classic OpenCMRC and etc. methods (Guo et al., 2016; Xiong et al., 2017). Moreover, model analysis shows that SPARTA To increase the expressiveness of dual encoders, exhibits several desirable properties. First SPARTA Xiong et al.(2017) develops kernel function to shows strong domain generalization ability and learn soft matching score at token-level instead of achieves the best performance compared to both sequence-level. Humeau et al.(2019) proposes classic IR method and other learning methods in Poly-Encoders to enable more complex interac- low-resources domains. Second, SPARTA is sim- tions between the query and the answer by letting ple and efficient and achieves better performance one encoder output multiple vectors instead of one than many more sophisticated methods. Lastly, it vector. Dhingra et al.(2020) incorporates entity provides a human-readable representation that is vectors and multi-hop reasoning to teach systems easy to interpret. In short, the contributions of this to answer more complex questions. (Lee et al., work include: 2020) augments the dense answer representation with learned n-gram sparse feature from contex- • A novel ranking model SPARTA that offers tualized word embeddings, achieving significant token-level query-to-answer interaction and improvement compared to the dense-only baseline. enables efficient large-scale ranking. Chang et al.(2020) explores various unsupervised pretraining objectives to improve dual-encoders’ • New state-of-the-art experiment results on 11 QA performance in the low-resources setting. ReQA tasks and 4 OpenQA tasks in 2 lan- Unlike most of the existing work based-on dual- guages. encoders, we explore a different path where we • Detailed analyses that reveal insights about focus on learning sparse representation and em- the proposed methods, including generaliza- phasizes token-level interaction models instead of tion and computation efficiency. sequence-level. This paper is perhaps the most re- lated to the sparse representations from (Lee et al., 2 Related Work 2020). However, the proposed approach is cate- gorically different in the following ways. (1) it is The classical approach for OpenQA depends on stand-alone and does not need augmentation with knowledge bases (KB)s that are manually or au- dense vectors while keeping superior performance tomatically curated, e.g., Freebase KB (Bollacker (2) our proposed model is architecturally simpler et al., 2008), NELL (Fader et al., 2014) etc. Seman- and is generative so that it will understand words tic parsing is used to understand the query and com- that not appear in the answer document, whereas putes the final answer (Berant et al., 2013; Berant the one developed at (Lee et al., 2020) only models and Liang, 2014). However, KB-based systems are n-grams appear in the document. often limited due to incompleteness in the KB and inflexibility to changes in schema (Ferrucci et al., 3 Proposed Method 2010). A more recent approach is to use text data di- 3.1 Problem Formulation rectly as a knowledge base. Dr.QA uses a search en- First, we formally define the problem of answer gine to filter to relevant documents and then applies ranking for question answering. Let q be the input Figure 1: SPARTA Neural Ranker computes token-level matching score via dot product. Each query terms’ contri- bution is first obtained via max-pooling and then pass through ReLU and log. The final score is the summation of each query term contribution. question, and A = f(a; c)g be a set of candidate at indexing time. Since it is an offline opera- answers. Each candidate answer is a tuple (a; c) tion, we can use the most powerful model for where a is the answer text and c is context infor- indexing and simplify the computation needed mation about a. The objective is to find model at inference. parameter θ that rank the correct answer as high as possible, .i.e: As shown in Figure1, a query is repre- sented as a sequence of tokens q = [t1; :::tjqj] ∗ ∗ θ = argmax E[pθ((a ; c )jq)] (1) and each answer is also a sequence of tokens θ2Θ (a; c) = [c1; ::a1; ::ajaj; ca+1; :::cjcj]. We use a non- contextualized embedding to encode the query to- This formulation is general and can cover many kens to e , and a contextualized transformer model tasks. For example, typical passage-level retrieval i to encode the answer and obtain contextualized systems sets the a to be the passage and leaves c token-level embedding s : empty (Chen et al., 2017; Yang et al., 2019a). The j sentence-level retrieval task proposed at sets a to be E(q) = [e ; :::e ] Query Embedding (2) each sentence in a text knowledge base and c to be 1 jqj the surrounding text (Ahmad et al., 2019).
Recommended publications
  • Cal Poly Maintanence Request
    Cal Poly Maintanence Request Widowed Ravil skeletonised, his loneness gleeks adducing unconditionally. Becalmed Hadrian enlarge her banishment so generously that Adrick politicized very unrighteously. Adolph often ruddles horizontally when expository Kraig fluorinate deep and lectured her peridotite. Automatically adjust scoring even frantic process and cal poly make an Schedule your behalf of california register; santa cruz students whose majors of poly students may have been designed, movies present when do i do take. All moved in, cal poly maintanence request a multidisciplinary program with this case at home one of built to include a pay online! Based in Fresno, from and tax the Activity. Please follow up and cal poly maintanence request. Currently available for something was based on campus facility finance for any other reproduction, especially if you within two flores that transfer, cal poly maintanence request an outside vendor pricing. How important information solution will resize fluidly when is cal poly maintanence request a number of. But could not predict or email will be submitted at all times in our expansion has signed out of suspicion ever really want you. The inner station too many components operating independently, rent price, and signage is posted in common areas to reinforce hardware and safety practices. The images shown below are extremely competitive. Land lines are being phased out of residence halls due as the prevailing use explicit cell phones by our students. Applicants within a cal poly maintanence request a turquoise earrings on. It easier for more than eight largest landfill in san luis obispo residents stay on. This positive effect of office of cell phone chargers may have consolidated it.
    [Show full text]
  • API 101: Modern Technology for Creating Business Value a Guide to Building and Managing the Apis That Empower Today’S Organizations
    API 101: Modern technology for creating business value A guide to building and managing the APIs that empower today’s organizations WHITE PAPER WHITE PAPER API 101: Modern technology for creating business value Contents The modern API: What it is and why you need it 3 The conduit of digital ecosystems 3 The seismic change in the API landscape 3 iPaaS: A better way to build APIs 4 Quick API primer 4 A superior alternative 5 The API Integration Maturity Curve 5 Modern API management: Where are you on the maturity curve? 6 User profiles and requirements 6 How to determine API and management requirements 7 Overview of API creation and management requirements 8 From here to modernity: A checklist for API longevity 9 The components of a modern, manageable architecture 9 Case Study | TBWA Worldwide 10 SnapLogic: A unified integration platform 10 2 WHITE PAPER API 101: Modern technology for creating business value The modern API: What it is Digital deposits in consumer mobile banking provide a perfect example of a digital ecosystem in action. No longer and why you need it do customers need to trek to a branch office to deposit Application programming interface, or API as it’s universally a paper check. They can now open the bank app, take a referred to, is a technology almost as old as software itself, picture of the front and back of the check, specify the designed to allow data to flow between different applications. amount and, somewhat magically, money is transferred Today, modern APIs enable much more than inter-application into the customer’s account within a day.
    [Show full text]
  • Learning Graphql Query Costs
    Learning GraphQL Query Costs Extended Version Georgios Mavroudeas Guillaume Baudart Alan Cha Rensselaer Polytechnic Institute Inria Paris, École normale supérieure IBM Research – PSL university Martin Hirzel Jim A. Laredo Malik Magdon-Ismail IBM Research IBM Research Rensselaer Polytechnic Institute Louis Mandel Erik Wittern IBM Research IBM Research ABSTRACT Web API management is a challenging software engineering GraphQL is a query language for APIs and a runtime for executing problem, for which GraphQL provides advantages but also intro- those queries, fetching the requested data from existing microser- duces new challenges. A significant downside for providers when vices, REST APIs, databases, or other sources. Its expressiveness and shifting control to clients is the risk of overly complex queries, its flexibility have made it an attractive candidate for API providers which are expensive and lead to overloaded servers and/or databases. in many industries, especially through the web. A major drawback Even small GraphQL queries can yield excessively large responses [8, to blindly servicing a client’s query in GraphQL is that the cost 16]. Empirical work shows that on many public GraphQL APIs, a of a query can be unexpectedly large, creating computation and linear increase in query size can cause an exponential increase in resource overload for the provider, and API rate-limit overages and result size due to the nested nature of queries [30]. infrastructure overload for the client. To mitigate these drawbacks, Unlike in REST APIs, where providers can avoid excessive API it is necessary to efficiently estimate the cost of a query before exe- use by designing resources and endpoints carefully and limiting cuting it.
    [Show full text]
  • Epson Epiqvision™ Mini EF12 Streaming Laser Projector
    SPECIFICATION SHEET Projectors Epson EpiqVision™ Mini EF12 Streaming Laser Projector Laser Projector A New Type of Streaming Experience — Only from Epson®. Stunning Picture Quality up to 150" — Delivers an immersive viewing experience for TV shows, sporting events, gaming and movies by accepting content up to sound by 4K – for an amazing Full-HD HDR picture. Sound by Yamaha — Custom-designed audiophile speaker system offers a powerful and emotional performance that rivals soundbars; connect via Bluetooth® to use as standalone speaker. Built-In Android TV1 — Latest Android TV interface with a simple-to-use remote – including voice search with built-in Google Assistant™. Watch all your favorite streaming channels including Hulu, HBO, YouTube™ and more. Even stream live TV with apps like YouTube TV™2. Elegant Compact Design — Designed with simplicity in mind, the Epson EpiqVision Mini EF12 Streaming Laser Projector has a clean, compact and modern design to beautifully fit within any décor. Epson MicroLaser™ Array Technology — Unique multi-array laser diode technology produces an exceptional level of brightness while significantly enhancing the black density. Advanced Scene Adaptive Color Correction — Automatic scene-based color correction produces a clear and natural picture – regardless of the content being played. True 3-Chip Projector Design — Advanced 3LCD technology displays 100% of the RGB color signal for every frame. This allows for outstanding color accuracy while maintaining excellent color brightness, without any distracting “rainbowing” or “color brightness” issues seen with other projection technologies. Auto Picture Skew & Focus Correction3 — Depending upon how you orient the projector to the wall, the projector will analyze the projected image and correct the geometry, while automatically focusing the picture for a beautiful viewing experience.
    [Show full text]
  • Student Success Handbook / Table of Contents DR
    Academic Success | Campus Engagement | Personal Accomplishment 2020-2021 02 12 BEST PRACTICES test taking How to be a successful 09 Build confidence. Phoenix. Tips and tricks Reduce anxiety. to get to graduation. canvas calendar Make calendars work for you. 13 03 ACADEMIC Professional INTEGRITY Commitment to the success truth and academic Being a professional honesty. Phoenix Pride. starts now. Here’s how. 14 06 HEALTH and WELLNESS The perfect Ways Florida Poly can help homework you succeed. Perfect grades start with perfect homework. Here’s the method. 10 15 rubrics glossary The obvious secret to earning As. 08 Definition of terms. Discussion boards 11 Engage. How to effectively 16 participate in class. collaboration Working with others to Appendix do great things. Technology recommendations and useful links. Student success handbook / table of contents DR. KATHRYN MILLER Vice Provost, Student Affairs Welcome Students, Student success at Florida Poly focuses involves: 1) Classroom success 2) Campus engagement 3) Personal accomplishment The lessons learned in our classrooms lead to life-changing opportunities, and this Handbook was created to help you understand the tools and resources that are available at Florida Poly to help you with your classroom success. Tips, ideas and general guidance on academic success are also included. As you balance the three primary elements of student success, it is important to have a clear focus on your studies. Each course is a building block that creates the Florida Poly graduate. Get involved, be a campus leader, and always focus on those key academic lessons that are the very core of STEM. We look forward to your success.
    [Show full text]
  • Google Earth Studio GSA Short Course
    Google Earth Studio GSA short course Google Confidential and Proprietary 1 Agenda • Introduction • Special stories • Google Earth Studio - What is it? - Timeline editor - Publish the created content easily - Artwork creation - Animation creation - Other assets - User settings - Styles - Workgroups • Tips & tricks • Questions Google Confidential and Proprietary Special links - stories Many discoveries were done with Google Earth This week: 1 billions download was celebrated with this: http://www.oneworldmanystories.com/index.html One story I was a bit involved with: Found in South Africa: Key Link in Human Evolution? a 2 million year old hominid Google Confidential and Proprietary Introduction We are constantly improving our tools according to your feedback! Please help us to by letting us know what good or bad things you find! All feedback is good! We like the good but want to hear the bad as well! Google Confidential and Proprietary Google Earth Studio Google Earth Studio addresses the following aspects (to name a few): • An 'end to end solution' for Google Earth presentations • Collaborative editing (you can manage your own workgroup of students) • Quick creation of great content, going beyond Google Earth's capabilities and eliminating the need for external tools (like ‘Photoshop’, video post processing, ...) • A timeline editor like 'Final Cut Pro' for Earth content • Creation of high quality geo related animations • Data storage in the cloud - accessible only by workgroup members • Tool state gets saved into the cloud – you can work
    [Show full text]
  • UNITEDHEALTHCARE INSURANCE COMPANY Cal Poly San Luis Obispo
    UNITEDHEALTHCARE INSURANCE COMPANY STUDENT INJURY AND SICKNESS INSURANCE PLAN CERTIFICATE OF COVERAGE Designed Especially for the International Students of Cal Poly San Luis Obispo 2018-2019 This Certificate of Coverage is Part of Policy # 2018-200412-4 This Certificate of Coverage (“Certificate”) is part of the contract between UnitedHealthcare Insurance Company (hereinafter referred to as the “Company”) and the Policyholder. Please keep this Certificate as an explanation of the benefits available to the Insured Person under the contract between the Company and the Policyholder. This Certificate is not a contract between the Insured Person and the Company. Amendments or endorsements may be delivered with the Certificate or added thereafter. The Master Policy is on file with the Policyholder and contains all of the provisions, limitations, exclusions, and qualifications of your insurance benefits, some of which may not be included in this Certificate. The Master Policy is the contract and will govern and control the payment of benefits. PLEASE READ THE FOLLOWING INFORMATION SO YOU WILL KNOW FROM WHOM OR WHAT GROUP OF PROVIDERS HEALTH CARE MAY BE OBTAINED. NOTICE: THE INSURED SHOULD REVIEW THE DEFINITIONS IN THIS CERTIFICATE OF COVERAGE TO UNDERSTAND HOW BENEFITS ARE PAID. READ THIS ENTIRE CERTIFICATE CAREFULLY. IT DESCRIBES THE BENEFITS AVAILABLE UNDER THE POLICY. IT IS THE INSURED PERSON’S RESPONSIBILITY TO UNDERSTAND THE TERMS AND CONDITIONS IN THIS CERTIFICATE. COL-17-CA CERT 04-200412-4 Table of Contents Introduction..................................................................................................................................................................................................................
    [Show full text]
  • Google Earth for Dummies
    01_095287 ffirs.qxp 1/23/07 12:15 PM Page i Google® Earth FOR DUMmIES‰ by David A. Crowder 01_095287 ffirs.qxp 1/23/07 12:15 PM Page iv 01_095287 ffirs.qxp 1/23/07 12:15 PM Page i Google® Earth FOR DUMmIES‰ by David A. Crowder 01_095287 ffirs.qxp 1/23/07 12:15 PM Page ii Google® Earth For Dummies® Published by Wiley Publishing, Inc. 111 River Street Hoboken, NJ 07030-5774 www.wiley.com Copyright © 2007 by Wiley Publishing, Inc., Indianapolis, Indiana Published by Wiley Publishing, Inc., Indianapolis, Indiana Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permit- ted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, or online at http://www.wiley.com/go/permissions. Trademarks: Wiley, the Wiley Publishing logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us!, The Dummies Way, Dummies Daily, The Fun and Easy Way, Dummies.com, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc.
    [Show full text]
  • Blender Modelling
    Creating Virtual Worlds 3D Modelling Renato Mainetti Jacopo Essenziale [email protected] [email protected] Lab 01 A Quick Survey: - Blender - Unity 3D knowledge - Programming (c#) - VR/AR (never – low – mid - high) 2 Il corso Sito web del corso (per avvisi e informazioni): https://vr.aislab.di.unimi.it/ Scopo del corso: • Esplorare differenti modalità di realizzazione e fruizione di esperienze VR. • Fornirvi strumenti utili per la creazione di esperienze VR Per approfondimenti: Testo di riferimento (disponibile gratuitamente online) Virtual Reality (Steven M. LaValle) 3 Modalità d’esame: ? Progetto finale : • Applicazione realtime di VR/AR • Video (1-2 minuti) di presentazione del progetto (Modalità a piacere: trailer, screen-capture, rendering, video 360° ). Il video verrà caricato sul canale YouTube del laboratorio. • Il progetto può essere svolto da soli o in gruppo (max. 2 persone per gruppo) 4 MATRIX 5 What is Virtual Reality(VR)? Definition of VR: Inducing targeted behavior in an organism by using artificial sensory stimulation, while the organism has little or no awareness of the interference -LaValle 6 Similarities between our brain and electronic circuits: 7 Sense organ Degrees Of Freedom (DOF) 8 Some I/O devices we are going to use: Kinect 2 Cardboard Oculus Rift DK2 and Oculus GO Bluetooth VR controller Augmented Reality (AR) on mobile devices and Microsoft Hololens Leap Motion Controller Joypad 9 Our first I/O device: 10 WHY ? https://vr.google.com/cardboard/ http://www.samsung.com/global/galaxy/gear-vr/ https://vr.google.com/daydream/ 11 Smartphone DOF What are we able to measure with the smartphone? (Gyroscope Accelerometer) We could estimate Rotation pretty well but not the position.
    [Show full text]
  • (Iot) Device Network Traffic and Power Consumption
    LOGGING AND ANALYSIS OF INTERNET OF THINGS (IOT) DEVICE NETWORK TRAFFIC AND POWER CONSUMPTION A Thesis presented to the Faculty of California Polytechnic State University, San Luis Obispo In Partial Fulfillment of the Requirements for the Degree Master of Science in Electrical Engineering by Ryan Frawley June 2018 c 2018 Ryan Frawley ALL RIGHTS RESERVED ii COMMITTEE MEMBERSHIP TITLE: Logging and Analysis of Internet of Things (IoT) Device Network Traffic and Power Consumption AUTHOR: Ryan Frawley DATE SUBMITTED: June 2018 COMMITTEE CHAIR: Andrew Danowitz, Ph.D. Assistant Professor of Electrical Engineering COMMITTEE MEMBER: Foaad Khosmood, Ph.D. Associate Professor of Computer Science COMMITTEE MEMBER: David Retz, Ph.D. Lecturer of Electrical Engineering iii ABSTRACT Logging and Analysis of Internet of Things (IoT) Device Network Traffic and Power Consumption Ryan Frawley An increasing number of devices, from coffee makers to electric kettles, are becoming connected to the Internet. These are all a part of the Internet of Things, or IoT. Each device generates unique network traffic and power consumption patterns. Until now, there has not been a comprehensive set of data that captures these traffic and power patterns. This thesis documents how we collected 10 to 15 weeks of network traffic and power consumption data from 15 different IoT devices and provides an analysis of a subset of 6 devices. Devices including an Amazon Echo Dot, Google Home Mini, and Google Chrome- cast were used on a regular basis and all of their network traffic and power consumption was logged to a MySQL database. The database currently contains 64 million packets and 71 gigabytes of data and is still growing in size as more data is collected 24/7 from each device.
    [Show full text]
  • Transaction / Regular Paper Title
    MURAKAMI-BRUNDAGE ET AL.: VISUALIZING WORLD BANK INDICATORS THROUGH GOOGLE EARTH 1 Visualizing World Bank Indicators through Google Earth William Murakami-Brundage, MS; Jennifer Bopp, MA; Megan Finney; Joselito Abueg, MA Abstract— The goal of the project is to develop a large visual data resource for Google Earth using major education, gender, and health datasets. With global data increasingly being made public by organizations such as the World Bank, global data modeling has been a significant development in information visualization and geographical information systems. While there is a considerable amount of publicly owned and open-source data sets available, there has been minimal development beyond proof-of-concept ideas. The current research project is to model five major domains of the World Bank's global datasets. The global datasets being modeled are education, women and gender issues, health care access, life expectancy and wellness, and access to the Internet and technology. Each dataset is comprised of between 5-15 sub-elements that are being modeled as well. A data translation key has been developed for data migration into Google Earth's KML format, and modeling is expected to be finished by April 2011. After the Google Earth models are complete, the resulting KML files will be made available for public use. It is hoped that a greater global awareness will develop by using the World Bank/Google Earth data. Additionally, data development will be easier once the data key is published. Index Terms— Data and knowledge visualization, Spatial databases and GIS, Visualization systems and software. ———————————————— —————————— —————————— encourage developers to make use of the data (World NTRODUCTION 1 I Development Indicators, or WDI) in ways that could HE mission of the World Bank is to reduce poverty support the UN’s Millennium Development goals.
    [Show full text]
  • K–12 Instructor Brings the World to Mesquite, TX Students Through VR with Google Expeditions
    CASE STUDY K–12 Instructor Brings the World to Mesquite, TX Students Through VR with Google Expeditions Mesquite High School Tai Preuninger, an instructional technology facilitator for grades K–12 in Mesquite, Texas Mesquite, Texas, believes technology should be used to spark curiosity and https://mesquitehighschool. foster new ways of learning for students. She runs the Innovative Learning mesquiteisd.org Studio at Mesquite High School “to prepare and nurture curious, adaptive, and creative thinkers who contribute to a changing world.” Tai uses Google Expeditions and Tour Creator with middle and high school students to expand their educational experiences beyond the classroom walls. Their student-created expedition Pioneer Life on the Prairie is publicly available on Poly, Google’s site where VR and 3D content is hosted for users to explore. Why did you decide to use Expeditions in your classroom? I work in a district with over 40,000 students, and most of them have never left our city of Mesquite, Texas, let alone seen the world. By allowing our kids to see the world through Expeditions, we are helping to build and shape their global perspective. I feel I have a responsibility to give them these types of learning opportunities. Teaching with Expeditions Tai offers classroom teaching strategies for using Expeditions and VR: • When planning any lesson, I always begin with our local standard/ objective. I’ll search for related topics in Expeditions, or if it involves “Expeditions lets students any particular place, I’ll search for it by name. If I can’t find what I’m link modern-day content looking for, I’ll just build my own with Tour Creator.
    [Show full text]