An Open-Source Alignment Tool for LFG F-Structures

Total Page:16

File Type:pdf, Size:1020Kb

An Open-Source Alignment Tool for LFG F-Structures f-align: An Open-Source Alignment Tool for LFG f-Structures Anton Bryl Josef van Genabith CNGL, School of Computing, CNGL, School of Computing, Dublin City University Dublin City University [email protected] [email protected] Abstract a node B, the descendant nodes of A are aligned to the descendant nodes of B. The main reason we did Lexical-Functional Grammar (LFG) f- not adapt this algorithm for our task are problems structures (Kaplan and Bresnan, 1982) have related to generalizing this approach. In the general attracted some attention in recent years as an case an f-structure is not a tree, but a directed acyclic intermediate data representation for statistical machine translation. So far, however, there graph (DAG). While the algorithm of Meyers et al. are no alignment tools capable of aligning aligns two trees with n nodes and maximum degree f-structures directly, and plain word alignment d in O(n2d2) time, we see no straightforward way of tools are used for this purpose. In this way adapting it to DAGs without increasing complexity no use is made of the structural information substantially. Another issue not to be ignored is that contained in f-structures. We present the first the output of f-structure parsers (Kaplan et al., 2002; version of a specialized f-structure alignment Cahill et al., 2004) is often fragmented. Unlike in- open-source software tool. correct parses, fragmented parses do carry useful in- formation and their exclusion is undesirable. 1 Introduction The extensive existing work on phrase-structure The use of LFG f-structures in transfer-based sta- tree alignment, starting with the work by Kaji et tistical machine translation naturally requires align- al. (1992) and proceeding to a number of more re- ment techniques as a prerequisite for transfer rule cent approaches (Ambati and Lavie, 2008; Zhechev, induction. The existing research (Riezler and 2009), is also not straightforward to reuse, as LFG Maxwell, 2006; Avramidis and Kuhn, 2009; Gra- f-structures represent sentences in a way quite differ- ham and van Genabith, 2009) uses general-purpose ent from phrase-structure trees, in particular having word-alignment tools such as GIZA++ by Och et al. all internal nodes and not only leaves (potentially) lexicalized; not to mention again that, in general, f- (1999) for aligning the f-structures. The general- 1 purpose alignment tools, however, take no advan- structures are graphs, and not necessarily trees. tage of the dependencies, which are made explicit in Therefore we decided to design a new algorithm. the f-structures, thus actually ignoring a lot of useful For the sake of computational speed, we keep the information readily available in the f-structure anno- structure-related part of the algorithm as simple as tated data. This paper focuses on a way of making possible. As measures of “structural closeness” be- use of precisely this information. tween two nodes we propose to use the best lexi- A relevant algorithm was proposed by Meyers et cal match between their children and the best lexical al. (1998), who represent sentences with trees sim- match between their parents. These measures are, ilar to f-structures. Their algorithm aligns a pair of on one hand, simple and efficient to calculate, al- trees in a recursive bottom-up procedure, ensuring 1Phrase-structure trees are used in another layer of LFG, that, somewhat simplifying, if a node A is aligned to namely in c-structure. lowing a greedy alignment of two DAGs, irrespec- item associated with each node. If an f-structure tive of whether they are fragmented or not, to be node is not lexicalized (e.g. it is a SPEC node with built in O(n2d2) + O(n2 ln n) time, which is com- a DET as its child), it is removed and its children are parable to the complexity of Meyers et al. (1998) linked directly to its parents. algorithm for trees. On the other hand, these mea- sures are sufficient for resolving simple ambiguities, 2.1 Composite Alignment Score such as several occurrences of the same word in the The composite alignment score is defined as a sim- same sentence (a very frequent situation, if we take ple scoring formula which makes some use of struc- function words into account). A similar idea under- tural information and incurs reasonable computa- lies the work by Watanabe et al. (2000) on resolving tional cost. the alignment ambiguities which arise when using a Let Lex(A; A0) be a measure of lexical closeness translation dictionary; their method checks how well of the words associated with the nodes A and A0. the neighbors of a word match the neighbors of each Such measure may either come from a dictionary, as of its candidate counterparts. in the work by Meyers et al. (1998), or, as in our ex- We use a very simple bilexical dictionary unit (a periments, be extracted in some way from the data. plain cooccurrence counter on training bitext) in this In addition to the lexical score, and on the basis version of the software, but nothing prevents using of it, we calculate two supplementary scores: more elaborate dictionary units within the same ar- 1. The score of the best lexical match between the chitecture. The software supports the data formats children of A and A0. If both nodes have children, of two different LFG pasers, namely XLE (Kaplan the score is calculated as follows: et al., 2002) and DCU (Cahill et al., 2004). The evaluation of the tool presented here is in- 0 0 Sc(A; A ) = max Lex(B; B ); (1) evitably limited. We use sentence-aligned English B2C(A); B02C(A0) and German Europarl (Koehn, 2005) and SMUL- TRON 2.0 (Volk et al., 2009) data. To the best of where C(:) is the set of children of the node. If only 0 our knowledge, there is no relevant gold standard, one of the two nodes has children, Sc(A; A ) = 0; if 0 so we produced a small gold standard set ourselves, none of the two nodes has children, Sc(A; A ) = 1. manually node-aligning 20 f-structure pairs created from SMULTRON data, using the word alignments 2. The score of the best lexical match between the 0 contained in SMULTRON for reference. It would be parents of A and A . If both nodes have parents, the also possible to evaluate the method within an SMT score is calculated as follows: system, but the available LFG-based SMT software 0 0 Sp(A; A ) = max Lex(B; B ); (2) is currently not accurate enough for the alignment B2P (A); 0 0 to be reliably evaluated by the translation scores. B 2P (A ) We also manually examined and analyzed some sen- where P (:) is the set of parents of the node (all tences from the output. The evaluation, even though nodes with which the node is connected by incom- limited, supports the validity of the core idea of the ing edges). If only one of the two nodes has parents, 0 method and suggests ways of further improvement. Sp(A; A ) = 0; if none of the two nodes has parents, 0 The paper is organized as follows. In Section 2 we Sp(A; A ) = 1. explain the algorithm; Section 3 is dedicated to the resulting software tool; Section 4 details the evalu- To allow for some differences in structure, an ex- ation; in Section 5 we present our conclusion and tended children matching score can be calculated: outline directions for further improvement. in this case the best match for each child of A is searched for also among the “grandchildren” (chil- 0 2 The Alignment Algorithm dren of children) of A and vice versa. Matches be- tween the grandchildren of A and the grandchildren The algorithm requires each sentence to be repre- of A0 are not considered. In the same way, an ex- sented as a graph (probably disjoint) with a lexical tended parent matching score can be calculated. 0 The composite score is a weighted sum of the lex- where Npair(A; A ) is the number of sentence-pairs ical score and the supplementary scores: in which the lexical entry from the node A occurs A0 0 0 0 on the source side, and the lexical entry from oc- S(A; A ) = wcSc(A; A ) + wpSp(A; A )+ 0 curs on the target side, and N1(A) and N2(A ) are 0 monolingual occurrence counts. Additionally, if A (1 − wc − wp)Lex(A; A ) (3) and A0 are identical strings consisting only of digits, The weights wc and wp may vary, with larger val- then Lex(A; A0) is set to 1:0. ues corresponding to greater reliance on structural 2. For each pair of DAGs in the corpus the align- information. ment procedure is performed as described in Section Let us consider aligning two DAGs with at most 2.2. n lexicalized nodes in each, and with at most d chil- dren and at most d parents for each node. It is easy 3 The Tool: f-align to see that (3) is calculated in O(d2) time, and there- fore the scores for all possible pairs are calculated The algorithm is the basis for version 0.1 of a in O(n2d2) time. Building a greedy alignment us- new open-source tool, f-align, which we present 2 ing the scores takes O(n2 ln n) time, the most costly here. It is written in C++ and can be compiled operation being the sorting procedure; so the overall both under Linux and under Windows. The tool complexity of a greedy alignment of two f-structures currently understands two representations of LFG f- is O(n2d2) + O(n2 ln n).
Recommended publications
  • Technote 18.Rtfd
    Actions Tech. Note 18 Draft 1.0 Localizing Freeway Actions II Freeway 4 Pro supports multiple localization meaning that one copy of Freeway can be used in many languages. The language used is based on the language you have Mac OS set to. If Actions have been created with localization in mind then this will follow through into your Action Interface and even your Action output. Folder Structure To set your Action up for localization you will first need to create a folder to store your Action and the associated files in. This folder will be bundled once all of the necessary files have been added to it, it must also have a particular structure to it. It needs to contain a folder named "Contents", which in turn needs to contain a folder named "Resources". This must contain a folder "Actions" for the Action and further folders for each language that you want to represent the Action. These must take the form "Language.lproj" so for instance "English.lproj", Spanish.lproj", "French.lproj" and so on. There is no minimum number of localization folders you can use but you are limited to the maximum amount of languages included in Mac OS. 1. The folder structure Once the folder structure has been created you will need to make a Property List file (.plist) for the Contents folder. This will contain important information used to tell the OS information such as the developmental region of the bundle, an identifier for the bundle and so on. Creating the pList file The plist can be created one of two ways.
    [Show full text]
  • Fortran Resources 1
    Fortran Resources 1 Ian D Chivers Jane Sleightholme October 17, 2020 1The original basis for this document was Mike Metcalf’s Fortran Information File. The next input came from people on comp-fortran-90. Details of how to subscribe or browse this list can be found in this document. If you have any corrections, additions, suggestions etc to make please contact us and we will endeavor to include your comments in later versions. Thanks to all the people who have contributed. 2 Revision history The most recent version can be found at https://www.fortranplus.co.uk/fortran-information/ and the files section of the comp-fortran-90 list. https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=comp-fortran-90 • October 2020. Added an entry for Nvidia to the compiler section. Nvidia has integrated the PGI compiler suite into their NVIDIA HPC SDK product. Nvidia are also contributing to the LLVM Flang project. • September 2020. Added a computer arithmetic and IEEE formats section. • June 2020. Updated the compiler entry with details of standard conformance. • April 2020. Updated the Fortran Forum entry. Damian Rouson has taken over as editor. • April 2020. Added an entry for Hewlett Packard Enterprise in the compilers section • April 2020. Updated the compiler section to change the status of the Oracle compiler. • April 2020. Added an entry in the links section to the ACM publication Fortran Forum. • March 2020. Updated the Lorenzo entry in the history section. • December 2019. Updated the compiler section to add details of the latest re- lease (7.0) of the Nag compiler, which now supports coarrays and submodules.
    [Show full text]
  • Fira Code: Monospaced Font with Programming Ligatures
    Personal Open source Business Explore Pricing Blog Support This repository Sign in Sign up tonsky / FiraCode Watch 282 Star 9,014 Fork 255 Code Issues 74 Pull requests 1 Projects 0 Wiki Pulse Graphs Monospaced font with programming ligatures 145 commits 1 branch 15 releases 32 contributors OFL-1.1 master New pull request Find file Clone or download lf- committed with tonsky Add mintty to the ligatures-unsupported list (#284) Latest commit d7dbc2d 16 days ago distr Version 1.203 (added `__`, closes #120) a month ago showcases Version 1.203 (added `__`, closes #120) a month ago .gitignore - Removed `!!!` `???` `;;;` `&&&` `|||` `=~` (closes #167) `~~~` `%%%` 3 months ago FiraCode.glyphs Version 1.203 (added `__`, closes #120) a month ago LICENSE version 0.6 a year ago README.md Add mintty to the ligatures-unsupported list (#284) 16 days ago gen_calt.clj Removed `/**` `**/` and disabled ligatures for `/*/` `*/*` sequences … 2 months ago release.sh removed Retina weight from webfonts 3 months ago README.md Fira Code: monospaced font with programming ligatures Problem Programmers use a lot of symbols, often encoded with several characters. For the human brain, sequences like -> , <= or := are single logical tokens, even if they take two or three characters on the screen. Your eye spends a non-zero amount of energy to scan, parse and join multiple characters into a single logical one. Ideally, all programming languages should be designed with full-fledged Unicode symbols for operators, but that’s not the case yet. Solution Download v1.203 · How to install · News & updates Fira Code is an extension of the Fira Mono font containing a set of ligatures for common programming multi-character combinations.
    [Show full text]
  • Python for Linguists a Gentle Introduction
    Programming in Python for Linguists A Gentle Introduction Dirk Hovy USC’s Information Sciences Institute 4676 Admiralty Way, Marina del Rey, CA 90292 [email protected] September 25, 2012 Contents 1 Introduction 2 2 Programming 2 3 Python 4 3.1 Installing Python . .4 3.1.1 Windows . .4 3.1.2 Mac/Linux . .5 3.2 Using Python . .5 3.2.1 Windows . .5 3.2.2 Mac/Linux . .6 4 Program 1: Word Statistics — Assignment, For-Loops, and IO 7 4.1 Assignment . .9 4.2 Loops . 12 4.3 Input/Output . 14 5 Program 2: Word Counts — Dictionaries and Control Structures 15 5.1 Control Structures . 18 6 Program 3: Affixes — More on Lists 20 7 Program 4: Split — Functions and Syntactic Sugar 26 8 Debugging 28 9 Further Reading 30 1 1 Introduction Few, if any, linguistics programs do require an introduction to programming. That is a shame. Pro- gramming is a useful tool for linguists, just as word processing or statistics. It can help you do things faster and at larger scale. When I started programming in my PhD, I realized how useful even a little bit of programming would have been for my masters in linguistics: it would have made many term papers a lot easier (at least the empirical part). I hope this tutorial can be that little bit of programming for you and make your life a lot easier! Most people think of professional hackers when they hear programming, but the level necessary to be useful for linguistic work does not require a special degree.
    [Show full text]
  • Infovox Ivox & Visiovoice
    Cover by Michele Patterson Masthead Publisher Robert L. Pritchett from MPN, LLC Editor-in-Chief Robert L. Pritchett Editor Mike Hubbartt Assistant Editor Harry (doc) Babad Consultant Ted Bade Advertising and Marketing Director Wayne Lefevre Web Master James Meister Public Relations and Merchandizing Mark Howson Contacts Webmaster at macCompanion dot com Feedback at macCompanion dot com Correspondence 1952 Thayer, Drive, Richland, WA 99352 USA 1-509-210-0217 1-888-684-2161 rpritchett at macCompanion dot com The Macintosh Professional Network Team Harry {doc} Babad Ted Bade Matt Brewer (MacFanatic) Jack Campbell (Guest Author) Ken Crockett (Apple News Now) Kale Feelhaver (AppleMacPunk) Dr. Eric Flescher Eddie Hargreaves Jonathan Hoyle III Mark Howson (The Mac Nurse) Mike Hubbartt Daphne Kalfon (I Love My Mac) Wayne Lefevre Daniel MacKenzie Chris Marshall (My Apple Stuff) Dom McAllister Derek Meier James Meister Michele Patterson David Phillips (Guest Author) Robert Pritchett Leland Scott Dennis Sellers (Macsimum News) Gene Steinberg (The Tech Night Owl) Rick Sutcliffe (The Northern Spy) Tim Verpoorten (Surfbits) Julie M. Willingham Application Service Provider for the macCompanion website: http://www.stephousehosting.com Thanks to Daniel Counsell of Realmac Software Development (http://www.realmacsoftware.com), who graced these pages and our website with newer rating stars. Our special thanks to all those who have allowed us to review their products! In addition, thanks to you, our readers, who make this effort possible. Please support
    [Show full text]
  • Mac Text Editor for Coding
    Mac Text Editor For Coding Sometimes Pomeranian Mikey probed her cartography allegorically, but idiomorphic Patrik depolymerizes despotically or stains ethnocentrically. Cereal Morten sick-out advisably while Bartel always overglazing his anticholinergic crunches pregnantly, he equilibrating so eath. Substantiated Tore usually exuviated some Greenwich or bumbles pedagogically. TextEdit The Built-in Text Editor of Mac OS X CityMac. 4 great editors for macOS for editing plain lazy and for coding. Brackets enables you! Many features that allows you could wish to become almost everything from an awesome nintendo switch to. Top 11 Code Editors for Software Developers Bit Blog. We know what you have specific id, so fast feedback on rails without even allow users. Even convert any one. Spaces and broad range of alternatives than simply putting your code after it! It was very good things for example. Great joy to. What may find in pakistan providing payment gateway security news, as close to any query or. How does Start Coding Programming for Beginners Learn Coding. It was significantly as either running on every developer you buy, as well from your html tools for writing for free to add or handling is. Code is free pattern available rate your favorite platform Linux Mac OSX and Windows Categories in power with TextEdit Text Editor Compare. How do I steer to code? Have to inflict pain on this plot drawn so depending on your writing source code. Text but it does not suitable for adjusting multiple computers users. The wheat free if paid text editors for the Mac iMore. After logging in free with google translate into full member of useful is a file in a comment is ideal environment, where their personal taste.
    [Show full text]
  • Cheminformatics and Mass Spectrometry Course
    Welcome! Mass Spectrometry meets Cheminformatics Tobias Kind and Julie Leary UC Davis Course 1: General Introduction Class website: CHE 241 - Spring 2008 - CRN 16583 Slides: http://fiehnlab.ucdavis.edu/staff/kind/Teaching/ PPT is hyperlinked – please change to Slide Show Mode 1 What is ChemInformatics? Chemistry Mathematics Statistics Informatics Chemometrics est. 1975 Cheminformatics est. 1998 2 Who uses Cheminformatics? All parts of chemistry heavily depend on cheminformatics. Life sciences, biochemistry, drug industries use cheminformatics. 20 years ago: 80% in lab – 20% in front of computer Now: 20% in lab - 70% in front of computer (*) Examples: • Organic chemistry – automated reaction planning, Beilstein search • Physical chemistry – modeling of structure properties (boiling points) • Inorganic chemistry – ligand bond interactions • Analytical chemistry – structure elucidation of small compounds • Biochemistry – protein/small molecule interaction networks (*) 10% fixing and installing new programs 3 PhD Motivation for Mass Spectrometry meets ChemInformatics To be a master of spectra you need to be a master of structures in the first place. 765 100 OH N NH O O 50 N 807 747 O 705 O N O HO O O O 676 723 604 265 353 395 455 513 538 636 0 260 310 360 410 460 510 560 610 660 710 760 810 (nist_msms) Vincristine Æ Complex MS data interpretations only possible with software Æ MS data obtained by hyphenated techniques (GC-MS, LC-MS) Æ Mass spectral database search and structure search routinely are used Æ Mass spectrometers deliver multidimensional
    [Show full text]
  • Introduction to Compucell3d
    Introduction to CompuCell3D Maciej Swat Biocomplexity Institute Indiana University Bloomington, IN 47405 USA Developing Multi-Scale, Multicell Developmental and Biomedical Simulations with CompuCell3D PASI Summer School Porto Alegre, Brazil July 8th-July 26th , 2013 IU Team: Susan Hester, Julio Belmonte, Abbas Shirinifard, Ryan Roper, Alin Comanescu, Benjamin Zaitlen, Randy Heiland, Dr. Dragos Amarie, Dr. Scott Gens, Dr. James A. Glazier, Dr. James Sluka, Dr. Sherry Clendenon, Dr. Mitja Hmeljak, Dr. Srividhya Jayaraman, Dr. Gilberto Thomas Support: EPA, NIH/NIGMS, NAKFI, Indiana University What will you learn during the workshop? 1. What is CompuCell3D? 2. Why use CompuCell3D? 3. Demo simulations 4. Glazier-Graner-Hogeweg (GGH) model – review 5. CompuCell3D architecture and terminology 6. XML 101. CC3DML-intro 7. Building Your First CompuCell3D simulation 8. Visualization – CompuCell Player 9. Python scripting in CompuCell3D 10.Building C++ CompuCell3D extension modules – for interested participants What Is CompuCell3D? 1. CompuCell3D is a modeling environment to build, test, run and visualize multiscale, multi-cell GGH-based simulations. 2. CompuCell3D has built-in stanadard scripting language (Python) that allows users to quite easily write extension modules that are essential for building sophisticated biological models. 3. CompuCell3D thus is NOT a hard-coded simulation of a specific biological system. 4. Running CompuCell3D simulations DOES NOT require recompilation. 5. CompuCell3D models described using CompuCell3D XML and Python script(s). 6. CompuCell3D platform is distributed with a GUI front end – CompuCell Player for 2- Dand 3-D visualization and simulation replay. 7. CompuCell3D provides a specialized model editor (Twedit) and initial condition generator (PifTracer). 8. CompuCell3D is a cross platform application that runs on Linux/Unix, Windows, Mac OSX.
    [Show full text]
  • Requirements for Web Developers and Web Commissioners in Ubiquitous
    Requirements for web developers and web commissioners in ubiquitous Web 2.0 design and development Deliverable 3.2 :: Public Keywords: web design and development, Web 2.0, accessibility, disabled web users, older web users Inclusive Future Internet Web Services Requirements for web developers and web commissioners in ubiquitous Web 2.0 design and development I2Web project (Grant no.: 257623) Table of Contents Glossary of abbreviations ........................................................................................................... 6 Executive Summary .................................................................................................................... 7 1 Introduction ...................................................................................................................... 12 1.1 Terminology ............................................................................................................. 13 2 Requirements for Web commissioners ............................................................................ 15 2.1 Introduction .............................................................................................................. 15 2.2 Previous work ........................................................................................................... 15 2.3 Method ..................................................................................................................... 17 2.3.1 Participants ..........................................................................................................
    [Show full text]
  • Von Tänen LP4 2021
    Nummer 2 - 2021 Heads up! If you see a before the page number, that means that* article can be enjoyed without knowing Swedish! Farbror von Tänen funderar Summer wishes from Jag är klar, mentalt alltså. Jag sitter verkligen varje år 4 von Tänen Staff i maj och känner att nu är klar, kan jag få vila snart? * Och ja, det får jag, om snart definieras som några Ordförande har ordet veckor. 6 Jag kommer inte sakna tentor, jag kommer inte sakna Summer puns! presentationer och jag kommer inte sakna att behöva 7 gå upp innan min inre klocka ringt. * Horoskop Å andra sidan, jag kommer sakna von Tänen, jag kom- mer sakna MH och jag kommer sakna alla er. 8 Men nu är jag klar. Klar med von Tänen, snart klar med Fråga Fontänen von Tänen denna termin och snart klar med min tid på LTH. Det 10 gick både snabbare och segare än jag förväntat mig. Programming Languages you Det tar alltid emot lite att lämna ett sammanhang, för * 12 didn't know existed att sedan ta steget mot nya äventyr, men i nästan alla fall så har det blivit bra sen också, om inte till och med En tidsresa genom ArkiFet bättre. 14 Apropå bättre, vi har en massa spännande saker som Pollen survival guide vi i redaktionen vill dela med oss av i detta sommar- * 16 späckade supernummer av von Tänen! Det är denna superhärliga redaktions sista nummer och vi kommer Midsommar! inte göra er besvikna! 17 Gör er redo för en Summer Summary och härliga texter Sommarquiz som kommer göra er superglada och ger er den energi 18 ni behöver för att kämpa er igenom de sista veckorna innan lovet <3 Rotten Småcitrus
    [Show full text]
  • By Peter Borg Tuppis.Com/Smultron
    Smultron by Peter Borg tuppis.com/smultron Introduction Smultron is a free text editor for Mac OS X Leopard 10.5 which is both easy to use and powerful. It is designed to neither confuse newcomers nor disappoint advanced users. It should work perfectly for a whole variety of needs; like web programming, script editing, making a to do list and so on. Smultron has all open documents in a list with beautiful Quick Look icons to your left just like e.g. iTunes so you can easily switch between many documents - you can also choose to display them as tabs if you prefer it that way. It is easy to program with Smultron as it colours the content in different colours depending on what the code does. And you also have many ways to search for words and line numbers to help finding the code you are looking for. You can also split the window in two to display two parts of the same document or to compare two different documents side by side. You can also preview HTML-files directly in Smultron and save snippets of text and insert them simply with a shortcut. And if you don’t want to be disturbed by other applications or the desktop you can let Smultron cover the whole screen to let you concentrate on your work. For the more advanced users Smultron can find all those system files that are normally hidden and it has authenticated open and saves for them. Smultron can also use regular expressions and it can run commands and scripts.
    [Show full text]
  • CERN Talk.Pdf
    Data Science in Industry Ellie Dobson Pivotal 1 Network intrusion detection networks in telecommunications What has industry ever done for us? Theory-driven approach Data-driven approach ‘start with the system and ‘start with the data and work towards the data’ work towards the system’ How can we" make it happen?! Value Reactive What will" happen?! Analytics! Predictive" Why did it" happen?! Analytics! Diagnostic" What" happened?! Analytics! Descriptive Analytics Complexity The value of data over time! drive automated ! low latency actions! ! Big Data! in response to ! events of interest ! Size make insights from a large historical dataset…! … and use them to make split- second decisions on real time data! Fast Data! Speed! year+! year! month! day! s! ms! µs! drive automated ! low latency actions! in response to ! events of interest ! predictive maintenance ! payment fraud !! formula 1 racing! ! Telemetry! Telemetry! Car setup! Telemetry! Car setup! Traffic! True positive rate True Telemetry! Car setup! Traffic! Weather! False positive rate! Big Data Complex Data! Operational Commercial &" Dark Data! Social Data! Data! Public Data! TOOLKIT! 4! Write Code for Big Data! 6! Show Results! In-Database! Hadoop! Visualization! 1! Find Data! 3! Run Code! • SQL! • Pig! • python-matplotlib! • GraphViz! • PL/Python! • Hive! • python-networkx! • Gephi! Platforms! Interfaces! • PL/Java! • Java! • D3.js! • R (ggplot2, lattice, • Greenplum DB! • pgAdminIII! • PL/R! • Spark! • Tableau! shiny)! • Pivotal HD! • psql! • PL/pgSQL! • Excel ! • Hadoop (other)!
    [Show full text]