Array Databases: Concepts, Standards, Implementations
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
So Many Date Formats: Which Should You Use? Kamlesh Patel, Jigar Patel, Dilip Patel, Vaishali Patel Rang Technologies Inc, Piscataway, NJ
Paper 2463-2017. So Many Date Formats: Which Should You Use? Kamlesh Patel, Jigar Patel, Dilip Patel, Vaishali Patel, Rang Technologies Inc, Piscataway, NJ. ABSTRACT: Nearly all data is associated with some date information. In many industries, a SAS® programmer comes across various date formats in data that can be of numeric or character type. An industry like pharmaceuticals requires that dates be shown in ISO 8601 formats due to industry standards. Many SAS programmers commonly use a limited number of date formats (like YYMMDD10. or DATE9.) with workarounds using scan or substring SAS functions to process date-related data. Another challenge for a programmer can be the source data, which can include either date-time or only date information. How should a programmer convert these dates to the target format? There are many existing date formats (like the E8601 series, the B8601 series, and so on) and informats available to convert the source date to the required date efficiently. In addition, there are some useful functions that we can explore for deriving timing-related variables. For these methods to be effective, one can use simple tricks to remember those formats. This poster is targeted to those who have a basic understanding of SAS dates. KEYWORDS: SAS, date, time, Date-Time, format, informat, ISO 8601, tips, Basic, Extended, Notation. APPLICATIONS: SAS v9.2. INTRODUCTION: Dates and times in SAS can be handled very efficiently if SAS programmers know more than the basics of the different SAS date formats and informats, along with the many date functions. Various formats are available in SAS to use as needed.
Array DBMS in Environmental Science
The 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 21-23 September, 2017, Bucharest, Romania. Array DBMS in Environmental Science: Satellite Sea Surface Height Data in the Cloud. Ramon Antonio Rodriges Zalipynis, National Research University Higher School of Economics, Moscow, Russia.

Abstract – Nowadays environmental science experiences tremendous growth of raster data: N-dimensional (N-d) arrays coming mainly from numeric simulation and Earth remote sensing. An array DBMS is a tool to streamline raster data processing. However, raster data are usually stored in files, not in databases. Moreover, numerous command line tools exist for processing raster files. This paper describes a distributed array DBMS under development that partially delegates raster data processing to such tools. Our DBMS offers a new N-d array data model to abstract from the files and the tools and processes data in a distributed fashion directly in their native file formats. As a case study, popular satellite altimetry data were used for the experiments carried [...]

[...] data model is designed to uniformly represent diverse raster data types and formats, take into account a distributed cluster environment, and be independent of the underlying raster file formats at the same time. Also, new distributed algorithms are proposed based on the model. Four modern raster data management trends are relevant to this paper: industrial raster data models, formal array models and algebras, in situ data processing algorithms, and raster (array) DBMS. A good survey on the algorithms is contained in [7]. A recent survey of existing array DBMS and similar systems is in [5].
ASN.1. Communication Between Heterogeneous Systems – Errata
ASN.1. Communication between heterogeneous systems – Errata

These errata concern the June 5, 2000 electronic version of the book "ASN.1. Communication between heterogeneous systems", by Olivier Dubuisson (© 2000), freely available at http://www.oss.com/asn1/resources/books-whitepapers-pubs/asn1-books.html#dubuisson.

N.B.: The first page number refers to the PDF electronic copy and is the number printed on each page, not the number at the bottom of the Acrobat Reader window. The page number in parentheses refers to the paper copy published by Morgan Kaufmann.

Page 9 (page 8 for the paper copy): last line, replace "(the number 0 low-weighted byte" with "(the number 0 low-weighted bit".
Page 19 (page 19 for the paper copy): on the paper copy, second bullet, replace "the Physical Layer marks up..." with "the Data Link Layer marks up..."; on the electronic copy, second bullet, replace "it is down to the Physical Layer to mark up..." with "it is down to the Data Link Layer to mark up...".
Page 25 (page 25 for the paper copy): second paragraph, replace "that all the data exchange" with "that all the data exchanged".
Page 32 (page 32 for the paper copy): last paragraph, replace "The order number has at least 12 digits" with "The order number has always 12 digits".
Page 34 (page 34 for the paper copy): first paragraph, replace "it will be computed again since..." with "it will be computed again because...".
Page 37 for the electronic copy only: in the definition of Quantity, replace "unites" with "units".
Page 104 (page 104 for the paper copy): in rule <34>, delete ", digits"; at the end of section 8.3.2, add the following paragraph: Symbols such as "{", "[", "[[", etc, are also lexical tokens of the ASN.1 notation.
Understanding JSON Schema Release 2020-12
Understanding JSON Schema, Release 2020-12
Michael Droettboom, et al., Space Telescope Science Institute, Sep 14, 2021

Contents
1 Conventions used in this book
  1.1 Language-specific notes
  1.2 Draft-specific notes
  1.3 Examples
2 What is a schema?
3 The basics
  3.1 Hello, World!
  3.2 The type keyword
  3.3 Declaring a JSON Schema
  3.4 Declaring a unique identifier
4 JSON Schema Reference
  4.1 Type-specific keywords
  4.2 string
    4.2.1 Length
    4.2.2 Regular Expressions
    4.2.3 Format
  4.3 Regular Expressions
    4.3.1 Example
  4.4 Numeric types
    4.4.1 integer
    4.4.2 number
    4.4.3 Multiples
    4.4.4 Range
  4.5 object
    4.5.1 Properties
An Array DBMS for Simulation Analysis and ML Models Predictions
SAVIME: An Array DBMS for Simulation Analysis and ML Models Predictions. Hermano L. S. Lustosa (1), Anderson C. Silva (1), Daniel N. R. da Silva (1), Patrick Valduriez (2), Fabio Porto (1). (1) National Laboratory for Scientific Computing, Rio de Janeiro, Brazil, {hermano, achaves, dramos, fporto}@lncc.br; (2) Inria, University of Montpellier, CNRS, LIRMM, France. Abstract. Limitations in current DBMSs prevent their wide adoption in scientific applications. In order to make them benefit from DBMS support, enabling declarative data analysis and visualization over scientific data, we present an in-memory array DBMS called SAVIME. In this work we describe the system SAVIME, along with its data model. Our preliminary evaluation shows how SAVIME, by using a simple storage definition language (SDL), can outperform the state-of-the-art array database system, SciDB, during the process of data ingestion. We also show that it is possible to use SAVIME as a storage alternative for a numerical solver without affecting its scalability, making it useful for modern ML-based applications. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems. Keywords: Scientific Data Management, Multidimensional Array, Machine Learning. 1. INTRODUCTION. Due to the increasing computational power of HPC environments, vast amounts of data are now generated in different research fields, and a relevant part of this data is best represented as array data. Geospatial and temporal data in climate modeling, astronomical images, medical imagery, multimedia and simulation data are naturally represented as multidimensional arrays. To analyze such data, DBMSs offer many advantages, like query languages, which ease data analysis and avoid the need for extensive coding/scripting, and a logical data view that isolates data from the applications that consume it.
The Gamma Operator for Big Data Summarization on an Array DBMS
JMLR: Workshop and Conference Proceedings 36:88–103, 2014, BIGMINE 2014. The Gamma Operator for Big Data Summarization on an Array DBMS. Carlos Ordonez, Yiqun Zhang, Wellington Cabrera, University of Houston, USA. Editors: Wei Fan, Albert Bifet, Qiang Yang and Philip Yu. Abstract: SciDB is a parallel array DBMS that provides multidimensional arrays, a query language and basic ACID properties. In this paper, we introduce a summarization matrix operator that computes sufficient statistics in one pass and in parallel on an array DBMS. Such sufficient statistics benefit a big family of statistical and machine learning models, including PCA, linear regression and variable selection. Experimental evaluation on a parallel cluster shows our matrix operator exhibits linear time complexity and linear speedup. Moreover, our operator is shown to be an order of magnitude faster than SciDB built-in operators, two orders of magnitude faster than SQL queries on a fast column DBMS and even faster than the R package when the data set fits in RAM. We show SciDB operators and the R package fail due to RAM limitations, whereas our operator does not. We also show PCA and linear regression computation is reduced to a few minutes for large data sets. On the other hand, a Gibbs sampler for variable selection can iterate much faster in the array DBMS than in R, exploiting the summarization matrix. Keywords: Array, Matrix, Linear Models, Summarization, Parallel. 1. Introduction. Row DBMSs remain the best technology for OLTP [Stonebraker et al. (2007)] and column DBMSs are becoming a competitor in OLAP (Business Intelligence) query processing [Stonebraker et al. (2005)] in large data warehouses [Stonebraker et al. (2010)].
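The excerpt does not spell out the summarization matrix, so the following is only a minimal sketch of the general idea of one-pass, mergeable sufficient statistics. It assumes the statistics are the row count n, the per-column sum L, and the sum of outer products Q; the function names `summarize`, `merge`, and `mean_and_variance` are hypothetical and are not SciDB operators.

```python
from typing import Iterable, List, Sequence, Tuple

# (n, L, Q): row count, per-column sums, sums of pairwise products.
Stats = Tuple[int, List[float], List[List[float]]]


def summarize(rows: Iterable[Sequence[float]], d: int) -> Stats:
    """Scan the rows once and accumulate n, L and Q for d columns."""
    n = 0
    L = [0.0] * d
    Q = [[0.0] * d for _ in range(d)]
    for x in rows:
        n += 1
        for i in range(d):
            L[i] += x[i]
            for j in range(d):
                Q[i][j] += x[i] * x[j]
    return n, L, Q


def merge(a: Stats, b: Stats) -> Stats:
    """Combine partial results, e.g. from two workers scanning disjoint chunks."""
    n = a[0] + b[0]
    L = [p + q for p, q in zip(a[1], b[1])]
    Q = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, L, Q


def mean_and_variance(stats: Stats) -> Tuple[List[float], List[float]]:
    """Derive per-column mean and population variance without rescanning data."""
    n, L, Q = stats
    if n == 0:
        raise ValueError("no rows were summarized")
    mean = [v / n for v in L]
    var = [Q[i][i] / n - mean[i] ** 2 for i in range(len(L))]
    return mean, var
```

Because the three quantities are plain sums, partial results from separate chunks can be added together, which is what makes a single parallel pass sufficient in this sketch.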
Representation of Date and Time Data Standard
REPRESENTATION OF DATE AND TIME DATA STANDARD Standard No.: EX000013.1 January 6, 2006 This standard has been produced through the Environmental Data Standards Council (EDSC). Representation of Date and Time Data Standard Std No.: EX000013.1 Foreword The Environmental Data Standards Council (EDSC) identifies, prioritizes and pursues the creation of data standards for those areas where information exchange standards will provide the most value in achieving environmental results. The Council involves Tribes and Tribal Nations, state and federal agencies in the development of the standards and then provides the draft materials for general review. Business groups, non-governmental organizations, and other interested parties may then provide input and comment for Council consideration and standard finalization. Standards are available at http://www.epa.gov/datastandards 1.0 INTRODUCTION This is a format standard which indicates how one displays a particular day within a Gregorian calendar month and specifies an instance of time in the day. Time is expressed in Coordinated Universal Time (UTC). UTC is the official time scale, maintained by the Bureau International des Poids et Mesures (BIPM), and the International Earth Rotation Service (IERS). Examples of the formats follow: a. Date only format When the need is for an expression only of a calendar date, then the complete representation shall be a single numeric data element comprising eight digits, where [YYYY] represents a calendar year, [MM] the ordinal number of a calendar month within the calendar year, and [DD] the ordinal number of a day within the calendar month. Extended format: YYYY-MM-DD EXAMPLE 1985-04-12 b. -
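As an illustration of the date-only representation described above, here is a minimal Python sketch. It assumes the eight-digit form is written YYYYMMDD and the extended form YYYY-MM-DD; the helper names are hypothetical, and the date-time format is not covered because the excerpt breaks off before defining it.

```python
import re
from datetime import date, datetime, timedelta, timezone

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
_EXTENDED = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def format_extended(d: date) -> str:
    """Render a calendar date in extended format, e.g. 1985-04-12."""
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"


def format_basic(d: date) -> str:
    """Render the same date as eight digits without separators."""
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"


def parse_extended(text: str) -> date:
    """Parse YYYY-MM-DD strictly; reject unpadded or impossible dates."""
    match = _EXTENDED.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD form: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # raises ValueError for e.g. month 13


def utc_calendar_date(moment: datetime) -> date:
    """Return the calendar date of a timezone-aware moment, expressed in UTC."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).date()
```

For example, `format_extended(date(1985, 4, 12))` yields `1985-04-12`, matching the example in the standard. A late-evening local time west of Greenwich can fall on the next calendar day once converted to UTC, which is why the conversion happens before the date is taken.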
Multidimensional Arrays for Analysing Geoscientific Data
International Journal of Geo-Information. Review: Multidimensional Arrays for Analysing Geoscientific Data. Meng Lu (1,2,*,†), Marius Appel (1) and Edzer Pebesma (1). (1) Institute for Geoinformatics, University of Muenster, 48149 Muenster, Germany; (2) Department of Physical Geography, Faculty of Geoscience, Utrecht University, 3584 CB Utrecht, The Netherlands. * Correspondence: Tel.: +31 648901629. † Current address: Vening Meinesz Building A, Princetonlaan 8A, 3584 CB Utrecht, The Netherlands. Received: 27 June 2018; Accepted: 30 July 2018; Published: 3 August 2018. Abstract: Geographic data is growing in size and variety, which calls for big data management tools and analysis methods. To efficiently integrate information from high dimensional data, this paper explicitly proposes array-based modeling. A large portion of Earth observations and model simulations are naturally arrays once digitalized. This paper discusses the challenges in using arrays such as the discretization of continuous spatiotemporal phenomena, irregular dimensions, regridding, high-dimensional data analysis, and large-scale data management. We define categories and applications of typical array operations, compare their implementation in open-source software, and demonstrate dimension reduction and array regridding in study cases using Landsat and MODIS imagery. It turns out that arrays are a convenient data structure for representing and analysing many spatiotemporal phenomena. Although the array model simplifies data organization, array properties like the meaning of grid cell values are rarely being made explicit in practice. Keywords: multidimensional arrays; geoscientific data; data analysis; spatiotemporal modeling. 1. Introduction. An array is a storage form for a sequence of objects of similar type.
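The two operations named in the abstract, dimension reduction and regridding, can be sketched on a small nested-list array. This is only an illustrative sketch and not the paper's implementation: the `[time][y][x]` layout, the block-mean regridding rule, and the function names are all assumptions.

```python
from statistics import mean
from typing import Callable, List, Sequence

Grid = List[List[float]]   # indexed [y][x]
Cube = List[Grid]          # indexed [time][y][x]


def reduce_time(cube: Cube, reducer: Callable[[Sequence[float]], float]) -> Grid:
    """Collapse the time dimension by applying `reducer` to each cell's series."""
    n_t, n_y, n_x = len(cube), len(cube[0]), len(cube[0][0])
    return [
        [reducer([cube[t][y][x] for t in range(n_t)]) for x in range(n_x)]
        for y in range(n_y)
    ]


def coarsen(grid: Grid, factor: int) -> Grid:
    """Regrid to a coarser resolution by averaging factor-by-factor blocks.

    Assumes both grid dimensions are exact multiples of `factor`.
    """
    n_y, n_x = len(grid), len(grid[0])
    if n_y % factor or n_x % factor:
        raise ValueError("grid size must be a multiple of factor")
    return [
        [
            mean(
                grid[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            for x in range(0, n_x, factor)
        ]
        for y in range(0, n_y, factor)
    ]
```

Reducing over time turns a three-dimensional array into a two-dimensional one, while coarsening keeps the number of dimensions and changes only the cell size.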
Flask-JSON Documentation Release 0.3.4
Flask-JSON Documentation, Release 0.3.4
Sergey Kozlov, Aug 23, 2019

Contents
1 Installation
2 Initialization
3 Basic usage
4 Examples
5 Creating JSON responses
  5.1 Jsonify HTTP errors
6 Encoding values
  6.1 Iterables
  6.2 Time values
  6.3 Translation strings
  6.4 Custom types
  6.5 Encoding order
7 Errors handing
8 JSONP support
9 Testing
10 Configuration
11 API
  11.1 Low-Level API
12 Flask-JSON Changelog
  12.1 0.3.4
  12.2 0.3.3
  12.3 0.3.2
  12.4 0.3.1
  12.5 0.3.0
  12.6 0.2.0
  12.7 0.1
  12.8 0.0.1
Index

Flask-JSON is a simple extension that adds better JSON support to Flask applications. It helps to handle JSON-based requests and provides the following features:
• json_response() and @as_json to generate JSON responses.
Web Coverage Service (WCS) V1.0.0
Open Geospatial Consortium Inc.
Date: 2005-10-14
Reference number of this OGC® Project Document: OGC 05-076
Version: 1.0.0 (Corrigendum)
Category: OGC® Implementation Specification
Editor: John D. Evans

Web Coverage Service (WCS), Version 1.0.0 (Corrigendum)

Copyright © 2005 Open Geospatial Consortium. All Rights Reserved. To obtain additional rights of use, visit http://www.opengeospatial.org/legal/

Document type: OpenGIS® Implementation Specification
Document subtype: Corrigendum
Document stage: Approved
Document language: English

Contents
1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 Conventions
  5.1 Symbols (and abbreviated terms)
  5.2 UML notation
  5.3 XML schema notation
6 Basic service elements
Wcs-Core-2.0.1
Open Geospatial Consortium Approval Date: 2012-07-10 Publication Date: 2012-07-12 External identifier of this OGC® document: http://www.opengis.net/doc/IS/wcs-core-2.0.1 Reference number of this Document: OGC 09-110r4 Version: 2.0.1 Category: OGC® Interface Standard Editor: Peter Baumann OGC® WCS 2.0 Interface Standard- Core: Corrigendum Copyright © 2012 Open Geospatial Consortium. To obtain additional rights of use, visit http://www.opengeospatial.org/legal/. Warning This document is an OGC Member approved corrigendum to existing OGC standard. This document is available on a royalty free, non-discriminatory basis. Recipients of this document are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation. Document type: OGC® Interface Standard Document subtype: Corrigendum Document stage: Approved for public release Document language: English OGC 09-110r4 License Agreement Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement. -
OLAP Cubes Ming-Nu Lee OLAP (Online Analytical Processing)
OLAP Cubes
Ming-Nu Lee

OLAP (Online Analytical Processing)
• Performs multidimensional analysis of business data
• Provides capability for complex calculations, trend analysis, and sophisticated modelling
• Foundation of many kinds of business applications

OLAP Cube
• Multidimensional database
• Method of storing data in a multidimensional form
• Optimized for data warehouse and OLAP apps
• Structure: data (measures) categorized by dimensions
• Dimension as a hierarchy, a set of parent-child relationships
• Not exactly a “cube”

Schema
1) Star Schema
• Every dimension is represented with only one dimension table
• Dimension table has attributes and is joined to the fact table using a foreign key
• Dimension tables are not joined to each other
• Dimension tables are not normalized
• Easy to understand and provides optimal disk usage
2) Snowflake Schema
• Dimension tables are normalized
• Uses smaller disk space
• Easier to implement
• Query performance reduced
• Maintenance increased

Dimension Members
• Identifies a data item’s position and description within a dimension
• Two types of dimension members
1) Detailed members
• Lowest level of members
• Has no child members
• Stores values at member intersections, which are stored as fact data
2) Aggregate members
• Parent member that is the total or sum of child members
• A level above other members

Fact Data
• Data values described by a company’s business activities
• Exist at member intersection points
• Aggregation of transactions integrated from an RDBMS, or the result of a hierarchy or cube formula calculation
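The relationship between fact data, detailed members, and aggregate members can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the sample facts, the city-to-country hierarchy, and the function names `aggregate` and `roll_up` are all invented for the example and do not come from any OLAP product.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple

# Fact data: one measure ("sales") at the intersection of detailed members.
FACTS: List[Dict[str, object]] = [
    {"city": "Berlin", "product": "bikes", "sales": 10.0},
    {"city": "Munich", "product": "bikes", "sales": 5.0},
    {"city": "Paris", "product": "bikes", "sales": 7.0},
    {"city": "Berlin", "product": "helmets", "sales": 2.0},
]

# Dimension hierarchy: each detailed member (city) has one parent (country).
CITY_TO_COUNTRY: Mapping[str, str] = {
    "Berlin": "Germany",
    "Munich": "Germany",
    "Paris": "France",
}


def aggregate(facts, dims: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """Sum the measure over every combination of the chosen dimensions."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def roll_up(facts, dim: str, parent_of: Mapping[str, str], measure: str) -> Dict[str, float]:
    """Compute aggregate members: each parent is the sum of its child members."""
    totals: Dict[str, float] = defaultdict(float)
    for row in facts:
        totals[parent_of[row[dim]]] += row[measure]
    return dict(totals)
```

With the sample data, `roll_up(FACTS, "city", CITY_TO_COUNTRY, "sales")` gives Germany the sum of Berlin and Munich, which is the "parent member that is the total or sum of child members" idea from the slides.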