Oracle PaaS and IaaS Universal Credits Service Descriptions

Effective Date: 10-September-2021
Oracle UCM 091021 Page 1 of 202

Table of Contents

Metrics
Oracle PaaS and IaaS Universal Credit
  1. AVAILABLE SERVICES
    a. Eligible Oracle PaaS Cloud Services
    b. Eligible Oracle IaaS Cloud Services
    c. Additional Services
    d. Always Free Cloud Services
      Always Free Cloud Services
  2. ACTIVATION USAGE AND BILLING
    a. Introduction
      i. Annual Universal Credit
        Overage
        Replenishment of Account at End of Services Period
        Additional Services
      ii. Monthly Universal Credit (subject to Oracle approval)
        Overage
        Orders Placed via a Partner
        Replenishment of Account at End of Services Period
      iii. Pay as You Go
      iv. Funded Allocation Model
        Overage
        Additional Services
        Replenishment of Account at End of Services Period
  3. INCLUDED SERVICES
    i. Developer Cloud Service
    ii. Oracle Identity Foundation Cloud Service
    b. Additional Licenses and Oracle Linux Technical Support
    c. Oracle Cloud Infrastructure Data Catalog
    d. Oracle Cloud Infrastructure Data Transfer Disk
      Your Obligations/Responsibilities and Project Assumptions
      Your Obligations/Responsibilities
      Project Assumptions
      Export
    Oracle Cloud Infrastructure - Application Migration
    f. Oracle Cloud Infrastructure Console
    g. Oracle Cloud Infrastructure Cloud Shell
      Access and Usage
  4. SERVICES AVAILABLE VIA THE ORACLE CLOUD MARKETPLACE
    a. Oracle Cloud Services Delivered via the Oracle Cloud Marketplace
    b. Third Party Products Available via the Oracle Cloud Marketplace
Oracle PaaS and IaaS Cloud Services categories
  Oracle Analytics Cloud Services
    Description
    Customer Responsibilities
    Service Activation, Measurement and Usage
  Oracle Application Development Cloud Services
    Descriptions
    Service Activation, Measurement and Usage
    Customer Responsibilities
    BYOL Required Licenses
  Oracle Content Management Cloud Services
    Descriptions
    Service Activation, Measurement and Usage
    Third Party Web Sites, Platforms and Services for Oracle Web Center
    Customer Responsibilities
    BYOL Required Licenses
  Oracle Data Integration Cloud Services
    Description
    Service Activation, Measurement and Usage
    Third Party Web Sites, Platforms and Services
    Customer Responsibilities
  Oracle Data Management Cloud Services
    Description
    Service Activation, Measurement and Usage
  Oracle Enterprise Integration Cloud Services
    Description
    Service Activation, Measurement and Usage
    Third Party Web Sites, Platforms and Services
    BYOL Required Licenses
  Oracle Management Cloud Services
    Description
    Service Activation, Measurement and Usage
  Oracle Security and Identity Cloud Services
    Description
    Usage Limits
    Service Activation, Measurement and Usage
    Third Party Web Sites, Platforms and Services
    Customer Responsibilities
  Oracle Compute Cloud Services
    Descriptions
    Service Activation, Measurement and Usage
    Operating System
  Oracle Network Cloud Services
    Descriptions
    Your Obligations
  Oracle Cloud Infrastructure Edge Services
    Your Obligations
    Service Activation, Measurement and Usage
  Oracle Storage Cloud Services
    Description
    Service Activation, Measurement and Usage
  Oracle Data and AI Cloud Services
    Description
    Service Activation, Measurement and Usage
    Third Party Web Sites, Platforms and Services
    Customer Responsibilities
  Not Discount Eligible Cloud Services
    Description
    Service Activation, Measurement and Usage
  Oracle Cloud Infrastructure – Oracle Roving Edge Infrastructure
    Description
    Minimum Services Period, Service Activation, Measurement and Usage
  Optional Subscription Cloud Services to Use with Universal Credits
    Metrics
    Description
    Service Activation, Measurement and Usage
  Free Oracle Cloud Promotion
    Description
    Oracle Cloud Policies and Pillar Documentation
  Free Oracle Cloud Promotion - Universal Credits - Startup Accelerator
PARTS RETIRED AS OF 6/1/18
  Oracle PaaS and IaaS Universal Credit for North America
    Applicable Part # B88640
    Eligible Oracle PaaS and IaaS Cloud Services
    Oracle Cloud Policies and Pillar Documentation
    Data Center Selection
    Foundation Services
    Activation, Usage and Billing
    CREDIT PERIOD TYPES
    1. Monthly Universal Credit
      Overage
    2. Pay as You Go
    Orders Placed via a Partner
    Replenishment of Account at End of Services Period
    Bring Your Own License (“BYOL”)
  Oracle Analytics Universal Credits for North America
    Part # B88643
    Eligible Oracle PaaS and IaaS Cloud Services
    Oracle Cloud Policies and Pillar Documentation
    Data Center Selection
    Foundation Services
    Activation, Usage and Billing
    CREDIT PERIOD TYPES
    1. Monthly Universal Credit
      Overage
    2. Pay as You Go
    Orders Placed via a Partner
    Replenishment of Account at End of Services Period
    Bring Your Own License (“BYOL”)
    Overage
RETIRED SKUs
Appendix A
Appendix B

Metrics

1,000,000 API Calls: is defined as 1,000,000 API calls or notifications (or combination thereof) incoming from a client to the Oracle Cloud Infrastructure API Gateway Service. Billing for partial 1,000,000 API calls will be prorated.
1,000,000 Calls per Month: is defined as 1,000,000 API calls or notifications consumed by any application built on the Oracle Cloud Service during a month.
10,000 Audit Records Per Target Per Month: is defined as 10,000 database audit records collected from a specific database target by the Oracle Cloud Service during a month.
1,000 Emails Sent: is defined as 1,000 emails that are accepted by the Email Delivery Cloud Service to receive and parse or to deliver to the end recipient in the billing period, where an email is defined as an electronic mail message, counted on a per recipient basis. A single email with 10 different recipients would be counted as 10 emails (e.g., 140,000 emails accepted, each with 2 different recipients would be charged 280 x $0.085 = $23.80). For the purposes of Oracle Cloud Infrastructure - Notifications - Email Delivery Cloud Service, each 64KB portion of delivered data is billed as 1 email. For the purposes of Oracle Cloud Infrastructure – Notifications - Email Delivery Cloud Service, each 2MB portion of delivered data is billed as 1 email. The maximum message size of 10MB will be billed as 5 emails (e.g., 140,000 emails accepted at 10MB size, each with 2 different recipients would be charged 280 x $0.085 x 5 = $119.00).
100 Entities Per Hour: is defined as 100 entities where each entity refers to a technical asset being managed or monitored, such as a server, database, or application that resides in the cloud and/or on-premises, during a one hour period. Examples of entities include, but are not limited to: Host, Docker Container, SQL Server instance, MySQL instance, Oracle Database instance, WebLogic Server, Tomcat, Oracle Traffic Director Instance, custom created entity, etc. You have the ability to extend existing pre-defined entities and create Your own entirely custom entities. In extending pre-defined entities, a maximum of five (5) additional numeric time series is allowed. For custom entities, a total of 40 numeric time series are allowed (a numeric time series is a measurement of time associated with an entity, such as response time, transactions per second, CPU %, etc.). For the purposes of counting certain entity types, a conversion factor will be applied: One database Oracle Compute Unit (OCPU) will count as 1 entity. One database processor will count as 2 entities. One Application Performance Monitoring Agent (an “APM Agent”) will count as 15 entities. An APM Agent is defined as the data collector on a target application server being monitored, whether in the cloud or on-premises.
1,000 Events Per Hour: is defined as 1,000 events where an event is one distributed tracing span. A distributed tracing span describes the time it takes to complete an individual unit of work in the distributed system. Each distributed tracing span encapsulates an operation name, context information, a start and finish timestamp, a set of key value tags that can be used for annotation and key value logs that can be used to capture messages and debug information related to the span.
100,000 Events Per Hour: is defined as 100,000 events where an event is one distributed tracing span. A distributed tracing span describes the time it takes to complete an individual unit of work in the distributed system. Each distributed tracing span encapsulates an operation name, context information, a start and finish timestamp, a set of key value tags that can be used for annotation and key value logs that can be used to capture messages and debug information related to the span.
1,000,000 Function Invocations: is defined as 1,000,000 function invocations, where a function invocation is defined as a request received from a client to execute a single function. Oracle will charge You for the number of 1,000,000 invocation quantities used in a month. Billing for partial 1,000,000 invocation quantities will be prorated.
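
The two worked examples in the 1,000 Emails Sent definition above can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, the $0.085 rate per 1,000 emails is taken from the examples themselves rather than from a price list, the 2MB portion size is used because it matches the stated "10MB is billed as 5 emails" rule, and scaling partial thousands linearly is an assumption the definition does not state.

```python
from decimal import Decimal
from math import ceil

MB = 1024 * 1024  # bytes per megabyte (only the size ratio matters here)


def billable_emails(accepted, recipients_each, size_bytes=0, portion_bytes=2 * MB):
    """Count emails per recipient, with one billed email per size portion."""
    # A message smaller than one portion is still billed as one email.
    portions = max(1, ceil(size_bytes / portion_bytes))
    return accepted * recipients_each * portions


def charge(emails, rate_per_1000=Decimal("0.085")):
    """Dollar charge; linear scaling of partial thousands is an assumption."""
    return (Decimal(emails) / 1000 * rate_per_1000).quantize(Decimal("0.01"))


# 140,000 accepted emails, each with 2 recipients.
small = billable_emails(140_000, 2)
# The same traffic at the 10MB maximum message size.
large = billable_emails(140_000, 2, size_bytes=10 * MB)

print(small, charge(small))  # 280000 23.80
print(large, charge(large))  # 1400000 119.00
```

Both results agree with the figures quoted in the definition ($23.80 and $119.00).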
10,000 Gigabyte Memory-Seconds: is defined as 10,000 gigabyte memory-seconds, where a gigabyte memory-second is defined as the amount of RAM (GB) allocated to a function during its execution (S). Oracle will charge You for the number of 10,000 GB-S quantities used by all functions in a month. Billing for partial 10,000 GB-S quantities will be prorated.
1,000,000 Incoming Requests Per Month: is defined as a collection of 1,000,000 page hits over HTTP/S incoming from a client on the internet or CDN to the Web Application Firewall.
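
The two function metrics above (1,000,000 Function Invocations and 10,000 Gigabyte Memory-Seconds) are both billed in prorated quantities, which can be sketched as follows. The helper names and the sample workload are hypothetical, every invocation is assumed to have the same memory allocation and duration, and no price is applied because the definitions do not give one.

```python
def gb_seconds(memory_gb, duration_s, invocations):
    """GB-seconds used: allocated RAM (GB) times execution time (s), summed
    over all invocations (assumed identical here for simplicity)."""
    return memory_gb * duration_s * invocations


def prorated_units(quantity, unit_size):
    """Billable quantities; a partial unit counts fractionally (prorated)."""
    return quantity / unit_size


# Hypothetical month: 2,500,000 invocations, 0.5 GB allocated, 2 s each.
invocations = 2_500_000
usage_gbs = gb_seconds(0.5, 2.0, invocations)

invocation_units = prorated_units(invocations, 1_000_000)
memory_units = prorated_units(usage_gbs, 10_000)

print(invocation_units)  # 2.5
print(memory_units)      # 250.0
```

Under these assumptions the month would be billed as 2.5 invocation quantities and 250 GB-S quantities, each multiplied by whatever rate applies to the order.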