Apache Derby Performance


Olav Sandstå, Dyre Tjeldvoll, Knut Anders Hatlen
Database Technology Group, Sun Microsystems

Overview
• Derby Architecture
• Performance Evaluation of Derby
• Performance Tips
• Comparing Derby, MySQL and PostgreSQL

Derby Architecture: Embedded
[Diagram: the application and Derby run in the same Java Virtual Machine. The application calls JDBC; the SQL layer handles parsing of SQL, compiling, optimization, byte-code generation and execution; the access layer provides access paths and indexes; the storage layer manages the database buffer and the log and data files.]
• Asynchronous (checkpoint) write of data
• Synchronous write of log records
• Synchronous read of data

Derby Architecture: Client-Server
[Diagram: the application runs in its own Java Virtual Machine and uses the JDBC client driver to talk to the Network Server, a separate Java Virtual Machine containing the JDBC, SQL, access and storage layers, the database buffer, the log and the data files.]

Performance Evaluation of Derby
Overview:
• Evaluation of disk and file system configurations:
  > Disk write cache
  > Database and transaction log on separate disks
• Database configurations:
  > Size of database buffer
• Comparing Embedded and Client-Server

What Is “Performance”?
• How to measure database performance?
  > Throughput
  > Response time
  > Scalability
• Which configuration?
  > Out-of-the-box configuration
  > Carefully tuned
• How to compare database systems with different properties and different tuning possibilities?

Test Configuration
Load clients:
1. TPC-B like load:
  > 3 UPDATEs
  > 1 SELECT
  > 1 INSERT
2. Single-record SELECT:
  > Select of one record on the primary key
Test platform:
• Sun JVM 1.5.0
• Solaris 10 x86
• 2 x 2.4 GHz AMD Opteron
• 2 GB RAM
• 2 SCSI disks
• UFS file system

Effect of Write Cache on Disk
[Chart: TPC-B like load, throughput with and without the disk write cache.]
WARNING: The write cache reduces the probability of successful recovery after a power failure.

Separate Data and Log Devices
Log device:
  > Sequential write of the transaction log
  > Synchronous as part of commit
  > Group commit
Data device:
  > Data in the database buffer is regularly written to disk as part of checkpoints
  > Data is read from disk on demand
Tip: use separate disks for the data and log devices.

Database Buffer
• Cache of frequently used data pages in memory
• A cache miss leads to a read from disk (or from the disk cache)
• Size:
  > default 4 MB
  > derby.storage.pageCacheSize
[Chart: example with a single-record SELECT load.]
Performance tip:
• Increase the size of the database buffer to keep frequently accessed data in memory.

Comparing Embedded and Client-Server
• TPC-B like load:
[Chart: throughput, embedded vs. client-server.]

Comparing Embedded and Client-Server
• Single-record SELECT:
[Chart: throughput, embedded vs. client-server.]

Comparing Embedded and Client-Server: CPU Usage
[Chart: user and system CPU usage per transaction (ms) for TPC-B and single-record SELECT, embedded vs. client-server.]
• The Network Server adds 30%-50% to the CPU usage
• System CPU usage:
  > Message sending and receiving
• User CPU usage:
  > Message parsing
  > Character set conversions

Performance Tips

Performance Tips 1: Use Prepared Statements
[Chart: user and system CPU usage per transaction (ms), PreparedStatement vs. Statement.]
• Compilation of SQL statements is expensive:
  > Derby generates Java byte code and loads the generated classes
• Prepared statements eliminate this cost
Tip:
• USE prepared statements
• and REUSE them (see the sketch below)
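
A minimal JDBC sketch of this tip follows. It assumes derby.jar is on the classpath; the database name sampledb and the ACCOUNTS table are invented for the example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class PreparedStatementTip {
        public static void main(String[] args) throws Exception {
            // Embedded Derby database; "sampledb" is created on first use.
            Connection conn = DriverManager.getConnection("jdbc:derby:sampledb;create=true");

            // Example table; this fails if it already exists from an earlier run.
            try (Statement ddl = conn.createStatement()) {
                ddl.executeUpdate("CREATE TABLE accounts (id INT PRIMARY KEY, balance INT)");
            }

            // Compile the INSERT once: Derby generates and loads byte code here ...
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO accounts (id, balance) VALUES (?, ?)")) {
                // ... then reuse the compiled statement, changing only the parameters.
                for (int id = 0; id < 1000; id++) {
                    insert.setInt(1, id);
                    insert.setInt(2, 100);
                    insert.executeUpdate();
                }
            }

            conn.close();
        }
    }

Preparing the statement once and reusing it means Derby compiles the SQL, generates byte code and loads the generated class a single time instead of once per execution.
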
Performance Tips 2: Avoid Table Scans
• Use indexes to optimize much-used access paths:
  > CREATE INDEX ...
• Check the query plan:
  > derby.language.logQueryPlan=true
• Use RunTimeStatistics:
  > SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)
  > SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()
Tip:
• Use indexes
• Use Derby's tools to understand the query execution (see the sketch below)
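
As a sketch of these tips, the snippet below creates an index and inspects the chosen access path through runtime statistics. It reuses the sampledb database and ACCOUNTS table from the previous sketch; the index name and query are made up, and derby.language.logQueryPlan is set as a system property here, though it can equally go in derby.properties.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IndexAndQueryPlanTip {
        public static void main(String[] args) throws Exception {
            // Log query plans to derby.log; must be set before the engine boots.
            System.setProperty("derby.language.logQueryPlan", "true");

            Connection conn = DriverManager.getConnection("jdbc:derby:sampledb");

            try (Statement stmt = conn.createStatement()) {
                // Index for a much-used access path, so lookups can avoid a table scan.
                stmt.executeUpdate("CREATE INDEX accounts_balance_idx ON accounts(balance)");

                // Collect runtime statistics for the statements that follow.
                stmt.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");

                try (ResultSet rs = stmt.executeQuery(
                        "SELECT id FROM accounts WHERE balance > 500")) {
                    while (rs.next()) {
                        System.out.println(rs.getInt(1));
                    }
                }

                // Print the statistics, including the access path, for the last statement.
                try (ResultSet stats = stmt.executeQuery(
                        "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()")) {
                    while (stats.next()) {
                        System.out.println(stats.getString(1));
                    }
                }
            }

            conn.close();
        }
    }

The statistics text reports the access path the optimizer chose, which is how you verify that a query really uses an index instead of scanning the table.
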
Comparing the Performance of MySQL, PostgreSQL and Derby

Performance Evaluation: MySQL, PostgreSQL and Derby
Evaluated performance of:
• MySQL/InnoDB (version 5.0.10)
• PostgreSQL (version 8.0.3)
• Derby Embedded (version 10.1.1.0)
• Derby Client-Server

Database Configurations
Configurations:
• “Out-of-box” performance
• No tuning, except:
  > size of database buffer
  > database and transaction log on separate disks
➔ Not a benchmark
Databases:
1. Main-memory database:
  > 10 MB user data
  > 64 MB database buffer
2. Disk database:
  > 10 GB user data
  > 64 MB database buffer
Load:
  > 1-100 concurrent clients

Throughput: TPC-B like load
[Charts: throughput for the main-memory database (10 MB) and the disk-based database (10 GB).]

Throughput: Single-record SELECT
[Charts: throughput for the main-memory database (10 MB) and the disk-based database (10 GB).]

Observations
• Derby outperforms MySQL on disk-based databases
  > Derby has 100% higher throughput than MySQL
• MySQL performs better on small main-memory databases
  > Update-intensive load: Derby has 20-50% lower throughput
  > Read-intensive load: Derby has 50% lower throughput
• PostgreSQL performs best on read-only databases, and has the lowest throughput on update-intensive databases
Why?

CPU Usage: Main-Memory Database
[Charts: user and system CPU time per transaction (ms) under the TPC-B and single-record SELECT loads for Derby embedded, Derby client/server, MySQL and PostgreSQL.]

CPU Usage: Disk-Based Database
[Charts: user and system CPU time per transaction (ms) under the TPC-B and single-record SELECT loads for Derby embedded, Derby client/server, MySQL and PostgreSQL.]

Network I/O
[Charts: incoming and outgoing packets per transaction under the TPC-B and single-record SELECT loads for Derby client/server, MySQL and PostgreSQL.]
Observation:
• Derby has two extra round-trips for SELECT operations

Disk I/O: Main-Memory Database (TPC-B)
[Charts: write operations per transaction and write volume (KB per transaction), split into data and log, for Derby, MySQL and PostgreSQL.]

Disk I/O: Disk-Based Database (TPC-B)
[Charts: I/O operations per transaction and data volume (KB per transaction), split into log, data writes and data reads, for Derby, MySQL and PostgreSQL.]

Disk I/O: Disk-Based Database (SELECT)
[Charts: I/O operations per transaction and data volume (KB per transaction), split into data writes and data reads, for Derby, MySQL and PostgreSQL.]

Conclusions: Resource Usage
• MySQL performs better than Derby when:
  > The database is small and fits in the database buffer
  > Throughput becomes CPU-bound
  > Derby uses more CPU and sends more messages over the network
• Derby performs better than MySQL when:
  > The database is large and does not fit in the database buffer
  > Throughput becomes I/O-bound
• PostgreSQL performs best with a read-only load
  > An update-intensive load results in much disk I/O

Performance Improvement Activities
CPU usage:
• High CPU usage during initialization of the database buffer (DONE)
• Reduce CPU usage in the network server (message parsing and generation)
• Improve use of hash tables and reduce lock contention
Network I/O:
• Reduce the number of messages sent and received
Disk I/O:
• Improve fairness of scheduling of I/O requests (DONE)
• Allow concurrent read operations on data files
Performance tip:
• The next version of Derby will have even better performance

Summary
• Disk write cache:
  > Improves performance, but...
  > WARNING: durability and the probability of successful recovery after a power failure are reduced
• Data and log devices:
  > Keep the transaction log and the database on separate devices
• Database buffer:
  > Keep frequently accessed data in the database buffer
• Client-Server versus Embedded:
  > Throughput: down by 5% (TPC-B) to 30% (SELECT)
• Use:
  > Prepared statements and indexes

Questions?
Olav Sandstå, Dyre Tjeldvoll, Knut Anders Hatlen
[email protected], [email protected], [email protected]
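
As a closing illustration of the buffer and log-device tips in the Summary, here is a hedged configuration sketch. The cache size, paths, database name tunedb and port are example values; it assumes derby.jar on the classpath and, for the last connection, a Network Server already started in the same directory on the default port 1527 with derbyclient.jar available.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ConfigurationSketch {
        public static void main(String[] args) throws Exception {
            // The database buffer is sized in pages, not bytes; the default of
            // 1000 pages is about 4 MB with the default 4 KB page size.
            // Must be set before the Derby engine boots.
            System.setProperty("derby.storage.pageCacheSize", "10000");

            // Create the database with its transaction log on a separate disk.
            // "/disk2/derbylog" is an example path; logDevice is honored when the
            // database is created.
            Connection embedded = DriverManager.getConnection(
                    "jdbc:derby:tunedb;create=true;logDevice=/disk2/derbylog");
            embedded.close();

            // The same database accessed in client/server mode through the Network Server.
            Connection client = DriverManager.getConnection(
                    "jdbc:derby://localhost:1527/tunedb");
            client.close();
        }
    }

There is no programmatic switch for the remaining tip: whether the disk write cache is enabled is an operating-system or controller setting, and as the Summary warns, enabling it trades durability for speed.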