By Ryan Betts, CTO of VoltDB
Contrary to rumblings that Structured Query Language (SQL) is dying – or even dead – it continues to help modern databases handle both large-volume analytic workloads and high-performance transaction processing workloads. SQL is a proven winner that has dominated for several decades and continues to stay on top of its game – in fact, companies and organizations like Google, Facebook, Cloudera and Apache are making aggressive investments in it. These organizations choose SQL because they know it can easily manage user demands for improved interactivity with data, integrate with standard tooling interfaces, fit existing user training and knowledge, and deliver on meaningful query optimization.

Once a technology becomes as dominant as SQL, the reasons for its ascendency are sometimes forgotten. SQL wins because of a unique combination of strengths:

• SQL enables increased interaction with data and allows a broad set of questions to be asked against a single database design. That’s key since data that is not interactive is essentially useless, and increased interactions lead to new insight, new questions and more meaningful future interactions.
• SQL is standardized, allowing users to apply their knowledge across systems and providing support for third-party add-ons and tools.
• SQL scales, and is versatile and proven, solving problems ranging from fast write-oriented transactions to scan-intensive deep analytics.
• SQL is orthogonal to data representation and storage. Some SQL systems support JSON and other structured object formats with better performance and more features than NoSQL implementations.

Although NoSQL has garnered a lot of attention of late, SQL continues to win in the marketplace and continues to earn investment and adoption throughout the big data problem space.

The term “NoSQL” is ambiguous, but for this discussion, I use Dr. Rick Cattell’s NoSQL definition to mean “systems that provide simple operations like key/value storage or simple records and indexes, and that focus on horizontal scalability for those simple operations.”

It’s clear that the many new databases available are not all alike – and recognizing how the DNA behind each helps and hinders problem solvers is the key to success. NoSQL’s key features make it more appropriate for use in specific problem sets. For example, graph databases are better suited for those situations where data is organized by relationships vs. by row or document, and specialized text search systems should be considered appropriate in situations requiring real-time search as users enter terms.

I’m going to draw out the important benefits and differentiation of SQL systems vs. simple key/value and JSON object stores that do not innovate beyond storage format and scalability.

SQL ENABLES INTERACTION. SQL is a declarative query language. Users state what they want (e.g., display the geographies of top customers during the month of March for the prior five years) and the database internally assembles an algorithm and extracts the requested results. By contrast, the NoSQL programming innovation MapReduce is a procedural query technique. MapReduce not only requires the user to know what they want, but to also state how to produce the answer.

This might sound like an uninteresting technical difference, but it is critical for two reasons: First, declarative SQL queries are much easier to build via graphical tooling and point-and-click report builders. This opens up database querying to analysts, operators, managers and others with core competencies outside of software programming. Second, abstracting what from how allows the database engine to use internal information to select the most efficient algorithm. Change the physical layout or indexing of the database and an optimal algorithm will still be computed. In a procedural system, a programmer needs to revisit and reprogram the original how. This is expensive and error-prone.

The marketplace understands this critical difference. In 2010 Google announced a SQL implementation to complement MapReduce, driven by internal user demand. More recently, Facebook released Presto, a SQL implementation to query its petabyte HDFS clusters. According to Facebook: “As our warehouse grew to petabyte scale and our needs evolved, it became clear that we needed an interactive system optimized for low query latency.” Furthermore, Cloudera is building Impala, another SQL implementation on top of HDFS. All of these are advances over Hive, a long-standing and broadly adopted SQL façade for Hadoop.

SQL IS STANDARDIZED. Although vendors sometimes specialize and introduce dialects to their SQL interfaces, the core of SQL is well standardized and additional specifications, such as ODBC and JDBC, provide broadly available stable interfaces to SQL stores. This enables an ecosystem of management and operator tools to help design, monitor, inspect, explore and build applications on top of SQL systems.

SQL users and programmers can therefore reuse their API and UI knowledge across multiple backend systems, reducing application development time. Standardization also allows declarative third-party Extract, Transform, Load (ETL) tools that enable enterprises to flow data between databases and across systems.

SQL SCALES. It is absolutely false to assume SQL must be sacrificed to gain scalability. As noted, Facebook created a SQL interface to query petabytes of data. SQL is equally effective at running blazingly fast ACID transactions. The abstraction that SQL provides from the storage and indexing of data allows uniform use across problems and data set sizes, allowing SQL to run efficiently across clustered replicated data stores.
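The separation of what from how, and the abstraction from storage and indexing, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely for convenience; the article does not mention SQLite, and the orders table, its columns and the index name are hypothetical. The declarative query text never changes, yet the engine should pick a different access path once an index exists, while the procedural version fixes its own scan-and-filter algorithm in application code.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("acme", "east", 120.0),
        ("acme", "east", 80.0),
        ("bolt", "west", 50.0),
        ("core", "west", 300.0),
    ],
)

# Declarative: state what is wanted; the engine chooses the algorithm.
QUERY = "SELECT SUM(amount) FROM orders WHERE region = ?"


def declarative_total(region):
    """Return the summed amount for a region using the SQL query."""
    row = conn.execute(QUERY, (region,)).fetchone()
    return row[0] or 0.0  # SUM over zero rows yields NULL


def procedural_total(region):
    """Return the same total, but spell out how: scan, filter, add."""
    total = 0.0
    for _customer, row_region, amount in conn.execute("SELECT * FROM orders"):
        if row_region == region:
            total += amount
    return total


def plan(sql, params):
    """Return the engine's chosen plan as text (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(r[3] for r in rows)


# Same query text before and after a physical design change.
plan_before = plan(QUERY, ("west",))
conn.execute("CREATE INDEX idx_orders_region ON orders (region)")
plan_after = plan(QUERY, ("west",))

print(plan_before)
print(plan_after)
```

Printing the two plans shows the planner's choice in each case; the exact wording varies by SQLite version. The procedural function still walks every row after the index is added, and it would have to be rewritten by hand to benefit from it.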
Using SQL as an interface is independent from building a cloud, scale or high availability (HA) system, and there is nothing inherent in SQL that prohibits or restricts fault tolerance, high availability and replication. In fact, all modern SQL systems support cloud-friendly horizontal scalability, replication and fault tolerance.

SQL SUPPORTS JSON. Several years ago many SQL systems added XML document support. Now, as JSON becomes a popular data interchange format, SQL vendors are adding JSON-type support as well. There are good arguments for structured data type support given today’s agile programming processes and the uptime requirements of web-exposed infrastructure. Oracle 12c, PostgreSQL 9.2, VoltDB and others support JSON – often with performance benchmarks superior to “native” JSON NoSQL stores.

SQL will continue to win market share and see new investment and implementation. NoSQL databases offering proprietary query languages or simple key-value semantics without deeper technical differentiation are in a challenging position. Modern SQL systems match or exceed NoSQL scalability while supporting richer query semantics, established and trained user bases, broad ecosystem integration and deep enterprise adoption.

The bulk of this article originally appeared in a Network World article debating SQL and NoSQL technologies.

ABOUT VOLTDB

VoltDB provides the world’s fastest operational database, delivering high-speed data processing and real-time, in-memory analytics in a single database system. VoltDB is a relational database that gives organizations an unprecedented ability to build ultra-fast applications that can extract insights from massive volumes of dynamic data and enable real-time decision-making. Organizations in markets including mobile, financial services, energy, advertising and gaming use VoltDB to maximize the business value of data at every interaction. www.voltdb.com
209 Burlington Road, Suite 203 Bedford, MA 01730 Phone: +1.978.528.4660 Fax: +1.978.528.0568 http://voltdb.com.