
Dear Reader,

Welcome to DZone's latest Guide (and the last one ever!) on Databases: Evolving Solutions and Toolsets. We've come a long way in the world of databases. Hierarchical databases entered the playing field first in the late 1960s, paving the way for network and then relational databases in the 1970s, which quickly gained traction. Entity-relationship databases came soon after, followed by semantic, object-oriented, and object-relational databases. Today, semi-structured XML databases still take the stage.

Now, we can't function without databases. 82% of people from our DZone Guide to Databases survey have 10 or more years of experience with databases, showing just how much they have permeated developers' everyday lives. However, developers are still not containerizing their databases as much as you might think. Only 10% of our survey respondents say they have containerized all of their databases — down from 13% in 2018. Still, 23% say they expect containers will transform database technology over the next five years, so it will be interesting to see how this number changes in coming years.

Also according to our survey, 46% of our survey respondents use MySQL in non-production environments, with PostgreSQL coming in second at 44%. However, 39% said they personally most enjoy working with PostgreSQL and only 30% prefer working with MySQL.

But perhaps the most significant rivalry is between SQL and NoSQL. SQL databases are considered relational databases, whereas NoSQL databases are non-relational or distributed databases. The many differences between the two make it very interesting to explore why developers choose one over the other. You'll be able to see which one developers prefer and which one suits their needs more.

DZone's 2019 Guide to Databases: Evolving Solutions and Toolsets dives into more data like this as well as database performance, graph databases, and handling dynamic data in both SQL and graphs. We'll also look into time series data and working with databases in Azure.

Thanks for reading, and we hope you enjoy!

WRITTEN BY LAUREN FERRELL, CONTENT COORDINATOR, DEVADA

Table of Contents

3 Executive Summary, BY KARA PHELPS
4 Key Research Findings, BY JORDAN BAKER
7 The Multiple Facets of Time Series: From Operations Optimization to Business Strategy, BY DANIELLA PONTES
12 Handling Dynamic Connected Data in Graphs, BY MAX DE MARZI
14 AgensGraph: A Graph DB Helping to Solve World Hunger, BY JO STICHBURY
18 Performance Tuning and Monitoring Traditionally Falls to the Administrator, BY MONICA RATHBUN
24 Cross-Platform Deployments in Azure With BASH, BY KELLYN POT'VIN-GORMAN
32 Executive Insights on the State of Databases, BY TOM SMITH
36 Databases Solutions Directory
43 Diving Deeper Into Databases


Executive Summary

BY KARA PHELPS, EDITORIAL PROJECT MANAGER, PUBLICATIONS, DEVADA

At the foundation of software development, databases are essential building blocks. New developers often learn database technology first. Applications rely on databases to deliver data from an ever-increasing array of sources — securely, at scale, with little to no latency. To prepare for DZone's 2019 Guide to Databases, we surveyed 639 tech professionals to learn how they're using database technology every day, and how they expect databases to change in the near future. Let's dig into some key results.

SQL Maintains Royalty Status

DATA
98% of survey respondents said they use at least some SQL on a regular basis. 31% of respondents said they use SQL only. Just 2% reported using NoSQL only. The majority of respondents (65%) reported using a blend of SQL and NoSQL in their typical projects.

IMPLICATIONS
SQL databases continue to outpace NoSQL in terms of adoption. When asked why their resources are arranged as they are, 27% of survey respondents reported that it best suits the current knowledge of developers at their organization. It's possible that developer teams simply continue to go with what they know.

RECOMMENDATIONS
Your SQL skills aren't falling out of favor anytime soon. SQL will keep its place in a developer's toolbox for the foreseeable future. The large user base also means that any organization in need of SQL database assistance will easily find knowledgeable developers to bring onboard.

The Rise of Time Series

DATA
When asked how their company persists time-series data, 22% of respondents who actually use time-series data said they persist it with a specialized time-series database, compared to just 12% of respondents who claimed to do so with their time-series data when asked the same question last year. Respondents who persist their time-series data with a relational database fell from 64% to 56% this year. Respondents using a non-time-series-specific NoSQL database for this data also declined — from 19% to 16% this year.

IMPLICATIONS
Adoption of specialized time-series databases appears to be gaining momentum among organizations that rely on time-series data. In a corresponding way, the use of SQL and NoSQL databases for time-series data also seems to have fallen over the past year.

RECOMMENDATIONS
If you work with time-series data, it may be worth it to look into how specialized time-series databases might fit your particular use case, if you haven't already done so. For an introduction or a refresher, check out "What the Heck Is Time-Series Data (And Why Do I Need a Time-Series Database)?" We're also featuring an article on implementing time-series databases later in this guide.

The Future Is in Stream Processing/Real-Time Analytics

DATA
In a new question this year, survey takers were asked to choose which technology or trend they believe will have the biggest impact on database technology within the next five years. 11% chose data protection and the rising consumer demand for privacy; 18% chose blockchain; 23% chose containers; the largest group (30%) chose stream processing/real-time analytics.

IMPLICATIONS
All of these trends are poised to change the face of database technology in the near future. Many tech professionals with knowledge of the space believe that stream processing (also known as real-time analytics or streaming analytics) will lead the most transformation.

RECOMMENDATIONS
Stream processing is invaluable when insights need to be extracted from large volumes of data in real time. The amount of data in the world continues to grow exponentially, and the modern enterprise demands any new information to be acted upon instantaneously. It makes sense that stream processing is becoming an essential part of database management. If you're interested in the concept, take a look at our Refcard on Understanding Stream Processing.


Key Research Findings

BY JORDAN BAKER, PUBLICATIONS ASSOCIATE, DEVADA

Demographics

For this year's DZone Database survey, we received 639 responses with a 62% completion percentage. Based on these answers, we compiled some basic demographic information about the respondents.

• 82% of respondents have 10 or more years of experience
• Respondents work in three main roles:
  –– 35% are developers/engineers
  –– 23% work as architects
  –– 22% are developer team leads
• 61% of survey takers work for enterprise-level organizations:
  –– 23% for organizations sized 100-999
  –– 20% for organizations sized 1,000-9,999
  –– 18% for organizations sized 10,000+
• Respondents work on three main types of software:
  –– 69% develop web applications
  –– 60% are developing enterprise business applications
  –– 24% develop native mobile apps
• Respondents reported using four main programming language ecosystems:
  –– 79% said Java
  –– 74% reported client-side JavaScript
  –– 41% work with Node.js
  –– 41% told us they use the Python ecosystem
• Despite the above ecosystem usage rates, Java proved the most popular primary programming language by far, with 55% of respondents using Java in this capacity.

SQL vs. NoSQL: The Battle Continues

The choice between SQL and NoSQL databases is one of the most divisive questions in the field of database engineering at the moment. In fact, over half (65%) reported using a mixture of SQL and NoSQL databases in their projects. The most popular database configuration among all respondents turned out to be using mostly SQL with some NoSQL (44%), with another 31% of survey takers reporting to use only SQL. And, while plenty of developers are working with both SQL and NoSQL databases in their code, only 2% of respondents reported using NoSQL only. Unlike in our survey from 2018, this year we included an option to report an even division of resources between SQL and NoSQL databases. Among respondents, 11% claimed to divide their resources in this way. When asked why they use databases in the way they do, most respondents said it was either to best organize data to suit their needs (42%) or to best suit the current knowledge of developers at their organization (27%). This last point could prove crucial in understanding why SQL databases continue to hold an edge over NoSQL databases, at least in terms of adoption rates. The older and more established of the two technologies, SQL databases could very well bring with them a larger user base and thus simplify the recruiting and onboarding of new developer talent within an organization.

SURVEY RESPONSES

Within the next five years, what technology/trend do you think will most transform database technology as we know it?

What security protocols do you implement to protect the data in the databases you are responsible for?


Looking at the numbers we gathered around specific SQL and NoSQL databases, we continue to see that SQL databases have a slight advantage in terms of adoption ratings. When we asked respondents what database management system they use in production, the six most popular databases reported included four SQL and two NoSQL databases. The adoption rates of these six databases are as follows:

• PostgreSQL: 49%
• MySQL: 46%
• Oracle: 43%
• Microsoft SQL Server: 40%
• MongoDB: 29%
• Redis: 25%

As you can see, the top four are all SQL databases. Interestingly, PostgreSQL and MySQL swapped spots from last year's survey, with PostgreSQL increasing from its 37% adoption rate in our 2018 survey, and MySQL falling from its 49% adoption rate last year. When we examine databases used in a non-production environment, we see much of the same. MySQL (46%), PostgreSQL (44%), MS SQL Server (33%), and Oracle (31%) were the four most used, followed by MongoDB (30%) and Redis (22%).

While SQL databases seem to have a decided edge in current adoption rates among respondents, NoSQL databases, specifically MongoDB, make up ground in other categories. When we asked survey takers to tell us which database management system they most enjoy working with, 39% reported PostgreSQL, 30% said MySQL, and 30% told us MongoDB. And, of the 17% of respondents looking to adopt a new DBMS in the next year, MongoDB and PostgreSQL tied for their leading candidates (19% each).

Database Management and Data Storage

Now that we've discussed the place of SQL and NoSQL databases in the field, and the popularity of particular database management systems, let's turn to what respondents told us about their actual data storage and management practices. In keeping with the popularity of SQL databases in professional settings delineated above, we found that an overwhelming majority of respondents use relational data storage models to persist their data. In fact, this model of data persistence has proved far and away the most popular means of data persistence over the past several years. In our 2018 database survey, we found that 88% of respondents reported using relational data persistence models, with key-value coming in as the second most popular persistence model with a 57% adoption rating. In this year's survey, relational persistence held steady, with 89% of respondents using this model, whereas key-value persistence models fell to a 53% adoption rating.

Looking at the number of persistent storage models that respondents reported using in their projects, we find some interesting year-over-year fluctuations as well. Over the past two years, the use of two persistent storage models proved the most popular with survey takers, with 37% of respondents reporting to use two persistent storage models in each year's survey. But the number of survey takers using either one or three persistent storage models varied considerably between 2018 and 2019. In our 2018 survey, 20% of respondents reported using one persistent storage model; this year, that number rose to 30%. Conversely, we found that the percentage of survey takers using three persistent storage models in their projects fell from 24% in 2018 to 13% in 2019. Thus, while persistent storage models are largely remaining relational in their setup, in number they seem to be drifting more toward one or two per project.

Unlike the use of relational data models, the use of time-series data saw some fluctuation. In 2018, 81% of respondents told us their company collected some kind of time-series data. In this year's database survey, this number fell to 67%. Among those survey takers who do use time-series data, the percentage of them who use either SQL or NoSQL databases for their time-series data also fell.

SURVEY RESPONSES

Which database management systems do you, personally, most enjoy working with?

Which persistent storage models does your application use?


In 2018, 64% of respondents who worked with time-series data used a relational database, which dropped to 56% in this year's survey; similarly, those who used NoSQL databases for their time-series data fell from 19% to 16%. While the adoption rates for these two widely popular models of data storage dropped among time-series users, the instance of respondents using a specialized time-series database grew from 12% in 2018 to 22% in 2019.

Popular Tools and Database Security

DATABASE TOOLS
Having addressed the fluctuations and trends in the types of data storage models above, let's now take a moment to examine the tools used to host these databases and DBMS, namely servers (and where they're located) and containers.

Despite the constant advances being made in cloud computing, the adoption rates of cloud, hybrid, and on-premise solutions seemed to remain rather static year-over-year. When we asked survey takers where the data they work with typically resides, 46% told us on-premise (compared to 47% in 2018), 30% host their data in the cloud (compared to 32% in 2018), and 20% use a hybrid hosting environment (compared to 19% in 2018). Additionally, we saw little to no yearly fluctuation in the percentage of databases reported to be containerized. Given that the percentage of respondents who containerize a certain percentage of their databases is difficult to convey in prose, here's a quick breakdown of this year-over-year trend:

Popular Container Strategies              2018                 2019
0% of databases are containerized         53% of respondents   52% of respondents
100% of databases are containerized       13% of respondents   10% of respondents
11-25% of databases are containerized     10% of respondents   9% of respondents

Despite this low adoption rate, however, when we asked survey takers to tell us what technology or trend they thought will transform database technology over the next five years, 23% said containers. Interestingly, containers proved the second most popular response to this question on potentially transformative technologies, beat out only by stream processing/real-time analytics (30%).

SECURITY
While the world of database technology is seeing a greater interest in security arise in recent years, possibly due to several high-profile data breaches, 26% of respondents reported that their organization is planning on adding new security protocols in the near future. Additionally, the year-over-year trends for the usage rates of security protocols either stayed stagnant or fell. In 2018, 91% of respondents told us they use authentication to protect the data in their database; in 2019, 88% reported using authentication. Similarly, in last year's survey, 37% of respondents told us that they sanitized their user inputs, and in 2019 this number stayed fairly steady, with 38% claiming to sanitize user inputs. What was potentially alarming, however, was the drop in respondents using encryption to protect their data. In 2018, 62% of survey takers reported that their organization used encryption to secure the data they stored in their database; but this year, that number fell to 49%.

SURVEY RESPONSES

Where does the data that you're working with typically reside?

How is your time split between working with SQL and NoSQL databases?


The Multiple Facets of Time Series: From Operations Optimization to Business Strategy

BY DANIELLA PONTES SENIOR PRODUCT MARKETING MANAGER AT INFLUXDATA

Time-series data seems to be everywhere lately. Today's data-driven society demands optimization — we all want to be on top of what we consume and spend, and we want to make the most out of what we do and buy.

Lower performance and downtime (of any nature) is not merely seen as an inconvenience, but directly translates to waste: whether of resources, revenue opportunities, brain power, working hours, or a perfect sunny day. It all boils down to the only thing that levels us: the unrecoverable loss of time.

Consumers and professionals share the same realization that things must be constantly improved because there is no time to waste. Application developers know that visitors will not spare another second on a sluggish website or on an outdated and clumsy application. Ensuring flawless continuous integration and delivery is becoming a priority for DevOps teams and application managers.

As for IT operations teams, they know that problematic infrastructure and networks will consume their days with complaints, alerts, and diagnosing and remediating problems. As the clock ticks, the compounding impact of arising issues creates dissatisfaction in all directions and points back to them. Preemptiveness is the name of the game for IT Ops.

Business visionaries have also contracted the "nothing-to-waste bug" — but with a sense of opportunity turned on. New entrants who bet on performance improvements are winning big in commoditized markets. Entire new markets are also being created around collecting measurements. The sensor-ification of our world (and of ourselves) is driven by our increasing dependence on data to make sound decisions.

The Rise of Purpose-Built Time-Series Databases

Time series is a direct byproduct of the current demand for optimization. As the saying goes, "one can only improve what has been measured." IT Ops, DevOps, and data-driven professionals are taking this wisdom to the limit. Real-time monitoring allows them to act on fresh data to avoid missing profit opportunities or loss of revenue. A subtle event not caught by large sampling intervals could mean a disaster in the making — the root cause of a sequence of degradations and eventual downtime.

Real-time monitoring means very fine-grained sampling and observation of events as they occur. Some latency-sensitive industries, such as finance and gaming, run on nanosecond technologies. Think billions of measurements a day.


"Time series" is everywhere, and it's generating tons of data. But what constitutes time-series data, and why should it be treated any differently than other types of data?

"Time series" are sets of time-stamped measurement data of the same monitored object, collected over time. The samples don't replace previous ones. They accumulate. This already differentiates time series from other types of records, which are usually updated. Time-stamped measurements are stored in series, queried in large scans, and plotted in dashboards for visualization of behavior over time.

Depending on the type of measurement, the data is kept in raw form or aggregated for long-term storage and modeling purposes. Data processing can be done upfront by applying real-time stream processing for alerting. Cross-analytics with other measurements for more insightful indicators is important to consider — it provides additional observation dimensions. Data volume should not be underestimated, either. It will continue to grow, and some sets of data will require long-term retention. Therefore, when seeking time-series solutions, you should be looking for a platform that can ingest, process in real time, query, and store time-stamped data at scale.

How time series are collected — and what you do with them — depends on the use case. Keep in mind that they comprise buckets of measurements with which you can devise your unique monitoring recipe for success. Purpose-built design yet again tips the scale toward multiple data types, and therefore provides more options to compose your monitoring strategy.

In short, time-series databases must be performant databases in order to handle the volume of data being generated, collected, transformed, and observed.

There Is a Place for Time Series in Every Organization

Time-stamped data is on the rise because performance improvements, observable via time series, are what we are willing to pay for — as professionals as well as consumers, no matter the context. The need for flawless performance and increased dependence on data for visibility permeates all layers and relations.

Quick interactions and precise actions are the expectation. We use observation and prediction modeling to increase our odds to get it right. But collecting measurements randomly without a clear goal may produce the opposite results: an inability to act objectively. Finding patterns in an apparently uncorrelated, complex, and confusing data pool is like a treasure hunt — we know the answer is in the data, but we cannot see it without the right tools. We want to extract meaning from time-stamped data using the lens of time, and time-series databases have become the channel to tap into such data.

Businesses are already embracing time series beyond the usual use cases — seeing it as a tool for differentiation and taking a more strategic approach. Purpose-built time-series platforms have been adopted as part of the following business models:

• As a central full-stack monitoring platform for IT Ops aimed at creating a common practice and format for collecting and monitoring time-series data across the organization. This becomes especially important for companies that have multiple sites, that go through M&As, and who have to deal with multiple development and operation centers.

• As a performant time-series storage engine supporting applications and solutions that extract their value from time-stamped data. Examples are: resource utilization monitoring and planning solutions, performance monitoring solutions and services, and the whole IoT industry.

• As part of a competitive strategy to differentiate from established players. In this case, time series must be integrated into various layers. It must provide visibility into critical transactions, including third-party services. From infrastructure to business indicators and internal processes, all must be optimized to achieve SLAs and SLOs.

IT OPERATIONS CENTRALIZED MONITORING
Full-stack monitoring is one of the most common use cases for time series. IT Ops and DevOps have long been collecting time series to monitor their infrastructure and applications.


What's changed is the complexity, geo-distribution, and speed of modern IT environments. The amount of collected metrics exploded with hybrid deployments and microservice architecture running on gigabits-per-second networks. Siloed solutions showing partial views of what is going on are ineffective, yet still generate tons of data that needs to be examined. But when you are in a downtime situation losing revenue, every minute counts. A survey from 2017-2018 revealed that about one-quarter of respondents estimated a loss between $300K-400K per hour of downtime.

Dealing with an incident goes through phases that could be summarized as: detection, investigation or diagnosis, and remediation. Reducing detection-to-remediation time has a direct impact on revenue and credibility to the organization. Therefore, organizations are in search of efficiency as well as efficacy. Billions of ad hoc metrics in silos will not provide the answers in a timely manner. It is necessary to be smart about the data collected and analyzed.

Creating a standard and a guideline for full-stack monitoring organization-wide is a path that many IT Ops and DevOps teams are adopting. But in order to implement such a plan, it is necessary to have in place a scalable platform that can accommodate the requirements from various business units and teams. For instance, it's essential to be able to ingest multiple types of time-stamped data, numeric and non-numeric, as well as provide configurable precision and retention policies.

One key insight to avoid excessive monitoring data is to use critical business indicators as red alerts, and the rest of the stack data for quick diagnosis and recovery. Symptoms of a malfunctioning or underperforming application environment should be detectable on all layers — infrastructure, network, and application — but not alerted on all outliers. That would lead to a saturation of responses. On the other hand, any degradation of business key performance indicators (KPIs) should be acted on immediately, with access to the full-stack monitoring data for correlation and quick diagnosis. By generating more intelligent alerts, Ops and DevOps can handle the increasing complexity and speed of modern IT environments.

TIME-SERIES DATABASES AS CRITICAL COMPONENTS
Performant time-series stores are also the choice of solution architects who need a scalable time-series database to handle storage and processing of time-stamped data. By doing so, they can apply their resources on competitive functionalities and features of the complete solution.

Application performance monitoring (APM) and load testing solutions are two good examples of solutions that benefit from integrating purpose-built time series as their storage tier. APM grew to encompass other sets of solutions beyond the early code instrumentation tools. Synthetic transactions, real user monitoring (RUM), and distributed tracing also fall under APM. Each of these subcategories has its own set of functionalities, features, and interfaces as value propositions. Focusing on the unique value proposition aspects would yield quicker results, and would better reduce risk to market, than would developing everything from scratch. Particularly with regard to the storage tier (where performance and scalability is crucial), it would be better to adopt a platform already proven to be time-series performant and efficient.

COMPETITIVE ADVANTAGE VIA SERVICE EXCELLENCE
In one way or another, all companies promise to be more reliable, cheaper, and faster in order to win business. And we all know that, in practice, what is delivered is quite far from the promised land. That has been the status quo: customer complaints versus companies' ability to dodge them. However, times are different now, mainly for online e-commerce and retail companies. Customers are one click away from landing on the competition's website. So, no more empty promises — online shoppers are non-committal, and will only give you another second of dedicated attention if the previous one was satisfactory.

Performance optimization for competitive advantage relies on the ability to observe code, systems, networks, applications, and transactions all in the context of customer experience. Purpose-built time-series platforms empower developers to implement application environments with integrated full-stack and business KPI monitoring, intelligent alerting, and workflows.

When starting with a business goal, DevOps, developers, and performance engineers adopt time series as a means to an end, tailoring unique measurements to achieve performance benchmarks to translate into competitive advantage. By taking a strategic hands-on approach to instrumentation, and defining and monitoring critical metrics, small players are entering and even dominating saturated markets.

DANIELLA PONTES is part of the product marketing team at InfluxData. She started her career in telecommunications, wireless technology, and global Internet service provisioning. As security became a major concern for enterprises, she worked on enterprise policy management, SaaS, and data encryption solutions. Prior to joining InfluxData, she spent some years living in Japan, Germany, and Brazil. Having worked in various market segments, from embedded smart antenna technology to Internet security and e-commerce doing product management, partnerships, marketing, and business development, she has a broad experience working cross-functionally and with customers and partners. LinkedIn


Bringing Time Series Into the Strategy

Big brother culture is here to stay. Everything is up to observation, justified by performance, efficiency, and efficacy improvements. Infrastructure, flows, applications, processes, services, activities, natural resources...the list grows endlessly. If you can think of it, it exists as an observable entity. Therefore, anything that has a purpose to anyone is in line for time series monitoring.

If anything is qualified for time series, the question is: How to manage such a vast set of different measurements in an effective and efficient way?

We already know that business indicators and full-stack measurements should come to the same place in order to be effective and fully utilized. As we ban waste from our processes and resource utilization in our quest for optimization, centralization of time series data also cuts the intrinsic inefficiencies and inefficacies of gathering information pieces from multiple siloed sources.

One can say that success is a measure of outperforming competitors. Therefore, time series planning must move from the operations toolbox to become an important aspect of business strategy. For that, an integrated approach within the whole organization is necessary. Bringing it all together will allow compounding insights and aggregating improvements that can be translated into unique competitive advantages.

WRITTEN BY DANIELLA PONTES, SENIOR PRODUCT MARKETING MANAGER AT INFLUXDATA

PARTNER SPOTLIGHT

InfluxData: Act in Time

Category: Time Series Data Platform    New Release: Quarterly release cycles    Open Source? Yes

Case Study
Coupa Software needed to create a custom DevOps monitoring solution for their leading spend management cloud platform. With InfluxData they moved from pure data collection to predictive analytics and achieved a consistent track record of delivering close to 100% uptime SLA across 13 major product releases and 5 major product module offerings, as well as solving their data accessibility, aggregation, and retention challenges. Operational metrics are collected via Telegraf, stored in InfluxDB, and analyzed by Kapacitor. They use Grafana for visualization and have created a custom alerting framework. This has become the foundation to the path of building a system that is self-healing and can provide predictive analytics key to accurate forecasting.

Strengths
• Developer happiness
• Faster Time to Awesome
• Ease of scaleout and deployment
• Centralized and integrated full stack monitoring

Notable Customers
• Coupa
• PayPal
• Wayfair

Website: influxdata.com
Twitter: @InfluxDB
Blog: influxdata.com/blog


Handling Dynamic Connected Data in Graphs

QUICK VIEW
01. How are graph databases different?
02. What are the advantages of graph databases?
03. Comparing data to build knowledge graphs

BY MAX DE MARZI, GRAPH DATABASE EXPERT

Graph databases are predominantly known for two things: finding the missing relationships in your data and finding the relationships that shouldn't exist at all. These two basic concepts make up the basis for two of the most popular use cases, which are recommendations and fraud detection.

In recommendations, we are looking for a person you should know because you both know some of the same people, a product you should buy because shoppers with similar behavior have bought that product, a movie you should watch because people who share a similar taste in movies liked that movie, and so on.

In fraud, we are looking at how a back-office insurance agent is friends with Mark, who is married to Isabelle, who is a cousin of the person involved in the insurance claim, or complex fraud rings in money laundering schemes, or over-connected synthetic data that looks legitimate at first glance but forms long chains of shared attributes when analyzed by its relationships.

There are many other use cases, but before we go there, we need to understand what a graph database is and what it is not.

Graph databases do not have tables, columns, rows, or foreign keys. All they have is these small objects floating around in space, holding fiercely to each other by relationships. These relationships are typed and directed.

A user node DRIVES a car node, at least today, but in the near future, the car node may be the one doing the driving. A node can have zero or millions of relationships of one type or thousands of types to a single other node or to every node in the graph; it's completely dynamic.

Since there are no columns in graph databases, each node can have completely different properties from another node even if they are both the same kind of node. One user node may have just a username and an email, while the other may also have their height, weight, date of birth, etc. One may have a string value for the nickname property, and the other may have an array of strings.

Lastly, the relationships themselves may have properties, typically storing data like the strength of the relationships, when the relationship came to be, and if it ended. They are very different from traditional relational databases and must not be treated the same way.

For starters, you cannot talk to them using SQL. Remember, SQL stands for Structured Query Language, where the schema of the data is painstakingly argued over by DBAs in some dark corner of the office.


Is this name property a varchar(32) or varchar(48), or is it an nvarchar(48)? Is this weight property a smallint, decimal, real, or do we fall back to varchar(32)? How do the different tables connect? Which foreign keys point which ways? How many join tables do we need? Is it "one to many" or "many to many?"

Those questions don't happen in graph databases since properties and relationships are dynamic. Dealing with mixed data types across multiple databases is a huge pain. Before starting any master data management or data reconciliation project, the DBAs must design a monster of an entity relationship diagram (ERD) the size of an entire wall with foreign keys crisscrossing tables in a giant hairball. Few folks outside the DBA team can understand just what is going on.

But when you show a graph representation of the data to business stakeholders, they begin to understand their data. It's how they often think of their business in terms they understand, not database tech jargon. Engineers and business can now talk about data in a language they both understand. Well, almost. To query graph databases like Neo4j, Memgraph, RedisGraph, SAP HANA, and others, we still need to use dynamic data languages like Cypher.

Graph databases are being used to solve problems in a way that they have never been solved before, not because people who use them are any smarter, but because they are applying a different tool to the problem. If you hit a screw with a hammer hard enough over and over, it will eventually go into the wood, but sometimes turning it with a screwdriver will be a better solution. Relational databases and graph databases are very different tools; you can think of Cypher like a screwdriver in a land of SQL hammers.

The important characteristics to remember are that every node "knows" what it is connected to; the "joins" are created just once on the creation of the relationship; and the relationships of each individual node are grouped by type, which allows queries to touch very small amounts of data to get the answer.

Imagine a user node with 200 friends, 500 posts, 3000 likes, etc. Getting the 200 friends of that user node will cost 200 traversals (following a pointer from a node to a relationship to another node). Regardless of how many user nodes exist; regardless of how many friendship relationships exist; regardless of how many posts, likes, or any other data exists; the query time is constant. The combined advantages of a dynamic data query language — "pre-joined" relationships, "relationship-type grouping" per node, and dynamic properties — are not available in other types of databases.

Using Cypher, you are able to build a single query that can find collections of paths of varying lengths while traversing many different types of nodes and relationships. The * ("star") character in Cypher means "keep going," making recursive and variable path queries a breeze. Since every node knows exactly what it is connected to, you don't have to spell out the joins in a horrifically long and error-prone SQL query. Being able to simply ask how these two things are connected, regardless of what is between them and the number of hops it takes, is magic.
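To make that concrete, here is a minimal Cypher sketch of such a variable-length query. The User label, the username property, and the four-hop cap are hypothetical, not taken from the article:

// Hypothetical data: find how two users are connected, regardless of what
// kinds of nodes or relationships sit between them, up to four hops away.
MATCH path = (a:User {username: 'alice'})-[*1..4]-(b:User {username: 'bob'})
RETURN path
ORDER BY length(path)
LIMIT 1;

The *1..4 is the "keep going" marker described above; expressing the same search in SQL would mean writing out a join (or a recursive common table expression) for every possible number of hops.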

Magic is exactly what large enterprises need to make sense of their data. Often due to legacy systems, mergers, and acquisitions, enterprises end up with many siloed databases, each with their own version of the truth.

But what happens when the data in the billing database and the operational database don't match? What do you do when you have low confidence in your data? How do you make decisions using analytics and machine learning when you don't even know the data the queries and models were built on are true?

To deal with these problems, companies are importing these siloed datasets into multiple graphs and comparing them to one another. They are looking for data missing from one system that exists in another and data incorrectly related in one system versus the others. They are giving data seen multiple times a high confidence score, and sending data disputed amongst systems to analysts for reconciliation.

The goal of these types of projects is twofold: One goal is to prevent and automate fixes to the source systems, and the other is to build the newly popular knowledge graph, which then becomes the golden record of data and a solid foundation for analytics, enterprise data, and machine learning projects.

If all you use is relational databases, you owe it to yourself to try graph databases. It's like only using languages like Java and C++ when some problems are better suited to languages like Python and Ruby. The flexibility and expressiveness of graph databases has made it the fastest growing category in popularity these last few years.

So, before you spin up yet another relational database to try to handle the ever-increasing complexity of information, try a graph database, and you will find new ways to understand your data and add a powerful tool to your toolbelt.

MAX DE MARZI is a graph database expert. His graph power level is over 9000, but he is terrible at just about everything else. Over the past six years, he has written 150 blog posts on graph use cases on his blog at maxdemarzi.com where he waxes poetically and obsessively over these collections of points and lines. If you have a question about graphs, don't be shy… connect. LinkedIn Twitter


AgensGraph: A Graph DB Helping to Solve World Hunger

QUICK VIEW
01. The World Food Programme analyzes the many factors behind food poverty in order to deliver timely and cost-effective help to the vulnerable.
02. A graph database, AgensGraph, is used by the WFP to map multiple relationships within a repository of survey data.
03. AgensGraph is a multi-model graph DB based on PostgreSQL that allows developers to integrate relational and graph data models.

BY JO STICHBURY, FREELANCE TECHNICAL WRITER

In this article, we describe how data scientists in one of the world's largest humanitarian organizations are using a graph database, AgensGraph, to analyze rich datasets and determine food security within vulnerable regions.

Introduction

In the most extreme situations, such as war, flood, hurricane, earthquake, or fire, there are many vulnerable people left in need of critical nutrition to prevent serious health problems or even death. As part of the United Nations, the World Food Programme (WFP) assists such people around the world. Every day, too many men, women, and children lack a daily nutritious meal, according to the WFP — which estimates that 821 million people still go to bed on an empty stomach each night, and that one in three members of the global population suffer from some form of malnutrition.

Assisting 91.4 million people in around 83 countries each year, the WFP has, on any given day, 5,000 trucks, 20 ships and 92 planes on the move. The logistics alone are significant, and it is essential that the WFP are prepared to deliver the most effective response to every crisis. The Vulnerability Analysis and Mapping (VAM) unit within the WFP consists of 200 food security analysts based in more than 70 countries such as Somalia, South Sudan, and Yemen. Their mission is to provide needed help in the most timely and cost-effective way possible. Many factors are behind food poverty. Not only does the team need to understand the level of food insecurity in a particular region, but why that insecurity occurs.

To determine regional food security, the VAM unit gathers various data including economic analysis, thematic analysis, and geospatial analysis. However, another strand of their work is to determine the characteristics of an area by carrying out surveys through face-to-face or over-the-phone interviews. A single survey may have over 400 questions, and hundreds of thousands of local households and food suppliers are interviewed each year.

The surveys are carefully designed and highly structured, but the amount of data is large, diverse, and extremely complex because:

• It is impossible to create one single universal standard that encompasses all possible health, livelihood, and socioeconomic variables for the many different regions at risk.

• A multitude of relationships exists among the variables themselves. As Gaurav Singhal, the lead data scientist at UN WFP, puts it, "Food is not just something you eat. In rural areas, food is something you grow, food is something you transport and something you sell."

Diverse Data Demystified

Let's consider a specific example of how the VAM unit works with the data they collect to infer new information and determine the most at-risk areas and people. This example is taken, with kind permission, from Gaurav Singhal's presentation given at the Postgres Conference Silicon Valley in 2018.


The VAM unit needed to consolidate their data about food consumption and expenditure. This data was collected by surveying different regions about diverse food groups, different units of measurement, varying currencies, and multiple food sources (such as whether food is purchased from a market, grown at home, or whether it comes via government distribution). How to collate this vital data into a single, useful repository?

The team found a solution by using a graph database to record the relationships between the variables, infer facts, and organize the data. Working with AgensGraph from Bitnine Global, the VAM data scientists took the complex dataset and made sense of the information it contained. Singhal says in the presentation that "...through the use of inferred facts, we were able to easily aggregate and coalesce highly disaggregated raw data along multiple dimensions."

In the era of Big Data, many organizations are finding that traditional RDBMS cannot keep pace with the variety of data formats and their rapidly changing nature. As just one example of the NoSQL database revolution, graph databases are increasingly proving themselves as a powerful technology in the data management community today. Gartner predicted in 2015 that at least 70% of leading companies would pilot a graph database project by the end of 2018.

AgensGraph is a multi-model graph database based on PostgreSQL that allows developers to integrate the relational and graph data model into one database. SQL and Cypher can be used in the same query, which allows developers to build upon the power of a graph database alongside a legacy SQL environment for data management. PostgreSQL users can easily migrate relational data into the graph with support from PostgreSQL extensions. AgensGraph also supports features such as ACID transactions, MVCC, triggers, and JSON, and provides a browser to visualize the graph contents, forming an intuitive model of the real world.

One of the examples given in the presentation is a study of the dietary habits of a population receiving WFP cash handouts to buy food. The graph database can be queried to establish the families that are buying and consuming non-essential sugary food 7 days per week, but only consuming essential Vitamin A-containing food once per week. By establishing the regions and characteristics of families lacking education about the importance of Vitamin A, the WFP can focus their efforts to improve information. They may reduce cash handouts and instead introduce vouchers to be exchanged for those foodstuffs that contain Vitamin A, for example, in certain areas.

The following graphic illustrates the concept modeling behind this example:

• A household, which consumes a food group (in this case, meat)
• A food group (meat), and the nutrition it contains
• A household that expends money upon such a food group
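As a rough illustration of how a model like this might be queried, here is a minimal openCypher-style sketch. The labels, relationship types, and properties are hypothetical stand-ins, not the WFP's actual schema:

// Hypothetical pattern: households that buy sugary food every day but
// Vitamin A-rich food at most once a week, grouped by region.
MATCH (h:Household)-[c:CONSUMES]->(:FoodGroup {name: 'Sugar'})
WHERE c.days_per_week = 7
MATCH (h)-[v:CONSUMES]->(:FoodGroup {name: 'Vitamin A rich'})
WHERE v.days_per_week <= 1
RETURN h.region, count(h) AS households_to_prioritize
ORDER BY households_to_prioritize DESC;

Because AgensGraph sits on PostgreSQL and also accepts SQL, a pattern query in this style can live alongside the relational survey tables it was derived from.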

Defined Edge Types and Properties

EXAMPLE DATA

Even in this simple example, it is clear that the graph database queries and inference allowed the UN WFP to find a way to:

• Identify the households with the most pressing malnutrition issues
• Prioritize distribution of items that aligned with food consumption pattern analysis
• Plan appropriate nutrition education programs for long-term sustainability

If you're interested in seeing AgensGraph in more detail, here is a video that illustrates how the WFP work with the database.

For Further Information

The priority of the WFP is "Zero Hunger," which is a pledge to end hunger, achieve food security, improve nutrition, and promote sustainable agriculture by 2030. Despite progress so far, there is still a long way to go. To achieve this goal, governments, citizens, civilian organizations, and the private sector must collaborate to invest, innovate, and create lasting solutions. To accelerate the achievement, they work with various technologies, of which Bitnine's AgensGraph is one. AgensGraph Community Edition is licensed under AGPL 3.0, while the Enterprise Edition is licensed under a commercial license.

JO STICHBURY is a technology writer and podcaster with over 20 years' experience in the software industry, including 8 years of low-level mobile development. Jo typically writes about machine intelligence, high performance computing, electric and driverless vehicles, and renewable energy. She podcasts about developer marketing at developermarketingpodcast.com. LinkedIn Twitter


NoSQL for Highly Scalable Business-Critical Applications in Any Cloud

Make no mistake, the requirements of today's business-critical applications far exceed the intended capability of traditional relational databases. Modern applications must process a growing variety, velocity, and volume of data. And they have to rapidly adapt to constantly evolving technologies, run in distributed environments, and deliver submillisecond response times to millions of global users.

That's why Couchbase – unlike other NoSQL databases – offers robust capabilities for business-critical enterprise applications on a highly scalable and available NoSQL platform. Our open standards simplify the transition from mainframe and relational databases. And as a cloud-native distributed database, we accelerate the innovation of enterprise-ready applications.

Couchbase is also the only database that combines the best of NoSQL with the power and familiarity of SQL in a single elegant platform spanning from any cloud to the edge. We give developers all the tools to develop web, mobile, and IoT applications with unmatched agility. And our comprehensive SQL-compatible query language provides ANSI joins, nest, unnest, group, sort, and other data manipulation functions.

With our cloud-native architecture, businesses can easily deploy to any cloud – public, private, or multi-cloud. Developers can build and test applications in the exact environment where they'll be deployed. And our Kubernetes operator makes it simple to run and manage Couchbase as a service in any cloud.

With no need to disrupt your development or operational environments, you can be confident your applications will deliver uncompromised performance and availability at any scale and any location – now and in the future.

WRITTEN BY RAVI MAYURAM, SVP OF ENGINEERING & CTO, COUCHBASE

PARTNER SPOTLIGHT

Couchbase NoSQL Database: Enterprise-class, multi-cloud NoSQL database

Category: NoSQL Database    New Release: Annual    Open Source? Yes

Case Study
Couchbase's mission is to be the platform that accelerates application innovation. To make this possible, Couchbase created an enterprise-class, multi-cloud NoSQL database architected on top of an open source foundation. Couchbase is the only database that combines the best of NoSQL with the power and familiarity of SQL, all in a single elegant platform spanning from any cloud to the edge.

Couchbase has become pervasive in our everyday lives; our customers include industry leaders Amadeus, AT&T, BD (Becton, Dickinson and Company), Carrefour, Comcast, Disney, DreamWorks Animation, eBay, Marriott, Neiman Marcus, Tesco, Tommy Hilfiger, United, Verizon, Wells Fargo, as well as hundreds of other household names. For more information, visit couchbase.com.

Strengths
• Built for change at scale – Support millions of interactions and easily respond to ever-changing business requirements
• Memory-first speed – Rich data access, in-memory replication, 99.999% availability
• Security across the stack – Encrypt, audit, protect, and secure your data anywhere
• Cloud-native, available anywhere – Cross datacenter replication (XDCR) in any cloud
• Fully synced and offline-first – Real-time data sync even when users are offline

Notable Customers
• Amadeus
• Comcast
• Tesco
• United

Website: couchbase.com
Twitter: @couchbase
LinkedIn
Blog: blog.couchbase.com


Performance Tuning and Monitoring Traditionally Falls to the Database Administrator

QUICK VIEW
01. Understanding Your Role
02. Discover Execution Plans
03. Leverage Query Store

BY MONICA RATHBUN, MICROSOFT MVP FOR DATA PLATFORM AND MICROSOFT CERTIFIED SOLUTIONS EXPERT

SQL Server database performance tuning and monitoring traditionally falls to the Database Administrator (DBA), but in reality, good performance starts as applications are being developed. If you are a developer, it is critical to know how your application affects database performance. You must also take steps to continually monitor and tune performance. Knowing what tools are available is a good way to start.

Execution Plans

The most important element in understanding database performance is the query execution plan. The execution plan contains all the information that SQL Server's Query Optimizer uses to execute the query. The Query Optimizer is responsible for road-mapping the most efficient plan to execute the provided query, using the lowest resource consumption. This execution plan is a best guess based on the column statistics that SQL Server maintains on your data. Note: it is very important to keep the statistics of the database tables and indexes up to date, in order to be able to create optimal execution plans. Once the execution plan is generated, it contains runtime information, usage metrics, and warnings such as missing indexes, implicit conversions, and tempdb spills (think of this as your query paging to the file system). Getting a look into what your code is doing when it reaches the database is vital to performance tuning.

To view execution plans in SQL Server, you must be granted SHOWPLAN permissions for all databases in that query. Once you have the proper permission, it's easy to access the execution plan. In SQL Server Management Studio, click on the icon highlighted in red in Figure 1, or use SET SHOWPLAN_ALL prior to executing your code. You have two plan options: Estimated Plan (CTRL + L) and Actual (CTRL + M). Estimated plans will give you what the query optimizer will likely do and what the estimated rows will be without actually executing the T-SQL or batch at compile time. In almost every case, the "shape" of your execution plan (the query operators that the optimizer chooses) will be the same in an actual and estimated plan. In performance tuning, getting an estimated plan is a good place to start; however, we will focus on the actual execution plan. The actual execution plan will depict exactly which operators were utilized to execute the query.

Figure 1

USE AdventureWorks2016CTP3;
GO
SET SHOWPLAN_ALL ON;
GO

Here is a query that looks at all discounted products no longer active with a unit price greater than 100. One of the key things in the plan to look at is the query cost. The CPU, IO, and memory consumption are some of the factors that SQL Server uses in determining the cost of the query. The cost is an abstract value. You can use this number to compare between two queries — but because it's an estimate, it means you can't count on it as an accurate measure of performance. Still, it is important as a pointer to what may need to be tuned or changed if the cost is high relative to the overall query.
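Since those cost estimates are derived from the column statistics mentioned above, it can be worth confirming that statistics are current before comparing plans. A minimal sketch using standard T-SQL; the table shown is simply the sample table used later in this article:

-- Refresh statistics on one table so the optimizer estimates from current data.
USE AdventureWorks2016CTP3;
GO
UPDATE STATISTICS [Sales].[SalesOrderDetail] WITH FULLSCAN;
GO
-- Or refresh any out-of-date statistics across the whole database.
EXEC sp_updatestats;
GO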


Before running the query, take note of the two SET statements. These are two settings you'll want to begin using. SET STATISTICS IO displays statistics on the amount of page activity generated by the query. It gives you important details, such as page logical reads, physical reads, scans, read aheads, and lob reads, both physical and logical. SET STATISTICS TIME displays the amount of time needed to parse, compile, and execute each statement in the query. The output shows the time in milliseconds for each operation to complete. This allows you to really see, in numbers, the differences one code change can make.

USE AdventureWorks2016CTP3;
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON;
GO
SELECT Name, [Description], [UnitPrice], [UnitPriceDiscount]
FROM [Production].[Product] p
INNER JOIN [Production].[ProductDescription] pd
  ON p.ProductID = pd.[ProductDescriptionID]
INNER JOIN [Sales].[SalesOrderDetail] s
  ON p.[ProductID] = s.ProductID
WHERE SellEndDate is not NULL
  AND UnitPrice > 100.00
  AND UnitPriceDiscount <> 0
ORDER BY [Name], [UnitPrice]
GO

Notice the 85% cost against the SalesOrderDetail table. By adding a Missing Index to this, the plan tells us we can estimate an 83% performance gain. In addition, note the SCAN on the Product.ProductID. For a query returning larger sets of rows, we always want the index to SEEK to the value, not SCAN. There are times when SQL Server will decide to do a scan if the table is small enough rather than try to seek an index. SEEK means that instead of reading the entire index, SQL Server is only reading the pages it needs to return the rows meeting your predicate (such as the WHERE clause, or what is being filtered out).

Figure 2

Now let's take a closer look at what SET STATISTICS IO and TIME gave us. Looking at the output, we see the table had 1266 logical reads and a CPU time of 31 ms. We will use these numbers to compare performance improvements. These numbers are important because there are 8,000 pages that have to be read from memory or disk. The lower the number, the faster the read.
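For readers who have not used these settings before, the Messages tab output that the two SET options produce looks roughly like the following. The logical reads and CPU time figures are the ones quoted above; the remaining numbers are illustrative only and will differ on your system.

Table 'SalesOrderDetail'. Scan count 1, logical reads 1266, physical reads 0,
  read-ahead reads 0, lob logical reads 0, lob physical reads 0.

 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 94 ms.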


Figure 3

You will also notice in the execution plan that SQL Server told us we are missing an index, and it gives us the actual index script we need to create. If you right-click on the missing index and choose "Index Details," it will open a new window with the script you need to run. Be sure to name your index with proper naming conventions before you run it.

USE [AdventureWorks2016CTP3]
GO
CREATE NONCLUSTERED INDEX IDX_UnitPrice_UnitPriceDiscount
ON [Sales].[SalesOrderDetail] ([UnitPrice], [UnitPriceDiscount])
INCLUDE ([ProductID])
GO

Here is the improved plan:

Figure 4

Now, looking at the plans, you can see the difference. We have added a SEEK on Product.ProductID, and we can see the plan is now using our new index for SalesOrderDetail. Its cost is only 29%. This has effectively given us a more efficient plan.

After you run the execution plan, you have the ability to save it by just right-clicking in the execution plan area and choosing Save Execution Plan, as shown below in Figure 5.

Figure 5

Additionally, SQL Server gives you the option to compare different execution plans. You'll notice in Figure 5 there is a "Compare Showplan" option. This allows you to compare the active plan with a saved plan. The top plan is the active query and the lower will be the saved query chosen for comparison. This information can be very useful and should be added to your testing checklists. When comparing the plans, Management Studio highlights similar query operators by shading them with the same color. The best place to start looking for differences is in the properties pane, which you can see on the right in Figure 6 --- where there's a "not-equal" sign, you can easily see the difference in costing these two plans.

Figure 6

QUERY STORE
Next let's look at Query Store, a feature that was introduced in SQL Server 2016 and iterates on execution plans for the developer. Query Store captures execution plan history and runtime statistics, and it retains them for review and trending. It is a great tool for finding regression over a time period (last hour, day, week, and so on) as changes to your code are put in place --- so you can directly track those impacts. It can also help you identify and tune top resource-consuming queries. Hundreds, thousands, and even millions of queries are processed by a server in about a second. Digging through all of them to identify any issues can be very difficult. Query Store helps with that by doing the analysis for you. In the past, to perform this kind of analysis, you had to access SQL Server's caches in real time to capture the data you needed, which made after-the-fact tuning impossible. Fixing those performance issues after identifying them will depend on the issue, but Query Store makes it easy to find them.

Allowing insight as to what code could benefit from tuning, it tracks the number of times a query is called as well as which queries are waiting on server resources. It provides a way to analyze usage patterns and identify high resource consuming queries. It allows you, as developers, to monitor code you have written and to address any issues that may arise.

Let's look at an example of how a simple code change can regress a query and how Query Store can help you identify it.

First you need to enable the feature. Use the latest version of SQL Server Management Studio to see the most recent Query Store options and analysis available. If you are working on a lower version, you will not see all the reports listed.

USING THE GUI
In SQL Server Management Studio Object Explorer, right-click on the database, click "Properties," and then select "Query Store." In Operational Mode, choose "Read Write." To get started, change the Query Store Capture Mode to "Auto" (in SQL Server 2016 and 2017 the default for this is "All"), which reduces capture overhead on your server.
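Once Query Store has been switched on, you can confirm how it is configured from T-SQL as well. This quick check is not shown in the original article; it queries the catalog view SQL Server exposes for these settings:

USE AdventureWorks2016CTP3;
GO
-- Confirm Query Store state and capture mode for the current database
SELECT actual_state_desc,
       desired_state_desc,
       query_capture_mode_desc,
       max_storage_size_mb
FROM sys.database_query_store_options;
GO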

Figure 7

You can also enable it using T-SQL:


USE [master]
GO
ALTER DATABASE [AdventureWorks2016CTP3] SET QUERY_STORE
(OPERATION_MODE = READ_WRITE, QUERY_CAPTURE_MODE = AUTO)
GO

As queries start to run, Query Store will start to keep plans and analyze them for overall query performance.

Figure 8

Let's look at the same query we used earlier and see its regression. Here we are looking at the Regressed Queries report, based upon the duration metric. In this case, regression means performance has degraded within a period of time (over the last week, day, hour, or so on). The query now runs worse than before. Navigating through the window, you will find buttons to view the query, force and unforce a query plan, options for grid and chart formats, and a way to compare selected plans.

Query Store lets you examine regressed queries based on various metrics:

• Duration
• CPU time
• Logical reads
• Logical writes
• Physical reads
• CLR time
• DOP
• Memory consumption
• Row counts
• Log memory used
• TempDB memory used
• Wait times

Once again, if you are on a lower version of Query Store, some of these may not be available to you.

Note there are two execution plans indicated on the right chart for the query --- plan IDs 783 and 791. We can clearly see in the chart that plan 791 is the better-performing plan based on the duration of each execution. Clicking on each bubble will show you their plan in the bottom part of the screen. If you find better plans available for a query --- in cases such as parameter sniffing --- you can "force" a plan, which will require the Optimizer to use the selected plan.
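Plan forcing is also exposed through T-SQL, which is useful if you want to script or audit it rather than click through the report. A brief sketch using the better-performing plan ID from the chart; the query ID below is a hypothetical value for illustration and would come from the same report or from sys.query_store_query:

-- Force plan 791 for its query (@query_id = 42 is a placeholder)
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 791;

-- Undo the forced plan later if the optimizer should choose again
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 791;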

Figure 9

Another report I really like to view is Top Resource Consuming Queries. It's ideal for seeing the changes in a plan after index creation, new hardware, and new application versions. A good use case for this is when you upgrade to a newer version of SQL Server and you need to test your code for performance regressions. You can simply enable Query Store, run your queries in the lower compatibility mode, and then run them again in the new upgraded compatibility mode. Query Store will clearly identify regression directly related to the upgrade.

Summary
Don't just leave the performance tuning of your code to your SQL Server Database Administrators. As a developer, it is imperative that you continually monitor your code for performance regression and improvements. Using tools like execution plans and Query Store that are free and readily available is a great way to accomplish that. I advise you to read more on both topics. There are a lot of great resources available, so be sure to check them out.

MONICA RATHBUN lives in Virginia and is a Microsoft MVP for Data Platform and Microsoft Certified Solutions Expert. She has nearly two decades of experience working with a wide variety of database platforms with a focus on SQL Server and the Microsoft Data Platform. She is a frequent speaker at IT industry conferences on topics including performance tuning and configuration management. She is the Leader of the Hampton Roads SQL Server User Group and a Mid-Atlantic PASS Regional Mentor. She is passionate about SQL Server and the SQL Server community, doing anything she can to give back. Monica can always be found on Twitter (@sqlespresso) handing out helpful tips. You can find Monica blogging at sqlespresso.com. LinkedIn Twitter


Today's applications deal with two kinds of data: transactional data and instant data. Transactional data is held in a traditional disk-based database; instant data resides in memory to provide the best possible application performance. Over the years, Redis, consistently voted the most-loved database, has become the de facto standard for instant data.

Redis has commonly been used for caching, but due to its flexible data structures — namely strings, lists, sets, sorted sets, hashes, geo, bitmaps, hyperloglogs, and streams — more and more organizations are using Redis for an increasingly broad set of use cases, including pub-sub messaging, asynchronous job queues, and session management.

Redis Enterprise
Redis Enterprise takes the simplicity, elegance, and performance of open-source Redis and transforms it into a highly available, linearly scalable database with remarkable ease of use around clustering.

Organizations typically use Redis Enterprise because of its shared-nothing clustering architecture enabling horizontal and vertical scaling with the press of a button. In addition, Redis Enterprise can be deployed in a geo-distributed architecture, preserving data consistency, by leveraging Conflict-free Replicated Data Types (CRDT).

Customers managing massive data sets can also take advantage of Redis on Flash and Intel's persistent memory technology that preserves performance while providing significant cost savings.

In addition, Redis Enterprise provides a collection of modules that turns Redis into a multi-model database. With Redis Enterprise, organizations can use Graph, Streams, Search, JSON, Time Series, and also serve TensorFlow AI models, all from within the same database. Redis Enterprise can be deployed on-premise, on a cloud platform of your choice, and in hybrid and multi-cloud deployment architectures.

WRITTEN BY MADHUKAR KUMAR
VP TECHNICAL AND PRODUCT MARKETING, REDISLABS

PARTNER SPOTLIGHT

Redis Enterprise
In-Memory Database Platform

Category: In-memory database platform
Open Source? Yes
New Release: A major release each year, and minor releases every few months

Case Study
Whitepages's innovative people search tools are powered by more than five billion and growing global identity records. On average, the proprietary Identity Graph™ solution makes 150,000 to 200,000 calls per second to the three-terabyte database and can even surpass this number during peak hours.

Whitepages chose Redis Enterprise to handle time-sensitive customer transactions, maintain sub-hundred millisecond latency of its application, and provide a consistent end-user experience of its digital identity verification services for businesses and consumers. With Redis on Flash, Whitepages saves up to 70% in infrastructure costs while delivering single-digit latency.

As Whitepages expands its identity dataset beyond North America, it increasingly relies on Redis on Flash, and enterprise-grade Redis from Redis Labs, to keep operational costs low and performance high. redislabs.com/customers/whitepages/

Strengths
• High performance with sub-millisecond latency, supporting over a million writes per second.
• Redis Enterprise modules add multi-model capabilities for managing different data models like Graph, Time Series, Search, JSON, and Streams.
• Shared-nothing cluster architecture with always-on availability and effortless scaling.
• High throughput by running multiple Redis instances on multiple cores.

Notable Customers
• Dell • TD Bank • Staples • Home Depot • Intuit • Mastercard • Vodafone • Atlassian • Microsoft

Website: redislabs.com
Twitter: @redislabs
Blog: redislabs.com/blog/


Cross-Platform Deployments in Azure With BASH

QUICK VIEW

01. Azure commands can be used to deploy everything from Virtual Machines (VMs), Database Servers, and Virtual Networks and Services.

02. We'll look at a use case for deploying an Oracle VM in Azure.

03. Deployments can be performed at an enterprise level without demanding intense re-education of technical personnel.

BY KELLYN POT'VIN-GORMAN, DATA PLATFORM ARCHITECT, MICROSOFT

When you think deployment automation, you may think cloud, but do you think BASH shell scripting with automation in Azure? Most may not even realize it's a common thing. I've been working heavily in this area for the last year and it's been incredibly valuable, and the cost savings are evident, as any cloud migration can be. The customers I work with, mostly in Higher Education, are working hard to embrace the cloud, but they may not have the expertise, the resources, or the time to gain the technical knowledge needed to deploy to the cloud. Many in their organizations feel that if they could just get the infrastructure deployed, they could easily get up to speed to manage the resources, but the new skills they imagine they'll have to add to their team often create an overwhelming roadblock.

I felt a similar pressure this last year when I started at Microsoft and knew I would need to get up to speed very quickly. I'd worked in the Oracle and AWS clouds previously but had less experience with Azure. I was significantly surprised when I logged into the Azure portal and discovered, to reverse engineer what was built in the Azure portal, there were templates, also referred to as ARM templates, which could be used to recreate much of what had been deployed. Several scripts in different languages are provided along with the templates — PowerShell, Ruby, .NET, and a tab titled CLI. The CLI portion is just a BASH script.

If you have any doubts about this, the dead give-away is that when you click on the CLI tab in the template output, the first line in the script sets the shell to BASH.

Why the tab is titled CLI (Command Line Interface) instead of BASH is beyond me. The PowerShell tab is titled correctly, but I can only guess that when they designed the portal, the scripting language they used with the CLI hadn't been set yet and the tab title was left over from an earlier release decision.


My personal challenge with the ARM templates is that they are highly dependent on complex JSON, and they require the user to learn this along with all the other areas of the Azure cloud. As I researched how to make the most of what I already knew, I learned Azure has an extensive library of AZ commands that can be used from the command line. PowerShell has its own built-in Azure commands, but you can also use PowerShell scripts with the AZ commands in a manner similar to what I will demonstrate in this article's examples.

The format for these commands is:

az <command> <subcommand> --<parameter> --<parameter>
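To make the pattern concrete, here are two stock commands (not from the author's script) that follow the same command/subcommand/parameter shape; the first is also handy later when deciding which location zone to deploy into:

# List the Azure locations available to your subscription
az account list-locations --output table

# Create a resource group in one of those locations
az group create --name ORA_GRP --location eastus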

There are two reasons you would move to the AZ commands vs. the JSON from the templates:

1. Dynamic environment variable generation from the script down.
2. Less code to manage.

The JSON in the templates exports every minute detail of the deployment. It is an excellent artifact, but less pliable or manageable for deployments, granting me more power using the AZ commands with little to no JSON template/parameter files.

Azure commands can be used to deploy everything from Virtual Machines (VMs), Database Servers, and Virtual Networks and Services. When paired with shell scripts (BASH or PowerShell), the automation can incorporate logical deployments of objects and data. With the addition of the Azure DevOps CLI, advanced analytics features such as Azure Data Factory pipelines and dataflows can also be included.

An Oracle VM on Azure
To help demonstrate this, we'll look at a use case for deploying an Oracle VM in Azure. Although this is a cross-platform deployment of a Linux VM and Oracle on Azure, it's a simple task using BASH scripting and AZ commands.

Our script will perform the following:

1. Designate the arguments that need to be input to fulfill the requirements for the script to run successfully.
2. Set the values for the environment variables for the user's deployment.
3. Execute the commands to deploy the resource group (a simple, single grouping).
4. Create the virtual network as part of the VM on which the database will reside.
5. Create the VM with the appropriate OS and database installation chosen as part of our values that were entered earlier in the interactive part of the script.
6. Create the script to:
   – start the listener.
   – start the instance.
   – create the database.
   – check the status of the database.
   – log information on the VM about our database environment.
7. Once the script is written to disk, run a remote session to the VM and execute the script to perform the actions.

This may sound like a lot to perform, but if you use a set scripting format, the rest is rather simple. Most of the script can be recycled for future deployments of other VMs, too. BASH grants us the ability to perform this type of work, and many DBAs may have already been doing this at some level, just not in the cloud.

The new skill that will need to be added for the Oracle DBA is learning the AZ commands. If you're a PowerShell DBA, then running through the steps for the AZ commands and database creation will be necessary. Luckily, there is impressive reference material and regular new feature releases to help make this easier for all of us.

Step by Step
All my scripts begin the same way: they have a section of comments describing the script, along with its purpose. The format of the steps in the script then proceeds the same way each time. There's a very good reason for this:

1. Ease of readability.
2. Level of maturity.

I won't include all the scripts here, just the main areas that are of importance. The full script can be found, along with other resource deployments, in my GitHub repository.

To begin on our Azure scripting journey, I need to capture the following values to populate the environment variables as we move forward:

usage() { echo "usage: $0 -g <groupname> -s <sku> -v <version> -o <oraname> -sz <size> -a <adminuser> -l <zone> -gn <gdb> -sp <syspassword> -stp <systempassword> -pp <pdbpassword>" 1>&2; exit 1; }


declare version="" Ask Questions, Make Suggestions, or Give Options declare orname="" Depending on the type of value that needs to be specified, I then decalre size="" request the values for each environment variable, first asking the decalre adminuser="" user for the information with an example or posing the choices from declare zone="" declare gdb="" the Azure catalog (any statement inside the quotes, after the ECHO): declare syspassword="" shift $((OPTIND-1)) declare systempasword=""

declare pdbpassword="" if [[ -z "$groupname" ]]; then echo "Wheat is the name for the resource The values above correspond to the following: group to create the deployment in? Example: ORA_GRP " echo "Enter your resource groupname:" • Groupname: Used for the Resource Group in Azure that read groupname will group all the resources that we'll deploy for ease of [[ "${groupname:?}" ]] management. fi

• SKU: Used to call values from the Azure catalog for Azure # Create the latest version of Oracle VM installations SKU to be use for the sizing of the VM. By querying the Azure availavlbe and push to a file catalog, our script has longevity vs. static values set in the az vm image list --offer Oracle --all --publisher Oracle script. >db.lst

• Version: Also used as part of the calls to the Azure Catalog. if [[ -z "$sku" ]]; then This will be for the database version. echo "Here's the installation version, from 12c through 18c available for Oracle: " • Oraname: Name for the Oracle VM server. cat db.lst | grep sku | awk '{print $2}'| tr -d • Size: From the Azure catalog, size of the VM server. \"\, echo "Enter the version you'd like to in- • Adminuser: This is the VM administrator's name. stall, the numbering convention must be exact, feel free • Zone: Location Zone in Azure (when choosing a zone, I like to copy from the list and past here:" to list out the available zones for the user). read sku [[ "${sku:?}" ]] • Gdb: Global Database Name. fi • SysPassword: This is the password for the Oracle SYS User. For the example above, note the following command: • SystemPassword: Password for the Oracle SYSTEM User. az vm image list -offer Oracle --all --publisher Oracle • PDB Password: Admin Password for the Pluggable Data- >db.lst base. The versions available in the Azure catalog have the option for multi-tenant, so I'm capturing the value to be This command queries the catalog and pulls the VM list for any used if needed. Oracle images, filtered by those published by Oracle, and writes them to a file. The output of this file is formatted in JSON and I then ask my script to run through each one of these variables to looks something like this: get the values from the user, but I do this as a part of the prompts, like the first two, demonstrated below: [ { while getopts ":g:s:v:o:sz:a:l:gn:sp:stp:pp:" arg; do "offer": "Oracle-Database-Ee", case "${arg}" in "publisher": "Oracle", g) "sku": 12.1.0.2", groupname=${OPTARG} "urn": "Oracle:Oracle-Databse- ;; Ee:12.1.0.2:12.1.20170202", s) "version": "12.1.20170202" sku=${OPTARG} }, ;; {


"offer": "Oracle-Database-Ee", # Create a Resource Group, this must be a unique tenant "publisher": "Oracle", and choose the location zone to deploy to: "sku": "12.2.01", az group create --name $groupname --location $zone "urn": "Oracle:Oracle-Database-Ee: #Make this resource group and zone defalut 12.2.0.1:12.2.20180725", az configure -- defaults group=$groupname location+$zone "version": "12.2.20180725" }, # Create Oracle VM az vm create \ This output might look complex to work with, but with BASH, I --name $oraname \ can include utilities like AWK and GREP to search for patterns and --image $urn:latest \ format strings from the file to pass information to my script about --admin-username $adminuser \ SKUs, versions, and other pertinent information. You can see this --generate-ssh-keys in the snippet from the script, just a few lines later: az vm open-port \ cat db.lst | grep sku | awk '{print \$2}' | tr -d \"\, --name $oraname \ --port 22 \ Before I capture the list to the screen, I print to the screen, and --priority 330 (echo) that I'm going to be displaying the installation versions # Create Storage Disks to use with database from 12c through 18c that are available from Oracle. az vm disk attach --vm-name $oraname \

I then ask the user to choose one and type it in. The value is then -- caching ReadWrite \ --name $oraname"dsk" \ used to populate the value for the SKU. --sku $size \ I continue to ask questions and offer examples or options avail- -- new able from the catalog until all the values are entered and, if the user answers the questions, the script proceeds, if not, it exits, Create the Oracle VM deploying nothing. There are three steps to building the Virtual Machine that will house my Oracle database. Deploying to Azure 1. Create the VM. As part of my steps, there is one environment variable (urn, for the 2. Open the ports required to gain remote access to the VM. value of the image) that is set dynamically from the db.lst work 3. Attach the disk to be used for the database. file after parsing the$sku and $version values. I also set my logfile that will track all the output I write to it as part of my deployment. Note that in each of my AZ commands, I reference the variables After these two steps are complete, it's time to create our group that were set earlier in my script. that will contain all the resources in Azure. Now it's time to create our script to perform the work on the VM to az group create --name $groupname --location $zone build out the database:

Notice in the next step that I set the group and location as default # Get IP Address and log into VM for two reasons: export vmip+$(az network public-ip list | grep ipAddress

1. Less coding in subsequent commands, as this information | awk '{print $2}' | tr -d \"\,)

is required. # Build Dynamic Oracle script, including install build 2. Less concerns about forgetting to enter this information and for DBCA impacting the deployment. cat < mkdb.sh sudo su - oracle # Get Correct URN alue from sku and version entered: mkdir /home/poracle/data/ urn=$(cat db.lst | grep $version:$sku | grep urn | awk export oh+$(ls /u01/app/oracle/product) '{print $2}' | tr -d \"\,) export loffile=./$jobrun.txt #Start Listener

# Build Steps lsncrtl start
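The script moves straight on to building out the database, but if you want to sanity-check the deployment at this point, a couple of stock AZ commands (not part of the author's script) will confirm what was just created; because the group and location defaults were set above, the resource group does not need to be repeated for the VM look-up:

# Show the new VM, including power state and public IP address
az vm show --name $oraname --show-details --output table

# List everything that now exists in the resource group
az resource list --resource-group $groupname --output table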


Now it's time to create our script to perform the work on the VM to build out the database:

# Get IP Address and log into VM
export vmip=$(az network public-ip list | grep ipAddress | awk '{print $2}' | tr -d \"\,)

# Build Dynamic Oracle script, including install build for DBCA
cat <<EOM > mkdb.sh
sudo su - oracle
mkdir /home/oracle/data/
export oh=$(ls /u01/app/oracle/product)
export logfile=./$jobrun.txt

# Start Listener
lsnrctl start

My script requires that I know the IP address for remotely logging into my VM with SSH (secure shell). I capture this using the "az network" command. You'll also note that I take advantage of the very valuable grep/awk/tr(im) BASH commands. Once I'm finished, I need to use the concatenate (cat) utility to write to a new script (mkdb.sh) for everything inside the EOM to EOM sections in the next section of my script. We start with:

• sudo (switch user domain owner) to the Oracle user.
• mkdir (make the directory) for the datafiles.
• export the oracle home (oh).
• start the listener.

The second part of the script is to use the Oracle Database Configuration Assistant (DBCA) to create the database. I then let the user know that these steps are complete. Keep in mind that right now my script is just "writing" this information to another file to be executed afterwards.

dbca -silent \
-createDatabase \
-templateName General_Purpose.dbc \
-gdbname $gdb \
-sid cdb1 \
-responseFile NO_VALUE \
-characterSet AL32UTF8 \
-sysPassword $syspassword \
-systemPassword $systempassword \
-createAsContainerDatabase true \
-numberOfPDBs 1 \
-pdbName pdb1 \
-pdbAdminPassword $pdbpassword \
-databaseType MULTIPURPOSE \
-automaticMemoryManagement false \
-storageType FS \
-datafileDestination /home/oracle/data \
-ignorePreReqs

echo "Data creation done, Part II Complete"
echo "Part III, configure TNS and validate empty database"

export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/$oh/db_Home1
export vmh=$vmip

In the above step in my script, I set the ORACLE_BASE and the ORACLE_HOME using the values that are called as part of my base script. The next part appears to be simple but is quite tricky. I am required to write to a file as part of the script I'm writing to. I ended up using an echo to a file and transferring this file over to the VM as part of the final script, but it took me a bit to figure out how to accomplish this. Is there a better way? I'm not sure, but right now it works, and figuring out how to use the NETCA (Oracle Network Configuration Assistant) from the command line to perform this same task is on my list of enhancements.

# Create the TNSNAMES entry
echo " PDB1 = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "   (DESCRIPTION = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     (ADDRESS_LIST = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "       (ADDRESS = (PROTOCOL = TCP)(HOST = $vmh)(PORT = 1521)) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     ) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     (CONNECT_DATA = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "       (SERVICE_NAME = pdb1) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     ) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "   ) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo " " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo " CDB1 = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "   (DESCRIPTION = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     (ADDRESS_LIST = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "       (ADDRESS = (PROTOCOL = TCP)(HOST = $vmh)(PORT = 1521)) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     ) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     (CONNECT_DATA = " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "       (SERVICE_NAME = cdb1) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "     ) " >> $ORACLE_HOME/network/admin/tnsnames.ora
echo "   ) " >> $ORACLE_HOME/network/admin/tnsnames.ora

Once it does write to the file, I need to stop and restart the listener, so this is what I add as steps to finish up my mkdb.sh creation script.


# Third part of dynamic script build, listener reload and db final check
lsnrctl stop
lsnrctl start

# Log into new database to verify creation
sqlplus / as sysdba <<EOF
spool ./ck_db.txt
connect system@cdb1
select count(*) from dba_objects;
sho con_name;
alter session set container=pdb1;
sho con_name;
connect / as sysdba
select count(*) from dba_objects;
spool off
exit
EOF
EOM

To check everything, I log into the database with SQLPlus, do a count on database objects, switch over to the pluggable database (PDB) and check it, logging to a text file and then exiting out of the SQLPlus session.

After all of this, you'll see an End of File (EOF) and End of Message (EOM) to close out all the writes to secondary files and scripts as part of this process. The last part is to execute it all from my deployment server to the VM, which is done with the last AZ command, az vm run-command:

# Run script created dynamically in the section above this one.
az vm run-command invoke \
  --name $oraname \
  --command-id RunShellScript \
  --scripts "./mkdb.sh"

Once complete, this script can be run. The VM, along with the secondary script creations, doesn't take very long, but the database creation that's part of the secondary script does take some time. You will get a prompt with the percentage completed:

Copying database files
1% complete

Once everything is done, the following resources are created in the cloud and can be viewed in the Azure Portal. When I executed the script, I entered the following information: zone=EastUS and oraname=ora122db8.
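If you would rather not answer every prompt interactively, the same run can be driven from the command line using the getopts flags defined at the top of the script; any flag left off is still prompted for, thanks to the if [[ -z ... ]] checks. A hypothetical invocation (the script's file name is assumed here) using the values mentioned above:

# deploy_oracle_vm.sh is an assumed name for the full script from the GitHub repository
./deploy_oracle_vm.sh -g ORA_GRP -l EastUS -o ora122db8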

Note, the $oraname variable is used to populate the name for all the resources, making the naming convention dynamic and simplified. The user can now use traditional tools to manage the database, add enhanced security at the database level, or begin development. Scaling up is easy via the portal and requires little training. If the organization desires to migrate existing Oracle data into the Azure deployment, pre-existing Oracle migration tools can be used to make easy work of it, as well.

What I've demonstrated here is that, with a little ingenuity, the scripting skills you already have with BASH or PowerShell can be used to deploy to Azure. Incorporated with the cloud's CLI utility, environments can be prepped, dynamically deployed, managed, and verified — even cross-platform. These deployments can be performed at an enterprise level without demanding intense re-education of technical personnel. Embrace this hybrid automation by incorporating the new with the old, while lessening the learning curve of the Azure cloud.

KELLYN POT'VIN-GORMAN is a Data Platform Architect at Microsoft. LinkedIn Twitter



Bringing Real-Time Analytics to Customer-Facing Applications

There are databases for transactions, and there are data warehouses for analytics. Not long ago, analytics was used to derive actionable insights for improving operational efficiency and mitigating risks. Today, it's used to improve customer experience and customer engagement – to provide customers with actionable insights.

However, when it comes to customer-facing applications and analytics, databases offer too little and data warehouses offer too much. A database is great for browsing products, adding them to a cart, and completing a purchase. These queries return a small number of rows using indexes, and read most columns in a row. A data warehouse is great for performing a market basket analysis. These queries need most rows, but a small number of columns per row.

What if an eCommerce application wanted to improve customer engagement by alerting visitors to "soon-to-be-sold-out" products based on active carts, recent purchases, and current inventory? This query is too difficult for a database. It needs to perform aggregates on many rows. Nor is it practical for a data warehouse. It will return a small number of rows, and many columns per row.

The solution is a hybrid transactional/analytical database with row and columnar storage, streaming change-data-capture to synchronize them, and intelligent query routing to use the right storage for the right query: columnar storage for analytics (e.g., find the top soon-to-be-sold-out products) and row storage for transactions (e.g., look up those five products). This is the architecture of MariaDB Platform.

WRITTEN BY MARIADB

PARTNER SPOTLIGHT

MariaDB Platform
MariaDB Platform is the enterprise open source database for data-driven, customer-facing applications and hybrid transactional/analytical workloads.

Category: DBMS
Open Source? Yes
New Release: MariaDB Platform X3, January 2019

Case Study
QBerg, a leading market research institute, collects one million price observations every day to provide its customers with pricing information on consumer goods. The database must support transactional workloads (e.g., read and write current data) as well as analytical workloads (e.g., analyze historical data). However, QBerg outgrew the analytical capabilities of its database, unable to grow beyond four months of historical data without exceeding performance SLAs. With MariaDB Platform, QBerg was able to provide its customers with more powerful, flexible, and faster self-service analytics, using row storage for reading and writing current data and distributed, columnar storage for real-time analytics on years' worth of historical data.

Strengths
• NoSQL – JSON for flexibility and transparent sharding for scalability
• Streaming integration – Apache Kafka and Apache Spark connectors
• Temporal – Go back and query data at a previous point in time
• Real-time analytics – columnar storage and parallel processing
• HTAP – streaming change-data-capture and workload-based query routing

Notable Customers
• Deutsche Bank • ServiceNow • Verizon • Nasdaq • Walgreens

Website: mariadb.com
Twitter: @mariadb
Blog: mariadb.com/resources/blog


Executive Insights on the State of Databases

QUICK VIEW

01. The keys to database effectiveness are security, speed, scale, and performance.

02. High-speed ingestion and queries are the best way to manage and benefit from the onslaught of data from an increasing number of sources.

03. The future of databases is fluid with IoT, artificial intelligence and machine learning, real-time computing at the edge, quantum computing, and the automation of it all.

BY TOM SMITH, RESEARCH ANALYST, DEVADA

To learn about the current and future state of databases, we spoke with and received insights from 19 IT professionals. Here's what they shared with us:

Key Findings
1. The holy grails of database effectiveness are security, speed, scale, and performance. These four features were more frequently mentioned than compliance, accessibility, availability, ease of use, and real-time data. Each database you deploy has to meet the needs of your use case, be it speed, scale, security, or performance. Clients are always looking for performance, and security is a major concern when moving from on-premise to the cloud.

2. In our interviews, our respondents did not come to a consensus on the best way to get a handle on the exploding volume of data from an ever-increasing number of sources. High-speed ingestion was the only suggestion mentioned more than once. It's incredibly important to ingest, store, and present data for querying. When people struggle, they've only figured out one piece of the puzzle. They may be able to ingest quickly, but they're not able to analyze and get insights. It's all about being able to capture the data and then create value from it at the same time. While data ingestion solutions like Spark and Kinesis have perfected the collection of data, there are still several challenges in the analysis and operationalization of the data.

3. The most significant changes to the database landscape over the past couple of years have been the increased support and offerings from cloud vendors and the explosion of choices from vendors. There has been a massive transition to fully managed database services in the cloud, which gives developers the ability to work with data to support both real-time transactional apps and deep analytics by using a single platform that minimizes data movement and allows them to extract value faster.

New kinds of databases have led to the development of an evolving set of tools. This expansion of solutions and toolsets has resulted in a real shift toward matching the right database tool for the right database job, and the number of databases that teams use has increased dramatically as a result.

4. When respondents were asked if they were seeing increased adoption in a database persistence model, graphs were the most common answer, followed by SQL/relational, document, time-series, and columnar databases. Given the growth of purpose-fit databases, it appears that more clients are looking to find the right tool to solve the particular business problem they are working to solve. There's also been increased adoption of database-as-a-service solutions as more companies move to the cloud and they can offload many administrative and operational functions.


5. The most frequently mentioned use cases are in financial services and retail, and revolve around forecasting, risk mitigation, and improving customer experience. Customer journey and hyper-personalization efforts in particular are being spearheaded by marketing across a number of industries and have specific data needs. Classical use cases like risk management, fraud detection, and inventory management are being rebuilt with built-in real-time analytics.

6. The most common issues with databases revolve around scale and knowledge. The number one issue is around scale, performance, and cost efficiency. You need to be able to design the database and IT stack to cope with the explosion of data. Data volumes are only going up. However, you must also be able to access and analyze the data. It's not very valuable unless you are gaining insight from it.

Data management can be difficult if you don't have a knowledgeable staff to handle regular database patches and updates, disaster recovery tests, and to implement and follow established processes and procedures.

7. Given the vast number of vertical-specific solutions available, the future of databases is fluid with IoT, artificial intelligence and machine learning, real-time computing at the edge, quantum computing, and the automation of it all. IoT is now producing unseen volumes and sets of data. Companies will use real-time analytics and enhanced BI reporting to build a data-driven culture. Leading database vendors are automating common DBA tasks like performance tuning, monitoring, disaster recovery, high availability, low latency, and auto-scaling based on historical workloads.

8. The biggest concerns regarding databases today are around security and complexity. Security of data at rest continues to be a major issue, as exemplified by breaches at Equifax, StackOverflow, and several other targets. We still have many companies that pay very little attention to database security. Poor database management practices and procedures are revealed with data breaches and cyber hacks.

The complexity and the sheer amount of data that's in the open is a concern. There are so many options to consider, it's hard to understand the subtle differences and determine the optimal solution for the business problem you are trying to solve. This is exacerbated by vendors who misrepresent what their products can do at scale. Every vendor should ship a utility to help customers verify the capability of the vendor.

9. To be more successful with database projects, developers need to understand the concepts of the databases they are working with as well as the foundational elements of AI/ML. Application developers need to have a rudimentary understanding of how data is being stored and accessed, as well as how the database is going to help you achieve the goal of the application. Do some upfront work with the database before you get into the specifics of how the application and database will work together. Be able to determine the best tool for the job.

Understand how to interact with databases. Understanding database architectures, database design, access paths/optimizations, and even how data is arranged on disk or in-memory is useful, particularly if you are a backend developer.

It's also beneficial to understand basic data modeling techniques, normalization versus denormalization, SQL, when to use foreign keys, execution plans, prepared statements, different types of joins available, data obfuscation, and encryption.

Here are the contributors of insight, knowledge, and experience:

• Ragu Chakravarthi, SVP and Chief Product Officer, Actian
• Joe Moser, Head of Product, Crate.io
• Brian Rauscher, Director of Support, Cybera
• Sanjay Challa, Director of Product Management, Datical
• OJ Ngo, CTO, DH2i
• Anders Wallgren, CTO, Electric Cloud
• Johnson Noel, Senior Solutions Architect,
• Adam Zegelin, SVP Engineering, Instaclustr
• Daniel Raskin, CMO, Kinetica
• James Corcoran, CTO of Enterprise Solutions, Kx
• Neeraja Rentachintala, V.P. of Product Management, MapR
• Mat Keep, Senior Director of Product & Solutions, MongoDB
• Philip Rathle, V.P. of Products, and Matt Casters, Chief Solution Architect, Neo4j
• Ariff Kassam, V.P. Products, NuoDB
• Dhruba Borthakur, Co-founder and CTO, Rockset
• Erik Gfesser, Principal Architect, SPR
• Lucas Vogel, Owner, Vogel Endpoint Systems
• Neil Barton, CTO, WhereScape

TOM SMITH is a Research Analyst at Devada who excels at gathering insights from analytics—both quantitative and qualitative—to drive business results. His passion is sharing information of value to help people succeed. In his spare time, you can find him either eating at Chipotle or working out at the gym. LinkedIn - Twitter



Database DevOps and the Elephant in the Room

The arguments about including the database in DevOps are now over, and discussions have moved on from whether to include it to how to introduce it.

The latest Accelerate State of DevOps Report, for example, talks about database version control and the automation of database changes alongside the application, and concludes: When teams follow these practices, database changes don't slow them down or cause problems when they perform code deployments.

Redgate's 2019 State of Database DevOps Report also revealed that 83% of organizations which have already adopted DevOps think it will take less than a year to automate and deploy database changes – and 45% say it will take less than six months.

They're doing so because it's been recognized that database DevOps stops the problems of database deployments and increases the speed of delivery of database changes.

That's the Good News
But then there's the elephant in the room I mentioned in the title of this piece: data privacy. A wave of data breaches and concerns over how personal data is used have prompted new data protection regulations to be introduced everywhere.

Since the enforcement of the GDPR in the EU, countries as far apart as New Zealand and China have enacted similar legislation, and the California Consumer Privacy Act is the first of many state laws to be introduced in the US.

What's important isn't the new legislation – it's the common strands that are emerging. One of those is that new regulations require personal data to be protected throughout the development process.

The problem is that the State of Database DevOps Report I mentioned earlier also revealed that 65% of respondents use a copy of their production database in development and test environments. The same database that contains the data that needs to be protected.

It's Not All Bad News
Rather than being a blocker to database DevOps, the requirement to protect data provides a blueprint for introducing a database DevOps process that helps companies comply with data privacy legislation. There are four natural steps.

Introduce standardized team-based development, using database version control to maintain a single source of truth for the code, encourage collaboration, and create an auditable record of who changed what, when.

Automate deployments by using a continuous integration process to trigger an automated build when changes are checked into version control. This tests the changes and flags up any errors, so a stable build is always available.

Monitor availability as well as performance by managing permissions, ensuring data is available and identifiable, and having a record of which servers and what data is being managed.

Protect and preserve data by introducing a data masking tool to pseudonymize and anonymize data in database copies outside production.

In this way, you can streamline processes, automate repetitive tasks, minimize errors, and release value to end users faster. All while still being compliant with new data privacy regulations.

WRITTEN BY MARY ROBBINS
PRODUCT MARKETING MANAGER, REDGATE

Databases Solutions Directory

Company Product Product Type Free Trial Website

Actian Actian NoSQL Object-oriented Free solution actian.com/data-management/versant-nosql-object-database/

Actian Actian X RDBMS Free solution actian.com/data-management/actian-x-hybrid-rdbms/

ActiveRecord ActiveRecord ORM Included in Rails guides.rubyonrails.org/active_record_basics.html

Aerospike Aerospike Server In-memory, KV Open source aerospike.com

Altibase Altibase HDB In-memory, NewSQL Open source altibase.com

Amazon Web Services DynamoDB KV, document, DBaaS Free tier available aws.amazon.com/dynamodb

Amazon Web Services SimpleDB Column store Free tier available aws.amazon.com/simpledb

Apache Foundation Apache Cassandra KV, wide column Open source cassandra.apache.org

Apache Foundation Apache HBase Wide column Open source hbase.apache.org

Apache Foundation Apache Ignite In-memory, Hadoop, data grid Open source ignite.apache.org

Apache Foundation Apache OpenJPA ORM Open source openjpa.apache.org



Apple Core Data ORM Included in iOS & macOS developer.apple.com/documentation/coredata

ArangoDB ArangoDB Graph, document, KV Open source arangodb.com

Atlassian ActiveObjects ORM Included in Jira & Confluence developer.atlassian.com/server/framework/atlassian-sdk/active-objects

CakeDC CakePHP ORM Open source cakephp.org

Cambridge Semantics Anzo Data discovery & integration Demo available by request cambridgesemantics.com/product

Canonical Storm ORM Open source storm.canonical.com

Couchbase Couchbase Server KV, document, data caching Open source couchbase.com

CUBRID CUBRID RDBMS Open source cubrid.org

DataStax DataStax Distributed cloud DB N/A datastax.com

Datical Datical Application release automation for DBs Demo available by request datical.com

DBmaestro Database Release Automation Continuous delivery for DBs Demo available by request dbmaestro.com/products/database-release-automation

Delphix Delphix Dynamic data platform Demo available by request delphix.com

Eclipse Foundation EclipseLink ORM Open source eclipse.org/eclipselink

Embarcadero InterBase RDBMS Available by request embarcadero.com/products/interbase

EnterpriseDB EDB Postgres Platform RDBMS Open source enterprisedb.com/products/edb-postgres-platform

FairCom c-treeACE NewSQL, KV direct access Available by request faircom.com/products/c-treeace

Firebird Firebird RDBMS Open source firebirdsql.org

Google BigTable Column store $300 credit over 12 months cloud.google.com/bigtable


How DataOps Is Changing Application Development

Another day, another buzzword that surfaces in the data management space with promises of changing the way things are done.

Although in this case—it could be with good reason.

DataOps, or data operations, refers to the next step in the evolution of DevOps teams, where data scientists, processes, and tools are increasingly integrated into the development process at data-focused enterprises. This makes it easier for companies to leverage real-time data to serve up delightful customer experiences and make better decisions at a moment's notice.

By unlocking the true power of DataOps, companies can build modern, data-rich applications in less time while operating with more agility and baking more automation into their processes to shorten the development lifecycle.

Companies that successfully implement an agile approach to data management are four times more likely to enjoy business outcomes that exceed shareholder expectations.

So, while it's still a relatively new concept, it comes as no surprise that one study found that 73% of organizations were planning to invest in DataOps during 2018. As the promise of DataOps is realized, we expect more and more companies to invest in the space in 2019 and beyond.

What is DataOps?
The names sound similar and the two practices are related. But DataOps is not just DevOps with data. Still, just as DevOps transformed the way modern applications are developed, DataOps promises to transform the way organizations use data and run data analytics, speeding up data-driven processes considerably.

Just as the agile approach to software development comes with a shiny Agile Manifesto, DataOps has its own manifesto, too. The DataOps Manifesto places emphasis on:

• Individuals and interactions over processes and tools
• Working analytics over comprehensive documentation
• Customer collaboration over contract negotiation
• Experimentation, iteration, and feedback over extensive upfront design
• Cross-functional ownership of operations over siloed responsibilities

The DataOps Manifesto also features a set of 18 core principles, including "continually satisfy your customer," "value working analytics," "embrace change," and "it's a team sport."

What are the Benefits of DataOps?
By now, you know that DataOps promises to bring the same speed, efficiency, and engagement to data teams (and, by extension, entire organizations) that DevOps brings to software development teams.

But how, specifically, do companies stand to benefit from adopting a DataOps mindset?

Here are several of the benefits DataOps brings to application development:

• Real-time insights
• Enhanced collaboration across the enterprise
• Improved agility and faster response times
• Faster bug fixes
• Better decisions and avoided costly missteps
• Higher quality products
• Increased employee engagement
• Stronger user experiences

The benefits of DataOps speak for themselves. Much like DevOps teams have enabled software companies to ship better products faster and with greater frequency, DataOps teams bring the same level of efficiency and data-driven decision-making to their organizations.

How to Achieve DataOps
Change is never easy, especially when employees have been doing things one way forever.

To launch DataOps at your organization, you first need to get your team to buy in to the new way of working. Beyond that, implementing DataOps at your organization requires the right suite of tools as well as an underlying active everywhere database that ensures contextual data is always available in real time—even when teams are globally distributed and applications are hosted in hybrid cloud environments.

While it might take some time to get your team to buy in to DataOps and master the associated new workflows, whatever time and resources you invest to make that happen will undoubtedly return significant dividends down the road.

Thanks to DataOps, your company will be able to move faster, make better decisions, and ship better products while impressing your customers and inspiring your employees.

A new buzzword—yes. But one that is perhaps buzzworthy.

WRITTEN BY MATT KENNEDY
SENIOR DIRECTOR, CLOUD SOLUTIONS, DATASTAX



Google Cloud Spanner RDBMS, NewSQL $300 credit over 12 months cloud.google.com/spanner

GridGain GridGain In-Memory Computing In-memory, Hadoop, data grid Free tier available gridgain.com/products/gridgain-products

Hazelcast Hazelcast IMDG In-memory, data grid Open source hazelcast.com/products

Hibernating Rhinos RavenDB Document Free tier available ravendb.net

IBM IBM DB2 RDBMS Available by request ibm.com/analytics/us/en/db2

IBM Informix RDBMS, transactional Free tier available ibm.com/analytics/us/en/technology/informix

InfluxData InfluxEnterprise Stream processing & analytics 14 days influxdata.com/products

InterSystems Cache Object-oriented, relational N/A intersystems.com/products/cache/#technology

JOOQ JOOQ ORM for Java Open-source version available jooq.org

MariaDB ClustrixDB NewSQL, transactional N/A mariadb.com/products/clustrixdb

MariaDB MariaDB Platform RDBMS, document, MySQL family Open source mariadb.com/products/mariadb-platform-transactional

MarkLogic MarkLogic Transactional Free tier available marklogic.com/product/marklogic-database-overview

MemSQL MemSQL In-memory, NewSQL Free tier available memsql.com/product

Micro Focus Vertica Enterprise RDBMS, column store Free tier available vertica.com/product/on-premise

Microsoft Entity Framework ORM Part of .NET framework docs.microsoft.com/en-us/ef

Microsoft SQL Server 2019 RDBMS 180-day preview available microsoft.com/en-us/sql-server/default.aspx

MongoDB MongoDB Document Open source mongodb.com

THE DZONE GUIDE TO DATABASES PAGE 40 OF 44 DZONE.COM/GUIDES

Company Product Product Type Free Trial Website

MyBatis MyBatis ORM Open source blog.mybatis.org/p/products.html

Neo4j Neo4j Graph Free tier available neo4j.com

Nhibernate Nhibernate ORM for .NET Open source nhibernate.info

NuoDB NuoDB NewSQL Free tier available nuodb.com

opentext.com/products-and- solutions/products/specialty- OpenText Gupta Available by OpenText RDBMS technologies/opentext-gupta- SQLBase request development-tools-databases/ opentext-gupta-sqlbase

oracle.com/technetwork/ Oracle JDBC ORM Free solution database/application- development/index-099369.html

MySQL Community Oracle RDBMS Open source .com Edition

Oracle Oracle Database RDBMS, document Free tier available oracle.com/database/index.html

oracle.com/technetwork/ Oracle Toplink ORM Free solution middleware/toplink/overview/ index.html

Open-source OrientDB OrientDB Document, graph, KV orientechnologies.com version available

OrmLite OrmLite ORM Open source ormlite.com

Percona Server for Open-source percona.com/software/mysql- Percona RDBMS, MySQL family MySQL versions available database/percona-server

Pivotal Gemfire In-memory, data grid Free solution pivotal.io/pivotal-gemfire

PostgreSQL PostgreSQL RDBMS Open source .org

RavenDB RavenDB NoSQL, document Free tier available ravendb.net

THE DZONE GUIDE TO DATABASES PAGE 41 OF 44 Company Product Product Type Free Trial Website

CI, source control, Redgate Software SQL Toolbelt 14 days red-gate.com/products change mgmt for DBs

Red Hat ORM Open source hibernate.org

In-memory, KV, data Redis Labs Redis Open source redis.io caching

In-memory, KV, data redislabs.com/redis-enterprise/ Redis Labs Redis Cloud Free tier available caching, DBaaS essentials

Basho Riak Document, KV Open source riak.com/products

In-memory, column- SAP SAP HANA Platform Free tier available sap.com/products/hana.html oriented RDBMS

In-memory, Hadoop, ScaleOut ScaleOut StateServer 30 days scaleoutsoftware.com/products data grid

Available by Software AG Adabas Stream processing adabas.com request

Database Performance DB performance solarwinds.com/database- Solarwinds 14 days Analyzer tuning performance-analyzer

Open-source splicemachine.com/product/ Splice Machine Splice Machine NewSQL, RDBMS version available features

SQLite SQLite RDBMS Open source .org

Studio 3T Studio 3T IDE for MongoDB Free tier available studio3t.com

SymmetricDS SymmetricDS DB replication Open source symmetricds.org

teradata.com/Resources/ Open-source Teradata Aster Database Specialist analytics Brochures/Aster-Database- version available Performance-and-Scalability

VoltDB VoltDB In-memory, NewSQL 30 days .com

Diving Deeper Into Databases

Zones

Database  dzone.com/database
The Database Zone is DZone’s portal for following the news and trends of the database ecosystems, which include relational (SQL) and nonrelational (NoSQL) solutions such as MySQL, PostgreSQL, SQL Server, NuoDB, Neo4j, MongoDB, CouchDB, Cassandra, and many others.

Big Data  dzone.com/big-data
The Big Data/Analytics Zone is a prime resource and community for Big Data professionals of all types. We’re on top of all the best tips and news for Hadoop, R, and data visualization technologies. Not only that, but we also give you advice from data science experts on how to understand and present that data.

Cloud  dzone.com/cloud
The Cloud Zone covers the host of providers and utilities that make cloud computing possible and push the limits (and savings) with which we can deploy, store, and host applications in a flexible, elastic manner. The Cloud Zone focuses on PaaS, infrastructures, security, scalability, and hosting servers.

Twitter
@SQLEspresso  @GFritchey
@KimberlyLTripp  @BrentO
@MMarie  @DBArgenis
@catherinew  @cmfinlan
@victoria_holt  @erinstellato

Refcardz

DevOps for Databases
Download this new Refcard to get started with Database Release Automation and eliminate bottlenecks. Learn the key best practices that your DevOps database solution should meet in order for you to get the most out of your investment.

Building Microservices With Redis
This Refcard will teach you about types of data processing in microservices, using Redis Streams for inter-service communication, optimizing microservices, and more.

Working With Time Series Data
As more connected devices are implemented and data is expected to be collected and processed in real time, the ability to handle time series data has become more and more important. Learn about the basics of time series data, how it can be used across industries, and using open-source tools to manage it.

Books

Database System Concepts
If you’re just getting started with databases, check out this book to learn all about database management.

Readings in Database Systems
Learn about the fundamentals of recent innovations and research in the databases world.

Fundamentals of Database Systems
Get clear explanations of database theories, designs, and systems.

Podcasts

SQL Data Partners
Learn about new and familiar topics in the worlds of SQL Server and professional development.

Voice of the DBA
Get a look at databases and the world from a data professional’s perspective.

DBAle


Visit the Database Zone

Managing the growing amount of data effectively is an ongoing concern. From handling event and streaming data to finding the best use cases for NoSQL databases, there’s plenty to unpack in the database community.

Keep a pulse on the industry with topics such as:

• Best practices in optimization
• Database performance and monitoring
• Handling event and streaming data
• Advancements in database technology

Visit the Zone

TUTORIALS CASE STUDIES BEST PRACTICES CODE SNIPPETS
