PAPER

EMBEDDED DATA MANAGEMENT FOR IOT AND EDGE DEVELOPERS

By John K. Waters

www.actian.com :: PAPER

EXECUTIVE SUMMARY

The advent of the Internet of Things (IoT) and the proliferation of Edge Computing have presented developers with a range of significant challenges, not the least of which is managing the massive amount of data fueling these disruptive trends. Third parties, organizational systems, and billions of end-point devices (everything from industrial sensors and actuators to smart phones and autonomous vehicles) are generating unprecedented volumes and varieties of data at previously unheard-of velocities. These “three Vs” define Big Data, and developers are expected to make the most of it in the IoT and Edge applications they build. They must not just manage it, but generate near real-time analytics at the point of action for business insights, while providing overall data governance and security.

This analysis layer adds complexity for developers, and the old client-server models won’t give them what they need in this environment. Whether you’re pushing your data to a terrestrial server right next door or to the cloud, the performance lag inherent in a system that puts the app on one machine and the data it needs on another becomes a defeating factor when the data must be processed in near real time. In manufacturing, health care, and financial services, for example, milliseconds matter. Nor will the practice of simply manipulating data in allocated memory or through the file system provide a solution going forward.

The solution for a growing number of developers lies in the embeddable database: the database and data management software that is effectively part of the application itself. This white paper examines the key concepts and enabling technologies of embeddable databases that IoT and Edge developers need to understand.

DEFINITIONS

Database expertise varies among developers, as does familiarity with the current terms of art in the IoT and Edge Computing space, so it makes sense to define a few terms. The truth is, the database is no longer a hunk of shrink-wrapped software deployed and tended by a database administrator (DBA). Data-intensive apps are becoming the norm, and developers need to understand them.

Cover Image: By Marisha/Shutterstock.com

Embedded Database
The focus of this paper is the embeddable database. This is a database (DB), plus various levels of database management software, integrated into or very tightly bundled with an application that needs direct or fast access to the data it is manipulating. The DB and the app using it run on the same machine and communicate via procedure calls, so there’s very little latency. For the purposes of this paper, the category does not include in-memory databases (IMDBs), in which the data is stored entirely in main memory. Those systems are light, simple, and fast, but provide no permanent storage.

Also, an embedded database isn’t an embedded system, which is special-purpose software with a dedicated function installed on a piece of hardware. As of this writing, embedded databases are used primarily by independent software vendors (ISVs), original equipment manufacturers (OEMs), and systems integrators (SIs). But the IoT and Edge ecosystem is expanding at warp speed, and the embedded database market is likely to expand with it.

Edge Computing
Edge Computing refers to a distributed information architecture in which data is processed where it is generated, far from a centralized datacenter, at the “edge” of the network. That processing is done on IoT devices and remote gateways. Edge Computing has also been described as a kind of decentralized data persistence that occurs on or near the devices that generate the data.

Fog Computing
Fog Computing, a term coined by Cisco Systems in 2014, effectively means the same thing as Edge Computing, though it’s more directly associated with …. As Cisco defines it, the Fog “extends the cloud to be closer to the things that produce and act on IoT data.” The term has been called a marketing take on Edge, but it has been catching on. (“Fogging” has even become a verb.)

Data Analytics
Data analytics is the process of assessing data for actionable insights. Embedded analytics is the use of reporting and analytic capabilities in transactional business applications. You can’t have embedded analytics without an embedded database.

IoT
The term is part of the popular lexicon now, but just to be clear, IoT refers to the vast and ever-expanding network of physical devices linked through the Internet and via enterprise intranets. There are billions of them now, everything from smart phones to temperature sensors, kitchen appliances to …. The software installed on them gathers, stores, and analyzes data, either locally or through a connection with a separate database.

IoT Gateway
An IoT gateway (sometimes called an intelligent gateway or a control tier) can take the form of a physical hardware appliance or a piece of dedicated software. It serves as the connection between the cloud and the universe of controllers, sensors, and intelligent devices.

Database Management System
A database management system (DBMS) is software that controls the storage, retrieval, deletion, security, and integrity of data within a database. Embedded database solutions are sometimes called embedded DBMSs.

Relational Database
A relational database is a DB that organizes information into sets of tables with columns and rows. Each table contains data that relates to data in the other tables. The relationships are predefined, and the data can be accessed or reassembled in different ways without having to reorganize the tables. These databases are designed for transactional, strongly consistent online transaction processing (OLTP) applications. Relational database management systems (RDBMSs) are preferred for OLTP, but not for online analytical processing (OLAP).

SQL
The Structured Query Language (SQL) is the language most commonly associated with relational databases. In fact, “relational database” is almost synonymous with “SQL database.” SQL statements are used both for interactive queries against a relational database and for gathering data for reports. SQL is considered a must-master tool for developers working with databases.

NoSQL
A NoSQL database is a non-relational database (no tables with data organized in rows and columns). NoSQL databases come with looser consistency models than traditional relational databases. They use a variety of data models, including document, graph, key-value, in-memory, and search. They’re optimized for applications that require large volumes of data, low latency, and flexible data models.

Flat File
A flat file is a plain text database from which all word processing and other structure characters or markups have been removed. It contains a single table of data with one record per line. One of the most common flat file examples is a comma-separated values (CSV) file. In a CSV file, table data is stored as lines of American Standard Code for Information Interchange (ASCII) text, with the values in each row separated by commas and each row represented by a new line.
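A CSV flat file has no engine behind it, so even a one-row update is the application’s problem. The sketch below assumes a hypothetical two-column `sensor_id,reading` layout with no header row. It shows the care that flat file CRUD logic needs:

- read every row;
- change the one matching record;
- write the complete file to a temporary file in the same directory;
- rename the temporary file over the original, so a crash mid-write does not leave a truncated file.

It deliberately ignores concurrent writers and some platform-specific durability details, both of which a real implementation would also have to handle.

```python
import csv
import os
import tempfile


def update_csv_record(path: str, key: str, new_value: str) -> None:
    """Change one record in a CSV flat file by rewriting the whole file."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    # No engine updates a row in place: every row is read and written back.
    for row in rows:
        if row and row[0] == key:
            row[1] = new_value
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as tmp:
            csv.writer(tmp).writerows(rows)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the new contents reach storage first
        # On POSIX systems this rename is atomic, so readers see either the old
        # file or the new one. A fully durable version would also fsync the directory.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a stray temporary file behind
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        csv_path = os.path.join(tmp_dir, "sensors.csv")
        with open(csv_path, "w", newline="") as f:
            csv.writer(f).writerows([["t1", "21.5"], ["t2", "19.0"]])
        update_csv_record(csv_path, "t2", "19.4")
        with open(csv_path, newline="") as f:
            print(list(csv.reader(f)))  # [['t1', '21.5'], ['t2', '19.4']]
```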

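SQLite ships with Python’s standard library as the `sqlite3` module, which makes it a convenient way to see the Embedded Database, Relational Database, and SQL definitions in action. This is only an illustration. The `devices` and `readings` tables are hypothetical, and SQLite stands in for embeddable databases in general rather than for any particular product.

The sketch shows three things:

- the application opens a database file inside its own process and talks to it through ordinary function calls, with no server involved;
- it joins two related tables with SQL;
- the data is still there after the file is reopened.

By contrast, a `:memory:` database (the IMDB case the definition excludes) is gone once its connection closes.

```python
import os
import sqlite3
import tempfile


def demo_embedded_db(path: str) -> list:
    """Store and query related tables through an in-process SQLite database file."""
    conn = sqlite3.connect(path)  # opens the file inside this process
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS readings (
            device_id INTEGER NOT NULL REFERENCES devices(id),
            celsius REAL NOT NULL
        );
        """
    )
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO devices (id, name) VALUES (1, 'boiler-sensor')")
        conn.executemany(
            "INSERT INTO readings (device_id, celsius) VALUES (?, ?)",
            [(1, 71.5), (1, 72.5)],
        )
    conn.close()

    # Reopen the file: the rows survive because they were persisted to disk.
    conn = sqlite3.connect(path)
    rows = conn.execute(
        """
        SELECT d.name, AVG(r.celsius)
        FROM readings AS r JOIN devices AS d ON d.id = r.device_id
        GROUP BY d.name
        """
    ).fetchall()
    conn.close()
    return rows


def memory_db_is_lost() -> bool:
    """An in-memory SQLite database disappears when its connection closes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.close()
    fresh = sqlite3.connect(":memory:")  # a brand-new, empty database
    tables = fresh.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    fresh.close()
    return tables == []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(demo_embedded_db(os.path.join(tmp_dir, "edge.db")))  # [('boiler-sensor', 72.0)]
    print(memory_db_is_lost())  # True
```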

ACID Compliance
ACID (Atomicity, Consistency, Isolation, and Durability) is an acronym for the four attributes of a transaction that guarantee its validity. All database transactions must be ACID compliant to ensure data integrity. The attributes are:
• Atomicity: in a transaction involving two or more discrete parts, if any part fails, the whole transaction fails.
• Consistency: data written to the database as part of the transaction must adhere to all defined rules and restrictions.
• Isolation: a transaction in process and not yet committed must remain isolated from any other transaction.
• Durability: all the changes made to the database are permanent once a transaction is successfully completed.

THE LANDSCAPE

Why should developers care about IoT, Edge Computing, and all this arcane database lore? They’re not DBAs, after all, and they’re already coping with some significant challenges associated with accelerating application release cadences.

Ignoring these trends is simply not an option. According to industry analysts at Gartner, within the next four years 75 percent of the data generated in the enterprise will be created and processed outside a traditional datacenter or cloud. Gartner analyst Santhosh Rao summarized the problem this trend is creating for IT organizations in a 2018 report (“What Edge Computing Means for Infrastructure and Operations Leaders”): “As the volume and velocity of data increases, so does the inefficiency of streaming all this information to a cloud or datacenter for processing.”

And virtually every industry on the planet is poised to invest significant resources in reaction to these trends. In another recently published report (“Worldwide Semiannual Internet of Things Spending Guide”), IDC analysts predict that IoT spending will reach $1.2 trillion in 2022.

These predictions should reassure developers considering their own IoT and Edge strategies. Developers may be most interested in leveraging APIs that have functionality behind them to deliver their prescribed outcomes. But the data matters too. Investing in technologies that process data closer to the edge of the network lets organizations analyze that data in near real time. That capability has quickly become essential across many industries, including manufacturing, health care, telecommunications, and finance. And those industries need developers with the skills, the right APIs, and the underlying data management functionality to make that happen.

USE CASES

The universe of use cases for embeddable databases is ever expanding, but a list of current notable examples would include:
• Environments with intermittent connectivity and limited bandwidth, where data is being ingested and, at a minimum, prepared, processed, or formatted for later analysis, and where a persistent disconnection would be catastrophic.
• Analytics embedded in the device or grid that require time, frequency, or other series data for real-time decision making (even better if it’s against a changing baseline dataset). Increasingly, simple analytics are being used locally to avoid a deluge of data in the cloud. (It’s not cheap to keep data in AWS or Azure, and fractions of pennies, even for depreciation on private cloud storage, add up.)
• Simple and unsupervised routines, particularly for nodal processing in deep learning networks, that over the next few years will reside on devices or just above them at a gateway level. This will require local persistent and distributed data management.
• Situations involving high data ingestion rates, such as video streaming.
• Mesh sensor networks with peer-to-peer communication and control.
• Local, intelligent sensor grids or heterogeneous sensors with interdependencies that need data governance, security, or performance. Here the concern is not the data or device itself, but the metadata that deals with them.
• Intelligent capital equipment with integrated instrumentation, complex processes, and regulatory oversight. Increasingly, expensive machinery and vehicles will be sold as a service, and the only effective way to do this is to instrument them extensively.

THE KISS CONUNDRUM

Another question these trends raise: why not just use flat files or SQLite, the popular open source database? Developers are admonished to “keep it simple, stupid,” and embeddable databases seem like a complication. Why reinvent the wheel? Because the complexities of data management and analysis are growing, and the ability to leverage that resource via embedded analytics is becoming a key competitive advantage. Developers must rethink their options while still trying to adhere to this adage.

An all-too-common strategy among developers today is to rely on the flat file systems of the operating systems running on their platforms or, in the absence of an OS, on local memory. Flat files are a disaster in IoT and Edge environments. Embedded database provider Actian made this point in a recent report (“Top 10 Reasons Friends Don’t Let Friends Use Flat Files”). To ensure flat file data consistency and avoid corruption, developers must carefully code the create, read, update, and delete (CRUD) logic that stores or retrieves data, and that code may need to vary across APIs and file systems.

SQLite is a multiplatform database widely deployed in the enterprise as an embedded data management solution. It’s popular because it’s quick and easy and leverages existing SQL developer expertise. It’s definitely a step up from flat file systems. But it has some serious drawbacks, as Actian points out in another report (“Ten Great Reasons to Upgrade to Actian Zen from SQLite”). It wasn’t meant as a database for multiple apps to use simultaneously. That becomes a serious limitation as apps evolve, adopt microservice architectures, and spread across virtual server instances and out to well-resourced client devices.

Also, using SQLite often requires data conversion and mapping across platforms. That can slow design and coding through multiple APIs, add ETL overhead, and generate maintenance and support headaches. Because it has no server capabilities, SQLite serializes writes: only one connection can write to a database at a time, which limits how well it handles concurrent reads and writes. And it doesn’t provide full ANSI SQL support, so some SQL calls embedded in application code require workarounds to move to or from SQLite.

Actian’s marketing director, Lewis Carr, summed up the situation succinctly in a recent interview: “Data is the fuel that drives these engines, and developers need the right components around that basic combustion engine. They need to replace the old carburetor with an overhead CAM 24-valve fuel injection system, so to speak.”

EMBEDDABLE DATABASE ESSENTIALS

So, what should developers look for in a modern, embeddable database solution? Generally speaking, an embeddable database needs to be small, fast, versatile, and reliable. It should have a short code path and be programmable for tight application coupling. But there are some specific features and capabilities IoT and Edge application developers will want to consider as they choose an embeddable DB.

SQL and NoSQL Support
Look for support for both NoSQL programmatic, API-based database access and SQL relational database access to a single data set. The SQL access covers reporting and local transactions. The NoSQL access provides performance, local analytics support, and the ability to leverage a wide range of programming APIs. This is a must-have feature.
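The dual-access requirement can be illustrated in a few lines, with SQLite standing in for an embeddable database. The `ReadingStore` class and its `put`/`get` methods are a hypothetical key-value facade, not the API of Actian Zen or any other product. Application code writes and reads records by key without composing SQL. A reporting query then aggregates the very same rows with ordinary SQL, so there is only one copy of the data to keep consistent. The demo uses an in-memory database purely to stay self-contained.

```python
import sqlite3
from typing import Optional, Tuple


class ReadingStore:
    """Hypothetical key-value style API (put/get by key) over one SQLite table.

    The same table remains available to plain SQL for reporting, so both
    access styles share a single data set.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings ("
            " key TEXT PRIMARY KEY, sensor TEXT NOT NULL, celsius REAL NOT NULL)"
        )

    def put(self, key: str, sensor: str, celsius: float) -> None:
        # Insert the value under `key`, overwriting any previous value.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO readings (key, sensor, celsius) VALUES (?, ?, ?)",
                (key, sensor, celsius),
            )

    def get(self, key: str) -> Optional[Tuple[str, float]]:
        # Direct primary-key lookup; returns None when the key is absent.
        return self.conn.execute(
            "SELECT sensor, celsius FROM readings WHERE key = ?", (key,)
        ).fetchone()


def average_by_sensor(conn: sqlite3.Connection) -> list:
    """SQL reporting over the rows written through the key-value API."""
    return conn.execute(
        "SELECT sensor, AVG(celsius), COUNT(*) FROM readings GROUP BY sensor ORDER BY sensor"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store = ReadingStore(conn)
    store.put("t1:0001", "t1", 20.0)
    store.put("t1:0002", "t1", 22.0)
    store.put("t2:0001", "t2", 18.5)
    print(store.get("t1:0002"))     # ('t1', 22.0)
    print(average_by_sensor(conn))  # [('t1', 21.0, 2), ('t2', 18.5, 1)]
```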

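The atomicity and consistency attributes from the ACID Compliance definition, and the local transactions that the SQL access covers, can be illustrated with a short transaction. This sketch again uses Python’s `sqlite3` module, with a hypothetical `stock` table. A CHECK constraint encodes the rule that quantities can never go negative. Moving a unit out of an empty bin violates that rule. Because both updates belong to one transaction, the increment that had already succeeded is rolled back too.

```python
import sqlite3


def make_stock_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # in-memory only to keep the sketch self-contained
    # The CHECK constraint is a consistency rule: quantities may never go negative.
    conn.execute(
        "CREATE TABLE stock (part TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    with conn:
        conn.executemany("INSERT INTO stock VALUES (?, ?)", [("spare", 0), ("valve", 5)])
    return conn


def move_one_unit(conn: sqlite3.Connection, source: str, target: str) -> bool:
    """Move one unit from source to target as one transaction; return True on commit."""
    try:
        with conn:  # commit if both UPDATEs succeed, roll back both if either fails
            conn.execute("UPDATE stock SET qty = qty + 1 WHERE part = ?", (target,))
            conn.execute("UPDATE stock SET qty = qty - 1 WHERE part = ?", (source,))
        return True
    except sqlite3.IntegrityError:
        # The decrement broke the CHECK rule, so the increment above was undone as well.
        return False


if __name__ == "__main__":
    conn = make_stock_db()
    print(move_one_unit(conn, source="spare", target="valve"))  # False
    print(conn.execute("SELECT part, qty FROM stock ORDER BY part").fetchall())
    # [('spare', 0), ('valve', 5)]
    print(move_one_unit(conn, source="valve", target="spare"))  # True
```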

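THE KISS CONUNDRUM says SQLite handles concurrent access poorly, and the built-in `sqlite3` module can show what that means in practice. The sketch opens two connections to one database file in SQLite’s default rollback-journal mode; the table and file names are hypothetical. While the first connection holds an open write transaction, the second can still read the last committed data, but it is refused when it tries to write. SQLite’s write-ahead logging (WAL) mode lets readers and a writer proceed together. In either mode, though, only one connection can write at a time.

```python
import os
import sqlite3
import tempfile


def try_concurrent_write(path: str) -> tuple:
    """Show SQLite's one-writer-at-a-time rule using two connections to one file."""
    # isolation_level=None puts both connections in autocommit mode so that
    # transactions are controlled explicitly; timeout=0 disables waiting on locks.
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        writer.execute("CREATE TABLE readings (sensor TEXT, celsius REAL)")
        writer.execute("BEGIN IMMEDIATE")  # take the database's write lock now
        writer.execute("INSERT INTO readings VALUES ('t1', 21.5)")

        # A reader still sees the last committed state: the table, without the new row.
        # fetchall() exhausts the cursor so the reader releases its shared lock.
        visible_rows = other.execute("SELECT COUNT(*) FROM readings").fetchall()[0][0]

        # A second writer is refused while the first write transaction is open.
        try:
            other.execute("INSERT INTO readings VALUES ('t2', 19.0)")
            second_write_blocked = False
        except sqlite3.OperationalError as exc:  # "database is locked"
            second_write_blocked = "locked" in str(exc)

        writer.execute("COMMIT")
        return visible_rows, second_write_blocked
    finally:
        writer.close()
        other.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(try_concurrent_write(os.path.join(tmp_dir, "edge.db")))  # (0, True)
```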
Modularity
Look for solutions with modular architectures that scale from a core set of libraries capable of single-user client data management, to a direct key-value store engine, and even up to a full-fledged, enterprise-grade server capable of supporting thousands of users on multicore VM cloud environments. Modularity should be considered a key requirement, because different IoT and Edge solutions have differing levels of resources.

At one end might be a 32-bit ARM-based sensor with a 16MB DRAM module and a 4GB flash drive, running Android Things. At the other might be a 64-bit Intel-based single-board computer embedded in an MRI machine, with 256GB of DRAM and a 1TB SSD, running Windows Server 2019. These are literal opposites, but both are part of the IoT app spectrum.

Multiple Platform Support
A critical feature: unlike the desktop world, which is still dominated by Wintel, the IoT and Edge Computing universe is fragmented. Developers must accommodate macOS, Android, variants of …, and more. This is also true of operational technology (OT) environments in industrial settings. The IoT and Edge picture is likely to remain very fragmented for the foreseeable future.

Time Series Data
A survey of IoT developers conducted by the Eclipse Foundation’s IoT Group in early 2018 found that a wide variety of data is being collected by these applications across all industries, but about 62 percent is time series data. Time series data consists of measurements or events that are tracked, monitored, downsampled, and aggregated over time. Examples include server metrics, application performance monitoring, network data, sensor data, and events. Time series data occurs wherever the same measurements are recorded on a regular basis. Device information and log data were second and third in the survey.

Also look for:
• 32-bit and 64-bit support on Intel and ARM
• Callable data management APIs from all popular programming languages
• Free to develop, with no hidden surprise costs (often interpreted as open source)
• A quick, basic subset of APIs that’s simple, straightforward, and easy to use: open, close, read, write, etc.
• A more comprehensive set of APIs that handles things you don’t want to do yourself or reinvent: index, sort, search, transpose a matrix, etc.


• No changes to the APIs as development moves from one platform or programming language to another
• Any changes required to deal with underlying file systems on different platforms, such as encryption/decryption, …, and recovery, should be abstracted and handled by the data management system
• The deployed software, including the embedded DBMS, should be developer-configurable, so deployment is simple and the DBMS doesn’t add configuration and management complexity
• Reporting should be something the developer can do remotely, or can set up to be preconfigured and automated. It should deliver the proper data in the proper formats to the developer, data scientist, or business and operational analysts
• ACID compliance
• No DBA: implementing an embeddable database should relieve the developer of the need for database administrator supervision

CONCLUSION

The world is rapidly filling up with “things.” Some industry watchers say there are already more Internet-connected devices on the planet than there are people. Consider a refrigerator scanning bar codes to compile a weekly grocery list, a fitness tracker sending heart rate information to a user’s smart phone, or industrial machines ordering their own maintenance checks. They’re all producing data that must be managed and analyzed close to the point of action.

Given the essential requirements for data management and analysis emerging at the edge of the network, it’s fair to characterize the embeddable database as the enabling technology of IoT and Edge application development. The question isn’t whether developers should employ this technology, but when. At the end of the day, the three Vs of Volume, Variety, and Velocity have to generate a fourth V: Value.

Find out more: ://www.actian.com/data-management/zen-embedded-database/

John K. Waters is a freelance journalist and author based in Silicon Valley. He is editor-at-large for Application Development Trends and a contributing editor to Campus Technology, T.H.E. Journal, and Law Technology News. He covers a wide range of topics in information technology and is the author of more than a dozen books. He writes news and features about trends, application and …, infrastructure and database technologies, cloud computing, open source software, standards and governance, …, and the people working in, and issues affecting, the information technology business.
