IDC TechScape: Internet of Things Analytics and Information Management

Maureen Fleming Stewart Bond Carl W. Olofson David Schubmehl Dan Vesset Chandana Gopal Carrie Solinger

IDC TECHSCAPE FIGURE

FIGURE 1

IDC TechScape: Internet of Things Analytics and Information Management — Current Adoption Patterns

Note: The IDC TechScape represents a snapshot of various technology adoption life cycles, given IDC's current market analysis. Expect, over time, for these technologies to follow the adoption curve on which they are currently mapped.

Source: IDC, 2016

December 2016, IDC #US41841116

IN THIS STUDY

Implementing the analytics and information management (AIM) tier of an Internet of Things (IoT) initiative is about delivering and processing sensor data, deriving insights from that data, and, at the moment of insight, initiating the actions needed to respond as rapidly as possible. To achieve value, insight to action must fall within a useful time window. That means the IoT AIM tier needs to be designed for the shortest time window of the IoT workloads running through the end-to-end system. It is also critical that the correct type of analytics is used to arrive at the insight.

Over time, AIM technology adopted for IoT will differ from an organization's existing technology investments that perform a similar but less time-sensitive or data volume–intensive function. Enterprises will want to leverage as much of their existing AIM investments as possible, especially initially, but will want to adopt IoT-aligned technology as they operationalize and identify functionality gaps in how data is moved and managed, how analytics are applied, and how actions are defined and triggered at the moment of insight. This IDC TechScape covering IoT AIM is designed to help enterprises:

. Learn more about the newer AIM technologies that support IoT
. Align these technologies with the enterprise's technology risk profile to determine what is ready to adopt and what should be monitored
. Gain a better understanding of where an IoT team will need to create skills and competencies as it plans to adopt newer AIM technologies

TECHNOLOGY MARKERS OF MOMENTUM

The AIM tier of IoT encompasses the following:

. Model discovery, training and design, and the appropriate infrastructure for managing the data associated with these major activities
. Technology used in production to collect and deliver data reliably to processing targets
. Integration to ensure data is in a format useful to target environments
. Options to support ancillary functions not included in most IoT platforms, as well as technology used by enterprises to build their own capabilities as needed
. Analytical software
. Thing registration, state, and device management
. Operational intelligence (OI) and monitoring to manage the larger systems and processes of things and related assets
. Low-code environments to describe the relationship of events to conditions and actions and to support IoT application development

Refer back to Figure 1, which fits the IoT AIM technologies into the appropriate curves. IoT is an emerging opportunity, and adoption of both IoT-specific and IoT-generalized AIM technologies for IoT is also early. We positioned each technology on the curves as an optimization of market adoption and technology maturity to show relative position as opposed to pure market adoption. If we looked only at market adoption, the labels would generally be too concentrated in the early sections of the curve to be legible.

Table 1 organizes AIM technologies into functional areas, the type of curve, and IDC's assessment of stage of adoption, risk level, speed of adoption, and years to market adoption maturity. IoT AIM consists of generalized AIM useful for IoT as well as IoT-specific technologies organized into the following categories:

. IoT data collection
. IoT data transport
. IoT data event services
. IoT data services
. IoT value-added data services
. IoT analytics
. IoT conditions and actions
. IoT visibility

The descriptions of each technology are listed in the same order as they are presented in Table 1.

TABLE 1

IDC TechScape Technology Markers of Momentum

Technology | Curve Type | Stage of Adoption | Speed of Adoption | Risk Level | Buzz | Years to Full Market Adoption

IoT platform Incremental Evaluate Fast Medium Medium 7

IoT edge data collection

Sensor data collection Incremental Deploy Fast Medium High 3

Historian Incremental Evaluate Medium Low Low 5

IoT data transport

Managed data transport Incremental Test Fast Low Low 2

Streaming data Transformational Test Fast Medium Medium 5

Streaming integration Transformational Evaluate Medium High Low 8

IoT data event services

Thing event store Opportunistic Evaluate Medium Medium Medium 5

Thing registry and device management Incremental Deploy Fast Low Medium 3

Thing state machine Transformational Test Fast Medium Medium 5

IoT data services

Dynamic data management Incremental Deploy Fast Medium Medium 5

Graph database Transformational Test Slow Medium Low 10


Hadoop Incremental Deploy Medium Medium Medium 5

In-memory data processing Transformational Deploy Fast High High 5

In-memory relational Incremental Deploy Medium Low Medium 5

Open data platform Incremental Evaluate Medium High Medium 6

IoT value-added data services

Blockchain Transformational Evaluate Slow High High 10

Data as a service Transformational Evaluate Fast Medium Medium 5

IoT analytics

Rich media analytics Opportunistic Deploy Fast Medium High 10

Statistical analysis Incremental Deploy Fast Medium Low 5

Streaming analytics Transformational Evaluate Medium Medium Medium 5

Supervised machine learning Incremental Evaluate Fast Medium High 10

Unsupervised machine learning Transformational Evaluate Medium Medium Medium 15

IoT conditions and actions

Low-code rules Incremental Deploy Medium Medium Low 7

Low-code app platform Opportunistic Evaluate Medium Low Medium 5

IoT visibility

Operational intelligence Opportunistic Evaluate Medium Medium Low 7

Source: IDC, 2016

IoT Platform

FIGURE 2

IoT Platform Markers of Momentum

Source: IDC, 2016

IoT platforms are a collection of core software components required to support IoT workloads. This includes:

. Registering and connecting devices to the network
. Maintaining sensor state data associated with each device
. Analytics
. Device management
. Application development
. Security

Many of the IoT platforms are offered as cloud software or related sets of IoT services, while others can be deployed on-premises in a datacenter or at the edge.

Examples of products include Amazon's AWS IoT Platform, Bosch IoT Suite, Cisco Jasper, GE Digital's Predix, IBM Watson IoT Platform, Microsoft Azure IoT Suite, Oracle IoT Cloud, PTC's ThingWorx, and SAP HANA Cloud Platform IoT services.

. Pros:
    . Is a relatively straightforward way to launch an IoT experiment or initiative
    . Speeds up the process of operationalizing IoT workloads
. Cons:
    . Locks into a single vendor for core IoT workload functions
    . Is not comprehensive and will require interoperability with missing pieces of an end-to-end middle tier

IoT Edge Technology

Sensor Data Collection

FIGURE 3

Sensor Data Collection Markers of Momentum

Source: IDC, 2016

Sensor data collection edge technology does exactly what its name implies: collects data from sensors. The data collected is persisted in memory or on disk until such time as it is converted as needed, analyzed, filtered, and forwarded via data transport technology. If a historian is also in use, the data may be persisted for a longer period of time to facilitate transaction management and/or replay capabilities.

Sensor data collection software — whether it is embedded or installed in a gateway device or offered as standalone server software or virtual machine software — must capture data transmitted by sensors over a variety of protocols, transform it into a format that can be transmitted over the internet or back into the originating protocol, and provide reliability mechanisms to request sensor data retransmission as well as security to prevent unauthorized access and untrusted delivery of data. It may also need to filter data to reduce outbound data volumes.
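As an illustration of these responsibilities, the sketch below shows a minimal edge collector in Python that reads simulated sensor values, applies a simple change filter, and forwards small JSON batches upstream; the endpoint, threshold, and read_sensor() stub are assumptions for illustration, not any vendor's collection API.

```python
# Minimal sketch of an edge sensor data collector, assuming a hypothetical
# read_sensor() source and a hypothetical upstream HTTP ingest endpoint.
import json
import random
import time
import urllib.request

UPSTREAM_URL = "http://central.example.com/ingest"  # hypothetical central target
CHANGE_THRESHOLD = 0.05                             # filter out insignificant changes
BATCH_SIZE = 10                                     # forward in small batches

def read_sensor():
    # Stand-in for a protocol-specific read (Modbus, OPC UA, BLE, etc.).
    return {"device_id": "pump-17", "vibration": random.random(), "ts": time.time()}

def forward(batch):
    # Convert to an internet-friendly format (JSON over HTTP) and transmit.
    data = json.dumps(batch).encode("utf-8")
    req = urllib.request.Request(UPSTREAM_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=5)
        return True
    except OSError:
        return False  # keep the batch buffered for retry on failure

buffer, last_value = [], None
for _ in range(100):
    reading = read_sensor()
    if last_value is None or abs(reading["vibration"] - last_value) >= CHANGE_THRESHOLD:
        buffer.append(reading)              # simple edge filtering
        last_value = reading["vibration"]
    if len(buffer) >= BATCH_SIZE and forward(buffer):
        buffer.clear()
    time.sleep(0.1)
```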

While there are IoT cloud services that directly collect sensor data, they require that transmission uses open application or messaging protocols, such as MQTT, HTTP, or AMQP. For that reason, we classify them as IoT data streaming services in the data transport section.

Depending on requirements, there is often a need to collect sensor data from a mobile edge, such as vehicles. When the edge is mobile, data may be collected using specialized embedded devices, such as National Instruments' CompactRIO (cRIO), or devices purpose built by the manufacturer. In that case, communications between the embedded device and a central aggregation source may require different networking or purpose-built communications systems. Embedded sensor data collection and specialized network communications are outside the scope of this IDC TechScape.

Examples of sensor data collectors include Intel's Wind River Intelligent Device Platform XT, MathWorks' ThingSpeak, PTC Kepware's KEPServerEX, and MuleSoft's Anypoint.

. Pros:
    . Decouples sensors from central data processing applications
    . Provides a level of data persistence at the edge on which edge analytics can be performed for faster response at the edge
    . Improves qualities of service in low and interrupted bandwidth environments
. Cons:
    . Requires compute and storage capabilities at the edge
    . Increases latency of data from the sensors to a central processing facility

Historian

FIGURE 4

Historian Markers of Momentum

Source: IDC, 2016

A historian maintains a local collection of sensor data and persists the information in storage for analysis and reporting, for transactional integrity, or for replay scenarios. Monitoring and reporting software accesses the data to provide situational awareness. In some industries, such as manufacturing, historians are mature and have been used for a long time to collect and process data into a time series.

There is some debate about whether the edge-specific historian will become obsolete in favor of the data being managed in the cloud. Whether historians continue as a permanent store for single-location use cases depends on where the data is least expensive or simplest to maintain, whether there is a narrow time window, and whether network conditions are unreliable. Where speed is of benefit, historians may need to be upgraded to support low-latency use cases. That said, a historian can itself become a thing that is accessible from a central source, and data from historians can be transferred on a regular basis to a centralized data store for use in training and discovery.

Process historians are available from a range of vendors that sell manufacturing technology, including OSIsoft, Siemens, Honeywell, and GE. Cisco acquired ParStream for its analytics database that can be deployed at edge locations. IBM offers Informix at the edge.

. Pros:
    . Event history is persisted at the edge, so if there is an issue with data transmission, it can be recovered.
    . Historical data at the edge can be analyzed for historical trends in isolation.
    . Data can be replayed in test or simulation environments.
. Cons:
    . Storage capacity is required at the edge.
    . Storage capacity can be complex to set up and manage.

IoT Data Transport Technology

Managed Data Transport

FIGURE 5

Managed Data Transport Markers of Momentum

Source: IDC, 2016

Managed data transport technology picks up data from files populated by the IoT collector or historian and subsequently sends the files to the target central data processing facility. Managed data transport technology is more likely to be used:

. To support applications where batch or microbatch frequencies meet the IoT data latency requirements of the solution
. As a rudimentary bridge between the collector and streaming data technologies, where incompatibility is an issue or where decoupling of the two components is desired
. To periodically send data from the historian to the data stores used for the discovery and training required for analytics

We use the term managed data transport because there are underlying choices about what technology to use. It is common to use managed file transfer (MFT) software, and it is also reasonable to use extract, transform, and load (ETL) technology. A file sync-and-share service can also be used in some applications.
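As a concrete illustration of the batch pattern, the sketch below moves collector or historian files from a local spool directory to a central inbox on a microbatch schedule; the paths are assumptions, and a real MFT or ETL product would add checksums, retries, encryption, and monitoring.

```python
# Minimal sketch of microbatch managed data transport: pick up files written
# by the collector or historian and hand them to a central target directory.
# Paths and the schedule are illustrative, not a specific product's behavior.
import shutil
from pathlib import Path

SPOOL_DIR = Path("/var/iot/spool")           # hypothetical files from collector/historian
CENTRAL_INBOX = Path("/mnt/central/inbox")   # hypothetical target the MFT layer delivers to

def transfer_batch():
    CENTRAL_INBOX.mkdir(parents=True, exist_ok=True)
    moved = 0
    for f in sorted(SPOOL_DIR.glob("*.jsonl")):
        shutil.copy2(f, CENTRAL_INBOX / f.name)  # a real MFT adds checksum/retry/encryption
        f.unlink()                               # remove only after a successful copy
        moved += 1
    return moved

if __name__ == "__main__":
    # In production this would run on a schedule matching the solution's
    # acceptable data latency (e.g., every few minutes).
    print(f"transferred {transfer_batch()} files")
```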

Examples of software vendors and products in this category include but are not limited to Attunity's MFT and Replicate; Axway; IBM's Sterling MFT, Aspera, and Datastage; Informatica PowerCenter; Box; Dropbox; and Egnyte.

. Pros:
    . Managed data transport technology is relatively mature, and many organizations with IoT projects are likely to already have ETL, MFT, or file sync and share in their portfolio. The issue is implementing agents at the stationary or mobile edge to handle secure transport.
    . Managed data transport technology can also facilitate data transport from the edge to target processing facilities if data streaming technology is not available or feasible.
    . Managed data transport technology can be used to decouple the collector from transport, offering a higher quality of service in situations where network connectivity is low or unstable.

. Cons:
    . Batch or microbatch will increase data latency between the edge and target processing facilities.
    . Central processing facilities may need to accommodate spikes of activity with each batch, depending on data volume.
    . Decoupling has benefits, but it implicitly adds another component to the solution that will need to be monitored, managed, and maintained.
    . Depending on the software used for the implementation, a heavier footprint may be required, implying sufficient processing and persistence capacity at the edge.

Streaming Data

FIGURE 6

Streaming Data Markers of Momentum

Source: IDC, 2016

Streaming data is the transport that facilitates the flow of data from a source to a target or, in some cases, multiple targets. Streaming data software transports data that is generated continuously and transmitted simultaneously in small sizes (on the order of kilobytes). Transmission is handled by messaging technology, by specialized agents that forward data, by services that continuously post data to a target, and, in some cases, by application-level coordination of communication using lower-level protocols, such as HTTP or MQTT. Some solutions handle streaming sensor events directly from the sensor client through a gateway to the targets and from a central source through a forwarder to the sensor client. Other solutions pick up sensor events from the collector, which has already converted the protocol to an IP-compatible format. The messaging that transports data streams may also serve as the queuing mechanism at the target to receive and queue data from multiple data streams.

Many organizations have already adopted streaming data for IoT. We list it as transformational because it is a core component of an event-driven architecture, which in its entirety is considered transformational.
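For illustration, the sketch below publishes sensor events over MQTT with the open source paho-mqtt client; the broker address, topic, and payload fields are assumptions, and a Kafka or AMQP producer would follow the same pattern.

```python
# Minimal sketch of streaming sensor events over MQTT using the paho-mqtt
# client; broker address, topic, and payload fields are illustrative only.
import json
import time
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.com", 1883)   # hypothetical broker endpoint
client.loop_start()                          # background network loop

for i in range(10):
    event = {"device_id": "pump-17", "temperature": 71.3 + i, "ts": time.time()}
    # QoS 1 asks the broker for at-least-once delivery of each sensor event.
    client.publish("plant/line1/pump-17/telemetry", json.dumps(event), qos=1)
    time.sleep(1)

client.loop_stop()
client.disconnect()
```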

Examples of technology in this category include but are not limited to messaging software such as Apache's ActiveMQ, Apache Kafka, MQTT-S, RabbitMQ, and ZeroMQ. Software for posting sensor events via REST APIs includes Google Apigee Link and Red Hat 3scale. IBM offers Bluemix Message Hub to connect its IoT platform to IBM's Hadoop Bluemix service.

. Pros:
    . Message queuing technologies offer higher quality-of-service levels than base transport protocols such as HTTP and MQTT.
    . Message queuing technologies are not new and as such have a lower level of risk associated with them.
. Cons:
    . HTTP and MQTT methods can result in tightly coupled systems, requiring the source to maintain history in the event of data transmission issues. Sending applications will need to manage potential breaks in network connectivity.
    . Message queuing services add another layer of complexity to the end-to-end solution, adding requirements for monitoring, management, and maintenance.
    . Message queuing services may insert additional latency into the data transmission.

Streaming Integration

FIGURE 7

Streaming Integration Markers of Momentum

Source: IDC, 2016

Streaming integration technologies are used to provide intermediary functionality between the edge and central processing facilities. Intermediary functionality may be required to perform protocol conversion, data normalization, and/or filtering. Streaming integration technology sits between the collector and the data stream, such as an API gateway into the data stream or a change-data-capture component listening to a collector's local database. It could also be a component that intercepts messages from the stream or the target message queue, processes the data, and forwards or puts the transformed data back into the stream.
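A minimal sketch of that intermediary role, assuming Kafka topics on both sides: a small process consumes raw events, filters and normalizes them, and republishes a curated stream. The broker address, topic names, and field names are illustrative only, not a specific product's configuration.

```python
# Minimal sketch of a streaming integration step: consume raw events from one
# topic, normalize and filter them, and republish to a curated topic.
# Uses the kafka-python package; broker, topics, and fields are assumptions.
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "sensor-raw",
    bootstrap_servers="kafka.example.com:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="kafka.example.com:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

for msg in consumer:
    raw = msg.value
    if raw.get("temperature_f") is None:        # drop incomplete readings (filtering)
        continue
    normalized = {                               # normalize units and field names
        "device_id": raw["deviceId"],
        "temperature_c": round((raw["temperature_f"] - 32) * 5 / 9, 2),
        "ts": raw["timestamp"],
    }
    producer.send("sensor-normalized", normalized)
```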

Examples of software vendors and products in this category include but are not limited to Apache NiFi, Hortonworks DataFlow, Informatica PowerCenter Real-Time Edition, Oracle GoldenGate, StreamAnalytix, StreamSets Data Collector, and Striim.

. Pros:
    . Streaming integration is useful if transport and/or data protocol conversion is required between the edge and the stream.
    . Streaming integration can also be useful to filter, normalize, and reduce the volume of data, relieving the pressure on stream bandwidth and central processing capacities.
    . Much of the functionality is borrowed from existing segments of data and application integration software markets, so there is low risk associated with the technology.
. Cons:
    . Streaming integration adds more components in the end-to-end solution, resulting in more points of failure that need to be monitored, managed, and maintained, and can increase latency in the data transmission process.

IoT Data Event Services

Thing Event Store

FIGURE 8

Thing Event Store Markers of Momentum

Source: IDC, 2016

Event stores capture and organize sensor data, adding to the store when new sensor data is delivered. A key attribute is the creation timestamp of the sensor event. Event stores are also created when streaming analytics is deployed. The event store can be queried by end users, applications, and time series–based analytical software. Event stores can also be used to backstream for testing and auditing purposes. Event stores are offered by some vendors as part of their IoT portfolio and also can be implemented using an in-memory time series database, data grid, or a general-purpose database that supports time series.
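The sketch below illustrates the core idea with an in-memory SQLite table standing in for a time series store: events are appended with a creation timestamp and queried back by time window. The schema and helper functions are assumptions for illustration, not any product's API.

```python
# Minimal sketch of a thing event store: append-only, timestamped sensor
# events that can be queried by time window. SQLite is only a stand-in for a
# time series or in-memory database; the schema is illustrative.
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE thing_events (
    device_id TEXT, metric TEXT, value REAL, ts REAL)""")

def append_event(device_id, metric, value, ts=None):
    db.execute("INSERT INTO thing_events VALUES (?, ?, ?, ?)",
               (device_id, metric, value, ts or time.time()))

def window(device_id, metric, seconds):
    # Return the time series for one metric over the most recent window.
    cutoff = time.time() - seconds
    return db.execute(
        "SELECT ts, value FROM thing_events "
        "WHERE device_id = ? AND metric = ? AND ts >= ? ORDER BY ts",
        (device_id, metric, cutoff)).fetchall()

append_event("pump-17", "vibration", 0.42)
append_event("pump-17", "vibration", 0.57)
print(window("pump-17", "vibration", seconds=3600))   # last hour of readings
```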

Examples of IoT-optimized products include GE Digital's Predix Time Series, InfluxData, and Basho's Riak TS. General-purpose databases that support time series, usually with a special index and logic that can perform time series analytical functions, include the DataStax version of Cassandra (DSE), Clusterpoint, and SAP HANA.

. Pros:
    . Is highly efficient for low-latency systems
    . Is part of an IoT event–driven architecture that simplifies access to the time series of sensor data
    . Can replace or serve as the data management aspect of a historian to provide similar benefits
. Cons:
    . Not broadly deployed
    . Increases complexity

Thing Registry and Device Management

FIGURE 9

Thing Registry and Device Management Markers of Momentum

Source: IDC, 2016

A thing registry is a database of the things in an IoT deployment. Things are the devices that are part of the IoT network. Each thing registered has an ID, name, and properties or attributes that are used to connect, collect information, and manage devices. Information about hardware and firmware version levels, install date, maintenance dates, and other static information about each thing is typically collected in the registry. Location of the thing may be static if it is fixed, but devices that are mobile would more likely have location as part of the thing state model.
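To make the structure concrete, the sketch below models a registry entry with the static attributes just described; the field names and the in-memory dictionary are illustrative assumptions rather than a specific platform's schema.

```python
# Minimal sketch of a thing registry entry holding static attributes
# (ID, name, hardware/firmware versions, install date, fixed location).
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ThingRecord:
    thing_id: str
    name: str
    hardware_version: str
    firmware_version: str
    install_date: str                    # static metadata; dynamic state lives elsewhere
    location: Optional[str] = None       # only for fixed installations
    attributes: Dict[str, str] = field(default_factory=dict)

registry: Dict[str, ThingRecord] = {}

def register(thing: ThingRecord) -> None:
    registry[thing.thing_id] = thing

register(ThingRecord("pump-17", "Cooling pump 17", "rev-B", "2.4.1",
                     "2016-03-02", location="Plant 1 / Line 3"))
print(registry["pump-17"].firmware_version)
```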

The thing registry is core to an IoT program, enabling connectivity, support of applications, and device monitoring and management. Some organizations also build their own registry using a database. A graph database, for example, is useful for registering a thing and its relationships.

Device management supports bulk operations related to devices, provides diagnostic information, and handles device actions, such as delivering and installing updates. While device management is not technically part of event data services, it is paired with the thing registry and thing state machine, and it makes sense to keep these together.

Examples of products include Amazon's AWS Thing Registry, Bosch's IoT Things and IoT Remote Manager, GE Digital's Predix Edge Manager, IBM Watson IoT Platform Foundation Device Management, Microsoft Azure's IoT Hub Device Identity Registry, and PTC's ThingWorx Foundation.

. Pros:
    . A registry provides a central repository of things connected to the network.
    . A registry can be used for analytics of lifetime, runtimes, service history, and inventories.
    . A registry can be used to identify location for MRO.
. Cons:
    . The registry will need to be maintained, and unless the things themselves are providing the data for the attributes in the registry, manual maintenance could become overwhelming.

Thing State Machine

FIGURE 10

Thing State Machine Markers of Momentum

Source: IDC, 2016

A thing state machine maintains the current status of a thing's sensors. While the thing registry maintains static information about a thing, the state model maintains the current status of information. Depending on the complexity, a thing state model may also consist of a series of state models. Depending on product capabilities, state machines can consist of direct sensor readings as well as calculated — or derived — state. This derived state may also use analytics to arrive at the state, for example, scoring the status of a derived property in the state model. Using an event-driven architecture built around publish-and-subscribe provides a way for multiple thing state models to subscribe to the same sensor data event, depending on the use case. State models may also be propagated from edge to cloud and across clouds. Ultimately, state models provide the thing state data required for custom and packaged IoT-related applications.

Not all IoT platforms have a state model construct and may choose to store state data in a time series database or a relational database. Depending on the complexity of the use case, enterprises may choose to build their own state models using NoSQL database technology.
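The sketch below illustrates the distinction between reported and derived state: each incoming sensor event updates the latest readings, and a calculated health score and status are recomputed from them. The scoring formula and thresholds are invented purely for illustration.

```python
# Minimal sketch of a thing state model: latest reported reading per sensor
# plus a derived property computed from those readings.
class ThingState:
    def __init__(self, thing_id):
        self.thing_id = thing_id
        self.reported = {}     # latest direct sensor readings
        self.derived = {}      # calculated state, e.g., a health score

    def apply_event(self, event):
        # A new sensor event updates direct state; derived state is recomputed.
        self.reported[event["metric"]] = event["value"]
        self._recompute()

    def _recompute(self):
        vib = self.reported.get("vibration", 0.0)
        temp = self.reported.get("temperature", 0.0)
        score = 1.0 - min(1.0, 0.6 * vib + 0.004 * max(0.0, temp - 60))
        self.derived["health_score"] = round(score, 2)
        self.derived["status"] = "alert" if score < 0.5 else "ok"

state = ThingState("pump-17")
state.apply_event({"metric": "vibration", "value": 0.7})
state.apply_event({"metric": "temperature", "value": 95.0})
print(state.derived)   # e.g. {'health_score': 0.44, 'status': 'alert'}
```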

Examples of products include Amazon's Device Shadows for AWS IoT, PTC's ThingWorx Thing Model, and Salesforce's Thunder and IoT Cloud.

. Pros:
    . The thing state model is an important asset in an event-driven architecture and for low-code environments, particularly for application development and where nontechnical subject matter experts (SMEs) are developing condition detection and response logic.
    . The thing state model makes it easier to distribute sensor data to all systems that need the data, particularly in decentralized systems where the design has multiple tiers managed by different vendors or products, such as an edge tier, a middle tier for machine-specific use cases, or an interaction tier for customer experience–centric use cases, and where there is an advantage in splitting up the design based on the assets required in each tier.
. Cons:
    . Not all IoT platforms have this capability, and it may require internal skills to develop and manage on an ongoing basis.
    . Not all organizations working on IoT projects are structuring around events, and some may be more comfortable using more familiar databases.

IoT Data Services

Dynamic Data Management

FIGURE 11

Dynamic Data Management Markers of Momentum

Source: IDC, 2016

A dynamic data management system can accept data without requiring that the structure and elements of the data be defined in advance. These include scalable data collection managers (the most common being Hadoop) and dynamic DBMSs. Because they do not require the use of SQL, dynamic DBMSs are sometimes called NoSQL database systems. There are two categories of dynamic DBMS:

. Semischematic, where the data may be governed by a schema, but one is not required (Any data may be entered into the database that conforms to the general data format of the DBMS if no schema is present. If a schema is present, it governs the data and optimizes database operation on that basis.)
. Nonschematic, where no schema is required, and any data conforming to the general format of the database may be added

The resulting collection of data may end up being rationalized under a schematic structure (in the case of semischematic), mapped on the basis of field names and values, or simply accessed by means of key-value pairs. Types of dynamic data management systems include:

. Document-oriented database systems: Document-oriented database systems manage data blocks containing fields that are identified according to a generally accepted document format. The two most common such formats are Extensible Markup Language (XML) and JavaScript Object Notation (JSON). Examples of products include Amazon DynamoDB, Couchbase, IBM Cloudant, and MongoDB.
. Key accessible database systems: Key accessible databases are nonschematic and store data in a way that supports random retrieval by key value or retrieval in key-value order. They are not true database management systems because they merely facilitate the storage and retrieval of data according to certain optimized techniques but do not actually manage the database per se — the applications do that. Examples of products include Amazon SimpleDB, Apache HBase, Basho's Riak, and Oracle NoSQL Database. This category also includes graph databases and Hadoop, which are covered separately.
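A brief sketch of the schema-on-read behavior described above, using the pymongo client for a document database: two readings with different shapes land in the same collection with no schema declared up front. The connection string and fields are assumptions for illustration.

```python
# Minimal sketch of "nonschematic" data management with a document database:
# documents of different shapes share one collection. Uses pymongo; the
# connection string and field names are illustrative only.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # hypothetical local instance
readings = client["iot"]["readings"]

# No schema is declared up front; each JSON document carries its own structure.
readings.insert_one({"device_id": "pump-17", "vibration": 0.42, "ts": 1480600000})
readings.insert_one({"device_id": "truck-03", "gps": {"lat": 47.6, "lon": -122.3},
                     "fuel_pct": 61, "ts": 1480600010})

# Queries work on whatever fields a document happens to have.
for doc in readings.find({"device_id": "pump-17"}):
    print(doc["vibration"])
```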

. Pros:
    . Faster, more flexible way to manage data, particularly data structures that change rapidly or do not lend themselves to an RDBMS
    . Low-latency response times
    . High scalability
. Cons:
    . This technology can't be used for applications that query using SQL.
    . There are skills gaps compared with SQL-based systems.

Graph Database

FIGURE 12

Graph Database Markers of Momentum

Source: IDC, 2016

Graph DBMS software manages data as graph structures. These contain objects sometimes called "nodes" or "vertices" with recursive attributed relationships, sometimes called "edges." The attributes of the objects and relationships are called "properties." Unlike a fully schematic database, the structure of a graph database is derived from the relationship structure that is found in the instance data.

Graph databases are used to capture and analyze extremely complex relationship instance structures. For example, a thing registry could logically be built in a graph database to make it easier to show relationships between things and networks of things as well as data flows. Graph databases are also used to support some types of machine learning.

Graphs are especially useful for discovering previously unknown or little understood relationships. These relationships can include those arising from behavioral patterns or coincident patterns of change. With respect to connected devices, these could be such things as tracking customers through shopping areas using their cell phone location data and correlating this tracking data with that of others to find useful patterns.

Another example comes from the automotive industry, where new cars are heavily instrumented, regularly transmitting data about the condition of the engine and wear on various parts of the vehicle. Combine the geolocation data from vehicles with coincident data about weather and traffic conditions, and it becomes possible to find patterns of relationships between engine and drivetrain wear, fuel consumption, and various combinations of weather (hot versus cold and dry versus wet) and traffic (heavy versus light). These patterns, in turn, may be analyzed to a level of detail that can better inform maintenance service intervals for specific locales and even future design changes.
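The sketch below shows the thing-registry-as-graph idea using the neo4j Python driver and Cypher: things become nodes, their reporting paths become relationships, and a traversal query walks the network. Connection details, labels, and relationship names are assumptions for illustration.

```python
# Minimal sketch of modeling a thing registry as a graph and traversing it.
# Uses the official neo4j Python driver; connection details are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Register two things and the relationship between them.
    session.run(
        "MERGE (g:Thing {id: $gid, type: 'gateway'}) "
        "MERGE (p:Thing {id: $pid, type: 'pump'}) "
        "MERGE (p)-[:REPORTS_THROUGH]->(g)",
        gid="gateway-1", pid="pump-17")

    # Traverse the graph: which things report through gateway-1?
    result = session.run(
        "MATCH (t:Thing)-[:REPORTS_THROUGH*1..3]->(g:Thing {id: $gid}) "
        "RETURN t.id AS thing_id", gid="gateway-1")
    print([record["thing_id"] for record in result])

driver.close()
```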

Examples of products include Neo Technology's Neo4j, IBM's Bluemix Graph, Ontotext GraphDB, OrientDB, Objectivity's ThingSpan (formerly known as InfiniteGraph), and DataStax's DSE Graph.

. Pros:
    . Unlike other NoSQL DBMSs, a graph DBMS is driven by instance relationships and so makes analysis of patterns and combinations of relationships relatively easy and fast. Unlike an RDBMS, which requires data to conform to a fixed relationship structure, a graph database reveals the relationships inherent in the data, with very little preparation ahead of the data load.
    . Because actions and consequences in a complex system generally result in changes to data relationship patterns, graph databases can help drive machine learning and other AI-related operations.
. Cons:
    . Because graph databases can make no assumptions about relationships and patterns of relationships in the data, preloading query optimization is not possible. This is different from an RDBMS, where the relationship structures are fixed in the schema, so query plans are typically optimized. This means that the work of graph databases must be focused on situations where relationship pattern discovery is primary; a graph database is not a substitute for an RDBMS. Because of the overhead involved in relationship management, it is also not a substitute for the relatively simple object-by-object processing of a document-oriented database system (e.g., JSON or XML).
    . Not all graph databases are good at all graph workloads. Some graph databases do text graphing well but fall down with large-volume object graphs. Some graph databases are better for relationship traversal (such as finding all objects with at least a fifth-degree relationship to a given object), while others are good at statistical patterns based on large numbers of related objects.
    . This area is still evolving. There is no one standard graph query language (such as SQL for relational), although TinkerPop is emerging as a framework, and Gremlin is its language. Neo4j offers a language called Cypher. SPARQL is sometimes used for graphs that represent semantic information structures. GraphQL is a graph data access method that uses a RESTful API, though its name would suggest a query language. There are various efforts under way to develop a common query language.

Hadoop

FIGURE 13

Hadoop Markers of Momentum

Source: IDC, 2016

Apache Hadoop is a cluster-based platform for ingesting and processing large volumes of data using a massively parallel processing (MPP) approach. It exists as a group of closely related Apache open source projects that provide software to manage the cluster, consolidate result data across the cluster, and handle various administrative functions. Closely related to Hadoop are HDFS, which acts as a cluster-based file system, and HBase, which runs on top of HDFS and acts as a key-value store (a simple NoSQL database that randomly stores and retrieves blocks of data based on unique key-value pairs). Also commonly used in this context is Apache Hive, a facility for defining the data in HBase for retrieval using standard SQL.

The normal mode of processing data, especially new data, in Hadoop is a programming technique called MapReduce. For IoT and machine learning cases, MapReduce has fallen out of favor as more users are turning to the high-speed in-memory processing of Spark, either coding natively or in conjunction with a query processing layer such as Spark SQL. Apache Spark is described in the data services section under in-memory data processing.
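For readers unfamiliar with the MapReduce style, the sketch below expresses a trivial job (counting sensor events per device) as a mapper and reducer in the Hadoop Streaming manner, run here against a small in-process sample; the input format and field names are assumptions for illustration.

```python
# Minimal sketch of the MapReduce processing style: a mapper emits
# (device_id, 1) per sensor event and a reducer sums counts per device.
import json
from itertools import groupby

def mapper(lines):
    # Emit (device_id, 1) for each raw sensor event (newline-delimited JSON).
    for line in lines:
        event = json.loads(line)
        yield event["device_id"], 1

def reducer(pairs):
    # Pairs are processed sorted by key; sum counts per device.
    for device_id, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield device_id, sum(count for _, count in group)

if __name__ == "__main__":
    sample = ['{"device_id": "pump-17", "vibration": 0.4}',
              '{"device_id": "pump-17", "vibration": 0.6}',
              '{"device_id": "truck-03", "fuel_pct": 61}']
    for device, total in reducer(mapper(sample)):
        print(device, total)
```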

Hadoop is commonly used in the following ways:

. As an initial ingest engine, accepting data as well as ordering, filtering, and formatting it and then delivering a subset for further processing either in HDFS or on another platform
. For the one-time or limited-frequency analysis of very large amounts of data
. For the long-term storage of data that ought to be retained but is accessed only occasionally
. As a clearinghouse or transformation platform as data is moved from system to system, sometimes as a substitute or replacement for an extract, transform, and load facility
. As a combination of the aforementioned bullets, commonly called a "data lake"

Apache Hadoop may be downloaded and used directly from the Apache website, but this requires considerable technical expertise and a willingness on the part of the enterprise to act as its own software tech support organization. Most enterprises choose instead to use a commercial packaged distribution of Hadoop, which comes with advanced management tools, professional support, and regular software updates ready to install.

Examples of commercial Hadoop distributions include Cloudera Enterprise, Hortonworks Data Platform (HDP), and MapR Converged Data Platform (which includes an indexed file system called MapR-FS and its companion NoSQL DBMS MapR-DB as substitutes for HDFS and HBase). Also, IBM bundles Hadoop into IBM BigInsights, Oracle bundles it into Oracle Big Data Appliance (OBDA), and similarly, Microsoft offers HDInsight. Amazon offers an AWS-optimized variant called Elastic MapReduce (EMR).

. Pros:
    . Is ultimately flexible and scalable; can accept any data of any size because the processing details depend on code
    . Is cost effective as a storage platform for huge amounts of searchable data, which is particularly useful for IoT long-term storage of sensor event data
    . Supports IoT discovery and training, which is critical to the ultimate success of IoT projects but is not part of an IoT platform
. Cons:
    . Hadoop applications must be coded. There is no schema and no optimizer. The user is responsible for the maintenance of the system and must do work that DBMSs normally do, such as data structure management and access optimization.
    . This is a batch-oriented system, so real-time processing of streaming data is not possible. Where streaming data is involved, it is usually a companion to some stream data processing engine, serving as a back-end storage facility for later processing of historical data after the fact.
    . Hadoop in its native form is not suitable for random data update and so should not be considered for transaction processing.

In-Memory Data Processing

FIGURE 14

In-Memory Data Processing Markers of Momentum

Source: IDC, 2016

In-memory data processing platforms enable large-scale data-centric operations to be carried out entirely in memory, without reference to storage. This sometimes takes the form of loading the data from some source (such as a database) into memory and maintaining it there for analytic query processing. It can also take place by managing the data in memory as a database, using snapshots and logs, or replication, to prevent data loss in case of system failure.

The most common of the former type of in-memory data processing platform is Apache Spark. This facility is run on a cluster, holds data in memory, and performs MPP-based queries on the data. It is optimized for speed. Spark is most commonly deployed on a Hadoop cluster, using the HDFS (or HBase) layer for its storage, but it is also run on top of the wide column database, Apache Cassandra, and can even run on its own clusters. This last configuration is becoming more and more common on AWS, where it uses the S3 layer for its storage.

Spark is popular for data operations on large data collections where an outcome is expected immediately or nearly immediately or to speed up time-consuming analytics training. This contrasts it with Hadoop MapReduce, which is not typically used for interactive query because of the batch nature of its processing. Spark is also used to collect streaming data, making it available for nearly immediate use.
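A minimal PySpark sketch of this pattern: sensor events are held and aggregated as an in-memory DataFrame without an intermediate write to storage. Cluster configuration is omitted, and the data and column names are illustrative.

```python
# Minimal PySpark sketch of in-memory processing: sensor events are cached
# as a DataFrame and aggregated interactively.
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("iot-inmemory-sketch").getOrCreate()

events = spark.createDataFrame([
    Row(device_id="pump-17", temperature=71.2, ts=1480600000),
    Row(device_id="pump-17", temperature=74.9, ts=1480600060),
    Row(device_id="pump-23", temperature=68.4, ts=1480600030),
])
events.cache()   # keep the working set in memory for repeated queries

# Interactive-style aggregation over the in-memory DataFrame.
events.groupBy("device_id").avg("temperature").show()

spark.stop()
```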

Examples of in-memory data processing include Apache Spark, Apache Flink, Apache Ignite, Databricks, and GridGain. In addition to the Hadoop distributors listed in the Hadoop section, there are many commercial Spark distributions. Databricks is a pure-play Spark distributor.

. Pros:
    . In-memory data processing is much faster than Hadoop MapReduce and is assuming increasing amounts of the latter's workloads.
    . Spark has a range of other projects and a growing ecosystem around it that are designed to add value and functionality to the basic platform. These include MLlib for machine learning, GraphX for graph support, Spark Streaming for streaming data ingestion, and Spark SQL. There are also examples of using Spark in combination with GPUs to speed up model training, particularly for highly complex use cases.
. Cons:
    . Like Hadoop, Spark and similar products require a lot of hand coding to make solutions work.
    . This category is still evolving. Spark, in particular, is evolving rapidly, and new versions are not always fully compatible with previous versions, which means that some adaptation of applications to successive versions of Spark may be necessary.

In-Memory Relational

FIGURE 15

In-Memory Relational Markers of Momentum

Source: IDC, 2016

In-memory relational technology is found in memory-optimized RDBMSs (i.e., they are optimized for the management of data in memory as opposed to in storage). Some of these databases are designed mainly for transaction processing, some mainly for analytical processing, and some do both. Typically, the analytic RDBMSs in this category are columnar, and most use a compression technique that not only saves memory but ensures that the data is organized optimally for query processing by enabling the entire microprocessor data cache to be used with data test operators (e.g., equals, not equals, greater than, and less than). This makes the use of single instruction multiple data (SIMD) operations possible, greatly increasing processing speed. RDBMSs that mainly process transactions typically hold the data in rows. Those that handle mixed workloads may hold some data in rows, some in columns, or, in some cases, in other formats designed to minimize instructions and memory access.

Some of these in-memory relational databases can accept streaming data at speed, allowing queries that include current and previously collected data to execute on a very timely basis. Other databases are simply designed to process transactions very quickly or support complex queries very quickly. All of these RDBMSs use various techniques including persistent transaction logging and snapshotting to ensure recoverability so that data loss is no more a concern with them than with storage-based RDBMSs.
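The point of these systems is familiar SQL over memory-resident data. The sketch below uses SQLite's :memory: mode purely as a conceptual stand-in; the products named below use columnar layouts, compression, and SIMD rather than anything resembling this illustration.

```python
# Conceptual sketch only: plain SQL over an in-memory relational table.
# SQLite's :memory: mode stands in for purpose-built in-memory RDBMSs.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (device_id TEXT, temperature REAL, ts INTEGER)")
db.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("pump-17", 71.2, 1480600000),
    ("pump-17", 95.1, 1480600060),
    ("pump-23", 68.4, 1480600030),
])

# Familiar SQL over the in-memory table: which devices are running hot?
rows = db.execute(
    "SELECT device_id, MAX(temperature) AS max_temp "
    "FROM readings GROUP BY device_id HAVING max_temp > 90").fetchall()
print(rows)   # [('pump-17', 95.1)]
```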

Examples of in-memory RDBMSs include Altibase, deepSQL, MemSQL, Oracle TimesTen, SAP HANA, and VoltDB.

. Pros:
    . SQL is the most commonly understood query language in the IT world, and these products are optimized for it.
    . In-memory relational technology delivers speed with structure in a familiar format.
. Cons:
    . Requires the data to conform to the schema of the database, so it is really only usable where the data is well understood and its format does not change much
    . Requires systems with large amounts of memory, which could be a cost concern

Open Data Platform

FIGURE 16

Open Data Platform Markers of Momentum

Source: IDC, 2016

In IoT, the open data platform is a combination of technologies, integrated together, that enable the management of and access to relevant IoT and enterprise data regardless of where it sits and what its format is. The components of such a platform usually include a data integration engine capable of dynamic data integration (rather than batch data integration), composite data frameworks for federation and virtualization, data transports, connectors to established databases, stream processing for incoming streaming data, and a central processing engine; increasingly, these components are housed within Hadoop.

Examples of the use of such a platform in an IoT context could include such things as an automotive service center comparing readings from sensors in your vehicle with data regarding recent problems and service to determine whether a service call is in order, a bank with a smart app on your phone comparing movements and purchases with your general purchase pattern to determine if there is a risk that your phone was stolen, and a utility company comparing patterns of electricity usage from smart meter data with historical patterns to determine if changes in the distribution of power on the grid are warranted.

The open data platform is synonymous with the term unified data platform. In some cases, streaming data and streaming integration software are used in conjunction with broad database and data management capabilities to offer a comprehensive open data platform. Enterprises may also choose to build their own from the four core major components.

Examples of products that deliver such functionality include Teradata with its Unified Data Architecture (UDA), Informatica with the Informatica Platform, PluralSoft with a unified data architecture focused on healthcare, and IBM Watson Data Platform.

. Pros:
    . Ensures data consistency and provides access to data and the ability to find what you need when you need it
    . Offers an opportunity to provide business context for IoT data through integration of in-motion and at-rest data
. Cons:
    . Open data platforms are early in their development and deployment. Many of these offer a set of capabilities that need to be assembled for technical and business use cases, and assembly may not be trivial. As these platforms become more widely used, standard architectures and best practices will emerge, but for now, this represents a high-risk component.
    . Nontrivial assemblies lead to complex monitoring, management, and maintenance.

IoT Value-Added Data Services

Blockchain

FIGURE 17

Blockchain Markers of Momentum

Source: IDC, 2016

Blockchain provides a decentralized chain of trust for transactions against an object. Blockchain originates from bitcoin, and many of the first applications of blockchain technologies are focused on financial services: payments, equities, and money transfers. However, blockchain can be applied beyond financial transaction use cases to provide a chain of trust for any type of transaction against any type of object — real or virtual. The value of the blockchain is that it can be trusted, and it is distributed, not centralized, providing full provenance of the data on the chain.

Blockchain in IoT can be used to validate that data being received from a thing is actually from that thing and not an imposter. Likewise, instructions from a source to update a thing can also use blockchain for validation. Blockchain can also be used to represent the most recent state of a thing, potentially as an alternative to the thing registry and state model because the blockchain keeps an immutable record of the history of the thing, and could represent the current state. Every trusted application that needs access to data about the thing will have a local copy of the thing's chain. When new blocks are added, the distributed chain is also updated. However, these are still speculations on how the technology could be applied. There is a lot of work and innovation yet to happen before the most appropriate use cases of blockchain in IoT emerge.
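For intuition only, the sketch below shows the chained-hash idea that underlies blockchain integrity: each block commits to the hash of the previous block, so tampering with device history is detectable. A real blockchain adds distribution, consensus, and signing, all of which are omitted here.

```python
# Minimal sketch of a chained-hash ledger for device data: altering any
# earlier block breaks verification. Not a distributed blockchain.
import hashlib
import json

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, payload):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "payload": payload})

def verify(chain):
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
append_block(chain, {"device_id": "pump-17", "firmware": "2.4.1"})
append_block(chain, {"device_id": "pump-17", "vibration": 0.42})
print(verify(chain))                     # True
chain[0]["payload"]["firmware"] = "9.9"  # tamper with history
print(verify(chain))                     # False
```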

Blockchain in IoT is still very much in its infancy, although some vendors are releasing technology building blocks, including the IBM Watson IoT Platform and Chronicled, which has launched an Ethereum-based IoT registry. Slock.it is a start-up at the intersection of blockchain and IoT applications in the sharing economy.

It is also not clear whether the blockchain technology used in bitcoin will be exactly the same as that used in IoT use cases or whether the term will come to represent, more loosely, an ultra-secure method for guaranteeing decentralized data integrity. For example, Ericsson's data integrity service is based on a keyless signature infrastructure (KSI) that offers similar guarantees and has similar constructs. GE and Ericsson offer this in the GE Predix catalog as a blockchain-enabled service, which digitally signs and verifies data to assure that configurations, firmware, and data have not been compromised, in addition to providing a blockchain-like chain of custody. This does not appear to be based on a bitcoin-style proof of work.

. Pros:
    . Data in a blockchain is tamper proof and does not exist in a single location, so it cannot be maliciously modified.
    . There is no single thread of communication that can be intercepted, preventing man-in-the-middle attacks from occurring.
    . Blockchain technology may enable autonomous functioning of smart things without the need for a centralized authority.
    . Every participant in the chain has the most recent version of the truth and state of each thing.
. Cons:
    . Latency of transaction validation in a blockchain network is still very high because every actor needs to agree that the transaction is valid.
    . Integration of blockchain technologies with legacy systems in existing organizations may be daunting.
    . There is still a lot of research and development happening with blockchain, making this a high-risk technology at this point.

Data as a Service

FIGURE 18

Data-as-a-Service Markers of Momentum

Source: IDC, 2016

Data as a service (DaaS) represents the data and/or content that is produced or derived as a by-product of the usual economic activity in commercial and public sectors. These data assets may be in the form of raw data or various value-added content such as lists, data feeds, scores, algorithms, recommendations, or benchmarks. DaaS offerings are consumed to improve various types of analytics, with the ultimate goal of improving the quality of decisions. DaaS also represents the opportunity for organizations to sell their own data to third parties, either as raw data or as a component that enhances existing products and services.

For example, inclusion of weather or location data — two ubiquitous DaaS options — can enhance predictive asset maintenance or logistics optimization processes. Organizations providing DaaS include those in commercial enterprises and government agencies that generate the original raw data and companies that locate, extract, mine, aggregate, enrich, and/or curate data for resale. There is a broad range of data providers, brokers, and marketplaces.

In IoT, there are a handful of general-purpose DaaS, such as weather and location data, but there are also many other specialized, industry-specific, and business process–specific data services. Examples of data services include GE SmartSignal, Michelin solutions, Volkswagen Car-Net, IBM's The Weather Company, Pirelli, MyJohnDeere.com, and Verizon's Precision Market Insights.

. Pros:
    . As consumers of external (third-party) data, organizations can enhance their analytic models with the availability of more data and augment their things master data.
    . As producers of data or various derived value-added content, organizations have the opportunity to monetize such data assets either directly (by selling data to third parties) or indirectly (by incorporating data into other services they provide).
. Cons:
    . Use of external data can create additional challenges in data integration and data integrity management.
    . Monetizing one's data is a complex task that requires creation of a strategy and specific plans for packaging, pricing, and ongoing maintenance and delivery of such data products.

IoT Analytics

Rich Media Analytics

FIGURE 19

Rich Media Analytics Markers of Momentum

Source: IDC, 2016

Rich media analytics solutions identify objects, entities, events, attributes, or patterns of behavior (including temporal and spatial events, either in real time or post event) through the detection, determination, and analysis of video and image data. Use cases for these solutions include security, object identification, video monitoring/tracking, image search, automatic alerting, forensic analysis, image categorization, and pattern, image, and shape recognition.

The amount of rich media data that needs to be analyzed and understood is increasing exponentially with growth of the internet and mobile devices that capture images and videos on a more or less constant basis. However, IDC estimates that much of this data is useless unless some type of analytics is applied to it.

The market and opportunities for image and video analytics is growing significantly. Many organizations would like to be able to monetize images for ecommerce. In addition, there is increased interest in automated solutions for video surveillance — of human and nonhuman activity. Organizations are also looking at using video and images as part of the data needed to understand and improve customer experiences, along with social media data, geolocation information, and transactional sales data. Video data and video surveillance are being used in a variety of ways by many different organizations. Governments and enterprises are primary users of image and video analytics today.

Companies offering image and video analytics include Hitachi, Fujitsu, NEC, Sony, JustVisual, HPE, IBM, Clarifai, Cortexica, Ramp, Aventure, IntelliVision, 3VR, Accenture, and ObjectVideo.

. Pros:
    . The exponentially increasing amount of image and video content offers an opportunity to apply rich media analytics technology to extract valuable information and knowledge.
    . Image and video analytics can add another dimension to text-based knowledge in diverse areas from healthcare to terrorism investigation to Internet of Things.
    . Image and video analytics can provide real-time feedback and information for cognitive decision making in areas such as robotics, drones, and driverless vehicles.
. Cons:
    . This area is still emerging, and identifying objects, patterns, and visual cues correctly can be prone to errors depending on the algorithms and tools used.
    . Many of these tools use extensive amounts of machine learning, which is highly processing intensive.
    . Relating entities and objects from video to textual records and information can be challenging.

Statistical Analysis

FIGURE 20

Statistical Analysis Markers of Momentum

Source: IDC, 2016

Statistical analytics software includes packages that use a range of statistical techniques to create, test, and execute models for analyzing IoT data. This genre falls into the advanced and predictive analytics software segment of business intelligence and analytics tools. Sample techniques used include descriptive and predictive analysis, regression, and clustering.

Statistical analytics is used to discover relationships in data that are hidden, not apparent, or too complex to extract otherwise, and to make predictions when there is not enough data for other types of modeling. An example of a use case in IoT is predictive maintenance, where analysis of sensor data predicts which components will be in imminent need of maintenance.

Most statistical analytics packages use programming languages that might be proprietary or open source or a combination. Most packages also include a graphical user interface that allows analysts to interact with the software and build models with no or minimal programming.
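A minimal sketch of the predictive maintenance example using an ordinary linear regression from SciPy: hours-to-failure is regressed on vibration level, and the fit is used to flag a component that needs attention. The data values and the 400-hour rule are invented purely for illustration.

```python
# Minimal sketch of a statistical analysis for predictive maintenance:
# regress hours-to-failure on vibration level and apply the fit.
from scipy import stats

vibration        = [0.20, 0.35, 0.50, 0.65, 0.80, 0.95]   # observed sensor level
hours_to_failure = [920,  760,  610,  430,  300,  150]    # from maintenance records

slope, intercept, r_value, p_value, std_err = stats.linregress(vibration, hours_to_failure)
print(f"fit: hours = {slope:.1f} * vibration + {intercept:.1f}, r^2 = {r_value**2:.3f}")

current_vibration = 0.72
predicted_hours = slope * current_vibration + intercept
if predicted_hours < 400:
    print(f"schedule maintenance: ~{predicted_hours:.0f} hours of margin predicted")
```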

Examples of products include SAS Analytics and SAS Enterprise Miner, IBM SPSS, SAP Predictive Analytics and SAP InfiniteInsight, and Oracle Data Mining. SPSS is a component of IBM Watson IoT Platform. In addition, open source modeling languages are commonly used by advanced data analysts.

. Pros:
    . This technology can be used where large gaps exist in data models or where data models are incomplete.
    . Data can be easily imported from Excel files or other formats.
    . A variety of statistical techniques for analyzing data can be used. Most packages allow power users to use programming languages for complex analyses that cannot be done with graphical user interfaces.
. Cons:
    . These tools can be fairly complex to use. Users need to be sophisticated in concepts of statistics, data mining, and programming to take full advantage of the capabilities of these tools.
    . Statistical packages typically cannot account for all factors that might affect an outcome, especially those that cannot be expressed as structured data.
    . These tools are not suitable for data manipulation or data preparation. They assume that the data is cleansed, validated, and prepared, and hence bad data will result in poor predictions.

Streaming Analytics

FIGURE 21

Streaming Analytics Markers of Momentum

Source: IDC, 2016

Streaming analytics continuously evaluates and correlates events to detect anomalies and conditions requiring further action. Events are received and correlated one at a time in real time or in microbatches, and the logic typically involves a time window. A condition model managed within the event processing engine describes:

. The relationship of two or more events to each other
. The relationship of two or more data elements within a single event
. The relationship of a new event and the math or logic that should be applied to the event
. The comparison of a desired state and the current state

A simple correlation example is evaluating the newest vibration sensor data event associated with a machine against a threshold level, while more complex models ingest streams of data from multiple sources to identify more complex patterns that may be more appropriate for systems of things rather than individual things.

Models can include calculations of statistical probabilities and use of algorithms, rulesets, or code that describe a condition. Models can describe the presence or an absence of a pattern as well as time logic between data events within a time window. Streaming analytics in runtime tends to be publish-and-subscribe, with nodes listening for the results of a previous node. This software often constructs a series of nodes that handle individual processing steps, such as ingesting a sensor event and correlating it to others, and if the node has an output (a derived event), it serves as input to the next node or any node subscribing to that output. The next node may enrich the derived event by looking up customer information or thing information, with that output serving as input to the next node, which may apply a rule or algorithm to determine the next best action.
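The sketch below illustrates a simple condition model of the first kind: each new vibration event is correlated with the events in a sliding time window, and a derived event is emitted when the windowed average crosses a threshold. Window length, threshold, and field names are illustrative assumptions.

```python
# Minimal sketch of streaming analytics condition detection over a sliding
# time window; thresholds and window size are illustrative only.
from collections import deque

WINDOW_SECONDS = 60
THRESHOLD = 0.6
windows = {}   # device_id -> deque of (ts, value)

def on_event(event):
    dev, ts, value = event["device_id"], event["ts"], event["vibration"]
    window = windows.setdefault(dev, deque())
    window.append((ts, value))
    while window and window[0][0] < ts - WINDOW_SECONDS:   # expire old events
        window.popleft()
    avg = sum(v for _, v in window) / len(window)
    if avg > THRESHOLD:
        # Derived event: downstream nodes (enrichment, rules) would subscribe to this.
        return {"device_id": dev, "condition": "vibration_high",
                "window_avg": round(avg, 2), "ts": ts}
    return None

stream = [
    {"device_id": "pump-17", "ts": 0,  "vibration": 0.40},
    {"device_id": "pump-17", "ts": 20, "vibration": 0.70},
    {"device_id": "pump-17", "ts": 40, "vibration": 0.85},
]
for e in stream:
    derived = on_event(e)
    if derived:
        print(derived)
```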

The term streaming analytics was first used as a subcategory of complex event processing (CEP). Today, the term is used synonymously with CEP. There are a variety of software products that handle streaming analytics, both open source and proprietary. Many value-added offerings are extending Spark and Spark Streaming.

Examples of products in this category include Apache Kafka Streams, Apache Storm, Apache Spark Streaming, AWS Kinesis, IBM Streams, Microsoft Azure Stream Analytics, Salesforce Thunder, SAS Event Stream Processing, SQLStream, and TIBCO's StreamBase and BusinessEvents.

. Pros:
    . Can be used when requirements call for low-latency detection of conditions, particularly under high data volume conditions
    . Is a central component of event-driven design that is oriented to decision support and decision automation
    . Can be used in a compact way at the edge compared with other analytical techniques
    . Can plug in machine learning as part of a stream, supporting hybrid cognitive/programmatic use cases
    . Can be used for preprocessing events that need to be correlated before moving to a different analytical environment
. Cons:
    . Some organizations may opt not to use streaming analytics when they have many applications that rely on mature data management systems. Instead, these organizations will improve their data refresh rates and forego event-driven design.
    . Popular open source–based streaming analytics software is less mature, missing many of the key elements that are present in proprietary streaming analytics.
    . There is a scarcity of developer skills in the use of streaming analytics.

Supervised Learning

FIGURE 22

Supervised Learning Markers of Momentum

Source: IDC, 2016

Supervised machine learning begins with examples of training data paired with identifying labels (e.g., right or wrong and positive or negative) selected from the categories to be learned. Using these pairs of example data and labels ("training data"), the system learns parameters of statistical models that it can then generalize to unlabeled examples of data items that were not seen in the training data ("test data"). In most cases, the learned models improve over time via a feedback loop that adjusts the model parameters to better reflect additional sets of training or production data. The performance of a learned model can be measured by simple prediction accuracy or by the particular business metric the learned model is designed to support. Performance depends on the degree to which the training data matches the real world, the choice of algorithm, the algorithm's parameters, and the quantity of data.
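The train-then-generalize loop described above can be sketched in a few lines. The example below uses the open source scikit-learn library (chosen only for brevity; it is not one of the packages named in this section) on synthetic labeled data, and the feature construction and choice of a random forest are assumptions for illustration, not a recommendation.

```python
# Minimal supervised-learning sketch: fit a model on labeled training pairs,
# then measure how well it generalizes to unseen test data. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Synthetic sensor readings (features) and a binary "failure" label
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=1000) > 0).astype(int)

# Hold out test data the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)               # learn model parameters from labeled pairs

# Prediction accuracy is the simplest performance measure; a business metric
# (e.g., the cost of missed failures) can be substituted.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```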

Companies like IBM, IPsoft, Wipro, Intel's Saffron Technologies, Infosys, CognitiveScale, and Tata Consultancy Services include machine learning capabilities in their cognitive system platforms that allow developers and enterprises to build cognitively enabled "smart" applications that learn over time.

In addition, vendors such as Google, Amazon, Microsoft, and Skytree offer commercial machine learning libraries as standalone tools. There are also many free and open source machine learning packages, including Apache Spark's MLlib, which is designed to make machine learning easy and useful inside the popular Apache Spark framework for cluster computing. In addition, Microsoft recently open sourced its distributed machine learning library, DMTK, under an MIT License. Additional open source software includes Waikato Environment for Knowledge Analysis (Weka) and Massive Online Analysis (MOA) from the University of Waikato and H2O.

Deep learning is a particular type of supervised machine learning based on neural network algorithms that has seen recent commercial success. Google released its second-generation deep learning library, TensorFlow, to open source. Other open source deep learning libraries include Caffe from the University of California, Berkeley; Theano from the University of Montreal; and Torch from Idiap, which is used extensively by Google, as well as Weka and H2O.

. Pros:
   . Supervised learning algorithms can learn quickly from examples and self-correct when changing trends are reflected in new sets of labeled data.
   . Advances in computing power and ever-expanding sources of data make advanced algorithms possible.
   . Heavy investment by vendors and venture capital firms is leading to rapid progress.
. Cons:
   . Finding or creating the required labeled data is costly and difficult.
   . A wide range of options makes vendor selection tricky. Costs range from quite inexpensive open source software to very expensive large vendor offerings, and the less costly options require substantial internal resources to make them work.
   . As advanced as these products have become, there are still challenges in achieving objectives when a particular decision involves large numbers of variables and interdependencies.
   . Subject matter experts are needed to assist with the initial and ongoing review of training datasets, which may prove costly and time consuming.
   . Balancing the bias-variance trade-off requires tuning learning algorithms to the amount of available data and the discernible complexity of the function to be learned.

Unsupervised Learning

FIGURE 23

Unsupervised Learning Markers of Momentum

Source: IDC, 2016

Unsupervised machine learning is another variation of machine learning where algorithms detect and discern attributes and features without the benefit of labeled training data. Some algorithms cluster data into meaningful groups by finding centers of data density. Other unsupervised algorithms use dimensionality reduction techniques (like singular value decomposition) to uncover the essential attributes of the data without requiring a human to define those attributes in advance. This is particularly useful for "unstructured" data, such as images or text, where an underlying structure can be automatically inferred, enabling other algorithms to leverage the data.
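Both techniques mentioned above, clustering and dimensionality reduction, can be illustrated with a short sketch. The example below again uses scikit-learn for brevity (an assumption, not one of the packages named in this section) on synthetic, unlabeled data.

```python
# Minimal unsupervised-learning sketch: cluster unlabeled data around centers
# of density, and reduce dimensionality with truncated SVD. No labels are
# supplied; structure is inferred from the data alone.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Unlabeled data drawn from two loose groups
data = np.vstack([rng.normal(loc=0.0, size=(200, 8)),
                  rng.normal(loc=3.0, size=(200, 8))])

# Clustering: find two centers of data density and assign each point to one
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print("cluster sizes:", np.bincount(clusters))

# Dimensionality reduction: singular value decomposition uncovers a compact
# set of latent attributes without a human defining them in advance
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(data)
print("reduced shape:", reduced.shape)
```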

Unsupervised learning algorithms based on clustering, dimensionality reduction, and neural networks can be found in most major commercial and open source packages, including Apache Mahout, Gensim, Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI), and word2Vec, to name a few. These algorithms are used in clustering challenges with unstructured data such as image categorization, text analytics, or speech recognition. Companies offering unsupervised learning capabilities include Numenta, Nervana Systems, Loop AI Labs, Luminoso, Clarifai, H2O.ai, and MetaMind.

For specific IoT use cases, IBM offers the Streaming Analytics Service and the Watson Machine Learning Service, which provide more than 200 algorithms for analytics and detection on streaming IoT data.

. Pros:
   . Unsupervised machine learning does not need labeled training data, enabling faster implementation.
   . It requires little or no user intervention.
   . It provides a low cost of ownership from reduced staffing and hardware requirements.
   . Self-learning systems provide automatic system updates.
. Cons:
   . Unsupervised learning needs a lot of data to develop good models.
   . The training data needs to be representative of the data the system will see in production.
   . Systems risk being overtrained.

IoT Conditions and Actions
Low-Code Rules

FIGURE 24

Low-Code Rules Markers of Momentum

Source: IDC, 2016

A major driver of IoT is the ability to connect sensor data to analytics to detect and predict conditions that warrant a response. A consequence of this shift from reactive problem solving to predictive problem anticipation is the large volume of decisions that must be made quickly to determine how to respond. In predictive systems, decision conditions resemble Big Data: there are volume, velocity, and variety consequences that we can almost think of as "big decisions."

Rules engines and decision services are not widely adopted in IoT, but there is some recognition that decision automation and decision support are needed areas of investment, both operationally and in support of sensor-based customer experience initiatives. Rules software abstracts conditional, decision-oriented logic from system and application logic. This software is used to create rules that assign and route work, standardize how decisions are made, and automate decisions.

While this type of reasoning is typically embedded as code in systems, abstracting it improves the ability to make changes rapidly as situations change. Subject matter experts become the managers of decision assets; without rules software, changes to conditional logic embedded in systems must go through developers via change requests.
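The abstraction can be pictured with a minimal sketch in which rules are declared as data (condition plus action) separate from application code, so thresholds or routing can be changed without touching the system logic. All names and thresholds below are hypothetical and exist only for illustration; real rules engines provide far richer authoring and governance.

```python
# Minimal sketch of abstracting decision logic out of application code:
# rules are data (condition + action) that can be managed as decision assets.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    condition: Callable[[Dict], bool]
    action: Callable[[Dict], None]

def dispatch_technician(event: Dict) -> None:
    print(f"Routing work order for {event['machine_id']}")

def log_only(event: Dict) -> None:
    print(f"Logging reading for {event['machine_id']}")

# The rule set is the managed decision asset; it can be edited independently
# of the systems that consume it.
RULES: List[Rule] = [
    Rule("overheat", lambda e: e["temperature"] > 90.0, dispatch_technician),
    Rule("normal",   lambda e: e["temperature"] <= 90.0, log_only),
]

def decide(event: Dict) -> None:
    for rule in RULES:
        if rule.condition(event):
            rule.action(event)
            break                          # first matching rule wins

decide({"machine_id": "press-7", "temperature": 95.2})
```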

Examples of low-code rules software include IBM Bluemix Business Rules, which includes a recipe to integrate rules with the Watson IoT Platform. Because rules design requires Eclipse, Business Rules is more oriented toward developers than subject matter experts but is capable of supporting sophisticated operational requirements. IBM also offers Real-Time Insights as part of its IoT platform. Sapiens DECISION is a sophisticated horizontal offering that provides a development environment for SMEs and can be deployed decentrally or support a centralized, on-demand decision service. Salesforce IoT Cloud, currently in preview, provides a consumer-grade experience that allows end users to identify events, map them to conditions, and map conditions to actions. Red Hat BRMS is based on the open source Drools project.

. Pros:
   . Improves the efficiency of managing decisions as assets in environments where high volumes of decisions need to be made efficiently and rapidly
   . Works in conjunction with analytics to provide precision in how actions are routed
   . Speeds up development and change management
. Cons:
   . Managing decisions as assets through low-code rules can itself become complex.
   . The traditional, more widely adopted rules engines can be too hard for nondevelopers to use, while the newer, easier-to-use rules engines may be too simple for sophisticated use cases.

Low-Code App Platform

FIGURE 25

Low-Code App Platform Markers of Momentum

Source: IDC, 2016

Low-code application platforms combine development and runtime into a single offering. They typically consist of graphical modeling environments to describe workflows, data objects, and forms; point-and-click configurations; and relatively simple scripting. These environments are popular for rapid development and with development teams that include both business participants and developers.

In IoT, low-code platforms are useful for automating workflows, building mobile apps, and assigning and managing tasks. Going forward, low-code platforms will also be useful for designing and automating the interactions used to manage an event-based customer experience.

IoT-specific examples of products include IBM's Node-RED and PTC's ThingWorx Foundation. Generalized low-code workflow or mobile app environments include Nintex Workflow Cloud, Alfresco Activiti, Bonitasoft's Bonita BPM, BP Logix, Appian, and Salesforce Lightning.

. Pros:
   . Useful for application design involving collaboration between process experts and developers
   . Fast development cycles
   . Ability to provide short-term situational apps
. Cons:
   . May not offer the control developers need for specific use cases

IoT Visibility
Operational Intelligence

FIGURE 26

Operational Intelligence Markers of Momentum

Source: IDC, 2016

Operational intelligence, continuously or in microbatch, captures operational data in near real time, correlates the data against relationships within the data streams, key performance indicators (KPIs), service-level agreements (SLAs), or time series data, and delivers the results into a dashboard, ideally with a drill path to explore root cause. The goal is to immediately spot an operating condition that can be fixed within the current time window to improve operating performance. Alerts are an important part of OI, as is linking alerts to third-party actions that trigger and manage the response.
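The core KPI-against-SLA check described above can be reduced to a short sketch: compute a KPI over each microbatch of readings and raise an alert when it breaches an SLA threshold. The metric, threshold, and sample values are assumptions for illustration; dashboards, drill paths, and third-party action routing are out of scope here.

```python
# Minimal OI-style sketch: evaluate a KPI per microbatch against an SLA.
from statistics import mean
from typing import Iterable, List

SLA_MAX_LATENCY_MS = 250.0                 # assumed SLA threshold for the example

def evaluate_batch(latencies_ms: Iterable[float]) -> None:
    readings: List[float] = list(latencies_ms)
    kpi = mean(readings)                   # KPI: mean latency within this microbatch
    if kpi > SLA_MAX_LATENCY_MS:
        # In a real deployment this would raise an alert and link to a response action
        print(f"ALERT: mean latency {kpi:.1f}ms exceeds SLA {SLA_MAX_LATENCY_MS}ms")
    else:
        print(f"OK: mean latency {kpi:.1f}ms within SLA")

evaluate_batch([180.0, 220.0, 310.0, 295.0])   # one microbatch of readings
```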

The idea behind OI is creating business value by identifying and solving one operational problem at a time to improve profitability.

OI can be tightly focused on a particular process or subprocess within a domain or can be much broader and span domains. With IoT, sensors associated with things can be monitored in near real time and trigger an action when an anomalous condition occurs. Broader use of operational intelligence can monitor networks of things as well as merge data streams from many sources, correlating them to detect conditions. A cross-domain example is the use of OI to link sensor data from multiple wind farms to a real-time electricity market data feed, providing the opportunity to dial electricity production up or down in real time based on market demand.

The lines between the different intelligence and analytical techniques blur and combine as required. OI is not necessarily used to predict but instead to spot a problem as early in a process as possible, which can make it almost seem like a prediction. OI is also moving toward the use of machine learning.

OI products include Splunk (when Splunk Forwarder delivers data continuously or in microbatches), Sight Machine, the Vitria IoT Analytics Platform, and business activity monitoring products from Software AG, IBM, and Oracle.

TIBCO BusinessEvents, Software AG's Apama, Apache's Storm and Kafka Streams, SAS Event Stream Processing, and other one-event-at-a-time streaming analytics tools can also be used for OI in conjunction with real-time monitoring solutions. These are included under Streaming Analytics in the IoT Analytics section.

. Pros:
   . Produces a higher-level business view of sensor-supported operations
   . Makes it easier to get started in IoT by identifying low-hanging opportunities where the problems are straightforward to identify and causes are fairly well known
   . Shifts from reactively responding to problems to proactively identifying them to speed up resolution
. Cons:
   . Is supplemental rather than core IoT AIM technology
   . Does not incorporate advanced analytics

TECHNOLOGY ADOPTION OUTLOOK

IoT inverts traditional AIM technology adoption, which involves moving data in batches and then normalizing and loading the data into target systems. In the traditional model, analytical software is used once the data is loaded and at rest, typically to produce reports or statistical analyses that aid decision making or to support on-demand decision automation.

IoT AIM is about sensing and responding within a time window, continuously moving and managing sensor events, and handling large volumes of data, continuous decision automation, and decision support using analytics and rules. Data must travel from a sensor to edge collection to central processing where it is normalized and analyzed against some type of prediction model or algorithm to determine whether further action is required. Once actions are required, response cycle times vary substantially, but the end-to-end cycle time must be faster than the time window allotted to derive business benefit.
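The time-window constraint above amounts to simple arithmetic: sum the cycle times of each stage in the end-to-end path and compare the total with the business time window. The stage names and durations below are purely hypothetical, chosen only to make the check concrete.

```python
# Tiny sketch of the end-to-end cycle-time check described above.
STAGES_SECONDS = {
    "sensor_to_edge": 0.5,        # assumed stage durations, for illustration only
    "edge_to_central": 2.0,
    "normalize_and_score": 1.5,
    "trigger_response": 3.0,
}
TIME_WINDOW_SECONDS = 10.0        # assumed window in which action still has value

end_to_end = sum(STAGES_SECONDS.values())
print(f"end-to-end cycle time: {end_to_end}s (window {TIME_WINDOW_SECONDS}s)")
if end_to_end > TIME_WINDOW_SECONDS:
    print("Cycle time exceeds the useful window: purpose-built AIM technology is indicated")
```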

Four considerations should dominate IoT AIM technology adoption planning:

. What is the total time window available to deliver business value when a condition is identified that requires a response? Adoption of new AIM technology is required when the time window is narrower than the cycle time of the end-to-end IoT system.
. How good is the prediction or insight from your analytics software? Quality problems occur for a variety of reasons, but noisy predictions and wrong or nonactionable predictions are all expensive. Using the best analytical approach for a particular problem requires assessing whether there are data gaps that need to be resolved, as well as identifying options and experimenting with them prior to adoption. Different techniques may also be required for different workloads or stages within a workload.
. How much technical debt are you accumulating by repurposing existing AIM technology or investing in custom development? In the beginning, it makes sense to keep costs low by leveraging existing AIM technology for an IoT project. But technical debt rapidly accumulates when existing technology doesn't really align with needs and has to be customized or contorted on an ongoing basis to make it work. As IoT initiatives are operationalized, purpose-built tools are almost always a better path once those tools reach the required level of sophistication.
. How do technology choices align with your enterprise's adoption risk profile? Different organizations have different approaches to risk. When it becomes clear that there is a need to add new functionality or replace nonperforming existing technology, the selection has to align with the skills of the team implementing and using the technology. We assess the adoption risk and speed of adoption for each of the 25 technologies highlighted in this IDC TechScape; planning should take both factors into account. If a new technology identified in Figure 1 carries a higher risk than is acceptable to your organization but has a fast rate of market adoption, it is important to begin planning and acquiring skills sooner rather than later for eventual adoption.

LEARN MORE

Related Research

. IDC's Worldwide Software Taxonomy, 2016 (IDC #US41572216, July 2016)
. Internet of Things Analytics and Information Management Software Taxonomy, 2016 (IDC #US40708515, December 2015)

IDC TechScape Methodology

Unlike other technology assessment frameworks, the IDC TechScape provides a visual representation of the process of technology adoption, dividing technologies into three major categories based on their impact on the organization and assessing their relative maturity within their respective categories. The study examines the individual categories and provides additional insights about speed of adoption, technology potential for success (risk), and industry hype. Refer back to Figure 1 for the IDC TechScape for Internet of Things analytics and information management.

The IDC TechScape is a tool for strategic planning and tactical decision making for technology professionals in IT buyer organizations. This audience may include CIOs and senior technology professionals, strategists, and IT buyers from IT or from lines of business.

The document serves two functions:

. Strategic planning tool:
   . Offers a view into where a technology sits in its overall adoption life cycle. Generally, technologies in the early stages of evaluation and deployment are riskier investments than those further along in the adoption life cycle, which are more broadly deployed.
   . Sorts technologies into three categories that may help organizations make judgments about which technologies might provide the greatest positive impact on their organization. IT strategists can use this information to prioritize interest in a technology or group of technologies.
. Tactical decision-making tool: Because it lays out where a technology exists within its overall adoption life cycle, and a certain level of associated risk may be inferred, an organization can use the IDC TechScape to determine whether it should immediately adopt a particular technology or wait until the risk of adoption is lower.

IDC TechScape Categories and Definitions: Transformational, Incremental, and Opportunistic

Executives use the IDC TechScape model to:

. Inform technology adoption decisions based on organizational appetite for risk and potential for transformational change
. Support a decision on when a technology or group of technologies might be ready for adoption, given the purchasing organization's preferred appetite for risk — whether or not an organization should immediately adopt a particular technology or wait until the risk of adoption decreases

The three types of adoption curves in an IDC TechScape are:

. Transformational. These technologies will completely reshape markets and investment strategies. They may create new business and/or market opportunities and lead to new enterprise and consumer capabilities. They may differ significantly from current technologies and may have mostly unrecognized market impacts/opportunities. Transformational technologies have already demonstrated that they fundamentally change current best practices.
. Incremental. This new generation of technology measurably improves on an existing category of technologies to deliver better business outcomes. In terms of business processes, these technologies deliver small but measurable improvements over current best practices.
. Opportunistic. These technologies will grow based on specific use cases, and they have an undetermined or limited capability to improve existing technologies/processes. Their potential changes currently lack a clear impact on current best practices.

Synopsis

Over time, analytics and information management (AIM) technology adopted for IoT will differ from an organization's existing technology investments that perform a similar but less time-sensitive or data volume–intensive function. Enterprises will want to leverage as much of their existing AIM investments as possible, especially initially, but will want to adopt IoT-aligned technology as they operationalize and identify functionality gaps in how data is moved and managed, how analytics are applied, and how actions are defined and triggered at the moment of insight. This IDC TechScape covering IoT AIM is designed to help:

. Enterprises learn more about the newer AIM technologies that support IoT
. Align these technologies with an enterprise's technology risk profile to determine what is ready to adopt and what should be monitored
. Gain a better understanding of where an IoT team will need to create skills and competencies as it plans to adopt newer AIM technologies

According to Maureen Fleming, vice president for IDC's IoT Analytics and Information Management research program, "Implementing the analytics and information management tier of an IoT initiative is about the delivery and processing of sensor data, the insights that can be derived from that data and, at the moment of insight, initiating actions that should then be taken to respond as rapidly as possible. To achieve value, insight to action must fall within a useful time window. That means the IoT AIM tier needs to be designed for the shortest time window of IoT workloads running through the end-to-end system. It is also critical that the correct type of analytics is used to arrive at the insight."

About IDC

International Data Corporation (IDC) is the premier global provider of market intelligence, advisory services, and events for the information technology, telecommunications, and consumer technology markets. IDC helps IT professionals, business executives, and the investment community make fact-based decisions on technology purchases and business strategy. More than 1,100 IDC analysts provide global, regional, and local expertise on technology and industry opportunities and trends in over 110 countries worldwide. For 50 years, IDC has provided strategic insights to help our clients achieve their key business objectives. IDC is a subsidiary of IDG, the world's leading technology media, research, and events company.

Global Headquarters

5 Speen Street
Framingham, MA 01701 USA
508.872.8200
Twitter: @IDC
idc-community.com
www.idc.com

Copyright and Trademark Notice

This IDC research document was published as part of an IDC continuous intelligence service, providing written research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or [email protected] for information on applying the price of this document toward the purchase of an IDC service or for information on additional copies or web rights. IDC and TechScape are trademarks of International Data Group, Inc. IDC TechScape is a registered trademark of International Data Corporation, Ltd. in Japan.

Copyright 2016 IDC. Reproduction is forbidden unless authorized. All rights reserved.