The Life-Cycle of Data
With an eye on integration of embedded devices with the Cloud, Raima CTO Wayne Warren differentiates between live, actionable information and data with ongoing value, and argues that to realise the true power of the Cloud, businesses must utilize the power of the collecting and controlling computers on the edge of the grid.

The rise of the Cloud has presented companies of all sizes with new opportunities to store, manage and analyze data – easily, effectively and at low cost. Data management in the Cloud has enabled these companies to reduce their in-house systems costs and complexity, while actually gaining increased visibility of plant and processes. At the same time, third-party service organisations have emerged, providing data dashboards that give companies 'live real time control' of their assets, often from remote locations, as well as historical trend analysis.

Consider, for example, a company at a central location with a key asset in an entirely different or isolated location. It may be advantageous to monitor key operational data to ensure the equipment itself is not trending towards some catastrophic fault, and some performance data to ensure output is optimal. That might involve relatively few sensors overall, and perhaps some diagnostic feedback from onboard control systems. But getting at that data directly might mean setting up embedded web servers or establishing some form of telemetry, then getting that data into management software and delivering it in a form that enables it to be acted upon. How much easier to simply provide those same outputs to a Cloud-based data management provider, and then log in to a customized dashboard that provides visualization and control, complete with alarms, actions, reports and more? And all for a nominal monthly fee.
Further, with virtually unlimited storage in the Cloud, all data can be stored, mined, analyzed and disseminated as reports that provide unprecedented levels of traceability (important to many sectors of industry) and long-term trend analysis that can really help companies to boost performance and, ultimately, improve profitability.

As our data output increases, it might seem reasonable to expect that the quality of information being returned from the Cloud should improve as well, enabling us to make better operational decisions that improve performance still further. And to an extent, this is true. But there is also danger on that path, because as we move into an era of 'big data', it is becoming increasingly difficult to pull meaningful, 'actionable information' out from the background noise. Where once a data analyst might simply have been interested in production line quotas and the link to plant or asset uptime, today they may also be interested in accessing the data generated by the myriad automated devices along the production line, because that raw data may well hold the key to increased productivity, reduced energy consumption, elimination of waste, reduction in downtime, improved overall equipment effectiveness, and ultimately a better bottom line.

Date: 09/10/2014 RaimaDMA025D

And we really are talking about huge amounts of data. The rise of the 'Internet of Things' and machine-to-machine (M2M) communications, combined with the latest GSM networks that deliver high-speed, bidirectional transfer without the limitations of range, power, data size and network infrastructure that held back traditional telematics solutions, has seen data transmission increase exponentially in the last few years. As of 2012, across the globe over 2.5 exabytes (2.5 × 10^18 bytes) of data were being created every day, and it is certainly not unusual for individual companies to be generating hundreds of gigabytes of data.
Importantly, different types of data will have different lifecycles, and this affects how that data needs to be managed. Phasor measurement devices, for example, which monitor variables on the power grid that highlight changes in frequency, power, voltage, etc., might generate perhaps a few terabytes of information per month. Certainly this is a lot of data, and it has a mixture of lifecycles: long-term information indicative of trends, and live data that can flag up an immediate fault. A complex product test, by contrast, might generate the same volume of information in an hour or less, but again there will be a mixture of data lifecycles: the complex information that provides a pass/fail output for the test needs to be immediately available to optimise production cycles, but has no value subsequently, while the overview information might be important to store for traceability reasons.

The common thread, however, is the large amount of data being generated. Indeed, there is so much information that it is no longer meaningful to measure today's data in terms of the number of records, but rather by the velocity of the stream. Live data – that is, captured data about something happening right now – is available in great quantities and at low cost. Sensors on embedded and real-time computers are able to capture information at a rate that exceeds our ability to use it. That means that the moment for which any given volume of data has real value may well come and go faster than we can actually exploit it.

If our only response is simply to send all of that data to the Cloud, with no regard for the lifecycle of the data, then the Cloud becomes little more than a dumping ground for data that may well have no ongoing value. It is vital, then, to consider the life-cycle of live data, and how that data is best distributed between embedded devices and the Cloud.
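To put rough numbers on the idea of measuring data by stream velocity, the short Python sketch below compares the two examples above. It is an illustration only: the figure of 3 TB stands in for "a few terabytes", and the 30-day month and decimal terabyte are assumptions, not values from the applications described.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_MONTH = 30 * 24 * SECONDS_PER_HOUR  # assumption: 30-day month
TERABYTE = 10**12                               # assumption: decimal terabyte


def stream_velocity(total_bytes: float, duration_seconds: float) -> float:
    """Return the average stream velocity in bytes per second."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_bytes / duration_seconds


# Assumed stand-in for "a few terabytes"; not a measured figure.
VOLUME = 3 * TERABYTE

grid_rate = stream_velocity(VOLUME, SECONDS_PER_MONTH)  # grid monitoring
test_rate = stream_velocity(VOLUME, SECONDS_PER_HOUR)   # complex product test

print(f"Grid monitoring: {grid_rate / 1e6:.2f} MB/s")
print(f"Product test:    {test_rate / 1e6:.2f} MB/s")
print(f"Ratio:           {test_rate / grid_rate:.0f}x")
```

Under these assumptions the same volume of data arrives several hundred times faster in the product test than in the grid-monitoring case, which is why the record count alone says little about how the data must be handled.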
For Cloud resources to be truly optimized while enabling meaningful operational decisions to be made locally, in the moment, the power of embedded systems on the edge of the grid must be fully utilized. Only by delegating responsibility for data collection, filtering and decision making to the increasingly powerful computers deployed within the 'Internet of Things' can we have effective management of data from its inception to disposal.

The embedded database industry has responded to this requirement with data management products that deliver the requisite performance and availability and are readily scalable. These data management products can take the captured live data, process it (aggregating and simplifying the data as required) and then distribute it to deliver the visualization and analytics that will enable meaningful decisions to be made. The ability to do all of this locally within embedded systems – acting on data that is only of real value in the moment – has a huge impact on the performance of plant and assets, while the data that has ongoing value can be sorted and sent to the Cloud.

Consider, for example, the testing of consumer products where the way the product sounds or feels is taken as an indicator of its quality. Such quality testing is common in a host of domestic and automotive products, which possess intrinsic vibration and sound characteristics that may be used as indicators of mechanical integrity. A part under test might be subjected to a period of controlled operation while millions of data points are measured. A multitude of metrics and algorithms need to be applied to this data to create a 'signature', which determines whether the product passes or fails the quality check. Raima was involved in just such an application, in a market where production cycle times were critical and where new data sets were being generated every two seconds.
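The split between acting on live data locally and keeping only a compact record can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the application described: the two metrics (RMS and peak amplitude) are simple stand-ins for the "multitude of metrics and algorithms", and the names `SignatureSummary`, `evaluate_signature`, `rms_limit` and `peak_limit` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SignatureSummary:
    """Compact record with ongoing value, suitable for long-term storage."""
    serial_number: str
    rms: float
    peak: float
    passed: bool


def evaluate_signature(serial_number: str,
                       samples: Sequence[float],
                       rms_limit: float,
                       peak_limit: float) -> SignatureSummary:
    """Reduce raw samples to a summary and a local pass/fail verdict.

    RMS and peak are illustrative stand-ins for a real signature; the
    limits are hypothetical thresholds chosen by the caller.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    passed = rms <= rms_limit and peak <= peak_limit
    return SignatureSummary(serial_number, rms, peak, passed)


# The raw samples are used once, on the edge device, and can then be
# discarded; only the summary needs to travel further.
summary = evaluate_signature("SN-0001", [0.2, -0.5, 0.4, -0.1],
                             rms_limit=0.5, peak_limit=0.6)
print(summary)
```

The design point is that the verdict is available immediately on the device, while the record that leaves it is a handful of fields keyed by serial number rather than the full sample stream.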
The live data had to be acted on in real time to match the required production cycle time while providing reliable pass/fail information. At the same time, it was important to aggregate, manage and store the essential test information for the long term so that, in the event of an operational fault or a customer complaint, the product serial number could be quickly checked against the test history. It was also important to be able to reprocess the historical data when considering warranty costs or perhaps even the need for a product batch recall. This is a very clear differentiation of data lifecycles – historical data that can be aggregated, sorted and then stored for the long term (ideal for the Cloud), and live data that impacts directly on production performance.

When we talk about performance, we do not necessarily have to think about 'real time' response in a deterministic sense for streaming data, but we must have 'live real time' response that is simply fast enough to work with live information that appears quickly and has a short life-cycle. The database might need to keep up with data rates that may reach thousands of events per minute, with burst rates many times higher, and must be able to raise alarms or trigger additional actions when particular conditions are met. Those conditions might involve the presence or absence of data in the database, so quick lookups must be performed. They may also depend on connections between records in the database, so the database system needs to be able to maintain associations and lookups that can be quickly created or queried.

The high-speed processors in modern computer systems play a part, but increasingly, meeting performance requirements depends on scalability, which comes from the ability to distribute the database operations across multiple CPUs and multiple processor cores.
This not only makes the best use of available resources, but also opens up possibilities for parallel data access, allowing very fast throughput.

Consider the example of wind turbine control, where operators need to constantly monitor variables such as wind speed, vibration and temperature. Because wind turbines are often in remote locations and are unmanned, a database is required that can store large amounts of data – perhaps in the order of terabytes per day – and that will continue to operate reliably 24/7 without intervention.
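As a rough illustration of alarm conditions that depend on both the value of live data and the presence or absence of data, the Python sketch below monitors turbine variables of the kind mentioned above. It is a simplified, in-memory stand-in for what an embedded database would do; the class name `TurbineMonitor`, the limits and the staleness timeout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TurbineMonitor:
    """In-memory sketch of value-based and absence-based alarms."""
    limits: Dict[str, float]   # variable name -> upper limit (hypothetical)
    stale_after: float         # seconds without data before an alarm
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, variable: str, value: float, timestamp: float) -> List[str]:
        """Store the arrival time and return any value-based alarms."""
        self.last_seen[variable] = timestamp
        limit = self.limits.get(variable)
        if limit is not None and value > limit:
            return [f"{variable} above limit: {value} > {limit}"]
        return []

    def check_missing(self, now: float) -> List[str]:
        """Return an alarm for each monitored variable with no recent data."""
        alarms = []
        for variable in self.limits:
            seen = self.last_seen.get(variable)  # quick keyed lookup
            if seen is None or now - seen > self.stale_after:
                alarms.append(f"{variable} has no recent data")
        return alarms


monitor = TurbineMonitor(limits={"temperature": 80.0, "vibration": 5.0},
                         stale_after=10.0)
print(monitor.record("temperature", 85.0, timestamp=0.0))
print(monitor.record("vibration", 1.0, timestamp=0.0))
print(monitor.check_missing(now=20.0))
```

Both checks rest on a keyed lookup per variable, which is the kind of quick presence-or-absence query the alarm conditions call for; a production system would also need persistence and concurrency control, which this sketch omits.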