DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it
Big Data Architecture course, School of Engineering, Florence
Topic: Visual Analytics of IOT Data and data traffic
Paolo Nesi, Gianni Pantaleo
DISIT Lab Dipartimento di Ingegneria dell’Informazione, DINFO Università degli Studi di Firenze Via S. Marta 3, 50139, Firenze, Italy Tel: +39‐055‐2758517, fax: +39‐055‐2758570 http://www.disit.org
SUMMARY
1. Introduction
   • IoT/IoE in evolving Smart City environments
2. Solutions for IOT data traffic and flow Visual Analytics
   • The Snap4City architecture
   • AMMA and DevDash Tools
3. IoT Data Flows Management
   • IoT Brokers & Communication Protocols
   • Apache NiFi
4. Distributed Storage and Indexing
   • Configure Zookeeper
   • HBase storing & Solr indexing
5. Producing Visual Analytic Tools
   • Banana Dashboards
1. Introduction – IoT/IoE in evolving Smart City Environments
Current State of the Art: Goals, Issues & Solutions
Goals:
• Users may easily access, read and monitor ingested data, and build applications and dashboards.
• Visualize, process and perform different kinds of analytics on data.
• Detect potential problems and anomalies in personal data traffic (Quality of Service).
Issues:
• Problems due to discontinuity and loss of data by data-driven applications.
• Lack of tools which efficiently monitor data traffic from devices and applications.
• High costs.
• Requiring users with programming skills.
Solutions:
• Continuously storing the last values from all connected devices (IoT, Smart City sensors etc.) and collecting historical trends (Data Shadow).
• Production of visual, easy to create tools which quantitatively monitor messages/data flows and traffic (real-time & historical traffic trends).
• Providing different kinds of data analytics and visual dashboards.
[Figure: a Context Broker in the field, linking Context Producers (Update, Publish) and Context Consumers (Query / Act, Subscribe).]
1. Introduction – IoT Data Trend & Issues
Typical use case: Personal device, data collection and visualization
[Figure labels: IoT Sensors, Actuators and Devices; IoT Brokers (AMQP, NGSI); Registration, Publish/Subscribe, Update; IoT Apps & Services; User Apps (Send Queries, Actions); Data Flow Optimization & Enrichment; Data Storage & Indexes; Data Analytics, Visual Tools, Dashboard (Visualize Last Data and Historical Data).]
Each read or act operation produces several calls and messages within the IoT infrastructure.
[Figure: message exchanges between IoT Devices and IoT Brokers.]
Large scale deployment… Rapid & huge growth of connections and data flows, messages etc…
Number of IoT devices > number of people
World population: ≈ 7 billion
Reference source: iot-analytics.com
1. Introduction – Major IoT Platforms
Azure IoT
Google IoT
Amazon AWS
1. Introduction – Big Data Flow Ingestion, Collection & Management
Requirements:
• Big Data ingestion of IoT data and logs, and data flow management among many different kinds of devices, broker protocols (MQTT, NGSI etc.) and user applications.
• Support for different communication modalities: push (event-driven messages), pull, polling (periodically scheduled requests), HTTP listening etc.
• IoT data flow buffering / queuing management, fault tolerance, data provenance and replication.
• Persistent storage (data shadow) and indexing of IoT data and logs.
• Production of visual tools to display charts about data analytics, temporal trends etc.
2. Solutions for IOT data traffic and flow Visual Analytics – The Snap4City Architecture
https://www.snap4city.org
2. Solutions for IOT data traffic and flow Visual Analytics – AMMA and DevDash Tools
DevDash: Developer Dashboard
http://devdash.snap4city.org
• Data Value Control: collection, enrichment and indexing of data from IoT devices.
• Drill down on data, time, time-trends, facet filtering, geo-spatial faceting.
• Apply filters to reach the desired data view.
• View data on a map, down to single device resolution.
• Download data.

AMMA: Application & Microservice Monitor & Analyzer service
http://amma.snap4city.org
• Data flow control tool for real-time monitoring and analysis of traffic and communication flows (IoT devices and applications).
• Many different kinds of data analytics and visualization functionalities.
• Drill down on data, time, time-trends, facet filtering, geo-spatial faceting.
• Origin/destination from/to external/local IP. Details.
3. IoT Data Flows Management – IoT Brokers and Communications Protocols
MQTT Protocol
MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol designed for M2M (machine to machine) communication, released in 2010. It implements a publish-subscribe messaging mechanism involving three main actors:
• Publishers, which produce data and send them to a broker.
• Subscribers, which subscribe to a topic of interest and receive notifications when a new message for the topic is available.
• Broker, which filters data based on topic and distributes them to subscribers.
[Figure: Publishers send messages to the MQTT Broker, which holds a queue per topic and delivers the messages to Subscribers.]
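The publisher / broker / subscriber roles described above can be illustrated with a small in-memory sketch. It is not an MQTT implementation: there is no network, no QoS and no topic wildcards, and the class name `TinyBroker` and the topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber callback receives (topic, payload).
Callback = Callable[[str, str], None]


class TinyBroker:
    """In-memory stand-in for a topic-based broker (hypothetical, no networking)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # Register interest in exactly one topic (no wildcards in this sketch).
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every subscriber of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


broker = TinyBroker()
received: List[str] = []
broker.subscribe("sensors/temperature", lambda topic, payload: received.append(payload))

delivered = broker.publish("sensors/temperature", "21.5")  # one subscriber is notified
ignored = broker.publish("sensors/humidity", "40")         # nobody subscribed to this topic
```

The publisher never addresses a subscriber directly; the broker filters by topic, which is the decoupling the protocol relies on.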
AMQP Protocol
Advanced Message Queuing Protocol (AMQP) is an open standard protocol for message-oriented applications. Like MQTT, it provides a publish / subscribe mechanism. It also supports system interoperability in distributed environments thanks to an Exchange module, which is responsible for receiving publisher messages and distributing them to queues based on pre-defined rules and conditions.
[Figure: inside the AMQP Broker, Publishers send messages to the Exchange, which routes them to Queues read by Subscribers.]
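As a rough illustration of the Exchange role, the sketch below routes each published message to the queues bound to its routing key. It only mimics the idea of an exact-match ("direct") routing rule in memory; the class `DirectExchange`, the queue names and the routing keys are all hypothetical, and no AMQP client library is involved.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Dict, List


class DirectExchange:
    """Toy exchange: a message goes to every queue bound to its exact routing key."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[str]] = {}
        self._bindings: DefaultDict[str, List[str]] = defaultdict(list)

    def bind(self, queue_name: str, routing_key: str) -> None:
        # Create the queue on first use and remember the routing rule.
        self.queues.setdefault(queue_name, deque())
        self._bindings[routing_key].append(queue_name)

    def publish(self, routing_key: str, message: str) -> List[str]:
        """Append message to all matching queues; return the names of those queues."""
        targets = list(self._bindings.get(routing_key, []))
        for queue_name in targets:
            self.queues[queue_name].append(message)
        return targets


exchange = DirectExchange()
exchange.bind("alerts", "temperature.high")
exchange.bind("archive", "temperature.high")
exchange.bind("archive", "temperature.normal")

routed_high = exchange.publish("temperature.high", "41.0")
routed_normal = exchange.publish("temperature.normal", "21.5")
```

Unlike the topic-only filtering of the MQTT sketch, the routing rule lives in the bindings, so the same message can reach several queues without the publisher knowing about them.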
3. IoT Data Flows Management – Apache NiFi
What is NiFi?
NiFi (short for “Niagara Files”) is an open source dataflow tool that can collect, route, enrich, transform and process data in a scalable manner. It is a processing engine based on the concepts of flow-based programming (FBP), designed to manage the flow of information in an ecosystem.

Why NiFi?
• Open source
• Scalable, extensible platform
• Visual web interface
• Provides data provenance
• Highly configurable
• Data-source agnostic
× No data replication
NiFi is NOT:
• NiFi is not a distributed computation engine.
• It is not a complete ETL tool.
• It is not a persistent data storage tool. It only holds data temporarily for re-run / data provenance purposes.
• It is not a document indexer. Its indexing capabilities are only there to help in troubleshooting / debugging.
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
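To make the FBP-to-NiFi term mapping above more concrete, here is a toy sketch in which each term becomes a small Python construct. None of these names come from the NiFi API: `FlowFile`, `run_flow` and the two processor functions are hypothetical, and the "connection" is an unbounded queue for brevity.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class FlowFile:
    """FBP 'information packet': content plus string attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


# FBP 'black box': a processor maps one FlowFile to another.
Processor = Callable[[FlowFile], FlowFile]


def to_upper(flow_file: FlowFile) -> FlowFile:
    # Transformation step: upper-case the content, keep the attributes.
    return FlowFile(flow_file.content.upper(), dict(flow_file.attributes))


def tag_size(flow_file: FlowFile) -> FlowFile:
    # Enrichment step: record the content size as a string attribute.
    attributes = dict(flow_file.attributes, size=str(len(flow_file.content)))
    return FlowFile(flow_file.content, attributes)


def run_flow(source: Deque[FlowFile], processors: List[Processor]) -> Deque[FlowFile]:
    """Toy 'flow controller': single-threaded, drains each connection in turn."""
    connection = source  # FBP 'bounded buffer' (unbounded in this sketch)
    for processor in processors:
        next_connection: Deque[FlowFile] = deque()
        while connection:
            next_connection.append(processor(connection.popleft()))
        connection = next_connection
    return connection


result = run_flow(deque([FlowFile(b"hello")]), [to_upper, tag_size])
```

Grouping `to_upper` and `tag_size` behind the single `run_flow` call plays the part of a subnet / Process Group: the caller sees one component with an input and an output.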
NiFi Features
Highly configurable and extensible
- Low latency vs high throughput
- Modify dataflow at runtime
- Build custom processors
- Development of single components that can be reused and combined to make more complex flows
Data buffering and queueing
- Provide back-pressure management
- Buffering queued data
- Custom prioritization schemes for queues
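Back-pressure can be pictured as a queue that stops accepting work once a threshold is reached. The sketch below assumes a simple object-count threshold and models the effect by refusing the offer, whereas NiFi itself stops scheduling the upstream component; the class `BackPressureConnection` and its methods are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Optional


class BackPressureConnection:
    """Toy connection that refuses new items once object_threshold is reached."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._queue: Deque[Any] = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.object_threshold

    def offer(self, item: Any) -> bool:
        """Enqueue item unless the threshold is reached; return whether it was accepted."""
        if self.is_full():
            return False
        self._queue.append(item)
        return True

    def poll(self) -> Optional[Any]:
        """Remove and return the oldest item, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


connection = BackPressureConnection(object_threshold=2)
accepted = [connection.offer(n) for n in range(3)]  # the third offer hits the threshold
first = connection.poll()                           # the consumer drains one item
accepted_after_drain = connection.offer(99)         # capacity is available again
```

The producer only slows down while the consumer lags, which is what lets components run at differing rates without unbounded memory growth.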
NiFi Features
Security and data recovery
- Content encryption, communication over secure protocols (SSL, SSH, HTTPS)
- Role-based authentication/authorization mechanism for both data transfer and user management
Data Provenance
- NiFi records and indexes fine-grained data provenance details as objects flow through the system, making them accessible for display
NiFi Features
Web-based interface
- Drag and drop processors to build a flow
- Start, stop, and configure components in real time
- View errors and corresponding error messages
- View statistics and health of the data flow
- Create templates of common processors & connections
NiFi Architecture
NiFi is a Java based system that executes within a JVM on a host operating system. Primary components:
• Web Server: hosts NiFi's HTTP-based control API.
• Flow Controller: provides and schedules threads for execution.
• Extensions: Processors, Controller Services, etc.
• Repositories: FlowFile (state of a given FlowFile), Content (actual content bytes), Provenance.
[Figure: an OS/Host running a JVM that contains the Web Server, the Flow Controller with Processor 1 … Extension N, and the FlowFile, Content and Provenance Repositories on local storage.]
NiFi Distributed Architecture
NiFi General Installation
Download page: http://nifi.apache.org/download.html
Two packages are available: a tarball (Linux) and a zip (Windows). Download the appropriate one and extract it to the location from which you want to run the application.
Mac OS X users may also use the tarball or install via Homebrew by running:
  $ brew install nifi
Install NiFi as a Service (Linux)
Navigate to the NiFi installation folder and run:
  $ bin/nifi.sh install
NiFi Execution (Linux/MacOS)
Navigate to the NiFi installation folder.
To run NiFi in the foreground (use Ctrl-C to stop the application):
  $ bin/nifi.sh run
To run NiFi in the background:
  $ bin/nifi.sh start
To stop the application:
  $ bin/nifi.sh stop
Starting NiFi as a service:
  $ sudo service nifi start
Stopping the NiFi service:
  $ sudo service nifi stop
Checking the NiFi service status:
  $ sudo service nifi status
NiFi Execution (Windows) Navigate to the NiFi installation folder.
Execute the bin/run-nifi.bat file.
To stop the application, select the window that was launched and press Ctrl-C.
Use NiFi Web‐based Interface (All platforms) Open a web browser and navigate to http://localhost:8080/nifi
Port 8080 is the default port and can be changed by editing the nifi.properties file in the NiFi configuration directory.
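As a small aid when the port has been changed, the following sketch derives the web interface URL from the text of a nifi.properties file. It assumes the interface is served over plain HTTP and that the port and host are stored under the `nifi.web.http.port` and `nifi.web.http.host` keys; the helper name `web_ui_url` and the sample content are hypothetical.

```python
def web_ui_url(properties_text: str,
               default_host: str = "localhost",
               default_port: int = 8080) -> str:
    """Build the NiFi UI URL from key=value lines, falling back to the defaults."""
    settings = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    # An empty value (e.g. "nifi.web.http.host=") falls back to the default.
    host = settings.get("nifi.web.http.host") or default_host
    port = settings.get("nifi.web.http.port") or default_port
    return f"http://{host}:{port}/nifi"


# Sample content for illustration only, not a complete nifi.properties file.
sample = """
# web properties
nifi.web.http.host=
nifi.web.http.port=9090
"""
url = web_ui_url(sample)
default_url = web_ui_url("")
```

With the sample above the interface would be expected at port 9090 instead of the default 8080.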
NiFi Main Components: FlowFiles
A NiFi FlowFile consists of two parts: a header containing the attributes, and the content (payload).
Attributes can be referenced via the NiFi Expression Language.
The payload is typically the actual data being routed through the dataflow, and can also be referenced by specific processors.
[Figure: structure of a NiFi FlowFile, Header (Attributes) and Content (Payload), shown next to an HTTP document.]
NiFi Main Components: FlowFiles
FlowFiles can be created, copied, cloned, merged, split, modified, deleted etc.
FlowFile attributes are a map of key/value string pairs. Every FlowFile carries a set of default attributes, and custom attributes can be added.
Default attributes:
• filename – A filename that can be used when storing data locally or on a remote system.
• path – The directory that can be used when storing data.
• uuid – A Universally Unique Identifier for each single FlowFile.
• entryDate – The date and time at which the FlowFile entered the system.
• lineageStartDate – The date and time at which the oldest ancestor of the FlowFile entered the system.
• fileSize – The size, in bytes, of the FlowFile's content.
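The attribute map can be pictured as a plain dictionary of strings. The sketch below builds one with the default attribute names listed above plus custom ones; the helper `new_flowfile_attributes`, the ISO timestamp format and the `sensor.type` attribute are assumptions made for illustration, not NiFi's internal representation.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional


def new_flowfile_attributes(filename: str,
                            path: str,
                            content: bytes,
                            custom: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return default attributes for a freshly created FlowFile, plus custom ones."""
    now = datetime.now(timezone.utc).isoformat()  # timestamp format is an assumption
    attributes = {
        "filename": filename,
        "path": path,
        "uuid": str(uuid.uuid4()),
        "entryDate": now,
        # A new FlowFile has no ancestors, so its lineage starts with itself.
        "lineageStartDate": now,
        "fileSize": str(len(content)),
    }
    attributes.update(custom or {})
    return attributes


attrs = new_flowfile_attributes(
    "reading.json", "./", b'{"t": 21.5}', {"sensor.type": "temperature"}
)
```

Keeping every value a string mirrors the key/value string pairs described above, which is why the size is stored as text rather than as an integer.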
NiFi Main Components: Processors
The FlowFile Processor is the actual working NiFi component. It can perform a large variety of tasks and actions, such as: listen for incoming data; pull data from external sources; publish data to external sources; route, transform, or extract information from FlowFiles.
Examples of NiFi built-in FlowFile Processors:
Data Ingress (Ingestion)
• GetFile – Pull content from the local disk and delete the original file.
• GetSFTP – Pull content from a remote system.
Routing
• RouteOnAttribute – Route FlowFiles based on the values of specific FlowFile attributes.
• RouteOnContent – Route FlowFiles based on the values of specific FlowFile content.
Data Transformation
• CompressContent – Compress or decompress content.
• ReplaceText – Use Regular Expressions to modify textual content.
NiFi Main Components: Processors
Other FlowFile Processor examples:
Data Egress
• PutFile – Writes the FlowFile contents to a directory on the local disk.
• PutSFTP – Copies the contents of the FlowFile to a remote server.
Attribute Extraction
• UpdateAttribute – Adds or updates attributes using statically defined values or dynamically derived values using NiFi’s Expression Language.
• ExtractText – Creates attributes based on user-defined Regular Expressions.
Splitting and Aggregation
• UnpackContent – Unpacks archive formats such as TAR and ZIP and sends each file within the archive as a separate FlowFile through the dataflow.
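The roles of a few of the processors listed above can be imitated with plain functions. This is only an analogy: the functions below are hypothetical stand-ins named after UpdateAttribute, ReplaceText and RouteOnAttribute, they do not call NiFi, and a FlowFile is reduced to an (attributes, text) pair.

```python
import re
from typing import Dict, Tuple

# Simplified FlowFile: (attributes, textual content).
FlowFile = Tuple[Dict[str, str], str]


def update_attribute(flow_file: FlowFile, **new_attributes: str) -> FlowFile:
    """Analogy of UpdateAttribute: add or overwrite attributes."""
    attributes, content = flow_file
    return ({**attributes, **new_attributes}, content)


def replace_text(flow_file: FlowFile, pattern: str, replacement: str) -> FlowFile:
    """Analogy of ReplaceText: regex substitution on the content."""
    attributes, content = flow_file
    return (attributes, re.sub(pattern, replacement, content))


def route_on_attribute(flow_file: FlowFile, name: str, expected: str) -> str:
    """Analogy of RouteOnAttribute: pick a relationship name from an attribute value."""
    attributes, _ = flow_file
    return "matched" if attributes.get(name) == expected else "unmatched"


flow_file: FlowFile = ({"filename": "reading.txt"}, "temp=21.5;unit=C")
flow_file = update_attribute(flow_file, source="sensor-7")
flow_file = replace_text(flow_file, r"unit=C", "unit=Celsius")
relationship = route_on_attribute(flow_file, "source", "sensor-7")
```

Each step touches either the attributes or the content, never both, which is the separation that lets small processors be chained into larger flows.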
Apache NiFi User Guide: https://nifi.apache.org/docs/nifi‐docs/html/user‐guide.html
NiFi Main Components: Processors Creation
Open the NiFi web-based user interface. Elements can be added to the dataflow through the Components Toolbar.
To add a FlowFile Processor, drag and drop the Processor icon onto the canvas. A dialog then asks the user to choose which type of Processor to use.
NiFi Main Components: Processors Configuration
Contextual Menu for Processor configuration
• Configure: open the configuration tab.
• Start or Stop: start or stop a Processor (exclusive), depending on the current state of the Processor.
• Enable or Disable: enable or disable a Processor (exclusive), depending on the current state of the Processor.
• View data provenance: display the NiFi Data Provenance table, with information about data provenance events for the FlowFiles routed through that Processor.
• View status history: graphical representation of the Processor’s statistical information over time.
• View usage: show the Processor’s usage documentation.
• Center in view: center the view of the canvas on the given Processor.
NiFi Main Components: Processors Configuration
Contextual Menu for Processor configuration
• View connections (Upstream/Downstream): see and "jump to" the connections that are coming into (upstream) or going out of (downstream) the Processor. This is particularly useful when processors connect into and out of other Process Groups.
• Change color: change the color of the Processor.
• Create template: create a template from the selected Processor.
• Copy: place a copy of the selected Processor on the clipboard, so that it may be pasted elsewhere on the canvas by right-clicking on the canvas and selecting Paste. The Copy/Paste actions may also be done using the keystrokes Ctrl-C (Command-C) and Ctrl-V (Command-V).
• Delete: allows the DataFlow Manager (DFM) to delete a Processor from the canvas.
NiFi Main Components: Processor Configuration - Settings
Processors are configurable. Right-clicking on a Processor and choosing the «Configure» option presents four configuration tabs: Settings, Scheduling, Properties and Comments.
Settings - This tab allows you to:
• Manage Penalty and Yield durations
• Rename the processor
• Enable/Disable the processor
• Set the Bulletin Level for error and warning notifications
NiFi Main Components: Processor Configuration - Scheduling
Scheduling - This tab allows you to set different scheduling strategies:
• Timer driven: This is the default mode. The Processor is scheduled to run at a regular interval, set by the “Run Schedule” parameter. The “Run Duration” parameter defines how long the Processor should run each time it is triggered (a trade-off between low latency and high throughput).
• Event driven: The Processor is triggered to run by an event, which occurs when FlowFiles enter the Connections feeding this Processor. This mode is experimental and is not supported by all Processors.
• CRON driven: The Processor is scheduled to run periodically, similar to Timer driven, but with more flexible configuration options.
NiFi Main Components: Processor Configuration ‐ Comments
Comments ‐ This tab simply allows users to add comments.
NiFi Main Components: Processor Configuration ‐ Properties
Properties ‐ This tab allows you to configure the Processor’s specific properties. If the processor allows custom properties to be configured, the user can click the plus sign in the top‐right to add them. Some properties allow for the NiFi Expression Language.
NiFi Expression Language documentation: https://nifi.apache.org/docs/nifi‐docs/html/expression‐language‐guide.html#types
NiFi Main Components: Processor Configuration – NiFi Expression Language
The NiFi Expression Language is the framework for defining and referring to attributes (metadata). An expression references an attribute by enclosing its name between ${ and }, for example ${inputFilePath}. Additional terms (functions) can be added for attribute manipulation, transformation and logic expressions, for example:
• Check for substring matching within attributes: ${fileName:contains('Nifi')}
• Append string operations: ${outputPath:append('/new_directory')}
• Reformat dates: ${string_date:toDate("yyyy-MM-dd")}
• Mathematical operations: ${totalAmount:minus(5)}
• Multi-variable comparison (gt means “greater than”): ${variable_one:gt(${variable_two})}
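For readers more familiar with a general-purpose language, the expressions above correspond roughly to the Python operations below. This is an analogy evaluated on a hypothetical attribute map, not the Expression Language itself; all attribute values are made up for the example.

```python
from datetime import datetime

# Hypothetical FlowFile attributes; values are strings, as in NiFi.
attributes = {
    "fileName": "Nifi_report.csv",
    "outputPath": "/data/out",
    "string_date": "2019-03-15",
    "totalAmount": "12",
    "variable_one": "7",
    "variable_two": "3",
}

# contains: substring check on an attribute value
has_nifi = "Nifi" in attributes["fileName"]

# append: string concatenation
new_path = attributes["outputPath"] + "/new_directory"

# toDate: parse a string with a year-month-day pattern
parsed_date = datetime.strptime(attributes["string_date"], "%Y-%m-%d")

# minus: numeric subtraction after converting the string value
remaining = int(attributes["totalAmount"]) - 5

# gt: compare two attributes numerically
is_greater = int(attributes["variable_one"]) > int(attributes["variable_two"])
```

The explicit `int(...)` conversions make visible a step that happens implicitly when numeric functions are applied to string attributes.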
NiFi Main Components: NiFi Expression Language built-in functions
Boolean Logic: isNull, notNull, isEmpty, equals, equalsIgnoreCase, gt, ge, lt, le, and, or, not, ifElse
String Manipulation: toUpper, toLower, trim, substring, substringBefore, substringBeforeLast, substringAfter, substringAfterLast, getDelimitedField, append, prepend, replace, replaceFirst, replaceAll, replaceNull, replaceEmpty, length
Searching: startsWith, endsWith, contains, in, find, matches, indexOf, lastIndexOf, jsonPath
Encode/Decode Functions: escapeJson, escapeXml, escapeCsv, escapeHtml3, escapeHtml4, unescapeJson, unescapeXml, unescapeCsv, unescapeHtml3, unescapeHtml4, urlEncode, urlDecode, base64Encode, base64Decode
Mathematical Operations and Numeric Manipulation: plus, minus, multiply, divide, mod, toRadix, fromRadix, random, Math
Date Manipulation: format, toDate, now
Processor Configuration – JSONPath Expression Language
Properties - When the Processor references a JSON document, the JSONPath Expression Language is used. In this language, $ represents the root of the JSON hierarchy, and the names of the nested fields are appended to it to reach a value.

Similarly, XPath expressions (EvaluateXPath) are provided for referencing XML.
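The root-plus-nested-fields idea can be sketched with a resolver for the simplest dotted paths only. It is not a JSONPath implementation (no array indexes, filters or wildcards); the function `resolve_simple_path` and the sample message are hypothetical.

```python
import json
from typing import Any


def resolve_simple_path(document: Any, path: str) -> Any:
    """Follow a path such as '$.a.b' through nested dictionaries."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$' (the root)")
    current = document
    # Drop the leading '$', split on dots and skip the empty first segment.
    for field_name in filter(None, path[1:].split(".")):
        current = current[field_name]
    return current


message = json.loads(
    '{"device": {"id": "sensor-7", "reading": {"value": 21.5, "unit": "C"}}}'
)
value = resolve_simple_path(message, "$.device.reading.value")
device_id = resolve_simple_path(message, "$.device.id")
root = resolve_simple_path(message, "$")
```

Each field name after `$` descends one level, so `$.device.reading.value` reads as "root, then device, then reading, then value".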
NiFi Main Components: Process Groups
Process Groups are used to logically group a set of components so that the dataflow is easier to understand and maintain. A Process Group is a set of processors and their connections. It can receive data via input ports and send data via output ports.
NiFi Main Components: Process Groups
Labels: draggable colored areas that can be used to visually highlight and differentiate, for instance, different logical flows of Processors and Process Groups. They can also carry descriptive text about Processors and Process Groups.
Navigate into the Process Group and back to the main Flow
NiFi Main Components: Process Groups and Remote Process Groups Creation
Open the NiFi web-based user interface. Elements can be added to the dataflow through the Components Toolbar.
To add a Process Group, drag and drop the Process Group icon onto the canvas.
Remote Process Group (RPG): Remote Process Groups are particular Process Groups which reference remote instances of NiFi. When an RPG is dragged onto the canvas, the user is prompted for the URL of the remote NiFi instance. If the remote NiFi is a clustered instance, the URL to use is the URL of any NiFi instance in that cluster.
NiFi Main Components: Input and Output Ports
Input Ports provide a mechanism for transferring data into a Process Group.
Output Ports provide a mechanism for transferring data from a Process Group to destinations outside of the Process Group.
NiFi Main Components: Funnels, Templates and Labels
Funnels are used to combine the data from many Connections into a single Connection. Because Connections can be configured with FlowFile Prioritizers, a funnel makes it possible to prioritize all data on that single Connection, rather than prioritizing the data on each Connection independently.
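The effect of funnelling several connections into one prioritized connection can be sketched with a heap. The sketch assumes each item carries a numeric priority where a lower number means more urgent; the function `funnel` and the sample items are hypothetical and unrelated to NiFi's own prioritizer classes.

```python
import heapq
from typing import Iterable, List, Tuple

# (priority, payload); a lower number is treated as more urgent (assumption).
Item = Tuple[int, str]


def funnel(*connections: Iterable[Item]) -> List[Item]:
    """Merge all connections and return the items in global priority order."""
    merged = [item for connection in connections for item in connection]
    heapq.heapify(merged)
    return [heapq.heappop(merged) for _ in range(len(merged))]


connection_a = [(5, "routine-a"), (1, "alarm-a")]
connection_b = [(3, "warning-b")]
ordered = funnel(connection_a, connection_b)
```

Prioritizing each connection separately could never place `warning-b` between the two items of `connection_a`; ordering after the merge does.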
Templates can be created from sections of the flow, or they can be imported from other dataflows. Several components may be combined together to make a larger building block to be included in a dataflow. Templates can also be exported as XML and imported into another NiFi instance.
Labels are draggable colored areas that can be used to visually highlight and differentiate, for instance, different logical flows of Processors and Process Groups as well. Also, add documental text to Processors and Process Groups. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it 3. IoT Data Flows Management – Apache NiFi
NiFi Main Components: Controller Services
Controller Services are shared services that reporting tasks, processors, and other services can use for configuration or task execution. To add a Controller Service for a reporting task, select Controller Settings from the Global Menu.
NiFi Main Components: Controller Services
Controller Services are, in a simplified view, packages of configuration parameters and code that usually perform some actions in the background. Some examples are:
• Connections to external services, for instance databases and APIs, where the Controller Service encapsulates the connection parameters.
• Reporting Tasks that send statistics about NiFi on a regular basis, for example to a monitoring service.
• Sharing state between processors and cluster nodes, for instance with cache services.
NiFi Main Components: Controller Services Configuration ‐ General
The Controller Services window has four configuration tabs: General, Reporting Task Controller Services, Reporting Tasks and Registry Clients.
The General tab provides settings for the overall maximum thread counts of the instance.
NiFi Main Components: Controller Services Configuration – Reporting Task Controller Services
In the Reporting Task Controller Services tab, new Controller Services can be created by clicking the "+" button in the upper-right corner.
Once a Controller Service has been added, it can be configured by clicking the Configure button in the far-right column. Other buttons in this column include Enable, Remove and Access Policies.
NiFi Main Components: Reporting Tasks
Reporting Tasks run in the background to provide statistical reports about what is happening in the NiFi instance. The DFM (DataFlow Manager) adds and configures Reporting Tasks in much the same way as Controller Services. To add a Reporting Task, select Controller Settings from the Global Menu.
NiFi Main Components: Connections
Connections link processors and specify how FlowFiles travel between them.
• The most common connections are for Success and Failure, which provide simple error handling: FlowFiles processed without fault are sent to the success queue, while those with problems are sent to the failure queue.
• Additional connection types include Not Found and Retry.
• Back pressure can be enabled via configurable upper bounds.
• Queued data can be managed with priority mechanisms.
• A connection can loop back to the same processor, which is useful when the processor should try to re-process FlowFiles that go down a failure Relationship.
NiFi Main Components: Connections - Configuration
The configuration window of a Connection has two tabs. The Details tab provides information about the source and destination components (component name, component type, and the Process Group in which the component lives).
NiFi Main Components: Connections - Configuration
The Settings tab configures the Connection's Name, FlowFile Expiration, Back Pressure Thresholds, Load Balance Strategy and Prioritization.
FlowFile Expiration: data that cannot be processed within the time set by this option is automatically removed from the flow.
Load Balance Strategy: to distribute the data in a flow across the nodes of the cluster, NiFi offers the following strategies:
• Do not load balance (default)
• Partition by attribute
• Round robin
• Single node (all FlowFiles are sent to a single node in the cluster).
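The load balance strategies listed above can be illustrated with a minimal Python sketch. This is not NiFi code: the node names, the FlowFile dictionaries and the CRC32-based partitioning are assumptions made only to show how each strategy assigns FlowFiles to nodes.

```python
from itertools import cycle
from zlib import crc32


def round_robin(flowfiles, nodes):
    """Assign FlowFiles to the cluster nodes in turn."""
    node_cycle = cycle(nodes)
    return {ff["id"]: next(node_cycle) for ff in flowfiles}


def partition_by_attribute(flowfiles, nodes, attribute):
    """Send FlowFiles that share the same attribute value to the same node."""
    def pick(value):
        # Stable hash (an assumption of this sketch): equal values map to the same node.
        return nodes[crc32(str(value).encode("utf-8")) % len(nodes)]
    return {ff["id"]: pick(ff["attributes"].get(attribute)) for ff in flowfiles}


def single_node(flowfiles, nodes):
    """Send every FlowFile to one node (here simply the first in the list)."""
    return {ff["id"]: nodes[0] for ff in flowfiles}


nodes = ["node1", "node2", "node3"]  # hypothetical cluster node names
flowfiles = [
    {"id": 1, "attributes": {"sensor": "A"}},
    {"id": 2, "attributes": {"sensor": "B"}},
    {"id": 3, "attributes": {"sensor": "A"}},
    {"id": 4, "attributes": {"sensor": "C"}},
]
```

With "Partition by attribute", FlowFiles 1 and 3 (same `sensor` value) always land on the same node, which matters when related data must be processed together.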
NiFi Main Components: Connections - Configuration
Back Pressure parameters (Object and Size thresholds) indicate how much data is allowed to exist in the queue before the component that is the source of the Connection is no longer scheduled to run. This prevents the system from being overrun with data.
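The back pressure behaviour described above can be sketched as a toy queue model in Python. The class and function names are hypothetical; the default values of 10,000 objects and 1 GB mirror the defaults NiFi is commonly shipped with, but should be treated here as assumptions.

```python
from collections import deque


class Connection:
    """Toy model of a connection queue with back pressure thresholds."""

    def __init__(self, max_objects=10_000, max_bytes=1024 ** 3):
        self.max_objects = max_objects  # Object threshold
        self.max_bytes = max_bytes      # Size threshold
        self.queue = deque()
        self.queued_bytes = 0

    def enqueue(self, flowfile_size):
        self.queue.append(flowfile_size)
        self.queued_bytes += flowfile_size

    def dequeue(self):
        size = self.queue.popleft()
        self.queued_bytes -= size
        return size

    def source_may_run(self):
        """The source is scheduled only while neither threshold is reached."""
        return (len(self.queue) < self.max_objects
                and self.queued_bytes < self.max_bytes)


def run_source(connection, sizes):
    """Emit FlowFiles until back pressure stops the source; return how many were queued."""
    produced = 0
    for size in sizes:
        if not connection.source_may_run():
            break
        connection.enqueue(size)
        produced += 1
    return produced
```

Note that in this model the thresholds are checked before the source runs, so the queue can slightly exceed the size threshold: the limit stops scheduling, it does not reject data already produced.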
NiFi Main Components: Connections - Configuration
Available Prioritizers: data can be prioritized in the queue so that higher priority data is processed first. Prioritizers can be dragged from the top ('Available prioritizers') to the bottom ('Selected prioritizers'). Multiple prioritizers can be selected; the one at the top of the 'Selected prioritizers' list has the highest priority. A prioritizer that is no longer desired can be dragged back from the 'Selected prioritizers' list to the 'Available prioritizers' list. The following prioritizers are available:
• FirstInFirstOutPrioritizer: the FlowFile that reached the connection first is processed first.
• NewestFlowFileFirstPrioritizer: the FlowFile that is newest in the dataflow is processed first.
• OldestFlowFileFirstPrioritizer: the FlowFile that is oldest in the dataflow is processed first. This is the default scheme, used if no prioritizers are selected.
• PriorityAttributePrioritizer: the FlowFile with the highest priority value is processed first. Note that an UpdateAttribute processor should be used to add the "priority" attribute to the FlowFiles before they reach a connection that has this prioritizer set. Values are alphanumeric: "a" has a higher priority than "z", and "1" has a higher priority than "9".
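The ordering rule of the PriorityAttributePrioritizer can be illustrated with a short Python sketch. It only reproduces the rule as stated on the slide (plain alphanumeric comparison, lower values first); placing FlowFiles without a "priority" attribute last is an assumption of this sketch, and the FlowFile dictionaries are hypothetical.

```python
def priority_key(flowfile):
    """Sort key: lexicographically smaller 'priority' values come first."""
    priority = flowfile.get("attributes", {}).get("priority")
    # FlowFiles without the attribute are placed last (assumption of this sketch).
    return (priority is None, priority or "")


def order_queue(flowfiles):
    """Return the FlowFiles in the order they would be processed."""
    return sorted(flowfiles, key=priority_key)


queue = [
    {"id": "ff-z", "attributes": {"priority": "z"}},
    {"id": "ff-a", "attributes": {"priority": "a"}},
    {"id": "ff-none", "attributes": {}},
    {"id": "ff-1", "attributes": {"priority": "1"}},
]
processing_order = [ff["id"] for ff in order_queue(queue)]
```

Because `sorted` is stable, FlowFiles with equal priority keep their arrival order, which matches the intuition of a first-in-first-out tie-break.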
NiFi Variables Window
Variables can be created and configured in a dedicated section of the User Interface. They can be used in any field that supports the Expression Language.
NiFi Variables Scope
A variable defined in a child Process Group overrides the value of the same variable defined in a parent group.
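The scoping rule above behaves like a chain of dictionaries searched from the innermost group outwards. A minimal Python sketch, with hypothetical variable names and values:

```python
from collections import ChainMap

# Hypothetical variables defined at two levels of the Process Group hierarchy.
root_group_vars = {"solr.host": "solr-root.example", "batch.size": "100"}
child_group_vars = {"batch.size": "500"}


def effective_variables(*scopes):
    """Resolve variables; scopes are listed from the innermost group outwards."""
    return ChainMap(*scopes)


resolved = effective_variables(child_group_vars, root_group_vars)
```

Here `resolved["batch.size"]` comes from the child group, while `resolved["solr.host"]` is inherited from the parent because the child does not redefine it.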
NiFi Data Provenance
NiFi Data Provenance
Provenance Event: Description
ADDINFO: Indicates a provenance event when additional information, such as a new linkage to a new URI or UUID, is added.
ATTRIBUTES_MODIFIED: Indicates that a FlowFile's attributes were modified in some way.
CLONE: Indicates that a FlowFile is an exact duplicate of its parent FlowFile.
CONTENT_MODIFIED: Indicates that a FlowFile's content was modified in some way.
CREATE: Indicates that a FlowFile was generated from data that was not received from a remote system or external process.
DOWNLOAD: Indicates that the contents of a FlowFile were downloaded by a user or external entity.
DROP: Indicates a provenance event for the conclusion of an object's life for some reason other than object expiration.
EXPIRE: Indicates a provenance event for the conclusion of an object's life due to the object not being processed in a timely manner.
FETCH: Indicates that the contents of a FlowFile were overwritten using the contents of some external resource.
FORK: Indicates that one or more FlowFiles were derived from a parent FlowFile.
JOIN: Indicates that a single FlowFile is derived from joining together multiple parent FlowFiles.
RECEIVE: Indicates a provenance event for receiving data from an external process.
REPLAY: Indicates a provenance event for replaying a FlowFile.
ROUTE: Indicates that a FlowFile was routed to a specified relationship and provides information about why the FlowFile was routed to this relationship.
SEND: Indicates a provenance event for sending data to an external process.
UNKNOWN: Indicates that the type of provenance event is unknown because the user who is attempting [ . . . ]
NiFi Simple Dataflow Examples
Listen for incoming HTTP POST requests
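To try a flow that listens for HTTP posts, a client has to send data to the listening processor. The following Python sketch builds such a request; it assumes a ListenHTTP processor reachable at a hypothetical host and port, with the base path `contentListener` (the processor's usual default), and a made-up JSON payload.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint: host, port and base path depend on the processor configuration.
LISTEN_URL = "http://localhost:8081/contentListener"


def build_post(url, payload):
    """Build an HTTP POST request carrying the payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def send(request, timeout=5):
    """Send the request and return the HTTP status code (needs a running listener)."""
    with urlopen(request, timeout=timeout) as response:
        return response.status


sample = {"sensor": "demo-sensor", "value": 21.5}  # made-up reading
request = build_post(LISTEN_URL, sample)
```

Calling `send(request)` against a running flow should turn the posted body into the content of a new FlowFile; the call is left out of the sketch because it requires a live NiFi instance.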
NiFi Simple Dataflow Examples
Query the Yahoo Weather API and produce a JSON document
NiFi Installation & Configuration Download tarball of your favorite stable release (referred as NIFI_VERSION>.bin.tar.gz) from the NiFi repository: http://nifi.apache.org/download.html Untar and extract to the location from which you want to run the application (referred as
To run NiFi in the background, run: $ bin/nifi.sh start
To stop the application, run: $ bin/nifi.sh stop
NiFi Complete Flow Example @ DISIT Lab: Collect, Extract and Index Data Traffic Logs (back‐end for AMMA Dashboard)
[ . . . ]
Syslog: Collection For AMMA
Many NiFi flow examples available on the Web!
Indexing Tweets with NiFi and Solr https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and
NiFi flow to Push Tweets into Solr/Banana, HDFS/Hive https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html
4. Distributed Storage and Indexing – Configure Zookeeper
Setting Up the Environment
Kibana / Banana
Cloud Distributed Configuration Files
Prerequisites (to be done for each Cluster node) Set up 3 or more VMs or physical hosts connected to the same LAN. These machines will constitute the nodes of the distributed cluster.
Install Java on all the cluster nodes (if not already installed):
$ sudo apt update
$ sudo apt install openjdk-8-jdk
This installs OpenJDK 8. We will refer to the installation folder as
Once the installation is complete, check the installed JDK version:
$ java -version
Prerequisites (to be done for each Cluster node) On every node of the cluster, in order to set up PATH and JAVA_HOME variables, add the following entries to ~/.bashrc file: export JAVA_HOME=
Now apply the changes to the current shell session.
$ source ~/.bashrc
Add the following entries in the /etc/hosts file:
Zookeeper Installation & Configuration
Download the tarball of your favorite stable release of Zookeeper (referred as
Untar the application to a folder of your choice (referred as
Create a configuration file, e.g. zoo.cfg, if not present, or edit the existing one by adding or modifying the following entries: tickTime=2000 dataDir=
Zookeeper Installation & Configuration Create, if not exists, or edit the file myid in the folder specified by the dataDir parameter in the zoo.cfg configuration file: $ sudo touch
Each Zookeeper server should have a unique number in the myid file. For example, server 1 will have value 1, server 2 will have value 2 and so on.
$ sudo sh -c "echo '1' >
Start Zookeeper servers’ ensemble: $
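The ensemble configuration steps above can be summarised with a small Python helper that renders a `zoo.cfg` and derives the `myid` value for each node. This is a sketch under stated assumptions: the host names and data directory are hypothetical, `tickTime=2000` comes from the slide, and the remaining values (client port 2181, peer ports 2888 and 3888, `initLimit`, `syncLimit`) follow the conventions of the ZooKeeper documentation rather than the original slides.

```python
def render_zoo_cfg(hosts, data_dir, tick_time=2000, client_port=2181,
                   init_limit=10, sync_limit=5):
    """Render the zoo.cfg content for an ensemble of the given hosts."""
    lines = [
        f"tickTime={tick_time}",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
        f"initLimit={init_limit}",
        f"syncLimit={sync_limit}",
    ]
    # One server.N entry per node; N must match the number in that node's myid file.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_for(hosts, host):
    """Return the content of the myid file for the given host."""
    return str(hosts.index(host) + 1)


ensemble = ["node1", "node2", "node3"]                  # hypothetical host names
config = render_zoo_cfg(ensemble, "/var/lib/zookeeper")  # hypothetical dataDir
```

The same `zoo.cfg` is used on every node; only the `myid` file differs, which is why the two are generated separately.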
Zookeeper Installation & Configuration
Set the Java heap size: this is important to keep Zookeeper from swapping, which significantly degrades performance. As a conservative rule, use a maximum heap size of 3GB on a 4GB machine (the example below sets 2GB). To do this, create the file java.env in
export JVMFLAGS="-Xmx2048m"
Restart the Zookeeper service.
To stop the Zookeeper servers’ ensemble run: $
4. Distributed Storage and Indexing – HBase Storing & Solr Indexing
HBase Installation & Configuration
HBase can be installed in three different modes:
1. Standalone mode
2. Pseudo-Distributed mode (Single-node Hadoop system + HBase installation)
3. Fully-Distributed mode (Multi-node Hadoop system + HBase installation)
On every node of the cluster download your favorite stable release (referred as
Untar the package in your desired
$ tar -xvf hbase-1.1.2-bin.tar.gz -C
HBase Installation & Configuration Open
[ . . . ]
HBase Installation & Configuration [ . . . ]
HBase Installation & Configuration Open the file
Solr Cloud Installation & Configuration
On every node of the cluster, follow these steps:
Choose your favorite stable release to download at: http://archive.apache.org/dist/lucene/solr/
Extract the Solr distribution archive
Solr Cloud Installation & Configuration
Check Solr configuration file (typically /etc/default/solr.in.sh) and add or edit the Zookeeper host parameter: $ ZK_HOST="
Check also the JAVA home parameter: SOLR_JAVA_HOME="
Solr Cloud Installation & Configuration Define a new data schema for the new Solr sharded collection we are going to create. To this purpose, create a new configuration folder for the new collection, by copying the default one (
Edit the managed-schema.xml file in the
Install Solr Cloud with Zookeeper
3. IoT Data Flows Management – EventLogger
EventLogger Data Classification Model
Output | Input | Function | Description
Tstmp_o | Tstmp_i | φ_l1 | Timestamp in Unix-Epoch milliseconds.
PidLoc_o | PidLoc_i | φ_l2 = I | Process ID Container of the logging Microservice (i.e.: IoT device / process / application / service).
ComMode_o | ComMode_i | φ_l3 = I | Communication Mode, indicating if the logging Microservice is transmitting or receiving.
Agent | PidLoc_i | φ_l4 | Agent type (i.e.: Node-RED application, ETL, Data Analytics etc.).
Lat_o | SrvUri_i | φ_l5 | Latitude of the logging device or virtual machine, obtained from the input SrvUri_i by calling the Smart City API.
Lng_o | SrvUri_i | φ_l5 | Longitude of the logging device or virtual machine, obtained from the input SrvUri_i by calling the Smart City API.
GeoLoc | Lat_i, Lng_i | φ_l6 | Special geolocation format for representing the logging Microservice on a map and for geographical faceting functionalities.
SrvUri_o | SrvUri_i | φ_l7 = I | URI of the device / service involved in the process, as represented in the Km4City Ontology.
IpLoc_o | IpLoc_i | φ_l8 | IP of the logging Microservice, provided in IPv4 or DNS format. The φ_l8 function performs encoding check and adjustment.
IpExt_o | IpExt_i | φ_l8 | IP of the external host to / from which the currently logging Microservice (represented by PidLoc_i) is transmitting / receiving data, according to the ComMode_i parameter. The φ_l8 function performs encoding check and adjustment.
SrvScope | IpLoc_i, IpExt_i | φ_l9 | This parameter indicates whether the Microservice is Internal or External.
Motiv_o | Motiv_i | φ_l10 | Motivation, that is the aim of the logging Microservice (Filesystem or DB storage, Dashboard management, Smart City API call etc.).
PayLoad_o | PayLoad_i | φ_l11 | Measure of data flow traffic (transmitted or received, depending on the ComMode_i parameter) between IpLoc_i and IpExt_i. It is expressed in KB.
AppName_o | AppName_i | φ_l12 = I | Name of the Microservice application.
Msg_o | Msg_i | φ_l13 | Text message for additional logging notes.
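A part of the classification model above can be sketched in Python. Only the functions marked as identity (= I) and the GeoLoc composition are shown; the "lat,lng" string format for GeoLoc is an assumption of this sketch, and the coordinates are passed in as arguments because the Smart City API lookup is not reproduced here. The function name and the sample record are hypothetical.

```python
def classify(event, lat, lng):
    """Map an input log record to a partial output record.

    lat and lng stand for the values the model obtains from SrvUri
    through the Smart City API (the API call itself is not shown).
    """
    return {
        "Tstmp": int(event["Tstmp"]),   # Unix-Epoch milliseconds
        "PidLoc": event["PidLoc"],      # identity function
        "ComMode": event["ComMode"],    # identity function
        "SrvUri": event["SrvUri"],      # identity function
        "AppName": event["AppName"],    # identity function
        "Lat": lat,
        "Lng": lng,
        "GeoLoc": f"{lat},{lng}",       # assumed "lat,lng" string format
    }


sample_event = {                        # made-up input record
    "Tstmp": "1546300800000",
    "PidLoc": "container-01",
    "ComMode": "TX",
    "SrvUri": "http://example.org/service/1",
    "AppName": "demo-app",
}
indexed = classify(sample_event, 43.77, 11.25)
```

The remaining functions (agent detection, IP encoding checks, service scope, payload measurement) depend on details the table does not specify and are left out.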
Solr Cloud Installation & Configuration Set the new managed-schema configuration for the whole sharded collection, by uploading it through Zookeeper: $
Create a new Solr sharded collection
Solr Cloud Installation & Configuration A new Solr sharded collection can be equivalently created from command line in one of the cluster nodes (for example in
$ curl -XGET http://
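Creating a sharded collection from the command line typically goes through the Solr Collections API. The Python sketch below only builds the request URL for its CREATE action; the host, collection name, configuration name, and shard and replica counts are hypothetical values, not the ones used in the original slides.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API URL that creates a sharded collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,  # configuration set uploaded to Zookeeper
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical values for illustration only.
url = create_collection_url("http://node1:8983/solr", "syslog", 3, 2, "syslog_conf")
```

The resulting URL can be requested with any HTTP client (for instance curl, as on the slide) from one of the cluster nodes.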
Navigate the Solr web interface in the Cloud Tab:
zkui (ZooKeeper User Interface): a graphical UI for monitoring the Zookeeper shared configuration for Solr: https://github.com/echoma/zkui
zkui is a cross-platform GUI frontend for managing operations on Zookeeper clusters. Zookeeper exposes a shared hierarchical namespace, organized similarly to a standard file system.
Syslog: Collection For AMMA
SensoriRT-v2: Collection For DevDash
Browse the ZooKeeper node tree and edit a node's data. Copy a node to a new path recursively. Delete a node and all its children. Monitor the coherence of configuration files in the cluster.
managed-schema
Monitor cluster resources and select different collections:
Make queries on the selected collection:
Syslog: Collection For AMMA
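Besides the web interface, a collection can be queried through Solr's HTTP `select` handler. The following Python sketch builds a faceted query URL; the host, the collection name `syslog` and the field name `agent` are hypothetical placeholders that should be replaced with names from the actual schema.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, collection, facet_field, query="*:*", rows=0):
    """Build a Solr select URL returning facet counts for one field."""
    params = [
        ("q", query),                 # match-all query by default
        ("rows", rows),               # 0 rows: only the facet counts are needed
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and field names.
query_url = facet_query_url("http://node1:8983/solr", "syslog", "agent")
```

Facet counts of this kind are what dashboard widgets such as facet and terms panels are built on.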
5. Producing Visual Analytic Tools – Banana Dashboards
The open source Banana project is a fork of Kibana. It works with time-series data stored in Apache Solr (on which it is actually installed). It includes powerful features, such as D3.js (Data-Driven Documents), supporting dynamic and interactive views of structured data. It is based on AngularJS, which simplifies and enhances the MVC (Model-View-Controller) paradigm in the development of web-based and mobile applications.
Angular JS “Hello World” Example
Hello Angular
Enter your name:
Hello {{name}}!
https://plnkr.co/edit/?p=preview
Banana Web‐app Installation & Configuration Download Banana .zip archive (banana-release.zip) from GitHub repository: https://github.com/lucidworks/banana Create the
Banana Web‐app Installation & Configuration Open the Banana web‐app by browsing one of the following URL (equivalently): http://
Start creating a new dashboard
You can also easily create custom dashboards from scratch! https://github.com/lucidworks/banana/wiki/Tutorial:-How-to-Build-a-Custom-Panel
Banana Web‐app In the Dashboard Settings Panel, click on the Solr tab to choose the data collection.
Example: Collection for DevDash
DevDash
Banana - AMMA & DevDash Widgets and Functionalities
Time window
Time window options: Relative, Absolute, Since
Banana - AMMA & DevDash Widgets and Functionalities
Search Query
Total Hits
Total Hits options:
Banana ‐ AMMA & DevDash Widgets and Functionalities Facet: In the facet panel, users can select facet fields to automatically filter all the other widgets and panels instantiated in the same dashboard.
Banana ‐ AMMA & DevDash Widgets and Functionalities Bar/Line Histogram: this widget is useful for monitoring time trends (in terms of counts or cumulated values). The graph visualization is stacked and grouped on the basis of a field which can be chosen by the user.
Banana - AMMA & DevDash Widgets and Functionalities
Terms: histograms and pie charts showing several kinds of distribution, computed on the Facet fields data (in terms of counts, but also sum of values, mean, max, min etc.).
Sunburst: a multi‐level ring/pie chart for visualizing multi‐level faceting; the nesting of the rings follows the order in which the facet fields are given.
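To see why the input order matters, the sketch below builds the nested counts that a sunburst would draw, with the first field as the innermost ring. Solr exposes this kind of multi-level faceting through the `facet.pivot` parameter; whether the widget relies on it is an assumption here, and the field names and the helper `pivot_counts` are hypothetical.

```python
def pivot_counts(records, fields):
    """Nested counts following the order of `fields` (innermost ring first)."""
    root = {}
    for r in records:
        node = root
        # Walk down one level per field, except the last one.
        for f in fields[:-1]:
            node = node.setdefault(r[f], {})
        leaf = r[fields[-1]]
        node[leaf] = node.get(leaf, 0) + 1
    return root

# Hypothetical sample records.
records = [
    {"city": "Firenze", "kind": "sensor"},
    {"city": "Firenze", "kind": "sensor"},
    {"city": "Firenze", "kind": "actuator"},
    {"city": "Pisa", "kind": "sensor"},
]
by_city = pivot_counts(records, ["city", "kind"])
by_kind = pivot_counts(records, ["kind", "city"])
```

The same records yield two different hierarchies: `by_city` splits each city into device kinds, while `by_kind` splits each device kind into cities.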
Bettermap: shows geolocated data on a map.
SmartCityMap: a widget derived from Bettermap that shows enriched information on geolocated data, retrieved through the Km4City Smart City API developed at DISIT Lab, and that also provides geo‐faceting capabilities.
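Geo‐faceting can be pictured as filtering the data by the area currently selected on the map. The sketch below applies a simple bounding-box filter in plain Python; it is an illustration of the concept only, not the SmartCityMap implementation. The record layout, the coordinates and the helper `in_bbox` are hypothetical, and boxes that cross the antimeridian are not handled.

```python
def in_bbox(record, south, west, north, east):
    """True if the record's coordinates fall inside the bounding box."""
    return south <= record["lat"] <= north and west <= record["lon"] <= east

# Hypothetical geolocated records (approximate coordinates).
records = [
    {"name": "sensor-firenze", "lat": 43.77, "lon": 11.25},
    {"name": "sensor-pisa", "lat": 43.72, "lon": 10.40},
]

# Box roughly covering the Florence area.
selected = [r["name"] for r in records if in_bbox(r, 43.70, 11.10, 43.90, 11.40)]
```

In a dashboard, the names (or identifiers) that survive the filter would then restrict what the other panels display.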
Table: displays data from the Solr index with a selectable set of columns. Rows can be sorted by column values, URL‐based fields can be made clickable so that they open the corresponding web link, and pagination values can be configured.
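Column selection, ordering and pagination map naturally onto standard Solr request parameters: `fl` (fields to return), `sort`, `start` and `rows`. The sketch below builds such a query string; the column names and the helper `table_query` are hypothetical, and it is an assumption that the Table panel issues requests of this exact form.

```python
from urllib.parse import urlencode, parse_qs

def table_query(columns, sort_field, descending, page, page_size):
    """Build a Solr query string for one page of a sorted table."""
    direction = "desc" if descending else "asc"
    params = {
        "q": "*:*",
        "wt": "json",
        "fl": ",".join(columns),            # columns shown in the table
        "sort": f"{sort_field} {direction}",
        "start": str(page * page_size),     # offset of the first row
        "rows": str(page_size),             # rows per page
    }
    return urlencode(params)

# Third page (index 2) of 25 rows, newest first; hypothetical column names.
table_qs = table_query(["date_time", "device_id", "value"], "date_time", True, 2, 25)
print(parse_qs(table_qs))
```

Pages are zero-based in this sketch, so page 2 with 25 rows per page starts at offset 50.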
Some Use Cases (A) [figure]
Some Use Cases (B) [figure: panels (a) and (b)]
Other Technologies / Approaches
Possible integration between different frameworks
NiFi as a Producer: NiFi acts as a Kafka producer.
NiFi as a Consumer: in some scenarios an organization may already have an existing pipeline bringing data to Kafka. In this case NiFi can take on the role of a consumer and handle all of the logic for moving data from Kafka to wherever it needs to go.
4. Distributed Storage and Indexing – HBase Storing & Solr Indexing
NiFi Indexing to ElasticSearch
[ . . . ]
Performance Comparison: Solr vs. Elasticsearch
[figure: performance comparison, Elasticsearch and Solr]