INNOVATION ACTION

H2020 GRANT AGREEMENT NUMBER: 825171

WP3 – Vision Materialisation and Technical Infrastructure D3.1 – Tools assessment

Document Info
Contractual Delivery Date: 30/06/2019
Actual Delivery Date: 30/06/2019
Responsible Beneficiary: UNIC
Contributing Beneficiaries: SIV, INOV, UWA, UOG, EGR
Dissemination Level: Public
Version: 1.2
Type: Final

This project has received funding from the European Union’s H2020 research and innovation programme under the grant agreement No 825171


DOCUMENT INFORMATION

Document ID: D3.1 Tools assessment
Version Date: 30/06/2019
Total Number of Pages: 37
Abstract: Report on the technical requirements to be covered by the EUNOMIA solution. The report presents the requirements and describes the state-of-the-art of the technologies and tools involved.
Keywords: Technical requirements, tools

VERSION HISTORY

Version | Date | Comments
0.1 | 01/06/2019 | First draft of the ToC
0.2 | 15/06/2019 | State-of-the-art
0.3 | 21/06/2019 | Completed state-of-the-art
1.0 | 26/06/2019 | Final draft
1.1 | 29/06/2019 | Revision based on review
1.2 | 30/06/2019 | Final

AUTHORS

Full Name | Beneficiary / Organisation | Role
University of Nicosia | UNIC | Editor
University of West Attica | UWA | Contributor
SIVECO Romania SA | SIV | Contributor
University of Greenwich | UOG | Contributor, Reviewer
INOV INESC INOVAÇÃO | INOV | Contributor, Reviewer
Eugen Rochko | EGR | Contributor

REVIEWERS

Full Name | Beneficiary / Organisation | Date
INOV INESC INOVAÇÃO | INOV | 28/06/2019
University of Greenwich | UOG | 30/06/2019

Type of deliverable PUBLIC Page | 2 H2020 Grant Agreement Number: 825171 Document ID: WP2 / D2.4

EXECUTIVE SUMMARY

EUNOMIA is a fully decentralised, intermediary-free and open-source solution for addressing three key challenges: (a) which social media user is the original source of a piece of information; (b) how this information has spread and been modified in an information cascade; and (c) how likely it is to be trustworthy.

Work package 3 (WP3) sets the basis for the technical implementation of EUNOMIA by identifying the technical requirements and producing the specifications and architecture design. WP3 also progresses with the setup of the technical infrastructures and frameworks that will allow the development, operation and evaluation of the tools and prototypes of WP4.

The scope of the current deliverable (D3.1) is to identify the technical requirements and the actors/services of the EUNOMIA platform. The functional and non-functional requirements, as well as the actors/services, were extracted and analysed in deliverable 2.4, “User needs and requirements”. In total, 26 actors, 61 functional and 21 non-functional requirements were identified. These will guide the development of all technical components in the project, directly and indirectly, by informing the specifications and architecture. This deliverable also documents the state-of-the-art technologies to be implemented, which are: infrastructure, content and context analysis, human-as-trust-sensor and user reputation mechanism, cybersecurity and privacy framework, digital companion and peer-to-peer infrastructure.

@Copyright of EUNOMIA Consortium Page iii

TABLE OF CONTENTS

DOCUMENT INFORMATION
VERSION HISTORY
AUTHORS
REVIEWERS
Executive Summary
Table of Contents
List of Figures
List of Tables
List of Acronyms and Abbreviations
1. Introduction
1.1 Relation to Other Tasks and Deliverables
2. State-of-the-art Review
2.1 Blockchain infrastructure
2.2 Content and Context Analysis
2.3 Human-as-trust-sensor and reputation mechanism
2.4 Cybersecurity and Privacy Framework
2.5 Digital Companion
2.6 Peer-to-peer Infrastructure
3. Technical requirements identification
3.1 Identified Actors
3.2 Functional Requirements
3.3 Non-functional requirements
4. Conclusions
5. References

LIST OF FIGURES

Figure 1: Samples from the Quora dataset
Figure 2: Web of Trust and NewsGuard Browser extensions
Figure 3: Textbox screenshot


LIST OF TABLES

Table 1: Comparison between Ethereum, Bitcoin and Hyperledger Fabric
Table 2: A sample of extracted patterns and their sentiment polarity
Table 3: Sentiment analysis on a sample of opinion snippets (Twitter)
Table 4: Opinion targets and their respective sentiment polarity from a sample of Twitter reviews
Table 5: High-level summary of existing platforms
Table 6: Cross-platform mobile development frameworks
Table 7: Description of actors
Table 8: MoSCoW method
Table 9: Generic Functional Requirements
Table 10: Functional Requirements per component
Table 11: ISO/IEC 25010:2011 Software Product Quality Model Sub-Categories
Table 12: Non-functional requirements


LIST OF ACRONYMS AND ABBREVIATIONS

Term | Description
BI | Blockchain Infrastructure
CA | Content and context analysis
CNN | Convolutional Neural Network
DNS | Domain name system
HaTS | Human-as-Trust-Sensor
HRM | HaTS and reputation mechanism
IPFS | InterPlanetary File System
LSTM | Long Short-Term Memory
MRPC | Microsoft Research Paraphrase Corpus
NLP | Natural language processing
P2P | Peer-to-peer
PI | Paraphrase Identification
URL | Uniform Resource Locator


1. INTRODUCTION

Work Package 3 sets the basis for the technical implementation of EUNOMIA by identifying the technical requirements. The EUNOMIA user needs and requirements stem directly from D2.4: Report on user needs and requirements. In addition to the functional and non-functional requirements, the scope is to identify and document a first version of the technical requirements of the platform. The identification and analysis of the functional and non-functional requirements, and their translation into technical requirements, is a continuous process that will last until M7; the requirements will be updated and documented in subsequent versions of this deliverable.

Deliverable 3.1 is organized in two main sections, as indicated in the table of contents. Section 2 documents the state-of-the-art technologies to be implemented. Section 3 describes the identified actors/services of the system as well as the functional and non-functional requirements.

1.1 Relation to Other Tasks and Deliverables

This report has a direct relation to the user requirements task and its associated deliverable (D2.4), and feeds directly into the Specifications and architecture design task (Task 3.2), as well as the individual technical development tasks of WP3 and WP4, namely: Task 3.3 Cyber security and privacy framework, Task 3.4 Blockchain infrastructure, Task 3.5 Peer to peer infrastructure, Task 4.1: Human-as-trust-sensor and reputation mechanism, Task 4.2: Social media content and context data collection and analysis, Task 4.3: Trustworthiness scoring, Task 4.4: Digital companion, and Task 4.5: Supervisory tools and digital observatory connection.

2. STATE-OF-THE-ART REVIEW

This section presents the state-of-the-art technologies reviewed for the EUNOMIA platform. The technologies have been categorized into six sections: Blockchain Infrastructure, Content and Context Analysis, HaTS and reputation mechanism, Cybersecurity and Privacy Framework, Digital Companion and Peer-to-peer Infrastructure.

2.1 Blockchain infrastructure

The rise of Bitcoin and Ethereum over the last decade has brought their underlying technology, blockchain, into the spotlight. The immutability and decentralization properties of blockchain have motivated the development of a range of applications that go beyond the initial goal of blockchain technologies (i.e., the transfer of digital assets) and into identity management, data storage management, timestamping, etc. The incorporation of blockchain technologies into social media applications constitutes a novel area that poses a series of challenges, spanning from the social aspects of interaction design to core engineering issues such as the scalability of backend data management mechanisms. In this section, we focus on the state-of-the-art technologies.

At a high level, a blockchain system can be categorized as either permissionless or permissioned. In a permissionless (public) network such as Bitcoin or Ethereum, anyone can join the network anonymously to perform transactions [Tien Tuan Anh Dinh et al., 2018]. In a permissioned (private) network, the identity of each participant is known and the blockchain can store who performed which transaction. In addition, such a network enables its members to decide who can participate in the consensus mechanism of the network and who can validate transactions on the network [Parth Thakkar et al., 2018]. “A permissioned network is highly suitable for enterprise applications that require authenticated participants. Each node in a permissioned network can be owned by different organizations” [Parth Thakkar et al., 2018]. Hence, permissioned blockchains do not expend the amount of resources that open blockchains do and are able to reach better transaction latency and throughput. In addition, “it makes possible to control the set of participants tasked with maintaining the [ledger,] rendering this type of blockchain a more attractive solution for larger corporations, since it can be separated from the dark web or illegal activities” [Joao Sousa et al., 2018].

The utilization of blockchain technologies for social media applications can be broadly divided into services and data management. The vast majority of social media applications that are referred to as “blockchain-based” fall into the first category, where the blockchain is integrated in the periphery of the application and supports several reward-related services. For example, Synero uses blockchain technology to facilitate the allocation of rewards, e.g., a user is given a reward (in the form of crypto-coin) for their posts.

Hyperledger Fabric (or simply, Fabric) is a system for deploying and operating permissioned blockchains that target business applications [Elli Androulaki et al., 2018]. It has many properties suited for enterprise-class applications. Its architecture is modular, allowing components such as the consensus mechanism and membership services to be plug-and-play. It leverages container technology (Docker) to host smart contracts, called “chaincode”, that comprise the application logic of the system [Hyperledger Fabric, 2019].
It can run arbitrary smart contracts (chaincodes) implemented in Go, Java, Node.js or JavaScript, and all nodes that participate in the network have an identity, as provided by a modular membership service provider (MSP) [Harish Sukhwani et al., 2018]. Also, in a Fabric network, nodes are not publicly discoverable and the owners of nodes have to agree to permit their nodes to message each other [Hyperledger Fabric, 2019]. Fabric can leverage consensus protocols that do not require a native cryptocurrency to incent costly mining or to fuel smart contract execution. Avoiding a cryptocurrency reduces some significant risk/attack vectors, and the absence of cryptographic mining operations means that the platform can be deployed with roughly the same operational cost as any other distributed system [Hyperledger Fabric, 2019].

According to the experimental results of Suporn Pongnumkul et al. (2017), based on a varying number of transactions, Hyperledger Fabric consistently outperforms Ethereum across all evaluation metrics, namely execution time, latency and throughput. Table 1 compares Ethereum, Bitcoin and Hyperledger Fabric; among a long list of blockchain characteristics, we have picked out the most important ones.

Table 1: Comparison between Ethereum, Bitcoin and Hyperledger Fabric

Characteristic | Bitcoin | Ethereum | Hyperledger Fabric
Blockchain Type | Public | Public | Private
Consensus Mechanism | Proof-of-Work | Hybrid, towards PoS | Pluggable, Kafka (default)
Native currency | Bitcoin (btc) | Ether (eth) | No (but it can be programmable through chaincode)
Programming language | Bitcoin script | Solidity | Java, Go, Nodejs, Javascript etc.
Smart contracts | No | Yes | Yes
Read Access | Public | Public | Private
Write Access | Public | Public | Private

Blockchain Type
• In a permissionless (public) blockchain, anyone can join the network anonymously to perform transactions. There are no restrictions on identities, thus everyone can start mining to create blocks. This is the case in Bitcoin and Ethereum.
• In a permissioned (private) blockchain network, the identity of each participant is known and the blockchain can store who performed which transaction. Transaction processing is performed by predefined users. This is the case in Hyperledger Fabric.

Consensus Mechanism
The consensus mechanism is a means to determine consensus about all transactions and the current state of the system. The mechanism ensures that transactions are only added to the blockchain if they are valid, and that they are never recorded more than once.
• Proof-of-Work (PoW): Miners have to solve a computationally difficult problem to ensure the validity of new transactions (Bitcoin/Ethereum).
• Proof-of-Stake (PoS): Miners can create a new block depending on their investment in and ownership of the system.
• Kafka is the Hyperledger Fabric ordering mechanism. This ordering mechanism utilizes Apache Kafka, an open source stream processing platform that provides a unified, high-throughput, low-latency platform for handling real-time data feeds. In this case, the data consists of endorsed transactions and read-write (RW) sets. The Kafka mechanism provides a crash fault-tolerant solution to ordering.

Native currency
Native currency refers to whether the blockchain has an inherent currency. For example, Bitcoin uses its currency “bitcoin” as a medium of exchange, and Ethereum uses “ether”. Hyperledger Fabric does not use a currency of its own. The lack of a native currency also allows for a scalable consensus algorithm, whereby the network can process transactions at high rates.

Programming language
Another critical difference between Hyperledger Fabric and Ethereum is the programming language used by the two frameworks. Ethereum smart contracts rely on the high-level programming language Solidity. Hyperledger Fabric, for its part, relies on “chaincode”, which is a synonym for smart contract and handles the business logic agreed by the members of the network. Chaincodes are written in Java, Go, Node.js or JavaScript. Bitcoin uses a scripting system for transactions. It is simple, stack-based, and processed from left to right. It is intentionally not Turing-complete, with no loops.

Smart contracts
Some blockchains, such as Ethereum and Hyperledger Fabric, provide developers with a Turing-complete scripting language. Developers can create smart contracts that interact with each other and form decentralized applications. Other blockchains, such as Bitcoin, only provide very limited stack-based programming. This makes application development very difficult and sometimes not viable.

Read and Write Access
• Public: There are no restrictions on reading transaction data. Everyone can download the blockchain ledger and view all transactions. This is the case of Bitcoin and Ethereum.
• Private: Direct access to blockchain data is limited to predefined users. Thus, only participants that are registered on the blockchain network can download the ledger. This is the case of Hyperledger Fabric.

Hyperledger Fabric delivers some key differentiating capabilities over other popular distributed ledger or blockchain platforms. One key point of differentiation is that Hyperledger was established under the Linux Foundation. Fabric is the first distributed ledger platform to support smart contracts authored in general-purpose programming languages, rather than constrained domain-specific languages (DSL). Fabric combines different features in terms of transaction processing and transaction latency, and it enables privacy and confidentiality.

We have chosen to use blockchain technology for the EUNOMIA platform because of the advantages it provides, namely decentralization, immutability, security and transparency:
• The blockchain technology allows for verification without having to depend on third parties.
• The data structure in the blockchain is append-only, so the data cannot be altered or deleted.
• It uses cryptography to secure the distributed ledger.
• All transactions and data are attached to a block after the process of verification. There is a consensus of all the ledger participants on what is to be recorded in the block.
• The transactions are recorded in chronological order. Thus, all the blocks in the blockchain are time-stamped.
• The ledger is distributed across every single node in the blockchain network.
• The transactions that take place are transparent. The individuals who are granted authority can view the transactions.

With smart contracts, EUNOMIA can pre-set conditions on the blockchain. The automatic transactions are triggered only when these conditions are met.
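
The append-only, time-stamped and hash-linked properties listed above can be illustrated with a minimal sketch. It is a toy model written for this explanation, not Hyperledger Fabric code: the names `Ledger` and `block_hash` and the transaction strings are hypothetical, and consensus, signatures and distribution across nodes are left out.

```python
import hashlib
import json
import time
from typing import List, Optional


def block_hash(block: dict) -> str:
    """Return the SHA-256 digest of a block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Ledger:
    """Toy append-only ledger: each block stores the hash of its predecessor."""

    def __init__(self) -> None:
        self.blocks = []  # blocks are only ever appended, in chronological order

    def append(self, transactions: List[str], timestamp: Optional[float] = None) -> dict:
        # The first block has no predecessor, so it points at an all-zero hash.
        previous = block_hash(self.blocks[-1]) if self.blocks else "0" * 64
        block = {
            "index": len(self.blocks),
            "timestamp": time.time() if timestamp is None else timestamp,
            "transactions": transactions,
            "previous_hash": previous,
        }
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        """Check that every block still points at the hash of the block before it."""
        for prev, cur in zip(self.blocks, self.blocks[1:]):
            if cur["previous_hash"] != block_hash(prev):
                return False
        return True


ledger = Ledger()
ledger.append(["post:1 created by user:a"], timestamp=1.0)
ledger.append(["post:1 shared by user:b"], timestamp=2.0)
ledger.append(["post:1 voted on by user:c"], timestamp=3.0)
print(ledger.is_valid())  # True

# Altering an earlier block breaks the link stored in its successor.
ledger.blocks[0]["transactions"] = ["post:1 created by user:x"]
print(ledger.is_valid())  # False
```

Because each block embeds the hash of the previous one, a change to any recorded transaction is detectable by every participant that holds a copy of the chain, which is the basis of the immutability property referred to above.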

2.2 Content and Context Analysis

Sentiment analysis is the process of extracting valuable information from user-generated opinionated content. This content may exist in various online sources and formats, such as product reviews, discussion forums or social networks. As the amount of online content grows, tools that automatically analyze such information sources become a necessity. Sentiment analysis belongs to this category of tools, and it has high commercial value.


In this area, many approaches have been developed. The dictionary-based approach is a representative example. It relies on an opinion word lexicon that consists of two classes of opinion words: positive words such as “good, handy, impressive” and negative words such as “bad, useless, trash”. Such generic lexicons are already available but, as expected, fall short in adapting to data coming from diverse domains. This requires the utilization of domain-specific dictionaries of opinion words. Currently, there are three main lines of research for tackling the task of opinion word extraction: (a) semantic thesaurus-based, (b) corpus-based and (c) pattern-based methods.

(a) Semantic thesaurus-based approaches: Some studies have proposed to utilize existing semantic thesauri for sentiment lexicon construction, such as WordNet and General Inquirer for English and HowNet for Chinese. This category of studies mainly depends on the synonym and antonym relationships among sentiment terms and on the lists of related words in thesauri to expand the polarity lexicon from a seed list of sentiment words. Hu and Liu (2004) used adjective synonym and antonym lists in WordNet to predict the semantic orientations of adjectives. Kamps et al. (2004) built a polarity lexicon by linking synonyms provided by thesauri, and the sentiment polarity was defined by the distance from seed words (“good” and “bad”). Esuli and Sebastiani (2006) and Baccianella et al. (2010) proposed to use the synsets in WordNet as the sentiment words, and annotated them with positive, neutral and negative polarity scores. The annotation process was divided into a semi-supervised learning stage and a random-walk learning stage. The above methods rely on the assumption that adjectives tend to have the same polarity as their synonyms and the opposite polarity to their antonyms.
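
The assumption behind the thesaurus-based methods (synonyms share a polarity, antonyms take the opposite one) can be sketched as a simple propagation from seed words. This is an illustrative simplification rather than the procedure of any of the cited studies; the `SYNONYMS` and `ANTONYMS` dictionaries are a hypothetical toy thesaurus standing in for a resource such as WordNet.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy thesaurus; a real system would query WordNet or a similar resource.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["great", "fine"],
    "great": ["good", "superb"],
    "bad": ["poor", "awful"],
    "poor": ["bad"],
}
ANTONYMS: Dict[str, List[str]] = {
    "good": ["bad"],
    "bad": ["good"],
    "superb": ["awful"],
}


def expand_lexicon(seeds: Dict[str, int]) -> Dict[str, int]:
    """Propagate polarity (+1 / -1) from seed words through the thesaurus."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        # Synonyms inherit the polarity of the word, antonyms get the opposite one.
        neighbours = [(s, polarity[word]) for s in SYNONYMS.get(word, [])]
        neighbours += [(a, -polarity[word]) for a in ANTONYMS.get(word, [])]
        for other, label in neighbours:
            if other not in polarity:  # first label wins; conflicts are ignored here
                polarity[other] = label
                queue.append(other)
    return polarity


lexicon = expand_lexicon({"good": 1, "bad": -1})
print(sorted(lexicon.items()))
# [('awful', -1), ('bad', -1), ('fine', 1), ('good', 1), ('great', 1), ('poor', -1), ('superb', 1)]
```

The sketch also makes the dependence on the thesaurus visible: words that are absent from the resource, or whose polarity changes with the domain, cannot be labelled correctly.
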
The first disadvantage of this set of approaches is that they rely entirely on rich and complicated prior resources and thus cannot be applied to languages where such resources are not available. The second disadvantage is that they do not consider domain-dependent characteristics of sentiment lexicons, such as polarity variations.

(b) Corpus-based approaches: The corpus-based approaches are built on the assumption that sentiment words sharing the same polarity tend to co-occur. One of the most representative studies in this field is that of Turney (2002), which focuses on learning polarities from a corpus. Adjective and adverb phrases were first extracted as candidate sentiment terms using pattern rules, and their polarity was determined based on their co-occurrence with words like “excellent” or “poor” (which are considered seed words). The co-occurrence was measured by the number of hits returned by a search engine. The approach of Hatzivassiloglou and McKeown (1997) relies on an analysis of textual corpora that correlates linguistic features, or indicators, with semantic orientation. No direct indicators of positive or negative semantic orientation have been proposed, but the authors demonstrated that conjunctions between adjectives provide indirect information about orientation. Popescu and Etzioni (2005) introduce an unsupervised, high-precision information extraction system which mines product reviews in order to build a model of product features and their evaluation by reviewers. The method iteratively assigns polarity to words by using various features, including intra-sentential co-occurrence and the synonyms/antonyms of a thesaurus.

Corpus-based approaches like the ones mentioned above are intuitive, but they are quite limited since they capture only one type of simple relationship (co-occurrence). Our method captures more complicated syntactic relationships that form the extraction patterns. These patterns can be utilized not only for discovering opinion words but also for sentiment classification.
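
The co-occurrence idea attributed to Turney (2002) above can be sketched as a semantic orientation score: the difference between the pointwise mutual information of a phrase with “excellent” and with “poor”, estimated from hit counts. The function and parameter names are hypothetical, the counts in the example are invented for illustration, and the small smoothing constant is an assumption used here to avoid division by zero.

```python
import math


def semantic_orientation(
    near_excellent: float,
    near_poor: float,
    hits_excellent: float,
    hits_poor: float,
    smoothing: float = 0.01,
) -> float:
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    Written in terms of hit counts this becomes
    log2( hits(phrase NEAR excellent) * hits(poor)
          / (hits(phrase NEAR poor) * hits(excellent)) ).
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Invented counts: the phrase co-occurs with "excellent" far more often than with "poor".
score = semantic_orientation(near_excellent=80, near_poor=10,
                             hits_excellent=1000, hits_poor=1000)
print("positive" if score > 0 else "negative")  # positive
```

A phrase is treated as positive when the score is above zero and negative when it is below, so the method needs only two seed words and a source of co-occurrence counts.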


(c) Pattern-based approaches: These techniques utilize patterns in order to extract opinion words. Note that there is an overlap with the corpus-based approaches described above, since there are techniques that analyze a corpus in order to discover such patterns. Turney (2002), for example, applies part-of-speech (POS) tagging to each review sentence. Some predefined patterns, such as adjective-noun and adjective-adjective, are considered and then, for each extracted phrase, the semantic orientation is inferred by comparing its similarity to a positive reference word (e.g. “excellent”) and a negative reference word (e.g. “poor”). In Hatzivassiloglou and McKeown (1997), the focus is on syntactic patterns. Adjectives are considered opinion words, and word orientations are calculated via conjunction links and with the help of some adjectives from the corpus with already known polarities. Then, the lexicon is expanded with the use of antonym-synonym relations. In Popescu and Etzioni (2005), WordNet relationships and morphological cues (e.g. suffixes) are used, noun phrases are extracted using predefined patterns, and some domain-independent rules for the extraction of potential opinion phrases are introduced. In Wilson et al. (2005), the polarity of words from predefined lexicons is taken into account and, with the help of morphological cues, adverbs and adjectives are searched for within each sentence and within the previous and next sentences. Qiu et al. (2009) and Qiu et al. (2011) exploit direct and indirect relations between sentiment words and topics (or product features). Adjectives are considered opinion words and nouns opinion targets, while specific patterns are introduced.

We argue and demonstrate that pattern-based sentiment analysis is beneficial since it captures complex sentiment dependencies in the use of language. For this reason, we employ DidaxTo [Agathangelou et al., 2017], an algorithm and graphical user interface tool that assigns a polarity to each pattern our method discovers. This enables the utilization of our patterns directly in sentiment analysis, without the need to extract opinion words first. A demonstration of content analysis on a sample of Twitter reviews follows:

Table 2: A sample of extracted patterns and their sentiment polarity

Table 3: Sentiment analysis on a sample of opinion snippets (Twitter)


Table 4: Opinion targets and their respective sentiment polarity from a sample of Twitter reviews
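
The pattern-based extraction described under (c) can be sketched as matching predefined part-of-speech patterns over a tagged sentence. This is a minimal illustration of the adjective-noun and adjective-adjective patterns mentioned in the text, not the DidaxTo algorithm; the sentence is assumed to be tagged already (Penn Treebank style tags are used as an assumption), and the example sentence is invented.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, part-of-speech tag)

# Two of the predefined tag patterns mentioned in the text:
# adjective-noun and adjective-adjective.
PATTERNS = {("JJ", "NN"), ("JJ", "JJ")}


def extract_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Return every two-word phrase whose tag pair matches a predefined pattern."""
    phrases = []
    for (word1, tag1), (word2, tag2) in zip(tagged, tagged[1:]):
        if (tag1, tag2) in PATTERNS:
            phrases.append(f"{word1} {word2}")
    return phrases


sentence = [
    ("the", "DT"), ("battery", "NN"), ("has", "VBZ"), ("impressive", "JJ"),
    ("life", "NN"), ("but", "CC"), ("a", "DT"), ("cheap", "JJ"),
    ("plastic", "JJ"), ("case", "NN"),
]
print(extract_phrases(sentence))
# ['impressive life', 'cheap plastic', 'plastic case']
```

Each extracted phrase would then be assigned a polarity, for instance by comparing it with positive and negative reference words as in the corpus-based methods.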

More sophisticated models tackling the opinion mining problem employ neural networks. In the recent history of neural networks there have been many attempts to model sequential data such as language. A promising solution was provided by the development of the LSTM architecture [S. Hochreiter et al., 1997]. The main attribute of this model is that its main block (called the “memory cell”) is able to preserve state over long sequences, and also distinguishes inner actions from the outside world. A more sophisticated version of the LSTM is the Hybrid Bi-Directional LSTM introduced in [A. Graves, et al., 2013], which captures semantic representations in both directions. However, one of the disadvantages of the above methods is that they are difficult to train successfully over long sequences. In addition, simple recurrent neural networks tend to be biased towards the last elements of a sequence [T. Mikolov et al., 2011]. In our method [Agathangelou and Katakis, 2019] we alleviate the first issue by feeding the RNN with sentence embeddings rather than single word embeddings, which significantly reduces the number of sequence steps. Secondly, we alleviate the bias problem by adopting a Bi-directional LSTM structure, which exploits the forward and the backward semantic knowledge tree separately.

Multilayer Convolutional Neural Networks [Bengio et al., 1995] are feed-forward neural networks whose architectures are tailored for minimizing the sensitivity to translations, rotations, or distortions of an input image. Such networks have also shown remarkable performance on sequential data, for example in text classification [LeCun et al., 1995]. In [Agathangelou and Katakis, 2019], we have adopted a simplified version of the convolutional structure that was utilized in [Bengio et al., 1995], and we apply a max pooling operation over the convolution layers of the multifilter convolution model we propose. By adopting the above combination of processes in the convolutional layer, we aim at extracting a sentence embedding that will feed the recurrent steps of the proposed model.
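
The idea of turning a sequence of word embeddings into a single sentence embedding through convolution and max pooling can be sketched as follows. This is a generic multi-filter convolution with max-over-time pooling, written in plain Python for readability; it is not the model of [Agathangelou and Katakis, 2019], it omits bias terms and non-linearities, and the word vectors and filters are made-up values (in practice both would be learned).

```python
from typing import List

Vector = List[float]


def conv_max_pool(words: List[Vector], filt: List[Vector]) -> float:
    """Slide one filter over consecutive word vectors and keep the maximum response."""
    width = len(filt)
    if len(words) < width:
        raise ValueError("sentence is shorter than the filter width")
    responses = []
    for start in range(len(words) - width + 1):
        window = words[start:start + width]
        # Dot product between the window of word vectors and the filter weights.
        responses.append(sum(w * f
                             for word_vec, filt_vec in zip(window, filt)
                             for w, f in zip(word_vec, filt_vec)))
    return max(responses)  # max-over-time pooling


def sentence_embedding(words: List[Vector], filters: List[List[Vector]]) -> Vector:
    """One pooled value per filter; the result has a fixed length for any sentence."""
    return [conv_max_pool(words, filt) for filt in filters]


# Four words with 2-dimensional embeddings (illustrative values).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],               # filter spanning two words
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],   # filter spanning three words
]
print(sentence_embedding(words, filters))  # [2.0, 4.0]
```

Because the pooled vector has one entry per filter, sentences of different lengths map to embeddings of the same size, which is what allows a recurrent layer to consume one vector per sentence instead of one per word.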


2.3 Human-as-trust-sensor and reputation mechanism

Human-as-Trust-Sensor (HaTS) is a paradigm which leverages human sensing capabilities for the analysis and detection of misinformation. HaTS itself represents a paradigm of human-based information trustworthiness assessment and therefore does not directly map to any individual scientific or technological state-of-the-art. However, developing and deploying HaTS capabilities typically requires two facilities: 1) a way to collect and structure data for human analysis and assessment, and 2) an integrated and responsive facility on user platforms in which to exercise a HaTS role, by means of visualising information cascades and voting on a series of indicative information trustworthiness metrics which have been selected for review. Here, the Reputation Mechanism specifically corresponds to the information cascades that provide the data attributes for HaTS analysis to interpret and evaluate the reputation of the information present in the cascade. This evaluation primarily tries to answer the following questions: what is the source of an information cascade and, importantly, how might it have changed; and what metrics can be assessed to score the trustworthiness of the information cascade and its source.

In the table below we provide a high-level summary of the existing platforms and services which employ HaTS and Reputation Mechanism functionality to support the assessment of information trustworthiness. This summary evaluates the HaTS state-of-the-art by assessing whether the capabilities implement source verification, information cascades and trustworthiness scoring, whether the system requires expert curation (or is crowdsourced), and whether users are involved in the assessment of information trustworthiness.

Table 5: High-level summary of existing platforms

Existing platform/product | Type | Verified source | Information cascade | Scores trustw’ness | Users are involved | Expert curation
https://www.snopes.com | Website | no | no | no | no | yes
https://fullfact.org | Website | yes | no | no | no | yes
https://www.factcheck.org/ | Website | no | no | no | no | yes
http://blogs.channel4.com/factcheck | Website | no | no | no | no | yes
Social media fact checking method and system (1) | Patent | no | no | no | no | yes
Efficient fact checking method and system (2) | Patent | no | no | no | no | no
Truthnest (3) | Software | yes | no | yes | no | no
https://www.washingtonpost.com/news/fact-checker/ | Website | no | no | no | no | yes
https://africacheck.org | Website | no | no | yes | no | yes
http://piaui.folha.uol.com.br/lupa/ | Website | no | no | yes | no | yes
http://www.politifact.com/ | Website | no | no | yes | no | no
http://factscan.ca/scoring/ | Website | no | no | yes | no | no

1 https://patents.google.com/patent/US8458046B2/en
2 https://patents.google.com/patent/US8990234

Outside of platforms and services for HaTS, which combine source verification, information cascade, trustworthiness scoring and expert curation functionality, an interesting sub-field with direct application to detecting information cascades in social media has emerged in the detection of paraphrasing. Paraphrase detection aims to assess the similarity between a series of text sentences (e.g., two or more), which can be a key indicator of how information from one source (e.g., a social media post) has been used in another, related form. With respect to information cascades, paraphrase detection is a key enabler for automatically identifying the spread of related information across a large pool of data, which is usually a manual and challenging task that requires significant effort and time from a human user.

From a technical perspective, Paraphrase Identification (PI) is a useful Natural language processing (NLP) application, which is considered to be a subtask of Semantic Textual Analysis [Conneau and Kiela, 2018]. Its main objective is to detect whether a sentence has been expressed using other words, by identifying whether the semantics have been kept the same or the meaning has changed. One widely used application of PI is to discover duplicate questions in question and answer (QA) forums, which increases the efficiency of such forums by grouping together focused answers to users’ queries [Wang et al., 2017]. To this end, Quora has published a dataset containing over 400,000 question pairs, where each question pair is annotated with a binary value (i.e., 0 or 1) indicating whether the two questions are paraphrases of each other. Moreover, the Microsoft Research Paraphrase Corpus (MRPC) [Dolan et al., 2004] is composed of pairs of sentences which have been extracted from news sources on the Web. The sentence pairs have been annotated by humans according to whether they capture the same semantic content.

3 http://www.truthnest.com/


Figure 1: Samples from the Quora dataset PI is a well-studied task in NLP that is gaining more attention due to the QA forums [He et al., 2015; Wang et al., 2016]. Wang et al., (2016) follow the Siamese Network approach, which has been successfully applied to the face identification task (Chopra et al., 2005), to implement two baseline models: “Siamese-CNN” and “Siamese-LSTM”, with the latter achieving around 4% higher accuracy on the Quora dataset. Both are considered to be semi-supervised approaches that encode the two input sentences into sentence vectors with a neural network encoder, and make a decision based on the cosine similarity between the two sentence vectors. A most recent work of this research team [Wang et al., 2017] use a bilateral multi-perspective matching model based on a character-based Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber, 1997] at its input layer, a layer of Bi-directional LSTMs for computing context information, followed by four different types of multi-perspective matching layers, an additional bi-LSTM aggregation layer, and a two-layer fully connected network that outputs the predicted label. This network increased the state-of-the-art performance on the Quora dataset by around 5%. Similar performance is achieved by the decomposable attention model that utilizes four simple feedforward networks for self-attention and is based on the decomposable attention model of [Parikh et al., 2016] but relies on with character n-gram embeddings and is pretrained on automatically labeled noisy, but task-specific data. Finally, Subramanian et al., (2018) present a multi-task deep learning framework for learning general-purpose fixed-length sentence representations. This network is capable of encapsulating the inductive biases of several diverse training signals, such as Neural Machine Translation, Sentiment Analysis and Paraphrase Identification, into a single model. 
In their work, they demonstrate that the learned sentence representations yield competitive or superior results to previous general-purpose representation methods in most of the applied tasks, also achieving state-of-the-art performance (84.4% accuracy) on the MRPC dataset.

A number of outstanding challenges remain when employing paraphrasing techniques across a range of platform contexts, such as social media. For example, paraphrases are currently difficult to detect accurately in short text samples, such as those commonly found on Twitter, and in text samples with language irregularity and noise. Further, lexical-based matching techniques often fail to identify the similarity between sentences that use different, but synonymous, words to convey the same meaning. Another challenge for the semantic similarity research community is the need for more consistency in data set annotation, e.g. in the levels of annotation granularity. In fact, paraphrasing alone may prove insufficient for information cascade generation; existing methods of similarity comparison between images, URLs, hashtags and user mentions in social media post data therefore offer potential for further information cascade enrichment [Sawant et al., 2011; Jalili and Perc, 2017; Culotta, 2003; Javed and Byung, 2016; Neeraja and Prakash, 2016].
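The Siamese-style decision described in this section, in which both sentences pass through the same encoder and are then compared by cosine similarity, can be illustrated with a minimal Python sketch. The character n-gram counter below is only an assumed stand-in for the neural encoders of the cited works, and the function names and the threshold of 0.5 are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def encode(sentence: str, n: int = 3) -> Counter:
    """Stand-in encoder: character n-gram counts (NOT a neural network)."""
    text = f" {sentence.lower().strip()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(a: str, b: str, threshold: float = 0.5) -> bool:
    """Siamese-style decision: the same encoder for both inputs, then a threshold."""
    return cosine_similarity(encode(a), encode(b)) >= threshold
```

Because this stand-in encoder is purely lexical, it exhibits the limitation noted above: two sentences that use different but synonymous words share few n-grams and therefore score low, which is the gap that learned sentence representations aim to close.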


2.4 Cybersecurity and Privacy Framework

The objective of the cybersecurity and privacy framework is to provide EUNOMIA with functions supporting authentication, confidentiality and integrity of the data used in the different modules of the system, wherever this functionality is not already available. Additionally, it will implement the cryptographic protocol used for voting, in accordance with the defined privacy and security requirements.

The concepts of access control to data and of formal security models for computer systems, services and applications, implementing segregation and user roles, date back to the first multi-user systems in the 1960s [Dennis, 1965]. Many of today’s software platforms that assure a high level of protection for sensitive data impose fixed and rigid constraints on how sensitive information is used, which usually also restricts the openness of the platform to third-party entities. That leaves developers and users with an unfortunate trade-off between convenience, added value and privacy. In traditional access control approaches, such as Role Based Access Control (RBAC) [Sandhu, 1998], or in more recent ones, such as Attribute Based Access Control (ABAC) [Li, Mitchell and Winsborough, 2002], a set of rules expressing the allowed operations over data is compiled into policy descriptions, used in the scope of a policy-based management framework [Han and Lei, 2012; Verma, 2002] and enforced at specific trusted Policy Enforcement Points (PEP). Most modern applications are either Web based or access data through some type of Web service, for which similar enforcement approaches based on ABAC have been proposed [Yuan and Tong, 2005]. A different approach, W5 [Krohn, Yip, Brodsky, Morris and Walfish, 2007], proposes a separation of user data and policies from the application logic. This architecture uses information flow control (IFC) and a well-defined security perimeter to address the issues commonly occurring with current architectures.
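As an illustration of attribute-based rules enforced at a policy enforcement point, the following Python sketch evaluates a small rule set with a default-deny decision. It is a simplified sketch and not the EUNOMIA design: the class names, the attribute names (id, role, owner, visibility) and the example policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """One policy rule: an action plus a predicate over subject and resource attributes."""
    action: str
    condition: Callable[[Attributes, Attributes], bool]


class PolicyEnforcementPoint:
    """Hypothetical PEP: a request is permitted only if some rule allows it."""

    def __init__(self, rules: List[Rule]) -> None:
        self._rules = rules

    def is_allowed(self, subject: Attributes, resource: Attributes, action: str) -> bool:
        # Default deny: with no matching rule, any() returns False.
        return any(
            rule.action == action and rule.condition(subject, resource)
            for rule in self._rules
        )


def is_public(subject: Attributes, resource: Attributes) -> bool:
    return resource.get("visibility") == "public"


def is_owner(subject: Attributes, resource: Attributes) -> bool:
    return "id" in subject and subject["id"] == resource.get("owner")


def is_admin(subject: Attributes, resource: Attributes) -> bool:
    return subject.get("role") == "admin"


# Illustrative policy only; these rules are assumptions, not project requirements.
pep = PolicyEnforcementPoint([
    Rule("read", is_public),
    Rule("read", is_owner),
    Rule("delete", is_admin),
])
```

In such a design every access decision depends on the single trusted component `pep`, so the protection of the data is only as strong as that component.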
These control mechanisms, mostly based on trusted policy enforcement points, are an awkward fit for modern frameworks that incorporate third-party sub-systems, where data is often distributed among mobile devices and different, not equally trustworthy entities, but which still must protect user data from inappropriate access, modification or sharing. Additionally, defining trusted components in a system creates a strong dependency on the actual security of those components: if they fail, the security of the whole system fails. Consequently, a more decentralized and cryptographically enforced security model would be better suited to address current challenges.

Attempts to build systems that resort to searchable encryption to protect user data on servers, while still allowing some server-side functionality, are not new [Popa et al., 2014; Fuller et al., 2017] and still present considerable design and implementation challenges [Grubbs, McPherson, Naveed, Ristenpart and Shmatikov, 2016]. Attribute based encryption (ABE) has been widely used for preserving privacy in cloud-based services [Narayan et al., 2010]. Several proposals for implementing self-destructing data, such as Vanish [Geambasu et al., 2009], have been presented, although with some limitations [Wolchok et al., 2010]. Indeed, the management of encrypted data and the revocation of access and keys are relevant topics that have been raising concerns, notably for cloud-based services [Wang et al., 2016]. Homomorphic encryption gives room for revocation or update of keys without revealing either private keys or plain data: encrypted data is re-encrypted so that private keys are renewed and plain data is not revealed. Coupling this with Attribute Based Encryption [Bethencourt et al., 2007] ensures that up-to-date data can be accessed and/or edited only by legitimate third parties, according to the policy in place [Wang et al., 2016].

The security of a ranking system is paramount to its trustworthiness and credibility, especially in relation to ensuring privacy, confidentiality, anonymity, authentication and integrity in the voting system [Delaune and Hirschi, 2017]: privacy ensures that a user’s vote is not revealed to anyone else; confidentiality ensures that the secret token/nonce that identifies the user is not made public; anonymity stands for the absence of any attribute identifying the user in the exchanged messages; authentication ensures that a user is who they claim to be; integrity ensures that the data has not been forged or lost. These properties are usually ensured by the implementation of suitable cryptographic protocols and functions. All of them have been widely researched and used in blockchain technology [Zheng et al., 2016]. One of the reasons that blockchain technology is anonymous and confidential is that it enables the creation of a different address for every single transaction [Swan, 2015]. The single-use token is fundamental to ensuring that transactions cannot be linked and traced back to a given user. Transactions are authenticated by an address, and ownership of that address is the user's responsibility through possession of the private keys that give access to it. The latter is only possible through the use of asymmetric cryptography [Sharma, 2018]. The trustworthiness of systems providing voting services should also be carefully evaluated, namely by considering the use of protocols [Neff, 2001] that maintain the privacy of voters even on untrusted platforms.
All these elements may provide a framework ensuring authentication, confidentiality and anonymity. GDPR has highlighted the need for data minimization in any exchange of sensitive data. This requires that cryptographic protocols enforce data minimization from the design phase.
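The role of a single-use token in a vote, as discussed above, can be illustrated with a toy Python sketch. It is not the EUNOMIA voting protocol and not a secure voting scheme (for example, it does not prevent the token issuer from remembering who received which token). The `BallotBox` class and its methods are hypothetical names, and the sketch only shows that a vote message carries a secret token and a choice but no user identifier, and that each token is accepted once.

```python
import hashlib
import secrets
from typing import Dict, Set


class BallotBox:
    """Toy ballot box: accepts each issued token at most once."""

    def __init__(self) -> None:
        self._unused: Set[str] = set()  # SHA-256 digests of tokens not yet used
        self.tally: Dict[str, int] = {}

    def issue_token(self) -> str:
        """Return a fresh random token; only its digest is retained."""
        token = secrets.token_hex(16)
        self._unused.add(hashlib.sha256(token.encode()).hexdigest())
        return token

    def cast(self, token: str, choice: str) -> bool:
        """Count the vote if the token is known and unused; no user id is involved."""
        digest = hashlib.sha256(token.encode()).hexdigest()
        if digest not in self._unused:
            return False  # forged or already-used token
        self._unused.remove(digest)
        self.tally[choice] = self.tally.get(choice, 0) + 1
        return True
```

Keeping only the digest of each token is a small instance of data minimization: the ballot box stores what it needs to reject reuse and nothing that identifies the voter.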

2.5 Digital Companion

The digital companion is the actual interface between the end user and the rest of the EUNOMIA platform elements. It is to be deployed on both desktop and mobile devices, allowing the active involvement of social media users via a Human-as-a-Trust-Sensor (HaTS) logic (“trust” / “don’t trust” button). The user will be able to link existing decentralized social media accounts with EUNOMIA and view the results of a post’s sentiment analysis, as well as other post-related information indicators (e.g., the votes of other users about the post).

In the scope of reputation scoring, Web Of Trust4 is a tool that provides safety and security ratings for visited websites and search engine results. It is mostly based on user ratings, also utilizing some trusted third-party sources, such as phishing directories, and displays reputation icons next to search results, social media and other popular sites to help users make informed decisions online. Moreover, Microsoft provides a plugin called NewsGuard5 for its Edge web browser that uses “Green-Red” ratings to signal whether a website is trying to get it right or instead has a hidden agenda or knowingly publishes falsehoods or propaganda, giving readers more context about their news online. To this end, NewsGuard relies on trained analysts, who are experienced journalists, to research online news brands and check the validity of the produced news.

4 https://www.mywot.com 5 https://www.microsoft.com/en-us/p/newsguard/9nwp4lmmkfkt

Figure 2: Web of Trust and NewsGuard Browser extensions

Apart from the aforementioned HaTS client applications, Media Bias/Fact Check6 is a large media bias resource that also provides Firefox and Chrome extensions, which display a color-coded icon denoting the bias of the page currently being viewed, according to its analysis results. Existing platforms like Twitter are also working towards the use of crowdsourcing tools, as part of the company’s “battle against rampant abuse on its platform” [Dwoskin, 2017]. Furthermore, WhatsApp provides a tip-line to which one can send forwards, rumors and suspicious-sounding messages to have them verified [Ghoshal, 2019].

Regarding text analysis and fake news detection, TextBox (Figure 3) and FakeBox are two tools by MachineBox7 that process text, perform natural language processing, sentiment analysis, and entity and keyword extraction, and try to assess whether news articles are likely to be real news or not. Users can interact with these services through a web browser, providing the content they want analyzed and viewing the results.

6 https://mediabiasfactcheck.com 7 https://machinebox.io


Figure 3: Textbox screenshot

Regarding the technological tools used for developing end-user clients, a common decision to be made is the approach to take for multiplatform (or, as more commonly termed, cross-platform) delivery. When a mobile application is to be deployed on more than one mobile operating system (i.e., Android or Apple’s iOS), a common practice is to use one of the several cross-platform frameworks. Following the ever-increasing adoption of web development technologies for cross-platform application development, existing frameworks evolve and new ones are released. Biorn-Hansen et al. [Biorn-Hansen et al., 2018] provide an extensive overview of the trending approaches to cross-platform mobile development and of the frameworks that make use of them. They categorize the frameworks based on five approaches: hybrid, interpreted, cross-compiled, model-driven and progressive web apps.

Table 6: Cross-platform mobile development frameworks

Hybrid | Interpreted | Cross-compiled | Model-Driven | Progressive Web Apps
Cordova | React Native | Xamarin | MD | Ionic Framework
Capacitor | NativeScript | Flutter | MobML | Zuix
PhoneGap | Tabris.js | Codename One | Applause | Mithril
Ionic Framework | Fusetools | Xojo Mobile | Mobl | Polymer
Onsen UI | Mosync | MonoCross | Mendix | Svelte
Framework7 | Titanium Appcelerator | Corona | Appian | Preact
Quasar Framework | Adobe AIR | Apportable | MAML | Vue.js
AppGyver | Smartface Cloud | Qt Mobile | Mobia Modeler | Angular
Quasar Framework | Weex | MoSync | Xmob | React.js
Sencha Touch | Kony | RAD Studio (Delphi) | AXIOM | Stencil.js
Intel App Framework | Jasonette | Crosslight | MOPPET | Glimmer.js
Intel XDK | Lua View | RoboVM | mdsl | Ember.js
RhoMobile | | Marmalade | Automobile | viperHTML
Kony | | Rhodes | WebRatio | Moon.js
EvoThings | | DragonRAD | XIS-Mobile |
NSB/AppStudio | | | MobDSL |
Cocoon | | | |
Trigger.io | | | |

In a related survey questionnaire [Biorn-Hansen et al., 2019] about cross-platform development for mobile devices, PhoneGap8 and React Native9 are the two frameworks that rank highest in terms of framework familiarity, interest and usage.

8 https://phonegap.com 9 https://facebook.github.io/react-native


The hybrid approach works by initializing a native application that includes a WebView component, i.e. an embeddable web browser. Each framework includes an API (Application Programming Interface) for two-way communication between the WebView and native code.

Instead of rendering a bundled web page, the interpreted approach makes use of on-device JavaScript interpreters in order to render native user interface components to the screen. As in the hybrid approach, communication between the JavaScript layer and the native code layer is achieved through modules and plugins that provide bridging APIs.

In the cross-compiled approach, code is compiled to native byte code, so the use of a WebView (as a layer) is not necessary. Native device features are instead provided via the framework’s Software Development Kit (SDK), which in turn maps the functionality to the underlying platform’s SDK. As in the interpreted approach, many frameworks provide communication channels to the latter SDK, allowing developers to include native code when necessary.

In the model-driven approach, as in the Model-Driven Development (MDD) paradigm from which it derives, the development of an application requires knowledge of the specific framework’s domain-specific language (DSL), rather than the language of the device’s operating system (i.e. Java/Kotlin for Android and Objective-C/Swift for iOS).

The progressive web apps approach has recently been gaining popularity and, like the hybrid approach, makes use of HTML, CSS and JavaScript. An offline-first approach is used, with the necessary web assets downloaded locally to the device and with Service Workers, i.e. scripts written in JavaScript that are able to execute in the background, handling the application’s lifecycle, business logic, data synchronization and notifications.
As Table 6 indicates, JavaScript is the most widely used programming language when it comes to cross-platform mobile programming. It is also a common choice for cross-platform desktop applications. Electron.js10 and NW.js11 are the most commonly used frameworks that allow the development of desktop applications with JavaScript and HTML, along with Node.js integration that grants web pages access to the low-level operating system. Although this approach has the disadvantage of greater memory usage and file size than other cross-platform desktop development approaches, such as using C++ or Python with Qt for the user interface, it is in many cases preferred because applications are faster to deploy and easier to update. The Electron framework is the one used in existing desktop clients for the decentralized social network Mastodon. For example, both Whalebird12 and Mstdn13 make use of Electron as an operating system bridge and are available for the three most common desktop operating systems (i.e., Windows, Mac OS X and Linux).

10 https://electronjs.org 11 https://nwjs.io 12 https://github.com/h3poteto/whalebird-desktop 13 https://github.com/rhysd/Mstdn


2.6 Peer-to-peer Infrastructure

Peer-to-Peer (P2P) technology is a distributed application architecture that partitions tasks, workloads and information among peers. The concept had previously been implemented in other forms, such as the DNS system, whose distributed database is a good example of data partitioning between a set of nodes. However, it only attracted real interest from the market with the emergence of services such as Napster, Gnutella and Kazaa, which brought P2P technology to the masses in the form of file sharing. Since then, several protocols have emerged based on the concept of a P2P network extended as an overlay over the IP network [Malatras, 2015].

Chord [Stoica et al., 2003] is one of the first and most popular protocols for structured P2P overlays. It addresses the issue of locating the nodes in a P2P overlay that hold specific data items, i.e. keys, by using distributed hash tables (DHTs). In particular, Chord utilizes consistent hashing [Karger et al., 1997] for the assignment of keys to nodes in order to balance the related workload, since it leads to nodes holding an approximately equal number of keys. Moreover, Chord is resilient to churn and is fully decentralized and distributed.

The main design goals of the distributed and decentralized structured content addressable network (CAN) overlay [Ratnasamy et al., 2001] include self-organization, fault tolerance and scalability. It is also based on the DHT concept, but instead of organizing the overlay nodes in a ring like Chord, it considers them as points in a d-dimensional coordinate space, where d is a parameter of the CAN protocol. Every node of the CAN P2P overlay is in charge of a particular area of the d-dimensional coordinate space, which is distinct from and does not overlap with that of other nodes.
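The ring-based key assignment that Chord builds on consistent hashing can be sketched in a few lines of Python. This is a centralised illustration of the assignment rule only (each key goes to the first node whose identifier is equal to or follows the key's identifier on the circle); it omits Chord's distributed lookup and finger tables. The `Ring` class, the node names and the 32-bit identifier space are assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple


def ring_id(name: str, m: int = 32) -> int:
    """Hash a name onto an identifier circle of size 2**m using SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class Ring:
    """Identifier circle mapping each key to its successor node."""

    def __init__(self, nodes: List[str], m: int = 32) -> None:
        if not nodes:
            raise ValueError("a ring needs at least one node")
        self.m = m
        self._ids: List[Tuple[int, str]] = sorted((ring_id(n, m), n) for n in nodes)

    def successor(self, key: str) -> str:
        """Return the first node at or after the key's identifier, wrapping around."""
        k = ring_id(key, self.m)
        idx = bisect_left(self._ids, (k, ""))
        return self._ids[idx % len(self._ids)][1]
```

A useful property of this scheme is that adding a node only moves keys onto the new node: every other key keeps its previous owner, which limits the data that has to be transferred when peers join.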
With many similarities to Chord, the Pastry [Rowstron and Druschel, 2001] P2P overlay is a structured, self-organized and fully decentralized overlay aiming at scalable resource discovery and routing, while at the same time taking network locality into account to reduce the average path length.

Kademlia [Maymounkov and Mazieres, 2002] is a decentralized structured P2P overlay that utilizes consistent hashing to map identifiers to keys and nodes, and exploits an XOR-based metric to compute the distance, i.e. the closeness, between identifiers. It was designed based on observations regarding the operation of existing P2P overlays.

The SkipNet overlay network [Harvey and Munro, 2003] is a distributed, decentralized structured P2P overlay that draws inspiration from the Skip Lists data structure [Pugh, 1990], similar to the work discussed in [Aspnes and Shah, 2007]. Unlike other DHT-based approaches, SkipNet allows for controlled data placement.

Freenet [Freenet, 2018] is a distributed, unstructured P2P overlay network that was built to allow anonymous access to and handling of data (publication, replication and retrieval) over wide network infrastructures. The main design goal of protecting the privacy of data publishers is what drove the decision towards a completely decentralized and distributed approach, with no single point of failure.

BitTorrent [Cohen, 2003] is one of the most popular P2P file-sharing systems available nowadays; as of 2013, it had 15 to 27 million concurrent users worldwide [Wang and Kangasharju, 2013]. The BitTorrent protocol specification 3 [Bittorrent, 2019] and [Cohen, 2003] do not classify BitTorrent as an unstructured P2P overlay, but the connections between peers and their adaptive management do indeed create a P2P overlay network topology. Since the latter is not built on any strict rules, but is rather subject to random and uncoordinated node activities, it is classified as an unstructured one.

Unstructured Multisource Multicast (UMM) [Ripeanu et al., 2010] is an unstructured P2P overlay that aims at addressing the problem of group communications by means of multicasting. UMM is fully distributed and is targeted at dynamic environments.

Several solutions exist to support these concepts of P2P storage, such as Resilio, Storj, the BitTorrent tracker and clients, las2peer, Barrel, Scuttlebot and TomP2P. Resilio Connect [Resilio, 2018] is a commercial product that provides a scalable P2P solution for moving and syncing data, while enabling data to be shared across several peers. Storj [Storj, 2018] is also a commercial product; it delivers object storage to its customers using blockchain technology and can be used to store raw information or structured information such as transactions. Existing since 2003, several open-source projects provide all the means and software needed to set up a BitTorrent network: the tracker [bittorrent-tracker, 2014] as well as several clients, such as qBittorrent [Qbittorrent, 2018] and Transmission [Transmission, 2018].

las2peer [Klamma et al., 2016] is a Java-based open-source framework for distributing community services in a peer-to-peer infrastructure. It is being developed by the ACIS Group, which is part of the Chair of Computer Science 5 at RWTH Aachen University. Its main goals are easy development and deployment of services. It provides distributed data storage and communication encryption, and also allows RESTful APIs to be written for integration and interaction with external systems.
Barrel [Barrel, 2018] is a modern document-oriented database written in Erlang, with open-source licensing, focusing on data locality (put/match the data next to you) and P2P. It allows master-master replication and provides configurable redundancy, making sure that even when hardware goes offline it is still possible to exchange data with other devices on the platform. It provides an HTTP API for integration with external applications.

Scuttlebot [Scuttlebot, 2018] is an open-source P2P log store. It can be used as a database and identity provider, and also provides a messaging system. It features global replication, file synchronization and end-to-end encryption. Scuttlebot forms a global cryptographic social network with its peers. Each user is identified by a public key and publishes a log of signed messages, which other users can follow socially. Scuttlebot searches the P2P mesh for new messages and files from followed users and from FoaFs (friends of friends). The messages and files are stored locally, indefinitely, for applications to read. It provides an API for integration with external applications.

TomP2P [TomP2P, 2018] is a DHT with added features, such as storing multiple values for a key. Each peer has a table, which can be disk-based or memory-based, to store its values. A single value can be queried and updated with a secondary key. The underlying communication framework uses Java NIO to handle many concurrent connections. The biggest difference from the previous solution is that it is only possible to read and write raw key values, rather than using messaging and other extended features. TomP2P offers indirect and direct replication mechanisms: in direct replication, peers constantly publish their own content, whereas in indirect replication, peers publish content on behalf of others.
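The idea of storing multiple values under one key and addressing a single value through a secondary key, as described for TomP2P, can be sketched as a nested mapping. The Python class below models only the local, memory-based table of one peer. It does not use or reproduce the TomP2P API (a Java library), and the names `PeerTable`, `put`, `get` and `get_all` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


class PeerTable:
    """In-memory table of one peer: primary key -> {secondary key -> value}."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def put(self, key: str, secondary: str, value: bytes) -> None:
        """Store or update one value without touching the others under the same key."""
        self._data[key][secondary] = value

    def get(self, key: str, secondary: str) -> Optional[bytes]:
        """Return a single value, or None if it is not stored here."""
        return self._data.get(key, {}).get(secondary)

    def get_all(self, key: str) -> Dict[str, bytes]:
        """Return a copy of every value stored under the primary key."""
        return dict(self._data.get(key, {}))
```

For example, the votes on one post could share a primary key derived from the post, with one secondary key per vote, so that a single vote can be updated without rewriting the whole set.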


A large number of distributed file system based P2P technologies exist [Hasan et al., 2005], given that a file system interface is the most adequate interface for general applications. Examples include P2PFS, XtreemFS, Tahoe-LAFS, IPFS and Ivy.

P2PFS is a file system implemented on top of FUSE using the P2P Kademlia protocol [Campos, 2013]. P2PFS is supported by TomP2P, a P2P library and Kademlia DHT implementation that provides a decentralized key-value infrastructure for distributed applications.

XtreemFS [XtreemFS, 2018] is a general-purpose storage system that covers most storage needs in a single deployment. It is open-source and requires no special hardware or kernel modules. It can be mounted as a file system on several operating systems, such as Linux, Windows and OS X.

Similar to Freenet, Tahoe-LAFS is a free and open decentralized cloud storage system [Tahoe-LAFS, 2018]. It distributes data across multiple servers. Even if some of the servers fail or are taken over by an attacker, the entire file store continues to function correctly, preserving the user's privacy and security. Like P2PFS, it provides a transparent file system interface, allowing regular files to be accessed by regular means, in a way that is completely transparent to the application using it.

The InterPlanetary File System (IPFS) [Labs, 2018] is a distributed file system that seeks to connect all computing devices with the same system of files, similar to a single BitTorrent swarm exchanging Git objects. At its core, IPFS is a versioned file system that can take files, manage them, store them somewhere and then track their versions over time. IPFS also accounts for how those files move across the network, so it is also a distributed file system. One of the particularities of this P2P technology is that content is identified not by an address such as a Uniform Resource Locator (URL), but by the content itself. This is called content-addressing.
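Content-addressing can be illustrated with a short Python sketch in which the address of a block is simply a hash of its bytes. IPFS itself uses its own content identifier format; the plain SHA-256 hex digest and the toy `Peer` class below are assumptions made to keep the example self-contained.

```python
import hashlib
from typing import Dict


def content_address(data: bytes) -> str:
    """Derive the address from the bytes themselves (plain SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Toy peer holding blocks indexed by their content address."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        """Store a block and return the address under which it can be requested."""
        address = content_address(data)
        self.blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Return a block after checking that it matches the requested address."""
        data = self.blocks[address]
        if content_address(data) != address:
            raise ValueError("block does not match its address")
        return data
```

Two peers that add the same bytes obtain the same address, so a requester can fetch the block from either of them and verify it locally, without having to trust the peer that served it.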
This content address never changes for the same content, and another interesting property is that the address is independent of the location where the content is stored. When using IPFS, a given piece of content is not just downloaded from a given peer to a given computer; that computer also becomes part of the global peer network and is responsible for distributing the content to other peers. So all the content is effectively managed by all the peers belonging to the network. IPFS supports all kinds of content, including regular files, database records and web pages. This makes it possible to download a file from many locations that are not managed or stored by a single organization, but by independent peers. This also implies that IPFS is participatory and collaborative: if nobody using IPFS makes the content identified by a given address available for others to access, it will not be possible to get it. On the other hand, content cannot be removed from IPFS as long as someone is interested enough to make it available, whether that person is the original author or not.

Ivy [Muthitacharoen et al., 2002] is a multi-user read/write peer-to-peer file system based on Chord. It is suitable for small cooperative groups spread over large geographic areas. Ivy allows such groups to avoid the reliability and trust problems inherent in the use of a central file server. An Ivy file system consists solely of a set of logs, one log per participant. Ivy stores its logs in a distributed hash table; each participant finds data by consulting all logs, but performs modifications by appending only to its own log. This arrangement allows Ivy to maintain metadata consistency without locking. Ivy users can choose which other logs to trust, an appropriate arrangement in a semi-open peer-to-peer system. It supports replication mechanisms that allow the same block to be stored on different peers.


Much work is currently ongoing to bring together blockchain and P2P technologies, such as the work presented in [Li, Wu and Chen, 2018], where a blockchain is used to provide integrity for the data and metadata stored in a P2P network. DHTs also continue to be improved, as can be observed in [Hassanzadeh-Nazarabadi et al., 2018], where a DHT is extended with replication mechanisms, and in the development of social networks on top of DHTs [Franchi et al., 2019]. P2P networks continue to be researched, receiving considerable scientific attention, with a multitude of possible applications.

3. TECHNICAL REQUIREMENTS IDENTIFICATION

This section presents the requirements of the EUNOMIA platform. The actors, functional requirements and non-functional requirements are described in detail in sections 3.1, 3.2 and 3.3 respectively.

3.1 Identified Actors

As end-user types, we have considered content consumers and content creators. We have also identified the actors directly linked with each technological component (see Table 7). Each actor type can be a physical entity or a software agent (service).

Table 7: Description of actors

# | Actors | Description of actors

Blockchain Infrastructure
1 | Clients | A client is the End-User.
2 | Peers | A Peer is a network entity that maintains a ledger.
3 | Ordering Service | A defined collective of nodes that orders transactions into a block. The ordering service exists independent of the peer processes and orders transactions on a first-come-first-serve basis for all channels on the network.
4 | Membership Service Provider | Membership Services authenticates, authorizes, and manages identities on a permissioned blockchain network. The membership services code that runs in peers and orderers both authenticates and authorizes blockchain operations.
5 | Smart Contract | A smart contract is a piece of code installed in a peer node.
6 | Channels | A channel is a private blockchain overlay which allows for data isolation and confidentiality.
7 | State Database | Current state data is stored in a state database for efficient reads and queries.

Content and Context Analysis
8 | Knowledge database | Is a DB system that stores all necessary sources for the sentiment analysis task.
9 | Service Provider | Is a mechanism that undertakes to connect and transact client services.

HaTS and Reputation Mechanism
10 | Information Cascade Service | Is the service that performs paraphrase detection and other post attribute similarity analysis (e.g., image, URL, hashtag, @mention comparison) for creating information cascades.
11 | HaTS feature extraction service | Is the service which extracts key post and author (where the user has provided explicit permission) metrics as meta-data for posts uploaded to the EUNOMIA Services Node. These are later used as queryable and adjustable data filters in the information cascade trustworthiness metrics user visualisation on the Digital Companion.
12 | EUNOMIA Services Node data exchange API | Is the service that transports and synchronises the distributed information cascade knowledge and HaTS datasets and data models (e.g., paraphrase, sentiment analysis) between EUNOMIA Services Nodes.
13 | Blockchain Client | The client which provides a means to query and upload information cascade identifiers stored on the EUNOMIA blockchain.
14 | Sentiment Analysis Service | Is the service that performs sentiment analysis on posts as they are uploaded for processing in the EUNOMIA Services Node.
15 | Trustworthiness Metric Visualisation Service | Is the service which runs on the Digital Companion and dynamically visualises (based on user configuration) HaTS features which have been extracted from posts in information cascades queried from the HaTS service on the DC server.

Cybersecurity and Privacy Framework
16 | ESN Administrator | Is the person responsible for managing a node of the EUNOMIA platform.
17 | Device | Is the device being used to run the Digital Companion application and to interact with the EUNOMIA platform.
18 | Service | Is a service running on the EUNOMIA platform that will provide some functionality to the Users, e.g. text analysis.

Digital Companion
19 | DC CLIENT | The end-user application.
20 | DC SERVER | The interface between the end-user and EUNOMIA.
21 | API CONSUMER | A service running on the DC CLIENT for communication with EUNOMIA and DSNs.
22 | LOCAL STORAGE | Storage on the end-user’s device.
23 | HaTS | Service running on the DC SERVER.
24 | P2P | Service running on the DC SERVER.
25 | SNP | Service running on the DC SERVER.

Peer-to-Peer Infrastructure
26 | Node | Node of the peer-to-peer infrastructure.

3.2 Functional Requirements

Functional requirements define the required behaviour of the system to be built, as reported by a hypothetical observer envisioning the inputs that the future system will accept and the outputs it will produce in response to those inputs; i.e., they define the capabilities that a product must provide to its users. Functional requirements are based on system objectives and respond to the critical task of ensuring the correct implementation of the expected functionality in the final software.

In order to prioritize the Functional requirements, we will use the MoSCoW method, which is a prioritization technique used (among other areas) in software development to reach a common understanding with stakeholders on the importance of the delivery of each requirement14. The categories are as shown in Table 8:

14 https://en.wikipedia.org/wiki/MoSCoW_method


Table 8: MoSCoW method

MoSCoW Description

Must have Requirements labeled as “Must have” are critical to the current delivery timebox in order for it to be a success. If even one Must have requirement is not included, the project delivery should be considered a failure.

Should have Requirements labeled as “Should have” are important but not necessary for delivery in the current deliverable. While Should have requirements can be as important as Must have, they are often not as time-critical or there may be another way to satisfy the requirement so that it can be held back until a future version of the deliverable.

Could have Requirements labeled as “Could have” are desirable but not necessary and could improve the user experience or customer satisfaction for a little development cost. These will typically be included if time and resources permit.

Won't have (this time) Requirements labeled as “Won't have” have been agreed by the participants as the least-critical, or not appropriate at that time.
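The MoSCoW categories can be treated as an ordered scale when planning a delivery. The following is a minimal Python sketch, not part of the EUNOMIA implementation: the class and function names (`MoSCoW`, `Requirement`, `delivery_order`, `blocks_delivery`) are hypothetical, and the three sample rows are taken from the requirement tables in this section.

```python
from dataclasses import dataclass
from enum import IntEnum


class MoSCoW(IntEnum):
    """MoSCoW categories, ordered from most to least critical."""
    MUST_HAVE = 0
    SHOULD_HAVE = 1
    COULD_HAVE = 2
    WONT_HAVE = 3


@dataclass(frozen=True)
class Requirement:
    req_id: str
    description: str
    priority: MoSCoW


def delivery_order(requirements):
    """Sort by priority; sorted() is stable, so ties keep their input order."""
    return sorted(requirements, key=lambda r: r.priority)


def blocks_delivery(requirements, delivered_ids):
    """Return the Must have requirements missing from a delivery.

    Per Table 8, a single missing Must have requirement means the
    delivery should be considered a failure.
    """
    return [r for r in requirements
            if r.priority is MoSCoW.MUST_HAVE and r.req_id not in delivered_ids]


backlog = [
    Requirement("FR-50", "DC CLIENT could be able to be used on smartwatches.",
                MoSCoW.COULD_HAVE),
    Requirement("FR-1", "End-User must be able to create an account.",
                MoSCoW.MUST_HAVE),
    Requirement("FR-33", "After casting a vote, a USER should be able to withdraw it.",
                MoSCoW.SHOULD_HAVE),
]
```

Calling `delivery_order(backlog)` lists FR-1 first, and `blocks_delivery(backlog, {"FR-33"})` reports FR-1 as the requirement that makes such a delivery incomplete.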

Table 9 presents the identified functional requirements of the EUNOMIA end-user actor prioritized using the MoSCoW methodology:

Table 9: Generic Functional Requirements

ID # Description (Detailed description of the requirement) MoSCoW

End-User

FR-1 End-User must be able to create an account. Must have

FR-2 End-User must be able to authenticate. Must have

FR-3 End-User must be able to view the account page. Must have

FR-4 End-User must be able to revoke the account. Must have

FR-5 End-User must be able to maintain the account. Must have


FR-6 End-User must be able to vote on content trustworthiness on EUNOMIA users’ posts. Must have

FR-7 End-User should not be able to vote on their own post. Must have

FR-8 End-User must be able to view trustworthiness indicators of EUNOMIA users’ posts. Must have
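Requirements FR-6 to FR-8 can be illustrated with a small in-memory sketch. It is an illustration under stated assumptions rather than the EUNOMIA design: the binary trustworthy/untrustworthy vote, the rule that a repeated vote replaces the earlier one, and the names `VoteRegistry` and `VoteError` are all hypothetical.

```python
class VoteError(Exception):
    """Raised when a vote violates one of the rules sketched here."""


class VoteRegistry:
    def __init__(self):
        # Maps (post_id, voter_id) -> bool, where True means "trustworthy".
        self._votes = {}

    def cast(self, post_id, post_author_id, voter_id, trustworthy):
        """Record a vote on a post (FR-6), rejecting votes on own posts (FR-7)."""
        if voter_id == post_author_id:
            raise VoteError("users cannot vote on their own post (FR-7)")
        # Assumption: a second vote by the same user replaces the first one.
        self._votes[(post_id, voter_id)] = bool(trustworthy)

    def indicator(self, post_id):
        """Return (trust_votes, distrust_votes) for a post (FR-8)."""
        votes = [v for (pid, _), v in self._votes.items() if pid == post_id]
        trust = sum(votes)
        return trust, len(votes) - trust
```

Keying the votes by `(post_id, voter_id)` keeps at most one vote per user and post, which also makes a later withdrawal a simple key deletion.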

Table 10 presents the identified functional requirements of the EUNOMIA component actors prioritized using the MoSCoW methodology, derived based on the user requirements (D2.4) and technical feasibility as assessed by all partners jointly in a dedicated technical workshop (30 May 2019).

Table 10: Functional Requirements per component

ID # Description (Detailed description of the requirement) MoSCoW

Blockchain Infrastructure (BI)

FR-9 BI must be able to accept requests for account creation. Must have

FR-10 BI must be able to accept requests for permissions. Must have

FR-11 BI must be able to accept requests for data stored in the ledger. Must have

FR-12 BI must be able to accept requests for revocation. Must have

FR-13 BI must be able to apply revocation operations. Must have

FR-14 BI must be able to confirm revocation. Must have

FR-15 BI must be able to create Blockchain-based signatures (proof of ownership data). Must have

FR-16 BI must be able to be used for verification purposes. Must have

FR-17 BI must be able to cryptographically link the information cascade. Must have

FR-18 BI must be able to have metadata (indicators) associated with the post. Must have

FR-19 BI should be able to communicate with the Off-Blockchain temporary storage. Must have

FR-20 BI must be able to maintain records for each blockchain transaction. Must have

FR-21 BI must be able to check if a record exists in the blockchain. Must have

Content and Context Analysis (CA)


FR-22 CA must be able to assign a sentiment score to social media posts. Sentiment score can correspond to a probability. Must have

FR-23 The CA must be able to process text from images. Could have

FR-24 CA must be able to detect opinion words. Could have

HaTS and Reputation mechanism (HRM)

FR-25 HRM must be able to create information cascades from post text content. Must have

FR-26 HRM must be able to collect trustworthiness indicators for information cascades from post and user attributes. Must have

FR-27 HRM must be able to create information cascades from post data metrics (image, URL, hashtag, @mention). Could have

FR-28 HRM must be able to query blockchain and information cascade data via the EUNOMIA Services Node. Must have

FR-29 HRM must be able to extract user behaviour and account data. Must have

FR-30 HRM must provide a post trustworthiness voting mechanism. Must have

FR-31 HRM must provide descriptive statistics for trustworthiness metrics in information cascades. Could have

Cybersecurity and Privacy Framework

FR-32 An ESN administrator shouldn’t be able to change a user’s data on the blockchain. Should have

FR-33 After casting a vote, a USER should be able to withdraw it. Should have

FR-34 A USER must be able to authenticate into EUNOMIA with their email and chosen password. Must have

FR-35 A USER must be able to authenticate into EUNOMIA with an account from another platform (OAuth-based Distributed Social Media account). Must have

FR-36 EUNOMIA should only allow an authenticated USER/DEVICE to be used to associate devices with, or remove devices from, the account of the USER. Should have

FR-37 Only an authenticated USER must have access to the EUNOMIA platform and to any of its user services. Must have


FR-38 EUNOMIA won’t always show the USER (identity) who first classified/registered/inserted a new post into the EUNOMIA platform. Must have

FR-39 EUNOMIA must allow the USER to decide if EUNOMIA should show or hide the USER (identity) who first classified/registered/inserted a new post into the EUNOMIA platform. Must have

FR-40 EUNOMIA should allow the USER to decide if EUNOMIA should show or hide the USER (identity) who voted for the trustworthiness of a post in the EUNOMIA platform. Should have

FR-41 A USER won’t be able to give/receive feedback about a cast vote. Won’t have

FR-42 A SERVICE must be authenticated in the same way as a USER. Must have

FR-43 To register a DEVICE with the EUNOMIA platform, the USER must authenticate themselves. Must have

Digital Companion

FR-44 DC CLIENT could be able to capture time spent by end user on a post. Could have

FR-45 DC CLIENT must allow a EUNOMIA USER to associate his/her social media account with a EUNOMIA ID. Must have

FR-46 DC CLIENT must be able to store temporary data in LOCAL STORAGE. Must have

FR-47 DC CLIENT must be able to receive EUNOMIA updates. Must have

FR-48 DC CLIENT must be able to be used on desktop computers. Must have

FR-49 DC CLIENT must be able to be used on smartphones. Must have

FR-50 DC CLIENT could be able to be used on smartwatches. Could have

FR-51 DC CLIENT should be able to be integrated with different social networks using the API CONSUMER. Should have

FR-52 DC CLIENT must establish a secure communication with DC SERVER using the SNP. Must have

Peer-to-peer Infrastructure

FR-53 A USER won’t be able to message another EUNOMIA USER (using EUNOMIA). Won’t have


FR-54 DC should safeguard all your EUNOMIA related account data outside your mobile device. Should have

FR-55 EUNOMIA should have a simple algorithm to assign USERS to NODES. Should have

FR-56 A USER should be able to connect to any EUNOMIA node in order to access the EUNOMIA platform. Should have

FR-57 EUNOMIA off chain storage must store User accounts. Must have

FR-58 EUNOMIA off chain storage should store Information cascade related data. Should have

FR-59 EUNOMIA off chain storage should store User-view related data of the information cascade (shared among users). Should have

FR-60 EUNOMIA off chain storage should store machine learning models. Should have

FR-61 EUNOMIA off chain storage should store machine learning training data. Should have
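FR-55 asks for a simple algorithm to assign USERS to NODES but does not prescribe one. One possible reading, sketched here as an assumption and not as the project's chosen algorithm, is hash-based assignment: the user identifier is hashed and reduced modulo the number of nodes. The function name `assign_node` and the node labels are hypothetical.

```python
import hashlib


def assign_node(user_id: str, nodes: list[str]) -> str:
    """Deterministically map a user to one of the known nodes.

    The node list is sorted first so that every caller obtains the same
    assignment regardless of the order in which it learned about the nodes.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return sorted(nodes)[index]
```

The modulo step keeps the sketch short, but it reassigns most users whenever a node joins or leaves; a consistent-hashing scheme would limit that churn at the cost of a more involved implementation. Since FR-56 lets a USER connect to any node, such an assignment would act as a default rather than a restriction.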

3.3 Non-functional requirements

Non-functional requirements specify additional properties of EUNOMIA, other than functionality. These requirements can be subcategorized into categories such as performance, design constraints (which can also be categorized under external interface requirements), logical database requirements, and “characteristics” (termed “attributes” in IEEE Std. 830) that do not fit neatly into any of the other categories. Non-functional requirements can also describe quality attributes and the design and implementation constraints that the product must satisfy; they are therefore more qualitative and may require a different approach for their elicitation. To identify the non-functional requirements, the model proposed by ISO/IEC 25010:2011 (Software Product Quality model; see footnote 15) was adopted. Following that model, eight quality characteristics contribute to software product quality; they are presented in Table 11.

Table 11: ISO/IEC 25010:2011 Software Product Quality Model Sub-Categories

ID Quality characteristic Sub-categories

Qc1 Functional suitability Functional completeness; Functional correctness; Functional appropriateness

Qc2 Performance efficiency Time behaviour; Resource utilization; Capacity

Qc3 Compatibility Co-existence; Interoperability

Qc4 Usability Appropriateness recognisability; Learnability; Operability; User error protection; User interface aesthetics; Accessibility

Qc5 Reliability Maturity; Availability; Fault tolerance; Recoverability

Qc6 Security Confidentiality; Integrity; Non-repudiation; Authenticity; Accountability

Qc7 Maintainability Modularity; Reusability; Analysability; Modifiability; Testability

Qc8 Portability Adaptability; Installability

15 https://www.iso.org/standard/35733.html

All non-functional requirements were categorized as MUST HAVE using the MoSCoW method.

Table 12: Non-functional requirements

ID Qc# Description (grouped by technology)

Blockchain

NFR-1 Qc6 Posts’ metadata must be stored cryptographically in a permanent storage.

Content and Context Analysis

NFR-2 Qc6 Must be able to assign a sentiment score in less than 5 s.

NFR-3 Qc1 The algorithm should be able to assign sentiment scores to subparts of a text.

HaTS and Reputation mechanism

NFR-4 Qc6 Extracted user features should not contain PII information.

NFR-5 Qc2 Information cascade generation / retrieval should not take more than 20 s.

NFR-6 Qc2 Information cascade data sub-sampling (e.g., filtering) should minimise the number of posts to process.

NFR-7 Qc2 After voting on a post trustworthiness category, this should update the EUNOMIA P2P database record within 5 minutes and the blockchain within 1 hour.

NFR-8 Qc2 HaTS and information cascade machine learning can be offloaded to the ESN.

NFR-9 Qc3 HaTS and information cascade feature-sets must be consistent and portable across multiple social media platforms (e.g., MASTODON and DIASPORA).

NFR-10 Qc2 Extracted user and post features and pre-processing should complete in no more than 10 s.

NFR-11 Qc6 EUNOMIA user and post IDs must be resilient to spoofing when queried or uploaded by the Digital Companion.

Cybersecurity and Privacy Framework

NFR-12 Qc6 EUNOMIA platform and DEVICEs should be able to detect if the data stored locally or remotely has been tampered with, by checking it on the blockchain.

NFR-13 Qc4 EUNOMIA Observatory should be considered as a SERVICE.

NFR-14 Qc6 EUNOMIA must only allow the connection of authenticated SERVICES.

NFR-15 Qc6 A DEVICE could use a stored secret to authenticate to the EUNOMIA platform.

NFR-16 Qc6 All data from other USERs stored in the DEVICE of a USER could be protected (encrypted) in a way that the DEVICE owner won’t be able to access it.

NFR-17 Qc6 Data regarding a EUNOMIA account (USER) stored in the DEVICE of the USER could be protected (encrypted).

NFR-18 Qc6 Data regarding a EUNOMIA account (USER) stored in the EUNOMIA platform should be protected (encrypted).

Digital Companion

NFR-19 Qc3, Qc4, Qc8 DC CLIENT should run on MAC, Windows, Linux.

NFR-20 Qc3, Qc4, Qc8 DC CLIENT should run on Android, iOS.

Peer-to-peer Infrastructure

NFR-21 Qc5 P2P storage must be always accessible.
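FR-17 (cryptographically linking the information cascade) and NFR-12 (detecting tampering by checking data against the blockchain) can be illustrated with a hash chain. The sketch below is a simplified illustration, not the EUNOMIA ledger design or any blockchain platform's API: it assumes a linear cascade (a real cascade may branch), uses SHA-256 over a canonical JSON encoding, and all names (`record_hash`, `build_cascade`, `first_tampered`, `GENESIS`) are hypothetical.

```python
import hashlib
import json

# Assumed placeholder parent hash for the first post in a cascade.
GENESIS = "0" * 64


def record_hash(post: dict, parent_hash: str) -> str:
    """Hash a post together with the hash of the post it derives from."""
    payload = json.dumps({"post": post, "parent": parent_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_cascade(posts):
    """Return one hash per post, each linked to its predecessor's hash."""
    hashes, parent = [], GENESIS
    for post in posts:
        parent = record_hash(post, parent)
        hashes.append(parent)
    return hashes


def first_tampered(posts, ledger_hashes):
    """Index of the first post whose recomputed hash differs from the
    ledger copy, or None if every compared post matches."""
    recomputed = build_cascade(posts)
    for i, (actual, expected) in enumerate(zip(recomputed, ledger_hashes)):
        if actual != expected:
            return i
    return None
```

Because each hash covers its parent's hash, changing any post invalidates that post and every later one, so a device only needs the hashes stored on the ledger to check locally or remotely stored cascade data.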

4. CONCLUSIONS

The main objective of Deliverable 3.1 was to present an assessment of the technological components to be used by EUNOMIA; specifically, the core blockchain architecture, social media analytics, and machine learning components are discussed. By analysing the output of Deliverable 2.4 (Report on user needs and requirements), this deliverable presented an overview of EUNOMIA’s actors, as well as a characterization of the functional and non-functional requirements for the physical entities and/or computational services identified. The functional and non-functional requirements assigned to the actors were characterized according to the MoSCoW method and the ISO/IEC 25010:2011 Software Product Quality Model. The requirements were discussed during a dedicated technical workshop to obtain a concrete list that aligns with EUNOMIA’s requirements, taking into account the technical components to be used and the user requirements. The main contribution of Deliverable 3.1 was to capture technical requirements per technological component and help align them with EUNOMIA’s specifications. The conceptual understanding of the architecture will drive the materialization of the specification and architectural design to be captured in Deliverable 3.2. In total, 26 actors, 61 functional and 21 non-functional requirements were identified. These will guide the development of all technical components in the project directly and indirectly by informing the specifications and architecture.


5. REFERENCES

Agathangelou, Pantelis, Ioannis Katakis, Ioannis Koutoulakis, Fotis Kokkoras, and Dimitrios Gunopulos (2017). “Learning patterns for discovering domain-oriented opinion words,” Knowledge and Information Systems, Jun 2017. [Online]. Available: https://doi.org/10.1007/s10115-017-1072-y Agathangelou, Pantelis, and Ioannis Katakis (2018). “A hybrid deep learning network for modelling opinionated content,” in Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, ser. SAC ’19. New York, NY, USA: ACM, 2019, pp. 1051–1053. [Online]. Available: http://doi.acm.org/10.1145/3297280.3297570 Androulaki, Elli, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, et al. “Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains,” 2018. https://doi.org/10.1145/3190508.3190538. Aspnes, J., Shah, G. (2007). ‘Skip Graphs’; ACM Trans. Algorithms, Vol. 3, No. 4 (2007). https://doi.org/10.1145/1290672.1290674. A. Graves, N. Jaitly, and A. R. Mohamed (2013). “Hybrid speech recognition with deep bidirectional lstm,” in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Dec 2013, pp. 273–278. Baccianella S, Esuli A, Sebastiani F (2010) Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of the international conference on language resources and evaluation, LREC 2010, 17–23 May 2010, Valletta, Malta. Barrell (2018). Retrieved 14 March 2018, from https://barrel-db.org/ Bethencourt, J., Sahai, A., Waters, B. (2007): ‘Ciphertext-policy attribute-based encryption’; In . IEEE Symposium on Security and Privacy, pp. 321–334. Biorn-Hansen, A., Gronli, T., and Ghinea, G. (2018). A Survey and Taxonomy of Core Concepts and Research Challenges in Cross-Platform Mobile Development. ACM Computing Surveys, 51(5), 1-34. doi: 10.1145/3241739 Biorn-Hansen, A., Gronli, T., Ghinea, G., and Alouneh, S. (2019). 
An Empirical Study of Cross- Platform Mobile Development in Industry. Wireless Communications And Mobile Computing, 2019, 1-12. doi: 10.1155/2019/5743892 Bittorrent-tracker (2014) Retrieved from https://github.com/webtorrent/bittorrent-tracker (Original work published 26 March 2014) Bittorrent (2019) Retrieved from http://bittorrent.org/beps/bep_0003.html (30 May 2019) Campos, A. F. (2014) ‘p2pfs: Simple P2P file system using FUSE, built on top of Kademlia (tomp2p implementation)’; Java. Retrieved from https://github.com/axfcampos/p2pfs (Original work published 25 November 2013). Culotta A. (2003) Maximizing Cascades in Social Networks: An Overview. Amherst, MA: University of Massachusetts. Chopra, Sumit, Raia Hadsell, and Yann LeCun (2005). “Learning a Similarity Metric Discriminatively, with Application to Face Verification.” In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Type of deliverable PUBLIC Page | 31 `

Cohen, B. (2013). ‘Incentives build robustness in BitTorrent’; Workshop on Economics of PeertoPeer Systems, Vol. 6 (2013). Conneau, Alexis and Douwe Kiela. SentEval (2018). An Evaluation Toolkit for Universal Sentence Representations. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki, Japan, May 2018. Delaune, S., Hirschi, L.: ‘A survey of symbolic methods for establishing equivalence-based properties in cryptographic protocols’; Journal of Logical and Algebraic Methods in Programming, Vol. 87 (2017), pp. 127–144. https://doi.org/10.1016/j.jlamp.2016.10.005 Dennis, J. B. (1965): ‘Segmentation and the design of multiprogrammed computer systems’; Journal of the ACM (JACM), Vol. 12, No. 4, pp. 589–602. Dwoskin, E. (2017). “Twitter is looking for ways to let users flag fake news, offensive content”. The Washington Post. https://www.washingtonpost.com/news/the-switch/wp/2017/06/29/twitter- is-looking-for-ways-to-let-users-flag-fake-news Esuli, A. and Sebastiani, F. (2006). “SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining.” In Proceedings of 5th Conference on Language Resources and Evaluation. Franchi, E., Poggi, A., Tomaiuolo, M. (2019): ‘Blogracy: A Peer-to-Peer Social Network’; Censorship, Surveillance, and Privacy: Concepts, Methodologies, Tools, and Applications, pp. 675–696. https://doi.org/10.4018/978-1-5225-7113-1. Ghoshal, Abhimanyu (2019). "Whatsapp Launches A Tip Line In India To Battle Fake News". The Next Web. https://thenextweb.com/apps/2019/04/02/whatsapp-launches-a-tip-line-in-india-to- battle-fake-news-ahead-of-national-elections Geambasu, R., Kohno, T., Levy, A. A., Levy, H. M. (2009): ‘Vanish: Increasing Data Privacy with Self- Destructing Data.’; In USENIX Security Symposium (Vol. 9) (2009). Grubbs, P., McPherson, R., Naveed, M., Ristenpart, T., Shmatikov, V. (2016). 
‘Breaking web applications built on top of encrypted data’; Presented at the Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, pp. 1353–1364. Han, W., Lei, C. (2012) ‘A survey on policy languages in network and security management’; Computer Networks, Vol. 56, No. 1 (2012), pp. 477–489. https://doi.org/10.1016/j.comnet.2011.09.014. Harvey, N. J. A., Munro, J. I. (2003): ‘Brief Announcement: Deterministic Skipnet’; In Proceedings of the Twenty-second Annual Symposium on Principles of Distributed Computing. New York, NY, USA: ACM (2003), pp. 152–152. https://doi.org/10.1145/872035.872057. Hasan, R., Anwar, Z., Yurcik, W., Brumbaugh, L., Campbell, R. (2005) ‘A survey of peer-to-peer storage techniques for distributed file systems’; IEEE, pp. 205-213 Vol. 2. https://doi.org/10.1109/ITCC.2005.42. Hassanzadeh-Nazarabadi, Y., Küpçü, A., Ozkasap, O. (2018). ‘Decentralized and locality aware replication method for DHT-based P2P storage systems’; Future Generation Computer Systems, Vol. 84 (2018), pp. 32–46. https://doi.org/10.1016/j.future.2018.02.007 Hatzivassiloglou, Vasileios, and Kathleen R. McKeown (1997). “Predicting the Semantic Orientation of Adjectives.” In Proceedings of the 35th Annual Meeting on Association for Computational Linguistics, https://doi.org/10.3115/976909.979640

Type of deliverable PUBLIC Page | 32

H2020 Grant Agreement Number: 825171 Document ID: WP3 / D3.1

Sukhwani, Harish, Nan Wang, Kishor S. Trivedi, and Andy Rindos (2018). “Performance Modeling of Hyperledger Fabric (Permissioned Blockchain Network).” IEEE 17th International Symposium on Network Computing and Applications, 2018, https://doi.org/10.1109/NCA.2018.8548070. He, Hua, Kevin Gimpel, and Jimmy Lin (2015). “Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks,” 2015. https://doi.org/10.18653/v1/d15-1181. Hu, Minqing, and Bing Liu (2004). “Mining and Summarizing Customer Reviews.” In Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’04, 2004. https://doi.org/10.1145/1014052.1014073. Hyperledger Fabric (2019) A Blockchain Platform for the Enterprise https://hyperledger- fabric.readthedocs.io/en/release-1.4/ Jalili M, Perc M. (2017). Information cascades in complex networks. Journal of Complex Networks. Jul 6; 5(5):665-93. Javed, Ali, and Byung Suk Lee (2016). "Sense-Level Semantic Clustering of Hashtags in Social Media." SIMBig. 2016. Sousa, Joao, Alysson Bessani, and Marko Vukolic (2018). “A Byzantine Fault-Tolerant Ordering Service for the Hyperledger Fabric Blockchain Platform.” In Proceedings - 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018, 2018. https://doi.org/10.1109/DSN.2018.00018. Kamps J, Marx M, Mokken RJ, de Rijke M (2004). Using wordnet to measure semantic orientations of adjectives. In: Proceedings of the fourth international conference on language resources and evaluation, LREC, May 26–28, 2004, Lisbon, Portugal. Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., Lewin, D. (1997): ‘Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web’; In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing. New York, NY, USA, pp. 654–663. https://doi.org/10.1145/258533.258660 Klamma, Ralf, and Dominik Renzel (2016). 
“Las2peer – A Primer Advanced Community Information Systems ( ACIS ) Group Chair of Computer Science 5 ( Databases & Information Systems ) RWTH Aachen University , Aachen , Germany ACIS Working Group Series Document AWGS- 2016-020 Las2peer – A Primer Ralf Klamma , Dominik Renzel , Peter de Lange , Holger Janßen,” https://doi.org/10.13140/RG.2.2.31456.48645. Krohn, M., Yip, A., Brodsky, M., Morris, R., Walfish, M. (2007): ‘A world wide web without walls’; In 6th ACM Workshop on Hot Topics in Networking (Hotnets) Labs, P. (2018): ‘IPFS is the Distributed Web’; Retrieved 14 March 2018, from https://ipfs.io/ Li, N., Mitchell, J. C., Winsborough, W. H.: ‘Design of a role-based trust-management framework’; In Security and Privacy, 2002. Proceedings. 2002 IEEE Symposium on. IEEE (2002), pp. 114–130. Li, J., Wu, J., Chen, L.: ‘Block-secure: Blockchain based scheme for secure P2P cloud storage’; Information Sciences, Vol. 465 (2018), pp. 219–231. https://doi.org/10.1016/j.ins.2018.06.071 Malatras, A. (2015): ‘State-of-the-art survey on P2P overlay networks in pervasive computing environments’; Journal of Network and Computer Applications, Vol. 55, pp. 1–23. https://doi.org/10.1016/j.jnca.2015.04.014

Type of deliverable PUBLIC Page | 33 `

Maymounkov, P., Mazières, D. (2002). ‘Kademlia: A Peer-to-Peer Information System Based on the XOR Metric’; In Peer-to-Peer Systems. Springer, Berlin, Heidelberg (2002), pp. 53–65. https://doi.org/10.1007/3-540-45748-8_5 Muthitacharoen, A., Morris, R., Gil, T. M., Chen, B. (2002). ‘Ivy: a read/write peer-to-peer file system’; ACM Press, p. 31. https://doi.org/10.1145/1060289.1060293 Narayan, S., Gagné, M., Safavi-Naini, R. (2010). ‘Privacy preserving EHR system using attribute- based infrastructure’; In Proceedings of the ACM workshop on Cloud computing security workshop. ACM, pp. 47–52. Neeraja, M. and Prakash, J. (2016). Detecting Malicious Posts in Social Networks Using Text Analysis. International Journal of Science and Research (IJSR) ISSN (Online), pp.2319-7064. Neff, C. A. (2014): ‘A Verifiable Secret Shuffle and its Application to E-Voting’; p. 10. Popa, R. A., Stark, E., Helfer, J., Valdez, S., Zeldovich, N., Kaashoek, M. F., Balakrishnan, H.: ‘Building Web Applications on Top of Encrypted Data Using Mylar.’; Presented at the NSDI (2014), pp. 157–172. Parikh, A. P., Oscar Tackstrom, Dipanjan Das, and Jakob Uszkoreit (2016). A decomposable attention model for natural language inference. arXiv preprint arXiv:1606.01933. Parth Thakkar, Senthil Nathan, and Balaji Viswanathan, “Performance Benchmarking and Optimizing Hyperledger Fabric Blockchain Platform,” Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018, 2018, 264–76, https://doi.org/10.1109/MASCOTS.2018.00034. Popescu, Ana Maria, and Orena Etzioni. “Extracting Product Features and Opinions from Reviews.” In: HLT/EMNLP 2005, human language technology conference and conference on empirical methods in natural language processing, proceedings of the conference, 6–8 October 2005, Vancouver, BC, Canada Pugh, W. (1990). ‘A Skip List Cookbook’. 
Qbittorrent, 2018] Retrieved 14 March 2018, from https://www.qbittorrent.org/ Qiu G, Liu B, Bu J, Chen C (2009) Expanding domain sentiment lexicon through double propagation. In: Proceedings of the 21st international joint conference on artificial intelligence, IJCAI’09, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc, pp 1199–1204. Qiu G, Liu B, Bu J, Chen C (2011) Opinion word expansion and target extraction through double propagation. Comput Linguist 37(1): 9–27. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: ‘A Scalable Content-addressable Network’; In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. New York, NY, USA: ACM (2001), pp. 161–172. https://doi.org/10.1145/383059.383072. RESILIO (2018). Fastest and Most Reliable Way to Move Data - P2P File Transfer and Synchronization, 2018] Retrieved 13 March 2018, from https://www.resilio.com Ripeanu, M., Iamnitchi, A., Foster, I., Rogers, A.: ‘In search of simplicity: a self-organizing group communication overlay’; Concurrency and Computation: Practice and Experience, Vol. 22, No. 7 (2010), pp. 788–815. https://doi.org/10.1002/cpe.1543.

Type of deliverable PUBLIC Page | 34

H2020 Grant Agreement Number: 825171 Document ID: WP3 / D3.1

Rowstron, A., Druschel, P.: ‘Pastry: Scalable, Decentralized Object Location, and Routing for Large- Scale Peer-to-Peer Systems’; In Middleware 2001. Springer, Berlin, Heidelberg (2001), pp. 329– 350. https://doi.org/10.1007/3-540-45518-3_18. Sandhu, R. S.: ‘Role-based Access Control’; In Advances in computers (Vol. 46). Elsevier (1998), pp. 237–286. Sawant, N., Li, J. and Wang, J.Z., 2011. Automatic image semantic interpretation using social action and tagging data. Multimedia Tools and Applications, 51(1), pp.213-246. Scuttlebot, 2018. Retrieved 14 March 2018, from https://scuttlebot.io/ Sharma, T. K. (2018). ‘How Does Blockchain Use Public Key Cryptography?’. Retrieved 27 January 2018 from https://www.blockchain-council.org/blockchain/how-does-blockchain-use-public- key-cryptography. Stoica, I., Morris, R., Liben-Nowell, D., Karger, D. R., Kaashoek, M. F., Dabek, F., Balakrishnan, H. (2003). ‘Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications’; IEEE/ACM Trans. Netw., Vol. 11, No. 1 (2003), pp. 17–32. https://doi.org/10.1109/TNET.2002.808407. Storj (2018). Decentralized Cloud Storage Retrieved 13 March 2018, from https://storj.io Pongnumkul, S., Siripanpornchana, C. and Thajchayapong, S., 2017, July. Performance analysis of private blockchain platforms in varying workloads. In 2017 26th International Conference on Computer Communication and Networks (ICCCN) (pp. 1-6). IEEE. Hochreiter, S. and Schmidhuber, J. (1997). “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available: http://dx.doi.org/10.1162/neco.1997.9.8.1735. Subramanian, S., Adam Trischler, Yoshua Bengio and Christopher J Pal. (2018). Learning general purpose distributed sentence representations via large Multi-task learning. ICLR 2018. Swan, M. (2015). ‘Blockchain: Blueprint for a new economy’; O’Reilly Media, Inc. (2015). Tahoe-LAFS (2018) Retrieved 14 March 2018, from https://tahoe-lafs.org/trac/tahoe-lafs. 
Dinh, T.T.A., Liu, R., Zhang, M., Chen, G., Ooi, B.C. and Wang, J. (2018). Untangling blockchain: A data processing view of blockchain systems. IEEE Transactions on Knowledge and Data Engineering, 30(7), pp.1366-1385. Transmission (2018),. Transmission Retrieved 14 March 2018, from https://transmissionbt.com/ Mikolov, T., S. Kombrink, L. Burget, J. ernock, and S. Khudanpur (2011). “Extensions of recurrent neural network language model,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011, pp. 5528–5531. Tomar, Gaurav Singh, Thyago Duque, Oscar Tackstrom, Jakob Uszkoreit, and Dipanjan Das (2017). Neural Paraphrase Identification of Questions with Noisy Pretraining. In Proceedings of International Joint Conferences on Artificial Intelligence Organization (IJCAI), Melbourne, Australia, August 2017. TomP2P (2018). A P2P-based key-value pair storage library Retrieved 14 March 2018, from https://tomp2p.net/ Turney, Peter D. (2002). “Thumbs up or Thumbs down? Semantic Orientation Applied to Unsupervised Classification of Reviews.” Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL),

Type of deliverable PUBLIC Page | 35 `

Verma, D. C.: (2002). ‘Simplifying network administration using policy-based management’; IEEE Network, Vol. 16, No. 2 (2002), pp. 20–26. Wang, L., Kangasharju, J. (2013). ‘Measuring large-scale distributed systems: case of BitTorrent Mainline DHT’; In IEEE P2P 2013 Proceedings. Trento, Italy: IEEE (2013), pp. 1–10. https://doi.org/10.1109/P2P.2013.6688697 Wang, Z., Wael Hamza, and Radu Florian (2017). Bilateral multi-perspective matching for natural language sentences. In Proceedings of International Joint Conferences on Artificial Intelligence Organization (IJCAI), Melbourne, Australia. Wang, Z., Haitao Mi, and Abraham Ittycheriah (2016). Sentence similarity learning by lexical decomposition and composition. In Proceedings of COLING, 2016. Wang, Z., Haitao Mi, and Abraham Ittycheriah (2016), Semi-supervised clustering for short text via deep representation learning. In CoNLL. Wang, F., Mickens, J., Zeldovich, N., Vaikuntanathan, V. (2016). ‘Sieve: Cryptographically Enforced Access Control for User Data in Untrusted Clouds.’; In NSDI, pp. 611–626. Wilson T, Wiebe J, Hoffmann P (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In: HLT/EMNLP 2005, human language technology conference and conference on empirical methods in natural language processing, proceedings of the conference, 6–8 October 2005, Vancouver, BC, Canada Wolchok, S., S. Hofmann, O., Heninger, N., W. Felten, E., Alex Halderman, J., J. Rossbach, C., et al. (2010). ‘Defeating Vanish with Low-Cost Sybil Attacks Against Large DHTs.’. XtreemFS (2018) Fault-Tolerant Distributed File System Retrieved 13 March 2018, from http://www.xtreemfs.org/ Bengio, Y., Y. LeCun, C. Nohl, and C. Burges (1995). “LeRec: A NN/HMM Hybrid for on-Line Handwriting Recognition.” Neural Computation 7, no. 6 (1995): 1289–1303. https://doi.org/10.1162/neco.1995.7.6.1289. LeCun, Yann, Yoshua Bengio, et al. (1995). 
“Convolutional Networks for Images, Speech, and Time Series.” The Handbook of Brain Theory and Neural Networks, 1995. Yuan, E., Tong, J. (2005). ‘Attributed based access control (ABAC) for Web services’; In IEEE International Conference on Web Services (ICWS’05) (2005), p. 569. https://doi.org/10.1109/ICWS.2005.25 Zheng, Z., Xie, S., , H.-N., Wang, H. (2016). ‘Blockchain challenges and opportunities: A survey’; Work Pap.–2016.

Type of deliverable PUBLIC Page | 36

H2020 Grant Agreement Number: 825171 Document ID: WP3 / D3.1

www.eunomia-project.eu
