Collecting Information from a Decentralized Microservice Architecture


Linköping University | IDA
Bachelor Thesis | Computer Engineering
Spring 2018 | LIU-IDA/LITH-EX-G--18/025--SE

Carl Ekbjörn
Daniel Sonesson

Tutor: Jonas Wallgren
Examiner: Lena Buffoni

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Carl Ekbjörn, Daniel Sonesson

Abstract

As a system grows in size, it is commonly transformed into a microservice architecture. To monitor this new architecture, information must be collected from the microservices. The software company IDA Infront is transitioning its product iipax to a microservice architecture and faces this problem. To solve it, the company proposes using a Message-oriented Middleware (MOM). Many different MOMs are suitable for this task. The aim of this thesis is to determine which MOM is best suited for it in terms of latency, throughput and scalability. Out of four suitable MOMs, Apache Kafka and RabbitMQ are chosen for further testing and benchmarking. The tests show that RabbitMQ sends single, infrequent messages faster than Kafka (latency). Kafka, in turn, is faster at sending many messages in rapid succession and with an increased number of producers sending messages (throughput and scalability). Nevertheless, the scalability test suggests that RabbitMQ may scale better with a larger number of microservices, so more testing is needed to reach a definite conclusion.

Table of Contents

Copyright
1. Introduction
2. Background
  2.1 iipax
  2.2 iipax archive
3. Theory
  3.1 Messaging Protocols
    3.1.1 AMQP
  3.2 Message-Oriented Middleware
    3.2.1 RabbitMQ
    3.2.2 ActiveMQ
    3.2.3 Apache Qpid
    3.2.4 Apache Kafka
  3.3 Quality of service
    3.3.1 Correctness
    3.3.2 Scalability
    3.3.3 Efficiency
  3.4 Microservices
    3.4.1 Microservices in iipax
4. Method
  4.1 Related work
  4.2 Analysis of MOMs
  4.3 Benchmarking of MOMs
    4.3.1 Latency
    4.3.2 Throughput
    4.3.3 Scalability
  4.4 The Kafka Configuration
    4.4.1 Kafka Producer
    4.4.2 Kafka Consumer
  4.5 The RabbitMQ Configuration
    4.5.1 RabbitMQ Producer
    4.5.2 RabbitMQ Consumer
5. Results
  5.1 Latency
  5.2 Throughput
    5.2.1 Throughput Send Time
    5.2.2 Throughput Receive Time
    5.2.3 Throughput Total Time
  5.3 Scalability
    5.3.1 Scalability Send Time
    5.3.2 Scalability Receive Time
    5.3.3 Scalability Total Time
6. Discussion
  6.1 Latency
  6.2 Throughput
  6.3 Scalability