Hybrid Messaging Solutions: OpenStack Summit Boston 2017


About Us
Ken Giusti (irc: kgiusti)
● Apache Qpid Project
● Oslo.Messaging Developer
● Software Engineer at Red Hat
Mark Wagner
● Performance and Scale Engineering
● Performance Engineer at Red Hat

Agenda
● Oslo.messaging (RPC and Notification Service Patterns)
● Backend Scenarios (Hybrid Messaging)
● The Drivers
● Testing Methodology and Results
● Next Steps...

Oslo.Messaging
[Figure: several services, each using oslo.messaging, attached to a shared messaging bus]
● Part of the OpenStack Oslo project that provides intra-service messaging
  ○ Remote Procedure Calls (RPC) & Notifications
● Abstraction that hides the details of the underlying messaging technology from the OpenStack Services

Oslo.Messaging Services
[Figure: a caller application and a server application communicating through the messaging system]
Notifications
● Asynchronous exchange from Notifier to Listener
● Listener need not be present when notification is sent
● Temporally decoupled
● Requires store-and-forward capability (e.g. queue or store)
Remote Procedure Call (RPC)
● Synchronous exchange between client and server
● Temporally bracketed
● If Server not present, call should fail

Notification - Broker Backed Messaging
[Figure: Notify clients in OpenStack services publish to a topic queue; Notify listeners in other OpenStack services consume from it]
● Notification API provides the ability to publish an event to a known topic
● Notification Listener(s) fetch the event when ready to consume
● Unidirectional and asynchronous - no traffic flows back to the notifier
● Queue store-and-forward capability is necessary for notify messaging pattern

RPC - Broker Backed Messaging
[Figure: an RPC client sends requests through a request queue to an RPC server, which answers through a reply queue]
● RPC transaction takes place across 4 discrete message ownership transfers
  ○ Requires two queues with associated resource overhead
● Detachment of client and server
  ○ Messages are acknowledged but not guaranteed to be delivered
● Clients do not readily know when a server is “unavailable”

RPC - Direct Messaging
[Figure: an RPC client connected directly to an RPC server, with no queue in between]
● End-to-end transfer of message ownership
  ○ No store-and-forward message queueing (stateless intermediaries)
● Logical link between clients and server
● Clients can immediately know when a server is “unavailable”

Oslo.Messaging API
[Figure: Notification and RPC services share common messaging code and a single transport (driver) on one message bus]
[Figure: the same stack with a separate RPC transport and Notification transport, each on its own message bus]
[Figure: transports created with get_transport(URL), get_rpc_transport(URL) [TBD], and get_notification_transport(URL)]

Service Configuration (e.g. nova.conf, etc.)
    [DEFAULT]
    transport_url=rabbit://rpc_user:rpc_pw@rpc_host:rpc_port
    ...
    [oslo_messaging_notifications]
    transport_url=rabbit://notify_user:notify_pw@notify_host:notify_port

Single Backend
[Figure: four OpenStack services, each using oslo.messaging for RPC and Notify, all connected to one broker cluster]

Dual Backend
[Figure: the same services, with RPC traffic on one broker cluster and Notify traffic on a second broker cluster]

“Hybrid” Backend
[Figure: the same services, with RPC traffic over direct messaging and Notify traffic on a broker cluster]

Benefits of Hybrid Messaging
● Optimal alignment of messaging patterns to messaging backend
  ○ Peer-to-peer messaging for RPC services (direct)
    ■ “Fail fast - fail clean”
  ○ Store-and-Forward for Notification services (queueing)
    ■ Mirroring and message persistence
  ○ Increase scale - Increase performance
● Diverse Topologies & Alternative Messaging Technology
  ○ Centralized brokers (hub-n-spoke)
  ○ Distributed architectures
  ○ Messaging as a Service

Alternative Oslo Messaging Transports
● ZeroMQ Socket Library
● AMQP 1.0 Protocol Transport + Qpid Dispatch Router
● Apache Kafka (experimental)
  ○ Notifications only (no RPC support)

ZeroMQ Transport
● Dedicated TCP connection between client and server
● Matchmaker (Redis) - maps topics ←→ host addresses
● TCP concentrator to limit TCP resource consumption
● Deployer’s guide: https://docs.openstack.org/developer/oslo.messaging/zmq_driver.html
[Figure: clients connecting to Server A and Server B, both directly and through a proxy]

AMQP 1.0 + Message Router
● Network of “message routers”
● Routers learn location of servers - optimal shortest path
● Stateless (no queueing) - end to end transfers
● Barcelona: https://www.youtube.com/watch?v=R0fwHr8XC1I
● Deployer’s guide: https://docs.openstack.org/developer/oslo.messaging/AMQP1.0.html
[Figure: clients reaching Servers A, B, and C through the router network]

Oslo.messaging matrix (drivers, backend, patterns)
    Driver                  Messaging Backend   Backend Type
    Rabbit (kombu, pika)    rabbitmq-server     Broker
    AMQP 1.0                qdrouterd           Direct Messaging
    ZMQ                     TCP, proxied        Direct Messaging
    Kafka                   kafka server        Broker-like Distributed Streaming
[RPC and Notification support columns not recoverable from the extracted text; Kafka is Notifications only, as noted above]

Testing Methodology
● Objective: benchmark RPC using both direct and queued approaches
  ○ Observe behavior and quantify
● Tool - Oslo Messaging Benchmark Tool
  ○ A driver independent tool for distributed messaging load generation and measurement
  ○ https://github.com/kgiusti/ombt
  ○ Does NOT simulate OpenStack project(s) traffic patterns
● Scenarios
  ○ Separation of RPC and Notification traffic
  ○ Scoped to rabbit:// and amqp:// drivers for now
● Traffic modeling assumptions
  ○ RPC-Notify Message ratios, Producer-Consumer Ratios, Message Payload size
● Single server deployments for this testing phase
  ○ Clusters and meshes planned for follow-up scale and resiliency comparisons

Scenario 1 - Single Broker Backend
[Figure: ombt controller; RPC clients and servers and Notify clients and listeners all connected to a single rabbit broker]
● Broker used for both RPC and Notifications - not durable
● Messaging Assumptions
  ○ RPC-Notify Traffic ratios - 50/50
  ○ Producer/Consumer ratios - 2/1
  ○ Payload size - 1K

Scenario 2 - Single Broker Backend - Durable
[Figure: same layout as Scenario 1, with a single rabbit broker]
● Broker used for both RPC and Notifications
● Messaging Assumptions
  ○ RPC-Notify Traffic ratios - 50/50
  ○ Producer/Consumer ratios - 2/1
  ○ Payload size - 1K

Scenario 3 - Separate Broker Backend
[Figure: ombt controller; RPC clients and servers on one rabbit broker, Notify clients and listeners on a second rabbit broker]
● Dual rabbitmq-server backends
● Notifications measured with persistence

Scenario 4 - Hybrid Messaging Backend
[Figure: ombt controller; RPC clients and servers on amqp, Notify clients and listeners on rabbit]
● Direct Messaging Backend used for RPC (amqp:// and qpid-dispatch-router)
● Broker Backend used for Notifications (rabbit:// and rabbitmq-server)

Final Comparison
[Figure: final comparison results; not recoverable from the extracted text]

Next Steps
● Try it out
  ○ AMQP 1.0 devstack plugin supports a hybrid configuration “qpid-hybrid”
    ■ https://git.openstack.org/openstack/devstack-plugin-amqp1
  ○ Zmq devstack plugin: https://git.openstack.org/openstack/devstack-plugin-zmq
● [developers]
  ○ “Messaging” != “Queueing” and oslo.messaging >= rabbitmq
  ○ Use “get_notification_transport” and “get_rpc_transport” - keep ‘em separated...
  ○ Get Involved! - oslo.messaging needs you!
● Expand hybrid messaging scenarios in gate checks
● Test and measure additional hybrid scenarios
  ○ Plausible driver-backend combinations for RPC and Notifications
● Improve the ease-of-use and configuration of hybrid backends
  ○ Make it easy for the operator to deploy and get immediate value.
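The Service Configuration slide gives RPC and Notifications their own transport_url. The Python sketch below is an illustration only. It uses the standard library, not oslo.messaging, to read such a file and report which backend type each pattern would land on according to the driver matrix.

Several details are assumptions rather than content from the slides:
● the sample hosts and credentials
● the BACKEND_TYPE table
● the fallback of the notification URL to the [DEFAULT] value, which reflects our understanding of oslo.messaging

```python
import configparser
from urllib.parse import urlsplit

# Assumed mapping from URL scheme to backend type, following the driver matrix.
# Keying on the scheme is a simplification: amqp:// can also point at a broker.
BACKEND_TYPE = {
    "rabbit": "broker",
    "amqp": "direct",
    "zmq": "direct",
    "kafka": "broker-like",
}

# Hypothetical hybrid nova.conf fragment; hosts and credentials are placeholders.
SAMPLE_CONF = """
[DEFAULT]
transport_url = amqp://rpc_user:rpc_pw@router-host:5672/

[oslo_messaging_notifications]
transport_url = rabbit://notify_user:notify_pw@broker-host:5672/
"""


def classify_transports(conf_text):
    """Return {"rpc": (scheme, type), "notify": (scheme, type)} for a config."""
    # interpolation=None so a '%' inside a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    rpc_url = parser.get("DEFAULT", "transport_url")
    # Notifications reuse the [DEFAULT] URL when their section sets none
    # (configparser also propagates [DEFAULT] keys into every section).
    notify_url = parser.get("oslo_messaging_notifications", "transport_url",
                            fallback=rpc_url)
    result = {}
    for pattern, url in (("rpc", rpc_url), ("notify", notify_url)):
        scheme = urlsplit(url).scheme
        result[pattern] = (scheme, BACKEND_TYPE.get(scheme, "unknown"))
    # The slides note the Kafka driver is Notifications only.
    if result["rpc"][0] == "kafka":
        raise ValueError("the kafka driver does not support RPC")
    return result


if __name__ == "__main__":
    print(classify_transports(SAMPLE_CONF))
```

Run directly, it reports RPC on amqp:// (direct) and Notifications on rabbit:// (broker) for the hypothetical hybrid file. A single-backend file with only a [DEFAULT] URL classifies both patterns the same way.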
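The slides contrast broker-backed and direct messaging: store-and-forward for Notifications, and fail-fast RPC with no queue. A deliberately simplified, in-process Python model can illustrate that contrast. ToyBroker, ToyRouter, and ServerUnavailable are hypothetical names invented for this sketch. They are not oslo.messaging, RabbitMQ, or qdrouterd APIs, and real timeouts, acknowledgements, and networking are left out.

```python
from collections import defaultdict, deque


class ServerUnavailable(Exception):
    """Raised by the toy router when no consumer is attached to a topic."""


class ToyBroker:
    """Store-and-forward backend: messages wait in a queue until consumed."""

    def __init__(self):
        self.queues = defaultdict(deque)   # topic -> pending messages
        self.rpc_servers = {}              # topic -> handler(request) -> reply

    def notify(self, topic, event):
        # Accepted whether or not any listener exists yet.
        self.queues[topic].append(event)

    def fetch(self, topic):
        # A listener drains whatever arrived while it was away.
        events = list(self.queues[topic])
        self.queues[topic].clear()
        return events

    def call(self, topic, request):
        handler = self.rpc_servers.get(topic)
        if handler is None:
            # The broker still accepts the request; it sits in the queue and
            # the caller only notices when no reply shows up (modelled as an
            # immediate TimeoutError instead of a real wait).
            self.queues[topic].append(request)
            raise TimeoutError(f"request queued on {topic!r}, no reply arrived")
        # With a server present, the request/reply queue hops are collapsed
        # into a direct call for brevity.
        return handler(request)


class ToyRouter:
    """Direct backend: no queues, so delivery needs a live consumer now."""

    def __init__(self):
        self.rpc_servers = {}              # topic -> handler(request) -> reply
        self.listeners = {}                # topic -> callback(event)

    def notify(self, topic, event):
        listener = self.listeners.get(topic)
        if listener is None:
            raise ServerUnavailable(f"no listener attached for {topic!r}")
        listener(event)

    def call(self, topic, request):
        handler = self.rpc_servers.get(topic)
        if handler is None:
            # Fail fast: the caller learns at once that no server exists.
            raise ServerUnavailable(f"no route to an RPC server for {topic!r}")
        return handler(request)
```

The two toys behave asymmetrically in each pattern:
● RPC: the broker accepts a request for which no server exists, and the caller finds out only through a (simulated) timeout. The router refuses the request immediately.
● Notifications: only the broker can hold an event for a listener that is not yet running.

That asymmetry is the rationale for routing RPC over direct messaging and Notifications through a broker.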