Enhancing Real-Time Twitter Filtering and Classification using a Semi-Automatic Dynamic Machine Learning setup approach Master’s Thesis Nick de Jong Enhancing Real-Time Twitter Filtering and Classification using a Semi-Automatic Dynamic Machine Learning setup approach THESIS submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in COMPUTER SCIENCE TRACK SOFTWARE TECHNOLOGY by Nick de Jong born in Rotterdam, 1988 Web Information Systems Department of Software Technology Faculty EEMCS, Delft University of Technol- CrowdSense ogy Wilhelmina van Pruisenweg 104 Delft, the Netherlands The Hague, the Netherlands http://wis.ewi.tudelft.nl http://www.twitcident.com c 2015 Nick de Jong Enhancing Real-Time Twitter Filtering and Classification using a Semi-Automatic Dynamic Machine Learning setup approach Author: Nick de Jong Student id: 1308130 Email:
[email protected] Abstract Twitter contains massive amounts of user generated content that also con- tains a lot of valuable information for various interested parties. Twitcident has been developed to process and filter this information in real-time for interested parties by monitoring a set of predefined topics, exploiting humans as sensors. An analysis of the relevant information by an operator can result in an estimation of severity, and an operator can act accordingly. However, among all relevant and useful content that is extracted, also a lot of irrelevant noise is present. Our goal is to improve the filter in such a way that the majority of information pre- sented by Twitcident is relevant. To this end we designed an artifact consisting of several components, developed within a dynamic framework.