2007:071
MASTER'S THESIS

Stock Trend Prediction Using News Articles
A Text Mining Approach

Pegah Falinouss

Luleå University of Technology
Master Thesis, Continuation Courses
Marketing and e-commerce
Department of Business Administration and Social Sciences
Division of Industrial marketing and e-commerce
2007:071 - ISSN: 1653-0187 - ISRN: LTU-PB-EX--07/071--SE

Supervisors:
Dr. Mohammad Sepehri
Dr. Moez Limayem

Prepared by:
Pegah Falinouss

Tarbiat Modares University
Faculty of Engineering
Department of Industrial Engineering

Luleå University of Technology
Department of Business Administration and Social Sciences
Division of Industrial Marketing and E-Commerce

MSc PROGRAM IN MARKETING AND ELECTRONIC COMMERCE
Joint

2007

Abstract

Predicting the stock market with data mining techniques is one of the most important issues to be investigated. Mining textual documents and time series concurrently, such as predicting the movements of stock prices based on the contents of news articles, is an emerging topic in the data mining and text mining communities. Previous research has shown a strong relationship between the time when news stories are released and the time when stock prices fluctuate. In this thesis, we present a model that predicts changes in stock trend by analyzing the influence of non-quantifiable information, namely news articles, which are rich in information and superior to numeric data. In particular, we investigate the immediate impact of news articles on the time series, based on the Efficient Markets Hypothesis. This is a binary classification problem that uses several data mining and text mining techniques. To build the prediction model, we use the intraday prices and the time-stamped news articles related to the Iran-Khodro Company for the consecutive years 1383 and 1384.
A new statistics-based piecewise segmentation algorithm is proposed to identify trends in the time series. The news articles are preprocessed and labeled as either rise or drop by aligning them back to the segmented trends. A document selection heuristic based on chi-square estimation is used to select the positive training documents. The selected news articles are represented using the vector space model and the tf-idf term weighting scheme. Finally, the relationship between the contents of the news stories and the trends in stock prices is learned with a support vector machine. Different experiments are conducted to evaluate various aspects of the proposed model, and encouraging results are obtained in all of them. The prediction model reaches an accuracy of 83%, compared with 51% for random labeling of the news, an improvement of about 30 percentage points. In other words, the model predicts correctly about 1.6 times as often as random news labeling.

Acknowledgment

There are many individuals who contributed to the production of this thesis through their moral and technical support, advice, or participation. I am indebted to my supervisors, Dr. Mehdi Sepehri and Dr. Moez Limayem, for their patience, careful supervision, and encouragement throughout the completion of my thesis project. It has been both a privilege and a pleasure to be taught by two leading international scholars. I sincerely thank you both for being the sort of supervisors every student needs: astute, supportive, enthusiastic, and inspiring. You are the ideal role models for a beginning academic and the best possible leading academics to supervise an ambitious enhancement study.

I would like to express my appreciation to Dr. Babak Teimourpour, a PhD student in Industrial Engineering at Tarbiat Modares University. He has been of great help, support, and encouragement throughout the research process.
I would also like to express my gratitude to Tehran Stock Exchange Services Company for their cooperation in providing the data from their databases. Finally, I would like to thank my family and friends, and especially my husband, for his understanding, encouragement, and support throughout the completion of my research project.

I would like to dedicate my thesis to my parents and my husband.

Table of Contents

Abstract
Acknowledgment
List of Tables
List of Figures
Chapter 1: Introduction and Preface
  1.1 Considerations and Background
  1.2 The Importance of Study
  1.3 Problem Statement
  1.4 Research Objective
  1.5 Tehran Stock Exchange (TSE)
  1.6 Research Orientation
Chapter 2: Literature Review
  2.1 Knowledge Discovery in Databases (KDD)
    2.1.1 Knowledge Discovery in Text (KDT)
    2.1.2 Data Mining Vs. Text Mining
    2.1.3 The Burgeoning Importance of Text Mining
    2.1.4 Main Text Mining Operations
  2.2 Stock Market Movement
    2.2.1 Theories of Stock Market Prediction
      2.2.1.1 Efficient Market Hypothesis (EMH)
      2.2.1.2 Random Walk Theory
    2.2.2 Approaches to Stock Market Prediction
      2.2.2.1 Technicians Trading Approach
      2.2.2.2 Fundamentalist Trading Approach
    2.2.3 Influence of News Articles on Stock Market
  2.3 The Scope of Literature Review
    2.3.1 Text Mining Contribution in Stock Trend Prediction
    2.3.2 Review of Major Preliminaries
  2.4 Chapter Summary
Chapter 3: Time Series Preprocessing
  3.1 Time Series Data Mining
    3.1.1 On Need of Time Series Data Mining
    3.1.2 Major Tasks in Time Series Data Mining
  3.2 Time Series Representation
    3.2.1 Piecewise Linear Representation (PLR)
    3.2.2 PLR Applications in Data Mining Context
    3.2.3 Piecewise Linear Segmentation algorithms
    3.2.4 Linear Interpolation vs. Linear Regression
    3.2.5 Stopping Criterion and the Choice of Error Norm
    3.2.6 “Split and Merge” Algorithm
  3.3 Summary
Chapter 4: Literature on Text Categorization Task
  4.1 Synopsis of Text Categorization Problem
    4.1.1 Importance of Automated Text Categorization
    4.1.2 Text Categorization Applications
    4.1.3 Text Categorization General Process
  4.2 Text Preprocessing
  4.3 Dimension & Feature Reduction Techniques
    4.3.1