ECOS Project: Using Data Analysis to Improve Marketing Response at D.C. Thomson

Tom Davenport

September 2016

Dissertation submitted in partial fulfilment for the degree of Master of Science in Big Data

Computing Science and Mathematics
University of Stirling

Abstract

Data analysis tools and techniques provide a substantial opportunity for business. Companies are now able to store more information, not only about users' personal details but also about their interaction with the company, through their response to emails and how they behave on the company's site. This has created a vast pool of data that, when analysed appropriately, can provide useful evidence for marketing decision making. The data, however, needs to be put through a series of processes to ensure that it is of sufficient quality for the analysis to be grounded in fact and to be of business value.

This project was undertaken for the DC Thomson company to examine the response to emails sent during the Christmas season, from October to December 2015. It created a series of data models to highlight which factors were important in making an email successful, and whether there were patterns amongst groups of users who purchased from emails. The objectives of the project were to create one model that could reflect the structural factors that made emails successful, and another model that would show which factors influenced a customer to purchase.

The project involved a multi-level analysis of different sources of data, creating and assessing the quality of datasets that could be used to generate data models. The approach to modelling involved examining current methods of email data mining and following the CRISP-DM standard for approaching data mining projects. The project was developed within an Agile environment, using daily stand-up meetings and weekly Sprint meetings.
Different methods for creating the analysis were considered, with the majority of data cleaning undertaken in R, and data scraping and merging carried out using Pandas in Python. The final models were developed using R. The models generated were decision trees, a white box model that can effectively show the relationships within the data that lead to a particular outcome.

The results of the project were presented to senior management. This included suggestions of how they should test campaign emails in future, based on the structural influences that the model suggested generate a good response. An audit of missing data from the user campaign was also presented, showing the importance of each variable and its proportion of missingness from the set. The process of building these models was documented, which will allow them to be recreated and extended with further detail to study other campaigns. The suggestions from the models will have an impact on the organisation's decision-making process in the future.

Attestation

I understand the nature of plagiarism, and I am aware of the University's policy on this.

I certify that this dissertation reports original work by me during my University project except for the following:

The removeduplicatesorderedSeq function in section 4.1.1 is adapted from [26].
The postcode validation expression used in section 4.2.2 is from [32].

Signature Date

Acknowledgements

I would like to thank the Datalab for their funding and the opportunity to complete this degree. I'd like to thank my supervisors Chris Kelly and Andrea Bracialli for all their time and advice. I'd also like to thank my parents for all their support over the last two years.

Table of Contents

Abstract
Attestation
Acknowledgements
Table of Contents
List of Figures
1 Introduction
  1.1 Background and Context
  1.2 Scope and Objectives
  1.3 Achievements
  1.4 Overview of Dissertation
2 State-of-The-Art
  2.1 Theory
  2.2 Literature Review
3 Approach
  3.1 Project Methods
  3.2 Project Structure
  3.3 Process Model
  3.4 Underlying Technologies
4 Solution
  4.1 Campaigns
    4.1.1 Data Scraping
    4.1.2 Data Exploration
      4.1.2.1 Creating Success Metrics
      4.1.2.2 Campaign Outliers
      4.1.2.3 Correlations
    4.1.3 Data Modelling
  4.2 Users
    4.2.1 Data Exploration
    4.2.2 Data Preparation
    4.2.3 Sampling
    4.2.4 Data Modelling
5 Results
  5.1 Campaigns
  5.2 Users
6 Summary and Conclusion
  6.1 Summary
  6.2 Critical Assessment of Project
  6.3 Recommendations for Future Work
  6.4 Conclusion
References
Appendix 1 – Campaign Variables
Appendix 2 – User Variables
Appendix 3 – Larger Versions of Diagrams

List of Figures

Figure 1 Sailthru Email Response Example
Figure 2 CRISP-DM Data Mining Project Structure [18]
Figure 3 Email Data Store
Figure 4 Example Image from Email