Data Mining and Big Data Analytics (2 Credits)
Course: Data mining and Big Data analytics (2 credits)
Instructors: Fosca Giannotti and Dino Pedreschi

Learning goals

The course provides an introduction to data mining and knowledge discovery from data, i.e., the analytical process of extracting useful information from large raw data generated and collected by various ICTs. The key data mining methods of clustering, classification and pattern mining are illustrated, together with practical tools for their execution.

Next, we focus on Big Data, the digital traces of human activities at societal scale. These data provide a powerful social microscope that, together with social mining (the methods for discovering knowledge from these data), can help us understand and forecast many complex socio-economic phenomena. The key methods for Big Data sensing and acquisition are discussed, together with the basics of social media mining and sentiment analysis, mobility data mining, social network mining, retail market data mining, and sport data mining. We conclude with an introduction to big data visualization.

Syllabus

1. Data mining and the knowledge discovery process. Overview of data mining and machine learning techniques. Exemplar case studies in clustering, classification and pattern mining. 05.10.2015 (Dino & Fosca)

2. Data. Data types and formats. Exploratory data analysis and data understanding. Visual data exploration with KNIME. Data preparation. 06.10.2015 (Dino)

3. Clustering. Taxonomy of clustering concepts: distance-based (separation, centroids, contiguity), density-based, partitional vs. hierarchical. Methods for centroid-based clustering (k-means), hierarchical clustering (single, complete and average linkage), density-based clustering (DBSCAN). Practical clustering with KNIME. 07.10.2015 (Dino)

4. Classification and prediction models. Model learning and model validation. Explanation vs. prediction. Rule-based classifiers and decision trees. Naïve Bayes classifiers. Basic machine learning models (k-nearest neighbors, linear discriminant analysis, support vector machines, ensemble methods). Practical classification with KNIME. 20.10.2015 (Dino)

5. Pattern mining and association rules. Apriori principle. Mining high-frequency patterns and high-confidence rules. Interestingness measures for patterns and rules. Practical pattern mining with KNIME. 21.10.2015 (Dino)

6. Social network mining. Community concepts and community discovery methods (top-down and bottom-up). Validation of discovered communities. Exemplar application to the analysis of user engagement in the Skype network. Discovery of diffusion patterns over social networks. Exemplar case study on the music network Last.fm. 22.10.2015 (Dino)

7. Big data and social sensing. Big data acquisition. Web scraping, crawling, crowdsourcing, crowdsensing. Big data technologies and platforms, NoSQL and the MapReduce paradigm. 10.11.2015 (Fosca)

8. Social media mining. Listening to social media sources. Monitoring social trends. Basics of opinion mining and sentiment analysis. Exemplar social media mining projects. 11.11.2015 (Fosca)

9. Mobility data analytics. Big data proxies of human mobility. Basic measures of human mobility. Data-driven human mobility models. Mobility data mining with GPS tracking data. Analysis of traffic and city dynamics with vehicular telematics data. Analysis of personal vs. collective mobility. 12.11.2015 (Fosca)

10. Mobility data mining with mobile phone data. Analysis of traffic and city dynamics with GSM data. Systematic vs. occasional mobility. Demographic and socio-economic indicators based on GSM data. 24.11.2015 (Fosca)

11. Retail data mining and economic complexity. Mining the network of supermarket consumers and products. Exemplar application to marketing. Characterizing innovators (early adopters) of new products to predict success of innovations. Sport data mining: analysis of large football data to characterize team performance and predict success. 25.11.2015 (Dino)

12. Data visualization and visual analytics. Basics of visual representation of data: hierarchies, networks, maps, time series, spatio-temporal data, text. Exemplar case studies. 26.11.2015 (Fosca)
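
As an illustration of the Apriori principle named in topic 5 (an itemset can be frequent only if all of its subsets are frequent), the following is a minimal Python sketch of level-wise frequent-itemset mining. It is not course material: the course's practical sessions use KNIME, and the function name `frequent_itemsets`, the toy baskets, and the support threshold are all hypothetical choices made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions. Hypothetical helper, for illustration only."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step (Apriori principle): drop any candidate that has an
        # infrequent (k-1)-subset, before scanning the transactions.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Made-up toy baskets, not data from the course.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
itemsets = frequent_itemsets(baskets, min_support=3)
```

The prune step is where the principle pays off: candidates with an infrequent subset are discarded without being counted against the data, which keeps the search space manageable as itemsets grow.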