University of Cagliari Department of Mathematics and Computer Science Ph.D. Course in Computer Science Cycle XXXII
Ph.D. Thesis
Machine Learning Models for Educational Platforms
S.S.D. INF/01
Candidate Mirko Marras ORCID: 0000-0003-1989-6057
Supervisor: Prof. Gianni Fenu
Ph.D. Coordinator: Prof. Michele Marchesi
Final Examination Academic Year 2018/2019 Thesis Defence: February 2020 Session
Statement of Authorship
I declare that this thesis entitled “Machine Learning Models for Educational Platforms” and the work presented in it are my own. I confirm that:
• this work was done while in candidature for this PhD degree;
• when I consulted the work published by others, this is always clearly attributed;
• when I quoted the work of others, the source is always given;
• I have acknowledged all main sources of help;
• with the exception of the above references, this thesis is entirely my own work;
• appropriate ethics guidelines were followed to conduct this research;
• for work done jointly with others, my contribution is clearly specified.
Abstract
Scaling up education online and onlife presents numerous key challenges, such as hardly manageable classes, overwhelming content alternatives, and academic dishonesty during remote interaction. However, thanks to the wider availability of learning-related data and increasingly higher-performance computing, Artificial Intelligence has the potential to turn such challenges into an unparalleled opportunity. One of its sub-fields, namely Machine Learning, is enabling machines to receive data and learn for themselves, without being programmed with rules. Bringing this intelligent support to education at large scale has a number of advantages, such as avoiding manual error-prone tasks and reducing the chance of learner misconduct. Planning, collecting, developing, and predicting become essential steps to make it concrete in real-world education.

This thesis deals with the design, implementation, and evaluation of Machine Learning models in the context of online educational platforms deployed at large scale. Constructing and assessing the performance of intelligent models is a crucial step towards increasing the reliability and convenience of such an educational medium. The contributions result in large data sets and high-performing models that capitalize on Natural Language Processing, Human Behavior Mining, and Machine Perception. The model decisions aim to support stakeholders over the instructional pipeline, specifically on content categorization, content recommendation, learners’ identity verification, and learners’ sentiment analysis. Past research in this field often relied on statistical processes hardly applicable at large scale. Through our studies, we explore the opportunities and challenges introduced by Machine Learning for the above goals, a relevant and timely topic in the literature.

Supported by extensive experiments, our work reveals a clear opportunity in combining human and machine sensing for researchers interested in online education. Our findings illustrate the feasibility of designing and assessing Machine Learning models for categorization, recommendation, authentication, and sentiment prediction in this research area. Our results provide guidelines on model motivation, data collection, model design, and analysis techniques concerning the above applicative scenarios. Researchers can use our findings to improve data collection on educational platforms, to reduce bias in data and models, to increase model effectiveness, and to increase the reliability of their models, among others. We expect that this thesis can further support the adoption of Machine Learning models in educational platforms, strengthening the role of data as a precious asset. The thesis outputs are publicly available at https://www.mirkomarras.com.
Biography
Mirko Marras was born on April 19, 1992 in Iglesias (Italy). He is a Ph.D. Candidate in Computer Science at the Department of Mathematics and Computer Science of the University of Cagliari (Italy), advised by Prof. Gianni Fenu. He received the MSc Degree in Computer Science (cum laude, 18 months) from the same University in 2016. He received the Computer Science Engineering license from the University of Pisa (Italy) in 2016.
In 2015, he was a Research Intern at UnitelCagliari (Italy) for the "E-Learning Interactive Opportunities - ELIOS" project (MIUR, 1.2 M€). In 2017, he spent five months at EURECAT (Spain), collaborating with the Data Science and Big Data Analytics Unit on the "DEcentralized Citizen Owned Data Ecosystem - DECODE" project (EU, 5 M€), among others. In 2018, he spent three months within the Department of Informatics and Systems at the University of Las Palmas (Spain). In 2019, he spent two months at the Department of Computer Science and Engineering of the New York University Tandon School of Engineering (U.S.A.). Since 2018, he has been contributing to the "ILEARNTV, Anywhere, Anytime" project (EU-MIUR, 10 M€). Since 2017, he has been a teaching assistant for the “Computer Networks” course and a thesis assistant for several students in Computer Science.
In 2014, he was recognized as the Best Third-year Student of the BSc Degree in Computer Science of the University of Cagliari (Italy). In 2016, he was recognized as the Best MSc Student of the Faculty of Science and one of the Top 12 MSc Students of the same University. In 2018, he reached the podium of the Call for Visionary Ideas - Education competition organized by Nesta Italia. He has received the Best Poster Award at the European Semantic Web Conference 2017 (ESWC2017), the Demo Honorable Mention at The Web Conference 2018 (WWW2018), and the Best Paper Award at Didamatica 2018. He has been awarded two Erasmus+ PlaceDoc grants and one GlobusDoc grant to spend a total of 11 months abroad.
His research interests focus on machine learning for educational platforms in the context of knowledge-aware systems, recommender systems, biometric systems, and opinion mining systems. He has co-authored papers in top-tier international journals, such as Pattern Recognition Letters (Elsevier), Computers in Human Behavior (Elsevier), and IEEE Cloud Computing. He has given talks and demonstrations at several conferences and workshops, such as TheWebConf 2018, ECIR 2019, and INTERSPEECH 2019. He has been involved in the program committees of the main Technology-Enhanced Education conferences, such as AIED, EDM, ITICSE, ICALT, and UMAP. He has also been acting as a reviewer for Q1-level journals, such as IEEE Transactions on Big Data and IEEE Transactions on Image Processing. He has been part of the local organizing committee of SITIS 2018. He has been co-chairing the Bias 2020 workshop on algorithmic bias in search and recommendation (with education as a sub-topic) at ECIR 2020. He is a member of several associations, including CVPL, AIxIA, GRIN, IEEE, and ACM. For further information, please visit his personal website at http://mirkomarras.com.

Dissemination
The research that contributed both to the skills mastered for this Ph.D. thesis and to its content has resulted in 22 papers fully published in national and international journals and conference proceedings. I sincerely thank my co-authors for their precious contribution; as a sign of this gratitude, the scientific ‘we’ is adopted throughout the thesis. I first make my own contribution clear. I envisioned the research presented in this thesis and completed the majority of the work. I designed the approaches and chose the research directions. I collected the datasets and was responsible for the data analysis. Moreover, I was in charge of implementing the related scripts. Finally, I wrote the papers for submission, managed the peer-review pipeline, and subsequently revised them. I collaborated closely with the listed co-authors throughout all stages: they provided feedback on the approaches, offered technical support, discussed techniques, and contributed to the preparation of the submitted work. Furthermore, I was in charge of presenting 9 papers at conferences and workshops (highlighted in orange). Three papers have received an award or an honorable mention (highlighted in red). An exception is made for the papers numbered (i), (iii), (x), (xiv), (xvi), and (xx), to which Dr. Danilo Dessì and I contributed equally. Similarly, Dr. Silvio Barra and I put comparable effort into the papers numbered (xi) and (xii). The detailed references to the produced papers are provided below.

Peer-reviewed Book Chapters

i. Dessì, D., Dragoni, M., Fenu, G., Marras, M., & Reforgiato, D. (2019). Deep Learning Adaptation with Word Embeddings for Sentiment Analysis on Online Course Reviews. In: Deep Learning-Based Approaches for Sentiment Analysis, 57-83, Springer. https://doi.org/10.1007/978-981-15-1216-2_3

ii. Marras, M., Marin-Reyes, P., Lorenzo-Navarro, J., Castrillon-Santana, M., & Fenu, G. (2019). Deep Multi-Biometric Fusion for Audio-Visual User Re-Identification and Verification. In: Revised Selected Papers of the International Conference on Pattern Recognition Applications and Methods ICPRAM 2019, 11996, 136-157, Springer. https://doi.org/10.1007/978-3-030-40014-9_7
Peer-reviewed Publications in Journals

iii. Dessì, D., Fenu, G., Marras, M., & Reforgiato, D. (2019). Bridging Learning Analytics and Cognitive Computing for Big Data Classification in Micro-Learning Video Collections. In: Computers in Human Behavior, 92, 468-477, Elsevier. https://doi.org/10.1016/j.chb.2018.03.004

iv. Fenu, G., & Marras, M. (2018). Controlling User Access to Cloud-Connected Mobile Applications by Means of Biometrics. In: IEEE Cloud Computing, 5(4), 47-57, IEEE. https://doi.org/10.1109/MCC.2018.043221014

v. Fenu, G., Marras, M., & Boratto, L. (2018). A Multi-Biometric System for Continuous Student Authentication in E-Learning Platforms. In: Pattern Recognition Letters, 113, 83-92, Elsevier. https://doi.org/10.1016/j.patrec.2017.03.027

vi. Fenu, G., Marras, M., & Meles, M. (2017). Learning Analytics Tool for Usability Assessment in Moodle Environments. In: Journal of e-Learning and Knowledge Society, 13(3), Italian E-Learning Association. https://www.learntechlib.org/p/180986/

Peer-reviewed Publications in International Conference Proceedings

vii. Marras, M., Korus, P., Memon, N., & Fenu, G. (2019). Adversarial Optimization for Dictionary Attacks on Speaker Verification. In: Proceedings of the 20th International Conference of the International Speech Communication Association (INTERSPEECH2019), 2913-2917, ISCA. Talk. http://doi.org/10.21437/Interspeech.2019-2430

viii. Marras, M., Marin-Reyes, P., Lorenzo-Navarro, J., Castrillon-Santana, M., & Fenu, G. (2019). AveRobot: An Audio-visual Dataset for People Re-identification and Verification in Human-Robot Interaction. In: Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM2019), 255-265, SciTePress. http://doi.org/10.5220/0007690902550265

ix. Boratto, L., Fenu, G., & Marras, M. (2019). The Effect of Algorithmic Bias on Recommender Systems for Massive Open Online Courses. In: Proceedings of the European Conference on Information Retrieval (ECIR2019), 457-472, Springer. Talk. https://doi.org/10.1007/978-3-030-15712-8_30

x. Dessì, D., Dragoni, M., Fenu, G., Marras, M., & Reforgiato, D. (2019). Evaluating Neural Word Embeddings Created from Online Course Reviews for Sentiment Analysis. In: Proceedings of the ACM/SIGAPP Symposium On Applied Computing (SAC2019), 2124-2127, ACM. http://doi.org/10.1145/3297280.3297620

xi. Barra, S., Marras, M., & Fenu, G. (2018). Continuous Authentication on Smartphone by Means of Periocular and Virtual Keystroke. In: Proceedings of the International Conference on Network and System Security 2018 (NSS2018), 212-220, Springer. https://doi.org/10.1007/978-3-030-02744-5_16
xii. Abate, A. F., Barra, S., Casanova, A., Fenu, G., & Marras, M. (2018). Iris Quality Assessment: A Statistical Approach for Biometric Security Applications. In: Proceedings of the 10th International Symposium on Cyberspace Safety and Security 2018 (CSS2018), 270-278, Springer. https://doi.org/10.1007/978-3-030-01689-0_21

xiii. Marras, M., Manca, M., Boratto, L., Fenu, G., & Laniado, D. (2018). BarcelonaNow: Empowering Citizens with Interactive Dashboards for Urban Data Exploration. In: Companion of The Web Conference 2018 (WWW2018), 219-222, ACM. Talk. Demo Honorable Mention. https://doi.org/10.1145/3184558.3186983

xiv. Dessì, D., Fenu, G., Marras, M., & Reforgiato, D. (2018). COCO: Semantic-Enriched Collection of Online Courses at Scale with Experimental Use Cases. In: Proceedings of the World Conference on Information Systems and Technologies 2018 (WORLDCIST2018), 1386-1396, Springer. Talk. https://doi.org/10.1007/978-3-319-77712-2_133

xv. Fenu, G., & Marras, M. (2017). Leveraging Continuous Multi-Modal Authentication for Access Control in Mobile Cloud Environments. In: Proceedings of the International Conference on Image Analysis and Processing 2017 (ICIAP2017), 331-342, Springer. Talk. https://doi.org/10.1007/978-3-319-70742-6_31

xvi. Dessì, D., Fenu, G., Marras, M., & Reforgiato, D. (2017). Leveraging Cognitive Computing for Multi-Class Classification of E-Learning Videos. In: Proceedings of the European Semantic Web Conference 2017 (ESWC2017), 21-25, Springer. Best Poster Award. https://doi.org/10.1007/978-3-319-70407-4_5

Peer-reviewed Publications in National Conference Proceedings

xvii. Fenu, G., & Marras, M. (2019). Servizi Intelligenti per l’Elaborazione di Dati Multi-Biometrici in Piattaforme di Apprendimento. In: Proceedings of "DIDAttica e inforMATICA" Conference (DIDAMATICA2019). Talk. https://www.aicanet.it/documents/10776/2659822/Informatica_per_la_didattica_2019_pa_17.pdf

xviii. Fenu, G., & Marras, M. (2019). Analisi e Mitigazione del Pregiudizio Algoritmico nei Sistemi di Raccomandazione Didattici. In: Proceedings of the Italian CINI Conference on Artificial Intelligence (Ital-IA2019). Talk. http://www.ital-ia.it/submission/245/paper

xix. Barra, S., Fenu, G., & Marras, M. (2019). Sistemi Multi-Biometrici per l’Autenticazione Continua e Trasparente per l’Accesso a Servizi Erogazione Contenuti. In: Proceedings of the Italian CINI Conference on Artificial Intelligence (Ital-IA2019). Talk. http://www.ital-ia.it/submission/248/paper

xx. Dessì, D., Dragoni, M., Fenu, G., Marras, M., & Reforgiato, D. (2018). Analisi Emozionale di Recensioni di Corsi Online basata su Deep Learning e Word Embedding. In: Proceedings of the Italian GARR Conference (GARR2018), 87-91. http://doi.org/10.26314/GARR-Conf18-proceedings-16
xxi. Marras, M., Barra, S., Zucchetti, D., & Chesi, F. (2018). Abylia: Un Ambiente Digitale per la Scuola in Trasformazione. In: Proceedings of the Italian GARR Conference (GARR2018), 51-55. http://doi.org/10.26314/GARR-Conf18-proceedings-09

xxii. Fenu, G., Marras, M., Barra, S., Giorgini, F., Zucchetti, D., & Chesi, F. (2018). ILEARNTV: Ecosistema di Conoscenza Condivisa e Produzione Collaborativa. In: Proceedings of "DIDAttica e inforMATICA" Conference (DIDAMATICA2018). Talk. Best Paper Award. http://www.aicanet.it/documents/10776/2101882/didamatica2018_paper_49.pdf/77f26668-d7e8-4355-b747-451035fdec06

Acknowledgements
This thesis has been enriched by the professional and affective support of a range of people who generously shared their precious competence. First of all, I would like to express my gratitude to my advisor, Prof. Gianni Fenu, for believing in me from the moment I asked him to assign me a bachelor thesis, and for being my role model and mentor. I am thankful to Prof. Diego Reforgiato Recupero, and I will treasure his approach and practicality. For their intense collaboration and friendship, thank you to Dr. Silvio Barra and Dr. Danilo Dessì, with whom I spent a great time during our joint research. Heartfelt thanks to Dr. Ludovico Boratto together with Dr. David Laniado, Dr. Matteo Manca, and Dr. Pablo Aragon from EURECAT. They supported me, guided me, and shared with me exciting moments of research life. Thank you to Prof. Modesto Castrillón-Santana and Prof. Javier Lorenzo-Navarro for hosting me in the Department of Informatics and Systems at the University of Las Palmas. A particular acknowledgment goes to Dr. Pedro Marín-Reyes, as we shared knowledge, skills, and fun in Las Palmas. I am extremely thankful to Prof. Nasir Memon and Dr. Pawel Korus from New York University for guiding me with expertise during such an amazing research period. Thank you to my parents, Ferdinando and Lorella, and my girlfriend Mariarosa. Thank you to everyone who was part of my Ph.D. experience and is not explicitly mentioned here, including relatives, friends, and academics.

Furthermore, I gratefully acknowledge the financial support I have received. Firstly, I thank the Sardinia Regional Government for the financial support of my Ph.D. scholarship (P.O.R. Sardegna F.S.E. Operational Programme of the Autonomous Region of Sardinia, European Social Fund 2014-2020 - Axis III Education and Training, Thematic Goal 10, Priority of Investment 10ii, Specific Goal 10.5). My research has been additionally and substantially supported by the Italian Ministry of Education, University and Research (MIUR) under the "iLearnTV, Anytime, Anywhere" Project (DD n.1937 5.6.2014, CUP F74G14000200008 F19G14000910008). I am also thankful to the University of Cagliari and the ECIR Community for providing me with travel grants to conduct research abroad and attend international conferences, respectively. They made it possible to take part in exciting missions that greatly enriched me both professionally and personally. Thank you all. Grazie a tutti.
Nomenclature
Abbreviations
AI Artificial Intelligence
AL Attention Layer
API Application Programming Interface
CNN Convolutional Neural Network
CSV Comma Separated Value
CV Computer Vision
DL Deep Learning
DT Decision Trees
EER Equal Error Rate
FAR False Acceptance Rate
FRR False Rejection Rate
FNN Feed-forward Neural Network
GPU Graphics Processing Unit
HRI Human-Robot Interaction
LMS Learning Management System
LSTM Long Short-Term Memory
MOOC Massive Open Online Course
MAE Mean Absolute Error
MIUR Italian Ministry of Education, University and Research
ML Machine Learning
MP Machine Perception
MSE Mean Squared Error
MV Master Voice
NB Naive Bayes
nDCG normalized Discounted Cumulative Gain
NLP Natural Language Processing
PoI Person of Interest
RF Random Forest
RNN Recurrent Neural Network
ROC Receiver Operating Characteristic
SA Sentiment Analysis
SGD Stochastic Gradient Descent
SVM Support Vector Machine
TF-IDF Term Frequency-Inverse Document Frequency
t-SNE t-distributed Stochastic Neighbor Embedding
Latin Expressions

i.e. id est, ‘that is’
e.g. exempli gratia, ‘for the sake of example’
ergo ‘therefore’
Numerical Expressions
{n}K {n} thousand
{n}M {n} million

Contents
Statement of Authorship
Abstract
Biography
Dissemination
Acknowledgments
Nomenclature
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Challenges
  1.3 Contributions
  1.4 Outline

2 Machine Learning Fundamentals
  2.1 Historical Background
  2.2 Problem Design Branches
  2.3 Experimental Workflow

3 Machine Learning Models for Content Categorization
  3.1 Introduction
  3.2 Related Work
    3.2.1 Content Categorization Techniques
    3.2.2 Content Categorization in Education
  3.3 Problem Formalization
  3.4 The Proposed ML-Videos Data Set
    3.4.1 Collection Methodology
    3.4.2 Structure and Statistics
  3.5 The Proposed Content Categorization Approach
    3.5.1 Methodology
    3.5.2 Evaluation
  3.6 Findings and Recommendations

4 Machine Learning Models for Content Recommendation
  4.1 Introduction
  4.2 Related Work
    4.2.1 Recommendation Techniques
    4.2.2 Recommendation in Education
    4.2.3 Biases in Recommendation
  4.3 Problem Formalization
    4.3.1 Recommender System Formalization
    4.3.2 Effectiveness Metrics and the Proposed Bias-related Metrics
  4.4 The Proposed COCO-RS Data Set
    4.4.1 Collection Methodology
    4.4.2 Structure and Statistics
  4.5 The Proposed Method for Popularity Debiasing
    4.5.1 Exploratory Analysis on Traditional Recommenders
    4.5.2 Exploratory Analysis on Neural Recommenders
    4.5.3 Methodology for Popularity Bias Mitigation
    4.5.4 Evaluation
  4.6 The Proposed Method for Educators’ Fairness
    4.6.1 Exploratory Analysis on Neural Recommenders
    4.6.2 Methodology for Educators’ Unfairness Mitigation
    4.6.3 Evaluation
  4.7 Findings and Recommendations

5 Machine Learning Models for Identity Verification
  5.1 Introduction
  5.2 Related Work
    5.2.1 Traditional Biometric Verification Techniques
    5.2.2 Deep Biometric Verification Techniques
    5.2.3 Biometrics in Education
  5.3 Problem Formalization
  5.4 The Investigated Data Sets
    5.4.1 The Proposed AveRobot Data Set for Onlife Scenarios
    5.4.2 Representative Existing Data Sets for Online Scenarios
  5.5 The Proposed Face-Touch Verification Approach
    5.5.1 Methodology
    5.5.2 Evaluation
  5.6 The Proposed Face-Voice Verification Approach
    5.6.1 Methodology
    5.6.2 Evaluation
  5.7 The Proposed Attack on Uni-modal Verification
    5.7.1 Methodology
    5.7.2 Evaluation
  5.8 Findings and Recommendations

6 Machine Learning Models for Sentiment Analysis
  6.1 Introduction
  6.2 Related Work
    6.2.1 Sentiment Analysis Techniques
    6.2.2 Sentiment Analysis in Education
  6.3 Problem Formalization
  6.4 The Proposed COCO-SA Data Set
    6.4.1 Collection Methodology
    6.4.2 Structure and Statistics
  6.5 The Proposed Sentiment Prediction Approach
    6.5.1 Methodology
    6.5.2 Evaluation
  6.6 Findings and Recommendations

7 Conclusions
  7.1 Contribution Summary
  7.2 Take-home Messages
  7.3 Future Research Directions

References
List of Figures
1.1 Educational context with a platform empowered with the thesis contributions.

2.1 Hierarchy in artificial intelligence, machine learning, and deep learning fields.
2.2 The differences between symbolic AI and machine-learning programming.
2.3 Common classes of problem encountered in machine learning.
2.4 The common pipeline to face a machine-learning problem.
2.5 The common components of a shallow-learning model.
2.6 The common components of a deep-learning model.
2.7 Common ML evaluation protocols: held-out (top) and k-fold (bottom).

3.1 Collection pipeline for the ML-Videos data set.
3.2 Structure of the ML-Videos data set.
3.3 First-level category distribution in ML-Videos.
3.4 Second-level category distribution in ML-Videos.
3.5 The proposed micro-learning video classification approach.
3.6 Example of features extracted by IBM Watson.

4.1 Collection pipeline for the COCO-RS data set.
4.2 Structure of the COCO-RS data set.
4.3 Ratings/year on COCO-RS.
4.4 Ratings/category on COCO-RS.
4.5 Rating distribution on COCO-RS.
4.6 Popularity tail on COCO-RS.
4.7 The average overlap per user between the top-10 lists.
4.8 The distribution of the recommended courses over the catalog.
4.9 The distribution of the number of recommendations.
4.10 The distribution of the recommended courses over course categories.
4.11 The reinforcement produced by algorithms over course categories.
4.12 The neural personalized ranking model explored in our analysis.
4.13 Recommendation accuracy for the explored recommenders.
4.14 Popularity bias for the explored recommenders.
4.15 Item catalog metrics for the explored recommenders.
4.16 T-SNE embedding representation for NPR.
4.17 Relevance score distributions for NPR.
4.18 Pair-wise accuracy for NPR.
4.19 Recommendation accuracy under our different treatment configurations.
4.20 Popularity bias metrics under our different treatment configurations.
4.21 Trade-off under our different treatment configurations.
4.22 NPR+SAM+REG recommendation accuracy over α.
4.23 NPR+SAM+REG popularity bias over α.
4.24 Relevance score distributions for NPR+SAM+REG.
4.25 Pair-wise accuracy for NPR+SAM+REG.
4.26 Recommendation accuracy against baselines.
4.27 Popularity bias metrics against baselines.
4.28 Trade-off against baselines.
4.29 Women representation and fairness in item/ratings over synthetic data.
4.30 Women representation and fairness in ratings/exposure over synthetic data.
4.31 Women representation and fairness in item/exposure over synthetic data.
4.32 Women model fairness index over synthetic data.
4.33 Women item-exposure fairness after mitigation at I level on synthetic data.
4.34 Women item-exposure fairness for mitigation at I+II level on synthetic data.
4.35 Trade-off between effectiveness and fairness on ML-1M.
4.36 Women item-exposure fairness over ML-1M after treatment.
4.37 Effectiveness achieved by models over ML-1M.
4.38 Coverage of the models over ML-1M.
4.39 Trade-off between effectiveness and fairness on COCO-RS.
4.40 Women item-exposure fairness over COCO-RS after treatment.
4.41 Effectiveness achieved by models over COCO-RS.
4.42 Coverage of the models over COCO-RS.

5.1 Samples from the AveRobot dataset.
5.2 Sample statistics from the AveRobot dataset.
5.3 The proposed face-touch verification approach.
5.4 The proposed neural architecture for intermediate multi-biometric fusion.
5.5 EER verification results on VoxCeleb1-Test.
5.6 EER verification results on MOBIO.
5.7 EER verification results on AveRobot.
5.8 Exploratory menagerie analysis.
5.9 The proposed approach for generating a master voice.
5.10 The core of the Gradient Computation module.
5.11 The distribution of impersonation rates for the best-performing settings.

6.1 Collection pipeline for the COCO-SA data set.
6.2 Structure of the COCO-SA data set.
6.3 Reviews/year on COCO-SA.
6.4 Reviews/category on COCO-SA.
6.5 Polarity distribution on COCO-SA.
6.6 The proposed approach for sentiment prediction.
6.7 The proposed deep learning architecture for sentiment prediction.
6.8 Mean absolute error on sentiment regression.
6.9 Mean squared error on sentiment regression.
6.10 Mean Absolute Error (MAE) for contextual word embeddings.
6.11 Mean Squared Error (MSE) for contextual word embeddings.
List of Tables
3.1 Representative data sets for video classification based on transcripts.
3.2 Basic statistics about the considered feature sets.
3.3 Computational time for completing both training and test phases.
3.4 Effectiveness on video classification over first-level categories.
3.5 Effectiveness on video classification over second-level categories.

4.1 Representative data sets for recommendation.
4.2 The accuracy of the algorithms on rating prediction and top-10 ranking.
4.3 The popularity of the recommended items.
4.4 The catalog coverage per algorithm out of 30,399 courses.

5.1 Representative data sets for verification in Human-Robot Interaction.
5.2 The specifications of the recording devices in AveRobot.
5.3 The EER obtained by the proposed face-touch verification approach.
5.4 Average impersonation rates for seed and master voices.

6.1 Representative data sets for textual sentiment analysis.
Chapter 1 Introduction
1.1 Motivation
In a prosperous digital economy, the development of a successful career lies with the individual’s ability to continuously acquire knowledge, gain competencies, and obtain qualifications. The demand for skilled professionals is fostering job creation and competition amongst companies interested in securing the best candidates [1]. In such a scenario, Online Education plays a crucial role as an ecosystem wherein people (e.g., learners, teachers, tutors), content (e.g., videos, slides), technology (e.g., platforms, devices, tools), culture (e.g., community sharing), and strategy (e.g., business models, learning goals) interact to instill knowledge in life-long learners, without time or place constraints [2].

Educational and training providers are being encouraged to host online learning by its technical, economic, and operational feasibility. Similarly, end users are taking advantage of the flexibility, accessibility, and costs of learning and teaching online. This win-win situation has led to a proliferation of online initiatives [3]. For instance, in 2019, more than 40 million students and educators have relied on Google Classroom [4], Coursera [5] has hosted more than 33 million users over 3,600 courses, and Udemy [6] has supported 30 million students and 43 million educators within 100,000 courses. Scaling up online education towards these numbers is posing key challenges, such as hardly-manageable classes, overwhelming content alternatives, and academic dishonesty, that often distract students and educators from their main goals: learning and teaching [7].

Capitalizing on large amounts of data and high-performance computing, the rapid evolution of Artificial Intelligence (AI) and its Machine Learning (ML) sub-field is turning the above challenges into an unparalleled opportunity for automated intelligent support. For instance, computerized content tagging could support teachers on this error-prone task, AI can provide personalized content by analyzing what is available online, and biometrics could represent a reliable way to ensure academic integrity. Such circumstances have generated interest among researchers, educators, policy makers, and businesses [8]. Evidence of this attention has originated from the investments made by public and private entities [9]. Moreover, forecasts have announced a growth in the AI-based education market from € 2.6 billion in 2018 to € 7.1 billion by 2023 [10]. Planning, collecting, developing, and predicting are being increasingly recognized as essential steps to make AI concrete in real-world education, both online and onlife [11].
1.2 Challenges
Recently, there has been an explosion in research laying out new machine learning models and calling attention to applicative issues [12]. This research has greatly expanded our understanding of such a technology, but there has been less work on how it applies to online education at scale. While the state of the art poses several critical concerns [13], this thesis focuses on research around the following key challenges:

• The learning-related data made publicly available to the research community is currently insufficient to reasonably build and validate machine and deep learning models, which is needed in the design of meaningful educational interventions.
• The huge number of learning resources makes their manual categorization impracticable, while current automated systems based on term frequencies fail to capture semantics. This results in low support to learners and teachers while managing material.
• While current practices in providing automated personalized online education have proved to be accurate, they are not designed with bias robustness in mind; thus, they can introduce bias that might affect millions of users in large-scale education.
• Existing automated systems for checking learners’ identities tend to be considered intrusive due to the explicit actions asked of users, susceptible to impersonation as they are mostly based on a single trait, and/or expensive as they often require additional hardware.
• The huge amount of feedback generated by learners after attending courses is hardly manageable by humans and by automated models based on plain statistical approaches; consequently, a large part of the insights behind this feedback remains uncovered.

As AI-based education will play a relevant role in our everyday life, it becomes imperative to face such critical challenges and provide appropriate support for solving them.
1.3 Contributions
In this thesis, the design, development, and evaluation of machine learning models for education-related services are deeply investigated with the aim of empowering learning environments with intelligent algorithms. The focus is on how machines can understand human language and behaviour in learning-related data through Natural Language Processing (NLP), Computer Vision (CV), and Machine Perception (MP). The proposed contributions consist of targeted data sets and models that support learners and educators over the instructional pipeline, specifically on categorization, recommendation, learners’ authentication, and learners’ sentiment analysis (Fig. 1.1). The design and validation of such new models have been guided by thoughtful, continuous feedback from teachers and learners under the "iLearnTV, Anywhere, Anytime" research project [14].

Fig. 1.1: Educational context with a platform empowered with the thesis contributions.

Going more into detail, we provide four data sets that expand existing educational data sets in terms of scale, completeness, and comprehensiveness. To the best of our knowledge, these public collections are among the first ones that allow validating machine and deep learning techniques for educational platforms. On top of them, this thesis introduces novel approaches to assess the contributions of machine learning technology to educational environments. First, differently from existing state-of-the-art baselines, the proposed categorization models mapping micro-learning videos to pre-existing categories exploit semantics rather than term frequency, making more accurate predictions. Second, instead of being optimized only for accuracy and suffering from biased decisions, the proposed recommendation models mitigate the effects of biases in data while still being effective. Third, in contrast to other competitive approaches, the proposed multi-biometric learners’ verification models target ubiquitous scenarios and cost-effective transparent recognition. Finally, the proposed sentiment prediction models differ from other opinion mining models as the former feed education-specific text representations into a more effective deep neural network tailored to such an input.

Overall, this thesis provides guidelines, insights, limitations, and future directions to researchers studying machine learning for education. They can apply our findings to improve reliability, to increase effectiveness, and to foster user acceptance. This closes the circle on the thesis goal: improving the understanding of machine learning models for large-scale education, strengthening the role of data as a precious scientific asset.
1.4 Outline
The remainder of this thesis is organised as follows. Chapter 2 provides a brief introduction to the most representative machine-learning concepts underlying this thesis.

Chapter 3 illustrates our categorization models that map educational videos to pre-existing categories. They integrate speech-to-text methods to get video transcripts, natural language processing techniques to extract semantic concepts and keywords from video transcripts, and big data technologies for scalability. This work has been partially studied jointly with Dr. Danilo Dessì and Prof. Diego Reforgiato from the University of Cagliari (Italy), and published in the "European Semantic Web Conference" (ESWC) conference proceedings [15] and the "Computers in Human Behavior" (CiHB) journal [16].

Chapter 4 proposes educational recommendation models that mitigate biases existing in data while still being effective. This work has been partially studied jointly with Dr. Ludovico Boratto from EURECAT (Spain), and published in the "European Conference on Information Retrieval" (ECIR) [17] and the World Conference on Information Systems and Technologies (WorldCist) conference proceedings [18]. Two extended papers regarding this topic have also been submitted for publication in top-tier journals.

Chapter 5 analyzes existing identity verification approaches in education, and depicts a new multi-biometric framework. Several authentication subsystems including behavioral and physical biometrics have been proposed on top of that. Moreover, to raise awareness on multi-biometrics, we used voice verification as a use case to prove a new attack that puts at risk otherwise-developed uni-biometrics. The overall multi-biometric framework has been described in a paper published in the "Pattern Recognition Letters" journal [19]. The work on touch and face biometrics has been partially studied jointly with Dr. Silvio Barra from the University of Cagliari (Italy), and primarily published in the "International Conference on Image Analysis and Processing" (ICIAP) [20] and the "International Conference on Network and System Security" (NSS) [21] conference proceedings. Extended works have been published in the "IEEE Cloud Computing" (ICC) journal [22]. The work on face and voice biometrics has been partially studied jointly with Dr. Pedro Marin-Reyes, Prof. Modesto Castrillon-Santana, and Prof. Javier Lorenzo-Navarro from the University of Las Palmas (Spain), and published in the "International Conference on Pattern Recognition Applications and Methods" (ICPRAM) conference proceedings [23]. The extended version has been published in the revised book prepared on top of the same conference [24]. Finally, the work on biometric attacks has been partially studied jointly with Dr. Pawel Korus and Prof. Nasir Memon from New York University (U.S.A.), and published in the "International Speech Communication Association" (INTERSPEECH) conference proceedings [25].

Chapter 6 proposes our sentiment prediction models that mine learners’ opinions, starting from sparse context-specific text representations. This work has been partially studied jointly with Dr. Danilo Dessì and Prof. Diego Reforgiato Recupero from the University of Cagliari (Italy), and Dr. Mauro Dragoni from Fondazione "Bruno Kessler" (Italy), and published in the "ACM/SIGAPP Symposium on Applied Computing" (SAC) conference proceedings [26] and the "Deep-learning Approaches for Sentiment Analysis" book [27].

Finally, Chapter 7 offers concluding remarks on the implications of our research and provides several opportunities for future work in this field. For the sake of reproducibility, the code, data, models, demos, and posters accompanying this thesis are made publicly available at https://www.mirkomarras.com.
Chapter 2 Machine Learning Fundamentals
This chapter provides essential context around artificial intelligence, machine learning, and deep learning concepts leveraged by this thesis.
2.1 Historical Background
For a long time, researchers have investigated how to instruct computers in order to reduce the effort of intellectual tasks performed by humans [28]. This field, namely Artificial Intelligence (AI), includes Machine Learning (ML) and Deep Learning (DL), in addition to more traditional methods that often relied on hard-coded rules (Fig. 2.1).

Originally, people working in AI believed that automating human tasks could be achieved by providing computers with a range of rules describing conditions and commands of action, namely symbolic AI [29]. Humans used to input rules and data to be processed according to these rules, and get answers. However, symbolic AI turned out to be too inflexible to face more intelligent tasks, such as object recognition, content categorization, and machine translation. This opened the way to more sophisticated approaches.
Fig. 2.1: Hierarchy in artificial intelligence, machine learning, and deep learning fields.
Fig. 2.2: The differences between symbolic AI and machine-learning programming.
In response to this limitation, researchers investigated how a machine can learn patterns on its own, starting from data relevant to the targeted task [30]. In this new paradigm, namely machine learning, humans input data together with the answers expected from that data, and the machine learns patterns that can be applied to data unseen so far (Fig. 2.2). Examples relevant to the task are fed into the machine-learning system, which learns patterns from such data, making it possible to obtain complex patterns for solving the task. For instance, a machine-learning system for object classification is fed with human-labelled images from which a set of rules for associating pictures to object labels is learned.

Deep learning is a ML sub-field wherein patterns are learned from data through consecutive manipulations by means of a sequence of stacked layers [31]. It differs from traditional machine learning, namely shallow learning, which learns only one or two layers of data representations. The transformation implemented by a deep neural layer is parameterized by its weights. Hence, learning means optimizing the weights of all layers, such that the network correctly maps inputs to expected targets. Given the predicted and true targets, the system computes a score through a loss function that captures how well the network maps the current samples. The score is then used by the optimizer that, through a back-propagation algorithm, adjusts the weight values so that the loss score will be lower at the next iteration. Repeating this loop a sufficient number of times makes it possible to learn weight values that minimize the loss, obtaining a trained model; a minimal sketch of this loop is given below.
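To make the loop just described concrete, the following hypothetical NumPy sketch fits a single linear layer with a mean-squared-error loss and plain gradient descent. It is not part of the thesis pipeline; all variable names and values are purely illustrative.

```python
import numpy as np

# Hypothetical example: learn the weights of one linear layer by repeatedly
# computing the loss, its gradient, and a gradient-descent update step.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3))                       # 100 samples, 3 input features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)    # noisy targets

w = np.zeros(3)                                     # weights to be learned
learning_rate = 0.1

for _ in range(200):
    predictions = X @ w                             # forward pass
    errors = predictions - y
    loss = np.mean(errors ** 2)                     # loss score for the current weights
    gradient = 2 * X.T @ errors / len(y)            # gradient of the loss w.r.t. the weights
    w -= learning_rate * gradient                   # update step lowering the loss

print("learned weights:", w)                        # should approach true_w
```

Deep learning repeats the same loop, but the gradient is obtained by back-propagating the loss through all the stacked layers rather than a single one.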
2.2 Problem Design Branches
Several approaches can be suitable for defining the problems faced by a machine-learning system (Fig. 2.3). They can be divided into four broad categories [32], namely:
• Human-Supervised learning is based on learning how to map data to known annotations (i.e., labels). Most applications, such as object recognition, speaker verification, sentiment prediction, and language understanding, fall into this category.
• Self-Supervised learning implies supervised learning with labels generated from the input data, typically using a heuristic algorithm. For instance, auto-encoders, where the inputs are also the generated targets, take full advantage of self-supervised learning.
Fig. 2.3: Common classes of problem encountered in machine learning.
• Unsupervised learning aims to find interesting transformations of the input data without knowing the targets, or knowing only a subset of them. Sample applications are survival analysis, data denoising, dimensionality reduction, and clustering.
• Reinforcement learning is based on an agent that receives information about its envi- ronment and learns to choose actions that will maximize some reward. For instance, a neural network that outputs game actions to maximize its score can leverage it.
Throughout this thesis, we mostly focus on supervised learning problems. Therefore, the subsequent sections provide information tailored to this type of problem.
2.3 Experimental Workflow
Common pipelines for solving a machine-learning problem include problem definition, data pre-processing, model development, and model evaluation (Fig. 2.4).
Problem Definition
This step serves to define the type of problem (e.g., binary classification, multi-class classification, multi-label classification, scalar regression). Identifying the problem type guides the choices made at the next steps, such as the model architecture, the loss function, and so on. In addition, inputs and outputs need to be defined and, based on that design choice, proper training data should be retrieved. For instance, learning to classify the sentiment of reviews implies having both reviews and sentiment annotations.
The hypothesis is usually that the outputs can be predicted given the inputs; hence, the retrieved data should be sufficiently informative to learn the relationship between inputs and outputs. Therefore, ML can only be used to capture patterns present in the training data and recognize what has been seen there.
Fig. 2.4: The common pipeline to face a machine-learning problem.
Data Pre-Processing
This step aims to make data more manageable by algorithms through vectorization, normalization, missing-value handling, and/or feature extraction (a minimal sketch follows this list):
• Vectorization. Inputs and outputs should be numerical vectors, irrespective of the data type (e.g., images, text). Turning data into vectors is called vectorization. For instance, text can be represented as a list of integers standing for sequences of words. This step is not needed when data is already in numerical form.
• Normalization. Feeding a ML model with data that takes large values or is heterogeneous should be avoided, as it can prevent the model from converging. To make learning easier, data should take values in the [0,1] range. For instance, image data encoded as integers in the range [0,255] is usually cast to float and divided by 255, so that the values fall in the [0,1] range. Similarly, when predicting users’ identities, each feature could be normalized to have a standard deviation of 1 and a mean of 0.
• Missing Values Handling. If missing values can occur when predicting through a ML model, it is generally good practice to simulate such a situation during model training as well. To this end, while training, missing values encoded as 0 could be introduced by copying some training samples and dropping some features that may become missing while predicting. In this way, the model can learn that the value 0 means missing data and starts ignoring that value.
• Feature Extraction. Using human knowledge about the data and about the ML algorithm can make the algorithm work better. Feature extraction is usually adopted by shallow algorithms whose hypothesis spaces are not rich enough to learn useful features by themselves. For instance, in speaker verification, the inputs for neural networks are typically based on pre-processed data, such as spectrograms and filterbanks extracted from the raw audio. Modern DL is making it possible to feed raw data and let neural networks extract useful patterns from it.
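A minimal sketch of the first two operations, using scikit-learn and NumPy on made-up data (the reviews, image values, and feature values below are purely illustrative and not taken from the thesis data sets), could look as follows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import CountVectorizer

# Vectorization: turn raw text into integer count vectors.
reviews = ["great course and clear videos", "too long and boring videos"]
text_vectors = CountVectorizer().fit_transform(reviews).toarray()

# Normalization: image-like data in [0, 255] rescaled to the [0, 1] range.
images = np.array([[0, 128, 255], [64, 32, 16]], dtype=np.float32)
images /= 255.0

# Normalization: zero mean and unit standard deviation per feature.
features = np.array([[180.0, 75.0], [165.0, 60.0], [172.0, 68.0]])
features_scaled = StandardScaler().fit_transform(features)

print(text_vectors.shape, images.min(), images.max(), features_scaled.mean(axis=0))
```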
Model Development
The goal at this step is to develop a ML model able to solve the original task [33]. Three key choices should be considered when building the model: (i) a model architecture that can learn meaningful data representations, (ii) a differentiable optimization (loss) function that matches the type of problem, and (iii) an optimization configuration that supports the model in minimizing the objective function.

Fig. 2.5: The common components of a shallow-learning model.

Fig. 2.6: The common components of a deep-learning model.

Some shallow-learning models, structured as depicted in Fig. 2.5, are described below.
• Decision Trees (DTs) [34] predict the value of a target variable by learning decision rules from input features. The model has a root node containing all data features of the training set. The root node is then split into several children according to a given criterion. This process continues recursively on the children until no nodes to be split remain.
• Support Vector Machines (SVMs) [35] map each training sample to a point in an N-dimensional space, where N is the number of features and the value of each feature is the value of a particular coordinate. The model then finds the set of hyper-planes that best differentiate the points based on the targets. A linear combination of vectors determines the location of the decision boundaries producing the best separation.
• Random Forest (RF) [36] is a meta estimator that (i) fits a number of decision tree classifiers on various random data sub-samples and (ii) uses averaging to improve the predictive accuracy and to control over-fitting. Each decision tree is a weak classifier, while all the decision trees combined aim to form a stronger classifier.

Some deep-learning models, structured as depicted in Fig. 2.6, are described below [37].
• Feed-forward Neural Networks (FNNs) were among the first components applied to learn from data using DL [38, 39]. One or more levels of nodes, namely perceptrons, are randomly joined by weighted connections in a many-to-many fashion. These networks were historically conceived to simulate a biological model where nodes are neurons and the links between them represent synapses. Based on the input values fed into the network, nodes of a certain level can be activated, and their signal is broadcast to the subsequent level. To activate the nodes of the subsequent level, the signal generated at a level is weighted and must be greater than a given threshold.
• Recurrent Neural Networks (RNNs) are tailored for processing data as a sequence [40, 41]. In contrast to FNNs, RNNs have cyclic connections among nodes of distinct levels. Recurrent connections link past data with the data currently being processed, simulating a state memory. The forward pass is similar to the FNN forward pass; the difference is that the activation of a node depends on both the current input and the previous status of the hidden layers. This workflow is useful when data presents patterns from the past to the future. As an extension, Bidirectional RNNs (BiRNNs) present the training data forwards and backwards to two hidden RNNs combined into a common output layer, making it possible to find patterns from both past and future data [42].
• Long Short-Term Memory (LSTM) networks extend RNNs by employing recurrent connections and adding memory blocks in their recurrent hidden layers [43, 44]. These memory blocks save the current temporal state and make it possible to learn temporal observations hidden in the data. Using memory blocks allows the network to relate the data currently being processed with data processed long before, solving the problem experienced by common RNNs. For this reason, LSTMs have proved to have a positive impact on sequence prediction tasks. Bidirectional layers using two hidden LSTMs can be leveraged to process data both forwards and backwards.
• Convolutional Neural Networks (CNNs) perform filtering operations on the nodes of a layer, abstracting and selecting only meaningful nodes. Such networks have historically been applied in Computer Vision [45, 46]. Hence, they are not directly applicable to texts, as the text must be vectorized before convolutional filters can be applied. Each filter is composed of a kernel that slides over the vector representation and repeats the same function on each element until all vectors are covered.
• Attention Layers (ALs) serve to orient perception as well as memory access [47, 48]. They filter the perceptions that can be stored in memory, and filter them again on a second pass when retrieved from memory. This makes it possible to overcome several limits, especially in NLP. For instance, traditional word vectors presume that a word’s meaning is relatively stable across sentences, but this is not always the case (e.g., "lit" as an adjective describing something burning or as an abbreviation for literature). Moreover, ALs learn how to relate an input sequence to the output of the model in order to pay selective attention to the most relevant input features. For example, to reflect the relation between inputs and outputs, an AL may compute an arithmetic mean of the results of various layers according to a certain relevance.
• Other layers can be integrated to fine-tune the performance of a model. For instance, Embedding Layers turn positive integers into dense vectors of fixed size chosen from a pre-initialized matrix [49, 50]. Noise Layers usually avoid model over-fitting by modifying a fraction of the input or by adding/subtracting values following a predefined distribution (e.g., Gaussian) [51]. Dropout Layers may be seen as a particular type of noise layer that assigns the value 0 to a randomly chosen fraction of its input data [52]. Dense Layers are densely-connected layers used to map large unit inputs into a few unit outputs; for example, a dense layer may map nodes to the few classes output by the model.

As part of the optimization for DL, the error for the current state of the model must be estimated repeatedly. This requires the choice of an error function, namely a loss function. Such a function estimates the loss of the model, so that the weights can be updated to reduce the loss at the next evaluation. Based on the type of ML problem, there are several loss functions to choose from. For instance, for regression problems, common loss functions are Mean Squared Error, Mean Squared Logarithmic Error, and Mean Absolute Error; for binary supervised classification, Binary Cross-Entropy, Hinge Loss, and Squared Hinge Loss are usually adopted; for multi-class classification, common solutions are Multi-Class Cross-Entropy, Sparse Multi-Class Cross-Entropy, and Kullback-Leibler Divergence. Please refer to [53] for further details on each function. Finally, within DL, an optimizer is integrated to update the weights and minimize the loss function. The loss function is used by the optimizer to move in the right direction towards the global minimum. Common optimizers include Root Mean Square Propagation (RMSProp) [54] and Adaptive Moment Estimation (ADAM) [55]. A minimal sketch combining these building blocks is given below.
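As a hedged illustration of how such layers, a loss function, and an optimizer can be assembled, the following hypothetical Keras sketch builds a small multi-class text classifier. The vocabulary size, sequence length, class count, and layer sizes are arbitrary and do not reflect the models proposed in this thesis.

```python
import tensorflow as tf

# Illustrative constants (assumptions, not thesis settings).
VOCAB_SIZE, SEQ_LEN, NUM_CLASSES = 20000, 200, 7

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,)),                          # integer-encoded word sequences
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),                # embedding layer: dense word vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),   # bidirectional LSTM for sequence patterns
    tf.keras.layers.Dropout(0.5),                              # dropout layer against over-fitting
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # dense layer mapping to output classes
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),    # optimizer updating the weights
    loss="sparse_categorical_crossentropy",                    # multi-class loss function
    metrics=["accuracy"],
)

model.summary()
```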
Model Evaluation
Evaluating a model always requires subdividing the available data into (i) a training set, (ii) a validation set, and (iii) a test set. During training, the model is fed with the training set and evaluated on the validation set. The latter is used because developing a model usually involves tuning its configuration (e.g., choosing the number of layers or the size of the layers); such tuning can be achieved by using the performance of the model on the validation set as feedback. To test performance on unseen data, a completely different data set is used to evaluate the model: the test set. The common procedures for splitting data are described below (Fig. 2.7) [56]; an illustrative sketch of both protocols follows the figure.
• Held-Out Validation. A subset of the data is set apart for testing, typically around 10% to 20% of the whole dataset. The model is trained on the rest of the data, and its performance is evaluated on the test set. However, if little data is available, the validation and test sets may contain too few samples to be statistically representative, undermining the validity of the experimental results.
• k-Fold Validation. The data is split into K equal-sized partitions. An independent instance of the model is trained on K-1 partitions and evaluated on the remaining partition i. The process is repeated K times, with a different partition i as the test set. The final metrics are averaged to obtain the final score. This can mitigate issues related to significant variance of the final metrics over different train-test splits.
Fig. 2.7: Common ML evaluation protocols: held-out (top) and k-fold (bottom).
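The two protocols can be illustrated with the following scikit-learn sketch on a toy dataset; the classifier, split sizes, and number of folds are arbitrary choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, KFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Held-out validation: ~20% of the data is set apart for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = SVC().fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# k-fold validation: train on K-1 partitions, test on the remaining one, K times.
scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    fold_model = SVC().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], fold_model.predict(X[test_idx])))
print("5-fold mean accuracy:", np.mean(scores))
```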
After splitting the data, the metrics for measuring success should be defined. The metrics should be directly aligned with the high-level goals, such as the success of the business. For balanced classification problems, where every class is equally likely, Accuracy and the area under the Receiver Operating Characteristic curve (ROC) are common metrics. For class-imbalanced problems, Precision and Recall are usually used. For ranking problems or multi-label classification, Mean Average Precision is generally adopted. In some cases, custom metrics are defined. Such metrics represent the output of the protocol. As the universal tension in ML is between optimization and generalization, the ideal model is the one that stands right at the border between under-fitting and over-fitting. Hence, all the pipeline steps should be repeated until the model closely reaches this goal.

Chapter 3 Machine Learning Models for Content Categorization
Research Highlights
• SVM+SGD trained on Concepts is the most efficient model.
• SVM trained on Keywords+Concepts achieves the highest accuracy.
• Keywords are more representative than Concepts for prediction.
• Models achieve comparable performance over first- and second-level categories.
3.1 Introduction
Micro-learning videos embedded into online courses represent one of the most powerful media to deliver knowledge to learners in small chunks over time. This paradigm promises a solid success rate, as proved by recent practical experience [57]. In contrast to traditional educational models, micro learning lasting from 5 up to 15 minutes has a positive impact on knowledge acquisition [58], as it ensures higher flexibility and better fits the constraints of the human brain with respect to attention span [59, 60]. With the ever-increasing number of micro-learning videos, managing large video collections is turning into a challenge for educators. Human annotation and archiving is time-consuming and not cost-effective. On the other hand, searching videos given specific categories of interest is becoming harder for learners. This has raised the need for tools powered by intelligent methods for effective and efficient video classification. Recent studies tend to model the problem as a text classification task which automatically assigns one category from a set of predefined ones to a video based on its transcript (i.e., a written version of the content presented by the speaker) [61, 62]. These works focused on traditional long face-to-face video lessons, which provide a lot of textual information in comparison with that available in micro-learning videos. Furthermore, these approaches usually map video transcripts using Term Frequency-Inverse Document Frequency (TF-IDF). Text documents are modeled as a set of term frequencies, regardless of the position of words in the document or of semantic links with other words. It follows that a huge amount of the knowledge derived from the text is lost. Emerging Semantic Web resources and techniques have recently been combined with Data Mining and Knowledge Discovery methods in order to perform analysis and obtain useful insights out of the data [63]. Cutting-edge cognitive computing systems, such as IBM Watson1 and Microsoft Cognitive Services2, can extract concepts, emotions, entities, keywords, and relations from unstructured text, and use advanced machine learning algorithms to derive analytics and generate predictions and hypotheses in a scalable way. Therefore, they can offer Cognitive Computing potential, so that the knowledge lost by the existing approaches can be recovered and enriched. In this chapter, we are interested in supporting the development of tools powered by Cognitive Computing to investigate: (i) how we can extract and merge features from micro-learning videos to improve their representation in the eyes of machine-learning algorithms, and (ii) which machine learning algorithm is best at taking advantage of such features in terms of effectiveness and efficiency for micro-learning video classification. To this end, we analyze micro-learning video collections through a fully-automated pipeline. More precisely, we propose an efficient and effective approach to classify a collection of educational videos into pre-existing categories, capitalizing on (i) a Speech-to-Text tool to get video transcripts, (ii) cutting-edge Natural Language Processing and Cognitive Computing tools to extract semantic concepts and keywords for their representation, and (iii) Apache Spark as Big Data technology for scalability. Several classifiers have been trained on feature vectors extracted by Cognitive Computing tools on a data set we collected from Coursera, namely ML-Videos.
Considering the experimental results, our approach promises to improve micro-learning video classification performance. The contribution of this chapter is threefold: • We arranged a large-scale data set composed of features extracted from 10,328 videos downloaded from Coursera. They are pre-annotated with 7 first-level categories and 34 second-level categories based on the course wherein they are embedded. • We propose an automated approach for classifying micro-learning videos, capitalizing on Cognitive Computing for feature extraction and Big Data technologies for fast computation, going beyond frequency-based methods, with several practical implications. • We provide an extensive evaluation in terms of effectiveness and efficiency for micro-learning classification, comparing our approach with different combinations of feature type and classification algorithm. This makes it possible to assess which one is the best at taking advantage of the peculiarities of micro-learning videos.
1https://www.ibm.com/watson/ 2https://www.microsoft.com/cognitive-services/en-us
The rest of this chapter is structured as follows: Section 3.2 describes the most representative techniques for content archiving, going deeply into those tested on video lessons. Then, Section 3.3 formalizes the problem we seek to investigate. Section 3.4 introduces the micro-learning data set we collected, while Section 3.5 describes and evaluates the ML models we trained on such data. Finally, Section 3.6 concludes the chapter.
3.2 Related Work
3.2.1 Content Categorization Techniques
The term categorization is defined as the procedure to select the most appropriate category for a resource from a predefined taxonomy. Indexing multimedia content by classifying it is a task required to organize resources so that they can be quickly retrieved. Several researchers have tried to automatize multimedia content classification through accurate, non-time-consuming machine-learning systems [64, 65, 66, 67]. Common solutions adopted pre-existing classification algorithms successfully fed with textual data, such as Support Vector Machines (SVMs) [35, 49, 68], also in their Stochastic Gradient Descent (SGD) variant [70], Decision Trees (DTs) [34], including C4.5 [69], and Random Forests (RFs) [36]. However, the usual audio-visual characteristics of a video make video classification harder with respect to plain text classification. Thus, researchers tended to integrate video metadata [71] and video content information, such as visual frame sequences [72, 73], audio tracks [74], transcripts [75, 76], or multiple combinations of them [77, 78]. In spite of good results obtained using low-level features, emerging semantic-based alternatives showed a larger potential. They have focused on higher-level features that model the content of a video. In [79], the authors presented a novel system for content understanding and semantic search on a video collection based on low- and high-level visual features. In a similar way, [80] investigated the problem of detecting or classifying single video clips by means of the event occurring in them, which is defined by a complex collection of elements including people, objects, actions, and relations among them. The authors in [81] proposed to generate categories for videos by directly exploiting Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR). However, we point out that the video type is a relevant characteristic to better understand video content, and the information leveraged by a reliable analysis should be selected based on the way knowledge is mainly transmitted. As micro-learning videos mainly convey knowledge through the instructor's voice, we mostly relate to classification approaches that use video transcripts as a source of information. In one of the most representative works, the authors in [75] used the transcripts to improve sentiment classification from general videos. Besides, [76] proposed various video navigation techniques in educational contexts through keyword clouds computed from the transcripts using TF-IDF.
3.2.2 Content Categorization in Education
Several classification tasks have been solved by machine-learning models, but there is limited research on how they perform within the educational domain. For instance, the authors in [62] used Latent Semantic Allocation to find and represent topics conveyed by a video lecture. The method demonstrated that the video content allows achieving better performance than the information extracted from the title and tags associated to the video itself. The authors in [82] used knowledge and relationships extracted from Wikipedia as a resource to match videos with the most suitable category based on a pre-defined taxonomy. Other classification approaches made use of NLP either directly on the accompanying audio transcript [83] or on text extracted automatically from the lecture images employing OCR or via speech-to-text [84, 85, 86]. Similarly, the authors in [87] proposed a framework for classifying educational videos from their metadata. The authors in [88] later applied shallow machine learning techniques. The analysis of the video transcript is an attractive way to be exploited, as transcripts represent the richest channel of information for video lessons. However, the existing works that manipulate transcripts focused on long face-to-face video lessons that usually last more than one hour. It follows that the resulting transcripts include a much larger amount of words than those extracted from micro-learning videos, making classification different at least from the perspective of the available information. Moreover, no existing data set targets the micro-learning scenario, so there exists a research gap that prevents developing and testing solutions for such a learning approach. Furthermore, most of the approaches modeled each transcript as a set of term frequencies, regardless of the position of words in the text or of semantic links with other words. It follows that a lot of the knowledge derived from the text is lost. Consequently, we first seek to fill the gap in micro-learning video data sets by collecting resources from existing educational platforms. Then, we investigate how higher-level features extracted from transcripts can improve micro-learning classification, enriching TF-IDF features with semantics. We finally assess how existing classification algorithms work when fed with these features.
3.3 Problem Formalization
The micro-learning video classification problem can be formalized as follows. We consider $N$ tuples $D = \{(x_i, y_i)\}_{i=1}^{N}$, where each $x_i \in X^*$ corresponds to a video transcript of unknown length composed of words from a dictionary $X$, and each $y_i \in Y$ corresponds to the category conveyed by the video, from a pre-defined set of categories (e.g., math, physics). We consider an explicit feature extraction step which produces fixed-length representations in $F \subset \mathbb{R}^e$ (e.g., TF-IDF). We denote this stage as $\mathcal{F}: X^* \to F$. Given a feature vector $f \in F$, we seek to obtain a classifier $G_\theta$ that correctly maps $f$ into $y$. Namely, we want to find a function $G_\theta: F \to Y$ that outputs the category of the video transcript $x$ whose feature vector is $f = \mathcal{F}(x)$. Finding such a classifier becomes an optimization problem, which aims to maximize the following objective function:
$$\theta^* = \underset{\theta}{\arg\max}\; \mathbb{E}_{(x,y) \in D}\left[\, G_\theta(\mathcal{F}(x)) = y \,\right] \qquad (3.1)$$

In other words, we aim to maximize the cases where the classifier predicts the correct category of a video based on the features extracted from its transcript.
3.4 The Proposed ML-Videos Data Set
Before designing a tailored classification approach, we filled the gap in micro-learning video data sets by collecting resources from existing educational platforms. The collected data set, namely ML-Videos, contains over 10,000 micro-learning videos from more than 500 instructors, extracted from courses delivered on Coursera. The videos span a wide range of different fields, including maths, physics, economics, and so on. Each video included in the data set is shot in a different single-speaker controlled environment, typically a professional recording studio. To the best of our knowledge, there exists no other public benchmark data set that includes micro-learning videos. To position this data set in the literature, Table 3.1 summarises other representative video data sets whose transcripts have been used for classification. Besides lacking micro-learning conditions, most of them have been labelled with few categories and include longer transcripts.
3.4.1 Collection Methodology
This section describes our multi-stage approach for collecting a large micro-learning video data set from existing large-scale learning platforms. The collection pipeline is summarised in Fig. 3.1, and key stages are discussed in the following paragraphs.
Data Set         Videos   Public?   Context               Levels   Categories
20NG [64]        20,000   Yes       News                  2        26
NSL-KDD [65]     25,192   No        Medicine              1        23
Youtube [71]     182      No        Miscellaneous         1        3
TRECVID [79]     25,000   Yes       Events                1        20
MCG-WEB [82]     80,031   Yes       Miscellaneous         1        493
KHAN [83]        3,000    Yes       Classroom Education   1        3
GoogleI/O [89]   209      Yes       Classroom Education   1        5
Wikipedia [90]   3,000    No        Classroom Education   1        7
ML-Videos        10,328   Yes       Micro Learning        2        41
Table 3.1: Representative data sets for video classification based on transcripts.
Fig. 3.1: Collection pipeline for ML-Videos data set.
1. Candidate Courses Retrieval: the first stage is to obtain a list of courses from Coursera. We start from the list of courses that are returned by Coursera APIs3, which is based on an intersection of the most searched courses in the platform and the courses that are freely available. This list, dumped in March 2017, contains 10,352 courses, ranging from economics and computer science to history and art.
2. Micro-learning Videos Download: the videos for each of the considered courses are automatically downloaded using Coursera-dl4, a Python script that can be used to download lecture resources (e.g., videos, ppt) from Coursera classes. The script is instructed to create a folder for each course and save all the related videos within it. Each video is named with its sequence ID and its title, separated by an underscore.
3. Multi-category Video Labelling: first-level and second-level category labels are crawled from Coursera APIs for all the courses in the data set, building a two-level taxonomy. Category labels are obtained for all but 24 courses, which were discarded. Categories are assigned to videos based on the course wherein they are integrated.
4. Video Transcripts Extraction: each micro-learning video is sent to the Speech-to-Text service5 of the IBM Watson suite, which combines grammar rules with knowledge of audio signals for translating the spoken language of an audio track into its written form. Such a tool is one of the most accurate and easily manageable.
5. Transcript Features Extraction: from each video transcript, we extract TF-IDF features and high-level semantic features computed by the Natural Language Understanding service6, part of the IBM Watson suite. It contains a collection of natural language processing functions aiming at extracting keywords, concepts, entities, and so on from texts. For this study, we took only concepts and keywords.
3https://about.coursera.org/affiliates 4https://github.com/coursera-dl/coursera-dl 5https://www.ibm.com/watson/developercloud/speech-to-text.html 6https://www.ibm.com/watson/services/natural-language-understanding/
Fig. 3.2: Structure of ML-Videos data set.
3.4.2 Structure and Statistics
ML-Videos is a CSV-based collection whose structure in terms of entities and associations is depicted in Fig. 3.2. As expected, Video is the most informative entity. Each video is identified by a unique ID and primarily described by a short title. One first-level category ID and one second-level category ID are associated to each video and point to the corresponding entities. The text extracted from each video is saved into the transcript attribute, while the subsequent features are reported in the remaining attributes. Both First-level and Second-level Category entities are identified by an ID and a name. In addition, second-level categories include the ID of the parent first-level category. We collected 10,328 videos from 617 courses taught in the English language. Coursera pre-assigned each course to one of the 7 first-level categories and one of the 34 second-level categories. Each course contains a set of videos (avg. 18; std. 5). Each downloaded video was assigned the same categories as the course it belongs to, namely one first-level category and one second-level category. Each video lasts from 5 to 18 minutes, and the video transcripts contain from 200 to 10,000 words (avg. 1,525; std. 1,017). The distribution of the number of videos per category is shown in Fig. 3.3 and Fig. 3.4. The dataset is challenging, since it contains fine-grained categories that require subtle details to differentiate (e.g., Business Essentials and Business Strategy). Moreover, the video transcripts contain fewer words than the documents typically used in text classification. The language style is also different, since transcripts derive from speaking activities.
3.5 The Proposed Content Categorization Approach
In this section, we describe the proposed approach for micro-learning video classification. Figure 3.5 depicts its components and how they work together.
[Bar chart: number of videos per first-level category (Business, Data Science, Life Sciences, Social Sciences, Arts and Humanities, Computer Science, Physical Sciences).]
Fig. 3.3: First-level category distribution in ML-Videos.
[Bar chart: number of videos per second-level category (Law, History, Biology, Finance, Nutrition, Education, Marketing, Chemistry, Probability, Economics, Philosophy, Algorithms, Psychology, Data Analysis, Music and Art, Bioinformatics, Clinical Science, Entrepreneurship, Machine Learning, Business Strategy, Research Methods, Business Essentials, Design and Product, Electrical Engineering, Medicine & Healthcare, Environmental Science, Software Development, Physics and Astronomy, Mechanical Engineering, Governance and Society, Leadership and Management, Mobile and Web Development, Animals and Veterinary Science, Computer Security and Networks).]
Fig. 3.4: Second-level category distribution in ML-Videos.
3.5.1 Methodology
The Classification Manager orchestrates the overall process, from speech-to-text conversion to performance evaluation. First, it calls the Pre-Processing Module to extract the transcripts from the videos included in the data set (Steps 1-2). Then, these transcripts are handled by the Feature Extraction Module, which computes the corresponding features (Steps 3-4). During training, the Classification Manager sends both features and category labels to the Classifier Module (Steps 5-6). During testing, only the features are sent, and the returned categories are matched with the original ones stored in the data set to evaluate the performance (Step 7). We detail the modules in the following sections.
Fig. 3.5: The proposed micro-learning video classication approach.
Big Data Manager Module
This module integrates Apache Spark enriched with the MLlib library. MLlib7 is Spark's machine learning library, aimed at making practical machine learning scalable and easy. It includes common learning algorithms and utilities, such as classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs. By integrating it, our approach becomes general and flexible. We can easily use any classifier within the MLlib library, whose code is already optimized with the Map-Reduce paradigm to run over a cluster. We might also use any other classifier from other libraries not specific to Apache Spark. The cluster has been configured to use over 100 machines within our department.
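As a minimal illustration of this design choice (not the thesis code; the toy DataFrame, column names, and the choice of classifier are assumptions), a classifier from Spark MLlib can be plugged into a distributed TF-IDF pipeline as follows.

```python
# Minimal sketch of a Spark MLlib classification pipeline over transcripts.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("video-classification").getOrCreate()
data = spark.createDataFrame([("machine learning basics", 0.0),
                              ("cell biology overview", 1.0)],
                             ["transcript", "label"])

# TF-IDF features computed in a distributed fashion, then a classifier.
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="transcript", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf"),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(data)
```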
Pre-Processing Module
This module takes a micro-learning video as an input and returns the cleaned version of the associated transcript as an output. Given a micro-learning video v, the module sends it to the Speech-to-Text service8 of the IBM Watson suite, which combines grammar rules with knowledge of audio signals for translating the spoken language of an audio track into its written form. Its choice depends on the fact that it is one of the most accurate and easily manageable speech-to-text tools [91]. The service returns a plain
7http://spark.apache.org/mllib/ 8https://www.ibm.com/watson/developercloud/speech-to-text.html
text t(v) corresponding to the transcript of v. The module converts all the words in t(v) to lowercase, and each one of them is compared with the most similar word in WordNet9 in order to detect spelling errors. Each word w in t(v) is substituted with its corrected version w', obtaining t'(v). Finally, the module removes stop-words from t'(v) and returns it as a cleaned transcript. This will be used as a means of analysis for the video.
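A minimal sketch of these cleaning steps is given below (illustrative only; the actual spelling-correction strategy used by the module may differ, and WordNet and the stop-word list are accessed through NLTK here as an assumption).

```python
# Minimal sketch of transcript cleaning: lowercasing, naive spelling
# correction against a dictionary/WordNet, and stop-word removal.
import nltk
from difflib import get_close_matches
from nltk.corpus import stopwords, wordnet, words

nltk.download("stopwords"); nltk.download("wordnet"); nltk.download("words")
VOCAB = set(words.words())
STOP = set(stopwords.words("english"))

def clean_transcript(text):
    cleaned = []
    for w in text.lower().split():
        if w not in VOCAB and not wordnet.synsets(w):
            # naive correction: replace with the closest dictionary word, if any
            match = get_close_matches(w, VOCAB, n=1)
            w = match[0] if match else w
        if w not in STOP:
            cleaned.append(w)
    return " ".join(cleaned)
```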
Feature Extraction Module
The Feature Extraction Module is responsible for turning a micro-learning transcript into a set of features. It takes a cleaned transcript from the Pre-Processing Module as an input and returns a set of pairs, where the first element is the identifier string of the feature and the second element is the relevance of that feature for the corresponding transcript. The relevance value spans the range [0, 1], where a value closer to 0 represents a low relevance and a value closer to 1 a high relevance of that feature.
From the transcript, the module can extract TF-IDF features and high-level features computed by the Natural Language Understanding API provided by the IBM Watson suite, which contains a collection of natural language processing functions aiming at extracting keywords, concepts, entities, and so on from texts. Currently, the module exploits only concepts and keywords, but it can be easily extended to a wider set of features. Keywords include important topics typically used when indexing data. The IBM service identifies and ranks keywords directly mentioned in the text. Concepts represent words not necessarily referenced in the text, but with which the transcript text can be associated. These concepts are inferred from the relations between syntactic and semantic elements of the sentences. For both concepts and keywords, the IBM service computes a weight that indicates their relevance.
Given a cleaned transcript t'(v) as returned by the Pre-Processing Module, the module first computes, for each word w ∈ t'(v), its TF-IDF value, building the TF-IDF vector tf-idf(t'(v)). Then, it sends t'(v) to the Natural Language Understanding service and requests a concepts vector c(t'(v)) and a keywords vector k(t'(v)). The service returns them as JSON data. For each vector, a list of pairs is returned, where the first element of each pair is the string identifier and the second element is the relevance associated to it in the range [0, 1]. After that, the module builds a unique feature vector kc(t'(v)) by concatenating c(t'(v)) and k(t'(v)). In the case of a collision between two identifiers from the concepts and keywords vectors, the module computes the mean between the associated relevance values and stores a single instance of the identifier, accordingly. As an example, we consider a short segment of a video transcript about computer networks (Fig. 3.6). "Computer network" is the only concept extracted, even though it is not directly mentioned in the text. Four keywords were returned. A minimal sketch of the merging step is given after Fig. 3.6.
9https://wordnet.princeton.edu/
Fig. 3.6: Example of features extracted by IBM Watson.
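The following minimal sketch illustrates the concatenation-with-collision-averaging step described above; the identifiers and relevance values are illustrative placeholders, not the actual output of the service.

```python
# Minimal sketch: concatenate concept and keyword vectors, averaging the
# relevance of identifiers that appear in both (a "collision").
def merge_features(concepts, keywords):
    """concepts/keywords: dicts mapping identifier -> relevance in [0, 1]."""
    merged = dict(concepts)
    for ident, rel in keywords.items():
        merged[ident] = (merged[ident] + rel) / 2 if ident in merged else rel
    return merged

# Example inspired by Fig. 3.6 (values are illustrative only).
concepts = {"computer network": 0.95}
keywords = {"router": 0.81, "packet": 0.74, "protocol": 0.66, "bandwidth": 0.58}
print(merge_features(concepts, keywords))
```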
Classifier Module
The Classifier Module aims at finding the most appropriate category for a given video using the underlying classifier trained on a number of labeled samples. The classifier implements a function f(t'(v)) → y, where f(t'(v)) is the feature vector of t'(v) and y is the category returned by the classifier from the given taxonomy. The module can implement any classification algorithm, independently from the feature type. In particular, our approach integrates Decision Trees, Support Vector Machines, Random Forests, and Support Vector Machines jointly with Stochastic Gradient Descent, since they are the most widely used algorithms as emerged from the literature review. However, our approach does not depend on this design choice and any classification algorithm can be used.
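For illustration, the four classifier families can be instantiated as follows with scikit-learn equivalents (the thesis relies on the Spark MLlib implementations; the hyper-parameters shown are library defaults, not the values tuned in our experiments).

```python
# Minimal sketch of the four classifier families compared in this chapter.
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

classifiers = {
    "DT": DecisionTreeClassifier(),
    "SVM": LinearSVC(),
    "RF": RandomForestClassifier(n_estimators=100),
    "SVM+SGD": SGDClassifier(loss="hinge"),  # linear SVM trained via SGD
}

def predict_category(clf, feature_vector):
    # feature_vector: 1-D array built from the merged keyword/concept relevances
    return clf.predict([feature_vector])[0]
```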
3.5.2 Evaluation
In this section, we describe the experiments performed to assess both the effectiveness and the efficiency of our approach on top of the ML-Videos data set (see Section 3.4).
Evaluation Metrics and Protocols
We mapped the video classification problem as a text classification problem. For each category $y_i$ of the category space $y_1, ..., y_n$, the corresponding precision $P_{y_i}$, recall $R_{y_i}$, and F-measure $F1_{y_i}$ are defined by:
$$P_{y_i} = \frac{TP_{y_i}}{TP_{y_i} + FP_{y_i}} \qquad R_{y_i} = \frac{TP_{y_i}}{TP_{y_i} + FN_{y_i}} \qquad F1_{y_i} = 2\,\frac{R_{y_i} \cdot P_{y_i}}{R_{y_i} + P_{y_i}}$$
where $TP_{y_i}$ (true positives) is the number of videos correctly assigned to the category $y_i$, $FP_{y_i}$ (false positives) is the number of videos erroneously assigned to $y_i$, and $FN_{y_i}$ (false negatives) is the number of videos that actually belong to the category $y_i$ but are not assigned to this class.
Then, we need to compute the average performance of a binary classifier (i.e., one for each category) over multiple categories. There are three main methodologies for the computation of averaged metrics [92]. They can be summarized as follows: • Micro-Averaged (Mic). The metrics are globally calculated by counting the total number of true positives, false negatives, and false positives, with no category differences. • Macro-Averaged (Mac). The metrics are locally calculated for each category and the unweighted mean is computed. This does not consider category imbalance. • Weighted-Averaged (W). The metrics are locally calculated for each category, then their average is obtained by weighting each category metric with the number of instances of the category in the dataset. Therefore, categories do not contribute equally to the final average and some of them contribute more than the others. To test the performance of our approach, we considered nine scores obtained from the combination of the metrics (precision, recall, F-measure) with the averaging modes (micro, macro, weighted), for each category space of our data set (first/second-level), as sketched below. The evaluation protocol is based on a Stratified K-fold Cross-Validation with K=10.
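A minimal sketch of how the nine scores can be computed is given below (using scikit-learn; the label arrays are placeholders).

```python
# Minimal sketch of the nine scores: precision, recall, and F-measure under
# micro, macro, and weighted averaging.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0]  # placeholder ground-truth categories
y_pred = [0, 1, 2, 1, 1, 0]  # placeholder predicted categories

for avg in ("micro", "macro", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=avg)
    print(f"{avg}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```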
Baselines
In order to evaluate our approach, we tested it with a number of alternative combinations of four feature representations and four classification algorithms. The feature representations included TF-IDF (baseline), concepts, keywords, and the combination of keywords and concepts. The baseline classifiers were the following algorithms: • Decision Tree (DT). This method creates a model based on the C4.5 algorithm, predicting the value of a target variable by learning decision rules from data features. The model has a root node containing all data features of the training set. Then, the root node is split into several children according to a given criterion. • Support Vector Machine (SVM). This method maps each sample to a point in an n-dimensional space (where n is the number of features). Then, it finds the set of hyperplanes that better differentiate the categories. A linear combination of vectors determines the decision boundaries producing the best separation of categories. • Random Forest (RF). This method is a meta-estimator that fits a number of decision tree classifiers on various random sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Each decision tree, individually, is a weak classifier, while all the decision trees taken together form a strong classifier. • Support Vector Machine + Stochastic Gradient Descent (SVM+SGD). This method extends the standard SVM implementation with the SGD algorithm, which finds the best coefficients describing the decision boundaries by optimizing a hinge loss function. It allows performing training over large data while reducing the computation time.
Computational Time Evaluation
We evaluated the efficiency of the proposed approach in terms of the size of the feature vectors and the time for performing both training and test for a given classifier. Table 3.2 summarizes the basic statistics with respect to the different feature sets. The first column indicates the name of the feature set, the second column shows its size, while the other two columns report the average and the standard deviation of the number of non-zero elements. The vector size for the combination of concepts and keywords is greater than all the others. However, the average number of its non-zero elements is smaller than that of TF-IDF. Hence, high-level features might be more discriminative. Table 3.3 reports the total time required for a given algorithm with a given feature set to perform the training and test phases.
Feature Set           Total Size   Average Size   Standard Deviation
TF-IDF (baseline)     117,073      260            513
Concepts              29,952       21             13
Keywords              332,023      78             26
Keywords + Concepts   361,975      99             31

Table 3.2: Basic statistics about the considered feature sets.
Execution Time [s]
Features Set          Algorithm   First-level Categories   Second-level Categories
TF-IDF (baseline)     DT          5.23                     7.92
                      RF          12.20                    39.81
                      SVM         3.70                     16.25
                      SVM+SGD     0.25                     1.12
Keywords              DT          3.32                     7.21
                      RF          11.06                    43.96
                      SVM         3.32                     11.21
                      SVM+SGD     0.20                     1.00
Concepts              DT          1.13                     1.88
                      RF          2.38                     7.67
                      SVM         0.31                     2.21
                      SVM+SGD     0.05                     0.20
Keywords + Concepts   DT          4.00                     8.34
                      RF          13.53                    50.38
                      SVM         3.95                     12.84
                      SVM+SGD     0.26                     1.06

Table 3.3: Computational time for completing both training and test phases.

From the results, the SVM+SGD algorithm achieved the best computational time, especially when fed with concepts. The computational time mostly depends on the particular classifier and the number of features. For example, the SVM+SGD classifier using concepts as features takes 0.05 seconds, due to the lower time required by SVM+SGD and the smaller size of the concepts feature set. DT and plain SVM obtained comparable computational times.
Effectiveness Evaluation
We evaluated the performance of all the classifiers trained on TF-IDF, keywords, concepts, and keywords plus concepts using precision, recall, and F-measure. The results in Table 3.4 indicate that using keywords outperformed TF-IDF only with the SVM+SGD approach. Using concepts generally gave better results than using keywords, except when fed into SVM-derived algorithms. Concepts outperformed TF-IDF in the case of the SVM+SGD algorithm. With concepts, we obtained higher precision and lower recall, particularly for macro and weighted evaluations. Overall, SVM fed with TF-IDF outperformed all the other settings, but it was usually ten times slower than the SVM+SGD counterpart. Table 3.5 shows the results of the classifiers on second-level categories. In this case, the SVM+SGD classifier trained using keywords still outperformed the same algorithm trained on TF-IDF, while its performance decreased if trained on concepts.
                                  Precision           Recall              F-Measure
Features              Algorithm   Mic   Mac   W       Mic   Mac   W       Mic   Mac   W
TF-IDF (baseline)     DT          0.48  0.56  0.55    0.48  0.47  0.48    0.48  0.48  0.48
                      RF          0.58  0.61  0.60    0.58  0.57  0.58    0.58  0.57  0.57
                      SVM         0.70  0.72  0.71    0.70  0.70  0.70    0.70  0.70  0.69
                      SVM+SGD     0.62  0.62  0.62    0.62  0.62  0.62    0.62  0.61  0.61
Keywords              DT          0.35  0.49  0.47    0.35  0.32  0.35    0.35  0.32  0.33
                      RF          0.43  0.51  0.49    0.43  0.41  0.43    0.43  0.40  0.41
                      SVM         0.65  0.68  0.67    0.65  0.65  0.65    0.65  0.65  0.65
                      SVM+SGD     0.66  0.66  0.66    0.66  0.66  0.66    0.66  0.65  0.65
Concepts              DT          0.44  0.68  0.66    0.44  0.42  0.44    0.44  0.44  0.45
                      RF          0.52  0.67  0.66    0.52  0.51  0.52    0.52  0.53  0.53
                      SVM         0.62  0.64  0.63    0.62  0.62  0.62    0.62  0.62  0.61
                      SVM+SGD     0.63  0.64  0.64    0.63  0.63  0.63    0.63  0.63  0.62
Keywords + Concepts   DT          0.46  0.65  0.64    0.46  0.44  0.46    0.46  0.47  0.47
                      RF          0.55  0.64  0.63    0.55  0.54  0.55    0.55  0.55  0.55
                      SVM         0.69  0.71  0.70    0.69  0.69  0.69    0.69  0.69  0.69
                      SVM+SGD     0.68  0.69  0.69    0.68  0.68  0.68    0.68  0.67  0.67
Table 3.4: Effectiveness on video classification over first-level categories.
                                  Precision           Recall              F-Measure
Features              Algorithm   Mic   Mac   W       Mic   Mac   W       Mic   Mac   W
TF-IDF (baseline)     DT          0.46  0.53  0.52    0.46  0.42  0.46    0.46  0.44  0.46
                      RF          0.58  0.59  0.59    0.58  0.54  0.58    0.58  0.54  0.56
                      SVM         0.71  0.73  0.73    0.71  0.67  0.71    0.71  0.67  0.70
                      SVM+SGD     0.58  0.64  0.69    0.58  0.57  0.58    0.58  0.57  0.61
Keywords              DT          0.34  0.50  0.48    0.34  0.32  0.34    0.33  0.36  0.34
                      RF          0.43  0.49  0.49    0.43  0.41  0.43    0.43  0.42  0.42
                      SVM         0.62  0.70  0.67    0.62  0.55  0.62    0.62  0.57  0.61
                      SVM+SGD     0.66  0.66  0.70    0.66  0.65  0.66    0.66  0.63  0.66
Concepts              DT          0.39  0.49  0.52    0.39  0.38  0.39    0.39  0.39  0.40
                      RF          0.48  0.50  0.53    0.48  0.46  0.48    0.48  0.45  0.47
                      SVM         0.57  0.58  0.59    0.57  0.53  0.57    0.57  0.53  0.56
                      SVM+SGD     0.57  0.55  0.58    0.57  0.54  0.57    0.57  0.53  0.56
Keywords + Concepts   DT          0.41  0.52  0.52    0.41  0.40  0.41    0.41  0.42  0.41
                      RF          0.51  0.54  0.54    0.51  0.50  0.51    0.51  0.49  0.50
                      SVM         0.66  0.70  0.69    0.66  0.60  0.66    0.66  0.62  0.65
                      SVM+SGD     0.67  0.64  0.69    0.66  0.65  0.66    0.67  0.62  0.66
Table 3.5: Effectiveness on video classification over second-level categories.

When keywords and concepts were combined, the overall performance improved. In the case of SVM+SGD, it can be considered better than TF-IDF, with an improvement of up to 9%. The worst case with concepts + keywords showed a maximum loss of 7% when SVM is adopted. As far as the classification over second-level categories is concerned, it is worth noticing that such categories are not as equally distributed as the first-level categories (see Fig. 3.3 and Fig. 3.4), and this variance negatively affects classification using high-level features.
Effectiveness-Efficiency Trade-off Evaluation
In most cases, our semantic approach produces good performance, regardless of the increasing algorithm complexity in terms of video size or transcript size. The first factor influences only the time needed to extract transcripts. No assumptions can be made on transcript sizes in relation to video sizes, since the number of words included in a transcript depends on the amount of orally-provided content. The second factor has an impact on feature extraction. Training and testing are not influenced by the transcript size, since the size of each feature vector depends on the size of the word dictionary. Considering the trade-off between effectiveness in terms of precision, recall, and F-measure and efficiency in terms of computational time, the combinations achieving the best results include SVM or SVM+SGD as algorithm and TF-IDF or Keywords+Concepts as features, respectively. However, the SVM+SGD algorithm is generally over ten times faster than SVM. With SVM+SGD, our approach using Keywords+Concepts outperforms that using TF-IDF by 7% to 9% in terms of precision, recall, and F-measure. The combination using SVM+SGD and Keywords+Concepts achieves performance comparable with that obtained by SVM fed with TF-IDF in terms of effectiveness, but the former is strongly better in efficiency. The experimental results demonstrate that Keywords+Concepts combined with SVM+SGD can scale well, maintaining good performance in all cases.
3.6 Findings and Recommendations
In this chapter, we presented a new data set and a new approach supporting micro-learning video classification. Our approach extracts transcripts from videos using novel speech-to-text systems and exploits Cognitive Computing to represent the features of each video transcript. Moreover, it leverages Big Data technologies for fast computation. Based on the obtained results, we can conclude that: • Combining Concepts and Keywords can help existing machine-learning models to exploit the semantics behind the text that traditional approaches fail to capture. • The proposed approach that feeds SVM+SGD with Concepts+Keywords achieves good performance in most cases for both computational time and precision-recall analysis. • TF-IDF still works better than high-level features with algorithms that incrementally split data for classification, such as Decision Trees and Random Forests. • Leveraging Apache Spark allows the proposed approach to be scalable enough to process and classify millions of videos, fitting real-world operational conditions. • No significant drop exists between model performance over first- and second-level categories, highlighting how robust the considered features and algorithms are. Given the short duration of videos, and therefore the short transcripts, we plan to leverage DL methodologies for learning more reliable and robust features tailored to solving this classification task, going beyond general-purpose features. To this end, particular emphasis will be given to attention-based models able to consider the relevance of each feature on the final classification decision. Furthermore, given the short duration of videos and their possibly specialized content, we will devise multi-level classification approaches moving from high-level categories to low-level categories, and investigate whether this can contribute to better results or would uselessly increase the computational demand. Moreover, we would provide our approach as a basis for usability evaluation in order to quantify its impact on learners' and instructors' experience. Our approach might also be combined with experiential data of students to recommend videos that fit learners' interests.

Chapter 4
Machine Learning Models for Content Recommendation
Research Highlights
• The large-scale dataset we proposed enables ML-based recommendation.
• Educational recommenders are highly susceptible to popularity biases.
• Instructors belonging to a minority group might be disadvantaged.
• Biases can be mitigated by regularizing models during training.
4.1 Introduction
Recommender systems learn behavioural patterns from data to support both individuals [93] and groups [94] in filtering the overwhelming alternatives our daily life offers. However, the biases in historical data might propagate into the items suggested to the users and foster potentially undesired outcomes [95]. For instance, some recommenders focus on a tiny part of the catalog composed of popular items, leading to popularity bias [96, 97, 98]. Others generate a category-wise bias because the rating distribution greatly varies across categories [99]. Historical patterns can promote unfair recommendations, discriminating learners and/or educators based on sensitive attributes [100]. In addition, evaluating only accuracy might not bring online success [101]. Movie recommenders make good rating predictions, but focus on few popular items and lack personalization [102]. In contrast, in tourism, high accuracy corresponds to better perceived recommendations [103]. In view of these context-dependent results, we thus point out that online education is an unexplored field where the impact of biases should be investigated. Existing e-learning platforms have attracted lots of participants, and their interaction has generated a vast amount of learning-related data. Their collection and processing have opened up new opportunities for supporting educational experiences, such as personalization and recommendation [104, 105]. In such a multi-stakeholder environment, where a variety of individuals (e.g., learners, educators) or groups benefits from delivered recommendations, the possibility that a recommender might be biased towards items of certain categories or people characterized by certain sensitive attributes sheds light on fairness and transparency perspectives. Entirely removing any bias from algorithms is currently impracticable, but uncovering and mitigating them should be a core objective.
In this chapter, we study the susceptibility of educational recommenders to existing biases, and we design countermeasures that strike a trade-off between bias mitigation and recommendation effectiveness. We conducted an offline evaluation of different recommendation strategies that took as input the ratings left by learners after attending online courses. Then, we compared the courses recommended by classic and recent methods against data biases, such as popularity biases and educators' gender biases. These biases might have educational (e.g., course popularity bias might affect knowledge diversification) and societal (e.g., educators from minority groups might be disadvantaged) implications. Our results uncover that existing recommenders suffer from several biases, and provide evidence on the need to go beyond evaluating only their accuracy.
The contribution of this chapter is five-fold:
• We arranged a large-scale data set including over 30K courses, 6K instructors, and 32K learners who provided 600K ratings. This surpasses most of the existing educational data sets for recommendation in terms of scale, completeness, and comprehensiveness.
• We provide novel definitions of the trade-off between recommendation effectiveness and robustness against several biases (i.e., popularity robustness, educators' gender robustness), that make it possible to construct tailored benchmarks for these problems.
• We conduct explorative analysis on several biases (i.e., popularity bias, educators' gender bias) in traditional recommenders and more recent learning-to-rank recommenders, and uncover internal mechanics and deficiencies behind such biases.
• We define regularization terms within training optimization functions for reducing the influence of popularity and educators' gender on item relevance during model development, leveraging pair-wise comparisons of properly-selected pairs of items.
• We extensively validate the proposed mitigation approaches on the collected data set and other public data sets from different domains. We show that they produce significant improvements in popularity and providers' gender bias mitigation, with a reasonable impact on effectiveness, regardless of the applied domain.
The rest of this chapter is structured as follows: Section 4.2 describes the most representative techniques for recommendation and provides a summary of biases in recommendation. Then, Section 4.3 formalizes the problems we investigate, and Section 4.4 introduces the data set. Section 4.5 and Section 4.6 show approaches for providing less biased and more fair recommendations. Finally, Section 4.7 concludes the chapter.
4.2 Related Work
4.2.1 Recommendation Techniques
The term recommendation can be defined as predicting the interest of a user for items based on his/her past interactions, and providing to him/her a ranking of unseen items for which such a predicted interest is the highest [93]. Interactions can be rating values (e.g., stars) or binary values (e.g., like/dislike or interested/not interested). They can be collected either explicitly, in combination with reviews entered by users, or implicitly, from system logs. Item recommendation approaches built upon such data can be personalized or non-personalized. Here, we deal with personalized recommendations, which can be obtained through content-based, collaborative filtering, or hybrid techniques. Content-based methods mine descriptions and contents of items that have been marked with high ratings by a user, and then recommend to this user unseen items that have similar characteristics with respect to the former items [106, 107]. Recommending purely based on item content might bring some drawbacks, such as narrow pattern extraction and overspecialization. The former occurs when the feature extractor or the item content itself is insufficient to detect meaningful patterns. The latter implies that content-based systems might recommend items that are too similar to the ones experienced by the user in the past. Collaborative filtering approaches capitalize on the idea that users with similar rating patterns might have similar interests in the future [108, 109, 110, 111]. Such a class of recommenders includes neighborhood-based and model-based methods. In the former, the user-item ratings are directly used to predict ratings for new items in two ways, known as user-based or item-based recommendation. User-based methods evaluate the interest of a target user for an item using the ratings for this item by other users that have similar rating patterns. Item-based methods predict the rating of a user for an item based on the ratings of the user for similar items; two items are similar if users have assigned ratings to these items in a similar way. On the other hand, model-based approaches optimize a predictive model fed with rating data [112]. Relevant patterns of interaction between users and items are captured by model parameters learned from data. Latent Semantic Analysis [113], Latent Dirichlet Allocation [114], Support Vector Machines [115], and Singular Value Decomposition [116, 117] are all examples of model-based approaches. Moreover, recent works integrated learning-to-rank approaches, categorized into point-wise [118], pair-wise [119, 120], and list-wise [121, 122]. Point-wise approaches take a user-item pair and predict how relevant the item is for that user. Pair-wise approaches digest a triplet of user, observed item, and unobserved item, and minimize the cases where the unobserved item is more relevant than the observed item. List-wise approaches look at the entire list to build the optimal ordering for that user.
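As a minimal illustration of the pair-wise family, the following sketch shows a Bayesian Personalized Ranking-style stochastic gradient update, under the assumption of a dot-product model with a user-vector matrix W and an item-vector matrix X; it is not the exact formulation used later in this chapter.

```python
# Minimal sketch of a pair-wise learning-to-rank (BPR-style) update: the model is
# penalized whenever an unobserved item scores higher than an observed one.
import numpy as np

def bpr_step(W, X, u, i_pos, i_neg, lr=0.01, reg=0.01):
    """W: user-vector matrix, X: item-vector matrix; i_pos observed, i_neg unobserved."""
    w_u, x_i, x_j = W[u].copy(), X[i_pos].copy(), X[i_neg].copy()
    x_uij = w_u @ (x_i - x_j)               # difference of predicted relevances
    sig = 1.0 / (1.0 + np.exp(x_uij))       # gradient weight of -ln sigmoid(x_uij)
    W[u]     += lr * (sig * (x_i - x_j) - reg * w_u)
    X[i_pos] += lr * (sig * w_u - reg * x_i)
    X[i_neg] += lr * (-sig * w_u - reg * x_j)
```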
Finally, content-based and collaborative filtering methods can be combined, leading to hybrid recommendation approaches. For instance, individual predictions can be merged into a single, robust prediction [123, 124] or content information can be added into a collaborative filtering model [125, 126].
4.2.2 Recommendation in Education
Research on the personalization of learning is getting more and more important with the increasing use of digital learning environments, such as learning object repositories, learning management systems, and devices for mobile learning. This has made data collection an inherent process of delivering educational content to students. That means that the analysis of learning behavior is no longer only related to representative pilot studies, but also refers to the interaction of the entire student population. This trend has become even faster with the appearance of Massive Open Online Courses (MOOCs) [127] and the emergence of the Learning Analytics field [128]. The former provides massive amounts of student data and fosters new opportunities for recommender systems. The latter focuses on understanding and supporting learners based on the data they produce. Existing taxonomies generally identify seven clusters to group educational recommender systems [129], namely (i) the ones following collaborative filtering approaches as done in other domains [130, 131]; (ii) the ones that propose improvements to collaborative filtering approaches to take into account the peculiarities of the educational domain [132, 133]; (iii) the ones that explicitly consider educational constraints as a source of information [134, 135]; (iv) the ones that explore alternatives to collaborative filtering approaches [136, 137]; (v) the ones that consider contextual information [138, 139]; (vi) the ones that assess the impact of the recommendations [140, 141]; and (vii) the ones that focus on recommending courses instead of resources within them [142, 143, 144]. The field is moving and new approaches are emerging year by year. Several trends have emerged from the analysis of educational recommender systems according to the above-mentioned categories. Finding good items within courses is the most applied task, but the recommendation of sequences of items that aims to create an effective and efficient learning path through digital contents is also an important task. Along with these mainly content-driven recommendations, the recommendation of other learners who follow similar learning goals or share the same interests is a very central task. There are some new tasks appearing in recent years, which go beyond recommending learning content within courses, such as suggesting online courses. Examples of techniques are hybrid and content-based approaches, which started to be reported in 2008 and have been increasingly applied in recent years until today. There is an increasing interest in graph-based and knowledge-based approaches. They are mainly applied to address sparsity and unstructured data problems. However, collaborative filtering and rule-based approaches are still the most frequently used techniques [129].
4.2.3 Biases in Recommendation
Popularity Bias
In the existing recommendation strategies, popular items (i.e., those with more ratings) are frequently recommended and less popular ones are suggested rarely, if at all. This has led to a phenomenon called popularity bias. The authors in [101] were among the first to point at uneven results with respect to popularity in top-k recommendations. In [145], the authors revealed that social media traffic tends to exhibit high popularity bias. In addition, it has been proved that popularity bias affects several other domains [146]. In order to cope with popularity bias, researchers developed methods falling into pre-processing, in-processing, and post-processing. Pre-processing methods alter training data to reduce the potential impact of bias on it. For instance, the work in [147] split the whole item set into short-tail and long-tail parts and clustered their ratings separately. Long-tail recommendations were based on ratings from the corresponding cluster, while short-tail recommendations used the ratings of individual items. They showed that this reduces the gap for long-tail items, while maintaining reasonable performance. The authors in [146] proposed a non-uniform sampling of training triplets. The idea was to normally sample item pairs where the observed element is less popular than the unobserved one. They proved that learning from those triplets can counteract popularity bias. The authors in [148] proposed to pick unobserved items based on a probability distribution depending on item popularity. In-processing methods modify an existing algorithm to simultaneously consider relevance and popularity, performing a joint optimization or using one criterion as a constraint for the other. The authors in [149] recommended new items by considering the user's personal popularity tendency. This reasonably penalizes popular items while improving accuracy. The work in [150] mitigated popularity bias by enhancing recommendation neutrality, i.e., the statistical independence between a recommendation and its popularity. The authors in [102] modified the RankALS algorithm to push the optimization towards recommended lists that balance accuracy and short/long-tail item relevance. In [151], the authors calculated the number of common neighbours between two items with given popularities, then a balanced common-neighbour similarity index was developed by pruning popular neighbours. While all approaches improve the short/long-tail item balance, they are not linked to an internal aspect causing the bias. Such a property might be desirable to find better solutions for that particular class of algorithms. Post-processing methods seek to re-rank a recommended list returned by an existing algorithm according to certain constraints. The authors in [152, 153] presented two approaches for controlling item exposure. The first one adapted the xQuAD algorithm in order to balance the trade-off between accuracy and long-tail item coverage. The second one multiplies the relevance score for a given item with a weight inversely proportional to the item popularity. Items were re-ranked according to the weighted scores, generating popularity-distributed recommendations. Furthermore, the work in [146] determined user-specific weights that help to balance accuracy and popularity. While a post-processing approach can be applied to any algorithm, the computation of the weights highly depends on the scale and distribution of the relevance scores.
Therefore, they must be tailored to the considered algorithm and require extra computation.
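A minimal sketch of the weight-based post-processing idea just described is given below (illustrative only; the exponent alpha and the example values are assumptions, not taken from the cited works).

```python
# Minimal sketch of popularity-aware post-processing: relevance scores are
# re-weighted inversely to item popularity before re-ranking the top-k list.
import numpy as np

def rerank_by_popularity(relevance, popularity, k=10, alpha=1.0):
    """relevance: predicted scores per item; popularity: interaction counts per item."""
    weights = 1.0 / np.power(popularity + 1, alpha)   # penalize popular items
    adjusted = relevance * weights
    return np.argsort(-adjusted)[:k]                  # indices of the re-ranked top-k

# Example: the most relevant item is also very popular and gets pushed down.
rel = np.array([0.9, 0.8, 0.7, 0.6])
pop = np.array([500, 20, 5, 300])
print(rerank_by_popularity(rel, pop, k=3))
```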
Consumer-related Bias
Consumer fairness (C-Fairness) deals with ensuring that users who belong to different groups, or are similar at the individual level, will receive recommendations with the same quality. Existing works tackling consumer fairness either propose pre-processing, in-processing, or post-processing approaches.
Considering the pre-processing approaches, the authors in [154] introduced the concept of consumer fairness into tensor-based recommendation algorithms, and optimize for statistical parity (i.e., having the same recommendation quality for both groups of users). They did so by proposing an approach to identify and remove from the tensor all sensitive information about the users. By running the recommendation algorithm on the tensor without the sensitive attributes, more fair recommendations can be generated. The authors in [155], instead, generated additional artificial data to improve the fairness of the system. In their work, fairness is obtained by minimizing the difference in the mean squared error of the recommender system for two groups of users.
Regarding in-processing approaches, the authors in [156] proposed approaches to measure consumer unfairness that might come from population imbalance (i.e., a class of users characterized by a sensitive attribute being the minority) and from observation bias (i.e., a class of users who produced fewer ratings than their counterpart). They proposed four unfairness metrics, and added them to the Matrix Factorization loss function to be minimized. The authors in [157] tackled the problem of ensuring consumer fairness in recommendation ranking through pair-wise comparisons. They studied the likelihood of a clicked item being ranked above another relevant unclicked item, defining pairwise fairness. Then, they built a pairwise regularization, which penalizes the model if its ability to predict which item was clicked is better for one group than for the other.
The mitigation of consumer unfairness through a post-processing approach was proposed by the work in [158], mitigating stereotypes that might arise from the produced recommendations. The authors defined a recommendation list as fair if the predictability of a sensitive feature (e.g., the gender) of the user receiving the recommendations is lower than a given threshold. In order to achieve that goal, a re-ranking algorithm aims to minimize the loss in accuracy while guaranteeing such fairness.
Provider-related Bias
Provider fairness (P-Fairness) aims to ensure that the providers of the recommended objects, who belong to different groups or are similar at the individual level, get recommended the same number of times based on their numerical representation in the data. Provider fairness was mostly tackled through post-processing approaches. The authors in [159] achieved provider fairness in the Kiva.org platform through a re-ranking function based on xQuAD; this algorithm balances recommendation accuracy and fairness by dynamically adding a custom bonus to the items of the non-recommended providers. In the same domain, the authors in [160] tried to define the concept of local fairness and identify protected groups through the consideration of local conditions. This was done to avoid discriminating between types of loans, and to equalize access to capital across all businesses. While the results showed that it is not possible to introduce local fairness in the recommendation, refined approaches could improve local conditions. Differently from other fair recommender systems, the authors in [161] did not measure provider unfairness based on sensitive features of the users, but on the popularity of the providers. They focus on a two-sided marketplace, with the consumers being users who listen to music, and the artists being the providers. If only highly popular artists are recommended to users, there could be a disadvantage for the less popular ones. For this reason, artists are divided into ten bins based on their popularity, and a fairness metric that rewards recommendation lists diverse in terms of popularity bins is defined. Several policies are defined to study the trade-offs between user relevance and fairness, with the ones that balance the two aspects being those achieving the best trade-off.
4.3 Problem Formalization
In this section, we formalize the recommender system, accuracy metrics, popularity bias and educators’ fairness concerns, and new metrics we propose to quantify them.
4.3.1 Recommender System Formalization
Given a set of M users U = {u1, u2, ..., uM} and a set of N items I = {i1, i2, ..., iN}, we assume that users have expressed their interest for a subset of items in I. The collected feedback from observed user-item interactions can be abstracted to a set of (u, i) pairs implicitly obtained from natural user activity or (u, i, rating) triplets explicitly provided by users. We denote the user-item feedback matrix as R ∈ R^{M×N}, with R(u, i) = 1 indicating that user u interacted with item i, and R(u, i) = 0 otherwise; it is worth noticing that this binarization brings a slight simplification along the way, with no loss of generality.
Given this input, the recommender system's task is to predict the unobserved user-item relevance and thereupon deliver a set of ranked items that satisfy the needs of users. To this end, we assume that a function estimates relevance scores of unobserved entries in R for a given user and uses them for ranking the items. Formally, it can be abstracted as learning R̃(u, i) = f(u, i|θ), where R̃(u, i) denotes the predicted relevance, θ denotes the model parameters, and f denotes the function that maps parameters to scores. We assume that each user/item is internally represented through a D-sized numerical vector. More precisely, the model includes a user-vector matrix W and an item-vector matrix X. We also assume that the function f computes the dot product between the user and item vectors W_u and X_i. The higher the similarity, the higher the relevance. To rank items, they are sorted by decreasing relevance, and the top-k items are recommended, where k is the cut-off (i.e., the number of items to be shown to that user).
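To make the above formalization concrete, a minimal sketch of the scoring and ranking step is given below; the array names W, X, and R mirror the notation introduced in this section, while the function itself is only an illustrative assumption rather than the implementation used in this thesis.

```python
import numpy as np

def recommend_top_k(W, X, R, user, k=10):
    """Score every item for `user` as the dot product between the user vector
    W[user] and each item vector in X, mask items the user already interacted
    with (R[user, i] = 1), and return the indices of the top-k items."""
    relevance = X @ W[user]                # f(u, i | theta) for all items i
    relevance[R[user] == 1] = -np.inf      # do not re-recommend observed items
    return np.argsort(-relevance)[:k]      # items sorted by decreasing relevance
```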
4.3.2 Effectiveness Metrics and the Proposed Bias-related Metrics
As we deal with a multi-criteria setting, we are concerned about any accuracy loss that results from considering additional criteria. Therefore, our goal is to strike a balance between recommendation accuracy and debiasing metrics. For the former property, we rely on a well-known metric, while for the latter property we formalize novel metrics that are proposed in this thesis.
Normalized Discounted Cumulative Gain (nDCG). The Discounted Cumulative Gain (DCG) is a measure of ranking quality [162]. Through a graded relevance scale of items in a ranking, DCG measures the gain of an item based on its position in the ranking. The gain is accumulated from the top to the bottom of the list, with the gain of each result discounted at lower ranks. To compare the metric among recommended lists, all relevant items are sorted by their relative relevance, producing the maximum possible DCG, i.e., the Ideal DCG (IDCG). The normalized discounted cumulative gain (nDCG) is then computed as:
DCG@k(u|θ) = Σ_{p=1}^{k} (2^{f(u, i_p|θ)} − 1) / log_2(p + 1)   (4.1)

nDCG@k(θ) = (1/|U|) Σ_{u∈U} DCG@k(u|θ) / IDCG@k(u|θ)   (4.2)

where f(u, i_p|θ) is the relevance of the item suggested at position p to user u, and IDCG@k is calculated with the items sorted by decreasing relevance.
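As an illustration of Eqs. 4.1-4.2, a small NumPy sketch follows; it assumes, purely for the example, that the graded relevance of each user's recommended items is already available as a list per user (the variable names are hypothetical), and it computes the IDCG from the same list of relevances.

```python
import numpy as np

def ndcg_at_k(relevance_per_user, k=10):
    """nDCG@k averaged over users (Eqs. 4.1-4.2). `relevance_per_user` maps a
    user to the graded relevances of their recommended items, in ranked order."""
    scores = []
    for rels in relevance_per_user.values():
        rels = np.asarray(rels, dtype=float)[:k]
        discounts = np.log2(np.arange(2, rels.size + 2))      # log2(p + 1)
        dcg = np.sum((2.0 ** rels - 1.0) / discounts)
        ideal = np.sort(rels)[::-1]                           # best possible ordering
        idcg = np.sum((2.0 ** ideal - 1.0) / discounts)
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return float(np.mean(scores))

# Example: two users with graded relevances of their top-3 recommendations.
print(ndcg_at_k({"u1": [3, 0, 2], "u2": [0, 1, 0]}, k=3))
```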
Popularity-aware Recommendation Parity (PSP). The proposed parity aims to force the probability distributions of model outputs for different items to be the same. Considering that only the top-k items are recommended to users, we focus on the probabilities of being ranked within the top-k, which is also aligned with basic recommendation accuracy metrics, such as precision@k and recall@k. Therefore, we propose the popularity-aware statistical parity metric, which encourages P(R@k|i = i_1, θ) = P(R@k|i = i_2, θ) = ... = P(R@k|i = i_N, θ), where R@k represents being ranked within the top-k, and P(R@k|i = j, θ) is the probability of ranking item j within the top-k. Formally, we calculate the probability as follows:
P(R@k|i = j, θ) = Σ_{u∈U} φ(u, j) / Σ_{u∈U} (1 − R(u, j))   (4.3)

where φ(u, j) returns 1 if item j is ranked in the top-k for user u, and 0 otherwise; Σ_{u∈U} φ(u, j) counts how many users are being recommended item j in the top-k, and Σ_{u∈U} (1 − R(u, j)) counts how many users have never interacted with item j and thus might receive j as a recommendation. Last, to keep the same scale for different k, we compute the relative standard deviation over the probabilities to determine PSP@k:
PSP@k = std(P(R@k|i = i_1, θ), ..., P(R@k|i = i_N, θ)) / mean(P(R@k|i = i_1, θ), ..., P(R@k|i = i_N, θ))   (4.4)

where std(·) calculates the standard deviation and mean(·) calculates the mean value.
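A sketch of how PSP@k could be computed from a binary top-k indicator matrix is shown below; the matrix names are illustrative, and the guard against division by zero is an assumption added for robustness.

```python
import numpy as np

def psp_at_k(topk_hits, R):
    """PSP@k (Eqs. 4.3-4.4). `topk_hits[u, j]` is 1 if item j appears in user u's
    top-k list; `R[u, j]` is 1 if user u interacted with item j."""
    recommended = topk_hits.sum(axis=0)            # users recommended each item
    eligible = (1 - R).sum(axis=0)                 # users who never rated the item
    prob = recommended / np.maximum(eligible, 1)   # P(R@k | i = j, theta)
    return prob.std() / prob.mean()                # relative standard deviation
```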
Popularity-aware Equality of Treatment (PEO). The proposed equality of treatment aims to encourage the True Positive Rates (TPRs) of different items to be the same. We define the TPR as the probability of being ranked within the top-k, given the ground-truth that the item is relevant for the user in the test set. This is noted as P(R@k|i = j, y = 1, θ), where y = 1 defines that items are relevant for users. This value can be formally computed as:
P(R@k|i = j, y = 1, θ) = Σ_{u∈U} φ(u, j) R(u, j) / Σ_{u∈U} R(u, j)   (4.5)

where Σ_{u∈U} φ(u, j) R(u, j) counts how many users who interacted with item j (test set) are being recommended item j in the top-k, and Σ_{u∈U} R(u, j) counts how many users have interacted with item j (test set). Then, we calculate the relative standard deviation to determine PEO@k:
PEO@k = std(P(R@k|i = i_1, y = 1, θ), ..., P(R@k|i = i_N, y = 1, θ)) / mean(P(R@k|i = i_1, y = 1, θ), ..., P(R@k|i = i_N, y = 1, θ))   (4.6)
The reader may notice that the TPR is equal to recall in classification tasks, while the proposed probability P(R@k|i = j, y = 1, θ) is equal to the recall@k of item j in recommendation tasks. Therefore, mitigating bias based on equality of treatment enforces recall@k to be similar across different items.
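Analogously, PEO@k can be sketched as the relative standard deviation of the per-item recall@k (true positive rate); items without test ratings are skipped here, an assumption made only to avoid division by zero.

```python
import numpy as np

def peo_at_k(topk_hits, R_test):
    """PEO@k (Eqs. 4.5-4.6). `topk_hits[u, j]` is 1 if item j is in user u's top-k
    list; `R_test[u, j]` is 1 if item j is relevant for user u in the test set."""
    hits = (topk_hits * R_test).sum(axis=0)     # relevant items actually recommended
    relevant = R_test.sum(axis=0)               # users for which the item is relevant
    mask = relevant > 0                         # keep items with at least one test rating
    tpr = hits[mask] / relevant[mask]           # P(R@k | i = j, y = 1, theta)
    return tpr.std() / tpr.mean()
```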
Educators’ Gender Fairness. We measure it as the equality of representation between the number of items per educator gender and the number of recommendations per educator gender. The score ranges between 0 and 1. Higher values indicate greater educators’ fairness with respect to their gender. The ItemExposureFair@k (IEF@k) for women educators is defined as:
IEF@k(θ) = 1 − abs( |I_women| / |I| − (Σ_{u∈U, i∈I_women} φ(u, i|θ)) / (Σ_{u∈U, i∈I} φ(u, i|θ)) )   (4.7)

where φ(u, i|θ) is 1 if item i is recommended to u in the top-k list, and 0 otherwise. The metric can be analogously computed for men educators.
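The exposure-based fairness score of Eq. 4.7 could be computed as sketched below; `item_gender` is an assumed per-item array of educator genders, introduced only for this illustration.

```python
import numpy as np

def ief_at_k(topk_hits, item_gender, group="women"):
    """IEF@k (Eq. 4.7): 1 minus the absolute gap between the group's share of
    the catalog and its share of the delivered recommendations."""
    in_group = np.asarray(item_gender) == group
    catalog_share = in_group.mean()                       # |I_group| / |I|
    recs_per_item = topk_hits.sum(axis=0)
    exposure_share = recs_per_item[in_group].sum() / max(recs_per_item.sum(), 1)
    return 1.0 - abs(catalog_share - exposure_share)
```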
Under our multi-criteria setting, note that the higher the nDCG value, the more accurate the recommendations are. For PSP@k and PEO@k, lower values indicate that recommendations are less biased, whereas for IEF@k higher values indicate a fairer exposure of educators. From a practical perspective, we argue that a good nDCG-PSP trade-off is important in scenarios where an equal recommendation distribution among items is required regardless of the existing imbalance in the data, such as circumstances where items are connected with critical or sensitive information¹. Conversely, a good nDCG-PEO trade-off is preferred in recommenders that want to preserve the existing imbalance in the original data, but prevent algorithmic bias. Finally, a good nDCG-IEF trade-off characterizes recommender systems that are both accurate and unbiased against educators’ gender.
In what follows, we introduce a new data set that serves to investigate the above problems in the educational domain. Then, we present an exploratory analysis and technical approaches that aim to tackle the mentioned trade-offs.
4.4 The Proposed COCO-RS Data Set
COCO-RS contains over 600k ratings from more than 30k learners, extracted from courses delivered on Udemy, one of the leading global marketplaces for online learning and teaching at scale. Unlike academic-oriented platforms devoted to traditional coursework, Udemy enables experts in various areas to offer courses at no charge or for tuition fees.
¹ We point out that, while this popularity metric and the related trade-off may not fit all recommendation scenarios universally, it can help drive algorithmic decisions for systems that need to address such popularity concerns due to specific business objectives.
Data Set            | #Users  | #Items     | #Ratings   | Context    | Attributes
Book Crossing [163] | 105,283 | 340,556    | 1,149,780  | Books      | -
Epinions [164]      | 120,492 | 755,760    | 13,668,320 | Social Web | -
Last-FM [165]       | 1,892   | 17,632     | 92,834     | Music      | -
Movielens 1M [166]  | 6,040   | 3,705      | 998,131    | Movies     | C/P Gender
Movielens 10M [166] | 69,878  | 10,676     | 9,973,604  | Movies     | -
Movielens 20M [166] | 138,493 | 26,744     | 20,000,263 | Movies     | -
DAJEEE [167]        | -       | 20,000,000 | -          | Resources  | -
TED [168]           | 69,000  | 1,000      | 100,000    | Talks      | -
Merlot [169]        | -       | 40,000     | -          | Resources  | -
MACE [170]          | -       | 150,000    | -          | Resources  | -
COCO-RS             | 37,704  | 30,399     | 617,588    | Courses    | C/P Gender
Table 4.1: Representative data sets for recommendation.

In comparison with other online course platforms, no third-party control on the reliability, validity, accuracy, or truthfulness of the course content is performed. All copyright and registered trademarks remain property of their owners. To the best of our knowledge, there exists no other large-scale benchmark data set that includes educational ratings from online courses. Table 4.1 summarises other representative data sets used for recommendation. While several large-scale data sets from other domains have been successfully collected, existing data sets from the educational domain include few users, items, and ratings, and lack sensitive attributes suitable for fairness studies.
4.4.1 Collection Methodology
This section describes our multi-stage approach for collecting a large educational ratings data set. The pipeline is summarised in Fig. 4.1, and its key stages are discussed below:
1. Candidate Courses Retrieval: The first stage is to obtain a list of courses from Udemy. We start from the list of courses returned by the Udemy APIs², which expose functionalities that help developers access content and build external applications. The script retrieved 43,023 courses, dumped in November 2017.
2. Course Ratings Download: To retrieve the ratings released by learners, the script uses the Udemy APIs method that returns the course ratings given a course identifier. The query result gave 6M rows, each with the course id, the timestamp, the rating between 0 and 5 with step 0.5, and the name of the learner who released it.
3. Data Cleaning and Pruning: To ensure that the data set can be used for benchmarking recommendation, we kept only learners who released at least 10 ratings.
² https://www.udemy.com/developers/
Fig. 4.1: Collection pipeline for COCO-RS data set.
Fig. 4.2: Structure of the COCO-RS data set.
The re-sampled data set includes 37k users, who gave 600k ratings to 30k courses.
4. Stakeholders’ Gender Extraction: The Udemy APIs do not give any information regarding sensitive attributes of learners and instructors. By capitalizing on the names of learners retrieved at step 2 and the names of instructors retrieved by a hand-crafted crawler, we used Gender APIs³ to infer the gender starting from the name.
5. User IDs Anonymization: As we were dealing with sensitive personal attributes, we anonymized both learner IDs and instructor IDs. Each type of ID was re-scaled between 0 and n_users − 1, where n_users corresponds to the number of learners and instructors, respectively (a minimal sketch of this re-scaling is given after the list). No other identifier is linked to original identities.
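Purely as an illustration of step 5, the following sketch maps original identifiers to contiguous pseudonymous integers in [0, n_users − 1]; the shuffling step and the function name are assumptions, not the exact procedure used for COCO-RS.

```python
import numpy as np

def anonymize_ids(original_ids, seed=0):
    """Replace each original identifier with a pseudonymous integer so that no
    identifier in the released data set corresponds to a real platform id."""
    unique_ids = np.unique(original_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique_ids)                                   # break any ordering link
    mapping = {orig: new for new, orig in enumerate(unique_ids)}
    return [mapping[i] for i in original_ids], mapping
```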
4.4.2 Structure and Statistics
COCO-RS is a CSV-based collection whose structure in terms of entities and associations is depicted in Fig. 4.2. Text attributes use Unicode encoding, while languages and timestamps follow the ISO 639-1 and ISO 8601 standards, respectively.
Course is the most informative entity. First, the id and the course URL provide unique identification attributes. Then, the course is described by short and long descriptions. Requirements and objectives list the technical and pedagogical needs at the beginning and the expected learner skills at the end, respectively.
³ https://genderapi.io/
Fig. 4.3: Ratings/year on COCO-RS.
Fig. 4.4: Ratings/category on COCO-RS.
Fig. 4.5: Rating distribution on COCO-RS.
Fig. 4.6: Popularity tail on COCO-RS.

The language, the instructional level (beginner, intermediate, expert), first/second-level categories, and tags are also listed. Each course has only one first-level category and one second-level category. Other course fields identify the current price and the discount. Numerical attributes list the course duration in hours.
Due to privacy constraints, the Instructor and Learner entities only include information available on the corresponding public profiles. Each entity instance is uniquely identified by a fake id, so that the id stored in the data set does not correspond to the real id of the user. Each instructor is described by a job title and a biography. Regarding relationships, in Teach, the pairs of instructor id and course id model the association between instructors and the courses they teach. One instructor can teach more than one course, and the same course can have one or more instructors. Each pair of course id and learner id in Participate defines the courses that the learner attended. Evaluate contains the learner id, the course id, the [0-5] rating with step 0.5, and the timestamp.
By counting the ratings released over time, it can be observed that there is an increasing trend over the years (Fig. 4.3). The data set includes 30,399 courses, distributed over 13 first-level categories (Fig. 4.4), that received 617,588 ratings. The rating distribution is unbalanced across categories (avg. 47,506; st.dev. 62,615; min 688; max 240,047). Similarly, the language distribution across courses follows such a trend: only 21% of the courses do not use English as their primary language. Furthermore, the data set includes 37,793 learners. The ratings are non-uniformly distributed across the range of values they can hold (Fig. 4.5). Most of the ratings have a value of 5, meaning that learners usually recognize courses as being of high quality. However, only a few courses received lots of ratings, while the rest have been rarely evaluated. This means that the popularity tail in COCO-RS is highly skewed (Fig. 4.6)⁴. The sparsity of the rating matrix is therefore 99.95%. The distribution of users across genders is highly unbalanced, with 84% men and 16% women. The same gender representation is maintained in the ratings. Only learners with at least 10 ratings are included, while each course can have one or more ratings. The distribution of the number of ratings per course (avg. 20; st.dev. 72; min. 1; max. 2,812) shows a downward trend, with only a small number of courses receiving a lot of ratings. Finally, COCO-RS also includes 7,411 instructors, 72% men and 28% women. Few instructors teach several courses (avg. 3; st.dev. 7; min 1; max 453).
4.5 The Proposed Method for Popularity Debiasing
4.5.1 Exploratory Analysis on Traditional Recommenders
To better understand the need to consider recommendation effectiveness and popularity bias metrics jointly, we begin by providing an exploratory analysis. We leveraged the Java recommendation framework LibRec [171] to evaluate traditional collaborative filtering algorithms on COCO-RS, due to their popularity in e-learning contexts [129, 172]. The investigated algorithms are listed below:
• Non-Personalized (NP) baselines:
  – Random: randomly recommending items;
  – MostPop: recommending the most frequently-consumed items;
  – ItemAvg: recommending the items with the highest average rating;
  – UserAvg: recommending the items with the highest user average rating.
• Standard Collaborative Filtering (SCF) algorithms:
  – ItemKNN: item-based collaborative filter (Cosine, K-NN, k = 100);
  – UserKNN: user-based collaborative filter (Cosine, K-NN, k = 100).
• Matrix Factorization (MF) methods:
  – SVD++: gradient descent matrix factorization (LatentFactors = 40) [173];
  – WRMF: weighted regularized matrix factorization (LatentFactors = 40) [174].
⁴ Existing works often include the first 20% most popular items within the short-tail set and the rest in the long-tail set, but this can lead to unfair intra-set popularity, especially in highly-skewed tails. Therefore, short- and long-tail items have been grouped so that the Gini index of the popularity distribution in each set is 33%. Moreover, since distant-tail items receive so few ratings that meaningful cross-user comparison becomes noisy, our work focuses only on the short and long tails.
• Learning-To-Rank (LTR) algorithms:
  – AoBPR: a variant of BPR manipulating uniform sampling pairs [175];
  – BPR: Bayesian personalized ranking technique for implicit feedback [119];
  – Hybrid: hybrid integrating diversity- and accuracy-focused approaches [176];
  – LDA: a filtering approach leveraging Latent Dirichlet Allocation [177].
It should be noted that, while we can generate a ranking of the items a user has not evaluated yet by predicting their missing ratings, LTR methods directly maximize ranking quality, so their recommendations can differ substantially from those of rating-prediction-based algorithms.
Prediction and Ranking Effectiveness
First, we evaluate the recommendation effectiveness, considering metrics that evaluate rating prediction accuracy against those that measure ranking quality. As in similar studies [178], we employed a 5-fold cross validation based on a user-sampling strategy. We split the users into five test sets, each serving as the test set of a given fold. In each fold, for each user in the corresponding test set, we selected 5 ratings to be the test ratings, while the rest of their ratings and all the ratings from users not in that test set were the train ratings. Each algorithm was run in both rating prediction and top-10 item ranking mode. We chose top-10 recommendations since they probably get the most attention and 10 is a widely employed cut-off [93]. Root Mean Squared Error (RMSE) evaluated the accuracy of the rating predictions (i.e., the lower the better). Area Under the Curve (AUC), precision, recall, and Normalized Discounted Cumulative Gain (nDCG) [162] measured the recommended list accuracy (i.e., the higher the better).
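A sketch of this user-sampling protocol is given below; the data structure (a dict mapping each user to their rating records) and the function name are illustrative assumptions, not the LibRec configuration actually used.

```python
import numpy as np

def user_sampled_folds(ratings_by_user, n_folds=5, n_test_ratings=5, seed=42):
    """Yield (train, test) rating lists for each fold: users are split into
    `n_folds` groups, and every user in the fold's test group contributes
    `n_test_ratings` randomly chosen test ratings."""
    rng = np.random.default_rng(seed)
    users = list(ratings_by_user.keys())
    rng.shuffle(users)
    for test_users in np.array_split(np.array(users, dtype=object), n_folds):
        test_user_set = set(test_users.tolist())
        train, test = [], []
        for user, ratings in ratings_by_user.items():
            if user in test_user_set and len(ratings) >= n_test_ratings:
                picked = set(rng.choice(len(ratings), n_test_ratings, replace=False).tolist())
                test += [r for idx, r in enumerate(ratings) if idx in picked]
                train += [r for idx, r in enumerate(ratings) if idx not in picked]
            else:
                train += list(ratings)
        yield train, test
```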
Family | Method  | RMSE | AUC  | Prec@10 | Rec@10 | NDCG
MF     | SVD++   | 0.68 | 0.50 | 0.005   | 0.001  | 0.008
NP     | UserAvg | 0.70 | 0.50 | 0.004   | 0.007  | 0.005
SCF    | UserKNN | 0.71 | 0.68 | 0.050   | 0.101  | 0.095
SCF    | ItemKNN | 0.76 | 0.69 | 0.051   | 0.102  | 0.092
NP     | ItemAvg | 0.78 | 0.50 | 0.005   | 0.008  | 0.005
NP     | MostPop | 1.07 | 0.60 | 0.023   | 0.046  | 0.038
LTR    | BPR     | 2.08 | 0.69 | 0.054   | 0.109  | 0.094
LTR    | AoBPR   | 2.34 | 0.69 | 0.054   | 0.108  | 0.094
NP     | Random  | 2.36 | 0.50 | 0.004   | 0.008  | 0.005
LTR    | LDA     | 4.11 | 0.66 | 0.042   | 0.085  | 0.074
LTR    | Hybrid  | 4.11 | 0.55 | 0.018   | 0.037  | 0.029
MF     | WRMF    | 4.12 | 0.71 | 0.062   | 0.124  | 0.114
Table 4.2: The accuracy of the algorithms on rating prediction and top-10 ranking.
Table 4.2 shows the results. The best ones are printed in bold when they were significantly better than all the others (paired two-tailed Student’s t-tests with p = 0.05). On RMSE, the MF approach SVD++ significantly outperformed all the other schemes. However, the rather simple non-personalized UserAvg yielded accuracy comparable to SVD++, and was better than other computationally expensive schemes like ItemKNN and BPR. The latter was significantly better than the other LTR approaches. ItemAvg, which simply considers an item’s average rating, achieved results in line with ItemKNN. The WRMF method performed, somewhat surprisingly, worse than a lot of the traditional ones. The ranking of the algorithms on RMSE is not consistent with other contexts [146]. This confirms that data set characteristics like size, sparsity, and rating distributions can affect recommendation accuracy [179].
The results on item ranking in terms of nDCG led to a completely different algorithm ranking. BPR and AoBPR achieved the best performance together with WRMF and UserKNN. Except for Hybrid, the LTR methods performed consistently better than MostPop. In line with the results in [180], MostPop performed quite poorly, probably due to the wide range of categories included in the data set. Although ItemKNN is rather simple, it performed much better than almost all the NP baselines, and reached results comparable to the LTR schemes. SVD++ led to mediocre results, whereas it was the best method in rating prediction. In contrast, WRMF achieved the highest accuracy.
While the accuracy of some algorithms is almost equal, the top-10 lists varied greatly. In view of these differences, we calculated the average overlap of the courses recommended by each pair of algorithms to the same user (Fig. 4.7). Such an overlap is high for pairs like (WRMF, UserKNN), (UserKNN, LDA), and (AoBPR, BPR). Hybrid and SVD++ recommended courses that are not typically proposed by the other algorithms. However, since MostPop performs reasonably well and its recommendations partly overlap with those of several other algorithms, the latter may tend to recommend popular courses as well.
Fig. 4.7: The average overlap per user between the top-10 lists.
Course Popularity Bias
We then explored how the course popularity in the data influences the algorithms. To this end, we evaluated how popular the courses recommended by an algorithm are, assessing its capability to suggest relevant but not popular ones.
Table 4.3 presents the popularity of the recommended courses as the number of ratings they received. By design, MostPop has the highest average popularity, since it recommends best sellers. The recommended courses received about 1,500 ratings on average. LDA and WRMF also showed a popularity bias, with 586 and 404 ratings per recommended course, respectively. On the other hand, some algorithms are not biased towards course popularity. SVD++, ItemKNN, AoBPR, and BPR recommended several courses from the long tail. Interestingly, only Hybrid recommended niche and unpopular courses, and its average number of ratings, 11, is lower than the average number of ratings per course in the catalog, 20. The other NP baselines achieved a good trade-off.
To obtain a detailed picture, we sorted the courses according to the number of ratings in the dataset and organized them in bins of 1000 courses (Fig. 4.8); the first bin contains the least rated courses, while subsequent ones contain courses of increasing popularity. Then, we counted how many items from each bin an algorithm recommends. Except for Hybrid, Random, and SVD++, all the algorithms often recommended courses from the bin of the most popular ones (bin30). In BPR, course popularity seems to be directly related to the chance of being recommended. SVD++ and Hybrid seem to be good options to recommend niche courses. Interestingly, Hybrid tends to recommend more unpopular courses than popular ones.
Receiving a lot of ratings does not imply people liked a course. The correlation between the number of ratings and the average rating is weak, 0.11. Therefore, we measured the average rating of a course as another popularity indicator.
Family | Algorithm | Avg. / St.Dev. Rating | Avg. / St.Dev. Number of Ratings
MF     | SVD++     | 4.76 / 0.21           | 134 / 267
NP     | MostPop   | 4.71 / 0.07           | 1545 / 588
NP     | ItemAvg   | 4.70 / 0.42           | 15 / 3
MF     | WRMF      | 4.68 / 0.17           | 404 / 393
LTR    | LDA       | 4.64 / 0.14           | 586 / 515
SCF    | UserKNN   | 4.63 / 0.21           | 192 / 296
NP     | UserAvg   | 4.60 / 0.20           | 341 / 524
LTR    | AoBPR     | 4.58 / 0.25           | 71 / 152
SCF    | ItemKNN   | 4.55 / 0.23           | 88 / 168
LTR    | BPR       | 4.55 / 0.27           | 67 / 144
NP     | Random    | 4.47 / 0.58           | 20 / 73
LTR    | Hybrid    | 4.44 / 0.72           | 11 / 57
Table 4.3: The popularity of the recommended items.
Fig. 4.8: The distribution of the recommended courses over the catalog.

It does not tell whether the course is really liked by a large number of people, but it can help to see if some algorithms tend to concentrate on highly-rated and probably less-known courses. Table 4.3 shows that a lot of algorithms recommend courses that were rated, on average, above 4.44 (the global average is 4.47). Furthermore, some algorithms (i.e., SVD++, MostPop, and WRMF) recommended a lot of courses with a high average rating, and low-rated courses are rarely recommended. LDA focuses on highly-rated courses (4.64) and is significantly different from the other LTR methods. For algorithms not optimized for rating prediction, the average rating is low and closer to the global average. This means that they do not take the average rating into account and also recommended low-rated courses. The average rating of the MostPop recommendations is 4.71, so well-known courses are also top-rated.
Catalog Coverage and Concentration Bias
To check whether the recommender system is guiding users to long-tail or niche courses, we should count how many courses in the catalog are recommended. Hence, we looked at the course space coverage and the concentration effects of the algorithms.
We counted the number of different courses appearing in the lists (Table 4.4). The results show that the coverage can be quite different across the algorithms. Except for Random, only Hybrid recommends substantially more courses than all the other techniques, almost half of the whole catalog. This is in line with the idea behind Hybrid: balancing diversity and rating prediction accuracy. However, in our context, we found it achieved good diversity, but low prediction accuracy. The other LTR approaches provided a coverage of around 20%, except LDA (1%).
Family | Algorithm | Coverage | Catalog Percentage | Gini Index
NP     | Random    | 30399    | 100.00             | 0.16
LTR    | Hybrid    | 12735    | 41.90              | 0.77
LTR    | BPR       | 6514     | 21.43              | 0.85
LTR    | AoBPR     | 5857     | 19.27              | 0.89
SCF    | ItemKNN   | 4653     | 15.31              | 0.89
SCF    | UserKNN   | 1183     | 3.89               | 0.89
MF     | SVD++     | 1121     | 3.68               | 0.88
MF     | WRMF      | 457      | 1.50               | 0.68
LTR    | LDA       | 200      | 0.65               | 0.64
NP     | MostPop   | 29       | 0.09               | 0.63
NP     | UserAvg   | 14       | 0.04               | 0.17
NP     | ItemAvg   | 12       | 0.04               | 0.28
Table 4.4: The catalog coverage per algorithm out of 30,399 courses.
Fig. 4.9: The distribution of the number of recommendations.

KNN methods showed a limited catalog coverage, confirming the results in [181]. In contrast to [182], the algorithms performing best on prediction accuracy are not the best ones for catalog coverage. These differences would go unnoticed if only accuracy were considered.
Catalog coverage does not reveal how often each course was recommended. Thus, we captured inequalities with respect to how frequently the courses appeared. For each course suggested by an algorithm, we counted how often it is contained in the lists of that algorithm. The courses are sorted in descending order, according to the number of times they appeared in the lists, and grouped in bins of 10 courses. Bin1 contains the most recommended courses. Fig. 4.9 shows the four bins (out of 3,040) with the 40 most frequently recommended courses. The Y-axis shows the percentage of recommendations the algorithm has given for the courses in the corresponding bin with respect to the total number of suggestions provided by that algorithm. While SVD++ and ItemKNN recommended a number of different courses, most of them were rarely proposed. BPR, AoBPR, and WRMF, which had a good catalog coverage, provided about 20% of their suggestions from the 40 most often recommended courses. In Table 4.4, we also report the Gini index to observe the inequality with respect to how often certain courses are recommended, where 0 means an equal distribution and 1 corresponds to maximal inequality [183]. Except for the NP baselines, Hybrid and BPR have the weakest concentration bias. Compared to BPR, Hybrid’s Gini index is lower, showing a more balanced distribution of recommendations.
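For reference, the Gini index reported in Table 4.4 can be computed from the per-course recommendation counts roughly as follows; this is a standard formulation given as an illustration, not the exact LibRec implementation.

```python
import numpy as np

def gini_index(recommendation_counts):
    """Gini index of recommendation frequencies: 0 means every recommended
    course appears equally often, values close to 1 mean maximal concentration."""
    counts = np.sort(np.asarray(recommendation_counts, dtype=float))
    n = counts.size
    cumulative = np.cumsum(counts)
    # Standard formula based on the ranked cumulative distribution.
    return (n + 1 - 2 * np.sum(cumulative) / cumulative[-1]) / n

# Example: one course receives almost all recommendations -> high concentration.
print(gini_index([1, 1, 1, 1, 96]))
```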
Course Category Popularity Bias
E-learning platforms are often equipped with a taxonomy that associates each course with one or more categories. This attribute does not imply anything about the quality of a course, but the distribution of the number of ratings can vary greatly across categories. Although such variability is natural, given the heterogeneity of users and courses, it makes the aggregated ratings commonly used by algorithms incomparable across categories and thus prone to bias issues. The course category popularity bias could even influence whether learners perceive the recommendations as useful for deepening the knowledge in a preferred category or for fostering a multi-disciplinary knowledge in unexplored categories. Therefore, we focused on the popularity of the category to which courses belong and on how the popularity bias affecting course categories in the data propagates to the recommended lists.
We counted how many different course categories appeared in the lists. UserAvg exhibited only 3 out of 13 different categories, while MostPop and ItemAvg recommended 5 and 8 categories, respectively.
Fig. 4.10: The distribution of the recommended courses over course categories.
Except for LDA (10 categories), all the other algorithms provided a full coverage of the categories. To obtain a clearer picture, we sorted the 13 categories according to their increasing number of ratings in the dataset. Bin12 represents the most popular category. For each algorithm, we counted how many recommendations per category were provided in the recommended lists. Fig. 4.10 shows the distribution of the recommendations per category. BPR, ItemKNN, LDA, and WRMF showed a bias towards the most popular category: more than 50% of their recommendations came from it. Hybrid and SVD++ offered a more uniform distribution across categories.
In this context, it was also important to measure how much each algorithm reinforces or reduces the bias towards a given category. Fig. 4.11 shows the bias related to course category popularity. Each rectangle shows the percentage of increment / decrement in the recommended courses per category with respect to the ratings per category in the dataset. Considering that “development” is the most popular category, when producing recommendations, MostPop reinforces its popularity by 50%. Hybrid and SVD++ caused a 10% popularity reduction in courses of this category. Hence, their recommendations can meet the needs of those not interested only in “development” courses.
To sum up, the differences in accuracy between some algorithms are very small. For instance, the best-performing techniques, SVD++ and UserAvg, have a difference of 0.02 in RMSE. The algorithm ranking based on nDCG was quite different. However, the analysis regarding catalog coverage and concentration showed that our findings on popularity bias can contrast with the ones observed for accuracy. Hence, we point out that, if the goal is to guide learners to different areas of the course catalog, algorithms should not be optimized on accuracy alone. In fact, Hybrid did not perform well on prediction accuracy, but covered half of the catalog. In contrast, SVD++ had a better prediction accuracy, but recommended only 4% of the courses.
Fig. 4.11: The reinforcement produced by algorithms over course categories.
Fig. 4.12: The neural personalized ranking model explored in our analysis.
4.5.2 Exploratory Analysis on Neural Recommenders
In this section, we extend our exploratory analysis to one of the most widely adopted algorithms, Neural Personalized Ranking (NPR) [118]. We show that it is vulnerable to imbalanced data and tends to produce biased recommendations according to PSP and PEO.
Neural Personalized Ranking Description
NPR is one of the most influential methods for solving recommendation problems, and it is the foundation of many cutting-edge personalized ranking algorithms [184, 185, 186]. This algorithm adopts matrix factorization [187] as its base, and estimates the model parameters θ through a pair-wise objective. The latter maximizes the margin between the relevance f(u, i|θ) predicted for an observed item i and the relevance f(u, j|θ) predicted for an unobserved item j, given the interactions in R (Fig. 4.12), and is defined as:
argmax_θ Σ_{u∈U, i∈I_u^+, j∈I_u^−} δ(f(u, i|θ) − f(u, j|θ))   (4.8)

where δ(·) is the sigmoid function, and I_u^+ and I_u^− are the sets of items for which user u’s feedback is observed or unobserved, respectively.
Before model training, the embedding matrices are initialized with values uniformly distributed between 0 and 1, and the loss function is transformed into the equivalent minimization problem. For 20 epochs, the model is served with batches of 1024 triplets. More precisely, for each user u, we create 10 triplets (u, i, j) per observed item i; the unobserved item j is randomly selected. The optimizer used for gradient updates is Adam⁵.
⁵ In our work, we are more interested in better understanding algorithm characteristics beyond accuracy, so the further accuracy improvements that could probably be achieved through hyper-parameter tuning would not substantially affect the outcomes of our analyses.
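The thesis trains its models in Python on top of Keras; the snippet below is only a minimal TensorFlow sketch of one pair-wise update for Eq. 4.8 in its minimization form, with hypothetical sizes and uniformly initialized embedding matrices W and X.

```python
import numpy as np
import tensorflow as tf

num_users, num_items, dim = 1000, 500, 40                         # hypothetical sizes
W = tf.Variable(tf.random.uniform((num_users, dim), 0.0, 1.0))    # user vectors
X = tf.Variable(tf.random.uniform((num_items, dim), 0.0, 1.0))    # item vectors
optimizer = tf.keras.optimizers.Adam()

def train_step(u, i, j):
    """One gradient step on a batch of (user, observed item, unobserved item)
    triplets, minimizing -log sigma(f(u,i) - f(u,j))."""
    with tf.GradientTape() as tape:
        rel_i = tf.reduce_sum(tf.gather(W, u) * tf.gather(X, i), axis=1)
        rel_j = tf.reduce_sum(tf.gather(W, u) * tf.gather(X, j), axis=1)
        loss = -tf.reduce_mean(tf.math.log_sigmoid(rel_i - rel_j))
    grads = tape.gradient(loss, [W, X])
    optimizer.apply_gradients(zip(grads, [W, X]))
    return loss

# One toy batch of 1024 triplets with random indices.
u = np.random.randint(0, num_users, 1024)
i = np.random.randint(0, num_items, 1024)
j = np.random.randint(0, num_items, 1024)
train_step(u, i, j)
```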
Fig. 4.13: Recommendation accuracy for the explored recommenders.
Recommendation Accuracy Analysis
Throughout our exploratory analysis, NPR is compared with two non-personalized baselines, namely the popularity-based recommender Most Pop and the random guess recommender Random. Such baselines are opposite with respect to popularity bias: a random recommender is insensitive to popularity and uniformly recommends items across the catalog; a popularity-based recommender ignores tail items and suggests the same few popular items to everyone⁶. For statistical significance, we use paired two-tailed Student’s t-tests with a p-value of 0.05. We adopt two datasets with diverse item distributions: MovieLens 1M (ML1M) [166] contains 1M ratings applied to 3K movies by 6K users of the online service MovieLens, where each user rated at least 20 movies; COCO-RS [18] contains 600k ratings applied to 30K courses by 37K users of the educational platform Udemy, where each user rated at least 10 courses. We treated ratings as positive feedback, i.e., users are interested in rated items.
Figure 4.13 plots the nDCG@k accuracy metric measured for NPR, Most Pop, and Random in MovieLens 1M over the ALL test setup (top-left) and the UAR test setup (top-right), and in COCO-RS over the ALL (bottom-left) and the UAR (bottom-right) test setups. Evaluated on all test ratings (ALL test setup, left plots), NPR (blue line) significantly outperformed Most Pop (orange line) and Random (green line) on both datasets. However, the performance achieved by Most Pop seems to be highly competitive. This might reveal that the rating-per-item distribution on the test set is highly unbalanced towards popular items, and it can bias evaluation metrics in favor of popular items.
⁶ Even though comparing an algorithm against Most Pop and Random has been previously well studied [146, 17], there is no evidence on how the new bias metrics model their outcomes.

Furthermore, we observed that the gap in nDCG between NPR and Most Pop increases at higher cut-offs. Most Pop can identify relevant items, but tends not to put them in top positions.
To measure the robustness of the involved recommenders to rating-per-item imbalances within the test set, we examined an alternative experimental configuration, where the statistical role of popularity gets reduced. More precisely, we considered a test set, which is a subset of the original test set, where all items have the same amount of test ratings (UAR test setup) [188]. Results are depicted in the right plots of Figure 4.13. It can be observed that nDCG scores decrease under the latter evaluation setup for both NPR and Most Pop. Both algorithms have some degree of bias towards popular items.
The fact that NPR strongly adheres to Most Pop might suggest that a recommender system optimized for recommendation accuracy would not by default produce recommendation sets with low popularity bias estimates. We conjecture that optimizing for accuracy, without explicitly considering popularity bias, has an adverse impact on the latter. To demonstrate this, we present a number of results on popularity.
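A possible way to build the UAR test subset (equal number of test ratings per item) is sketched below with pandas; the column names and the sampling strategy are assumptions made for illustration, not the exact procedure of [188].

```python
import numpy as np
import pandas as pd

def uar_test_set(test_df, n_per_item=None, seed=42):
    """Return a subset of `test_df` in which every item keeps the same number
    of test ratings. `test_df` is assumed to have an 'item_id' column."""
    rng = np.random.default_rng(seed)
    counts = test_df.groupby("item_id").size()
    if n_per_item is None:
        n_per_item = int(counts.min())        # largest size every item can provide
    kept_indices = []
    for _, group in test_df.groupby("item_id"):
        kept_indices.extend(rng.choice(group.index, size=n_per_item, replace=False))
    return test_df.loc[kept_indices]
```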
Popularity Bias Analysis
Motivated by the accuracy results, we analyze the ranking probability distributions of NPR, Most Pop, and Random. To this end, Figure 4.14 plots bias scores in MovieLens 1M as recommendation parity (top-left) and equality of treatment (top-right), and in COCO-RS as recommendation parity (bottom-left) and equality of treatment (bottom-right). From the figure, we can conclude that NPR and Most Pop fail to hold good levels of recommendation parity (left plots). This derives from the fact that their PSP@k is tremendously higher than the PSP@k achieved by Random, which has a perfect recommendation parity by default.
Fig. 4.14: Popularity bias for the explored recommenders.
Fig. 4.15: Item catalog metrics for the explored recommenders.

Furthermore, the results on equality of treatment (right plots) point to a different view. PEO@k can identify situations in which some items, often those which are more popular, might get smaller errors. From the figure, Most Pop and Random lead to very high PEO@k scores. This is due to the fact that they achieve a high TPR for a few popular items and a null TPR for the rest of the catalog.
We also plotted the Average Recommended Item Popularity (ARIP) and the Item Catalog Coverage (COV) for the explored recommenders. The results are plotted in Figure 4.15: the average recommended item popularity (top-left) and item catalog coverage (top-right) in MovieLens 1M; the average recommended item popularity (bottom-left) and item catalog coverage (bottom-right) in COCO-RS. The average popularity of an item recommended by NPR is around 50% lower than the one measured for Most Pop (left plots). This seems to be a significant improvement at first glance but, considering the item popularity distribution, we can observe that such popularity values along the cut-offs still fall into the head item set. Hence, NPR still tends to recommend very popular items. On item catalog coverage (right plots), NPR achieves full coverage at k = 200. This means that, at smaller cut-offs, the recommended items do not greatly differ among users, and only a tiny part of the catalog is exposed.
Popularity Bias Diagnosis
Starting from the findings on recommendation accuracy and popularity bias, we seek to uncover some internal mechanics that foster bias propagation in NPR.
First, we applied the t-SNE dimensionality reduction technique to both embedding matrices, so that user and item embeddings are mapped onto the same bi-dimensional space. Figure 4.16 plots the user and item embeddings computed by NPR on MovieLens 1M (left) and COCO-RS (right). It can be observed that user embeddings are mapped closer to short-tail items than to long-tail ones. It follows that, when computing the user-item relevance, the items closest to a user often come from the head.
To have a more detailed view, we also analyzed the distribution of user-item relevance scores for observed head-mid items and observed tail items, separately. Figure 4.17 plots the relevance distributions for head-mid and tail item-user comparisons by NPR in MovieLens 1M (left) and COCO-RS (right). Such distributions are surprisingly cleanly Gaussian, and are significantly different. A tendency of observed head-mid items to get higher relevance to users can be observed. This should be considered an undesired behavior of the algorithm, which under-considers observed tail items regardless of the real user interest.
Furthermore, we analyzed the performance in terms of overall, intra- and inter- pair-wise accuracy for head-mid and tail observed items. To this end, we randomly sampled four sets of 100×N triplets (u, i, j) each, where: the first set includes observed head-mid items as i and unobserved head-mid items as j; the second set includes observed head-mid items as i and unobserved tail items as j; the third set includes observed tail items as i and unobserved head-mid items as j; and the fourth set includes observed tail items as i and unobserved tail items as j. For each set, we computed the recommender pair-wise accuracy on predicting a higher relevance for observed items than for unobserved ones.
Figure 4.18 plots the NPR accuracy on assigning higher relevance to observed than to unobserved items in MovieLens 1M (left) and COCO-RS (right). It can be observed that NPR fails in assigning higher relevance to observed head-mid items when they are compared to unobserved head-mid items (left-orange bar). It performs better when observed head-mid items are compared against unobserved tail items (left-green bar). Similarly, NPR fails to calculate higher relevance when comparing observed tail items and unobserved head-mid items (right-green bar); it achieves higher accuracy for observed/unobserved tail item comparisons (right-orange bar).
Considering the inter pair-wise accuracy gap between observed head-mid and observed tail items (gap between the green bars), we uncover that the recommender fails more frequently in giving higher relevance to observed tail items when compared against unobserved head-mid items (right-green bar); conversely, it performs significantly better in the opposite case, i.e., observed head-mid item against unobserved tail item (left-green bar). Hence, tail items, even when of interest, are significantly under-ranked.
The correlation between (i) the difference in user-item relevance and (ii) the relative difference in popularity for observed and unobserved items is highly positive: SpearmanCorr = 0.65 for NPR. This means that there is a direct relation between popularity and relevance, in line with the value obtained by Most Pop: Corr = 1.00.
On the other hand, it is in contrast with the correlation measured for Random: Corr = 0.00.
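This diagnostic correlation can be reproduced with a few lines of SciPy, as sketched below over sampled triplets; all argument names are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def relevance_popularity_correlation(rel_obs, rel_unobs, pop_obs, pop_unobs):
    """Spearman correlation between (i) the relevance gap of observed vs.
    unobserved items and (ii) their popularity gap, over sampled triplets."""
    relevance_gap = np.asarray(rel_obs) - np.asarray(rel_unobs)
    popularity_gap = np.asarray(pop_obs) - np.asarray(pop_unobs)
    corr, _ = spearmanr(relevance_gap, popularity_gap)
    return corr
```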
Fig. 4.16: t-SNE embedding representation for NPR.
Fig. 4.17: Relevance score distributions for NPR.
Fig. 4.18: Pair-wise accuracy for NPR.
Hence, we conjecture that minimizing the absolute value of such a correlation during optimization might have a positive impact on the targeted trade-off between recommendation accuracy and popularity bias. To demonstrate this, we design a treatment that reduces the strength of this correlation while optimizing for accuracy.
4.5.3 Methodology for Popularity Bias Mitigation
With an understanding of some deficiencies in NPR’s internal mechanics, we now investigate how we can optimize a recommender system to overcome them, seeking to generate popularity-unbiased recommendations. To this end, we optimize for equality in pair-wise accuracy between observed head-mid and tail items by minimizing both (i) the pair-wise error specified in Eq. 4.8 and (ii) the correlation between the difference in relevance and the difference in popularity over observed and unobserved items. The reader may notice that our procedure does not rely on any arbitrary split between head-mid and tail items. The procedure is composed of the following steps:
1. Triplet Generation (SAM). For each user u, we create t = 10 triplets (u, i, j) per observed item i; the unobserved item j is selected among the items less popular than i for half of the user’s triplets, and among the items more popular than i for the other half. We define this set of triplets as T. This step makes sure that there is sufficient data for computing the correlation-based regularization defined at Step 3.
2. Batch Grouping. We sample triplets in T in batches of m = 1024 samples in order to set up an iterated stochastic gradient descent. Each batch is balanced in terms of triplets where i is more popular than j and where j is more popular than i, to ensure the same representation for both sets in a batch.
3. Regularized Optimization (REG). Each training batch, T_batch = {(u_{t_b}, i_{t_b}, j_{t_b}) | 0 ≤ b ≤ m}, is fed into the model, which follows a regularized paradigm derived from the standard pair-wise approach (Fig. 4.12). The loss function can be formalized as follows:

argmax_θ (1 − α) acc(T_batch) + α reg(T_batch)   (4.9)

where α ∈ [0, 1] is the regularizer weight, acc(·) is the standard pair-wise objective, defined as:

acc(T_batch) = Σ_{b=0}^{m} δ(f(u_{t_b}, i_{t_b}|θ) − f(u_{t_b}, j_{t_b}|θ))   (4.10)

and reg(·) regularizes the correlation involving (i) the residual between observed and unobserved item relevance and (ii) the relative popularity of the observed item w.r.t. the popularity of the unobserved item, as follows:

reg(T_batch) = 1 − |Corr(A, B)|   (4.11)

A_b = f(u_{t_b}, i_{t_b}|θ) − f(u_{t_b}, j_{t_b}|θ),   0 ≤ b ≤ m   (4.12)

B_b = 1 if Σ_{u∈U} R(u, i_{t_b}) > Σ_{u∈U} R(u, j_{t_b}), and 0 otherwise,   0 ≤ b ≤ m   (4.13)

where Σ_{u∈U} R(u, i_{t_b}) and Σ_{u∈U} R(u, j_{t_b}) are respectively the popularity levels of items i_{t_b} and j_{t_b} gathered from the original dataset. The model is thus penalized if its ability to predict a higher relevance for an observed item is better when that item is more popular than the unobserved item. Steps 2 and 3 are repeated for all the batches, until convergence (a sketch of this regularized objective is given below).
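The batch-level objective of Eqs. 4.9-4.13 could be written, in its minimization form, roughly as follows; the Pearson-style correlation and the argument names are illustrative assumptions, and the relevance scores are assumed to come from a model such as the NPR sketch given earlier.

```python
import tensorflow as tf

def correlation(a, b):
    """Pearson-style correlation between two 1-D tensors, standing in for Corr(A, B)."""
    a_c = a - tf.reduce_mean(a)
    b_c = b - tf.reduce_mean(b)
    return tf.reduce_sum(a_c * b_c) / (tf.norm(a_c) * tf.norm(b_c) + 1e-10)

def regularized_batch_loss(rel_i, rel_j, pop_i, pop_j, alpha=0.5):
    """Minimization form of Eq. 4.9 for one batch of triplets.
    rel_i / rel_j: relevance of observed / unobserved items;
    pop_i / pop_j: their popularity counts in the training data."""
    # acc term (Eq. 4.10): maximizing sigma(rel_i - rel_j) == minimizing -log sigma(...)
    acc = -tf.reduce_mean(tf.math.log_sigmoid(rel_i - rel_j))
    # reg term (Eqs. 4.11-4.13): push |Corr(A, B)| towards zero.
    A = rel_i - rel_j
    B = tf.cast(pop_i > pop_j, tf.float32)
    reg = tf.abs(correlation(A, B))
    return (1.0 - alpha) * acc + alpha * reg
```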
4.5.4 Evaluation
In this section, we empirically evaluate the proposed model w.r.t. the two proposed bias metrics as well as the recommendation accuracy. We aim to answer four key research questions:
• RQ1: What are the effects of the proposed data sampling, regularization, and their combination on the recommendations?
• RQ2: How does the regularization weight affect our treatment?
• RQ3: What is the impact of our treatment on the internal mechanics?
• RQ4: How does our treatment perform, compared with other state-of-the-art debiased models, in mitigating popularity bias and preserving recommendation quality?
Datasets
We used the same datasets leveraged for the exploratory analysis. The first one, MovieLens 1M [166], includes 1M ratings applied to 3K movies by 6K users. The second one, COCO-RS [18], includes 37K users, who gave 600K ratings to 30K online courses. This can help us check whether the proposal is generalizable across different domains, as they exhibit different popularity distributions across the item spectrum.
Experimental Setup
In the experiments, we need to consider both recommendation accuracy and recommendation bias. From the recommendation bias perspective, we report PSP@k and PEO@k. As for the recommendation quality, we adopt nDCG@k. We report the results with k = 10, 20, 50, 100, 200. We also measured precision and recall in the experiments, which show the same pattern as nDCG; hence, we do not report them for conciseness. We compare the trade-off achieved by the proposed regularized approach, namely NPR+SAM+REG, against the one obtained by the following baselines:
• NPR+J [146]. It applies a debiased data sampling strategy to the dataset before training NPR. The main idea is to focus the sampling only on triplets (u, i, j) where i is less popular and j is more popular.
• NPR+W [152]. It re-ranks the output of NPR according to a weight-based strategy. The relevance returned by NPR for a given item is multiplied by a weight inversely proportional to the popularity of that item, before re-ranking.
• NPR+S [153]. For each user, it iteratively builds the re-ranked list by balancing the contribution of the relevance score returned by NPR and of the diversity level related to two item sets, namely the head-mid and tail sets.
We performed a temporal train-test split that places the last 20% of the ratings released by a user in the test set and the remaining, older 80% in the training set (a sketch of this split is given after the list below). Embedding matrices are initialized with values uniformly distributed in the range [0, 1]. The model is served with batches of 1024 triplets, chosen from the corresponding set. Before each of the 20 epochs, we shuffle the training batches. The optimizer used for gradient updates is Adam. Models are coded in Python on top of Keras, and trained on GPUs.
As we deal with popularity bias issues, we mostly focus on a testing configuration where the statistical role of popularity in accuracy metrics gets reduced. For the sake of completeness, we also include results on the common configuration that includes the full test set. More precisely, we run the following setups:
• UAR. The test set consists of a subset of the original test set where all items have the same amount of ratings, as described in [188].
• ALL. The test set includes all the original test ratings as created by the training-test procedure described above.
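The per-user temporal split used here could be implemented roughly as follows with pandas; the column names ('user_id', 'timestamp') are assumptions about the rating table layout.

```python
import pandas as pd

def temporal_split(ratings, test_frac=0.2):
    """Per-user temporal split: the most recent `test_frac` of each user's
    ratings go to the test set, the older ones to the training set."""
    ratings = ratings.sort_values(["user_id", "timestamp"])
    train_parts, test_parts = [], []
    for _, user_ratings in ratings.groupby("user_id"):
        n_test = max(1, int(len(user_ratings) * test_frac))
        train_parts.append(user_ratings.iloc[:-n_test])
        test_parts.append(user_ratings.iloc[-n_test:])
    return pd.concat(train_parts), pd.concat(test_parts)
```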
RQ1: Effects of Model Components
In this subsection, we run ablation experiments to assess (i) the influence of the new data sampling strategy and the new regularized loss on the model performance, and (ii) whether combining these two treatments might improve the trade-off between accuracy and popularity bias metrics. To answer these questions, we compare the base model (NPR) against NPR trained on data created through the proposed sampling strategy only (NPR+SAM), NPR optimized through the proposed regularized loss only (NPR+REG), and NPR combining both our treatments (NPR+SAM+REG). Results on accuracy and popularity are discussed below.
Figure 4.19 plots nDCG scores in MovieLens 1M over the ALL test (top-left) and the UAR test (top-right), and in COCO-RS over the ALL test (bottom-left) and the UAR test (bottom-right). We can observe that all the newly introduced configurations (green, orange, and red lines) show a loss in accuracy w.r.t. the base NPR model (blue line) if we consider all test ratings (left plots), which would make the mitigation appear costly. However, the gap in accuracy between the base and regularized models is reduced when we consider the same number of test ratings for all items (right plots). We argue that, as large gaps of recommendation accuracy in the ALL setup reflect only a spurious bias in the metric and the underlying test set (see [188] for a demonstration), the real impact of our treatments on accuracy must be considered on the UAR setup. From the right plots, we can conclude that there is a negligible gap in accuracy across models. The impact of each treatment on nDCG seems to vary across data sets. On ML1M, interestingly, combining our data sampling and regularized loss slightly improves accuracy, while when each treatment is performed individually there is no difference w.r.t. the base NPR.
Fig. 4.19: Recommendation accuracy under our different treatment configurations.
Fig. 4.20: Popularity bias metrics under our different treatment configurations.
Fig. 4.21: Trade-off under our different treatment configurations.

Figure 4.20 plots the bias metrics in MovieLens 1M as recommendation parity (top-left) and equality of treatment (top-right), and in COCO-RS as recommendation parity (bottom-left) and equality of treatment (bottom-right). We can observe that our data sampling (orange line) and our combination of data sampling and regularized loss (red line) positively impact the PSP and PEO metrics, while our regularized loss alone (green line) still keeps a bias comparable to plain NPR (blue line). Furthermore, there is no statistical difference on PSP (left plots) between NPR+SAM (orange line) and NPR+SAM+REG (red line). It follows that the regularized loss does not directly improve PSP. On the other hand, NPR+SAM+REG can significantly reduce PEO w.r.t. NPR+SAM. It follows that our regularized loss makes the true positive rates more similar among items.
Overall, our data sampling and regularized loss together achieve a better trade-off between recommendation accuracy and popularity bias. Figure 4.21 plots nDCG@10 scores against recommendation parity (top-left) and equality of treatment (top-right) in MovieLens 1M, and nDCG@10 scores against recommendation parity (bottom-left) and equality of treatment (bottom-right) in COCO-RS. Each point represents an algorithm. The closer a point is to the top-left corner (high nDCG, low PSP/PEO), the better.
RQ2: Impact of Regularization Weight
We investigate how the model performs when we vary the weight α given to the regularization term within the newly proposed loss function. For conciseness, we only report experimental results on the ML1M dataset; note that the results on COCO-RS show similar patterns.
Fig. 4.22: NPR+SAM+REG recommendation accuracy over α.
Fig. 4.23: NPR+SAM+REG popularity bias over α.
We vary the trade-off regularizer α and plot the results on accuracy and popularity bias metrics in Figure 4.22 and Figure 4.23, respectively. More precisely, Figure 4.22 plots the nDCG@10 score in MovieLens 1M over the ALL test (left) and the UAR test (right) for varying α, while Figure 4.23 plots the bias metrics at k = 10 in MovieLens 1M as recommendation parity (left) and equality of treatment (right) obtained by varying α. The x-axis indicates the value of α, while the y-axis shows the value measured for the corresponding metric at that value of α. Figure 4.22 demonstrates that, with a larger weight for the regularization term, the recommendation quality decreases more. Regarding the bias reduction performance, as presented in Figure 4.23, both PSP and PEO do not vary largely over different α values, which is most likely due to the nature of the regularization. To balance accuracy and debiasing, setting α = 0.5 is a reasonable choice.
RQ3: Impact on Internal Mechanics
In this subsection, we run experiments on ML1M and COCO-RS to assess (i) whether the debiased model can effectively reduce the gap between head-mid and tail relevance distributions, and (ii) whether it can effectively balance inter- and intra-pair-wise accuracy among head-mid and tail items. To answer the first question, we plot the relevance distributions for head-mid and tail items in Figure 4.24, where the orange lines are calculated on head/mid-item-user pairs and the red lines are calculated on tail-item-user pairs. More precisely, Figure 4.24 plots relevance score distributions for head-mid and tail item-user comparisons in Movielens 1M (left) and COCO-RS (right).
Fig. 4.24: Relevance score distributions for NPR+SAM+REG.
Fig. 4.25: Pair-wise accuracy for NPR+SAM+REG.
From the figure, we can conclude that the proposed model can effectively enhance the similarity between the two score distributions. For ML1M, we moved from a Spearman correlation of 0.65 in the NPR model to 0.10 in the regularized model. For COCO-RS, such a correlation went from 0.47 in NPR to 0.06 in the regularized model. For the second question, we show the pair-wise accuracy for observed head-mid and tail items in Figure 4.25. Such a figure plots the accuracy of assigning higher relevance to observed than to unobserved items in ML1M (left) and COCO-RS (right). From the blue bars, we can see that the overall pair-wise accuracy for head-mid items has been reduced by around 2% on both data sets, while it has been increased by 4% on ML1M and 2% on COCO-RS for tail items. Interestingly, the regularized approach makes it possible to reduce the difference in inter pair-wise accuracy, which was a damaging effect caused by algorithmic bias. Such a gap moved from 16 to 5 percentage points in ML1M. In COCO-RS, the gap reached 0 percentage points.
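To make the pair-wise accuracy decomposition in Figure 4.25 concrete, the sketch below splits test triplets by the popularity group of the observed item and, under our reading of the metric, treats a triplet as intra when the unobserved item belongs to the same group and inter otherwise; scores, triplets, and is_tail are illustrative inputs rather than objects defined in this chapter.

```python
import numpy as np

def pairwise_accuracy_breakdown(scores, triplets, is_tail):
    """Share of (u, i, j) triplets where the observed item i outscores the
    unobserved item j, split by the popularity group of i (head-mid vs. tail)
    and by whether j lies in the same group (intra) or the other one (inter)."""
    buckets = {}
    for u, i, j in triplets:
        group = "tail" if is_tail[i] else "head-mid"
        kind = "intra" if is_tail[i] == is_tail[j] else "inter"
        buckets.setdefault((group, kind), []).append(scores[u, i] > scores[u, j])
    return {key: float(np.mean(vals)) for key, vals in buckets.items()}
```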
RQ4: Comparison with Baselines
We next compare the proposed NPR+SAM+REG model with some state-of-the-art alternatives to assess (i) how the proposed model performs in comparison with baselines for recommendation accuracy and popularity bias individually, and (ii) how the proposed NPR+SAM+REG model manages the trade-off between such two objectives, compared with baselines. To answer these questions, we report accuracy and bias metrics for the different models in Figure 4.26 and Figure 4.27. More precisely, Figure 4.26 plots nDCG scores in Movielens 1M over the ALL test (top-left) and the UAR test (top-right) and in COCO-RS over the ALL test (bottom-left) and the UAR test (bottom-right). Figure 4.27 plots bias scores in Movielens 1M as recommendation parity (top-left) and equality of treatment (top-right) and in COCO-RS as recommendation parity (bottom-left) and equality of treatment (bottom-right). Baselines are marked with (*), while our proposed model is identified by a blue line. From Figure 4.26, the proposed NPR+SAM+REG model (blue line) achieves an accuracy score comparable to NPR+J (orange line), while it largely outperforms the other baselines (green and red lines) over all data sets, on both test setups. The configuration that shows a statistically significant difference between NPR+SAM+REG and NPR+J is the UAR test setup on COCO-RS (bottom-right plot). This may be caused by the skewed item popularity distribution in COCO-RS, which makes it harder for NPR+SAM+REG to build meaningful triplets where the observed item is less popular than the unobserved item. From Figure 4.27, it can be observed that our NPR+SAM+REG model (blue line) largely reduces PSP w.r.t. the other baselines (left plots) on both data sets. On ML1M (top-left plot), NPR+W exhibits the highest bias on recommendation parity. NPR+J and NPR+S achieve comparable scores between each other, but lower than NPR+W. On COCO-RS (bottom-left plot), NPR+S scores get worse. Smaller improvements of our proposal w.r.t. baselines are achieved on PEO as well (right plots) on both data sets. NPR+S has the lowest PEO among baselines, but still significantly higher than NPR+SAM+REG. Our proposal beats NPR+J and NPR+W on both recommendation accuracy and popularity bias reduction. Moreover, it achieves comparable accuracy performance w.r.t. NPR+S, but the former outperforms the latter in popularity bias reduction.
Fig. 4.26: Recommendation accuracy against baselines.
Fig. 4.27: Popularity bias metrics against baselines.
Fig. 4.28: Trade-off against baselines.
Therefore, we can conclude that NPR+SAM+REG better manages the trade-off between these two objectives. To show this finding, Figure 4.28 plots the nDCG@10 score against recommendation parity (top-left) and equality of treatment (top-right) in Movielens 1M, and the nDCG@10 score against recommendation parity (bottom-left) and equality of treatment (bottom-right) in COCO-RS. Each point represents an algorithm; the closer a point is to the top-left corner (high nDCG, low PSP/PEO), the better.
4.6 The Proposed Method for Educators’ Fairness
4.6.1 Exploratory Analysis on Neural Recommenders
To better understand the need for jointly considering recommendation effectiveness and educators' fairness based on their gender, we begin by providing a descriptive summary of (i) the pair-wise recommendation model we seek to investigate and (ii) how it suffers from unfairness towards educators with respect to their gender, when optimized only for accuracy.
The Investigated Synthetic Data Sets
The exploratory analysis treated items from women educators, i.e., I_women, as the minority group. To simulate real-world scenarios, we created a set of 10 synthetic data sets, each aimed to simulate a different representation of items and ratings related to the minority group. Each data set included 6,000 users, 2,000 educators, 6,000 items, and 1,000,000 random ratings. Fig. 4.29 (left) summarizes the women representation in both items and ratings for each data set. The data set ID is in the form x-y, where x represents the percentage of women-educator items and y the percentage of women-educator ratings. In Fig. 4.29 (right), we quantified the imbalance associated with women representation across items and ratings by computing the following index:
ItemRatingFair@k(θ) = 1 − abs( |I_women| / |I| − Σ_{u∈U, i∈I_women} y(u, i|θ) / Σ_{u∈U, i∈I} y(u, i|θ) )   (4.14)

The score ranges between 0 (strongly unfair) and 1 (strongly fair).
Fig. 4.29: Women representation and fairness in item/ratings over synthetic data.
Constructing such data sets helps us observe how the investigated recommenders work under different unbalanced representations of the minority group.
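As a minimal sketch of this setup, the snippet below generates one x-y synthetic data set (items from women educators placed first in the catalog, with the educator-to-item assignment simplified away) and computes Eq. 4.14 on the observed interactions, used here as a proxy for y(u, i|θ); the function names and the random-assignment scheme are illustrative assumptions rather than the exact generation procedure used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

def synth_dataset(n_users=6000, n_items=6000, n_ratings=1_000_000,
                  women_item_pct=0.4, women_rating_pct=0.2):
    """Illustrative generator for an 'x-y' data set: x% of the items belong to
    women educators and y% of the random ratings land on those items."""
    n_women_items = int(women_item_pct * n_items)   # women items: ids 0..n_women_items-1
    users = rng.integers(0, n_users, n_ratings)
    on_women = rng.random(n_ratings) < women_rating_pct
    items = np.where(on_women,
                     rng.integers(0, n_women_items, n_ratings),
                     rng.integers(n_women_items, n_items, n_ratings))
    return users, items, n_women_items, n_items

def item_rating_fair(items, n_women_items, n_items):
    """Eq. 4.14 computed on observed interactions instead of model scores."""
    item_share = n_women_items / n_items
    rating_share = (items < n_women_items).mean()
    return 1.0 - abs(item_share - rating_share)

users, items, n_women_items, n_items = synth_dataset(women_item_pct=0.4,
                                                     women_rating_pct=0.2)
print(round(item_rating_fair(items, n_women_items, n_items), 2))  # about 0.80 for the 40-20 set
```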
The Investigated Pair-wise Learning Approach
To estimate the model parameters θ, and consequently compute the user vector matrix W and the item vector matrix X, most of the existing approaches optimize a pair-wise objective function (Figure 4.12). Pair-wise learning maximizes the margin between the observed item relevance y_{u,i} and the unobserved item relevance y_{u,j} for triplets (u, i, j) in Y. The loss function can be formalized as follows:
argmax_θ Σ_{u∈U, i∈I_ou, j∈I_uu} δ(f(u, i|θ) − f(u, j|θ))   (4.15)
where δ(·) is the sigmoid function, and I_ou and I_uu are the sets of items for which user u's feedback is observed or unobserved, respectively.
Embedding matrices W and X are initialized with values uniformly distributed in the range [0, 1], and the maximization problem is transformed into the equivalent minimization dual problem. Given the user-item feedback matrix Y of each synthetic data set, the model is served with batches of 1,024 triplets. For each user u, we created 10 triplets (u, i, j) per observed item i; the unobserved item j is randomly selected for each triplet. Gradient updates use the Adam optimizer, and the process is repeated for 20 epochs. Note that the training-testing protocol followed for the previous analysis is adopted for this analysis as well; we do not repeat the details here for conciseness.
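A minimal PyTorch sketch of this pair-wise setup follows; the embedding size, learning rate, and the triplet_batches iterator are assumptions introduced for illustration, not values or components taken from the thesis.

```python
import torch

n_users, n_items, dim = 6000, 6000, 64           # embedding size is an assumption

# User matrix W and item matrix X, initialized uniformly in [0, 1].
W = torch.nn.Parameter(torch.rand(n_users, dim))
X = torch.nn.Parameter(torch.rand(n_items, dim))
optimizer = torch.optim.Adam([W, X], lr=1e-3)    # learning rate is an assumption

def pairwise_loss(u, i, j):
    """Minimization dual of Eq. 4.15: the batch loss is the negated sum of the
    sigmoid margins (a -log-sigmoid surrogate is a common alternative)."""
    f_ui = (W[u] * X[i]).sum(dim=-1)
    f_uj = (W[u] * X[j]).sum(dim=-1)
    return -torch.sigmoid(f_ui - f_uj).sum()

for epoch in range(20):
    for u, i, j in triplet_batches(batch_size=1024):   # hypothetical iterator over (u, i, j) index tensors
        optimizer.zero_grad()
        loss = pairwise_loss(u, i, j)
        loss.backward()
        optimizer.step()
```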
Fig. 4.30: Women representation and fairness in ratings/exposure over synthetic data.
Educators’ Fairness Analysis
Fig. 4.30 (left) plots both the women item representation in ratings and the women item representation in top-10 recommendations (i.e., exposure) for each data set. It can be observed that the pair-wise learning strategy tends to propagate the unfairness against the minority group, so that the women item representation in recommendations is significantly lower than the women item representation in ratings. In Fig. 4.30 (right), such a bias is quantified by the following index:
RatingExposureFair@k(θ) = 1 − abs( Σ_{u∈U, i∈I_women} y(u, i|θ) / Σ_{u∈U, i∈I} y(u, i|θ) − Σ_{u∈U, i∈I_women} φ(u, i|θ) / Σ_{u∈U, i∈I} φ(u, i|θ) )   (4.16)

where φ(u, i|θ) is 1 if item i is recommended to u in the top-k list, and 0 otherwise. The score ranges between 0 (strongly unfair) and 1 (strongly fair). For all the data sets, except the 40-30 one, such an index is significantly lower than 1. This reveals that the higher the gap between women representation in items and ratings, the higher the unfairness in women educators' exposure. It follows that educators from the minority group tend to be discriminated against regardless of the learners' interest in their items. As our goal is to achieve the same women-educator representation in items and recommendations, we also computed the related fairness index by following Eq. 4.7. Scores range between 0 (strongly unfair) and 1 (strongly fair). From Fig. 4.31, we can observe that the gap in women representation between items and recommendations is larger than the one between ratings and recommendations. The pair-wise learning strategy is thus affected more strongly by the number of ratings per educators' gender than by the number of items per educators' gender.
Fig. 4.31: Women representation and fairness in item/exposure over synthetic data.
Fig. 4.32: Women model fairness index over synthetic data.
Going deeper into the model, we also observed that the pair-wise learning strategy propagates the unfairness in a way that users' embeddings are more similar to the embeddings of items provided by men. To measure such unfairness, we created a set of 1,000,000 user-item pairs, where half of the pairs include items from women educators and the other half include items from men educators. Then, we measured the average relevance received by items from women and from men. Fig. 4.32 shows such a model fairness index, defined as:
ModelFair@k(θ) = 1 − abs( (1 / |I_women|) Σ_{u∈U, i∈I_women} ŷ(u, i|θ) − (1 / |I_men|) Σ_{u∈U, i∈I_men} ŷ(u, i|θ) )   (4.17)

The score ranges between 0 (strongly unfair) and 1 (strongly fair). Consistently with what was observed for the above-mentioned indexes, the higher the gap between women representation in items and ratings, the higher the unfairness in the relevance assigned to women educators. Hence, we conjecture that adjusting the women representation in ratings and regularizing relevance scores, so that such scores follow the same distribution for items from both genders, might have a positive impact on the targeted trade-off between accuracy and fairness. To demonstrate this, we defined a regularization process that reduces the mentioned unfairness while optimizing effectiveness.
4.6.2 Methodology for Educators’ Unfairness Mitigation
The goal of the regularization procedure is to generate recommendations that maximize the trade-off between accuracy and fairness by also optimizing for equality of representation between items and recommended lists. To do this, we adjust the type of triplets fed into the model; then, we set up a loss function aimed at minimizing both (i) the pair-wise error specified in Eq. 4.15 and (ii) the difference in relevance across items from women and men. We will later show empirically that, although the optimization relies on a given set of interactions, the results generalize to unseen interactions.
The procedure relies on the following steps:
1. Ratings Upsampling - Regularization Level I. We upsampled the ratings related to the minority group, so that the minority group is equally represented in both the items and the ratings sets. The approach is tested with each of the following techniques:
(a) std-rep: random upsampling on original data, with repetitions.
(b) std-syn: random upsampling on synthetic data, no repetitions.
(c) pop-syn: popularity-based upsampling on synthetic data, no repetitions.
More precisely, assuming that the minority group is represented by a percent of the items, the upsampling techniques generate new pairs (u, i), where i belongs to the minority group, until the representation of this group in ratings reaches a percent.
2. Triplet Generation. For each user u, we created 10 triplets (u, i, j) per observed item i resulting from Step 1; the unobserved item j is selected among the items not yet observed by u. We define the set of triplets as T.
3. Batch Grouping. We sampled triplets in T in batches of m = 1,024 samples in order to set up an iterated stochastic gradient descent optimization. The batches were created in such a way that they hold the same representation of the protected group shown by the entire data set.
4. Optimization - Regularization Level II. Each training batch T_batch = {(u_tb, i_tb, j_tb) | 0 ≤ b ≤ m} is fed into the model, which follows a regularized paradigm derived from the standard pair-wise approach. The loss function is formalized as:

argmax_θ acc(T_batch) + reg(T_batch)   (4.18)

where acc(·) is the accuracy objective of a standard pair-wise approach, defined as:

Σ_{u∈u_tb, i∈i_tb, j∈j_tb} δ(f(u, i) − f(u, j))   (4.19)

and reg(·) is a regularization term computed as the difference in relevance across items from women and men educators, as follows:

abs[ ( Σ_{u∈u_tb, i∈i_tb ∩ I_women, j∈j_tb} (f(u, i) − f(u, j)) ) − ( Σ_{u∈u_tb, i∈i_tb ∩ I_men, j∈j_tb} (f(u, i) − f(u, j)) ) ]   (4.20)

The model is penalized if its ability to predict a higher relevance for an observed item is better when the item is provided by a man than by a woman. Steps 3 and 4 are repeated for all the batches, until convergence.
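A minimal PyTorch sketch of the regularized batch update in Step 4 is given below. It treats the absolute difference of Eq. 4.20 as a penalty subtracted from the accuracy objective when maximizing (equivalently, added when minimizing), which is our reading of the sign convention implied by the surrounding text; f_ui, f_uj, and the is_women mask are illustrative batch tensors, not objects defined in the thesis.

```python
import torch

def regularized_pairwise_loss(f_ui, f_uj, i_is_women):
    """Batch loss for Eqs. 4.18-4.20, written as a quantity to minimize.

    f_ui, f_uj: predicted relevance of the observed/unobserved item of each
    triplet in the batch; i_is_women: boolean mask marking triplets whose
    observed item comes from a woman educator.
    """
    margins = f_ui - f_uj
    acc = torch.sigmoid(margins).sum()                       # Eq. 4.19
    reg = torch.abs(margins[i_is_women].sum()
                    - margins[~i_is_women].sum())            # Eq. 4.20
    return -acc + reg                                        # maximize acc while penalizing the gap
```

In each training step, the returned value is back-propagated and applied with the same optimizer loop shown in the exploratory analysis sketch above.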
4.6.3 Evaluation
In this section, we evaluate both levels of the proposed regularization process, which aims to balance recommendation effectiveness and educators' fairness based on their gender. The approach is validated on 12 large-scale data sets, i.e., the 10 synthetic data sets previously presented and 2 real-world data sets. To the best of our knowledge, the latter are among the few data sets that provide sensitive attributes. The first real-world data set, MovieLens-1M (ML-1M) [166], includes 1M ratings applied to 3K movies by 6K users. On ML-1M, the representation of women providers in items is around 10%, while such a representation is reduced to 7% in ratings. Therefore, all the upsampling techniques create user-item pairs so that the percentage of ratings related to women providers reaches 10%. The second real-world data set, COCO-RS [18], includes 37K users, who gave 600K ratings to 30K online courses. On COCO-RS, the representation of women providers in items is around 19%, while it is reduced to 12% in interactions. Therefore, the upsampling techniques create user-item pairs so that the percentage of women ratings reaches 19%. This can help us check whether the proposal generalizes across domains. Moreover, each data set holds a different item distribution across educators' genders.
Evaluation Metrics and Protocols
To evaluate the ability of the models to balance recommendation effectiveness and provider fairness, we measured the index formalized in Eq. 4.14. Moreover, to interpret the results behind such an index in more depth, we also measured effectiveness (Precision, Recall, and normalized Discounted Cumulative Gain) and coverage (Overall Coverage, Women Item Coverage, and Men Item Coverage) metrics. Precision measures how well an algorithm puts relevant items (i.e., items observed by the user and unseen by the algorithm) in the top-k recommendations, regardless of the rank. Recall measures the proportion of relevant items included in the recommendation list with respect to the total number of relevant items. nDCG uses graded relevance and the positions of the recommended items to compute the ratio between the discounted cumulative gain of the current recommended list and the idealized one. Item coverage metrics measure the percentage of the related items in the catalog that are recommended at least once. The overall evaluation protocols follow the steps performed during the exploratory analysis, so the evaluation is consistent across experiments.
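For reference, the following sketch shows one common way to compute these metrics for a single user and for the catalog; it uses binary relevance for nDCG (the thesis uses graded relevance), and all names are illustrative.

```python
import numpy as np

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance nDCG@k: DCG of the top-k list over the DCG of an ideal
    list that places all relevant items first."""
    relevant = set(relevant_items)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0

def precision_recall_at_k(ranked_items, relevant_items, k=10):
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / k, hits / max(len(relevant_items), 1)

def item_coverage(topk_lists, catalog):
    """Share of `catalog` recommended at least once; pass only the women (or
    men) educator items to obtain the group-wise coverage metrics."""
    recommended = {item for topk in topk_lists for item in topk}
    return len(recommended & set(catalog)) / len(catalog)
```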
Model Optimization Setups
In what follows, we compare several setups obtained by combining the regularization levels proposed in the previous section. Such setups are identified as follows:
• no-up: training without regularizing at any level.
• std-rep: regularizing at I level with std-rep upsampling, but not at II level.
• std-syn: regularizing at I level with std-syn upsampling, but not at II level.
• pop-syn: regularizing at I level with pop-syn upsampling, but not at II level.
• reg-no-up: regularizing at II level, but not at I level.
• reg-std-rep: regularizing at I level with std-rep upsampling, and at II level.
• reg-std-syn: regularizing at I level with std-syn upsampling, and at II level.
• reg-pop-syn: regularizing at I level with pop-syn upsampling, and at II level.
Results and Discussion on Synthetic Data Sets
Fig. 4.33 reports the difference in women representation across items and recommendations after applying upsampling on the synthetic data sets. Blue bars refer to models trained without upsampling, while red, green, and yellow bars refer to results achieved when upsampling is performed. The considered upsampling techniques limit providers' unfairness, as the ItemExposureFair@k scores measured on them are closer to 1. Creating synthetic ratings associated with women items (std-syn and pop-syn) makes it possible to equalize representation in all the data sets. In contrast, upsampling by replicating existing interactions and ratings (std-rep) appears to work only when the women representation in ratings is around 10%. For a larger representation, the fairness index exhibits significantly lower values. This suggests that replicating user-item pairs might excessively promote the related items, so that the recommender suggests only them.

Fig. 4.33: Women item-exposure fairness after mitigation at I level on synthetic data.

Fig. 4.34 shows the impact of the regularization at II level. Interestingly, regularizing only at II level mitigates unfairness when women and men representations in ratings are comparable; it fails to strike fairness for larger gaps, even though the difference in exposure across genders is reduced with respect to the counterparts that are not regularized at all. It appears that recommenders need both levels of regularization to be provider-fair. This confirms the validity of the intuition that emerged from the exploratory analysis.

Fig. 4.34: Women item-exposure fairness for mitigation at I+II level on synthetic data.
Results and Discussion on Movielens-1M
Fig. 4.35 reports the trade-off between effectiveness and provider fairness (the higher, the better). Several regularized setups achieve a better trade-off with respect to the non-regularized setup (blue bar). Fig. 4.36 reports the item-recommendation fairness score computed as in Eq. 4.7 (the higher, the better). It can be observed that creating synthetic user-item pairs at I level (std-syn, pop-syn) makes it possible to hold higher educators' fairness. However, in contrast to the results observed on the synthetic data sets, applying regularization at II level does not help mitigate unfairness. In fact, while reg-no-up and reg-rep show unfairness towards women providers, reg-syn and reg-pop are unfair towards men providers to the same extent. Such a behavior could be related to the non-uniform distribution of implicit feedback in Y, in a way that replicating few existing user-item pairs is not enough to foster items from women educators, while creating synthetic user-item pairs rapidly biases the recommender towards suggesting items from women educators. As in any multi-criteria setting, we must be concerned about any accuracy loss that results from taking additional criteria into consideration. Therefore, we also evaluate recommendation effectiveness in Fig. 4.37. It can be observed that pop-syn has the highest loss on accuracy metrics, i.e., around 0.70, while it shows the best fairness score.
Fig. 4.35: Trade-off between effectiveness and fairness on ML-1M.
Fig. 4.36: Women item-exposure fairness over ML-1M after treatment.
Fig. 4.37: Effectiveness (nDCG, Recall, Precision) achieved by the models over ML-1M.
Fig. 4.38: Coverage (total, men items, women items) of the models over ML-1M.
Fig. 4.39: Trade-off between effectiveness and fairness on COCO-RS.
Surprisingly, the reg-std-rep setup achieved higher effectiveness and lower unfairness with respect to the no-up setup. This means that the proposed multi-level regularization helps the recommender improve accuracy jointly with fairness. Such a behavior also leads to larger coverage, as more items from women educators are recommended (Fig. 4.38).
Results and Discussion on COCO-RS
Several regularized setups allow us to reach a significantly better trade-off between effectiveness and fairness on COCO-RS as well (Fig. 4.39). Fig. 4.40 plots the fairness scores obtained by the different combinations of regularization. It is confirmed that creating synthetic user-item pairs (std-syn, pop-syn) improves educators' fairness. However, in contrast to the results observed on the other data sets, applying regularization at both levels does not help us mitigate unfairness. Regularizing only at II level (reg-no-up) achieved higher fairness than the standard non-regularized approach (no-up). Consistently with ML-1M, COCO-RS shows that reg-syn and reg-pop become unfair towards men educators. The reason behind this observation could be related to the larger non-uniform distribution of implicit feedback in COCO-RS.
Fig. 4.40: Women item-exposure fairness over COCO-RS after treatment.
Fig. 4.41: Effectiveness (nDCG, Recall, Precision) achieved by the models over COCO-RS.
Fig. 4.42: Coverage (total, men items, women items) of the models over COCO-RS.
Fig. 4.41 plots the accuracy metrics for COCO-RS. In contrast to ML-1M, our multi-level regularization brings a loss in accuracy for all the setups. However, such a loss is lower in percentage than the one observed for the other data set. Furthermore, the proposed treatment achieves higher coverage of the item catalog (Fig. 4.42). The coverage of items from men remains stable, while the coverage of items from women becomes six times larger than the one obtained by the standard pair-wise approach.
4.7 Findings and Recommendations
In this chapter, we depicted several manageable ways to provide recommendations that are robust across biases, such as popularity bias and educators' gender bias. This has been possible by leveraging multi-level pair-wise approaches based on data upsampling for minority groups and/or a regularization within the optimization function. Based on the results, we can conclude that:
1. Neural recommenders are susceptible to popularity bias, and the extent of such a bias depends on both the representation in items and the representation in ratings.
2. Randomized user-item relevance distributions for short- and long-tail items are significantly different, and the first exhibits higher relevance values, on average.
3. Inter pair-wise accuracy for observed long-tail items is lower than for observed short-tail items, so the former items are under-ranked regardless of users' interests.
4. While the popularity-aware model shows a loss in accuracy when the entire test set is used, it performs better when tested on long-tail item ratings only.
5. The correlation term makes it possible to reach a higher trade-off between recommendation effectiveness and popularity robustness than state-of-the-art baselines.
6. Upsampling can be used to significantly increase educators' gender fairness, especially by replicating existing user-item pairs from the minority group.
7. Minimizing the difference in relevance among items from different educators' genders makes it possible to reduce the unfairness across them.
8. Combining regularization at both levels brings better fairness results depending on the extent of the item popularity tail (the less the slope, the higher the fairness).
9. The proposed multi-level regularization can achieve a better trade-off between recommendation effectiveness and educators' gender fairness.
The proposed research opens up several future directions. For instance, we can investigate the relation between the recommendations and the tendency of each user to prefer items from minority groups. We can further assess the proposed regularizations on other approaches, i.e., point-wise and list-wise. Moreover, we will consider how the models work in the presence of constraints from multiple stakeholders (e.g., both learners' and educators' fairness). Other relevant future work will focus on investigating under which conditions it is possible to distinguish popularity bias from quality-related popularity. Tackling such a goal is challenging, as there are no well-accepted definitions of the quality and popularity properties. Moreover, the fact that current data sets are usually biased towards ratings with high values, making it difficult to consider ratings as a signal of quality, requires further investigation.

Chapter 5
Machine Learning Models for Identity Verification
Research Highlights
• Biometrics can enable transparent and continuous learner verification.
• Combining face and hand-motion patterns ensures accurate recognition.
• Intermediate audio-visual fusion leads to higher verification accuracy.
• Uni-biometric systems are susceptible to adversarial dictionary attacks.
5.1 Introduction
The increasing participation in digital education requires educational institutions to ensure the integrity of their initiatives against several threats [189]. Such misconduct usually involves an impostor who hijacks the identity of an unsuspecting user to conduct malicious activity (identity theft), an authorized individual who conducts illegal actions and repudiates such actions (identity denial), or an authorized individual who willingly shares their credentials in violation of established policies and regulations (identity sharing). The most widely adopted countermeasure lets students perform activities while a person monitors them (e.g., Kryterion1 and ProctorU2), but this is not scalable and highly subjective. Other systems replace humans with algorithms via post-hoc or real-time analysis (e.g., ProctorFree3 and RemoteProctorNow4), but they usually lead to high false-positive rates and support a limited range of scenarios. Emerging hybrid methods challenge humans only in unclear situations (e.g., Proctorio5).
1 https://www.kryteriononline.com/test-taker/online-proctoring-support
2 https://www.proctoru.com/
3 http://proctorfree.com/
4 http://www.softwaresecure.com/products-remote-proctor-now/
5 https://proctorio.com/
Biometrics are increasingly playing a primary role in verifying the identity of learners [19], onlife and online. Examples of adopted biometric traits include the facial structure [190], the ridges of a fingerprint [191], the iris pattern [192], the sound waves of a voice [193], and the way a person interacts with a digital device [194]. From the system perspective, recognition pipelines need to detect the modality of interest in the biometric sample. This is followed by a set of pre-processing functions. Features are then extracted from the pre-processed data and used by a classifier for recognition. From the user perspective, an individual is asked to provide some samples whose feature vectors are stored as a template by the system, i.e., enrollment. Then, the recognition process determines whether the probe comes from the declared person, i.e., verification [195]. Given the nature of the problem, learners should be authenticated continuously and transparently, without affecting their usual interaction. In fact, a system based solely on something the user has or knows assumes that the student does not share such information. Similarly, an entry-point biometric authentication does not prevent the legitimate student from entering the system and an impostor from continuing. In response, physical biometrics capture random footages or continuous videos to recognize face traits and/or body cues [196, 197]. However, they can be fooled by turning the camera towards other video sources. Behavioral biometrics enable authentication based on how people interact, but are not completely reliable when provided alone and suffer from limited applicability [198, 199]. Multiple biometrics have been combined, but they usually require intrusive actions or sensors (e.g., a mouse equipped with a fingerprint reader) [200, 201]. Moreover, existing systems target PCs or laptops, while learners also use mobile devices [202]. In this chapter, we propose a set of multi-biometric approaches aimed at continuously and transparently authenticating learners in online and onlife educational scenarios. We contextually combine different physical and behavioral traits based on the type of interaction and the type of device used at a specific moment. Each approach transparently controls both something the learner is and something the learner does, since physical biometrics are effective but lack synchronization with the actions performed on the platform, while behavioral biometrics are suitable to recognize who performs the interactions, but they are not sufficiently accurate alone. This results in a device/interaction-agnostic biometric framework that leverages sensors and tracking routines present in almost all current devices. Please refer to [19] for further information on the whole framework we proposed. Some of its components are described in detail along this chapter, together with findings and recommendations. The contribution of this chapter is five-fold:
• We arranged an audio-visual data set of 111 participants vocalizing short sentences under robot assistance scenarios. The collection took place in a three-floor university building by means of eight recording devices, targeting challenging conditions.
• We propose a transparent continuous authentication approach that integrates physical (face) and behavioral (touch and hand motion) biometrics to control the learner's identity during point-and-click activities on mobile devices, going beyond one-time login.
• We propose a transparent continuous authentication approach that capitalizes on vocal and facial data to control the learner's identity during speaking activities. Each uni-biometric model helps the other recognize learners when trained jointly.
• We use voice verification as a use case to study a new dictionary attack that puts at risk otherwise-developed uni-biometric systems, showing that adversarial optimization can be used to significantly increase the impersonation capabilities of arbitrary inputs.
• We provide an extensive evaluation of each approach in terms of effectiveness and efficiency achieved on public data sets, comparing different operational setups to assess which one performs better within education-related scenarios.
The rest of this chapter is structured as follows: Section 5.2 describes the most representative techniques for biometric verification, going deeply into those tested on learning platforms. Then, Section 5.3 formalizes the problems we seek to investigate. Section 5.4 introduces the audio-visual data set we collected and the other public data sets we manipulated over the presented research. Section 5.5 and Section 5.6 describe and evaluate the verification approaches we designed. Section 5.7 depicts the new attack to biometric systems we crafted. Finally, Section 5.8 concludes the chapter.
5.2 Related Work
5.2.1 Traditional Biometric Verification Techniques
Continuous authentication [195] is a promising area actively studied over the years, but the related methods have rarely been applied in real settings due to their operational complexity. The goal is to regularly check the user's identity throughout the session. Existing applications include monitoring whether an unsuspected impostor has hijacked a genuine user's session on a device [203] or on an online website [204], as well as identifying whether a user has shared their credentials with others in e-learning exams [19]. Physical biometrics, such as face biometrics, have been captured using front-facing cameras and analyzed to extract distinctive user features [205]. Mouse [206], keyboard [207], touchscreen [208], and other sensors have been used to measure behavioral biometrics (e.g., gait, typing, touching, mouse movements, and hand movements). Such methods continuously check whether the probe comes from the genuine user; if this is true, the user can continue the session; otherwise, the user is locked out. To provide a brief overview, we grouped existing solutions into uni-modal and multi-modal biometrics. Uni-modal continuous authentication makes use of only a single biometric. Physical biometrics are rarely used, as they tend to require considerable computing power and high-quality data that is not easy to obtain continuously. For instance, iris recognition needs the user to face the camera and usually takes time for authentication [209, 210]. Fingerprint recognition requires expensive devices in view of the long computational time [211]. One of the common uni-modal approaches is thus based on face recognition. Many algorithms work well on images collected in controlled settings, while their performance degrades significantly on images that present variations in pose, illumination, occlusion, and quality [212]. Behavioral biometrics, such as typing, tapping, and speaking, are usually considered a suitable data source for transparent and continuous authentication, while preserving usability. Touch sensor data has been leveraged to extract users' distinctive features for authentication purposes [213, 214]. By contrast, other researchers investigated the reliability of motion data for active authentication. Extensive experiments have shown that sensor behavior exhibits sufficient discriminability and stability [215, 216]. Intensive research has been conducted on keystroke dynamics, studying its feasibility to verify users' identity on mobile phones and to decide whether a text has been written by the legitimate user [217, 218, 219, 220]. Multi-modal biometrics exploit evidence from multiple biometrics to mitigate the issues related to uni-modal methods, such as noisy data, intra-class variation, and spoofing attacks [221]. For instance, there exists a framework combining face and touch modalities [222]. Instead of a score-level fusion of the scores from the face and touch authentication subsystems, the authors built stacked classifiers fed with the scores returned by the uni-modal systems. Deploying voice, touch, and hand-motion verification for continuous authentication has been recently studied, integrating these biometrics based on the time they were acquired [223]. This integration creates a trust level on the user based upon the interval from the last successfully captured samples. The literature also proposed a mobile, non-intrusive, and continuous authentication approach that uses biometric techniques enabled by the device, achieving favorable performance [224].
Their studies investigated authentication systems on laptops. Their experiment adopted an obtrusive login, and merely the soft biometrics were verified throughout. Furthermore, some works used a wristband as an initial-login fingerprint sensor and, then, as a device that constantly measures the user's skin temperature and heart rate [225]. Lastly, audio-visual recordings have been manipulated to solve visual analysis problems [226]. We point out that, given that learners interact with different devices based on the context, device-agnosticism should be considered.
5.2.2 Deep Biometric Verification Techniques
The recent spread of deep learning has favoured the use of neural networks as feature extractors combined with common machine-learning classifiers [227]. Backbone architectures rapidly evolved over the last years, going from AlexNet [228] to SENet [229]. Deep learning approaches have been particularly applied to face and voice biometrics. From the face perspective, DeepFace [230] integrates a cross-entropy-based Softmax loss while training the network. However, applying the Softmax loss is usually not sufficient by itself to learn features separated by a large margin when samples come from diverse people. Other loss functions have been explored to enhance generalization. For instance, euclidean-distance-based losses embed images into a euclidean space, and reduce intra-class variance while enlarging inter-class variance across samples. The Contrastive loss [231] and the Triplet loss [232] are commonly used, but they often exhibit training instability and complex sampling strategies. The Center loss [233] and the Ring loss [234] balance the trade-off between accuracy and flexibility. Furthermore, cosine-margin-based losses, such as AM-Softmax [235], were proposed to learn features separable through an angular distance.
From the voice perspective, the most prominent approaches include d-Vectors [236], c-Vectors [237], x-Vectors [238], VGGVox-Vectors [239], and ResNet-Vectors [240]. Furthermore, deep learning frameworks with end-to-end loss functions to train speaker-discriminative embeddings have recently drawn attention [241]. It has been shown that they are more accurate than traditional systems based on hand-crafted features. Such traditional speaker verification strategies relied on Gaussian Mixture Models (GMMs) [242] trained on low-dimensional feature vectors, Joint Factor Analysis (JFA) [243] methods modelling speaker and channel subspaces separately, or i-Vectors [244] attempting to embed both subspaces into a single compact, low-dimensional space.
Combining signals from multiple sensors has traditionally been investigated from a data fusion perspective. For instance, such a merge step can happen at the sensor level or at the feature level, and focuses on how to combine data from multiple sources. It can be performed by either removing correlations between modalities or representing the fused data in a common subspace; the fused data is then fed into a machine-learning algorithm [245]. The literature also provides evidence of fusion techniques at the score level and at the decision level [246, 247, 248, 249]. There is no general conclusion on whether early or late fusion performs better, but the latter is simpler to implement, especially when modalities vary in dimensionality and sampling rates [250].
Good evidence of the advantages brought by deep multi-modal fusion comes from the literature. For instance, the authors in [251] aimed to learn features from audio and faces with convolutional neural networks compatible at a high level. The works in [252, 253] depicted time-dependent audio-visual models adapted in an unsupervised fashion by exploiting the complementarity of multiple modalities. Their approach allowed them to cope with situations in which one of the modalities is under-performing. Furthermore, the approach described in [254] used a three-dimensional convolutional neural network to map both modalities into a single representation space. Inspired by findings on the high-level correlation of voice and face, the authors in [255] experimented with an attention-based neural network that learns multi-sensory associations for user verification. Differently, the method in [256] extracted static and dynamic face and audio features; then, it concatenated the top discriminative visual-audio features to represent the two modalities, and used a linear classifier for verification. Recent research in [257] depicted an efficient attention-guided audio-face fusion approach to detect speakers.
5.2.3 Biometrics in Education
Collecting and processing biometric data are tasks recently introduced in the context of educational platforms. First, we provide an overview of representative applications. Then, we focus on how they have been used to ensure academic integrity. Biometrics have been integrated in onlife school and university scenarios to simplify administrative processes that, in general, consume a considerable amount of time (e.g., fingerprint-based attendance control [258] or visual-based attendance control [259]). In online environments, biometrics have been adopted to monitor student participation, but also to understand their engagement [260]. Moreover, the analysis of biometrics enabled detecting cognitive difficulties among learners. By observing their behavior, the platform can follow how a student interacts with materials and detect if they lose interest [261]. Biometric technologies can also help instructors ensure the integrity of online activities [189]. These systems verify the identity of the current user and prevent them from accessing unauthorized content. The most representative services include ProctorU6, a student proctoring system based on the direct control of a human examiner. The student must register for the exam within the Learning Management System. Then, the human supervisor looks at the images coming from the webcam to check the learner's ID and the surrounding environment. Once the exam has started, the user is constantly monitored by the supervisor. Similarly, Pearson VUE7 (Pearson Virtual University Enterprises) requires identification through an ID card and a facial image. After a system test, the examinee can start the exam while a human supervisor monitors them throughout the session. Even though both systems rely on human supervision, they are considered the most popular solutions due to their numerous partnerships with schools and universities. One of the most representative automated systems is Proctorio8, a remote software that monitors the suspicious behavior of the participants based on pre-configured settings (e.g., whether the user will be monitored only via camera or also via microphone, whether the user's screen must be recorded, whether the web traffic must be checked, and whether the user must show the room wherein they take the test). The exam can be reported as suspect depending on the user's behavior, but the teacher determines whether the exam is valid. ProctorFree9 is another exam supervision system that does not require human intervention. It authenticates the student using face recognition and continuously checks their identity by analyzing their face. Furthermore, during the exam, it monitors a variety of typically suspicious events. Once the exam is completed, a report is sent to the instructor. Finally, TeSLA10 (Trust-Based Authentication and Authorship E-Assessment Analysis) allows a user to be controlled during an exam through keystroke, face, and voice dynamics. Moreover, an anti-plagiarism software checks the authenticity of the produced content.
6 https://www.proctoru.com/
7 https://home.pearsonvue.com/
8 https://proctorio.com/
9 http://proctorfree.com/
10 https://tesla-project.eu/
5.3 Problem Formalization
The learner's identity verification problem can be formalized as follows. Let A generally denote the domain of input biometric data. We consider a traditional feature extraction step which produces fixed-length representations in D ⊂ R^e from each a ∈ A; we denote this stage as a mapping from A to D. Given a verification policy p, a decision threshold τ, and N enrolled biometric samples per user, a verification system can be defined as:

v_{p,τ} : D × D_u^N → {0, 1}   (5.1)

which compares an input feature vector d from an unknown user with a set of enrolled feature vectors d_u^1, ..., d_u^N from user u to confirm (1) or refute (0) the user's identity. We mainly consider a verification policy p that relies on a similarity function S : D × D → R in order to compare feature vectors: