Web Crawling, Analysis and Archiving
Web Crawling, Analysis and Archiving Vangelis Banos Aristotle University of Thessaloniki Faculty of Sciences School of Informatics Doctoral dissertation under the supervision of Professor Yannis Manolopoulos October 2015 Ανάκτηση, Ανάλυση και Αρχειοθέτηση του Παγκόσμιου Ιστού Ευάγγελος Μπάνος Αριστοτέλειο Πανεπιστήμιο Θεσσαλονίκης Σχολή Θετικών Επιστημών Τμήμα Πληροφορικής Διδακτορική Διατριβή υπό την επίβλεψη του Καθηγητή Ιωάννη Μανωλόπουλου Οκτώβριος 2015 i Web Crawling, Analysis and Archiving PhD Dissertation ©Copyright by Vangelis Banos, 2015. All rights reserved. The Doctoral Dissertation was submitted to the the School of Informatics, Faculty of Sci- ences, Aristotle University of Thessaloniki. Defence Date: 30/10/2015. Examination Committee Yannis Manolopoulos, Professor, Department of Informatics, Aristotle University of Thes- saloniki, Greece. Supervisor Apostolos Papadopoulos, Assistant Professor, Department of Informatics, Aristotle Univer- sity of Thessaloniki, Greece. Advisory Committee Member Dimitrios Katsaros, Assistant Professor, Department of Electrical & Computer Engineering, University of Thessaly, Volos, Greece. Advisory Committee Member Athena Vakali, Professor, Department of Informatics, Aristotle University of Thessaloniki, Greece. Anastasios Gounaris, Assistant Professor, Department of Informatics, Aristotle University of Thessaloniki, Greece. Georgios Evangelidis, Professor, Department of Applied Informatics, University of Mace- donia, Greece. Sarantos Kapidakis, Professor, Department of Archives, Library Science and Museology, Ionian University, Greece. Abstract The Web is increasingly important for all aspects of our society, culture and economy. Web archiving is the process of gathering digital materials from the Web, ingesting it, ensuring that these materials are preserved in an archive, and making the collected materials available for future use and research. Web archiving is a difficult problem due to organizational and technical reasons. We focus on the technical aspects of Web archiving.
[Show full text]