HTTP://WWW.QU.EDU.QA/ HTTP://WWW.VT.EDU/ Digital Library Project

HTTP://WWW.PSU.EDU/ HTTP://WWW.TAMU.EDU/

Monday, 20 May 2013 1 Qatar Digital Library (QDL) Initiative Workshop #1

Monday, 20 May 2013 08:45 to 14:00

Auditorium Room 117, New Library, Qatar University, Doha, Qatar

Introductions 08:55 – 09:05

Dr. Mohammed Samaka

College of Engineering Qatar University — Doha, Qatar

Project Co-Lead Principal Investigator

Qatar Digital Library Project

• Global explosion of information:
  o More than 30,000 peer-reviewed research journals exist worldwide
  o 2.5 million articles published per year

• Knowledge society requires:
  o Deep awareness of and access to the best content
  o Real-time research results to assist and improve strategic decision-making

Qatar Digital Library Project Team

Qatar University, Qatar:
Mohammed Samaka (Ph.D., Co-Lead PI)
Myrna Tabet
Khalid AbualSaud
Asad Nafees
Sumaya Ali S A Al-Maadeed (Ph.D., Key Investigator)

Virginia Tech, USA:
Edward Fox (Ph.D., Lead-PI)
Tarek Kanan

Penn. State University, USA:
C. Lee Giles (Ph.D., PI)

Texas A&M, USA:
Richard Furuta (Ph.D., PI)
Hamed Alhoori

Consultants:
John Impagliazzo (Ph.D., Key Investigator)
Susan Lukesh (Ph.D.)
Carole Thompson
Robert Laws

This project was made possible by NPRP Grant # 4-029-1-007 from the Qatar National Research Fund (a member of Qatar Foundation).

Qatar Digital Library Project

Project Mission
Transform the use of information in Qatar, moving toward a knowledge society, in accord with the Qatar National Vision 2030.

Qatar Digital Library Project

Project Objectives/Aims

A. Research and prototype digital library systems and infrastructure for Qatar, focusing initially on Qatari information related to government and scholarly activities.

Leverage the crawling engine from Penn State's SeerSuite software infrastructure, and extend it beyond its current focus on English to support Arabic-English collections, and to cover a broad range of scholarly disciplines and all types of government information.

B. Research and build the digital library community in Qatar, supporting digital library use, services, collection development, tailored systems, and advancing toward a Knowledge Society.

Study scholarly activities, and engage in community building in Qatar, so DLs can be tailored to specific domains and to the unique needs of Qatar. Through workshops, a consulting center at the proposed Institute, and collaborative efforts with libraries and museums in Qatar, we will identify particular needs and uses, and tailor collections, systems, and services, to lead toward the Qatari Knowledge Society.

Welcoming Comments 09:05 – 09:15

Dr. Rashid Alammari

Dean, College of Engineering Qatar University — Doha, Qatar

Acknowledgment 09:15 – 09:15

This workshop and presentation are partially supported by a grant from the Qatar National Research Fund (QNRF) through its National Priority Research Program (NPRP), Number 4-029-1-007.


Participants Submit Completed QDL Surveys

Overview of Digital Libraries 09:15 – 09:40

Dr. Edward Fox

Department of Computer Science Virginia Tech — Blacksburg, Virginia USA

Project Lead Principal Investigator

Philosophy & Message

Collaboration Empowerment Local Uploading National Sharing Regional, Global Open Access

Research: Computing, Digital libraries, Info. retrieval, …
Education: DL curriculum, Graduate: ETDs, Ugrad: Ensemble

Outline

• Acknowledgements
• Digital Libraries
• NDLTD (electronic theses / dissertations)
• Digital Library Curriculum Project
• Ensemble (Pathway in US NSDL)
• Crisis, Tragedy & Recovery Network (CTRnet)
• Saudi Digital Library - SDL

Acknowledgements

• Mentors (Licklider & Kessler 1967-71 MIT, Salton 1978-1983 Cornell)

• QNRF, Qatar University, NSF and other sponsors

• Students, colleagues, co-investigators

• Virginia Tech: Computer Science, Digital Library Research Lab, Information Technology

• Collaborators on local, national, and international projects

DLs: Objectives in 1991

• World Lit.: 24hr / 7day / from desktop

• Integrated “super” information systems: 5S: Table of related areas and their coverage

• Ubiquitous, Higher Quality, Lower Cost

• Education, Knowledge Sharing, Discovery

• Disintermediation -> Collaboration

• Universities Reclaim Property

• Interactive Courseware, Student Works

• Scalable, Sustainable, Usable, Useful

DL Overview: Why of Global Interest?

• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly

• Knowledge and information are essential to economic and technological growth, education

• DL - a domain for international collaboration
  o wherein all can contribute and benefit
  o which leverages investment in networking
  o which provides useful content on Internet & WWW
  o which will tie nations and peoples together more strongly and through deeper understanding

Libraries of the Future
J.C.R. Licklider, 1965, MIT Press

World

Nation

State

City

Community

Information Life Cycle

Authoring Modifying

Using Organizing Creating Indexing Retention / Mining Storing Accessing Retrieving Filtering

Distributing Networking

Digital Library Content

Content Types

• Text Documents: Articles, Reports, Books
• Video, Audio: Speech
• Geographic Information: (Aerial) Photos
• Software, Programs: Models, Simulations
• Bio Information: Genome (Human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT

Content-Based Information Retrieval

Digital Objects (DOs)

• “Born digital”
  o Created digitally

• Digitized version of “real” object
  o Is the DO version the same, better, or worse?
  o Suggestion for documents: structured + rendered

• Surrogate for “real” object
  o Scanned versions
  o 3D models

Institutional Repositories

• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”

• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA

• www.arl.org/sparc/IR/IR_Guide_v1.pdf

Goals of Institutional Repositories (by Stevan Harnad, University of Southampton)

• Self-archiving of institutional research
  o Theses and dissertations (VTLS NDLTD Project)
  o Article preprints and postprints
  o Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic publishing of journals, books, posters, maps, audio, video and other multimedia objects

Adapted from Slide by V. Chachra, VTLS

NDLTD: www.ndltd.org

• Networked Digital Library of Theses and Dissertations (NDLTD)
• N D Ltd or “Noodle TD”
• Vision: Every thesis and dissertation in the world is:
  o Devised to take advantage of the most helpful electronic publishing methods
  o Shared globally and easily found
  o Supported by a suite of digital library services to aid authors, researchers, learners, universities
  o Preserved and migrated permanently

What are we doing?

• Aiding universities and nations to enhance graduate education, publishing, preservation (data sets next!), and Intellectual Property Rights efforts

• Helping improve the availability and content of (electronic) theses and dissertations (ETDs)

• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive)
  http://curric.dlib.vt.edu/wiki/index.php/ETD_Guide

Why ETD? Short Answer

• For Students:
  o Gain knowledge and skills for the Information Age
  o Richer communication (digital information, multimedia, …)
• For Universities:
  o Easy way to enter the digital library field and benefit thereby
• For the World:
  o Global digital library – large, useful, many services
• General:
  o Save time and money
  o Increased visibility for all associated with research results

Curriculum Module Template

1. Module name
2. Scope
3. Learning objectives
4. 5S characteristics of the module (streams, structures, spaces, scenarios, society)
5. Level of effort required (in-class and out-of-class time required for students)
6. Relationships with other modules (flow between modules)
7. Prerequisite knowledge/skills required (what the students need to know prior to beginning the module; completion optional; complete only if prerequisite knowledge/skills are not included in other modules)
8. Introductory remedial instruction (the body of knowledge to be taught for the prerequisite knowledge/skills required; completion optional)
9. Body of knowledge (theory + practice; an outline that could be used as the basis for class lectures)
10. Resources (required readings for students; additional suggested readings for instructor and students)
11. Exercises / Learning activities
12. Evaluation of learning objective achievement (graded exercises or assignments)
13. Glossary
14. Additional useful links
15. Contributors (authors of module, reviewers of module)

DL Curriculum Framework

[Figure: DL curriculum framework]

Course structure:
• Semester 1: DL collections: development/creation
• Semester 2: DL services and sustainability

Core DL topics:
• Digitization, Storage, Interchange
• Metadata, Cataloging, Author submission
• Naming, Repositories, Archives
• Architectures (agents, buses, wrappers/mediators), Interoperability
• Services (searching, linking, browsing, etc.)
• Archiving and preservation, Integrity
• Digital objects, Composites, Packages
• Spaces (conceptual, geographic, 2/3D, VR)
• Intellectual property rights mgmt., Privacy, Protection (watermarking)

Related topics:
• Multimedia streams/structures, Capture/representation, Compression/coding
• Documents, E-publishing, Markup
• Thesauri, Ontologies, Classification, Categorization
• Info. needs, Relevance, Evaluation, Effectiveness
• Routing, Filtering, Community filtering
• Search & search strategy, Info seeking behavior, User modeling, Feedback
• Bibliographic information, Bibliometrics, Citations
• Content-based analysis, Multimedia indexing
• Info summarization, Visualization
• Multimedia presentation, rendering

http://curric.dlib.vt.edu/modDev/modDev.html

www.nsdl.org

Ensemble, Pathway in NSDL

• National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• Many projects, largest now called …… Pathways

NSDL Connects (this slide from Lee Zia)

Users: students, educators, life-long learners

Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories; ...

Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate; ...

Ensemble: www.computingportal.org

Crisis, Tragedy, and Recovery (CTR)

• Human tragedies that result from man-made and natural events affect humans and communities significantly.
• During and after a tragic event, there is a series of needs that have to be addressed.
  o Compounded by communication failures and a confusing plethora of data and information

CTR stakeholders

Saudi Digital Library - about SDL

• … wide spreading of scientific blocks or groupings

• … linking between academic and research communities.

• … supporting these scientific groupings at the national level, where it provides
  o sophisticated information services
  o … digital information resources in various forms,
  o … accessible to faculty staff, researchers and students
  o …

Saudi Digital Library - about SDL

• The digital library includes:
  o The largest collection of e-books in the Arab world: more than 100,000 full-text e-books in various scientific specializations
  o More than 300 global publishers such as Elsevier, Springer, Pearson, Wiley, Taylor & Francis, McGraw-Hill, Yale University, Oxford University, Harvard University


Saudi Digital Library (translated from the Arabic slide)

• Our current era is characterized by the spread of scientific blocs or groupings in all their forms, which link academic and research communities. The Saudi Digital Library, part of the National Center for E-Learning and Distance Learning at the Ministry of Higher Education in the Kingdom of Saudi Arabia, is one of the most prominent supporters of such scientific groupings at the national level. It works to provide advanced information services and to make digital information resources available in their various forms, placing them within reach of faculty members, researchers, and graduate and undergraduate students at Saudi universities and other institutions of higher education.
• The Saudi Digital Library is the largest academic collection of information resources in the Arab world. It holds more than 114 thousand scholarly references covering all academic disciplines and continuously updates this content, which builds up a vast body of knowledge over the long term. The library has contracted with more than 300 international publishers. In 2010 it won the Arab Federation for Libraries and Information (AFLI) award for outstanding projects in the Arab world.
• The library provides a single umbrella for all Saudi universities, through which it negotiates with publishers on various legal and financial matters. This saves considerable money and effort: by grouping under one umbrella, the universities can obtain more benefits and rights in dealing with publishers.

Project Aims / Non-Textual Content 09:40 – 09:50

Dr. John Impagliazzo

Emeritus, Computer Science Department Hofstra University — Hempstead, New York USA

Project Consultant and Key Investigator

Additional Content Contributions by Carole Thompson and Robert Laws, Project Consultants

Project Aims (1 of 5)

Aim #1 Research and prototype digital library systems and infrastructure for Qatar, focusing initially on Qatari information related to government and scholarly activities.

Aim #2 Research and build the digital library community in Qatar, supporting digital library use, services, collection development, tailored systems, and advancing toward a knowledge society.

Project Aims (2 of 5)

Regarding Aim 1:

• Leverage Penn State’s SeerSuite software infrastructure
• Implement novel advanced systems on the proposed equipment
• Extend SeerSuite beyond its current focus on English to support Arabic-English collections and cross-language discovery
• Extend the effort to cover a broad range of scholarly disciplines: computing, chemistry, …
• Support all types of government information

Project Aims (3 of 5)

Regarding Aim 1 (continued):

• Demonstrate how deep analysis of digital objects and collections provides superior capabilities beyond those in commercial systems
• Obtain pages, reports, and other information from all branches of the government through websites as well as other databases and other accessible online venues
• Collect information related to education and museums
• Focus on automatic or semi-automatic collection development
• Cover key aspects of Qatari information currently available

Project Aims (4 of 5)

Regarding Aim 2:

• Study scholarly activities (by surveys and at your locations)
• Identify particular needs and uses
• Tailor DL content to specific domains and to the unique needs of Qatar
• Establish a consulting center at the QDL Institute
• Collaborate in efforts with libraries and museums in Qatar
• Engage in community building in Qatar (join us!)

Project Aims (5 of 5)

Regarding Aim 2 (continued):

• Tailor collections, systems, and services to lead toward the Qatari Knowledge Society
• Extend work on social networks to collect and utilize data, allowing personalized as well as group and agency tailoring
• Include key communities such as citizens, educators, scholars, and students
• Partner with the new digital librarians to add other collections, especially covering Qatari culture and heritage
• Identify collections with key metadata

The Need to Safeguard Culture

Maulana Khan (2001), Doha Conference of ‘Ulama on Islam and Cultural Heritage

“Every group or community has its own particular culture and has the absolute right to safeguard that culture”

Maulana Khan. Proceedings of the Doha Conference of ‘Ulama on Islam and Cultural Heritage. Doha, Qatar. December 30–31, 2001. p. 66 New York: UNESCO, 2005. Retrieved Nov. 15, 2010 from http://unesdoc.unesco.org/images/0014/001408/140834m.pdf

Non-Textual Collections (1 of 2)

• Initial focus of QDL on automatic or semi-automatic collection development
• Project also includes supplemental collections consisting of non-automatic material
• Sampling would complete the spectrum of data collection useful in supporting research and the long-term interests of Qatar
• Project also seeks to develop a collection that demonstrates the various aspects of content preservation
• Research effort seeks to explore non-text artifacts to:
  o Enhance the textual research and collection
  o Preserve the heritage and culture of the country and its people
  o Offer a basis for related research

Non-Textual Collections (2 of 2)

• Examples of the sampling include areas such as:
  o , Literature, Music
  o Qatari Sports, Politics
  o Qatari Education (historical and contemporary perspectives)
  o Qatari Museum Collections
• Appropriate metadata would:
  o Document each item
  o Serve as examples of the varying types of description
  o Serve as a basis to build new digital libraries in Qatar
• Sampling of some of these materials would:
  o Serve the people of Qatar
  o Become a model for other efforts
  o Demonstrate the potential for future research

Non-Textual Examples (1 of 3)

• Images of landmarks in Qatar
  o Buildings
  o Mosques
• Oral Histories
  o Over 2000 CDs
  o Need to document them
  o Make them available for preservation
• Literature
  o Ministry of Culture, Arts, and Heritage (MoC) making an effort
  o http://www.moc.gov.qa/English/Authors/Pages/default.aspx

Mubarak Al Malik

Non-Textual Examples (2 of 3)

• Qatari Art
  o Qatari Artist Directory
  o Names of Artists
  o Examples of their Work
    • Work of Dr. Wafa Al-Hamad
    • Exhibits now at QU, New Library Exhibition Hall
  o Need to make examples of these collections
• Sports in Qatar
  o Document exhibits and activities of historical interest
  o Falconry
  o Camel Racing
  o Horse racing

Talal Nayef Al Qasim

Non-Textual Examples (3 of 3)

  o Qatari Musical Concert images (MoC)
  o Qatar Philharmonic Orchestra
  o Document the Development of Music in Qatar
  o Information from the QF Music Academy at Katara
• Education
  o Qatar University (State University)
  o Hamad bin Khalifa University ()
• Media Collections
  o
  o Newspapers

QDL and Librarian / Corporate / Government Perspectives 09:50 – 10:00

Myrna Tabet

Library Services Qatar University — Doha, Qatar

Project Research Associate

Qatar Digital Library (QDL)

• Importance, Benefits, and Content

• Preservation of Culture and History of Qatar

• Scholarly, Governmental, Institutional, and Corporate Viewpoints

Importance of the QDL Project (1 of 2)

• Users have:
  o Become sophisticated and adept at using technology
  o Higher expectations of service provided by libraries

• Role of (digital) librarians:
  o Changing as a consequence of the digital shift
  o New ways of providing information

Importance of the QDL Project (2 of 2)

• Digital libraries can:
  o Assist in the transformation of data into information
  o Help in building a knowledge-based society

• Governments aim to:
  o Provide access to relevant information for their citizens

• Nationwide digital library community can work together toward these goals

DL Benefits to Library Users (1 of 2)

• Vastly more information at your fingertips

• Access 24/7, from anywhere, anytime

• Rapidly updated: Current + Historical information

• Information sharing and collaboration

DL Benefits to Library Users (2 of 2)

• New forms of access
  o Multilingual / multimedia
  o Hypermedia (Linked data, Text, Images, Audio, Video)

• Improved preservation with:
  o Metadata
  o Information exchange protocols

Significance to Librarians, Corporations, and Governmental Agencies (1 of 2)

• The need to preserve cultural and historical heritage =>
  o Collections of fragile and precious artifacts =>
  o Libraries, museums, and archives developing digital collections =>
  o Users from all over the world accessing and studying

• SeerSuite (crawler, search engine)
  o Collect from the Web and from curated collections of artifacts
  o Images, audio, and text for browsing and searching
  o Extractions of tables and references / citations
  o Machine learning and artificial intelligence (AI) – beyond commercial options

Significance to Librarians, Corporations, and Governmental Agencies (2 of 2)

• A one stop search of:
  o Information about Qatar
  o Information to preserve the

• Indexing, analysis, and retrieval of:
  o Resources, reports, statistics, and other types of information
  o Information in the Arabic language as well as in English

Available Content (1 of 2)

• Materials captured:
  o Local scholarly, cultural
  o Governmental documents

• Metadata, data, and many types of documents (including full text)

• Free and open as well:
  o Freely accessible for anyone to use
  o Available for authorized users due to:
    • Licenses
    • Cost issues

Available Content (2 of 2)

• Main resources:
  o First appeared in digital form
  o Often referred to as being ‘born’ digital

• At a later stage the project will include:
  o Digital versions of material already existing in print
  o Multimedia (image, audio, video) forms

• Surveys and studies will:
  o Guide collection strategies and priorities
  o Satisfy the needs of the Qatari community

Selected Digital Library References

Lesk, M. (2005). Understanding digital libraries, 2nd ed. San Francisco, CA: Morgan Kaufmann.

Tedd, L. & Large, A. (2005). Digital libraries: Principles and practice in a global environment. Munchen: K.G. Saur.

Witten, I., Bainbridge, D. & Nichols, D. (2010). How to build a digital library, 2nd ed. Burlington, MA: Morgan Kaufmann, Elsevier.

QDL and Researcher Perspectives 10:00 – 10:10

Hamed Alhoori

Department of Computer Science & Engineering Texas A&M University — College Station, Texas USA

Project Research Associate

Introduction

Inadequacy of literature reviews (Boote, D.N., et al., 2005)

Introduction – Objective

Understand and support the dynamic information needs, information-seeking behavior, information use, and other scholarly activities of researchers, scientists, engineers, scholars and students in Qatar.

Introduction - Research Questions

• How do researchers currently search, select, and manage their information sources?
• What difficulties are researchers facing during the literature review process?
• How have social reference management and recommendation systems, used in scholarly communities, influenced the research process?
• How can scientific impact be measured better for each discipline using multi-dimensional metrics?
• What are the current scholarly research needs?

Related Studies

• New patterns of searching (Hallmark, J., 2004).
• Difficulty locating information (George, C., 2006).
• Researchers are not aware of or familiar with some of the services and do not consult librarians (Kuruppu, P.U., 2006).
• Limitations

Methodology

• Qualitative and quantitative research methods
• Statistical hypothesis testing techniques

Initial Results (1 of 3)

• (Alhoori, et al., 2011) - acceptance rate 9%
• 164 researchers participated in the study
  o 25 faculty members, 5 postdocs, 84 doctoral students, 28 master's students, 22 undergraduate students
  o 131 male and 33 female
  o Participants were from 13 different disciplines at Texas A&M University – College Station

Initial Results (2 of 3)

• Differences in reading habits
• Difficulties locating their needs
• Getting lost
• Repeated results
• Printing articles and using folders to organize
• Notes
• Research updates
• Social reference management - lack of awareness, accuracy

Initial Results (3 of 3)

• Saving methods
• Significant relationship between
  o Saving methods and collaboration
  o Saving methods and retrieving articles
• Researchers’ satisfaction
• Search differences
• Research interests
• Publication overload (78%)

References

• Boote, D.N., Beile, P. Scholars Before Researchers: On the Centrality of the Dissertation Literature Review in Research Preparation. Educational Researcher. 34, 3-15 (2005).
• Hallmark, J. Access and Retrieval of Recent Journal Articles: A Comparative Study of Chemists and Geoscientists. Issues in Science and Technology Librarianship. 40 (2004).
• George, C., Bright, A., Hurlbert, T., Linke, E.C., St. Clair, G., Stein, J. Scholarly use of information: graduate students' information seeking behaviour. Information Research. 11, 1-19 (2006).
• Kuruppu, P.U., Gruber, A.M. Understanding the Information Needs of Academic Scholars in Agricultural and Biological Sciences. The Journal of Academic Librarianship. 32, 609-623 (2006).
• Alhoori, H., Furuta, R. Understanding the dynamic scholarly research needs and behavior as applied to social reference management. International Conference on Theory and Practice of Digital Libraries, TPDL 2011.

Break 10:10 – 10:40

Web Archiving, Crawling, and SeerSuite 10:40 – 11:15

Dr. Edward Fox and Tarek Kanan

Department of Computer Science Virginia Tech — Blacksburg, Virginia USA

Project Lead Principal Investigator Project Research Associate

Web Archiving and Web Crawling 10:40 – 11:05

Dr. Edward Fox

Department of Computer Science Virginia Tech — Blacksburg, Virginia USA

Project Lead Principal Investigator

Archiving by Assembling Content + Metadata

• Collect digital objects
  o Digitize / purchase / obtain submissions
  o Catalog each => metadata record

• Aggregate into a metadata catalog
  o Searchable
  o Browsable
  o Usually free of intellectual property rights concerns
  o Can be shared through the Open Archives Initiative (OAI)
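Sharing a metadata catalog through OAI is usually done with the OAI Protocol for Metadata Harvesting (OAI-PMH), where a service provider sends HTTP requests to a data provider. The following is a minimal sketch of building such a request, not part of the original slides. The `ListRecords` verb and the `oai_dc` (Dublin Core) metadata prefix come from the OAI-PMH specification; the repository address and the `theses` set name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return an OAI-PMH ListRecords request URL for a data provider.

    base_url is the repository's OAI-PMH endpoint (hypothetical in this sketch).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict harvesting to one set defined by the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only for illustration.
print(build_list_records_url("https://repository.example.org/oai"))
print(build_list_records_url("https://repository.example.org/oai", set_spec="theses"))
```

A harvester would fetch the resulting URL and parse the XML response; that part is omitted here to keep the sketch free of network access.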

OAI = Technical Umbrella for Practical Interoperability…

Reference Museums Libraries

E-Print Publishers Archives

…that can be exploited by different communities

The World According to OAI

Service Providers

Current Discovery Preservation Awareness

Data Providers

Web Archiving

• Introduction: Web archiving is the process of
  o gathering up data recorded on the World Wide Web,
  o storing it,
  o ensuring the data is preserved in an archive, and
  o making the collected data available for future research.

• The Internet Archive and several national libraries initiated Web archiving practices in 1996.
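The four steps of Web archiving listed above (gather, store, preserve, make available) can be illustrated with a small in-memory sketch. This is an assumption-laden toy, not how the Internet Archive or any national library works: the `Archive` class, `make_snapshot` function, and record fields are hypothetical names, and a SHA-256 checksum stands in for real preservation checks.

```python
import hashlib
from datetime import datetime, timezone


def make_snapshot(url, content, captured_at=None):
    """Gather: wrap captured page bytes with a capture time and a checksum."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at": captured_at.isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "content": content,
    }


class Archive:
    """Toy in-memory archive keeping every captured version of each URL."""

    def __init__(self):
        self._snapshots = {}  # url -> list of snapshots, oldest first

    def store(self, snapshot):
        """Store: append the snapshot to the history of its URL."""
        self._snapshots.setdefault(snapshot["url"], []).append(snapshot)

    def verify(self, snapshot):
        """Preserve: check that the content still matches its checksum."""
        return hashlib.sha256(snapshot["content"]).hexdigest() == snapshot["sha256"]

    def history(self, url):
        """Make available: return all stored versions of a URL."""
        return list(self._snapshots.get(url, []))


archive = Archive()
archive.store(make_snapshot("http://example.org/", b"<html>version 1</html>"))
archive.store(make_snapshot("http://example.org/", b"<html>version 2</html>"))
print(len(archive.history("http://example.org/")), "versions archived")
```

Keeping every version rather than overwriting is the point of the sketch: it is what lets an archive answer questions about how a page changed over time.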


Web Archiving

• 2001: International Web Archiving Workshop (IWAW):
  o share experiences and exchange ideas.

• 2003: International Internet Preservation Consortium (IIPC):
  o international collaboration in developing standards and open source tools for the creation of Web archives.

• Tools: Heritrix, Memento, SiteStory

• Web growth =>
  o concern with change and loss =>
  o local and national Web archiving initiatives

Web Crawlers

• A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing.
• A Web crawler also may be called a Web spider, an ant, or an automatic indexer.

• Web search engines and some other sites use Web crawling or spidering software to update their Web content or indexes of other sites’ Web content.
• Web crawlers can copy all the pages they visit for later processing by a search engine that indexes the downloaded pages so that users can search them much more quickly.
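The reason indexing downloaded pages makes search faster can be shown with a minimal inverted index: each word maps to the set of pages containing it, so a query looks up words instead of scanning every page. This is a sketch under simplifying assumptions (whitespace tokenization, lowercase matching, all query words required); the function names and sample pages are hypothetical and do not describe any particular search engine.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages, used only for illustration.
pages = {
    "p1": "Qatar digital library",
    "p2": "digital archive",
    "p3": "Qatar heritage archive",
}
index = build_index(pages)
print(sorted(search(index, "digital")))
print(sorted(search(index, "Qatar archive")))
```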

Web Crawler

• A Web crawler starts with a list of URLs to visit, called the seeds.

• On each of those pages, the crawler:
  o identifies all the hyperlinks
  o adds them to the list of URLs to visit
  o recursively visits the pages pointed to, according to a set of policies.

• Prioritizes its downloads – some pages change often.
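The crawl loop described above can be sketched as follows. To stay runnable without network access, the sketch takes a `get_links` function instead of downloading and parsing real pages, and it uses a first-in, first-out queue rather than literal recursion. The names `crawl`, `get_links`, `allowed`, and `TOY_WEB` are illustrative assumptions; a real crawler would also honor robots.txt, rate limits, and a revisit schedule for pages that change often.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=100, allowed=lambda url: True):
    """Visit pages breadth-first, starting from the seed URLs.

    get_links(url) returns the hyperlinks found on a page.
    allowed(url) is a simple policy hook, e.g. to stay within one domain.
    """
    frontier = deque(dict.fromkeys(seeds))  # URLs waiting to be visited
    seen = set(frontier)                    # URLs already queued or visited
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen and allowed(link):
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical four-page web, used only for illustration.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}
print(crawl(["a"], lambda url: TOY_WEB.get(url, [])))  # ['a', 'b', 'c', 'd']
```

Replacing the queue with a priority queue keyed on how often each page changes would be one way to model the prioritization mentioned in the last bullet.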

Web Crawlers: Difficulties and Limitations

• Technical challenges of Web archiving

• Intellectual property laws

• Peter Lyman states that "although the Web is popularly regarded as a public domain resource, it is copyrighted; thus, archivists have no legal right to copy the Web".

• However, national libraries in many countries have a legal right to copy portions of the Web under an extension of a requirement for legal deposit.

Web Crawlers: Difficulties and Limitations

• Removal requests: implemented by WebCite, the Internet Archive, or Internet Memory

• Other Web archives are only accessible from certain locations or have regulated usage.

• WebCite cites a recent lawsuit against Google's caching, which Google won.

Focused Crawlers

• For a particular topic or event, to build a Web collection focused in that area

• Start with URLs of interest, viewed as seeds to grow from
• Expand in a ‘smart’ way to get all and only what is relevant

• Use information retrieval / artificial intelligence / machine learning
  o Require ‘knowledge bases’ and/or human training examples

• Nevertheless, there is a tradeoff between the resulting
  o Recall (i.e., coverage of what is out there)
  o Precision (i.e., freedom from noise in what is collected)
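Recall and precision as described above have standard definitions: recall is the fraction of relevant pages that the crawler collected, and precision is the fraction of collected pages that are relevant. A minimal sketch of the calculation follows; the page identifiers are made up for illustration and are not results from the project.

```python
def precision_recall(collected, relevant):
    """Return (precision, recall) for a crawl.

    collected: pages the focused crawler gathered.
    relevant: pages that are actually on topic.
    """
    collected, relevant = set(collected), set(relevant)
    hits = collected & relevant
    precision = len(hits) / len(collected) if collected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical crawl: 4 pages collected, 3 relevant pages exist, 2 overlap.
precision, recall = precision_recall(
    collected={"p1", "p2", "p3", "p4"},
    relevant={"p2", "p3", "p5"},
)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Crawling more aggressively tends to raise recall while pulling in more noise, which lowers precision; that is the tradeoff the slide refers to.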

SeerSuite 11:05 – 11:15

Tarek Kanan

Department of Computer Science Virginia Tech — Blacksburg, Virginia USA

Project Research Associate

SeerSuite (1 of 4)

• Prof. C. Lee Giles, QDL Principal Investigator
  o Created CiteSeer in 1997
  o Associates: Steve Lawrence and Kurt Bollacker at the NEC Research Institute (now NEC Labs) in Princeton, New Jersey, USA
  o Now at Penn State University, he continues to lead SeerSuite into the “Next Generation”

• http://citeseerx.sourceforge.net/

SeerSuite (2 of 4)

• What is SeerSuite?

• SeerSuite built by…

• SeerSuite supports…

SeerSuite (3 of 4)

• SeerSuite was designed to provide a framework that would replace CiteSeer

• SeerSuite improves on aspects of the original CiteSeer, with features such as:
  o Reliability
  o Robustness
  o Scalability

SeerSuite (4 of 4)

• The motivation behind SeerSuite

• SeerSuite design

• SeerSuite enables access to:
  o Extensive document collections
  o Citations
  o Author metadata

SeerSuite - CiteSeerX

• CiteSeerX shares several components with digital libraries and search engines
  o Web Interface
  o Crawlers
  o Index
  o Databases

Monday, 20 May 2013 99 SeerSuite - CiteSeerX

• Domain-specific repositories and digital library systems o arXiv for physics o RePEc for economics o Greenstone

• CiteSeerX o Closest in design to Google Scholar

Monday, 20 May 2013 100 http://qdlproject.qu.edu.qa/citeseerx/index

Monday, 20 May 2013 101 http://citeseerx.ist.psu.edu

Monday, 20 May 2013 103 SeerSuite - MyCiteSeerX

• SeerSuite improves users’ information access.

• MyCiteSeerX roles….

• MyCiteSeerX allows users to: o Store queries o Build document portfolios o Tag documents o Monitor and track documents of interest

• MyCiteSeerX requires: o Registration o Login

Monday, 20 May 2013 105 SeerSuite - TableSeer

• Tables contain important data

• TableSeer analyzes, extracts, and indexes data from tables

Monday, 20 May 2013 107 QDL Website 11:15 – 11:25

Asad Nafees*

Technology Services Qatar University — Doha, Qatar

Project Research Associate

* Presented by Hamed Alhoori

Monday, 20 May 2013 108 QDL Website http://qdl.qu.edu.qa/

Monday, 20 May 2013 109 Site Objectives

• Provide in-depth information on everything related to the “QDL Project”. This includes objectives, progress, data sets, presentation slides, and interim reports.

• Raise awareness and provide material on concepts related to digital libraries, associated systems, and processes.

• Create an online identity for the proposed institute with information on activities and opportunities to participate.

• Collect feedback and data through surveys and opinion polls.

Monday, 20 May 2013 110 Primary Audience

• Groups of visitors making up the project’s primary audience will be the main focus of our site:

o Librarians and libraries in Qatar

o Researchers and academics

o Government organizations

o Non-governmental organizations (such as http://www.fsd.org.qa/)

Monday, 20 May 2013 111 Secondary Audience • Groups of visitors making up the secondary audience – these are important but not critical:

o University / School Students o Teachers / Faculty o Managers o Qatari citizens o Other stakeholders

Monday, 20 May 2013 112 Content (Published or Under Development) • Information and details about the project. • A list of digital libraries currently available in Qatar. • Information on available digital library systems, so that libraries and organizations can set up collections for their end users. • Searchable database of example digital libraries around the world, in English, Arabic, or other languages, filterable by topic or other criteria. • Lessons and tutorials on how to define and create collections.

Monday, 20 May 2013 113 Content (Published or Under Development)

• A ‘Try it now’ function that would allow an end user to go through the steps of setting up a digital library, creating collections, and uploading electronic artifacts.

• An online forum for librarians, allowing open discussion of issues and ideas that can be used to help organize collections and artifacts.

• Information page on how placing content in digital library research archives can help researchers o increase their visibility around the world, o get in contact with other researchers and collaborators, and o improve their funding opportunities.

• Examples and links to peer sites in other countries where government information is successfully archived and accessed from a digital library

Monday, 20 May 2013 114 The Survey

• We have a survey • Responses to this survey will help us tailor our efforts to the resources and needs of Qatar. • Provide your contact information; we will keep you informed about the project’s progress over time, and workshops that are offered. • English: http://qdl.qu.edu.qa/content/survey • Arabic: http://qdl.qu.edu.qa/ar/content/survey

Monday, 20 May 2013 115 Join the Mailing List

• Keep up to date with information about our project by joining our mailing list.

• Submit your email and we'll keep you informed about our new initiatives, upcoming events, seminars and news.

Monday, 20 May 2013 116 Audience Participation 11:25 – 11:45

Dr. John Impagliazzo

Emeritus, Computer Science Department Hofstra University — Hempstead, New York USA

Project Consultant and Key Investigator

Monday, 20 May 2013 117 Audience Participation 11:25 – 11:35

Constituent Interest in QDL Project

Monday, 20 May 2013 118 Audience Participation 11:35 – 11:45

Q & A Session

Monday, 20 May 2013 119 Global Perspective and QDL Next Steps 11:45 – 11:55

Dr. Edward Fox

Department of Computer Science Virginia Tech — Blacksburg, Virginia USA

Project Lead Principal Investigator

Monday, 20 May 2013 120 World Digital Library

• Free and growing Internet-accessible collection of significant cultural treasures from many countries and cultures

• Photographs, books, manuscripts, maps, audio recordings, and films

• Searchable in 7 languages: Arabic, English, French, Russian, Chinese, Spanish, and Portuguese • Easy-to-use interface that enables o browsing by place, time, topic, type of item, and contributing institution, or o open-ended search.

Monday, 20 May 2013 121 World Digital Library and

• The World Digital Library is intended for use by students, scholars, and members of the public.

• UNESCO program led by the US Library of Congress in partnership with libraries all over the world.

• The National Library Qatar Foundation is one of a number of Library members from Arabic/Islamic countries that contribute content.

• The Qatar Foundation is a major financial sponsor of the World Digital Library.

Monday, 20 May 2013 122 World Digital Library and British Library

• Qatar Foundation and Qatar National Library o partnership with the British Library to make Arabic science and Gulf history available for worldwide research

• Began in July 2012 o to digitize more than half a million pages of historic documents detailing Arab history and culture.

Monday, 20 May 2013 123 World Digital Library and British Library

• The three-year project aims o to transform people’s understanding of the history of the Middle East o using material from the UK’s India Office archive and o medieval Arabic manuscripts on science and medicine.

• At the British Library, in close cooperation with the Qatar National Library.

Monday, 20 May 2013 124 QDL Focus

Community in Qatar • Identify interested stakeholders, to tailor efforts to their needs • Train the next generation of digital librarians, archivists, and curators • Partners helping with additional collection development

Advanced Technology for Enhanced Access • “Low-hanging fruit” from crawling the Qatar-related Web • Improved analysis (citations, tables, chemicals, …) • Support for both Arabic and English

Monday, 20 May 2013 125 Closing Remarks 11:55 – 12:00

Dr. Mohammed Samaka

College of Engineering Qatar University — Doha, Qatar

Project Co-Lead Principal Investigator

Monday, 20 May 2013 126 12:00 – 12:00

Submit Completed Workshop Evaluations

Monday, 20 May 2013 127 12:00 – 13:00

L U N C H

Monday, 20 May 2013 128 13:00 – 14:00 Optional Post-workshop Hands-on Session

13:00 to 14:00 Room 110
