
Introduction to HathiTrust https://www.hathitrust.org

Overview

HathiTrust is a large-scale collaborative repository of digital content from over 100 research libraries, the Google Books project, and the Internet Archive.

Hathi, pronounced "hah-tee," is the Hindi word for "elephant," an animal famed for its long-term memory. (source: Wikipedia)

The repository contains over 17 million digitized items, over 6 million of which are available as full-text PDF downloads. There are many items relevant to Hawai‘i.

HathiTrust also provides a number of discovery and access services, notably, full-text search across the entire repository. These tools are not covered in this handout.

Make Sure to Log In as a UH Affiliate

Search Full Text or the Catalog Entry

Limit Search to Full View

Advanced Search in Full Text

Download

Use Collections

If you are using Zotero:

• if you are looking at a particular record or PDF, the icon should look like a book and you can download that particular reference
• if you are looking at a list of references, the icon will look like a folder and you can download multiple references at once

Fall, 2019 1

Finding Items in HathiTrust

Key options when searching include:

• full text or the catalog (the bibliographic record)
    o full text will give you more results, but make sure you use the terms that were in use when the book was published, e.g. Diamond Head used to be called Diamond Hill
    o the text has been created by Optical Character Recognition (OCR) and may not be "clean" text
• full view only or limited (search only)
    o full view allows direct access to the item as either PDF or text
    o searching limited access texts still allows you to determine if an item is relevant
    o access would have to be found another way (such as finding the physical book)
    o limited access texts can also be analyzed using HathiTrust's data tools

You can view:

• the full text (if available)
• the catalog record

Filters

Along the left column, you can limit results by:

• Subject
• Language
• Place of Publication
• Date
• Original Format
• Original Location (of the item)

Search Tips

Try one of the advanced searches so you can specify your search criteria. Also try searching for phrases by putting quotation marks around the words, such as "Diamond Head".

Adding to a Collection (see "Working with Collections" for more on collections):

To add a book to one of your collections:

• click on "Select Collection" to choose your target collection
• click on the checkbox to the left of the entry (or check the "Select all on page" box)
• click on the "Add Selected" button in the top right


Working with PDFs in HathiTrust

Downloading PDFs

To download a PDF from HathiTrust, you must be logged in through the University of Hawai‘i.

To download an item (book, article, etc.):

• Find the item
• Click on what portion of the book you want to download:
    o this page
    o a selection of pages
    o the entire book
• Hathi will then construct your PDF document and download it to your downloads folder

Fixing PDFs

After you have downloaded the PDF, you may want to delete some of the pages, such as those at the front or back of a book or blank pages in the middle.

Many programs and websites are available to edit PDFs (deleting pages, combining pages, etc.), such as:

• Adobe Acrobat (full license only; Acrobat Reader will not work)
• Preview (Macintosh only)

You can also search online for a PDF editor website.

These sites will typically allow you to upload a PDF, perform basic edits on the document, and then download the new PDF.

Working with PDFs

Once downloaded, the PDF is a file on your computer that you can open with other programs.

Your PDF reader (Preview, Acrobat Reader) will allow you to search the text and copy text (which may require further cleaning).
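If you copy a lot of text out of a PDF, a short script can handle some of the routine cleaning. The following is only a minimal sketch in Python: the function name `clean_ocr_text` and the sample string are made up for illustration, and the rules are rough heuristics rather than a complete OCR correction.

```python
import re

def clean_ocr_text(raw: str) -> str:
    """Tidy text copied from a scanned PDF (rough heuristics only)."""
    # Rejoin words split by a hyphen at a line break: "Dia-\nmond" -> "Diamond".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Turn single line breaks into spaces; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()

# Hypothetical sample of copied text.
sample = "Dia-\nmond  Head was once\ncalled Diamond Hill.\n\nNext paragraph."
print(clean_ocr_text(sample))
```

Note that the first rule will also join genuinely hyphenated words that happen to break across lines, so check the result against the page image when accuracy matters.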


Working with Collections

Collections are groups of texts stored on the HathiTrust site that you can refer to and share with others. You can also use collections that other people have shared.

Adding an item to one of your collections is described under "Adding to a Collection" in "Finding Items in HathiTrust" above.

Collections are accessed via the "Collections" and "My Collections" tabs, where you can:

• Create a new collection
• Search for collections
• Access a collection by clicking on the collection's name
• Limit the list to your own collections

Collections are the basis for the textual analysis tools provided by HathiTrust.

Options for Your Collections

You can make your collection public or private.

You can copy the URL to your public collections and share them with others.


Hathi Analytics

The textual analytic tools connected to Hathi are available at https://analytics.hathitrust.org

You must sign up for an account on the analytics website to use the tools. There are very precise requirements for usernames and passwords.

Creating and Validating a Workset

Worksets define the set of texts to be analyzed by a given tool. Typically these worksets begin as a collection in the HathiTrust digital library. To begin creating a workset, click on “worksets,” and then click on the “create a workset” tab.

There are two ways to create a workset. If your source is a public collection on the HathiTrust site, you can use the sharable collection link to import the collection as a workset. If your source is a private collection, you can download the collection’s metadata as a .TSV (tab-delimited text) file and upload this file to create a workset.
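Before uploading a .TSV file, it can help to confirm which volumes it lists. The sketch below reads tab-delimited text with Python's standard `csv` module. The column names (`htitem_id`, `title`) and the sample rows are assumptions made for illustration; open your own download to see the headers it actually uses.

```python
import csv
import io

def read_volume_ids(tsv_text: str, id_column: str = "htitem_id") -> list[str]:
    """Return the values found in one column of tab-delimited text.

    The default column name is an assumption, not a documented header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[id_column] for row in reader if row.get(id_column)]

# Hypothetical two-row file with made-up identifiers.
sample_tsv = (
    "htitem_id\ttitle\n"
    "example.001\tFirst sample title\n"
    "example.002\tSecond sample title\n"
)
print(read_volume_ids(sample_tsv))
```

For a file on disk, read its contents into a string first (for example with `open(path, encoding="utf-8").read()`) and pass that string to the function.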

Validating a workset is not necessary to use Hathi’s analytical tools; however, it can be worthwhile, especially when working with older worksets. When a workset is validated, Hathi checks that all the items it contains are still held within the HathiTrust repository. To validate a workset, click on the “validate a workset” tab and select the workset you would like to validate.

Algorithms

The HathiTrust Research Center has developed several general-purpose algorithms to analyze worksets.

• InPhO Topic Model Explorer
    o Creates a visualization of computationally derived “topics” within a workset
• Named Entity Recognizer
    o Creates a list of all persons, places, organizations, and dates within the workset, along with the item and page number in which these named entities appear
• Token Count and Tag Cloud Creator
    o Measures word frequency throughout a workset, and creates a word cloud from the most frequently used words


In addition to these algorithms, there is also an “extracted features download helper” tool, which can be used to download an extracted features dataset for use with other, user-created algorithms.
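To give a sense of what a token count involves, here is a minimal Python sketch of word-frequency counting over a handful of texts. It is not the HathiTrust Research Center's implementation; the function name `token_counts`, the simple tokenizing rule, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

def token_counts(texts, top_n=3):
    """Count lowercase word tokens across several texts."""
    counts = Counter()
    for text in texts:
        # Treat runs of letters and apostrophes as words (a simplification).
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)

# Hypothetical two-text "workset".
workset = ["The crater and the sea.", "The sea at dawn."]
print(token_counts(workset, top_n=2))
```

A tag cloud is essentially a drawing of these counts, with more frequent words shown in larger type.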

Data Capsule

Data Capsules are controlled virtual machines that allow users to create and run custom algorithms against volumes in the HathiTrust in a secure environment. By default, data capsules only have access to out-of-copyright materials; however, users can request indirect access to copyrighted materials, up to the entire HathiTrust corpus.

Data Sets

Data sets include page-level extracted features of all 14.7 million volumes of the HathiTrust, and extracted features of all volumes of English-language literature published between 1700 and 1920. The HathiTrust has only just begun directly providing datasets, and the list of datasets available will likely grow.

Hathi+Bookworm

The Bookworm tool, found under the explore tab, allows users to create visualizations of word use over time as shown in the HathiTrust repository. Users can see frequency within the entire repository or within selected sub-classes of deposited items (e.g. law, medicine, literature).

Books about are more likely to be Groovy
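The idea behind a word-use-over-time chart can be sketched in a few lines of Python: for each year, divide the number of times a word appears by the total number of words published that year. This is only an illustration of the general technique, not how Bookworm itself is built; the function name `word_share_by_year` and the sample records are invented.

```python
import re
from collections import defaultdict

def word_share_by_year(records, word):
    """Return {year: share of all tokens in that year equal to `word`}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in records:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(word.lower())
    # Skip years with no tokens to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Hypothetical (year, text) pairs.
records = [
    (1900, "diamond hill and the sea"),
    (1900, "the hill"),
    (1950, "diamond head diamond head"),
]
print(word_share_by_year(records, "head"))
```

Using a share rather than a raw count keeps years with many digitized books from dominating the chart.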
