HTRC worksets tutorial (Archived July 2022)

Learn the three different ways you can create worksets in HTRC Analytics, as well as how to validate a workset and download it to your personal machine.

Browse public worksets

There are many existing worksets in HTRC that you can use instead of creating your own. 

  • Sign in to HTRC Analytics, and from the home page, click on "Worksets" from the top menu to browse the public worksets and your own worksets. 

  • You will be taken to the page listing all the worksets that are public or created by you.

  • You can filter worksets by name, or you can narrow the display to your worksets only.
  • You can click on the hyperlinks to see the volumes in the workset and minimal metadata about each volume. You can follow the links in the "Title" field to see the volume in the HathiTrust Digital Library. You can also click the "Download" button to download the HathiTrust volume IDs in the workset.

Creating a workset

There are three ways to create a workset in HTRC Analytics:

  1. Import a collection from HathiTrust
  2. Import your selected results from HTRC Workset Builder 2.0
  3. Upload a list of HathiTrust volume identifiers for your volumes of interest

You can create a workset directly from a public HathiTrust collection. There are many existing collections, or you can create your own by following these steps:

  • Go to the HathiTrust Digital Library and begin your search. (See: Tips for searching HathiTrust.)

  • On the results page, check the boxes next to the items you would like to add to your collection, and therefore to your workset, or choose "Select all on page." From the drop-down labeled "Select Collection," choose the collection to which you would like to add your selected volumes. You can also create a new collection at this point. (See: How to create a collection.)

  • Collections can be created temporarily as you browse, or you can log in to save your collection. Note that HathiTrust Digital Library credentials are separate from your HTRC Analytics account and are available only to users at HathiTrust partner institutions, although guest accounts can be created. (See: How to create a guest account.)
    • Note: Occasionally, volumes in your HathiTrust collection are not available via HTRC. Their metadata will still appear in your workset, so they will be included if and when they become available, but they will be excluded from the job when you run an algorithm on your workset.
  • Make sure you make your HathiTrust collection public. You will get an error if you try to import a private collection as a workset.
  • In a separate browser tab, go to HTRC Analytics and sign in. 

  • Click "Worksets" from the top menu on the home page.
  • Click the orange "Create A Workset" button toward the top right of the Worksets page.
  • Click the blue "Import from HathiTrust" button.

  • Return to your collection page on HathiTrust, and copy the URL. 

  • Go back to HTRC Analytics, and paste the URL into the field that says "HathiTrust Collection URL."
  • Click "Fetch collection."

  • A workset name will be suggested for you; you can edit it if desired.
  • Add information to the description field if not pre-populated. 
  • Click the checkbox to make your workset private, so that it is accessible to and viewable by you alone, or leave the box unchecked to make your workset public.
  • Click "Create workset."

You can import selected results directly from the HTRC Workset Builder. To do this, follow these steps:

  • Perform a unigram (single-term) text, metadata or combined text and metadata search in the Workset Builder, yielding a list of results (See: information on searching in Workset Builder).
  • From the page of results, choose volumes to keep in your shopping cart by checking the boxes next to each volume and pressing the yellow "Add" button, or by selecting volumes via checkbox and dragging and dropping them onto the shopping cart icon at the top right of the results page.

  • Once you have selected the desired volumes for your workset, click the shopping cart icon on the results page to view your selection. This page shows what is in your current selection and presents new options for interacting with, saving, and exporting your workset.
  • From the shopping cart page, you can export your workset as a list of volume IDs; as a federated metadata file in JSON, TSV, or CSV format; or as the JSON Extracted Features data for the volumes in your selection. You can also click the "Export as Workset" button to export your shopping cart directly to HTRC Analytics as a workset.

  • Once you click to export the workset, you'll be directed to HTRC Analytics and prompted to log in if you are not already logged in. Once logged in, you'll be taken directly to the import page, where you're asked to add a name and description for your workset and to decide whether to make it a public workset (shareable via URL and usable by others) or a private one (viewable only by your user account).

  • Alternatively, you could first navigate to the Create A Workset page and choose Import from HTRC Workset Builder.

  • Choosing this option brings up the same page as above, but with an empty "Selection ID" field into which you can paste a selection cart ID. The ID is shown in the bar to the left of the "Export Workset" button on the shopping cart view page.

You can create a workset to use with HTRC algorithms by uploading a list of HathiTrust volume IDs.

There are several ways to get a list of volume IDs, including downloading the metadata for one or more HathiTrust collections and curating a list locally, using the HathiTrust Bibliographic API, or using the HathiFiles.
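As an illustration of the HathiFiles route, the Python sketch below streams a HathiFiles extract and keeps the volume IDs whose rights code you choose (by default "pd", public domain, since HTRC algorithms work with public domain volumes). It assumes the headerless, tab-separated HathiFiles layout in which the first column is the volume ID (htid) and the third is the rights code; please confirm those positions against the HathiFiles field description. The function name, file name, and sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator


def ids_from_hathifile(lines: Iterable[str], rights_codes: Iterable[str] = ("pd",)) -> Iterator[str]:
    """Yield volume IDs whose rights code is in `rights_codes`.

    Assumes a headerless, tab-separated HathiFiles layout in which
    column 0 is the volume ID (htid) and column 2 is the rights code.
    """
    wanted = set(rights_codes)
    # QUOTE_NONE: treat quote characters inside titles as ordinary text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > 2 and row[2] in wanted:
            yield row[0]


if __name__ == "__main__":
    # Two made-up rows standing in for a real HathiFiles download.
    sample = io.StringIO(
        "mdp.39015000000001\tallow\tpd\tplaceholder\n"
        "mdp.39015000000002\tdeny\tic\tplaceholder\n"
    )
    print(list(ids_from_hathifile(sample)))  # ['mdp.39015000000001']

    # With a real file (the name is a placeholder), something like:
    # import gzip
    # with gzip.open("hathi_full_YYYYMMDD.txt.gz", "rt", encoding="utf-8") as f:
    #     pd_ids = list(ids_from_hathifile(f, rights_codes=("pd", "pdus")))
```

Because the function is a generator, it reads the (very large) HathiFiles one line at a time instead of loading the whole file into memory.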

Here are the steps you will need to follow:

  • Once you have a list of volume IDs, make sure it conforms to the file requirements. Your volume ID list must be in CSV, TSV, or TXT format, and the only required content is the volume IDs in the left-most column. Additional fields will be ignored, so while they can be present, they won't affect the upload or the metadata for your workset. The file should include a header row containing the text "volume" or "id".
  • Click "Worksets" from the top menu on the home page.

  • Click the orange "Create A Workset" button toward the top right of the Worksets page.
  • Click the "Upload File" button.

  • Give your workset a name and description.

  • Upload the file by clicking "Choose File."
  • Click the checkbox to make your workset private, so that it is accessible to and viewable by you alone, or leave the box unchecked to make your workset public.
  • Click "Create workset."
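The file requirements in the first step can be met with a few lines of Python. This sketch writes a one-column CSV whose header row reads "volume" and whose left-most column holds one volume ID per row, skipping blank entries and duplicates while keeping the original order. The function name, output file name, and example IDs are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO


def write_workset_upload(volume_ids: Iterable[str], out: TextIO) -> int:
    """Write a one-column CSV for the "Upload File" option.

    The header row reads "volume" and each following row holds one volume ID
    in the left-most column. Blank entries and repeats are skipped; the
    original order is kept. Returns the number of IDs written.
    """
    seen: set[str] = set()
    writer = csv.writer(out)
    writer.writerow(["volume"])
    for vid in volume_ids:
        vid = vid.strip()
        if vid and vid not in seen:
            seen.add(vid)
            writer.writerow([vid])
    return len(seen)


if __name__ == "__main__":
    # Hypothetical IDs; the second is a whitespace-padded duplicate.
    ids = ["mdp.39015000000001", " mdp.39015000000001 ", "", "uc2.ark:/13960/t00000000"]
    preview = io.StringIO()
    print(write_workset_upload(ids, preview))  # 2
    print(preview.getvalue())
    # To write a real file, open it with newline="" as the csv module recommends:
    # with open("my_workset.csv", "w", newline="", encoding="utf-8") as f:
    #     write_workset_upload(ids, f)
```

The resulting file can then be selected with "Choose File" on the upload page.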

Validate a workset

HathiTrust is a dynamic repository: it continues to grow, and, less frequently, items are removed or their access profiles change. To check whether the volumes in your workset are available for analysis with HTRC algorithms or in the HTRC Data Capsule environment, you can validate the workset.

Note: HTRC algorithms and HTRC Data Capsules can currently access a snapshot of public domain volumes from the HathiTrust Digital Library. HTRC is working to increase the frequency with which data is synced from HathiTrust. The most recent HTRC Extracted Features release represents a snapshot of 13.7 million volumes from HathiTrust, and HathiTrust+Bookworm can likewise visualize 13.7 million volumes from the Digital Library.

To validate a workset, start by clicking "Worksets" in the top menu. 

Then, click the button toward the top right that says, "Validate Workset."

You will be able to choose the workset you would like to validate, either one of your own or a public workset. 

You can also validate a list of HathiTrust volume IDs before you create a workset from them. As when you upload a file to create a workset, you can upload a CSV, TSV, or TXT file where the only required field is the list of volume IDs in the first column. 
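Before uploading a list for validation, it can help to catch formatting problems locally. The sketch below reads the first column of a CSV, TSV, or TXT list, skips a header row containing "volume" or "id", and reports blank rows, duplicates, and entries that lack the "namespace.identifier" shape HathiTrust volume IDs typically have (for example, "mdp.39015..."). That shape check is an assumed heuristic, not an official HathiTrust rule, and it cannot tell you whether a volume is accessible; only the HTRC validation step can. The names `precheck`, `PrecheckReport`, and `ID_SHAPE` are hypothetical.

```python
import csv
import re
from dataclasses import dataclass, field
from typing import Iterable

# Assumed shape of a HathiTrust volume ID: a lowercase alphanumeric namespace,
# a dot, then a non-blank local part (e.g. "mdp.39015..." or "uc2.ark:/13960/...").
# A heuristic for catching typos, not an official rule.
ID_SHAPE = re.compile(r"^[a-z0-9]+\.\S+$")


@dataclass
class PrecheckReport:
    ids: list[str] = field(default_factory=list)         # unique, well-formed IDs in order
    duplicates: list[str] = field(default_factory=list)  # repeats of an earlier ID
    malformed: list[str] = field(default_factory=list)   # entries failing ID_SHAPE
    blank_rows: int = 0


def precheck(lines: Iterable[str], delimiter: str = ",") -> PrecheckReport:
    """Inspect the first column of a volume list.

    Pass a tab as the delimiter for TSV files. This checks formatting only;
    it cannot tell whether a volume is accessible through HTRC.
    """
    report = PrecheckReport()
    seen: set[str] = set()
    for i, row in enumerate(csv.reader(lines, delimiter=delimiter)):
        first = row[0].strip() if row else ""
        lowered = first.lower()
        if i == 0 and not ID_SHAPE.match(first) and ("volume" in lowered or "id" in lowered):
            continue  # header row containing "volume" or "id"
        if not first:
            report.blank_rows += 1
        elif not ID_SHAPE.match(first):
            report.malformed.append(first)
        elif first in seen:
            report.duplicates.append(first)
        else:
            seen.add(first)
            report.ids.append(first)
    return report


if __name__ == "__main__":
    sample = ["volume", "mdp.39015000000001", "mdp.39015000000001", "", "not an id"]
    r = precheck(sample)
    print(len(r.ids), r.duplicates, r.malformed, r.blank_rows)
    # For a real file (hypothetical name):
    # with open("my_list.tsv", newline="", encoding="utf-8") as f:
    #     r = precheck(f, delimiter="\t")
```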

Validating a workset will show you how many of the volumes in your workset are currently accessible via HTRC algorithms or the HTRC Data Capsule environment. You can download either the volume IDs that are valid or those that are not. You could then upload the valid IDs as a new workset, if you wanted.
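If you download the valid IDs, a short script can rebuild an upload file from them while keeping any extra columns from your original list (which the upload ignores but you may want to retain). This sketch assumes the downloaded file has one volume ID per row in its first column, possibly under a "volume" or "id" header, and that your original list starts with a header row; both function names are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO


def first_column_ids(lines: Iterable[str]) -> set[str]:
    """Collect IDs from the first column, skipping blanks and a 'volume'/'id' header."""
    ids = set()
    for row in csv.reader(lines):
        if row and row[0].strip() and row[0].strip().lower() not in {"volume", "id"}:
            ids.add(row[0].strip())
    return ids


def keep_valid_rows(original: Iterable[str], valid_ids: set[str], out: TextIO) -> list[str]:
    """Copy the header and every row whose first-column ID is valid to `out`.

    Returns the IDs that were dropped, in their original order.
    """
    reader = csv.reader(original)
    writer = csv.writer(out)
    header = next(reader, None)  # assumes the original list has a header row
    if header:
        writer.writerow(header)
    dropped = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank rows
        if row[0].strip() in valid_ids:
            writer.writerow(row)
        else:
            dropped.append(row[0].strip())
    return dropped


if __name__ == "__main__":
    # Hypothetical data standing in for your original list and the downloaded valid IDs.
    original = ["volume,title", "a.1,First", "b.2,Second", "c.3,Third"]
    valid = first_column_ids(["volume", "a.1", "c.3"])
    buf = io.StringIO()
    print(keep_valid_rows(original, valid, buf))  # ['b.2']
    print(buf.getvalue())
```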

Download a workset

After you have created a workset, you can download it as a list of volume identifiers in comma-separated values (CSV) format. Because each workset is functionally a list of pointers to content in the HathiTrust Digital Library, the download does not include the full text of the volumes. If you are interested in receiving a dataset from HathiTrust to do research on your own machine, please refer to the directions for requesting a custom dataset. The volume identifiers in a workset are the same identifiers used elsewhere across HathiTrust.

From the HTRC Analytics home page, sign in and then navigate to Worksets.

Click on the name of the workset you would like to download and click the "Download" button.
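Because the downloaded IDs are the same identifiers used across HathiTrust, they can be turned back into links to the volumes. This sketch assumes the downloaded CSV has the volume ID in its first column and treats a first row without a dot as a header; it builds a HathiTrust Digital Library reader URL (https://babel.hathitrust.org/cgi/pt?id=...) and a persistent handle URL (https://hdl.handle.net/2027/...) for each ID. Please check these URL patterns against HathiTrust's own documentation; the function name is hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator
from urllib.parse import quote

BABEL = "https://babel.hathitrust.org/cgi/pt?id="
HANDLE = "https://hdl.handle.net/2027/"


def volume_links(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (volume ID, reader URL, handle URL) for each ID in the first column."""
    for i, row in enumerate(csv.reader(lines)):
        if not row or not row[0].strip():
            continue  # skip blank rows
        vid = row[0].strip()
        if i == 0 and "." not in vid:
            continue  # assumed header row, e.g. "volume"
        # Keep ':' and '/' as-is so IDs like "uc2.ark:/13960/..." stay readable.
        safe_id = quote(vid, safe=":/")
        yield vid, BABEL + safe_id, HANDLE + safe_id


if __name__ == "__main__":
    downloaded = io.StringIO("volume\nmdp.39015000000001\n")  # hypothetical download
    for vid, reader_url, handle_url in volume_links(downloaded):
        print(vid, reader_url, handle_url)
```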