Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

 HTRC Extracted Features files gathered through rsync will download in pairtree format. Pairtree format is a hierarchical filesystem developed by the University of California Curation Center that maps identifier strings (in this case the HathiTrust Volume ID) to directory paths two characters at a time. The filepaths keep the institutional short code (e.g. mpd, uc2) at the front of each HTID intact.

...


Downloading with Rsync

Converting HathiTrust Volume ID to Rsync URL (using HTRC Feature Reader)

...

Because the feature data files are compressed, you may need to uncompress them into JSON-formatted text files, depending on your need. The compression used is bzip2. If you are using the files with the HTRC Feature Reader, the library will deal with the compression automatically.

Downloading with Workset Builder 2.0

Extracted Features files can also be downloaded for search results in Workset Builder 2.0. First, enter search terms for a desired set of volumes. Once results are returned, apply any filtering for the results to remove any volumes for which you do not want Extracted Features files. Once your result set includes all of the volumes you'd like Extracted Features files for, click "JSON Extracted Features (ZIP)" under the "Export Search Results" heading at the top of the left sidebar:

...