...

HathiTrust volumes are identified by unique HathiTrust IDs. These alphanumeric IDs track volumes across HathiTrust and HTRC systems. For example, this volume of Jane Austen's letters has the volume ID hvd.32044021076179. When viewing a volume in the Digital Library, the volume ID appears in the URL after "id=". The volume ID can be used to retrieve metadata via HathiTrust's Bibliographic API or to pull volume content via HathiTrust's Data API. The volume ID also often appears in the file and/or directory names of content pertaining to a specific volume, and it forms the (pairtree) directory structure for volumes accessed via HathiTrust dataset requests or the HTRC Extracted Features Dataset.
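To make these ID conventions concrete, here is a minimal Python sketch that pulls the volume ID out of a Digital Library URL and derives a pairtree-style directory path from it. It assumes URLs of the form https://babel.hathitrust.org/cgi/pt?id=..., applies the identifier-cleaning rules from the general Pairtree specification, and assumes a `<prefix>/pairtree_root/.../<cleaned id>` layout. The function names are hypothetical, and the exact layout of any particular dataset should be checked against that dataset's own documentation.

```python
from urllib.parse import parse_qs, urlparse

# Characters the Pairtree spec hex-encodes as ^hh before other substitutions.
_PAIRTREE_SPECIAL = set('"*+,<=>?\\^|')


def volume_id_from_url(url: str) -> str:
    """Return the value of the "id=" query parameter from a Digital Library URL."""
    return parse_qs(urlparse(url).query)["id"][0]


def pairtree_clean(identifier: str) -> str:
    """Clean an identifier per Pairtree: hex-encode, then map / : . to = + ,"""
    encoded = []
    for ch in identifier:
        if ch in _PAIRTREE_SPECIAL or not (0x21 <= ord(ch) <= 0x7E):
            # Encode each UTF-8 byte of the character as ^hh.
            encoded.append("".join(f"^{b:02x}" for b in ch.encode("utf-8")))
        else:
            encoded.append(ch)
    return "".join(encoded).translate(str.maketrans({"/": "=", ":": "+", ".": ","}))


def pairtree_path(volume_id: str) -> str:
    """Build <prefix>/pairtree_root/ab/cd/.../<cleaned id> (assumed layout)."""
    prefix, local_id = volume_id.split(".", 1)
    cleaned = pairtree_clean(local_id)
    # Split the cleaned identifier into two-character directory names.
    shorties = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([prefix, "pairtree_root", *shorties, cleaned])


if __name__ == "__main__":
    vid = volume_id_from_url("https://babel.hathitrust.org/cgi/pt?id=hvd.32044021076179")
    print(vid)
    print(pairtree_path(vid))
```

Run as a script, this prints hvd.32044021076179 followed by hvd/pairtree_root/32/04/40/21/07/61/79/32044021076179. Only the part after the first "." is pairtree-encoded; the prefix stays as the top-level directory.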

HathiTrust volume IDs begin with a prefix code that identifies the library of origin (i.e., the holding library) of the digitized item. For example, all volume IDs that begin with uiug relate to objects held by the University of Illinois.
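Because the prefix is the part of the ID before the first ".", it is easy to extract and tally, for instance to see which holding libraries dominate a set of volumes. The Python sketch below does this; the `KNOWN_PREFIXES` lookup is a deliberately incomplete, hypothetical table containing only the two prefixes mentioned on this page, and the helper names are illustrative.

```python
from collections import Counter
from typing import Iterable

# Hypothetical, deliberately incomplete prefix lookup; extend it as needed.
KNOWN_PREFIXES = {
    "uiug": "University of Illinois",
    "hvd": "Harvard University",
}


def library_prefix(volume_id: str) -> str:
    """Return the holding-library prefix (the text before the first ".")."""
    return volume_id.split(".", 1)[0]


def holding_library(volume_id: str) -> str:
    """Look up a readable name for the prefix, if it is in the table."""
    return KNOWN_PREFIXES.get(library_prefix(volume_id), "unknown prefix")


def count_by_prefix(volume_ids: Iterable[str]) -> Counter:
    """Count how many volume IDs come from each holding library."""
    return Counter(library_prefix(v) for v in volume_ids)


if __name__ == "__main__":
    print(holding_library("hvd.32044021076179"))
```

Splitting on the first "." only (rather than every ".") keeps IDs whose local part contains further punctuation intact.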

...

While metadata for volumes in HathiTrust exists in a variety of formats and serves a number of intended use cases, it generally begins as MARC metadata, the standard for library cataloging. When analyzing HathiTrust metadata, it is often helpful to consult the MARC specifications, for example to determine what certain codes mean or what a given data structure implies. HathiTrust also publishes specifications for its metadata records. These are quite useful because some fields, particularly MARC field 975, are used in HathiTrust-specific ways and contain useful metadata about volumes: https://www.hathitrust.org/bib_specifications
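When a record is available as MARC-XML (the XML serialization of MARC 21, namespace http://www.loc.gov/MARC21/slim), a small helper can pull out the subfields of any tag so they can be interpreted against the MARC and HathiTrust specifications. The Python sketch below assumes you already have the MARC-XML as a string; it makes no assumption about which subfields a HathiTrust-specific field carries, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def datafields(marc_xml: str, tag: str) -> list[dict[str, list[str]]]:
    """Return one {subfield code: [values]} dict per datafield with the given tag."""
    root = ET.fromstring(marc_xml)
    fields = []
    # iter() works whether the root is a <record> or a <collection> of records.
    for df in root.iter(f"{{{MARC_NS}}}datafield"):
        if df.get("tag") != tag:
            continue
        subfields: dict[str, list[str]] = {}
        for sf in df.findall(f"{{{MARC_NS}}}subfield"):
            # Subfield codes can repeat, so collect values in a list.
            subfields.setdefault(sf.get("code"), []).append(sf.text or "")
        fields.append(subfields)
    return fields
```

Returning a list of dicts, rather than a single dict, preserves repeated fields, which is common for item-level data where one bibliographic record describes several volumes.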

While HathiTrust does not facilitate bulk download of full metadata records at this time, metadata is available in various formats and through several services, each of which can be useful depending on the use case:

  • Hathifiles: tab-delimited files of reduced bibliographic metadata extracted from MARC records. Files covering incremental additions to HathiTrust are released daily, and on the first of each month a file listing every volume currently in HathiTrust is released.

...

  • HathiTrust Bibliographic API: for retrieving JSON-formatted MARC metadata via HathiTrust ID, HathiTrust record number, or OCLC number, for up to 20 identifiers at a time.
  • HTRC Extracted Features: volume-level JSON files include limited bibliographic metadata in addition to page-level metadata and features.
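For the Hathifiles, a streaming reader avoids loading a full monthly file into memory. The Python sketch below assumes the file is gzip-compressed and has no header row, so the column names are passed in by the caller (for instance, taken from HathiTrust's published field list) rather than hard-coded; it also assumes the HathiTrust ID column is named `htid`. Function names are hypothetical.

```python
import csv
import gzip
from collections import Counter
from typing import Iterable, Iterator


def read_hathifile(path: str, field_names: list[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per row of a gzip-compressed, tab-delimited Hathifile."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        # QUOTE_NONE: treat quote characters in titles as ordinary text.
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield dict(zip(field_names, row))


def volumes_per_prefix(rows: Iterable[dict[str, str]]) -> Counter:
    """Tally rows by the holding-library prefix of their (assumed) htid column."""
    return Counter(row["htid"].split(".", 1)[0] for row in rows)
```

Because `read_hathifile` is a generator, it can be chained with filters (for example, keeping only rows from one prefix) without ever materializing the whole file.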
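Because the Bibliographic API accepts at most 20 identifiers per request, larger lists need to be split into batches. The Python sketch below shows batching plus URL construction and response parsing. The base URL, the single-ID and multi-ID URL patterns, and the `items`/`htid` keys in the response are assumptions to verify against HathiTrust's Bibliographic API documentation before use; the helper names are hypothetical.

```python
import json
from typing import Iterator
from urllib.request import urlopen

# Assumed endpoint; confirm against the Bibliographic API documentation.
API_BASE = "https://catalog.hathitrust.org/api/volumes"
MAX_IDS_PER_REQUEST = 20  # limit stated in the HathiTrust description above


def batches(identifiers: list[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[list[str]]:
    """Split a list of identifiers into chunks of at most `size`."""
    for i in range(0, len(identifiers), size):
        yield identifiers[i:i + size]


def single_id_url(id_type: str, identifier: str, level: str = "brief") -> str:
    """Assumed pattern: <base>/<level>/<id type>/<id>.json (identifier inserted as-is)."""
    return f"{API_BASE}/{level}/{id_type}/{identifier}.json"


def multi_id_url(pairs: list[tuple[str, str]], level: str = "brief") -> str:
    """Assumed pattern: <base>/<level>/json/<type>:<id>|<type>:<id>|..."""
    return f"{API_BASE}/{level}/json/" + "|".join(f"{t}:{i}" for t, i in pairs)


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url) as resp:
        return json.load(resp)


def htids_in_response(response: dict) -> list[str]:
    """Collect HathiTrust IDs from an assumed top-level "items" list."""
    return [item["htid"] for item in response.get("items", []) if "htid" in item]
```

A typical loop would iterate over `batches(...)`, build one `multi_id_url` per batch, and call `fetch_json` on each, pausing between requests to stay polite to the service.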

Additionally, these tables of MARC coverage can help clarify the nature of the collection.

...