Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Removed bogus links from rsync paths

Table of Contents
maxLevel3

...

Read the docs

Sample Files

A sample of 100 extracted feature files is available for download through your browser: sample-EF202003.zip

Also, thematic collections are available to download: DocSouth (82 volumes), EEBO (234 volumes), ECCO (412 volumes).

Filepaths

The dataset is stored in a directory specification created by HTRC called stubbytree. (Note: This is a change from previous versions of the Extracted Features dataset that used the pairtree directory specification.) Stubbytree places files in a directory structure based on the file name, with the highest level directory being the HathiTrust source code (i.e. volume ID prefix), and then using every third character of the cleaned volume ID, starting with the first, to create a sub-directory. For example the Extracted Features file for the volume with HathiTrust ID nyp.33433070251792 would be located at: 

...

The rsync module (or alias path) for Extracted Features 2.0 is data.analytics.hathitrust.org::features-2020.03/ . 

If you run the rsync command as written above, without specifying file paths, it will sync all files. Do not do this unless you are prepared to work with the full dataset, which is 4 TB. Make sure to include the final period (.) when running your command in order to sync the files to your current directory, or else provide the path to the local directory of your choosing where you would like the files to be synced to.

...

The Rsync module (or alias path) for Extracted Features 1.5 is data.analytics.hathitrust.org::features-2018.01/ . 

If you run the rsync command as written above, without specifying file paths, it will sync all files. Do not do this unless you are prepared to work with the full dataset, which is 4 TB. Make sure to include the final period (.) when running your command in order to sync the files to your current directory, or else provide the path to the local directory of your choosing where you would like the files to be synced to.

...

The Rsync module (or alias path) for Extracted Features .2 is data.analytics.hathitrust.org::features-2015.02

If you run the rsync command as written above, without specifying file paths, it will sync all files. Do not do this unless you are prepared to work with the full dataset, which is 1.2 TB. Make sure to include the final period (.) when running your command in order to sync the files to your current directory, or else provide the path to the local directory of your choosing where you would like the files to be synced to.

...