Excerpt |
---|
See the different ways you can download EF files to your local machine or inside a data capsule. |
Table of Contents | ||
---|---|---|
|
...
Sample Files
A sample of 100 extracted feature files is available for download through your browser: sample-EF202003.zip
Also, thematic collections are available to download: DocSouth_sample_EF202003.zip (82 volumes), EEBO_sample_EF202003.zip (234 volumes), ECCO_sample_EF202003.zip (412 volumes).
Filepaths
The dataset is stored in a directory specification created by HTRC called stubbytree. (Note: This is a change from previous versions of the Extracted Features dataset that used the pairtree directory specification.) Stubbytree places files in a directory structure based on the file name, with the highest level directory being the HathiTrust source code (i.e. volume ID prefix), and then using every third character of the cleaned volume ID, starting with the first, to create a sub-directory. For example the Extracted Features file for the volume with HathiTrust ID nyp.33433070251792 would be located at:
...
A sample of 100 extracted feature files is available for download through your browser: sample-EF201801.zip.
Also, thematic collections are available to download: DocSouth_sample_EF201801.zip (87 volumes),EEBO_sample_EF201801.zip (355 volumes), ECCO ECCO_sample_EF201801.zip (505 volumes).
Filepaths
The data is stored in a pairtree directory structure, allowing you to infer the location of any file based on its HathiTrust volume identifier. Pairtree format is an efficient directory structure, which is important for HathiTrust-scale data, where the files are placed in directories based on “pairs” of characters in their file names. For example the Extracted Features file for the volume with HathiTrust ID mdp.39015073767769 would be located at:
...
A sample of 100 extracted feature files is available for download through your browser:sample-betaEF201801.zip.
Also, thematic collections are available to download: DocSouth_sample_EF201801.zip (87 volumes), EEBO_sample_EF201801.zip (355 volumes), ECCO_sample_EF201801.zip (505 volumes).
Filepaths
The data is stored in a pairtree directory structure, allowing you to infer the location of any file based on its HathiTrust volume identifier. Pairtree format is an efficient directory structure, which is important for HathiTrust-scale data, where the files are placed in directories based on “pairs” of characters in their file names. For example the Extracted Features file for the volume with HathiTrust ID mdp.39015073767769 would be located at:
...