
...

Researchers have several options for creating their workset, including querying the HTRC Solr Proxy API (/wiki/spaces/INT/pages/43417814). Researchers who do not yet have a workset and who only want to work with public domain texts can create one in the HTRC Workset Builder (/wiki/spaces/INT/pages/43418520).

Download Format

Files

The HTRC Extracted Features files are formatted in JSON and distributed as bzip2-compressed files (.json.bz2). For more information about the fields, see the documentation for each release.
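Each downloaded file is bzip2-compressed JSON (hence the .json.bz2 file names), so you can inspect one without unpacking it on disk. The sketch below assumes the standard bzip2 command-line tool is installed; `ef_cat` is a hypothetical helper name, not part of any HTRC tooling.

```bash
# Hypothetical helper: print the decompressed JSON of one Extracted Features file.
# -d decompresses and -c writes to stdout, so the .json.bz2 file is left untouched.
ef_cat() {
  bzip2 -dc "$1"
}

# Example (hypothetical file name): show the first 200 bytes of the JSON.
# ef_cat V1.json.bz2 | head -c 200
```

Piping the output into a JSON pretty-printer such as `python3 -m json.tool` makes the nested structure easier to read.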

...

```bash
rsync -azv data.analytics.hathitrust.org::pd-features/{{URL}} .
```
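To fetch many files in one rsync session instead of one call per file, rsync's `--files-from` option reads a list of paths relative to the source. The sketch below assumes a hypothetical list file with one Extracted Features path per line, each of the same form as the path that replaces {{URL}} in the command above; `ef_fetch` is a hypothetical helper name. With `--files-from`, rsync implies `--relative`, so each file keeps its directory path under the destination, and `-a` no longer implies `-r`, which is fine here because the list names individual files.

```bash
# Hypothetical helper: download every path listed in LIST with a single rsync call.
# Usage: ef_fetch LIST [SOURCE] [DEST]
#   LIST    file with one path per line, relative to SOURCE
#   SOURCE  defaults to the pd-features rsync module used above
#   DEST    defaults to the current directory
ef_fetch() {
  list="$1"
  src="${2:-data.analytics.hathitrust.org::pd-features/}"
  dest="${3:-.}"
  # --files-from implies --relative, preserving each file's directory path under DEST.
  rsync -azv --files-from="$list" "$src" "$dest"
}

# Example (hypothetical list file): ef_fetch paths.txt
```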


Using the HTRC Portal Algorithm

...

```bash
sh EF_Rsync.sh
```

If your workset contains N volumes with HathiTrust volume IDs V1, V2, V3, ..., VN, running the shell script as shown above uses rsync to transfer the corresponding feature data files to your computer: V1.json.bz2, V2.json.bz2, V3.json.bz2, ..., VN.json.bz2. See Filepaths above to learn more about the pairtree structure the Extracted Features files follow.
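After a large transfer it can be worth confirming that every downloaded file decompresses cleanly. The sketch below walks a directory tree (for example, the one the script above populates) and tests each .json.bz2 file with `bzip2 -t`, which checks integrity without writing any output file. `ef_verify` is a hypothetical helper name; it prints the path of each file that fails and returns a non-zero status if any file fails.

```bash
# Hypothetical helper: report .json.bz2 files under DIR that fail bzip2's integrity test.
# Usage: ef_verify [DIR]   (DIR defaults to the current directory)
ef_verify() {
  dir="${1:-.}"
  # Test files in batches; print the path of any file bzip2 cannot decompress.
  bad=$(find "$dir" -type f -name '*.json.bz2' -exec sh -c '
    for f do
      bzip2 -tq "$f" 2>/dev/null || printf "%s\n" "$f"
    done' sh {} +)
  [ -z "$bad" ] && return 0
  printf '%s\n' "$bad"
  return 1
}

# Example: ef_verify . || echo "some files need to be downloaded again"
```

Because rsync only transfers files that differ from the source, re-running the same download command is usually enough to replace any file this check reports.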

...