...
Researchers have several options for creating a workset, including querying the HTRC Solr Proxy API (/wiki/spaces/INT/pages/43417814). Researchers who do not yet have a workset and who only want to work with public domain texts can create one in the HTRC Workset Builder (/wiki/spaces/INT/pages/43418520).
Download Format
Files
The HTRC Extracted Features files are formatted in JSON. For more information about the fields, see the documentation for each release.
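Before consulting the field documentation, it can help to look at the start of a downloaded file. The following is a minimal sketch, not an official HTRC tool: it assumes the file is a bzip2-compressed JSON file (the `.json.bz2` naming used for the feature data files) and that `bzip2` and a `head` supporting `-c` are installed. The function name `peek_ef` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of any HTRC tooling): print the first bytes
# of a compressed Extracted Features file without unpacking it on disk.
# $1: path to a .json.bz2 file
# $2: number of bytes to show (optional, defaults to 400)
peek_ef() {
    ef_file=$1
    max_bytes=${2:-400}
    # bzip2 -dc decompresses to standard output and leaves the .bz2 file untouched.
    bzip2 -dc "$ef_file" | head -c "$max_bytes"
}
```

For example, `peek_ef V1.json.bz2 200` would print the first 200 bytes of the decompressed JSON for volume V1.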
...
Code Block |
---|
rsync -azv data.analytics.hathitrust.org::pd-features/{{URL}} . |
Using the HTRC Portal Algorithm
...
Code Block |
---|
sh EF_Rsync.sh |
If your workset contains N volumes with HathiTrust volume IDs V1, V2, V3, ..., VN, then running the shell script as shown above transfers the feature data file for each volume to your computer’s hard disk via rsync: V1.json.bz2, V2.json.bz2, V3.json.bz2, ..., VN.json.bz2. See Filepaths above to learn more about the pairtree structure the Extracted Features files follow.
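After the transfer finishes, you may want to confirm that a file arrived for every volume in the workset. The sketch below is one possible check, not part of the HTRC Portal algorithm. It assumes you have a plain text file with one volume ID per line (a hypothetical input you would prepare yourself) and that each file is named exactly `<volume ID>.json.bz2`, as described above. Because the files sit inside nested pairtree directories, it searches the download directory recursively. The function name `check_ef_files` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report workset volumes whose feature file is missing.
# $1: text file with one HathiTrust volume ID per line (assumed format)
# $2: directory into which EF_Rsync.sh transferred the files
# Assumes each file is named "<volume ID>.json.bz2" and that IDs contain
# no shell glob characters such as * or [.
check_ef_files() {
    ids_file=$1
    download_dir=$2
    missing=0
    while IFS= read -r vol_id || [ -n "$vol_id" ]; do
        # Skip blank lines.
        [ -z "$vol_id" ] && continue
        # Search recursively, since files are stored in nested directories.
        if [ -z "$(find "$download_dir" -type f -name "${vol_id}.json.bz2" | head -n 1)" ]; then
            echo "missing: $vol_id"
            missing=$((missing + 1))
        fi
    done < "$ids_file"
    echo "$missing missing"
    # Return success only when every volume has a file.
    [ "$missing" -eq 0 ]
}
```

For example, `check_ef_files volume_ids.txt .` lists each ID without a matching file under the current directory, followed by a count, and returns a non-zero status if any are missing.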
...