Finding Extracted Features data for a known volume ID
The file path used to sync Extracted Features files through RSync follows the pairtree format, with the institutional shortcode (e.g. mdp, uc2) kept intact as the first path component.
Converting ID to RSync URL (Python with HTRC Feature Reader library)
If you are using the HTRC Feature Reader library, there is a convenience function, htrc_features.utils.id_to_rsync(htid):
>>> from htrc_features import utils
>>> utils.id_to_rsync('miun.adx6300.0001.001')
'miun/pairtree_root/ad/x6/30/0,/00/01/,0/01/adx6300,0001,001/miun.adx6300,0001,001.json.bz2'
Converting ID to RSync URL (Python)
This example is a simplified part of a longer notebook, which further describes how to collect and download large lists of volumes: ID to EF Rsync Link.ipynb.
If you don't have the pairtree library, you can install it with: pip install pairtree (Python 2.x only).
from pairtree import id2path, id_encode

def id_to_rsync(htid):
    '''
    Take an HTRC id and convert it to an Rsync location for syncing
    Extracted Features
    '''
    # Split the institutional shortcode from the volume part of the id
    libid, volid = htid.split('.', 1)
    # Pairtree-encode the volume part (e.g. '.' becomes ',')
    volid_clean = id_encode(volid)
    filename = '.'.join([libid, volid_clean, 'json.bz2'])
    # id2path may return backslashes on Windows, so normalize to '/'
    path = '/'.join([libid, 'pairtree_root',
                     id2path(volid).replace('\\', '/'),
                     volid_clean, filename])
    return path
Example:
id_to_rsync('miun.adx6300.0001.001')
'miun/pairtree_root/ad/x6/30/0,/00/01/,0/01/adx6300,0001,001/miun.adx6300,0001,001.json.bz2'
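Since the pairtree library is noted above as Python 2.x only, the same conversion can be sketched with the standard library alone. This is a minimal illustration, not part of any HTRC library: the names pairtree_encode and id_to_rsync_py3 are hypothetical, and the character rules are assumed from the Pairtree specification (hex-escape a small set of special characters, then map '/' to '=', ':' to '+', and '.' to ',').

```python
# Characters the Pairtree spec hex-escapes as ^xx (assumption: taken from the spec)
_HEX_ESCAPE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-escaping
_CHAR_MAP = {'/': '=', ':': '+', '.': ','}


def pairtree_encode(identifier):
    """Hypothetical helper: pairtree-encode an identifier string."""
    out = []
    for byte in identifier.encode('utf-8'):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7e or ch in _HEX_ESCAPE:
            out.append('^%02x' % byte)
        else:
            out.append(_CHAR_MAP.get(ch, ch))
    return ''.join(out)


def id_to_rsync_py3(htid):
    """Hypothetical Python 3 sketch of the ID-to-RSync-path conversion."""
    # Split the institutional shortcode from the volume part of the id
    libid, volid = htid.split('.', 1)
    volid_clean = pairtree_encode(volid)
    # Pairtree directories are successive two-character slices of the encoded id
    segments = [volid_clean[i:i + 2] for i in range(0, len(volid_clean), 2)]
    filename = '.'.join([libid, volid_clean, 'json.bz2'])
    return '/'.join([libid, 'pairtree_root'] + segments + [volid_clean, filename])
```

For the example ID used on this page, id_to_rsync_py3('miun.adx6300.0001.001') yields the same path as shown above. It is worth checking the sketch against the Feature Reader function before relying on it for IDs with unusual characters.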
The Extracted Features file for this volume can then be downloaded using RSync, replacing {{URL}} with the path returned above:
rsync -azv data.analytics.hathitrust.org::features/{{URL}} .
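For more than a handful of volumes, one option is to write the converted paths to a text file and pass it to rsync's --files-from option, so that a single rsync call fetches them all. The sketch below only builds the file contents and the command. The helper names and the paths.txt filename are hypothetical, and the server address is the one from the command above.

```python
# Rsync module from the command above; the trailing slash makes it the transfer root
RSYNC_SOURCE = 'data.analytics.hathitrust.org::features/'


def format_file_list(paths):
    """Hypothetical helper: one rsync-relative path per line."""
    return ''.join(path + '\n' for path in paths)


def build_rsync_command(list_file, dest='.'):
    """Hypothetical helper: argument list for a single rsync call over many files."""
    return ['rsync', '-azv', '--files-from=' + list_file, RSYNC_SOURCE, dest]


# Example usage (not run here; it needs rsync installed and network access):
#   import subprocess
#   with open('paths.txt', 'w') as f:
#       f.write(format_file_list(list_of_paths))
#   subprocess.run(build_rsync_command('paths.txt'), check=True)
```

With --files-from, rsync recreates each listed path under the destination directory, so the downloaded files keep their pairtree layout.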