Comment: pd-features > features
Info

This documentation has been updated for the newest format of URLs for the Extracted Features dataset, intended for release in August 2016. This format no longer has basic and advanced features described in separate files. If you are looking for information on the earlier format, see version 12 of this page.

The file path used to sync Extracted Features files through rsync follows the pairtree format, keeping the institutional shortcode intact (e.g. mdp, uc2).
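To make the pairtree layout concrete, here is a minimal standard-library sketch of the identifier cleaning and two-character splitting as the Pairtree specification describes them. The function names `pairtree_encode` and `pairtree_segments` are hypothetical and used only for illustration; they are not part of the HTRC Feature Reader or the pairtree library.

```python
import re

# Characters that Pairtree hex-encodes as ^hh: anything outside visible
# ASCII, plus a small set of reserved punctuation.
_HEX_CHARS = re.compile(r'[^\x21-\x7e]|["*+,<=>?\\^|]')

def pairtree_encode(identifier):
    """Clean an identifier following the Pairtree character rules (illustrative)."""
    def hexify(match):
        return ''.join('^%02x' % byte for byte in match.group(0).encode('utf-8'))
    cleaned = _HEX_CHARS.sub(hexify, identifier)
    # Single-character substitutions: '/' -> '=', ':' -> '+', '.' -> ','
    return cleaned.translate(str.maketrans('/:.', '=+,'))

def pairtree_segments(identifier):
    """Split the cleaned identifier into two-character directory names."""
    cleaned = pairtree_encode(identifier)
    return [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]

# Volume part of the id 'miun.adx6300.0001.001' (shortcode 'miun' is kept as-is)
volume_id = 'adx6300.0001.001'
print(pairtree_encode(volume_id))
print('/'.join(pairtree_segments(volume_id)))
```

Only the part of the id after the first period is encoded; the institutional shortcode before it becomes the top-level directory unchanged.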

...

If you are using the HTRC Feature Reader library, there is a convenience function, htrc_features.utils.id_to_rsync(htid):

Code Block
languagepy
>>> from htrc_features import utils
>>> utils.id_to_rsync('miun.adx6300.0001.001')
'miun/pairtree_root/ad/x6/30/0,/00/01/,0/01/adx6300,0001,001/miun.adx6300,0001,001.json.bz2'

...

Code Block
languagepy
from pairtree import id2path, id_encode

def id_to_rsync(htid):
    '''
    Take an HTRC id and convert it to an Rsync location for syncing Extracted Features
    '''
    # Split the institutional shortcode from the volume id at the first period
    libid, volid = htid.split('.', 1)
    # Pairtree-clean the volume id (e.g. '.' becomes ',')
    volid_clean = id_encode(volid)
    filename = '.'.join([libid, volid_clean, 'json.bz2'])
    # id2path may return backslashes on Windows; rsync paths always use '/'
    path = '/'.join([libid, 'pairtree_root', id2path(volid).replace('\\', '/'), volid_clean, filename])
    return path

...

Code Block
languagebash
rsync -azv data.analytics.hathitrust.org::features/{{URL}} .
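When syncing many volumes, it can be easier to write the paths to a file and pass it to rsync's `--files-from` option rather than run one command per volume. The following is a minimal sketch of that approach, not an official workflow: the helper names `write_file_list` and `build_rsync_command` are hypothetical, and `REMOTE` is an assumption that should be replaced with the host and module prefix from the rsync command above. The sketch only builds the command; it does not execute it.

```python
import os
import tempfile

def write_file_list(paths, list_path):
    """Write one rsync-relative path per line, for use with --files-from."""
    with open(list_path, 'w', encoding='utf-8') as fh:
        for path in paths:
            fh.write(path + '\n')

def build_rsync_command(remote, list_path, dest='.'):
    """Return the rsync invocation as an argument list (nothing is executed)."""
    return ['rsync', '-azv', '--files-from=' + list_path, remote, dest]

# Path taken from the id_to_rsync example on this page
paths = ['miun/pairtree_root/ad/x6/30/0,/00/01/,0/01/adx6300,0001,001/miun.adx6300,0001,001.json.bz2']
list_path = os.path.join(tempfile.mkdtemp(), 'ef-paths.txt')
write_file_list(paths, list_path)

# Assumption: REMOTE is the host::module/ prefix used in the rsync command above
REMOTE = 'data.analytics.hathitrust.org::features/'
command = build_rsync_command(REMOTE, list_path)
print(' '.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the remote prefix has been confirmed.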