...

A: Within the HTRC Analytics platform, only in the HTRC Data Capsule environment. HTRC Algorithms operate only on "worksets," which are user-created collections of content from the HathiTrust Digital Library. You can, however, import outside data into your Capsule while it is in Maintenance mode and work with it there. If you prefer to work only on your local desktop, you can also use HTRC Extracted Features alongside your own data.
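
If you work with HTRC Extracted Features on your own machine, a few lines of Python are enough to aggregate the per-page token counts. The sketch below assumes the EF JSON layout in which each page's `body` section carries a `tokenPosCount` mapping from each token to its part-of-speech counts; check this against the schema documentation for the EF release you download. The function names are hypothetical, and only the Python standard library is used.

```python
"""Sum token counts across pages of an HTRC Extracted Features file.

Assumption: features -> pages -> [page] -> body -> tokenPosCount maps each
token to a {part_of_speech: count} dict. Verify against your EF release.
"""
import bz2
import json
from collections import Counter


def load_ef(path):
    """Load one EF JSON file, transparently handling .bz2 compression."""
    opener = bz2.open if str(path).endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def body_token_counts(ef, lowercase=True):
    """Aggregate body-section token counts over all pages, ignoring POS tags."""
    totals = Counter()
    pages = (ef.get("features") or {}).get("pages") or []
    for page in pages:
        # A page's body section may be missing or null; treat it as empty.
        token_pos = (page.get("body") or {}).get("tokenPosCount") or {}
        for token, pos_counts in token_pos.items():
            key = token.lower() if lowercase else token
            totals[key] += sum(pos_counts.values())
    return totals
```

For example, `body_token_counts(load_ef(path)).most_common(20)` would list the twenty most frequent body tokens in one volume's file. Collapsing the part-of-speech tags keeps the result a plain word-frequency table, which is usually what a first exploratory pass needs.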

...

  • First, store your Python scripts somewhere you can download them from over the Internet, such as a web server or a public code repository.
  • Start your Capsule from the HTRC Analytics interface, and make sure it is in Maintenance mode.
  • Connect to your Capsule through the Terminal viewer or the Remote Desktop viewer.
  • Download the Python scripts from the Internet onto your Capsule.
  • Switch the Capsule to Secure mode.
  • If you already know the IDs of the volumes you are interested in, fetch their content with the sample Python script in Fetching Volume OCR Content in HTRC Data Capsule (Secure mode).
  • Run your Python scripts against the content.
  • If you don't know the volume IDs yet, search the HathiTrust Digital Library by subject, topic, author, year, or other fields, and save the volumes you choose as a collection in HathiTrust. From there, you can either use the HTRC Workset Toolkit to load the collection's volumes into your Capsule, or download the collection's metadata to get the IDs of the volumes you selected.
  • Once you have the volume IDs, fetch the volume content in Secure mode and analyze it with your Python scripts, as described above.
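
Because Secure mode restricts the Capsule's network access, the scripts in the steps above have to be downloaded while the Capsule is still in Maintenance mode. Below is a minimal sketch of that download step using only the Python standard library; `download_scripts` is a hypothetical helper, and the URLs you pass it are wherever you stored your own scripts.

```python
"""Download analysis scripts into the Capsule while it is in Maintenance mode.

Sketch only: download_scripts is a hypothetical helper, and the URLs and
destination directory are placeholders for your own locations.
"""
import os
import urllib.parse
import urllib.request


def download_scripts(urls, dest_dir):
    """Fetch each URL into dest_dir, keeping the file name from the URL path."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = urllib.parse.urlparse(url).path
        name = os.path.basename(urllib.parse.unquote(path))
        if not name:
            # e.g. "https://example.org/" has no file name to save under
            raise ValueError(f"cannot infer a file name from {url!r}")
        target = os.path.join(dest_dir, name)
        with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
            out.write(resp.read())
        saved.append(target)
    return saved
```

A call such as `download_scripts(["https://example.org/me/analyze.py"], os.path.expanduser("~/scripts"))` (hypothetical URL) would leave `analyze.py` in `~/scripts`, ready to run after you switch to Secure mode.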
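
The Secure-mode steps above (getting volume IDs from a saved collection, then running your own scripts on the fetched content) can be sketched with two small helpers. Both are hypothetical and rest on assumptions that HTRC and HathiTrust do not confirm here: that the collection metadata export is a tab-separated file with a volume ID column named `htid` or `htitem_id`, and that the fetched OCR content sits in one directory per volume containing `.txt` page files. This is not the sample fetch script mentioned above; it only illustrates what happens before and after it.

```python
"""Hypothetical helpers for the Secure-mode part of the workflow.

Assumptions, not confirmed by HTRC or HathiTrust documentation:
  * the collection metadata export is a tab-separated file whose header has
    a volume ID column named "htid" or "htitem_id";
  * fetched OCR content sits in one directory per volume, each holding
    plain-text page files whose names end in ".txt".
"""
import csv
import os
import re
from collections import Counter

ID_COLUMNS = ("htid", "htitem_id")  # assumed header names; adjust to your file
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters; digits and "_" excluded


def read_volume_ids(metadata_path, id_columns=ID_COLUMNS):
    """Return the unique volume IDs in a TSV metadata export, in file order."""
    with open(metadata_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        header = reader.fieldnames or []
        column = next((c for c in id_columns if c in header), None)
        if column is None:
            raise KeyError(f"no volume ID column among {id_columns} in {header}")
        seen, ids = set(), []
        for row in reader:
            vid = (row.get(column) or "").strip()
            if vid and vid not in seen:
                seen.add(vid)
                ids.append(vid)
    return ids


def word_counts_by_volume(content_dir):
    """Map each volume directory name to a Counter of lowercased words."""
    with os.scandir(content_dir) as it:
        volume_dirs = sorted((e for e in it if e.is_dir()), key=lambda e: e.name)
    results = {}
    for entry in volume_dirs:
        counts = Counter()
        for name in sorted(os.listdir(entry.path)):
            if not name.endswith(".txt"):
                continue  # skip anything that is not a page text file
            page_path = os.path.join(entry.path, name)
            with open(page_path, encoding="utf-8", errors="replace") as fh:
                counts.update(w.lower() for w in WORD_RE.findall(fh.read()))
        results[entry.name] = counts
    return results
```

For instance, `read_volume_ids("collection.tsv")` would produce the list of IDs to hand to the fetch script, and `word_counts_by_volume` stands in for whatever analysis your own scripts perform on the directory the fetch step writes to.
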

...