HTRC Data Capsule Specifications and Usage Guide


This guide describes the specifications of each Data Capsule environment: the types of Capsules, the modes they run in, and the preloaded libraries, data, and tools included in each Capsule.

 

 

HTRC Data Capsules are secure computing environments developed to facilitate non-consumptive text analysis research. Each Capsule is a virtual machine (VM) that gives researchers a desktop for investigating volumes in the HathiTrust Digital Library.

 



Kinds of Capsules: Demo, Research, and Customized Research

During creation, choose between a Demo Capsule (for testing and experimenting with the interface), a Research Capsule (for conducting research), or a Customized Research Capsule (ideal for classroom or workshop use). 

Demo
  • Capsule comes pre-loaded with sample volumes from the HathiTrust Digital Library

  • No options for Capsule size or specs

  • Access to public domain corpus only

  • Results cannot be submitted for review and release

  • No additional information required to create

  • Cannot be shared with collaborators

  • Expires after 30 days

Research
  • Option for Capsule to come pre-loaded with sample volumes from the HathiTrust Digital Library

  • User can set the Capsule size (see 'Configuration options for Research Capsules' below)

  • By default, access to public domain corpus only

  • Creating any Research Capsule requires additional information, which is used to aid in results export requests. Only requests to create, or convert to, a Capsule with full corpus access are subject to additional screening.

  • Can be shared with up to 5 collaborators

  • Expires 18 months from your last log-in date

  • Members-only Benefit: full corpus access for the Data Capsule service. Existing Data Capsule users from HathiTrust member institutions or new Data Capsule requesters from member institutions have the exclusive option to select “Full Corpus Access,” which includes copyrighted items.

  • Configuration options for Research Capsules:

    • Virtual Machine CPUs (VCPUs): between 1 and 10 virtual processors per Capsule

    • Memory: between 1 GB and 20 GB

    • Secure volume size: between 10 and 100 GB

Customized Research
  • User can set the Capsule size (see 'Configuration options for Customized Research Capsules' below) 

  • By default, access to public domain corpus only

  • Creating any Customized Research Capsule requires additional information, which is used to aid in results export requests. Only requests to create, or convert to, a Capsule with full corpus access are subject to additional screening.

  • Can be shared with up to 5 collaborators

  • Expires 18 months from your last log-in date

  • Members-only Benefit: full corpus access for the Data Capsule service. Existing Data Capsule users from HathiTrust member institutions or new Data Capsule requesters from member institutions have the exclusive option to select “Full Corpus Access,” which includes copyrighted items.

  • Configuration options for Customized Research Capsules:

    • Virtual Machine CPUs (VCPUs): between 1 and 10 virtual processors per Capsule

    • Memory: between 1 GB and 20 GB

    • Secure volume size: between 10 and 100 GB
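
The configuration ranges listed above for Research and Customized Research Capsules can be checked before filling in the creation form. The following is a minimal sketch only: `CapsuleRequest` and `validation_errors` are hypothetical names introduced for illustration and are not part of any HTRC API. The only values taken from this guide are the three numeric ranges.

```python
from dataclasses import dataclass

# Ranges quoted in this guide for Research and Customized Research Capsules.
VCPU_RANGE = (1, 10)
MEMORY_GB_RANGE = (1, 20)
SECURE_VOLUME_GB_RANGE = (10, 100)


@dataclass(frozen=True)
class CapsuleRequest:
    """Hypothetical container for the three size settings; not an HTRC type."""
    vcpus: int
    memory_gb: int
    secure_volume_gb: int


def validation_errors(request: CapsuleRequest) -> list[str]:
    """Return one message for each setting outside its documented range."""
    checks = [
        ("vcpus", request.vcpus, VCPU_RANGE),
        ("memory_gb", request.memory_gb, MEMORY_GB_RANGE),
        ("secure_volume_gb", request.secure_volume_gb, SECURE_VOLUME_GB_RANGE),
    ]
    errors = []
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low}-{high}")
    return errors
```

For example, `validation_errors(CapsuleRequest(vcpus=4, memory_gb=16, secure_volume_gb=50))` returns an empty list, while a request for 12 VCPUs or a 5 GB secure volume would be reported as out of range.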

 Watch videos about creating Demo and Research Capsules here.

Capsule Access and Interaction Modes

Maintenance vs. Secure mode

The Capsules are configured with special security settings that allow you to interact with them in two modes: Maintenance mode and Secure mode.

  • In Maintenance mode, you are allowed to access the network freely and install whatever software you want. 

  • In Secure mode, general network access is restricted, but you can access the HTRC corpus repository, which is otherwise blocked. Any changes you make to the Capsule in Secure mode will not persist. To save data from your analysis, you'll need to save your results in the Secure Volume storage on your Capsule. This storage option is not visible in Maintenance mode.  
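
The difference between the two modes can be summarized as a small lookup table. This is an illustrative sketch, not how the Capsule enforces its settings: the capability names (`general_network`, `corpus_repository`, `secure_volume_visible`) and the helper `modes_allowing` are hypothetical, and the True/False values simply restate the two bullets above.

```python
from enum import Enum


class Mode(Enum):
    MAINTENANCE = "maintenance"
    SECURE = "secure"


# Hypothetical capability names; False for "general_network" in Secure mode
# stands for "restricted", as described in this guide.
CAPABILITIES = {
    Mode.MAINTENANCE: {
        "general_network": True,
        "corpus_repository": False,
        "secure_volume_visible": False,
    },
    Mode.SECURE: {
        "general_network": False,
        "corpus_repository": True,
        "secure_volume_visible": True,
    },
}


def modes_allowing(capability: str) -> list[Mode]:
    """List the modes in which the given capability is available."""
    return [mode for mode, caps in CAPABILITIES.items() if caps[capability]]
```

Read this way, a typical workflow is to install software in Maintenance mode, then switch to Secure mode to reach the corpus and write results to the Secure Volume.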

 Watch a video on switching modes in a Data Capsule here.

Ways of accessing Capsules

Access your Capsule in-browser from HTRC Analytics by connecting via Remote Desktop (both Maintenance and Secure modes available) or the Terminal command line interface (Maintenance mode only).

You can also SSH into your Capsule, in Maintenance mode only, if you have followed the directions under "Advanced Features" to set up a public key.

For a detailed explanation of how to create, access, and operate your Capsule, please visit the HTRC Data Capsule Tutorials page for step-by-step instructions.

 

Capsule Sharing Functions

Research Data Capsules can be shared among up to 5 collaborators. The person who creates the Capsule has the most control over it: they can add and remove collaborators, assign permissions, and delete the Capsule.

There are 3 roles for users of a shared Capsule:

  • Owner (and Owner-Controller): The person who creates a Capsule is its Owner-Controller by default, the role with the highest level of control. The Owner-Controller can perform all Capsule functions available in HTRC Analytics, including accessing, starting, stopping, switching modes, deleting, and managing the collaborators on the Capsule. If the Owner-Controller delegates control of the Capsule to a collaborator, their role becomes Owner, and the ability to start, stop, and switch the modes of the Capsule moves to the Controller (see below). The Owner can resume Owner-Controller status whenever they choose.

  • Contributor: The Owner can share their Capsule with other HTRC Analytics users. New collaborators have the role of Contributor when they are added. This role has the lowest permission level. Contributors can connect to and conduct research in the Capsule, but cannot perform any of the Capsule management functions.

  • Controller: The Owner can choose to give a Contributor the status of Controller in order to delegate some management tasks of the Capsule to that user, including starting, stopping, and switching modes. There can only be one Controller at a time, and the Owner can revoke control of the Capsule at any time.
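
The roles above amount to a small permission matrix, sketched below. The action names and the `can` helper are hypothetical and exist only for illustration. The sketch also assumes that an Owner who has delegated control keeps the ability to delete the Capsule and manage collaborators, which this guide implies but does not state outright.

```python
from enum import Enum


class Role(Enum):
    OWNER_CONTROLLER = "owner-controller"
    OWNER = "owner"
    CONTROLLER = "controller"
    CONTRIBUTOR = "contributor"


# Hypothetical action names grouping the functions this guide mentions.
ACCESS = {"connect"}
POWER = {"start", "stop", "switch_mode"}
ADMIN = {"delete", "manage_collaborators"}

PERMISSIONS = {
    Role.OWNER_CONTROLLER: ACCESS | POWER | ADMIN,
    # Assumption: after delegating control, the Owner keeps the admin actions.
    Role.OWNER: ACCESS | ADMIN,
    Role.CONTROLLER: ACCESS | POWER,
    Role.CONTRIBUTOR: ACCESS,
}


def can(role: Role, action: str) -> bool:
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS[role]
```

Under these assumptions, `can(Role.CONTROLLER, "switch_mode")` is true while `can(Role.OWNER, "start")` is false, which mirrors how start, stop, and mode switching move to the Controller once control is delegated.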

Once a collaborator is added to a Capsule, the Capsule will appear for them on their Capsules listing page in HTRC Analytics. Before the new collaborator can access the Capsule, they will need to agree to the Data Capsules Terms of Use.

For Capsules with full corpus access, HTRC will review the request to add a collaborator and either approve or deny it. The Capsule details will appear on the collaborator's Capsules listing page only if the request is approved.

As Demo Capsules are meant for short-term exploration, they cannot be shared with collaborators.

 Watch a video on adding collaborators to a Data Capsule here.