HTRC Data Capsule Tutorials

View step-by-step tutorials for common Data Capsule functions.

Once you have your Capsule running, you may find it useful to open this guide in an internet browser in your Capsule so you can copy and paste commands. The short link for this page is: https://wiki.htrc.illinois.edu/x/TQFRAQ.

Overview of Generic Research Workflow in a Capsule

  1. Register and sign in to your HTRC account. 
  2. Create and start a Capsule in the HTRC.
  3. View your Capsule using the Remote Desktop view/Terminal view, or SSH into your Capsule in Maintenance mode.
  4. Configure the software environment of the Capsule as needed. Download the scripts or programs you plan to use in your analysis.
  5. Switch Capsule to Secure mode through HTRC.
  6. Run your analysis against the secure HTRC corpus repository.
  7. Move your results to the secure volume storage on the Capsule.
  8. Switch Capsule back to Maintenance mode to regain normal network access.

Step-by-Step Instructions

Note: You will need to create an HTRC account if you don't already have one. 

From the Analytics homepage, create a Capsule by clicking Data Capsules in the top menu and then selecting Create a Capsule from the dropdown menu. You will be asked to provide information about the Capsule you would like to create. This step also explains how HathiTrust members can create a Capsule with access to the full HathiTrust corpus, or convert an existing Capsule to one.

 Create a Capsule

 Watch video tutorials

User interfaces shown in these videos may be outdated, but step-by-step instructions are up to date.

We're updating our videos to show latest changes!

Create a Capsule

Check Capsule Status

Convert a Research Capsule

 Read step-by-step instructions

Navigate to the 'Create a data capsule' page

Make sure you are signed in. Navigate to the Create a data capsule page under the Tools label in the top menu of HTRC Analytics.


Create a data capsule

You will be prompted to choose to create a Demo Capsule, a Research Capsule, or a Customized Research Capsule.


Create Demo Capsule

Note:

  • Demo Capsules are not configurable and can access public domain content only.
  • You cannot request to export derived data from a Demo Capsule.

Hit the Create Capsule button. The Capsule creation procedure usually takes about 1 minute to complete. Refresh your screen to see if it has finished.


You will be prompted to agree to the HTRC Data Capsules Terms of Use. Please review this document as it outlines policy for acceptable in-Capsule behavior. 

Create Research Capsule

Note:

  • Research Capsules are configurable and by default can access public domain content only.
  • You can request to export derived data from a Research Capsule.
  • Additional information is required to create your Capsule. 
  • During creation or after the Capsule is created, researchers from HathiTrust member institutions can request that their Research Capsule be converted to one with computational access to the full HathiTrust corpus, including in-copyright content.

Fill out the form with the title of your research project, and choose the specs for your Capsule. 

Capsule size can range from 1-10 VCPUs, from 1-20 GB of memory, and from 10-70 GB of secure volume size. The VCPUs and memory allocation you choose will affect the processing speed of your Capsule. 
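As a rough illustration of the ranges above, the following Python sketch checks a planned allocation before you fill out the form. The constant and function names are hypothetical and are not part of any HTRC tool; the creation form enforces its own limits, which may also depend on resources already used by your other Capsules.

```python
# Capsule size ranges (inclusive) as stated in this guide.
CAPSULE_LIMITS = {
    "vcpus": (1, 10),
    "memory_gb": (1, 20),
    "secure_volume_gb": (10, 70),
}


def check_capsule_request(vcpus, memory_gb, secure_volume_gb):
    """Return a list of problems with a planned allocation (empty if none).

    Hypothetical planning helper; it does not contact HTRC.
    """
    requested = {
        "vcpus": vcpus,
        "memory_gb": memory_gb,
        "secure_volume_gb": secure_volume_gb,
    }
    problems = []
    for name, value in requested.items():
        low, high = CAPSULE_LIMITS[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems


print(check_capsule_request(4, 16, 50))  # every value is within range
print(check_capsule_request(12, 16, 5))  # two values are out of range
```

An empty list means the planned sizes fall within the stated ranges.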

Add the description for your research project. These answers will be used to aid in reviewing requests to export results from your Capsule. The more information you can provide, the more easily we can assess your results for adherence to the HTRC's Non-consumptive Use Research Policy.

Affiliates of HathiTrust member institutions can check the box to request a Capsule with access to the full HathiTrust corpus. 


Checking that box will prompt you to fill out additional information about your project. Note: Creation requests from users who check this box will be routed for human review. Your request will be reviewed to verify that you are affiliated with a HathiTrust member institution and that your request demonstrates serious research intentions in compliance with the HTRC's Non-consumptive Use Research Policy and HTRC Data Capsules Terms of Use.


Include more information about your anticipated results to further assist in the human review of your data export requests.



If you like, you can choose to allow HTRC to communicate anonymized information about your research project. You must also agree that you will not share your log-in information for HTRC Analytics with anyone.
 


You will be prompted to agree to the HTRC Data Capsules Terms of Use. Please review this document as it outlines policy for acceptable in-Capsule behavior. You will be reminded of these terms regularly while using your Capsule. 

Create Customized Research Capsule

Note:

  • Customized Research Capsules are copies of other users' Research Capsules, and they are ideal for classroom or workshop use.
  • Customized Research Capsules are configurable and by default can access public domain content only.
  • You can request to export derived data from a Customized Research Capsule.
  • Additional information is required to create your Capsule. 
  • During creation or after the Capsule is created, researchers from HathiTrust member institutions can request that their Customized Research Capsule be converted to one with computational access to the full HathiTrust corpus, including in-copyright content.

Choose a Customized Research Capsule template to use, and select Create a Data Capsule.


Fill out the form with the title of your research project, and choose the specs for your Capsule. 

Capsule size can range from 1-10 VCPUs, from 1-20 GB of memory, and from 10-70 GB of secure volume size. The VCPUs and memory allocation you choose will affect the processing speed of your Capsule. 





Add the description for your research project. These answers will be used to aid in reviewing requests to export results from your Capsule. The more information you can provide, the more easily we can assess your results for adherence to the HTRC's Non-consumptive Use Research Policy.

Affiliates of HathiTrust member institutions can check the box to request a Capsule with access to the full HathiTrust corpus. 


Checking that box will prompt you to fill out additional information about your project. Note: Creation requests from users who check this box will be routed for human review. Your request will be reviewed to verify that you are affiliated with a HathiTrust member institution and that your request demonstrates serious research intentions in compliance with the HTRC's Non-consumptive Use Research Policy and HTRC Data Capsules Terms of Use.


Include more information about your anticipated results to further assist in the human review of your data export requests.

If you like, you can choose to allow HTRC to communicate anonymized information about your research project. You must also agree that you will not share your log-in information for HTRC Analytics with anyone.

You will be prompted to agree to the HTRC Data Capsules Terms of Use. Please review this document as it outlines policy for acceptable in-Capsule behavior. You will be reminded of these terms regularly while using your Capsule.

Check Capsule Status 

After creating a Capsule, you will be taken back to the "My data capsule allocations" page. By default, the Capsule you just created is not running. 

  

Convert a Research Capsule

HathiTrust member-affiliated individuals can request to convert an existing Research Capsule into one with access to the full HathiTrust corpus.

From your My Data Capsule Resources page, click on the title of the Capsule you would like to convert. Then, click the button to Request access to Full HathiTrust Corpus.

You will be taken to the Capsule creation form. If you submitted answers when creating your Capsule, they will appear for you to review and, if desired, edit. You will also be asked to fill in additional information about your research use case. Your request will be reviewed to verify that you are affiliated with a HathiTrust member institution and that your request demonstrates serious research intentions in compliance with the HTRC's Non-consumptive Use Research Policy and HTRC Data Capsules Terms of Use.


Start the Capsule you created by clicking the Start Capsule button on the Capsules page.

 Start the Capsule

From the HTRC Analytics homepage, click the My data capsule allocations link under the Tools top menu header. 


From the My data capsule allocations page, click on the data capsule title of the capsule you wish to start.


Next, click the Start Capsule button located on the upper right-hand side of the page.

Interact with the Capsule either via Remote Desktop viewer or Terminal viewer.

 Interact with the capsule

To get the details of your capsule so that you can log in to it, click on the data capsule's title on the My data capsule allocations page.

You will see the details for your capsule. From this page, you can start, stop, or delete your Capsule. If your capsule has been started, you can also click to connect via Terminal (command line interface) or Remote Desktop (to see your capsule's Ubuntu desktop). You will see different options in the More Data Capsule Functions dropdown menu depending on whether you have opened a demo capsule or research capsule and whether or not you have started your capsule.

  • Demo capsule
    • If you have not yet started your capsule, the More Data Capsule Functions dropdown menu offers you the option to request data capsule help.

    • If you have started your capsule, the More Data Capsule Functions dropdown menu allows you to switch between Secure mode and Maintenance mode as well as request data capsule help.

       
  • Research Capsule
    • If you have not yet started your Capsule, the More Data Capsule Functions dropdown menu allows you to manage collaborators, create a template, or request data capsule help.

    • If you have started your capsule, the More Data Capsule Functions dropdown menu allows you to choose to switch between Secure mode and Maintenance mode, manage collaborators, or request data capsule help.


If you choose to connect via Terminal, you will be taken to a page showing a command line interface for interacting with your Capsule. (Note: This option is available in Maintenance mode only.)

If you choose to connect via Remote Desktop, you will be taken to a page from which you can interact with your Capsule's desktop. (Note: This option is available in either Maintenance or Secure mode.)

Watch the following video for rules for copying and pasting text/commands in the Data Capsule environment:

Overview of rules

  • Regardless of your local operating system, use ctrl-c and ctrl-v to copy and paste in the Capsule.
  • If you are using the terminal in your Capsule, use ctrl-shift-c and ctrl-shift-v instead, again regardless of your local operating system.
  • You cannot copy and paste from your local desktop into the Capsule window, and you cannot copy and paste from your Capsule to your local desktop.

If you want to interact with your Capsule via SSH from your personal machine, you can follow the directions to set up that access. 

Alternatively, you can SSH into your Capsule, but only when it is in Maintenance mode.

 SSH access in Maintenance mode

 Watch video tutorial

 Read step-by-step instructions

First, you will need a public key. Scroll down to the blue box at the bottom of your Capsule's status page.

You will be prompted for a key when you attempt to SSH into your Capsule. If you do not yet have a public key set up, then entering one will establish your key. You can do this by clicking Add or Update Public Key in the blue box. If you already have a key, resubmitting a response in the field below "Public Key" will change your key.

You'll find the command to SSH into your Capsule in the blue box on each Capsule's status page.

Switch between Maintenance and Secure mode.

 Switch Capsule modes

HTRC Data Capsules have two modes: Maintenance mode and Secure mode. In Maintenance mode, the capsule can access the network (i.e., the internet) so that you can set up your capsule as you like, such as by installing software or importing additional, non-HathiTrust data. In Secure mode, the capsule can access HathiTrust data, but it cannot download resources from the internet.

HathiTrust data you import and/or work with in Secure mode must be stored on the capsule's Secure Volume, a storage location available in Secure mode only, in order to persist in a capsule when its modes are switched or when it is turned off and back on. For security reasons, data transferred or generated in the capsule in Secure mode that is not saved to the Secure Volume will be deleted when the capsule switches modes or is turned off and on. 
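To make the persistence rule concrete, here is a minimal Python sketch that tests whether a file path lies on the Secure Volume. It assumes the Secure Volume is mounted at /media/secure_volume, as in the commands later in this guide. The helper name is hypothetical, and the check is a plain path comparison that does not resolve symlinks or ".." segments.

```python
from pathlib import PurePosixPath

# Mount point of the Secure Volume, as used elsewhere in this guide.
SECURE_VOLUME = PurePosixPath("/media/secure_volume")


def is_on_secure_volume(path):
    """Return True if an absolute path is the Secure Volume or lies inside it.

    Hypothetical helper: it compares path components only and never
    touches the file system.
    """
    candidate = PurePosixPath(path)
    return candidate == SECURE_VOLUME or SECURE_VOLUME in candidate.parents


print(is_on_secure_volume("/media/secure_volume/workset/volume.txt"))
print(is_on_secure_volume("/home/demouser/results.csv"))
```

Files generated in Secure mode at paths for which this check is false are the ones that would be deleted when the capsule switches modes or restarts.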


 Watch a video on how to switch modes

 Read instructions for how to switch modes

There are two ways you can switch your capsule's mode.

Option 1:

Go to the My data capsule allocations page and click on the title of one of your data capsules.

Start your capsule, and then click on the "More data capsule functions" dropdown menu to select "Switch to secure mode." Once it is in secure mode, you can switch it back to maintenance mode from the dropdown as well. 


Option 2:

When viewing your capsule via Remote Desktop, you will see a blue button to either "Switch to Secure Mode" or "Switch to Maintenance Mode." 



Click the button to switch. You'll see the Capsule's state change.



Once it has switched, you'll see that you can click the blue button again to switch modes back. 

 

Share your Research Data Capsule with up to 5 other researchers.

 Managing Data Capsule Collaborators

 Watch a video tutorial

 Read step-by-step instructions

From the My data capsule allocations page, click on the data capsule title for the capsule you would like to share.


Next, click on the dropdown menu located to the right of the screen labeled More data capsule functions. Select the Manage collaborators option.



You will be taken to a new page, where you can input the email address for the user you would like to add.



The email address must be the one associated with their HTRC Analytics account or you will get an error.



When you successfully add a collaborator, that user's information will appear in the table of collaborators. By default, they will have the role of Contributor. Contributors can access the Capsule and interact with it in its current state. You will have the role of Owner-Controller.



Before the new Contributor can access the Capsule, they will need to agree to the Data Capsules Terms of Use. You will also be unable to delegate control of the Capsule to them until they have agreed.

Once they have agreed to the Terms of Use, you can choose to make them a Controller of the Capsule by clicking the Delegate Control button.



Once complete, you'll find that their role has changed to Controller. Only the Controller can start, stop, and switch the modes of the Capsule. (The Owner-Controller likewise can do these tasks.)



Your role has changed to Owner. The owner can delete the Capsule and revoke control from the Controller. Click on the Revoke Control button to resume Owner-Controller status.



Now the collaborator again has the role of Contributor and you are Owner-Controller.



If you no longer want to share your Capsule with a user, click the red 'X' button next to the Delegate Control button.



After they are removed, you'll see the collaborators table has returned to displaying only you as associated with this Capsule.
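The role changes described in these steps can be summarized with a small toy model. This Python sketch is purely illustrative: the class and method names are hypothetical, it calls no HTRC service, and requiring control to be revoked before a Controller is removed is an assumption rather than documented behavior.

```python
class SharedCapsule:
    """Toy model of collaborator roles; not HTRC code."""

    MAX_COLLABORATORS = 5  # a Capsule can be shared with up to 5 others

    def __init__(self, owner):
        self.owner = owner
        self.controller = owner      # the owner starts as Owner-Controller
        self.agreed_to_terms = {}    # collaborator email -> agreed to Terms?

    def add_collaborator(self, email):
        if email in self.agreed_to_terms:
            raise ValueError("already a collaborator")
        if len(self.agreed_to_terms) >= self.MAX_COLLABORATORS:
            raise ValueError("collaborator limit reached")
        self.agreed_to_terms[email] = False  # new users start as Contributor

    def accept_terms(self, email):
        if email not in self.agreed_to_terms:
            raise KeyError(email)
        self.agreed_to_terms[email] = True

    def delegate_control(self, email):
        # Control can be delegated only after the Terms of Use are accepted.
        if not self.agreed_to_terms.get(email, False):
            raise PermissionError("Terms of Use not yet accepted")
        self.controller = email

    def revoke_control(self):
        self.controller = self.owner

    def remove_collaborator(self, email):
        if email == self.controller:
            # Assumption: control must be revoked before removal.
            raise ValueError("revoke control first")
        del self.agreed_to_terms[email]

    def role_of(self, user):
        if user == self.owner:
            return "Owner-Controller" if self.controller == user else "Owner"
        if user == self.controller:
            return "Controller"
        if user in self.agreed_to_terms:
            return "Contributor"
        return None


capsule = SharedCapsule("owner@example.edu")
capsule.add_collaborator("colleague@example.edu")
capsule.accept_terms("colleague@example.edu")
capsule.delegate_control("colleague@example.edu")
print(capsule.role_of("owner@example.edu"), capsule.role_of("colleague@example.edu"))
```

The email addresses are placeholders. After delegation, the model reports the owner as Owner and the collaborator as Controller, matching the steps above.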


Change your data capsule resource amounts.

 Updating data capsule resources

You can adjust a data capsule's resource amounts after you’ve created it, as long as it is in SHUTDOWN mode. First select your desired data capsule from the My data capsule allocations page, and then click on the “Update resources” button in the “Resources” row.

A popup will appear in which you can adjust your available resources to your liking. Please note: if you have other data capsules, your available options will be based on what you have already used in your other capsules. Resource maximums can be seen at the top of your My data capsule allocations page.

Make your changes and click “Confirm.” Your data capsule will take a moment to update accordingly.

Bring text data into your Capsule.

 Get data

Learn how to bring HT volume content into your Data Capsule.

Preferred method

HTRC has developed a Python library for loading volumes into the Data Capsule environment: the HTRC Workset Toolkit. The Toolkit is standard in all Capsules created after March 18, 2018. If you have a Capsule created earlier than this date, then you will need to install or update the Toolkit. 

Make sure your Capsule is in Secure mode before you fetch content into it; for security reasons, fetching content does not work in Maintenance mode.

You can use the Workset Toolkit's "htrc download" command to transfer the volumes you would like to include in your dataset.

For example, the following command will import the volumes in the HathiTrust collection 'Adventure Novels: G.A. Henty'.

htrc download 'https://babel.hathitrust.org/cgi/mb?a=listis;c=464226859'


You can also curate a list of volumes to import by creating a file that lists the HathiTrust volume IDs you're interested in, with one ID per line. Run the above command, replacing the collection URL with your file name.

For example, if you had a file called myvolumes.txt, you would run the following command.

htrc download myvolumes.txt
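If you assemble your volume list programmatically, a small script can write the ID file in the expected one-ID-per-line layout. The following is a minimal Python sketch; the function name is hypothetical, and the IDs in the demo are made-up placeholders rather than real HathiTrust volumes.

```python
import tempfile
from pathlib import Path


def write_volume_list(volume_ids, path="myvolumes.txt"):
    """Write volume IDs to a text file, one ID per line.

    Blank entries and repeated IDs are dropped and the original order is
    kept. Returns the number of IDs written.
    """
    unique_ids = []
    for raw in volume_ids:
        volume_id = raw.strip()
        if volume_id and volume_id not in unique_ids:
            unique_ids.append(volume_id)
    text = "".join(f"{volume_id}\n" for volume_id in unique_ids)
    Path(path).write_text(text, encoding="utf-8")
    return len(unique_ids)


# Demo in a throwaway directory; the IDs below are placeholders only.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "myvolumes.txt"
    count = write_volume_list(
        ["mdp.00000000000001", "mdp.00000000000002", "mdp.00000000000001"],
        list_file,
    )
    print(count)
    print(list_file.read_text(encoding="utf-8"), end="")
```

In a Capsule you would write the file somewhere you can reach from the terminal and then pass its name to htrc download as shown above.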


In the above examples, the data will be transferred to “/media/secure_volume/workset/”. If you want to use a different location, specify it by adding -o followed by the output path to your command.
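As a sketch of how the -o option fits into the command, the following Python snippet assembles the argument list for an htrc download call without running it. The helper name and the output directory are hypothetical, and placing -o before the file name is an assumption about argument order; consult the Toolkit's technical documentation for the authoritative syntax.

```python
import shlex


def build_download_command(source, output_dir=None):
    """Return the argument list for an `htrc download` call.

    `source` is a collection URL or the name of a volume ID file.
    `output_dir`, if given, is passed with -o. The command is only
    assembled here, not executed.
    """
    command = ["htrc", "download"]
    if output_dir is not None:
        command += ["-o", str(output_dir)]
    command.append(str(source))
    return command


# "/media/secure_volume/henty/" is a hypothetical output directory.
args = build_download_command("myvolumes.txt", "/media/secure_volume/henty/")
print(shlex.join(args))
```

Inside a Capsule in Secure mode, the resulting list could be passed to subprocess.run, or the printed line could be typed into the terminal directly.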

Other functions of the Workset Toolkit

You can also use a volume ID, collection URL, or catalog record ID to import volumes. Additionally, you have the option to concatenate files, remove folders, and retrieve metadata using the functions of the Workset Toolkit.

For more examples, see the detailed guide.

For the technical documentation, see: https://htrc.github.io/HTRC-WorksetToolkit/cli.html




Perform your analysis. See the following use cases for examples of how to perform text analysis in the Capsule. 

If you will need more than one session to complete your research, save your interim data to the Secure Volume.

 Save data to Secure Volume

Save data to the Secure Volume

Make sure your Capsule is in Secure mode (see directions above if needed).

Open a terminal window in the capsule and navigate to the secure volume by typing:

cd /media/secure_volume
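If your analysis is scripted, you can also copy interim files to the Secure Volume from within the script. The following Python sketch shows one way to do that; the function name and the "interim" subfolder are hypothetical, and the demo uses a temporary directory as a stand-in for /media/secure_volume so that it can run anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def save_to_secure_volume(source, secure_root="/media/secure_volume", subdir="interim"):
    """Copy a file into a subfolder of the Secure Volume.

    Creates the subfolder if needed and returns the path of the copy.
    """
    target_dir = Path(secure_root) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir))


# Demo: a temporary directory stands in for the Secure Volume.
with tempfile.TemporaryDirectory() as tmp:
    result = Path(tmp) / "counts.csv"
    result.write_text("word,count\n", encoding="utf-8")
    saved = save_to_secure_volume(result, secure_root=Path(tmp) / "secure")
    print(saved.name, saved.exists())
```

In a real session you would leave secure_root at its default and run the script while the Capsule is in Secure mode.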

Between sessions, stop the Capsule via the HTRC using the web browser on your personal desktop. The next time you log in, you can restart the same Capsule and continue your work.

 Stop the Capsule

 Watch a video


 Read step-by-step instructions

From your data capsule's status page, locate the blue "Stop capsule" button on the right side of the screen and click it.

When you are finished with your research, request to export your non-consumptive results. When you no longer need the Capsule, delete it via the HTRC. 

 Export non-consumptive results

(This is the same as Release results)

If you'd like to export results out of the Capsule, you must release them from your virtual machine (VM).

First, switch the VM to Secure mode in the Portal interface. 

Second, open a terminal in the capsule and navigate to the secure volume by typing:

cd /media/secure_volume

Suppose the file you'd like to release is at /home/demouser/demo/r/Rplots.pdf

You can prepare the result data for release by first adding it to the release list:

releaseresults add /home/demouser/demo/r/Rplots.pdf

Repeat this command for any other files you want to add.

Finally, to complete the release of your data, type: 

releaseresults done
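When several files need to be released, it can help to generate the command sequence first and review it. This Python sketch builds one releaseresults add command per file followed by a single releaseresults done. The helper name is hypothetical, and the commands are only assembled, not executed.

```python
def build_release_commands(files):
    """Return the commands needed to release the given files.

    One `releaseresults add` command per file, then `releaseresults done`.
    Each command is an argument list; nothing is executed here.
    """
    if not files:
        raise ValueError("nothing to release")
    commands = [["releaseresults", "add", str(path)] for path in files]
    commands.append(["releaseresults", "done"])
    return commands


for command in build_release_commands(["/home/demouser/demo/r/Rplots.pdf"]):
    print(" ".join(command))
```

The printed lines match the commands shown above and can be typed into the Capsule's terminal in the same order.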


The files will be delivered by email to the address you registered for your HTRC Analytics account. The email link will be live for 12 hours.

Shut Down the VM

Go back to HTRC Analytics. From the HTRC Analytics homepage, click the Data Capsules tab in the top navigation bar, then select My Data Capsule Resources from the dropdown menu. Click on the title of the Data Capsule you would like to shut down. Next, select the blue Stop Capsule button on the upper right-hand side of the page to shut down the VM.


Delete the VM

If you do not need the Capsule anymore, you can delete it. From the HTRC Analytics homepage, click the Data Capsules tab in the top navigation bar, then select My Data Capsule Resources from the dropdown menu. From here, click on the title of the Data Capsule you wish to delete. Next, click the red Delete Capsule button on the upper right-hand side of the page to delete the VM.


 

How to create a data Capsule template:

 Creating and Using a Data Capsule Template

Data Capsule templates are intended for researchers, librarians, or any HTRC Analytics account holder to be able to share a DC (Data Capsule) setup with other HTRC users. 

What is a Customized Research Capsule template?

A template is a snapshot, or image, of a pre-existing Research Data Capsule that can be made accessible for other HTRC account holders to find and use. The template is created and set up by one HTRC user who has a Research Data Capsule they wish to share for educational or research purposes. While the Capsule is in Maintenance mode, this user downloads any additional code libraries, software tools, worksets, or other data from the internet that are not already available in HTRC’s Ubuntu virtual machine, and then creates a template from the Capsule. See specific steps below.

Please note: at this time, templates cannot be used to share HT volumes between Capsules, nor to share derived data that was added to the Secure Volume while performing research in Secure mode.

Why would I want to create a template?

Perhaps you are responsible for leading workshops on text analysis platforms and would like to demonstrate a specific workflow within the Data Capsule environment for your attendees. With Customized Research Capsule templates, all attendees can find your public template and start from the same point in the Data Capsule environment.

The same could be done by instructors wishing to demonstrate the Data Capsule environment for a class lesson or assignment in a digital humanities course they are teaching. 

Templates allow HTRC users some customization control of a Data Capsule in use cases where multiple users require access to the same research environment setup.

Why would I want to use a template to create a Data Capsule?

If you are a student or workshop attendee who needs access to certain required data, tools, or Data Capsule setup for a tutorial, project, assignment, or activity, then having access to templates provides an easier option for creating the Data Capsule your instructor or workshop leader needs you to use, rather than creating the Data Capsule from scratch.

Think about it: you are in a workshop that is demonstrating how to perform sentiment analysis on a specific list of HathiTrust volumes. Instead of downloading the correct version of the tool your instructor requires, as well as gathering all the correct HathiTrust volume IDs, this is all preloaded for you in the template the instructor created. Simply locate the template under "All Templates," and create the Data Capsule you need.  

Templates can also be useful for researchers. You might search for a template that has a basic setup for the research flows and tools you are interested in exploring, or you might want to reproduce published research that was conducted within a Data Capsule. There is no need to reinvent the wheel if the environment is primed to go. Search "All Templates" and read through the descriptions; click on the title of a template to read what customizations its creator has added.

How can I create a template?

Log into your HTRC Analytics account.

Click My data capsule allocations in the menu beneath the Tools menu label. 


If you have a pre-existing research data capsule in your list, select it by clicking on that data capsule's title. You will be taken to that capsule’s status page.




If you need to set up your data capsule, click the start button and wait for it to be in a RUNNING state. Then use either the Connect via Terminal or Connect via Remote Desktop button to access your virtual machine. In Maintenance mode you have access to the internet, so use this connection to download the libraries and tools you need (you cannot download internet-based resources in Secure mode). See this page to know what already comes pre-installed on all Research Data Capsules.

If you do not have a pre-existing Research Data Capsule, read this page for creation steps. Remember: you can only create templates from Research Capsules, not Demo Capsules.

Once your Capsule is set up with all your tools and code libraries, stop your Data Capsule. Your Capsule MUST be in SHUTDOWN mode in order to create a template.

Click the More data capsule functions dropdown button and select Create a template.


Fill in the form on the next page.



To make the template viewable to other users, select the “Make Public” checkbox at the bottom of the form. If you don’t want to make the template public right away, leave the checkbox unselected; you can make it public later. Help others identify your template by providing a clear title and description, along with a succinct list of customizations. Click the Submit button when you are finished filling out the form.

After you have submitted your template form, you will be redirected to the templates list page. If you would like to see only your own templates, filter the template list to “My Templates.”

When you initially create a template, it will be in a CREATE_PENDING status. This means that the process of copying your template from the data capsule is happening. You will not be able to perform any functions with a template in this state, such as creating a data capsule with it or deleting it.



Once the template is fully copied, the status will change to ACTIVE. Now you can perform actions such as creating a data capsule from the template, deleting the template, or changing the public/private status of your personal templates.
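The two template statuses can be thought of as a simple lookup from status to available actions. The Python sketch below encodes that idea; the action names and the helper are hypothetical labels chosen for illustration and do not correspond to an HTRC API.

```python
# Actions available for a template in each status, per the description above.
ALLOWED_ACTIONS = {
    "CREATE_PENDING": frozenset(),
    "ACTIVE": frozenset({"create_capsule", "delete", "change_visibility"}),
}


def can_perform(status, action):
    """Return True if `action` is available for a template in `status`."""
    if status not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown template status: {status}")
    return action in ALLOWED_ACTIONS[status]


print(can_perform("CREATE_PENDING", "create_capsule"))
print(can_perform("ACTIVE", "create_capsule"))
```

While a template is still being copied, every action is unavailable; once it is ACTIVE, all three become possible.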

Click on the title of the template to see the template information page. This page shows you the details of the template. Notice that this page also tells you if your template is in a CREATE_PENDING state or ACTIVE in the status field, as well as whether your template is public or private.



Back in the templates list, once your template has a status of ACTIVE, you can click on the blue Create a data capsule button, or you can delete the template by clicking the trashcan icon.

You can also click back to the template information page, which also allows you to create a data capsule, or to change the public/private status of the template.


How can I create a Data Capsule from a template?

You can create a data capsule either from the template list page, or an individual template information page – both spaces provide buttons for accomplishing this task, and both take you to the same form for data capsule creation.





Clicking one of these buttons routes you to the Create a Research Capsule form with your template name pre-populated in the "HTRC Data Capsule Template" field. You will need to fill in the rest of the required fields of this form prior to submission. 

Click the Create Capsule button when ready to submit and you will see your new Data Capsule listed on your My Data Capsule Resources page.



What to do if you need help troubleshooting problems with your data Capsule:

 Request help directly from your Data Capsule

If you have questions about or are experiencing problems in your Data Capsule, you may submit a request for help directly from your data capsule. To locate the request help form, click on the dropdown menu labeled More data capsule functions located on the right-hand side of the page:


Click the Request data capsule help option.

You will be redirected to a form that you must fill out and submit. 


Once you submit your form, a Jira help ticket will be created on your behalf, and HTRC staff will contact you via your institutional email about the status of your issue. We recommend submitting only one help request at a time (i.e., if another problem occurs in the same data capsule, wait for your initial problem to be addressed before submitting another request).


Questions?