HathiTrust+Bookworm step-by-step tutorial

Follow this tutorial to learn how to run the HathiTrust+Bookworm visualization tool.

Search for word trends

HathiTrust + Bookworm enables researchers to visualize word trends across 13.7 million of the volumes contained in the HathiTrust Digital Library.

Navigate to https://bookworm.htrc.illinois.edu/develop/

Search

Enter the token (word) you wish to search for into the search field. Each field accepts only a single term, not multiple words or phrases.

You may search for one or more words simultaneously. To add another search field, click on the plus icon. To remove a search field, click on the minus icon.

When clicked, the facet icon displays various fields for narrowing or specifying search parameters.

Click on the facet icon to open a pop-up menu showing the available search field options for each token. (If you do not select any parameters to limit your search, it will be conducted across all 13.7 million volumes.) Field options include language, class, literary form, and more. These metadata fields come primarily from library catalog records and from data generated for the HTRC’s Extracted Features.

Click inside the All Texts text box for any field you wish to modify. A drop-down menu will then appear, from which you can make your selection(s).

To add another parameter (value) to a facet, click inside the same box again. You can choose multiple values for a given facet field this way.

Multiple values within one facet field constitute a logical OR search (e.g., Publication Country: USA OR United Kingdom OR Canada). Different facet fields constitute a logical AND relationship (e.g., (Language: English) AND (Publication Country: USA OR United Kingdom OR Canada)).

Depending on your search, items in HathiTrust that are classified as serials, such as magazines, journals, or multi-volume publications, can cause odd spikes in a graph due to unreliable date metadata. You may want to experiment by limiting “Resource Type” to “book” to see how that affects your visualization.
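The OR/AND behavior described above can be illustrated with a minimal Python sketch. This is not the tool's own code: the volume records, the field names (language, publication_country), and the matches helper are all hypothetical, chosen only to mirror the example in the text.

```python
from typing import Dict, List, Set

# Hypothetical volume records. The field names are illustrative only,
# not the tool's actual metadata schema.
volumes: List[Dict[str, str]] = [
    {"id": "v1", "language": "English", "publication_country": "USA"},
    {"id": "v2", "language": "English", "publication_country": "France"},
    {"id": "v3", "language": "German", "publication_country": "USA"},
    {"id": "v4", "language": "English", "publication_country": "Canada"},
]


def matches(volume: Dict[str, str], facets: Dict[str, Set[str]]) -> bool:
    """Return True if the volume satisfies every facet field (AND)
    by matching at least one of that field's selected values (OR)."""
    return all(volume.get(field) in allowed for field, allowed in facets.items())


# (Language: English) AND (Publication Country: USA OR United Kingdom OR Canada)
facets: Dict[str, Set[str]] = {
    "language": {"English"},
    "publication_country": {"USA", "United Kingdom", "Canada"},
}

selected = [v["id"] for v in volumes if matches(v, facets)]
print(selected)  # ['v1', 'v4']
```

With an empty facets dictionary every volume matches, which corresponds to searching the whole collection when no parameters are selected.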

Once you have finished making selections, click on the facet icon again to save your search parameters and close the pop-up menu.

Adjust Settings

Choose additional settings by selecting the desired Date, Metric, and Case options.

Date and Smoothing

Click on the Dates icon. A pop-up window will appear that allows you to specify date limits for the search. Move the circular toggle right or left to expand or reduce the time frame. This menu also contains a toggle for Smoothing. Smoothing applies a moving average to the data, which reveals overall trends by evening out jagged and discontinuous data points. Trends often become more apparent when data is viewed as a moving average. Smoothing windows are weighted: the year shown is weighted the most heavily, and the weights decrease in each direction until the smoothing span is reached.

Smoothing options are described below:

To see the raw data points, set smoothing to 0.
To average one point on each side of a data point, set smoothing to 1, which sums the previous, current, and next values and divides the total by 3.
A smoothing setting of 5 means that 11 values will be averaged: the data point itself and 5 values on each side of it.
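The arithmetic in these options can be sketched in Python as a simple centered moving average. This is an illustration under stated assumptions, not the tool's implementation: it uses the plain, unweighted average from the options above (the tutorial does not give the weights the tool applies), the function name smooth is hypothetical, and shrinking the window at the ends of the series is an assumed way of handling the edges.

```python
from typing import List


def smooth(values: List[float], span: int) -> List[float]:
    """Unweighted centered moving average.

    Each output point averages up to 2 * span + 1 input points:
    the point itself plus `span` neighbors on each side.
    Assumption: near the ends of the series the window is truncated.
    """
    if span < 0:
        raise ValueError("span must be zero or a positive integer")
    result = []
    for i in range(len(values)):
        lo = max(0, i - span)
        hi = min(len(values), i + span + 1)
        window = values[lo:hi]
        result.append(sum(window) / len(window))
    return result


yearly = [3.0, 6.0, 9.0, 6.0, 3.0]
print(smooth(yearly, 0))  # [3.0, 6.0, 9.0, 6.0, 3.0] (raw data points)
print(smooth(yearly, 1))  # [4.5, 6.0, 7.0, 6.0, 4.5]
```

In the second call, the middle value 7.0 is (6 + 9 + 6) / 3, matching the description of a smoothing setting of 1.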

Metric

This selection allows you to choose how the numerical values are counted. The y-axis label and the chart values change according to the option you choose. There are four options:

Words per million shows the number of occurrences of a token per one million words.

% of volumes gives the percentage of the texts published in a given year that use your search term at least once.

Number of words plots the actual count of the searched word as the y-value for the plot.

Number of volumes plots the number of volumes in which the searched word occurs as the y-value for the plot, so that each volume registers a single count. (The word "text" in the y-axis label is used interchangeably with the word "volume".)
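As a rough illustration of how the four metrics relate to one another, the following Python sketch computes each of them from per-year totals. The YearCounts class, its field names, and the sample numbers are hypothetical and made up for the example. They are not taken from the tool or its data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class YearCounts:
    """Hypothetical per-year totals; the names are illustrative only."""
    word_count: int          # occurrences of the searched token that year
    total_words: int         # all words in all volumes for that year
    volumes_with_word: int   # volumes containing the token at least once
    total_volumes: int       # all volumes for that year


def metrics(c: YearCounts) -> Dict[str, float]:
    """Compute the four y-axis values described in the tutorial."""
    return {
        "words_per_million": c.word_count * 1_000_000 / c.total_words,
        "percent_of_volumes": c.volumes_with_word * 100 / c.total_volumes,
        "number_of_words": c.word_count,
        "number_of_volumes": c.volumes_with_word,
    }


# Made-up numbers for a single year.
sample_year = YearCounts(
    word_count=250,
    total_words=5_000_000,
    volumes_with_word=40,
    total_volumes=200,
)
print(metrics(sample_year))
```

For the sample year, the token appears 50 times per million words and in 20% of volumes. The two relative metrics normalize by the size of that year's collection, while the two count metrics do not.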

Case

Selecting Insensitive ignores the distinction between lowercase and uppercase characters when counting words.
Selecting Sensitive maintains the distinction between lowercase and uppercase.
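The difference between the two Case settings can be shown with a short Python sketch. The count_token function and the sample tokens are hypothetical; this only illustrates the general idea of case folding before counting, not how the tool itself tokenizes or counts.

```python
from collections import Counter
from typing import List


def count_token(tokens: List[str], term: str, case_sensitive: bool) -> int:
    """Count occurrences of `term`, optionally ignoring letter case."""
    if case_sensitive:
        return Counter(tokens)[term]
    # Case-insensitive: lowercase both the tokens and the search term.
    return Counter(t.lower() for t in tokens)[term.lower()]


tokens = ["Liberty", "liberty", "LIBERTY", "freedom"]
print(count_token(tokens, "liberty", case_sensitive=True))   # 1
print(count_token(tokens, "liberty", case_sensitive=False))  # 3
```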

Once all settings have been selected, click on the blue Search button to search the corpus.

Search results will be displayed on an x-y graph, with one line for each search term.

Hover the mouse over any point along a graphed line to open a pop-up box displaying the token, the year, and the token frequency.

Clicking this box generates a list of volumes in decreasing order of their contribution to the plot for that particular year. (Note: Once clicked, please wait a few seconds for the volume list to generate.) Each volume title is a hyperlink that takes you to the corresponding volume in the HathiTrust Digital Library.

Saving/Exporting results

Search results can be saved by copying the link to the generated results.

Results can also be saved by exporting the generated graph as a PNG or PDF file.

Documentation