
Follow this tutorial to learn how to run the HathiTrust+Bookworm visualization tool.


Search for word trends 

HathiTrust+Bookworm enables researchers to visualize word trends across 13.7 million of the volumes in the HathiTrust Digital Library.

...

When clicked, the facet icon displays various fields for narrowing or specifying search parameters. 


Click on the facet icon to display the available search field options for each token choice. A pop-up menu will appear showing the available fields. (If you do not select any parameters to limit your search, it will be conducted across all 13.7 million volumes.) Field options include language, class, literary form, and more. These metadata fields come primarily from library catalog records and from data generated for the HTRC’s Extracted Features.

...

Click on the Dates icon. A pop-up window will appear that allows you to specify date limits for the search. Move the circular toggle right or left to expand or reduce the time frame. This menu also has a toggle for Smoothing. Smoothing applies a moving average to the data, which makes overall trends easier to identify by evening out jagged and discontinuous data points. Trends often become more apparent when data is viewed as a moving average. Smoothing windows are weighted: the year shown is weighted the most heavily, and the weights decrease in each direction until the smoothing span is reached.

Smoothing options are described below:

  • To see the raw data points, set smoothing to 0. 
  • To average one point on each side of a data point, set smoothing to 1, which adds the previous, current, and next values and divides that sum by 3. 
  • A smoothing setting of 5 means that 11 values will be averaged: the data point itself plus 5 values on each side.
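
The arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: it implements the simple, unweighted average described in the bullets, it does not apply the weighting mentioned earlier, and the way it shortens the window at the start and end of the series is an assumption. The function name smooth and the sample counts are hypothetical.

```python
def smooth(values, span):
    """Centered, unweighted moving average (illustrative sketch only).

    span is the number of neighbors averaged on each side of a data point,
    so the full window holds 2 * span + 1 values. Near the ends of the
    series the window is truncated to the values that exist (an assumption).
    """
    if span < 0:
        raise ValueError("span must be non-negative")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - span)                # clip at the first year
        stop = min(len(values), i + span + 1)   # clip at the last year
        window = values[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical yearly counts for one search term.
yearly_counts = [3, 6, 9, 12, 15]

print(smooth(yearly_counts, 0))  # span 0 leaves the raw data points unchanged
print(smooth(yearly_counts, 1))  # span 1 averages previous, current, and next
```

With a span of 1, the second value is (3 + 6 + 9) / 3 = 6, which matches the "divide that sum by 3" description above.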

...

This selection allows you to choose how the numerical values are counted. Depending on the option you choose, the y-axis label of the graph changes accordingly and the chart values are adjusted. There are four options:

  • Words per million shows the number of occurrences of a token per one million words.
  • % of volumes gives the number of texts that use your search terms at least once as a percentage of the total number of texts published that year.
  • Number of words plots the actual count of the searched word as the y-value for the plot.
  • Number of volumes plots the number of volumes in which the searched word occurs as the y-value for the plot, so that each volume registers a single count no matter how often the word appears in it. (The word "text" in the y-axis label is being used interchangeably with the word "volume".)
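
To make the difference between the four options concrete, the following Python sketch computes each one for a single year of a tiny made-up corpus. It is a hedged illustration, not the tool's implementation: the function name metrics_for_year, the dictionary keys, and the sample volumes are all hypothetical, and each volume is assumed to be available as a plain list of tokens.

```python
def metrics_for_year(volumes, term):
    """Compute the four counting options for one term in one year.

    volumes is a list of token lists, one list per volume published
    that year (a simplifying assumption for this sketch).
    """
    total_words = sum(len(tokens) for tokens in volumes)
    word_count = sum(tokens.count(term) for tokens in volumes)
    # Each volume counts once, however often the term occurs in it.
    volume_count = sum(1 for tokens in volumes if term in tokens)
    return {
        "words_per_million": word_count / total_words * 1_000_000 if total_words else 0.0,
        "percent_of_volumes": volume_count / len(volumes) * 100 if volumes else 0.0,
        "number_of_words": word_count,
        "number_of_volumes": volume_count,
    }


# Four made-up volumes "published" in the same year.
sample_year = [
    ["whale", "sea", "whale", "ship"],
    ["sea", "ship", "storm", "sail"],
    ["whale", "hunt"],
    ["land", "farm"],
]

print(metrics_for_year(sample_year, "whale"))
```

In this sample, "whale" occurs 3 times among 12 words but appears in only 2 of the 4 volumes, so the word-based and volume-based options give different pictures of the same term.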

...