Notes from the monthly user group meeting on October 31, 2013


Ted Underwood (University of Illinois, Urbana-Champaign)
Brian Vetruba (Washington University in St. Louis)
Matthew Wilkens (University of Notre Dame)
Michelle Paolillo (Cornell University)
Peter Leonard (Yale University)
Abby Scheel (Florida State University)
Grace Kaletski (Florida State University)

Loretta Auvil (HTRC staff, University of Illinois, Urbana-Champaign)
Sayan Bhattacharyya (HTRC staff, University of Illinois, Urbana-Champaign)
Miao Chen (HTRC staff, Indiana University Bloomington)


The purpose of the meeting was to encourage the community to share educational materials related to HTRC (e.g. tutorials, demos, slides) and to foster the use of the community wiki for this purpose.

  • Tutorial on searching HathiTrust

Scholars can search on the HT main page, but sometimes the search cannot meet their needs, e.g. finding the number of books in Italian.

Some people download the HathiTrust files, load them into a database, and search there. But the HT files contain millions of records, which is a lot to handle.

People want that kind of information and would like to have sample code.

  • There is no obvious visualization for HathiTrust on the main HT page

Having a use case can help frame how to explore the HT metadata.

Abby Scheel mentioned that some visualization of languages is available on the HT main page.

  • Author is not part of the HT file, but it is important
  • The metadata quality is a concern

A language detection algorithm could be run on the HTRC corpus, since the current language metadata is not reliable.

  • Two approaches to HT data visualization

One is to provide visualization reports; the other is to provide data access.

  • Use the community wiki to upload PowerPoint presentations and slides
  • Need good tutorials for the algorithms provided by HTRC

One challenge is how to explain them in a way that is accessible to people less familiar with this kind of material, e.g. those from traditional humanities fields.

It is especially important to explain the algorithms (e.g. Naive Bayes) to people in terms of their own context.

  • Scholarly Commons at UIUC is developing a tutorial for the portal
  • Need to have an overview document for HTRC

An FAQ is not enough for a beginner who knows nothing about HTRC.

The first couple of pages of the overview document can introduce examples of its utility, such as how helpful it is in the classroom.

  • Create a set of slides as templates that non-HTRC people can easily adapt for their own purposes

Actionable items

  • Put introductory slides about HTRC on the wiki for people to use when introducing HTRC
  • Develop a set of slide templates that people can easily adapt for their own context and purpose

An HT file is a tab-delimited file in which each record contains information about a volume.
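
Since attendees asked for sample code, a minimal sketch of answering a question such as "how many books are in Italian" from a tab-delimited file might look like the following. It reads one row at a time, so the file does not have to be loaded into a database first. The function name count_by_column, the three-column sample data, and the position of the language column are illustrative assumptions, not the actual HT file layout; the real column position would need to be checked against the file's documentation.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column_index):
    """Count the values found in one column of a tab-delimited stream.

    Rows are read one at a time, so memory use stays small even for
    files with millions of records.
    """
    counts = Counter()
    # QUOTE_NONE treats quote characters in titles as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip short or malformed rows instead of failing.
        if column_index < len(row):
            counts[row[column_index]] += 1
    return counts


# Hypothetical sample: volume id, title, language code (assumed layout).
sample = io.StringIO(
    "vol.001\tLa Divina Commedia\tita\n"
    "vol.002\tMoby Dick\teng\n"
    "vol.003\tI promessi sposi\tita\n"
)

language_counts = count_by_column(sample, column_index=2)
print(language_counts["ita"])
```

For a real file, the in-memory sample could be replaced with an open file handle, e.g. open(path, encoding="utf-8", newline=""), where path is whatever location the downloaded file was saved to.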