Notes from the User Group Meeting, Sep 29, 2015

Attendees: Michelle Paolillo (Cornell University), Tom Burton-West (University of Michigan), Nabil Kashyap (Swarthmore College)
HTRC team: Sayan Bhattacharyya, Eleanor Dickson, Miao Chen


Sayan Bhattacharyya shared his thoughts on creative classroom uses of HTRC as a big data set, e.g. for assigning readings in a creative writing class, and also presented several uses of the HTRC Extracted Features data set.

Extracted Features Use Cases developed by Sayan Bhattacharyya:

The HT digital library is a big data set for digital humanities. Can we do something useful for people in the field who are not necessarily interested in working with text mining algorithms? How can we help instructors come up with assignments for students? Sayan started with several use cases, i.e. the extracted features use cases.

Introduced the extracted features data first.

How can we help instructors come up with assignments for students? E.g. a creative writing assignment. Sayan found that students sometimes read digest versions of assigned readings. How can we inspire students to actually do the reading? 1) How can we use big data? 2) How can we inspire students by making the assignment interesting?

Dunning log-likelihood can find words that are relatively salient in one book compared to another book. It can be used for student reading assignments. A Dunning log-likelihood example is shown on this poster
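
As a rough illustration of the idea (not the script behind the poster), the sketch below scores words by Dunning's log-likelihood statistic (G2), using the two-term form common in corpus linguistics. The function names and the toy word counts are hypothetical; in practice the per-book counts would be derived from the Extracted Features data rather than typed in by hand.

```python
import math
from collections import Counter


def dunning_g2(count_a, count_b, total_a, total_b):
    """Two-term Dunning log-likelihood (G2) for one word in book A vs. book B.

    count_a, count_b: occurrences of the word in each book.
    total_a, total_b: total number of tokens in each book.
    """
    combined = count_a + count_b
    if combined == 0:
        return 0.0
    # Counts we would expect if the word were equally frequent in both books.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    # A zero count contributes nothing (0 * log 0 is treated as 0).
    if count_a > 0:
        g2 += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        g2 += count_b * math.log(count_b / expected_b)
    return 2.0 * g2


def salient_words(counts_a, counts_b, top_n=10):
    """Return the words most over-represented in book A relative to book B."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        return []
    scored = []
    for word, count_a in counts_a.items():
        count_b = counts_b.get(word, 0)
        # Keep only words that are relatively more frequent in book A.
        if count_a / total_a > count_b / total_b:
            scored.append((word, dunning_g2(count_a, count_b, total_a, total_b)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy, made-up word counts for two books.
book_a = Counter({"whale": 8, "the": 12})
book_b = Counter({"the": 18, "garden": 2})
top = salient_words(book_a, book_b)
```

An instructor could hand students the top-scoring words for an assigned book and ask them to explain, from their own reading, why those words stand out against a comparison book.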

Using computational techniques in an English class: is it hard to encourage creativity without knowing the technique well? For instructors who are not analytically minded, creating assignments in this way can be challenging.

Michelle: at the Cornell summer workshop, she found that introducing explicit methods to people really helps them.



Meeting announcement

Title: Creating engaging student writing assignments framed around text analysis with the HTRC Extracted Features dataset
Presenter: Sayan Bhattacharyya
Instructors often allege that students typically do not read as much of the material as they are assigned. Some instructors have reported success in getting students to read fiction by having them write book reports using innovative techniques, such as adopting the persona of a character in the text or gamifying the report by creating a board game based on an assigned novel. While these strategies may be successful in a high school setting, they may not be suitable for college students, who need to write essays after closely reading the text as part of learning to write well.
I have been thinking about how the HathiTrust Research Center’s new Extracted Features dataset (*) can help instructors incentivize students to read assigned literary texts. This suggestion may seem counterintuitive, since big-data techniques are typically thought of only in the context of distant reading rather than close reading. However, the affordances of the HTRC Extracted Features service lend themselves well to undergraduate assignments that incentivize close reading through creative and playful use of algorithmic text analysis. Because the big-data discovery/analysis frames the writing assignment, students cannot get away with reading only canned summaries or 'Cliff's Notes' of the text.
I am making available, in a “cookbook” format, Python scripts that support discovery/analysis for common use cases, so that instructors can easily reuse and adapt them to their own needs, crafting assignments that contain an element of novelty and foster student engagement.
(*) We at HTRC currently provide extracted features for 4.8 million volumes (mostly pre-1923 materials) out of the nearly 14 million volumes of the HathiTrust Digital Library’s collection. Extracted features for volumes encompassing the entire collection are anticipated in the upcoming academic year, although the makeup of the feature data may change slightly.
About the presenter:
Sayan Bhattacharyya is a Postdoctoral Research Associate (CLIR Research Fellow) at the HathiTrust Research Center, Graduate School of Library and Information Science and University Library, University of Illinois at Urbana-Champaign. He received his PhD in Comparative Literature from the University of Michigan, Ann Arbor.