Attendees: Samuel Franklin (Brown), Michelle Paolillo (Cornell), Ted Underwood (UIUC), Tassie Gniady (Indiana), David Mimno (Cornell)

HTRC team: Sayan Bhattacharyya, Miao Chen

We had our September user group teleconference on Wednesday 9/24, 3-4pm ET (2-3pm CT). This time we invited Samuel Franklin from Brown University to talk about his research interests and needs related to HTRC. He presented his research plan using the HT corpus, and then the group followed up with an open discussion.

About Samuel and his project: He is a 5th-year PhD student in American Studies. Part of his project is a keyword study around "creativity" after WWII. He studies discourse and communities, using case studies of people with experience championing creativity. The project studies how different communities of discourse used the term "creativity". It is partially a discourse analysis, e.g. what the concept influenced in the past.

He has played with the Google n-gram viewer, JSTOR and ProQuest a bit. He has also played with the HTRC portal, hoping to have much more going on. He has looked at plots of the word "creativity", parsed them, and examined what disciplines account for them.

Sayan: Would you find it useful to treat specific Library of Congress classifications as proxies for communities of discourse?

Ted Underwood suggested that Samuel look at HTRC Bookworm; he thinks the curve parsing could be done with Bookworm.

Ted Underwood mentioned a project he is doing in parallel to Sam's, on discourse around money across time. He does topic modeling. The next step is to find what topics these words (e.g. "creativity") were assigned to, even though there may have been no topic on creativity.

Sayan mentioned that Sam can use the Dunning log-likelihood algorithm in the portal to, for example, compare the works of two (sets of) authors, assuming both (sets) write about creativity.
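
To illustrate the kind of comparison Sayan described, here is a minimal Python sketch of a Dunning-style log-likelihood (G2) score. It uses the two-term form commonly seen in corpus comparison; this is an assumption for illustration and not necessarily what the portal's algorithm computes. The function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def dunning_g2(count_a, total_a, count_b, total_b):
    """Two-term log-likelihood (G2) for one word observed in two corpora."""
    combined = count_a + count_b
    # Expected counts if the word were equally frequent in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def compare_corpora(tokens_a, tokens_b):
    """Signed G2 per word: positive means over-represented in corpus A.

    Assumes both token lists are non-empty.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    scores = {}
    for word in set(counts_a) | set(counts_b):
        a, b = counts_a[word], counts_b[word]
        g2 = dunning_g2(a, total_a, b, total_b)
        if a / total_a < b / total_b:
            g2 = -g2
        scores[word] = g2
    return scores


# Hypothetical toy data standing in for two sets of authors.
scores = compare_corpora(
    "creativity is creativity".split(),
    "money is money".split(),
)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.2f}")
```

Words with large positive scores are distinctive of the first set of texts, and large negative scores of the second; words used at the same rate in both score zero.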

Another thing Sam looks at is abbreviation, and what an abbreviation means. This can be done by using collocations. He is also interested in associations. It would be interesting to track influence by a given writer, by a tradition of creativity, or by citation analysis.
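
As a rough illustration of the collocation idea, the sketch below counts the words that appear within a fixed window around a target term. The function name, the window size and the sample sentence are all hypothetical, and a real study would likely add stop-word filtering and an association measure rather than raw counts.

```python
from collections import Counter


def window_collocates(tokens, target, window=2):
    """Count words occurring within `window` positions of each `target` hit."""
    collocates = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the target occurrence itself
                collocates[tokens[j]] += 1
    return collocates


# Hypothetical sample text.
sample = "the creativity of children and the creativity of engineers".split()
neighbours = window_collocates(sample, "creativity", window=1)
print(neighbours.most_common())
```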

"How much of what you are interested in can be found in the HTRC portal?"
Most of his specific searches yielded low numbers.

Miao introduced the HTRC Data Capsule as a solution to running algorithms against copyrighted content.

Sayan mentioned WordNet, a kind of network representation of words, which can be useful (e.g. to search for not just "creativity" but "creativity" and its synonyms, "creativity" and words semantically related to it, etc.).

David: It would be great to have some examples to show what we can do with these data. We need to get publicity about what's sitting there. It would also be good to have statistical information, such as word entropy.
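
One possible reading of "word entropy" is the Shannon entropy of the word-frequency distribution of a text. The sketch below computes that under this assumption; the function name is hypothetical, and other definitions (e.g. the entropy of a single word's distribution across volumes) would need a different calculation.

```python
import math
from collections import Counter


def word_entropy(tokens):
    """Shannon entropy, in bits, of the word-frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )


# A text that repeats one word has zero entropy; a text that spreads
# evenly over more distinct words has higher entropy.
print(word_entropy(["creativity", "creativity"]))
print(word_entropy(["creativity", "discourse", "community", "money"]))
```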