Notes of the user group meeting on March 19, 2014

Attendees: Peter Leonard, Michelle Paolillo

HTRC team: Harriett Green, Sayan Bhattacharyya, Miao Chen, Loretta Auvil

Meeting theme: How can HTRC be used, with the help of librarians, to serve digital humanities (DH) patrons?

More explanation: The term 'digital humanities' covers a wide spectrum of interests, and the description "digital humanist" is arguably applicable to a broad array of profiles of humanities researchers. How can a resource like HTRC make itself useful, as much as possible, to this very broad constituency of scholars? (Below are some possible questions, but no need to limit the discussion to these questions only: Can there be potential uses of HTRC for those digital humanists who do not usually work with text, but with images or new media? Can there be potential uses of HTRC for museologists and scholars in museum studies? What will facilitate increased uptake and use of HTRC among digital humanists in general? And not to forget the core constituency of users, namely, textual scholars: what possible changes to the HTRC's tools and services will help them the most?)

Sayan: the motivating question is where HT fits within the overall DH domain. At a Michigan university, a DH professor who works on museums and libraries is trying to use their collections in undergraduate classes. The professor wants to build upon library and museum collections and asked Sayan how HT could help with this. Is there any way for scholars to work on things beyond text?

Harriett: most media in the collections at her institution are text. Ted Underwood has been doing topic modeling and classification using HT as a large text corpus.

Another project I'm doing is working with a sociology faculty member who is using HTRC for data mining on African American women in a 20th-century corpus. HTRC has been a good conduit for scholars. We're still working on getting people to use the tools.

Peter: he worked on a Google Books grant in 2011, which had its own advantages and disadvantages. Term counts for in-copyright volumes are important.

At the Yale library, they're working on some techniques that are applicable to HTRC. They want to help faculty and students use the tools, e.g., how to use unique IDs and how to work with SEASR and Meandre.

When you work with scholarly communities, you will run into the metadata quality issues in HTRC books. It would be good to fold back into HTRC some of the data cleaning that everyone will otherwise have to do themselves.

Harriett mentioned the WCSA project, which is in the process of awarding sub-grants to do metadata work. Hopefully we can get some really good results from the research teams.

In the next 18-20 months, will there be a big investment in the legacy SEASR stuff, or in helping users use it?
Sayan: there is a feeling that these are different constituencies. Some people feel comfortable downloading and processing the data themselves. There are also novice users who need help; that's what we have been talking about.

Harriett: we are serving different kinds of users. We are thinking about different tools that people can easily use to explore the corpus.

Peter: at Yale, they're thinking about what infrastructure is needed for DH and what tools are needed for their own collections. One possibility is the Bookworm tool, which could also be useful to HTRC. Harriett mentioned that HTRC has submitted a proposal to NEH about this.

The challenge: very few of us are trained to answer questions about these millions of books.

Harriett: people can generate visualizations using tools, and then connect with people who know, for example, network analysis.

Peter: HTRC can be a central hub for the open corpus ("open-open"). If there are many requests for the open corpus, libraries at different locations could host copies of the out-of-copyright corpus to reduce the load on HTRC.

Harriett: data management will be an issue; e.g., each library would need to manage its own server, which would be a problem for many libraries.

Sayan: I had a conversation with Ted Underwood about what the killer app for HTRC would be. Access to in-copyright material would be it. Maybe in the future, in-copyright material will be centralized at HTRC, and out-of-copyright material will be decentralized.

Miao: documentation about HTRC can be part of this effort to bring libraries and HTRC together (for DH). For example, Sayan and Thomas (who has since left the UIUC library) have worked on documentation of the HTRC portal and some of its algorithms.

Harriett: documentation about HTRC would be critical for the HTRC UnCamp. It could also be a collaborative effort.

Michelle suggested framing a tutorial around use cases: what do you do if your use case is this? It should not be about what the algorithms are.

Peter: a lot of libraries have DH lib guides, e.g., the Duke library.
It would be interesting to survey how many times HTRC is listed in these lib guides, to see how librarians think about it.

HT has a name recognition problem. Almost every professor knows Google Books and its search, but far fewer know HT. So people need to explain how HT relates to Google Books, and then how to work with HT books.

Sayan: we will showcase HTRC tools at conferences such as ALA, reaching not only DH audiences but also librarians, to build an HTRC presence.

Peter suggested that, as a start, HTRC could do a survey of lib guides. In addition, there is a group of digital humanities librarians (the ACRL-DH organization); it would be interesting to bring the question up on that mailing list. The documentation could also be converted into a lib guide page.

HT files: the author's name is concatenated with the title information.
Michelle: one thing to do is explain HTRC to people through the production portal and the algorithms.

There seems to be a cultural divide. Humanists tend to have trouble seeing where the research question (RQ) arises in this HTRC environment.

Sayan: we should give talks in a humanities-friendly way, e.g., at ALA or ACLA, and try to build up a discourse around these algorithms.

Michelle: they are interested in computational analysis, but they don't really know what it means for them. We can play around with the dynamics, and they become excited when they see their work visualized. Hands-on work with low-barrier tools seems to be helpful. She wouldn't start with the portal, because there's a barrier there. Meandre would be a huge leap, so we need to provide them with some low-barrier experience first.

Sayan: do you think video tutorials will help?

Michelle: a video tutorial about LDA in the portal would not help much. Something like network relationships seems to capture people's attention.