Not many faculty know what HathiTrust (HT) or the HathiTrust Research Center (HTRC) are.
There has been a dramatic increase in interest in digital humanities, digital publishing, and text-analysis projects. This comes mostly from junior faculty and graduate students, but of late there has also been a surge of interest among advanced undergraduates.
The video tutorials that HTRC is developing will be useful. One audience concern came up: many users find the HTRC tools fairly limiting, especially in comparison with commercial tools.
We don't know if the corpus tools fit the current interests of our linguistics faculty.
The corpus currently comprises 3 million volumes of non-copyrighted material and 8 million volumes of copyrighted material.
HT-Bookworm (under development by HTRC and groups from Baylor and Northeastern) and the Data Capsule (which will allow users to run computational methods against protected data) are among the tools being developed to make copyrighted material more usable in the future. For HT-Bookworm to work, the back end will need to change from MySQL (as it is currently) to Solr, which is likely to be more scalable. The plan is also to integrate HT-Bookworm with HTRC worksets, so that one could go in both directions: from the Data Capsule to the workset, and from the workset to the Data Capsule. That is, from what the user discovers using HT-Bookworm, the user might be able to generate a workset automatically. The goal is to have HT-Bookworm work with all the public-domain material in HTRC within a year; if HTRC has obtained the copyrighted data by then, integrating that will be pursued as well.
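As a rough illustration of the Bookworm-result-to-workset direction described above, the sketch below shows what that step could look like. The Solr field names, the shape of the response, and the workset format are all assumptions for illustration, not HTRC's actual API.

```python
# Hypothetical sketch: turning a Solr (Bookworm back-end) result into
# an HTRC-style workset, i.e. a list of volume IDs. The field names
# ("ocr_text", "publish_date", "id") and the response layout are
# invented for illustration only.

def build_solr_query(term, date_range=None):
    """Build a Solr 'q' parameter for a Bookworm-style term search."""
    q = f'ocr_text:"{term}"'
    if date_range:
        start, end = date_range
        q += f" AND publish_date:[{start} TO {end}]"
    return q

def results_to_workset(solr_response):
    """Extract volume IDs from a Solr-style JSON response into a workset."""
    docs = solr_response.get("response", {}).get("docs", [])
    return [d["id"] for d in docs if "id" in d]

# Example with a mocked response (no live Solr server involved):
mock_response = {
    "response": {
        "docs": [
            {"id": "vol.0001", "title": "Example Volume A"},
            {"id": "vol.0002", "title": "Example Volume B"},
        ]
    }
}
print(build_solr_query("liberty", (1800, 1850)))
# → ocr_text:"liberty" AND publish_date:[1800 TO 1850]
print(results_to_workset(mock_response))
# → ['vol.0001', 'vol.0002']
```

The point of the sketch is only the direction of the data flow: a discovery query yields documents, and the document identifiers become a workset that downstream tools can consume.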
There will be more interest once more tutorial information is available. Also, the things that faculty want to do don't always fit the existing tutorials. Doing something more visual with topic models, such as Termite from Stanford, may be worthwhile.
If someone wants to submit an algorithm to the portal, how is that decided? Currently, it is handled case by case. Setting up a workflow/process for people to submit their algorithms would be useful.
What happens if someone wants to integrate textual content of their own with HTRC materials? Unfortunately, taking data from other sources is not currently on the radar. Augmenting an analysis with additional data is on the radar, however: for example, a user validating their results against a dictionary. But augmenting the data itself with an additional corpus has not been on the agenda.
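The kind of augmentation described as being on the radar, validating analysis output against external reference data such as a dictionary, might look like this minimal sketch. The term list and the dictionary here are invented stand-ins, not real HTRC data.

```python
# Minimal sketch: validating extracted terms against a reference
# dictionary, the kind of analysis augmentation described as "on the
# radar". The dictionary and terms below are invented for illustration.

dictionary = {"whale", "ship", "ocean", "harpoon"}  # stand-in word list

extracted_terms = ["whale", "shp", "ocean", "qx"]   # e.g. noisy OCR output

def validate_terms(terms, dictionary):
    """Split terms into those found in the dictionary and those not."""
    valid = [t for t in terms if t in dictionary]
    invalid = [t for t in terms if t not in dictionary]
    return valid, invalid

valid, invalid = validate_terms(extracted_terms, dictionary)
print(valid)    # → ['whale', 'ocean']
print(invalid)  # → ['shp', 'qx']
```

Note the contrast with the unsupported case: the dictionary only filters the user's results; it never adds new volumes to the corpus being analyzed.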