Attendees: Charlotte Cubbage, Brendan Quinn, Claire Stewart, Geoff Swindells, Phil Burns, Bill Parod, Chris Comerford (all from Northwestern University)

...

Not many faculty know what HT or HTRC are. They have a DH program and DH activities. We could use a shared syllabus or some kind of workshop to introduce HTRC and the algorithms in the HTRC portal. It is not clear how many faculty are interested in large-scale data mining.

There has been a dramatic increase in interest in digital humanities, digital publishing, and text-analysis projects. This comes mostly from junior faculty and graduate students, but of late there has also been a surge of interest among advanced undergraduates.

...

We don't know if the corpus tools fit the current interests of our linguistics faculty. It is an open question who the general audience is.

HTRC updates

Currently the HT has 3 million volumes of non-copyrighted materials and 8 million volumes of copyrighted materials.

...

There will be more interest when there is more tutorial information. The more things they can point people to, the better. 

Also, the things that faculty want to do don't always fit into existing tutorial information. Doing something more visual with topic models, such as Termite from Stanford, may be good.

Currently, some people at the UIUC Scholarly Commons are developing videos and storyboards to help people understand HTRC functionality.

Algorithms and Text, inside or outside HTRC

...

What happens if someone wants to integrate textual content of their own with HTRC materials? Unfortunately, taking data from other sources is not currently on the radar. Augmenting an analysis with additional data is on the radar, for example validating results against a dictionary. Augmenting the data itself with an additional corpus has not been on the agenda.

Can we use the HTRC suite of algorithms on non-HTRC data? Yes, you can obtain the tools from other sources (e.g. the Mallet tools) and run them on your own data.

It was proposed that scholars could run HTRC algorithms on cleaned text from local faculty, and that the local text could then be made available in HTRC so that everybody can use it.

Possible use types

When faculty come and talk with librarians about analysis of digitized text, there are three main types of use cases:

...

No matter how big the HT corpus is, or how powerful the existing algorithms are, if the existing corpus and algorithms do not meet the needs of the specific problem that the user is trying to solve, then the user will not use the resources. Often, people come to librarians with their own texts (such as EEBO, ECHO, the Old Bailey text corpus, etc.), describe the problem, and request that an algorithm be written to do the analysis they have in mind. Nowadays, even undergraduates come with quite complex tasks.

Every project is different from the others and so needs its own unique support. We need a path or mechanism to address a researcher's request for help with algorithms or data.

Sometimes the text data set that the user is interested in consists of government documents that are released periodically, for example all the articles put out within a certain time period.

...

1) Try to create a bibliography of conference papers to keep track of what researchers are doing with HTRC tools and resources. This is similar to what ICPSR does; ICPSR has been mentioned and used in sociology classes. If HTRC can become such a source, there could be more use. People need to be more aware of HTRC, and they need to be encouraged to use it.

2) People need to be encouraged to use HTRC as the source for the documents they need. Try to find a way to communicate with undergraduates directly, and let them know about HTRC and what it can do. Often, undergraduates will run with the resource and convince faculty to let them use it for class projects, etc. Faculty members tend to be more hesitant about using new resources than undergraduates are.

3) The crux of the matter is to come up with tutorials that people can play with whenever they have a few free minutes, and that still make for a useful learning experience. People's time is fragmented in odd ways, especially students'; sometimes they have only a few minutes to spend on a tutorial.

4) Try to create mechanisms that let people use complex tools in an easy way. Tutorials are a great way to achieve this.