Michelle discussed a use case from a scholar at Cornell, Prof. Ed Baptist (History). He wanted to access in-copyright content in HathiTrust, specifically slave narratives from the Federal Writers' Project. He would like to do some entity extraction and topic modeling on the data. However, the content is in copyright, and HathiTrust does not provide the scanned images.


She also shared what is known about future HTRC analysis of in-copyright content. Mainly, features such as word frequencies will be extracted. There has also been discussion of providing a co-occurrence matrix for topic modeling purposes.
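To make the two feature types concrete, here is a minimal Python sketch of word frequencies and a word co-occurrence count. It is only an illustration, not HTRC's actual extraction method: the function names `word_frequencies` and `cooccurrence_counts`, the choice of the page as the co-occurrence unit, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def word_frequencies(tokens):
    """Count how often each lowercased token occurs."""
    return Counter(t.lower() for t in tokens)


def cooccurrence_counts(pages):
    """Count, for each unordered word pair, the number of pages on which
    both words appear. Pairs are stored as alphabetically sorted tuples."""
    pairs = Counter()
    for tokens in pages:
        vocab = sorted({t.lower() for t in tokens})
        pairs.update(combinations(vocab, 2))
    return pairs


# Toy input (invented for illustration): each inner list is one "page".
pages = [
    "the river was cold".split(),
    "the river was wide".split(),
]

# Corpus-level frequencies are the sum of the per-page counts.
freqs = sum((word_frequencies(p) for p in pages), Counter())
cooc = cooccurrence_counts(pages)
```

Features of this kind can be shared without exposing the running text, which is why they suit in-copyright material: the counts can feed topic modeling, but the original pages cannot be read from them.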

Michelle: Prof. Baptist taught a digital history class this past semester, which included an emphasis on introducing algorithmic analysis to students. The class was introduced to the HTRC portal, and Prof. Baptist used topic modeling samples from it as a teaching method in class. He showed students word clouds from different perspectives, comparing narratives of enslavement as told by slaves and by abolitionists; it was quite interesting to compare the word clouds.

There has been interest in the scholarly community in obtaining XML files from some source and using them for other purposes.