Pre-meeting notes:

-word break (Ted Underwood)


-derived stats and informational data (Beth Plale)

Meeting notes:

Ted Underwood (a professor in the Department of English, UIUC) discussed his use case with the HTRC team during the meeting.


Loretta summarized the requested extensions:

-count of the number of lines on the page
-count of the number of lines that start with a capitalized token
-dictionary-based handling of hyphens or non-dictionary tokens at the end of a line and the start of the next line: combine the two tokens only if the combined form exists in a dictionary
-counts for punctuation tokens
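The extensions summarized above could be sketched roughly as follows. This is not Ted's script and not an HTRC implementation; it is only an illustration of one possible logic. The function names (`tokenize`, `merge_line_breaks`, `page_features`), the regular-expression tokenizer, the choice to ignore blank lines, and the reading of "combine only if they exist in a dictionary" as "the joined form is in the dictionary" are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed tokenizer: a word (optionally with internal hyphens and a trailing
# hyphen, as in "extrac-") or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*-?|[^\w\s]")
PUNCT_RE = re.compile(r"[^\w\s]")


def tokenize(line):
    """Split one line of page text into word and punctuation tokens."""
    return TOKEN_RE.findall(line)


def merge_line_breaks(rows, dictionary):
    """Join the last token of a line with the first token of the next line,
    but only when the joined form is found in the dictionary."""
    tokens = []
    for row in rows:
        row = list(row)
        if tokens and row:
            tail, head = tokens[-1], row[0]
            if tail[0].isalnum() and head[0].isalnum():
                joined = tail.rstrip("-") + head
                hyphenated = tail.endswith("-")
                both_unknown = (tail.lower() not in dictionary
                                and head.lower() not in dictionary)
                if (hyphenated or both_unknown) and joined.lower() in dictionary:
                    tokens[-1] = joined
                    row = row[1:]
        tokens.extend(row)
    return tokens


def page_features(lines, dictionary):
    """Compute the four requested per-page statistics (hypothetical helper)."""
    # Assumption: blank lines are not counted as lines on the page.
    rows = [tokenize(ln) for ln in lines if ln.strip()]
    rows = [row for row in rows if row]
    tokens = merge_line_breaks(rows, dictionary)
    return {
        "line_count": len(rows),
        "capitalized_line_count": sum(1 for row in rows if row[0][0].isupper()),
        "tokens": tokens,
        "punctuation_counts": Counter(t for t in tokens if PUNCT_RE.fullmatch(t)),
    }


# Made-up sample page and a tiny stand-in dictionary, for illustration only.
sample_lines = [
    "The feature extrac-",
    "tion code counts tokens, lines, and",
    "punctuation. It also handles well-",
    "known cases.",
]
sample_dictionary = {
    "the", "feature", "extraction", "code", "counts", "tokens", "lines",
    "and", "punctuation", "it", "also", "handles", "well", "known", "cases",
}
features = page_features(sample_lines, sample_dictionary)
print(features["line_count"], features["capitalized_line_count"])
print(features["tokens"])
print(features["punctuation_counts"])
```

In this sample, "extrac-" and "tion" are joined because "extraction" is in the stand-in dictionary, while "well-" and "known" are left as separate tokens because "wellknown" is not. A real word list and a more careful tokenizer would be needed for actual page text.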

Action item:

Sayan and Jiaan will get the Python script from Ted and work on a simpler version of its logic.