...

Discussion: Ultimately, HTRC is considering providing some type of API, so that instead of downloading files, users would access the data through the API.


Peter asked: “How much interest might there be in engineering a more complex API? What other things can we do with the data, and how should we prioritize them? When we scale up to the contents of the entire HTRC corpus, will the likely eventual size of the data (multiple gigabytes per decade) exceed what most people can download to, or work with on, their desktop machines? Of course, people may want to work at large scale. Would it be reasonable or unreasonable to push that effort onto the user?”

Discussion: You can actually rsync individual volumes as well as the grouped tar files. However, that requires filename-friendly volume IDs, and HT volume IDs are not filename friendly: HT/HTRC volume IDs contain reserved characters that should not be used in file-system names, so a volume ID is not equivalent to a filename. We could provide a mapping from HT volume IDs to filename-friendly IDs.
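A minimal sketch of what such a mapping could look like, assuming a reversible character-substitution scheme loosely modeled on the Pairtree "string cleaning" rules. The escape set, the `:` to `+` and `/` to `=` substitutions, and the function names `id_to_filename` / `filename_to_id` are all hypothetical illustrations, not an HTRC convention.

```python
"""Hypothetical reversible mapping between HT-style volume IDs and
filename-friendly names (an assumption, not HTRC's published scheme)."""

# Characters hex-escaped as ^XX. '^' is escaped so the escape marker stays
# unambiguous; '+' and '=' are escaped because they are used as substitutes.
_ESCAPE = set('"*+<=>?\\^|')
# Substitutions for the two reserved characters that appear in ARK-style IDs.
_SUBST = {":": "+", "/": "="}
_UNSUBST = {v: k for k, v in _SUBST.items()}


def id_to_filename(volume_id: str) -> str:
    """Encode a volume ID as a string that is safe to use as a file name."""
    out = []
    for ch in volume_id:
        if ch in _ESCAPE or not (0x21 <= ord(ch) <= 0x7E):
            # Escape each UTF-8 byte of the character, e.g. '?' -> '^3f'
            out.extend(f"^{b:02x}" for b in ch.encode("utf-8"))
        else:
            out.append(_SUBST.get(ch, ch))
    return "".join(out)


def filename_to_id(name: str) -> str:
    """Invert id_to_filename()."""
    buf = bytearray()
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "^":
            # Two hex digits follow the escape marker
            buf.append(int(name[i + 1:i + 3], 16))
            i += 3
        else:
            buf.extend(_UNSUBST.get(ch, ch).encode("utf-8"))
            i += 1
    return buf.decode("utf-8")
```

With these rules, an ARK-style ID such as `uc2.ark:/13960/t0dv1g69b` (illustrative) maps to `uc2.ark+=13960=t0dv1g69b`, and `filename_to_id()` recovers the original. Escaping `^`, `+`, and `=` before substituting is what keeps the mapping reversible.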

...


“Can we provide prepared filename lists for predictable kinds of worksets? Can we organize or split up datasets by genre? One argument for doing so is that fiction, for example, may be in higher demand for download than other genres.”

Discussion: We debated how to group the datasets and finally decided to group them chronologically (by year/decade), mainly because this is less controversial. For example, a question like “What counts as fiction?” could be highly controversial; grouping by chronology lets us bypass that controversy. However, chronology is not wholly unproblematic either: a book listed as published in 1916 could well be a reprint of something first published in 1872.
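A sketch of how prepared per-decade lists might be produced, assuming each volume comes with a single publication year taken from its catalog record. The function names and the `1910s.txt` file naming are hypothetical. Because the year is used as given, the reprint caveat applies directly: a 1916 reprint of an 1872 work is filed under the 1910s.

```python
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def group_by_decade(records: Iterable[tuple[str, int]]) -> dict[int, list[str]]:
    """Group (volume_id, year) pairs into {decade_start: [volume_ids]}.

    The year is taken as recorded, so reprints land in the decade of the
    reprint, not of the original publication.
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for volume_id, year in records:
        groups[(year // 10) * 10].append(volume_id)
    # Return decades in chronological order
    return dict(sorted(groups.items()))


def write_decade_lists(groups: dict[int, list[str]], out_dir: str) -> None:
    """Write one plain-text list of volume IDs per decade, e.g. 1910s.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for decade, ids in groups.items():
        (out / f"{decade}s.txt").write_text("\n".join(ids) + "\n", encoding="utf-8")
```

Grouping on a single catalog field keeps the lists predictable and avoids judgment calls, which is the trade-off the discussion describes relative to genre-based grouping.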

“This is going to be a very new kind of thing for many users. How should we explain to users how to use this tool? What kind of documentation is planned?” Rachel asks for examples and documentation.

...