HTRC Publications and Presentations

This is a list of publications and presentations relevant to the work of HTRC and produced by HTRC staff. See also Grant-funded projects for sponsor-funded work by HTRC, and HTRC Research Impact for work produced by others in the scholarly community that makes substantial use of HTRC data, tools, and expertise.

2025 | 2024 | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010

2025

Publications

Galina Russell, Isabel, and Glen Layne-Worthey (eds.). The Routledge Companion to Libraries, Archives, and the Digital Humanities, Routledge, 2025. https://www.routledge.com/The-Routledge-Companion-to-Libraries-Archives-and-the-Digital-Humanities/Russell-Layne-Worthey/p/book/9781032356259

Jaillant, Lise, Claire Warwick, Paul Gooding, Katherine Aske, Glen Layne-Worthey, and J. Stephen Downie (eds.) (2024) Navigating AI for Cultural Heritage Organisations. University College London Press. (In Press)

Layne-Worthey, Glen. "Copyright Is the Lock; Non-Expressive Fair Use Is the Key: Research with In-Copyright Texts." In: The Routledge Companion to Libraries, Archives, and the Digital Humanities, Isabel Galina and Glen Layne-Worthey (eds.). Routledge, 2025.

Layne-Worthey, Glen, J. Stephen Downie, Janet Swatscheno, Nikolaus Parulian, Jill Naiman, Ben Schmidt, Peter Organisciak, Ted Underwood, and Ryan Dubnicek. “Making More Sense with Machines: Artificial Intelligence at the HathiTrust Research Center.” In: Navigating AI for Cultural Heritage Organisations, Lise Jaillant, et al. (eds.). University College London Press. (In Press)

Presentations

2024

Publications

Hu, Yuerong, Zoe LeBlanc, Jana Diesner, et al. (2024). Complexities of leveraging user-generated book reviews for scholarly research: Transiency, power dynamics, and cultural dependency. International Journal on Digital Libraries, 25, 317–340. https://doi.org/10.1007/s00799-023-00376-z

Layne-Worthey, Glen, and J. Stephen Downie (eds.) (2024). Journal of Documentation Special Issue: Artificial Intelligence for Cultural Heritage Materials. https://www.emerald.com/insight/publication/issn/0022-0418/vol/80/iss/5

Layne-Worthey, Glen, and J. Stephen Downie (2024). "Special Issue on Artificial Intelligence for Cultural Heritage Materials: Guest Editors' Introduction." Journal of Documentation, v. 80, no. 5, pp. 1025-1030. https://doi.org/10.1108/JD-09-2024-275

Swatscheno, Janet, and Felix Oke (2024). Context Matters: An Introduction to HathiTrust Research Center Tools for Text Analysis. #DLFteach Publications. https://dlfteach.pubpub.org/pub/0a9w5sc5

Presentations

Dubnicek, Ryan and Daniel J. Evans. “Mining the HathiTrust Digital Library: finding and extracting insights from millions of books.” Indiana University Indianapolis Luddy Colloquia Digital Scholarship Series, Indiana University Library, 22 March 2024.

Lamba, Manika, John A. Walsh, Ryan Dubnicek, Jennifer Christie, J. Stephen Downie, Janet Swatscheno, Deren Kudeki, Glen Layne-Worthey. “TORCHLITE: New, Open Analytical Tools and Infrastructure for a Mega-Scale Digital Library.” Poster presented at 2024 ASIS&T Annual Meeting, Calgary, Canada, 25-29 October, 2024.

Parulian, Nikolaus Nova, Ryan Dubnicek, Sarah Griebel, Glen Layne-Worthey, J. Stephen Downie (2024). “From “Can’t…” to “Cancún”: Fine-tuning spaCy’s Spanish-Language Transformer Model for Better and More User-Friendly Named Entity Recognition.” Poster presented at ADHO Digital Humanities Conference 2024, August 6 – 10, 2024, Washington, D.C., USA.

Shang, Wenyi, Yuqi Chen, Ryan Dubnicek, Ryan Cordell, J. Stephen Downie (2024). “Interplays Between Materiality and Content in Book History: Evidence from 16th–19th Century Chinese and English Books.” Paper presented at 2024 ADHO Digital Humanities Conference, August 6 – 10, 2024, Washington, D.C., USA.

Walsh, John. “‘On the dusty shelves of libraries’: Exploring the 19th-century with the HathiTrust Digital Library” (invited talk). The Research Society for Victorian Periodicals. 16 February 2024.

Workshops

Swatscheno, Janet, Ryan Dubnicek & Jenny Christie (2024). “History of Black Writing Presents: Introduction to HathiTrust and HathiTrust Research Center.” Workshop presented at the College Language Association 2024 Convention, Memphis, TN, USA, 13 April 2024.

2023

Publications

Presentations

Parulian, Nikolaus Nova, Ryan Dubnicek, Daniel J. Evans, Yuerong Hu, Glen Layne-Worthey, J. Stephen Downie, Raina Heaton, Kun Lu, Raymond I. Orr, Isabella Magni, John A. Walsh (2023). “Tuning out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature” In Proceedings of 2023 ASIS&T Annual Meeting, London, UK, 27-31 October, 2023. https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/pra2.839

Walsh, John, Glen Layne-Worthey, Jacob Jett, Boris Capitanu, Peter Organisciak, J. Stephen Downie. “‘The library is open!’: Open data and an open API for the HathiTrust Digital Library.” (2023) Proceedings of CHR 2023, the Computational Humanities Research Conference. https://ceur-ws.org/Vol-3558/paper7875.pdf

Downie, J. Stephen. "Beyond OCR: Non-Textual Opportunities and Challenges at the HathiTrust Research Center" (invited talk). Workshop on Scaling-up Document Image Understanding. The 17th International Conference on Document Analysis and Recognition (ICDAR). 21-26 August 2023, San José, California.

Downie, J. Stephen, Glen Layne-Worthey, Peter Simon, Amy Kirchhoff, Matthew Lincoln. "What is Non-Consumptive Data and What Can You Do With It?" NISO Plus 2023. 14 February 2023, virtual.

Dubnicek, Ryan. “Updates from HathiTrust Research Center: (Some of) What We’re Working On.” Wednesday Noon Digital Scholarship Series, Indiana University Library, 18 January 2023.

Dubnicek, Ryan & Ted Underwood (2023). “Piloting A Machine Learning Approach to Identify English-Language Fiction in the HathiTrust Digital Library” Paper presented at 2023 ADHO Digital Humanities Conference, Graz, Austria, 10-14 July, 2023.

Melkozernova, Arina, Juliann Vitullo, Ryan Dubnicek, Daniel J. Evans, Boris Capitanu (2023). “Telling a Story with Data: shift in the Mediterranean Diet’s discourse from 1950-2020.” Poster presented at CHR 2023: Computational Humanities Research Conference, December 6 – 8, 2023, Paris, France.

Parulian, Nikolaus, Ryan Dubnicek, Daniel Evans, Yuerong Hu, Glen Layne-Worthey, J. Stephen Downie, Raina Heaton, Kun Lu,  Raymond Orr, Isabella Magni, and John Walsh. "Tuning out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature." 86th Annual Meeting of the Association for Information Science and Technology. 27-31 October 2023, London, UK.

Workshops

Swatscheno, Janet, Ryan Dubnicek & Jenny Christie (2023). “HathiTrust Research Center Extracted Features API and Visualization Workshop.” Workshop presented at the Code4Lib 2023 Conference, Princeton, NJ, USA, 14 March 2023.

Research Datasets

Ryan Dubnicek, Boris Capitanu, Glen Layne-Worthey, Jennifer Christie, John A. Walsh, J. Stephen Downie (2023). The HathiTrust Research Center BookNLP Dataset for English-Language Fiction. HathiTrust Research Center. https://doi.org/10.13012/d4gy-4g41

2022

Publications

Bainbridge, D., Hilbing, G., Jiang, M., Hu, Y., Layne-Worthey, G., & Downie, J. S. (2022). “Study on the Accuracy of OCR and NLP-based Detection of Japanese Text in the HathiTrust Extracted Features V2.0 Dataset.” DH2022 Tokyo. DOI: 10.1007/978-3-030-96957-8_35

Jiang, M., D’Souza, J., Auer, S., et al. (2022). “Evaluating BERT-based scientific relation classifiers for scholarly knowledge graph construction on digital library collections.” International Journal on Digital Libraries, 23(2), 197–215. https://doi.org/10.1007/s00799-021-00313-y

Parulian, N. N., Worthey, G., & Downie, J. S. (2022). “An Ensemble Framework for Dynamic Character Relationship Sentiment in Fiction.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 13192 LNCS, (pp. 414-424).

Parulian, Nikolaus Nova, Ryan Dubnicek, Glen Layne-Worthey, Daniel J. Evans, John A. Walsh, J. Stephen Downie (2022). “Uncovering Black Fantastic: Piloting A Word Feature Analysis and Machine Learning Approach for Genre Classification” In Proceedings of 85th Annual Meeting of the Association for Information Science & Technology, Pittsburgh, Pennsylvania, USA, 29 October - 1 November, 2022. https://doi.org/10.1002/pra2.620

Shang, W., Jett, J., Underwood, T., & Downie, J. S. (2022). “Descriptive cataloging issues for non-Western corpora: A case study of late imperial Chinese books.” Cataloging & Classification Quarterly, 61(1), 1–19. https://doi.org/10.1080/01639374.2022.2148800

Presentations

Dubnicek, R. (2022, January 11). “Where to Find Millions of Books and How to ‘Read’ Them: HathiTrust and HTRC.” University of Washington Digital Humanities Colloquium.

Dubnicek, R., Magni, I., Walsh, J.A., Downie, J.S., Graham, M., Layne-Worthey, G. (2022, June 2-3). Scholar-Curated Worksets for Analysis, Reuse & Dissemination (SCWAReD) from the HathiTrust Research Center [Poster]. 2022 Digital Humanities Benelux, University of Luxembourg, Belval, Luxembourg.

Dubnicek, R., Harrison, J., Magni, I., Walsh, J. A., Graham, M., Downie, J. S., & Layne-Worthey, G. (2022). “SCWAReD: Scholar-Curated Worksets from the HathiTrust Research Center.” Digital Humanities Congress, University of Sheffield, Sheffield, United Kingdom, 9 September 2022.

Lu, K., Heaton, R., Orr, R., Vetter, A., Dubnicek, R., Magni, I. (2022, July 25-29). “Mining the Native American Authored Works in HathiTrust for Insights.” Digital Humanities 2022 Conference, Virtual & Tokyo, Japan.

Magni, I., Worthey, G.C., Graham, M., Walsh, J.A., Downie, J.S., Dubnicek, R. (2022, July 25-29). Centering the Marginalized: Scholar-Curated Worksets from the HathiTrust Digital Library [Poster]. Digital Humanities 2022 Conference, Virtual & Tokyo, Japan.

Parulian, N. N., Dubnicek, R., Worthey, G., Evans, D. J., Walsh, J. A., Downie, J. S. (2022, October 29-November 1). Uncovering Black Fantastic: Piloting A Word Feature Analysis and Machine Learning Approach for Genre Classification [Paper]. 85th Annual Meeting of the Association for Information Science & Technology, Pittsburgh, Pennsylvania.

Parulian, N. N., Dubnicek, R., Layne-Worthey, G., Williams, S., West-White, C., Magni, I., Downie, J. S. (2022, July 25-29). Uncovering the Black Fantastic: Piloting Text Similarity Methods for Finding “Lost” Genre Fiction in HathiTrust [Poster]. Digital Humanities 2022, Tokyo, Japan/virtual.

Walsh, J. A. (2022, May 25). Case study: HathiTrust Research Center. Invited presentation at the Text and Data Mining Conference, National Information Standards Organization, Baltimore, Maryland.

Walsh, J. A., Wingate, A., Nurkkala, C., & Christie, J. (2022, September 9). Nineteenth-Century Poets and Their Libraries. Digital Humanities Congress, University of Sheffield, Sheffield, United Kingdom.

Walsh, J. A., Wingate, A., Nurkkala, C., Evans, D., Mertka, A., & Christie, J. (2022, June 2-3). “Bibliographic and textual studies and the personal library.” Paper presented at Digital Humanities Benelux, University of Luxembourg, Belval Campus, Esch-sur-Alzette, Luxembourg.

Workshops

Dubnicek, R., Christie, J., Kudeki, D., Layne-Worthey, G., Walsh, J. A., Downie, J. S. (2022). Workshop: HathiTrust Research Center’s Extracted Features 2.0 Dataset. Workshop presented at the Digital Humanities conference, Tokyo, Japan, 25-29 July 2022.

2021

Publications

Jiang, M., Hu, Y., Worthey, G., Dubnicek, R., Underwood, T., & Downie, J.S. (2021). Evaluating BERT's Encoding of Intrinsic Semantic Features of OCR'd Digital Library Collections, 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), (pp. 308-309) DOI: 10.1109/JCDL52503.2021.00045

Jiang, M., Hu, Y., Worthey, G., Dubnicek, R., Underwood, T., & Downie, J. S. (2021). Impact of OCR Quality on BERT Embeddings in the Domain Classification of Book Excerpts. CEUR Workshop Proceedings, 2989, 266-279. https://ceur-ws.org/Vol-2989/long_paper43.pdf

Jiang, Ming, Yuerong Hu, Glen Layne-Worthey, Ryan Dubnicek, Ted Underwood, J. Stephen Downie. “Impact of OCR Quality on BERT Embeddings in the Domain Classification of Book Excerpts.” In Proceedings of CHR 2021: Computational Humanities Research Conference, CEUR Workshop Proceedings vol. 2989, pp. 266-279. November 17–19, 2021, Amsterdam, The Netherlands. Available: http://ceur-ws.org/Vol-2989/long_paper43.pdf

Organisciak, P., & Downie, J. S. (2021). Research access to in-copyright texts in the humanities. In Information and Knowledge Organisation in Digital Humanities (1st ed., pp. 21). Routledge. https://doi.org/10.4324/9781003131816

Organisciak, P., Schmidt, B. M., & Downie, J. S. (2021). Giving shape to large digital libraries through exploratory data analysis. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24547

Parulian, N. N., & Worthey, G. (2021). Identifying Creative Content at the Page Level in the HathiTrust Digital Library Using Machine Learning Methods on Text and Image Features. In K. Toeppe, H. Yan, & S. K. W. Chu (Eds.), Diversity, Divergence, Dialogue (pp. 478-489). Springer International Publishing. DOI: 10.1007/978-3-030-71292-1_37

Samberg, Rachael, Scott Althaus, David Bamman, Sara Benson, Brandon Butler, Beth Cate, Kyle K. Courtney, Eleanor Dickson Koehl, Glen Worthey, et al. (2021) Building Legal Literacies for Text Data Mining. eScholarship, University of California, 2021.  https://berkeley.pressbooks.pub/buildinglltdm/

Presentations

Dubnicek, R. (2021, December 15). Introduction to HathiTrust, HTRC, and the HTRC Extracted Features Dataset. Guest lecture for Digital Humanities: Tools & Methods MA course, University of Groningen.

Dubnicek, R. (2021, October 27). Where to Find Millions of Books and How to “Read” Them: HathiTrust and HTRC. Institute of Advanced Study, Princeton University.

Research Datasets

Jiang, M., Hu, Y., Worthey, G., Dubnicek, R., & Downie, J. S. (2021). The Gutenberg-HathiTrust Parallel Corpus: A Real-World Dataset for Noise Investigation in Uncorrected OCR Texts. iConference 2021 Proceedings. http://hdl.handle.net/2142/109695d

Tutorials

Dubnicek, R. & Kudeki, D. (2021, September 27-30). Introduction to and Hands-On Use Cases with HathiTrust Research Center’s Extracted Features 2.0 Dataset. Tutorial at ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2021, Virtual.

2020

Publications

Chang, K., Hu, Y., Shang, W., Sharma, A., Singhal, S., Underwood, T., Witte, J., & Wu, P. (2020, July 22-24). Book Reviews and the Consolidation of Genre. DH2020 Proceedings, Ottawa (virtually). DOI: 10.17613/02q2-1v27

Hu, Y., Jiang, M., Underwood, T., & Downie, J. S. (2020). Improving Digital Libraries’ Provision of Digital Humanities Datasets: A Case Study of HTRC Literature Dataset. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 405–408. DOI: 10.1145/3383583.3398621

Parulian, N.N., Dubnicek, R., Hall, K.E., Hu, Y., & Downie, J.S. (2020). Evaluating a Machine Learning Approach to Identifying Expressive Content at Page Level in HathiTrust. DH2020 Proceedings, Carleton University and the University of Ottawa, Ottawa, Canada. DOI: 10.17613/3nfw-tx25

Sharma, A., Hu, Y., Wu, P., Shang, W., Singhal, S., & Underwood, T. (2020). The rise and fall of genre differentiation in English-language fiction. CEUR Workshop Proceedings, 2723, 97–114. http://ceur-ws.org/Vol-2723/long27.pdf

Presentations

Bainbridge, D., Downie, J. S., & Whaanga, H. (2020). An open data approach to revealing indigenous texts in large-scale digital repositories: A case-study of locating pages of Māori text in the HathiTrust. ADHO 2020.

Dubnicek, R. Where to Find Millions of Books and How to “Read” Them: HathiTrust and HTRC. New Jersey Digital Humanities Consortium, 10 September 2020.

Jett, J., Capitanu, B., Kudeki, D., Dubnicek, R., Cole, T.W., & Downie, J.S. (2020, July). Extending the Utility of the HTRC Extracted Features Dataset Through Linked Data [Poster]. Digital Humanities Conference 2020, Ottawa, Canada.

Jett, J., Kudeki, D., Worthey, G., Cole, T. W., & Downie, J. S. (2020). Applying BIBFRAME in large-scale digital libraries: The HathiTrust Research Center's experience. Proceedings of the Association for Information Science and Technology, 57, e410. https://doi.org/10.1002/pra2.410

Parulian, N. N., Dubnicek, R., Eden, K., Hu, Y., & Downie, S. (2020, July 22-24). Evaluating a machine learning approach to identify expressive content at page level in HathiTrust [Conference proceeding]. Digital Humanities 2020, Carleton University and the University of Ottawa, Ottawa, Canada.

Wong, J. & Dubnicek, R. (2020, March). Piloting a Workflow for Extracting Author Citations in Samuel Johnson’s Dictionary of the English Language [Poster]. iConference 2020, Borås, Sweden.

Datasets

Jett, J., Capitanu, B., Kudeki, D., Cole, T., Hu, Y., Organisciak, P., Underwood, T., Koehl, E., Dubnicek, R., Downie, J.S. (2020). The HathiTrust Research Center Extracted Features Dataset (2.0). HathiTrust Research Center. DOI: 10.13012/R2TE-C227

2019

Publications

Bainbridge, D., Nichols, D. M., Hinze, A., & Downie, J. S. (2019). Using the HTRC Data Capsule Model to Promote Reuse and Evolution of Experimental Analysis of Digital Library Data: A Case Study of Topic Modeling. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), (pp. 463-464).

Weigl, D., Kudeki, D., Cole, T., Downie, J., Jett, J., & Page, K. (2019). Combine or connect: Practical experiences querying library linked data. Proceedings of the 82nd Annual ASIS&T Meeting, 56(1), 296-305.

Plale, B., Dickson, E., Kouper, I., Liyanage, S. H., Ma, Y., McDonald, R. H., Walsh, J. A., & Withana, S. (2019). Safe open science for restricted data. Data and Information Management, 3(1), 50-60.

Presentations

Dickson Koehl, E., Green, H., Henley, A., and Heidenwolf, T. (2019, April). Empowering Librarians to Support Digital Scholarship Research: Professional Development Training on Text Analysis with the HathiTrust. Association of College and Research Library Conference, Cleveland, Ohio.

Downie, J. S., Bainbridge, D., Dubnicek, R. (2019, October 20). Data Without Borders: Exploring International Collaborations with HathiTrust Research Center. International Incubator Session at ASIS&T 82nd Annual Meeting, Melbourne, Australia.

Furlough, M., & Walsh, J. A. (2019). Shaping the market: developing scalable, researcher-oriented text and data mining service. Paper presented at the DCDC (Discovering Collections, Discovering Communities) Conference, Library of Birmingham, Birmingham, UK, November 14, 2019.

Koehl, E. D., Green, H., Henley, A., & Heidenwolf, T. (2019, April). Empowering Librarians to Support Digital Scholarship Research: Professional Development Training on Text Analysis with the HathiTrust [Working Paper]. Association of College and Research Library Conference. Cleveland, OH.

Tutorials

Koehl, E. & Dubnicek, R. (2019, June 6). Text mining with HathiTrust [Tutorial]. ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2019, Urbana, Illinois.

2018

Publications

Bainbridge, D., Downie, J. S., & Capitanu, B. (2018). Providing Pin-point Page-level Precision to 1 Trillion Tokens of Text for Workset Creation. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 407-408).

Dickson, E., Green, H., Nay, L., Courtney, A., McDonald, R. (2018). HathiTrust Research Center User Requirements Study White Paper.

Downie, J. S., Lorang, E., Soh, L.-K., Bainbridge, D., McIntyre, S., & Page, K. (2018). At the Nexus of Data and Collections: New Affordances in the Age of Mass-Scale Digital Libraries. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 313–314).

Dubnicek, R., Underwood, T., & Downie, J.S. (2018, November 10-14). Creating A Disability Corpus for Literary Analysis: Pilot Classification Experiments. Proceedings of the iConference 2018, Sheffield, United Kingdom.

Fenlon, K., Jett, J., Dubnicek, R., Cole, T.W., & Kudeki, D. (2018). Exploring linked data benefits for digital library users. Proceedings of the 81st ASIS&T Annual Meeting, Vancouver, Canada.

Hinze, A., Bainbridge, D., Cunningham, S. J., Taube-Schock, C., Matamua, R., Downie, J. S., & Rasmussen, E. (2019). Capisco: Low-cost concept-based access to digital libraries. International Journal on Digital Libraries, 20(4), 307-334.

Hinze, A., Bainbridge, D., Wilkins, R., Taube-Schock, C., & Downie, J. S. (2018). Seeding strategies for semantic disambiguation. In Proceedings of the 18th ACM/IEEE-CS on Joint Conference on Digital Libraries (JCDL '18). ACM, New York, NY (pp. 343-344).

Page, K. R., Jett, J., Cole, T. W., Kudeki, D., Bainbridge, D., Organisciak, P., & Downie, J. S. (2018). Worksets Expand the Scholarly Utility of Digital Libraries. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 371-372).

Presentations

Bainbridge, D., Downie, J.S., Capitanu, B. (2018, June). Providing Pin-point Page-level Precision to 1 Trillion Tokens of Text for Workset Creation. Joint Conference on Digital Libraries, Fort Worth, TX.

Dickson Koehl, E., et al. (2018, October). Empowering Librarians to Support Digital Scholarship Research: The "Digging Deeper, Reaching Further" project. Digital Library Federation Forum 2018, Las Vegas, NV.

Downie, J.S. (2018, February 13). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited talk to Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Japan.

Downie, J.S. (2018, January 8). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited talk to Department of Computer Science, University of Waikato, Hamilton, New Zealand.

Downie, J.S. (2018, March 16). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited lecture to University of Denver Library, Denver CO.

Downie, J.S. (2018, March 23). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited lecture to Research Center for Machine Learning, City University of London, London, UK.

Downie, J.S., Lorang, E., Soh, L., Bainbridge, D., McIntyre, S., Page, K. (2018, June). At the Nexus of Data and Collections: New Affordances in the Age of Mass-Scale Digital Libraries. Joint Conference on Digital Libraries, Fort Worth, TX.

Furlough, M., Green, H., Butler, B. (2018, October). HathiTrust and Non-consumptive Research Services: Prospects. Digital Library Federation Forum 2018, Las Vegas, NV.

Page, K.R., Jett, J., Cole, T.W., Kudeki, D., Bainbridge, D., Organisciak, P., Downie, J.S. (2018, June). Worksets Expand the Scholarly Utility of Digital Libraries. Joint Conference on Digital Libraries, Fort Worth, TX.

2017

Publications

Bainbridge, D. & Downie, J.S. (2017). All for One and One for All: Reconciling Research and Production Values at the HathiTrust through User-Scripting. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Toronto, ON, (pp. 1-2). DOI: 10.1109/JCDL.2017.7991591

Bhattacharyya, S., Merrill, C., Organisciak, P., Schmidt, B. M., Auvil, L., Aiden, E., & Downie, J. S. (2017). Big-Data Oriented Text Analysis for the Humanities: Pedagogical Use of the HathiTrust+Bookworm Tool. DH 2017, Montreal, Canada.

Dickson, E., Tracy, D.G., McIntyre, S., Glushko, B., McDonald, R.H., Butler, B., & Downie, J.S. (2017). Creating a Policy Framework for Analytic Access to In-Copyright Works for Non-Consumptive Research. DH 2017, Montreal, Canada.

Green, H., & Dickson, E. (2017). Expanding the Librarian's Tech Toolbox: The "Digging Deeper, Reaching Further: Librarians Empowering Users to Mine the HathiTrust Digital Library" Project. D-Lib Magazine. DOI: 10.1045/may2017-green

McDonald, R.H. (2017). Research Center as Distant Publisher: Developing Non-Consumptive Compliant Open Data Worksets to Support New Modes of Inquiry. DH 2017, Montreal, Canada.

Murdock, J., Allen, C., Börner, K., Light, R., McAlister, S., et al. 2017. Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library. PLOS ONE 12(9): e0184188. DOI: 10.1371/journal.pone.0184188

Murdock, J., Jett, J., Cole, T., Ma, Y., Downie, J.S., & Plale, B. (2017). Towards Publishing Secure Capsule-Based Analysis. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Toronto, ON, (pp. 1-4). DOI: 10.1109/JCDL.2017.7991585

Organisciak, P., Capitanu, B., Underwood, T., & Downie, J. S. (2017). Access to billions of pages for large-scale text analysis. iConference 2017. http://hdl.handle.net/2142/96256

Page, K., Nurmikko-Fuller, T., Cole, T., & Downie, J.S. (2017). Building Worksets for Scholarship by Linking Complementary Corpora. DH 2017, Montreal, Canada.

Pustejovsky, J., Verhagen, M., Rim, K., Ma, Y., Ran, L., Liyanage, S., Murdock, J., McDonald, R. H., & Plale, B. (2017). Enhancing Access to Digital Media: The Language Application Grid in the HTRC Data Capsule. Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, (pp. 1-3). DOI: 10.1145/3093338.3104171

Weigl, D. M., Page, K. R., Organisciak, P., & Downie, J. S. (2017). Information-Seeking in Large-Scale Digital Libraries: Strategies for Scholarly Workset Creation. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), (pp. 1-4). DOI: 10.1109/JCDL.2017.7991583

Presentations

Downie, J.S. (2017, July 10). Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center. Invited lecture to King’s Digital Lab, King’s College, London, UK.

Downie, J.S. (2017, November 16). HathiTrust Research Center: Text mining the very big data of the HathiTrust Digital Library. Invited keynote lecture to CLICK! Connecting Libraries, Information, and Community Knowledge Conference, Ateneo de Manila University in Quezon City, Philippines.

Downie, J.S. (2017, October 30). HathiTrust Research Center: Strategic approaches to opening research opportunities on closed data. Invited lecture to Shanghai Customs College, Shanghai, China.

Downie, J.S. (2017, September 15). HathiTrust Research Center: Strategic approaches to opening research opportunities on closed data. Invited lecture to International Institute for Digital Humanities, University of Tokyo, Tokyo, Japan.

Downie, J.S. (2017, September 18). HathiTrust Research Center: Strategic approaches to opening research opportunities on closed data. Invited lecture to Institute for Digital Research in the Humanities, University of Kansas, Lawrence, Kansas.

Dubnicek, R. & Organisciak, P. (2017, October 26-27). Data Capsule in 7 Minutes. NovelTM Workshop 2017, Montreal, Canada.

Green, H. (2017, January). Reducing barriers to participation in automated text analysis in the humanities. (Respondent: Sayan Bhattacharyya.) 132nd Annual Convention of the Modern Language Association (MLA), Philadelphia.

Hu, X., Chu, S. K. W., Downie, J. S., & Lee, C. W. Y. (2017, March). Data Science as an Emerging Discipline: The Roles of iSchools in the Era of Big Data. Proceedings of Information Science to Data Science: New Directions for iSchools Workshop in iConference, Wuhan, China.

Liyanage, S. & Murdock, Jaimie. (2017, March). HTRC Data Capsule and Python SDK. User group meeting.

Liyanage, S., Organisciak, P., Downie, S. (2017, January). HathiTrust Research Center Architecture Overview. Web-conference #3 for HathiTrust all-sites staff.

Jett, J., Cole, T. W., & Downie, J. S. (2017). Exploiting graph‐based data to realize new functionalities for scholar‐built worksets. Proceedings of the Association for Information Science and Technology, 54(1), 716-717. https://doi.org/10.1002/pra2.2017.14505401128

Murdock, J., Jett, J., Cole, T.W., Ma, Y., Downie, J.S., & Plale, B. (2017). Towards Publishing Secure Capsule-based Analysis. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL).

Organisciak, P., & Franklin, S. (2017). Modeling creativity: Tracking long-term lexical change. In Digital Humanities Conference.

Peng, Z., & Plale, B. (2019). Reliable access to massive restricted texts: Experience‐based evaluation. Concurrency and Computation: Practice and Experience, 32(16). https://doi.org/10.1002/cpe.5255

2016

Publications

Downie, J.S., Furlough, M., McDonald, R.H., Namachchivaya, B., Plale, B.A., & Unsworth, J. (2016, May/June). The HathiTrust Research Center: Exploring the Full-Text Frontier, EDUCAUSE Review 51(3), 50-51.

Green, H., Dickson, E., Nay, L., & Zegler-Poleska, E. (2017). Scholarly Needs for Text Analysis Resources: A User Assessment Study for the HathiTrust Research Center. (2016). Proceedings of the Charleston Library Conference.

Hinze, A., Bainbridge, D., Cunningham, S., Downie, J.S. (2016, June). Low-cost Semantic Enhancement to Digital Library Metadata and Indexing: Simple Yet Effective Strategies. Proceedings of JCDL 2016 (pp. 93-102).

Jett, J., Cole, T.W., Maden, C., & Downie, J.S. (2016). The HathiTrust Research Center Workset Ontology: A Descriptive Framework for Non-Consumptive Research Collections. Journal of Open Humanities Data 2, e1.

Murdock, J., Zeng, J., & Allen, C. (2016, January). Towards Evaluation of Cultural-scale Claims in Light of Topic Model Sampling Effects. 2016 International Conference on Computational Social Science.

Organisciak, P., & Capitanu, B. (2016). Text Mining in Python through the HTRC Feature Reader. Programming Historian.

Plale, B. (2016, July/August). HathiTrust Research Center Data Capsule for Full-Text Distant Reading. D-Lib Magazine, 22, 7-8.

Zeng, J., & Plale, B. (2015). Workload-Aware Resource Reservation for Multi-tenant NoSQL. 2015 IEEE International Conference on Cluster Computing, 32-41. (Best paper candidate)

Workshops

Bhattacharyya, S. (2016, February 4). Workshop with HT+Bookworm for student teams for 4Humanities Student Prize Contest 'Why is studying the humanities important?' Scholarly Commons, Main Library, University of Illinois, Urbana-Champaign.

Cline, N. & Mobley, L. (2016, April 14). Text Mining with the HathiTrust: Empowering Librarians to Support Digital Scholarship Research. Scholarly Commons, Main Library, University of Illinois Urbana-Champaign.

Dickson, E. & Bhattacharyya, S. (2016, January 4). Doing Text Analysis with the HathiTrust Research Center’s Tools. University of Texas at Austin.

Dickson, E. (2016, February 10). Text Analysis Methods and Tools. Brownbag at the Illinois Program for Research in the Humanities. University of Illinois at Urbana-Champaign.

Dickson, E. (2016, March 8). University of Illinois Savvy Researcher workshop on Text Analysis.

Dickson, E. & Green, H. (2016, April 14). Text Mining with the HathiTrust: Empowering Librarians to Support Digital Scholarship Research. Scholarly Commons, Main Library, University of Illinois, Urbana-Champaign.

Dickson, E., Cline, N. & Mobley, L. (2016, June 12). Text Analysis with the HathiTrust Research Center. Digital Humanities Summer Institute (DHSI 2016), University of Victoria, Canada.

Dickson, E. & Organisciak, P. (2016, August 15). Text Analysis with the HathiTrust Research Center. Workshop. Digital Humanities at Berkeley Summer Institute and Berkeley Institute for Data Science workshop. University of California Berkeley.

Green, H. (2016, June 11). Introduction to Text Mining with the HathiTrust Research Center. THATCamp Southern Illinois University Edwardsville.

Datasets

Capitanu, B., Underwood, T., Organisciak, P., Cole, T., Sarol, J.M., Downie, J.S. (2016). The HathiTrust Research Center Extracted Features Dataset (1.0) [Dataset]. http://dx.doi.org/10.13012/J8X63JT3

Organisciak, P. (2016). Term weights for 235k language and literature texts [Data set]. http://hdl.handle.net/2142/89691

Presentations

Bhattacharyya, S. (2016, October 10). Text analysis tools in progress from the HathiTrust Research Center. Mellon Digital Humanities Seminar, Price Lab for Digital Humanities, University of Pennsylvania, Philadelphia, Pennsylvania.

Organisciak, P., Bhattacharyya, S., Auvil, L., Unnikrishnan, L., Schmidt, B., Shamim, M., McDonald, R., Downie, J., Aiden, E. (2016, July 11-16). Adding Flexibility to Large-Scale Text Visualization with HathiTrust+Bookworm. Digital Humanities 2016, Jagiellonian University & Pedagogical University, Kraków.

Nurmikko-Fuller, T., Jett, J., Cole, T.W., Maden, C., Page, K.R., & Downie, J.S. (2016, July 11-16). A Comparative Analysis of Bibliographic Ontologies: Implications for Digital Humanities. Digital Humanities 2016, Jagiellonian University & Pedagogical University, Kraków.

Jett, J., Nurmikko-Fuller, T., Cole, T.W., Page, K.R., & Downie, J.S. (2016, June 19-23). Enhancing Scholarly Use of Digital Libraries: A Comparative Survey Review of Bibliographic Metadata Ontologies. Joint Conference on Digital Libraries 2016, Newark, New Jersey.

Downie, J.S. (2016). HathiTrust and the Future of Digital Archive [Keynote Address]. International Symposium at University of Tokyo Ito International Academic Research Center.

McDonald, R.H. (2016, September 20). What’s next with the HathiTrust Research Center? Indiana University StatewideIT Day, Bloomington, Indiana.

Bhattacharyya, S. & Shamim, M. (2016, January 8). The HathiTrust+Bookworm Project as a Model for Collaborative Research at Large Scale [Presentation in the panel "Developing and Sustaining Collaborative Research in the Humanities"]. 131st Annual Convention of the Modern Language Association (MLA), Austin, Texas.

Bhattacharyya, S. (2016, January 20). HathiTrust Research Center: Capabilities and Affordances [Presentation made to Stanford University Library digital humanities group and subject specialist librarians' group]. Green Library, Stanford University, Stanford, California.

Organisciak, P. & Bhattacharyya, S. (2016, February 10). New tools from the HathiTrust Research Center for digitized text analysis at scale: The HathiTrust+Bookworm tool and the Extracted Features dataset. E-Research Roundtable, Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, Champaign, Illinois.

Downie, J. Stephen. (2016, February 26). DH Panel: Fair Use and the Future of Digital Scholarship. Scholars Lab, University of Virginia.

Dubnicek, R. & Kinnaman, A. (2016, April). Open Access in Text Analytics [Poster]. iSchool Master’s Student Showcase, Champaign, Illinois.

Downie, J.S. (2016, April 8). The HathiTrust Research Center: Exciting New Cultural Computation Opportunities. Initiative for Digital Humanities, Media, and Culture at Texas A&M University.

Bhattacharyya, S. & Underwood, T. (2016, June). Does Gender Affect How Genre-Conformingly Writers Write? Digital Humanities Summer Institute (DHSI 2016), University of Victoria, Canada.

Green, H., Dickson, E., & Bhattacharyya, S. (2016, July 11-16). Scholarly Requirements for Large Scale Text Analysis: A User Needs Assessment by the HathiTrust Research Center. Digital Humanities 2016, Jagiellonian University & Pedagogical University, Kraków.

Organisciak, P. & Downie, J.S. (2016, September). Trends in Centuries of Words: Progress on the HathiTrust+Bookworm Project. Annual Meeting of the Japanese Association for Digital Humanities 2016 (JADH 2016), University of Tokyo, Tokyo, Japan.

Ruan, G. & Plale, B. (2016, September). Horme: Random Access Big Data Analytics. IEEE Cluster 2016, Taipei, Taiwan.

Green, H. (2016, September 27). Building Capacities and Communities for Digital Scholarship: The 'Digging Deeper, Reaching Further: Libraries Empowering Users to Mine HathiTrust Digital Library Resources' Project. Collections as Data Symposium: Stewardship and Use Models to Enhance Access. Washington, D.C.

Dickson, E., Green, H., & Courtney, A. (2016, October). (Re)skilling the 21st Century Librarian: Empowering Librarians to Support Text Analysis Research. 2016 Digital Library Federation Forum, Milwaukee, Wisconsin.

Bhattacharyya, S. (2016, March 17-20). Small data and big data: The reflective in the context of text analysis and the humanities classroom. Annual Conference of the American Comparative Literature Association (ACLA), Harvard University, Cambridge, Massachusetts.

Zeng, J. & Plale, B. (2016, May). KVLight: A Lightweight Key-Value Store for Distributed Access in Cloud. 16th IEEE/ACM Int’l Symposium on Cluster, Cloud and Grid Computing (CCGrid), Cartagena, Colombia.

2015

Publications

Bhattacharyya, S., Organisciak, P., & Downie, J.S. (2015, March). A fragmentizing interface to a large corpus of digitized text. Interdisciplinary Science Reviews (special issue on “The Future of Reading”), 40(1), 61-77.

Hinze, A., Taube-Schock, C., Bainbridge, D., Cunningham, S., Downie, J.S. (2015). Introducing Capisco: a semantically-enhanced search and discovery system for large-scale text corpora. SIGWEB Newsletter Autumn 2015 Issue, 1-14.

Murdock, J., Zeng, J., & Allen, C. (2015, December). Towards Cultural-Scale Models of Full Text. Report for the HTRC ACS project "Taxonomizing the Texts: Towards Cultural-Scale Models of Full Text".

Murdock, J., Zeng, J., & McDonald, R. H. (2015). Topic Exploration with the HTRC Data Capsule for Non-Consumptive Research. Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. Association for Computing Machinery.

Nurmikko-Fuller, T., Page, K.R., Willcox, P., Jett, J., Maden, C., Cole, T., Fallaw, C., Senseney, M., & Downie, J.S. (2015). Building Complex Research Collections in Digital Libraries: A Survey of Ontology Implications. Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. Association for Computing Machinery.

Workshops

Bhattacharyya, S. & York, J. (2015, January 9). Humanistic inquiry with large corpora of digitized text and metadata: Towards new epistemologies? [Workshop]. 130th Annual Conference of the Modern Language Association (MLA), Vancouver, Canada.  

Bhattacharyya, S. & Green, H. (2015, April 29). The HathiTrust+Bookworm tool for lexical trend discovery [Workshop]. Scholarly Commons, University of Illinois at Urbana-Champaign Library, Champaign, Illinois.

Bhattacharyya, S. (2015, May 30). Text mining with the HathiTrust Research Center: An introduction to working with digitized text corpora and metadata [Workshop]. Annual Conference of the Humanities, Arts, Science, and Technology Alliance and Collaboratory (HASTAC), Michigan State University, East Lansing, Michigan.

Bhattacharyya, S. (2015, July 13). The HathiTrust Research Center: Large-scale Computational Analysis with the World’s First Massive Digital Library. Linguistic Society of America (LSA)'s Biennial Linguistic Institute, The University of Chicago, Chicago, Illinois.

Bhattacharyya, S. & Dickson, E. (2015, July 28). Introduction to the HathiTrust Research Center (HTRC): Teaching and research using the power of data and metadata in large text corpora [Workshop]. Humanities Intensive Learning and Teaching (HILT) 2015, Indiana University - Purdue University Indianapolis (IUPUI), Indianapolis, Indiana.

Bhattacharyya, S. & Dickson, E. (2015, July 29). Advanced Topics in Text Analysis with the HathiTrust Research Center (HTRC) [Workshop]. Humanities Intensive Learning and Teaching (HILT) 2015, Indiana University - Purdue University Indianapolis (IUPUI), Indianapolis, Indiana.

Bhattacharyya, S. (2015, November 20). Text Analysis with the HathiTrust Research Center [Workshop]. University of Michigan Digital Scholarship Workshop Series. University of Michigan, Ann Arbor, Michigan.

Bhattacharyya, S. & Dickson, E. (2015, October 26-28). The HathiTrust Research Center's Tools for Text Analysis with Digitized Text from the HathiTrust Digital Library. Digital Library Federation Forum (DLF Forum), Vancouver, Canada.

Chen, M. & Dickson, E. (2015, November 2). HathiTrust Research Center Tools and Services [Webinar]. Georgetown University, Washington, D.C.

Dickson, E. & Bhattacharyya, S. (2015, November 15). Using the HathiTrust Research Center’s Tools for Text Analysis. Chicago Colloquium on Digital Humanities & Computer Science (DHCS 2015), University of Chicago, Chicago, Illinois.

Green, H. & Bhattacharyya, S. (2015, February 16). Introduction to the HathiTrust Research Center Portal (Version 3.0) for Text Mining Research [Workshop]. Scholarly Commons, University of Illinois at Urbana-Champaign Library, Champaign, Illinois.

Datasets

Capitanu, B., Underwood, T., Organisciak, P., Bhattacharyya, S., Auvil, L., Fallaw, C., & Downie, J.S. (2015). Extracted Feature Dataset from 4.8 Million HathiTrust Digital Library Public Domain Volumes (0.2) [Dataset]. HathiTrust Research Center.

Underwood, T., Capitanu, B., Organisciak, P., Bhattacharyya, S., Auvil, L., Fallaw, C., & Downie, J.S. (2015, August). Word Frequencies in English-Language Literature, 1700-1922 (0.2) [Dataset]. HathiTrust Research Center.

Presentations

Auvil, L. (2015, April 17-18). HT+BW: HathiTrust+Bookworm. DPLAfest 2015, Indianapolis, Indiana.

Auvil, L., Lieberman Aiden, E., Downie, J.S., Schmidt, B., Bhattacharyya, S., & Organisciak, P. (2015, June 29-July 3). Exploration of Billions of Words of the HathiTrust Corpus with Bookworm: HathiTrust + Bookworm Project. Digital Humanities 2015 (DH 2015) Conference, Sydney, Australia.

Bhattacharyya, S. (2015, March 26-29). Towards posthumanist reading? How to do things with (mere) words from text [Presentation]. Part of the Rethinking Text As “Process” in the Humanities, Digital and Non-Digital Seminar, Annual Conference of the American Comparative Literature Association (ACLA), Seattle, Washington.

Bhattacharyya, S. (2015, November 11). The HathiTrust Research Center's Extracted Features Dataset: An Opportunity for "Distant" Reading of Millions of Books from the World's Great Research Libraries. “Big Data Case Studies” panel, Big Data Summit 2015. Research Park, University of Illinois at Urbana-Champaign, Champaign, Illinois.

Bhattacharyya, S. (2015, November 19). Comparative Literature 322: Writing World Literatures [Class session for Professor Christi Merrill]. University of Michigan, Ann Arbor.

Bhattacharyya, S. & Dickson, E. (2015, October 2). Text Analysis with the HathiTrust Research Center: Tools and Datasets for Research and Teaching. University of Chicago Digital Humanities Forum, Chicago, Illinois.

Bhattacharyya, S. & Downie, J.S. (2015, June 29-July 3). Approaching textuality with the metaphor of the digitized workset. Digital Humanities 2015 (DH 2015) Conference, Sydney, Australia.

Bhattacharyya, S., Capitanu, B., Organisciak, P., Auvil, L., Fallaw, C., & Downie, J.S. (2015, November 13-15). Big Textual Data in Undergraduate Student Writing for Literature Courses: Affordances of the HathiTrust Research Center’s Extracted Features Dataset. Chicago Colloquium on Digital Humanities & Computer Science (DHCS 2015), University of Chicago, Chicago, Illinois.

Capitanu, B., Underwood, T., Organisciak, P., Bhattacharyya, S., Auvil, L., Fallaw, C., & Downie, J.S. (2015, April 3). Extracting features from text for non-consumptive reading with the HathiTrust Research Center. Graduate School of Library and Information Science Research Showcase, University of Illinois at Urbana-Champaign, Champaign, Illinois.

Downie, J. S. (2015, April 17). The HathiTrust Research Center: Bringing you 4.7 billion pages of analytic opportunities! [Invited Keynote]. Virtual Humanities Lab, Department of Italian Studies, Brown University, Providence, Rhode Island.

Downie, J. S. (2015, April 22). HathiTrust Research Center: Your Analytical Gateway to the HathiTrust Digital Library’s 4.5 Billion Pages. Invited talk to Tri-Co Digital Humanities as part of Digital Humanities Month. 

Downie, J. S. (2015, June 11). Metadata in the HathiTrust [Invited Talk]. 2015 International Workshop on Data Management, Beijing Institute of Technology Library, Beijing, China.

Downie, J.S. (2015, February 27). HathiTrust: Large-Scale Repository in the Humanities: Unlocking the Secrets of 4.6 Billion Pages. Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong.

Downie, J.S. (2015, June 12). HathiTrust: Large-Scale Data Repository in the Humanities. Invited keynote at 2015 International Workshop on Data Management, Beijing Institute of Technology Library, Beijing, China.

Downie, J.S. (2015, June 24). The HathiTrust Research Center: Providing analytic access to the HathiTrust Digital Library's 4.7 billion pages [Invited Keynote]. ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’15), Knoxville, Tennessee.

Downie, J.S. Unlocking the Secrets of 4.5 Billion Pages: A HathiTrust Research Center Update. Presented at: Bodleian Digital Library Systems and Services and the Oxford e-Research Centre, Oxford University, Oxford, United Kingdom (2015, January 7); University of Waikato, Hamilton, New Zealand (2015, January 20); Department of Computer Science, Victoria University, Wellington, New Zealand (2015, January 22); National Library of New Zealand, Wellington, New Zealand (2015, January 23).

Guo, S., Edelblute, T., Dai, B., Chen, M., & Liu, X. (2015, March 30-31). Toward Enhanced Metadata Quality of Large-Scale Digital Libraries: Estimating Volume Time Range. HathiTrust Research Center UnCamp, Ann Arbor, Michigan.

Guo, S., Edelblute, T., Dai, B., Chen, M., & Liu, X. (2015, March). Toward Enhanced Metadata Quality of Large-Scale Digital Libraries: Estimating Volume Time Range. iConference 2015, Newport Beach, California.

Herr-Hoyman, D. (2015, April 17). HathiTrust Research Center Tools: SHARC. Digital Humanities + Art: Going Public 2015, University of Wisconsin-Madison, Madison, Wisconsin. 

Herr-Hoyman, D. (2015, March 30-31). SHARC: Secure HathiTrust Analytics Research Commons. HathiTrust Research Center UnCamp, Ann Arbor, Michigan.

McDonald, R. (2015, November 18). The HathiTrust Research Center: Enabling New Knowledge Through Shared Infrastructure. NISO Webinar: Text Mining: Digging Deep for Knowledge.

Organisciak, P., Auvil, L., Bhattacharyya, S., & Downie, J.S. (2015, June 1-3). The HTRC Extracted Features Dataset. Joint Conference of the Canadian Society of Digital Humanities/Société canadienne des humanités numériques (CSDH-SCHN) and the Association for Computers and the Humanities (ACH), Ottawa, Canada.

Organisciak, P., Auvil, L., Schmidt, B., Bhattacharyya, S., Fallaw, C., Shamim, M., McDonald, R., Downie, J.S., & Lieberman Aiden, E. (2015, April 3). The HathiTrust + Bookworm Project: Exploring Cultural and Literary Trends in Millions of Scanned Books. Graduate School of Library and Information Science Research Showcase, University of Illinois at Urbana-Champaign, Champaign, Illinois.

Pathirage, M. & Plale, B. (2015, March 30-31). HathiTrust Research Notebooks. HathiTrust Research Center UnCamp, Ann Arbor, Michigan.

Plale, B. (2015, March 3). HathiTrust: Large-Scale Repository in the Humanities Unlocking the Secrets of 4.6 Billion Pages. Cyberinfrastructure Day, University of Missouri, Columbia, Missouri.

Plale, B. (2015, May 7-8). Trust threads: minimal provenance and data publication and reuse. 1st Annual National Data Integrity Conference, Longmont, Colorado.

Plale, B. (2015, May 28). Talk on HT/HTRC at RDA-sponsored Digital Humanities Workshop, Baltimore, Maryland.

Plale, B. & McDonald, R. (2015, November 16-19). Visualizing the HathiTrust Research Center (HTRC) data. Supercomputing 2015 (SC15) Exhibition, Austin, Texas.

Shamim, M. & Bhattacharyya, S. (2015, November 9). Culturomics: New Developments in Analyzing Digitized Texts. Rice University Digital Humanities Group, Houston, Texas.

The HTRC Team. (2015, March 30-31). Workset Builder and Portal of the HathiTrust Research Center. HathiTrust Research Center UnCamp, Ann Arbor, Michigan.

Underwood, T. (2015, June 17). Iowa Digital Bridges Summer Institute. Grinnell College, Grinnell, Iowa.

Underwood, T. (2015, May 16). How rapidly do literary standards change? Scale and Value Symposium, University of Washington, Seattle, Washington.

Underwood, T. (2015, May 22). The Pace of Literary Change. Cultural Analytics: Computational Approaches to the Study of Culture. University of Chicago, Chicago, Illinois.

Zeng, J., Ruan, G., Plale, B., Crowell, A., & Prakash, A. (2015, March 30-31). HTRC Data Capsule: Non-Consumptive Use of Texts. HathiTrust Research Center UnCamp, Ann Arbor, Michigan.

2014

Publications

Bamman, D., Underwood, T., & Smith, N. (2014). A Bayesian Mixed Effects Model of Literary Character. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 370-379.

Underwood, T. (2014). Theorizing Research Practices We Forgot to Theorize Twenty Years Ago. Representations, 127(1), 64-72.

Underwood, T., Long, H., & So, R. J. (2014, December 10). Cents and Sensibility. Slate.

Presentations

Bhattacharyya, S. (2014, December 8). What is a text? Approaching an old question with a new metaphor (the digitized workset). Presented to the Conceptual Foundations Group, Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign. 

Bhattacharyya, S. & Mehta, R. (2014, January 31). Investigating Writers' Attitudes by Mining a Large Corpus of Books: Preliminary Research. Postdoctoral Research Symposium, Society of Postdoctoral Scholars, University of Illinois Urbana-Champaign.

Bhattacharyya, S. & Organisciak, P. (2014, September 15). (Post)humanism and Reading via Fragments: Some thoughts about the HathiTrust Research Center's New Textual Tools. Presented to the "Reading the Digital Humanities" group of the Illinois Program for Research in the Humanities, University of Illinois, Urbana-Champaign.

Bhattacharyya, S., Organisciak, P., & Downie, J.S. (2014, November 6). Non-consumptive reading using feature-extraction at the HathiTrust Research Center. THATCamp HSS 2014. Chicago, IL.

Chen, M. (2014, February 11). Opportunities and Challenges of Text Mining HathiTrust Digital Library. Computational Linguistics Seminar at Indiana University.  

Chen, M. (2014, March 5). HathiTrust Research Center: Challenges and Opportunities in Big Text Data. Digital Library Brown Bag, Indiana University.

Downie, J.S. (2014, December 17). Unlocking the Secrets of 4.5 Billion Pages: A HathiTrust Research Center Update. Invited talk to School of Computer Science, University of Western Ontario, London, Ontario, Canada.

Downie, J.S. (2014, December 18). Unlocking the Secrets of 4.5 Billion Pages: A HathiTrust Research Center Update. Invited talk to University of Waterloo, Waterloo, Canada.

Downie, J.S. (2014, June 3). Unlocking the Secrets of 3 Billion Pages: Introducing the HathiTrust Research Center. Invited talk to Academia Sinica, Taipei, Taiwan.

Downie, J.S. (2014, May 6). HathiTrust Research Center: The Workset Creation for Scholarly Analysis (WCSA) Prototyping Project. Sound and Music Computing Lab, Royal Institute of Technology (KTH), Stockholm, Sweden.

Downie, J.S., Dougan, K., Bhattacharyya, S. & Fallaw, C. (2014, September 12). The HathiTrust Corpus: A Digital Library for Musicology Research? First International 'Digital Libraries for Musicology' Workshop, 2014 (DLfM 2014), London, United Kingdom.

Fenlon, K., Fallaw, C., Cole, T., & Han, M.J. (2014, September 8-12). A Preliminary Evaluation of HathiTrust Metadata: Assessing the Sufficiency of Legacy Records. Digital Libraries 2014, London, UK.

Fenlon, K., Senseney, M., Green, H., Bhattacharyya, S., Willis, C., & Downie, J.S. (2014, October 31-November 4). Scholar-built collections: A study of user requirements for research in large-scale digital libraries. 2014 Annual Meeting of The Association for Information Science & Technology (ASIS&T), Seattle, Washington, USA. 

Green, H. & Bhattacharyya, S. (2014, November 17). Introduction to the HathiTrust Research Center Portal (Version 2.0) for Text Mining Research [Workshop]. Scholarly Commons, University of Illinois at Urbana-Champaign Library, Champaign, IL. 

Green, H., Fenlon, K., Senseney, M., Bhattacharyya, S., Willis, C., Organisciak, P., Downie, J.S., Cole, T., & Plale, B. (2014, March 4-7). Using collections and worksets in large-scale corpora: Preliminary findings from the Workset Creation for Scholarly Analysis project. iConference 2014, Berlin, Germany.

McDonald, R., Peng, Z., & Chen, M. (2014, September 18). Public lecture on HathiTrust Research Center and Hands-on. Ohio State University.

Organisciak, P., Bhattacharyya, S., Auvil, L., & Downie, J.S. (2014, July 8-12). Large-scale text analysis through the HathiTrust Research Center. Digital Humanities 2014 (DH2014) Conference, Lausanne, Switzerland.

Organisciak, P., Plale, B., Downie, J.S., & Auvil, L. (2014, October 23-24). Panel Discussion: The HathiTrust Research Center. Chicago Colloquium on Digital Humanities and Computer Science (DHCS 2014), Northwestern University, Evanston, Illinois.

Peng, Z., Chen, M., Plale, B., & Kowalczyk, S. (2014, October 31-November 4). Author Gender Metadata Augmentation of HathiTrust Digital Library. 2014 Annual Meeting of The Association for Information Science & Technology (ASIS&T).

Plale, B. (2014, May 21). Bridging Digital Humanities Research and Large Repositories of Digital Text [Keynote Address]. 2° Encuentro de Humanistas Digitales, Biblioteca Vasconcelos, Mexico City, Mexico. 

Plale, B. & McDonald, R. (2014, November 16-21). The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework. Supercomputing 2014. New Orleans, LA.

Plale, B., McDonald, R.H., Chen, M., Zeng, J., Ruan, G., & Cline, N. (2014, September). HathiTrust Research Center (HTRC) Data Capsule v 1.0: A Hands-On Demonstration Session. IU Scholars’ Commons Workshop.

Ruan, G., Zhang, H., Wernert, E., & Plale, B. (2014, July 13-18). TextRWeb: Large-Scale Text Analytics with R on the Web. Conference on Extreme Science and Engineering Discovery Environment (XSEDE '14), Atlanta, USA.

Underwood, T. (2014, March). Beyond Tools: The Questions about Interpretation that Link Computer Science to the Humanities. Scholars' Lab, University of Virginia, Charlottesville, Virginia; Emory University, Atlanta, Georgia; University of Kansas, Lawrence, Kansas; Indiana University, Indiana. 

Zeng, J., Ruan, G., Crowell, A., Prakash, A., & Plale, B. (2014, June). Cloud Computing Data Capsules for Non-Consumptive Use of Texts. 5th Workshop on Scientific Cloud Computing (ScienceCloud '14), Vancouver, Canada.

2013

Publications

Kowalczyk, S.T., Sun, Y., Peng, Z., Plale, B., Todd, A., Auvil, L., Willis, C., Zeng, J., Pathirage, M., Liyanage, S., et al. (2013). Big Data at Scale for Digital Humanities: An Architecture for the HathiTrust Research Center. In Wen-Chen Hu and Naima Kaabouch (eds.), Big Data Management, Technologies, and Applications.

Plale, B., McDonald, R.H., Sun, Y., Kouper, I., Cobine, R., Downie, J.S., Sandore Namachchivaya, B., & Unsworth, J. (2013). HathiTrust Research Center: Computational Access for Digital Humanities and Beyond. Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries.

Underwood, T., Black, B., Auvil, L., & Capitanu, B. (2013). Mapping Mutable Genres in Structurally Complex Volumes. Proceedings of the 2013 IEEE International Conference on Big Data (pp. 95-103).

Presentations

Cole, T. & Green, H. (2013, December 9-10). Workset Creation for Analysis: an HTRC initiative. Coalition for Networked Information (CNI) Membership Meeting, Washington, DC.

Downie, J.S. et al. (2013, December 5-7). The Workset Creation for Scholarly Analysis (WCSA) Prototyping Project: Background and goals. Chicago Colloquium on Digital Humanities and Computer Science, Chicago, Illinois.

Downie, J.S., Cole, T., Plale, B. & Unsworth, J. (2013, September 19-21). Workset Creation for Scholarly Analysis: Preliminary Research at the HathiTrust Research Center [Poster]. Japanese Association for Digital Humanities, Kyoto, Japan.

Green, H. & Underwood, T. (2013, October). Collaboration Between Scholars and Librarians in Digital Humanities. University of Michigan Library, Ann Arbor, Michigan.

Hess, K., Downie, J.S., Cole, T., & Green, H. (2013, November 4-6). Workset Creation for Scholarly Analysis: Preliminary Research at HathiTrust Research Center. DLF Forum 2013 Community Idea Exchange, Austin, Texas.

Kowalczyk, S., Auvil, L., & Chen, M. (2013, July 25). HTRC demo and hands-on [Tutorial]. 13th ACM/IEEE-CS Joint Conference on Digital Libraries, Indianapolis, IN.

Kowalczyk, S., Hess, K., & Auvil, L. (2013, September 8-9). Hands On: Workset Builder, Portal and SEASR. HTRC UnCamp 2013, Champaign, Illinois.

McDonald, R. & Sun, Y. (2013, June 7). The HathiTrust Research Center (HTRC): An Overview and Demo. Indiana University Librarians' Day, Indianapolis.

McDonald, R.H., Liyanage, S., Pathirage, M., Peng, Z., Zeng, J., Ruan, G., & Chen, M. (2013, November 11). Using HathiTrust Center Tools. Catapult Workshop, Bloomington, Indiana.

Plale, B. (2013, December). Big Data Opportunities and Challenges for Information Retrieval, Text Mining, and NLP. The British Library, London, UK; Knowledge Media Institute (KMi), The Open University, Milton Keynes, UK.

Plale, B. (2013, November). Opportunities and Challenges of Text Mining HathiTrust Digital Library. The National Library of the Netherlands (Koninklijke Bibliotheek), Den Haag, The Netherlands.

Plale, B. (2013, October). Big Data Opportunities and Challenges for IR, Text Mining and NLP. Int’l Workshop on Mining Unstructured Big Data Using Natural Language Processing (MNLP 2013), co-located with ACM Int’l Conference on Information and Knowledge Management, San Francisco, California.

Plale, B. (2013, September). Big Data and Open Access: On Track for Collision of Cosmic Proportions? [Keynote]. 2nd Int’l LSDMA-Symposium - The Challenge of Big Data in Science - with a focus on Big Data Analytics, Karlsruhe, Germany.

Plale, B., McDonald, R., & Chen, M. (2013, September 6). The HathiTrust Research Center (HTRC): Exploration of the World’s First Massive Digital Library. Digital HPS (History and Philosophy of Science) workshop, Indiana University, Bloomington.

Plale, B. & Sun, Y. (2013, May 7). Digital Humanities at Scale: HathiTrust Research Center. University of Notre Dame, South Bend, Indiana.

Ruan, G., Zhang, H., & Plale, B. (2013, July 22-25). Exploiting MapReduce and data compression for data-intensive applications. The Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery, San Diego, California.

Sun, Y., Kowalczyk, S.T., Plale, B., Downie, J.S., Auvil, L., Capitanu, B., Hess, K., Peng, Z., Ruan, G., Todd, A., et al. (2013, July 16-19). Architecture to enable large-scale computational analysis of millions of volumes. Digital Humanities 2013, University of Nebraska–Lincoln.

2012

Publications

Kouper, I. (2012, December). CLIR/DLF Digital Curation Postdoctoral Fellowship – The Hybrid Role of Data Curator. Bulletin of the American Society for Information Science and Technology, 39(2), 46-47.

Presentations

Allen, C. (2012, September). Digging Into Debating. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Auvil, L. (2012, September). SEASR Analytics for HathiTrust Research Center. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Downie, J.S. (2012, September). HathiTrust Research Center: Pushing the frontiers of large scale text analytics. Japanese Association for Digital Humanities Conference (JADH 2012), Tokyo, Japan.

Downie, J.S., Poole, M.S., Plale, B., & McDonald, R.H. (2012, September). Toward non-consumptive formal evaluation challenges using the HathiTrust Research Center digital collections. Japanese Association for Digital Humanities Conference (JADH 2012), Tokyo, Japan.

Kowalczyk, S. (2012, September). Harnessing the HTRC Infrastructure: Building Collections and Analyzing Data. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Kowalczyk, S. & York, J. (2012, September). HTRC Data and Collections Overview. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Kowalczyk, S. T. (2012, November 4-5). Digital Humanities at Scale: HathiTrust Research Center. 2012 DLF Forum, Denver, Colorado.

Kowalczyk, S., Plale, B., & Auvil, L. (2012, September). HTRC Demonstrations of Capability. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Kowalczyk, S., Unsworth, J., Plale, B., McDonald, R.H., & Sun, Y. (2012, April). The HathiTrust Research Center: An Overview. Indiana University, Bloomington, Indiana.

McDonald, R. H. (2012, September). The HathiTrust Research Center: The Fast Version. Library of Congress Symposium on Designing Storage Architectures for Digital Collections, Washington, DC.

McDonald, R. H., Plale, B., Downie, J.S., Sun, Y., Auvil, L., Kowalczyk, S., Capitanu, B., Todd, A., Hess, K., Zeng, J., et al. (2012, September). HTRC Architecture Overview. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Plale, B. (2012, February). Digital Humanities at Scale: HathiTrust Research Center. University of Maryland, hosted by MITH and the Libraries.

Plale, B. (2012, May). Facilitating Large-scale Analysis of Scholarly Archives (Digital Humanities at Scale). Workshop on Big Data Benchmarking (WBDB2012), San Jose, California.

Plale, B. (2012, November). Information Analysis at Scale: HathiTrust Research Center. The International Conference on High Performance Computing, Networking, Storage and Analysis (SC12), Salt Lake City, Utah. 

Plale, B. (2012, November). Invited Talk: Big Data in Computer Science. Celebration of Women in Computing, Las Cruces, New Mexico.

Sun, Y. (2012, September). HathiTrust Research Center API Overview (API in Detail Part 1). HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Sun, Y. (2012, September). HTRC Data API in Detail Part II (Developing Algorithms with the API). HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Underwood, T., Sellers, J., Black, M., Green, H., Auvil, L., & Capitanu, B. (2012, September). Using HathiTrust Texts for Literary Research. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

Unsworth, J., Sandore Namachchivaya, B., & McDonald, R.H. (2012, December). The HathiTrust Research Center: Opening up the Elephant for New Knowledge Creation. CNI Fall 2012 Membership Meeting, Washington, D.C.

Wilkin, J. (2012, September). HathiTrust: Putting Research in Context. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

York, J., & Downie, J.S. (2012, September). Data in Detail. HathiTrust Research Center UnCamp 2012, Indiana University, Bloomington, Indiana.

2011

Reports

McRobbie, M.A., Wheeler, B.C., & Stewart, C.A. (2011, June). Indiana University Pervasive Technology Institute Report to the Lilly Endowment, Inc. 30 Month Program Report Dec 1, 2010 - May 31, 2011, Indiana University.

McRobbie, M.A., Wheeler, B.C., Plale, B., & Stewart, C.A. (2011, December). Indiana University Pervasive Technology Institute Report to the Lilly Endowment, Inc. 36 Month Program Report Jun 1, 2011 - Nov 30, 2011. Indiana University. 

Varvel, J. V. E. & Thomer, A. (2011, December). Google Digital Humanities Awards Recipient Interviews Report. CIRSS Report No. HTRC1101, Champaign, IL, Center for Information Research in Science and Scholarship, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.  

Presentations

Downie, J.S., McDonald, R.H., Fujinaga, I., & Unsworth, J. (2011, June). Big Data, Big Deal. Joint Conference on Digital Libraries, Ottawa, Canada.   

McDonald, R. H. (2011, May). Building a Public Research Center for the HathiTrust Digital Library. Digital Public Library of America Global Interoperability and Linked Data Workshop, University of Amsterdam, Netherlands.

Plale, B. (2011, September). HathiTrust Research Center Briefing.

Plale, B., Prakash, A., Fox, G.C., & McDonald, R.H. (2011, June). The Data Capsule for Non-Consumptive Research. Digital Humanities 2011 (DH2011), Stanford University, Stanford, California.

Sun, Y., Terkhorn, M., & Plale, B. (2011, November). HathiTrust Research Center Architecture Data Subsystem. International Conference for High Performance Computing, Networking, Storage and Analysis, Seattle, Washington.

Sun, Y., Terkhorn, M., & Plale, B. (2011, November). HathiTrust Research Center Architecture User-Facing Services. International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Seattle, Washington.

Sun, Y., Terkhorn, M., Plale, B., & Kowalczyk, S. (2011, November). HathiTrust Research Center Use Cases. International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Seattle, Washington. 

Terkhorn, M., Sun, Y., & Plale, B. (2011, November). HathiTrust Research Center Access. International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing), Seattle, Washington.

Unsworth, J., Plale, B., Poole, S., & McDonald, R.H. (2011, June). The HathiTrust Research Center and tool builders. Corpora Space Workshop II, Maryland Institute for Technology in the Humanities, University of Maryland, College Park, Maryland. 

2010

Reports

McRobbie, M.A., Wheeler, B.C., & Stewart, C.A. (2010, June). Report to the Lilly Endowment, Inc. 18 Month Program Report Jan 1, 2010 - May 31, 2010, Indiana University.

McRobbie, M.A., Wheeler, B.C., & Stewart, C.A. (2010, June). Indiana University Pervasive Technology Institute Report to the Lilly Endowment, Inc. 13 Month Program Report Jun 1, 2009 - Dec 31, 2009, Indiana University.