HTRC Publications and Presentations

This is a list of publications and presentations relevant to the work of HTRC and produced by HTRC staff. See also Grant-funded projects for sponsor-funded work by HTRC, and HTRC Research Impact for work produced by others in the scholarly community that makes substantial use of HTRC data, tools, and expertise.

2025 | 2024 | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010

2025

Publications

Layne-Worthey, Glen, J. Stephen Downie, Janet Swatscheno, Nikolaus Parulian, Jill Naiman, Benjamin Schmidt, Peter Organisciak, Ted Underwood, and Ryan Dubnicek. “Making More Sense with Machines: Artificial Intelligence at the HathiTrust Research Center.” In: Navigating Artificial Intelligence for Cultural Heritage Organisations, edited by Glen Layne-Worthey, J. Stephen Downie, Lise Jaillant, Claire Warwick, Paul Gooding, and Katherine Aske, 135–166. UCL Press, 2025. https://doi.org/10.2307/jj.24215718.13

Layne-Worthey, Glen. "Copyright Is the Lock; Non-Expressive Fair Use Is the Key: Research with In-Copyright Texts." In: The Routledge Companion to Libraries, Archives, and the Digital Humanities, edited by Isabel Galina and Glen Layne-Worthey. Routledge, 2025. http://dx.doi.org/10.4324/9781003327738-9

Workshops

Swatscheno, Janet and Ryan Dubnicek. HathiTrust Research Center Digital Humanities Workshop. 83rd Annual College Language Association Convention, Redefining Expansion and Exploration: Black Diasporic Literatures, Cultures, and Pedagogies, Vancouver, WA, 23 April 2025.

Presentations

Debnath, Tanmoy, Rebekah Fitzsimmons, Glen Layne-Worthey, Suzan Alteri, and Sara Schwebel. “Datafying 75 Years of Book Reviews from the Bulletin of the Center for Children’s Books.” Digital Humanities 2025, Lisbon, Portugal, 14-19 July 2025.

Dubnicek, Ryan, Daniel J. Evans, Sarah Griebel, Xiaotong Hu, Glen Layne-Worthey, and J. Stephen Downie. “Making More Sense with Machines.” AI and the Humanities: An Interdisciplinary Symposium, Illinois State University, 16-17 April 2025.

Evans, Daniel. “From Nineteenth-Century Bibliography to Twenty-First-Century Metadata,” Modern Language Association (MLA). New Orleans, LA. January 2025.

Griebel, Sarah, Glen Layne-Worthey, Ryan Dubnicek, Daniel J. Evans, and J. Stephen Downie. “Strictly Speaking: Character Attribution in Literary Dialogue with Language Models.” Digital Humanities 2025, Lisbon, Portugal, 14-19 July 2025.

Layne-Worthey, Glen, Isabel Galina, Hege Høsøien, Sarah Potvin, Caitlin Christian-Lamb, Nickoal Eichmann-Kalwara, Alex Wermer-Colan, Pamella Lach, and Hilary Richardson. “Libraries & DH: Histories, Perspectives, Prospects Mini-Conference.” Digital Humanities 2025, Lisbon, Portugal, 14-19 July 2025.

Walsh, John, and Glen Layne-Worthey. “TORCHLITE and the Open Library: Expanding Access and Interactivity for Cultural Analytics.” Colloque Centre de recherche interuniversitaire sur les humanités numériques en l'honneur de Stéfan Sinclair, Université de Montréal, Québec, 10-12 September 2025.

2024

Publications

Hu, Yuerong, Zoe LeBlanc, Jana Diesner, et al. (2024). Complexities of leveraging user-generated book reviews for scholarly research: Transiency, power dynamics, and cultural dependency. International Journal on Digital Libraries, 25, 317–340. https://doi.org/10.1007/s00799-023-00376-z

Layne-Worthey, Glen, and J. Stephen Downie (eds.) (2024). Journal of Documentation Special Issue: Artificial Intelligence for Cultural Heritage Materials. https://www.emerald.com/insight/publication/issn/0022-0418/vol/80/iss/5

Layne-Worthey, Glen, and J. Stephen Downie (2024). "Special Issue on Artificial Intelligence for Cultural Heritage Materials: Guest Editors' Introduction." Journal of Documentation, v. 80, no. 5, pp. 1025-1030. https://doi.org/10.1108/JD-09-2024-275

Swatscheno, Janet, and Felix Oke (2024). Context Matters: An Introduction to HathiTrust Research Center Tools for Text Analysis. #DLFteach Publications. https://dlfteach.pubpub.org/pub/0a9w5sc5

Presentations

Dubnicek, Ryan and Daniel J. Evans. “Mining the HathiTrust Digital Library: finding and extracting insights from millions of books.” Indiana University Indianapolis Luddy Colloquia Digital Scholarship Series, Indiana University Library, 22 March 2024.

Evans, Daniel J., Clara Belitz, and Catherine Evans. “Computational Gender Flailing: How Queer Identities Glitch Algorithmic Classifications,” The Association for Computers and the Humanities. Virtual Presentation. November 2024.

Evans, Daniel J., Ryan Dubnicek, and Kadin Henningson. “Gender Reveal 19th Century Style: Finding Hidden Cross-Dressing Narratives in Literature,” iSchool Research Showcase, University of Illinois Urbana-Champaign. November 2024.

Evans, Daniel, and Zoe LeBlanc. “What’s the Issue? Overcoming Copyright and Cataloguing Challenges for Computational Periodicals in the HathiTrust Collections,” CHR 2024: Computational Humanities Research Conference, Virtual Presentation. December 2024.

Evans, Daniel, Arina Melkozernova, Juliann Vitullo, Ryan Dubnicek, and Boris Capitanu. “Telling a Story with Data: Shift in the Mediterranean Diet’s discourse from 1950-2020,” The Association for Computers and the Humanities. Virtual Lightning Talk. November 2024.

Evans, Daniel, Jill Naiman, and J. Stephen Downie. “Overcoming OCR Inaccuracies in Historic Newspaper Directories: Improving Digital Archaeology and Digital Humanities Data Pipelines,” DH2024: Digital Humanities Conference. Poster. Washington, D.C. August 2024.

Lamba, Manika, John A. Walsh, Ryan Dubnicek, Jennifer Christie, J. Stephen Downie, Janet Swatscheno, Deren Kudeki, Glen Layne-Worthey. “TORCHLITE: New, Open Analytical Tools and Infrastructure for a Mega-Scale Digital Library.” Poster presented at 2024 ASIS&T Annual Meeting, Calgary, Canada, 25-29 October, 2024.

Parulian, Nikolaus Nova, Ryan Dubnicek, Sarah Griebel, Glen Layne-Worthey, J. Stephen Downie (2024). “From ‘Can’t…’ to ‘Cancún’: Fine-tuning spaCy’s Spanish-Language Transformer Model for Better and More User-Friendly Named Entity Recognition.” Poster presented at ADHO Digital Humanities Conference 2024, August 6 – 10, 2024, Washington, D.C., USA.

Shang, Wenyi, Yuqi Chen, Ryan Dubnicek, Ryan Cordell, J. Stephen Downie (2024). “Interplays Between Materiality and Content in Book History: Evidence from 16th–19th Century Chinese and English Books.” Paper presented at 2024 ADHO Digital Humanities Conference, August 6 – 10, 2024, Washington, D.C., USA.

Walsh, John. “‘On the dusty shelves of libraries’: Exploring the 19th-century with the HathiTrust Digital Library” (invited talk). The Research Society for Victorian Periodicals. 16 February 2024.

Workshops

Swatscheno, Janet, Ryan Dubnicek & Jenny Christie (2024). “History of Black Writing Presents: Introduction to HathiTrust and HathiTrust Research Center.” Workshop presented at the College Language Association 2024 Convention, Memphis, TN, USA, 13 April 2024.

2023

Publications

Parulian, Nikolaus Nova, Ryan Dubnicek, Daniel J. Evans, Yuerong Hu, Glen Layne-Worthey, J. Stephen Downie, Raina Heaton, Kun Lu, Raymond I. Orr, Isabella Magni, John A. Walsh (2023). “Tuning out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature.” In Proceedings of the 2023 ASIS&T Annual Meeting, London, UK, 27-31 October, 2023. https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/pra2.839

Walsh, John, Glen Layne-Worthey, Jacob Jett, Boris Capitanu, Peter Organisciak, J. Stephen Downie. “‘The library is open!’: Open data and an open API for the HathiTrust Digital Library.” (2023) Proceedings of CHR 2023, the Computational Humanities Research Conference. https://ceur-ws.org/Vol-3558/paper7875.pdf

Presentations

Downie, J. Stephen. "Beyond OCR: Non-Textual Opportunities and Challenges at the HathiTrust Research Center" (invited talk). Workshop on Scaling-up Document Image Understanding. The 17th International Conference on Document Analysis and Recognition (ICDAR). 21-26 August 2023, San José, California.

Downie, J. Stephen, Glen Layne-Worthey, Peter Simon, Amy Kirchhoff, Matthew Lincoln. "What is Non-Consumptive Data and What Can You Do With It?" NISO Plus 2023. 14 February 2023, virtual.

Dubnicek, Ryan. “Updates from HathiTrust Research Center: (Some of) What We’re Working On.” Wednesday Noon Digital Scholarship Series, Indiana University Library, 18 January 2023.

Dubnicek, Ryan & Ted Underwood (2023). “Piloting A Machine Learning Approach to Identify English-Language Fiction in the HathiTrust Digital Library” Paper presented at 2023 ADHO Digital Humanities Conference, Graz, Austria, 10-14 July, 2023.

Evans, Daniel, and Melissa Ocepek. “Information Behavior Patterns in Mineable Digital Libraries,” iSchool Research Showcase. Poster. University of Illinois Urbana-Champaign. October 2023.

Melkozernova, Arina, Juliann Vitullo, Ryan Dubnicek, Daniel J. Evans, Boris Capitanu (2023). “Telling a Story with Data: shift in the Mediterranean Diet’s discourse from 1950-2020.” Poster presented at CHR 2023: Computational Humanities Research Conference, December 6 – 8, 2023, Paris, France.

Parulian, Nikolaus, Ryan Dubnicek, Daniel Evans, Yuerong Hu, Glen Layne-Worthey, J. Stephen Downie, Raina Heaton, Kun Lu, Raymond Orr, Isabella Magni, and John Walsh. "Tuning out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature." 86th Annual Meeting of the Association for Information Science and Technology. 27-31 October 2023, London, UK.

Workshops

Swatscheno, Janet, Ryan Dubnicek & Jenny Christie (2023). “HathiTrust Research Center Extracted Features API and Visualization Workshop.” Workshop presented at the Code4Lib 2023 Conference, Princeton, NJ, USA, 14 March 2023.

Research Datasets

Ryan Dubnicek, Boris Capitanu, Glen Layne-Worthey, Jennifer Christie, John A. Walsh, J. Stephen Downie (2023). The HathiTrust Research Center BookNLP Dataset for English-Language Fiction. HathiTrust Research Center. https://doi.org/10.13012/d4gy-4g41

2022

Publications

Bainbridge, D., Hilbing, G., Jiang, M., Hu, Y., Layne-Worthey, G., & Downie, J. S. (2022). “Study on the Accuracy of OCR and NLP-based Detection of Japanese Text in the HathiTrust Extracted Features V2.0 Dataset.” DH2022 Tokyo. DOI: 10.1007/978-3-030-96957-8_35

Jiang, M., D’Souza, J., Auer, S., et al. (2022). “Evaluating BERT-based scientific relation classifiers for scholarly knowledge graph construction on digital library collections.” International Journal on Digital Libraries, 23(2), 197–215. https://doi.org/10.1007/s00799-021-00313-y

Parulian, N. N., Worthey, G., & Downie, J. S. (2022). “An Ensemble Framework for Dynamic Character Relationship Sentiment in Fiction.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 13192 LNCS, (pp. 414-424).

Parulian, Nikolaus Nova, Ryan Dubnicek, Glen Layne-Worthey, Daniel J. Evans, John A. Walsh, J. Stephen Downie (2022). “Uncovering Black Fantastic: Piloting A Word Feature Analysis and Machine Learning Approach for Genre Classification” In Proceedings of 85th Annual Meeting of the Association for Information Science & Technology, Pittsburgh, Pennsylvania, USA, 29 October - 1 November, 2022. https://doi.org/10.1002/pra2.620

Shang, W., Jett, J., Underwood, T., & Downie, J. S. (2022). “Descriptive cataloging issues for non-Western corpora: A case study of late imperial Chinese books.” Cataloging & Classification Quarterly, 61(1), 1–19. https://doi.org/10.1080/01639374.2022.2148800

Presentations

Dubnicek, R. (2022, January 11). “Where to Find Millions of Books and How to ‘Read’ Them: HathiTrust and HTRC.” University of Washington Digital Humanities Colloquium.

Dubnicek, R., Magni, I., Walsh, J.A., Downie, J.S., Graham, M., Layne-Worthey, G. (2022, June 2-3). Scholar-Curated Worksets for Analysis, Reuse & Dissemination (SCWAReD) from the HathiTrust Research Center [Poster]. 2022 Digital Humanities Benelux, University of Luxembourg, Belval, Luxembourg.

Dubnicek, R., Harrison, J., Magni, I., Walsh, J. A., Graham, M., Downie, J. S., & Layne-Worthey, G. (2022). “SCWAReD: Scholar-Curated Worksets from the HathiTrust Research Center.” Digital Humanities Congress, University of Sheffield, Sheffield, United Kingdom, 9 September 2022.

Lu, K., Heaton, R., Orr, R., Vetter, A., Dubnicek, R., Magni, I. (2022, July 25-29). Mining the Native American Authored Works in HathiTrust for Insights. Digital Humanities 2022 Conference, Tokyo, Japan/virtual.

Magni, I., Worthey, G.C., Graham, M., Walsh, J.A., Downie, J.S., Dubnicek, R. (2022, July 25-29). Centering the Marginalized: Scholar-Curated Worksets from the HathiTrust Digital Library [Poster]. Digital Humanities 2022 Conference, Virtual & Tokyo, Japan.

Parulian, N. N., Dubnicek, R., Worthey, G., Evans, D. J., Walsh, J. A., Downie, J. S. (2022, October 29-November 1). Uncovering Black Fantastic: Piloting A Word Feature Analysis and Machine Learning Approach for Genre Classification [Paper]. 85th Annual Meeting of the Association for Information Science & Technology, Pittsburgh, Pennsylvania.

Parulian, N. N., Dubnicek, R., Layne-Worthey, G., Williams, S., West-White, C., Magni, I., Downie, J. S. (2022, July 25-29). Uncovering the Black Fantastic: Piloting Text Similarity Methods for Finding “Lost” Genre Fiction in HathiTrust [Poster]. Digital Humanities 2022, Tokyo, Japan/virtual.

Walsh, J. A. (2022, May 25). Case study: HathiTrust Research Center. Invited presentation at the Text and Data Mining Conference, National Information Standards Organization, Baltimore, Maryland.

Walsh, J. A., Wingate, A., Nurkkala, C., & Christie, J. (2022, September 9). Nineteenth-Century Poets and Their Libraries. Digital Humanities Congress, University of Sheffield, Sheffield, United Kingdom.

Walsh, J. A., Wingate, A., Nurkkala, C., Evans, D., Mertka, A., & Christie, J. (2022, June 2-3). “Bibliographic and textual studies and the personal library.” Paper presented at Digital Humanities Benelux, University of Luxembourg, Belval Campus, Esch-sur-Alzette, Luxembourg.

Workshops

Dubnicek, R., Christie, J., Kudeki, D., Layne-Worthey, G., Walsh, J. A., Downie, J. S. (2022). Workshop: HathiTrust Research Center’s Extracted Features 2.0 Dataset. Workshop presented at the Digital Humanities conference, Tokyo, Japan, 25-29 July 2022.

2021

Publications

Jiang, M., Hu, Y., Worthey, G., Dubnicek, R., Underwood, T., & Downie, J.S. (2021). Evaluating BERT's Encoding of Intrinsic Semantic Features of OCR'd Digital Library Collections. 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), (pp. 308-309). DOI: 10.1109/JCDL52503.2021.00045

Jiang, M., Hu, Y., Worthey, G., Dubnicek, R., Underwood, T., & Downie, J. S. (2021). Impact of OCR Quality on BERT Embeddings in the Domain Classification of Book Excerpts. CEUR Workshop Proceedings, 2989, 266-279. https://ceur-ws.org/Vol-2989/long_paper43.pdf

Jiang, Ming, Yuerong Hu, Glen Layne-Worthey, Ryan Dubnicek, Ted Underwood, J. Stephen Downie. “Impact of OCR Quality on BERT Embeddings in the Domain Classification of Book Excerpts.” In Proceedings of CHR 2021: Computational Humanities Research Conference, vol. 1613, pp 0073. November 17–19, 2021, Amsterdam, The Netherlands. Available: http://ceur-ws.org/Vol-2989/long_paper43.pdf

Organisciak, P., & Downie, J. S. (2021). Research access to in-copyright texts in the humanities. In Information and Knowledge Organisation in Digital Humanities (1st ed., pp. 21). Routledge. https://doi.org/10.4324/9781003131816

Organisciak, P., Schmidt, B. M., & Downie, J. S. (2021). Giving shape to large digital libraries through exploratory data analysis. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24547

Parulian, N. N., & Worthey, G. (2021). Identifying Creative Content at the Page Level in the HathiTrust Digital Library Using Machine Learning Methods on Text and Image Features. In K. Toeppe, H. Yan, & S. K. W. Chu (Eds.), Diversity, Divergence, Dialogue (pp. 478-489). Springer International Publishing. DOI: 10.1007/978-3-030-71292-1_37

Samberg, Rachael, Scott Althaus, David Bamman, Sara Benson, Brandon Butler, Beth Cate, Kyle K. Courtney, Eleanor Dickson Koehl, Glen Worthey, et al. (2021). Building Legal Literacies for Text Data Mining. eScholarship, University of California, 2021. https://berkeley.pressbooks.pub/buildinglltdm/

Presentations

Dubnicek, R. (2021, December 15). Introduction to HathiTrust, HTRC, and the HTRC Extracted Features Dataset. Guest lecture for Digital Humanities: Tools & Methods MA course, University of Groningen.

Dubnicek, R. (2021, October 27). Where to Find Millions of Books and How to “Read” Them: HathiTrust and HTRC. Institute of Advanced Study, Princeton University.

Research Datasets

Jiang, M., Hu, Y., Worthey, G., Dubnicek, R., & Downie, J. S. (2021). The Gutenberg-HathiTrust Parallel Corpus: A Real-World Dataset for Noise Investigation in Uncorrected OCR Texts. iConference 2021 Proceedings. http://hdl.handle.net/2142/109695d

Tutorials

Dubnicek, R. & Kudeki, D. (2021, September 27-30). Introduction to and Hands-On Use Cases with HathiTrust Research Center’s Extracted Features 2.0 Dataset. Tutorial at ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2021, Virtual.

2020

Publications

Chang, K., Hu, Y., Shang, W., Sharma, A., Singhal, S., Underwood, T., Witte, J., & Wu, P. (2020, July 22-24). Book Reviews and the Consolidation of Genre. DH2020 Proceedings, Ottawa (virtually). DOI: 10.17613/02q2-1v27

Hu, Y., Jiang, M., Underwood, T., & Downie, J. S. (2020). Improving Digital Libraries’ Provision of Digital Humanities Datasets: A Case Study of HTRC Literature Dataset. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 405–408. DOI: 10.1145/3383583.3398621

Parulian, N.N., Dubnicek, R., Hall, K.E., Hu, Y., & Downie, J.S. (2020). Evaluating a Machine Learning Approach to Identifying Expressive Content at Page Level in HathiTrust. DH2020 Proceedings, Carleton University and the University of Ottawa, Ottawa, Canada. DOI: 10.17613/3nfw-tx25

Sharma, A., Hu, Y., Wu, P., Shang, W., Singhal, S., & Underwood, T. (2020). The rise and fall of genre differentiation in English-language fiction. CEUR Workshop Proceedings, 2723, 97–114. http://ceur-ws.org/Vol-2723/long27.pdf

Presentations

Bainbridge, D., Downie, J. S., & Whaanga, H. (2020). An open data approach to revealing indigenous texts in large-scale digital repositories: A case-study of locating pages of Māori text in the HathiTrust. ADHO 2020. [abstract]

Dubnicek, R. Where to Find Millions of Books and How to “Read” Them: HathiTrust and HTRC. New Jersey Digital Humanities Consortium, 10 September 2020.

Jett, J., Capitanu, B., Kudeki, D., Dubnicek, R., Cole, T.W., & Downie, J.S. (2020, July). Extending the Utility of the HTRC Extracted Features Dataset Through Linked Data [Poster]. Digital Humanities Conference 2020, Ottawa, Canada. [abstract]

Jett, J., Kudeki, D., Worthey, G., Cole, T. W., & Downie, J. S. (2020). Applying BIBFRAME in large-scale digital libraries: The HathiTrust Research Center's experience. Proceedings of the Association for Information Science and Technology, 57, e410. https://doi.org/10.1002/pra2.410

Parulian, N. N., Dubnicek, R., Eden, K., Hu, Y., & Downie, J. S. (2020, July 22-24). Evaluating a machine learning approach to identify expressive content at page level in HathiTrust [Conference proceeding]. Digital Humanities 2020, Carleton University and the University of Ottawa, Ottawa, Canada. [abstract]

Wong, J. & Dubnicek, R. (2020, March). Piloting a Workflow for Extracting Author Citations in Samuel Johnson’s Dictionary of the English Language [Poster]. iConference 2020, Borås, Sweden.

Datasets

Jett, J., Capitanu, B., Kudeki, D., Cole, T., Hu, Y., Organisciak, P., Underwood, T., Koehl, E., Dubnicek, R., Downie, J.S. (2020). The HathiTrust Research Center Extracted Features Dataset (2.0). HathiTrust Research Center. DOI: 10.13012/R2TE-C227

2019

Publications

Bainbridge, D., Nichols, D. M., Hinze, A., & Downie, J. S. (2019). Using the HTRC Data Capsule Model to Promote Reuse and Evolution of Experimental Analysis of Digital Library Data: A Case Study of Topic Modeling. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), (pp. 463-464).

Weigl, D., Kudeki, D., Cole, T., Downie, J., Jett, J., & Page, K. (2019). Combine or connect: Practical experiences querying library linked data. Proceedings of the 82nd Annual ASIS&T Meeting, 56(1), 296-305.

Plale, B., Dickson, E., Kouper, I., Liyanage, S. H., Ma, Y., McDonald, R. H., Walsh, J. A., & Withana, S. (2019). Safe open science for restricted data. Data and Information Management, 3(1), 50-60.

Presentations

Dickson Koehl, E., Green, H., Henley, A., and Heidenwolf, T. (2019, April). Empowering Librarians to Support Digital Scholarship Research: Professional Development Training on Text Analysis with the HathiTrust. Association of College and Research Library Conference, Cleveland, Ohio.

Downie, J. S., Bainbridge, D., Dubnicek, R. (2019, October 20). Data Without Borders: Exploring International Collaborations with HathiTrust Research Center. International Incubator Session at ASIS&T 82nd Annual Meeting, Melbourne, Australia.

Furlough, M., & Walsh, J. A. (2019). Shaping the market: developing scalable, researcher-oriented text and data mining service. Paper presented at the DCDC (Discovering Collections, Discovering Communities) Conference, Library of Birmingham, Birmingham, UK, November 14, 2019.

Koehl, E. D., Green, H., Henley, A., & Heidenwolf, T. (2019, April). Empowering Librarians to Support Digital Scholarship Research: Professional Development Training on Text Analysis with the HathiTrust [Working Paper]. Association of College and Research Library Conference. Cleveland, OH.

Tutorials

Koehl, E. & Dubnicek, R. (2019, June 6). Text mining with HathiTrust [Tutorial]. ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2019, Urbana, Illinois.

2018

Publications

Bainbridge, D., Downie, J. S., & Capitanu, B. (2018). Providing Pin-point Page-level Precision to 1 Trillion Tokens of Text for Workset Creation. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 407-408).

Dickson, E., Green, H., Nay, L., Courtney, A., McDonald, R. (2018). HathiTrust Research Center User Requirements Study White Paper.

Downie, J. S., Lorang, E., Soh, L.-K., Bainbridge, D., McIntyre, S., & Page, K. (2018). At the Nexus of Data and Collections: New Affordances in the Age of Mass-Scale Digital Libraries. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 313–314).

Dubnicek, R., Underwood, T., & Downie, J.S. (2018, November 10-14). Creating A Disability Corpus for Literary Analysis: Pilot Classification Experiments. Proceedings of the iConference 2018, Sheffield, United Kingdom.

Fenlon, K., Jett, J., Dubnicek, R., Cole, T.W., & Kudeki, D. (2018). Exploring linked data benefits for digital library users. Proceedings of the 81st ASIS&T Annual Meeting, Vancouver, Canada.

Hinze, A., Bainbridge, D., Cunningham, S. J., Taube-Schock, C., Matamua, R., Downie, J. S., & Rasmussen, E. (2019). Capisco: Low-cost concept-based access to digital libraries. International Journal on Digital Libraries, 20(4), 307-334.

Hinze, A., Bainbridge, D., Wilkins, R., Taube-Schock, C., & Downie, J. S. (2018). Seeding strategies for semantic disambiguation. In Proceedings of the 18th ACM/IEEE-CS on Joint Conference on Digital Libraries (JCDL '18). ACM, New York, NY (pp. 343-344).

Page, K. R., Jett, J., Cole, T. W., Kudeki, D., Bainbridge, D., Organisciak, P., & Downie, J. S. (2018). Worksets Expand the Scholarly Utility of Digital Libraries. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 371-372).

Presentations

Bainbridge, D., Downie, J.S., Capitanu, B. (2018, June). Providing Pin-point Page-level Precision to 1 Trillion Tokens of Text for Workset Creation. Joint Conference on Digital Libraries, Fort Worth, TX.

Dickson Koehl, E., et al. (2018, October). Empowering Librarians to Support Digital Scholarship Research: The "Digging Deeper, Reaching Further" project. Digital Library Federation Forum 2018, Las Vegas, NV.

Downie, J.S. (2018, February 13). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited talk to Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Japan.

Downie, J.S. (2018, January 8). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited talk to Department of Computer Science, University of Waikato, Hamilton, New Zealand.

Downie, J.S. (2018, March 16). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited lecture to University of Denver Library, Denver CO.

Downie, J.S. (2018, March 23). Creating universal open access to closed textual data at scale: Use cases from the HathiTrust Research Center. Invited lecture to Research Center for Machine Learning, City University of London, London, UK.

Downie, J.S., Lorang, E., Soh, L., Bainbridge, D., McIntyre, S., Page, K. (2018, June). At the Nexus of Data and Collections: New Affordances in the Age of Mass-Scale Digital Libraries. Joint Conference on Digital Libraries, Fort Worth, TX.

Furlough, M., Green, H., Butler, B. (2018, October). HathiTrust and Non-consumptive Research Services: Prospects. Digital Library Federation Forum 2018, Las Vegas, NV.

Page, K.R., Jett, J., Cole, T.W., Kudeki, D., Bainbridge, D., Organisciak, P., Downie, J.S. (2018, June). Worksets Expand the Scholarly Utility of Digital Libraries. Joint Conference on Digital Libraries, Fort Worth, TX.

2017

Publications

Bainbridge, D. & Downie, J.S. (2017). All for One and One for All: Reconciling Research and Production Values at the HathiTrust through User-Scripting. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Toronto, ON, (pp. 1-2). DOI: 10.1109/JCDL.2017.7991591

Bhattacharyya, S., Merrill, C., Organisciak, P., Schmidt, B. M., Auvil, L., Aiden, E., & Downie, J. S. (2017). Big-Data Oriented Text Analysis for the Humanities: Pedagogical Use of the HathiTrust+Bookworm Tool. DH 2017, Montreal, Canada.

Dickson, E., Tracy, D.G., McIntyre, S., Glushko, B., McDonald, R.H., Butler, B., & Downie, J.S. (2017). Creating a Policy Framework for Analytic Access to In-Copyright Works for Non-Consumptive Research. DH 2017, Montreal, Canada.

Green, H., & Dickson, E. (2017). Expanding the Librarian's Tech Toolbox: The "Digging Deeper, Reaching Further: Librarians Empowering Users to Mine the HathiTrust Digital Library" Project. D-Lib Magazine. DOI: 10.1045/may2017-green

McDonald, R.H. (2017). Research Center as Distant Publisher: Developing Non-Consumptive Compliant Open Data Worksets to Support New Modes of Inquiry. DH 2017, Montreal, Canada.

Murdock, J., Allen, C., Börner, K., Light, R., McAlister, S., et al. (2017). Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library. PLOS ONE 12(9): e0184188. DOI: 10.1371/journal.pone.0184188

Murdock, J., Jett, J., Cole, T., Ma, Y., Downie, J.S., & Plale, B. (2017). Towards Publishing Secure Capsule-Based Analysis. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Toronto, ON, (pp. 1-4). DOI: 10.1109/JCDL.2017.7991585

Organisciak, P., Capitanu, B., Underwood, T., & Downie, J. S. (2017). Access to billions of pages for large-scale text analysis. iConference 2017. http://hdl.handle.net/2142/96256

Page, K., Nurmikko-Fuller, T., Cole, T., & Downie, J.S. (2017). Building Worksets for Scholarship by Linking Complementary Corpora. DH 2017, Montreal, Canada.

Pustejovsky, J., Verhagen, M., Rim, K., Ma, Y., Ran, L., Liyanage, S., Murdock, J., McDonald, R. H., & Plale, B. (2017). Enhancing Access to Digital Media: The Language Application Grid in the HTRC Data Capsule. Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, (pp. 1-3). DOI: 10.1145/3093338.3104171

Weigl, D. M., Page, K. R., Organisciak, P., & Downie, J. S. (2017). Information-Seeking in Large-Scale Digital Libraries: Strategies for Scholarly Workset Creation. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), (pp. 1-4). DOI: 10.1109/JCDL.2017.7991583

Presentations

Downie, J.S. (2017, July 10). Digital humanities using both closed and open data: Use cases from the HathiTrust Research Center. Invited lecture to King’s Digital Lab, King’s College, London, UK.

Downie, J.S. (2017, November 16). HathiTrust Research Center: Text mining the very big data of the HathiTrust Digital Library. Invited keynote lecture to CLICK! Connecting Libraries, Information, and Community Knowledge Conference, Ateneo de Manila University in Quezon City, Philippines.

Downie, J.S. (2017, October 30). HathiTrust Research Center: Strategic approaches to opening research opportunities on closed data. Invited lecture to Shanghai Customs College, Shanghai, China.

Downie, J.S. (2017, September 15). HathiTrust Research Center: Strategic approaches to opening research opportunities on closed data. Invited lecture to International Institute for Digital Humanities, University of Tokyo, Tokyo, Japan.

Downie, J.S. (2017, September 18). HathiTrust Research Center: Strategic approaches to opening research opportunities on closed data. Invited lecture to Institute for Digital Research in the Humanities, University of Kansas, Lawrence, Kansas.

Dubnicek, R. & Organisciak, P. (2017, October 26-27). Data Capsule in 7 Minutes. NovelTM Workshop 2017, Montreal, Canada.

Green, H. (2017, January). Reducing barriers to participation in automated text analysis in the humanities. (Respondent: Sayan Bhattacharyya.) 132nd Annual Convention of the Modern Language Association (MLA), Philadelphia.

Hu, X., Chu, S. K. W., Downie, J. S., & Lee, C. W. Y. (2017, March). Data Science as an Emerging Discipline: The Roles of iSchools in the Era of Big Data. Proceedings of Information Science to Data Science: New Directions for iSchools Workshop in iConference, Wuhan, China.

Liyanage, S. & Murdock, J. (2017, March). HTRC Data Capsule and Python SDK. User group meeting.

Liyanage, S., Organisciak, P., & Downie, J. S. (2017, January). HathiTrust Research Center Architecture Overview. Web-conference #3 for HathiTrust all-sites staff.

Jett, J., Cole, T. W., & Downie, J. S. (2017). Exploiting graph‐based data to realize new functionalities for scholar‐built worksets. Proceedings of the Association for Information Science and Technology, 54(1), 716-717. https://doi.org/10.1002/pra2.2017.14505401128

Murdock, J., Jett, J., Cole, T.W., Ma, Y., Downie, J.S., & Plale, B. (2017). Towards Publishing Secure Capsule-based Analysis. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL).

Organisciak, P., & Franklin, S. (2017). Modeling creativity: Tracking long-term lexical change. In Digital Humanities Conference.

Peng, Z., & Plale, B. (2019). Reliable access to massive restricted texts: Experience‐based evaluation. Concurrency and Computation: Practice and Experience, 32(16). https://doi.org/10.1002/cpe.5255

2016

Publications

Downie, J.S., Furlough, M., McDonald, R.H., Namachchivaya, B., Plale, B.A., & Unsworth, J. (2016, May/June). The HathiTrust Research Center: Exploring the Full-Text Frontier, EDUCAUSE Review 51(3), 50-51.

Green, H., Dickson, E., Nay, L., & Zegler-Poleska, E. (2017). Scholarly Needs for Text Analysis Resources: A User Assessment Study for the HathiTrust Research Center. Proceedings of the 2016 Charleston Library Conference.

Hinze, A., Bainbridge, D., Cunningham, S., & Downie, J.S. (2016, June). Low-cost Semantic Enhancement to Digital Library Metadata and Indexing: Simple Yet Effective Strategies. Proceedings of JCDL 2016 (pp. 93-102).

Jett, J., Cole, T.W., Maden, C., & Downie, J.S. (2016). The HathiTrust Research Center Workset Ontology: A Descriptive Framework for Non-Consumptive Research Collections. Journal of Open Humanities Data, 2, e1.

Murdock, J., Zeng, J., & Allen, C. (2016, January). Towards Evaluation of Cultural-scale Claims in Light of Topic Model Sampling Effects. 2016 International Conference on Computational Social Science.

Organisciak, P., & Capitanu, B. (2016). Text Mining in Python through the HTRC Feature Reader. Programming Historian.

Plale, B. (2016, July/August). HathiTrust Research Center Data Capsule for Full-Text Distant Reading. D-Lib Magazine, 22(7/8).

Zeng, J., & Plale, B. (2015). Workload-Aware Resource Reservation for Multi-tenant NoSQL. 2015 IEEE International Conference on Cluster Computing, 32-41. (Best paper candidate)

Workshops

Bhattacharyya, S. (2016, February 4). Workshop with HT+Bookworm for student teams in the 4Humanities Student Prize Contest 'Why is studying the humanities important?' Scholarly Commons, Main Library, University of Illinois, Urbana-Champaign.

Cline, N. & Mobley, L. (2016, April 14). Text Mining with the HathiTrust: Empowering Librarians to Support Digital Scholarship Research. Scholarly Commons, Main Library, University of Illinois Urbana-Champaign.

Dickson, E. & Bhattacharyya, S. (2016, January 4). Doing Text Analysis with the HathiTrust Research Center’s Tools. University of Texas at Austin.

Dickson, E. (2016, February 10). Text Analysis Methods and Tools. Brownbag at the Illinois Program for Research in the Humanities. University of Illinois at Urbana-Champaign.

Dickson, E. (2016, March 8). Savvy Researcher workshop on Text Analysis. University of Illinois at Urbana-Champaign.

Dickson, E. & Green, H. (2016, April 14). Text Mining with the HathiTrust: Empowering Librarians to Support Digital Scholarship Research. Scholarly Commons, Main Library, University of Illinois, Urbana-Champaign.

Dickson, E., Cline, N. & Mobley, L. (2016, June 12). Text Analysis with the HathiTrust Research Center. Digital Humanities Summer Institute (DHSI 2016), University of Victoria, Canada.

Dickson, E. & Organisciak, P. (2016, August 15). Text Analysis with the HathiTrust Research Center [Workshop]. Digital Humanities at Berkeley Summer Institute and Berkeley Institute for Data Science, University of California, Berkeley.

Green, H. (2016, June 11). Introduction to Text Mining with the HathiTrust Research Center. THATCamp Southern Illinois University Edwardsville.

Datasets

Capitanu, B., Underwood, T., Organisciak, P., Cole, T., Sarol, J.M., & Downie, J.S. (2016). The HathiTrust Research Center Extracted Features Dataset (Version 1.0) [Data set]. http://dx.doi.org/10.13012/J8X63JT3

Organisciak, P. (2016). Term weights for 235k language and literature texts [Data set]. http://hdl.handle.net/2142/89691

Presentations

Bhattacharyya, S. (2016, October 10). Text analysis tools in progress from the HathiTrust Research Center. Mellon Digital Humanities Seminar, Price Lab for Digital Humanities, University of Pennsylvania, Philadelphia, Pennsylvania.

Organisciak, P., Bhattacharyya, S., Auvil, L., Unnikrishnan, L., Schmidt, B., Shamim, M., McDonald, R., Downie, J., & Aiden, E. (2016, July 11-16). Adding Flexibility to Large-Scale Text Visualization with HathiTrust+Bookworm. Digital Humanities 2016, Jagiellonian University & Pedagogical University, Kraków.

Nurmikko-Fuller, T., Jett, J., Cole, T.W., Maden, C., Page, K.R., & Downie, J.S. (2016, July 11-16). A Comparative Analysis of Bibliographic Ontologies: Implications for Digital Humanities. Digital Humanities 2016, Jagiellonian University & Pedagogical University, Kraków.

Jett, J., Nurmikko-Fuller, T., Cole, T.W., Page, K.R., & Downie, J.S. (2016, June 19-23). Enhancing Scholarly Use of Digital Libraries: A Comparative Survey Review of Bibliographic Metadata Ontologies. Joint Conference on Digital Libraries 2016, Newark, New Jersey.

Downie, J.S. (2016). HathiTrust and the Future of Digital Archive [Keynote Address]. International Symposium at University of Tokyo Ito International Academic Research Center.

McDonald, R.H. (2016, September 20). What’s next with the HathiTrust Research Center? Indiana University StatewideIT Day, Bloomington, Indiana.

Bhattacharyya, S. & Shamim, M. (2016, January 8). The HathiTrust+Bookworm Project as a Model for Collaborative Research at Large Scale [Presentation in the panel "Developing and Sustaining Collaborative Research in the Humanities"]. 131st Annual Convention of the Modern Language Association (MLA), Austin, Texas.

Bhattacharyya, S. (2016, January 20). HathiTrust Research Center: Capabilities and Affordances [Presentation made to Stanford University Library digital humanities group and subject specialist librarians' group]. Green Library, Stanford University, Stanford, California.

Organisciak, P. & Bhattacharyya, S. (2016, February 10). New tools from the HathiTrust Research Center for digitized text analysis at scale: The HathiTrust+Bookworm tool and the Extracted Features dataset. E-Research Roundtable, Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, Champaign, Illinois.

Downie, J.S. (2016, February 26). DH Panel: Fair Use and the Future of Digital Scholarship. Scholars Lab, University of Virginia.

Dubnicek, R. & Kinnaman, A. (2016, April). Open Access in Text Analytics [Poster]. iSchool Master’s Student Showcase, Champaign, Illinois.

Downie, J.S. (2016, April 8). The HathiTrust Research Center: Exciting New Cultural Computation Opportunities. Initiative for Digital Humanities, Media, and Culture at Texas A&M University.

Bhattacharyya, S. & Underwood, T. (2016, June). Does Gender Affect How Genre-Conformingly Writers Write? Digital Humanities Summer Institute (DHSI 2016), University of Victoria, Canada.