Advanced Collaborative Support (ACS) is a scholarly service at HTRC offering collaboration between external scholars and HTRC staff to solve challenging problems related to HTRC tools and services.

By working together with scholars, we facilitate computational access to HathiTrust Research Center (HTRC) digital tools as well as the HathiTrust Digital Library (HTDL) based on individual scholarly need. ACS will drive innovation at the scholar's digital workbench by enhancing and developing new techniques for use within the HTRC platform.

Calls for proposals to participate in the ACS program go out approximately once per year. For questions, please send an email to acs@hathitrust.org.

2024 Awardee

Architecture and life: Data-mining and computational analysis of the architectural discourse for comparisons between buildings and human bodies, body systems, and living organisms

Christopher Reinhart (Indiana University)

With the rise of “smart homes,” IoT devices, and machine learning implemented into commercial building automation systems, it is common to hear professional architects and engineers, as well as the broader public, speak of the “brain” of buildings (Erhardt et al., 2022). This is only the most recent of a long lineage of metaphors comparing building systems to human body systems, including the “skin” of a building (envelope), the “skeleton” (structure), the “heart” (pumps), and the “lungs” (ventilation). Comparisons of buildings to human bodies are found in the earliest written work in the Western tradition, Vitruvius’ De Architectura, in which he “puts forth a vision of architecture understood as an appreciation of the human body as its regulating system, based on the ‘optimal proportions of the human body’” (Perez de Vega, 2018). Le Corbusier, one of the most influential architects of the 20th century, wrote of “the conception of a LIVING ORGANISM” (1946) related to engineering/architecture. The all-caps emphasis is Corbu’s, and it is not the only such reference in his writing. His contemporary and rival, Frank Lloyd Wright, “thought that a building should function like a cohesive organism, where each part of the design relates to the whole” (“Organic architecture,” 2023). In the mid-20th century, the noted historian Sigfried Giedion wrote of “architecture as an organism” and “architecture as an independent organism” (1967). In postwar Japan, “Metabolist architects imagined large-scale megastructures based on biological concepts” (Gardner, 2020). Outside the mainstream, Glenn Howard Small has been exploring his “biomorphic biosphere” idea for decades (Small, L., 2002). These are just a few bread crumbs in a long trail leading to contemporary ideas.
Biophilic design is an evidence-based approach to design that recognizes humans’ innate affinity for nature and living systems as psychologically hard-wired through evolutionary processes (Kellert et al., 2008; Browning et al., 2014). Biomimicry uses nature as a source of inspiration and “genius” for solving design problems and emulates solutions from other life forms in human technologies to solve the same problems (Benyus, 2002). Despite the ubiquity of these comparisons, there has been no research that seeks to catalog and analyze all such comparisons. This study intends to fill that gap by defining and creating a workset representing the architectural discourse and using data-mining and other computational methods, alongside human analysis, to identify, organize, and analyze comparisons between architecture and life, human bodies, and body systems.

2021 - 2023 Awardees

Projects funded by the Andrew W. Mellon Foundation through the Scholar-Curated Worksets for Analysis, Reuse & Dissemination (SCWAReD) grant project.

Mining the Native American Authored Works in HathiTrust for Insights

Kun Lu, Raina Heaton, and Raymond Orr (University of Oklahoma)

This project seeks to compile a collection of Native American authored works in HathiTrust and apply various text mining methods to the collection to reveal the coverage, subjects, perspectives, and writing styles of Native authors. A list of Native authors and their works will be compiled from an existing database created by a member of the project team and from other online resources. This list will be aligned with the HathiTrust digital library to create a workset of Native American authored works in HathiTrust for further analysis. Then, a variety of text mining methods will be used to analyze the subjects, topics, language use, and writing styles of Native American authors. Comparative analysis will be carried out to understand the characteristics of this textual community. The project is expected to develop a database of Native American authors and the bibliographic information of their works, create a reusable workset of Native American authored works in HathiTrust, identify potential gaps in the HathiTrust corpus on this textual community, and provide insights into the characteristics of the community by text mining their works.

The Black Fantastic: Curated Vocabularies, Artifact Analysis and Identification

Clarissa West-White (Bethune Cookman University) and Seretha Williams (Augusta University)

This project focuses on identifying Black Fantastic texts in the HathiTrust Digital Library. The project proposes that characteristics of the Black Fantastic—the cultural production of African Diasporic artists and creators who engage with the intersections of race and technology in their work—exist in historical and current cultural artifacts, including those created by and about future-forward personalities, such as Dr. Mary McLeod Bethune. It builds on previous and ongoing work to create a bibliography of the Black Fantastic that is featured in Third Stone Journal. Works in HathiTrust will be analyzed along with Black Fantastic artifacts from other collections, such as the Dr. Mary McLeod Bethune collection in the Bethune-Cookman University archives. By working across collections, the project will test methods for locating Black Fantastic texts and lives.

Creating Period-Specific Worksets for Latin American Fiction

José Eduardo González (University of Nebraska, Lincoln)

This project seeks to create large datasets to research the history of Latin American fiction and question traditional periodization of this literature by attempting to detect the boundaries between literary periods and subgenre distinctions in Latin American fiction. It will look critically at the techniques for detecting genre distinctions that have developed over the last few years and evaluate how they apply to the particular development of the Latin American literary system. While many of the subgenres in the English-speaking literary market such as detective fiction, the Gothic novel, and speculative fiction have followers in Latin America, the genres that have traditionally been considered important for the changes in the literary history of the region are less formulaic and more closely linked to national and regional historical and/or social developments. Instead of attempting to identify canonical documents that typify a genre, this project will examine how documents diverge from a particular canon in order to explore the social and cultural reasons an author might accept or deviate from a dominant style.

The National Negro Health Digital Project: Recovering and Restoring a Black Public Health Corpus

Kim Gallon (Purdue University)

This project draws on HathiTrust’s collection of public health documents on Black health to explore how early twentieth-century Black public health officials communicated and addressed health disparities that impacted African American communities. The major goal of the project is to create a series of worksets and visualizations that scholars and students of African American health and medicine, along with public health experts and physicians, can use to deepen historical narratives about Black health that might offer insight into the development of contemporary health communications targeted toward African American communities. The project also establishes some of the research for Technologies of Recovery: Black DH Theory and Praxis, a book in progress. Finally, the work will fill a gap in the history of African American public health.

2020 Awardees

Read project updates

...

This project leverages HathiTrust’s U.S. Federal Documents Collection to investigate how materials produced by the U.S. federal government document shifts in terminologies of ethnoracial difference. The project will focus on the documents and materials published by the Department of Education (formerly United States Department of Health, Education, and Welfare) and related congressional documents from hearings in specialized subcommittees from 1958 until the present. It will explore how the rhetorics of ethnoracial difference overlapped with the growing allocation of federal resources to postsecondary institutions, particularly Minority Serving Institutions, in the latter half of the 20th century. The start of the National Defense Education Act in 1958 was a watershed moment that signaled the greater engagement of the federal government in higher education. The subsequent passing of the Higher Education Act in 1965, alongside amendments through the 1990s and 2000s, allocated specific federal appropriations to support colleges and universities, including Historically Black Colleges & Universities, Tribal Colleges & Universities, Hispanic Serving Institutions, and Asian American & Native American Pacific Islander Serving Institutions. The project contributes to current work focusing on the history of federal responses to higher education in the United States, and the growing visibility of Minority Serving Institutions as a valuable part of the postsecondary sector in United States higher education.

2019 Awardees

Read project updates

Building Large-Scale Collections of Genre Fiction

...

This project will develop methods for automatically constructing large-scale collections of genre fiction from HathiTrust. Even, and especially, in digital libraries as large as HathiTrust, it can prove challenging to understand whether the library contains suitable representations of a chosen genre. The researchers plan to focus on collections of speculative fiction novels as a case study, but they intend their solutions to be generalizable. They will identify robust methods for correlating author-title pairs to matching volume sets in HathiTrust. Using these methods in conjunction with lists of novels that were curated by hand, they will build their collections and investigate which works are (over)represented and which are missing. They expect their project will enable scholars to better understand the suitability of studying genre fiction through HathiTrust and highlight underserved author and genre groups. Moreover, the project will result in collections of genre fiction which can be readily reused and reorganized for different lines of humanistic inquiry.
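As a rough illustration of what correlating author-title pairs to volumes can involve, the sketch below normalizes both sides and compares them with a fuzzy string ratio. It is not the project's method: the record fields (`id`, `author`, `title`), the volume ids, the sample catalog, and the 0.85 threshold are all assumptions chosen for the example.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str, sort_tokens: bool = False) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    tokens = cleaned.split()
    if sort_tokens:
        # Sorting makes "Le Guin, Ursula K." and "Ursula K. Le Guin" compare equal.
        tokens = sorted(tokens)
    return " ".join(tokens)


def similarity(a: str, b: str, sort_tokens: bool = False) -> float:
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(
        None, normalize(a, sort_tokens), normalize(b, sort_tokens)
    ).ratio()


def match_volumes(author, title, catalog, threshold=0.85):
    """Return ids of records whose author and title both meet the threshold."""
    return [
        record["id"]
        for record in catalog
        if similarity(author, record["author"], sort_tokens=True) >= threshold
        and similarity(title, record["title"]) >= threshold
    ]


# Hypothetical catalog records; the ids are placeholders, not HathiTrust identifiers.
catalog = [
    {"id": "vol-001", "author": "Le Guin, Ursula K.", "title": "The left hand of darkness."},
    {"id": "vol-002", "author": "Le Guin, Ursula K.", "title": "The dispossessed"},
    {"id": "vol-003", "author": "Butler, Octavia E.", "title": "Kindred"},
]

matches = match_volumes("Ursula K. Le Guin", "The Left Hand of Darkness", catalog)
```

Works on a hand-curated list that return no match at any reasonable threshold are candidates for the "missing" side of the representation analysis described above.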

Project report: Building Large-Scale Collections of Genre Fiction: Final Report

Mapping scientific names to the HathiTrust Digital Library

...

This project will create an index of all the scientific names of the Earth’s species found within the HathiTrust corpus. The index, which will likely measure in the hundreds of millions to billions of entries, will consist of a simple link between the scientific name and the volume and page location of that name within HathiTrust. The index will assist in identifying volumes that may be medically relevant, for example by identifying all of the volumes containing the scientific name for the mosquito that carries illnesses such as Zika virus (‘Aedes aegypti’). The index will also allow volumes to be grouped into clusters based on which scientific names they contain to show which taxa (e.g. “mammals”) are most common. This team of researchers has completed similar work across the data of the Biodiversity Heritage Library. Their ACS project will allow them to do cross-corpora comparisons.
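The "simple link" between a name and its volume and page locations can be pictured as an inverted index. The following is a minimal sketch under stated assumptions, not the team's implementation: it takes a pre-supplied list of names and uses plain case-insensitive substring matching, whereas real name finding is considerably more involved. The volume ids and page texts are invented for the example.

```python
from collections import defaultdict


def build_name_index(volumes, known_names):
    """Map each known name to the (volume_id, page_number) pairs where it occurs.

    volumes: dict of volume_id -> list of page texts (page numbers start at 1).
    known_names: iterable of scientific names to look for (assumed given).
    """
    index = defaultdict(list)
    for volume_id, pages in volumes.items():
        for page_number, text in enumerate(pages, start=1):
            lowered = text.lower()
            for name in known_names:
                # Naive substring match; a stand-in for a real name-finding tool.
                if name.lower() in lowered:
                    index[name].append((volume_id, page_number))
    return dict(index)


# Hypothetical volumes with made-up ids and page text.
volumes = {
    "vol-a": ["Notes on Aedes aegypti in the tropics.", "No names on this page."],
    "vol-b": ["Aedes aegypti and Homo sapiens share this habitat."],
}
known_names = ["Aedes aegypti", "Homo sapiens", "Canis lupus"]

name_index = build_name_index(volumes, known_names)
volumes_with_aedes = sorted({vol for vol, _page in name_index["Aedes aegypti"]})
```

Looking up a single name then yields every volume that mentions it, which is the kind of query the mosquito example describes.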

Project report: Global Names and the HathiTrust: Towards comprehensive indexing of taxon names in real time

Supporting The Conglomerate Era Project

...

This project furthers the researcher’s investigation into how the conglomeration of the publishing industry changed literature. The results will be included in the researcher’s in-progress book titled The Conglomerate Era: A Computational History of Literature in the Age of the Agent. The project explores a set of publisher-based corpora to see whether there are distinctions in what is published by large publishing houses versus independent presses. It will make use of predictive modeling to further the researcher’s existing work to build a computational model of genre that aids in identifying latent patterns in the publishers’ editorial practices. The project will utilize methods such as genre detection through unsupervised modeling; stylistic differentiation through text classification and supervised learning via logistic regressions; and social network analysis with metadata to determine latent literary connections, especially with regard to gender and race of the author.

Project report: The Conglomerate Era Project

Deriving Basic Illustration Metadata

...

This project aims at identifying all pictorial elements in educational texts from 1800-1850 to explore the interplay between progressive education and print media in the early nineteenth century. The resulting research will characterize the extent to which wood engravings and other reprographic materials were shared among educational publishers. The researcher will extract specific features from page images, such as illustration location, using advances in machine learning. The project intends to make use of the process developed to identify pictorial elements to motivate a new metadata field that describes the location and type of illustrations on the page. An ultimate goal of the project is to move toward “machine-read” texts where the data generated by classifiers and dimensionality reduction techniques are bundled as metadata with the corresponding volumes and made available to future research. (“Machine-read” is a term borrowed from researcher Ben Schmidt.)

Project report: Derived Metadata for Early 19C Illustrations: ACS Grant Final Report

Semantic Phasor Embeddings

...

This project intends to explore a novel way of abstracting and representing textual data that could aid in new ways of discovering and deduplicating items in HathiTrust, detecting and analyzing genre, or analyzing narrative analogies. The project team will investigate the utility of a certain kind of mathematical representation of text documents, called semantic phasor embeddings, that combines a mathematical structure called phasors with data from standard word embeddings (strings of numbers that represent an item). If successful, the vectors could represent documents with a tunable degree of granularity, which could provide an opportunity to share vectors representing copyright-protected items without concerns about wholesale text reproduction. The vectors would also carry valuable information about the global ordinal structure of the volumes, so that the items could be queried, clustered, and visualized in a robust way that recognizes similarity not just in the content of the items, but also in their structure.

2017 Awardees

Computational Support for Reading Chicago Reading

Robin Burke, John Shanahan, Ana Lucic (DePaul University)

The Reading Chicago Reading team will seek to extend their own research on the “One Book, One Chicago” (OBOC) city-wide reading program by incorporating textual analysis of books chosen for the OBOC program, as well as comparison texts. Further, the resulting textual analysis (including toponym extraction, sentiment analysis, and story arc detection) will be paired with library patron, circulation, and demographic data to present a fuller picture of the OBOC program and the books chosen for inclusion.

Project report: Computational Support for ‘Reading Chicago Reading’

Modeling the History of Book Design

David Bamman and Bjorn Hartmann (University of California, Berkeley)

This project will utilize the HTRC Data Capsule to conduct feature extraction on page images from 10,000 in-copyright books in the HathiTrust repository, extracting features such as page construction, line justification, leading between baselines, kerning between letter pairs/combinations, line density per page, characters per line, position of images, typeface (serif, sans-serif) and font size. Beyond the analysis and utility of the extracted feature set, this project also seeks to serve as a use case for engagement with HathiTrust/HTRC beyond books-as-strings-of-words analysis.

Project report: Modeling the History of Book Design, HTRC Whitepaper: Summary of Activities

The Power of Place: Structure, Culture, and Continuities in U.S. Women’s Movements

Laura Nelson (Northeastern University)

Dr. Nelson’s project will study the women's movement in the United States from 1848-1975 in two cities, New York City and Chicago, using new advances in network analysis and computational text analysis to identify structural and cultural diversity. This approach is three-pronged: building a workset of writing by individuals and organizations within the movements in New York and Chicago, using network analysis to measure the structure of this movement, and conducting computational text analysis to measure the underlying culture and ideas within the movement, including lexical analyses to identify distinctive words and topic modeling to identify dominant themes.

Project report: The Power of Place: Structure, Culture, and Continuity in U.S. Women's Movements

A Computational History of the U.S. Novel, 1950-2000

Richard Jean So (McGill University)

Dr. So’s project seeks to write a new history of the American novel by examining a series of large textual datasets focused on the full cycle of the U.S. literary field from production to reception to canonization. The major goal is to identify the emergence of new patterns of language, style, discourse and themes in American novels as they appear at different moments in the cycle of literary production and reception, including publication via large publishing houses such as Random House, and book reviews in major U.S. periodicals. This will be achieved through using the HTRC Data Capsule environment to undertake text analysis of full texts, including using various methods in Machine Learning and Natural Language Processing, such as topic models, word embeddings, and specialized tools such as BookNLP, which allows for the extraction of grammatical dependencies and characters.

Project report: A Computational History of the U.S. Novel, 1950-2000

Measuring Literary Novelty

Laura McGrath, Devin Higgins, and Arend Hintze (Michigan State University)

This work draws on ongoing collaborative efforts to develop a method for applying genetic sequencing tools to the study of literature in order to identify and measure literary novelty, and address questions of literary history, canonicity, and prestige. Previous results have been suggestive of a prominent connection between the purely information-based novelty of the sequences of characters that comprise literary texts and the experimental newness we associate with modernist literary texts. Leveraging the HTRC Data Capsule will offer the potential to apply this theory at scale for the first time, and may lead to new research into modernism and the literary history of the 20th century.

A Writer’s Workshop Workset with the Program Era Project (PEP)

Nicholas Kelly, Loren Glass, and Nikki White (University of Iowa)

The PEP team will compile a proof-of-concept workset with, at first, prominent individuals (faculty, staff, students) who were involved with the Iowa Writers’ Workshop (IWW), then produce “style cards” for each author’s works (by volume), based on stylometric data gathered through text analysis of the IWW workset within the HTRC Data Capsule. It is the goal of the project to also create a living workset that can be continually updated for scholars who wish to engage with IWW authors and their writing.

Project report: Program Era Project

Off-Cycle project:

The Life of Words

David-Antoine Williams (The University of Waterloo)

...

Project report: The Life of Words

2016 Awardees

Fighting Fever in the Caribbean: Medicine and Empire, 1650-1902

Mariola Espinosa (University of Iowa)

...

Project report: Fighting Fever in the Caribbean

Inside the Creativity Boom

Samuel Franklin (Brown University)

...

Project report: Inside the Creativity Boom

The Chicago School: Wikification as the First Step in Text Mining in Architectural History

Dan Baciu (Illinois Institute of Technology)

...

Project report: The Chicago School: Evolving Systems of Value

Signal and Noise and Pride and Prejudice: Toward an Information History of Romantic Fiction

Dallas Liddle (Augsburg College)

...

Project report: Signal and Noise and Pride and Prejudice

2015 Awardees

The Trace of Theory

Geoffrey Rockwell (University of Alberta), Laura Mandell (Texas A&M University), Stefan Sinclair (University of Alberta), Matthew Wilkens (University of Notre Dame), and Susan Brown (University of Notre Dame)

...
