ACS awardees: Molly Des Jardin, Scott Enderle, Katie Rawson (University of Pennsylvania)
Order and Scale
Much recent discussion of quantitative research in the humanities has concerned scale. Confronted with the vast quantities of data produced by digitization projects over the last decade, humanists have begun exploring ways to synthesize that data to tell stories that could not have been told before. Our ACS project aims to make that kind of work easier by creating compact, non-expressive, non-consumptive representations of individual volumes as vectors. These vectors will contain information not only about the topics the volumes cover, but also about the way they order that coverage from beginning to end. Our hope is that these representations will allow distant readers to investigate the internal structures of texts at larger scales than have been possible before. But now that we've reached the midpoint of our work, our preliminary results have led to some surprising reflections about scale at much smaller levels.
In our approach to the problem of creating document vectors, we use existing methods to create word vectors and then aggregate the vectors for each word in a given text. A simpler method than ours might aggregate the vectors by averaging them together, losing word order information; a more complex method than ours might aggregate the vectors by passing them through a neural network model, producing powerful but opaque document representations. To preserve both word order information and transparent, interpretable document features, we pass the vectors through a Fourier transform and preserve the top ten frequency bands.
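To make the aggregation step concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not our project code: the function name `document_vector` and the parameter `n_bands` are hypothetical, the word vectors are assumed to arrive as an array with one row per word in reading order, and "top ten frequency bands" is read here as the ten lowest frequencies.

```python
import numpy as np


def document_vector(word_vectors, n_bands=10):
    """Summarize a text as the lowest-frequency Fourier bands of its word vectors.

    word_vectors: array of shape (n_words, dim), one row per word, in reading order.
    Returns a real vector of length 2 * n_bands * dim (real parts, then imaginary parts).
    """
    word_vectors = np.asarray(word_vectors, dtype=float)
    n_words, dim = word_vectors.shape

    # Real-input FFT along the word-position axis: each embedding dimension
    # is treated as a signal that varies from the start of the text to the end.
    spectrum = np.fft.rfft(word_vectors, axis=0)

    # Assumption: keep the n_bands lowest frequencies, which describe the
    # slowest changes across the text.
    kept = spectrum[:n_bands]

    # Very short texts have fewer bands than requested; pad with zeros so
    # every document vector has the same length.
    if kept.shape[0] < n_bands:
        pad = np.zeros((n_bands - kept.shape[0], dim), dtype=complex)
        kept = np.vstack([kept, pad])

    # Divide by the text length so that band 0 is the plain average word vector.
    kept = kept / n_words

    return np.concatenate([kept.real.ravel(), kept.imag.ravel()])


# Toy input: 200 random "word vectors" of dimension 5 stand in for a real text.
rng = np.random.default_rng(0)
toy_text = rng.normal(size=(200, 5))
toy_doc = document_vector(toy_text)
```

With this normalization, band 0 is exactly the averaged word vector, so the simpler averaging method is contained in the representation as a special case; the remaining bands carry the information about order that averaging discards.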
...