...
Although Fourier transforms have been used in the past by other projects for the purpose of smoothing out noise, our aim is different. (Indeed, if we could preserve all frequency bands without breaking the HathiTrust terms of service, we would!) Instead, we use Fourier transforms to create orthogonal representations of fluctuations at different scales, called phasors, which can be added and subtracted in structure-preserving ways. The mathematical properties of phasors make them well suited for the same kinds of algebraic manipulations that allow word word vectors to represent analogies.
Left: word vectors for London, England, Moscow, and Russia. Right: the vector operations representing an analogy.
Just as word vectors allow us to express the idea that Moscow is to Russia as London is to England using a mathematical equation – London - England + Russia = Moscow – phasors might allow us to represent structural analogies between texts, identifying documents that discuss different topics using the same underlying organization.
...
.
What counts as a duplicate?
...