Searching

You can search for unigrams (single terms) in either the volume metadata or the text of each volume, by page. Since this is a search built on the Extracted Features Dataset, bigram and larger n-gram searches (phrases) are not possible in the conventional sense. Instead, you can search for phrases using quotation marks (e.g. "snow ski"), which will return volumes and pages where all of the terms in the query co-occur. In this way, a search for "snow ski" is equivalent to a Solr syntax search of "snow" AND "ski". See more details about Solr syntax, and a link to a guide, in the section below.

Searches are not case sensitive, and by default, your search will be conducted on pages recognized as English. Click “Search all Languages” if you prefer to search everything. You can also limit your search to specific languages by choosing from those that appear under “Show other languages.” Limit your search to a specific part-of-speech by using the checkboxes under the language, though be aware that part-of-speech search is not available for every language. Wildcard matching is possible using '?' for a single character and '*' for multiple characters, for example 'canad?' and '*land'.
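The wildcard behaviour described above can be illustrated with a short Python sketch. It uses the standard-library fnmatch module as a stand-in for the search engine's own matcher, which is an assumption: fnmatch lets '*' match zero or more characters and also gives '[' and ']' a special meaning, and the real index may differ on both points. The function name wildcard_match and the token list are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, token: str) -> bool:
    """Return True if token matches a '?' / '*' wildcard pattern.

    Both sides are lower-cased first because searches are not case sensitive.
    """
    return fnmatchcase(token.lower(), pattern.lower())


# Hypothetical tokens, used only to illustrate the two example patterns.
tokens = ["Canada", "canady", "Canadian", "Iceland", "land", "landing"]

# '?' stands for exactly one character.
print([t for t in tokens if wildcard_match("canad?", t)])

# '*' stands for a run of characters (zero or more in fnmatch).
print([t for t in tokens if wildcard_match("*land", t)])
```

Here 'canad?' picks up the six-letter tokens only, while '*land' picks up any token ending in "land".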

There are four options for searching: text, metadata, combined and advanced. Text search will search the full text of the volume, at the page level, for a unigram or unigrams (e.g. searching all volumes for the word "rose"). Results returned are volume-level metadata, along with page-level metadata and bag-of-words tokens. Since this is a page-based search, you will receive one result for each page that matches your query. To see results grouped by volume (multiple page results under one volume heading and one result), check the box marked "Sort & Group by Volume" under the search bar. Metadata search will search volume-level metadata fields for given unigrams, and return volumes in which the terms appear in a given (or any) field specified in the drop-down menu (e.g. searching all volumes for those with a publication place of "bl", the MARC code for Brazil). A combined search allows both text and metadata search in a given query (e.g. a search to return all volumes published in Brazil in which the term "rose" appeared on a page). Advanced search allows users familiar with Solr syntax (see below for more information) to construct and execute their own queries.

When searching the page text, it is important to realize that every word you enter is treated as a separate term (a unigram) for the purposes of the query. Effectively, phrase searching of the page text is not possible. This is because Workset Builder is a search interface built on the Extracted Features Dataset, where the sequential order of the words has been removed, effectively making each page a bag of words. The closest approximation is to use the AND operator: for example, the query lawn AND tennis will return all pages where both words appear somewhere on the page. A hyphenated word is processed as a single term, and so behaves like a phrase in terms of indexing: for example, the query "lawn tennis" (in quotes) will find pages where that term appears hyphenated. In the case of volume metadata search, the sequential order of the words is kept. This means phrase searching is possible across metadata.
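To make the bag-of-words point concrete, here is a minimal Python sketch of an AND query over pages whose word order has been discarded. It is an illustration only, not how Workset Builder is implemented: the helper names, the whitespace tokenizer and the three sample pages are all hypothetical.

```python
from collections import Counter


def page_bag(text: str) -> Counter:
    """Reduce a page to token counts, discarding word order.

    Splitting on whitespace keeps a hyphenated word such as
    'lawn-tennis' together as one token.
    """
    return Counter(text.lower().split())


def matches_all(bag: Counter, terms: list[str]) -> bool:
    """AND semantics: every term must occur somewhere on the page."""
    return all(bag[term.lower()] > 0 for term in terms)


# Hypothetical pages, for illustration only.
pages = {
    "p1": page_bag("The lawn was mown before the tennis match"),
    "p2": page_bag("A lawn-tennis club opened nearby"),
    "p3": page_bag("Tennis is played indoors in winter"),
}

# lawn AND tennis: both words on the page, in any order and any position.
and_hits = [pid for pid, bag in pages.items() if matches_all(bag, ["lawn", "tennis"])]

# The hyphenated form is a single term of its own.
hyphen_hits = [pid for pid, bag in pages.items() if matches_all(bag, ["lawn-tennis"])]

print(and_hits, hyphen_hits)
```

Page p1 matches the AND query even though "lawn" and "tennis" are not adjacent, which is why an AND query only approximates a phrase search.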

Search Text

Text search allows users to query the full text, by page, for unigrams (single terms). A version of phrase searching can be achieved using the same method as described under Search Metadata: using quotation marks to initiate a search for multiple terms on the same page of text. By default, text searches will search English-language volumes. If you'd like to search all languages, check the "Search pages in all languages" button underneath the search bar. Currently, part-of-speech information is only available for volumes in English, German, Portuguese, Danish, Dutch and Swedish. While other languages are coded in volume metadata and thus can be retrieved, there will not be part-of-speech data available for those volumes.

Text searches will retrieve volume-level metadata, but the main unit of search and retrieval is the page. Since many pages in a single volume may contain a given unigram, users may wish to check the "Sort & Group by Volume" button directly beneath the search bar, which will present results by the volume, with a list of pages on which the term appears, as compared to multiple volume entries with a single associated page in the results view. 
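The effect of "Sort & Group by Volume" can be pictured as collapsing page-level hits into one entry per volume. The Python sketch below shows that idea under stated assumptions: the function name, the (volume, page) tuple shape and the identifiers are hypothetical and do not reflect the interface's actual result format.

```python
from collections import defaultdict


def group_by_volume(page_hits: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Collapse (volume_id, page_number) hits into one entry per volume."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for volume_id, page_number in page_hits:
        grouped[volume_id].append(page_number)
    # Sort the pages within each volume for a stable display order.
    return {volume_id: sorted(pages) for volume_id, pages in grouped.items()}


# Hypothetical ungrouped results: one row per matching page.
page_hits = [("vol.A", 12), ("vol.B", 3), ("vol.A", 7)]

grouped_hits = group_by_volume(page_hits)
print(grouped_hits)
```

Three page-level rows become two volume entries, with the matching pages listed under each volume.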

Search Metadata

Metadata search allows users to query the catalogue metadata associated with each volume in the corpus (also known as volume metadata). Enter a single term, multiple terms, phrases (in quotes), or any combination thereof. By default, "All Fields" is selected in the drop-down menu next to the search box, so the query is run against every metadata field. Click on the drop-down menu to select a more specific field to search, such as Title.

To search multiple metadata fields, enter your search query in a format called Solr syntax. For example, a search for “title_t:hamlet AND contributorName_t:shakespeare” will return all volumes with “hamlet” in the title field and “shakespeare” in the contributorName field, the field used by the cataloguer to record a personal or corporate name associated with the volume. The same search can be used with an “OR” operator to return volumes that satisfy either condition. Note that there is no space between the colon and the search term. For more information, see this Solr query syntax guide.
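For readers who assemble such queries programmatically, a minimal Python sketch of the field:term pattern follows. The helper fielded_query is hypothetical, not part of Workset Builder or Solr. Only the two field names come from the example above, and the sketch does not escape special characters or quote multi-word terms.

```python
def fielded_query(clauses: list[tuple[str, str]], operator: str = "AND") -> str:
    """Join (field, term) pairs into a Solr-style query string.

    Each clause is written as field:term, with no space after the colon.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f"{field}:{term}" for field, term in clauses)


clauses = [("title_t", "hamlet"), ("contributorName_t", "shakespeare")]

# Volumes that satisfy both conditions.
both = fielded_query(clauses)

# Volumes that satisfy either condition.
either = fielded_query(clauses, operator="OR")

print(both)
print(either)
```

The resulting strings can be pasted into the advanced search box; the only difference between the two is the operator joining the clauses.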
