Corpus Collocates
Corpus Collocates is a table view of the terms that appear most frequently near keywords across the entire corpus.
Use it with a Jane Austen corpus or with your own corpus.
Overview
The table view shows the following three columns by default:
- Term: the keyword (or keywords) being searched
- Collocate: a word found near the keyword
- Count (context): the number of times the collocate occurs near the keyword, within the context window set in Options
An additional Count column, showing the frequency of the keyword in the corpus, can also be displayed (see the Grids guide for more information).
By default, the most frequent collocates are shown for the 10 most frequent keywords in the corpus.
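To make the Count (context) column concrete, here is a minimal JavaScript sketch that counts, for every occurrence of a keyword, the words within a fixed number of positions on either side. It is not Voyant's implementation: `tokenize` and `countCollocates` are hypothetical helpers, and the simple lowercase letters-and-apostrophes tokenization is an assumption (Voyant's own tokenizer and counting rules may differ). The row properties borrow the column names listed in the Spyral configuration below.

```javascript
// Hypothetical sketch, not Voyant's implementation.
// Lowercase the text and split it into word tokens (letters and apostrophes only).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Count the collocates of `keyword` within `context` words on each side.
function countCollocates(tokens, keyword, context = 5) {
  const counts = new Map();
  let rawFreq = 0; // how often the keyword itself occurs
  tokens.forEach((token, i) => {
    if (token !== keyword) return;
    rawFreq++;
    const start = Math.max(0, i - context);
    const end = Math.min(tokens.length - 1, i + context);
    for (let j = start; j <= end; j++) {
      if (j === i) continue; // skip this occurrence of the keyword itself
      counts.set(tokens[j], (counts.get(tokens[j]) || 0) + 1);
    }
  });
  // One row per collocate, most frequent first (ties keep first-seen order).
  return [...counts.entries()]
    .map(([contextTerm, contextTermRawFreq]) => ({
      term: keyword,
      rawFreq,
      contextTerm,
      contextTermRawFreq,
    }))
    .sort((a, b) => b.contextTermRawFreq - a.contextTermRawFreq);
}

const tokens = tokenize(
  "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."
);
console.log(countCollocates(tokens, "man", 2));
```

Each keyword occurrence contributes every word in its window once, so a collocate that is near the keyword several times is counted several times.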
Options
You can specify which keywords to use by typing a query into the search box and pressing Enter (see Search for more advanced searching capabilities).
A slider sets how much context to consider when looking for collocates. Its value is the number of words on each side of the keyword, so the total window is twice that value. The default is 5 words per side (10 words in total), and the slider ranges from 1 to 30.
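The window arithmetic can be sketched as follows. `contextWindow` and `clampContext` are hypothetical helpers, not part of Voyant or Spyral; they clamp the requested context to the slider's range of 1 to 30 and return the words on each side of a keyword. How Voyant treats keywords near the start or end of a document is not described in this guide, so this sketch simply shortens the window at the edges.

```javascript
// Hypothetical sketch of the context window, not Voyant's implementation.
const MIN_CONTEXT = 1;  // slider minimum
const MAX_CONTEXT = 30; // slider maximum

// Keep a requested context value within the slider's range.
function clampContext(context) {
  return Math.min(MAX_CONTEXT, Math.max(MIN_CONTEXT, context));
}

// Return up to `context` words on each side of the keyword at `index`.
function contextWindow(tokens, index, context = 5) {
  const n = clampContext(context);
  return {
    left: tokens.slice(Math.max(0, index - n), index),
    right: tokens.slice(index + 1, index + 1 + n),
  };
}

const words = "one two three four five six seven eight nine ten eleven twelve thirteen".split(" ");
const { left, right } = contextWindow(words, 6); // keyword "seven", default 5 per side
console.log(left.length + right.length); // 10 words in total
```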
Clicking the options icon also lets you define a set of stopwords to exclude (see the Stopwords guide for more information).
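As a rough illustration of what excluding stopwords does to the table, the sketch below parses a comma-separated stopword list (one of the `stopList` formats described in the Spyral configuration below) and drops matching collocates. `parseStopList` and `removeStopwords` are hypothetical helpers, and the rows are invented for illustration, not real counts from the Austen corpus. Named lists and URL-based lists are not handled here.

```javascript
// Hypothetical sketch: parse a comma-separated stopword list into a Set
// (trimmed, lowercased, empty entries dropped).
function parseStopList(list) {
  return new Set(
    list
      .split(",")
      .map((w) => w.trim().toLowerCase())
      .filter((w) => w.length > 0)
  );
}

// Keep only the rows whose collocate is not a stopword.
function removeStopwords(rows, stopwords) {
  return rows.filter((row) => !stopwords.has(row.contextTerm.toLowerCase()));
}

// Invented example rows.
const rows = [
  { term: "love", contextTerm: "the", contextTermRawFreq: 12 },
  { term: "love", contextTerm: "marriage", contextTermRawFreq: 3 },
];
console.log(removeStopwords(rows, parseStopList("the, a, of")));
```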
Spyral
To use Corpus Collocates in Spyral, you can start from the following code. Modify the config object to customize the tool.
let config = {
columns: null, // Which columns to show: 'term', 'rawFreq', 'contextTerm', 'contextTermRawFreq'
context: null, // The number of terms to consider on each side of the keyword.
dir: null, // The direction in which to sort the results: 'asc' or 'desc'
docId: null, // The document ID(s) to restrict the results to.
docIndex: null, // The document index(es) to restrict the results to.
query: null, // A query or array of queries (queries can be separated by a comma).
sort: null, // The column to sort the results by
stopList: null, // A comma-separated list of words, a named list or a URL to a plain text list, one word per line. By default this is set to 'auto' which auto-detects the document's language and loads an appropriate list (if available for that language). Set this to blank to not use the default stopList.
termColors: null, // Which term colors to show in the grid. By default this is set to 'categories' which shows the term color only if it's been assigned by a category. The other alternatives are 'terms' which shows all terms colors, and '' or undefined which shows no term colors.
};
loadCorpus("austen").tool("CorpusCollocates", config);
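As an illustration, here is one way the config object might be filled in to compare the collocates of two keywords with a wider window. The values are example choices rather than defaults, and sorting on `contextTermRawFreq` assumes that this is the column behind Count (context), which the column names above suggest but this guide does not state outright. The `loadCorpus` call is left as a comment because it only exists inside Spyral.

```javascript
// Example configuration; the values are illustrative choices, not defaults.
let config = {
  query: "love,marriage",     // two keywords, separated by a comma
  context: 10,                // 10 words on each side (20 in total); must be 1 to 30
  sort: "contextTermRawFreq", // assumed to be the Count (context) column
  dir: "desc",                // most frequent collocates first
  stopList: "auto",           // auto-detect the language and load its stopword list
};

// Inside Spyral:
// loadCorpus("austen").tool("CorpusCollocates", config);
```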
Please see Tools.CorpusCollocates for more information about configuration.
Additional Information
For a graphical view of corpus collocates, try the Collocates Graph tool.