Document Terms
Document Terms is a table view of term frequencies for each document.
Use it with a Jane Austen corpus or with your own corpus.
Overview
The table view shows the following five data columns by default:
- #: this is the document number (the position of the term's document in the corpus)
- Term: this is the document term
- Count: this is the raw frequency of the term in the document
- Relative: this is the relative frequency of the term in the document (calculated by dividing the raw frequency by the total number of terms in the document and multiplying by 1 million)
- Trends: this is a sparkline graph that shows the distribution of the term within the segments of the document; you can hover over the sparkline to see finer-grained results
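The Relative column's calculation can be sketched in a few lines. This is only an illustration of the formula described above, not Voyant's own code; the function name `relativeFrequency` and the sample numbers are hypothetical.

```javascript
// Relative frequency as described above:
// raw count divided by the total number of terms in the document, times 1 million.
// Hypothetical helper for illustration; it is not part of Voyant or Spyral.
function relativeFrequency(rawCount, totalTerms) {
  if (totalTerms <= 0) {
    throw new RangeError("totalTerms must be positive");
  }
  // Multiply before dividing so whole-number inputs stay exact for as long as possible.
  return (rawCount * 1000000) / totalTerms;
}

// A term that occurs 50 times in a 100,000-term document.
console.log(relativeFrequency(50, 100000)); // 500
```

Scaling to occurrences per million makes values comparable across documents of different lengths.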
Additional columns are available by clicking on the down arrow that appears on the right side of a column header:
- Significance: the significance is measured here as a TF-IDF score, a common way of expressing how important a term is in a document relative to the rest of the corpus
- Z-Score: the Z-Score (or standard score) is a normalized value for the term's raw frequency compared to other term frequencies in the same document (it's the difference between the term's raw frequency and the mean of raw frequencies, divided by the standard deviation of raw frequencies)
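As a rough illustration of these two measures, the sketch below computes a Z-Score from the definition given above and a TF-IDF score using one common textbook variant (raw count times the natural log of corpus size over document frequency). Both function names are hypothetical, the use of the population standard deviation is an assumption, and Voyant's exact formulas may differ.

```javascript
// Z-score of each term's raw frequency within a single document:
// (raw frequency - mean of raw frequencies) / standard deviation of raw frequencies.
// Assumption: population standard deviation (divide by n); Voyant may use another form.
function zScores(counts) {
  const values = Object.values(counts);
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const sd = Math.sqrt(variance);
  const result = {};
  for (const [term, count] of Object.entries(counts)) {
    // If every term has the same count, the deviation is zero; report 0 for all terms.
    result[term] = sd === 0 ? 0 : (count - mean) / sd;
  }
  return result;
}

// TF-IDF, one common variant: rawCount * ln(docCount / docsWithTerm).
// Assumption: this variant is for illustration only, not necessarily Voyant's formula.
function tfIdf(rawCount, docCount, docsWithTerm) {
  if (docsWithTerm <= 0 || docsWithTerm > docCount) {
    throw new RangeError("docsWithTerm must be between 1 and docCount");
  }
  return rawCount * Math.log(docCount / docsWithTerm);
}

// Made-up counts for three terms in one document.
console.log(zScores({ alpha: 2, beta: 4, gamma: 6 }));
// A term with raw count 5 that appears in 2 of 8 documents.
console.log(tfIdf(5, 8, 2));
```

Under this TF-IDF variant, a term that appears in every document of the corpus scores 0, however frequent it is in any one document.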
By default, the terms with the highest per-document frequencies are shown.
Options
You can specify terms by typing a query into the search box and pressing Enter (see Search for more advanced searching capabilities).
Clicking on the options icon allows you to define a set of stopwords to exclude (see Stopwords for more information).
Spyral
To use Document Terms in Spyral, you can use the following code as a starting point. Change the values in the config object to customize the visualization.
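let config = {
    "bins": null,       // number of bins (segments) each document is divided into
    "columns": null,    // which columns to display
    "dir": null,        // sort direction (ascending or descending)
    "docId": null,      // restrict results to one or more document IDs
    "docIndex": null,   // restrict results to one or more document positions in the corpus
    "query": null,      // term query
    "sort": null,       // column to sort the results by
    "stopList": null,   // stopword list to apply
    "termColors": null, // which term colors to show in the grid
};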
loadCorpus("austen").tool("documentterms", config);
Please see Tools.DocumentTerms for more information about configuration.