Use this tool to determine whether a word should be added to SCOWL based on its frequency in the Google Books corpus (1980-2008).
Word                 | Adj. Freq     Newness   Rank | Normal dict | Large dict
similar words        | (per million)                | should incl | should incl
---------------------|------------------------------|-------------|-------------
mitzvoth             | 0.0396        0.9     205950 | **          | incl.
mitzvot              | 7.3x          0.9      68418 | ***         | ****
mitsvot              | 0.5x          1.5     296056 | **          | **
These stats are based on the counts from the 1-grams in Google's Books Ngram dataset for books published between 1980 and 2008. The frequency count ignores case and diacritical marks and does not include words with non-alphabetic characters. The word shown in this report is the best guess at the correct form of the word. For reference, the original words found in the corpus are shown after the representative version, along with their relative frequencies.
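As a rough illustration of the counting rule above, the sketch below merges raw corpus counts under a key that ignores case and accents and drops anything containing non-alphabetic characters. It reads the marks mentioned above as accents (diacritics), and the function names `normalize` and `merge_counts` are hypothetical; the actual scripts behind this report may work differently.

```python
import unicodedata
from collections import Counter


def normalize(word):
    """Return a lowercase, accent-free key, or None if the word is not purely alphabetic."""
    # Decompose accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not stripped.isalpha():
        return None  # words with digits, apostrophes, hyphens, etc. are skipped
    return stripped.lower()


def merge_counts(raw_counts):
    """Sum the counts of all surface forms that share the same normalized key."""
    merged = Counter()
    for word, count in raw_counts.items():
        key = normalize(word)
        if key is not None:
            merged[key] += count
    return merged
```

With this sketch, "Cafe" and "café" would be counted together, while a token such as "x1" would be ignored.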
The frequency count is adjusted to give more weight to newer words. It is defined as the normal frequency times the newness score when the latter is greater than 1. (When the newness score is less than 1, no adjustment is made.) The newness score is defined as the frequency with which the word appears between the years 2006 and 2008 divided by the frequency with which it appears between 1980 and 2008.
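The adjustment described above can be sketched in a few lines. This is only an illustration of the stated definitions; the function and parameter names are hypothetical, and frequencies are assumed to be expressed per million words.

```python
def newness(freq_recent, freq_overall):
    """Newness score: frequency in the recent window divided by frequency over 1980-2008."""
    return freq_recent / freq_overall


def adjusted_frequency(freq_overall, newness_score):
    """Scale the frequency by the newness score, but only when that score is greater than 1."""
    if newness_score > 1:
        return freq_overall * newness_score
    return freq_overall  # older or stable words are left unadjusted
```

For example, a word with an overall frequency of 2.0 and a newness score of 1.5 gets an adjusted frequency of 3.0, while the same word with a newness score of 0.9 stays at 2.0.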
The "should incl" score indicates whether a word should be considered for inclusion in the given dictionary based on the frequency of the word in the corpus. A word that is already included is labeled "incl.". A word with 5 stars (*****) should most likely be included unless there is a good reason not to. A word with 3 stars (***) is still worth considering, and a word with 1 star (*) should most likely not be considered.
The acceptance of less common words in a speller dictionary can also depend on whether there are any similar words that can mask an incorrect spelling of a more common word. To see a list of these words, click on "Show Similar Words" and resubmit the form.
A partial version of the complete list is also available to download at the bottom of this page. It includes all words found in the corpus with a "should incl" score of 3 stars or more for the large dictionary. A version that only includes words not already in the normal size dictionary is also available. Additional reports can be generated fairly easily; please email me at kevina@gnu.org if interested.
Full List (strong candidates)
Full List (3 - 5 star words)
Full List (3 - 5 star words, not already in normal dictionary)