The once said

When statistics meets linguistics, the fun begins (I admit my idea of fun might be a bit different from what's generally considered fun). In corpus linguistics (that's what they call it) there are many interesting phenomena and emergent patterns to observe.

For example, the words of almost any written text follow a certain distribution: the most common word appears roughly twice as often as the second most common word, three times as often as the third, and so on (this is called Zipf's law). Traversing this table of word frequencies, you'll eventually end up at the lower end of the spectrum, where you'll often find words that appear only once.
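To make the frequency table a bit more concrete, here is a minimal sketch in Python. The function names (tokenize, rank_frequencies, zipf_expected) and the very naive regex tokenizer are my own assumptions for illustration, not a standard corpus-linguistics toolkit, and the tokenizer only understands plain ASCII letters and apostrophes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    This is a deliberately naive tokenizer (an assumption for this sketch);
    real corpora need proper handling of punctuation and non-ASCII letters.
    """
    return re.findall(r"[a-z']+", text.lower())


def rank_frequencies(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokenize(text))
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_expected(top_count, rank):
    """Count that an idealised Zipf distribution predicts at a given rank."""
    return top_count / rank
```

Comparing the observed count at each rank with zipf_expected(top_count, rank) gives a rough feel for how closely a text follows the law. Short texts will deviate a lot, so this is best tried on something book-sized.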

Words that appear only once in a given context are called hapax legomena (singular: hapax legomenon), a transliteration of the Greek for "being said once". The context can be a single text, an author's entire body of work, or even the whole written record of a language. A hapax legomenon can be a relatively common word that just happens to appear only once in a given text, or a rare one that doesn't appear anywhere else.
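Finding the hapax legomena of a text boils down to counting words and keeping those with a count of exactly one. A small sketch, assuming a hypothetical hapax_legomena helper and the same kind of naive ASCII-only tokenizer (both are my own illustration, not an established API):

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def hapax_legomena(text):
    """Return, in alphabetical order, the words that occur exactly once in the text."""
    counts = Counter(tokenize(text))
    return sorted(word for word, count in counts.items() if count == 1)
```

Which words come out depends entirely on what you feed in as the "context": pass a single chapter and you get that chapter's hapaxes, pass an author's collected works and the list changes accordingly.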

Another interesting class of words is nonce words. Unlike hapax legomena, they usually appear quite often in the text where they are found, but can barely be found anywhere else. These words are made up to express something that doesn't (yet) have a term in a given context, which is why they tend to be used frequently within that context.
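One rough way to hunt for such words is to look for words that recur within a text but are missing from some reference vocabulary. The sketch below is only a heuristic under stated assumptions: nonce_candidates, the min_count threshold and the reference_vocabulary set are all hypothetical names of mine, and the result will also include proper names, typos and jargon that merely happen to be absent from the reference.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def nonce_candidates(text, reference_vocabulary, min_count=2):
    """Return words used at least min_count times that the reference does not know.

    reference_vocabulary is any set of lowercase words standing in for
    "everything else", e.g. a word list built from a larger corpus.
    """
    counts = Counter(tokenize(text))
    return sorted(
        word
        for word, count in counts.items()
        if count >= min_count and word not in reference_vocabulary
    )
```

Raising min_count makes the filter stricter about "used often in that context", while a larger reference vocabulary makes it stricter about "barely found anywhere else".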