At the moment, we still do not have enough Yiddish material for a demonstration. Let me demonstrate instead with samples from a German archive.
The archive contains the issues of a newspaper over six months.
In a first step, we calculate the frequency distribution of all lemmatized words and identify the words whose distribution is statistically unexpected. We then classify the issues according to distributional under-representation and distributional over-representation.
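To make this first step concrete, here is a minimal sketch. It assumes that "statistically unexpected" is measured with the log-likelihood ratio (G2), comparing one issue against the rest of the archive; the talk does not name the statistic, so this choice, the function names, and the threshold of 3.84 (p < 0.05 at one degree of freedom) are illustrative assumptions, not the method actually used.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a lemma seen a times in c target tokens and b times in d reference tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def classify_issue(issue_lemmas, rest_lemmas, threshold=3.84):
    """Split lemmas into over- and under-represented sets for one issue.

    Hypothetical helper: both arguments are non-empty lists of lemmas,
    one for the issue and one for all other issues of the archive.
    """
    target = Counter(issue_lemmas)
    reference = Counter(rest_lemmas)
    c, d = sum(target.values()), sum(reference.values())
    over, under = {}, {}
    for lemma in set(target) | set(reference):
        a, b = target[lemma], reference[lemma]
        score = log_likelihood(a, b, c, d)
        if score < threshold:
            continue  # distribution is not unexpected enough
        if a / c > b / d:
            over[lemma] = score   # more frequent in the issue than elsewhere
        else:
            under[lemma] = score  # less frequent in the issue than elsewhere
    return over, under
```

Calling `classify_issue` once per issue gives two word lists for each issue, one for each direction of deviation, which is the input that the next step works on.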
In the next step, we calculate the environment, that is, the surrounding context, of each word in the word list from step 1. We do this separately for the case of under-representation and the case of over-representation, and we sort the results by the strength of the statistical evidence.
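The environment calculation can be sketched in the same spirit. The example assumes that the environment is a fixed window of tokens around each occurrence of the word, and that the evidence is again a log-likelihood ratio over a 2x2 contingency table; the window size, the function names, and the statistic are assumptions made for illustration only.

```python
import math
from collections import Counter


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table."""
    total = k11 + k12 + k21 + k22
    cells = (
        (k11, k11 + k12, k11 + k21),
        (k12, k11 + k12, k12 + k22),
        (k21, k21 + k22, k11 + k21),
        (k22, k21 + k22, k12 + k22),
    )
    score = 0.0
    for observed, row, col in cells:
        if observed > 0:
            score += observed * math.log(observed * total / (row * col))
    return 2 * score


def collocates(sentences, node, window=3):
    """Rank the words within `window` tokens of `node`, strongest evidence first."""
    near, overall = Counter(), Counter()
    for tokens in sentences:
        overall.update(tokens)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            lo, hi = max(0, i - window), i + window + 1
            near.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    n_near, n_all = sum(near.values()), sum(overall.values())
    ranked = []
    for word, k11 in near.items():
        k12 = n_near - k11                     # other words near the node
        k21 = max(0, overall[word] - k11)      # this word away from the node
        k22 = max(0, n_all - n_near - k21)     # other words away from the node
        total = k11 + k12 + k21 + k22
        expected = (k11 + k12) * (k11 + k21) / total
        if k11 > expected:                     # keep attracted words only
            ranked.append((word, g2(k11, k12, k21, k22)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, one would call `collocates` for every word in the over-represented list and, in a separate run, for every word in the under-represented list, so that the two cases stay apart as described.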
The statistically detected collocations become evident. They reflect relevant changes in the outside world, but they also reveal interesting linguistic phenomena.