Cryptology
The Chi Distribution for English Texts
We collect empirical results for 2000 pairs of 100-letter texts.
For English we use the book Dr Thorndyke Short Story Omnibus by R. Austin Freeman, formerly in Project Gutenberg.
We extract a first part of 402347 letters and chop it into chunks a, b, c, d, ... of 100 letters each. Then we compute χ(a, b), ... and list the values in the first column of a spreadsheet.
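The chunking and the χ computation can be sketched in a few lines of Python. This is only a sketch under assumptions that the text above does not state: χ(a, b) is taken to be the cross-product sum of the letter counts of a and b divided by the product of the two lengths, the chunks are paired as consecutive disjoint pairs, and the helper names `letters_only`, `chi`, and `chi_values` are made up for this illustration.

```python
from collections import Counter


def letters_only(text):
    """Keep only the letters A-Z, converted to upper case."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def chi(a, b):
    """Assumed definition: sum over all letters s of
    count_a(s) * count_b(s), divided by len(a) * len(b)."""
    count_a, count_b = Counter(a), Counter(b)
    cross_sum = sum(count_a[s] * count_b[s] for s in count_a)
    return cross_sum / (len(a) * len(b))


def chi_values(text, chunk_len=100, max_pairs=2000):
    """Chop the text into full chunks of chunk_len letters and return
    chi for consecutive disjoint pairs (assumed pairing scheme)."""
    letters = letters_only(text)
    chunks = [
        letters[i:i + chunk_len]
        for i in range(0, len(letters) - chunk_len + 1, chunk_len)
    ]
    pairs = zip(chunks[0::2], chunks[1::2])
    return [chi(a, b) for a, b in pairs][:max_pairs]
```

Calling `chi_values(text)` on the book text read into a string would give a list of up to 2000 values that could then be pasted into the spreadsheet column.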
The figure and table show some characteristics of the distribution.
To get χ-values, divide the x-values in the graphic by 10000.
Distribution of χ for 2000 English text pairs of 100 letters
| Minimum: | 0.0500 | | |
| Median: | 0.0660 | Mean value: | 0.0663 |
| Maximum: | 0.0877 | Standard dev: | 0.0049 |
| 1st quartile: | 0.0630 | 5% quantile: | 0.0587 |
| 3rd quartile: | 0.0693 | 95% quantile: | 0.0745 |
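
The characteristics in the table could be recomputed from a list of χ-values along the following lines. This is a sketch, not the spreadsheet procedure actually used: the function name `summarize` is hypothetical, and both the sample (n − 1) standard deviation and the "inclusive" quantile interpolation are assumptions, since the text does not say which conventions the spreadsheet applied.

```python
import statistics


def summarize(values):
    """Return the characteristics listed in the table for a list of numbers."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    # method="inclusive" is an assumed interpolation convention.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "Minimum": min(values),
        "Median": statistics.median(values),
        "Maximum": max(values),
        "1st quartile": cuts[4],    # 25% cut point
        "3rd quartile": cuts[14],   # 75% cut point
        "Mean value": statistics.mean(values),
        # Sample standard deviation (divisor n - 1), an assumption.
        "Standard dev": statistics.stdev(values),
        "5% quantile": cuts[0],
        "95% quantile": cuts[18],
    }
```

With other conventions the quartiles and quantiles may differ slightly in the last digit, so small deviations from the table would not by themselves indicate an error.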