Cryptology

The Chi Distribution for English Texts

We collect empirical results for 2000 pairs of texts, each 100 letters long.

For English we use the book Dr Thorndyke Short Story Omnibus by R. Austin Freeman, formerly available from Project Gutenberg.

We extract an initial segment of 402347 letters and split it into chunks a, b, c, d, ... of 100 letters each. Then we compute χ(a, b), ... and list the values in the first column of a spreadsheet.
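The chunking and pairing step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used for the spreadsheet: it assumes that χ is the usual cross-product sum (the sum over all letters of the product of the letter counts in both texts, divided by the product of the text lengths), that only the 26 letters A to Z are kept, and that the pairs are the disjoint consecutive chunks (a, b), (c, d), and so on. The function names letters_only, chi and chi_values are hypothetical.

```python
from collections import Counter
import string


def letters_only(text):
    """Keep only the letters A-Z (after uppercasing); assumed preprocessing."""
    return [c for c in text.upper() if c in string.ascii_uppercase]


def chi(a, b):
    """Cross-product sum of two texts (assumed definition of chi):
    sum over letters s of count_a(s) * count_b(s), divided by len(a) * len(b)."""
    count_a, count_b = Counter(a), Counter(b)
    total = sum(count_a[s] * count_b[s] for s in count_a)
    return total / (len(a) * len(b))


def chi_values(text, chunk_len=100, max_pairs=2000):
    """Split the text into chunks of chunk_len letters and return chi for
    disjoint consecutive pairs (chunk 0 with 1, chunk 2 with 3, ...)."""
    letters = letters_only(text)
    n_chunks = min(len(letters) // chunk_len, 2 * max_pairs)
    n_chunks -= n_chunks % 2  # drop an unpaired trailing chunk
    chunks = [letters[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]
    return [chi(chunks[i], chunks[i + 1]) for i in range(0, n_chunks, 2)]
```

Calling chi_values on the letters of the book would yield at most 2000 values, one per pair, which correspond to the first spreadsheet column described above.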

The figure and table show some characteristics of the distribution.

To obtain χ-values, divide the x-values in the figure by 10000.
[Frequency of phi values]

Distribution of χ for 2000 English text pairs of 100 letters

Minimum:       0.0500
Median:        0.0660    Mean value:    0.0663
Maximum:       0.0877    Standard dev:  0.0049
1st quartile:  0.0630    5% quantile:   0.0587
3rd quartile:  0.0693    95% quantile:  0.0745
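Characteristics of this kind can be computed from a list of χ-values with Python's standard statistics module, as in the following sketch. The function name summarize is hypothetical. The choice of the "inclusive" quantile method and of the sample standard deviation is an assumption about what a typical spreadsheet does, so results may differ slightly from the table in the last digit.

```python
import statistics


def summarize(values):
    """Return summary statistics for a list of numbers.

    Quantiles use the 'inclusive' interpolation method and the standard
    deviation is the sample version; both choices are assumptions."""
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    twentieths = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "minimum": min(values),
        "median": statistics.median(values),
        "maximum": max(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "q1": quartiles[0],    # 1st quartile (25%)
        "q3": quartiles[2],    # 3rd quartile (75%)
        "p05": twentieths[0],  # 5% quantile
        "p95": twentieths[-1], # 95% quantile
    }
```

Applied to the list of χ-values for all text pairs, summarize returns one dictionary whose entries correspond to the rows of the table.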

Author: Klaus Pommerening, 2013-Dec-20; last change: 2014-Jan-25.