Cryptology: The Phi Distribution for English Texts
To determine empirically the distribution of the inner coincidence index φ(a) for English texts (or text chunks) a, we again take a large English text, in this case the book The Fighting Chance by Robert W. Chambers from Project Gutenberg, and chop it into chunks a, b, c, d, ... of 100 letters each. Then we compute φ(a), φ(b), ... and list the values in the first column of a spreadsheet. The text has 602536 letters. Here is the cleaned text.
We take the first 262006 letters and consider the first 2000 pieces of 100 letters each. The figure and table show some characteristics of the distribution.
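The x-axis of the graphic shows coincidence counts. To get φ-values, divide the x-values by 4950 = 100·99/2, the number of letter pairs in a chunk of 100 letters.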
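The chunking and counting described above can be sketched in Python. This is only an illustration, not the spreadsheet procedure used for the figures here. The function names `coincidences`, `phi`, and `chunks` are hypothetical, and the sketch assumes that "cleaned" means only the letters A to Z are kept, with case ignored.

```python
from collections import Counter


def coincidences(chunk: str) -> int:
    """Count the position pairs (i < j) of the chunk that carry the same letter."""
    counts = Counter(chunk)
    # A letter occurring m times contributes m*(m-1)/2 coinciding pairs.
    return sum(m * (m - 1) // 2 for m in counts.values())


def phi(chunk: str) -> float:
    """Inner coincidence index: coincidences divided by the number of pairs."""
    n = len(chunk)
    pairs = n * (n - 1) // 2  # 4950 for n = 100
    return coincidences(chunk) / pairs


def chunks(text: str, size: int = 100, limit: int = 2000) -> list:
    """Split the letters of the text into at most `limit` full chunks of `size` letters.

    Assumption: cleaning means dropping everything except A-Z after upper-casing.
    """
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    pieces = [letters[i:i + size] for i in range(0, len(letters) - size + 1, size)]
    return pieces[:limit]
```

With the book's text loaded into a string `text`, the list `[phi(c) for c in chunks(text)]` would correspond to the first spreadsheet column. Multiplying each value by 4950 gives the coincidence counts.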
Distribution of φ for 2000 English texts of 100 letters
| Minimum: | 0.0481 | | |
| Median: | 0.0634 | Mean value: | 0.0639 |
| Maximum: | 0.0913 | Standard dev: | 0.0063 |
| 1st quartile: | 0.0594 | 5% quantile: | 0.0549 |
| 3rd quartile: | 0.0677 | 95% quantile: | 0.0750 |
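Characteristics like those in the table could be computed from a list of φ-values with Python's standard `statistics` module, as in the following sketch. The helper name `summarize` is hypothetical. The source does not say which quantile convention or which standard deviation (sample or population) the spreadsheet used, so the choices below (`method="inclusive"` and the sample standard deviation) are assumptions and may differ from the table in the last digit.

```python
import statistics


def summarize(values):
    """Return the characteristics listed in the table for a sequence of phi-values."""
    data = sorted(values)
    # Cut points for quartiles (3 values) and for 5% steps (19 values).
    # Assumption: inclusive interpolation, as many spreadsheets use by default.
    quartiles = statistics.quantiles(data, n=4, method="inclusive")
    twentieths = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "minimum": data[0],
        "median": statistics.median(data),
        "maximum": data[-1],
        "1st quartile": quartiles[0],
        "3rd quartile": quartiles[2],
        "mean": statistics.fmean(data),
        "standard dev": statistics.stdev(data),  # assumption: sample std dev
        "5% quantile": twentieths[0],
        "95% quantile": twentieths[-1],
    }
```

Applied to the 2000 φ-values of the chunks, the result can be compared entry by entry with the table above.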