Cryptology

I.3 Some Statistical Properties of Languages

Contents

  1. Recognizing plaintext: Friedman's Most-Frequent-Letters test [mathematical description as PDF]
  2. Empirical results on MFL scores
  3. Application to the cryptanalysis of the Bellaso cipher
  4. Recognizing plaintext: Sinkov's Log-Weight test [mathematical description as PDF]
  5. Recognizing plaintext: The Log-Weight method for bigrams [mathematical description as PDF]
  6. Empirical results on BLW scores
  7. Coincidences of two texts [mathematical description as PDF] with Perl program
  8. Empirical values for natural languages
  9. Autocoincidence of a text [mathematical description as PDF] with Perl program [online call]
  10. The inner coincidence index of a text [mathematical description as PDF] with Perl program
  11. The distribution of the inner coincidence index [PDF] with Perl program
  12. Sinkov's formula [PDF]
  13. Sinkov's test for the period [PDF], Perl program, testing a short ciphertext
  14. Kullback's Cross-Product Sum statistic [PDF] (with a side remark on Cohen's kappa), Perl program
  15. Adjusting the columns of a disk cipher [PDF], example
  16. Modeling language by a stochastic process [PDF]
  17. Stochastic languages [PDF]

Here is the mathematical part as a single PDF.


Summary

In this section we study certain statistical properties of texts and languages. These help to answer questions such as:

To get useful information on these questions we define some statistical measures and analyze their distributions. The main methods for determining reference values are:

The systematic statistical approach to cryptanalysis was started by William F. Friedman (1891–1969) around 1920. It was extended by his assistants Solomon Kullback (1907–1994) and Abraham Sinkov (1907–1998) in the 1920s and 1930s.

The statistical methodology has developed further since then and now provides a uniform conceptual framework for statistical tests and decisions.

For a systematic treatment of the first question above and for a comparison of several tests a good reference is

Ravi Ganesan, Alan T. Sherman, Statistical Techniques for Language Recognition:
An Introduction and Guide for Cryptanalysts. Cryptologia 17 (1993), 321–366.
An Empirical Study Using Real and Simulated English. Cryptologia 18 (1994), 289–331.

An elementary but mathematically sound introduction to probability and statistics is

A. M. Gleason: Elementary Course in Probability for the Cryptanalyst.
Laguna Hills: Aegean Park Press 1985.


Author: Klaus Pommerening, 1999-Nov-27; last change: 2019-Oct-29.