Cryptology

Statistical Analysis of Ciphertext

Letter Frequencies

Natural languages, and also artificial languages, show typical character frequencies. Texts of about 500 or 1000 letters in a natural language rarely show a significant difference from the typical frequencies.

This allows automating the cryptanalysis based on letter frequencies to a large extent. The web offers several such programs, for example see here or here, and in the ACA Crypto Dropbox.
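As a rough illustration of the counting step that such programs automate, here is a minimal Python sketch. The function name `letter_frequencies` is hypothetical, and restricting the alphabet to the 26 letters A to Z is an assumption; the programs referred to above may work differently.

```python
from collections import Counter

def letter_frequencies(text):
    """Return the relative frequencies of the letters A-Z in text.

    Case is ignored and all other characters are skipped. The result is a
    dict ordered by descending frequency, ties broken alphabetically.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {letter: n / total for letter, n in ordered}
```

Calling `letter_frequencies(ciphertext)` and comparing the top of the result with the typical frequencies of the suspected language is the first step of a frequency-based attack.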

Exercises:

  1. Count the letter frequencies of some texts of your choice using the web forms for single letters and for bigrams.
  2. EJGGZ TGWOF IPOHI HONAW OCIAO TQUPO HZTHI EFOTQ QCHIO TNAIO
    IOHHZ TGUJP QRAOT QCGWO FIIJP ROTQR OTQNA VJHOT RJQJQ EOP
    Hint: The letter frequencies will mislead you. A better approach uses a probable word.
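For readers who prefer to do the counting of Exercise 1 locally instead of using the web forms, the following Python sketch counts bigrams. The function name `bigram_counts` and the `overlapping` parameter are hypothetical. The sketch assumes that blanks only mark the usual groups of five and carry no meaning, so they are removed before counting.

```python
from collections import Counter

def bigram_counts(text, overlapping=True):
    """Count the bigrams (letter pairs) of text, restricted to A-Z.

    With overlapping=True every position starts a bigram (AB, BC, CD, ...);
    with overlapping=False the text is cut into disjoint pairs (AB, CD, ...).
    """
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    step = 1 if overlapping else 2
    return Counter(letters[i:i + 2] for i in range(0, len(letters) - 1, step))

# The ciphertext of Exercise 2, copied as given.
ciphertext = (
    "EJGGZ TGWOF IPOHI HONAW OCIAO TQUPO HZTHI EFOTQ QCHIO TNAIO "
    "IOHHZ TGUJP QRAOT QCGWO FIIJP ROTQR OTQNA VJHOT RJQJQ EOP"
)
most_common = bigram_counts(ciphertext).most_common(5)
```

Disjoint pairs are the natural unit when a polygraphic cipher is suspected; overlapping pairs are the usual choice for monoalphabetic ciphertexts.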

For a description in mathematical terms see here.


Some General Remarks

In most cases the statistical analysis of ciphertext doesn't provide a complete solution. Instead it assesses the probabilities of the different choices in an exhaustion, and in this way significantly reduces the average cost of the exhaustion.
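The idea of assessing the choices in an exhaustion can be illustrated with a toy example. The Python sketch below tries all 26 keys of a simple shift cipher and ranks them by a chi-squared statistic against English letter frequencies, so that the most plausible keys are examined first. Everything here is an assumption made for illustration: the choice of cipher, the chi-squared statistic, the rounded frequency table `ENGLISH`, and the function names are not taken from the text.

```python
from collections import Counter

# Approximate relative frequencies of English letters in percent.
# Rounded, assumed values; any reasonable table serves the same purpose.
ENGLISH = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.8, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.1, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 1.0, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.07,
}

def caesar(text, key):
    """Shift each letter A-Z forward by key positions; drop everything else."""
    return "".join(
        chr((ord(c) - 65 + key) % 26 + 65)
        for c in text.upper() if "A" <= c <= "Z"
    )

def chi_squared(text):
    """Chi-squared distance between the letter counts of text and ENGLISH."""
    n = len(text)
    if n == 0:
        return float("inf")
    counts = Counter(text)
    return sum(
        (counts[c] - n * p / 100) ** 2 / (n * p / 100)
        for c, p in ENGLISH.items()
    )

def rank_keys(ciphertext):
    """Return all 26 keys, ordered from most to least plausible."""
    scored = [(chi_squared(caesar(ciphertext, -k)), k) for k in range(26)]
    return [k for _, k in sorted(scored)]
```

The statistic does not decide anything by itself; it only orders the candidates. For short or atypical texts the correct key may not come first, which is why the remaining candidates are kept in the ranking.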

To escape statistical attacks people invented lots of methods that flatten the distribution of frequencies, for example homophonic, polygraphic, or polyalphabetic ciphers, or compression before encryption. But these methods give little protection against attacks by pattern search or probable words.

There are refined methods of statistical analysis that also apply to more complicated ciphers. These are the subject of Chapter 3.


Author: Klaus Pommerening, 1999-Oct-18; last change: 2014-May-14.