Cryptology

Letter Frequencies in Artificial Languages

MS-DOS-EXE Files

Byte              Frequency
00                ca. 8%
8B E8             ca. 3%
06 FF 20 74       ca. 1.8%
04 50 02 75 01    ca. 1.4%
03 46 65 B8       ca. 1.4%

The values fluctuate widely.

The frequency of the bytes 65 (letter e) and 20 (space symbol) depends on the proportion of natural-language text embedded in the file.

Therefore cryptanalysis by statistical methods is significantly more difficult than with natural-language text.

The method of pattern recognition is indispensable.
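
A byte frequency table like the one above can be produced with a few lines of Python. This is only a minimal sketch: the function name `byte_frequencies` and the sample bytes are made up for illustration and are not taken from the files analyzed here.

```python
from collections import Counter

def byte_frequencies(data: bytes, top: int = 10) -> list[tuple[str, float]]:
    """Return the `top` most frequent byte values as (hex value, percent) pairs."""
    if not data:
        return []
    counts = Counter(data)          # iterating over bytes yields integers 0..255
    total = len(data)
    return [(f"{value:02X}", 100.0 * n / total)
            for value, n in counts.most_common(top)]

# Illustrative input only; for a real file use open(path, "rb").read().
sample = bytes.fromhex("00 00 00 8B E8 00 20 65 8B 00")
for hex_value, percent in byte_frequencies(sample, top=3):
    print(f"{hex_value}  ca. {percent:.1f}%")
```

Running the same function over several executables would show how much the values fluctuate from file to file.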


Programs in Pascal

Character        Frequency       Remark
Space symbol     ca. 25%         An author who uses a lot of separating lines :-)
- (dash)         ca. 6%
e LF CR          ca. 3.5%
R N ; : E n      ca. 2-2.5%      depend heavily on the author
O a T , I l      ca. 1.5-2%
u = ' ( ) D S    ca. 1.2-1.4%

The frequencies depend on the style of the programmer. In the example the author

Specific Characteristics:

Cryptanalysis by statistical methods is significantly more difficult than with natural text.

Pattern recognition methods are indispensable.


MS-Word (.doc Files)

Byte          Frequency
00            ca. 7-70%
01            ca. 0.8-17%
20 = space    ca. 0.8-12%
65 = e        ca. 1-10%
FF            ca. 1-10%

Observations:

Here is a sample encrypted MS-Word file. Exercise: Analyze it and decrypt it. You will need a hex editor, and maybe an XOR program.

The last remark leads to an efficient method of cryptanalyzing XOR-encrypted files that use a periodically repeated key: Add the ciphertext blocks pairwise (by XOR, that is, bitwise binary addition). If one of the two blocks corresponds to a plaintext block consisting predominantly of zero bytes, then the sum yields readable plaintext.

Plaintext     ...  a_1  ...  a_s   ...  0     ...  0     ...
Key           ...  k_1  ...  k_s   ...  k_1   ...  k_s   ...
Ciphertext    ...  c_1  ...  c_s   ...  c_1'  ...  c_s'  ...

where c_i = a_i + k_i and c_i' = 0 + k_i for i = 1, ..., s.

Hence c_i + c_i' = a_i + k_i + k_i = a_i, so a plaintext block is revealed,

and k_i = c_i', so the key is revealed.

If the addition of two ciphertext blocks yields a null block, then the corresponding plaintext blocks are identical, and with some probability both consist of zeroes only. In that case each of the two ciphertext blocks is the key itself.
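
The pairwise addition of blocks can be sketched in Python as follows. The sketch assumes that the key length s is already known and that the blocks are aligned with the key. The key `KEY!`, the plaintext, and all function names are invented for the demonstration.

```python
from itertools import combinations

def xor_blocks(x: bytes, y: bytes) -> bytes:
    """Add two equally long blocks bytewise by XOR."""
    return bytes(a ^ b for a, b in zip(x, y))

def encrypt_periodic_xor(plaintext: bytes, key: bytes) -> bytes:
    """XOR the plaintext with the periodically repeated key."""
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

def split_blocks(data: bytes, s: int) -> list[bytes]:
    """Cut data into consecutive blocks of length s, dropping a short tail."""
    return [data[i:i + s] for i in range(0, len(data) - s + 1, s)]

def pairwise_sums(ciphertext: bytes, s: int):
    """Yield (i, j, block_i XOR block_j) for all block pairs with i < j."""
    blocks = split_blocks(ciphertext, s)
    for (i, x), (j, y) in combinations(enumerate(blocks), 2):
        yield i, j, xor_blocks(x, y)

# Made-up demonstration data: two text blocks around two all-zero blocks.
key = b"KEY!"
plaintext = b"Word" + bytes(8) + b"file"
ciphertext = encrypt_periodic_xor(plaintext, key)

for i, j, total in pairwise_sums(ciphertext, len(key)):
    if not any(total):
        # Null sum: the two plaintext blocks are equal, possibly both zero,
        # in which case the ciphertext block is the key.
        candidate = split_blocks(ciphertext, len(key))[i]
        print(f"blocks {i},{j}: null sum, key candidate {candidate!r}")
    elif all(32 <= b < 127 for b in total):
        # Printable ASCII only: probably one block was zero, the other text.
        print(f"blocks {i},{j}: readable sum {total!r}")
```

The printable-ASCII test is a crude stand-in for judging readability by eye in a hex editor. With a real file, the key candidates from null sums would still have to be confirmed by trial decryption.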

[For more general shift ciphers the method works the same way, except that the individual steps use the inverse group operation, that is, subtraction instead of addition.]
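
As an illustration of the bracketed remark, take bytewise addition modulo 256 as the group operation. This choice, like the key and the function names below, is an assumption made only for this sketch. Subtracting a ciphertext block that belongs to an all-zero plaintext block then plays the role that XOR addition plays above.

```python
def add_mod256(x: bytes, y: bytes) -> bytes:
    """Encrypt: add key bytes to plaintext bytes modulo 256."""
    return bytes((a + b) % 256 for a, b in zip(x, y))

def sub_mod256(x: bytes, y: bytes) -> bytes:
    """Inverse group operation: bytewise subtraction modulo 256."""
    return bytes((a - b) % 256 for a, b in zip(x, y))

# Made-up key and plaintext blocks of equal length s = 4.
key = bytes([200, 17, 99, 250])
text_block = b"Word"
zero_block = bytes(4)

c = add_mod256(text_block, key)        # ciphertext of the text block
c_prime = add_mod256(zero_block, key)  # ciphertext of the zero block

print(sub_mod256(c, c_prime))          # difference of the two ciphertext blocks
print(c_prime == key)                  # the zero block exposes the key directly
```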


Author: Klaus Pommerening, 1999-Oct-18; last change: 2014-Feb-06.