[JoGu]

Cryptology

Running-Text Ciphers – Approaches to Cryptanalysis

a7Hzq .#5r<
kÜ\as TâÆK$
ûj(Ö2 ñw%h:
Úk{4R f~`z8
¤˜Æ+Ô „&¢Dø

Cryptanalysis of running-text ciphers is laborious. There are several approaches that should be combined in practice. Automated procedures are proposed in

E. Dawson and L. Nielsen: Automated cryptanalysis of XOR plaintext strings. Cryptologia XX (1996), 165–181.
This paper considers running-text ciphers where plaintext and key are combined via binary addition (XOR) instead of addition mod 26. This distinction not essential for the method (but of course for the use of the program).
A. Griffing: Solving the running key cipher with the Viterbi algorithm. Cryptologia XXX (2006), 361–367.


Approach 0: Exhaustion

Exhausion of all possible keytexts is practically infeasible when there is no a priori idea what the keytext could be.

Exhaustion is feasible when the attacker knows the source of the keytext, say a certain book, and has only to exhaust the book letter for letter. This approach is troublesome for the pencil and paper analyst, but easy with machine support.

Methods for language recognition may be helpful.


Approach 1: Probable Word and Zigzag Exhaustion

When in the example above the attacker guesses the probable word arrive in the plaintext and shifts it along the ciphertext, already for the second position she gets the keytext FYOUCA. With a little imagination she guesses the phrase IFYOUCAN, yielding the plaintext fragment IARRIVET, and expands this fragment to IARRIVETOMORROW. This in turn expands the keytext to IFYOUCANKEEPYOU. Proceeding in this way alternating between plaintext and keytext is called zigzag exhaustion (or cross-ruff method).

For some time during the process it may be unclear whether a partial text belongs to plaintext or key.

A dictionary is a useful tool for this task. Or a pattern search in a collection of literary texts may lead to success.


Approach 2: Frequent Word Fragments

If the attacker cannot guess a probable word she might try common word fragments, bearing in mind that plaintext as well as keytext are meaningful texts. Shifting words or word fragments such as

  THE AND FOR WAS HIS NOT BUT ARE ING ION ENT
  THAT THIS FROM WITH HAVE TION
along the ciphertext will result in many meaningful trigrams or tetragrams that provide seed crystals for a zigzag exhaustion.

Recognizing typical combinations such as

   THE + THE = MOI
   ING + ING = QAM
   THAT + THAT = MOAM
may be useful.


Approach 3: Frequency Analysis

A ciphertext letter may have 26 different origins, for example

E = A + E = B + D = C + C = D + B = E + A = F + Z = G + Y = ...
Therefore running-text cryptograms have typical letter frequencies that may be computed from the usual plaintext letter frequencies, see the mathematical complement. Even though the distribution is much more flat compared with plain language, it is not completely uniform and therefore leaks some information on the plaintext. For example it gives a hint at the method of encryption.

Example: Letter frequencies of running-text cryptograms in English (values in percent).

 A   B   C   D   E   F   G   H   I   J   K   L   M  
4.3 3.5 3.2 2.5 4.7 3.8 4.4 4.4 4.8 2.9 3.5 4.5 4.3

 N   O   P   Q   R   S   T   U   V   W   X   Y   Z 
3.1 3.2 3.6 3.0 4.4 4.5 4.0 3.2 4.9 4.7 3.8 3.3 3.5
Coincidence index = 0.0400.

Example: Letter frequencies of running-text cryptograms in German (values in percentnt).

 A   B   C   D   E   F   G   H   I   J   K   L   M  
4.2 2.6 2.3 2.4 5.0 3.7 3.7 4.3 5.8 2.9 3.7 4.4 4.9

 N   O   P   Q   R   S   T   U   V   W   X   Y   Z 
3.2 3.0 3.1 3.3 5.7 3.4 3.2 3.4 5.9 4.5 3.9 3.9 3.6
Coincidence index = 0.0411.

Even more helpful is the distribution of bigrams and trigrams. Each bigram in the ciphertext has 26² = 676 different possible sources whose probabilities however show large differences. For trigrams most sources even have probabilities 0. A systematic description of this approach is in

Craig BAUER, Christian N. S. TATE: A statistical attack on the running key cipher. Cryptologia 26 (2002), 274–282.


Approach 4: Frequent Letter Combinations

Frequency analysis (approach 3) is cumbersome, at least for manual evaluation. FRIEDMAN refined this approach in a systematic way that works without known plaintext. See the next section.


Author: Klaus Pommerening, 2000-Jan-14; last change: 2014-Aug-06.