Transposition Ciphers - Cryptanalytic Approaches

Cryptology

Transposition Ciphers – Cryptanalytic Approaches

a7Hzq .#5r< kÜ\as TâÆK$ ûj(Ö2 ñw%h: Úk{4R f~`z8 ¤˜Æ+Ô „&¢Dø

Anagramming

… is the standard method for cryptanalyzing transpositions. The goal is to reconstruct the correct order of the plaintext letters. In any case the cryptanalyst needs context information because anagrams in general have many solutions. There are even Web sites, that generate anagrams for each given textstring, see the alt.anagrams FAQ.

The string »Klaus Pommerening« yielded 75707 (!) German anagrams, among them some ingenious ones such as

  GERMANEN IM PULK SO     MAN KEGLE SPION RUM 
  ANMERKEN: SO LUMPIG!    ALGENKNOSPE IM RUM  
  SIGNALE KOMMEN PUR      KLINGE AUS POMMERN  
  MAGNESIUM-KNORPEL       SAUGNOME KLIMPERN   
  SPIONAGEN UM KREML      KLAGE NUR, MEIN MOPS
  KLAMMERUNG IN POSE      KOPIERSAMMLUNGEN

and some absurd ones such as

  ANGEL MERK UNI MOPS     GASEN EMPOR UM LINK
  PLAGE MEIN KORN MUS     GRAUPE MOSLEM KINN 
  OMEGA KLIMPERN UNS      AKNE GLOMM SEIN PUR
  SANGEN KEIM POL RUM     SAMMLER PEKING UNO

that however consist of valid German words.

Ideas to Start With

Some useful hooks for successful anagramming are:

A probable word,
Characteristic bi- and trigrams (QU, TH, THE, or in German QU, CH, SCH),
Bigram scores—if the cryptanalyst tries to combine columns of ciphertext, then the correct solution generally will have a high score, expressed as a cumulative bigram probability.

The larger the »width« of the transposition, the more difficult or ambiguous anagramming will turn out.

Therefore the solutions of general non-periodic transposition ciphers are quite difficult. This difficulty vanishes if the cryptanalyst has two (or more) ciphertexts of the same length generated with the same key. Then she writes them below each other (in »depth«) and tries multiple anagramming looking at the bigram probabilities.

For a (German) example look at the analysis of the cryptogram from Jules Verne's Mathias Sandorf.

Using Bigram Scores

In our exemplary analysis of a columnar transposition we assessed the quality of possible neighbor columns by visually inspecting the occuring bigrams.

For a more systematic approach we would form a score by adding log weights of the single bigrams. To define these we should use conditional probabilities: The first letter of a bigram is given—it occurs in the real plaintext. If we encounter a Q as the first letter, we wouldn't assign the next letter U the probability 0.20% (for English) of the bigram QU but the conditional probability 100% given that Q is the first letter.

Therefore we use conditional Bigram Log-Weights for the bigrams we observe in an attempt of anagramming. Here are tables giving these numbers for English, German, and French. CSV-Files containing them are eng_cblw.csv, ger_cblw.csv, fra_cblw.csv. The rationale for the derivation of these values is in the mathematical version of this section.

The conditional Bigram Log-Weight score (cBLW score) of a sequence of bigrams is the sum of the scores of the bigrams in the sequence. Perl programs that calculate this score for English, German, or French are cBLWscE.pl, cBLWscD.pl, cBLWscF.pl. Dividing the cBLW score by the number of bigrams for easy comparision we get the cBLW rate.

Looking back at the example, at the first decision point we faced the alternative:

1     DDSSRGNIEIGNSAECSEIIEASLMITAGRN
4     NZEJGKCLFTCERTMHHDMNDRODMOEDTAI

1     DDSSRGNIEIGNSAECSEIIEASLMITAGR 
8     EZTCEFSNNBTAULRHSNNKCLCOAKENNS

For Columns 1 and 4 this gives the scores (using the values for German)

   DN: 1.1   NC: 0.7   SR: 1.1   IM: 1.5   MM: 1.9   NI: 1.8
   DZ: 0.8   IL: 1.5   AT: 1.9   IN: 2.3   IO: 1.4
   SE: 2.2   EF: 1.2   EM: 1.5   ED: 1.5   TE: 2.5
   SJ: 0.5   IT: 2.0   CH: 3.0   AR: 1.9   AD: 1.2
   RG: 1.6   GC: 0.0   SH: 1.1   SO: 1.6   GT: 1.8
   GK: 1.0   NE: 2.1   ED: 1.5   LD: 1.6   RA: 2.0

               cBLW score: 47.8           cBLW rate: 1.54

The same procedure for Columns 1 and 8 gives

   DE: 2.7   NS: 1.9   SU: 1.3   IN: 2.3   MA: 2.2
   DZ: 0.8   IN: 2.3   AL: 2.0   IK: 1.2   IK: 1.2
   ST: 2.2   EN: 2.4   ER: 2.4   EC: 1.2   TE: 2.5
   SC: 2.1   IB: 1.0   CH: 3.0   AL: 2.0   AN: 2.2
   RE: 2.2   GT: 1.8   SS: 2.0   SC: 2.1   GN: 1.2
   GF: 0.8   NA: 1.8   EN: 2.4   LO: 1.6   RS: 1.9

               cBLW score: 56.7           cBLW rate: 1.89

and confirms our visual impression.

Exercise: Calculate the cBLW scores for the second decision point:

8  EEZTCEFSNNBTAULRHSNNKCLCOAKENNS
6  LRWEHSUTTZUOUEIAEMVFEKGHGTENWIC 

8  EEZTCEFSNNBTAULRHSNNKCLCOAKENNS
7  FNGTHNEEGEUHDUVIUDETDEIOETLENEE

Remark: Apparently no simple method exists for a-priori calculating the key length of a columnar transposition.

Linear Analysis

… starts from the observation that transposition is a linear map over the finite ring of integers modulo 26 if we identify the standard alphabet with this ring. This makes transposition a special case of linear encryption.

Simulated Annealing

… is a method for optimization that also can be successfully used for solving transposition ciphers. Here are some external links:

Simulated Annealing.
Crank (Cryptanalysis Toolkit).

Also see the paper:

J. P. Giddy, R. Safavi Naini,
Automated Cryptanalysis of Transposition Ciphers. Computer Journal 37 (1994), 429–436.

Author: Klaus Pommerening, 2000-Jan-22; last change: 2014-Jul-27.