[JoGu]

Cryptology

Cryptanalysis of a Columnar Transposition

Example

a7Hzq .#5r<
kÜ\as TâÆK$
ûj(Ö2 ñw%h:
Úk{4R f~`z8
¤˜Æ+Ô „&¢Dø

This is a first example. The plaintext is German, and the method is ad-hoc, mostly using a special characteristic of the language, the bigram ch. An example with an English plaintext and a more systematic line of action is given in a later section.

Ciphertext

DDSSR GNIEI GNSAE CSEII EASLM ITAGR NUUIU AUUHI PROEQ ILLOI
ATIAP IALSU VFEAN ETELE IRNSB TESEO OREEE IHRDE CHANZ EJGKC
LFTCE RTMHH DMNDR ODMOE DTAIN DNARI LNGYL EAUCT RRENB TEESI
NHLRW EHSUT TZUOU EIAEM VFEKG HGTEN WICFN GTHNE EGEUH DUVIU
DETDE IOETL ENEEE ZTCEF SNNBT AULRH SNNKC LCOAK ENNS

The frequency of the letter E catches the eye. The ciphertext could originate from a transposition.


Counting the letters

The frequencies of the single letters are:

  E 39   R 14   H 11   M  5   W  2   X  0
  N 22   U 14   C  9   K  4   J  1
  I 20   S 13   G  9   B  3   P  1
  T 17   L 12   O  9   V  3   Q  1
  A 15   D 11   F  5   Z  3   Y  1   total: 244
These numbers fit the hypothesis that the cryptogram is a transposition of a German plaintext. The coincidence index 0.0661 is somewhat small but this might be caused by the small textlength. The MFL test for German gives the excellent value of 0.72. The autocoincidence indices fluctuate wildly but don't show any regularity. The Kasiski search for repetitions finds repeated trigrams at distances of 110, 70, 130, 135, 114, 79, 23, 10, but no obvious period. Therefore we narrow our hypothesis down to a columnar transposition. There is no quick method to find the length of the key. Therefore we try the key lengths 2, 3, 4, … by exhaustion.

Exercise: Why can we rule out the key lengths 2 and 4 easily? Can this argument modified for key lengths 3 and 5?

Imagine we arrived at key length 8 without any prior success. (In fact we'll see in a later example this is not as time comsuming we might be afraid of.)


Arranging the Text in 8 Columns

Because 244 = 30*8 + 4, four of our 8 columns have length 30, and four have length 31. Unfortunaly, because the columns are permuted in an unknown way, we don't know which is which! Therefore in the following we extend the columns far enough to cover their true contents in any case—the »true columns« could be shifted up to three positions in relation to the (true) adjacent columns.

Since a text editor allows the handling of rows much easier than the handling of columns in the following we write the columns as rows. The true beginning of each column is somewhere in »column« 1 to 5 of this table.

  12345
1     DDSSRGNIEIGNSAECSEIIEASLMITAGRN
2    NUUIUAUUHIPROEQILLOIATIAPIALSUVF
3   VFEANETELEIRNSBTESEOOREEEIHRDECHA
4  CHANZEJGKCLFTCERTMHHDMNDRODMOEDTAI
5 DTAINDNARILNGYLEAUCTRRENBTEESINHLR
6  HLRWEHSUTTZUOUEIAEMVFEKGHGTENWICF
7   CFNGTHNEEGEUHDUVIUDETDEIOETLENEE
8    EEZTCEFSNNBTAULRHSNNKCLCOAKENNS

To get along we could place the columns beneath each other in pairs, with all possible displacements, and evaluate the occuring bigrams. A high score should indicate a possible neighborhood. Another good approach, especially for manual work, searches for the bigram CH that is typical for the plaintext language German.


Searching for CH

Column 1 contains a C. The possible adjacent letter are:

1 ...     C ...
2 ...    IL ...
3 ...   TES ...
4 ...  RTMH ...
5 ... EAUCT ...
6 ...  IAEM ...
7 ...   VIU ...
8 ...    RH ...
There are only two H's (and no K, the other possible companion of C in German). This yields two possibilities for the next column:
1     DDSSRGNIEIGNSAECSEIIEASLMITAGRN
4  CHANZEJGKCLFTCERTMHHDMNDRODMOEDTAI

1     DDSSRGNIEIGNSAECSEIIEASLMITAGRN
8    EEZTCEFSNNBTAULRHSNNKCLCOAKENNS
Assessing the bigram frequencies—visually or by counting or scoring—we prefer the second option to get on.

Fortunately Column 8 contains three C's. This makes a promising continuation. Here are the possible neighbors of the first C:

2 ...    UA ...
3 ...   NET ...
4 ...  ZEJG ...
5 ... NDNA  ...
6 ...  EHS  ...
7 ...   TH  ...
8 ...    C  ...
Only Columns 6 and 7 fit:
8  EEZTCEFSNNBTAULRHSNNKCLCOAKENNS
6 HLRWEHSUTTZUOUEIAEMVFEKGHGTENWICF

8  EEZTCEFSNNBTAULRHSNNKCLCOAKENNS
7 CFNGTHNEEGEUHDUVIUDETDEIOETLENEE
The first variant yields the bigrams CK and CH for the other two C's of Column 8—good!—. The second variant yields the bigrams CE and CO—not so good! Therefore we choose the first variant. This gives the arrangement
1     DDSSRGNIEIGNSAECSEIIEASLMITAGRN
8    EEZTCEFSNNBTAULRHSNNKCLCOAKENNS
6   HLRWEHSUTTZUOUEIAEMVFEKGHGTENWICF
We see that Column 1 has 31 letters. The E at the beginning of Column 8 must in reality be the end of Column 7, and Column 8 is a 30 letter column. Therefore our Columns 1 and 8 are the real Columns 4 and 5.


Ordering the Columns and Searching Onward

We already know that the true order of the columns follows the partial scheme

 1 2 3 4 5 6 7 8
 * * * 1 8 6 * *
We conclude that Column 6 is a 30 letter column. The two trigrams SCH make us optimistic about the right way to the solution. Also the other trigrams look promising. The scheme above for the possible shifts of the columns boils down to
1     DDSSRGNIEIGNSAECSEIIEASLMITAGRN (31)
2     UUIUAUUHIPROEQILLOIATIAPIALSUVF
3    FEANETELEIRNSBTESEOOREEEIHRDECHA
4   HANZEJGKCLFTCERTMHHDMNDRODMOEDTA
5   AINDNARILNGYLEAUCTRRENBTEESINHL
6    RWEHSUTTZUOUEIAEMVFEKGHGTENWIC   (30)
7    FNGTHNEEGEUHDUVIUDETDEIOETLENEE  (31)
8     EZTCEFSNNBTAULRHSNNKCLCOAKENNS  (30)
In particular Column 7 must contain 31 letters.

Let's search onward for further CH's. Column 6 has a C at its end. The scheme

2 ...    VF
3 ...   CHA
4 ...  DTA
5 ...  HL
6 ...   C
7 ...   EE
shows that possible neighbors could only be Columns 3 and 5, with the arrangement
1   DDSSRGNIEIGNSAECSEIIEASLMITAGRN
8   EZTCEFSNNBTAULRHSNNKCLCOAKENNS
6   RWEHSUTTZUOUEIAEMVFEKGHGTENWIC
3  FEANETELEIRNSBTESEOOREEEIHRDECHA

1   DDSSRGNIEIGNSAECSEIIEASLMITAGRN
8   EZTCEFSNNBTAULRHSNNKCLCOAKENNS
6   RWEHSUTTZUOUEIAEMVFEKGHGTENWIC
5   AINDNARILNGYLEAUCTRRENBTEESINHL
The first of these looks considerably better. In particular the fragments INFO ... IKER und MATH ... IKER catch the eye—could the text be about Computer Scientists (»Informatiker«) and Mathematicians (»Mathematiker«)?


Finishing

We take the arrangement

 1 2 3 4 5 6 7 8
 * * * 1 8 6 3 *
of the columns as basis for our further work. We conclude that Column 3 has 30 letters, and therefore Column 2 necessarily has 31 letters. The entire scheme therefore looks like this:
1     DDSSRGNIEIGNSAECSEIIEASLMITAGRN (31)
2     UUIUAUUHIPROEQILLOIATIAPIALSUVF (31)
3     EANETELEIRNSBTESEOOREEEIHRDECH  (30)
4    ANZEJGKCLFTCERTMHHDMNDRODMOEDTA
5   AINDNARILNGYLEAUCTRRENBTEESINHL
6    RWEHSUTTZUOUEIAEMVFEKGHGTENWIC   (30)
7    FNGTHNEEGEUHDUVIUDETDEIOETLENEE  (31)
8     EZTCEFSNNBTAULRHSNNKCLCOAKENNS  (30)
The neighbor of Column 3—in reality the last column—must have 30 letters. This could be Column 4 or Column 5. The C at the next to last position of Column 3 matches with the H in Column 5. We get the extended partial solution:
1   DDSSRGNIEIGNSAECSEIIEASLMITAGRN
8   EZTCEFSNNBTAULRHSNNKCLCOAKENNS
6   RWEHSUTTZUOUEIAEMVFEKGHGTENWIC
3   EANETELEIRNSBTESEOOREEEIHRDECH
5   INDNARILNGYLEAUCTRRENBTEESINHL
and the columns are
1  DDSSRGNIEIGNSAECSEIIEASLMITAGRN (31)
2  UUIUAUUHIPROEQILLOIATIAPIALSUVF (31)
3  EANETELEIRNSBTESEOOREEEIHRDECH  (30)
4  ANZEJGKCLFTCERTMHHDMNDRODMOEDTA (31)
5  INDNARILNGYLEAUCTRRENBTEESINHL  (30)
6  RWEHSUTTZUOUEIAEMVFEKGHGTENWIC  (30)
7  FNGTHNEEGEUHDUVIUDETDEIOETLENEE (31)
8  EZTCEFSNNBTAULRHSNNKCLCOAKENNS  (30)

The C in Column 5 fits an H in Column 4 (shifted by one position because of the line feed). The first C in Column 4 fits an H in Column 2. Now we have found the complete column arrangement

  1  2  3  4  5  6  7  8
  4  2  7  1  8  6  3  5
The solution is
4  ANZEJGKCLFTCERTMHHDMNDRODMOEDTA
2  UUIUAUUHIPROEQILLOIATIAPIALSUVF
7  FNGTHNEEGEUHDUVIUDETDEIOETLENEE
1  DDSSRGNIEIGNSAECSEIIEASLMITAGRN
8  EZTCEFSNNBTAULRHSNNKCLCOAKENNS
6  RWEHSUTTZUOUEIAEMVFEKGHGTENWIC
3  EANETELEIRNSBTESEOOREEEIHRDECH
5  INDNARILNGYLEAUCTRRENBTEESINHL
and the plaintext is
»Auf der einundzwanzigsten deutschen Jahrestagung fuer kuenstliche Intelligenz in Freiburg trug Tony Cohn aus Leeds ueber qualitative raeumliche Schlussmethoden vor. Die Informatiker entdecken die algebraische Topologie. Die Mathematiker sollten diese Anwendung nicht verschlafen.«


Author: Klaus Pommerening, 1997-Sep-12; last change: 2014-Jul-24.