
Cryptology

Commentary on the cryptologic episode in

Jules Verne: La Jangada (Eight Hundred Leagues on the Amazon)


La Jangada (Eight Hundred Leagues on the Amazon) is the second novel by Jules VERNE in which cryptology plays a role, in this case even a crucial one. Jules VERNE again shows a thorough knowledge of the cryptologic literature of his time, but also, once more, that he is no real cryptologist. However, he describes the struggle for the decryption in an absolutely gripping way: the reader suffers along with Judge Jarriquez, the cryptanalytic hero of this novel.

Like VERNE's other cryptologic stories, this one has typical inconsistencies. Why should a Brazilian slave hunter write his confession in French? The answer is immediate: Verne didn't know Portuguese well enough to phrase and encrypt the corresponding text. Jarriquez correctly guesses the encryption method, the Gronsfeld cipher, although his reasoning about the repeated trigram hhh is not overly convincing. But why does Jarriquez hazard guess after guess for many days, without any plan, instead of sliding the probable word Dacosta along the text, a name that really does occur in it? He dismisses Manoel's proposal as infeasible. His reason, that the beginning of the key number must coincide with the beginning of the probable word, is completely irrelevant: the shifts read off at any position still reveal the key digits, merely in cyclically rotated order.

How expensive would this approach, searching for the probable word Dacosta, really be? The text at hand has 276 letters, so the seven-letter word can sit at 276 - 7 + 1 = 270 positions: we need 270 trials. Let's be generous and allow 5 minutes per trial. Then the complete search takes 1350 minutes, that is, 22.5 hours, less than a day. Moreover, Jarriquez could distribute the work among several people, and in this way he would find the solution in a few hours. But this would dramatically decrease the suspense.
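Sliding the probable word along the text takes only a few lines of code. The following is a minimal sketch, assuming a 26-letter alphabet A to Z (the one used in the letter counts below) and uppercase text; the function names are illustrative only, and the cryptogram is copied from the groups of six in the cryptanalysis below. At each position it computes the shifts that would turn DACOSTA into the cipher letters found there, and it keeps the positions where every shift is a single digit, as a Gronsfeld key requires.

```python
# The 276-letter cryptogram, copied from the groups of six in the cryptanalysis below.
CIPHERTEXT = "".join("""
PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX
HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX
HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV
NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH
VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ
SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD
""".split())


def shifts_at(cipher: str, word: str, pos: int) -> list[int]:
    """Shift (cipher letter minus plain letter, mod 26) for each letter of `word` placed at `pos`."""
    return [(ord(c) - ord(p)) % 26
            for c, p in zip(cipher[pos:pos + len(word)], word)]


def slide_probable_word(cipher: str, word: str, max_digit: int = 9) -> list[int]:
    """Start positions where every implied shift is a key digit 0..max_digit."""
    hits = []
    for pos in range(len(cipher) - len(word) + 1):
        if all(d <= max_digit for d in shifts_at(cipher, word, pos)):
            hits.append(pos)
    return hits


trials = len(CIPHERTEXT) - len("DACOSTA") + 1   # number of positions to try
hours_by_hand = trials * 5 / 60                  # at 5 minutes per trial
hits = slide_probable_word(CIPHERTEXT, "DACOSTA")
print(trials, hours_by_hand)                     # 270 22.5
print(hits)
```

Among the hits is position 148 (counting from 0), where the fragment EDGRUZB stands.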

Furthermore, Jarriquez correctly notes that, if the end of the alphabet is reached during encryption, one cyclically restarts at the beginning. During his attempts at cryptanalysis, however, he seems to ignore this: he excludes possible keys because he believes, for example, that he cannot count 8 letters backwards from h, where z would result.
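The cyclic restart that Jarriquez describes, and then forgets, is simply arithmetic modulo the alphabet size. Here is a minimal sketch of Gronsfeld encryption and decryption, assuming the 26-letter alphabet A to Z used in the letter counts below and uppercase input; the function names are illustrative. Counting 8 letters back from H wraps around to Z, exactly the case he wrongly excluded.

```python
import string

ALPHABET = string.ascii_uppercase  # assumption: the 26 letters A..Z


def shift(letter: str, k: int) -> str:
    """Move `letter` k places (negative k = backwards), wrapping around cyclically."""
    return ALPHABET[(ALPHABET.index(letter) + k) % len(ALPHABET)]


def gronsfeld(text: str, key: str, decrypt: bool = False) -> str:
    """Add (or subtract) the key digits letter by letter, repeating the key."""
    digits = [int(d) for d in key]
    sign = -1 if decrypt else 1
    return "".join(shift(ch, sign * digits[i % len(digits)])
                   for i, ch in enumerate(text))


print(shift("H", -8))                                 # Z
secret = gronsfeld("DACOSTA", "432513")
print(secret)                                         # HDETTWE
print(gronsfeld(secret, "432513", decrypt=True))      # DACOSTA
```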

At the end, after the probable word Ortega has been identified, the text fortunately really does start with the corresponding digit of the key, a chance of 1/6. Otherwise the critical seconds would have passed, and Joam Dacosta would have danced on air.
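The question of where the key starts can be settled mechanically: a shift read off at text position i belongs to key slot i mod (key length). A minimal sketch, assuming key length 6, a 26-letter alphabet, and that the signature ORTEGA fills the last cipher group SUVJHD (positions 270 to 275, counting from 0); subtracting it does reproduce the novel's key 4 3 2 5 1 3. The helper names are hypothetical.

```python
def crib_shifts(cipher_part: str, crib: str) -> list[int]:
    """Shifts (cipher letter minus plain letter, mod 26) implied by a probable word."""
    return [(ord(c) - ord(p)) % 26 for c, p in zip(cipher_part, crib)]


def key_from_crib(shifts: list[int], pos: int, period: int) -> list:
    """Place each shift in its key slot (text position mod period).

    Slots not covered by the crib stay None; if the crib is longer than
    the period, later shifts overwrite earlier ones.
    """
    key = [None] * period
    for j, d in enumerate(shifts):
        key[(pos + j) % period] = d
    return key


shifts = crib_shifts("SUVJHD", "ORTEGA")   # last cipher group vs. assumed signature
print(shifts)                              # [4, 3, 2, 5, 1, 3]
print(key_from_crib(shifts, 270, 6))       # [4, 3, 2, 5, 1, 3]  (270 % 6 == 0)
print(key_from_crib(shifts, 272, 6))       # [1, 3, 4, 3, 2, 5]  (rotated)
```

Since 270 is a multiple of 6, the digits found at Ortega are already in phase with the start of the text. That is the lucky 1/6 chance; at any other position the same digits would only need to be rotated.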

And finally: a half-decent investigator should have found the name Ortega even without Fragoso's act of force.


Let us try a cryptanalysis following the rules of the trade. First we count the letters:

  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
  3  4  3 16  9 10 13 23  4  8  9  9  9  9 12 16 16 12 10  8 17 13  0 12 19 12

This distribution is far from uniform, but too smooth for a monoalphabetic substitution. A polyalphabetic substitution with a not-too-long key would fit. The missing letter W hints at a Romance language.
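The count itself can be sketched with Python's `collections.Counter`. This is a minimal sketch; the cryptogram is copied from the groups of six shown further down, and the two printed rows mirror the table above.

```python
from collections import Counter
import string

# The 276-letter cryptogram, copied from the groups of six shown further down.
CIPHERTEXT = "".join("""
PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX
HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX
HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV
NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH
VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ
SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD
""".split())

counts = Counter(CIPHERTEXT)   # letters that never occur (such as W) count as 0
print(" ".join(f"{a:>2}" for a in string.ascii_uppercase))
print(" ".join(f"{counts[a]:>2}" for a in string.ascii_uppercase))
```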

Searching for repetitions, we find eight of length 3:

   DDQ  - distance 186
   DQF  - distance 186
   RYM  - distance 192
   TOZ  - distance 186
   RPL  - distance  60
   HHH  - distance  54
   KYU  - distance  12
   YUU  - distance  12
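This search for repeated trigrams (the Kasiski test) can be sketched as follows. It is a minimal sketch with a hypothetical helper name, using the cryptogram copied from the groups of six further down: it maps every trigram that occurs more than once to its start positions (counting from 0), so that consecutive differences are the distances listed above and their greatest common divisor is a candidate key length.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

# The 276-letter cryptogram, copied from the groups of six further down.
CIPHERTEXT = "".join("""
PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX
HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX
HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV
NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH
VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ
SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD
""".split())


def repeated_ngrams(text: str, n: int = 3) -> dict[str, list[int]]:
    """Map each n-gram occurring more than once to its sorted start positions."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return {g: p for g, p in positions.items() if len(p) > 1}


repeats = repeated_ngrams(CIPHERTEXT)
distances = [p[j + 1] - p[j] for p in repeats.values() for j in range(len(p) - 1)]
for gram, pos in sorted(repeats.items()):
    print(gram, pos, [b - a for a, b in zip(pos, pos[1:])])
print("gcd of all distances:", reduce(gcd, distances))
```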

Among these eight we also find HHH, the repetition noted by Jarriquez. All distances are multiples of 6 (186 = 31 × 6, 192 = 32 × 6, 60 = 10 × 6, 54 = 9 × 6, 12 = 2 × 6), which gives strong evidence for key length 6; we take this length as almost granted. A look at the coincidence spectrum confirms this:

     kappa[1]  = 0.0616 (<---)
     kappa[2]  = 0.0471     
     kappa[3]  = 0.0290     
     kappa[4]  = 0.0435     
     kappa[5]  = 0.0290     
     kappa[6]  = 0.0616 (<---)
     kappa[7]  = 0.0326     
     kappa[8]  = 0.0471     
     kappa[9]  = 0.0326     
     kappa[10] = 0.0507     
     kappa[11] = 0.0652 (<---)
     kappa[12] = 0.1159 <---
     kappa[13] = 0.0326     
     kappa[14] = 0.0217     
     kappa[15] = 0.0399     
     kappa[16] = 0.0471     
     kappa[17] = 0.0399     
     kappa[18] = 0.0725 <---
     kappa[19] = 0.0435     
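The spectrum can be recomputed with a short function. This is a minimal sketch, assuming that kappa[d] is the fraction of positions at which the cryptogram agrees with its own cyclic shift by d. The definition is inferred from the values: for d = 12 it gives 32 coincidences out of 276, that is, 0.1159 as in the list. The cryptogram is copied from the groups of six below.

```python
# The 276-letter cryptogram, copied from the groups of six below.
CIPHERTEXT = "".join("""
PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX
HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX
HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV
NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH
VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ
SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD
""".split())


def kappa(text: str, d: int) -> float:
    """Fraction of positions where the text agrees with itself shifted cyclically by d."""
    n = len(text)
    return sum(text[i] == text[(i + d) % n] for i in range(n)) / n


for d in range(1, 20):
    print(f"kappa[{d}] = {kappa(CIPHERTEXT, d):.4f}")
```

Peaks are expected at multiples of the key length, which is what the list shows at 6, 12 and 18.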

Therefore we subdivide the cryptogram into groups of 6 letters:

     PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX
     HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX
     HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV
     NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH
     VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ
     SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD

and count the letters per column:

           A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
                                                              
  Col. 0:  0 1 0 0 3 0 1 5 2 0 0 0 6 1 0 4 4 3 4 0 0 2 0 3 4 3
  Col. 1:  0 0 0 7 0 2 4 9 0 0 0 2 1 0 1 1 5 2 1 0 1 4 0 3 2 1
  Col. 2:  0 0 3 2 3 1 4 0 1 0 9 0 0 3 2 3 2 0 0 0 4 4 0 2 3 0
  Col. 3:  3 0 0 0 0 3 0 1 1 4 0 1 1 3 0 0 4 4 3 5 1 0 0 1 7 4
  Col. 4:  0 3 0 1 3 3 0 1 0 3 0 0 0 2 8 5 0 2 2 3 7 1 0 0 1 1
  Col. 5:  0 0 0 6 0 1 4 7 0 1 0 6 1 0 1 3 1 1 0 0 4 2 0 3 2 3
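Writing the text in rows of 6 and counting per column amounts to taking every sixth letter. A minimal sketch with a hypothetical helper name, using the cryptogram copied from the groups above; with this slicing, column 1 gets the 9 H's and 7 D's shown in the table.

```python
from collections import Counter
import string

# The 276-letter cryptogram, copied from the groups of six above.
CIPHERTEXT = "".join("""
PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX
HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX
HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV
NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH
VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ
SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD
""".split())


def column_counts(text: str, period: int) -> list[Counter]:
    """Letter counts per column when the text is written in rows of `period` letters."""
    return [Counter(text[col::period]) for col in range(period)]


cols = column_counts(CIPHERTEXT, 6)
for col, counts in enumerate(cols):
    print(f"Col. {col}:", " ".join(str(counts[a]) for a in string.ascii_uppercase))
```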

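Now we try to match the frequencies in each column to the letter distributions of common languages by shifting. The following line-up, in which each column's counts are moved left until they sit under their presumed plaintext letters, seems promising: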

                  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Col. 0:   0 1 0 0 3 0 1 5 2 0 0 0 6 1 0 4 4 3 4 0 0 2 0 3 4 3
Col. 1:     0 0 0 7 0 2 4 9 0 0 0 2 1 0 1 1 5 2 1 0 1 4 0 3 2 1
Col. 2:       0 0 3 2 3 1 4 0 1 0 9 0 0 3 2 3 2 0 0 0 4 4 0 2 3 0
Col. 3: 3 0 0 0 0 3 0 1 1 4 0 1 1 3 0 0 4 4 3 5 1 0 0 1 7 4
Col. 4:         0 3 0 1 3 3 0 1 0 3 0 0 0 2 8 5 0 2 2 3 7 1 0 0 1 1
Col. 5:     0 0 0 6 0 1 4 7 0 1 0 6 1 0 1 3 1 1 0 0 4 2 0 3 2 3

The best-fitting shifts are 4 3 2 5 1 3, and we know from the novel that this is the correct key.
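Choosing the shift for a column can also be automated by maximizing the scalar product between the column's counts and a reference distribution. This is one common scoring rule (chi-squared statistics are another), not necessarily how the line-up above was found. It is a minimal sketch: the function name and the synthetic test data are hypothetical, and no French frequency table is supplied here.

```python
def best_shift(counts: list[float], reference: list[float]) -> int:
    """Return the shift s maximizing sum_i reference[i] * counts[(i + s) % n].

    counts[i]:    how often cipher letter i occurs in one column
    reference[i]: expected frequency of plaintext letter i
                  (a language table, or another column's counts)
    """
    n = len(reference)

    def score(s: int) -> float:
        return sum(reference[i] * counts[(i + s) % n] for i in range(n))

    return max(range(n), key=score)


# Synthetic check: a "column" that is the reference moved up by 4 letters.
ref = [float(i) for i in range(1, 27)]
col = [ref[(j - 4) % 26] for j in range(26)]
print(best_shift(col, ref))   # 4
```

With a French frequency table as `reference`, applying `best_shift` to each of the six column counts would be expected to give one key digit per column. Using another column's counts as `reference` instead gives the relative shift between two columns, which is what the line-up displays.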


As noted above, there is an even faster solution if we assume, as Jarriquez did, that the cipher is defined »by a number«, that is, that it is a GRONSFELD cipher (a BELASO or VIGENÈRE cipher in which each alphabet is shifted by at most 9 positions). We search for the probable word Dacosta. Each cipher letter then lies at most 9 positions after the corresponding plaintext letter, counting cyclically. Therefore we use the search term

     [DEFGHIJKLM][ABCDEFGHIJ][CDEFGHIJKL][OPQRSTUVWX]
     [STUVWXYZAB][TUVWXYZABC][ABCDEFGHIJ]

This gives exactly one hit, EDGRUZB, and this is the correct location.
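The search term can be generated from any probable word. A minimal sketch using Python's `re` module, assuming a 26-letter alphabet and key digits 0 to 9; `gronsfeld_pattern` is a hypothetical helper name. For Dacosta it produces the search term given above (there split over two lines for display), and on the fragment NRCRED GRUZBL of the cryptogram it finds EDGRUZB.

```python
import re
import string

ALPHABET = string.ascii_uppercase  # assumption: the 26 letters A..Z


def gronsfeld_pattern(word: str, max_digit: int = 9) -> str:
    """One character class per plaintext letter: the letter and the next max_digit letters, cyclically."""
    classes = []
    for ch in word.upper():
        start = ALPHABET.index(ch)
        letters = "".join(ALPHABET[(start + k) % 26] for k in range(max_digit + 1))
        classes.append(f"[{letters}]")
    return "".join(classes)


pattern = gronsfeld_pattern("Dacosta")
print(pattern)
# Fragment of the cryptogram: end of group NRCRED, start of group GRUZBL.
print(re.search(pattern, "NRCREDGRUZBL").group())   # EDGRUZB
```

On the whole cryptogram one would use `re.finditer`; wrapping the pattern in a lookahead, `(?=(...))`, also reports overlapping hits.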

According to F. L. BAUER, the GRONSFELD cipher was invented in the 17th century, and as late as 1892 French anarchists used it, maybe after reading Jules VERNE. BAZERIES easily broke it and thus prevented an act of terrorism.


Author: Klaus Pommerening, 2000-Sep-29; last change: 2013-Sep-20.