Cryptology
Commentary on the cryptologic episode in Jules Verne: La Jangada (Eight Hundred Leagues on the Amazon)
La Jangada (Eight Hundred Leagues on the Amazon) is the second novel by Jules VERNE in which cryptology plays a role, in this case even a crucial one. Jules VERNE shows his thorough knowledge of the cryptologic literature of his time, but also, once again, that he is no real cryptologist. However, he describes the struggle for the decryption in an absolutely haunting way: the reader suffers with Judge Jarriquez, the cryptanalytic hero of this novel.
Like VERNE's other cryptologic stories, this one has typical inconsistencies: Why should a Brazilian slave hunter write his confession in French? The answer is immediate: VERNE did not know enough Portuguese to phrase and encrypt the corresponding text. Jarriquez correctly guesses the encryption method, the Gronsfeld cipher, although his reasoning about the repeated trigram hhh is not overly convincing. But why does Jarriquez hazard guess after guess for many days without any plan, instead of moving the probable word Dacosta along the text, a name that really occurs in the text? He dismisses Manoel's proposal as infeasible. His reason, that the beginning of the key number must coincide with the beginning of the probable word, is completely irrelevant.
How expensive would this approach, searching for the probable word Dacosta, really be? The text at hand has 276 letters, so the 7-letter word can be placed at 270 positions, and we need about 270 trials. Let's be generous and allow 5 minutes per trial. Then the complete search takes less than 23 hours. Moreover, Jarriquez could distribute the work among several people. In this way he would find the solution in a few hours. But this would dramatically decrease the suspense.
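The estimate can be reproduced in a few lines. This is only a sketch of the arithmetic; the 5 minutes per trial are the generous allowance made above, and the variable names are ours.

```python
# Cost of sliding the probable word along the cryptogram.
text_length = 276                # letters in the cryptogram
word_length = len("DACOSTA")     # 7 letters
minutes_per_trial = 5            # generous allowance per position

# The word fits at every start position from 0 to text_length - word_length.
trials = text_length - word_length + 1
total_hours = trials * minutes_per_trial / 60

print(trials, "trials,", total_hours, "hours")  # 270 trials, 22.5 hours
```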
Furthermore, Jarriquez correctly notes that, if the end of the alphabet is reached during encryption, one cyclically restarts at the beginning. During his attempts at cryptanalysis, however, he seems to ignore this: he excludes possible keys because he believes, for example, that he cannot count 8 letters backwards from h, although this simply yields z.
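The cyclic counting is ordinary modular arithmetic. A minimal sketch, assuming the usual 26-letter alphabet; the function name count_back is ours:

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: the usual 26 letters


def count_back(letter, steps):
    """Count `steps` letters backwards from `letter`, wrapping from a to z."""
    index = ALPHABET.index(letter)
    return ALPHABET[(index - steps) % len(ALPHABET)]


print(count_back("h", 8))  # z
```

The key digit 8 for a cipher letter h is therefore perfectly admissible; it corresponds to the plaintext letter z.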
In the end, after the probable word Ortega has been identified, the text fortunately really starts with the corresponding figure of the key, a chance of 1 in 6. Otherwise the critical seconds would pass, and Joam Dacosta would dance on air.
And finally: A half-decent investigator should have found the name Ortega without Fragoso's act of force.
Let us try a cryptanalysis following the rules of the trade. First we count the letters:
 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 3  4  3 16  9 10 13 23  4  8  9  9  9  9 12 16 16 12 10  8 17 13  0 12 19 12
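The count is easy to reproduce. The following sketch copies the cryptogram from the 6-letter groups given further below and tallies it with Python's collections.Counter; the variable names are ours.

```python
from collections import Counter
import string

# The cryptogram, written in the 6-letter groups used in this commentary.
CIPHERTEXT = (
    "PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX "
    "HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX "
    "HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV "
    "NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH "
    "VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ "
    "SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD"
).replace(" ", "")

counts = Counter(CIPHERTEXT)

# A Counter returns 0 for letters that never occur, such as W.
for letter in string.ascii_uppercase:
    print(letter, counts[letter])
```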
This distribution is not nearly uniform, but too smooth for a monoalphabetic substitution. A polyalphabetic substitution with a not-too-long key would fit. The missing letter W hints at a Romance language.
Searching for repetitions we find eight of them of length 3:
DDQ - distance 186
DQF - distance 186
RYM - distance 192
TOZ - distance 186
RPL - distance 60
HHH - distance 54
KYU - distance 12
YUU - distance 12
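Such a search for repetitions can be sketched as follows. The code records the start positions of every trigram and keeps those that occur more than once; the function name repeated_ngrams is ours, and the cryptogram is copied from the 6-letter groups of this commentary.

```python
from collections import defaultdict

CIPHERTEXT = (
    "PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX "
    "HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX "
    "HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV "
    "NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH "
    "VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ "
    "SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD"
).replace(" ", "")


def repeated_ngrams(text, n=3):
    """Map every n-gram that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return {gram: pos for gram, pos in positions.items() if len(pos) > 1}


repeats = repeated_ngrams(CIPHERTEXT)
# Distances between consecutive occurrences of each repeated trigram.
distances = {
    gram: [b - a for a, b in zip(pos, pos[1:])]
    for gram, pos in repeats.items()
}

for gram in sorted(distances):
    print(gram, distances[gram])
```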
Among these we also find the repetition HHH that was noted by Jarriquez, and that alone gives strong evidence for key length 6. Together with the other repetitions, whose distances are all multiples of 6, we take this length as almost certain. A look at the coincidence spectrum confirms this:
kappa[1] = 0.0616 (<---)
kappa[2] = 0.0471
kappa[3] = 0.0290
kappa[4] = 0.0435
kappa[5] = 0.0290
kappa[6] = 0.0616 (<---)
kappa[7] = 0.0326
kappa[8] = 0.0471
kappa[9] = 0.0326
kappa[10] = 0.0507
kappa[11] = 0.0652 (<---)
kappa[12] = 0.1159 <---
kappa[13] = 0.0326
kappa[14] = 0.0217
kappa[15] = 0.0399
kappa[16] = 0.0471
kappa[17] = 0.0399
kappa[18] = 0.0725 <---
kappa[19] = 0.0435
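A coincidence spectrum of this kind can be computed along the following lines. This is a sketch under one assumption: kappa[d] is taken as the fraction of positions at which the text agrees with a copy of itself shifted cyclically by d. If the values above were computed with a different convention, individual entries may differ slightly.

```python
CIPHERTEXT = (
    "PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX "
    "HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX "
    "HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV "
    "NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH "
    "VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ "
    "SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD"
).replace(" ", "")


def kappa(text, shift):
    """Fraction of positions where text[i] equals text[i + shift] (cyclic)."""
    n = len(text)
    matches = sum(text[i] == text[(i + shift) % n] for i in range(n))
    return matches / n


for d in range(1, 20):
    print(f"kappa[{d}] = {kappa(CIPHERTEXT, d):.4f}")
```

For shift 1 this counts the 17 doubled letters of the cryptogram, which gives 17/276 = 0.0616 as in the list above.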
Therefore we subdivide the cryptogram into groups of 6 letters:
PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX
HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX
HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV
NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH
VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ
SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD
and count the letters per column:
        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Col. 0: 0 1 0 0 3 0 1 5 2 0 0 0 6 1 0 4 4 3 4 0 0 2 0 3 4 3
Col. 1: 0 0 0 7 0 2 4 9 0 0 0 2 1 0 1 1 5 2 1 0 1 4 0 3 2 1
Col. 2: 0 0 3 2 3 1 4 0 1 0 9 0 0 3 2 3 2 0 0 0 4 4 0 2 3 0
Col. 3: 3 0 0 0 0 3 0 1 1 4 0 1 1 3 0 0 4 4 3 5 1 0 0 1 7 4
Col. 4: 0 3 0 1 3 3 0 1 0 3 0 0 0 2 8 5 0 2 2 3 7 1 0 0 1 1
Col. 5: 0 0 0 6 0 1 4 7 0 1 0 6 1 0 1 3 1 1 0 0 4 2 0 3 2 3
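The column counts follow from taking every sixth letter, starting at offsets 0 to 5. A minimal sketch with our own variable names:

```python
from collections import Counter
import string

CIPHERTEXT = (
    "PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX "
    "HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX "
    "HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV "
    "NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH "
    "VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ "
    "SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD"
).replace(" ", "")

KEY_LENGTH = 6

# Column k holds the letters at positions k, k + 6, k + 12, ...
columns = [CIPHERTEXT[k::KEY_LENGTH] for k in range(KEY_LENGTH)]
column_counts = [Counter(column) for column in columns]

print("        " + " ".join(string.ascii_uppercase))
for k, counts in enumerate(column_counts):
    row = " ".join(str(counts[letter]) for letter in string.ascii_uppercase)
    print(f"Col. {k}: {row}")
```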
Now we try to adjust the frequencies to the distributions of common languages by shifting. The following line-up seems promising:
          A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  0 1 0 0 3 0 1 5 2 0 0 0 6 1 0 4 4 3 4 0 0 2 0 3 4 3
    0 0 0 7 0 2 4 9 0 0 0 2 1 0 1 1 5 2 1 0 1 4 0 3 2 1
      0 0 3 2 3 1 4 0 1 0 9 0 0 3 2 3 2 0 0 0 4 4 0 2 3 0
3 0 0 0 0 3 0 1 1 4 0 1 1 3 0 0 4 4 3 5 1 0 0 1 7 4
        0 3 0 1 3 3 0 1 0 3 0 0 0 2 8 5 0 2 2 3 7 1 0 0 1 1
    0 0 0 6 0 1 4 7 0 1 0 6 1 0 1 3 1 1 0 0 4 2 0 3 2 3
The best fitting shifts are: 4 3 2 5 1 3, and we know from the novel that this is the correct key.
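The key can be checked by decrypting the opening of the cryptogram. The following is a minimal sketch with one assumption that is not stated in this commentary: the alphabet is taken to be the 25 letters without W. With the full 26-letter alphabet the third letter Y would decrypt to W instead of V. The function name gronsfeld_decrypt is ours.

```python
# Assumption: a 25-letter alphabet that omits W.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVXYZ"
KEY = [4, 3, 2, 5, 1, 3]

# The first 25 letters of the cryptogram.
OPENING = "PHYJSLYDDQFDZXGASGZZQQEHX"


def gronsfeld_decrypt(ciphertext, key, alphabet=ALPHABET):
    """Shift each letter back by the key digit that applies at its position."""
    plain = []
    for i, letter in enumerate(ciphertext):
        shift = key[i % len(key)]
        index = (alphabet.index(letter) - shift) % len(alphabet)
        plain.append(alphabet[index])
    return "".join(plain)


# Prints LEVERITABLEAUTEURDUVOLDES ("Le veritable auteur du vol des ...")
print(gronsfeld_decrypt(OPENING, KEY))
```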
As noted above, there is an even faster solution if we assume, as Jarriquez did, that the cipher is defined "by a number", that is, that it is a GRONSFELD cipher (a BELASO or VIGENÈRE cipher where the alphabet is shifted by at most 9 positions): We search for the probable word Dacosta. Then each cipher letter can be at most 9 positions after the corresponding plaintext letter. Therefore we use the search term
[DEFGHIJKLM][ABCDEFGHIJ][CDEFGHIJKL][OPQRSTUVWX][STUVWXYZAB][TUVWXYZABC][ABCDEFGHIJ]
This gives exactly 1 hit: EDGRUZB, and this is the correct location.
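The search term can be built mechanically from the probable word and applied with Python's re module. This is a sketch; the function name gronsfeld_pattern is ours, and the lookahead is used only so that overlapping candidates are not skipped.

```python
import re
import string

CIPHERTEXT = (
    "PHYJSL YDDQFD ZXGASG ZZQQEH XGKFND RXUJUG IOCYTD XVKSBX "
    "HHUYPO HDVYRY MHUHPU YDKJOX PHETOZ SLETNP MVFFOV PDPAJX "
    "HYYNOJ YGGAYM EQYNFU QLNMVL YFGSUZ MQIZTL BQGYUG SQEUBV "
    "NRCRED GRUZBL RMXYUH QHPZDR RGCROH EPQXUF IVVRPL PHONTH "
    "VDDQFH QSNTZH HHNFEP MQKYUU EXKTOG ZGKYUU MFVIJD QDPZJQ "
    "SYKRPL XHXQRY MVKLOH HHOTOZ VDKSPP SUVJHD"
).replace(" ", "")


def gronsfeld_pattern(word):
    """One character class per letter: the letter itself and the next 9."""
    letters = string.ascii_uppercase
    classes = []
    for ch in word.upper():
        start = letters.index(ch)
        shifted = "".join(letters[(start + k) % 26] for k in range(10))
        classes.append("[" + shifted + "]")
    # Zero-width lookahead, so that overlapping matches are all reported.
    return re.compile("(?=(" + "".join(classes) + "))")


pattern = gronsfeld_pattern("Dacosta")
hits = [(m.start(), m.group(1)) for m in pattern.finditer(CIPHERTEXT)]
print(hits)  # [(148, 'EDGRUZB')]
```

Position 148 is counted from 0; the hit starts at the fifth letter of the group NRCRED and runs into GRUZBL.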
According to F. L. BAUER, the GRONSFELD cipher was invented in the 17th century, and as late as 1892 French anarchists used it, maybe after reading Jules VERNE. BAZERIES easily broke it and thus prevented an act of terrorism.