[JoGu]

Cryptology

Pattern Search

a7Hzq .#5r<
kÜ\as TâÆK$
ûj(Ö2 ñw%h:
Úk{4R f~`z8
¤˜Æ+Ô „&¢Dø

Word Lists

The second basic approach to cryptanalysis of the monoalphabetic substitution is the search for patterns in the ciphertext that correspond to the patterns of

This method is cumbersome if done by hand but easy with computer support that completely searches lists of several 100000 words in a few seconds. Such lists are in the web, for example here:

Their main purpose is not cryptanalysis of classical ciphers. In most cases they are tools for spell checking in word processing, such as the *.dic files of MS-Word or ispell for TEX. Or they serve as basis for password cracking.

Word lists also today are important tools for cryptanalysis of modern ciphers.

Precomputations for Cryptanalysis

In the past where computers were not yet available people compiled large lists of pattern words. However there are no known examples that predate World War II.

This is an example of lookup-table technique: Increase the efficiency of computations that are executed again and again by settling large parts of the computation a priori and use it immediately, if necessary. We'll see instances of this principle in a cryptanalytic context several times.

The Pattern Matching Pointers Page

by Stefano Lonardi is here.


Example

Let's search for conspicuous patterns in the (German) cryptogram from our former example of statistical cryptanalysis:

FOALO BLAEG KEEAS PLOBE AIXOB XOAHO APOEN OBEIX JKOEE OJKPU
ENOBR CXBOP EESJJ HOMCP AMMOB LCNSP CKEUO XOPLC EELOB CPUBO
AROBL CENOB RCXBO PGOPP HCFOB PAIXH LOPEI XJKOE EOJLA OUSJL
OPOBO UOJLO BGBWV HSUBC VXAOX OAEEH KPHOB EIXCO HZOPA OMCJE
LOPGB WVHSC PCJWH AGOBO APNOB RCXBO PRKOB LOEEO PEAIX OBXOA
HMCPC KRLAO UOXOA MXCJH KPULO ECJUS BAHXM KECPU OQAOE OPAEH
XCHEI XQOBO MCOPU OJ
Already the first row yields AEGKEEA and JKOEEOJK.

Of course there is no guarantee that these patterns don't pass word boundaries. We simply have to try. Comparing with a dictionary for AEGKEEA gives the hits:

                   LOBL|AEGKEEA|SPL
Abiturientinnen         entinne *
Agentinnen              entinne *
Assistentinnen          entinne *
Auslassung              uslassu *
Ausmessung              usmessu
Ausreiseerlaubnis       reiseer *
Austritts               stritts
Autoscooter             toscoot
Barterrasse             arterra *
Buechernarren           ernarre *
Bundeskasse             eskasse
Diskussion              iskussi
Ehrenmanne              enmanne *
Eichentonne             entonne *
Eigensinne              ensinne *
Einreiseerlaubnis       reiseer *
Einzelfalle             elfalle *
Fehlerkorrektur         erkorre *
Gartenbeet              tenbeet
Geisteswissenschaft     eswisse *
Genugtuung              nugtuun *
Geplapper               eplappe
Gesetzeswissen          eswisse *
Getreideernte           reideer *
Globetrotter            etrotte *
Hotelhalle              elhalle *
Inspektionsmassnahmen   nsmassn *
Interventionsmassnahmen nsmassn *
Kanonendonner           endonne *
Konkurrentinnen         entinne *
Korrespondentinnen      entinne *
Landeskasse             eskasse *
Maiensonne              ensonne
Mietmutter              etmutte *
Motorboot               torboot *
Patientinnen            entinne *
Schlafsaal              lafsaal *
Sternenbanner           enbanne *
Studentinnen            entinne *
Tageskassen             eskasse *
Titelrollen             elrolle
Totalkollaps            alkolla
Wandelhalle             elhalle *
gefangennahm            angenna *
gestossen               estosse
getrottet               etrotte *
grauhaarig              rauhaar
uebelwollen             elwolle *
verdorre                erdorre
verharre                erharre
vernarre                ernarre
verwirre                erwirre
verworren               erworre
wechselvolle            elvolle *

We can reject almost all of these—labeled by a star—, because the word contains at least on more letter of the pattern that has no counterpart at the corresponding position of the cryptogram.

The same for the pattern JKOEEOJK. Because it has more letters we expect less hits:

                BEIX|JKOEEOJK|PUE
Hochseeschiffahrt    chseesch *
Industriestaates     estaates *
Klosterreste         sterrest *
Landessender         ndessend *
Presseprodukt        Pressepr
Universitaetsstaedte aetsstae *
Verschluesselung     luesselu
Zertruemmerung       ruemmeru *
verbesserbar         rbesserb *

We find only one pair of hits that have compatible letter substitutions:

LAEGKEEASP NOBEIXJKOEEOJKPU
diskussion verschluesselung

This gives such a large part of the key:

a b c d e f g h i j k l m n o p q r s t u v w x y z
. . I L O . U X A . G J . P S . . B E . K N . . . .
that the remainder of the cryptanalysis is obvious.


Search for a Probable Word

Searching for a probable word is a variant of pattern search. We search for the pattern of a word that we suspect from knowledge of the context as occuring in the plaintext.

In our example we might suspect that the plaintext—assumed in German—has to do with cryptography. Then we could search for the word (or partial word) »schluessel« (= key). And behold, we find two exemplars of EIXJKOEEOJ and no other fitting sequence! That's a really good start for cryptanalysis.


Author: Klaus Pommerening, 1999-Oct-28; last change: 2014-Jul-18.