Cryptology

Application of the Phi Distribution

The Power of the Phi Test

To which questions from the introduction do these results apply?

We can decide whether a text is from a certain language. This includes texts that are monoalphabetically encrypted because φ is invariant under monoalphabetic substitution. And we can recognize a monoalphabetically encrypted ciphertext.
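The invariance claim can be checked with a minimal sketch. It assumes the usual definition φ(a) = Σ m_s(m_s − 1) / (r(r − 1)), where m_s is the number of occurrences of letter s in a and r is the text length. The function names `phi` and `substitute` and the example key are illustrative assumptions, not taken from the text.

```python
from collections import Counter
from string import ascii_uppercase


def phi(text: str) -> float:
    """Coincidence index of the letters A-Z in text (other characters are ignored)."""
    letters = [c for c in text.upper() if c in ascii_uppercase]
    r = len(letters)
    if r < 2:
        raise ValueError("phi needs at least two letters")
    counts = Counter(letters)
    # Number of coinciding ordered letter pairs divided by all ordered pairs.
    return sum(m * (m - 1) for m in counts.values()) / (r * (r - 1))


def substitute(text: str, key: str) -> str:
    """Monoalphabetic substitution: A -> key[0], B -> key[1], and so on."""
    if sorted(key) != list(ascii_uppercase):
        raise ValueError("key must be a permutation of A-Z")
    return text.upper().translate(str.maketrans(ascii_uppercase, key))


# Hypothetical key (keyboard order), used only for illustration.
KEY = "QWERTYUIOPASDFGHJKLZXCVBNM"
plain = "THE PHI TEST IS INVARIANT UNDER MONOALPHABETIC SUBSTITUTION"
cipher = substitute(plain, KEY)

# A substitution only renames the letters, so the multiset of counts is unchanged.
print(phi(plain) == phi(cipher))
```

Because φ depends only on how often each letter occurs, not on which letter it is, the comparison holds for any permutation key.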

For both of these decision problems we calculate the coincidence index φ(a) of our text a and decide »belongs to language« or »is monoalphabetically encrypted«, depending on our hypothesis, if φ(a) reaches or surpasses the 95% quantile of φ for random texts of the same length. This assumes that we are willing to accept an error rate of the first kind of 5%.

For a text of 100 letters the threshold for φ is about 0.0451 by the table for random texts.
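The decision rule for 100-letter texts can be sketched as follows. Only the threshold 0.0451 comes from the text. The function names, the choice to evaluate the first 100 letters of a longer input, and the usual definition of φ as the fraction of coinciding letter pairs are assumptions made for illustration.

```python
from collections import Counter

# From the text: 95% quantile of phi for random texts of 100 letters.
THRESHOLD_100_5PCT = 0.0451


def phi(letters) -> float:
    """Coincidence index of a sequence of letters."""
    r = len(letters)
    if r < 2:
        raise ValueError("phi needs at least two letters")
    counts = Counter(letters)
    return sum(m * (m - 1) for m in counts.values()) / (r * (r - 1))


def passes_phi_test(text: str, length: int = 100,
                    threshold: float = THRESHOLD_100_5PCT) -> bool:
    """True if phi of the first `length` letters reaches or surpasses the threshold."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if len(letters) < length:
        raise ValueError(f"need at least {length} letters")
    # The threshold is only valid for texts of exactly this length.
    return phi(letters[:length]) >= threshold


# A perfectly flat letter distribution stays well below the threshold.
flat = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 4
print(passes_phi_test(flat))
```

A result of `True` means »belongs to language« or »is monoalphabetically encrypted«, depending on the hypothesis being tested. A different text length needs its own threshold.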

The tables for English and German show that English or German texts surpass this threshold with high probability: For both languages the test has a power of nearly 100%.

It makes sense to work with the more ambitious »significance level« of 1%, that is, the bound for the error of the first kind. For this we set the threshold to the 99% quantile of the φ-distribution for random texts. Our experiment for texts of length 100 gives the empirical value of 0.0473, which lies below the empirical minimum for our 2000 English 100-letter texts and far below the empirical minimum for German. Therefore even at the 1% level the test has a power of nearly 100%.
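Such empirical quantiles could be estimated by simulation, as in the rough sketch below. It assumes that "random text" means letters drawn independently and uniformly from a 26-letter alphabet. The number of trials, the seed, and all function names are illustrative choices, so the output need not reproduce the tabulated values exactly.

```python
import math
import random
from collections import Counter
from string import ascii_uppercase


def phi(letters) -> float:
    """Coincidence index of a sequence of letters."""
    r = len(letters)
    counts = Counter(letters)
    return sum(m * (m - 1) for m in counts.values()) / (r * (r - 1))


def empirical_quantile(values, p: float) -> float:
    """Smallest sample value with at least a fraction p of the sample at or below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


def random_phi_sample(length: int, trials: int, seed: int = 0):
    """phi values of `trials` uniformly random texts of the given length."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [phi(rng.choices(ascii_uppercase, k=length)) for _ in range(trials)]


sample = random_phi_sample(length=100, trials=2000)
q95 = empirical_quantile(sample, 0.95)
q99 = empirical_quantile(sample, 0.99)
print(f"95% quantile: {q95:.4f}, 99% quantile: {q99:.4f}")
```

Comparing the 99% quantile with the smallest φ value seen in a corpus of real texts then shows whether the test still separates the two cases at the 1% level.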


The Phi Test for Short Texts

Since the φ test performs so well for 100-letter texts, we dare to look at 26-letter texts, a text length that occurs in the meet-in-the-middle attack against rotor machines.

Here are the results for

The decision threshold at the 5% level is 0.0585. For English texts the test has a power of only 50%, for German texts nearly 75%.
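The power for a given text length can be estimated empirically as the fraction of genuine language samples that reach the threshold. The sketch below assumes that the samples are non-overlapping 26-letter chunks cut from a plaintext corpus supplied by the reader. The helper names are hypothetical, and only the threshold 0.0585 comes from the text.

```python
from collections import Counter

# From the text: 5%-level threshold for texts of 26 letters.
THRESHOLD_26_5PCT = 0.0585


def phi(letters) -> float:
    """Coincidence index of a sequence of letters."""
    r = len(letters)
    counts = Counter(letters)
    return sum(m * (m - 1) for m in counts.values()) / (r * (r - 1))


def chunks(text: str, length: int = 26):
    """Cut the letters of text into non-overlapping chunks of the given length."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    # A trailing remainder shorter than `length` is dropped.
    return [letters[i:i + length]
            for i in range(0, len(letters) - length + 1, length)]


def estimate_power(samples, threshold: float = THRESHOLD_26_5PCT) -> float:
    """Fraction of samples whose phi reaches or surpasses the threshold."""
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(1 for s in samples if phi(s) >= threshold)
    return hits / len(samples)


# Artificial input for illustration only: one chunk with phi = 1, one with phi = 0.
demo = chunks("A" * 26 + "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(estimate_power(demo))
```

Applied to a sufficiently large corpus of real English or German text, `estimate_power(chunks(corpus))` gives an estimate that can be compared with the percentages quoted above.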


Author: Klaus Pommerening, 2013-Dec-20; last change: 2014-Jan-23.