Cryptology: Empirical Results on BLW Scores

The heuristic motivation of the BLW score, as for all the scores in this chapter, relies on independence assumptions that natural languages clearly violate. It therefore again makes sense to obtain empirical results by analyzing a large sample of concrete texts.

We extract 20000 letters from each of the texts Kim, Schachnovelle, and De la Terre à la Lune, and split them into 2000 chunks of 10 letters each; see the files eng10a.txt, ger10a.txt, and fra10a.txt.

Likewise we generate random texts; see rnd10Ea.txt, rnd10Da.txt, and rnd10Fa.txt.
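The preparation of the samples can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used to produce the files: the function names are hypothetical, the text is assumed to be reduced to the 26 letters A to Z (everything else is dropped), and the random texts are assumed to consist of independent, uniformly distributed letters.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # assumption: 26-letter alphabet A..Z


def letters_only(text):
    """Upper-case the text and keep only the letters A..Z (assumed normalization)."""
    return "".join(c for c in text.upper() if c in ALPHABET)


def chunks(text, size=10, count=2000):
    """Split the first size*count letters of the text into count chunks of equal size."""
    letters = letters_only(text)[: size * count]
    if len(letters) < size * count:
        raise ValueError("text too short for the requested number of chunks")
    return [letters[i : i + size] for i in range(0, size * count, size)]


def random_chunks(size=10, count=2000, seed=0):
    """Generate count chunks of uniformly random letters (assumed model of 'random text')."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(ALPHABET) for _ in range(size)) for _ in range(count)
    ]
```

For example, chunks("Hello, World!", size=5, count=2) yields the two chunks HELLO and WORLD.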

We collect the results in the spreadsheets ER10res.xls, DR10res.xls, FR10res.xls.

The results are summarized in the following tables and figures.

Frequencies of BLW scores for English vs. random texts

Score	Random	English
0 ≤ x ≤ 1	32	0
1 < x ≤ 2	97	0
2 < x ≤ 3	187	0
3 < x ≤ 4	254	0
4 < x ≤ 5	324	3
5 < x ≤ 6	301	1
6 < x ≤ 7	271	4
7 < x ≤ 8	216	1
8 < x ≤ 9	156	8
9 < x ≤ 10	77	18
10 < x ≤ 11	49	51
11 < x ≤ 12	25	120
12 < x ≤ 13	6	196
13 < x ≤ 14	3	322
14 < x ≤ 15	2	413
15 < x ≤ 16	0	406
16 < x ≤ 17	0	255
17 < x ≤ 18	0	157
18 < x ≤ 19	0	40
19 < x < ∞	0	5

[BLW scores for 2000 English (red) and random (blue) text chunks of 10 letters each]
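The binning used in the frequency tables (first bin 0 ≤ x ≤ 1, then k < x ≤ k+1, last bin open-ended) can be sketched as follows. The function names are hypothetical, and the sketch assumes that the scores are already computed and non-negative, as the first bin of the table suggests.

```python
import math


def bin_index(score, n_bins):
    """Return the bin of a score: bin 0 is [0, 1], bin k is (k, k+1],
    and the last bin collects everything above its lower bound."""
    if score < 0:
        raise ValueError("scores are assumed to be non-negative")
    k = max(0, math.ceil(score) - 1)
    return min(k, n_bins - 1)


def histogram(scores, n_bins=20):
    """Count how many scores fall into each of the n_bins bins."""
    counts = [0] * n_bins
    for s in scores:
        counts[bin_index(s, n_bins)] += 1
    return counts
```

With n_bins=20 the bins match the English table; the German and French tables use 21 bins.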

Frequencies of BLW scores for German vs. random texts

Score	Random	German
0 ≤ x ≤ 1	38	0
1 < x ≤ 2	105	0
2 < x ≤ 3	207	0
3 < x ≤ 4	269	0
4 < x ≤ 5	296	0
5 < x ≤ 6	319	0
6 < x ≤ 7	256	0
7 < x ≤ 8	185	1
8 < x ≤ 9	143	2
9 < x ≤ 10	96	15
10 < x ≤ 11	47	21
11 < x ≤ 12	30	45
12 < x ≤ 13	4	95
13 < x ≤ 14	4	202
14 < x ≤ 15	1	332
15 < x ≤ 16	0	411
16 < x ≤ 17	0	396
17 < x ≤ 18	0	298
18 < x ≤ 19	0	134
19 < x ≤ 20	0	41
20 < x < ∞	0	7

[BLW scores for 2000 German (red) and random (blue) text chunks of 10 letters each]

Frequencies of BLW scores for French vs. random texts

Score	Random	French
0 ≤ x ≤ 1	122	0
1 < x ≤ 2	195	0
2 < x ≤ 3	266	0
3 < x ≤ 4	315	0
4 < x ≤ 5	274	0
5 < x ≤ 6	264	0
6 < x ≤ 7	215	2
7 < x ≤ 8	140	0
8 < x ≤ 9	94	10
9 < x ≤ 10	53	15
10 < x ≤ 11	29	31
11 < x ≤ 12	21	50
12 < x ≤ 13	8	114
13 < x ≤ 14	2	239
14 < x ≤ 15	2	322
15 < x ≤ 16	0	415
16 < x ≤ 17	0	420
17 < x ≤ 18	0	258
18 < x ≤ 19	0	115
19 < x ≤ 20	0	8
20 < x < ∞	0	1

[BLW scores for 2000 French (red) and random (blue) text chunks of 10 letters each]

Summary

The empirical results for the 5% error level are as follows:

English: We take the threshold value T = 11 for English texts. Then 86 of 2000 English scores are ≤ T, so the error is 86/2000 = 4.3%.
For random texts 1964 of 2000 scores are ≤ T, so the power is 1964/2000 = 98.2%.
There are 36 random scores and 1914 English scores > T, so the predictive value for English is 1914/1950 = 98.2%.
German: We take the threshold value T = 12 for German texts. Then 84 of 2000 German scores are ≤ T, so the error is 84/2000 = 4.2%.
For random texts 1991 of 2000 scores are ≤ T, so the power is 1991/2000 = 99.6%.
There are 9 random scores and 1916 German scores > T, so the predictive value for German is 1916/1925 = 99.5%.
French: We take the threshold value T = 11 for French texts. Then 58 of 2000 French scores are ≤ T, so the error is 58/2000 = 2.9%.
For random texts 1967 of 2000 scores are ≤ T, so the power is 1967/2000 = 98.3%.
There are 33 random scores and 1942 French scores > T, so the predictive value for French is 1942/1975 = 98.3%.
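The three quantities can be recomputed from the frequency tables with a short Python sketch. The function names are hypothetical; the count lists are copied from the English table above, with index k holding the count of the bin whose upper bound is k+1. The helper largest_threshold encodes an assumption, namely that T is the largest integer threshold whose error does not exceed the 5% level; this is consistent with the values of T chosen here, but the text does not state the rule explicitly.

```python
def evaluate(random_counts, language_counts, threshold):
    """Compute error, power and predictive value for an integer threshold T.
    Scores <= T are exactly those in the first T bins."""
    n_rand = sum(random_counts)
    n_lang = sum(language_counts)
    lang_below = sum(language_counts[:threshold])  # language chunks rejected
    rand_below = sum(random_counts[:threshold])    # random chunks rejected
    lang_above = n_lang - lang_below
    rand_above = n_rand - rand_below
    return {
        "error": lang_below / n_lang,
        "power": rand_below / n_rand,
        "predictive_value": lang_above / (lang_above + rand_above),
    }


def largest_threshold(language_counts, level=0.05):
    """Largest integer T with error <= level (assumed selection rule)."""
    total = sum(language_counts)
    below = 0
    best = 0
    for t, count in enumerate(language_counts, start=1):
        below += count
        if below / total > level:
            break
        best = t
    return best


# Counts from the table "English vs. random texts".
english_random = [32, 97, 187, 254, 324, 301, 271, 216, 156, 77,
                  49, 25, 6, 3, 2, 0, 0, 0, 0, 0]
english = [0, 0, 0, 0, 3, 1, 4, 1, 8, 18,
           51, 120, 196, 322, 413, 406, 255, 157, 40, 5]

T = largest_threshold(english)
for name, value in evaluate(english_random, english, T).items():
    print(f"T = {T}, {name}: {value:.1%}")
```

The same two functions apply unchanged to the German and French tables once their columns are entered as lists.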

The BLW score is significantly stronger than the MFL score.

Author: Klaus Pommerening, 2014-Jun-10; last change: 2014-Jun-10.