The heuristic motivation of the BLW score, as for all the scores in this chapter, relies on independence assumptions that natural languages clearly violate. It therefore again makes sense to obtain empirical results by analyzing a large sample of concrete texts.
We extract 20000 letters from each of the texts Kim, Schachnovelle, and De la Terre à la Lune, and split each extract into 2000 chunks of 10 letters; see the files eng10a.txt, ger10a.txt, fra10a.txt.
Likewise, we generate random texts; see rnd10Ea.txt, rnd10Da.txt, rnd10Fa.txt.
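The preparation of both samples can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function names `chunk_letters` and `random_chunks` are hypothetical, and the preprocessing (upper-casing, dropping everything outside A to Z, drawing random letters uniformly) is a guess at how such files could be produced, not the author's actual procedure.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # assumed 26-letter alphabet A..Z


def chunk_letters(text, size=10, count=2000):
    """Keep only the letters A..Z and cut them into `count` chunks of `size` letters."""
    letters = [c for c in text.upper() if c in ALPHABET]
    needed = size * count
    if len(letters) < needed:
        raise ValueError(f"need {needed} letters, found {len(letters)}")
    return ["".join(letters[i:i + size]) for i in range(0, needed, size)]


def random_chunks(size=10, count=2000, seed=None):
    """Draw `count` chunks of `size` letters uniformly at random from A..Z."""
    rng = random.Random(seed)
    return ["".join(rng.choices(ALPHABET, k=size)) for _ in range(count)]
```

With the default parameters each function yields 2000 chunks of 10 letters, that is, 20000 letters per sample, matching the sample size used here.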
We collect the results in the spreadsheets ER10res.xls, DR10res.xls, FR10res.xls.
The results are summarized in the following tables and figures.
Score | Random | English |
---|---|---|
0 ≤ x ≤ 1 | 32 | 0 |
1 < x ≤ 2 | 97 | 0 |
2 < x ≤ 3 | 187 | 0 |
3 < x ≤ 4 | 254 | 0 |
4 < x ≤ 5 | 324 | 3 |
5 < x ≤ 6 | 301 | 1 |
6 < x ≤ 7 | 271 | 4 |
7 < x ≤ 8 | 216 | 1 |
8 < x ≤ 9 | 156 | 8 |
9 < x ≤ 10 | 77 | 18 |
10 < x ≤ 11 | 49 | 51 |
11 < x ≤ 12 | 25 | 120 |
12 < x ≤ 13 | 6 | 196 |
13 < x ≤ 14 | 3 | 322 |
14 < x ≤ 15 | 2 | 413 |
15 < x ≤ 16 | 0 | 406 |
16 < x ≤ 17 | 0 | 255 |
17 < x ≤ 18 | 0 | 157 |
18 < x ≤ 19 | 0 | 40 |
19 < x < ∞ | 0 | 5 |
Score | Random | German |
---|---|---|
0 ≤ x ≤ 1 | 38 | 0 |
1 < x ≤ 2 | 105 | 0 |
2 < x ≤ 3 | 207 | 0 |
3 < x ≤ 4 | 269 | 0 |
4 < x ≤ 5 | 296 | 0 |
5 < x ≤ 6 | 319 | 0 |
6 < x ≤ 7 | 256 | 0 |
7 < x ≤ 8 | 185 | 1 |
8 < x ≤ 9 | 143 | 2 |
9 < x ≤ 10 | 96 | 15 |
10 < x ≤ 11 | 47 | 21 |
11 < x ≤ 12 | 30 | 45 |
12 < x ≤ 13 | 4 | 95 |
13 < x ≤ 14 | 4 | 202 |
14 < x ≤ 15 | 1 | 332 |
15 < x ≤ 16 | 0 | 411 |
16 < x ≤ 17 | 0 | 396 |
17 < x ≤ 18 | 0 | 298 |
18 < x ≤ 19 | 0 | 134 |
19 < x ≤ 20 | 0 | 41 |
20 < x < ∞ | 0 | 7 |
Score | Random | French |
---|---|---|
0 ≤ x ≤ 1 | 122 | 0 |
1 < x ≤ 2 | 195 | 0 |
2 < x ≤ 3 | 266 | 0 |
3 < x ≤ 4 | 315 | 0 |
4 < x ≤ 5 | 274 | 0 |
5 < x ≤ 6 | 264 | 0 |
6 < x ≤ 7 | 215 | 2 |
7 < x ≤ 8 | 140 | 0 |
8 < x ≤ 9 | 94 | 10 |
9 < x ≤ 10 | 53 | 15 |
10 < x ≤ 11 | 29 | 31 |
11 < x ≤ 12 | 21 | 50 |
12 < x ≤ 13 | 8 | 114 |
13 < x ≤ 14 | 2 | 239 |
14 < x ≤ 15 | 2 | 322 |
15 < x ≤ 16 | 0 | 415 |
16 < x ≤ 17 | 0 | 420 |
17 < x ≤ 18 | 0 | 258 |
18 < x ≤ 19 | 0 | 115 |
19 < x ≤ 20 | 0 | 8 |
20 < x < ∞ | 0 | 1 |
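The binning used in the three tables (a closed first interval 0 ≤ x ≤ 1, then half-open intervals k < x ≤ k+1, and an open-ended last row) can be reproduced as follows. This is a minimal sketch: the function names and the `top` parameter are hypothetical, and the scores are assumed to be nonnegative numbers that have been computed elsewhere.

```python
import math
from collections import Counter


def bin_index(x, top):
    """Index of the table row containing score x.

    Row 0 is 0 <= x <= 1, row k is k < x <= k+1, and row `top`
    collects everything above `top` (e.g. top=19 for the English table).
    """
    if x < 0:
        raise ValueError("scores are assumed to be nonnegative")
    return min(max(math.ceil(x) - 1, 0), top)


def histogram(scores, top):
    """Count how many scores fall into each of the top+1 rows."""
    counts = Counter(bin_index(x, top) for x in scores)
    return [counts.get(k, 0) for k in range(top + 1)]
```

Calling `histogram(scores, 19)` gives the 20 rows of the English table, and `top=20` gives the 21 rows of the German and French tables.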
The empirical results for the 5% error level are as follows:
The BLW score is significantly stronger than the MFL score.
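How a threshold for a given error level can be read off tabulated counts is sketched below, using the Random/English table as data. This is an illustration under assumptions, not the author's procedure: the function name `threshold_for_level` is hypothetical, a text is assumed to be classified as random when its score is at most the threshold, and because only binned counts are used, the threshold can only fall on an integer bin boundary.

```python
# Counts per row of the Random/English table (rows 0<=x<=1, 1<x<=2, ..., 19<x).
RANDOM = [32, 97, 187, 254, 324, 301, 271, 216, 156, 77,
          49, 25, 6, 3, 2, 0, 0, 0, 0, 0]
ENGLISH = [0, 0, 0, 0, 3, 1, 4, 1, 8, 18,
           51, 120, 196, 322, 413, 406, 255, 157, 40, 5]


def threshold_for_level(language, random_, level=0.05):
    """Largest integer threshold T such that at most `level` of the
    language texts score <= T (assumed rule: score <= T means 'random').

    Returns (T, miss_rate, false_alarm_rate), where miss_rate is the share
    of language texts with score <= T and false_alarm_rate is the share of
    random texts with score > T.
    """
    n_lang, n_rand = sum(language), sum(random_)
    best = (0, 0.0, 1.0)  # T = 0: nothing rejected (up to scores exactly 0)
    below = 0             # language texts with score <= t
    for t in range(1, len(language)):
        below += language[t - 1]      # row t-1 covers scores up to t
        if below / n_lang > level:
            break
        above = sum(random_[t:])      # random texts with score > t
        best = (t, below / n_lang, above / n_rand)
    return best
```

For the English counts this sketch selects the threshold 11: 86 of the 2000 English chunks (4.3%) score at most 11, and 36 of the 2000 random chunks (1.8%) score above 11. These figures are limited by the bin width and need not coincide with thresholds derived from the unbinned scores.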