SE250:lab-5:hpan027
Initial problems
A lot of time was spent at the start of the lab trying to understand the code and the statistical results.
Determining the sample size
To determine the sample size, I ran ent_test several times with different sample sizes.
Function  Source  Samples  Entropy  Chi-sq %  Mean     Pi error  Serial corr.
Buzhash   low     5        3.00000  50.00%    132.875  100.00%   -0.509682
Buzhash   low     10       3.58496  50.00%    107.667  36.34%    0.235256
Buzhash   low     100      6.15366  2.50%     130.460  3.45%     -0.088601
Buzhash   low     1000     7.84646  97.50%    126.550  2.01%     0.006193
Buzhash   low     10000    7.97970  25.00%    127.177  1.95%     -0.007153
Buzhash   low     100000   7.99827  50.00%    127.587  0.14%     0.000712
Buzhash   low     1000000  7.99989  99.99%    127.501  0.23%     -0.000832
The conclusion was that the results stop varying much at around 10,000 samples, so 10,000 was the chosen sample size for the rest of the tests.
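One reason the entropy figure is so low for the smallest samples can be illustrated with a short sketch. It assumes that ent_test reports Shannon entropy in bits per byte, as the ent utility does; the function names `byte_entropy` and `max_entropy_for` are hypothetical and are not part of the lab code. A sample of k bytes can contain at most k distinct values, so its measured entropy cannot exceed log2(k) no matter how good the hash function is.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def max_entropy_for(num_bytes: int) -> float:
    """Upper bound on the entropy measurable from num_bytes bytes.

    At most min(num_bytes, 256) distinct byte values can appear.
    """
    return math.log2(min(num_bytes, 256))


if __name__ == "__main__":
    for k in (8, 100, 1000, 10000):
        print(k, "bytes: entropy can be at most", round(max_entropy_for(k), 5))
```

Under this assumption, entropy values well below 8 for tiny samples say more about the sample size than about the hash function, which is consistent with choosing a sample size where the figures have settled.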
Results for part one
Function      Source  Entropy  Chi-sq %  Mean     Pi error  Serial corr.
rand          low     7.71844  0.01%     110.541  8.92%     -0.048389
high rand     low     7.79205  25.00%    134.546  4.12%     -0.028254
buzhash       low     7.84379  95.00%    128.086  0.29%     -0.017268
buzhashn      low     7.82387  90.00%    127.373  1.06%     -0.007118
hash_CRC      low     4.04588  0.01%     94.848   27.32%    -0.395249
base256       low     0.00000  0.01%     97.000   27.32%    undefined
Java_Integer  low     2.79173  0.01%     31.125   27.32%    -0.230200
Java_Object   low     2.00000  0.01%     77.000   27.32%    -0.521556
Java_String   low     7.91760  99.99%    126.441  1.25%     0.003240
rand          high    7.76960  5.00%     112.412  11.98%    -0.044490
high rand     high    7.82756  90.00%    128.999  1.82%     -0.025330
buzhash       high    7.79778  50.00%    126.574  4.31%     -0.007005
buzhashn      high    7.82387  90.00%    127.373  1.06%     -0.007118
hash_CRC      high    7.20246  0.01%     114.932  2.01%     -0.032076
base256       high    3.91922  0.01%     106.410  27.32%    0.217294
Java_Integer  high    2.79173  0.01%     31.125   27.32%    -0.230200
Java_Object   high    3.77034  0.01%     41.971   27.32%    -0.099688
Java_String   high    7.37782  0.01%     117.390  8.92%     -0.013887
Conclusions for part one
It was very difficult to determine the order of "randomness" of the functions because it is hard to weigh the statistical tests against each other. In the end, the order was decided by how many categories each function "won" in and how many it "lost" in.
1) buzhashn
2) buzhash
3) high rand
4) Java_String
5) rand
6) hash_CRC
7) base256
8) Java_Object
9) Java_Integer
- Surprisingly, in terms of "score", using a low entropy source rather than a high entropy source made little difference to the performance of the hash functions. Although the rank above was decided using an overall score from both the low and high entropy sources, the standings would not have changed much had the functions been ranked separately for each source.
- It is likely the ranks for randomness would change depending on sample size. This was clearly seen earlier when buzhash was run multiple times with different sample sizes. It is likely certain functions perform better within a certain range of numbers and hence are disadvantaged by this particular sample size.
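The win/loss scoring used for the ranking could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually followed in the lab: it assumes the five columns are entropy, chi-square percentage, mean, Monte Carlo pi error, and serial correlation, and that the "ideal" values are 8 bits, 50%, 127.5, 0%, and 0 respectively. The names `IDEAL` and `win_loss_scores` are hypothetical. Only three low entropy rows from the part one results are used, to keep the example short.

```python
# Assumed ideal value for each statistic of a perfectly random byte stream.
IDEAL = {"entropy": 8.0, "chi_pct": 50.0, "mean": 127.5, "pi_err": 0.0, "serial": 0.0}

# Three low entropy rows copied from the part one results.
RESULTS = {
    "buzhash": {"entropy": 7.84379, "chi_pct": 95.00, "mean": 128.086,
                "pi_err": 0.29, "serial": -0.017268},
    "rand": {"entropy": 7.71844, "chi_pct": 0.01, "mean": 110.541,
             "pi_err": 8.92, "serial": -0.048389},
    "Java_Integer": {"entropy": 2.79173, "chi_pct": 0.01, "mean": 31.125,
                     "pi_err": 27.32, "serial": -0.230200},
}


def win_loss_scores(results):
    """Return {function name: wins minus losses} over all statistics.

    A function "wins" a category when it is closest to the assumed ideal
    value and "loses" when it is furthest away. Ties share the win or loss.
    """
    scores = {name: 0 for name in results}
    for metric, ideal in IDEAL.items():
        distance = {name: abs(stats[metric] - ideal) for name, stats in results.items()}
        best, worst = min(distance.values()), max(distance.values())
        for name, d in distance.items():
            if d == best:
                scores[name] += 1
            if d == worst:
                scores[name] -= 1
    return scores


scores = win_loss_scores(RESULTS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A scheme like this treats every statistic as equally important, which is exactly the weighting difficulty mentioned above; a different choice of ideal values or weights could reorder functions that are close together.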
Conclusions for part two
- Buzhash is pretty much consistent with the expected result
- Java_String seems to perform better than expected
- Java_Object seems to be broken with a low entropy source
- hash_CRC tends to have a higher llps than expected
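To make the llps figures easier to interpret, here is a minimal sketch of how such a number could be measured. It assumes that llps is the length of the longest probe sequence (the longest chain in a table that resolves collisions by chaining) and that a label such as 1000/10000 means 1000 keys hashed into a table of 10000 slots. The function `llps` below and the two example hash functions are hypothetical and are not the lab's implementation.

```python
from collections import Counter


def llps(keys, hash_fn, table_size):
    """Length of the longest chain when keys are hashed into table_size buckets."""
    buckets = Counter(hash_fn(key) % table_size for key in keys)
    return max(buckets.values(), default=0)


def constant_hash(key):
    """Degenerate hash: every key gets the same value."""
    return 42


def identity_hash(key):
    """Spreads consecutive integer keys perfectly across the table."""
    return key


if __name__ == "__main__":
    keys = range(1000)
    print(llps(keys, constant_hash, 10000))  # every key lands in one bucket
    print(llps(keys, identity_hash, 10000))  # every key gets its own bucket
```

Under these assumptions, an llps equal to the number of keys means every key landed in the same bucket, which is the pattern shown by Java_Object_hash with the low entropy source.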
Data for part two
Buzhash low 1000/10000: llps = 4, expecting 2.82556
hash_CRC low 1000/10000: llps = 2, expecting 2.82556
Java_Object_hash low 1000/10000: llps = 1000, expecting 2.82556
Java_String_hash low 1000/10000: llps = 1, expecting 2.82556

Buzhash low 1000000/100000: llps = 26, expecting 26.6057
hash_CRC low 1000000/100000: llps = 25, expecting 26.6057
Java_Object_hash low 1000000/100000: llps = 1000000, expecting 26.6057
Java_String_hash low 1000000/100000: llps = 18, expecting 26.6057

Buzhash low 10000/10000: llps = 7, expecting 6.67222
hash_CRC low 10000/10000: llps = 12, expecting 6.67222
Java_Object_hash low 10000/10000: llps = 10000, expecting 6.67222
Java_String_hash low 10000/10000: llps = 5, expecting 6.67222

Buzhash low 20000/10000: llps = 10, expecting 9.37449
hash_CRC low 20000/10000: llps = 22, expecting 9.37449
Java_Object_hash low 20000/10000: llps = 20000, expecting 9.37449
Java_String_hash low 20000/10000: llps = 6, expecting 9.37449

Buzhash low 40000/10000: llps = 15, expecting 13.7119
hash_CRC low 40000/10000: llps = 29, expecting 13.7119
Java_Object_hash low 40000/10000: llps = 40000, expecting 13.7119
Java_String_hash low 40000/10000: llps = 7, expecting 13.7119

Buzhash low 50000/10000: llps = 16, expecting 15.6448
hash_CRC low 50000/10000: llps = 49, expecting 15.6448
Java_Object_hash low 50000/10000: llps = 50000, expecting 15.6448
Java_String_hash low 50000/10000: llps = 10, expecting 15.6448

Buzhash low 100000/10000: llps = 22, expecting 24.2788
hash_CRC low 100000/10000: llps = 69, expecting 24.2788
Java_Object_hash low 100000/10000: llps = 100000, expecting 24.2788
Java_String_hash low 100000/10000: llps = 16, expecting 24.2788

Buzhash high 10000/100000: llps = 3, expecting 3.3271
hash_CRC high 10000/100000: llps = 3, expecting 3.3271
Java_Object_hash high 10000/100000: llps = 2, expecting 3.3271
Java_String_hash high 10000/100000: llps = 4, expecting 3.3271

Buzhash high 100000/100000: llps = 7, expecting 7.75952
hash_CRC high 100000/100000: llps = 9, expecting 7.75952
Java_Object_hash high 100000/100000: llps = 16, expecting 7.75952
Java_String_hash high 100000/100000: llps = 8, expecting 7.75952

Buzhash high 20000/10000: llps = 9, expecting 9.37449
hash_CRC high 20000/10000: llps = 11, expecting 9.37449
Java_Object_hash high 20000/10000: llps = 25, expecting 9.37449
Java_String_hash high 20000/10000: llps = 10, expecting 9.37449

Buzhash high 30000/10000: llps = 11, expecting 11.6473
hash_CRC high 30000/10000: llps = 12, expecting 11.6473
Java_Object_hash high 30000/10000: llps = 36, expecting 11.6473
Java_String_hash high 30000/10000: llps = 11, expecting 11.6473

Buzhash high 40000/10000: llps = 14, expecting 13.7119
hash_CRC high 40000/10000: llps = 15, expecting 13.7119
Java_Object_hash high 40000/10000: llps = 49, expecting 13.7119
Java_String_hash high 40000/10000: llps = 13, expecting 13.7119