SE250:lab-5:dols008
Task 1
I used a sample size of 100000 because beyond that it seemed to make very little difference to the results.
hash function     input    entropy  exceed  mean     error   correlation
Buzhash           low      7.99830  75.00%  127.578   0.71%   0.000327
Buzhash           typical  7.99783   2.50%  127.374   0.77%  -0.000076
Buzhash n         low      7.99824  50.00%  127.494   0.13%  -0.003092
Buzhash n         typical  7.99824  50.00%  127.494   0.13%  -0.003092
Hash CRC          low      5.59831   0.01%   81.783  24.53%   0.028545
Hash CRC          typical  7.84046   0.01%  122.862   1.98%   0.018518
Base256           low      0.00000   0.01%   97.000  27.32%   undefined
Base256           typical  4.02297   0.01%  107.853  27.32%   0.034082
Add java integer  low      4.82824   0.01%   43.883  27.32%  -0.092002
Add java integer  typical  4.82824   0.01%   43.883  27.32%  -0.092002
Add java object   low      2.00000   0.01%   77.000  27.32%  -0.521556
Add java object   typical  5.72209   0.01%  117.318   2.95%  -0.350088
Add java string   low      7.99957  99.99%  127.627   0.32%  -0.000272
Add java string   typical  7.94554   0.01%  126.139   0.27%   0.021181
Add rand          low      7.95308   0.01%  111.441  11.17%  -0.051837
Add rand          typical  7.95272   0.01%  111.395  10.65%  -0.049131
Add high rand     low      7.99828  75.00%  127.441   0.75%  -0.001213
Add high rand     typical  7.99807  50.00%  127.406   0.07%  -0.002226
The more random a hash function's output is, the better. So I expect the better hash functions to have higher entropy (the maximum is 8 bits per byte), a mean closer to 127.5 and a correlation closer to 0. I don't understand the chi-square distribution. Good hash functions would probably also give a more accurate calculation of pi, but the other numbers seem more concrete. So, based on these criteria, I would rank the hash functions from best to worst like this:
Buzhash n, Buzhash, java string, hash CRC, java object, java integer, base256.
It does look like high rand is more random than rand, and is about as good as the better hash functions. Java string seems to be better than everything else for low-entropy input, but not quite as good for typical-entropy input.
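The three statistics used for the ranking (entropy, mean and correlation) can be computed directly from a stream of output bytes. The following is a minimal sketch in Python, not the tool used for the table above; the function name `byte_stats` is hypothetical, and the correlation is assumed to be the usual serial correlation of each byte with the next one, wrapping around at the end.

```python
import math
from collections import Counter


def byte_stats(data: bytes):
    """Return (entropy in bits per byte, arithmetic mean, serial correlation).

    Hypothetical helper for illustration. The correlation is None when it
    is undefined, which happens when every byte has the same value.
    """
    n = len(data)
    if n == 0:
        raise ValueError("need at least one byte")

    # Shannon entropy of the byte values: 8.0 means all 256 values are equally likely.
    counts = Counter(data)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Arithmetic mean: 127.5 for uniformly distributed bytes.
    mean = sum(data) / n

    # Serial correlation between each byte and its successor (wrapping around).
    sum_x = sum(data)
    sum_xx = sum(b * b for b in data)
    sum_xy = sum(data[i] * data[(i + 1) % n] for i in range(n))
    denom = n * sum_xx - sum_x * sum_x
    correlation = None if denom == 0 else (n * sum_xy - sum_x * sum_x) / denom

    return entropy, mean, correlation


if __name__ == "__main__":
    print(byte_stats(bytes([97]) * 1000))      # one repeated byte value
    print(byte_stats(bytes(range(256)) * 4))   # every value equally often, but in order
```

A stream made of a single repeated byte gives entropy 0, a mean equal to that byte and an undefined correlation, which is consistent with the Base256 low row. The second call shows why entropy and mean alone are not enough: a simple ramp of byte values has perfect entropy and mean but a correlation close to 1.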
Task 2
Buzhash n and java integer didn't work because they have a different function signature. Here are my results:
Buzhash low 60000/40000: llps = 8, expecting 8.8452
Buzhash typical 60000/40000: llps = 9, expecting 8.8452
Hash CRC low 60000/40000: llps = 17, expecting 8.8452
Hash CRC typical 60000/40000: llps = 12, expecting 8.8452
Base256 low 60000/40000: llps = 60000, expecting 8.8452
Base256 typical 60000/40000: llps = 2020, expecting 8.8452
Java object low 60000/40000: llps = 60000, expecting 8.8452
Java object typical 60000/40000: llps = 22, expecting 8.8452
Java string low 60000/40000: llps = 4, expecting 8.8452
Java string typical 60000/40000: llps = 10, expecting 8.8452
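Java string is debatably the best hash function. Buzhash performed slightly better for typical data (llps of 9 against 10), and a fair bit worse for low-entropy data (8 against 4). The prize for worst hash function goes to base256 (llps of 60000 and 2020), with java object similarly terrible on low-entropy data (60000), though much better on typical data (22).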
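To make the llps figures concrete, here is a minimal sketch in Python, assuming that llps is the length of the longest probe sequence (the largest number of keys that end up in a single bucket) and that 60000/40000 means 60000 keys in a table of 40000 buckets. The function `llps` and the two toy hash functions are hypothetical stand-ins, not the functions tested in the lab.

```python
from collections import Counter


def llps(keys, table_size, hash_fn):
    """Return the largest number of keys that land in any one bucket.

    Hypothetical helper: each key goes to bucket hash_fn(key) % table_size.
    """
    if table_size <= 0:
        raise ValueError("table_size must be positive")
    buckets = Counter(hash_fn(key) % table_size for key in keys)
    return max(buckets.values(), default=0)


def constant_hash(key):
    """Worst case: every key gets the same hash value."""
    return 97


def identity_hash(key):
    """Toy hash for integer keys: the key itself."""
    return key


if __name__ == "__main__":
    keys = range(60000)
    print(llps(keys, 40000, constant_hash))  # every key in one bucket
    print(llps(keys, 40000, identity_hash))  # keys spread almost evenly
```

A hash function that sends every key to the same bucket gives an llps equal to the number of keys, which is the pattern in the Base256 low and Java object low results (llps = 60000).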