SE250:lab-5:jhor053
Lab 5
Task 1
With:
int sample_size = 1000;
int n_keys = 200;
int table_size = 100;
My sample size is fairly large because we wanted enough data to check that each function's output is random enough. I also chose a keys-to-table-size ratio of 2 (200 keys in 100 slots, i.e. a load factor of 2) to check that each function can handle more keys than the table has slots.
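The following is a minimal sketch of what the 2:1 ratio means for the table. It computes the load factor (keys divided by slots) and counts how many hash values land in each slot under separate chaining. The names `load_factor` and `bucket_counts` are hypothetical helpers and are not part of the lab code. The sketch also assumes that a hash value is reduced to a slot with `hash % table_size`.

```c
#include <stddef.h>

/* Lab parameters from the setup above. */
enum { N_KEYS = 200, TABLE_SIZE = 100 };

/* Load factor: average number of keys per slot (200 / 100 = 2.0 here). */
double load_factor(size_t n_keys, size_t table_size) {
    return (double)n_keys / (double)table_size;
}

/* Hypothetical helper: counts[i] receives how many of the n hash values
   map to slot i (hash % table_size, an assumption); returns the length
   of the longest chain. */
size_t bucket_counts(const unsigned *hashes, size_t n,
                     size_t *counts, size_t table_size) {
    size_t longest = 0;
    for (size_t i = 0; i < table_size; i++)
        counts[i] = 0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = hashes[i] % table_size;
        counts[slot]++;
        if (counts[slot] > longest)
            longest = counts[slot];
    }
    return longest;
}
```

With a load factor of 2, chains average two keys each. A poor hash function shows up as a few very long chains and many empty slots.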
My results are:
For low_entropy_src:

Type          Entropy  Chi-square  Mean     Pi error  Serial corr.
buzhash       7.84379  95.00%      128.086  0.29%     -0.017268
buzhashn      7.82387  90.00%      127.373  1.06%     -0.007118
hash_CRC      4.04588  0.01%       94.848   27.32%    -0.395249
base256       0.00000  0.01%       97.000   27.32%    undefined
Java_Integer  2.79173  0.01%       31.125   27.32%    -0.230200
Java_Object   2.00000  0.01%       77.000   27.32%    -0.521556
Java_String   7.91760  99.99%      126.441  1.25%     0.003240
rand          7.71844  0.01%       110.541  8.92%     -0.048389
high_rand     7.79205  25.00%      134.546  4.12%     -0.028254

Now for typical_entropy_src:

Type          Entropy  Chi-square  Mean     Pi error  Serial corr.
buzhash       7.79778  50.00%      126.574  4.31%     -0.007005
buzhashn      7.82387  90.00%      127.373  1.06%     -0.007118
hash_CRC      4.21252  0.01%       92.006   26.56%    -0.465003
base256       0.00000  0.01%       97.000   27.32%    undefined
Java_Integer  2.79173  0.01%       31.125   27.32%    -0.230200
Java_Object   2.00000  0.01%       77.000   27.32%    -0.521556
Java_String   7.90224  99.99%      126.914  6.61%     0.025449
rand          7.76960  5.00%       112.412  11.98%    -0.044490
high_rand     7.82756  90.00%      128.999  1.82%     -0.025330

Key: Entropy is in bits per byte (8 is ideal). Chi-square is the percentage from the chi-square test. Mean is the arithmetic mean of the output bytes (127.5 is ideal). Pi error is the percentage error of a Monte Carlo estimate of pi. Serial corr. is the serial correlation coefficient (0 is ideal).
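These columns appear to match the report produced by the `ent` randomness-testing utility. Assuming that, the C functions below sketch three of the measures over a buffer of output bytes:

- the arithmetic mean (127.5 for uniformly random bytes);
- the raw chi-square statistic against a uniform distribution over the 256 byte values;
- the serial correlation coefficient between each byte and the next, with the last byte wrapped round to pair with the first.

The table's chi-square column is a percentage derived from this statistic using the chi-square distribution, and the sketch does not compute that percentage. Entropy, H = -sum over v of p(v) * log2 p(v) (at most 8 bits per byte), is also left out. All function names are hypothetical.

```c
#include <stddef.h>

/* Arithmetic mean of the bytes; about 127.5 is expected for random data. */
double byte_mean(const unsigned char *buf, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return n ? sum / (double)n : 0.0;
}

/* Raw chi-square statistic of the byte counts against a uniform
   distribution over the 256 possible byte values. */
double byte_chi_square(const unsigned char *buf, size_t n) {
    size_t counts[256] = {0};
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        counts[buf[i]]++;
    double expected = (double)n / 256.0;
    double chisq = 0.0;
    for (int v = 0; v < 256; v++) {
        double d = (double)counts[v] - expected;
        chisq += d * d / expected;
    }
    return chisq;
}

/* Serial correlation of each byte with the next (the last byte pairs with
   the first). Returns 0 when undefined (e.g. every byte equal), otherwise
   stores the coefficient in *scc and returns 1. */
int byte_serial_corr(const unsigned char *buf, size_t n, double *scc) {
    double sum = 0.0, sum_sq = 0.0, sum_prod = 0.0;
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        double x = buf[i];
        double next = buf[(i + 1) % n];
        sum += x;
        sum_sq += x * x;
        sum_prod += x * next;
    }
    double denom = (double)n * sum_sq - sum * sum;
    if (denom == 0.0)
        return 0;
    *scc = ((double)n * sum_prod - sum * sum) / denom;
    return 1;
}
```

base256 has zero entropy and a mean of exactly 97.000, which suggests that every output byte is the same value. If so, the denominator of the correlation is zero, and that would explain why its row reads "undefined".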
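The difference between rand and high_rand is that high_rand generally gives better results, at the cost of slightly more processing power and memory. In both tables, high_rand's mean sits slightly above the ideal 127.5 (134.546 and 128.999), so it tends to "over" randomize. By contrast, rand's mean sits well below 127.5 (110.541 and 112.412), so it "under" randomizes: it is biased towards lower values rather than higher ones.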
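The lab code for rand and high_rand is not shown here, so the following is only an assumption about how such a pair is often built. One variant keeps the low-order byte of the C library's `rand()`. A "high" variant instead scales `rand()` down so that the result comes from its high-order bits, which are the better-distributed bits in some simple linear congruential generators. Both function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical "low" variant: keep only the low-order 8 bits of rand(). */
int low_byte_rand(void) {
    return rand() & 0xFF;
}

/* Hypothetical "high" variant: divide so that the result (0..255) comes
   from the high-order bits of rand(), whatever RAND_MAX is here. */
int high_byte_rand(void) {
    return rand() / (RAND_MAX / 256 + 1);
}
```

One way to check for the kind of bias described above is to average a few thousand calls of each function and compare the result with 127.5.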
I would rank them from best to worst as follows:
1. buzhash: it generally produced the most random-looking output, and its values were overall the closest to the expected values for random data.
2. buzhashn: it was more consistent across the typical- and low-entropy sources.
3. high_rand: its values were a bit further off, but still acceptable.
4. rand: just below high_rand. Its values are reasonable, but they are further from the expected values than high_rand's, and on the low side.
5. Java_String: good on most measures, but its chi-square result of 99.99% is at the extreme, which means the output is too uniform to count as random.
6. Java_Integer: slightly better than those below it, but it still clearly fails, with an entropy of only 2.79 bits per byte and a mean of 31.125.
7 (equal). Java_Object, hash_CRC and base256: all failed. Their entropy is far below the ideal 8 bits per byte and their chi-square results are 0.01%. Their means and pi errors (about 27%) are also far from the expected random values, so their output is generally low quality.
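The ranking leans heavily on the chi-square percentage. The documentation for the `ent` utility gives a rule of thumb for reading it:

- above 99% or below 1%: almost certainly not random;
- 95-99% or 1-5%: suspect;
- 90-95% or 5-10%: almost suspect.

Here is a minimal C sketch of that rule. The enum and function names are hypothetical. Placing exact edge values such as 95.00% in the milder band is an assumption, since the bands are usually quoted without saying which side the edges fall on.

```c
/* Verdicts for the chi-square percentage, from worst to best. */
enum chisq_verdict {
    CHISQ_NOT_RANDOM,     /* above 99% or below 1% */
    CHISQ_SUSPECT,        /* 95-99% or 1-5% */
    CHISQ_ALMOST_SUSPECT, /* 90-95% or 5-10% */
    CHISQ_CONSISTENT      /* 10-90%: no evidence against randomness */
};

/* Classify a chi-square percentage. Exact edge values (e.g. 95.00%)
   fall into the milder band, which is an assumption. */
enum chisq_verdict classify_chisq_pct(double pct) {
    if (pct > 99.0 || pct < 1.0)
        return CHISQ_NOT_RANDOM;
    if (pct > 95.0 || pct < 5.0)
        return CHISQ_SUSPECT;
    if (pct > 90.0 || pct < 10.0)
        return CHISQ_ALMOST_SUSPECT;
    return CHISQ_CONSISTENT;
}
```

Under these bands, Java_String's 99.99% and every 0.01% result count as not random. buzhash's 95.00% on the low-entropy source sits right on the edge of the suspect band.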
Task 2
Overall
A very good intro to hashing, and great input from John H explaining the different concepts and tests for randomness. The length was good (I was a bit slow this morning :S). A bit more explanation of the different tests on the handout would have helped, though.