SE250:lab-5:stsa027
Had a slight problem when trying to compile all .c files together. Was supposed to use '*.c'.
Had another slight problem - misunderstood the meanings of the parameters we were supposed to set for ourselves. When I understood properly, the three values were 10,000, 7,500 and 2,500.
sample_size was 10000. Chosen because I thought it was adequately large. n_keys was 7500, with the table_size 1/3 of that, at 2500, which means that each linked list in the hash table will be at least 3 elements long, and which I thought was a good test of the hash functions.
Results
buzhash
buzhash low 7.98550 99.00% 125.825 1.26% -0.000047 buzhash low 7500/2500: llps = 10, expecting 10.6686
buzhash typical 7.97970 25.00% 127.177 1.95% -0.007153 buzhash typcial 7500/2500: llps = 10, expecting 10.6686
buzhash 7.98550 99.00% 125.825 1.26% -0.000047 buzhash RC 7500/2500: llps_i = 132, expecting 10.6686
buzhashn
buzhashn low 7.98214 50.00% 127.589 2.10% 0.000600 buzhashn low 7500/2500: llps = 7499, expecting 10.6686
lab-5.c:332: warning: passing arg 4 of `llps' from incompatible pointer type buzhashn typical 7.98214 50.00% 127.589 2.10% 0.000600 buzhashn typcial 7500/2500: llps = 7499, expecting 10.6686
buzhashn 7.98214 50.00% 127.589 2.10% 0.000600 buzhashn RC 7500/2500: llps_i = 12, expecting 10.6686
hash_CRC
hash_CRC low 7.98214 50.00% 127.589 2.10% 0.000600 hash_CRC low 7500/2500: llps = 7499, expecting 10.6686
hash_CRC typical 7.98214 50.00% 127.589 2.10% 0.000600 hash_CRC typical 7500/2500: llps = 10, expecting 10.6686
hash_CRC 4.80204 0.01% 95.131 26.41% -0.380808 hash_CRC 7500/2500: llps_i = 32, expecting 10.6686
Java_Integer
Java_Integer_hash low 4.11606 0.01% 32.385 27.32% -0.217081 Java_Integer 7500/2500: llps = 5, expecting 10.6686
Java_Integer_hash typical 4.11606 0.01% 32.385 27.32% -0.217081 Java_Integer typical 7500/2500: llps = 796, expecting 10.6686
Java_Integer 4.11606 0.01% 32.385 27.32% -0.217081 Java_Integer 7500/2500: llps_i = 3, expecting 10.6686
Java_Object
Java_Object_hash low 2.00000 0.01% 77.000 27.32% -0.521556 Java_Object 7500/2500: llps = 7500, expecting 10.6686
Java_Object typical 4.93157 0.01% 116.582 25.87% -0.506511 Java_Object typical 7500/2500: llps = 22, expecting 10.6686
Java_Object 2.00000 0.01% 77.000 27.32% -0.521556 Java_Object 7500/2500: llps_i = 7500, expecting 10.6686
Java_String
Java_String_hash low 7.99453 99.99% 127.683 1.03% 0.003125 Java_String low 7500/2500: llps = 6, expecting 10.6686
Java_String typical 7.89301 0.01% 126.009 1.42% -0.021360 Java_String typical 7500/2500: llps = 10, expecting 10.6686
Java_String 7.99453 99.99% 127.683 1.03% 0.003125 Java_String 7500/2500: llps_i = 31, expecting 10.6686
base256
base256 low 0.00000 0.01% 97.000 27.32% undefined base256 low 7500/2500: llps = 7500, expecting 10.6686
base256 typical 3.90309 0.01% 107.987 27.32% 0.079032 base256 typical 7500/2500: llps = 221, expecting 10.6686
base256 0.00000 0.01% 97.000 27.32% undefined base256 7500/2500: llps_i = 32, expecting 10.6686
high_rand
Analysis
Theoretical Ranking
Theoretical Ranking from highest to lowest, using entropy values.
Low_Entropy 1. Java_String_hash low 7.99453 2. buzhash low 7.98550 3. hash_CRC low 7.98214 4. buzhashn low 7.98214 5. Java_Integer_hash low 4.11606 6. Java_Object_hash low 2.00000 7. base256 0.00000
Typical_Entropy 1. buzhashn typical 7.98214 2. hash_CRC typical 7.98214 3. buzhash typical 7.97970 4. Java_String typical 7.89301 5. Java_Object typical 4.93157 6. Java_Integer_hash typical 4.11606 7. base256 typical 3.90309
The rankings between the different hash functions were fairly similar when using low_entropy or high_entropy values. base256 hash function performed the most poorly, with the lowest entropy value in both cases. The highest ranking 4 hash functions in both cases are the same.
Practical Ranking
Practical ranking using worst case list length.
Low_Entropy Prac. Ranking ( Theoretical Ranking) longest list expected longest list 1.(5) Java_Integer 7500/2500: llps = 5, expecting 10.6686 2.(1) Java_String low 7500/2500: llps = 6, expecting 10.6686 3.(2) buzhash low 7500/2500: llps = 10, expecting 10.6686 4.(3) hash_CRC typical 7500/2500: llps = 10, expecting 10.6686 5.(4) buzhashn low 7500/2500: llps = 7499, expecting 10.6686 6.(7) base256 low 7500/2500: llps = 7500, expecting 10.6686 7.(6) Java_Object 7500/2500: llps = 7500, expecting 10.6686
As you can see, the hash functions gave had a high entropy value of ~7 (the thoretically best 4) generally perform fairly well in the practical test, except for the buzhashn function, which gave a very long linked list (7499). Also, the Java_Interger_hash function performed much better than suggested by the thoretical entropy value.
Typical_Entropy Prac. Ranking ( Theoretical Ranking) longest list expected longest list 1.(3) buzhash typcial 7500/2500: llps = 10, expecting 10.6686 2.(4) Java_String typical 7500/2500: llps = 10, expecting 10.6686 3.(2) hash_CRC typical 7500/2500: llps = 10, expecting 10.6686 4.(5) Java_Object typical 7500/2500: llps = 22, expecting 10.6686 5.(7) base256 typical 7500/2500: llps = 221, expecting 10.6686 6.(6) Java_Integer typical 7500/2500: llps = 796, expecting 10.6686 7.(1) buzhashn typcial 7500/2500: llps = 7499, expecting 10.6686
Hash functions generally perform better when the inputs are from typical_entropy_src, where base256 and Java_Object no longer have 7500 element long linked lists. However, buzhashn performed equally poorly as before, unexpected from the high entropy value obtained theoretically.
The expected worst case was quite accurate when values of typical entropy were used, with the best 3 performing ones having a worst case of 10, when 10.6686 was expected.
However the expected worst case was not as accurate