SE250:lab-5:stsa027

From Marks Wiki
Revision as of 05:19, 3 November 2008 by Mark (Sọ̀rọ̀ | contribs) (2 revision(s))
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Had a slight problem when trying to compile all .c files together. Was supposed to use '*.c'.

Had another slight problem - misunderstood the meanings of the parameters we were supposed to set for ourselves. When I understood properly, the three values were 10,000, 7,500 and 2,500.

sample_size was 10000. Chosen because I thought it was adequately large. n_keys was 7500, with the table_size 1/3 of that, at 2500, which means that each linked list in the hash table will be at least 3 elements long, and which I thought was a good test of the hash functions.

Results

buzhash

buzhash low	7.98550	99.00%	125.825	1.26%	-0.000047
buzhash low 7500/2500: llps = 10, expecting 10.6686
buzhash typical	7.97970	25.00%	127.177	1.95%	-0.007153
buzhash typcial 7500/2500: llps = 10, expecting 10.6686
buzhash	7.98550	99.00%	125.825	1.26%	-0.000047
buzhash RC 7500/2500: llps_i = 132, expecting 10.6686

buzhashn

buzhashn low	7.98214	50.00%	127.589	2.10%	0.000600
buzhashn low 7500/2500: llps = 7499, expecting 10.6686
lab-5.c:332: warning: passing arg 4 of `llps' from incompatible pointer type
buzhashn typical	7.98214	50.00%	127.589	2.10%	0.000600
buzhashn typcial 7500/2500: llps = 7499, expecting 10.6686
buzhashn	7.98214	50.00%	127.589	2.10%	0.000600
buzhashn RC 7500/2500: llps_i = 12, expecting 10.6686

hash_CRC

hash_CRC low	7.98214	50.00%	127.589	2.10%	0.000600
hash_CRC low 7500/2500: llps = 7499, expecting 10.6686
hash_CRC typical	7.98214	50.00%	127.589	2.10%	0.000600
hash_CRC typical 7500/2500: llps = 10, expecting 10.6686
hash_CRC	4.80204	0.01%	95.131	26.41%	-0.380808
hash_CRC 7500/2500: llps_i = 32, expecting 10.6686

Java_Integer

Java_Integer_hash low	4.11606	0.01%	32.385	27.32%	-0.217081
Java_Integer 7500/2500: llps = 5, expecting 10.6686
Java_Integer_hash typical	4.11606	0.01%	32.385	27.32%	-0.217081
Java_Integer typical 7500/2500: llps = 796, expecting 10.6686
Java_Integer	4.11606	0.01%	32.385	27.32%	-0.217081
Java_Integer 7500/2500: llps_i = 3, expecting 10.6686

Java_Object

Java_Object_hash low	2.00000	0.01%	77.000	27.32%	-0.521556
Java_Object  7500/2500: llps = 7500, expecting 10.6686
Java_Object typical	4.93157	0.01%	116.582	25.87%	-0.506511
Java_Object typical 7500/2500: llps = 22, expecting 10.6686
Java_Object	2.00000	0.01%	77.000	27.32%	-0.521556
Java_Object 7500/2500: llps_i = 7500, expecting 10.6686

Java_String

Java_String_hash low	7.99453	99.99%	127.683	1.03%	0.003125
Java_String low 7500/2500: llps = 6, expecting 10.6686
Java_String typical	7.89301	0.01%	126.009	1.42%	-0.021360
Java_String typical 7500/2500: llps = 10, expecting 10.6686
Java_String	7.99453	99.99%	127.683	1.03%	0.003125
Java_String 7500/2500: llps_i = 31, expecting 10.6686

base256

base256	low 0.00000	0.01%	97.000	27.32%	undefined
base256 low 7500/2500: llps = 7500, expecting 10.6686
base256 typical	3.90309	0.01%	107.987	27.32%	0.079032
base256 typical 7500/2500: llps = 221, expecting 10.6686
base256	0.00000	0.01%	97.000	27.32%	undefined
base256 7500/2500: llps_i = 32, expecting 10.6686

high_rand




Analysis

Theoretical Ranking

Theoretical Ranking from highest to lowest, using entropy values.

Low_Entropy

1. Java_String_hash low	     7.99453
2. buzhash low	             7.98550
3. hash_CRC low	             7.98214
4. buzhashn low	             7.98214
5. Java_Integer_hash low     4.11606
6. Java_Object_hash low	     2.00000
7. base256	             0.00000
Typical_Entropy

1. buzhashn typical	     7.98214
2. hash_CRC typical	     7.98214
3. buzhash typical	     7.97970
4. Java_String typical	     7.89301
5. Java_Object typical	     4.93157
6. Java_Integer_hash typical 4.11606
7. base256 typical	     3.90309

The rankings between the different hash functions were fairly similar when using low_entropy or high_entropy values. base256 hash function performed the most poorly, with the lowest entropy value in both cases. The highest ranking 4 hash functions in both cases are the same.


Practical Ranking

Practical ranking using worst case list length.

Low_Entropy

Prac. Ranking ( Theoretical Ranking)  longest list expected longest list

1.(5) Java_Integer 7500/2500:         llps = 5,    expecting 10.6686
2.(1) Java_String low 7500/2500:      llps = 6,    expecting 10.6686
3.(2) buzhash low 7500/2500:          llps = 10,   expecting 10.6686
4.(3) hash_CRC typical 7500/2500:     llps = 10,   expecting 10.6686
5.(4) buzhashn low 7500/2500:         llps = 7499, expecting 10.6686
6.(7) base256 low 7500/2500:          llps = 7500, expecting 10.6686
7.(6) Java_Object  7500/2500:         llps = 7500, expecting 10.6686

As you can see, the hash functions gave had a high entropy value of ~7 (the thoretically best 4) generally perform fairly well in the practical test, except for the buzhashn function, which gave a very long linked list (7499). Also, the Java_Interger_hash function performed much better than suggested by the thoretical entropy value.

Typical_Entropy

Prac. Ranking ( Theoretical Ranking)  longest list expected longest list

1.(3) buzhash typcial 7500/2500:      llps = 10,   expecting 10.6686
2.(4) Java_String typical 7500/2500:  llps = 10,   expecting 10.6686
3.(2) hash_CRC typical 7500/2500:     llps = 10,   expecting 10.6686
4.(5) Java_Object typical 7500/2500:  llps = 22,   expecting 10.6686
5.(7) base256 typical 7500/2500:      llps = 221,  expecting 10.6686
6.(6) Java_Integer typical 7500/2500: llps = 796,  expecting 10.6686
7.(1) buzhashn typcial 7500/2500:     llps = 7499, expecting 10.6686

Hash functions generally perform better when the inputs are from typical_entropy_src, where base256 and Java_Object no longer have 7500 element long linked lists. However, buzhashn performed equally poorly as before, unexpected from the high entropy value obtained theoretically.

The expected worst case was quite accurate when values of typical entropy were used, with the best 3 performing ones having a worst case of 10, when 10.6686 was expected.

However the expected worst case was not as accurate