SE250:lab-5:klod004
Lab 5 This lab was dealing with hashing, which I think is a rather confusing topic. There are only 2 tasks involved. First is to compare the randomness of the Hash Functions provided and the other to measure the performance of the hash functions.
Task 1
Arithmetic Mean: 127.5 is the expected value (Centre between highest and lowest values).
Monte Carlo: the error generated by this number should be as close to 0 as possible for higher randomness.
Serial Correlation: value should be close to 0 to show that there is no relationship between the values.
Sample_size is the amount of values that we want to put into the table.
n_keys is the number of rows in the table.
table_size is the size of the table itself.
The values for sample_size, n_keys and table_size was:
Results are in the format: Entropy, Chi Square, Arithmetic Mean, Monte Carlo, Serial Correlation, llps, expectation
Trial 1: 70, 100, 50. Results: 6.003258, 226.67, 125.2917, 3.0000, -.106681, 3, 2.94456.
Trial 2: 70, 200, 50. Results: 6.003258, 226.67, 125.2917, 3.0000, -0.106681, 9, 9.11619.
Trial 3: 70, 50, 200. Results: 6.003258, 226.67, 125.2917, 3.0000, -1.006681, 2, 2.37434.
Trial 4: 10000, 100, 2000. Results: 7.985498, 201.50, 125.8253, 3.181272509, -0.000047, 2, 1.9509.
Trial 5: 5000, 40, 1000. Results: 7.969523, 208.17, 127.0374, 3.121248499, 0.009778, 2, 1.55167.
Trial 6: 50, 4, 20. Results:5.585055, 233.54, 125.6731, 3.0000, -0.180685, 1, 1.30341
Trial 7: 10000, 1000, 100000. Results: 7.985498, 99%, 125.8253, 1.26%, -0.000047, 2, 2.00948
Now after talking with the lecturer, different set of trials were made; the typical entropy set was used as well; other functions were also used to find good values for n_keys and table_size. The value for the sample space was considered to be good as the entropy was more settled at 5000+ values.
After the new trials, all functions with typical and low entropy; the results yielded was:
Values used were: SS = 10000, keys = 1000, table = 50000.
Buzhash low 7.98550 99.00% 125.825 1.26% -0.000047 Buzhash typical 7.97970 25.00% 127.177 1.95% -0.007153 Buzhashn low 7.98214 50.00% 127.589 2.10% 0.000600 Buzhashn typical 7.98214 50.00% 127.589 2.10% 0.000600 hash_CRC low 4.85247 0.01% 99.261 26.10% -0.285059 hash_CRC typical 7.77880 0.01% 123.087 1.11% -0.002250 base256 low 0.00000 0.01% 97.000 27.32% undefined base256 typical 3.90309 0.01% 107.987 27.32% 0.079032 Java Integer low 4.11606 0.01% 32.385 27.32% -0.217081 Java Integer typical 4.11606 0.01% 32.385 27.32% -0.217081 Java Object low 2.00000 0.01% 77.000 27.32% -0.521556 Java Object typical 4.93851 0.01% 114.920 1.26% -0.404698 Java String low 7.99463 99.99% 127.867 1.95% 0.009136 Java String typical 7.89301 0.01% 126.009 1.42% -0.021360 RAND low 7.93533 0.01% 112.054 10.82% -0.066125 RAND typical 7.93951 0.01% 111.686 10.82% -0.060458 High RAND low 7.98301 75.00% 127.541 0.42% 0.001185 High RAND typical 7.98338 75.00% 128.211 1.49% -0.014177
Values used were: SS = 10000, keys = 300, table = 50000.
Buzhash low 7.98550 99.00% 125.825 1.26% -0.000047 Buzhash typical 7.97970 25.00% 127.177 1.95% -0.007153 Buzhashn low 7.98214 50.00% 127.589 2.10% 0.000600 Buzhashn typical 7.98214 50.00% 127.589 2.10% 0.000600 hash_CRC low 4.85247 0.01% 99.261 26.10% -0.285059 hash_CRC typical 7.77880 0.01% 123.087 1.11% -0.002250 base256 low 0.00000 0.01% 97.000 27.32% undefined base256 typical 3.90309 0.01% 107.987 27.32% 0.079032 Java Integer low 4.11606 0.01% 32.385 27.32% -0.217081 Java Integer typical 4.11606 0.01% 32.385 27.32% -0.217081 Java Object low 2.00000 0.01% 77.000 27.32% -0.521556 Java Object typical 4.93851 0.01% 122.920 28.77% -0.427386 Java String low 7.99463 99.99% 127.867 1.95% 0.009136 Java String typical 7.89301 0.01% 126.009 1.42% -0.021360 RAND low 7.93533 0.01% 112.054 10.82% -0.066125 RAND typical 7.93951 0.01% 111.686 10.82% -0.060458 High RAND low 7.98301 75.00% 127.541 0.42% 0.001185 High RAND typical 7.98338 75.00% 128.211 1.49% -0.014177
Values used were: SS = 10000, keys = 200, table = 20000.
Buzhash low 7.98550 99.00% 125.825 1.26% -0.000047 Buzhash typical 7.97970 25.00% 127.177 1.95% -0.007153 Buzhashn low 7.98214 50.00% 127.589 2.10% 0.000600 Buzhashn typical 7.98214 50.00% 127.589 2.10% 0.000600 hash_CRC low 4.85247 0.01% 99.261 26.10% -0.285059 hash_CRC typical 7.77880 0.01% 123.087 1.11% -0.002250 base256 low 0.00000 0.01% 97.000 27.32% undefined base256 typical 3.90309 0.01% 107.987 27.32% 0.079032 Java Integer low 4.11606 0.01% 32.385 27.32% -0.217081 Java Integer typical 4.11606 0.01% 32.385 27.32% -0.217081 Java Object low 2.00000 0.01% 77.000 27.32% -0.521556 Java Object typical 4.93851 0.01% 114.920 1.26% -0.404698 Java String low 7.99463 99.99% 127.867 1.95% 0.009136 Java String typical 7.89301 0.01% 126.009 1.42% -0.021360 RAND low 7.93533 0.01% 112.054 10.82% -0.066125 RAND typical 7.93951 0.01% 111.686 10.82% -0.060458 High RAND low 7.98301 75.00% 127.541 0.42% 0.001185 High RAND typical 7.98338 75.00% 128.211 1.49% -0.014177
With all these experiments, and by mostly trying to pass the Chi test as this was understood to be the hardest test to pass, the Buzhashn function was ranked to be the best function for hashing.
There was no time for me to do task 2, but all I could understand was that the values gotten from llps and llps_i should be as close to the expected value as possible.