SE250:lab-5:hals016
Task 1
For this task I chose the following values: int sample_size = 200; int n_keys = 10000; int table_size = 100;
A sample size of 200 seems fair for a scenario such as a small company assigning IDs to its employees.
BuzHash Low
Output for Buzhash Low Testing Buzhash low on 200 samples Entropy = 6.961838 bits per byte. Optimum compression would reduce the size of this 200 byte file by 12 percent. Chi square distribution for 200 samples is 271.04, and randomly would exceed this value 25.00 percent of the times. Arithmetic mean value of data bytes is 129.8200 (127.5 = random). Monte Carlo value for Pi is 3.030303030 (error 3.54 percent). Serial correlation coefficient is -0.140593 (totally uncorrelated = 0.0). Buzhash low 10000/100: llps = 134, expecting 125.959
Given these results, the randomness of Buzhash on low-entropy input looks only moderate. A random sequence would exceed the chi-square value 25% of the time, which is some way from the ideal 50%. The serial correlation coefficient (-0.140593) is also noticeably far from 0.
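To make the serial correlation figure more concrete, here is a minimal sketch of a circular lag-1 correlation over a byte buffer. I believe this mirrors what the test program computes, but that is an assumption: the function name `serial_correlation` and its return convention are made up for illustration.

```c
#include <stddef.h>

/* Circular lag-1 serial correlation of a byte sequence (illustrative sketch).
 * Returns 0 and stores the coefficient in *out on success.
 * Returns -1 when the coefficient is undefined: fewer than two bytes,
 * or all bytes equal (zero variance). */
int serial_correlation(const unsigned char *buf, size_t n, double *out)
{
    double sum = 0.0, sum_sq = 0.0, sum_adj = 0.0;
    double denom;
    size_t i;

    if (n < 2)
        return -1;

    for (i = 0; i < n; i++) {
        double x = buf[i];
        double next = buf[(i + 1) % n];  /* last byte is paired with the first */
        sum += x;
        sum_sq += x * x;
        sum_adj += x * next;
    }

    denom = (double)n * sum_sq - sum * sum;
    if (denom == 0.0)
        return -1;                       /* all values equal */

    *out = ((double)n * sum_adj - sum * sum) / denom;
    return 0;
}
```

A strictly alternating sequence such as 0, 255, 0, 255 gives -1, and a constant buffer is reported as undefined, which matches the "undefined (all values equal!)" message in the high_rand output.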
BuzHash Typical
Output for Buzhash Typical Testing Buzhash typical on 200 samples Entropy = 6.987435 bits per byte. Optimum compression would reduce the size of this 200 byte file by 12 percent. Chi square distribution for 200 samples is 240.32, and randomly would exceed this value 50.00 percent of the times. Arithmetic mean value of data bytes is 128.0400 (127.5 = random). Monte Carlo value for Pi is 3.272727273 (error 4.17 percent). Serial correlation coefficient is -0.005251 (totally uncorrelated = 0.0). Buzhash typical 10000/100: llps = 127, expecting 125.959
Compared with Buzhash low, I think these results show more randomness. The chi-square percentage is now a very good 50%, the arithmetic mean is closer to 127.5 (the random value), and the serial correlation coefficient is a lot closer to 0.
Buzhashn low
Testing Buzhashn low on 200 samples Entropy = 7.094984 bits per byte. Optimum compression would reduce the size of this 200 byte file by 11 percent. Chi square distribution for 200 samples is 209.60, and randomly would exceed this value 97.50 percent of the times. Arithmetic mean value of data bytes is 120.9150 (127.5 = random). Monte Carlo value for Pi is 3.151515152 (error 0.32 percent). Serial correlation coefficient is 0.099943 (totally uncorrelated = 0.0). Buzhashn low 10000/100: llps = 133, expecting 125.959
Compared to both Buzhash runs, Buzhashn low does a lot better on the Monte Carlo value for pi (0.32% error), although not so well on the arithmetic mean (120.9150) and llps (133).
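The Monte Carlo figures are worth a closer look with so few samples. The sketch below assumes the scheme I understand the test program to use: each run of 6 bytes forms one point (3 bytes for x, 3 bytes for y), and the point counts as a hit if it falls inside a circle of radius 256^3 - 1. The function name `monte_carlo_pi` and the `points_out` parameter are my own, for illustration only.

```c
#include <stddef.h>

#define BYTES_PER_POINT 6   /* assumed: 3 bytes for x, 3 bytes for y */

/* Estimate pi from a byte buffer (illustrative sketch).
 * Stores the number of points used in *points_out if it is not NULL.
 * Returns 0.0 when the buffer is too short to form a single point. */
double monte_carlo_pi(const unsigned char *buf, size_t n, size_t *points_out)
{
    const double radius = 16777215.0;          /* 256^3 - 1 */
    const double radius_sq = radius * radius;
    size_t points = n / BYTES_PER_POINT;       /* leftover bytes are ignored */
    size_t inside = 0;
    size_t p;

    for (p = 0; p < points; p++) {
        const unsigned char *b = buf + p * BYTES_PER_POINT;
        double x = (b[0] * 256.0 + b[1]) * 256.0 + b[2];
        double y = (b[3] * 256.0 + b[4]) * 256.0 + b[5];
        if (x * x + y * y <= radius_sq)
            inside++;
    }

    if (points_out != NULL)
        *points_out = points;
    return points ? 4.0 * (double)inside / (double)points : 0.0;
}
```

Under that assumption, 200 bytes yield only 33 points, so the estimate can only move in steps of 4/33 (about 0.12). The reported 3.030303030 and 3.151515152 correspond to 25 and 26 hits out of 33, so the difference between them is a single point and should not be over-interpreted.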
Buzhashn typical
Testing Buzhashn typical on 200 samples Entropy = 7.094984 bits per byte. Optimum compression would reduce the size of this 200 byte file by 11 percent. Chi square distribution for 200 samples is 209.60, and randomly would exceed this value 97.50 percent of the times. Arithmetic mean value of data bytes is 120.9150 (127.5 = random). Monte Carlo value for Pi is 3.151515152 (error 0.32 percent). Serial correlation coefficient is 0.099943 (totally uncorrelated = 0.0). Buzhashn typical 10000/100: llps = 127, expecting 125.959
The Buzhashn typical statistics are identical to Buzhashn low; only the llps differs, and at 127 it is a lot closer to the expected 125.959.
hash_CRC low
Testing hash_CRC low on 200 samples Entropy = 3.470509 bits per byte. Optimum compression would reduce the size of this 200 byte file by 56 percent. Chi square distribution for 200 samples is 7305.92, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 94.3400 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is -0.390902 (totally uncorrelated = 0.0). hash_CRC low 10000/100: llps = 405, expecting 125.959
These values are a lot worse than those for buzhash/buzhashn. The two major ones are the llps (405 against an expected 125.959) and the chi-square value, which a random sequence would exceed only 0.01% of the time.
hash_CRC typical
Testing hash_CRC typical on 200 samples Entropy = 6.059310 bits per byte. Optimum compression would reduce the size of this 200 byte file by 24 percent. Chi square distribution for 200 samples is 934.08, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 94.7650 (127.5 = random). Monte Carlo value for Pi is 3.272727273 (error 4.17 percent). Serial correlation coefficient is 0.129518 (totally uncorrelated = 0.0). hash_CRC typical 10000/100: llps = 146, expecting 125.959
The chi-square percentage and arithmetic mean are similar to hash_CRC low. However, the Monte Carlo value for pi (4.17% error) and the llps (146) have improved dramatically.
base256 low
Testing base256 low on 200 samples Entropy = 3.987359 bits per byte. Optimum compression would reduce the size of this 200 byte file by 50 percent. Chi square distribution for 200 samples is 4146.88, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 101.0700 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is 0.290495 (totally uncorrelated = 0.0). base256 low 10000/100: llps = 10000, expecting 125.959
This is the largest difference between expected and actual llps so far: 10000 against 125.959, which means every one of the 10000 keys ended up in the same chain. The chi-square value is also large and undesirable.
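To illustrate how an llps of 10000 can arise, here is a small sketch that counts how many keys land in each bucket and reports the longest chain. It assumes llps is the length of the longest chain in a separately chained table, and it uses sequential integer keys and two toy hash functions (`constant_hash`, `identity_hash`), none of which come from the lab code.

```c
#include <stdlib.h>

/* Hypothetical hash signature, used only for this sketch. */
typedef unsigned long (*hash_fn)(int key);

/* Insert keys 0..n_keys-1 and return the length of the longest chain.
 * Only bucket occupancy is tracked, since that is all the llps needs.
 * Returns -1 if the counter array cannot be allocated. */
int longest_chain(hash_fn hash, int n_keys, int table_size)
{
    int *counts = calloc((size_t)table_size, sizeof *counts);
    int longest = 0;
    int key;

    if (counts == NULL)
        return -1;

    for (key = 0; key < n_keys; key++) {
        int bucket = (int)(hash(key) % (unsigned long)table_size);
        if (++counts[bucket] > longest)
            longest = counts[bucket];
    }

    free(counts);
    return longest;
}

/* Worst case: every key maps to the same bucket. */
unsigned long constant_hash(int key) { (void)key; return 42UL; }

/* Best case for sequential keys: perfectly even spread. */
unsigned long identity_hash(int key) { return (unsigned long)key; }
```

With n_keys = 10000 and table_size = 100, `constant_hash` gives a longest chain of 10000 and `identity_hash` gives 100. A real hash function on real keys would be expected to land somewhere above 100, which is consistent with the "expecting 125.959" figure in the output.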
base256 typical
Testing base256 typical on 200 samples Entropy = 3.987359 bits per byte. Optimum compression would reduce the size of this 200 byte file by 50 percent. Chi square distribution for 200 samples is 4146.88, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 101.0700 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is 0.290495 (totally uncorrelated = 0.0). base256 typical 10000/100: llps = 671, expecting 125.959
The statistics are identical to base256 low. The only improvement is the llps, which drops from 10000 to 671 but is still far above the expected 125.959.
Java_Integer_hash low
Testing Java_Integer_hash low on 200 samples Entropy = 2.178861 bits per byte. Optimum compression would reduce the size of this 200 byte file by 72 percent. Chi square distribution for 200 samples is 29048.00, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 6.1250 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is -0.227907 (totally uncorrelated = 0.0). Java_Integer_hash low 10000/100: llps = 109, expecting 125.959
Java_Integer_hash typical
Testing Java_Integer_hash typical on 200 samples Entropy = 2.178861 bits per byte. Optimum compression would reduce the size of this 200 byte file by 72 percent. Chi square distribution for 200 samples is 29048.00, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 6.1250 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is -0.227907 (totally uncorrelated = 0.0). Java_Integer_hash typical 10000/100: llps = 932, expecting 125.959
Java_Object_hash low
Testing Java_Object_hash low on 200 samples Entropy = 2.000000 bits per byte. Optimum compression would reduce the size of this 200 byte file by 75 percent. Chi square distribution for 200 samples is 12600.00, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 95.5000 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is -0.037404 (totally uncorrelated = 0.0). Java_Object_hash low 10000/100: llps = 10000, expecting 125.959
Java_Object_hash typical
Testing Java_Object_hash typical on 200 samples Entropy = 4.511741 bits per byte. Optimum compression would reduce the size of this 200 byte file by 43 percent. Chi square distribution for 200 samples is 3604.16, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 78.2500 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is -0.645940 (totally uncorrelated = 0.0). Java_Object_hash typical 10000/100: llps = 406, expecting 125.959
Java_String_hash low
Testing Java_String_hash low on 200 samples Entropy = 7.093661 bits per byte. Optimum compression would reduce the size of this 200 byte file by 11 percent. Chi square distribution for 200 samples is 214.72, and randomly would exceed this value 95.00 percent of the times. Arithmetic mean value of data bytes is 130.9200 (127.5 = random). Monte Carlo value for Pi is 3.030303030 (error 3.54 percent). Serial correlation coefficient is 0.052529 (totally uncorrelated = 0.0). Java_String_hash low 10000/100: llps = 109, expecting 125.959
Java_String_hash typical
Testing Java_String_hash typical on 200 samples Entropy = 6.193853 bits per byte. Optimum compression would reduce the size of this 200 byte file by 22 percent. Chi square distribution for 200 samples is 839.36, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 108.6100 (127.5 = random). Monte Carlo value for Pi is 3.515151515 (error 11.89 percent). Serial correlation coefficient is 0.103661 (totally uncorrelated = 0.0). Java_String_hash typical 10000/100: llps = 123, expecting 125.959
rand low
Testing rand low on 200 samples Entropy = 4.057145 bits per byte. Optimum compression would reduce the size of this 200 byte file by 49 percent. Chi square distribution for 200 samples is 13296.32, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 44.6150 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is -0.045063 (totally uncorrelated = 0.0). rand low 10000/100: llps = 132, expecting 125.959
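Compared with the other hash functions, the Unix random number generator is not so good and at the same time not so bad: its statistics are poor (entropy of 4.057145 bits per byte, chi-square exceeded only 0.01% of the time), but its llps of 132 is close to the expected 125.959.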
rand typical
Testing rand typical on 200 samples Entropy = 4.057145 bits per byte. Optimum compression would reduce the size of this 200 byte file by 49 percent. Chi square distribution for 200 samples is 13296.32, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 44.6150 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is -0.045063 (totally uncorrelated = 0.0). rand typical 10000/100: llps = 132, expecting 125.959
Similar results to rand low.
high_rand low
Testing high_rand low on 200 samples Entropy = 0.000000 bits per byte. Optimum compression would reduce the size of this 200 byte file by 100 percent. Chi square distribution for 200 samples is 51000.00, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 0.0000 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is undefined (all values equal!). high_rand low 10000/100: llps = 133, expecting 125.959
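These samples are not random at all compared to everything else: the entropy is 0 and every byte has the same value, even though the llps of 133 is close to the expected value. I can only predict the typical case to be similar.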
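The chi-square figure of 51000.00 in the high_rand output can be reproduced by hand. The sketch below computes a standard chi-square statistic over the 256 possible byte values, with an expected count of n/256 per value. The function name `chi_square_bytes` is my own, and I am assuming the test program uses this same formula.

```c
#include <stddef.h>

/* Chi-square statistic of a byte buffer against a uniform
 * distribution over the 256 byte values (illustrative sketch).
 * Returns 0.0 for an empty buffer. */
double chi_square_bytes(const unsigned char *buf, size_t n)
{
    size_t counts[256] = {0};
    double expected, chi = 0.0;
    size_t i;
    int v;

    if (n == 0)
        return 0.0;

    for (i = 0; i < n; i++)
        counts[buf[i]]++;

    expected = (double)n / 256.0;        /* 0.78125 for 200 samples */
    for (v = 0; v < 256; v++) {
        double diff = (double)counts[v] - expected;
        chi += diff * diff / expected;
    }
    return chi;
}
```

For 200 bytes that are all zero, this gives 51000, matching the reported value. Note that with only 200 samples the expected count per byte value is below 1, so the chi-square percentages for all of these runs are rough indicators at best.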
high_rand typical
Testing high_rand typical on 200 samples Entropy = 0.000000 bits per byte. Optimum compression would reduce the size of this 200 byte file by 100 percent. Chi square distribution for 200 samples is 51000.00, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of data bytes is 0.0000 (127.5 = random). Monte Carlo value for Pi is 4.000000000 (error 27.32 percent). Serial correlation coefficient is undefined (all values equal!). high_rand typical 10000/100: llps = 133, expecting 125.959
Similar to high_rand low.
Conclusion and Functions Ranked in Order of Randomness
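The Unix rand and high_rand results score poorly on the statistical tests compared to the better hash functions and do not look truly random, although their llps values (132 and 133) are close to the expected 125.959.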
I have ranked the functions as follows (based on my judgment):
1) BuzHash (very dense information storage)
2) BuzHashn (dense information storage)
3) Java_String_Hash (low entropy, not so much typical entropy)
4) hash_CRC (typical entropy, not so much low entropy)
5) Java_Object_hash
6) Java_Integer_hash
7) base256
8) rand
9) high_rand