SE250:lab-5:sgha014
So after I had downloaded all the files, John came and told me that the idea was to pick some values for sample_size, n_keys and table_size. I had no clue what I needed to consider before picking a suitable value for each. The tutor suggested just starting off by putting in any numbers, then looking at the output and trying to understand it.
I chose the following:
int sample_size = 1000; int n_keys = 1000; int table_size = 1000;
rt_add_buzhash (low entropy)
Testing Buzhash low on 1000 samples Entropy = 7.843786 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 1 percent. Chi square distribution for the 1000 samples is 214.46, and randomly would exceed this value 95.00 percent of the times. Arithmetic mean value of the data bytes is 128.0860 <127.5 = random>. Monte Carlo value for Pi is 3.132530120 <error 0.29 percent>. Serial correlation coefficient is -0.017268 <totally uncorrelated = 0.0>. Buzhash low 1000/1000: llps = 6, expecting 5.51384
Hmmm... do they expect us to know what all of this means? John says that I don't really need to understand it, and that I should just compare the functions. Hmmm, OK, I'll do that...
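Before I just blindly compare, though, I want at least a rough feel for what a couple of these statistics measure. Here is my own little sketch (NOT the lab's randtest code; rand() is just standing in for real hash output) that computes the arithmetic mean and the serial correlation coefficient of a buffer of bytes, as I understand those two numbers: a properly random stream should average about 127.5 and have a correlation close to 0.

#include <stdio.h>
#include <stdlib.h>

#define SAMPLE_SIZE 1000

int main( void ) {
    unsigned char buf[SAMPLE_SIZE];
    double sum = 0.0, s1 = 0.0, s2 = 0.0, s12 = 0.0, n = SAMPLE_SIZE;
    int i;

    /* stand-in sample: plain random bytes (the lab feeds in hash output instead) */
    for ( i = 0; i < SAMPLE_SIZE; i++ )
        buf[i] = rand( ) % 256;

    /* arithmetic mean: close to 127.5 means the byte values are spread evenly */
    for ( i = 0; i < SAMPLE_SIZE; i++ )
        sum += buf[i];
    printf( "mean = %f\n", sum / n );

    /* serial correlation: how much each byte predicts the next one
       (close to 0 means neighbouring bytes look unrelated) */
    for ( i = 0; i < SAMPLE_SIZE; i++ ) {
        double x = buf[i];
        double y = buf[(i + 1) % SAMPLE_SIZE];
        s1  += x;
        s2  += x * x;
        s12 += x * y;
    }
    printf( "serial correlation = %f\n",
            (n * s12 - s1 * s1) / (n * s2 - s1 * s1) );
    return 0;
}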
rt_add_buzhashn (low entropy)
Testing Buzhashn low on 1000 samples Entropy = 7.823873 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 2 percent. Chi square distribution for the 1000 samples is 220.61, and randomly would exceed this value 90.00 percent of the times. Arithmetic mean value of the data bytes is 127.3730 <127.5 = random>. Monte Carlo value for Pi is 3.108433730 <error 1.06 percent>. Serial correlation coefficient is -0.007118 <totally uncorrelated = 0.0>. Buzhashn low 1000/1000: llps = 999, expecting 5.51384
This one seems pretty good, since the mean was close to 127.5, the error for the Monte Carlo value was only 1.06 percent, and the serial correlation coefficient is close to zero. I don't know if what I'm saying is correct... I'm just going to assume that it is. Except I'm confused about the 999... well, I don't even know what llps means.
- I have finally understood what llps is (it is now 12.04pm; I'm adding this note at the end of the lab). llps is the length of the longest probe sequence. So since this hash table has 1000 elements, an llps of 999 means that one slot ends up holding 999 things while another holds only 1, which is really bad according to John because it is a huge waste of space.
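To convince myself what an llps of 999 actually means, I wrote this toy version (toy_hash and the sprintf'd keys are made up; the real lab harness uses buzhash/buzhashn/hash_CRC on real key data and may count probe sequences slightly differently). It drops n_keys keys into table_size buckets and reports the fullest bucket:

#include <stdio.h>

#define N_KEYS     1000
#define TABLE_SIZE 1000

/* toy_hash is only a placeholder for the lab's hash functions */
unsigned toy_hash( const char *key ) {
    unsigned h = 0;
    while ( *key )
        h = h * 31 + (unsigned char)*key++;
    return h;
}

int main( void ) {
    int counts[TABLE_SIZE] = { 0 };
    int i, llps = 0;
    char key[32];

    /* drop n_keys made-up keys into table_size buckets */
    for ( i = 0; i < N_KEYS; i++ ) {
        sprintf( key, "key%d", i );
        counts[toy_hash( key ) % TABLE_SIZE]++;
    }

    /* the fullest bucket is (roughly) the length of the longest probe sequence */
    for ( i = 0; i < TABLE_SIZE; i++ )
        if ( counts[i] > llps )
            llps = counts[i];

    printf( "llps = %d\n", llps );
    return 0;
}

If a hash function dumped nearly every key into the same bucket, this would print something close to 1000, which is exactly the sort of number Buzhashn is producing (999 of the 1000 keys in one slot).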
rt_add_hash_CRC (low entropy)
Testing hash_CRC low on 1000 samples Entropy = 3.965965 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 50 percent. Chi square distribution for the 1000 samples is 36163.52, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of the data bytes is 93.6860 <127.5 = random>. Monte Carlo value for Pi is 4.000000000 <error 27.32 percent>. Serial correlation coefficient is -0.380754 <totally uncorrelated = 0.0>. hash_CRC low 1000/1000: llps = 11, expecting 5.51384
The value for Pi was waaaaay off... this says that Pi is 4!
rt_add_Java_Integer (low entropy)
Testing Java_Integer low on 1000 samples Entropy = 2.791730 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 65 percent. Chi square distribution for the 1000 samples is 143448.00, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of the data bytes is 31.1250 <127.5 = random>. Monte Carlo value for Pi is 4.000000000 <error 27.32 percent>. Serial correlation coefficient is -0.230200 <totally uncorrelated = 0.0>. Java_Integer low 1000/1000: llps = 4, expecting 5.51384
Getting 4 for Pi again. I tried making the sample size smaller (changed it to 100) and Pi was still 4. Then I made table_size 100 and Pi was still 4. Then I tried changing all three values to 100 and still no change... so I'm guessing I should just ignore this and move on.
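Actually, before moving on, I think I can guess why it gets stuck on exactly 4. As far as I can tell, the Monte Carlo test packs successive bytes of the stream into (x, y) coordinates inside a square (the real randtest code packs several bytes into each coordinate, I think, but the idea is the same), counts the points falling inside the quarter circle of radius equal to the side of the square, and estimates Pi as 4 times that fraction. If the hash bytes are all small, like Java_Integer's mean of 31, every point huddles near the (0, 0) corner, which is inside the quarter circle, so the fraction is 1 and the estimate saturates at exactly 4 regardless of sample_size or table_size. My own rough sketch of the idea:

#include <stdio.h>
#include <stdlib.h>

#define SAMPLE_SIZE 1000

/* Treat pairs of bytes as (x, y) points in a 256 x 256 square and count the
   ones inside the quarter circle of radius 255 centred on (0, 0).
   Pi is then roughly 4 * (fraction inside). */
double monte_carlo_pi( const unsigned char *buf, int len ) {
    int i, inside = 0, points = 0;
    double r = 255.0;
    for ( i = 0; i + 1 < len; i += 2 ) {
        double x = buf[i];
        double y = buf[i + 1];
        if ( x * x + y * y <= r * r )
            inside++;
        points++;
    }
    return 4.0 * inside / points;
}

int main( void ) {
    unsigned char spread_bytes[SAMPLE_SIZE], small_bytes[SAMPLE_SIZE];
    int i;

    for ( i = 0; i < SAMPLE_SIZE; i++ ) {
        spread_bytes[i] = rand( ) % 256;  /* covers the whole square        */
        small_bytes[i]  = rand( ) % 64;   /* everything huddles near (0, 0) */
    }

    printf( "pi from spread-out bytes = %f\n",
            monte_carlo_pi( spread_bytes, SAMPLE_SIZE ) );
    printf( "pi from small bytes      = %f\n",
            monte_carlo_pi( small_bytes, SAMPLE_SIZE ) );
    return 0;
}

The first estimate comes out somewhere near 3.14, the second comes out as exactly 4.0 no matter how big I make SAMPLE_SIZE, which matches what the tests are printing for the weaker hash functions.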
rt_add_buzhash (typical entropy)
Testing Buzhash typical on 1000 samples Entropy = 7.797775 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 2 percent. Chi square distribution for the 1000 samples is 250.82, and randomly would exceed this value 50.00 percent of the times. Arithmetic mean value of the data bytes is 126.5740 <127.5 = random>. Monte Carlo value for Pi is 3.277108434 <error 4.31 percent>. Serial correlation coefficient is -0.007005 <totally uncorrelated = 0.0>. Buzhash typical 1000/1000: llps = 7, expecting 5.51384
rt_add_buzhashn (typical entropy)
Testing Buzhashn typical on 1000 samples Entropy = 7.823873 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 2 percent. Chi square distribution for the 1000 samples is 220.61, and randomly would exceed this value 90.00 percent of the times. Arithmetic mean value of the data bytes is 127.3730 <127.5 = random>. Monte Carlo value for Pi is 3.108433730 <error 1.06 percent>. Serial correlation coefficient is -0.007118 <totally uncorrelated = 0.0>. Buzhashn typical 1000/1000: llps = 999, expecting 5.51384
These are exactly the same results as with low entropy, llps of 999 and all...
rt_add_hash_CRC (typical entropy)
Testing hash_CRC typical on 1000 samples Entropy = 7.202459 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 9 percent. Chi square distribution for the 1000 samples is 1660.86, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of the data bytes is 114.9320 <127.5 = random>. Monte Carlo value for Pi is 3.204819277 <error 2.01 percent>. Serial correlation coefficient is -0.032076 <totally uncorrelated = 0.0>. hash_CRC typical 1000/1000: llps = 7, expecting 5.51384
rt_add_Java_Integer (typical entropy)
Testing Java_Integer typical on 1000 samples Entropy = 2.791730 bits per byte. Optimum compression would reduce the size of this 1000 byte file by 65 percent. Chi square distribution for the 1000 samples is 143448.00, and randomly would exceed this value 0.01 percent of the times. Arithmetic mean value of the data bytes is 31.1250 <127.5 = random>. Monte Carlo value for Pi is 4.000000000 <error 27.32 percent>. Serial correlation coefficient is -0.230200 <totally uncorrelated = 0.0>. Java_Integer low 1000/1000: llps = 91, expecting 5.51384
which is nearly the same result as with low entropy (only the llps differs: 91 here versus 4 before)
So now I'm really bored and I'm going to skip the rest of the hash functions. I'm just going to try out the rand and high_rand functions instead...