<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://wiki.kram.nz/index.php?action=history&amp;feed=atom&amp;title=SE250%3Alab-5%3Atlou006</id>
	<title>SE250:lab-5:tlou006 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.kram.nz/index.php?action=history&amp;feed=atom&amp;title=SE250%3Alab-5%3Atlou006"/>
	<link rel="alternate" type="text/html" href="https://wiki.kram.nz/index.php?title=SE250:lab-5:tlou006&amp;action=history"/>
	<updated>2026-04-28T18:26:10Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://wiki.kram.nz/index.php?title=SE250:lab-5:tlou006&amp;diff=6865&amp;oldid=prev</id>
		<title>Mark: 10 revision(s)</title>
		<link rel="alternate" type="text/html" href="https://wiki.kram.nz/index.php?title=SE250:lab-5:tlou006&amp;diff=6865&amp;oldid=prev"/>
		<updated>2008-11-03T05:19:56Z</updated>

		<summary type="html">&lt;p&gt;10 revision(s)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== &amp;#039;&amp;#039;&amp;#039;LAB 5&amp;#039;&amp;#039;&amp;#039; ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Q1 ==&lt;br /&gt;
&lt;br /&gt;
Testing with&lt;br /&gt;
  &amp;lt;Pre&amp;gt;int sample_size = 1000;&lt;br /&gt;
  int n_keys = 1000;&lt;br /&gt;
  int table_size = 1000;&amp;lt;/Pre&amp;gt;&lt;br /&gt;
and running &amp;#039;&amp;#039;&amp;#039;rt_add_buzhash&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
Testing Buzhash low on 1000 samples&lt;br /&gt;
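&lt;br /&gt;
For context before the results: a typical buzhash rotates the running hash and XORs in entries from a 256-value random table, one per input byte. A rough sketch of the idea (my own, not necessarily the implementation in the lab harness):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Assumed: rand_table is filled with random values at start-up. */&lt;br /&gt;
static uint32_t rand_table[256];&lt;br /&gt;
&lt;br /&gt;
uint32_t buzhash( const char* key ) {&lt;br /&gt;
  uint32_t h = 0;&lt;br /&gt;
  for ( ; *key; key++ ) {&lt;br /&gt;
    h = (h &amp;lt;&amp;lt; 1) | (h &amp;gt;&amp;gt; 31); /* rotate left one bit */&lt;br /&gt;
    h ^= rand_table[(unsigned char)*key];&lt;br /&gt;
  }&lt;br /&gt;
  return h;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;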
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 7.843786 bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 1000 byte file by 1 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 1000 samples is 214.46, and randomly would exceed this value 95.00 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 128.0860 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 3.132530120 &amp;lt;error 0.29 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is -0.017268 &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 6, expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Not sure what these results mean yet.&lt;br /&gt;
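&lt;br /&gt;
The output format looks like the standard metrics from John Walker&amp;#039;s &amp;#039;&amp;#039;ent&amp;#039;&amp;#039; randomness tests. As a reference for the first figure: entropy in bits per byte comes from the byte-value frequencies, and 8.0 means perfectly uniform. A sketch of the standard Shannon formula (my own, not the test suite code):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Shannon entropy: -sum of p * log2(p) over the 256 byte values. */&lt;br /&gt;
double entropy_bits_per_byte( const unsigned char* data, size_t n ) {&lt;br /&gt;
  long counts[256] = { 0 };&lt;br /&gt;
  double h = 0.0;&lt;br /&gt;
  size_t i;&lt;br /&gt;
  for ( i = 0; i &amp;lt; n; i++ )&lt;br /&gt;
    counts[data[i]]++;&lt;br /&gt;
  for ( i = 0; i &amp;lt; 256; i++ )&lt;br /&gt;
    if ( counts[i] &amp;gt; 0 ) {&lt;br /&gt;
      double p = (double)counts[i] / n;&lt;br /&gt;
      h -= p * log2( p );&lt;br /&gt;
    }&lt;br /&gt;
  return h;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;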
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
After increasing the sample size first to 100000 and then to 10000000,&lt;br /&gt;
&lt;br /&gt;
I observed the following:&lt;br /&gt;
&lt;br /&gt;
Entropy got closer to 8 bits per byte.&lt;br /&gt;
&lt;br /&gt;
The percentage by which optimum compression would reduce the file size decreased to 0.&lt;br /&gt;
&lt;br /&gt;
The chi-square distribution decreased.&lt;br /&gt;
&lt;br /&gt;
The arithmetic mean value of the data bytes got closer to 127.5.&lt;br /&gt;
&lt;br /&gt;
The Monte Carlo value got closer to Pi.&lt;br /&gt;
&lt;br /&gt;
The serial correlation coefficient got closer to 0.&lt;br /&gt;
&lt;br /&gt;
The llps value got closer to the expected value.&lt;br /&gt;
&lt;br /&gt;
All the results suggest that increasing the sample size increases the &amp;quot;randomness&amp;quot; of the hash output.&lt;br /&gt;
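&lt;br /&gt;
The chi-square figure can also be checked by hand: against a uniform distribution the expected count in each of the 256 bins is n / 256, and for truly random data the statistic should sit near 255 (the degrees of freedom). A sketch of that calculation (my own, assuming uniform expected frequencies as the test seems to):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Chi-square statistic of the byte counts against a uniform&lt;br /&gt;
   distribution; very large values mean badly skewed output. */&lt;br /&gt;
double chi_square( const unsigned char* data, size_t n ) {&lt;br /&gt;
  long counts[256] = { 0 };&lt;br /&gt;
  double expected = (double)n / 256.0;&lt;br /&gt;
  double chisq = 0.0;&lt;br /&gt;
  size_t i;&lt;br /&gt;
  for ( i = 0; i &amp;lt; n; i++ )&lt;br /&gt;
    counts[data[i]]++;&lt;br /&gt;
  for ( i = 0; i &amp;lt; 256; i++ ) {&lt;br /&gt;
    double d = counts[i] - expected;&lt;br /&gt;
    chisq += d * d / expected;&lt;br /&gt;
  }&lt;br /&gt;
  return chisq;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;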
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Running &amp;#039;&amp;#039;&amp;#039;rt_add_buzhashn&amp;#039;&amp;#039;&amp;#039; with sample size 100000 and low entropy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 7.998236 bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 100000 byte file by 0 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 100000 samples is 244.84, and randomly would exceed this value 50.00 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 127.4936 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 3.137635506 &amp;lt;error 0.13 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is -0.003092 &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 999 (!!!!!!), expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
llps = 999 suggests that almost all of the values are bunched up in one slot.&lt;br /&gt;
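&lt;br /&gt;
Assuming llps stands for the longest list per slot of the chained hash table, it can be measured by hashing every key, counting how many keys land in each slot, and taking the largest count. A sketch (a hypothetical helper, not the lab harness):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Longest list per slot. With 1000 keys over 1000 slots a good&lt;br /&gt;
   hash should give about 5.5; 999 means almost every key landed&lt;br /&gt;
   in one slot. Assumes table_size &amp;lt;= 1000. */&lt;br /&gt;
size_t llps( unsigned (*hash)( const char* ), const char** keys,&lt;br /&gt;
             size_t n_keys, size_t table_size ) {&lt;br /&gt;
  size_t counts[1000] = { 0 };&lt;br /&gt;
  size_t i, max = 0;&lt;br /&gt;
  for ( i = 0; i &amp;lt; n_keys; i++ )&lt;br /&gt;
    counts[hash( keys[i] ) % table_size]++;&lt;br /&gt;
  for ( i = 0; i &amp;lt; table_size; i++ )&lt;br /&gt;
    if ( counts[i] &amp;gt; max )&lt;br /&gt;
      max = counts[i];&lt;br /&gt;
  return max;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;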
&lt;br /&gt;
&lt;br /&gt;
Running &amp;#039;&amp;#039;&amp;#039;rt_add_hash_CRC&amp;#039;&amp;#039;&amp;#039; with sample size 100000 and low entropy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 5.574705 bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 100000 byte file by 30 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 100000 samples is 1398897.03, and randomly would exceed this value 0.01 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 95.7235 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 3.747989920 &amp;lt;error 19.30 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is -0.075371 &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 13, expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running &amp;#039;&amp;#039;&amp;#039;rt_add_base256&amp;#039;&amp;#039;&amp;#039; with sample size 100000 and low entropy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 0.00000 (!!!) bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 100000 byte file by 100 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 100000 samples is 25500000.00, and randomly would exceed this value 0.01 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 97.0000 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 4.0000000 &amp;lt;error 27.32 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is undefined &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 1000 (!!!!), expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;base256&amp;#039;&amp;#039;&amp;#039; and other hash functions produced many unexpected results. &lt;br /&gt;
&lt;br /&gt;
Maybe the sample size is too large for them? These results suggest &amp;#039;&amp;#039;&amp;#039;buzhash&amp;#039;&amp;#039;&amp;#039; performs well on large sample sizes.&lt;br /&gt;
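&lt;br /&gt;
A plausible reading of the base256 result, assuming &amp;#039;&amp;#039;&amp;#039;base256&amp;#039;&amp;#039;&amp;#039; simply treats the key as a number with radix 256: the low byte of such a number is just the last character of the key, so low-entropy keys ending in the same character all hash to the same slot. That would account for entropy 0, a mean of exactly 97 (ASCII &amp;#039;a&amp;#039;, if the keys are runs of that character), and llps = 1000. A sketch of that style of hash:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;/* Sketch of a base-256 hash: the low bits depend only on the&lt;br /&gt;
   last few characters, so similar keys collide badly whenever&lt;br /&gt;
   the low bits select the table slot. */&lt;br /&gt;
unsigned base256_hash( const char* key ) {&lt;br /&gt;
  unsigned h = 0;&lt;br /&gt;
  while ( *key )&lt;br /&gt;
    h = h * 256 + (unsigned char)*key++;&lt;br /&gt;
  return h;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;/div&gt;</summary>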
		<author><name>Mark</name></author>
	</entry>
</feed>