<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://wiki.kram.nz/index.php?action=history&amp;feed=atom&amp;title=SE250%3Alab-5%3Atlou006</id>
	<title>SE250:lab-5:tlou006 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.kram.nz/index.php?action=history&amp;feed=atom&amp;title=SE250%3Alab-5%3Atlou006"/>
	<link rel="alternate" type="text/html" href="https://wiki.kram.nz/index.php?title=SE250:lab-5:tlou006&amp;action=history"/>
	<updated>2026-04-28T18:26:10Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://wiki.kram.nz/index.php?title=SE250:lab-5:tlou006&amp;diff=6865&amp;oldid=prev</id>
		<title>Mark: 10 revision(s)</title>
		<link rel="alternate" type="text/html" href="https://wiki.kram.nz/index.php?title=SE250:lab-5:tlou006&amp;diff=6865&amp;oldid=prev"/>
		<updated>2008-11-03T05:19:56Z</updated>

		<summary type="html">&lt;p&gt;10 revision(s)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== &amp;#039;&amp;#039;&amp;#039;LAB 5&amp;#039;&amp;#039;&amp;#039; ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Q1 ==&lt;br /&gt;
&lt;br /&gt;
Testing with&lt;br /&gt;
  &amp;lt;Pre&amp;gt;int sample_size = 1000;&lt;br /&gt;
  int n_keys = 1000;&lt;br /&gt;
  int table_size = 1000;&amp;lt;/Pre&amp;gt;&lt;br /&gt;
and running &amp;#039;&amp;#039;&amp;#039;rt_add_buzhash&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
Testing Buzhash low on 1000 samples&lt;br /&gt;
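&lt;br /&gt;
For context before the results: a typical buzhash rotates the running hash and XORs in entries from a 256-value random table, one per input byte. A rough sketch of the idea (my own, not necessarily the implementation in the lab harness):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Assumed: rand_table is filled with random values at start-up. */&lt;br /&gt;
static uint32_t rand_table[256];&lt;br /&gt;
&lt;br /&gt;
uint32_t buzhash( const char* key ) {&lt;br /&gt;
  uint32_t h = 0;&lt;br /&gt;
  for ( ; *key; key++ ) {&lt;br /&gt;
    h = (h &amp;lt;&amp;lt; 1) | (h &amp;gt;&amp;gt; 31); /* rotate left one bit */&lt;br /&gt;
    h ^= rand_table[(unsigned char)*key];&lt;br /&gt;
  }&lt;br /&gt;
  return h;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;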
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 7.843786 bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 1000 byte file by 1 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 1000 samples is 214.46, and randomly would exceed this value 95.00 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 128.0860 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 3.132530120 &amp;lt;error 0.29 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is -0.017268 &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 6, expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Not sure what these results mean yet.&lt;br /&gt;
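&lt;br /&gt;
The output format looks like the standard metrics from John Walker&amp;#039;s &amp;#039;&amp;#039;ent&amp;#039;&amp;#039; randomness tests. As a reference for the first figure: entropy in bits per byte comes from the byte-value frequencies, and 8.0 means perfectly uniform. A sketch of the standard Shannon formula (my own, not the test suite code):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Shannon entropy: -sum of p * log2(p) over the 256 byte values. */&lt;br /&gt;
double entropy_bits_per_byte( const unsigned char* data, size_t n ) {&lt;br /&gt;
  long counts[256] = { 0 };&lt;br /&gt;
  double h = 0.0;&lt;br /&gt;
  size_t i;&lt;br /&gt;
  for ( i = 0; i &amp;lt; n; i++ )&lt;br /&gt;
    counts[data[i]]++;&lt;br /&gt;
  for ( i = 0; i &amp;lt; 256; i++ )&lt;br /&gt;
    if ( counts[i] &amp;gt; 0 ) {&lt;br /&gt;
      double p = (double)counts[i] / n;&lt;br /&gt;
      h -= p * log2( p );&lt;br /&gt;
    }&lt;br /&gt;
  return h;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;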
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
After increasing the sample size first to 100000 and then to 10000000,&lt;br /&gt;
&lt;br /&gt;
I observed the following:&lt;br /&gt;
&lt;br /&gt;
Entropy got closer to 8 bits per byte.&lt;br /&gt;
&lt;br /&gt;
The percentage by which optimum compression would reduce the file size decreased to 0.&lt;br /&gt;
&lt;br /&gt;
The chi-square distribution decreased.&lt;br /&gt;
&lt;br /&gt;
The arithmetic mean value of the data bytes got closer to 127.5.&lt;br /&gt;
&lt;br /&gt;
The Monte Carlo value got closer to Pi.&lt;br /&gt;
&lt;br /&gt;
The serial correlation coefficient got closer to 0.&lt;br /&gt;
&lt;br /&gt;
The llps value got closer to the expected value.&lt;br /&gt;
&lt;br /&gt;
All the results suggest that increasing the sample size increases the &amp;quot;randomness&amp;quot; of the hash output.&lt;br /&gt;
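&lt;br /&gt;
The chi-square figure can also be checked by hand: against a uniform distribution the expected count in each of the 256 bins is n / 256, and for truly random data the statistic should sit near 255 (the degrees of freedom). A sketch of that calculation (my own, assuming uniform expected frequencies as the test seems to):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Chi-square statistic of the byte counts against a uniform&lt;br /&gt;
   distribution; very large values mean badly skewed output. */&lt;br /&gt;
double chi_square( const unsigned char* data, size_t n ) {&lt;br /&gt;
  long counts[256] = { 0 };&lt;br /&gt;
  double expected = (double)n / 256.0;&lt;br /&gt;
  double chisq = 0.0;&lt;br /&gt;
  size_t i;&lt;br /&gt;
  for ( i = 0; i &amp;lt; n; i++ )&lt;br /&gt;
    counts[data[i]]++;&lt;br /&gt;
  for ( i = 0; i &amp;lt; 256; i++ ) {&lt;br /&gt;
    double d = counts[i] - expected;&lt;br /&gt;
    chisq += d * d / expected;&lt;br /&gt;
  }&lt;br /&gt;
  return chisq;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;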
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Running &amp;#039;&amp;#039;&amp;#039;rt_add_buzhashn&amp;#039;&amp;#039;&amp;#039; with sample size 100000 and low entropy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 7.998236 bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 100000 byte file by 0 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 100000 samples is 244.84, and randomly would exceed this value 50.00 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 127.4936 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 3.137635506 &amp;lt;error 0.13 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is -0.003092 &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 999 (!!!!!!), expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
llps = 999 suggests that almost all of the values are bunched up in one slot.&lt;br /&gt;
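&lt;br /&gt;
Assuming llps stands for the longest list per slot of the chained hash table, it can be measured by hashing every key, counting how many keys land in each slot, and taking the largest count. A sketch (a hypothetical helper, not the lab harness):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;#include &amp;lt;stddef.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Longest list per slot. With 1000 keys over 1000 slots a good&lt;br /&gt;
   hash should give about 5.5; 999 means almost every key landed&lt;br /&gt;
   in one slot. Assumes table_size &amp;lt;= 1000. */&lt;br /&gt;
size_t llps( unsigned (*hash)( const char* ), const char** keys,&lt;br /&gt;
             size_t n_keys, size_t table_size ) {&lt;br /&gt;
  size_t counts[1000] = { 0 };&lt;br /&gt;
  size_t i, max = 0;&lt;br /&gt;
  for ( i = 0; i &amp;lt; n_keys; i++ )&lt;br /&gt;
    counts[hash( keys[i] ) % table_size]++;&lt;br /&gt;
  for ( i = 0; i &amp;lt; table_size; i++ )&lt;br /&gt;
    if ( counts[i] &amp;gt; max )&lt;br /&gt;
      max = counts[i];&lt;br /&gt;
  return max;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;br /&gt;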
&lt;br /&gt;
&lt;br /&gt;
Running &amp;#039;&amp;#039;&amp;#039;rt_add_hash_CRC&amp;#039;&amp;#039;&amp;#039; with sample size 100000 and low entropy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 5.574705 bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 100000 byte file by 30 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 100000 samples is 1398897.03, and randomly would exceed this value 0.01 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 95.7235 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 3.747989920 &amp;lt;error 19.30 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is -0.075371 &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 13, expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running &amp;#039;&amp;#039;&amp;#039;rt_add_base256&amp;#039;&amp;#039;&amp;#039; with sample size 100000 and low entropy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;Entropy = 0.00000 (!!!) bits per byte.&lt;br /&gt;
&lt;br /&gt;
Optimum compression would reduce the size of this 100000 byte file by 100 percent.&lt;br /&gt;
&lt;br /&gt;
Chi square distribution for the 100000 samples is 25500000.00, and randomly would exceed this value 0.01 percent of the times.&lt;br /&gt;
&lt;br /&gt;
Arithmetic mean value of the data bytes is 97.0000 &amp;lt;127.5 = random&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Monte Carlo value for Pi is 4.0000000 &amp;lt;error 27.32 percent&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Serial correlation coefficient is undefined &amp;lt;totally uncorrelated = 0.0&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Buzhash low 1000/1000: llps = 1000 (!!!!), expecting 5.51384&amp;lt;/Pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;base256&amp;#039;&amp;#039;&amp;#039; and other hash functions produced many unexpected results. &lt;br /&gt;
&lt;br /&gt;
Maybe the sample size is too large for them? These results suggest &amp;#039;&amp;#039;&amp;#039;buzhash&amp;#039;&amp;#039;&amp;#039; performs well on large sample sizes.&lt;br /&gt;
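&lt;br /&gt;
A plausible reading of the base256 result, assuming &amp;#039;&amp;#039;&amp;#039;base256&amp;#039;&amp;#039;&amp;#039; simply treats the key as a number with radix 256: the low byte of such a number is just the last character of the key, so low-entropy keys ending in the same character all hash to the same slot. That would account for entropy 0, a mean of exactly 97 (ASCII &amp;#039;a&amp;#039;, if the keys are runs of that character), and llps = 1000. A sketch of that style of hash:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Pre&amp;gt;/* Sketch of a base-256 hash: the low bits depend only on the&lt;br /&gt;
   last few characters, so similar keys collide badly whenever&lt;br /&gt;
   the low bits select the table slot. */&lt;br /&gt;
unsigned base256_hash( const char* key ) {&lt;br /&gt;
  unsigned h = 0;&lt;br /&gt;
  while ( *key )&lt;br /&gt;
    h = h * 256 + (unsigned char)*key++;&lt;br /&gt;
  return h;&lt;br /&gt;
}&amp;lt;/Pre&amp;gt;&lt;/div&gt;</summary>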
		<author><name>Mark</name></author>
	</entry>
</feed>