SE250:lab-5:mabd065
I was kind of shocked when I first looked at the code. But after working through some parts of it and seeking help from the tutor (although that wasn't much help), I managed to get an idea of what I was supposed to do.
I managed to compile everything quickly using the *.c wildcard mentioned in the lab sheet. This was very helpful.
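For reference, the command was presumably something along these lines (assuming gcc, with the output name being my own choice):

gcc *.c -o lab5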
I managed to compile all of the code required for Part 1 in one go, but at first I had no clue what the results actually meant. After seeking the help of the lecturer (which was really helpful), I started making sense of the whole lab, even though an explanation of the statistics terminology beforehand would have been much better and could have saved the lecturer a lot of time explaining it to students individually.
The code for Part 1 is as follows:
CODE:
int main( ) {
    int sample_size = 100000;
    int n_keys = 1000;      /* not used in this part; ent_test only needs sample_size */
    int table_size = 10000; /* not used in this part */

    ent_test( "Buzhash low",     low_entropy_src,     sample_size, &rt_add_buzhash );
    ent_test( "Buzhash high",    typical_entropy_src, sample_size, &rt_add_buzhash );
    /* Note: the two "buzhashn" tests below actually call rt_add_hash_CRC
       (presumably a copy-paste slip), so they measure CRC, not buzhashn. */
    ent_test( "buzhashn low",    low_entropy_src,     sample_size, &rt_add_hash_CRC );
    ent_test( "buzhashn high",   typical_entropy_src, sample_size, &rt_add_hash_CRC );
    ent_test( "hash_CRC low",    low_entropy_src,     sample_size, &rt_add_hash_CRC );
    ent_test( "hash_CRC high",   typical_entropy_src, sample_size, &rt_add_hash_CRC );
    ent_test( "base256 low",     low_entropy_src,     sample_size, &rt_add_base256 );
    ent_test( "base256 high",    typical_entropy_src, sample_size, &rt_add_base256 );
    ent_test( "Java Int low",    low_entropy_src,     sample_size, &rt_add_Java_Integer );
    ent_test( "Java Int high",   typical_entropy_src, sample_size, &rt_add_Java_Integer );
    ent_test( "Java Object low", low_entropy_src,     sample_size, &rt_add_Java_Object );
    ent_test( "Java Objec high", typical_entropy_src, sample_size, &rt_add_Java_Object );
    ent_test( "Java String low", low_entropy_src,     sample_size, &rt_add_Java_String );
    ent_test( "Java Strin high", typical_entropy_src, sample_size, &rt_add_Java_String );
    ent_test( "Rand low",        low_entropy_src,     sample_size, &rt_add_rand );
    ent_test( "Rand high",       typical_entropy_src, sample_size, &rt_add_rand );
    ent_test( "high rand low",   low_entropy_src,     sample_size, &rt_add_high_rand );
    ent_test( "high rand high",  typical_entropy_src, sample_size, &rt_add_high_rand );
    return 0;
}
===========
The results were:
For sample_size = 100000, n_keys = 1000, table_size = 10000:

(Ideal values for a random byte stream: entropy 8 bits/byte, mean 127.5, pi error 0%, serial correlation 0.)

Test              Entropy   Chi-square   Mean      Pi error   Serial corr.
Buzhash low       7.99830   75.00%       127.578   0.71%      0.000327
Buzhash high      7.99783   2.50%        127.374   0.77%      -0.000076
buzhashn low      5.59831   0.01%        81.783    24.53%     0.028545
buzhashn high     7.84046   0.01%        122.862   1.98%      0.018518
hash_CRC low      5.71827   0.01%        112.143   12.43%     -0.204906
hash_CRC high     7.84046   0.01%        122.862   1.98%      0.018518
base256 low       0.00000   0.01%        97.000    27.32%     undefined
base256 high      4.02297   0.01%        107.853   27.32%     0.034082
Java Int low      4.82824   0.01%        43.883    27.32%     -0.092002
Java Int high     4.82824   0.01%        43.883    27.32%     -0.092002
Java Object low   2.00000   0.01%        77.000    27.32%     -0.521556
Java Objec high   5.72621   0.01%        118.129   4.28%      -0.343281
Java String low   7.99958   99.99%       127.474   0.15%      0.001097
Java Strin high   7.94554   0.01%        126.139   0.27%      0.021181
Rand low          7.95308   0.01%        111.441   11.17%     -0.051837
Rand high         7.95272   0.01%        111.395   10.65%     -0.049131
high rand low     7.99828   75.00%       127.441   0.75%      -0.001213
high rand high    7.99807   50.00%       127.406   0.07%      -0.002226
I used a sample size of 100000 because I thought you wouldn't really use a hash table for a small number of values. My reasoning was that since we want the hash function to be as fast as possible, we presumably have a large number of values to deal with, and 100000 seemed large enough to me.
Rearranging results:
Using Excel, I was able to arrange the results quickly by sorting on the columns, in the orders shown in the header row below.
Test              Entropy   Chi-square   Mean      Pi error   Serial corr.
Sort order*       DESC      DESC         DESC      ASC        ASC
Buzhash low       7.99830   75.00%       127.578   0.71%      0.000327
high rand low     7.99828   75.00%       127.441   0.75%      -0.001213
high rand high    7.99807   50.00%       127.406   0.07%      -0.002226
Java String low   7.99958   99.99%       127.474   0.15%      0.001097
Buzhash high      7.99783   2.50%        127.374   0.77%      -0.000076
Java Strin high   7.94554   0.01%        126.139   0.27%      0.021181
buzhashn high     7.84046   0.01%        122.862   1.98%      0.018518
hash_CRC high     7.84046   0.01%        122.862   1.98%      0.018518
Rand low          7.95308   0.01%        111.441   11.17%     -0.051837
Rand high         7.95272   0.01%        111.395   10.65%     -0.049131
base256 high      4.02297   0.01%        107.853   27.32%     0.034082
Java Objec high   5.72621   0.01%        118.129   4.28%      -0.343281
hash_CRC low      5.71827   0.01%        112.143   12.43%     -0.204906
buzhashn low      5.59831   0.01%        81.783    24.53%     0.028545
Java Int low      4.82824   0.01%        43.883    27.32%     -0.092002
Java Int high     4.82824   0.01%        43.883    27.32%     -0.092002
Java Object low   2.00000   0.01%        77.000    27.32%     -0.521556
base256 low       0.00000   0.01%        97.000    27.32%     undefined
- * From what I understood from the lecturer, sorting the results column by column didn't really give the desired outcome of showing the best functions at the top, because a function can rank well on one statistic yet rank below some really bad functions on another. For example, looking at the first set of results, base256 low ranks better on the mean test (97.000, closer to the ideal 127.5) than buzhashn low (81.783), even though its entropy of 0 makes it a far worse hash. One possible workaround is to combine the statistics into a single score, as sketched below.
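The following is purely my own sketch, not something from the lab sheet: combine each statistic's distance from its ideal value into a single "badness" score and sort on that. The equal weighting and the choice to ignore the chi-square column are arbitrary assumptions.

CODE:
#include <math.h>
#include <stdio.h>

/* Toy combined score: distance of each statistic from its ideal
   (entropy -> 8 bits/byte, mean -> 127.5, pi error -> 0, serial
   correlation -> 0). The chi-square percentage is left out for
   simplicity, and the equal weights are an arbitrary assumption. */
double badness( double entropy, double mean, double pi_err, double ser_corr ) {
    return fabs( 8.0 - entropy ) / 8.0
         + fabs( 127.5 - mean ) / 127.5
         + fabs( pi_err )   /* pi error as a fraction, e.g. 0.71% -> 0.0071 */
         + fabs( ser_corr );
}

int main( ) {
    /* Two rows from the first table: lower score = closer to random. */
    printf( "Buzhash low: %f\n", badness( 7.99830, 127.578, 0.0071, 0.000327 ) );
    printf( "base256 low: %f\n", badness( 0.00000,  97.000, 0.2732, 0.0 ) );
    return 0;
}
===========

On these two rows the combined score puts Buzhash low (about 0.008) far ahead of base256 low (about 1.5), even though base256 low wins on the mean column alone.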
A load factor of 10 might be higher than the factors normally used, but that is what I assumed would be a good value.
Part 2:
Running the code:
==============
int main( ) {
    int sample_size = 100000; /* not used here; the llps tests use n_keys */
    int n_keys = 10000;
    int table_size = 1000;

    printf( "Buzhash low %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, low_entropy_src, table_size, buzhash ),
            expected_llps( n_keys, table_size ) );
    printf( "Buzhash high %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, typical_entropy_src, table_size, buzhash ),
            expected_llps( n_keys, table_size ) );
    printf( "Hash CRC low %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, low_entropy_src, table_size, hash_CRC ),
            expected_llps( n_keys, table_size ) );
    printf( "Hash CRC high %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, typical_entropy_src, table_size, hash_CRC ),
            expected_llps( n_keys, table_size ) );
    printf( "Base 256 low %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, low_entropy_src, table_size, base256 ),
            expected_llps( n_keys, table_size ) );
    printf( "Base 256 high %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, typical_entropy_src, table_size, base256 ),
            expected_llps( n_keys, table_size ) );
    return 0;
}
==============
Gives:
Buzhash low 10000/1000:   llps = 20,    expecting 21.7096
Buzhash high 10000/1000:  llps = 22,    expecting 21.7096
Hash CRC low 10000/1000:  llps = 88,    expecting 21.7096
Hash CRC high 10000/1000: llps = 23,    expecting 21.7096
Base 256 low 10000/1000:  llps = 10000, expecting 21.7096
Base 256 high 10000/1000: llps = 497,   expecting 21.7096
Rearranged from best to worst (ascending llps):

Buzhash low 10000/1000:   llps = 20,    expecting 21.7096
Buzhash high 10000/1000:  llps = 22,    expecting 21.7096
Hash CRC high 10000/1000: llps = 23,    expecting 21.7096
Hash CRC low 10000/1000:  llps = 88,    expecting 21.7096
Base 256 high 10000/1000: llps = 497,   expecting 21.7096
Base 256 low 10000/1000:  llps = 10000, expecting 21.7096
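As a sanity check on the "expecting 21.7096" figure: it is close to the classic balls-in-bins estimate for the longest chain, load factor + sqrt(2 × load factor × ln(table_size)). This is my own approximation, not necessarily the formula that expected_llps actually uses:

CODE:
#include <math.h>
#include <stdio.h>

/* Rough balls-in-bins estimate of the longest probe sequence when
   n_keys keys hash uniformly into table_size slots. For 10000/1000
   this prints about 21.75, close to the 21.7096 reported above. */
double approx_llps( int n_keys, int table_size ) {
    double load = (double) n_keys / table_size;
    return load + sqrt( 2.0 * load * log( (double) table_size ) );
}

int main( ) {
    printf( "approx expected llps = %g\n", approx_llps( 10000, 1000 ) );
    return 0;
}
===========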
The results here roughly match what we expected from theory (the results in Part 1 of the experiment).
The hash functions, ordered from best to worst:
Buzhash
Hash CRC
Base 256
Base256 is the worst of all! Feeding it with either the low- or the high-entropy source gives results that are far from what we want, although it performs relatively better with the high-entropy source than with the low one (497 compared to 10000).
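A likely explanation for base256's behaviour on the low-entropy source (my assumption; I haven't inspected its actual source code) is that a base-256 hash treats the key as a base-256 number, so in 32-bit arithmetic only the last four characters survive the shifts:

CODE:
/* Sketch of a typical base-256 hash, assumed similar to the lab's base256. */
unsigned base256_sketch( const char *key ) {
    unsigned h = 0;
    for ( ; *key != '\0'; key++ )
        h = ( h << 8 ) + (unsigned char) *key; /* older characters shift out */
    return h;
}
===========

If the low-entropy keys all end in the same few characters, they all hash to the same value, which would explain an llps of 10000: every single key lands in one chain.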