SE250:lab-5:mabd065
I was kind of shocked when I first looked at the code. But after working through some parts of it and seeking help from the tutor (although that wasn't much help), I managed to get an idea of what I was supposed to do.
I managed to compile everything quickly using the *.c wildcard mentioned in the lab sheet. This was very helpful.
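For reference, the command was presumably something along these lines (assuming gcc, with the output name being my own choice):

gcc *.c -o lab5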
I managed to compile all of the code required for Part 1 in one go, but at first I had no clue what the results actually meant. After seeking the help of the lecturer (which was really helpful), I started making sense of the whole lab, even though an explanation of the statistics terminology beforehand would have been much better and could have saved the lecturer a lot of time explaining it to students individually.
The code for Part 1 is as follows:
CODE:
int main( ) {
    int sample_size = 100000;
    int n_keys = 1000;      /* not used in this part; ent_test only needs sample_size */
    int table_size = 10000; /* not used in this part */

    ent_test( "Buzhash low",     low_entropy_src,     sample_size, &rt_add_buzhash );
    ent_test( "Buzhash high",    typical_entropy_src, sample_size, &rt_add_buzhash );
    /* Note: the two "buzhashn" tests below actually call rt_add_hash_CRC
       (presumably a copy-paste slip), so they measure CRC, not buzhashn. */
    ent_test( "buzhashn low",    low_entropy_src,     sample_size, &rt_add_hash_CRC );
    ent_test( "buzhashn high",   typical_entropy_src, sample_size, &rt_add_hash_CRC );
    ent_test( "hash_CRC low",    low_entropy_src,     sample_size, &rt_add_hash_CRC );
    ent_test( "hash_CRC high",   typical_entropy_src, sample_size, &rt_add_hash_CRC );
    ent_test( "base256 low",     low_entropy_src,     sample_size, &rt_add_base256 );
    ent_test( "base256 high",    typical_entropy_src, sample_size, &rt_add_base256 );
    ent_test( "Java Int low",    low_entropy_src,     sample_size, &rt_add_Java_Integer );
    ent_test( "Java Int high",   typical_entropy_src, sample_size, &rt_add_Java_Integer );
    ent_test( "Java Object low", low_entropy_src,     sample_size, &rt_add_Java_Object );
    ent_test( "Java Objec high", typical_entropy_src, sample_size, &rt_add_Java_Object );
    ent_test( "Java String low", low_entropy_src,     sample_size, &rt_add_Java_String );
    ent_test( "Java Strin high", typical_entropy_src, sample_size, &rt_add_Java_String );
    ent_test( "Rand low",        low_entropy_src,     sample_size, &rt_add_rand );
    ent_test( "Rand high",       typical_entropy_src, sample_size, &rt_add_rand );
    ent_test( "high rand low",   low_entropy_src,     sample_size, &rt_add_high_rand );
    ent_test( "high rand high",  typical_entropy_src, sample_size, &rt_add_high_rand );
    return 0;
}
===========
The results were:
For sample_size = 100000, n_keys = 1000, table_size = 10000:

(Ideal values for a random byte stream: entropy 8 bits/byte, mean 127.5, pi error 0%, serial correlation 0.)

Test              Entropy   Chi-square   Mean      Pi error   Serial corr.
Buzhash low       7.99830   75.00%       127.578   0.71%      0.000327
Buzhash high      7.99783   2.50%        127.374   0.77%      -0.000076
buzhashn low      5.59831   0.01%        81.783    24.53%     0.028545
buzhashn high     7.84046   0.01%        122.862   1.98%      0.018518
hash_CRC low      5.71827   0.01%        112.143   12.43%     -0.204906
hash_CRC high     7.84046   0.01%        122.862   1.98%      0.018518
base256 low       0.00000   0.01%        97.000    27.32%     undefined
base256 high      4.02297   0.01%        107.853   27.32%     0.034082
Java Int low      4.82824   0.01%        43.883    27.32%     -0.092002
Java Int high     4.82824   0.01%        43.883    27.32%     -0.092002
Java Object low   2.00000   0.01%        77.000    27.32%     -0.521556
Java Objec high   5.72621   0.01%        118.129   4.28%      -0.343281
Java String low   7.99958   99.99%       127.474   0.15%      0.001097
Java Strin high   7.94554   0.01%        126.139   0.27%      0.021181
Rand low          7.95308   0.01%        111.441   11.17%     -0.051837
Rand high         7.95272   0.01%        111.395   10.65%     -0.049131
high rand low     7.99828   75.00%       127.441   0.75%      -0.001213
high rand high    7.99807   50.00%       127.406   0.07%      -0.002226
I used a sample size of 100000 because I thought you wouldn't really use a hash table for a small number of values. My reasoning was that since we want the hash function to be as fast as possible, we presumably have a large number of values to deal with, and 100000 seemed large enough to me.
Rearranging results:
Using Excel, I was able to arrange the results quickly by sorting on the columns, in the orders shown in the header row below.
Test              Entropy   Chi-square   Mean      Pi error   Serial corr.
Sort order*       DESC      DESC         DESC      ASC        ASC
Buzhash low       7.99830   75.00%       127.578   0.71%      0.000327
high rand low     7.99828   75.00%       127.441   0.75%      -0.001213
high rand high    7.99807   50.00%       127.406   0.07%      -0.002226
Java String low   7.99958   99.99%       127.474   0.15%      0.001097
Buzhash high      7.99783   2.50%        127.374   0.77%      -0.000076
Java Strin high   7.94554   0.01%        126.139   0.27%      0.021181
buzhashn high     7.84046   0.01%        122.862   1.98%      0.018518
hash_CRC high     7.84046   0.01%        122.862   1.98%      0.018518
Rand low          7.95308   0.01%        111.441   11.17%     -0.051837
Rand high         7.95272   0.01%        111.395   10.65%     -0.049131
base256 high      4.02297   0.01%        107.853   27.32%     0.034082
Java Objec high   5.72621   0.01%        118.129   4.28%      -0.343281
hash_CRC low      5.71827   0.01%        112.143   12.43%     -0.204906
buzhashn low      5.59831   0.01%        81.783    24.53%     0.028545
Java Int low      4.82824   0.01%        43.883    27.32%     -0.092002
Java Int high     4.82824   0.01%        43.883    27.32%     -0.092002
Java Object low   2.00000   0.01%        77.000    27.32%     -0.521556
base256 low       0.00000   0.01%        97.000    27.32%     undefined
- * From what I understood from the lecturer, sorting the results column by column didn't really give the desired outcome of showing the best functions at the top, because a function can rank well on one statistic yet rank below some really bad functions on another. For example, looking at the first set of results, base256 low ranks better on the mean test (97.000, closer to the ideal 127.5) than buzhashn low (81.783), even though its entropy of 0 makes it a far worse hash. One possible workaround is to combine the statistics into a single score, as sketched below.
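The following is purely my own sketch, not something from the lab sheet: combine each statistic's distance from its ideal value into a single "badness" score and sort on that. The equal weighting and the choice to ignore the chi-square column are arbitrary assumptions.

CODE:
#include <math.h>
#include <stdio.h>

/* Toy combined score: distance of each statistic from its ideal
   (entropy -> 8 bits/byte, mean -> 127.5, pi error -> 0, serial
   correlation -> 0). The chi-square percentage is left out for
   simplicity, and the equal weights are an arbitrary assumption. */
double badness( double entropy, double mean, double pi_err, double ser_corr ) {
    return fabs( 8.0 - entropy ) / 8.0
         + fabs( 127.5 - mean ) / 127.5
         + fabs( pi_err )   /* pi error as a fraction, e.g. 0.71% -> 0.0071 */
         + fabs( ser_corr );
}

int main( ) {
    /* Two rows from the first table: lower score = closer to random. */
    printf( "Buzhash low: %f\n", badness( 7.99830, 127.578, 0.0071, 0.000327 ) );
    printf( "base256 low: %f\n", badness( 0.00000,  97.000, 0.2732, 0.0 ) );
    return 0;
}
===========

On these two rows the combined score puts Buzhash low (about 0.008) far ahead of base256 low (about 1.5), even though base256 low wins on the mean column alone.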
A load factor of 10 might be higher than the factors normally used, but that is what I assumed would be a good value.
Part 2:
Running the code:
==============
int main( ) {
    int sample_size = 100000; /* not used here; the llps tests use n_keys */
    int n_keys = 10000;
    int table_size = 1000;

    printf( "Buzhash low %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, low_entropy_src, table_size, buzhash ),
            expected_llps( n_keys, table_size ) );
    printf( "Buzhash high %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, typical_entropy_src, table_size, buzhash ),
            expected_llps( n_keys, table_size ) );
    printf( "Hash CRC low %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, low_entropy_src, table_size, hash_CRC ),
            expected_llps( n_keys, table_size ) );
    printf( "Hash CRC high %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, typical_entropy_src, table_size, hash_CRC ),
            expected_llps( n_keys, table_size ) );
    printf( "Base 256 low %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, low_entropy_src, table_size, base256 ),
            expected_llps( n_keys, table_size ) );
    printf( "Base 256 high %d/%d: llps = %d, expecting %g\n", n_keys, table_size,
            llps( n_keys, typical_entropy_src, table_size, base256 ),
            expected_llps( n_keys, table_size ) );
    return 0;
}
==============
Gives:
Buzhash low 10000/1000:   llps = 20,    expecting 21.7096
Buzhash high 10000/1000:  llps = 22,    expecting 21.7096
Hash CRC low 10000/1000:  llps = 88,    expecting 21.7096
Hash CRC high 10000/1000: llps = 23,    expecting 21.7096
Base 256 low 10000/1000:  llps = 10000, expecting 21.7096
Base 256 high 10000/1000: llps = 497,   expecting 21.7096
Rearranged from best to worst (ascending llps):

Buzhash low 10000/1000:   llps = 20,    expecting 21.7096
Buzhash high 10000/1000:  llps = 22,    expecting 21.7096
Hash CRC high 10000/1000: llps = 23,    expecting 21.7096
Hash CRC low 10000/1000:  llps = 88,    expecting 21.7096
Base 256 high 10000/1000: llps = 497,   expecting 21.7096
Base 256 low 10000/1000:  llps = 10000, expecting 21.7096
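As a sanity check on the "expecting 21.7096" figure: it is close to the classic balls-in-bins estimate for the longest chain, load factor + sqrt(2 × load factor × ln(table_size)). This is my own approximation, not necessarily the formula that expected_llps actually uses:

CODE:
#include <math.h>
#include <stdio.h>

/* Rough balls-in-bins estimate of the longest probe sequence when
   n_keys keys hash uniformly into table_size slots. For 10000/1000
   this prints about 21.75, close to the 21.7096 reported above. */
double approx_llps( int n_keys, int table_size ) {
    double load = (double) n_keys / table_size;
    return load + sqrt( 2.0 * load * log( (double) table_size ) );
}

int main( ) {
    printf( "approx expected llps = %g\n", approx_llps( 10000, 1000 ) );
    return 0;
}
===========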
The results here roughly match what we expected from theory (the results in Part 1 of the experiment).
The hash functions, ordered from best to worst:
Buzhash
Hash CRC
Base 256
Base256 is the worst of all! Feeding it with either the low- or the high-entropy source gives results that are far from what we want, although it performs relatively better with the high-entropy source than with the low one (497 compared to 10000).
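A likely explanation for base256's behaviour on the low-entropy source (my assumption; I haven't inspected its actual source code) is that a base-256 hash treats the key as a base-256 number, so in 32-bit arithmetic only the last four characters survive the shifts:

CODE:
/* Sketch of a typical base-256 hash, assumed similar to the lab's base256. */
unsigned base256_sketch( const char *key ) {
    unsigned h = 0;
    for ( ; *key != '\0'; key++ )
        h = ( h << 8 ) + (unsigned char) *key; /* older characters shift out */
    return h;
}
===========

If the low-entropy keys all end in the same few characters, they all hash to the same value, which would explain an llps of 10000: every single key lands in one chain.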