SE250:April 7

From Marks Wiki
Revision as of 05:08, 3 November 2008 by Mark (Sọ̀rọ̀ | contribs) (6 revision(s))
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Agenda

  • OK, so I've uploaded my "screencast" to my own file host. We had to choose something which was based on Lab 3 correct? But some of it wasn't exactly ArrayList content. I've put my screencast there anyway - work in progress of course.

Minutes

Minute taker: Shrek

.talked about screencasts

.lab #5

hash functions

- given several hash functions
- evaluate 2 experiments

1- given suite (collection of functions) for testing randomness

in each case - an input source (a set of keys to be hashed. Strings or numbers) - input sources may be typical words or sequential integers - an output of pseudorandomness ( a number)

pseudorandom is not the same as true random as the number returned will be the same if the same input is again used.

2- Test hash function in practice

input source --> hash function --> look at hash table

array (numbers on right represent number of entries in each position) 0 - 3 - - 0 - - 2 - - 2 - - 1 - - 4 size - 1

frequency count to check for occurences of each entry for each position (i.e. the length of the chains in the array).

Second test thus gives us an idea of the practical implications of the different types of hash functions.



Where does randommness come into it?

hash function: Take a string, from this give me an integer


Simpler problem: Take a character, from this give me an integer

The obvious thing:

A 0 B 1 C 2 . . . Z 25 sp 26

No reason to have consecutive numbers, no reason for them not to be random (lol, can't remember what he said)

Therefore, random integer 0 ... MAX_INT


each entry could be really random like 0xC5C48EA5L, written in hexadecimal to make it look frightening.

Characters are "fired" into a hash function and the value returned could be anywhere along the entire range of 0 - MAXINT.

int tbl[256] = ( ... ) int hash_char( char c) {return tbl[ c]; }

int hash_string(char*sp) { hash_char(sp[0]) hash_char(sp[1]) hash_char(sp[2]) ...

this is bad.

this is better:

int hash_string(char*sp) { combine( combine (hash_char(sp[0]), hash_char(sp[1])) hash_char(sp[2]) ...

}


int hash_string(char*sp) { int has = <some predetermined random integer>; for(; sp != 0; sp++); hash = combine (hash_char, *sp); return hash; }


to combine two hash codes:

need to ensure entropy is maximum of each value

consider:

int combine(int h1, int h2) { return h1 + h2; }

this doesn't work as it favours a last digit of 0 if the two numbers have zeros for last digits.


XOR function is better - allows for any input to be shuffled around and changed.


int combine(int h1, int h2) { return rotate(h1) ^ h2; }