SE250:Poster, Experiment Kit - mgar059, nirv002
This is where most of the work is going. It only works properly with IE, if you know how to fix it for firefox, let me know on the discussion :-)
More Information on Hashing
- Poster, Experiment Kit - mgar059, nirv002 <-- YOU ARE HERE
- General Notes on Hashing - than035,wton016
- Interactive Visualisations on Hashing - hsne002
- Annotated Reading List - ntho055
- Class presentation on Hashing - hcha155
The Experiment Kit for hashing may be found here!
This page deals with Hashing
This is the outline of the content for our poster. This information is repeated on our Intranet Site. Sadly this is only accessable within the university network.
Poster
Hashing- what is it?
Hashing is the process of reducing a data input to a shorter unique identifier. The advantage of using an identifier is that it can speed up the process of finding data. For example, it is much easier to compare identifier strings of 4 digits for equality, than the whole n digit string that it represents. An example of this is license plate numbers. License plate numbers can be used to represent a range of the cars details, including the type of car, its colour, year it was made etc. It is easier to represent all of this in a unique number rather than use a full description. Hashing also has other uses such as verifying large amounts of data quickly, detecting similarity between chunks of data, and hiding information in cryptographic applications.
http://en.wikipedia.org/wiki/Hashing
http://www.cs.duke.edu/~mlittman/courses/Archive/cps130-97/lectures/lect11/node6.html
http://www.cs.sunysb.edu/~skiena/214/lectures/lect21/lect21.html
Uses of hashing
Hashing is used to speed up data search, verify data, detect similarities in data, and to compare data while retaining privacy
http://www.techworld.com/storage/features/index.cfm?FeatureID=235
http://en.wikipedia.org/wiki/Hashing
http://www.answers.com/topic/hash-function?method=22
Hash Tables
Hash table searching is the most powerful method of retrieving data known.
Hash tables are a way of using hash values as indexes to search quickly. In a perfect hash function, where each input has a unique hash, data can be accessed with the first probe into the table. This takes O(1) runtime. Poor hash functions which produce many collisions can degrade the search to a linear operation. Good functions for this are the LOOKUP2 and buzhash functions.
http://www.comp.nus.edu.sg/~stevenha/programming/prog_searching.html
Verifying data
Hashes are very useful for comparing data. Two pieces of data can be compared by simply comparing their hash values. In a good hash function, a small change in the data will be reflected in a vastly different hash value. Costly hash functions which produce unique stings are usually used for these tasks. This is useful when sending data across a network, to check if the data was corrupted during transfer.
http://www.answers.com/topic/hash-function?method=22
Detecting similarity
For some cases, it is useful to have a less sensitive hash function for detecting similarity between data. One use is with music where you want to detect if two MP3 files are the same song. It is possible to have a function which can detect similar music even though they may have been recorded in either a studio or noisy room, or be of slightly different length.
http://citeseer.ist.psu.edu/haitsma01robust.html
Hashing in cryptography
Hash functions are very useful in cryptography. The hash function is used to obscure passwords and private data. This allows the hash values of two passwords to be compared for equality, removing the need to transmit the password in plain format, where it could be intercepted. It is very important with cryptographic functions to reduce collisions and eliminate the ability to derive the original information from the hash value.
http://www.answers.com/topic/hash-function?method=22
Problems with hashing
One of the main problems with hash functions is collisions. Collisions are where two distinct inputs give the same output.
A major problem when using hash functions is collisions. Hash functions attempt to associate complicated data with a more simple unique id for that data. A hash collision is where two different pieces of data are hashed to produce the same hash.
Hashes are often used to speed up the searching of data. By using a hash table, on average, data can be found at a constant time O(1) and at a rare worst case of O(n). When sorting data into a hash table, data is not stored in any specific order. As such, accessing data in a sorted manner can be extremely complex and in cases impossible within a reasonable time.
Hash functions which are good at providing unique hashes with randomly distributed output can be costly in terms of processing power and time to compute. Clustering occurs when a hash function returns values closely spaced within the output range. This can cause significant performance degradation in hash tables.
http://www.answers.com/hash%20table
http://www.answers.com/main/ntquery?s=hash+tables&gwp=13
Hashing process
There will be Diagrams
In hashing, a piece of data is run through a hashing function to give a shorter string of alphanumeric digits. This is often done through a combination of rotational and logic operations on the binary bits that make up the input. If one small bit of the data going into the function changes, the output of the function should also change (ideally, one digit changing in the input should cause at least half of the output digits to change). The idea is that each unique input will produce a corresponding unique hash. This gives a way of comparing large amounts of data with only a small string.
http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=MD5&curtab=2222_1
Experiment Kit
The Experiment Kit for hashing may be found here!
References
A store of our resources are located within: http://studwww.cs.auckland.ac.nz/~mgar059.
Overview
http://www.cs.duke.edu/~mlittman/courses/Archive/cps130-97/lectures/lect11/node6.html
http://www.cs.sunysb.edu/~skiena/214/lectures/lect21/lect21.html
Wiki/other: Specific points on hashing
http://www.answers.com/topic/hash-function?method=22
http://www.answers.com/main/ntquery?s=hash+tables&gwp=13 <-- Clustering
http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=MD5&curtab=2222_1
Buzhash: Authors page
http://www.serve.net/buz/hash.adt/java.000.html
Audio identification:
http://citeseer.ist.psu.edu/haitsma01robust.html
pdf file is in resources folder
Jamie's hashing presentation.
Misc
http://www.comp.nus.edu.sg/~stevenha/programming/prog_searching.html
Pictures
www.marioparga.com/hobbies5.html
www.drinvestigations.com