[Home] [Puzzles & Projects] [Delphi Techniques] [Math Topics] [Library] [Utilities]
Hashing is a mathematical technique to convert a key into an array index. It is useful if we have keys with a limited set of unique values. For example, the set of Delphi reserved words or the set of words in the dictionary. It is further desirable that we access the entries many times - since retrieval is much faster than other types of searching, applications with many lookups are ideal candidates.
The function that performs this conversion is called the hashing function and the array that we will index is called the hash table, the positions in the table are frequently referred to as buckets. The ideal hashing function for a fixed size set of key values would translate every valid key into a unique bucket. For the more general case, the best we can hope for is that the hashing function creates hashed keys that are uniformly randomly distributed across the the table.
Clearly the hash table must be at least as large as the number of unique values that will be inserted. For performance reasons, it is desirable to have the table considerably larger. This is because, most hashing functions are many to one, i.e. it is possible that more than one input will produce the same output. For valid entries, this is known as a collision, and must be handled. One common technique is to start checking buckets sequentially until we find a matching entry or we find an empty bucket. There are other techniques as well. The ratio of the number of entries to the number of buckets is called the loading ratio. For a well designed hashing algorithm, the average number of searches required when accessing the table is 1+ loading2 ratio . In other words, if the table twice as large as its contents (loading ratio=0.5), we can expect to search 1.25. entries on average to find an entry or find a location to insert the entry. Still much shorter than a binary search (and much much faster than a sequential search).
Here's a link to download the source for HashWordCount, a version of the word counting program that defines and uses a first version of a THashStr object to count words in a document. Buttons using list.find (binary search) and list.indexof (sequential search) are included to compare speeds. The THashStr object defined here assumes that keys are strings and that the entries to be accessed are objects derived from TObject. This this case the objects are TCount objects containing only a single integer (the word count) but more complex objects could be used. This program also defines simple StartTimer and StopTimer routines using the QueryPerformanceCount API to accurately compute run times (microsecond resolution).
Copyright © 2000-2013, Gary Darby
All rights reserved.