Hashing is a mathematical technique to convert a key into an
array index. It is useful if we have keys with a limited set of unique
values. For example, the set of Delphi reserved words or the set of words
in the dictionary. It is further desirable that we access the
entries many times - since retrieval is much faster than other types of
searching, applications with many lookups are ideal candidates.

The function that performs this conversion is called the **hashing function **and
the array that we will index is called the **hash table, **the positions in
the table are frequently referred to as **buckets**. The ideal hashing
function for a fixed size set of key values would translate every valid key into
a unique bucket. For the more general case, the best we can
hope for is that the hashing function creates hashed keys that are uniformly
randomly distributed across the the table.

Clearly the
hash table must be at least as large as the number of unique values that will be
inserted. For performance reasons, it is desirable to have the table
considerably larger. This is because, most hashing functions are
many to one, i.e. it is possible that more than one input will produce the same output. For valid entries, this is known as a
**collision**, and must be
handled. One common technique is to start checking buckets sequentially
until we find a matching entry or we find an empty bucket. There
are other techniques as well. The ratio of the number of entries to
the number of buckets is called the **loading ratio**. For
a well designed hashing algorithm, the **average
number of searches** required when accessing the table is ** 1+**^{ loading}_{2}^{
ratio} . In other words, if the table twice as large as its contents
(loading ratio=0.5), we can expect to search 1.25. entries on average to
find an
entry or find a location to insert the entry. Still much
shorter than a binary search (and much much faster than a sequential search).

Here's a link to download the source for **HashWordCount**,
a version of the word counting program that defines and uses a first version of
a **THashStr** object to count words in a document. Buttons using **list.find**
(binary search) and **list.indexof** (sequential search) are included
to compare speeds. The **THashStr** object defined here
assumes that keys are strings and that the entries to be accessed are objects
derived from **TObject**. This this case the objects are **TCount **objects
containing only a single integer (the word count) but more complex objects could
be used. This program also defines simple StartTimer and StopTimer
routines using the QueryPerformanceCount API to accurately compute run
times (microsecond resolution).