SoftComplete Development

HashTrie data structure

The existing implementations of hash-tables have the important disadvantage - fixed size. As a corollary, if the table is filled more than 90 % search and insert operation become very slow.

HashTrie represents new efficient data structure. It combines in herself properties of the hash-tables and trie (digital-trees). As against the usual hash-tables the size HashTrie is not fixed, that allows to work with a unknown beforehand amount of datas.

Structure of HashTrie

HashTrie is very like to tree. Each node is array of pointers. (In our implementation array size is 256). Each pointer may be reference to 1) nil; 2) overflow list; 3) other node (subtree).

const LeafSize = 256;
type
  TBaseItem = class
    // abstract base class
  end;

  TLinkedItem = class(TBaseItem)
  private
    Value: string;
    Data: Pointer; // some data associated with Value
    Next: TLinkedItem; // next item in list or nil
  end;

  TTreeItem = class(TBaseItem)
  private
    Level: integer; // item level in tree
    Items: array[0..LeafSize-1] of TBaseItem;
    // really nil, TLinkedItem or TTreeItem
  end;

Initially HashTrie contains one item (TTreeItem) and behaves as hash-table. The conflicts of duties are allowed through the list of overflow. As hashing function we shall use CRC32 (hereinafter we shall return to a problem of choice hashing function). Is important that outcome of calculation hashing function it 32-bit unsigned integer. For addressing in the table (at the first level) we use a low byte.

For example we add in HashTrie string "ABCDE" so:

1) calculate hash-function. H:=hash("ABCDE");
2) calculate index in array. i:=H and $FF;
3) if Items[i] is nil we create list, otherwise add item to list.
4) if list length is more than some predefined constant (by example 8) we converse list to new tree item.
Step 4 determines a rule of construction a HashTrie.
5) if Items[i] contains pointer to other TTreeItem we add value to it. But at the second level of a tree we shall use second byte of hashing function, on third - third etc.

In the following figure you can see a graphics image of the obtained structure

HashTrie data structure

Choice of hashing function

We have tested with HashTrie some different hashing functions:
1) Simple hash-function:

function HashStr(const S: string): DWORD;
var i: integer;
begin
  Result:=Length(S);
  for i:=1 to Length(S) do
    Result:= (Result shl 5) xor (Result shr 27) xor Ord(S[i]);
end;

2) CRC32
3) Hashing function offered by Bob Jenkins

After addition in HashTrie of 30 thousand words from the English language dictionary we have following results:

Hash
function
Height
of tree
Count of not nil items* Count of nil items* Count of overflow lists with length equal to:
1 2 3 4 5 6
Simple 4 20528 122576 14295 3372 1169 530 249 158
CRC32 2 24519 41273 19069 4429 675 81 8 1
Bob Jenkins 2 24660 41132 19319 4355 635 88 6 1

*) in array of TTreeItem nodes

It is visible, that at usage of simple hashing function we receive poor outcomes. CRC32 and the method of Bob Jenkins is displayed approximately identical outcomes, thus CRC32 is calculated faster.

Download source code of HashTrie