
Performance Notes

Space considerations

Ternary trees are notably larger than hash maps or most binary tree designs: each node holds only one character (instead of a whole key) and uses 3-5 pointers. Our nodes consist of 4 size_t values (16 bytes on 32-bit platforms), regardless of the platform's pointer size or of the char type (as long as it is at most 2 bytes, like wchar_t).

The shared parts of strings save space: in a typical English dictionary, each key shares over half its nodes with other keys, so the allocated space is about half of the total key length times 16. In a scrabble dictionary like the one reported below, which contains all valid word endings, most nodes are shared, so its storage cost is "only" total key length times 0.35 times 16, or less than 6 bytes per char. With a wchar_t key type, that storage cost cannot be considered overly large.
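
For illustration, the sketch below shows one way such a 4-word node could be laid out. This is an assumption for exposition only (the field names are hypothetical); the actual structure is described under Implementation Details.

    #include <cstddef>

    // Illustration only: one way a ternary-tree node can occupy 4 size_t words.
    // Children are held as indices into a node pool rather than raw pointers,
    // and the split character gets a word of its own, so any character type up
    // to 2 bytes wide (char, or wchar_t on Windows) fits without changing the layout.
    struct tst_node
    {
        std::size_t splitchar;  // character compared at this node; 0 marks end of key
        std::size_t lokid;      // pool index of subtree with smaller characters (0 = none)
        std::size_t eqkid;      // pool index of subtree for the next key position (0 = none)
        std::size_t hikid;      // pool index of subtree with larger characters (0 = none)
    };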

See also Implementation Details

Lookup speed

The complexity of ternary tree operations is basically the same as for binary trees (logarithmic in tree size), but with quite different constant factors. See [1].

Overall lookup and iteration speed also depends on application factors, e.g. whether strings are inserted in sorted or random order.
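
To see where the constant factors come from, here is a minimal lookup sketch over the hypothetical node layout shown above (again an illustration, not the ternary_tree interface): every character comparison visits one node, so a successful find touches one node per key character plus one per left/right branch taken.

    #include <cstddef>
    #include <vector>

    // Returns true if the NUL-terminated 'key' is stored in the tree rooted at
    // pool[root]. Uses the illustrative tst_node layout from above; index 0
    // means "no child".
    bool tst_contains(const std::vector<tst_node>& pool, std::size_t root, const char* key)
    {
        std::size_t n = root;
        while (n != 0) {
            const tst_node& node = pool[n];
            const std::size_t c = static_cast<unsigned char>(*key);
            if (c < node.splitchar)
                n = node.lokid;        // branch left, stay on the same key position
            else if (c > node.splitchar)
                n = node.hikid;        // branch right, stay on the same key position
            else if (c == 0)
                return true;           // matched the end-of-key marker
            else {
                ++key;                 // consume one character
                n = node.eqkid;
            }
        }
        return false;
    }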

Rough speed estimates, compared to the STLport hash_map and map containers and to the C implementations (DDJ and libtst), are given in the performance tables below.

Since each character in a key occupies a separate node in the internal tree, iterating over values is a little slower than for other tree-based containers.


Rough performance measurements on a 150,000+ word English dictionary.

Lookup, insert and erase figures are normalized to the libtst successful-find operation on a sorted-insertion tree.
Iteration is normalized to hash_map::iterator.

Average string length was 8.3 chars. The TST trees used 3.12 nodes per string, so prefix sharing makes these strings use 0.37 nodes (just under 6 bytes) per char. All in all, 1242 k characters were stored in 7.3 MB (on 32-bit architectures).
The type of keys used affects performance, though less than the order of insertion. Some reference runs with 10-80 k words were made to verify the performance tendencies.
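As a rough consistency check of these figures (assuming 16-byte nodes, i.e. 32-bit size_t): 3.12 nodes per string / 8.3 chars per string ≈ 0.37 nodes per char, and 0.37 × 16 bytes ≈ 6 bytes per char; 1242 k characters at about 6 bytes each comes to roughly 7-7.5 MB, in line with the 7.3 MB figure above.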
Function          | find success          | find failed            | insert                 | erase       | iteration
DDJ               | sort: 0.96, rand: 2.9 | sort: 0.32, rand: 0.42 | sort: 2.75, rand: 6.4  | 2.74        | (N/A)
libtst            | sort: 1.0, rand: 2.6  | sort: 0.18, rand: 0.18 | sort: 1.6, rand: 3.4   | 0.5         | (N/A)
ternary_tree 0.67 | sort: 2.0, rand: 3.7  | sort: 0.8, rand: 0.77  | sort: 3.3, rand: 4.67  | 0.9         | sort: 1.2, rand: 2.1
stlport::hash_map | 2.1                   | 1.2                    | sort: 4.75, rand: 3.6  | O(n) always | 1.0; O(log n)
stlport::map      | sort: 4.5, rand: 8.3  | sort: 4.0, rand: 3.5   | sort: 7.3, rand: 9.8   | 0.84        | 2.1; O(1)

The figures marked "sort" and "rand" refer to trees built by sorted or random order insertion.

Relative performance of advanced searches
The advanced searches are implemented only for DDJ and our ternary_tree, and they perform at nearly the same speed.

Timings in the tables below are wall-clock milliseconds spent in 1000 different searches (equivalently, microseconds per search). The machine was a 2.4 GHz Intel with 512 MB RAM running Windows XP. For reference, single-key lookup in the tests above ran at about 0.3 microseconds/call for DDJ and 0.77 microseconds/call for ternary_tree.
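
To make concrete what a partial-match search does, the sketch below counts the keys matching a pattern in which '?' stands for any single character. It is an illustration over the hypothetical node layout from the Space considerations section, not the DDJ or ternary_tree interface. Each wildcard forces the traversal into all three subtrees instead of one, which is why the cost and the number of keys found grow quickly with the number of wildcards, as the tables show.

    #include <cstddef>
    #include <vector>

    // Counts keys matching 'pat', where '?' matches any single character.
    // Classic ternary-tree partial-match traversal (after Bentley & Sedgewick),
    // adapted to the illustrative index-based tst_node layout used earlier.
    std::size_t partial_match(const std::vector<tst_node>& pool, std::size_t n, const char* pat)
    {
        if (n == 0)
            return 0;
        const tst_node& node = pool[n];
        const std::size_t c = static_cast<unsigned char>(*pat);
        const bool wild = (c == '?');
        std::size_t hits = 0;
        if (wild || c < node.splitchar)
            hits += partial_match(pool, node.lokid, pat);       // pattern char may lie to the left
        if (c == 0 && node.splitchar == 0)
            ++hits;                                              // pattern and key end together
        if ((wild || c == node.splitchar) && c != 0 && node.splitchar != 0)
            hits += partial_match(pool, node.eqkid, pat + 1);    // consume one pattern character
        if (wild || c > node.splitchar)
            hits += partial_match(pool, node.hikid, pat);        // pattern char may lie to the right
        return hits;
    }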

Function            | partial match, 1 wildcard | partial match, 3 wildcards | partial match, 5 wildcards
keys found per call | 1.3                       | 12.4                       | 106.3
DDJ                 | sort: 8.1, rand: 9.7      | sort: 100, rand: 60.4      | sort: 526, rand: 274
ternary_tree 0.67   | sort: 6.3, rand: 6.3      | sort: 96, rand: 63.5       | sort: 523, rand: 244

Function            | hamming, dist=1         | hamming, dist=3              | hamming, dist=5
keys found per call | 3.5                     | 181.4                        | 4207.7
DDJ                 | sort: 68.0, rand: 44.8  | sort: 3320.1, rand: 1815.0   | sort: 17908.4, rand: 5937.5
ternary_tree 0.67   | sort: 57.64, rand: 40.2 | sort: 2957.67, rand: 1655.91 | sort: 16512.5, rand: 5225.15

Function            | levenshtein, dist=1        | levenshtein, dist=3         | levenshtein, dist=5
keys found per call | 4.01                       | 316.0                       | 8294.3
ternary_tree 0.675  | sort: 172.93, rand: 158.56 | sort: 12088.2, rand: 8427.8 | sort: 81116.9, rand: 66182.6

Function            | combinatorial, 0 wildcards | combinatorial, 1 wildcard    | combinatorial, 2 wildcards
keys found per call | 230.2                      | 2310.2                       | 10093.7
ternary_tree 0.67   | sort: 552.34, rand: 417.97 | sort: 4189.05, rand: 2880.18 | sort: 13585.6, rand: 8629.8

ternary_tree usually performs a few percent faster than the DDJ implementation; as explained above, one could just as well say they are equally fast. This by itself is good news, since our insertion and single-key lookup are quite a bit slower than in the C implementations.


Measurements on 50,000 US library code strings (like "WAFR______5054____33")

Size of the TST trees was 10.2 nodes per string, using 7.1 MB to store 955 k characters (on 32-bit architectures).
With longer strings (average 21 chars instead of the 6-8 typical of English words), TST failed lookup is no longer faster than successful search. Here the TSTs lose against hash_map, being half as fast for failed search and marginally slower for successful find. Iterators are significantly more expensive, since more nodes must be traversed to step between end nodes. If advanced searches or sorted order are needed, this is of course a small price to pay.

Again, everything except iteration is normalized to the libtst successful-find operation on a sorted-insertion tree.
Iteration is normalized to the hash_map iterator.

Function          | find success         | find failed            | insert                    | erase                  | iteration
DDJ               | sort: 1.1, rand: 1.2 | sort: 1.25, rand: 1.07 | sort: 1.85, rand: 2.1     | (N/A)                  | (N/A)
libtst            | sort: 1.0, rand: 1.0 | sort: 1.0, rand: 1.0   | sort: 1.5, rand: 1.8      | sort: 1.1              | (N/A)
Boost.Spirit      | sort: 1.4, rand: 1.6 | sort: 1.6, rand: 1.4   | sort: 16.2, rand: 17-110+ | sort: 5.25, rand: 88.5 | (N/A)
ternary_tree 0.65 | sort: 1.7, rand: 1.8 | sort: 1.0, rand: 1.5   | sort: 3.0, rand: 3.3      | sort: 0.8, rand: 0.9   | sort: 2.8, rand: 3.3
stlport::hash_map | 0.9                  | 0.5                    | sort: 1.9, rand: 1.5      | O(n)                   | 1.0
stlport::map      | sort: 3.1, rand: 2.7 | sort: 3.0, rand: 2.6   | sort: 4.4, rand: 3.8      | 0.35                   | sort: 1.4, rand: 1.4

