Average string length was 8.3 chars. The TST trees used 3.12 nodes per string,
so prefix sharing makes these strings use 0.37 nodes (just under 6 bytes) per char.
All in all 1242 k characters were stored in 7.3 Mb (on 32-bit architectures).
The type of keys used affects performance, though less than insertion sort order.
Some reference runs with 10-80k words were made, to verify performance tendencies.
Function | find success | find failed | insert | erase | iteration |
---|---|---|---|---|---|
DDJ | sort: 0.96 rand: 2.9 |
sort: 0.32 rand: 0.42 |
sort: 2.75 rand: 6.4 |
2.74 | (N/A) |
libtst | sort: 1.0 rand: 2.6 |
sort: 0.18 rand: 0.18 |
sort: 1.6 rand: 3.4 |
0.5 | (N/A) |
ternary_tree 0.67 | sort: 2.0 rand: 3.7 |
sort: 0.8 rand: 0.77 |
sort: 3.3 rand: 4.67 |
0.9 | sort: 1.2 rand: 2.1 |
stlport::hash_map | 2.1 | 1.2 | sort: 4.75 rand: 3.6 |
O(n) always | 1.0 O(log n) |
stlport::map | sort: 4.5 rand: 8.3 |
sort: 4.0 rand: 3.5 |
sort: 7.3 rand: 9.8 |
0.84 | 2.1 O(1) |
The figures marked "sort" and "rand" refer to trees built by sorted or random order insertion.
Relative performance of advanced searches
The advanced searches are only implemented for DDJ and our ternary_tree, and perform
at nearly the same speed.
Timings in the table below are clock-time millisecs spent in 1000 different searches (or, can equivalently be read as microsecs per search). The machine was a 2.4 GHz Intel with 512 Mb RAM running Windows XP. For reference, single key lookup in tests above ran at about 0.3 microsec/call for DDJ, 0.77 microsec/call for ternary_tree.
Function | partial match 1 wildcard |
partial match 3 wc |
partial match 5 wc |
---|---|---|---|
keys found per call | 1.3 | 12.4 | 106.3 |
DDJ | sort: 8.1 rand: 9.7 |
sort: 100 rand: 60.4 |
sort: 526 rand: 274 |
ternary_tree 0.67 | sort: 6.3 rand: 6.3 |
sort: 96 rand: 63.5 |
sort: 523 rand: 244 |
Function | hamming dist=1 |
hamming dist=3 |
hamming dist=5 |
keys found per call | 3.5 | 181.4 | 4207.7 |
DDJ | sort: 68.0 rand: 44.8 |
sort: 3320.1 rand: 1815.0 |
sort: 17908.4 rand: 5937.5 |
ternary_tree 0.67 | sort: 57.64 rand: 40.2 |
sort: 2957.67 rand: 1655.91 |
sort: 16512.5 rand: 5225.15 |
Function | levenshtein dist=1 |
levenshtein dist=3 |
levenshtein dist=5 |
keys found per call | 4.01 | 316.0 | 8294.3 |
ternary_tree 0.675 | sort: 172.93 rand: 158.56 |
sort: 12088.2 rand: 8427.8 |
sort: 81116.9 rand: 66182.6 |
Function | combinatorial 0 wildcards |
combinatorial 1 wc |
combinatorial 2 wc |
keys found per call | 230.2 | 2310.2 | 10093.7 |
ternary_tree 0.67 | sort: 552.34 rand: 417.97 |
sort: 4189.05 rand: 2880.18 |
sort: 13585.6 rand: 8629.8 |
ternary_tree usually performs a few percent faster than DDJ implementation. As explained above, one could as well say they are equally fast. This by itself is good news though, since our insertion and single-key lookup is quite a bit slower than C implementations.
Function | find success | find failed | insert | erase | Iteration |
---|---|---|---|---|---|
DDJ | sort: 1.1 rand: 1.2 |
sort: 1.25 rand: 1.07 |
sort: 1.85 rand: 2.1 |
(N/A) | (N/A) |
libtst | sort: 1.0 rand: 1.0 |
sort: 1.0 rand: 1.0 |
sort: 1.5 rand: 1.8 |
sort: 1.1 | (N/A) |
Boost.Spirit | sort: 1.4 rand: 1.6 |
sort: 1.6 rand: 1.4 |
sort: 16.2 rand: 17-110+ |
sort: 5.25 rand: 88.5 |
(N/A) |
ternary_tree 0.65 | sort: 1.7 rand: 1.8 |
sort: 1.0 rand: 1.5 |
sort: 3.0 rand: 3.3 |
sort: 0.8 rand: 0.9 |
sort: 2.8 rand: 3.3 |
stlport::hash_map | 0.9 | 0.5 | sort: 1.9 rand: 1.5 |
O(n) | 1.0 |
stlport::map | sort: 3.1 rand: 2.7 |
sort: 3.0 rand: 2.6 |
sort: 4.4 rand: 3.8 |
0.35 | sort: 1.4 rand: 1.4 |