Overview Usage Performance Notes Links | tst 0.68 - 23 Jan 2009 |
A ternary tree is a Structured Associative Container. This means that its key type is required to be a Forward Container, and that the tree uses a comparator to establish a strict weak ordering among key::value_type
elements (rather than on whole keys). This allows searches involving parts of keys, ie with shared prefix or with shared middle parts.
ternary_tree is a Unique Associative Container. Note that it is not Pair Associative, so the value that is returned when dereferencing an iterator is not a std::pair<const Key, Data>
as for map or unsorted_map. Instead the Data value is returned. To get the key that a ternary_tree iterator is positioned at, iterator specifies member function key() - see tst_detail::iter_method_forward.
ternary_tree is a Sorted, and thus by implication a Reversible Container. While iteration is amortized constant time, it is often slower than other Associative containers since it must walk several key element nodes to find next key endnode. The iteration cost factor is proportional to node_count() / total_key_length().
ternary_tree generally conforms to standard Associative Container exception guarantees:
TST_NODE_COUNT_TYPE:
controls the count of nodes. On systems where std::size_t may be a 64-bit value, this can be set to a 32-bit unsigned type, if you know that the total count of characters inserted in the tree will not exceed its limits. If the macro is undefined, it defaults to std::size_t
. Note that the macro is undefined internally in the tree, so no code except the node typedef uses it.TST_USE_FASTMAP
is defined, a direct link to first-level symbol nodes is maintained internally. This provides no measurable improvement with 8-bit charsets, but the interested user can try it with 16-bit chars (and please report if you can measure it)See further Implementation Details
It should be noted that insertion order affects tree performance: After sorted (lexicographical) insertion, lookup is sometimes 50-100% slower, while iterators are (always) 50-90% faster than with random-order string insertion. There is currently no support for rebalancing the tree.
Public Types | |
enum | { wildcard_char = char_type('?') } |
Default wildcard for partial_match_search. More... | |
typedef StringT | key_type |
typedef StringT::value_type | char_type |
typedef CompT | char_compare |
typedef DataT | value_type |
typedef DataT | mapped_type |
typedef AllocT | allocator_type |
typedef AllocT::pointer | pointer |
typedef AllocT::const_pointer | const_pointer |
typedef AllocT::reference | reference |
typedef AllocT::const_reference | const_reference |
typedef AllocT::difference_type | difference_type |
typedef AllocT::size_type | size_type |
typedef node::node_index | node_index |
Dependent type, defined by the macro TST_NODE_COUNT_TYPE (default: size_type). | |
typedef std::vector< node, typename allocator_type::template rebind< node >::other > | NodeList |
Impl note: nodes are stored in vector, effectively working as pool allocator. node_index vector offsets are used instead of pointers. | |
typedef tst_detail::tst_iterator_base < NodeList, string_type > | tst_iterator_base |
typedef tst_detail::tst_iterator_base<StringT, DataT, CompT, AllocT> tst_iterator_base; | |
typedef TST_ITERATOR_FACADE_BEGIN iterators::iterator_wrapper < tst_iterator_base, iterators::nonconst_traits < value_type > > TST_ITERATOR_FACADE_END | iterator |
typedef TST_ITERATOR_FACADE_BEGIN iterators::iterator_wrapper < tst_iterator_base, iterators::const_traits < value_type > > TST_ITERATOR_FACADE_END | const_iterator |
typedef std::reverse_iterator < iterator > | reverse_iterator |
typedef std::reverse_iterator < const_iterator > | const_reverse_iterator |
typedef search_results_list < ternary_tree, iterator > | search_results_list |
Public Member Functions | |
Construct, copy, destroy | |
ternary_tree (const char_compare &comp=char_compare(), const allocator_type &alloc=allocator_type()) | |
template<class InputIterator> | |
ternary_tree (InputIterator first, InputIterator last, const char_compare &comp=char_compare(), const allocator_type &alloc=allocator_type()) | |
ternary_tree (const ternary_tree &other) | |
~ternary_tree () | |
ternary_tree & | operator= (const ternary_tree &other) |
allocator_type | get_allocator () const |
Returns a copy of the allocator used to construct the tree. | |
Iterators | |
iterator | begin () |
const_iterator | begin () const |
iterator | end () |
const_iterator | end () const |
reverse_iterator | rbegin () |
const_reverse_iterator | rbegin () const |
reverse_iterator | rend () |
const_reverse_iterator | rend () const |
const_iterator | cbegin () const |
const_iterator | cend () const |
const_reverse_iterator | crbegin () const |
const_reverse_iterator | crend () const |
Capacity | |
bool | empty () const |
size_type | size () const |
size_type | max_size () const |
Element access | |
mapped_type & | operator[] (const key_type &key) |
Inserts a key into the tree and returns a mutable reference to its mapped_value. | |
Modifiers | |
std::pair< iterator, bool > | insert (const std::pair< key_type, value_type > &val) |
Returns a pair whose bool component returns true if an insertion was made and false if the tree already contained an an equivalent key. | |
template<class InputIterator> | |
void | insert (InputIterator first, InputIterator last) |
Insert a range from another ternary_tree (or any iterator to pair with same key_type, mapped_type). | |
void | insert (const_iterator first, const_iterator last) |
iterator | insert (iterator where, const std::pair< key_type, value_type > &val) |
Associative Container method, we do not use hint, so pointless. | |
size_type | erase (const key_type &key) |
Makes key unreachable, but usually does not reclaim nodes to memory pool. | |
iterator | erase (iterator it) |
Makes key unreachable, but usually does not reclaim nodes to memory pool. | |
iterator | erase (iterator first, iterator last) |
Erases a range of values - makes keys unreachable, but usually does not reclaim nodes to memory pool (currently just removes each in a loop, could try better wholesale node removal). | |
void | swap (ternary_tree &other) |
void | clear () |
Sorted Associative Container methods. | |
const_iterator | find (const key_type &key) const |
Returns const_iterator to key, or end() if not found. | |
iterator | find (const key_type &key) |
Returns mutable iterator to key, or end() if not found. | |
iterator | lower_bound (const key_type &key) |
Returns an iterator to the first element in a map with a key value that is equal to or greater than that of the specified key. | |
iterator | upper_bound (const key_type &key) |
Returns an iterator to the first element in a map with a key value that is greater than the specified key. | |
const_iterator | lower_bound (const key_type &key) const |
const_iterator | upper_bound (const key_type &key) const |
std::pair< iterator, iterator > | equal_range (const key_type &key) |
Returns a pair of iterators respectively to the first element in a map with a key that is equal to or greater than the specified key, and a value that is greater than key. | |
std::pair< const_iterator, const_iterator > | equal_range (const key_type &key) const |
size_type | count (const key_type &key) const |
Returns count of key values in tree - returns 0 or 1. | |
Structured Container search methods [deprecated] | |
These searches are possible by storing key elements in strict weak ordering, anchored at beginning of keys. This allows sub-key searches, ie searching for keys with matching subparts (ie first and last letter of a four-letter string must match, or keys differing from search string in at most n letters).
Complexity of sub-key searches are proportional to the "breadth" of the search. Specifying that more than about 10% of key elements may differ can lead to the search operation traversing large parts of the tree. The front anchoring of key elements also have the consequence that prefix matches are same cost as find operation (log in tree size), while suffix searches are linear in tree size. | |
std::pair< iterator, iterator > | prefix_range (const key_type &prefix) |
Returns a pair of iterators, the first points to the first key that begins with the prefix (or end()), the second to the string following the last string that begins with prefix. | |
std::pair< const_iterator, const_iterator > | prefix_range (const key_type &prefix) const |
template<class CharIter> | |
iterator | longest_match (CharIter &first, CharIter last) |
Returns the longest key that matches the beginning of the key element range (or end(), if none is found). | |
template<class CharIter> | |
const_iterator | longest_match (CharIter &first, CharIter last) const |
search_results_list | create_search_results () const |
Factory method for use with wrapper classes. | |
template<class OutputIter> | |
OutputIter | partial_match_search (const key_type &key, OutputIter results, char_type wildcard=wildcard_char) const |
Partial match search - supports wildcard characters (default: '?') Proxies for iterators to the found strings are pushed at the output iterator. | |
template<class OutputIter> | |
OutputIter | hamming_search (const key_type &key, OutputIter results, size_type dist) const |
Hamming search: Finds all keys differing from key in at most dist characters (as an extension to strict hamming search, this includes matching but shorter/longer keys). | |
template<class OutputIter> | |
OutputIter | levenshtein_search (const key_type &key, OutputIter results, size_type dist) const |
Levenshtein search: Finds all strings differing from key in at most dist characters. | |
template<class OutputIter> | |
OutputIter | combinatorial_search (const key_type &key, OutputIter results, size_type wildcards=0) const |
Combinatorial, or "scrabble" search: Finds all keys containing the characters in the search string. | |
Observers | |
ternary_tree defines a key_compare class with same semantics as Sorted Associative Containers.
Note that there is no value_compare type - the value_type does not contain the key. | |
char_compare | char_comp () const |
Returns a copy of the key element comparator object. | |
key_compare | key_comp () const |
Returns a copy of the key_compare class. | |
size_type | node_count () const |
Returns number of nodes in tree. | |
size_type | item_count () const |
Returns number of string-value pairs (alias for size()). | |
size_type | total_key_length () const |
Returns sum of lengths of all keys inserted into tree. | |
size_type | longest_key () const |
Returns length of longest key ever inserted into tree (if key was erased, this value is not updated). | |
Container comparison. | |
bool | operator== (const ternary_tree &other) const |
bool | operator< (const ternary_tree &other) const |
Static Public Member Functions | |
static std::ostream & | print_node (std::ostream &ostr, const node &n) |
Friends | |
class | tst_detail::tst_iterator_base< NodeList, string_type > |
friend class tst_detail::tst_iterator_base<StringT, DataT, CompT, AllocT>; | |
struct | tst_detail::iter_method_forward< ternary_tree > |
Needed to construct iterators for (eg) structured_multimap. | |
class | search_results_list |
Related Functions | |
(Note that these are not member functions.) | |
template<class S, class D, class C, class A> | |
bool | operator== (ternary_tree< S, D, C, A > const &x, ternary_tree< S, D, C, A > const &y) |
template<class S, class D, class C, class A> | |
bool | operator!= (ternary_tree< S, D, C, A > const &x, ternary_tree< S, D, C, A > const &y) |
template<class S, class D, class C, class A> | |
bool | operator< (ternary_tree< S, D, C, A > const &x, ternary_tree< S, D, C, A > const &y) |
template<class S, class D, class C, class A> | |
bool | operator> (ternary_tree< S, D, C, A > const &x, ternary_tree< S, D, C, A > const &y) |
template<class S, class D, class C, class A> | |
bool | operator<= (ternary_tree< S, D, C, A > const &x, ternary_tree< S, D, C, A > const &y) |
template<class S, class D, class C, class A> | |
bool | operator>= (ternary_tree< S, D, C, A > const &x, ternary_tree< S, D, C, A > const &y) |
template<class S, class D, class C, class A> | |
std::ostream & | operator<< (std::ostream &ostr, typename ternary_tree< S, D, C, A >::node const &node) |
Classes | |
struct | find_result |
class | key_compare |
Key comparator, defined in terms of a lexicographical_compare using the char_compare object. More... |
typedef StringT key_type |
typedef StringT::value_type char_type |
typedef CompT char_compare |
typedef DataT value_type |
typedef DataT mapped_type |
typedef AllocT allocator_type |
typedef AllocT::pointer pointer |
typedef AllocT::const_pointer const_pointer |
typedef AllocT::reference reference |
typedef AllocT::const_reference const_reference |
typedef AllocT::difference_type difference_type |
typedef AllocT::size_type size_type |
typedef node::node_index node_index |
Dependent type, defined by the macro TST_NODE_COUNT_TYPE
(default: size_type).
typedef std::vector<node, typename allocator_type::template rebind<node>::other> NodeList |
Impl note: nodes are stored in vector, effectively working as pool allocator. node_index vector offsets are used instead of pointers.
typedef tst_detail::tst_iterator_base<NodeList, string_type> tst_iterator_base |
typedef tst_detail::tst_iterator_base<StringT, DataT, CompT, AllocT> tst_iterator_base;
typedef TST_ITERATOR_FACADE_BEGIN iterators::iterator_wrapper< tst_iterator_base , iterators::nonconst_traits<value_type> > TST_ITERATOR_FACADE_END iterator |
typedef TST_ITERATOR_FACADE_BEGIN iterators::iterator_wrapper< tst_iterator_base , iterators::const_traits<value_type> > TST_ITERATOR_FACADE_END const_iterator |
typedef std::reverse_iterator<iterator> reverse_iterator |
typedef std::reverse_iterator<const_iterator> const_reverse_iterator |
ternary_tree | ( | const char_compare & | comp = char_compare() , |
|
const allocator_type & | alloc = allocator_type() | |||
) | [inline, explicit] |
ternary_tree | ( | InputIterator | first, | |
InputIterator | last, | |||
const char_compare & | comp = char_compare() , |
|||
const allocator_type & | alloc = allocator_type() | |||
) | [inline] |
ternary_tree | ( | const ternary_tree< StringT, DataT, CompT, AllocT > & | other | ) | [inline] |
~ternary_tree | ( | ) | [inline] |
# node::deallocate(m_nodes.begin(), m_nodes.end(), m_allocator);
ternary_tree& operator= | ( | const ternary_tree< StringT, DataT, CompT, AllocT > & | other | ) | [inline] |
allocator_type get_allocator | ( | ) | const [inline] |
Returns a copy of the allocator used to construct the tree.
iterator begin | ( | ) | [inline] |
const_iterator begin | ( | ) | const [inline] |
iterator end | ( | ) | [inline] |
const_iterator end | ( | ) | const [inline] |
reverse_iterator rbegin | ( | ) | [inline] |
const_reverse_iterator rbegin | ( | ) | const [inline] |
reverse_iterator rend | ( | ) | [inline] |
const_reverse_iterator rend | ( | ) | const [inline] |
const_iterator cbegin | ( | ) | const [inline] |
const_iterator cend | ( | ) | const [inline] |
const_reverse_iterator crbegin | ( | ) | const [inline] |
const_reverse_iterator crend | ( | ) | const [inline] |
bool empty | ( | ) | const [inline] |
size_type size | ( | ) | const [inline] |
size_type max_size | ( | ) | const [inline] |
mapped_type& operator[] | ( | const key_type & | key | ) | [inline] |
Inserts a key into the tree and returns a mutable reference to its mapped_value.
std::pair<iterator, bool> insert | ( | const std::pair< key_type, value_type > & | val | ) | [inline] |
Returns a pair whose bool component returns true if an insertion was made and false if the tree already contained an an equivalent key.
void insert | ( | InputIterator | first, | |
InputIterator | last | |||
) | [inline] |
Insert a range from another ternary_tree (or any iterator to pair with same key_type, mapped_type).
void insert | ( | const_iterator | first, | |
const_iterator | last | |||
) | [inline] |
iterator insert | ( | iterator | where, | |
const std::pair< key_type, value_type > & | val | |||
) | [inline] |
Associative Container method, we do not use hint, so pointless.
Makes key unreachable, but usually does not reclaim nodes to memory pool.
Postcondition: find(key) returns end(). Does not invalidate any iterators except those pointing to an erased element. If key exists, item_count() is decremented; node_count() usually does not change. If size() == 1, clear() is called to wipe the slate clean.
Makes key unreachable, but usually does not reclaim nodes to memory pool.
Postcondition: find(key) returns end(). Does not invalidate any iterators except those pointing to an erased element. If key exists, item_count() is decremented; node_count() usually does not change. If size() == 1, clear() is called to wipe the slate clean.
Erases a range of values - makes keys unreachable, but usually does not reclaim nodes to memory pool (currently just removes each in a loop, could try better wholesale node removal).
Returns iterator as per C++0x
void swap | ( | ternary_tree< StringT, DataT, CompT, AllocT > & | other | ) | [inline] |
void clear | ( | ) | [inline] |
const_iterator find | ( | const key_type & | key | ) | const [inline] |
Returns const_iterator to key, or end() if not found.
Returns mutable iterator to key, or end() if not found.
Returns an iterator to the first element in a map with a key value that is equal to or greater than that of the specified key.
If key exists, it is returned, else returns the element that has key for a prefix (or end() if no such element exists).
Complexity is that of a find() operation.
Returns an iterator to the first element in a map with a key value that is greater than the specified key.
If key exists, this returns ++key, else it returns the element that has key for a prefix (or end() if no such element exists).
Complexity is that of a find() operation + one iterator increment.
const_iterator lower_bound | ( | const key_type & | key | ) | const [inline] |
const_iterator upper_bound | ( | const key_type & | key | ) | const [inline] |
Returns a pair of iterators respectively to the first element in a map with a key that is equal to or greater than the specified key, and a value that is greater than key.
The first is the lower_bound of the key, and the second is the upper_bound of the key.
Complexity is that of a find() operation + one iterator increment.
std::pair<const_iterator, const_iterator> equal_range | ( | const key_type & | key | ) | const [inline] |
Returns count of key values in tree - returns 0 or 1.
Returns a pair of iterators, the first points to the first key that begins with the prefix (or end()), the second to the string following the last string that begins with prefix.
The first iterator is the lower_bound of the key, while the second is the first key that does not share the prefix. In a container with values { "aa", "aaa" "ab" }, equal_range("aa") returns { "aa", "aaa" }, while prefix_match returns { "aa", "ab" }. Worst-case complexity is that of two find(prefix) operations.
std::pair<const_iterator, const_iterator> prefix_range | ( | const key_type & | prefix | ) | const [inline] |
iterator longest_match | ( | CharIter & | first, | |
CharIter | last | |||
) | [inline] |
Returns the longest key that matches the beginning of the key element range (or end(), if none is found).
The key element iterator first is advanced to the element following the last matched character.
(This method allows using the TST dictionary as a simple tokenizer over first, last.)
Complexity is that of a single find operation.
const_iterator longest_match | ( | CharIter & | first, | |
CharIter | last | |||
) | const [inline] |
ternary_tree< S, D, C, A >::search_results_list create_search_results | ( | ) | const [inline] |
Factory method for use with wrapper classes.
OutputIter partial_match_search | ( | const key_type & | key, | |
OutputIter | results, | |||
char_type | wildcard = wildcard_char | |||
) | const [inline] |
Partial match search - supports wildcard characters (default: '?') Proxies for iterators to the found strings are pushed at the output iterator.
A wildcard in search string entails visiting all subtrees of matched node(s). The worst-case complexity of partial match search is therefore O(m * k) * regular-find, where m is size of alphabet, k is number of wildcard characters in search string, and "regular-find" is lookup cost of the rest of the search string. In practice, only a fraction of the alphabet is present except near root level of the tree, but concrete performance is unpredictable. Typical timings are in order of 10 times find(key) per item returned. See article available at Jon Bentley and Robert Sedgewick site .
OutputIter hamming_search | ( | const key_type & | key, | |
OutputIter | results, | |||
size_type | dist | |||
) | const [inline] |
Hamming search: Finds all keys differing from key
in at most dist
characters (as an extension to strict hamming search, this includes matching but shorter/longer keys).
Proxies for iterators to the found strings are pushed at the output iterator.
Searches on values of dist > 20% of average key length are typically expensive, since a large part of the tree is traversed (expensive may mean several microseconds per call).
OutputIter levenshtein_search | ( | const key_type & | key, | |
OutputIter | results, | |||
size_type | dist | |||
) | const [inline] |
Levenshtein search: Finds all strings differing from key
in at most dist
characters.
Proxies for iterators to the found strings are pushed at the output iterator.
Searches on values of dist > 20% of average key length are typically expensive, since a large part of the tree is traversed (expensive may mean several microseconds per call).
OutputIter combinatorial_search | ( | const key_type & | key, | |
OutputIter | results, | |||
size_type | wildcards = 0 | |||
) | const [inline] |
Combinatorial, or "scrabble" search: Finds all keys containing the characters in the search string.
The optional wildcard count allows mismatch characters (use with care, it increases complexity of the search dramatically).
char_compare char_comp | ( | ) | const [inline] |
Returns a copy of the key element comparator object.
key_compare key_comp | ( | ) | const [inline] |
Returns a copy of the key_compare class.
size_type node_count | ( | ) | const [inline] |
Returns number of nodes in tree.
size_type total_key_length | ( | ) | const [inline] |
Returns sum of lengths of all keys inserted into tree.
size_type longest_key | ( | ) | const [inline] |
Returns length of longest key ever inserted into tree (if key was erased, this value is not updated).
bool operator== | ( | const ternary_tree< StringT, DataT, CompT, AllocT > & | other | ) | const [inline] |
bool operator< | ( | const ternary_tree< StringT, DataT, CompT, AllocT > & | other | ) | const [inline] |
static std::ostream& print_node | ( | std::ostream & | ostr, | |
const node & | n | |||
) | [static] |
friend class tst_detail::tst_iterator_base< NodeList, string_type > [friend] |
friend class tst_detail::tst_iterator_base<StringT, DataT, CompT, AllocT>;
friend struct tst_detail::iter_method_forward< ternary_tree > [friend] |
Needed to construct iterators for (eg) structured_multimap.
friend class search_results_list [friend] |
bool operator== | ( | ternary_tree< S, D, C, A > const & | x, | |
ternary_tree< S, D, C, A > const & | y | |||
) | [related] |
bool operator!= | ( | ternary_tree< S, D, C, A > const & | x, | |
ternary_tree< S, D, C, A > const & | y | |||
) | [related] |
bool operator< | ( | ternary_tree< S, D, C, A > const & | x, | |
ternary_tree< S, D, C, A > const & | y | |||
) | [related] |
bool operator> | ( | ternary_tree< S, D, C, A > const & | x, | |
ternary_tree< S, D, C, A > const & | y | |||
) | [related] |
bool operator<= | ( | ternary_tree< S, D, C, A > const & | x, | |
ternary_tree< S, D, C, A > const & | y | |||
) | [related] |
bool operator>= | ( | ternary_tree< S, D, C, A > const & | x, | |
ternary_tree< S, D, C, A > const & | y | |||
) | [related] |
std::ostream & operator<< | ( | std::ostream & | ostr, | |
typename ternary_tree< S, D, C, A >::node const & | node | |||
) | [related] |
ternary_tree 0.68 -- by rasmus ekman -- Page generated with Doxygen 1.5.6 on 23 Jan 2009 |