Cachemere
Modular Caching Library for C++
|
Space-efficient probabilistic data structure to estimate the number of times an item was inserted in a set. More...
#include <counting_bloom_filter.h>
Public Member Functions | |
CountingBloomFilter (uint32_t cardinality) | |
Constructor. More... | |
template<typename ItemKey > | |
void | add (const ItemKey &item) |
Increment the count for a given item by one. More... | |
void | clear () |
Clear the filter while keeping the allocated memory. | |
void | decay () |
Divide counter values by two. More... | |
template<typename ItemKey > | |
uint32_t | estimate (const ItemKey &item) const |
Get the counter estimate for a given item. More... | |
uint32_t | cardinality () const noexcept |
Get the cardinality of the filter. More... | |
size_t | memory_used () const noexcept |
Get an estimate of the memory consumption of the filter. More... | |
double | saturation () const noexcept |
Get the saturation of the filter. More... | |
Space-efficient probabilistic data structure to estimate the number of times an item was inserted in a set.
A counting bloom filter is a constant-sized data structure, which means that insertions will never make the filter allocate more memory. However, too many inserts will severely impact the accuracy of counter estimates.
cachemere::policy::detail::CountingBloomFilter< ItemHash >::CountingBloomFilter | ( | uint32_t | cardinality | ) |
Constructor.
To use this data structure at its full potential, it's very important to have a good estimate for the cardinality of the set to be inserted.
cardinality | The expected cardinality of the set to be inserted in the filter. |
void cachemere::policy::detail::CountingBloomFilter< ItemHash >::add | ( | const ItemKey & | item | ) |
Increment the count for a given item by one.
item | The item of the counter to increment. |
|
noexcept |
Get the cardinality of the filter.
void cachemere::policy::detail::CountingBloomFilter< ItemHash >::decay |
Divide counter values by two.
If the user of this filter doesn't rely on the absolute value of the counters, calling this regularly will decrease the saturation of the filter by removing items that are very rarely seen.
uint32_t cachemere::policy::detail::CountingBloomFilter< ItemHash >::estimate | ( | const ItemKey & | item | ) | const |
Get the counter estimate for a given item.
item | The item for which to estimate the count. |
Similarly to the way a bloom filter can return false positives, but not false negatives, the counter estimation produced by a counting bloom filter is actually an upper bound of the real counter value - the real count is guaranteed to be less or equal to the estimate returned.
|
noexcept |
Get an estimate of the memory consumption of the filter.
|
noexcept |
Get the saturation of the filter.
As filter saturations increases, so will the probability of false positives. A filter saturation of 1.0 means that all underlying counters are non-zero, so every call to estimate
will return a counter value greater than 1. Increasing the filter cardinality will slow down the saturation of the filter, at the cost of using more memory.