Microsecond Latency Rules Engine with RoaringBitmap
source link: https://richardstartin.github.io/posts/fast-and-simple-rules-engine-with-roaringbitmap
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Microsecond Latency Rules Engine with RoaringBitmap
Apr 8, 2017 | roaringpattern-matching
Implementing a rules engine can shorten development time and remove a lot of tedious if statements from your business logic. Unfortunately they are almost always slow and often bloated. Simple rules engines can be implemented by assigning integer salience to each line in a truth table, with rule resolution treated as an iterative intersection of ordered sets of integers. Implemented in terms of sorted sets, it would be remiss not to consider RoaringBitmap for the engine’s core. The code is at github.
Classification Table and Syntax
This rules engine builds on the simple idea of a truth table usually used to teach predicate logic and computer hardware. Starting with a table and some attributes, interpreting one attribute as a classification, we get a list of rules. It is trivial to load such a table from a database. Since classifications can overlap, we prioritise by putting the rules we care about most - or the most salient rules - at the top of the table. When multiple rules match a fact, we take the last in the set ordered by salience. So we don’t always have to specify all of the attributes to get a classification, we can rank attributes by their importance left to right, where it’s required that all attributes to the left of a specified attribute are also specified when matching a fact against a rule set.
It’s possible to define rules containing wildcards. Wildcard rules will match any query (warning: if these are marked as high salience they will hide more specific rules with lower salience). It’s also possible to specify a prefix with a wildcard, which will match any query that matches at least the prefix.
Below is an example table consisting of rules for classification of regional English accents by phonetic feature.
thought cloth lot palm plant bath trap accent /ɔ/ /ɒ/ /ɑ/ /ɑː/ /ɑː/ /ɑː/ /æ/ Received Pronunciation (UK) /ɔ/ /ɔ/ /ɑ/ /ɑ/ /æ/ /æ/ /æ/ Georgian (US) /ɑ/ /ɑ/ /ɑ/ /ɑ/ /æ/ /æ/ /æ/ Canadian * * /ɑ/ /ɑ/ /æ/ /æ/ /æ/ North American * * * * * * /æ/ Non Native * * * * * * * French
In the example above, the vowel sounds used in words differentiating speakers of several English accents are configured as a classification table. The accent column is the classification of any speaker exhibiting the properties specified in the six leftmost columns. UK Received Pronunciation is the most specific rule and has high salience, whereas various North American accents differ from RP in their use of short A vowels. A catch all for North American accents would wild card the sounds in thought and caught (contrast Boston pronunciations with Texas). So long as trap has been pronounced with a short A (which all English speakers do), and no other rule would recognise the sounds used in the first six words, the rule engine would conclude the speaker is using English as a second language. If not even the word trap is recognisable, then the speaker is probably unintelligible, or could be French.
Implementation
A rule with a given salience can be represented by creating a bitmap index on salience by the attribute values of the rules. For instance, to store the rule {foo, bar} -> 42
, with salience 10, create a bitmap index on the first attribute of the rule, and set the 10th bit of the “foo” bitmap; likewise for the “bar” bitmap of the second index. Finding rules which match both attributes is a bitwise intersection, and since we rank by salience, the rule that wins is the first in the set. An obvious choice for fast ordered sets is RoaringBitmap.
RoaringBitmap consists of containers, which are fast, cache-friendly sorted sets of integers, and can contain up to 2^16 shorts. In RoaringBitmap, containers are indexed by keys consisting of the most significant 16 bits of the integer. For a rules engine, if you have more than 2^16 rules you have a much bigger problem anyway, so a container could index all the rules you could ever need, so RoaringBitmap itself would be overkill. While RoaringBitmap indexes containers by shorts (it does so for the sake of compression), we can implement wildcard and prefix matching by associating containers with Strings rather than shorts. As the core data structure of the rules engine, a RoaringBitmap container is placed at each node of an Apache commons PatriciaTrie
. It’s really that simple - see the source at github.
When the rules engine is queried, a set consisting of all the rules that match is intersected with the container found at the node in the trie matching the value specified for each attribute. When more than one rule matches, the rule with the highest salience is accessed via the Container.first() method, one of the features I have contributed to RoaringBitmap. See example usage at github.
Recommend
-
7
Here's How We Built a HIPAA-Compliant Rules Engine for a Medtech StartupDecember 14th 2020 new story106
-
7
A Quick Look at RoaringBitmap Mar 1, 2017 | roaring This article is an introduction to the data structures found in the
-
15
Copy link Urhengulas
-
3
最近在某些機緣下認識了規則引擎,以華文介紹規則引擎的文章好像不多,這邊試著粗淺的記下我對規則引擎的認識。 什麼是規則引擎? 在程式的世界,規則引擎通常以套件的方式存在,在系統內藉由規引擎套件的引入,可以利用規則引擎提供...
-
3
The Demikernel Datapath OS Architecture for Microsecond-scale Datacenter Systems Published November 09, 2021 Found something wrong?
-
5
RoaringBitmap插件能将ElasticSearch过滤性能提高 10 倍 Java中更好地压缩位图、位集。通常用作快速数据结构,如果没有压缩它们可能会使用太多内存。RoaringBitmap性能往往优于传统的压缩位图,例如 WAH、EWAH 或 Concise。特点: 非常快的...
-
7
Detection Rules Detection Rules is the home for rules used by Elastic Security. This repository is used for the development, maintenance, testing, validation, and release of rules for Elastic Security’s Detection Engine. This...
-
13
bitmap的表象意义是,使用一个01标识位来表示是否的状态,可以达到节省空间和高效判定的效果。在我们的实际工作中,也有着许多的应用场景,相信了解bitmap定会给你带来一些额外的收获。 1. bitmap使用场景说明
-
4
RoaringBitmap 的原理及在 Go 中如何使用 webff · 大约4小时之前 · 43 次点...
-
2
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK