6

TreeFlat: Building a (possible) faster tree for Rust, inspired by APL

 2 years ago
source link: https://www.elmalabarista.com/blog/2022-flat-tree/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

TreeFlat: Building a (possible) faster tree for Rust, inspired by APL

Motivation

I start paying attention to the parsing of my little lang TablaM with the desire of improving the display of errors. This leads me to a rabbit hole where re-architecting the parser I need a better way to store the AST in a Tree.

Of course, I check several crates for this, but then I remember a different way to do them!

The Arena is the new pointer on Rust

In Rust, building graphs/trees is challengin, because of the way it enforces the ownership of data. Luckily, it pushes towards a more simple & performant pattern that is HOT! HOT! HOT! on Rust:

The Arena design pattern.

Several crates exploit it for great effect, however differently to how I see is done elsewhere when applied to Trees, with the arenas-ids are nested with enums like:

struct Node<T> {
    parent: Option<NodeId>,
    prev_sibling: Option<NodeId>,
    next_sibling: Option<NodeId>,
    children: Option<(NodeId, NodeId)>,
    value: T,
}

This one is inspired by the talk “High-performance Tree Wrangling, the APL Way”, where the tree is made with vectors.

So, a Tree:

. Users
├── jhon_doe
├   ├── file1.rs
├   ├── file2.rs
├── jane_doe
└────── cat.jpg

... is stored in pre-order on 3 vectors, that store the data, the level/deep & the parent node:

DATA:Usersjhon_doefile1.rsfile2.rsjane_doecat.jpg LEVEl:012212 PARENT:001104

This means that all the searching/iterating/navigation is linear on the vectors.

Fix the data layout, and the algorithms are simpler

Similar to things like bumpalo, TreeFlat is not made to cover all the needs for a tree, is just a one-trick pony that is tailored for the use-case of a pre-order tree, and only know how to push nodes at the end.

With this, instead of "jump" chashing pointers/ids the default iteration is purely linear:

impl<'a, T: Debug> Iterator for TreeIter<'a, T> {
    type Item = Node<'a, T>;

    fn next(&mut self) -> Option<Self::Item> {
        let id = NodeId(self.pos);
        self.pos += 1;
        if self.pos <= self.tree.len() {
            Some(self.tree._make_node(id))
        } else {
            None
        }
    }
}

More interesting, navigating the children, parents can exploit this observation:

  • The children are at the right/up of the parent
  • The parents are at the left/down of the children
  • The siblings are all that share the same level

So this means that in the case of navigating the children of jhon_doe:

. Users				
		 ⇡ parents

├── jhon_doe	= Index: 1, Level: 1

		 ⇩ children start at 
			jhon_doe + 1,
			level 	 > jhon_doe

├   ├── file1.rs ?: Level 2 is child!
├   ├── file2.rs ?: Level 2 is child!
├── jane_doe	 ?: Level 1 is below, stop!

└────── cat.jpg

With this, instead of searching a potentially large array, it jumps directly after the node and iterates as long the nodes are above it!.

Performance?

For the first time, I tried to check if the performance claims of the APL way were real, and if I manage to capture it.

I mean: not only the first time on Rust doing this, first time since I have memory I apply a "proper" benchmarking tool more sophisticated than "print the elapsed time in a function". You see, I work mostly in business apps, where I don't need to worry about nanoseconds of difference. With Rust, is another level...

To measure, I use the excellent Criterion.

I also pick the excellent crate ego-tree to compare against... because it already stores the data flat in a vector, the API is almost similar; just the node building is like the "obvious" way. ego-tree is more mature and battle-tested, so that is an excellent candidate to test ambitions (you wanna test against the best!).

Doing it I also hit several bugs in my implementation, and also get my dream crushed when I see that ego-tree brutally defeat my code... and here is where criterion saves my day!.

Looking at the graphs I see something suspicious: All the ego-tree tests show a "flat" line in the execution graph, so yeah, my initial code for the benchmarking was VERY wrong.

I finally make sure to use the same code for both, and now I get more comparable results.

(All this is still some silly benchmarks made by an amateur, so check yourself before believing!).

For example, iterating by children show a nice improvement thanks to the combination of tricks:

Building a tree of 47.370 nodes, with 10 levels, like:

├── 379
├   ├── 380
├   ├   ├── 381
├   ├   ├── 382
├   ├   ├── 383
├   ├   ├── 384
├   ├   ├── 385
├   ├   ├── 386
├   ├   ├── 387
├   ├   ├── 388
├   ├   ├── 389
├   ├   ├   ├── 390
├   ├   ├   ├── 391
├   ├   ├   ├── 392
├   ├   ├   ├── 393
├   ├   ├   ├── 394
├   ├   ├   ├── 395
├   ├   ├   ├── 396
├   ├   ├   ├── 397
├   ├   ├   ├── 398
├   ├   ├   ├   ├── 399
├   ├   ├   ├   ├── 400
├   ├   ├   ├   ├── 401
├   ├   ├   ├   ├── 402
├   ├   ├   ├   ├── 403
├   ├   ├   ├   ├── 404
├   ├   ├   ├   ├── 405
├   ├   ├   ├   ├── 406
├   ├   ├   ├   ├── 407
├   ├   ├   ├   ├   ├── 408
├   ├   ├   ├   ├   ├── 409
├   ├   ├   ├   ├   ├── 410
├   ├   ├   ├   ├   ├── 411
├   ├   ├   ├   ├   ├── 412
├   ├   ├   ├   ├   ├── 413
├   ├   ├   ├   ├   ├── 414
├   ├   ├   ├   ├   ├── 415
├   ├   ├   ├   ├   ├── 416
├   ├   ├   ├   ├   ├   ├── 417
├   ├   ├   ├   ├   ├   ├── 418
├   ├   ├   ├   ├   ├   ├── 419
├   ├   ├   ├   ├   ├   ├── 420
├   ├   ├   ├   ├   ├   ├── 421
├   ├   ├   ├   ├   ├   ├── 422
├   ├   ├   ├   ├   ├   ├── 423
├   ├   ├   ├   ├   ├   ├── 424
├   ├   ├   ├   ├   ├   ├── 425
├   ├   ├   ├   ├   ├   ├   ├── 426
├   ├   ├   ├   ├   ├   ├   ├── 427
├   ├   ├   ├   ├   ├   ├   ├── 428
├   ├   ├   ├   ├   ├   ├   ├── 429
├   ├   ├   ├   ├   ├   ├   ├── 430
├   ├   ├   ├   ├   ├   ├   ├── 431
├   ├   ├   ├   ├   ├   ├   ├── 432
├   ├   ├   ├   ├   ├   ├   ├── 433
├   ├   ├   ├   ├   ├   ├   ├── 434
├   ├   ├   ├   ├   ├   ├   ├   ├── 435
├   ├   ├   ├   ├   ├   ├   ├   ├── 436
├   ├   ├   ├   ├   ├   ├   ├   ├── 437
├   ├   ├   ├   ├   ├   ├   ├   ├── 438
├   ├   ├   ├   ├   ├   ├   ├   ├── 439
├   ├   ├   ├   ├   ├   ├   ├   ├── 440
├   ├   ├   ├   ├   ├   ├   ├   ├── 441
├   ├   ├   ├   ├   ├   ├   ├   ├── 442
├   ├   ├   ├   ├   ├   ├   ├   ├── 443
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 444
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 445
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 446
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 447
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 448
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 449
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 450
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 451
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 452
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 453
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 454
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 455
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 456
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 457
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 458
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 459
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 460
├── 469

Iter Tree children/Ego/4 #5
                        time:   [174.10 us 176.36 us 178.90 us]
                        thrpt:  [558.98 Kelem/s 567.01 Kelem/s 574.37 Kelem/s]
Iter Tree children/Flat/4 #5
                        time:   [32.950 us 33.451 us 33.962 us]
                        thrpt:  [2.9445 Melem/s 2.9894 Melem/s 3.0349 Melem/s]

MacBook Air (M1, 2020) : 16GB

Get the code!

The crate is at crates.io: tree-flat, & the code is at repo: tree-flat.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK