[Tutorial] Splay Tree: One Tree to Rule Them All

38 hours ago, # |

Swiss knife of all sequence manipulation data structure

Unfortunately that is not the case, even though the tree is very natural in it's coreThe problem is amortized complexityIt's not that it's bad, but when you need persistency or implicitness it just won't work

So I would recommend learning in this order

38 hours ago, # ^ |

I'm not sure what you mean by implicitness, but I agree with you about persistency. Splay trees are pretty fast to write once you understand them, so I kinda get why people prefer them over something deterministic (other than the LCT reason), but I recommend AVL trees for deterministic BBSTs (over the other two you mentioned). A ton of the stuff that you can do with AVL trees can be found here, and it is arguably the simplest deterministic BBST once you know treaps.

38 hours ago, # ^ |

I agree in principal. There is (generally) no universal algorithm that works on every problem. But Splay is good enough in most use cases, and when you have worked 1 week on something you would want to make a clickbait title :)

However, I do not think that Splay cannot do persistency or implicitness as you suggested.

Persistency means to create a new node (instead of editing the existing node) whenever a value changes on a node. Splay touches amortized nodes like segment tree so I am not sure why it cannot be done.

(Also see https://stackoverflow.com/questions/8348359/why-are-persistent-splay-trees-particularly-useful-in-functional-programming)

Implicitness (if I understand correctly) means to dynamically allocate nodes to accommodate large ranges like . I think in Splay we can also use one node to represent a range and expand when we need it.

(If you mean more trivial things like using the index of the node to be its key then you should read my wonderful article! :) )

I do not agree with the learning order though. I never really used Treap, AVL or RB tree in CP. I think Treap might have a use case in some niche problems but I cannot recall. I would recommend to learn in this order on sequence manipulation data structures (from easy to hard):

Fenwick Tree
Segment Tree
Splay

as they are way more common in CP. Splay covers almost everything Fenwick Tree and Segment Tree can do so I would argue it is a Swizz knife in sequence manipulation.

But again, the main argument against Splay is not what it cannot do, but the fact that it is slower than simple data structures like Fenwick Tree by a constant factor, just like how a Swizz knife does not excel in simpler tasks :)

38 hours ago, # ^ |

The problem with persistency is pretty simple. With splay tree you can have a state with up to height. The next operation might want to access an element that is deep down. Now persistency forces you to store every state so even read-only operations would blow up the running time on such a state.

As for implicitness, it can work since we're usually creating implicit nodes in a perfect BST manner. I'm not sure, but I think the math checks out. But it would still blow up in running time in other cases.

Segment trees and Fenwick trees are a must know, but they're not BST. Which means you can't insert in the middle and do other stuff BST can. So they shouldn't be a point in this article.

Treap is WAY easier to implement. That's why it is used so much. Everybody knows it's downsides, but the simplicity of implementation outweighs them all. Also you can implement it without being able to prove it's correctness which has it's benefits.

37 hours ago, # ^ |

I am not writing from a BST perspective, because we have set, map, unordered_map and other stuff in C++ the need for a raw BST is very low.

I am writing from a sequence manipulation perspective: you have a sequence and you want to do some magic on it: find the sum of a segment, add something to a segment, set the segment of values to something, etc.

That is also when I think Splay is useful because it allows dynamic change of sequence structure in addition to everything that a segment tree can do.

I think you are right on persistency, but I also see a lot of posts on it (another one: https://stackoverflow.com/questions/16181382/persistent-data-structures-a-persistent-index) so I am not sure what is happening. I will have to think about it.

35 hours ago, # ^ |

What are the downsides of treap?

32 hours ago, # ^ |

The most important downside is that it is randomized, probably.

Even splay tree is deterministic, so you can be sure that your code always produces the same result on the same input. Noticeably simplifies debug.

Other minor problems are:

You need a lot of random numbers. In case you don't need persistency, you can reduce this number to , but then you would have to pay with huge additional constant factor (something about 6 or so) or with the fact that you have no proof of the logarithmic upper bound.
You need to store additional info in a node (). Again, you can throw this info away and have all the problems above in case you don't need persistency. Or, you can make this info more or less useful by replacing it with the size, but in that case you would need even more random numbers. And in the case of persistent treap you even have no choice.
of expected additional memory per split/merge query in all cases where you don't store parent nodes. In comparison, splay trees can do all their operations with additional memory worst-case.

Also, the constant factor is not very pleasant in comparison to static trees like segment tree or Fenwick tree, but I don't think that it is a fair comparison. Sometimes the difference is even logarithmic because treaps have no inner structure on nodes, so you can't do a parallel binary search on two treaps simultaneously, but I doubt any dynamic tree has this property.

32 hours ago, # ^ |

When debugging a treap locally, I always fix the seed for all rngs if I am compiling it locally. This way it is easier to stress-test without wondering which seed your rng was seeded with.

How do you reduce the number of random numbers to as you mentioned in the first point? I just wrote my own implementation of standard (but fast and high-quality) rngs, so it takes care of that to some extent; nevertheless, it is still pretty interesting to see how the number of random numbers can be reduced so much.

26 hours ago, # ^ |

Well, I do that too, but in case you have multitests it does not really helps unless you reseed rng on each test. The easiest general way to do it is to construct it once per test and then pass it everywhere, but I feel like it is quite messy. The same problem applies to the stress testing.

Again, it is not a huge problem. Usually, when you see that your solution change its behaviour on seed change, you just go and check your treap, but it still may be unpleasant in some cases.

Well, in order to deal with amount of random you just replace with some good hash of, let's say, a pointer to the node. Then, you indeed need only random numbers to choose the hash from the family, but you actually need to have high independency. The number I know how to prove is 24-independence, I saw 5- or 6-independence in a literature with some restrictions (it gives only an amortized expected bound, not an expected bound per query, iirc), but it is still too much. Obviously, you can say screw it and choose a faster hash, but then you will be unable to prove the upper bound and sometimes it won't even work. For instance, I know a countertest for the hash "xor with the random number", you just need to add pointers in an increasing order.

Maybe there exists some fast hash that also relatively easily provable, but I don't know it. I also think that the background on this hash-based approach deserves a separate post, because of the amounts of math I try to encapsulate in words like "will/won't work".

23 hours ago, # ^ |

← Rev. 2 →

ngl, I don't have a problem with those, at least for codeforces

You can use the same seed when debugging. I don't see why you would need to debug multitests, since once you find the wrong case you just debug that case alone.

Generating n random numbers is quite fast, and the 1 extra variable per node + extra memory per query is insignificant compared to the rest of the treap itself.

The only downside for me is the higher constant compared to splay trees. Of course if a static tree is enough then by all means use it instead.

17 hours ago, # ^ |

About multitests I meant the annoying problem when you use your random number generator as a global variable and the WA occurs only when the test is third or even twenty-third in the order. Again, it is not a huge problem, but it could be annoying sometimes when you need a lot of different treap features in your solution and you forgot to push somewhere.

Generating random numbers is quite costly. Try to replace your PRNG with random_device if you use linux and compare the performance. And if you use windows, you don't have true random at all, sorry, use linux. However, treaps with proper PRNGs (e.g. mt19937) always work well, but the reason why is basically the hashing argument.

Extra memory per query is mostly theoretical point, but even one extra variable per node can make a lot of difference. If you are not lucky enough, it can multiply your memory consumption by two because of padding. And if you want to avoid this, you sometimes can use balancing with the size, but then generating even pseudorandom numbers adds some noticeable overhead.

I am not sure which tree is faster, btw, splay or treap. Would be nice to see some comparisons. Obviously, both trees have cases when they are asymptotically faster, but in general case of random queries, I think, splay is a bit slower (I may be wrong, obviously).

3 hours ago, # ^ |

← Rev. 3 →

Regarding generating truly random numbers, it is almost impossible to use a true RNG in a competitive programming context. In most implementations, std::random_device involves a filesystem call to read from /dev/urandom, and this is why it is slow. One advantage of using this over other PRNGs is that it uses entropy from environmental noise, but it is just a PRNG too (although it is also cryptographically secure). Another way of generating a "good-enough" random number (but for a one-time use, for instance, for seeding your PRNGs) is to allocate a single byte of data and use a hash of its address as a seed.

I had implemented some pretty fast PRNGs quite a while back in 146874714, and they have similar guarantees to std::mt19937. The Lehmer PRNG is still good enough for treap from what I can remember — and if you want to use it without remembering these magic constants, std::minstd_rand implements this PRNG. The wyhash PRNG has better quality, but is a tiny bit slower than Lehmer. It is notable that both lehmer (the 64 bit version) and wyhash (both 32 and 64 bit versions) pass BigCrush and practrand. Some references can be found here.

Regarding splay vs treap, most of the fastest submissions here use splay tree.

33 hours ago, # ^ |

Why do you put fenwick as easier than segtree…? It’s harder to learn and the only use case is if you need to optimize constant

33 hours ago, # ^ |

The code is shorter.

33 hours ago, # ^ |

If I couldn’t use a template I would write a segtree instead of fenwick because chance to make a mistake is much lower

33 hours ago, # ^ |

Each to their own, I guess.

22 hours ago, # ^ |

Is there anything that fenwick can do but segment can't? If not, why learn it?

17 hours ago, # ^ |

Should be none in theory. Fenwick has more constraints than Segment trees in terms of what operations it could use (i.e. For querying any arbitrary interval, the inverse operation for the query operation should be clearly defined. This is why you generally can't use it for multiplication and min/max.) Still, the Fenwick Tree has a much smaller code size, and reduces the constant on memory/time complexity, so IMO it is still worth learning even if you know how to use segtrees.

37 minutes ago, # ^ |

Do you know of any resources where they compare time complexities of segment and fenwick? I keep hearing that fenwick has smaller constant than segment, but I don't know of the exact mechanism behind it. Wikipedia states fenwick can do range query in O(4*log(N)), which is same as segment.

33 hours ago, # ^ |

As an addition to very commonly needed persistency and very rare implicitness, as far as I understand, splay tree also cannot insert or erase an element in time if you don't need to find that element (e.g. if you have a pointer to it).

In comparison, treap can do both in expected time. This is not really commonly needed, but it is nice to have in some cases. Also, note that the dynamic optimality conjecture is not applicable here because of the difference in interfaces (you have an additional info of where the element you want to delete).

But, on the other hand, splay trees are nice in a sense that you do not need to optimize the number of times you query the same element, because those redundant queries add only to the running time as opposed to for almost all other trees I know.

Recommend

One Load Balancer to rule them all

Splay tree

Lightning Talk: One Language to Rule Them All (IaC in F#)

One event to rule them all

One enum to rule them all

The NAND gate. One gate to rule them all.

One Tool to Rule Them All

GStreamer: one repository to rule them all

One algorithm to rule them all?

One fring to rule them all...

About Joyk