6

Use case of utilizing std::set instead of std::map

 2 years ago
source link: https://vorbrodt.blog/2021/10/08/use-case-of-utilizing-stdset-instead-of-stdmap/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Use case of utilizing std::set instead of std::map

Some time back I reviewed code that had the following design and data structure:

struct Object final {
    // some constructors, more data members, more member functions
    Object(const int id, std::string n) : id_(id), name_(std::move(n)) {}
    int id_{};
    std::string name_;

Now, the above is a struct but just imagine it could be a class with private/public sections, a beefy interface and data members.

The author(s) of the code wanted to place it in some data structure that can help retrieve an “Object” using a given “ID”. So they picked the following approach:

// usage example with some helper functions
void add_new(std::map<int, Object>& m, Object obj) {
    m.insert(std::make_pair(obj.id_, std::move(obj)));
void add_new(std::map<int, Object>& m, const int id, const std::string& name) {
    m.insert(std::make_pair(id, Object{id, name}));
// the simple data structure
std::map<int, Object> mymap;
add_new(mymap, Object{1, "1"s});
add_new(mymap, 5, "5"s);
add_new(mymap, 3, "3"s);
add_new(mymap, -7, "-7"s);
add_new(mymap, Object{4, "4"s});
// usage of key to find/access Object
const auto& n = mymap[5].name_;
fmt::print("get 5's name {}\n", n);
assert(n == "5"s);

Pretty standard data structure. A {key, value} pair where the “key” is the “ID” of the “Object” and the “value” is the “Object” itself. This is not the first time I see such pattern. It is actually quite common in various code trees.

While looking at this piece of code I wonder, can we do better? The “ID” is part of the “Object”, and the requirements are to be able to insert/find values using std::map(*). The “Object”s are ordered by “ID”. My question is – Why do we need to repeat “ID” as a “Key” ?

Before we move forward, let’s compute the current cost:

// This prints: sizeof pair 48, object 40, string 32
fmt::print("sizeof pair {}, object {}, string {}\n",
            sizeof(std::pair<int, Object>), sizeof(Object),
            sizeof(std::string));

This prints “sizeof pair 48, object 40, string 32” which means we pay an extra 8 bytes for storing an integer as the “ID”/”Key”. The same integer that is already stored in “Object”.

So the immediate thought is “yes“. We can do better. What if we eliminate the need for a separate “Key” and store the “Object”s in a similar container, that provides us the same requirements/performance(O(x) wise) but with smaller memory footprints?

We have something like this. It is called std::set<> and it has very similar characteristics to std::map<>. std::map<> stores std::pair<T1,T2>. It contains 2 types. std::set<> stores a single type.

So let’s give it a shot:

// more helper functions
void add_new(std::set<Object>& s, Object obj) {
    s.insert(std::move(obj));
void add_new(std::set<Object>& s, const int id, const std::string& name) {
    s.insert(Object{id, name});
std::set<Object> myset;
add_new(myset, Object{1, "1"s});
add_new(myset, 5, "5"s);
add_new(myset, 3, "3"s);
add_new(myset, -7, "-7"s);
add_new(myset, Object{4, "4"s});

And we would like to have the same idea. We’d like to lookup number “5”. But std::set<> does not have a subscript operator (“[]”). It does however have “find()” function instead:

const auto& nset = myset.find(5)->name_;
fmt::print("get 5's name {}\n", nset);
assert(nset == "5"s);

There are couple things we need to straighten out before we open the champagne.

First, by default, std::set<> requires less (“<“) operator to be defined for the stored objects. The compiler does not know how to compare “Object”s. We need to teach it. We can pass a compare function, we can use lambda or, we can just implement a simple member function that will help us to compare Objects by IDs.

struct Object final {
    // same as above
    // ...
    // new function definition
    [[nodiscard("why would you discard me?")]
    bool operator<(const Object& o) const noexcept {
        return id_ < o.id_;

This is basically fixing the major missing part. Remember that with std::map<> we had simple “int” as a “Key” and the compiler is very capable of generating code to compare “int”s. But not for “Object”s.

There is some wart here that requires discussion. If we want to write something like:

const auto& nset = myset.find(5)->name_;

We need to be able to convert an integer to an “Object”. We can do it using a conversion constructor:

struct Object final {
    // cannot make it explicit since we need it as a conversion operator
    Object(const int id) : id_(id) {}
    // same as before
    // ...

Notice the comment. We cannot make it explicit. In most cases, conversion constructor should be marked explicit, but if you actually need the conversion, you would have to pay this trade off. Otherwise, you’d need to write something less “natural” e.g.:

const auto& nset = myset.find(Object{5})->name_;

In my opinion this is a trade off that we would need to pay. Unless there are other sane options. For example I could think about having this non-explicit constructor in private section and having a get() friend function accessing it but I’m not sure this worth the effort.

That’s it. I saved you 8 bytes per entry. Keep the change and thank me later!

(*) std::map is probably implemented as a Red-Black Tree with log(N) to find something and O(1) to (re)balance it

Like this:

Loading...

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK