Case Study: Reusing Double Dispatch for serialization
source link: https://gieseanw.wordpress.com/2018/12/29/reuse-double-dispatch/
In my previous blog post, I gave a tutorial on the double dispatch pattern. I mentioned that you could reuse the pattern for a variety of things, one of those being I/O. In this blog post, we’ll walk through how we can add serialization support for our class hierarchy without touching the classes themselves.
Situation: You want to serialize a collection of base class pointers
And the double dispatch machinery is already in place.
The main classes we care about are an Animal base class and an AnimalCollection that is a lightweight wrapper around a vector<Animal*>:
struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};
struct AnimalCollection
{
    // ...
private:
    std::vector<std::unique_ptr<Animal>> animals_;
};
A brief review of our double dispatch machinery
AnimalVisitor is roughly the same as before, an abstract base class with knowledge of the Animal hierarchy:
struct Cat;
struct Dog;
// ... other forward declared classes
struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    // ... other overloads for Visit
    virtual ~AnimalVisitor() = default;
};
And the derived Animal classes all use the mixin pattern I showed in the last post to make them visitable:
template <class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override
    {
        _visitor->Visit(static_cast<const T*>(this));
    }
};
struct Cat : VisitableAnimal<Cat>
{};
struct Dog : VisitableAnimal<Dog>
{};
//... other Animals
(I’ve slightly modified the Visit methods and AnimalVisitor here to operate on pointers to const. In reality the code will have both versions, as you’ll see in the live demo later.)
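To make the review concrete, here is a minimal sketch of the machinery with one extra visitor plugged in. SoundVisitor and CollectSounds are hypothetical names invented for this illustration (they are not from the original post); the point is only that a new operation can be added without touching Animal, Cat, or Dog.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Cat;
struct Dog;

struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    virtual ~AnimalVisitor() = default;
};

struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};

// Mixin: the first dispatch (virtual Visit) lands here, and the static_cast
// selects the matching AnimalVisitor overload for the second dispatch.
template <class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override
    {
        _visitor->Visit(static_cast<const T*>(this));
    }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

// Hypothetical visitor (an assumption for this sketch): records one sound per animal.
struct SoundVisitor : AnimalVisitor
{
    void Visit(const Cat*) override { sounds.push_back("meow"); }
    void Visit(const Dog*) override { sounds.push_back("woof"); }

    std::vector<std::string> sounds;
};

// Hypothetical helper: runs the visitor over a collection of base class pointers.
std::vector<std::string> CollectSounds(const std::vector<std::unique_ptr<Animal>>& _animals)
{
    SoundVisitor visitor;
    for (const auto& animal : _animals)
        animal->Visit(&visitor);
    return visitor.sounds;
}
```

Calling CollectSounds on a vector holding a Cat, a Dog, and another Cat should yield "meow", "woof", "meow", in that order, with no type checks at the call site.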
Serialization requirements
Our serialization goals are modest:
- Write an AnimalCollection to a std::ostream
- Read an AnimalCollection from a std::istream
- Each Animal is written to its own line as a magic string
  - e.g. "cat" for Cat and "dog" for Dog, etc.
Example output:
Given an AnimalCollection like [Cat, Dog, Cat, Cat, Llama], we’d expect output like so:
cat
dog
cat
cat
llama
Why this format specifically?
I intentionally simplified the serialization/deserialization logic here to only operate on magic strings representing a type, because this dovetails nicely into factory-pattern serialization schemes.
Key-value serialization formats like .xml and .json lend themselves nicely to this kind of factory pattern. Here’s a sample .json file we might eventually end up with for our animal collection as the Animal classes evolve:
# animals.json
{ "animal_collection":
  {
    "count" : 2,
    "data" : [
      {
        "type" : "cat",
        "declawed" : "true",
        "mice_killed" : 0
      },
      {
        "type" : "dog",
        "belly_rubs_received" : 42,
        "sticks_gathered" : 8
      }
    ]
  }
}
Note that each entry in the "data" array of our JSON has a "type" string representing which concrete Animal class to create. A more fully-featured serialization scheme would read that first and then delegate the rest of the work to a type-specific Cat::Load(std::istream&), Dog::Load(std::istream&), etc.
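As a rough sketch of that "read the type first, then delegate" idea, the code below uses a simplified whitespace-separated record (for example "cat true 3") instead of real JSON, so that no parser library is needed. Everything here is an assumption made for illustration: the record format, the member names, the LoadRecord function, and the static Load signatures are not from the original post.

```cpp
#include <functional>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

struct Animal
{
    virtual ~Animal() = default;
};

struct Cat : Animal
{
    bool declawed = false;
    int miceKilled = 0;

    // Hypothetical type-specific loader: reads the fields that follow the type tag.
    static std::unique_ptr<Animal> Load(std::istream& _instream)
    {
        auto cat = std::make_unique<Cat>();
        _instream >> std::boolalpha >> cat->declawed >> cat->miceKilled;
        return cat;
    }
};

struct Dog : Animal
{
    int bellyRubsReceived = 0;
    int sticksGathered = 0;

    static std::unique_ptr<Animal> Load(std::istream& _instream)
    {
        auto dog = std::make_unique<Dog>();
        _instream >> dog->bellyRubsReceived >> dog->sticksGathered;
        return dog;
    }
};

using AnimalLoader = std::function<std::unique_ptr<Animal>(std::istream&)>;

// Reads the type tag first, then hands the rest of the record to the matching loader.
std::unique_ptr<Animal> LoadRecord(const std::string& _record)
{
    static const std::unordered_map<std::string, AnimalLoader> loaders =
    {
        {"cat", AnimalLoader{&Cat::Load}},
        {"dog", AnimalLoader{&Dog::Load}},
    };

    std::istringstream fields{_record};
    std::string type;
    fields >> type;

    const auto found = loaders.find(type);
    if (found == loaders.end())
        throw std::invalid_argument("unrecognized animal type: " + type);

    auto animal = found->second(fields);
    if (!fields) // a field was missing or could not be parsed
        throw std::invalid_argument("malformed record: " + _record);
    return animal;
}
```

The factory only needs to understand the type tag; each loader owns the knowledge of its own fields, which is the property the JSON layout above is meant to preserve.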
The naive approach
Your knee-jerk reaction to the above requirements is likely along the lines of "let’s give the Animal class a virtual Save() method!"
struct Animal
{
    virtual void Save(std::ostream& _outstream) const = 0;
    // ...
};
And then derived classes can implement Save like so:
struct Cat : VisitableAnimal<Cat>
{
    void Save(std::ostream& _outstream) const override
    {
        _outstream << "cat\n";
    }
};
This makes things quite easy for AnimalCollection:
struct AnimalCollection
{
    // ...
public:
    void Save(std::ostream& _outstream) const
    {
        for (const auto& animal : animals_)
        {
            animal->Save(_outstream);
        }
    }
};
Problem solved, right?
Well, we sort of forgot about deserialization…
Deserialization
How do we implement AnimalCollection::Load now?
void AnimalCollection::Load(std::istream& _instream)
{
    std::string nextLine;
    while (std::getline(_instream, nextLine))
    {
        auto nextAnimal = /*which Animal to create???*/;
        animals_.push_back(std::move(nextAnimal));
    }
}
The problem is that the magic strings representing Cat, Dog, etc. are all hard-coded into *::Save methods. From here, you might be tempted to take one of the following approaches:
- AnimalCollection should "just know" about all the magic strings for each derived type
if (nextLine == "cat")
{
    auto nextAnimal = std::make_unique<Cat>();
    animals_.push_back(std::move(nextAnimal));
}
else if (nextLine == "dog")
{
    // ...
}
// etc
Code like that should be a big red flag. What happens when we want to capitalize the first character of each string ("Cat" instead of "cat")? Now we have to make the change in two places: once in AnimalCollection.cpp and again in Cat.cpp.
Data that is duplicated should be assumed to already be out of sync.
- Perhaps each type should have a GetType() method that returns a string (or an enum that can be converted into a string)
struct Cat : VisitableAnimal<Cat>
{
    static std::string GetType()
    {
        return "cat";
    }
};
Then our deserialization code looks like this:
if (nextLine == Cat::GetType())
{
    auto nextAnimal = std::make_unique<Cat>();
    animals_.push_back(std::move(nextAnimal));
}
else if (nextLine == Dog::GetType())
{
    // ...
}
// etc.
Code like this might be useful if you are exposing an AnimalFactory directly to the client (via a Factory Pattern). Our use case, though, raises the question:
"Do you really need to add another method to every class’s interface?"
I think the answer is a resounding "No", and I’m going to appeal to the authority of Bob Martin to support me here:
Good developers learn to limit what they expose at the interfaces of their classes and modules. The fewer methods a class has, the better. – Robert Martin, Clean Code
What a great motivation for reusing our double dispatch machinery to solve this problem non-invasively.
Adding Serialization via double dispatch
We can reuse our existing double dispatch machinery that we’ve already gone to the trouble of adding to Animal to save and load an AnimalCollection.
Our goals are thus:
- Only write our magic strings in one location
- Avoid using run time type information
- Don’t touch any of our existing interfaces
Saving via double dispatch
We need to save a single Animal to a stream, and we don’t need state, so let’s prefer a non-member, non-friend function to do this in order to maximize encapsulation.
// AnimalSerialization.h
namespace animal_serialization
{
    // preconditions: Animal is not null, _outstream is open and ready
    void Save(const Animal* _animal, std::ostream& _outstream);
}
In AnimalSerialization.cpp, we need a way to translate Animal instances into strings. Our double dispatch visitor comes in handy here.
The implementation is fairly straightforward. Deriving a SaveAnimalVisitor from AnimalVisitor allows us to immediately know about all types in the hierarchy. From there it’s a matter of printing the magic strings to a stream.
// AnimalSerialization.cpp
// ...
namespace
{
    std::string CatString() { return "cat"; }
    std::string DogString() { return "dog"; }

    struct SaveAnimalVisitor : AnimalVisitor
    {
    public:
        // precondition: _outstream will outlive the SaveAnimalVisitor instance
        explicit SaveAnimalVisitor(std::ostream& _outstream) :
            outstream_{&_outstream}
        {}
        void Visit(const Cat*) override
        {
            *outstream_ << CatString() << "\n";
        }
        void Visit(const Dog*) override
        {
            *outstream_ << DogString() << "\n";
        }
        // ... other overridden Visit methods
    private:
        std::ostream* outstream_ = nullptr;
    };
} // anonymous namespace
From there it’s a simple matter of hooking up this new visitor to the Animal we wish to serialize:
// AnimalSerialization.cpp
// ... (our visitor code)
namespace animal_serialization
{
    void Save(const Animal* _animal, std::ostream& _outstream)
    {
        ::SaveAnimalVisitor saveVisitor{_outstream};
        _animal->Visit(&saveVisitor);
    }
}
AnimalCollection::Save uses it like so:
// AnimalCollection.cpp
void AnimalCollection::Save(std::ostream& _outstream) const
{
    for (const auto& animal : animals_)
    {
        animal_serialization::Save(animal.get(), _outstream);
    }
}
That was disturbingly easy, right?
We could grow our Animal hierarchy to 100 types and AnimalSerialization.cpp would still only be ~400 LOC.
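One way to sanity-check the visitor-based Save without touching the file system is to write into a std::ostringstream and compare the resulting text. The sketch below condenses the pieces from this section into a single translation unit; SaveToString is a hypothetical helper added only for this check and is not part of the original code.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Cat;
struct Dog;

struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    virtual ~AnimalVisitor() = default;
};

struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};

template <class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override
    {
        _visitor->Visit(static_cast<const T*>(this));
    }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

namespace
{
std::string CatString() { return "cat"; }
std::string DogString() { return "dog"; }

// Writes one magic string per visited Animal; the stream must outlive the visitor.
struct SaveAnimalVisitor : AnimalVisitor
{
    explicit SaveAnimalVisitor(std::ostream& _outstream) : outstream_{&_outstream} {}
    void Visit(const Cat*) override { *outstream_ << CatString() << "\n"; }
    void Visit(const Dog*) override { *outstream_ << DogString() << "\n"; }

private:
    std::ostream* outstream_ = nullptr;
};
} // anonymous namespace

namespace animal_serialization
{
void Save(const Animal* _animal, std::ostream& _outstream)
{
    SaveAnimalVisitor saveVisitor{_outstream};
    _animal->Visit(&saveVisitor);
}
}

// Hypothetical helper (an assumption for this sketch): serializes to an in-memory string.
std::string SaveToString(const std::vector<std::unique_ptr<Animal>>& _animals)
{
    std::ostringstream out;
    for (const auto& animal : _animals)
        animal_serialization::Save(animal.get(), out);
    return out.str();
}
```

For a collection holding a Cat, a Dog, and a Cat, SaveToString should produce three lines: cat, dog, cat. Because Save only depends on std::ostream, the same code path serves files, string streams, and sockets wrapped in a stream.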
Loading
Now the challenge is to read strings from a stream and construct Animal instances. How can double dispatch help us here?
While we cannot directly use double dispatch during the deserialization portion, what it enabled us to do was put all our magic strings into just the AnimalSerialization.cpp source file. With that in place, we can implement animal_serialization::Load() as a basic Factory:
// AnimalSerialization.h
namespace animal_serialization
{
    // ...
    // precondition: _instream is open and ready
    std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream);
}
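// AnimalSerialization.cpp
// ... (needs <stdexcept> for the fallback below)
std::unique_ptr<Animal> ParseAnimal(const std::string& _line)
{
    if (_line == CatString())
        return std::make_unique<Cat>();
    else if (_line == DogString())
        return std::make_unique<Dog>();
    // etc.

    // No match: fail loudly rather than fall off the end of a non-void function.
    throw std::invalid_argument("unrecognized animal type: " + _line);
}
namespace animal_serialization
{
    // ...
    std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream)
    {
        std::vector<std::unique_ptr<Animal>> toReturn;
        std::string line;
        // std::getline takes the stream first, then the destination string
        while (std::getline(_instream, line))
            toReturn.push_back(ParseAnimal(line));
        return toReturn;
    }
}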
Wow that was easy, too.
AnimalCollection::Load has an easy task in front of it:
// AnimalCollection.cpp
// ...
void AnimalCollection::Load(std::istream& _instream)
{
    std::vector<std::unique_ptr<Animal>> loadedAnimals = animal_serialization::Load(_instream);
    animals_ = std::move(loadedAnimals);
}
From this point, we could take the refactoring a number of steps further, ultimately going so far as to have a map of strings to functions returning Animal instances:
// AnimalSerialization.cpp
// ...
template <class T>
std::unique_ptr<Animal> CreateAnimal()
{
    return std::make_unique<T>();
}
using AnimalCreatorFunction = std::function<std::unique_ptr<Animal>()>;
std::unordered_map<std::string, AnimalCreatorFunction> animalFactory =
{
    {CatString(), AnimalCreatorFunction{&CreateAnimal<Cat>}},
    {DogString(), AnimalCreatorFunction{&CreateAnimal<Dog>}},
    // ...
};
std::vector<std::unique_ptr<Animal>> Load(std::istream& _instream)
{
    std::vector<std::unique_ptr<Animal>> toReturn;
    std::string line;
    while (std::getline(_instream, line))
        toReturn.push_back(animalFactory.at(line)()); // at() throws std::out_of_range for an unknown string
    return toReturn;
}
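A quick way to gain confidence in the save and load halves together is a round trip: save a collection, load it back, save the result again, and compare. The following is a minimal sketch under a few assumptions of mine: the AnimalList alias, the free Save/Load functions at global scope, and the RoundTrip helper are hypothetical, and the lookup uses find() with an explicit std::invalid_argument instead of at() purely so the error message can name the offending line.

```cpp
#include <functional>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

struct Cat;
struct Dog;

struct AnimalVisitor
{
    virtual void Visit(const Cat*) = 0;
    virtual void Visit(const Dog*) = 0;
    virtual ~AnimalVisitor() = default;
};

struct Animal
{
    virtual ~Animal() = default;
    virtual void Visit(AnimalVisitor*) const = 0;
};

template <class T>
struct VisitableAnimal : Animal
{
    void Visit(AnimalVisitor* _visitor) const override
    {
        _visitor->Visit(static_cast<const T*>(this));
    }
};

struct Cat : VisitableAnimal<Cat> {};
struct Dog : VisitableAnimal<Dog> {};

using AnimalList = std::vector<std::unique_ptr<Animal>>; // hypothetical alias

namespace
{
// The magic strings live in exactly one place, shared by saving and loading.
std::string CatString() { return "cat"; }
std::string DogString() { return "dog"; }

struct SaveAnimalVisitor : AnimalVisitor
{
    explicit SaveAnimalVisitor(std::ostream& _outstream) : outstream_{&_outstream} {}
    void Visit(const Cat*) override { *outstream_ << CatString() << "\n"; }
    void Visit(const Dog*) override { *outstream_ << DogString() << "\n"; }

private:
    std::ostream* outstream_ = nullptr;
};

template <class T>
std::unique_ptr<Animal> CreateAnimal() { return std::make_unique<T>(); }

using AnimalCreatorFunction = std::function<std::unique_ptr<Animal>()>;
} // anonymous namespace

void Save(const AnimalList& _animals, std::ostream& _outstream)
{
    SaveAnimalVisitor saveVisitor{_outstream};
    for (const auto& animal : _animals)
        animal->Visit(&saveVisitor);
}

AnimalList Load(std::istream& _instream)
{
    static const std::unordered_map<std::string, AnimalCreatorFunction> animalFactory =
    {
        {CatString(), AnimalCreatorFunction{&CreateAnimal<Cat>}},
        {DogString(), AnimalCreatorFunction{&CreateAnimal<Dog>}},
    };

    AnimalList toReturn;
    std::string line;
    while (std::getline(_instream, line))
    {
        const auto found = animalFactory.find(line);
        if (found == animalFactory.end())
            throw std::invalid_argument("unrecognized animal type: " + line);
        toReturn.push_back(found->second());
    }
    return toReturn;
}

// Hypothetical helper: saves, reloads, and saves again so the two texts can be compared.
std::string RoundTrip(const AnimalList& _animals)
{
    std::stringstream stream;
    Save(_animals, stream);
    const AnimalList reloaded = Load(stream);

    std::ostringstream out;
    Save(reloaded, out);
    return out.str();
}
```

If the text produced by RoundTrip matches what a direct Save would write, the string table is being used consistently in both directions. A stream containing a string with no registered creator, such as "llama" here, should surface as an exception rather than a silently shorter collection.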
Full-fledged demo here
At this point, we’ve moved the implementation details into the narrowest possible scope and avoided duplication at all cost. This is a pretty good stopping point; we could still grow to 100 derived Animal types without over-complicating or overcrowding AnimalSerialization.cpp (perhaps ~500 LOC).
(If you find yourself in a situation where you DO need to split things up further, feel free to contact me (see my About Me page); there are other techniques we could use that are outside the scope of this article.)
Conclusion
The double dispatch pattern lends itself nicely to stable interfaces thanks to its reusability. In this post, I walked through how we might reuse it to implement basic serialization without needing to touch Animal itself, or any derived class. I hope you’re already thinking of places in your codebase that could benefit from refactoring to use this pattern!