OwenGage.com
source link: https://owengage.com/writing/
Exploring serde's data model with a toy deserializer
2021 Aug 14
Previously I did a shallow dive into understanding serde by expanding the Deserialize macro. This time I'll go deeper and map how each type in serde's data model is treated.
This is written for people intending to write their own deserializer for serde, but might be interesting for more people. Later in the article we will implement a toy deserializer for some made-up format.
There are three major categories of types to consider:
- Sequences of values (think Vec, slice, tuple)
- Maps from keys to values (think HashMap, struct)
- Everything else
The 'Everything else' category turns out to be the easiest, so let's start there.
Table of everything else
The basic steps followed between you and serde when writing a deserializer are:
- Deserialize::deserialize is called on some type.
- This calls a deserialize_* method on your deserializer, passing you a Visitor.
- You call the appropriate visit_* methods for that call.
For example, calling deserialize on a String calls deserialize_string on your deserializer; you deserialize a string from your input and call visit_string on the visitor you were passed.
When writing a deserializer I wish I knew
- what real Rust types would end up calling my deserialize_* methods.
- which visit_* methods my deserializer should call in response.
Before answering these questions, let's look at the Visitor trait. This trait has default implementations of each method. Some return an error, and others forward the call to another visitor method.
This will be important later, since it means our deserializer can call a larger set of methods than you would naïvely expect.
Here's a summary of all the forwarded methods:
| Visitor method | Forwarded to |
| --- | --- |
| visit_i8 | visit_i64 |
| visit_i16 | visit_i64 |
| visit_i32 | visit_i64 |
| visit_u8 | visit_u64 |
| visit_u16 | visit_u64 |
| visit_u32 | visit_u64 |
| visit_f32 | visit_f64 |
| visit_char | visit_str |
| visit_borrowed_str | visit_str |
| visit_string | visit_str |
| visit_borrowed_bytes | visit_bytes |
| visit_byte_buf | visit_bytes |
(For completeness, the methods that do not forward are visit_bool, visit_i64, visit_u64, visit_f64, visit_str, visit_bytes, visit_none, visit_some, visit_unit, visit_newtype_struct, visit_seq, visit_map, and visit_enum.)
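To make the forwarding concrete, here is a minimal sketch using only the standard library. MiniVisitor, LengthVisitor and RejectAll are hypothetical names invented for this illustration; the real serde Visitor trait has many more methods and a generic error type rather than String.

```rust
/// Hypothetical, simplified stand-in for serde's `Visitor` (not the real trait).
trait MiniVisitor: Sized {
    type Value;

    // Non-forwarding default: reject the value.
    fn visit_u64(self, v: u64) -> Result<Self::Value, String> {
        Err(format!("unexpected unsigned integer {}", v))
    }

    // Forwarding defaults: widen the value and delegate to `visit_u64`.
    fn visit_u8(self, v: u8) -> Result<Self::Value, String> {
        self.visit_u64(u64::from(v))
    }
    fn visit_u32(self, v: u32) -> Result<Self::Value, String> {
        self.visit_u64(u64::from(v))
    }
}

/// Only overrides `visit_u64`, yet accepts `u8` and `u32` through forwarding.
struct LengthVisitor;

impl MiniVisitor for LengthVisitor {
    type Value = usize;

    fn visit_u64(self, v: u64) -> Result<usize, String> {
        Ok(v as usize)
    }
}

/// Overrides nothing, so every call ends in the default error.
struct RejectAll;

impl MiniVisitor for RejectAll {
    type Value = ();
}
```

A deserializer holding a LengthVisitor could call visit_u8, visit_u32 or visit_u64 and get a value back each time, while any call on RejectAll ends in the default error.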
Okay, back to who calls and expects what.
This table is mostly put together from impls.rs in serde.
Each row has a Rust type that is being deserialized, the deserialize call that
will be called on the deserializer, and finally the acceptable method calls on
the visitor passed to the deserializer.
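| Type being deserialized | Deserializer method called | Expected Visitor method calls |
| --- | --- | --- |
| struct field identifier | deserialize_identifier | visit_str, visit_bytes, but also visit_u64 where the number is the field number. (forwarded from: visit_char, visit_borrowed_str, visit_string, visit_borrowed_bytes, visit_byte_buf, visit_{u8,16,32}) |
| bool | deserialize_bool | visit_bool |
| () | deserialize_unit | visit_unit |
| i8 | deserialize_i8 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| i16 | deserialize_i16 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| i32 | deserialize_i32 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| i64 | deserialize_i64 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| isize | deserialize_i64 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| u8 | deserialize_u8 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| u16 | deserialize_u16 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| u32 | deserialize_u32 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| u64 | deserialize_u64 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| usize | deserialize_u64 | visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| f32 | deserialize_f32 | visit_f64, visit_f32, visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| f64 | deserialize_f64 | visit_f64, visit_f32, visit_i{8,16,32,64}, visit_u{8,16,32,64} |
| char | deserialize_char | visit_char, visit_str erroring if string length not exactly 1. (forwarded from: visit_borrowed_str, visit_string) |
| String | deserialize_string | visit_str, visit_string, visit_bytes (erroring if not valid UTF-8), visit_byte_buf (erroring if not valid UTF-8, stealing Vec<u8>'s allocation). (forwarded from: visit_char, visit_borrowed_str, visit_borrowed_bytes) |
| &'a str | deserialize_str | visit_borrowed_str, visit_borrowed_bytes (erroring if not valid UTF-8) |
| &'a [u8] | deserialize_bytes | visit_borrowed_str, visit_borrowed_bytes |
| Option<T> | deserialize_option | visit_some, visit_none, visit_unit |
| PhantomData<T> | deserialize_unit_struct | visit_unit |
| Vec<T> | deserialize_seq | visit_seq |
| [T; N] | deserialize_tuple (!) | visit_seq |
| tuples, e.g. (T, U) | deserialize_tuple | visit_seq |
| BTreeMap<K, V> | deserialize_map | visit_map |
| HashMap<K, V> | deserialize_map | visit_map |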
Most of this is unsurprising. Some things that stick out to me are:
- an array causes deserialize_tuple to be called. I guess it is similar to tuples in that arrays and tuples are fixed sizes, unlike a Vec.
- conversions to floating point use the as keyword to convert from integer types. The reference shows that this will convert to the nearest float, and that overflow to infinity can only happen for u128 to f32 [1]. So this conversion seems sensible.
- you cannot call visit_bytes when deserializing a char like you can for a String.
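The behaviour of the as keyword for integer-to-float conversion can be checked with a few lines of plain Rust. This is only a sketch of the cast itself, not serde's code; the helper names are made up for the example.

```rust
/// Integer-to-float conversion with a plain `as` cast: rounds to the nearest
/// representable float.
fn i32_to_f32(v: i32) -> f32 {
    v as f32
}

/// Every `u64` fits within the range of `f32`, although precision is lost.
fn u64_to_f32(v: u64) -> f32 {
    v as f32
}

/// The one integer type whose largest values exceed the range of `f32`.
fn u128_to_f32(v: u128) -> f32 {
    v as f32
}
```

With these helpers, i32_to_f32(16_777_217) gives 16777216.0 (the nearest representable f32), u64_to_f32(u64::MAX) stays finite, and u128_to_f32(u128::MAX) is infinite.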
Sequences
Sequences are the next level up in difficulty for our category of types.
When a type like a Vec<T> is deserialized, it calls deserialize_seq on the deserializer, passing in a visitor with visit_seq implemented.
Here's the actual implementation in impls.rs:
impl<'de, T> Deserialize<'de> for Vec<T>
where
    T: Deserialize<'de>,
{
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct VecVisitor<T> {
            marker: PhantomData<T>,
        }

        impl<'de, T> Visitor<'de> for VecVisitor<T>
        where
            T: Deserialize<'de>,
        {
            type Value = Vec<T>;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("a sequence")
            }

            fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: SeqAccess<'de>,
            {
                let mut values = Vec::with_capacity(size_hint::cautious(seq.size_hint()));

                while let Some(value) = try!(seq.next_element()) { // ❷
                    values.push(value);
                }

                Ok(values)
            }
        }

        let visitor = VecVisitor {
            marker: PhantomData,
        };
        deserializer.deserialize_seq(visitor) // ❶
    }

    // ... snipped out deserialize_in_place
}
The steps here are:
- deserialize_seq gets called with a visitor. ❶
- The deserializer sets up an object implementing SeqAccess, and calls visit_seq with it.
- The visitor repeatedly calls next_element on the SeqAccess object until None is returned (or it errors). ❷
This is much like the Iterator trait, but fallible. Some other stuff happens, like the visitor asking for a hint to the size for efficiency, but a SeqAccess object doesn't have to provide this.
Let's look at the SeqAccess trait:
pub trait SeqAccess<'de> {
    type Error: Error;

    fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Self::Error>
    where
        T: DeserializeSeed<'de>;

    fn next_element<T>(&mut self) -> Result<Option<T>, Self::Error>
    where
        T: Deserialize<'de>,
    {
        self.next_element_seed(PhantomData)
    }

    fn size_hint(&self) -> Option<usize> {
        None
    }
}
The only method without a default implementation is next_element_seed. If you're writing a deserializer, this is what you need to implement. When it is called it is expected that you deserialize a T from your input and return it, or None if it's the end of the sequence.
In practice this usually means you call T::deserialize repeatedly (recursively depending on other bits of your deserializer). You just have to correctly figure out when there are no more elements. This might be easy or hard depending on the data format you're deserializing.
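The "fallible iterator" shape described above can be sketched without serde at all. MiniSeqAccess, CountedBytes and collect_all are hypothetical names for this illustration; unlike the real SeqAccess, the element type is fixed to u8 and the error is a plain String.

```rust
/// Hypothetical, simplified analogue of `SeqAccess`: like `Iterator::next`, but fallible.
trait MiniSeqAccess {
    fn next_element(&mut self) -> Result<Option<u8>, String>;
}

/// Yields `remaining` single-byte elements taken from the front of `input`.
struct CountedBytes<'a> {
    input: &'a [u8],
    remaining: usize,
}

impl<'a> MiniSeqAccess for CountedBytes<'a> {
    fn next_element(&mut self) -> Result<Option<u8>, String> {
        if self.remaining == 0 {
            return Ok(None); // end of the sequence
        }
        match self.input.split_first() {
            Some((&first, rest)) => {
                self.input = rest;
                self.remaining -= 1;
                Ok(Some(first))
            }
            None => Err("input ended before the sequence did".to_string()),
        }
    }
}

/// Plays the visitor's role: pull elements until `None` or an error.
fn collect_all<A: MiniSeqAccess>(mut seq: A) -> Result<Vec<u8>, String> {
    let mut values = Vec::new();
    while let Some(value) = seq.next_element()? {
        values.push(value);
    }
    Ok(values)
}
```

Here the end of the sequence is known from a counter. A format without a length prefix would instead have to recognise a terminator in the input.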
Example data format
Let's invent a very trivial data format to demonstrate sequences. Our format is simply going to be a list of integers, where each integer is 3 bytes, with a single byte at the front to tell us the length of the sequence. That's it.
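Before bringing serde into it, the format can be decoded by hand. The sketch below assumes the integers are big-endian (which matches the shifting used later in the article) and that trailing bytes after the last element are simply ignored; decode_three_byte_format is a name made up for this example.

```rust
/// Decode the toy format directly: one length byte, then that many
/// 3-byte big-endian integers. Returns `None` if the input is too short.
fn decode_three_byte_format(input: &[u8]) -> Option<Vec<i32>> {
    let (&len, mut rest) = input.split_first()?;
    let mut values = Vec::with_capacity(len as usize);
    for _ in 0..len {
        if rest.len() < 3 {
            return None;
        }
        let (chunk, tail) = rest.split_at(3);
        // Most significant byte first.
        let value = ((chunk[0] as i32) << 16) | ((chunk[1] as i32) << 8) | (chunk[2] as i32);
        values.push(value);
        rest = tail;
    }
    Some(values)
}
```

The serde-based deserializer built in the rest of the article should produce the same values for the same bytes.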
We're going to make the following work:
#[test]
fn three_byte_format() {
    let data = [
        3, // single byte for length
        0, 0, 1, // first 3 byte int
        0, 0, 2,
        0, 0, 3,
    ];
    let mut deserializer = Deserializer::from_bytes(&data);
    let res = Vec::<i32>::deserialize(&mut deserializer).unwrap();
    assert_eq!(&[1, 2, 3], res.as_slice());
}
Let's get the boilerplate out of the way. We need to set up our own custom error, which is just going to be a wrapped String. We aren't valuing good errors in our toy format, sorry:
use std::io::Read;

use serde::de::{Error as _, SeqAccess, Visitor};

#[derive(Debug)]
struct Error(String);

impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.0)
    }
}

impl std::error::Error for Error {}

impl serde::de::Error for Error {
    fn custom<T>(msg: T) -> Self
    where
        T: std::fmt::Display,
    {
        Error(msg.to_string())
    }
}

struct Deserializer<'de> {
    input: &'de [u8],
}

impl<'de> Deserializer<'de> {
    pub fn from_bytes(input: &'de [u8]) -> Self {
        Self { input }
    }
}

impl<'de, 'a> serde::de::Deserializer<'de> for &'a mut Deserializer<'de> {
    serde::forward_to_deserialize_any! {
        bool i8 i16 i32 i64 u8 u16 u32 u64 f32 f64 char str string
        byte_buf option unit unit_struct newtype_struct tuple tuple_struct
        seq map struct enum identifier ignored_any bytes
    }

    type Error = Error;

    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: serde::de::Visitor<'de>,
    {
        use serde::de::Error;
        Err(Self::Error::custom("expected sequence"))
    }
}
This is pretty much the minimum required to make things compile. There are a few things to note here:
- serde has its own Error trait we need to implement, which requires a custom method. This lets serde report deserialization errors, e.g. 'expected seq'.
- Our deserializer just takes some bytes. Most real deserializers allow you to provide anything that implements Read. The general approach is the same.
- We use forward_to_deserialize_any! to implement all of the required methods. Our 'any' just errors.
- We're implementing Deserializer for &'a mut Deserializer, not the value itself. This is crucial later.
The first thing is to actually implement deserialize_seq, since that is what will be called when deserializing a Vec<T>. We add the below implementation and remove seq from forward_to_deserialize_any!.
fn deserialize_seq<V>(mut self, visitor: V) -> Result<V::Value, Self::Error>
where
    V: Visitor<'de>,
{
    let mut length = [0u8; 1];
    self.input
        .read_exact(&mut length)
        .map_err(|_| Error::custom("read error"))?;
    let length = length[0] as usize;

    visitor.visit_seq(OurSeqAccess {
        inner: &mut self,
        remaining_length: length,
    })
}
The main thing we need to do here is get the length of the sequence that's coming, and get the input to the point where the next thing to be deserialized is the first element. In our format these are one and the same. For some formats the length won't exist ahead of time at all (like an array in JSON). As long as we can find the end it's fine (so ] in JSON).
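Reading the length and positioning the input happen in one step because std::io::Read is implemented for &[u8], and reading from a slice shrinks it from the front. A small standalone sketch of just that step, with read_length_prefix being a hypothetical helper name:

```rust
use std::io::Read;

/// Read the one-byte length prefix, leaving `input` positioned at the first element.
fn read_length_prefix(input: &mut &[u8]) -> std::io::Result<usize> {
    let mut length = [0u8; 1];
    // Reading from a `&[u8]` advances the slice past the bytes that were read.
    input.read_exact(&mut length)?;
    Ok(length[0] as usize)
}
```

After a successful call the slice no longer contains the length byte, so the next read starts at the first element.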
We create a new type that implements SeqAccess that we're calling OurSeqAccess. This stores the length and the deserializer. Here's the implementation:
struct OurSeqAccess<'a, 'de> {
    inner: &'a mut Deserializer<'de>,
    remaining_length: usize,
}

impl<'a, 'de> SeqAccess<'de> for OurSeqAccess<'a, 'de> {
    type Error = Error;

    fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Self::Error>
    where
        T: serde::de::DeserializeSeed<'de>,
    {
        if self.remaining_length > 0 {
            self.remaining_length -= 1;
            let el = seed.deserialize(&mut *self.inner)?;
            Ok(Some(el))
        } else {
            Ok(None)
        }
    }
}
Some important details
We need the second lifetime 'a for the lifetime of the deserializer itself. The 'de lifetime is that of the input data.
This code is why implementing Deserializer on the mutable reference is crucial. If we implemented it on our deserializer directly, we would have to pass a clone to seed.deserialize. This would be bad, because when we consume input, it wouldn't be visible outside of the current function call (the slice in the original deserializer would not change). By implementing on the &mut we can continue to consume input.
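The difference between consuming through a clone and consuming through &mut can be shown in isolation. Cursor below is a hypothetical stand-in for the article's Deserializer, reduced to the input slice.

```rust
/// Hypothetical stand-in for the deserializer: just the remaining input.
#[derive(Clone)]
struct Cursor<'de> {
    input: &'de [u8],
}

impl<'de> Cursor<'de> {
    /// Remove and return the first byte of the remaining input.
    fn take_byte(&mut self) -> Option<u8> {
        let (&first, rest) = self.input.split_first()?;
        self.input = rest;
        Some(first)
    }
}

/// Works on a clone: the consumed input is forgotten when the clone is dropped.
fn read_via_clone(cursor: &Cursor) -> Option<u8> {
    let mut copy = cursor.clone();
    copy.take_byte()
}

/// Works through `&mut`: the caller sees the input advance.
fn read_via_mut(cursor: &mut Cursor) -> Option<u8> {
    cursor.take_byte()
}
```

Calling read_via_clone twice returns the same first byte both times, whereas read_via_mut returns successive bytes, which is the behaviour a sequence of elements needs.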
Back to our implementation...
So! All we're doing in our next_element_seed is checking there are more elements to read, then calling seed.deserialize with our deserializer. At this point we do not know the concrete type we are currently deserializing, we just have T. By calling deserialize on this T, we will call the relevant deserialize_* method.
Unlike the visitor, each deserialize_* method does not have a default implementation. So to support i32 we need to implement exactly deserialize_i32 on our deserializer (removing it from the forward_to_deserialize_any! macro), like so:
fn deserialize_i32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
    V: Visitor<'de>,
{
    let mut buf = [0; 3];
    match self.input.read_exact(&mut buf) {
        Ok(_) => {
            // convert buffer to integer.
            let mut i = (buf[0] as i32) << 16;
            i += (buf[1] as i32) << 8;
            i += buf[2] as i32;
            visitor.visit_i32(i)
        }
        Err(_) => Err(Error::custom("could not read next integer")),
    }
}
Here we try to read exactly 3 bytes. If we're successful we do some bit shifting to create an i32 and call the visit_i32 method with it.
It is at this point where a visitor implementation might not have this method implemented, and it would be forwarded to visit_i64 thanks to the default implementation.
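As an aside, the same bit shifting can be written with the standard library's big-endian conversion. This is an alternative sketch rather than what the article's code does; both helper names are made up for the example.

```rust
/// The shifting from the text, written as a free function.
fn from_shifts(buf: [u8; 3]) -> i32 {
    ((buf[0] as i32) << 16) + ((buf[1] as i32) << 8) + (buf[2] as i32)
}

/// Same result via the standard library: pad to 4 bytes, decode as big-endian.
fn from_be_padded(buf: [u8; 3]) -> i32 {
    i32::from_be_bytes([0, buf[0], buf[1], buf[2]])
}
```

Neither version sign-extends, so the three bytes always decode to a value between 0 and 16,777,215.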
Our test now passes!
#[test]
fn three_byte_format() {
    let data = [
        3, // single byte for length
        0, 0, 1, // first 3 byte int
        0, 0, 2,
        0, 0, 3,
    ];
    let mut deserializer = Deserializer::from_bytes(&data);
    let res = Vec::<i32>::deserialize(&mut deserializer).unwrap();
    assert_eq!(&[1, 2, 3], res.as_slice());
}

$ cargo test
...
running 1 test
test tests::three_byte_format ... ok
Beyond a toy
If we changed the Vec<i32> in the test to Vec<i64>, then it would fail, because our deserializer does not implement deserialize_i64 (or rather, it does, but it forwards to deserialize_any, which errors).
In a more robust implementation, you would implement all of the numeric methods on the deserializer.
Maps are the most complex part to handle, and I'm going to leave quite a lot of it as an exercise for the reader. The structure and handling is very similar to the seq stuff we've just implemented, but with keys and values.
Here's a trimmed version of the MapAccess trait:
pub trait MapAccess<'de> {
    type Error: Error;

    fn next_key_seed<K>(&mut self, seed: K) -> Result<Option<K::Value>, Self::Error>
    where
        K: DeserializeSeed<'de>;

    fn next_value_seed<V>(&mut self, seed: V) -> Result<V::Value, Self::Error>
    where
        V: DeserializeSeed<'de>;

    //...
}
Similar to needing to implement next_element_seed for SeqAccess, we need to implement next_key_seed and next_value_seed.
We'd create an object implementing MapAccess in the deserialize_map method of our deserializer. We'd set up the input to be ready to deserialize the first key of the map. Somewhere in these next methods you would need to get the input ready for the first value, then the second key, etc.
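To give a feel for the key/value alternation, here is a std-only sketch. The map layout is invented for this example and is not part of the article's format: a count byte, followed by that many entries of one key byte and one value byte. ToyMapAccess and collect_pairs are hypothetical names, and the methods only mirror the shape of next_key_seed and next_value_seed.

```rust
/// Hypothetical map layout (not from the article): a count byte, then
/// `count` entries of one key byte followed by one value byte.
struct ToyMapAccess<'de> {
    input: &'de [u8],
    remaining: usize,
}

impl<'de> ToyMapAccess<'de> {
    /// Read the count and leave the input positioned at the first key.
    fn new(input: &'de [u8]) -> Option<Self> {
        let (&count, rest) = input.split_first()?;
        Some(ToyMapAccess { input: rest, remaining: count as usize })
    }

    fn take_byte(&mut self) -> Result<u8, String> {
        match self.input.split_first() {
            Some((&byte, rest)) => {
                self.input = rest;
                Ok(byte)
            }
            None => Err("unexpected end of input".to_string()),
        }
    }

    /// Analogue of `next_key_seed`: `Ok(None)` marks the end of the map.
    fn next_key(&mut self) -> Result<Option<u8>, String> {
        if self.remaining == 0 {
            return Ok(None);
        }
        self.remaining -= 1;
        self.take_byte().map(Some)
    }

    /// Analogue of `next_value_seed`: reading the key left the input at the value.
    fn next_value(&mut self) -> Result<u8, String> {
        self.take_byte()
    }
}

/// Plays the visitor's role: alternate key and value until the keys run out.
fn collect_pairs(mut map: ToyMapAccess) -> Result<Vec<(u8, u8)>, String> {
    let mut pairs = Vec::new();
    while let Some(key) = map.next_key()? {
        let value = map.next_value()?;
        pairs.push((key, value));
    }
    Ok(pairs)
}
```

In this layout reading a key automatically leaves the input at its value. A textual format would also have to skip separators between the two.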
These are the major parts for creating a deserializer. I would recommend looking at the serde docs for a deserializer for more, as well as looking at existing implementations like serde_json. My own fastnbt crate also has a deserializer for the Minecraft NBT format, which might be simpler due to only working with byte slices.
Thanks for reading! Hope it was helpful.
Footnotes
[1] serde does support u128 behind a feature flag. It does not do float conversion for 128-bit integers by default.
Understanding Rust's serde using macro expansion
2021 Jul 23
While I was writing fastnbt, I struggled to find an in depth explanation of how to write a deserializer with serde. I want to explore how serde works using cargo-expand.
This article expects familiarity with Rust, and at least a little experience using the de facto serialization/deserialization library serde.
Expansive mess
cargo expand is a custom subcommand for Cargo that lets you print the results of expanding a macro. Let's try it for a simple Deserialize macro:
#[derive(Deserialize)]
struct Human {
    name: String,
}
Here we simply have a Human struct that contains a name. We derive an implementation of the Deserialize trait. If we run cargo expand...
cargo install cargo-expand
cargo expand
...then we get the incredibly short... (don't spend time looking at this)
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
#[allow(unused_extern_crates, clippy::useless_attribute)]
extern crate serde as _serde;
#[automatically_derived]
impl<'de> _serde::Deserialize<'de> for Human {
fn deserialize<__D>(__deserializer: __D) -> _serde::__private::Result<Self, __D::Error>
where
__D: _serde::Deserializer<'de>,
{
#[allow(non_camel_case_types)]
enum __Field {
__field0,
__ignore,
}
struct __FieldVisitor;
impl<'de> _serde::de::Visitor<'de> for __FieldVisitor {
type Value = __Field;
fn expecting(
&self,
__formatter: &mut _serde::__private::Formatter,
) -> _serde::__private::fmt::Result {
_serde::__private::Formatter::write_str(__formatter, "field identifier")
}
fn visit_u64<__E>(self, __value: u64) -> _serde::__private::Result<Self::Value, __E>
where
__E: _serde::de::Error,
{
match __value {
0u64 => _serde::__private::Ok(__Field::__field0),
_ => _serde::__private::Ok(__Field::__ignore),
}
}
fn visit_str<__E>(
self,
__value: &str,
) -> _serde::__private::Result<Self::Value, __E>
where
__E: _serde::de::Error,
{
match __value {
"name" => _serde::__private::Ok(__Field::__field0),
_ => _serde::__private::Ok(__Field::__ignore),
}
}
fn visit_bytes<__E>(
self,
__value: &[u8],
) -> _serde::__private::Result<Self::Value, __E>
where
__E: _serde::de::Error,
{
match __value {
b"name" => _serde::__private::Ok(__Field::__field0),
_ => _serde::__private::Ok(__Field::__ignore),
}
}
}
impl<'de> _serde::Deserialize<'de> for __Field {
#[inline]
fn deserialize<__D>(
__deserializer: __D,
) -> _serde::__private::Result<Self, __D::Error>
where
__D: _serde::Deserializer<'de>,
{
_serde::Deserializer::deserialize_identifier(__deserializer, __FieldVisitor)
}
}
struct __Visitor<'de> {
marker: _serde::__private::PhantomData<Human>,
lifetime: _serde::__private::PhantomData<&'de ()>,
}
impl<'de> _serde::de::Visitor<'de> for __Visitor<'de> {
type Value = Human;
fn expecting(
&self,
__formatter: &mut _serde::__private::Formatter,
) -> _serde::__private::fmt::Result {
_serde::__private::Formatter::write_str(__formatter, "struct Human")
}
#[inline]
fn visit_seq<__A>(
self,
mut __seq: __A,
) -> _serde::__private::Result<Self::Value, __A::Error>
where
__A: _serde::de::SeqAccess<'de>,
{
let __field0 =
match match _serde::de::SeqAccess::next_element::<String>(&mut __seq) {
_serde::__private::Ok(__val) => __val,
_serde::__private::Err(__err) => {
return _serde::__private::Err(__err);
}
} {
_serde::__private::Some(__value) => __value,
_serde::__private::None => {
return _serde::__private::Err(_serde::de::Error::invalid_length(
0usize,
&"struct Human with 1 element",
));
}
};
_serde::__private::Ok(Human { name: __field0 })
}
#[inline]
fn visit_map<__A>(
self,
mut __map: __A,
) -> _serde::__private::Result<Self::Value, __A::Error>
where
__A: _serde::de::MapAccess<'de>,
{
let mut __field0: _serde::__private::Option<String> = _serde::__private::None;
while let _serde::__private::Some(__key) =
match _serde::de::MapAccess::next_key::<__Field>(&mut __map) {
_serde::__private::Ok(__val) => __val,
_serde::__private::Err(__err) => {
return _serde::__private::Err(__err);
}
}
{
match __key {
__Field::__field0 => {
if _serde::__private::Option::is_some(&__field0) {
return _serde::__private::Err(
<__A::Error as _serde::de::Error>::duplicate_field("name"),
);
}
__field0 = _serde::__private::Some(
match _serde::de::MapAccess::next_value::<String>(&mut __map) {
_serde::__private::Ok(__val) => __val,
_serde::__private::Err(__err) => {
return _serde::__private::Err(__err);
}
},
);
}
_ => {
let _ = match _serde::de::MapAccess::next_value::<
_serde::de::IgnoredAny,
>(&mut __map)
{
_serde::__private::Ok(__val) => __val,
_serde::__private::Err(__err) => {
return _serde::__private::Err(__err);
}
};
}
}
}
let __field0 = match __field0 {
_serde::__private::Some(__field0) => __field0,
_serde::__private::None => {
match _serde::__private::de::missing_field("name") {
_serde::__private::Ok(__val) => __val,
_serde::__private::Err(__err) => {
return _serde::__private::Err(__err);
}
}
}
};
_serde::__private::Ok(Human { name: __field0 })
}
}
const FIELDS: &'static [&'static str] = &["name"];
_serde::Deserializer::deserialize_struct(
__deserializer,
"Human",
FIELDS,
__Visitor {
marker: _serde::__private::PhantomData::<Human>,
lifetime: _serde::__private::PhantomData,
},
)
}
}
};
We can add some clarity here by
- Replacing private aliases with the more expected form. So _serde::__private::Result is actually std::result::Result.
- Renaming type parameters to be easier on the eyes, like __D to just D.
- Removing some of the annotations like #[automatically_derived].
- Removing the wrapping scope, i.e. const _: () = {...}.
- Moving nested struct and impl blocks to the top level.
The macro does these things to isolate the expanded code from the code around it, preventing the expanded code from affecting yours, and yours from affecting the expanded code.
A Reddit user pointed out that the wrapping scope was introduced because of GitHub serde issue 159.
There are quite a few types and implementations created by this expansion. Below is a quick summary:
- impl Deserialize for Human: This is exactly what we wanted to derive.
- struct HumanVisitor: A visitor that gets called by the deserializer. Its job is to produce the Human value.
- enum Field: This enum represents the fields of our struct Human. In our case it simply contains field0 for 'name', and an ignore variant.
- struct FieldVisitor: A visitor purely to check that identifier-like values produced by the deserializer match our fields.
Deserialize implementation
After all that clean up for human eyes, here's our Deserialize implementation:
impl<'de> serde::Deserialize<'de> for Human {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        const FIELDS: &'static [&'static str] = &["name"];
        serde::Deserializer::deserialize_struct(
            deserializer,
            "Human",
            FIELDS,
            HumanVisitor {
                marker: PhantomData::<Human>,
                lifetime: PhantomData,
            },
        )
    }
}
We can see here that this just delegates to the deserialize_struct method, passing some extra information like the names of our fields, and a visitor that was also generated by the macro. Nothing too complicated here. What's that HumanVisitor?
Our visitor
Here's our visitor with some code snipped out for brevity:
struct HumanVisitor<'de> {
    marker: PhantomData<Human>,
    lifetime: PhantomData<&'de ()>,
}

impl<'de> serde::de::Visitor<'de> for HumanVisitor<'de> {
    type Value = Human;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        fmt::Formatter::write_str(formatter, "struct Human")
    }

    fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
    where
        A: SeqAccess<'de>,
    {
        // ...
    }

    fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
    where
        A: MapAccess<'de>,
    {
        // ...
    }
}
The purpose of a visitor is to be driven by the deserializer, constructing the values as it goes. This visitor in particular is expecting to have methods called that can be converted to our Human structure. It would make no sense if the visitor received an integer type, because that does not represent a structure.
The visitor only implements the methods that make sense for the type it is expecting. For structs, you would generally expect a map of key-value pairs. This is why our visitor implements visit_map. It also implements visit_seq (seq for sequence); this supports formats that encode the values in order, skipping keys.
Default method implementations
The serde::de::Visitor trait has default implementations of each method, which simply raise an error or forward the call on to another method. Here's the default implementation of visit_bool:
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
where
    E: Error,
{
    Err(Error::invalid_type(Unexpected::Bool(v), &self))
}

So if the deserializer calls the visitor with a bool, but it does not implement the method, by default you will get an error. Some method implementations add convenience for the common cases, like visit_u8, which forwards on to visit_u64:

fn visit_u8<E>(self, v: u8) -> Result<Self::Value, E>
where
    E: Error,
{
    self.visit_u64(v as u64)
}

This makes some sense. If you are making a visitor that accepts unsigned integer types, you can implement visit_u64 and handle unsigned integers of smaller types like u32 for free. However, this can make untagged enums containing these forwarded types difficult, forcing us to create stricter deserializers.
Map access
Let's take a look at the code for visit_map:
fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
where
    A: MapAccess<'de>,
{
    let mut field0: Option<String> = None;

    while let Some(key) = MapAccess::next_key::<Field>(&mut map)? {
        match key {
            Field::field0 => {
                if Option::is_some(&field0) {
                    return Err(<A::Error as Error>::duplicate_field("name"));
                }
                field0 = Some(MapAccess::next_value::<String>(&mut map)?);
            }
            _ => {
                let _ = MapAccess::next_value::<serde::de::IgnoredAny>(&mut map)?;
            }
        }
    }

    let field0 = match field0 {
        Some(field0) => field0,
        None => serde::__private::de::missing_field("name")?,
    };

    Ok(Human { name: field0 })
}
I've changed the original expanded code to take advantage of the ? operator. The meaning is near-identical and makes it smaller.
This function iterates through the given map's keys and values, looking for any of the fields of the structure. In our case we have the single field 'name', which is encoded in the Field type that we will look at in a second.
For each key, it checks if it is one of our fields. If it is, it tries to get the value as the expected String type. It ignores fields that are not in our structure. Finally it checks it has all the required fields and returns our Human.
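The same control flow can be tried out without serde. In the sketch below, the map is replaced by a slice of already-decoded string pairs, and errors are plain Strings. human_from_entries is a hypothetical function written for this illustration, not something the derive macro generates.

```rust
#[derive(Debug, PartialEq)]
struct Human {
    name: String,
}

/// Hypothetical stand-in for the derived `visit_map`: entries arrive as
/// already-decoded `(key, value)` string pairs instead of a `MapAccess`.
fn human_from_entries(entries: &[(&str, &str)]) -> Result<Human, String> {
    let mut name: Option<String> = None;

    for &(key, value) in entries {
        match key {
            "name" => {
                if name.is_some() {
                    return Err("duplicate field `name`".to_string());
                }
                name = Some(value.to_string());
            }
            _ => {} // keys that are not fields of `Human` are ignored
        }
    }

    match name {
        Some(name) => Ok(Human { name: name }),
        None => Err("missing field `name`".to_string()),
    }
}
```

Unknown keys are skipped, a repeated name is rejected, and a missing name is reported only after every entry has been seen.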
The Field types
The Field and related types look like this:
enum Field {
    field0,
    ignore,
}

impl<'de> serde::Deserialize<'de> for Field {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        Deserializer::deserialize_identifier(deserializer, FieldVisitor)
    }
}

struct FieldVisitor;

impl<'de> serde::de::Visitor<'de> for FieldVisitor {
    type Value = Field;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        fmt::Formatter::write_str(formatter, "field identifier")
    }

    fn visit_u64<E>(self, value: u64) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            0u64 => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }

    fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            "name" => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }

    fn visit_bytes<E>(self, value: &[u8]) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            b"name" => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }
}
We have a very similar situation to our Human visitor here. Deserialize is implemented for the Field enum, and it just passes on to deserialize_identifier, passing the FieldVisitor along.
This FieldVisitor expects the deserializer to call methods on it that 'look like' field identifiers. So it implements visit_str and visit_bytes, both of which see if the deserialized value looks like one of our fields. If it doesn't, the field gets deserialized to the special Field::ignore variant.
There is also the visit_u64 method, which allows the field name to be the number of the field; zero in our case.
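The three ways of naming a field can be condensed into one std-only function. Identifier and identify are hypothetical names for this sketch, and the Field variants are renamed to follow Rust naming conventions, so this mirrors the generated visitor's logic rather than its exact shape.

```rust
#[derive(Debug, PartialEq)]
enum Field {
    Name,
    Ignore,
}

/// Hypothetical identifier forms a format might hand to the field visitor.
enum Identifier<'a> {
    Str(&'a str),
    Bytes(&'a [u8]),
    Index(u64),
}

/// Map any identifier form onto a field, falling back to `Ignore`.
fn identify(id: Identifier) -> Field {
    match id {
        Identifier::Str("name") | Identifier::Bytes(b"name") | Identifier::Index(0) => Field::Name,
        _ => Field::Ignore,
    }
}
```

A textual format would typically produce the Str form, a binary format might produce Bytes, and a format that stores fields by position could use Index.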
Conclusion
This was a brief look into how serde deserializes data into values. If you would like more detailed information about this, let me know via whatever medium you found this on.
Some things I think I would like to expand upon are:
- How optional types, nested structures, enums, bytes, and strings are handled.
- How to write a deserializer for a data format.
- How borrowing from underlying data is provided.
Thanks for reading!