OwenGage.com

source link: https://owengage.com/writing/

Articles

Exploring serde's data model with a toy deserializer

2021 Aug 14

Previously I did a shallow dive into understanding serde by expanding the Deserialize macro. This time I'll go deeper and map how each type in serde's data model is treated.

This is written for people intending to write their own deserializer for serde, but it may interest a wider audience. Later in the article we will implement a toy deserializer for a made-up format.

There are three major categories of types to consider:

  1. Sequences of values (think Vec, slice, tuple)
  2. Maps from keys to values (think HashMap, struct)
  3. Everything else

The 'Everything else' turns out to be the easiest, so let's start there.

Table of everything else

The basic steps followed between you and serde when writing a deserializer are:

  1. Deserialize::deserialize is called on some type.
  2. This calls a deserialize_* method on your deserializer, passing you a Visitor.
  3. You call appropriate visit_* methods for that call.

For example, calling deserialize on a String calls deserialize_string on your deserializer. You then deserialize a string from your input and call visit_string on the visitor you were passed.
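The handshake above can be sketched without serde at all. The traits below are hypothetical, cut-down stand-ins that only borrow serde's names (no lifetimes, a String error, a single deserialize_* method), and Utf8Input is an invented deserializer whose whole input is one UTF-8 string. It is a rough model of the call order, not serde's real API:

```rust
// Hypothetical miniature of serde's three traits (not the real signatures).
trait Visitor: Sized {
    type Value;

    fn visit_str(self, v: &str) -> Result<Self::Value, String>;

    // Like serde, the owned-string method forwards to visit_str by default.
    fn visit_string(self, v: String) -> Result<Self::Value, String> {
        self.visit_str(&v)
    }
}

trait Deserializer {
    fn deserialize_string<V: Visitor>(self, visitor: V) -> Result<V::Value, String>;
}

trait Deserialize: Sized {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, String>;
}

// Step 1: the type being deserialized picks a deserialize_* method and a visitor.
struct StringVisitor;

impl Visitor for StringVisitor {
    type Value = String;

    fn visit_str(self, v: &str) -> Result<String, String> {
        Ok(v.to_owned())
    }
}

impl Deserialize for String {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, String> {
        // Step 2: this calls a deserialize_* method, handing over the visitor.
        deserializer.deserialize_string(StringVisitor)
    }
}

// Invented deserializer: the entire input is a single UTF-8 string.
struct Utf8Input<'a>(&'a [u8]);

impl<'a> Deserializer for Utf8Input<'a> {
    fn deserialize_string<V: Visitor>(self, visitor: V) -> Result<V::Value, String> {
        let text = std::str::from_utf8(self.0).map_err(|e| e.to_string())?;
        // Step 3: the deserializer answers with a visit_* call.
        visitor.visit_string(text.to_owned())
    }
}
```

With these definitions, String::deserialize(Utf8Input(b"hello")) walks all three steps and produces the string "hello", while invalid UTF-8 input surfaces as an error from the deserializer.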

When writing a deserializer I wish I knew

  • what real Rust types would end up calling my deserialize_* methods.
  • which visit_* methods my deserializer should call in response.

Before answering these questions, let's look at the Visitor trait. This trait has default implementations of each method. Some return an error, and others forward the call to another visitor method.

This will be important later, since it means our deserializer can call a larger set of methods than you would naïvely expect.

Here's a summary of all the forwarded methods:

Visitor method         Forwarded to
visit_i8               visit_i64
visit_i16              visit_i64
visit_i32              visit_i64
visit_u8               visit_u64
visit_u16              visit_u64
visit_u32              visit_u64
visit_f32              visit_f64
visit_char             visit_str
visit_borrowed_str     visit_str
visit_string           visit_str
visit_borrowed_bytes   visit_bytes
visit_byte_buf         visit_bytes

(For completeness, the methods that do not forward are visit_bool, visit_i64, visit_u64, visit_f64, visit_str, visit_bytes, visit_none, visit_some, visit_unit, visit_newtype_struct, visit_seq, visit_map, and visit_enum.)

Okay, back to who calls and expects what.

This table is mostly put together from impls.rs in serde. Each row has a Rust type that is being deserialized, the deserialize call that will be called on the deserializer, and finally the acceptable method calls on the visitor passed to the deserializer.

Type being deserialized   Deserializer method called   Expected Visitor method calls
struct field identifier   deserialize_identifier       visit_str, visit_bytes, and also visit_u64 where the number is the field number (forwarded from: visit_char, visit_borrowed_str, visit_string, visit_borrowed_bytes, visit_byte_buf, visit_{u8,16,32})
bool                      deserialize_bool             visit_bool
()                        deserialize_unit             visit_unit
i8                        deserialize_i8               visit_i{8,16,32,64} and visit_u{8,16,32,64}, with fallible conversions where the value is out of range
i16                       deserialize_i16              (same as i8)
i32                       deserialize_i32              (same as i8)
i64                       deserialize_i64              (same as i8)
isize                     deserialize_i64              (same as i8)
u8                        deserialize_u8               (same as i8)
u16                       deserialize_u16              (same as i8)
u32                       deserialize_u32              (same as i8)
u64                       deserialize_u64              (same as i8)
usize                     deserialize_u64              (same as i8)
f32                       deserialize_f32              visit_f64, visit_f32, visit_i{8,16,32,64} and visit_u{8,16,32,64}, with infallible conversions
f64                       deserialize_f64              (same as f32)
char                      deserialize_char             visit_char, and visit_str erroring if the string length is not exactly 1 (forwarded from: visit_borrowed_str, visit_string)
String                    deserialize_string           visit_str, visit_string, visit_bytes (erroring if not valid UTF-8), visit_byte_buf (erroring if not valid UTF-8, stealing the Vec<u8>'s allocation) (forwarded from: visit_char, visit_borrowed_str, visit_borrowed_bytes)
&'a str                   deserialize_str              visit_borrowed_str, visit_borrowed_bytes (erroring if not valid UTF-8)
&'a [u8]                  deserialize_bytes            visit_borrowed_str, visit_borrowed_bytes
Option<T>                 deserialize_option           visit_some, visit_none, visit_unit
PhantomData<T>            deserialize_unit_struct      visit_unit
Vec<T>                    deserialize_seq              visit_seq
[T; N]                    deserialize_tuple (!)        visit_seq
tuples, eg (T, U)         deserialize_tuple            visit_seq
BTreeMap<K, V>            deserialize_map              visit_map
HashMap<K, V>             deserialize_map              visit_map

Most of this is unsurprising. Some things that stick out to me are:

  • an array causes deserialize_tuple to be called. I guess it is similar to tuples in that arrays and tuples are fixed sizes, unlike a Vec.
  • conversions to floating point use the as keyword to convert from integer types. The reference shows that this will convert to the nearest float, and that overflow to infinity can only happen for u128 to f32 (see footnote 1). So this conversion seems sensible.
  • you cannot visit_bytes for deserializing a char like you would for a String.
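To make the point about as conversions concrete, here is a small std-only sketch. The helper names are made up for illustration; each one is nothing more than a plain as cast, which is the conversion the list above describes:

```rust
// Hypothetical helpers: each is just an `as` cast from an integer to a float.
fn i32_as_f32(v: i32) -> f32 {
    v as f32
}

fn i64_as_f64(v: i64) -> f64 {
    v as f64
}

fn u64_as_f32(v: u64) -> f32 {
    v as f32
}

fn u128_as_f32(v: u128) -> f32 {
    v as f32
}
```

The cast never fails, but it can lose precision: i32_as_f32(16_777_217) rounds to 16_777_216.0 because f32 only has 24 bits of mantissa, and i64_as_f64(i64::MAX) rounds up to 2^63. u64_as_f32(u64::MAX) is still finite, while u128_as_f32(u128::MAX) is the one case that overflows to infinity.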

Sequences

Sequences are the next level up in difficulty for our category of types.

When a type like a Vec<T> is deserialized, its Deserialize implementation calls deserialize_seq on the deserializer, passing in a visitor with visit_seq implemented.

Here's the actual implementation in impls.rs:

impl<'de, T> Deserialize<'de> for Vec<T>
where
    T: Deserialize<'de>,
{
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct VecVisitor<T> {
            marker: PhantomData<T>,
        }

        impl<'de, T> Visitor<'de> for VecVisitor<T>
        where
            T: Deserialize<'de>,
        {
            type Value = Vec<T>;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("a sequence")
            }

            fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: SeqAccess<'de>,
            {
                let mut values = Vec::with_capacity(size_hint::cautious(seq.size_hint()));

                while let Some(value) = try!(seq.next_element()) { // ❷
                    values.push(value);
                }

                Ok(values)
            }
        }

        let visitor = VecVisitor {
            marker: PhantomData,
        };
        deserializer.deserialize_seq(visitor) // ❶
    }

    // ... snipped out deserialize_in_place
}

The steps here are:

  1. deserialize_seq gets called with a visitor. ❶
  2. The deserializer sets up an object implementing SeqAccess, and calls visit_seq with it.
  3. The visitor repeatedly calls next_element on the SeqAccess object until None is returned (or it errors). ❷

This is much like with the Iterator trait, but fallible. Some other stuff happens, like the visitor asking for a hint to the size for efficiency, but a SeqAccess object doesn't have to provide this.
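The "fallible iterator" shape can be sketched on its own, without serde. ByteSeq below is a hypothetical stand-in for a SeqAccess implementation: it yields one u8 per element, and the rule that a 0xFF byte is an error is invented purely so there is a failure case to show:

```rust
// Hypothetical stand-in for a SeqAccess: each input byte is one element.
struct ByteSeq<'a> {
    input: &'a [u8],
}

impl<'a> ByteSeq<'a> {
    // Ok(Some(_)) = an element, Ok(None) = end of sequence, Err(_) = bad input.
    fn next_element(&mut self) -> Result<Option<u8>, String> {
        match self.input.split_first() {
            None => Ok(None),
            // Invented rule, only here to demonstrate the error path.
            Some((&0xFF, _)) => Err("0xFF is not a valid element".to_owned()),
            Some((&first, rest)) => {
                self.input = rest;
                Ok(Some(first))
            }
        }
    }

    // Plays the role of the visitor's visit_seq loop.
    fn collect_all(mut self) -> Result<Vec<u8>, String> {
        let mut values = Vec::new();
        while let Some(value) = self.next_element()? {
            values.push(value);
        }
        Ok(values)
    }
}
```

The loop in collect_all has the same structure as the while let in VecVisitor::visit_seq: keep pulling elements until None, and let any error abort the whole sequence.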

Let's look at the SeqAccess trait:

pub trait SeqAccess<'de> {
    type Error: Error;

    fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Self::Error>
    where
        T: DeserializeSeed<'de>;

    fn next_element<T>(&mut self) -> Result<Option<T>, Self::Error>
    where
        T: Deserialize<'de>,
    {
        self.next_element_seed(PhantomData)
    }

    fn size_hint(&self) -> Option<usize> {
        None
    }
}

The only method without a default implementation is next_element_seed. If you're writing a deserializer, this is what you need to implement. When it is called it is expected that you deserialize a T from your input and return it, or None if it's the end of the sequence.

In practice this usually means you call T::deserialize repeatedly (recursively depending on other bits of your deserializer). You just have to correctly figure out when there are no more elements. This might be easy or hard depending on the data format you're deserializing.

Example data format

Let's invent a very trivial data format to demonstrate sequences. Our format is simply going to be a list of integers, where each integer is 3 bytes, with a single byte at the front to tell us the length of the sequence. That's it.

1 byte           3 bytes              ...
length of ints   big-endian integer   repeat
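To get a feel for the layout, here is a sketch of the opposite direction: a hypothetical encode helper (not part of the article's deserializer) that writes a list of integers in this format. It assumes values are non-negative and fit in 3 bytes, and that the list has at most 255 elements; anything else returns None:

```rust
// Hypothetical encoder for the toy format: [length byte][3-byte big-endian ints...].
fn encode(values: &[u32]) -> Option<Vec<u8>> {
    // The length has to fit in the single leading byte.
    if values.len() > u8::MAX as usize {
        return None;
    }
    let mut out = vec![values.len() as u8];
    for &value in values {
        // Each integer has to fit in 3 bytes.
        if value > 0x00FF_FFFF {
            return None;
        }
        // to_be_bytes gives 4 bytes; drop the most significant one.
        out.extend_from_slice(&value.to_be_bytes()[1..]);
    }
    Some(out)
}
```

For example, encode(&[1, 2, 3]) produces the ten bytes 3, 0, 0, 1, 0, 0, 2, 0, 0, 3.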

We're going to make the following work:

#[test]
fn three_byte_format() {
    let data = [
        3, // single byte for length
        0, 0, 1, // first 3 byte int
        0, 0, 2,
        0, 0, 3];

    let mut deserializer = Deserializer::from_bytes(&data);
    let res = Vec::<i32>::deserialize(&mut deserializer).unwrap();
    assert_eq!(&[1, 2, 3], res.as_slice());
}

Let's get the boilerplate out of the way. We need to set up our own custom error, which is just going to be a wrapped String. We aren't valuing good errors in our toy format, sorry:

use std::io::Read;

use serde::de::{Error as _, SeqAccess, Visitor};

#[derive(Debug)]
struct Error(String);

impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.0)
    }
}
impl std::error::Error for Error {}
impl serde::de::Error for Error {
    fn custom<T>(msg: T) -> Self
    where
        T: std::fmt::Display,
    {
        Error(msg.to_string())
    }
}

struct Deserializer<'de> {
    input: &'de [u8],
}

impl<'de> Deserializer<'de> {
    pub fn from_bytes(input: &'de [u8]) -> Self {
        Self { input }
    }
}

impl<'de, 'a> serde::de::Deserializer<'de> for &'a mut Deserializer<'de> {
    serde::forward_to_deserialize_any! {
        bool i8 i16 i32 i64 u8 u16 u32 u64 f32 f64 char str string
        byte_buf option unit unit_struct newtype_struct tuple tuple_struct
        seq map struct enum identifier ignored_any bytes
    }

    type Error = Error;

    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: serde::de::Visitor<'de>,
    {
        use serde::de::Error;
        Err(Self::Error::custom("expected sequence"))
    }
}

This is pretty much the minimum required to make things compile. There are a few things to note here:

  • serde has its own Error trait we need to implement that requires a custom method. This lets serde report deserialization errors, eg 'expected seq'.
  • Our deserializer just takes some bytes. Most real deserializers allow you to provide anything that implements Read. The general approach is the same.
  • We use forward_to_deserialize_any! to implement all of the required methods. Our 'any' just errors.
  • We're implementing Deserializer for &'a mut Deserializer, not the value itself. This is crucial later.

First thing is to actually implement deserialize_seq since that is what will be called when deserializing a Vec<T>. We add the below implementation and remove seq from forward_to_deserialize_any!.

fn deserialize_seq<V>(mut self, visitor: V) -> Result<V::Value, Self::Error>
where
    V: Visitor<'de>,
{
    let mut length = [0u8; 1];
    self.input
        .read_exact(&mut length)
        .map_err(|_| Error::custom("read error"))?;

    let length = length[0] as usize;

    visitor.visit_seq(OurSeqAccess {
        inner: &mut self,
        remaining_length: length,
    })
}

The main thing we need to do here is get the length of the sequence that's coming, and get the input to the point where the next thing to be deserialized is the first element. In our format these are one and the same step. For some formats the length won't be known ahead of time at all (like an array in JSON). As long as we can find the end, that's fine (so ] in JSON).
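The reason reading the length also positions the input is that std's Read implementation for &[u8] advances the slice past whatever was read. A minimal sketch, using a hypothetical free function read_length rather than the deserializer method itself:

```rust
use std::io::Read;

// Hypothetical helper: read the one-byte length and leave `input` at the first element.
fn read_length(input: &mut &[u8]) -> std::io::Result<usize> {
    let mut length = [0u8; 1];
    // Read for &[u8] moves the slice forward by the number of bytes consumed.
    input.read_exact(&mut length)?;
    Ok(length[0] as usize)
}
```

Given the input 3, 0, 0, 1 this returns 3 and leaves the slice pointing at 0, 0, 1. On an empty input read_exact fails, which is the case deserialize_seq maps to "read error".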

We create a new type that implements SeqAccess that we're calling OurSeqAccess. This stores the length and the deserializer. Here's the implementation:

struct OurSeqAccess<'a, 'de> {
    inner: &'a mut Deserializer<'de>,
    remaining_length: usize,
}

impl<'a, 'de> SeqAccess<'de> for OurSeqAccess<'a, 'de> {
    type Error = Error;

    fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Self::Error>
    where
        T: serde::de::DeserializeSeed<'de>,
    {
        if self.remaining_length > 0 {
            self.remaining_length -= 1;
            let el = seed.deserialize(&mut *self.inner)?;
            Ok(Some(el))
        } else {
            Ok(None)
        }
    }
}

Some important details

We need the second lifetime 'a for the lifetime of the deserializer itself. The 'de lifetime is that of the input data.

This code is why implementing Deserializer on the mutable reference is crucial. If we implemented it on our deserializer directly, we would have to pass a clone to seed.deserialize. This would be bad, because when we consume input, it wouldn't be visible outside of the current function call (the slice in the original deserializer would not change). By implementing on the &mut we can continue to consume input.
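The difference is easy to reproduce with a stripped-down model. ToyDeserializer and the two consume functions below are hypothetical names for illustration only; one function takes the deserializer by value (as a clone would be passed), the other takes a mutable reference:

```rust
// Hypothetical stripped-down deserializer: just the remaining input.
#[derive(Clone)]
struct ToyDeserializer<'de> {
    input: &'de [u8],
}

// Consumes one byte from a copy; the caller's deserializer is untouched.
fn consume_by_value<'de>(mut de: ToyDeserializer<'de>) -> Option<u8> {
    let (&first, rest) = de.input.split_first()?;
    de.input = rest;
    Some(first)
}

// Consumes one byte through the reference; the caller sees the input advance.
fn consume_by_ref<'de>(de: &mut ToyDeserializer<'de>) -> Option<u8> {
    let (&first, rest) = de.input.split_first()?;
    de.input = rest;
    Some(first)
}
```

Calling consume_by_value(de.clone()) twice returns the same first byte both times, because only the clone's slice moves. Calling consume_by_ref(&mut de) twice returns the first byte and then the second, which is the behaviour next_element_seed relies on.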

Back to our implementation...

So! All we're doing in our next_element_seed is checking that there are more elements to read, then calling seed.deserialize with our deserializer. At this point we do not know the concrete type we are currently deserializing; we just have T. By calling deserialize on this T, we end up calling the relevant deserialize_* method on our deserializer.

Unlike the visitor, each deserialize_* method does not have a default implementation. So to support i32 we need to implement exactly deserialize_i32 on our deserializer (removing it from the forward_to_deserialize_any! macro), like so:

fn deserialize_i32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
    V: Visitor<'de>,
{
    let mut buf = [0; 3];

    match self.input.read_exact(&mut buf) {
        Ok(_) => {
            // convert buffer to integer.
            let mut i = (buf[0] as i32) << 16;
            i += (buf[1] as i32) << 8;
            i += buf[2] as i32;

            visitor.visit_i32(i)
        }
        Err(_) => Err(Error::custom("could not read next integer")),
    }
}

Here we try to read exactly 3 bytes. If we're successful we do some bit shifting to create an i32 and pass it to the visitor's visit_i32 method.
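The bit shifting can be checked in isolation. The two hypothetical helpers below decode the same 3 bytes: decode_shift mirrors the shifts used in deserialize_i32, and decode_from_be is an alternative that pads a zero byte on the front and uses the standard i32::from_be_bytes:

```rust
// Mirrors the shifting in deserialize_i32.
fn decode_shift(buf: [u8; 3]) -> i32 {
    ((buf[0] as i32) << 16) + ((buf[1] as i32) << 8) + buf[2] as i32
}

// Alternative: pad to 4 bytes and let the standard library assemble the integer.
fn decode_from_be(buf: [u8; 3]) -> i32 {
    i32::from_be_bytes([0, buf[0], buf[1], buf[2]])
}
```

Both give 1 for the bytes 0, 0, 1 and 66051 for 1, 2, 3. Since the top byte is always zero, the result is never negative: the largest value, from 0xFF, 0xFF, 0xFF, is 16_777_215.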

At this point a visitor might not implement visit_i32 itself, in which case the call is forwarded to visit_i64 thanks to the default implementation.

Our test now passes!

#[test]
fn three_byte_format() {
    let data = [
        3, // single byte for length
        0, 0, 1, // first 3 byte int
        0, 0, 2,
        0, 0, 3];

    let mut deserializer = Deserializer::from_bytes(&data);
    let res = Vec::<i32>::deserialize(&mut deserializer).unwrap();
    assert_eq!(&[1, 2, 3], res.as_slice());
}

$ cargo test
...
running 1 test
test tests::three_byte_format ... ok

Beyond a toy

If we changed the Vec<i32> in the test to Vec<i64>, then it would fail, because our deserializer does not implement deserialize_i64 (or rather, it does, but it forwards to deserialize_any, which errors).

In a more robust implementation, you would implement all of the numeric methods on the deserializer.

Maps are the most complex part to handle, and I'm going to leave quite a lot of it as an exercise to the reader. The structure and handling is very similar to the seq stuff we've just implemented, but with keys and values.

Here's a trimmed version of the MapAccess trait:

pub trait MapAccess<'de> {
    type Error: Error;

    fn next_key_seed<K>(&mut self, seed: K) -> Result<Option<K::Value>, Self::Error>
    where
        K: DeserializeSeed<'de>;

    fn next_value_seed<V>(&mut self, seed: V) -> Result<V::Value, Self::Error>
    where
        V: DeserializeSeed<'de>;

    //...
}

Similar to needing to implement next_element_seed for SeqAccess, we need to implement next_key_seed and next_value_seed.

We'd create one of these in the deserialize_map method of our deserializer. We'd set up the input to be ready to deserialize the first key of the map. Somewhere in these next methods you would need to get the input ready for the first value, then the second key, etc.
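As a rough illustration of that key/value alternation, here is a std-only sketch. Everything in it is an assumption made for the example: the map layout (a count byte, then pairs of a one-byte key and a 3-byte big-endian value) is invented as an extension of the toy format, and ToyMapAccess only imitates the shape of MapAccess with concrete types instead of seeds:

```rust
use std::io::Read;

// Hypothetical layout: [pair count][key byte][3-byte value][key byte][3-byte value]...
struct ToyMapAccess<'a> {
    input: &'a [u8],
    remaining: usize,
}

impl<'a> ToyMapAccess<'a> {
    // Plays the role of deserialize_map: read the count, leave input at the first key.
    fn new(mut input: &'a [u8]) -> std::io::Result<Self> {
        let mut count = [0u8; 1];
        input.read_exact(&mut count)?;
        Ok(ToyMapAccess { input, remaining: count[0] as usize })
    }

    // Like next_key_seed: None once all pairs are consumed.
    // After a key is read, the input is positioned at its value.
    fn next_key(&mut self) -> std::io::Result<Option<u8>> {
        if self.remaining == 0 {
            return Ok(None);
        }
        self.remaining -= 1;
        let mut key = [0u8; 1];
        self.input.read_exact(&mut key)?;
        Ok(Some(key[0]))
    }

    // Like next_value_seed: after a value is read, the input is at the next key.
    fn next_value(&mut self) -> std::io::Result<i32> {
        let mut buf = [0u8; 3];
        self.input.read_exact(&mut buf)?;
        Ok(i32::from_be_bytes([0, buf[0], buf[1], buf[2]]))
    }
}
```

A real MapAccess would hold a &mut to the deserializer and call seed.deserialize for keys and values, just as OurSeqAccess does for elements. The sketch only shows how each call leaves the input ready for the next one.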

These are the major parts for creating a deserializer. I would recommend looking at the serde docs for a deserializer for more, as well as looking at existing implementations like serde_json. My own fastnbt crate also has a deserializer for the Minecraft NBT format, which might be simpler due to only working with byte slices.

Thanks for reading! Hope it was helpful.

Footnotes

  1. serde does support u128 behind a feature flag. It does not do float conversion for 128-bit integers by default.

Understanding Rust's serde using macro expansion

2021 Jul 23

While I was writing fastnbt, I struggled to find an in-depth explanation of how to write a deserializer with serde. I want to explore how serde works using cargo-expand.

This article expects familiarity with Rust, and at least a little experience using the de facto serialization/deserialization library serde.

Expansive mess

cargo expand is a custom subcommand for Cargo that lets you print the results of expanding a macro. Let's try it for a simple Deserialize macro:

#[derive(Deserialize)]
struct Human {
    name: String,
}

Here we simply have a Human struct that contains a name. We derive an implementation of the Deserialize trait. If we run cargo expand...

cargo install cargo-expand
cargo expand

...then we get the incredibly short... (don't spend time looking at this)

#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
    #[allow(unused_extern_crates, clippy::useless_attribute)]
    extern crate serde as _serde;
    #[automatically_derived]
    impl<'de> _serde::Deserialize<'de> for Human {
        fn deserialize<__D>(__deserializer: __D) -> _serde::__private::Result<Self, __D::Error>
        where
            __D: _serde::Deserializer<'de>,
        {
            #[allow(non_camel_case_types)]
            enum __Field {
                __field0,
                __ignore,
            }
            struct __FieldVisitor;
            impl<'de> _serde::de::Visitor<'de> for __FieldVisitor {
                type Value = __Field;
                fn expecting(
                    &self,
                    __formatter: &mut _serde::__private::Formatter,
                ) -> _serde::__private::fmt::Result {
                    _serde::__private::Formatter::write_str(__formatter, "field identifier")
                }
                fn visit_u64<__E>(self, __value: u64) -> _serde::__private::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        0u64 => _serde::__private::Ok(__Field::__field0),
                        _ => _serde::__private::Ok(__Field::__ignore),
                    }
                }
                fn visit_str<__E>(
                    self,
                    __value: &str,
                ) -> _serde::__private::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        "name" => _serde::__private::Ok(__Field::__field0),
                        _ => _serde::__private::Ok(__Field::__ignore),
                    }
                }
                fn visit_bytes<__E>(
                    self,
                    __value: &[u8],
                ) -> _serde::__private::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        b"name" => _serde::__private::Ok(__Field::__field0),
                        _ => _serde::__private::Ok(__Field::__ignore),
                    }
                }
            }
            impl<'de> _serde::Deserialize<'de> for __Field {
                #[inline]
                fn deserialize<__D>(
                    __deserializer: __D,
                ) -> _serde::__private::Result<Self, __D::Error>
                where
                    __D: _serde::Deserializer<'de>,
                {
                    _serde::Deserializer::deserialize_identifier(__deserializer, __FieldVisitor)
                }
            }
            struct __Visitor<'de> {
                marker: _serde::__private::PhantomData<Human>,
                lifetime: _serde::__private::PhantomData<&'de ()>,
            }
            impl<'de> _serde::de::Visitor<'de> for __Visitor<'de> {
                type Value = Human;
                fn expecting(
                    &self,
                    __formatter: &mut _serde::__private::Formatter,
                ) -> _serde::__private::fmt::Result {
                    _serde::__private::Formatter::write_str(__formatter, "struct Human")
                }
                #[inline]
                fn visit_seq<__A>(
                    self,
                    mut __seq: __A,
                ) -> _serde::__private::Result<Self::Value, __A::Error>
                where
                    __A: _serde::de::SeqAccess<'de>,
                {
                    let __field0 =
                        match match _serde::de::SeqAccess::next_element::<String>(&mut __seq) {
                            _serde::__private::Ok(__val) => __val,
                            _serde::__private::Err(__err) => {
                                return _serde::__private::Err(__err);
                            }
                        } {
                            _serde::__private::Some(__value) => __value,
                            _serde::__private::None => {
                                return _serde::__private::Err(_serde::de::Error::invalid_length(
                                    0usize,
                                    &"struct Human with 1 element",
                                ));
                            }
                        };
                    _serde::__private::Ok(Human { name: __field0 })
                }
                #[inline]
                fn visit_map<__A>(
                    self,
                    mut __map: __A,
                ) -> _serde::__private::Result<Self::Value, __A::Error>
                where
                    __A: _serde::de::MapAccess<'de>,
                {
                    let mut __field0: _serde::__private::Option<String> = _serde::__private::None;
                    while let _serde::__private::Some(__key) =
                        match _serde::de::MapAccess::next_key::<__Field>(&mut __map) {
                            _serde::__private::Ok(__val) => __val,
                            _serde::__private::Err(__err) => {
                                return _serde::__private::Err(__err);
                            }
                        }
                    {
                        match __key {
                            __Field::__field0 => {
                                if _serde::__private::Option::is_some(&__field0) {
                                    return _serde::__private::Err(
                                        <__A::Error as _serde::de::Error>::duplicate_field("name"),
                                    );
                                }
                                __field0 = _serde::__private::Some(
                                    match _serde::de::MapAccess::next_value::<String>(&mut __map) {
                                        _serde::__private::Ok(__val) => __val,
                                        _serde::__private::Err(__err) => {
                                            return _serde::__private::Err(__err);
                                        }
                                    },
                                );
                            }
                            _ => {
                                let _ = match _serde::de::MapAccess::next_value::<
                                    _serde::de::IgnoredAny,
                                >(&mut __map)
                                {
                                    _serde::__private::Ok(__val) => __val,
                                    _serde::__private::Err(__err) => {
                                        return _serde::__private::Err(__err);
                                    }
                                };
                            }
                        }
                    }
                    let __field0 = match __field0 {
                        _serde::__private::Some(__field0) => __field0,
                        _serde::__private::None => {
                            match _serde::__private::de::missing_field("name") {
                                _serde::__private::Ok(__val) => __val,
                                _serde::__private::Err(__err) => {
                                    return _serde::__private::Err(__err);
                                }
                            }
                        }
                    };
                    _serde::__private::Ok(Human { name: __field0 })
                }
            }
            const FIELDS: &'static [&'static str] = &["name"];
            _serde::Deserializer::deserialize_struct(
                __deserializer,
                "Human",
                FIELDS,
                __Visitor {
                    marker: _serde::__private::PhantomData::<Human>,
                    lifetime: _serde::__private::PhantomData,
                },
            )
        }
    }
};

We can add some clarity here by

  • Replacing private aliases with the more expected form. So _serde::__private::Result is actually std::result::Result.
  • Renaming type parameters to be easier on the eyes, like __D to just D.
  • Removing some of the annotations like #[automatically_derived].
  • Removing the wrapping scope ie const _: () = {...}.
  • Moving nested struct and impl blocks to the top level.

The expansion does these things to isolate the generated code from the code around it, preventing the generated code from affecting yours, and yours from affecting the generated code.

A Reddit user pointed out that the wrapping scope was introduced because of GitHub serde issue 159.

There are quite a few types and implementations created by this expansion. Below is a quick summary:

Thing                        Description
impl Deserialize for Human   This is exactly what we wanted to derive.
struct HumanVisitor          A visitor that gets called by the deserializer. Its job is to produce the Human value.
enum Field                   This enum represents the fields of our struct Human. In our case it simply contains field0 for 'name', and an ignore variant.
struct FieldVisitor          A visitor purely to check that identifier-like values produced by the deserializer match our fields.

Deserialize implementation

After all that clean up for human eyes, here's our Deserialize implementation:

impl<'de> serde::Deserialize<'de> for Human {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        const FIELDS: &'static [&'static str] = &["name"];

        serde::Deserializer::deserialize_struct(
            deserializer,
            "Human",
            FIELDS,
            HumanVisitor {
                marker: PhantomData::<Human>,
                lifetime: PhantomData,
            },
        )
    }
}

We can see here that this just delegates to the deserialize_struct method, passing some extra information like the names of our fields, and a visitor that was also generated by the macro. Nothing too complicated here. What's that HumanVisitor?

Our visitor

Here's our visitor with some code snipped out for brevity:

struct HumanVisitor<'de> {
    marker: PhantomData<Human>,
    lifetime: PhantomData<&'de ()>,
}

impl<'de> serde::de::Visitor<'de> for HumanVisitor<'de> {
    type Value = Human;
    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        fmt::Formatter::write_str(formatter, "struct Human")
    }

    fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
    where
        A: SeqAccess<'de>,
    {
        // ...
    }

    fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
    where
        A: MapAccess<'de>,
    {
        // ...
    }
}

The purpose of a visitor is to be driven by the deserializer, constructing the values as it goes. This visitor in particular is expecting to have methods called that can be converted to our Human structure. It would make no sense if the visitor received an integer type, because that does not represent a structure.

The visitor only implements the methods that make sense for the type it is expecting. For structs, you would generally expect a map of key-value pairs. This is why our visitor implements visit_map. It also implements visit_seq (seq for sequence); this supports formats that encode the values in order, skipping keys.

Default method implementations

The serde::de::Visitor trait has default implementations of each method, which simply raise an error or forward the call on to another method. Here's the default implementation of visit_bool:

fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
where
    E: Error,
{
    Err(Error::invalid_type(Unexpected::Bool(v), &self))
}

So if the deserializer calls the visitor with a bool, but it does not implement the method, by default you will get an error. Some method implementations add convenience for the common cases, like visit_u8, which forwards on to visit_u64:

fn visit_u8<E>(self, v: u8) -> Result<Self::Value, E>
where
    E: Error,
{
    self.visit_u64(v as u64)
}

This makes some sense. If you are making a visitor that accepts unsigned integer types, you can implement visit_u64 and handle unsigned integers of smaller types like u32 for free. This can make untagged enums containing these forwarded types difficult however, forcing us to create stricter deserializers.
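The two kinds of default can be modelled in a few lines. MiniVisitor below is a hypothetical, much smaller trait than serde's Visitor (String errors, four methods), and CountVisitor is an invented visitor that overrides only visit_u64:

```rust
// Hypothetical miniature of Visitor showing both kinds of default method.
trait MiniVisitor: Sized {
    type Value;

    // Error-by-default methods.
    fn visit_bool(self, v: bool) -> Result<Self::Value, String> {
        Err(format!("unexpected bool {}", v))
    }

    fn visit_u64(self, v: u64) -> Result<Self::Value, String> {
        Err(format!("unexpected unsigned integer {}", v))
    }

    // Forwarding methods: smaller unsigned integers widen to u64.
    fn visit_u8(self, v: u8) -> Result<Self::Value, String> {
        self.visit_u64(v as u64)
    }

    fn visit_u32(self, v: u32) -> Result<Self::Value, String> {
        self.visit_u64(v as u64)
    }
}

// Only visit_u64 is overridden; u8 and u32 are handled for free.
struct CountVisitor;

impl MiniVisitor for CountVisitor {
    type Value = u64;

    fn visit_u64(self, v: u64) -> Result<u64, String> {
        Ok(v)
    }
}
```

CountVisitor.visit_u8(7) and CountVisitor.visit_u32(7) both succeed with 7 through the forwarding defaults, while CountVisitor.visit_bool(true) falls through to the error default.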

Map access

Let's take a look at the code for visit_map:

fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
where
    A: MapAccess<'de>,
{
    let mut field0: Option<String> = None;
    while let Some(key) = MapAccess::next_key::<Field>(&mut map)? {
        match key {
            Field::field0 => {
                if Option::is_some(&field0) {
                    return Err(<A::Error as Error>::duplicate_field("name"));
                }
                field0 = Some(MapAccess::next_value::<String>(&mut map)?);
            }
            _ => {
                let _ = MapAccess::next_value::<serde::de::IgnoredAny>(&mut map)?;
            }
        }
    }
    let field0 = match field0 {
        Some(field0) => field0,
        None => serde::__private::de::missing_field("name")?,
    };
    Ok(Human { name: field0 })
}

I've changed the original expanded code to take advantage of the ? operator. The meaning is near-identical and makes it smaller.
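For reference, this is the rewrite in miniature, using integer parsing instead of serde so it stands alone. Both function names are made up for the example. The first uses the match-and-return shape the macro expands to, the second uses ?:

```rust
use std::num::ParseIntError;

// The shape the macro generates: match, and return early on Err.
fn parse_with_match(s: &str) -> Result<i32, ParseIntError> {
    let value = match s.parse::<i32>() {
        Ok(val) => val,
        Err(err) => return Err(err),
    };
    Ok(value + 1)
}

// The same logic with the ? operator.
fn parse_with_question_mark(s: &str) -> Result<i32, ParseIntError> {
    let value = s.parse::<i32>()?;
    Ok(value + 1)
}
```

The "near" in near-identical is because ? also passes the error through From::from before returning it. When the error types already match, as they do here and in visit_map, that conversion is a no-op.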

This function iterates through the given map's keys and values, looking for any of the fields of the structure. In our case we have the single field 'name', which is encoded in the Field type that we will look at in a second.

For each key, it checks if it is one of our fields. If it is, it tries to get the value as the expected String type. It ignores fields that are not in our structure. Finally it checks it has all the required fields and returns our Human.

The Field types

The Field and related types look like this:

enum Field {
    field0,
    ignore,
}

impl<'de> serde::Deserialize<'de> for Field {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        Deserializer::deserialize_identifier(deserializer, FieldVisitor)
    }
}

struct FieldVisitor;

impl<'de> serde::de::Visitor<'de> for FieldVisitor {
    type Value = Field;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        fmt::Formatter::write_str(formatter, "field identifier")
    }

    fn visit_u64<E>(self, value: u64) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            0u64 => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }

    fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            "name" => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }

    fn visit_bytes<E>(self, value: &[u8]) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            b"name" => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }
}

We have a very similar situation to our Human visitor here. Deserialize is implemented for the Field enum, and it just passes on to deserialize_identifier, passing the FieldVisitor along.

This FieldVisitor expects the deserializer to call methods on it that 'look like' field identifiers. So it implements visit_str and visit_bytes, both of which see if the deserialized value looks like one of our fields. If it doesn't, the field gets deserialized to the special Field::ignore variant.

There is also the visit_u64 method, which allows the field name to be the number of the field; zero in our case.

Conclusion

This was a brief look into how serde deserializes data into values. If you would like more detailed information about this, let me know via whatever medium you found this on.

Some things I think I would like to expand upon are:

  • How optional types, nested structures, enums, bytes, and strings are handled.
  • How to write a deserializer for a data format.
  • How borrowing from underlying data is provided.

Thanks for reading!

