Rust Type System: P2

 1 year ago
source link: https://sanjuvi.github.io/Blog/posts/Rust-Type-System-Part-2/

Type as Sets

There are various ways to interpret types:

  1. Storage Method: The way data is stored. Types like i32, u8, and other copy types are stored on the Stack, unless explicitly placed inside a Box.

  2. Permitted Operations: Different types allow different operations. Integers and floats allow arithmetic operations, while performing arithmetic operations on strings doesn’t make sense. However, the addition operation is overloaded to make sense for strings—adding two strings concatenates them.

  3. Assignable Values: For instance, consider let x: i32;. The assignment only accepts i32 values.

Programming languages utilize types to express our intentions to the computer clearly and concisely. For example, when specifying a variable with the type u8 (Unsigned 8-bit integer), the compiler guarantees the following:

  1. The values are integers, not fractional numbers, strings, or anything else.
  2. The integer can’t be negative.
  3. There’s no need for validation logic to check that a variable holds the expected kind of value, as there would be with strings.
  4. The possible values the type can represent range between 0 and 255, and nothing more or less.
  5. Arithmetic and comparison operations are permitted, with overflow being detected in debug mode.
  6. If the expected type isn’t u8, there is no implicit conversion; instead, the compiler reports an error. To learn more about Rust’s preferences for implicit and explicit conversions, you can read more here.
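
Guarantees 5 and 6 can be made explicit in code. The following is a minimal sketch using only standard-library integer methods; the function names add_players and to_player_count are made up for illustration.

```rust
use std::convert::TryFrom;

/// Adds two player counts, reporting overflow as `None` instead of
/// panicking (debug builds) or wrapping around (release builds).
fn add_players(current: u8, joining: u8) -> Option<u8> {
    current.checked_add(joining)
}

/// Converts a wider integer explicitly; values outside 0..=255 are rejected.
fn to_player_count(raw: i32) -> Option<u8> {
    u8::try_from(raw).ok()
}
```

Calling add_players(255, 1) yields None rather than a silently wrong count, and to_player_count(-1) is rejected at the boundary instead of being converted implicitly.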

However, not all programming languages help us rule out invalid states like these, especially languages with implicit conversions or a single general-purpose integer type, as is the case with JavaScript or Python. In such languages, a negative value that makes no sense in context shows up only as a runtime error, if it shows up at all.

Types as Sets of Values

We can consider types in a programming language as sets of values, each set having associated operations based on the values it contains. The cardinality of a set is the total number of values it contains. How does this relate to types? In Rust, there are twelve integer types (six signed: i8, i16, i32, i64, i128, and isize; six unsigned, for non-negative values: u8, u16, u32, u64, u128, and usize), two floating-point types (f32 and f64), as well as String and char. Why is this concept interesting? Why do languages like C/C++ and Rust offer several differently sized variants of the same kind of type?

Each type has its own set of capabilities and limitations. Depending on the problem we’re solving, choosing the appropriate type can make a significant difference. For example, a game application might start with no players at all. A player count should not be a floating-point value, since fractions and negative values are meaningless here, and it makes sense to pick the smallest integer range that fits. We therefore opt for u8, which restricts us to non-negative values and cuts the representable states down to just 256 (0 to 255). This also uses less memory, since a u8 takes up only 1 byte. The limitation is that adding 1 to the maximum value a u8 can hold overflows. Such guarantees are not available in languages that offer a single integer type covering both positive and negative values, as is the case with JavaScript and Python; their type systems cannot rule out these invalid states ahead of time.

At times, we might further constrain ourselves to not require all the values a type can represent, focusing instead on a subset of those values.

  • Representable States: These are all possible values that a type can represent. For instance, the lower and upper bounds of a u64 are 0 to 18446744073709551615.
  • Valid States: These represent a smaller subset of the type’s representable states. For example, an image might have dimensions of 256*256, but only a subset of values from this range would be relevant for making decisions. The validity of states depends on the context or domain we are modeling. Domain-specific modeling reduces the number of possible valid states.

[Figure: valid.png]

Usually, we resort to brute force when we lack sufficient information to narrow down the search space (possible states that the type can represent in this context).

For instance, if we intend to represent days, we might use u8 because it doesn’t accommodate negative values. Depending on the integer value, we return a representative day, such as 1 meaning Monday, 2 meaning Tuesday, and so on. However, despite its smaller memory footprint and exclusive use of positive values, this approach is inefficient. For the given definition, the range of representable states is 0 to 255, but the valid states are 1 to 7, indicating that we have 249 invalid states.

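fn return_recipe(day: u8) -> &'static str {
    if day == 1 { return "Dosa"; }
    else if day == 2 { return "Idli"; }
    // ... Weekend should include non-veg, otherwise, no one will come to eat.
    else if day == 7 { return "Chicken"; }
    // There is no final `else`: days 0 and 8..=255 have no answer, so the
    // compiler rejects this function as written.
}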

The function return_recipe would have significantly more invalid states in Python or JavaScript compared to its Rust counterpart due to the possibility of passing negative or large integer values.

However, the function above doesn’t return anything for invalid states, which is inconvenient for business logic. There’s a need to either reduce the number of invalid states or only allow valid states to be representable. While the former might involve considerable validation logic within the function, the latter is the focus of this discussion.
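
For contrast, here is a rough sketch of the first option, validation inside the function. The names validate_day and invalid_day_count are hypothetical; the second function only confirms the count of invalid states given earlier.

```rust
/// Accepts only the seven valid day numbers; everything else is rejected.
fn validate_day(day: u8) -> Option<u8> {
    if (1..=7).contains(&day) {
        Some(day)
    } else {
        None
    }
}

/// Counts how many `u8` values the validator has to reject.
fn invalid_day_count() -> usize {
    (0..=u8::MAX).filter(|d| validate_day(*d).is_none()).count()
}
```

Every caller now has to handle the None case, which is exactly the kind of repeated checking that making invalid states unrepresentable avoids.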

Another instance that illustrates the situation is when using the String type, where the potential outcomes seem to spiral out of control.

fn parse_email(email: &str) -> String;
fn parse_phone(phone: &str) -> String;
fn parse_url(url: &str) -> String;
fn parse_header(header: &str) -> String;

Aside from encountering a multitude of invalid states, the type system provides no preventative measures against passing a string with a phone number to parsers for other types of data. Consequently, errors may arise even if the string values are correct for the intended parser.

Struct and Enum

Before delving into how to represent valid states or reduce invalid states in the aforementioned problem, let’s first explore Rust’s algebraic types, namely struct for product types and enum for sum types. But why are they referred to as product and sum types, respectively?

As defined in cs42:

The fundamental concept behind algebraic data types (ADTs) is to depict relationships between data, specifically the concepts of and and or. An AND type signifies the combination of multiple types, while an OR type signifies a value that corresponds to precisely one of several possible types.

In Rust, struct describes and relations, while enum describes or relations. This analogy is akin to boolean conditions in control flow expressions. For instance:

if a && b && c {} else {}

Here, a, b, and c must all be true for the if expression to execute. It only matches if all conditions hold.

if a || b || c {} else {}

In contrast, if any one of the conditions is met, the if expression is executed, or it falls back to the else block. The condition is short-circuiting, meaning if the first condition is true, further checks are unnecessary.

Struct

struct ProductType {
    two_state: bool,
    finite_state: u8,
}

Adding fields to a struct multiplies its states. Why do these fields in the struct multiply? This is because a struct can’t be partially initialized. The initialization process is as follows:

let product_type = ProductType {
    two_state: true,
    finite_state: 1,
};

The bool type represents only two values, true and false, and occupies one byte of memory. Its cardinality is 2. The u8 type represents values in the range 0..=255, so its cardinality is 256. According to the product type definition (2 x 256), this struct has 512 possible states: true paired with one of 0..=255 (1 x 256), and false paired with one of 0..=255 (1 x 256). At any moment, a value of the struct is in exactly one of those 512 states.

// Number of possible states of the above struct type
let bool_ = true;
let mut product_type; // reassigned on every iteration below

// True and 0..=255 values
for i in 0..=u8::MAX {
    product_type = ProductType {
        two_state: bool_, // true
        finite_state: i,
    };
}

// False and 0..=255 values
for i in 0..=u8::MAX {
    product_type = ProductType {
        two_state: !bool_, // false
        finite_state: i,
    };
}

In the first loop, the boolean is set to true, and the state takes on values from 0 to 255 (inclusive). The second loop follows the same pattern, but with the boolean set to false. A struct is used to represent the and relationship between its fields, which results in this multiplicative behavior.
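
The count can be checked mechanically. This sketch builds every value of the struct and counts the distinct ones; the derives and the helper name product_state_count are additions for illustration.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ProductType {
    two_state: bool,
    finite_state: u8,
}

/// Builds every possible `ProductType` value and counts the distinct ones.
fn product_state_count() -> usize {
    let mut states = HashSet::new();
    for two_state in [true, false] {
        for finite_state in 0..=u8::MAX {
            states.insert(ProductType { two_state, finite_state });
        }
    }
    states.len()
}
```

The nested loops mirror the multiplication: 2 choices for the outer field times 256 for the inner one.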

Struct fields are accessed with the dot operator, provided the field is visible from the current module. If it is not, the type typically exposes associated (static) functions for construction and instance methods for field access. A struct can also be generic, and its type parameters can have defaults.

There are three ways to define a struct depending on use cases. Fields are private by default, unless the pub keyword is used.

  • Named struct: Fields are named, and the order of fields doesn’t matter.
  • Tuple struct: Fields are unnamed, and the order of initializing types is important.
  • Unit struct: An empty struct that takes no space in memory. It’s used for various purposes, such as type state programming. An example of an empty struct type is PhantomData.
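
A short sketch of the three forms, together with a private field reached through a method. All names here (Player, Point, Marker) are invented for the example.

```rust
/// Named struct: fields are named, so initialization order is free.
pub struct Player {
    pub name: String, // visible outside the module
    score: u32,       // private by default
}

/// Tuple struct: fields are positional.
pub struct Point(pub i32, pub i32);

/// Unit struct: no fields, zero bytes.
pub struct Marker;

impl Player {
    /// Associated ("static") function used as a constructor.
    pub fn new(name: &str) -> Self {
        // Field order differs from the declaration; that is fine.
        Player { score: 0, name: name.to_string() }
    }

    /// Method giving read access to the private field.
    pub fn score(&self) -> u32 {
        self.score
    }
}
```

Code outside the defining module can read player.name directly but has to go through player.score() for the private field.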

Enum

enum SumType {
    TwoState(bool),
    FiniteState(u8),
}

The bool and u8 states remain the same, whether it’s a product or sum type. The enum contains two variants that hold data. According to the sum type definition (2 + 256), this enum has 258 possible states. If the variant is TwoState, it can be either true or false. If the variant is FiniteState, it can be any value from 0 to 255 (inclusive). Therefore, a value is in one of the 2 states or in one of the 256 states, but never both simultaneously. This is an addition of states, not multiplication. Constructing an enum value initializes exactly one variant; the other variants are simply not present in that value.

use self::SumType::{TwoState, FiniteState};
let sum_type = TwoState(true);
let mut sum_type = TwoState(false);

for i in 0..=u8::MAX {
    // Reassignment is allowed because both variants belong to the same type, SumType.
    sum_type = FiniteState(i);
}

When the first variant is initialized, it will be either true or false. Otherwise, it will be one of the 0 to 255 states. The number of possible states would increase if the upper bound of the type were larger, as is the case with u64.

The data inside an enum cannot be read directly, for safety reasons: unlike a union in C, where any member can be read regardless of which one was last written, Rust requires us to establish which variant is present first. The only way to access the data inside the enum is to use pattern matching. Different pattern matching options are available for different conveniences. Pattern matching will be discussed in Part 4 of this series, as it is not solely for accessing enum variants.
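
As a small preview, the sketch below reads the payload of SumType with match and with if let. The helper names describe and number_or_zero are illustrative only.

```rust
enum SumType {
    TwoState(bool),
    FiniteState(u8),
}

/// Reads the payload of whichever variant is present.
fn describe(value: &SumType) -> String {
    match value {
        SumType::TwoState(flag) => format!("flag = {flag}"),
        SumType::FiniteState(n) => format!("number = {n}"),
    }
}

/// `if let` is handy when only one variant matters.
fn number_or_zero(value: &SumType) -> u8 {
    if let SumType::FiniteState(n) = value {
        *n
    } else {
        0
    }
}
```

Because the match has no catch-all arm, adding a third variant to SumType later would make describe fail to compile until the new case is handled.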

Another advantage of an enum is its compact memory usage. It’s able to represent either this or that, instead of both this and that. This means it occupies memory for its largest variant (plus a discriminant where one is needed), not for all variants combined. Why is the product type not efficient for representing errors?

fn main() {
    // There's no way to know if this is a success or failure.
    divide(1, 0);
}

fn divide(n1: i32, n2: i32) -> SomeNone<i32> {
    if n2 == 0 {
        SomeNone { some: 0, none: 0 }
    } else {
        SomeNone {
            some: n1 / n2,
            none: 0,
        }
    }
}

struct SomeNone<T> {
    some: T,
    none: T,
}

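In this particular case, both fields must always be filled: on success, none is set to a dummy value such as zero, and on failure both some and none are set to zero. The caller cannot tell a failed division from a legitimate result of zero. The struct also stores every field at all times: SomeNone<i32> occupies 8 bytes, and with payloads of different sizes a struct needs room for all of them, whereas an enum needs only its largest variant plus a discriminant. Option<i32> is likewise 8 bytes (4 for the value, 4 for the discriminant and padding), but its discriminant states unambiguously whether a value is present. No dynamic dispatch is involved here. This showcases the power of enums in Rust.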
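
Here is a sketch of the same function returning Option<i32>. Using checked_div is a choice made for this example; SomeNone is repeated only so the sizes can be compared.

```rust
/// Returns `None` when the division has no valid result.
fn divide(n1: i32, n2: i32) -> Option<i32> {
    // `checked_div` returns `None` for a zero divisor and also for
    // `i32::MIN / -1`, which would overflow.
    n1.checked_div(n2)
}

/// The struct from the example above, kept for a size comparison.
struct SomeNone<T> {
    some: T,
    none: T,
}
```

With this signature the caller is forced to distinguish Some(0) from None, and std::mem::size_of reports 8 bytes for both SomeNone<i32> and Option<i32>.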

There are three ways to define an enum variant:

  • Variant with named fields, similar to a struct.
  • Variant with tuple-like fields.
  • An empty variant, which can be used as a base case or equivalent to Option’s None variant.

Both structs and enums have states, unlike traits. Layouts are optimized by default, and they support instance methods and static functions through the impl block. The derive macro generates trait implementations for them. Both structs and enums have move semantics by default, even if all their fields are Copy types. However, this can be changed by implementing the Copy and Clone traits either through the derive macro or manually via the impl block. They can be extended through traits. The two nest freely: structs can contain enums, and enums can contain structs. Rust doesn’t have inheritance. To learn more about object-oriented features in Rust, read this article.

Patterns in Rust Using Algebraic Types

Plain types versus new types (wrapped types):

  • Plain types are Rust’s core numeric types like i32, u8, f32, i8. String lives in the standard library rather than in the language itself; it is essentially a user-defined struct that wraps a pointer together with a length and a capacity.
  • The new type pattern in Rust involves simply wrapping built-in types. What are the benefits of wrapping types that might themselves be wrapped? A new type defined through a struct or enum does not inherit the operations of the type it wraps; each operation must be opted into explicitly by implementing the corresponding trait.

Type aliases or type synonyms are a way to alias types with descriptive names for programmer convenience, much like renaming imports. In other words, a type alias like type Integer = usize is exactly equivalent to usize, so all operations applicable to usize also apply to the Integer type. Note that type aliases have various use cases, like reducing boilerplate for long types. However, type aliases do not provide any additional type safety. Consider the following snippet:

type Integer = usize;
fn accept_usize(x: usize) {}

let a: Integer = 1;
let b: usize = 2;
let add: Integer = a + b; // same type, so this compiles
accept_usize(add);        // an Integer is accepted wherever usize is

New type pattern

A newtype that wraps a single field doesn’t impose any runtime overhead. The sizes of Email and String are the same; i.e., at the representation level, they are identical. However, at the type level they are distinct, which is what lets the compiler tell an Email from a Phone. New types are not subtypes: one can’t be used in place of another.

use std::mem::{align_of, size_of};

#[derive(PartialEq)]
struct Email(String);

#[derive(PartialEq)]
struct Phone(String);

#[derive(PartialEq)]
struct Url(String);

#[derive(PartialEq)]
struct Header(String);

assert!(size_of::<String>() == size_of::<Email>());
assert!(align_of::<Header>() == align_of::<Url>());

fn email(email: Email) {}
fn phone(phone: Phone) {}
fn url(url: Url) {}
fn header(header: Header) {}

Compile errors arise if we pass one domain type where another is expected, for example a Phone to the email function, because each function accepts only its own domain-specific type. If the only way to construct a value is through a constructor that validates its input, we also don’t need validation logic at every place the value is used: holding an Email is proof that the check already happened. This is a significant improvement over the previous design. Creating domain-specific types is straightforward in Rust but would add more complexity in object-oriented programming without adequate support.
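
One way to get that guarantee is a private field plus a validating constructor. The sketch below assumes a deliberately naive rule (exactly one '@' with text on both sides); Email::parse, as_str, and send_welcome are hypothetical names, and real email validation is far more involved.

```rust
#[derive(Debug, PartialEq)]
pub struct Email(String); // the field is private outside this module

impl Email {
    /// Naive illustrative check, not a real email validator.
    pub fn parse(raw: &str) -> Option<Email> {
        let mut parts = raw.split('@');
        match (parts.next(), parts.next(), parts.next()) {
            (Some(local), Some(domain), None)
                if !local.is_empty() && !domain.is_empty() =>
            {
                Some(Email(raw.to_string()))
            }
            _ => None,
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Accepts only values that already went through `Email::parse`.
pub fn send_welcome(to: &Email) -> String {
    format!("Welcome, {}!", to.as_str())
}
```

The protection relies on module privacy: code in other modules cannot write Email(...) directly, so parse is their only way in.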

We can derive equality and comparability for our custom types, easily achieved using derive macros. In Python, for comparison, we would override methods like __add__ and __eq__. In Rust, these functionalities are explicit and aren’t automatically generated by the compiler. This deliberate approach ensures that the compiler doesn’t inadvertently generate undesired functionality. Rust’s approach avoids the pitfalls of languages that automatically derive these methods, which might not always align with what developers want. By wrapping a primitive inside a tuple struct and not implementing traits that are meaningless for the domain, we can prevent unintended operations. Furthermore, Rust’s coherence (orphan) rules prevent downstream crates from implementing a foreign trait for a type they merely import.

The newtype pattern is particularly useful when giving new meaning to a type based on domain constraints. For instance, in database systems, IDs are often represented as usize or u64 due to their positive nature. However, this can lead to issues because integers lack the semantic distinction needed to prevent nonsensical arithmetic operations. Wrapping IDs in a tuple struct like struct Id(usize) and providing appropriate methods can address this concern.
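
A minimal sketch of such ID types follows. UserId, OrderId, and load_user are invented names, and u64 is an arbitrary choice of backing integer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct UserId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct OrderId(u64);

impl UserId {
    pub fn new(raw: u64) -> Self {
        UserId(raw)
    }

    pub fn get(self) -> u64 {
        self.0
    }
}

pub fn load_user(id: UserId) -> String {
    format!("user #{}", id.get())
}

// `UserId::new(1) + UserId::new(2)` does not compile: `Add` is not implemented.
// `load_user(OrderId(1))` does not compile either: the types differ.
```

Equality and hashing are derived because they make sense for IDs; arithmetic is left out on purpose.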

The newtype pattern also offers benefits unique to Rust. When a value built on a primitive must not be duplicated, wrapping the primitive and not deriving Copy or Clone provides single-ownership semantics. Primitive types are Copy, so they are duplicated rather than moved; a wrapper without those derives is moved like any other owned type.

use Days::{Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday};

enum Days {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

fn return_recipe(days: Days) -> &'static str {
    match days {
        Monday => "Idli",
        Tuesday => "Dosa",
        Wednesday => "Poori",
        Thursday => "Sapathi",
        Friday => "Veg Rice",
        Saturday => "Egg Rice",
        Sunday => "Chicken",
    }
}

With this design, valid states are represented by the seven enum variants, eliminating any invalid states. The match expression must be exhaustive, covering all seven variants of the Days enum; otherwise, Rust won’t compile the code. This is vital for accommodating potential additions or removals of variants in the future, ensuring that all possibilities are covered. However, it’s important not to always use the catch-all pattern just to satisfy the compiler.

Enums and newtype patterns provide tools to represent valid states and prevent the introduction of invalid states, enhancing error handling and reducing the need for extensive testing. These features, often found in functional programming languages, are available in Rust without imposing the overhead typically associated with them.

Other Uses of Enums and Structs in Rust

Homogeneous collection types like Vec, HashMap, and others cannot hold values of different types. A workaround is to wrap those types in an enum and store the enum in the Vec. This offers the advantage of storing data contiguously. Similarly, every branch of a control flow expression must evaluate to the same type; wrapping, say, both vectors and strings under one enum type satisfies that rule. The same idea underlies the Option and Result types used for error handling, which let a function return either a value or an error through a single type. Additionally, trait objects can return various concrete types as long as they implement a common trait. You can find an example of this workaround in this gist.
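
A brief sketch of the enum workaround, with a made-up Value type standing in for whatever needs to be mixed:

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
    Flag(bool),
}

/// One `Vec` holding three different kinds of data.
fn sample_row() -> Vec<Value> {
    vec![Value::Int(3), Value::Text("three".to_string()), Value::Flag(true)]
}

fn count_text(row: &[Value]) -> usize {
    row.iter().filter(|v| matches!(v, Value::Text(_))).count()
}

/// Both branches evaluate to the same type because the enum wraps them.
fn parse_value(raw: &str) -> Value {
    if let Ok(n) = raw.parse::<i64>() {
        Value::Int(n)
    } else {
        Value::Text(raw.to_string())
    }
}
```

Each element takes the size of the largest variant, which is the price paid for contiguous storage.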

Error Handling

Rust lacks null pointers/null objects and exceptions. However, similar patterns can be expressed without the safety issues associated with null pointers or null objects. This ensures that we won’t forget to check for null before accessing data or accidentally access data without proper checks.

Option and Result types are both defined generically and offer useful methods for chaining, fallbacks, and more without explicit pattern matching. None/null objects in Python and Java represent any type, leading to runtime errors when invoking methods on them. With Option types, we set a concrete type when initializing with None, and the type is inferred when initialized with Some. This allows method calls to conform to the type of value in the Some variant. This topic is covered in greater depth in this article, which also delves into other concepts.
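
To illustrate chaining and fallbacks without explicit matching, here is a small sketch. The function names and the default port 8080 are arbitrary choices for the example.

```rust
/// Parses a port number, falling back to a default when the input is
/// missing or malformed.
fn port_or_default(raw: Option<&str>) -> u16 {
    raw.map(str::trim)
        .and_then(|s| s.parse::<u16>().ok())
        .unwrap_or(8080)
}

/// `?` hands the error to the caller instead of matching by hand.
fn double(raw: &str) -> Result<i32, std::num::ParseIntError> {
    let n: i32 = raw.parse()?;
    Ok(n * 2)
}
```

Each step operates on the concrete type inside Some or Ok, so a typo such as calling a string method on the parsed number is caught at compile time.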

Algebraic types (Structs and Enums) empower us to make good things possible while making bad things impossible. Additionally, these types offer benefits unique to Rust. The combination of enums and newtype patterns allows us to represent valid states and prevent invalid ones, enhancing error handling and reducing the need for extensive testing. Rust provides these features elegantly, without imposing the overhead typically associated with them. The next significant concept in Rust is the trait system.

Traits

Traits in Rust are similar to interfaces in Java or typeclasses in Haskell, except they provide zero-cost abstraction and offer more flexibility through associated types, including Generic Associated Types (GATs), as we’ll see shortly. However, traits alone don’t have access to states, which must be accessed when implementing them for structs or enums. The trait interface can include function signatures and associated types. A trait can define multiple methods, some of which can have default implementations (provided methods) based on required methods or super traits. Methods that lack default implementations are called required methods because when implementing the trait, we must provide an implementation for those methods.

Traits in Rust form a unified system used for different purposes than how other languages have utilized them. Trait abstractions are dispatched statically or dynamically through trait objects. Notably, Rust doesn’t have inheritance.
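
The sketch below puts these pieces together: a required method, a provided method, and the same trait used through static and dynamic dispatch. Shape, Square, and Rect are illustrative names.

```rust
trait Shape {
    /// Required method: every implementor must provide it.
    fn area(&self) -> f64;

    /// Provided method: a default built on the required one.
    fn describe(&self) -> String {
        format!("shape with area {:.1}", self.area())
    }
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }

    // Overrides the default implementation.
    fn describe(&self) -> String {
        "rectangle".to_string()
    }
}

/// Static dispatch: compiled separately for each concrete `S`.
fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Dynamic dispatch: one compiled function, calls go through a vtable.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

The static version needs all elements to share one type; the dyn version can mix Square and Rect in the same slice.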

trait Types {
    fn borrow(&self);         // &T
    fn mutable_borrow(&mut self);  // &mut T
    fn takes_ownership(self);   // T
}

These methods also reflect ownership and borrowing rules. Self refers to the type the trait is being implemented for. For structs, the implementing type is the struct itself, and the same applies to enums. Traits are the only items that can be implemented for structs and enums outside of their own inherent implementation block. In other words, we can’t implement a struct for another type, an enum for another type, or a trait for another trait. The syntax for implementing traits for structs and enums is as follows:

impl Trait for StructType {
    // Now methods can access the states (fields) of the struct or enum
}

Marker Traits

Marker traits have no methods or other items in their body. They are similar in spirit to empty structs or enums, but they can be implemented for many different structs and enums. Marker traits are designed to provide static analysis or compile-time verification of facts without imposing any runtime overhead. They help group types with certain characteristics, preventing them from being mixed with other groups of types. The std::marker module contains the marker traits of the standard library, and additional ones can be created by libraries. It’s not necessary for programmers to implement all marker traits themselves. These traits can be used in trait bounds once the properties of each type are known.

  • Copy: This marker trait signifies that a type has copy semantics: assigning or passing a value duplicates it and leaves the original usable, instead of moving ownership. It is implemented for primitives such as integers, floats, bool, and char. For our own types we opt in with #[derive(Clone, Copy)] or a manual impl, which the compiler accepts only if every field is Copy and the type does not implement Drop.

  • Sized: This marker trait is implemented automatically by the compiler for every type whose size is known at compile time, and we can’t implement it ourselves. Types like str, [T], and dyn Trait do not implement Sized. Generic parameters carry an implicit Sized bound, and there is special syntax, ?Sized, for relaxing it. For example:

// `MaybeSized` is a placeholder name; the original snippet omitted one.
struct MaybeSized<T: ?Sized> {
    x: T,
}

This struct accepts both sized and unsized types for T; with an unsized T, the struct itself is unsized and can only be used behind a pointer such as & or Box. The ?Sized syntax is specific to this particular marker trait.

  • Eq: This trait also lacks methods in its body. Its purpose is to guarantee that equality on a type is a full equivalence relation, in particular that every value equals itself. Floats (f32, f64) cannot guarantee this, because NaN is not equal to NaN. The standard library therefore has two traits, PartialEq and Eq, and implements the former for floats but not the latter. Most other built-in types implement both. The two are related through trait composition: Eq has PartialEq as a supertrait. Floating-point values cannot be used as keys in hashmaps or in any type that relies on equality properties, because a key that is not equal to itself could never be found again. This would lead to incorrect results when querying the hashmap. Therefore, f32 and f64 do not implement the Eq and Hash traits.
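
The NaN behaviour is easy to observe. The sketch below also shows one possible workaround, keying a map by the float’s bit pattern; that workaround is an assumption of this example, not something the standard library prescribes, and it treats 0.0 and -0.0 as different keys.

```rust
use std::collections::HashMap;

/// `f64` is only `PartialEq`: NaN is not equal to itself.
fn nan_equals_itself() -> bool {
    let nan = f64::NAN;
    nan == nan
}

/// Hypothetical workaround: `u64` bit patterns are `Eq + Hash`.
fn count_by_bits(values: &[f64]) -> HashMap<u64, usize> {
    let mut counts = HashMap::new();
    for v in values {
        *counts.entry(v.to_bits()).or_insert(0) += 1;
    }
    counts
}
```

Writing HashMap<f64, usize> and inserting into it fails to compile, because insert requires the key type to be Eq + Hash.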

Generics

Programming languages offer ways to reduce code duplication. In Rust, this is achieved through generics. Generics represent parametric polymorphism and help reduce code duplication while maintaining the performance of concrete type functions. In Rust, generic contexts of functions, structs, and enums can be statically dispatched because type information is available to the compiler for optimization, avoiding indirection. However, dynamic dispatch can also be chosen if necessary.

fn generic<T>(x: T) {}
struct GenericS<A>(A);
enum GenericE<B> {
    Any(B),
}

Type parameters such as T, A, and B above stand for any concrete type. Rust also has a kind of generic parameter that most other languages lack: lifetime parameters, which abstract over lifetimes instead of types. They are written as follows:

fn generic_lifetime<'a, T>(x: &'a T) {}
struct GenLifetime<'a>(&'a str);
enum GenLifetimeE<'a, T> { // renamed: two types in one scope cannot share a name
    Borrow(&'a T),
}

In a reference &'a T, the referent of type T must remain valid for at least the lifetime 'a. Generic lifetimes describe an abstract scope or region where data is valid to be used or returned from within the function body. Explicit lifetime annotations are often unnecessary in function signatures, because the compiler’s elision rules fill them in; struct and enum definitions that hold references must always declare them. More details can be found here. Generics or traits alone don’t provide much value to types.

Const generics are abstract over values rather than types. As the name suggests, they are evaluated at compile time and support default values. The value of a const can be inferred or explicitly defined. Const generics allow us to ensure certain properties or invariants at compile time. For example, const generics can ensure that the dimensions of two arrays are the same without requiring runtime checks.

struct Dimension<const N: usize, T> {
    n1: [T; N],
    n2: [T; N],
}

In this case, we cannot construct n1 and n2 with different lengths. This verification occurs at compile time.
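
The same guarantee works for functions. In this sketch, a hypothetical dot function ties the lengths of its two array arguments together through N.

```rust
/// Dot product of two arrays whose lengths are tied together by `N`.
fn dot<const N: usize>(a: [i32; N], b: [i32; N]) -> i32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

// `dot([1, 2, 3], [4, 5])` is rejected at compile time, because the
// second argument would have to be `[i32; 3]`.
```

N is inferred at the call site, so dot([1, 2, 3], [4, 5, 6]) needs no explicit annotation.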

Many Uses of Generics with Traits

The combination of generics and traits provides a more powerful mechanism for type safety, greater abstraction, compile-time evaluation, prevention of concurrency bugs at compile time, and much more.

Ergonomics from a User’s Perspective

Strings are always UTF-8 encoded, which means they can’t be indexed by character position in constant time, because the encoding is variable-width. Format strings are verified at compile time, making them easier to work with. Unlike C’s printf, there’s no need to spell out argument types explicitly; the compiler deduces them. This verification is achieved through traits.

  • {} - Only types that implement the Display trait can be printed.
  • {:?} - Only types that implement the Debug trait can be printed.
  • {:p} - Only types that implement the Pointer trait can be printed as a pointer address. Other format specifiers map to their own traits in the same way.

Failing to implement these traits will prevent the code from compiling. Built-in types already have these traits implemented. Handling string types in Rust might be more intricate compared to other languages. You can find more information in a blog post I published, which goes beyond what the official Rust book covers.
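
For a user-defined type, Debug can be derived while Display has to be written by hand. A minimal sketch with a made-up Celsius type:

```rust
use std::fmt;

#[derive(Debug)] // enables `{:?}`
struct Celsius(f64);

// Enables `{}`; the output format is entirely up to the implementor.
impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1} °C", self.0)
    }
}
```

Removing the Display impl turns every format!("{}", temperature) into a compile error rather than a runtime surprise.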

The From and Into traits are used to construct types from user-friendly interfaces. The Extend trait is used to add more values to collections. The FromIterator trait, which powers the collect method, is used to collect into different collection types.

use std::collections::HashMap;

// Construct from an array of tuples
let map = HashMap::from([(1, 2), (3, 4)]);

// The `into` is automatically available if `From` is implemented
// The type annotation is needed
let mut hashmap: HashMap<u8, u8> = [(1, 45), (56, 78)].into();

// Inserting more key-value pairs
hashmap.extend([(45, 5), (6, 234)]);

// Turn into another collection
let vector = hashmap.iter().collect::<Vec<_>>();
println!("{vector:?}");

Most collections in the Standard library implement these traits to simplify their construction complexities.

The Default trait is used for constructing and initializing types, providing a convenient way to initialize the remaining fields of a struct using ..Default::default(). This is particularly useful when a struct has many fields and the client doesn’t need to specify all of them. No explicit type annotation is needed, and it reduces the requirement for initializing all struct fields, as other fields are initialized with default values. For instance, the default value for an Option is None, and for a string, it’s an empty string "", which is only seen when used with debug or pretty printing.
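
A short sketch of the struct update syntax with a derived Default. The Config struct and its fields are invented for the example.

```rust
#[derive(Debug, Default, PartialEq)]
struct Config {
    name: String,
    retries: u32,
    proxy: Option<String>,
    verbose: bool,
}

fn quiet_config(name: &str) -> Config {
    Config {
        name: name.to_string(),
        retries: 3,
        // Remaining fields: proxy = None, verbose = false.
        ..Default::default()
    }
}
```

Adding a new field to Config later does not break quiet_config, as long as the new field’s type also implements Default.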

When data is enclosed in smart pointers like Box, Rc, or the guard returned by locking a Mutex, we can call methods of the inner type directly with the dot operator, without any special dereferencing syntax. This is due to the two traits Deref and DerefMut.

use std::cell::Cell;
use std::rc::Rc;

let nested_type: Box<Rc<Cell<char>>> = Box::new(Rc::new(Cell::new('a')));
nested_type.set('b');
println!("{}", nested_type.get().is_alphanumeric());

Without these traits, the code above would look less readable and clean. The direct use of set and get methods of the Cell type, even though it’s wrapped in two other types, simplifies code readability. Each outer type implements Deref, so when calling methods of the Cell type, the outer types return the inner type, which then returns the type of Cell.
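
The same convenience is available to our own wrappers. This is a minimal sketch, with Wrapper and demo_len as hypothetical names:

```rust
use std::ops::{Deref, DerefMut};

struct Wrapper<T>(T);

impl<T> Deref for Wrapper<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> DerefMut for Wrapper<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}

fn demo_len() -> usize {
    let mut w = Wrapper(String::from("ab"));
    w.push('c'); // `String::push`, reached through `DerefMut`
    w.len()      // `String::len`, reached through `Deref`
}
```

Method lookup tries Wrapper first and then follows deref until it finds a type that has the method.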

Facilitating Better API Design

Expressions are central to many operations performed with iterators. These operations can be piped or chained together, with each chain transforming the data as needed for processing. Expressing tasks declaratively is easier than using for loops and manipulating temporary variables and initialization. Iterator expressions are lazily evaluated, in contrast to imperative/eager evaluation, where operations are executed as soon as they are encountered.

Chaining APIs on iterators, Option, and Result types, or calling factory methods on structs where each method returns Self, is common. In functional programming languages, this style of chaining is closely related to monads. In the case of iterators, the logic involves implementing the Iterator trait for the return type, allowing for chaining as long as each expression returns a type that implements the Iterator trait. These expressions are then finally consumed by operations like collect, sum, product, fold, and others. The Polars data library also offers a powerful expression-based API that runs in parallel, unlike the sequential execution of iterator methods.
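
The following sketch shows a chain ending in a consumer, plus a small experiment that makes the laziness visible by counting inspected elements. Both function names are illustrative.

```rust
/// Adapters (`filter`, `map`) build the pipeline; `sum` consumes it.
fn sum_of_even_squares(limit: u32) -> u32 {
    (1..=limit)
        .filter(|n| n % 2 == 0)
        .map(|n| n * n)
        .sum()
}

/// Counts how many elements the chain actually pulls from the range.
fn lazily_inspected() -> usize {
    let mut seen = 0;
    let first = (1..=1_000)
        .inspect(|_| seen += 1)
        .map(|n| n * 10)
        .find(|n| *n >= 30);
    assert_eq!(first, Some(30));
    seen
}
```

Although the range has a thousand elements, find stops at the first match, so only the elements up to that point are ever produced.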

The Pattern trait lets a single string-slice method accept several different argument types.

let string = "Hello";
string.find('a');
string.find("He");
string.find(['a', 'b']);
string.find(|ch: char| ch.is_ascii());

This approach provides varying levels of convenience through a single method. For instance, the std::io::copy function accepts any type that implements the Read trait for reading bytes from the source and any type that implements Write for writing bytes to the destination. This allows us to read streams from a TcpStream and write to a File for storage, an in-memory Cursor, or even to the terminal.
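
A compact sketch of that flexibility, using an in-memory Cursor as the source and a Vec<u8> as the destination. The helper names pipe and copy_in_memory are made up.

```rust
use std::io::{self, Cursor, Read, Write};

/// Copies everything from `src` to `dst`; works for any reader/writer pair.
fn pipe<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<u64> {
    io::copy(&mut src, &mut dst)
}

fn copy_in_memory(data: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();           // `Vec<u8>` implements `Write`
    pipe(Cursor::new(data), &mut out)?; // `Cursor<&[u8]>` implements `Read`
    Ok(out)
}
```

Swapping the Cursor for a TcpStream or the Vec for a File would not require any change to pipe.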

Generic Associated Types

Traits can have associated types and associated constants, and an associated type may itself take generic parameters, including lifetimes; such types are called generic associated types (GATs). These items are called associated because each implementation of the trait chooses them in line with the trait’s specification. They offer tighter abstractions: users of the API don’t need to specify them to use it. By understanding the interface specification, we can design APIs in a more abstract manner.

use std::fmt::Debug;

fn accept<I>(x: I, item: I::Item) where I: Iterator, I::Item: Eq + Debug {}

The fully qualified syntax for an associated type is <T as SomeTrait>::AssociatedType. Here I is already bound by Iterator, so the shorter I::Item is enough. The signature carries specific guarantees. Passing vec![1, 3, 4].iter() fixes the item type to &i32, so the second argument must also be &i32; anything else is a compile error. The same reasoning applies to into_iter and iter_mut, which yield i32 and &mut i32 respectively. Even when the item type matches, the call fails to compile unless the items are both Eq and Debug: bounds joined with + must all hold. This is why floating-point values are rejected; f64 implements Debug but not Eq. Instead of bounding I by Iterator, it is often better to use IntoIterator, which spares callers from calling iter, into_iter, or iter_mut themselves.
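A small sketch of the IntoIterator variant may help. The function name contains and its body are illustrative assumptions; only the shape of the bounds follows the accept function above.

```rust
use std::fmt::Debug;

// Returns true when `item` occurs in `items`.
// Both bounds on the item type must hold for a call to compile.
fn contains<I>(items: I, item: I::Item) -> bool
where
    I: IntoIterator,
    I::Item: Eq + Debug,
{
    items.into_iter().any(|candidate| {
        let found = candidate == item; // needs Eq
        if found {
            println!("found {:?}", candidate); // needs Debug
        }
        found
    })
}

fn main() {
    let numbers = vec![1, 3, 4];
    // Passing &Vec<i32> makes I::Item = &i32, so the second argument is &i32.
    println!("{}", contains(&numbers, &3));
    // Passing the Vec by value makes I::Item = i32.
    println!("{}", contains(numbers, 7));
    // contains(vec![1.0, 2.0], 1.0); // rejected: f64 is Debug but not Eq
}
```

With IntoIterator as the bound, callers pass the collection or a reference to it directly.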

Associated types can be bounded in the same way as generics. Associated constants declare their type and can have default values; a constant without a default must be given a value in each implementation. Associated types are declared inside traits and chosen by each implementation for a struct or enum. What are they used for? The Iterator trait and its convenience traits, the operator overloading traits, and the Pattern trait all rely on associated types. Generic associated types (GATs) go one step further and let an associated type take its own generic parameters.
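The following sketch shows a bounded associated type together with associated constants with and without defaults. The Shape trait and its implementers are hypothetical names chosen for the example.

```rust
use std::fmt::Display;

trait Shape {
    // Associated type with a bound, just like a bounded generic parameter.
    type Unit: Display;
    // Associated constant with a default; implementers may override it.
    const SIDES: u32 = 4;
    // Associated constant without a default; implementers must supply it.
    const NAME: &'static str;

    fn unit(&self) -> Self::Unit;
}

struct Square;
struct Triangle;

impl Shape for Square {
    type Unit = &'static str;
    const NAME: &'static str = "square"; // SIDES keeps the default of 4
    fn unit(&self) -> Self::Unit {
        "cm"
    }
}

impl Shape for Triangle {
    type Unit = String;
    const SIDES: u32 = 3; // overrides the default
    const NAME: &'static str = "triangle";
    fn unit(&self) -> Self::Unit {
        String::from("mm")
    }
}

fn describe<S: Shape>(shape: &S) -> String {
    format!("{} has {} sides, measured in {}", S::NAME, S::SIDES, shape.unit())
}

fn main() {
    println!("{}", describe(&Square));
    println!("{}", describe(&Triangle));
}
```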

One practical application of associated constants is ensuring that types are 8-byte aligned. Rather than creating a common trait and implementing it by hand for every 8-byte aligned type, we can put the check in an associated constant. Since constants are evaluated at compile time, the alignment is verified without any runtime check. You can find an example of this approach here.
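One way such a check could look is sketched below. This is not the linked example, which is not reproduced here; the trait EightByteAligned, the type Packet, and the function store are assumptions made for illustration.

```rust
use std::mem::align_of;

// Hypothetical helper trait: the default constant is evaluated at compile
// time for every implementing type that actually uses it.
trait EightByteAligned: Sized {
    const CHECK: () = assert!(align_of::<Self>() == 8, "type must be 8-byte aligned");
}

// repr(align(8)) guarantees the alignment on every target.
#[repr(align(8))]
struct Packet {
    bytes: [u8; 8],
}

impl EightByteAligned for Packet {}

// Referencing T::CHECK forces the assertion to be evaluated for T.
fn store<T: EightByteAligned>(value: T) -> T {
    let () = T::CHECK;
    value
}

fn main() {
    let packet = store(Packet { bytes: [0; 8] });
    println!("{}", packet.bytes.len());
    // Adding `impl EightByteAligned for u8 {}` and calling store(0u8)
    // would stop the build with the assertion message.
}
```

The assertion fires when the generic function is instantiated, so a full build is needed to see it; a check-only pass may not report the error.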

Bounds on the Generic Type using Traits

Bounds are similar to C++ concepts: they constrain a type to exhibit specific behavior. Both Rust generics and C++ templates are compiled to specialized code and perform well, but Rust reports mistakes as early as possible, because a generic function body is type-checked against its declared bounds rather than at each instantiation. Rust has no counterpart to C++’s SFINAE (Substitution Failure Is Not An Error). Without bounds, we can’t perform any useful operation on a value of a generic type.

fn nothing<T>(x: T) {}

If you intend to perform any operations with the type, you need to know the common traits to specify the desired behavior. The most commonly used trait bounds are:

  • Copy: For functions that accept only Copy types such as i32, f64, and &str.
  • Clone: For functions that accept only Clone types such as String and Vec, enabling the use of the clone method generically.
  • std::ops::*: Operator overloading traits that allow operations on the type.
  • Read and Write: Traits that work with bytes.
  • Closure Traits: These traits are extensively used in iterator methods as generic types.
  • Debug and Display: For types that need to be printed either in Debug mode or Display mode.

Once these bounds are specified, you can use the appropriate methods and operators of those traits inside the function body. A single bound might not be very interesting; it’s the combination of bounds that makes a difference. By using +, you can combine multiple bounds for the same type, and the where clause allows you to define multiple types and their bounds expressively. These are not trait inheritance but requirements for the type. Trait bounds can range from loose constraints, accepting a wider range of types like Option and Result, to tight constraints like spawning threads where local references and non-Send types can’t be passed.
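The sketch below shows bounds combined with +, a where clause over two types, and the tighter Send + 'static requirement that thread spawning imposes. All three function names are made up for the example.

```rust
use std::fmt::Debug;
use std::ops::Add;
use std::thread;

// `+` combines requirements on one type: T must be Copy and addable to itself.
fn double<T: Copy + Add<Output = T>>(x: T) -> T {
    x + x
}

// A where clause keeps several bounded types readable.
fn describe_pair<A, B>(a: A, b: B) -> String
where
    A: Debug + Clone,
    B: Debug + PartialOrd + Default,
{
    let kind = if b > B::default() { "above default" } else { "not above default" };
    format!("{:?} / {:?} ({})", a.clone(), b, kind)
}

// Tight bounds: the value crosses a thread boundary, so it must be
// Send and must not borrow from the caller's stack ('static).
fn print_on_thread<T: Debug + Send + 'static>(value: T) -> String {
    thread::spawn(move || format!("{:?}", value)).join().unwrap()
}

fn main() {
    println!("{}", double(21));
    println!("{}", describe_pair("id", 3));
    println!("{}", print_on_thread(vec![1, 2]));
}
```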

Better Abstraction and Type-Safe APIs

Traits and their associated types provide a way to design APIs that are safe, ergonomic, and sometimes faster. Let’s take the example of IntoIterator from the standard library, the convenience trait that for loops rely on. How can we ensure that calling into_iter on any type that implements IntoIterator returns an iterator, and that the items it yields have the type the implementation declared? This is the essence of the IntoIterator trait’s definition from the standard library:

trait IntoIterator {
    type Item;
    type IntoIter: Iterator<Item = Self::Item>;

    fn into_iter(self) -> Self::IntoIter;
}

The IntoIterator trait defines two associated types, Item and IntoIter. The return type of the into_iter method is any type that implements the Iterator trait, determined by the bound on the IntoIter associated type. The type of item it produces is defined by the Item associated type. Both of these associated types ensure that the type conforms to the required behavior. This design benefits both implementers and consumers of the API. Implementers must adhere to the specifications of the IntoIterator trait, while consumers can’t misuse the trait, ensuring that the type system enforces correctness.
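To see both sides of that contract, here is a minimal sketch with a hypothetical Playlist type as the implementer and a hypothetical total_length function as the consumer.

```rust
struct Playlist {
    tracks: Vec<String>,
}

impl IntoIterator for Playlist {
    type Item = String;
    // Must be an Iterator whose Item matches the Item declared above;
    // any other choice is rejected at compile time.
    type IntoIter = std::vec::IntoIter<String>;

    fn into_iter(self) -> Self::IntoIter {
        self.tracks.into_iter()
    }
}

// Consumer side: accepts anything that can be turned into an iterator of Strings.
fn total_length<C>(collection: C) -> usize
where
    C: IntoIterator<Item = String>,
{
    collection.into_iter().map(|s| s.len()).sum()
}

fn main() {
    let playlist = Playlist { tracks: vec!["intro".to_string(), "outro".to_string()] };
    // A for loop calls into_iter on the value for us.
    for track in playlist {
        println!("{}", track);
    }
    let other = Playlist { tracks: vec!["ab".to_string(), "cde".to_string()] };
    println!("{}", total_length(other)); // prints 5
}
```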

pub fn set(&mut self, value: P::Target)
where
    P::Target: Sized,

In this function, we receive two guarantees while remaining generic over the type:

  1. The accepted value must have the type that P dereferences to, namely P::Target.
  2. The type of Target must be sized.
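The fragment above does not show its surrounding impl block, so the sketch below places a method with the same signature on a hypothetical wrapper named Holder. The wrapper and its field are assumptions; only the signature comes from the text.

```rust
use std::ops::DerefMut;

// Hypothetical wrapper around any pointer-like type P.
struct Holder<P> {
    pointer: P,
}

impl<P: DerefMut> Holder<P> {
    // `value` must have exactly the type P dereferences to, and that type
    // must be Sized so it can be passed by value and assigned.
    pub fn set(&mut self, value: P::Target)
    where
        P::Target: Sized,
    {
        *self.pointer = value;
    }
}

fn main() {
    // P = Box<i32>, so P::Target = i32.
    let mut boxed = Holder { pointer: Box::new(1) };
    boxed.set(2);
    println!("{}", *boxed.pointer);

    // P = &mut String, so P::Target = String.
    let mut text = String::from("a");
    let mut borrowed = Holder { pointer: &mut text };
    borrowed.set(String::from("b"));
    println!("{}", text);
    // boxed.set("two"); // rejected: &str is not P::Target
}
```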

Dereferencing is split across two related traits, Deref and DerefMut. Deref only hands out a shared reference to the underlying data, which is what read-only access needs. This prevents mutation of the data inside types like Arc and Rc, and through the guard returned by the read method on RwLock. Attempting to implement DerefMut for these standard library types yourself is rejected by the orphan rule, which forbids implementing a foreign trait for a type you don’t own. DerefMut does not declare its own Target; because Deref is a supertrait of DerefMut, it reuses Deref’s Target and returns a mutable reference to it. This means you can’t implement DerefMut without also implementing Deref, but you can implement Deref alone. Splitting the traits this way lets a type offer read-only access without being forced to offer mutable access.

Conditional APIs provide different instances with different methods based on their types. This not only applies to traits but also to struct and enum instance methods, enabling the creation of conditional APIs through multiple impl blocks. The Write trait, for instance, is used to work with bytes and is implemented for Vec<u8>, not for vectors of any other type. This ensures that the vector must contain u8 values in order to call the write method or use it in contexts that expect a Write implementation. An example of this can be found here.
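A conditional API built from several impl blocks might look like the sketch below. The Buffer type and its methods are hypothetical and are not the linked example.

```rust
use std::fmt::Display;

struct Buffer<T> {
    items: Vec<T>,
}

// Available for every Buffer<T>.
impl<T> Buffer<T> {
    fn new() -> Self {
        Buffer { items: Vec::new() }
    }
    fn push(&mut self, item: T) {
        self.items.push(item);
    }
    fn len(&self) -> usize {
        self.items.len()
    }
}

// Available only when the buffer holds bytes.
impl Buffer<u8> {
    fn as_text(&self) -> String {
        String::from_utf8_lossy(&self.items).into_owned()
    }
}

// Available only when the items can be displayed.
impl<T: Display> Buffer<T> {
    fn join(&self, separator: &str) -> String {
        let parts: Vec<String> = self.items.iter().map(|i| i.to_string()).collect();
        parts.join(separator)
    }
}

fn main() {
    let mut bytes = Buffer::new();
    bytes.push(b'h');
    bytes.push(b'i');
    println!("{} bytes: {}", bytes.len(), bytes.as_text());

    let mut numbers = Buffer::new();
    numbers.push(1);
    numbers.push(2);
    println!("{}", numbers.join(", "));
    // numbers.as_text(); // rejected: as_text exists only on Buffer<u8>
}
```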

Extension traits allow us to add methods to foreign types, or to implement foreign traits on our own types, for extensibility without creating entirely new APIs from scratch for new types. For instance, the Itertools crate extends the standard library’s Iterator trait by declaring Itertools: Iterator. Once the Itertools trait is in scope, its methods are available alongside the Iterator methods. This approach is also used by the String crate to extend methods for the standard String type. The Rayon parallel iterators have methods similar to those of regular iterators, but they are not extension methods; they simply share the same names as the iterator methods for ease of use.
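The pattern can be sketched in a few lines. IteratorExt and StrExt are made-up traits; the supertrait bound mirrors the Itertools: Iterator declaration mentioned above.

```rust
use std::iter::StepBy;

// Hypothetical extension trait for every iterator.
trait IteratorExt: Iterator {
    // Keeps the first item and then every second one.
    fn every_other(self) -> StepBy<Self>
    where
        Self: Sized,
    {
        self.step_by(2)
    }
}

// Blanket implementation: every Iterator gets the new method.
impl<I: Iterator> IteratorExt for I {}

// Hypothetical extension trait adding a method to the foreign type str.
trait StrExt {
    fn shout(&self) -> String;
}

impl StrExt for str {
    fn shout(&self) -> String {
        self.to_uppercase() + "!"
    }
}

fn main() {
    let picked: Vec<i32> = vec![1, 2, 3, 4, 5].into_iter().every_other().collect();
    println!("{:?}", picked); // prints [1, 3, 5]
    println!("{}", "hi".shout()); // prints HI!
}
```

The new methods are only callable where the extension trait is in scope, which keeps them from leaking into unrelated code.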

Extending types improves the ergonomics of public APIs. However, sometimes we need to protect a trait from being implemented by users. For example, the SliceIndex trait is used for indexing slices, but implementing it correctly involves unsafe code. To prevent users from implementing it themselves, the trait is sealed, and users reach it indirectly through the safe Index and IndexMut traits. Sealing is achieved by creating a trait in a private module and making it a supertrait of the trait you want to protect. Users cannot implement the protected trait because they cannot name, and therefore cannot satisfy, the private supertrait. An example of this approach can be found here.
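A minimal sketch of a sealed trait follows. The module and type names are illustrative and this is not the linked example; in a real library the shapes module would be the crate’s public API.

```rust
mod shapes {
    // Private module: code outside `shapes` cannot name `Sealed`.
    mod private {
        pub trait Sealed {}
    }

    // Public trait that anyone can use, but only this module can implement,
    // because implementing it requires implementing the private supertrait.
    pub trait Shape: private::Sealed {
        fn area(&self) -> f64;
    }

    pub struct Square(pub f64);
    pub struct Circle(pub f64);

    impl private::Sealed for Square {}
    impl private::Sealed for Circle {}

    impl Shape for Square {
        fn area(&self) -> f64 {
            self.0 * self.0
        }
    }

    impl Shape for Circle {
        fn area(&self) -> f64 {
            std::f64::consts::PI * self.0 * self.0
        }
    }
}

use shapes::{Circle, Shape, Square};

fn main() {
    println!("{}", Square(2.0).area());
    println!("{}", Circle(1.0).area());
    // struct Mine; impl Shape for Mine { ... } is rejected here,
    // because `shapes::private::Sealed` cannot be implemented from outside.
}
```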

Rust’s abstraction provides not only zero-cost abstraction but also abstraction with strong safety guarantees. If library authors design the trait interface specifications to align with the desired behavior, implementers can’t misuse or incorrectly implement the traits. Consumers of the API can’t use them incorrectly either, and the compiler helps deduce types. This results in a seamless experience: we don’t need to read comments to use APIs correctly, but instead, we refer to documentation to understand API requirements, ensuring that code compiles and functions as intended. If you look at the front pages of Actix and Rocket, they both emphasize being type-safe. This distinction is not evident when implemented in other languages. Thanks to Rust’s abstractions, these frameworks feel like high-level programming languages. Web frameworks exist in C++, but they lack the ergonomics and ease of use that Rust frameworks like Actix and Rocket provide. These Rust frameworks offer the same level of convenience as other high-level frameworks in garbage-collected languages.

Operator Overloading (OO)

Operator overloading allows us to define custom behavior for operators on our own types, making them behave like built-in types. It’s important to note that operator overloading in Rust does not provide a way to create new operators; it involves implementing existing operators for custom types. This is accomplished by implementing the appropriate traits for our type, which are defined abstractly and often use associated types. The Output associated type gives us the flexibility to produce a different type as the result of an operation. For example, when adding two complex numbers, we don’t necessarily need to return another complex number; we can store the result in a Vec or another suitable type, then return it to the caller. Once we implement these traits, we can use either operators or methods, as they are equivalent.

use std::ops::Add;

fn main() {
    let new = Ops::default();
    // Ops is Copy, so `new` remains usable after being passed by value.
    println!("{:?}", new + &new);
    // The operator is sugar for the trait method; both calls are equivalent.
    println!("{:?}", new.add(&new));
}

#[derive(Debug, Copy, Clone, Default)]
struct Ops {
    n1: i32,
    n2: i32,
}

impl Add<&Self> for Ops {
    // The result does not have to be Ops; here it is a Vec<Ops>.
    type Output = Vec<Self>;

    fn add(self, rhs: &Self) -> Self::Output {
        // Capacity counts elements, not bytes, and exactly one is pushed.
        let mut vec = Vec::with_capacity(1);
        let left = self.n1 + rhs.n1;
        let right = self.n2 + rhs.n2;
        vec.push(Self { n1: left, n2: right });
        vec
    }
}

Some traits can be derived using the #[derive] macro:

#[derive(PartialOrd, Ord, Eq, PartialEq)]

After deriving these traits, operators like ==, !=, <=, >=, >, and < become available for our types, as well as their corresponding methods.
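As a quick illustration, the sketch below derives the comparison traits for a hypothetical Version struct. Derived ordering compares fields in declaration order, so major is compared before minor.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialOrd, Ord, Eq, PartialEq)]
struct Version {
    major: u32,
    minor: u32,
}

fn main() {
    let old = Version { major: 1, minor: 10 };
    let new = Version { major: 2, minor: 0 };

    // Operators provided by the derived traits.
    println!("{}", old < new);  // prints true
    println!("{}", old != new); // prints true

    // The corresponding methods.
    println!("{}", old.lt(&new));   // prints true
    println!("{:?}", old.cmp(&new)); // prints Less
    println!("{}", old.cmp(&new) == Ordering::Less);
}
```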

Implementing Deref and DerefMut for a wrapper type lets it expose the methods of the type it wraps, so that a wrapper around String can be used much like a String. For example:

struct Udt(String);

If you only want to provide read-only access methods for the String, you can avoid implementing DerefMut and instead implement methods directly on the impl block for your type. This approach restricts the use of mutable methods for your type. The implementations of these traits for the Udt struct can be found here. The Index and IndexMut traits follow a similar composition, overloading the indexing operator []. The difference is that indexing operations may panic if the index is out of bounds, as the return type isn’t wrapped in an Option.
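Since the linked implementation is not reproduced here, the following is only a sketch of the read-only variant: Udt implements Deref but deliberately not DerefMut.

```rust
use std::ops::Deref;

struct Udt(String);

// Read-only access: Deref without DerefMut.
impl Deref for Udt {
    type Target = String;

    fn deref(&self) -> &String {
        &self.0
    }
}

fn main() {
    let name = Udt(String::from("rust"));
    // A String method, reached through one Deref step.
    println!("{}", name.len());
    // A str method, reached through Udt -> String -> str.
    println!("{}", name.to_uppercase());
    // name.push('!'); // rejected: push needs &mut String, which requires DerefMut
}
```

Adding a DerefMut implementation later would open up the mutating String methods without changing any call sites that only read.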

As mentioned in the Rust Blog:

Despite their seeming simplicity, traits are a unifying concept that supports a wide range of use cases and patterns, without having to pile on additional language features.

Use Cases

Generic Associated Types (GATs) also improve performance while maintaining abstraction. For example, the Chumsky parser library uses associated types to reduce unnecessary computation during validation logic. The library generates output only when needed, avoiding unnecessary computation. This is achieved using the associated type type Output<T>. When the Mode is in the Check state, Output<T> = (), ensuring that no computation is performed. It’s like selectively disabling expensive methods when they’re not needed. When the Mode is in the Emit state, Output<T> = T, allowing output generation. The Check and Emit states are represented by empty structs, resulting in no runtime overhead and static verification of computation during type checking.
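The idea can be sketched as follows. This is a simplified illustration and not Chumsky’s actual API: the bind method and the digits function are assumptions, and only the names Mode, Check, Emit, and Output<T> come from the description above. Generic associated types require Rust 1.65 or later.

```rust
trait Mode {
    // A generic associated type: the result depends on both the mode and T.
    type Output<T>;

    // Runs `f` only if this mode actually produces output.
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T>;
}

struct Check; // validate only
struct Emit;  // validate and build the output

impl Mode for Check {
    type Output<T> = ();
    // The closure is dropped without being called, so no output is built.
    fn bind<T>(_f: impl FnOnce() -> T) -> Self::Output<T> {}
}

impl Mode for Emit {
    type Output<T> = T;
    fn bind<T>(f: impl FnOnce() -> T) -> Self::Output<T> {
        f()
    }
}

// Recognizes a leading run of ASCII digits. The String is only allocated
// when M = Emit; with M = Check the function just validates.
fn digits<M: Mode>(input: &str) -> Option<M::Output<String>> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some(M::bind(|| input[..end].to_string()))
}

fn main() {
    println!("{:?}", digits::<Emit>("123abc"));  // prints Some("123")
    println!("{:?}", digits::<Check>("123abc")); // prints Some(())
    println!("{:?}", digits::<Check>("abc"));    // prints None
}
```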

The type state pattern helps establish relationships or invariants between states, relying solely on compile-time verification and imposing no runtime overhead. Although type state-oriented programming is not a first-class construct in Rust, it can be efficiently implemented using Rust’s empty types, move semantics, and associated types. The paper Session Types in Rust implements type states for channel types. Rust features that contribute to this implementation include:

  • Move semantics, which transfer ownership whenever the state changes.
  • The Send marker trait, allowing types to be used in channels.
  • Empty structs, enums, and phantom data for state transitions.
  • Traits and associated types to verify dual relationships.

These features collectively empower Rust programmers to express and enforce complex relationships between types while ensuring performance and safety.
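A small type state sketch, using a hypothetical Connection type rather than the session types from the paper, shows how empty marker types, PhantomData, and move semantics work together.

```rust
use std::marker::PhantomData;

// Empty marker types for the two states.
struct Closed;
struct Open;

struct Connection<State> {
    address: String,
    sent: usize,
    _state: PhantomData<State>, // zero-sized; exists only at the type level
}

impl Connection<Closed> {
    fn new(address: &str) -> Self {
        Connection { address: address.to_string(), sent: 0, _state: PhantomData }
    }

    // Takes `self` by value: the closed connection is moved and cannot be reused.
    fn open(self) -> Connection<Open> {
        Connection { address: self.address, sent: self.sent, _state: PhantomData }
    }
}

impl Connection<Open> {
    // Only an open connection can send; returns the running byte total.
    fn send(&mut self, message: &str) -> usize {
        self.sent += message.len();
        self.sent
    }

    fn close(self) -> Connection<Closed> {
        Connection { address: self.address, sent: self.sent, _state: PhantomData }
    }
}

fn main() {
    let closed = Connection::<Closed>::new("localhost");
    // closed.send("ping"); // rejected: send is not defined for Connection<Closed>
    let mut open = closed.open();
    open.send("ping");
    let closed_again = open.close();
    println!("{} bytes sent to {}", closed_again.sent, closed_again.address);
}
```

Calling a method in the wrong state is a type error, so no runtime state flag or check is needed.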

