Writing and publishing a Python module in Rust

source link: https://blog.yossarian.net/2020/08/02/Writing-and-publishing-a-python-module-in-rust

Aug 2, 2020

Tags: programming, devblog, python, rust

This post is a quick walkthrough of how I wrote a Python library, procmaps, in nothing but Rust. It uses PyO3 for the bindings and maturin to manage the build (as well as produce manylinux1-compatible wheels).

The code is, of course, available on GitHub, and can be installed directly with a modern Python (3.5+) via pip without a local Rust install:

$ pip3 install procmaps

Procmaps?

procmaps is an extremely small Python library, backed by a similarly small Rust library.

All it does is parse “maps” files, best known for their presence under procfs on Linux, into a list of Map objects. Each Map, in turn, contains the basic attributes of the mapped memory region.

By their Python attributes:

import os
import procmaps

# also: from_path, from_str
# N.B.: named map_ instead of map to avoid shadowing the map function
map_ = procmaps.from_pid(os.getpid())[0]

map_.begin_address  # the begin address for the mapped region
map_.end_address    # the end address for the mapped region
map_.is_readable    # is the mapped region readable?
map_.is_writable    # is the mapped region writable?
map_.is_executable  # is the mapped region executable?
map_.is_shared      # is the mapped region shared with other processes?
map_.is_private     # is the mapped region private (i.e., copy-on-write)?
map_.offset         # the offset into the region's source that the region originates from
map_.device         # a tuple of (major, minor) for the device that the region's source is on
map_.inode          # the inode of the source for the region
map_.pathname       # the "pathname" field for the region, or None if an anonymous map

Critically: apart from the imports and the os.getpid() call, all of the code above calls directly into compiled Rust.
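To make the fields above concrete, here is a minimal pure-Python sketch that parses a single line of a "maps" file. It is only an illustration of the format: procmaps itself does this work in Rust, and the names MapLine and parse_maps_line, as well as the sample line, are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class MapLine(NamedTuple):
    """Hypothetical stand-in for the fields a procmaps.Map exposes."""
    begin_address: int
    end_address: int
    perms: str
    offset: int
    device: Tuple[int, int]
    inode: int
    pathname: Optional[str]


def parse_maps_line(line: str) -> MapLine:
    # Split into at most six fields so a pathname containing spaces survives.
    fields = line.split(None, 5)
    addr_range, perms, offset, device, inode = fields[:5]
    # Addresses, offset and device numbers are hexadecimal; the inode is decimal.
    begin, end = (int(part, 16) for part in addr_range.split("-"))
    major, minor = (int(part, 16) for part in device.split(":"))
    pathname = fields[5].strip() if len(fields) > 5 else ""
    return MapLine(begin, end, perms, int(offset, 16),
                   (major, minor), int(inode), pathname or None)


# Made-up sample lines in the usual /proc/<pid>/maps layout.
parsed = parse_maps_line(
    "00400000-0040b000 r-xp 00000000 08:01 1835041 /usr/bin/cat")
anonymous = parse_maps_line("7ffd1000-7ffd2000 rw-p 00000000 00:00 0")
```

The permission string maps onto the boolean attributes listed above: for example, is_readable corresponds to an "r" in the first position and is_private to a "p" in the last.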

Motivation

The motivations behind procmaps are twofold.

First: I do program analysis and instrumentation research at my day job. Time and time again, I need to obtain information about the memory layout of a program that I’m instrumenting (or would like to instrument). This almost always means opening /proc/<pid>/maps, writing an ad-hoc parser, getting the field(s) I want, and then getting on with my life.

Doing this over and over again has made me realize that it’s an ideal task for a small, self-contained Rust library:

  • The “maps” format is line-oriented and practically frozen, with no ambiguities. Rust has many high quality PEG and parser combinator libraries that are well suited to the task.
  • Writing ad-hoc parsers for it is bad™, especially when those parsers are written in C and/or C++.
  • Having a small library with a small API surface would make exposure to other languages (including C and C++) trivial.

Second: I started learning Rust about a year ago, and have been looking for new challenges in it. Interoperating with another language (especially one with radically different memory semantics, like Python) is an obvious choice.

Structure

The procmaps module is a plain old Rust crate. Really.

The only differences are in the Cargo.toml:

[lib]
crate-type = ["cdylib"]

[package.metadata.maturin]
classifier = [
  "Programming Language :: Rust",
  "Operating System :: POSIX :: Linux",
]

(Other settings under package.metadata.maturin are available for e.g. managing Python-side dependencies, but procmaps doesn’t need them. More details are available in the maturin documentation.)

In terms of code, the crate is structured like a normal Rust library. PyO3 only requires a few pieces of sugar to promote everything into Python-land:

Modules

Python modules are created by decorating a Rust function with #[pymodule].

This function then uses the functions of the PyModule argument that it takes to load the module’s functions and classes.

For example, here is the Python-visible procmaps module in its entirety:

#[pymodule]
fn procmaps(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Map>()?;
    m.add_wrapped(wrap_pyfunction!(from_pid))?;
    m.add_wrapped(wrap_pyfunction!(from_path))?;
    m.add_wrapped(wrap_pyfunction!(from_str))?;

    Ok(())
}

Functions

Module-level functions are trivial to create: they’re just normal Rust functions, marked with #[pyfunction]. They’re loaded into modules via add_wrapped + wrap_pyfunction!, as seen above. Alternatively, they can be created within a module definition (i.e., nested within the #[pymodule] function) via the #[pyfn] decorator.

Python-visible functions return a PyResult<T>, where T implements IntoPy<PyObject>. PyO3 helpfully provides an implementation of this trait for many core types; a full table is available in the PyO3 guide. This includes Option<T>, making it painless to turn Rust-level functions that return Options into Python-level functions that can return None.

procmaps doesn’t make use of them, but PyO3 also supports variadic arguments and keyword arguments. Details on those are available in the PyO3 guide.

Here’s a trivial Python-exposed function that does integer division, returning None if division by zero is requested:

#[pyfunction]
fn idiv(dividend: i64, divisor: i64) -> PyResult<Option<i64>> {
  if divisor == 0 {
    Ok(None)
  } else {
    Ok(Some(dividend / divisor))
  }
}
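One detail worth keeping in mind when exposing arithmetic like this: Rust's integer division truncates toward zero, while Python's // operator floors. The following pure-Python sketch models what the Rust idiv above would return to a Python caller. It is a stand-in written for illustration, not code from procmaps or PyO3, and it ignores i64 range limits.

```python
def idiv(dividend, divisor):
    """Pure-Python model of the Rust idiv: truncating division, None on zero."""
    if divisor == 0:
        return None
    # Divide the magnitudes, then restore the sign, so the result
    # truncates toward zero the way Rust's `/` does for integers.
    quotient = abs(dividend) // abs(divisor)
    return quotient if (dividend < 0) == (divisor < 0) else -quotient


truncated = idiv(-7, 2)   # what the Rust function would compute
floored = -7 // 2         # what Python's own operator computes
missing = idiv(7, 0)      # Ok(None) on the Rust side becomes None
```

A Python caller who expects // semantics would see a different answer for negative operands, so the truncating behavior is worth documenting on the Python side.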

Classes

Classes are loaded into modules via the add_class function, as seen in the module definition.

Just like modules, they’re managed almost entirely behind a single decorator, this time on a Rust struct. Here is the entirety of the procmaps.Map class definition:

#[pyclass]
struct Map {
    inner: rsprocmaps::Map,
}

procmaps doesn’t need them, but trivial getters and setters can be added to the members of a class with #[pyo3(get, set)]. For example, the following creates a Point class:

#[pyclass]
struct Point {
  #[pyo3(get, set)]
  x: i64,
  #[pyo3(get, set)]
  y: i64,
}

…for which the following would be possible in Python:

# get_unit_point not shown above
from pointlib import get_unit_point

p = get_unit_point()
print(p.x, p.y)

p.x = 100
p.y = -p.x
print(p.x, p.y)

Using #[pyclass] on Foo auto-implements IntoPy<PyObject> for Foo, making it easy to return your custom classes from any function (as above) or member method (as below).

Member methods

Just as Python-visible classes are defined via #[pyclass] on Rust structs, Python-visible member methods are declared via the #[pymethods] attribute on Rust impls for those structures.

Member methods return PyResult<T>, just like functions do:

#[pymethods]
impl Point {
  fn invert(&self) -> PyResult<Point> {
    Ok(Point { x: self.y, y: self.x})
  }
}

…allows for the following:

# get_unit_point not shown above
from pointlib import get_unit_point

p = get_unit_point()
p_inv = p.invert()

By default, PyO3 forbids the creation of Rust-defined classes within Python code. To allow their creation, just add a function with the #[new] attribute to the #[pymethods] impl block. This creates a __new__ Python method rather than __init__; PyO3 doesn’t support the latter.

For example, here’s a constructor for the contrived Point class above:

#[pymethods]
impl Point {
  #[new]
  fn new(x: i64, y: i64) -> Self {
    Point { x, y }
  }
}

…which allows for:

from pointlib import Point

p = Point(100, 0)
p_inv = p.invert()
assert p_inv.y == 100
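For readers less familiar with the __new__ versus __init__ distinction, the following pure-Python class is a rough model of what the Rust Point looks like from the Python side. It is an analogy only, not what PyO3 generates: all state is set up in __new__, and no __init__ is defined.

```python
class Point:
    """Pure-Python analogy for the Rust-backed Point (illustrative only)."""
    __slots__ = ("x", "y")

    def __new__(cls, x, y):
        # The object is allocated and fully initialized here,
        # mirroring a constructor marked with #[new].
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self

    def invert(self):
        # Swap the coordinates, like the Rust invert method.
        return Point(self.y, self.x)


p = Point(100, 0)
p_inv = p.invert()
```

Because construction happens entirely in __new__, a subclass that overrides only __init__ cannot skip or alter the initialization of x and y.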

Exceptions and error propagation

As mentioned above, (most) Python-visible functions and methods return PyResult<T>.

The Err half of PyResult is PyErr, and these values get propagated as Python exceptions. The pyo3::exceptions module contains structures that parallel the standard Python exceptions, each of which provides a py_err(String) function to produce an appropriate PyErr.

Creating a brand new Python-level exception takes a single line with the create_exception! macro. Here’s how procmaps creates a procmaps.ParseError exception that inherits from the standard Python Exception class:

use pyo3::create_exception;
use pyo3::exceptions::Exception;

// N.B.: The first argument is the module name,
// i.e. the function declared with #[pymodule].
create_exception!(procmaps, ParseError, Exception);

Similarly, marshalling Rust Error types into PyErrs is as simple as impl std::convert::From<ErrorType> for PyErr.

Here’s how procmaps turns some of its errors into standard Python IOErrors and others into the custom procmaps.ParseError exception:

// N.B.: The newtype here is only necessary because Error comes from an
// external crate (rsprocmaps).
struct ProcmapsError(Error);
impl std::convert::From<ProcmapsError> for PyErr {
    fn from(err: ProcmapsError) -> PyErr {
        match err.0 {
            Error::Io(e) => IOError::py_err(e.to_string()),
            Error::ParseError(e) => ParseError::py_err(e.to_string()),
            Error::WidthError(e) => ParseError::py_err(e.to_string()),
        }
    }
}
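Seen from Python, the mapping above amounts to roughly the following. This is a pure-Python analogy written for illustration: the ParseError class stands in for procmaps.ParseError, and to_exception is a hypothetical helper that mirrors the match arms, not part of the library.

```python
class ParseError(Exception):
    """Stand-in for the exception that create_exception! defines."""


def to_exception(kind, message):
    """Hypothetical helper mirroring the Rust match on the error variant."""
    if kind == "io":
        return IOError(message)
    # Both ParseError and WidthError surface as ParseError.
    return ParseError(message)


def handle(kind, message):
    # How a caller might tell the two families of failure apart.
    try:
        raise to_exception(kind, message)
    except OSError:
        return "io"
    except ParseError:
        return "parse"


io_result = handle("io", "No such file or directory")
parse_result = handle("width", "address too wide")
```

Note that in Python 3, IOError is an alias of OSError, so callers can catch either name.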

Compilation and distribution

With everything above, cargo build just works: it produces a Python-loadable shared object.

Unfortunately, it does it using the cdylib naming convention, meaning that cargo build for procmaps produces libprocmaps.so, rather than one of the naming conventions that Python knows how to look for when searching $PYTHONPATH.
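The file names that a given interpreter will accept for an extension module named procmaps can be listed from Python itself. The short sketch below uses the standard library's importlib.machinery.EXTENSION_SUFFIXES; the exact suffixes vary by platform and Python version, so no particular output is shown.

```python
import importlib.machinery

# Suffixes this interpreter recognizes for compiled extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES

# File names the import system would look for on `import procmaps`.
candidates = ["procmaps" + suffix for suffix in suffixes]

# The name cargo produces for a cdylib on Linux is not among them.
cargo_name_is_importable = "libprocmaps.so" in candidates

for name in candidates:
    print(name)
```

Renaming or symlinking the cargo output to one of the printed names is enough for a quick local experiment, but it does not produce an installable package.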

This is where maturin comes in: once installed, a single maturin build in the crate root puts an appropriately named pip-compatible wheel in target/wheels.

It gets even better: maturin develop will install the compiled module directly into the current virtual environment, making local development as simple as:

$ python3 -m venv env
$ source env/bin/activate
(env) $ pip3 install maturin
(env) $ maturin develop
(env) $ python3
>>> import procmaps

procmaps has a handy Makefile that wraps all of that; running the compiled module locally is a single make develop away.

Distribution is slightly more involved: maturin develop builds wheels that are compatible with the local machine, but further restrictions on symbol versions and linkages are required to ensure that a binary wheel runs on a large variety of Linux versions and distributions.

Compliance with these constraints is normally enforced in one of two ways:

  1. Packages are compiled into binary wheels, and then audited (and potentially repaired) via the PyPA’s auditwheel before release.
  2. Packages are compiled into binary wheels within a wholly controlled runtime environment, such as the PyPA’s manylinux Docker containers.

Distribution with maturin takes the latter approach: the maturin developers have derived a Rust build container from the PyPA’s standard manylinux container, making fully compatible builds (again, from the crate root) as simple as:

# optional: do `build --release` for release-optimized builds
$ docker run --rm -v $(pwd):/io konstin2/maturin build

This command, like a normal maturin build, drops the compiled wheel(s) into target/wheels. Because it runs inside of the standard manylinux container, it can and does automatically build wheels for a wide variety of Python versions (Python 3.5 through 3.8, as of writing).
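The compatibility information for each wheel is encoded in its file name, following the wheel naming convention of distribution, version, Python tag, ABI tag and platform tag. The sketch below splits such a name into its parts. The file name and version used here are made up for illustration, and the helper wheel_tags is hypothetical; it also ignores the optional build tag that the convention allows.

```python
def wheel_tags(filename):
    """Split a wheel file name into its components (no build tag support)."""
    stem = filename[:-len(".whl")]
    parts = stem.split("-")
    # The last three dash-separated fields are always the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# Hypothetical file name, shaped like the ones in target/wheels.
tags = wheel_tags("procmaps-0.0.0-cp38-cp38-manylinux1_x86_64.whl")
```

A platform tag beginning with manylinux1 is what tells pip that the wheel can be installed on a wide range of Linux distributions rather than only on the build machine.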

From here, distribution to PyPI is as simple as twine upload target/wheels/* or maturin publish. procmaps currently uses the former, as releases are handled via GitHub Actions using the PyPA’s excellent gh-action-pypi-publish action.

Voilà: a Python module, written completely in Rust, that can be installed on the vast majority of Linux distributions with absolutely no dependencies on Rust itself. Even the non-maturin metadata in Cargo.toml is propagated correctly!

Wrapup

I only ran into one small hiccup while working on procmaps: I tried to add a Map.__contains__ method to allow for inclusion checks with the in protocol, e.g.:

fn __contains__(&self, addr: u64) -> PyResult<bool> {
    Ok(addr >= self.inner.address_range.begin && addr < self.inner.address_range.end)
}

…but this didn’t work, for whatever reason, despite working when called manually:

>>> 4194304 in map_
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument of type 'Map' is not iterable

>>> map_.__contains__(4194304)
True

There’s probably a reasonable explanation for this in the Python data model that I haven’t figured out.
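One plausible explanation (an assumption, not something the post confirms) is that Python resolves the in operator through the type's special-method slot rather than through ordinary attribute lookup. A method that is reachable by name but not registered in that slot works when called manually and is ignored by in. The pure-Python sketch below reproduces the same symptom by attaching __contains__ to an instance instead of its type; the Region class is hypothetical.

```python
class Region:
    """Hypothetical stand-in for Map, with no __contains__ on the type."""

    def __init__(self, begin, end):
        self.begin = begin
        self.end = end


region = Region(0x400000, 0x40B000)

# Attach __contains__ to the instance only: ordinary attribute lookup
# finds it, but the `in` operator consults the type, not the instance.
region.__contains__ = lambda addr: region.begin <= addr < region.end

manual_result = region.__contains__(0x400000)

try:
    0x400000 in region
    implicit_error = None
except TypeError as exc:
    implicit_error = exc

# Defining the method on the type itself makes `in` work.
Region.__contains__ = lambda self, addr: self.begin <= addr < self.end
implicit_result = 0x400000 in region
```

For an extension type the equivalent fix would be to have the binding layer fill in the type's containment slot, rather than only exposing a method with that name.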

By and large, the process of writing a Python module in Rust was extremely pleasant. I didn’t have to write a single line of Python (or even Python-specific configuration) until I wanted to add unit tests. Both PyO3 and maturin are incredibly polished, and the PyPA’s efforts to provide manylinux build environments made compatible builds a breeze.

