Types for Python HTTP APIs: An Instagram Story

 2 years ago
source link: https://instagram-engineering.com/types-for-python-http-apis-an-instagram-story-d3c3a207fdb7

And we’re back! As we mentioned in the first part of our blog post series, Instagram Server is a Python monolith with several million lines of code and a few thousand Django endpoints.

This post is about how we use types to document and enforce a contract for our Python HTTP APIs. In the next few weeks, we’ll share details on more tools and techniques we’ve developed to manage our codebase’s quality.

Background

When you open up the Instagram app on your mobile client, it makes requests to our Python (Django) server over a JSON, HTTP API.

To give you some idea of the complexity of the API we expose to the mobile client, we have:

  • over 2000 endpoints on the server
  • over 200 top-level fields in the client data object that represents an image, video, or story in the app
  • 100s of engineers writing code for the server (and even more on the client!)
  • 100s of commits to the server each day that may modify the API to support new features

We use types to document and enforce a contract for our complex, evolving HTTP API.

Types

Let’s start at the beginning. PEP 484 introduced a syntax for adding type annotations to Python code. But why add type annotations at all?

Consider a function that retrieves a Star Wars character:

def get_character(id, calendar):
    if id == 1000:
        return Character(
            id=1000,
            name="Luke Skywalker",
            birth_year="19BBY" if calendar == Calendar.BBY else ...
        )
    ...

To understand the get_character function, you have to read its body. Only then do you learn that:

  • it takes an integer character id
  • it takes a calendar system enum (e.g. BBY or “Before Battle of Yavin”)
  • it returns a character with fields id, name, and birth year

The function has an implicit contract that you have to re-establish every single time you read the code. But code is written once and read many times, so this doesn’t work well.

Further, it’s hard to verify that the callers of the function and the function body itself adhere to the implicit contract. In a large codebase, this can lead to bugs.

Consider instead the function with type annotations:

def get_character(id: int, calendar: Calendar) -> Character:
    ...

With type annotations there is an explicit contract. You only have to read the function signature to understand its input and output. A typechecker can statically verify that code conforms to the contract, eliminating an entire class of bugs!
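As a minimal, self-contained sketch of that explicit contract, the snippet below fills in the pieces the signature refers to. The Calendar enum and Character dataclass here are illustrative stand-ins (assumptions for this example), and only a single hard-coded character is supported.

```python
from dataclasses import dataclass
from enum import Enum


class Calendar(Enum):
    BBY = "BBY"  # "Before Battle of Yavin"


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


def get_character(id: int, calendar: Calendar) -> Character:
    # Illustrative lookup: only one character exists in this sketch.
    if id != 1000:
        raise KeyError(id)
    return Character(id=1000, name="Luke Skywalker", birth_year=f"19{calendar.value}")


luke = get_character(1000, Calendar.BBY)

# A typechecker such as mypy would reject the call below without running it,
# because "1000" is a str (not an int) and "BBY" is a str (not a Calendar):
# get_character("1000", "BBY")
```

The commented-out call is the kind of mistake that would otherwise only surface at runtime, if at all.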

Types for HTTP APIs

Let’s develop an HTTP API to retrieve a Star Wars character, and use type annotations to define an explicit contract for it.

The HTTP API should take the character id as a url parameter and the calendar system as a query parameter. It should return a JSON response for the character.

curl -X GET https://api.starwars.com/characters/1000?calendar=BBY

{
    "id": 1000,
    "name": "Luke Skywalker",
    "birth_year": "19BBY"
}

To implement this API in Django, you first register the url path and the view function responsible for taking an HTTP request to that url path and returning a response.

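from django.urls import path

urlpatterns = [
    # path() understands the <id> syntax; the regex-based url() helper does not.
    path("characters/<id>/", get_character),
]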

The view function takes the request and the url parameters (in this case, id) as input. The function parses and casts the calendar query parameter, fetches the character from a store, and returns a dictionary that is serialized as JSON and wrapped in an HTTP response.

def get_character(request: IGWSGIRequest, id: str) -> JsonResponse:
    calendar = Calendar(request.GET.get("calendar", "BBY"))
    character = Store.get_character(id, calendar)
    return JsonResponse(asdict(character))

Although the view function has type annotations, it does not define a strong, explicit contract for the HTTP API. From the signature, we don’t know the names or types of the query parameters, or the fields in the response or their types.

Instead, what if we could make the view function signature exactly the same as that of the earlier type-annotated function?

def get_character(id: int, calendar: Calendar) -> Character:
    ...

The function parameters can represent request parameters (url, query, or body parameters). The function return type can represent the content of the response. Then, we would have an explicit, easy-to-understand contract for the HTTP API that the typechecker can enforce.

Implementation

So, how can we implement this idea?

Let’s use a decorator to transform the strongly-typed view function to the Django view function. This design requires no changes to the Django framework. We can use the same routing, middleware, and other components that we are familiar with.

@api_view
def get_character(id: int, calendar: Calendar) -> Character:
    ...

Let’s dive into the implementation of the api_view decorator:

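import functools
import inspect
from dataclasses import asdict

from django.http import JsonResponse


def api_view(view):
    @functools.wraps(view)
    def django_view(request, *args, **kwargs):
        # Build each argument by extracting the raw value from the request
        # and casting it to the type annotated in the view's signature.
        params = {
            param_name: param.annotation(extract(request, param))
            for param_name, param in inspect.signature(view).parameters.items()
        }
        data = view(**params)
        return JsonResponse(asdict(data))

    return django_view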

That’s a dense bit of code. Let’s go over it piece by piece.

We take as input the strongly-typed view, and wrap it into a regular Django view function that we return:

def api_view(view):
    @functools.wraps(view)
    def django_view(request, *args, **kwargs):
        ...
    return django_view

Now let’s look at the implementation of the Django view. First we have to construct the arguments to the strongly-typed view function. We use introspection with the inspect module to get the signature of the strongly-typed view function and iterate over its parameters:

for param_name, param in inspect.signature(view).parameters.items()

For each of the parameters, we call an extract function, which extracts the parameter value from the request.

Then, we cast the parameter value to the expected type from the signature (e.g. cast the calendar system from a string to an enum value).

param.annotation(extract(request, param))
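To make the casting step concrete, here is a small sketch that runs without Django. The raw dictionary is an assumed stand-in for the values extract would return from a request, which always arrive as strings.

```python
import inspect
from enum import Enum


class Calendar(Enum):
    BBY = "BBY"


def get_character(id: int, calendar: Calendar):
    ...


# Stand-in for values pulled from the url and query string: always str.
raw = {"id": "1000", "calendar": "BBY"}

# Call each parameter's annotation on its raw value: int("1000"), Calendar("BBY").
params = {
    name: param.annotation(raw[name])
    for name, param in inspect.signature(get_character).parameters.items()
}
```

Calling the annotation directly works for types such as int, str, and Enum subclasses. It does not work for every annotation: bool("false") is True, and Optional[int] cannot be called at all, so a fuller implementation would likely need explicit converters for those cases.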

We call the strongly-typed view function with the parameter arguments that we’ve constructed:

data = view(**params)

It returns a strongly-typed class (e.g. Character). We take that class, transform it into a dictionary, and wrap it into a JSON, HTTP response:

return JsonResponse(asdict(data))

Great! So now we have a Django view that can wrap the strongly-typed view. Finally, let’s take a look at that extract function:

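from inspect import Parameter
from typing import Any
from django.http import HttpRequest

def extract(request: HttpRequest, param: Parameter) -> Any:
    # A name that appears as <name> in the registered route is a url parameter.
    if f"<{param.name}>" in request.resolver_match.route:
        return request.resolver_match.kwargs.get(param.name)
    else:
        return request.GET.get(param.name)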

Each parameter may be a url parameter or a query parameter. The url path of the request (the url path we registered as the first step) is accessible on the Django URL resolver’s route object. We check if the parameter name is present in the path. If it is, then it’s a url parameter, and we can extract it from the request in one way. Otherwise, it’s a query parameter, and we can extract it in another way.

And that’s it! This is a simplified implementation but it illustrates the main ideas.
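The pieces above can be exercised end to end without a running Django server. The sketch below is an approximation under stated assumptions: the fake request built from SimpleNamespace only mimics the two attributes the code reads (resolver_match and GET), and json.dumps stands in for JsonResponse.

```python
import functools
import inspect
import json
from dataclasses import asdict, dataclass
from enum import Enum
from types import SimpleNamespace


class Calendar(Enum):
    BBY = "BBY"


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


def extract(request, param):
    # url parameters appear as <name> in the route; the rest are query parameters.
    if f"<{param.name}>" in request.resolver_match.route:
        return request.resolver_match.kwargs.get(param.name)
    return request.GET.get(param.name)


def api_view(view):
    @functools.wraps(view)
    def wrapped(request, *args, **kwargs):
        params = {
            name: param.annotation(extract(request, param))
            for name, param in inspect.signature(view).parameters.items()
        }
        # Stand-in for JsonResponse: return the serialized body as a str.
        return json.dumps(asdict(view(**params)))

    return wrapped


@api_view
def get_character(id: int, calendar: Calendar) -> Character:
    return Character(id=id, name="Luke Skywalker", birth_year=f"19{calendar.value}")


# Hypothetical stand-in for a request to /characters/1000/?calendar=BBY.
fake_request = SimpleNamespace(
    resolver_match=SimpleNamespace(route="characters/<id>/", kwargs={"id": "1000"}),
    GET={"calendar": "BBY"},
)
body = get_character(fake_request)
```

Because functools.wraps copies the name and docstring of the original view, the wrapped function still identifies itself as get_character in logs and tracebacks.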

Data types

The type used to represent the HTTP response content (e.g. Character) can use either a dataclass or a typed dictionary.

A dataclass is a concise way to define a class that represents data.

from dataclasses import dataclass

@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str

luke = Character(
    id=1000,
    name="Luke Skywalker",
    birth_year="19BBY"
)

Dataclasses are the preferred way to model HTTP response objects at Instagram. They:

  • automatically generate boilerplate constructors, equals, and other methods
  • are understood by typecheckers and can be typechecked
  • can enforce immutability with frozen=True
  • are available in the standard library as of Python 3.7, or as a backport on the Python Package Index
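The properties in this list can be observed directly. The following short sketch uses only the standard library; the variable names are chosen for illustration.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


luke = Character(id=1000, name="Luke Skywalker", birth_year="19BBY")
same = Character(id=1000, name="Luke Skywalker", birth_year="19BBY")

# The generated __eq__ compares instances field by field.
fields_equal = luke == same

# frozen=True turns attribute assignment into an error at runtime
# (a typechecker would also flag this line).
try:
    luke.name = "Darth Vader"
    mutated = True
except FrozenInstanceError:
    mutated = False

# asdict produces the plain dictionary that can then be serialized as JSON.
as_dict = asdict(luke)
```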

Unfortunately at Instagram, we have a legacy codebase which uses large, untyped dictionaries passed between functions and modules. It would be difficult to migrate all this code from dictionaries to dataclasses. So, while we use dataclasses for new code, we use typed dictionaries for legacy code.

Typed dictionaries allow us to add type annotations for dictionary client objects and benefit from typechecking, without changing runtime behavior.

from mypy_extensions import TypedDict

class Character(TypedDict):
    id: int
    name: str
    birth_year: str

# All declared keys are required, so the literal must supply each of them.
luke: Character = {"id": 1000, "name": "Luke", "birth_year": "19BBY"}
luke["name"] = "Luke Skywalker"  # ok, name expects a str
luke["birth_year"] = 19  # type error, birth_year expects a str
luke["invalid_key"]  # type error, invalid_key does not exist
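To illustrate the "without changing runtime behavior" point, here is a brief sketch. It assumes Python 3.8 or later, where TypedDict is also importable from the standard typing module.

```python
import json
from typing import TypedDict  # standard library in Python 3.8+


class Character(TypedDict):
    id: int
    name: str
    birth_year: str


luke: Character = {"id": 1000, "name": "Luke Skywalker", "birth_year": "19BBY"}

# At runtime the value is an ordinary dict: no wrapper class, no validation.
is_plain_dict = type(luke) is dict

# Existing code that expects a dictionary, such as json.dumps, keeps working.
serialized = json.dumps(luke)
```

The annotations exist only for the typechecker, which is why legacy code that passes these dictionaries around does not need to change.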

Error handling

The view function expects us to return a character. What do we do when we want to return an error to the client?

We can raise an exception, which the framework will catch and translate to an HTTP error response.

@api_view("GET")
def get_character(id: str, calendar: Calendar) -> Character:
    try:
        return Store.get_character(id, calendar)
    except CharacterNotFound:
        raise Http404Exception()

This example also passes the HTTP method to the decorator, which specifies the allowed HTTP methods for this API.
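One way a decorator might catch exceptions and check the allowed methods is sketched below, without Django. The exception classes (ApiError, NotFound, MethodNotAllowed) and the (status, body) tuple are hypothetical names invented for this example, not the framework's actual API.

```python
import functools
import json
from dataclasses import asdict, dataclass


class ApiError(Exception):
    """Hypothetical base class: an exception that carries an HTTP status code."""
    status = 500


class NotFound(ApiError):
    status = 404


class MethodNotAllowed(ApiError):
    status = 405


@dataclass(frozen=True)
class Character:
    id: int
    name: str


def api_view(*allowed_methods):
    def decorator(view):
        @functools.wraps(view)
        def wrapped(method, **params):
            try:
                if method not in allowed_methods:
                    raise MethodNotAllowed()
                return 200, json.dumps(asdict(view(**params)))
            except ApiError as error:
                # Translate the exception into an error response.
                return error.status, json.dumps({"error": type(error).__name__})
        return wrapped
    return decorator


@api_view("GET")
def get_character(id: int) -> Character:
    if id != 1000:
        raise NotFound()
    return Character(id=1000, name="Luke Skywalker")


ok = get_character("GET", id=1000)
missing = get_character("GET", id=1)
wrong_method = get_character("POST", id=1000)
```

Keeping errors as exceptions means the return annotation stays a plain Character, so the success contract is not diluted by an error type.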

Tooling

The HTTP API is strongly typed with an HTTP method, request types, and response types. We can introspect the API and determine that it should take a GET request with a string id in the URL path and a calendar enum in the query string, and it will return a JSON response with a Character.

What can we do with all of this information?

OpenAPI is an API description format with a rich set of tools built on top of it. If we write a bit of code to introspect our endpoints and generate an OpenAPI specification from them, we can take advantage of that ecosystem of tools.

paths:
  /characters/{id}:
    get:
      parameters:
        - in: path
          name: id
          schema:
            type: integer
          required: true
        - in: query
          name: calendar
          schema:
            type: string
            enum: ["BBY"]
      responses:
        '200':
          content:
            application/json:
              schema:
                type: object
                ...
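The parameters section of such a specification could be derived from a view's signature roughly as follows. This is a sketch under assumptions: the PRIMITIVES mapping and the helper names schema_for and parameters_for are invented for illustration, and only primitive types and enums are handled.

```python
import inspect
from enum import Enum


class Calendar(Enum):
    BBY = "BBY"


def get_character(id: int, calendar: Calendar):
    ...


# Assumed mapping from Python annotations to OpenAPI primitive types.
PRIMITIVES = {int: "integer", str: "string", float: "number", bool: "boolean"}


def schema_for(annotation):
    # Enums become string schemas listing the allowed values.
    if inspect.isclass(annotation) and issubclass(annotation, Enum):
        return {"type": "string", "enum": [member.value for member in annotation]}
    return {"type": PRIMITIVES[annotation]}


def parameters_for(view, route):
    parameters = []
    for name, param in inspect.signature(view).parameters.items():
        in_path = f"<{name}>" in route
        entry = {
            "in": "path" if in_path else "query",
            "name": name,
            "schema": schema_for(param.annotation),
        }
        if in_path:
            entry["required"] = True  # OpenAPI path parameters must be required
        parameters.append(entry)
    return parameters


spec = parameters_for(get_character, "characters/<id>/")
```

The resulting list of dictionaries can be dumped to YAML or JSON to produce the parameters section shown above.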

We can generate HTTP API documentation for the get_character API that includes the names, types, and documentation for the request and response. This is the right level of abstraction for client developers who want to make a request to the endpoint; they shouldn't have to read Python code.

[Figure: API documentation]

There are additional tools we could build, such as a "try it out" tool to make requests in the browser, so developers can hit their HTTP APIs without having to write code. We could even generate type-safe clients for end-to-end type safety. With this we can have a strongly-typed API on the server and call it with a strongly-typed API on the client.

We could also build a backwards-compatibility checker. What happens if we release a version of the server that has id, name, and birth_year required fields, and later realize that we don't know the birth year of every character? We want to make birth year optional, but old clients that expect birth year can crash. Although we have an explicit type for the API, that explicit type can change (birth year goes from required to optional). We can keep track of the changes to the API, and warn developers as a part of their workflow if they make changes that can break clients.
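A very small version of such a checker might compare the fields of two versions of a response type. The sketch below is an assumption-laden illustration: CharacterV1, CharacterV2, and breaking_changes are hypothetical names, and it only detects removed fields, changed types, and required fields that became optional.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin


@dataclass(frozen=True)
class CharacterV1:
    id: int
    name: str
    birth_year: str


@dataclass(frozen=True)
class CharacterV2:
    id: int
    name: str
    birth_year: Optional[str]  # we no longer know every birth year


def is_optional(tp) -> bool:
    # Optional[X] is represented as Union[X, None].
    return get_origin(tp) is Union and type(None) in get_args(tp)


def breaking_changes(old, new):
    new_types = {f.name: f.type for f in fields(new)}
    problems = []
    for f in fields(old):
        if f.name not in new_types:
            problems.append(f"{f.name}: removed")
        elif is_optional(new_types[f.name]) and not is_optional(f.type):
            problems.append(f"{f.name}: required -> optional")
        elif new_types[f.name] != f.type:
            problems.append(f"{f.name}: type changed")
    return problems


problems = breaking_changes(CharacterV1, CharacterV2)
```

Run as part of code review or continuous integration, a check like this could surface the warning before the change reaches old clients.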

Success at Instagram

There is a spectrum of application protocols that machines can use to communicate with each other.

On one end of the spectrum, we have RPC frameworks like Thrift or gRPC. They generally define strong types for the request and the response, and generate code on the client and server to make requests. They may not communicate over HTTP or even serialize their data into JSON.

On the other end of the spectrum, we have unstructured Python web frameworks, with no explicit contract for requests or responses. The approach we have taken captures many of the benefits of more structured frameworks, while continuing to communicate via HTTP + JSON with minimal application code changes.

It’s important to note that this is not a new idea. In strongly-typed languages, there are many frameworks that provide an API like the one we’ve described. In Python, there is also prior art with the APIStar framework.

We’ve successfully rolled out types for HTTP APIs at Instagram. We’ve been able to adopt this across our codebase since the framework is easy and safe to adopt for existing views. The value is clear to product engineers: the generated documentation becomes the means by which server and client engineers can communicate.

Many thanks to my team members: Keith Blaha (who co-built the framework), Carl Meyer, Jeremy Fu, Chris Leung, Mark Vismonte, Jimmy Lai, Jennifer Taylor, Benjamin Woodruff, and Lee Bierman.

If you want to learn more about this work or are interested in joining one of our engineering teams, please visit our careers page or follow us on Facebook or Twitter.

