
A Retrospective on Requests

 6 months ago
source link: https://blog.ian.stapletoncordas.co/2024/02/a-retrospective-on-requests

API Design

One thing many people can likely agree on is that the interface of python-requests is (mostly) intuitive: it lends itself to rapid development and prototyping, and it is very nice when working in a REPL. If you're unfamiliar, here are some examples:

import requests

# Basic HTTP GET
url: str = "https://requests.readthedocs.io/en/latest/"
r = requests.get(url)
print(r.text)

# Does the right thing with TLS
url: str = "https://expired.badssl.com/"
try:
   r = requests.get(url)
except requests.exceptions.SSLError:
   print("Unable to verify expired certificate")


# POSTing JSON data easily without having to encode it yourself or specify
# headers
url: str = "https://httpbin.org/post"
r = requests.post(url, json={"some": "data"})

The one problem here is how often people stop at the functions defined at the root of the module. Each of these functions works by instantiating a new requests.Session for the call. The Session object is so often forgotten and ignored, but it can dramatically improve one's experience with python-requests. It provides connection pooling (if you're making repeated calls to the same domain, it will attempt to keep connections alive, pool them, and reuse them, so less time is spent setting up connections), it provides a way to configure many aspects of a request that you would otherwise need to pass as function parameters, and it has many other ergonomic benefits.
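
As a minimal sketch of that configuration benefit (the Accept value is only an illustrative assumption), defaults can be set once on a Session and are then merged into every request made through it. Using prepare_request shows the merge without touching the network:

```python
import requests

# Using the Session as a context manager closes pooled connections on exit.
with requests.Session() as s:
    # Illustrative default: set once instead of passing headers= on every call.
    s.headers.update({"Accept": "application/json"})

    # prepare_request() merges the session defaults into the request
    # without sending anything over the network.
    req = requests.Request("GET", "https://httpbin.org/get")
    prepared = s.prepare_request(req)
    accept_header = prepared.headers["Accept"]
    print(accept_header)

    # Real calls made through `s` (s.get(...), s.post(...)) would share
    # its connection pool and these defaults.
```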

What's Wrong with the API

Well, for one thing, what you see here is far from all of the API surface area. There are many more parameters to the function, some of which cooperate with each other and some of which don't. Above we already had the example of sending JSON data, but before JSON became common many people relied on other data serialization formats. The next simplest looking one is application/x-www-form-urlencoded. An example of how to do that with python-requests is:

import requests

s: requests.Session = requests.Session()
url: str = "https://httpbin.org/post"
r = s.post(url, data={"urlencode": "me"})

After that, because python-requests heavily depends on urllib3, we can easily support multipart/form-data. That can be done in a few ways:

import requests

s: requests.Session = requests.Session()
url: str = "https://httpbin.org/post"
# Let's just send a file
r = s.post(url, files={"fileA": open(filename, "rb")})
# NOTE: Do not use files this way, this is purely for simplicity of example
# code

# Let's send a file and some other form elements
r = s.post(
   url,
   data={"wait": "is this urlencoded too?"},
   files={"fileA": open(filename, "rb")},
)

# Let's send two form fields, a basic file, and then a file with a custom
# name, custom part content-type, and additional custom headers for the
# part
r = s.post(
   url,
   data={"wait": "is this urlencoded too?", "form-encoded": "no"},
   files={
      "fileA": open(filename, "rb"),
      "fileB": (
         "custom-filename.xml",
         open(other_filename, "rb"),
         "application/xml",
         {"X-Custom-Part-Header": "value"},
      ),
   },
)

If you're unfamiliar with multipart/form-data, these requests will look like:

Content-Length: ...
Content-Type: multipart/form-data; boundary=---------------------------9051914041544843365972754266

-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="wait"

is this urlencoded too?
-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="form-encoded"

no
-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="fileA"; filename="a.txt"
Content-Type: text/plain

Content of a.txt.

-----------------------------9051914041544843365972754266
Content-Disposition: form-data; name="fileB"; filename="custom-filename.xml"
Content-Type: application/xml
X-Custom-Part-Header: value

<root><elems><elem>Item</elem></elems></root>

-----------------------------9051914041544843365972754266--

This is primarily the last request, but it contains all of the items because each request builds upon the previous one.

Using data here makes things confusing for a lot of people. Some expect that value to be URL form encoded, but it isn't. Sometimes they expect a file in files to be URL form encoded. And yes, these use-cases are documented, but if people don't know that they want to use multipart/form-data encoding, they might not look at the section that describes that interaction.
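
One way to see the difference for yourself is to prepare the requests without sending them. This is a small sketch that uses an in-memory file (the file name and contents are placeholders) so that it does not depend on anything on disk:

```python
import io
import requests

url = "https://httpbin.org/post"
form = {"wait": "is this urlencoded too?"}

# data on its own is URL form encoded.
form_only = requests.Request("POST", url, data=form).prepare()
print(form_only.headers["Content-Type"])
print(form_only.body)

# The same data next to files becomes one part of a multipart body instead.
with_file = requests.Request(
    "POST",
    url,
    data=form,
    files={"fileA": ("a.txt", io.BytesIO(b"Content of a.txt.\n"), "text/plain")},
).prepare()
multipart_content_type = with_file.headers["Content-Type"]
print(multipart_content_type)
```

The first request is sent as application/x-www-form-urlencoded, while the second carries a multipart/form-data Content-Type with a generated boundary.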

Furthermore, what happens if someone does:

import requests

s: requests.Session = requests.Session()
url: str = "https://httpbin.org/post"
# This is broken, if you see this searching for how to handle things, don't
# copy this but read below
r = s.post(url, json={"a": "b"}, files={"fileA": open(filename, "rb")})

I expect no one could guess what the interaction is unless they've already encountered it. The actual behaviour here is that the json argument is completely ignored (well, not completely: we serialize the data and then throw it away).
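
A quick way to confirm this locally, sketched here with a prepared request and an in-memory file (the marker string is an arbitrary value chosen so it is easy to search for):

```python
import io
import requests

prepared = requests.Request(
    "POST",
    "https://httpbin.org/post",
    json={"marker": "json-payload"},
    files={"fileA": ("a.txt", io.BytesIO(b"Content of a.txt.\n"))},
).prepare()

# The body is the multipart encoding of the file; the JSON is gone.
content_type = prepared.headers["Content-Type"]
json_was_sent = b"json-payload" in prepared.body
print(content_type)
print(json_was_sent)
```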

This leads to one other way in which users can easily frustrate themselves:

import requests

s: requests.Session = requests.Session()
url: str = "https://httpbin.org/post"
# This is broken, if you see this searching for how to handle things, don't
# copy this but read below
r = s.post(url, json={"a": "b"}, headers={"Content-Type": "application/xml"})
r = s.post(url, files={"a": open(filename, "rb")}, headers={"Content-Type": "multipart/form-data"})
r = s.post(url, data={"form": "encoded"}, headers={"Content-Type": "multipart/form-data"})
r = s.post(url, data={"form": "encoded"}, headers={"Content-Length": "1000000"})
# This may not do what some people expect. Many people tend to expect this
# to **replace** c=d in the query string, but instead it appends c=e so
# the query string becomes a=b&c=d&c=e. In reality, there's no intuitive
# behaviour here for everyone
r = s.get("https://httpbin.org/get?a=b&c=d", params={"c": "e"})

If you specify your own header above but are relying on Requests to serialize native Python objects for you, you risk using the wrong Content-Type which will cause at best 400 Bad Request responses and at worst, very wrong behaviour in a server. If you override the Content-Length header, that can cause many other issues including intermediaries either terminating your request before it reaches the destination, or them rewriting the header (depending on the type of intermediary). Either way, it's not the behaviour indicated by the code.
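
The Content-Type and query string cases can be inspected without sending anything. This sketch only prepares the requests and looks at the result:

```python
import json
import requests

# The user-supplied Content-Type is kept even though the body is JSON.
mismatched = requests.Request(
    "POST",
    "https://httpbin.org/post",
    json={"a": "b"},
    headers={"Content-Type": "application/xml"},
).prepare()
print(mismatched.headers["Content-Type"])
print(json.loads(mismatched.body))

# params are appended to an existing query string, not merged into it.
appended = requests.Request(
    "GET",
    "https://httpbin.org/get?a=b&c=d",
    params={"c": "e"},
).prepare()
print(appended.url)
```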

How Could It Be Better?

This is a great API for a proof of concept and it makes things simple. That said, it hides a lot of what it does and in some ways that causes issues for users.

There are many ways this could be improved for users. There's no one right answer here. Here are some ideas though that can help with some of the problems above:

  1. Keep the semi-functional API but change how the parameters work:

    1. Get rid of json and files, consolidate everything into data but provide classes like JSONData or MultipartFormData or UrlEncodedFormData which can take the basic Python structures. These classes would all implement a protocol (or interface/abstract base class) allowing users that care about customizing aspects of this to do so. It's then very explicit and obvious what the behaviour is.
    1. Make the class above responsible for validating that conflicting headers have not been specified and raising an exception otherwise. (Ideally this logic is baked into the class users would inherit from, so they need only specify an attribute with the headers they care about (with reasonable defaults) and the base class would do the rest.)
    1. Provide something a bit nicer than the class names suggested above that is still clear enough
  1. Choose something a bit more familiar to other languages:

    1. Many other languages have a Request object that can be built up and operated on. Many have excellent APIs around the way headers can be manipulated.
    1. Provide a builder object for the Request object to easily chain methods together (this looks like the builders in pyca/cryptography or more generically the builder pattern)

I think the first would be the least disruptive and most likely to be accepted by users, but to make it work well, I'd be pretty firm about data not accepting anything that doesn't implement the protocol.
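
To make the first option concrete, here is a rough sketch of what it could look like. Nothing here exists in python-requests: BodyData, ConflictingHeaderError, build_headers, and prepare_post are hypothetical names, and only the JSONData and UrlEncodedFormData class names come from the list above.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, Mapping, Optional, Tuple
from urllib.parse import urlencode


class ConflictingHeaderError(ValueError):
    """Hypothetical error: a user header contradicts the body type."""


class BodyData(ABC):
    """Hypothetical base class; subclasses set content_type and encode()."""

    content_type: str

    @abstractmethod
    def encode(self) -> bytes:
        ...

    def build_headers(self, user_headers: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
        merged: Dict[str, str] = {}
        for name, value in (user_headers or {}).items():
            if name.lower() == "content-type":
                if value != self.content_type:
                    raise ConflictingHeaderError(
                        f"{value!r} conflicts with {self.content_type!r}"
                    )
                continue  # same value; re-added below with canonical casing
            merged[name] = value
        merged["Content-Type"] = self.content_type
        return merged


class JSONData(BodyData):
    content_type = "application/json"

    def __init__(self, payload: Any) -> None:
        self.payload = payload

    def encode(self) -> bytes:
        return json.dumps(self.payload).encode("utf-8")


class UrlEncodedFormData(BodyData):
    content_type = "application/x-www-form-urlencoded"

    def __init__(self, fields: Mapping[str, str]) -> None:
        self.fields = fields

    def encode(self) -> bytes:
        return urlencode(self.fields).encode("ascii")


def prepare_post(
    url: str, data: BodyData, headers: Optional[Mapping[str, str]] = None
) -> Tuple[str, Dict[str, str], bytes]:
    """Stand-in for a post() that only accepts BodyData; sends nothing."""
    if not isinstance(data, BodyData):
        raise TypeError("data must be a BodyData instance, e.g. JSONData(...)")
    return url, data.build_headers(headers), data.encode()


_, headers, body = prepare_post("https://httpbin.org/post", JSONData({"a": "b"}))
print(headers["Content-Type"], body)
```

With this shape, passing a plain dict as data raises TypeError, and pairing JSONData with a Content-Type of application/xml raises ConflictingHeaderError instead of silently sending a mislabelled body.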

The latter feels better long term though because I don't know many developers that only work in Python and never in another language such that they haven't seen the benefits of that pattern.
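
For comparison, the second option might look something like the sketch below. RequestBuilder and this Request class are hypothetical and only illustrate the builder pattern. Returning a new builder from each method and refusing to set a value twice are design choices loosely modelled on how the pyca/cryptography builders behave.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical immutable request produced by the builder."""

    method: str
    url: str
    headers: Tuple[Tuple[str, str], ...]
    body: Optional[bytes]


class RequestBuilder:
    """Hypothetical builder; every method returns a new builder."""

    def __init__(self, method="GET", url=None, headers=(), body=None):
        self._method = method
        self._url = url
        self._headers = headers
        self._body = body

    def method(self, method: str) -> "RequestBuilder":
        return RequestBuilder(method.upper(), self._url, self._headers, self._body)

    def url(self, url: str) -> "RequestBuilder":
        return RequestBuilder(self._method, url, self._headers, self._body)

    def header(self, name: str, value: str) -> "RequestBuilder":
        # Refuse to silently replace a header that was already set.
        if any(existing.lower() == name.lower() for existing, _ in self._headers):
            raise ValueError(f"header {name!r} was already set")
        headers = self._headers + ((name, value),)
        return RequestBuilder(self._method, self._url, headers, self._body)

    def json(self, payload: Any) -> "RequestBuilder":
        if self._body is not None:
            raise ValueError("the body was already set")
        body = json.dumps(payload).encode("utf-8")
        builder = RequestBuilder(self._method, self._url, self._headers, body)
        return builder.header("Content-Type", "application/json")

    def build(self) -> Request:
        if self._url is None:
            raise ValueError("a URL is required")
        return Request(self._method, self._url, self._headers, self._body)


builder = (
    RequestBuilder()
    .method("post")
    .url("https://httpbin.org/post")
    .header("Accept", "application/json")
    .json({"a": "b"})
)
request = builder.build()
print(request.method, request.url, dict(request.headers))
```

Because the builder owns the Content-Type once json() has been called, a later header("Content-Type", "application/xml") fails loudly rather than producing a request whose headers disagree with its body.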

