
Asyncio, twisted, tornado, gevent walk into a bar...

 10 months ago
source link: https://www.bitecode.dev/p/asyncio-twisted-tornado-gevent-walk

Summary

Concurrency has a lot to do with sharing one resource, and Python has dedicated tools to deal with that depending on the resource you must share.

If you have to share one CPU while waiting on the network, then the specialized tools for this are asyncio, twisted, trio, gevent, etc.

Asyncio is the current standard to do this, but tornado, gevent and twisted solved this problem more than a decade ago, while trio and curio are showing us what the future could look like.

But chances are, you should use none of them.

Different forms of concurrency

As Rob Pike said, concurrency "is about dealing with a lot of things at once", unlike parallelism, which is "doing lots of things at once".

The typical analogy is this:

  • concurrency is having two lines of customers ordering from one cashier;

  • parallelism is having two lines of customers ordering from two cashiers.

Which means, if you think about it, that concurrency has a lot to do with sharing one resource.

The question is, which resource?

In a computer, you may have to share different things:

  • Battery charge.

  • CPU calculation power.

  • RAM space.

  • Disk space and throughput.

  • Network throughput.

  • File system handles.

  • User input.

  • Screen real estate.

Tons of software is written solely to deal with the fact that we have to share.

asyncio, twisted, tornado and gevent are such tools, specialized to let you share a CPU core more efficiently between several things that access the network.

Now, it seems a bit counterintuitive to tie "CPU core" and "network" performance together.

But when you talk to the network, you send messages to the outside world, and then things get out of your control. The outside world can answer each message very quickly, or not. You can't affect it that much.

Meanwhile, your program sits there, waiting for the answer to the message. And while it sits, what does it do in Python by default? It keeps the CPU core for itself.

Sure, this sitting and waiting often lasts only a few milliseconds, so to a human it seems very quick at first glance. But one millisecond is a huge amount of time for a computer that does billions of things in the blink of an eye.

asyncio, twisted, tornado and gevent have one trick up their sleeve: they can send a message to the network, and while waiting for the response, wake up another part of the program to do some other work. And they can do that with many messages in a row. While waiting for the network, they can let other parts of the program use the CPU core.

Note that they can only speed up waiting on the network. They will not make two calculations at the same time (they can't use several CPU cores like multiprocessing can), and they won't speed up waiting on other types of I/O (like when you use threads to avoid blocking on user input or disk writes).

All in all, they are good for writing things like bots (web crawler, chat bots, network sniffers, etc.) and servers (web servers, proxies, ...). For maximum benefits, it's possible to use them inside other concurrency tools, such as multiprocessing or multithreading. You can perfectly have 4 processes, each of them containing 4 threads (so 16 threads in total), and each thread with their own asyncio loop running.
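That combination can be sketched with the stdlib alone. Below, a hypothetical fake_fetch() coroutine stands in for a network call (asyncio.sleep() simulates the wait), and each thread in a pool runs its own independent event loop:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# fake_fetch() is a hypothetical stand-in for a network call:
# asyncio.sleep() simulates waiting on a response.
async def fake_fetch(i):
    await asyncio.sleep(0.01)
    return i * 2

async def gather_batch(batch):
    # Fetch a whole batch concurrently inside one event loop
    return await asyncio.gather(*(fake_fetch(i) for i in batch))

def run_loop_in_thread(batch):
    # asyncio.run() creates a fresh event loop, so each thread
    # gets its own independent loop
    return asyncio.run(gather_batch(batch))

batches = [[0, 1], [2, 3], [4, 5], [6, 7]]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_loop_in_thread, batches))

print(results)  # [[0, 2], [4, 6], [8, 10], [12, 14]]
```

The same pattern extends to processes: swap ThreadPoolExecutor for ProcessPoolExecutor and each process can host its own threads and loops.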

But let's not get sidetracked, and focus on the question at hand: what are asyncio, tornado or twisted?

A concrete example

Let's take a few URLs from a completely random web site. Our task will be to get all the titles of all those pages.

Here is how to do that synchronously, with only the stdlib:

import re
import time
from urllib.request import urlopen, Request

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]


title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

# We'll pretend to be Firefox, or substack is going to kick us out
user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)

# Let's time how long all this takes
global_start_time = time.time()

for url in urls:
    # let's also time each page processing
    start_time = time.time()

    with urlopen(Request(url, headers={"User-Agent": user_agent})) as response:
        html_content = response.read().decode("utf-8")
        match = title_pattern.search(html_content)
        title = match.group(1) if match else "Unknown"
        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken: {elapsed_time:.4f} seconds\n")

global_end_time = time.time()
global_elapsed_time = global_end_time - global_start_time
print(f"Total time taken: {global_elapsed_time:.4f} seconds")

If we run the script, we can see that, individually, each page load is not that long:

URL: https://www.bitecode.dev/p/relieving-your-python-packaging-pain
Title: Relieving your Python packaging pain - Bite code!
Time taken: 0.6022 seconds

URL: https://www.bitecode.dev/p/hype-cycles
Title: XML is the future - Bite code!
Time taken: 0.1813 seconds

URL: https://www.bitecode.dev/p/why-not-tell-people-to-simply-use
Title: Why not tell people to "simply" use pyenv, poetry or anaconda
Time taken: 0.9496 seconds

URL: https://www.bitecode.dev/p/nobody-ever-paid-me-for-code
Title: Nobody ever paid me for code - Bite code!
Time taken: 0.3314 seconds

URL: https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager
Title: Python cocktail: mix a context manager and an iterator in equal parts
Time taken: 0.2849 seconds

URL: https://www.bitecode.dev/p/the-costly-mistake-so-many-makes
Title: The costly mistake so many make with numpy and pandas
Time taken: 0.3622 seconds

URL: https://www.bitecode.dev/p/the-weirdest-python-keyword
Title: The weirdest Python keyword - Bite code!
Time taken: 0.5032 seconds

Total time taken: 3.2149 seconds

However, the sum of all of them is quite long, because all network accesses are sequential, and the program waits for each one to finish before starting the next.

Now consider the equivalent with asyncio (this requires installing httpx, since asyncio doesn't come with an HTTP client):

import asyncio
import re
import time

import httpx

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]

title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)

# fetch_url() is the concurrency unit for this program. We can start many
# of them and each will wait for the response while letting other tasks run.
async def fetch_url(url):
    start_time = time.time()

    async with httpx.AsyncClient() as client:
        response = await client.get(url, headers={"User-Agent": user_agent})
        match = title_pattern.search(response.text)
        title = match.group(1) if match else "Unknown"
        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken for {url}: {elapsed_time:.4f} seconds\n")


async def main():
    global_start_time = time.time()

    await asyncio.gather(*map(fetch_url, urls))

    global_end_time = time.time()
    global_elapsed_time = global_end_time - global_start_time
    print(f"Total time taken for all URLs: {global_elapsed_time:.4f} seconds")


asyncio.run(main())

While the code is more complex, the performance is much better:

# URL: https://www.bitecode.dev/p/hype-cycles
# Title: XML is the future - Bite code!
# Time taken for https://www.bitecode.dev/p/hype-cycles: 0.1750 seconds

# URL: https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager
# Title: Python cocktail: mix a context manager and an iterator in equal parts
# Time taken for https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager: 0.1656 seconds

# URL: https://www.bitecode.dev/p/the-weirdest-python-keyword
# Title: The weirdest Python keyword - Bite code!
# Time taken for https://www.bitecode.dev/p/the-weirdest-python-keyword: 0.1636 seconds

# URL: https://www.bitecode.dev/p/the-costly-mistake-so-many-makes
# Title: The costly mistake so many make with numpy and pandas
# Time taken for https://www.bitecode.dev/p/the-costly-mistake-so-many-makes: 0.1803 seconds

# URL: https://www.bitecode.dev/p/nobody-ever-paid-me-for-code
# Title: Nobody ever paid me for code - Bite code!
# Time taken for https://www.bitecode.dev/p/nobody-ever-paid-me-for-code: 0.2661 seconds

# URL: https://www.bitecode.dev/p/why-not-tell-people-to-simply-use
# Title: Why not tell people to "simply" use pyenv, poetry or anaconda
# Time taken for https://www.bitecode.dev/p/why-not-tell-people-to-simply-use: 0.2938 seconds

# URL: https://www.bitecode.dev/p/relieving-your-python-packaging-pain
# Title: Relieving your Python packaging pain - Bite code!
# Time taken for https://www.bitecode.dev/p/relieving-your-python-packaging-pain: 0.5334 seconds

# Total time taken for all URLs: 0.5335 seconds

In the end, we are 6 times faster, and with more URLs, the advantage would grow.

That's because, while the code of all those tasks cannot run at the same time, asyncio at least makes sure that when one task waits on the network, it switches to let another task run.

The await and the async with in fetch_url() tell asyncio that, at those lines, we are going to ask something from the network, and so it can switch to another fetch_url() task.
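This switching is easy to observe with a minimal, stdlib-only illustration: two tasks record events, and the await asyncio.sleep(0) is a point where asyncio parks one task and runs the other:

```python
import asyncio

events = []

async def task(name):
    events.append(f"{name} start")
    # This await is a switching point: asyncio parks this task
    # here and lets the other one run
    await asyncio.sleep(0)
    events.append(f"{name} end")

async def main():
    await asyncio.gather(task("A"), task("B"))

asyncio.run(main())
print(events)  # ['A start', 'B start', 'A end', 'B end']
```

The interleaved order shows that neither task ran to completion before the other started.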

So what's the deal with asyncio, twisted, gevent, trio and all that stuff?

All those libraries solve the same problem, but there are so many of them that it can be confusing.

Let's dive in.

asyncio

asyncio is the modern module for asynchronous network programming provided with the Python stdlib since 3.4. In other words, it's the default tool at your disposal if you want to code something that doesn't block while waiting on the network.

asyncio replaces the old deprecated asyncore module. It is quite low level, so while you can manually code most network-related things with it, you are still at the level of TCP or UDP. If you want higher-level protocols, like FTP, HTTP or SSH, you have to either code them yourself or install a third-party library.
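To illustrate what "the level of TCP" means, here is a minimal sketch using asyncio's stream API: a toy echo server and client talking over localhost, with no third-party protocol library involved:

```python
import asyncio

async def handle(reader, writer):
    # Server side: read one line, echo it back uppercased, close
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # Port 0 asks the OS for any free port
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Client side: connect, send a line, read the reply
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()

    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)  # b'HELLO\n'
```

Everything above an HTTP request, like headers, chunking or TLS negotiation, is your problem at this level, which is why third-party clients like httpx exist.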

Because asyncio is the default solution, it has the biggest ecosystem of third-party libs, and pretty much everything async strives to be compatible with it, either directly or through compatibility layers like anyio.

Twisted

20 years ago, there was no asyncio, there was no async/await, nodejs didn't exist and Python 3 was half a decade away. But it was the .com bubble, and everything needed to be connected now. And so twisted was born, the grandfather of all the asynchronous frameworks we have today. Twisted's ecosystem grew to include everything, from mail to ssh.

To this day, twisted is still a robust and versatile tool. But you do pay the price of its age. It doesn't follow PEP8 very well, and the design leans on the heavy side.

Here is a typical asyncio HTTP request:

import httpx
import asyncio

async def fetch_url(url):
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        print("Response received")

asyncio.run(fetch_url("https://www.bitecode.dev/p/relieving-your-python-packaging-pain"))

And here is the code you will find in the twisted docs:

from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

agent = Agent(reactor)

d = agent.request(
    b"GET",
    b"https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    Headers({b"User-Agent": [b"Twisted Web Client Example"]}),
    None,
)

def cbResponse(ignored):
    print("Response received")

d.addCallback(cbResponse)

def cbShutdown(ignored):
    reactor.stop()

d.addBoth(cbShutdown)

reactor.run()

To be fair, the code can be reduced to:

from twisted.internet import reactor
from twisted.web.client import Agent

agent = Agent(reactor)

d = agent.request(b"GET", b"https://www.bitecode.dev/p/relieving-your-python-packaging-pain")
d.addCallback(lambda ignored: print("Response received"))
d.addBoth(lambda ignored: reactor.stop())

reactor.run()

But in the end, you still get a lot of drawbacks:

  • The doc, as you noticed, doesn't have your back.

  • There are too many things to know. Want SSL support? You need to install twisted[tls], not twisted. You have to use bytes, not strings. You have to manually tear down the reactor.

  • Twisted does support async/await, but I'll leave you to figure out how to convert this snippet to use it. You'll see.

  • ChatGPT will not help you much. Most twisted code is old, and uses the old APIs and code style. You'll get... twisted results.

  • Google will find many examples still using yield and other contraptions.

  • The deferred system is like "futures" or "promises" in other tools. Until it's not.

  • High-level libs are hard to find.

For the last point, you may think it's not important, but it's easy to find httpx or aiohttp for asyncio. However, unless you know the ecosystem, good luck finding out about the existence of treq, the de facto high-level HTTP lib. It turns the code into the much nicer:

import treq
from twisted.internet import reactor

def done(response):
    print("Response received")
    reactor.stop()

deferred = treq.get("https://www.bitecode.dev/p/relieving-your-python-packaging-pain")
deferred.addCallback(done)

reactor.run()

I do appreciate that asynchronous operations are automatically scheduled, though. Having to call asyncio.create_task() is one of the things I hate in Python, and having more natural, JS-like calls is much simpler.

In short, the product is good, but the developer experience is not. And I say that as a co-author of Expert Twisted.

When I was younger, I got an interview to work for Jamendo. The boss said they were interested in my profile solely because I had heard of Twisted. Not because I knew how to use it. Heard. It was so hard to use that few people did.

Also, I got rejected because I wanted to work from home. Post COVID it seems funny, doesn't it?

Tornado

Tornado was developed after Twisted, by FriendFeed, during that weird 2005-2015 web dev period when everything needed to be social and web scale. It was like Twisted, but touted to be faster, and was higher level. Out of the box, the HTTP story is way nicer.

Today, you are unlikely to use Tornado unless you work at Facebook or contribute to jupyter. After all, if you want to make async web things, the default tool is FastAPI in 2023.

gevent

Gevent is a weird one for me. It came about in 2009, the same year as Tornado, but with a fundamentally different design. Instead of attempting to provide an asynchronous API, it decided to do black magic. When you use gevent, you call from gevent import monkey; monkey.patch_all() and it changes the underlying mechanism of Python networking, making everything non-blocking.

I used to fear gevent at the time:

  • You had to compile it from source, or use eggs, and that had too many ways to fail.

  • Monkey patching is brittle by nature, so you rolled the dice every time you used it.

  • Task switching is implicit, and you never knew what dragon would await.

So I avoided it like the plague.

Ironically, today, gevent is becoming quite appealing.

Thanks to wheels, installing it is simple and robust. Monkey patching has had more than a decade to be polished, so it's now quite reliable. And implicit task switching is now just a trade-off against async/await's colored functions.

Because of the way gevent works, you can take a blocking script, and with very few modifications, make it async. Let's take the original stdlib one, and convert it to gevent:

import re
import time

import gevent
from gevent import monkey

# We magically patch everything.
# THIS MUST BE DONE BEFORE IMPORTING URLLIB
monkey.patch_all()

from urllib.request import Request, urlopen

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]

title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)


# We move the fetching into a function so we can isolate it into a green thread
def fetch_url(url):
    start_time = time.time()

    headers = {"User-Agent": user_agent}
    with urlopen(Request(url, headers=headers)) as response:
        html_content = response.read().decode("utf-8")
        match = title_pattern.search(html_content)
        title = match.group(1) if match else "Unknown"
        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken: {elapsed_time:.4f} seconds\n")


def main():
    global_start_time = time.time()

    # Here is where we convert synchronous calls into concurrent ones
    greenlets = [gevent.spawn(fetch_url, url) for url in urls]
    gevent.joinall(greenlets)

    global_end_time = time.time()
    global_elapsed_time = global_end_time - global_start_time
    print(f"Total time taken: {global_elapsed_time:.4f} seconds")


main()

No async, no await. No special lib except for gevent. In fact it would work with the requests lib just as well. Very few modifications are needed, for a net perf gain:

URL: https://www.bitecode.dev/p/the-weirdest-python-keyword
Title: The weirdest Python keyword - Bite code!
Time taken: 0.1896 seconds

URL: https://www.bitecode.dev/p/relieving-your-python-packaging-pain
Title: Relieving your Python packaging pain - Bite code!
Time taken: 0.2071 seconds

URL: https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager
Title: Python cocktail: mix a context manager and an iterator in equal parts
Time taken: 0.1955 seconds

URL: https://www.bitecode.dev/p/why-not-tell-people-to-simply-use
Title: Why not tell people to "simply" use pyenv, poetry or anaconda
Time taken: 0.2764 seconds

URL: https://www.bitecode.dev/p/nobody-ever-paid-me-for-code
Title: Nobody ever paid me for code - Bite code!
Time taken: 0.3167 seconds

URL: https://www.bitecode.dev/p/the-costly-mistake-so-many-makes
Title: The costly mistake so many make with numpy and pandas
Time taken: 0.4341 seconds

URL: https://www.bitecode.dev/p/hype-cycles
Title: XML is the future - Bite code!
Time taken: 0.4432 seconds

The only danger is if you call gevent.monkey.patch_all() too late. You get a cryptic error that crashes your program.

So I'm much more likely to use gevent in 2023 than in 2009, as it now has very good value, especially for utility scripts. It's the HTMX of async libs: it's simple, does a lot for a low cost, and you can use your old toolbox.

trio

For many years, the very talented dev and speaker David Beazley had been showing unease with asyncio's design, making more and more experiments and public talks about what an alternative could look like. It culminated with the excellent Die Threads presentation, live coding the sum of all those ideas, which would eventually become the curio library. Watch it. It's so good.

Meanwhile, Nathaniel J. Smith published “Notes on structured concurrency, or: Go statement considered harmful”, an article that made ripples in the async-loving community. The article is quite complex, but the core idea is simple: spawning a coroutine (or goroutine, or green thread) is like calling goto. For the youngsters among the readers, that's a reference to a famous Edsger Dijkstra letter.

In short, it states that every time you start an async task, just like with goto, you jump somewhere else, in some other part of the code. Which means it's very hard to know where a coroutine comes from, when it started, and where it's going, or when it's going to stop. Scope and life spans are suddenly opaque, which makes reasoning about the whole software difficult.

And according to him, this is a problem with the design of our asyncio API, not with the nature of async itself.

Nathaniel didn't just come up with a problem, he also brought a solution: a new kind of design for async handling, inspired by Beazley's concepts, with a few twists.

This solution grew into a library, trio.

Trio is not compatible with asyncio, gevent or twisted by default. This means it's also its own little async island.

But in exchange for that, it provides a very different internal take on how to deal with this kind of concurrency, where every coroutine is tied to an explicit scope, everything can be awaited easily, or canceled.

The code isn't that different from your typical asyncio script:

import re
import time

import httpx
import trio

urls = [
    "https://www.bitecode.dev/p/relieving-your-python-packaging-pain",
    "https://www.bitecode.dev/p/hype-cycles",
    "https://www.bitecode.dev/p/why-not-tell-people-to-simply-use",
    "https://www.bitecode.dev/p/nobody-ever-paid-me-for-code",
    "https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager",
    "https://www.bitecode.dev/p/the-costly-mistake-so-many-makes",
    "https://www.bitecode.dev/p/the-weirdest-python-keyword",
]

title_pattern = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE)

user_agent = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"
)


async def fetch_url(url):
    start_time = time.time()

    async with httpx.AsyncClient() as client:
        headers = {"User-Agent": user_agent}
        response = await client.get(url, headers=headers)
        match = title_pattern.search(response.text)
        title = match.group(1) if match else "Unknown"
        print(f"URL: {url}\nTitle: {title}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken for {url}: {elapsed_time:.4f} seconds\n")


async def main():
    global_start_time = time.time()

    # That's the biggest API difference
    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(fetch_url, url)

    global_end_time = time.time()
    global_elapsed_time = global_end_time - global_start_time
    print(f"Total time taken for all URLs: {global_elapsed_time:.4f} seconds")


if __name__ == "__main__":
    trio.run(main)

Thanks to anyio, it can even use httpx, just like asyncio. So what's the big deal?

Well, for once, it's a tad faster:

URL: https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager
Title: Python cocktail: mix a context manager and an iterator in equal parts
Time taken for https://www.bitecode.dev/p/python-cocktail-mix-a-context-manager: 0.1029 seconds

URL: https://www.bitecode.dev/p/hype-cycles
Title: XML is the future - Bite code!
Time taken for https://www.bitecode.dev/p/hype-cycles: 0.1203 seconds

URL: https://www.bitecode.dev/p/nobody-ever-paid-me-for-code
Title: Nobody ever paid me for code - Bite code!
Time taken for https://www.bitecode.dev/p/nobody-ever-paid-me-for-code: 0.1137 seconds

URL: https://www.bitecode.dev/p/the-costly-mistake-so-many-makes
Title: The costly mistake so many make with numpy and pandas
URL: https://www.bitecode.dev/p/relieving-your-python-packaging-pain
Title: Relieving your Python packaging pain - Bite code!
Time taken for https://www.bitecode.dev/p/the-costly-mistake-so-many-makes: 0.1074 seconds

Time taken for https://www.bitecode.dev/p/relieving-your-python-packaging-pain: 0.1286 seconds

URL: https://www.bitecode.dev/p/why-not-tell-people-to-simply-use
Title: Why not tell people to "simply" use pyenv, poetry or anaconda
Time taken for https://www.bitecode.dev/p/why-not-tell-people-to-simply-use: 0.1220 seconds

URL: https://www.bitecode.dev/p/the-weirdest-python-keyword
Title: The weirdest Python keyword - Bite code!
Time taken for https://www.bitecode.dev/p/the-weirdest-python-keyword: 0.1883 seconds

Total time taken for all URLs: 0.2133 seconds

Also, because it neither creates nor schedules coroutines immediately (notice that nursery.start_soon(fetch_url, url) is not nursery.start_soon(fetch_url(url))), it will also consume less memory. But the most important part is the nursery:

    # That's the biggest API difference
    async with trio.open_nursery() as nursery:
        for url in urls:
            nursery.start_soon(fetch_url, url)

The with block scopes all the tasks, meaning everything that is started inside that context manager is guaranteed to be finished (or terminated) when it exits. First, the API is better than expecting the user to wait manually like with asyncio.gather: in trio, you cannot start concurrent coroutines without a clear scope, so it doesn't rely on the coder's discipline. But under the hood, the design is also different. A whole group of coroutines can be canceled easily, because trio always knows where things begin and end. As soon as things get complicated, code with a curio-like design becomes radically simpler than code with an asyncio-like design.

Would I recommend using trio in prod for now? No.

The asyncio ecosystem and compat advantage is too good to pass on.

But it is inspiring changes in asyncio's design itself in very positive ways. E.g., in 3.11, we got asyncio.TaskGroup, a concept similar to nurseries, and exception groups were added to Python. This can't change the underlying asyncio design, but the API is way nicer.

FastAPI

FastAPI is not in the same category as the other libs. It's not supposed to be part of this article, because it's a completely different beast, but people asked questions about it and seemed confused, so I decided to add a little section to clarify.

FastAPI is a high-level web framework like flask, except it happens to be async, unlike flask. With the added benefit of using type hints and pydantic to generate schemas.

It's not a building block like twisted, gevent, trio or asyncio. In fact, it's built on top of asyncio. It's in the same group as flask, bottle, django, pyramid, etc., although as a micro-framework it's focused on routing, data validation and API delivery.

I use FastAPI when I want to make a quick little web API. It basically replaced flask for everything I used it for, unless my clients or co-workers ask for it, of course.

But don't get me wrong, Django is my web framework of choice, as I very rarely need a fully async web framework, and with django-ninja, it's very easy to build a Web API almost like with FastAPI.

Also, and this will be the topic of another controversial article, I strongly believe beginners should start their first serious project with django and not flask, despite the fact most people see it the other way around. Flask is fine for learning, or for serious projects if you know what you are doing. In the middle lie troubles.

What to use?

So which async lib to use?

Well, probably none of them.

Those tools serve a very niche purpose, and most people don't encounter it very often.

In fact, I would dare to say that the vast majority of developers are not working on problems where network performance is an issue that hasn't been solved in a better way. At least not at their scale.

If you are doing a web site, blocking frameworks like Django or Flask are fine, and you'll need a task queue no matter your stack. Small to medium companies rarely build services that would need more than that. Even a lot of big companies probably don’t need more.

If you are doing calculations, this is CPU related, and they will not help you.

If you need to get a few URLs fast, a ThreadPoolExecutor is likely the Pareto solution 99% of the time.
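To sketch that Pareto solution, here is the thread pool pattern; fetch_title() is a hypothetical stand-in for a real urlopen() call, with time.sleep() simulating network latency so the example runs offline:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# fetch_title() is a hypothetical stand-in for real fetching code:
# time.sleep() simulates waiting on the network.
def fetch_title(url):
    time.sleep(0.05)
    return f"title of {url}"

urls = [f"https://example.com/page-{i}" for i in range(5)]

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    # map() runs the calls concurrently but keeps results
    # in the same order as the input URLs
    titles = list(pool.map(fetch_title, urls))
elapsed = time.time() - start

print(titles[0])  # title of https://example.com/page-0
print(f"{elapsed:.2f}s")  # roughly 0.05s, not 0.25s: the waits overlap
```

Swap fetch_title() for a function doing a real request and the structure stays the same: no async, no await, no event loop.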

If you need an industrial crawler, Scrapy is there for you.

You have to understand that async programming is hard, and no matter how good the tooling is, it's going to make your code more difficult to manage. It has a high price.

Ok, but let's say you are sure you need some async lib, then which one?

If you are asking this question, then just go with asyncio. It's standard, so you'll get more doc, more 3rd party components, more resources in general.

I'll add that you probably should not go with asyncio manually. Use a higher-level asyncio-based lib, or better, a framework. Async is hard enough as it is.

Some of you may think: “but wait, for this particular task, I do need async, and I did my research, and asyncio is not the proper tool”.

But then, you don’t need the help of this article to decide, you already have the skills to make an educated choice.

Get a Future object on the next article!

