My Favorite Bits of OSDI/ATC'23
source link: https://brooker.co.za/blog/2023/07/13/osdi.html
Talking to 3D people is cool again.
This week brought USENIX ATC'23 and OSDI'23 together in Boston. While I've followed OSDI and ATC papers for years, it's the first time I've been to either of them (I've been to NSDI a couple of times). It was a really good time. In this post I'll cover a couple of my favorite papers[1], and trends I noticed.
Overall, it was great to meet a bunch of folks in person who I've only interacted with online, and nice to be back to in-person conferences.
Thoughts and Trends
When we presented the Firecracker paper at NSDI'20, several people told me they were worried that we had chosen Rust, because it raised the risk that Firecracker wouldn't be useful once Rust was no longer in vogue. This year at OSDI, pretty much everybody I talked to was building in Rust. Obvious exceptions are folks doing AI/ML work (Python still seems big there), and folks looking to get into the mainline Linux kernel. I couldn't be happier to see memory safety start to become the default practice in systems.
Loads of folks were talking about emergent system properties like metastability. Unfortunately, not a lot of folks seem to be writing papers about it, or getting grants to work on it. I did talk to a couple folks with upcoming papers, and I really hope the hallway interest turns into more publications. Metastable failures in distributed systems and Metastable Failures in the Wild are some of the most important systems work of the last few years, in my opinion. There's a lot more to do here.
I got a rough feeling that more papers were paying more attention to security issues than in years past. Subtle issues like timing side-channels especially. Another trend I like to see. Security and systems have always been linked, so this isn't new, but there does seem to be a reduction in completely security-naive work.
Some of the Papers I Enjoyed The Most
- Take Out the Trache by Audrey Cheng et al.[2] This paper makes an astute observation: caches help with latency the most when everything a transaction needs is cached, so traditional cache eviction strategies don't make the right decisions. They then present new metrics, and a nice design for improving things. Worth reading if you're building any kind of database or distributed cache.
- VectorVisor by Samuel Ginzburg et al. What if we compiled normal applications to WASM, then ran them on GPUs? And it actually worked? This is the kind of academic systems work I love the most: bold, innovative, and solving a problem that doesn't really exist yet but definitely could in the future.
- EPF: Evil Packet Filter by Di Jin et al. Operating system kernels like Linux use various internal mechanisms that make it harder to go from kernel bug to working exploit. This paper looks at how useful the current BPF implementation can be for thwarting these mechanisms.
- Triangulating Python Performance Issues with SCALENE by Emery Berger et al. A selection of cool approaches for profiling CPU, GPU, and memory in Python programs. Emery finished his talk with a tantalizing demo: generating performance patches automatically by combining LLMs with profiler results.
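To make the observation behind Take Out the Trache concrete, here is a minimal sketch. It is not the paper's metric definitions or its eviction algorithm. It only compares two hypothetical cache contents of the same size on a made-up workload, assuming a transaction issues its reads in parallel and so finishes only when its slowest read does. The key names and the 1 ms / 10 ms latencies are illustrative assumptions.

```python
# Toy workload: four transactions, each reading two keys (names are made up).
TRANSACTIONS = [("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("d1", "d2")]

HIT_MS = 1.0    # assumed latency of a cache hit
MISS_MS = 10.0  # assumed latency of a miss that goes to the database


def object_hit_rate(cached, transactions):
    """Fraction of individual reads served from the cache."""
    reads = [key for txn in transactions for key in txn]
    return sum(key in cached for key in reads) / len(reads)


def transaction_hit_rate(cached, transactions):
    """Fraction of transactions whose reads are ALL served from the cache."""
    return sum(all(key in cached for key in txn) for txn in transactions) / len(transactions)


def mean_txn_latency(cached, transactions):
    """Mean latency when reads run in parallel: the slowest read dominates."""
    latencies = [
        max(HIT_MS if key in cached else MISS_MS for key in txn)
        for txn in transactions
    ]
    return sum(latencies) / len(latencies)


# Two caches holding four keys each.
spread = {"a1", "b1", "c1", "d1"}    # one key from every transaction
grouped = {"a1", "a2", "b1", "b2"}   # every key from two transactions

for name, cached in (("spread", spread), ("grouped", grouped)):
    print(
        name,
        object_hit_rate(cached, TRANSACTIONS),
        transaction_hit_rate(cached, TRANSACTIONS),
        mean_txn_latency(cached, TRANSACTIONS),
    )
```

Both caches serve half of all reads, but only the grouped one ever completes a transaction without a database round trip, so only the grouped one improves transaction latency. An eviction policy that looks at per-object hits alone can't tell these two caches apart.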
There are many papers I haven't read yet, but have heard good things about. I want to look at MELF, zpoline, Ensō, and vMVCC in more detail.
Amazon's Papers
We presented two papers at ATC this year:
- On-demand container loading in AWS Lambda by me, Mike Danilov, Chris Greenwood, and Phil Piwonka. I wrote a post about this paper back in May. We won a best paper award for this work!
- Distributed Transactions at Scale in Amazon DynamoDB by a great group of folks from the DynamoDB team. It looks at DynamoDB's serializable atomic transaction scheme based on Timestamp Ordering (TO) and 2PC. This paper is a perfect antidote to the widespread idea that transactions can't or don't scale. Combined with the team's ATC'22 paper, this is an excellent deep dive into how a massive-scale (105.2 million TPS for one workload) database works under the covers.
Cloning and Snapshot Safety
A number of papers in the program implemented VM or process cloning, typically for accelerating serverless workloads. This thread of work, related to our own work on Lambda SnapStart, is bound to have a lot of influence over how systems are built in the coming decades. But I was disappointed to see most of these papers not paying attention to some of the uniqueness risks of cloning. As we describe in Restoring Uniqueness in MicroVM Snapshots, naively cloning VMs leads to situations where UUIDs, cryptographic keys, or IVs can be duplicated between clones. I'd love to see folks working on cloning insist on solving this problem in their solutions.
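The duplication risk is easy to reproduce in miniature. The sketch below is a toy under stated assumptions, not the mechanism from the paper: the "VM" is just a Python `random.Random` generator, the "snapshot" is its saved state, and `restore` and `new_id` are hypothetical helpers. Python's `random` module is not a cryptographic generator and is used here only because its state can be saved and restored.

```python
import os
import random


def new_id(rng):
    """Generate a 128-bit 'unique' identifier from the given generator."""
    return f"{rng.getrandbits(128):032x}"


def restore(state, reseed):
    """Bring up a clone from a saved generator state.

    With reseed=False the clone continues from exactly the captured state.
    With reseed=True the clone first mixes in fresh entropy from the OS.
    """
    rng = random.Random()
    rng.setstate(state)
    if reseed:
        rng.seed(os.urandom(32))
    return rng


# "Boot" a guest, let it run briefly, then snapshot its generator state.
guest = random.Random(1234)
new_id(guest)
snapshot = guest.getstate()

# Three naive clones all hand out the same "unique" ID.
naive_ids = [new_id(restore(snapshot, reseed=False)) for _ in range(3)]

# Three clones that reseed on restore hand out different IDs.
reseeded_ids = [new_id(restore(snapshot, reseed=True)) for _ in range(3)]

print(len(set(naive_ids)), len(set(reseeded_ids)))
```

In a real system the same state can hide in many layers (kernel, language runtime, libraries, application caches), which is why reseeding one generator, as done here, is not a complete answer.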
Soapbox
Two things came up that I found extremely disappointing. First, there were a lot of folks who should have been there (especially paper authors) who couldn't get visas to come to the US. It's unacceptable and counterproductive to have a visa policy where folks who are doing cutting-edge research in economically critical areas can't trivially travel to the USA.
Second, a group of folks presented the results of the CS Conference Climate & Harassment Survey. I'd recommend reading Dan Ports' post for a summary of the results. In short, 40% of the community have experienced harassment at conferences (not necessarily this conference, or a USENIX conference), and 30% of non-male attendees don't feel welcome. This is unacceptable, and we need to do better[3].
Footnotes
1. These are some of my favorites of the ones I've read, or saw talks for. If you presented a paper and it's not on this list, you can safely assume I haven't had time to check out your excellent work yet.
2. Great DB work from the folks at UC Berkeley? Hard to believe.
3. I unfortunately missed the dedicated session on this topic, and look forward to attending similar sessions at future conferences.