AWS Lambda's Cache Architecture

source link: https://blog.gslin.org/archives/2024/03/28/11719/aws-lambda-%e7%9a%84-cache-%e6%9e%b6%e6%a7%8b/

An older article I saw on Lobsters: "[Cache Architecture for] Container Loading in AWS Lambda". Judging from its URL, the original post, "Container Loading in AWS Lambda", was published in May of last year.

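It is mainly about how to load a container so that it starts running as quickly as possible. It first mentions the layer cache that everyone commonly uses, and notes that AWS Lambda uses a block-level cache instead: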

Most of the existing systems do this at the layer or file level, but we chose to do it at the block level.

Each chunk is 512 KiB:

We unpack a snapshot (deterministically, which turns out to be tricky) into a single flat filesystem, then break that filesystem up into 512KiB chunks.
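To make the chunking step concrete, here is a minimal sketch of splitting a flattened image into 512 KiB chunks. Only the chunk size comes from the quote; the function names and naming each chunk by its SHA-256 digest are my own assumptions for illustration, not something the article describes.

```python
import hashlib

CHUNK_SIZE = 512 * 1024  # 512 KiB, the chunk size quoted above


def split_into_chunks(image: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a flattened filesystem image into fixed-size chunks.

    The last chunk may be shorter than chunk_size.
    """
    return [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]


def chunk_ids(chunks: list[bytes]) -> list[str]:
    """Name each chunk by its SHA-256 digest (an assumption for illustration)."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


# Two identical all-zero chunks followed by a short tail.
image = bytes(CHUNK_SIZE) * 2 + b"tail"
chunks = split_into_chunks(image)
ids = chunk_ids(chunks)
print(len(chunks), len(set(ids)))
```

With content-derived names like these, identical blocks end up with the same identifier, which is one reason a block-level scheme can share more data than a layer-level one.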

Next it brings up a lazy loading approach, citing "Slacker: Fast Distribution with Lazy Docker Containers":

Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read.

Slacker speeds up the median container development cycle by 20x and deployment cycle by 5x.

The same technique is used in AWS Lambda, implemented through FUSE:

In Lambda, we did this by taking advantage of the layer of abstraction that Firecracker provides us. Linux has a useful feature called FUSE provides an interface that allows writing filesystems in userspace (instead of kernel space, which is harder to work in).
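The lazy loading idea can be sketched without FUSE at all. The class below is a hypothetical read-only view that fetches a chunk only the first time a read touches it; `LazyImage`, `fetch_chunk`, and `chunks_fetched` are names I made up, and a real implementation would sit behind a filesystem interface rather than a Python method.

```python
CHUNK_SIZE = 512 * 1024  # 512 KiB chunks, as in the article


class LazyImage:
    """Read-only view of an image that fetches chunks only when they are read."""

    def __init__(self, fetch_chunk, size):
        self._fetch_chunk = fetch_chunk  # hypothetical callback: chunk index -> bytes
        self._size = size                # total image size in bytes
        self._loaded = {}                # chunk index -> bytes already fetched

    def read(self, offset, length):
        end = min(offset + length, self._size)
        if offset >= end:
            return b""
        first = offset // CHUNK_SIZE
        last = (end - 1) // CHUNK_SIZE
        parts = []
        for index in range(first, last + 1):
            if index not in self._loaded:
                # Fetch on first touch only; later reads reuse the copy.
                self._loaded[index] = self._fetch_chunk(index)
            parts.append(self._loaded[index])
        data = b"".join(parts)
        start = offset - first * CHUNK_SIZE
        return data[start:start + (end - offset)]

    @property
    def chunks_fetched(self):
        return len(self._loaded)


# A 2 MiB stand-in image (4 chunks) served from memory.
backing = bytes(range(256)) * (4 * CHUNK_SIZE // 256)
img = LazyImage(lambda i: backing[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE], len(backing))
data = img.read(CHUNK_SIZE - 2, 4)  # straddles the boundary of chunks 0 and 1
print(img.chunks_fetched)
```

A four-byte read across a chunk boundary pulls in two of the four chunks and leaves the rest untouched, which is the property the Slacker numbers above argue for.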

Another thing AWS Lambda implements is tiered caching, split into three tiers: the worker's local cache (L1), a cache shared within the same AZ (L2), and the data in S3 (L3):

Despite our local on-worker (L1) cache being several orders of magnitude smaller than the AZ-level cache (L2) and that being much smaller than the full data set in S3 (L3), we still get 67% of chunks from the local cache, 32% from the AZ level, and less than 0.1% from S3.
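As a rough sketch of that lookup order, the class below checks L1, then L2, then falls back to L3. Plain dictionaries stand in for the three tiers, and copying a chunk into the faster tiers on a miss is my assumption; the article does not describe the promotion or eviction policy, so there is no eviction here.

```python
class TieredChunkStore:
    """Look a chunk up in L1, then L2, then L3, counting where each hit lands."""

    def __init__(self, l1, l2, l3):
        self.l1 = l1  # stand-in for the on-worker cache
        self.l2 = l2  # stand-in for the AZ-level cache
        self.l3 = l3  # stand-in for S3, assumed to hold every chunk
        self.hits = {"L1": 0, "L2": 0, "L3": 0}

    def get(self, chunk_id):
        if chunk_id in self.l1:
            self.hits["L1"] += 1
            return self.l1[chunk_id]
        if chunk_id in self.l2:
            self.hits["L2"] += 1
            data = self.l2[chunk_id]
        else:
            self.hits["L3"] += 1
            data = self.l3[chunk_id]   # raises KeyError if the chunk does not exist
            self.l2[chunk_id] = data   # assumed promotion into L2
        self.l1[chunk_id] = data       # assumed promotion into L1
        return data


store = TieredChunkStore({}, {"a": b"A"}, {"a": b"A", "b": b"B"})
store.get("a")  # served from L2, then copied into L1
store.get("a")  # served from L1
store.get("b")  # served from L3, then copied into L2 and L1
print(store.hits)
```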

Because the L3 tier is S3, they do not need to worry about durability in L1 and L2 (if something goes missing, just look it up in the next tier back):

The whole set of chunks are stored in S3, meaning the cache doesn’t need to provide durability, just low latency.

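They still use erasure coding, though, to keep the probability that each cache tier can serve the data from within its own tier as high as possible, which keeps peak latency as low as possible (perhaps because the spikes made the 99.9%/99.95%/99.99% SLOs look bad?):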

Think about what happens in a classic consistent hashed cache with 20 nodes when a node failure happens. Five percent of the data is lost. The hit rate drops to a maximum of 95%, which is a more than 5x increase in misses given that our normal hit rate is over 99%. At large scale machines fail all the time, and we don’t want big changes in behavior when that happens.
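The arithmetic in that quote is easy to check. The sketch below assumes chunks are spread evenly over the nodes and that the normal hit rate is exactly 99% (the quote only says "over 99%"); the function name is mine.

```python
def miss_increase(nodes, normal_hit_rate):
    """Return (hit rate after one node fails, factor by which misses grow).

    Assumes chunks are spread evenly, so one failed node loses 1/nodes of them.
    """
    surviving = 1 - 1 / nodes
    degraded_hit_rate = normal_hit_rate * surviving
    factor = (1 - degraded_hit_rate) / (1 - normal_hit_rate)
    return degraded_hit_rate, factor


hit, factor = miss_increase(nodes=20, normal_hit_rate=0.99)
print(f"hit rate {hit:.4f}, misses x{factor:.2f}")
```

Under those assumptions the hit rate falls to about 94% and misses grow roughly sixfold, consistent with the "more than 5x" in the quote.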

So we use a technique called erasure coding to completely avoid the impact. In erasure coding, we break each chunk up into M parts in a way that it can be recreated from any k. As long as M - k >= 1 we can survive the failure of any node with zero hit rate impact (because the other k nodes will pick up the slack).
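The article does not say which erasure code Lambda uses. The simplest instance of the idea is single XOR parity, which gives M = 3 parts with any k = 2 sufficient to rebuild the chunk. The sketch below is that toy case only; `encode` and `decode` are hypothetical names.

```python
def encode(chunk: bytes) -> list[bytes]:
    """Split a chunk into 2 data parts plus 1 XOR parity part (M = 3, k = 2)."""
    if len(chunk) % 2:
        raise ValueError("this sketch expects an even-length chunk")
    half = len(chunk) // 2
    a, b = chunk[:half], chunk[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def decode(parts: list) -> bytes:
    """Rebuild the chunk from any 2 of the 3 parts; a missing part is None."""
    if sum(p is None for p in parts) > 1:
        raise ValueError("need at least 2 of the 3 parts")
    a, b, parity = parts
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))  # a = b XOR parity
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))  # b = a XOR parity
    return a + b


chunk = b"lambda chunk"
parts = encode(chunk)
for lost in range(3):
    damaged = list(parts)
    damaged[lost] = None  # simulate the node holding this part failing
    assert decode(damaged) == chunk
```

Each of the three parts would live on a different cache node, so losing any one node still leaves enough parts to answer the request from within the tier.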

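My guess is that the originally simpler three-tier architecture turned out, after benchmarking, to miss the corresponding SLO, so erasure coding was "bolted on" to pull the SLO up. You can feel from this how much the boss's requirements shape the architecture...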

That said, it is rare to see details like these made public...

Author: Gea-Suan Lin. Posted on March 28, 2024. Categories: AWS, Cloud, Computer, Infrastructure, Murmuring, Network, Service. Tags: amazon, architecture, aws, cache, cloud, lambda, layer, service.
