The Time Cost of Communication Between CPU Cores

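I saw "Measuring CPU core-to-core latency (github.com/nviennot)" on Hacker News; the project itself is at "Measuring CPU core-to-core latency". It looks like an interesting piece of research: it measures how long communication between cores takes inside many different CPUs.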

According to the project's description, the measurement relies on cache coherence:

We measure the latency it takes for a CPU to send a message to another CPU via its cache coherence protocol.

By pinning two threads on two different CPU cores, we can get them to do a bunch of compare-exchange operation, and measure the latency.
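The quoted approach can be sketched roughly as follows. This is only a minimal illustration using the Rust standard library, not the project's actual code: the names `ping_pong_latency_ns` and `spin_until_swapped` are hypothetical, and the threads here are not pinned to cores (the standard library has no affinity API), so on a real measurement each thread would additionally be pinned to a chosen core.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;
use std::time::Instant;

// Hypothetical helper: spin until the flag can be flipped from `from` to `to`.
fn spin_until_swapped(flag: &AtomicBool, from: bool, to: bool) {
    let mut spins: u32 = 0;
    while flag
        .compare_exchange(from, to, Ordering::SeqCst, Ordering::SeqCst)
        .is_err()
    {
        spins = spins.wrapping_add(1);
        if spins % 4096 == 0 {
            // Only matters when both threads end up sharing one core.
            thread::yield_now();
        } else {
            std::hint::spin_loop();
        }
    }
}

// Hypothetical function: bounce a flag between two threads and return the
// mean one-way handoff latency in nanoseconds.
// ASSUMPTION: no core pinning is done here; a real measurement would pin
// each thread to a specific core before the loop starts.
fn ping_pong_latency_ns(round_trips: u32) -> f64 {
    assert!(round_trips > 0, "need at least one round trip");
    const PING: bool = false; // responder's turn
    const PONG: bool = true; // main thread's turn

    let flag = Arc::new(AtomicBool::new(PING));
    let start_line = Arc::new(Barrier::new(2));

    let responder = {
        let flag = Arc::clone(&flag);
        let start_line = Arc::clone(&start_line);
        thread::spawn(move || {
            start_line.wait();
            for _ in 0..round_trips {
                spin_until_swapped(&flag, PING, PONG);
            }
        })
    };

    start_line.wait();
    let started = Instant::now();
    for _ in 0..round_trips {
        spin_until_swapped(&flag, PONG, PING);
    }
    let elapsed = started.elapsed();
    responder.join().expect("responder thread panicked");

    // Each round trip consists of two handoffs of the cache line.
    elapsed.as_nanos() as f64 / (2.0 * round_trips as f64)
}

fn main() {
    let ns = ping_pong_latency_ns(10_000);
    println!("mean one-way latency: {:.1} ns", ns);
}
```

Repeating this for every pair of cores, with each thread pinned, would give the kind of latency matrix the project plots; without pinning, the number only reflects whichever cores the scheduler happens to pick.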

The project has already measured many different CPUs, and some of the results are interesting.

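Take the first image, "Intel Core i9-12900K @ 8+8 Cores (Alder Lake, 12th gen) 2021-Q4": people were quite curious about what is going on with CPU #8, whose cross-core latency is especially low, and someone even dug up a die photo of the CPU to take a look.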

Another one is c6a.metal on AWS, which runs on "AMD EPYC 7R13 @ 48 Cores (Milan, 3rd gen) 2021-Q1"; the chart shows the cores split into six blocks.

Moving to the ARM platform, c7g.16xlarge has even more CPU cores and runs on "AWS Graviton3 @ 64 Cores (Arm Neoverse, 3rd gen) 2021-Q4"; there the latency is even more uneven.

The earlier c6gd.metal is also a 64-core ARM machine, "AWS Graviton2 @ 64 Cores (Arm Neoverse, 2nd gen) 2020-Q1", yet it shows a very different latency pattern.

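Overall, the impression is that as the core count grows, more technical bottlenecks appear, so the communication cost is no longer the same between every pair of cores... This feels a bit like what I learned about NUMA back in the day.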


Author: Gea-Suan Lin. Posted on: September 19, 2022. Categories: Computer, Hardware, Murmuring. Tags: access, cache, coherence, core, cpu, hardware, latency, numa, performance, protocol, speed, thread
