2

這個月 GitHub 的不穩問題,都是 mysql1 這個 cluster 的鍋… – Gea-Suan Lin's BLOG

 2 years ago
source link: https://blog.gslin.org/archives/2022/03/28/10630/%e9%80%99%e5%80%8b%e6%9c%88-github-%e7%9a%84%e4%b8%8d%e7%a9%a9%e5%95%8f%e9%a1%8c%ef%bc%8c%e9%83%bd%e6%98%af-mysql1-%e9%80%99%e5%80%8b-cluster-%e7%9a%84%e9%8d%8b/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

這個月 GitHub 的不穩問題,都是 mysql1 這個 cluster 的鍋...

GitHub 針對了這個月的四次 downtime 說明,大致上都跟 mysql1 這組 cluster 有關:「An update on recent service disruptions」,這是 Keith Ballinger 發的文章,找了一下掛的頭銜是 SVP of Engineering at GitHub。

文章裡提到的 mysql1 在「Partitioning GitHub’s relational databases to handle scale」這邊可以看到一些資訊 (我在「GitHub 的 MySQL 架構與數字」這邊也有提到),基本上有 ProxySQL + Vitess 兩套方案在 scale,但可以看出來主資料庫本身還是有很大的 loading 在上面跑。

這次的問題是 mysql1 看起來這次遇到了效能上的瓶頸,不過還是沒找到原因,這可以從這幾次的說明看出來,從第一次的 outage:

The incident appeared to be related to peak load combined with poor query performance for specific sets of circumstances.

第二次的:

The following day, we saw the same peak traffic pattern and load on mysql1. We were not able to pinpoint and address the query performance issues before this peak, and we decided to proactively failover before the issue escalated.

第三次的:

While we had reduced load seen in the previous incidents, we were not fully confident in the mitigations.

In this third incident, we enabled memory profiling on our database proxy in order to look more closely at the performance characteristics during peak load.

到最近第四次的:

In order to reduce load, we throttled webhook traffic and will continue to use that as a mitigation to prevent future recurrence during peak load times as we continue to investigate further mitigations.

可以看到基本上還沒完,之後再遇到問題時應該還是會把 webhook traffic 拿出來開刀...

Related

GitHub 的 MySQL 架構與數字

前幾天 GitHub 有寫一篇文章提到他們的 MySQL 是怎麼 scale 的,另外裡面也有一些數字可以看:「Partitioning GitHub’s relational databases to handle scale」。 他們最主要的 database cluster 叫做 mysql1,裡面有提到 2019 年的時候這個 cluster 是 950K qps,其中 primary 有 50K qps: In 2019, mysql1 answered 950,000 queries/s on average, 900,000 queries/s on replicas, and 50,000 queries/s on the primary. 在 2021 年的時候變成 1.125M…

September 30, 2021

In "Computer"

測試 MariaDB 上 Galera Cluster 的 Isolation

雖然人還在大阪員旅,不過看到這篇很精彩的測試還是先寫下來:「Call me Maybe: MariaDB Galera Cluster」,作者在 Stripe 花了很多時間在測試各種資料庫的資料正確性。 由於 Percona 的 Percona XtraDB Cluster (PXC) 是基於 Galera Cluster 改出來的,再加上敝公司裡面大量使用 PXC,決定還是趕緊花時間看完整篇文章確認到底講了什麼。 先講目前看到的結論: 作者認為 Galera Cluster 官方宣稱的 SNAPSHOT-ISOLATION (SI) > REPEATABLE-READ (RR) 是有問題的講法,在學理上 SI 與 RR 都低於 SERIALIZABLE (1SR),也都高於 READ-COMMITTED (RC),但兩者之前沒有直接包含。 Galera Cluster 的測試結果看起來沒有做到 SI 的要求,不過測試中看起來有達到 READ-UNCOMMITTED (RU),在 RC 的測試中也沒找到問題,不過作者還是懷疑系統有達到 RC 的要求。…

September 6, 2015

In "Computer"

2019 年 Percona 對 UUID 當作 Primary Key 的看法

前陣子的「為資料庫提案新的 UUID 格式」這邊提到了有人提案要增加新的 UUID 格式,Percona 的老大 Peter Zaitsev 在 Twitter 上貼了「UUIDs are Popular, but Bad for Performance — Let’s Discuss」這篇在 2019 年時他們家的文章,題到了 MySQL 使用 UUID 當作 Primary Key 的事情: UUIDs are Popular, but Bad for Performance — Let’s Discuss https://t.co/2ulSGPB77L— Peter Zaitsev (@PeterZaitsev) August 21, 2021 要注意的是這篇文章沒有要從頭解釋 UUID 對於 Primary…

August 21, 2021

In "Computer"

a611ee8db44c8d03a20edf0bf5a71d80?s=49&d=identicon&r=gAuthor Gea-Suan LinPosted on March 28, 2022Categories Computer, Database, Murmuring, MySQL, Network, Service, SoftwareTags database, db, downtime, github, mysql, mysql1, outage, performance, sql, traffic

Leave a Reply

Your email address will not be published. Required fields are marked *

Comment *

Name *

Email *

Website

Notify me of follow-up comments by email.

Notify me of new posts by email.

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Learn More)

Post navigation


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK