GitHub Availability Report: January 2024
source link: https://github.blog/2024-02-14-github-availability-report-january-2024/
In January, we experienced three incidents that resulted in degraded performance across GitHub services.
January 09 12:20 UTC (lasting 140 minutes)
On January 9 between 12:20 and 14:40 UTC, services in one of our three sites experienced elevated latency for connections. This led to a sustained period of timed-out requests across a number of services, including but not limited to our Git backend. An average of 5% and max of 10% of requests failed with a 5xx response or timed out during this period.
This was caused by an upgrade of hosts, which led to temporarily reduced capacity as the upgrade rolled through the fleet. While the remaining hosts had plenty of capacity to handle the increased load, we found that the configured connection limit was lower than it should have been. We have increased that limit to prevent this from recurring. We have also identified improvements to our monitoring of connection limits and behavior, as well as changes to reduce the risk of host upgrades leading to reduced capacity.
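To illustrate how a rolling upgrade can run into a connection limit even when hosts have spare capacity, here is a minimal sketch. Every number and name in it (fleet size, connection count, limit, and the two helper functions) is hypothetical and chosen only for the arithmetic. None of it comes from GitHub's actual configuration.

```python
def connections_per_host(total_connections: int, total_hosts: int, hosts_out: int) -> float:
    """Average connections each in-service host carries (hypothetical model).

    Assumes connections are spread evenly across hosts that are not being upgraded.
    """
    in_service = total_hosts - hosts_out
    if in_service <= 0:
        raise ValueError("at least one host must remain in service")
    return total_connections / in_service


def exceeds_limit(total_connections: int, total_hosts: int,
                  hosts_out: int, per_host_limit: int) -> bool:
    """True when the even share per host is above the configured connection limit."""
    return connections_per_host(total_connections, total_hosts, hosts_out) > per_host_limit


if __name__ == "__main__":
    # Hypothetical fleet: 10 hosts, 9000 concurrent connections, limit of 1000 per host.
    for hosts_out in range(0, 4):
        share = connections_per_host(9000, 10, hosts_out)
        over = exceeds_limit(9000, 10, hosts_out, per_host_limit=1000)
        print(f"hosts upgrading={hosts_out} per-host connections={share:.0f} over limit={over}")
```

In this toy model the limit is only crossed once hosts are taken out of rotation, which is why a limit that looks safe in steady state can still cause timeouts during an upgrade.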
January 21 02:01 UTC (lasting 7 hours 3 minutes)
On January 21 at 2:01 UTC, we experienced an incident that affected customers using GitHub Codespaces. Customers encountered issues creating and resuming Codespaces in multiple regions due to operational issues with compute and storage resources.
Around 25% of customers were impacted, primarily in East US and West Europe. We re-routed traffic for Codespace creations to less impacted regions, but existing Codespaces in these regions may have been unable to resume during the incident.
By 7:30 UTC, we had recovered connectivity to all regions except West Europe, which had an extended recovery time due to increased load in that particular region. The incident was resolved on January 21 at 9:34 UTC once Codespace creations and resumes were working normally in all regions.
We are working to improve our alerting and resiliency to reduce the duration and impact of region-specific outages.
January 31 12:30 UTC (lasting 147 minutes)
On January 31, we deployed an infrastructure change to our load balancers in preparation for our longer-term goal of enabling IPv6 at GitHub.com. This change was deployed to a subset of our global edge sites. It had the unintended consequence of passing IPv4 addresses to our IP Allow List functionality as IPv4-mapped IPv6 addresses (for example, 10.1.2.3 became ::ffff:10.1.2.3). While our IP Allow List functionality was developed with IPv6 in mind, it wasn’t developed to handle these mapped addresses, so it started blocking requests that it deemed not to be in the defined list of allowed addresses. Request error rates peaked at 0.23% of all requests.
In addition to changes deployed to remediate the issues, we have taken steps to improve testing and monitoring to better catch these issues in the future.
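The mapped-address failure described above can be reproduced with a small sketch using Python's standard `ipaddress` module. This is an illustration only, not GitHub's implementation: the function names and the allow-list entries are hypothetical. The naive check compares the address exactly as received, while the second check first unwraps an IPv4-mapped IPv6 address to its IPv4 form.

```python
import ipaddress
from typing import Iterable


def is_allowed_naive(addr: str, allow_list: Iterable[str]) -> bool:
    """Check the address exactly as received, with no handling of mapped addresses."""
    ip = ipaddress.ip_address(addr)
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allow_list]
    return any(ip.version == net.version and ip in net for net in networks)


def is_allowed(addr: str, allow_list: Iterable[str]) -> bool:
    """Unwrap IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) before checking the list."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # the embedded IPv4Address
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allow_list]
    return any(ip.version == net.version and ip in net for net in networks)


if __name__ == "__main__":
    allow_list = ["10.0.0.0/8", "2001:db8::/32"]  # hypothetical allow-list entries
    for candidate in ("10.1.2.3", "::ffff:10.1.2.3", "2001:db8::1"):
        print(candidate, is_allowed_naive(candidate, allow_list), is_allowed(candidate, allow_list))
```

With the naive check, `::ffff:10.1.2.3` is treated as an IPv6 address that matches no entry, so a client listed under `10.0.0.0/8` is rejected. Normalizing first lets the same entry match both representations.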
Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.