Real-Time Messaging Architecture at Slack

Source: https://www.infoq.com/news/2023/04/real-time-messaging-slack/

Apr 18, 2023 2 min read

Slack recently published a description of how it delivers millions of real-time messages daily across the globe. The company offers detailed insight into its Pub/Sub architecture, which is designed to manage real-time messages at scale. It highlights the challenges of delivering real-time messages across different time zones and regions, and how Slack's engineers designed the infrastructure to handle them.

Sameera Thangudu, Senior Software Engineer at Slack, explains the importance of this architecture:

"Our servers serve tens of millions of channels per host, tens of millions of connected clients, and our system delivers messages across the world in 500ms. With the linear scalability of our current architecture, our projections show that we can serve many more customers."

She states that the company plans to enhance its architecture to serve a larger customer base.

The system's backend is composed of several services. Channel Servers (CS) are stateful, in-memory servers that hold channel history. Consistent hashing assigns each CS a subset of channels. At peak times, each host serves about 16 million channels. Consistent hash ring managers (CHARMs) manage the consistent hash ring for CSs and ensure that an unhealthy CS is replaced within 20 seconds. Consul stores the current configuration of the consistent hash ring.

[Figure: Slack - Consul]

Source: https://slack.engineering/real-time-messaging/
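
The mapping of channels to Channel Servers can be illustrated with a small consistent hash ring. This is a minimal sketch under stated assumptions, not Slack's implementation: the class name `HashRing`, the server names, the use of SHA-256, and the virtual-node count are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, servers, vnodes=64):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        """Place `vnodes` points for this server on the ring."""
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        """Drop every point owned by this server, e.g. when it is unhealthy."""
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, channel_id):
        """Return the owner of the first ring position at or after the channel's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, _hash(channel_id)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that matters for replacing an unhealthy server is that removing one server only moves the channels that server owned; every other channel keeps its assignment.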

Gateway Servers (GS), like CSs, are stateful, in-memory servers. They maintain user information and WebSocket channel subscriptions and act as an interface between Slack clients and CSs. GSs are deployed across multiple geographical regions to optimize connection speeds. Admin Servers (AS) are stateless, in-memory servers that interface between the Webapp backend and CSs. Finally, Presence Servers (PS) track online users, powering the green presence dots in Slack clients.
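
The state a Gateway Server holds can be pictured as an in-memory table from channel IDs to connected clients. The sketch below is an assumption-laden illustration: the class `GatewayServer`, its method names, and the ID formats are invented here and do not come from Slack's code.

```python
from collections import defaultdict


class GatewayServer:
    """In-memory map of channel ID -> connected client IDs (illustrative names)."""

    def __init__(self, region):
        self.region = region
        self._subs = defaultdict(set)

    def subscribe(self, client_id, channel_id):
        """Record that a connected client wants events for a channel."""
        self._subs[channel_id].add(client_id)

    def disconnect(self, client_id):
        """Drop a client from every channel and forget channels left empty."""
        for channel_id in list(self._subs):
            self._subs[channel_id].discard(client_id)
            if not self._subs[channel_id]:
                del self._subs[channel_id]

    def subscribers(self, channel_id):
        """Return a copy of the client IDs subscribed to a channel."""
        return set(self._subs.get(channel_id, ()))

    def channels(self):
        """Channels for which this GS still has at least one connected client."""
        return set(self._subs)
```

Because this state lives only in memory, a restarted GS would have to rebuild it as clients reconnect, which is one reason the servers are described as stateful.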

Every Slack client keeps a persistent WebSocket connection to Slack's servers, over which it receives the real-time events that keep its state current. The client sets up this connection in several steps. It first fetches the user token and WebSocket connection setup information from the Webapp backend. It then initiates a WebSocket connection to the nearest edge region, where a GS fetches the user information and sends the first message to the client. Envoy load balances incoming traffic and handles TLS termination.

[Figure: Slack - Client WebSocket Setup]

Source: https://slack.engineering/real-time-messaging/
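
The connection setup steps can be walked through with stand-in functions and no real network calls. Everything named here is hypothetical: the `ConnectionInfo` shape, the latency table used to pick a region, the token format, and the `hello` message type are assumptions made only to show the order of the steps.

```python
from dataclasses import dataclass

TOKEN_PREFIX = "token-for-"  # made-up token format, for illustration only


@dataclass
class ConnectionInfo:
    token: str
    edge_regions: dict  # region name -> latency in ms (assumed shape)


def fetch_connection_info(user_id):
    """Stand-in for the Webapp backend call; returns made-up values."""
    return ConnectionInfo(
        token=TOKEN_PREFIX + user_id,
        edge_regions={"us-east": 40, "eu-west": 120, "ap-south": 210},
    )


def nearest_region(info):
    """Pick the region with the lowest latency in the assumed table."""
    return min(info.edge_regions, key=info.edge_regions.get)


def gateway_handshake(token, region):
    """Stand-in for the GS side: resolve the user, then build the first message."""
    user_id = token[len(TOKEN_PREFIX):]
    return {"type": "hello", "user": user_id, "region": region}


def connect(user_id):
    info = fetch_connection_info(user_id)         # step 1: token and setup info
    region = nearest_region(info)                 # step 2: choose the nearest edge region
    return gateway_handshake(info.token, region)  # step 3: GS sends the first message
```

In a real deployment the WebSocket upgrade, TLS termination, and load balancing would sit between steps 2 and 3; they are omitted here to keep the sequence visible.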

Once the client setup is complete, each message sent in a channel is broadcast to all clients online in that channel. A message travels through the Webapp API, an AS, and a CS before being sent to every subscribed GS worldwide. Each GS that receives the message sends it to every connected client subscribed to that channel ID.

[Figure: Slack - Journey of a Message]

Source: https://slack.engineering/real-time-messaging/
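
The two-level fan-out described above, from a Channel Server to subscribed gateways and from each gateway to its clients, can be sketched as follows. The classes `Gateway` and `ChannelServer` and the function `send_message` are hypothetical names, and the Webapp API and AS hops are collapsed into a single function call for brevity.

```python
from collections import defaultdict


class Gateway:
    """One gateway and the clients connected to it (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.clients = defaultdict(set)  # channel ID -> client IDs
        self.delivered = []              # (client ID, message) pairs, for inspection

    def deliver(self, channel_id, message):
        """Send the message to every local client subscribed to the channel."""
        for client_id in sorted(self.clients[channel_id]):
            self.delivered.append((client_id, message))


class ChannelServer:
    """Tracks which gateways are subscribed to each channel (illustrative)."""

    def __init__(self):
        self.gateways = defaultdict(list)  # channel ID -> subscribed gateways

    def subscribe(self, gateway, channel_id, client_id):
        gateway.clients[channel_id].add(client_id)
        if gateway not in self.gateways[channel_id]:
            self.gateways[channel_id].append(gateway)

    def broadcast(self, channel_id, message):
        """First fan-out level: one send per subscribed gateway."""
        for gateway in self.gateways[channel_id]:
            gateway.deliver(channel_id, message)


def send_message(channel_server, channel_id, text):
    """Stand-in for the Webapp API and AS hops; it only builds and forwards the message."""
    channel_server.broadcast(channel_id, {"channel": channel_id, "text": text})
```

The design point is that the Channel Server sends one copy per gateway rather than one per client, so the per-client work happens close to the clients.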

Aside from chat messages, events are another message type that changes the client's state in real time. Transient events, such as a user typing in a channel, follow a slightly different flow because they are not persisted to a database. The diagram below illustrates this flow.

[Figure: Slack - Transient Event Flow]

Source: https://slack.engineering/real-time-messaging/
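
The difference between persisted messages and transient events can be reduced to one branch. This is a simplified sketch: the `EventRouter` class, the in-memory list standing in for a database, and the `user_typing` event name are assumptions for illustration, and the sketch does not model which servers each flow passes through.

```python
TRANSIENT_TYPES = {"user_typing"}  # assumed event name, for illustration only


class EventRouter:
    """Routes events to storage and to the real-time path (illustrative)."""

    def __init__(self):
        self.database = []    # stand-in for persistent storage
        self.broadcasts = []  # events handed to the real-time path

    def handle(self, event):
        """Persist durable events, then broadcast; transient events skip the database."""
        if event["type"] not in TRANSIENT_TYPES:
            self.database.append(event)
        self.broadcasts.append(event)
```

Both kinds of event reach online clients, but only the durable ones would be available to a client that connects later.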

About the Author

Eran Stiller

Eran Stiller is a Principal Software Architect based in Melbourne, Australia. As a seasoned software architect and CTO, Eran has designed, implemented, and reviewed various software solutions across multiple business domains. Eran has many years of experience in the software development world and a track record of public speaking and community contribution. Microsoft has recognized him as a Microsoft Regional Director (MRD) since 2018, and he was a Microsoft Most Valuable Professional (MVP) on Microsoft Azure from 2016 to 2022.
