How We Halved Our Latency by Rewriting Presence Service API in Rust

Stephen Blum on Feb 26, 2025

Hi everyone, I'm Stephen Blum, CTO at PubNub, and I'm excited to share how we're improving our Presence Service by rewriting it in Rust. By moving the Presence API to a language known for low-level optimization and safe concurrency, we've seen significant drops in P99, P50, and average latencies. Combined with internal routing improvements, better synchronization of runtime dependencies, and fine-tuned optimizations, we've cut real-world latency by approximately 500 ms. The rewrite has eliminated latency spikes, improved memory usage and encoding efficiency, and made our Presence APIs more consistent across all regions, operating systems, and devices - more than a 2x improvement!

Continue reading to learn more about how we are continuing to reduce our global memory and CPU consumption with Rust.

The Challenge

Our Presence Service is crucial: it's how we keep track of who is where, in real time, across channels. Originally, the backend was built entirely in Python. While Python is great for many things, we started hitting performance bottlenecks and rising memory usage as our global user base expanded. We could buy better performance and startup times by purchasing more AWS hardware to absorb CPU and memory spikes, but that was neither an efficient nor a long-term solution: higher latencies and less stability aren't acceptable for the real-time experience we aim to provide.

The Journey

Before diving into how we rewrote our Presence Service in Rust to push latency even lower, one question remains: why did we choose Rust?

The reason is simple: Rust is blazing fast, has a powerful type system, is thread-, concurrency-, and memory-safe, catches whole classes of bugs at compile time, and gives us low-level control. It's like C syntax with strict guard rails: the compiler protects us from the most common classes of memory and concurrency vulnerabilities. We knew this general-purpose language would let us dramatically improve our services, which is crucial for real-time systems.
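
To make that concrete, here is a minimal sketch (illustrative only, not our production code) of the kind of shared state a presence service manages, and how Rust forces it behind explicit synchronization so data races are rejected at compile time rather than discovered in production:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical shared presence map: channel -> set of user IDs.
// Rust won't let threads share this without Arc plus a lock,
// so unsynchronized access simply doesn't compile.
type PresenceMap = Arc<RwLock<HashMap<String, HashSet<String>>>>;

fn main() {
    let presence: PresenceMap = Arc::new(RwLock::new(HashMap::new()));

    // Simulate concurrent joins coming from request-handling threads.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let presence = Arc::clone(&presence);
            thread::spawn(move || {
                let mut map = presence.write().unwrap();
                map.entry("chat.lobby".to_string())
                    .or_default()
                    .insert(format!("user-{i}"));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let map = presence.read().unwrap();
    println!("occupancy: {}", map.get("chat.lobby").map_or(0, |users| users.len()));
}
```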

Switching to Rust isn't only about speed; it matters for other reasons too. As you'll see when we get to our database call optimizations, we also observed lower load on our Presence containers in every region. This is a big deal for us: lower cluster resource consumption means more throughput, which translates to cost savings and less environmental impact, all while maintaining state-of-the-art performance.

Phase 1: Rewriting the Presence Read APIs

The first phase of this effort tackled the Presence Read APIs (a simplified sketch of these lookups follows the list):

  • hereNow: Obtain a list of client User IDs, including clients' state data, and the total occupancy of the channel on app startup.

  • whereNow: Confirm the channels to which the client is currently subscribed.

  • getState: Get the state of other clients based on their User ID. State is dynamic custom data that users set on one or more channels, and it persists on a channel as long as the user stays subscribed.
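
To give a feel for what these lookups do, here is a simplified, in-memory sketch in Rust. The types and method names are illustrative only (the real service keeps this data in Redis), but the shapes of the three reads are the same:

```rust
use std::collections::{HashMap, HashSet};

// Illustrative in-memory model of presence data.
struct Presence {
    // channel -> set of subscribed user IDs
    channels: HashMap<String, HashSet<String>>,
    // (channel, user_id) -> serialized custom state
    state: HashMap<(String, String), String>,
}

impl Presence {
    /// hereNow: who is on this channel, and what is its occupancy?
    fn here_now(&self, channel: &str) -> (Vec<&String>, usize) {
        let users: Vec<&String> = self
            .channels
            .get(channel)
            .map(|set| set.iter().collect())
            .unwrap_or_default();
        let occupancy = users.len();
        (users, occupancy)
    }

    /// whereNow: which channels is this user currently subscribed to?
    fn where_now(&self, user_id: &str) -> Vec<&String> {
        self.channels
            .iter()
            .filter(|(_, users)| users.contains(user_id))
            .map(|(channel, _)| channel)
            .collect()
    }

    /// getState: the custom state a user has set on a channel.
    fn get_state(&self, channel: &str, user_id: &str) -> Option<&String> {
        self.state.get(&(channel.to_string(), user_id.to_string()))
    }
}

fn main() {
    let mut presence = Presence { channels: HashMap::new(), state: HashMap::new() };
    presence
        .channels
        .entry("chat.lobby".to_string())
        .or_default()
        .insert("user-1".to_string());
    presence
        .state
        .insert(("chat.lobby".to_string(), "user-1".to_string()), r#"{"mood":"happy"}"#.to_string());

    let (users, occupancy) = presence.here_now("chat.lobby");
    println!("{users:?} ({occupancy} online)");
    println!("{:?}", presence.where_now("user-1"));
    println!("{:?}", presence.get_state("chat.lobby", "user-1"));
}
```

In production these lookups map onto Redis reads, but the point is the same: each read endpoint is a small, well-defined query over the presence data.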

These are the endpoints clients hit to get current Presence information. We slowly rolled out the Rust versions of these APIs and the impact was immediate in our benchmarks:

Graph of the hereNow P99 Latency API

In the hereNow P99 latency graph above, each line represents one aggregate of regions by locality (US, EU, Asia, etc.). Latencies dropped significantly across the board. Every region, including those that previously had higher latencies, saw a 2x or better improvement: read-endpoint latency went from an average of 1 second to an average of 200 ms. That 800 ms decrease - an 80% reduction - drastically improves real-time communication.

Phase 2: Bundling Write APIs

Previously, our Presence device-tracking data was stored across many Redis databases, and in the worst case, clients connecting from different regions saw varied latencies because each request touched multiple databases. Before overhauling the write Presence APIs (heartbeat, leave, and setState) in Rust, we wanted to validate an idea: what if we routed write requests as a single bundled transaction instead of individual requests?

Even if a bundled request takes slightly longer, completing one call should beat completing several: the database work happens in a single DB call rather than multiple, potentially reducing overall latency. Once we bundled these transactions, we saw a dramatic improvement for each of the write APIs.
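
As a rough illustration, here is what that bundling looks like using the Rust `redis` crate's pipeline support. The key layout, field names, and TTL below are invented for this example and don't reflect our actual schema:

```rust
// Sketch only: assumes the `redis` crate (e.g. redis = "0.25") and a local Redis instance.
fn bundled_heartbeat(
    con: &mut redis::Connection,
    channel: &str,
    user_id: &str,
    state_json: &str,
    ttl_secs: i64,
) -> redis::RedisResult<()> {
    // Illustrative key layout, not PubNub's real schema.
    let occupants_key = format!("presence:{channel}:occupants");
    let state_key = format!("presence:{channel}:state");

    // One MULTI/EXEC round trip instead of three separate calls.
    redis::pipe()
        .atomic()
        .sadd(&occupants_key, user_id).ignore()         // mark the user present
        .hset(&state_key, user_id, state_json).ignore() // store the custom state
        .expire(&occupants_key, ttl_secs).ignore()      // refresh the heartbeat TTL
        .query(con)
}

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;
    bundled_heartbeat(&mut con, "chat.lobby", "user-1", r#"{"typing":false}"#, 300)
}
```

The `.atomic()` call wraps the commands in MULTI/EXEC, so they are queued and executed together in a single round trip instead of paying the network cost once per command.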

Previously, the heartbeat API would in certain situations return a 200 OK immediately and then perform its database operations asynchronously behind the scenes, where they could end up as no-ops or errors. By bundling the DB calls together, these operations complete more quickly, are more reliable, and let us tighten how we handle invariants.

Average Heartbeat API Latency Graph

The setState API is synchronous by nature. In our metrics, latencies dropped from around 100 ms to 50 ms, and every region saw similar, consistent improvements.

Latency Improvements for setState API Graph

However, it's important to note that the initial latency spike in our API request metrics is not indicative of actual performance: the real-world performance and reliability improvements were substantial. By turning multiple Redis calls into a single bundled query, we removed the cumulative latency those calls introduced.

Phase 3: Deploying Rust Write APIs

Currently, our Rust-based Presence Service runs smoothly on just 20% of the resources we would normally need. As we roll out Rust versions of the Presence Write APIs, we expect latencies to drop even further: Python still adds significant overhead to processing time, and with Rust's efficiency we anticipate at least another 50% reduction in latency, even for clients halfway around the world.

For example, here is P90 data showcasing the latency improvements for setState:

Latency Improvements for setState API (P90) Part 1

Latency Improvements for setState API (P90) Part 2

What's Next?

When working with a globally distributed system, it's crucial to understand how data center locations and network routes affect performance. You need to think globally when deploying your APIs to ensure your latency is as optimized as possible.

We're excited about the improvements we've seen so far and can't wait to roll out the rest of the Rust-based components. After we finish deploying the Rust write APIs (Phase 3), we're setting our sights on the Presence Event pipeline (Phase 4), the component that monitors device traffic as a supplemental signal for counting a device as online. It watches for things like TCP SYN-ACK, TCP RST, and TCP FIN and sets the device state to “online” or “offline” depending on the flag. The goal is to have every component of the Presence Service running in Rust, reducing latencies and resource usage even further.
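
As a rough sketch of the kind of mapping that pipeline performs (the enum and function names below are illustrative, not the actual Phase 4 code):

```rust
// Illustrative only: map observed TCP control flags to a presence state.
#[derive(Debug, PartialEq)]
enum DeviceState {
    Online,
    Offline,
}

#[derive(Debug)]
enum TcpSignal {
    SynAck, // connection established -> device is reachable
    Fin,    // orderly close -> device went away
    Rst,    // abrupt reset -> device went away
}

fn device_state(signal: &TcpSignal) -> DeviceState {
    match signal {
        TcpSignal::SynAck => DeviceState::Online,
        TcpSignal::Fin | TcpSignal::Rst => DeviceState::Offline,
    }
}

fn main() {
    let events = [TcpSignal::SynAck, TcpSignal::Fin];
    for event in &events {
        println!("{event:?} -> {:?}", device_state(event));
    }
}
```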

Stay tuned for more updates and feel free to ask us anything!