Introducing new runtime performance improvements for App Platform

Posted: May 2, 2024 | 4 min read

At DigitalOcean, our mission is to provide the tools and infrastructure needed to scale exponentially and accelerate successful cloud journeys. To improve the efficiency of App Platform, we have migrated all applications to newly upgraded infrastructure, which brings substantial improvements to runtime performance. In this post, we walk through how we rolled out the latest version of gVisor and share the results of our performance testing.

gVisor enhancements

App Platform is DigitalOcean’s Platform-as-a-Service. You bring the code; we breathe life into it and let it run. One of the core promises of App Platform is that users don’t have to worry about the underlying infrastructure or about keeping it up to date. That’s what we do! A lot of building blocks work together to give you that seamless experience of “just deploying an app”. One of these building blocks is Google’s gVisor container runtime, which enables us to securely and densely pack applications next to each other on the same host.

However, that added security comes at a price. One of gVisor’s core principles is that it intercepts the application’s syscalls and handles them in gVisor rather than in the host kernel. That interception, which was implemented with ptrace, has a lot of overhead. To address this issue, Google released a new approach to intercepting syscalls called systrap. This new platform drastically reduces the overhead of handling syscalls in gVisor and thus improves the observed performance of a wide spectrum of applications. Essentially, any workload that is not purely CPU-bound will benefit from this improvement.
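To make the point about CPU-bound versus syscall-heavy workloads concrete, here is a deliberately simple back-of-the-envelope model in Python. It assumes that runtime is compute time plus a fixed interception cost per syscall. The function names and every number in it are illustrative assumptions, not measurements of gVisor, ptrace, or systrap.

```python
def estimated_runtime(cpu_seconds: float, syscall_count: int,
                      overhead_per_syscall: float) -> float:
    """Toy model: pure compute time plus a fixed interception cost per syscall."""
    return cpu_seconds + syscall_count * overhead_per_syscall


def estimated_speedup(cpu_seconds: float, syscall_count: int,
                      old_overhead: float, new_overhead: float) -> float:
    """Ratio of modeled runtime before and after lowering the per-syscall cost."""
    before = estimated_runtime(cpu_seconds, syscall_count, old_overhead)
    after = estimated_runtime(cpu_seconds, syscall_count, new_overhead)
    return before / after


# Hypothetical per-syscall interception costs in seconds (illustrative only).
OLD_OVERHEAD = 10e-6
NEW_OVERHEAD = 2e-6

# A compute-heavy job that makes few syscalls barely changes...
cpu_bound = estimated_speedup(10.0, 1_000, OLD_OVERHEAD, NEW_OVERHEAD)

# ...while a job dominated by file and network syscalls changes a lot.
syscall_heavy = estimated_speedup(1.0, 500_000, OLD_OVERHEAD, NEW_OVERHEAD)

print(f"cpu-bound speedup:     {cpu_bound:.3f}x")
print(f"syscall-heavy speedup: {syscall_heavy:.3f}x")
```

In this model the speedup depends almost entirely on how much of the runtime was spent on intercepted syscalls, which is why file-heavy and network-heavy applications stand to gain the most.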

To put the improvements into perspective, we’ve measured end-to-end* (as your users would observe it) throughput against a minimal Node.js app, which will likely be network-bound, and a WordPress app, cycling through different themes. The WordPress use case was deliberately chosen because the performance of PHP applications was notably impacted by the gVisor sandbox. This is due to the considerable amount of file operations necessary to run a PHP application, and cycling through different themes makes this a pathological test.
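For readers who want to try a similar comparison on their own app, the following is a minimal sketch of an end-to-end throughput measurement in Python. It is not the internal testing framework used for the numbers in this post: it issues requests sequentially from a single client, and the helper names `measure_throughput` and `fetch` are our own illustrative choices.

```python
import time
import urllib.request
from typing import Callable


def measure_throughput(send_request: Callable[[], None],
                       duration_seconds: float) -> float:
    """Call send_request back to back for roughly duration_seconds.

    Returns completed requests per second as seen by this one client.
    """
    completed = 0
    start = time.perf_counter()
    deadline = start + duration_seconds
    while time.perf_counter() < deadline:
        send_request()
        completed += 1
    elapsed = time.perf_counter() - start
    return completed / elapsed


def fetch(url: str) -> None:
    """Issue one GET request and read the full response body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
```

To use it, pass a zero-argument callable, for example `measure_throughput(lambda: fetch(app_url), 30.0)`, where `app_url` is the address of your own deployed app. A single sequential client understates what a server can sustain under concurrent load, so treat the result as a relative before-and-after indicator rather than an absolute capacity figure.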

[Graph: throughput of the Node.js and WordPress test apps before and after the gVisor upgrade]

The graph above shows the improvements that come with the new gVisor version and its systrap platform: more than twice the throughput on the basic Node.js app, and more than seven times the throughput on the WordPress app. Depending on the profile of your application, the improvement may be more or less noticeable than in our results, which serve as an indicator and confirm the numbers that Google shared in its systrap announcement.

Rolling it out

In our testing, we found a couple of regressions in gVisor compared to the older version we had been using. Some of these were incompatibilities with how applications were supposed to behave, and others were issues in the platform itself. We worked in a close feedback loop with the upstream gVisor team, collecting the information needed to pinpoint the root cause of each issue and get it fixed swiftly. We’re very thankful to the gVisor team for their fast responses and their help in getting these issues fixed.

Making a platform change like this isn’t without risk, and it’s of utmost importance to us not to disturb or break your applications while making such a change. Almost 60 clusters spread across all of our regions had to be upgraded safely. As such, we took a rather slow, canary-based approach to rolling this change out across our fleet of clusters. The gVisor update was part of a larger package of updates across our entire stack, including new versions of the Linux kernel and Kubernetes, among others. Instead of opting for in-place upgrades of existing clusters, we created new clusters and slowly enabled applications to be deployed to them. Once we deemed the new clusters stable, we started creating replacement clusters for the old ones and moving apps from the old clusters to the new ones. This process is now complete, and all apps are running on new clusters.
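As a rough illustration of the canary idea described above, the Python sketch below moves apps in growing stages and stops as soon as a stage contains an unhealthy app. It is not our actual rollout tooling. The `canary_rollout` function, the `migrate` and `is_healthy` callbacks, and the stage fractions are all hypothetical.

```python
import math
from typing import Callable, Iterable, List, Sequence


def canary_rollout(apps: Sequence[str],
                   stage_fractions: Iterable[float],
                   migrate: Callable[[str], None],
                   is_healthy: Callable[[str], bool]) -> List[str]:
    """Migrate apps stage by stage and halt after the first unhealthy stage.

    Each fraction is the cumulative share of apps that should be migrated
    once that stage finishes. Returns the apps migrated so far.
    """
    migrated: List[str] = []
    for fraction in stage_fractions:
        target = min(len(apps), math.ceil(len(apps) * fraction))
        batch = list(apps[len(migrated):target])
        for app in batch:
            migrate(app)  # hypothetical callback: move one app to a new cluster
            migrated.append(app)
        # Only widen the rollout if every app in this stage looks healthy.
        if not all(is_healthy(app) for app in batch):
            break
    return migrated


# Example with stand-in callbacks: ten apps, rolled out at 10%, 50%, then 100%.
example_apps = [f"app-{i}" for i in range(10)]
moved = canary_rollout(example_apps, (0.1, 0.5, 1.0),
                       migrate=lambda app: None,
                       is_healthy=lambda app: True)
```

Starting with a small first stage limits how many apps are exposed if a regression slips through, at the cost of a slower overall rollout.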

Give the improved App Platform performance a try!

If you had issues with the raw performance of your application on App Platform before, consider giving it another shot. We’ve gotten great feedback from customers whose apps we moved early in this process: they told us that their performance issues were solved simply by moving the application to the new infrastructure.

Give it a shot! Sign in to your App Platform account on the cloud console, or sign up for App Platform today by creating a DigitalOcean account.

*The benchmark throughput numbers are based on DigitalOcean’s internal testing framework and parameters, using an App with 2 dedicated vCPUs. Actual performance numbers may vary depending on a variety of factors such as system configuration, operating environment, and type of workloads.
