Traversal

From Data to Discovery: How Traversal Accelerates AI Innovation on DigitalOcean

"DigitalOcean helps us move faster, stay reliable, and focus on what really matters—building technology that helps engineers solve problems faster than ever before."

- Prashanthi Ramachandran, Technical AI Staff

Traversal is an AI startup tackling one of observability’s hardest challenges: cutting through noisy data to understand why issues happen and how to fix them, which reduces downtime and stress for engineers. This emerging class of tooling is becoming known as AI Site Reliability Engineering (AI SRE), and Traversal is pushing the category forward with AI and causal machine learning that deliver root cause analysis for production systems, long considered the holy grail of monitoring.

As infrastructure engineer Carlo Ruiz explained, Traversal’s research roots are central to its vision. He shared that while many AI tools today focus on alert-based triaging and reactive problem solving, Traversal was built to go deeper. “Root cause analysis for incidents that span disparate parts of complex production systems has been a known hard problem for years, and we’re building a system that can finally do it.”

Academic discipline, startup speed

Since launching in early 2024, Traversal has grown quickly into a team of over 50 engineers: a mix of AI researchers, data scientists, and infrastructure experts. The founders’ academic backgrounds have shaped a culture that values experimentation and rigor as much as speed. “Any team working at the forefront of AI is testing and iterating, both the technology and its utility to the end user,” Carlo said. “And when we see Traversal solving real customer problems, it’s incredibly motivating.”

Prashanthi Ramachandran, a member of the technical staff working on the AI side of the product, said that diversity of expertise is one of Traversal’s biggest strengths. “We have people from trading firms, startups, academia, and big tech,” she said. “That variety brings a richness of perspective to how we design the product. It’s not just an AI team, it’s a team that values and understands systems and people.”

Traversal’s AI SRE—powered under the hood by a coordinated swarm of agents—can operate in both reactive and proactive modes. It can answer direct queries from engineers in a chat-like interface or run in the background, continuously scanning data to identify patterns that deserve further investigation. As Carlo put it, “The system doesn’t just wait for alerts—it looks for what stands out and what might become an issue before it’s reported.”

Choosing DigitalOcean for developer-friendly AI infrastructure

As a B2B company, Traversal works closely with large customers that generate massive amounts of observability data. Their platform needs to live close to the data itself—a principle often called “data gravity.” When Traversal began working with customers hosted on DigitalOcean, the team decided to deploy there as well. “We wanted to be as close to our customers as possible,” Carlo said. “Once we started running workloads on DigitalOcean, we realized how much we enjoyed using it ourselves.”

What started as a pragmatic infrastructure decision quickly became a preference. The simplicity of the platform and its focus on developer experience stood out immediately. “The whole UI and UX feels user friendly,” Carlo said. “Our researchers, especially those who aren’t cloud experts, love that you can click a few buttons and it just works. You don’t have to fight through 20 security hurdles before getting started.”

Prashanthi had a similar impression when she first joined the company. Having used other major cloud platforms, she found DigitalOcean refreshingly intuitive. “A lot of cloud platforms can feel overwhelming, with endless configuration options that make simple tasks take forever,” she said. “With DigitalOcean, it’s clean and straightforward. That’s exactly what you need when you’re moving quickly at a startup.”

Powering AI research and production with DigitalOcean

Today, Traversal’s infrastructure relies heavily on several DigitalOcean products: DigitalOcean Kubernetes (DOKS), GPU Droplets, and Serverless Inference through the Gradient™ AI platform.

Traversal deploys its core applications—including its AI agent, web servers, and supporting services—on DOKS, which Carlo described as the industry-standard backbone for scalable software. “Kubernetes is the de facto standard for modern applications,” he said. “With DOKS, we can spin up a cluster in minutes and know it’s stable and secure.”

On the AI side, the team uses Gradient™ AI GPU Droplets to handle the heavy lifting of training, fine-tuning, and evaluating machine learning models. Each enterprise customer’s observability data is unique, so Traversal customizes its models to better interpret that data. “We rely on GPU Droplets to evaluate, train, and fine-tune our models,” Carlo explained. “They’re flexible—you can effectively deploy anything you want on them.” Traversal primarily uses NVIDIA HGX H100 and H200 GPUs for these workloads.

The company also utilizes Serverless Inference for production-scale inference workloads, where flexibility and reliability are key. “Serverless Inference is fantastic because we can make as many calls as we need without worrying about provisioning infrastructure,” Carlo said. “It just scales automatically.”

Prashanthi agreed, emphasizing how valuable it is to have both AI tooling and infrastructure in one ecosystem. “Having everything under one umbrella (through the Gradient AI platform and DigitalOcean’s infrastructure) has been really helpful for us,” she said. “When you’re building fast, you don’t want to juggle multiple providers or spend time wiring systems together. With DigitalOcean, it all just works.”

Reliability that matches enterprise expectations

Reliability is critical for Traversal’s customers. “When our customers have an incident, we can’t have one too,” Prashanthi said. “We have to be fast, reliable, and available.” That’s why DigitalOcean’s performance and uptime have been so meaningful. Traversal makes millions of API calls each month through the Gradient AI platform and has achieved nearly 99.96% availability. “We’ve been able to reach almost 100% reliability for some of our products,” Prashanthi said.
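To put the availability figure in perspective, the short Python sketch below converts a percentage into the downtime or failed calls it would allow. It is only an illustration: the 30-day month, the one-million-call volume, and the function names are assumptions chosen for the example, and the story does not say whether the 99.96% figure is measured over time or per request.

```python
def downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of unavailability allowed by a time-based availability figure."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def failed_calls(availability_pct: float, total_calls: int) -> float:
    """Failed requests allowed if availability is measured per request."""
    return total_calls * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Assumed inputs: a 30-day month and one million calls (illustrative only).
    print(f"{downtime_minutes(99.96):.2f} minutes of downtime per 30 days")
    print(f"{failed_calls(99.96, 1_000_000):.0f} failed calls per million")
```

Under these assumptions, 99.96% corresponds to roughly 17 minutes of downtime in a 30-day month, or about 400 failed calls per million.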

Carlo credits this success to DigitalOcean’s balance of power and simplicity. “Every software system struggles with how many knobs to expose for control versus how many defaults just work,” he said. “Other cloud providers expose a ton of knobs, but you’re overwhelmed. DigitalOcean gets it right—you click a few buttons and everything just works as expected.”

Building the future of intelligent observability

As Traversal continues to scale, the team plans to deepen its use of GPU Droplets and build custom middleware to give their researchers more control over model training and deployment. “We see ourselves building more abstractions on top of GPU Droplets to give researchers full control of their training pipelines,” Carlo said. “And we’ll continue using Serverless Inference as we scale production workloads.”

For Traversal, DigitalOcean is more than a cloud provider: it enables their mission to help businesses understand their systems at a deeper level. As Prashanthi put it, “DigitalOcean helps us move faster, stay reliable, and focus on what really matters: building technology that helps engineers solve problems faster than ever before.”
