Evaluating AI agents can be tricky, especially when your tools aren’t built around how you think and work. That’s why we’re excited to announce that we’ve updated our agent evaluations experience in the DigitalOcean Gradient™ AI Platform. These improvements make it faster and easier to evaluate your AI agents, understand results, and debug issues.
The original evaluations feature was powerful but presented friction points that made it hard for developers to adopt. This redesign tackles those challenges head-on:

Evaluations help you test and improve your AI agents systematically, making it easier to identify issues and optimize performance. For those just getting started, the preselected Safety & Security metrics and dataset examples let you quickly check for common issues like unsafe or biased outputs, giving greater confidence in your agent’s behavior.
For those scaling their agents, custom test cases, specialized metric groups like RAG Performance, and the ability to upload your own datasets provide deeper insights into agent performance. With trace integration, you can drill down into low scores to debug and improve your agent with precision. Evaluations make it faster to turn results into actionable improvements, helping developers at any stage build safer, more reliable AI agents.
Ready to put your agents to the test? Getting started with evaluations in the DigitalOcean Gradient™ AI Platform is simple.
For a step-by-step walkthrough, check out our tutorial, which guides you through creating test cases, selecting metrics, and interpreting evaluation results.
Take control of your AI’s performance, start evaluating your agents today to identify issues, optimize behavior, and deliver reliable, production-ready systems faster than ever.


