By roconnor and Anish Singh Walia
When building LLM-powered applications, model availability can be unpredictable. Rate limits, temporary outages, or regional availability issues can break your application at the worst possible time. This post demonstrates how to build resiliency into your LangChain app through automatic model fallback using DigitalOcean’s serverless inference: you’ll run a small sample project, wrap the LangChain Gradient integration in a fallback class that tries models in order, and simulate a failure to verify that the fallback path works.
The DigitalOcean Gradient Platform streamlines AI application development by providing seamless access to a variety of AI models. Its serverless inference feature exposes a wide range of popular open-source and proprietary models through a single, unified API, so you can switch between models, or combine them, without juggling separate providers, SDKs, or API keys.
By leveraging serverless inference, you can focus on building and deploying AI applications without worrying about the underlying infrastructure, model management, or complex billing processes. This lets you accelerate your AI development, reduce costs, and improve the overall efficiency of your projects.
Execute the code below to clone and run the repository; a walkthrough of the code follows. This project uses the uv package manager, so make sure it is installed on your system.
# Clone the repository
git clone https://github.com/do-community/langchain-gradient-ai-switch-providers.git
cd langchain-gradient-ai-switch-providers
# Install dependencies with uv
uv sync
# Set up environment
cp .env.example .env
Next, go to your DigitalOcean Cloud console, select Agent Platform in the sidebar, open the Serverless inference tab, and click the Create model access key button. Name your key, then paste its value into the .env file as the value of the DIGITALOCEAN_INFERENCE_KEY environment variable.
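When you are done, the .env file should contain a single line (the value shown here is a placeholder for your actual key):
DIGITALOCEAN_INFERENCE_KEY=your-model-access-key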
Now you can run the code:
uv run main.py
This will prompt Llama 3.3 70B Instruct, falling back to Llama 3 8B Instruct if there is an error. To simulate a failure, use the --mock flag:
uv run main.py --mock
This will create a mock failure for the first call, falling back to the secondary model to successfully complete the request.
The models.py file contains an enumeration of all of the models available in DigitalOcean Serverless Inference, including Claude, GPT, and Llama. Defining an enumeration provides static type checking support and IDE completion, and prevents typos that may cause runtime errors.
from enum import Enum

class GradientModel(Enum):
    LLAMA3_3_70B_INSTRUCT = "llama3.3-70b-instruct"
    ANTHROPIC_CLAUDE_3_5_SONNET = "anthropic-claude-3.5-sonnet"
    OPENAI_GPT_4O = "openai-gpt-4o"
    # ...
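Because the models are enum members rather than raw strings, a typo fails immediately (and is flagged by your editor) instead of surfacing as an API error at runtime. For example:
# List every model identifier defined in the enumeration
for model in GradientModel:
    print(model.name, "->", model.value)

# The underlying API string is available via .value
primary = GradientModel.LLAMA3_3_70B_INSTRUCT
print(primary.value)  # "llama3.3-70b-instruct"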
ChatGradientAI, from the langchain-gradientai integration, is the LangChain class that provides access to the Gradient AI Platform. We wrap this class to add a fallback mechanism.
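For orientation, a single-model call looks roughly like the sketch below. The import path and the model keyword are assumptions based on the package name, so verify them against the langchain-gradientai documentation:
import os
from langchain_gradientai import ChatGradientAI  # module name assumed from the langchain-gradientai package

# Minimal single-model sketch (no fallback yet); keyword names assumed
llm = ChatGradientAI(
    model="llama3.3-70b-instruct",
    api_key=os.getenv("DIGITALOCEAN_INFERENCE_KEY"),
)
print(llm.invoke("Explain LLM fallback in one sentence."))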
The FallbackChatGradientAI class takes in a list of models to try in sequential order when making an LLM call. If one model fails, it moves on to the next and repeats the call until there is a success. If all models fail, it raises an exception.
import logging
import os
from typing import Any, List, Optional

from models import GradientModel  # the enumeration defined in models.py above

logger = logging.getLogger(__name__)

class FallbackChatGradientAI:
    def __init__(
        self,
        models: List[GradientModel],
        api_key: Optional[str] = None,
        **kwargs
    ):
        if not models:
            raise ValueError("At least one model must be provided")
        self.models = models
        self.api_key = api_key or os.getenv("DIGITALOCEAN_INFERENCE_KEY")
        self.kwargs = kwargs
        if not self.api_key:
            raise ValueError("API key must be provided or set in DIGITALOCEAN_INFERENCE_KEY")

    def invoke(self, input_data: Any) -> Any:
        last_exception = None
        for i, model in enumerate(self.models):
            logger.info(f"Attempting request with model: {model.value}")
            try:
                llm = self._create_llm(model)
                result = llm.invoke(input_data)
                if i > 0:
                    logger.info(f"Successfully fell back to model: {model.value}")
                return result
            except Exception as e:
                logger.warning(f"Model {model.value} failed: {str(e)}")
                last_exception = e
                continue
        # If we get here, all models failed
        raise Exception(f"All models failed. Last error: {str(last_exception)}")
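The _create_llm helper referenced by invoke is not reproduced here; conceptually it just constructs a ChatGradientAI client for the requested model. A minimal sketch, assuming the constructor accepts model and api_key keyword arguments, would be:
    def _create_llm(self, model: GradientModel) -> ChatGradientAI:
        # Build a client for a single model, forwarding any extra keyword
        # arguments that were passed to FallbackChatGradientAI.
        return ChatGradientAI(
            model=model.value,   # keyword name assumed; verify against langchain-gradientai
            api_key=self.api_key,
            **self.kwargs,
        )
With that in place, the fallback client is used exactly like a single model: construct it with a list of GradientModel values and call invoke with your prompt.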
This design features sequential attempts in the order the models are listed, logging of every failed attempt and every successful fallback, and a final exception that surfaces the last underlying error when every model has failed.
The MockFallbackChatGradientAI class subclasses FallbackChatGradientAI and overrides _create_llm, returning a mock object in place of the first LLM that intentionally raises an exception when called:
from unittest.mock import Mock

class MockFallbackChatGradientAI(FallbackChatGradientAI):
    def _create_llm(self, model: GradientModel) -> ChatGradientAI:
        if self.fail_first and model == self.models[0]:
            mock_instance = Mock()
            mock_instance.invoke.side_effect = Exception("Mocked failure for testing")
            return mock_instance
        return super()._create_llm(model)
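To exercise this path directly, you would construct the mock class with the failure toggle enabled. The fail_first constructor argument is inferred from the self.fail_first attribute used above, so treat it as an assumption about the repository’s implementation:
# Hypothetical direct usage of the mock subclass; fail_first is assumed
llm = MockFallbackChatGradientAI(
    models=[
        GradientModel.LLAMA3_3_70B_INSTRUCT,   # this call is mocked to fail
        GradientModel.OPENAI_GPT_4O,           # receives the fallback request
    ],
    fail_first=True,
)
print(llm.invoke("Hello from the fallback path"))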
The main script, main.py, contains three isolated examples, accessed via different command-line flags: a basic example (uv run main.py --basic), the default fallback example (uv run main.py), and the mocked-failure example (uv run main.py --mock).

LLM fallback significantly improves application reliability by automatically switching to backup AI models when the primary model fails. This approach eliminates single points of failure and ensures your application continues functioning even during model outages, rate limiting, or regional availability issues. With DigitalOcean’s serverless inference, you can access multiple models through a single API, making fallback implementation seamless and cost-effective.
Key benefits include higher reliability, no single point of failure, and uninterrupted service during model outages or rate limiting.
Retry logic and fallback strategies serve different purposes in LLM applications:
Retry Logic: repeats the same request against the same model, usually after a short delay or with exponential backoff, and is best suited to transient errors such as brief rate limits or network hiccups.
Fallback Strategies: switch to a different model entirely when the current one fails, which also covers longer outages, model-specific errors, and regional availability issues.
The implementation in this tutorial uses zero-retry fallback for immediate switching, which is ideal for production environments where speed and reliability are critical.
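If you do want a small number of per-model retries before falling back, one option is a thin subclass of the fallback class from this tutorial. The sketch below is illustrative only; the max_retries parameter and the naive retry loop are not part of the tutorial’s repository, and it reuses the logger and FallbackChatGradientAI defined earlier:
from typing import Any

class RetryingFallbackChatGradientAI(FallbackChatGradientAI):
    """Illustrative variant: retry each model a few times before moving on."""

    def __init__(self, *args, max_retries: int = 2, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_retries = max_retries

    def invoke(self, input_data: Any) -> Any:
        last_exception = None
        for model in self.models:
            for attempt in range(1, self.max_retries + 1):
                try:
                    llm = self._create_llm(model)
                    return llm.invoke(input_data)
                except Exception as e:
                    logger.warning(
                        f"{model.value} attempt {attempt}/{self.max_retries} failed: {e}"
                    )
                    last_exception = e
        raise Exception(f"All models and retries exhausted. Last error: {last_exception}")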
Selecting the right models for your fallback strategy involves balancing several factors:
Primary Model Selection: choose the model that delivers the best quality for your use case, accepting its higher cost and latency as the default path.
Fallback Model Considerations: favor models that are cheaper or faster, or that come from a different provider, so that whatever took down the primary is unlikely to affect the backup.
Example Strategy:
models = [
    GradientModel.LLAMA3_3_70B_INSTRUCT,       # High quality, higher cost
    GradientModel.LLAMA3_3_8B_INSTRUCT,        # Good quality, moderate cost
    GradientModel.ANTHROPIC_CLAUDE_3_5_SONNET  # Alternative provider
]
Yes, you can implement LLM fallback with other AI providers, but DigitalOcean’s approach offers unique advantages:
Multi-Provider Fallback: you can wire separate clients for OpenAI, Anthropic, and other providers into your own fallback loop, but you then have to manage multiple API keys, SDKs, error formats, and billing relationships.
DigitalOcean Gradient AI Advantages: a single API key and a single client give you access to models from several providers, so cross-provider fallback requires no extra plumbing.
Implementation Example:
# Multi-provider approach (complex)
class MultiProviderFallback:
    def __init__(self):
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.anthropic_client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.gradient_client = ChatGradientAI(api_key=os.getenv("GRADIENT_API_KEY"))

# DigitalOcean approach (simple)
class SimpleFallback:
    def __init__(self):
        self.client = FallbackChatGradientAI(
            models=[GradientModel.LLAMA3_3_70B_INSTRUCT, GradientModel.ANTHROPIC_CLAUDE_3_5_SONNET]
        )
Effective monitoring and debugging of fallback behavior is crucial for production applications:
Key Metrics to Track: total request volume, how often fallbacks are triggered, per-model success rates and latency, and the specific errors that caused each fallback.
Implementation Strategies:
import logging
from datetime import datetime

class MonitoredFallbackChatGradientAI(FallbackChatGradientAI):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metrics = {
            'total_requests': 0,
            'fallbacks_triggered': 0,
            'model_performance': {},
            'errors': []
        }

    def invoke(self, input_data: Any) -> Any:
        start_time = datetime.now()
        self.metrics['total_requests'] += 1
        try:
            result = super().invoke(input_data)
            self._record_success(start_time)
            return result
        except Exception as e:
            self._record_error(e)
            raise

    def _record_success(self, start_time):
        duration = (datetime.now() - start_time).total_seconds()
        # Log success metrics and performance data
        logging.info(f"Request completed in {duration}s")

    def _record_error(self, error):
        self.metrics['errors'].append({
            'timestamp': datetime.now(),
            'error': str(error),
            'model_attempted': getattr(self, 'current_model', 'unknown')
        })
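Usage is identical to the plain fallback class; after some traffic you can inspect the metrics dictionary or forward it to your monitoring system:
# Instantiate with the same arguments as FallbackChatGradientAI
llm = MonitoredFallbackChatGradientAI(
    models=[
        GradientModel.LLAMA3_3_70B_INSTRUCT,
        GradientModel.OPENAI_GPT_4O,
    ]
)
llm.invoke("Summarize the benefits of LLM fallback.")

# Inspect the collected counters
print(llm.metrics['total_requests'])
print(len(llm.metrics['errors']))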
Monitoring Tools: standard Python logging is enough during development; in production, export the same counters to whatever metrics or APM system you already run so you can alert on a rising fallback rate.
Debugging Tips: log the model name and error message on every failed attempt, and use the --mock flag to exercise the fallback path on demand instead of waiting for a real outage.
Congratulations! You’ve successfully learned how to implement automatic LLM fallback in your LangChain applications using DigitalOcean Gradient Platform. This approach provides production-ready resilience that ensures your AI-powered applications remain operational even when individual models experience issues.
The fallback strategy you’ve implemented offers several advantages over traditional single-model approaches: automatic recovery from outages and rate limits, no single point of failure, and access to multiple models through one API.
Get started with DigitalOcean Gradient Platform and build your own resilient LLM applications today. The platform provides everything you need to implement robust fallback strategies, from unified API access to comprehensive monitoring tools.
Visit the official LangChain Gradient Platform GitHub repository to read the code, or check out some of the other repos in the DigitalOcean Community organization.
Check out some of our other similar tutorials on the DigitalOcean Community site.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.