Claude 4 - the latest and greatest LLM family from Anthropic

Published on June 9, 2025

Technical Evangelist // AI Arcanist

Claude 4 - the latest and greatest LLM family from Anthropic

Here at DigitalOcean, we have been watching the LLM arms race closely. This is not only to stay on top of the newest developments, but also to know what the best performant models are for our customers. That way, we can always ensure the best guidance and experience for our userbase.

One of the companies to watch closely in this race is Anthropic. Since their inception, they have consistently been at the forefront of development for language models in general. On the GenAI Platform, and in the wider LLM user community in general, their flagship model Sonnet 3.7 has been one of the most popular and powerful models anywhere, and for very good reason.

Recently, Anthropic released their most powerful models yet: the Claude 4 family of LLMs. These SOTA models perform incredibly well across a wide range of different benchmarks, including the software engineering SWE-bench benchmark, Terminal bench for agentic coding, and AIME for math. Let’s take a closer look at the models, Sonnet and Opus, and discuss how they could be the best LLMs for coding, writing, and agentic use yet.

Claude 4: Opus and Sonnet

The Claude 4 family consists of two publicly available models, Sonnet and Opus. Each excels at different tasks, but the two of them are both incredibly capable for both typical LLM scenarios, reasoning, and agentic use. In this section, let’s compare and contrast the two models and their use-cases.

image comparing SOTA LLMs with Claude 4

To start, let’s look at what features the models share. Notably, these include but are not limited to:

Parallel tool use: “Both models can use tools in parallel, follow instructions more precisely, and, when given access to local files by developers, demonstrate significantly improved memory capabilities, extracting and saving key facts to maintain continuity and build tacit knowledge over time.”
Extended thinking with tool use: While reasoning, the models can use tools like web search to inform the model’s thought process, and then continue with reasoning to ensure a holistic and comprehensive understanding
Increased memory: the models show increased memory capabilities, able to learn and remember more key information to build understanding of the subject matter over time.
New API Capabilities: “We’re releasing four new capabilities on the Anthropic API that enable developers to build more powerful AI agents: the code execution tool, MCP connector, Files API, and the ability to cache prompts for up to one hour.”

All in all, both models have marked improvements in reasoning and agentic capabilities. Now, let’s look at how these models differ.

image comparing SOTA LLMs with Claude 4

First, we have Claude Opus 4. It is reportedly the most powerful and best coding model in existence, as reported by its scores on the SWE-bench (72.5%) and Terminal-bench (43.2%). The introductory blog post from Anthropic included several reports from organizations that did testing with the models, including Cursor and Replit. Consistently, they reported that the model excels at tasks that just weren’t possible with previous iterations of the model. Some of these include making good changes to a code base when used agentically, and maintaining a high level of precision and accuracy across thousands of steps and many hours of work.

The other model, Claude Sonnet 4, significantly improves on Sonnet 3.7’s industry-leading capabilities. In particular, it excels in coding, with a reported score of 72.7% on the SWE-bench. They assert that this model “balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations.” (Source) While it does not quite meet the capabilities of Claude Opus 4 in the majority of domains, its mix of practicality and capability make it an ideal model for agentic use cases. Testers from companies like Github and Manus reported that the model is ideal for agentic scenarios & had improved aesthetics, complex instruction following, and reasoning.

Using Claude 4 Models

Accessing Claude 4 models is as easy as having an Anthropic account! A free Anthropic account will give you free, limited access to the Sonnet 4 model in their chat playground. This model is connected with numerous tools that enable functionality far beyond a typical LLM deployment, including but not limited to internet search and using web APIs.

travel guide

We used Claude Sonnet 4 to generate a travel guide to Sayulita, Mexico using the available toolsets! We were impressed by how comprehensive the research was, as it included actual restaurants and places to visit there within. You can view the guide here, thanks to the publication feature provided by Anthropic to host the file.

We recommend getting started with Claude 4 by signing up for a free account, and upgrading to Pro if you have a more challenging use case, especially with regards to coding and agentic situations.

The Potential Impact of Claude 4 Models

As Anthropic has iterated on these models over the years, they have continuously moved further and further towards true functionality and utility as proper LLM powered coding assistants and agents. Opus and Sonnet in particular are a step forward for both use cases, and we have been impressed in our experimentation with the Sonnet 4 model with our testing.

As we showed earlier with the travel guide, the capabilities of Sonnet 4 for research purposes are truly extensive, and it is capable of generating production ready reports in just minutes. We assert that this model pipeline, and future model pipelines that mimic these capabilities, is a powerful new tool for researchers that should not be ignored. Pipelines like Claude 4’s have the potential to strike hours and hours of tedious research time down to minutes in applied situations. Watch out for how this technology accelerates research workflows going forward, across all industries.

Closing Thoughts

With Claude 4, things are moving even faster in the AI revolution. This model’s revolutionary long-term thinking ability makes it the go to model where long contexts are needed to be understood, and the agentic capabilities reported are truly next level. We look forward to getting our hands on the models for more testing in the following weeks.

Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.

Learn more about our products