What Tools Can We Use for Working Locally with AI?

Published on September 17, 2025
By James Skelton
Technical Evangelist // AI Arcanist

Vibe coding, the practice of using LLMs either to assist with writing code or to generate it outright, is becoming more and more popular, and for good reason! Not only does vibe coding seriously cut down on time spent by automating the simpler parts of the engineering process, but we are also increasingly seeing entire projects generated by a series of careful prompts to an LLM.

To facilitate this, we have the ever-growing GPU cloud. Whether you are using a proprietary service like OpenAI's ChatGPT or Anthropic's Claude, or hosting your own LLM in the cloud on a platform like Gradient™ from DigitalOcean, all of these LLMs require powerful GPUs to run. These machines are the engines that power the AI coding revolution unfolding before our eyes.

But what do we do when we cannot access the GPU Cloud?

In this tutorial, we aim to provide one answer to that question. Follow along as we discuss some of the best options for local LLMs, the local machines to run them on, and tips for vibe coding offline that you won't find anywhere else!

Key Takeaways

  • Your local machine's power determines what kind of model you can run. We recommend at least 16 GB of accessible RAM to follow along with this tutorial.
  • Local agentic vibe coding is now possible thanks to small thinking models like Qwen3 2507 and Nemotron Nano v2.
  • It is easy to get started on any OS with local tools like LM Studio and Ollama.

The Best Local Agentic LLMs in September 2025

To get started with local agentic coding, there are a plethora of models we can run. This is both good and bad, as it can be somewhat difficult to discern which models are worth considering. There are numerous versions of many models, and they come in different sizes, so selecting the model that best fits your device can add an extra challenge. We are testing this on a 2021 MacBook Pro with an M1 Pro chip and 16 GB of memory. Let's take a look at some of the best local agentic LLMs currently available, and discuss which we should be using.
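As a rough rule of thumb (our own back-of-the-envelope assumption, not an official sizing guide), a quantized model's weights occupy roughly its parameter count times the bytes per parameter, with extra headroom needed for the KV cache and the runtime itself. The sketch below shows the arithmetic:

```python
def estimate_weight_memory_gb(num_params_billion: float, bits_per_param: int = 4) -> float:
    """Rough estimate of the memory needed just to hold a model's weights.

    Assumes a uniform quantization level and ignores the KV cache, context
    length, and runtime overhead, which can add several more GB.
    """
    bytes_per_param = bits_per_param / 8
    return num_params_billion * 1e9 * bytes_per_param / 1e9  # gigabytes

# Example: an 8B model quantized to 4 bits needs ~4 GB for weights alone,
# which is why it fits comfortably in 16 GB of unified memory, while a
# 30B model at the same quantization is a much tighter squeeze.
print(f"8B @ 4-bit:  ~{estimate_weight_memory_gb(8, 4):.1f} GB")
print(f"30B @ 4-bit: ~{estimate_weight_memory_gb(30, 4):.1f} GB")
```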

Qwen3 2507

Qwen3, in our opinion, is the first place to start with local agentic modeling. Not only is this suite of models among the top performers on benchmarks across a multitude of disciplines, it is also a robust choice for agentic tasks. Both the thinking and instruct varieties of these models are incredibly powerful, and the 2507 variants are even better than the originals.

We highly recommend the Qwen3 2507 models as the first choice for local agentic modeling and vibe coding. Our computer can only handle the 8B variety, but the 30B-A3B mixture-of-experts model is significantly better. In our subjective experience, Qwen3 was the easiest to use as an assistant to our coding workflow.

Nemotron Nano v2

NVIDIA's Nemotron Nano v2 is also an exceptional choice for agentic modeling. Coming in 9B and 12B variants, these models are another of our favorites for optimizing and editing code while vibe coding. This model suite was trained completely from scratch by NVIDIA using the Nemotron-H architecture. A unified model for both reasoning and non-reasoning tasks, it responds to user queries by first generating a reasoning trace and then concluding with a final response.

In our testing, we were very impressed with its ability to use the tools enabled by the various IDEs we used. It performed comparably to the Qwen3 2507 8B model, and it is capable of running on the relatively low amount of VRAM available on the M1 Pro.

GPT-OSS

Arguably the best open source model for local developers available right now, GPT-OSS from OpenAI is our recommendation to anyone with at least 24GB of VRAM on an NVIDIA or AMD consumer GPU. The GPT-OSS 20b variant in particular was trained with GPUs of this size in mind.

GPT-OSS is a fantastic agentic and coding model. It performs incredibly strongly on tool use, few-shot function calling, CoT reasoning, and the medical benchmark HealthBench (even outperforming proprietary models like OpenAI o1 and GPT-4o).

Hosting the Models

There are myriad ways to host local models, depending on your use case. In this section, we will talk about our two recommended applications for hosting LLMs on local hardware: LM Studio and Ollama.

Follow along for tips for getting started with these tools, and choosing which is best for you.

LM Studio

The first application we want to introduce is LM Studio, a tool designed to let users run models like gpt-oss, Qwen, Gemma, DeepSeek, and many more on their own local computer, privately and for free. LM Studio's UI is sleek and user friendly, with an intuitive interface for selecting, downloading, and running language models, and it can run both GGUF and MLX versions of models. The application also offers a myriad of integrations and custom build options, like RAG, that make it even more useful.

We recommend LM Studio for Mac users in particular, as they can take the greatest advantage of the application's MLX support.
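Beyond the chat UI, LM Studio can also serve downloaded models over a local, OpenAI-compatible API (by default at http://localhost:1234/v1). Below is a minimal sketch, assuming the local server is running and a model is already loaded; the model identifier is illustrative, so substitute whatever name appears in your own LM Studio library:

```python
# Minimal chat request against LM Studio's local OpenAI-compatible server.
# Assumes the server is running on the default port (1234) and a model is
# already loaded; the model name below is illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-8b",  # replace with the identifier shown in LM Studio
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```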

Ollama

Another project we really love is Ollama. Ollama was one of the first projects to spin off of the open-source llama.cpp, and it continues to be one of the most popular LLM services available to the open-source community. We like Ollama's command line interface, which makes interacting with, downloading, and organizing your models as simple as possible. It offers many of the same capabilities as LM Studio, including the ability to run a server that hosts the model.

Ollama is perfect for Linux users, as it makes it straightforward to interact with the models through the command line.
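As an illustration (assuming Ollama is installed, its server is running on the default port 11434, and you have already pulled a model, for example with `ollama pull qwen3:8b`), a short script can talk to a local model through Ollama's HTTP API:

```python
# Minimal chat request against Ollama's local HTTP API.
# Assumes the Ollama server is running on the default port and the model
# named below has already been pulled (e.g. `ollama pull qwen3:8b`).
import requests

payload = {
    "model": "qwen3:8b",  # replace with a model you have pulled locally
    "messages": [{"role": "user", "content": "Explain what a Python decorator does."}],
    "stream": False,
}
response = requests.post("http://localhost:11434/api/chat", json=payload, timeout=300)
response.raise_for_status()
print(response.json()["message"]["content"])
```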

Coding with Local Large Language Models

Coding with the assistance of LLMs, often called vibe coding, is becoming increasingly popular and useful as these models become more and more sophisticated. But successfully doing so without access to either a powerful desktop GPU or the cloud is easier said than done. With this tutorial, we aim to show how the models and services introduced earlier in this article can be used with local hardware to successfully vibe code.

VS Code Continue

Our favorite way to vibe code with local models is VS Code Continue, an extension for the popular IDE that makes it simple to use agentic LLMs within a coding workflow. With VS Code Continue, we can access the endpoints created by LM Studio or Ollama to interact with our local files.

To get started with the integration, first install Ollama or LM Studio along with Visual Studio Code. Once that's done, and you have downloaded the model you want to use in the service of your choice, search the VS Code Extensions marketplace for Continue and install it.

Once installed, we can access the extension via the sidebar on the left using the Continue logo button. From here, we can interact with our local model by configuring the chat agent window to detect models from LM Studio and Ollama. This will make it possible to switch between models hosted across each application with ease.
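If Continue does not detect your models, it can help to first confirm that the local servers are actually reachable. The small check below assumes the default ports (11434 for Ollama, 1234 for LM Studio) and simply lists what each server currently exposes:

```python
# Quick check that the local model servers Continue relies on are reachable.
# Assumes the default ports: Ollama on 11434, LM Studio on 1234.
import requests

endpoints = {
    "Ollama": "http://localhost:11434/api/tags",       # lists pulled models
    "LM Studio": "http://localhost:1234/v1/models",    # OpenAI-compatible model list
}

for name, url in endpoints.items():
    try:
        data = requests.get(url, timeout=5).json()
        models = data.get("models") or data.get("data") or []
        print(f"{name}: {len(models)} model(s) available")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({exc})")
```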

Continue has three default templates that the agent can use to interact with you: Agent, Plan, and Chat. The first two have access to built-in tooling that makes it possible to interact with your files, with Agent being the more capable at editing them. The Chat option simply lets us talk with the model, with the project as context. We found a lot of success using the three modes for their intended purposes: Chat to discuss the code, Plan to outline changes, and Agent to implement those changes automatically.

The limitations of Continue are really the limitations of the model being used. As models continue to advance, their tool-use capabilities will only improve substantially. We were really impressed with the ability of all three models to improve our code and automate simple coding processes as we worked offline. We recommend Continue to people who are used to VS Code and its forks like Cursor, especially on Mac or Windows computers.

Zed

Our other favorite offline vibe coding IDE is Zed. Zed is a free, open-source text and source-code editor for Linux and macOS, developed by Zed Industries, with built-in support for coding with language model assistance. It is a powerful tool for code editing and automation.

To get started with Zed, download the application file from their website and install the application. Once that is complete, open it and use the file bar to open the directory on your local machine that you wish to edit.

Using the IDE, we can chat with our hosted LM Studio or Ollama models by clicking on the second to last icon on the bottom right of the window. Select the model you would like to use before continuing. Once selected, we can decide which profile to run our model in: Write, Ask, and Minimal.

Similar to the templates from Continue, these profiles come with different levels of access to tools. The Write configuration is what we can use to make changes to our files via prompt requests to the model, Ask can answer questions about the files, and the Minimal profile is just for chatting. If you have more tools you wish to create and then integrate with Zed, you can use the profiles to do so.

In our experience, Zed is an awesome tool for this kind of offline work. It has excellent built-in tools that make it very easy to code with LLM assistance, from editing existing code to writing novel code. We recommend Zed for Mac and Linux users on the go.

Closing Thoughts

In conclusion, the world of local development is beginning to be revolutionized by the availability of edge models and of applications that make coding with these models easier than ever before. In this article, we introduced our favorite models for code editing, two excellent services for hosting these models, and our favorite IDE integrations that use these services to facilitate vibe coding. We encourage you to try all of them to see which fits your workflow best.
