Does Gradient AI support vision / multimodal ?

Posted on September 18, 2025

I’m looking to do image analysis (extract title, description and keywords) for JPG images.

With Anthropic Claude e.g., I know I can do this directly through their API. However, I’m interested in GradientAI, to add a knowledgebase e.g.

So my question is, will I be able to use vision/multi-modal in GradientAI and with which of the currently supported LLMs?

Thanks!

This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

These answers are provided by our Community. If you find them useful, show some love by clicking the heart. If you run into issues leave a comment, or add your own answer to help others.

yolanda smith

September 19, 2025

Oh, I’ve been wondering the same! From what I’ve seen, Gradient AI mainly focuses on text-based tasks but i am not 100% sure about vision or multimodal support. Has anyone tried uploading images or combining text + image prompts with it? Curious if it actually handles that or not.

Become a contributor for community

Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.

DigitalOcean Documentation

Full documentation for every DigitalOcean product.

Learn more

Resources for startups and AI-native businesses

The Wave has everything you need to know about building a business, from raising funding to marketing your product.

Learn more

Get our newsletter

Stay up to date by signing up for DigitalOcean’s Infrastructure as a Newsletter.

New accounts only. By submitting your email you agree to our Privacy Policy

The developer cloud

Scale up as you grow — whether you're running one virtual machine or ten thousand.

View all products

Get started for free

Get started

*This promotional offer applies to new accounts only.