Report this

What is the reason for this report?

Does Gradient AI support vision / multimodal ?

Posted on September 18, 2025

I’m looking to do image analysis (extract title, description and keywords) for JPG images.

With Anthropic Claude e.g., I know I can do this directly through their API. However, I’m interested in GradientAI, to add a knowledgebase e.g.

So my question is, will I be able to use vision/multi-modal in GradientAI and with which of the currently supported LLMs?

Thanks!



This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

These answers are provided by our Community. If you find them useful, show some love by clicking the heart. If you run into issues leave a comment, or add your own answer to help others.

Oh, I’ve been wondering the same! From what I’ve seen, Gradient AI mainly focuses on text-based tasks but i am not 100% sure about vision or multimodal support. Has anyone tried uploading images or combining text + image prompts with it? Curious if it actually handles that or not.

The developer cloud

Scale up as you grow — whether you're running one virtual machine or ten thousand.

Get started for free

Sign up and get $200 in credit for your first 60 days with DigitalOcean.*

*This promotional offer applies to new accounts only.