I’m looking to do image analysis (extract title, description and keywords) for JPG images.
With Anthropic Claude e.g., I know I can do this directly through their API. However, I’m interested in GradientAI, to add a knowledgebase e.g.
So my question is, will I be able to use vision/multi-modal in GradientAI and with which of the currently supported LLMs?
Thanks!
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
Oh, I’ve been wondering the same! From what I’ve seen, Gradient AI mainly focuses on text-based tasks but i am not 100% sure about vision or multimodal support. Has anyone tried uploading images or combining text + image prompts with it? Curious if it actually handles that or not.
Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.
Full documentation for every DigitalOcean product.
The Wave has everything you need to know about building a business, from raising funding to marketing your product.
Stay up to date by signing up for DigitalOcean’s Infrastructure as a Newsletter.
New accounts only. By submitting your email you agree to our Privacy Policy
Scale up as you grow — whether you're running one virtual machine or ten thousand.
Sign up and get $200 in credit for your first 60 days with DigitalOcean.*
*This promotional offer applies to new accounts only.