How Data and Models Feed Computing

By Alejandro (Alex) Jaimes

Published: August 31, 2017
4 min read

This post is the second in a three-part series on artificial intelligence by DigitalOcean’s Head of R&D, Alejandro (Alex) Jaimes. (Click here to read the first installment.)

Not every company or developer will have the resources or the time to collect vast amounts of data and create models from scratch. Fortunately, the same repetition that I described in my last post occurs within and across industries. Because of this, particularly with deep learning, we’ve seen two very important trends:

(1) creation and sharing of public data to build models; and
(2) sharing of the models themselves even when the data is not released.

While the companies that have the most data may never release it, such data is not a requirement for every problem. It’s clear, however, that teams that build on existing public models and combine public and proprietary datasets will have a competitive advantage. They must be “smart” about how they use the data they are able to collect, again with an AI mindset and strategy.
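One common way to build on a shared model is to keep it frozen and train only a small piece on top of it with your own data. The sketch below illustrates that idea in plain Python. Everything in it is an assumption made for illustration: `PRETRAINED_WEIGHTS` is a hand-written stand-in for a downloaded public model, the `proprietary` dataset is invented, and the logistic-regression head is just one simple choice.

```python
import math

# Hypothetical stand-in for a publicly shared model. In a real project these
# weights would be downloaded and kept frozen, not written by hand.
PRETRAINED_WEIGHTS = [[0.9, -0.4], [0.2, 0.8], [-0.5, 0.5]]


def pretrained_features(x):
    """Frozen feature extractor: one fixed linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=200, lr=0.5):
    """Fit a small logistic-regression head on the frozen features.

    `examples` is a list of (input_vector, label) pairs with labels 0 or 1.
    Only the head's weights and bias are updated; the extractor never changes.
    """
    feats = [(pretrained_features(x), y) for x, y in examples]
    weights = [0.0] * len(PRETRAINED_WEIGHTS)
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for f, y in feats:
            error = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) - y
            for i, fi in enumerate(f):
                grad_w[i] += error * fi
            grad_b += error
        # Batch gradient-descent step, averaged over the dataset.
        weights = [w - lr * g / len(feats) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(feats)
    return weights, bias


def predict(head, x):
    """Return 1 or 0 for input `x` using the frozen features plus trained head."""
    weights, bias = head
    f = pretrained_features(x)
    score = sum(w * fi for w, fi in zip(weights, f)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# A deliberately small, made-up "proprietary" dataset: six labeled points.
proprietary = [
    ([2.0, 2.0], 1), ([1.5, 2.5], 1), ([2.5, 1.8], 1),
    ([-2.0, -2.0], 0), ([-1.5, -2.5], 0), ([-2.5, -1.8], 0),
]
head = train_head(proprietary)
```

Because the extractor is reused as-is, only four numbers (three weights and a bias) are learned from the six proprietary examples. That is the appeal of the approach when data is scarce.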

Supervised and Unsupervised Learning

The majority of successes in AI so far have been based on supervised learning, in which machine learning algorithms are fed labeled data (samples tagged with a meaningful label) rather than unlabeled data. Labeling data is expensive, time-consuming, and difficult (e.g., maintaining the desired quality, dealing with subjectivity, etc.). For this reason, the ideal algorithms would be “unsupervised”; in other words, they would learn from unlabeled data. While promising, those algorithms have not yet shown the success levels needed to have the desired impact. For now, teams should rely on creative strategies to leverage existing datasets and combine supervised and unsupervised methods.

A number of companies offer labeling and data collection services. But there are also ways to use algorithms to simplify the manual labeling process. For example, a model trained on a “small” labeled dataset can label a much larger unlabeled dataset, so that humans only have to correct the model’s errors instead of labeling all of the data from scratch. Algorithms can also create synthetic datasets by generating “fake” data that looks like the original data. The bottom line is that no matter what size the project is, there are almost always alternatives for either obtaining new data or augmenting existing datasets.
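As a rough illustration of model-assisted labeling, the sketch below trains a very simple model on a small labeled seed set, proposes labels for a larger unlabeled pool, and sends only the uncertain cases to a human. The nearest-centroid model, the `min_margin` threshold, and the example data are all assumptions chosen for brevity, not a recommended production setup.

```python
import math


def centroids(labeled):
    """Average the feature vectors of each label in the small seed set."""
    sums, counts = {}, {}
    for x, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def propose_labels(labeled, unlabeled, min_margin=1.0):
    """Label `unlabeled` points with a nearest-centroid model.

    Returns (auto_labeled, needs_review). A point goes to `needs_review` when
    the two closest centroids are nearly equidistant, i.e. the model is unsure.
    """
    cents = centroids(labeled)
    auto_labeled, needs_review = [], []
    for x in unlabeled:
        ranked = sorted((math.dist(x, c), label) for label, c in cents.items())
        best_dist, best_label = ranked[0]
        margin = ranked[1][0] - best_dist if len(ranked) > 1 else float("inf")
        if margin >= min_margin:
            auto_labeled.append((x, best_label))
        else:
            # Keep the model's guess so the reviewer only has to confirm or fix it.
            needs_review.append((x, best_label))
    return auto_labeled, needs_review


# Made-up seed set (four labeled points) and unlabeled pool (three points).
seed = [([0.0, 0.0], "cat"), ([1.0, 0.0], "cat"),
        ([10.0, 10.0], "dog"), ([9.0, 10.0], "dog")]
pool = [[0.2, 0.5], [9.0, 9.0], [5.0, 5.0]]

auto_labeled, needs_review = propose_labels(seed, pool)
```

Here the first two pool points sit close to one centroid and are labeled automatically, while the point halfway between the two centroids is queued for a person to check. In practice the corrected labels would be added to the seed set and the model retrained.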

AI as a Service

Developing models that perform tasks accurately and efficiently generally requires significant effort. For that reason, many companies and teams focus on specific verticals, building functionalities that are limited but that work well in practice (versus the ideal of building a “human-like” AI capable of doing many things at once).

In some cases, those functionalities can be applied across domains. Developing a speech recognition system from scratch, for example, is a major effort, and most companies and teams that need one would be better off using a service than building their own.

As the AI industry advances, we can expect to see more and more of those functionalities coming from specific vendors and open source initiatives, similar to the way software is built today: combinations of libraries, APIs, and open source and commercial components, coupled with custom software for specific applications.

In addition, given the nature of AI, building an infrastructure that quickly scales as needs shift is a major challenge. This implies that running AI will mostly happen in the cloud. Note that in the new AI computing paradigm, growing datasets, experimentation, and constant “tweaking” of models are critical components.

Therefore, AI will be used as a cloud-based service for many applications. That’s a natural progression and in many ways leads to the commoditization of AI, which will lead to greater efficiency, opportunities, innovation, and positive economic impact. In our next installment, we’ll explore what all of this means for today’s developers.

In line with the trends we’re seeing in research and industry, we’re releasing a powerful set of tools that allow developers to re-use existing models, work with large quantities of data, and scale easily in the cloud. We encourage you to take a look at our machine learning one-click. What other tools or functionalities would you be interested in having us provide? Feel free to leave feedback in the comments section below.

*Alejandro (Alex) Jaimes is Head of R&D at DigitalOcean. Alex enjoys scuba diving and started coding in Assembly when he was 12. In spite of his fear of heights, he’s climbed a peak or two, gone paragliding, and ridden a bull in a rodeo. He’s been a startup CTO and advisor, and has held leadership positions at Yahoo, Telefonica, IDIAP, FujiXerox, and IBM TJ Watson, among others. He holds a Ph.D. from Columbia University.

Learn more by visiting his personal website or LinkedIn profile. Find him on Twitter: @tinybigdata.*

