By KFSys
System Administrator
GPU acceleration can reduce machine learning training times from hours to minutes, making AI development accessible for individual developers and small teams. In this tutorial, you will build a complete image classification system using PyTorch on DigitalOcean’s GPU droplets, containerize it with Docker, and see firsthand how GPU acceleration improves performance.
You’ll create a neural network that can classify images from the CIFAR-10 dataset (airplanes, cars, birds, cats, etc.) and compare training times between CPU and GPU processing. By the end, you’ll have a working image classifier running in a Docker container that you can modify for your own projects.
Before you begin this guide, you'll need a DigitalOcean account with access to GPU Droplets, an SSH key added to your account, and basic familiarity with Python and the Linux command line.
Accepted Answer
First, you’ll create a GPU droplet using DigitalOcean’s AI/ML-ready image, which includes pre-installed NVIDIA drivers and development tools.
Log into your DigitalOcean control panel and click Create → Droplets. Select GPU Droplets from the options.
Choose the AI/ML Ready v1.0 image under the Marketplace tab. This Ubuntu 22.04 image includes CUDA 12.9, NVIDIA drivers, and Docker with GPU support pre-configured.
For this tutorial, select the RTX 4000 Ada Generation plan ($0.76/hour) which provides 20GB GPU memory—sufficient for learning projects while keeping costs manageable.
Choose your preferred datacenter region (NYC2, TOR1, or ATL1 support GPU droplets) and add your SSH key for secure access.
Click Create Droplet and wait 2-3 minutes for initialization to complete.
Once your droplet is running, connect via SSH using the IP address provided:
ssh root@your_droplet_ip
Verify that your GPU is detected and functioning:
nvidia-smi
You’ll see output similar to this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 575.xx.xx    Driver Version: 575.xx.xx    CUDA Version: 12.9     |
|-------------------------------+----------------------+----------------------+
|   0  NVIDIA RTX 4000 Ada      | 00000000:01:00.0  On |                  N/A |
| 35%   45C    P0    70W / 130W |      0MiB / 20475MiB |      0%      Default |
+-----------------------------------------------------------------------------+
This confirms your GPU is accessible with the correct drivers installed.
Test Docker GPU access:
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
If successful, you’ll see the same GPU information displayed from within the container.
Create a project directory and set up a Python virtual environment for development:
mkdir ~/image-classifier && cd ~/image-classifier
python3 -m venv ml-env
source ml-env/bin/activate
Install PyTorch with CUDA support and other required packages:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install matplotlib numpy pillow
Verify PyTorch can access your GPU:
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}'); print(f'GPU name: {torch.cuda.get_device_name(0)}')"
You should see output confirming CUDA is available with your RTX 4000 Ada GPU detected.
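For a slightly more thorough check, you can use the short script below. It is a minimal sketch (save it under any name, for example check_gpu.py, and run it with python3): it also prints the CUDA version PyTorch was built against and runs a small matrix multiplication on the GPU to confirm that computation works end to end.

import torch

# Report what PyTorch can see
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Built with CUDA: {torch.version.cuda}")

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"GPU: {torch.cuda.get_device_name(0)}")

    # Run a small matrix multiplication on the GPU to confirm it actually computes
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)
    c = a @ b
    torch.cuda.synchronize()
    print(f"Matrix multiply result: {tuple(c.shape)} on {c.device}")
else:
    print("CUDA not available; check the drivers and the PyTorch install.")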
Create a Python script that builds and trains a convolutional neural network:
nano train_classifier.py
Add the following complete implementation:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import time
import matplotlib.pyplot as plt

# Define the CNN architecture
class ImageClassifier(nn.Module):
    def __init__(self):
        super(ImageClassifier, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 4 * 4, 512)
        self.fc2 = nn.Linear(512, 10)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = self.pool(torch.relu(self.conv3(x)))
        x = x.view(-1, 64 * 4 * 4)
        x = self.dropout(torch.relu(self.fc1(x)))
        x = self.fc2(x)
        return x

def train_model(device_type='cuda', epochs=5):
    # Set device
    device = torch.device(device_type if torch.cuda.is_available() and device_type == 'cuda' else 'cpu')
    print(f"Training on: {device}")

    # Data loading and preprocessing
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128,
                                              shuffle=True, num_workers=2)

    # Initialize model, loss function, and optimizer
    model = ImageClassifier().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Training loop with timing
    start_time = time.time()

    for epoch in range(epochs):
        running_loss = 0.0
        epoch_start = time.time()

        for i, (inputs, labels) in enumerate(trainloader):
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 100 == 99:
                print(f'[Epoch {epoch + 1}, Batch {i + 1}] Loss: {running_loss / 100:.3f}')
                running_loss = 0.0

        epoch_time = time.time() - epoch_start
        print(f'Epoch {epoch + 1} completed in {epoch_time:.2f} seconds')

    total_time = time.time() - start_time
    print(f'\nTotal training time ({device}): {total_time:.2f} seconds')
    print(f'Average time per epoch: {total_time/epochs:.2f} seconds')

    return model, total_time

if __name__ == "__main__":
    print("Starting CIFAR-10 Image Classification Training")
    print("=" * 50)

    # Train on GPU
    gpu_model, gpu_time = train_model('cuda', epochs=2)

    # Train on CPU for comparison
    print("\n" + "=" * 50)
    print("Now training on CPU for comparison...")
    cpu_model, cpu_time = train_model('cpu', epochs=2)

    # Performance comparison
    print("\n" + "=" * 50)
    print("PERFORMANCE COMPARISON:")
    print(f"GPU Training Time: {gpu_time:.2f} seconds")
    print(f"CPU Training Time: {cpu_time:.2f} seconds")
    print(f"GPU Speedup: {cpu_time/gpu_time:.1f}x faster")
    print("=" * 50)
Save the file and exit the editor.
Execute the training script to see GPU acceleration in action:
python3 train_classifier.py
The script will first train the model using your RTX 4000 Ada GPU, then repeat the training on CPU for comparison. You’ll see output similar to:
Training on: cuda
[Epoch 1, Batch 100] Loss: 1.523
[Epoch 1, Batch 200] Loss: 1.234
Epoch 1 completed in 28.45 seconds
[Epoch 2, Batch 100] Loss: 1.089
Epoch 2 completed in 27.89 seconds
Total training time (cuda): 56.34 seconds
Training on: cpu
[Epoch 1, Batch 100] Loss: 1.534
Epoch 1 completed in 312.67 seconds
...
PERFORMANCE COMPARISON:
GPU Training Time: 56.34 seconds
CPU Training Time: 625.78 seconds
GPU Speedup: 11.1x faster
The GPU typically provides 8-15x speedup for this workload, demonstrating the significant performance benefits.
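To confirm the model actually learned something, you can evaluate it on the CIFAR-10 test set. The following is a minimal sketch you could append to train_classifier.py (the function name evaluate_model is just a suggestion); the exact accuracy will vary from run to run:

def evaluate_model(model, device):
    # Load the CIFAR-10 test split with the same normalization used for training
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=256, shuffle=False)

    model.eval()  # disable dropout for evaluation
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in testloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            predicted = outputs.argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

    print(f"Test accuracy: {100 * correct / total:.1f}%")

# Example usage at the end of the __main__ block:
# evaluate_model(gpu_model, torch.device('cuda'))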
Create a Dockerfile to package your classifier for deployment:
nano Dockerfile
Add the following build configuration:
FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04
# Set working directory
WORKDIR /app
# Install Python and pip
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the training script (avoid copying the dataset and virtual environment into the image)
COPY train_classifier.py .
# Set environment variables
ENV PYTHONUNBUFFERED=1
# Run the training script
CMD ["python3", "train_classifier.py"]
Create a requirements file:
nano requirements.txt
Add the dependencies:
torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
matplotlib>=3.5.0
numpy>=1.21.0
pillow>=8.3.0
Build your Docker image:
docker build -t image-classifier:gpu .
Run your containerized classifier with GPU access:
docker run --gpus all --rm image-classifier:gpu
The container will execute the training script and display the same GPU vs CPU performance comparison from within the isolated environment.
For interactive development, run the container with a bash shell:
docker run --gpus all -it --rm -v $(pwd):/app image-classifier:gpu bash
This mounts your current directory into the container, allowing you to modify code and immediately test changes.
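Because anything written under /app in the container is now backed by your project directory on the droplet, you can persist trained weights between runs. As a rough sketch (the filename classifier.pth is just an example), add one save call at the end of train_classifier.py and reload the weights later in a separate script:

# In train_classifier.py, after training finishes:
#     torch.save(gpu_model.state_dict(), "classifier.pth")

# In a separate script or Python session, reload the weights:
import torch
from train_classifier import ImageClassifier  # import is safe: training only runs under __main__

model = ImageClassifier()
model.load_state_dict(torch.load("classifier.pth", map_location="cpu"))
model.eval()  # switch to inference mode (disables dropout)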
While your training is running, open a second SSH connection to monitor GPU utilization:
ssh root@your_droplet_ip
nvidia-smi -l 1
This displays real-time GPU metrics updated every second. While the training script runs, you should see GPU utilization climb toward 100%, memory usage rise as the model and batches are moved onto the GPU, and power draw and temperature increase. Understanding these metrics helps you optimize performance and ensure efficient resource utilization.
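You can also report memory usage from inside your training code. A minimal sketch using PyTorch's built-in counters (these track memory allocated by PyTorch itself, which is typically somewhat less than what nvidia-smi reports for the whole process):

import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    # Memory currently held by tensors, and the peak since the process started
    allocated = torch.cuda.memory_allocated(device) / 1024**2
    peak = torch.cuda.max_memory_allocated(device) / 1024**2
    print(f"GPU memory allocated: {allocated:.1f} MiB (peak: {peak:.1f} MiB)")

Calling this at the end of each epoch in train_model() shows how memory use grows as you increase the batch size.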
You can adapt this classifier for different datasets and use cases:
For custom image datasets, modify the data loading section:
# Replace CIFAR-10 with your own dataset
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize for different input sizes
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])  # ImageNet standards
])
# Use ImageFolder for custom datasets
dataset = torchvision.datasets.ImageFolder(root='./your_data', transform=transform)
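ImageFolder expects one subdirectory per class under ./your_data (for example ./your_data/cats and ./your_data/dogs; these names are purely illustrative). A minimal sketch for splitting such a dataset into training and validation loaders, continuing from the dataset object above:

import torch

# Hold out 20% of the images for validation
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_set, val_set = torch.utils.data.random_split(dataset, [train_size, val_size])

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, shuffle=False, num_workers=2)

# Class names are taken from the folder names
print(dataset.classes)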
For transfer learning, replace the model definition:
import torchvision.models as models
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet-pretrained weights
model.fc = nn.Linear(model.fc.in_features, num_classes) # Adjust final layer
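If your custom dataset is small, a common next step is to freeze the pretrained backbone and train only the new final layer. A minimal sketch, assuming nn and optim are imported as in train_classifier.py and num_classes is set for your dataset:

# Freeze all pretrained parameters so gradients are not computed for them
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head; new layers default to requires_grad=True
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optimize only the head's parameters
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)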
For larger models, increase batch size and utilize more GPU memory:
trainloader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True)
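Larger batches only help if the GPU stays busy, so it can also be worth adding more data-loading workers and enabling pinned host memory, which speeds up CPU-to-GPU transfers. A small variation on the loader above:

trainloader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True,
                                          num_workers=4, pin_memory=True)

Keep nvidia-smi open while experimenting to confirm you still have memory headroom at the larger batch size.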
You have successfully built and deployed a GPU-accelerated image classification system on DigitalOcean. You created a convolutional neural network with PyTorch, demonstrated 8-15x performance improvements using GPU acceleration, containerized the application with Docker, and learned to monitor GPU utilization.
The performance comparison clearly shows why GPU acceleration is essential for machine learning workflows. Your RTX 4000 Ada GPU reduced training time from over 10 minutes to under 1 minute for this example—a speedup that becomes even more dramatic with larger, more complex models.
You can now extend this foundation to build more sophisticated AI applications, experiment with different neural network architectures, or deploy production-ready machine learning services. The containerized approach ensures your applications will run consistently across different environments while maintaining access to GPU acceleration.
For next steps, consider exploring larger datasets, implementing model serving with FastAPI, or scaling to multi-GPU training for even faster performance.