Customise OpenAI GPT OSS 120b on Ubuntu 24.04 GPU Droplet

Posted on August 11, 2025

How can I customise the AMD-powered GPU Droplet?

When I run the following command, ps -ef | grep vllm, it returns:

root        1890    1844  0 02:26 ?        00:00:00 /bin/sh -c vllm serve openai/gpt-oss-120b --port 8000 --tensor-parallel 1 --no-enable-prefix-caching --compilation-config '{"full_cuda_graph": true}'

I want to find a way to modify the vLLM configuration to include other flags, for example --enable-auto-tool-choice from https://docs.vllm.ai/en/latest/features/tool_calling.html.

How can I do this?




Hey,

From what I’ve seen, the GPU Droplet images for GPT OSS 120B start vllm serve automatically in the background. I’m not entirely sure if this is managed through systemd, Docker, or a custom startup script, but it looks like the process is launched with a predefined set of flags.

If you want to add extra ones like --enable-auto-tool-choice, one approach could be to stop that default process and then run vllm serve manually with your own configuration. Another option might be to check if there’s a systemd unit or container config file you can override, though I can’t say for certain how it’s wired up on DigitalOcean’s side.
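
For example, here's a rough way to trace the launcher and relaunch with your flags. The PID and command come from your ps output, and the --tool-call-parser value is a placeholder (check the tool-calling docs linked above for the parser that matches this model):

# See whether a systemd unit or a container owns the process
systemctl list-units --all | grep -i vllm
docker ps 2>/dev/null | grep -i vllm
ps -fp 1844   # inspect the parent shown in your ps output

# If nothing respawns it, stop it and relaunch with the original flags plus yours
sudo kill 1890
vllm serve openai/gpt-oss-120b --port 8000 --tensor-parallel 1 \
  --no-enable-prefix-caching --compilation-config '{"full_cuda_graph": true}' \
  --enable-auto-tool-choice --tool-call-parser <parser>

If the parent (1844) turns out to be a supervisor that restarts vllm, edit whatever script or unit it runs instead of killing the process.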

Since this setup might be specific to the GPU droplets, I think it’s best to reach out to DigitalOcean Support and confirm the right way to customize those flags.

If you’re diving into large language models and want to run or customize the OpenAI GPT OSS 120B model on a powerful GPU droplet in DigitalOcean, here’s a straightforward step-by-step guide. This will help you set up, optimize, and tweak the model to suit your development or content-writing needs.


1. Choose the Right Droplet on DigitalOcean

  • Select an Ubuntu 24.04 droplet (the latest stable LTS).

  • Choose a GPU-enabled droplet with sufficient VRAM (at least one 80GB+ accelerator, e.g. an NVIDIA A100/H100 or the AMD Instinct MI300X, is recommended for the 120B model).

  • Pick a size with enough RAM (256GB+ recommended) and SSD storage (fast NVMe preferred).
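
If you prefer the CLI, here's a rough doctl sketch; the size and image slugs below are placeholders, so list the real ones first:

# Discover the available GPU sizes and the GPT-OSS 1-Click image slug
doctl compute size list | grep -i gpu
doctl compute image list --public | grep -i gpt

# Create the Droplet (all <...> values are placeholders)
doctl compute droplet create gpt-oss-droplet \
  --region <region> \
  --size <gpu-size-slug> \
  --image <gpt-oss-image-slug> \
  --ssh-keys <ssh-key-id>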


2. Prepare Your Droplet Environment

Once your droplet is ready:

  • Update and upgrade packages:

sudo apt update && sudo apt upgrade -y

Install NVIDIA drivers and the CUDA toolkit: DigitalOcean's GPU droplets usually come with GPU drivers pre-installed, but verify and install them if needed (driver package versions vary by release; on Ubuntu 24.04 the 535/550 series is current):

sudo apt install nvidia-driver-550 nvidia-cuda-toolkit -y

Verify GPU status:

nvidia-smi

You should see your GPU listed with current usage stats.

3. Install Python and Dependencies

  • Install Python 3.11+ and pip:

sudo apt install python3.11 python3-pip -y

Create and activate a virtual environment:

python3.11 -m venv gpt-env
source gpt-env/bin/activate

Install necessary libraries (PyTorch with CUDA support, transformers, etc.):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate datasets
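
Before moving on, it's worth confirming that the CUDA build of PyTorch can actually see the GPU:

python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"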

4. Download the OpenAI GPT OSS 120B Model

  • The GPT OSS 120B model weights are hosted on Hugging Face (under the same openai/gpt-oss-120b id the vllm command above uses), with reference code on GitHub.

  • Use git or wget to clone/download the model repository and weights.

Example:

git clone https://github.com/openai/gpt-oss.git
cd gpt-oss

Or download weights using Hugging Face’s transformers library if available.
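
For example, a sketch using the Hugging Face CLI, assuming the weights are published under the openai/gpt-oss-120b id from the question:

pip install -U "huggingface_hub[cli]"
huggingface-cli download openai/gpt-oss-120b --local-dir gpt-oss-120b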

5. Customize the Model for Your Needs

  • Review the model's configuration files (config.json) to understand parameters like the following. Note that architecture settings such as attention heads and layer depth must match the pretrained weights, so in practice only tokenizer settings and max input length are safely adjustable:

    • Number of attention heads

    • Layer depth

    • Tokenizer settings

    • Max input length

  • Edit or extend the inference scripts to add custom prompts, adjust output length, or tweak decoding strategies (like temperature or top-k sampling).

Example snippet to change the temperature (note that temperature and top_k only take effect when sampling is enabled, hence do_sample=True):

outputs = model.generate(input_ids, max_length=256, do_sample=True, temperature=0.7, top_k=50)

6. Run and Test Your Customized GPT Model

  • Use provided inference scripts or build your own.

  • Run a simple test prompt to verify everything is working (a minimal generate.py sketch follows this list):

python generate.py --prompt "Hello, DigitalOcean community!"

  • Monitor GPU usage with nvidia-smi to ensure optimal performance.
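
generate.py isn't something the model ships with, so here's a hypothetical minimal version, assuming the transformers stack from step 3 and the openai/gpt-oss-120b model id from the question:

# generate.py -- hypothetical minimal inference script
import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", required=True)
args = parser.parse_args()

model_id = "openai/gpt-oss-120b"  # same id the Droplet's vllm process serves
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" (via accelerate) spreads the weights across available GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer(args.prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))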

7. Optimize and Scale

  • Use mixed precision inference to save GPU memory and speed up execution (torch.autocast, the successor to torch.cuda.amp); see the sketch after this list.

  • Leverage Distributed Data Parallel (DDP) if you have multiple GPUs.

  • Set up Docker containers for easier deployment and reproducibility.
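
A minimal sketch of the mixed-precision idea, reusing model and input_ids from the earlier snippets (autocast runs the matmuls in half precision on the GPU):

import torch

with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    outputs = model.generate(input_ids, max_length=256)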


Bonus Tips:

  • Regularly update your droplet and Python packages.

  • Use DigitalOcean backups and snapshots before major changes.

  • Explore DigitalOcean’s Marketplace for ready-to-go AI droplets.

  • Document your customizations clearly for your clients or community.

This guide should empower freelancers and developers in the DigitalOcean community to confidently customize and run OpenAI GPT OSS 120B on a cutting-edge Ubuntu GPU droplet.

If you want, I can help with specific scripts or a Dockerfile template next. Just let me know!
