By kytayX
How can I customise the GPU Droplet image provided by AMD?
When I run `ps -ef | grep vllm`, it returns:

```
root 1890 1844 0 02:26 ? 00:00:00 /bin/sh -c vllm serve openai/gpt-oss-120b --port 8000 --tensor-parallel 1 --no-enable-prefix-caching --compilation-config '{"full_cuda_graph": true}'
```
I want to find a way to modify the vLLM configuration to include other flags, for example `--enable-auto-tool-choice` from https://docs.vllm.ai/en/latest/features/tool_calling.html.
How can I do this?
Hey,
From what I’ve seen, the GPU Droplet images for GPT OSS 120B start `vllm serve` automatically in the background. I’m not entirely sure whether this is managed through systemd, Docker, or a custom startup script, but it looks like the process is launched with a predefined set of flags.
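For example, a quick way to check what's managing it (the PIDs come from your `ps` output; whether a `vllm` unit or container exists at all is an assumption):

```bash
# Look for a systemd unit that mentions vllm (the unit name is a guess)
systemctl list-units --all | grep -i vllm

# See if it's running inside a container instead
docker ps 2>/dev/null

# Inspect the parent (PID 1844) of the /bin/sh process from your ps output
ps -o pid,ppid,cmd -p 1844
tr '\0' ' ' < /proc/1844/cmdline; echo
```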
If you want to add extra ones like `--enable-auto-tool-choice`, one approach could be to stop that default process and then run `vllm serve` manually with your own configuration. Another option might be to check if there’s a systemd unit or container config file you can override, though I can’t say for certain how it’s wired up on DigitalOcean’s side.
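If it does turn out to be a systemd service, here is a rough sketch of both approaches. The unit name `vllm.service` is an assumption; the serve flags are copied from your `ps` output, and you'd need to pick the `--tool-call-parser` value the tool-calling docs recommend for this model:

```bash
# Option 1: stop the default process and relaunch with your extra flags
sudo systemctl stop vllm.service          # assumed unit name
vllm serve openai/gpt-oss-120b --port 8000 \
  --tensor-parallel 1 \
  --no-enable-prefix-caching \
  --compilation-config '{"full_cuda_graph": true}' \
  --enable-auto-tool-choice \
  --tool-call-parser <parser>             # see the tool_calling docs for valid parser names

# Option 2: override the unit's ExecStart with the new flags, then restart
sudo systemctl edit vllm.service          # add a [Service] block with a new ExecStart
sudo systemctl daemon-reload
sudo systemctl restart vllm.service
```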
Since this setup might be specific to the GPU droplets, I think it’s best to reach out to DigitalOcean Support and confirm the right way to customize those flags.
If you’re diving into large language models and want to run or customize the OpenAI GPT OSS 120B model on a powerful GPU droplet in DigitalOcean, here’s a straightforward step-by-step guide. This will help you set up, optimize, and tweak the model to suit your development or content-writing needs.
#### 1. Choose Your Droplet

- Select an Ubuntu 24.04 droplet (the latest stable LTS).
- Choose a GPU-enabled droplet with sufficient VRAM (at least one NVIDIA A100 or similar is recommended for GPT OSS 120B).
- Pick a size with enough RAM (256 GB+ recommended) and fast NVMe SSD storage.
#### 2. Prepare the System

Once your droplet is ready, update the system packages:

```bash
sudo apt update && sudo apt upgrade -y
```
Install NVIDIA drivers and the CUDA toolkit. DigitalOcean’s GPU droplets usually come with NVIDIA drivers pre-installed, but verify and install CUDA if needed (exact package versions vary by release; `ubuntu-drivers devices` lists the recommended driver):

```bash
sudo apt install nvidia-driver-525 cuda-toolkit-12-0 -y
```
Verify GPU status:

```bash
nvidia-smi
```

You should see your GPU listed with current usage stats.
#### 3. Set Up the Python Environment

Install Python, pip, and the venv module (Ubuntu 24.04 ships Python 3.12):

```bash
sudo apt install python3 python3-pip python3-venv -y
```

Create and activate a virtual environment:

```bash
python3 -m venv gpt-env
source gpt-env/bin/activate
```
Install the necessary libraries (PyTorch with CUDA support, transformers, etc.):

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate datasets
```
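A quick sanity check that the install can actually see the GPU:

```python
# Confirm PyTorch was built with CUDA support and can reach the GPU
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```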
#### 4. Download the Model

The GPT OSS 120B model weights and configs are usually hosted on platforms like Hugging Face or GitHub. Use `git` or `wget` to clone/download the model repository and weights.
Example:

```bash
git clone https://github.com/openai/gpt-oss.git
cd gpt-oss
```
Or download the weights using Hugging Face’s `transformers` library if available.
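For example, here is a minimal sketch that pulls the weights from the Hugging Face Hub; it assumes the model is published under the `openai/gpt-oss-120b` repo id and that the droplet has enough disk and VRAM:

```python
# Download and load the model directly from the Hugging Face Hub
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native dtype
    device_map="auto",   # requires accelerate; shards layers across available GPUs
)
```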
#### 5. Customize the Model

Modify the configuration files (`config.json`) to adjust parameters like:

- Number of attention heads
- Layer depth
- Tokenizer settings
- Max input length
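Before editing `config.json` by hand, you can inspect the current values programmatically. This is a minimal sketch; the attribute names are typical for transformer configs but vary by architecture, and the repo id is the same assumption as above:

```python
# Print a few common config values (attribute names differ between architectures)
from transformers import AutoConfig

config = AutoConfig.from_pretrained("openai/gpt-oss-120b")  # assumed repo id
print("Attention heads:", getattr(config, "num_attention_heads", "n/a"))
print("Hidden layers:", getattr(config, "num_hidden_layers", "n/a"))
print("Max positions:", getattr(config, "max_position_embeddings", "n/a"))
```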
Edit or extend the inference scripts to add custom prompts, adjust output length, or tweak decoding strategies (like temperature or top-k sampling).
Example snippet to change the temperature (note that `do_sample=True` is required for `temperature` and `top_k` to take effect):

```python
outputs = model.generate(input_ids, max_length=256, do_sample=True, temperature=0.7, top_k=50)
```
#### 6. Run Inference

Use the provided inference scripts or build your own. Run a simple test prompt to verify everything is working:

```bash
python generate.py --prompt "Hello, DigitalOcean community!"
```
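If the repository doesn’t ship such a script, here is a minimal hypothetical `generate.py` to start from (the Hub repo id is an assumption, as above):

```python
# generate.py -- minimal generation script sketch
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", required=True, help="text prompt to complete")
args = parser.parse_args()

model_id = "openai/gpt-oss-120b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer(args.prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```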
Monitor GPU usage with `nvidia-smi` to ensure optimal performance.

#### 7. Optimize and Scale
- Use mixed precision training/inference to save GPU memory and speed up execution (`torch.cuda.amp`); see the sketch after this list.
- Leverage Distributed Data Parallel (DDP) if you have multiple GPUs.
- Set up Docker containers for easier deployment and reproducibility.
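Here is a minimal sketch of the mixed-precision idea from the list above. It reuses `model` and `tokenizer` from the earlier snippets; note that recent PyTorch versions prefer `torch.autocast` over the older `torch.cuda.amp.autocast`:

```python
# Run generation under autocast so most ops execute in half precision
import torch

inputs = tokenizer("Hello, DigitalOcean!", return_tensors="pt").to(model.device)
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```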
#### 8. Best Practices

- Regularly update your droplet and Python packages.
- Use DigitalOcean backups and snapshots before major changes.
- Explore DigitalOcean’s Marketplace for ready-to-go AI droplets.
- Document your customizations clearly for your clients or community.
This guide should empower freelancers and developers in the DigitalOcean community to confidently customize and run OpenAI GPT OSS 120B on a cutting-edge Ubuntu GPU droplet.
If you want, I can help with specific scripts or a Dockerfile template next. Just let me know!