Technical Writer

Creating high-quality, SEO-optimized content at scale can now be done efficiently with modern AI capabilities. Instead of treating content generation as a manual, one-off task, you can build a structured pipeline that transforms a list of topics into well-formatted, consistent outputs.
By using DigitalOcean Serverless Inference, we have created a pipeline that lets you process multiple inputs simultaneously, significantly improving efficiency while maintaining control over structure and quality. In this guide, we use SEO brief and article generation as a practical example of how such a system can automate workflows and handle high-throughput content generation.
This pipeline-driven method is also highly flexible. With small adjustments, it can be extended beyond blog articles to a wide range of use cases, such as product descriptions, category pages, and other marketing copy.
With this approach, we will read a list of topics from a file, generate an SEO brief and a full article for each topic, and package the results for download.
All of this is implemented using a lightweight Python pipeline and an interactive Gradio interface, making it easy to run bulk content generation workflows with minimal setup.

What is Serverless Inference?
Serverless inference is a way to use AI models without setting up or managing any servers. Instead of downloading models, configuring GPUs, and maintaining infrastructure, you simply send a request (API call) to a cloud service, and it returns the result. Everything else, like scaling, performance, and availability, is handled automatically in the background, and you pay per token rather than per server.
In this tutorial, we are using a GPU-powered setup to speed up inference and handle bulk content generation efficiently, but feel free to use any environment that fits your needs, such as a CPU-based machine, a local setup, or even a fully managed serverless inference endpoint if you want to avoid infrastructure management altogether.
Bulk inference using Large Language Models (LLMs) refers to the process of running a model on multiple inputs automatically, rather than processing each input individually. Instead of sending one prompt at a time and waiting for a response, bulk inference allows you to handle a large number of prompts in a single workflow, making the entire process faster and more efficient.
In this case, we will use bulk inference to generate articles for different topics. Suppose you have 100 topics. The traditional approach would involve submitting each topic one by one, waiting for the model to generate a response, and repeating the process until all articles are created. With bulk inference, you can provide all 100 topics at once, and the system processes them sequentially or in parallel behind the scenes, delivering all outputs in a streamlined manner.
The main advantage of bulk inference lies in its ability to scale. It saves time, improves efficiency, and enables automation, making it an essential technique when working with LLM-powered systems.
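To make the "sequentially or in parallel" point above concrete, here is a minimal sketch of fanning a generation function out over many topics with a thread pool. Threads suit this workload because each call spends most of its time waiting on the inference API. The `fake_generate` function is a stand-in for a real API call such as the article generator used later in this tutorial:

```python
from concurrent.futures import ThreadPoolExecutor

def run_bulk(generate_fn, topics, max_workers=4):
    """Apply generate_fn to every topic concurrently.

    Each call is I/O-bound (waiting on the API), so a thread
    pool gives real speedup without multiprocessing overhead.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the results
        return list(pool.map(generate_fn, topics))

# Hypothetical stand-in for a real LLM call
def fake_generate(topic):
    return f"Article about {topic}"

results = run_bulk(fake_generate, ["GPUs", "LLMs", "Gradio"])
print(results)  # ['Article about GPUs', 'Article about LLMs', 'Article about Gradio']
```

The pipeline shown later in this tutorial loops over topics sequentially; swapping that loop for a helper like this is one way to parallelize it.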
Before we start, we will first need to create a GPU Droplet for our use case. In our previous tutorials, we provided step-by-step instructions for creating a GPU Droplet; please feel free to refer to the documentation for more information.
ssh root@your-server-ip
You can also connect through VS Code: working entirely in the terminal can be tricky, and using a familiar IDE like Visual Studio Code can make the process much easier.
Please note: you can run this setup on any infrastructure you prefer, but for this tutorial we'll use a GPU-enabled Droplet to speed up performance.
To start using serverless inference on DigitalOcean Gradient AI Platform, you need to create and use a model endpoint with an access key. Follow these steps:


This key will be used to authenticate all your inference requests.
In this section, we’ll break down how the entire pipeline works, from reading topics, to generating SEO briefs and full articles, to displaying everything through a simple UI built with Gradio.
Below is the full implementation of the pipeline:
import gradio as gr
import os
import zipfile
import pandas as pd
from datetime import datetime
from openai import OpenAI
from dotenv import load_dotenv

# -----------------------------
# Load API Key
# -----------------------------
load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("DO_API_KEY"),
)

MODEL_NAME = "llama3-8b-instruct"

# -----------------------------
# LLM Calls (Serverless)
# -----------------------------
def generate_seo_brief(topic: str) -> str:
    """Ask the model for a structured SEO brief for one topic."""
    prompt = f"""
You are an SEO expert.
Topic: {topic}
Generate:
- SEO Title
- Meta Description
- Target Keywords
- Article Outline
- URL Slug
"""
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are an expert SEO strategist."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content


def generate_article(topic: str, seo_brief: str) -> str:
    """Write a full article guided by the previously generated brief."""
    prompt = f"""
You are a professional technical content writer.
Write a detailed SEO-optimized article.
Topic: {topic}
SEO Brief:
{seo_brief}
- Use headings
- Add examples
- Make it engaging
"""
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are a technical content writer."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

# -----------------------------
# File Helpers
# -----------------------------
def read_topics(file):
    """Extract topics from an uploaded .csv (first column) or .txt (one per line)."""
    if file.name.endswith(".csv"):
        df = pd.read_csv(file.name)
        return df.iloc[:, 0].dropna().tolist()
    elif file.name.endswith(".txt"):
        with open(file.name, "r") as f:
            return [line.strip() for line in f.readlines() if line.strip()]
    else:
        return []


def save_markdown(topic, seo, article, folder):
    """Write one topic's brief and article to a Markdown file."""
    filename = topic.replace(" ", "_").replace("/", "_")
    filepath = os.path.join(folder, f"{filename}.md")
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(f"# {topic}\n\n")
        f.write("## SEO Brief\n\n")
        f.write(seo + "\n\n")
        f.write("## Article\n\n")
        f.write(article)
    return filepath


def create_zip(folder):
    """Bundle every generated Markdown file into one downloadable archive."""
    zip_path = f"{folder}.zip"
    with zipfile.ZipFile(zip_path, "w") as zipf:
        for file in os.listdir(folder):
            zipf.write(os.path.join(folder, file), file)
    return zip_path

# -----------------------------
# Main Pipeline
# -----------------------------
def process_file(file):
    topics = read_topics(file)
    if not topics:
        return "❌ No valid topics found.", None

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_folder = f"outputs_{timestamp}"
    os.makedirs(output_folder, exist_ok=True)

    logs = []
    for topic in topics:
        logs.append(f"Processing: {topic}")
        seo = generate_seo_brief(topic)
        article = generate_article(topic, seo)
        save_markdown(topic, seo, article, output_folder)
        logs.append(f"✅ Done: {topic}")

    zip_file = create_zip(output_folder)
    return "\n".join(logs), zip_file

# -----------------------------
# Gradio UI
# -----------------------------
with gr.Blocks() as app:
    gr.Markdown("# 🚀 Bulk SEO + Article Generator (Serverless Inference)")
    file_input = gr.File(label="Upload .txt or .csv with topics")
    output_logs = gr.Textbox(label="Processing Logs", lines=15)
    download_file = gr.File(label="Download Markdown ZIP")
    run_btn = gr.Button("Generate in Bulk")

    run_btn.click(
        fn=process_file,
        inputs=file_input,
        outputs=[output_logs, download_file],
    )

if __name__ == "__main__":
    app.launch(server_name="0.0.0.0")
Let’s break down how each part of the pipeline works.
client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("DO_API_KEY"),
)

MODEL_NAME = "llama3-8b-instruct"
Here, the code initializes a client for the serverless inference endpoint. The model used is Llama 3 8B Instruct, which is lightweight and efficient for content generation; however, you can choose any other model. To find the best model for a specific use case, compare candidates side by side in the Model Playground of the Agent Platform.

The API key is securely loaded using environment variables. This setup allows you to run inference without managing your own model hosting.
The functions generate_seo_brief and generate_article are responsible for calling the model through the serverless inference endpoint. Each one builds a prompt, sends a system message and a user message to the chat completions API, and returns the generated text. These functions demonstrate how you can perform programmatic LLM inference without hosting any models yourself.
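When running these calls in bulk, occasional transient API errors become likely, so it can be worth wrapping each call in a small retry helper. This is an optional sketch, not part of the original pipeline; `flaky_call` is a hypothetical stand-in for generate_seo_brief or generate_article:

```python
import time

def with_retries(fn, *args, attempts=3, base_delay=1.0):
    """Call fn(*args), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Hypothetical flaky function: fails twice, then succeeds
calls = {"n": 0}
def flaky_call(topic):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return f"brief for {topic}"

print(with_retries(flaky_call, "GPUs", base_delay=0.01))  # brief for GPUs
```

In the real pipeline you would wrap the client call, e.g. `with_retries(generate_seo_brief, topic)`.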
The helper functions manage input and output: read_topics extracts topics from an uploaded .csv or .txt file, save_markdown writes each topic's brief and article to a Markdown file, and create_zip bundles the output folder into a single archive. This ensures that the pipeline produces clean, portable outputs that can be reused or published.
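To see the input side in isolation, here is a small self-contained demonstration of the same .txt reading logic (one topic per line, blank lines skipped, whitespace stripped), using a temporary file in place of a Gradio upload:

```python
import os
import tempfile

def read_topics_txt(path):
    """One topic per line; skip blank lines, strip whitespace."""
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]

# Write a throwaway topics file and read it back
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Intro to GPU Droplets\n\n  Serverless Inference 101  \n")
    path = tmp.name

topics = read_topics_txt(path)
os.remove(path)
print(topics)  # ['Intro to GPU Droplets', 'Serverless Inference 101']
```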
The process_file function is the core orchestrator of the entire workflow. It:
- Reads topics from the uploaded file
- Creates a timestamped output folder
- Loops through each topic, generating an SEO brief and then a full article, and saves both to a Markdown file
- Logs progress for display in the UI
- Returns the combined logs and a ZIP archive of all generated files
This is where bulk inference happens, using the GPU to process multiple inputs efficiently.
To make the pipeline accessible, we use Gradio to create a simple web interface.

The interface includes:
- A file upload component for the .txt or .csv topics file
- A textbox that shows processing logs
- A download component for the generated ZIP archive
- A "Generate in Bulk" button that triggers the pipeline
This allows even non-developers to use the pipeline with ease.
Start the application with:
python app.py
Then open the Gradio interface in your browser and upload a .csv or .txt file containing all the topics.

Here is how the project is organized to keep the code clean, reusable, and easy to maintain. Each folder is responsible for a specific part of the pipeline, making it easier to extend or modify in the future.
bulk-content-generator/
│
├── app.py
├── main.py
│
├── config/
│   └── settings.py
│
├── services/
│   ├── llm.py
│   └── pipeline.py
│
├── utils/
│   ├── file_handler.py
│   └── zip_utils.py
│
├── data/
│   └── sample_topics.csv
│
├── outputs/
│
├── .env
├── .gitignore
├── requirements.txt
└── README.md
This bulk inference pipeline is highly flexible and can be applied across a variety of real-world scenarios. For SEO content generation at scale, it enables teams to generate hundreds of blog posts from a simple list of topics, automatically creating both SEO briefs and full-length articles, making it ideal for content teams, bloggers, and marketing agencies looking to build niche websites quickly.
Additionally, in the e-commerce space, this pipeline can be used to generate product descriptions in bulk, create category pages, and maintain a consistent tone and structure across large product catalogs, significantly reducing manual effort while improving content consistency.
This pipeline provides a solid foundation for bulk content generation, and it can be improved further, for example with parallel processing, retry handling, or direct CMS publishing, to make it more scalable and production-ready.
What is bulk inference?
Bulk inference means running an AI model on many inputs at once instead of one by one. It allows you to process a list of topics or prompts automatically. This makes the workflow much faster and more efficient. It is especially useful when you need to generate content at scale.
Do I need a GPU to run this pipeline?
No, you can run this pipeline on a CPU or locally as well. However, using a GPU will significantly speed up the process. This is especially important when working with large batches of topics. A GPU helps reduce waiting time and improves performance.
Can I use a different LLM?
Yes, you can use any LLM that supports API-based inference. The pipeline is flexible and allows you to switch models easily. You just need to update the model name and API configuration. This makes it easy to experiment with different models.
What input file formats are supported?
The pipeline supports both .csv and .txt files. In a CSV file, topics are read from the first column. In a TXT file, each line is treated as a separate topic. This makes it simple to prepare and upload input data.
How is the generated content saved?
Each article is saved as a Markdown (.md) file. The file includes the topic, SEO brief, and the full article content. All files are organized inside a folder for easy access. At the end, everything is compressed into a ZIP file for download.
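As a quick sanity check of that output format, this snippet assembles the same Markdown layout in memory; the topic, brief, and article values are placeholders:

```python
def build_markdown(topic, seo, article):
    """Assemble one output document: title, brief, then article."""
    return (
        f"# {topic}\n\n"
        "## SEO Brief\n\n"
        f"{seo}\n\n"
        "## Article\n\n"
        f"{article}"
    )

doc = build_markdown("GPUs", "Title: ...", "Body text")
print(doc.splitlines()[0])  # # GPUs
```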
Can I customize the prompts and content style?
Yes, you can customize the prompts used for both SEO briefs and articles. This allows you to control tone, structure, and style. You can also modify the code to support different content types. This makes the pipeline highly flexible for different use cases.
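For example, the prompt templates can be factored into a plain function so different content types reuse the same pipeline. This is a sketch of one possible approach; the template names and wording are illustrative, not from the original code:

```python
def build_prompt(content_type: str, topic: str) -> str:
    """Return a task-specific user prompt for the LLM call."""
    templates = {
        "article": "Write a detailed SEO-optimized article about: {topic}",
        "product": "Write a persuasive product description for: {topic}",
    }
    return templates[content_type].format(topic=topic)

prompt = build_prompt("product", "Noise-cancelling headphones")
print(prompt)  # Write a persuasive product description for: Noise-cancelling headphones
```

The generation functions would then pass `build_prompt(content_type, topic)` to the chat completions call instead of a hard-coded string.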
This article demonstrates how you can transform content creation into a fully automated and scalable workflow using Serverless Inference. By combining SEO brief generation with article writing, the pipeline ensures that every piece of content is both structured and optimized from the start. The use of a simple, modular architecture makes it easy to understand, extend, and adapt for different use cases such as blogging, marketing, or e-commerce.
Additionally, integrating this pipeline with a GPU-enabled droplet significantly improves performance, allowing you to handle large batches efficiently. Overall, this approach not only saves time but also enables you to produce high-quality content consistently at scale.
As you build on this foundation, you can enhance the pipeline with features like parallel processing, model selection, or direct CMS publishing, turning it into a powerful production-ready system for AI-driven content generation.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
With a strong background in data science and over six years of experience, I am passionate about creating in-depth content on technologies. Currently focused on AI, machine learning, and GPU computing, working on topics ranging from deep learning frameworks to optimizing GPU-based workloads.