Image editing is the next step in the evolution of text-to-image deep learning generative AI models. In short, it is the ability of a model to take an input image, accept changes described in text, and return an output image that reflects those changes. This is, for example, one of the most popular use cases for GPT-4o image generation in ChatGPT, and Flux Kontext [pro] and [max] have been making serious waves as well.
In the open source world, image editing technology has lagged behind until recently. Models like BAGEL were capable of complex editing, yes, but they were notably less capable than their commercial competition. That all changed with Flux Kontext dev.
Flux Kontext is the premier image editing model suite from Black Forest Labs, and it seemingly outperforms all competition at complex image editing tasks. With the release of Flux Kontext dev, we have created this tutorial to show how to run the model with ComfyUI on a DigitalOcean GPU Droplet. Afterwards, we will run through a demo showcasing some potential use cases for the model.
To get started with the demo, follow along with the steps shown below.
First, we will need a DigitalOcean account to create a GPU Droplet. For this demo, we recommend an AMD MI300X or NVIDIA H100 powered GPU Droplet; either has sufficient power to run the model quickly.
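If you prefer the command line, you can also create the Droplet with DigitalOcean's doctl CLI. The region, size, and image values below are assumptions for illustration only; list the currently available GPU size slugs with doctl compute size list before running anything like this.
# Hypothetical example: create a single-H100 GPU Droplet with doctl
# Verify slugs first with: doctl compute size list / doctl compute image list --public
doctl compute droplet create kontext-demo \
  --region nyc2 \
  --size gpu-h100x1-80gb \
  --image <gpu-ready-image-slug> \
  --ssh-keys <your-ssh-key-id>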
Follow the instructions in this tutorial to see how to set up your environment to run ComfyUI. Once you have set up the GPU Droplet, navigate to the next section of this tutorial.
Follow the instructions on the ComfyUI GitHub repo to install it correctly on your GPU Droplet. If you are on an NVIDIA GPU, you can paste in the following to automate the process.
# Clone ComfyUI and set up an isolated Python environment
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv venv_comfy
source venv_comfy/bin/activate
# Install ComfyUI's Python dependencies
pip install -r requirements.txt
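Note that the commands above assume an NVIDIA GPU. If you chose an AMD MI300X Droplet instead, install the ROCm build of PyTorch before the requirements step so that pip does not pull in the CUDA build. The index URL below is an assumption based on PyTorch's current ROCm wheels; check the ComfyUI README for the version it currently recommends.
# Assumed ROCm wheel index; the ROCm version may differ on your system
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2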
This will install all the required packages for us. Next, we will get the required model files. Make sure that you are logged into Hugging Face on your machine before continuing; if you are not, you can authenticate with the CLI as shown below.
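The CLI will prompt you for an access token, which you can create in your Hugging Face account settings:
# Prompts for an access token from huggingface.co/settings/tokens
huggingface-cli login
Once you are authenticated (and have accepted the FLUX.1 Kontext dev license on its Hugging Face model page, since the repository is gated), paste the following commands in to download the model files.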
# Download the Kontext dev diffusion model and VAE into ComfyUI's model folders
huggingface-cli download black-forest-labs/FLUX.1-Kontext-dev flux1-kontext-dev.safetensors --local-dir ./models/diffusion_models/
huggingface-cli download black-forest-labs/FLUX.1-Kontext-dev ae.safetensors --local-dir ./models/vae/
# Download the two text encoders Flux uses (T5-XXL in FP8 and CLIP-L)
wget -O ./models/text_encoders/t5xxl_fp8_e4m3fn.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors
wget -O ./models/text_encoders/clip_l.safetensors https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
This will download all the model files needed to run Flux Kontext dev. Once the downloads are complete, launch ComfyUI by pasting the following command into the terminal.
python3 main.py
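By default, ComfyUI serves its interface locally on port 8188. If you need it to listen on a different interface or port, it accepts --listen and --port flags; the values below are illustrative:
# Optional: bind to all interfaces on a custom port
# (avoid exposing the server to the public internet without protection)
python3 main.py --listen 0.0.0.0 --port 8188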
Then copy the local URL printed in the terminal and open it in your VS Code (or now Cursor!) Simple Browser window, as shown in the tutorial we linked to earlier.
To get started, we recommend using the template provided by ComfyAnonymous in their ComfyUI examples repository, shown below.
Save the image file above, and load it into ComfyUI as a workflow by clicking Workflow > Open in the top left (or pressing Ctrl+O); ComfyUI reads the workflow embedded in the image's metadata. That should give you something like this:
From here, we can begin generating edited images! Upload your source image with the image loader node on the left side of the workflow, then enter a descriptive prompt. Check out the next section for some examples.
Above we have an example of several manipulations and edits we made using a photo of the author. As we can see, the model's capabilities are wide-ranging: from simple style transfer and background removal to full scene translation and transformation. We had great success with popular internet techniques like transferring the style of well-known animation studios, as well as with more complex image edits. The model also seems capable of extrapolating smaller details from the larger image as needed.
In short, the model excels at straightforward editing tasks, and the more descriptive the input prompt, the better it performs. It is also excellent at tasks like inpainting and outpainting, which we found while testing the pro model in the Black Forest Labs playground as well. Overall, Flux Kontext is a very potent model for both image editing and generation, and we encourage everyone to try it out on a DigitalOcean GPU Droplet.
Flux Kontext dev is a truly powerful image editing model. In our experiments, we found it to be as capable as GPT-4o and other competing image editing models on the same tasks. Be sure to test out Flux Kontext dev on a DigitalOcean GPU Droplet soon!