Comparing the best Image Editing AI Models

Published on October 1, 2025
By James Skelton

Technical Evangelist // AI Arcanist

Image generation has been one of the most popular use cases for AI since its inception, as we have covered extensively on this blog. With models like Flux and Hi-Dream, we have seen an amazing outpouring of resources into the development of fine-tunes, and the resulting artwork has been incredible to see. Text-to-image models can do a great deal: they make it possible for anyone to bring their imagination to reality.

But these models are not perfect. Frequently, an otherwise perfect image is marred by small imperfections or mistakes. For example, image models famously used to struggle with limbs, especially hands, which made it very easy to spot AI-generated images in the wild. Newer models have largely fixed that problem, but the need to correct small mistakes is ever present. Traditionally, doing so required some skill with photo editing software like Photoshop or GIMP. This is where the new image editing models come into play.

Image editing models are text-to-image/image-to-image models that take in text input instructions to make changes to an existing image. These changes can be small, like giving a man a mustache, or they can be large, like changing the entire style of the photo. This gives any user the power to correct their AI-generated images as needed.

In this review, we will take a look at some of the best available open and closed source tools for image editing, and attempt to qualitatively and quantitatively assess the differences between them, discuss their strengths and weaknesses, and briefly show how to use them.

Key Takeaways

  • Different AI Image Editing models thrive in different circumstances
  • Nano Banana is the easiest to use and get started with
  • Qwen Image Edit 2509 is the most versatile and powerful image editing model available

How do we assess the models’ capabilities?

To assess these models, we are going to run a series of image editing tests to qualitatively assess their capabilities. These tests involve using the same images and prompt instructions to make comprehensive edits and changes to the image. We will then look at the output and give our subjective opinion of how well the model performed.

OmniGen2 & UMO

The first image editing model we want to introduce is OmniGen2 from the OmniGen research team, along with its subsequent applied use in UMO, ByteDance’s Unified Multi-identity Optimization framework. OmniGen2 was the first of these types of models to be open-sourced, and its capabilities are more limited than the others we are going to introduce today. Nonetheless, OmniGen2 shows strong capabilities in instruction-guided image editing (editing an image from text inputs) and in-context generation (processing and flexibly combining diverse inputs, including humans, reference objects, and scenes, to produce novel and coherent visual outputs). OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design allows for a substantial improvement in performance over the team’s previous work, OmniGen 1.

UMO was developed by ByteDance researchers to improve upon the capabilities of existing image editing models, and it has been applied to multiple models, including their own UNO. UMO improves the utility of OmniGen2 across the board. The UMO authors report that extensive experiments demonstrate that it not only improves identity consistency significantly but also reduces identity confusion across several image customization methods, setting what was, at the time, a new state of the art among open-source methods for identity preservation. We always recommend using UMO over OmniGen2 alone, and will be testing UMO OmniGen2 in this review.

Flux Kontext

Flux Kontext was the next suite of models to be open-sourced for image editing. Developed by Black Forest Labs, these models were an immediate phenomenon upon release, with many users engaging with image editing models for the first time through their tools and platform. The models came in three forms: Max, Pro, and dev.

FLUX.1 Kontext [dev] is a 12 billion parameter rectified flow transformer capable of editing images based on text instructions. “The model enables iterative editing, excels at character preservation across a diverse set of scenes and environments, and allows both precise local and global edits.” (Source) In practice, it is an incredible image editing tool that allows for complex edits with only text prompts.
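As a rough illustration of what "complex edits with only text prompts" looks like in practice, here is a minimal sketch of a single edit using the Hugging Face diffusers library. It assumes a diffusers release that ships `FluxKontextPipeline`, a CUDA GPU with enough memory, and access to the `black-forest-labs/FLUX.1-Kontext-dev` weights. The helper names (`kontext_call_kwargs`, `edit_with_kontext`) and the default guidance scale of 2.5 are our own choices for this example, not settings used in the review.

```python
from pathlib import Path


def kontext_call_kwargs(instruction: str, guidance_scale: float = 2.5) -> dict:
    """Build the keyword arguments for one edit call (hypothetical helper)."""
    cleaned = instruction.strip()
    if not cleaned:
        raise ValueError("edit instruction must not be empty")
    return {"prompt": cleaned, "guidance_scale": guidance_scale}


def edit_with_kontext(image_path: str, instruction: str, out_path: str) -> Path:
    """Apply one text-instructed edit with FLUX.1 Kontext [dev].

    The heavy imports live inside the function so that this module can be
    imported on a machine without a GPU or without diffusers installed.
    """
    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image

    pipe = FluxKontextPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    source = load_image(image_path)  # accepts a local path or a URL
    result = pipe(image=source, **kontext_call_kwargs(instruction)).images[0]
    result.save(out_path)
    return Path(out_path)
```

Calling `edit_with_kontext("image_1.png", "add a mustache to the man", "image_1_mustache.png")` would run one of the edits from this review, where the file names are placeholders.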

Qwen Image Edit & Qwen Image Edit 2509

Qwen Image Edit builds on the established progress of diffusion-based image editing methods, which pioneered the practice of combining semantic and appearance control for more flexible editing workflows. The 20B Qwen-Image foundation brings its unique text-rendering precision into the editing space.

To do so, Qwen-Image-Edit directs the input image simultaneously into Qwen2.5-VL (for semantic grounding) and the VAE Encoder (for appearance consistency), achieving high-fidelity edits that balance meaning with style. This alignment with existing research traditions ensures that Qwen Image Edit inherits the strengths of prior innovations while delivering stronger results in precise text and image manipulation.

Qwen Image Edit 2509 is the updated version of the model, exceeding the abilities of Qwen Image Edit in every discernible way. We always recommend using 2509 over the original, and will be testing it in this review. Notably, Qwen Image Edit 2509 is especially improved at Multi-Image editing, allowing users to submit multiple images for editing together at once.
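To make the multi-image workflow concrete, the sketch below passes a list of images and one instruction to Qwen Image Edit 2509 through diffusers. It assumes a diffusers release that includes `QwenImageEditPlusPipeline` and a CUDA GPU large enough for the model. The sampling values (40 steps, `true_cfg_scale` of 4.0) are taken from the published usage example as we recall it and may differ between versions, and the helper names are ours.

```python
def qwen_edit_inputs(images: list, instruction: str) -> dict:
    """Assemble pipeline inputs for a single- or multi-image edit (hypothetical helper)."""
    if not images:
        raise ValueError("at least one input image is required")
    return {
        "image": list(images),        # one or more images edited together
        "prompt": instruction,
        "negative_prompt": " ",
        "true_cfg_scale": 4.0,        # assumed default, tune as needed
        "guidance_scale": 1.0,
        "num_inference_steps": 40,
        "num_images_per_prompt": 1,
    }


def edit_with_qwen_2509(image_paths: list, instruction: str, out_path: str, seed: int = 0) -> str:
    """Run one edit with Qwen Image Edit 2509; imports are deferred until called."""
    import torch
    from PIL import Image
    from diffusers import QwenImageEditPlusPipeline

    pipe = QwenImageEditPlusPipeline.from_pretrained(
        "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    images = [Image.open(path).convert("RGB") for path in image_paths]
    inputs = qwen_edit_inputs(images, instruction)
    inputs["generator"] = torch.manual_seed(seed)  # fixed seed for repeatable output

    with torch.inference_mode():
        output = pipe(**inputs)
    output.images[0].save(out_path)
    return out_path
```

Passing a single path in `image_paths` gives ordinary one-image editing; passing two or more lets one instruction refer to all of them.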

Gemini 2.5 Flash aka Nano Banana

Gemini 2.5 Flash’s image editing capability, codenamed Nano Banana, is the reference point for this review. Nano Banana is an image editing model that excels at all tasks and measurements, and dominates the image editing model leaderboards. Not only that, but it runs on Google’s cloud, which lets us take advantage of their cutting-edge work without hosting the model ourselves.

In the end, this closed-source model is arguably the most powerful image editing tool ever created, after Photoshop. As such, we are using it as the bar for comparison with all the other models. Our expectation is that Nano Banana will outperform the open-source competition in all regards, which makes it a useful benchmark for good model behavior.
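Because Nano Banana is only reachable through Google’s API, using it looks different from the open models. The sketch below assumes the `google-genai` Python SDK, an API key available in the environment, and the model identifier `gemini-2.5-flash-image`, which is our assumption and may differ for your account or release. The function names are ours.

```python
def save_first_image_part(parts, out_path: str) -> bool:
    """Write the first response part that carries inline image bytes.

    Returns True when an image was written, False when the response
    contained only text parts.
    """
    for part in parts:
        inline = getattr(part, "inline_data", None)
        data = getattr(inline, "data", None) if inline is not None else None
        if data:
            with open(out_path, "wb") as handle:
                handle.write(data)
            return True
    return False


def edit_with_nano_banana(
    image_path: str,
    instruction: str,
    out_path: str,
    model: str = "gemini-2.5-flash-image",  # assumed model id
) -> bool:
    """Send one image plus an edit instruction to the Gemini API."""
    from google import genai
    from PIL import Image

    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=model,
        contents=[instruction, Image.open(image_path)],
    )
    return save_first_image_part(response.candidates[0].content.parts, out_path)
```

The response can mix text and image parts, which is why the helper scans for the first part with image bytes instead of assuming a fixed position.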

Assessing Image Editing Models Qualitatively

Now we get to the review portion of this article. In this section, we will attempt to qualitatively assess the capabilities of the chosen models. To facilitate this, we generated 4 images with Hunyuan Image 3.0 running on an 8xH100 GPU Droplet. All images were generated with 50 steps, at a resolution of 1024x1024, and with a randomized seed. We then apply 3 image manipulations to each of the 4 generated images, using each of the image editing models. This gives every model the same source image and the same edit prompt, so the outputs can be compared side by side.

Prompts

To get started, we have created 4 prompts to generate images that are highly varied in terms of genres, themes, realism, artistic style, contents, and subject matter. We came up with these prompts on our own, with some inspiration from previous works and the Hunyuan Image 3.0 technical report, and then enhanced them using Hunyuan Image 2.1’s prompt enhancement feature. These enhanced prompts are listed below:

1: “A hand-drawn propaganda poster captures an explorer on an alien planet, set against the backdrop of a vibrant and strange jungle. The central figure is an explorer encased in a bulky, retrofuturistic space suit, characterized by its silver-toned, padded fabric and prominent corrugated tubing. He wears a large, spherical glass helmet that reveals a determined expression on his face. In one gloved hand, he holds a raygun, a tool with a metallic barrel and a visible screen, suggesting a function for scanning or defense. The explorer stands amidst a dense alien jungle that dominates the middle ground. The jungle is filled with tall, slender trees with deep purple trunks, whose canopies are covered in thick, swirling pink vines. Periodically, large, jagged orange crystalline structures protrude from the ground, glowing with a faint internal light. In the far distance, the silhouettes of ancient, dilapidated ruins are visible, heavily overgrown with vines and scattered with debris. An ornate, handwritten caption at the bottom of the poster spells out, “Explore Rigel-4!” in a bold, stylized font. The artwork is rendered in the style of a mid-20th-century hand-drawn propaganda poster, reminiscent of classic rockwell illustrations.”

2: “A grand alphabet puzzle is presented against a plain, neutral background, featuring the letters A to Z arranged horizontally in a standard line-up. Each letter is ingeniously replaced with an object that visually represents its corresponding sound and is rendered in a unique, distinct style. The letter “A” is depicted as dynamic, flowing water, with translucent blue streams forming its shape, frozen in motion with white foam at the tips. Following “A” is “B”, represented by roaring fire, composed of vibrant orange and yellow flames that flicker upwards with trails of dark red smoke. The letter “C” is formed from lush, green grass, with individual blades clearly visible and detailed, suggesting a sharp, crisp cut. The sequence continues with “D” as a block of rich, brown chocolate, with a smooth, glossy surface and visible scored lines. “E” is illustrated as a piece of freshly-cut bread, showing a light golden-brown crust and a porous, airy crumb. “F” appears as a bundle of green leaves, with delicate veins running through them. “G” is shaped like a large, marbled green stone, with swirling patterns of dark and light green. “H” is represented by a bright yellow banana, positioned upright with its brown seeds clearly defined on its tip. This central portion of the alphabet continues with “I” as a towering, slender candle in a simple glass jar, its wick lit from within. “J” is a collection of colorful, circular gumdrops. “K” is crafted from honey, appearing as a dripping, viscous stream of golden-amber liquid. “L” is formed from a single, large, red apple, its skin smooth and reflective. “M” is depicted as a stack of neatly folded, pristine white towels. “N” is a pair of black dominoes, their circular shapes connected by a thin, flexible line. “O” is a perfectly round, white marshmallow, appearing soft and slightly squeezable. “P” is a classic red-and-white striped paper pipe. “Q” is a single, vibrant orange star with a yellow center. 
“R” is composed of a red apple slice placed next to a small, green pea. “S” is a long, sinuous ribbon of rainbow-colored silk. “T” is a standard hardbound book, closed and showing a dark brown cover. “U” is a simple, golden-yellow cupcake with white frosting dripping down its sides. “V” is a V-shaped cutout frame, revealing a solid red block of fruit. “W” is formed from four straight, brown pretzels. “X” is a bright red-and-white checkered gingham cloth. “Y” is a single, large, pale yellow egg yolk. Finally, “Z” is represented by a straight, metallic silver key. The overall presentation is that of a clean, colorful, and highly creative digital illustration.”

3: “The scene is set within a brightly lit, ultra-modern restaurant, where a man and a woman are seated at a dining table, captured in a moment of celebration. In the foreground, the man and woman sit closely together at a round, dark wood table. The man, dressed in a smart dark suit, raises an elegant wine glass, its stem clinking with his companion’s. The woman, wearing a stylish silk blouse, smiles warmly as she meets his gaze and brings her own wine glass forward for a toast. A single, tall candle in the center of the table casts a warm, flickering glow on their faces, emphasizing the joyful, celebratory atmosphere. In the background, visible through expansive floor-to-ceiling glass windows, a group of clowns performs a clumsy but highly comical dance routine on a polished concrete floor. Their movements are exaggerated, featuring acrobatic flips, awkward spins, and humorous facial expressions under the bright, ambient light from the restaurant’s interior. The restaurant’s environment is minimalist, defined by its stark white walls and the vast, clear view of the performance outside. This image presents a high-quality, vibrant photography style.”

4: “A playful cartoon monkey is depicted actively climbing a large, gnarled tree, the central focus of a manuscript inspired by ancient Chinese art. The monkey has light brown fur, a long, prehensile tail curling upwards for balance, and a mischievous expression characterized by wide, gleaming eyes and a broad grin. Its small hands grip the textured bark of the trunk, with one leg bent as if in mid-scamper. The tree itself is ancient and robust, with a twisted trunk rendered with dark, expressive calligraphic lines that vary in thickness. Its branches, drawn with a combination of bold strokes and finer lines for detail, stretch across the composition, adorned with stylized leaves in hues of jade green and deep emerald. The background is a soft, off-white or light parchment-like texture, suggesting aged paper, with faint, misty mountains visible in the distance rendered with pale, diluted ink washes. The entire composition is rendered in the style of a traditional Chinese ink wash and watercolor manuscript, emphasizing flowing brushwork and a monochromatic or limited color palette.”

Next, let’s discuss the plan for the editing prompts. These need to be tailored to each image in order to work effectively, as many edits would make no sense if applied universally. For that reason, each set of edit prompts is created specifically for its image. The prompts are diverse and aim to cover multiple types of edits, including style transfer, object modification, object addition, object subtraction, and subject manipulation. We list the edits alongside each example in the next section.
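This test plan can be written down as data. The sketch below pairs every image’s edit prompts (copied from the lists in the next section) with every model under review, which makes the size of the comparison explicit. The `EditJob` class, the `build_jobs` function, and the short model labels are our own naming for this example.

```python
from dataclasses import dataclass
from itertools import product

MODELS = ["UMO OmniGen2", "Flux Kontext", "Qwen Image Edit 2509", "Nano Banana"]

# Edit prompts keyed by image number, as listed in the comparison section.
EDIT_PROMPTS = {
    1: [
        "Make the image appear in the style of Van Gogh’s Starry Night",
        "add a mustache to the man",
        "remove the writing and logo coloration at the bottom of the image",
    ],
    2: [
        "Make the image look like it was animated for a Japanese anime cartoon",
        "Remove the fiery letter B from the image",
        "Change the letter Z at the bottom right to a “1” number symbol",
    ],
    3: [
        "Make the scene and characters look like they are from a popular american cartoon with yellow skin characters",
        "Remove the clowns in the background",
        "Add dramatic sad clown makeup to the man in the foreground",
    ],
    4: [
        "Make the scene into a photographic real image taken on camera",
        "Give the monkey a tophat, monocle, and fancy suit",
        "Flip the monkey so that it is crawling down the tree, upside down",
    ],
}


@dataclass(frozen=True)
class EditJob:
    """One (model, image, prompt) combination to run and inspect."""
    model: str
    image_id: int
    prompt: str


def build_jobs(models=MODELS, edit_prompts=EDIT_PROMPTS) -> list:
    """Expand the per-image prompts across every model."""
    jobs = []
    for image_id in sorted(edit_prompts):
        for prompt, model in product(edit_prompts[image_id], models):
            jobs.append(EditJob(model=model, image_id=image_id, prompt=prompt))
    return jobs


if __name__ == "__main__":
    print(f"{len(build_jobs())} edits to run and review")
```

Keeping the prompts per image, instead of one global list, mirrors the point above: an instruction like removing the clowns only makes sense for the image that contains them.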

Comparing the Image Editing Model Capabilities

Image 1

Our edit prompts are listed below:

  • Make the image appear in the style of Van Gogh’s Starry Night
  • add a mustache to the man
  • remove the writing and logo coloration at the bottom of the image

[Figure: image 1 edits]

As we can see from the figure above, each of the models seems to excel in different areas. For example, Qwen and Nano Banana are far ahead of the others on the style transfer prompt: they preserve the original image’s content while applying the new style across it. UMO and Flux Kontext lagged behind in preserving the finer features of the original image, and UMO didn’t appear to understand the request at all.

Nano Banana and Qwen again excelled at task number 2, adding a mustache. UMO and Kontext again seem a bit less effective, with the mustache being visible outside the helmet.

For prompt 3, Nano Banana is the clear standout. Qwen didn’t seem to understand from the prompt that it also needed to remove the logo background, and UMO, while it did understand the task, failed to extend the background features into the cleared area. Kontext and Nano Banana both resoundingly succeeded at the task, but we prefer the detail Nano Banana added where the logo and writing were removed.

Overall, Nano Banana was the standout for this series of tasks, but Qwen and Kontext were both quite good in most scenarios.

Image 2

Our edit prompts are listed below:

  • Make the image look like it was animated for a Japanese anime cartoon
  • Remove the fiery letter B from the image
  • Change the letter Z at the bottom right to a “1” number symbol

[Figure: image 2 edits]

As we can see from the example, this series of prompts was a bit more complicated in that it required the models to understand the writing as well as make the edits. For the first example, all three did a decent job of changing the style to an anime style. Qwen seems to be stylistically the closest to actual anime, but that is subjective.

For prompt 2, Flux Kontext was the clear winner. It both understood the exact task and surgically made the edit without affecting the rest of the image. Qwen did nearly as well, but also removed the “A” letter beside the assigned “B”. Nano Banana removed an entire row of letters, which indicates that it may not be as sophisticated when it comes to reading and writing. UMO just replaced the “B” with another “B” shape, which shows it understood the task but still couldn’t complete it.

For prompt 3, all three of Kontext, Qwen, and Nano Banana succeeded easily. This could have been because we included additional instructions for locating the object to be edited in the image, but they succeeded nonetheless. UMO failed completely.

Image 3

Our edit prompts are listed below:

  • Make the scene and characters look like they are from a popular american cartoon with yellow skin characters
  • Remove the clowns in the background
  • Add dramatic sad clown makeup to the man in the foreground

[Figure: image 3 edits]

Looking at prompt 1, Qwen Image Edit was the clear standout. The other models seemed to understand the task but mostly failed to achieve the result. Nano Banana’s output is far too realistic for a cartoon, and UMO seems to be missing the ubiquitous cartoon reference, The Simpsons, that it could be drawing from. The only model that comes close to Qwen here is Kontext, which both understood the task and largely succeeded in making the edit.

For prompt 2, we really like the results from Qwen Image Edit here over the others. The others each successfully removed the background objects, but they did not fill the space with a realistic replacement like Qwen did. The replacement tables make it appear more like a real image.

For prompt 3, Nano Banana and Qwen Image Edit made the edit most successfully. Nano Banana is probably the winner here, as it didn’t modify the facial expression of the woman in the foreground, unlike Qwen Image Edit. UMO and Kontext did apply the makeup correctly, but also applied it to the other subject in the image. Since we specified that only the man’s features should be edited, these results indicate weaker text understanding than Qwen and Nano Banana demonstrate.

Image 4

Our edit prompts are listed below:

  • Make the scene into a photographic real image taken on camera
  • Give the monkey a tophat, monocle, and fancy suit
  • Flip the monkey so that it is crawling down the tree, upside down

[Figure: image 4 edits]

Finally, we get to Image 4’s first prompt. For this example, once again, Qwen Image Edit 2509 is head and shoulders above each of the others. It successfully changed the style of the image entirely and believably. None of the others did, though Flux Kontext did seem to at least understand the task and attempt it.

For prompt 2, Qwen Image Edit is the winner here. The new clothes added onto the monkey appear most realistic and most faithful to the prompt from the Qwen edits, followed closely by the results for Nano Banana. Flux Kontext did a decent job, but gave the monkey glasses instead of a monocle. UMO completely failed to keep the monkey on the tree.

For prompt 3, Qwen and Nano Banana produced clearly better edits than the others. Both successfully moved the monkey into an upside-down position. We especially like the Qwen example, with the monkey now hanging from the branch. For UMO, we suspect the model didn’t understand what to do and just opted to remove the object. For Flux Kontext, no edits were made.

Overall Results

From these results, in our subjective opinion, Qwen Image Edit 2509 is the best and most versatile image editing AI tool available. Not only is it the most capable, but its native ability to work with multiple images at once makes it even more valuable than the others in this review. It can also be fine-tuned with tools like AI-Toolkit, and that customizability alone makes it a better option than Nano Banana in many cases. We recommend Qwen Image Edit 2509 for all image editing tasks.
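For readers who want a rough count behind that conclusion, the sketch below tallies which models the commentary above names as best for each of the twelve edits. The table is one reading of the prose, not a measured score: where the text names several models as succeeding or leading, each is credited, and the short labels are ours.

```python
from collections import Counter

# (image, prompt) -> models named as best in the commentary for that edit.
# This is an interpretation of the written results, not benchmark data.
PREFERRED = {
    (1, 1): ["Qwen", "Nano Banana"],
    (1, 2): ["Nano Banana", "Qwen"],
    (1, 3): ["Nano Banana"],
    (2, 1): ["Qwen"],
    (2, 2): ["Flux Kontext"],
    (2, 3): ["Flux Kontext", "Qwen", "Nano Banana"],
    (3, 1): ["Qwen"],
    (3, 2): ["Qwen"],
    (3, 3): ["Nano Banana"],
    (4, 1): ["Qwen"],
    (4, 2): ["Qwen"],
    (4, 3): ["Qwen", "Nano Banana"],
}


def tally_preferences(preferred=PREFERRED) -> Counter:
    """Count how many edits each model was named best (or tied for best) on."""
    return Counter(model for winners in preferred.values() for model in winners)


if __name__ == "__main__":
    for model, count in tally_preferences().most_common():
        print(f"{model}: {count} of {len(PREFERRED)} edits")
```

Under this reading, Qwen is named on 9 of the 12 edits, Nano Banana on 6, and Flux Kontext on 2, with UMO never named as best. A different reader could reasonably score the ties differently, so treat the counts as a summary aid only.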

Try Qwen Image Edit 2509 today on a DigitalOcean Gradient GPU Droplet!

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.