May 24, 2026

Flux vs SDXL vs SD 3.5 2026: Which Image Model Wins

By AIFoss · 14 min read


The open-source image generation landscape has gone through a consolidation phase. After two years of fragmentation — every month brought a new model family claiming to beat everything else — three distinct tiers have emerged that cover 95% of real-world use cases: Flux (quality tier), SDXL (ecosystem tier), and SD 3.5 (the contested middle). Where you land depends on your hardware, your use case, and whether you need a commercial license.

Here’s a deep look at all three, including the newest additions to each family as of May 2026.

The Three Families at a Glance

	Flux.1 [dev]	Flux.2 Klein 4B	SDXL 1.0	SD 3.5 Medium	SD 3.5 Large
Parameters	12B	4B	3.5B	2.5B	8.1B
Min VRAM (practical)	6GB (GGUF Q4)	~13GB (FP16)	8GB	~10GB (excl. text encoders)	~11GB (TensorRT FP8)
License	Non-commercial	Apache 2.0	CreativeML RAIL++-M	Stability Community	Stability Community
Steps (typical)	20–50	4	20–30	25–40	25–40
LoRA ecosystem	Growing	Minimal	Massive	Minimal	Minimal
Best for	Quality, research	Fast commercial gen	Fine-tuning, ControlNet	Consumer hardware + text	Quality + text rendering

Short version: if you need the best images and don’t mind a non-commercial license, Flux.1 dev is the default. If you need commercial use on under 16GB of VRAM, choose SDXL or Flux.2 Klein 4B. If you’re working specifically with text-in-image or complex compositions and have NVIDIA RTX hardware, SD 3.5 Large with TensorRT is the one case where SD 3.5 earns its place.

Flux: The Quality Benchmark

Black Forest Labs released the original Flux.1 family in mid-2024, and as of 2026 it remains the reference point for open-source image quality. The family has expanded substantially.

Flux.1 [dev] — The 12B reference model

Flux.1 [dev] is a 12-billion-parameter rectified flow transformer, distilled from the commercial Flux.1 [pro]. In practice, it consistently outperforms SDXL on prompt adherence, face detail, spatial coherence in complex multi-subject scenes, and rendered text inside images. Put them side by side with the same prompt and the gap is immediately obvious on photorealistic subjects.

The hardware cost is the main barrier. Running Flux.1 dev at full FP16 requires approximately 24GB VRAM — an RTX 3090, 4090, or A100. GGUF quantized variants shift this considerably: a Q4_K_M GGUF brings the requirement down to around 6GB VRAM, fitting an RTX 3060. There is a quality trade-off at that level — fine textures and high-frequency detail in faces suffer — but for most use cases it’s still better than stock SDXL.
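The VRAM figures above follow roughly from parameter count times bytes per weight. Here is a minimal Python sketch of that arithmetic, assuming a weights-only estimate in decimal gigabytes and nominal bit widths. The helper name `weight_gb` is hypothetical. Real Q4_K_M files average slightly more than 4 bits per weight, and text encoders, the VAE, and activations add overhead on top, so treat the results as a floor.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only memory estimate in decimal GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same nominal bit width.
    """
    return round(params_billion * bits_per_weight / 8, 2)


# Flux.1 [dev] has 12B parameters (figure taken from the article).
for label, bits in [("FP16", 16), ("FP8", 8), ("nominal 4-bit", 4)]:
    print(f"Flux.1 dev {label}: ~{weight_gb(12, bits)} GB of weights")
```

At 16 bits this gives 24 GB and at a nominal 4 bits it gives 6 GB, which lines up with the FP16 and Q4 figures quoted above.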

The other barrier: Flux.1 [dev] ships under the FLUX.1-dev Non-Commercial License. You cannot use it in products or services. For production deployments, you need Flux.1 [schnell] or Flux.2 Klein.

Flux.1 [schnell] — Fast, Apache 2.0

Schnell uses 4 inference steps and is fully Apache 2.0 licensed. The quality gap versus dev is real but narrower than expected at 4 steps — it’s a solid choice for rapid iteration and applications where sub-second generation matters more than peak fidelity. Most commercial Flux deployments were running schnell before Klein arrived.

Flux.2 Klein — January 2026, the practical upgrade

On January 15, 2026, Black Forest Labs released FLUX.2 [klein] in two sizes.

The 4B variant (Apache 2.0) generates in 4 inference steps and runs in approximately 13GB VRAM at FP16. Quality sits between schnell and dev — better than schnell on complex prompts, not quite dev at full precision. For commercial applications on under-16GB hardware, this is now the first model to reach for.

The 9B variant raises quality further but reverts to a non-commercial license. At FP16 it needs 27–29GB VRAM; FP8 quantization brings this to roughly 14–16GB, which fits an RTX 4090 with text encoder offloading.

Running Klein 4B locally via the official inference repo:

# Clone the inference repo
git clone https://github.com/black-forest-labs/flux2
cd flux2

# Install dependencies
pip install -e ".[all]"

# Download Klein 4B weights
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
  --local-dir ./models/flux2-klein-4b

In ComfyUI, load the .safetensors checkpoint through the UNet Loader node — Klein 4B uses the same node structure as Flux.1, so existing workflows load without modification.

Flux.1 Kontext [dev] — A different tool entirely

Flux.1 Kontext [dev] is the same 12B architecture adapted for image-to-image editing via text instructions: change a background while keeping the subject, swap clothing, add objects, adjust lighting. It is not a replacement for Flux.1 dev for text-to-image generation — it’s a separate workflow for iterative image editing. License is non-commercial. If you’re building an editing tool, it’s the most capable open-weight option available as of mid-2026.

When NOT to use Flux

You have 8GB VRAM and want clean output without GGUF artifacts — SDXL is the better fit.
You need a mature LoRA library for a specific subject or style. Civitai’s Flux catalog is a fraction of the size of its SDXL catalog, and training Flux LoRAs requires more VRAM than SDXL Dreambooth.
You need commercial rights at dev-level quality: the non-commercial restriction applies to Flux.1 dev, Kontext dev, and the 9B Klein variant.

SDXL: The Ecosystem Workhorse

SDXL 1.0 is a 3.5B-parameter model that Stability AI released in 2023. In 2026, stock SDXL loses on raw quality to both Flux and SD 3.5 Large. It wins on something else: no other open-source image model has the same depth of community fine-tuning, custom checkpoints, LoRAs, and ControlNet adapters.

The hardware case

SDXL runs on 8GB VRAM — Stability AI’s own baseline recommendation. At 8GB, you get the base model at 1024×1024 without the refiner. Adding the refiner (the second-pass model that adds fine detail) pushes requirements to 12–16GB. For most use cases, 8GB gets you 90% of the SDXL experience.

The license is CreativeML Open RAIL++-M. Commercial use is allowed without revenue caps, subject to use-based restrictions (no illegal content, no harmful applications). This is more permissive than the Stability AI Community License on SD 3.5, which caps free commercial use at $1M annual revenue.

The ecosystem argument

Civitai hosts tens of thousands of SDXL LoRAs, fine-tuned checkpoints, embeddings, and ControlNet adapters — covering photography styles, anime, architecture, product visualization, and character consistency training. No other open model family comes close. Fine-tuning SDXL for a specific character or aesthetic using Dreambooth or LoRA training is thoroughly documented, with tooling in both ComfyUI and Automatic1111/Forge.

The ComfyUI custom nodes ecosystem for SDXL is particularly deep. SeargeSDXL, ComfyRoll, and the built-in SDXL nodes enable multi-ControlNet pipelines, LoRA stacking, aspect ratio management, and refiner scheduling in a single workflow. If you need a customized pipeline for a specific domain, SDXL is still where you start.

Where stock SDXL falls short

Raw text-to-image without fine-tuning: SDXL loses to Flux.1 dev on faces, complex multi-subject prompts, and text rendering in images. The gap is large enough to matter for photorealistic use cases. An SDXL checkpoint fine-tuned on domain-specific data often outperforms generic Flux.1 on that domain — but out of the box, SDXL generates images that look noticeably less detailed.

SDXL also caps out near 1024×1024. Push significantly above that without tiling and quality degrades. Flux and SD 3.5 handle higher resolutions more gracefully.
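Tiling here means processing a large canvas as overlapping windows at the model's native size and blending the results. The sketch below covers only the window layout. The 128px overlap is an illustrative value rather than a setting from any particular SDXL tool, the helper name `tile_boxes` is hypothetical, and the per-tile diffusion pass and seam blending are left out.

```python
def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 128):
    """Return (left, top, right, bottom) boxes that cover the whole canvas."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")

    def starts(size: int) -> list[int]:
        if size <= tile:
            return [0]  # one tile already covers this axis
        stride = tile - overlap
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(2048, 2048)
print(len(boxes), boxes[0], boxes[-1])
```

Each box stays at or below 1024px per side, so every window remains inside the resolution SDXL was trained on.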

When NOT to use SDXL

You need photorealistic faces or text rendered inside images — both Flux and SD 3.5 Large beat SDXL here.
You’re generating at 2048×2048 or above without tiling — SDXL was trained at 1024px.
You want the best available quality without fine-tuning — stock SDXL is clearly behind Flux.1 dev.

SD 3.5: The Narrow Use Case

Stability AI released SD 3.5 in late 2024 with two main variants: Large (8.1B parameters) and Medium (2.5B). The architecture is a genuine improvement — MMDiT (multimodal diffusion transformer) with triple text encoders (CLIP-L, CLIP-G, and T5-XXL). The T5 encoder is what gives SD 3.5 its text rendering advantage: it understands language relationships more deeply than SDXL’s CLIP-only setup.

SD 3.5 Medium

At 2.5B parameters, SD 3.5 Medium requires approximately 9.9GB VRAM excluding text encoders; in practice, budget around 12–14GB for the full setup. It fits on a 12GB card and generates at acceptable speeds on RTX 3080/4070-class hardware. The image quality beats SDXL on text rendering and complex compositional prompts, though it needs a few gigabytes more VRAM than SDXL’s 8GB baseline.

If you’re on a 12GB card and need better typography than SDXL produces, SD 3.5 Medium is worth testing.

SD 3.5 Large

SD 3.5 Large needs 16–18GB VRAM at FP16. Quality is better than SDXL and SD 3.5 Medium on detailed scenes, though most benchmark comparisons still put it behind Flux.1 dev at full precision.

The practical case for Large depends on hardware. With Stability AI’s TensorRT FP8 optimization, SD 3.5 Large runs in approximately 11GB VRAM at roughly 2.3× the speed of unoptimized FP16. On NVIDIA RTX 30xx and 40xx hardware, this brings Large within reach of 12GB cards and makes it competitive on speed with Flux.1 dev at similar VRAM budgets.

For batch inference or cloud-based generation pipelines, RunPod A100 instances handle SD 3.5 Large at FP16 comfortably for high-volume workflows.

The license is the Stability AI Community License: free for commercial use up to $1M annual revenue, paid enterprise license beyond that.

SD 3.5 Variant	Practical VRAM	Parameters	Speed on RTX 4090	Strength
Medium	~12–14GB	2.5B	Fast	Consumer hardware, typography
Large FP16	~16–18GB	8.1B	Moderate	Composition, complex scenes
Large TensorRT FP8	~11GB	8.1B	~2.3× faster than Large FP16	Quality at lower VRAM on NVIDIA

The honest case against SD 3.5

For most users, SD 3.5 doesn’t beat Flux on quality or SDXL on ecosystem — it fills a gap that wasn’t the most pressing problem. Unless you have a specific reason for it (typography, MMDiT architecture research, or the TensorRT stack on existing NVIDIA infrastructure), you’ll get more value from Flux or SDXL for the same VRAM budget.

When NOT to use SD 3.5

You have 8GB VRAM: neither SD 3.5 variant fits at that level, so SDXL (or Flux.1 dev as a GGUF Q4 quant) is the practical choice.
You need a large LoRA library for custom characters or styles — SDXL’s community catalog is incomparably deeper.
You’re generating photorealistic portraits — Flux.1 dev still leads.
You don’t have NVIDIA RTX hardware: the TensorRT speed and VRAM gains for Large are NVIDIA-specific.

Running Multiple Models Is Normal

Production image generation pipelines in 2026 rarely rely on a single model. On a 24GB card, a practical rotation:

Fast iteration: Flux.2 Klein 4B (4 steps, Apache 2.0, ~13GB) — test prompts quickly
Production quality, non-commercial: Flux.1 dev GGUF Q4 (6GB) or FP16 (24GB) depending on VRAM headroom
Style-specific output: SDXL 1.0 with domain-tuned checkpoint (8–16GB)
Text-heavy images on NVIDIA RTX: SD 3.5 Large TensorRT FP8 (~11GB)

ComfyUI makes this switching nearly seamless — each model loads into the same workflow structure via different checkpoint nodes, and you can queue generations across model types without restarting.

For developers integrating image generation into applications or automating batch generation pipelines, aicoderscope.com’s guide to AI tools for frontend and design workflows covers how tools like Cursor and Claude Code can script ComfyUI via its API — useful for systematic prompt testing, dataset generation, and production image pipelines.
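As a rough illustration of what scripting ComfyUI over its API looks like: a locally running ComfyUI instance accepts workflows in API format via a POST to its `/prompt` endpoint, on port 8188 by default. The sketch below builds one request per prompt for a systematic prompt sweep. The workflow fragment, the node id `"6"`, and the helper name `build_requests` are hypothetical. Export your own workflow in API format from ComfyUI and look up the id of its text-encode node.

```python
import copy
import json
import urllib.request

# Default local address; adjust if your instance listens elsewhere.
COMFY_URL = "http://127.0.0.1:8188/prompt"


def build_requests(workflow: dict, text_node_id: str, prompts: list[str]):
    """Return one POST request per prompt, each with its own workflow copy."""
    built = []
    for text in prompts:
        wf = copy.deepcopy(workflow)  # never mutate the shared template
        wf[text_node_id]["inputs"]["text"] = text
        body = json.dumps({"prompt": wf}).encode("utf-8")
        built.append(urllib.request.Request(
            COMFY_URL,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return built


# Hypothetical one-node fragment standing in for an exported workflow.
template = {
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["4", 1]}},
}
queue = build_requests(template, "6", ["a red bicycle", "a blue bicycle"])

# To actually submit, with a ComfyUI instance running:
# for req in queue:
#     urllib.request.urlopen(req)
```

Building the requests separately from sending them keeps the sweep easy to inspect or log before anything reaches the GPU.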

The quantization trade-offs covered in the GGUF quantization guide apply to lower-precision Flux variants as well: the gap between Q4 and Q8 precision maps directly to image sharpness at fine detail levels.

Hardware selection for image generation workloads is covered in depth at runaihome.com’s GPU guides.

Which Model Tier Wins

Flux wins on quality. Flux.1 dev at FP16 or GGUF Q8 is the best default for open-source image generation in 2026 if you don’t need commercial rights. For commercial use, Flux.2 Klein 4B (Apache 2.0, released January 2026) is the new first choice.

SDXL wins on ecosystem. Stock SDXL is showing its age against Flux on raw output, but a well-tuned SDXL checkpoint for a specific domain regularly outperforms generic Flux on that domain. Nothing else has years of community fine-tuning behind it.

SD 3.5 has a defensible niche. Medium on 12GB cards for typography. Large with TensorRT on NVIDIA RTX hardware when you need quality above SDXL without moving to Flux’s non-commercial licensing. Outside those two scenarios, choose Flux or SDXL first.

Pick based on your VRAM floor and commercial license requirements. That narrows the choice faster than quality comparisons alone.
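That filtering step can be written down directly. The sketch below hard-codes the approximate VRAM floors and license terms quoted in this article. The `Option` structure and the `candidates` helper are hypothetical, and the numbers are rough floors rather than measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    name: str
    min_vram_gb: float                     # approximate floor quoted in the article
    commercial: bool                       # is any commercial use permitted?
    revenue_cap_usd: Optional[int] = None  # free commercial use only up to this


OPTIONS = [
    Option("Flux.1 dev GGUF Q4", 6, False),
    Option("SDXL 1.0", 8, True),
    Option("SD 3.5 Large TensorRT FP8", 11, True, 1_000_000),  # NVIDIA RTX only
    Option("SD 3.5 Medium", 12, True, 1_000_000),
    Option("Flux.2 Klein 4B", 13, True),
    Option("SD 3.5 Large FP16", 16, True, 1_000_000),
    Option("Flux.1 dev FP16", 24, False),
]


def candidates(vram_gb: float, commercial: bool, annual_revenue_usd: int = 0):
    """Names of options that fit in VRAM and satisfy the license need."""
    picks = []
    for o in OPTIONS:
        if o.min_vram_gb > vram_gb:
            continue  # does not fit
        if commercial:
            if not o.commercial:
                continue  # non-commercial license
            if o.revenue_cap_usd is not None and annual_revenue_usd > o.revenue_cap_usd:
                continue  # would need a paid enterprise license
        picks.append(o.name)
    return picks


print(candidates(8, commercial=True))
print(candidates(16, commercial=True, annual_revenue_usd=5_000_000))
```

With these inputs, an 8GB commercial setup narrows to SDXL alone, and a 16GB setup above the $1M revenue cap narrows to SDXL and Flux.2 Klein 4B, before any quality comparison is needed.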

Frequently Asked Questions

Can Flux.1 dev be used commercially?

No. Flux.1 dev ships under the FLUX.1-dev Non-Commercial License, which prohibits use in products, services, or any application that generates revenue. The Apache 2.0 commercial alternatives in 2026 are Flux.1 schnell (4 steps, lowest quality in the family) and Flux.2 Klein 4B (released January 2026, 4 steps, quality between schnell and dev, ~13GB VRAM at FP16). For commercial deployments that need quality above schnell, Flux.2 Klein 4B is the current default. Note that the 9B Klein variant also uses a non-commercial license; only the 4B is Apache 2.0.

What’s the minimum hardware to run Flux locally in 2026?

Flux.1 dev GGUF Q4_K_M runs in approximately 6GB VRAM — an RTX 3060 12GB handles it, though fine texture detail and faces degrade compared to FP16. Flux.2 Klein 4B at FP16 needs ~13GB VRAM, which fits a 16GB card (RTX 5060 Ti 16GB or RTX 4080). Full-precision Flux.1 dev at FP16 requires 24GB — RTX 3090 or RTX 4090. For testing Flux FP16 before committing to 24GB hardware, RunPod rents RTX 4090 instances at $0.34/hr (Community Cloud). Hardware selection by VRAM budget for image generation is covered in depth at runaihome.com’s GPU buying guide.

Is SDXL still worth using in 2026?

Yes — for two specific scenarios. First: 8GB VRAM hardware, where SDXL’s ~4.5GB footprint in ComfyUI is the lowest practical baseline among these three families. Flux.1 dev GGUF Q4 technically runs on 6GB, but SDXL at 8GB with ControlNet often outperforms it on structured compositions. Second: domain-specific fine-tuned checkpoints. SDXL has years of community fine-tuning on Civitai, and a checkpoint trained on your specific subject — product photography, a character, a particular art style — regularly beats generic Flux.1 dev on that exact domain. Stock SDXL for general text-to-image in 2026 is no longer competitive with Flux; SDXL with the right domain checkpoint can still win narrowly on its home turf.

