Flux vs SDXL vs SD 3.5 2026: Which Image Model Wins
The open-source image generation landscape has gone through a consolidation phase. After two years of fragmentation — every month brought a new model family claiming to beat everything else — three distinct tiers have emerged that cover 95% of real-world use cases: Flux (quality tier), SDXL (ecosystem tier), and SD 3.5 (the contested middle). Where you land depends on your hardware, your use case, and whether you need a commercial license.
Here’s a deep look at all three, including the newest additions to each family as of May 2026.
The Three Families at a Glance
| | Flux.1 [dev] | Flux.2 Klein 4B | SDXL 1.0 | SD 3.5 Medium | SD 3.5 Large |
|---|---|---|---|---|---|
| Parameters | 12B | 4B | 3.5B | 2.5B | 8.1B |
| Min VRAM (practical) | 6GB (GGUF Q4) | ~13GB FP16 | 8GB | ~10GB (excl. text encoders) | ~11GB FP8 |
| License | Non-commercial | Apache 2.0 | CreativeML RAIL++-M | Stability Community | Stability Community |
| Steps (typical) | 20–50 | 4 | 20–30 | 25–40 | 25–40 |
| LoRA ecosystem | Growing | Minimal | Massive | Minimal | Minimal |
| Best for | Quality, research | Fast commercial gen | Fine-tuning, ControlNet | Consumer hardware + text | Quality + text rendering |
Short version: if you need the best images and don’t mind a non-commercial license, Flux.1 dev is the default. If you need commercial use under 16GB VRAM, pick SDXL or Flux.2 Klein 4B. If you’re specifically working with text-in-image or complex compositions and have NVIDIA RTX hardware, SD 3.5 Large with TensorRT is the case where SD 3.5 most clearly earns its place.
Flux: The Quality Benchmark
Black Forest Labs released the original Flux.1 family in mid-2024, and as of 2026 it remains the reference point for open-source image quality. The family has expanded substantially.
Flux.1 [dev] — The 12B reference model
Flux.1 [dev] is a 12-billion-parameter rectified flow transformer, distilled from the commercial Flux.1 [pro]. In practice, it consistently outperforms SDXL on prompt adherence, face detail, spatial coherence in complex multi-subject scenes, and rendered text inside images. Put them side by side with the same prompt and the gap is immediately obvious on photorealistic subjects.
The hardware cost is the main barrier. Running Flux.1 dev at full FP16 requires approximately 24GB VRAM — an RTX 3090, 4090, or A100. GGUF quantized variants shift this considerably: a Q4_K_M GGUF brings the requirement down to around 6GB VRAM, fitting an RTX 3060. There is a quality trade-off at that level — fine textures and high-frequency detail in faces suffer — but for most use cases it’s still better than stock SDXL.
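The FP16 and Q4 figures follow from simple weight-size arithmetic. Here is a rough sketch, assuming the weights dominate memory and ignoring text encoders, the VAE, and activations. The `weight_gb` helper is hypothetical and counts 1GB as 10^9 bytes:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Flux.1 [dev] has 12B parameters.
flux_fp16 = weight_gb(12, 16)  # FP16 stores 2 bytes per weight
flux_q4 = weight_gb(12, 4)     # nominal 4 bits; real Q4_K_M files use slightly more

print(f"Flux.1 dev FP16: ~{flux_fp16:.0f} GB, Q4: ~{flux_q4:.0f} GB")
```

Treat the result as a floor rather than a requirement: practical VRAM use sits above the weights-only estimate once text encoders and activations are loaded.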
The other barrier: Flux.1 [dev] ships under the FLUX.1-dev Non-Commercial License. You cannot use it in products or services. For production deployments, you need Flux.1 [schnell] or Flux.2 Klein.
Flux.1 [schnell] — Fast, Apache 2.0
Schnell uses 4 inference steps and is fully Apache 2.0 licensed. The quality gap versus dev is real but narrower than expected at 4 steps — it’s a solid choice for rapid iteration and applications where sub-second generation matters more than peak fidelity. Most commercial Flux deployments were running schnell before Klein arrived.
Flux.2 Klein — January 2026, the practical upgrade
On January 15, 2026, Black Forest Labs released FLUX.2 [klein] in two sizes.
The 4B variant (Apache 2.0) generates in 4 inference steps and runs in approximately 13GB VRAM at FP16. Quality sits between schnell and dev — better than schnell on complex prompts, not quite dev at full precision. For commercial applications on under-16GB hardware, this is now the first model to reach for.
The 9B variant raises quality further but reverts to a non-commercial license. At FP16 it needs 27–29GB VRAM; FP8 quantization brings this to roughly 14–16GB, which fits an RTX 4090 with text encoder offloading.
Running Klein 4B locally via the official inference repo:
```bash
# Clone the inference repo
git clone https://github.com/black-forest-labs/flux2
cd flux2

# Install dependencies
pip install -e ".[all]"

# Download Klein 4B weights
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
  --local-dir ./models/flux2-klein-4b
```
In ComfyUI, load the .safetensors checkpoint through the UNet Loader node — Klein 4B uses the same node structure as Flux.1, so existing workflows load without modification.
Flux.1 Kontext [dev] — A different tool entirely
Flux.1 Kontext [dev] is the same 12B architecture adapted for image-to-image editing via text instructions: change a background while keeping the subject, swap clothing, add objects, adjust lighting. It is not a replacement for Flux.1 dev for text-to-image generation — it’s a separate workflow for iterative image editing. License is non-commercial. If you’re building an editing tool, it’s the most capable open-weight option available as of mid-2026.
When NOT to use Flux
- You have 8GB VRAM and want clean output without GGUF artifacts — SDXL is the better fit.
- You need a mature LoRA library for a specific subject or style. Civitai has a fraction of SDXL’s catalog for Flux, and training Flux LoRAs requires more VRAM than SDXL Dreambooth.
- You need Flux.1 dev commercially — the non-commercial restriction applies to dev and the 9B Klein variant.
SDXL: The Ecosystem Workhorse
SDXL 1.0 is a 3.5B-parameter model that Stability AI released in 2023. In 2026, stock SDXL loses on raw quality to both Flux and SD 3.5 Large. It wins on something else: no other open-source image model has the same depth of community fine-tuning, custom checkpoints, LoRAs, and ControlNet adapters.
The hardware case
SDXL runs on 8GB VRAM — Stability AI’s own baseline recommendation. At 8GB, you get the base model at 1024×1024 without the refiner. Adding the refiner (the second-pass model that adds fine detail) pushes requirements to 12–16GB. For most use cases, 8GB gets you 90% of the SDXL experience.
The license is CreativeML Open RAIL++-M. Commercial use is allowed without revenue caps, subject to use-based restrictions (no illegal content, no harmful applications). This is more permissive than the Stability AI Community License on SD 3.5, which caps free commercial use at $1M annual revenue.
The ecosystem argument
Civitai hosts tens of thousands of SDXL LoRAs, fine-tuned checkpoints, embeddings, and ControlNet adapters — covering photography styles, anime, architecture, product visualization, and character consistency training. No other open model family comes close. Fine-tuning SDXL for a specific character or aesthetic using Dreambooth or LoRA training is thoroughly documented, with tooling in both ComfyUI and Automatic1111/Forge.
The ComfyUI custom nodes ecosystem for SDXL is particularly deep. SeargeSDXL, ComfyRoll, and the built-in SDXL nodes enable multi-ControlNet pipelines, LoRA stacking, aspect ratio management, and refiner scheduling in a single workflow. If you need a customized pipeline for a specific domain, SDXL is still where you start.
Where stock SDXL falls short
Raw text-to-image without fine-tuning: SDXL loses to Flux.1 dev on faces, complex multi-subject prompts, and text rendering in images. The gap is large enough to matter for photorealistic use cases. An SDXL checkpoint fine-tuned on domain-specific data often outperforms generic Flux.1 on that domain — but out of the box, SDXL generates images that look noticeably less detailed.
SDXL also caps out near 1024×1024. Push significantly above that without tiling and quality degrades. Flux and SD 3.5 handle higher resolutions more gracefully.
When NOT to use SDXL
- You need photorealistic faces or text rendered inside images — both Flux and SD 3.5 Large beat SDXL here.
- You’re generating at 2048×2048 or above without tiling — SDXL was trained at 1024px.
- You want the best available quality without fine-tuning — stock SDXL is clearly behind Flux.1 dev.
SD 3.5: The Narrow Use Case
Stability AI released SD 3.5 in late 2024 with two main variants: Large (8.1B parameters) and Medium (2.5B). The architecture is a genuine improvement — MMDiT (multimodal diffusion transformer) with triple text encoders (CLIP-L, CLIP-G, and T5-XXL). The T5 encoder is what gives SD 3.5 its text rendering advantage: it understands language relationships more deeply than SDXL’s CLIP-only setup.
SD 3.5 Medium
At 2.5B parameters, SD 3.5 Medium requires approximately 9.9GB VRAM excluding text encoders; in practice, expect around 12–14GB for the full setup. It fits on a 12GB card and generates at acceptable speeds on RTX 3080/4070-class hardware. Image quality beats SDXL on text rendering and complex compositional prompts, at a VRAM cost comparable to SDXL with its refiner (12–16GB) rather than the 8GB base model.
If you’re on a 12GB card and need better typography than SDXL produces, SD 3.5 Medium is worth testing.
SD 3.5 Large
SD 3.5 Large needs 16–18GB VRAM at FP16. Quality is better than SDXL and SD 3.5 Medium on detailed scenes, though most benchmark comparisons still put it behind Flux.1 dev at full precision.
The practical case for Large depends on hardware. With Stability AI’s TensorRT FP8 optimization, SD 3.5 Large runs in approximately 11GB VRAM at roughly 2.3× the speed of unoptimized FP16. On NVIDIA RTX 30xx and 40xx hardware, this brings Large within reach of 12GB cards and makes it competitive on speed with Flux.1 dev at similar VRAM budgets.
For batch inference or cloud-based generation pipelines, RunPod A100 instances handle SD 3.5 Large at FP16 comfortably for high-volume workflows.
The license is the Stability AI Community License: free for commercial use up to $1M annual revenue, paid enterprise license beyond that.
| SD 3.5 Variant | Practical VRAM | Parameters | Speed on RTX 4090 | Strength |
|---|---|---|---|---|
| Medium | ~12–14GB | 2.5B | Fast | Consumer hardware, typography |
| Large FP16 | ~16–18GB | 8.1B | Moderate | Composition, complex scenes |
| Large TensorRT FP8 | ~11GB | 8.1B | ~2.3× faster | Quality at lower VRAM on NVIDIA |
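The ~11GB figure for the TensorRT FP8 build lines up with the 40% VRAM reduction and 2.3× speedup quoted in the sources. Here is a quick check; the 10-second baseline is a made-up number for illustration only, not a measurement:

```python
FP16_VRAM_GB = 18      # upper end of the 16-18GB range quoted for Large at FP16
VRAM_REDUCTION = 0.40  # "40% less VRAM" from the TensorRT source
SPEEDUP = 2.3          # "2.3x faster" from the same source

# VRAM left after the quoted reduction.
fp8_vram_gb = FP16_VRAM_GB * (1 - VRAM_REDUCTION)

# Hypothetical FP16 time per image, used only to show how the speedup scales.
baseline_seconds = 10.0
fp8_seconds = baseline_seconds / SPEEDUP

print(f"FP8 VRAM: ~{fp8_vram_gb:.1f} GB, time per image: ~{fp8_seconds:.1f} s")
```

Starting from the lower end of the FP16 range (16GB) gives roughly 9.6GB instead, so "approximately 11GB" is best read as a conservative round number.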
The honest case against SD 3.5
For most users, SD 3.5 doesn’t beat Flux on quality or SDXL on ecosystem — it fills a gap that wasn’t the most pressing problem. Unless you have a specific reason for it (typography, MMDiT architecture research, or the TensorRT stack on existing NVIDIA infrastructure), you’ll get more value from Flux or SDXL for the same VRAM budget.
When NOT to use SD 3.5
- You have 8GB VRAM: SDXL is the only one of these three that runs at that level without heavy quantization (Flux.1 dev fits only as a Q4 GGUF).
- You need a large LoRA library for custom characters or styles — SDXL’s community catalog is incomparably deeper.
- You’re generating photorealistic portraits — Flux.1 dev still leads.
- You don’t have NVIDIA RTX hardware and need TensorRT benefits — the speed/VRAM gains for Large are NVIDIA-specific.
Running Multiple Models Is Normal
Production image generation pipelines in 2026 rarely rely on a single model. On a 24GB card, a practical rotation:
- Fast iteration: Flux.2 Klein 4B (4 steps, Apache 2.0, ~13GB) — test prompts quickly
- Production quality, non-commercial: Flux.1 dev GGUF Q4 (6GB) or FP16 (24GB) depending on VRAM headroom
- Style-specific output: SDXL 1.0 with domain-tuned checkpoint (8–16GB)
- Text-heavy images on NVIDIA RTX: SD 3.5 Large TensorRT FP8 (~11GB)
ComfyUI makes this switching nearly seamless — each model loads into the same workflow structure via different checkpoint nodes, and you can queue generations across model types without restarting.
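The rotation above can be sketched as a small lookup that picks the best variant fitting the free VRAM. This is a minimal illustration, not a ComfyUI feature: the task names and the `pick_variant` helper are hypothetical, the VRAM figures are the ones quoted in this article, and the 16GB SDXL entry assumes the upper end of the range corresponds to running with the refiner.

```python
from typing import Optional

# Variants per task, best quality first, with this article's VRAM figures in GB.
ROTATION = {
    "fast iteration": [("Flux.2 Klein 4B", 13)],
    "production quality": [("Flux.1 dev FP16", 24), ("Flux.1 dev GGUF Q4", 6)],
    "style-specific": [("SDXL with refiner", 16), ("SDXL base", 8)],
    "text-heavy": [("SD 3.5 Large TensorRT FP8", 11)],
}


def pick_variant(task: str, free_vram_gb: float) -> Optional[str]:
    """Return the first (highest-quality) variant that fits, or None."""
    for name, needed_gb in ROTATION[task]:
        if needed_gb <= free_vram_gb:
            return name
    return None


print(pick_variant("production quality", 24))  # Flux.1 dev FP16
print(pick_variant("production quality", 12))  # Flux.1 dev GGUF Q4
print(pick_variant("fast iteration", 8))       # None
```

Ordering each list from most to least demanding keeps the fallback logic to a single loop, and adding a Q8 variant later is a one-line change.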
For details on how quantization choices affect image quality when running lower-precision Flux variants, the same principles covered in the GGUF quantization guide apply — the trade-off between Q4 and Q8 precision maps directly to image sharpness at fine detail levels.
Hardware selection for image generation workloads is covered in depth at runaihome.com’s GPU guides.
Which Model Tier Wins
Flux wins on quality. Flux.1 dev at FP16 or GGUF Q8 is the best default for open-source image generation in 2026 if you don’t need commercial rights. For commercial use, Flux.2 Klein 4B (Apache 2.0, released January 2026) is the new first choice.
SDXL wins on ecosystem. Stock SDXL is showing its age against Flux on raw output, but a well-tuned SDXL checkpoint for a specific domain regularly outperforms generic Flux on that domain. Nothing else has years of community fine-tuning behind it.
SD 3.5 has a defensible niche. Medium on 12GB cards for typography. Large with TensorRT on NVIDIA RTX hardware when you need quality above SDXL without moving to Flux’s non-commercial licensing. Outside those two scenarios, choose Flux or SDXL first.
Pick based on your VRAM floor and commercial license requirements. That narrows the choice faster than quality comparisons alone.
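That two-step filter can be written down directly. The sketch below encodes the minimum-VRAM and license rows of the comparison table at the top of this article; the `Model` class and `shortlist` function are illustrative names, and the revenue cap models only the free tier of the Stability AI Community License (beyond it, a paid enterprise license applies).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    min_vram_gb: int
    commercial: bool
    revenue_cap_usd: Optional[int] = None  # None means no revenue cap


# Figures taken from the comparison table in this article.
MODELS = [
    Model("Flux.1 [dev]", 6, commercial=False),
    Model("Flux.2 Klein 4B", 13, commercial=True),
    Model("SDXL 1.0", 8, commercial=True),
    Model("SD 3.5 Medium", 10, commercial=True, revenue_cap_usd=1_000_000),
    Model("SD 3.5 Large", 11, commercial=True, revenue_cap_usd=1_000_000),
]


def shortlist(vram_gb: float, commercial_use: bool,
              annual_revenue_usd: int = 0) -> List[str]:
    """Names of models that fit the VRAM budget and are free to use as intended."""
    names = []
    for m in MODELS:
        if m.min_vram_gb > vram_gb:
            continue  # does not fit in memory
        if commercial_use:
            if not m.commercial:
                continue  # non-commercial license
            if m.revenue_cap_usd is not None and annual_revenue_usd > m.revenue_cap_usd:
                continue  # above the free commercial tier
        names.append(m.name)
    return names


print(shortlist(8, commercial_use=True))
print(shortlist(16, commercial_use=True, annual_revenue_usd=2_000_000))
```

With 8GB and a commercial requirement the list collapses to SDXL alone, which is the point of filtering on hardware and license before comparing output quality.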
Sources
- FLUX.2 Klein release announcement — 4B (Apache 2.0) and 9B (non-commercial), January 15, 2026
- FLUX.2 Klein 9B VRAM requirements — FP16 vs FP8 guide
- Image generation VRAM guide 2026 — Flux, SDXL, SD 3.5 compared
- Introducing Stable Diffusion 3.5 — Stability AI, model parameters and license terms
- SD 3.5 TensorRT optimization — 2.3× faster, 40% less VRAM on NVIDIA RTX
- SDXL 1.0 model license — CreativeML Open RAIL++-M, no revenue cap for commercial use
- SD 3.5 Medium spec — 2.5B parameters, 9.9GB VRAM, MMDiT-X architecture
- Flux.1 Kontext dev — open-weight image editing model, 12B parameters
- FLUX 2 Klein 4B guide — 13GB VRAM, 4-step generation, Apache 2.0