Stable Diffusion on 8GB VRAM 2026: SDXL vs Flux Guide
Eight-gigabyte GPUs cover much of the consumer market: RTX 3060 Ti, 3070, 4060, 4060 Ti (8GB variant), and AMD RX 7600. They’re powerful enough for real image generation work, but most tutorials assume a 24GB workstation card and skip over the workarounds that actually matter at this memory tier.
This guide is specifically for 8GB VRAM. You’ll get SDXL and Flux.1 running in ComfyUI v0.22.0, with the flags, model choices, and workflow nodes that actually work, along with an honest comparison of what you’re giving up against a larger GPU.
Cards this applies to
| GPU | VRAM | Notes |
|---|---|---|
| RTX 3060 Ti | 8 GB | GDDR6, handles SDXL fine |
| RTX 3070 / 3070 Ti | 8 GB | Faster than 3060 Ti, same VRAM ceiling |
| RTX 4060 | 8 GB | Efficient; good power-to-VRAM ratio |
| RTX 4060 Ti (8GB) | 8 GB | Not the 16GB variant |
| RX 7600 | 8 GB | AMD ROCm; same constraints, some caveats below |
The RX 6700 XT has 12 GB — if you have one, you’re above the 8GB floor and some of the tighter workarounds below are optional for you.
AMD note: ComfyUI’s GGUF quantization extension (city96/ComfyUI-GGUF) has weaker ROCm support than CUDA. If you’re on AMD, the SDXL path is more reliable than Flux GGUF for now.
What fits in 8GB out of the box
This table is the starting point. “VRAM (inference)” means peak usage during generation, not just model size on disk.
| Model | Precision | Peak inference VRAM | 8GB viable? | What’s needed |
|---|---|---|---|---|
| SDXL 1.0 (base) | FP16 | ~7–8 GB | Tight | --medvram + VAE fix |
| SDXL 1.0 (pruned) | FP16 | ~5.5–6.5 GB | Yes | Standard flags |
| Flux.1 Dev | FP16 | ~23 GB | No | Needs 24GB+ |
| Flux.1 Dev | GGUF Q5_K_S | ~7.5–8 GB | Yes | ComfyUI-GGUF extension |
| Flux.1 Dev | GGUF Q4_0 | ~6.5–7 GB | Yes | ComfyUI-GGUF; quality tradeoff |
| Flux.1 Schnell | GGUF Q5_K_S | ~7.5–8 GB | Yes | ComfyUI-GGUF; Apache 2.0 licensed |
Full FP16 Flux peaks around 23 GB, so no combination of flags makes it viable on an 8GB card. SDXL with the pruned weights and Flux GGUF quantized to Q5 are the two practical paths.
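If you want to sanity-check a card other than 8GB against the table, a small lookup helps. This is a minimal sketch: the figures are the upper bounds copied from the table above, and the `models_that_fit` helper and its `reserve_gb` parameter are illustrative names, not part of ComfyUI.

```python
# Peak inference VRAM (upper bound, in GB) copied from the table above.
MODELS = [
    ("SDXL 1.0 (base)", "FP16", 8.0),
    ("SDXL 1.0 (pruned)", "FP16", 6.5),
    ("Flux.1 Dev", "FP16", 23.0),
    ("Flux.1 Dev", "GGUF Q5_K_S", 8.0),
    ("Flux.1 Dev", "GGUF Q4_0", 7.0),
    ("Flux.1 Schnell", "GGUF Q5_K_S", 8.0),
]


def models_that_fit(vram_gb, reserve_gb=0.0):
    """Return (model, precision, headroom_gb) for entries whose peak fits.

    reserve_gb is a hypothetical allowance for whatever else is using the
    GPU (desktop compositor, browser, LoRAs).
    """
    budget = vram_gb - reserve_gb
    return [
        (name, precision, round(budget - peak, 2))
        for name, precision, peak in MODELS
        if peak <= budget
    ]


for name, precision, headroom in models_that_fit(8.0):
    print(f"{name} [{precision}]: {headroom} GB headroom")
```

Calling `models_that_fit(8.0, reserve_gb=1.0)` shows why the pruned SDXL weights and Q4_0 are the comfortable options once something else is already holding part of the card’s memory.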
ComfyUI setup for 8GB
The ComfyUI review covers the full installation. The relevant part here is the launch command — on 8GB cards you need the right flags before you load a single model.
Start with:
python main.py --medvram --force-fp16
If you still hit OOM errors, escalate to:
python main.py --lowvram --force-fp16
What each flag does:
- --medvram: offloads the text encoder to CPU during generation, freeing ~1–2 GB on the GPU. Mild speed penalty. The right starting point for 8–12 GB cards.
- --lowvram: moves model components off the GPU between operations. Slower than --medvram due to the constant CPU-GPU transfers, but fits tighter memory budgets.
- --force-fp16: loads models that default to FP32 in half precision instead. Halves their VRAM footprint with minimal quality impact.
- --cpu-vae: if you’re still crashing specifically at the VAE decode step (the final image render), this offloads the VAE to CPU entirely. Adds 10–30 seconds but prevents the crash.
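The escalation order can be written down as a small helper, which is handy if you launch ComfyUI from a script. This is a sketch under assumptions: the flag names are the ones this guide uses, the three-step ordering (with --cpu-vae as the last resort) is one reasonable reading of the advice above, and `launch_command` is a hypothetical helper. Flag names differ between front ends and versions, so `python main.py --help` is the authority on what your build accepts.

```python
import shlex

# Flag sets as named in this guide, ordered from least to most aggressive.
# The ordering itself is an assumption, not something ComfyUI defines.
ESCALATION = [
    ["--medvram", "--force-fp16"],
    ["--lowvram", "--force-fp16"],
    ["--lowvram", "--force-fp16", "--cpu-vae"],
]


def launch_command(level, entry_point="main.py"):
    """Build the launch command for an escalation level (0 = mildest)."""
    level = max(0, min(level, len(ESCALATION) - 1))  # clamp to known levels
    return ["python", entry_point, *ESCALATION[level]]


for level in range(len(ESCALATION)):
    # shlex.join renders the argument list as a copy-pasteable shell line.
    print(shlex.join(launch_command(level)))
```

Start at level 0 and only move up a level after an actual out-of-memory error, since each step trades speed for memory.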
On Windows using the portable ComfyUI install, add flags to run_nvidia_gpu.bat:
.\python_embeded\python.exe -s ComfyUI\main.py --medvram --force-fp16 --windows-standalone-build
Pick one attention optimization at a time — xFormers, Flash Attention, SageAttention, or attention slicing. Running multiple simultaneously causes unexpected behavior.
SDXL on 8GB VRAM
Model and license
SDXL 1.0 is released under the CreativeML OpenRAIL++-M license — restricted for certain commercial uses at the base weights level, though many community-fine-tuned derivatives carry their own licenses. Check before commercial deployment.
The base model is 6.9 GB on disk. The pruned version removes the EMA weights and comes to ~4.7 GB on disk, with substantially lower peak VRAM during inference. Use the pruned version on 8GB. The full base model will push you into --lowvram territory even with other flags.
Pruned SDXL base weights are widely available on HuggingFace and CivitAI under the original Stability AI release or as community repacks.
Fix the VAE before generating
The default SDXL VAE causes an out-of-memory crash specifically at the VAE decode step when generating at 1024×1024 on 8GB cards. The fix is the sdxl-vae-fp16-fix model (~319 MB), a precision-corrected VAE that avoids this failure mode.
Download it from HuggingFace (madebyollin/sdxl-vae-fp16-fix) and place it in ComfyUI/models/vae/. In your workflow, add a VAE Loader node pointing to this file, then wire it into your VAE Decode node instead of using the checkpoint’s bundled VAE.
Without this fix, 768×768 generates fine, but 1024×1024 runs out of memory at the decode step roughly half the time.
Resolution
SDXL’s training resolution is 1024×1024. Going below that degrades output quality — it wasn’t designed for 512×512 the way SD 1.5 was. On 8GB with the pruned model and VAE fix:
- 1024×1024: works with --medvram --force-fp16
- 1280×768 or 768×1280 (landscape/portrait): fine, similar VRAM
- Above 1536 on any edge: expect OOM or very slow generation
For final upscaled output, generate at 1024×1024 and run an upscale workflow afterward rather than generating large natively.
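To see why the two supported shapes cost about the same, it helps to look at pixel counts and latent sizes. The sketch below assumes the SDXL VAE’s 8× spatial downscale and treats the 1536-pixel edge from the list above as a hard limit; `check_resolution` is an illustrative helper, not a ComfyUI function.

```python
VAE_DOWNSCALE = 8  # SDXL's VAE maps each 8x8 pixel block to one latent cell
MAX_EDGE = 1536    # this guide's practical ceiling on an 8GB card


def check_resolution(width, height):
    """Return latent size, megapixels, and whether the guide's limit holds."""
    if width % VAE_DOWNSCALE or height % VAE_DOWNSCALE:
        raise ValueError("width and height should be multiples of 8")
    return {
        "latent": (width // VAE_DOWNSCALE, height // VAE_DOWNSCALE),
        "megapixels": round(width * height / 1_000_000, 2),
        "within_8gb_guidance": max(width, height) <= MAX_EDGE,
    }


for size in [(1024, 1024), (1280, 768), (2048, 2048)]:
    print(size, check_resolution(*size))
```

1024×1024 and 1280×768 land within a few percent of each other in pixel count, which is why they behave alike, while 2048×2048 has four times the pixels of 1024×1024.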
A working SDXL node graph
Standard ComfyUI workflow for SDXL:
- Load Checkpoint → pruned SDXL weights
- CLIP Text Encode × 2 → positive and negative prompt
- Empty Latent Image → 1024×1024, batch size 1
- KSampler → 20–25 steps, DPM++ 2M Karras, CFG 7
- VAE Loader → sdxl-vae-fp16-fix
- VAE Decode → latent from KSampler, VAE from VAE Loader
- Save Image
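The same graph can be expressed in ComfyUI’s API (JSON) workflow format, where each node is a dictionary entry and a link is written as [source node id, output index]. This is a sketch, not an exported workflow: both filenames are hypothetical placeholders for whatever you saved in models/checkpoints and models/vae, and `sdxl_workflow` is an illustrative helper.

```python
import json


def sdxl_workflow(prompt, negative="", seed=0,
                  ckpt="sdxl_base_pruned.safetensors",   # hypothetical filename
                  vae="sdxl-vae-fp16-fix.safetensors"):  # hypothetical filename
    """Return the SDXL node graph as a ComfyUI API-format dict."""
    return {
        # Load Checkpoint outputs: 0 = MODEL, 1 = CLIP, 2 = VAE (unused here)
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        # Separate VAE Loader so decode uses the fp16-fix VAE, not the bundled one
        "6": {"class_type": "VAELoader", "inputs": {"vae_name": vae}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["6", 0]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "sdxl_8gb"}},
    }


print(json.dumps(sdxl_workflow("a lighthouse at dusk"), indent=2))
```

The detail that matters on 8GB is node 7: its vae input points at node 6 (the VAE Loader) rather than at output 2 of the checkpoint loader.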
The ComfyUI custom nodes guide covers node packs worth adding on top of this baseline.
Flux.1 on 8GB VRAM
Flux.1 is a large diffusion transformer from Black Forest Labs. At FP16 it’s not remotely close to fitting in 8GB. GGUF quantization — the same format used for LLM compression — brings it into range.
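A back-of-the-envelope calculation shows why quantization is what makes the difference. The sketch below assumes roughly 12 billion parameters for Flux.1 Dev and the nominal bits-per-weight of each format (16 for FP16, about 5.5 for Q5_K_S, about 4.5 for Q4_0). It counts weights only, so real files and peak VRAM will differ: GGUF files keep some tensors at higher precision, and inference adds activations, text encoders, and the VAE.

```python
PARAMS = 12e9  # assumption: Flux.1 Dev has roughly 12 billion parameters

# Approximate bits stored per weight, including per-block scale overhead.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q5_K_S": 5.5,
    "Q4_0": 4.5,
}


def weight_size_gb(params, bits_per_weight):
    """Size of the weights alone in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


for fmt, bits in BITS_PER_WEIGHT.items():
    print(f"{fmt}: ~{weight_size_gb(PARAMS, bits):.1f} GB of weights")
```

The FP16 figure comes out at about three times the card’s total memory, while both quantized formats land in the single-digit range the table reports.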
Licenses
- Flux.1 Dev: FLUX.1-dev Non-Commercial License. Personal use, research, education. Cannot train competing models on it. Outputs can be used commercially, with restrictions.
- Flux.1 Schnell: Apache 2.0. Fully open for commercial use.
If you need commercial licensing for generation work, Schnell is the clear choice. Quality is lower than Dev but still well above SDXL for prompt adherence.
Installing ComfyUI-GGUF
Via ComfyUI Manager (recommended): search for “ComfyUI-GGUF” by city96, install, restart.
Manual:
cd ComfyUI/custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF
pip install -r ComfyUI-GGUF/requirements.txt
After restart, a “bootleg” node category appears. That’s where the GGUF-specific loaders live.
What to download
Three model components are required:
1. The UNet (main model) — the quantized .gguf file. Place in ComfyUI/models/unet/
- flux1-dev-Q5_K_S.gguf (~8.1 GB on disk): recommended for 8GB; best quality at this VRAM tier
- flux1-dev-Q4_0.gguf (~7.1 GB): use if Q5 is too tight or you’re stacking LoRAs
2. Text encoders — place in ComfyUI/models/clip/
- clip_l.safetensors (~235 MB)
- t5xxl_fp8_e4m3fn.safetensors: T5-XXL at fp8 precision; use fp8 rather than fp16 to keep VRAM under control
3. VAE — ae.safetensors (~335 MB). Place in ComfyUI/models/vae/
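All files are on HuggingFace. The original weights and VAE are under the black-forest-labs organization, the quantized .gguf files are published by city96 (for example city96/FLUX.1-dev-gguf), and the text encoders are in comfyanonymous/flux_text_encoders. Mirrors exist on CivitAI.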
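Most “model not found” dropdowns in the Flux loaders come down to a file sitting in the wrong folder. A quick check like the one below can confirm the layout before you open ComfyUI. It’s a sketch: the relative paths follow the placement described above, the list assumes the Q5_K_S file, and `missing_files` is an illustrative helper.

```python
from pathlib import Path

# Relative to the ComfyUI folder; swap in the Q4_0 filename if you chose that one.
REQUIRED_FILES = [
    "models/unet/flux1-dev-Q5_K_S.gguf",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp8_e4m3fn.safetensors",
    "models/vae/ae.safetensors",
]


def missing_files(comfy_root, required=REQUIRED_FILES):
    """Return the required files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in required if not (root / rel).is_file()]


# "ComfyUI" is assumed to be the install folder relative to where you run this.
for rel in missing_files("ComfyUI"):
    print(f"missing: {rel}")
```

An empty result means all four pieces are where the loader nodes will look for them.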
The Flux node graph
Flux does not use the standard Load Checkpoint node — it’s loaded in pieces:
- Unet Loader (GGUF) → select your .gguf UNet file
- DualCLIPLoader (GGUF) → clip_l.safetensors + t5xxl_fp8_e4m3fn.safetensors
- CLIP Text Encode (Flux) → your prompt (Flux ignores negative prompts, so don’t bother)
- Empty SD3 Latent Image → 1024×1024
- ModelSamplingFlux → guidance 3.5, shift 1.15
- KSampler or SamplerCustomAdvanced → 25–28 steps, Euler
- VAE Loader → ae.safetensors
- VAE Decode
- Save Image
Flux’s text adherence is significantly better than SDXL’s — short, precise prompts work better than the long trigger-word-laden strings common in SDXL LoRA workflows. Describe what you want; the model follows.
Q4 vs Q5: what you’re trading
Community-collected benchmarks put Q5_K_S at roughly 95% quality retention compared to full FP16, with the remaining 5% visible only in fine details — intricate text rendering, very fine patterns, and high-zoom facial features. Q4_0 drops to around 75–85% retention, with the degradation showing more in faces, skin textures, and complex scene composition.
For most 8GB use cases: Q5_K_S. Use Q4_0 only when you need the VRAM headroom for LoRAs or when generating at high resolution where Q5 pushes against the limit.
SDXL vs Flux on 8GB: side by side
| | SDXL 1.0 (pruned FP16) | Flux.1 Dev (GGUF Q5_K_S) |
|---|---|---|
| Peak inference VRAM | ~5.5–6.5 GB | ~7.5–8 GB |
| VRAM headroom for LoRAs | Comfortable | Tight |
| Prompt adherence | Good | Excellent |
| Photorealism ceiling | Good | Superior |
| Generation speed | Faster | Significantly slower |
| LoRA ecosystem | Very large, mature | Growing quickly |
| ControlNet support | Mature | Limited, improving |
| License | CreativeML OpenRAIL++-M | FLUX.1-dev Non-Commercial |
| Commercial option | Via licensed derivatives | Schnell (Apache 2.0) |
| Setup complexity | Low | Medium |
Pick SDXL if: faster iteration matters, you rely on LoRAs and ControlNet from the existing SDXL ecosystem, or you’re on AMD where GGUF has rougher support.
Pick Flux if: prompt fidelity and photorealism are the priority, you’re doing personal or non-commercial work, and you’re on CUDA hardware.
The honest day-to-day difference on an RTX 3070: SDXL generates images noticeably faster with excellent results for stylized and fine-tuned subjects. Flux takes longer but handles complex multi-subject prompts and photorealistic scenes that would require significant prompt engineering to get right in SDXL. Neither is strictly better — they optimize for different things.
When 8GB stops being enough
Some workflows hit a hard ceiling regardless of flags and quantization:
Video generation: AnimateDiff, Wan, and other temporal diffusion models need substantially more VRAM for the attention across frames. 8GB produces very short clips and crashes on anything ambitious.
LoRA training: Training SDXL LoRAs in Kohya SS needs 12–16 GB minimum. Fine-tuning Flux requires 24 GB+. Running inference on a trained model is one thing; training it is another category entirely.
Stacking multiple LoRAs on Flux: Two LoRAs on Flux Dev Q5_K_S will push past 8GB on most cards. Either drop to Q4 or accept one LoRA at a time.
Upscaled generation at native large resolutions: Generating at 2048×2048 natively isn’t feasible on 8GB. Upscale from 1024 instead.
For training workflows, RunPod rents A40 GPUs (48 GB VRAM) at under $0.30/hr — cheaper than buying a 24GB card if training is occasional rather than daily. For permanent home lab hardware recommendations and a breakdown of the 16GB and 24GB consumer tiers, see the GPU guides at runaihome.com.
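The rent-versus-buy question is simple arithmetic once you plug in a card price. The sketch below uses the $0.30/hr figure above as the default rate; the $1,500 card price is a placeholder for illustration, not a quote, and the calculation ignores electricity, storage fees, and resale value.

```python
def break_even_hours(card_price_usd, hourly_rate_usd=0.30):
    """Hours of rental that cost as much as buying the card outright."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return card_price_usd / hourly_rate_usd


hours = break_even_hours(1500)  # 1500 is a placeholder price, not a quote
print(f"{hours:.0f} rental hours, about {hours / 10:.0f} weeks at 10 h/week")
```

If your expected training time over the card’s useful life is well below the break-even figure, renting is the cheaper route.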
Sources
- ComfyUI Releases — Comfy-Org/ComfyUI — v0.22.0 release date
- Image Generation VRAM Requirements 2026: Flux, SDXL, SD 3.5 Compared — VRAM usage by model and precision
- FLUX GGUF Quantization: Run FLUX on 8GB VRAM (2026) — Q4/Q5 VRAM figures and quality retention estimates
- VRAM Optimization Flags Explained — ComfyUI Guide — --lowvram, --medvram, --force-fp16 behavior
- ComfyUI-GGUF by city96 — GGUF loader extension, node names
- Strange VRAM consumption (SDXL barely fits in 8GB VRAM) — ComfyUI Issue #2855 — sdxl-vae-fp16-fix recommendation and 319 MB size
- FLUX.1 Model Licenses — black-forest-labs/flux — Schnell Apache 2.0, Dev Non-Commercial license terms
- SDXL System Requirements — stablediffusionxl.com — 8GB VRAM minimum requirement, CreativeML OpenRAIL++-M license
- GPU Buying Guide for AI Art — ComfyUI Wiki — consumer GPU tier overview