How to Install Stable Diffusion Locally: Forge, ComfyUI & AUTOMATIC1111 Setup Guide (2026)
Step-by-step guide to installing Stable Diffusion locally with Forge (an A1111 fork), ComfyUI, and AUTOMATIC1111. Covers GPU requirements, model downloads, and recommended settings for beginners in 2026.
Stable Diffusion runs on your own hardware, generates images in seconds, and costs nothing after the initial GPU investment. No subscriptions, no content policies, no API limits.
The catch? Getting started can be confusing. Three major interfaces, dozens of model versions, hundreds of extensions — and the community assumes you already know what you're doing.
This guide gets you generating images in under 10 minutes. We'll pick the right UI for you, install it, download a model, and create your first image. Everything else — extensions, LoRAs, advanced workflows — comes after.
Step 0: Pick Your Interface
Three UIs dominate in 2026. Each runs Stable Diffusion under the hood — the difference is how you interact with it.
| Interface | Best For | Learning Curve | VRAM Optimization |
|---|---|---|---|
| Stable Diffusion Forge | Most users (recommended) | Low | Excellent |
| ComfyUI | Power users, complex workflows | High | Best |
| AUTOMATIC1111 | Legacy users, extension compatibility | Medium | Adequate |
Our Pick: Forge
Forge is a fork of AUTOMATIC1111 with one critical improvement: it uses 30-50% less VRAM for the same image quality. Same familiar UI, same extensions, dramatically better performance.
Unless you have a specific reason to pick something else:
- Pick ComfyUI if you want node-based workflows (think visual programming for image generation). Steeper learning curve, but more flexible for complex pipelines.
- Pick AUTOMATIC1111 only if you need a specific extension that doesn't work with Forge (rare in 2026, but it happens).
For a deeper comparison of image generation tools, see our ComfyUI vs InvokeAI comparison.
System Requirements
Minimum (Usable)
- GPU: NVIDIA with 8GB VRAM (RTX 3060, RTX 4060)
- RAM: 16GB system RAM
- Storage: 20GB free (base install + one model)
- OS: Windows 10/11 or Linux (Ubuntu 22.04+)
At 8GB VRAM you can run SD 1.5 models comfortably and SDXL with Forge's optimizations. Expect 512×512 images in ~5 seconds and 1024×1024 in ~15 seconds.
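If you're not sure how much VRAM your card has, `nvidia-smi` can report it. A quick check, assuming NVIDIA drivers are installed (the fallback message just covers machines without them):

```shell
# Report GPU name and total VRAM; print a hint if the driver tools are absent
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
  || echo "nvidia-smi not found - install NVIDIA drivers first"
```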
Recommended (Good Experience)
- GPU: NVIDIA with 12-16GB VRAM (RTX 4070 Ti, RTX 4080)
- RAM: 32GB system RAM
- Storage: 50GB free
At 12-16GB you can run SDXL at full resolution with ControlNet, ADetailer, and multiple LoRAs loaded simultaneously. Comfortable batch generation.
Ideal (No Compromises)
At 24GB you can run Flux models (the newest generation), generate at high resolution with upscaling, and use every extension without worrying about memory. The RTX 4090 is the gold standard for local image generation.
If you're building a dedicated AI machine, see our Home AI Server Build Guide. For pure GPU comparisons, see Best GPU for AI 2026.
> Budget pick: The RTX 3060 12GB can be found used for under $200 and handles SDXL with Forge. Best value entry point for AI image generation.
> *Disclosure: GPU links are Amazon affiliate links. We earn a commission at no extra cost to you.*
AMD and Mac Users
AMD GPUs work with DirectML (Windows) or ROCm (Linux), but performance is 30-50% slower than equivalent NVIDIA cards and some extensions don't support them. If you're buying a GPU specifically for Stable Diffusion, go NVIDIA.
Macs with Apple Silicon (M1/M2/M3/M4) work via the MPS backend. Generation is slow compared to NVIDIA but functional. If you want future-proof desktop performance instead, the RTX 5090 is worth considering.
Install: Windows
Forge (Recommended)
```shell
# 1. Install Python 3.10.x (NOT 3.11+)
#    Download from python.org, check "Add to PATH"
# 2. Install Git
#    Download from git-scm.com
# 3. Clone Forge
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge
# 4. Run the launcher (first run downloads dependencies, ~5-10 min)
webui-user.bat
```
Forge opens in your browser at http://127.0.0.1:7860. First launch takes 5-10 minutes while it downloads PyTorch and dependencies.
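Launch flags (a custom port, memory options, and so on) go in `webui-user.bat`. A minimal sketch of the file, following the `COMMANDLINE_ARGS` convention Forge inherits from A1111 (the `--port` value here is just an example):

```bat
:: webui-user.bat - put launch flags in COMMANDLINE_ARGS
@echo off
set COMMANDLINE_ARGS=--port 7860
call webui.bat
```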
ComfyUI
```shell
# 1. Download the portable package (easiest)
#    Go to: github.com/comfyanonymous/ComfyUI/releases
#    Download ComfyUI_windows_portable.zip
# 2. Extract and run
#    Unzip, then run nvidia_gpu\run_nvidia_gpu.bat
```
ComfyUI opens at http://127.0.0.1:8188. The node interface looks complex at first — start with the default workflow (it's loaded automatically).
Install: Linux
Forge
```shell
# 1. Install prerequisites
sudo apt update
sudo apt install python3.10 python3.10-venv git wget -y
# 2. Clone Forge
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge
# 3. Launch (first run installs everything)
bash webui.sh
```
ComfyUI
```shell
# 1. Clone and set up a virtual environment
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3.10 -m venv venv
source venv/bin/activate
# 2. Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# 3. Launch
python main.py
```
Download Your First Model
The UI is installed, but you need a model (checkpoint) to generate images. Here's what to start with:
Model Comparison
| Model | Type | VRAM Needed | Best For | Resolution |
|---|---|---|---|---|
| SDXL 1.0 | Base | 8GB+ | Photorealism, general use | 1024×1024 |
| Juggernaut XL | SDXL fine-tune | 8GB+ | Photorealistic people, cinema | 1024×1024 |
| DreamShaper XL | SDXL fine-tune | 8GB+ | Fantasy, concept art | 1024×1024 |
| Pony Diffusion V6 | SDXL fine-tune | 8GB+ | Anime, illustration | 1024×1024 |
| Flux.1 Dev | New architecture | 16GB+ | Best overall quality (2026) | Variable |
| Flux.1 Schnell | New architecture | 12GB+ | Fast Flux generation | Variable |
Our Recommendation for Beginners
Start with Juggernaut XL. It's the most versatile SDXL fine-tune — excellent at photorealism, people, landscapes, and products. Download from CivitAI:
1. Go to civitai.com
2. Search "Juggernaut XL"
3. Download the latest version (.safetensors file)
4. Place it in: `stable-diffusion-webui-forge/models/Stable-diffusion/`
5. Refresh models in the UI dropdown
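Steps 4-5 can be sketched from a terminal. The filename below is a placeholder; use whatever CivitAI actually gives you:

```shell
# Create the checkpoint folder if needed and move the download into it.
# "juggernautXL.safetensors" is a hypothetical filename.
mkdir -p stable-diffusion-webui-forge/models/Stable-diffusion
mv ~/Downloads/juggernautXL.safetensors \
   stable-diffusion-webui-forge/models/Stable-diffusion/ 2>/dev/null \
  || echo "move your downloaded .safetensors file into models/Stable-diffusion/ manually"
ls stable-diffusion-webui-forge/models/Stable-diffusion/
```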
If you have 16GB+ VRAM, try Flux.1 Dev for the best image quality available locally in 2026. It handles text in images, complex compositions, and natural lighting better than any SDXL model.
Generate Your First Image
1. Select your model from the dropdown (top-left in Forge)
2. Enter a prompt: `professional photo of a mountain landscape at golden hour, dramatic lighting, 8k`
3. Negative prompt: `blurry, low quality, watermark, text, deformed`
4. Settings: 1024×1024, 25 steps, CFG 7
5. Click Generate
Your first image should appear in 5-30 seconds depending on your GPU.
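The same generation can also be scripted: Forge keeps A1111's API, exposed at `/sdapi/v1/txt2img` when you launch with the `--api` flag. A sketch (the server must actually be running for this to return an image; the fallback message covers the case where it isn't):

```shell
# Request one 1024x1024 image from a locally running Forge instance.
# Requires Forge started with --api; the JSON response contains base64-encoded images.
curl -s --fail http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "professional photo of a mountain landscape at golden hour, dramatic lighting, 8k",
    "negative_prompt": "blurry, low quality, watermark, text, deformed",
    "width": 1024, "height": 1024, "steps": 25, "cfg_scale": 7
  }' || echo "Forge is not running, or was not started with --api"
```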
Essential Extensions
Once you're generating images, these extensions will significantly improve quality:
ADetailer (Face/Hand Fix)
The most important extension. AI faces and hands often have subtle defects. ADetailer automatically detects faces and hands, then regenerates just those areas at higher detail.
Install in Forge: Extensions tab → Install from URL → https://github.com/Bing-su/adetailer.git
ControlNet
Control composition precisely. Upload a reference image or pose sketch, and ControlNet makes Stable Diffusion follow that structure. Essential for:
- Matching specific poses
- Maintaining consistent characters
- Architectural renders from sketches
- Depth-guided generation
Install in Forge: Extensions tab → Available → search "ControlNet" → Install
LoRA Loading
LoRAs are small model add-ons that teach Stable Diffusion specific styles, characters, or concepts. They're tiny (20-200MB vs 6GB for a full model) and stack.
LoRAs load automatically in Forge — just place `.safetensors` files in `models/Lora/` and reference them in your prompt with the `<lora:filename:weight>` syntax, e.g. `<lora:my_style:0.8>`.
Troubleshooting
"CUDA out of memory"
Your image settings exceed available VRAM. Solutions:
- Lower resolution (try 768×768 instead of 1024×1024)
- Reduce batch size to 1
- In Forge: enable `--medvram` or `--lowvram` in `webui-user.bat`/`webui-user.sh`
- Disable extensions you're not using (especially ControlNet when not needed)
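Where the flag goes, as a sketch (Linux shown; on Windows, edit the `set COMMANDLINE_ARGS=` line in `webui-user.bat` instead):

```shell
# webui-user.sh - launch flags go in the COMMANDLINE_ARGS export
export COMMANDLINE_ARGS="--medvram"
```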
"No module named torch"
Python environment issue. Solutions:
- Make sure you're using Python 3.10 (not 3.11 or 3.12)
- Delete the `venv/` folder and relaunch (it recreates the environment)
Black or pure-noise images
Usually a VAE issue. Solutions:
- In Settings → Stable Diffusion → set VAE to "Automatic"
- If using SDXL, make sure the SDXL VAE is downloaded and selected
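If the SDXL VAE is missing, it can be fetched from Hugging Face into Forge's VAE folder. A sketch, assuming the `stabilityai/sdxl-vae` repository path and that you run it from the Forge root:

```shell
# Download the SDXL VAE into models/VAE; -nc skips the download if it's already there
mkdir -p models/VAE
wget -nc -P models/VAE \
  https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors \
  || echo "download failed - fetch sdxl_vae.safetensors manually and place it in models/VAE/"
```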
Extremely slow generation (minutes per image)
Model is running on CPU instead of GPU. Check:
- NVIDIA drivers are installed (`nvidia-smi` should show your GPU)
- PyTorch CUDA version matches your driver (`python -c "import torch; print(torch.cuda.is_available())"` should return `True`)
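Both checks in one go, degrading gracefully when a tool is missing:

```shell
# GPU diagnostics: driver visibility first, then PyTorch's view of CUDA
nvidia-smi -L 2>/dev/null || echo "no NVIDIA driver detected"
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())" 2>/dev/null \
  || echo "PyTorch not importable - activate the venv first"
```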
Extensions crashing Forge
Some A1111 extensions aren't compatible:
- Disable the extension (Extensions tab → uncheck → Apply and restart)
- Check the extension's GitHub for Forge compatibility notes
- If you absolutely need it, try A1111 instead
What's Next?
Once you're comfortable generating basic images:
1. Explore CivitAI for community models and LoRAs — thousands of free styles and concepts
2. Learn img2img — use existing images as starting points for generation
3. Try inpainting — edit specific parts of an image while keeping the rest
4. Experiment with ControlNet — the single biggest quality-of-life upgrade
5. Build ComfyUI workflows — when you outgrow Forge's single-prompt interface
For understanding how model compression affects image quality, see our What is Quantization guide.
The Bottom Line
Stable Diffusion in 2026 is genuinely easy to run locally. Forge plus a $200 used GPU gets you unlimited, private AI image generation that rivals cloud services.
Quickstart:
1. Install Forge (5 min)
2. Download Juggernaut XL from CivitAI (2 min)
3. Generate your first image
Everything else — extensions, advanced models, complex workflows — builds on top of that foundation. Start simple, add complexity when you need it.
*Related: ComfyUI vs InvokeAI | Best GPU for AI 2026 | Home AI Server Build Guide | What is Quantization?*
Frequently Asked Questions
What GPU do I need to run Stable Diffusion locally?
A GPU with 8GB+ VRAM is the recommended minimum (matching the system requirements above). The NVIDIA RTX 3060 (12GB) is the sweet spot — widely available used for under $200 and handles SDXL well. AMD GPUs work with ROCm on Linux but have less community support. 6GB and even 4GB VRAM cards can work with optimizations like `--medvram` but will be slow on SDXL models.
What is the difference between Stable Diffusion A1111, Forge, and ComfyUI?
A1111 (AUTOMATIC1111) is the original web UI — huge extension ecosystem, great documentation. Forge is a fork of A1111 optimized for speed and lower VRAM usage, recommended for most users in 2026. ComfyUI is a node-based workflow editor — more complex but more flexible for advanced pipelines.
How long does Stable Diffusion take to generate an image?
On a modern GPU (RTX 3060+), SDXL generates a 1024×1024 image in 10–30 seconds with 20–30 steps. SDXL Turbo and Lightning models can do 4-step generation in 2–5 seconds. Speed depends on GPU VRAM, resolution, step count, and the sampler used.
What is the best Stable Diffusion model in 2026?
For photorealism: Juggernaut XL or RealVisXL. For anime/illustration: Pony Diffusion or NoobAI XL. For general use: Stable Diffusion XL base with a refiner. All are free on CivitAI. SD 3.5 Medium is also excellent for quality if you have 8GB+ VRAM.
Can Stable Diffusion run on a CPU (without GPU)?
Yes, but it is extremely slow — 10–30 minutes per image. A GPU with CUDA (NVIDIA) or ROCm (AMD) is strongly recommended for practical use. Apple Silicon Macs can use the MPS backend and generate images in 30–60 seconds, which is usable.
Is Stable Diffusion free to use commercially?
Most models use the CreativeML OpenRAIL-M license which allows commercial use with restrictions (no harmful content, no misrepresentation as real photos of identifiable people). Always check the specific model license on CivitAI — some fine-tuned models restrict commercial use.