How to Install Stable Diffusion Locally: Forge, ComfyUI & Fooocus Setup Guide (2026)

Step-by-step guide to installing Stable Diffusion locally with Forge (A1111), ComfyUI, and Fooocus. Covers GPU requirements, model downloads, and recommended settings for beginners in 2026.

March 16, 2026·8 min read·1,626 words

Stable Diffusion runs on your own hardware, generates images in seconds, and costs nothing after the initial GPU investment. No subscriptions, no content policies, no API limits.

The catch? Getting started can be confusing. Three major interfaces, dozens of model versions, hundreds of extensions — and the community assumes you already know what you're doing.

This guide gets you generating images in under 10 minutes. We'll pick the right UI for you, install it, download a model, and create your first image. Everything else — extensions, LoRAs, advanced workflows — comes after.

Step 0: Pick Your Interface

Three UIs dominate in 2026. Each runs Stable Diffusion under the hood — the difference is how you interact with it.

| Interface | Best For | Learning Curve | VRAM Optimization |
|---|---|---|---|
| Stable Diffusion Forge | Most users (recommended) | Low | Excellent |
| ComfyUI | Power users, complex workflows | High | Best |
| AUTOMATIC1111 | Legacy users, extension compatibility | Medium | Adequate |

Our Pick: Forge

Forge is a fork of AUTOMATIC1111 with one critical improvement: it uses 30-50% less VRAM for the same image quality. Same familiar UI, same extensions, dramatically better performance.

Unless you have a specific reason to pick something else:

  • Pick ComfyUI if you want node-based workflows (think visual programming for image generation). Steeper learning curve, but more flexible for complex pipelines.
  • Pick AUTOMATIC1111 only if you need a specific extension that doesn't work with Forge (rare in 2026, but it happens).

For a deeper comparison of image generation tools, see our ComfyUI vs InvokeAI comparison.

System Requirements

Minimum (Usable)

  • GPU: NVIDIA with 8GB VRAM (RTX 3060, RTX 4060)
  • RAM: 16GB system RAM
  • Storage: 20GB free (base install + one model)
  • OS: Windows 10/11 or Linux (Ubuntu 22.04+)

At 8GB VRAM you can run SD 1.5 models comfortably and SDXL with Forge's optimizations. Expect 512×512 images in ~5 seconds and 1024×1024 in ~15 seconds.

Recommended (Comfortable)

  • GPU: NVIDIA with 12-16GB VRAM (RTX 4070 Ti, RTX 4080)
  • RAM: 32GB system RAM
  • Storage: 50GB free

At 12-16GB you can run SDXL at full resolution with ControlNet, ADetailer, and multiple LoRAs loaded simultaneously. Comfortable batch generation.

Ideal (No Compromises)

  • GPU: NVIDIA with 24GB VRAM (RTX 4090 or RTX 3090)
  • RAM: 64GB system RAM
  • Storage: 100GB+ free

At 24GB you can run Flux models (the newest generation), generate at high resolution with upscaling, and use every extension without worrying about memory. The RTX 4090 is the gold standard for local image generation.

If you're building a dedicated AI machine, see our Home AI Server Build Guide. For pure GPU comparisons, see Best GPU for AI 2026.

> Budget pick: The RTX 3060 12GB can be found used for under $200 and handles SDXL with Forge. Best value entry point for AI image generation.

> *Disclosure: GPU links are Amazon affiliate links. We earn a commission at no extra cost to you.*

AMD and Mac Users

AMD GPUs work with DirectML (Windows) or ROCm (Linux), but performance is 30-50% slower than equivalent NVIDIA cards and some extensions don't support them. If you're buying a GPU specifically for Stable Diffusion, go NVIDIA.

Mac with Apple Silicon (M1/M2/M3/M4) works via the MPS backend. Generation is slow compared to NVIDIA but functional. If you want future-proof performance on the NVIDIA side, the RTX 5090 is worth considering.

Install: Windows

Forge


# 1. Install Python 3.10.x (NOT 3.11+)
# Download from python.org, check "Add to PATH"

# 2. Install Git
# Download from git-scm.com

# 3. Clone Forge
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge

# 4. Run the launcher (first run downloads dependencies ~5-10 min)
webui-user.bat

Forge opens in your browser at http://127.0.0.1:7860. First launch takes 5-10 minutes while it downloads PyTorch and dependencies.

ComfyUI


# 1. Download the portable package (easiest)
# Go to: github.com/comfyanonymous/ComfyUI/releases
# Download ComfyUI_windows_portable.zip

# 2. Extract and run
# Unzip → run nvidia_gpu\run_nvidia_gpu.bat

ComfyUI opens at http://127.0.0.1:8188. The node interface looks complex at first — start with the default workflow (it's loaded automatically).

Install: Linux

Forge


# 1. Install prerequisites
sudo apt update
sudo apt install python3.10 python3.10-venv git wget -y

# 2. Clone Forge
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge

# 3. Launch (first run installs everything)
bash webui.sh
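
Launch flags persist if you put them in webui-user.sh (on Windows, the matching line in webui-user.bat is set COMMANDLINE_ARGS=...). A minimal sketch — the flags are optional, and which ones you need depends on your GPU:

```shell
# webui-user.sh — picked up by webui.sh on every launch
# --medvram / --lowvram: trade speed for lower VRAM use on 8GB cards
# --api: expose the HTTP API alongside the web UI
export COMMANDLINE_ARGS="--medvram --api"
```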

ComfyUI


# 1. Clone and setup
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3.10 -m venv venv
source venv/bin/activate

# 2. Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# 3. Launch
python main.py

Download Your First Model

The UI is installed, but you need a model (checkpoint) to generate images. Here's what to start with:

Model Comparison

| Model | Type | VRAM Needed | Best For | Resolution |
|---|---|---|---|---|
| SDXL 1.0 | Base | 8GB+ | Photorealism, general use | 1024×1024 |
| Juggernaut XL | SDXL fine-tune | 8GB+ | Photorealistic people, cinema | 1024×1024 |
| DreamShaper XL | SDXL fine-tune | 8GB+ | Fantasy, concept art | 1024×1024 |
| Pony Diffusion V6 | SDXL fine-tune | 8GB+ | Anime, illustration | 1024×1024 |
| Flux.1 Dev | New architecture | 16GB+ | Best overall quality (2026) | Variable |
| Flux.1 Schnell | New architecture | 12GB+ | Fast Flux generation | Variable |

Our Recommendation for Beginners

Start with Juggernaut XL. It's the most versatile SDXL fine-tune — excellent at photorealism, people, landscapes, and products. Download from CivitAI:

1. Go to civitai.com

2. Search "Juggernaut XL"

3. Download the latest version (.safetensors file)

4. Place it in: stable-diffusion-webui-forge/models/Stable-diffusion/

5. Refresh models in the UI dropdown
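
The file-placement steps above can be sketched in shell. MODEL_URL is a placeholder — copy the actual .safetensors link from the model page (CivitAI downloads often require being logged in, so the browser route is usually easier):

```shell
# Forge looks for checkpoints in models/Stable-diffusion under its install directory
MODEL_DIR="stable-diffusion-webui-forge/models/Stable-diffusion"
mkdir -p "$MODEL_DIR"

# Placeholder: paste the real download link from civitai.com before running
# wget -O "$MODEL_DIR/juggernautXL.safetensors" "$MODEL_URL"
```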

If you have 16GB+ VRAM, try Flux.1 Dev for the best image quality available locally in 2026. It handles text in images, complex compositions, and natural lighting better than any SDXL model.

Generate Your First Image

1. Select your model from the dropdown (top-left in Forge)

2. Enter a prompt: professional photo of a mountain landscape at golden hour, dramatic lighting, 8k

3. Negative prompt: blurry, low quality, watermark, text, deformed

4. Settings: 1024×1024, 25 steps, CFG 7

5. Click Generate

Your first image should appear in 5-30 seconds depending on your GPU.
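
The same generation can also be scripted: Forge inherits A1111's HTTP API when launched with the --api flag. A sketch assuming the default address and the A1111-compatible /sdapi/v1/txt2img endpoint — verify both against your install:

```shell
# Request body mirroring the UI settings above (1024×1024, 25 steps, CFG 7)
PAYLOAD='{"prompt": "professional photo of a mountain landscape at golden hour, dramatic lighting, 8k",
  "negative_prompt": "blurry, low quality, watermark, text, deformed",
  "width": 1024, "height": 1024, "steps": 25, "cfg_scale": 7}'

# POST to the local endpoint and decode the base64-encoded image in the reply
generate() {
  curl -s http://127.0.0.1:7860/sdapi/v1/txt2img \
       -H "Content-Type: application/json" -d "$PAYLOAD" \
    | python3 -c "import sys, json, base64; open('first_image.png', 'wb').write(base64.b64decode(json.load(sys.stdin)['images'][0]))"
}
```

Call generate with Forge running and the image lands in first_image.png.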

Essential Extensions

Once you're generating images, these extensions will significantly improve quality:

ADetailer (Face/Hand Fix)

The most important extension. AI faces and hands often have subtle defects. ADetailer automatically detects faces and hands, then regenerates just those areas at higher detail.

Install in Forge: Extensions tab → Install from URL → https://github.com/Bing-su/adetailer.git

ControlNet

Control composition precisely. Upload a reference image or pose sketch, and ControlNet makes Stable Diffusion follow that structure. Essential for:

  • Matching specific poses
  • Maintaining consistent characters
  • Architectural renders from sketches
  • Depth-guided generation

Install in Forge: Extensions tab → Available → search "ControlNet" → Install

LoRA Loading

LoRAs are small model add-ons that teach Stable Diffusion specific styles, characters, or concepts. They're tiny (20-200MB vs 6GB for a full model) and stack.

LoRAs load automatically in Forge — just place .safetensors files in models/Lora/ and reference them in your prompt with a <lora:filename:weight> tag.
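
The tag itself is plain text inside the prompt: angle brackets, the file name without extension, and a weight. A sketch using a hypothetical LoRA file detail-tweaker-xl.safetensors:

```shell
# Hypothetical LoRA applied at weight 0.8; the name must match the file in models/Lora/
PROMPT='professional photo of a mountain landscape at golden hour <lora:detail-tweaker-xl:0.8>'
```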

Troubleshooting

"CUDA out of memory"

Your image settings exceed available VRAM. Solutions:

  • Lower resolution (try 768×768 instead of 1024×1024)
  • Reduce batch size to 1
  • In Forge: enable --medvram or --lowvram in webui-user.bat/webui-user.sh
  • Disable extensions you're not using (especially ControlNet when not needed)

"No module named torch"

Python environment issue. Solutions:

  • Make sure you're using Python 3.10 (not 3.11 or 3.12)
  • Delete venv/ folder and relaunch (it recreates the environment)

Black or pure-noise images

Usually a VAE issue. Solutions:

  • In Settings → Stable Diffusion → set VAE to "Automatic"
  • If using SDXL, make sure the SDXL VAE is downloaded and selected

Extremely slow generation (minutes per image)

Model is running on CPU instead of GPU. Check:

  • NVIDIA drivers are installed (nvidia-smi should show your GPU)
  • PyTorch CUDA version matches your driver (python -c "import torch; print(torch.cuda.is_available())" should return True)

Extensions crashing Forge

Some A1111 extensions aren't compatible:

  • Disable the extension (Extensions tab → uncheck → Apply and restart)
  • Check the extension's GitHub for Forge compatibility notes
  • If you absolutely need it, try A1111 instead

What's Next?

Once you're comfortable generating basic images:

1. Explore CivitAI for community models and LoRAs — thousands of free styles and concepts

2. Learn img2img — use existing images as starting points for generation

3. Try inpainting — edit specific parts of an image while keeping the rest

4. Experiment with ControlNet — the single biggest quality-of-life upgrade

5. Build ComfyUI workflows — when you outgrow Forge's single-prompt interface

For understanding how model compression affects image quality, see our What is Quantization guide.

The Bottom Line

Stable Diffusion in 2026 is genuinely easy to run locally. Forge plus a $200 used GPU gets you unlimited, private AI image generation that rivals cloud services.

Quickstart:

1. Install Forge (5 min)

2. Download Juggernaut XL from CivitAI (2 min)

3. Generate your first image

Everything else — extensions, advanced models, complex workflows — builds on top of that foundation. Start simple, add complexity when you need it.


*Related: ComfyUI vs InvokeAI | Best GPU for AI 2026 | Home AI Server Build Guide | What is Quantization?*

Frequently Asked Questions

What GPU do I need to run Stable Diffusion locally?

A GPU with 6GB+ VRAM is the recommended minimum. NVIDIA RTX 3060 (12GB) is the sweet spot — widely available used for under $200 and handles SDXL well. AMD GPUs work with ROCm on Linux but have less community support. 4GB VRAM cards can work with optimizations like --medvram but will be slow on SDXL models.

What is the difference between Stable Diffusion A1111, Forge, and ComfyUI?

A1111 (AUTOMATIC1111) is the original web UI — huge extension ecosystem, great documentation. Forge is a fork of A1111 optimized for speed and lower VRAM usage, recommended for most users in 2026. ComfyUI is a node-based workflow editor — more complex but more flexible for advanced pipelines.

How long does Stable Diffusion take to generate an image?

On a modern GPU (RTX 3060+), SDXL generates a 1024×1024 image in 10–30 seconds with 20–30 steps. SDXL Turbo and Lightning models can do 4-step generation in 2–5 seconds. Speed depends on GPU VRAM, resolution, step count, and the sampler used.

What is the best Stable Diffusion model in 2026?

For photorealism: Juggernaut XL or RealVisXL. For anime/illustration: Pony Diffusion or NoobAI XL. For general use: Stable Diffusion XL base with a refiner. All are free on CivitAI. SD 3.5 Medium is also excellent for quality if you have 8GB+ VRAM.

Can Stable Diffusion run on a CPU (without GPU)?

Yes, but it is extremely slow — 10–30 minutes per image. A GPU with CUDA (NVIDIA) or ROCm (AMD) is strongly recommended for practical use. Apple Silicon Macs can use the MPS backend and generate images in 30–60 seconds, which is usable.

Is Stable Diffusion free to use commercially?

Most models use the CreativeML OpenRAIL-M license which allows commercial use with restrictions (no harmful content, no misrepresentation as real photos of identifiable people). Always check the specific model license on CivitAI — some fine-tuned models restrict commercial use.

