
Mimir Forge / Memory-budget estimator

Can my machine run this?

Enter your GPU, Mac, RAM, context length, and use case. ToolHalla estimates which local models fit — and when cloud GPU is the smarter call.

Verdict / Llama-3.1-8B-Instruct / Chat / RAG

Apple M4 / 24 GB: likely fits comfortably.

LLM directory data / 11.2 GB model memory at Q8_0

Memory estimate: likely fits at Q8_0 at 8k context, with roomy memory headroom. Speed estimate: benchmark needed.

On unified-memory systems, these figures are estimates, not direct VRAM matches.

Verdict: Likely
Recommended quantization: Q8_0
Expected speed estimate: Benchmark needed
Memory pressure: Roomy

Memory budget breakdown

01 Weights (8B @ Q8_0): 11.2 GB
02 KV cache reserve (8k context / solo concurrency): 0.2 GB
03 Headroom for OS + activations: 6.6 GB. If this drops below about 10%, expect swapping or OOM.
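
The three lines above add up to a budget, and the 10% headroom rule can be checked with simple arithmetic. The sketch below is illustrative only: the names MemoryBudget and memory_pressure and the "tight" label are hypothetical, and it assumes the usable budget is simply the sum of the three lines (18.0 GB here), since the page does not state how that budget is derived from the 24 GB of unified memory.

```python
from dataclasses import dataclass


@dataclass
class MemoryBudget:
    """Hypothetical container for the three breakdown lines, all in GB."""
    weights_gb: float    # model weights at the chosen quantization
    kv_cache_gb: float   # KV cache reserve for the chosen context length
    headroom_gb: float   # what is left for the OS and activations

    @property
    def total_gb(self) -> float:
        # Assumption: the three lines partition the usable budget.
        return self.weights_gb + self.kv_cache_gb + self.headroom_gb

    @property
    def headroom_fraction(self) -> float:
        return self.headroom_gb / self.total_gb


def memory_pressure(budget: MemoryBudget, min_headroom: float = 0.10) -> str:
    """Return "roomy" when headroom is at least min_headroom of the budget."""
    if budget.headroom_fraction < min_headroom:
        return "tight"  # hypothetical label; the page only shows "Roomy"
    return "roomy"


# Figures from the breakdown above (Llama-3.1-8B-Instruct at Q8_0, 8k context).
budget = MemoryBudget(weights_gb=11.2, kv_cache_gb=0.2, headroom_gb=6.6)
print(round(budget.total_gb, 1))           # 18.0
print(round(budget.headroom_fraction, 3))  # 0.367
print(memory_pressure(budget))             # roomy
```

With these figures, headroom is about 37% of the assumed budget, well above the 10% threshold, which is consistent with the Roomy verdict.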

Lighter local alternative

If you want faster/lower-power local inference, consider smaller models.

Phi-3.5-mini (3B class): fit. 3.8B class / Q4 fits in about 6 GB.

When cloud is smarter

Cloud is usually smarter when you need long context, heavy concurrency, fast experiments, or high-memory models without buying hardware.

01 A100 80GB rental ($/hr sample): good fit for 70B class and longer context

If you want to upgrade

01 RTX 3090 used (used market): 24 GB VRAM class, strong local AI value
02 RTX 4090 (new/used): 24 GB VRAM class, fast consumer card

Confidence: medium (estimate based on memory requirements, not a live benchmark).

Estimates vary by runtime, quantization, context length, OS overhead, and backend.

LLM directory

All local model entries

Same source data used by the LLM model selector and /models. Quantization memory is directory data, not a live benchmark.

90 models
Model | Family | Params | Context | Use cases | License | Quant / memory
Qwen3.5-397B-A17B (MoE / 17B active) | Qwen | 397B | 1M | chat, coding, research, math, agentic, +1 | Apache-2.0 | Q2_K 100 GB / Q3_K_M 130 GB / Q4_K_M 168 GB / Q5_K_M 210 GB
MiniMax-M2.5 (MoE / 10B active) | MiniMax | 230B | 1.048576M | chat, coding, research, agentic | Apache-2.0 | Q2_K 58 GB / Q3_K_M 75 GB / Q3_K_XL 82 GB / Q4_K_M 98 GB
DeepSeek-R1-671B (MoE / 37B active) | DeepSeek | 671B | 131k | chat, coding, research, reasoning | MIT | TQ1_0 160 GB / IQ2_XXS 195 GB / Q3_K_M 290 GB / Q4_K_M 380 GB
Qwen3.5-122B-A10B (MoE / 10B active) | Qwen | 122B | 1M | chat, coding, research, math, agentic | Apache-2.0 | Q2_K 31 GB / Q3_K_M 40 GB / Q4_K_M 52 GB / Q8_0 68 GB
Kimi-K2.5 (MoE / 32B active) | Kimi | 1T | 131k | chat, coding, research, agentic, vision | MIT (modified) | TQ1_0 200 GB / Q2_K_XL 375 GB / Q4_K_S 550 GB
GLM-5 (MoE / unknown active) | GLM | 744B | 131k | chat, coding, research, agentic, reasoning | MIT | TQ1_0 174 GB / IQ2_XXS 225 GB / Q3_K_M 320 GB / Q4_K_M 420 GB
Qwen3-235B-A22B (MoE / 22B active) | Qwen | 235B | 131k | chat, coding, research, reasoning | Apache-2.0 | Q3_K_M 78 GB / Q4_K_M 100 GB / Q5_K_M 125 GB / Q8_0 190 GB
Llama-3.3-70B-Instruct | Llama | 70B | 131k | chat, coding, research, creative, math | Llama 3.3 Community | Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB
Qwen2.5-72B-Instruct | Qwen | 72B | 33k | chat, coding, research, math, creative | Apache-2.0 | Q2_K 29.6 GB / Q3_K_M 38.4 GB / Q4_K_M 48.7 GB / Q5_K_M 60.4 GB
Llama-3.1-70B-Instruct | Llama | 70B | 131k | chat, research, creative, math | Llama 3.1 Community | Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB
Nous-Hermes-2-Mixtral-8x7B-DPO | Nous | 46.7B MoE | 33k | chat, creative, coding | Apache-2.0 | Q2_K 19.6 GB / Q3_K_M 25.4 GB / Q4_K_M 32.2 GB / Q5_K_M 39.9 GB
Qwen2.5-Coder-32B-Instruct | Qwen | 32B | 33k | coding, chat, math, research | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 17.6 GB / Q4_K_M 22.3 GB / Q5_K_M 27.6 GB
DeepSeek-R1-Distill-Llama-70B | DeepSeek | 70B | 33k | math, research, coding, chat | MIT | Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB
Qwen3-32B | Qwen | 32B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Qwen3.5-27B | Qwen | 27B | 1M | chat, coding, research, math, agentic | Apache-2.0 | Q2_K 11.3 GB / Q3_K_M 14.6 GB / Q4_K_M 18.4 GB / Q5_K_M 22.5 GB
DeepSeek-R1-Distill-Qwen-32B | DeepSeek | 32B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Llama-4-Maverick-17B (MoE / 17B active) | Llama | 400B | 1.048576M | chat, coding, vision, research | Llama 4 Community | Q3_K_M 130 GB / Q4_K_M 170 GB / Q5_K_M 210 GB
Qwen3.5-35B-A3B (MoE / 3B active) | Qwen | 35B | 1M | chat, coding, agentic, research, math | Apache-2.0 | Q2_K 14.8 GB / Q3_K_M 19.2 GB / Q4_K_M 24.3 GB / Q5_K_M 30.1 GB
Qwen2.5-32B-Instruct | Qwen | 32B | 33k | chat, coding, research, math, creative | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 17.6 GB / Q4_K_M 22.3 GB / Q5_K_M 27.6 GB
Gemma-3-27B | Gemma | 27B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Phi-4-14B-Instruct | Phi | 14B | 128k | coding, math, research, chat | MIT | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Mistral-Small-24B-Instruct | Mistral | 24B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 10 GB / Q3_K_M 13 GB / Q4_K_M 16 GB / Q5_K_M 20 GB
InternLM2.5-20B-Chat | InternLM | 20B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 10 GB / Q3_K_M 13 GB / Q4_K_M 16 GB / Q5_K_M 20 GB
Phi-3-medium-128k-instruct | Phi | 14B | 131k | coding, chat, research, math | MIT | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Mistral-Nemo-12B-Instruct | Mistral | 12B | 128k | chat, coding, creative | Apache-2.0 | Q2_K 5.6 GB / Q3_K_M 7.2 GB / Q4_K_M 9.1 GB / Q5_K_M 11.2 GB
Qwen3-30B-A3B | Qwen | 30B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Gemma-4-26B-A4B (MoE / 3.8B active) | Gemma | 25.2B | 262k | chat, coding, research, math, agentic, +2 | Apache-2.0 | Q2_K 6.9 GB / Q3_K_M 10.4 GB / Q4_K_M 13.9 GB / Q5_K_M 17.3 GB
Qwen3.5-9B (MoE) | Qwen | 9B | 262k | chat, coding, research, math, agentic, +2 | Apache-2.0 | Q2_K 2.5 GB / Q3_K_M 3.7 GB / Q4_K_M 5 GB / Q5_K_M 6.2 GB
DeepSeek-R1-Distill-Llama-8B | DeepSeek | 8B | 33k | math, research, chat | MIT | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
Mixtral-8x22B-Instruct | Mistral | 141B MoE | 66k | chat, coding, research, creative, math | Apache-2.0 | Q2_K 36.8 GB / Q3_K_M 47.8 GB / Q4_K_M 60.6 GB / Q5_K_M 75.2 GB
Llama-4-Scout-17B (MoE / 17B active) | Llama | 109B | 524k | chat, coding, vision | Llama 4 Community | Q3_K_M 35 GB / Q4_K_M 45 GB / Q5_K_M 55 GB / Q8_0 85 GB
Qwen2.5-Coder-14B-Instruct | Qwen | 14B | 33k | coding, chat, math | Apache-2.0 | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
WizardLM-2-8x22B | WizardLM | 141B MoE | 66k | chat, coding, research, creative | Llama 2 Community | Q2_K 36.8 GB / Q3_K_M 47.8 GB / Q4_K_M 60.6 GB / Q5_K_M 75.2 GB
Gemma-2-27B-Instruct | Gemma | 27B | 8k | chat, coding, research, creative, math | Gemma Terms | Q2_K 11.6 GB / Q3_K_M 15 GB / Q4_K_M 19 GB / Q5_K_M 23.5 GB
Qwen2.5-14B-Instruct | Qwen | 14B | 33k | chat, coding, research, math | Apache-2.0 | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Gemma-3-12B | Gemma | 12B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB
Llama-3.2-11B-Vision-Instruct | Llama | 11B | 33k | coding, chat, research, creative, math, +1 | Apache-2.0 | Q2_K 4.4 GB / Q3_K_M 6 GB / Q4_K_M 7.2 GB / Q5_K_M 9 GB
Qwen3.5-4B (MoE) | Qwen | 4B | 262k | chat, coding, research, math, agentic, +2 | Apache-2.0 | Q2_K 1.1 GB / Q3_K_M 1.7 GB / Q4_K_M 2.2 GB / Q5_K_M 2.8 GB
DeepSeek-R1-Distill-Qwen-14B | DeepSeek | 14B | 33k | math, research, coding | MIT | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Solar-10.7B-Instruct | Solar | 10.7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 4.4 GB / Q3_K_M 6 GB / Q4_K_M 7.2 GB / Q5_K_M 9 GB
DBRX-Instruct | Databricks | 132B MoE | 33k | chat, coding, research | Databricks Open | Q2_K 34.4 GB / Q3_K_M 44.7 GB / Q4_K_M 56.6 GB / Q5_K_M 70.3 GB
Ministral-8B-Instruct | Mistral | 8B | 128k | chat, research | Mistral Research | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
Mixtral-8x7B-Instruct | Mistral | 46.7B MoE | 33k | chat, coding, research, creative | Apache-2.0 | Q2_K 19.6 GB / Q3_K_M 25.4 GB / Q4_K_M 32.2 GB / Q5_K_M 39.9 GB
Qwen3-8B | Qwen | 8B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Command-R-7B | Cohere | 7B | 128k | research, chat, creative | CC-BY-NC | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Granite-3.1-8B-Instruct | Granite | 8B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Mistral-7B-Instruct-v0.3 | Mistral | 7B | 33k | chat, research | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Dolphin-2.9.2-Qwen2-7B | Dolphin | 7B | 33k | chat, creative, coding | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
DeepSeek-Coder-33B-Instruct | DeepSeek | 33B | 16k | coding, chat, math | DeepSeek License | Q2_K 14 GB / Q3_K_M 18.2 GB / Q4_K_M 23 GB / Q5_K_M 28.5 GB
Qwen2.5-Coder-7B-Instruct | Qwen | 7B | 33k | coding, chat | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Falcon-40B-Instruct | Falcon | 40B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 17 GB / Q3_K_M 22 GB / Q4_K_M 28 GB / Q5_K_M 34 GB
GLM-4.7-9B-Chat | GLM | 9B | 33k | chat, coding | Apache-2.0 | Q4_K_M 6.2 GB / Q5_K_M 7.4 GB / Q8_0 10.5 GB / FP16 18.8 GB
Llama-3.1-8B-Instruct | Llama | 8B | 131k | chat, research, creative | Llama 3.1 Community | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
Qwen2.5-7B-Instruct | Qwen | 7B | 33k | chat, coding, research, math | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
OpenHermes-2.5-Mistral-7B | Nous | 7B | 33k | chat, creative | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Gemma-2-9B-Instruct | Gemma | 9B | 8k | chat, research, creative | Gemma Terms | Q2_K 4.4 GB / Q3_K_M 5.7 GB / Q4_K_M 7.1 GB / Q5_K_M 8.8 GB
Zephyr-7B-beta | Zephyr | 7B | 33k | chat, creative | MIT | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Vicuna-13B | Vicuna | 13B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB
Orca-2-13B | Orca | 13B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB
CodeLlama-7B-Instruct | CodeLlama | 7B | 16k | coding | Llama 2 Community | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Neural-Chat-7B-v3.3 | Intel | 7B | 33k | chat, research | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Gemma-4-E4B | Gemma | 8B | 131k | chat, coding, research, agentic, reasoning, +1 | Apache-2.0 | Q2_K 2.2 GB / Q3_K_M 3.3 GB / Q4_K_M 4.4 GB / Q5_K_M 5.5 GB
Phi-4-mini-instruct | Phi | 3.8B | 128k | chat, coding, research | MIT | Q2_K 2.3 GB / Q3_K_M 3 GB / Q4_K_M 3.7 GB / Q5_K_M 4.5 GB
Command-R-35B | Cohere | 35B | 128k | research, chat, coding | CC-BY-NC | Q2_K 14.8 GB / Q3_K_M 19.2 GB / Q4_K_M 24.3 GB / Q5_K_M 30.1 GB
CodeLlama-34B-Instruct | CodeLlama | 34B | 16k | coding, math | Llama 2 Community | Q2_K 14.4 GB / Q3_K_M 18.7 GB / Q4_K_M 23.6 GB / Q5_K_M 29.3 GB
OpenChat-3.6-8B | OpenChat | 8B | 8k | chat, creative | Apache-2.0 | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
DeepSeek-R1-Distill-Qwen-7B | DeepSeek | 7B | 33k | math, research, coding | MIT | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Gemma-3-4B | Gemma | 4B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 2.8 GB / Q3_K_M 3.8 GB / Q4_K_M 4.8 GB / Q5_K_M 6 GB
Phi-3-mini-4k-instruct | Phi | 3.8B | 4k | chat, coding | MIT | Q2_K 2.3 GB / Q3_K_M 3 GB / Q4_K_M 3.7 GB / Q5_K_M 4.5 GB
InternLM2.5-7B-Chat | InternLM | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
DeepSeek-Coder-6.7B-Instruct | DeepSeek | 6.7B | 16k | coding, chat | DeepSeek License | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.6 GB / Q5_K_M 6.9 GB
StarCoder2-15B-Instruct | StarCoder | 15B | 16k | coding, math | OpenRAIL-M | Q2_K 6.8 GB / Q3_K_M 8.8 GB / Q4_K_M 11.1 GB / Q5_K_M 13.7 GB
WizardLM-2-7B | WizardLM | 7B | 33k | chat, coding | Llama 2 Community | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Vicuna-7B | Vicuna | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Orca-2-7B | Orca | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Qwen2-VL-7B-Instruct | Qwen | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
CodeGemma-7B-Instruct | Gemma | 7B | 8k | coding, chat | Gemma Terms | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Falcon-7B-Instruct | Falcon | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Qwen2.5-3B-Instruct | Qwen | 3B | 33k | chat, coding, research | Apache-2.0 | Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB
CodeLlama-13B-Instruct | CodeLlama | 13B | 16k | coding, math | Llama 2 Community | Q2_K 6 GB / Q3_K_M 7.8 GB / Q4_K_M 9.8 GB / Q5_K_M 12.1 GB
Llama-3.2-3B-Instruct | Llama | 3B | 131k | chat, creative | Llama 3.2 Community | Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB
StarCoder2-7B-Instruct | StarCoder | 7B | 16k | coding | OpenRAIL-M | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Llama-3.2-1B-Instruct | Llama | 1B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 0.5 GB / Q3_K_M 0.6 GB / Q4_K_M 0.7 GB / Q5_K_M 0.9 GB
Gemma-2-2B-Instruct | Gemma | 2B | 8k | chat, creative | Gemma Terms | Q2_K 1.6 GB / Q3_K_M 2 GB / Q4_K_M 2.5 GB / Q5_K_M 3 GB
SmolLM2-1.7B-Instruct | SmolLM | 1.7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 1 GB / Q3_K_M 1.3 GB / Q4_K_M 1.6 GB / Q5_K_M 2 GB
StarCoder2-3B-Instruct | StarCoder | 3B | 16k | coding | OpenRAIL-M | Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB
TinyLlama-1.1B | TinyLlama | 1.1B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 0.5 GB / Q3_K_M 0.6 GB / Q4_K_M 0.7 GB / Q5_K_M 0.9 GB
Yi-1.5-34B-Chat | Yi | 34B | 33k | chat, research, creative, math | Apache-2.0 | Q2_K 14.4 GB / Q3_K_M 18.7 GB / Q4_K_M 23.6 GB / Q5_K_M 29.3 GB
Yi-1.5-9B-Chat | Yi | 9B | 33k | chat, research, creative | Apache-2.0 | Q2_K 4.4 GB / Q3_K_M 5.7 GB / Q4_K_M 7.1 GB / Q5_K_M 8.8 GB
Llama-3-8B-Instruct | Llama | 8B | 8k | chat, creative | Llama 3 Community | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
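
The Quant / memory entries follow a regular "NAME SIZE GB / NAME SIZE GB" pattern, so they can be parsed to find the largest listed quantization that fits a given memory budget. This is a minimal sketch, assuming that pattern holds for every row. The function names parse_quant_memory and largest_fitting_quant are hypothetical and not part of ToolHalla, and the budget passed in is assumed to already exclude the KV cache reserve and OS headroom.

```python
import re
from typing import Dict, Optional

# One entry looks like "Q4_K_M 6.5 GB": a quantization name, a number, and "GB".
_ENTRY = re.compile(r"^\s*(\S+)\s+([0-9]+(?:\.[0-9]+)?)\s*GB\s*$")


def parse_quant_memory(cell: str) -> Dict[str, float]:
    """Parse a Quant / memory cell into {quantization name: size in GB}."""
    sizes: Dict[str, float] = {}
    for part in cell.split("/"):
        match = _ENTRY.match(part)
        if match is None:
            raise ValueError(f"unrecognized entry: {part!r}")
        sizes[match.group(1)] = float(match.group(2))
    return sizes


def largest_fitting_quant(cell: str, budget_gb: float) -> Optional[str]:
    """Return the listed quantization with the largest size <= budget_gb.

    Assumption: among the listed options, a larger file means a
    higher-precision quantization, so "largest that fits" is preferred.
    Returns None when nothing fits.
    """
    fitting = {name: gb for name, gb in parse_quant_memory(cell).items()
               if gb <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Quant / memory cell of the Llama-3.1-8B-Instruct row.
cell = "Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB"
print(largest_fitting_quant(cell, 7.0))  # Q4_K_M
print(largest_fitting_quant(cell, 3.0))  # None
```

As the page notes, these sizes are directory data rather than live benchmarks, so a result from this sketch is a starting point to verify on the target runtime.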