Mimir Forge / Memory-budget estimator

Can my machine run this?

Enter your GPU, Mac, RAM, context length, and use case. ToolHalla estimates which local models fit — and when cloud GPU is the smarter call.

Your forge

Tell the oracle what you have.

Platform / Step 01

Apple chip / Unified memory

Unified memory budget / Fixed

Hardware memory: 24 GB unified memory. Usable estimate budget: 18 GB; the remainder is reserved for OS/runtime/headroom.

The memory amount is fixed by the selected hardware. Change hardware to compare another configuration.

Usable estimate budget is reduced because some memory is reserved for OS/runtime/headroom.

Unified memory systems are estimates, not direct VRAM matches.
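
The reserve described above can be sketched as a simple fraction of hardware memory. This is a minimal illustration, not ToolHalla's actual rule: the 25% reserve is an assumption inferred only from the single 24 GB to 18 GB example on this page, and `usable_budget_gb` is a hypothetical helper name.

```python
def usable_budget_gb(hardware_gb: float, reserve_fraction: float = 0.25) -> float:
    """Return the memory left for a model after the OS/runtime/headroom reserve.

    reserve_fraction is an assumption (25%), inferred from the page's
    24 GB -> 18 GB example; the real estimator may use a different rule.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return round(hardware_gb * (1.0 - reserve_fraction), 1)


print(usable_budget_gb(24))  # 18.0
```

On unified memory systems the reserve competes with everything else the machine is doing, so treating the result as an upper bound is the safer reading.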

Target use case / Step 02

Model size filter / Step 03

Filters the LLM model dropdown and the directory list below. MoE models are grouped by total parameter class.

LLM model / 31 matches

Uses ToolHalla's LLM directory quantization data when a model is selected. Choose custom if your exact model is missing.
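
Grouping MoE models by total parameter class matters because all expert weights have to be resident in memory, even though only the active subset runs per token. The sketch below shows a size filter keyed on total parameters. `ModelEntry` and `filter_by_size` are hypothetical names for illustration, and the 40B cutoff is an arbitrary example value, not one of the tool's real size classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelEntry:
    name: str
    total_params_b: float                     # total parameters, in billions
    active_params_b: Optional[float] = None   # set only for MoE models


def filter_by_size(models: List[ModelEntry], max_total_b: float) -> List[str]:
    """Keep models whose TOTAL parameter count fits the size class.

    Active parameters are deliberately ignored: they affect speed,
    not how much memory the weights need.
    """
    return [m.name for m in models if m.total_params_b <= max_total_b]


models = [
    ModelEntry("Llama-3.1-8B-Instruct", 8),
    ModelEntry("Qwen3.5-35B-A3B", 35, active_params_b=3),
    ModelEntry("Qwen3-235B-A22B", 235, active_params_b=22),
]

print(filter_by_size(models, max_total_b=40))
# ['Llama-3.1-8B-Instruct', 'Qwen3.5-35B-A3B']
```

Filtering on active parameters instead would place Qwen3.5-35B-A3B in a 3B class, which would understate its memory footprint.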

Context length / Step 04

Concurrency / Step 05

Verdict / Llama-3.1-8B-Instruct / Chat / RAG

Apple M4 / 24 GB Likely fits comfortably.

LLM directory data / 11.2 GB model memory at Q8_0

Memory estimate: likely fits at Q8_0 with roomy memory pressure at 8k context. Speed estimate: benchmark needed.

Verdict: Likely

Recommended quantization: Q8_0

Expected speed estimate: Benchmark needed

Memory pressure: Roomy
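
One plausible way to arrive at a recommended quantization is to pick the highest-precision option that still leaves enough headroom. The sketch below does that for the Llama-3.1-8B-Instruct figures shown on this page. The precision ordering, the 10% headroom threshold, and the function name `recommend_quant` are assumptions for illustration; the tool's real selection logic is not documented here.

```python
from typing import Dict, Optional

# Lowest to highest precision. This ordering is an assumption of the sketch.
QUANT_ORDER = ["Q2_K", "Q3_K_M", "Q4_K_M", "Q5_K_M", "Q8_0"]


def recommend_quant(
    quant_gb: Dict[str, float],
    budget_gb: float,
    kv_gb: float,
    min_headroom: float = 0.10,  # assumed: keep at least 10% of the budget free
) -> Optional[str]:
    """Return the highest-precision quantization that fits, or None."""
    best = None
    for name in QUANT_ORDER:
        size = quant_gb.get(name)
        if size is None:
            continue
        headroom = budget_gb - size - kv_gb
        if headroom >= min_headroom * budget_gb:
            best = name  # later entries in QUANT_ORDER overwrite earlier ones
    return best


# Memory figures taken from this page for Llama-3.1-8B-Instruct.
llama_31_8b = {"Q2_K": 4.0, "Q3_K_M": 5.2, "Q4_K_M": 6.5, "Q5_K_M": 8.0, "Q8_0": 11.2}

print(recommend_quant(llama_31_8b, budget_gb=18.0, kv_gb=0.2))  # Q8_0
print(recommend_quant(llama_31_8b, budget_gb=8.0, kv_gb=0.2))   # Q4_K_M
```

With the 18 GB budget the sketch agrees with the verdict above; with a tighter 8 GB budget it falls back to Q4_K_M.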

Memory budget breakdown

Weights (8B @ Q8_0): 11.2 GB

KV cache reserve (8k context / solo concurrency): 0.2 GB

Headroom for OS + activations: 6.6 GB. If this drops below about 10%, expect swapping or OOM.
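
The breakdown above is plain subtraction: headroom is what remains of the usable budget after weights and the KV cache reserve. A minimal sketch follows, assuming the "about 10%" warning is measured against the usable budget (the page does not say which base it uses); `headroom_report` is a hypothetical helper.

```python
def headroom_report(budget_gb, weights_gb, kv_cache_gb, warn_fraction=0.10):
    """Compute remaining headroom and flag it when it falls below warn_fraction.

    warn_fraction is interpreted as a share of the usable budget,
    which is an assumption of this sketch.
    """
    headroom_gb = budget_gb - weights_gb - kv_cache_gb
    fraction = headroom_gb / budget_gb
    return {
        "headroom_gb": round(headroom_gb, 1),
        "headroom_fraction": round(fraction, 3),
        "at_risk": fraction < warn_fraction,
    }


# Figures from the breakdown: 18 GB budget, 11.2 GB weights, 0.2 GB KV cache.
report = headroom_report(18.0, 11.2, 0.2)
print(report)
# {'headroom_gb': 6.6, 'headroom_fraction': 0.367, 'at_risk': False}
```

At roughly 37% headroom the configuration sits well above the warning threshold, which is consistent with the "Roomy" memory pressure label.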

Lighter local alternative

If you want faster/lower-power local inference, consider smaller models.

Fit

Phi-3.5-mini

3.8B class / Q4 fits about 6 GB

3B class

When cloud is smarter

Cloud is usually smarter when you need long context, heavy concurrency, fast experiments, or high-memory models without buying hardware.

A100 80GB rental

Good fit for 70B class and longer context

$/hr sample

If you want to upgrade

RTX 3090 used

24 GB VRAM class, strong local AI value

used market

RTX 4090

24 GB VRAM class, fast consumer card

new/used

Confidence: medium — estimate based on memory requirements, not a live benchmark.

Estimates vary by runtime, quantization, context length, OS overhead, and backend.

LLM directory

All local model entries

Same source data used by the LLM model selector and /models. Quantization memory is directory data, not a live benchmark.

90 models

Model	Family	Params	Context	Use cases	License	Quant / memory
Qwen3.5-397B-A17B MoE / 17B active	Qwen	397B	1M	chat, coding, research, math, agentic, +1	Apache-2.0	Q2_K 100 GB / Q3_K_M 130 GB / Q4_K_M 168 GB / Q5_K_M 210 GB
MiniMax-M2.5 MoE / 10B active	MiniMax	230B	1.048576M	chat, coding, research, agentic	Apache-2.0	Q2_K 58 GB / Q3_K_M 75 GB / Q3_K_XL 82 GB / Q4_K_M 98 GB
DeepSeek-R1-671B MoE / 37B active	DeepSeek	671B	131k	chat, coding, research, reasoning	MIT	TQ1_0 160 GB / IQ2_XXS 195 GB / Q3_K_M 290 GB / Q4_K_M 380 GB
Qwen3.5-122B-A10B MoE / 10B active	Qwen	122B	1M	chat, coding, research, math, agentic	Apache-2.0	Q2_K 31 GB / Q3_K_M 40 GB / Q4_K_M 52 GB / Q8_0 68 GB
Kimi-K2.5 MoE / 32B active	Kimi	1T	131k	chat, coding, research, agentic, vision	MIT (modified)	TQ1_0 200 GB / Q2_K_XL 375 GB / Q4_K_S 550 GB
GLM-5 MoE / unknown active	GLM	744B	131k	chat, coding, research, agentic, reasoning	MIT	TQ1_0 174 GB / IQ2_XXS 225 GB / Q3_K_M 320 GB / Q4_K_M 420 GB
Qwen3-235B-A22B MoE / 22B active	Qwen	235B	131k	chat, coding, research, reasoning	Apache-2.0	Q3_K_M 78 GB / Q4_K_M 100 GB / Q5_K_M 125 GB / Q8_0 190 GB
Llama-3.3-70B-Instruct	Llama	70B	131k	chat, coding, research, creative, math	Llama 3.3 Community	Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB
Qwen2.5-72B-Instruct	Qwen	72B	33k	chat, coding, research, math, creative	Apache-2.0	Q2_K 29.6 GB / Q3_K_M 38.4 GB / Q4_K_M 48.7 GB / Q5_K_M 60.4 GB
Llama-3.1-70B-Instruct	Llama	70B	131k	chat, research, creative, math	Llama 3.1 Community	Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB
Nous-Hermes-2-Mixtral-8x7B-DPO	Nous	46.7B MoE	33k	chat, creative, coding	Apache-2.0	Q2_K 19.6 GB / Q3_K_M 25.4 GB / Q4_K_M 32.2 GB / Q5_K_M 39.9 GB
Qwen2.5-Coder-32B-Instruct	Qwen	32B	33k	coding, chat, math, research	Apache-2.0	Q2_K 13.6 GB / Q3_K_M 17.6 GB / Q4_K_M 22.3 GB / Q5_K_M 27.6 GB
DeepSeek-R1-Distill-Llama-70B	DeepSeek	70B	33k	math, research, coding, chat	MIT	Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB
Qwen3-32B	Qwen	32B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Qwen3.5-27B	Qwen	27B	1M	chat, coding, research, math, agentic	Apache-2.0	Q2_K 11.3 GB / Q3_K_M 14.6 GB / Q4_K_M 18.4 GB / Q5_K_M 22.5 GB
DeepSeek-R1-Distill-Qwen-32B	DeepSeek	32B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Llama-4-Maverick-17B MoE / 17B active	Llama	400B	1.048576M	chat, coding, vision, research	Llama 4 Community	Q3_K_M 130 GB / Q4_K_M 170 GB / Q5_K_M 210 GB
Qwen3.5-35B-A3B MoE / 3B active	Qwen	35B	1M	chat, coding, agentic, research, math	Apache-2.0	Q2_K 14.8 GB / Q3_K_M 19.2 GB / Q4_K_M 24.3 GB / Q5_K_M 30.1 GB
Qwen2.5-32B-Instruct	Qwen	32B	33k	chat, coding, research, math, creative	Apache-2.0	Q2_K 13.6 GB / Q3_K_M 17.6 GB / Q4_K_M 22.3 GB / Q5_K_M 27.6 GB
Gemma-3-27B	Gemma	27B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Phi-4-14B-Instruct	Phi	14B	128k	coding, math, research, chat	MIT	Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Mistral-Small-24B-Instruct	Mistral	24B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 10 GB / Q3_K_M 13 GB / Q4_K_M 16 GB / Q5_K_M 20 GB
InternLM2.5-20B-Chat	InternLM	20B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 10 GB / Q3_K_M 13 GB / Q4_K_M 16 GB / Q5_K_M 20 GB
Phi-3-medium-128k-instruct	Phi	14B	131k	coding, chat, research, math	MIT	Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Mistral-Nemo-12B-Instruct	Mistral	12B	128k	chat, coding, creative	Apache-2.0	Q2_K 5.6 GB / Q3_K_M 7.2 GB / Q4_K_M 9.1 GB / Q5_K_M 11.2 GB
Qwen3-30B-A3B	Qwen	30B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB
Gemma-4-26B-A4B MoE / 3.8B active	Gemma	25.2B	262k	chat, coding, research, math, agentic, +2	Apache-2.0	Q2_K 6.9 GB / Q3_K_M 10.4 GB / Q4_K_M 13.9 GB / Q5_K_M 17.3 GB
Qwen3.5-9B MoE	Qwen	9B	262k	chat, coding, research, math, agentic, +2	Apache-2.0	Q2_K 2.5 GB / Q3_K_M 3.7 GB / Q4_K_M 5 GB / Q5_K_M 6.2 GB
DeepSeek-R1-Distill-Llama-8B	DeepSeek	8B	33k	math, research, chat	MIT	Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
Mixtral-8x22B-Instruct	Mistral	141B MoE	66k	chat, coding, research, creative, math	Apache-2.0	Q2_K 36.8 GB / Q3_K_M 47.8 GB / Q4_K_M 60.6 GB / Q5_K_M 75.2 GB
Llama-4-Scout-17B MoE / 17B active	Llama	109B	524k	chat, coding, vision	Llama 4 Community	Q3_K_M 35 GB / Q4_K_M 45 GB / Q5_K_M 55 GB / Q8_0 85 GB
Qwen2.5-Coder-14B-Instruct	Qwen	14B	33k	coding, chat, math	Apache-2.0	Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
WizardLM-2-8x22B	WizardLM	141B MoE	66k	chat, coding, research, creative	Llama 2 Community	Q2_K 36.8 GB / Q3_K_M 47.8 GB / Q4_K_M 60.6 GB / Q5_K_M 75.2 GB
Gemma-2-27B-Instruct	Gemma	27B	8k	chat, coding, research, creative, math	Gemma Terms	Q2_K 11.6 GB / Q3_K_M 15 GB / Q4_K_M 19 GB / Q5_K_M 23.5 GB
Qwen2.5-14B-Instruct	Qwen	14B	33k	chat, coding, research, math	Apache-2.0	Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Gemma-3-12B	Gemma	12B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB
Llama-3.2-11B-Vision-Instruct	Llama	11B	33k	coding, chat, research, creative, math, +1	Apache-2.0	Q2_K 4.4 GB / Q3_K_M 6 GB / Q4_K_M 7.2 GB / Q5_K_M 9 GB
Qwen3.5-4B MoE	Qwen	4B	262k	chat, coding, research, math, agentic, +2	Apache-2.0	Q2_K 1.1 GB / Q3_K_M 1.7 GB / Q4_K_M 2.2 GB / Q5_K_M 2.8 GB
DeepSeek-R1-Distill-Qwen-14B	DeepSeek	14B	33k	math, research, coding	MIT	Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB
Solar-10.7B-Instruct	Solar	10.7B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 4.4 GB / Q3_K_M 6 GB / Q4_K_M 7.2 GB / Q5_K_M 9 GB
DBRX-Instruct	Databricks	132B MoE	33k	chat, coding, research	Databricks Open	Q2_K 34.4 GB / Q3_K_M 44.7 GB / Q4_K_M 56.6 GB / Q5_K_M 70.3 GB
Ministral-8B-Instruct	Mistral	8B	128k	chat, research	Mistral Research	Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
Mixtral-8x7B-Instruct	Mistral	46.7B MoE	33k	chat, coding, research, creative	Apache-2.0	Q2_K 19.6 GB / Q3_K_M 25.4 GB / Q4_K_M 32.2 GB / Q5_K_M 39.9 GB
Qwen3-8B	Qwen	8B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Command-R-7B	Cohere	7B	128k	research, chat, creative	CC-BY-NC	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Granite-3.1-8B-Instruct	Granite	8B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Mistral-7B-Instruct-v0.3	Mistral	7B	33k	chat, research	Apache-2.0	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Dolphin-2.9.2-Qwen2-7B	Dolphin	7B	33k	chat, creative, coding	Apache-2.0	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
DeepSeek-Coder-33B-Instruct	DeepSeek	33B	16k	coding, chat, math	DeepSeek License	Q2_K 14 GB / Q3_K_M 18.2 GB / Q4_K_M 23 GB / Q5_K_M 28.5 GB
Qwen2.5-Coder-7B-Instruct	Qwen	7B	33k	coding, chat	Apache-2.0	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Falcon-40B-Instruct	Falcon	40B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 17 GB / Q3_K_M 22 GB / Q4_K_M 28 GB / Q5_K_M 34 GB
GLM-4.7-9B-Chat	GLM	9B	33k	chat, coding	Apache-2.0	Q4_K_M 6.2 GB / Q5_K_M 7.4 GB / Q8_0 10.5 GB / FP16 18.8 GB
Llama-3.1-8B-Instruct	Llama	8B	131k	chat, research, creative	Llama 3.1 Community	Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
Qwen2.5-7B-Instruct	Qwen	7B	33k	chat, coding, research, math	Apache-2.0	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
OpenHermes-2.5-Mistral-7B	Nous	7B	33k	chat, creative	Apache-2.0	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Gemma-2-9B-Instruct	Gemma	9B	8k	chat, research, creative	Gemma Terms	Q2_K 4.4 GB / Q3_K_M 5.7 GB / Q4_K_M 7.1 GB / Q5_K_M 8.8 GB
Zephyr-7B-beta	Zephyr	7B	33k	chat, creative	MIT	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Vicuna-13B	Vicuna	13B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB
Orca-2-13B	Orca	13B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB
CodeLlama-7B-Instruct	CodeLlama	7B	16k	coding	Llama 2 Community	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Neural-Chat-7B-v3.3	Intel	7B	33k	chat, research	Apache-2.0	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Gemma-4-E4B	Gemma	8B	131k	chat, coding, research, agentic, reasoning, +1	Apache-2.0	Q2_K 2.2 GB / Q3_K_M 3.3 GB / Q4_K_M 4.4 GB / Q5_K_M 5.5 GB
Phi-4-mini-instruct	Phi	3.8B	128k	chat, coding, research	MIT	Q2_K 2.3 GB / Q3_K_M 3 GB / Q4_K_M 3.7 GB / Q5_K_M 4.5 GB
Command-R-35B	Cohere	35B	128k	research, chat, coding	CC-BY-NC	Q2_K 14.8 GB / Q3_K_M 19.2 GB / Q4_K_M 24.3 GB / Q5_K_M 30.1 GB
CodeLlama-34B-Instruct	CodeLlama	34B	16k	coding, math	Llama 2 Community	Q2_K 14.4 GB / Q3_K_M 18.7 GB / Q4_K_M 23.6 GB / Q5_K_M 29.3 GB
OpenChat-3.6-8B	OpenChat	8B	8k	chat, creative	Apache-2.0	Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
DeepSeek-R1-Distill-Qwen-7B	DeepSeek	7B	33k	math, research, coding	MIT	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Gemma-3-4B	Gemma	4B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 2.8 GB / Q3_K_M 3.8 GB / Q4_K_M 4.8 GB / Q5_K_M 6 GB
Phi-3-mini-4k-instruct	Phi	3.8B	4k	chat, coding	MIT	Q2_K 2.3 GB / Q3_K_M 3 GB / Q4_K_M 3.7 GB / Q5_K_M 4.5 GB
InternLM2.5-7B-Chat	InternLM	7B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
DeepSeek-Coder-6.7B-Instruct	DeepSeek	6.7B	16k	coding, chat	DeepSeek License	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.6 GB / Q5_K_M 6.9 GB
StarCoder2-15B-Instruct	StarCoder	15B	16k	coding, math	OpenRAIL-M	Q2_K 6.8 GB / Q3_K_M 8.8 GB / Q4_K_M 11.1 GB / Q5_K_M 13.7 GB
WizardLM-2-7B	WizardLM	7B	33k	chat, coding	Llama 2 Community	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Vicuna-7B	Vicuna	7B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Orca-2-7B	Orca	7B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Qwen2-VL-7B-Instruct	Qwen	7B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
CodeGemma-7B-Instruct	Gemma	7B	8k	coding, chat	Gemma Terms	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Falcon-7B-Instruct	Falcon	7B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB
Qwen2.5-3B-Instruct	Qwen	3B	33k	chat, coding, research	Apache-2.0	Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB
CodeLlama-13B-Instruct	CodeLlama	13B	16k	coding, math	Llama 2 Community	Q2_K 6 GB / Q3_K_M 7.8 GB / Q4_K_M 9.8 GB / Q5_K_M 12.1 GB
Llama-3.2-3B-Instruct	Llama	3B	131k	chat, creative	Llama 3.2 Community	Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB
StarCoder2-7B-Instruct	StarCoder	7B	16k	coding	OpenRAIL-M	Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB
Llama-3.2-1B-Instruct	Llama	1B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 0.5 GB / Q3_K_M 0.6 GB / Q4_K_M 0.7 GB / Q5_K_M 0.9 GB
Gemma-2-2B-Instruct	Gemma	2B	8k	chat, creative	Gemma Terms	Q2_K 1.6 GB / Q3_K_M 2 GB / Q4_K_M 2.5 GB / Q5_K_M 3 GB
SmolLM2-1.7B-Instruct	SmolLM	1.7B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 1 GB / Q3_K_M 1.3 GB / Q4_K_M 1.6 GB / Q5_K_M 2 GB
StarCoder2-3B-Instruct	StarCoder	3B	16k	coding	OpenRAIL-M	Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB
TinyLlama-1.1B	TinyLlama	1.1B	33k	coding, chat, research, creative, math	Apache-2.0	Q2_K 0.5 GB / Q3_K_M 0.6 GB / Q4_K_M 0.7 GB / Q5_K_M 0.9 GB
Yi-1.5-34B-Chat	Yi	34B	33k	chat, research, creative, math	Apache-2.0	Q2_K 14.4 GB / Q3_K_M 18.7 GB / Q4_K_M 23.6 GB / Q5_K_M 29.3 GB
Yi-1.5-9B-Chat	Yi	9B	33k	chat, research, creative	Apache-2.0	Q2_K 4.4 GB / Q3_K_M 5.7 GB / Q4_K_M 7.1 GB / Q5_K_M 8.8 GB
Llama-3-8B-Instruct	Llama	8B	8k	chat, creative	Llama 3 Community	Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB
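
To work with the directory programmatically, the "Quant / memory" column can be split into name and size pairs. The sketch below parses one cell as it appears in the table above. `parse_quant_memory` is a hypothetical helper, and it assumes every entry follows the `NAME <number> GB` pattern separated by " / ", which holds for the rows listed here.

```python
import re
from typing import Dict

# One entry looks like "Q4_K_M 6.5 GB": a quant name, a number, then "GB".
_ENTRY = re.compile(r"^(\S+)\s+([\d.]+)\s*GB$")


def parse_quant_memory(cell: str) -> Dict[str, float]:
    """Turn a 'Quant / memory' cell into {quant_name: size_in_gb}."""
    result = {}
    for part in cell.split(" / "):
        match = _ENTRY.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognized quant entry: {part!r}")
        result[match.group(1)] = float(match.group(2))
    return result


cell = "Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB"
print(parse_quant_memory(cell))
# {'Q2_K': 4.0, 'Q3_K_M': 5.2, 'Q4_K_M': 6.5, 'Q5_K_M': 8.0}
```

The sizes remain directory data, so anything built on top of the parsed values inherits the same caveat: they are estimates, not live benchmarks.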