Mimir Forge / Memory-budget estimator
Can my machine run this?
Enter your GPU or Mac, RAM, context length, and use case. ToolHalla estimates which local models fit, and when a cloud GPU is the smarter call.
Verdict: Llama-3.1-8B-Instruct, Chat / RAG, on Apple M4 / 24 GB: likely fits comfortably.
Model memory (LLM directory data): 11.2 GB at Q8_0.
Memory estimate: likely fits at Q8_0 with plenty of headroom at 8k context. Speed estimate: needs a benchmark.
Estimates for unified-memory systems are approximate: unified memory is shared with the OS and is not a direct match for dedicated VRAM.
Memory budget breakdown
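A fit check like the verdict above roughly adds up three terms and compares them with the memory the model can actually use. The sketch below is one way to do that arithmetic, not ToolHalla's estimator. It assumes the directory's 11.2 GB Q8_0 figure covers weights only; the directory does not say whether cache or runtime buffers are included, so this errs on the conservative side. The KV cache term is about 1.07 GB, which is Llama-3.1-8B's FP16 cache at 8,192 tokens. The 1.5 GB runtime overhead and the 75% usable share of unified memory are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryBudget:
    """Rough split of the memory one loaded model needs (all values in GB)."""
    weights_gb: float   # quantized model memory, e.g. the directory's Q8_0 figure
    kv_cache_gb: float  # grows with context length and concurrent sequences
    overhead_gb: float  # runtime buffers; an assumption, varies by backend

    @property
    def total_gb(self) -> float:
        return self.weights_gb + self.kv_cache_gb + self.overhead_gb


def headroom_gb(budget: MemoryBudget, device_gb: float, usable_fraction: float) -> float:
    """Memory left after loading; a negative result means it likely won't fit.

    usable_fraction is an assumption: unified memory is shared with the OS,
    so only part of it is available to the model.
    """
    return device_gb * usable_fraction - budget.total_gb


if __name__ == "__main__":
    # Apple M4 / 24 GB running Llama-3.1-8B-Instruct at Q8_0 with 8k context.
    plan = MemoryBudget(weights_gb=11.2, kv_cache_gb=1.07, overhead_gb=1.5)
    spare = headroom_gb(plan, device_gb=24, usable_fraction=0.75)  # 0.75 is hypothetical
    print(f"needs ~{plan.total_gb:.2f} GB, headroom ~{spare:.2f} GB")
```

Raising the context length or lowering `usable_fraction` can flip the result quickly. That sensitivity is why unified-memory verdicts are treated as estimates.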
Lighter local alternative
If you want faster or lower-power local inference, consider a smaller model or a lower-bit quantization of the same model.
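Picking a lighter alternative comes down to filtering the directory by memory. The sketch below is a minimal version. It uses a hand-copied subset of the table's Q4_K_M column as illustrative data. The helper name `lighter_alternatives` and the heaviest-first ordering are assumptions; the model selector may rank models differently.

```python
# Illustrative subset of the directory's Q4_K_M memory figures (GB).
Q4_K_M_GB = {
    "Llama-3.1-8B-Instruct": 6.5,
    "Qwen2.5-7B-Instruct": 5.8,
    "Phi-4-mini-instruct": 3.7,
    "Llama-3.2-3B-Instruct": 3.2,
    "Llama-3.2-1B-Instruct": 0.7,
}


def lighter_alternatives(current, table, max_gb=None):
    """Return (model, GB) pairs lighter than `current`, heaviest first.

    Heaviest first is a hypothetical ranking: among lighter models, the
    largest one is assumed to give up the least capability.
    """
    limit = table[current] if max_gb is None else min(max_gb, table[current])
    picks = [(name, gb) for name, gb in table.items() if name != current and gb < limit]
    return sorted(picks, key=lambda pair: pair[1], reverse=True)


print(lighter_alternatives("Llama-3.1-8B-Instruct", Q4_K_M_GB, max_gb=4.0))
```

Passing `max_gb` caps the result at a hard memory target, such as a low-power laptop. Without it, the function simply lists everything lighter than the current pick.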
When cloud is smarter
A cloud GPU is usually the smarter call when you need long context, heavy concurrency, or fast experiments, or when you want to run high-memory models without buying hardware.
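Long context and heavy concurrency are expensive locally because the KV cache grows linearly with both. The sketch below applies the usual estimate for a grouped-query-attention transformer. It assumes an unquantized FP16 cache (2 bytes per element) and Llama-3.1-8B's published shape: 32 layers, 8 KV heads, and a head dimension of 128. Runtimes that quantize or page the cache will report different figures.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens,
                   n_sequences=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every token in flight."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * context_tokens * n_sequences


GIB = 2**30
LLAMA_31_8B = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for ctx, seqs in [(8_192, 1), (131_072, 1), (131_072, 4)]:
    gib = kv_cache_bytes(**LLAMA_31_8B, context_tokens=ctx, n_sequences=seqs) / GIB
    print(f"{ctx:>7} tokens x {seqs} sequence(s): {gib:.1f} GiB of KV cache")
```

With this model shape, each token costs 128 KiB of cache. At the full 131k context with four concurrent sequences, the cache alone reaches 64 GiB, well past a 24 GB machine before any weights are loaded.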
Estimates vary by runtime, quantization, context length, OS overhead, and backend.
LLM directory
All local model entries
This table uses the same source data as the LLM model selector and /models. Quantization memory figures are directory data, not live benchmarks.
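The Quant / memory column is plain text, so anything that reuses it has to parse cells such as `Q2_K 4 GB / Q3_K_M 5.2 GB`. The sketch below is a minimal parser. It assumes every entry follows the `<quant> <number> GB` pattern seen in the table. `parse_quant_memory` is a hypothetical helper, not part of /models.

```python
import re

# One entry: a quant label such as Q4_K_M, TQ1_0, IQ2_XXS or FP16, then "<n> GB".
ENTRY = re.compile(r"([A-Z0-9_]+)\s+(\d+(?:\.\d+)?)\s*GB")


def parse_quant_memory(cell: str) -> dict[str, float]:
    """Turn 'Q2_K 4 GB / Q4_K_M 6.5 GB' into {'Q2_K': 4.0, 'Q4_K_M': 6.5}."""
    result: dict[str, float] = {}
    for part in cell.split("/"):
        match = ENTRY.fullmatch(part.strip())
        if match:  # silently skip entries that do not follow the pattern
            result[match.group(1)] = float(match.group(2))
    return result


print(parse_quant_memory("TQ1_0 160 GB / IQ2_XXS 195 GB / Q3_K_M 290 GB / Q4_K_M 380 GB"))
```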
| Model | Family | Params (total) | Context (tokens) | Use cases | License | Quant / memory (approx.) |
|---|---|---|---|---|---|---|
| Qwen3.5-397B-A17B MoE / 17B active | Qwen | 397B | 1M | chat, coding, research, math, agentic, +1 | Apache-2.0 | Q2_K 100 GB / Q3_K_M 130 GB / Q4_K_M 168 GB / Q5_K_M 210 GB |
| MiniMax-M2.5 MoE / 10B active | MiniMax | 230B | 1M | chat, coding, research, agentic | Apache-2.0 | Q2_K 58 GB / Q3_K_M 75 GB / Q3_K_XL 82 GB / Q4_K_M 98 GB |
| DeepSeek-R1-671B MoE / 37B active | DeepSeek | 671B | 131k | chat, coding, research, reasoning | MIT | TQ1_0 160 GB / IQ2_XXS 195 GB / Q3_K_M 290 GB / Q4_K_M 380 GB |
| Qwen3.5-122B-A10B MoE / 10B active | Qwen | 122B | 1M | chat, coding, research, math, agentic | Apache-2.0 | Q2_K 31 GB / Q3_K_M 40 GB / Q4_K_M 52 GB / Q8_0 68 GB |
| Kimi-K2.5 MoE / 32B active | Kimi | 1T | 131k | chat, coding, research, agentic, vision | MIT (modified) | TQ1_0 200 GB / Q2_K_XL 375 GB / Q4_K_S 550 GB |
| GLM-5 MoE / unknown active | GLM | 744B | 131k | chat, coding, research, agentic, reasoning | MIT | TQ1_0 174 GB / IQ2_XXS 225 GB / Q3_K_M 320 GB / Q4_K_M 420 GB |
| Qwen3-235B-A22B MoE / 22B active | Qwen | 235B | 131k | chat, coding, research, reasoning | Apache-2.0 | Q3_K_M 78 GB / Q4_K_M 100 GB / Q5_K_M 125 GB / Q8_0 190 GB |
| Llama-3.3-70B-Instruct | Llama | 70B | 131k | chat, coding, research, creative, math | Llama 3.3 Community | Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB |
| Qwen2.5-72B-Instruct | Qwen | 72B | 33k | chat, coding, research, math, creative | Qwen License | Q2_K 29.6 GB / Q3_K_M 38.4 GB / Q4_K_M 48.7 GB / Q5_K_M 60.4 GB |
| Llama-3.1-70B-Instruct | Llama | 70B | 131k | chat, research, creative, math | Llama 3.1 Community | Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB |
| Nous-Hermes-2-Mixtral-8x7B-DPO | Nous | 46.7B MoE | 33k | chat, creative, coding | Apache-2.0 | Q2_K 19.6 GB / Q3_K_M 25.4 GB / Q4_K_M 32.2 GB / Q5_K_M 39.9 GB |
| Qwen2.5-Coder-32B-Instruct | Qwen | 32B | 33k | coding, chat, math, research | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 17.6 GB / Q4_K_M 22.3 GB / Q5_K_M 27.6 GB |
| DeepSeek-R1-Distill-Llama-70B | DeepSeek | 70B | 33k | math, research, coding, chat | MIT | Q2_K 28.8 GB / Q3_K_M 37.4 GB / Q4_K_M 47.4 GB / Q5_K_M 58.8 GB |
| Qwen3-32B | Qwen | 32B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB |
| Qwen3.5-27B | Qwen | 27B | 1M | chat, coding, research, math, agentic | Apache-2.0 | Q2_K 11.3 GB / Q3_K_M 14.6 GB / Q4_K_M 18.4 GB / Q5_K_M 22.5 GB |
| DeepSeek-R1-Distill-Qwen-32B | DeepSeek | 32B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB |
| Llama-4-Maverick-17B MoE / 17B active | Llama | 400B | 1M | chat, coding, vision, research | Llama 4 Community | Q3_K_M 130 GB / Q4_K_M 170 GB / Q5_K_M 210 GB |
| Qwen3.5-35B-A3B MoE / 3B active | Qwen | 35B | 1M | chat, coding, agentic, research, math | Apache-2.0 | Q2_K 14.8 GB / Q3_K_M 19.2 GB / Q4_K_M 24.3 GB / Q5_K_M 30.1 GB |
| Qwen2.5-32B-Instruct | Qwen | 32B | 33k | chat, coding, research, math, creative | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 17.6 GB / Q4_K_M 22.3 GB / Q5_K_M 27.6 GB |
| Gemma-3-27B | Gemma | 27B | 131k | coding, chat, research, creative, math | Gemma Terms | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB |
| Phi-4-14B-Instruct | Phi | 14B | 16k | coding, math, research, chat | MIT | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB |
| Mistral-Small-24B-Instruct | Mistral | 24B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 10 GB / Q3_K_M 13 GB / Q4_K_M 16 GB / Q5_K_M 20 GB |
| InternLM2.5-20B-Chat | InternLM | 20B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 10 GB / Q3_K_M 13 GB / Q4_K_M 16 GB / Q5_K_M 20 GB |
| Phi-3-medium-128k-instruct | Phi | 14B | 131k | coding, chat, research, math | MIT | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB |
| Mistral-Nemo-12B-Instruct | Mistral | 12B | 128k | chat, coding, creative | Apache-2.0 | Q2_K 5.6 GB / Q3_K_M 7.2 GB / Q4_K_M 9.1 GB / Q5_K_M 11.2 GB |
| Qwen3-30B-A3B | Qwen | 30B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 13.6 GB / Q3_K_M 18 GB / Q4_K_M 22 GB / Q5_K_M 27 GB |
| Gemma-4-26B-A4B MoE / 3.8B active | Gemma | 25.2B | 262k | chat, coding, research, math, agentic, +2 | Apache-2.0 | Q2_K 6.9 GB / Q3_K_M 10.4 GB / Q4_K_M 13.9 GB / Q5_K_M 17.3 GB |
| Qwen3.5-9B MoE | Qwen | 9B | 262k | chat, coding, research, math, agentic, +2 | Apache-2.0 | Q2_K 2.5 GB / Q3_K_M 3.7 GB / Q4_K_M 5 GB / Q5_K_M 6.2 GB |
| DeepSeek-R1-Distill-Llama-8B | DeepSeek | 8B | 33k | math, research, chat | MIT | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB |
| Mixtral-8x22B-Instruct | Mistral | 141B MoE | 66k | chat, coding, research, creative, math | Apache-2.0 | Q2_K 36.8 GB / Q3_K_M 47.8 GB / Q4_K_M 60.6 GB / Q5_K_M 75.2 GB |
| Llama-4-Scout-17B MoE / 17B active | Llama | 109B | 524k | chat, coding, vision | Llama 4 Community | Q3_K_M 35 GB / Q4_K_M 45 GB / Q5_K_M 55 GB / Q8_0 85 GB |
| Qwen2.5-Coder-14B-Instruct | Qwen | 14B | 33k | coding, chat, math | Apache-2.0 | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB |
| WizardLM-2-8x22B | WizardLM | 141B MoE | 66k | chat, coding, research, creative | Apache-2.0 | Q2_K 36.8 GB / Q3_K_M 47.8 GB / Q4_K_M 60.6 GB / Q5_K_M 75.2 GB |
| Gemma-2-27B-Instruct | Gemma | 27B | 8k | chat, coding, research, creative, math | Gemma Terms | Q2_K 11.6 GB / Q3_K_M 15 GB / Q4_K_M 19 GB / Q5_K_M 23.5 GB |
| Qwen2.5-14B-Instruct | Qwen | 14B | 33k | chat, coding, research, math | Apache-2.0 | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB |
| Gemma-3-12B | Gemma | 12B | 131k | coding, chat, research, creative, math | Gemma Terms | Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB |
| Llama-3.2-11B-Vision-Instruct | Llama | 11B | 131k | coding, chat, research, creative, math, +1 | Llama 3.2 Community | Q2_K 4.4 GB / Q3_K_M 6 GB / Q4_K_M 7.2 GB / Q5_K_M 9 GB |
| Qwen3.5-4B MoE | Qwen | 4B | 262k | chat, coding, research, math, agentic, +2 | Apache-2.0 | Q2_K 1.1 GB / Q3_K_M 1.7 GB / Q4_K_M 2.2 GB / Q5_K_M 2.8 GB |
| DeepSeek-R1-Distill-Qwen-14B | DeepSeek | 14B | 33k | math, research, coding | MIT | Q2_K 6.4 GB / Q3_K_M 8.3 GB / Q4_K_M 10.4 GB / Q5_K_M 12.9 GB |
| Solar-10.7B-Instruct | Solar | 10.7B | 33k | coding, chat, research, creative, math | CC-BY-NC | Q2_K 4.4 GB / Q3_K_M 6 GB / Q4_K_M 7.2 GB / Q5_K_M 9 GB |
| DBRX-Instruct | Databricks | 132B MoE | 33k | chat, coding, research | Databricks Open | Q2_K 34.4 GB / Q3_K_M 44.7 GB / Q4_K_M 56.6 GB / Q5_K_M 70.3 GB |
| Ministral-8B-Instruct | Mistral | 8B | 128k | chat, research | Mistral Research | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB |
| Mixtral-8x7B-Instruct | Mistral | 46.7B MoE | 33k | chat, coding, research, creative | Apache-2.0 | Q2_K 19.6 GB / Q3_K_M 25.4 GB / Q4_K_M 32.2 GB / Q5_K_M 39.9 GB |
| Qwen3-8B | Qwen | 8B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB |
| Command-R-7B | Cohere | 7B | 128k | research, chat, creative | CC-BY-NC | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Granite-3.1-8B-Instruct | Granite | 8B | 128k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB |
| Mistral-7B-Instruct-v0.3 | Mistral | 7B | 33k | chat, research | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Dolphin-2.9.2-Qwen2-7B | Dolphin | 7B | 33k | chat, creative, coding | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| DeepSeek-Coder-33B-Instruct | DeepSeek | 33B | 16k | coding, chat, math | DeepSeek License | Q2_K 14 GB / Q3_K_M 18.2 GB / Q4_K_M 23 GB / Q5_K_M 28.5 GB |
| Qwen2.5-Coder-7B-Instruct | Qwen | 7B | 33k | coding, chat | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Falcon-40B-Instruct | Falcon | 40B | 2k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 17 GB / Q3_K_M 22 GB / Q4_K_M 28 GB / Q5_K_M 34 GB |
| GLM-4.7-9B-Chat | GLM | 9B | 33k | chat, coding | Apache-2.0 | Q4_K_M 6.2 GB / Q5_K_M 7.4 GB / Q8_0 10.5 GB / FP16 18.8 GB |
| Llama-3.1-8B-Instruct | Llama | 8B | 131k | chat, research, creative | Llama 3.1 Community | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB |
| Qwen2.5-7B-Instruct | Qwen | 7B | 33k | chat, coding, research, math | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| OpenHermes-2.5-Mistral-7B | Nous | 7B | 33k | chat, creative | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Gemma-2-9B-Instruct | Gemma | 9B | 8k | chat, research, creative | Gemma Terms | Q2_K 4.4 GB / Q3_K_M 5.7 GB / Q4_K_M 7.1 GB / Q5_K_M 8.8 GB |
| Zephyr-7B-beta | Zephyr | 7B | 33k | chat, creative | MIT | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Vicuna-13B | Vicuna | 13B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB |
| Orca-2-13B | Orca | 13B | 33k | coding, chat, research, creative, math | Microsoft Research | Q2_K 5.2 GB / Q3_K_M 7 GB / Q4_K_M 8.8 GB / Q5_K_M 11 GB |
| CodeLlama-7B-Instruct | CodeLlama | 7B | 16k | coding | Llama 2 Community | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Neural-Chat-7B-v3.3 | Intel | 7B | 33k | chat, research | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Gemma-4-E4B | Gemma | 8B | 131k | chat, coding, research, agentic, reasoning, +1 | Apache-2.0 | Q2_K 2.2 GB / Q3_K_M 3.3 GB / Q4_K_M 4.4 GB / Q5_K_M 5.5 GB |
| Phi-4-mini-instruct | Phi | 3.8B | 128k | chat, coding, research | MIT | Q2_K 2.3 GB / Q3_K_M 3 GB / Q4_K_M 3.7 GB / Q5_K_M 4.5 GB |
| Command-R-35B | Cohere | 35B | 128k | research, chat, coding | CC-BY-NC | Q2_K 14.8 GB / Q3_K_M 19.2 GB / Q4_K_M 24.3 GB / Q5_K_M 30.1 GB |
| CodeLlama-34B-Instruct | CodeLlama | 34B | 16k | coding, math | Llama 2 Community | Q2_K 14.4 GB / Q3_K_M 18.7 GB / Q4_K_M 23.6 GB / Q5_K_M 29.3 GB |
| OpenChat-3.6-8B | OpenChat | 8B | 8k | chat, creative | Llama 3 Community | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB |
| DeepSeek-R1-Distill-Qwen-7B | DeepSeek | 7B | 33k | math, research, coding | MIT | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Gemma-3-4B | Gemma | 4B | 131k | coding, chat, research, creative, math | Gemma Terms | Q2_K 2.8 GB / Q3_K_M 3.8 GB / Q4_K_M 4.8 GB / Q5_K_M 6 GB |
| Phi-3-mini-4k-instruct | Phi | 3.8B | 4k | chat, coding | MIT | Q2_K 2.3 GB / Q3_K_M 3 GB / Q4_K_M 3.7 GB / Q5_K_M 4.5 GB |
| InternLM2.5-7B-Chat | InternLM | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB |
| DeepSeek-Coder-6.7B-Instruct | DeepSeek | 6.7B | 16k | coding, chat | DeepSeek License | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.6 GB / Q5_K_M 6.9 GB |
| StarCoder2-15B-Instruct | StarCoder | 15B | 16k | coding, math | OpenRAIL-M | Q2_K 6.8 GB / Q3_K_M 8.8 GB / Q4_K_M 11.1 GB / Q5_K_M 13.7 GB |
| WizardLM-2-7B | WizardLM | 7B | 33k | chat, coding | Apache-2.0 | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Vicuna-7B | Vicuna | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB |
| Orca-2-7B | Orca | 7B | 33k | coding, chat, research, creative, math | Microsoft Research | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB |
| Qwen2-VL-7B-Instruct | Qwen | 7B | 33k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB |
| CodeGemma-7B-Instruct | Gemma | 7B | 8k | coding, chat | Gemma Terms | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Falcon-7B-Instruct | Falcon | 7B | 2k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 3.5 GB / Q3_K_M 4.5 GB / Q4_K_M 5.5 GB / Q5_K_M 7 GB |
| Qwen2.5-3B-Instruct | Qwen | 3B | 33k | chat, coding, research | Qwen Research | Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB |
| CodeLlama-13B-Instruct | CodeLlama | 13B | 16k | coding, math | Llama 2 Community | Q2_K 6 GB / Q3_K_M 7.8 GB / Q4_K_M 9.8 GB / Q5_K_M 12.1 GB |
| Llama-3.2-3B-Instruct | Llama | 3B | 131k | chat, creative | Llama 3.2 Community | Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB |
| StarCoder2-7B-Instruct | StarCoder | 7B | 16k | coding | OpenRAIL-M | Q2_K 3.6 GB / Q3_K_M 4.6 GB / Q4_K_M 5.8 GB / Q5_K_M 7.1 GB |
| Llama-3.2-1B-Instruct | Llama | 1B | 131k | coding, chat, research, creative, math | Llama 3.2 Community | Q2_K 0.5 GB / Q3_K_M 0.6 GB / Q4_K_M 0.7 GB / Q5_K_M 0.9 GB |
| Gemma-2-2B-Instruct | Gemma | 2B | 8k | chat, creative | Gemma Terms | Q2_K 1.6 GB / Q3_K_M 2 GB / Q4_K_M 2.5 GB / Q5_K_M 3 GB |
| SmolLM2-1.7B-Instruct | SmolLM | 1.7B | 8k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 1 GB / Q3_K_M 1.3 GB / Q4_K_M 1.6 GB / Q5_K_M 2 GB |
| StarCoder2-3B-Instruct | StarCoder | 3B | 16k | coding | OpenRAIL-M | Q2_K 2 GB / Q3_K_M 2.6 GB / Q4_K_M 3.2 GB / Q5_K_M 3.9 GB |
| TinyLlama-1.1B | TinyLlama | 1.1B | 2k | coding, chat, research, creative, math | Apache-2.0 | Q2_K 0.5 GB / Q3_K_M 0.6 GB / Q4_K_M 0.7 GB / Q5_K_M 0.9 GB |
| Yi-1.5-34B-Chat | Yi | 34B | 33k | chat, research, creative, math | Apache-2.0 | Q2_K 14.4 GB / Q3_K_M 18.7 GB / Q4_K_M 23.6 GB / Q5_K_M 29.3 GB |
| Yi-1.5-9B-Chat | Yi | 9B | 33k | chat, research, creative | Apache-2.0 | Q2_K 4.4 GB / Q3_K_M 5.7 GB / Q4_K_M 7.1 GB / Q5_K_M 8.8 GB |
| Llama-3-8B-Instruct | Llama | 8B | 8k | chat, creative | Llama 3 Community | Q2_K 4 GB / Q3_K_M 5.2 GB / Q4_K_M 6.5 GB / Q5_K_M 8 GB |
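
One way to read a row of the table above against a specific machine: take the largest listed quantization that still fits after you hold back room for the KV cache and the OS. The sketch below applies that rule to the Qwen2.5-Coder-32B-Instruct and DeepSeek-R1-671B rows. The rule itself, the `best_fitting_quant` name, and the 4 GB reserve are all hypothetical, and the estimator may weigh things differently.

```python
from typing import Optional


def best_fitting_quant(quants: dict[str, float], budget_gb: float,
                       reserve_gb: float = 0.0) -> Optional[str]:
    """Return the quant with the largest listed memory that fits, or None."""
    usable = budget_gb - reserve_gb
    fitting = {name: gb for name, gb in quants.items() if gb <= usable}
    if not fitting:
        return None  # nothing fits locally; a cloud GPU is the likelier option
    return max(fitting, key=fitting.get)


QWEN25_CODER_32B = {"Q2_K": 13.6, "Q3_K_M": 17.6, "Q4_K_M": 22.3, "Q5_K_M": 27.6}
DEEPSEEK_R1_671B = {"TQ1_0": 160, "IQ2_XXS": 195, "Q3_K_M": 290, "Q4_K_M": 380}

# 24 GB machine, 4 GB held back for cache and OS (a hypothetical reserve).
print(best_fitting_quant(QWEN25_CODER_32B, budget_gb=24, reserve_gb=4))
print(best_fitting_quant(DEEPSEEK_R1_671B, budget_gb=24, reserve_gb=4))
```

Within one row, a larger listed memory means more bits per weight, so the rule prefers it as a proxy for quality. It says nothing about speed, which still needs a benchmark.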