LLM Finder
Hybrid CPU+GPU mode — run massive models with RAM offloading.
🔀 HYBRID CPU+GPU
MiniMax-M2.5
230BMiniMaxMoE · 10B active
🔀 q4_K_M
98 GB / 112 GB (GPU+RAM)88%
⚡11-17 tok/s↔1.0M📄Apache-2.0
💬 chat💻 coding🔬 research agentic
llama-server -hf MiniMax-M2.5-GGUF:q4_K_M \
--jinja -ngl 999 --ctx-size 16384 --fit on

Qwen3.5-122B-A10B
122BQwenMoE · 10B active
q3_K_M
40 GB / 48 GB VRAM83%
⚡33-43 tok/s↔1.0M📄Apache-2.0
↑ Hybrid upgrade: q8_0 · 68 GB · 11-17 tok/s
💬 chat💻 coding🔬 research🔢 math agentic
# Needs 40GB+ VRAM/RAM — single H100 or dual 3090
hf download Qwen/Qwen3.5-122B-A10B-Instruct-GGUF --include "Q3_K_M/*"

Qwen3-235B-A22B
235BQwenMoE · 22B active
🔀 q4_K_M
100 GB / 112 GB (GPU+RAM)89%
⚡11-17 tok/s↔131k📄Apache-2.0
💬 chat💻 coding🔬 research reasoning
llama-server -hf Qwen3-235B-A22B-GGUF:q4_K_M \
--jinja -ngl 999 --ctx-size 16384 --fit on

Llama-3.3-70B-Instruct
70BLlama
q4_K_M
47.4 GB / 48 GB VRAM99%
⚡17-27 tok/s↔131k📄Llama 3.3 Community
↑ Hybrid upgrade: q8_0 · 84.4 GB · 12-18 tok/s
💬 chat💻 coding🔬 research🎨 creative🔢 math
ollama pull llama3.3:70b

Qwen2.5-72B-Instruct
72BQwen
q3_K_M
38.4 GB / 48 GB VRAM80%
⚡38-48 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: q8_0 · 86.8 GB · 12-18 tok/s
💬 chat💻 coding🔬 research🔢 math🎨 creative
ollama pull qwen2.5:72b

Qwen2.5-Coder-32B-Instruct
32BQwen
q8_0
39.6 GB / 48 GB VRAM83%
⚡35-45 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 72.9 GB · 14-20 tok/s
💻 coding💬 chat🔢 math🔬 research
ollama pull qwen2.5-coder:32b

Nous-Hermes-2-Mixtral-8x7B-DPO
46.7B MoENous
q5_K_M
39.9 GB / 48 GB VRAM83%
⚡35-45 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 105.9 GB · 10-16 tok/s
💬 chat🎨 creative💻 coding
ollama pull nous-hermes2-mixtral

Llama-3.1-70B-Instruct
70BLlama
q4_K_M
47.4 GB / 48 GB VRAM99%
⚡18-28 tok/s↔131k📄Llama 3.1 Community
↑ Hybrid upgrade: q8_0 · 84.4 GB · 12-18 tok/s
💬 chat🔬 research🎨 creative🔢 math
ollama pull llama3.1:70b

Qwen3.5-27B
27BQwen
q8_0
32.4 GB / 48 GB VRAM68%
⚡53-63 tok/s↔1.0M📄Apache-2.0
↑ Hybrid upgrade: fp16 · 58.5 GB · 17-23 tok/s
💬 chat💻 coding🔬 research🔢 math agentic
ollama pull qwen3.5:27b

Qwen3-32B
32BQwen
q8_0
38 GB / 48 GB VRAM79%
⚡40-50 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Qwen3-32B

DeepSeek-R1-Distill-Llama-70B
70BDeepSeek
q4_K_M
47.4 GB / 48 GB VRAM99%
⚡18-28 tok/s↔33k📄MIT
↑ Hybrid upgrade: q8_0 · 84.4 GB · 12-18 tok/s
🔢 math🔬 research💻 coding💬 chat
ollama pull deepseek-r1:70b

DeepSeek-R1-Distill-Qwen-32B
32BDeepSeek
q8_0
38 GB / 48 GB VRAM79%
⚡40-50 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull DeepSeek-R1-Distill-Qwen-32B

Qwen3.5-35B-A3B
35BQwenMoE · 3B active
q8_0
43.1 GB / 48 GB VRAM90%
⚡28-38 tok/s↔1.0M📄Apache-2.0
↑ Hybrid upgrade: fp16 · 79.5 GB · 11-17 tok/s
💬 chat💻 coding agentic🔬 research🔢 math
ollama pull qwen3.5:35b-a3b

Gemma-3-27B
27BGemma
q8_0
38 GB / 48 GB VRAM79%
⚡41-51 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Gemma-3-27B

Qwen2.5-32B-Instruct
32BQwen
q8_0
39.6 GB / 48 GB VRAM83%
⚡37-47 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 72.9 GB · 14-20 tok/s
💬 chat💻 coding🔬 research🔢 math🎨 creative
ollama pull qwen2.5:32b

Phi-4-14B-Instruct
14BPhi
fp16
33.3 GB / 48 GB VRAM69%
⚡52-62 tok/s↔128k📄MIT
💻 coding🔢 math🔬 research💬 chat
ollama pull phi4:14b

Phi-3-medium-128k-instruct
14BPhi
fp16
33.3 GB / 48 GB VRAM69%
⚡53-63 tok/s↔131k📄MIT
💻 coding💬 chat🔬 research🔢 math
ollama pull phi3:medium

Mistral-Small-24B-Instruct
24BMistral
fp16
48 GB / 48 GB VRAM100%
⚡20-30 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Mistral-Small-24B-Instruct

InternLM2.5-20B-Chat
20BInternLM
fp16
48 GB / 48 GB VRAM100%
⚡20-30 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull InternLM2.5-20B-Chat

Mistral-Nemo-12B-Instruct
12BMistral
fp16
28.9 GB / 48 GB VRAM60%
⚡64-74 tok/s↔128k📄Apache-2.0
💬 chat💻 coding🎨 creative
ollama pull mistral-nemo:12b

Qwen3.5-9B
9BQwen
fp16
19.8 GB / 48 GB VRAM41%
⚡86-96 tok/s↔262k📄Apache-2.0
💬 chat💻 coding🔬 research🔢 math agentic reasoning vision
hf download Qwen/Qwen3.5-9B

DeepSeek-R1-Distill-Llama-8B
8BDeepSeek
fp16
20.1 GB / 48 GB VRAM42%
⚡86-96 tok/s↔33k📄MIT
🔢 math🔬 research💬 chat
ollama pull deepseek-r1:8b

Gemma-4-26B-A4B
25.2BGemmaMoE · 3.8B active
q8_0
27.7 GB / 48 GB VRAM58%
⚡68-78 tok/s↔262k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 55.4 GB · 11-17 tok/s
💬 chat💻 coding🔬 research🔢 math agentic reasoning vision
hf download google/gemma-4-26B-A4B-it

Qwen3-30B-A3B
30BQwen
q8_0
38 GB / 48 GB VRAM79%
⚡43-53 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Qwen3-30B-A3B

Qwen2.5-Coder-14B-Instruct
14BQwen
fp16
33.3 GB / 48 GB VRAM69%
⚡54-64 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔢 math
ollama pull qwen2.5-coder:14b

Llama-4-Scout-17B
109BLlamaMoE · 17B active
q4_K_M
45 GB / 48 GB VRAM94%
⚡26-36 tok/s↔524k📄Llama 4 Community
↑ Hybrid upgrade: q8_0 · 85 GB · 11-17 tok/s
💬 chat💻 coding vision
ollama pull llama4-scout

Mixtral-8x22B-Instruct
141B MoEMistral
q3_K_M
47.8 GB / 48 GB VRAM100%
⚡22-32 tok/s↔66k📄Apache-2.0
↑ Hybrid upgrade: q8_0 · 108 GB · 10-16 tok/s
💬 chat💻 coding🔬 research🎨 creative🔢 math
ollama pull mixtral:8x22b

WizardLM-2-8x22B
141B MoEWizardLM
q3_K_M
47.8 GB / 48 GB VRAM100%
⚡22-32 tok/s↔66k📄Llama 2 Community
↑ Hybrid upgrade: q8_0 · 108 GB · 10-16 tok/s
💬 chat💻 coding🔬 research🎨 creative
ollama pull wizardlm2:8x22b

Llama-3.2-11B-Vision-Instruct
11BLlama
fp16
22 GB / 48 GB VRAM46%
⚡82-92 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math vision
ollama pull Llama-3.2-11B-Vision-Instruct

Gemma-3-12B
12BGemma
fp16
26 GB / 48 GB VRAM54%
⚡73-83 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Gemma-3-12B

Qwen2.5-14B-Instruct
14BQwen
fp16
33.3 GB / 48 GB VRAM69%
⚡55-65 tok/s↔33k📄Apache-2.0
💬 chat💻 coding🔬 research🔢 math
ollama pull qwen2.5:14b

Gemma-2-27B-Instruct
27BGemma
q8_0
33.7 GB / 48 GB VRAM70%
⚡54-64 tok/s↔8k📄Gemma Terms
↑ Hybrid upgrade: fp16 · 61.9 GB · 17-23 tok/s
💬 chat💻 coding🔬 research🎨 creative🔢 math
ollama pull gemma2:27b

Qwen3.5-4B
4BQwen
fp16
8.8 GB / 48 GB VRAM18%
⚡114-124 tok/s↔262k📄Apache-2.0
💬 chat💻 coding🔬 research🔢 math agentic reasoning vision
hf download Qwen/Qwen3.5-4B

Solar-10.7B-Instruct
10.7BSolar
fp16
22 GB / 48 GB VRAM46%
⚡83-93 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Solar-10.7B-Instruct

DeepSeek-R1-Distill-Qwen-14B
14BDeepSeek
fp16
33.3 GB / 48 GB VRAM69%
⚡56-66 tok/s↔33k📄MIT
🔢 math🔬 research💻 coding
ollama pull deepseek-r1:14b

Ministral-8B-Instruct
8BMistral
fp16
20.1 GB / 48 GB VRAM42%
⚡88-98 tok/s↔128k📄Mistral Research
💬 chat🔬 research
ollama pull ministral:8b

DBRX-Instruct
132B MoEDatabricks
q3_K_M
44.7 GB / 48 GB VRAM93%
⚡29-39 tok/s↔33k📄Databricks Open
↑ Hybrid upgrade: q8_0 · 100.9 GB · 11-17 tok/s
💬 chat💻 coding🔬 research
ollama pull dbrx

Qwen3-8B
8BQwen
fp16
16 GB / 48 GB VRAM33%
⚡98-108 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Qwen3-8B

Command-R-7B
7BCohere
fp16
17.9 GB / 48 GB VRAM37%
⚡94-104 tok/s↔128k📄CC-BY-NC
🔬 research💬 chat🎨 creative
ollama pull command-r:7b

Mixtral-8x7B-Instruct
46.7B MoEMistral
q5_K_M
39.9 GB / 48 GB VRAM83%
⚡41-51 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 105.9 GB · 10-16 tok/s
💬 chat💻 coding🔬 research🎨 creative
ollama pull mixtral:8x7b

Granite-3.1-8B-Instruct
8BGranite
fp16
16 GB / 48 GB VRAM33%
⚡99-109 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Granite-3.1-8B-Instruct

Mistral-7B-Instruct-v0.3
7BMistral
fp16
17.9 GB / 48 GB VRAM37%
⚡94-104 tok/s↔33k📄Apache-2.0
💬 chat🔬 research
ollama pull mistral:7b

Dolphin-2.9.2-Qwen2-7B
7BDolphin
fp16
17.9 GB / 48 GB VRAM37%
⚡94-104 tok/s↔33k📄Apache-2.0
💬 chat🎨 creative💻 coding
ollama pull dolphin3:8b

Qwen2.5-Coder-7B-Instruct
7BQwen
fp16
17.9 GB / 48 GB VRAM37%
⚡95-105 tok/s↔33k📄Apache-2.0
💻 coding💬 chat
ollama pull qwen2.5-coder:7b

DeepSeek-Coder-33B-Instruct
33BDeepSeek
q8_0
40.7 GB / 48 GB VRAM85%
⚡40-50 tok/s↔16k📄DeepSeek License
↑ Hybrid upgrade: fp16 · 75.1 GB · 14-20 tok/s
💻 coding💬 chat🔢 math
ollama pull deepseek-coder:33b

Qwen2.5-7B-Instruct
7BQwen
fp16
17.9 GB / 48 GB VRAM37%
⚡95-105 tok/s↔33k📄Apache-2.0
💬 chat💻 coding🔬 research🔢 math
ollama pull qwen2.5:7b

OpenHermes-2.5-Mistral-7B
7BNous
fp16
17.9 GB / 48 GB VRAM37%
⚡95-105 tok/s↔33k📄Apache-2.0
💬 chat🎨 creative
ollama pull openhermes

GLM-4.7-9B-Chat
9BGLM
fp16
18.8 GB / 48 GB VRAM39%
⚡93-103 tok/s↔33k📄Apache-2.0
💬 chat💻 coding
ollama pull glm4:9b

Llama-3.1-8B-Instruct
8BLlama
fp16
20.1 GB / 48 GB VRAM42%
⚡90-100 tok/s↔131k📄Llama 3.1 Community
💬 chat🔬 research🎨 creative
ollama pull llama3.1:8b

Falcon-40B-Instruct
40BFalcon
q8_0
48 GB / 48 GB VRAM100%
⚡26-36 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 80 GB · 13-19 tok/s
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Falcon-40B-Instruct

Zephyr-7B-beta
7BZephyr
fp16
17.9 GB / 48 GB VRAM37%
⚡96-106 tok/s↔33k📄MIT
💬 chat🎨 creative
ollama pull zephyr:7b

Gemma-2-9B-Instruct
9BGemma
fp16
22.3 GB / 48 GB VRAM46%
⚡85-95 tok/s↔8k📄Gemma Terms
💬 chat🔬 research🎨 creative
ollama pull gemma2:9b

CodeLlama-7B-Instruct
7BCodeLlama
fp16
17.9 GB / 48 GB VRAM37%
⚡96-106 tok/s↔16k📄Llama 2 Community
💻 coding
ollama pull codellama:7b

Neural-Chat-7B-v3.3
7BIntel
fp16
17.9 GB / 48 GB VRAM37%
⚡96-106 tok/s↔33k📄Apache-2.0
💬 chat🔬 research
ollama pull neural-chat

Vicuna-13B
13BVicuna
fp16
26 GB / 48 GB VRAM54%
⚡77-87 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Vicuna-13B

Orca-2-13B
13BOrca
fp16
26 GB / 48 GB VRAM54%
⚡77-87 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Orca-2-13B

Phi-4-mini-instruct
3.8BPhi
fp16
10.9 GB / 48 GB VRAM23%
⚡113-123 tok/s↔128k📄MIT
💬 chat💻 coding🔬 research
ollama pull phi4-mini

Gemma-4-E4B
8BGemma
fp16
17.6 GB / 48 GB VRAM37%
⚡97-107 tok/s↔131k📄Apache-2.0
💬 chat💻 coding🔬 research agentic reasoning vision
hf download google/gemma-4-E4B-it

Command-R-35B
35BCohere
q8_0
43.1 GB / 48 GB VRAM90%
⚡37-47 tok/s↔128k📄CC-BY-NC
↑ Hybrid upgrade: fp16 · 79.5 GB · 13-19 tok/s
🔬 research💬 chat💻 coding
ollama pull command-r:35b

CodeLlama-34B-Instruct
34BCodeLlama
q8_0
41.9 GB / 48 GB VRAM87%
⚡40-50 tok/s↔16k📄Llama 2 Community
↑ Hybrid upgrade: fp16 · 77.3 GB · 13-19 tok/s
💻 coding🔢 math
ollama pull codellama:34b

DeepSeek-R1-Distill-Qwen-7B
7BDeepSeek
fp16
17.9 GB / 48 GB VRAM37%
⚡98-108 tok/s↔33k📄MIT
🔢 math🔬 research💻 coding
ollama pull deepseek-r1:7b

OpenChat-3.6-8B
8BOpenChat
fp16
20.1 GB / 48 GB VRAM42%
⚡93-103 tok/s↔8k📄Apache-2.0
💬 chat🎨 creative
ollama pull openchat:8b

Phi-3-mini-4k-instruct
3.8BPhi
fp16
10.9 GB / 48 GB VRAM23%
⚡115-125 tok/s↔4k📄MIT
💬 chat💻 coding
ollama pull phi3:mini

Gemma-3-4B
4BGemma
fp16
16 GB / 48 GB VRAM33%
⚡103-113 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Gemma-3-4B

InternLM2.5-7B-Chat
7BInternLM
fp16
16 GB / 48 GB VRAM33%
⚡104-114 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull InternLM2.5-7B-Chat

DeepSeek-Coder-6.7B-Instruct
6.7BDeepSeek
fp16
17.2 GB / 48 GB VRAM36%
⚡101-111 tok/s↔16k📄DeepSeek License
💻 coding💬 chat
ollama pull deepseek-coder:6.7b

Vicuna-7B
7BVicuna
fp16
16 GB / 48 GB VRAM33%
⚡105-115 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Vicuna-7B

Orca-2-7B
7BOrca
fp16
16 GB / 48 GB VRAM33%
⚡105-115 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Orca-2-7B

Qwen2-VL-7B-Instruct
7BQwen
fp16
16 GB / 48 GB VRAM33%
⚡105-115 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Qwen2-VL-7B-Instruct

WizardLM-2-7B
7BWizardLM
fp16
17.9 GB / 48 GB VRAM37%
⚡100-110 tok/s↔33k📄Llama 2 Community
💬 chat💻 coding
ollama pull wizardlm2:7b

StarCoder2-15B-Instruct
15BStarCoder
fp16
35.5 GB / 48 GB VRAM74%
⚡58-68 tok/s↔16k📄OpenRAIL-M
💻 coding🔢 math
ollama pull starcoder2:15b

Falcon-7B-Instruct
7BFalcon
fp16
16 GB / 48 GB VRAM33%
⚡106-116 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Falcon-7B-Instruct

CodeGemma-7B-Instruct
7BGemma
fp16
17.9 GB / 48 GB VRAM37%
⚡101-111 tok/s↔8k📄Gemma Terms
💻 coding💬 chat
ollama pull codegemma:7b

Qwen2.5-3B-Instruct
3BQwen
fp16
9.1 GB / 48 GB VRAM19%
⚡123-133 tok/s↔33k📄Apache-2.0
💬 chat💻 coding🔬 research
ollama pull qwen2.5:3b

Llama-3.2-3B-Instruct
3BLlama
fp16
9.1 GB / 48 GB VRAM19%
⚡123-133 tok/s↔131k📄Llama 3.2 Community
💬 chat🎨 creative
ollama pull llama3.2:3b

CodeLlama-13B-Instruct
13BCodeLlama
fp16
31.1 GB / 48 GB VRAM65%
⚡71-81 tok/s↔16k📄Llama 2 Community
💻 coding🔢 math
ollama pull codellama:13b

Llama-3.2-1B-Instruct
1BLlama
fp16
2 GB / 48 GB VRAM4%
⚡142-152 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull Llama-3.2-1B-Instruct

StarCoder2-7B-Instruct
7BStarCoder
fp16
17.9 GB / 48 GB VRAM37%
⚡104-114 tok/s↔16k📄OpenRAIL-M
💻 coding
ollama pull starcoder2:7b

SmolLM2-1.7B-Instruct
1.7BSmolLM
fp16
4 GB / 48 GB VRAM8%
⚡140-150 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull SmolLM2-1.7B-Instruct

Gemma-2-2B-Instruct
2BGemma
fp16
6.9 GB / 48 GB VRAM14%
⚡133-143 tok/s↔8k📄Gemma Terms
💬 chat🎨 creative
ollama pull gemma2:2b

StarCoder2-3B-Instruct
3BStarCoder
fp16
9.1 GB / 48 GB VRAM19%
⚡129-139 tok/s↔16k📄OpenRAIL-M
💻 coding
ollama pull starcoder2:3b

TinyLlama-1.1B
1.1BTinyLlama
fp16
2 GB / 48 GB VRAM4%
⚡148-158 tok/s↔33k📄Apache-2.0
💻 coding💬 chat🔬 research🎨 creative🔢 math
ollama pull TinyLlama-1.1B

Yi-1.5-34B-Chat
34BYi
q8_0
41.9 GB / 48 GB VRAM87%
⚡55-65 tok/s↔33k📄Apache-2.0
↑ Hybrid upgrade: fp16 · 77.3 GB · 13-19 tok/s
💬 chat🔬 research🎨 creative🔢 math
ollama pull yi:34b

Yi-1.5-9B-Chat
9BYi
fp16
22.3 GB / 48 GB VRAM46%
⚡104-114 tok/s↔33k📄Apache-2.0
💬 chat🔬 research🎨 creative
ollama pull yi:9b

Llama-3-8B-Instruct
8BLlama
fp16
20.1 GB / 48 GB VRAM42%
⚡113-123 tok/s↔8k📄Llama 3 Community
💬 chat🎨 creative
ollama pull llama3:8b
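
The memory line on each card above appears to be the model's footprint divided by the available budget (48 GB of VRAM, or 112 GB for GPU+RAM on hybrid cards), rounded to a whole percent. The following is a minimal Python sketch of that arithmetic, not LLM Finder's actual logic. It assumes the hybrid budget is simply VRAM plus system RAM; the 64 GB RAM figure is inferred from 112 - 48 and is not stated on the page. The function names `utilization_percent` and `placement` are illustrative only.

```python
import math


def utilization_percent(used_gb: float, budget_gb: float) -> int:
    """Share of the memory budget a model occupies, rounded half up."""
    return math.floor(used_gb / budget_gb * 100 + 0.5)


def placement(model_gb: float, vram_gb: float, ram_gb: float):
    """Pick where a model could run and return (mode, budget in GB).

    Assumption: a model that fits in VRAM runs GPU-only; otherwise it may
    spill into system RAM (hybrid) if VRAM + RAM is large enough.
    """
    if model_gb <= vram_gb:
        return "gpu", vram_gb
    if model_gb <= vram_gb + ram_gb:
        return "hybrid", vram_gb + ram_gb
    return "too large", vram_gb + ram_gb


if __name__ == "__main__":
    # Footprints taken from the cards above; 64 GB RAM is an inferred value.
    for name, size_gb in [("MiniMax-M2.5 q4_K_M", 98.0),
                          ("Llama-3.3-70B-Instruct q4_K_M", 47.4)]:
        mode, budget = placement(size_gb, vram_gb=48.0, ram_gb=64.0)
        pct = utilization_percent(size_gb, budget)
        print(f"{name}: {mode}, {size_gb} GB / {budget:g} GB, {pct}%")
```

Real headroom is tighter than this suggests, since the KV cache and runtime overhead grow with context size, so a card at 99% or 100% may not leave room for long prompts.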