LLM Finder

Hybrid CPU+GPU mode: run models larger than VRAM by offloading part of them to system RAM.
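The used/total figures and percentages on the cards below suggest simple arithmetic: a model that fits in VRAM is reported against the 48 GB of VRAM, and a model that does not is reported against 112 GB of combined GPU+RAM. The sketch below reproduces that arithmetic under stated assumptions: the 112 GB total is taken to be 48 GB VRAM plus 64 GB system RAM, and the `placement` helper and its tier labels are hypothetical, not part of this page or of any tool it mentions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a model by where its weights would fit.
# Arguments (all in GB): model size, VRAM capacity, system RAM capacity.
# The tier labels (gpu / hybrid / too-large) are illustrative only.
placement() {
  awk -v size="$1" -v vram="$2" -v ram="$3" 'BEGIN {
    if (size <= vram) {
      # Fits entirely in VRAM: report usage against VRAM alone.
      printf "gpu %.0f%%\n", 100 * size / vram
    } else if (size <= vram + ram) {
      # Needs RAM offloading: report usage against VRAM plus RAM.
      printf "hybrid %.0f%%\n", 100 * size / (vram + ram)
    } else {
      print "too-large"
    }
  }'
}

placement 40 48 64    # q3_K_M size from the Qwen3.5-122B-A10B card
placement 100 48 64   # q4_K_M size from the Qwen3-235B-A22B card
placement 130 48 64   # made-up size that exceeds VRAM plus RAM
```

With these inputs it prints `gpu 83%`, `hybrid 89%`, and `too-large`; the first two percentages match the Qwen3.5-122B-A10B and Qwen3-235B-A22B cards.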

Configure your hardware

MiniMax-M2.5

230B
MiniMax · MoE · 10B active
🔀 q4_K_M
98 GB / 112 GB (GPU+RAM) · 88%
11-17 tok/s · 1.0M context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · agentic
llama-server -hf MiniMax-M2.5-GGUF:q4_K_M \
  --jinja -ngl 999 --ctx-size 16384 --fit on

Qwen3.5-122B-A10B

122B
Qwen · MoE · 10B active
q3_K_M
40 GB / 48 GB VRAM · 83%
33-43 tok/s · 1.0M context · 📄 Apache-2.0
↑ Hybrid upgrade: q8_0 · 68 GB · 11-17 tok/s
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic
# Needs 40 GB+ VRAM/RAM: single H100 or dual 3090
hf download Qwen/Qwen3.5-122B-A10B-Instruct-GGUF --include "Q3_K_M/*"

Qwen3-235B-A22B

235B
Qwen · MoE · 22B active
🔀 q4_K_M
100 GB / 112 GB (GPU+RAM) · 89%
11-17 tok/s · 131k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · reasoning
llama-server -hf Qwen3-235B-A22B-GGUF:q4_K_M \
  --jinja -ngl 999 --ctx-size 16384 --fit on

Llama-3.3-70B-Instruct

70B
Llama
q4_K_M
47.4 GB / 48 GB VRAM · 99%
17-27 tok/s · 131k context · 📄 Llama 3.3 Community
↑ Hybrid upgrade: q8_0 · 84.4 GB · 12-18 tok/s
💬 chat · 💻 coding · 🔬 research · 🎨 creative · 🔢 math
ollama pull llama3.3:70b

Qwen2.5-72B-Instruct

72B
Qwen
q3_K_M
38.4 GB / 48 GB VRAM · 80%
38-48 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: q8_0 · 86.8 GB · 12-18 tok/s
💬 chat · 💻 coding · 🔬 research · 🔢 math · 🎨 creative
ollama pull qwen2.5:72b

Qwen2.5-Coder-32B-Instruct

32B
Qwen
q8_0
39.6 GB / 48 GB VRAM · 83%
35-45 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 72.9 GB · 14-20 tok/s
💻 coding · 💬 chat · 🔢 math · 🔬 research
ollama pull qwen2.5-coder:32b

Nous-Hermes-2-Mixtral-8x7B-DPO

46.7B MoE
Nous
q5_K_M
39.9 GB / 48 GB VRAM · 83%
35-45 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 105.9 GB · 10-16 tok/s
💬 chat · 🎨 creative · 💻 coding
ollama pull nous-hermes2-mixtral

Llama-3.1-70B-Instruct

70B
Llama
q4_K_M
47.4 GB / 48 GB VRAM · 99%
18-28 tok/s · 131k context · 📄 Llama 3.1 Community
↑ Hybrid upgrade: q8_0 · 84.4 GB · 12-18 tok/s
💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull llama3.1:70b

Qwen3.5-27B

27B
Qwen
q8_0
32.4 GB / 48 GB VRAM · 68%
53-63 tok/s · 1.0M context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 58.5 GB · 17-23 tok/s
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic
ollama pull qwen3.5:27b

Qwen3-32B

32B
Qwen
q8_0
38 GB / 48 GB VRAM · 79%
40-50 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull qwen3:32b

DeepSeek-R1-Distill-Llama-70B

70B
DeepSeek
q4_K_M
47.4 GB / 48 GB VRAM · 99%
18-28 tok/s · 33k context · 📄 MIT
↑ Hybrid upgrade: q8_0 · 84.4 GB · 12-18 tok/s
🔢 math · 🔬 research · 💻 coding · 💬 chat
ollama pull deepseek-r1:70b

DeepSeek-R1-Distill-Qwen-32B

32B
DeepSeek
q8_0
38 GB / 48 GB VRAM · 79%
40-50 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull deepseek-r1:32b

Qwen3.5-35B-A3B

35B
Qwen · MoE · 3B active
q8_0
43.1 GB / 48 GB VRAM · 90%
28-38 tok/s · 1.0M context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 79.5 GB · 11-17 tok/s
💬 chat · 💻 coding · agentic · 🔬 research · 🔢 math
ollama pull qwen3.5:35b-a3b

Gemma-3-27B

27B
Gemma
q8_0
38 GB / 48 GB VRAM · 79%
41-51 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma3:27b

Qwen2.5-32B-Instruct

32B
Qwen
q8_0
39.6 GB / 48 GB VRAM · 83%
37-47 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 72.9 GB · 14-20 tok/s
💬 chat · 💻 coding · 🔬 research · 🔢 math · 🎨 creative
ollama pull qwen2.5:32b

Phi-4-14B-Instruct

14B
Phi
fp16
33.3 GB / 48 GB VRAM · 69%
52-62 tok/s · 128k context · 📄 MIT
💻 coding · 🔢 math · 🔬 research · 💬 chat
ollama pull phi4:14b

Phi-3-medium-128k-instruct

14B
Phi
fp16
33.3 GB / 48 GB VRAM · 69%
53-63 tok/s · 131k context · 📄 MIT
💻 coding · 💬 chat · 🔬 research · 🔢 math
ollama pull phi3:medium

Mistral-Small-24B-Instruct

24B
Mistral
fp16
48 GB / 48 GB VRAM · 100%
20-30 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull mistral-small:24b

InternLM2.5-20B-Chat

20B
InternLM
fp16
48 GB / 48 GB VRAM · 100%
20-30 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull internlm2:20b

Mistral-Nemo-12B-Instruct

12B
Mistral
fp16
28.9 GB / 48 GB VRAM · 60%
64-74 tok/s · 128k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🎨 creative
ollama pull mistral-nemo:12b

Qwen3.5-9B

9B
Qwen
fp16
19.8 GB / 48 GB VRAM · 41%
86-96 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download Qwen/Qwen3.5-9B

DeepSeek-R1-Distill-Llama-8B

8B
DeepSeek
fp16
20.1 GB / 48 GB VRAM · 42%
86-96 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💬 chat
ollama pull deepseek-r1:8b

Gemma-4-26B-A4B

25.2B
Gemma · MoE · 3.8B active
q8_0
27.7 GB / 48 GB VRAM · 58%
68-78 tok/s · 262k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 55.4 GB · 11-17 tok/s
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download google/gemma-4-26B-A4B-it

Qwen3-30B-A3B

30B
Qwen
q8_0
38 GB / 48 GB VRAM · 79%
43-53 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 64 GB · 16-22 tok/s
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull qwen3:30b

Qwen2.5-Coder-14B-Instruct

14B
Qwen
fp16
33.3 GB / 48 GB VRAM · 69%
54-64 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔢 math
ollama pull qwen2.5-coder:14b

Llama-4-Scout-17B

109B
Llama · MoE · 17B active
q4_K_M
45 GB / 48 GB VRAM · 94%
26-36 tok/s · 524k context · 📄 Llama 4 Community
↑ Hybrid upgrade: q8_0 · 85 GB · 11-17 tok/s
💬 chat · 💻 coding · vision
ollama pull llama4:scout

Mixtral-8x22B-Instruct

141B MoE
Mistral
q3_K_M
47.8 GB / 48 GB VRAM · 100%
22-32 tok/s · 66k context · 📄 Apache-2.0
↑ Hybrid upgrade: q8_0 · 108 GB · 10-16 tok/s
💬 chat · 💻 coding · 🔬 research · 🎨 creative · 🔢 math
ollama pull mixtral:8x22b

WizardLM-2-8x22B

141B MoE
WizardLM
q3_K_M
47.8 GB / 48 GB VRAM · 100%
22-32 tok/s · 66k context · 📄 Llama 2 Community
↑ Hybrid upgrade: q8_0 · 108 GB · 10-16 tok/s
💬 chat · 💻 coding · 🔬 research · 🎨 creative
ollama pull wizardlm2:8x22b

Llama-3.2-11B-Vision-Instruct

11B
Llama
fp16
22 GB / 48 GB VRAM · 46%
82-92 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math · vision
ollama pull llama3.2-vision:11b

Gemma-3-12B

12B
Gemma
fp16
26 GB / 48 GB VRAM · 54%
73-83 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma3:12b

Qwen2.5-14B-Instruct

14B
Qwen
fp16
33.3 GB / 48 GB VRAM · 69%
55-65 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math
ollama pull qwen2.5:14b

Gemma-2-27B-Instruct

27B
Gemma
q8_0
33.7 GB / 48 GB VRAM · 70%
54-64 tok/s · 8k context · 📄 Gemma Terms
↑ Hybrid upgrade: fp16 · 61.9 GB · 17-23 tok/s
💬 chat · 💻 coding · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma2:27b

Qwen3.5-4B

4B
Qwen
fp16
8.8 GB / 48 GB VRAM · 18%
114-124 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download Qwen/Qwen3.5-4B

Solar-10.7B-Instruct

10.7B
Solar
fp16
22 GB / 48 GB VRAM · 46%
83-93 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull solar:10.7b

DeepSeek-R1-Distill-Qwen-14B

14B
DeepSeek
fp16
33.3 GB / 48 GB VRAM · 69%
56-66 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💻 coding
ollama pull deepseek-r1:14b

Ministral-8B-Instruct

8B
Mistral
fp16
20.1 GB / 48 GB VRAM · 42%
88-98 tok/s · 128k context · 📄 Mistral Research
💬 chat · 🔬 research
ollama pull ministral:8b

DBRX-Instruct

132B MoE
Databricks
q3_K_M
44.7 GB / 48 GB VRAM · 93%
29-39 tok/s · 33k context · 📄 Databricks Open
↑ Hybrid upgrade: q8_0 · 100.9 GB · 11-17 tok/s
💬 chat · 💻 coding · 🔬 research
ollama pull dbrx

Qwen3-8B

8B
Qwen
fp16
16 GB / 48 GB VRAM · 33%
98-108 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull qwen3:8b

Command-R-7B

7B
Cohere
fp16
17.9 GB / 48 GB VRAM · 37%
94-104 tok/s · 128k context · 📄 CC-BY-NC
🔬 research · 💬 chat · 🎨 creative
ollama pull command-r7b

Mixtral-8x7B-Instruct

46.7B MoE
Mistral
q5_K_M
39.9 GB / 48 GB VRAM · 83%
41-51 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 105.9 GB · 10-16 tok/s
💬 chat · 💻 coding · 🔬 research · 🎨 creative
ollama pull mixtral:8x7b

Granite-3.1-8B-Instruct

8B
Granite
fp16
16 GB / 48 GB VRAM · 33%
99-109 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull granite3.1-dense:8b

Mistral-7B-Instruct-v0.3

7B
Mistral
fp16
17.9 GB / 48 GB VRAM · 37%
94-104 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research
ollama pull mistral:7b

Dolphin-2.9.2-Qwen2-7B

7B
Dolphin
fp16
17.9 GB / 48 GB VRAM · 37%
94-104 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🎨 creative · 💻 coding
ollama pull dolphin3:8b

Qwen2.5-Coder-7B-Instruct

7B
Qwen
fp16
17.9 GB / 48 GB VRAM · 37%
95-105 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat
ollama pull qwen2.5-coder:7b

DeepSeek-Coder-33B-Instruct

33B
DeepSeek
q8_0
40.7 GB / 48 GB VRAM · 85%
40-50 tok/s · 16k context · 📄 DeepSeek License
↑ Hybrid upgrade: fp16 · 75.1 GB · 14-20 tok/s
💻 coding · 💬 chat · 🔢 math
ollama pull deepseek-coder:33b

Qwen2.5-7B-Instruct

7B
Qwen
fp16
17.9 GB / 48 GB VRAM · 37%
95-105 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math
ollama pull qwen2.5:7b

OpenHermes-2.5-Mistral-7B

7B
Nous
fp16
17.9 GB / 48 GB VRAM · 37%
95-105 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🎨 creative
ollama pull openhermes

GLM-4.7-9B-Chat

9B
GLM
fp16
18.8 GB / 48 GB VRAM · 39%
93-103 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding
ollama pull glm4:9b

Llama-3.1-8B-Instruct

8B
Llama
fp16
20.1 GB / 48 GB VRAM · 42%
90-100 tok/s · 131k context · 📄 Llama 3.1 Community
💬 chat · 🔬 research · 🎨 creative
ollama pull llama3.1:8b

Falcon-40B-Instruct

40B
Falcon
q8_0
48 GB / 48 GB VRAM · 100%
26-36 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 80 GB · 13-19 tok/s
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull falcon:40b

Zephyr-7B-beta

7B
Zephyr
fp16
17.9 GB / 48 GB VRAM · 37%
96-106 tok/s · 33k context · 📄 MIT
💬 chat · 🎨 creative
ollama pull zephyr:7b

Gemma-2-9B-Instruct

9B
Gemma
fp16
22.3 GB / 48 GB VRAM · 46%
85-95 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 🔬 research · 🎨 creative
ollama pull gemma2:9b

CodeLlama-7B-Instruct

7B
CodeLlama
fp16
17.9 GB / 48 GB VRAM · 37%
96-106 tok/s · 16k context · 📄 Llama 2 Community
💻 coding
ollama pull codellama:7b

Neural-Chat-7B-v3.3

7B
Intel
fp16
17.9 GB / 48 GB VRAM · 37%
96-106 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research
ollama pull neural-chat

Vicuna-13B

13B
Vicuna
fp16
26 GB / 48 GB VRAM · 54%
77-87 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull vicuna:13b

Orca-2-13B

13B
Orca
fp16
26 GB / 48 GB VRAM · 54%
77-87 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull orca2:13b

Phi-4-mini-instruct

3.8B
Phi
fp16
10.9 GB / 48 GB VRAM · 23%
113-123 tok/s · 128k context · 📄 MIT
💬 chat · 💻 coding · 🔬 research
ollama pull phi4-mini

Gemma-4-E4B

8B
Gemma
fp16
17.6 GB / 48 GB VRAM · 37%
97-107 tok/s · 131k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · agentic · reasoning · vision
hf download google/gemma-4-E4B-it

Command-R-35B

35B
Cohere
q8_0
43.1 GB / 48 GB VRAM · 90%
37-47 tok/s · 128k context · 📄 CC-BY-NC
↑ Hybrid upgrade: fp16 · 79.5 GB · 13-19 tok/s
🔬 research · 💬 chat · 💻 coding
ollama pull command-r:35b

CodeLlama-34B-Instruct

34B
CodeLlama
q8_0
41.9 GB / 48 GB VRAM · 87%
40-50 tok/s · 16k context · 📄 Llama 2 Community
↑ Hybrid upgrade: fp16 · 77.3 GB · 13-19 tok/s
💻 coding · 🔢 math
ollama pull codellama:34b

DeepSeek-R1-Distill-Qwen-7B

7B
DeepSeek
fp16
17.9 GB / 48 GB VRAM · 37%
98-108 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💻 coding
ollama pull deepseek-r1:7b

OpenChat-3.6-8B

8B
OpenChat
fp16
20.1 GB / 48 GB VRAM · 42%
93-103 tok/s · 8k context · 📄 Apache-2.0
💬 chat · 🎨 creative
ollama pull openchat:8b

Phi-3-mini-4k-instruct

3.8B
Phi
fp16
10.9 GB / 48 GB VRAM · 23%
115-125 tok/s · 4k context · 📄 MIT
💬 chat · 💻 coding
ollama pull phi3:mini

Gemma-3-4B

4B
Gemma
fp16
16 GB / 48 GB VRAM · 33%
103-113 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma3:4b

InternLM2.5-7B-Chat

7B
InternLM
fp16
16 GB / 48 GB VRAM · 33%
104-114 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull internlm2:7b

DeepSeek-Coder-6.7B-Instruct

6.7B
DeepSeek
fp16
17.2 GB / 48 GB VRAM · 36%
101-111 tok/s · 16k context · 📄 DeepSeek License
💻 coding · 💬 chat
ollama pull deepseek-coder:6.7b

Vicuna-7B

7B
Vicuna
fp16
16 GB / 48 GB VRAM · 33%
105-115 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull vicuna:7b

Orca-2-7B

7B
Orca
fp16
16 GB / 48 GB VRAM · 33%
105-115 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull orca2:7b

Qwen2-VL-7B-Instruct

7B
Qwen
fp16
16 GB / 48 GB VRAM · 33%
105-115 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull Qwen2-VL-7B-Instruct

WizardLM-2-7B

7B
WizardLM
fp16
17.9 GB / 48 GB VRAM · 37%
100-110 tok/s · 33k context · 📄 Llama 2 Community
💬 chat · 💻 coding
ollama pull wizardlm2:7b

StarCoder2-15B-Instruct

15B
StarCoder
fp16
35.5 GB / 48 GB VRAM · 74%
58-68 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding · 🔢 math
ollama pull starcoder2:15b

Falcon-7B-Instruct

7B
Falcon
fp16
16 GB / 48 GB VRAM · 33%
106-116 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull falcon:7b

CodeGemma-7B-Instruct

7B
Gemma
fp16
17.9 GB / 48 GB VRAM · 37%
101-111 tok/s · 8k context · 📄 Gemma Terms
💻 coding · 💬 chat
ollama pull codegemma:7b

Qwen2.5-3B-Instruct

3B
Qwen
fp16
9.1 GB / 48 GB VRAM · 19%
123-133 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research
ollama pull qwen2.5:3b

Llama-3.2-3B-Instruct

3B
Llama
fp16
9.1 GB / 48 GB VRAM · 19%
123-133 tok/s · 131k context · 📄 Llama 3.2 Community
💬 chat · 🎨 creative
ollama pull llama3.2:3b

CodeLlama-13B-Instruct

13B
CodeLlama
fp16
31.1 GB / 48 GB VRAM · 65%
71-81 tok/s · 16k context · 📄 Llama 2 Community
💻 coding · 🔢 math
ollama pull codellama:13b

Llama-3.2-1B-Instruct

1B
Llama
fp16
2 GB / 48 GB VRAM · 4%
142-152 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull llama3.2:1b

StarCoder2-7B-Instruct

7B
StarCoder
fp16
17.9 GB / 48 GB VRAM · 37%
104-114 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding
ollama pull starcoder2:7b

SmolLM2-1.7B-Instruct

1.7B
SmolLM
fp16
4 GB / 48 GB VRAM · 8%
140-150 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull smollm2:1.7b

Gemma-2-2B-Instruct

2B
Gemma
fp16
6.9 GB / 48 GB VRAM · 14%
133-143 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 🎨 creative
ollama pull gemma2:2b

StarCoder2-3B-Instruct

3B
StarCoder
fp16
9.1 GB / 48 GB VRAM · 19%
129-139 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding
ollama pull starcoder2:3b

TinyLlama-1.1B

1.1B
TinyLlama
fp16
2 GB / 48 GB VRAM · 4%
148-158 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull tinyllama:1.1b

Yi-1.5-34B-Chat

34B
Yi
q8_0
41.9 GB / 48 GB VRAM · 87%
55-65 tok/s · 33k context · 📄 Apache-2.0
↑ Hybrid upgrade: fp16 · 77.3 GB · 13-19 tok/s
💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull yi:34b

Yi-1.5-9B-Chat

9B
Yi
fp16
22.3 GB / 48 GB VRAM · 46%
104-114 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research · 🎨 creative
ollama pull yi:9b

Llama-3-8B-Instruct

8B
Llama
fp16
20.1 GB / 48 GB VRAM · 42%
113-123 tok/s · 8k context · 📄 Llama 3 Community
💬 chat · 🎨 creative
ollama pull llama3:8b