LLM Finder

Match local AI models to your GPU.

Results below assume a GPU with 16 GB of VRAM.
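The VRAM figures on the cards below can be roughly reproduced from parameter count and quantization level. A minimal sketch, assuming approximate effective bits-per-weight for llama.cpp-style quants and a flat overhead for KV cache and runtime buffers (both are illustrative assumptions, not the exact numbers any loader reports):

```python
# Rough VRAM estimate: weight bytes plus a flat overhead for KV cache
# and runtime buffers. Bits-per-weight values are approximations.
BITS_PER_WEIGHT = {
    "q3_K_M": 3.9,
    "q4_K_M": 4.8,
    "q5_K_M": 5.7,
    "q8_0":   8.5,
    "fp16":  16.0,
}

def vram_estimate_gb(params_billion: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Estimated VRAM footprint in GB for weights plus fixed overhead."""
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

def fits(params_billion: float, quant: str, vram_gb: float = 16.0) -> bool:
    """Does the estimate fit in the given VRAM budget?"""
    return vram_estimate_gb(params_billion, quant) <= vram_gb

print(vram_estimate_gb(27, "q3_K_M"))  # → 14.7, near the 14.6 GB listed for a 27B q3_K_M model
print(fits(14, "q5_K_M"))              # → True
```

The estimate lands within a few percent of the listed sizes; real usage grows with the context length you actually allocate.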

Qwen3.5-27B

27B
Qwen
q3_K_M
14.6 GB / 16 GB VRAM · 91%
19-29 tok/s · 1.0M context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic
ollama pull qwen3.5:27b
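Once a model is pulled, it can be queried through Ollama's local HTTP API. A minimal sketch, assuming a running Ollama server on the default port; the model tag is taken from the card above:

```python
import json

def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

if __name__ == "__main__":
    # Requires a running Ollama server and the tag pulled above.
    import urllib.request
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(chat_payload("qwen3.5:27b", "Hello!")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False the reply arrives as one JSON object.
        print(json.load(resp)["message"]["content"])
```

With `stream=True` the endpoint instead emits one JSON object per generated chunk.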

Phi-4-14B-Instruct

14B
Phi
q5_K_M
12.9 GB / 16 GB VRAM · 81%
25-35 tok/s · 128k context · 📄 MIT
💻 coding · 🔢 math · 🔬 research · 💬 chat
ollama pull phi4:14b

Phi-3-medium-128k-instruct

14B
Phi
q5_K_M
12.9 GB / 16 GB VRAM · 81%
25-35 tok/s · 131k context · 📄 MIT
💻 coding · 💬 chat · 🔬 research · 🔢 math
ollama pull phi3:medium

Mistral-Small-24B-Instruct

24B
Mistral
q4_K_M
16 GB / 16 GB VRAM · 100%
20-30 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull mistral-small:24b

InternLM2.5-20B-Chat

20B
InternLM
q4_K_M
16 GB / 16 GB VRAM · 100%
20-30 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull internlm2:20b

Mistral-Nemo-12B-Instruct

12B
Mistral
q8_0
16 GB / 16 GB VRAM · 100%
21-31 tok/s · 128k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🎨 creative
ollama pull mistral-nemo:12b

Qwen3.5-9B

9B
Qwen
q8_0
9.9 GB / 16 GB VRAM · 62%
33-43 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download Qwen/Qwen3.5-9B

DeepSeek-R1-Distill-Llama-8B

8B
DeepSeek
q8_0
11.2 GB / 16 GB VRAM · 70%
30-40 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💬 chat
ollama pull deepseek-r1:8b

Gemma-4-26B-A4B

25.2B
Gemma · MoE · 3.8B active
q4_K_M
13.9 GB / 16 GB VRAM · 87%
24-34 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download google/gemma-4-26B-A4B-it

Qwen2.5-Coder-14B-Instruct

14B
Qwen
q5_K_M
12.9 GB / 16 GB VRAM · 81%
27-37 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔢 math
ollama pull qwen2.5-coder:14b

Qwen2.5-14B-Instruct

14B
Qwen
q5_K_M
12.9 GB / 16 GB VRAM · 81%
27-37 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math
ollama pull qwen2.5:14b

Llama-3.2-11B-Vision-Instruct

11B
Llama
q8_0
13 GB / 16 GB VRAM · 81%
27-37 tok/s · 33k context · 📄 Llama 3.2 Community
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math · vision
ollama pull llama3.2-vision:11b

Gemma-2-27B-Instruct

27B
Gemma
q3_K_M
15 GB / 16 GB VRAM · 94%
22-32 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 💻 coding · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma2:27b

Gemma-3-12B

12B
Gemma
q8_0
16 GB / 16 GB VRAM · 100%
22-32 tok/s · 33k context · 📄 Gemma Terms
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma3:12b

Qwen3.5-4B

4B
Qwen
fp16
8.8 GB / 16 GB VRAM · 55%
38-48 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download Qwen/Qwen3.5-4B

DeepSeek-R1-Distill-Qwen-14B

14B
DeepSeek
q5_K_M
12.9 GB / 16 GB VRAM · 81%
28-38 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💻 coding
ollama pull deepseek-r1:14b

Solar-10.7B-Instruct

10.7B
Solar
q8_0
13 GB / 16 GB VRAM · 81%
28-38 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull solar:10.7b

Ministral-8B-Instruct

8B
Mistral
q8_0
11.2 GB / 16 GB VRAM · 70%
33-43 tok/s · 128k context · 📄 Mistral Research
💬 chat · 🔬 research
ollama pull ministral:8b

Command-R-7B

7B
Cohere
q8_0
10.1 GB / 16 GB VRAM · 63%
36-46 tok/s · 128k context · 📄 CC-BY-NC
🔬 research · 💬 chat · 🎨 creative
ollama pull command-r7b

Qwen3-8B

8B
Qwen
fp16
16 GB / 16 GB VRAM · 100%
24-34 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull qwen3:8b

Mistral-7B-Instruct-v0.3

7B
Mistral
q8_0
10.1 GB / 16 GB VRAM · 63%
36-46 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research
ollama pull mistral:7b

Dolphin-2.9.2-Qwen2-7B

7B
Dolphin
q8_0
10.1 GB / 16 GB VRAM · 63%
36-46 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🎨 creative · 💻 coding
ollama pull dolphin3:8b

Granite-3.1-8B-Instruct

8B
Granite
fp16
16 GB / 16 GB VRAM · 100%
24-34 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull granite3.1-dense:8b

Qwen2.5-Coder-7B-Instruct

7B
Qwen
q8_0
10.1 GB / 16 GB VRAM · 63%
37-47 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat
ollama pull qwen2.5-coder:7b

Qwen2.5-7B-Instruct

7B
Qwen
q8_0
10.1 GB / 16 GB VRAM · 63%
37-47 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math
ollama pull qwen2.5:7b

OpenHermes-2.5-Mistral-7B

7B
Nous
q8_0
10.1 GB / 16 GB VRAM · 63%
37-47 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🎨 creative
ollama pull openhermes

GLM-4.7-9B-Chat

9B
GLM
q8_0
10.5 GB / 16 GB VRAM · 66%
36-46 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding
ollama pull glm4:9b

Llama-3.1-8B-Instruct

8B
Llama
q8_0
11.2 GB / 16 GB VRAM · 70%
35-45 tok/s · 131k context · 📄 Llama 3.1 Community
💬 chat · 🔬 research · 🎨 creative
ollama pull llama3.1:8b

Zephyr-7B-beta

7B
Zephyr
q8_0
10.1 GB / 16 GB VRAM · 63%
38-48 tok/s · 33k context · 📄 MIT
💬 chat · 🎨 creative
ollama pull zephyr:7b

Gemma-2-9B-Instruct

9B
Gemma
q8_0
12.4 GB / 16 GB VRAM · 78%
32-42 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 🔬 research · 🎨 creative
ollama pull gemma2:9b

CodeLlama-7B-Instruct

7B
CodeLlama
q8_0
10.1 GB / 16 GB VRAM · 63%
38-48 tok/s · 16k context · 📄 Llama 2 Community
💻 coding
ollama pull codellama:7b

Neural-Chat-7B-v3.3

7B
Intel
q8_0
10.1 GB / 16 GB VRAM · 63%
38-48 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research
ollama pull neural-chat

Vicuna-13B

13B
Vicuna
q8_0
16 GB / 16 GB VRAM · 100%
26-36 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull vicuna:13b

Orca-2-13B

13B
Orca
q8_0
16 GB / 16 GB VRAM · 100%
26-36 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull orca2:13b

Gemma-4-E4B

8B
Gemma
q8_0
8.8 GB / 16 GB VRAM · 55%
42-52 tok/s · 131k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · agentic · reasoning · vision
hf download google/gemma-4-E4B-it

Phi-4-mini-instruct

3.8B
Phi
fp16
10.9 GB / 16 GB VRAM · 68%
37-47 tok/s · 128k context · 📄 MIT
💬 chat · 💻 coding · 🔬 research
ollama pull phi4-mini

DeepSeek-R1-Distill-Qwen-7B

7B
DeepSeek
q8_0
10.1 GB / 16 GB VRAM · 63%
40-50 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💻 coding
ollama pull deepseek-r1:7b

OpenChat-3.6-8B

8B
OpenChat
q8_0
11.2 GB / 16 GB VRAM · 70%
37-47 tok/s · 8k context · 📄 Apache-2.0
💬 chat · 🎨 creative
ollama pull openchat:8b

Phi-3-mini-4k-instruct

3.8B
Phi
fp16
10.9 GB / 16 GB VRAM · 68%
38-48 tok/s · 4k context · 📄 MIT
💬 chat · 💻 coding
ollama pull phi3:mini

Gemma-3-4B

4B
Gemma
fp16
16 GB / 16 GB VRAM · 100%
28-38 tok/s · 33k context · 📄 Gemma Terms
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma3:4b

DeepSeek-Coder-6.7B-Instruct

6.7B
DeepSeek
q8_0
9.7 GB / 16 GB VRAM · 61%
42-52 tok/s · 16k context · 📄 DeepSeek License
💻 coding · 💬 chat
ollama pull deepseek-coder:6.7b

InternLM2.5-7B-Chat

7B
InternLM
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull internlm2:7b

WizardLM-2-7B

7B
WizardLM
q8_0
10.1 GB / 16 GB VRAM · 63%
42-52 tok/s · 33k context · 📄 Llama 2 Community
💬 chat · 💻 coding
ollama pull wizardlm2:7b

StarCoder2-15B-Instruct

15B
StarCoder
q5_K_M
13.7 GB / 16 GB VRAM · 86%
34-44 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding · 🔢 math
ollama pull starcoder2:15b

Vicuna-7B

7B
Vicuna
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull vicuna:7b

Orca-2-7B

7B
Orca
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull orca2:7b

Qwen2-VL-7B-Instruct

7B
Qwen
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull Qwen2-VL-7B-Instruct

CodeGemma-7B-Instruct

7B
Gemma
q8_0
10.1 GB / 16 GB VRAM · 63%
43-53 tok/s · 8k context · 📄 Gemma Terms
💻 coding · 💬 chat
ollama pull codegemma:7b

Falcon-7B-Instruct

7B
Falcon
fp16
16 GB / 16 GB VRAM · 100%
31-41 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull falcon:7b

Qwen2.5-3B-Instruct

3B
Qwen
fp16
9.1 GB / 16 GB VRAM · 57%
46-56 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research
ollama pull qwen2.5:3b

Llama-3.2-3B-Instruct

3B
Llama
fp16
9.1 GB / 16 GB VRAM · 57%
47-57 tok/s · 131k context · 📄 Llama 3.2 Community
💬 chat · 🎨 creative
ollama pull llama3.2:3b

CodeLlama-13B-Instruct

13B
CodeLlama
q5_K_M
12.1 GB / 16 GB VRAM · 76%
39-49 tok/s · 16k context · 📄 Llama 2 Community
💻 coding · 🔢 math
ollama pull codellama:13b

Llama-3.2-1B-Instruct

1B
Llama
fp16
2 GB / 16 GB VRAM · 13%
66-76 tok/s · 33k context · 📄 Llama 3.2 Community
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull llama3.2:1b

StarCoder2-7B-Instruct

7B
StarCoder
q8_0
10.1 GB / 16 GB VRAM · 63%
46-56 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding
ollama pull starcoder2:7b

SmolLM2-1.7B-Instruct

1.7B
SmolLM
fp16
4 GB / 16 GB VRAM · 25%
63-73 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull smollm2:1.7b

Gemma-2-2B-Instruct

2B
Gemma
fp16
6.9 GB / 16 GB VRAM · 43%
56-66 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 🎨 creative
ollama pull gemma2:2b

StarCoder2-3B-Instruct

3B
StarCoder
fp16
9.1 GB / 16 GB VRAM · 57%
53-63 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding
ollama pull starcoder2:3b

TinyLlama-1.1B

1.1B
TinyLlama
fp16
2 GB / 16 GB VRAM · 13%
72-82 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull tinyllama

Yi-1.5-9B-Chat

9B
Yi
q8_0
12.4 GB / 16 GB VRAM · 78%
51-61 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research · 🎨 creative
ollama pull yi:9b

Llama-3-8B-Instruct

8B
Llama
q8_0
11.2 GB / 16 GB VRAM · 70%
58-68 tok/s · 8k context · 📄 Llama 3 Community
💬 chat · 🎨 creative
ollama pull llama3:8b
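The matching the page performs can be sketched in a few lines: filter the catalog by use-case tag, then prefer entries with lower VRAM pressure. A minimal sketch with a handful of entries transcribed from the cards above (the tuple fields and `pick` helper are illustrative, not the site's actual code):

```python
# (name, vram_gb, percent of a 16 GB budget, tags) — values copied from
# the listing above for a few representative models.
CATALOG = [
    ("Qwen2.5-Coder-14B-Instruct", 12.9, 81, {"coding", "chat", "math"}),
    ("DeepSeek-Coder-6.7B-Instruct", 9.7, 61, {"coding", "chat"}),
    ("StarCoder2-7B-Instruct", 10.1, 63, {"coding"}),
    ("Llama-3.1-8B-Instruct", 11.2, 70, {"chat", "research", "creative"}),
]

def pick(tag: str, max_pct: int = 90) -> list[str]:
    """Models carrying `tag` under `max_pct` VRAM utilization, least-loaded first."""
    hits = [(pct, name) for name, _, pct, tags in CATALOG if tag in tags and pct <= max_pct]
    return [name for _, name in sorted(hits)]

print(pick("coding"))
# → ['DeepSeek-Coder-6.7B-Instruct', 'StarCoder2-7B-Instruct', 'Qwen2.5-Coder-14B-Instruct']
```

Tightening `max_pct` trades capability for headroom, which is the same knob the percentage column on each card represents.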