LLM Finder

Match local AI models to your GPU.

Results below assume a GPU with 16 GB of VRAM.
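The VRAM figures on the cards below can be roughly reproduced from parameter count and quantization level. A minimal sketch, assuming approximate effective bits-per-weight for llama.cpp-style quants and a flat overhead for KV cache and runtime buffers (both are illustrative assumptions, not the exact numbers any loader reports):

```python
# Rough VRAM estimate: weight bytes plus a flat overhead for KV cache
# and runtime buffers. Bits-per-weight values are approximations.
BITS_PER_WEIGHT = {
    "q3_K_M": 3.9,
    "q4_K_M": 4.8,
    "q5_K_M": 5.7,
    "q8_0":   8.5,
    "fp16":  16.0,
}

def vram_estimate_gb(params_billion: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Estimated VRAM footprint in GB for weights plus fixed overhead."""
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

def fits(params_billion: float, quant: str, vram_gb: float = 16.0) -> bool:
    """Does the estimate fit in the given VRAM budget?"""
    return vram_estimate_gb(params_billion, quant) <= vram_gb

print(vram_estimate_gb(27, "q3_K_M"))  # → 14.7, near the 14.6 GB listed for a 27B q3_K_M model
print(fits(14, "q5_K_M"))              # → True
```

The estimate lands within a few percent of the listed sizes; real usage grows with the context length you actually allocate.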

Qwen3.5-27B

27B
Qwen
q3_K_M
14.6 GB / 16 GB VRAM · 91%
19-29 tok/s · 1.0M context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic
ollama pull qwen3.5:27b
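Once a model is pulled, it can be queried through Ollama's local HTTP API. A minimal sketch, assuming a running Ollama server on the default port; the model tag is taken from the card above:

```python
import json

def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

if __name__ == "__main__":
    # Requires a running Ollama server and the tag pulled above.
    import urllib.request
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(chat_payload("qwen3.5:27b", "Hello!")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False the reply arrives as one JSON object.
        print(json.load(resp)["message"]["content"])
```

With `stream=True` the endpoint instead emits one JSON object per generated chunk.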

Phi-4-14B-Instruct

14B
Phi
q5_K_M
12.9 GB / 16 GB VRAM · 81%
25-35 tok/s · 128k context · 📄 MIT
💻 coding · 🔢 math · 🔬 research · 💬 chat
ollama pull phi4:14b

Phi-3-medium-128k-instruct

14B
Phi
q5_K_M
12.9 GB / 16 GB VRAM · 81%
25-35 tok/s · 131k context · 📄 MIT
💻 coding · 💬 chat · 🔬 research · 🔢 math
ollama pull phi3:medium

Mistral-Small-24B-Instruct

24B
Mistral
q4_K_M
16 GB / 16 GB VRAM · 100%
20-30 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull mistral-small:24b

InternLM2.5-20B-Chat

20B
InternLM
q4_K_M
16 GB / 16 GB VRAM · 100%
20-30 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull internlm2:20b

Mistral-Nemo-12B-Instruct

12B
Mistral
q8_0
16 GB / 16 GB VRAM · 100%
21-31 tok/s · 128k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🎨 creative
ollama pull mistral-nemo:12b

Qwen3.5-9B

9B
Qwen
q8_0
9.9 GB / 16 GB VRAM · 62%
33-43 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download Qwen/Qwen3.5-9B

DeepSeek-R1-Distill-Llama-8B

8B
DeepSeek
q8_0
11.2 GB / 16 GB VRAM · 70%
30-40 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💬 chat
ollama pull deepseek-r1:8b

Gemma-4-26B-A4B

25.2B
Gemma · MoE · 3.8B active
q4_K_M
13.9 GB / 16 GB VRAM · 87%
24-34 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download google/gemma-4-26B-A4B-it

Qwen2.5-Coder-14B-Instruct

14B
Qwen
q5_K_M
12.9 GB / 16 GB VRAM · 81%
27-37 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔢 math
ollama pull qwen2.5-coder:14b

Qwen2.5-14B-Instruct

14B
Qwen
q5_K_M
12.9 GB / 16 GB VRAM · 81%
27-37 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math
ollama pull qwen2.5:14b

Llama-3.2-11B-Vision-Instruct

11B
Llama
q8_0
13 GB / 16 GB VRAM · 81%
27-37 tok/s · 33k context · 📄 Llama 3.2 Community
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math · vision
ollama pull llama3.2-vision:11b

Gemma-2-27B-Instruct

27B
Gemma
q3_K_M
15 GB / 16 GB VRAM · 94%
22-32 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 💻 coding · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma2:27b

Gemma-3-12B

12B
Gemma
q8_0
16 GB / 16 GB VRAM · 100%
22-32 tok/s · 33k context · 📄 Gemma Terms
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma3:12b

Qwen3.5-4B

4B
Qwen
fp16
8.8 GB / 16 GB VRAM · 55%
38-48 tok/s · 262k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math · agentic · reasoning · vision
hf download Qwen/Qwen3.5-4B

DeepSeek-R1-Distill-Qwen-14B

14B
DeepSeek
q5_K_M
12.9 GB / 16 GB VRAM · 81%
28-38 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💻 coding
ollama pull deepseek-r1:14b

Solar-10.7B-Instruct

10.7B
Solar
q8_0
13 GB / 16 GB VRAM · 81%
28-38 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull solar:10.7b

Ministral-8B-Instruct

8B
Mistral
q8_0
11.2 GB / 16 GB VRAM · 70%
33-43 tok/s · 128k context · 📄 Mistral Research
💬 chat · 🔬 research
ollama pull ministral:8b

Command-R-7B

7B
Cohere
q8_0
10.1 GB / 16 GB VRAM · 63%
36-46 tok/s · 128k context · 📄 CC-BY-NC
🔬 research · 💬 chat · 🎨 creative
ollama pull command-r7b

Qwen3-8B

8B
Qwen
fp16
16 GB / 16 GB VRAM · 100%
24-34 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull qwen3:8b

Mistral-7B-Instruct-v0.3

7B
Mistral
q8_0
10.1 GB / 16 GB VRAM · 63%
36-46 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research
ollama pull mistral:7b

Dolphin-2.9.2-Qwen2-7B

7B
Dolphin
q8_0
10.1 GB / 16 GB VRAM · 63%
36-46 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🎨 creative · 💻 coding
ollama pull dolphin3:8b

Granite-3.1-8B-Instruct

8B
Granite
fp16
16 GB / 16 GB VRAM · 100%
24-34 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull granite3.1-dense:8b

Qwen2.5-Coder-7B-Instruct

7B
Qwen
q8_0
10.1 GB / 16 GB VRAM · 63%
37-47 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat
ollama pull qwen2.5-coder:7b

Qwen2.5-7B-Instruct

7B
Qwen
q8_0
10.1 GB / 16 GB VRAM · 63%
37-47 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · 🔢 math
ollama pull qwen2.5:7b

OpenHermes-2.5-Mistral-7B

7B
Nous
q8_0
10.1 GB / 16 GB VRAM · 63%
37-47 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🎨 creative
ollama pull openhermes

GLM-4.7-9B-Chat

9B
GLM
q8_0
10.5 GB / 16 GB VRAM · 66%
36-46 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding
ollama pull glm4:9b

Llama-3.1-8B-Instruct

8B
Llama
q8_0
11.2 GB / 16 GB VRAM · 70%
35-45 tok/s · 131k context · 📄 Llama 3.1 Community
💬 chat · 🔬 research · 🎨 creative
ollama pull llama3.1:8b

Zephyr-7B-beta

7B
Zephyr
q8_0
10.1 GB / 16 GB VRAM · 63%
38-48 tok/s · 33k context · 📄 MIT
💬 chat · 🎨 creative
ollama pull zephyr:7b

Gemma-2-9B-Instruct

9B
Gemma
q8_0
12.4 GB / 16 GB VRAM · 78%
32-42 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 🔬 research · 🎨 creative
ollama pull gemma2:9b

CodeLlama-7B-Instruct

7B
CodeLlama
q8_0
10.1 GB / 16 GB VRAM · 63%
38-48 tok/s · 16k context · 📄 Llama 2 Community
💻 coding
ollama pull codellama:7b

Neural-Chat-7B-v3.3

7B
Intel
q8_0
10.1 GB / 16 GB VRAM · 63%
38-48 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research
ollama pull neural-chat

Vicuna-13B

13B
Vicuna
q8_0
16 GB / 16 GB VRAM · 100%
26-36 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull vicuna:13b

Orca-2-13B

13B
Orca
q8_0
16 GB / 16 GB VRAM · 100%
26-36 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull orca2:13b

Gemma-4-E4B

8B
Gemma
q8_0
8.8 GB / 16 GB VRAM · 55%
42-52 tok/s · 131k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research · agentic · reasoning · vision
hf download google/gemma-4-E4B-it

Phi-4-mini-instruct

3.8B
Phi
fp16
10.9 GB / 16 GB VRAM · 68%
37-47 tok/s · 128k context · 📄 MIT
💬 chat · 💻 coding · 🔬 research
ollama pull phi4-mini

DeepSeek-R1-Distill-Qwen-7B

7B
DeepSeek
q8_0
10.1 GB / 16 GB VRAM · 63%
40-50 tok/s · 33k context · 📄 MIT
🔢 math · 🔬 research · 💻 coding
ollama pull deepseek-r1:7b

OpenChat-3.6-8B

8B
OpenChat
q8_0
11.2 GB / 16 GB VRAM · 70%
37-47 tok/s · 8k context · 📄 Apache-2.0
💬 chat · 🎨 creative
ollama pull openchat:8b

Phi-3-mini-4k-instruct

3.8B
Phi
fp16
10.9 GB / 16 GB VRAM · 68%
38-48 tok/s · 4k context · 📄 MIT
💬 chat · 💻 coding
ollama pull phi3:mini

Gemma-3-4B

4B
Gemma
fp16
16 GB / 16 GB VRAM · 100%
28-38 tok/s · 33k context · 📄 Gemma Terms
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull gemma3:4b

DeepSeek-Coder-6.7B-Instruct

6.7B
DeepSeek
q8_0
9.7 GB / 16 GB VRAM · 61%
42-52 tok/s · 16k context · 📄 DeepSeek License
💻 coding · 💬 chat
ollama pull deepseek-coder:6.7b

InternLM2.5-7B-Chat

7B
InternLM
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull internlm2:7b

WizardLM-2-7B

7B
WizardLM
q8_0
10.1 GB / 16 GB VRAM · 63%
42-52 tok/s · 33k context · 📄 Llama 2 Community
💬 chat · 💻 coding
ollama pull wizardlm2:7b

StarCoder2-15B-Instruct

15B
StarCoder
q5_K_M
13.7 GB / 16 GB VRAM · 86%
34-44 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding · 🔢 math
ollama pull starcoder2:15b

Vicuna-7B

7B
Vicuna
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull vicuna:7b

Orca-2-7B

7B
Orca
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull orca2:7b

Qwen2-VL-7B-Instruct

7B
Qwen
fp16
16 GB / 16 GB VRAM · 100%
30-40 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull Qwen2-VL-7B-Instruct

CodeGemma-7B-Instruct

7B
Gemma
q8_0
10.1 GB / 16 GB VRAM · 63%
43-53 tok/s · 8k context · 📄 Gemma Terms
💻 coding · 💬 chat
ollama pull codegemma:7b

Falcon-7B-Instruct

7B
Falcon
fp16
16 GB / 16 GB VRAM · 100%
31-41 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull falcon:7b

Qwen2.5-3B-Instruct

3B
Qwen
fp16
9.1 GB / 16 GB VRAM · 57%
46-56 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 💻 coding · 🔬 research
ollama pull qwen2.5:3b

Llama-3.2-3B-Instruct

3B
Llama
fp16
9.1 GB / 16 GB VRAM · 57%
47-57 tok/s · 131k context · 📄 Llama 3.2 Community
💬 chat · 🎨 creative
ollama pull llama3.2:3b

CodeLlama-13B-Instruct

13B
CodeLlama
q5_K_M
12.1 GB / 16 GB VRAM · 76%
39-49 tok/s · 16k context · 📄 Llama 2 Community
💻 coding · 🔢 math
ollama pull codellama:13b

Llama-3.2-1B-Instruct

1B
Llama
fp16
2 GB / 16 GB VRAM · 13%
66-76 tok/s · 33k context · 📄 Llama 3.2 Community
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull llama3.2:1b

StarCoder2-7B-Instruct

7B
StarCoder
q8_0
10.1 GB / 16 GB VRAM · 63%
46-56 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding
ollama pull starcoder2:7b

SmolLM2-1.7B-Instruct

1.7B
SmolLM
fp16
4 GB / 16 GB VRAM · 25%
63-73 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull smollm2:1.7b

Gemma-2-2B-Instruct

2B
Gemma
fp16
6.9 GB / 16 GB VRAM · 43%
56-66 tok/s · 8k context · 📄 Gemma Terms
💬 chat · 🎨 creative
ollama pull gemma2:2b

StarCoder2-3B-Instruct

3B
StarCoder
fp16
9.1 GB / 16 GB VRAM · 57%
53-63 tok/s · 16k context · 📄 OpenRAIL-M
💻 coding
ollama pull starcoder2:3b

TinyLlama-1.1B

1.1B
TinyLlama
fp16
2 GB / 16 GB VRAM · 13%
72-82 tok/s · 33k context · 📄 Apache-2.0
💻 coding · 💬 chat · 🔬 research · 🎨 creative · 🔢 math
ollama pull tinyllama

Yi-1.5-9B-Chat

9B
Yi
q8_0
12.4 GB / 16 GB VRAM · 78%
51-61 tok/s · 33k context · 📄 Apache-2.0
💬 chat · 🔬 research · 🎨 creative
ollama pull yi:9b

Llama-3-8B-Instruct

8B
Llama
q8_0
11.2 GB / 16 GB VRAM · 70%
58-68 tok/s · 8k context · 📄 Llama 3 Community
💬 chat · 🎨 creative
ollama pull llama3:8b
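The matching the page performs can be sketched in a few lines: filter the catalog by use-case tag, then prefer entries with lower VRAM pressure. A minimal sketch with a handful of entries transcribed from the cards above (the tuple fields and `pick` helper are illustrative, not the site's actual code):

```python
# (name, vram_gb, percent of a 16 GB budget, tags) — values copied from
# the listing above for a few representative models.
CATALOG = [
    ("Qwen2.5-Coder-14B-Instruct", 12.9, 81, {"coding", "chat", "math"}),
    ("DeepSeek-Coder-6.7B-Instruct", 9.7, 61, {"coding", "chat"}),
    ("StarCoder2-7B-Instruct", 10.1, 63, {"coding"}),
    ("Llama-3.1-8B-Instruct", 11.2, 70, {"chat", "research", "creative"}),
]

def pick(tag: str, max_pct: int = 90) -> list[str]:
    """Models carrying `tag` under `max_pct` VRAM utilization, least-loaded first."""
    hits = [(pct, name) for name, _, pct, tags in CATALOG if tag in tags and pct <= max_pct]
    return [name for _, name in sorted(hits)]

print(pick("coding"))
# → ['DeepSeek-Coder-6.7B-Instruct', 'StarCoder2-7B-Instruct', 'Qwen2.5-Coder-14B-Instruct']
```

Tightening `max_pct` trades capability for headroom, which is the same knob the percentage column on each card represents.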