Local LLM directory

All local LLM models

Browse the model data used by Machine Fit and LLM Finder: descriptions, parameter counts, MoE active params, context windows, licenses, providers, and quantized memory requirements.

Models: 90
Families: 28
MoE: 13
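The quantized memory figures in this directory can be used to pick a quantization for a given machine. The following is a minimal sketch, not the logic Machine Fit or LLM Finder actually use: the function name `largest_fitting_quant` and the `headroom_gb` parameter are hypothetical, and the numbers are copied from the Qwen3.5-397B-A17B entry.

```python
from typing import Optional

# Quantized memory figures (GB) copied from the Qwen3.5-397B-A17B entry.
QWEN35_397B_QUANTS = {
    "Q2_K": 100.0,
    "Q3_K_M": 130.0,
    "Q4_K_M": 168.0,
    "Q5_K_M": 210.0,
}


def largest_fitting_quant(
    quants: dict[str, float],
    budget_gb: float,
    headroom_gb: float = 0.0,
) -> Optional[str]:
    """Return the quant with the largest listed footprint that fits the budget.

    headroom_gb is a hypothetical reserve for KV cache and runtime overhead;
    the directory does not say whether its figures already include it.
    """
    usable_gb = budget_gb - headroom_gb
    fitting = {name: gb for name, gb in quants.items() if gb <= usable_gb}
    if not fitting:
        return None  # no listed quant fits this machine
    return max(fitting, key=fitting.get)


print(largest_fitting_quant(QWEN35_397B_QUANTS, budget_gb=192))
print(largest_fitting_quant(QWEN35_397B_QUANTS, budget_gb=192, headroom_gb=32))
print(largest_fitting_quant(QWEN35_397B_QUANTS, budget_gb=64))
```

With a 192 GB budget the sketch selects Q4_K_M; reserving 32 GB of headroom drops it to Q3_K_M, and a 64 GB budget fits none of the listed quants.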

Qwen3.5-397B-A17B

397B MoE / 17B active

Qwen3.5-397B-A17B is a mixture-of-experts model with 397B total parameters and 17B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 1M context window, Apache-2.0 license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, math, agentic, vision
Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 98

Quantized memory

Q2_K 100 GB, Q3_K_M 130 GB, Q4_K_M 168 GB, Q5_K_M 210 GB

Providers

llama.cpp, vllm, sglang

MiniMax-M2.5

230B MoE / 10B active

MiniMax-M2.5 is a mixture-of-experts model with 230B total parameters and 10B active parameters in the MiniMax family. ToolHalla tracks it for chat, coding, research, agentic with a 1.05M (1,048,576-token) context window, Apache-2.0 license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, agentic
Family: MiniMax
Context: 1.05M (1,048,576 tokens)
License: Apache-2.0
Quality: 97

Quantized memory

Q2_K 58 GB, Q3_K_M 75 GB, Q3_K_XL 82 GB, Q4_K_M 98 GB

Providers

llama.cpp, vllm, sglang

DeepSeek-R1-671B

671B MoE / 37B active

DeepSeek-R1-671B is a mixture-of-experts model with 671B total parameters and 37B active parameters in the DeepSeek family. ToolHalla tracks it for chat, coding, research, reasoning with a 131k context window, MIT license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, reasoning
Family: DeepSeek
Context: 131k
License: MIT
Quality: 96

Quantized memory

TQ1_0 160 GB, IQ2_XXS 195 GB, Q3_K_M 290 GB, Q4_K_M 380 GB

Providers

llama.cpp, vllm, sglang, ollama

Qwen3.5-122B-A10B

122B MoE / 10B active

Qwen3.5-122B-A10B is a mixture-of-experts model with 122B total parameters and 10B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 1M context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, agentic
Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 96

Quantized memory

Q2_K 31 GB, Q3_K_M 40 GB, Q4_K_M 52 GB, Q8_0 68 GB

Providers

ollama, llama.cpp, vllm, sglang

Kimi-K2.5

1T MoE / 32B active

Kimi-K2.5 is a mixture-of-experts model with 1T total parameters and 32B active parameters in the Kimi family. ToolHalla tracks it for chat, coding, research, agentic with a 131k context window, MIT (modified) license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, agentic, vision
Family: Kimi
Context: 131k
License: MIT (modified)
Quality: 96

Quantized memory

TQ1_0 200 GB, Q2_K_XL 375 GB, Q4_K_S 550 GB

Providers

llama.cpp, vllm, sglang

GLM-5

744B MoE / unknown active

GLM-5 is a mixture-of-experts model with 744B total parameters and an unlisted active-parameter count in the GLM family. ToolHalla tracks it for chat, coding, research, agentic with a 131k context window, MIT license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, agentic, reasoning
Family: GLM
Context: 131k
License: MIT
Quality: 95

Quantized memory

TQ1_0 174 GB, IQ2_XXS 225 GB, Q3_K_M 320 GB, Q4_K_M 420 GB

Providers

llama.cpp, vllm, sglang

Qwen3-235B-A22B

235B MoE / 22B active

Qwen3-235B-A22B is a mixture-of-experts model with 235B total parameters and 22B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, reasoning with a 131k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, reasoning
Family: Qwen
Context: 131k
License: Apache-2.0
Quality: 95

Quantized memory

Q3_K_M 78 GB, Q4_K_M 100 GB, Q5_K_M 125 GB, Q8_0 190 GB

Providers

ollama, llama.cpp, vllm

Llama-3.3-70B-Instruct

70B

Llama-3.3-70B-Instruct is a dense 70B parameter model in the Llama family. ToolHalla tracks it for chat, coding, research, creative with a 131k context window, Llama 3.3 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative, math
Family: Llama
Context: 131k
License: Llama 3.3 Community
Quality: 94

Quantized memory

Q2_K 28.8 GB, Q3_K_M 37.4 GB, Q4_K_M 47.4 GB, Q5_K_M 58.8 GB, Q8_0 84.4 GB

Providers

ollama, llama.cpp, vllm
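
For a dense model, the listed sizes can be turned into a rough effective bits-per-parameter figure, which makes entries of different sizes easier to compare. This is only a sketch under stated assumptions: it takes 1 GB as 10^9 bytes and treats the listed figure as weight storage, although the directory does not say whether its numbers also include runtime overhead. The helper name `effective_bits_per_param` is hypothetical.

```python
def effective_bits_per_param(size_gb: float, params_billion: float) -> float:
    """Convert a listed footprint into bits per parameter.

    Assumes 1 GB = 1e9 bytes, so the 1e9 factors for bytes and for
    parameters cancel and only the bytes-to-bits factor of 8 remains.
    """
    return size_gb * 8 / params_billion


# Figures (GB) copied from the Llama-3.3-70B-Instruct entry; 70 = 70B parameters.
LLAMA_33_70B_QUANTS = {
    "Q2_K": 28.8,
    "Q3_K_M": 37.4,
    "Q4_K_M": 47.4,
    "Q5_K_M": 58.8,
    "Q8_0": 84.4,
}

for name, size_gb in LLAMA_33_70B_QUANTS.items():
    bits = effective_bits_per_param(size_gb, 70)
    print(f"{name}: {bits:.2f} bits/param")
```

Under these assumptions Q4_K_M works out to about 5.4 bits per parameter, above the nominal 4 bits, which suggests the listed figures include some overhead beyond the raw quantized weights.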

Qwen2.5-72B-Instruct

72B

Qwen2.5-72B-Instruct is a dense 72B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, creative
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 93

Quantized memory

Q2_K 29.6 GB, Q3_K_M 38.4 GB, Q4_K_M 48.7 GB, Q5_K_M 60.4 GB, Q8_0 86.8 GB

Providers

ollama, llama.cpp, vllm

Llama-3.1-70B-Instruct

70B

Llama-3.1-70B-Instruct is a dense 70B parameter model in the Llama family. ToolHalla tracks it for chat, research, creative, math with a 131k context window, Llama 3.1 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative, math
Family: Llama
Context: 131k
License: Llama 3.1 Community
Quality: 92

Quantized memory

Q2_K 28.8 GB, Q3_K_M 37.4 GB, Q4_K_M 47.4 GB, Q5_K_M 58.8 GB, Q8_0 84.4 GB

Providers

ollama, llama.cpp, vllm

Nous-Hermes-2-Mixtral-8x7B-DPO

46.7B MoE

Nous-Hermes-2-Mixtral-8x7B-DPO is a mixture-of-experts model with 46.7B total parameters in the Nous family. ToolHalla tracks it for chat, creative, coding with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative, coding
Family: Nous
Context: 33k
License: Apache-2.0
Quality: 92

Quantized memory

Q2_K 19.6 GB, Q3_K_M 25.4 GB, Q4_K_M 32.2 GB, Q5_K_M 39.9 GB, Q8_0 57.3 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-Coder-32B-Instruct

32B

Qwen2.5-Coder-32B-Instruct is a dense 32B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, math, research with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, math, research
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 92

Quantized memory

Q2_K 13.6 GB, Q3_K_M 17.6 GB, Q4_K_M 22.3 GB, Q5_K_M 27.6 GB, Q8_0 39.6 GB

Providers

ollama, llama.cpp, vllm

DeepSeek-R1-Distill-Llama-70B

70B

DeepSeek-R1-Distill-Llama-70B is a dense 70B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, coding, chat with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, coding, chat
Family: DeepSeek
Context: 33k
License: MIT
Quality: 91

Quantized memory

Q2_K 28.8 GB, Q3_K_M 37.4 GB, Q4_K_M 47.4 GB, Q5_K_M 58.8 GB, Q8_0 84.4 GB

Providers

ollama, llama.cpp, vllm

Qwen3-32B

32B

Qwen3-32B is a dense 32B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 91

Quantized memory

Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers

ollama, llama.cpp, vllm

Qwen3.5-27B

27B

Qwen3.5-27B is a dense 27B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 1M context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, agentic
Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 91

Quantized memory

Q2_K 11.3 GB, Q3_K_M 14.6 GB, Q4_K_M 18.4 GB, Q5_K_M 22.5 GB, Q8_0 32.4 GB

Providers

ollama, llama.cpp, vllm

DeepSeek-R1-Distill-Qwen-32B

32B

DeepSeek-R1-Distill-Qwen-32B is a dense 32B parameter model in the DeepSeek family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: DeepSeek
Context: 33k
License: Apache-2.0
Quality: 90

Quantized memory

Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers

ollama, llama.cpp, vllm

Llama-4-Maverick-17B

400B MoE / 17B active

Llama-4-Maverick-17B is a mixture-of-experts model with 400B total parameters and 17B active parameters in the Llama family. ToolHalla tracks it for chat, coding, vision, research with a 1.05M (1,048,576-token) context window, Llama 4 Community license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, vision, research
Family: Llama
Context: 1.05M (1,048,576 tokens)
License: Llama 4 Community
Quality: 89

Quantized memory

Q3_K_M 130 GB, Q4_K_M 170 GB, Q5_K_M 210 GB

Providers

llama.cpp, vllm, sglang

Qwen3.5-35B-A3B

35B MoE / 3B active

Qwen3.5-35B-A3B is a mixture-of-experts model with 35B total parameters and 3B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, agentic, research with a 1M context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, agentic, research, math
Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 89

Quantized memory

Q2_K 14.8 GB, Q3_K_M 19.2 GB, Q4_K_M 24.3 GB, Q5_K_M 30.1 GB, Q8_0 43.1 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-32B-Instruct

32B

Qwen2.5-32B-Instruct is a dense 32B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, creative
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 88

Quantized memory

Q2_K 13.6 GB, Q3_K_M 17.6 GB, Q4_K_M 22.3 GB, Q5_K_M 27.6 GB, Q8_0 39.6 GB

Providers

ollama, llama.cpp, vllm

Gemma-3-27B

27B

Gemma-3-27B is a dense 27B parameter model in the Gemma family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Gemma
Context: 33k
License: Apache-2.0
Quality: 88

Quantized memory

Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers

ollama, llama.cpp, vllm

Phi-4-14B-Instruct

14B

Phi-4-14B-Instruct is a dense 14B parameter model in the Phi family. ToolHalla tracks it for coding, math, research, chat with a 128k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

coding, math, research, chat
Family: Phi
Context: 128k
License: MIT
Quality: 87

Quantized memory

Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers

ollama, llama.cpp, vllm

Mistral-Small-24B-Instruct

24B

Mistral-Small-24B-Instruct is a dense 24B parameter model in the Mistral family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Mistral
Context: 33k
License: Apache-2.0
Quality: 85

Quantized memory

Q2_K 10 GB, Q3_K_M 13 GB, Q4_K_M 16 GB, Q5_K_M 20 GB, Q8_0 28 GB

Providers

ollama, llama.cpp, vllm

InternLM2.5-20B-Chat

20B

InternLM2.5-20B-Chat is a dense 20B parameter model in the InternLM family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: InternLM
Context: 33k
License: Apache-2.0
Quality: 85

Quantized memory

Q2_K 10 GB, Q3_K_M 13 GB, Q4_K_M 16 GB, Q5_K_M 20 GB, Q8_0 28 GB

Providers

ollama, llama.cpp, vllm

Phi-3-medium-128k-instruct

14B

Phi-3-medium-128k-instruct is a dense 14B parameter model in the Phi family. ToolHalla tracks it for coding, chat, research, math with a 131k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, math
Family: Phi
Context: 131k
License: MIT
Quality: 85

Quantized memory

Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers

ollama, llama.cpp, vllm

Mistral-Nemo-12B-Instruct

12B

Mistral-Nemo-12B-Instruct is a dense 12B parameter model in the Mistral family. ToolHalla tracks it for chat, coding, creative with a 128k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, creative
Family: Mistral
Context: 128k
License: Apache-2.0
Quality: 84

Quantized memory

Q2_K 5.6 GB, Q3_K_M 7.2 GB, Q4_K_M 9.1 GB, Q5_K_M 11.2 GB, Q8_0 16 GB

Providers

ollama, llama.cpp, vllm

Qwen3-30B-A3B

30B MoE / 3B active

Qwen3-30B-A3B is a mixture-of-experts model with 30B total parameters and 3B active parameters in the Qwen family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 83

Quantized memory

Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers

ollama, llama.cpp, vllm

Gemma-4-26B-A4B

25.2B MoE / 3.8B active

Gemma-4-26B-A4B is a mixture-of-experts model with 25.2B total parameters and 3.8B active parameters in the Gemma family. ToolHalla tracks it for chat, coding, research, math with a 262k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers.

chat, coding, research, math, agentic, reasoning, vision
Family: Gemma
Context: 262k
License: Apache-2.0
Quality: 83

Quantized memory

Q2_K 6.9 GB, Q3_K_M 10.4 GB, Q4_K_M 13.9 GB, Q5_K_M 17.3 GB, Q8_0 27.7 GB

Providers

huggingface, transformers

Qwen3.5-9B

9B MoE

Qwen3.5-9B is a mixture-of-experts model with 9B total parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 262k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers, vllm.

chat, coding, research, math, agentic, reasoning, vision
Family: Qwen
Context: 262k
License: Apache-2.0
Quality: 83

Quantized memory

Q2_K 2.5 GB, Q3_K_M 3.7 GB, Q4_K_M 5 GB, Q5_K_M 6.2 GB, Q8_0 9.9 GB

Providers

huggingface, transformers, vllm, sglang

DeepSeek-R1-Distill-Llama-8B

8B

DeepSeek-R1-Distill-Llama-8B is a dense 8B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, chat with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, chat
Family: DeepSeek
Context: 33k
License: MIT
Quality: 83

Quantized memory

Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers

ollama, llama.cpp, vllm

Mixtral-8x22B-Instruct

141B MoE

Mixtral-8x22B-Instruct is a mixture-of-experts model with 141B total parameters in the Mistral family. ToolHalla tracks it for chat, coding, research, creative with a 66k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative, math
Family: Mistral
Context: 66k
License: Apache-2.0
Quality: 82

Quantized memory

Q2_K 36.8 GB, Q3_K_M 47.8 GB, Q4_K_M 60.6 GB, Q5_K_M 75.2 GB, Q8_0 108 GB

Providers

ollama, llama.cpp, vllm

Llama-4-Scout-17B

109B MoE / 17B active

Llama-4-Scout-17B is a mixture-of-experts model with 109B total parameters and 17B active parameters in the Llama family. ToolHalla tracks it for chat, coding, vision with a 524k context window, Llama 4 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, vision
Family: Llama
Context: 524k
License: Llama 4 Community
Quality: 82

Quantized memory

Q3_K_M 35 GB, Q4_K_M 45 GB, Q5_K_M 55 GB, Q8_0 85 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-Coder-14B-Instruct

14B

Qwen2.5-Coder-14B-Instruct is a dense 14B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, math
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 82

Quantized memory

Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers

ollama, llama.cpp, vllm

WizardLM-2-8x22B

141B MoE

WizardLM-2-8x22B is a mixture-of-experts model with 141B total parameters in the WizardLM family. ToolHalla tracks it for chat, coding, research, creative with a 66k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative
Family: WizardLM
Context: 66k
License: Llama 2 Community
Quality: 81

Quantized memory

Q2_K 36.8 GB, Q3_K_M 47.8 GB, Q4_K_M 60.6 GB, Q5_K_M 75.2 GB, Q8_0 108 GB

Providers

ollama, llama.cpp, vllm

Gemma-2-27B-Instruct

27B

Gemma-2-27B-Instruct is a dense 27B parameter model in the Gemma family. ToolHalla tracks it for chat, coding, research, creative with an 8k context window, Gemma Terms license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative, math
Family: Gemma
Context: 8k
License: Gemma Terms
Quality: 80

Quantized memory

Q2_K 11.6 GB, Q3_K_M 15 GB, Q4_K_M 19 GB, Q5_K_M 23.5 GB, Q8_0 33.7 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-14B-Instruct

14B

Qwen2.5-14B-Instruct is a dense 14B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 80

Quantized memory

Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers

ollama, llama.cpp, vllm

Gemma-3-12B

12B

Gemma-3-12B is a dense 12B parameter model in the Gemma family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Gemma
Context: 33k
License: Apache-2.0
Quality: 80

Quantized memory

Q2_K 5.2 GB, Q3_K_M 7 GB, Q4_K_M 8.8 GB, Q5_K_M 11 GB, Q8_0 16 GB

Providers

ollama, llama.cpp, vllm

Llama-3.2-11B-Vision-Instruct

11B

Llama-3.2-11B-Vision-Instruct is a dense 11B parameter model in the Llama family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math, vision
Family: Llama
Context: 33k
License: Apache-2.0
Quality: 80

Quantized memory

Q2_K 4.4 GB, Q3_K_M 6 GB, Q4_K_M 7.2 GB, Q5_K_M 9 GB, Q8_0 13 GB

Providers

ollama, llama.cpp, vllm

Qwen3.5-4B

4B MoE

Qwen3.5-4B is a mixture-of-experts model with 4B total parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 262k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers, vllm.

chat, coding, research, math, agentic, reasoning, vision
Family: Qwen
Context: 262k
License: Apache-2.0
Quality: 79

Quantized memory

Q2_K 1.1 GB, Q3_K_M 1.7 GB, Q4_K_M 2.2 GB, Q5_K_M 2.8 GB, Q8_0 4.4 GB

Providers

huggingface, transformers, vllm, sglang

DeepSeek-R1-Distill-Qwen-14B

14B

DeepSeek-R1-Distill-Qwen-14B is a dense 14B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, coding with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, coding
Family: DeepSeek
Context: 33k
License: MIT
Quality: 78

Quantized memory

Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers

ollama, llama.cpp, vllm

Solar-10.7B-Instruct

10.7B

Solar-10.7B-Instruct is a dense 10.7B parameter model in the Solar family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Solar
Context: 33k
License: Apache-2.0
Quality: 78

Quantized memory

Q2_K 4.4 GB, Q3_K_M 6 GB, Q4_K_M 7.2 GB, Q5_K_M 9 GB, Q8_0 13 GB

Providers

ollama, llama.cpp, vllm

DBRX-Instruct

132B MoE

DBRX-Instruct is a mixture-of-experts model with 132B total parameters in the Databricks family. ToolHalla tracks it for chat, coding, research with a 33k context window, Databricks Open license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research
Family: Databricks
Context: 33k
License: Databricks Open
Quality: 77

Quantized memory

Q2_K 34.4 GB, Q3_K_M 44.7 GB, Q4_K_M 56.6 GB, Q5_K_M 70.3 GB, Q8_0 100.9 GB

Providers

ollama, llama.cpp, vllm

Ministral-8B-Instruct

8B

Ministral-8B-Instruct is a dense 8B parameter model in the Mistral family. ToolHalla tracks it for chat, research with a 128k context window, Mistral Research license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research
Family: Mistral
Context: 128k
License: Mistral Research
Quality: 77

Quantized memory

Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers

ollama, llama.cpp, vllm

Mixtral-8x7B-Instruct

46.7B MoE

Mixtral-8x7B-Instruct is a mixture-of-experts model with 46.7B total parameters in the Mistral family. ToolHalla tracks it for chat, coding, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative
Family: Mistral
Context: 33k
License: Apache-2.0
Quality: 76

Quantized memory

Q2_K 19.6 GB, Q3_K_M 25.4 GB, Q4_K_M 32.2 GB, Q5_K_M 39.9 GB, Q8_0 57.3 GB

Providers

ollama, llama.cpp, vllm

Qwen3-8B

8B

Qwen3-8B is a dense 8B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 76

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Command-R-7B

7B

Command-R-7B is a dense 7B parameter model in the Cohere family. ToolHalla tracks it for research, chat, creative with a 128k context window, CC-BY-NC license, and local runtimes such as ollama, llama.cpp, vllm.

research, chat, creative
Family: Cohere
Context: 128k
License: CC-BY-NC
Quality: 76

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Granite-3.1-8B-Instruct

8B

Granite-3.1-8B-Instruct is a dense 8B parameter model in the Granite family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Granite
Context: 33k
License: Apache-2.0
Quality: 75

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Mistral-7B-Instruct-v0.3

7B

Mistral-7B-Instruct-v0.3 is a dense 7B parameter model in the Mistral family. ToolHalla tracks it for chat, research with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research
Family: Mistral
Context: 33k
License: Apache-2.0
Quality: 75

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Dolphin-2.9.2-Qwen2-7B

7B

Dolphin-2.9.2-Qwen2-7B is a dense 7B parameter model in the Dolphin family. ToolHalla tracks it for chat, creative, coding with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative, coding
Family: Dolphin
Context: 33k
License: Apache-2.0
Quality: 75

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

DeepSeek-Coder-33B-Instruct

33B

DeepSeek-Coder-33B-Instruct is a dense 33B parameter model in the DeepSeek family. ToolHalla tracks it for coding, chat, math with a 16k context window, the DeepSeek License, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, math
Family: DeepSeek
Context: 16k
License: DeepSeek License
Quality: 74

Quantized memory

Q2_K 14 GB, Q3_K_M 18.2 GB, Q4_K_M 23 GB, Q5_K_M 28.5 GB, Q8_0 40.7 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-Coder-7B-Instruct

7B

Qwen2.5-Coder-7B-Instruct is a dense 7B parameter model in the Qwen family. ToolHalla tracks it for coding, chat with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 74

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Falcon-40B-Instruct

40B

Falcon-40B-Instruct is a dense 40B parameter model in the Falcon family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Falcon
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory

Q2_K 17 GB, Q3_K_M 22 GB, Q4_K_M 28 GB, Q5_K_M 34 GB, Q8_0 48 GB

Providers

ollama, llama.cpp, vllm

GLM-4.7-9B-Chat

9B

GLM-4.7-9B-Chat is a dense 9B parameter model in the GLM family. ToolHalla tracks it for chat, coding with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding
Family: GLM
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory

Q4_K_M 6.2 GB, Q5_K_M 7.4 GB, Q8_0 10.5 GB, FP16 18.8 GB

Providers

ollama, llama.cpp, vllm

Llama-3.1-8B-Instruct

8B

Llama-3.1-8B-Instruct is a dense 8B parameter model in the Llama family. ToolHalla tracks it for chat, research, creative with a 131k context window, Llama 3.1 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative
Family: Llama
Context: 131k
License: Llama 3.1 Community
Quality: 72

Quantized memory

Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-7B-Instruct

7B

Qwen2.5-7B-Instruct is a dense 7B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

OpenHermes-2.5-Mistral-7B

7B

OpenHermes-2.5-Mistral-7B is a dense 7B parameter model in the Nous family. ToolHalla tracks it for chat, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative
Family: Nous
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Gemma-2-9B-Instruct

9B

Gemma-2-9B-Instruct is a dense 9B parameter model in the Gemma family. ToolHalla tracks it for chat, research, creative with an 8k context window, Gemma Terms license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative
Family: Gemma
Context: 8k
License: Gemma Terms
Quality: 71

Quantized memory

Q2_K 4.4 GB, Q3_K_M 5.7 GB, Q4_K_M 7.1 GB, Q5_K_M 8.8 GB, Q8_0 12.4 GB

Providers

ollama, llama.cpp, vllm

Zephyr-7B-beta

7B

Zephyr-7B-beta is a dense 7B parameter model in the Zephyr family. ToolHalla tracks it for chat, creative with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative
Family: Zephyr
Context: 33k
License: MIT
Quality: 71

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Vicuna-13B

13B

Vicuna-13B is a dense 13B parameter model in the Vicuna family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Vicuna
Context: 33k
License: Apache-2.0
Quality: 70

Quantized memory

Q2_K 5.2 GB, Q3_K_M 7 GB, Q4_K_M 8.8 GB, Q5_K_M 11 GB, Q8_0 16 GB

Providers

ollama, llama.cpp, vllm

Orca-2-13B

13B

Orca-2-13B is a dense 13B parameter model in the Orca family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Orca
Context: 33k
License: Apache-2.0
Quality: 70

Quantized memory

Q2_K 5.2 GB, Q3_K_M 7 GB, Q4_K_M 8.8 GB, Q5_K_M 11 GB, Q8_0 16 GB

Providers

ollama, llama.cpp, vllm

CodeLlama-7B-Instruct

7B

CodeLlama-7B-Instruct is a dense 7B parameter model in the CodeLlama family. ToolHalla tracks it for coding with a 16k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

coding
Family: CodeLlama
Context: 16k
License: Llama 2 Community
Quality: 70

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Neural-Chat-7B-v3.3

7B

Neural-Chat-7B-v3.3 is a dense 7B parameter model in the Intel family. ToolHalla tracks it for chat, research with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research
Family: Intel
Context: 33k
License: Apache-2.0
Quality: 70

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Gemma-4-E4B

8B

Gemma-4-E4B is a dense 8B parameter model in the Gemma family. ToolHalla tracks it for chat, coding, research, agentic with a 131k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers.

chat, coding, research, agentic, reasoning, vision
Family: Gemma
Context: 131k
License: Apache-2.0
Quality: 69

Quantized memory

Q2_K 2.2 GB, Q3_K_M 3.3 GB, Q4_K_M 4.4 GB, Q5_K_M 5.5 GB, Q8_0 8.8 GB

Providers

huggingface, transformers

Phi-4-mini-instruct

3.8B

Phi-4-mini-instruct is a dense 3.8B parameter model in the Phi family. ToolHalla tracks it for chat, coding, research with a 128k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research
Family: Phi
Context: 128k
License: MIT
Quality: 69

Quantized memory

Q2_K 2.3 GB, Q3_K_M 3 GB, Q4_K_M 3.7 GB, Q5_K_M 4.5 GB, Q8_0 6.3 GB

Providers

ollama, llama.cpp, vllm

Command-R-35B

35B

Command-R-35B is a dense 35B parameter model in the Cohere family. ToolHalla tracks it for research, chat, coding with a 128k context window, CC-BY-NC license, and local runtimes such as ollama, llama.cpp, vllm.

research, chat, coding
Family: Cohere
Context: 128k
License: CC-BY-NC
Quality: 68

Quantized memory

Q2_K 14.8 GB, Q3_K_M 19.2 GB, Q4_K_M 24.3 GB, Q5_K_M 30.1 GB, Q8_0 43.1 GB

Providers

ollama, llama.cpp, vllm

CodeLlama-34B-Instruct

34B

CodeLlama-34B-Instruct is a dense 34B parameter model in the CodeLlama family. ToolHalla tracks it for coding, math with a 16k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

coding, math
Family: CodeLlama
Context: 16k
License: Llama 2 Community
Quality: 67

Quantized memory

Q2_K 14.4 GB, Q3_K_M 18.7 GB, Q4_K_M 23.6 GB, Q5_K_M 29.3 GB, Q8_0 41.9 GB

Providers

ollama, llama.cpp, vllm

OpenChat-3.6-8B

8B

OpenChat-3.6-8B is a dense 8B parameter model in the OpenChat family. ToolHalla tracks it for chat, creative with an 8k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative
Family: OpenChat
Context: 8k
License: Apache-2.0
Quality: 66

Quantized memory

Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers

ollama, llama.cpp, vllm

DeepSeek-R1-Distill-Qwen-7B

7B

DeepSeek-R1-Distill-Qwen-7B is a dense 7B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, coding with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, coding
Family: DeepSeek
Context: 33k
License: MIT
Quality: 66

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Gemma-3-4B

4B

Gemma-3-4B is a dense 4B parameter model in the Gemma family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: Gemma
Context: 33k
License: Apache-2.0
Quality: 65

Quantized memory

Q2_K 2.8 GB, Q3_K_M 3.8 GB, Q4_K_M 4.8 GB, Q5_K_M 6 GB, Q8_0 9 GB

Providers

ollama, llama.cpp, vllm

Phi-3-mini-4k-instruct

3.8B

Phi-3-mini-4k-instruct is a dense 3.8B parameter model in the Phi family. ToolHalla tracks it for chat, coding with a 4k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding
Family: Phi
Context: 4k
License: MIT
Quality: 65

Quantized memory

Q2_K 2.3 GB, Q3_K_M 3 GB, Q4_K_M 3.7 GB, Q5_K_M 4.5 GB, Q8_0 6.3 GB

Providers

ollama, llama.cpp, vllm

InternLM2.5-7B-Chat

7B

InternLM2.5-7B-Chat is a dense 7B parameter model in the InternLM family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math
Family: InternLM
Context: 33k
License: Apache-2.0
Quality: 62

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

DeepSeek-Coder-6.7B-Instruct

6.7B

DeepSeek-Coder-6.7B-Instruct is a dense 6.7B parameter model in the DeepSeek family. ToolHalla tracks it for coding, chat with a 16k context window, the DeepSeek License, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat
Family: DeepSeek
Context: 16k
License: DeepSeek License
Quality: 62

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.6 GB, Q5_K_M 6.9 GB, Q8_0 9.7 GB

Providers

ollama, llama.cpp, vllm

StarCoder2-15B-Instruct

15B

StarCoder2-15B-Instruct is a dense 15B-parameter model in the StarCoder family. ToolHalla tracks it for coding and math, with a 16k context window, an OpenRAIL-M license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, math
Family: StarCoder
Context: 16k
License: OpenRAIL-M
Quality: 60

Quantized memory

Q2_K 6.8 GB, Q3_K_M 8.8 GB, Q4_K_M 11.1 GB, Q5_K_M 13.7 GB, Q8_0 19.5 GB

Providers

ollama, llama.cpp, vllm

WizardLM-2-7B

7B

WizardLM-2-7B is a dense 7B-parameter model in the WizardLM family. ToolHalla tracks it for chat and coding, with a 33k context window, the Llama 2 Community license, and local runtimes such as ollama, llama.cpp, and vllm.

chat, coding
Family: WizardLM
Context: 33k
License: Llama 2 Community
Quality: 60

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Vicuna-7B

7B

Vicuna-7B is a dense 7B-parameter model in the Vicuna family. ToolHalla tracks it for coding, chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat, research, creative, math
Family: Vicuna
Context: 33k
License: Apache-2.0
Quality: 60

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Orca-2-7B

7B

Orca-2-7B is a dense 7B-parameter model in the Orca family. ToolHalla tracks it for coding, chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat, research, creative, math
Family: Orca
Context: 33k
License: Apache-2.0
Quality: 60

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Qwen2-VL-7B-Instruct

7B

Qwen2-VL-7B-Instruct is a dense 7B-parameter model in the Qwen family. ToolHalla tracks it for coding, chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat, research, creative, math
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 60

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

CodeGemma-7B-Instruct

7B

CodeGemma-7B-Instruct is a dense 7B-parameter model in the Gemma family. ToolHalla tracks it for coding and chat, with an 8k context window, the Gemma Terms license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat
Family: Gemma
Context: 8k
License: Gemma Terms
Quality: 58

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Falcon-7B-Instruct

7B

Falcon-7B-Instruct is a dense 7B-parameter model in the Falcon family. ToolHalla tracks it for coding, chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat, research, creative, math
Family: Falcon
Context: 33k
License: Apache-2.0
Quality: 58

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-3B-Instruct

3B

Qwen2.5-3B-Instruct is a dense 3B-parameter model in the Qwen family. ToolHalla tracks it for chat, coding, and research, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

chat, coding, research
Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 56

Quantized memory

Q2_K 2 GB, Q3_K_M 2.6 GB, Q4_K_M 3.2 GB, Q5_K_M 3.9 GB, Q8_0 5.3 GB

Providers

ollama, llama.cpp, vllm

CodeLlama-13B-Instruct

13B

CodeLlama-13B-Instruct is a dense 13B-parameter model in the CodeLlama family. ToolHalla tracks it for coding and math, with a 16k context window, the Llama 2 Community license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, math
Family: CodeLlama
Context: 16k
License: Llama 2 Community
Quality: 55

Quantized memory

Q2_K 6 GB, Q3_K_M 7.8 GB, Q4_K_M 9.8 GB, Q5_K_M 12.1 GB, Q8_0 17.1 GB

Providers

ollama, llama.cpp, vllm

Llama-3.2-3B-Instruct

3B

Llama-3.2-3B-Instruct is a dense 3B-parameter model in the Llama family. ToolHalla tracks it for chat and creative work, with a 131k context window, the Llama 3.2 Community license, and local runtimes such as ollama, llama.cpp, and vllm.

chat, creative
Family: Llama
Context: 131k
License: Llama 3.2 Community
Quality: 55

Quantized memory

Q2_K 2 GB, Q3_K_M 2.6 GB, Q4_K_M 3.2 GB, Q5_K_M 3.9 GB, Q8_0 5.3 GB

Providers

ollama, llama.cpp, vllm

StarCoder2-7B-Instruct

7B

StarCoder2-7B-Instruct is a dense 7B-parameter model in the StarCoder family. ToolHalla tracks it for coding, with a 16k context window, an OpenRAIL-M license, and local runtimes such as ollama, llama.cpp, and vllm.

coding
Family: StarCoder
Context: 16k
License: OpenRAIL-M
Quality: 50

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Llama-3.2-1B-Instruct

1B

Llama-3.2-1B-Instruct is a dense 1B-parameter model in the Llama family. ToolHalla tracks it for coding, chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat, research, creative, math
Family: Llama
Context: 33k
License: Apache-2.0
Quality: 50

Quantized memory

Q2_K 0.5 GB, Q3_K_M 0.6 GB, Q4_K_M 0.7 GB, Q5_K_M 0.9 GB, Q8_0 1.2 GB

Providers

ollama, llama.cpp, vllm

Gemma-2-2B-Instruct

2B

Gemma-2-2B-Instruct is a dense 2B-parameter model in the Gemma family. ToolHalla tracks it for chat and creative work, with an 8k context window, the Gemma Terms license, and local runtimes such as ollama, llama.cpp, and vllm.

chat, creative
Family: Gemma
Context: 8k
License: Gemma Terms
Quality: 45

Quantized memory

Q2_K 1.6 GB, Q3_K_M 2 GB, Q4_K_M 2.5 GB, Q5_K_M 3 GB, Q8_0 4.2 GB

Providers

ollama, llama.cpp, vllm

SmolLM2-1.7B-Instruct

1.7B

SmolLM2-1.7B-Instruct is a dense 1.7B-parameter model in the SmolLM family. ToolHalla tracks it for coding, chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat, research, creative, math
Family: SmolLM
Context: 33k
License: Apache-2.0
Quality: 45

Quantized memory

Q2_K 1 GB, Q3_K_M 1.3 GB, Q4_K_M 1.6 GB, Q5_K_M 2 GB, Q8_0 2.5 GB

Providers

ollama, llama.cpp, vllm

StarCoder2-3B-Instruct

3B

StarCoder2-3B-Instruct is a dense 3B-parameter model in the StarCoder family. ToolHalla tracks it for coding, with a 16k context window, an OpenRAIL-M license, and local runtimes such as ollama, llama.cpp, and vllm.

coding
Family: StarCoder
Context: 16k
License: OpenRAIL-M
Quality: 40

Quantized memory

Q2_K 2 GB, Q3_K_M 2.6 GB, Q4_K_M 3.2 GB, Q5_K_M 3.9 GB, Q8_0 5.3 GB

Providers

ollama, llama.cpp, vllm

TinyLlama-1.1B

1.1B

TinyLlama-1.1B is a dense 1.1B-parameter model in the TinyLlama family. ToolHalla tracks it for coding, chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

coding, chat, research, creative, math
Family: TinyLlama
Context: 33k
License: Apache-2.0
Quality: 35

Quantized memory

Q2_K 0.5 GB, Q3_K_M 0.6 GB, Q4_K_M 0.7 GB, Q5_K_M 0.9 GB, Q8_0 1.2 GB

Providers

ollama, llama.cpp, vllm

Yi-1.5-34B-Chat

34B

Yi-1.5-34B-Chat is a dense 34B-parameter model in the Yi family. ToolHalla tracks it for chat, research, creative work, and math, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

chat, research, creative, math
Family: Yi
Context: 33k
License: Apache-2.0
Quality: 30

Quantized memory

Q2_K 14.4 GB, Q3_K_M 18.7 GB, Q4_K_M 23.6 GB, Q5_K_M 29.3 GB, Q8_0 41.9 GB

Providers

ollama, llama.cpp, vllm

Yi-1.5-9B-Chat

9B

Yi-1.5-9B-Chat is a dense 9B-parameter model in the Yi family. ToolHalla tracks it for chat, research, and creative work, with a 33k context window, an Apache-2.0 license, and local runtimes such as ollama, llama.cpp, and vllm.

chat, research, creative
Family: Yi
Context: 33k
License: Apache-2.0
Quality: 25

Quantized memory

Q2_K 4.4 GB, Q3_K_M 5.7 GB, Q4_K_M 7.1 GB, Q5_K_M 8.8 GB, Q8_0 12.4 GB

Providers

ollama, llama.cpp, vllm

Llama-3-8B-Instruct

8B

Llama-3-8B-Instruct is a dense 8B-parameter model in the Llama family. ToolHalla tracks it for chat and creative work, with an 8k context window, the Llama 3 Community license, and local runtimes such as ollama, llama.cpp, and vllm.

chat, creative
Family: Llama
Context: 8k
License: Llama 3 Community
Quality: 14

Quantized memory

Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers

ollama, llama.cpp, vllm
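
The Quality score and Q4_K_M memory fields can be combined to shortlist models for a fixed memory budget. The sketch below is an illustration under stated assumptions, not how LLM Finder actually ranks models: it uses five entries copied from this listing, and the `ModelEntry` type and `shortlist` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelEntry:
    name: str
    quality: int       # "Quality" field from the listing
    q4_k_m_gb: float   # "Q4_K_M" quantized memory from the listing


# A few entries copied from the directory above.
CATALOG: List[ModelEntry] = [
    ModelEntry("StarCoder2-3B-Instruct", 40, 3.2),
    ModelEntry("TinyLlama-1.1B", 35, 0.7),
    ModelEntry("Yi-1.5-34B-Chat", 30, 23.6),
    ModelEntry("Yi-1.5-9B-Chat", 25, 7.1),
    ModelEntry("Llama-3-8B-Instruct", 14, 6.5),
]


def shortlist(catalog: List[ModelEntry], budget_gb: float) -> List[str]:
    """Names of models whose Q4_K_M footprint fits, highest quality first."""
    fitting = [m for m in catalog if m.q4_k_m_gb <= budget_gb]
    ranked = sorted(fitting, key=lambda m: m.quality, reverse=True)
    return [m.name for m in ranked]


if __name__ == "__main__":
    for name in shortlist(CATALOG, budget_gb=8.0):
        print(name)
```

At an 8 GB budget, Yi-1.5-34B-Chat (23.6 GB at Q4_K_M) drops out and the remaining four are returned in descending Quality order. The budget here is compared against the listed footprint only, so it leaves no allowance for context or runtime overhead.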