Local LLM directory

All local LLM models

Browse the model data used by Machine Fit and LLM Finder: descriptions, parameter counts, MoE active params, context windows, licenses, providers, and quantized memory requirements.
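As a rough illustration of how the quantized memory figures in this directory can be read, the sketch below picks the largest quantization that fits a given memory budget. It is not ToolHalla's Machine Fit logic: the function name `largest_fitting_quant` and the 2 GB headroom default are assumptions made for this example, and only the figures come from the Qwen3.5-27B entry below.

```python
from typing import Dict, Optional

# Quantized memory figures (GB) copied from the Qwen3.5-27B entry in this directory.
QWEN35_27B_GB: Dict[str, float] = {
    "Q2_K": 11.3,
    "Q3_K_M": 14.6,
    "Q4_K_M": 18.4,
    "Q5_K_M": 22.5,
    "Q8_0": 32.4,
}


def largest_fitting_quant(
    quant_gb: Dict[str, float],
    available_gb: float,
    headroom_gb: float = 2.0,  # assumed reserve for KV cache and runtime overhead
) -> Optional[str]:
    """Return the quantization with the largest listed size that still fits, or None."""
    budget = available_gb - headroom_gb
    fitting = {name: gb for name, gb in quant_gb.items() if gb <= budget}
    if not fitting:
        return None
    # A larger listed size means a less aggressive quantization, so prefer the biggest fit.
    return max(fitting, key=fitting.get)


print(largest_fitting_quant(QWEN35_27B_GB, available_gb=24.0))
print(largest_fitting_quant(QWEN35_27B_GB, available_gb=8.0))
```

Real memory use also grows with context length and batch size, so the listed sizes are best treated as a lower bound rather than a guarantee of fit.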

Qwen3.5-397B-A17B

397B MoE / 17B active

Qwen3.5-397B-A17B is a mixture-of-experts model with 397B total parameters and 17B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 1M context window, Apache-2.0 license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, math, agentic, vision

Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 98

Quantized memory: Q2_K 100 GB, Q3_K_M 130 GB, Q4_K_M 168 GB, Q5_K_M 210 GB

Providers: llama.cpp, vllm, sglang

MiniMax-M2.5

230B MoE / 10B active

MiniMax-M2.5 is a mixture-of-experts model with 230B total parameters and 10B active parameters in the MiniMax family. ToolHalla tracks it for chat, coding, research, agentic with a 1M (1,048,576-token) context window, Apache-2.0 license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, agentic

Family: MiniMax
Context: 1M (1,048,576 tokens)
License: Apache-2.0
Quality: 97

Quantized memory: Q2_K 58 GB, Q3_K_M 75 GB, Q3_K_XL 82 GB, Q4_K_M 98 GB

Providers: llama.cpp, vllm, sglang

DeepSeek-R1-671B

671B MoE / 37B active

DeepSeek-R1-671B is a mixture-of-experts model with 671B total parameters and 37B active parameters in the DeepSeek family. ToolHalla tracks it for chat, coding, research, reasoning with a 131k context window, MIT license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, reasoning

Family: DeepSeek
Context: 131k
License: MIT
Quality: 96

Quantized memory: TQ1_0 160 GB, IQ2_XXS 195 GB, Q3_K_M 290 GB, Q4_K_M 380 GB

Providers: llama.cpp, vllm, sglang, ollama

Qwen3.5-122B-A10B

122B MoE / 10B active

Qwen3.5-122B-A10B is a mixture-of-experts model with 122B total parameters and 10B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 1M context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, agentic

Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 96

Quantized memory: Q2_K 31 GB, Q3_K_M 40 GB, Q4_K_M 52 GB, Q8_0 68 GB

Providers: ollama, llama.cpp, vllm, sglang

Kimi-K2.5

1T MoE / 32B active

Kimi-K2.5 is a mixture-of-experts model with 1T total parameters and 32B active parameters in the Kimi family. ToolHalla tracks it for chat, coding, research, agentic with a 131k context window, MIT (modified) license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, agentic, vision

Family: Kimi
Context: 131k
License: MIT (modified)
Quality: 96

Quantized memory: TQ1_0 200 GB, Q2_K_XL 375 GB, Q4_K_S 550 GB

Providers: llama.cpp, vllm, sglang

GLM-5

744B MoE / unknown active

GLM-5 is a mixture-of-experts model with 744B total parameters and an unknown number of active parameters in the GLM family. ToolHalla tracks it for chat, coding, research, agentic with a 131k context window, MIT license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, research, agentic, reasoning

Family: GLM
Context: 131k
License: MIT
Quality: 95

Quantized memory: TQ1_0 174 GB, IQ2_XXS 225 GB, Q3_K_M 320 GB, Q4_K_M 420 GB

Providers: llama.cpp, vllm, sglang

Qwen3-235B-A22B

235B MoE / 22B active

Qwen3-235B-A22B is a mixture-of-experts model with 235B total parameters and 22B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, reasoning with a 131k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, reasoning

Family: Qwen
Context: 131k
License: Apache-2.0
Quality: 95

Quantized memory: Q3_K_M 78 GB, Q4_K_M 100 GB, Q5_K_M 125 GB, Q8_0 190 GB

Providers: ollama, llama.cpp, vllm

Llama-3.3-70B-Instruct

70B

Llama-3.3-70B-Instruct is a dense 70B parameter model in the Llama family. ToolHalla tracks it for chat, coding, research, creative with a 131k context window, Llama 3.3 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative, math

Family: Llama
Context: 131k
License: Llama 3.3 Community
Quality: 94

Quantized memory: Q2_K 28.8 GB, Q3_K_M 37.4 GB, Q4_K_M 47.4 GB, Q5_K_M 58.8 GB, Q8_0 84.4 GB

Providers: ollama, llama.cpp, vllm

Qwen2.5-72B-Instruct

72B

Qwen2.5-72B-Instruct is a dense 72B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, creative

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 93

Quantized memory: Q2_K 29.6 GB, Q3_K_M 38.4 GB, Q4_K_M 48.7 GB, Q5_K_M 60.4 GB, Q8_0 86.8 GB

Providers: ollama, llama.cpp, vllm

Llama-3.1-70B-Instruct

70B

Llama-3.1-70B-Instruct is a dense 70B parameter model in the Llama family. ToolHalla tracks it for chat, research, creative, math with a 131k context window, Llama 3.1 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative, math

Family: Llama
Context: 131k
License: Llama 3.1 Community
Quality: 92

Quantized memory: Q2_K 28.8 GB, Q3_K_M 37.4 GB, Q4_K_M 47.4 GB, Q5_K_M 58.8 GB, Q8_0 84.4 GB

Providers: ollama, llama.cpp, vllm

Nous-Hermes-2-Mixtral-8x7B-DPO

46.7B MoE

Nous-Hermes-2-Mixtral-8x7B-DPO is a mixture-of-experts model with 46.7B total parameters in the Nous family. ToolHalla tracks it for chat, creative, coding with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative, coding

Family: Nous
Context: 33k
License: Apache-2.0
Quality: 92

Quantized memory: Q2_K 19.6 GB, Q3_K_M 25.4 GB, Q4_K_M 32.2 GB, Q5_K_M 39.9 GB, Q8_0 57.3 GB

Providers: ollama, llama.cpp, vllm

Qwen2.5-Coder-32B-Instruct

32B

Qwen2.5-Coder-32B-Instruct is a dense 32B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, math, research with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, math, research

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 92

Quantized memory: Q2_K 13.6 GB, Q3_K_M 17.6 GB, Q4_K_M 22.3 GB, Q5_K_M 27.6 GB, Q8_0 39.6 GB

Providers: ollama, llama.cpp, vllm

DeepSeek-R1-Distill-Llama-70B

70B

DeepSeek-R1-Distill-Llama-70B is a dense 70B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, coding, chat with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, coding, chat

Family: DeepSeek
Context: 33k
License: MIT
Quality: 91

Quantized memory: Q2_K 28.8 GB, Q3_K_M 37.4 GB, Q4_K_M 47.4 GB, Q5_K_M 58.8 GB, Q8_0 84.4 GB

Providers: ollama, llama.cpp, vllm

Qwen3-32B

32B

Qwen3-32B is a dense 32B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 91

Quantized memory: Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers: ollama, llama.cpp, vllm

Qwen3.5-27B

27B

Qwen3.5-27B is a dense 27B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 1M context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, agentic

Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 91

Quantized memory: Q2_K 11.3 GB, Q3_K_M 14.6 GB, Q4_K_M 18.4 GB, Q5_K_M 22.5 GB, Q8_0 32.4 GB

Providers: ollama, llama.cpp, vllm

DeepSeek-R1-Distill-Qwen-32B

32B

DeepSeek-R1-Distill-Qwen-32B is a dense 32B parameter model in the DeepSeek family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: DeepSeek
Context: 33k
License: Apache-2.0
Quality: 90

Quantized memory: Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers: ollama, llama.cpp, vllm

Llama-4-Maverick-17B

400B MoE / 17B active

Llama-4-Maverick-17B is a mixture-of-experts model with 400B total parameters and 17B active parameters in the Llama family. ToolHalla tracks it for chat, coding, vision, research with a 1M (1,048,576-token) context window, Llama 4 Community license, and local runtimes such as llama.cpp, vllm, sglang.

chat, coding, vision, research

Family: Llama
Context: 1M (1,048,576 tokens)
License: Llama 4 Community
Quality: 89

Quantized memory: Q3_K_M 130 GB, Q4_K_M 170 GB, Q5_K_M 210 GB

Providers: llama.cpp, vllm, sglang

Qwen3.5-35B-A3B

35B MoE / 3B active

Qwen3.5-35B-A3B is a mixture-of-experts model with 35B total parameters and 3B active parameters in the Qwen family. ToolHalla tracks it for chat, coding, agentic, research with a 1M context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, agentic, research, math

Family: Qwen
Context: 1M
License: Apache-2.0
Quality: 89

Quantized memory: Q2_K 14.8 GB, Q3_K_M 19.2 GB, Q4_K_M 24.3 GB, Q5_K_M 30.1 GB, Q8_0 43.1 GB

Providers: ollama, llama.cpp, vllm

Qwen2.5-32B-Instruct

32B

Qwen2.5-32B-Instruct is a dense 32B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math, creative

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 88

Quantized memory: Q2_K 13.6 GB, Q3_K_M 17.6 GB, Q4_K_M 22.3 GB, Q5_K_M 27.6 GB, Q8_0 39.6 GB

Providers: ollama, llama.cpp, vllm

Gemma-3-27B

27B

Gemma-3-27B is a dense 27B parameter model in the Gemma family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Gemma
Context: 33k
License: Apache-2.0
Quality: 88

Quantized memory: Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers: ollama, llama.cpp, vllm

Phi-4-14B-Instruct

14B

Phi-4-14B-Instruct is a dense 14B parameter model in the Phi family. ToolHalla tracks it for coding, math, research, chat with a 128k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

coding, math, research, chat

Family: Phi
Context: 128k
License: MIT
Quality: 87

Quantized memory: Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers: ollama, llama.cpp, vllm

Mistral-Small-24B-Instruct

24B

Mistral-Small-24B-Instruct is a dense 24B parameter model in the Mistral family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Mistral
Context: 33k
License: Apache-2.0
Quality: 85

Quantized memory: Q2_K 10 GB, Q3_K_M 13 GB, Q4_K_M 16 GB, Q5_K_M 20 GB, Q8_0 28 GB

Providers: ollama, llama.cpp, vllm

InternLM2.5-20B-Chat

20B

InternLM2.5-20B-Chat is a dense 20B parameter model in the InternLM family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: InternLM
Context: 33k
License: Apache-2.0
Quality: 85

Quantized memory: Q2_K 10 GB, Q3_K_M 13 GB, Q4_K_M 16 GB, Q5_K_M 20 GB, Q8_0 28 GB

Providers: ollama, llama.cpp, vllm

Phi-3-medium-128k-instruct

14B

Phi-3-medium-128k-instruct is a dense 14B parameter model in the Phi family. ToolHalla tracks it for coding, chat, research, math with a 131k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, math

Family: Phi
Context: 131k
License: MIT
Quality: 85

Quantized memory: Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers: ollama, llama.cpp, vllm

Mistral-Nemo-12B-Instruct

12B

Mistral-Nemo-12B-Instruct is a dense 12B parameter model in the Mistral family. ToolHalla tracks it for chat, coding, creative with a 128k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, creative

Family: Mistral
Context: 128k
License: Apache-2.0
Quality: 84

Quantized memory: Q2_K 5.6 GB, Q3_K_M 7.2 GB, Q4_K_M 9.1 GB, Q5_K_M 11.2 GB, Q8_0 16 GB

Providers: ollama, llama.cpp, vllm

Qwen3-30B-A3B

30B MoE / 3B active

Qwen3-30B-A3B is a mixture-of-experts model with 30B total parameters and 3B active parameters in the Qwen family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 83

Quantized memory: Q2_K 13.6 GB, Q3_K_M 18 GB, Q4_K_M 22 GB, Q5_K_M 27 GB, Q8_0 38 GB

Providers: ollama, llama.cpp, vllm

Gemma-4-26B-A4B

25.2B MoE / 3.8B active

Gemma-4-26B-A4B is a mixture-of-experts model with 25.2B total parameters and 3.8B active parameters in the Gemma family. ToolHalla tracks it for chat, coding, research, math with a 262k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers.

chat, coding, research, math, agentic, reasoning, vision

Family: Gemma
Context: 262k
License: Apache-2.0
Quality: 83

Quantized memory: Q2_K 6.9 GB, Q3_K_M 10.4 GB, Q4_K_M 13.9 GB, Q5_K_M 17.3 GB, Q8_0 27.7 GB

Providers: huggingface, transformers

Qwen3.5-9B

9B MoE

Qwen3.5-9B is a mixture-of-experts model with 9B total parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 262k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers, vllm.

chat, coding, research, math, agentic, reasoning, vision

Family: Qwen
Context: 262k
License: Apache-2.0
Quality: 83

Quantized memory: Q2_K 2.5 GB, Q3_K_M 3.7 GB, Q4_K_M 5 GB, Q5_K_M 6.2 GB, Q8_0 9.9 GB

Providers: huggingface, transformers, vllm, sglang

DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Llama-8B is a dense 8B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, chat with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, chat

Family: DeepSeek
Context: 33k
License: MIT
Quality: 83

Quantized memory: Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers: ollama, llama.cpp, vllm

Mixtral-8x22B-Instruct

141B MoE

Mixtral-8x22B-Instruct is a mixture-of-experts model with 141B total parameters in the Mistral family. ToolHalla tracks it for chat, coding, research, creative with a 66k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative, math

Family: Mistral
Context: 66k
License: Apache-2.0
Quality: 82

Quantized memory: Q2_K 36.8 GB, Q3_K_M 47.8 GB, Q4_K_M 60.6 GB, Q5_K_M 75.2 GB, Q8_0 108 GB

Providers: ollama, llama.cpp, vllm

Llama-4-Scout-17B

109B MoE / 17B active

Llama-4-Scout-17B is a mixture-of-experts model with 109B total parameters and 17B active parameters in the Llama family. ToolHalla tracks it for chat, coding, vision with a 524k context window, Llama 4 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, vision

Family: Llama
Context: 524k
License: Llama 4 Community
Quality: 82

Quantized memory: Q3_K_M 35 GB, Q4_K_M 45 GB, Q5_K_M 55 GB, Q8_0 85 GB

Providers: ollama, llama.cpp, vllm

Qwen2.5-Coder-14B-Instruct

14B

Qwen2.5-Coder-14B-Instruct is a dense 14B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, math

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 82

Quantized memory: Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers: ollama, llama.cpp, vllm

WizardLM-2-8x22B

141B MoE

WizardLM-2-8x22B is a mixture-of-experts model with 141B total parameters in the WizardLM family. ToolHalla tracks it for chat, coding, research, creative with a 66k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative

Family: WizardLM
Context: 66k
License: Llama 2 Community
Quality: 81

Quantized memory: Q2_K 36.8 GB, Q3_K_M 47.8 GB, Q4_K_M 60.6 GB, Q5_K_M 75.2 GB, Q8_0 108 GB

Providers: ollama, llama.cpp, vllm

Gemma-2-27B-Instruct

27B

Gemma-2-27B-Instruct is a dense 27B parameter model in the Gemma family. ToolHalla tracks it for chat, coding, research, creative with an 8k context window, Gemma Terms license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative, math

Family: Gemma
Context: 8k
License: Gemma Terms
Quality: 80

Quantized memory: Q2_K 11.6 GB, Q3_K_M 15 GB, Q4_K_M 19 GB, Q5_K_M 23.5 GB, Q8_0 33.7 GB

Providers: ollama, llama.cpp, vllm

Qwen2.5-14B-Instruct

14B

Qwen2.5-14B-Instruct is a dense 14B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 80

Quantized memory: Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers: ollama, llama.cpp, vllm

Gemma-3-12B

12B

Gemma-3-12B is a dense 12B parameter model in the Gemma family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Gemma
Context: 33k
License: Apache-2.0
Quality: 80

Quantized memory: Q2_K 5.2 GB, Q3_K_M 7 GB, Q4_K_M 8.8 GB, Q5_K_M 11 GB, Q8_0 16 GB

Providers: ollama, llama.cpp, vllm

Llama-3.2-11B-Vision-Instruct

11B

Llama-3.2-11B-Vision-Instruct is a dense 11B parameter model in the Llama family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math, vision

Family: Llama
Context: 33k
License: Apache-2.0
Quality: 80

Quantized memory: Q2_K 4.4 GB, Q3_K_M 6 GB, Q4_K_M 7.2 GB, Q5_K_M 9 GB, Q8_0 13 GB

Providers: ollama, llama.cpp, vllm

Qwen3.5-4B

4B MoE

Qwen3.5-4B is a mixture-of-experts model with 4B total parameters in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 262k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers, vllm.

chat, coding, research, math, agentic, reasoning, vision

Family: Qwen
Context: 262k
License: Apache-2.0
Quality: 79

Quantized memory: Q2_K 1.1 GB, Q3_K_M 1.7 GB, Q4_K_M 2.2 GB, Q5_K_M 2.8 GB, Q8_0 4.4 GB

Providers: huggingface, transformers, vllm, sglang

DeepSeek-R1-Distill-Qwen-14B

14B

DeepSeek-R1-Distill-Qwen-14B is a dense 14B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, coding with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, coding

Family: DeepSeek
Context: 33k
License: MIT
Quality: 78

Quantized memory: Q2_K 6.4 GB, Q3_K_M 8.3 GB, Q4_K_M 10.4 GB, Q5_K_M 12.9 GB, Q8_0 18.3 GB

Providers: ollama, llama.cpp, vllm

Solar-10.7B-Instruct

10.7B

Solar-10.7B-Instruct is a dense 10.7B parameter model in the Solar family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Solar
Context: 33k
License: Apache-2.0
Quality: 78

Quantized memory: Q2_K 4.4 GB, Q3_K_M 6 GB, Q4_K_M 7.2 GB, Q5_K_M 9 GB, Q8_0 13 GB

Providers: ollama, llama.cpp, vllm

DBRX-Instruct

132B MoE

DBRX-Instruct is a mixture-of-experts model with 132B total parameters in the Databricks family. ToolHalla tracks it for chat, coding, research with a 33k context window, Databricks Open license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research

Family: Databricks
Context: 33k
License: Databricks Open
Quality: 77

Quantized memory: Q2_K 34.4 GB, Q3_K_M 44.7 GB, Q4_K_M 56.6 GB, Q5_K_M 70.3 GB, Q8_0 100.9 GB

Providers: ollama, llama.cpp, vllm

Ministral-8B-Instruct

Ministral-8B-Instruct is a dense 8B parameter model in the Mistral family. ToolHalla tracks it for chat, research with a 128k context window, Mistral Research license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research

Family: Mistral
Context: 128k
License: Mistral Research
Quality: 77

Quantized memory: Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers: ollama, llama.cpp, vllm

Mixtral-8x7B-Instruct

46.7B MoE

Mixtral-8x7B-Instruct is a mixture-of-experts model with 46.7B total parameters in the Mistral family. ToolHalla tracks it for chat, coding, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, creative

Family: Mistral
Context: 33k
License: Apache-2.0
Quality: 76

Quantized memory: Q2_K 19.6 GB, Q3_K_M 25.4 GB, Q4_K_M 32.2 GB, Q5_K_M 39.9 GB, Q8_0 57.3 GB

Providers: ollama, llama.cpp, vllm

Qwen3-8B

Qwen3-8B is a dense 8B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 76

Quantized memory: Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers: ollama, llama.cpp, vllm

Command-R-7B

Command-R-7B is a dense 7B parameter model in the Cohere family. ToolHalla tracks it for research, chat, creative with a 128k context window, CC-BY-NC license, and local runtimes such as ollama, llama.cpp, vllm.

research, chat, creative

Family: Cohere
Context: 128k
License: CC-BY-NC
Quality: 76

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Granite-3.1-8B-Instruct

Granite-3.1-8B-Instruct is a dense 8B parameter model in the Granite family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Granite
Context: 33k
License: Apache-2.0
Quality: 75

Quantized memory: Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers: ollama, llama.cpp, vllm

Mistral-7B-Instruct-v0.3

Mistral-7B-Instruct-v0.3 is a dense 7B parameter model in the Mistral family. ToolHalla tracks it for chat, research with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research

Family: Mistral
Context: 33k
License: Apache-2.0
Quality: 75

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Dolphin-2.9.2-Qwen2-7B

Dolphin-2.9.2-Qwen2-7B is a dense 7B parameter model in the Dolphin family. ToolHalla tracks it for chat, creative, coding with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative, coding

Family: Dolphin
Context: 33k
License: Apache-2.0
Quality: 75

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

DeepSeek-Coder-33B-Instruct

33B

DeepSeek-Coder-33B-Instruct is a dense 33B parameter model in the DeepSeek family. ToolHalla tracks it for coding, chat, math with a 16k context window, the DeepSeek License, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, math

Family: DeepSeek
Context: 16k
License: DeepSeek License
Quality: 74

Quantized memory: Q2_K 14 GB, Q3_K_M 18.2 GB, Q4_K_M 23 GB, Q5_K_M 28.5 GB, Q8_0 40.7 GB

Providers: ollama, llama.cpp, vllm

Qwen2.5-Coder-7B-Instruct

Qwen2.5-Coder-7B-Instruct is a dense 7B parameter model in the Qwen family. ToolHalla tracks it for coding, chat with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 74

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Falcon-40B-Instruct

40B

Falcon-40B-Instruct is a dense 40B parameter model in the Falcon family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Falcon
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory: Q2_K 17 GB, Q3_K_M 22 GB, Q4_K_M 28 GB, Q5_K_M 34 GB, Q8_0 48 GB

Providers: ollama, llama.cpp, vllm

GLM-4.7-9B-Chat

GLM-4.7-9B-Chat is a dense 9B parameter model in the GLM family. ToolHalla tracks it for chat, coding with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding

Family: GLM
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory: Q4_K_M 6.2 GB, Q5_K_M 7.4 GB, Q8_0 10.5 GB, FP16 18.8 GB

Providers: ollama, llama.cpp, vllm

Llama-3.1-8B-Instruct

Llama-3.1-8B-Instruct is a dense 8B parameter model in the Llama family. ToolHalla tracks it for chat, research, creative with a 131k context window, Llama 3.1 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative

Family: Llama
Context: 131k
License: Llama 3.1 Community
Quality: 72

Quantized memory: Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers: ollama, llama.cpp, vllm

Qwen2.5-7B-Instruct

Qwen2.5-7B-Instruct is a dense 7B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research, math

Family: Qwen
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

OpenHermes-2.5-Mistral-7B

OpenHermes-2.5-Mistral-7B is a dense 7B parameter model in the Nous family. ToolHalla tracks it for chat, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative

Family: Nous
Context: 33k
License: Apache-2.0
Quality: 72

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Gemma-2-9B-Instruct

Gemma-2-9B-Instruct is a dense 9B parameter model in the Gemma family. ToolHalla tracks it for chat, research, creative with an 8k context window, Gemma Terms license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative

Family: Gemma
Context: 8k
License: Gemma Terms
Quality: 71

Quantized memory: Q2_K 4.4 GB, Q3_K_M 5.7 GB, Q4_K_M 7.1 GB, Q5_K_M 8.8 GB, Q8_0 12.4 GB

Providers: ollama, llama.cpp, vllm

Zephyr-7B-beta

Zephyr-7B-beta is a dense 7B parameter model in the Zephyr family. ToolHalla tracks it for chat, creative with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative

Family: Zephyr
Context: 33k
License: MIT
Quality: 71

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Vicuna-13B

13B

Vicuna-13B is a dense 13B parameter model in the Vicuna family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Vicuna
Context: 33k
License: Apache-2.0
Quality: 70

Quantized memory: Q2_K 5.2 GB, Q3_K_M 7 GB, Q4_K_M 8.8 GB, Q5_K_M 11 GB, Q8_0 16 GB

Providers: ollama, llama.cpp, vllm

Orca-2-13B

13B

Orca-2-13B is a dense 13B parameter model in the Orca family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Orca
Context: 33k
License: Apache-2.0
Quality: 70

Quantized memory: Q2_K 5.2 GB, Q3_K_M 7 GB, Q4_K_M 8.8 GB, Q5_K_M 11 GB, Q8_0 16 GB

Providers: ollama, llama.cpp, vllm

CodeLlama-7B-Instruct

CodeLlama-7B-Instruct is a dense 7B parameter model in the CodeLlama family. ToolHalla tracks it for coding with a 16k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

coding

Family: CodeLlama
Context: 16k
License: Llama 2 Community
Quality: 70

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Neural-Chat-7B-v3.3

Neural-Chat-7B-v3.3 is a dense 7B parameter model in the Intel family. ToolHalla tracks it for chat, research with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research

Family: Intel
Context: 33k
License: Apache-2.0
Quality: 70

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Gemma-4-E4B

Gemma-4-E4B is a dense 8B parameter model in the Gemma family. ToolHalla tracks it for chat, coding, research, agentic with a 131k context window, Apache-2.0 license, and local runtimes such as huggingface, transformers.

chat, coding, research, agentic, reasoning, vision

Family: Gemma
Context: 131k
License: Apache-2.0
Quality: 69

Quantized memory: Q2_K 2.2 GB, Q3_K_M 3.3 GB, Q4_K_M 4.4 GB, Q5_K_M 5.5 GB, Q8_0 8.8 GB

Providers: huggingface, transformers

Phi-4-mini-instruct

3.8B

Phi-4-mini-instruct is a dense 3.8B parameter model in the Phi family. ToolHalla tracks it for chat, coding, research with a 128k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research

Family: Phi
Context: 128k
License: MIT
Quality: 69

Quantized memory: Q2_K 2.3 GB, Q3_K_M 3 GB, Q4_K_M 3.7 GB, Q5_K_M 4.5 GB, Q8_0 6.3 GB

Providers: ollama, llama.cpp, vllm

Command-R-35B

35B

Command-R-35B is a dense 35B parameter model in the Cohere family. ToolHalla tracks it for research, chat, coding with a 128k context window, CC-BY-NC license, and local runtimes such as ollama, llama.cpp, vllm.

research, chat, coding

Family: Cohere
Context: 128k
License: CC-BY-NC
Quality: 68

Quantized memory: Q2_K 14.8 GB, Q3_K_M 19.2 GB, Q4_K_M 24.3 GB, Q5_K_M 30.1 GB, Q8_0 43.1 GB

Providers: ollama, llama.cpp, vllm

CodeLlama-34B-Instruct

34B

CodeLlama-34B-Instruct is a dense 34B parameter model in the CodeLlama family. ToolHalla tracks it for coding, math with a 16k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

coding, math

Family: CodeLlama
Context: 16k
License: Llama 2 Community
Quality: 67

Quantized memory: Q2_K 14.4 GB, Q3_K_M 18.7 GB, Q4_K_M 23.6 GB, Q5_K_M 29.3 GB, Q8_0 41.9 GB

Providers: ollama, llama.cpp, vllm

OpenChat-3.6-8B

OpenChat-3.6-8B is a dense 8B parameter model in the OpenChat family. ToolHalla tracks it for chat, creative with an 8k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative

Family: OpenChat
Context: 8k
License: Apache-2.0
Quality: 66

Quantized memory: Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers: ollama, llama.cpp, vllm

DeepSeek-R1-Distill-Qwen-7B

DeepSeek-R1-Distill-Qwen-7B is a dense 7B parameter model in the DeepSeek family. ToolHalla tracks it for math, research, coding with a 33k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

math, research, coding

Family: DeepSeek
Context: 33k
License: MIT
Quality: 66

Quantized memory: Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers: ollama, llama.cpp, vllm

Gemma-3-4B

Gemma-3-4B is a dense 4B parameter model in the Gemma family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Gemma
Context: 33k
License: Apache-2.0
Quality: 65

Quantized memory: Q2_K 2.8 GB, Q3_K_M 3.8 GB, Q4_K_M 4.8 GB, Q5_K_M 6 GB, Q8_0 9 GB

Providers: ollama, llama.cpp, vllm

Phi-3-mini-4k-instruct

3.8B

Phi-3-mini-4k-instruct is a dense 3.8B parameter model in the Phi family. ToolHalla tracks it for chat, coding with a 4k context window, MIT license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding

Family: Phi
Context: 4k
License: MIT
Quality: 65

Quantized memory: Q2_K 2.3 GB, Q3_K_M 3 GB, Q4_K_M 3.7 GB, Q5_K_M 4.5 GB, Q8_0 6.3 GB

Providers: ollama, llama.cpp, vllm

InternLM2.5-7B-Chat

InternLM2.5-7B-Chat is a dense 7B parameter model in the InternLM family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: InternLM
Context: 33k
License: Apache-2.0
Quality: 62

Quantized memory: Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers: ollama, llama.cpp, vllm

DeepSeek-Coder-6.7B-Instruct

6.7B

DeepSeek-Coder-6.7B-Instruct is a dense 6.7B parameter model in the DeepSeek family. ToolHalla tracks it for coding, chat with a 16k context window, the DeepSeek License, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat

Family: DeepSeek

Context: 16k

License: DeepSeek License

Quality: 62

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.6 GB, Q5_K_M 6.9 GB, Q8_0 9.7 GB

Providers

ollama, llama.cpp, vllm

StarCoder2-15B-Instruct

15B

StarCoder2-15B-Instruct is a dense 15B parameter model in the StarCoder family. ToolHalla tracks it for coding, math with a 16k context window, OpenRAIL-M license, and local runtimes such as ollama, llama.cpp, vllm.

coding, math

Family: StarCoder

Context: 16k

License: OpenRAIL-M

Quality: 60

Quantized memory

Q2_K 6.8 GB, Q3_K_M 8.8 GB, Q4_K_M 11.1 GB, Q5_K_M 13.7 GB, Q8_0 19.5 GB

Providers

ollama, llama.cpp, vllm

WizardLM-2-7B

WizardLM-2-7B is a dense 7B parameter model in the WizardLM family. ToolHalla tracks it for chat, coding with a 33k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding

Family: WizardLM

Context: 33k

License: Llama 2 Community

Quality: 60

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Vicuna-7B

Vicuna-7B is a dense 7B parameter model in the Vicuna family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Vicuna

Context: 33k

License: Apache-2.0

Quality: 60

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Orca-2-7B

Orca-2-7B is a dense 7B parameter model in the Orca family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Orca

Context: 33k

License: Apache-2.0

Quality: 60

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Qwen2-VL-7B-Instruct

Qwen2-VL-7B-Instruct is a dense 7B parameter model in the Qwen family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Qwen

Context: 33k

License: Apache-2.0

Quality: 60

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

CodeGemma-7B-Instruct

CodeGemma-7B-Instruct is a dense 7B parameter model in the Gemma family. ToolHalla tracks it for coding, chat with an 8k context window, Gemma Terms license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat

Family: Gemma

Context: 8k

License: Gemma Terms

Quality: 58

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Falcon-7B-Instruct

Falcon-7B-Instruct is a dense 7B parameter model in the Falcon family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Falcon

Context: 33k

License: Apache-2.0

Quality: 58

Quantized memory

Q2_K 3.5 GB, Q3_K_M 4.5 GB, Q4_K_M 5.5 GB, Q5_K_M 7 GB, Q8_0 10 GB

Providers

ollama, llama.cpp, vllm

Qwen2.5-3B-Instruct

Qwen2.5-3B-Instruct is a dense 3B parameter model in the Qwen family. ToolHalla tracks it for chat, coding, research with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, coding, research

Family: Qwen

Context: 33k

License: Apache-2.0

Quality: 56

Quantized memory

Q2_K 2 GB, Q3_K_M 2.6 GB, Q4_K_M 3.2 GB, Q5_K_M 3.9 GB, Q8_0 5.3 GB

Providers

ollama, llama.cpp, vllm

CodeLlama-13B-Instruct

13B

CodeLlama-13B-Instruct is a dense 13B parameter model in the CodeLlama family. ToolHalla tracks it for coding, math with a 16k context window, Llama 2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

coding, math

Family: CodeLlama

Context: 16k

License: Llama 2 Community

Quality: 55

Quantized memory

Q2_K 6 GB, Q3_K_M 7.8 GB, Q4_K_M 9.8 GB, Q5_K_M 12.1 GB, Q8_0 17.1 GB

Providers

ollama, llama.cpp, vllm

Llama-3.2-3B-Instruct

Llama-3.2-3B-Instruct is a dense 3B parameter model in the Llama family. ToolHalla tracks it for chat, creative with a 131k context window, Llama 3.2 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative

Family: Llama

Context: 131k

License: Llama 3.2 Community

Quality: 55

Quantized memory

Q2_K 2 GB, Q3_K_M 2.6 GB, Q4_K_M 3.2 GB, Q5_K_M 3.9 GB, Q8_0 5.3 GB

Providers

ollama, llama.cpp, vllm

StarCoder2-7B-Instruct

StarCoder2-7B-Instruct is a dense 7B parameter model in the StarCoder family. ToolHalla tracks it for coding with a 16k context window, OpenRAIL-M license, and local runtimes such as ollama, llama.cpp, vllm.

coding

Family: StarCoder

Context: 16k

License: OpenRAIL-M

Quality: 50

Quantized memory

Q2_K 3.6 GB, Q3_K_M 4.6 GB, Q4_K_M 5.8 GB, Q5_K_M 7.1 GB, Q8_0 10.1 GB

Providers

ollama, llama.cpp, vllm

Llama-3.2-1B-Instruct

Llama-3.2-1B-Instruct is a dense 1B parameter model in the Llama family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: Llama

Context: 33k

License: Apache-2.0

Quality: 50

Quantized memory

Q2_K 0.5 GB, Q3_K_M 0.6 GB, Q4_K_M 0.7 GB, Q5_K_M 0.9 GB, Q8_0 1.2 GB

Providers

ollama, llama.cpp, vllm

Gemma-2-2B-Instruct

Gemma-2-2B-Instruct is a dense 2B parameter model in the Gemma family. ToolHalla tracks it for chat, creative with an 8k context window, Gemma Terms license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative

Family: Gemma

Context: 8k

License: Gemma Terms

Quality: 45

Quantized memory

Q2_K 1.6 GB, Q3_K_M 2 GB, Q4_K_M 2.5 GB, Q5_K_M 3 GB, Q8_0 4.2 GB

Providers

ollama, llama.cpp, vllm

SmolLM2-1.7B-Instruct

1.7B

SmolLM2-1.7B-Instruct is a dense 1.7B parameter model in the SmolLM family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: SmolLM

Context: 33k

License: Apache-2.0

Quality: 45

Quantized memory

Q2_K 1 GB, Q3_K_M 1.3 GB, Q4_K_M 1.6 GB, Q5_K_M 2 GB, Q8_0 2.5 GB

Providers

ollama, llama.cpp, vllm

StarCoder2-3B-Instruct

StarCoder2-3B-Instruct is a dense 3B parameter model in the StarCoder family. ToolHalla tracks it for coding with a 16k context window, OpenRAIL-M license, and local runtimes such as ollama, llama.cpp, vllm.

coding

Family: StarCoder

Context: 16k

License: OpenRAIL-M

Quality: 40

Quantized memory

Q2_K 2 GB, Q3_K_M 2.6 GB, Q4_K_M 3.2 GB, Q5_K_M 3.9 GB, Q8_0 5.3 GB

Providers

ollama, llama.cpp, vllm

TinyLlama-1.1B

1.1B

TinyLlama-1.1B is a dense 1.1B parameter model in the TinyLlama family. ToolHalla tracks it for coding, chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

coding, chat, research, creative, math

Family: TinyLlama

Context: 33k

License: Apache-2.0

Quality: 35

Quantized memory

Q2_K 0.5 GB, Q3_K_M 0.6 GB, Q4_K_M 0.7 GB, Q5_K_M 0.9 GB, Q8_0 1.2 GB

Providers

ollama, llama.cpp, vllm

Yi-1.5-34B-Chat

34B

Yi-1.5-34B-Chat is a dense 34B parameter model in the Yi family. ToolHalla tracks it for chat, research, creative, math with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative, math

Family: Yi

Context: 33k

License: Apache-2.0

Quality: 30

Quantized memory

Q2_K 14.4 GB, Q3_K_M 18.7 GB, Q4_K_M 23.6 GB, Q5_K_M 29.3 GB, Q8_0 41.9 GB

Providers

ollama, llama.cpp, vllm

Yi-1.5-9B-Chat

Yi-1.5-9B-Chat is a dense 9B parameter model in the Yi family. ToolHalla tracks it for chat, research, creative with a 33k context window, Apache-2.0 license, and local runtimes such as ollama, llama.cpp, vllm.

chat, research, creative

Family: Yi

Context: 33k

License: Apache-2.0

Quality: 25

Quantized memory

Q2_K 4.4 GB, Q3_K_M 5.7 GB, Q4_K_M 7.1 GB, Q5_K_M 8.8 GB, Q8_0 12.4 GB

Providers

ollama, llama.cpp, vllm

Llama-3-8B-Instruct

Llama-3-8B-Instruct is a dense 8B parameter model in the Llama family. ToolHalla tracks it for chat, creative with an 8k context window, Llama 3 Community license, and local runtimes such as ollama, llama.cpp, vllm.

chat, creative

Family: Llama

Context: 8k

License: Llama 3 Community

Quality: 14

Quantized memory

Q2_K 4 GB, Q3_K_M 5.2 GB, Q4_K_M 6.5 GB, Q5_K_M 8 GB, Q8_0 11.2 GB

Providers

ollama, llama.cpp, vllm