Models Library

Discover amazing Large Language Models by the gaia community!
All LLMs
Codestral
Codestral is Mistral AI's 22B code model, proficient in over 80 programming languages. It can generate code, write tests, and complete partial code in languages such as Java, Python, and C++.
DeepSeekR1DistillLlama
DeepSeek-R1-Zero, a model trained with large-scale reinforcement learning, shows remarkable reasoning performance but suffers from issues such as poor readability and language mixing. DeepSeek-R1 addresses these issues and matches OpenAI-o1's performance; this model distills DeepSeek-R1's reasoning ability into a Llama base model.
EXAONE3.5
EXAONE 3.5 is a language model series including instruction-tuned models in 2.4B, 7.8B, and 32B sizes, built and developed by LG AI Research.
EXAONEDeep
EXAONE Deep is a reasoning model series in 2.4B, 7.8B, and 32B sizes, optimized for reasoning tasks including math and coding.
gemma2
A family of lightweight open language models from Google, based on Gemini technology. These efficient models excel at text generation tasks while being small enough to run on personal computers.
gemma3
Gemma 3 is a collection of lightweight, state-of-the-art open models built from the same research and technology that powers Google's Gemini 2.0 models. They are Google's most advanced, portable, and responsibly developed open models yet, designed to run fast directly on devices, from phones and laptops to workstations, helping developers create AI applications wherever people need them. Gemma 3 comes in a range of sizes (1B, 4B, 12B, and 27B), allowing you to choose the best model for your specific hardware and performance needs.
Llama3.1
The Meta Llama 3.1 collection includes pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. These text-only models excel in multilingual dialogue applications, outperforming many open-source and closed-source chat models on industry benchmarks.
Llama3.2
The Meta Llama 3.2 collection features pre-trained and instruction-tuned generative models in 1B and 3B sizes. These text-only models excel in multilingual dialogue applications, including agentic retrieval and summarization tasks, and outperform many open-source and closed-source chat models on industry benchmarks.
Llama3.3
The Meta Llama 3.3 model is a 70B-sized, instruction-tuned LLM optimized for multilingual dialogue applications. It outperforms many open-source and closed-source chat models on industry benchmarks, making it a top choice for dialogue use cases.
MiniCPM-V-2_6
MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding.
Phi3.5
Phi-3.5-mini is a compact member of the advanced Phi-3 model family, trained on high-quality data with a focus on complex reasoning.
Phi4
Phi-4 is a cutting-edge model trained on high-quality data from books, websites, and Q&A datasets for advanced reasoning, and refined through supervised fine-tuning and safety optimization.
Qwen2.5
The latest series of general-purpose Qwen large language models, delivering significant improvements in knowledge, coding, mathematics, instruction following, long-text generation, and structured data understanding.
Qwen2.5Coder
The latest series of Code-Specific Qwen models, delivering significant improvements in code generation, reasoning, and fixing capabilities.
QwQ
QwQ is the reasoning model of the Qwen series. Unlike conventional instruction-tuned models, QwQ can think and reason, achieving significantly better performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, achieving competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.
Yi1.5
Yi-1.5 is an improved version of Yi, trained on 500B tokens and fine-tuned on 3M examples. It exceeds Yi in coding, math, reasoning, and following instructions, while maintaining strong language understanding and reading skills.
YiCoder
An efficient open-source code language model delivering state-of-the-art performance under 10B parameters, with 128K token context length and support for 52 programming languages.