Complete Ollama Models Guide 2025 - Every Model Explained | Practical Web Tools
What are the best Ollama models in 2025? Ollama now offers over 100 open-source AI models for local deployment, ranging from tiny 270M parameter models to massive 671B reasoning systems. The most popular choices are Llama 3.1 8B for general use (108M+ downloads), DeepSeek-R1 for advanced reasoning (75M+ downloads), and Gemma 3 for efficient multimodal tasks (28M+ downloads). This guide covers every model available on Ollama, helping you choose the right one for your specific needs.
Running AI locally has become essential for developers, researchers, and businesses who need privacy, cost control, and offline capability. With Ollama making local deployment as simple as running a single command, the only question remaining is which model to choose.
This comprehensive guide examines every model in the Ollama library, providing the technical details, performance characteristics, and practical recommendations you need to make informed decisions.
What Is Ollama and Why Does Model Selection Matter?
Ollama is an open-source platform that simplifies running large language models locally on your hardware. Instead of sending data to cloud APIs like OpenAI or Anthropic, you download models once and run them entirely on your machine. Your data never leaves your device.
The platform handles the complexity of model quantization, memory management, and optimization automatically. You run ollama run llama3.1 and start chatting within minutes.
Model selection matters because each model has different strengths:
- Parameter count affects capability and memory requirements
- Training focus determines whether models excel at code, reasoning, or conversation
- Quantization level trades quality for speed and memory efficiency
- Context window limits how much text the model can process at once
Choosing the wrong model wastes hardware resources or leaves performance on the table. This guide helps you match models to your actual needs.
The Meta Llama Family: The Foundation of Local AI
Meta's Llama models form the backbone of local AI. They are the most widely used, best supported, and most thoroughly tested models available.
Llama 3.3 (70B Parameters)
Llama 3.3 is Meta's latest flagship model, offering performance comparable to the much larger Llama 3.1 405B while requiring only 43GB of storage.
Key Specifications:
- Parameters: 70 billion
- Context Window: 128K tokens
- Size: 43GB
- Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, Thai
- Downloads: 2.9 million
Best For: Users who need maximum capability and have RTX 4090 or Apple Silicon with 64GB+ memory. This model approaches GPT-4 quality for many tasks while running locally.
Hardware Requirements: Minimum 64GB RAM or 24GB VRAM with CPU offloading. Runs well on M2 Max or M3 Max MacBooks.
Llama 3.2 (1B and 3B Parameters)
Llama 3.2 represents Meta's push into efficient, edge-deployable models. These are designed for devices with limited resources.
Key Specifications:
- Parameters: 1B (1.3GB) or 3B (2.0GB)
- Context Window: 128K tokens
- Languages: 8 officially supported
- Downloads: 51 million
The 3B model outperforms Gemma 2 2.6B and Phi 3.5-mini on instruction-following, summarization, and tool use benchmarks.
Best For: Mobile development, IoT applications, and situations where you need AI on resource-constrained devices. Also excellent for rapid prototyping when speed matters more than maximum quality.
Hardware Requirements: Runs on any modern hardware. Even laptops with 8GB RAM handle these models comfortably.
Llama 3.2-Vision (11B and 90B Parameters)
The vision variants add image understanding to the Llama 3.2 architecture.
Key Specifications:
- Parameters: 11B (7.8GB) or 90B (55GB)
- Context Window: 128K tokens
- Capabilities: Image reasoning, captioning, visual question answering
- Input: Text and images
Best For: Applications requiring image analysis: document processing, visual content moderation, image-based research assistance. The 11B variant is the sweet spot for most users.
Limitations: Image+text combinations only support English. Text-only tasks support the full 8-language set.
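As a sketch of how an application might call these vision variants: Ollama's /api/generate endpoint accepts base64-encoded images alongside the prompt. The helper below builds such a payload (the llama3.2-vision model tag and the "images" field follow Ollama's public API; verify against your installed version):

```python
import base64
import json
import urllib.request

def build_vision_payload(prompt, image_bytes, model="llama3.2-vision"):
    """Build an /api/generate payload with a base64-encoded image attached."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def describe_image(path, prompt="What is in this image?"):
    """Send a local image file to a running Ollama server and return the reply."""
    with open(path, "rb") as f:
        payload = build_vision_payload(prompt, f.read())
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```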
Llama 3.1 (8B, 70B, and 405B Parameters)
Llama 3.1 remains the workhorse of local AI, with the 8B version being the most downloaded model on Ollama at over 108 million downloads.
Key Specifications:
- Sizes: 8B (4.9GB), 70B (43GB), 405B (243GB)
- Context Window: 128K tokens
- Capabilities: Tool use, multilingual, long-form summarization, coding
The 405B variant was the first openly available model to rival GPT-4 and Claude 3 Opus in capability.
Best For:
- 8B: Everyday professional work, document summarization, code generation, content drafting
- 70B: Complex analysis, detailed reasoning, high-stakes professional applications
- 405B: Research and enterprise applications requiring maximum capability
Hardware Requirements:
- 8B: 8GB VRAM or 16GB RAM
- 70B: 64GB RAM or distributed GPU setup
- 405B: Multiple high-end GPUs or specialized infrastructure
Llama 3 (8B and 70B Parameters)
The previous generation remains useful for applications optimized for its architecture.
Key Specifications:
- Sizes: 8B (4.7GB), 70B (40GB)
- Context Window: 8K tokens
- Downloads: 13.2 million
Best For: Legacy compatibility or when the shorter 8K context window is sufficient. Llama 3.1 is generally recommended for new projects.
Llama 2 (7B, 13B, and 70B Parameters)
The foundation that started the open-source AI revolution.
Key Specifications:
- Sizes: 7B (3.8GB), 13B (7.4GB), 70B (39GB)
- Context Window: 4K tokens
- Training: 2 trillion tokens
- Downloads: 4.9 million
Best For: Research comparisons, fine-tuning base models, or applications where you have existing Llama 2 infrastructure.
Llama 2 Uncensored (7B and 70B Parameters)
A variant of Llama 2 with safety guardrails removed, created using Eric Hartford's uncensoring methodology.
Key Specifications:
- Sizes: 7B (3.8GB), 70B (39GB)
- Context Window: 2K tokens
- Downloads: 1.5 million
Best For: Research purposes, creative writing without restrictions, or applications where you need the model to engage with topics the standard version refuses.
Caution: Use responsibly. The lack of guardrails means the model will attempt to comply with any request.
DeepSeek Models: The Reasoning Revolution
DeepSeek has emerged as a major force in open-source AI, particularly with reasoning-focused models.
DeepSeek-R1 (1.5B to 671B Parameters)
DeepSeek-R1 is a family of open reasoning models approaching the performance of OpenAI's o1 and Google's Gemini 2.5 Pro.
Key Specifications:
- Sizes: 1.5B (1.1GB), 7B (4.7GB), 8B (5.2GB), 14B (9.0GB), 32B (20GB), 70B (43GB), 671B (404GB)
- Context Window: 128K-160K tokens
- Downloads: 75.2 million
- License: MIT
The distilled models demonstrate that reasoning patterns from larger models can transfer effectively to smaller ones.
Best For: Mathematical reasoning, programming challenges, logical problem-solving, scientific analysis. The 14B-32B range offers the best balance of capability and hardware requirements.
Hardware Requirements:
- 7B-8B: 8GB VRAM
- 14B: 12GB VRAM
- 32B: 24GB VRAM
- 70B: 48GB+ VRAM or large RAM
- 671B: Specialized infrastructure
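R1-style reasoning models typically emit their chain of thought inside <think>…</think> tags before the final answer. Applications built on them often need to separate the two; a minimal sketch, assuming that tag convention:

```python
import re

def split_reasoning(text):
    """Separate an R1-style response into (reasoning, final_answer).

    DeepSeek-R1 wraps its chain of thought in <think>...</think> before
    the answer; if no such block is present, reasoning comes back empty.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```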
DeepSeek-Coder (1.3B to 33B Parameters)
A coding-focused model trained on 87% code and 13% natural language.
Key Specifications:
- Sizes: 1.3B (776MB), 6.7B (3.8GB), 33B (19GB)
- Context Window: 16K tokens
- Training: 2 trillion tokens
- Downloads: 2.4 million
Best For: Code completion, code generation, programming assistance, and technical documentation.
DeepSeek-Coder-V2 (16B and 236B Parameters)
An improved Mixture-of-Experts coding model achieving GPT-4 Turbo-level performance on code tasks.
Key Specifications:
- Sizes: 16B (8.9GB), 236B (133GB)
- Context Window: Up to 160K tokens
- Architecture: Mixture-of-Experts
- Downloads: 1.3 million
Best For: Professional development environments, code review automation, and complex programming tasks requiring maximum accuracy.
Google Gemma Family: Efficiency Meets Capability
Google's Gemma models leverage technology from the Gemini family in compact, efficient packages.
Gemma 3 (270M to 27B Parameters)
Gemma 3 is Google's latest and most capable model family designed to run on a single GPU.
Key Specifications:
- Sizes: 270M (text only), 1B, 4B, 12B, 27B
- Context Window: 32K-128K tokens (varies by size)
- Languages: 140+ supported
- Multimodal: 4B and larger process text and images
- Downloads: 28.9 million
The 27B variant achieves 85.6 on HellaSwag and 89.0 on ARC-e benchmarks, competing with much larger models.
Best For: Multilingual applications, multimodal projects, and situations where you need strong performance with reasonable hardware. The 12B variant is particularly efficient.
Hardware Requirements:
- 270M-1B: Any modern hardware
- 4B: 6GB VRAM
- 12B: 12GB VRAM
- 27B: 20GB+ VRAM
Gemma 2 (2B, 9B, and 27B Parameters)
The previous generation remains excellent for many applications.
Key Specifications:
- Sizes: 2B (1.6GB), 9B (5.4GB), 27B (16GB)
- Context Window: 8K tokens
- Downloads: 12.3 million
The 27B variant delivers, in Google's words, "performance surpassing models more than twice its size."
Best For: Creative text generation, chatbots, content summarization, NLP research, and language learning applications.
Gemma (2B and 7B Parameters)
The original Gemma release, lightweight but capable.
Key Specifications:
- Sizes: 2B (1.7GB), 7B (5.0GB)
- Context Window: 8K tokens
- Training: Web documents, code, mathematics
- Downloads: Included in Gemma 2 count
Best For: Edge deployments, resource-constrained environments, and applications needing a small but capable model.
CodeGemma (2B and 7B Parameters)
Google's code-specialized variant.
Key Specifications:
- Sizes: 2B (1.6GB), 7B (5.0GB)
- Context Window: 8K tokens
- Languages: Python, JavaScript, Java, Kotlin, C++, C#, Rust, Go, and others
- Training: 500 billion tokens including code and mathematics
Best For: IDE integration, code completion, fill-in-the-middle tasks, and coding assistant applications.
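Fill-in-the-middle prompting can be sketched as follows. The control tokens shown follow CodeGemma's documented FIM format, but confirm them against the model card before relying on them:

```python
def fim_prompt(prefix, suffix):
    """Assemble a fill-in-the-middle prompt using CodeGemma-style control
    tokens; the model generates the code that belongs in the gap."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Example: ask the model to complete a function body between the
# signature (prefix) and the call site (suffix).
prompt = fim_prompt(
    "def add(a, b):\n    ",
    "\n\nprint(add(2, 3))",
)
```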
Alibaba Qwen Family: Multilingual Excellence
Qwen models from Alibaba excel at multilingual tasks and offer excellent performance across the capability spectrum.
Qwen3 (0.6B to 235B Parameters)
The latest Qwen generation provides dense and Mixture-of-Experts variants.
Key Specifications:
- Sizes: 0.6B, 1.7B, 4B, 8B (default), 14B, 30B, 32B, 235B
- Context Window: 40K-256K tokens
- Languages: 100+ languages and dialects
- Reasoning: Enhanced mathematics, code generation, logical reasoning
- Downloads: Combined with other Qwen models
The flagship Qwen3-235B competes with o1, DeepSeek-R1, and Gemini-2.5-Pro.
Best For: Multilingual applications, agent development, creative writing, role-playing, and multi-turn dialogue systems.
Qwen3-Coder (30B and 480B Parameters)
Alibaba's latest coding models optimized for agentic and coding tasks.
Key Specifications:
- Sizes: 30B (19GB), 480B (varies)
- Long Context: Optimized for extended code contexts
- Downloads: 1.6 million
Best For: Complex software development, large codebase navigation, and autonomous coding agents.
Qwen3-VL (2B to 235B Parameters)
The most powerful vision-language model in the Qwen family.
Key Specifications:
- Size Range: 2B to 235B
- Capabilities: Visual understanding, document analysis, multimodal reasoning
- Downloads: 881K
Best For: Document processing, visual question answering, and applications requiring both image and text understanding.
Qwen2.5-Coder (0.5B to 32B Parameters)
The latest code-specific Qwen models with significant improvements.
Key Specifications:
- Sizes: 0.5B (398MB), 1.5B, 3B, 7B, 14B, 32B (20GB)
- Context Window: 32K tokens
- Languages: 40+ programming languages
- Downloads: 9.5 million
The 32B variant achieves performance comparable to GPT-4o on code repair benchmarks (73.7 on Aider).
Best For: Professional development, code generation, code reasoning, and code fixing tasks.
Qwen2 (0.5B to 72B Parameters)
The previous generation with excellent multilingual support.
Key Specifications:
- Sizes: 0.5B (352MB), 1.5B (935MB), 7B (4.4GB), 72B (41GB)
- Context Window: 32K-128K tokens
- Languages: 29 languages including major European, Asian, and Middle Eastern languages
Best For: Multilingual chatbots, translation, and cross-lingual applications.
CodeQwen (7B Parameters)
An earlier code-specialized Qwen model.
Key Specifications:
- Size: 7B (4.2GB)
- Context Window: 64K tokens
- Training: 3 trillion tokens of code data
- Languages: 92 coding languages
Best For: Long-context code understanding, Text-to-SQL, and bug fixing.
Mistral AI Models: French Excellence
Mistral AI, based in Paris, has produced some of the most efficient and capable open-source models.
Mistral (7B Parameters)
The original Mistral model that proved smaller models could outperform much larger ones.
Key Specifications:
- Size: 7B (4.4GB)
- Context Window: 32K tokens
- License: Apache
- Downloads: 23.6 million
Outperforms Llama 2 13B on all benchmarks and approaches CodeLlama 7B on code tasks.
Best For: General-purpose applications, chatbots, and situations where you need reliable performance with moderate resources.
Mixtral 8x7B and 8x22B (47B and 141B Total Parameters)
Mistral's Mixture-of-Experts models that use only a fraction of their parameters for each inference.
Key Specifications:
- Sizes: 8x7B (26GB), 8x22B (80GB)
- Context Window: 32K-64K tokens
- Active Parameters: 13B (8x7B) or 39B (8x22B) per inference
- Languages: English, French, Italian, German, Spanish
- Downloads: 1.6 million
The 8x22B model offers what Mistral calls "unparalleled cost efficiency" with 141B total parameters but only 39B active.
Best For: Applications requiring high capability with better efficiency than pure dense models. Excellent for multilingual European applications.
Microsoft Phi Family: Small But Mighty
Microsoft's Phi models prove that careful training can create remarkably capable small models.
Phi-4 (14B Parameters)
The latest Phi model trained on synthetic datasets and high-quality filtered data.
Key Specifications:
- Size: 14B (9.1GB)
- Context Window: 16K tokens
- Focus: Reasoning and logic
- Downloads: 6.7 million
Designed for memory/compute-constrained environments and latency-sensitive applications.
Best For: Edge deployment, real-time applications, and situations requiring strong reasoning in a compact package.
Phi-4-Reasoning (14B Parameters)
A fine-tuned variant specifically optimized for reasoning tasks.
Key Specifications:
- Size: 14B (11GB)
- Context Window: 32K tokens
- Training: Supervised fine-tuning + reinforcement learning
- Downloads: 916K
Outperforms DeepSeek-R1 Distill Llama 70B despite being 5x smaller.
Best For: Mathematical reasoning, scientific analysis, complex problem-solving, and coding tasks.
Phi-3 (3.8B and 14B Parameters)
The previous generation with excellent efficiency.
Key Specifications:
- Sizes: Mini 3.8B (2.2GB), Medium 14B (7.9GB)
- Context Window: 128K tokens
- Training: 3.3 trillion tokens
Best For: Quick prototyping, mobile applications, and situations where Phi-4 is too resource-intensive.
Phi-2 (2.7B Parameters)
Microsoft's earlier small model, still useful for many applications.
Key Specifications:
- Size: 2.7B (1.6GB)
- Context Window: 2K tokens
- Capabilities: Common-sense reasoning, language understanding
Best For: Extremely constrained environments, quick experiments, and applications where even Phi-3 is too large.
Coding-Specialized Models
Beyond the coding variants of general models, Ollama offers several dedicated code models.
CodeLlama (7B to 70B Parameters)
Meta's code-specialized version of Llama 2.
Key Specifications:
- Sizes: 7B (3.8GB), 13B (7.4GB), 34B (19GB), 70B (39GB)
- Context Window: 16K (2K for 70B)
- Languages: Python, C++, Java, PHP, TypeScript, C#, Bash
- Variants: Base, Instruct, Python-specialized
Best For: Code completion, generation, review, and fill-in-the-middle tasks.
StarCoder2 (3B, 7B, and 15B Parameters)
Next-generation open code models from BigCode.
Key Specifications:
- Sizes: 3B (1.7GB), 7B (4.0GB), 15B (9.1GB)
- Context Window: 16K tokens
- Languages: 17 (3B and 7B) to 600+ (15B) programming languages
- Training: 3-4 trillion tokens
The 15B model matches larger 33B+ models on many benchmarks. The 3B variant performs comparably to the original StarCoder-15B.
Best For: Code completion, generation, and applications where transparency about training data matters.
WizardCoder (7B and 33B Parameters)
State-of-the-art code generation using Evol-Instruct techniques.
Key Specifications:
- Sizes: 7B (3.8GB), 33B (19GB)
- Context Window: 16K tokens
- Base: Code Llama and DeepSeek Coder
Best For: Advanced code generation tasks requiring high accuracy.
Stable Code 3B (3B Parameters)
Stability AI's efficient code completion model.
Key Specifications:
- Size: 3B (1.6GB)
- Context Window: 16K tokens
- Languages: 18 programming languages
- Feature: Fill-in-the-Middle capability
Best For: IDE integration, real-time code completion, and applications requiring fast inference.
Granite Code (3B to 34B Parameters)
IBM's decoder-only code models.
Key Specifications:
- Sizes: 3B (2.0GB), 8B (4.6GB), 20B (12GB), 34B (19GB)
- Context Window: 8K-128K tokens
- Capabilities: Code generation, explanation, fixing
Best For: Enterprise environments and applications requiring IBM's support and licensing terms.
Magicoder (7B Parameters)
Code models trained using the innovative OSS-Instruct methodology.
Key Specifications:
- Size: 7B (3.8GB)
- Context Window: 16K tokens
- Training: 75K synthetic instructions from open-source code
Best For: Diverse, realistic code generation with reduced training bias.
SQL-Specialized Models
For database work, these specialized models convert natural language to SQL.
SQLCoder (7B and 15B Parameters)
Fine-tuned on StarCoder specifically for SQL generation.
Key Specifications:
- Sizes: 7B (4.1GB), 15B (9.0GB)
- Context Window: 8K-32K tokens
Slightly outperforms GPT-3.5-turbo on natural language to SQL tasks.
Best For: Database querying, business intelligence, and SQL generation from natural language descriptions.
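Text-to-SQL models perform best when the table DDL is supplied alongside the question. A hypothetical prompt template illustrating the general pattern; the section headers here are illustrative, not SQLCoder's exact official template:

```python
def text_to_sql_prompt(question, schema_ddl):
    """Hypothetical template: pair the natural-language question with the
    relevant CREATE TABLE statements so the model can ground its SQL."""
    return (
        "### Task\n"
        f"Generate a SQL query to answer: {question}\n\n"
        "### Database Schema\n"
        f"{schema_ddl}\n\n"
        "### SQL\n"
    )

prompt = text_to_sql_prompt(
    "How many orders were placed in 2024?",
    "CREATE TABLE orders (id INT, placed_at DATE, total NUMERIC);",
)
```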
DuckDB-NSQL (7B Parameters)
Specialized for DuckDB SQL generation.
Key Specifications:
- Size: 7B (3.8GB)
- Context Window: 16K tokens
- Base: Llama-2 7B with SQL-specific training
Best For: DuckDB-specific applications, analytics workloads, and data engineering tasks.
Vision-Language Models
These models combine text and image understanding.
LLaVA (7B to 34B Parameters)
Large Language and Vision Assistant, combining vision encoders with language models.
Key Specifications:
- Sizes: 7B (4.7GB), 13B (8.0GB), 34B (20GB)
- Context Window: 4K-32K tokens
- Capabilities: Visual reasoning, OCR, image captioning
- Downloads: 12.3 million
Version 1.6 supports up to 672x672 resolution with improved OCR and visual reasoning.
Best For: General visual understanding, document analysis, and multimodal conversations.
LLaVA-Llama3 (8B Parameters)
LLaVA fine-tuned from Llama 3 Instruct with improved benchmark scores.
Key Specifications:
- Size: 8B (5.5GB)
- Context Window: 8K tokens
- Downloads: 2.1 million
Best For: Users who want LLaVA capabilities with Llama 3's improved language understanding.
BakLLaVA (7B Parameters)
Mistral 7B augmented with LLaVA architecture.
Key Specifications:
- Size: 7B (4.7GB)
- Context Window: 32K tokens
- Downloads: 373K
Best For: Visual understanding with Mistral's efficient architecture.
MiniCPM-V (8B Parameters)
Efficient multimodal model from OpenBMB.
Key Specifications:
- Size: 8B (5.5GB)
- Context Window: 32K tokens
- Resolution: Up to 1.8 million pixels
- Languages: English, Chinese, German, French, Italian, Korean
Achieves 65.2 on OpenCompass, surpassing GPT-4o mini and Claude 3.5 Sonnet on single-image understanding.
Best For: High-resolution image analysis, OCR-heavy applications, and efficient multimodal deployment.
Moondream (1.8B Parameters)
Tiny vision-language model designed for edge deployment.
Key Specifications:
- Size: 1.8B (1.7GB)
- Context Window: 2K tokens
- Downloads: 472K
Best For: Edge devices, mobile applications, and situations requiring vision capabilities with minimal resources.
Embedding Models
Embedding models convert text to numerical vectors for semantic search and retrieval.
nomic-embed-text
High-performance text embedding that surpasses OpenAI's ada-002 and text-embedding-3-small.
Key Specifications:
- Size: 274MB
- Context Window: 2K tokens
- Downloads: 48.7 million
Best For: Semantic search, similarity matching, and RAG applications.
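A minimal sketch of how an embedding model plugs into semantic search, assuming a local Ollama server exposing the /api/embeddings endpoint: embed two texts, then compare them with cosine similarity.

```python
import json
import math
import urllib.request

def embed(text, model="nomic-embed-text"):
    """Fetch an embedding vector from a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```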
mxbai-embed-large
State-of-the-art large embedding model from mixedbread.ai.
Key Specifications:
- Size: 335M (670MB)
- Context Window: 512 tokens
- Downloads: 6 million
Achieves top performance among BERT-large models on the MTEB benchmark, outperforming OpenAI's commercial embedding models.
Best For: High-accuracy embedding applications where quality matters more than speed.
BGE-M3
Versatile multilingual embedding model from BAAI.
Key Specifications:
- Size: 567M (1.2GB)
- Context Window: 8K tokens
- Languages: 100+ languages
- Capabilities: Dense, multi-vector, and sparse retrieval
- Downloads: 3 million
Best For: Multilingual retrieval, cross-lingual search, and applications requiring variable text lengths.
all-minilm
Lightweight embedding model for resource-constrained environments.
Key Specifications:
- Sizes: 46MB and 67MB variants
- Context Window: 512 tokens
- Downloads: 2.1 million
Best For: Quick prototyping, edge deployment, and applications where embedding model size matters.
Snowflake Arctic Embed
Retrieval-optimized embeddings from Snowflake.
Key Specifications:
- Sizes: 22M (46MB), 33M (67MB), 110M (219MB), 137M (274MB), 335M (669MB)
- Context Window: 512-2K tokens
Best For: Retrieval-focused applications, search systems, and production RAG pipelines.
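Once documents are embedded, the retrieval step of a RAG pipeline reduces to ranking stored vectors by similarity to the query vector. A minimal, dependency-free sketch:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, docs, k=3):
    """Rank (text, vector) pairs by similarity to the query vector and
    return the k best-matching texts, the retrieval step of a RAG pipeline."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

In production you would swap the linear scan for a vector index, but the ranking logic stays the same.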
Enterprise and Specialized Models
Command R (35B Parameters)
Cohere's model optimized for RAG and tool integration.
Key Specifications:
- Size: 35B (19GB)
- Context Window: 128K tokens
- Languages: 10+ languages
Best For: Enterprise RAG applications, tool-using agents, and production chatbots requiring high throughput.
Aya (8B and 35B Parameters)
Cohere's multilingual model supporting 23 languages.
Key Specifications:
- Sizes: 8B (4.8GB), 35B (20GB)
- Context Window: 8K tokens
- Downloads: 213K
Best For: Multilingual applications requiring strong cross-lingual performance.
Solar (10.7B Parameters)
Upstage's efficient model using Depth Up-Scaling.
Key Specifications:
- Size: 10.7B (6.1GB)
- Context Window: 4K tokens
- Base: Llama 2 architecture with Mistral weights
Outperforms models up to 30B parameters including Mixtral 8x7B on H6 benchmarks.
Best For: Single-turn conversations and applications where efficiency matters.
Nemotron (70B Parameters)
NVIDIA-customized Llama 3.1 for enhanced response quality.
Key Specifications:
- Size: 70B (43GB)
- Context Window: 128K tokens
- Training: RLHF with REINFORCE algorithm
Best For: Enterprise applications requiring NVIDIA ecosystem integration and high-quality responses.
InternLM2 (1.8B to 20B Parameters)
Shanghai AI Lab's model with outstanding reasoning capability.
Key Specifications:
- Sizes: 1.8B (1.1GB), 7B (4.5GB), 20B (11GB)
- Context Window: 32K-256K tokens
Best For: Mathematical reasoning, tool utilization, and web browsing applications.
Yi (6B to 34B Parameters)
01.ai's bilingual English-Chinese models.
Key Specifications:
- Sizes: 6B (3.5GB), 9B (5.0GB), 34B (19GB)
- Context Window: 4K tokens
- Training: 3 trillion tokens
Best For: English-Chinese bilingual applications and research.
Community and Fine-Tuned Models
OpenHermes (7B Parameters)
Teknium's fine-tune on Mistral using open datasets.
Key Specifications:
- Size: 7B (4.1GB)
- Context Window: 32K tokens
- Training: 900K instructions
- Downloads: Significant community adoption
Matches larger 70B models on certain benchmarks.
Best For: Multi-turn conversations, coding tasks, and applications requiring strong instruction-following.
Dolphin-Mixtral (8x7B and 8x22B)
Uncensored fine-tune of Mixtral optimized for coding.
Key Specifications:
- Sizes: 8x7B (26GB), 8x22B (80GB)
- Context Window: 32K-64K tokens
- Downloads: 799K
Best For: Uncensored coding assistance and creative applications.
Zephyr (7B and 141B Parameters)
HuggingFace's helpful assistant models.
Key Specifications:
- Sizes: 7B (4.1GB), 141B (80GB)
- Context Window: 32K-64K tokens
- Downloads: 338K
Best For: Helpful, conversational applications prioritizing user assistance.
OpenChat (7B Parameters)
C-RLFT trained model that surpasses ChatGPT on various benchmarks.
Key Specifications:
- Size: 7B (4.1GB)
- Context Window: 8K tokens
- Downloads: 253K
Best For: Chat applications requiring strong open-source performance.
Nous-Hermes 2 (10.7B and 34B Parameters)
Nous Research's scientific and coding-focused models.
Key Specifications:
- Sizes: 10.7B (6.1GB), 34B (19GB)
- Context Window: 4K tokens
- Downloads: 196K
Best For: Scientific discussion, coding tasks, and research applications.
Samantha-Mistral (7B Parameters)
Eric Hartford's companion assistant trained on philosophy and psychology.
Key Specifications:
- Size: 7B (4.1GB)
- Context Window: 32K tokens
- Downloads: 159K
Best For: Conversational AI emphasizing personal development and relationship coaching.
Vicuna (7B to 33B Parameters)
LMSYS's chat assistant trained on ShareGPT conversations.
Key Specifications:
- Sizes: 7B (3.8GB), 13B (7.4GB), 33B (18GB)
- Context Window: 2K-16K tokens
Best For: General chat applications and fine-tuning experiments.
Orca-Mini (3B to 70B Parameters)
Llama-based models trained using Orca methodology.
Key Specifications:
- Sizes: 3B (2.0GB), 7B (3.8GB), 13B (7.4GB), 70B (39GB)
- Context Window: Various
Best For: Entry-level hardware deployments and learning complex reasoning patterns.
Neural Chat (7B Parameters)
Intel's Mistral-based model for high-performance chatbots.
Key Specifications:
- Size: 7B (4.1GB)
- Context Window: 32K tokens
- Downloads: 198K
Best For: Chatbot applications optimized for Intel hardware.
TinyLlama (1.1B Parameters)
Compact Llama trained on 3 trillion tokens.
Key Specifications:
- Size: 1.1B (638MB)
- Context Window: 2K tokens
- Downloads: 3.2 million
Best For: Extremely constrained environments and minimal footprint deployments.
EverythingLM (13B Parameters)
Uncensored Llama 2 with 16K context.
Key Specifications:
- Size: 13B (7.4GB)
- Context Window: 16K tokens
- Downloads: 91K
Best For: Extended context applications without content restrictions.
Notux (8x7B Parameters)
Optimized Mixtral variant.
Key Specifications:
- Size: 8x7B (26GB)
- Context Window: 32K tokens
Best For: Users wanting improved Mixtral performance through fine-tuning.
XWinLM (7B and 13B Parameters)
Llama 2-based model with competitive benchmark performance.
Key Specifications:
- Sizes: 7B (3.8GB), 13B (7.4GB)
- Context Window: 4K tokens
- Downloads: 143K
Best For: General chat and alternative to base Llama 2.
Domain-Specific Models
Meditron (7B and 70B Parameters)
Medical-specialized model from EPFL.
Key Specifications:
- Sizes: 7B (3.8GB), 70B (39GB)
- Context Window: 2K-4K tokens
Outperforms Llama 2, GPT-3.5, and Flan-PaLM on many medical reasoning tasks.
Best For: Medical question answering, differential diagnosis support, and health information (with appropriate clinical oversight).
MedLlama2 (7B Parameters)
Llama 2 fine-tuned on MedQA dataset.
Key Specifications:
- Size: 7B (3.8GB)
- Context Window: 4K tokens
- Downloads: 114K
Best For: Medical question-answering and research (not for clinical use).
Wizard-Math (7B to 70B Parameters)
Mathematical reasoning specialist.
Key Specifications:
- Sizes: 7B (4.1GB), 13B (7.4GB), 70B (39GB)
- Context Window: 2K-32K tokens
- Downloads: 164K
Best For: Mathematical problem-solving, tutoring applications, and computational reasoning.
FunctionGemma (270M Parameters)
Google's Gemma 3 variant fine-tuned for function calling.
Key Specifications:
- Size: 270M
- Specialization: Tool and function calling
- Downloads: 13K
Best For: Agent development and applications requiring reliable function calling.
Multilingual Models
StableLM2 (1.6B and 12B Parameters)
Stability AI's multilingual model.
Key Specifications:
- Sizes: 1.6B (983MB), 12B (7.0GB)
- Context Window: 4K tokens
- Languages: English, Spanish, German, Italian, French, Portuguese, Dutch
- Downloads: 179K
Best For: Multilingual European applications with moderate resource requirements.
Falcon (7B to 180B Parameters)
Technology Innovation Institute's multilingual models.
Key Specifications:
- Sizes: 7B (4.2GB), 40B (24GB), 180B (101GB)
- Context Window: 2K tokens
The 180B variant performs between GPT-3.5 and GPT-4 levels.
Best For: High-capability multilingual applications and research.
Hardware Requirements Quick Reference
| Model Category | VRAM Needed | RAM Alternative | Best GPUs |
|---|---|---|---|
| 1-3B models | 4GB | 8GB | Any modern GPU |
| 7-8B models | 8GB | 16GB | RTX 3060, RTX 4060 |
| 13-14B models | 12GB | 24GB | RTX 3060 12GB, RTX 4070 |
| 32-34B models | 24GB | 48GB | RTX 4090, A6000 |
| 70B models | 48GB+ | 64GB+ | Multiple GPUs, Apple Silicon |
| 100B+ models | Specialized | 128GB+ | Enterprise infrastructure |
Apple Silicon Recommendations:
- M1/M2 (16GB): 7-8B models comfortably
- M2 Pro/M3 Pro (32GB): Up to 32B models; 70B only with aggressive quantization and slow speeds
- M3 Max (128GB): 70B models at usable speeds
How to Choose the Right Model
For General Chat and Assistance
- Budget hardware: Llama 3.2 3B, Phi-3 Mini, Gemma 2 2B
- Standard hardware: Llama 3.1 8B, Mistral 7B, Gemma 3 12B
- High-end hardware: Llama 3.3 70B, Qwen3 32B
For Coding and Development
- Quick completions: Stable Code 3B, CodeGemma 2B
- General coding: Qwen2.5-Coder 7B, DeepSeek-Coder 6.7B
- Maximum quality: Qwen2.5-Coder 32B, DeepSeek-Coder-V2 16B
For Reasoning and Analysis
- Efficient reasoning: Phi-4-Reasoning, DeepSeek-R1 14B
- Maximum capability: DeepSeek-R1 70B, Qwen3 32B
For Image Understanding
- Lightweight: Moondream, LLaVA 7B
- Balanced: MiniCPM-V, Gemma 3 12B
- Maximum capability: Llama 3.2-Vision 90B, Qwen3-VL
For Multilingual Applications
- European languages: Mixtral 8x7B, StableLM2
- Asian languages: Qwen3, Yi
- 100+ languages: BGE-M3, Qwen2
For RAG and Search
- Standard embedding: nomic-embed-text, all-minilm
- High-quality embedding: mxbai-embed-large, BGE-M3
- RAG systems: Command R with your embedding choice
Getting Started with Ollama
Installing Ollama and running your first model takes just a few minutes:
- Install Ollama: Download from ollama.com for Windows, Mac, or Linux
- Pull a model: ollama pull llama3.1
- Start chatting: ollama run llama3.1
For integration with applications, Ollama provides a REST API at localhost:11434:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is the sky blue?"
}'
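The same call from Python, using only the standard library. Setting "stream" to false asks the server for a single JSON object instead of the default stream of JSON lines:

```python
import json
import urllib.request

def build_payload(prompt, model="llama3.1"):
    """Request body for /api/generate; stream=False yields one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.1", host="http://localhost:11434"):
    """POST a prompt to a local Ollama server and return the response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```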
Using Local AI on Practical Web Tools
If you want to experience local AI without any setup, try our AI Chat feature. It connects to your local Ollama installation, providing a polished interface while keeping all processing on your machine. Your prompts never touch our servers, maintaining complete privacy.
The interface works with any Ollama model. Simply select your preferred model and start chatting. Combined with our privacy-focused file conversion tools, you can build complete local workflows without sending sensitive data to the cloud.
Frequently Asked Questions
What is the best Ollama model for beginners?
Start with Llama 3.1 8B. It runs on most hardware (8GB VRAM or 16GB RAM), provides excellent quality across diverse tasks, and has the largest community support. Once comfortable, explore specialized models based on your specific needs.
How much VRAM do I need for Ollama?
For 7-8B models, 8GB VRAM is sufficient. For 13-14B models, aim for 12GB. For 32B+ models, you need 24GB or more. Alternatively, models can run in system RAM at reduced speed, roughly doubling the memory requirement.
What is the fastest Ollama model?
The fastest capable models are Llama 3.2 1B and Phi-3 Mini, generating 100+ tokens per second on modest hardware. For usable quality, Llama 3.1 8B at 40-70 tokens per second on modern GPUs offers the best speed/quality balance.
Which Ollama model is best for coding?
Qwen2.5-Coder 32B offers the best quality, matching GPT-4o on code repair benchmarks. For smaller hardware, Qwen2.5-Coder 7B or DeepSeek-Coder 6.7B provide excellent results. StarCoder2 15B offers transparency about training data.
Can Ollama models process images?
Yes. Llama 3.2-Vision, LLaVA, MiniCPM-V, BakLLaVA, Moondream, and Gemma 3 (4B+) all process images. MiniCPM-V and LLaVA 1.6 offer the best image understanding for their size.
What is the difference between quantization levels?
Q4 uses 4 bits per parameter, reducing model size by 75% with minimal quality loss. Q8 uses 8 bits for higher quality but more memory. Q2-Q3 saves more memory but noticeably degrades quality. For most uses, Q4_K_M is the sweet spot.
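The memory arithmetic behind those levels is simple: the weight footprint is roughly parameters times bits per weight, divided by 8. A quick sketch (weight-only; real runtime adds KV cache and overhead):

```python
def weight_size_gb(params_billions, bits_per_weight):
    """Rough weight-only footprint in GB: params x bits / 8.
    Ignores KV cache, embeddings, and runtime overhead, which add
    a few more GB in practice."""
    return params_billions * bits_per_weight / 8

# An 8B model: ~16 GB at FP16, ~8 GB at Q8, ~4 GB at Q4.
fp16 = weight_size_gb(8, 16)  # 16.0
q8 = weight_size_gb(8, 8)     # 8.0
q4 = weight_size_gb(8, 4)     # 4.0
```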
How do I choose between Llama, Mistral, and Qwen?
Llama has the largest ecosystem and broadest support. Mistral offers excellent efficiency and European language performance. Qwen excels at multilingual tasks (especially Asian languages) and provides strong coding variants. Try each for your specific task.
Are these models safe to use?
Most models include safety training. However, "uncensored" variants (Llama 2 Uncensored, Dolphin-Mixtral) have guardrails removed and should be used responsibly. Always implement appropriate safeguards for production applications.
How do Ollama models compare to ChatGPT?
Llama 3.1 70B and DeepSeek-R1 70B approach GPT-4 quality for many tasks. For everyday use, Llama 3.1 8B competes with GPT-3.5. The gap has narrowed significantly, though frontier models still lead on the most complex reasoning.
Can I fine-tune Ollama models?
Ollama itself runs pre-existing models. For fine-tuning, use the base models from HuggingFace with tools like Axolotl or PEFT, then import the fine-tuned weights into Ollama.
Conclusion
The Ollama model library offers something for every use case, from tiny 270M parameter edge models to massive 671B reasoning systems. The key is matching model capabilities to your actual needs rather than always choosing the largest option.
For most users, starting with Llama 3.1 8B provides an excellent foundation. As you identify specific needs, whether coding, reasoning, multilingual support, or image understanding, explore the specialized models in those categories.
Local AI has reached a maturity where quality rivals cloud APIs for many tasks, while offering complete privacy, no per-request costs, and offline capability. With Ollama making deployment trivial, the only barrier is choosing your first model.
Start experimenting today with our AI Chat feature, which connects seamlessly to your local Ollama installation for a polished, private AI experience.
Model information current as of December 2025. Download counts and specifications updated regularly by Ollama. Always check ollama.com/library for the latest models and versions.