RunLocal

Hardware-aware · Open source · Updated weekly

Local AI: run open source models on your own computer.

The hub for local LLMs, install guides and a hardware picker.

You can run AI models like the ones behind ChatGPT or Claude directly on your laptop or desktop, without sending your data anywhere. RunLocal shows you which open source model to choose for your hardware, which free software to install (Ollama, LM Studio, llama.cpp), and how to get started in about ten minutes. No prior knowledge required.

New to this? Start with the glossary for plain-language definitions, or jump straight to the hardware picker to see what your computer can run.

Models worth your disk space

A short list, opinionated. Full catalog in the directory.

See all models →

Llama 4 (Scout & Maverick)

Meta AI · United States

2025
License: Llama Community License (custom)
Context: Up to 10M tokens (Scout)
Sizes: Scout 109B (MoE, 17B active) · Maverick 400B (MoE, ~17B active)
Long-context retrieval · Codebase-scale RAG · General reasoning

Mixture-of-experts architecture. Scout pushes context to 10M tokens; Maverick targets frontier-tier reasoning. The custom license restricts use above 700M monthly active users and forbids using outputs to train competing models.

Qwen 3.5

Alibaba (Qwen team) · China

2025
License: Apache 2.0 (small sizes); Qwen License (larger)
Context: Up to 1M tokens (long-context variant)
Sizes: 1.8B · 4B · 7B · 14B · 32B · 72B · MoE variants
Multilingual · Code · Cost-sensitive deployments

The most permissive frontier-class family for smaller sizes. Strong on Asian languages, competitive on English code and reasoning benchmarks.

DeepSeek V4 (Pro & Flash)

DeepSeek AI · China

2025
License: MIT (V4) / DeepSeek License (V4 Pro variants)
Context: 1M tokens
Sizes: 236B MoE · Distilled dense variants
Mathematical reasoning · Code generation · Long-context analysis

Topped open-source leaderboards on SWE-Bench and GPQA Diamond in early 2026. The MIT-licensed core variant is the most permissive of the top-tier open weights.

Mistral Medium 3.5

Mistral AI · France (EU)

2026
License: Apache 2.0 (open weight tier)
Context: 256k tokens
Sizes: ~70B class
EU-friendly deployments · Coding · Compliance-sensitive workloads

Released April 2026. The most credible non-Chinese, non-American option at frontier level. Mistral remains the leading European LLM lab by capability.

Snapshot · May 15, 2026

Trending on Hugging Face

Auto-curated from the Hugging Face Hub using a weighted mix of downloads, community likes and recency. Refreshes weekly via a GitHub Action. License tier is a quick visual hint, not legal advice.
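
For the curious, here is a minimal Python sketch of how such a weighted ranking can be computed with the huggingface_hub client. The weights, normalisation caps and 90-day recency window are illustrative placeholders, not the exact formula behind this snapshot.

  # Illustrative only: weights, caps and the recency window are placeholders,
  # not the exact formula behind this snapshot.
  from datetime import datetime, timezone
  from huggingface_hub import HfApi

  def trending_score(model, w_dl=0.5, w_likes=0.3, w_recency=0.2):
      """Blend downloads, likes and recency into a rough 0-1 score."""
      now = datetime.now(timezone.utc)
      downloads = model.downloads or 0
      likes = model.likes or 0
      updated = model.last_modified or now
      age_days = (now - updated).days
      # Normalise each signal into 0-1 before weighting.
      dl_term = min(downloads / 1_000_000, 1.0)
      like_term = min(likes / 5_000, 1.0)
      recency_term = max(0.0, 1.0 - age_days / 90)
      return w_dl * dl_term + w_likes * like_term + w_recency * recency_term

  api = HfApi()
  candidates = api.list_models(pipeline_tag="text-generation", sort="downloads",
                               limit=200, full=True)
  ranked = sorted(candidates, key=trending_score, reverse=True)
  for m in ranked[:10]:
      print(f"{m.id}  score={trending_score(m):.2f}")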

DeepSeek-V4-Pro

deepseek-ai

Permissive

MIT

Downloads: 2.8M
Likes: 4.0k
Updated: 9 days ago
Score: 0.89 · View on HF →

Kimi-K2-Instruct

moonshotai

Open weight

Custom license

Downloads: 826.7k
Likes: 2.4k
Updated: 22 days ago
Score: 0.83 · View on HF →

DeepSeek-V4-Flash

deepseek-ai

Permissive

MIT

Downloads: 1.6M
Likes: 1.1k
Updated: 9 days ago
Score: 0.83 · View on HF →

GLM-5.1

zai-org

Permissive

MIT

Downloads: 241.3k
Likes: 1.6k
Updated: 2 days ago
Score: 0.81 · View on HF →

MiniMax-M2.7

MiniMaxAI

Open weight

Custom license

Downloads: 635.6k
Likes: 1.1k
Updated: 25 days ago
Score: 0.79 · View on HF →

GLM-5

zai-org

Permissive

MIT

Downloads: 189.3k
Likes: 2.1k
Updated: 1 month ago
Score: 0.77 · View on HF →

DeepSeek-R1

deepseek-ai

Permissive

MIT

Downloads: 3.8M
Likes: 13.3k
Updated: 1 year ago
Score: 0.76 · View on HF →

MiniMax-M2.5

MiniMaxAI

Open weight

Custom license

Downloads: 780.4k
Likes: 1.5k
Updated: 2 months ago
Score: 0.76 · View on HF →

The tools that actually run them

Runtimes, GUIs and inference servers, with their real trade-offs.

See all tools →

Ollama

Runtime

MIT

The fastest way to get a local LLM running with one command.

macOS · Linux · Windows

Strengths

  • One-line install, one-line model pulls
  • Built-in OpenAI-compatible API on localhost:11434 (see the sketch below)
  • Active model library with 4,500+ tagged variants

Trade-offs

  • Less raw throughput than vLLM under heavy concurrent load
  • Configuration is opinionated; advanced tuning means dropping into llama.cpp anyway
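
A minimal sketch of talking to that local API from Python, assuming Ollama is running and you have already pulled a model ("llama3.2" below is only an example tag; substitute whatever you downloaded):

  # Assumes Ollama is running locally and a model has been pulled.
  # "llama3.2" is an example tag; swap in whatever you downloaded.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
      api_key="ollama",                      # required by the client, ignored by Ollama
  )

  reply = client.chat.completions.create(
      model="llama3.2",
      messages=[{"role": "user", "content": "Summarise these notes in two sentences: ..."}],
  )
  print(reply.choices[0].message.content)

The same snippet works against LM Studio's one-click server further down this page by pointing base_url at its local port instead.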

llama.cpp

Runtime

MIT

Maximum control and the broadest hardware coverage in the open ecosystem.

macOS · Linux · Windows · Android · iOS

Strengths

  • Runs almost anywhere: CUDA, ROCm, Metal, Vulkan, CPU-only
  • Tight GGUF quantization control (see the sketch below)
  • Reference implementation behind most desktop LLM tools

Trade-offs

  • Command-line first; the UX assumes you read READMEs
  • Quantization options multiply quickly; it is easy to pick the wrong one
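
llama.cpp itself is a C/C++ command-line project; the sketch below uses the separate llama-cpp-python bindings to load a quantized GGUF file. The model path is a placeholder for any GGUF file you have downloaded.

  # Uses the llama-cpp-python bindings (a separate package wrapping llama.cpp).
  # The model path is a placeholder; point it at any GGUF file you have downloaded.
  from llama_cpp import Llama

  llm = Llama(
      model_path="./models/example-7b.Q4_K_M.gguf",  # placeholder path
      n_ctx=8192,        # context window to allocate
      n_gpu_layers=-1,   # offload all layers to the GPU if one is available
  )

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "Explain GGUF quantization in one paragraph."}],
      max_tokens=200,
  )
  print(out["choices"][0]["message"]["content"])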

LM Studio

GUI

Proprietary

Browsing, comparing and chatting with local models in a desktop GUI.

macOS · Linux · Windows

Strengths

  • Polished chat UI with side-by-side model comparison
  • Built-in Hugging Face model browser
  • Local OpenAI-compatible API server with one click

Trade-offs

  • Closed source; the engine is llama.cpp but the shell is not
  • Less scriptable than CLI-first tools

vLLM

Server

Apache 2.0

Production-grade inference with concurrent users and high throughput targets.

Linux (CUDA, ROCm)

Strengths

  • PagedAttention for memory-efficient KV cache
  • Continuous batching and speculative decoding (see the sketch below)
  • An order of magnitude more throughput than Ollama under heavy concurrency

Trade-offs

  • GPU-only path; not aimed at single-user desktops
  • Operational complexity is real; budget for tuning
Visit project → · No install guide yet
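
A sketch of vLLM's offline batch API, assuming a supported CUDA or ROCm GPU; the model id below is a placeholder for any Hugging Face checkpoint that fits in your VRAM.

  # Assumes a supported GPU; the model id is a placeholder for any checkpoint
  # that fits in VRAM. Requests in the list are batched continuously by the engine.
  from vllm import LLM, SamplingParams

  prompts = [
      "Write a haiku about GPUs.",
      "Summarise what PagedAttention does in one sentence.",
  ]
  sampling = SamplingParams(temperature=0.7, max_tokens=128)

  llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model id
  for result in llm.generate(prompts, sampling):
      print(result.prompt, "->", result.outputs[0].text)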

Install guides

Latest writing

Why bother running AI locally?

The big cloud services are easier to start with. But there are real reasons to do it yourself. Three of them.

Your data stays with you

What you type and what the model answers never leave your computer. Handy when you are working with personal notes, client documents, internal code, or anything you would not paste into a public website.

It works even when the cloud does not

The model file lives on your disk. If the company that made it shuts down, raises prices, or simply changes its terms, your setup keeps working. The model you download today still runs in 2030 if your computer does.

No surprise bills

Cloud AI charges per use. Local AI costs you the price of your computer, plus electricity. After the first month, the marginal cost of an extra question is essentially zero.