RunLocal

Tool catalog

The software you need to actually run an AI model.

A model is just a file. To use it, you need software that loads the file and lets you talk to it. The tools below cover everything from one-click chat apps for beginners to industrial servers for teams. Pick by category: a Runtime is the engine, a GUI is the friendly app on top, a Server shares one machine's models with many users at once, an Orchestrator runs those servers across a fleet of machines, and a Framework wires models into your own application code. If you are just starting, Ollama is the easiest entry point.

Runtime (2 entries)

Ollama

Runtime

MIT

The fastest way to get a local LLM running with one command.

macOS, Linux, Windows

Strengths

  • One-line install, one-line model pulls
  • Built-in OpenAI-compatible API on localhost:11434 (see the sketch after this entry)
  • Active model library with 4,500+ tagged variants

Trade-offs

  • Less raw throughput than vLLM under heavy concurrent load
  • Configuration is opinionated; advanced tuning means dropping into llama.cpp anyway
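
A minimal sketch of talking to that API from Python with the standard OpenAI client, assuming Ollama is running and a model has been pulled (the model name here is illustrative):

    from openai import OpenAI

    # Ollama exposes an OpenAI-compatible API under /v1; the api_key
    # only needs to be non-empty, its value is ignored.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    resp = client.chat.completions.create(
        model="llama3.2",  # illustrative; any pulled model works
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)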

llama.cpp

Runtime

MIT

Maximum control and the broadest hardware coverage in the open ecosystem.

macOS, Linux, Windows, Android, iOS

Strengths

  • Runs almost anywhere: CUDA, ROCm, Metal, Vulkan, CPU-only
  • Tight GGUF quantization control
  • Reference implementation behind most desktop LLM tools

Trade-offs

  • Command-line first; the UX assumes you read READMEs
  • Quantization options multiply quickly; it is easy to pick the wrong one
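
For scripting rather than driving the CLI, the llama-cpp-python bindings wrap the same engine. A minimal sketch, assuming a GGUF file is already on disk (the path is a placeholder):

    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads every layer to the GPU when a supported
    # backend (CUDA, ROCm, Metal, Vulkan) was compiled in; 0 stays on CPU.
    llm = Llama(model_path="./models/model-q4_k_m.gguf", n_gpu_layers=-1)

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What is a GGUF file?"}],
        max_tokens=64,
    )
    print(out["choices"][0]["message"]["content"])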

GUI (5 entries)

LM Studio

GUI

Proprietary

Browsing, comparing and chatting with local models in a desktop GUI.

macOS, Linux, Windows

Strengths

  • Polished chat UI with side-by-side model comparison
  • Built-in Hugging Face model browser
  • Local OpenAI-compatible API server with one click (sketch after this entry)

Trade-offs

  • Closed source; the engine is llama.cpp but the shell is not
  • Less scriptable than CLI-first tools
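
Because the built-in server speaks the OpenAI protocol, any OpenAI client can drive it. A minimal sketch that lists the models the app has available, assuming the server is switched on (port 1234 is the usual default, but check the app):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    # Enumerate the models LM Studio is currently serving.
    for model in client.models.list().data:
        print(model.id)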

Open WebUI

GUI

MIT

A multi-user web frontend that talks to Ollama or any OpenAI-compatible backend.

Web (Docker), self-hosted

Strengths

  • Multi-user with authentication and chat history
  • Tool calling, RAG and prompt templates out of the box
  • Drop-in replacement for the ChatGPT web UI inside your network

Trade-offs

  • Needs Docker or Python plus a separate inference backend
  • Feature breadth means a steeper config surface

Jan

GUI

AGPL-3.0

An open source desktop alternative to LM Studio.

macOS, Linux, Windows

Strengths

  • Fully open source desktop client
  • Local-first design, no required cloud account
  • Plugin system for extensions

Trade-offs

  • Model catalog is smaller than LM Studio's
  • Newer project; some rough edges on Windows

GPT4All

GUI

MIT

A friendly desktop client aimed at non-technical users.

macOS, Linux, Windows

Strengths

  • Lowest barrier to entry of any desktop LLM client
  • Local document chat (RAG) built in
  • Cross-platform installers

Trade-offs

  • Less raw control than llama.cpp
  • Performance depends on the bundled engine version
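
GPT4All also ships a Python SDK around the same engine, which suits scripting small local tasks. A minimal sketch (the model name is illustrative; the SDK downloads it into its cache on first use):

    from gpt4all import GPT4All

    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # illustrative

    # chat_session() keeps conversational context between generate() calls.
    with model.chat_session():
        print(model.generate("Why run a model locally?", max_tokens=128))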

text-generation-webui

GUI

AGPL-3.0

For power users who want every knob exposed.

Linux, Windows, macOS

Strengths

  • Supports multiple backends (Transformers, llama.cpp, ExLlamaV2)
  • Detailed sampler controls
  • Extension ecosystem for RAG, characters, voice

Trade-offs

  • Setup can be fiddly across CUDA versions
  • UI density is intimidating for newcomers
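
The web UI can also expose an OpenAI-compatible endpoint. A minimal sketch, assuming it was launched with the --api flag (port 5000 is the usual default) and a model is already loaded in the UI:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:5000/v1", api_key="none")

    resp = client.chat.completions.create(
        model="loaded",   # largely ignored; whatever is loaded in the UI answers
        messages=[{"role": "user", "content": "Explain top-p sampling briefly."}],
        temperature=0.7,  # sampler knobs pass through to the backend
        top_p=0.9,
    )
    print(resp.choices[0].message.content)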

Server (3 entries)

vLLM

Server

Apache-2.0

Production-grade inference with concurrent users and high throughput targets.

Linux (CUDA, ROCm)

Strengths

  • PagedAttention for memory-efficient KV cache
  • Continuous batching and speculative decoding
  • An order of magnitude more throughput than Ollama under heavy concurrency

Trade-offs

  • GPU-only path; not aimed at single-user desktops
  • Operational complexity is real; budget for tuning
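
vLLM runs either as an OpenAI-compatible server (vllm serve) or straight from Python for offline batch inference. A minimal sketch of the latter, assuming a GPU with enough VRAM (the model id is illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # illustrative HF model id
    params = SamplingParams(temperature=0.7, max_tokens=64)

    # generate() batches internally; pass as many prompts as you like.
    outputs = llm.generate(["Summarize PagedAttention in one line."], params)
    print(outputs[0].outputs[0].text)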

LocalAI

Server

MIT

A self-hosted drop-in for the OpenAI API, with multi-model support.

Linux, macOS, Windows (Docker)

Strengths

  • OpenAI API compatibility across chat, embeddings, images, audio
  • Pluggable backends including llama.cpp, whisper.cpp, diffusers
  • Designed for Docker and Kubernetes deployments

Trade-offs

  • Configuration sprawls quickly as you add modalities
  • Performance depends heavily on the underlying backend you pick
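
Since the whole point is OpenAI compatibility, the standard client covers every modality LocalAI serves. A minimal sketch of an embeddings call, assuming LocalAI is on its default port 8080 and a model is configured under that name (the name comes from your own config and is illustrative here):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

    # Model names in LocalAI are whatever your config or gallery defines.
    emb = client.embeddings.create(model="text-embedding-ada-002",
                                   input="local inference")
    print(len(emb.data[0].embedding))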

Hugging Face Text Generation Inference

Server

Apache-2.0

A production server that pairs naturally with the Hugging Face Hub.

Linux (CUDA, ROCm)

Strengths

  • Tensor parallelism, continuous batching, quantization
  • First-class integration with HF Hub models
  • OpenAI-compatible endpoint

Trade-offs

  • Throughput sometimes lags behind vLLM on the same hardware
  • Less community plugin work than vLLM
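
A minimal sketch of querying a running TGI instance from Python via huggingface_hub, assuming the container is mapped to localhost:8080:

    from huggingface_hub import InferenceClient

    # Point the client at the local endpoint instead of the Hub.
    client = InferenceClient("http://localhost:8080")

    print(client.text_generation(
        "What does continuous batching buy you?",
        max_new_tokens=64,
    ))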

Orchestrator (1 entry)

Kubernetes + Kubeflow

Orchestrator

Apache-2.0

Operating inference at scale across a fleet of GPUs.

Linux (any cluster)

Strengths

  • Mature operator pattern for batch and online inference
  • Pairs well with vLLM and Triton
  • Strong story for multi-tenant workloads

Trade-offs

  • Heavyweight; only worth it past a certain scale
  • Ops cost is non-trivial

Framework (2 entries)

LangChain

Framework

MIT

Wiring models, tools, retrieval and memory into application logic.

Python, JavaScript

Strengths

  • Large community and integrations catalog
  • Useful patterns for agents, RAG, multi-step chains
  • Pairs with most local runtimes via OpenAI-compatible APIs

Trade-offs

  • Surface area is huge and not always cohesive
  • Abstraction overhead is sometimes more cost than value
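
A minimal sketch of a small chain pointed at a local OpenAI-compatible endpoint (Ollama's here, but any runtime or server above would do; the model name is illustrative):

    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate

    llm = ChatOpenAI(base_url="http://localhost:11434/v1",
                     api_key="ollama", model="llama3.2")

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer in exactly one sentence."),
        ("user", "{question}"),
    ])

    # The | operator composes prompt -> model into a runnable chain.
    chain = prompt | llm
    print(chain.invoke({"question": "What is a vector store?"}).content)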

LlamaIndex

Framework

MIT

Building retrieval pipelines and document-grounded chatbots.

Python, TypeScript

Strengths

  • Strong primitives for indexing and retrieval
  • Many connectors to data sources
  • Works against any OpenAI-compatible local endpoint

Trade-offs

  • Frequent renaming and module reshuffling between releases
  • Some abstractions feel premature
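
A minimal sketch of its core loop over a local folder of documents. Note that out of the box LlamaIndex calls OpenAI for embeddings and generation; to stay fully local, point Settings.llm and Settings.embed_model at one of the runtimes above first:

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Reads every parseable file under ./docs (path is a placeholder).
    docs = SimpleDirectoryReader("./docs").load_data()

    # Chunks, embeds and holds the index in memory.
    index = VectorStoreIndex.from_documents(docs)

    print(index.as_query_engine().query("What do these documents cover?"))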