RunLocal

Tool catalog

The software you need to actually run an AI model.

A model is just a file. To use it, you need software that loads the file and lets you talk to it. The tools below cover everything from one-click chat apps for beginners to industrial servers for teams. Pick by category: a Runtime is the engine, a GUI is the friendly app on top, a Server shares one machine's models with many users at once, an Orchestrator runs those servers across a fleet of machines, and a Framework wires models into your own application code. If you are just starting, Ollama is the easiest entry point.

Runtime (2 entries)

Ollama

Runtime

MIT

The fastest way to get a local LLM running with one command.

macOS, Linux, Windows

Strengths

  • One-line install, one-line model pulls
  • Built-in OpenAI-compatible API on localhost:11434 (see the sketch after this entry)
  • Active model library with 4,500+ tagged variants

Trade-offs

  • Less raw throughput than vLLM under heavy concurrent load
  • Configuration is opinionated; advanced tuning means dropping into llama.cpp anyway
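
A minimal sketch of talking to that API from Python with the standard OpenAI client, assuming Ollama is running and a model has been pulled (the model name here is illustrative):

    from openai import OpenAI

    # Ollama exposes an OpenAI-compatible API under /v1; the api_key
    # only needs to be non-empty, its value is ignored.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    resp = client.chat.completions.create(
        model="llama3.2",  # illustrative; any pulled model works
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)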

llama.cpp

Runtime

MIT

Maximum control and the broadest hardware coverage in the open ecosystem.

macOS, Linux, Windows, Android, iOS

Strengths

  • Runs almost anywhere: CUDA, ROCm, Metal, Vulkan, CPU-only
  • Tight GGUF quantization control
  • Reference implementation behind most desktop LLM tools

Trade-offs

  • Command-line first; the UX assumes you read READMEs
  • Quantization options multiply quickly; it is easy to pick the wrong one
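
For scripting rather than driving the CLI, the llama-cpp-python bindings wrap the same engine. A minimal sketch, assuming a GGUF file is already on disk (the path is a placeholder):

    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads every layer to the GPU when a supported
    # backend (CUDA, ROCm, Metal, Vulkan) was compiled in; 0 stays on CPU.
    llm = Llama(model_path="./models/model-q4_k_m.gguf", n_gpu_layers=-1)

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What is a GGUF file?"}],
        max_tokens=64,
    )
    print(out["choices"][0]["message"]["content"])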

GUI (5 entries)

LM Studio

GUI

Proprietary

Browsing, comparing and chatting with local models in a desktop GUI.

macOS, Linux, Windows

Strengths

  • Polished chat UI with side-by-side model comparison
  • Built-in Hugging Face model browser
  • Local OpenAI-compatible API server with one click (sketch after this entry)

Trade-offs

  • Closed source; the engine is llama.cpp but the shell is not
  • Less scriptable than CLI-first tools
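
Because the built-in server speaks the OpenAI protocol, any OpenAI client can drive it. A minimal sketch that lists the models the app has available, assuming the server is switched on (port 1234 is the usual default, but check the app):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    # Enumerate the models LM Studio is currently serving.
    for model in client.models.list().data:
        print(model.id)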

Open WebUI

GUI

MIT

A multi-user web frontend that talks to Ollama or any OpenAI-compatible backend.

Web (Docker), self-hosted

Strengths

  • Multi-user with authentication and chat history
  • Tool calling, RAG and prompt templates out of the box
  • Drop-in replacement for the ChatGPT web UI inside your network

Trade-offs

  • Needs Docker or Python plus a separate inference backend
  • Feature breadth means a steeper config surface

Jan

GUI

AGPL-3.0

An open source desktop alternative to LM Studio.

macOS, Linux, Windows

Strengths

  • Fully open source desktop client
  • Local-first design, no required cloud account
  • Plugin system for extensions

Trade-offs

  • Model catalog is smaller than LM Studio's
  • Newer project; some rough edges on Windows

GPT4All

GUI

MIT

A friendly desktop client aimed at non-technical users.

macOS, Linux, Windows

Strengths

  • Lowest barrier to entry of any desktop LLM client
  • Local document chat (RAG) built in
  • Cross-platform installers

Trade-offs

  • Less raw control than llama.cpp
  • Performance depends on the bundled engine version
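
GPT4All also ships a Python SDK around the same engine, which suits scripting small local tasks. A minimal sketch (the model name is illustrative; the SDK downloads it into its cache on first use):

    from gpt4all import GPT4All

    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # illustrative

    # chat_session() keeps conversational context between generate() calls.
    with model.chat_session():
        print(model.generate("Why run a model locally?", max_tokens=128))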

text-generation-webui

GUI

AGPL-3.0

For power users who want every knob exposed.

Linux, Windows, macOS

Strengths

  • Supports multiple backends (Transformers, llama.cpp, ExLlamaV2)
  • Detailed sampler controls
  • Extension ecosystem for RAG, characters, voice

Trade-offs

  • Setup can be fiddly across CUDA versions
  • UI density is intimidating for newcomers
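
The web UI can also expose an OpenAI-compatible endpoint. A minimal sketch, assuming it was launched with the --api flag (port 5000 is the usual default) and a model is already loaded in the UI:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:5000/v1", api_key="none")

    resp = client.chat.completions.create(
        model="loaded",   # largely ignored; whatever is loaded in the UI answers
        messages=[{"role": "user", "content": "Explain top-p sampling briefly."}],
        temperature=0.7,  # sampler knobs pass through to the backend
        top_p=0.9,
    )
    print(resp.choices[0].message.content)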

Server (3 entries)

vLLM

Server

Apache-2.0

Production-grade inference with concurrent users and high throughput targets.

Linux (CUDA, ROCm)

Strengths

  • PagedAttention for memory-efficient KV cache
  • Continuous batching and speculative decoding
  • An order of magnitude more throughput than Ollama under heavy concurrency

Trade-offs

  • GPU-only path; not aimed at single-user desktops
  • Operational complexity is real; budget for tuning
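
vLLM runs either as an OpenAI-compatible server (vllm serve) or straight from Python for offline batch inference. A minimal sketch of the latter, assuming a GPU with enough VRAM (the model id is illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # illustrative HF model id
    params = SamplingParams(temperature=0.7, max_tokens=64)

    # generate() batches internally; pass as many prompts as you like.
    outputs = llm.generate(["Summarize PagedAttention in one line."], params)
    print(outputs[0].outputs[0].text)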

LocalAI

Server

MIT

A self-hosted drop-in for the OpenAI API, with multi-model support.

Linux, macOS, Windows (Docker)

Strengths

  • OpenAI API compatibility across chat, embeddings, images, audio
  • Pluggable backends including llama.cpp, whisper.cpp, diffusers
  • Designed for Docker and Kubernetes deployments

Trade-offs

  • Configuration sprawls quickly as you add modalities
  • Performance depends heavily on the underlying backend you pick
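
Since the whole point is OpenAI compatibility, the standard client covers every modality LocalAI serves. A minimal sketch of an embeddings call, assuming LocalAI is on its default port 8080 and a model is configured under that name (the name comes from your own config and is illustrative here):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

    # Model names in LocalAI are whatever your config or gallery defines.
    emb = client.embeddings.create(model="text-embedding-ada-002",
                                   input="local inference")
    print(len(emb.data[0].embedding))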

Hugging Face Text Generation Inference

Server

Apache-2.0

A production server that pairs naturally with the Hugging Face Hub.

Linux (CUDA, ROCm)

Strengths

  • Tensor parallelism, continuous batching, quantization
  • First-class integration with HF Hub models
  • OpenAI-compatible endpoint

Trade-offs

  • Throughput sometimes lags behind vLLM on the same hardware
  • Less community plugin work than vLLM
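
A minimal sketch of querying a running TGI instance from Python via huggingface_hub, assuming the container is mapped to localhost:8080:

    from huggingface_hub import InferenceClient

    # Point the client at the local endpoint instead of the Hub.
    client = InferenceClient("http://localhost:8080")

    print(client.text_generation(
        "What does continuous batching buy you?",
        max_new_tokens=64,
    ))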

Orchestrator (1 entry)

Kubernetes + Kubeflow

Orchestrator

Apache-2.0

Operating inference at scale across a fleet of GPUs.

Linux (any cluster)

Strengths

  • Mature operator pattern for batch and online inference
  • Pairs well with vLLM and Triton
  • Strong story for multi-tenant workloads

Trade-offs

  • Heavyweight; only worth it past a certain scale
  • Ops cost is non-trivial

Framework (2 entries)

LangChain

Framework

MIT

Wiring models, tools, retrieval and memory into application logic.

Python, JavaScript

Strengths

  • Large community and integrations catalog
  • Useful patterns for agents, RAG, multi-step chains
  • Pairs with most local runtimes via OpenAI-compatible APIs

Trade-offs

  • Surface area is huge and not always cohesive
  • Abstraction overhead is sometimes more cost than value
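
A minimal sketch of a small chain pointed at a local OpenAI-compatible endpoint (Ollama's here, but any runtime or server above would do; the model name is illustrative):

    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate

    llm = ChatOpenAI(base_url="http://localhost:11434/v1",
                     api_key="ollama", model="llama3.2")

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer in exactly one sentence."),
        ("user", "{question}"),
    ])

    # The | operator composes prompt -> model into a runnable chain.
    chain = prompt | llm
    print(chain.invoke({"question": "What is a vector store?"}).content)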

LlamaIndex

Framework

MIT

Building retrieval pipelines and document-grounded chatbots.

Python, TypeScript

Strengths

  • Strong primitives for indexing and retrieval
  • Many connectors to data sources
  • Works against any OpenAI-compatible local endpoint

Trade-offs

  • Frequent renaming and module reshuffling between releases
  • Some abstractions feel premature
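
A minimal sketch of its core loop over a local folder of documents. Note that out of the box LlamaIndex calls OpenAI for embeddings and generation; to stay fully local, point Settings.llm and Settings.embed_model at one of the runtimes above first:

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Reads every parseable file under ./docs (path is a placeholder).
    docs = SimpleDirectoryReader("./docs").load_data()

    # Chunks, embeds and holds the index in memory.
    index = VectorStoreIndex.from_documents(docs)

    print(index.as_query_engine().query("What do these documents cover?"))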