Install guide · Beginner · 12 min

LM Studio setup and side-by-side model evaluation

LM Studio is the most polished desktop client for running open weights on your machine. It uses llama.cpp underneath, adds a model browser wired to Hugging Face, a chat UI, and a one-click OpenAI-compatible server. The shortest way to describe its niche: it is the tool you reach for when you want to evaluate three or four models against the same prompts before committing to one.

When LM Studio is the right tool

LM Studio earns its disk space when you are still deciding which model to run. The chat UI with side-by-side comparison makes it trivial to run the same prompt against several models and watch their answers appear in parallel. For one-shot “just run a chat against this model” workflows, Ollama is faster; for production serving with concurrent users, vLLM is a different category of tool. LM Studio sits in the middle, intentionally.

Step 1. Install

Download the installer from lmstudio.ai. Builds exist for macOS (Apple Silicon and Intel), Windows, and Linux (AppImage). The free tier covers personal use and most business scenarios; check the license page if you plan a large internal deployment.

Step 2. Pick a first model

Open the app, head to the Discover tab, and search Hugging Face from inside LM Studio. A few sensible defaults to start with:

Llama 3.1 8B Instruct for general chat on machines with at least 16 GB of memory.
Qwen 2.5 14B Instruct if you have 24 GB or more and want noticeably better reasoning.
Phi-5 7B for fast inference on lighter hardware.

For each model, LM Studio lists the available GGUF quantizations from community uploaders. Look for files tagged Q4_K_M as a starting point and check the “Compatible with your hardware” indicator before downloading.

Step 3. Configure the chat

Open the Chat tab and load your downloaded model. Three knobs to know:

GPU offload layers. Setting it as high as your VRAM allows is almost always right. LM Studio shows an estimate of how much memory each setting will use.
Context length. Larger contexts use more memory. Start at the model's training length, lower it if you are tight on VRAM.
System prompt. Leave it empty unless you have a reason to constrain the model; many UI templates apply their own system prompts that interact badly with custom ones.

Step 4. Run side-by-side comparisons

The Multi-Model session lets you load two or three models and send the same prompt to all of them. This is where LM Studio earns its keep. Set up a prompt set that represents the kind of work you actually do (a coding task, a summarization, a reasoning question), then watch the answers stream in parallel. Decisions made this way tend to hold up better than benchmark numbers from leaderboards.

A useful evaluation kit, kept small on purpose:

One factual question that has a wrong-sounding correct answer.
One short coding task with a tricky edge case.
One summarization of a passage longer than 1,500 words.
One follow-up question that tests whether the model retained the prior turn.
One refusal-test prompt to see how each model handles boundaries.

Step 5. Start the local server

Head to the Developer tab, load a model, and click Start Server. LM Studio exposes an OpenAI-compatible API on http://localhost:1234/v1. Any client that lets you override the API base URL works against it. Toggle CORS in the server settings if you plan to call it from a browser.

# Test it from the terminal
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "loaded-model-id",
    "messages": [{"role": "user", "content": "Hello in 5 words."}]
  }'

Tips that save time

Store models on the largest drive you have. LM Studio has a setting for the model directory; pointing it at an external SSD avoids filling the boot disk.
Use the “Estimate” column in the model browser. The numbers are usually accurate within 10% on Apple Silicon and within 15% on NVIDIA.
For coding, set the chat template explicitly. LM Studio auto-detects it most of the time, but a wrong template silently degrades quality and is hard to debug.

When to graduate from LM Studio

Two natural exits. If you settle on a single model and want it permanently available with a small footprint, move it to Ollama. If you want maximum speed on Apple Silicon or fine quantization control, build llama.cpp from source. LM Studio is a good evaluation environment; it is not the best long-term home for either single-user productivity or multi-user serving.