RunLocal

Picker · Hardware-aware recommender

Which AI model can your computer actually run?

Different AI models need different amounts of memory: a small model fits on a phone, while a frontier-grade model needs a workstation. Tell us what hardware you have and what you want to do, and the tool will suggest the best options that will actually run on your machine. The form updates as you type; nothing is sent to a server.

How do I find my specs?

On a Mac: click the Apple menu → About This Mac. The number next to "Memory" is your unified memory. Pick "Apple Silicon" in the form below.

On Windows with an NVIDIA GPU: open Task Manager → Performance → GPU. The number next to "Dedicated GPU memory" is your VRAM. Pick "NVIDIA GPU" in the form below.

On Windows or Linux without a discrete GPU: pick "CPU only" and enter your system RAM. AI will run slowly, but it will run.
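If you prefer to read total RAM programmatically, here is a minimal Python sketch assuming a POSIX system (Linux or macOS); on Windows, use Task Manager as described above:

```python
import os

# POSIX-only sketch: total physical RAM, via sysconf values.
page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
num_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
ram_gb = page_size * num_pages / 1e9
print(f"System RAM: {ram_gb:.1f} GB")
```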

Not sure which terms apply? Open the glossary in a new tab.

Find it under Apple menu → About This Mac → Memory. Common configurations: 8, 16, 24, 32, 48, 64, 96, 128, 192 GB.

Available memory for the model: 26.0 GB

Computed from your specs minus a reasonable system overhead. Models that do not fit within this budget after a 15% safety margin are excluded from the recommendations.
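The arithmetic behind that figure can be sketched in a few lines of Python. The overhead values come from the "How the tool decides" section; the function names and the exact way the 15% margin is applied are assumptions, not the tool's actual code:

```python
# Illustrative sketch, not the tool's actual implementation.
OVERHEAD_GB = {"apple_silicon": 6.0, "nvidia_gpu": 2.0, "cpu_only": 4.0}
SAFETY_MARGIN = 0.15  # require 15% headroom over the model's estimate

def usable_memory_gb(total_gb: float, hardware: str) -> float:
    """Total memory minus the assumed system overhead."""
    return total_gb - OVERHEAD_GB[hardware]

def fits(model_gb: float, total_gb: float, hardware: str) -> bool:
    """True if the model fits with the safety margin applied."""
    return model_gb * (1 + SAFETY_MARGIN) <= usable_memory_gb(total_gb, hardware)

# A 32 GB Apple Silicon machine yields 26.0 GB of usable memory;
# a 20 GB model fits (23.0 GB with margin), a 42 GB model does not.
```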

Recommended models

6 options ranked by use-case fit and headroom.

#1 · Alibaba · China
Qwen 3.5 14B Instruct
Apache 2.0 · 2025
Score: 90/100
Quantization: Q8_0
Memory fit: 16.0 GB / 26.0 GB
Context: 128k tokens
Speed: Moderate (~20–50 tok/s)
Run: ollama run qwen3.5:14b-instruct-q8_0

#2 · Google · United States
Gemma 4 9B Instruct
Gemma Terms of Use · 2026
Score: 90/100
Quantization: Q8_0
Memory fit: 10.0 GB / 26.0 GB
Context: 128k tokens
Speed: Fast (~50+ tok/s for a single user)

Strong Apple Silicon performance via MLX. Long context (128k) makes it the better choice over Gemma 2 9B.

Run: ollama run gemma4:9b-instruct-q8_0

#3 · Google · United States
Gemma 4 27B Instruct
Gemma Terms of Use · 2026
Score: 89/100
Quantization: Q5_K_M
Memory fit: 20.0 GB / 26.0 GB
Context: 128k tokens
Speed: Moderate (~20–50 tok/s)

Excellent general-purpose model at workstation scale. 128k context, MLX-friendly on Apple Silicon.

Run: ollama run gemma4:27b-instruct-q5_k_m

#4 · Microsoft · United States
Phi-4 14B
MIT · 2025
Score: 85/100
Quantization: Q8_0
Memory fit: 16.0 GB / 26.0 GB
Context: 16k tokens
Speed: Moderate (~20–50 tok/s)

Strong reasoning per parameter, but weak at long context (trained at ~16k tokens).

Run: ollama run phi4:14b-q8_0

#5 · Alibaba · China
Qwen 3.5 32B Instruct
Apache 2.0 · 2025
Score: 84/100
Quantization: Q4_K_M
Memory fit: 20.0 GB / 26.0 GB
Context: 128k tokens
Speed: Moderate (~20–50 tok/s)
Run: ollama run qwen3.5:32b-instruct-q4_k_m

#6 · Microsoft · United States
Phi-4 3.8B Mini
MIT · 2025
Score: 80/100
Quantization: Q8_0
Memory fit: 5.5 GB / 26.0 GB
Context: 128k tokens
Speed: Fast (~50+ tok/s for a single user)
Run: ollama run phi4-mini:3.8b-q8_0

4 models excluded
  • Llama 3.3 70B Instruct: smallest quant (42 GB) exceeds available memory (26.0 GB).
  • Mistral Medium 3.5 (~70B class): smallest quant (42 GB) exceeds available memory (26.0 GB).
  • Llama 4 Scout (109B MoE, 17B active): smallest quant (65 GB) exceeds available memory (26.0 GB).
  • DeepSeek V4 Flash (MoE, distilled): smallest quant (54 GB) exceeds available memory (26.0 GB).

How the tool decides

Every model in the catalog is paired with realistic memory estimates per quantization (Q4_K_M, Q5_K_M, Q8_0) at a moderate 8k context.

The recommender computes your usable memory by subtracting a small system overhead: six gigabytes on Apple Silicon, two gigabytes on a discrete GPU, four gigabytes on CPU-only setups. It then requires the chosen model to fit with a fifteen percent safety margin. Anything that does not fit lands in the excluded list below the results, with the reason printed out.

The ranking weights use-case fit most heavily, then quantization quality, then recency of release, with a modest bonus for models that leave breathing room rather than filling memory to the brim.
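The ranking described above could look roughly like this. The weights, field names, and sample values are illustrative guesses, not the tool's actual parameters:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    fit: float      # use-case fit, 0..1 (weighted most heavily)
    quant: float    # quantization quality, 0..1 (Q8_0 > Q5_K_M > Q4_K_M)
    recency: float  # release recency, 0..1
    mem_gb: float   # estimated memory at the chosen quantization

def score(c: Candidate, available_gb: float) -> float:
    """Weighted score with a modest bonus for leftover headroom."""
    headroom = 1.0 - c.mem_gb / available_gb  # fraction of memory left free
    return 100 * (0.50 * c.fit + 0.25 * c.quant
                  + 0.15 * c.recency + 0.10 * headroom)

# Made-up inputs: at equal memory and quant, higher use-case fit ranks first.
models = [
    Candidate("Phi-4 14B", fit=0.85, quant=1.0, recency=0.8, mem_gb=16.0),
    Candidate("Qwen 3.5 14B", fit=0.95, quant=1.0, recency=0.8, mem_gb=16.0),
]
ranked = sorted(models, key=lambda c: score(c, 26.0), reverse=True)
```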

Memory estimates are rounded for clarity. Actual usage depends on context length, batch size, and which inference engine you run. If a model is on the edge of fitting, give it a try at a smaller context first.
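As a sanity check on any listed estimate, a common rule of thumb for GGUF quantizations is weight memory ≈ parameters × bits-per-weight ÷ 8, plus a few gigabytes for KV cache and runtime buffers. The bits-per-weight figures below are approximate community values, not the tool's internal numbers:

```python
# Approximate effective bits per weight for common GGUF quantizations.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def weight_gb(params_billion: float, quant: str) -> float:
    """Rough weight-only memory in GB: params x bits / 8."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

# A 14B model at Q8_0 works out to roughly 15 GB of weights alone,
# in line with a 16 GB all-in estimate once cache and buffers are added.
```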