Picker · Hardware-aware recommender
Different AI models need different amounts of memory: a small model fits on a phone, while a frontier-grade model needs a workstation. Tell us what hardware you have and what you want to do, and the tool will suggest the best options that will actually run on your machine. The form updates as you type. Nothing is sent to a server.
On a Mac: click the Apple menu → About This Mac. The number next to "Memory" is your unified memory. Pick "Apple Silicon" in the form below.
On Windows with an NVIDIA GPU: open Task Manager → Performance → GPU. The number next to "Dedicated GPU memory" is your VRAM. Pick "NVIDIA GPU" in the form below.
On Windows or Linux without a discrete GPU: pick "CPU only" and enter your system RAM. AI will run slowly, but it will run.
Not sure which terms apply? Open the glossary in a new tab.
Find it under Apple menu → About This Mac → Memory. Common configurations: 8, 16, 24, 32, 48, 64, 96, 128, 192 GB.
Computed from your specs minus a reasonable system overhead. Models that do not fit within this usable memory with a 15% safety margin to spare are excluded from the recommendations.
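A minimal sketch of that fit check, using the overheads described further down the page (six gigabytes on Apple Silicon, two on a discrete GPU, four on CPU-only) and the 15% margin; the names and the exact margin interpretation are illustrative assumptions, not the tool's actual source:

type Platform = "apple-silicon" | "nvidia-gpu" | "cpu-only";

// Platform overheads as described on this page; purely illustrative code.
const OVERHEAD_GB: Record<Platform, number> = {
  "apple-silicon": 6,
  "nvidia-gpu": 2,
  "cpu-only": 4,
};

const SAFETY_MARGIN = 0.15;

// Usable memory: installed memory minus the platform overhead.
function usableMemoryGb(installedGb: number, platform: Platform): number {
  return Math.max(0, installedGb - OVERHEAD_GB[platform]);
}

// A model is recommended only if it fits with 15% headroom to spare.
function fits(modelGb: number, installedGb: number, platform: Platform): boolean {
  return modelGb * (1 + SAFETY_MARGIN) <= usableMemoryGb(installedGb, platform);
}

// Example: 16 GB of unified memory leaves 10 GB usable, so a model
// estimated at 9 GB is excluded (9 × 1.15 = 10.35 > 10).
console.log(fits(9, 16, "apple-silicon")); // false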
6 options ranked by use-case fit and headroom.
Qwen 3.5 14B Instruct
Apache 2.0 · 2025
ollama run qwen3.5:14b-instruct-q8_0

Gemma 4 9B Instruct
Gemma Terms of Use · 2026
Strong Apple Silicon performance via MLX. Long context (128k) makes it the better choice over Gemma 2 9B.
ollama run gemma4:9b-instruct-q8_0

Gemma 4 27B Instruct
Gemma Terms of Use · 2026
Excellent general-purpose model at workstation scale. 128k context, MLX-friendly on Apple Silicon.
ollama run gemma4:27b-instruct-q5_k_m

Phi-4 14B
MIT · 2025
Strong reasoning per parameter, weak long-context (training ctx ~16k).
ollama run phi4:14b-q8_0

Qwen 3.5 32B Instruct
Apache 2.0 · 2025
ollama run qwen3.5:32b-instruct-q4_k_m

Phi-4 Mini 3.8B
MIT · 2025
ollama run phi4-mini:3.8b-q8_0

Every model in the catalog is paired with realistic memory estimates per quantization (Q4_K_M, Q5_K_M, Q8_0) at a moderate 8k context. The recommender computes your usable memory by subtracting a small system overhead (six gigabytes on Apple Silicon, two gigabytes on a discrete GPU, four gigabytes on CPU-only setups), then requires the chosen model to fit with a fifteen percent safety margin. Anything that does not fit lands in the excluded list below the results, with the reason printed out. The ranking that follows weights use-case fit most heavily, then quantization quality, then recency of the release, with a modest bonus for models that leave breathing room rather than filling the memory to the brim.
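A rough sketch of that ordering in the same spirit; the weights below are assumptions chosen to match the description, not the recommender's actual numbers:

interface Candidate {
  useCaseFit: number;   // 0..1, match between model strengths and your task
  quantQuality: number; // 0..1, e.g. Q8_0 above Q5_K_M above Q4_K_M
  recency: number;      // 0..1, newer releases score higher
  headroomGb: number;   // usable memory left over after loading the model
}

// Use-case fit dominates, then quantization quality, then recency,
// plus a modest, capped bonus for leftover headroom. Weights are guesses.
function score(c: Candidate): number {
  return (
    0.5 * c.useCaseFit +
    0.25 * c.quantQuality +
    0.15 * c.recency +
    0.1 * Math.min(c.headroomGb / 8, 1)
  );
}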
Memory estimates are rounded for clarity. Actual usage depends on context length, batch size, and which inference engine you run. If a model is on the edge of fitting, give it a try at a smaller context first.
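For example, with Ollama you can shrink the context window from inside the interactive session before sending a long prompt (num_ctx is Ollama's context-length parameter; 4096 is just an example value):

ollama run phi4-mini:3.8b-q8_0
>>> /set parameter num_ctx 4096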