Picker · Hardware-aware recommender
Different AI models need different amounts of memory: a small model fits on a phone, while a frontier-grade model needs a workstation. Tell us what hardware you have and what you want to do, and the tool will suggest the best options that will actually run on your machine. The form updates as you type. Nothing is sent to a server.
On a Mac: click the Apple menu → About This Mac. The number next to "Memory" is your unified memory. Pick "Apple Silicon" in the form below.
On Windows with an NVIDIA GPU: open Task Manager → Performance → GPU. The number next to "Dedicated GPU memory" is your VRAM. Pick "NVIDIA GPU" in the form below.
On Windows or Linux without a discrete GPU: pick "CPU only" and enter your system RAM. AI will run slowly, but it will run.
Not sure which terms apply? Open the glossary in a new tab.
Find it under Apple menu → About This Mac → Memory. Common configurations: 8, 16, 24, 32, 48, 64, 96, 128, 192 GB.
Computed from your specs minus a reasonable system overhead. Models that do not fit within this usable memory with a 15% safety margin to spare are excluded from the recommendations.
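A minimal sketch of that fit check, using the overheads described further down the page (six gigabytes on Apple Silicon, two on a discrete GPU, four on CPU-only) and the 15% margin; the names and the exact margin interpretation are illustrative assumptions, not the tool's actual source:

type Platform = "apple-silicon" | "nvidia-gpu" | "cpu-only";

// Platform overheads as described on this page; purely illustrative code.
const OVERHEAD_GB: Record<Platform, number> = {
  "apple-silicon": 6,
  "nvidia-gpu": 2,
  "cpu-only": 4,
};

const SAFETY_MARGIN = 0.15;

// Usable memory: installed memory minus the platform overhead.
function usableMemoryGb(installedGb: number, platform: Platform): number {
  return Math.max(0, installedGb - OVERHEAD_GB[platform]);
}

// A model is recommended only if it fits with 15% headroom to spare.
function fits(modelGb: number, installedGb: number, platform: Platform): boolean {
  return modelGb * (1 + SAFETY_MARGIN) <= usableMemoryGb(installedGb, platform);
}

// Example: 16 GB of unified memory leaves 10 GB usable, so a model
// estimated at 9 GB is excluded (9 × 1.15 = 10.35 > 10).
console.log(fits(9, 16, "apple-silicon")); // false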
6 options ranked by use-case fit and headroom.
Qwen 3.5 14B Instruct
Apache 2.0 · 2025
ollama run qwen3.5:14b-instruct-q8_0

Gemma 4 9B Instruct
Gemma Terms of Use · 2026
Strong Apple Silicon performance via MLX. Long context (128k) makes it the better choice over Gemma 2 9B.
ollama run gemma4:9b-instruct-q8_0

Gemma 4 27B Instruct
Gemma Terms of Use · 2026
Excellent general-purpose model at workstation scale. 128k context, MLX-friendly on Apple Silicon.
ollama run gemma4:27b-instruct-q5_k_m

Phi-4 14B
MIT · 2025
Strong reasoning per parameter, weak long-context (training ctx ~16k).
ollama run phi4:14b-q8_0

Qwen 3.5 32B Instruct
Apache 2.0 · 2025
ollama run qwen3.5:32b-instruct-q4_k_m

Phi-4 Mini 3.8B
MIT · 2025
ollama run phi4-mini:3.8b-q8_0

Every model in the catalog is paired with realistic memory estimates per quantization (Q4_K_M, Q5_K_M, Q8_0) at a moderate 8k context. The recommender computes your usable memory by subtracting a small system overhead (six gigabytes on Apple Silicon, two gigabytes on a discrete GPU, four gigabytes on CPU-only setups), then requires the chosen model to fit with a fifteen percent safety margin. Anything that does not fit lands in the excluded list below the results, with the reason printed out. The ranking that follows weights use-case fit most heavily, then quantization quality, then recency of the release, with a modest bonus for models that leave breathing room rather than filling the memory to the brim.
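A rough sketch of that ordering in the same spirit; the weights below are assumptions chosen to match the description, not the recommender's actual numbers:

interface Candidate {
  useCaseFit: number;   // 0..1, match between model strengths and your task
  quantQuality: number; // 0..1, e.g. Q8_0 above Q5_K_M above Q4_K_M
  recency: number;      // 0..1, newer releases score higher
  headroomGb: number;   // usable memory left over after loading the model
}

// Use-case fit dominates, then quantization quality, then recency,
// plus a modest, capped bonus for leftover headroom. Weights are guesses.
function score(c: Candidate): number {
  return (
    0.5 * c.useCaseFit +
    0.25 * c.quantQuality +
    0.15 * c.recency +
    0.1 * Math.min(c.headroomGb / 8, 1)
  );
}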
Memory estimates are rounded for clarity. Actual usage depends on context length, batch size, and which inference engine you run. If a model is on the edge of fitting, give it a try at a smaller context first.
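For example, with Ollama you can shrink the context window from inside the interactive session before sending a long prompt (num_ctx is Ollama's context-length parameter; 4096 is just an example value):

ollama run phi4-mini:3.8b-q8_0
>>> /set parameter num_ctx 4096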