vMLX for macOS
Version 1.2.1 · Apple Silicon (arm64) · 361 MB
Download from GitHub
Code-signed & notarized · Developer ID: ShieldStack LLC · macOS 26+ (Tahoe)

Installation

Open the DMG

Double-click vMLX-1.2.1-arm64.dmg to mount it. The app is code-signed and notarized by Apple — no Gatekeeper warnings.

Drag to Applications

Drag the vMLX icon to the Applications folder. Close the DMG window and eject the disk image.

Launch vMLX

Open vMLX from Applications or Spotlight. On first launch, it will offer to install the MLX inference engine with one click.

Pick a model

Search and download any MLX model from HuggingFace directly in the app — or point it at models you already have. We publish optimized models at huggingface.co/dealignai.

Start chatting

Create a session, hit Start, and you're running a local AI inference server with prefix caching, paged KV, voice, vision, and 20+ agentic tools. An OpenAI-compatible API is available at http://127.0.0.1:8000.
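
For reference, here is a minimal sketch of calling that endpoint from Python. It assumes vMLX follows the standard OpenAI route layout (/v1/chat/completions); the model id below is a placeholder for whatever model you have loaded.

```python
# Minimal sketch: one chat completion against the local vMLX server.
# Assumes the standard OpenAI-compatible route; model id is a placeholder.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "my-local-model",  # replace with a model loaded in vMLX
        "messages": [{"role": "user", "content": "Hello from vMLX!"}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```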

Requirements

Platform: macOS 26+ (Tahoe); remote endpoints available on macOS 14+
Chip: Apple Silicon (M1 or later)
RAM (minimum): 8 GB unified memory
RAM (recommended): 16 GB+ (for 7B–20B models)

More unified memory means larger models: 16 GB handles up to ~20B parameters, 32 GB handles ~35B, 64 GB handles ~70B, and 192 GB handles 400B+ MoE models. vMLX's KV cache quantization (q4/q8) lets you push these limits further.
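
For a back-of-the-envelope check before downloading a model, this illustrative rule of thumb (our assumption, not vMLX's internal accounting) is consistent with the sizes above: weights take roughly params × bits / 8 bytes, plus some overhead for the KV cache and runtime.

```python
def estimate_model_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough memory estimate: weights at `bits` per parameter, plus ~20%
    overhead for KV cache, activations, and runtime.
    Illustrative rule of thumb only, not vMLX's internal accounting."""
    weight_gb = params_billion * bits / 8  # 1e9 params at bits/8 bytes each ~= GB
    return weight_gb * overhead

# e.g. a 20B model at 4-bit needs roughly 12 GB, which fits in 16 GB of unified memory
print(f"{estimate_model_gb(20, bits=4):.1f} GB")
```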

What's included

vMLX is a single self-contained app. The DMG includes everything you need — no Python, pip, Docker, or command-line setup required. On first launch, it automatically installs the MLX inference engine into a bundled environment.

Features: multi-model sessions, streaming chat, 5-layer caching (prefix + paged KV + q4/q8 quantization + batching + disk), Mamba/SSM support, 50+ auto-detected architectures, 14 tool call parsers, 4 reasoning parsers, 20+ agentic tools (file, shell, git, web search, browser), voice chat, vision/multimodal, embeddings, benchmarks, and an OpenAI-compatible API.
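
Because the API is OpenAI-compatible, streaming chat works with the official openai Python client pointed at the local server. This is a sketch assuming the standard /v1 routes; the model id is hypothetical.

```python
# Streaming chat via the OpenAI-compatible endpoint. Assumes vMLX serves
# the standard /v1 routes; the model id below is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-local-model",  # replace with a model loaded in vMLX
    messages=[{"role": "user", "content": "Summarize MLX in one sentence."}],
    stream=True,  # tokens arrive as they are generated
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```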

Report a Bug