Features

Everything MLX Studio offers — a beautiful chat experience, powerful agentic coding tools, and seamless model management.

Beautiful Chat UI

A polished streaming chat interface built for local AI. Every interaction feels fast, responsive, and native — designed from the ground up around Apple Silicon inference with real-time token rendering.

Multi-Turn Streaming

Real-time token streaming with smooth rendering. Conversations feel instant and responsive.

Inline Tool Call Pills

When models use tools, expandable pills show arguments and results in real time with live status indicators (executing → generating → complete).

Reasoning Blocks

Collapsible thinking sections for reasoning models like DeepSeek R1, Qwen 3, and GLM-4.7. Enable with enable_thinking and control depth with reasoning_effort.

Image Previews

Paste or drag-and-drop images into chat. Inline display with click-to-zoom for vision models like Qwen VL and LLaVA.

20+ Agentic Coding Tools

The only local AI app with native MCP (Model Context Protocol) integration. Models can autonomously read, write, and search files, and execute shell commands to complete multi-step coding tasks.

  • File I/O: read_file, write_file, edit_file, list_dir, copy_file, move_file, delete_file — read, write, edit, copy, move, and delete files; list directory contents.
  • Code Search: grep, glob — search file contents with regex patterns; find files by glob patterns.
  • Shell: execute_command — run any shell command with a configurable working directory.
  • Web Search: duckduckgo_search, brave_search — search the web. DuckDuckGo is free; Brave requires an API key.
  • URL Fetch: fetch_url — fetch and read any URL's content.
  • Git: git_status, git_diff, git_log, git_show — check repository status, view diffs, browse history.
  • Utilities: clipboard_read, clipboard_write, current_datetime — read/write the system clipboard; get the current date, time, and timezone.

Configuration

Configurable working directory, max tool iterations per turn, tool-choice modes (auto, required, none), and multi-step agentic workflows where models chain tool calls autonomously until the task is complete.
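As a rough sketch of how these options map onto an OpenAI-style request body — the field names beyond the standard ones (e.g. max_tool_iterations) are illustrative assumptions, not a documented MLX Studio schema:

```python
import json

def agentic_request(prompt, tool_choice="auto", max_tool_iterations=8):
    # tool_choice selects one of the three modes described above.
    assert tool_choice in ("auto", "required", "none")
    return {
        "model": "local-model",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "tool_choice": tool_choice,                   # auto / required / none
        "max_tool_iterations": max_tool_iterations,   # cap on chained tool calls per turn
    }

body = agentic_request("Rename foo() to bar() across the repo",
                       tool_choice="required")
print(json.dumps(body, indent=2))
```

In multi-step workflows, the model keeps issuing tool calls until it produces a final answer or hits the iteration cap.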

Voice & Vision

Voice Chat

Built-in text-to-speech playback button on every assistant message. Uses native macOS speech synthesis — no external API needed.

Vision / Multimodal

Attach images via paste or drag-and-drop. Vision models (Qwen VL, LLaVA) analyze visual content locally. Inline display with click-to-zoom.

Model Support

50+ Architectures

Auto-detected from config.json. DeepSeek, Llama, Qwen, Gemma, Mistral, Phi, Mamba/SSM hybrids, and more.

14 Tool Call Parsers

Automatic format detection across model families. No manual configuration needed.

4 Reasoning Parsers

Proper reasoning separation with enable_thinking and reasoning_effort controls.

HuggingFace Browser

Search, browse, and one-click download any MLX model directly in the app.

Remote Endpoints

Connect to OpenAI, Anthropic, Groq, or any OpenAI-compatible API endpoint. Use MLX Studio's full agentic toolkit with cloud models — the tools run locally regardless of where the model is hosted.

Remote endpoints work on macOS 14+. Local inference requires macOS 26+ (Tahoe).

OpenAI-Compatible API

MLX Studio exposes 7 API endpoints at localhost:8000 — the most complete of any local MLX app.

  • /v1/chat/completions — Standard streaming chat
  • /v1/responses — OpenAI Responses API format
  • /v1/completions — Text completions
  • /v1/embeddings — Vector embeddings (separate model)
  • /v1/mcp/tools — MCP tool listing and execution
  • /v1/audio/* — Text-to-speech and speech-to-text
  • Cancel endpoint — Abort running requests

Plus: API key authentication, reasoning separation (enable_thinking, reasoning_effort), and served_model_name alias.
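A minimal sketch of calling the chat endpoint with the reasoning controls above, using only the standard library. The model name and API key are placeholders; the actual send is commented out so it only runs against a live server:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

payload = {
    "model": "deepseek-r1-distill-7b",  # placeholder model name
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "stream": False,                    # set True for token streaming
    "enable_thinking": True,            # emit a separated reasoning block
    "reasoning_effort": "medium",
}

req = urllib.request.Request(
    f"{BASE}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # if API key auth is enabled
    },
)

# With the MLX Studio server running, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

Because the server speaks the OpenAI wire format, existing OpenAI client libraries should also work by pointing their base URL at localhost:8000/v1.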

System Requirements

  • Platform: macOS 26+ (Tahoe); remote endpoints available on macOS 14+
  • Chip: Apple Silicon (M1 or later)
  • RAM (minimum): 8 GB unified memory
  • RAM (recommended): 16 GB+ (for 7B–20B models)

More unified memory = larger models. 16 GB handles up to ~20B parameters, 32 GB handles ~35B, 64 GB handles ~70B, and 192 GB handles 400B+ MoE models.
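These figures follow a common rule of thumb: weight size is roughly parameters × bits ÷ 8, plus headroom for the KV cache and runtime. A quick sketch (the 4-bit quantization and 1.25× overhead factor are assumptions, not measurements):

```python
def est_memory_gb(params_billions, bits=4, overhead=1.25):
    # Approximate resident memory: quantized weights plus ~25% headroom
    # for KV cache, activations, and runtime overhead.
    return params_billions * bits / 8 * overhead

for n in (7, 20, 35, 70):
    print(f"{n}B @ 4-bit ≈ {est_memory_gb(n):.1f} GB")
```

Under these assumptions a 20B model needs about 12.5 GB (fits in 16 GB), 35B about 21.9 GB (32 GB), and 70B about 43.8 GB (64 GB), matching the tiers above.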