Everything MLX Studio offers — a beautiful chat experience, powerful agentic coding tools, and seamless model management.
A polished streaming chat interface built for on-device AI. Every interaction feels fast, responsive, and native — designed from the ground up around Apple Silicon inference with real-time token rendering.
Real-time token streaming with smooth rendering. Conversations feel instant and responsive.
When models use tools, expandable pills show arguments and results in real time with live status indicators (executing → generating → complete).
Collapsible thinking sections for reasoning models like DeepSeek R1, Qwen 3, and GLM-4.7. Enable with
enable_thinking and control depth with reasoning_effort.
Paste or drag-and-drop images into chat. Inline display with click-to-zoom for vision models like Qwen VL and LLaVA.
The only on-device AI app with native MCP (Model Context Protocol) integration. Models can autonomously read, write, search, and execute to complete multi-step coding tasks.
| Category | Tools | Description |
|---|---|---|
| File I/O | read_file write_file edit_file list_dir copy_file move_file delete_file | Read, write, edit, copy, move, and delete files. List directory contents. |
| Code Search | grep glob | Search file contents with regex patterns. Find files by glob patterns. |
| Shell | execute_command | Run any shell command with configurable working directory. |
| Web Search | duckduckgo_search brave_search | Search the web. DuckDuckGo is free; Brave requires an API key. |
| URL Fetch | fetch_url | Fetch and read any URL content. |
| Git | git_status git_diff git_log git_show | Check repository status, view diffs, browse history. |
| Utilities | clipboard_read clipboard_write current_datetime | Read/write system clipboard. Get current date, time, timezone. |
Configurable working directory, max tool iterations per turn, tool-choice modes
(auto, required, none), and multi-step agentic workflows where
models chain tool calls autonomously until the task is complete.
Built-in text-to-speech playback button on every assistant message. Uses native macOS speech synthesis — no external API needed.
Attach images via paste or drag-and-drop. Vision models (Qwen VL, LLaVA) analyze visual content locally. Inline display with click-to-zoom.
Auto-detected from config.json. DeepSeek, Llama, Qwen, Gemma, Mistral, Phi, Mamba/SSM hybrids, and more.
Automatic format detection across model families. No manual configuration needed.
Proper reasoning separation with enable_thinking and reasoning_effort
controls.
Search, browse, and one-click download any MLX model directly in the app.
Connect to OpenAI, Anthropic, Groq, or any OpenAI-compatible API endpoint. Use MLX Studio's full agentic toolkit with cloud models — the tools run locally regardless of where the model is hosted.
Remote endpoints work on macOS 14+. Local inference requires macOS 14.5+ (Apple Silicon).
MLX Studio exposes 7 API endpoints at localhost:8000 — the most complete of any local MLX app.
/v1/chat/completions — Standard streaming chat/v1/responses — OpenAI Responses API format/v1/completions — Text completions/v1/embeddings — Vector embeddings (separate model)/v1/mcp/tools — MCP tool listing and execution/v1/audio/* — Text-to-speech and speech-to-textCancel endpoint — Abort running requestsPlus: API key authentication, reasoning separation (enable_thinking,
reasoning_effort), and served_model_name alias.
More unified memory = larger models. 16 GB handles up to ~20B parameters, 32 GB handles ~35B, 64 GB handles ~70B, and 192 GB handles 400B+ MoE models.