Everything MLX Studio offers — a beautiful chat experience, powerful agentic coding tools, and seamless model management.
A polished streaming chat interface built for local AI. Every interaction feels fast, responsive, and native — designed from the ground up around Apple Silicon inference with real-time token rendering.
Real-time token streaming with smooth rendering. Conversations feel instant and responsive.
When models use tools, expandable pills show arguments and results in real time with live status indicators (executing → generating → complete).
Collapsible thinking sections for reasoning models like DeepSeek R1, Qwen 3, and GLM-4.7. Enable with `enable_thinking` and control depth with `reasoning_effort`.
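The two reasoning controls ride along in an otherwise standard chat request. A minimal sketch of such a payload, assuming an OpenAI-style request body (the model name and the `"medium"` effort value are placeholders, not confirmed defaults):

```python
import json

def build_chat_request(prompt: str, thinking: bool = True, effort: str = "medium") -> dict:
    """Build an OpenAI-style chat request with MLX Studio's reasoning controls."""
    return {
        "model": "deepseek-r1",        # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "enable_thinking": thinking,   # separate the reasoning section
        "reasoning_effort": effort,    # control reasoning depth
        "stream": True,
    }

payload = build_chat_request("Why is the sky blue?")
print(json.dumps(payload, indent=2))
```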
Paste or drag-and-drop images into chat. Inline display with click-to-zoom for vision models like Qwen VL and LLaVA.
The only local AI app with native MCP (Model Context Protocol) integration. Models can autonomously read, write, search, and run commands to complete multi-step coding tasks.
| Category | Tools | Description |
|---|---|---|
| File I/O | read_file write_file edit_file list_dir copy_file move_file delete_file | Read, write, edit, copy, move, and delete files. List directory contents. |
| Code Search | grep glob | Search file contents with regex patterns. Find files by glob patterns. |
| Shell | execute_command | Run any shell command with configurable working directory. |
| Web Search | duckduckgo_search brave_search | Search the web. DuckDuckGo is free; Brave requires an API key. |
| URL Fetch | fetch_url | Fetch and read any URL content. |
| Git | git_status git_diff git_log git_show | Check repository status, view diffs, browse history. |
| Utilities | clipboard_read clipboard_write current_datetime | Read/write system clipboard. Get current date, time, timezone. |
Configurable working directory, max tool iterations per turn, tool-choice modes (`auto`, `required`, `none`), and multi-step agentic workflows where models chain tool calls autonomously until the task is complete.
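The multi-step workflow above boils down to a loop: call the model, execute any tools it requests, feed results back, and stop when no more tools are requested or the iteration cap is hit. A minimal sketch, assuming hypothetical `call_model` and `execute_tool` helpers (their signatures are illustrative, not MLX Studio's API):

```python
def run_agentic_turn(prompt, call_model, execute_tool, max_iterations=8):
    """Chain tool calls until the model stops requesting tools.

    `call_model` and `execute_tool` are assumed helpers: the first returns an
    assistant message dict, the second runs one named tool and returns text.
    """
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_iterations):
        reply = call_model(messages, tool_choice="auto")
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]          # task complete
        for call in tool_calls:              # execute each requested tool
            result = execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
    return None  # iteration cap reached without a final answer

# Stubbed model: requests one read_file call, then answers.
def fake_model(messages, tool_choice):
    if any(m.get("role") == "tool" for m in messages):
        return {"role": "assistant", "content": "done", "tool_calls": None}
    return {"role": "assistant", "content": "",
            "tool_calls": [{"name": "read_file", "arguments": {"path": "a.txt"}}]}

print(run_agentic_turn("read a.txt", fake_model, lambda name, args: "file contents"))
# → done
```

The `max_iterations` guard corresponds to the configurable "max tool iterations per turn" setting, keeping a confused model from looping forever.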
Built-in text-to-speech playback button on every assistant message. Uses native macOS speech synthesis — no external API needed.
Attach images via paste or drag-and-drop. Vision models (Qwen VL, LLaVA) analyze visual content locally. Inline display with click-to-zoom.
Model architectures are auto-detected from `config.json`. DeepSeek, Llama, Qwen, Gemma, Mistral, Phi, Mamba/SSM hybrids, and more.
Automatic format detection across model families. No manual configuration needed.
Proper reasoning separation with `enable_thinking` and `reasoning_effort` controls.
Search, browse, and one-click download any MLX model directly in the app.
Connect to OpenAI, Anthropic, Groq, or any OpenAI-compatible API endpoint. Use MLX Studio's full agentic toolkit with cloud models — the tools run locally regardless of where the model is hosted.
Remote endpoints work on macOS 14+. Local inference requires macOS 26+ (Tahoe).
MLX Studio exposes 7 API endpoints at localhost:8000 — the most complete of any local MLX app.
- `/v1/chat/completions` — Standard streaming chat
- `/v1/responses` — OpenAI Responses API format
- `/v1/completions` — Text completions
- `/v1/embeddings` — Vector embeddings (separate model)
- `/v1/mcp/tools` — MCP tool listing and execution
- `/v1/audio/*` — Text-to-speech and speech-to-text
- Cancel endpoint — Abort running requests

Plus: API key authentication, reasoning separation (`enable_thinking`, `reasoning_effort`), and `served_model_name` alias.
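Because the server speaks the OpenAI wire format, any standard HTTP client works against it. A minimal stdlib sketch of a chat request (the model name and API key are placeholders; actually sending it requires the MLX Studio server to be running on port 8000):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def chat_request(prompt: str) -> urllib.request.Request:
    """Construct a request for the local /v1/chat/completions endpoint."""
    body = json.dumps({
        "model": "my-local-model",   # or the served_model_name alias
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer YOUR_API_KEY"},  # if auth is enabled
    )

req = chat_request("Hello")
print(req.full_url)
# With the server running:
# reply = json.load(urllib.request.urlopen(req))
```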
More unified memory = larger models. 16 GB handles up to ~20B parameters, 32 GB handles ~35B, 64 GB handles ~70B, and 192 GB handles 400B+ MoE models.
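The rough arithmetic behind these figures: quantized weights take `bits / 8` bytes per parameter, and the KV cache plus runtime overhead consume the rest of unified memory. A back-of-the-envelope sketch, assuming 4-bit quantization (typical for MLX models, though exact sizes vary by format):

```python
def approx_weight_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate quantized weight size in GB (1e9 params * bits/8 bytes).

    Ignores KV cache and runtime overhead, which is why the usable model
    size is well below total unified memory.
    """
    return params_billion * bits / 8

for p in (20, 35, 70):
    print(f"{p}B params @ 4-bit ≈ {approx_weight_gb(p):.1f} GB of weights")
# → 20B ≈ 10.0 GB, 35B ≈ 17.5 GB, 70B ≈ 35.0 GB
```

So a ~20B model's weights fit in a 16 GB machine with headroom for the cache, matching the tiers above.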