MLX Studio Features — Chat UI, Agentic Coding, Voice & Vision

❖Beautiful Chat UI

A polished streaming chat interface built for on-device AI. Every interaction feels fast, responsive, and native — designed from the ground up around Apple Silicon inference with real-time token rendering.

Multi-Turn Streaming

Real-time token streaming with smooth rendering. Conversations feel instant and responsive.

Inline Tool Call Pills

When models use tools, expandable pills show arguments and results in real time with live status indicators (executing → generating → complete).

Reasoning Blocks

Collapsible thinking sections for reasoning models like DeepSeek R1, Qwen 3, and GLM-4.7. Enable with enable_thinking and control depth with reasoning_effort.

Image Previews

Paste or drag-and-drop images into chat. Inline display with click-to-zoom for vision models like Qwen VL and LLaVA.

⚙20+ Agentic Coding Tools

The only on-device AI app with native MCP (Model Context Protocol) integration. Models can autonomously read, write, search, and execute to complete multi-step coding tasks.

Category	Tools	Description
File I/O	read_file write_file edit_file list_dir copy_file move_file delete_file	Read, write, edit, copy, move, and delete files. List directory contents.
Code Search	grep glob	Search file contents with regex patterns. Find files by glob patterns.
Shell	execute_command	Run any shell command with configurable working directory.
Web Search	duckduckgo_search brave_search	Search the web. DuckDuckGo is free; Brave requires an API key.
URL Fetch	fetch_url	Fetch and read any URL content.
Git	git_status git_diff git_log git_show	Check repository status, view diffs, browse history.
Utilities	clipboard_read clipboard_write current_datetime	Read/write system clipboard. Get current date, time, timezone.

Configuration

Configurable working directory, max tool iterations per turn, tool-choice modes (auto, required, none), and multi-step agentic workflows where models chain tool calls autonomously until the task is complete.

◆Voice & Vision

Voice Chat

Built-in text-to-speech playback button on every assistant message. Uses native macOS speech synthesis — no external API needed.

Vision / Multimodal

Attach images via paste or drag-and-drop. Vision models (Qwen VL, LLaVA) analyze visual content locally. Inline display with click-to-zoom.

Model Support

50+ Architectures

Auto-detected from config.json. DeepSeek, Llama, Qwen, Gemma, Mistral, Phi, Mamba/SSM hybrids, and more.

14 Tool Call Parsers

Automatic format detection across model families. No manual configuration needed.

4 Reasoning Parsers

Proper reasoning separation with enable_thinking and reasoning_effort controls.

HuggingFace Browser

Search, browse, and one-click download any MLX model directly in the app.

Remote Endpoints

Connect to OpenAI, Anthropic, Groq, or any OpenAI-compatible API endpoint. Use MLX Studio's full agentic toolkit with cloud models — the tools run locally regardless of where the model is hosted.

Remote endpoints work on macOS 14+. Local inference requires macOS 14.5+ (Apple Silicon).

OpenAI-Compatible API

MLX Studio exposes 7 API endpoints at localhost:8000 — the most complete of any local MLX app.

/v1/chat/completions — Standard streaming chat
/v1/responses — OpenAI Responses API format
/v1/completions — Text completions
/v1/embeddings — Vector embeddings (separate model)
/v1/mcp/tools — MCP tool listing and execution
/v1/audio/* — Text-to-speech and speech-to-text
Cancel endpoint — Abort running requests

Plus: API key authentication, reasoning separation (enable_thinking, reasoning_effort), and served_model_name alias.

System Requirements

Platform

macOS 14.5+ (Apple Silicon)

Remote endpoints available on macOS 14+

Chip

Apple Silicon (M1 or later)

RAM (minimum)

8 GB unified memory

RAM (recommended)

16 GB+ (for 7B–20B models)

More unified memory = larger models. 16 GB handles up to ~20B parameters, 32 GB handles ~35B, 64 GB handles ~70B, and 192 GB handles 400B+ MoE models.