Agent architecture

Orchid's agents are review-first: they propose, you approve.

The starting assumption

Most AI agent systems are designed to maximize how much the agent can do unsupervised. Orchid is designed the other way around: to maximize how much you can trust the work it produces. The two goals share most of their tooling, but they diverge in a few key places, and those places are where the design choices live.

Trust comes from three things being true: you can see what the agent did, you can roll it back, and writes don't happen without your consent. Everything else on this page is in service of those three.

Single agent — the default

One chat thread, one agent, one unified set of tools. Type a request, the agent works through it end to end, and proposed changes show up in the notebook as pending cells you can accept or reject.

Flow diagram: User message → Agent decides what to do → calls tools (schema lookup, SQL runner, Python kernel, file ops) → drafts new cells → cells appear inline with a Review banner → user approves or rejects. (Image: /docs-images/concepts/single-agent-flow.png)

One agent, one thread. Tool calls and proposed cells stream into the conversation as the work happens.
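
The propose-then-approve loop described above can be pictured as a small state machine. The sketch below is illustrative only: `ProposedCell`, `Notebook`, `propose`, and `review` are hypothetical names chosen for this example, not Orchid's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ProposedCell:
    source: str
    status: Status = Status.PENDING


@dataclass
class Notebook:
    cells: list[str] = field(default_factory=list)             # committed cells
    pending: list[ProposedCell] = field(default_factory=list)  # awaiting review

    def propose(self, source: str) -> ProposedCell:
        """Agent side: queue a draft cell. Nothing is committed yet."""
        cell = ProposedCell(source)
        self.pending.append(cell)
        return cell

    def review(self, cell: ProposedCell, accept: bool) -> None:
        """User side: accepting commits the cell, rejecting discards it."""
        self.pending.remove(cell)
        if accept:
            cell.status = Status.ACCEPTED
            self.cells.append(cell.source)
        else:
            cell.status = Status.REJECTED


nb = Notebook()
draft = nb.propose("SELECT count(*) FROM orders")
nb.review(draft, accept=True)
```

The point of the shape is that only `review` can move content into `cells`; the agent-facing `propose` has no path to the committed list.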

Single mode is the right default because the orchestration overhead of multi-agent — "which specialist should handle this?" — rarely earns its keep on the kinds of tasks analysts actually run. Most requests are a couple of tool calls and a chart. One agent with the full toolbox finishes faster and uses fewer tokens.

Multi-agent — when separation helps

Toggle multi-agent mode from the agent panel and an orchestrator appears in front of four specialists.

Diagram: User message goes to Orchestrator. Orchestrator routes to four specialists arranged around it: Data (SQL, schema), Analyst (Python, stats), Viz (charts, formatting), Report (narrative). Each specialist has its own scoped tools, but they all share a single kernel state in the middle. (Image: /docs-images/concepts/multi-agent-orchestrator.png)

The orchestrator routes work to specialists. They share kernel state, so variables defined by one are visible to the next.

Data — schema exploration, joins, SQL drafting. Has access to the SQL runner and schema browser tools; doesn't do Python.
Analyst — Python analysis, statistics, transformations. Knows pandas, numpy, and the kernel; doesn't draft SQL from scratch.
Viz — chart selection, formatting, annotations. Operates on DataFrames already in the kernel.
Report — narrative prose, summaries, executive write-ups. Reads what the others produced and explains it.

Multi-agent earns its keep on longer sessions with clear phase boundaries: pull the data, transform it, plot it, explain it. The separation forces each step to leave a cleaner trace in the conversation, because handoffs are explicit. You can switch modes mid-thread; history carries over.
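
As a rough sketch of explicit handoffs with scoped tools, the example below maps phases to specialists and checks tool access. The tool names are the ones this page lists in the tool registry, but the assignment of tools to specialists, the phase labels, and the functions `route` and `can_call` are assumptions made for illustration.

```python
# Hypothetical tool scopes per specialist. This particular assignment is an
# assumption, not Orchid's actual configuration.
SPECIALIST_TOOLS = {
    "data": {"inspect_schema", "preview", "query"},
    "analyst": {"execute_code", "get_variable"},
    "viz": {"execute_code", "get_variable"},
    "report": {"get_variable"},
}

# Hypothetical phase labels for "pull the data, transform it, plot it, explain it".
PHASE_TO_SPECIALIST = {
    "pull": "data",
    "transform": "analyst",
    "plot": "viz",
    "explain": "report",
}


def route(phases: list[str]) -> list[tuple[str, str]]:
    """Return the explicit (phase, specialist) handoffs, in order."""
    return [(phase, PHASE_TO_SPECIALIST[phase]) for phase in phases]


def can_call(specialist: str, tool: str) -> bool:
    """A specialist may only call tools inside its own scope."""
    return tool in SPECIALIST_TOOLS[specialist]


handoffs = route(["pull", "transform", "plot", "explain"])
```

Each entry in `handoffs` corresponds to one visible handoff in the conversation, which is where the cleaner trace comes from.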

Why this set

We tried both more specialists and fewer. Four is what stayed: enough to match the natural shape of analytics work, few enough that you don't spend cognitive effort tracking who's holding the baton.

Kernel state is the shared substrate

The specialists don't pass state around as JSON blobs in their messages. They share the kernel — the same Python interpreter that your notebook cells run against. When Data finishes a SQL query and names the result orders_q1, Analyst sees orders_q1 in scope on its next turn. Viz can plot it without anyone re-serializing it.

This is the single biggest reason multi-agent works in practice. Without shared kernel state, each handoff has to drag the data along in the prompt, which is expensive in tokens and lossy in fidelity. A million-row DataFrame doesn't fit in a prompt; a variable name does.
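
A minimal sketch of the idea in plain Python, assuming a single namespace dict can stand in for the kernel's interpreter state. The helper `run_in_kernel` and the variable `q1_total` are hypothetical, and a small list of dicts stands in for a DataFrame.

```python
# One namespace dict stands in for the kernel's interpreter state.
kernel_ns: dict = {}


def run_in_kernel(code: str) -> None:
    """Execute code against the shared namespace, like a notebook cell."""
    exec(code, kernel_ns)


# "Data" turn: bind a query result to a name in the kernel.
run_in_kernel(
    "orders_q1 = [{'region': 'EU', 'total': 120}, {'region': 'US', 'total': 300}]"
)

# "Analyst" turn: the handoff carries only the variable name, not the rows.
run_in_kernel("q1_total = sum(row['total'] for row in orders_q1)")
```

The second call never receives the rows themselves. It refers to `orders_q1` by name, and the same lookup works from a user's own code cell.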

Side effect: when the agent suggests df.head() in chat, you can run it yourself in a code cell and see exactly what the agent saw. There's only one source of truth for what's in memory, and you have direct access to it.

The tool registry

Both modes use the same underlying tools. Each tool is a typed function with a clear schema — name, parameters, return shape. The agent decides when to call which; the IDE executes the call locally and streams the result back into the conversation.

Integration tools — inspect_schema, preview, and query against connected databases.
Kernel tools — execute_code, get_variable, install_package in the project venv.
Shell — run_command for git, ls, cat, and other CLI tasks inside the workspace. Read-only commands run immediately; anything mutating requires approval.
File ops — read_file, create_file, edit_file, delete_file inside the project workspace. Writes are diffed and reviewed.
Notebook + dashboard + document authoring — higher-level tools for creating .orchid and .orchid-dashboard files, embedding charts, and writing reports. See Tools for the full surface.

Every tool call appears in the chat with its inputs and outputs, so the log doubles as an audit trail. Tokens used per call are tallied in the activity tab.
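
One way to picture a typed tool registry with a call log is sketched below. `Tool`, `REGISTRY`, `CALL_LOG`, `register`, and `call_tool` are hypothetical names, and the `get_variable` implementation is a stand-in that reads from a plain dict rather than a real kernel.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    parameters: dict[str, type]  # parameter name -> expected type
    returns: type                # declared return shape (documentation only here)
    fn: Callable[..., Any]


REGISTRY: dict[str, Tool] = {}
CALL_LOG: list[dict[str, Any]] = []


def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool


def call_tool(tool_name: str, args: dict[str, Any]) -> Any:
    """Validate args against the schema, run the tool, and log the call."""
    tool = REGISTRY[tool_name]
    if set(args) != set(tool.parameters):
        raise TypeError(f"{tool_name} expects {sorted(tool.parameters)}")
    for key, expected in tool.parameters.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{tool_name}.{key} must be {expected.__name__}")
    result = tool.fn(**args)
    CALL_LOG.append({"tool": tool_name, "inputs": args, "output": result})
    return result


# Stand-in implementation; a real get_variable would read from the kernel.
_fake_kernel = {"row_count": 42}
register(Tool("get_variable", {"name": str}, object, lambda name: _fake_kernel[name]))

value = call_tool("get_variable", {"name": "row_count"})
```

Because every call goes through `call_tool`, the log is populated as a side effect of execution rather than by a separate reporting step.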

Why review-first

Two reasons, both load-bearing.

First, agents hallucinate. Not as often as they used to, but often enough that "run whatever the model says" is a bad default for analysis work, where a wrong query can corrupt a dashboard or, worse, a production table. Reviewing a draft is much faster than diagnosing a quiet failure two weeks later.

Second, analysis is a craft. The agent's job is to do the rote parts — typing the same join clause for the eightieth time, recalling which column has the right grain. Your job is to decide whether the analysis is the right shape. Review-first puts you in that role without making you the bottleneck on the typing.

Why writes pause for approval

Read operations — SELECT, SHOW, file reads, Python introspection — run automatically. Write operations don't. That's any of:

SQL with INSERT, UPDATE, DELETE, DROP, TRUNCATE, or DDL of any kind.
File writes outside the notebook's own output spill: touching data/, modifying a different .orchid file, or writing anywhere else on disk.
Cell modifications that change source (as opposed to proposed new cells, which always go through the review banner).

When the agent wants to do one of these, it shows you the exact operation — full SQL, full file path, full diff — with one-click Approve or Reject. There is no global "trust this agent" toggle. Each write is its own decision.
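
The read/write split for SQL could be approximated as below. This is a deliberately conservative sketch, not a SQL parser and not Orchid's implementation: `needs_approval` and `run_statement` are hypothetical names, and anything whose first keyword is not a known read is treated as a write.

```python
READ_KEYWORDS = {"SELECT", "SHOW"}


def needs_approval(sql: str) -> bool:
    """True unless every statement starts with a known read keyword."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    if not statements:
        return True  # nothing recognizable: do not auto-run
    for statement in statements:
        first_word = statement.split(None, 1)[0].upper()
        if first_word not in READ_KEYWORDS:
            return True
    return False


def run_statement(sql, run, ask_user):
    """Run reads directly; run writes only if ask_user(sql) returns True."""
    if needs_approval(sql) and not ask_user(sql):
        return "rejected"
    run(sql)
    return "executed"
```

Defaulting to "needs approval" means an unrecognized statement costs one extra click rather than an unreviewed write. A keyword check like this still misses dialect-specific cases such as a SELECT that writes into a table, so it illustrates the policy rather than a complete safeguard.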

Connection-level locks

Connections are also read-only by default, so even an approved write fails until writes are enabled on that connection. To enable them, you flip a setting in the connection profile, with a separate confirmation. Cells on a write-enabled connection show a lock icon so it's visible at a glance.

The action log

Open the activity tab and you see every action the agent took: tool name, inputs, outputs, token cost, timestamp, and which conversation turn triggered it. The log is a real artifact — you can scroll back and ask "why did the dashboard change yesterday?" and get an answer.
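
A log entry with the fields listed above might look like the record below, with a small filter for questions such as "what touched the dashboard yesterday?". `ActionRecord`, `actions_between`, and the sample entries (including the `edit_dashboard` tool name and its inputs) are invented for illustration and do not reflect Orchid's storage format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ActionRecord:
    tool: str
    inputs: dict
    outputs: str
    tokens: int
    timestamp: datetime
    turn: int  # conversation turn that triggered the call


def actions_between(
    log: list[ActionRecord],
    start: datetime,
    end: datetime,
    tool: Optional[str] = None,
) -> list[ActionRecord]:
    """Return records with start <= timestamp < end, optionally for one tool."""
    return [
        record
        for record in log
        if start <= record.timestamp < end and (tool is None or record.tool == tool)
    ]


# Made-up sample entries.
log = [
    ActionRecord("query", {"sql": "SELECT 1"}, "1 row", 120, datetime(2024, 5, 1, 9, 0), 3),
    ActionRecord("edit_dashboard", {"chart": "revenue"}, "ok", 80, datetime(2024, 5, 2, 14, 30), 7),
    ActionRecord("query", {"sql": "SELECT 2"}, "1 row", 95, datetime(2024, 5, 2, 16, 0), 8),
]

changes = actions_between(
    log, datetime(2024, 5, 2), datetime(2024, 5, 3), tool="edit_dashboard"
)
```

Keeping the triggering turn on each record is what lets a log entry be traced back to the request that caused it.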

Because the IDE is the source of truth and the agent only ever proposes, the log is also a complete record of what would have happened if you'd accepted everything. Nothing the agent does goes around your screen.

Free at launch, BYOK for premium

Free at launch. The default agent uses Google Gemini and works for most analyst tasks — SQL drafting, light analysis, chart selection. For longer-chain reasoning, plug in Anthropic Claude or OpenAI GPT with your own API key (BYOK). Premium model credits are on the roadmap.

Either way, the kernel runs locally. The model provider drafts what to do; your machine does it.

Where to read next

Agents guide — the practical side of using the agent panel.
Local-first — why the kernel running locally matters for the agent story.
Security — the full agent-safety story. We use providers' zero-retention modes where supported.