The .orchid format.
Notebooks are plain YAML files: readable, diffable, and as much a part of the product as the IDE.
The shape of a file
An .orchid file has three top-level keys: a version, a metadata block, and an ordered list of blocks. Each block is a typed cell — code, sql, markdown, or chart — with an id, a source, and (optionally) the last output it produced.
orchid: '1.0'
metadata:
  title: Monthly revenue review
  created: '2026-05-14T09:12:00Z'
  modified: '2026-05-14T14:38:00Z'
  python_version: '3.11'
blocks:
  - id: m1
    type: markdown
    source: "# Q1 revenue\n\nWeekly trend by channel."
  - id: q1
    type: sql
    integration: warehouse
    output_variable: rev_by_week
    source: |-
      SELECT date_trunc('week', created_at) AS w, channel, sum(amount) AS rev
      FROM orders
      WHERE created_at >= '2026-01-01'
      GROUP BY 1, 2 ORDER BY 1;
  - id: c1
    type: chart
    library: plotly
    source: |-
      import plotly.express as px
      fig = px.line(rev_by_week, x='w', y='rev', color='channel')
      fig

That's the whole format. There's no hidden binary, no auto-generated IDs that change on every save, no execution_count counters drifting every time you re-run a cell. A diff of two versions of this file shows you exactly what changed in the analysis.
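As a rough illustration of the shape described above, the following sketch checks an already-parsed notebook (a plain dict) for the three top-level keys and the four block types. The function name check_shape is hypothetical, and the rule that block ids must be unique is an assumption; neither is taken from Orchid itself.

```python
from typing import Any

# Block types and top-level keys named in the text above.
BLOCK_TYPES = {"code", "sql", "markdown", "chart"}
TOP_LEVEL_KEYS = {"orchid", "metadata", "blocks"}


def check_shape(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems: list[str] = []
    missing = TOP_LEVEL_KEYS - set(doc)
    if missing:
        problems.append(f"missing top-level keys: {sorted(missing)}")

    seen_ids: set[str] = set()
    for index, block in enumerate(doc.get("blocks") or []):
        block_id = block.get("id")
        if not block_id:
            problems.append(f"block {index}: missing id")
        elif block_id in seen_ids:
            # Assumption: ids are unique within a notebook.
            problems.append(f"block {index}: duplicate id {block_id!r}")
        else:
            seen_ids.add(block_id)
        if block.get("type") not in BLOCK_TYPES:
            problems.append(f"block {index}: unknown type {block.get('type')!r}")
        if "source" not in block:
            problems.append(f"block {index}: missing source")
    return problems
```

Returning a list of problems rather than raising on the first one suits a format meant to be edited by hand, since a reviewer sees every issue in one pass.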
Why YAML, not JSON
Jupyter chose JSON in 2011 and analysts have been paying for it ever since. JSON cell sources are escaped strings — newlines as \n, quotes as \". Code embedded that way is readable to a parser, not to a person; git diffs are noisy; merge conflicts are unfixable by hand.
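The escaping is easy to see with Python's standard json module. This is a minimal sketch that stores a whole query as one JSON string; it is not nbformat's exact on-disk layout, and the query is made up for illustration.

```python
import json

# A three-line SQL source containing double quotes.
source = 'SELECT channel, sum(amount) AS "rev"\nFROM orders\nGROUP BY 1;'

encoded = json.dumps({"source": source})
print(encoded)

# The encoded form has no real line breaks: every newline became the two
# characters backslash + n, and every double quote gained a backslash.
has_real_newline = "\n" in encoded

# The round trip is lossless, so the cost is readability, not correctness.
decoded = json.loads(encoded)["source"]
```

Because the query collapses onto a single physical line, a line-based diff flags the entire string when any one clause changes.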
YAML is human-shaped for the things we care about: multi-line strings with the | block scalar, dictionaries that look like config, ordered lists that look like ordered lists. The trade-off is that YAML is a more dangerous format in the abstract (the famous Norway problem, where a YAML 1.1 parser reads an unquoted no as the boolean false, plus indentation gotchas), but Orchid never hand-parses untrusted YAML. We use a strict loader and accept only the documented keys.
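One way to picture "strict" is a check that runs on each block after parsing. The sketch below is an assumption about what such a check could look like, not Orchid's loader: the allowlist is simply the set of keys that appear in this page's examples, and strict_block is a hypothetical name.

```python
from typing import Any

# Keys seen in this page's examples; the real allowlist is an assumption.
ALLOWED_BLOCK_KEYS = {
    "id", "type", "source", "outputs",
    "integration", "output_variable", "library", "language",
}
# Fields that must be strings. A YAML 1.1 parser turns an unquoted `no`
# into False, so a non-string here usually means a quoting mistake.
STRING_FIELDS = {"id", "type", "source"}


def strict_block(block: dict[str, Any]) -> dict[str, Any]:
    """Raise ValueError on unknown keys or coerced scalars, else return block."""
    unknown = set(block) - ALLOWED_BLOCK_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    for key in sorted(STRING_FIELDS & set(block)):
        value = block[key]
        if not isinstance(value, str):
            raise ValueError(
                f"{key} must be a string, got {type(value).__name__}"
            )
    return block
```

Rejecting unknown keys outright, instead of silently dropping them, keeps typos such as a misspelled output_variable from disappearing on the next save.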
Some tools encode notebooks as plain Python files (Marimo, or Jupytext's percent format with cell markers). That's great for pure code, but Orchid blends SQL, markdown, chart configs, and Python. You'd end up encoding everything as Python strings, which is JSON's problem in a different costume. YAML lets each cell type have its natural shape.
How outputs work
Each cell's last output is part of the notebook — that's how reopening a project shows yesterday's results instantly. But outputs can be big. A DataFrame preview is fine inline; a 200 MB result is not.
Small outputs (a number, a short table, a few KB of HTML) inline directly in the YAML. Larger outputs spill to .orchid/outputs/ next to the notebook, and the cell keeps a reference. The YAML file itself stays small and diff-friendly.
# Small output: inlined
- id: q1
  type: code
  language: python
  source: 'df.shape'
  outputs:
    - type: text
      data: '(1247, 8)'
# Large output: spilled to disk (.orchid/outputs/q2.parquet)
- id: q2
  type: sql
  output_variable: events
  source: 'SELECT * FROM events LIMIT 100000;'
  outputs:
    - type: dataframe
      summary: '100000 rows × 8 columns'
      variable: events
      data:
        truncated: true
        truncatedAt: 100000

The spill directory is gitignored by default. If you want outputs committed alongside source (useful for reviewing analysis on GitHub without re-running anything), un-ignore .orchid/outputs/. The format works either way; that's a project policy decision, not a format decision.
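The inline-or-spill decision can be sketched in a few lines. Everything specific here is an assumption made for illustration: the 4096-byte cutoff (the text only says "a few KB"), the store_output name, the returned dict layout, and the idea that small outputs are UTF-8 text.

```python
import tempfile
from pathlib import Path

# Assumed cutoff; the text above only says "a few KB" stays inline.
INLINE_LIMIT_BYTES = 4096


def store_output(notebook_dir: Path, block_id: str, payload: bytes,
                 suffix: str = ".bin") -> dict:
    """Inline small payloads; write large ones under .orchid/outputs/."""
    if len(payload) <= INLINE_LIMIT_BYTES:
        # Assumption: small outputs are UTF-8 text.
        return {"inline": True, "data": payload.decode("utf-8")}
    out_dir = notebook_dir / ".orchid" / "outputs"
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{block_id}{suffix}"
    target.write_bytes(payload)
    # Keep the reference relative so it survives moving the project folder.
    return {"inline": False,
            "path": target.relative_to(notebook_dir).as_posix()}


# Usage: a short text result stays inline, a large one is written to disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    small = store_output(root, "q1", b"(1247, 8)")
    large = store_output(root, "q2", b"x" * 10_000, suffix=".parquet")
    spilled_exists = (root / large["path"]).is_file()

print(small)
print(large)
```

Naming the spilled file after the block id means re-running a cell overwrites its previous output instead of accumulating files.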
Why git matters
Analytics work is software-shaped now. We write code, we ship it, we review each other's pull requests, we revert when something turns out to be wrong. The tooling for "your repo" — git, GitHub, code review — is the best collaboration system ever built for source.
Cloud notebooks broke that contract. You can't meaningfully diff a Jupyter notebook in a PR. You can't resolve a merge conflict in a cloud notebook. You can't bisect to find when a metric changed. So teams either give up on review (and their analysis quality slowly degrades) or they bolt nbdiff onto their CI and pretend it's fine.
Orchid notebooks just work in git. Two analysts edit two cells; the merge is automatic. Someone refactors a notebook; the diff is readable. The CTO asks why last quarter's revenue chart changed; the history is right there in git log. The format is the thing that makes that possible.
What the format doesn't do
A few deliberate omissions worth naming so you don't go looking for them.
- No execution count. Cells don't track how many times they've run. The number was always lying anyway — it counted since the kernel restart, not since the cell was written — and it made diffs noisy.
- No per-cell metadata bag. Jupyter cells have a free metadata dict that every tool writes garbage into. Orchid cells have typed fields for the things that need to exist; nothing else.
- No collapsed-state in source. Whether a cell is collapsed in your view is in your local IDE state, not the file. Different analysts can collapse different things without fighting.
Where to read next
- Projects on disk: what the rest of the project folder looks like.
- Notebooks guide: the practical side of writing in .orchid files.
- Cells & blocks: adding, ordering, and running cells in the IDE.