The .orchid format.

Notebooks are plain YAML files: readable, diffable, and as much a part of the product as the IDE.

The shape of a file

An .orchid file has three top-level keys: a version, a metadata block, and an ordered list of blocks. Each block is a typed cell — code, sql, markdown, or chart — with an id, a source, and (optionally) the last output it produced.

orchid: '1.0'
metadata:
  title: Monthly revenue review
  created: '2026-05-14T09:12:00Z'
  modified: '2026-05-14T14:38:00Z'
  python_version: '3.11'
blocks:
  - id: m1
    type: markdown
    source: "# Q1 revenue\n\nWeekly trend by channel."
  - id: q1
    type: sql
    integration: warehouse
    output_variable: rev_by_week
    source: |-
      SELECT date_trunc('week', created_at) AS w, channel, sum(amount) AS rev
      FROM orders
      WHERE created_at >= '2026-01-01'
      GROUP BY 1, 2 ORDER BY 1;
  - id: c1
    type: chart
    library: plotly
    source: |-
      import plotly.express as px
      fig = px.line(rev_by_week, x='w', y='rev', color='channel')
      fig

That's the whole format. There's no hidden binary, no auto-generated IDs that change on every save, no execution_count counters drifting every time you re-run a cell. A diff of two versions of this file shows you exactly what changed in the analysis.
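
To make the diff claim concrete, here is a small sketch using Python's standard difflib. The two notebook versions are made-up fragments in the shape shown above, and difflib stands in for git diff; it is an illustration, not how Orchid or git computes diffs.

```python
import difflib

# Two saved versions of a hypothetical notebook; only the WHERE clause differs.
before = """\
blocks:
  - id: q1
    type: sql
    source: |-
      SELECT channel, sum(amount) AS rev
      FROM orders
      WHERE created_at >= '2026-01-01'
      GROUP BY 1;
"""
after = before.replace("'2026-01-01'", "'2026-04-01'")

diff = list(difflib.unified_diff(
    before.splitlines(), after.splitlines(),
    fromfile="a/analysis.orchid", tofile="b/analysis.orchid",
    lineterm="", n=1,
))
print("\n".join(diff))

# Keep only the lines that actually changed, skipping the ---/+++ file headers.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

Because the source is stored as real lines rather than one escaped string, the only changed lines are the old and new WHERE clause.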

Why YAML, not JSON

The IPython notebook, which later became Jupyter, settled on JSON in 2011, and analysts have been paying for it ever since. JSON cell sources are escaped strings: newlines as \n, quotes as \". Code embedded that way is readable to a parser, not to a person; git diffs are noisy, and merge conflicts are painful to resolve by hand.
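
The escaping problem is easy to reproduce. The sketch below serializes one short query both ways. The JSON side is a simplified stand-in for a notebook cell (real notebook files often split the source into a list of line strings, each still escaped), and the YAML side is assembled by hand purely for illustration.

```python
import json

sql = '''SELECT "channel", sum(amount) AS rev
FROM orders
GROUP BY 1;'''

# JSON: the whole query collapses onto one line, with \n and \" escapes.
as_json = json.dumps({"source": sql})
print(as_json)

# YAML block scalar: each line of the query stays a line in the file.
# Built by hand here; a real writer would use a YAML library.
as_yaml = "source: |-\n" + "\n".join("  " + line for line in sql.splitlines())
print(as_yaml)
```

A one-word change to the query touches the single JSON line in its entirety, but only one line of the YAML form.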

YAML is human-shaped for the things we care about: multi-line strings with the | block scalar, dictionaries that look like config, ordered lists that look like ordered lists. The trade-off is that YAML is a more dangerous format in the abstract (the famous Norway problem, where an unquoted NO is read as a boolean, plus indentation gotchas). Orchid never parses untrusted YAML permissively: it uses a strict loader and accepts only the documented keys.
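
As a rough sketch of what "only the documented keys" could mean in practice, the check below runs on a notebook that has already been parsed into a dict (for instance with PyYAML's yaml.safe_load). The function name validate_notebook and the exact rules are assumptions for illustration; Orchid's real loader is not shown here. The key and type names come from the example file above.

```python
ALLOWED_TOP_LEVEL = {"orchid", "metadata", "blocks"}
ALLOWED_BLOCK_TYPES = {"code", "sql", "markdown", "chart"}


def validate_notebook(doc: dict) -> None:
    """Raise ValueError if a parsed notebook strays from the documented shape."""
    unknown = set(doc) - ALLOWED_TOP_LEVEL
    if unknown:
        raise ValueError(f"unknown top-level keys: {sorted(unknown)}")
    missing = ALLOWED_TOP_LEVEL - set(doc)
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    # An unquoted `orchid: 1.0` loads as a float, so insist on a string.
    if not isinstance(doc["orchid"], str):
        raise ValueError("version must be a quoted string such as '1.0'")
    for block in doc["blocks"]:
        if block.get("type") not in ALLOWED_BLOCK_TYPES:
            raise ValueError(
                f"block {block.get('id')!r}: unknown type {block.get('type')!r}"
            )


good = {
    "orchid": "1.0",
    "metadata": {"title": "Monthly revenue review"},
    "blocks": [{"id": "m1", "type": "markdown", "source": "# Q1 revenue"}],
}
validate_notebook(good)  # passes silently

try:
    validate_notebook(dict(good, orchid=1.0))  # what an unquoted version loads as
except ValueError as err:
    print(err)
```

This is also why the example file quotes '1.0' and '3.11': unquoted, a loader would hand back floats, and a version like 3.10 would silently become 3.1.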

Why not Python files

Some tools encode notebooks as Python files (Marimo, or Jupytext's percent format with cell markers). That's great for pure code, but Orchid blends SQL, markdown, chart configs, and Python, so you'd end up encoding everything as Python strings, which is JSON's problem in a different costume. YAML lets each cell type have its natural shape.

How outputs work

Each cell's last output is part of the notebook — that's how reopening a project shows yesterday's results instantly. But outputs can be big. A DataFrame preview is fine inline; a 200 MB result is not.

Small outputs (a number, a short table, a few KB of HTML) are inlined directly in the YAML. Larger outputs spill to .orchid/outputs/ next to the notebook, and the cell keeps a reference. The YAML file itself stays small and diff-friendly.

# Small output — inlined
- id: q1
  type: code
  language: python
  source: 'df.shape'
  outputs:
    - type: text
      data: '(1247, 8)'

# Large output — spilled to disk (.orchid/outputs/q2.parquet)
- id: q2
  type: sql
  output_variable: events
  source: 'SELECT * FROM events LIMIT 100000;'
  outputs:
    - type: dataframe
      summary: '100000 rows × 8 columns'
      variable: events
      data:
        truncated: true
        truncatedAt: 100000
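
The inline-or-spill rule can be sketched as a single size check. Everything specific here is an assumption: the 8 KB threshold (this page only says "a few KB"), the function name place_output, and the use of the cell id as the file name, which follows the q2.parquet comment in the example above. The sketch only makes the decision; it writes nothing to disk.

```python
from pathlib import PurePosixPath

INLINE_LIMIT_BYTES = 8 * 1024  # assumed cutoff, not a documented value
SPILL_DIR = PurePosixPath(".orchid/outputs")


def place_output(cell_id: str, payload: bytes, ext: str = "parquet"):
    """Decide where a serialized output goes.

    Returns ("inline", payload) for small outputs, or
    ("spill", path relative to the notebook) for large ones.
    """
    if len(payload) <= INLINE_LIMIT_BYTES:
        return ("inline", payload)
    return ("spill", SPILL_DIR / f"{cell_id}.{ext}")


small = place_output("q1", b"(1247, 8)")
large = place_output("q2", b"\0" * (200 * 1024))
print(small[0], large[0], large[1])
```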

The spill directory is gitignored by default. If you want outputs committed alongside source — useful for reviewing analysis on GitHub without re-running anything — un-ignore .orchid/outputs/. The format works either way; that's a project policy decision, not a format decision.

[Figure: file tree showing analysis.orchid as a small YAML file with most of its size in cell sources, alongside .orchid/outputs/ containing parquet files for the heavy results. Image: /docs-images/concepts/notebook-on-disk.png]
The notebook file itself stays small. Heavy outputs live next to it, optionally versioned.

Why git matters

Analytics work is software-shaped now. We write code, we ship it, we review each other's pull requests, we revert when something turns out to be wrong. The tooling for "your repo" — git, GitHub, code review — is the best collaboration system ever built for source.

Cloud notebooks broke that contract. You can't meaningfully diff a Jupyter notebook in a PR. You can't resolve a merge conflict in a cloud notebook. You can't bisect to find when a metric changed. So teams either give up on review (and their analysis quality slowly degrades) or they bolt nbdime's nbdiff onto their CI and pretend it's fine.

Orchid notebooks just work in git. Two analysts edit two different cells; in the usual case git merges the changes without a conflict. Someone refactors a notebook; the diff is readable. The CTO asks why last quarter's revenue chart changed; the history is right there in git log. The format is the thing that makes that possible.

What the format doesn't do

A few deliberate omissions worth naming so you don't go looking for them.

  • No execution count. Cells don't carry a run counter. In Jupyter the number was always misleading anyway: it is a kernel-wide counter that resets on restart, so it says nothing about the cell itself, and it made diffs noisy.
  • No per-cell metadata bag. Jupyter cells have a free metadata dict that every tool writes garbage into. Orchid cells have typed fields for the things that need to exist; nothing else.
  • No collapsed-state in source. Whether a cell is collapsed in your view is in your local IDE state, not the file. Different analysts can collapse different things without fighting.
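
One way to picture "typed fields, nothing else" is a frozen dataclass whose constructor rejects any key it doesn't know. The class below is a hypothetical model assembled from the fields that appear in the examples on this page. It is not Orchid's actual schema, and the Literal annotation documents the allowed types without enforcing them at runtime.

```python
from dataclasses import dataclass
from typing import Literal, Optional

BlockType = Literal["code", "sql", "markdown", "chart"]


@dataclass(frozen=True)
class Block:
    id: str
    type: BlockType
    source: str
    integration: Optional[str] = None      # seen on sql cells
    output_variable: Optional[str] = None  # seen on sql cells
    library: Optional[str] = None          # seen on chart cells
    language: Optional[str] = None         # seen on code cells
    outputs: tuple = ()


def block_from_dict(raw: dict) -> Block:
    # Keys with no matching field (execution_count, metadata, collapsed)
    # make the constructor raise TypeError instead of being stored.
    return Block(**raw)


ok = block_from_dict(
    {"id": "q1", "type": "sql", "source": "SELECT 1;", "output_variable": "one"}
)

rejected = None
try:
    block_from_dict(
        {"id": "q1", "type": "sql", "source": "SELECT 1;", "execution_count": 7}
    )
except TypeError as err:
    rejected = str(err)
print(rejected)
```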

Where to read next