Architecture

Folio’s architecture follows one rule: the sheet is the system of record. Everything else — caches, runtimes, the Viewer, the MCP server — is derived or transient.

What’s inside a sheet

my-sheet/
├── contract.yaml          # ODCS subset, declarative schema
├── records.jsonl          # data
├── derivations/           # rules that fill x-derived fields
│   └── *.yaml
├── scripts/               # reusable Python/bash helpers
│   ├── *.py
│   └── requirements.txt   # optional: triggers a per-sheet venv
├── provenance.jsonl       # append-only audit log (Folio writes this)
├── README.md              # optional, with typed frontmatter
└── .lock                  # single-writer lock (filelock; Folio manages it)

Everything in this list is part of the sheet, and all of it ships in the tarball. Two things you might expect to find here are deliberately kept out:

No cache files. The cache lives at <user-cache>/folio/<sheet-id>/cache/. See ADR-0008.
No virtualenv. When scripts/requirements.txt exists, Folio creates a venv at <user-cache>/folio/<sheet-id>/runtime/venv/. The sheet stays clean.
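The layout above can be sketched as a small path helper. This is an illustration only: how Folio resolves `<user-cache>` and derives `<sheet-id>` is not stated here, so both are taken as plain parameters, and the function names `derived_paths` and `needs_venv` are hypothetical.

```python
from pathlib import Path


def derived_paths(user_cache: Path, sheet_id: str) -> dict[str, Path]:
    """Locations of derived state, all outside the sheet directory."""
    base = user_cache / "folio" / sheet_id
    return {
        "cache": base / "cache",
        "venv": base / "runtime" / "venv",
    }


def needs_venv(sheet_dir: Path) -> bool:
    """A per-sheet venv is only relevant when scripts/requirements.txt exists."""
    return (sheet_dir / "scripts" / "requirements.txt").is_file()


# Usage with made-up values for the two unspecified inputs.
paths = derived_paths(Path("/home/me/.cache"), "my-sheet")
```

Nothing returned by `derived_paths` points inside the sheet, which is the property the two exclusions are meant to guarantee.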

The four surfaces

                        +---------------------+
                        |   contract.yaml     |
                        |   records.jsonl     |    sheet (1 directory)
                        |   derivations/      |
                        |   provenance.jsonl  |
                        +---------------------+
                                  ▲
                                  │ same SDK, same files
        ┌────────────────┬────────┼────────┬─────────────────┐
        │                │                 │                 │
   ┌────────┐      ┌─────────┐       ┌──────────┐     ┌────────────┐
   │  CLI   │      │ Python  │       │   MCP    │     │   Viewer   │
   │ folio  │      │   SDK   │       │ folio-   │     │  folio-    │
   │        │      │ (folio) │       │  mcp     │     │  viewer    │
   └────────┘      └─────────┘       └──────────┘     └────────────┘
   shell flows     application       agents through   humans through
                   code & tests      MCP runtimes     127.0.0.1 only

The Python SDK is the only place that touches files directly. The other three surfaces import the SDK and project it onto their transport:

CLI — Typer commands per verb, JSON to stdout.
MCP — FastMCP tools, structured payloads.
Viewer — FastAPI routes per Sheet method, plus an in-process EventBus that streams materialize lifecycle frames over SSE.
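The idea of projecting one SDK onto several transports can be sketched with the standard library alone. Everything below is a stand-in: the `Sheet` class is an in-memory toy rather than Folio's real class, the return payload shape is invented, and `cli_upsert` and `tool_upsert` are hypothetical names that only mimic what a Typer command or a FastMCP tool would wrap.

```python
import json


class Sheet:
    """Toy stand-in for the SDK object: the only layer that would touch files."""

    def __init__(self) -> None:
        self._records: dict = {}

    def upsert_records(self, records: list[dict]) -> dict:
        for record in records:
            self._records[record["id"]] = record
        return {"upserted": len(records)}


def cli_upsert(sheet: Sheet, raw_json: str) -> str:
    """CLI-style projection: text in, JSON text for stdout out."""
    return json.dumps(sheet.upsert_records(json.loads(raw_json)))


def tool_upsert(sheet: Sheet, records: list[dict]) -> dict:
    """MCP-style projection: structured payload in, structured payload out."""
    return sheet.upsert_records(records)


# Usage: both adapters reach the same method on the same object.
sheet = Sheet()
cli_out = cli_upsert(sheet, '[{"id": 1, "name": "ada"}]')
tool_out = tool_upsert(sheet, [{"id": 2, "name": "lin"}])
```

The adapters hold no logic of their own, so a behaviour change in the SDK method shows up on every surface at once.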

Single-writer semantics

A sheet allows one writer at a time, enforced by a .lock file (filelock, 30 s timeout; see ADR-0006). Every write operation (upsert_records, delete_records, materialize) acquires the lock before reading records.jsonl, writes atomically (temp file + rename), then releases the lock.

Reads do not take the lock. Reads see whichever atomic version was committed most recently.
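The temp file + rename step is what makes lock-free reads safe. A minimal sketch, using only the standard library, is shown here. It is not Folio's implementation: the function names are hypothetical, and the lock itself is left out (in Folio the write would run while the `.lock` is held).

```python
import json
import os
import tempfile
from pathlib import Path


def write_records_atomically(sheet_dir: Path, records: list[dict]) -> None:
    """Replace records.jsonl in one step so readers never see a partial file."""
    target = sheet_dir / "records.jsonl"
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=sheet_dir, prefix=".records-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for record in records:
                tmp.write(json.dumps(record, sort_keys=True) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)  # atomic swap of old file for new
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # never leave a half-written temp file behind
        raise


def read_records(sheet_dir: Path) -> list[dict]:
    """Lock-free read: sees whichever complete file was renamed into place last."""
    with open(sheet_dir / "records.jsonl", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Usage: two successive commits in a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
write_records_atomically(demo_dir, [{"id": 1, "name": "ada"}])
write_records_atomically(demo_dir, [{"id": 1, "name": "ada"}, {"id": 2, "name": "lin"}])
committed = read_records(demo_dir)
leftovers = sorted(p.name for p in demo_dir.iterdir())
```

A reader that opens the file just before the swap keeps reading the previous complete version, and one that opens it just after gets the new one. There is no moment at which a half-written records.jsonl is visible.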

Materialize loop

                    ┌──────────────────────────────────────┐
                    │ derivations/* (topologically sorted) │
                    └──────────────────────────────────────┘
                                       │
                       for each derivation file:
                                       │
                       for each target record:
                                       │
                  ┌────────────────────┼────────────────────┐
                  │                    │                    │
            input_hash =          cache hit?            execute kind
            sha256(canonical       (yes → skip)         (ai/import/...)
            JSON of inputs)             │                    │
                                        ▼                    ▼
                                                       update record
                                                  append provenance line
                                                       cache result

Failures are not raised. They are reported per record × field on the §10.6 envelope, so a 5,000-row materialize doesn’t abort on one bad row.
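The loop in the diagram can be sketched in a few lines of Python. Several details are assumptions made for illustration: the canonical JSON form (sorted keys, compact separators), the dict shape of a derivation, the in-memory `cache` and `provenance` containers, and the fields of each failure entry, which do not reproduce the actual §10.6 envelope.

```python
import hashlib
import json


def input_hash(inputs: dict) -> str:
    """sha256 over canonical JSON (sorted keys, compact separators: an assumption)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def materialize(records, derivations, cache, provenance):
    """Run each derivation over each record; collect failures instead of raising."""
    failures = []
    for deriv in derivations:  # assumed to arrive topologically sorted
        for record in records:
            inputs = {name: record.get(name) for name in deriv["inputs"]}
            digest = input_hash(inputs)
            key = (deriv["name"], digest)
            if key in cache:
                continue  # cache hit: skip, as in the diagram
            try:
                value = deriv["run"](inputs)
            except Exception as exc:
                # One bad row is recorded and the loop moves on.
                failures.append(
                    {"record": record["id"], "field": deriv["target"], "error": repr(exc)}
                )
                continue
            record[deriv["target"]] = value
            provenance.append(
                {"record": record["id"], "field": deriv["target"],
                 "derivation": deriv["name"], "input_hash": digest}
            )
            cache[key] = value
    return failures


# Usage: the middle record has no name, so its derivation fails.
def shout(inputs):
    return inputs["name"].upper()


records = [{"id": 1, "name": "ada"}, {"id": 2, "name": None}, {"id": 3, "name": "lin"}]
derivations = [{"name": "shout", "inputs": ["name"], "target": "loud", "run": shout}]
cache, provenance = {}, []
failures = materialize(records, derivations, cache, provenance)
```

Running `materialize` a second time with the same `cache` skips records 1 and 3 and retries only record 2, because failed results are never cached in this sketch.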

Where things live (cheat sheet)

Item	Location	Why
Contract, records, derivations, scripts	inside the sheet	the sheet is the system of record
Provenance	inside the sheet	tamper-evident; ships with the data
Cache (per derivation × input_hash)	`<user-cache>/folio/<sheet-id>/cache/`	not deterministic; recoverable; ADR-0008
Per-sheet runtime venv	`<user-cache>/folio/<sheet-id>/runtime/`	environment-dependent; ADR-0008
Lock	inside the sheet (`.lock`)	tied to the writer; cleared between processes

If you can’t redo it, it’s in the sheet. If you can recompute it, it’s not.