Architecture
Folio’s architecture follows one rule: the sheet is the system of record. Everything else — caches, runtimes, the Viewer, the MCP server — is derived or transient.
What’s inside a sheet
    my-sheet/
    ├── contract.yaml          # ODCS subset, declarative schema
    ├── records.jsonl          # data
    ├── derivations/           # rules that fill x-derived fields
    │   └── *.yaml
    ├── scripts/               # reusable Python/bash helpers
    │   ├── *.py
    │   └── requirements.txt   # optional: triggers a per-sheet venv
    ├── provenance.jsonl       # append-only audit log (Folio writes this)
    ├── README.md              # optional, with typed frontmatter
    └── .lock                  # single-writer lock (filelock; Folio manages it)

Everything in this list is part of the sheet. All of it ships in the tarball. Two specific things you might expect to be here are deliberately not:
- No cache files. The cache lives at <user-cache>/folio/<sheet-id>/cache/. See ADR-0008.
- No virtualenv. When scripts/requirements.txt exists, Folio creates a venv at <user-cache>/folio/<sheet-id>/runtime/venv/. The sheet stays clean.
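The split between sheet contents and derived state can be sketched in a few lines. This is an illustration only: `derived_paths` and `needs_venv` are hypothetical helper names, and how Folio actually resolves `<user-cache>` is not specified here, so the caller passes it in.

```python
from pathlib import Path


def derived_paths(user_cache, sheet_id):
    """Return the out-of-sheet locations for a sheet's cache and venv.

    `user_cache` stands in for <user-cache>; resolving it per platform
    is left to the caller (an assumption of this sketch).
    """
    base = Path(user_cache) / "folio" / sheet_id
    return {
        "cache": base / "cache",
        "venv": base / "runtime" / "venv",
    }


def needs_venv(sheet_dir):
    """A per-sheet venv is only relevant when scripts/requirements.txt exists."""
    return (Path(sheet_dir) / "scripts" / "requirements.txt").is_file()
```

Both functions only compute paths or inspect the sheet; neither writes anything into the sheet directory, which is the property the two bullets above describe.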
The four surfaces
                       +---------------------+
                       |  contract.yaml      |
                       |  records.jsonl      |   sheet (1 directory)
                       |  derivations/       |
                       |  provenance.jsonl   |
                       +---------------------+
                                  ▲
                                  │  same SDK, same files
        ┌────────────────┬────────┼────────┬─────────────────┐
        │                │                 │                 │
    ┌────────┐      ┌─────────┐       ┌──────────┐     ┌────────────┐
    │  CLI   │      │ Python  │       │   MCP    │     │   Viewer   │
    │ folio  │      │  SDK    │       │  folio-  │     │   folio-   │
    │        │      │ (folio) │       │   mcp    │     │   viewer   │
    └────────┘      └─────────┘       └──────────┘     └────────────┘
    shell flows     application       agents through   humans through
                    code & tests      MCP runtimes     127.0.0.1 only

The Python SDK is the only place that touches files directly. The other three surfaces import the SDK and project it onto their transport:
- CLI: Typer commands per verb, JSON to stdout.
- MCP: FastMCP tools, structured payloads.
- Viewer: FastAPI routes per Sheet method, plus an in-process EventBus that streams materialize lifecycle frames over SSE.
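The "one SDK, several projections" idea can be illustrated without any of the real frameworks. In the sketch below, `SheetStub` and its `list_records` method are hypothetical stand-ins for the SDK's Sheet class (whose real methods may differ), and plain functions stand in for the Typer command and the FastMCP tool so the example stays dependency-free.

```python
import json


class SheetStub:
    """Hypothetical stand-in for the SDK's Sheet; not the real API."""

    def __init__(self, records):
        self._records = list(records)

    def list_records(self):
        # In the real SDK this is where file access would happen.
        return list(self._records)


def cli_list(sheet):
    """CLI-style projection: the SDK result as JSON on stdout."""
    print(json.dumps({"records": sheet.list_records()}))


def tool_list(sheet):
    """MCP-style projection: the same result as a structured payload."""
    return {"records": sheet.list_records()}
```

Neither projection opens a file itself; both call the same method and differ only in how the result is delivered.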
Single-writer semantics
A sheet is single-writer through a .lock file (filelock, 30 s timeout, see
ADR-0006). All write operations — upsert_records,
delete_records, materialize — acquire the lock before reading
records.jsonl, write atomically (temp file + rename), then release.
Reads do not take the lock. Reads see whichever atomic version was committed most recently.
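A minimal sketch of the lock, read, write-temp, rename sequence described above. The function name, the `id` key, and the merge strategy are assumptions made for illustration, not Folio's implementation. To keep the sketch standard-library only, the lock is passed in as any context manager; with the filelock package that would be something like `FileLock(sheet_dir / ".lock", timeout=30)`.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def upsert_records_sketch(sheet_dir, new_records, key="id", lock=None):
    """Merge new_records into records.jsonl under a single-writer lock."""
    sheet_dir = Path(sheet_dir)
    target = sheet_dir / "records.jsonl"
    # nullcontext is a placeholder; a real writer would pass a file lock.
    lock = lock if lock is not None else contextlib.nullcontext()
    with lock:
        # Read only after the lock is held, so no other writer can interleave.
        merged = {}
        if target.exists():
            with target.open(encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        record = json.loads(line)
                        merged[record[key]] = record
        for record in new_records:
            merged[record[key]] = record
        # Write to a temp file in the same directory, then rename over the
        # target; readers see either the old file or the new one, never a mix.
        fd, tmp_name = tempfile.mkstemp(dir=sheet_dir, prefix="records.", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                for record in merged.values():
                    tmp.write(json.dumps(record) + "\n")
            os.replace(tmp_name, target)
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp_name)
            raise
    return list(merged.values())
```

Because the swap is a single rename, a reader that opens records.jsonl without the lock gets one complete version of the file, which is why reads can skip the lock.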
Materialize loop
             ┌──────────────────────────────────────┐
             │ derivations/* (topologically sorted) │
             └──────────────────────────────────────┘
                                 │
                     for each derivation file:
                                 │
                      for each target record:
                                 │
            ┌────────────────────┼────────────────────┐
            │                    │                    │
      input_hash =          cache hit?          execute kind
    sha256(canonical       (yes → skip)        (ai/import/...)
     JSON of inputs)             │                    │
                                 ▼                    ▼
                           update record   append provenance line
                                                cache result

Failures are reported per record × field on the §10.6 envelope, not raised, so a 5,000-row materialize doesn’t abort on one bad row.
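The loop can be sketched for a single derivation as follows. Everything here is an assumption made for illustration: the function names, the `id` field, the in-memory dict standing in for the on-disk cache, and the shape of the returned dict (the actual §10.6 envelope is not reproduced). Sorted keys with compact separators is one reasonable canonical JSON form, not necessarily the one Folio uses.

```python
import hashlib
import json


def input_hash(inputs):
    """sha256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def materialize_sketch(records, field, input_fields, execute, cache):
    """Fill `field` on each record; collect failures instead of raising."""
    failures = []
    provenance = []
    for record in records:
        inputs = {name: record.get(name) for name in input_fields}
        digest = input_hash(inputs)
        if digest in cache:
            # Cache hit: skip execution. Reusing the cached value on the
            # record is an assumption of this sketch.
            record[field] = cache[digest]
            continue
        try:
            value = execute(inputs)
        except Exception as exc:
            # One bad row is recorded per record x field and the loop goes on.
            failures.append({"record": record.get("id"), "field": field, "error": str(exc)})
            continue
        record[field] = value
        provenance.append({"record": record.get("id"), "field": field, "input_hash": digest})
        cache[digest] = value
    return {"failures": failures, "provenance": provenance}
```

Keying the cache on the hash of the inputs rather than on the record means two records with identical inputs share one execution, and a failed row never leaves a cache entry behind.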
Where things live (cheat sheet)
| Item | Location | Why |
|---|---|---|
| Contract, records, derivations, scripts | inside the sheet | the sheet is the system of record |
| Provenance | inside the sheet | tamper-evident; ships with the data |
| Cache (per derivation × input_hash) | <user-cache>/folio/<sheet-id>/cache/ | not deterministic; recoverable; ADR‑0008 |
| Per-sheet runtime venv | <user-cache>/folio/<sheet-id>/runtime/ | environment-dependent; ADR‑0008 |
| Lock | inside the sheet (.lock) | tied to the writer; cleared between processes |
If you can’t redo it, it’s in the sheet. If you can recompute it, it’s not.