
Architecture

Folio’s architecture follows one rule: the sheet is the system of record. Everything else — caches, runtimes, the Viewer, the MCP server — is derived or transient.

What’s inside a sheet

my-sheet/
├── contract.yaml # ODCS subset, declarative schema
├── records.jsonl # data
├── derivations/ # rules that fill x-derived fields
│ └── *.yaml
├── scripts/ # reusable Python/bash helpers
│ ├── *.py
│ └── requirements.txt # optional: triggers a per-sheet venv
├── provenance.jsonl # append-only audit log (Folio writes this)
├── README.md # optional, with typed frontmatter
└── .lock # single-writer lock (filelock; Folio manages it)

Everything in this list is part of the sheet. All of it ships in the tarball. Two specific things you might expect to be here are deliberately not:

  • No cache files. The cache lives at <user-cache>/folio/<sheet-id>/cache/. See ADR-0008.
  • No virtualenv. When scripts/requirements.txt exists, Folio creates a venv at <user-cache>/folio/<sheet-id>/runtime/venv/. The sheet stays clean.
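
The two out-of-sheet locations above share one per-sheet root. A minimal sketch of that layout follows; `derived_paths` is a hypothetical helper, not part of Folio, and it takes the user cache root as an argument because this page does not say how `<user-cache>` is resolved.

```python
from pathlib import Path


def derived_paths(user_cache, sheet_id):
    """Return the out-of-sheet locations for one sheet (hypothetical helper)."""
    # Assumption: both directories hang off <user-cache>/folio/<sheet-id>/.
    root = Path(user_cache) / "folio" / sheet_id
    return {
        "cache": root / "cache",           # derivation results, recomputable
        "venv": root / "runtime" / "venv", # created only if scripts/requirements.txt exists
    }


paths = derived_paths("/tmp/user-cache", "my-sheet")
print(paths["cache"])
print(paths["venv"])
```

Because neither path is inside the sheet directory, deleting the whole `<user-cache>/folio/<sheet-id>/` tree should cost only recomputation, never data.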

The four surfaces

                +---------------------+
                | contract.yaml       |
                | records.jsonl       |  sheet (1 directory)
                | derivations/        |
                | provenance.jsonl    |
                +---------------------+
                           │ same SDK, same files
    ┌──────────────┬───────┴──────┬────────────────┐
    │              │              │                │
┌────────┐    ┌─────────┐    ┌──────────┐    ┌────────────┐
│  CLI   │    │ Python  │    │   MCP    │    │   Viewer   │
│ folio  │    │   SDK   │    │  folio-  │    │   folio-   │
│        │    │ (folio) │    │   mcp    │    │   viewer   │
└────────┘    └─────────┘    └──────────┘    └────────────┘
shell flows   application    agents through  humans through
              code & tests   MCP runtimes    127.0.0.1 only

The Python SDK is the only place that touches files directly. The other three surfaces import the SDK and project it onto their transport:

  • CLI — Typer commands per verb, JSON to stdout.
  • MCP — FastMCP tools, structured payloads.
  • Viewer — FastAPI routes per Sheet method, plus an in-process EventBus that streams materialize lifecycle frames over SSE.
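
The "one SDK, thin projections" idea can be sketched with the standard library alone. Everything named here is hypothetical: the `Sheet` class, its `count_records` method, the `count` verb, and the use of `argparse` as a stand-in for Typer. The point is only that the adapters hold no file logic of their own.

```python
import argparse
import json
from pathlib import Path


class Sheet:
    """Hypothetical stand-in for the SDK object; the only code that opens files."""

    def __init__(self, path):
        self.path = Path(path)

    def count_records(self):
        records = self.path / "records.jsonl"
        if not records.exists():
            return 0
        with records.open(encoding="utf-8") as fh:
            return sum(1 for line in fh if line.strip())


def cli(argv):
    """CLI-style projection: parse arguments, call the SDK, return JSON for stdout."""
    parser = argparse.ArgumentParser(prog="folio-sketch")
    parser.add_argument("verb", choices=["count"])
    parser.add_argument("sheet")
    args = parser.parse_args(argv)
    result = {"verb": args.verb, "count": Sheet(args.sheet).count_records()}
    return json.dumps(result)


def tool_count(sheet):
    """Tool-style projection: same SDK call, structured payload instead of text."""
    return {"count": Sheet(sheet).count_records()}
```

Both adapters call the same method, so a behavior change in the SDK reaches every surface without touching transport code.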

Single-writer semantics

A sheet is single-writer through a .lock file (filelock, 30 s timeout, see ADR-0006). All write operations — upsert_records, delete_records, materialize — acquire the lock before reading records.jsonl, write atomically (temp file + rename), then release.

Reads do not take the lock. Reads see whichever atomic version was committed most recently.
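
A minimal sketch of that write path, using only the standard library. The function name, the `id` key, and the `lock` parameter are assumptions made for illustration; the real `upsert_records` may differ in every detail except the order of steps described above (lock, read, write temp file, rename, release).

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def upsert_records_sketch(sheet_dir, new_records, key="id", lock=None):
    """Hypothetical write path: lock, read, merge, write temp file, rename."""
    sheet = Path(sheet_dir)
    target = sheet / "records.jsonl"
    # Assumption: a caller supplies the real lock as a context manager.
    lock = lock if lock is not None else contextlib.nullcontext()
    with lock:
        # Read only after the lock is held, so the merge sees the latest commit.
        merged = {}
        if target.exists():
            with target.open(encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        rec = json.loads(line)
                        merged[rec[key]] = rec
        for rec in new_records:
            merged[rec[key]] = rec
        # The temp file lives in the sheet directory so the rename stays on one filesystem.
        fd, tmp_name = tempfile.mkstemp(dir=sheet, prefix="records.", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                for rec in merged.values():
                    tmp.write(json.dumps(rec, sort_keys=True) + "\n")
                tmp.flush()
                os.fsync(tmp.fileno())
            # Atomic swap: a lock-free reader sees the old file or the new one, never a mix.
            os.replace(tmp_name, target)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
    return len(merged)
```

With the `filelock` package installed, passing `lock=filelock.FileLock(Path(sheet_dir) / ".lock", timeout=30)` would match the lock file and timeout named above. The rename is what lets reads skip the lock: they open whichever complete file is in place at that moment.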

Materialize loop

┌──────────────────────────────────────┐
│ derivations/* (topologically sorted) │
└──────────────────────────────────────┘
for each derivation file:
  for each target record:
         ┌────────────────────┼────────────────────┐
         │                    │                    │
   input_hash =          cache hit?          execute kind
   sha256(canonical     (yes → skip)        (ai/import/...)
   JSON of inputs)            │                    │
                              ▼                    ▼
                                            update record
                                            append provenance line
                                            cache result

Failures are reported per record × field on the §10.6 envelope, not raised — so a 5,000-row materialize doesn’t abort on one bad row.
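
The loop can be sketched in a few dozen lines. This is an illustration under stated assumptions, not Folio's implementation: the derivation dict shape (`needs`, `inputs`, `target`, `run`), the in-memory `cache` and `provenance` containers, the failure dict fields, and the choice of sorted-key compact JSON as the "canonical" form are all hypothetical. On a cache hit the sketch reuses the cached value instead of executing, which is one reading of "skip".

```python
import hashlib
import json
from graphlib import TopologicalSorter


def input_hash(inputs):
    """sha256 over a canonical JSON form (assumption: sorted keys, compact separators)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def materialize_sketch(records, derivations, cache, provenance):
    """Run every derivation over every record; collect failures instead of raising."""
    failures = []
    # Dependencies first: each node maps to the derivations it needs.
    graph = {name: d["needs"] for name, d in derivations.items()}
    for name in TopologicalSorter(graph).static_order():
        d = derivations[name]
        for rec in records:
            inputs = {field: rec.get(field) for field in d["inputs"]}
            digest = input_hash(inputs)
            key = (name, digest)  # cache is per derivation x input_hash
            if key in cache:
                rec[d["target"]] = cache[key]  # hit: no execution, no new provenance
                continue
            try:
                value = d["run"](inputs)
            except Exception as exc:
                # One bad record x field is reported; the loop keeps going.
                failures.append(
                    {"record": rec.get("id"), "field": d["target"], "error": str(exc)}
                )
                continue
            rec[d["target"]] = value
            provenance.append(
                {"record": rec.get("id"), "field": d["target"],
                 "derivation": name, "input_hash": digest}
            )
            cache[key] = value
    return failures
```

Keying the cache on the hash of the inputs, not on the record, means a record is recomputed exactly when something its derivation reads has changed.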

Where things live (cheat sheet)

Item                                    | Location                               | Why
----------------------------------------|----------------------------------------|------------------------------------------------
Contract, records, derivations, scripts | inside the sheet                       | the sheet is the system of record
Provenance                              | inside the sheet                       | tamper-evident; ships with the data
Cache (per derivation × input_hash)     | <user-cache>/folio/<sheet-id>/cache/   | not deterministic; recoverable; ADR-0008
Per-sheet runtime venv                  | <user-cache>/folio/<sheet-id>/runtime/ | environment-dependent; ADR-0008
Lock                                    | inside the sheet (.lock)               | tied to the writer; cleared between processes

If you can’t redo it, it’s in the sheet. If you can recompute it, it’s not.