Cache and runtime
Two things live outside the sheet so the sheet stays portable: the cache
and the runtime venv. Both are addressed by the contract’s id.
The cache
    <user-cache>/folio/<sheet-id>/cache/
    └── <hash[0:2]>/<hash>.json

Each cached derivation output is one file, addressed by the full
SHA-256 input_hash and sharded into 256 sub-directories by the first two
hex characters. Sharding keeps individual directories small (about 1k files
each on average for a 256k-entry cache).
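The layout above can be sketched as a small path helper. This is an illustration, not Folio's code: the function name `cache_entry_path` is hypothetical, and it assumes the hash is a lowercase hex digest as produced by `hashlib`.

```python
from pathlib import Path


def cache_entry_path(cache_dir: Path, input_hash: str) -> Path:
    """Return <cache_dir>/<hash[0:2]>/<hash>.json for a hex SHA-256 digest."""
    # Assumption: digests are 64 lowercase hex characters (hashlib's hexdigest form).
    if len(input_hash) != 64 or any(c not in "0123456789abcdef" for c in input_hash):
        raise ValueError("expected a 64-character lowercase hex SHA-256 digest")
    # The first two hex characters select one of 16 * 16 = 256 shard directories.
    return cache_dir / input_hash[:2] / f"{input_hash}.json"
```

With two hex characters per shard there are exactly 256 possible directories, which is where the per-directory estimate comes from.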
Each file is a small JSON envelope:
    { "values": {"country_code": "JP"}, "cost_usd": null }

Folio reads the envelope on cache hit and writes it on cache miss. The file format is internal; don’t depend on it from outside Folio.
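To make the read-on-hit, write-on-miss behaviour concrete, here is a minimal sketch of how an envelope like the one above could be written and read. The names `write_envelope` and `read_envelope` are hypothetical, and the temp-file-then-rename write is an assumption of this sketch, not something the text confirms about Folio.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_envelope(path: Path, values: dict, cost_usd: Optional[float] = None) -> None:
    """Write {"values": ..., "cost_usd": ...} to path via a temp file and rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"values": values, "cost_usd": cost_usd}, fh)
        # Assumption: rename-into-place so readers never see a half-written file.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


def read_envelope(path: Path) -> Optional[dict]:
    """Return the parsed envelope, or None when the entry does not exist (a miss)."""
    try:
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return None
```

Because the format is internal, code like this is only useful for understanding the mechanism; tools outside Folio should not parse these files.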
How input_hash is composed
The hash is over a canonical-JSON encoding (RFC 8785) of a deterministic structure built per kind:
- All kinds: the canonical JSON of every value listed in inputs.
- All kinds: the SHA-256 of the derivation file itself.
- All kinds: when inputs is empty, the calling record’s primary key value (so derivations with no inputs still produce per-record hashes).
Plus, kind-specific extras:
| Kind | Extra hashed input |
|---|---|
| ai | the resolved prompt_body (after {{ field }} substitution) |
| import | the SHA-256 of the source file’s bytes |
| python | the SHA-256 of the script file’s bytes |
| sql | (none beyond the derivation file) |
| http | the resolved URL and body (after substitution) |
| cross_sheet | the SHA-256 of the foreign sheet’s records.jsonl |
The complete recipe is in
src/folio/_cache.py:compute_input_hash.
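The composition described above can be sketched roughly as follows. This is not the real `compute_input_hash`: the function name `sketch_input_hash`, the structure's field names, and the `extras` parameter are all hypothetical, and `json.dumps` with sorted keys only approximates RFC 8785 (it matches for simple strings, integers, booleans, and nulls, but differs in number formatting and in key ordering for some non-ASCII keys).

```python
import hashlib
import json
from typing import Any, Optional


def canonical_json(obj: Any) -> bytes:
    # Approximation of RFC 8785: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sketch_input_hash(
    inputs: dict,
    derivation_bytes: bytes,
    primary_key: Any,
    extras: Optional[dict] = None,
) -> str:
    """Hash the inputs, the derivation file digest, and any kind-specific extras."""
    structure = {
        "inputs": inputs,                                     # hypothetical field name
        "derivation_sha256": sha256_hex(derivation_bytes),    # hypothetical field name
        "extras": extras or {},                               # e.g. a script file digest
    }
    if not inputs:
        # With no inputs, the primary key keeps hashes distinct per record.
        structure["primary_key"] = primary_key
    return sha256_hex(canonical_json(structure))
```

In this sketch, changing an input value, the derivation bytes, or an extra each produces a different digest, which is the property the cache relies on.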
When entries are written
On every successful execution of a kind:
    result = execute_kind(...)
    write_cache(cache_root, input_hash, {"values": result, "cost_usd": cost})
    update_records_jsonl(...)
    append_provenance(...)

Folio writes the cache before updating records and before appending provenance. If the records or provenance write fails, the cache is still warm: re-running materialize hits the cache and re-attempts the records/provenance step.
When entries are read
Cache reads happen at the start of each derivation × record iteration:
    existing = read_cache(cache_root, input_hash)
    if existing is not None and not force:
        skipped += len(targets)
        continue

A cache hit avoids the kind execution entirely: no script subprocess, no HTTP call, no AI request, no foreign-sheet read.
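Putting the read and write steps together, the hit/miss flow might look like the following toy loop. It is a simplified sketch under stated assumptions: the cache is an in-memory dict rather than files, `materialize_sketch` and `execute` are hypothetical names, each record counts as one target, and the records and provenance writes are left out.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def materialize_sketch(
    work: Iterable[Tuple[str, Any]],
    cache: Dict[str, dict],
    execute: Callable[[Any], dict],
    force: bool = False,
) -> Tuple[int, int]:
    """Return (executed, skipped) counts for (input_hash, record) pairs."""
    executed = skipped = 0
    for input_hash, record in work:
        existing = cache.get(input_hash)
        if existing is not None and not force:
            skipped += 1  # cache hit: the kind is not executed at all
            continue
        values = execute(record)  # cache miss, or force=True
        # The cache entry is stored before any other bookkeeping would happen.
        cache[input_hash] = {"values": values, "cost_usd": None}
        executed += 1
    return executed, skipped
```

Running the loop twice over the same work list executes everything the first time and skips everything the second time; passing `force=True` re-executes and overwrites the entries without deleting them first.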
When entries become stale
They don’t, automatically. The cache is content-addressed: an entry is stale only when the inputs that hashed to its key no longer hash to it. Typical triggers:
- Edit the input field’s value → hash flips → next materialize misses.
- Edit the derivation file → hash flips → next materialize misses.
- Edit the script (for python) or source file (for import) → hash flips.
- Edit the foreign sheet’s records.jsonl (for cross_sheet) → hash flips for every calling record.
If you want to discard the cache, delete the directory:
    rm -rf "$(python -c 'import platformdirs; print(platformdirs.user_cache_dir("folio"))')/<sheet-id>/cache"

folio materialize --force ignores cache hits without deleting them, which
is usually what you want.
The runtime venv
Folio creates a per-sheet venv when scripts/requirements.txt exists.
    <user-cache>/folio/<sheet-id>/runtime/
    └── venv/
        ├── bin/python
        └── lib/python3.13/site-packages/...

The venv is created on the first script execution and reused on subsequent runs. Folio:
- Resolves <user-cache>/folio/<sheet-id>/runtime/venv/bin/python.
- If absent, runs python -m venv <path> then <path>/bin/pip install -r scripts/requirements.txt.
- Invokes scripts via that interpreter.
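The three steps above could be sketched like this. It is an illustration rather than Folio's implementation: `ensure_venv` is a hypothetical name, the sketch assumes a POSIX venv layout (`bin/python`, `bin/pip`), and it uses the interpreter running the sketch as the host Python.

```python
import subprocess
import sys
from pathlib import Path


def ensure_venv(venv_dir: Path, requirements: Path) -> Path:
    """Return the venv's interpreter, creating and populating the venv if absent."""
    # Assumption: POSIX layout. On Windows the interpreter lives under Scripts\.
    python = venv_dir / "bin" / "python"
    if not python.exists():
        # Step 2a: create the venv with the host interpreter.
        subprocess.run([sys.executable, "-m", "venv", str(venv_dir)], check=True)
        # Step 2b: install the sheet's requirements into it.
        subprocess.run(
            [str(venv_dir / "bin" / "pip"), "install", "-r", str(requirements)],
            check=True,
        )
    # Step 3 is left to the caller: run scripts with the returned interpreter.
    return python
```

Note that the existence check looks only at the interpreter path, so in this sketch an existing venv is reused even if requirements.txt has changed since it was built.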
Bash scripts (.sh) don’t go through the venv — they execute under the
host shell.
If you change scripts/requirements.txt, delete the venv directory to
force a clean install:
    rm -rf "<user-cache>/folio/<sheet-id>/runtime/venv"

Folio doesn’t ship requirements.txt-hashing for venv invalidation yet
(FOLIO-H-006 is a placeholder). Removing the directory
is the supported workaround.
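Until hashing-based invalidation exists, the manual workaround could be automated on the user side with something like the sketch below. Everything here is hypothetical and not part of Folio: the function name, and especially the `requirements.sha256` stamp file, which this sketch places next to the venv directory purely for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def drop_venv_if_requirements_changed(venv_dir: Path, requirements: Path) -> bool:
    """Delete venv_dir when requirements changed since the last call.

    Returns True when a change was detected, False otherwise.
    """
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    stamp = venv_dir.parent / "requirements.sha256"  # hypothetical stamp file
    previous = stamp.read_text().strip() if stamp.exists() else None
    # The first call has nothing to compare against, so it only records the digest.
    changed = previous is not None and previous != digest
    if changed and venv_dir.exists():
        shutil.rmtree(venv_dir)  # same effect as the rm -rf workaround
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(digest)
    return changed
```

Calling this before a materialize run would remove the venv only when requirements.txt differs from the last recorded digest, after which the venv is rebuilt on the next script execution.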
Why outside the sheet
This is ADR-0008. Two reasons:
- Portability. The sheet stays tar-portable.
- Per-environment correctness. The cache and venv depend on the host Python, the host filesystem, and (for ai) the host’s API access. Baking them into the sheet would tie the sheet to one environment.
The trade-off is that two checkouts of the same sheet on the same machine
share a cache and venv (because they share <sheet-id>). This is usually
what you want; if you really want isolated caches, change the contract’s
id (and accept that the cache is now per-fork).