cross_sheet derivation
cross sheet joins a sheet to a sibling sheet on the calling sheet’s primary key. Use it when two sheets share an identifier 1:1 — the canonical pattern is a financial sidecar:
customers/ customer-revenue/├── contract.yaml ──→ ├── contract.yaml├── records.jsonl ├── records.jsonl (id, revenue_usd, ...)└── derivations/ └── revenue.yaml ◀── reads from ../customer-revenueMinimal example
targets: [current_revenue_usd]inputs: [] # join is by PK, no other inputskind: cross_sheetsource_sheet: ../customer-revenue # path relative to this sheetkey_field: id # name of the field in the foreign sheetvalue_field: revenue_usd # name of the field whose value we copyFor each customers record (e.g. cust_001), Folio:
- Loads
../customer-revenue/records.jsonl. - Looks for a row where
<key_field> == cust_001. - Copies
value_fieldtocurrent_revenue_usd.
Fields
| Field | Required | Notes |
|---|---|---|
targets | yes | Field(s) this derivation writes. |
inputs | yes | Often []; the join is by PK. |
kind | yes | Always cross_sheet. |
source_sheet | yes | Path to the foreign sheet, relative to this sheet’s root. |
key_field | yes | Field on the foreign sheet whose value should equal the calling sheet’s primary key. |
value_field | one of | Used when writing one target. |
value_fields | one of | {target: foreign_field} map for multiple targets. |
value_field and value_fields are mutually exclusive.
What happens when no match is found
Folio writes nothing for that record and reports nothing on the envelope —
“no match” is an expected case for cross_sheet. The field stays null.
If you want a failure instead, add a python derivation that asserts the
field is non-null after cross_sheet runs.
Multi-target
targets: [current_revenue_usd, contract_value_usd, finance_as_of]inputs: []kind: cross_sheetsource_sheet: ../customer-revenuekey_field: idvalue_fields: current_revenue_usd: revenue_usd contract_value_usd: contract_value_usd finance_as_of: as_ofAll three fields stay consistent: they share one input_hash, so they
either all update or all stay cached.
The semantics rule
cross_sheet matches the foreign sheet’s <key_field> against the calling
sheet’s primary key. It does not match against an arbitrary input.
If you need to look up by a non-PK field — e.g. customers.industry_name → industry-master.id — Folio’s built-in cross_sheet does not fit. Use
import instead, or write a python / sql derivation that does the
lookup explicitly.
Cache invariants
input_hash for cross_sheet includes:
- the canonical JSON of every input value (often empty),
- the calling sheet’s primary key value,
- the SHA-256 of the foreign sheet’s
records.jsonl, - the SHA-256 of the derivation file.
That last one matters: edit customer-revenue/records.jsonl and every
record’s hash on the calling side changes, so next materialize re-runs the
join for everything. This is the right behaviour when you don’t know which
foreign rows changed.
If your foreign sheet is large and changes constantly, prefer import from
a snapshot — import hashes only the source file’s bytes (still per-record,
but at least you control when you re-export).
Why not allow lookup by an input field?
Two reasons. First, the cache becomes much more expensive: each record
needs its own (input_hash, foreign records hash) pair instead of a single
join. Second, the natural use case — “join by matching field” — is what
import does, and a cross_sheet that does this would duplicate import
without the file-as-source-of-truth property.
The current design is the simplest thing that’s useful. Future versions may add an “indirect key” form; this isn’t on the near-term roadmap.
Worked example
customers/contract.yaml (excerpt):
- name: id logicalType: string primaryKey: true required: true- name: current_revenue_usd logicalType: number x-derived: true x-inputs: []customer-revenue/records.jsonl:
{"id": "cust_001", "revenue_usd": 12500000, "as_of": "2026-04-30"}{"id": "cust_002", "revenue_usd": 88000000, "as_of": "2026-04-30"}customers/derivations/revenue.yaml:
targets: [current_revenue_usd]inputs: []kind: cross_sheetsource_sheet: ../customer-revenuekey_field: idvalue_field: revenue_usdMaterialize customers:
$ folio materialize ./customers --actor agent:demo{"materialized": 2, "skipped": 0, "failures": [], "total_cost": 0.0}
$ folio query ./customers \ "SELECT id, current_revenue_usd FROM records ORDER BY id"[{"id":"cust_001","current_revenue_usd":12500000.0}, {"id":"cust_002","current_revenue_usd":88000000.0}]Update customer-revenue/records.jsonl and re-materialize: every customer
re-joins because the foreign-records hash flipped.
When to choose cross_sheet
- Two Folio sheets, same primary key, 1:1 relationship.
- You want the foreign sheet’s contract enforced on the source side.
- The foreign sheet has its own provenance, version, and lifecycle.
For one-shot CSV imports or non-Folio sources, import
is the right kind.