records.jsonl
records.jsonl is the data. Line-delimited JSON
(JSON Lines), UTF-8, one object per line, no
trailing comma, no enclosing array.
```
{"id": "cust_001", "company_name": "Acme Manufacturing", "country": "Japan"}
{"id": "cust_002", "company_name": "DataFlow", "country": "United States"}
```

That's it. The format is intentionally trivial so `head`, `grep`,
`jq -c`, and DuckDB's `read_json_auto` all work without ceremony
(ADR-0004).
Encoding and line semantics
- Encoding: UTF-8. No BOM.
- Line terminator: `\n`. CRLF line endings are accepted, but Folio rewrites them to `\n` on the next atomic write.
- Empty lines: ignored.
- Trailing newline: Folio always writes one. Files without a trailing newline are accepted.
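The read-side tolerances above (CRLF accepted, empty lines ignored, trailing newline optional) can be sketched as a small reader. This is illustrative only, not Folio's actual implementation; `read_jsonl` is a hypothetical name:

```python
import json

def read_jsonl(path):
    """Yield one parsed object per non-empty line of a JSONL file.

    Tolerates CRLF line endings and a missing trailing newline;
    skips empty lines, per the semantics above.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")  # accept CRLF; \n is canonical
            if not line:
                continue                # empty lines are ignored
            yield json.loads(line)
```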
Per-record invariants
For each non-empty line:
- Must parse as a JSON object (not a string, not an array).
- Must include the contract’s primary key as a non-null value.
- Required fields (per the contract) must be present and non-null.
- May include extra fields not in the contract — Folio preserves them verbatim across writes. (We do not silently drop unknown columns.)
Violations raise RecordsError (parse failures) or OperationError
(missing required / missing primary key).
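The invariant checks can be sketched as follows. This is a minimal illustration, not Folio's internals; `check_line`, `primary_key`, and `required` are stand-ins for whatever the contract supplies, and only the exception names mirror the ones above:

```python
import json

class RecordsError(Exception):
    """Raised for parse failures."""

class OperationError(Exception):
    """Raised for missing primary key / required fields."""

def check_line(line, primary_key, required):
    """Apply the per-record invariants to one non-empty line."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as e:
        raise RecordsError(f"unparseable line: {e}") from e
    if not isinstance(obj, dict):
        raise RecordsError("line is valid JSON but not an object")
    if obj.get(primary_key) is None:
        raise OperationError(f"missing primary key {primary_key!r}")
    for field in required:
        if obj.get(field) is None:
            raise OperationError(f"required field {field!r} missing or null")
    return obj  # extra fields pass through untouched, per the contract rule
```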
Writes are atomic
Sheet.upsert_records and Sheet.materialize write records.jsonl through a
temp file + rename:
```
records.jsonl.tmp.<pid>   ← write the new content
records.jsonl             ← atomically rename over the old file
```

A reader holding an open file descriptor on records.jsonl sees a complete
older snapshot. A new reader sees the new file. There is no torn state.
A .lock file in the sheet serializes writers (single-writer, ADR-0006).
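The temp-file-plus-rename dance can be sketched with `os.replace`, which renames atomically on both POSIX and Windows. A sketch only: Folio's real writer also takes the `.lock` file, which is omitted here:

```python
import json
import os

def write_records(path, records):
    """Atomically rewrite a records.jsonl: write a temp file in the
    same directory, fsync, then rename over the old file."""
    tmp = f"{path}.tmp.{os.getpid()}"
    with open(tmp, "w", encoding="utf-8", newline="\n") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False))
            f.write("\n")            # canonical \n, trailing newline included
        f.flush()
        os.fsync(f.fileno())         # data durable before the rename
    os.replace(tmp, path)            # atomic swap; no torn state for readers
```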
Querying via DuckDB
Folio exposes records to DuckDB as a view named records.
```
folio query ./customers \
  "SELECT country, COUNT(*) AS n FROM records GROUP BY country ORDER BY n DESC"
```

```
sheet.query("SELECT * FROM records WHERE country = ?", ["Japan"])
```

Only SELECT is allowed. INSERT / UPDATE / DELETE / DDL are rejected at the
Folio layer (ADR-0005). Use upsert_records and
delete_records to mutate.
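One crude way to picture a read-only gate is a first-keyword check. This is a sketch of the idea only, not Folio's actual enforcement (a real implementation would parse the statement rather than tokenize it):

```python
READ_ONLY_KEYWORDS = ("select", "with")

def assert_read_only(sql):
    """Reject SQL that does not start as a read query.

    A first-token sketch, not a SQL parser: WITH is allowed because
    CTE-prefixed SELECTs are still reads.
    """
    tokens = sql.split()
    first = tokens[0].lower() if tokens else ""
    if first not in READ_ONLY_KEYWORDS:
        raise ValueError(f"only SELECT queries are allowed, got {first!r}")
```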
Pagination
Sheet.list_records (and folio list) accepts limit and cursor. The
cursor is an opaque string Folio gives you back; pass it on the next call to
continue.
```
page = sheet.list_records(limit=100)
while page["next_cursor"]:
    page = sheet.list_records(limit=100, cursor=page["next_cursor"])
```

Order
Folio does not guarantee any record order across writes. Materialize and
upsert may reorder rows (in particular, upsert_records keeps the relative
order of pre-existing rows but appends new ones at the end). If you need a
stable order, use an ORDER BY clause on the query.
Size guidance
The format streams. There is no in-memory load step at read; Folio reads
line-by-line. A records.jsonl with tens of thousands of rows is the
designed scale. Hundreds of thousands work. Millions are out of scope —
that’s the regime where you’d want a database, not a file.
Inspecting from the shell
```
# Pretty-print one record
jq . customers/records.jsonl | head -10

# Count rows
wc -l customers/records.jsonl

# Filter without DuckDB
jq -c 'select(.country == "Japan")' customers/records.jsonl

# Diff two checkouts of the same sheet
diff <(jq -S . a/records.jsonl) <(jq -S . b/records.jsonl)
```

Why JSONL and not CSV / Parquet / ndjson?
- CSV doesn’t carry types and chokes on commas in strings; nullability is ambiguous.
- Parquet isn’t human-readable, breaks `cat`/`grep`, and pulls in a binary dependency.
- ndjson is the same thing as JSONL under a different name. Folio uses “JSONL” because it’s the term used in the wider data-contract ecosystem.
If your toolchain prefers Parquet, run folio export datapackage to get a
Frictionless descriptor that points to records.jsonl — many tools can
convert from there.