records.jsonl

records.jsonl is the data. Line-delimited JSON (JSON Lines), UTF-8, one object per line, no trailing comma, no enclosing array.

{"id": "cust_001", "company_name": "Acme Manufacturing", "country": "Japan"}
{"id": "cust_002", "company_name": "DataFlow", "country": "United States"}

That’s it. The format is intentionally trivial so head, grep, jq -c, and DuckDB’s read_json_auto all work without ceremony (ADR-0004).

Encoding and line semantics

  • Encoding: UTF-8. No BOM.
  • Line terminator: \n. CRLF lines are accepted but Folio rewrites to \n on the next atomic write.
  • Empty lines: ignored.
  • Trailing newline: Folio always writes one. Files without a trailing newline are accepted.
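
A conforming reader is a few lines. A sketch against the format contract above (not Folio's internals):

import json

def read_records(path):
    # One parsed object per non-empty line. .strip() tolerates CRLF
    # and a missing trailing newline; empty lines are skipped.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            yield json.loads(line)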

Per-record invariants

For each non-empty line:

  1. Must parse as a JSON object (not a string, not an array).
  2. Must include the contract’s primary key as a non-null value.
  3. Required fields (per the contract) must be present and non-null.
  4. May include extra fields not in the contract — Folio preserves them verbatim across writes. (We do not silently drop unknown columns.)

Violations raise RecordsError (parse failures) or OperationError (missing required / missing primary key).
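
The checks are mechanical enough to reproduce outside Folio. A sketch: the function below mirrors invariants 1-3 and raises plain ValueError, since RecordsError and OperationError are Folio's own exception types; the primary_key and required arguments stand in for the contract.

import json

def validate_line(line, primary_key, required):
    obj = json.loads(line)                    # invariant 1: must parse
    if not isinstance(obj, dict):
        raise ValueError("line is not a JSON object")
    if obj.get(primary_key) is None:          # invariant 2: non-null primary key
        raise ValueError(f"missing primary key {primary_key!r}")
    for field in required:                    # invariant 3: required fields non-null
        if obj.get(field) is None:
            raise ValueError(f"missing required field {field!r}")
    return obj                                # invariant 4: extras pass through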

Writes are atomic

Sheet.upsert_records and Sheet.materialize write records.jsonl through a temp file + rename:

records.jsonl.tmp.<pid> ← write the new content
records.jsonl ← atomically rename over the old file

A reader holding an open file descriptor on records.jsonl sees a complete older snapshot. A new reader sees the new file. There is no torn state.

A .lock file in the sheet serializes writers (single-writer, ADR-0006).
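
The write pattern itself is standard-library territory. A sketch of the swap, assuming the caller already holds the .lock (os.replace is atomic when the temp file lives on the same filesystem, which is why it sits next to records.jsonl):

import json, os

def write_records_atomic(path, records):
    tmp = f"{path}.tmp.{os.getpid()}"
    with open(tmp, "w", encoding="utf-8", newline="\n") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
        f.flush()
        os.fsync(f.fileno())      # make the bytes durable before the rename
    os.replace(tmp, path)         # the atomic step: old snapshot or new, never torn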

Querying via DuckDB

Folio exposes records to DuckDB as a view named records.

folio query ./customers \
  "SELECT country, COUNT(*) AS n
   FROM records
   GROUP BY country
   ORDER BY n DESC"

Or from Python:

sheet.query("SELECT * FROM records WHERE country = ?", ["Japan"])

Only SELECT is allowed. INSERT / UPDATE / DELETE / DDL are rejected at the Folio layer (ADR-0005). Use upsert_records and delete_records to mutate.
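
Nothing stops a read-only consumer from skipping Folio and pointing DuckDB at the file directly, since read_json_auto understands it (see the top of this page). A sketch; because writes are atomic, a direct reader still sees a complete snapshot:

import duckdb

con = duckdb.connect()
con.execute(
    "CREATE VIEW records AS "
    "SELECT * FROM read_json_auto('customers/records.jsonl')"
)
print(con.execute(
    "SELECT country, COUNT(*) AS n FROM records "
    "GROUP BY country ORDER BY n DESC"
).fetchall())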

Pagination

Sheet.list_records (and folio list) accepts limit and cursor. The cursor is an opaque string Folio gives you back; pass it on the next call to continue.

page = sheet.list_records(limit=100)
while page["next_cursor"]:
    page = sheet.list_records(limit=100, cursor=page["next_cursor"])
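
To iterate every record without cursor bookkeeping, a thin wrapper works. One caveat: the key holding the rows on the page dict is an assumption here (only next_cursor is documented above), so check what your Folio version actually returns:

def iter_records(sheet, page_size=100):
    page = sheet.list_records(limit=page_size)
    while True:
        yield from page["records"]   # "records" key is assumed, not documented
        if not page["next_cursor"]:
            break
        page = sheet.list_records(limit=page_size, cursor=page["next_cursor"])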

Order

Folio does not guarantee any record order across writes. Materialize and upsert may reorder rows (in particular, upsert_records keeps the relative order of pre-existing rows but appends new ones at the end). If you need a stable order, use an ORDER BY clause on the query.
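
For example, pinning the order to the primary key from the sample records above:

sheet.query("SELECT * FROM records ORDER BY id")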

Size guidance

The format streams. There is no in-memory load step at read; Folio reads line-by-line. A records.jsonl with tens of thousands of rows is the designed scale. Hundreds of thousands work. Millions are out of scope — that’s the regime where you’d want a database, not a file.

Inspecting from the shell

# Pretty-print the first record
head -n 1 customers/records.jsonl | jq .
# Count rows
wc -l customers/records.jsonl
# Filter without DuckDB
jq -c 'select(.country == "Japan")' customers/records.jsonl
# Diff two checkouts of the same sheet
diff <(jq -S . a/records.jsonl) <(jq -S . b/records.jsonl)

Why JSONL and not CSV / Parquet / ndjson?

  • CSV doesn’t carry types, its quoting rules for embedded commas and newlines vary across tools, and nullability is ambiguous (empty string vs. null).
  • Parquet isn’t human-readable, breaks cat / grep, and pulls in a binary dependency.
  • ndjson is the same thing as JSONL — different name. Folio uses “JSONL” because it’s the term used in the wider data-contract ecosystem.

If your toolchain prefers Parquet, run folio export datapackage to get a Frictionless descriptor that points to records.jsonl — many tools can convert from there.
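
As one sketch of that conversion path, DuckDB can copy the file straight to Parquet (the output path here is arbitrary):

import duckdb

duckdb.execute(
    "COPY (SELECT * FROM read_json_auto('customers/records.jsonl')) "
    "TO 'records.parquet' (FORMAT PARQUET)"
)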