
ai derivation

ai uses an LLM to fill derived fields. Folio’s implementation routes every call through an AIClient Protocol — the AnthropicClientAdapter is the only module that imports the anthropic SDK (ADR-0009), and a deterministic StubAIClient ships for offline tests and demos.

Minimal example

derivations/industry_tag.yaml
targets: [industry_tag]
inputs: [company_name]
kind: ai
model: claude-sonnet-4-6
prompt: |
  Industry of {{ company_name }} in one word.
output: text

When folio materialize reaches a record, Folio:

  1. Resolves the prompt template (substituting {{ field }} placeholders).
  2. Computes input_hash over the inputs and the resolved prompt body.
  3. Cache hit → done. Cache miss → calls the AIClient, writes the value, appends a provenance line that includes model and cost_usd.
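The caching step above can be pictured as a hash over the input values plus the resolved prompt body. This is an illustrative sketch only; the helper name and payload layout are assumptions, not Folio's actual `input_hash` internals:

```python
import hashlib
import json

def input_hash(inputs: dict, resolved_prompt: str) -> str:
    """Illustrative cache key: hash the input values together with the
    resolved prompt body, so editing either invalidates the cache."""
    payload = json.dumps(
        {"inputs": inputs, "prompt": resolved_prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same inputs + same resolved prompt -> same hash -> cache hit.
h1 = input_hash({"company_name": "Acme"}, "Industry of Acme in one word.")
h2 = input_hash({"company_name": "Acme"}, "Industry of Acme in one word.")
assert h1 == h2
```

Because the resolved prompt is part of the key, editing either the template or any input field produces a different hash and forces a fresh AI call.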

Fields

| Field | Required | Notes |
| --- | --- | --- |
| targets | yes | Fields this derivation writes. |
| inputs | yes | Fields whose values the prompt may read. |
| kind | yes | Always ai. |
| model | yes | Any string the AIClient accepts. The default adapter passes it through to Anthropic. |
| prompt | one of | Inline template with {{ field }} placeholders. |
| prompt_ref | one of | Path (relative to the sheet) of a markdown / text file. Mutually exclusive with prompt. |
| output | no | text (default) or json. |
| output_schema | when multi-target | {name: type} map describing the expected JSON keys. |
| materialization | no | See below. |

prompt_ref

Use a separate file when the prompt is more than a few lines:

prompt_ref: prompts/enrich-industry.md

The file lives inside the sheet (so it travels in the tarball). The file’s content goes into the input_hash, so editing the prompt invalidates the cache.

output: json and output_schema

For multi-target derivations, the model must return a JSON object whose keys match output_schema. The default adapter sends a structured-output hint to the model; the StubAIClient honours output_schema literally.

targets: [industry, employee_count]
inputs: [company_name, country]
kind: ai
model: claude-sonnet-4-6
prompt_ref: prompts/enrich.md
output: json
output_schema:
  industry: string
  employee_count: integer

If the model returns an extra key, Folio writes only the keys listed in output_schema and ignores the rest. If a schema key is missing or null in the model's response, that field stays null and the derivation logs a failure.
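The key-handling rule can be sketched as a small filter; the function name and return shape here are illustrative, not Folio's internal API:

```python
def apply_output_schema(model_output: dict, output_schema: dict):
    """Illustrative: keep only schema keys; report missing/null keys."""
    values, failures = {}, []
    for key in output_schema:
        value = model_output.get(key)
        if value is None:
            failures.append(key)   # field stays null; derivation logs a failure
        else:
            values[key] = value    # extra keys in model_output are ignored
    return values, failures

values, failures = apply_output_schema(
    {"industry": "Manufacturing", "employee_count": None, "hq_city": "Oslo"},
    {"industry": "string", "employee_count": "integer"},
)
# values == {"industry": "Manufacturing"}; failures == ["employee_count"]
```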

materialization

Optional sub-block tuning the loop:

materialization:
  respect_human_override: true   # default
  retries: 0                     # default
  retry_delay_seconds: 1.0

Retries apply to AIClient errors only; deterministic content errors (a missing output_schema key) do not retry.
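The distinction matters because re-sending the same prompt after a content error would deterministically fail again. A minimal retry loop along these lines (the exception class names are illustrative stand-ins, not Folio's real types):

```python
import time

class AIClientError(Exception):
    """Stand-in for transient client/transport failures (retryable)."""

class ContentError(Exception):
    """Stand-in for deterministic content failures (not retryable)."""

def call_with_retries(call, retries=0, retry_delay_seconds=1.0):
    """Retry transient AIClient errors up to `retries` times; let
    content errors propagate immediately."""
    attempt = 0
    while True:
        try:
            return call()
        except AIClientError:
            if attempt >= retries:
                raise
            attempt += 1
            time.sleep(retry_delay_seconds)
```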

Prompt template syntax

Prompts use a tiny {{ field }} substitution syntax. Whitespace inside the braces is allowed: {{ company_name }} and {{company_name}} both work. The substituted value is the JSON-encoded form of the field — strings get quotes, integers stay bare, arrays become bracketed lists. This keeps prompts safe against fields that contain quotes or newlines.
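The behaviour can be reproduced in a few lines; this sketch (function name and regex are assumptions, not Folio's implementation) shows why JSON encoding makes embedded quotes and newlines safe:

```python
import json
import re

def render_prompt(template: str, record: dict) -> str:
    """Illustrative {{ field }} substitution: each value is JSON-encoded,
    so strings get quotes and embedded quotes/newlines are escaped."""
    def repl(match):
        return json.dumps(record[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", repl, template)

print(render_prompt("Industry of {{ company_name }} in one word.",
                    {"company_name": 'Acme "Widgets"\nInc'}))
# -> Industry of "Acme \"Widgets\"\nInc" in one word.
```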

Cost reporting

Every successful ai call appends cost_usd to its provenance line. Folio keeps a small PRICE_TABLE_USD in _ai_kind.py for known models; unknown models report cost_usd: null rather than fabricating a price (ADR-0009). The total_cost on the materialize envelope is the sum of all known costs in the run.
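The lookup logic amounts to a dictionary get that refuses to guess. The rates below are placeholders for illustration — the real table lives in _ai_kind.py and the actual prices are not reproduced here:

```python
# Hypothetical per-million-token rates; the real table is in _ai_kind.py.
PRICE_TABLE_USD = {
    "claude-sonnet-4-6": {"input_per_mtok": 3.0, "output_per_mtok": 15.0},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int):
    """Return a cost for known models, None (-> cost_usd: null) otherwise."""
    rates = PRICE_TABLE_USD.get(model)
    if rates is None:
        return None  # never fabricate a price for an unknown model
    return (input_tokens * rates["input_per_mtok"]
            + output_tokens * rates["output_per_mtok"]) / 1_000_000
```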

Running offline with the StubAIClient

For tests and the offline materialize smoke, inject a StubAIClient:

from folio import open_sheet
from folio._ai_kind import StubAIClient
stub = StubAIClient()
stub.prepare("Industry of Acme", "Manufacturing") # canned response
sheet = open_sheet("./customers", actor="agent:demo")
result = sheet.materialize(ai_client=stub)

StubAIClient.prepare(substring, value) matches by substring against the resolved prompt. A fallback responder lets you return canned shapes for any prompt:

stub = StubAIClient(default_responder=lambda prompt: {"industry": "Unknown"})
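The matching order — substring rules first, then the fallback responder — can be sketched like this. MiniStub is a hypothetical reduction for illustration; the real StubAIClient's internals may differ:

```python
class MiniStub:
    """Illustrative sketch of the stub's matching: substring rules
    registered via prepare() are checked first, then the optional
    default_responder handles anything unmatched."""

    def __init__(self, default_responder=None):
        self._rules = []
        self._default = default_responder

    def prepare(self, substring: str, value):
        self._rules.append((substring, value))

    def complete(self, prompt: str):
        for substring, value in self._rules:
            if substring in prompt:
                return value
        if self._default is not None:
            return self._default(prompt)
        raise KeyError(f"no canned response for prompt: {prompt!r}")
```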

When to use ai vs other kinds

  • Use ai when the answer is fundamentally fuzzy — classifying free text, summarizing, extracting from messy inputs.
  • Use python or sql when the answer is deterministic. The cache is cheaper, the result is reproducible, the test story is trivial, and there’s no API key to manage.
  • Use import when the answer already exists in a local file.

A common pattern is a deterministic python / sql first pass to handle the easy cases, with an ai fallback for the long tail.

Where to next