Working memory for agents
This is Use Case 2.2 from the design overview (§2.2): a long-running research agent stores its intermediate findings as a sheet, and humans graduate candidates to verified.
The shipped sheet is examples/research-memory/.
The shape
contract.yaml columns:

- id: string, PK
- query: string, agent + human
- url: string, agent + human
- title: string, agent + human
- snippet: string, agent + human
- status: string, human-only ("candidate" / "verified" / "rejected")
- notes: string, human-only
- domain: string, derived (python: host extraction from url)

The agent writes the first five columns. Humans gate status and notes. The python derivation extracts a normalized hostname from url so the dashboard can group findings by source without an LLM call.
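To make the shape concrete, here is a rough sketch of how that contract could be written down. The derivation keys (targets, inputs, kind, script) match the derivation snippet later on this page; the column-level keys (type, writers, values, pk) and the derivations: wrapper are illustrative guesses, not folio's confirmed contract.yaml schema.

```yaml
# Hypothetical sketch only; column-level key names are illustrative.
columns:
  id:      { type: string, pk: true }
  query:   { type: string, writers: [agent, human] }
  url:     { type: string, writers: [agent, human] }
  title:   { type: string, writers: [agent, human] }
  snippet: { type: string, writers: [agent, human] }
  status:  { type: string, writers: [human], values: [candidate, verified, rejected] }
  notes:   { type: string, writers: [human] }
  domain:  { type: string }  # filled by the python derivation below

derivations:
  - targets: [domain]
    inputs: [url]
    kind: python
    script: url_to_domain
```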
Why deterministic enrichment matters
A research agent can produce hundreds of findings per session. The review interface needs to:
- group by source (so you can spot a single domain dominating the list),
- filter by status,
- sort by recency.
All three are SQL-friendly given a domain column. So the derivation is
a tiny python script:
```python
import json, sys
from urllib.parse import urlparse

inputs = json.loads(sys.argv[2])
host = urlparse(inputs.get("url", "") or "").hostname or ""
if host.startswith("www."):
    host = host[4:]
print(host)
```

The derivation entry that wires it up:

```yaml
targets: [domain]
inputs: [url]
kind: python
script: url_to_domain
```

Cache hit on every run unless the URL changes. No tokens, no API calls, no flaky tests.
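To sanity-check the script outside folio, you can invoke it by hand. This assumes the script file is url_to_domain.py and that the inputs JSON arrives as the second argument (as the script above expects); the first argument is unused by the script, so any placeholder will do:

```
$ python url_to_domain.py row-1 '{"url": "https://www.nature.com/articles/example"}'
nature.com
```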
Walking through the sheet
```
folio validate examples/research-memory
folio materialize examples/research-memory --actor agent:demo
```
```
folio query examples/research-memory \
  "SELECT domain, COUNT(*) AS n FROM records GROUP BY domain ORDER BY n DESC"
```

```json
[{"domain":"world-nuclear.org","n":1}, {"domain":"iea.org","n":1}, {"domain":"nature.com","n":1}, ...]
```

Open the Viewer:
```
folio serve examples/research-memory --port 3000 --actor agent:human
```

Use the Records tab to:

- skim findings by query (the column the agent grouped them under),
- update status from candidate → verified / rejected,
- add notes for findings that need follow-up.
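If you'd rather triage from the CLI, the same status filter can be expressed through folio query, reusing the records table and the columns listed above:

```
folio query examples/research-memory \
  "SELECT query, title, url FROM records WHERE status = 'candidate'"
```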
A typical session
- Agent runs. The agent writes ~50 findings per query under status: "candidate". folio materialize fills domain. The python derivation runs for new rows; existing rows cache-hit. Free.
- Human reviews. Skim the Viewer. Promote good findings, reject the obviously wrong ones, leave the genuinely uncertain at candidate for a deeper review later.
- Next agent run. The agent re-runs against new queries. Existing verified and rejected findings are not touched (the agent's prompt tells it to skip rows it didn't add).
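Between runs, a quick rollup by status (the same folio query form as earlier) shows how much of the backlog is still unreviewed:

```
folio query examples/research-memory \
  "SELECT status, COUNT(*) AS n FROM records GROUP BY status"
```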
Extending
Natural additions:
- An ai derivation that scores plausibility (relevance_score: number) per finding: a cheap pre-filter before human review.
- A second sheet, research-projects/, keyed by query, with one row per active research thread, and a cross-sheet derivation that pulls the thread's priority into each finding.
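For the first of those, a sketch of the derivation entry, mirroring the targets / inputs / kind keys from the python derivation above; kind: ai and the prompt key are assumptions about how folio declares ai derivations, not confirmed syntax:

```yaml
# Hypothetical ai derivation; keys beyond targets/inputs/kind are illustrative.
targets: [relevance_score]
inputs: [query, title, snippet]
kind: ai
prompt: >
  Given the research query and this finding's title and snippet,
  rate its relevance from 0 to 1. Return only the number.
```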
See also
- examples/research-memory/README.md
- python derivation: how the script is invoked and cached.
- Editing and provenance: respect_human_override for the status column.