We build evidence‑annotated claim graphs: standardized, paper‑level representations of what a paper claims and how it supports those claims.
A claim graph is a directed network:
• Nodes: standardized economic concepts mapped to JEL codes.
• Edges: relationships claimed by the authors (source → sink).
• Each edge is annotated with evidentiary basis.
We classify an edge as “causal” when the supporting claim rests on one of five canonical causal identification designs:
Difference‑in‑Differences (including event studies), Instrumental Variables, Randomized Controlled Trials, Regression Discontinuity, or Synthetic Control.
Other evidence types are recorded as non‑causal (e.g., theory, descriptive evidence, correlational analysis).
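The edge schema above can be sketched in a few lines. This is a minimal illustration, not our actual data model: the class name `ClaimEdge`, the field names, and the evidence strings are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Canonical causal identification designs (event studies are counted
# under Difference-in-Differences, per the taxonomy above).
CAUSAL_DESIGNS = {
    "difference-in-differences",
    "event study",
    "instrumental variables",
    "randomized controlled trial",
    "regression discontinuity",
    "synthetic control",
}

@dataclass(frozen=True)
class ClaimEdge:
    source: str    # standardized source concept (JEL-mapped)
    sink: str      # standardized sink concept (JEL-mapped)
    evidence: str  # evidentiary basis as stated by the authors

    @property
    def is_causal(self) -> bool:
        # Everything outside the canonical designs is recorded as non-causal
        # (theory, descriptive evidence, correlational analysis, ...).
        return self.evidence.strip().lower() in CAUSAL_DESIGNS

edge = ClaimEdge("I26", "J31", "instrumental variables")
```

An edge backed by, say, `"descriptive evidence"` would report `is_causal == False` under this rule.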
We use a structured multi‑stage workflow designed for transparency and stability:
1. Summarize: extract a structured summary of the research question, methods, identification language, and key claims from the first 30 pages.
2. Extract edges: extract claim‑graph edges (source concept → sink concept), together with relationship type and evidentiary basis.
3. Standardize: map free‑text concept mentions to standardized JEL concepts to enable cross‑paper comparison.
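The concept‑standardization step amounts to normalizing a free‑text mention and resolving it against a curated JEL table. A toy sketch, assuming a hand‑built lookup dictionary (the entries below are illustrative, not our actual mapping):

```python
from typing import Optional

# Hypothetical curated lookup from normalized mentions to JEL codes.
JEL_LOOKUP = {
    "minimum wage": "J38",  # Wages and compensation: public policy
    "employment": "J21",    # Labor force and employment
    "schooling": "I21",     # Analysis of education
}

def standardize(mention: str) -> Optional[str]:
    """Normalize a free-text concept mention and look up its JEL code.

    Returns None when no standardized concept matches, so unmapped
    mentions can be flagged for review rather than silently dropped.
    """
    return JEL_LOOKUP.get(mention.strip().lower())
```

In practice the resolution is fuzzier than an exact dictionary hit, but the input/output contract is the same: free text in, standardized JEL concept (or nothing) out.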
We run multiple extraction passes and aggregate edges using an “edge overlap” criterion: the share of passes in which an edge re‑appears.
Raising the overlap threshold trades recall for precision in a transparent way and filters out fragile edges.
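The aggregation step can be sketched as a vote count over runs. A minimal version, assuming each pass yields a set of `(source, sink, relation)` tuples (the function name and threshold value are illustrative):

```python
from collections import Counter

def aggregate_edges(runs, min_overlap=0.5):
    """Keep edges appearing in at least min_overlap share of extraction runs.

    runs: list of sets of (source, sink, relation) tuples, one per pass.
    """
    n = len(runs)
    counts = Counter(edge for run in runs for edge in set(run))
    return {edge for edge, c in counts.items() if c / n >= min_overlap}

runs = [
    {("I26", "J31", "causal")},
    {("I26", "J31", "causal"), ("J24", "O47", "non-causal")},
    {("I26", "J31", "causal")},
]
# The first edge appears in 3/3 runs; the second in only 1/3,
# so at min_overlap=0.5 only the first survives.
stable = aggregate_edges(runs, min_overlap=0.5)
```

Lowering `min_overlap` keeps more (but noisier) edges; raising it keeps fewer, more reproducible ones.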
We use three complementary checks:
• Iteration stability: do results hold as we tighten the edge‑overlap threshold?
• Snippet‑only self‑consistency: compare summary‑based edges to edges extracted from verbatim snippets.
• External dataset validation for extractable components (methods/fields and plausibly exogenous variation benchmarks).
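The first check, iteration stability, can be sketched as a threshold sweep: count how many edges survive as the edge‑overlap cutoff tightens, and inspect whether the results of interest hold at each level. The function name and threshold grid below are illustrative:

```python
from collections import Counter

def stability_sweep(runs, thresholds=(0.5, 0.75, 1.0)):
    """Count surviving edges at each edge-overlap threshold.

    runs: list of sets of (source, sink, relation) tuples, one per pass.
    Returns {threshold: number of edges meeting it}.
    """
    n = len(runs)
    counts = Counter(edge for run in runs for edge in set(run))
    return {t: sum(1 for c in counts.values() if c / n >= t)
            for t in thresholds}

runs = [
    {("I26", "J31", "causal"), ("J24", "O47", "non-causal")},
    {("I26", "J31", "causal"), ("J24", "O47", "non-causal")},
    {("I26", "J31", "causal")},
    {("I26", "J31", "causal")},
]
# One edge appears in 4/4 runs, the other in 2/4.
surviving = stability_sweep(runs)
```

A steep drop-off as the threshold rises would signal that many edges are fragile; a flat curve signals a stable extraction.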
• We extract what papers state; we do not adjudicate truth.
• We focus on the first 30 pages for comparability.
• Any automated extraction can make mistakes; we mitigate this with repeated passes, aggregation, and validation.
All data and code are available on GitHub: