Methods

Data Sources

Our analysis is based on a comprehensive collection of over 44,000 working papers from two major economic research institutions:

These papers span from 1980 to 2023 and cover a wide range of economics subfields, providing a broad view of the research landscape. They encompass various empirical strategies, including Randomized Controlled Trials (RCTs), Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Designs (RDD).


AI-Powered Information Retrieval Process

We employed a multi-stage process using a custom Large Language Model (LLM) to extract and analyze information from the working papers. This approach allowed us to efficiently process the text and extract detailed structured data necessary for our analysis.

1. Qualitative Summary Extraction

In the first stage, the AI model analyzed each paper to extract a curated summary of key elements, including:

This initial extraction provided a structured overview of each paper, serving as a foundation for deeper analysis.


2. Extraction of Causal Claims

Using the summaries from the first stage, we:


3. Data Usage and Accessibility Extraction

We gathered structured information regarding:

We use this information to assess trends in data usage and their implications for transparency and replicability in economic research.


4. Mapping Variables to Standardized Economic Concepts

To systematically analyze and aggregate the causal claims, we:

Figure Notes: This flowchart illustrates our AI-powered approach to retrieving, assessing, and mapping causal claims and contributions from academic papers. The process begins with academic papers, from which the LLM extracts fields such as Author, Publication, Institution, Field, Method, and Data/Code Availability. These aspects feed into two main branches: Identification and Causal Claims. The Identification branch focuses on elements like Identification Strategy and Robustness Checks. The analysis extends to understanding precise measurements and contexts, as well as extrapolated concepts and contexts, leading to insights on contributions claimed and policy recommendations. The Causal Claims branch involves analyzing the causal relationships identified in the papers, consisting of arrays of source (cause) and sink (effect) variables. The analysis operates across three levels. First, for each source or sink node, we consider the source or sink as claimed by the author and as measured in the paper, including the type and the owner of the data used. Second, for each source-sink edge, we examine the method(s) used to evidence a claim, and whether a null result was found. Third, at the graph level, we assess the number of steps taken from cause to effect, the descriptions of these steps, and the overall complexity of the underlying narrative.
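The three levels described in the figure notes (node, edge, graph) could be represented as a small nested data structure. The following is a minimal sketch under stated assumptions, not the schema actually used in our pipeline: the class names (`Node`, `CausalEdge`, `CausalGraph`), the field names, and the toy minimum-wage example are all hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A source (cause) or sink (effect) variable. Field names are hypothetical."""
    claimed: str                      # variable as claimed by the author
    measured: str                     # variable as measured in the paper
    data_type: Optional[str] = None   # type of data used, if reported
    data_owner: Optional[str] = None  # owner of the data used, if reported


@dataclass
class CausalEdge:
    """One source -> sink claim together with the evidence attached to it."""
    source: Node
    sink: Node
    methods: List[str] = field(default_factory=list)  # e.g. ["IV", "DiD"]
    null_result: bool = False                         # True if a null result was found


@dataclass
class CausalGraph:
    """All causal claims extracted from a single paper."""
    paper_id: str
    edges: List[CausalEdge] = field(default_factory=list)

    def longest_chain(self) -> int:
        """Number of edges on the longest cause-to-effect path."""
        successors = {}
        for e in self.edges:
            successors.setdefault(e.source.claimed, []).append(e.sink.claimed)

        def depth(name, on_path):
            best = 0
            for nxt in successors.get(name, []):
                if nxt in on_path:  # guard against cycles
                    continue
                best = max(best, 1 + depth(nxt, on_path | {nxt}))
            return best

        return max((depth(s, frozenset([s])) for s in successors), default=0)


# Invented toy example: minimum wage -> employment -> poverty.
min_wage = Node(claimed="minimum wage", measured="state minimum wage level")
employment = Node(claimed="employment", measured="teen employment rate")
poverty = Node(claimed="poverty", measured="household poverty rate")

graph = CausalGraph(
    paper_id="toy-paper",
    edges=[
        CausalEdge(min_wage, employment, methods=["DiD"]),
        CausalEdge(employment, poverty, methods=["IV"], null_result=True),
    ],
)
steps = graph.longest_chain()
```

Here `steps` counts the edges on the longest chain, which is one simple way to quantify the "number of steps taken from cause to effect"; other measures of narrative complexity are possible.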

Retrieving Concepts Using AI 

Figure Notes: This diagram illustrates our AI-driven methodology for analyzing and mapping causal linkages between economic concepts, represented by JEL (Journal of Economic Literature) codes. Starting with a corpus of working papers, we use a custom prompt and pre-trained language model to extract causal relationships, identifying source (cause) and sink (effect) variables within the text. The extracted causal claims are parsed to generate directed linkages between JEL codes, forming a knowledge graph that aggregates these relationships across the corpus. We employ OpenAI's vector embeddings to numerically represent descriptions of JEL codes, compute their cosine similarity with each source and sink, and assign the most similar JEL code to each source and sink node. This approach enables us to construct a structured representation of causal evidence in economics over time, facilitating the exploration of interconnected economic concepts and the evolution of empirical research frontiers.
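The assignment step described above (pick the JEL code whose description is most similar to a source or sink) can be illustrated with a few lines of code. This is a minimal sketch, not our production code: the function names `cosine_similarity` and `assign_jel_code` are hypothetical, and the three-dimensional vectors are made-up stand-ins for real embeddings, which have far more dimensions and would come from an embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def assign_jel_code(variable_vec, jel_vecs):
    """Return the JEL code whose description embedding is most similar."""
    return max(
        jel_vecs,
        key=lambda code: cosine_similarity(variable_vec, jel_vecs[code]),
    )


# Toy vectors standing in for embeddings of JEL code descriptions (made up).
jel_vecs = {
    "J31": [0.9, 0.1, 0.0],  # wage level and structure
    "I21": [0.1, 0.9, 0.1],  # analysis of education
    "E52": [0.0, 0.1, 0.9],  # monetary policy
}

# Toy stand-in for the embedding of an extracted variable such as "hourly earnings".
variable_vec = [0.8, 0.3, 0.1]

best = assign_jel_code(variable_vec, jel_vecs)  # "J31" for these toy vectors
```

Because cosine similarity depends only on the angle between vectors, the assignment is insensitive to the overall length of the embeddings.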

Our AI-driven methodology allows us to systematically analyze a vast corpus of economic research, uncovering trends in empirical methods, causal narratives, and data usage. By mapping causal claims to standardized concepts, we can explore the interconnectedness of economic ideas and how they have evolved over time.
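Once every source and sink carries a standardized code, tracking how linkages evolve reduces to counting directed edges per time period. The sketch below assumes a hypothetical input format of `(year, source JEL code, sink JEL code)` tuples and groups by decade; the records, the function name `edge_counts_by_decade`, and the choice of decade as the time unit are illustrative assumptions rather than a description of our actual aggregation.

```python
from collections import Counter

# Hypothetical mapped claims: (publication year, source JEL code, sink JEL code).
claims = [
    (1995, "J31", "I21"),
    (2010, "E52", "J31"),
    (2010, "E52", "J31"),
    (2018, "I21", "J31"),
]


def edge_counts_by_decade(claims):
    """Count directed JEL-to-JEL edges, keyed by (decade, source, sink)."""
    counts = Counter()
    for year, source, sink in claims:
        decade = (year // 10) * 10  # e.g. 2018 -> 2010
        counts[(decade, source, sink)] += 1
    return counts


counts = edge_counts_by_decade(claims)
```

Each key in `counts` is one directed edge of the aggregated knowledge graph within a decade, and its value is the number of claims supporting that edge.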

For more detailed information, see the full Data and Methods section of the paper.