Frequently asked questions
1. What is the main objective of this study?
The primary objective of our study is to analyze the evolution of empirical methods and causal narratives in economics over the past four decades. By leveraging a custom large language model (LLM) to process over 44,000 working papers from NBER and CEPR, we aim to understand how the complexity and structure of causal claims have changed over time. We also investigate how these factors influence publication outcomes and the credibility of economic research.
2. How was the dataset constructed?
We compiled a comprehensive dataset of 44,852 working papers from the National Bureau of Economic Research (NBER) and the Centre for Economic Policy Research (CEPR), spanning from 1980 to 2023.
Our machine learning pipeline involved several key steps:
Data Preprocessing: Cleaning and normalizing text data for efficient processing by the LLM.
LLM-Based Information Retrieval: Using a custom large language model to extract structured information, including causal claims, empirical methods, data usage, and metadata.
Construction of Causal Graphs: Mapping extracted causal claims to standardized economic concepts using Journal of Economic Literature (JEL) codes and constructing causal graphs for each paper.
Measuring Causal Narrative Complexity: Developing measures such as the number of causal edges, unique paths, longest path length, cause-effect ratio, and eigenvector centrality to quantify the complexity of causal narratives.
Statistical Analysis: Examining trends over time, across fields and methods, and analyzing how causal narrative complexity relates to publication outcomes using regression models.
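The first stage of the pipeline above, text preprocessing, can be sketched in a few lines. This is illustrative only: the study does not specify its exact cleaning rules, so the steps below (joining hyphenated line breaks, collapsing whitespace, dropping non-printable characters) are plausible assumptions, not the authors' implementation.

```python
import re

def preprocess(text: str) -> str:
    """Normalize raw working-paper text before LLM processing.

    A minimal sketch: join words hyphenated across line breaks,
    collapse runs of whitespace, and strip non-printable characters.
    """
    text = re.sub(r"-\n(\w)", r"\1", text)   # re-join hyphenated line breaks
    text = re.sub(r"\s+", " ", text)         # collapse whitespace runs
    text = "".join(ch for ch in text if ch.isprintable())
    return text.strip()

raw = "Minimum wage in-\ncreases   reduce\tteen employment.\n"
print(preprocess(raw))  # Minimum wage increases reduce teen employment.
```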
3. How did you extract causal claims from the papers?
We employed a custom large language model to analyze each paper and extract detailed causal relationships as presented by the authors. The LLM identified cause and effect variables, determined the types of causal relationships (e.g., direct effect, indirect effect), and recorded the causal inference methods used (e.g., RCT, IV, DiD). This resulted in an edge list per paper, where each row represents a causal claim, forming the basis for constructing causal graphs.
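The per-paper edge list described above can be pictured as a list of structured records, one per causal claim. The field names and example claims below are hypothetical; the actual extraction schema in the study may differ.

```python
# One record per extracted causal claim, forming the paper's edge list.
# Field names and values are illustrative, not the study's actual schema.
claims = [
    {"cause": "minimum wage", "effect": "teen employment",
     "relation": "direct effect", "method": "DiD"},
    {"cause": "schooling", "effect": "earnings",
     "relation": "direct effect", "method": "IV"},
    {"cause": "earnings", "effect": "health outcomes",
     "relation": "indirect effect", "method": None},
]

def claims_by_method(claims, method):
    """Return the subset of causal claims identified with a given method."""
    return [c for c in claims if c["method"] == method]

print(len(claims_by_method(claims, "IV")))  # 1
```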
4. What are the key findings of the study?
The share of causal claims within papers rose from about 4% in 1990 to 28% in 2020, reflecting the "credibility revolution".
Our findings reveal a trade-off between factors enhancing publication in top journals and those driving citation impact.
While employing causal inference methods, introducing novel causal relationships, and engaging with less central, specialized concepts increase the likelihood of publication in top 5 journals, these features do not necessarily lead to higher citation counts.
Instead, papers focusing on central concepts tend to receive more citations once published.
However, papers with intricate, interconnected causal narratives—measured by the complexity and depth of causal channels—are more likely to be both published in top journals and receive more citations.
Finally, we observe a decline in reporting null results and increased use of private data, which may hinder transparency and replicability of empirical research, highlighting the need for research practices that enhance both credibility and accessibility.
5. How does this study contribute to existing literature?
Our study provides a comprehensive, data-driven analysis of the evolution of empirical methods and causal narratives in economics. By constructing causal graphs for a vast corpus of papers and quantifying narrative complexity, we offer new insights into how research practices have changed over time. This work contributes to the broader discourse on the "credibility revolution" in economics and underscores the need for transparency and replicability in research.
6. What are the implications of this study for economic research and policy?
For Researchers: Understanding that narrative complexity and causal structure can influence publication success may impact how researchers design and present their analyses. It emphasizes the importance of clear, well-supported causal narratives.
For Journals and Publishers: The findings suggest that top journals may favor papers with complex causal narratives. Recognizing this can inform editorial policies to encourage diversity in research approaches and promote transparency.
For Policy-Makers: Highlighting trends such as the underreporting of null results and increased use of private data can inform discussions on evidence-based policy-making and the need for accessible, transparent research to guide decisions.
7. How reliable are the methods used in this study?
We took several steps to ensure the reliability of our methods:
Advanced AI Techniques: Used a custom LLM capable of understanding complex economic texts to extract structured data.
Structured Extraction Process: Followed a multi-stage extraction process with predefined schemas to ensure consistency.
Validation: While full validation of all data was not feasible due to scale, we conducted spot checks and cross-referenced findings with known trends in the literature.
Robustness Checks: Analyzed trends across different fields and methods, and controlled for temporal effects in our statistical models.
However, we acknowledge limitations such as potential biases introduced by the LLM and the challenges of mapping nuanced concepts to standardized codes.
8. Were there any limitations in your study?
Yes, our study has several limitations:
Reliance on Working Papers: We focused on NBER and CEPR working papers, which may not represent the entire spectrum of economic research.
Use of LLM for Data Extraction: While powerful, LLMs can introduce errors or biases, especially in interpreting complex or ambiguous text.
Mapping to JEL Codes: Mapping variables to JEL codes abstracts nuanced differences and may oversimplify complex concepts.
Temporal Coverage: Our dataset covers up to 2023, and ongoing developments may not be captured.
9. How can future research build on this study?
Future research can:
Extend Methodology: Apply similar approaches to other disciplines or broader datasets to analyze trends in research practices.
Deepen Analysis: Explore causal narratives in more detail, including interactions, mediators, and moderators.
Investigate Incentives: Examine how academic incentives and pressures influence research methods and publication outcomes.
Enhance Validation: Develop methods for large-scale validation of AI-extracted data, possibly integrating human oversight.
Promote Transparency: Study the impact of open science initiatives on research practices and credibility.
10. Is the dataset available for other researchers to use?
Yes, we are making the aggregated dataset available for download under the Apache License 2.0. Researchers can access paper-level and causal claim-level data. For those interested in more detailed data or potential collaborations, please fill out our Data Access and Updates Form or contact us directly at prashant.garg@imperial.ac.uk.
11. How does this study address concerns about transparency and replicability?
Our study contributes by:
Highlighting Trends: Documenting the decline in reporting null results and the rise of private data use, which are concerns for transparency and replicability.
Providing Open Data: Making our dataset available to encourage replication and further research.
Advocating for Best Practices: Emphasizing the need for data accessibility and transparent reporting in economic research.
12. How do the findings relate to the 'credibility revolution' in economics?
The "credibility revolution" refers to the shift towards more rigorous empirical methods focused on causal inference. Our findings show significant growth in the use of advanced empirical methods like DiD, IV, and RCTs, reflecting this movement. However, we also highlight challenges such as increased narrative complexity and underreporting of null results, suggesting that while methods have advanced, issues related to transparency and replicability remain.
13. What are causal graphs, and how are they used in your study?
Causal graphs are visual representations of causal relationships, where nodes represent variables or concepts, and directed edges represent causal effects from one variable to another. In our study, we constructed causal graphs for each paper by mapping the extracted causal claims to JEL codes. This allowed us to analyze the complexity and structure of causal narratives systematically.
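The construction described above can be sketched as follows. Both the variable-to-JEL mapping and the claims are made-up examples, and the plain adjacency structure is a simplification of whatever graph representation the study actually uses.

```python
from collections import defaultdict

# Hypothetical mapping from extracted variables to JEL codes; the
# study's concept mapping is far richer than this toy dictionary.
jel = {"minimum wage": "J38", "teen employment": "J21",
       "schooling": "I21", "earnings": "J31"}

# Illustrative causal claims extracted from a single paper.
claims = [("minimum wage", "teen employment"),
          ("schooling", "earnings"),
          ("earnings", "teen employment")]

# Directed causal graph: nodes are JEL codes, edges are claimed effects.
graph = defaultdict(set)
for cause, effect in claims:
    graph[jel[cause]].add(jel[effect])

print(sorted(graph["J38"]))  # effects claimed to follow from J38: ['J21']
```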
For more background, see Pearl, J. (2009). Causality. Cambridge University Press.
14. How did you measure causal narrative complexity?
We developed several measures based on the causal graphs:
Number of Causal Edges: Total number of causal claims in a paper.
Number of Unique Paths: Distinct causal pathways within the graph.
Longest Path Length: Length of the longest causal chain.
These measures help quantify the depth and interconnectedness of the causal narratives in each paper.
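The three measures above can be computed directly from a paper's edge list. The sketch below assumes the claimed graph is acyclic, as in a typical causal narrative, and is an illustration of the definitions rather than the authors' implementation.

```python
def causal_measures(edges):
    """Complexity measures on a paper's causal graph, assumed acyclic.

    Returns (number of causal edges, number of unique directed paths,
    length of the longest causal chain).
    """
    succ, nodes = {}, set()
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        nodes.update((u, v))

    def paths_from(u):
        # Paths starting exactly at u: each outgoing edge is a path,
        # plus every path that continues from the successor.
        return sum(1 + paths_from(v) for v in succ.get(u, []))

    def depth_from(u):
        # Length of the longest causal chain starting at u.
        return max((1 + depth_from(v) for v in succ.get(u, [])), default=0)

    n_paths = sum(paths_from(u) for u in nodes)   # each path counted once, at its start
    longest = max(depth_from(u) for u in nodes)
    return len(edges), n_paths, longest

# A -> B -> C plus a direct A -> C shortcut:
# paths are A->B, A->C, B->C, A->B->C; longest chain has length 2.
print(causal_measures([("A", "B"), ("B", "C"), ("A", "C")]))  # (3, 4, 2)
```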
15. How does this study relate to the use of AI in economics research?
Our study demonstrates how AI, specifically large language models, can be utilized to process and analyze large volumes of academic text efficiently. By automating the extraction of structured information from tens of thousands of papers, we showcase the potential of AI to augment traditional research methods in economics, opening avenues for large-scale meta-analyses and insights into research trends.
16. Did you find any differences across different fields within economics?
Yes, variations were observed:
Method Adoption: Fields like Labor and Public Economics heavily use DiD, while Behavioral and Development Economics feature more RCTs.
Narrative Complexity: Fields such as Finance and Macroeconomics tend to have higher narrative complexity.
Reporting Null Results: Econometrics and Behavioral Economics report higher shares of null results.
These differences reflect how methodological choices and research practices vary across subfields.
17. How can your findings inform publication practices in economics?
Our findings suggest that top journals may favor papers with complex causal narratives. Recognizing this can help authors in structuring their research and may prompt journals to consider potential biases in their publication processes. Encouraging the publication of studies with simpler narratives or null results could enhance diversity and transparency in published research.
18. How did you handle data privacy and ethical considerations in your research?
We adhered to ethical standards by:
Using Publicly Available Data: Our sources were publicly accessible working papers and metadata.
Anonymity: We focused on aggregate trends and did not disclose sensitive personal information.
Transparency: Detailed our methods and made the dataset available under an open license.
Compliance: Ensured adherence to data protection regulations and best practices in data handling.
19. Where can I find more details about the methods used in this study?
A detailed explanation of our methods can be found in the Methods section of our paper, available as a preprint here. For an accessible overview, please visit the Methods page on our website. If you have further questions, feel free to contact us at team@causal.claims.
We hope this FAQ addresses your questions about our study. If you have additional inquiries, please do not hesitate to reach out to us.