Abstract:
In response to McCullough and Vinod (2003)'s failed attempt to replicate several articles in the American Economic Review (AER), then-editor of the AER Ben Bernanke strengthened the AER's data and code availability policy to allow for successful replication of published results by requiring authors to submit data and code replication files to the AER (Bernanke 2004). Since the AER strengthened its policy, many of the other top journals in economics, such as Econometrica and the Journal of Political Economy, have also started requiring data and code replication files.
There are two main goals of these replication files: (1) to bring economics more in line with the natural sciences by embracing the scientific method's power to verify published results, and (2) to help improve and extend existing research, which presumes the original research is replicable. These benefits are illustrated by the policy-relevant debates between Card and Krueger (1994, 2000) and Neumark and Wascher (2000) on minimum wages and employment; Hoxby (2000, 2007) and Rothstein (2007) on school choice; Levitt (1997, 2002) and McCrary (2002) on the causal impact of police on crime; and, more recently, Reinhart and Rogoff (2010) and Herndon, Ash, and Pollin (2014) on fiscal austerity. In extreme cases, replication can also facilitate the discovery of scientific fraud, as in the case of Broockman, Kalla, and Aronow (2015)'s investigation of the retracted article by LaCour and Green (2014).
This article is a cross-journal, broad analysis of the state of replication in economics.1 We attempt to replicate articles using author-provided data and code files from 67 papers published in 13 well-regarded general interest and macroeconomics journals from July 2008 to October 2013. This sampling frame is designed to be more comprehensive across well-regarded economics journals than that used by existing research. Previous work has tended to focus on a single journal, such as McCullough, McGeary, and Harrison (2006), who look at the Journal of Money, Credit and Banking (JMCB); McCullough and Vinod (2003), who attempt to replicate a single issue of the AER (but end up replicating only Shachar and Nalebuff (1999) with multiple software packages); or Glandon (2010), who replicates a selected sample of nine papers only from the AER.
Using the author-provided data and code replication files, we are able to replicate 22 of 67 papers (33%) independently of the authors by following the instructions in the author-provided readme files. The most common reason we are unable to replicate the remaining 45 papers is that the authors do not provide data and code replication files. We find that some authors do not provide data and code replication files even when their article is published in a journal with a policy that requires submission of such files as a condition of publication, indicating that editorial offices do not strictly enforce these policies, although provision of replication files is more common at journals that have such a policy than at journals that do not. Excluding 6 papers that rely on confidential data for all of their results and 2 papers that provide code written for software we do not possess, we successfully replicate 29 of 59 papers (49%) with help from the authors. Because we successfully replicate less than half of the papers in our sample even with assistance from the authors, we conclude that economics research is usually not replicable.2
Despite our finding that economics research is usually not replicable, our replication success rates are still notably higher than those reported by existing studies of replication in economics. McCullough, McGeary, and Harrison (2006) find a replication success rate for articles published in the JMCB of 14 of 186 papers (8%), conditioned on the replicators' access to appropriate software, the original article's use of non-proprietary data, and without assistance from the original article's authors. Adding the requirement that the JMCB archive contain data and code replication files for the paper increases their success rate to 14 of 62 papers (23%). Our comparable success rates are 22 of 59 papers (37%), conditioned on our having appropriate software and non-proprietary data, and 22 of 38 papers (58%) when we impose the additional requirement of having data and code files. Dewald, Thursby, and Anderson (1986) successfully replicate 7 of 54 papers (13%) from the JMCB, conditioned on the replicators having data and code files, the original article's use of non-confidential data, help from the original article's authors, and appropriate software. Our comparable figure is 29 of 38 papers (76%).
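For concreteness, the conditional success rates quoted above follow from simple arithmetic; the short Python sketch below recomputes each ratio from the counts reported in the text.

```python
# Conditional replication success rates quoted in the text.
# Counts are taken directly from the comparisons above.
rates = {
    "McCullough et al. (2006), no author help, usable software/data": (14, 186),
    "McCullough et al. (2006), also requiring archived files":        (14, 62),
    "This study, no author help, usable software/data":               (22, 59),
    "This study, also requiring data and code files":                 (22, 38),
    "Dewald et al. (1986), with author help and files":               (7, 54),
    "This study, with author help and files":                         (29, 38),
}

for label, (successes, attempts) in rates.items():
    print(f"{label}: {successes}/{attempts} = {successes / attempts:.0%}")
```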
Our sampling frame includes papers from 13 well-regarded macroeconomics and general interest economics journals: American Economic Journal: Economic Policy, American Economic Journal: Macroeconomics, American Economic Review, American Economic Review: Papers and Proceedings (P&P), Canadian Journal of Economics, Econometrica, Economic Journal, Journal of Applied Econometrics, Journal of Political Economy, Review of Economic Dynamics, Review of Economic Studies, Review of Economics and Statistics, and Quarterly Journal of Economics. We choose papers from these journals because of the relative likelihood that such papers will have a policy effect and also influence future research.3 We do not select these journals to single out a particular author, methodology, institution, or ideology.
From our sample of journals, we browse for original research articles published in issues from July 2008 to October 2013.4,5 Within these issues, we identify all papers with the following three characteristics: (1) an empirical component, (2) model estimation with only US data, and (3) a key empirical result produced by inclusion of US gross domestic product (GDP), published by the Bureau of Economic Analysis (BEA), in an estimated model.6 We choose to focus on GDP because of its status as a standard macroeconomic statistic and its widespread use in research.7
For each paper in this set, we attempt to replicate the key empirical results.8 We focus on the key empirical results for two reasons: (1) replicating only the key results allows us to expand the sample to more papers, and (2) the key result of the paper is presumably what drove the paper's publication; robustness checks merely serve as confirming evidence.
Defining a key result is subjective and requires judgment calls on our part. We attribute a key result of the paper to GDP when the authors themselves refer to GDP as driving a key result, or when a discussion of GDP is featured either in the abstract or prominently in the introduction of their work (or both). We also take key results to be those that appear in figures and tables.9
We find 67 papers that fit these criteria. Of these papers, 6 use proprietary data for all of the key results, so we do not include them in our replication exercise; Alexopoulos (2011) is one example. If a subset of the key results can be obtained using non-proprietary data, then we attempt to replicate those results.
For the remaining papers that use public data and are published in journals that maintain data and code archives, we download the replication files provided by the authors through the online archives provided by the journals. Unlike prior work by McCullough, McGeary, and Harrison (2006), who report difficulty in accessing the archives of selected journals, we have no trouble doing so through the Board of Governors of the Federal Reserve System or Office of the Comptroller of the Currency subscriptions. However, consistent with McCullough, McGeary, and Harrison (2006), we find that journal data and code archives are incomplete. Of the 35 papers that use public data and are published in journals that require data and code replication files, we obtain files for 28 papers (80%) from journal archives.
For papers where we are unable to obtain replication data and code files from journal archive sites, either because the mandatory files are missing or because the paper is not subject to a data availability policy, we check the personal websites of each of the authors for replication files. If we are unable to locate replication files online, then we email each of the authors individually requesting the replication files.10 Of the 7 papers that use public data, are subject to a data and code policy, and do not have replication files on the journal's archive site, this procedure nets us one additional set of replication files. Therefore, we are unable to locate replication files for 6 of 35 papers (17%) that are published in journals that require submission of data and code replication files. For papers that use public data and are published in journals without a data and code availability policy, we are unable to obtain data and code replication files for 15 of 26 papers (58%). We do not single out any paper or author that fails to comply with a journal's mandatory data and code policy. We therefore only report these summary statistics of compliance with data availability policies and only cite papers that we successfully replicate, that use proprietary data, or for which we have what appears to be a complete set of replication files written for software we do not possess. Our intention is to highlight the general state of replication files for published economics research, not to berate any given author, methodology, institution, or ideology.
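The search order described above amounts to a simple cascade. The Python sketch below is purely illustrative; the fields on `paper` are hypothetical stand-ins for our manual checks, not a description of tooling we actually use.

```python
# Illustrative sketch of the file-search cascade described above.
# The dictionary keys are hypothetical stand-ins for manual checks.
def locate_replication_files(paper):
    """Return where replication files were found, or None if nowhere."""
    if paper.get("in_journal_archive"):    # step 1: journal data/code archive
        return "journal archive"
    if paper.get("on_author_website"):     # step 2: authors' personal websites
        return "author website"
    if paper.get("emailed_by_author"):     # step 3: email each author in turn
        return "emailed by author"
    return None                            # no files located

# Example: a paper whose files are only on an author's website.
print(locate_replication_files({"on_author_website": True}))  # -> "author website"
```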
To determine whether a paper was subject to a data availability policy, we check the implementation dates of the journal data policies and compare them to the publication and submission dates of the published work. If the journal's website does not allow us to extract this information, then we query the editorial office as to when their data availability policy became effective. We do not ask the editorial offices whether a particular paper was subject to a data availability policy. Aside from papers with proprietary data, we find that journal data archives do not provide lists of potentially exempt papers. Therefore, we are unable to determine whether a paper is exempt for a reason other than using proprietary data, although we are not aware of reasons why journals would grant a paper a data and code exemption other than for proprietary data. The authors we query whose papers we believe are subject to a data availability policy yet whose replication files we are unable to locate do not volunteer whether their papers are exempt from the policy, and we do not ask the authors for this information.
For the papers for which we are able to obtain data and code replication files, we attempt to replicate the key results of the paper using only the instructions provided in the author readme files. If the readme files are insufficient or if the replication files are incomplete (or both) and the paper is subject to a replication policy, then we email the corresponding author (if no corresponding author, then the first author) for either clarification or to request the missing files. If we do not receive a response within a week, then we query the second author, and so on, until all authors on the paper have been contacted.11
We define a successful replication as one where the authors or journal provide data and code files that allow us to qualitatively reproduce the key results of the paper. For example, if the paper estimates a fiscal multiplier for GDP of 2.0, then any multiplier greater than 1.0 would produce the same qualitative result (i.e., there is a positive multiplier effect, and government spending is not merely a transfer or crowding out private investment).12 We define success using this extremely loose definition to get an upper bound on what the replication success rate could potentially be.13 We allow for minimal re-working of the provided files, following the procedure of McCullough, McGeary, and Harrison (2006).14
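To make the qualitative-success criterion concrete, the following Python sketch, which is our illustration rather than code used in the replications, applies the fiscal-multiplier example: a replication counts as successful if the replicated estimate falls on the same side of 1.0 as the published estimate.

```python
# Illustration of the loose, qualitative success criterion: the replication
# "succeeds" if the replicated estimate implies the same qualitative finding
# (multiplier above vs. below 1.0) as the published estimate. This is our
# sketch of the criterion, not the authors' actual code.
def qualitatively_replicates(published, replicated, threshold=1.0):
    return (published > threshold) == (replicated > threshold)

print(qualitatively_replicates(2.0, 1.3))  # True: both imply a positive multiplier effect
print(qualitatively_replicates(2.0, 0.8))  # False: replication implies crowding out
```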
One dimension where we are unable to follow the authors exactly is the software version they use. To execute the replications, we make use of the following software version-operating system combinations: Dynare 4.3 and 4.4.2 (Windows), EViews 6 and 7 (Windows), EViews 8 (Linux), Gauss 9.0.2 (Linux), Fortran f90 (Linux), Matlab R2008a and R2012a and R2013a (Windows), Matlab R2010a and R2012a (Linux), OX 6.30 (Windows), Oxmetrics 6.30 (Windows), Stata 11.0 and 13.1 (Windows), Stata 13.0 (Windows and Linux), R 2.15.1 and 3.0.1 and 3.0.2 and 3.0.3 and 3.1.0 (Linux), and RATS 7.10 (Linux).15 When available in the readme, we attempt to run the software version-operating system combination specified by the authors. When the replication files fail to execute on a given software version-operating system combination, the author readme does not specify a particular combination, and the data and code appear to be complete, we email the authors to find out which combination they use.
This section presents summary statistics of our replication attempts.16
Table 1 lists the papers we successfully replicate. Table 2 breaks down our replication results by journal type. Panel A of Table 2 shows that our overall replication success rate is 29 of 67 papers (43%).
Table 2, Panel B shows that we successfully replicate 23 of 39 papers (59%) from journals that require data and code replication files. This rate compares to 6 of 28 (21%) of the papers from journals that do not require such files, shown in Table 2, Panel C. These replication rates are similar when we only consider papers with publicly available data: we successfully replicate 23 of 35 (66%) of the papers from journals with mandatory data and code policies and 6 of 26 (23%) of the papers from journals without such policies. The presence of a mandatory data and code policy does not necessarily imply a causal relationship from the policy to successful replication. Authors select which journals to submit papers to, taking into account idiosyncratic journal policies such as mandatory submission of replication data and code. However, we find that it is significantly easier to replicate published research that comes from journals that require authors to submit their data and code.
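One simple way to gauge the difference in these public-data replication rates is a two-sided Fisher exact test on the 2x2 table of successes and failures. This is our illustrative choice of test; the text does not specify the procedure behind the statement that replication is significantly easier at journals with mandatory policies.

```python
# Compare replication rates for papers with public data, by journal policy:
# 23 of 35 successes at journals requiring data and code vs. 6 of 26 at
# journals that do not. The Fisher exact test is our illustrative choice.
from scipy.stats import fisher_exact

table = [[23, 35 - 23],   # policy journals: successes, failures
         [6, 26 - 6]]     # no-policy journals: successes, failures
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p-value = {p_value:.4f}")
```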
Table 3, Panel A provides explanations for why we are unable to replicate papers according to four broad classifications: "missing public data or code," "incorrect public data or code," "missing software," or "proprietary data." Panel B provides the breakdown for journals that require data and code. Panel C shows the results for journals that do not require data and code.
From Table 3, Panel A, we find that we are unable to replicate 21 papers because of "missing data or code," which constitutes the majority of our failed replications (55%). As we outline in our methodology, for each of these unsuccessful replications we attempt to secure data and code from the authors by visiting their personal websites, visiting the journal websites (when the journal requires authors to submit data or code), and sending email requests. We classify an unsuccessful replication as "missing data or code" when at least one of two events occurs: (1) the replication code file(s) are clearly missing necessary author-written functions for a subset or all of the key results or (2) the replication data file(s) are missing at least one variable. If the replication data has a shorter data sample than reported in the paper, then we still attempt the estimation and do not necessarily classify the paper as "missing data or code."
We are unable to replicate 9 papers (24% of failed replications) because of "incorrect data or code." We classify an unsuccessful replication as "incorrect data or code" when all variables are present in the dataset and the authors self-identify code for each of the key figures and tables we attempt to replicate. The author-provided code may finish executing and give different results or the code may not finish executing and still fall into this category.
We believe we do not have the needed software to run two papers, one of which is Senyuz (2011), because we are unable to locate a necessary packaged function in our versions of the appropriate software, because of significant syntax changes between software versions, or because the authors state that they use a particular software version and we are aware that our software would not be compatible. However, it is difficult to differentiate between an unsuccessful replication due to "incorrect data or code" and one due to "missing software." Because the implementation of packaged functions may differ across software versions even without syntax changes, we believe the number of failed replications we classify as "missing software" is a lower bound. It is possible that a paper we classify as "incorrect data or code" is actually replicable with the appropriate operating system-software combination, so some of the papers that we classify as "incorrect data or code" may belong in the "missing software" category. However, we cannot verify this statement without additional documentation.
Table 4 shows our summary statistics for successful replications completed independently of the authors versus those that succeed with the authors' help. Overall, we find that contacting the authors marginally improves our success rate for replication. Of the 29 successful replications, we complete 22 without any help from the authors.
In this article, we attempt to replicate 67 papers from 13 well-regarded economics journals using author-provided data and code replication files. Improving on existing work evaluating the state of replication in economics, our sampling frame is broader across different journals and covers a large number of original research articles. We replicate 22 of 67 papers (33%) by using only the authors' data and code files, and an additional 7 papers (for a total of 29 papers, 43%) with assistance from the authors. The most common cause of our inability to replicate findings is that authors do not provide files to the journal replication archives, which accounts for more than half of our failed replication attempts (21 of 38 papers, 55%). Because we are able to replicate less than half of the papers in our sample, we conclude that economics research is generally not replicable.
We now turn to two recommendations that we believe would improve the ability of researchers to replicate and extend published articles, largely echoing the recommendations of McCullough, McGeary, and Harrison (2006).
We replicate the corrected results of Auerbach and Gorodnichenko (2012) found in Auerbach and Gorodnichenko (2013).
Table 2: Replication Sample and Results By Journal
Notes: Journal of Applied Econometrics requires data only. Economic Journal currently requires data and code, but the papers in our sample were not subject to a data and code policy according to the Economic Journal's editorial office.

Panel A: All Journals

| | Papers Replicated Successfully | Papers With Public Data | Total Papers |
|---|---|---|---|
| All Journals | 29 | 61 | 67 |

Panel B: Journals that Require Data and Code

| Journal | Papers Replicated Successfully | Papers With Public Data | Total Papers |
|---|---|---|---|
| American Economic Journal: Economic Policy | 2 | 4 | 4 |
| American Economic Journal: Macroeconomics | 3 | 3 | 4 |
| American Economic Review | 5 | 8 | 10 |
| Canadian Journal of Economics | 0 | 0 | 1 |
| Econometrica | 3 | 3 | 3 |
| Journal of Political Economy | 1 | 1 | 1 |
| Review of Economic Dynamics | 4 | 7 | 7 |
| Review of Economic Studies | 0 | 2 | 2 |
| Review of Economics and Statistics | 5 | 7 | 7 |
| Total for Journals that Require Data and Code | 23 | 35 | 39 |

Panel C: Journals that Do Not Require Data and Code

| Journal | Papers Replicated Successfully | Papers With Public Data | Total Papers |
|---|---|---|---|
| American Economic Review: P&P | 0 | 4 | 5 |
| Economic Journal | 3 | 10 | 11 |
| Journal of Applied Econometrics | 1 | 10 | 10 |
| Quarterly Journal of Economics | 2 | 2 | 2 |
| Total for Journals that Do Not Require Data and Code | 6 | 26 | 28 |
Table 3: Failed Replication Results, Including Causes of Failure, By Journal Type
Panel A: All Journals

| | Paper Count | Percentage of Sample |
|---|---|---|
| Replication Unsuccessful | 38 | 100 |
| Unsuccessful Because of Missing Public Data or Code | 21 | 55 |
| Unsuccessful Because of Incorrect Public Data or Code | 9 | 24 |
| Unsuccessful Because of Missing Software | 2 | 5 |
| Unsuccessful Because of Proprietary Data | 6 | 16 |

Panel B: Journals With Mandatory Data and Code Policies

| | Paper Count | Percentage of Sample |
|---|---|---|
| Replication Unsuccessful | 16 | 100 |
| Unsuccessful Because of Missing Public Data or Code | 6 | 38 |
| Unsuccessful Because of Incorrect Public Data or Code | 5 | 31 |
| Unsuccessful Because of Missing Software | 1 | 6 |
| Unsuccessful Because of Proprietary Data | 4 | 25 |

Panel C: Journals Without Mandatory Data and Code Policies

| | Paper Count | Percentage of Sample |
|---|---|---|
| Replication Unsuccessful | 22 | 100 |
| Unsuccessful Because of Missing Public Data or Code | 15 | 68 |
| Unsuccessful Because of Incorrect Public Data or Code | 4 | 18 |
| Unsuccessful Because of Missing Software | 1 | 5 |
| Unsuccessful Because of Proprietary Data | 2 | 9 |
Table 4: Successful Replication Results By Journal Type
Panel A: All Journals

| | Paper Count | Percentage of Sample |
|---|---|---|
| Replication Successful | 29 | 100 |
| Successful With Contacting Authors | 7 | 24 |
| Successful Without Contacting Authors | 22 | 76 |

Panel B: Journals With Mandatory Data and Code Policies

| | Paper Count | Percentage of Sample |
|---|---|---|
| Replication Successful | 23 | 100 |
| Successful With Contacting Authors | 3 | 13 |
| Successful Without Contacting Authors | 20 | 87 |

Panel C: Journals Without Mandatory Data and Code Policies

| | Paper Count | Percentage of Sample |
|---|---|---|
| Replication Successful | 6 | 100 |
| Successful With Contacting Authors | 4 | 67 |
| Successful Without Contacting Authors | 2 | 33 |