Do Standard Error Corrections Exacerbate Publication Bias?
Over the past several decades, econometrics research has devoted substantial efforts to improving the credibility of standard errors. This paper studies how such improvements interact with the selective publication process to affect the credibility of published studies. I show that adopting improved but enlarged standard errors in individual studies can inadvertently lead to higher bias in the studies selected for publication. Intuitively, this is because increasing standard errors raises the bar on statistical significance, which exacerbates publication bias. Despite the possibility of higher bias, I show that the coverage of published confidence intervals unambiguously increases. I illustrate these phenomena using a newly constructed dataset on the adoption of clustered standard errors in difference-in-differences studies published between 2000 and 2009. Clustering is associated with a near doubling in the magnitude of effect sizes. I estimate a model of the publication process and find that clustering led to large improvements in coverage but also sizable increases in bias. To examine the overall impact on evidence-based policy, I develop a model of a policymaker who uses information from published studies to inform policy decisions and overestimates the precision of estimates when standard errors are unclustered. I find that clustering lowers minimax regret when policymakers are sufficiently loss averse toward mistakenly implementing an ineffective or harmful policy.
Job Market Paper. 2023 George Borts Prize for best doctoral dissertation in economics, Brown University. Presented at 2023 Econometric Society North American Summer Meeting, Los Angeles; 2023 MAER-Net Colloquium; 2023 AYEW, Monash University
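The selection mechanism in the abstract above can be illustrated with a minimal stylized simulation, not the paper's estimated model: publication is assumed to require significance at the 5% level, an unclustered analysis understates the estimator's true sampling variability, and a clustered analysis reports it correctly. All numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.5        # hypothetical true effect
true_se = 1.0      # true sampling sd of the estimator; the "clustered" SE reports this
naive_se = 0.5     # understated "unclustered" SE
est = rng.normal(theta, true_se, 1_000_000)   # sampling draws of the point estimate

def published(se_reported):
    """Keep estimates that are significant at the 5% level under the reported SE."""
    pub = est[np.abs(est / se_reported) > 1.96]
    bias = pub.mean() - theta
    coverage = (np.abs(pub - theta) < 1.96 * se_reported).mean()
    return bias, coverage

for label, se in [("unclustered SE", naive_se), ("clustered SE", true_se)]:
    bias, cov = published(se)
    print(f"{label}: bias among published = {bias:.2f}, CI coverage = {cov:.0%}")
```

Raising the reported standard error raises the significance bar, so the published estimates are more heavily selected (higher bias), yet their wider confidence intervals cover the true effect more often, consistent with the two claims in the abstract.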
Empirical research can inform evidence-based policy choice but may be censored by publication bias. How does this affect the decisions of policymakers who do not have, or are unwilling to use, prior beliefs about a policy's impact? For minimax regret policymakers, we characterize the optimal treatment rule when publication selects against statistically insignificant results. We then show that the publication rule that minimizes maximum regret is non-selective. This contrasts with the optimal publication rule for Bayesian policymakers studied in the literature, under which only 'extreme' results that sufficiently move the prior are published. Thus, in the minimax regret framework, the optimal publication regime for policy choice is consistent with valid statistical inference in scientific research.
Many explanations have been offered for why replication rates are low in the social sciences, including selective publication, p-hacking, and treatment effect heterogeneity. This article emphasizes that issues with the most commonly used approach for setting sample sizes in replication studies may also play an important role. Theoretically, I show in a simple model of the publication process that we should expect the replication rate to fall below its nominal target, even when original studies are unbiased. The main mechanism is that this approach does not properly account for the fact that original effect sizes are estimated. Specifically, it sets the replication sample size to achieve a nominal power target under the assumption that estimated effect sizes correspond to fixed true effects. However, because the replication power function linking original effect sizes to power is non-linear, ignoring the fact that effect sizes are estimated leads to systematically lower replication rates than intended. Empirically, I find that a parsimonious model accounting only for these issues can fully explain observed replication rates in experimental economics and social science, and two-thirds of the replication gap in psychology. I conclude with practical recommendations for replicators.
Presented at 2024 BITSS Annual Meeting, Berkeley; 2023 Econometric Society Australasian Meeting, Sydney; 2022 Association for Interdisciplinary Meta-research & Open Science Conference, Melbourne
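The sample-size mechanism in the abstract above can be illustrated with a short stylized simulation, not the paper's model: the replication sample size is set to hit a 90% power target treating the estimated original effect as if it were the true effect. Because the realized power function is non-linear in the original estimate and bounded above, overestimates of the effect cost more power than underestimates recover, so the realized replication rate falls below the nominal target even though the original estimates are unbiased. The effect size and standard error below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta, se_orig = 0.3, 0.15     # hypothetical true effect and original-study SE
target_power = 0.90
z_alpha, z_beta = norm.ppf(0.975), norm.ppf(target_power)

# Unbiased original estimates, and the replication SE implied by powering on them
theta_hat = rng.normal(theta, se_orig, 1_000_000)
se_rep = np.abs(theta_hat) / (z_alpha + z_beta)

# Replication draws centered on the *true* effect; success = significant, same sign as original
rep_est = rng.normal(theta, se_rep)
success = (np.abs(rep_est / se_rep) > z_alpha) & (np.sign(rep_est) == np.sign(theta_hat))
print(f"nominal power target: {target_power:.0%}, realized replication rate: {success.mean():.0%}")
```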
The rapid advancement of ‘deepfake’ video technology — which uses deep learning artificial intelligence algorithms to create fake videos that look real — has given urgency to the question of how policymakers and technology companies should moderate inauthentic content. We conduct an experiment to measure people’s alertness to and ability to detect a high-quality deepfake amongst a set of videos. First, we find that in a natural setting with no content warnings, individuals exposed to a deepfake video of neutral content are no more likely to detect anything out of the ordinary (32.9%) than a control group who viewed only authentic videos (34.1%). Second, we find that when individuals are warned that at least one video in a set of five is a deepfake, only 21.6% of respondents correctly identify the deepfake as the only inauthentic video, while the remainder erroneously select at least one genuine video as a deepfake.
Women’s schooling attainment in India continues to lag considerably behind that of men. This paper uses nationally representative district-level data from the 2007–8 District Level Household and Facility Survey (DLHS-3), Indicus Analytics, and the 2011–12 Indian Human Development Survey-II (IHDS-II) to examine the role of socioeconomic and cultural factors in shaping gender differentials in schooling. The results provide quantitative evidence on how different economic and sociocultural factors affect gender disparities in education. Economic development is an important factor in narrowing gender gaps in education, with wealthier districts more likely to educate girls than poorer districts. However, the norm of patrilocal exogamy, where wives migrate to co-reside with their husband’s kin, is associated with worse outcomes for women’s schooling relative to men’s; and, in keeping with anthropological research, gender-differentiated inequities in education are more pronounced in Northern India.
The MTurk Replication Project will test the reproducibility of 26 social science studies that used online research participants and were published in PNAS between 2015 and 2018. For the subset of 19 studies reporting t-ratios, this paper preregisters an out-of-sample prediction that 57% will be successfully replicated with a statistically significant effect in the same direction as the original study. It also preregisters individual-study predictions, identifying ten studies with very high expected replication probabilities (>95%) and seven studies with relatively low expected replication probabilities (<15%). These 'predictions' should be viewed as estimates of real replication power given the project's replication design. When replication outcomes are made publicly available, I will compare them against these preregistered estimates.