Do Standard Error Corrections Exacerbate Publication Bias? [ Abstract | Draft ]
Over the past several decades, econometrics research has devoted substantial effort to improving the credibility of standard errors. This paper studies how such improvements interact with the selective publication process to affect the ultimate credibility of published studies. I show that adopting improved but enlarged standard errors for individual studies can lead to higher bias in the studies selected for publication. Intuitively, this is because increasing standard errors raises the bar for statistical significance, which exacerbates publication bias. Despite the possibility of higher bias, I show that the coverage of published confidence intervals unambiguously increases. I illustrate these phenomena using a newly constructed dataset on the adoption of clustered standard errors in the difference-in-differences literature between 2000 and 2009. Clustering is associated with a near doubling in the magnitude of published effect sizes. I estimate a model of the publication process and find that clustering led to large improvements in coverage but also sizable increases in bias. To examine the overall impact on evidence-based policy, I develop a model of a policymaker who uses information from published studies to guide policy decisions and overestimates the precision of estimates when standard errors are unclustered. I find that clustering lowers minimax regret when policymakers exhibit sufficiently strong loss aversion toward mistakenly implementing an ineffective or harmful policy.
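The selection mechanism described in the abstract can be illustrated with a minimal simulation sketch (my own illustration, not the paper's model): if only statistically significant estimates are published, then enlarging the standard error raises the significance cutoff and increases the bias of the estimates that get published. The true effect, standard errors, and publication rule below are assumed values chosen purely for illustration.

```python
# Illustrative sketch: publication selection on statistical significance.
# All parameter values are assumptions, not estimates from the paper.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.2            # assumed true effect
n_studies = 500_000    # number of simulated studies

def published_bias(se):
    """Mean bias among estimates that clear the two-sided 5% significance bar."""
    theta_hat = rng.normal(theta, se, n_studies)          # unbiased study estimates
    published = theta_hat[np.abs(theta_hat / se) > 1.96]  # publish only if significant
    return published.mean() - theta

for se in (0.1, 0.2):  # smaller "unclustered" vs. larger "clustered" SE (assumed values)
    print(f"se = {se:.1f}: bias among published estimates = {published_bias(se):.3f}")
```

The publication rule here is deliberately extreme (only significant results appear); the paper's estimated model is richer, but the sketch shows the direction of the effect stated in the abstract: the larger standard error produces noticeably larger bias among published estimates.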
2023 George Borts Prize for best doctoral dissertation in economics, Brown University
Presented at 2023 Econometric Society North American Summer Meeting, Los Angeles; 2023 MAER-Net Colloquium; 2023 AYEW, Monash University
Many explanations have been offered for why replication rates are low in the social sciences, including selective publication, p-hacking, and treatment effect heterogeneity. This article emphasizes that issues with common power calculations in replication studies may also play an important role. Theoretically, I show in a simple model of the publication process that the way replication power is commonly calculated implies that we should always expect replication rates to fall below their intended power targets, even when original studies are unbiased and there is no p-hacking or treatment effect heterogeneity. Empirically, I find that a parsimonious model accounting only for issues with power calculations can fully explain observed replication rates in experimental economics and social science, and two-thirds of the replication gap in psychology.
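A minimal simulation sketch of the core mechanism (my own illustration with assumed parameter values, not the paper's calibration): replication sample sizes are commonly chosen to deliver 90% power as if the original point estimate were the true effect; because that estimate is noisy, the realized replication rate falls below the 90% target even though the original estimates here are unbiased and there is no p-hacking.

```python
# Illustrative sketch: power calculations that treat the noisy original estimate
# as the true effect. All parameter values are assumptions for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta, se_orig = 0.5, 0.15                  # assumed true effect and original-study SE
z_sig, z_power = norm.ppf(0.975), norm.ppf(0.90)
n = 500_000                                 # simulated original/replication pairs

theta_hat_orig = rng.normal(theta, se_orig, n)        # unbiased original estimates
se_rep = np.abs(theta_hat_orig) / (z_sig + z_power)   # replication SE implied by the usual 90% power rule
theta_hat_rep = rng.normal(theta, se_rep)             # replication estimates
replicated = (np.sign(theta_hat_rep) == np.sign(theta_hat_orig)) & \
             (np.abs(theta_hat_rep / se_rep) > z_sig)

print(f"intended power target: 0.90, realized replication rate: {replicated.mean():.3f}")
```

The 90% target and the normal approximation are assumptions for illustration; the point is only that averaging realized power over the sampling noise in the original estimate pulls the replication rate below the nominal target.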
Presented at 2023 Econometric Society Australasian Meeting, Sydney; 2022 Association for Interdisciplinary Meta-Research & Open Science Conference, Melbourne
Empirical research can inform evidence-based policy choice but may be censored due to publication bias. How does this affect the decisions of policymakers who do not have, or are unwilling to use, prior beliefs about a policy's impact? For minimax regret policymakers, we characterize the optimal treatment rule under selective publication against statistically insignificant results. We then show that the publication rule that minimizes maximum regret is non-selective. This contrasts with the optimal publication rule for Bayesian policymakers studied in the literature, where only 'extreme' results that sufficiently move the prior are published. Thus, in the minimax regret framework, the optimal publication regime for policy choice is consistent with valid statistical inference in scientific research.
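For context, the minimax regret criterion referenced here can be stated generically as follows (a standard formulation; the paper's specific decision problem, welfare function, and publication process may differ):

```latex
% Generic minimax-regret criterion (not the paper's exact setup).
% \delta maps published evidence Y to a treatment decision in {0,1};
% W(d,\theta) is welfare from decision d when the true policy effect is \theta.
R(\delta,\theta) \;=\; \max_{d\in\{0,1\}} W(d,\theta)
  \;-\; \mathbb{E}_{\theta}\!\left[\, W(\delta(Y),\theta) \,\right],
\qquad
\delta^{*} \;\in\; \arg\min_{\delta}\ \sup_{\theta}\ R(\delta,\theta).
```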
The rapid advancement of ‘deepfake’ video technology — which uses deep learning artificial intelligence algorithms to create fake videos that look real — has given urgency to the question of how policymakers and technology companies should moderate inauthentic content. We conduct an experiment to measure people’s alertness to and ability to detect a high-quality deepfake amongst a set of videos. First, we find that in a natural setting with no content warnings, individuals exposed to a deepfake video of neutral content are no more likely to detect anything out of the ordinary (32.9%) than a control group who viewed only authentic videos (34.1%). Second, we find that when individuals are warned that at least one video in a set of five is a deepfake, only 21.6% of respondents correctly identify the deepfake as the only inauthentic video, while the remainder erroneously select at least one genuine video as a deepfake.
Gender Inequality in Education and Kinship Norms in India (with Anu Rammohan). 2018. Feminist Economics. [ Abstract | Paper ]
Women’s schooling attainment in India continues to lag considerably behind that of men. This paper uses nationally representative district-level data from the 2007–8 District Level Household and Facility Survey (DLHS-3), Indicus Analytics, and the 2011–12 Indian Human Development Survey-II (IHDS-II) to examine the role of socioeconomic and cultural factors in influencing gender differentials in schooling. The results provide quantitative evidence on how economic and sociocultural factors shape gender disparities in education. The empirical results show that economic development is an important factor in narrowing gender gaps in education, with wealthier districts more likely to educate girls than poorer districts. However, the norm of patrilocal exogamy, where wives migrate to co-reside with their husband’s kin, is associated with worse outcomes for women’s schooling relative to men’s; and, in keeping with anthropological research, gender-differentiated inequities in education are more pronounced in Northern India.
The MTurk Replication Project will test the reproducibility of 26 social science studies that used online research participants and were published in PNAS between 2015 and 2018. For the subset of 19 studies reporting t-ratios, this paper preregisters an out-of-sample prediction that 57% will be successfully replicated with a statistically significant effect in the same direction as the original study. It also preregisters individual-study predictions, identifying ten studies with very high expected replication probabilities (>95%) and seven studies with relatively low expected replication probabilities (<15%). These 'predictions' should be viewed as estimates of real replication power given the project's replication design. When replication outcomes are made publicly available, I will compare them against my preregistered estimates of real replication power.