Job Market Paper
Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews
with Luca Henkel
SSRN version | Latest version: September 19, 2025 (PDF) | Twitter thread
Abstract
We study the impact of replacing human recruiters with AI voice agents to conduct job interviews. Partnering with a recruitment firm, we conducted a natural field experiment in which 70,000 applicants were randomly assigned to be interviewed by human recruiters, AI voice agents, or given a choice between the two. In all three conditions, human recruiters evaluated interviews and made hiring decisions based on applicants' performance in the interview and a standardized test. Contrary to the forecasts of professional recruiters, we find that AI-led interviews increase job offers by 12%, job starts by 18%, and 30-day retention by 17% among all applicants. To explain these results, we explore three channels. First, analyzing interview transcripts reveals that AI-led interviews elicit more hiring-relevant information from applicants than human-led interviews do. Second, recruiters score the interview performance of AI-interviewed applicants higher, but place greater weight on standardized tests in their hiring decisions. Third, applicants accept job offers with a similar likelihood and rate both interview quality and recruiter quality similarly in a customer experience survey. Moreover, when offered the choice, 78% of applicants choose the AI recruiter, and we find evidence that applicants with lower test scores are more likely to choose AI.
Coverage
Bloomberg (interview) | The Information (interview) | HuffPost (interview) | Poets&Quants (interview) | Marginal Revolution (mention) | PSG Global Solutions (press release) | Teleperformance (press release) | CBS (press release) | Barchart (press release) | Yahoo Finance (press release) | Booth Center for Applied AI (interview) | Chicago Booth Review Podcast (Podcast, scheduled) | Business Insider (quote) | Financial Times (mention) | Fortune (mention) | Rest of World (mention) | HRM Outlook (mention) | HR Tech Cube (mention) | Kyla Scanlon’s Newsletter (mention) | Nasdaq (mention) | El Espectador (mention) | eWeek (mention) | Morning Brew (mention) | ReWorked (mention) | Greg Isenberg’s post (mention) | Ethan Mollick’s post (mention) | Talent Edge (mention) | Numerama (quote)
Presentations
Google, Google Economics Seminar | Microsoft Research, AI & Business Value Internal Meeting | Conference on Field Experiments in Strategy 2025, Harvard Business School & INSEAD, San Francisco | Advances with Field Experiments Conference 2025, University of Chicago, Department of Economics | Applied Micro Seminar, University of Illinois Urbana-Champaign, Department of Economics | AI Behavioral Science Workshop 2025, Stanford University, CASBS | Behavioral Science Seminar, University of Chicago, Booth School of Business, 2025 | Experimental Economics Workshop, University of Chicago, Department of Economics, 2025 | TOM Workshop Meeting, Harvard Business School, 2024 | Conference on Field Experiments in Strategy 2024, Harvard Business School & INSEAD, Paris | Conference on AI in Business, Harvard Business School, Harvard D3 and Nova Business School, 2024
Working Papers
Artificial Writing and Automated Detection New Paper
with Alex Imas
SSRN version | NBER version
GitHub Replication Package
Abstract
Artificial intelligence (AI) tools are increasingly used for written deliverables. This has created demand for distinguishing human-generated text from AI-generated text at scale, e.g., ensuring that assignments were completed by students and that product reviews were written by actual customers. A decision-maker aiming to implement a detector in practice must trade off two key statistics: the False Negative Rate (FNR), which corresponds to the proportion of AI-generated text that is falsely classified as human, and the False Positive Rate (FPR), which corresponds to the proportion of human-written text that is falsely classified as AI-generated. We evaluate three leading commercial detectors (Pangram, OriginalityAI, GPTZero) and an open-source one (RoBERTa) on their performance in minimizing each of these statistics separately, using a large corpus spanning genres, lengths, and models, and then test their robustness in the face of these trade-offs. While commercial detectors outperform the open-source one, with Pangram achieving near-zero FNR and FPR that remain robust across models, threshold rules, ultra-short passages ("stubs", ≤ 50 words), and "humanizer" tools in our sample, we recommend relying on an AI governance framework to adopt detectors in real-world settings. A decision-maker may weight one type of error (Type I vs. Type II) as more important than the other. To account for the reality that AI detection is a decision problem, we introduce a framework where the decision-maker can clearly set a policy cap: a detector-independent metric reflecting tolerance for false positives or negatives. This framework is especially relevant given the uncertainty surrounding how AI may be used at different stages of writing, where certain uses may be encouraged (e.g., grammar correction) but may be difficult to separate from other uses.
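As a rough illustration of the two error rates and of a cap on false positives, the sketch below computes FPR and FNR at a given score threshold and then picks the lowest-FNR threshold whose FPR stays within the cap. It is not the paper's method or code: the function names, the convention that a higher score means "more likely AI", and the toy scores are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def error_rates(scores: Sequence[float], is_ai: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """Return (FPR, FNR) when texts with score >= threshold are flagged as AI.

    Assumes (hypothetically) that higher scores mean "more likely AI".
    """
    human = [s for s, a in zip(scores, is_ai) if not a]
    ai = [s for s, a in zip(scores, is_ai) if a]
    fpr = sum(s >= threshold for s in human) / len(human)  # humans flagged as AI
    fnr = sum(s < threshold for s in ai) / len(ai)         # AI passed as human
    return fpr, fnr


def threshold_under_fpr_cap(scores: Sequence[float], is_ai: Sequence[bool],
                            cap: float) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, FPR, FNR) with the lowest FNR subject to FPR <= cap.

    Candidate thresholds are the observed scores; returns None if none qualifies.
    """
    best = None
    for t in sorted(set(scores)):
        fpr, fnr = error_rates(scores, is_ai, t)
        if fpr <= cap and (best is None or fnr < best[2]):
            best = (t, fpr, fnr)
    return best


# Toy data (made up for illustration): four human texts, four AI texts.
scores = [0.05, 0.10, 0.40, 0.70, 0.60, 0.80, 0.90, 0.95]
is_ai = [False, False, False, False, True, True, True, True]

print(threshold_under_fpr_cap(scores, is_ai, cap=0.0))
print(threshold_under_fpr_cap(scores, is_ai, cap=0.25))
```

On this toy data, tightening the cap on false positives from 25% to 0% pushes the chosen threshold up and raises the FNR, which is the trade-off the abstract describes.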
Coverage
Twitter thread | The Information (mention) | Marginal Revolution (mention) | Full Media Coverage: Less Wrong #132 (Zvi Mowshowitz); Ethan Mollick’s posts; Businesswire; AI World; Chicago Booth Review Podcast (Podcast, scheduled)
AI Behavioral Science New Paper
with Matthew O. Jackson, Qiaozhu Mei, Stephanie W. Wang, Yutong Xie, Walter Yuan, Seth Benzell, Erik Brynjolfsson, Colin F. Camerer, James Evans, Jon Kleinberg, Juanjuan Meng, Sendhil Mullainathan, Asu Ozdaglar, Thomas Pfeiffer, Moshe Tennenholtz, Robb Willer, Diyi Yang, and Teng Ye
SSRN version
The Virtues of Lab Experiments
with Gary Charness, James Cox, Charles Holt, Catherine Eckel
CESifo version
R&R at Journal of Economic Behavior and Organization
Distributional Approach to Risk Preferences
with Nir Chemaya, Charles Johnson, Enoch Yeung, Gary Charness
Pre-print version
Two-Ball Ellsberg Paradox
with Simon Lazarus
CESifo version
Critical Thinking and Storytelling Contexts
with Elia Sartori
CESifo version
Selected Work in Progress
Automated Cognitive Expertise and Human-AI Error Decomposition Coming Soon
Screening Labor with AI and Humans: Optimal Choice and Welfare Field Data Collected
with Pëllumb Reshidi
Human-AI Learning: Theory and Field Evidence from Job Interviews Field Data Collected
with Andrew Koh
Critical Thinking and Economic Impacts: A Natural Field Experiment in Saudi Arabia on Educational and Labor Performance Pilot Data Collected
with Michael Cuna, Faith Fatchen, Faisal Kattan, Min Sok Lee and John List
Peer-Reviewed Articles
The Next Generation of Experimental Research with LLMs
with Gary Charness and John List
NBER No. 31679 | Teaching Slides
Nature Human Behaviour, 2025
World Economic Forum | Chicago Booth Review | VoxEU Column
Invited Survey and Book Chapters
LLMs for Behavioral Economics: Ensuring Internal Validity and Eliciting Mental Models
Invited Entry under preparation for the Elgar Encyclopedia of Experimental Social Science
🌐 SSRN version
LLMs for Behavioral Economics: Synthetic Mental Models and Data Generalization
Invited Entry under preparation for the Elgar Encyclopedia of Experimental Social Science
🌐 SSRN version
Black Boxes: Mental Models and AI Models
Invited Chapter under preparation for the Oxford Research Encyclopedia of Economics and Finance
🌐 SSRN version
Doctoral Theses
PhD Thesis in Economics, Paris School of Economics, 2023
Online Manuscript: [pre-print]
Citation APA: Jabarian, B. (2023). The Economics of Moral Uncertainty: Essays in Behavioral and Experimental Economics (Doctoral dissertation, Université Panthéon-Sorbonne-Paris I).
Title: The Economics of Moral Uncertainty
Committee: Jean-Marc Tallon (Supervisor), Roland Bénabou (Chair), Leeat Yariv, Mohammed Abdellaoui, Nicolas Jacquemet
Abstract: This thesis, rooted in experimental economics, political behavioral economics, and macroeconomic behavioral economics, tackles diverse topics concerning the decision-making processes of economic agents. In the first chapter, we conduct an incentivized experiment on a nationally representative US sample (N = 708) to test whether people prefer to avoid ambiguity even when it means choosing dominated options. In contrast to the literature, we find that 55% of subjects prefer a risky act to an ambiguous act that always provides a larger probability of winning. Our experimental design shows that such a preference is not mainly due to a lack of understanding. We conclude that subjects avoid ambiguity per se rather than avoiding ambiguity because it may yield a worse outcome. Such behavior cannot be straightforwardly reconciled with existing models of ambiguity aversion. In the second chapter, in an incentivized online social media experiment (N = 706), we show that different digital storytelling formats (different visual designs and writing styles to present the same set of facts) affect the intensity at which individuals become critical thinkers. Intermediate-length designs (Facebook posts) are most effective at triggering individuals into critical thinking. Individuals with a high need for cognition mostly drive the differential effects of the treatments. We further explore the implications of such results for welfare and political economy. In particular, we establish that increasing the share of critical thinkers (individuals who are aware of the ambivalent nature of a certain issue) in the population increases the efficiency of surveys (elections) but might increase surveys’ bias. In the third chapter, we present a novel climate-macroeconomic model, Nested Inequalities Climate-Economy with Risk and Inequality Uncertainty (NICERIU). We also introduce a social welfare function, the Worldview-Inclusive Welfare. The latter incorporates heterogeneous worldviews regarding welfare and uncertainty about inequality, proposing redistributive economic policies based on equality-distributed equivalence and a novel axiom of minimal comparability of worldviews. NICERIU has been calibrated using a representative sample from the US population (N = 500), and the calibration outcomes reveal intriguing insights. With symmetrically weighted distinct worldviews, the optimal taxation policy closely and conservatively approximates the taxation policy based on a particular worldview but differs in specific ways. In summary, this thesis explores various often overlooked aspects within traditional approaches to economics. It challenges models of ambiguity aversion, highlights the impact of narrative formats on critical thinking, and proposes a climate-economic model that incorporates uncertainty about inequality. The obtained results call for profound reflection and underscore the importance of integrating diverse aspects of decision-making processes into economic analyses.
PhD Thesis in Philosophy, Department of Philosophy, University Panthéon-Sorbonne Paris 1, 2023
Online Manuscript: [pre-print]
Citation APA: Jabarian, B. (2023). Operationalizing moral uncertainty: a framework for critical thinking in an uncertain world (Doctoral dissertation, Université Panthéon-Sorbonne-Paris I).
Title: Operationalizing Moral Uncertainty
Committee: Laurent Jaffro (primary supervisor), Franz Dietrich (co-supervisor), Marc Fleurbaey (Chair), Pierre Livet, Richard Bradley, Katie Steele
Abstract: This Ph.D. thesis in philosophy explores the normative uncertainty problem, i.e., the complex ethical problem of what we should do when uncertain about what we should do. We conduct our thesis in the tradition of the long-forgotten philosophy of science of operationalization. The latter is a thorough analytical approach that allows for applied investigations of a concept whose empirical implications are neither proven nor clear. In the case of an ethical evaluation or choice problem, operationalization includes two main dimensions: (1) providing a framework for reasoning, comparing the values of options, and decision-making by individuals or groups; (2) providing empirical evidence to demonstrate the concept’s relevance for applied research and further scientific investigations. We divide our thesis into two main parts based on these dimensions. A preceding introduction addresses normative uncertainty and its relations to other ethical and meta-ethical concepts. Part I provides a comprehensive framework for comparing the values of options, reasoning, and making individual decisions under normative uncertainty, depending on the types and amount of information available to the decision-maker. Part II demonstrates how we may employ the humanities in survey methods and establish normative uncertainty as an empirical fact by combining both disciplines. The conclusion summarizes our thesis’s main contributions.