Expert Biography

Joel Becker

Member of Technical Staff at METR, leading groundbreaking research on AI capability measurement and evaluation. Previously a PhD student in Economics at NYU and a researcher at Harvard, bringing rigorous empirical methods to AI safety research.

Bridging the Benchmark-Economics Gap

Joel Becker has emerged as a critical voice in AI capability measurement, challenging the field’s reliance on laboratory benchmarks that often fail to predict real-world economic impact. His research reveals fundamental disconnects between how AI progress is measured and how it actually creates value.

Current Work

As Member of Technical Staff at METR (Model Evaluation & Threat Research), Joel leads evaluation research focused on understanding AI capabilities through both controlled testing and field evidence. His work spans:

  • Capability measurement frameworks - Building evaluation methods that capture real-world performance, not just benchmark scores
  • Developer productivity research - an RCT finding that AI tools increased task completion time by 19% for experienced open-source developers (a toy version of this estimate is sketched after this list)
  • Long-horizon task evaluation - Research on measuring AI ability to complete complex, multi-step tasks
  • AI R&D capabilities - RE-Bench, which evaluates frontier AI systems against human experts on research engineering tasks
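
To make the headline number from that RCT concrete, here is a minimal sketch, using entirely hypothetical completion times, of how a relative slowdown is computed from a two-arm comparison of AI-assisted and unassisted work. It is an illustration only, not METR's actual study design or analysis.

```python
# Minimal sketch: estimating a relative change in task completion time from
# a two-arm comparison. All numbers are hypothetical, chosen only to
# illustrate a roughly 19% slowdown; they are not study data.
import statistics

ai_assisted = [75, 90, 95, 70, 85, 85]  # completion times (minutes) with AI tools
unassisted = [60, 70, 80, 65, 75, 70]   # completion times (minutes) without

mean_ai = statistics.mean(ai_assisted)
mean_baseline = statistics.mean(unassisted)

# Positive values mean tasks took longer with AI assistance.
relative_change = (mean_ai - mean_baseline) / mean_baseline
print(f"Estimated change in completion time: {relative_change:+.0%}")
```

The real study involves randomization, many more tasks, and uncertainty estimates; the point here is only that the headline figure is a relative change in completion time, not a benchmark score.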

Joel also runs Qally’s, applying similar evaluation rigor to healthcare technology.

Background

Previously a PhD student in Economics at New York University (2020-2022) and a researcher at Harvard University (2018-2022), Joel brings academic economics methodology to AI safety research. His work has been published in Nature Human Behaviour and accepted to ICML 2025 (Spotlight), with coverage in Reuters, The Atlantic, WIRED, MIT Technology Review, Financial Times, and The Economist.

Philosophy on Capability Measurement

Joel’s approach challenges conventional AI evaluation:

Field evidence over laboratory optimism - Deployment data provides ground truth that controlled benchmarks cannot capture. Real-world constraints reveal hidden dependencies and failure modes invisible in lab settings.

Economic impact as capability measure - True capability assessment requires understanding actual value creation, not just raw performance scores. Some high-benchmark systems produce minimal practical value, while modest improvements in overlooked capabilities drive significant economic impact.
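
As a toy illustration of this point (invented tasks and dollar values, not data from Joel's research), a system can post a strong benchmark-style accuracy while capturing almost none of the economic value at stake:

```python
# Toy illustration: benchmark accuracy vs. value-weighted performance.
# Tasks, outcomes, and dollar values are invented for illustration only.
tasks = [
    # (task, solved_by_model, economic_value_usd)
    ("format unit-test boilerplate", True, 5),
    ("answer textbook algorithms quiz", True, 2),
    ("rename variables across a module", True, 10),
    ("debug race condition in production service", False, 2_000),
    ("migrate billing schema without downtime", False, 5_000),
]

benchmark_accuracy = sum(solved for _, solved, _ in tasks) / len(tasks)
value_captured = sum(value for _, solved, value in tasks if solved)
value_available = sum(value for _, _, value in tasks)

print(f"Benchmark-style accuracy: {benchmark_accuracy:.0%}")                       # 60%
print(f"Share of economic value captured: {value_captured / value_available:.1%}")  # ~0.2%
```

Weighting outcomes by the value of the task, rather than counting correct answers, is one crude way to operationalize the distinction; the argument above is that serious capability assessment needs measures of this second kind, grounded in deployment data.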

Implications for automated AI R&D - As AI systems begin doing their own research, what should they optimize for? Benchmark scores or real-world utility? Joel’s work suggests the benchmark-economics gap has profound implications for how autonomous research systems should be designed.

Conference Appearance

  • Event: AI Engineering Code Summit 2025
  • Date: November 21, 2025
  • Time: 5:00 PM - 5:19 PM
  • Session: Benchmarks vs economics: the AI capability measurement gap

Joel presented on reconciling laboratory benchmarks with field evidence on AI capabilities, exploring what this means for automated AI R&D. His talk challenged the field’s fixation on benchmark scores and emphasized the need for evaluation frameworks grounded in real-world economic impact.
