Jacob Kahn

Research Manager at Meta AI leading FAIR’s code generation research in North America, also serving as CS Faculty at the University of Pennsylvania. Creator of CWM (Code World Models), pioneering the use of execution traces and behavioral data in neural code generation.

Pioneering Code Generation with World Models

Jacob Kahn challenges conventional approaches to code generation by treating it as a world-modeling problem rather than pure syntax prediction. His work fundamentally rethinks how neural networks learn from code by incorporating program execution traces, bash outputs, and CI/CD results alongside source code.

Current Work & Research

Jacob leads FAIR’s code generation research, focusing on the co-design of deep neural models and systems. His CWM project introduces a 32B-parameter model trained on execution data from GitHub repositories. It reaches 65.8% on SWE-bench Verified (with test-time scaling), 68.6% on LiveCodeBench, and 96.6% on Math-500.

The CWM approach incorporates:

Memory traces from program execution
Bash outputs and command results
CI/CD build success/failure patterns
Complete execution histories alongside code syntax

This enables the model to develop what Jacob calls a “neural debugger” capability - implicit understanding of program behavior without actually running code.
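To make "memory traces from program execution" concrete, here is a minimal sketch of line-by-line trace capture using Python's standard `sys.settrace` hook. It only illustrates the kind of data involved and is not CWM's actual data pipeline; the names `trace_locals` and `running_sum` are hypothetical.

```python
import sys


def trace_locals(func, *args, **kwargs):
    """Run func and record a snapshot of its local variables before each executed line."""
    records = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function being traced.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Store the line offset within the function and a copy of the locals.
            offset = frame.f_lineno - code.co_firstlineno
            records.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(previous)
    return result, records


def running_sum(values):
    total = 0
    for v in values:
        total += v
    return total


result, records = trace_locals(running_sum, [1, 2, 3])
for offset, snapshot in records:
    print(offset, snapshot)
```

Each record pairs a source line with the variable state observed just before that line runs, which is the sort of signal a model could learn from alongside the source text itself.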

Key Technical Innovations

Training methodology: Rather than learning syntax patterns alone, CWM learns from how code actually behaves at runtime, creating an internalized simulator of program execution.

SWE-RL integration: The model improves through reinforcement learning on failed agentic reasoning attempts, treating mistakes as valuable training signals rather than noise.

Bash-centric design: Emphasizes practical shell understanding over tool proliferation, with significant post-training scaling on real-world workflows.
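As a rough illustration of what a shell-centric observation might contain, the sketch below runs a command and records its exit code and output as one structured record. This assumes nothing about CWM's real tooling: `ShellObservation` and `observe` are hypothetical names, and the command runs in the system default shell rather than bash specifically.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ShellObservation:
    """One command together with everything the shell reported back."""
    command: str
    exit_code: int
    stdout: str
    stderr: str


def observe(command: str, timeout: float = 10.0) -> ShellObservation:
    # shell=True hands the string to the system default shell (an assumption here).
    completed = subprocess.run(
        command,
        shell=True,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return ShellObservation(
        command=command,
        exit_code=completed.returncode,
        stdout=completed.stdout,
        stderr=completed.stderr,
    )


ok = observe("echo hello")
failed = observe("exit 3")
print(ok)
print(failed)
```

Keeping failures as first-class records, instead of discarding them, matches the emphasis above on treating mistakes as training signal.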

“in some sense this is difficult to decide”: CWM’s response when asked about the halting problem, suggesting an understanding of computability that goes beyond pattern matching.

Notable Publications

Creator of flashlight (5.4k stars), a standalone C++ library for machine learning, and wav2letter (6.4k stars), an automatic speech recognition toolkit.

Background

Holds M.S. and B.S. degrees in Computer Science from the University of Pennsylvania’s Management & Technology program, plus a B.S. in Economics from the Wharton School with a focus on statistics and operations research. Previously led the organization of the PennApps hackathon.

Conference Appearance

Event: AI Engineering Code Summit 2025
Date: November 21, 2025
Time: 11:00 AM - 11:19 AM
Session: Code World Models: Building World Models for Computation

Jacob introduced the paradigm shift from syntax-based to behavior-based code generation, demonstrating how execution traces create fundamentally better code models. His presentation emphasized learning from failure patterns, the importance of bash understanding, and how 32B specialized models can achieve frontier-level performance through focused training on execution data.

Related Themes

Reinforcement Learning for Specialized Models: The Economics of Domain Expertise
