Member of Technical Staff at OpenAI, pioneering Agent Reinforcement Fine-Tuning (ARFT) alongside Cathy Zhou. Leads work enabling domain-specific coding agents that outperform frontier models on specialized tasks using small, high-quality training sets.
The 1000-Example Revolution
Will Hang’s headline result: 1,000 examples can yield 10-point improvements. Not 10,000. Not 100,000. One thousand high-quality examples redefine the economics of specialized model training and democratize AI customization for organizations of all sizes.
Current Work
At OpenAI, Will develops Agent Reinforcement Fine-Tuning (ARFT), a breakthrough approach that fine-tunes model weights specifically for custom tools and reward functions using only tens to hundreds of examples. The key innovation: agents can access customer tools during training, enabling dramatic performance improvements with modest datasets.
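To make the loop concrete, here is a minimal conceptual sketch of training with customer tool access. This is not OpenAI's ARFT API: `Example`, `customer_tool`, `rollout`, `reward`, `train`, and `update_policy` are all hypothetical stand-ins for the real machinery.

```python
# Conceptual sketch only, NOT OpenAI's ARFT API. It illustrates the
# shape of the loop described above: the agent can call a customer
# tool during rollouts, and a reward function scores each episode.
import random
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    reference: str  # gold answer used by the reward function


def customer_tool(query: str) -> str:
    """Stand-in for a customer-provided tool the agent may call
    during training rollouts (the key ARFT ingredient)."""
    return f"tool-result-for:{query}"


def rollout(policy, example: Example) -> str:
    """One agent episode: the policy consults the customer tool
    before producing its final answer."""
    tool_output = customer_tool(example.prompt)
    return policy(example.prompt, tool_output)


def reward(answer: str, example: Example) -> float:
    """Task-specific reward; in practice this could be a judge LLM
    rather than an exact-match check."""
    return 1.0 if example.reference in answer else 0.0


def train(policy, update_policy, dataset: list[Example], epochs: int = 3):
    """Reinforcement fine-tuning over a small dataset: sample rollouts,
    score them, and nudge the policy toward higher-reward behavior."""
    for _ in range(epochs):
        for ex in random.sample(dataset, len(dataset)):
            answer = rollout(policy, ex)
            update_policy(answer, reward(answer, ex))
```

The point of the sketch is that tool access happens inside the rollout, so the model's weights are updated on trajectories that look like production use, which is what lets tens to hundreds of examples go so far.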
See: OpenAI Reinforcement Fine-tuning Documentation
Background
Previously contributed to machine learning infrastructure at Google Brain, focusing on chip design optimization. Holds patents in IC floorplan generation and medical imaging. Educated at Stanford, with prior work in reinforcement learning for graph partitioning and multimodal healthcare systems.
Philosophy on Agent Training
Will’s approach emphasizes efficiency and practical deployment over massive data requirements:
Model capability with domain tools - Fine-tune for the specific tools and reward functions an agent will use rather than relying solely on generic pre-training patterns; this requires hundreds of examples instead of millions.
Data quality over quantity - High-quality examples far outperform larger mediocre datasets, with validated results showing 1,000 examples achieving 10-point improvements and 100 examples yielding 72% gains in specialized domains.
Production-mirror evaluation - Training must reflect real-world usage patterns, with rewards made hard to hack (for example, by using judge LLMs to score outputs) so agents cannot game the reward and measured gains reflect real improvement; see the sketch after this list.
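A minimal sketch of what a judge-LLM reward might look like, assuming the OpenAI Python SDK; the rubric, score scale, and parsing here are illustrative assumptions, not Will's actual grader.

```python
# Illustrative judge-LLM reward. The rubric and 0-10 scale are
# hypothetical; the idea is to replace brittle, gameable heuristics
# (e.g., keyword matching) with a model-based score.
from openai import OpenAI

client = OpenAI()

JUDGE_RUBRIC = (
    "Score the assistant's answer from 0 to 10 for correctness and "
    "completeness against the reference. Reply with the number only."
)


def judge_reward(task: str, candidate: str, reference: str) -> float:
    """Ask a judge model for a scalar score, normalized to [0, 1]."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": JUDGE_RUBRIC},
            {
                "role": "user",
                "content": f"Task:\n{task}\n\nCandidate:\n{candidate}\n\n"
                           f"Reference:\n{reference}",
            },
        ],
    )
    try:
        score = float((resp.choices[0].message.content or "").strip())
        return max(0.0, min(10.0, score)) / 10.0
    except ValueError:
        return 0.0  # unparseable judge output earns no reward
```

In a training loop like the one sketched earlier, `judge_reward` would take the place of the exact-match reward, which is one way to keep rewards from being gamed.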
Validated Results
Cognition (Code Edit Planning): 10-point improvement from 1,000 examples, discovering optimization strategies autonomously.
Mako (GPU Kernel Building): 72% improvement over frontier models from just 100 PyTorch examples.
Qodo (Code Review): Stabilized behavior and reduced long-tail tool calls using 1,000 question pairs.
Cosine (Enterprise Code): Faster agents with reliable outputs across 30 distinct tools using uncompromising evaluation.
OpenAI
OpenAI develops frontier AI models and tools for developers. Will’s work on ARFT represents a fundamental shift in how organizations can customize AI agents, making specialized model training economically viable for companies of all sizes through modest datasets and reasonable budgets.
Conference Appearance
Event: AI Engineering Code Summit 2025
Date: November 21, 2025
Time: 12:00 PM - 12:19 PM
Co-Presenter: Cathy Zhou (OpenAI)
Will covered ARFT methodology, domain-shift challenges, agent tool access during training, partner case studies, and principles for reproducible success. He emphasized that data quality matters far more than quantity, shifting the economic bottleneck of AI from compute resources to problem definition, data quality, and reward engineering.