Member of Technical Staff on OpenAI’s fine-tuning team, pioneering Agent Reinforcement Fine-Tuning (ARFT) - the industry’s first approach enabling agents to access external tools and APIs during training. Leading the development of custom reward functions and tool-aware training systems that make specialized AI agents possible.
Pioneering Tool-Aware Agent Training
Cathy Zhou is at the forefront of a fundamental shift in how AI agents learn. As part of OpenAI’s fine-tuning team, she developed ARFT, which marks the first time OpenAI allowed agents to access external tools during the training process - not just at inference time.
This breakthrough enables agents to learn the semantics of tools, understand when to call them, and optimize for domain-specific behaviors. The result: agents that are faster, more specialized, and capable of handling customer-specific toolsets with dramatically improved performance.
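To make the idea concrete, here is a minimal, hypothetical sketch of a training rollout in which the agent can execute a tool mid-episode and the reward depends on whether the tool was consulted. The tool, policy, and reward below are illustrative stand-ins, not OpenAI's training stack or API.

```python
# Toy sketch of tool access *during training*: the tool runs inside the rollout,
# and the reward is computed over the full transcript. Everything here is a
# hypothetical stand-in for illustration only.
import random
from typing import Callable


def lookup_order_status(order_id: str) -> str:
    """Hypothetical customer tool the agent can call while training."""
    return "shipped" if "7" in order_id else "processing"


TOOLS: dict[str, Callable[[str], str]] = {"lookup_order_status": lookup_order_status}


def toy_policy(prompt: str) -> dict:
    """Stand-in for the model: sometimes answers directly, sometimes calls the tool."""
    if random.random() < 0.5:
        return {"type": "tool_call", "name": "lookup_order_status", "args": prompt.split()[-1]}
    return {"type": "answer", "text": "Your order is probably processing."}


def reward(transcript: list[dict]) -> float:
    """Toy reward: full credit only if the agent consulted the tool before answering."""
    return 1.0 if any(step["type"] == "tool_call" for step in transcript) else 0.0


def rollout(prompt: str) -> float:
    """One training episode: the tool is executed inside the rollout, not just at inference."""
    transcript = [toy_policy(prompt)]
    if transcript[-1]["type"] == "tool_call":
        result = TOOLS[transcript[-1]["name"]](transcript[-1]["args"])
        transcript.append({"type": "tool_result", "text": result})
        transcript.append({"type": "answer", "text": f"Your order is {result}."})
    return reward(transcript)


if __name__ == "__main__":
    scores = [rollout("Where is order 1047?") for _ in range(8)]
    # A real ARFT run would feed these rewards into a policy update;
    # this sketch only shows tool execution and grading inside the training loop.
    print(f"mean reward over 8 rollouts: {sum(scores) / len(scores):.2f}")
```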
Philosophy: Three Levels Before Fine-Tuning
Cathy advocates for a systematic progression before reaching for ARFT:
1. Prompt Engineering - Steer the model through better prompts
2. Task Optimization - Refine tools and interfaces
3. Fine-Tuning (ARFT) - Only then change model weights
Her emphasis on data quality over quantity has proven critical: customers achieve 10-point improvements with just 1000 well-crafted examples. The minimum viable dataset is 10-100 examples, but performance scales with thoughtful data collection.
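The emphasis on small, well-curated datasets suggests basic data hygiene before any training run. Below is a minimal sketch of such a check, assuming the examples live in a chat-style JSONL file; the filename and the `messages` field are illustrative assumptions, not a required format.

```python
# Minimal dataset sanity check: count examples and flag duplicated prompts.
# The filename ("train.jsonl") and field names ("messages") are assumptions.
import json
from collections import Counter

MIN_EXAMPLES = 10  # lower bound mentioned above; more well-crafted data tends to help


def audit(path: str) -> None:
    with open(path) as f:
        examples = [json.loads(line) for line in f if line.strip()]

    prompts = [json.dumps(ex.get("messages", [])[:1]) for ex in examples]
    dupes = [p for p, n in Counter(prompts).items() if n > 1]

    print(f"{len(examples)} examples ({len(dupes)} duplicated prompts)")
    if len(examples) < MIN_EXAMPLES:
        print("warning: below the ~10-example floor; expect noisy results")


if __name__ == "__main__":
    audit("train.jsonl")
```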
Real-World Impact
Cathy’s work has enabled breakthrough applications across multiple domains:
- Cognition - Code edit planning agents learning parallel tool calls, 10-point improvement with 1000 examples
- Qodo - Code review agents with stabilized behavior and reduced long-tail tool calls
- Cosine - Enterprise agents integrating 30 tools with strict, no-partial-credit grading
- Mako - GPU kernel builders achieving 72% improvement over frontier models
Each success story reinforces core principles: well-specified tasks, production-aligned evaluation, unhackable reward functions, and continuous feedback loops.
Technical Challenges & Solutions
Reward hacking remains the hardest problem. Models are sophisticated enough to game reward functions in unexpected ways. Cathy’s team addresses this through judge LLMs that evaluate quality and prevent gaming - particularly valuable for assessing code style, architectural decisions, and validation practices.
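As an illustration of the judge-LLM pattern, here is a minimal sketch of a rubric-based grader for code reviews, written against the OpenAI Python SDK. The rubric, judge model, and score parsing are assumptions made for the example, not the team's actual grader.

```python
# Sketch of an LLM-as-judge reward for code reviews, using the OpenAI Python SDK.
# Rubric wording, model choice, and score parsing are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

RUBRIC = """Score the candidate code review from 0 to 10.
Reward concrete, correct findings, clear style, and checks for missing validation.
Penalize vague praise, restating the diff, or claims unsupported by the code.
Reply with a single integer."""


def judge_reward(diff: str, review: str) -> float:
    """Ask a judge model to grade a review, then normalize the score to [0, 1]."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; any capable model could be used
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Diff:\n{diff}\n\nReview:\n{review}"},
        ],
    )
    text = (response.choices[0].message.content or "0").strip()
    try:
        return max(0.0, min(1.0, int(text) / 10.0))
    except ValueError:
        return 0.0  # unparseable judgments score zero rather than crashing training
```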
Domain shift occurs when agents learn to call training-distribution tools rather than customer-specific ones. The solution: careful reward design and production-aligned evaluation that mirrors real-world behavior.
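One way to make that evaluation concrete is to measure how often the agent reaches for the customer's own tools rather than generic ones from the training distribution. A small hypothetical check follows; the transcript shape and tool names are made up for illustration.

```python
# Sketch of a production-aligned domain-shift check: what fraction of tool calls
# target tools the customer actually exposes? Transcript fields are hypothetical.

CUSTOMER_TOOLS = {"search_tickets", "update_ticket", "escalate"}


def customer_tool_rate(transcript: list[dict]) -> float:
    """Fraction of tool calls that target tools in the customer's toolset."""
    calls = [step["name"] for step in transcript if step.get("type") == "tool_call"]
    if not calls:
        return 0.0
    return sum(name in CUSTOMER_TOOLS for name in calls) / len(calls)


if __name__ == "__main__":
    transcript = [
        {"type": "tool_call", "name": "search_tickets"},
        {"type": "tool_call", "name": "web_search"},  # generic tool: a domain-shift symptom
    ]
    print(f"in-domain tool-call rate: {customer_tool_rate(transcript):.2f}")  # 0.50
```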
Background
Before OpenAI, Cathy studied CS + Math at Stanford, focusing on computer vision and AI/ML. Her technical foundation spans neural networks, reinforcement learning systems, and large-scale training infrastructure.
Conference Appearance
- Event: AI Engineering Code Summit 2025
- Date: November 21, 2025
- Time: 12:00 PM - 12:19 PM
- Session: Agent Reinforcement Fine Tuning (with Will Hang)
Cathy presented real customer success stories and the technical principles enabling effective agent fine-tuning. Her talk covered the prerequisites for ARFT success, the importance of reward function design, and practical lessons from production deployments across code editing, code review, enterprise tooling, and GPU kernel generation.