Co-founder and CTO at Applied Compute, building production-ready reinforcement learning infrastructure. Former OpenAI researcher who developed reasoning models including o1, with expertise in distributed RL training and competitive programming systems.
Building Fast, Cheap, Predictable RL for Production
Rhythm Garg advocates a pragmatic approach to reinforcement learning that differs fundamentally from the one taken in academic research labs. His focus: making RL deployable at scale through infrastructure that prioritizes speed, cost-efficiency, and predictable training dynamics over theoretical optimality.
Current Work
As CTO of Applied Compute, Rhythm leads technical development of enterprise-focused RL infrastructure. The company, valued at $700M after raising $80M, builds specialized AI agents trained on company-specific data. Key customers include DoorDash, Cognition, and Mercor.
Applied Compute’s technical approach:
- Pipeline RL with in-flight weight updates - Sophisticated async training that manages the staleness-variance tradeoff for maximum GPU utilization (see the sketch after this list)
- First-principles system modeling - Mathematical frameworks for optimizing distributed training pipelines across heterogeneous GPU clusters
- Data flywheel deployment - Agents that try actions hundreds of times, grade results systematically, and use RL to reinforce successful reasoning paths
- Specific Intelligence - Customized agents with domain expertise, trained on proprietary data, optimized for enterprise workflows
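As a rough illustration of the pipeline-RL idea (Applied Compute's actual stack is proprietary; the structure and names below are hypothetical), a generation worker keeps producing rollouts while the learner publishes new weights, and the worker picks those weights up between rollouts rather than draining its queue first:

```python
import queue
import threading
import time

# Hypothetical sketch of pipeline RL: generation never pauses to wait for
# the learner; new weights are picked up "in flight" between rollouts.

weight_version = 0             # latest weights published by the learner
rollout_queue = queue.Queue()  # rollouts flowing from generator to learner
stop = threading.Event()

def generator():
    """Continuously produce rollouts, tagged with the weight version used."""
    while not stop.is_set():
        version_used = weight_version        # may lag the learner: staleness
        time.sleep(0.01)                     # stands in for model inference
        rollout_queue.put({"tokens": "...", "reward": 1.0, "version": version_used})

def learner(steps=50, batch_size=8):
    """Consume rollouts, take gradient steps, publish new weights immediately."""
    global weight_version
    for _ in range(steps):
        batch = [rollout_queue.get() for _ in range(batch_size)]
        staleness = [weight_version - r["version"] for r in batch]
        # ... compute the RL loss on `batch`, correcting for staleness ...
        weight_version += 1                  # in-flight update: the generator sees it on its next rollout
    stop.set()

threading.Thread(target=generator, daemon=True).start()
learner()
```

The point of the structure is that GPUs running generation stay saturated; the cost is that some rollouts were sampled under slightly older weights, which is exactly the staleness-variance tradeoff discussed below.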
Background
Prior to founding Applied Compute, Rhythm was a researcher at OpenAI working on the first reasoning models trained with reinforcement learning. His contributions include:
- Development of the o1 reasoning model series
- Infrastructure for training models on competitive programming tasks
- Research on scaling RL approaches for complex reasoning domains
He studied at Stanford University; his honors include a 2nd Prize Grand Award at Intel ISEF, a Top 500 finish in the Putnam Mathematical Competition, and multiple competitive programming awards.
Philosophy on Reinforcement Learning
Rhythm’s technical approach challenges conventional wisdom:
Staleness as a feature - Properly managed staleness enables faster training by keeping GPUs saturated. The key is understanding exactly where the staleness-variance balance lies.
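One common way async RL systems manage that balance (illustrative only, not necessarily Applied Compute's recipe) is to cap how stale a rollout may be and clip the importance ratio on what remains:

```python
import torch

def staleness_corrected_loss(logp_new, logp_behavior, advantages, staleness,
                             max_staleness=4, clip=2.0):
    """Hypothetical staleness-variance control: drop rollouts that are too
    old, and clip the importance ratio on the rest so variance stays bounded."""
    keep = staleness <= max_staleness               # hard staleness cap
    ratio = torch.exp(logp_new - logp_behavior)     # off-policy importance weight
    ratio = torch.clamp(ratio, 1.0 / clip, clip)    # variance control
    return -(ratio * advantages)[keep].mean()
```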
System-level optimization - GPU utilization drives everything. Success requires modeling the entire training pipeline from first principles: n_gpus, training_batch_size, KV cache memory, forward pass latency, and training throughput.
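A back-of-the-envelope version of that modeling might look like the sketch below; every number is invented for illustration, but the structure shows how KV cache memory bounds in-flight sequences and how generation throughput must be matched against the trainer's step time:

```python
# First-principles pipeline model (all numbers are made up for illustration).
n_gpus               = 64      # inference GPUs
gpu_mem_gb           = 80      # memory per GPU
weights_mem_gb       = 40      # model shard resident on each GPU
kv_cache_per_seq_gb  = 0.5     # KV cache for one in-flight sequence
fwd_latency_s        = 0.02    # forward pass per generated token per sequence
tokens_per_rollout   = 2000
training_batch_size  = 512     # rollouts consumed per optimizer step
train_step_s         = 30      # time the trainer needs per step

# How many sequences can each GPU hold in flight before KV cache runs out?
seqs_per_gpu = (gpu_mem_gb - weights_mem_gb) // kv_cache_per_seq_gb

# Rollouts produced per second across the whole inference fleet.
rollouts_per_s = n_gpus * seqs_per_gpu / (tokens_per_rollout * fwd_latency_s)

# Can generation keep the trainer saturated?
time_to_fill_batch_s = training_batch_size / rollouts_per_s
print(f"{seqs_per_gpu:.0f} seqs/GPU, {rollouts_per_s:.1f} rollouts/s, "
      f"batch fills in {time_to_fill_batch_s:.1f}s vs {train_step_s}s train step")
```

With these placeholder numbers, the batch fills in about 4 seconds against a 30-second training step, so generation is not the bottleneck; changing any one input shifts where the bottleneck lands, which is why the pipeline is modeled end to end rather than tuned piecewise.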
Deploy with a flywheel - Real-world agent deployment generates the training data needed for continuous improvement. Try hundreds of times, grade systematically, reinforce successful paths.
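A minimal sketch of that loop, with hypothetical `agent` and `grader` interfaces:

```python
def flywheel_step(agent, task, grader, n_attempts=200):
    """Illustrative data-flywheel loop (interfaces are hypothetical): sample
    many attempts, grade each one, and keep only the successful trajectories
    as RL training data that reinforces the reasoning paths which worked."""
    attempts = [agent.rollout(task) for _ in range(n_attempts)]
    graded = [(traj, grader(task, traj)) for traj in attempts]
    # Passing trajectories become positive examples for the next RL update,
    # closing the deploy -> data -> train loop.
    return [traj for traj, score in graded if score >= 1.0]
```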
Practical over theoretical - Speed and cost matter as much as accuracy. Training pipelines must have known variance and convergence properties to be production-ready.
Research & Publications
Co-author on major OpenAI research:
- Competitive Programming with Large Reasoning Models (2025) - Demonstrates that o3 achieves a gold medal at the 2024 IOI and a Codeforces rating on par with elite human competitors
- OpenAI o1 System Card (2024) - Technical documentation for the o1 reasoning model series
- Additional work on logic puzzles and LLM reasoning capabilities
Conference Appearance
Event: AI Engineering Code Summit 2025
Date: November 21, 2025
Time: 11:20 AM - 11:39 AM
Session: Efficient Reinforcement Learning (co-presented with Linden Li)
Rhythm presented Applied Compute’s proprietary RL stack for training superhuman agents. The talk covered pipeline RL with in-flight weight updates, managing staleness-variance tradeoffs, first-principles modeling of distributed training systems, and practical approaches for pushing AI past productivity into real-world reasoning tasks. Technical deep-dive included GPU utilization optimization, async training architectures, and the engineering reality of making RL fast, cheap, and predictable for production environments.