Expert Biography

Rhythm Garg

Co-founder and CTO at Applied Compute, building production-ready reinforcement learning infrastructure. Former OpenAI researcher who developed reasoning models including o1, with expertise in distributed RL training and competitive programming systems.

Building Fast, Cheap, Predictable RL for Production

Rhythm Garg advocates for a pragmatic approach to reinforcement learning that differs fundamentally from the one typical of academic research labs. His focus: making RL deployable at scale through infrastructure that prioritizes speed, cost-efficiency, and predictable training dynamics over theoretical optimality.

Current Work

As CTO of Applied Compute, Rhythm leads technical development of enterprise-focused RL infrastructure. The company, valued at $700M after raising $80M, builds specialized AI agents trained on company-specific data. Key customers include DoorDash, Cognition, and Mercor.

Applied Compute’s technical approach:

  • Pipeline RL with in-flight weight updates - Sophisticated async training that manages the staleness-variance tradeoff for maximum GPU utilization (see the sketch after this list)
  • First-principles system modeling - Mathematical frameworks for optimizing distributed training pipelines across heterogeneous GPU clusters
  • Data flywheel deployment - Agents that try actions hundreds of times, grade results systematically, and use RL to reinforce successful reasoning paths
  • Specific Intelligence - Customized agents with domain expertise, trained on proprietary data, optimized for enterprise workflows
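
To make the first bullet concrete, here is a minimal Python sketch of pipeline RL with a bounded-staleness rollout queue: actors keep generating against whatever weights are currently live, while the learner discards samples that lag more than a fixed number of weight versions behind. Every name and constant here (PipelineRL, max_staleness, the toy reward) is a hypothetical stand-in, not Applied Compute's actual stack.

```python
# Hypothetical sketch of pipeline RL with in-flight weight updates.
# All names and constants are illustrative assumptions.
import queue
import random
import threading
import time
from dataclasses import dataclass

TOTAL_STEPS = 20

@dataclass
class Rollout:
    policy_version: int  # weight version that generated this sample
    reward: float

class PipelineRL:
    def __init__(self, max_staleness: int = 4):
        self.version = 0                    # current learner weight version
        self.max_staleness = max_staleness  # the staleness-variance knob
        self.rollouts: "queue.Queue[Rollout]" = queue.Queue(maxsize=64)

    def actor_loop(self) -> None:
        """Generate continuously so GPUs never idle waiting on the learner."""
        while self.version < TOTAL_STEPS:
            self.rollouts.put(Rollout(self.version, random.random()))
            time.sleep(0.01)  # stands in for decode latency

    def learner_loop(self) -> None:
        """Consume rollouts, dropping any that exceed the staleness bound."""
        while self.version < TOTAL_STEPS:
            r = self.rollouts.get()
            if self.version - r.policy_version > self.max_staleness:
                continue  # too stale: variance cost outweighs the sample
            # ... gradient step on r would go here ...
            self.version += 1  # in-flight update: actors see it next sample

if __name__ == "__main__":
    threads_owner = PipelineRL()
    threads = [threading.Thread(target=threads_owner.actor_loop),
               threading.Thread(target=threads_owner.learner_loop)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```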

Background

Prior to founding Applied Compute, Rhythm was a researcher at OpenAI working on the first reasoning models trained with reinforcement learning. His contributions include:

  • Development of the o1 reasoning model series
  • Infrastructure for training models on competitive programming tasks
  • Research on scaling RL approaches for complex reasoning domains

Rhythm studied at Stanford University. His honors include an Intel ISEF 2nd Prize Grand Award, a Putnam Mathematical Competition Top 500 finish, and multiple competitive programming awards.

Philosophy on Reinforcement Learning

Rhythm’s technical approach challenges conventional wisdom:

Staleness as a feature - Properly managed staleness enables faster training by keeping GPUs saturated. The key is understanding exactly where the staleness-variance balance lies.
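
One common way to sit on that balance, sketched below purely as an illustration, is to keep stale rollouts but bound the variance they introduce with clipped importance weights, in the style of PPO's surrogate objective. The function name and clip range are hypothetical; the source does not specify which correction Applied Compute uses.

```python
# Hypothetical sketch: keep stale samples but clip their importance
# weights (PPO-style), trading a little bias for bounded variance.
import torch

def stale_policy_loss(logp_new: torch.Tensor,
                      logp_behavior: torch.Tensor,
                      advantages: torch.Tensor,
                      clip: float = 0.2) -> torch.Tensor:
    # Staleness shows up as the ratio between the live policy and the
    # (older) behavior policy that actually generated the rollout.
    ratio = torch.exp(logp_new - logp_behavior)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage on random tensors:
loss = stale_policy_loss(torch.randn(8), torch.randn(8), torch.randn(8))
```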

System-level optimization - GPU utilization drives everything. Success requires modeling the entire training pipeline from first principles: n_gpus, training_batch_size, KV cache memory, forward pass latency, and training throughput.
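
A back-of-envelope version of such a model might look like the sketch below: KV cache memory caps how many sequences a GPU can decode concurrently, forward-pass latency sets generation throughput, and the fleet is split so generation exactly feeds training. Every constant and formula here is an illustrative assumption, not Applied Compute's actual model.

```python
# Illustrative first-principles pipeline model; all numbers are assumptions.
def pipeline_throughput(n_gpus: int = 64,
                        training_batch_size: int = 256,
                        tokens_per_seq: int = 4096,
                        gpu_mem_gb: float = 80.0,
                        kv_cache_gb_per_seq: float = 1.5,
                        fwd_latency_s: float = 0.04,
                        train_tokens_per_gpu_s: float = 20_000.0) -> dict:
    # KV cache memory bounds concurrent decode on one GPU.
    seqs_per_gpu = int(gpu_mem_gb // kv_cache_gb_per_seq)
    # Decode emits one token per sequence per forward pass.
    gen_tokens_per_gpu_s = seqs_per_gpu / fwd_latency_s
    # Split GPUs so generation output exactly matches training consumption.
    gen_gpus = n_gpus * train_tokens_per_gpu_s / (
        gen_tokens_per_gpu_s + train_tokens_per_gpu_s)
    tokens_per_s = gen_gpus * gen_tokens_per_gpu_s
    steps_per_s = tokens_per_s / (training_batch_size * tokens_per_seq)
    return {"gen_gpus": gen_gpus,
            "train_gpus": n_gpus - gen_gpus,
            "tokens_per_s": tokens_per_s,
            "steps_per_s": steps_per_s}

print(pipeline_throughput())
```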

Deploy with a flywheel - Real-world agent deployment generates the training data needed for continuous improvement. Try hundreds of times, grade systematically, reinforce successful paths.
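
A toy version of that flywheel loop is sketched below; `agent`, `grade`, and the kept-data format are hypothetical stand-ins for whatever the deployed system actually uses.

```python
# Hypothetical flywheel step: sample many attempts per task, grade each,
# keep graded successes as positive examples for the RL trainer.
import random
from typing import Callable

def flywheel_step(tasks: list[str],
                  agent: Callable[[str], str],
                  grade: Callable[[str, str], float],
                  attempts: int = 200,
                  threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (task, attempt, score) triples worth reinforcing."""
    keep = []
    for task in tasks:
        for _ in range(attempts):          # try hundreds of times
            out = agent(task)
            score = grade(task, out)       # systematic grading gate
            if score >= threshold:
                keep.append((task, out, score))
    return keep  # fed back into training to reinforce successful paths

# Toy usage with stub agent and grader:
data = flywheel_step(
    tasks=["fix the failing unit test"],
    agent=lambda t: f"attempt-{random.random():.3f}",
    grade=lambda t, o: random.random(),
)
print(f"kept {len(data)} graded successes")
```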

Practical over theoretical - Speed and cost matter as much as accuracy. Training pipelines must have known variance and convergence properties to be production-ready.

Research & Publications

Co-author on major OpenAI research, including work on the o1 reasoning model series.

Conference Appearance

Event: AI Engineering Code Summit 2025
Date: November 21, 2025
Time: 11:20 AM - 11:39 AM
Session: Efficient Reinforcement Learning (co-presented with Linden Li)

Rhythm presented Applied Compute’s proprietary RL stack for training superhuman agents. The talk covered pipeline RL with in-flight weight updates, managing staleness-variance tradeoffs, first-principles modeling of distributed training systems, and practical approaches for pushing AI beyond productivity tooling into real-world reasoning tasks. The technical deep-dive included GPU utilization optimization, async training architectures, and the engineering reality of making RL fast, cheap, and predictable in production environments.
