THEFOCUS.AI LABS

PROJECT: TheFocus.AI Research Division

DOC. NO: RM-REPORTS-INDEX

TheFocus.AI Labs • Est. 2024

7 REPORTS

REPORTS & DEEP DIVES

Model evaluations, tool-use benchmarks, and honest field reports on where AI-assisted software is actually headed. Written by the team that ships production AI for Samsung, Tesla, Perplexity, and Rivian.

May 2026 RM-2026-05-LP

Local Providers: Which Open Model Should You Actually Run?

We tested 7 model families across 3 local runtimes on a 64 GB Mac: 20 cells, 5 dimensions. gemma-4-26b-a4b hits 95% in 4 minutes. Thinking mode usually hurts, except for tool calling. Interactive charts and full methodology.

Read the Report →

March 2026 RM-2026-03-MS

Testing 48 LLMs Across 5 Dimensions for $4.63

Which model should you actually use? We tested 48 models across reasoning, knowledge, instruction following, coding, and MCP tool use. Claude Sonnet 4.6 leads at 93.8% across all 5 dimensions. Interactive leaderboard with per-model detail panels.

Read the Report →

March 2026 RM-2026-03-MCP

Can LLMs Use Real-World Tools?

We gave 39 models access to a Rivian owner's real driving and charging data via MCP tools and asked them to summarize 10 days of activity. Most can call tools. Far fewer call the right tools.

Read the Report →

February 2026 RM-2026-02-CW

The Car Wash Test: Do LLMs Have Common Sense?

We tested 131 AI models with one simple question: "Should I walk or drive to the car wash?" Only 24% figured out you need to bring your car. Interactive report with charts and full model comparison.

Read the Report →

December 2025 RM-2025-12-WCA

Weekend Coding Agent

Want to understand how AI coding agents actually work? We built one from scratch — and documented every step. This is the tutorial we wish existed when we started.

Read the Tutorial →

November 2025 RM-2025-11-AIECODE

AI Engineering Code Summit 2025

We gathered leading coding agent builders, engineers, and researchers to share what's actually working. Here's what we learned.

Read the Report →

June 2025 RM-2025-06-CAR

June 2025 Coding Agent Report

We tested 15 leading AI coding tools and documented the strengths, weaknesses, and surprises. Clear recommendations for different workflows and experience levels.

Read the Full Report (PDF) →

For Practitioners Ready to Build

The team behind this research also ships production AI.

Principal-led engagements. From tribal knowledge to production AI — in weeks, not quarters.

See the Framework →

TheFocus.AI Research Division

Exploring the future of AI-powered software development

Subscribe to our newsletter

Ready to ship production AI?

Whether you need a quick Vibe Check or a full Habitat built on Habitat OS, we'd love to hear what you're working on.

Talk to a principal • See Our Work
