FOCUS.AI LABS

PROJECT: Focus.AI Research Division

DOC. NO: RM-2025-INSIGHTS

Focus.AI Labs • Est. 2024

UNCLASSIFIED

RESEARCH REPORTS

We write about what we're learning. Research notes, technical deep-dives, and honest observations about AI, software development, and where things are headed.

LATEST

The Car Wash Test: Learning from Model Evals

We asked 131 AI models a simple question — should I walk or drive to the car wash? 76% got it wrong. Simple gotcha questions reveal more about model reasoning than any benchmark leaderboard.

March 1, 2026 Essay, Compare Models

#002

The Agent Habitat

An agent isn't just automation with LLM calls. It carries state, accumulates memory, and makes bounded decisions under uncertainty — and a git repo is where all of that lives.

February 5, 2026 Essay

#003

The Data Flywheel Pattern

Build applications by dropping in data and letting AI handle parsing, structuring, and synthesis. Three case studies.

January 22, 2026 Essay

#004

Claude Code, not Code

The real power of Claude Code isn't writing software—it's orchestrating skills for research, newsletters, browser automation, and turning one-off requests into repeatable workflows.

January 14, 2026 Essay

#005

How I Use AI in Jan 2026

We're at the beginning of infinity. In geological time this is all happening in an instant — and 2026 might be the best year to be alive.

January 3, 2026 Essay

#006

"You're absolutely right" and other AI warning signs

When your AI assistant starts agreeing with everything, you've hit the dumb zone. Practical techniques for managing context windows and keeping AI useful.

December 16, 2025 Essay

#007

AI tools fail loudly where humans failed quietly

AI coding tools don't work well in messy codebases. The fix is doing what should have been done all along, and that's good for everyone.

December 10, 2025 Essay

#008

Weekend Coding Agent: Build Your Own AI Assistant

A hands-on tutorial for building an AI coding agent from scratch. Bootstrap it with an existing agent, then use it to improve itself. Learn the core concepts of how coding agents work.

November 29, 2025 Tutorial

#009

AI Engineering Summit 2025: Bash-Pilled and Building for Everyone

The models got smart. The tools got simple. The circle of who builds is expanding.

November 22, 2025 Agents

#010

AI Engineering Code Summit 2025: Deep Dive Report

A comprehensive analysis of the state of AI engineering tools, frameworks, and best practices from the November 2025 Code Summit. Exploring cutting-edge developments in AI-assisted development, infrastructure, and production deployment strategies.

November 20, 2025 Report

#011

gpt5 is smarter than you are

gpt5 can choose to be so smart it's almost impossible to judge. Let's see how it does on some unanswerable questions and whether it can totally replace Google.

September 4, 2025 Models

#012

Single file swift mini-apps

Swift files can be run directly, without a separate compile step and without Xcode, making it easy to create native UI elements and access all of macOS's APIs. Once you see Swift as a scripting language rather than just an app language, you start wondering what other capabilities are hiding in plain sight.

August 22, 2025 Affordance

#013

Code Generation with Local Models

Small, local AI models deliver effective results for everyday tasks. llama3.2 is surprisingly fast, and gpt-oss is surprisingly good.

August 20, 2025 Essay

#014

gpt-5 and gpt-oss

OpenAI’s GPT-5 launch stole headlines, but GPT-OSS quietly made local AI a lot more practical. This post covers what’s new, how to run it with Ollama or LM Studio, and why context size can change your results.

August 13, 2025 Models

#015

Technical Debt and the ROI Threshold

With agents now able to read and refactor code, the future cost of messy code and the current cost of unwritten code are both shrinking. Code is more disposable and experimentation more rewarding.

July 3, 2025 Essay

#016

Don't be passive aggressive with your agents

Treat your coding agents as adaptable collaborators—communicate clearly, value efficiency over endurance, match tools to your workflow, skip unnecessary formality, rethink technical debt, and document your development rules for best results.

June 25, 2025 Use Case

#017

June 2025 Coding Agent Report

A comprehensive analysis of 15 leading AI coding agents in 2025. We break down the strengths, weaknesses, and surprises from top tools, with clear winners for pros, tinkerers, and casual users alike.

June 15, 2025 Report

#018

Feature Development on the go

What happens when you challenge Google Jules, OpenAI Codex, and Cursor to build a PWA—using just your phone? Find out which agent delivered.

June 8, 2025 Essay

#019

Geo-affordance

Imagine having Sherlock Holmes’ legendary eye for detail—AI now makes that possible for all of us. Is this AI changing us? It will alter our expectations and the risks of everyday digital life.

June 2, 2025 Essay

#020

Report from Microsoft Build 2025

Microsoft is betting big on an open, agent-powered web—where protocols like MCP, A2A, and NLWeb redefine how AI and services interact. The real opportunity in AI isn’t just smarter models, but the “capability overhang” waiting to be unlocked by better reasoning and open standards.

May 21, 2025 Conference

#021

Thoughts on gemini

Despite popular narratives about Google lagging in AI, their Gemini models reveal engineering excellence that's hard to ignore when you strip away the conservative product decisions and UI polish. From the lightweight yet powerful Gemma 3 to the multimodal capabilities of Gemini 2.5, Google's models demonstrate a level of speed, precision, and fundamental understanding that suggests they're not playing catch-up—they're just being cautious.

April 4, 2025 Essay

#022

Schema-Driven AI: Better User Experiences with Structured Output

Structured output transforms chat models from simple text generators into powerful data processing engines, enabling extraction of organized information from PDFs, audio files, and more. Here are some practical techniques for building with it, including audio analysis, PDF data extraction, and conversation state management, showcasing how constraint-driven outputs can power rich user experiences.

March 30, 2025 Use Case

#023

Moral Vibe Check

Technical correctness is not meaningful insight: well-formatted, detailed AI responses can mask a fundamental lack of understanding—a "raving lunatic" hidden behind impressive form. Maybe P-doom is less about malice and more about making us intellectually poorer by substituting form for substance, facts for understanding, and technical accuracy for wisdom.

March 24, 2025 Essay

#024

Image Gen on Apple Silicon

We've got the Apple silicon, so let's download some models and make some pictures.

March 21, 2025 Use Case

#025

Recipes big and small

The hardest thing about living in the future is that we're figuring it out as we go. Here are some notes on things to play with.

March 18, 2025 Use Case

#026

Exposing Services with MCP

Model Context Protocol bridges the gap between AI models and your applications. Learn how defining simple tools with descriptions and parameters lets Claude intelligently combine services without explicit instructions.

March 15, 2025 Use Case

#027

Agentic YOLO with Warp, Cursor, and Claude

What happens when you let AI help you think through and build your ideas, with minimal supervision and maximum trust? What does it mean to be a programmer? Are we closer to or further from thought-stuff?

March 7, 2025 Essay

#028

Clipboards are eating the world

The untold story of how your computer's clipboard sees itself as the essential bridge between humans and AI tools in the creative process. Through its eyes, we witness the journey of how digital projects come together through countless transfers between different AI services.

February 25, 2025 Process

#029

The New Touch Interface

The real killer apps of smartphones weren't the early games but things like group chats and video calls, which fundamentally changed how we communicate. Similarly, while we're currently amazed by AI's capabilities, we're still discovering how these tools will meaningfully integrate into our lives.

February 11, 2025 Essay

#030

Tools for thinking. Everyday AI.

From building nuclear fusors to probing Vatican AI doctrine, this exploration reveals how AI tools are reshaping our daily intellectual work in surprisingly practical ways. Through examples of interfacing with databases, analyzing legal documents, and diving into deep research rabbit holes, we see how AI assistants are becoming intuitive research companions that expand our ability to quickly understand and synthesize complex information.

January 30, 2025 Essay

#031

How I classify models

Small models are smart yet limited in knowledge; foundation models possess both deep understanding and extensive knowledge but lack structured problem-solving approaches. Educated models like DeepResearch excel by combining learned reasoning processes with large memory capacities, enabling them to adapt effectively to complex tasks while handling vast information instantaneously.

January 21, 2025 Models

#032

AI for research: DeepResearch a clear winner

Asking the tough questions: DeepResearch excels in depth and comprehensiveness, while o1, Sonnet 3.5, and DeepSeek with DeepThought provide comparable results for complex inquiries. Smaller models like phi4 and llama3.2 are deemed inadequate for intricate topics.

January 12, 2025 Models

#033

Learning on the go with NotebookLM

With NotebookLM, an AI tool that generates audio summaries and interactive conversations, you can create customized podcasts on the go. You can also join the conversation.

January 9, 2025 Essay

#034

Making hard things easier

Explore how generative AI-based tools can revolutionize the way we work, making creative tasks more accessible and efficient for both novices and experts, while also highlighting the importance of critical thinking and creativity in the face of automation.

December 15, 2024 Essay

#035

Welcome to The Focus AI

Let me tell you a bit about what we do here: a personal journey from cofounding a software development company to exploring the revolutionary potential of generative AI and how it's transforming the way humans interact with knowledge and information.

November 29, 2024 Meta

#036

Slicing up a design from figma

In this hands-on comparison, three coding tools - Cursor, Aider, and v0 - are put to the test as they attempt to replicate a design from Figma into functional HTML and CSS code, revealing their strengths, weaknesses, and quirks.

November 27, 2024 Use Case

Focus.AI Research Division

Exploring the future of AI-powered software development
