
gpt-5 and gpt-oss

OpenAI’s GPT-5 launch stole headlines, but GPT-OSS quietly made local AI a lot more practical. This post covers what’s new, how to run it with Ollama or LM Studio, and why context size can change your results.

Will Schenk August 13, 2025

Did you hear that the new gpt weights just dropped? OpenAI just released gpt-5 and did a coordinated release of gpt-oss with Hugging Face, Ollama, LM Studio, and more, so you can run it and tinker with it at home.

People's reactions have been mixed ("GPT-5 should be ashamed of itself"), but in my personal experience it's like a jump from GPT-3 to GPT-4, and I consistently find it giving me better answers than the other models. I find myself switching between Google's AI Mode and GPT-5 for basically everything at this point, so I'd consider it a success.

But let's talk about the open-source version.

Why this is cool

gpt-oss feels like a foundation model from a year ago. It's polished and one of the strongest text-only models you can run locally, now available for remixing. I expected that small models would become more capable, but it's still surprising that my four-year-old laptop can run the smaller of the two released models. It's text only, which rules out some of the use cases that Gemma 3 in particular is good at, but the overall result feels very polished.

Tool use in general is good, but the model can get stuck in a loop when trying to reason. Agentic workflows aren't there yet, but they feel very close. It writes code surprisingly well for something running locally, but the 20b model is not at the level where it can properly back something like Goose or RooCode. Goose was able to design and write code to the console, but when it started calling tools to actually write the files to the filesystem it fell down. RooCode was completely confused by the assignment even with the context window maxed out.

Why local

Even without the actual source, these models give you the freedom to run, the freedom to study, and the freedom to modify. But there's something really empowering about having all of this work on your own machine: being sure that no one is looking at what you are doing, having what the preppers call Sovereignty.

Harmony

gpt-oss speaks a new prompt format called Harmony. Splitting system and developer into separate message types better reflects how we are actually using system prompting, and the formalization of output channels means that applications can start to standardize outputs: the commentary channel carries tool calls, while the analysis channel is the model's chain of thought (CoT), which the spec marks:

Important: Messages in the analysis channel do not adhere to the same safety standards as final messages do. Avoid showing these to end-users.
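Roughly, a rendered Harmony conversation looks like the sketch below. This is from memory of the spec, so treat the exact token names and wording as an approximation rather than a reference:

```
<|start|>system<|message|>You are a helpful assistant.
Reasoning: medium
# Valid channels: analysis, commentary, final.<|end|>
<|start|>user<|message|>Why is the sky dark at night?<|end|>
<|start|>assistant<|channel|>analysis<|message|>The user is asking
about Olbers' paradox; keep the chain of thought here.<|end|>
<|start|>assistant<|channel|>final<|message|>Because the universe has
a finite age and is expanding, distant starlight hasn't filled the sky.<|return|>
```

The channel marker on each assistant message is what lets a front end hide the analysis text while still streaming the final answer.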

Installation

Ollama and LM Studio coordinated with the release so that the latest versions work out of the box. ollama pull gpt-oss:20b and you are good to go, and LM Studio prompts you to download the model. Those are basically the easiest ways to get started.

Ollama

Download Ollama and start it up. In the terminal, pull down the model:

ollama pull gpt-oss:20b

And then run

ollama run gpt-oss:20b "why is the sky dark at night"

Or use the front end. Grounding in web search makes a difference.
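Ollama also serves a local REST API on port 11434, which is what other apps build against. A minimal sketch, assuming the server is running and the model has been pulled:

```shell
# Ask the local Ollama server for a one-shot, non-streaming completion.
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Why is the sky dark at night?",
  "stream": false
}'
```

The response comes back as JSON with the generated text in the response field, which makes it easy to wire into scripts.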

LM Studio

Download LM Studio; on first launch it prompts you to pick a model. Download gpt-oss-20b and load it up. You can use the interface or the command line:

lms server start

llm-mlx

On a Mac you can also use the llm CLI with the mlx plugin. Install the plugin, then download the model:

llm install llm-mlx
llm mlx download-model openai/gpt-oss-20b

Then run it

llm -m openai/gpt-oss-20b "Why is it dark at night"

Context Size Changes How it Answers

The larger the context size, the more it thinks. You can ask it to think at three levels (low, medium, or high reasoning effort) using the prompt, but the context size also matters.
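For example, the reasoning level can be set through the system prompt inside an Ollama session. A sketch; the exact wording of the system message is up to you:

```
$ ollama run gpt-oss:20b
>>> /set system You are a helpful assistant. Reasoning: high
>>> why is the sky dark at night
```

With Reasoning: high the model spends noticeably longer in its analysis channel before answering.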

Ollama reports the model's context length with /show info:

$ ollama run gpt-oss:20b 
>>> /show info
  Model
    architecture        gptoss    
    parameters          20.9B     
    context length      131072    
    embedding length    2880      
    quantization        MXFP4     

LM Studio defaults to 4096 tokens, so you need to raise the context length and then reload the model.
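One way to bake a bigger window into Ollama is a Modelfile; a sketch, where the name gpt-oss-bigctx is just an example:

```
# Modelfile: gpt-oss with a larger default context window
FROM gpt-oss:20b
PARAMETER num_ctx 32768
```

Build it with ollama create gpt-oss-bigctx -f Modelfile, or set it per session with /set parameter num_ctx 32768 inside ollama run.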


