Weave is model-agnostic. Use a frontier API when you want to, or a small model running on your own hardware. Two concepts connect Weave to a model: providers and agents.
Providers
A provider is a connection to a model backend. Weave supports three kinds:
- OpenAI — including the
/v1/responsesAPI for GPT-5 reasoning variants. - Anthropic — Claude models, with optional prompt caching.
- Ollama — any model served locally; point Weave at the Ollama base URL.
Each provider stores a name, an API key (where needed), and a base URL. Because the base URL is configurable, most OpenAI-compatible servers work too. Weave can list the models a provider offers so you can pick one from a dropdown.
Agents
An agent pairs a provider with a specific model and the settings it should run under: a system prompt, an optional max-output-token override, and whether to use OpenAI's responses API. Agents are what chats and tasks run on — define a fast local agent for routine tasks and a frontier agent for the hard ones.
Prompt caching
For Anthropic, Weave can add cache breakpoints to the static prefix of agentic loops so re-sent context is served from cache. OpenAI caches automatically; Ollama has no caching. It's on by default and can be overridden per chat.
Weave's design is intentional preparation for the Tokenpocalypse — the point at which frontier model prices rise to reflect true compute cost, making small self-hosted models essential.