Weave Docs — Models & agents

Weave is model-agnostic. Use a frontier API when you want to, or a small model running on your own hardware. Two concepts connect Weave to a model: providers and agents.

Providers

A provider is a connection to a model backend. Weave supports three kinds:

OpenAI — including the /v1/responses API for GPT-5 reasoning variants.
Anthropic — Claude models, with optional prompt caching.
Ollama — any model served locally; point Weave at the Ollama base URL.

Each provider stores a name, an API key (where needed), and a base URL. Because the base URL is configurable, most OpenAI-compatible servers work too. Weave can list the models a provider offers so you can pick one from a dropdown.

Agents

An agent pairs a provider with a specific model and the settings it should run under: a system prompt, an optional max-output-token override, and whether to use OpenAI's responses API. Agents are what chats and tasks run on — define a fast local agent for routine tasks and a frontier agent for the hard ones.

Prompt caching

For Anthropic, Weave can add cache breakpoints to the static prefix of agentic loops so re-sent context is served from cache. OpenAI caches automatically; Ollama has no caching. It's on by default and can be overridden per chat.

⚡

Weave's design is intentional preparation for the Tokenpocalypse — the point at which frontier model prices rise to reflect true compute cost, making small self-hosted models essential.