Using OSS Models via Ollama, vLLM, and other LLM servers¶
VibePod can connect agents to external LLM servers that expose OpenAI- or Anthropic-compatible APIs. This lets you run agents like Claude Code and Codex against open-source models served by Ollama, vLLM, or any compatible endpoint.
Supported agents¶
| Agent | Env vars injected | CLI flags appended |
|---|---|---|
| `claude` | `ANTHROPIC_BASE_URL`, `ANTHROPIC_API_KEY`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_MODEL`, `ANTHROPIC_DEFAULT_OPUS_MODEL`, `ANTHROPIC_DEFAULT_SONNET_MODEL`, `ANTHROPIC_DEFAULT_HAIKU_MODEL` | `--model <model>` |
| `codex` | `CODEX_OSS_BASE_URL` | `--oss -m <model>` |
Other agents do not yet have an LLM mapping and will not receive any LLM configuration.
Quick start with Ollama¶
1. Start Ollama and pull a model¶
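The commands for this step are not shown above; assuming a standard Ollama install and the `qwen3:14b` model used in the config below, it would look like:

```shell
# Start the Ollama server (skip if Ollama already runs as a background service)
ollama serve &

# Pull the model referenced in the VibePod config
ollama pull qwen3:14b
```

Ollama listens on port 11434 by default, which matches the `base_url` in the config below.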
2. Configure VibePod¶
Add the following to your global or project config:
```yaml
# ~/.config/vibepod/config.yaml
llm:
  enabled: true
  base_url: "http://host.docker.internal:11434"
  api_key: "ollama"
  model: "qwen3:14b"
```
Note
Use host.docker.internal (not localhost) so the Docker container can reach Ollama on the host machine.
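To confirm the container can actually reach the host's Ollama instance, you can query Ollama's model-listing endpoint (`/api/tags` is Ollama's standard API for listing pulled models):

```shell
# Run from inside the VibePod container: lists the models Ollama is serving
curl http://host.docker.internal:11434/api/tags
```

If this fails to connect, check that Ollama is running on the host and that the container was started with host networking support for `host.docker.internal`.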
3. Run an agent¶
```shell
vp run claude
# Starts Claude Code with:
# ANTHROPIC_BASE_URL=http://host.docker.internal:11434
# ANTHROPIC_API_KEY=ollama
# ANTHROPIC_AUTH_TOKEN=ollama
# ANTHROPIC_MODEL=qwen3:14b
# ANTHROPIC_DEFAULT_OPUS_MODEL=qwen3:14b
# ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3:14b
# ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen3:14b
# claude --model qwen3:14b
```

```shell
vp run codex
# Starts Codex with:
# CODEX_OSS_BASE_URL=http://host.docker.internal:11434
# codex --oss -m qwen3:14b
```
Using environment variables¶
You can also configure LLM settings at runtime without editing config files.
Claude Code with a remote Ollama server:
```shell
VP_LLM_ENABLED=true VP_LLM_MODEL=qwen3.5:9b VP_LLM_BASE_URL=https://ollama.example.com vp run claude
```
Codex with a remote Ollama server (note the `/v1` suffix):
```shell
VP_LLM_ENABLED=true VP_LLM_MODEL=qwen3.5:9b VP_LLM_BASE_URL=https://ollama.example.com/v1 vp run codex
```
Local Ollama with an API key:
```shell
VP_LLM_ENABLED=true VP_LLM_BASE_URL=http://host.docker.internal:11434 VP_LLM_API_KEY=ollama VP_LLM_MODEL=qwen3:14b vp run claude
```
Note
Claude Code uses the Anthropic-compatible endpoint (no `/v1` suffix), while Codex uses the OpenAI-compatible endpoint (with `/v1` suffix). Adjust `VP_LLM_BASE_URL` accordingly, or use per-agent overrides if you need both agents to work from the same config.
See Configuration > Environment variables for the full list.
Using vLLM or other OpenAI-compatible servers¶
Point `base_url` at any server that speaks the OpenAI or Anthropic API:
```yaml
llm:
  enabled: true
  base_url: "http://my-vllm-server:8000/v1"
  api_key: "my-api-key"
  model: "meta-llama/Llama-3-8B-Instruct"
```
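Before pointing agents at the server, it can help to confirm the endpoint responds; any OpenAI-compatible server exposes `GET /v1/models` (the URL and key here are the placeholder values from the config above):

```shell
# Sanity-check the OpenAI-compatible endpoint; should return a JSON list of served models
curl -H "Authorization: Bearer my-api-key" http://my-vllm-server:8000/v1/models
```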
Per-agent overrides¶
If you need different LLM settings for a specific agent, use the per-agent `env` config. Per-agent env vars take precedence over the `llm` section:
```yaml
llm:
  enabled: true
  base_url: "http://host.docker.internal:11434"
  api_key: "ollama"
  model: "qwen3:14b"

agents:
  claude:
    env:
      ANTHROPIC_BASE_URL: "http://different-server:11434"
```
Disabling¶
To turn off LLM injection without removing the config:
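Based on the `llm.enabled` key used throughout this page, this would be:

```yaml
llm:
  enabled: false
```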
Or at runtime:
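Mirroring the `VP_LLM_ENABLED` variable shown earlier:

```shell
VP_LLM_ENABLED=false vp run claude
```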