This guide describes a practical, end-to-end setup for using agentic AI tooling from a workstation.
It is written as experimental / best-effort guidance. Where the original material referenced organisation-specific resources, those have been redacted or replaced with generic terms.
Architecture overview
At a high level, the workstation setup consists of:
- A local model runtime (optional) for offline or privacy-sensitive use
- A local LLM gateway that can route to:
  - hosted model providers (when online)
  - local models (when offline)
- One or more agent clients (CLI or IDE)
Conceptual flow
- Your CLI/IDE agent sends prompts/tool calls to a local gateway.
- The gateway routes requests either to hosted providers or local model runtimes.
- Optional: the agent uses tool integrations (e.g., MCP servers / OpenAPI tooling).
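The flow above can be sketched as a single request against the gateway. This is a minimal, hypothetical example: it assumes the gateway exposes an OpenAI-compatible API on localhost port 4000, and the model name and key are placeholders for whatever your gateway is configured with.

```shell
# Hypothetical values: adjust the gateway URL, key, and model for your setup.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4000}"

# Send a chat completion request through the local gateway. The gateway (not
# the client) decides whether this goes to a hosted provider or a local model.
chat_via_gateway() {
  curl -sf "$GATEWAY_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${GATEWAY_API_KEY:-dummy}" \
    -d '{"model": "my-local-model", "messages": [{"role": "user", "content": "hello"}]}'
}
```

From the client's point of view, hosted and local models then look identical: only the `model` field changes.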
Quick install guide (high level)
The original reference implementation lives in a restricted internal repository. The overall steps are:
- Keep an active API key (for hosted models)
- Deploy local models (optional)
- Deploy a local LLM gateway
- Choose your client (CLI/IDE)
I) Keep an active API key
Hosted-model API keys may expire periodically. If you rely on hosted models, you’ll need a way to renew/refresh your key.
Notes
- This is not required for purely local models.
- A hybrid approach works well:
  - use top-performing hosted models when online
  - use local models when offline or when you do not want LLM traffic leaving the workstation
II) Deploy local models (optional)
Choose a local runtime that fits your machine and constraints. Common choices include:
- llama.cpp
- Ollama
- Docker-based model runners
Practical constraints
- Be mindful of RAM and available system resources.
- Smaller models are typically more practical on managed workstation builds.
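One way to act on the RAM constraint is to pick a model size programmatically. The sketch below is illustrative only: the memory thresholds and Ollama model tags are arbitrary examples, not recommendations.

```shell
# Sketch: choose an Ollama model tag based on available RAM.
# Thresholds and model tags are example values - tune them for your machine.
choose_model() {
  local mem_gb=0
  if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB; convert to whole gigabytes.
    mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
  fi
  if [ "$mem_gb" -ge 32 ]; then
    echo "llama3.1:8b"   # roomier machines can handle a mid-size model
  else
    echo "llama3.2:1b"   # small quantised model for constrained workstations
  fi
}

# Usage: ollama pull "$(choose_model)"
```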
III) Deploy a local LLM gateway
To support both hosted and local models, run a local gateway that can:
- forward requests to hosted providers when available
- expose local models via a consistent API for your clients
This completes the “backend” of the workstation setup.
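Once the gateway is running, it is worth having a quick health check. The snippet below assumes an OpenAI-compatible `/v1/models` listing endpoint (common in gateways such as LiteLLM, but confirm for your deployment) and a placeholder port.

```shell
# Sketch: verify the local gateway is up and can list its models.
# The URL is a placeholder; point it at your actual gateway.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4000}"

gateway_healthy() {
  curl -sf "$GATEWAY_URL/v1/models" > /dev/null
}

# Usage:
#   if gateway_healthy; then echo "gateway up"; else echo "gateway down"; fi
```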
IV) Choose your client
Below are common client categories and the typical configuration pattern.
OpenWebUI
By default, OpenWebUI is accessed via a local URL.
It can be configured to use multiple providers, for example:
- Hosted provider models (via the local gateway)
- Local containerised model registry endpoints
- Ollama models (if exposed locally)
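A typical containerised setup wires those providers in via environment variables. `OPENAI_API_BASE_URL`, `OPENAI_API_KEY`, and `OLLAMA_BASE_URL` are documented OpenWebUI settings; the ports, key, and host alias below are placeholders you should adapt.

```shell
# Sketch: run OpenWebUI pointed at a local gateway (OpenAI-compatible API)
# and a local Ollama instance. All addresses and the key are placeholders.
run_openwebui() {
  docker run -d \
    -p 3000:8080 \
    -e OPENAI_API_BASE_URL="http://host.docker.internal:4000/v1" \
    -e OPENAI_API_KEY="replace-with-your-gateway-key" \
    -e OLLAMA_BASE_URL="http://host.docker.internal:11434" \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data \
    --name open-webui \
    ghcr.io/open-webui/open-webui:main
}
```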
Refreshing the local environment after an API key renewal
If an API key was renewed and your local environment is containerised, you may need to refresh/recreate the containers so the gateway picks up the new key.
Example pattern:
refresh_local_gateway() {
  local original_dir="$PWD"
  # Bail out if the gateway directory is missing, rather than running
  # docker compose from the wrong location.
  cd ~/path/to/your/local/gateway || return 1
  docker compose pull
  docker compose up -d --force-recreate
  cd "$original_dir"
}
Claude Code
Follow the relevant client setup guide, then:
- set the base URL to your local gateway
- set the model to one of the available models
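For Claude Code, this typically means setting a few environment variables in your shell profile. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` are standard Claude Code settings; the values below are placeholders for your own gateway and model.

```shell
# Sketch: point Claude Code at the local gateway. Values are placeholders.
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="replace-with-your-gateway-key"
export ANTHROPIC_MODEL="my-gateway-model"   # any model your gateway exposes
```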
Gemini CLI
Follow the relevant client setup guide, then:
- set the base URL to your local gateway
- set the model to one of the available models
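For Gemini CLI the pattern is similar. The variable names below are an assumption based on recent Gemini CLI releases (`GEMINI_API_KEY` is standard; `GOOGLE_GEMINI_BASE_URL` may vary by version) - verify them against your installed version's documentation.

```shell
# Sketch (hedged): redirect Gemini CLI traffic to the local gateway.
# Confirm these variable names against your Gemini CLI version.
export GOOGLE_GEMINI_BASE_URL="http://localhost:4000"
export GEMINI_API_KEY="replace-with-your-gateway-key"
```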
OpenAI Codex CLI
Follow the relevant client setup guide, then:
- set the base URL to your local gateway
- set the model to one of the available models
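Codex CLI takes its base URL and model from a TOML config rather than environment variables alone. The sketch below writes a custom provider entry; the key names follow the documented `config.toml` schema, but the provider id, model name, and URL are placeholders for your setup.

```shell
# Sketch: define a custom Codex CLI provider that targets the local gateway.
# CODEX_HOME defaults to ~/.codex; all values here are placeholders.
CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
mkdir -p "$CODEX_HOME"
cat > "$CODEX_HOME/config.toml" <<'EOF'
model = "my-gateway-model"          # any model the gateway exposes
model_provider = "local-gateway"

[model_providers.local-gateway]
name = "Local LLM gateway"
base_url = "http://localhost:4000/v1"
env_key = "GATEWAY_API_KEY"         # Codex reads the key from this env var
EOF
```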
OpenCode
Follow the relevant client setup guide.
GitHub Copilot CLI
Follow the relevant client setup guide.
VSCode agent clients
Follow the relevant client setup guide.
Security & privacy notes
- Treat API keys as secrets; store them in a secure keychain/secret store.
- If you are handling sensitive data, prefer local models or ensure your hosted provider meets your compliance requirements.
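One way to keep the key out of dotfiles is to load it from the platform secret store at shell startup. This example uses the macOS keychain (`security find-generic-password` is a standard macOS command); the service name `llm-gateway` is a placeholder, and on Linux you would substitute something like `secret-tool`.

```shell
# Sketch: read the gateway key from the macOS keychain instead of hardcoding
# it in shell config. The service name "llm-gateway" is a placeholder.
load_gateway_key() {
  # 'security find-generic-password -w' prints only the stored secret.
  export GATEWAY_API_KEY="$(security find-generic-password -s llm-gateway -w)"
}
```

Store the key once with `security add-generic-password -s llm-gateway -a "$USER" -w`, then call `load_gateway_key` from your shell profile.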