General · 3 March 2026 · 5 min read

An End-to-End Journey into Agentic AI from Your Workstation

A practical guide to running agentic AI tooling locally: local model runners, local LLM gateways, and CLI/IDE clients.

Agentic AI · LLM · Tooling · Local Dev · MCP

This guide describes a practical, end-to-end setup for using agentic AI tooling from a workstation.

It is written as experimental / best-effort guidance. Where the original material referenced organisation-specific resources, those have been redacted or replaced with generic terms.

Architecture overview

At a high level, the workstation setup consists of:

  • A local model runtime (optional) for offline or privacy-sensitive use
  • A local LLM gateway that can route to:
    • hosted model providers (when online)
    • local models (when offline)
  • One or more agent clients (CLI or IDE)
Conceptual flow
  1. Your CLI/IDE agent sends prompts/tool calls to a local gateway.
  2. The gateway routes requests either to hosted providers or local model runtimes.
  3. Optional: the agent uses tool integrations (e.g., MCP servers / OpenAPI tooling).
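
The routing decision in step 2 can be sketched as a small function. This is a hypothetical illustration, not a real gateway implementation: the hosted endpoint URL is a placeholder, and only port 11434 (Ollama's default) reflects a real convention.

```shell
# Hypothetical sketch of step 2: the gateway picks a backend per request.
pick_backend() {
  # $1 = "online" or "offline"; a real gateway would probe connectivity
  # and provider health instead of taking a flag.
  if [ "$1" = "online" ]; then
    echo "https://hosted-provider.example.com/v1"   # placeholder hosted endpoint
  else
    echo "http://localhost:11434/v1"                # Ollama's default local port
  fi
}
```

Either way, the client only ever talks to the gateway, so switching backends requires no client-side changes.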

Quick install guide (high level)

The original reference implementation lives in a restricted internal repository. The overall steps are:

  1. Keep an active API key (for hosted models)
  2. Deploy local models (optional)
  3. Deploy a local LLM gateway
  4. Choose your client (CLI/IDE)

I) Keep an active API key

Hosted-model API keys may expire periodically. If you rely on hosted models, you’ll need a process for renewing or refreshing your key before it expires.

Notes
  • This is not required for purely local models.
  • A hybrid approach works well:
    • use top-performing hosted models when online
    • use local models when offline or when you do not want LLM traffic leaving the workstation
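
For the client side of the hybrid approach, a pair of small helpers can flip OpenAI-compatible tools between backends. `OPENAI_BASE_URL` is honored by the official OpenAI SDKs; the port numbers below are assumptions about your setup (4000 for the gateway, 11434 for Ollama).

```shell
# Hypothetical helpers: flip OpenAI-compatible clients between the
# local gateway and a purely local runtime.
use_gateway() { export OPENAI_BASE_URL="http://localhost:4000/v1"; }
use_local()   { export OPENAI_BASE_URL="http://localhost:11434/v1"; }
```

Drop these into your shell profile and call `use_local` before going offline.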

II) Deploy local models (optional)

Choose a local runtime that fits your machine and constraints. Common choices include:

  • llama.cpp
  • Ollama
  • Docker-based model runners
Practical constraints
  • Be mindful of RAM and available system resources.
  • Smaller models are typically more practical on managed workstation builds.
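
As a rough sanity check before pulling a model, a quantised model needs approximately its file size in free RAM. The rule of thumb below is an approximation, not a guarantee, and the model name in the comments is only an example.

```shell
# Rule-of-thumb check: does a model of $1 GB fit in $2 GB of free RAM?
model_fits() {
  [ "$2" -ge "$1" ]
}

# Typical workflow with Ollama (model name is an example):
#   ollama pull llama3.2
#   ollama run llama3.2
```

For instance, `model_fits 4 16` succeeds, while `model_fits 16 4` fails, suggesting that model is too large for the machine.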

III) Deploy a local LLM gateway

To support both hosted and local models, run a local gateway that can:

  • forward requests to hosted providers when available
  • expose local models via a consistent API for your clients

This completes the “backend” of the workstation setup.
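
A containerised gateway deployment often boils down to a short compose file. The sketch below is hypothetical: the image name is a placeholder for whichever OpenAI-compatible gateway you choose, and port 4000 is an assumption used throughout this guide.

```shell
# Write a minimal, hypothetical compose file for the local gateway.
mkdir -p "$HOME/local-gateway"
cat > "$HOME/local-gateway/compose.yaml" <<'EOF'
services:
  gateway:
    image: your-gateway-image:latest        # placeholder image
    ports:
      - "4000:4000"                         # clients talk to http://localhost:4000
    environment:
      HOSTED_API_KEY: ${HOSTED_API_KEY:-}   # hosted-provider key, if any
EOF
# Then, from that directory: docker compose up -d
```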

IV) Choose your client

Below are common client categories and the typical configuration pattern.

OpenWebUI

By default, OpenWebUI is accessed via a local URL.

It can be configured to connect to multiple providers, for example:

  • Hosted provider models (via the local gateway)
  • Local containerised model registry endpoints
  • Ollama models (if exposed locally)
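
One way to wire these up is through an env file. `OPENAI_API_BASE_URL` and `OLLAMA_BASE_URL` are documented OpenWebUI settings; the hostnames and ports below are assumptions about your local setup.

```shell
# Example environment for a containerised OpenWebUI: point it at the
# local gateway and a locally exposed Ollama.
cat > openwebui.env <<'EOF'
OPENAI_API_BASE_URL=http://host.docker.internal:4000/v1
OLLAMA_BASE_URL=http://host.docker.internal:11434
EOF
# Then pass it to the container, e.g.:
#   docker run -d -p 3000:8080 --env-file openwebui.env \
#     ghcr.io/open-webui/open-webui:main
```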
Refreshing the local environment after an API key renewal

If an API key was renewed and your local environment is containerised, you may need to refresh/recreate the containers so the gateway picks up the new key.

Example pattern:

refresh_local_gateway() {
  # Run in a subshell so the caller's working directory is untouched.
  (
    cd ~/path/to/your/local/gateway || return 1
    docker compose pull                     # fetch updated images
    docker compose up -d --force-recreate   # recreate so the new key is loaded
  )
}

Claude Code

Follow the relevant client setup guide, then:

  • set the base URL to your local gateway
  • set the model to one of the available models
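
For Claude Code this is typically done via environment variables. `ANTHROPIC_BASE_URL` and `ANTHROPIC_MODEL` are documented settings; the URL and model name below are assumptions about your gateway.

```shell
# Point Claude Code at the local gateway (URL and model id are
# placeholders for your setup).
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_MODEL="your-gateway-model-name"
```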

Gemini CLI

Follow the relevant client setup guide, then:

  • set the base URL to your local gateway
  • set the model to one of the available models

OpenAI Codex CLI

Follow the relevant client setup guide, then:

  • set the base URL to your local gateway
  • set the model to one of the available models
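
Codex reads its configuration from ~/.codex/config.toml, where custom providers can be declared. The provider id, model name, and URL in this sketch are assumptions about your setup.

```shell
# Sketch of a Codex CLI config that routes through a local gateway.
mkdir -p "$HOME/.codex"
cat > "$HOME/.codex/config.toml" <<'EOF'
model = "your-gateway-model-name"        # placeholder model id
model_provider = "local-gateway"

[model_providers.local-gateway]
name = "Local gateway"
base_url = "http://localhost:4000/v1"
env_key = "OPENAI_API_KEY"               # env var holding your key
EOF
```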

OpenCode

Follow the relevant client setup guide.

GitHub Copilot CLI

Follow the relevant client setup guide.

VSCode agent clients

Follow the relevant client setup guide.

Security & privacy notes

  • Treat API keys as secrets; store them in a secure keychain/secret store.
  • If you are handling sensitive data, prefer local models or ensure your hosted provider meets your compliance requirements.
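
Where no OS keychain or secret store is available, a minimal fallback is a file only your user can read, loaded into the environment on demand. The file path, variable name, and key value below are examples, not real credentials.

```shell
# Keep the key out of shell history and dotfiles: store it in a
# permission-restricted file and export it when needed.
umask 077   # new files are created readable by the owner only
mkdir -p "$HOME/.config/llm"
printf '%s\n' "sk-example-not-a-real-key" > "$HOME/.config/llm/api_key"
export HOSTED_API_KEY="$(cat "$HOME/.config/llm/api_key")"
```

A dedicated secret store or OS keychain remains the better option where your platform provides one.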