diff --git a/readme.ui.md b/readme.ui.md
new file mode 100644
index 0000000..da72dd8
--- /dev/null
+++ b/readme.ui.md
@@ -0,0 +1,392 @@
# 🖥️ ModelGrid — UI Concept

**A browser-based operations console for ModelGrid, served by the same daemon that
already exposes the OpenAI-compatible API.**

This document sketches the user interface that will sit on top of the ModelGrid
daemon: what it shows, how it is organized, how an operator moves through it,
and how it stays in sync with a running node or a small cluster. It is a
concept, not a final spec — the goal is to lock the shape of the product
before any frontend code is written.

The structural idioms (tabbed top-level views, route-origin awareness,
embedded ops dashboard on a dedicated port, API-first with a thin UI on top)
are adapted from `@serve.zone/dcrouter`'s Ops dashboard. ModelGrid's UI should
feel familiar to anyone who has operated dcrouter, while staying grounded in
ModelGrid's own domain: GPUs, vLLM deployments, a public model catalog, and a
cluster of gateway-capable nodes.

## 🎯 Purpose & Audience

- **Primary user:** the operator of one or a few ModelGrid nodes. Often the
  same person who provisioned the GPU host and ran `modelgrid service enable`.
- **Secondary user:** a platform engineer wiring ModelGrid into an internal
  AI platform who needs to manage API keys, audit deployments, and watch
  request traffic.
- **Not an end-user chat UI.** Consumers of the OpenAI-compatible API keep
  using their own SDKs and tools. The browser UI is for operating the fleet,
  not for prompting models.

The UI should collapse gracefully from a full cluster view down to a
single-node, standalone deployment, because both shapes are first-class in
ModelGrid's `cluster.role` model (`standalone` / `control-plane` / `worker`).

## 🧭 Top-Level Information Architecture

URLs follow `/{view}` for flat views and `/{view}/{subview}` for tabbed
views, matching dcrouter's routing idiom.
+ +``` +/overview + /stats + /configuration + +/cluster + /nodes + /placements + /desired + +/gpus + /devices + /drivers + +/deployments + /active + /history + +/models + /catalog + /deployed + +/access + /apikeys + /clients + +/logs (flat) +/metrics (flat) +/settings (flat) +``` + +Rationale for the split: + +- **Overview** is the landing page — one screen that answers "is the fleet + healthy right now?" +- **Cluster / GPUs / Deployments / Models** are the four nouns an operator + actually reasons about when running ModelGrid. Keeping them at the top + level matches the CLI verbs (`modelgrid cluster`, `modelgrid gpu`, + `modelgrid container`, `modelgrid model`) so muscle memory transfers. +- **Access** consolidates the authn/authz surface (API keys today, + user/OIDC later) into one place, the way dcrouter groups `apitokens` and + `users` under `access`. +- **Logs** and **Metrics** are flat because they are cross-cutting streams, + not noun-scoped tabs. + +The navigation chrome itself is a persistent left rail on desktop, collapsing +into a top hamburger on narrow viewports. The selected view is indicated +there; subviews surface as a tab strip at the top of the content area. 
+ +``` +┌────────────┬──────────────────────────────────────────────────────────────┐ +│ ModelGrid │ Overview ▸ Stats Configuration │ +│ ├──────────────────────────────────────────────────────────────┤ +│ Overview ●│ │ +│ Cluster │ ┌─ Fleet Health ─────────────────────────────────────┐ │ +│ GPUs │ │ 2 nodes • 3 GPUs • 4 deployments • api OK │ │ +│ Deploys │ └───────────────────────────────────────────────────┘ │ +│ Models │ ┌─ Live Traffic ──────────────┐ ┌─ GPU Utilization ─┐ │ +│ Access │ │ 42 req/s p95 820 ms │ │ ▁▂▄▅▇█▇▅▄▂▁ │ │ +│ │ │ ▁▂▃▅▇▇▅▃▂▁▁▂▄▆ │ │ avg 64% │ │ +│ Logs │ └─────────────────────────────┘ └───────────────────┘ │ +│ Metrics │ ┌─ Deployments ────────────────────────────────────┐ │ +│ Settings │ │ llama-3.1-8b running 2/2 nvidia-0,1 │ │ +│ │ │ qwen2.5-7b running 1/1 nvidia-2 │ │ +│ node: ctrl │ │ bge-m3 pending 0/1 (no capacity) │ │ +│ v1.1.0 │ └──────────────────────────────────────────────────┘ │ +└────────────┴──────────────────────────────────────────────────────────────┘ +``` + +The footer of the rail surfaces the local node's identity (`nodeName`, +`role`), the daemon version, and a small link to the API base URL — +equivalent to how dcrouter surfaces its runtime identity in the sidebar. + +## 📄 Per-View Sketches + +### Overview ▸ Stats (landing page) + +A dashboard of the things that an on-call operator wants to see in under +two seconds: + +- **Fleet health band**: green/yellow/red status tiles for nodes, GPUs, + deployments, API. +- **Live traffic**: requests/sec, p50/p95/p99 latency, error rate. Sparkline + for the last 15 minutes, streaming from `/metrics` and a server-pushed + channel. +- **GPU utilization strip**: one micro-sparkline per GPU, colored by VRAM + pressure. +- **Deployment summary**: the `modelgrid ps` output, but clickable. Each + row deep-links into Deployments ▸ Active. +- **Catalog drift**: a small callout when `list.modelgrid.com` has newer + model entries than the node's cached catalog. 
### Overview ▸ Configuration

A read-only rendering of the resolved `/etc/modelgrid/config.json` with
section headers (`api`, `docker`, `gpus`, `models`, `cluster`). Operators
can copy the JSON; editing config is intentionally kept to the Settings view
(or the CLI) to avoid a "two sources of truth" problem.

### Cluster ▸ Nodes

Mirrors `modelgrid cluster nodes`. Each row: node name, role badge
(`standalone` / `control-plane` / `worker`), advertised URL, last heartbeat,
GPU inventory summary, status (`active` / `cordoned` / `draining`).

Row actions: `cordon`, `drain`, `activate` — the same verbs as the CLI.
Hitting an action fires the corresponding control-plane call and shows an
in-row toast on success.

```
┌ Nodes ───────────────────────────────────────────────────────────────────┐
│ Name        Role           Advertised URL              Heartbeat         │
│ ──────────────────────────────────────────────────────────────────────── │
│ control-a   control-plane  http://ctrl.internal:8080   2s ago   ●        │
│ worker-a    worker         http://wa.internal:8080     3s ago   ●        │
│ worker-b    worker         http://wb.internal:8080     41s ago  ◐        │
│             [cordon] [drain]                                             │
└──────────────────────────────────────────────────────────────────────────┘
```

### Cluster ▸ Placements

A live map of where every deployed model is currently running, read from
the control-plane's placement state. Grouped by model, with a column per
node. Cells show replica count and health. This is where the operator
answers "where did `llama-3.1-8b` actually end up?".

### Cluster ▸ Desired

The companion to Placements: the desired-state table. Each row is a model
with a target replica count. Rows can be added (`cluster ensure`), edited
(`cluster scale`), or removed (`cluster clear`). The reconciler's pending
work is surfaced as a diff badge: e.g. `+1 replica`, `moving from worker-b
→ worker-a`.
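The replica-count half of that diff badge can be sketched as a pure function over desired vs. placed state — row shapes and field names below are assumptions for this concept, and the "moving" badge is omitted for brevity:

```typescript
// Illustrative sketch: compute the reconciler's "diff badge" for a row in
// Cluster ▸ Desired by comparing the target replica count against what
// Cluster ▸ Placements currently reports. Shapes are assumptions.
interface DesiredRow {
  model: string;
  replicas: number; // target replica count
}

interface PlacementSummary {
  model: string;
  running: number; // healthy replicas across the cluster
}

function diffBadge(desired: DesiredRow, placed?: PlacementSummary): string {
  const running = placed?.running ?? 0;
  const delta = desired.replicas - running;
  if (delta === 0) return ""; // reconciled — no badge shown
  const n = Math.abs(delta);
  const unit = n === 1 ? "replica" : "replicas";
  return `${delta > 0 ? "+" : "-"}${n} ${unit}`;
}
```

Because the badge is derived, not stored, it stays correct as placement events stream in; the UI never has to track reconciler progress separately.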
+ +### GPUs ▸ Devices + +Mirrors `modelgrid gpu list` / `gpu status`, rendered as a card per GPU: +vendor, model, VRAM free/total, driver version, temperature, current +utilization, and which deployment is using it. Cards stream their +utilization via the realtime channel; no full page reloads. + +### GPUs ▸ Drivers + +Status per vendor (NVIDIA / AMD / Intel): driver installed? version? any +known issue? Includes a button to run `modelgrid gpu install` +interactively — but since the install flow is privileged and interactive, +the UI only kicks off the CLI walk-through in a terminal session rather +than trying to reimplement it in the browser. A small "copy the command" +affordance makes this explicit. + +### Deployments ▸ Active + +The core operational table. One row per active vLLM deployment: + +- container ID, display name, model, GPU bindings, port, uptime, request + rate, error rate +- status pill (`running`, `pending`, `restarting`, `failed`) +- row actions: `logs`, `stop`, `restart`, `remove` + +Clicking a row opens a detail drawer with sub-tabs: + +- **Summary** — the effective container config and the scheduling + decision that landed it on this node +- **Logs** — a live tail (SSE) +- **Metrics** — request latency histogram, token throughput, VRAM + occupancy +- **Events** — a timeline of lifecycle events (scheduled, pulled image, + started, health check, restart, stopped) + +### Deployments ▸ History + +Deployments that have been stopped or removed, with the reason and the +last-known logs. Useful for post-mortem on a failed deploy. + +### Models ▸ Catalog + +The current catalog resolved from `list.modelgrid.com`, with a "refresh" +action that calls `modelgrid model refresh`. Each entry shows canonical +ID, aliases, capabilities (chat / completions / embeddings), minimum +VRAM, default GPU count, and a `Deploy` button. Deploying opens a small +form that mirrors `modelgrid run`: target node (or auto), desired replica +count, optional env overrides (e.g. 
`HF_TOKEN`). + +A visible "source" badge marks whether the entry came from the public +catalog or a custom `registryUrl`, so operators can tell at a glance which +models the cluster will actually trust for auto-deploy. + +### Models ▸ Deployed + +Shows the union of what is running across the cluster, with replica +counts, keyed by canonical model ID. This is the view a developer asks +the operator for when they want to know "what models can I hit on this +endpoint?". It is effectively a pretty rendering of `/v1/models`. + +### Access ▸ API Keys + +Mirrors `modelgrid config apikey list`. Columns: label, prefix (first +8 chars), created, last used, status. Actions: `generate`, `revoke`. +Generating a key shows the secret once in a modal with a copy button, +then never shows it again — the same contract as dcrouter's API tokens. + +### Access ▸ Clients + +Placeholder for per-consumer rate limits, quotas, and request labels. +This view is explicitly future work; it renders as "not yet configured" +until the daemon exposes client records. Listing it now reserves the IA +slot so it doesn't have to be retrofitted later. + +### Logs + +A unified tail across daemon, scheduler, and deployments, with filters +by source (`daemon`, `scheduler`, `deployment:`), level, and +free-text. Streamed via SSE. A "pause" toggle freezes the view for +reading; a "download" action exports the current buffer as NDJSON. + +### Metrics + +The `/metrics` endpoint rendered as a small set of charts (request rate, +latency, error rate, VRAM occupancy, model throughput). This is +deliberately lightweight — serious monitoring is expected to come from +Prometheus scraping `/metrics` into Grafana, and the UI says so with a +link to the recommended dashboard snippet. 
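The Logs view's source / level / free-text filters compose naturally as a single per-line predicate over the streamed records. A minimal sketch, assuming a `LogLine` record shape like the one below (an illustration for this concept, not a committed wire format):

```typescript
// Illustrative sketch of the Logs view filter: every streamed line is
// matched against the selected source, minimum level, and free-text query.
// The LogLine and LogFilter shapes are assumptions for this concept.
interface LogLine {
  source: string; // e.g. "daemon", "scheduler", or a "deployment:"-prefixed id
  level: "debug" | "info" | "warn" | "error";
  message: string;
}

interface LogFilter {
  source?: string;             // exact source, or a prefix like "deployment:"
  minLevel?: LogLine["level"]; // hide anything below this severity
  query?: string;              // case-insensitive substring match
}

const LEVEL_ORDER: Record<LogLine["level"], number> = {
  debug: 0,
  info: 1,
  warn: 2,
  error: 3,
};

function matchesFilter(line: LogLine, f: LogFilter): boolean {
  if (f.source !== undefined && !line.source.startsWith(f.source)) return false;
  if (f.minLevel !== undefined && LEVEL_ORDER[line.level] < LEVEL_ORDER[f.minLevel]) return false;
  if (f.query !== undefined && !line.message.toLowerCase().includes(f.query.toLowerCase())) return false;
  return true;
}
```

Filtering client-side over the SSE buffer keeps "pause" and "download as NDJSON" trivial: both operate on the same in-memory array of `LogLine`s the predicate runs over.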
+ +### Settings + +Editable configuration, grouped to match the config file: + +- **API** — port, bind host, CORS, rate limit +- **Docker** — runtime, network name, socket path +- **GPUs** — auto-detect toggle, per-GPU assignments +- **Models** — registry URL, auto-deploy, default engine, auto-load list +- **Cluster** — role, advertise URL, control-plane URL, shared secret, + heartbeat interval, seeds + +Edits write through the daemon's config API (to be defined) and reload +without a restart wherever possible. Settings that require a restart are +marked with a `restart required` badge, and the UI surfaces a single +"restart daemon" action at the top of the view when any are pending. + +## 🛤️ Key User Journeys + +### Deploy a model from the catalog + +1. Operator opens **Models ▸ Catalog**, filters for chat-capable models + with VRAM ≤ 24 GB. +2. Clicks `Deploy` on `meta-llama/Llama-3.1-8B-Instruct`. +3. Dialog appears with target node (`auto` / specific worker), replica + count (default from catalog), optional env (`HF_TOKEN`). +4. On submit, the UI calls the control plane (`cluster ensure` + `scale` + under the hood). The dialog closes and the new row appears in + **Deployments ▸ Active** in `pending` state. +5. SSE updates walk the row through `pulling image → starting → running`. +6. A toast links to the deployment detail drawer for logs. + +### Add a worker node to an existing control plane + +1. Operator opens **Cluster ▸ Nodes** on the control plane. +2. Clicks `Add node`, which opens a helper that pre-fills the worker's + expected `cluster` config block — role, control-plane URL, shared + secret — and exposes a one-liner install command. +3. The operator runs the install command on the worker host. The UI does + **not** SSH into anything; it just hands out the exact snippet. +4. Once the worker's daemon starts and registers, the new node appears + in the Nodes table with its first heartbeat. The helper closes + automatically. + +### Rotate an API key + +1. 
**Access ▸ API Keys** → `Generate`. +2. Name the key, pick a scope (today: single scope; later: per-model). +3. The secret is shown once in a modal; copy-to-clipboard and a clear + "you will not see this again" note. +4. Old key row gets a `revoke` action. Revoke is a confirm-then-apply + flow because it will break live traffic. + +### Investigate a failing deployment + +1. **Overview ▸ Stats** shows a red tile: `1 deployment failed`. +2. Click drills into **Deployments ▸ Active**, filtered to `failed`. +3. Open the row drawer → **Events** tab to see the lifecycle timeline. +4. Jump to **Logs** tab for the live tail. If the deployment is down, + fall back to the last 500 lines from its event buffer. +5. From the drawer, `restart` retries the deployment; if it fails again, + the `Summary` tab shows the scheduling decision so the operator can + see whether VRAM, GPU pinning, or image pull is the root cause. + +## 📡 Realtime, Auth, and API Contract + +- **Realtime updates.** Metrics, logs, GPU utilization, heartbeats, and + deployment state changes stream over Server-Sent Events. A single + `/v1/_ui/events?topics=...` endpoint is preferred over per-feature + sockets so the browser holds exactly one connection. WebSocket is + reserved for bidirectional features (e.g. an interactive install + walkthrough) that we do not need in v1. +- **Auth model.** The UI runs behind the same daemon process as the + OpenAI-compatible API, on a dedicated `uiPort` (default `8081`) to + keep the data-plane clean. Login uses a session cookie; the first-boot + bootstrap seeds an `admin` user with a one-time password printed to + `journalctl -u modelgrid`, the same way dcrouter prints its initial + `admin`/`admin`. SSO/OIDC is a later add-on. +- **API contract.** Every UI action maps to an HTTP endpoint on the + daemon (`/v1/_ui/...`). 
The UI must not talk to any private internals + directly; this keeps `@modelgrid.com/modelgrid-apiclient` (a future + sibling to `@serve.zone/dcrouter-apiclient`) able to do everything the + UI can do, from scripts. +- **Origin badges.** Similar to dcrouter's `config` / `email` / `dns` / + `api` route-origin model, ModelGrid should tag each deployment with + its origin: `config` (seeded via `containers` in config.json), + `catalog` (auto-deployed from `models.autoLoad`), `api` (created via + UI/API). Origin determines what the UI allows: `config`-origin + deployments are toggle-only, `api`-origin deployments are full CRUD. + +## 🧱 Implementation Notes (non-binding) + +- **Web component stack.** Match the dcrouter OpsServer approach: + component-per-view under `ts_web/elements//`, a tiny + SmartRouter-style client router (`ts_web/router.ts`), and a single + `appstate.ts` as the store. +- **Packaging.** Follow dcrouter's module split: `@modelgrid.com/modelgrid` + ships the daemon and the UI bundle; a future + `@modelgrid.com/modelgrid-web` can carve out the web boundary if the + bundle grows large. +- **Dark theme default** (black background, high-contrast foreground) to + match dcrouter and the expected server-ops environment. Light theme + is a later toggle. +- **No server-side rendering.** The UI is a static SPA served by the + daemon; all data is fetched through the API. This keeps the runtime + surface small and makes the UI-less `curl` story identical to the UI + story. + +## ❓ Open Questions + +- **Edit config from the UI or keep it CLI/file-first?** Current lean: + UI is authoritative only for API keys, deployments, and cluster + actions. Config editing is exposed but optional, with CLI still the + canonical path for reproducible installs. +- **Do we expose a model prompt playground?** Nice to have for smoke + tests, but it blurs the operator/consumer line. Defer to v2. 
+- **Cluster-wide vs per-node view.** On a worker node, should the UI + show only local state, or proxy the control plane's cluster view? The + current lean: workers show local-only, and link to the control plane + for cluster views. This avoids split-brain confusion. +- **Access control granularity.** API keys today are coarse (all or + nothing). A future model might scope keys per deployment or per + model. Reserve the column in the Access ▸ API Keys table now. + +## 🛑 Out of Scope (for this concept) + +- End-user chat or prompt UIs for the OpenAI-compatible API. +- Billing, quotas, or usage-based pricing dashboards. +- Multi-tenant isolation beyond per-API-key separation. +- Anything specific to non-vLLM runtimes — the UI assumes the v1.1.0 + reorientation around vLLM as the only first-class runtime.