# 🖥️ ModelGrid — UI Concept

**A browser-based operations console for ModelGrid, served by the same daemon that already exposes the OpenAI-compatible API.**

This document sketches the user interface that will sit on top of the ModelGrid daemon: what it shows, how it is organized, how an operator moves through it, and how it stays in sync with a running node or a small cluster. It is a concept, not a final spec — the goal is to lock the shape of the product before any frontend code is written.

The structural idioms (tabbed top-level views, route-origin awareness, embedded ops dashboard on a dedicated port, API-first with a thin UI on top) are adapted from `@serve.zone/dcrouter`'s Ops dashboard. ModelGrid's UI should feel familiar to anyone who has operated dcrouter, while staying grounded in ModelGrid's own domain: GPUs, vLLM deployments, a public model catalog, and a cluster of gateway-capable nodes.

## 🎯 Purpose & Audience

- **Primary user:** the operator of one or a few ModelGrid nodes. Often the same person who provisioned the GPU host and ran `modelgrid service enable`.
- **Secondary user:** a platform engineer wiring ModelGrid into an internal AI platform who needs to manage API keys, audit deployments, and watch request traffic.
- **Not an end-user chat UI.** Consumers of the OpenAI-compatible API keep using their own SDKs and tools. The browser UI is for operating the fleet, not for prompting models.

The UI should collapse gracefully from a full cluster view down to a single-node, standalone deployment, because both shapes are first-class in ModelGrid's `cluster.role` model (`standalone` / `control-plane` / `worker`).

## 🧭 Top-Level Information Architecture

URLs follow `/{view}` for flat views and `/{view}/{subview}` for tabbed views, matching dcrouter's routing idiom.

```
/overview
  /stats
  /configuration

/cluster
  /nodes
  /placements
  /desired

/gpus
  /devices
  /drivers

/deployments
  /active
  /history

/models
  /catalog
  /deployed

/access
  /apikeys
  /clients

/logs      (flat)
/metrics   (flat)
/settings  (flat)
```

Rationale for the split:

- **Overview** is the landing page — one screen that answers "is the fleet healthy right now?"
- **Cluster / GPUs / Deployments / Models** are the four nouns an operator actually reasons about when running ModelGrid. Keeping them at the top level matches the CLI verbs (`modelgrid cluster`, `modelgrid gpu`, `modelgrid container`, `modelgrid model`) so muscle memory transfers.
- **Access** consolidates the authn/authz surface (API keys today, user/OIDC later) into one place, the way dcrouter groups `apitokens` and `users` under `access`.
- **Logs** and **Metrics** are flat because they are cross-cutting streams, not noun-scoped tabs.

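
The `/{view}/{subview}` scheme can be sketched as a small route table for the client router. The tree mirrors the one above, but the shape (a `Route` type with a subview list and a `resolve` helper) is illustrative, not a committed API:

```typescript
// Illustrative route table for the /{view}/{subview} scheme.
type Route = { view: string; subviews: string[] };

const routes: Route[] = [
  { view: "overview", subviews: ["stats", "configuration"] },
  { view: "cluster", subviews: ["nodes", "placements", "desired"] },
  { view: "gpus", subviews: ["devices", "drivers"] },
  { view: "deployments", subviews: ["active", "history"] },
  { view: "models", subviews: ["catalog", "deployed"] },
  { view: "access", subviews: ["apikeys", "clients"] },
  { view: "logs", subviews: [] },     // flat
  { view: "metrics", subviews: [] },  // flat
  { view: "settings", subviews: [] }, // flat
];

// Resolve a path like "/cluster/nodes". A bare "/{view}" falls back to the
// first subview so deep links and the nav rail always agree on a tab.
function resolve(path: string): { view: string; subview: string | null } | null {
  const [view, subview] = path.replace(/^\//, "").split("/");
  const route = routes.find((r) => r.view === view);
  if (!route) return null;
  if (route.subviews.length === 0) return { view, subview: null };
  return { view, subview: subview ?? route.subviews[0] };
}
```

A fallback like this keeps "open `/overview`" and "open `/overview/stats`" identical, which is the behavior the tab strip implies.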
The navigation chrome itself is a persistent left rail on desktop, collapsing into a top hamburger on narrow viewports. The selected view is indicated there; subviews surface as a tab strip at the top of the content area.

```
┌────────────┬──────────────────────────────────────────────────────────────┐
│ ModelGrid  │ Overview ▸ Stats          Configuration                      │
│            ├──────────────────────────────────────────────────────────────┤
│ Overview  ●│                                                              │
│ Cluster    │  ┌─ Fleet Health ─────────────────────────────────────┐      │
│ GPUs       │  │ 2 nodes • 3 GPUs • 4 deployments • api OK          │      │
│ Deploys    │  └────────────────────────────────────────────────────┘      │
│ Models     │  ┌─ Live Traffic ──────────────┐  ┌─ GPU Utilization ─┐      │
│ Access     │  │ 42 req/s   p95 820 ms       │  │ ▁▂▄▅▇█▇▅▄▂▁       │      │
│            │  │ ▁▂▃▅▇▇▅▃▂▁▁▂▄▆              │  │ avg 64%           │      │
│ Logs       │  └─────────────────────────────┘  └───────────────────┘      │
│ Metrics    │  ┌─ Deployments ────────────────────────────────────┐        │
│ Settings   │  │ llama-3.1-8b   running    2/2   nvidia-0,1       │        │
│            │  │ qwen2.5-7b     running    1/1   nvidia-2         │        │
│ node: ctrl │  │ bge-m3         pending    0/1   (no capacity)    │        │
│ v1.1.0     │  └──────────────────────────────────────────────────┘        │
└────────────┴──────────────────────────────────────────────────────────────┘
```

The footer of the rail surfaces the local node's identity (`nodeName`, `role`), the daemon version, and a small link to the API base URL — equivalent to how dcrouter surfaces its runtime identity in the sidebar.

## 📄 Per-View Sketches

### Overview ▸ Stats (landing page)

A dashboard of the things that an on-call operator wants to see in under two seconds:

- **Fleet health band**: green/yellow/red status tiles for nodes, GPUs, deployments, API.
- **Live traffic**: requests/sec, p50/p95/p99 latency, error rate. Sparkline for the last 15 minutes, streaming from `/metrics` and a server-pushed channel.
- **GPU utilization strip**: one micro-sparkline per GPU, colored by VRAM pressure.
- **Deployment summary**: the `modelgrid ps` output, but clickable. Each row deep-links into Deployments ▸ Active.
- **Catalog drift**: a small callout when `list.modelgrid.com` has newer model entries than the node's cached catalog.

### Overview ▸ Configuration

A read-only rendering of the resolved `/etc/modelgrid/config.json` with section headers (`api`, `docker`, `gpus`, `models`, `cluster`). Operators can copy the JSON; editing config is intentionally kept to the Settings view (or the CLI) to avoid a "two sources of truth" problem.

### Cluster ▸ Nodes

Mirrors `modelgrid cluster nodes`. Each row: node name, role badge (`standalone` / `control-plane` / `worker`), advertised URL, last heartbeat, GPU inventory summary, status (`active` / `cordoned` / `draining`).

Row actions: `cordon`, `drain`, `activate` — the same verbs as the CLI. Hitting an action fires the corresponding control-plane call and shows an in-row toast on success.

```
┌ Nodes ───────────────────────────────────────────────────────────────────┐
│ Name       Role            Advertised URL               Heartbeat        │
│ ──────────────────────────────────────────────────────────────────────   │
│ control-a  control-plane   http://ctrl.internal:8080    2s ago  ●        │
│ worker-a   worker          http://wa.internal:8080      3s ago  ●        │
│ worker-b   worker          http://wb.internal:8080      41s ago ◐        │
│            [cordon] [drain]                                              │
└──────────────────────────────────────────────────────────────────────────┘
```

### Cluster ▸ Placements

A live map of where every deployed model is currently running, read from the control-plane's placement state. Grouped by model, with a column per node. Cells show replica count and health. This is where the operator answers "where did `llama-3.1-8b` actually end up?".

### Cluster ▸ Desired

The companion to Placements: the desired-state table. Each row is a model with a target replica count. Rows can be added (`cluster ensure`), edited (`cluster scale`), or removed (`cluster clear`). The reconciler's pending work is surfaced as a diff badge: e.g. `+1 replica`, `moving from worker-b → worker-a`.

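
The diff badge could be derived client-side from the two tables. A minimal sketch, assuming desired and placed state both arrive as replica counts keyed by canonical model ID (the field shapes are illustrative, not a defined API):

```typescript
// Illustrative: compute diff badges from desired vs. placed replica counts.
type Replicas = Record<string, number>; // canonical model ID → replica count

function diffBadges(desired: Replicas, placed: Replicas): Record<string, string> {
  const badges: Record<string, string> = {};
  const models = new Set([...Object.keys(desired), ...Object.keys(placed)]);
  models.forEach((model) => {
    const want = desired[model] ?? 0;
    const have = placed[model] ?? 0;
    if (want === have) return; // reconciled: no badge
    const delta = want - have;
    badges[model] =
      delta > 0
        ? `+${delta} replica${delta > 1 ? "s" : ""}`
        : `${delta} replica${delta < -1 ? "s" : ""}`;
  });
  return badges;
}
```

A "moving from node X → node Y" badge would need per-node placement data rather than totals; this sketch only covers the replica-count case.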
### GPUs ▸ Devices

Mirrors `modelgrid gpu list` / `gpu status`, rendered as a card per GPU: vendor, model, VRAM free/total, driver version, temperature, current utilization, and which deployment is using it. Cards stream their utilization via the realtime channel; no full page reloads.

### GPUs ▸ Drivers

Status per vendor (NVIDIA / AMD / Intel): driver installed? version? any known issue? Includes a button to run `modelgrid gpu install` — but since the install flow is privileged and interactive, the UI only kicks off the CLI walk-through in a terminal session rather than trying to reimplement it in the browser. A small "copy the command" affordance makes this explicit.

### Deployments ▸ Active

The core operational table. One row per active vLLM deployment:

- container ID, display name, model, GPU bindings, port, uptime, request rate, error rate
- status pill (`running`, `pending`, `restarting`, `failed`)
- row actions: `logs`, `stop`, `restart`, `remove`

Clicking a row opens a detail drawer with sub-tabs:

- **Summary** — the effective container config and the scheduling decision that landed it on this node
- **Logs** — a live tail (SSE)
- **Metrics** — request latency histogram, token throughput, VRAM occupancy
- **Events** — a timeline of lifecycle events (scheduled, pulled image, started, health check, restart, stopped)

### Deployments ▸ History

Deployments that have been stopped or removed, with the reason and the last-known logs. Useful for a post-mortem on a failed deploy.

### Models ▸ Catalog

The current catalog resolved from `list.modelgrid.com`, with a "refresh" action that calls `modelgrid model refresh`. Each entry shows canonical ID, aliases, capabilities (chat / completions / embeddings), minimum VRAM, default GPU count, and a `Deploy` button. Deploying opens a small form that mirrors `modelgrid run`: target node (or auto), desired replica count, optional env overrides (e.g. `HF_TOKEN`).

A visible "source" badge marks whether the entry came from the public catalog or a custom `registryUrl`, so operators can tell at a glance which models the cluster will actually trust for auto-deploy.

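
The deploy form maps naturally onto a small request payload. The shape below is illustrative (the daemon's actual endpoint and field names are yet to be defined), but it captures the three fields the form collects:

```typescript
// Illustrative deploy-request payload mirroring the Deploy form fields.
interface DeployRequest {
  model: string;                // canonical catalog ID
  targetNode: string | "auto";  // specific worker, or let the scheduler pick
  replicas: number;             // defaults from the catalog entry
  env?: Record<string, string>; // optional overrides, e.g. HF_TOKEN
}

// A tiny validator the form could run before submitting.
function validateDeploy(req: DeployRequest): string[] {
  const errors: string[] = [];
  if (!req.model) errors.push("model is required");
  if (!Number.isInteger(req.replicas) || req.replicas < 1)
    errors.push("replicas must be a positive integer");
  return errors;
}
```

Keeping the payload this small is what lets the same submit path back both the UI dialog and `modelgrid run` scripting.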
### Models ▸ Deployed

Shows the union of what is running across the cluster, with replica counts, keyed by canonical model ID. This is the view a developer asks the operator for when they want to know "what models can I hit on this endpoint?". It is effectively a pretty rendering of `/v1/models`.

### Access ▸ API Keys

Mirrors `modelgrid config apikey list`. Columns: label, prefix (first 8 chars), created, last used, status. Actions: `generate`, `revoke`. Generating a key shows the secret once in a modal with a copy button, then never shows it again — the same contract as dcrouter's API tokens.

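
The show-once contract implies the table only ever holds the prefix. A sketch of the masking the UI could apply at generation time (the key format in the example is assumed, not specified anywhere):

```typescript
// Illustrative: after the one-time reveal, only this masked form is ever
// stored or rendered, so the secret never round-trips to the table view.
function maskApiKey(secret: string, prefixLen = 8): string {
  // Keep the first `prefixLen` chars so operators can match keys to clients.
  return secret.slice(0, prefixLen) + "…";
}
```

The daemon should enforce the same rule server-side (return only the prefix after creation); the client-side mask is just belt-and-braces for the reveal modal.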
### Access ▸ Clients

Placeholder for per-consumer rate limits, quotas, and request labels. This view is explicitly future work; it renders as "not yet configured" until the daemon exposes client records. Listing it now reserves the IA slot so it doesn't have to be retrofitted later.

### Logs

A unified tail across daemon, scheduler, and deployments, with filters by source (`daemon`, `scheduler`, `deployment:<id>`), level, and free-text. Streamed via SSE. A "pause" toggle freezes the view for reading; a "download" action exports the current buffer as NDJSON.

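
The three filters compose into a single predicate over the streamed entries. A sketch, with an assumed `LogEntry` shape (`source`, `level`, `message` are illustrative field names, not a defined wire format):

```typescript
// Illustrative log entry and composable filter for the unified tail.
interface LogEntry {
  source: string; // "daemon", "scheduler", "deployment:<id>"
  level: "debug" | "info" | "warn" | "error";
  message: string;
}

interface LogFilter {
  source?: string;              // exact source match
  minLevel?: LogEntry["level"]; // this level and above
  text?: string;                // case-insensitive substring
}

const LEVELS = ["debug", "info", "warn", "error"];

function matches(entry: LogEntry, f: LogFilter): boolean {
  if (f.source && entry.source !== f.source) return false;
  if (f.minLevel && LEVELS.indexOf(entry.level) < LEVELS.indexOf(f.minLevel)) return false;
  if (f.text && !entry.message.toLowerCase().includes(f.text.toLowerCase())) return false;
  return true;
}
```

Running the same predicate over the paused buffer is what makes the "pause" and "download" actions consistent with the live view.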
### Metrics

The `/metrics` endpoint rendered as a small set of charts (request rate, latency, error rate, VRAM occupancy, model throughput). This is deliberately lightweight — serious monitoring is expected to come from Prometheus scraping `/metrics` into Grafana, and the UI says so with a link to the recommended dashboard snippet.

### Settings

Editable configuration, grouped to match the config file:

- **API** — port, bind host, CORS, rate limit
- **Docker** — runtime, network name, socket path
- **GPUs** — auto-detect toggle, per-GPU assignments
- **Models** — registry URL, auto-deploy, default engine, auto-load list
- **Cluster** — role, advertise URL, control-plane URL, shared secret, heartbeat interval, seeds

Edits write through the daemon's config API (to be defined) and reload without a restart wherever possible. Settings that require a restart are marked with a `restart required` badge, and the UI surfaces a single "restart daemon" action at the top of the view when any are pending.

## 🛤️ Key User Journeys

### Deploy a model from the catalog

1. Operator opens **Models ▸ Catalog**, filters for chat-capable models with VRAM ≤ 24 GB.
2. Clicks `Deploy` on `meta-llama/Llama-3.1-8B-Instruct`.
3. A dialog appears with target node (`auto` / specific worker), replica count (default from catalog), optional env (`HF_TOKEN`).
4. On submit, the UI calls the control plane (`cluster ensure` + `scale` under the hood). The dialog closes and the new row appears in **Deployments ▸ Active** in `pending` state.
5. SSE updates walk the row through `pulling image → starting → running`.
6. A toast links to the deployment detail drawer for logs.

### Add a worker node to an existing control plane

1. Operator opens **Cluster ▸ Nodes** on the control plane.
2. Clicks `Add node`, which opens a helper that pre-fills the worker's expected `cluster` config block — role, control-plane URL, shared secret — and exposes a one-liner install command.
3. The operator runs the install command on the worker host. The UI does **not** SSH into anything; it just hands out the exact snippet.
4. Once the worker's daemon starts and registers, the new node appears in the Nodes table with its first heartbeat. The helper closes automatically.

### Rotate an API key

1. **Access ▸ API Keys** → `Generate`.
2. Name the key, pick a scope (today: single scope; later: per-model).
3. The secret is shown once in a modal, with copy-to-clipboard and a clear "you will not see this again" note.
4. The old key row gets a `revoke` action. Revoke is a confirm-then-apply flow because it will break live traffic.

### Investigate a failing deployment

1. **Overview ▸ Stats** shows a red tile: `1 deployment failed`.
2. Clicking it drills into **Deployments ▸ Active**, filtered to `failed`.
3. Open the row drawer → **Events** tab to see the lifecycle timeline.
4. Jump to the **Logs** tab for the live tail. If the deployment is down, fall back to the last 500 lines from its event buffer.
5. From the drawer, `restart` retries the deployment; if it fails again, the `Summary` tab shows the scheduling decision so the operator can see whether VRAM, GPU pinning, or image pull is the root cause.

## 📡 Realtime, Auth, and API Contract

- **Realtime updates.** Metrics, logs, GPU utilization, heartbeats, and deployment state changes stream over Server-Sent Events. A single `/v1/_ui/events?topics=...` endpoint is preferred over per-feature sockets so the browser holds exactly one connection. WebSocket is reserved for bidirectional features (e.g. an interactive install walkthrough) that we do not need in v1.
- **Auth model.** The UI runs behind the same daemon process as the OpenAI-compatible API, on a dedicated `uiPort` (default `8081`) to keep the data plane clean. Login uses a session cookie; the first-boot bootstrap seeds an `admin` user with a one-time password printed to `journalctl -u modelgrid`, the same way dcrouter prints its initial `admin`/`admin`. SSO/OIDC is a later add-on.
- **API contract.** Every UI action maps to an HTTP endpoint on the daemon (`/v1/_ui/...`). The UI must not talk to any private internals directly; this keeps `@modelgrid.com/modelgrid-apiclient` (a future sibling to `@serve.zone/dcrouter-apiclient`) able to do everything the UI can do, from scripts.
- **Origin badges.** Similar to dcrouter's `config` / `email` / `dns` / `api` route-origin model, ModelGrid should tag each deployment with its origin: `config` (seeded via `containers` in config.json), `catalog` (auto-deployed from `models.autoLoad`), or `api` (created via the UI/API). Origin determines what the UI allows: `config`-origin deployments are toggle-only, while `api`-origin deployments are full CRUD.

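
The origin-to-permissions rule could live in one small table shared by every view. The action names are illustrative, and the `catalog` row is an assumption (this document only pins down the `config` and `api` cases):

```typescript
// Illustrative: what the UI lets an operator do, keyed by deployment origin.
type Origin = "config" | "catalog" | "api";
type Action = "toggle" | "stop" | "restart" | "remove" | "edit";

const allowedActions: Record<Origin, Action[]> = {
  config: ["toggle"],                     // seeded from config.json: toggle-only
  catalog: ["toggle", "stop", "restart"], // ASSUMPTION: auto-deployed sits in between
  api: ["toggle", "stop", "restart", "remove", "edit"], // created via UI/API: full CRUD
};

function canPerform(origin: Origin, action: Action): boolean {
  return allowedActions[origin].includes(action);
}
```

Centralizing the table means a row action, a drawer button, and the API client can all consult the same rule instead of re-deriving it per view.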
## 🧱 Implementation Notes (non-binding)

- **Web component stack.** Match the dcrouter OpsServer approach: component-per-view under `ts_web/elements/<area>/`, a tiny SmartRouter-style client router (`ts_web/router.ts`), and a single `appstate.ts` as the store.
- **Packaging.** Follow dcrouter's module split: `@modelgrid.com/modelgrid` ships the daemon and the UI bundle; a future `@modelgrid.com/modelgrid-web` can carve out the web boundary if the bundle grows large.
- **Dark theme default** (black background, high-contrast foreground) to match dcrouter and the expected server-ops environment. A light theme is a later toggle.
- **No server-side rendering.** The UI is a static SPA served by the daemon; all data is fetched through the API. This keeps the runtime surface small and makes the UI-less `curl` story identical to the UI story.

## ❓ Open Questions

- **Edit config from the UI, or keep it CLI/file-first?** Current lean: the UI is authoritative only for API keys, deployments, and cluster actions. Config editing is exposed but optional, with the CLI still the canonical path for reproducible installs.
- **Do we expose a model prompt playground?** Nice to have for smoke tests, but it blurs the operator/consumer line. Defer to v2.
- **Cluster-wide vs per-node view.** On a worker node, should the UI show only local state, or proxy the control plane's cluster view? Current lean: workers show local-only state and link to the control plane for cluster views. This avoids split-brain confusion.
- **Access control granularity.** API keys today are coarse (all or nothing). A future model might scope keys per deployment or per model. Reserve the column in the Access ▸ API Keys table now.

## 🛑 Out of Scope (for this concept)

- End-user chat or prompt UIs for the OpenAI-compatible API.
- Billing, quotas, or usage-based pricing dashboards.
- Multi-tenant isolation beyond per-API-key separation.
- Anything specific to non-vLLM runtimes — the UI assumes the v1.1.0 reorientation around vLLM as the only first-class runtime.