Compare commits
29 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 4af9d3de69 | |||
| 405fff91af | |||
| 9022c8dbf3 | |||
| 703cceb512 | |||
| 9d925f9401 | |||
| fe4fdb32d7 | |||
| d6b4c0def1 | |||
| 9608540792 | |||
| 3762fc661e | |||
| 6541b2db1c | |||
| da7375c889 | |||
| 44eb9b9173 | |||
| 1f24df0d80 | |||
| c95961d596 | |||
| 0921dfbe5e | |||
| 5172002ec0 | |||
| 58eabba84d | |||
| 5e8ce6690d | |||
| 0ea98caed6 | |||
| 871afedbb7 | |||
| 1f6cf51794 | |||
| 054875abb5 | |||
| 3e341bbfda | |||
| 9f7308498c | |||
| 952bf394d3 | |||
| 3b2a16b151 | |||
| 9c9c0c90ae | |||
| 24bb6b3058 | |||
| cec102e54e |
@@ -1,6 +1,9 @@
|
|||||||
# Compiled Deno binaries (built by scripts/compile-all.sh)
|
# Compiled Deno binaries (built by scripts/compile-all.sh)
|
||||||
dist/binaries/
|
dist/binaries/
|
||||||
|
|
||||||
|
# Generated UI bundle (built by scripts/bundle-ui.ts)
|
||||||
|
ts_bundled/
|
||||||
|
|
||||||
# Deno cache and lock file
|
# Deno cache and lock file
|
||||||
.deno/
|
.deno/
|
||||||
deno.lock
|
deno.lock
|
||||||
|
|||||||
@@ -4,9 +4,10 @@
|
|||||||
"exports": "./mod.ts",
|
"exports": "./mod.ts",
|
||||||
"nodeModulesDir": "auto",
|
"nodeModulesDir": "auto",
|
||||||
"tasks": {
|
"tasks": {
|
||||||
"dev": "deno run --allow-all mod.ts",
|
"dev": "UI_ASSET_SOURCE=disk deno run --allow-all mod.ts",
|
||||||
|
"bundle:ui": "deno run --allow-read --allow-write scripts/bundle-ui.ts",
|
||||||
"compile": "deno task compile:all",
|
"compile": "deno task compile:all",
|
||||||
"compile:all": "bash scripts/compile-all.sh",
|
"compile:all": "deno task bundle:ui && bash scripts/compile-all.sh",
|
||||||
"test": "deno test --allow-all test/",
|
"test": "deno test --allow-all test/",
|
||||||
"test:watch": "deno test --allow-all --watch test/",
|
"test:watch": "deno test --allow-all --watch test/",
|
||||||
"check": "deno check mod.ts",
|
"check": "deno check mod.ts",
|
||||||
|
|||||||
+3
-2
@@ -37,8 +37,9 @@
|
|||||||
"scripts": {
|
"scripts": {
|
||||||
"postinstall": "node scripts/install-binary.js",
|
"postinstall": "node scripts/install-binary.js",
|
||||||
"prepublishOnly": "echo 'Publishing ModelGrid binaries to npm...'",
|
"prepublishOnly": "echo 'Publishing ModelGrid binaries to npm...'",
|
||||||
"test": "echo 'Tests are run with Deno: deno task test'",
|
"test": "deno task test",
|
||||||
"build": "echo 'no build needed'"
|
"check": "deno task check",
|
||||||
|
"build": "deno task bundle:ui"
|
||||||
},
|
},
|
||||||
"files": [
|
"files": [
|
||||||
"bin/",
|
"bin/",
|
||||||
|
|||||||
+15
-7
@@ -3,7 +3,7 @@
|
|||||||
## Project Overview
|
## Project Overview
|
||||||
|
|
||||||
ModelGrid is a root-level daemon that manages GPU infrastructure, Docker, and AI model containers
|
ModelGrid is a root-level daemon that manages GPU infrastructure, Docker, and AI model containers
|
||||||
(Ollama, vLLM, TGI) with an OpenAI-compatible API interface.
|
(vLLM, TGI) with an OpenAI-compatible API interface.
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
@@ -84,13 +84,12 @@ ts/
|
|||||||
|
|
||||||
### Greenlit Model System
|
### Greenlit Model System
|
||||||
|
|
||||||
- Only pre-approved models can be auto-pulled for security
|
- Only catalog-listed models can be auto-deployed on demand
|
||||||
- Greenlist fetched from remote URL (configurable)
|
- Catalog fetched from a remote URL (configurable)
|
||||||
- VRAM requirements checked before loading
|
- VRAM requirements checked before loading
|
||||||
|
|
||||||
### Container Types
|
### Container Types
|
||||||
|
|
||||||
- **Ollama**: Easy to use, native API converted to OpenAI format
|
|
||||||
- **vLLM**: High performance, natively OpenAI-compatible
|
- **vLLM**: High performance, natively OpenAI-compatible
|
||||||
- **TGI**: HuggingFace Text Generation Inference
|
- **TGI**: HuggingFace Text Generation Inference
|
||||||
|
|
||||||
@@ -111,12 +110,20 @@ interface IModelGridConfig {
|
|||||||
port: number; // Default: 8080
|
port: number; // Default: 8080
|
||||||
host: string; // Default: '0.0.0.0'
|
host: string; // Default: '0.0.0.0'
|
||||||
apiKeys: string[]; // Valid API keys
|
apiKeys: string[]; // Valid API keys
|
||||||
|
rateLimit?: number;
|
||||||
cors: boolean;
|
cors: boolean;
|
||||||
corsOrigins: string[];
|
corsOrigins: string[];
|
||||||
};
|
};
|
||||||
|
ui: {
|
||||||
|
enabled: boolean;
|
||||||
|
port: number; // Default: 8081
|
||||||
|
host: string; // Default: '0.0.0.0'
|
||||||
|
assetSource: 'bundle' | 'disk';
|
||||||
|
};
|
||||||
docker: {
|
docker: {
|
||||||
networkName: string; // Default: 'modelgrid'
|
networkName: string; // Default: 'modelgrid'
|
||||||
runtime: 'docker' | 'podman';
|
runtime: 'docker' | 'podman';
|
||||||
|
socketPath?: string;
|
||||||
};
|
};
|
||||||
gpus: {
|
gpus: {
|
||||||
autoDetect: boolean;
|
autoDetect: boolean;
|
||||||
@@ -124,11 +131,12 @@ interface IModelGridConfig {
|
|||||||
};
|
};
|
||||||
containers: IContainerConfig[];
|
containers: IContainerConfig[];
|
||||||
models: {
|
models: {
|
||||||
greenlistUrl: string;
|
registryUrl: string;
|
||||||
autoPull: boolean;
|
autoDeploy: boolean;
|
||||||
defaultContainer: string;
|
defaultEngine: 'vllm';
|
||||||
autoLoad: string[];
|
autoLoad: string[];
|
||||||
};
|
};
|
||||||
|
cluster: IClusterConfig;
|
||||||
checkInterval: number;
|
checkInterval: number;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|||||||
@@ -318,15 +318,15 @@ modelgrid cluster activate NODE # Mark a node active again
|
|||||||
|
|
||||||
High-performance inference with PagedAttention and continuous batching.
|
High-performance inference with PagedAttention and continuous batching.
|
||||||
|
|
||||||
```bash
|
```jsonc
|
||||||
{
|
{
|
||||||
"id": "vllm-1",
|
"id": "vllm-1",
|
||||||
"type": "vllm",
|
"type": "vllm",
|
||||||
"name": "vLLM Server",
|
"name": "vLLM Server",
|
||||||
"gpuIds": ["nvidia-0", "nvidia-1"], # Tensor parallelism
|
"gpuIds": ["nvidia-0", "nvidia-1"], // Tensor parallelism
|
||||||
"port": 8000,
|
"port": 8000,
|
||||||
"env": {
|
"env": {
|
||||||
"HF_TOKEN": "your-huggingface-token" # For gated models
|
"HF_TOKEN": "your-huggingface-token" // For gated models
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
@@ -555,6 +555,12 @@ deno run --allow-all mod.ts help
|
|||||||
# Run tests
|
# Run tests
|
||||||
deno task test
|
deno task test
|
||||||
|
|
||||||
|
# Run the main regression suite used during focused changes
|
||||||
|
deno test --allow-all test/test.ts
|
||||||
|
|
||||||
|
# Run the full suite, including focused seam tests
|
||||||
|
deno test --allow-all test/
|
||||||
|
|
||||||
# Type check
|
# Type check
|
||||||
deno task check
|
deno task check
|
||||||
|
|
||||||
@@ -595,6 +601,14 @@ modelgrid/
|
|||||||
└── bin/ # npm wrapper
|
└── bin/ # npm wrapper
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Focused seam tests live alongside `test/test.ts`:
|
||||||
|
|
||||||
|
- `test/api-router_test.ts` covers routing, auth failures, and request-size handling
|
||||||
|
- `test/api-server_test.ts` covers health, metrics, and authenticated model listing
|
||||||
|
- `test/modelgrid-config_test.ts` covers config normalization and ignored-key warnings
|
||||||
|
- `test/model-registry_test.ts` covers fallback and file-backed catalog loading
|
||||||
|
- `test/cluster-manager-persistence_test.ts` covers persisted cluster state loading and pruning
|
||||||
|
|
||||||
## 🗑️ Uninstallation
|
## 🗑️ Uninstallation
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|||||||
+17
-7
@@ -26,9 +26,9 @@
|
|||||||
┌─────────────────────────────────────────────────────────────────┐
|
┌─────────────────────────────────────────────────────────────────┐
|
||||||
│ Container Runtime │
|
│ Container Runtime │
|
||||||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                  │
|
||||||
│ │ Ollama │ │ vLLM │ │ TGI │ │ Custom │ │
|
│ │ vLLM │ │ TGI │ │ Custom │ │
|
||||||
│ │Container │ │Container │ │Container │ │Container │ │
|
│ │Container │ │Container │ │Container │ │
|
||||||
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
|
│ └──────────┘ └──────────┘ └──────────┘ │
|
||||||
└─────────────────────────────────────────────────────────────────┘
|
└─────────────────────────────────────────────────────────────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -61,7 +61,7 @@
|
|||||||
### Pending Tasks
|
### Pending Tasks
|
||||||
|
|
||||||
- [ ] Integration testing with real GPUs
|
- [ ] Integration testing with real GPUs
|
||||||
- [ ] End-to-end API testing
|
- [x] End-to-end API smoke testing for health, metrics, and authenticated model listings
|
||||||
- [ ] Documentation improvements
|
- [ ] Documentation improvements
|
||||||
- [ ] First release (v1.0.0)
|
- [ ] First release (v1.0.0)
|
||||||
|
|
||||||
@@ -116,8 +116,7 @@ modelgrid/
|
|||||||
│ │ │ └── embeddings.ts # /v1/embeddings
|
│ │ │ └── embeddings.ts # /v1/embeddings
|
||||||
│ │ └── middleware/ # Request processing
|
│ │ └── middleware/ # Request processing
|
||||||
│ │ ├── auth.ts # API key validation
|
│ │ ├── auth.ts # API key validation
|
||||||
│ │ ├── sanity.ts # Request validation
|
│ │ └── sanity.ts # Request validation
|
||||||
│ │ └── proxy.ts # Container proxy
|
|
||||||
│ ├── models/ # Model management
|
│ ├── models/ # Model management
|
||||||
│ │ ├── index.ts
|
│ │ ├── index.ts
|
||||||
│ │ ├── registry.ts # Model registry
|
│ │ ├── registry.ts # Model registry
|
||||||
@@ -134,6 +133,17 @@ modelgrid/
|
|||||||
└── docs/ # Documentation
|
└── docs/ # Documentation
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Test Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
deno task check
|
||||||
|
deno test --allow-all test/test.ts
|
||||||
|
deno test --allow-all test/
|
||||||
|
```
|
||||||
|
|
||||||
|
The focused seam tests currently cover API routing, API server endpoints, config normalization,
|
||||||
|
model registry loading, and cluster state persistence.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## CLI Commands
|
## CLI Commands
|
||||||
@@ -177,7 +187,7 @@ modelgrid config init # Initialize configuration
|
|||||||
|
|
||||||
## Greenlit Model System
|
## Greenlit Model System
|
||||||
|
|
||||||
Models are controlled via a remote greenlist to prevent arbitrary downloads:
|
Models are resolved through a remote catalog so deployments come from an explicit allowlist:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
|
|||||||
+414
@@ -0,0 +1,414 @@
|
|||||||
|
# 🖥️ ModelGrid — UI Concept
|
||||||
|
|
||||||
|
**A browser-based operations console for ModelGrid, served by the same daemon that
|
||||||
|
already exposes the OpenAI-compatible API.**
|
||||||
|
|
||||||
|
This document sketches the user interface that will sit on top of the ModelGrid
|
||||||
|
daemon: what it shows, how it is organized, how an operator moves through it,
|
||||||
|
and how it stays in sync with a running node or a small cluster. It is a
|
||||||
|
concept, not a final spec — the goal is to lock the shape of the product
|
||||||
|
before any frontend code is written.
|
||||||
|
|
||||||
|
The structural idioms (tabbed top-level views, route-origin awareness,
|
||||||
|
embedded ops dashboard on a dedicated port, API-first with a thin UI on top)
|
||||||
|
are adapted from `@serve.zone/dcrouter`'s Ops dashboard. ModelGrid's UI should
|
||||||
|
feel familiar to anyone who has operated dcrouter, while staying grounded in
|
||||||
|
ModelGrid's own domain: GPUs, vLLM deployments, a public model catalog, and a
|
||||||
|
cluster of gateway-capable nodes.
|
||||||
|
|
||||||
|
## 🎯 Purpose & Audience
|
||||||
|
|
||||||
|
- **Primary user:** the operator of one or a few ModelGrid nodes. Often the
|
||||||
|
same person who provisioned the GPU host and ran `modelgrid service enable`.
|
||||||
|
- **Secondary user:** a platform engineer wiring ModelGrid into an internal
|
||||||
|
AI platform who needs to manage API keys, audit deployments, and watch
|
||||||
|
request traffic.
|
||||||
|
- **Not an end-user chat UI.** Consumers of the OpenAI-compatible API keep
|
||||||
|
using their own SDKs and tools. The browser UI is for operating the fleet,
|
||||||
|
not for prompting models.
|
||||||
|
|
||||||
|
The UI should collapse gracefully from a full cluster view down to a
|
||||||
|
single-node, standalone deployment, because both shapes are first-class in
|
||||||
|
ModelGrid's `cluster.role` model (`standalone` / `control-plane` / `worker`).
|
||||||
|
|
||||||
|
## 🧭 Top-Level Information Architecture
|
||||||
|
|
||||||
|
URLs follow `/{view}` for flat views and `/{view}/{subview}` for tabbed
|
||||||
|
views, matching dcrouter's routing idiom.
|
||||||
|
|
||||||
|
```
|
||||||
|
/overview
|
||||||
|
/stats
|
||||||
|
/configuration
|
||||||
|
|
||||||
|
/cluster
|
||||||
|
/nodes
|
||||||
|
/placements
|
||||||
|
/desired
|
||||||
|
|
||||||
|
/gpus
|
||||||
|
/devices
|
||||||
|
/drivers
|
||||||
|
|
||||||
|
/deployments
|
||||||
|
/active
|
||||||
|
/history
|
||||||
|
|
||||||
|
/models
|
||||||
|
/catalog
|
||||||
|
/deployed
|
||||||
|
|
||||||
|
/access
|
||||||
|
/apikeys
|
||||||
|
/clients
|
||||||
|
|
||||||
|
/logs (flat)
|
||||||
|
/metrics (flat)
|
||||||
|
/settings (flat)
|
||||||
|
```
|
||||||
|
|
||||||
|
Rationale for the split:
|
||||||
|
|
||||||
|
- **Overview** is the landing page — one screen that answers "is the fleet
|
||||||
|
healthy right now?"
|
||||||
|
- **Cluster / GPUs / Deployments / Models** are the four nouns an operator
|
||||||
|
actually reasons about when running ModelGrid. Keeping them at the top
|
||||||
|
level matches the CLI verbs (`modelgrid cluster`, `modelgrid gpu`,
|
||||||
|
`modelgrid container`, `modelgrid model`) so muscle memory transfers.
|
||||||
|
- **Access** consolidates the authn/authz surface (API keys today,
|
||||||
|
user/OIDC later) into one place, the way dcrouter groups `apitokens` and
|
||||||
|
`users` under `access`.
|
||||||
|
- **Logs** and **Metrics** are flat because they are cross-cutting streams,
|
||||||
|
not noun-scoped tabs.
|
||||||
|
|
||||||
|
The navigation chrome itself is a persistent left rail on desktop, collapsing
|
||||||
|
into a top hamburger on narrow viewports. The selected view is indicated
|
||||||
|
there; subviews surface as a tab strip at the top of the content area.
|
||||||
|
|
||||||
|
```
|
||||||
|
┌────────────┬──────────────────────────────────────────────────────────────┐
|
||||||
|
│ ModelGrid │ Overview ▸ Stats Configuration │
|
||||||
|
│ ├──────────────────────────────────────────────────────────────┤
|
||||||
|
│ Overview ●│ │
|
||||||
|
│ Cluster │ ┌─ Fleet Health ─────────────────────────────────────┐ │
|
||||||
|
│ GPUs │ │ 2 nodes • 3 GPUs • 4 deployments • api OK │ │
|
||||||
|
│ Deploys │ └───────────────────────────────────────────────────┘ │
|
||||||
|
│ Models │ ┌─ Live Traffic ──────────────┐ ┌─ GPU Utilization ─┐ │
|
||||||
|
│ Access │ │ 42 req/s p95 820 ms │ │ ▁▂▄▅▇█▇▅▄▂▁ │ │
|
||||||
|
│ │ │ ▁▂▃▅▇▇▅▃▂▁▁▂▄▆ │ │ avg 64% │ │
|
||||||
|
│ Logs │ └─────────────────────────────┘ └───────────────────┘ │
|
||||||
|
│ Metrics │ ┌─ Deployments ────────────────────────────────────┐ │
|
||||||
|
│ Settings │ │ llama-3.1-8b running 2/2 nvidia-0,1 │ │
|
||||||
|
│ │ │ qwen2.5-7b running 1/1 nvidia-2 │ │
|
||||||
|
│ node: ctrl │ │ bge-m3 pending 0/1 (no capacity) │ │
|
||||||
|
│ v1.1.0 │ └──────────────────────────────────────────────────┘ │
|
||||||
|
└────────────┴──────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
The footer of the rail surfaces the local node's identity (`nodeName`,
|
||||||
|
`role`), the daemon version, and a small link to the API base URL —
|
||||||
|
equivalent to how dcrouter surfaces its runtime identity in the sidebar.
|
||||||
|
|
||||||
|
## 📄 Per-View Sketches
|
||||||
|
|
||||||
|
### Overview ▸ Stats (landing page)
|
||||||
|
|
||||||
|
A dashboard of the things that an on-call operator wants to see in under
|
||||||
|
two seconds:
|
||||||
|
|
||||||
|
- **Fleet health band**: green/yellow/red status tiles for nodes, GPUs,
|
||||||
|
deployments, API.
|
||||||
|
- **Live traffic**: requests/sec, p50/p95/p99 latency, error rate. Sparkline
|
||||||
|
for the last 15 minutes, streaming from `/metrics` and a server-pushed
|
||||||
|
channel.
|
||||||
|
- **GPU utilization strip**: one micro-sparkline per GPU, colored by VRAM
|
||||||
|
pressure.
|
||||||
|
- **Deployment summary**: the `modelgrid ps` output, but clickable. Each
|
||||||
|
row deep-links into Deployments ▸ Active.
|
||||||
|
- **Catalog drift**: a small callout when `list.modelgrid.com` has newer
|
||||||
|
model entries than the node's cached catalog.
|
||||||
|
|
||||||
|
### Overview ▸ Configuration
|
||||||
|
|
||||||
|
A read-only rendering of the resolved `/etc/modelgrid/config.json` with
|
||||||
|
section headers (`api`, `docker`, `gpus`, `models`, `cluster`). Operators
|
||||||
|
can copy the JSON; editing config is intentionally kept to the Settings view
|
||||||
|
(or the CLI) to avoid a "two sources of truth" problem.
|
||||||
|
|
||||||
|
### Cluster ▸ Nodes
|
||||||
|
|
||||||
|
Mirrors `modelgrid cluster nodes`. Each row: node name, role badge
|
||||||
|
(`standalone` / `control-plane` / `worker`), advertised URL, last heartbeat,
|
||||||
|
GPU inventory summary, status (`active` / `cordoned` / `draining`).
|
||||||
|
|
||||||
|
Row actions: `cordon`, `drain`, `activate` — the same verbs as the CLI.
|
||||||
|
Hitting an action fires the corresponding control-plane call and shows an
|
||||||
|
in-row toast on success.
|
||||||
|
|
||||||
|
```
|
||||||
|
┌ Nodes ───────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Name Role Advertised URL Heartbeat │
|
||||||
|
│ ────────────────────────────────────────────────────────────────────── │
|
||||||
|
│ control-a control-plane http://ctrl.internal:8080 2s ago ● │
|
||||||
|
│ worker-a worker http://wa.internal:8080 3s ago ● │
|
||||||
|
│ worker-b worker http://wb.internal:8080 41s ago ◐ │
|
||||||
|
│                                        [cordon] [drain] [activate]      │
|
||||||
|
└──────────────────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
### Cluster ▸ Placements
|
||||||
|
|
||||||
|
A live map of where every deployed model is currently running, read from
|
||||||
|
the control-plane's placement state. Grouped by model, with a column per
|
||||||
|
node. Cells show replica count and health. This is where the operator
|
||||||
|
answers "where did `llama-3.1-8b` actually end up?".
|
||||||
|
|
||||||
|
### Cluster ▸ Desired
|
||||||
|
|
||||||
|
The companion to Placements: the desired-state table. Each row is a model
|
||||||
|
with a target replica count. Rows can be added (`cluster ensure`), edited
|
||||||
|
(`cluster scale`), or removed (`cluster clear`). The reconciler's pending
|
||||||
|
work is surfaced as a diff badge: e.g. `+1 replica`, `moving from worker-b
|
||||||
|
→ worker-a`.
|
||||||
|
|
||||||
|
### GPUs ▸ Devices
|
||||||
|
|
||||||
|
Mirrors `modelgrid gpu list` / `gpu status`, rendered as a card per GPU:
|
||||||
|
vendor, model, VRAM free/total, driver version, temperature, current
|
||||||
|
utilization, and which deployment is using it. Cards stream their
|
||||||
|
utilization via the realtime channel; no full page reloads.
|
||||||
|
|
||||||
|
### GPUs ▸ Drivers
|
||||||
|
|
||||||
|
Status per vendor (NVIDIA / AMD / Intel): driver installed? version? any
|
||||||
|
known issue? Includes a button to run `modelgrid gpu install`
|
||||||
|
interactively — but since the install flow is privileged and interactive,
|
||||||
|
the UI only kicks off the CLI walk-through in a terminal session rather
|
||||||
|
than trying to reimplement it in the browser. A small "copy the command"
|
||||||
|
affordance makes this explicit.
|
||||||
|
|
||||||
|
### Deployments ▸ Active
|
||||||
|
|
||||||
|
The core operational table. One row per active vLLM deployment:
|
||||||
|
|
||||||
|
- container ID, display name, model, GPU bindings, port, uptime, request
|
||||||
|
rate, error rate
|
||||||
|
- status pill (`running`, `pending`, `restarting`, `failed`)
|
||||||
|
- row actions: `logs`, `stop`, `restart`, `remove`
|
||||||
|
|
||||||
|
Clicking a row opens a detail drawer with sub-tabs:
|
||||||
|
|
||||||
|
- **Summary** — the effective container config and the scheduling
|
||||||
|
decision that landed it on this node
|
||||||
|
- **Logs** — a live tail (SSE)
|
||||||
|
- **Metrics** — request latency histogram, token throughput, VRAM
|
||||||
|
occupancy
|
||||||
|
- **Events** — a timeline of lifecycle events (scheduled, pulled image,
|
||||||
|
started, health check, restart, stopped)
|
||||||
|
|
||||||
|
### Deployments ▸ History
|
||||||
|
|
||||||
|
Deployments that have been stopped or removed, with the reason and the
|
||||||
|
last-known logs. Useful for post-mortem on a failed deploy.
|
||||||
|
|
||||||
|
### Models ▸ Catalog
|
||||||
|
|
||||||
|
The current catalog resolved from `list.modelgrid.com`, with a "refresh"
|
||||||
|
action that calls `modelgrid model refresh`. Each entry shows canonical
|
||||||
|
ID, aliases, capabilities (chat / completions / embeddings), minimum
|
||||||
|
VRAM, default GPU count, and a `Deploy` button. Deploying opens a small
|
||||||
|
form that mirrors `modelgrid run`: target node (or auto), desired replica
|
||||||
|
count, optional env overrides (e.g. `HF_TOKEN`).
|
||||||
|
|
||||||
|
A visible "source" badge marks whether the entry came from the public
|
||||||
|
catalog or a custom `registryUrl`, so operators can tell at a glance which
|
||||||
|
models the cluster will actually trust for auto-deploy.
|
||||||
|
|
||||||
|
### Models ▸ Deployed
|
||||||
|
|
||||||
|
Shows the union of what is running across the cluster, with replica
|
||||||
|
counts, keyed by canonical model ID. This is the view a developer asks
|
||||||
|
the operator for when they want to know "what models can I hit on this
|
||||||
|
endpoint?". It is effectively a pretty rendering of `/v1/models`.
|
||||||
|
|
||||||
|
### Access ▸ API Keys
|
||||||
|
|
||||||
|
Mirrors `modelgrid config apikey list`. Columns: label, prefix (first
|
||||||
|
8 chars), created, last used, status. Actions: `generate`, `revoke`.
|
||||||
|
Generating a key shows the secret once in a modal with a copy button,
|
||||||
|
then never shows it again — the same contract as dcrouter's API tokens.
|
||||||
|
|
||||||
|
### Access ▸ Clients
|
||||||
|
|
||||||
|
Placeholder for per-consumer rate limits, quotas, and request labels.
|
||||||
|
This view is explicitly future work; it renders as "not yet configured"
|
||||||
|
until the daemon exposes client records. Listing it now reserves the IA
|
||||||
|
slot so it doesn't have to be retrofitted later.
|
||||||
|
|
||||||
|
### Logs
|
||||||
|
|
||||||
|
A unified tail across daemon, scheduler, and deployments, with filters
|
||||||
|
by source (`daemon`, `scheduler`, `deployment:<id>`), level, and
|
||||||
|
free-text. Streamed via SSE. A "pause" toggle freezes the view for
|
||||||
|
reading; a "download" action exports the current buffer as NDJSON.
|
||||||
|
|
||||||
|
### Metrics
|
||||||
|
|
||||||
|
The `/metrics` endpoint rendered as a small set of charts (request rate,
|
||||||
|
latency, error rate, VRAM occupancy, model throughput). This is
|
||||||
|
deliberately lightweight — serious monitoring is expected to come from
|
||||||
|
Prometheus scraping `/metrics` into Grafana, and the UI says so with a
|
||||||
|
link to the recommended dashboard snippet.
|
||||||
|
|
||||||
|
### Settings
|
||||||
|
|
||||||
|
Editable configuration, grouped to match the config file:
|
||||||
|
|
||||||
|
- **API** — port, bind host, CORS, rate limit
|
||||||
|
- **Docker** — runtime, network name, socket path
|
||||||
|
- **GPUs** — auto-detect toggle, per-GPU assignments
|
||||||
|
- **Models** — registry URL, auto-deploy, default engine, auto-load list
|
||||||
|
- **Cluster** — role, advertise URL, control-plane URL, shared secret,
|
||||||
|
heartbeat interval, seeds
|
||||||
|
|
||||||
|
Edits write through the daemon's config API (to be defined) and reload
|
||||||
|
without a restart wherever possible. Settings that require a restart are
|
||||||
|
marked with a `restart required` badge, and the UI surfaces a single
|
||||||
|
"restart daemon" action at the top of the view when any are pending.
|
||||||
|
|
||||||
|
## 🛤️ Key User Journeys
|
||||||
|
|
||||||
|
### Deploy a model from the catalog
|
||||||
|
|
||||||
|
1. Operator opens **Models ▸ Catalog**, filters for chat-capable models
|
||||||
|
with VRAM ≤ 24 GB.
|
||||||
|
2. Clicks `Deploy` on `meta-llama/Llama-3.1-8B-Instruct`.
|
||||||
|
3. Dialog appears with target node (`auto` / specific worker), replica
|
||||||
|
count (default from catalog), optional env (`HF_TOKEN`).
|
||||||
|
4. On submit, the UI calls the control plane (`cluster ensure` + `scale`
|
||||||
|
under the hood). The dialog closes and the new row appears in
|
||||||
|
**Deployments ▸ Active** in `pending` state.
|
||||||
|
5. SSE updates walk the row through `pulling image → starting → running`.
|
||||||
|
6. A toast links to the deployment detail drawer for logs.
|
||||||
|
|
||||||
|
### Add a worker node to an existing control plane
|
||||||
|
|
||||||
|
1. Operator opens **Cluster ▸ Nodes** on the control plane.
|
||||||
|
2. Clicks `Add node`, which opens a helper that pre-fills the worker's
|
||||||
|
expected `cluster` config block — role, control-plane URL, shared
|
||||||
|
secret — and exposes a one-liner install command.
|
||||||
|
3. The operator runs the install command on the worker host. The UI does
|
||||||
|
**not** SSH into anything; it just hands out the exact snippet.
|
||||||
|
4. Once the worker's daemon starts and registers, the new node appears
|
||||||
|
in the Nodes table with its first heartbeat. The helper closes
|
||||||
|
automatically.
|
||||||
|
|
||||||
|
### Rotate an API key
|
||||||
|
|
||||||
|
1. **Access ▸ API Keys** → `Generate`.
|
||||||
|
2. Name the key, pick a scope (today: single scope; later: per-model).
|
||||||
|
3. The secret is shown once in a modal; copy-to-clipboard and a clear
|
||||||
|
"you will not see this again" note.
|
||||||
|
4. Old key row gets a `revoke` action. Revoke is a confirm-then-apply
|
||||||
|
flow because it will break live traffic.
|
||||||
|
|
||||||
|
### Investigate a failing deployment
|
||||||
|
|
||||||
|
1. **Overview ▸ Stats** shows a red tile: `1 deployment failed`.
|
||||||
|
2. Click drills into **Deployments ▸ Active**, filtered to `failed`.
|
||||||
|
3. Open the row drawer → **Events** tab to see the lifecycle timeline.
|
||||||
|
4. Jump to **Logs** tab for the live tail. If the deployment is down,
|
||||||
|
fall back to the last 500 lines from its event buffer.
|
||||||
|
5. From the drawer, `restart` retries the deployment; if it fails again,
|
||||||
|
the `Summary` tab shows the scheduling decision so the operator can
|
||||||
|
see whether VRAM, GPU pinning, or image pull is the root cause.
|
||||||
|
|
||||||
|
## 📡 Realtime, Auth, and API Contract
|
||||||
|
|
||||||
|
- **Realtime updates.** Metrics, logs, GPU utilization, heartbeats, and
|
||||||
|
deployment state changes stream over Server-Sent Events. A single
|
||||||
|
`/v1/_ui/events?topics=...` endpoint is preferred over per-feature
|
||||||
|
sockets so the browser holds exactly one connection. WebSocket is
|
||||||
|
reserved for bidirectional features (e.g. an interactive install
|
||||||
|
walkthrough) that we do not need in v1.
|
||||||
|
- **Auth model.** The UI runs behind the same daemon process as the
|
||||||
|
OpenAI-compatible API, on a dedicated `uiPort` (default `8081`) to
|
||||||
|
keep the data-plane clean. Login uses a session cookie; the first-boot
|
||||||
|
bootstrap seeds an `admin` user with a one-time password printed to
|
||||||
|
`journalctl -u modelgrid`, the same way dcrouter prints its initial
|
||||||
|
`admin`/`admin`. SSO/OIDC is a later add-on.
|
||||||
|
- **API contract.** Every UI action maps to an HTTP endpoint on the
|
||||||
|
daemon (`/v1/_ui/...`). The UI must not talk to any private internals
|
||||||
|
directly; this keeps `@modelgrid.com/modelgrid-apiclient` (a future
|
||||||
|
sibling to `@serve.zone/dcrouter-apiclient`) able to do everything the
|
||||||
|
UI can do, from scripts.
|
||||||
|
- **Origin badges.** Similar to dcrouter's `config` / `email` / `dns` /
|
||||||
|
`api` route-origin model, ModelGrid should tag each deployment with
|
||||||
|
its origin: `config` (seeded via `containers` in config.json),
|
||||||
|
`catalog` (auto-deployed from `models.autoLoad`), `api` (created via
|
||||||
|
UI/API). Origin determines what the UI allows: `config`-origin
|
||||||
|
deployments are toggle-only, `api`-origin deployments are full CRUD.
|
||||||
|
|
||||||
|
## 🧱 Implementation Notes (non-binding)
|
||||||
|
|
||||||
|
- **Web component stack.** Match the dcrouter OpsServer approach:
|
||||||
|
component-per-view under `ts_web/elements/<area>/`, a tiny
|
||||||
|
SmartRouter-style client router (`ts_web/router.ts`), and a single
|
||||||
|
`appstate.ts` as the store.
|
||||||
|
- **Bundled into the binary via `ts_bundled/bundle.ts`.** ModelGrid is a
|
||||||
|
Deno project that ships as a `deno compile` single binary, so the UI
|
||||||
|
follows the `@stack.gallery/registry` pattern: a build step bundles
|
||||||
|
the `ts_web/` sources (HTML, JS, CSS, fonts, icons) into a single
|
||||||
|
generated `ts_bundled/bundle.ts` module that exports a
|
||||||
|
`{ path → bytes | string }` map. The daemon dynamically imports that
|
||||||
|
module at startup and hands the map to **typedserver**, which serves
|
||||||
|
it on the UI port. Result: no external asset directory, no runtime
|
||||||
|
filesystem dependency, one binary still ships the entire console.
|
||||||
|
- **Dev vs prod asset source.** In `deno task dev`, typedserver is
|
||||||
|
pointed at `ts_web/` on disk so UI edits are hot-reloadable without
|
||||||
|
re-running the bundler. In `deno task compile` / prod, the bundler
|
||||||
|
regenerates `ts_bundled/bundle.ts` first and the compiled binary
|
||||||
|
serves exclusively from the embedded map. A single flag
|
||||||
|
(`UI_ASSET_SOURCE=disk|bundle`, default `bundle`) picks the strategy
|
||||||
|
at runtime.
|
||||||
|
- **Bundler placement.** Mirrors `@stack.gallery/registry`: keep the
|
||||||
|
bundler in `scripts/bundle-ui.ts`, invoke it from a `deno task
|
||||||
|
bundle:ui` that the `compile:all` task depends on, and `.gitignore`
|
||||||
|
the generated `ts_bundled/bundle.ts` so it is only produced during
|
||||||
|
release builds (or regenerated on demand for local prod testing).
|
||||||
|
- **Packaging.** Follow dcrouter's module split: `@modelgrid.com/modelgrid`
|
||||||
|
ships the daemon and the embedded UI bundle; a future
|
||||||
|
`@modelgrid.com/modelgrid-web` can carve out the web sources as their
|
||||||
|
own publishable boundary if the bundle grows large or the UI needs to
|
||||||
|
be consumed independently.
|
||||||
|
- **Dark theme default** (black background, high-contrast foreground) to
|
||||||
|
match dcrouter and the expected server-ops environment. Light theme
|
||||||
|
is a later toggle.
|
||||||
|
- **No server-side rendering.** The UI is a static SPA; typedserver
|
||||||
|
returns the asset map's `index.html` for the app shell and the rest
|
||||||
|
of the state comes from the API. This keeps the runtime surface
|
||||||
|
small and makes the UI-less `curl` story identical to the UI story.
|
||||||
|
|
||||||
|
## ❓ Open Questions
|
||||||
|
|
||||||
|
- **Edit config from the UI or keep it CLI/file-first?** Current lean:
|
||||||
|
UI is authoritative only for API keys, deployments, and cluster
|
||||||
|
actions. Config editing is exposed but optional, with CLI still the
|
||||||
|
canonical path for reproducible installs.
|
||||||
|
- **Do we expose a model prompt playground?** Nice to have for smoke
|
||||||
|
tests, but it blurs the operator/consumer line. Defer to v2.
|
||||||
|
- **Cluster-wide vs per-node view.** On a worker node, should the UI
|
||||||
|
show only local state, or proxy the control plane's cluster view? The
|
||||||
|
current lean: workers show local-only, and link to the control plane
|
||||||
|
for cluster views. This avoids split-brain confusion.
|
||||||
|
- **Access control granularity.** API keys today are coarse (all or
|
||||||
|
nothing). A future model might scope keys per deployment or per
|
||||||
|
model. Reserve the column in the Access ▸ API Keys table now.
|
||||||
|
|
||||||
|
## 🛑 Out of Scope (for this concept)
|
||||||
|
|
||||||
|
- End-user chat or prompt UIs for the OpenAI-compatible API.
|
||||||
|
- Billing, quotas, or usage-based pricing dashboards.
|
||||||
|
- Multi-tenant isolation beyond per-API-key separation.
|
||||||
|
- Anything specific to non-vLLM runtimes — the UI assumes the v1.1.0
|
||||||
|
reorientation around vLLM as the only first-class runtime.
|
||||||
@@ -0,0 +1,88 @@
|
|||||||
|
#!/usr/bin/env -S deno run --allow-read --allow-write
|
||||||
|
|
||||||
|
/**
|
||||||
|
* bundle-ui.ts
|
||||||
|
*
|
||||||
|
* Walks `ts_web/` and emits `ts_bundled/bundle.ts`, a single TypeScript
|
||||||
|
* module that exports every UI asset as base64 in order. The daemon's
|
||||||
|
* UI server imports this module at runtime to serve the console without
|
||||||
|
* any external filesystem dependency — the entire browser app ends up
|
||||||
|
* embedded in the `deno compile` binary.
|
||||||
|
*
|
||||||
|
* The output shape matches the `@stack.gallery/registry` convention so
|
||||||
|
* a consumer can loop `files` as `{ path, contentBase64 }` entries.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { walk } from 'jsr:@std/fs@^1.0.0/walk';
|
||||||
|
import { fromFileUrl, join, relative } from 'jsr:@std/path@^1.0.0';
|
||||||
|
|
||||||
|
const here = fromFileUrl(new URL('./', import.meta.url));
|
||||||
|
const repoRoot = join(here, '..');
|
||||||
|
const sourceDir = join(repoRoot, 'ts_web');
|
||||||
|
const outDir = join(repoRoot, 'ts_bundled');
|
||||||
|
const outFile = join(outDir, 'bundle.ts');
|
||||||
|
|
||||||
|
async function main(): Promise<void> {
|
||||||
|
const entries: Array<{ path: string; contentBase64: string; size: number }> = [];
|
||||||
|
|
||||||
|
for await (
|
||||||
|
const entry of walk(sourceDir, {
|
||||||
|
includeDirs: false,
|
||||||
|
includeSymlinks: false,
|
||||||
|
})
|
||||||
|
) {
|
||||||
|
const rel = relative(sourceDir, entry.path).replaceAll('\\', '/');
|
||||||
|
const bytes = await Deno.readFile(entry.path);
|
||||||
|
entries.push({
|
||||||
|
path: rel,
|
||||||
|
contentBase64: encodeBase64(bytes),
|
||||||
|
size: bytes.byteLength,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
entries.sort((a, b) => a.path.localeCompare(b.path));
|
||||||
|
|
||||||
|
const generatedAt = new Date().toISOString();
|
||||||
|
const totalBytes = entries.reduce((sum, e) => sum + e.size, 0);
|
||||||
|
|
||||||
|
const header = [
|
||||||
|
'// AUTO-GENERATED — do not edit.',
|
||||||
|
'// Regenerate with: deno task bundle:ui',
|
||||||
|
`// Source: ts_web/ (${entries.length} files, ${totalBytes} bytes)`,
|
||||||
|
`// Generated: ${generatedAt}`,
|
||||||
|
'',
|
||||||
|
'export interface IBundledFile {',
|
||||||
|
' path: string;',
|
||||||
|
' contentBase64: string;',
|
||||||
|
'}',
|
||||||
|
'',
|
||||||
|
'export const files: IBundledFile[] = [',
|
||||||
|
].join('\n');
|
||||||
|
|
||||||
|
const body = entries.map((e) =>
|
||||||
|
` { path: ${JSON.stringify(e.path)}, contentBase64: ${JSON.stringify(e.contentBase64)} },`
|
||||||
|
).join('\n');
|
||||||
|
|
||||||
|
const footer = '\n];\n';
|
||||||
|
|
||||||
|
await Deno.mkdir(outDir, { recursive: true });
|
||||||
|
await Deno.writeTextFile(outFile, header + '\n' + body + footer);
|
||||||
|
|
||||||
|
console.log(
|
||||||
|
`bundle-ui: wrote ${entries.length} file(s), ${totalBytes} bytes → ${
|
||||||
|
relative(repoRoot, outFile)
|
||||||
|
}`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
function encodeBase64(bytes: Uint8Array): string {
|
||||||
|
let binary = '';
|
||||||
|
for (let i = 0; i < bytes.length; i++) {
|
||||||
|
binary += String.fromCharCode(bytes[i]);
|
||||||
|
}
|
||||||
|
return btoa(binary);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (import.meta.main) {
|
||||||
|
await main();
|
||||||
|
}
|
||||||
@@ -0,0 +1,131 @@
|
|||||||
|
import { assertEquals } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { EventEmitter } from 'node:events';
|
||||||
|
import { AuthMiddleware } from '../ts/api/middleware/auth.ts';
|
||||||
|
import { ApiRouter } from '../ts/api/router.ts';
|
||||||
|
|
||||||
|
class TestResponse {
|
||||||
|
public statusCode = 200;
|
||||||
|
public headers: Record<string, string> = {};
|
||||||
|
public body = '';
|
||||||
|
|
||||||
|
public writeHead(statusCode: number, headers: Record<string, string>): TestResponse {
|
||||||
|
this.statusCode = statusCode;
|
||||||
|
this.headers = headers;
|
||||||
|
return this;
|
||||||
|
}
|
||||||
|
|
||||||
|
public end(body = ''): TestResponse {
|
||||||
|
this.body = body;
|
||||||
|
return this;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
class TestRequest extends EventEmitter {
|
||||||
|
public method: string;
|
||||||
|
public headers: Record<string, string>;
|
||||||
|
public destroyed = false;
|
||||||
|
public paused = false;
|
||||||
|
|
||||||
|
constructor(method: string, headers: Record<string, string>) {
|
||||||
|
super();
|
||||||
|
this.method = method;
|
||||||
|
this.headers = headers;
|
||||||
|
}
|
||||||
|
|
||||||
|
public pause(): this {
|
||||||
|
this.paused = true;
|
||||||
|
return this;
|
||||||
|
}
|
||||||
|
|
||||||
|
public destroy(): this {
|
||||||
|
this.destroyed = true;
|
||||||
|
return this;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
function createRouter(): ApiRouter {
|
||||||
|
return new ApiRouter(
|
||||||
|
{} as never,
|
||||||
|
{} as never,
|
||||||
|
{} as never,
|
||||||
|
{} as never,
|
||||||
|
['valid-key'],
|
||||||
|
{
|
||||||
|
authMiddleware: new AuthMiddleware(['valid-key']),
|
||||||
|
sanityMiddleware: {
|
||||||
|
validateChatRequest() {
|
||||||
|
return { valid: true };
|
||||||
|
},
|
||||||
|
sanitizeChatRequest(body: Record<string, unknown>) {
|
||||||
|
return body;
|
||||||
|
},
|
||||||
|
validateEmbeddingsRequest() {
|
||||||
|
return { valid: true };
|
||||||
|
},
|
||||||
|
sanitizeEmbeddingsRequest(body: Record<string, unknown>) {
|
||||||
|
return body;
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
chatHandler: {
|
||||||
|
async handleChatCompletion() {
|
||||||
|
throw new Error('chat handler should not run in this test');
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
modelsHandler: {
|
||||||
|
async handleListModels() {
|
||||||
|
throw new Error('models handler should not run in this test');
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
embeddingsHandler: {
|
||||||
|
async handleEmbeddings() {
|
||||||
|
throw new Error('embeddings handler should not run in this test');
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
Deno.test('ApiRouter returns 404 for unknown endpoints', async () => {
|
||||||
|
const router = createRouter();
|
||||||
|
const response = new TestResponse();
|
||||||
|
|
||||||
|
await router.route(
|
||||||
|
{ method: 'GET', headers: {} } as never,
|
||||||
|
response as never,
|
||||||
|
'/does-not-exist',
|
||||||
|
);
|
||||||
|
|
||||||
|
assertEquals(response.statusCode, 404);
|
||||||
|
assertEquals(JSON.parse(response.body).error.type, 'invalid_request_error');
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ApiRouter rejects protected endpoints without a bearer token', async () => {
|
||||||
|
const router = createRouter();
|
||||||
|
const response = new TestResponse();
|
||||||
|
|
||||||
|
await router.route(
|
||||||
|
{ method: 'GET', headers: {} } as never,
|
||||||
|
response as never,
|
||||||
|
'/v1/models',
|
||||||
|
);
|
||||||
|
|
||||||
|
assertEquals(response.statusCode, 401);
|
||||||
|
assertEquals(JSON.parse(response.body).error.type, 'authentication_error');
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ApiRouter returns 413 for oversized request bodies', async () => {
|
||||||
|
const router = createRouter();
|
||||||
|
const request = new TestRequest('POST', {
|
||||||
|
authorization: 'Bearer valid-key',
|
||||||
|
});
|
||||||
|
const response = new TestResponse();
|
||||||
|
|
||||||
|
const routePromise = router.route(request as never, response as never, '/v1/chat/completions');
|
||||||
|
request.emit('data', 'x'.repeat(10 * 1024 * 1024 + 1));
|
||||||
|
await routePromise;
|
||||||
|
|
||||||
|
assertEquals(response.statusCode, 413);
|
||||||
|
assertEquals(request.paused, true);
|
||||||
|
assertEquals(request.destroyed, true);
|
||||||
|
assertEquals(JSON.parse(response.body).error.message, 'Request body too large');
|
||||||
|
});
|
||||||
@@ -0,0 +1,315 @@
|
|||||||
|
import { assertEquals } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { ApiServer } from '../ts/api/server.ts';
|
||||||
|
|
||||||
|
Deno.test('ApiServer serves health metrics and authenticated model listings', async () => {
|
||||||
|
const port = 18100 + Math.floor(Math.random() * 1000);
|
||||||
|
const server = new ApiServer(
|
||||||
|
{
|
||||||
|
host: '127.0.0.1',
|
||||||
|
port,
|
||||||
|
apiKeys: ['valid-key'],
|
||||||
|
cors: false,
|
||||||
|
corsOrigins: [],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
async getAllStatus() {
|
||||||
|
return new Map([
|
||||||
|
['vllm-1', { running: true, health: 'healthy' }],
|
||||||
|
]);
|
||||||
|
},
|
||||||
|
async getAllAvailableModels() {
|
||||||
|
return new Map([
|
||||||
|
['meta-llama/Llama-3.1-8B-Instruct', [{ type: 'vllm' }]],
|
||||||
|
]);
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async getAllModels() {
|
||||||
|
return [
|
||||||
|
{
|
||||||
|
id: 'meta-llama/Llama-3.1-8B-Instruct',
|
||||||
|
engine: 'vllm',
|
||||||
|
source: { repo: 'meta-llama/Llama-3.1-8B-Instruct' },
|
||||||
|
capabilities: { chat: true },
|
||||||
|
requirements: { minVramGb: 18 },
|
||||||
|
},
|
||||||
|
];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{} as never,
|
||||||
|
{
|
||||||
|
getStatus() {
|
||||||
|
return {
|
||||||
|
localNode: null,
|
||||||
|
nodes: [],
|
||||||
|
models: {},
|
||||||
|
desiredDeployments: [],
|
||||||
|
};
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
gpuDetector: {
|
||||||
|
async detectGpus() {
|
||||||
|
return [{ id: 'nvidia-0' }];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
await server.start();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const healthResponse = await fetch(`http://127.0.0.1:${port}/health`);
|
||||||
|
const healthBody = await healthResponse.json();
|
||||||
|
assertEquals(healthResponse.status, 200);
|
||||||
|
assertEquals(healthBody.status, 'ok');
|
||||||
|
assertEquals(healthBody.models, 1);
|
||||||
|
assertEquals(Array.isArray(healthBody.reasons), true);
|
||||||
|
assertEquals(healthBody.reasons.length, 0);
|
||||||
|
assertEquals(typeof healthResponse.headers.get('x-request-id'), 'string');
|
||||||
|
|
||||||
|
const metricsResponse = await fetch(`http://127.0.0.1:${port}/metrics`);
|
||||||
|
const metricsBody = await metricsResponse.text();
|
||||||
|
assertEquals(metricsResponse.status, 200);
|
||||||
|
assertEquals(metricsBody.includes('modelgrid_uptime_seconds'), true);
|
||||||
|
assertEquals(metricsBody.includes('modelgrid_models_available 1'), true);
|
||||||
|
|
||||||
|
const unauthenticatedModels = await fetch(`http://127.0.0.1:${port}/v1/models`);
|
||||||
|
const unauthenticatedBody = await unauthenticatedModels.json();
|
||||||
|
assertEquals(unauthenticatedModels.status, 401);
|
||||||
|
assertEquals(unauthenticatedBody.error.type, 'authentication_error');
|
||||||
|
|
||||||
|
const authenticatedModels = await fetch(`http://127.0.0.1:${port}/v1/models`, {
|
||||||
|
headers: {
|
||||||
|
Authorization: 'Bearer valid-key',
|
||||||
|
'X-Request-Id': 'req-test-models',
|
||||||
|
},
|
||||||
|
});
|
||||||
|
const authenticatedBody = await authenticatedModels.json();
|
||||||
|
assertEquals(authenticatedModels.status, 200);
|
||||||
|
assertEquals(authenticatedBody.object, 'list');
|
||||||
|
assertEquals(authenticatedBody.data[0].id, 'meta-llama/Llama-3.1-8B-Instruct');
|
||||||
|
assertEquals(authenticatedModels.headers.get('x-request-id'), 'req-test-models');
|
||||||
|
|
||||||
|
const metricsAfterRequests = await fetch(`http://127.0.0.1:${port}/metrics`);
|
||||||
|
const metricsAfterRequestsBody = await metricsAfterRequests.text();
|
||||||
|
assertEquals(
|
||||||
|
metricsAfterRequestsBody.includes('modelgrid_api_requests_total{path="/v1/models"} 2'),
|
||||||
|
true,
|
||||||
|
);
|
||||||
|
assertEquals(
|
||||||
|
metricsAfterRequestsBody.includes('modelgrid_api_auth_failures_total{path="/v1/models"} 1'),
|
||||||
|
true,
|
||||||
|
);
|
||||||
|
} finally {
|
||||||
|
await server.stop();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ApiServer metrics expose 5xx counts for failing endpoints', async () => {
|
||||||
|
const port = 19100 + Math.floor(Math.random() * 1000);
|
||||||
|
let failModelListing = true;
|
||||||
|
const server = new ApiServer(
|
||||||
|
{
|
||||||
|
host: '127.0.0.1',
|
||||||
|
port,
|
||||||
|
apiKeys: ['valid-key'],
|
||||||
|
cors: false,
|
||||||
|
corsOrigins: [],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
async getAllStatus() {
|
||||||
|
return new Map();
|
||||||
|
},
|
||||||
|
async getAllAvailableModels() {
|
||||||
|
if (failModelListing) {
|
||||||
|
failModelListing = false;
|
||||||
|
throw new Error('models unavailable');
|
||||||
|
}
|
||||||
|
|
||||||
|
return new Map();
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async getAllModels() {
|
||||||
|
return [];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{} as never,
|
||||||
|
{
|
||||||
|
getStatus() {
|
||||||
|
return {
|
||||||
|
localNode: null,
|
||||||
|
nodes: [],
|
||||||
|
models: {},
|
||||||
|
desiredDeployments: [],
|
||||||
|
};
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
gpuDetector: {
|
||||||
|
async detectGpus() {
|
||||||
|
return [];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
await server.start();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const failedModels = await fetch(`http://127.0.0.1:${port}/v1/models`, {
|
||||||
|
headers: {
|
||||||
|
Authorization: 'Bearer valid-key',
|
||||||
|
},
|
||||||
|
});
|
||||||
|
assertEquals(failedModels.status, 500);
|
||||||
|
await failedModels.text();
|
||||||
|
|
||||||
|
const metricsResponse = await fetch(`http://127.0.0.1:${port}/metrics`);
|
||||||
|
const metricsBody = await metricsResponse.text();
|
||||||
|
assertEquals(
|
||||||
|
metricsBody.includes('modelgrid_api_server_errors_total{path="/v1/models"} 1'),
|
||||||
|
true,
|
||||||
|
);
|
||||||
|
} finally {
|
||||||
|
await server.stop();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ApiServer health reports degraded reasons', async () => {
|
||||||
|
const port = 19300 + Math.floor(Math.random() * 1000);
|
||||||
|
const server = new ApiServer(
|
||||||
|
{
|
||||||
|
host: '127.0.0.1',
|
||||||
|
port,
|
||||||
|
apiKeys: ['valid-key'],
|
||||||
|
cors: false,
|
||||||
|
corsOrigins: [],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
async getAllStatus() {
|
||||||
|
return new Map([
|
||||||
|
['vllm-1', { running: false, health: 'unhealthy' }],
|
||||||
|
]);
|
||||||
|
},
|
||||||
|
async getAllAvailableModels() {
|
||||||
|
return new Map();
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async getAllModels() {
|
||||||
|
return [];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{} as never,
|
||||||
|
{
|
||||||
|
getStatus() {
|
||||||
|
return {
|
||||||
|
localNode: null,
|
||||||
|
nodes: [],
|
||||||
|
models: {},
|
||||||
|
desiredDeployments: [],
|
||||||
|
};
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
gpuDetector: {
|
||||||
|
async detectGpus() {
|
||||||
|
return [{ id: 'nvidia-0' }];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
await server.start();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const response = await fetch(`http://127.0.0.1:${port}/health`);
|
||||||
|
const body = await response.json();
|
||||||
|
|
||||||
|
assertEquals(response.status, 503);
|
||||||
|
assertEquals(body.status, 'degraded');
|
||||||
|
assertEquals(body.reasons.includes('unhealthy_container'), true);
|
||||||
|
assertEquals(body.reasons.includes('no_models_available'), true);
|
||||||
|
} finally {
|
||||||
|
await server.stop();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ApiServer enforces api rate limits while exempting health and metrics', async () => {
|
||||||
|
const port = 19200 + Math.floor(Math.random() * 1000);
|
||||||
|
const server = new ApiServer(
|
||||||
|
{
|
||||||
|
host: '127.0.0.1',
|
||||||
|
port,
|
||||||
|
apiKeys: ['valid-key'],
|
||||||
|
rateLimit: 2,
|
||||||
|
cors: false,
|
||||||
|
corsOrigins: [],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
async getAllStatus() {
|
||||||
|
return new Map();
|
||||||
|
},
|
||||||
|
async getAllAvailableModels() {
|
||||||
|
return new Map([
|
||||||
|
['meta-llama/Llama-3.1-8B-Instruct', [{ type: 'vllm' }]],
|
||||||
|
]);
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async getAllModels() {
|
||||||
|
return [];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{} as never,
|
||||||
|
{
|
||||||
|
getStatus() {
|
||||||
|
return {
|
||||||
|
localNode: null,
|
||||||
|
nodes: [],
|
||||||
|
models: {},
|
||||||
|
desiredDeployments: [],
|
||||||
|
};
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
gpuDetector: {
|
||||||
|
async detectGpus() {
|
||||||
|
return [];
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
await server.start();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const requestHeaders = {
|
||||||
|
Authorization: 'Bearer valid-key',
|
||||||
|
};
|
||||||
|
|
||||||
|
const first = await fetch(`http://127.0.0.1:${port}/v1/models`, { headers: requestHeaders });
|
||||||
|
assertEquals(first.status, 200);
|
||||||
|
await first.text();
|
||||||
|
|
||||||
|
const second = await fetch(`http://127.0.0.1:${port}/v1/models`, { headers: requestHeaders });
|
||||||
|
assertEquals(second.status, 200);
|
||||||
|
await second.text();
|
||||||
|
|
||||||
|
const third = await fetch(`http://127.0.0.1:${port}/v1/models`, { headers: requestHeaders });
|
||||||
|
assertEquals(third.status, 429);
|
||||||
|
assertEquals((await third.json()).error.type, 'rate_limit_exceeded');
|
||||||
|
|
||||||
|
const health = await fetch(`http://127.0.0.1:${port}/health`);
|
||||||
|
assertEquals(health.status, 200);
|
||||||
|
await health.text();
|
||||||
|
|
||||||
|
const metrics = await fetch(`http://127.0.0.1:${port}/metrics`);
|
||||||
|
assertEquals(metrics.status, 200);
|
||||||
|
await metrics.text();
|
||||||
|
} finally {
|
||||||
|
await server.stop();
|
||||||
|
}
|
||||||
|
});
|
||||||
@@ -0,0 +1,120 @@
|
|||||||
|
import { assertEquals } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { ChatHandler } from '../ts/api/handlers/chat.ts';
|
||||||
|
import { EmbeddingsHandler } from '../ts/api/handlers/embeddings.ts';
|
||||||
|
import { UpstreamTimeoutError } from '../ts/containers/base-container.ts';
|
||||||
|
|
||||||
|
class TestResponse {
|
||||||
|
public statusCode = 200;
|
||||||
|
public headers: Record<string, string> = {};
|
||||||
|
public body = '';
|
||||||
|
|
||||||
|
public writeHead(statusCode: number, headers: Record<string, string>): TestResponse {
|
||||||
|
this.statusCode = statusCode;
|
||||||
|
this.headers = headers;
|
||||||
|
return this;
|
||||||
|
}
|
||||||
|
|
||||||
|
public end(body = ''): TestResponse {
|
||||||
|
this.body = body;
|
||||||
|
return this;
|
||||||
|
}
|
||||||
|
|
||||||
|
public write(_chunk: string | Uint8Array): boolean {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Deno.test('ChatHandler maps upstream timeouts to 504 responses', async () => {
|
||||||
|
const handler = new ChatHandler(
|
||||||
|
{
|
||||||
|
async findContainerForModel() {
|
||||||
|
return {
|
||||||
|
async chatCompletion() {
|
||||||
|
throw new UpstreamTimeoutError();
|
||||||
|
},
|
||||||
|
async chatCompletionStream() {
|
||||||
|
throw new UpstreamTimeoutError();
|
||||||
|
},
|
||||||
|
};
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async getModel(modelName: string) {
|
||||||
|
return { id: modelName };
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async loadModel() {
|
||||||
|
return { success: false };
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
shouldDeployLocallyFirst() {
|
||||||
|
return false;
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
);
|
||||||
|
|
||||||
|
const response = new TestResponse();
|
||||||
|
await handler.handleChatCompletion(
|
||||||
|
{ headers: {} } as never,
|
||||||
|
response as never,
|
||||||
|
{ model: 'meta-llama/Llama-3.1-8B-Instruct', messages: [{ role: 'user', content: 'hi' }] },
|
||||||
|
);
|
||||||
|
|
||||||
|
assertEquals(response.statusCode, 504);
|
||||||
|
assertEquals(JSON.parse(response.body).error.type, 'upstream_timeout');
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('EmbeddingsHandler maps upstream timeouts to 504 responses', async () => {
|
||||||
|
const originalFetch = globalThis.fetch;
|
||||||
|
globalThis.fetch = async () => {
|
||||||
|
const error = new Error('request aborted');
|
||||||
|
error.name = 'AbortError';
|
||||||
|
throw error;
|
||||||
|
};
|
||||||
|
|
||||||
|
try {
|
||||||
|
const handler = new EmbeddingsHandler(
|
||||||
|
{
|
||||||
|
async findContainerForModel() {
|
||||||
|
return null;
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async getModel(modelName: string) {
|
||||||
|
return { id: modelName };
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
{
|
||||||
|
async ensureModelViaControlPlane(modelName: string) {
|
||||||
|
return {
|
||||||
|
location: {
|
||||||
|
modelId: modelName,
|
||||||
|
nodeName: 'worker-a',
|
||||||
|
endpoint: 'http://worker-a:8080',
|
||||||
|
healthy: true,
|
||||||
|
engine: 'vllm',
|
||||||
|
containerId: 'remote',
|
||||||
|
},
|
||||||
|
};
|
||||||
|
},
|
||||||
|
getLocalNodeName() {
|
||||||
|
return 'control';
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
);
|
||||||
|
|
||||||
|
const response = new TestResponse();
|
||||||
|
await handler.handleEmbeddings(
|
||||||
|
{ headers: {} } as never,
|
||||||
|
response as never,
|
||||||
|
{ model: 'BAAI/bge-m3', input: 'hello' },
|
||||||
|
);
|
||||||
|
|
||||||
|
assertEquals(response.statusCode, 504);
|
||||||
|
assertEquals(JSON.parse(response.body).error.type, 'upstream_timeout');
|
||||||
|
} finally {
|
||||||
|
globalThis.fetch = originalFetch;
|
||||||
|
}
|
||||||
|
});
|
||||||
@@ -0,0 +1,134 @@
|
|||||||
|
import { assertEquals, assertExists } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { CLUSTER, PATHS } from '../ts/constants.ts';
|
||||||
|
import { ClusterManager } from '../ts/cluster/cluster-manager.ts';
|
||||||
|
import type { IClusterNodeHeartbeat } from '../ts/interfaces/cluster.ts';
|
||||||
|
|
||||||
|
function createNode(nodeName: string, lastSeenAt: number): IClusterNodeHeartbeat {
|
||||||
|
return {
|
||||||
|
nodeName,
|
||||||
|
role: nodeName === 'control' ? 'control-plane' : 'worker',
|
||||||
|
endpoint: `http://${nodeName}:8080`,
|
||||||
|
healthy: true,
|
||||||
|
resources: {
|
||||||
|
gpuCount: 1,
|
||||||
|
totalVramGb: 24,
|
||||||
|
availableVramGb: 24,
|
||||||
|
maxSingleGpuVramGb: 24,
|
||||||
|
largestGpuGroupCount: 1,
|
||||||
|
largestGpuGroupVramGb: 24,
|
||||||
|
deploymentCount: 0,
|
||||||
|
topologyGroups: [
|
||||||
|
{
|
||||||
|
id: 'nvidia-0',
|
||||||
|
vendor: 'nvidia',
|
||||||
|
gpuIds: ['gpu-0'],
|
||||||
|
gpuCount: 1,
|
||||||
|
totalVramGb: 24,
|
||||||
|
maxSingleGpuVramGb: 24,
|
||||||
|
busNumbers: [1],
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
deployments: [],
|
||||||
|
lastSeenAt,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
async function waitForPersistence(): Promise<void> {
|
||||||
|
await new Promise((resolve) => setTimeout(resolve, 25));
|
||||||
|
}
|
||||||
|
|
||||||
|
Deno.test('ClusterManager initialize loads persisted state and prunes stale nodes', async () => {
|
||||||
|
const originalDataDir = PATHS.DATA_DIR;
|
||||||
|
const tempDir = await Deno.makeTempDir();
|
||||||
|
(PATHS as { DATA_DIR: string }).DATA_DIR = tempDir;
|
||||||
|
|
||||||
|
try {
|
||||||
|
const now = Date.now();
|
||||||
|
await Deno.writeTextFile(
|
||||||
|
`${tempDir}/cluster-state.json`,
|
||||||
|
JSON.stringify({
|
||||||
|
nodes: [
|
||||||
|
createNode('control', now),
|
||||||
|
createNode('worker-fresh', now),
|
||||||
|
createNode('worker-stale', now - CLUSTER.NODE_STALE_AFTER_MS - 1000),
|
||||||
|
],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
await Deno.writeTextFile(
|
||||||
|
`${tempDir}/cluster-control-state.json`,
|
||||||
|
JSON.stringify({
|
||||||
|
desiredDeployments: [
|
||||||
|
{ modelId: 'meta-llama/Llama-3.1-8B-Instruct', desiredReplicas: 2, updatedAt: now },
|
||||||
|
],
|
||||||
|
nodeSchedulerStates: {
|
||||||
|
'worker-fresh': 'cordoned',
|
||||||
|
},
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
|
const clusterManager = new ClusterManager();
|
||||||
|
clusterManager.configure({
|
||||||
|
enabled: true,
|
||||||
|
nodeName: 'control',
|
||||||
|
role: 'control-plane',
|
||||||
|
bindHost: '0.0.0.0',
|
||||||
|
gossipPort: 7946,
|
||||||
|
heartbeatIntervalMs: 5000,
|
||||||
|
seedNodes: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
await clusterManager.initialize();
|
||||||
|
|
||||||
|
assertEquals(clusterManager.getAllNodes().map((node) => node.nodeName), ['control', 'worker-fresh']);
|
||||||
|
assertExists(clusterManager.getLocalNode());
|
||||||
|
assertEquals(clusterManager.getDesiredDeployments().length, 1);
|
||||||
|
assertEquals(clusterManager.getNodeSchedulerState('worker-fresh'), 'cordoned');
|
||||||
|
} finally {
|
||||||
|
(PATHS as { DATA_DIR: string }).DATA_DIR = originalDataDir;
|
||||||
|
await Deno.remove(tempDir, { recursive: true });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ClusterManager persists state only after initialization completes', async () => {
|
||||||
|
const originalDataDir = PATHS.DATA_DIR;
|
||||||
|
const tempDir = await Deno.makeTempDir();
|
||||||
|
(PATHS as { DATA_DIR: string }).DATA_DIR = tempDir;
|
||||||
|
|
||||||
|
try {
|
||||||
|
const clusterManager = new ClusterManager();
|
||||||
|
clusterManager.configure({
|
||||||
|
enabled: true,
|
||||||
|
nodeName: 'control',
|
||||||
|
role: 'control-plane',
|
||||||
|
bindHost: '0.0.0.0',
|
||||||
|
gossipPort: 7946,
|
||||||
|
heartbeatIntervalMs: 5000,
|
||||||
|
seedNodes: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
clusterManager.updateLocalNode(createNode('control', Date.now()));
|
||||||
|
clusterManager.upsertDesiredDeployment('meta-llama/Llama-3.1-8B-Instruct', 1);
|
||||||
|
await waitForPersistence();
|
||||||
|
|
||||||
|
assertEquals(await Deno.stat(`${tempDir}/cluster-state.json`).catch(() => null), null);
|
||||||
|
assertEquals(await Deno.stat(`${tempDir}/cluster-control-state.json`).catch(() => null), null);
|
||||||
|
|
||||||
|
await clusterManager.initialize();
|
||||||
|
clusterManager.updateLocalNode(createNode('control', Date.now()));
|
||||||
|
clusterManager.setNodeSchedulerState('control', 'active');
|
||||||
|
clusterManager.upsertDesiredDeployment('meta-llama/Llama-3.1-8B-Instruct', 3);
|
||||||
|
await waitForPersistence();
|
||||||
|
|
||||||
|
const stateFile = JSON.parse(await Deno.readTextFile(`${tempDir}/cluster-state.json`));
|
||||||
|
const controlFile = JSON.parse(await Deno.readTextFile(`${tempDir}/cluster-control-state.json`));
|
||||||
|
|
||||||
|
assertEquals(stateFile.nodes.length, 1);
|
||||||
|
assertEquals(stateFile.nodes[0].nodeName, 'control');
|
||||||
|
assertEquals(controlFile.desiredDeployments[0].desiredReplicas, 3);
|
||||||
|
assertEquals(controlFile.nodeSchedulerStates.control, 'active');
|
||||||
|
} finally {
|
||||||
|
(PATHS as { DATA_DIR: string }).DATA_DIR = originalDataDir;
|
||||||
|
await Deno.remove(tempDir, { recursive: true });
|
||||||
|
}
|
||||||
|
});
|
||||||
@@ -0,0 +1,111 @@
|
|||||||
|
import { assertEquals } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { ConfigHandler } from '../ts/cli/config-handler.ts';
|
||||||
|
import { PATHS } from '../ts/constants.ts';
|
||||||
|
import { logger } from '../ts/logger.ts';
|
||||||
|
|
||||||
|
Deno.test('ConfigHandler init writes the current default config shape', async () => {
|
||||||
|
const tempDir = await Deno.makeTempDir();
|
||||||
|
const originalConfigDir = PATHS.CONFIG_DIR;
|
||||||
|
const originalConfigFile = PATHS.CONFIG_FILE;
|
||||||
|
(PATHS as { CONFIG_DIR: string }).CONFIG_DIR = tempDir;
|
||||||
|
(PATHS as { CONFIG_FILE: string }).CONFIG_FILE = `${tempDir}/config.json`;
|
||||||
|
|
||||||
|
try {
|
||||||
|
const handler = new ConfigHandler();
|
||||||
|
await handler.init();
|
||||||
|
|
||||||
|
const config = JSON.parse(await Deno.readTextFile(`${tempDir}/config.json`));
|
||||||
|
assertEquals(config.ui.enabled, true);
|
||||||
|
assertEquals(config.ui.assetSource, 'bundle');
|
||||||
|
assertEquals(config.cluster.role, 'standalone');
|
||||||
|
assertEquals(config.models.registryUrl, 'https://list.modelgrid.com/catalog/models.json');
|
||||||
|
assertEquals(config.models.autoDeploy, true);
|
||||||
|
assertEquals(config.models.defaultEngine, 'vllm');
|
||||||
|
} finally {
|
||||||
|
(PATHS as { CONFIG_DIR: string }).CONFIG_DIR = originalConfigDir;
|
||||||
|
(PATHS as { CONFIG_FILE: string }).CONFIG_FILE = originalConfigFile;
|
||||||
|
await Deno.remove(tempDir, { recursive: true });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ConfigHandler show renders canonical model and ui settings', async () => {
|
||||||
|
const tempDir = await Deno.makeTempDir();
|
||||||
|
const originalConfigFile = PATHS.CONFIG_FILE;
|
||||||
|
const boxes: Array<{ title: string; lines: string[] }> = [];
|
||||||
|
const originalLog = logger.log;
|
||||||
|
const originalLogBox = logger.logBox;
|
||||||
|
|
||||||
|
(PATHS as { CONFIG_FILE: string }).CONFIG_FILE = `${tempDir}/config.json`;
|
||||||
|
|
||||||
|
logger.log = (_message: string) => {};
|
||||||
|
logger.logBox = (
|
||||||
|
title: string,
|
||||||
|
lines: string[],
|
||||||
|
) => {
|
||||||
|
boxes.push({ title, lines });
|
||||||
|
};
|
||||||
|
|
||||||
|
try {
|
||||||
|
await Deno.writeTextFile(
|
||||||
|
`${tempDir}/config.json`,
|
||||||
|
JSON.stringify({
|
||||||
|
version: '1.0.0',
|
||||||
|
api: {
|
||||||
|
port: 8080,
|
||||||
|
host: '0.0.0.0',
|
||||||
|
apiKeys: ['sk-test'],
|
||||||
|
rateLimit: 60,
|
||||||
|
cors: true,
|
||||||
|
corsOrigins: ['*'],
|
||||||
|
},
|
||||||
|
ui: {
|
||||||
|
enabled: true,
|
||||||
|
port: 8081,
|
||||||
|
host: '0.0.0.0',
|
||||||
|
assetSource: 'bundle',
|
||||||
|
},
|
||||||
|
docker: {
|
||||||
|
networkName: 'modelgrid',
|
||||||
|
runtime: 'docker',
|
||||||
|
},
|
||||||
|
gpus: {
|
||||||
|
autoDetect: true,
|
||||||
|
assignments: {},
|
||||||
|
},
|
||||||
|
containers: [],
|
||||||
|
models: {
|
||||||
|
registryUrl: 'https://example.com/catalog.json',
|
||||||
|
autoDeploy: false,
|
||||||
|
defaultEngine: 'vllm',
|
||||||
|
autoLoad: ['meta-llama/Llama-3.1-8B-Instruct'],
|
||||||
|
},
|
||||||
|
cluster: {
|
||||||
|
enabled: false,
|
||||||
|
nodeName: 'modelgrid-local',
|
||||||
|
role: 'standalone',
|
||||||
|
bindHost: '0.0.0.0',
|
||||||
|
gossipPort: 7946,
|
||||||
|
heartbeatIntervalMs: 5000,
|
||||||
|
seedNodes: [],
|
||||||
|
},
|
||||||
|
checkInterval: 30000,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
|
const handler = new ConfigHandler();
|
||||||
|
await handler.show();
|
||||||
|
|
||||||
|
const modelsBox = boxes.find((box) => box.title === 'Models');
|
||||||
|
assertEquals(modelsBox?.lines.some((line) => line.includes('Auto Deploy:')), true);
|
||||||
|
assertEquals(modelsBox?.lines.some((line) => line.includes('Default Engine: vllm')), true);
|
||||||
|
assertEquals(modelsBox?.lines.some((line) => line.includes('https://example.com/catalog.json')), true);
|
||||||
|
|
||||||
|
const apiBox = boxes.find((box) => box.title === 'API Server');
|
||||||
|
assertEquals(apiBox?.lines.some((line) => line.includes('Rate Limit: 60 req/min')), true);
|
||||||
|
} finally {
|
||||||
|
logger.log = originalLog;
|
||||||
|
logger.logBox = originalLogBox;
|
||||||
|
(PATHS as { CONFIG_FILE: string }).CONFIG_FILE = originalConfigFile;
|
||||||
|
await Deno.remove(tempDir, { recursive: true });
|
||||||
|
}
|
||||||
|
});
|
||||||
@@ -0,0 +1,40 @@
|
|||||||
|
import { assertEquals } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { ModelRegistry } from '../ts/models/registry.ts';
|
||||||
|
|
||||||
|
Deno.test('ModelRegistry falls back to the built-in catalog when the source is unavailable', async () => {
|
||||||
|
const registry = new ModelRegistry('http://127.0.0.1:9/catalog.json');
|
||||||
|
const catalog = await registry.fetchCatalog(true);
|
||||||
|
|
||||||
|
assertEquals(catalog.version, '1.0');
|
||||||
|
assertEquals(catalog.models.length > 0, true);
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ModelRegistry reads catalog entries from a local file source', async () => {
|
||||||
|
const filePath = await Deno.makeTempFile({ suffix: '.json' });
|
||||||
|
await Deno.writeTextFile(
|
||||||
|
filePath,
|
||||||
|
JSON.stringify({
|
||||||
|
version: '1.0',
|
||||||
|
generatedAt: '2026-01-01T00:00:00.000Z',
|
||||||
|
models: [
|
||||||
|
{
|
||||||
|
id: 'Qwen/Qwen2.5-7B-Instruct',
|
||||||
|
aliases: ['qwen-local'],
|
||||||
|
engine: 'vllm',
|
||||||
|
source: { repo: 'Qwen/Qwen2.5-7B-Instruct' },
|
||||||
|
capabilities: { chat: true },
|
||||||
|
requirements: { minVramGb: 16 },
|
||||||
|
},
|
||||||
|
],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
|
try {
|
||||||
|
const registry = new ModelRegistry(filePath);
|
||||||
|
const model = await registry.getModel('qwen-local');
|
||||||
|
|
||||||
|
assertEquals(model?.id, 'Qwen/Qwen2.5-7B-Instruct');
|
||||||
|
} finally {
|
||||||
|
await Deno.remove(filePath);
|
||||||
|
}
|
||||||
|
});
|
||||||
@@ -0,0 +1,119 @@
|
|||||||
|
import { assertEquals } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { ConfigManager } from '../ts/config/config-manager.ts';
|
||||||
|
import type { IModelGridConfig } from '../ts/interfaces/config.ts';
|
||||||
|
import { logger } from '../ts/logger.ts';
|
||||||
|
|
||||||
|
Deno.test('ConfigManager normalizes current config defaults', () => {
|
||||||
|
const configManager = new ConfigManager();
|
||||||
|
|
||||||
|
const normalized = configManager.normalizeConfig({
|
||||||
|
version: '1.0.0',
|
||||||
|
api: {
|
||||||
|
port: 9000,
|
||||||
|
host: '127.0.0.1',
|
||||||
|
apiKeys: ['test-key'],
|
||||||
|
},
|
||||||
|
docker: {
|
||||||
|
networkName: 'modelgrid',
|
||||||
|
runtime: 'docker',
|
||||||
|
},
|
||||||
|
gpus: {
|
||||||
|
autoDetect: true,
|
||||||
|
assignments: {},
|
||||||
|
},
|
||||||
|
containers: [],
|
||||||
|
models: {
|
||||||
|
registryUrl: 'https://example.com/catalog.json',
|
||||||
|
autoDeploy: false,
|
||||||
|
defaultEngine: 'vllm',
|
||||||
|
autoLoad: ['Qwen/Qwen2.5-7B-Instruct'],
|
||||||
|
},
|
||||||
|
cluster: {
|
||||||
|
enabled: false,
|
||||||
|
nodeName: 'modelgrid-local',
|
||||||
|
role: 'standalone',
|
||||||
|
bindHost: '0.0.0.0',
|
||||||
|
gossipPort: 7946,
|
||||||
|
heartbeatIntervalMs: 5000,
|
||||||
|
seedNodes: [],
|
||||||
|
},
|
||||||
|
checkInterval: 15000,
|
||||||
|
});
|
||||||
|
|
||||||
|
assertEquals(normalized.models.registryUrl, 'https://example.com/catalog.json');
|
||||||
|
assertEquals(normalized.models.autoDeploy, false);
|
||||||
|
assertEquals(normalized.models.defaultEngine, 'vllm');
|
||||||
|
assertEquals(normalized.ui.enabled, true);
|
||||||
|
assertEquals(normalized.ui.port, 8081);
|
||||||
|
assertEquals(normalized.ui.assetSource, 'bundle');
|
||||||
|
});
|
||||||
|
|
||||||
|
Deno.test('ConfigManager warns when config contains ignored keys', () => {
|
||||||
|
const configManager = new ConfigManager();
|
||||||
|
const warnings: string[] = [];
|
||||||
|
const originalWarn = logger.warn;
|
||||||
|
logger.warn = (message: string) => {
|
||||||
|
warnings.push(message);
|
||||||
|
};
|
||||||
|
|
||||||
|
try {
|
||||||
|
configManager.normalizeConfig({
|
||||||
|
version: '1.0.0',
|
||||||
|
api: {
|
||||||
|
port: 8080,
|
||||||
|
host: '127.0.0.1',
|
||||||
|
apiKeys: [],
|
||||||
|
},
|
||||||
|
docker: {
|
||||||
|
networkName: 'modelgrid',
|
||||||
|
runtime: 'docker',
|
||||||
|
},
|
||||||
|
gpus: {
|
||||||
|
autoDetect: true,
|
||||||
|
assignments: {},
|
||||||
|
},
|
||||||
|
containers: [
|
||||||
|
{ id: 'legacy', type: 'ollama' } as never,
|
||||||
|
],
|
||||||
|
models: {
|
||||||
|
registryUrl: 'https://example.com/catalog.json',
|
||||||
|
autoDeploy: true,
|
||||||
|
defaultEngine: 'vllm',
|
||||||
|
autoLoad: [],
|
||||||
|
greenlistUrl: 'https://legacy.example.com/catalog.json',
|
||||||
|
autoPull: true,
|
||||||
|
defaultContainer: 'legacy-container',
|
||||||
|
} as IModelGridConfig['models'] & {
|
||||||
|
greenlistUrl: string;
|
||||||
|
autoPull: boolean;
|
||||||
|
defaultContainer: string;
|
||||||
|
},
|
||||||
|
cluster: {
|
||||||
|
enabled: false,
|
||||||
|
nodeName: 'modelgrid-local',
|
||||||
|
role: 'standalone',
|
||||||
|
bindHost: '0.0.0.0',
|
||||||
|
gossipPort: 7946,
|
||||||
|
heartbeatIntervalMs: 5000,
|
||||||
|
seedNodes: [],
|
||||||
|
},
|
||||||
|
checkInterval: 30000,
|
||||||
|
legacySection: true,
|
||||||
|
} as Partial<IModelGridConfig> & {
|
||||||
|
legacySection: boolean;
|
||||||
|
models: IModelGridConfig['models'] & {
|
||||||
|
greenlistUrl: string;
|
||||||
|
autoPull: boolean;
|
||||||
|
defaultContainer: string;
|
||||||
|
};
|
||||||
|
});
|
||||||
|
} finally {
|
||||||
|
logger.warn = originalWarn;
|
||||||
|
}
|
||||||
|
|
||||||
|
assertEquals(warnings.includes('Ignoring unknown config key: legacySection'), true);
|
||||||
|
assertEquals(warnings.includes('Ignoring removed config key: models.greenlistUrl'), true);
|
||||||
|
assertEquals(warnings.includes('Ignoring removed config key: models.autoPull'), true);
|
||||||
|
assertEquals(warnings.includes('Ignoring removed config key: models.defaultContainer'), true);
|
||||||
|
assertEquals(warnings.includes('Ignoring unsupported container type: ollama'), true);
|
||||||
|
});
|
||||||
@@ -0,0 +1,67 @@
|
|||||||
|
// Smoke test for the UI server: bundle mode serves /index.html,
|
||||||
|
// disk mode serves /app.js, /_ui/overview returns structured JSON.
|
||||||
|
// Run with: deno run --allow-all test/ui-server.smoke.ts
|
||||||
|
|
||||||
|
import { UiServer } from '../ts/ui/server.ts';
|
||||||
|
import { ContainerManager } from '../ts/containers/container-manager.ts';
|
||||||
|
import { ClusterManager } from '../ts/cluster/cluster-manager.ts';
|
||||||
|
|
||||||
|
async function probe(source: 'bundle' | 'disk', port: number): Promise<void> {
|
||||||
|
const cm = new ContainerManager();
|
||||||
|
const cluster = new ClusterManager();
|
||||||
|
cluster.configure({
|
||||||
|
enabled: false,
|
||||||
|
nodeName: 'test-node',
|
||||||
|
role: 'standalone',
|
||||||
|
bindHost: '127.0.0.1',
|
||||||
|
gossipPort: 7946,
|
||||||
|
heartbeatIntervalMs: 5000,
|
||||||
|
seedNodes: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
const server = new UiServer(
|
||||||
|
{ enabled: true, port, host: '127.0.0.1', assetSource: source },
|
||||||
|
cm,
|
||||||
|
cluster,
|
||||||
|
);
|
||||||
|
await server.start();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const index = await fetch(`http://127.0.0.1:${port}/`);
|
||||||
|
const indexBody = await index.text();
|
||||||
|
if (!index.ok || !indexBody.includes('ModelGrid')) {
|
||||||
|
throw new Error(`[${source}] index.html missing expected content (status=${index.status})`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const app = await fetch(`http://127.0.0.1:${port}/app.js`);
|
||||||
|
const appBody = await app.text();
|
||||||
|
if (!app.ok || !appBody.includes('ModelGrid UI')) {
|
||||||
|
throw new Error(`[${source}] app.js missing expected content (status=${app.status})`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const spa = await fetch(`http://127.0.0.1:${port}/cluster/nodes`);
|
||||||
|
const spaBody = await spa.text();
|
||||||
|
if (!spa.ok || !spaBody.includes('ModelGrid')) {
|
||||||
|
throw new Error(`[${source}] SPA fallback did not return index.html (status=${spa.status})`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const overview = await fetch(`http://127.0.0.1:${port}/_ui/overview`);
|
||||||
|
const data = await overview.json();
|
||||||
|
if (!overview.ok || data.node?.name !== 'test-node' || !data.health?.status) {
|
||||||
|
throw new Error(`[${source}] /_ui/overview unexpected: ${JSON.stringify(data)}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const missing = await fetch(`http://127.0.0.1:${port}/nope.png`);
|
||||||
|
if (missing.status !== 404) {
|
||||||
|
throw new Error(`[${source}] expected 404 for missing asset, got ${missing.status}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`ok: ${source} mode — index, app.js, SPA fallback, /_ui/overview, 404`);
|
||||||
|
} finally {
|
||||||
|
await server.stop();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
await probe('bundle', 18081);
|
||||||
|
await probe('disk', 18082);
|
||||||
|
console.log('UI server smoke test passed');
|
||||||
@@ -0,0 +1,55 @@
|
|||||||
|
import { assertEquals } from 'jsr:@std/assert@^1.0.0';
|
||||||
|
import { ClusterManager } from '../ts/cluster/cluster-manager.ts';
|
||||||
|
import { UiServer } from '../ts/ui/server.ts';
|
||||||
|
|
||||||
|
Deno.test('UiServer overview mirrors degraded API health semantics', async () => {
|
||||||
|
const port = 20300 + Math.floor(Math.random() * 1000);
|
||||||
|
const cluster = new ClusterManager();
|
||||||
|
cluster.configure({
|
||||||
|
enabled: false,
|
||||||
|
nodeName: 'ui-test-node',
|
||||||
|
role: 'standalone',
|
||||||
|
bindHost: '127.0.0.1',
|
||||||
|
gossipPort: 7946,
|
||||||
|
heartbeatIntervalMs: 5000,
|
||||||
|
seedNodes: [],
|
||||||
|
});
|
||||||
|
|
||||||
|
const server = new UiServer(
|
||||||
|
{ enabled: true, port, host: '127.0.0.1', assetSource: 'disk' },
|
||||||
|
{
|
||||||
|
async getAllStatus() {
|
||||||
|
return new Map([
|
||||||
|
['vllm-1', { running: false, health: 'unhealthy' }],
|
||||||
|
]);
|
||||||
|
},
|
||||||
|
async getAllAvailableModels() {
|
||||||
|
return new Map();
|
||||||
|
},
|
||||||
|
} as never,
|
||||||
|
cluster,
|
||||||
|
);
|
||||||
|
|
||||||
|
(server as unknown as {
|
||||||
|
gpuDetector: { detectGpus: () => Promise<unknown[]> };
|
||||||
|
}).gpuDetector = {
|
||||||
|
async detectGpus() {
|
||||||
|
return [{ id: 'nvidia-0' }];
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
await server.start();
|
||||||
|
|
||||||
|
try {
|
||||||
|
const response = await fetch(`http://127.0.0.1:${port}/_ui/overview`);
|
||||||
|
const body = await response.json();
|
||||||
|
|
||||||
|
assertEquals(response.status, 200);
|
||||||
|
assertEquals(body.health.status, 'degraded');
|
||||||
|
assertEquals(body.health.reasons.includes('unhealthy_container'), true);
|
||||||
|
assertEquals(body.health.reasons.includes('no_models_available'), true);
|
||||||
|
assertEquals(body.node.name, 'ui-test-node');
|
||||||
|
} finally {
|
||||||
|
await server.stop();
|
||||||
|
}
|
||||||
|
});
|
||||||
+16
-1
@@ -6,6 +6,8 @@ import * as http from 'node:http';
|
|||||||
import type { IApiError, IChatCompletionRequest } from '../../interfaces/api.ts';
|
import type { IApiError, IChatCompletionRequest } from '../../interfaces/api.ts';
|
||||||
import { ClusterCoordinator } from '../../cluster/coordinator.ts';
|
import { ClusterCoordinator } from '../../cluster/coordinator.ts';
|
||||||
import { ContainerManager } from '../../containers/container-manager.ts';
|
import { ContainerManager } from '../../containers/container-manager.ts';
|
||||||
|
import { UpstreamTimeoutError } from '../../containers/base-container.ts';
|
||||||
|
import { API_SERVER } from '../../constants.ts';
|
||||||
import { logger } from '../../logger.ts';
|
import { logger } from '../../logger.ts';
|
||||||
import { ModelRegistry } from '../../models/registry.ts';
|
import { ModelRegistry } from '../../models/registry.ts';
|
||||||
import { ModelLoader } from '../../models/loader.ts';
|
import { ModelLoader } from '../../models/loader.ts';
|
||||||
@@ -85,6 +87,11 @@ export class ChatHandler {
|
|||||||
|
|
||||||
await this.proxyChatRequest(req, res, ensured.location.endpoint, requestBody);
|
await this.proxyChatRequest(req, res, ensured.location.endpoint, requestBody);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
|
if (error instanceof UpstreamTimeoutError) {
|
||||||
|
this.sendError(res, 504, error.message, 'upstream_timeout');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
const message = error instanceof Error ? error.message : String(error);
|
const message = error instanceof Error ? error.message : String(error);
|
||||||
logger.error(`Chat completion error: ${message}`);
|
logger.error(`Chat completion error: ${message}`);
|
||||||
this.sendError(res, 500, `Chat completion failed: ${message}`, 'server_error');
|
this.sendError(res, 500, `Chat completion failed: ${message}`, 'server_error');
|
||||||
@@ -158,11 +165,19 @@ export class ChatHandler {
|
|||||||
targetEndpoint: string,
|
targetEndpoint: string,
|
||||||
body: IChatCompletionRequest,
|
body: IChatCompletionRequest,
|
||||||
): Promise<void> {
|
): Promise<void> {
|
||||||
|
const controller = new AbortController();
|
||||||
|
const timeout = setTimeout(() => controller.abort(), API_SERVER.REQUEST_TIMEOUT_MS);
|
||||||
const response = await fetch(`${targetEndpoint}/v1/chat/completions`, {
|
const response = await fetch(`${targetEndpoint}/v1/chat/completions`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: this.buildForwardHeaders(req),
|
headers: this.buildForwardHeaders(req),
|
||||||
body: JSON.stringify(body),
|
body: JSON.stringify(body),
|
||||||
});
|
signal: controller.signal,
|
||||||
|
}).catch((error) => {
|
||||||
|
if (error instanceof Error && error.name === 'AbortError') {
|
||||||
|
throw new UpstreamTimeoutError();
|
||||||
|
}
|
||||||
|
throw error;
|
||||||
|
}).finally(() => clearTimeout(timeout));
|
||||||
|
|
||||||
if (body.stream) {
|
if (body.stream) {
|
||||||
res.writeHead(response.status, {
|
res.writeHead(response.status, {
|
||||||
|
|||||||
@@ -11,6 +11,8 @@ import type {
|
|||||||
} from '../../interfaces/api.ts';
|
} from '../../interfaces/api.ts';
|
||||||
import { ClusterCoordinator } from '../../cluster/coordinator.ts';
|
import { ClusterCoordinator } from '../../cluster/coordinator.ts';
|
||||||
import { ContainerManager } from '../../containers/container-manager.ts';
|
import { ContainerManager } from '../../containers/container-manager.ts';
|
||||||
|
import { UpstreamTimeoutError } from '../../containers/base-container.ts';
|
||||||
|
import { API_SERVER } from '../../constants.ts';
|
||||||
import { logger } from '../../logger.ts';
|
import { logger } from '../../logger.ts';
|
||||||
import { ModelRegistry } from '../../models/registry.ts';
|
import { ModelRegistry } from '../../models/registry.ts';
|
||||||
|
|
||||||
@@ -80,7 +82,7 @@ export class EmbeddingsHandler {
|
|||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
const response = await fetch(`${ensured.location.endpoint}/v1/embeddings`, {
|
const response = await this.fetchWithTimeout(`${ensured.location.endpoint}/v1/embeddings`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: this.buildForwardHeaders(req),
|
headers: this.buildForwardHeaders(req),
|
||||||
body: JSON.stringify(requestBody),
|
body: JSON.stringify(requestBody),
|
||||||
@@ -92,6 +94,11 @@ export class EmbeddingsHandler {
|
|||||||
});
|
});
|
||||||
res.end(text);
|
res.end(text);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
|
if (error instanceof UpstreamTimeoutError) {
|
||||||
|
this.sendError(res, 504, error.message, 'upstream_timeout');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
const message = error instanceof Error ? error.message : String(error);
|
const message = error instanceof Error ? error.message : String(error);
|
||||||
logger.error(`Embeddings error: ${message}`);
|
logger.error(`Embeddings error: ${message}`);
|
||||||
this.sendError(res, 500, `Embeddings generation failed: ${message}`, 'server_error');
|
this.sendError(res, 500, `Embeddings generation failed: ${message}`, 'server_error');
|
||||||
@@ -159,7 +166,7 @@ export class EmbeddingsHandler {
|
|||||||
model: string,
|
model: string,
|
||||||
input: string,
|
input: string,
|
||||||
): Promise<{ vector: number[]; tokenCount: number }> {
|
): Promise<{ vector: number[]; tokenCount: number }> {
|
||||||
const response = await fetch(`${endpoint}/v1/embeddings`, {
|
const response = await this.fetchWithTimeout(`${endpoint}/v1/embeddings`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify({ model, input }),
|
body: JSON.stringify({ model, input }),
|
||||||
@@ -181,7 +188,7 @@ export class EmbeddingsHandler {
|
|||||||
_model: string,
|
_model: string,
|
||||||
input: string,
|
input: string,
|
||||||
): Promise<{ vector: number[]; tokenCount: number }> {
|
): Promise<{ vector: number[]; tokenCount: number }> {
|
||||||
const response = await fetch(`${endpoint}/embed`, {
|
const response = await this.fetchWithTimeout(`${endpoint}/embed`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify({ inputs: input }),
|
body: JSON.stringify({ inputs: input }),
|
||||||
@@ -214,6 +221,25 @@ export class EmbeddingsHandler {
|
|||||||
return headers;
|
return headers;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private async fetchWithTimeout(url: string, init: RequestInit): Promise<Response> {
|
||||||
|
const controller = new AbortController();
|
||||||
|
const timeout = setTimeout(() => controller.abort(), API_SERVER.REQUEST_TIMEOUT_MS);
|
||||||
|
|
||||||
|
try {
|
||||||
|
return await fetch(url, {
|
||||||
|
...init,
|
||||||
|
signal: controller.signal,
|
||||||
|
});
|
||||||
|
} catch (error) {
|
||||||
|
if (error instanceof Error && error.name === 'AbortError') {
|
||||||
|
throw new UpstreamTimeoutError();
|
||||||
|
}
|
||||||
|
throw error;
|
||||||
|
} finally {
|
||||||
|
clearTimeout(timeout);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
private sendError(
|
private sendError(
|
||||||
res: http.ServerResponse,
|
res: http.ServerResponse,
|
||||||
statusCode: number,
|
statusCode: number,
|
||||||
|
|||||||
+64
-17
@@ -17,6 +17,19 @@ import { EmbeddingsHandler } from './handlers/embeddings.ts';
|
|||||||
import { AuthMiddleware } from './middleware/auth.ts';
|
import { AuthMiddleware } from './middleware/auth.ts';
|
||||||
import { SanityMiddleware } from './middleware/sanity.ts';
|
import { SanityMiddleware } from './middleware/sanity.ts';
|
||||||
|
|
||||||
|
interface IParsedRequestBody {
|
||||||
|
kind: 'ok' | 'invalid' | 'too_large';
|
||||||
|
body?: unknown;
|
||||||
|
}
|
||||||
|
|
||||||
|
interface IApiRouterOptions {
|
||||||
|
chatHandler?: ChatHandler;
|
||||||
|
modelsHandler?: ModelsHandler;
|
||||||
|
embeddingsHandler?: EmbeddingsHandler;
|
||||||
|
authMiddleware?: AuthMiddleware;
|
||||||
|
sanityMiddleware?: SanityMiddleware;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* API Router - routes requests to handlers
|
* API Router - routes requests to handlers
|
||||||
*/
|
*/
|
||||||
@@ -37,6 +50,7 @@ export class ApiRouter {
|
|||||||
modelLoader: ModelLoader,
|
modelLoader: ModelLoader,
|
||||||
clusterCoordinator: ClusterCoordinator,
|
clusterCoordinator: ClusterCoordinator,
|
||||||
apiKeys: string[],
|
apiKeys: string[],
|
||||||
|
options: IApiRouterOptions = {},
|
||||||
) {
|
) {
|
||||||
this.containerManager = containerManager;
|
this.containerManager = containerManager;
|
||||||
this.modelRegistry = modelRegistry;
|
this.modelRegistry = modelRegistry;
|
||||||
@@ -44,22 +58,23 @@ export class ApiRouter {
|
|||||||
this.clusterCoordinator = clusterCoordinator;
|
this.clusterCoordinator = clusterCoordinator;
|
||||||
|
|
||||||
// Initialize handlers
|
// Initialize handlers
|
||||||
this.chatHandler = new ChatHandler(
|
this.chatHandler = options.chatHandler || new ChatHandler(
|
||||||
containerManager,
|
containerManager,
|
||||||
modelRegistry,
|
modelRegistry,
|
||||||
modelLoader,
|
modelLoader,
|
||||||
clusterCoordinator,
|
clusterCoordinator,
|
||||||
);
|
);
|
||||||
this.modelsHandler = new ModelsHandler(containerManager, modelRegistry, clusterCoordinator);
|
this.modelsHandler =
|
||||||
this.embeddingsHandler = new EmbeddingsHandler(
|
options.modelsHandler || new ModelsHandler(containerManager, modelRegistry, clusterCoordinator);
|
||||||
|
this.embeddingsHandler = options.embeddingsHandler || new EmbeddingsHandler(
|
||||||
containerManager,
|
containerManager,
|
||||||
modelRegistry,
|
modelRegistry,
|
||||||
clusterCoordinator,
|
clusterCoordinator,
|
||||||
);
|
);
|
||||||
|
|
||||||
// Initialize middleware
|
// Initialize middleware
|
||||||
this.authMiddleware = new AuthMiddleware(apiKeys);
|
this.authMiddleware = options.authMiddleware || new AuthMiddleware(apiKeys);
|
||||||
this.sanityMiddleware = new SanityMiddleware(modelRegistry);
|
this.sanityMiddleware = options.sanityMiddleware || new SanityMiddleware(modelRegistry);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -119,11 +134,16 @@ export class ApiRouter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Parse body
|
// Parse body
|
||||||
const body = await this.parseRequestBody(req);
|
const parsedBody = await this.parseRequestBody(req);
|
||||||
if (!body) {
|
if (parsedBody.kind === 'too_large') {
|
||||||
|
this.sendError(res, 413, 'Request body too large', 'invalid_request_error');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (parsedBody.kind !== 'ok') {
|
||||||
this.sendError(res, 400, 'Invalid JSON body', 'invalid_request_error');
|
this.sendError(res, 400, 'Invalid JSON body', 'invalid_request_error');
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
const body = parsedBody.body;
|
||||||
|
|
||||||
// Validate request
|
// Validate request
|
||||||
const validation = this.sanityMiddleware.validateChatRequest(body);
|
const validation = this.sanityMiddleware.validateChatRequest(body);
|
||||||
@@ -155,11 +175,16 @@ export class ApiRouter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Parse body
|
// Parse body
|
||||||
const body = await this.parseRequestBody(req);
|
const parsedBody = await this.parseRequestBody(req);
|
||||||
if (!body) {
|
if (parsedBody.kind === 'too_large') {
|
||||||
|
this.sendError(res, 413, 'Request body too large', 'invalid_request_error');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (parsedBody.kind !== 'ok') {
|
||||||
this.sendError(res, 400, 'Invalid JSON body', 'invalid_request_error');
|
this.sendError(res, 400, 'Invalid JSON body', 'invalid_request_error');
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
const body = parsedBody.body;
|
||||||
|
|
||||||
// Convert to chat format and handle
|
// Convert to chat format and handle
|
||||||
const chatBody = this.convertCompletionToChat(body as Record<string, unknown>);
|
const chatBody = this.convertCompletionToChat(body as Record<string, unknown>);
|
||||||
@@ -229,11 +254,16 @@ export class ApiRouter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Parse body
|
// Parse body
|
||||||
const body = await this.parseRequestBody(req);
|
const parsedBody = await this.parseRequestBody(req);
|
||||||
if (!body) {
|
if (parsedBody.kind === 'too_large') {
|
||||||
|
this.sendError(res, 413, 'Request body too large', 'invalid_request_error');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (parsedBody.kind !== 'ok') {
|
||||||
this.sendError(res, 400, 'Invalid JSON body', 'invalid_request_error');
|
this.sendError(res, 400, 'Invalid JSON body', 'invalid_request_error');
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
const body = parsedBody.body;
|
||||||
|
|
||||||
const validation = this.sanityMiddleware.validateEmbeddingsRequest(body);
|
const validation = this.sanityMiddleware.validateEmbeddingsRequest(body);
|
||||||
if (!validation.valid) {
|
if (!validation.valid) {
|
||||||
@@ -250,28 +280,45 @@ export class ApiRouter {
|
|||||||
/**
|
/**
|
||||||
* Parse request body
|
* Parse request body
|
||||||
*/
|
*/
|
||||||
private async parseRequestBody(req: http.IncomingMessage): Promise<unknown | null> {
|
private async parseRequestBody(req: http.IncomingMessage): Promise<IParsedRequestBody> {
|
||||||
return new Promise((resolve) => {
|
return new Promise((resolve) => {
|
||||||
let body = '';
|
let body = '';
|
||||||
|
let resolved = false;
|
||||||
|
|
||||||
|
const finish = (result: IParsedRequestBody): void => {
|
||||||
|
if (resolved) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
resolved = true;
|
||||||
|
resolve(result);
|
||||||
|
};
|
||||||
|
|
||||||
req.on('data', (chunk) => {
|
req.on('data', (chunk) => {
|
||||||
|
if (resolved) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
body += chunk.toString();
|
body += chunk.toString();
|
||||||
// Limit body size
|
|
||||||
if (body.length > 10 * 1024 * 1024) {
|
if (body.length > 10 * 1024 * 1024) {
|
||||||
resolve(null);
|
req.pause();
|
||||||
|
req.destroy();
|
||||||
|
finish({ kind: 'too_large' });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
req.on('end', () => {
|
req.on('end', () => {
|
||||||
try {
|
try {
|
||||||
resolve(JSON.parse(body));
|
finish({ kind: 'ok', body: JSON.parse(body) });
|
||||||
} catch {
|
} catch {
|
||||||
resolve(null);
|
finish({ kind: 'invalid' });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
req.on('error', () => {
|
req.on('error', () => {
|
||||||
resolve(null);
|
if (!resolved) {
|
||||||
|
finish({ kind: 'invalid' });
|
||||||
|
}
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|||||||
+126
-35
@@ -16,6 +16,13 @@ import { ModelRegistry } from '../models/registry.ts';
|
|||||||
import { ModelLoader } from '../models/loader.ts';
|
import { ModelLoader } from '../models/loader.ts';
|
||||||
import { GpuDetector } from '../hardware/gpu-detector.ts';
|
import { GpuDetector } from '../hardware/gpu-detector.ts';
|
||||||
import { ClusterHandler } from './handlers/cluster.ts';
|
import { ClusterHandler } from './handlers/cluster.ts';
|
||||||
|
import { buildHealthSnapshot } from '../helpers/health.ts';
|
||||||
|
|
||||||
|
interface IApiServerOptions {
|
||||||
|
gpuDetector?: GpuDetector;
|
||||||
|
router?: ApiRouter;
|
||||||
|
clusterHandler?: ClusterHandler;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* API Server for ModelGrid
|
* API Server for ModelGrid
|
||||||
@@ -31,6 +38,10 @@ export class ApiServer {
|
|||||||
private clusterCoordinator: ClusterCoordinator;
|
private clusterCoordinator: ClusterCoordinator;
|
||||||
private clusterHandler: ClusterHandler;
|
private clusterHandler: ClusterHandler;
|
||||||
private startTime: number = 0;
|
private startTime: number = 0;
|
||||||
|
private requestCounts = new Map<string, number>();
|
||||||
|
private authFailureCounts = new Map<string, number>();
|
||||||
|
private serverErrorCounts = new Map<string, number>();
|
||||||
|
private rateLimitBuckets = new Map<string, { count: number; windowStart: number }>();
|
||||||
|
|
||||||
constructor(
|
constructor(
|
||||||
config: IApiConfig,
|
config: IApiConfig,
|
||||||
@@ -38,15 +49,16 @@ export class ApiServer {
|
|||||||
modelRegistry: ModelRegistry,
|
modelRegistry: ModelRegistry,
|
||||||
modelLoader: ModelLoader,
|
modelLoader: ModelLoader,
|
||||||
clusterCoordinator: ClusterCoordinator,
|
clusterCoordinator: ClusterCoordinator,
|
||||||
|
options: IApiServerOptions = {},
|
||||||
) {
|
) {
|
||||||
this.config = config;
|
this.config = config;
|
||||||
this.containerManager = containerManager;
|
this.containerManager = containerManager;
|
||||||
this.modelRegistry = modelRegistry;
|
this.modelRegistry = modelRegistry;
|
||||||
this.gpuDetector = new GpuDetector();
|
this.gpuDetector = options.gpuDetector || new GpuDetector();
|
||||||
this.modelLoader = modelLoader;
|
this.modelLoader = modelLoader;
|
||||||
this.clusterCoordinator = clusterCoordinator;
|
this.clusterCoordinator = clusterCoordinator;
|
||||||
this.clusterHandler = new ClusterHandler(clusterCoordinator);
|
this.clusterHandler = options.clusterHandler || new ClusterHandler(clusterCoordinator);
|
||||||
this.router = new ApiRouter(
|
this.router = options.router || new ApiRouter(
|
||||||
containerManager,
|
containerManager,
|
||||||
modelRegistry,
|
modelRegistry,
|
||||||
this.modelLoader,
|
this.modelLoader,
|
||||||
@@ -112,6 +124,7 @@ export class ApiServer {
|
|||||||
res: http.ServerResponse,
|
res: http.ServerResponse,
|
||||||
): Promise<void> {
|
): Promise<void> {
|
||||||
const startTime = Date.now();
|
const startTime = Date.now();
|
||||||
|
const requestId = this.ensureRequestId(req, res);
|
||||||
|
|
||||||
// Set CORS headers if enabled
|
// Set CORS headers if enabled
|
||||||
if (this.config.cors) {
|
if (this.config.cors) {
|
||||||
@@ -131,18 +144,27 @@ export class ApiServer {
|
|||||||
|
|
||||||
if (path.startsWith('/_cluster')) {
|
if (path.startsWith('/_cluster')) {
|
||||||
await this.clusterHandler.handle(req, res, path, url);
|
await this.clusterHandler.handle(req, res, path, url);
|
||||||
|
this.recordRequest(path, res.statusCode);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Health check endpoint (no auth required)
|
// Health check endpoint (no auth required)
|
||||||
if (path === '/health' || path === '/healthz') {
|
if (path === '/health' || path === '/healthz') {
|
||||||
await this.handleHealthCheck(res);
|
await this.handleHealthCheck(res);
|
||||||
|
this.recordRequest(path, res.statusCode);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Metrics endpoint (no auth required)
|
// Metrics endpoint (no auth required)
|
||||||
if (path === '/metrics') {
|
if (path === '/metrics') {
|
||||||
await this.handleMetrics(res);
|
await this.handleMetrics(res);
|
||||||
|
this.recordRequest(path, res.statusCode);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!this.isRequestWithinRateLimit(req)) {
|
||||||
|
this.sendError(res, 429, 'Rate limit exceeded', 'rate_limit_exceeded');
|
||||||
|
this.recordRequest(path, res.statusCode);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -156,7 +178,8 @@ export class ApiServer {
|
|||||||
|
|
||||||
// Log request
|
// Log request
|
||||||
const duration = Date.now() - startTime;
|
const duration = Date.now() - startTime;
|
||||||
logger.dim(`${req.method} ${path} - ${res.statusCode} (${duration}ms)`);
|
this.recordRequest(path, res.statusCode);
|
||||||
|
logger.dim(`[${requestId}] ${req.method} ${path} - ${res.statusCode} (${duration}ms)`);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -187,44 +210,21 @@ export class ApiServer {
|
|||||||
const gpus = await this.gpuDetector.detectGpus();
|
const gpus = await this.gpuDetector.detectGpus();
|
||||||
const models = await this.containerManager.getAllAvailableModels();
|
const models = await this.containerManager.getAllAvailableModels();
|
||||||
|
|
||||||
let status: 'ok' | 'degraded' | 'error' = 'ok';
|
const response: IHealthResponse = buildHealthSnapshot({
|
||||||
const containerHealth: Record<string, 'healthy' | 'unhealthy'> = {};
|
statuses,
|
||||||
const gpuStatus: Record<string, 'available' | 'in_use' | 'error'> = {};
|
modelCount: models.size,
|
||||||
|
gpus,
|
||||||
// Check container health
|
startTime: this.startTime,
|
||||||
for (const [id, containerStatus] of statuses) {
|
|
||||||
if (containerStatus.running && containerStatus.health === 'healthy') {
|
|
||||||
containerHealth[id] = 'healthy';
|
|
||||||
} else {
|
|
||||||
containerHealth[id] = 'unhealthy';
|
|
||||||
status = 'degraded';
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Check GPU status
|
|
||||||
for (const gpu of gpus) {
|
|
||||||
gpuStatus[gpu.id] = 'available';
|
|
||||||
}
|
|
||||||
|
|
||||||
const response: IHealthResponse = {
|
|
||||||
status,
|
|
||||||
version: VERSION,
|
version: VERSION,
|
||||||
uptime: Math.floor((Date.now() - this.startTime) / 1000),
|
});
|
||||||
containers: statuses.size,
|
|
||||||
models: models.size,
|
|
||||||
gpus: gpus.length,
|
|
||||||
details: {
|
|
||||||
containers: containerHealth,
|
|
||||||
gpus: gpuStatus,
|
|
||||||
},
|
|
||||||
};
|
|
||||||
|
|
||||||
res.writeHead(status === 'ok' ? 200 : 503, { 'Content-Type': 'application/json' });
|
res.writeHead(response.status === 'ok' ? 200 : 503, { 'Content-Type': 'application/json' });
|
||||||
res.end(JSON.stringify(response, null, 2));
|
res.end(JSON.stringify(response, null, 2));
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
res.writeHead(500, { 'Content-Type': 'application/json' });
|
res.writeHead(500, { 'Content-Type': 'application/json' });
|
||||||
res.end(JSON.stringify({
|
res.end(JSON.stringify({
|
||||||
status: 'error',
|
status: 'error',
|
||||||
|
reasons: ['gpu_detection_failed'],
|
||||||
error: error instanceof Error ? error.message : String(error),
|
error: error instanceof Error ? error.message : String(error),
|
||||||
}));
|
}));
|
||||||
}
|
}
|
||||||
@@ -268,6 +268,28 @@ export class ApiServer {
|
|||||||
metrics.push(`# TYPE modelgrid_gpus_total gauge`);
|
metrics.push(`# TYPE modelgrid_gpus_total gauge`);
|
||||||
metrics.push(`modelgrid_gpus_total ${gpus.length}`);
|
metrics.push(`modelgrid_gpus_total ${gpus.length}`);
|
||||||
|
|
||||||
|
for (const [path, count] of this.requestCounts.entries()) {
|
||||||
|
metrics.push(`# HELP modelgrid_api_requests_total Total API requests by path`);
|
||||||
|
metrics.push(`# TYPE modelgrid_api_requests_total counter`);
|
||||||
|
metrics.push(`modelgrid_api_requests_total{path="${this.escapeMetricLabel(path)}"} ${count}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const [path, count] of this.authFailureCounts.entries()) {
|
||||||
|
metrics.push(`# HELP modelgrid_api_auth_failures_total Total authentication failures by path`);
|
||||||
|
metrics.push(`# TYPE modelgrid_api_auth_failures_total counter`);
|
||||||
|
metrics.push(
|
||||||
|
`modelgrid_api_auth_failures_total{path="${this.escapeMetricLabel(path)}"} ${count}`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const [path, count] of this.serverErrorCounts.entries()) {
|
||||||
|
metrics.push(`# HELP modelgrid_api_server_errors_total Total 5xx responses by path`);
|
||||||
|
metrics.push(`# TYPE modelgrid_api_server_errors_total counter`);
|
||||||
|
metrics.push(
|
||||||
|
`modelgrid_api_server_errors_total{path="${this.escapeMetricLabel(path)}"} ${count}`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
|
res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
|
||||||
res.end(metrics.join('\n') + '\n');
|
res.end(metrics.join('\n') + '\n');
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
@@ -310,4 +332,73 @@ export class ApiServer {
|
|||||||
uptime: this.startTime ? Math.floor((Date.now() - this.startTime) / 1000) : 0,
|
uptime: this.startTime ? Math.floor((Date.now() - this.startTime) / 1000) : 0,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private recordRequest(path: string, statusCode: number): void {
|
||||||
|
this.incrementMetric(this.requestCounts, path);
|
||||||
|
|
||||||
|
if (statusCode === 401) {
|
||||||
|
this.incrementMetric(this.authFailureCounts, path);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (statusCode >= 500) {
|
||||||
|
this.incrementMetric(this.serverErrorCounts, path);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private isRequestWithinRateLimit(req: http.IncomingMessage): boolean {
|
||||||
|
const configuredLimit = this.config.rateLimit;
|
||||||
|
if (!configuredLimit || configuredLimit <= 0) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
const key = this.getRateLimitKey(req);
|
||||||
|
const now = Date.now();
|
||||||
|
const windowMs = 60 * 1000;
|
||||||
|
const bucket = this.rateLimitBuckets.get(key);
|
||||||
|
|
||||||
|
if (!bucket || now - bucket.windowStart >= windowMs) {
|
||||||
|
this.rateLimitBuckets.set(key, { count: 1, windowStart: now });
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (bucket.count >= configuredLimit) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
bucket.count += 1;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
private getRateLimitKey(req: http.IncomingMessage): string {
|
||||||
|
if (typeof req.headers.authorization === 'string') {
|
||||||
|
const match = req.headers.authorization.match(/^Bearer\s+(.+)$/i);
|
||||||
|
if (match) {
|
||||||
|
return `api_key:${match[1]}`;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return `ip:${req.socket.remoteAddress || 'unknown'}`;
|
||||||
|
}
|
||||||
|
|
||||||
|
private incrementMetric(metric: Map<string, number>, path: string): void {
|
||||||
|
metric.set(path, (metric.get(path) || 0) + 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
private ensureRequestId(req: http.IncomingMessage, res: http.ServerResponse): string {
|
||||||
|
const existing = typeof req.headers['x-request-id'] === 'string'
|
||||||
|
? req.headers['x-request-id']
|
||||||
|
: undefined;
|
||||||
|
const requestId = existing || this.generateRequestId();
|
||||||
|
req.headers['x-request-id'] = requestId;
|
||||||
|
res.setHeader('X-Request-Id', requestId);
|
||||||
|
return requestId;
|
||||||
|
}
|
||||||
|
|
||||||
|
private generateRequestId(): string {
|
||||||
|
return `req-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 8)}`;
|
||||||
|
}
|
||||||
|
|
||||||
|
private escapeMetricLabel(value: string): string {
|
||||||
|
return value.replaceAll('\\', '\\\\').replaceAll('"', '\\"');
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -26,11 +26,8 @@ export class ConfigHandler {
|
|||||||
const configContent = await fs.readFile(configPath, 'utf-8');
|
const configContent = await fs.readFile(configPath, 'utf-8');
|
||||||
const config = JSON.parse(configContent) as IModelGridConfig;
|
const config = JSON.parse(configContent) as IModelGridConfig;
|
||||||
const modelConfig = {
|
const modelConfig = {
|
||||||
registryUrl: config.models.registryUrl ||
|
registryUrl: config.models.registryUrl || 'https://list.modelgrid.com/catalog/models.json',
|
||||||
(config.models as { greenlistUrl?: string }).greenlistUrl ||
|
autoDeploy: config.models.autoDeploy ?? true,
|
||||||
'https://list.modelgrid.com/catalog/models.json',
|
|
||||||
autoDeploy: config.models.autoDeploy ??
|
|
||||||
(config.models as { autoPull?: boolean }).autoPull ?? true,
|
|
||||||
defaultEngine: config.models.defaultEngine || 'vllm',
|
defaultEngine: config.models.defaultEngine || 'vllm',
|
||||||
autoLoad: config.models.autoLoad || [],
|
autoLoad: config.models.autoLoad || [],
|
||||||
};
|
};
|
||||||
@@ -218,6 +215,12 @@ export class ConfigHandler {
|
|||||||
cors: true,
|
cors: true,
|
||||||
corsOrigins: ['*'],
|
corsOrigins: ['*'],
|
||||||
},
|
},
|
||||||
|
ui: {
|
||||||
|
enabled: true,
|
||||||
|
port: 8081,
|
||||||
|
host: '0.0.0.0',
|
||||||
|
assetSource: 'bundle',
|
||||||
|
},
|
||||||
docker: {
|
docker: {
|
||||||
networkName: 'modelgrid',
|
networkName: 'modelgrid',
|
||||||
runtime: 'docker',
|
runtime: 'docker',
|
||||||
|
|||||||
@@ -1,4 +1,3 @@
|
|||||||
import os from 'node:os';
|
|
||||||
import * as fs from 'node:fs/promises';
|
import * as fs from 'node:fs/promises';
|
||||||
import type { IModelCatalogEntry } from '../interfaces/catalog.ts';
|
import type { IModelCatalogEntry } from '../interfaces/catalog.ts';
|
||||||
import type {
|
import type {
|
||||||
@@ -14,9 +13,10 @@ import type {
|
|||||||
import { CLUSTER, PATHS } from '../constants.ts';
|
import { CLUSTER, PATHS } from '../constants.ts';
|
||||||
|
|
||||||
export class ClusterManager {
|
export class ClusterManager {
|
||||||
|
private initialized = false;
|
||||||
private config: IClusterConfig = {
|
private config: IClusterConfig = {
|
||||||
enabled: false,
|
enabled: false,
|
||||||
nodeName: os.hostname(),
|
nodeName: 'modelgrid-local',
|
||||||
role: 'standalone',
|
role: 'standalone',
|
||||||
bindHost: CLUSTER.DEFAULT_BIND_HOST,
|
bindHost: CLUSTER.DEFAULT_BIND_HOST,
|
||||||
gossipPort: CLUSTER.DEFAULT_GOSSIP_PORT,
|
gossipPort: CLUSTER.DEFAULT_GOSSIP_PORT,
|
||||||
@@ -64,6 +64,8 @@ export class ClusterManager {
|
|||||||
} catch {
|
} catch {
|
||||||
// No persisted control state yet.
|
// No persisted control state yet.
|
||||||
}
|
}
|
||||||
|
|
||||||
|
this.initialized = true;
|
||||||
}
|
}
|
||||||
|
|
||||||
public configure(config: IClusterConfig): void {
|
public configure(config: IClusterConfig): void {
|
||||||
@@ -385,6 +387,10 @@ export class ClusterManager {
|
|||||||
}
|
}
|
||||||
|
|
||||||
private schedulePersist(): void {
|
private schedulePersist(): void {
|
||||||
|
if (!this.initialized) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
if (this.persistQueued) {
|
if (this.persistQueued) {
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@@ -397,6 +403,10 @@ export class ClusterManager {
|
|||||||
}
|
}
|
||||||
|
|
||||||
private scheduleControlPersist(): void {
|
private scheduleControlPersist(): void {
|
||||||
|
if (!this.initialized) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
if (this.controlPersistQueued) {
|
if (this.controlPersistQueued) {
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -0,0 +1,112 @@
|
|||||||
|
import * as fs from 'node:fs/promises';
|
||||||
|
import { PATHS, VERSION } from '../constants.ts';
|
||||||
|
import type { IModelGridConfig } from '../interfaces/config.ts';
|
||||||
|
import { logger } from '../logger.ts';
|
||||||
|
|
||||||
|
export class ConfigManager {
|
||||||
|
public async loadConfig(): Promise<IModelGridConfig> {
|
||||||
|
try {
|
||||||
|
const configContent = await fs.readFile(PATHS.CONFIG_FILE, 'utf-8');
|
||||||
|
return this.normalizeConfig(JSON.parse(configContent) as Partial<IModelGridConfig>);
|
||||||
|
} catch (error) {
|
||||||
|
if ((error as NodeJS.ErrnoException).code === 'ENOENT') {
|
||||||
|
throw new Error(`Configuration file not found: ${PATHS.CONFIG_FILE}`);
|
||||||
|
}
|
||||||
|
throw error;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public async saveConfig(config: IModelGridConfig): Promise<void> {
|
||||||
|
await fs.mkdir(PATHS.CONFIG_DIR, { recursive: true });
|
||||||
|
await fs.writeFile(PATHS.CONFIG_FILE, JSON.stringify(config, null, 2));
|
||||||
|
}
|
||||||
|
|
||||||
|
public normalizeConfig(config: Partial<IModelGridConfig>): IModelGridConfig {
|
||||||
|
this.logIgnoredConfigKeys(config);
|
||||||
|
|
||||||
|
const filteredContainers = (config.containers || []).filter(
|
||||||
|
(container) => (container as { type?: string }).type !== 'ollama',
|
||||||
|
);
|
||||||
|
|
||||||
|
return {
|
||||||
|
version: config.version || VERSION,
|
||||||
|
api: {
|
||||||
|
port: config.api?.port || 8080,
|
||||||
|
host: config.api?.host || '0.0.0.0',
|
||||||
|
apiKeys: config.api?.apiKeys || [],
|
||||||
|
rateLimit: config.api?.rateLimit,
|
||||||
|
cors: config.api?.cors ?? true,
|
||||||
|
corsOrigins: config.api?.corsOrigins || ['*'],
|
||||||
|
},
|
||||||
|
ui: {
|
||||||
|
enabled: config.ui?.enabled ?? true,
|
||||||
|
port: config.ui?.port || 8081,
|
||||||
|
host: config.ui?.host || '0.0.0.0',
|
||||||
|
assetSource: config.ui?.assetSource === 'disk' ? 'disk' : 'bundle',
|
||||||
|
},
|
||||||
|
docker: {
|
||||||
|
networkName: config.docker?.networkName || 'modelgrid',
|
||||||
|
runtime: config.docker?.runtime || 'docker',
|
||||||
|
socketPath: config.docker?.socketPath,
|
||||||
|
},
|
||||||
|
gpus: {
|
||||||
|
autoDetect: config.gpus?.autoDetect ?? true,
|
||||||
|
assignments: config.gpus?.assignments || {},
|
||||||
|
},
|
||||||
|
containers: filteredContainers,
|
||||||
|
models: {
|
||||||
|
registryUrl: config.models?.registryUrl || 'https://list.modelgrid.com/catalog/models.json',
|
||||||
|
autoDeploy: config.models?.autoDeploy ?? true,
|
||||||
|
defaultEngine: 'vllm',
|
||||||
|
autoLoad: config.models?.autoLoad || [],
|
||||||
|
},
|
||||||
|
cluster: {
|
||||||
|
enabled: config.cluster?.enabled ?? false,
|
||||||
|
nodeName: config.cluster?.nodeName || 'modelgrid-local',
|
||||||
|
role: config.cluster?.role || 'standalone',
|
||||||
|
bindHost: config.cluster?.bindHost || '0.0.0.0',
|
||||||
|
gossipPort: config.cluster?.gossipPort || 7946,
|
||||||
|
sharedSecret: config.cluster?.sharedSecret,
|
||||||
|
advertiseUrl: config.cluster?.advertiseUrl,
|
||||||
|
controlPlaneUrl: config.cluster?.controlPlaneUrl,
|
||||||
|
heartbeatIntervalMs: config.cluster?.heartbeatIntervalMs || 5000,
|
||||||
|
seedNodes: config.cluster?.seedNodes || [],
|
||||||
|
},
|
||||||
|
checkInterval: config.checkInterval || 30000,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
private logIgnoredConfigKeys(config: Partial<IModelGridConfig>): void {
|
||||||
|
const unknownTopLevelKeys = Object.keys(config).filter((key) =>
|
||||||
|
!['version', 'api', 'ui', 'docker', 'gpus', 'containers', 'models', 'cluster', 'checkInterval']
|
||||||
|
.includes(key)
|
||||||
|
);
|
||||||
|
|
||||||
|
for (const key of unknownTopLevelKeys) {
|
||||||
|
logger.warn(`Ignoring unknown config key: ${key}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const legacyModelConfig = config.models as {
|
||||||
|
greenlistUrl?: string;
|
||||||
|
autoPull?: boolean;
|
||||||
|
defaultContainer?: string;
|
||||||
|
} | undefined;
|
||||||
|
|
||||||
|
if (legacyModelConfig?.greenlistUrl) {
|
||||||
|
logger.warn('Ignoring removed config key: models.greenlistUrl');
|
||||||
|
}
|
||||||
|
if (legacyModelConfig?.autoPull !== undefined) {
|
||||||
|
logger.warn('Ignoring removed config key: models.autoPull');
|
||||||
|
}
|
||||||
|
if (legacyModelConfig?.defaultContainer) {
|
||||||
|
logger.warn('Ignoring removed config key: models.defaultContainer');
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const container of config.containers || []) {
|
||||||
|
const containerType = (container as { type?: string }).type;
|
||||||
|
if (containerType === 'ollama') {
|
||||||
|
logger.warn('Ignoring unsupported container type: ollama');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -11,6 +11,7 @@ import type {
|
|||||||
TContainerType,
|
TContainerType,
|
||||||
} from '../interfaces/container.ts';
|
} from '../interfaces/container.ts';
|
||||||
import type { IChatCompletionRequest, IChatCompletionResponse } from '../interfaces/api.ts';
|
import type { IChatCompletionRequest, IChatCompletionResponse } from '../interfaces/api.ts';
|
||||||
|
import { API_SERVER } from '../constants.ts';
|
||||||
import { ContainerRuntime } from '../docker/container-runtime.ts';
|
import { ContainerRuntime } from '../docker/container-runtime.ts';
|
||||||
import { logger } from '../logger.ts';
|
import { logger } from '../logger.ts';
|
||||||
|
|
||||||
@@ -23,6 +24,13 @@ export type TModelPullProgress = (progress: {
|
|||||||
percent?: number;
|
percent?: number;
|
||||||
}) => void;
|
}) => void;
|
||||||
|
|
||||||
|
export class UpstreamTimeoutError extends Error {
|
||||||
|
constructor(message: string = 'Upstream request timed out') {
|
||||||
|
super(message);
|
||||||
|
this.name = 'UpstreamTimeoutError';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Abstract base class for AI model containers
|
* Abstract base class for AI model containers
|
||||||
*/
|
*/
|
||||||
@@ -165,7 +173,7 @@ export abstract class BaseContainer {
|
|||||||
const url = `${endpoint}${path}`;
|
const url = `${endpoint}${path}`;
|
||||||
|
|
||||||
const controller = new AbortController();
|
const controller = new AbortController();
|
||||||
const timeout = options.timeout || 30000;
|
const timeout = options.timeout || API_SERVER.REQUEST_TIMEOUT_MS;
|
||||||
const timeoutId = setTimeout(() => controller.abort(), timeout);
|
const timeoutId = setTimeout(() => controller.abort(), timeout);
|
||||||
|
|
||||||
try {
|
try {
|
||||||
@@ -180,6 +188,11 @@ export abstract class BaseContainer {
|
|||||||
});
|
});
|
||||||
|
|
||||||
return response;
|
return response;
|
||||||
|
} catch (error) {
|
||||||
|
if (error instanceof Error && error.name === 'AbortError') {
|
||||||
|
throw new UpstreamTimeoutError();
|
||||||
|
}
|
||||||
|
throw error;
|
||||||
} finally {
|
} finally {
|
||||||
clearTimeout(timeoutId);
|
clearTimeout(timeoutId);
|
||||||
}
|
}
|
||||||
|
|||||||
+33
-4
@@ -9,6 +9,7 @@ import { logger } from './logger.ts';
|
|||||||
import { TIMING } from './constants.ts';
|
import { TIMING } from './constants.ts';
|
||||||
import type { ModelGrid } from './modelgrid.ts';
|
import type { ModelGrid } from './modelgrid.ts';
|
||||||
import { ApiServer } from './api/server.ts';
|
import { ApiServer } from './api/server.ts';
|
||||||
|
import { UiServer } from './ui/server.ts';
|
||||||
import type { IModelGridConfig } from './interfaces/config.ts';
|
import type { IModelGridConfig } from './interfaces/config.ts';
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -18,6 +19,7 @@ export class Daemon {
|
|||||||
private modelgrid: ModelGrid;
|
private modelgrid: ModelGrid;
|
||||||
private isRunning: boolean = false;
|
private isRunning: boolean = false;
|
||||||
private apiServer?: ApiServer;
|
private apiServer?: ApiServer;
|
||||||
|
private uiServer?: UiServer;
|
||||||
|
|
||||||
constructor(modelgrid: ModelGrid) {
|
constructor(modelgrid: ModelGrid) {
|
||||||
this.modelgrid = modelgrid;
|
this.modelgrid = modelgrid;
|
||||||
@@ -48,6 +50,9 @@ export class Daemon {
|
|||||||
// Start API server
|
// Start API server
|
||||||
await this.startApiServer(config);
|
await this.startApiServer(config);
|
||||||
|
|
||||||
|
// Start UI server (runs on its own port, serves the operations console)
|
||||||
|
await this.startUiServer(config);
|
||||||
|
|
||||||
// Start containers
|
// Start containers
|
||||||
await this.startContainers();
|
await this.startContainers();
|
||||||
|
|
||||||
@@ -67,10 +72,9 @@ export class Daemon {
|
|||||||
await this.monitor();
|
await this.monitor();
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
this.isRunning = false;
|
this.isRunning = false;
|
||||||
logger.error(
|
const message = error instanceof Error ? error.message : String(error);
|
||||||
`Daemon failed to start: ${error instanceof Error ? error.message : String(error)}`,
|
logger.error(`Daemon failed to start: ${message}`);
|
||||||
);
|
throw error;
|
||||||
process.exit(1);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -86,6 +90,11 @@ export class Daemon {
|
|||||||
|
|
||||||
this.isRunning = false;
|
this.isRunning = false;
|
||||||
|
|
||||||
|
// Stop UI server
|
||||||
|
if (this.uiServer) {
|
||||||
|
await this.uiServer.stop();
|
||||||
|
}
|
||||||
|
|
||||||
// Stop API server
|
// Stop API server
|
||||||
if (this.apiServer) {
|
if (this.apiServer) {
|
||||||
await this.apiServer.stop();
|
await this.apiServer.stop();
|
||||||
@@ -114,6 +123,26 @@ export class Daemon {
|
|||||||
await this.apiServer.start();
|
await this.apiServer.start();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Start the UI server, if enabled.
|
||||||
|
*/
|
||||||
|
private async startUiServer(config: IModelGridConfig): Promise<void> {
|
||||||
|
if (!config.ui.enabled) {
|
||||||
|
logger.dim('UI server disabled in configuration');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
logger.info('Starting UI server...');
|
||||||
|
|
||||||
|
this.uiServer = new UiServer(
|
||||||
|
config.ui,
|
||||||
|
this.modelgrid.getContainerManager(),
|
||||||
|
this.modelgrid.getClusterManager(),
|
||||||
|
);
|
||||||
|
|
||||||
|
await this.uiServer.start();
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Start configured containers
|
* Start configured containers
|
||||||
*/
|
*/
|
||||||
|
|||||||
@@ -0,0 +1,49 @@
|
|||||||
|
import type { IHealthResponse } from '../interfaces/api.ts';
|
||||||
|
import type { IContainerStatus } from '../interfaces/container.ts';
|
||||||
|
import type { IGpuInfo } from '../interfaces/gpu.ts';
|
||||||
|
|
||||||
|
export function buildHealthSnapshot(options: {
|
||||||
|
statuses: Map<string, IContainerStatus>;
|
||||||
|
modelCount: number;
|
||||||
|
gpus: IGpuInfo[];
|
||||||
|
startTime: number;
|
||||||
|
version: string;
|
||||||
|
}): IHealthResponse {
|
||||||
|
let status: 'ok' | 'degraded' | 'error' = 'ok';
|
||||||
|
const reasons = new Set<'unhealthy_container' | 'no_models_available' | 'gpu_detection_failed'>();
|
||||||
|
const containerHealth: Record<string, 'healthy' | 'unhealthy'> = {};
|
||||||
|
const gpuStatus: Record<string, 'available' | 'in_use' | 'error'> = {};
|
||||||
|
|
||||||
|
for (const [id, containerStatus] of options.statuses) {
|
||||||
|
if (containerStatus.running && containerStatus.health === 'healthy') {
|
||||||
|
containerHealth[id] = 'healthy';
|
||||||
|
} else {
|
||||||
|
containerHealth[id] = 'unhealthy';
|
||||||
|
status = 'degraded';
|
||||||
|
reasons.add('unhealthy_container');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const gpu of options.gpus) {
|
||||||
|
gpuStatus[gpu.id] = 'available';
|
||||||
|
}
|
||||||
|
|
||||||
|
if (options.modelCount === 0) {
|
||||||
|
status = 'degraded';
|
||||||
|
reasons.add('no_models_available');
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
status,
|
||||||
|
reasons: Array.from(reasons),
|
||||||
|
version: options.version,
|
||||||
|
uptime: Math.floor((Date.now() - options.startTime) / 1000),
|
||||||
|
containers: options.statuses.size,
|
||||||
|
models: options.modelCount,
|
||||||
|
gpus: options.gpus.length,
|
||||||
|
details: {
|
||||||
|
containers: containerHealth,
|
||||||
|
gpus: gpuStatus,
|
||||||
|
},
|
||||||
|
};
|
||||||
|
}
|
||||||
@@ -309,6 +309,8 @@ export interface IApiError {
|
|||||||
export interface IHealthResponse {
|
export interface IHealthResponse {
|
||||||
/** Status */
|
/** Status */
|
||||||
status: 'ok' | 'degraded' | 'error';
|
status: 'ok' | 'degraded' | 'error';
|
||||||
|
/** Machine-readable reasons for degraded or error states */
|
||||||
|
reasons?: Array<'unhealthy_container' | 'no_models_available' | 'gpu_detection_failed'>;
|
||||||
/** Version */
|
/** Version */
|
||||||
version: string;
|
version: string;
|
||||||
/** Uptime in seconds */
|
/** Uptime in seconds */
|
||||||
|
|||||||
@@ -60,6 +60,28 @@ export interface IModelConfig {
|
|||||||
autoLoad: string[];
|
autoLoad: string[];
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Browser-based operations console (UI) configuration.
|
||||||
|
* The UI is served on its own port, distinct from the OpenAI API port,
|
||||||
|
* so that the data plane stays clean.
|
||||||
|
*/
|
||||||
|
export interface IUiConfig {
|
||||||
|
/** Whether to start the UI server alongside the API */
|
||||||
|
enabled: boolean;
|
||||||
|
/** Port to bind the UI server to (default: 8081) */
|
||||||
|
port: number;
|
||||||
|
/** Host to bind the UI server to (default: '0.0.0.0') */
|
||||||
|
host: string;
|
||||||
|
/**
|
||||||
|
* Where UI assets come from.
|
||||||
|
* - 'bundle': from the compiled-in `ts_bundled/bundle.ts` (default, required
|
||||||
|
* for `deno compile` single-binary builds)
|
||||||
|
* - 'disk': read on demand from `ts_web/` for the dev loop
|
||||||
|
* Overridden at runtime by the `UI_ASSET_SOURCE` env var.
|
||||||
|
*/
|
||||||
|
assetSource: 'bundle' | 'disk';
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Main ModelGrid configuration interface
|
* Main ModelGrid configuration interface
|
||||||
*/
|
*/
|
||||||
@@ -68,6 +90,8 @@ export interface IModelGridConfig {
|
|||||||
version: string;
|
version: string;
|
||||||
/** API server configuration */
|
/** API server configuration */
|
||||||
api: IApiConfig;
|
api: IApiConfig;
|
||||||
|
/** UI server configuration */
|
||||||
|
ui: IUiConfig;
|
||||||
/** Docker configuration */
|
/** Docker configuration */
|
||||||
docker: IDockerConfig;
|
docker: IDockerConfig;
|
||||||
/** GPU configuration */
|
/** GPU configuration */
|
||||||
|
|||||||
+7
-76
@@ -24,7 +24,7 @@ import { ClusterHandler } from './cli/cluster-handler.ts';
|
|||||||
import { ModelHandler } from './cli/model-handler.ts';
|
import { ModelHandler } from './cli/model-handler.ts';
|
||||||
import { ConfigHandler } from './cli/config-handler.ts';
|
import { ConfigHandler } from './cli/config-handler.ts';
|
||||||
import { ServiceHandler } from './cli/service-handler.ts';
|
import { ServiceHandler } from './cli/service-handler.ts';
|
||||||
import * as fs from 'node:fs/promises';
|
import { ConfigManager } from './config/config-manager.ts';
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* ModelGrid - Main application coordinator
|
* ModelGrid - Main application coordinator
|
||||||
@@ -42,6 +42,7 @@ export class ModelGrid {
|
|||||||
private clusterCoordinator?: ClusterCoordinator;
|
private clusterCoordinator?: ClusterCoordinator;
|
||||||
private modelRegistry: ModelRegistry;
|
private modelRegistry: ModelRegistry;
|
||||||
private modelLoader?: ModelLoader;
|
private modelLoader?: ModelLoader;
|
||||||
|
private configManager: ConfigManager;
|
||||||
|
|
||||||
// CLI Handlers
|
// CLI Handlers
|
||||||
private gpuHandler: GpuHandler;
|
private gpuHandler: GpuHandler;
|
||||||
@@ -60,6 +61,7 @@ export class ModelGrid {
|
|||||||
this.containerManager = new ContainerManager();
|
this.containerManager = new ContainerManager();
|
||||||
this.clusterManager = new ClusterManager();
|
this.clusterManager = new ClusterManager();
|
||||||
this.modelRegistry = new ModelRegistry();
|
this.modelRegistry = new ModelRegistry();
|
||||||
|
this.configManager = new ConfigManager();
|
||||||
this.systemd = new Systemd();
|
this.systemd = new Systemd();
|
||||||
this.daemon = new Daemon(this);
|
this.daemon = new Daemon(this);
|
||||||
|
|
||||||
@@ -80,23 +82,8 @@ export class ModelGrid {
|
|||||||
* Load configuration from file
|
* Load configuration from file
|
||||||
*/
|
*/
|
||||||
public async loadConfig(): Promise<void> {
|
public async loadConfig(): Promise<void> {
|
||||||
try {
|
this.config = await this.configManager.loadConfig();
|
||||||
const configContent = await fs.readFile(PATHS.CONFIG_FILE, 'utf-8');
|
logger.dim(`Configuration loaded from ${PATHS.CONFIG_FILE}`);
|
||||||
this.config = this.normalizeConfig(
|
|
||||||
JSON.parse(configContent) as Partial<IModelGridConfig> & {
|
|
||||||
models?: {
|
|
||||||
greenlistUrl?: string;
|
|
||||||
autoPull?: boolean;
|
|
||||||
} & Partial<IModelGridConfig['models']>;
|
|
||||||
},
|
|
||||||
);
|
|
||||||
logger.dim(`Configuration loaded from ${PATHS.CONFIG_FILE}`);
|
|
||||||
} catch (error) {
|
|
||||||
if ((error as NodeJS.ErrnoException).code === 'ENOENT') {
|
|
||||||
throw new Error(`Configuration file not found: ${PATHS.CONFIG_FILE}`);
|
|
||||||
}
|
|
||||||
throw error;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -107,8 +94,7 @@ export class ModelGrid {
|
|||||||
throw new Error('No configuration to save');
|
throw new Error('No configuration to save');
|
||||||
}
|
}
|
||||||
|
|
||||||
await fs.mkdir(PATHS.CONFIG_DIR, { recursive: true });
|
await this.configManager.saveConfig(this.config);
|
||||||
await fs.writeFile(PATHS.CONFIG_FILE, JSON.stringify(this.config, null, 2));
|
|
||||||
logger.dim(`Configuration saved to ${PATHS.CONFIG_FILE}`);
|
logger.dim(`Configuration saved to ${PATHS.CONFIG_FILE}`);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -270,7 +256,7 @@ export class ModelGrid {
|
|||||||
|
|
||||||
// Initialize containers from config
|
// Initialize containers from config
|
||||||
for (const containerConfig of this.config.containers) {
|
for (const containerConfig of this.config.containers) {
|
||||||
await this.containerManager.addContainer(containerConfig);
|
this.containerManager.addContainer(containerConfig);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Initialize model registry
|
// Initialize model registry
|
||||||
@@ -294,61 +280,6 @@ export class ModelGrid {
|
|||||||
logger.success('ModelGrid initialized');
|
logger.success('ModelGrid initialized');
|
||||||
}
|
}
|
||||||
|
|
||||||
private normalizeConfig(
|
|
||||||
config: Partial<IModelGridConfig> & {
|
|
||||||
models?: {
|
|
||||||
greenlistUrl?: string;
|
|
||||||
autoPull?: boolean;
|
|
||||||
} & Partial<IModelGridConfig['models']>;
|
|
||||||
},
|
|
||||||
): IModelGridConfig {
|
|
||||||
const filteredContainers = (config.containers || []).filter(
|
|
||||||
(container) => (container as { type?: string }).type !== 'ollama',
|
|
||||||
);
|
|
||||||
|
|
||||||
return {
|
|
||||||
version: config.version || VERSION,
|
|
||||||
api: {
|
|
||||||
port: config.api?.port || 8080,
|
|
||||||
host: config.api?.host || '0.0.0.0',
|
|
||||||
apiKeys: config.api?.apiKeys || [],
|
|
||||||
rateLimit: config.api?.rateLimit,
|
|
||||||
cors: config.api?.cors ?? true,
|
|
||||||
corsOrigins: config.api?.corsOrigins || ['*'],
|
|
||||||
},
|
|
||||||
docker: {
|
|
||||||
networkName: config.docker?.networkName || 'modelgrid',
|
|
||||||
runtime: config.docker?.runtime || 'docker',
|
|
||||||
socketPath: config.docker?.socketPath,
|
|
||||||
},
|
|
||||||
gpus: {
|
|
||||||
autoDetect: config.gpus?.autoDetect ?? true,
|
|
||||||
assignments: config.gpus?.assignments || {},
|
|
||||||
},
|
|
||||||
containers: filteredContainers,
|
|
||||||
models: {
|
|
||||||
registryUrl: config.models?.registryUrl || config.models?.greenlistUrl ||
|
|
||||||
'https://list.modelgrid.com/catalog/models.json',
|
|
||||||
autoDeploy: config.models?.autoDeploy ?? config.models?.autoPull ?? true,
|
|
||||||
defaultEngine: 'vllm',
|
|
||||||
autoLoad: config.models?.autoLoad || [],
|
|
||||||
},
|
|
||||||
cluster: {
|
|
||||||
enabled: config.cluster?.enabled ?? false,
|
|
||||||
nodeName: config.cluster?.nodeName || 'modelgrid-local',
|
|
||||||
role: config.cluster?.role || 'standalone',
|
|
||||||
bindHost: config.cluster?.bindHost || '0.0.0.0',
|
|
||||||
gossipPort: config.cluster?.gossipPort || 7946,
|
|
||||||
sharedSecret: config.cluster?.sharedSecret,
|
|
||||||
advertiseUrl: config.cluster?.advertiseUrl,
|
|
||||||
controlPlaneUrl: config.cluster?.controlPlaneUrl,
|
|
||||||
heartbeatIntervalMs: config.cluster?.heartbeatIntervalMs || 5000,
|
|
||||||
seedNodes: config.cluster?.seedNodes || [],
|
|
||||||
},
|
|
||||||
checkInterval: config.checkInterval || 30000,
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Shutdown the ModelGrid system
|
* Shutdown the ModelGrid system
|
||||||
*/
|
*/
|
||||||
|
|||||||
@@ -4,7 +4,7 @@
|
|||||||
|
|
||||||
import * as fs from 'node:fs/promises';
|
import * as fs from 'node:fs/promises';
|
||||||
import type { IModelCatalog, IModelCatalogEntry } from '../interfaces/catalog.ts';
|
import type { IModelCatalog, IModelCatalogEntry } from '../interfaces/catalog.ts';
|
||||||
import { MODEL_REGISTRY, TIMING } from '../constants.ts';
|
import { API_SERVER, MODEL_REGISTRY, TIMING } from '../constants.ts';
|
||||||
import { logger } from '../logger.ts';
|
import { logger } from '../logger.ts';
|
||||||
|
|
||||||
export class ModelRegistry {
|
export class ModelRegistry {
|
||||||
@@ -167,7 +167,7 @@ export class ModelRegistry {
|
|||||||
}
|
}
|
||||||
|
|
||||||
const controller = new AbortController();
|
const controller = new AbortController();
|
||||||
const timeout = setTimeout(() => controller.abort(), 30000);
|
const timeout = setTimeout(() => controller.abort(), API_SERVER.REQUEST_TIMEOUT_MS);
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const response = await fetch(source, {
|
const response = await fetch(source, {
|
||||||
|
|||||||
@@ -0,0 +1 @@
|
|||||||
|
export { UiServer } from './server.ts';
|
||||||
+317
@@ -0,0 +1,317 @@
|
|||||||
|
/**
|
||||||
|
* UI Server
|
||||||
|
*
|
||||||
|
* Serves the ModelGrid operations console on its own port, separate from
|
||||||
|
* the OpenAI-compatible API. Assets come from one of two sources:
|
||||||
|
* - 'disk': read on demand from `ts_web/` (dev loop, hot edits)
|
||||||
|
* - 'bundle': from the generated `ts_bundled/bundle.ts` module
|
||||||
|
* (default, required for `deno compile` single-binary builds)
|
||||||
|
*
|
||||||
|
* Plus a single JSON endpoint `/_ui/overview` that the SPA calls to render
|
||||||
|
* the Overview view without cross-origin fetches into the API server.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import * as http from 'node:http';
|
||||||
|
import * as fs from 'node:fs/promises';
|
||||||
|
import { dirname, extname, join, resolve } from 'node:path';
|
||||||
|
import { fileURLToPath } from 'node:url';
|
||||||
|
import type { IUiConfig } from '../interfaces/config.ts';
|
||||||
|
import type { IHealthResponse } from '../interfaces/api.ts';
|
||||||
|
import { logger } from '../logger.ts';
|
||||||
|
import { VERSION } from '../constants.ts';
|
||||||
|
import type { ContainerManager } from '../containers/container-manager.ts';
|
||||||
|
import type { ClusterManager } from '../cluster/cluster-manager.ts';
|
||||||
|
import { GpuDetector } from '../hardware/gpu-detector.ts';
|
||||||
|
import { buildHealthSnapshot } from '../helpers/health.ts';
|
||||||
|
|
||||||
|
/** Shape of one entry in the generated `ts_bundled/bundle.ts` module. */
interface IBundledFile {
  // Asset path relative to the web root, without a leading slash.
  path: string;
  // File contents, base64-encoded by the bundler.
  contentBase64: string;
}

/** A decoded asset ready to serve: raw bytes plus their MIME type. */
interface IAssetEntry {
  bytes: Uint8Array;
  contentType: string;
}

// Resolve the repository root relative to this module so disk mode can
// locate the ts_web/ directory regardless of the process working directory.
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const REPO_ROOT = resolve(__dirname, '..', '..');
const TS_WEB_DIR = join(REPO_ROOT, 'ts_web');
|
||||||
|
|
||||||
|
/**
 * HTTP server for the ModelGrid operations console.
 *
 * Serves static UI assets — from disk in dev mode or from the generated
 * bundle in compiled builds — plus the `/_ui/overview` JSON endpoint the
 * SPA polls. Runs on its own port, separate from the OpenAI-compatible API.
 */
export class UiServer {
  // Underlying Node HTTP server; undefined while stopped.
  private server?: http.Server;
  private config: IUiConfig;
  private containerManager: ContainerManager;
  private clusterManager: ClusterManager;
  private gpuDetector: GpuDetector;
  // Path -> asset lookup, populated only when serving from the generated bundle.
  private bundleMap: Map<string, IAssetEntry> | null = null;
  // Effective asset source after env/config resolution and bundle fallback.
  private activeAssetSource: 'disk' | 'bundle' = 'bundle';
  // Epoch ms captured in start(); feeds the uptime reported by /_ui/overview.
  private startTime = 0;

  constructor(
    config: IUiConfig,
    containerManager: ContainerManager,
    clusterManager: ClusterManager,
  ) {
    this.config = config;
    this.containerManager = containerManager;
    this.clusterManager = clusterManager;
    this.gpuDetector = new GpuDetector();
  }

  /**
   * Start listening on the configured host/port.
   *
   * Resolves the asset source first; if 'bundle' was requested but the
   * generated bundle module is missing, falls back to 'disk' with a warning.
   * The returned promise resolves once the socket is listening and rejects
   * on listen errors (e.g. port already in use).
   */
  public async start(): Promise<void> {
    if (this.server) {
      logger.warn('UI server is already running');
      return;
    }

    this.activeAssetSource = this.resolveAssetSource();
    if (this.activeAssetSource === 'bundle') {
      this.bundleMap = await this.loadBundleMap();
      if (!this.bundleMap) {
        logger.warn(
          'UI bundle not found (ts_bundled/bundle.ts missing). ' +
            'Falling back to disk mode — run `deno task bundle:ui` before `deno compile`.',
        );
        this.activeAssetSource = 'disk';
      }
    }

    this.startTime = Date.now();

    this.server = http.createServer(async (req, res) => {
      try {
        await this.handleRequest(req, res);
      } catch (err) {
        logger.error(`UI request error: ${err instanceof Error ? err.message : String(err)}`);
        // Only emit a 500 if nothing was written yet; otherwise the response
        // is already partially sent and there is nothing safe to add.
        if (!res.headersSent) {
          res.writeHead(500, { 'Content-Type': 'text/plain' });
          res.end('Internal server error');
        }
      }
    });

    await new Promise<void>((resolve, reject) => {
      this.server!.listen(this.config.port, this.config.host, () => {
        logger.success(
          `UI server started on ${this.config.host}:${this.config.port} ` +
            `(asset source: ${this.activeAssetSource})`,
        );
        resolve();
      });
      this.server!.on('error', (error) => {
        logger.error(`UI server error: ${error.message}`);
        reject(error);
      });
    });
  }

  /** Stop the server, waiting for in-flight connections to close. No-op when stopped. */
  public async stop(): Promise<void> {
    if (!this.server) return;
    await new Promise<void>((resolve) => {
      this.server!.close(() => resolve());
    });
    this.server = undefined;
    logger.log('UI server stopped');
  }

  /** Snapshot of the server's runtime state for status reporting. */
  public getInfo(): { running: boolean; host: string; port: number; assetSource: string } {
    return {
      running: !!this.server,
      host: this.config.host,
      port: this.config.port,
      assetSource: this.activeAssetSource,
    };
  }

  /**
   * Route a single request: only GET/HEAD are allowed; `/_ui/overview`
   * returns JSON, everything else is treated as a static asset path.
   */
  private async handleRequest(
    req: http.IncomingMessage,
    res: http.ServerResponse,
  ): Promise<void> {
    const url = new URL(req.url || '/', `http://${req.headers.host || 'localhost'}`);
    const path = url.pathname;

    if (req.method !== 'GET' && req.method !== 'HEAD') {
      res.writeHead(405, { 'Content-Type': 'text/plain', 'Allow': 'GET, HEAD' });
      res.end('Method Not Allowed');
      return;
    }

    if (path === '/_ui/overview') {
      await this.handleOverview(res);
      return;
    }

    await this.serveAsset(path, res);
  }

  /**
   * Build and return the Overview JSON: the health snapshot plus this
   * node's identity (name/role/version). Marked no-store so the SPA always
   * sees fresh data.
   */
  private async handleOverview(res: http.ServerResponse): Promise<void> {
    const statuses = await this.containerManager.getAllStatus();
    const models = await this.containerManager.getAllAvailableModels();
    const gpus = await this.gpuDetector.detectGpus();

    const health: IHealthResponse = buildHealthSnapshot({
      statuses,
      modelCount: models.size,
      gpus,
      startTime: this.startTime,
      version: VERSION,
    });

    const clusterConfig = this.clusterManager.getConfig();

    const body = {
      health,
      node: {
        name: clusterConfig?.nodeName ?? 'modelgrid-local',
        role: clusterConfig?.role ?? 'standalone',
        version: VERSION,
      },
    };

    res.writeHead(200, {
      'Content-Type': 'application/json; charset=utf-8',
      'Cache-Control': 'no-store',
    });
    res.end(JSON.stringify(body));
  }

  /**
   * Serve a static asset from the active source.
   *
   * Bundle mode: look up the in-memory map; unknown extension-less paths
   * fall back to index.html (SPA routing). Disk mode: read from ts_web/,
   * rejecting traversal attempts, with the same SPA fallback on ENOENT.
   */
  private async serveAsset(path: string, res: http.ServerResponse): Promise<void> {
    const normalized = path === '/' ? '/index.html' : path;

    if (this.activeAssetSource === 'bundle' && this.bundleMap) {
      const hit = this.bundleMap.get(normalized);
      if (hit) {
        this.writeAsset(res, hit);
        return;
      }
      // SPA fallback: any unknown non-asset path gets index.html.
      if (!hasKnownAssetExtension(normalized)) {
        const shell = this.bundleMap.get('/index.html');
        if (shell) {
          this.writeAsset(res, shell);
          return;
        }
      }
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('Not Found');
      return;
    }

    // Disk mode
    const safe = normalizePath(normalized);
    if (!safe) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Bad Request');
      return;
    }
    const full = join(TS_WEB_DIR, safe);
    try {
      const bytes = await fs.readFile(full);
      this.writeAsset(res, {
        bytes: new Uint8Array(bytes),
        contentType: contentTypeForPath(safe),
      });
      return;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code === 'ENOENT') {
        // Missing file with no extension: serve the SPA shell instead.
        if (!hasKnownAssetExtension(safe)) {
          try {
            const shell = await fs.readFile(join(TS_WEB_DIR, 'index.html'));
            this.writeAsset(res, {
              bytes: new Uint8Array(shell),
              contentType: 'text/html; charset=utf-8',
            });
            return;
          } catch {
            // fall through to 404
          }
        }
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not Found');
        return;
      }
      throw err;
    }
  }

  /** Write a 200 response with the asset's bytes, type, and length. */
  private writeAsset(res: http.ServerResponse, asset: IAssetEntry): void {
    res.writeHead(200, {
      'Content-Type': asset.contentType,
      'Content-Length': asset.bytes.byteLength,
      'Cache-Control': 'no-cache',
    });
    res.end(asset.bytes);
  }

  /**
   * Decide where assets come from. Precedence: UI_ASSET_SOURCE env var,
   * then config.assetSource, then 'bundle'. Unknown values warn and
   * default to 'bundle'.
   */
  private resolveAssetSource(): 'disk' | 'bundle' {
    // typeof-guard so this module also loads outside a Deno runtime.
    const envOverride = typeof Deno !== 'undefined' ? Deno.env.get('UI_ASSET_SOURCE') : undefined;
    const picked = (envOverride || this.config.assetSource || 'bundle').toLowerCase();
    if (picked === 'disk' || picked === 'bundle') return picked;
    logger.warn(`Unknown UI_ASSET_SOURCE "${picked}", defaulting to bundle`);
    return 'bundle';
  }

  /**
   * Dynamically import the generated bundle and decode it into a path->asset
   * map. Returns null when the bundle module is absent or fails to load,
   * which start() treats as "fall back to disk mode".
   */
  private async loadBundleMap(): Promise<Map<string, IAssetEntry> | null> {
    try {
      // The bundle module is generated by `deno task bundle:ui`.
      // @ts-ignore — generated file may not exist until the bundle task runs.
      const mod = await import('../../ts_bundled/bundle.ts');
      const files: IBundledFile[] = mod.files ?? [];
      const map = new Map<string, IAssetEntry>();
      for (const file of files) {
        map.set(`/${file.path}`, {
          bytes: decodeBase64(file.contentBase64),
          contentType: contentTypeForPath(file.path),
        });
      }
      return map;
    } catch {
      return null;
    }
  }
}
|
||||||
|
|
||||||
|
function decodeBase64(input: string): Uint8Array {
|
||||||
|
const binary = atob(input);
|
||||||
|
const bytes = new Uint8Array(binary.length);
|
||||||
|
for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
|
||||||
|
return bytes;
|
||||||
|
}
|
||||||
|
|
||||||
|
function normalizePath(path: string): string | null {
|
||||||
|
// Strip leading slashes, reject traversal.
|
||||||
|
const stripped = path.replace(/^\/+/, '');
|
||||||
|
if (stripped.includes('..')) return null;
|
||||||
|
return stripped;
|
||||||
|
}
|
||||||
|
|
||||||
|
function hasKnownAssetExtension(path: string): boolean {
|
||||||
|
return extname(path) !== '';
|
||||||
|
}
|
||||||
|
|
||||||
|
function contentTypeForPath(path: string): string {
|
||||||
|
const ext = extname(path).toLowerCase().replace(/^\./, '');
|
||||||
|
const types: Record<string, string> = {
|
||||||
|
html: 'text/html; charset=utf-8',
|
||||||
|
js: 'application/javascript; charset=utf-8',
|
||||||
|
mjs: 'application/javascript; charset=utf-8',
|
||||||
|
css: 'text/css; charset=utf-8',
|
||||||
|
json: 'application/json; charset=utf-8',
|
||||||
|
map: 'application/json; charset=utf-8',
|
||||||
|
svg: 'image/svg+xml',
|
||||||
|
png: 'image/png',
|
||||||
|
jpg: 'image/jpeg',
|
||||||
|
jpeg: 'image/jpeg',
|
||||||
|
gif: 'image/gif',
|
||||||
|
ico: 'image/x-icon',
|
||||||
|
webp: 'image/webp',
|
||||||
|
woff: 'font/woff',
|
||||||
|
woff2: 'font/woff2',
|
||||||
|
ttf: 'font/ttf',
|
||||||
|
otf: 'font/otf',
|
||||||
|
txt: 'text/plain; charset=utf-8',
|
||||||
|
};
|
||||||
|
return types[ext] || 'application/octet-stream';
|
||||||
|
}
|
||||||
+187
@@ -0,0 +1,187 @@
|
|||||||
|
/* Design tokens: dark-theme palette shared by every view. */
:root {
  color-scheme: dark;
  --bg: #000;
  --bg-1: #0b0b0d;
  --bg-2: #14141a;
  --fg: #e6e6ea;
  --fg-dim: #8a8a92;
  --border: #23232b;
  --accent: #4357d9;
  --ok: #2ecc71;
  --warn: #f1c40f;
  --err: #e74c3c;
}

* { box-sizing: border-box; }

html, body {
  margin: 0;
  padding: 0;
  height: 100%;
  background: var(--bg);
  color: var(--fg);
  font-family: Inter, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
  font-size: 14px;
}

/* Two-column shell: fixed-width sidebar plus fluid main area. */
body {
  display: grid;
  grid-template-columns: 220px 1fr;
}

a { color: inherit; text-decoration: none; }

.dim { color: var(--fg-dim); }

/* Sidebar navigation column. */
.nav {
  background: var(--bg-1);
  border-right: 1px solid var(--border);
  display: flex;
  flex-direction: column;
  height: 100vh;
  position: sticky;
  top: 0;
}

.nav-brand {
  padding: 20px 16px 12px;
  font-size: 15px;
  font-weight: 600;
  letter-spacing: 0.02em;
  border-bottom: 1px solid var(--border);
}

.nav-items {
  display: flex;
  flex-direction: column;
  padding: 8px 0;
  flex: 1;
  overflow-y: auto;
}

.nav-items a {
  padding: 8px 16px;
  color: var(--fg-dim);
  border-left: 2px solid transparent;
  transition: color 0.1s, background 0.1s, border-color 0.1s;
}

.nav-items a:hover {
  color: var(--fg);
  background: var(--bg-2);
}

/* Accent bar marks the route currently rendered (toggled by app.js). */
.nav-items a.active {
  color: var(--fg);
  background: var(--bg-2);
  border-left-color: var(--accent);
}

/* Node identity/version footer at the bottom of the sidebar. */
.nav-footer {
  padding: 12px 16px;
  border-top: 1px solid var(--border);
  font-size: 12px;
}

main {
  padding: 24px 32px;
  overflow-y: auto;
  height: 100vh;
}

h1 {
  font-size: 18px;
  font-weight: 600;
  margin: 0 0 20px;
  letter-spacing: 0.01em;
}

/* Metric cards grid on the Overview view. */
.cards {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
  gap: 12px;
  margin-bottom: 24px;
}

.card {
  background: var(--bg-1);
  border: 1px solid var(--border);
  border-radius: 6px;
  padding: 16px;
}

.card-label {
  font-size: 11px;
  text-transform: uppercase;
  letter-spacing: 0.08em;
  color: var(--fg-dim);
  margin-bottom: 8px;
}

.card-value {
  font-size: 22px;
  font-weight: 600;
}

.card-sub {
  font-size: 12px;
  color: var(--fg-dim);
  margin-top: 4px;
}

/* Small colored health indicator (ok / warn / err variants below). */
.status-dot {
  display: inline-block;
  width: 8px;
  height: 8px;
  border-radius: 50%;
  margin-right: 6px;
  vertical-align: middle;
}
.status-dot.ok { background: var(--ok); }
.status-dot.warn{ background: var(--warn); }
.status-dot.err { background: var(--err); }

table {
  width: 100%;
  border-collapse: collapse;
  background: var(--bg-1);
  border: 1px solid var(--border);
  border-radius: 6px;
  overflow: hidden;
}

th, td {
  text-align: left;
  padding: 10px 14px;
  border-bottom: 1px solid var(--border);
  font-weight: normal;
}

th {
  color: var(--fg-dim);
  font-size: 11px;
  text-transform: uppercase;
  letter-spacing: 0.08em;
  background: var(--bg-2);
}

tr:last-child td { border-bottom: none; }

/* Dashed box used for empty states and unimplemented views. */
.placeholder {
  padding: 40px;
  text-align: center;
  color: var(--fg-dim);
  background: var(--bg-1);
  border: 1px dashed var(--border);
  border-radius: 6px;
}

/* Monospace banner shown when a data load fails. */
.error {
  background: var(--bg-1);
  border: 1px solid var(--err);
  color: var(--err);
  padding: 12px 16px;
  border-radius: 6px;
  font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
  font-size: 12px;
}
|
||||||
+161
@@ -0,0 +1,161 @@
|
|||||||
|
// ModelGrid UI — vanilla client. Bundled into ts_bundled/bundle.ts for
|
||||||
|
// the single-binary build, or served from disk in dev mode.
|
||||||
|
|
||||||
|
// Valid top-level hash routes; anything else falls back to 'overview'.
const VIEWS = [
  'overview',
  'cluster',
  'gpus',
  'deployments',
  'models',
  'access',
  'logs',
  'metrics',
  'settings',
];

// Cached DOM anchors: the router's render target and the two
// sidebar-footer identity fields filled in by renderOverview().
const view = document.getElementById('view');
const nodeIdent = document.getElementById('node-ident');
const nodeVersion = document.getElementById('node-version');
|
||||||
|
|
||||||
|
// Extract the top-level view name from location.hash ('#/x/y' -> 'x'),
// defaulting to 'overview' for empty or unrecognized routes.
function parseHash() {
  const segments = location.hash.replace(/^#\/?/, '').split('/').filter(Boolean);
  const top = segments[0] ?? 'overview';
  return VIEWS.includes(top) ? top : 'overview';
}
|
||||||
|
|
||||||
|
// Highlight the sidebar link whose data-view matches the current route.
function setActive(current) {
  for (const link of document.querySelectorAll('.nav-items a')) {
    link.classList.toggle('active', link.dataset.view === current);
  }
}
|
||||||
|
|
||||||
|
// Load the overview snapshot from the UI server's JSON endpoint.
// Throws on any non-2xx status so callers surface the failure.
async function fetchHealth() {
  const response = await fetch('/_ui/overview', { headers: { accept: 'application/json' } });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response.json();
}
|
||||||
|
|
||||||
|
// Colored indicator dot: green for 'ok', yellow for 'degraded', red otherwise.
function statusDot(status) {
  let cls = 'err';
  if (status === 'ok') {
    cls = 'ok';
  } else if (status === 'degraded') {
    cls = 'warn';
  }
  return `<span class="status-dot ${cls}"></span>`;
}
|
||||||
|
|
||||||
|
// Render the Overview view: fetch the health snapshot, show the four
// summary cards plus a deployments table, and update the sidebar footer
// with the node's identity. On failure, show an error banner instead.
async function renderOverview() {
  // Immediate placeholder while the fetch is in flight.
  view.innerHTML = `<h1>Overview</h1><div id="ovstate" class="placeholder">Loading…</div>`;
  try {
    const data = await fetchHealth();
    const health = data.health;
    const containers = health.containers || 0;
    const models = health.models || 0;
    const gpus = health.gpus || 0;
    const uptime = health.uptime || 0;
    // Per-container health states, as [id, state] pairs.
    const detailEntries = Object.entries(health.details?.containers || {});
    const runningContainers = detailEntries.filter(([, v]) => v === 'healthy').length;

    view.innerHTML = `
      <h1>Overview</h1>
      <div class="cards">
        <div class="card">
          <div class="card-label">Fleet</div>
          <div class="card-value">${statusDot(health.status)}${health.status}</div>
          <div class="card-sub">v${health.version} · up ${formatUptime(uptime)}</div>
        </div>
        <div class="card">
          <div class="card-label">Deployments</div>
          <div class="card-value">${runningContainers} / ${containers}</div>
          <div class="card-sub">${containers === 0 ? 'no deployments' : `${runningContainers} healthy`}</div>
        </div>
        <div class="card">
          <div class="card-label">GPUs</div>
          <div class="card-value">${gpus}</div>
          <div class="card-sub">${gpus === 0 ? 'no GPU detected' : 'detected'}</div>
        </div>
        <div class="card">
          <div class="card-label">Models</div>
          <div class="card-value">${models}</div>
          <div class="card-sub">served via OpenAI API</div>
        </div>
      </div>
      <h1 style="margin-top:24px">Deployments</h1>
      ${renderContainerTable(detailEntries)}
    `;
    // Sidebar footer: node name/role and version.
    if (data.node) {
      nodeIdent.textContent = `${data.node.name} · ${data.node.role}`;
      nodeVersion.textContent = `v${data.node.version}`;
    }
  } catch (err) {
    view.innerHTML = `<h1>Overview</h1><div class="error">Failed to load: ${escapeHtml(String(err.message || err))}</div>`;
  }
}
|
||||||
|
|
||||||
|
// Render the deployments table from [containerId, healthState] pairs,
// or a placeholder when nothing is deployed.
function renderContainerTable(entries) {
  if (entries.length === 0) {
    // '<model>' must be HTML-escaped; a literal <model> would be parsed as
    // an unknown tag by the browser and silently dropped from the hint.
    return `<div class="placeholder">No deployments configured. Add one with <code>modelgrid run &lt;model&gt;</code>.</div>`;
  }
  const rows = entries.map(([id, state]) => `
    <tr>
      <td>${escapeHtml(id)}</td>
      <td>${statusDot(state === 'healthy' ? 'ok' : 'err')}${escapeHtml(state)}</td>
    </tr>
  `).join('');
  return `<table><thead><tr><th>Container</th><th>Health</th></tr></thead><tbody>${rows}</tbody></table>`;
}
|
||||||
|
|
||||||
|
// Stub renderer for views present in the nav but not implemented yet;
// points the operator at the equivalent CLI command via cliHint().
function renderPlaceholder(name) {
  view.innerHTML = `
    <h1>${name}</h1>
    <div class="placeholder">
      This view is part of the UI concept (see <code>readme.ui.md</code>) but is not implemented yet.
      Use the CLI for now: <code>modelgrid ${cliHint(name)}</code>.
    </div>
  `;
}
|
||||||
|
|
||||||
|
// Suggest the CLI subcommand covering a not-yet-implemented view.
// Unknown view names fall back to the generic '--help'.
function cliHint(viewName) {
  const hints = {
    Cluster: 'cluster status',
    GPUs: 'gpu list',
    Deployments: 'ps',
    Models: 'model list',
    Access: 'config apikey list',
    Logs: 'service logs',
    Metrics: 'service status',
    Settings: 'config show',
  };
  return hints[viewName] || '--help';
}
|
||||||
|
|
||||||
|
// Compact human-readable uptime: raw seconds under a minute, then
// whole minutes, hours, or days (largest unit only, floored).
function formatUptime(s) {
  if (s >= 86400) return `${Math.floor(s / 86400)}d`;
  if (s >= 3600) return `${Math.floor(s / 3600)}h`;
  if (s >= 60) return `${Math.floor(s / 60)}m`;
  return `${s}s`;
}
|
||||||
|
|
||||||
|
// Escape the five HTML-special characters so untrusted strings can be
// interpolated into innerHTML templates safely. The previous map was
// garbled (entities decoded to the raw characters, making it a no-op).
function escapeHtml(s) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return s.replace(/[&<>"']/g, (c) => entities[c]);
}
|
||||||
|
|
||||||
|
// Dispatch the current hash route to its renderer and sync the nav highlight.
function route() {
  const current = parseHash();
  setActive(current);
  const renderers = {
    overview: renderOverview,
    cluster: () => renderPlaceholder('Cluster'),
    gpus: () => renderPlaceholder('GPUs'),
    deployments: () => renderPlaceholder('Deployments'),
    models: () => renderPlaceholder('Models'),
    access: () => renderPlaceholder('Access'),
    logs: () => renderPlaceholder('Logs'),
    metrics: () => renderPlaceholder('Metrics'),
    settings: () => renderPlaceholder('Settings'),
  };
  const render = renderers[current];
  if (render) return render();
}
|
||||||
|
|
||||||
|
// Re-render whenever the hash route changes; normalize an empty hash to
// the overview route, then render the initial view.
window.addEventListener('hashchange', route);
if (!location.hash) location.hash = '#/overview';
route();
|
||||||
@@ -0,0 +1,32 @@
|
|||||||
|
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="theme-color" content="#000000">
  <title>ModelGrid</title>
  <link rel="stylesheet" href="/app.css">
</head>
<body>
  <!-- Sidebar: hash-route links; app.js toggles .active on the current one. -->
  <aside class="nav">
    <div class="nav-brand">ModelGrid</div>
    <nav class="nav-items">
      <a href="#/overview" data-view="overview">Overview</a>
      <a href="#/cluster" data-view="cluster">Cluster</a>
      <a href="#/gpus" data-view="gpus">GPUs</a>
      <a href="#/deployments" data-view="deployments">Deployments</a>
      <a href="#/models" data-view="models">Models</a>
      <a href="#/access" data-view="access">Access</a>
      <a href="#/logs" data-view="logs">Logs</a>
      <a href="#/metrics" data-view="metrics">Metrics</a>
      <a href="#/settings" data-view="settings">Settings</a>
    </nav>
    <!-- Node identity and version, populated by renderOverview() in app.js. -->
    <div class="nav-footer">
      <div id="node-ident">—</div>
      <div id="node-version" class="dim">—</div>
    </div>
  </aside>
  <!-- Router target: app.js injects the active view's markup here. -->
  <main id="view"></main>
  <script src="/app.js"></script>
</body>
</html>
|
||||||
Reference in New Issue
Block a user