jkunz 9f7308498c fix(cluster): skip persistence scheduling until initialize has run

schedulePersist and scheduleControlPersist can fire from configure() and
the public scheduling paths before initialize() has completed. Without a
guard, those queued microtasks call persistState/persistControlState,
which try to mkdir PATHS.DATA_DIR and write state files from tests and
short-lived scripts that never meant to touch the data directory. That
produced async-leak warnings in the Cluster manager unit tests and
left orphan directories on hosts that only constructed a ClusterManager
to inspect it.

Add an `initialized` flag set at the end of initialize() and early-return
from both schedulers when it is false. Real runtime paths always call
initialize() during Daemon startup, so this changes no production
behavior.
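
The guard described in this commit message can be sketched roughly as follows. This is an illustration, not the project's code: only the names `ClusterManager`, `initialize`, `schedulePersist`, and `initialized` come from the message, while the `persist` callback, the `pending` flag, and the boolean return value are assumptions made to keep the example self-contained.

```typescript
// Hypothetical sketch; the real ClusterManager does considerably more.
type PersistFn = () => Promise<void>;

class ClusterManager {
  private initialized = false;
  private pending = false;
  private readonly persist: PersistFn;

  // `persist` stands in for persistState (assumed signature).
  constructor(persist: PersistFn) {
    this.persist = persist;
  }

  async initialize(): Promise<void> {
    // ... load existing state, prepare the data directory ...
    this.initialized = true; // set last, once setup has finished
  }

  // Returns true if a write is (or already was) queued.
  schedulePersist(): boolean {
    // Before initialize(): do nothing, so constructing or configuring
    // a manager never touches the data directory.
    if (!this.initialized) return false;
    if (this.pending) return true; // coalesce repeated calls
    this.pending = true;
    Promise.resolve()
      .then(() => {
        this.pending = false;
        return this.persist();
      })
      .catch(() => {
        // A real implementation would log the failure.
      });
    return true;
  }
}
```

A test that only constructs the manager and inspects it would then queue no writes at all, which is the behavior the commit aims for.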

2026-04-21 12:39:50 +00:00

readme.md

🚀 ModelGrid

vLLM deployment manager with an OpenAI-compatible API, clustering foundations, and a public OSS model catalog.

ModelGrid is a root-level daemon that turns any GPU-equipped machine into a vLLM-serving node. It manages single-model vLLM deployments across NVIDIA, AMD, and Intel GPUs, exposes a unified OpenAI-compatible API, and resolves deployable models from list.modelgrid.com.

┌─────────────────────────────────────────────────────────────────────────┐
│                          ModelGrid Daemon                                │
│  ┌───────────────┐   ┌───────────────┐   ┌───────────────────────────┐  │
│  │  GPU Manager  │   │  vLLM Deploy  │   │   OpenAI-Compatible API   │  │
│  │ NVIDIA/AMD/   │──▶│   Scheduler    │──▶│   /v1/chat/completions    │  │
│  │ Intel Arc     │   │ + Cluster Base │   │   /v1/models              │  │
│  └───────────────┘   └───────────────┘   │   /v1/embeddings          │  │
│           │                  │            └───────────────────────────┘  │
│           └──── list.modelgrid.com catalog + deployment metadata ───────┘
└─────────────────────────────────────────────────────────────────────────┘

Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit community.foss.global/. This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a code.foss.global/ account to submit Pull Requests directly.

✨ Features

🎯 OpenAI-Compatible API — Drop-in replacement for OpenAI's API. Works with existing tools, SDKs, and applications
🖥️ Multi-GPU Support — Auto-detect and manage NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) GPUs
📦 vLLM Deployments — Launch model-specific vLLM runtimes instead of hand-managing containers
📚 OSS Model Catalog — Resolve supported models from list.modelgrid.com
🕸️ Cluster Foundation — Cluster-aware config surface for standalone, control-plane, and worker roles
⚡ Streaming Support — Real-time token streaming via Server-Sent Events
🔄 Auto-Recovery — Health monitoring with automatic container restart
🐳 Docker Native — Full Docker/Podman integration with isolated networking
📊 Prometheus Metrics — Built-in /metrics endpoint for monitoring
🖥️ Cross-Platform — Pre-compiled binaries for Linux, macOS, and Windows

📥 Installation

Via npm (Recommended)

npm install -g @modelgrid.com/modelgrid

Via Installer Script

curl -sSL https://code.foss.global/modelgrid.com/modelgrid/raw/branch/main/install.sh | sudo bash

Manual Binary Download

Download the appropriate binary for your platform from releases:

| Platform | Binary |
| --- | --- |
| Linux x64 | `modelgrid-linux-x64` |
| Linux ARM64 | `modelgrid-linux-arm64` |
| macOS Intel | `modelgrid-macos-x64` |
| macOS Apple Silicon | `modelgrid-macos-arm64` |
| Windows x64 | `modelgrid-windows-x64.exe` |

chmod +x modelgrid-linux-x64
sudo mv modelgrid-linux-x64 /usr/local/bin/modelgrid

🚀 Quick Start

# 1. Check your GPUs
sudo modelgrid gpu list

# 2. Initialize configuration
sudo modelgrid config init

# 3. Add an API key
sudo modelgrid config apikey add

# 4. Browse the public catalog
sudo modelgrid model list

# 5. Deploy a model
sudo modelgrid run meta-llama/Llama-3.1-8B-Instruct

# 6. Enable and start the service
sudo modelgrid service enable
sudo modelgrid service start

# 7. Test the API
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

That's it! Your GPU server is now serving AI models with an OpenAI-compatible API. 🎉

📖 API Reference

ModelGrid exposes an OpenAI-compatible API on port 8080 by default (configurable via api.port).

Authentication

All API endpoints except /health and /metrics require Bearer token authentication:

curl -H "Authorization: Bearer YOUR_API_KEY" http://localhost:8080/v1/models

Chat Completions

POST /v1/chat/completions

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is machine learning?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": false
  }'
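
The same request can be assembled from TypeScript. The sketch below only builds the request, so it runs without a server; the function name `buildChatRequest` and the `HttpRequest` shape are illustrative assumptions, and the field types cover just what the curl example uses.

```typescript
// Types cover only the fields used in the curl example.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
  stream?: boolean;
}

// Hypothetical helper type: everything needed to issue the call.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// baseUrl and apiKey are placeholders supplied by the caller.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  request: ChatRequest,
): HttpRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(request),
  };
}
```

In a runtime with `fetch`, the result can be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.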

Streaming Response:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Write a poem about AI"}],
    "stream": true
  }' --no-buffer
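
On the client side, a streamed response has to be decoded event by event. The sketch below assumes ModelGrid follows the OpenAI streaming convention (`data: <json>` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); the function name is hypothetical, and it expects whole lines, so a real client would buffer partial network chunks first.

```typescript
// Minimal shape of one streamed event (OpenAI convention, assumed here).
interface StreamEvent {
  choices: { delta: { content?: string } }[];
}

// Extracts the text deltas from a block of complete SSE lines.
function parseSseChunk(chunk: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const rawLine of chunk.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true; // end-of-stream sentinel
      break;
    }
    const event = JSON.parse(payload) as StreamEvent;
    for (const choice of event.choices) {
      text += choice.delta.content ?? "";
    }
  }
  return { text, done };
}
```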

List Models

GET /v1/models

Returns all available models across all containers:

{
  "object": "list",
  "data": [
    {
      "id": "llama3:8b",
      "object": "model",
      "owned_by": "modelgrid",
      "created": 1706745600
    }
  ]
}
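
A client might use this response to check whether a model is being served before sending requests. The types below mirror the sample response; the helper names are illustrative, and treating `created` as Unix seconds follows the OpenAI convention rather than anything stated here.

```typescript
interface ModelEntry {
  id: string;
  object: string;
  owned_by: string;
  created: number; // Unix timestamp in seconds (OpenAI convention, assumed)
}

interface ModelList {
  object: string;
  data: ModelEntry[];
}

// True if the listing contains the given model ID.
function isModelServed(list: ModelList, modelId: string): boolean {
  return list.data.some((entry) => entry.id === modelId);
}

// Converts the `created` field to a Date.
function createdAt(entry: ModelEntry): Date {
  return new Date(entry.created * 1000);
}
```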

Embeddings

POST /v1/embeddings

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "input": "Hello, world!"
  }'

Health Check (No Auth Required)

GET /health

{
  "status": "ok",
  "uptime": 3600,
  "containers": { "total": 2, "running": 2 },
  "models": 5,
  "gpus": 2
}
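
A monitoring script could turn this payload into a single readiness decision. The sketch below is one possible policy, not ModelGrid's own: it treats a node as ready only when `status` is `"ok"` and every configured container is running. The function name is hypothetical.

```typescript
// Mirrors the sample /health payload.
interface HealthResponse {
  status: string;
  uptime: number;
  containers: { total: number; running: number };
  models: number;
  gpus: number;
}

// Assumed policy: ready means "ok" and no container is down.
function isReady(health: HealthResponse): boolean {
  return health.status === "ok" &&
    health.containers.running === health.containers.total;
}
```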

Prometheus Metrics (No Auth Required)

GET /metrics

# HELP modelgrid_uptime_seconds Server uptime in seconds
modelgrid_uptime_seconds 3600
# HELP modelgrid_containers_total Total configured containers
modelgrid_containers_total 2
# HELP modelgrid_containers_running Running containers
modelgrid_containers_running 2
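
For quick scripting without a Prometheus server, the text output can be parsed directly. This is a minimal sketch that assumes unlabeled samples without timestamps, as in the sample above; it is not a full parser for the Prometheus exposition format, and `parseMetrics` is an illustrative name.

```typescript
// Parses "name value" lines into a map, skipping comments.
function parseMetrics(text: string): Map<string, number> {
  const metrics = new Map<string, number>();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // HELP/TYPE comments
    const separator = line.lastIndexOf(" ");
    if (separator === -1) continue; // no value on this line
    const value = Number(line.slice(separator + 1));
    if (!Number.isNaN(value)) {
      metrics.set(line.slice(0, separator), value);
    }
  }
  return metrics;
}
```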

🔧 CLI Commands

Service Management

modelgrid service enable       # Install and enable systemd service
modelgrid service disable      # Stop and disable service
modelgrid service start        # Start the daemon
modelgrid service stop         # Stop the daemon
modelgrid service restart      # Restart the daemon
modelgrid service status       # Show service status with GPU/container info
modelgrid service logs         # Tail live service logs

GPU Management

modelgrid gpu list             # List all detected GPUs with VRAM info
modelgrid gpu status           # Show real-time GPU utilization
modelgrid gpu drivers          # Check driver status for all GPUs
modelgrid gpu install          # Install GPU drivers (interactive)

Example output:

GPU Devices (2):
┌────────────┬──────────────────────┬────────┬─────────────┬────────────┐
│ ID         │ Model                │ VRAM   │ Driver      │ Status     │
├────────────┼──────────────────────┼────────┼─────────────┼────────────┤
│ nvidia-0   │ NVIDIA RTX 4090      │ 24 GB  │ 535.154.05  │ Ready      │
│ nvidia-1   │ NVIDIA RTX 4090      │ 24 GB  │ 535.154.05  │ In Use     │
└────────────┴──────────────────────┴────────┴─────────────┴────────────┘

Deployment Management

modelgrid ps                   # List active deployments
modelgrid run MODEL            # Deploy a model from the registry
modelgrid container list       # List all configured deployments
modelgrid container add        # Interactive deployment setup wizard
modelgrid container remove ID  # Remove a deployment
modelgrid container start [ID] # Start deployment(s)
modelgrid container stop [ID]  # Stop deployment(s)
modelgrid container logs ID    # Show deployment logs

Model Management

modelgrid model list           # List available/loaded models
modelgrid model pull NAME      # Deploy a model from the registry
modelgrid model remove NAME    # Remove a model deployment
modelgrid model status         # Show model recommendations with VRAM analysis
modelgrid model refresh        # Refresh registry cache

Configuration

modelgrid config show          # Display current configuration
modelgrid config init          # Initialize default configuration
modelgrid config apikey list   # List configured API keys
modelgrid config apikey add    # Generate and add new API key
modelgrid config apikey remove # Remove an API key

Cluster

modelgrid cluster status       # Show cluster state
modelgrid cluster nodes        # List registered nodes
modelgrid cluster models       # Show model locations across nodes
modelgrid cluster desired      # Show desired deployment targets
modelgrid cluster ensure NAME  # Ask control plane to schedule a model
modelgrid cluster scale NAME 3 # Set desired replica count
modelgrid cluster clear NAME   # Remove desired deployment target
modelgrid cluster cordon NODE  # Prevent new placements on a node
modelgrid cluster drain NODE   # Mark a node for evacuation
modelgrid cluster activate NODE # Mark a node active again

Global Options

--debug, -d    # Enable debug mode (verbose logging)
--version, -v  # Show version information
--help, -h     # Show help message

📦 Supported Runtime

vLLM

High-performance inference with PagedAttention and continuous batching.

{
  "id": "vllm-1",
  "type": "vllm",
  "name": "vLLM Server",
  "gpuIds": ["nvidia-0", "nvidia-1"], // Tensor parallelism
  "port": 8000,
  "env": {
    "HF_TOKEN": "your-huggingface-token" // For gated models
  }
}
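
To illustrate what listing two GPUs implies, the sketch below maps a deployment entry to vLLM server arguments. The flags `--model`, `--port`, and `--tensor-parallel-size` are real vLLM options, but the mapping itself (one tensor-parallel rank per listed GPU) and the `model` field are assumptions about how ModelGrid might translate the config, not its documented behavior.

```typescript
// Hypothetical subset of a deployment entry; `model` is an assumed field.
interface VllmDeployment {
  id: string;
  gpuIds: string[];
  port: number;
  model: string;
}

// Assumed mapping: tensor-parallel size equals the number of assigned GPUs.
function buildVllmArgs(deployment: VllmDeployment): string[] {
  if (deployment.gpuIds.length === 0) {
    throw new Error(`deployment ${deployment.id} has no GPUs assigned`);
  }
  return [
    "--model", deployment.model,
    "--port", String(deployment.port),
    "--tensor-parallel-size", String(deployment.gpuIds.length),
  ];
}
```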

Best for: Production workloads, multi-GPU tensor parallelism, OpenAI-compatible serving

🎯 GPU Support

NVIDIA (CUDA)

Requirements:

NVIDIA Driver 470+
CUDA Toolkit 11.0+
NVIDIA Container Toolkit (nvidia-docker2)

# Check status
modelgrid gpu drivers

# Install (Ubuntu/Debian)
sudo apt install nvidia-driver-535 nvidia-container-toolkit
sudo systemctl restart docker

AMD (ROCm)

Requirements:

ROCm 5.0+
AMD GPU with ROCm support (RX 6000+, MI series)

# Install ROCm
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.0.60000-1_all.deb
sudo apt install ./amdgpu-install_6.0.60000-1_all.deb
sudo amdgpu-install --usecase=rocm

Intel Arc (oneAPI)

Requirements:

Intel oneAPI Base Toolkit
Intel Arc A-series GPU (A770, A750, A380)

# Install oneAPI
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | sudo gpg --dearmor -o /usr/share/keyrings/intel-oneapi-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/intel-oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/intel-oneapi.list
sudo apt update && sudo apt install intel-basekit

⚙️ Configuration

Configuration is stored at /etc/modelgrid/config.json:

{
  "version": "1.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["sk-your-api-key-here"],
    "cors": false,
    "corsOrigins": ["*"],
    "rateLimit": 60
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker",
    "socketPath": "/var/run/docker.sock"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [
    {
      "id": "vllm-llama31-8b",
      "type": "vllm",
      "name": "Primary vLLM",
      "image": "vllm/vllm-openai:latest",
      "gpuIds": ["nvidia-0"],
      "port": 8000,
      "models": ["meta-llama/Llama-3.1-8B-Instruct"],
      "env": {},
      "volumes": []
    }
  ],
  "models": {
    "registryUrl": "https://list.modelgrid.com/catalog/models.json",
    "autoDeploy": true,
    "defaultEngine": "vllm",
    "autoLoad": ["meta-llama/Llama-3.1-8B-Instruct"]
  },
  "cluster": {
    "enabled": false,
    "nodeName": "modelgrid-local",
    "role": "standalone",
    "bindHost": "0.0.0.0",
    "gossipPort": 7946,
    "sharedSecret": "",
    "advertiseUrl": "http://127.0.0.1:8080",
    "heartbeatIntervalMs": 5000,
    "seedNodes": []
  },
  "checkInterval": 30000
}

Configuration Options

| Option | Description | Default |
| --- | --- | --- |
| `api.port` | API server port | `8080` |
| `api.host` | Bind address | `0.0.0.0` |
| `api.apiKeys` | Valid API keys | `[]` |
| `api.rateLimit` | Requests per minute | `60` |
| `docker.runtime` | Container runtime | `docker` |
| `gpus.autoDetect` | Auto-detect GPUs | `true` |
| `models.autoDeploy` | Auto-start deployments on demand | `true` |
| `models.autoLoad` | Models to preload on start | `[]` |
| `cluster.role` | Cluster mode | `standalone` |
| `cluster.sharedSecret` | Shared secret for `/_cluster/*` | unset |
| `cluster.advertiseUrl` | URL advertised to other nodes | `http://127.0.0.1:8080` |
| `cluster.controlPlaneUrl` | Control-plane URL for workers | unset |
| `checkInterval` | Health check interval (ms) | `30000` |
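
As a rough illustration of how the defaults above could be applied, the sketch below merges a partial `api` section over the documented default values and rejects obviously invalid settings. The type and function names are hypothetical, and the validation rules are assumptions rather than ModelGrid's actual checks.

```typescript
// Subset of the `api` section, using the documented defaults.
interface ApiConfig {
  port: number;
  host: string;
  apiKeys: string[];
  rateLimit: number; // requests per minute
}

const API_DEFAULTS: ApiConfig = {
  port: 8080,
  host: "0.0.0.0",
  apiKeys: [],
  rateLimit: 60,
};

// Fills in missing fields and applies assumed sanity checks.
function resolveApiConfig(partial: Partial<ApiConfig>): ApiConfig {
  const config: ApiConfig = {
    ...API_DEFAULTS,
    ...partial,
    // Copy the array so callers never share state with the defaults.
    apiKeys: [...(partial.apiKeys ?? API_DEFAULTS.apiKeys)],
  };
  if (!Number.isInteger(config.port) || config.port < 1 || config.port > 65535) {
    throw new Error(`api.port out of range: ${config.port}`);
  }
  if (config.rateLimit <= 0) {
    throw new Error(`api.rateLimit must be positive: ${config.rateLimit}`);
  }
  return config;
}
```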

🕸️ Clustering

Cluster mode uses ModelGrid's internal control-plane endpoints to:

register worker nodes
advertise locally deployed models
persist desired deployment targets separately from live heartbeats
schedule new deployments onto healthy nodes with enough VRAM
proxy OpenAI-compatible requests to the selected node gateway
exclude cordoned or draining nodes from new placements
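
The placement rules in this list can be sketched as a simple filter. The node fields and the tie-break (prefer the node with the most free VRAM) are assumptions for illustration; the source only states that placements go to healthy nodes with enough VRAM and skip cordoned or draining nodes.

```typescript
// Hypothetical node record; field names are illustrative.
type NodeLifecycle = "active" | "cordoned" | "draining";

interface ClusterNode {
  name: string;
  lifecycle: NodeLifecycle;
  healthy: boolean;
  freeVramGb: number;
}

// Picks a node for a new deployment, or undefined if none qualifies.
function pickNode(nodes: ClusterNode[], minVramGb: number): ClusterNode | undefined {
  const candidates = nodes.filter((node) =>
    node.healthy &&
    node.lifecycle === "active" && // cordoned and draining nodes are excluded
    node.freeVramGb >= minVramGb
  );
  // Assumed tie-break: most free VRAM first.
  candidates.sort((a, b) => b.freeVramGb - a.freeVramGb);
  return candidates[0];
}
```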

Minimal setup:

{
  "cluster": {
    "enabled": true,
    "nodeName": "worker-a",
    "role": "worker",
    "sharedSecret": "replace-me-with-a-random-secret",
    "advertiseUrl": "http://worker-a.internal:8080",
    "controlPlaneUrl": "http://control.internal:8080",
    "heartbeatIntervalMs": 5000
  }
}

For the control plane, set role to control-plane and advertiseUrl to its reachable API URL. Set the same cluster.sharedSecret on every node to protect internal cluster endpoints.

Runtime state files:

/var/lib/modelgrid/cluster-state.json for live node heartbeats
/var/lib/modelgrid/cluster-control-state.json for desired deployments and node lifecycle state

📚 Model Catalog

ModelGrid resolves deployable models from list.modelgrid.com. The catalog is public, versioned, and describes:

canonical model IDs
aliases
minimum VRAM and GPU count
vLLM launch defaults
desired replica counts
capabilities like chat, completions, and embeddings

Default catalog includes:

Qwen/Qwen2.5-7B-Instruct
meta-llama/Llama-3.1-8B-Instruct
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
BAAI/bge-m3

Custom registry source:

// models.json
{
  "version": "1.0",
  "generatedAt": "2026-04-20T00:00:00.000Z",
  "models": [
    {
      "id": "meta-llama/Llama-3.1-8B-Instruct",
      "engine": "vllm",
      "source": { "repo": "meta-llama/Llama-3.1-8B-Instruct" },
      "capabilities": { "chat": true },
      "requirements": { "minVramGb": 18 },
      "launchDefaults": { "replicas": 2 }
    }
  ]
}
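
A catalog in this shape might be consumed as sketched below: look a model up by canonical ID or alias, then compare its VRAM requirement with what a GPU offers. The `aliases` field name is an assumption (the catalog is said to carry aliases, but the example entry does not show them), and both function names are illustrative.

```typescript
// Subset of a catalog entry; `aliases` is an assumed field name.
interface CatalogModel {
  id: string;
  aliases?: string[];
  engine: string;
  requirements: { minVramGb: number };
}

interface Catalog {
  version: string;
  models: CatalogModel[];
}

// Finds a model by canonical ID or alias (exact match).
function resolveModel(catalog: Catalog, nameOrAlias: string): CatalogModel | undefined {
  return catalog.models.find((model) =>
    model.id === nameOrAlias ||
    (model.aliases ?? []).some((alias) => alias === nameOrAlias)
  );
}

// True if the available VRAM meets the model's stated minimum.
function fitsVram(model: CatalogModel, availableVramGb: number): boolean {
  return availableVramGb >= model.requirements.minVramGb;
}
```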

Configure with:

{
  "models": {
    "registryUrl": "https://your-server.com/models.json"
  }
}

🏗️ Development

Building from Source

# Clone repository
git clone https://code.foss.global/modelgrid.com/modelgrid.git
cd modelgrid

# Run directly with Deno
deno run --allow-all mod.ts help

# Run tests
deno task test

# Type check
deno task check

# Compile for current platform
deno compile --allow-all --output modelgrid mod.ts

# Compile for all platforms
deno task compile

Project Structure

modelgrid/
├── mod.ts                    # Deno entry point
├── ts/
│   ├── cli.ts                # CLI command routing
│   ├── modelgrid.ts          # Main coordinator class
│   ├── daemon.ts             # Background daemon process
│   ├── systemd.ts            # Systemd service integration
│   ├── constants.ts          # Configuration constants
│   ├── logger.ts             # Logging utilities
│   ├── interfaces/           # TypeScript interfaces
│   ├── hardware/             # GPU detection (NVIDIA/AMD/Intel)
│   ├── drivers/              # Driver management
│   ├── docker/               # Docker management
│   ├── containers/           # Container orchestration
│   │   ├── vllm.ts           # vLLM implementation
│   │   └── tgi.ts            # TGI implementation
│   ├── api/                  # OpenAI-compatible API
│   │   ├── server.ts         # HTTP server
│   │   ├── router.ts         # Request routing
│   │   └── handlers/         # Endpoint handlers
│   ├── models/               # Model management
│   └── cli/                  # CLI handlers
├── test/                     # Test files
├── scripts/                  # Build scripts
└── bin/                      # npm wrapper

🗑️ Uninstallation

# Stop and remove service
sudo modelgrid service disable

# Uninstall via script
sudo modelgrid uninstall

# Or manual removal
sudo rm /usr/local/bin/modelgrid
sudo rm -rf /etc/modelgrid
sudo rm -rf /opt/modelgrid
sudo rm /etc/systemd/system/modelgrid.service
sudo systemctl daemon-reload

📚 Resources

Repository: https://code.foss.global/modelgrid.com/modelgrid
Issues: https://community.foss.global/
Releases: https://code.foss.global/modelgrid.com/modelgrid/releases

License and Legal Information

This repository contains open-source code licensed under the MIT License. A copy of the license can be found in the license file at the repository root.

Please note: The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH or third parties, and are not included within the scope of the MIT license granted herein.

Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines or the guidelines of the respective third-party owners, and any usage must be approved in writing. Third-party trademarks used herein are the property of their respective owners and used only in a descriptive manner, e.g. for an implementation of an API or similar.
