ModelGrid

GPU infrastructure management daemon with OpenAI-compatible API for AI model containers.

ModelGrid is a root-level daemon that manages GPU infrastructure, Docker containers, and AI model serving. It provides an OpenAI-compatible API interface for seamless integration with existing tools and applications.

Features

  • Multi-GPU Support: Detect and manage NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) GPUs
  • Container Management: Orchestrate Ollama, vLLM, and TGI containers with GPU passthrough
  • OpenAI-Compatible API: Drop-in replacement API for chat completions, embeddings, and model management
  • Greenlit Models: Controlled model auto-pulling with remote configuration
  • Systemd Integration: Run as a system service with automatic startup
  • Cross-Platform: Pre-compiled binaries for Linux, macOS, and Windows

Quick Start

Installation

# Via npm (recommended)
npm install -g @modelgrid.com/modelgrid

# Via installer script
curl -sSL https://code.foss.global/modelgrid.com/modelgrid/raw/branch/main/install.sh | sudo bash

Initial Setup

# 1. Check GPU detection
sudo modelgrid gpu list

# 2. Initialize configuration
sudo modelgrid config init

# 3. Enable and start the service
sudo modelgrid service enable
sudo modelgrid service start

# 4. Check status
modelgrid service status

Using the API

Once running, ModelGrid exposes an OpenAI-compatible API:

# List available models
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
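
Because the API is OpenAI-compatible, existing client libraries can simply be pointed at ModelGrid. A minimal sketch using the official openai package (the base URL, API key, and model mirror the curl examples above):

// Point the standard OpenAI client at the local ModelGrid endpoint.
import OpenAI from "npm:openai"; // plain "openai" outside Deno

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "YOUR_API_KEY",
});

const completion = await client.chat.completions.create({
  model: "llama3:8b",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);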

CLI Commands

Service Management

modelgrid service enable      # Install and enable systemd service
modelgrid service disable     # Stop and disable systemd service
modelgrid service start       # Start the service
modelgrid service stop        # Stop the service
modelgrid service status      # Show service status
modelgrid service logs        # Show service logs

GPU Management

modelgrid gpu list            # List detected GPUs
modelgrid gpu status          # Show GPU utilization
modelgrid gpu drivers         # Check/install GPU drivers

Container Management

modelgrid container add       # Add a new container
modelgrid container remove    # Remove a container
modelgrid container list      # List all containers
modelgrid container start     # Start a container
modelgrid container stop      # Stop a container

Model Management

modelgrid model list          # List available/loaded models
modelgrid model pull <name>   # Pull a model
modelgrid model remove <name> # Remove a model

Configuration

modelgrid config show         # Display current configuration
modelgrid config init         # Initialize configuration

Configuration

Configuration is stored at /etc/modelgrid/config.json:

{
  "version": "1.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["your-api-key-here"]
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://code.foss.global/modelgrid.com/model_lists/raw/branch/main/greenlit.json",
    "autoPull": true,
    "defaultContainer": "ollama",
    "autoLoad": []
  },
  "checkInterval": 30000
}
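
The same shape, sketched as a TypeScript interface for reference. This is derived from the example above; the project's own definitions under ts/interfaces/ may differ:

// Sketch of the config.json shape, matching the example above.
interface ModelGridConfig {
  version: string;
  api: {
    port: number;
    host: string;
    apiKeys: string[];
  };
  docker: {
    networkName: string;
    runtime: string;
  };
  gpus: {
    autoDetect: boolean;
    assignments: Record<string, string>; // GPU-to-container assignments; value type assumed
  };
  containers: unknown[]; // container definitions; shape not shown in the example
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  checkInterval: number; // milliseconds
}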

Supported Container Types

Ollama

Best for general-purpose model serving with easy model management.

modelgrid container add --type ollama --gpu gpu-0

vLLM

High-performance serving for large models with tensor parallelism.

modelgrid container add --type vllm --gpu gpu-0,gpu-1

TGI (Text Generation Inference)

HuggingFace's production-ready inference server.

modelgrid container add --type tgi --gpu gpu-0

GPU Support

NVIDIA (CUDA)

Requires NVIDIA drivers and NVIDIA Container Toolkit:

# Check driver status
modelgrid gpu drivers

# Install if needed (Ubuntu/Debian)
sudo apt install nvidia-driver-535 nvidia-container-toolkit

AMD (ROCm)

Requires ROCm drivers:

# Check driver status
modelgrid gpu drivers

Intel Arc (oneAPI)

Requires Intel GPU drivers and oneAPI toolkit:

# Check driver status
modelgrid gpu drivers

Greenlit Models

ModelGrid uses a greenlit model system to control which models can be auto-pulled. The greenlist is fetched from a configurable URL and contains approved models with VRAM requirements:

{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}

When a request arrives for a model that is not currently loaded, ModelGrid follows these steps (sketched in code below):

  1. Check whether the model is on the greenlist
  2. Verify that its VRAM requirement can be met
  3. Auto-pull and load the model
  4. Serve the request
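
A hedged sketch of that gate in TypeScript. The GreenlitModel shape follows the JSON above; getFreeVramGb and pullAndLoad are hypothetical helpers, not part of the ModelGrid codebase:

// Hypothetical helpers, declared so the sketch type-checks.
declare function getFreeVramGb(): Promise<number>;
declare function pullAndLoad(model: string, container: string): Promise<void>;

interface GreenlitModel {
  name: string;
  container: string;
  minVram: number; // GB of VRAM required
}

async function ensureModel(name: string, greenlistUrl: string): Promise<boolean> {
  // 1. Fetch the greenlist and check that the requested model is on it.
  const greenlist = await (await fetch(greenlistUrl)).json();
  const entry = (greenlist.models as GreenlitModel[]).find((m) => m.name === name);
  if (!entry) return false; // not greenlit: never auto-pull

  // 2. Verify that the VRAM requirement can be met.
  if ((await getFreeVramGb()) < entry.minVram) return false;

  // 3. Auto-pull and load the model into its container.
  await pullAndLoad(entry.name, entry.container);

  // 4. The caller can now serve the request.
  return true;
}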

API Reference

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completion endpoint with streaming support.
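
With "stream": true the endpoint returns incremental deltas; a minimal consumer sketch in TypeScript, assuming the standard OpenAI server-sent-events framing and, for simplicity, that each chunk contains whole "data: ..." lines:

const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama3:8b",
    stream: true,
    messages: [{ role: "user", content: "Hello!" }],
  }),
});

// Naive SSE consumer: collects the streamed content deltas.
const decoder = new TextDecoder();
let text = "";
for await (const chunk of res.body!) {
  for (const line of decoder.decode(chunk, { stream: true }).split("\n")) {
    const payload = line.replace(/^data: /, "").trim();
    if (!line.startsWith("data: ") || payload === "[DONE]") continue;
    text += JSON.parse(payload).choices[0].delta?.content ?? "";
  }
}
console.log(text);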

Models

GET /v1/models
GET /v1/models/:model

List available models or get details for a specific model.

Embeddings

POST /v1/embeddings

Generate text embeddings using compatible models.
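
A minimal request sketch; the model name is a placeholder, so substitute one of the embedding-capable models your containers actually serve:

const res = await fetch("http://localhost:8080/v1/embeddings", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "nomic-embed-text", // placeholder: any embedding-capable model
    input: "The quick brown fox",
  }),
});

const { data } = await res.json();
console.log(data[0].embedding.length); // embedding dimensionality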

Development

Building from Source

# Clone repository
git clone https://code.foss.global/modelgrid.com/modelgrid.git
cd modelgrid

# Run directly with Deno
deno run --allow-all mod.ts help

# Compile for current platform
deno compile --allow-all --output modelgrid mod.ts

# Compile for all platforms
bash scripts/compile-all.sh

Project Structure

modelgrid/
├── mod.ts                  # Entry point
├── ts/
│   ├── cli.ts              # CLI command routing
│   ├── modelgrid.ts        # Main coordinator class
│   ├── daemon.ts           # Background daemon
│   ├── systemd.ts          # Systemd service management
│   ├── constants.ts        # Configuration constants
│   ├── interfaces/         # TypeScript interfaces
│   ├── hardware/           # GPU detection
│   ├── drivers/            # Driver management
│   ├── docker/             # Docker management
│   ├── containers/         # Container orchestration
│   ├── api/                # OpenAI-compatible API
│   ├── models/             # Model management
│   └── cli/                # CLI handlers
├── test/                   # Test files
└── scripts/                # Build scripts

License

MIT License. See the license file for details.
