ModelGrid

GPU infrastructure management daemon with OpenAI-compatible API for AI model containers.

ModelGrid is a root-level daemon that manages GPU infrastructure, Docker containers, and AI model serving. It provides an OpenAI-compatible API interface for seamless integration with existing tools and applications.

Features

  • Multi-GPU Support: Detect and manage NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) GPUs
  • Container Management: Orchestrate Ollama, vLLM, and TGI containers with GPU passthrough
  • OpenAI-Compatible API: Drop-in replacement API for chat completions, embeddings, and model management
  • Greenlit Models: Controlled model auto-pulling with remote configuration
  • Systemd Integration: Run as a system service with automatic startup
  • Cross-Platform: Pre-compiled binaries for Linux, macOS, and Windows

Quick Start

Installation

# Via npm (recommended)
npm install -g @modelgrid.com/modelgrid

# Via installer script
curl -sSL https://code.foss.global/modelgrid.com/modelgrid/raw/branch/main/install.sh | sudo bash

Initial Setup

# 1. Check GPU detection
sudo modelgrid gpu list

# 2. Initialize configuration
sudo modelgrid config init

# 3. Enable and start the service
sudo modelgrid service enable
sudo modelgrid service start

# 4. Check status
modelgrid service status

Using the API

Once running, ModelGrid exposes an OpenAI-compatible API:

# List available models
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
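
Because the API is OpenAI-compatible, existing client libraries can simply be pointed at ModelGrid. A minimal sketch using the official openai package (the base URL, API key, and model mirror the curl examples above):

// Point the standard OpenAI client at the local ModelGrid endpoint.
import OpenAI from "npm:openai"; // plain "openai" outside Deno

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "YOUR_API_KEY",
});

const completion = await client.chat.completions.create({
  model: "llama3:8b",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);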

CLI Commands

Service Management

modelgrid service enable      # Install and enable systemd service
modelgrid service disable     # Stop and disable systemd service
modelgrid service start       # Start the service
modelgrid service stop        # Stop the service
modelgrid service status      # Show service status
modelgrid service logs        # Show service logs

GPU Management

modelgrid gpu list            # List detected GPUs
modelgrid gpu status          # Show GPU utilization
modelgrid gpu drivers         # Check/install GPU drivers

Container Management

modelgrid container add       # Add a new container
modelgrid container remove    # Remove a container
modelgrid container list      # List all containers
modelgrid container start     # Start a container
modelgrid container stop      # Stop a container

Model Management

modelgrid model list          # List available/loaded models
modelgrid model pull <name>   # Pull a model
modelgrid model remove <name> # Remove a model

Configuration

modelgrid config show         # Display current configuration
modelgrid config init         # Initialize configuration

Configuration

Configuration is stored at /etc/modelgrid/config.json:

{
  "version": "1.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["your-api-key-here"]
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://code.foss.global/modelgrid.com/model_lists/raw/branch/main/greenlit.json",
    "autoPull": true,
    "defaultContainer": "ollama",
    "autoLoad": []
  },
  "checkInterval": 30000
}
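
The same shape, sketched as a TypeScript interface for reference. This is derived from the example above; the project's own definitions under ts/interfaces/ may differ:

// Sketch of the config.json shape, matching the example above.
interface ModelGridConfig {
  version: string;
  api: {
    port: number;
    host: string;
    apiKeys: string[];
  };
  docker: {
    networkName: string;
    runtime: string;
  };
  gpus: {
    autoDetect: boolean;
    assignments: Record<string, string>; // GPU-to-container assignments; value type assumed
  };
  containers: unknown[]; // container definitions; shape not shown in the example
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  checkInterval: number; // milliseconds
}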

Supported Container Types

Ollama

Best for general-purpose model serving with easy model management.

modelgrid container add --type ollama --gpu gpu-0

vLLM

High-performance serving for large models with tensor parallelism.

modelgrid container add --type vllm --gpu gpu-0,gpu-1

TGI (Text Generation Inference)

HuggingFace's production-ready inference server.

modelgrid container add --type tgi --gpu gpu-0

GPU Support

NVIDIA (CUDA)

Requires NVIDIA drivers and NVIDIA Container Toolkit:

# Check driver status
modelgrid gpu drivers

# Install if needed (Ubuntu/Debian)
sudo apt install nvidia-driver-535 nvidia-container-toolkit

AMD (ROCm)

Requires ROCm drivers:

# Check driver status
modelgrid gpu drivers

Intel Arc (oneAPI)

Requires Intel GPU drivers and oneAPI toolkit:

# Check driver status
modelgrid gpu drivers

Greenlit Models

ModelGrid uses a greenlit model system to control which models can be auto-pulled. The greenlist is fetched from a configurable URL and contains approved models with VRAM requirements:

{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}

When a request arrives for a model that is not currently loaded, ModelGrid follows these steps (sketched in code below):

  1. Check whether the model is on the greenlist
  2. Verify that its VRAM requirement can be met
  3. Auto-pull and load the model
  4. Serve the request
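
A hedged sketch of that gate in TypeScript. The GreenlitModel shape follows the JSON above; getFreeVramGb and pullAndLoad are hypothetical helpers, not part of the ModelGrid codebase:

// Hypothetical helpers, declared so the sketch type-checks.
declare function getFreeVramGb(): Promise<number>;
declare function pullAndLoad(model: string, container: string): Promise<void>;

interface GreenlitModel {
  name: string;
  container: string;
  minVram: number; // GB of VRAM required
}

async function ensureModel(name: string, greenlistUrl: string): Promise<boolean> {
  // 1. Fetch the greenlist and check that the requested model is on it.
  const greenlist = await (await fetch(greenlistUrl)).json();
  const entry = (greenlist.models as GreenlitModel[]).find((m) => m.name === name);
  if (!entry) return false; // not greenlit: never auto-pull

  // 2. Verify that the VRAM requirement can be met.
  if ((await getFreeVramGb()) < entry.minVram) return false;

  // 3. Auto-pull and load the model into its container.
  await pullAndLoad(entry.name, entry.container);

  // 4. The caller can now serve the request.
  return true;
}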

API Reference

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completion endpoint with streaming support.
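
With "stream": true the endpoint returns incremental deltas; a minimal consumer sketch in TypeScript, assuming the standard OpenAI server-sent-events framing and, for simplicity, that each chunk contains whole "data: ..." lines:

const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama3:8b",
    stream: true,
    messages: [{ role: "user", content: "Hello!" }],
  }),
});

// Naive SSE consumer: collects the streamed content deltas.
const decoder = new TextDecoder();
let text = "";
for await (const chunk of res.body!) {
  for (const line of decoder.decode(chunk, { stream: true }).split("\n")) {
    const payload = line.replace(/^data: /, "").trim();
    if (!line.startsWith("data: ") || payload === "[DONE]") continue;
    text += JSON.parse(payload).choices[0].delta?.content ?? "";
  }
}
console.log(text);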

Models

GET /v1/models
GET /v1/models/:model

List available models or get details for a specific model.

Embeddings

POST /v1/embeddings

Generate text embeddings using compatible models.
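
A minimal request sketch; the model name is a placeholder, so substitute one of the embedding-capable models your containers actually serve:

const res = await fetch("http://localhost:8080/v1/embeddings", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "nomic-embed-text", // placeholder: any embedding-capable model
    input: "The quick brown fox",
  }),
});

const { data } = await res.json();
console.log(data[0].embedding.length); // embedding dimensionality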

Development

Building from Source

# Clone repository
git clone https://code.foss.global/modelgrid.com/modelgrid.git
cd modelgrid

# Run directly with Deno
deno run --allow-all mod.ts help

# Compile for current platform
deno compile --allow-all --output modelgrid mod.ts

# Compile for all platforms
bash scripts/compile-all.sh

Project Structure

modelgrid/
├── mod.ts                  # Entry point
├── ts/
│   ├── cli.ts              # CLI command routing
│   ├── modelgrid.ts        # Main coordinator class
│   ├── daemon.ts           # Background daemon
│   ├── systemd.ts          # Systemd service management
│   ├── constants.ts        # Configuration constants
│   ├── interfaces/         # TypeScript interfaces
│   ├── hardware/           # GPU detection
│   ├── drivers/            # Driver management
│   ├── docker/             # Docker management
│   ├── containers/         # Container orchestration
│   ├── api/                # OpenAI-compatible API
│   ├── models/             # Model management
│   └── cli/                # CLI handlers
├── test/                   # Test files
└── scripts/                # Build scripts

License

MIT License. See the license file for details.
