ModelGrid Project Hints

Project Overview

ModelGrid is a root-level daemon that manages GPU infrastructure, Docker, and AI model containers (Ollama, vLLM, TGI) with an OpenAI-compatible API interface.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        ModelGrid Daemon                          │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   CLI       │  │  Hardware   │  │   Container Manager     │  │
│  │  Commands   │  │  Detection  │  │  (Docker/Podman)        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Driver    │  │   Model     │  │   OpenAI API Gateway    │  │
│  │  Installer  │  │  Registry   │  │  (HTTP Server)          │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                     Systemd Service                              │
└─────────────────────────────────────────────────────────────────┘

File Organization

ts/
├── index.ts              # Node.js entry point
├── cli.ts                # CLI router
├── modelgrid.ts          # Main coordinator (facade)
├── daemon.ts             # Background daemon
├── systemd.ts            # Systemd integration
├── constants.ts          # Configuration constants
├── logger.ts             # Logging utilities
├── colors.ts             # Color themes
├── interfaces/           # TypeScript interfaces
│   ├── config.ts         # IModelGridConfig
│   ├── gpu.ts            # IGpuInfo, IGpuStatus
│   ├── container.ts      # IContainerConfig, IContainerStatus
│   └── api.ts            # OpenAI API types
├── hardware/             # Hardware detection
│   ├── gpu-detector.ts   # Detect GPUs (NVIDIA, AMD, Intel)
│   └── system-info.ts    # CPU, RAM info
├── drivers/              # Driver management
│   ├── nvidia.ts         # NVIDIA driver + CUDA
│   ├── amd.ts            # AMD driver + ROCm
│   ├── intel.ts          # Intel Arc + oneAPI
│   └── driver-manager.ts # Driver orchestrator
├── docker/               # Docker management
│   ├── docker-manager.ts # Docker setup
│   └── container-runtime.ts # Container lifecycle
├── containers/           # AI container management
│   ├── ollama.ts         # Ollama container
│   ├── vllm.ts           # vLLM container
│   ├── tgi.ts            # TGI container
│   └── container-manager.ts # Orchestrator
├── models/               # Model management
│   ├── registry.ts       # Greenlit model registry
│   └── loader.ts         # Model loading with VRAM checks
├── api/                  # OpenAI-compatible API
│   ├── server.ts         # HTTP server
│   ├── router.ts         # Request routing
│   ├── handlers/         # API endpoint handlers
│   │   ├── chat.ts       # /v1/chat/completions
│   │   ├── models.ts     # /v1/models
│   │   └── embeddings.ts # /v1/embeddings
│   └── middleware/       # Request processing
│       ├── auth.ts       # API key validation
│       └── sanity.ts     # Request validation
├── cli/                  # CLI handlers
│   ├── service-handler.ts
│   ├── gpu-handler.ts
│   ├── container-handler.ts
│   ├── model-handler.ts
│   └── config-handler.ts
└── helpers/              # Utilities
    ├── prompt.ts         # Readline utility
    └── shortid.ts        # ID generation

Key Concepts

Greenlit Model System

  • For security, only pre-approved (greenlit) models can be auto-pulled
  • The greenlist is fetched from a configurable remote URL
  • VRAM requirements are checked before a model is loaded
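
The greenlit check can be sketched as follows (a simplification with hypothetical names; the real logic lives in ts/models/registry.ts and ts/models/loader.ts, and the greenlist would be fetched from the configured URL rather than inlined):

```typescript
// Hypothetical shape of a greenlist entry; field names are illustrative.
interface IGreenlitModel {
  name: string;
  minVramMb: number; // VRAM required to load the model
}

// Inlined for illustration; normally fetched from models.greenlistUrl.
const greenlist: IGreenlitModel[] = [
  { name: 'llama3:8b', minVramMb: 6144 },
  { name: 'mistral:7b', minVramMb: 5120 },
];

function canAutoPull(modelName: string, availableVramMb: number): boolean {
  const entry = greenlist.find((m) => m.name === modelName);
  if (!entry) return false; // not greenlit: never auto-pull
  return availableVramMb >= entry.minVramMb; // VRAM check before loading
}
```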

Container Types

  • Ollama: Easy to use; its native API is translated to the OpenAI format
  • vLLM: High performance; natively OpenAI-compatible
  • TGI: HuggingFace Text Generation Inference
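
As a sketch, each container type maps to an upstream Docker image (these image names are common upstream defaults, not pinned by ModelGrid; the actual selection happens in ts/containers/):

```typescript
type ContainerType = 'ollama' | 'vllm' | 'tgi';

// Hypothetical helper: picks a default Docker image per container type.
function defaultImage(type: ContainerType): string {
  switch (type) {
    case 'ollama':
      return 'ollama/ollama';
    case 'vllm':
      return 'vllm/vllm-openai';
    case 'tgi':
      return 'ghcr.io/huggingface/text-generation-inference';
  }
}
```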

GPU Support

  • NVIDIA: nvidia-smi, CUDA, nvidia-docker2
  • AMD: rocm-smi, ROCm
  • Intel Arc: xpu-smi, oneAPI
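
For NVIDIA, detection boils down to parsing nvidia-smi output. A minimal sketch of what ts/hardware/gpu-detector.ts would do (function and field names are illustrative; GPU names are assumed to contain no commas):

```typescript
interface IGpuInfo {
  index: number;
  name: string;
  vramMb: number;
}

// Parses the output of:
//   nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits
function parseNvidiaSmi(output: string): IGpuInfo[] {
  return output
    .trim()
    .split('\n')
    .filter((line) => line.length > 0)
    .map((line) => {
      const [index, name, vram] = line.split(',').map((s) => s.trim());
      return { index: Number(index), name, vramMb: Number(vram) };
    });
}
```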

Configuration

Config file: /etc/modelgrid/config.json

interface IModelGridConfig {
  version: string;
  api: {
    port: number;           // Default: 8080
    host: string;           // Default: '0.0.0.0'
    apiKeys: string[];      // Valid API keys
    cors: boolean;
    corsOrigins: string[];
  };
  docker: {
    networkName: string;    // Default: 'modelgrid'
    runtime: 'docker' | 'podman';
  };
  gpus: {
    autoDetect: boolean;
    assignments: Record<string, string>;
  };
  containers: IContainerConfig[];
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  checkInterval: number;
}
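
A minimal config matching the interface above; all values are illustrative (the greenlist URL is a placeholder, and the checkInterval unit is not specified here):

```json
{
  "version": "1.0.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["changeme"],
    "cors": false,
    "corsOrigins": []
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://example.com/greenlist.json",
    "autoPull": false,
    "defaultContainer": "ollama",
    "autoLoad": []
  },
  "checkInterval": 30000
}
```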

CLI Commands

modelgrid service enable/disable/start/stop/status/logs
modelgrid gpu list/status/drivers/install
modelgrid container list/add/remove/start/stop/logs
modelgrid model list/pull/remove/status/refresh
modelgrid config show/init/apikey

API Endpoints

  • POST /v1/chat/completions - Chat completion (OpenAI-compatible)
  • GET /v1/models - List available models
  • POST /v1/embeddings - Generate embeddings
  • GET /health - Health check
  • GET /metrics - Prometheus metrics
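
A client talks to the gateway exactly as it would to the OpenAI API. The following sketch builds a chat request in the OpenAI wire format (port, API key, and model name are placeholders):

```typescript
interface IChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Builds the URL and fetch init for POST /v1/chat/completions.
function buildChatRequest(apiKey: string, model: string, messages: IChatMessage[]) {
  return {
    url: 'http://localhost:8080/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`, // checked by api/middleware/auth.ts
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

Usage would then be `fetch(req.url, req.init)` against a running daemon.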

Development Notes

  • Prefer async patterns throughout for flexibility
  • Use fs.promises instead of synchronous fs methods
  • Containers auto-start on daemon startup
  • Models are auto-preloaded if configured
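
The fs.promises convention looks like this in practice (a self-contained sketch: the real daemon would read /etc/modelgrid/config.json, while this demo writes to a temp file so it runs anywhere):

```typescript
import { promises as fs } from 'fs';
import * as os from 'os';
import * as path from 'path';

// Non-blocking config read, per the fs.promises convention above.
async function readConfig(configPath: string): Promise<Record<string, unknown>> {
  const raw = await fs.readFile(configPath, 'utf8');
  return JSON.parse(raw);
}

// Demo round-trip using a temp file in place of /etc/modelgrid/config.json.
async function demo(): Promise<string> {
  const tmp = path.join(os.tmpdir(), 'modelgrid-demo-config.json');
  await fs.writeFile(tmp, JSON.stringify({ version: '1.0.0' }));
  const config = await readConfig(tmp);
  return String(config.version);
}
```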