# ModelGrid
**GPU infrastructure management daemon with OpenAI-compatible API for AI model containers.**
ModelGrid is a root-level daemon that manages GPU infrastructure, Docker containers, and AI model serving. It provides an OpenAI-compatible API interface for seamless integration with existing tools and applications.
## Features
- **Multi-GPU Support**: Detect and manage NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) GPUs
- **Container Management**: Orchestrate Ollama, vLLM, and TGI containers with GPU passthrough
- **OpenAI-Compatible API**: Drop-in replacement API for chat completions, embeddings, and model management
- **Greenlit Models**: Controlled model auto-pulling with remote configuration
- **Systemd Integration**: Run as a system service with automatic startup
- **Cross-Platform**: Pre-compiled binaries for Linux, macOS, and Windows
## Quick Start
### Installation
```bash
# Via npm (recommended)
npm install -g @modelgrid.com/modelgrid

# Via installer script
curl -sSL https://code.foss.global/modelgrid.com/modelgrid/raw/branch/main/install.sh | sudo bash
```
### Initial Setup
```bash
# 1. Check GPU detection
sudo modelgrid gpu list

# 2. Initialize configuration
sudo modelgrid config init

# 3. Enable and start the service
sudo modelgrid service enable
sudo modelgrid service start

# 4. Check status
modelgrid service status
```
### Using the API
Once running, ModelGrid exposes an OpenAI-compatible API:
```bash
# List available models
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
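The same calls translate directly to code. Below is a minimal sketch in TypeScript using the built-in `fetch`; the base URL, path, and API key placeholder mirror the curl examples above, and the response shape assumed in the comment is the standard OpenAI one:

```typescript
// Minimal sketch of calling ModelGrid's OpenAI-compatible API from TypeScript.
// Base URL and API key are placeholders matching the curl examples above.
const baseUrl = "http://localhost:8080";
const apiKey = "YOUR_API_KEY";

// Request payload, identical in shape to the curl example.
const body = JSON.stringify({
  model: "llama3:8b",
  messages: [{ role: "user", content: "Hello!" }],
});

async function chat(): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body,
  });
  const completion = await res.json();
  // OpenAI-style responses carry the reply at choices[0].message.content.
  return completion.choices[0].message.content;
}
```

Because the API is OpenAI-compatible, existing OpenAI client libraries should also work by overriding their base URL to point at the ModelGrid endpoint.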
## CLI Commands
### Service Management
```bash
modelgrid service enable     # Install and enable systemd service
modelgrid service disable    # Stop and disable systemd service
modelgrid service start      # Start the service
modelgrid service stop       # Stop the service
modelgrid service status     # Show service status
modelgrid service logs       # Show service logs
```
### GPU Management
```bash
modelgrid gpu list       # List detected GPUs
modelgrid gpu status     # Show GPU utilization
modelgrid gpu drivers    # Check/install GPU drivers
```
### Container Management
```bash
modelgrid container add       # Add a new container
modelgrid container remove    # Remove a container
modelgrid container list      # List all containers
modelgrid container start     # Start a container
modelgrid container stop      # Stop a container
```
### Model Management
```bash
modelgrid model list             # List available/loaded models
modelgrid model pull <name>      # Pull a model
modelgrid model remove <name>    # Remove a model
```
### Configuration
```bash
modelgrid config show    # Display current configuration
modelgrid config init    # Initialize configuration
```
## Configuration
Configuration is stored at `/etc/modelgrid/config.json`:
```json
{
  "version": "1.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["your-api-key-here"]
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://code.foss.global/modelgrid.com/model_lists/raw/branch/main/greenlit.json",
    "autoPull": true,
    "defaultContainer": "ollama",
    "autoLoad": []
  },
  "checkInterval": 30000
}
```
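For readers working against this file from code, the example config can be described with a TypeScript interface. The field names below are taken from the JSON example above, not from the daemon's actual source, so treat this as an illustrative typing only:

```typescript
// Illustrative typing of /etc/modelgrid/config.json, derived from the
// example above (not ModelGrid's real internal types).
interface ModelGridConfig {
  version: string;
  api: { port: number; host: string; apiKeys: string[] };
  docker: { networkName: string; runtime: string };
  gpus: { autoDetect: boolean; assignments: Record<string, string> };
  containers: unknown[];
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  // The example value 30000 suggests this is milliseconds, but that is
  // an assumption from the sample, not documented behavior.
  checkInterval: number;
}

// The example file, expressed as a typed object:
const config: ModelGridConfig = {
  version: "1.0",
  api: { port: 8080, host: "0.0.0.0", apiKeys: ["your-api-key-here"] },
  docker: { networkName: "modelgrid", runtime: "docker" },
  gpus: { autoDetect: true, assignments: {} },
  containers: [],
  models: {
    greenlistUrl:
      "https://code.foss.global/modelgrid.com/model_lists/raw/branch/main/greenlit.json",
    autoPull: true,
    defaultContainer: "ollama",
    autoLoad: [],
  },
  checkInterval: 30000,
};
```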
## Supported Container Types
### Ollama
Best for general-purpose model serving with easy model management.
```bash
modelgrid container add --type ollama --gpu gpu-0
```
### vLLM
High-performance serving for large models with tensor parallelism.
```bash
modelgrid container add --type vllm --gpu gpu-0,gpu-1
```
### TGI (Text Generation Inference)
HuggingFace's production-ready inference server.
```bash
modelgrid container add --type tgi --gpu gpu-0
```
## GPU Support
### NVIDIA (CUDA)
Requires NVIDIA drivers and NVIDIA Container Toolkit:
```bash
# Check driver status
modelgrid gpu drivers

# Install if needed (Ubuntu/Debian)
sudo apt install nvidia-driver-535 nvidia-container-toolkit
```
### AMD (ROCm)
Requires ROCm drivers:
```bash
# Check driver status
modelgrid gpu drivers
```
### Intel Arc (oneAPI)
Requires Intel GPU drivers and oneAPI toolkit:
```bash
# Check driver status
modelgrid gpu drivers
```
## Greenlit Models
ModelGrid uses a greenlit model system to control which models can be auto-pulled. The greenlist is fetched from a configurable URL and contains approved models with VRAM requirements:
```json
{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}
```
When a request comes in for a model not currently loaded:
1. Check if model is in the greenlist
2. Verify VRAM requirements can be met
3. Auto-pull and load the model
4. Serve the request
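The first two steps, the gating decision, can be sketched as a small function over the greenlist shape shown in the JSON example. The function and variable names here are illustrative, not ModelGrid's actual API; pulling and serving (steps 3 and 4) are side effects left out of the sketch:

```typescript
// One greenlist entry, matching the JSON example above.
interface GreenlitModel {
  name: string;
  container: string;
  minVram: number; // required VRAM in GiB
}

// Steps 1-2 of the flow: is the model greenlit, and does free VRAM suffice?
function canAutoPull(
  model: string,
  greenlist: GreenlitModel[],
  freeVramGib: number,
): boolean {
  const entry = greenlist.find((m) => m.name === model);
  if (!entry) return false; // step 1: not greenlit, reject
  return freeVramGib >= entry.minVram; // step 2: VRAM check
}

const greenlist: GreenlitModel[] = [
  { name: "llama3:8b", container: "ollama", minVram: 8 },
  { name: "llama3:70b", container: "vllm", minVram: 48 },
];

console.log(canAutoPull("llama3:8b", greenlist, 24)); // true
console.log(canAutoPull("llama3:70b", greenlist, 24)); // false
```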
## API Reference
### Chat Completions
```
POST /v1/chat/completions
```
OpenAI-compatible chat completion endpoint with streaming support.
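When streaming is enabled, OpenAI-compatible endpoints conventionally emit server-sent events: `data: {json}` lines carrying token deltas, terminated by `data: [DONE]`. A minimal parsing sketch for one such line (the sample chunk below is made up for illustration):

```typescript
// Extract the token text from one OpenAI-style SSE line, or null if the
// line is not a data event or is the [DONE] terminator.
function parseSseChunk(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return null;
  const event = JSON.parse(payload);
  // Streaming responses put each token under choices[0].delta.content.
  return event.choices?.[0]?.delta?.content ?? null;
}

const sample = 'data: {"choices":[{"delta":{"content":"Hi"}}]}';
console.log(parseSseChunk(sample)); // "Hi"
console.log(parseSseChunk("data: [DONE]")); // null
```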
### Models
```
GET /v1/models
GET /v1/models/:model
```
List available models or get details for a specific model.
### Embeddings
```
POST /v1/embeddings
```
Generate text embeddings using compatible models.
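A typical use of the returned vectors is comparing texts by cosine similarity. A sketch, computed over the kind of `data[i].embedding` arrays an OpenAI-style embeddings response contains (the 3-dimensional vectors below are made-up stand-ins for real model output):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up low-dimensional stand-ins for real embedding output:
const v1 = [0.1, 0.2, 0.3];
const v2 = [0.1, 0.2, 0.3];
console.log(cosineSimilarity(v1, v2).toFixed(2)); // "1.00"
```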
## Development
### Building from Source
```bash
# Clone repository
git clone https://code.foss.global/modelgrid.com/modelgrid.git
cd modelgrid

# Run directly with Deno
deno run --allow-all mod.ts help

# Compile for current platform
deno compile --allow-all --output modelgrid mod.ts

# Compile for all platforms
bash scripts/compile-all.sh
```
### Project Structure
```
modelgrid/
├── mod.ts              # Entry point
├── ts/
│   ├── cli.ts          # CLI command routing
│   ├── modelgrid.ts    # Main coordinator class
│   ├── daemon.ts       # Background daemon
│   ├── systemd.ts      # Systemd service management
│   ├── constants.ts    # Configuration constants
│   ├── interfaces/     # TypeScript interfaces
│   ├── hardware/       # GPU detection
│   ├── drivers/        # Driver management
│   ├── docker/         # Docker management
│   ├── containers/     # Container orchestration
│   ├── api/            # OpenAI-compatible API
│   ├── models/         # Model management
│   └── cli/            # CLI handlers
├── test/               # Test files
└── scripts/            # Build scripts
```
## License
MIT License - See [license](./license) for details.
## Links
- Repository: https://code.foss.global/modelgrid.com/modelgrid
- Issues: https://community.foss.global/