# ModelGrid
**GPU infrastructure management daemon with OpenAI-compatible API for AI model containers.**
ModelGrid is a root-level daemon that manages GPU infrastructure, Docker containers, and AI model serving. It provides an OpenAI-compatible API interface for seamless integration with existing tools and applications.
## Features
- **Multi-GPU Support**: Detect and manage NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) GPUs
- **Container Management**: Orchestrate Ollama, vLLM, and TGI containers with GPU passthrough
- **OpenAI-Compatible API**: Drop-in replacement API for chat completions, embeddings, and model management
- **Greenlit Models**: Controlled model auto-pulling with remote configuration
- **Systemd Integration**: Run as a system service with automatic startup
- **Cross-Platform**: Pre-compiled binaries for Linux, macOS, and Windows
## Quick Start
### Installation
```bash
# Via npm (recommended)
npm install -g @modelgrid.com/modelgrid

# Via installer script
curl -sSL https://code.foss.global/modelgrid.com/modelgrid/raw/branch/main/install.sh | sudo bash
```
### Initial Setup
```bash
# 1. Check GPU detection
sudo modelgrid gpu list

# 2. Initialize configuration
sudo modelgrid config init

# 3. Enable and start the service
sudo modelgrid service enable
sudo modelgrid service start

# 4. Check status
modelgrid service status
```
### Using the API
Once running, ModelGrid exposes an OpenAI-compatible API:
```bash
# List available models
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
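The same calls translate directly to code. Below is a minimal sketch in TypeScript using the built-in `fetch`; the base URL, path, and API key placeholder mirror the curl examples above, and the response shape assumed in the comment is the standard OpenAI one:

```typescript
// Minimal sketch of calling ModelGrid's OpenAI-compatible API from TypeScript.
// Base URL and API key are placeholders matching the curl examples above.
const baseUrl = "http://localhost:8080";
const apiKey = "YOUR_API_KEY";

// Request payload, identical in shape to the curl example.
const body = JSON.stringify({
  model: "llama3:8b",
  messages: [{ role: "user", content: "Hello!" }],
});

async function chat(): Promise<string> {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body,
  });
  const completion = await res.json();
  // OpenAI-style responses carry the reply at choices[0].message.content.
  return completion.choices[0].message.content;
}
```

Because the API is OpenAI-compatible, existing OpenAI client libraries should also work by overriding their base URL to point at the ModelGrid endpoint.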
## CLI Commands
### Service Management
```bash
modelgrid service enable     # Install and enable systemd service
modelgrid service disable    # Stop and disable systemd service
modelgrid service start      # Start the service
modelgrid service stop       # Stop the service
modelgrid service status     # Show service status
modelgrid service logs       # Show service logs
```
### GPU Management
```bash
modelgrid gpu list       # List detected GPUs
modelgrid gpu status     # Show GPU utilization
modelgrid gpu drivers    # Check/install GPU drivers
```
### Container Management
```bash
modelgrid container add       # Add a new container
modelgrid container remove    # Remove a container
modelgrid container list      # List all containers
modelgrid container start     # Start a container
modelgrid container stop      # Stop a container
```
### Model Management
```bash
modelgrid model list             # List available/loaded models
modelgrid model pull <name>      # Pull a model
modelgrid model remove <name>    # Remove a model
```
### Configuration
```bash
modelgrid config show    # Display current configuration
modelgrid config init    # Initialize configuration
```
## Configuration
Configuration is stored at `/etc/modelgrid/config.json`:
```json
{
  "version": "1.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["your-api-key-here"]
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://code.foss.global/modelgrid.com/model_lists/raw/branch/main/greenlit.json",
    "autoPull": true,
    "defaultContainer": "ollama",
    "autoLoad": []
  },
  "checkInterval": 30000
}
```
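For readers working against this file from code, the example config can be described with a TypeScript interface. The field names below are taken from the JSON example above, not from the daemon's actual source, so treat this as an illustrative typing only:

```typescript
// Illustrative typing of /etc/modelgrid/config.json, derived from the
// example above (not ModelGrid's real internal types).
interface ModelGridConfig {
  version: string;
  api: { port: number; host: string; apiKeys: string[] };
  docker: { networkName: string; runtime: string };
  gpus: { autoDetect: boolean; assignments: Record<string, string> };
  containers: unknown[];
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  // The example value 30000 suggests this is milliseconds, but that is
  // an assumption from the sample, not documented behavior.
  checkInterval: number;
}

// The example file, expressed as a typed object:
const config: ModelGridConfig = {
  version: "1.0",
  api: { port: 8080, host: "0.0.0.0", apiKeys: ["your-api-key-here"] },
  docker: { networkName: "modelgrid", runtime: "docker" },
  gpus: { autoDetect: true, assignments: {} },
  containers: [],
  models: {
    greenlistUrl:
      "https://code.foss.global/modelgrid.com/model_lists/raw/branch/main/greenlit.json",
    autoPull: true,
    defaultContainer: "ollama",
    autoLoad: [],
  },
  checkInterval: 30000,
};
```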
## Supported Container Types
### Ollama
Best for general-purpose model serving with easy model management.
```bash
modelgrid container add --type ollama --gpu gpu-0
```
### vLLM
High-performance serving for large models with tensor parallelism.
```bash
modelgrid container add --type vllm --gpu gpu-0,gpu-1
```
### TGI (Text Generation Inference)
HuggingFace's production-ready inference server.
```bash
modelgrid container add --type tgi --gpu gpu-0
```
## GPU Support
### NVIDIA (CUDA)
Requires NVIDIA drivers and NVIDIA Container Toolkit:
```bash
# Check driver status
modelgrid gpu drivers

# Install if needed (Ubuntu/Debian)
sudo apt install nvidia-driver-535 nvidia-container-toolkit
```
### AMD (ROCm)
Requires ROCm drivers:
```bash
# Check driver status
modelgrid gpu drivers
```
### Intel Arc (oneAPI)
Requires Intel GPU drivers and oneAPI toolkit:
```bash
# Check driver status
modelgrid gpu drivers
```
## Greenlit Models
ModelGrid uses a greenlit model system to control which models can be auto-pulled. The greenlist is fetched from a configurable URL and contains approved models with VRAM requirements:
```json
{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}
```
When a request comes in for a model not currently loaded:
1. Check if model is in the greenlist
2. Verify VRAM requirements can be met
3. Auto-pull and load the model
4. Serve the request
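The first two steps, the gating decision, can be sketched as a small function over the greenlist shape shown in the JSON example. The function and variable names here are illustrative, not ModelGrid's actual API; pulling and serving (steps 3 and 4) are side effects left out of the sketch:

```typescript
// One greenlist entry, matching the JSON example above.
interface GreenlitModel {
  name: string;
  container: string;
  minVram: number; // required VRAM in GiB
}

// Steps 1-2 of the flow: is the model greenlit, and does free VRAM suffice?
function canAutoPull(
  model: string,
  greenlist: GreenlitModel[],
  freeVramGib: number,
): boolean {
  const entry = greenlist.find((m) => m.name === model);
  if (!entry) return false; // step 1: not greenlit, reject
  return freeVramGib >= entry.minVram; // step 2: VRAM check
}

const greenlist: GreenlitModel[] = [
  { name: "llama3:8b", container: "ollama", minVram: 8 },
  { name: "llama3:70b", container: "vllm", minVram: 48 },
];

console.log(canAutoPull("llama3:8b", greenlist, 24)); // true
console.log(canAutoPull("llama3:70b", greenlist, 24)); // false
```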
## API Reference
### Chat Completions
```
POST /v1/chat/completions
```
OpenAI-compatible chat completion endpoint with streaming support.
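When streaming is enabled, OpenAI-compatible endpoints conventionally emit server-sent events: `data: {json}` lines carrying token deltas, terminated by `data: [DONE]`. A minimal parsing sketch for one such line (the sample chunk below is made up for illustration):

```typescript
// Extract the token text from one OpenAI-style SSE line, or null if the
// line is not a data event or is the [DONE] terminator.
function parseSseChunk(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return null;
  const event = JSON.parse(payload);
  // Streaming responses put each token under choices[0].delta.content.
  return event.choices?.[0]?.delta?.content ?? null;
}

const sample = 'data: {"choices":[{"delta":{"content":"Hi"}}]}';
console.log(parseSseChunk(sample)); // "Hi"
console.log(parseSseChunk("data: [DONE]")); // null
```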
### Models
```
GET /v1/models
GET /v1/models/:model
```
List available models or get details for a specific model.
### Embeddings
```
POST /v1/embeddings
```
Generate text embeddings using compatible models.
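A typical use of the returned vectors is comparing texts by cosine similarity. A sketch, computed over the kind of `data[i].embedding` arrays an OpenAI-style embeddings response contains (the 3-dimensional vectors below are made-up stand-ins for real model output):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up low-dimensional stand-ins for real embedding output:
const v1 = [0.1, 0.2, 0.3];
const v2 = [0.1, 0.2, 0.3];
console.log(cosineSimilarity(v1, v2).toFixed(2)); // "1.00"
```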
## Development
### Building from Source
```bash
# Clone repository
git clone https://code.foss.global/modelgrid.com/modelgrid.git
cd modelgrid

# Run directly with Deno
deno run --allow-all mod.ts help

# Compile for current platform
deno compile --allow-all --output modelgrid mod.ts

# Compile for all platforms
bash scripts/compile-all.sh
```
### Project Structure
```
modelgrid/
├── mod.ts              # Entry point
├── ts/
│   ├── cli.ts          # CLI command routing
│   ├── modelgrid.ts    # Main coordinator class
│   ├── daemon.ts       # Background daemon
│   ├── systemd.ts      # Systemd service management
│   ├── constants.ts    # Configuration constants
│   ├── interfaces/     # TypeScript interfaces
│   ├── hardware/       # GPU detection
│   ├── drivers/        # Driver management
│   ├── docker/         # Docker management
│   ├── containers/     # Container orchestration
│   ├── api/            # OpenAI-compatible API
│   ├── models/         # Model management
│   └── cli/            # CLI handlers
├── test/               # Test files
└── scripts/            # Build scripts
```
## License
MIT License - See [license](./license) for details.
## Links
- Repository: https://code.foss.global/modelgrid.com/modelgrid
- Issues: https://community.foss.global/