# ModelGrid

**GPU infrastructure management daemon with an OpenAI-compatible API for AI model containers.**

ModelGrid is a root-level daemon that manages GPU infrastructure, Docker containers, and AI model serving. It exposes an OpenAI-compatible API, so existing OpenAI clients and tools can integrate with it directly.

## Features

- **Multi-GPU Support**: Detect and manage NVIDIA (CUDA), AMD (ROCm), and Intel Arc (oneAPI) GPUs
- **Container Management**: Orchestrate Ollama, vLLM, and TGI containers with GPU passthrough
- **OpenAI-Compatible API**: Drop-in replacement API for chat completions, embeddings, and model management
- **Greenlit Models**: Controlled model auto-pulling with remote configuration
- **Systemd Integration**: Run as a system service with automatic startup
- **Cross-Platform**: Pre-compiled binaries for Linux, macOS, and Windows

## Quick Start

### Installation

```bash
# Via npm (recommended)
npm install -g @modelgrid.com/modelgrid

# Via installer script
curl -sSL https://code.foss.global/modelgrid.com/modelgrid/raw/branch/main/install.sh | sudo bash
```

### Initial Setup

```bash
# 1. Check GPU detection
sudo modelgrid gpu list

# 2. Initialize configuration
sudo modelgrid config init

# 3. Enable and start the service
sudo modelgrid service enable
sudo modelgrid service start

# 4. Check status
modelgrid service status
```

### Using the API

Once the service is running, ModelGrid exposes an OpenAI-compatible API. The examples below assume port `8080`, as in the example configuration, and an API key listed under `api.apiKeys`:

```bash
# List available models
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
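
The same chat completion call can be made from Python using only the standard library. This is a minimal sketch, not an official client: `build_chat_request` and `send` are hypothetical helper names, the base URL assumes port `8080` as in the curl examples, and the response is assumed to follow the OpenAI chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumptions: port 8080 as in the curl examples; replace the placeholder key.
BASE_URL = "http://localhost:8080/v1"
API_KEY = "YOUR_API_KEY"


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL,
                       api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, with the service running:
#   reply = send(build_chat_request("llama3:8b", "Hello!"))
#   print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes the payload easy to inspect or log before anything goes over the network.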

## CLI Commands

### Service Management

```bash
modelgrid service enable    # Install and enable systemd service
modelgrid service disable   # Stop and disable systemd service
modelgrid service start     # Start the service
modelgrid service stop      # Stop the service
modelgrid service status    # Show service status
modelgrid service logs      # Show service logs
```

### GPU Management

```bash
modelgrid gpu list          # List detected GPUs
modelgrid gpu status        # Show GPU utilization
modelgrid gpu drivers       # Check/install GPU drivers
```

### Container Management

```bash
modelgrid container add     # Add a new container
modelgrid container remove  # Remove a container
modelgrid container list    # List all containers
modelgrid container start   # Start a container
modelgrid container stop    # Stop a container
```

### Model Management

```bash
modelgrid model list            # List available/loaded models
modelgrid model pull <name>     # Pull a model
modelgrid model remove <name>   # Remove a model
```

### Configuration

```bash
modelgrid config show       # Display current configuration
modelgrid config init       # Initialize configuration
```

## Configuration

Configuration is stored at `/etc/modelgrid/config.json`:

```json
{
  "version": "1.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["your-api-key-here"]
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://code.foss.global/modelgrid.com/model_lists/raw/branch/main/greenlit.json",
    "autoPull": true,
    "defaultContainer": "ollama",
    "autoLoad": []
  },
  "checkInterval": 30000
}
```
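
Before restarting the service after an edit, it can help to sanity-check the file. The sketch below is illustrative only: `load_config` and `check_config` are hypothetical helpers, and the rules they apply are assumptions chosen for this example, not ModelGrid's own validation logic.

```python
import json
from pathlib import Path

# Path taken from the README; adjust if your installation differs.
CONFIG_PATH = Path("/etc/modelgrid/config.json")
PLACEHOLDER_KEY = "your-api-key-here"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read and parse the JSON configuration file."""
    return json.loads(path.read_text(encoding="utf-8"))


def check_config(config: dict) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    api = config.get("api", {})
    models = config.get("models", {})

    # A TCP port must be an integer in the range 1..65535.
    port = api.get("port")
    if type(port) is not int or not 1 <= port <= 65535:
        warnings.append("api.port should be an integer between 1 and 65535")

    # Flag a missing key list or the sample key from the README.
    keys = api.get("apiKeys") or []
    if not keys:
        warnings.append("api.apiKeys is empty")
    elif PLACEHOLDER_KEY in keys:
        warnings.append("api.apiKeys still contains the placeholder key")

    # Auto-pulling relies on a greenlist, so a URL should be present.
    if models.get("autoPull") and not models.get("greenlistUrl"):
        warnings.append("models.autoPull is enabled but models.greenlistUrl is not set")

    return warnings


# Usage: for line in check_config(load_config()): print("warning:", line)
```

Since `"host": "0.0.0.0"` listens on all network interfaces, replacing the placeholder API key is worth doing before the service is exposed beyond localhost.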

## Supported Container Types

### Ollama

Best for general-purpose model serving with easy model management.

```bash
modelgrid container add --type ollama --gpu gpu-0
```

### vLLM

High-performance serving for large models with tensor parallelism.

```bash
modelgrid container add --type vllm --gpu gpu-0,gpu-1
```

### TGI (Text Generation Inference)

HuggingFace's production-ready inference server.

```bash
modelgrid container add --type tgi --gpu gpu-0
```

## GPU Support

### NVIDIA (CUDA)

Requires NVIDIA drivers and the NVIDIA Container Toolkit:

```bash
# Check driver status
modelgrid gpu drivers

# Install if needed (Ubuntu/Debian)
# Note: nvidia-container-toolkit may require NVIDIA's package repository to be configured first
sudo apt install nvidia-driver-535 nvidia-container-toolkit
```

### AMD (ROCm)

Requires ROCm drivers:

```bash
# Check driver status
modelgrid gpu drivers
```

### Intel Arc (oneAPI)

Requires Intel GPU drivers and oneAPI toolkit:

```bash
# Check driver status
modelgrid gpu drivers
```

## Greenlit Models

ModelGrid uses a greenlit model system to control which models can be auto-pulled. The greenlist is fetched from a configurable URL (`models.greenlistUrl` in the configuration) and lists each approved model with its container type and minimum VRAM requirement:

```json
{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}
```

When a request arrives for a model that is not currently loaded, ModelGrid:

1. Checks whether the model is in the greenlist
2. Verifies that its VRAM requirements can be met
3. Auto-pulls and loads the model
4. Serves the request
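
The decision made in steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not ModelGrid's actual implementation: `GreenlitModel`, `parse_greenlist`, and `decide` are hypothetical names, `minVram` is assumed to be in gigabytes, and the amount of free VRAM is assumed to be known to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreenlitModel:
    name: str
    container: str
    min_vram: float  # assumption: gigabytes, same unit as free_vram below


def parse_greenlist(document: dict) -> dict[str, GreenlitModel]:
    """Index the greenlist entries by model name."""
    return {
        entry["name"]: GreenlitModel(entry["name"], entry["container"], entry["minVram"])
        for entry in document.get("models", [])
    }


def decide(model_name: str,
           greenlist: dict[str, GreenlitModel],
           free_vram: float) -> tuple[bool, str]:
    """Return (allowed, reason) for auto-pulling a model that is not loaded."""
    entry = greenlist.get(model_name)
    if entry is None:
        # Step 1 failed: the model has not been approved.
        return False, "model is not in the greenlist"
    if free_vram < entry.min_vram:
        # Step 2 failed: not enough VRAM available.
        return False, f"needs {entry.min_vram} GB VRAM, only {free_vram} GB free"
    # Steps 3 and 4 (pull, load, serve) would follow here.
    return True, f"pull and load via {entry.container}"
```

With the example greenlist and 24 GB of free VRAM, `llama3:8b` would be allowed, while `llama3:70b` would be rejected for insufficient VRAM and an unlisted model would be rejected at step 1.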

## API Reference

### Chat Completions

```
POST /v1/chat/completions
```

OpenAI-compatible chat completion endpoint with streaming support.
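
For streaming, the OpenAI format sends server-sent events: one `data: {json}` line per chunk and a final `data: [DONE]` marker. Assuming ModelGrid follows that format when `"stream": true` is set in the request, a client could reassemble the text as sketched below. The helper name `iter_stream_text` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments contained in an OpenAI-style SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk usually carries only the role, so content may be absent.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response read line by line or from a recorded transcript when testing.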

### Models

```
GET /v1/models
GET /v1/models/:model
```

List available models or get details for a specific model.
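
A client often wants to pick the first available model from a preference list. The sketch below assumes the list response follows the OpenAI shape (`{"object": "list", "data": [{"id": ...}]}`); `model_ids` and `first_available` are hypothetical helper names.

```python
from typing import Optional


def model_ids(response: dict) -> list[str]:
    """Extract the model identifiers from a /v1/models response body."""
    return [item["id"] for item in response.get("data", [])]


def first_available(preferred: list[str], response: dict) -> Optional[str]:
    """Return the first preferred model that the server reports, if any."""
    available = set(model_ids(response))
    for name in preferred:
        if name in available:
            return name
    return None
```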

### Embeddings

```
POST /v1/embeddings
```

Generate text embeddings using compatible models.
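
A typical use of embeddings is comparing two texts by cosine similarity. The sketch below assumes the response follows the OpenAI embeddings shape, where each entry in `data` carries an `index` and an `embedding` list; `embedding_vectors` and `cosine_similarity` are hypothetical helper names.

```python
import math


def embedding_vectors(response: dict) -> list[list[float]]:
    """Return the embedding vectors ordered by their input index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Values close to 1 indicate that the two inputs are semantically similar according to the embedding model; values near 0 indicate little relation.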

## Development

### Building from Source

```bash
# Clone repository
git clone https://code.foss.global/modelgrid.com/modelgrid.git
cd modelgrid

# Run directly with Deno
deno run --allow-all mod.ts help

# Compile for current platform
deno compile --allow-all --output modelgrid mod.ts

# Compile for all platforms
bash scripts/compile-all.sh
```

### Project Structure

```
modelgrid/
├── mod.ts                 # Entry point
├── ts/
│   ├── cli.ts             # CLI command routing
│   ├── modelgrid.ts       # Main coordinator class
│   ├── daemon.ts          # Background daemon
│   ├── systemd.ts         # Systemd service management
│   ├── constants.ts       # Configuration constants
│   ├── interfaces/        # TypeScript interfaces
│   ├── hardware/          # GPU detection
│   ├── drivers/           # Driver management
│   ├── docker/            # Docker management
│   ├── containers/        # Container orchestration
│   ├── api/               # OpenAI-compatible API
│   ├── models/            # Model management
│   └── cli/               # CLI handlers
├── test/                  # Test files
└── scripts/               # Build scripts
```

## License

MIT License - See [license](./license) for details.

## Links

- Repository: https://code.foss.global/modelgrid.com/modelgrid
- Issues: https://community.foss.global/