# ModelGrid Project Hints

## Project Overview

ModelGrid is a root-level daemon that manages GPU infrastructure, Docker, and AI model containers (Ollama, vLLM, TGI) behind an OpenAI-compatible API.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        ModelGrid Daemon                         │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │     CLI     │  │  Hardware   │  │    Container Manager    │  │
│  │  Commands   │  │  Detection  │  │     (Docker/Podman)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Driver    │  │    Model    │  │   OpenAI API Gateway    │  │
│  │  Installer  │  │  Registry   │  │      (HTTP Server)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                         Systemd Service                         │
└─────────────────────────────────────────────────────────────────┘
```
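
The composition above can be sketched as a facade that wires the subsystems together at startup. All class and method names below are illustrative assumptions, not the project's actual API:

```typescript
// Sketch of the daemon composing its subsystems behind a facade.
// Every name here is hypothetical; only the subsystem roles come
// from the diagram above.
class HardwareDetector {
  detect(): string[] {
    return []; // would return identifiers of detected GPUs
  }
}

class ContainerManager {
  startAll(): void {
    // would start the configured AI containers
  }
}

class ApiGateway {
  listen(port: number): void {
    // would start the OpenAI-compatible HTTP server
  }
}

class ModelGridDaemon {
  constructor(
    private hardware = new HardwareDetector(),
    private containers = new ContainerManager(),
    private gateway = new ApiGateway(),
  ) {}

  start(port: number): string[] {
    const gpus = this.hardware.detect(); // 1. detect hardware
    this.containers.startAll();          // 2. bring up containers
    this.gateway.listen(port);           // 3. expose the API
    return gpus;
  }
}
```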

## File Organization

```
ts/
├── index.ts                  # Node.js entry point
├── cli.ts                    # CLI router
├── modelgrid.ts              # Main coordinator (facade)
├── daemon.ts                 # Background daemon
├── systemd.ts                # Systemd integration
├── constants.ts              # Configuration constants
├── logger.ts                 # Logging utilities
├── colors.ts                 # Color themes
├── interfaces/               # TypeScript interfaces
│   ├── config.ts             # IModelGridConfig
│   ├── gpu.ts                # IGpuInfo, IGpuStatus
│   ├── container.ts          # IContainerConfig, IContainerStatus
│   └── api.ts                # OpenAI API types
├── hardware/                 # Hardware detection
│   ├── gpu-detector.ts       # Detect GPUs (NVIDIA, AMD, Intel)
│   └── system-info.ts        # CPU, RAM info
├── drivers/                  # Driver management
│   ├── nvidia.ts             # NVIDIA driver + CUDA
│   ├── amd.ts                # AMD driver + ROCm
│   ├── intel.ts              # Intel Arc + oneAPI
│   └── driver-manager.ts     # Driver orchestrator
├── docker/                   # Docker management
│   ├── docker-manager.ts     # Docker setup
│   └── container-runtime.ts  # Container lifecycle
├── containers/               # AI container management
│   ├── ollama.ts             # Ollama container
│   ├── vllm.ts               # vLLM container
│   ├── tgi.ts                # TGI container
│   └── container-manager.ts  # Orchestrator
├── models/                   # Model management
│   ├── registry.ts           # Greenlit model registry
│   └── loader.ts             # Model loading with VRAM checks
├── api/                      # OpenAI-compatible API
│   ├── server.ts             # HTTP server
│   ├── router.ts             # Request routing
│   ├── handlers/             # API endpoint handlers
│   │   ├── chat.ts           # /v1/chat/completions
│   │   ├── models.ts         # /v1/models
│   │   └── embeddings.ts     # /v1/embeddings
│   └── middleware/           # Request processing
│       ├── auth.ts           # API key validation
│       └── sanity.ts         # Request validation
├── cli/                      # CLI handlers
│   ├── service-handler.ts
│   ├── gpu-handler.ts
│   ├── container-handler.ts
│   ├── model-handler.ts
│   └── config-handler.ts
└── helpers/                  # Utilities
    ├── prompt.ts             # Readline utility
    └── shortid.ts            # ID generation
```

## Key Concepts

### Greenlit Model System

- Only pre-approved ("greenlit") models can be auto-pulled, as a security measure
- The greenlist is fetched from a remote URL (configurable)
- VRAM requirements are checked before a model is loaded
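
The gate described above can be sketched as follows. This is a minimal illustration; the `IGreenlistEntry` shape, the `canAutoPull` name, and its fields are assumptions, not the project's actual code:

```typescript
// Hypothetical sketch of a greenlist gate with a VRAM check.
interface IGreenlistEntry {
  name: string;      // e.g. "llama3:8b"
  minVramMb: number; // VRAM needed to load the model
}

function canAutoPull(
  model: string,
  greenlist: IGreenlistEntry[],
  freeVramMb: number,
): { ok: boolean; reason?: string } {
  const entry = greenlist.find((e) => e.name === model);
  if (!entry) {
    // Not on the greenlist: refuse to auto-pull.
    return { ok: false, reason: `${model} is not greenlit` };
  }
  if (entry.minVramMb > freeVramMb) {
    // On the greenlist, but it would not fit in free VRAM.
    return {
      ok: false,
      reason: `needs ${entry.minVramMb} MiB VRAM, only ${freeVramMb} free`,
    };
  }
  return { ok: true };
}
```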

### Container Types

- **Ollama**: Easy to use, native API converted to OpenAI format
- **vLLM**: High performance, natively OpenAI-compatible
- **TGI**: HuggingFace Text Generation Inference

### GPU Support

- NVIDIA: nvidia-smi, CUDA, nvidia-docker2
- AMD: rocm-smi, ROCm
- Intel Arc: xpu-smi, oneAPI
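
For NVIDIA, detection typically means parsing `nvidia-smi` output, e.g. from `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`. A sketch of the parsing side (the `IGpuInfo` shape here is illustrative, not necessarily what `interfaces/gpu.ts` defines):

```typescript
// Parse the CSV output of
//   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits
// into a simple structure. One GPU per line, fields comma-separated.
interface IGpuInfo {
  index: number;        // order in nvidia-smi output
  name: string;         // e.g. "NVIDIA GeForce RTX 4090"
  memoryTotalMb: number;
}

function parseNvidiaSmiCsv(output: string): IGpuInfo[] {
  return output
    .trim()
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line, index) => {
      const [name, memory] = line.split(",").map((s) => s.trim());
      return { index, name, memoryTotalMb: Number(memory) };
    });
}
```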

## Configuration

Config file: `/etc/modelgrid/config.json`

```typescript
interface IModelGridConfig {
  version: string;
  api: {
    port: number;          // Default: 8080
    host: string;          // Default: '0.0.0.0'
    apiKeys: string[];     // Valid API keys
    cors: boolean;
    corsOrigins: string[];
  };
  docker: {
    networkName: string;   // Default: 'modelgrid'
    runtime: 'docker' | 'podman';
  };
  gpus: {
    autoDetect: boolean;
    assignments: Record<string, string>;
  };
  containers: IContainerConfig[];
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  checkInterval: number;
}
```
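
An illustrative `/etc/modelgrid/config.json` matching the interface above. Every value here is made up (including the greenlist URL, API key, and model name), and the unit of `checkInterval` is not specified by this document:

```json
{
  "version": "1.0.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["sk-local-example"],
    "cors": true,
    "corsOrigins": ["*"]
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://example.com/greenlist.json",
    "autoPull": true,
    "defaultContainer": "ollama",
    "autoLoad": []
  },
  "checkInterval": 30
}
```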

## CLI Commands

```bash
modelgrid service enable/disable/start/stop/status/logs
modelgrid gpu list/status/drivers/install
modelgrid container list/add/remove/start/stop/logs
modelgrid model list/pull/remove/status/refresh
modelgrid config show/init/apikey
```

## API Endpoints

- `POST /v1/chat/completions` - Chat completion (OpenAI-compatible)
- `GET /v1/models` - List available models
- `POST /v1/embeddings` - Generate embeddings
- `GET /health` - Health check
- `GET /metrics` - Prometheus metrics
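
Because the gateway is OpenAI-compatible, a client request is just an OpenAI-shaped POST with a bearer key. A sketch of building one (the host, port, key, and model name are placeholders):

```typescript
// Build an OpenAI-style chat completion request for the gateway.
// Host, port, API key, and model name below are placeholders.
interface IChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

function buildChatRequest(
  model: string,
  prompt: string,
): { url: string; headers: Record<string, string>; body: string } {
  const payload: IChatRequest = {
    model,
    messages: [{ role: "user", content: prompt }],
  };
  return {
    url: "http://localhost:8080/v1/chat/completions",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer sk-local-example", // a key from config.api.apiKeys
    },
    body: JSON.stringify(payload),
  };
}

// Usage (against a running daemon):
// const { url, headers, body } = buildChatRequest("llama3:8b", "Hello!");
// const res = await fetch(url, { method: "POST", headers, body });
```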

## Development Notes

- Prefer async patterns throughout for flexibility
- Use `fs.promises` instead of sync methods
- Containers auto-start on daemon startup
- Models auto-preload if configured