# ModelGrid Project Hints
## Project Overview
ModelGrid is a root-level daemon that manages GPU infrastructure, Docker, and AI model containers (Ollama, vLLM, TGI) with an OpenAI-compatible API interface.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                         ModelGrid Daemon                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │     CLI     │  │  Hardware   │  │    Container Manager    │  │
│  │  Commands   │  │  Detection  │  │     (Docker/Podman)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Driver    │  │    Model    │  │   OpenAI API Gateway    │  │
│  │  Installer  │  │  Registry   │  │      (HTTP Server)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                         Systemd Service                         │
└─────────────────────────────────────────────────────────────────┘
```
## File Organization
```
ts/
├── index.ts                 # Node.js entry point
├── cli.ts                   # CLI router
├── modelgrid.ts             # Main coordinator (facade)
├── daemon.ts                # Background daemon
├── systemd.ts               # Systemd integration
├── constants.ts             # Configuration constants
├── logger.ts                # Logging utilities
├── colors.ts                # Color themes
├── interfaces/              # TypeScript interfaces
│   ├── config.ts            # IModelGridConfig
│   ├── gpu.ts               # IGpuInfo, IGpuStatus
│   ├── container.ts         # IContainerConfig, IContainerStatus
│   └── api.ts               # OpenAI API types
├── hardware/                # Hardware detection
│   ├── gpu-detector.ts      # Detect GPUs (NVIDIA, AMD, Intel)
│   └── system-info.ts       # CPU, RAM info
├── drivers/                 # Driver management
│   ├── nvidia.ts            # NVIDIA driver + CUDA
│   ├── amd.ts               # AMD driver + ROCm
│   ├── intel.ts             # Intel Arc + oneAPI
│   └── driver-manager.ts    # Driver orchestrator
├── docker/                  # Docker management
│   ├── docker-manager.ts    # Docker setup
│   └── container-runtime.ts # Container lifecycle
├── containers/              # AI container management
│   ├── ollama.ts            # Ollama container
│   ├── vllm.ts              # vLLM container
│   ├── tgi.ts               # TGI container
│   └── container-manager.ts # Orchestrator
├── models/                  # Model management
│   ├── registry.ts          # Greenlit model registry
│   └── loader.ts            # Model loading with VRAM checks
├── api/                     # OpenAI-compatible API
│   ├── server.ts            # HTTP server
│   ├── router.ts            # Request routing
│   ├── handlers/            # API endpoint handlers
│   │   ├── chat.ts          # /v1/chat/completions
│   │   ├── models.ts        # /v1/models
│   │   └── embeddings.ts    # /v1/embeddings
│   └── middleware/          # Request processing
│       ├── auth.ts          # API key validation
│       └── sanity.ts        # Request validation
├── cli/                     # CLI handlers
│   ├── service-handler.ts
│   ├── gpu-handler.ts
│   ├── container-handler.ts
│   ├── model-handler.ts
│   └── config-handler.ts
└── helpers/                 # Utilities
    ├── prompt.ts            # Readline utility
    └── shortid.ts           # ID generation
```
## Key Concepts
### Greenlit Model System
- Only pre-approved models can be auto-pulled for security
- Greenlist fetched from remote URL (configurable)
- VRAM requirements checked before loading
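The greenlist check can be sketched as follows. The entry shape (`IGreenlitModel`) and the helper `canLoadModel` are illustrative, not the actual `registry.ts` / `loader.ts` API:

```typescript
// Illustrative sketch of the greenlist + VRAM gate before a model load.
interface IGreenlitModel {
  name: string;
  minVramMb: number; // minimum free VRAM required to load this model
}

function canLoadModel(
  greenlist: IGreenlitModel[],
  modelName: string,
  freeVramMb: number,
): { ok: boolean; reason?: string } {
  // A model not on the greenlist is never auto-pulled.
  const entry = greenlist.find((m) => m.name === modelName);
  if (!entry) {
    return { ok: false, reason: `model "${modelName}" is not greenlit` };
  }
  // Refuse to load when there is not enough free VRAM.
  if (freeVramMb < entry.minVramMb) {
    return {
      ok: false,
      reason: `needs ${entry.minVramMb} MiB VRAM, only ${freeVramMb} MiB free`,
    };
  }
  return { ok: true };
}
```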
### Container Types
- **Ollama**: Easy to use, native API converted to OpenAI format
- **vLLM**: High performance, natively OpenAI-compatible
- **TGI**: HuggingFace Text Generation Inference
### GPU Support
- NVIDIA: nvidia-smi, CUDA, nvidia-docker2
- AMD: rocm-smi, ROCm
- Intel Arc: xpu-smi, oneAPI
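The vendor-to-tooling mapping above can be expressed as a small lookup table; this is a hypothetical helper, not the real `gpu-detector.ts` or `driver-manager.ts` code:

```typescript
// Illustrative vendor -> query tool / compute stack mapping.
type GpuVendor = 'nvidia' | 'amd' | 'intel';

interface IVendorTooling {
  smiCommand: string;  // CLI used to query GPU status
  driverStack: string; // compute stack installed alongside the driver
}

const VENDOR_TOOLING: Record<GpuVendor, IVendorTooling> = {
  nvidia: { smiCommand: 'nvidia-smi', driverStack: 'CUDA' },
  amd:    { smiCommand: 'rocm-smi',   driverStack: 'ROCm' },
  intel:  { smiCommand: 'xpu-smi',    driverStack: 'oneAPI' },
};

function toolingFor(vendor: GpuVendor): IVendorTooling {
  return VENDOR_TOOLING[vendor];
}
```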
## Configuration
Config file: `/etc/modelgrid/config.json`
```typescript
interface IModelGridConfig {
  version: string;
  api: {
    port: number;          // Default: 8080
    host: string;          // Default: '0.0.0.0'
    apiKeys: string[];     // Valid API keys
    cors: boolean;
    corsOrigins: string[];
  };
  docker: {
    networkName: string;   // Default: 'modelgrid'
    runtime: 'docker' | 'podman';
  };
  gpus: {
    autoDetect: boolean;
    assignments: Record<string, string>;
  };
  containers: IContainerConfig[];
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  checkInterval: number;
}
```
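A minimal config matching the interface above might look like this (all values are illustrative, including the greenlist URL and API key):

```json
{
  "version": "1.0.0",
  "api": {
    "port": 8080,
    "host": "0.0.0.0",
    "apiKeys": ["mg_example_key"],
    "cors": true,
    "corsOrigins": ["https://example.com"]
  },
  "docker": {
    "networkName": "modelgrid",
    "runtime": "docker"
  },
  "gpus": {
    "autoDetect": true,
    "assignments": {}
  },
  "containers": [],
  "models": {
    "greenlistUrl": "https://example.com/greenlist.json",
    "autoPull": true,
    "defaultContainer": "ollama",
    "autoLoad": ["llama3:8b"]
  },
  "checkInterval": 30000
}
```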
## CLI Commands
```bash
modelgrid service enable/disable/start/stop/status/logs
modelgrid gpu list/status/drivers/install
modelgrid container list/add/remove/start/stop/logs
modelgrid model list/pull/remove/status/refresh
modelgrid config show/init/apikey
```
## API Endpoints
- `POST /v1/chat/completions` - Chat completion (OpenAI-compatible)
- `GET /v1/models` - List available models
- `POST /v1/embeddings` - Generate embeddings
- `GET /health` - Health check
- `GET /metrics` - Prometheus metrics
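A client request to the chat endpoint can be sketched like this. The API key and model name are placeholders, and the Bearer scheme is assumed from OpenAI compatibility (the key is what `middleware/auth.ts` validates):

```typescript
// Illustrative request builder for POST /v1/chat/completions.
interface IChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface IHttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  apiKey: string,
  model: string,
  messages: IChatMessage[],
): IHttpRequest {
  return {
    url: '/v1/chat/completions',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Assumed OpenAI-style auth; checked by the auth middleware.
      'Authorization': `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}
```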
## Development Notes
- Prefer async patterns throughout for flexibility
- Use `fs.promises` instead of sync methods
- Containers auto-start on daemon startup
- Models auto-preload if configured
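As an example of the `fs.promises` preference, config loading might look like this sketch; the fall-back-to-empty-object behaviour on a missing file is illustrative, not necessarily what the daemon does:

```typescript
// Illustrative async config read using fs.promises instead of readFileSync.
import { promises as fs } from 'fs';

const CONFIG_PATH = '/etc/modelgrid/config.json';

async function readConfig(path: string = CONFIG_PATH): Promise<unknown> {
  try {
    const raw = await fs.readFile(path, 'utf8');
    return JSON.parse(raw);
  } catch (err) {
    // Treat a missing config file as an empty config (assumed behaviour).
    if ((err as NodeJS.ErrnoException).code === 'ENOENT') return {};
    throw err;
  }
}
```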