# ModelGrid Project Hints

## Project Overview

ModelGrid is a root-level daemon that manages GPU infrastructure, Docker, and AI model containers (Ollama, vLLM, TGI) behind an OpenAI-compatible API.

## Architecture

```
┌───────────────────────────────────────────────────────────────┐
│                        ModelGrid Daemon                       │
├───────────────────────────────────────────────────────────────┤
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐ │
│ │    CLI      │  │  Hardware   │  │   Container Manager     │ │
│ │  Commands   │  │  Detection  │  │   (Docker/Podman)       │ │
│ └─────────────┘  └─────────────┘  └─────────────────────────┘ │
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐ │
│ │   Driver    │  │   Model     │  │   OpenAI API Gateway    │ │
│ │  Installer  │  │  Registry   │  │   (HTTP Server)         │ │
│ └─────────────┘  └─────────────┘  └─────────────────────────┘ │
├───────────────────────────────────────────────────────────────┤
│                        Systemd Service                        │
└───────────────────────────────────────────────────────────────┘
```
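
As a rough illustration of this layering, the facade in `ts/modelgrid.ts` could wire the subsystems together along these lines. All class and method names below are hypothetical stand-ins, not the actual implementation:

```typescript
// Hypothetical sketch of the facade layering in the diagram above;
// the real classes in ts/modelgrid.ts may look quite different.
class HardwareDetection {
  detectGpus(): string[] {
    return []; // would shell out to nvidia-smi / rocm-smi / xpu-smi
  }
}

class ContainerManager {
  readonly running: string[] = [];
  start(name: string): void {
    this.running.push(name); // would invoke docker/podman under the hood
  }
}

class ApiGateway {
  listening = false;
  listen(_port: number): void {
    this.listening = true; // would bind the HTTP server
  }
}

// The daemon facade owns one instance of each subsystem and fixes the
// startup order: hardware first, then containers, then the API gateway.
class ModelGrid {
  readonly hardware = new HardwareDetection();
  readonly containers = new ContainerManager();
  readonly api = new ApiGateway();

  start(port: number): void {
    this.hardware.detectGpus();
    this.containers.start("ollama");
    this.api.listen(port);
  }
}

const grid = new ModelGrid();
grid.start(8080);
```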

## File Organization

```
ts/
├── index.ts                 # Node.js entry point
├── cli.ts                   # CLI router
├── modelgrid.ts             # Main coordinator (facade)
├── daemon.ts                # Background daemon
├── systemd.ts               # Systemd integration
├── constants.ts             # Configuration constants
├── logger.ts                # Logging utilities
├── colors.ts                # Color themes
├── interfaces/              # TypeScript interfaces
│   ├── config.ts            # IModelGridConfig
│   ├── gpu.ts               # IGpuInfo, IGpuStatus
│   ├── container.ts         # IContainerConfig, IContainerStatus
│   └── api.ts               # OpenAI API types
├── hardware/                # Hardware detection
│   ├── gpu-detector.ts      # Detect GPUs (NVIDIA, AMD, Intel)
│   └── system-info.ts       # CPU, RAM info
├── drivers/                 # Driver management
│   ├── nvidia.ts            # NVIDIA driver + CUDA
│   ├── amd.ts               # AMD driver + ROCm
│   ├── intel.ts             # Intel Arc + oneAPI
│   └── driver-manager.ts    # Driver orchestrator
├── docker/                  # Docker management
│   ├── docker-manager.ts    # Docker setup
│   └── container-runtime.ts # Container lifecycle
├── containers/              # AI container management
│   ├── ollama.ts            # Ollama container
│   ├── vllm.ts              # vLLM container
│   ├── tgi.ts               # TGI container
│   └── container-manager.ts # Orchestrator
├── models/                  # Model management
│   ├── registry.ts          # Greenlit model registry
│   └── loader.ts            # Model loading with VRAM checks
├── api/                     # OpenAI-compatible API
│   ├── server.ts            # HTTP server
│   ├── router.ts            # Request routing
│   ├── handlers/            # API endpoint handlers
│   │   ├── chat.ts          # /v1/chat/completions
│   │   ├── models.ts        # /v1/models
│   │   └── embeddings.ts    # /v1/embeddings
│   └── middleware/          # Request processing
│       ├── auth.ts          # API key validation
│       └── sanity.ts        # Request validation
├── cli/                     # CLI handlers
│   ├── service-handler.ts
│   ├── gpu-handler.ts
│   ├── container-handler.ts
│   ├── model-handler.ts
│   └── config-handler.ts
└── helpers/                 # Utilities
    ├── prompt.ts            # Readline utility
    └── shortid.ts           # ID generation
```

## Key Concepts

### Greenlit Model System

- Only pre-approved models can be auto-pulled, for security reasons
- The greenlist is fetched from a remote URL (configurable)
- VRAM requirements are checked before a model is loaded
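
A minimal sketch of the gate these rules imply, assuming a hypothetical greenlist entry shape (`name`, `minVramMb`); the actual `registry.ts`/`loader.ts` logic may differ:

```typescript
// Hypothetical greenlist entry: model name plus its VRAM requirement.
interface GreenlitModel {
  name: string;
  minVramMb: number;
}

// Example entries; the real greenlist is fetched from a remote URL.
const greenlist: GreenlitModel[] = [
  { name: "llama3:8b", minVramMb: 6000 },
  { name: "mistral:7b", minVramMb: 5500 },
];

// A model may be loaded only if it appears on the greenlist AND the
// free VRAM reported by the GPU layer covers its requirement.
function canLoad(model: string, freeVramMb: number): boolean {
  const entry = greenlist.find((m) => m.name === model);
  if (!entry) return false; // not greenlit: never auto-pull
  return freeVramMb >= entry.minVramMb; // VRAM check before loading
}

const okToLoad = canLoad("llama3:8b", 8000); // greenlit, enough VRAM
const notGreenlit = canLoad("unknown:latest", 8000); // rejected
```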

### Container Types

- **Ollama**: Easy to use; its native API is converted to the OpenAI format
- **vLLM**: High performance, natively OpenAI-compatible
- **TGI**: HuggingFace Text Generation Inference
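
To illustrate the Ollama adaptation, here is a hedged sketch of converting a simplified Ollama chat response into the OpenAI chat-completion shape. The OpenAI-side field names follow the public API; the Ollama side is reduced to the fields the conversion needs:

```typescript
// Simplified Ollama chat response (non-streaming case).
interface OllamaChatResponse {
  model: string;
  message: { role: string; content: string };
  done: boolean;
}

// Map it onto the OpenAI chat.completion envelope.
function toOpenAiFormat(res: OllamaChatResponse, id: string) {
  return {
    id,
    object: "chat.completion",
    model: res.model,
    choices: [
      {
        index: 0,
        message: res.message,
        finish_reason: res.done ? "stop" : null,
      },
    ],
  };
}

const converted = toOpenAiFormat(
  { model: "llama3", message: { role: "assistant", content: "hi" }, done: true },
  "chatcmpl-123",
);
```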

### GPU Support

- NVIDIA: nvidia-smi, CUDA, nvidia-docker2
- AMD: rocm-smi, ROCm
- Intel Arc: xpu-smi, oneAPI
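
A minimal sketch of vendor detection, assuming each vendor is identified by the presence of its management CLI on the `PATH`; the real `gpu-detector.ts` may probe differently (e.g. via PCI IDs):

```typescript
import { execSync } from "node:child_process";

// Vendor management CLIs, matching the list above.
const vendorTools: Record<string, string> = {
  nvidia: "nvidia-smi",
  amd: "rocm-smi",
  intel: "xpu-smi",
};

// A vendor counts as present when its tool resolves on the PATH.
function detectVendors(): string[] {
  const found: string[] = [];
  for (const [vendor, tool] of Object.entries(vendorTools)) {
    try {
      // `command -v` exits non-zero when the tool is absent
      execSync(`command -v ${tool}`, { stdio: "ignore" });
      found.push(vendor);
    } catch {
      // tool not installed: skip this vendor
    }
  }
  return found;
}

const vendors = detectVendors();
```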

## Configuration

Config file: `/etc/modelgrid/config.json`

```typescript
interface IModelGridConfig {
  version: string;
  api: {
    port: number;        // Default: 8080
    host: string;        // Default: '0.0.0.0'
    apiKeys: string[];   // Valid API keys
    cors: boolean;
    corsOrigins: string[];
  };
  docker: {
    networkName: string; // Default: 'modelgrid'
    runtime: 'docker' | 'podman';
  };
  gpus: {
    autoDetect: boolean;
    assignments: Record<string, string>;
  };
  containers: IContainerConfig[];
  models: {
    greenlistUrl: string;
    autoPull: boolean;
    defaultContainer: string;
    autoLoad: string[];
  };
  checkInterval: number;
}
```
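
For reference, a config conforming to this interface might look like the sketch below. All values are illustrative (the key, greenlist URL, and interval are placeholders), and `IContainerConfig` is stubbed as an empty record to keep the example self-contained:

```typescript
// Stub for self-containment; the real interface lives in
// ts/interfaces/container.ts.
type IContainerConfig = Record<string, unknown>;

const exampleConfig = {
  version: "1.0.0",
  api: {
    port: 8080,
    host: "0.0.0.0",
    apiKeys: ["mg_example_key"], // placeholder key
    cors: true,
    corsOrigins: ["*"],
  },
  docker: {
    networkName: "modelgrid",
    runtime: "docker" as const,
  },
  gpus: {
    autoDetect: true,
    assignments: {} as Record<string, string>,
  },
  containers: [] as IContainerConfig[],
  models: {
    greenlistUrl: "https://example.com/greenlist.json", // placeholder URL
    autoPull: true,
    defaultContainer: "ollama",
    autoLoad: [] as string[],
  },
  checkInterval: 30000, // unit assumed to be milliseconds
};
```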

## CLI Commands

```bash
modelgrid service enable/disable/start/stop/status/logs
modelgrid gpu list/status/drivers/install
modelgrid container list/add/remove/start/stop/logs
modelgrid model list/pull/remove/status/refresh
modelgrid config show/init/apikey
```

## API Endpoints

- `POST /v1/chat/completions` - Chat completion (OpenAI-compatible)
- `GET /v1/models` - List available models
- `POST /v1/embeddings` - Generate embeddings
- `GET /health` - Health check
- `GET /metrics` - Prometheus metrics
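
As an illustration, a client request to the chat endpoint would carry an OpenAI-style JSON payload and a Bearer API key. The key and model name below are placeholders:

```typescript
// Target endpoint on the gateway (default port 8080).
const endpoint = "http://localhost:8080/v1/chat/completions";

// Bearer auth, validated by api/middleware/auth.ts.
const headers = {
  "Content-Type": "application/json",
  Authorization: "Bearer mg_example_key", // placeholder key
};

// OpenAI-style chat-completion request body.
const body = JSON.stringify({
  model: "llama3:8b", // placeholder model name
  messages: [{ role: "user", content: "Hello!" }],
});

// A client would now send it with:
//   fetch(endpoint, { method: "POST", headers, body })
```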

## Development Notes

- Prefer async patterns throughout, for flexibility
- Use `fs.promises` instead of the sync methods
- Containers auto-start on daemon startup
- Models are preloaded automatically if configured
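
The first two notes can be sketched as follows; the file name and fallback logic are illustrative, not the daemon's actual code:

```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Read a JSON config file without blocking the event loop, falling back
// to defaults if it is missing or unreadable (the real daemon reads
// /etc/modelgrid/config.json).
async function readConfigOr<T>(defaults: T, file: string): Promise<T> {
  try {
    const raw = await fs.readFile(file, "utf8"); // async, not readFileSync
    return JSON.parse(raw) as T;
  } catch {
    return defaults;
  }
}

// Demo against a temp file rather than the real config path.
const tmp = path.join(os.tmpdir(), "modelgrid-hints-demo.json");
await fs.writeFile(tmp, JSON.stringify({ checkInterval: 5000 }));
const loaded = await readConfigOr({ checkInterval: 30000 }, tmp);
```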