# ModelGrid Implementation Plan

**Goal**: GPU infrastructure management daemon with OpenAI-compatible API for AI model containers.

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        ModelGrid Daemon                         │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │     CLI     │  │  Hardware   │  │    Container Manager    │  │
│  │  Commands   │  │  Detection  │  │     (Docker/Podman)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Driver    │  │    Model    │  │   OpenAI API Gateway    │  │
│  │  Installer  │  │  Registry   │  │      (HTTP Server)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                         Systemd Service                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Container Runtime                        │
│    ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐    │
│    │  Ollama  │   │   vLLM   │   │   TGI    │   │  Custom  │    │
│    │Container │   │Container │   │Container │   │Container │    │
│    └──────────┘   └──────────┘   └──────────┘   └──────────┘    │
└─────────────────────────────────────────────────────────────────┘
```
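
The arrow from the daemon to the container runtime implies that the API gateway has to decide which container receives a given request. A minimal sketch of that step follows, assuming the gateway keeps a list of known containers and the models each one serves. The `BackendContainer` shape, the `resolveBackend` and `upstreamUrl` helpers, and the addresses are all hypothetical and not taken from the ModelGrid source. Backend-specific path rewriting is left out.

```typescript
// Hypothetical view of a backend container as seen by the gateway.
// None of these field names come from the ModelGrid source.
type BackendKind = "ollama" | "vllm" | "tgi" | "custom";

interface BackendContainer {
  id: string;
  kind: BackendKind;
  baseUrl: string;   // where the container's HTTP API is reachable
  models: string[];  // model names this container currently serves
  running: boolean;
}

// Pick the first running container that serves the requested model.
function resolveBackend(
  containers: BackendContainer[],
  model: string,
): BackendContainer | undefined {
  return containers.find((c) => c.running && c.models.indexOf(model) !== -1);
}

// Join the backend base URL and the request path without doubling slashes.
function upstreamUrl(backend: BackendContainer, path: string): string {
  return backend.baseUrl.replace(/\/+$/, "") + path;
}

// Example fleet; ids, addresses, and ports are placeholders.
const fleet: BackendContainer[] = [
  { id: "ollama-1", kind: "ollama", baseUrl: "http://127.0.0.1:11434/", models: ["llama3:8b"], running: true },
  { id: "vllm-1", kind: "vllm", baseUrl: "http://127.0.0.1:8000", models: ["llama3:70b"], running: true },
  { id: "tgi-1", kind: "tgi", baseUrl: "http://127.0.0.1:8080", models: ["mistral:7b"], running: false },
];

// A request for "llama3:70b" resolves to the vLLM container.
const target = resolveBackend(fleet, "llama3:70b");
```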

---

## Implementation Status

### Completed Components

- [x] Project structure and configuration (deno.json, package.json)
- [x] TypeScript interfaces (ts/interfaces/)
- [x] Logger and colors (ts/logger.ts, ts/colors.ts)
- [x] Helper utilities (ts/helpers/)
- [x] Constants (ts/constants.ts)
- [x] Hardware detection (ts/hardware/)
- [x] Driver management (ts/drivers/)
- [x] Docker management (ts/docker/)
- [x] Container orchestration (ts/containers/)
- [x] Model management (ts/models/)
- [x] OpenAI-compatible API (ts/api/)
- [x] CLI router and handlers (ts/cli.ts, ts/cli/)
- [x] Main coordinator (ts/modelgrid.ts)
- [x] Daemon (ts/daemon.ts)
- [x] Systemd integration (ts/systemd.ts)
- [x] Build scripts (scripts/)
- [x] Installation scripts (install.sh, uninstall.sh)
- [x] CI/CD workflows (.gitea/workflows/)
- [x] npm packaging (package.json, bin/, scripts/)

### Pending Tasks

- [ ] Integration testing with real GPUs
- [ ] End-to-end API testing
- [ ] Documentation improvements
- [ ] First release (v1.0.0)

---

## Directory Structure

```
modelgrid/
├── mod.ts                        # Deno entry point
├── ts/
│   ├── index.ts                  # Node.js entry point
│   ├── cli.ts                    # CLI router
│   ├── modelgrid.ts              # Main coordinator
│   ├── daemon.ts                 # Background daemon
│   ├── systemd.ts                # Systemd integration
│   ├── constants.ts              # Configuration constants
│   ├── logger.ts                 # Logging utilities
│   ├── colors.ts                 # Color themes
│   ├── interfaces/               # TypeScript interfaces
│   │   ├── index.ts
│   │   ├── config.ts             # IModelGridConfig
│   │   ├── gpu.ts                # IGpuInfo, IGpuStatus
│   │   ├── container.ts          # IContainerConfig, IContainerStatus
│   │   └── api.ts                # OpenAI API types
│   ├── hardware/                 # Hardware detection
│   │   ├── index.ts
│   │   ├── gpu-detector.ts       # Multi-vendor GPU detection
│   │   └── system-info.ts        # System information
│   ├── drivers/                  # Driver management
│   │   ├── index.ts
│   │   ├── nvidia.ts             # NVIDIA/CUDA
│   │   ├── amd.ts                # AMD/ROCm
│   │   ├── intel.ts              # Intel Arc/oneAPI
│   │   └── base-driver.ts        # Abstract driver class
│   ├── docker/                   # Docker management
│   │   ├── index.ts
│   │   ├── docker-manager.ts     # Docker operations
│   │   └── container-runtime.ts
│   ├── containers/               # Container orchestration
│   │   ├── index.ts
│   │   ├── ollama.ts             # Ollama container
│   │   ├── vllm.ts               # vLLM container
│   │   ├── tgi.ts                # TGI container
│   │   └── base-container.ts     # Abstract container class
│   ├── api/                      # OpenAI-compatible API
│   │   ├── index.ts
│   │   ├── server.ts             # HTTP server
│   │   ├── router.ts             # Request routing
│   │   ├── handlers/             # Endpoint handlers
│   │   │   ├── chat.ts           # /v1/chat/completions
│   │   │   ├── models.ts         # /v1/models
│   │   │   └── embeddings.ts     # /v1/embeddings
│   │   └── middleware/           # Request processing
│   │       ├── auth.ts           # API key validation
│   │       ├── sanity.ts         # Request validation
│   │       └── proxy.ts          # Container proxy
│   ├── models/                   # Model management
│   │   ├── index.ts
│   │   ├── registry.ts           # Model registry
│   │   └── loader.ts             # Model loading
│   └── cli/                      # CLI handlers
│       ├── service-handler.ts
│       ├── gpu-handler.ts
│       ├── container-handler.ts
│       ├── model-handler.ts
│       └── config-handler.ts
├── test/                         # Test files
├── scripts/                      # Build scripts
├── bin/                          # npm wrapper
└── docs/                         # Documentation
```
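
The tree pairs an abstract `base-container.ts` with one file per backend (`ollama.ts`, `vllm.ts`, `tgi.ts`). One way that split could look is sketched below. Only the names `IContainerConfig` and the file roles come from the tree. The fields, the `buildRunArgs` method, and the `OllamaContainer` class are assumptions made for illustration, and the real classes may be organised differently. The sketch only builds an argument list for `docker run` and does not start anything.

```typescript
// Field names are assumptions; only the interface name appears in the tree above.
interface IContainerConfig {
  id: string;        // container name passed to the runtime
  image: string;     // image reference, e.g. "ollama/ollama"
  hostPort: number;  // port published on the host
  useGpu: boolean;   // request GPU access
}

// Shared behaviour lives in the base class; backends fill in the specifics.
abstract class BaseContainer {
  protected readonly config: IContainerConfig;

  constructor(config: IContainerConfig) {
    this.config = config;
  }

  // Backend identifier, e.g. "ollama".
  abstract readonly kind: string;

  // Port the server listens on inside the container.
  protected abstract internalPort(): number;

  // Arguments for `docker run`, built the same way for every backend.
  buildRunArgs(): string[] {
    const args = ["run", "-d", "--name", this.config.id];
    args.push("-p", `${this.config.hostPort}:${this.internalPort()}`);
    if (this.config.useGpu) {
      // `--gpus all` covers the NVIDIA case; AMD and Intel typically need other flags.
      args.push("--gpus", "all");
    }
    args.push(this.config.image);
    return args;
  }
}

// Hypothetical Ollama backend: only the listen port differs here.
class OllamaContainer extends BaseContainer {
  readonly kind = "ollama";

  protected internalPort(): number {
    return 11434; // Ollama's default listen port
  }
}

const ollamaArgs = new OllamaContainer({
  id: "ollama-1",
  image: "ollama/ollama",
  hostPort: 11434,
  useGpu: true,
}).buildRunArgs();
```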

---

## CLI Commands

```
modelgrid service enable        # Install systemd service
modelgrid service disable       # Remove systemd service
modelgrid service start         # Start daemon
modelgrid service stop          # Stop daemon
modelgrid service status        # Show status
modelgrid service logs          # Show logs

modelgrid gpu list              # List detected GPUs
modelgrid gpu status            # Show GPU utilization
modelgrid gpu drivers           # Check/install drivers

modelgrid container add         # Add container config
modelgrid container remove      # Remove container
modelgrid container list        # List containers
modelgrid container start       # Start container
modelgrid container stop        # Stop container

modelgrid model list            # List available models
modelgrid model pull <name>     # Pull model
modelgrid model remove <name>   # Remove model

modelgrid config show           # Show configuration
modelgrid config init           # Initialize configuration
```
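
Every command in the listing has the shape `modelgrid <group> <action> [args]`, which suggests a two-level lookup in the CLI router. The sketch below shows one possible way to validate that shape before dispatching to a handler. `commandTable`, `needsName`, and `parseCommand` are hypothetical names and not the actual contents of `ts/cli.ts`. The groups and actions are copied from the listing, and only the two commands shown with `<name>` are treated as requiring an argument.

```typescript
// Groups and actions copied from the command listing above.
const commandTable = new Map<string, readonly string[]>([
  ["service", ["enable", "disable", "start", "stop", "status", "logs"]],
  ["gpu", ["list", "status", "drivers"]],
  ["container", ["add", "remove", "list", "start", "stop"]],
  ["model", ["list", "pull", "remove"]],
  ["config", ["show", "init"]],
]);

// Commands the listing shows with a required <name> argument.
const needsName = new Set<string>(["model pull", "model remove"]);

type ParsedCommand =
  | { ok: true; group: string; action: string; args: string[] }
  | { ok: false; error: string };

// Hypothetical parser: validates `<group> <action> [args...]`
// (the arguments that follow the `modelgrid` binary name).
function parseCommand(argv: string[]): ParsedCommand {
  const group: string | undefined = argv[0];
  const action: string | undefined = argv[1];
  const args = argv.slice(2);

  const actions = group === undefined ? undefined : commandTable.get(group);
  if (group === undefined || actions === undefined) {
    return { ok: false, error: `unknown command group: ${group ?? "(none)"}` };
  }
  if (action === undefined || actions.indexOf(action) === -1) {
    return { ok: false, error: `unknown action for "${group}": ${action ?? "(none)"}` };
  }
  if (needsName.has(`${group} ${action}`) && args.length === 0) {
    return { ok: false, error: `"${group} ${action}" requires a <name> argument` };
  }
  return { ok: true, group, action, args };
}

// Example: the arguments of `modelgrid model pull llama3:8b`.
const pullCommand = parseCommand(["model", "pull", "llama3:8b"]);
```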

---

## API Endpoints

- `GET /v1/models` - List available models
- `GET /v1/models/:model` - Get model details
- `POST /v1/chat/completions` - Chat completions (streaming supported)
- `POST /v1/embeddings` - Generate embeddings
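
The four endpoints above, including the `:model` path parameter, can be matched with a small route table. The sketch below is one way to do it and is not the code in `ts/api/router.ts`. The `RouteDef` shape, the handler labels, and `matchRoute` are assumptions made for illustration. Query strings and malformed percent-encoding are not handled.

```typescript
interface RouteDef {
  method: "GET" | "POST";
  pattern: string;  // ":name" marks a path parameter
  handler: string;  // illustrative label, not a real ModelGrid identifier
}

// The endpoints listed above, in the same order.
const routeTable: RouteDef[] = [
  { method: "GET", pattern: "/v1/models", handler: "models.list" },
  { method: "GET", pattern: "/v1/models/:model", handler: "models.get" },
  { method: "POST", pattern: "/v1/chat/completions", handler: "chat.completions" },
  { method: "POST", pattern: "/v1/embeddings", handler: "embeddings.create" },
];

interface RouteMatch {
  handler: string;
  params: Record<string, string>;
}

// Split a path into non-empty segments: "/v1/models" -> ["v1", "models"].
function splitPath(path: string): string[] {
  return path.split("/").filter((segment) => segment.length > 0);
}

// Return the first route whose method and segments match, or null.
function matchRoute(method: string, path: string): RouteMatch | null {
  const pathSegments = splitPath(path);
  for (const route of routeTable) {
    if (route.method !== method) continue;
    const patternSegments = splitPath(route.pattern);
    if (patternSegments.length !== pathSegments.length) continue;

    const params: Record<string, string> = {};
    const matched = patternSegments.every((expected, i) => {
      const actual = pathSegments[i] ?? "";
      if (expected.charAt(0) === ":") {
        // Parameter segment: capture the decoded value under its name.
        params[expected.slice(1)] = decodeURIComponent(actual);
        return true;
      }
      return expected === actual;
    });
    if (matched) return { handler: route.handler, params };
  }
  return null;
}

// Example: a model-details request for "llama3:8b".
const modelRoute = matchRoute("GET", "/v1/models/llama3:8b");
```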

---

## Greenlit Model System

Models are controlled via a remote greenlist to prevent arbitrary downloads. Each entry names a model, the container type that serves it, and a minimum VRAM requirement:

```json
{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}
```
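
A greenlist like the one above could be enforced with a check along these lines before any download starts. This is a sketch under stated assumptions: the `checkPull` helper and its result type are hypothetical, `minVram` is assumed to be in gigabytes (the plan does not state the unit), and fetching the remote list is left out.

```typescript
// Shape of the greenlist document shown above.
interface GreenlitModel {
  name: string;
  container: string;  // e.g. "ollama" or "vllm"
  minVram: number;    // assumed to be gigabytes of GPU memory
}

interface Greenlist {
  version: string;
  models: GreenlitModel[];
}

type PullDecision =
  | { allowed: true; container: string }
  | { allowed: false; reason: string };

// Hypothetical helper: decide whether a pull request may proceed.
function checkPull(
  greenlist: Greenlist,
  modelName: string,
  availableVramGb: number,
): PullDecision {
  const entry = greenlist.models.find((m) => m.name === modelName);
  if (entry === undefined) {
    return { allowed: false, reason: `model "${modelName}" is not greenlit` };
  }
  if (availableVramGb < entry.minVram) {
    return {
      allowed: false,
      reason: `model "${modelName}" needs ${entry.minVram} GB VRAM, ${availableVramGb} GB available`,
    };
  }
  return { allowed: true, container: entry.container };
}

// The greenlist from the JSON example above.
const sampleGreenlist: Greenlist = {
  version: "1.0",
  models: [
    { name: "llama3:8b", container: "ollama", minVram: 8 },
    { name: "mistral:7b", container: "ollama", minVram: 8 },
    { name: "llama3:70b", container: "vllm", minVram: 48 },
  ],
};

// On a GPU with 24 GB: the 8B model passes, the 70B model is rejected
// for VRAM, and a model missing from the list is rejected outright.
const decisionSmall = checkPull(sampleGreenlist, "llama3:8b", 24);
const decisionLarge = checkPull(sampleGreenlist, "llama3:70b", 24);
const decisionUnknown = checkPull(sampleGreenlist, "some-other-model", 24);
```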

---

## Supported Platforms

- Linux x64 (x86_64)
- Linux ARM64 (aarch64)
- macOS Intel (x86_64)
- macOS Apple Silicon (ARM64)
- Windows x64