2026-01-30 03:16:57 +00:00
# ModelGrid Implementation Plan
**Goal**: A GPU infrastructure management daemon that exposes an OpenAI-compatible API for AI model containers.

---
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                      ModelGrid Daemon                       │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │     CLI     │ │  Hardware   │ │    Container Manager    │ │
│ │  Commands   │ │  Detection  │ │     (Docker/Podman)     │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │   Driver    │ │    Model    │ │   OpenAI API Gateway    │ │
│ │  Installer  │ │  Registry   │ │      (HTTP Server)      │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│                       Systemd Service                       │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                      Container Runtime                      │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐  │
│  │  Ollama  │   │   vLLM   │   │   TGI    │   │  Custom  │  │
│  │Container │   │Container │   │Container │   │Container │  │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘  │
└─────────────────────────────────────────────────────────────┘
```
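
The Hardware Detection box has to turn vendor tooling output into GPU records such as `IGpuInfo`. A minimal sketch for the NVIDIA case only, assuming the detector runs `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits` (these are real `nvidia-smi` options; whether `gpu-detector.ts` uses them is an assumption). The `GpuInfoSketch` shape and the `parseNvidiaSmi` name are hypothetical, and spawning the process is left out so the parser can be exercised on plain strings.

```typescript
// Hypothetical shape; the real IGpuInfo in ts/interfaces/gpu.ts may differ.
interface GpuInfoSketch {
  index: number;
  name: string;
  vramMiB: number; // memory.total as reported with "nounits" (MiB)
}

// Parses the output of:
//   nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits
// where each row looks like "0, NVIDIA GeForce RTX 4090, 24564".
function parseNvidiaSmi(output: string): GpuInfoSketch[] {
  const gpus: GpuInfoSketch[] = [];
  for (const rawLine of output.split("\n")) {
    const line = rawLine.trim(); // also drops a trailing "\r"
    if (line === "") continue;
    const fields = line.split(",").map((f) => f.trim());
    if (fields.length < 3) continue; // not a data row
    // First field is the index and the last is memory; anything in between
    // is the name, re-joined in case it contained a comma.
    const index = Number(fields[0]);
    const vramMiB = Number(fields[fields.length - 1]);
    const name = fields.slice(1, -1).join(", ");
    // Skip rows where a value is not numeric (e.g. "[Not Supported]").
    if (Number.isNaN(index) || Number.isNaN(vramMiB)) continue;
    gpus.push({ index, name, vramMiB });
  }
  return gpus;
}

// Illustrative input only, not measured output.
const sample = "0, NVIDIA GeForce RTX 4090, 24564\n1, NVIDIA RTX A6000, 49140\n";
console.log(parseNvidiaSmi(sample));
```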
---
## Implementation Status
### Completed Components
- [x] Project structure and configuration (deno.json, package.json)
- [x] TypeScript interfaces (ts/interfaces/)
- [x] Logger and colors (ts/logger.ts, ts/colors.ts)
- [x] Helper utilities (ts/helpers/)
- [x] Constants (ts/constants.ts)
- [x] Hardware detection (ts/hardware/)
- [x] Driver management (ts/drivers/)
- [x] Docker management (ts/docker/)
- [x] Container orchestration (ts/containers/)
- [x] Model management (ts/models/)
- [x] OpenAI-compatible API (ts/api/)
- [x] CLI router and handlers (ts/cli.ts, ts/cli/)
- [x] Main coordinator (ts/modelgrid.ts)
- [x] Daemon (ts/daemon.ts)
- [x] Systemd integration (ts/systemd.ts)
- [x] Build scripts (scripts/)
- [x] Installation scripts (install.sh, uninstall.sh)
- [x] CI/CD workflows (.gitea/workflows/)
- [x] npm packaging (package.json, bin/, scripts/)
### Pending Tasks
- [ ] Integration testing with real GPUs
- [ ] End-to-end API testing
- [ ] Documentation improvements
- [ ] First release (v1.0.0)
---
## Directory Structure
```
modelgrid/
├── mod.ts                     # Deno entry point
├── ts/
│   ├── index.ts               # Node.js entry point
│   ├── cli.ts                 # CLI router
│   ├── modelgrid.ts           # Main coordinator
│   ├── daemon.ts              # Background daemon
│   ├── systemd.ts             # Systemd integration
│   ├── constants.ts           # Configuration constants
│   ├── logger.ts              # Logging utilities
│   ├── colors.ts              # Color themes
│   ├── interfaces/            # TypeScript interfaces
│   │   ├── index.ts
│   │   ├── config.ts          # IModelGridConfig
│   │   ├── gpu.ts             # IGpuInfo, IGpuStatus
│   │   ├── container.ts       # IContainerConfig, IContainerStatus
│   │   └── api.ts             # OpenAI API types
│   ├── hardware/              # Hardware detection
│   │   ├── index.ts
│   │   ├── gpu-detector.ts    # Multi-vendor GPU detection
│   │   └── system-info.ts     # System information
│   ├── drivers/               # Driver management
│   │   ├── index.ts
│   │   ├── nvidia.ts          # NVIDIA/CUDA
│   │   ├── amd.ts             # AMD/ROCm
│   │   ├── intel.ts           # Intel Arc/oneAPI
│   │   └── base-driver.ts     # Abstract driver class
│   ├── docker/                # Docker management
│   │   ├── index.ts
│   │   ├── docker-manager.ts  # Docker operations
│   │   └── container-runtime.ts
│   ├── containers/            # Container orchestration
│   │   ├── index.ts
│   │   ├── ollama.ts          # Ollama container
│   │   ├── vllm.ts            # vLLM container
│   │   ├── tgi.ts             # TGI container
│   │   └── base-container.ts  # Abstract container class
│   ├── api/                   # OpenAI-compatible API
│   │   ├── index.ts
│   │   ├── server.ts          # HTTP server
│   │   ├── router.ts          # Request routing
│   │   ├── handlers/          # Endpoint handlers
│   │   │   ├── chat.ts        # /v1/chat/completions
│   │   │   ├── models.ts      # /v1/models
│   │   │   └── embeddings.ts  # /v1/embeddings
│   │   └── middleware/        # Request processing
│   │       ├── auth.ts        # API key validation
│   │       ├── sanity.ts      # Request validation
│   │       └── proxy.ts       # Container proxy
│   ├── models/                # Model management
│   │   ├── index.ts
│   │   ├── registry.ts        # Model registry
│   │   └── loader.ts          # Model loading
│   └── cli/                   # CLI handlers
│       ├── service-handler.ts
│       ├── gpu-handler.ts
│       ├── container-handler.ts
│       ├── model-handler.ts
│       └── config-handler.ts
├── test/                      # Test files
├── scripts/                   # Build scripts
├── bin/                       # npm wrapper
└── docs/                      # Documentation
```
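
The `containers/` layout (an abstract `base-container.ts` specialized by `ollama.ts`, `vllm.ts`, and `tgi.ts`) suggests a template-method class hierarchy. A minimal sketch, assuming each container is described by an image, the port its server listens on inside the container, and a host port to publish. All class and method names are hypothetical. The images `ollama/ollama` and `vllm/vllm-openai` and their default ports (11434 and 8000) are the upstream defaults; `--gpus all` assumes NVIDIA GPUs with the NVIDIA Container Toolkit installed, so AMD and Intel containers would need different device flags.

```typescript
// Hypothetical base class, not the actual ts/containers/base-container.ts.
abstract class BaseContainerSketch {
  readonly id: string;
  readonly hostPort: number;

  abstract readonly image: string;         // container image reference
  abstract readonly containerPort: number; // port the server listens on inside the container

  constructor(id: string, hostPort: number) {
    this.id = id;
    this.hostPort = hostPort;
  }

  // Arguments for `docker run`. "--gpus all" assumes NVIDIA + NVIDIA Container Toolkit.
  runArgs(): string[] {
    return [
      "run", "-d",
      "--name", this.id,
      "--gpus", "all",
      "-p", `${this.hostPort}:${this.containerPort}`,
      this.image,
      ...this.extraArgs(),
    ];
  }

  // Hook for subclasses: arguments passed to the image's entrypoint.
  protected extraArgs(): string[] {
    return [];
  }

  // Base URL the API gateway's proxy would forward requests to.
  baseUrl(): string {
    return `http://127.0.0.1:${this.hostPort}`;
  }
}

class OllamaContainerSketch extends BaseContainerSketch {
  readonly image = "ollama/ollama";
  readonly containerPort = 11434; // Ollama's default port
}

class VllmContainerSketch extends BaseContainerSketch {
  readonly image = "vllm/vllm-openai";
  readonly containerPort = 8000; // default port of vLLM's OpenAI-compatible server
  private readonly model: string;

  constructor(id: string, hostPort: number, model: string) {
    super(id, hostPort);
    this.model = model;
  }

  // vllm/vllm-openai passes arguments after the image name to the server.
  protected extraArgs(): string[] {
    return ["--model", this.model];
  }
}

const vllm = new VllmContainerSketch("modelgrid-vllm", 8001, "meta-llama/Meta-Llama-3-70B-Instruct");
console.log(["docker", ...vllm.runArgs()].join(" "));
```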
---
## CLI Commands
```
modelgrid service enable       # Install systemd service
modelgrid service disable      # Remove systemd service
modelgrid service start        # Start daemon
modelgrid service stop         # Stop daemon
modelgrid service status       # Show status
modelgrid service logs         # Show logs

modelgrid gpu list             # List detected GPUs
modelgrid gpu status           # Show GPU utilization
modelgrid gpu drivers          # Check/install drivers

modelgrid container add        # Add container config
modelgrid container remove     # Remove container
modelgrid container list       # List containers
modelgrid container start      # Start container
modelgrid container stop       # Stop container

modelgrid model list           # List available models
modelgrid model pull <name>    # Pull model
modelgrid model remove <name>  # Remove model

modelgrid config show          # Show configuration
modelgrid config init          # Initialize configuration
```
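
The commands above share a `modelgrid <group> <action> [args]` shape, which suggests a two-level dispatch table in `ts/cli.ts`, with each group backed by one of the `ts/cli/*-handler.ts` modules. A minimal sketch under that assumption; `dispatch`, the `Handler` type, and the handler bodies are hypothetical stand-ins that return strings instead of doing real work. Unknown groups or actions produce a message listing the valid choices rather than throwing.

```typescript
type Handler = (args: string[]) => Promise<string>;

// Hypothetical handlers: real ones would call into ModelGrid's managers
// rather than return strings. Only a few actions per group are shown.
const commands = new Map<string, Map<string, Handler>>([
  ["service", new Map<string, Handler>([
    ["status", async () => "daemon status"],
  ])],
  ["gpu", new Map<string, Handler>([
    ["list", async () => "detected GPUs"],
  ])],
  ["model", new Map<string, Handler>([
    ["pull", async (args) => {
      if (args.length !== 1) throw new Error("usage: modelgrid model pull <name>");
      return `pulling ${args[0]}`;
    }],
  ])],
]);

// Resolves `<group> <action> [args...]` to a handler, or explains what is valid.
async function dispatch(argv: string[]): Promise<string> {
  const [group = "", action = "", ...rest] = argv;
  const actions = commands.get(group);
  if (!actions) {
    return `unknown command group "${group}"; expected one of: ${Array.from(commands.keys()).join(", ")}`;
  }
  const handler = actions.get(action);
  if (!handler) {
    return `unknown action "${action}" for ${group}; expected one of: ${Array.from(actions.keys()).join(", ")}`;
  }
  return handler(rest);
}

dispatch(["model", "pull", "llama3:8b"]).then(console.log);
```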
---
## API Endpoints
- `GET /v1/models` - List available models
- `GET /v1/models/:model` - Get model details
- `POST /v1/chat/completions` - Chat completions (streaming supported)
- `POST /v1/embeddings` - Generate embeddings
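
The four endpoints fit a small method-plus-path table in `api/router.ts`, with `:model` as a path parameter. A minimal sketch of the matching, assuming a hand-rolled router rather than any particular HTTP framework; `matchRoute` and the route names are hypothetical. Greenlisted names such as `llama3:8b` contain a colon, which is legal in a path segment but may arrive percent-encoded (`%3A`), so the captured segment is decoded.

```typescript
type RouteName = "listModels" | "getModel" | "chatCompletions" | "embeddings";

interface RouteMatch {
  route: RouteName;
  params: Record<string, string>;
}

// Hypothetical table mirroring the documented endpoints.
const routes: { method: string; pattern: string; route: RouteName }[] = [
  { method: "GET", pattern: "/v1/models", route: "listModels" },
  { method: "GET", pattern: "/v1/models/:model", route: "getModel" },
  { method: "POST", pattern: "/v1/chat/completions", route: "chatCompletions" },
  { method: "POST", pattern: "/v1/embeddings", route: "embeddings" },
];

function segments(path: string): string[] {
  return path.split("/").filter((s) => s !== "");
}

// Matches a method and a path (without query string). A ":name" pattern
// segment captures exactly one path segment, percent-decoded.
function matchRoute(method: string, path: string): RouteMatch | null {
  const pathSegs = segments(path);
  for (const r of routes) {
    if (r.method !== method) continue;
    const patternSegs = segments(r.pattern);
    if (patternSegs.length !== pathSegs.length) continue;
    const params: Record<string, string> = {};
    let matched = true;
    for (let i = 0; i < patternSegs.length && matched; i++) {
      const p = patternSegs[i];
      if (p.startsWith(":")) {
        try {
          params[p.slice(1)] = decodeURIComponent(pathSegs[i]);
        } catch {
          matched = false; // malformed percent-encoding
        }
      } else if (p !== pathSegs[i]) {
        matched = false;
      }
    }
    if (matched) return { route: r.route, params };
  }
  return null;
}

console.log(matchRoute("GET", "/v1/models/llama3%3A8b"));
```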
---
## Greenlit Model System
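Which models can be pulled is controlled by a remotely hosted greenlist, which prevents arbitrary downloads. Each entry names a model, the container type that serves it, and its minimum VRAM requirement (`minVram`):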
```json
{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}
```
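
Before pulling, the daemon can resolve the requested name against the greenlist and compare `minVram` with the GPU memory it detected. A minimal sketch, assuming `minVram` is in GB (the unit is not stated above), that detected VRAM is expressed in the same unit, and that a model must fit on a single GPU; splitting a model across several GPUs is deliberately ignored. `checkPull` and the `PullDecision` type are hypothetical.

```typescript
interface GreenlistEntry {
  name: string;
  container: string; // e.g. "ollama" or "vllm"
  minVram: number;   // assumed to be GB
}

interface Greenlist {
  version: string;
  models: GreenlistEntry[];
}

type PullDecision =
  | { allowed: true; entry: GreenlistEntry; gpuIndex: number }
  | { allowed: false; reason: string };

// Hypothetical check: the model must be greenlisted and at least one GPU
// must individually have minVram GB available.
function checkPull(list: Greenlist, modelName: string, gpuVramGb: number[]): PullDecision {
  const entry = list.models.find((m) => m.name === modelName);
  if (!entry) {
    return { allowed: false, reason: `${modelName} is not on the greenlist` };
  }
  const gpuIndex = gpuVramGb.findIndex((vram) => vram >= entry.minVram);
  if (gpuIndex === -1) {
    const largest = Math.max(0, ...gpuVramGb);
    return {
      allowed: false,
      reason: `${modelName} needs ${entry.minVram} GB VRAM; largest GPU has ${largest} GB`,
    };
  }
  return { allowed: true, entry, gpuIndex };
}

const greenlist: Greenlist = JSON.parse(`{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}`);

console.log(checkPull(greenlist, "llama3:8b", [24]));
console.log(checkPull(greenlist, "llama3:70b", [24, 24]));
```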
---
## Supported Platforms
- Linux x64 (x86_64)
- Linux ARM64 (aarch64)
- macOS Intel (x86_64)
- macOS Apple Silicon (ARM64)
- Windows x64
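
If the npm wrapper in `bin/` ships one prebuilt binary per platform, it has to map the running platform onto one of the five targets above. A minimal sketch, assuming Node-style identifiers (`process.platform` values `linux`, `darwin`, `win32` and `process.arch` values `x64`, `arm64`) and a `modelgrid-<target>` file-naming scheme that is purely hypothetical. The values are passed in as arguments so the function does not depend on a particular runtime.

```typescript
// Hypothetical target names for the five supported platforms.
type Target = "linux-x64" | "linux-arm64" | "macos-x64" | "macos-arm64" | "windows-x64";

// Keyed by Node-style `${process.platform}:${process.arch}`.
const TARGETS: Record<string, Target> = {
  "linux:x64": "linux-x64",
  "linux:arm64": "linux-arm64",
  "darwin:x64": "macos-x64",
  "darwin:arm64": "macos-arm64",
  "win32:x64": "windows-x64",
};

// Returns the (hypothetical) binary file name, or throws for unsupported combinations.
function binaryNameFor(platform: string, arch: string): string {
  const key = `${platform}:${arch}`;
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(TARGETS, key)) {
    throw new Error(`Unsupported platform: ${platform}/${arch}`);
  }
  const ext = platform === "win32" ? ".exe" : "";
  return `modelgrid-${TARGETS[key]}${ext}`;
}

console.log(binaryNameFor("linux", "arm64"));
```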