# ModelGrid Implementation Plan

**Goal**: GPU infrastructure management daemon with OpenAI-compatible API for AI model containers.

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        ModelGrid Daemon                         │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │     CLI     │  │  Hardware   │  │    Container Manager    │  │
│  │  Commands   │  │  Detection  │  │     (Docker/Podman)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Driver    │  │    Model    │  │   OpenAI API Gateway    │  │
│  │  Installer  │  │  Registry   │  │      (HTTP Server)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                         Systemd Service                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Container Runtime                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │  Ollama  │  │   vLLM   │  │   TGI    │  │  Custom  │         │
│  │Container │  │Container │  │Container │  │Container │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
└─────────────────────────────────────────────────────────────────┘
```

---

## Implementation Status

### Completed Components

- [x] Project structure and configuration (deno.json, package.json)
- [x] TypeScript interfaces (ts/interfaces/)
- [x] Logger and colors (ts/logger.ts, ts/colors.ts)
- [x] Helper utilities (ts/helpers/)
- [x] Constants (ts/constants.ts)
- [x] Hardware detection (ts/hardware/)
- [x] Driver management (ts/drivers/)
- [x] Docker management (ts/docker/)
- [x] Container orchestration (ts/containers/)
- [x] Model management (ts/models/)
- [x] OpenAI-compatible API (ts/api/)
- [x] CLI router and handlers (ts/cli.ts, ts/cli/)
- [x] Main coordinator (ts/modelgrid.ts)
- [x] Daemon (ts/daemon.ts)
- [x] Systemd integration (ts/systemd.ts)
- [x] Build scripts (scripts/)
- [x] Installation scripts (install.sh, uninstall.sh)
- [x] CI/CD workflows (.gitea/workflows/)
- [x] npm packaging (package.json, bin/, scripts/)

### Pending Tasks

- [ ] Integration testing with real GPUs
- [ ] End-to-end API testing
- [ ] Documentation improvements
- [ ] First release (v1.0.0)

---

## Directory Structure

```
modelgrid/
├── mod.ts                      # Deno entry point
├── ts/
│   ├── index.ts                # Node.js entry point
│   ├── cli.ts                  # CLI router
│   ├── modelgrid.ts            # Main coordinator
│   ├── daemon.ts               # Background daemon
│   ├── systemd.ts              # Systemd integration
│   ├── constants.ts            # Configuration constants
│   ├── logger.ts               # Logging utilities
│   ├── colors.ts               # Color themes
│   ├── interfaces/             # TypeScript interfaces
│   │   ├── index.ts
│   │   ├── config.ts           # IModelGridConfig
│   │   ├── gpu.ts              # IGpuInfo, IGpuStatus
│   │   ├── container.ts        # IContainerConfig, IContainerStatus
│   │   └── api.ts              # OpenAI API types
│   ├── hardware/               # Hardware detection
│   │   ├── index.ts
│   │   ├── gpu-detector.ts     # Multi-vendor GPU detection
│   │   └── system-info.ts      # System information
│   ├── drivers/                # Driver management
│   │   ├── index.ts
│   │   ├── nvidia.ts           # NVIDIA/CUDA
│   │   ├── amd.ts              # AMD/ROCm
│   │   ├── intel.ts            # Intel Arc/oneAPI
│   │   └── base-driver.ts      # Abstract driver class
│   ├── docker/                 # Docker management
│   │   ├── index.ts
│   │   ├── docker-manager.ts   # Docker operations
│   │   └── container-runtime.ts
│   ├── containers/             # Container orchestration
│   │   ├── index.ts
│   │   ├── ollama.ts           # Ollama container
│   │   ├── vllm.ts             # vLLM container
│   │   ├── tgi.ts              # TGI container
│   │   └── base-container.ts   # Abstract container class
│   ├── api/                    # OpenAI-compatible API
│   │   ├── index.ts
│   │   ├── server.ts           # HTTP server
│   │   ├── router.ts           # Request routing
│   │   ├── handlers/           # Endpoint handlers
│   │   │   ├── chat.ts         # /v1/chat/completions
│   │   │   ├── models.ts       # /v1/models
│   │   │   └── embeddings.ts   # /v1/embeddings
│   │   └── middleware/         # Request processing
│   │       ├── auth.ts         # API key validation
│   │       ├── sanity.ts       # Request validation
│   │       └── proxy.ts        # Container proxy
│   ├── models/                 # Model management
│   │   ├── index.ts
│   │   ├── registry.ts         # Model registry
│   │   └── loader.ts           # Model loading
│   └── cli/                    # CLI handlers
│       ├── service-handler.ts
│       ├── gpu-handler.ts
│       ├── container-handler.ts
│       ├── model-handler.ts
│       └── config-handler.ts
├── test/                       # Test files
├── scripts/                    # Build scripts
├── bin/                        # npm wrapper
└── docs/                       # Documentation
```

---

## CLI Commands

```
modelgrid service enable       # Install systemd service
modelgrid service disable      # Remove systemd service
modelgrid service start        # Start daemon
modelgrid service stop         # Stop daemon
modelgrid service status       # Show status
modelgrid service logs         # Show logs

modelgrid gpu list             # List detected GPUs
modelgrid gpu status           # Show GPU utilization
modelgrid gpu drivers          # Check/install drivers

modelgrid container add        # Add container config
modelgrid container remove     # Remove container
modelgrid container list       # List containers
modelgrid container start      # Start container
modelgrid container stop       # Stop container

modelgrid model list           # List available models
modelgrid model pull           # Pull model
modelgrid model remove         # Remove model

modelgrid config show          # Show configuration
modelgrid config init          # Initialize configuration
```

---

## API Endpoints

- `GET /v1/models` - List available models
- `GET /v1/models/:model` - Get model details
- `POST /v1/chat/completions` - Chat completions (streaming supported)
- `POST /v1/embeddings` - Generate embeddings

---

## Greenlit Model System

Models are controlled via a remote greenlist to prevent arbitrary downloads:

```json
{
  "version": "1.0",
  "models": [
    { "name": "llama3:8b", "container": "ollama", "minVram": 8 },
    { "name": "mistral:7b", "container": "ollama", "minVram": 8 },
    { "name": "llama3:70b", "container": "vllm", "minVram": 48 }
  ]
}
```
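For reference, here is a minimal sketch of how `modelgrid model pull` could enforce this greenlist before handing a model off to a container. The greenlist URL, the helper name `assertModelGreenlit`, and the assumption that `minVram` is expressed in GB are illustrative only; the actual logic in `ts/models/` may differ.

```ts
// Sketch: greenlist-gated model pulls (illustrative names and URL).

interface IGreenlistEntry {
  name: string;
  container: string;
  minVram: number; // assumed to be GB of GPU memory
}

interface IGreenlist {
  version: string;
  models: IGreenlistEntry[];
}

// Hypothetical greenlist location.
const GREENLIST_URL = "https://example.com/modelgrid/greenlist.json";

export async function assertModelGreenlit(
  modelName: string,
  availableVramGb: number,
): Promise<IGreenlistEntry> {
  const response = await fetch(GREENLIST_URL);
  if (!response.ok) {
    throw new Error(`Failed to fetch greenlist: HTTP ${response.status}`);
  }
  const greenlist = (await response.json()) as IGreenlist;

  // Reject anything that is not on the greenlist.
  const entry = greenlist.models.find((m) => m.name === modelName);
  if (!entry) {
    throw new Error(`Model "${modelName}" is not greenlit`);
  }

  // Reject models the detected GPUs cannot hold.
  if (availableVramGb < entry.minVram) {
    throw new Error(
      `Model "${modelName}" needs ${entry.minVram} GB VRAM, ` +
        `but only ${availableVramGb} GB was detected`,
    );
  }

  return entry; // caller routes the pull to entry.container (ollama, vllm, ...)
}
```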
---

## Supported Platforms

- Linux x64 (x86_64)
- Linux ARM64 (aarch64)
- macOS Intel (x86_64)
- macOS Apple Silicon (ARM64)
- Windows x64
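The matrix above can be enforced at startup. The sketch below uses Node's `process` API (available to the Node.js entry point); a Deno build would read `Deno.build.os` and `Deno.build.arch` instead. The function name and error message are assumptions, not taken from the codebase.

```ts
// Sketch: refuse to start on platform/architecture combinations that are
// not in the supported-platform matrix above.
import process from "node:process";

const SUPPORTED_PLATFORMS: ReadonlySet<string> = new Set([
  "linux-x64",    // Linux x64 (x86_64)
  "linux-arm64",  // Linux ARM64 (aarch64)
  "darwin-x64",   // macOS Intel
  "darwin-arm64", // macOS Apple Silicon
  "win32-x64",    // Windows x64
]);

export function assertSupportedPlatform(): void {
  const key = `${process.platform}-${process.arch}`;
  if (!SUPPORTED_PLATFORMS.has(key)) {
    throw new Error(`ModelGrid does not support this platform: ${key}`);
  }
}
```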
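As a starting point for the pending end-to-end API testing task, a raw request against the gateway's `/v1/chat/completions` endpoint (listed under API Endpoints above) might look like the following sketch. The base URL, API key, and model name are placeholders; substitute values from your own configuration.

```ts
// Sketch: end-to-end smoke test against the OpenAI-compatible gateway.
const BASE_URL = "http://localhost:8080"; // placeholder gateway address
const API_KEY = "your-api-key";           // validated by api/middleware/auth.ts

const response = await fetch(`${BASE_URL}/v1/chat/completions`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${API_KEY}`,
  },
  body: JSON.stringify({
    model: "llama3:8b",
    messages: [{ role: "user", content: "Say hello from ModelGrid." }],
    stream: false, // set to true to exercise streaming responses
  }),
});

if (!response.ok) {
  throw new Error(`Gateway returned HTTP ${response.status}`);
}

const completion = await response.json();
console.log(completion.choices?.[0]?.message?.content);
```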