diff --git a/readme.hints.md b/readme.hints.md
index b3589e1..c825aa1 100644
--- a/readme.hints.md
+++ b/readme.hints.md
@@ -3,7 +3,7 @@
 ## Project Overview
 
 ModelGrid is a root-level daemon that manages GPU infrastructure, Docker, and AI model containers
-(Ollama, vLLM, TGI) with an OpenAI-compatible API interface.
+(vLLM, TGI) with an OpenAI-compatible API interface.
 
 ## Architecture
 
@@ -84,13 +84,12 @@ ts/
 
 ### Greenlit Model System
 
-- Only pre-approved models can be auto-pulled for security
-- Greenlist fetched from remote URL (configurable)
+- Only catalog-listed models can be auto-deployed on demand
+- Catalog fetched from a remote URL (configurable)
 - VRAM requirements checked before loading
 
 ### Container Types
 
-- **Ollama**: Easy to use, native API converted to OpenAI format
 - **vLLM**: High performance, natively OpenAI-compatible
 - **TGI**: HuggingFace Text Generation Inference
 
diff --git a/readme.plan.md b/readme.plan.md
index 845a6ee..010fbe0 100644
--- a/readme.plan.md
+++ b/readme.plan.md
@@ -26,9 +26,9 @@
 ┌─────────────────────────────────────────────────────────────────┐
 │                        Container Runtime                        │
 │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
-│  │  Ollama  │  │   vLLM   │  │   TGI    │  │  Custom  │         │
-│  │Container │  │Container │  │Container │  │Container │         │
-│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
+│  │   vLLM   │  │   TGI    │  │  Custom  │                         │
+│  │Container │  │Container │  │Container │                         │
+│  └──────────┘  └──────────┘  └──────────┘                         │
 └─────────────────────────────────────────────────────────────────┘
 ```
 
@@ -116,8 +116,7 @@ modelgrid/
 │   │   │   └── embeddings.ts   # /v1/embeddings
 │   │   └── middleware/         # Request processing
 │   │       ├── auth.ts         # API key validation
-│   │       ├── sanity.ts       # Request validation
-│   │       └── proxy.ts        # Container proxy
+│   │       └── sanity.ts       # Request validation
 │   ├── models/                 # Model management
 │   │   ├── index.ts
 │   │   ├── registry.ts         # Model registry
@@ -177,7 +176,7 @@ modelgrid config init # Initialize configuration
 
 ## Greenlit Model System
 
-Models are controlled via a remote greenlist to prevent arbitrary downloads:
+Models are resolved through a remote catalog so deployments come from an explicit allowlist:
 
 ```json
 {