# Technical Notes - ht-docker-ai
## Architecture
This project uses **Ollama** as the runtime for serving AI models. Ollama provides:
- Automatic model download and caching
- Unified REST API (compatible with OpenAI format)
- Built-in quantization support
- GPU/CPU auto-detection
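As a quick check of the OpenAI-format compatibility, Ollama also exposes `/v1/chat/completions` alongside its native API. A minimal sketch (port and model name assume the defaults used in this image):
```bash
# OpenAI-compatible endpoint; native Ollama endpoints are listed further below
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minicpm-v",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'
```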
## Model Details
### MiniCPM-V 4.5
- **Source**: OpenBMB (https://github.com/OpenBMB/MiniCPM-V)
- **Base Models**: Qwen3-8B + SigLIP2-400M
- **Total Parameters**: 8B
- **Ollama Model Name**: `minicpm-v`
### Memory Usage
| Mode | Memory Required |
|------|-----------------|
| Full precision (bf16) | 18GB VRAM |
| int4 quantized | 9GB VRAM |
| GGUF (CPU) | 8GB RAM |
## Container Startup Flow
1. `docker-entrypoint.sh` starts Ollama server in background
2. Waits for server to be ready
3. Checks if model already exists in volume
4. Pulls model if not present
5. Keeps container running
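A minimal sketch of such an entrypoint, assuming a `MODEL_NAME` environment variable; the actual script in this repo may differ in detail:
```bash
#!/bin/sh
# Hypothetical sketch of the startup flow above -- the real script may differ
set -e

MODEL_NAME="${MODEL_NAME:-minicpm-v}"

# 1. Start the Ollama server in the background
ollama serve &
OLLAMA_PID=$!

# 2. Wait until the API answers
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# 3 + 4. Pull the model only if it is not already in the mounted volume
if ! ollama list | grep -q "$MODEL_NAME"; then
  ollama pull "$MODEL_NAME"
fi

# 5. Keep the container alive as long as the server runs
wait "$OLLAMA_PID"
```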
## Volume Persistence
Mount `/root/.ollama` to persist downloaded models:
```bash
-v ollama-data:/root/.ollama
```
Without this volume, the model will be re-downloaded on each container start (~5GB download).
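A complete run command might look like the following; the image reference is illustrative, so substitute the tag you actually built or pulled:
```bash
# Image reference is illustrative -- use your own registry/tag
docker run -d \
  --name ht-docker-ai \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  <registry>/ht-docker-ai:latest
```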
## API Endpoints
All endpoints follow the Ollama API specification:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tags` | GET | List available models |
| `/api/generate` | POST | Generate completion |
| `/api/chat` | POST | Chat completion |
| `/api/pull` | POST | Pull a model |
| `/api/show` | POST | Show model info |
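For example, a chat request against the running container; the `images` field carries base64-encoded image data for vision models like MiniCPM-V:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "minicpm-v",
  "messages": [
    {
      "role": "user",
      "content": "Describe this image.",
      "images": ["<base64-encoded image data>"]
    }
  ],
  "stream": false
}'
```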
## GPU Detection
The GPU variant uses Ollama's automatic GPU detection. For CPU-only mode, we set:
```dockerfile
ENV CUDA_VISIBLE_DEVICES=""
```
This forces Ollama to use CPU inference even if a GPU is available.
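One way to verify which backend is actually in use, once the model has served at least one request:
```bash
# `ollama ps` lists loaded models; the PROCESSOR column shows GPU or CPU
docker exec <container-name> ollama ps
```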
## Health Checks
Both variants include Docker health checks:
```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:11434/api/tags || exit 1
```
The CPU variant uses a longer `start-period` (120s instead of the 60s shown above) because model loading is slower on CPU.
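To inspect the health status at runtime:
```bash
# Docker's view of container health: starting | healthy | unhealthy
docker inspect --format '{{.State.Health.Status}}' <container-name>
```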
## Adding New Models
To add a new model variant:
1. Create `Dockerfile_<modelname>`
2. Set `MODEL_NAME` environment variable
3. Update `build-images.sh` with new build target
4. Add documentation to `readme.md`
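A hypothetical `Dockerfile_llava` following this pattern; the base image and entrypoint path are assumptions based on the startup flow above, not the repo's actual files:
```dockerfile
# Hypothetical example -- adjust base image and paths to match the repo
FROM ollama/ollama:latest
ENV MODEL_NAME=llava
COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
EXPOSE 11434
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
```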
## Troubleshooting
### Model download hangs
Check container logs:
```bash
docker logs -f <container-name>
```
The model download is ~5GB and may take several minutes.
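If the logs look stalled, you can also check whether the model has landed in the volume yet:
```bash
# Lists models already present in /root/.ollama;
# an empty list means the download has not completed
docker exec <container-name> ollama list
```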
### Out of memory
- GPU: Use int4 quantized version or add more VRAM
- CPU: Increase container memory limit: `--memory=16g`
### API not responding
1. Check if container is healthy: `docker ps`
2. Check logs for errors: `docker logs <container>`
3. Verify port mapping: `curl localhost:11434/api/tags`
## CI/CD Integration
Build and push using npmci:
```bash
npmci docker login
npmci docker build
npmci docker push code.foss.global
```
## Related Resources
- [Ollama Documentation](https://ollama.ai/docs)
- [MiniCPM-V GitHub](https://github.com/OpenBMB/MiniCPM-V)
- [Ollama API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)