# Technical Notes - ht-docker-ai
## Architecture
This project uses **Ollama** as the runtime for serving AI models. Ollama provides:
- Automatic model download and caching
- Unified REST API (compatible with OpenAI format)
- Built-in quantization support
- GPU/CPU auto-detection
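As a quick check of the OpenAI-format compatibility, Ollama also exposes `/v1/chat/completions` alongside its native API. A minimal sketch (port and model name assume the defaults used in this image):
```bash
# OpenAI-compatible endpoint; native Ollama endpoints are listed further below
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minicpm-v",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'
```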
## Model Details
### MiniCPM-V 4.5
- **Source**: OpenBMB (https://github.com/OpenBMB/MiniCPM-V)
- **Base Models**: Qwen3-8B + SigLIP2-400M
- **Total Parameters**: 8B
- **Ollama Model Name**: `minicpm-v`
### Memory Usage
| Mode | Memory Required |
|------|-----------------|
| Full precision (bf16) | 18GB VRAM |
| int4 quantized | 9GB VRAM |
| GGUF (CPU) | 8GB RAM |
## Container Startup Flow
1. `docker-entrypoint.sh` starts Ollama server in background
2. Waits for server to be ready
3. Checks if model already exists in volume
4. Pulls model if not present
5. Keeps container running
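A minimal sketch of such an entrypoint, assuming a `MODEL_NAME` environment variable; the actual script in this repo may differ in detail:
```bash
#!/bin/sh
# Hypothetical sketch of the startup flow above -- the real script may differ
set -e

MODEL_NAME="${MODEL_NAME:-minicpm-v}"

# 1. Start the Ollama server in the background
ollama serve &
OLLAMA_PID=$!

# 2. Wait until the API answers
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# 3 + 4. Pull the model only if it is not already in the mounted volume
if ! ollama list | grep -q "$MODEL_NAME"; then
  ollama pull "$MODEL_NAME"
fi

# 5. Keep the container alive as long as the server runs
wait "$OLLAMA_PID"
```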
## Volume Persistence
Mount `/root/.ollama` to persist downloaded models:
```bash
-v ollama-data:/root/.ollama
```
Without this volume, the model will be re-downloaded on each container start (~5GB download).
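A complete run command might look like the following; the image reference is illustrative, so substitute the tag you actually built or pulled:
```bash
# Image reference is illustrative -- use your own registry/tag
docker run -d \
  --name ht-docker-ai \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  <registry>/ht-docker-ai:latest
```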
## API Endpoints
All endpoints follow the Ollama API specification:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tags` | GET | List available models |
| `/api/generate` | POST | Generate completion |
| `/api/chat` | POST | Chat completion |
| `/api/pull` | POST | Pull a model |
| `/api/show` | POST | Show model info |
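For example, a chat request against the running container; the `images` field carries base64-encoded image data for vision models like MiniCPM-V:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "minicpm-v",
  "messages": [
    {
      "role": "user",
      "content": "Describe this image.",
      "images": ["<base64-encoded image data>"]
    }
  ],
  "stream": false
}'
```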
## GPU Detection
The GPU variant uses Ollama's automatic GPU detection. For CPU-only mode, we set:
```dockerfile
ENV CUDA_VISIBLE_DEVICES=""
```
This forces Ollama to use CPU inference even if a GPU is available.
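One way to verify which backend is actually in use, once the model has served at least one request:
```bash
# `ollama ps` lists loaded models; the PROCESSOR column shows GPU or CPU
docker exec <container-name> ollama ps
```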
## Health Checks
Both variants include Docker health checks:
```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:11434/api/tags || exit 1
```
The CPU variant uses a longer `start-period` (120s instead of the 60s shown above) because model loading is slower on CPU.
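To inspect the health status at runtime:
```bash
# Docker's view of container health: starting | healthy | unhealthy
docker inspect --format '{{.State.Health.Status}}' <container-name>
```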
## Adding New Models
To add a new model variant:
1. Create `Dockerfile_<modelname>`
2. Set `MODEL_NAME` environment variable
3. Update `build-images.sh` with new build target
4. Add documentation to `readme.md`
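A hypothetical `Dockerfile_llava` following this pattern; the base image and entrypoint path are assumptions based on the startup flow above, not the repo's actual files:
```dockerfile
# Hypothetical example -- adjust base image and paths to match the repo
FROM ollama/ollama:latest
ENV MODEL_NAME=llava
COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
EXPOSE 11434
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
```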
## Troubleshooting
### Model download hangs
Check container logs:
```bash
docker logs -f <container-name>
```
The model download is ~5GB and may take several minutes.
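If the logs look stalled, you can also check whether the model has landed in the volume yet:
```bash
# Lists models already present in /root/.ollama;
# an empty list means the download has not completed
docker exec <container-name> ollama list
```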
### Out of memory
- GPU: Use int4 quantized version or add more VRAM
- CPU: Increase container memory limit: `--memory=16g`
### API not responding
1. Check if container is healthy: `docker ps`
2. Check logs for errors: `docker logs <container>`
3. Verify port mapping: `curl localhost:11434/api/tags`
## CI/CD Integration
Build and push using npmci:
```bash
npmci docker login
npmci docker build
npmci docker push code.foss.global
```
## Related Resources
- [Ollama Documentation](https://ollama.ai/docs)
- [MiniCPM-V GitHub](https://github.com/OpenBMB/MiniCPM-V)
- [Ollama API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)