# Technical Notes - ht-docker-ai

## Architecture

This project uses **Ollama** as the runtime framework for serving AI models, which provides:

- Automatic model download and caching
- A unified REST API (compatible with the OpenAI format)
- Built-in quantization support
- GPU/CPU auto-detection

## Model Details

### MiniCPM-V 4.5

- **Source**: OpenBMB (https://github.com/OpenBMB/MiniCPM-V)
- **Base Models**: Qwen3-8B + SigLIP2-400M
- **Total Parameters**: 8B
- **Ollama Model Name**: `minicpm-v`

### VRAM Usage

| Mode | Memory Required |
|------|-----------------|
| Full precision (bf16) | 18 GB VRAM |
| int4 quantized | 9 GB VRAM |
| GGUF (CPU) | 8 GB RAM |

## Container Startup Flow

1. `docker-entrypoint.sh` starts the Ollama server in the background
2. Waits for the server to be ready
3. Checks whether the model already exists in the volume
4. Pulls the model if not present
5. Keeps the container running

## Volume Persistence

Mount `/root/.ollama` to persist downloaded models:

```bash
-v ollama-data:/root/.ollama
```

Without this volume, the model (~5 GB) is re-downloaded on every container start.

## API Endpoints

All endpoints follow the Ollama API specification:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tags` | GET | List available models |
| `/api/generate` | POST | Generate a completion |
| `/api/chat` | POST | Chat completion |
| `/api/pull` | POST | Pull a model |
| `/api/show` | POST | Show model info |

## GPU Detection

The GPU variant relies on Ollama's automatic GPU detection. For CPU-only mode, we set:

```dockerfile
ENV CUDA_VISIBLE_DEVICES=""
```

This forces Ollama to use CPU inference even if a GPU is available.

## Health Checks

Both variants include Docker health checks:

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:11434/api/tags || exit 1
```

The CPU variant uses a longer `start-period` (120s) because of its slower startup.

## PaddleOCR

### Overview

PaddleOCR is a standalone OCR service built on PaddlePaddle's PP-OCRv4 model. It provides:

- Text detection and recognition
- Multi-language support
- A FastAPI REST API
- GPU and CPU variants

### Docker Images

| Tag | Description |
|-----|-------------|
| `paddleocr` | GPU variant (default) |
| `paddleocr-gpu` | GPU variant (alias) |
| `paddleocr-cpu` | CPU-only variant |

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check with model info |
| `/ocr` | POST | OCR with base64 image (JSON body) |
| `/ocr/upload` | POST | OCR with file upload (multipart form) |

### Request/Response Format

**POST /ocr (JSON)** - the `language` field is optional and defaults to the configured `OCR_LANGUAGE`:

```json
{
  "image": "<base64-encoded image data>",
  "language": "en"
}
```

**POST /ocr/upload (multipart)**

- `img`: image file
- `language`: optional language code

**Response**

```json
{
  "success": true,
  "results": [
    {
      "text": "Invoice #12345",
      "confidence": 0.98,
      "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    }
  ]
}
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OCR_LANGUAGE` | `en` | Default language for OCR |
| `SERVER_PORT` | `5000` | Server port |
| `SERVER_HOST` | `0.0.0.0` | Server host |
| `CUDA_VISIBLE_DEVICES` | (auto) | Set to `-1` for CPU-only |

### Performance

- **GPU**: ~1-3 seconds per page
- **CPU**: ~10-30 seconds per page

### Supported Languages

Common language codes: `en` (English), `ch` (Chinese), `de` (German), `fr` (French), `es` (Spanish), `ja` (Japanese), `ko` (Korean)
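### Usage Example

As a concrete reference, here is a minimal sketch of calling both OCR endpoints with `curl`. It assumes the container is running with its default port published (`-p 5000:5000`); the file name `invoice.png` is a placeholder.

```bash
# Multipart upload; "img" is the documented form field name.
curl -s -X POST http://localhost:5000/ocr/upload \
  -F "img=@invoice.png" \
  -F "language=en"

# JSON body with a base64-encoded image.
# Note: `base64 -w0` is GNU coreutils; on macOS use `base64 -i invoice.png`.
curl -s -X POST http://localhost:5000/ocr \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"$(base64 -w0 invoice.png)\", \"language\": \"en\"}"
```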
---

## Adding New Models

To add a new model variant:

1. Create a `Dockerfile_<modelname>` file
2. Set the `MODEL_NAME` environment variable
3. Update `build-images.sh` with the new build target
4. Add documentation to `readme.md`

## Troubleshooting

### Model download hangs

Check the container logs:

```bash
docker logs -f <container-name>
```

The model download is ~5 GB and may take several minutes.

### Out of memory

- GPU: switch to the int4-quantized model or use a GPU with more VRAM
- CPU: increase the container memory limit: `--memory=16g`

### API not responding

1. Check whether the container is healthy: `docker ps`
2. Check the logs for errors: `docker logs <container-name>`
3. Verify the port mapping: `curl localhost:11434/api/tags`

## CI/CD Integration

Build and push using npmci:

```bash
npmci docker login
npmci docker build
npmci docker push code.foss.global
```

## Related Resources

- [Ollama Documentation](https://ollama.ai/docs)
- [MiniCPM-V GitHub](https://github.com/OpenBMB/MiniCPM-V)
- [Ollama API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)
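## Quick Smoke Test

For completeness, a minimal smoke-test sketch for the Ollama variant. It assumes the container publishes the default port 11434 and that the `minicpm-v` pull has completed; the prompt text is arbitrary.

```bash
# Wait until the API answers (the same endpoint the health check polls).
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 2
done

# One-shot, non-streaming generation against the bundled model.
curl -s http://localhost:11434/api/generate -d '{
  "model": "minicpm-v",
  "prompt": "Reply with the single word: ready",
  "stream": false
}'
```

If the `until` loop never exits, work through the Troubleshooting steps above.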