# Technical Notes - ht-docker-ai

## Architecture

This project uses **Ollama** as the runtime framework for serving AI models, which provides:

- Automatic model download and caching
- A unified REST API (compatible with the OpenAI format; see the example below)
- Built-in quantization support
- GPU/CPU auto-detection
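
The OpenAI-compatible surface means existing OpenAI clients can point at the container directly. A minimal sketch, assuming the container is running with the default port 11434 published and the `minicpm-v` model already pulled:

```bash
# Query Ollama's OpenAI-compatible chat endpoint (port and model name per this document)
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minicpm-v",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```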
## Model Details

### MiniCPM-V 4.5

- **Source**: OpenBMB (https://github.com/OpenBMB/MiniCPM-V)
- **Base Models**: Qwen3-8B + SigLIP2-400M
- **Total Parameters**: 8B
- **Ollama Model Name**: `minicpm-v`

### VRAM Usage

| Mode | Memory Required |
|------|-----------------|
| Full precision (bf16) | 18GB VRAM |
| int4 quantized | 9GB VRAM |
| GGUF (CPU) | 8GB RAM |
## Container Startup Flow

1. `docker-entrypoint.sh` starts the Ollama server in the background
2. Waits for the server to become ready
3. Checks whether the model already exists in the volume
4. Pulls the model if it is not present
5. Keeps the container running (see the sketch below)
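
A minimal shell sketch of that flow — illustrative only, not the project's actual `docker-entrypoint.sh`, and it assumes the `MODEL_NAME` environment variable described later in these notes:

```bash
#!/usr/bin/env bash
# Illustrative sketch of the startup flow; the real docker-entrypoint.sh may differ.
set -e

ollama serve &                                     # 1. start the server in the background

until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1                                          # 2. wait until the API responds
done

if ! ollama list | grep -q "${MODEL_NAME}"; then   # 3. check for an existing model
  ollama pull "${MODEL_NAME}"                      # 4. pull only if missing
fi

wait                                               # 5. block on the server process
```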
## Volume Persistence

Mount `/root/.ollama` to persist downloaded models:

```bash
-v ollama-data:/root/.ollama
```

Without this volume, the model will be re-downloaded on each container start (~5GB download).
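
Put together, a full invocation might look like the following; `<image>` is a placeholder for whichever variant tag you actually build or pull:

```bash
# Run detached with the Ollama port published and models persisted in a named volume
docker run -d --name ollama-ai \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  <image>
```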
## API Endpoints

All endpoints follow the Ollama API specification:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tags` | GET | List available models |
| `/api/generate` | POST | Generate completion |
| `/api/chat` | POST | Chat completion |
| `/api/pull` | POST | Pull a model |
| `/api/show` | POST | Show model info |
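
As an example of the native API (as opposed to the OpenAI-compatible endpoint shown earlier), a non-streaming generation request might look like this:

```bash
# Non-streaming completion via Ollama's native generate endpoint
curl -s http://localhost:11434/api/generate \
  -d '{
    "model": "minicpm-v",
    "prompt": "Summarize what an OCR pipeline does.",
    "stream": false
  }'
```

For vision input, Ollama's native API also accepts an `images` array of base64-encoded images alongside the prompt.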
## GPU Detection

The GPU variant uses Ollama's automatic GPU detection. For CPU-only mode, we set:

```dockerfile
ENV CUDA_VISIBLE_DEVICES=""
```

This forces Ollama to use CPU inference even if a GPU is available.
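
For the GPU variant, the container also needs GPU access at runtime. With the NVIDIA Container Toolkit installed on the host, that is typically done as follows (`<image>` is a placeholder):

```bash
# Expose all host GPUs to the container so Ollama's auto-detection can find them
docker run -d --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  <image>
```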
## Health Checks

Both variants include Docker health checks:

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:11434/api/tags || exit 1
```

The CPU variant has a longer `start-period` (120s) due to its slower startup.
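
Docker surfaces the result in the `STATUS` column of `docker ps` and via `docker inspect`; for example:

```bash
# Print just the health state reported by the container's HEALTHCHECK
docker inspect --format '{{.State.Health.Status}}' <container-name>
```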
## PaddleOCR

### Overview

PaddleOCR is a standalone OCR service using PaddlePaddle's PP-OCRv4 model. It provides:

- Text detection and recognition
- Multi-language support
- A FastAPI REST API
- GPU and CPU variants

### Docker Images

| Tag | Description |
|-----|-------------|
| `paddleocr` | GPU variant (default) |
| `paddleocr-gpu` | GPU variant (alias) |
| `paddleocr-cpu` | CPU-only variant |
### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check with model info |
| `/ocr` | POST | OCR with base64 image (JSON body) |
| `/ocr/upload` | POST | OCR with file upload (multipart form) |
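
A quick liveness probe, assuming the default port 5000 from the environment table below is published:

```bash
# Expect a JSON payload describing the loaded model
curl -s http://localhost:5000/health
```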
### Request/Response Format

**POST /ocr (JSON)** — the `language` field is optional:

```json
{
  "image": "<base64-encoded-image>",
  "language": "en"
}
```

**POST /ocr/upload (multipart)**

- `img`: image file
- `language`: optional language code

**Response** — `box` holds the four corner points of each detected text region:

```json
{
  "success": true,
  "results": [
    {
      "text": "Invoice #12345",
      "confidence": 0.98,
      "box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
    }
  ]
}
```
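
Illustrative calls to both endpoints, assuming the service is published on port 5000 and a local `invoice.png` exists:

```bash
# JSON endpoint: embed the image as base64 in the request body
# (-w0 disables line wrapping; GNU coreutils base64)
curl -s -X POST http://localhost:5000/ocr \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"$(base64 -w0 invoice.png)\", \"language\": \"en\"}"

# Multipart endpoint: upload the file directly under the `img` field
curl -s -X POST http://localhost:5000/ocr/upload \
  -F "img=@invoice.png" \
  -F "language=en"
```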
### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OCR_LANGUAGE` | `en` | Default language for OCR |
| `SERVER_PORT` | `5000` | Server port |
| `SERVER_HOST` | `0.0.0.0` | Server host |
| `CUDA_VISIBLE_DEVICES` | (auto) | Set to `-1` for CPU-only |
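
For instance, a CPU-only run with a German default language might look like this (`<image>` is a placeholder for the image name in your registry):

```bash
# Force CPU inference and override the default OCR language
docker run -d -p 5000:5000 \
  -e CUDA_VISIBLE_DEVICES=-1 \
  -e OCR_LANGUAGE=de \
  <image>:paddleocr-cpu
```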
### Performance

- **GPU**: ~1-3 seconds per page
- **CPU**: ~10-30 seconds per page

### Supported Languages

Common language codes: `en` (English), `ch` (Chinese), `de` (German), `fr` (French), `es` (Spanish), `ja` (Japanese), `ko` (Korean)
---

## Adding New Models
To add a new model variant:

1. Create `Dockerfile_<modelname>`
2. Set the `MODEL_NAME` environment variable
3. Update `build-images.sh` with the new build target (example invocation below)
4. Add documentation to `readme.md`
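
As a rough sketch of step 3's outcome, the build for a hypothetical `llava` variant might be invoked like this; the model name and tag scheme are assumptions, not taken from `build-images.sh`:

```bash
# Hypothetical build target for a new variant; names are illustrative only
docker build -f Dockerfile_llava -t ht-docker-ai:llava .
```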
## Troubleshooting

### Model download hangs

Check the container logs:

```bash
docker logs -f <container-name>
```

The model download is ~5GB and may take several minutes.
### Out of memory

- GPU: use the int4-quantized model or a GPU with more VRAM
- CPU: increase the container memory limit, e.g. `--memory=16g`

### API not responding

1. Check whether the container is healthy: `docker ps`
2. Check the logs for errors: `docker logs <container>`
3. Verify the port mapping: `curl localhost:11434/api/tags`
## CI/CD Integration

Build and push using npmci:

```bash
npmci docker login
npmci docker build
npmci docker push code.foss.global
```
## Related Resources

- [Ollama Documentation](https://ollama.ai/docs)
- [MiniCPM-V GitHub](https://github.com/OpenBMB/MiniCPM-V)
- [Ollama API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)