
# Technical Notes - ht-docker-ai

## Architecture

This project uses Ollama as the runtime framework for serving AI models. Ollama provides:

- Automatic model download and caching
- A unified REST API (compatible with the OpenAI format; see the example below)
- Built-in quantization support
- GPU/CPU auto-detection
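Because of the OpenAI compatibility, existing clients can be pointed at the container directly. A minimal sketch (the model name `minicpm-v` is an assumption; substitute whatever `/api/tags` reports):

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minicpm-v",
    "messages": [{"role": "user", "content": "Describe this image."}]
  }'
```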

## Model Details

### MiniCPM-V 4.5

#### Memory Usage

| Mode | Memory Required |
|------|-----------------|
| Full precision (bf16) | 18 GB VRAM |
| int4 quantized | 9 GB VRAM |
| GGUF (CPU) | 8 GB RAM |

## Container Startup Flow

1. `docker-entrypoint.sh` starts the Ollama server in the background (sketched below)
2. Waits for the server to be ready
3. Checks whether the model already exists in the volume
4. Pulls the model if not present
5. Keeps the container running
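A minimal sketch of what such an entrypoint might look like (the `MODEL_NAME` variable is an assumption; the actual script may differ):

```bash
#!/usr/bin/env bash
set -e

# Start the Ollama server in the background
ollama serve &

# Wait until the API answers
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# Pull the model only if it is not already present in the volume
if ! ollama list | grep -q "${MODEL_NAME}"; then
  ollama pull "${MODEL_NAME}"
fi

# Keep the container running by waiting on the background server process
wait
```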

## Volume Persistence

Mount `/root/.ollama` to persist downloaded models:

```
-v ollama-data:/root/.ollama
```

Without this volume, the model will be re-downloaded on each container start (~5GB download).
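A full invocation might look like this (the image name is a placeholder for whichever variant you run):

```bash
docker run -d \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  <image>
```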

## API Endpoints

All endpoints follow the Ollama API specification:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tags` | GET | List available models |
| `/api/generate` | POST | Generate completion |
| `/api/chat` | POST | Chat completion |
| `/api/pull` | POST | Pull a model |
| `/api/show` | POST | Show model info |
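For example, a non-streaming completion request (the model name is an assumption; use whatever `/api/tags` reports):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "minicpm-v",
  "prompt": "Summarize this document.",
  "stream": false
}'
```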

## GPU Detection

The GPU variant uses Ollama's automatic GPU detection. For CPU-only mode, we set:

```dockerfile
ENV CUDA_VISIBLE_DEVICES=""
```

This forces Ollama to use CPU inference even if a GPU is available.
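Note that the GPU variant still needs GPU access granted at run time (standard Docker behavior; the image name is a placeholder):

```bash
docker run -d --gpus all -p 11434:11434 -v ollama-data:/root/.ollama <image>
```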

## Health Checks

Both variants include Docker health checks:

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:11434/api/tags || exit 1
```

The CPU variant has a longer start-period (120s) due to slower startup.
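Assuming the rest of the check is identical, the CPU variant's version would read:

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
    CMD curl -f http://localhost:11434/api/tags || exit 1
```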

## PaddleOCR

### Overview

PaddleOCR is a standalone OCR service using PaddlePaddle's PP-OCRv4 model. It provides:

- Text detection and recognition
- Multi-language support
- A FastAPI REST API
- GPU and CPU variants

### Docker Images

| Tag | Description |
|-----|-------------|
| `paddleocr` | GPU variant (default) |
| `paddleocr-gpu` | GPU variant (alias) |
| `paddleocr-cpu` | CPU-only variant |

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check with model info |
| `/ocr` | POST | OCR with base64 image (JSON body) |
| `/ocr/upload` | POST | OCR with file upload (multipart form) |

### Request/Response Format

#### POST /ocr (JSON)

```json
{
  "image": "<base64-encoded-image>",
  "language": "en"
}
```

The `language` field is optional.
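A minimal invocation from the shell (port 5000 per the defaults below; the file name is illustrative; `base64 -w0` is the GNU coreutils flag):

```bash
curl -X POST http://localhost:5000/ocr \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"$(base64 -w0 invoice.png)\", \"language\": \"en\"}"
```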

#### POST /ocr/upload (multipart)

- `img`: image file
- `language`: optional language code
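For example (the file name is illustrative):

```bash
curl -X POST http://localhost:5000/ocr/upload \
  -F "img=@invoice.png" \
  -F "language=en"
```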

#### Response

```json
{
  "success": true,
  "results": [
    {
      "text": "Invoice #12345",
      "confidence": 0.98,
      "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    }
  ]
}
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OCR_LANGUAGE` | `en` | Default language for OCR |
| `SERVER_PORT` | `5000` | Server port |
| `SERVER_HOST` | `0.0.0.0` | Server host |
| `CUDA_VISIBLE_DEVICES` | (auto) | Set to `-1` for CPU-only |
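For example, to run with German as the default language and CPU-only inference (the image name is a placeholder for one of the tags above):

```bash
docker run -d \
  -p 5000:5000 \
  -e OCR_LANGUAGE=de \
  -e CUDA_VISIBLE_DEVICES=-1 \
  <paddleocr-image>
```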

### Performance

- GPU: ~1-3 seconds per page
- CPU: ~10-30 seconds per page

### Supported Languages

Common language codes: `en` (English), `ch` (Chinese), `de` (German), `fr` (French), `es` (Spanish), `ja` (Japanese), `ko` (Korean)


## Adding New Models

To add a new model variant (sketched below):

1. Create `Dockerfile_<modelname>`
2. Set the `MODEL_NAME` environment variable
3. Update `build-images.sh` with the new build target
4. Add documentation to `readme.md`
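As a sketch, a hypothetical `Dockerfile_llava` might look like this (the base image and entrypoint layout are assumptions based on the structure described above):

```dockerfile
# Dockerfile_llava - hypothetical new model variant
FROM ollama/ollama:latest

# The entrypoint pulls this model on first start (see Container Startup Flow)
ENV MODEL_NAME=llava

COPY docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh

EXPOSE 11434
ENTRYPOINT ["/docker-entrypoint.sh"]
```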

## Troubleshooting

### Model download hangs

Check the container logs:

```bash
docker logs -f <container-name>
```

The model download is ~5GB and may take several minutes.

### Out of memory

- GPU: use the int4 quantized version or a GPU with more VRAM
- CPU: increase the container memory limit, e.g. `--memory=16g`

### API not responding

1. Check whether the container is healthy: `docker ps`
2. Check the logs for errors: `docker logs <container>`
3. Verify the port mapping: `curl localhost:11434/api/tags`

## CI/CD Integration

Build and push using npmci:

```bash
npmci docker login
npmci docker build
npmci docker push code.foss.global
```