# Technical Notes - ht-docker-ai
## Architecture
This project uses Ollama as the runtime framework for serving AI models. This provides:
- Automatic model download and caching
- Unified REST API (compatible with OpenAI format)
- Built-in quantization support
- GPU/CPU auto-detection
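Because the API is OpenAI-compatible, requests can use the familiar chat-completions shape. The sketch below only builds the request body (the URL and port are Ollama's documented defaults; sending it requires a running container):

```python
import json

# Ollama's OpenAI-compatible endpoint (default port 11434).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> str:
    """Return an OpenAI-format chat completion request as a JSON string."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_request("minicpm-v", "Describe this image.")
print(body)
```

POSTing `body` to `OLLAMA_URL` with `Content-Type: application/json` would return a standard chat-completion response.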
## Model Details

### MiniCPM-V 4.5

- Source: OpenBMB (https://github.com/OpenBMB/MiniCPM-V)
- Base Models: Qwen3-8B + SigLIP2-400M
- Total Parameters: 8B
- Ollama Model Name: `minicpm-v`
### VRAM Usage
| Mode | VRAM Required |
|---|---|
| Full precision (bf16) | 18GB |
| int4 quantized | 9GB |
| GGUF (CPU) | 8GB RAM |
## Container Startup Flow

1. `docker-entrypoint.sh` starts the Ollama server in the background
2. Waits for the server to be ready
3. Checks if the model already exists in the volume
4. Pulls the model if not present
5. Keeps the container running
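The wait-for-ready step boils down to a polling loop. A minimal Python sketch of that logic (the real entrypoint is a shell script, and the probe here is faked; in practice it would be an HTTP GET against `http://localhost:11434/api/tags`):

```python
import time
from typing import Callable

def wait_for_server(probe: Callable[[], bool],
                    timeout: float = 60.0,
                    interval: float = 0.1) -> bool:
    """Poll `probe` until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Fake probe standing in for an HTTP health check: "ready" on the third poll.
attempts = {"n": 0}

def fake_probe() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_for_server(fake_probe, timeout=5.0)
print(ready)  # True once the probe succeeds
```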
## Volume Persistence

Mount `/root/.ollama` to persist downloaded models:

```shell
-v ollama-data:/root/.ollama
```
Without this volume, the model will be re-downloaded on each container start (~5GB download).
## API Endpoints

All endpoints follow the Ollama API specification:

| Endpoint | Method | Description |
|---|---|---|
| `/api/tags` | GET | List available models |
| `/api/generate` | POST | Generate completion |
| `/api/chat` | POST | Chat completion |
| `/api/pull` | POST | Pull a model |
| `/api/show` | POST | Show model info |
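As an illustration of the native API, a request body for `/api/generate` can be assembled like this (a sketch using Ollama's documented `model`/`prompt`/`stream` fields; sending it requires a running container):

```python
import json

def build_generate_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for Ollama's POST /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

req = build_generate_request("minicpm-v", "Summarize this project.")
print(json.dumps(req))
```

With `stream` set to `False`, Ollama returns a single JSON object instead of a stream of chunks.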
## GPU Detection

The GPU variant uses Ollama's automatic GPU detection. For CPU-only mode, we set:

```dockerfile
ENV CUDA_VISIBLE_DEVICES=""
```

This forces Ollama to use CPU inference even if a GPU is available.
## Health Checks

Both variants include Docker health checks:

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:11434/api/tags || exit 1
```

The CPU variant has a longer start-period (120s) due to slower startup.
## PaddleOCR

### Overview

PaddleOCR is a standalone OCR service using PaddlePaddle's PP-OCRv4 model. It provides:
- Text detection and recognition
- Multi-language support
- FastAPI REST API
- GPU and CPU variants
### Docker Images

| Tag | Description |
|---|---|
| `paddleocr` | GPU variant (default) |
| `paddleocr-gpu` | GPU variant (alias) |
| `paddleocr-cpu` | CPU-only variant |
### API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check with model info |
| `/ocr` | POST | OCR with base64 image (JSON body) |
| `/ocr/upload` | POST | OCR with file upload (multipart form) |
### Request/Response Format

POST `/ocr` (JSON):

```json
{
  "image": "<base64-encoded-image>",
  "language": "en"
}
```

The `language` field is optional.

POST `/ocr/upload` (multipart):

- `img`: image file
- `language`: optional language code

Response:

```json
{
  "success": true,
  "results": [
    {
      "text": "Invoice #12345",
      "confidence": 0.98,
      "box": [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    }
  ]
}
```
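The request and response shapes above can be exercised from a client like this (a minimal sketch: the image bytes are faked, and the response is a sample matching the documented format rather than a live server reply):

```python
import base64
import json

def build_ocr_request(image_bytes: bytes, language: str = "en") -> str:
    """Encode raw image bytes into the JSON body expected by POST /ocr."""
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "language": language,
    })

def extract_texts(response_json: str) -> list[str]:
    """Pull recognized text lines out of a successful /ocr response."""
    data = json.loads(response_json)
    if not data.get("success"):
        return []
    return [r["text"] for r in data["results"]]

# Fake bytes stand in for reading a real image file.
body = build_ocr_request(b"fake-image-bytes", language="en")

sample_response = json.dumps({
    "success": True,
    "results": [{"text": "Invoice #12345", "confidence": 0.98,
                 "box": [[0, 0], [100, 0], [100, 20], [0, 20]]}],
})
print(extract_texts(sample_response))  # ['Invoice #12345']
```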
### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `OCR_LANGUAGE` | `en` | Default language for OCR |
| `SERVER_PORT` | `5000` | Server port |
| `SERVER_HOST` | `0.0.0.0` | Server host |
| `CUDA_VISIBLE_DEVICES` | (auto) | Set to `-1` for CPU-only |
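How a service might resolve these variables with the documented defaults (a sketch, not the actual server code; the environment is passed in explicitly so the logic is testable):

```python
import os

def load_config(env: dict[str, str]) -> dict:
    """Resolve service settings from environment variables, using the documented defaults."""
    return {
        "language": env.get("OCR_LANGUAGE", "en"),
        "port": int(env.get("SERVER_PORT", "5000")),
        "host": env.get("SERVER_HOST", "0.0.0.0"),
        "cpu_only": env.get("CUDA_VISIBLE_DEVICES") == "-1",
    }

# In the real service this would be load_config(os.environ).
cfg = load_config({"SERVER_PORT": "8080", "CUDA_VISIBLE_DEVICES": "-1"})
print(cfg)
```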
### Performance
- GPU: ~1-3 seconds per page
- CPU: ~10-30 seconds per page
### Supported Languages

Common language codes: `en` (English), `ch` (Chinese), `de` (German), `fr` (French), `es` (Spanish), `ja` (Japanese), `ko` (Korean)
## Adding New Models

To add a new model variant:

1. Create `Dockerfile_<modelname>`
2. Set the `MODEL_NAME` environment variable
3. Update `build-images.sh` with the new build target
4. Add documentation to `readme.md`
## Troubleshooting

### Model download hangs

Check container logs:

```shell
docker logs -f <container-name>
```

The model download is ~5GB and may take several minutes.
### Out of memory

- GPU: Use the int4 quantized version or add more VRAM
- CPU: Increase the container memory limit: `--memory=16g`
### API not responding

- Check if the container is healthy: `docker ps`
- Check logs for errors: `docker logs <container>`
- Verify port mapping: `curl localhost:11434/api/tags`
## CI/CD Integration

Build and push using npmci:

```shell
npmci docker login
npmci docker build
npmci docker push code.foss.global
```