
Technical Notes - ht-docker-ai

Architecture

This project uses Ollama as the runtime framework for serving AI models. This provides:

  • Automatic model download and caching
  • Unified REST API (compatible with OpenAI format)
  • Built-in quantization support
  • GPU/CPU auto-detection
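
Because the API is OpenAI-compatible, existing OpenAI clients can be pointed at the container. A minimal sketch using Ollama's /v1/chat/completions endpoint (the model name minicpm-v is an assumption and depends on which model the image serves):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minicpm-v",
    "messages": [{"role": "user", "content": "Hello"}]
  }'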

Model Details

MiniCPM-V 4.5

VRAM Usage

| Mode                   | Memory required |
|------------------------|-----------------|
| Full precision (bf16)  | 18 GB VRAM      |
| int4 quantized         | 9 GB VRAM       |
| GGUF (CPU)             | 8 GB system RAM |

Container Startup Flow

  1. docker-entrypoint.sh starts the Ollama server in the background
  2. Waits for the server to be ready
  3. Checks whether the model already exists in the volume
  4. Pulls the model if not present
  5. Keeps the container running
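
A minimal sketch of this flow, assuming MODEL_NAME is set via ENV in the image (the real docker-entrypoint.sh may differ in its details):

#!/bin/sh
# Start the Ollama server in the background (step 1)
ollama serve &
SERVER_PID=$!

# Wait until the API answers (step 2)
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# Pull the model only if it is not already in the volume (steps 3-4)
if ! ollama list | grep -q "$MODEL_NAME"; then
  ollama pull "$MODEL_NAME"
fi

# Keep the container alive as long as the server runs (step 5)
wait $SERVER_PID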

Volume Persistence

Mount /root/.ollama to persist downloaded models:

-v ollama-data:/root/.ollama

Without this volume, the model will be re-downloaded on each container start (~5GB download).
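
For example (the image name is a placeholder):

docker run -d \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  <image-name>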

API Endpoints

All endpoints follow the Ollama API specification:

| Endpoint      | Method | Description            |
|---------------|--------|------------------------|
| /api/tags     | GET    | List available models  |
| /api/generate | POST   | Generate completion    |
| /api/chat     | POST   | Chat completion        |
| /api/pull     | POST   | Pull a model           |
| /api/show     | POST   | Show model info        |
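
For example, a minimal completion request against the native API (the model name minicpm-v is an assumption; use whichever model the container pulled):

curl http://localhost:11434/api/generate -d '{
  "model": "minicpm-v",
  "prompt": "Why is the sky blue?",
  "stream": false
}'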

GPU Detection

The GPU variant uses Ollama's automatic GPU detection. For CPU-only mode, we set:

ENV CUDA_VISIBLE_DEVICES=""

This forces Ollama to use CPU inference even if a GPU is available.
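
To check which backend Ollama actually selected once a model is loaded, ollama ps reports the processor in use (the container name is a placeholder):

docker exec <container-name> ollama ps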

Health Checks

Both variants include Docker health checks:

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:11434/api/tags || exit 1

The CPU variant uses a longer start-period (120s) to account for its slower startup.
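
The current health status can be read directly from Docker:

docker inspect --format='{{.State.Health.Status}}' <container-name>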

Adding New Models

To add a new model variant:

  1. Create Dockerfile_<modelname>
  2. Set MODEL_NAME environment variable
  3. Update build-images.sh with new build target
  4. Add documentation to readme.md
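
A hypothetical Dockerfile_<modelname> could follow the existing pattern (the base image and entrypoint path are assumptions):

FROM ollama/ollama:latest

# Hypothetical model tag; replace with the actual Ollama model name
ENV MODEL_NAME=<modelname>

COPY docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]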

Troubleshooting

Model download hangs

Check container logs:

docker logs -f <container-name>

The model download is ~5GB and may take several minutes.

Out of memory

  • GPU: Use the int4 quantized version, or switch to a GPU with more VRAM
  • CPU: Increase the container memory limit, e.g. --memory=16g
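
For example, combining the memory limit with the usual run flags (the image name is a placeholder):

docker run -d --memory=16g \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  <image-name>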

API not responding

  1. Check if container is healthy: docker ps
  2. Check logs for errors: docker logs <container>
  3. Verify port mapping: curl localhost:11434/api/tags

CI/CD Integration

Build and push using npmci:

npmci docker login
npmci docker build
npmci docker push code.foss.global