# @host.today/ht-docker-ai

Docker images for AI vision-language models, starting with MiniCPM-V 4.5.

## Overview

This project provides ready-to-use Docker containers for running state-of-the-art AI vision-language models. Built on Ollama for simplified model management and a consistent REST API.

## Available Images

| Tag | Description | Requirements |
|-----|-------------|--------------|
| `minicpm45v` | MiniCPM-V 4.5 with GPU support | NVIDIA GPU, 9-18GB VRAM |
| `minicpm45v-cpu` | MiniCPM-V 4.5 CPU-only | 8GB+ RAM |
| `latest` | Alias for `minicpm45v` | NVIDIA GPU |

## Quick Start

### GPU (Recommended)

```bash
docker run -d \
  --name minicpm \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  code.foss.global/host.today/ht-docker-ai:minicpm45v
```

### CPU Only

```bash
docker run -d \
  --name minicpm \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  code.foss.global/host.today/ht-docker-ai:minicpm45v-cpu
```

## API Usage

The container exposes the Ollama API on port 11434.

### List Available Models

```bash
curl http://localhost:11434/api/tags
```

### Generate Text from Image

The `images` array takes base64-encoded image data; the empty string below is a placeholder.

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "minicpm-v",
  "prompt": "What do you see in this image?",
  "images": [""]
}'
```

### Chat with Vision

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "minicpm-v",
  "messages": [
    {
      "role": "user",
      "content": "Describe this image in detail",
      "images": [""]
    }
  ]
}'
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_NAME` | `minicpm-v` | Model to pull on startup |
| `OLLAMA_HOST` | `0.0.0.0` | Host address for API |
| `OLLAMA_ORIGINS` | `*` | Allowed CORS origins |

## Hardware Requirements

### GPU Variant (`minicpm45v`)

- NVIDIA GPU with CUDA support
- Minimum 9GB VRAM (int4 quantized)
- Recommended 18GB VRAM (full precision)
- NVIDIA Container Toolkit installed

### CPU Variant (`minicpm45v-cpu`)

- Minimum 8GB RAM
- Recommended 16GB+ RAM for better performance
- No GPU required

## Model Information

**MiniCPM-V 4.5** is a GPT-4o-level multimodal large language model developed by OpenBMB.

- **Parameters**: 8B (Qwen3-8B + SigLIP2-400M)
- **Capabilities**: Image understanding, OCR, multi-image analysis
- **Languages**: 30+ languages, including English, Chinese, French, and Spanish

## Docker Compose Example

```yaml
version: '3.8'

services:
  minicpm:
    image: code.foss.global/host.today/ht-docker-ai:minicpm45v
    container_name: minicpm
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama-data:
```

## Building Locally

```bash
# Clone the repository
git clone https://code.foss.global/host.today/ht-docker-ai.git
cd ht-docker-ai

# Build all images
./build-images.sh

# Run tests
./test-images.sh
```

## License

MIT - Task Venture Capital GmbH
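## Appendix: Sending a Local Image

The curl examples in the API Usage section leave the `images` field empty. As a practical sketch of filling it in, the script below base64-encodes a local file and builds the `/api/generate` request body. The `photo.jpg` path is a placeholder (pass your own image as the first argument), and the final curl call assumes the container from the Quick Start is running on localhost:11434.

```bash
#!/usr/bin/env bash
# Sketch: build and send a /api/generate request with an embedded image.
set -euo pipefail

IMAGE_FILE="${1:-photo.jpg}"

# Ollama expects raw base64 (no "data:image/..." URI prefix) in the
# images array. `tr -d '\n'` strips line wrapping portably (macOS
# base64 has no -w0 flag).
if [ -f "$IMAGE_FILE" ]; then
  B64=$(base64 < "$IMAGE_FILE" | tr -d '\n')
else
  # Dummy bytes so the script still runs without an image on disk.
  B64=$(printf 'placeholder' | base64 | tr -d '\n')
fi

# Base64 output uses only JSON-safe characters, so plain string
# interpolation into the request body is safe here.
PAYLOAD=$(printf '{"model":"minicpm-v","prompt":"What do you see in this image?","stream":false,"images":["%s"]}' "$B64")
echo "$PAYLOAD"

# Uncomment to call the API once the container is up:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```

Setting `"stream": false` makes Ollama return a single JSON object instead of a stream of partial responses, which is easier to handle in shell scripts.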