# @host.today/ht-docker-ai

Docker images for AI vision-language models, starting with MiniCPM-V 4.5.

## Overview

This project provides ready-to-use Docker containers for running state-of-the-art AI vision-language models. Built on Ollama for simplified model management and a consistent REST API.

## Available Images

| Tag | Description | Requirements |
|-----|-------------|--------------|
| `minicpm45v` | MiniCPM-V 4.5 with GPU support | NVIDIA GPU, 9-18GB VRAM |
| `minicpm45v-cpu` | MiniCPM-V 4.5 CPU-only | 8GB+ RAM |
| `latest` | Alias for `minicpm45v` | NVIDIA GPU |

## Quick Start

### GPU (Recommended)

```bash
docker run -d \
  --name minicpm \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  code.foss.global/host.today/ht-docker-ai:minicpm45v
```

### CPU Only

```bash
docker run -d \
  --name minicpm \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  code.foss.global/host.today/ht-docker-ai:minicpm45v-cpu
```

## API Usage

The container exposes the Ollama API on port 11434.

### List Available Models

```bash
curl http://localhost:11434/api/tags
```

### Generate Text from Image

The `images` array takes base64-encoded image data; the empty string below is a placeholder.

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "minicpm-v",
  "prompt": "What do you see in this image?",
  "images": [""]
}'
```

### Chat with Vision

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "minicpm-v",
  "messages": [
    {
      "role": "user",
      "content": "Describe this image in detail",
      "images": [""]
    }
  ]
}'
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_NAME` | `minicpm-v` | Model to pull on startup |
| `OLLAMA_HOST` | `0.0.0.0` | Host address for API |
| `OLLAMA_ORIGINS` | `*` | Allowed CORS origins |

## Hardware Requirements

### GPU Variant (`minicpm45v`)

- NVIDIA GPU with CUDA support
- Minimum 9GB VRAM (int4 quantized)
- Recommended 18GB VRAM (full precision)
- NVIDIA Container Toolkit installed

### CPU Variant (`minicpm45v-cpu`)

- Minimum 8GB RAM
- Recommended 16GB+ RAM for better performance
- No GPU required

## Model Information

**MiniCPM-V 4.5** is a GPT-4o-level multimodal large language model developed by OpenBMB.

- **Parameters**: 8B (Qwen3-8B + SigLIP2-400M)
- **Capabilities**: Image understanding, OCR, multi-image analysis
- **Languages**: 30+ languages, including English, Chinese, French, and Spanish

## Docker Compose Example

```yaml
version: '3.8'

services:
  minicpm:
    image: code.foss.global/host.today/ht-docker-ai:minicpm45v
    container_name: minicpm
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama-data:
```

## Building Locally

```bash
# Clone the repository
git clone https://code.foss.global/host.today/ht-docker-ai.git
cd ht-docker-ai

# Build all images
./build-images.sh

# Run tests
./test-images.sh
```

## License

MIT - Task Venture Capital GmbH
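## Appendix: Sending a Local Image

The curl examples in the API Usage section leave the `images` field empty. As a practical sketch of filling it in, the script below base64-encodes a local file and builds the `/api/generate` request body. The `photo.jpg` path is a placeholder (pass your own image as the first argument), and the final curl call assumes the container from the Quick Start is running on localhost:11434.

```bash
#!/usr/bin/env bash
# Sketch: build and send a /api/generate request with an embedded image.
set -euo pipefail

IMAGE_FILE="${1:-photo.jpg}"

# Ollama expects raw base64 (no "data:image/..." URI prefix) in the
# images array. `tr -d '\n'` strips line wrapping portably (macOS
# base64 has no -w0 flag).
if [ -f "$IMAGE_FILE" ]; then
  B64=$(base64 < "$IMAGE_FILE" | tr -d '\n')
else
  # Dummy bytes so the script still runs without an image on disk.
  B64=$(printf 'placeholder' | base64 | tr -d '\n')
fi

# Base64 output uses only JSON-safe characters, so plain string
# interpolation into the request body is safe here.
PAYLOAD=$(printf '{"model":"minicpm-v","prompt":"What do you see in this image?","stream":false,"images":["%s"]}' "$B64")
echo "$PAYLOAD"

# Uncomment to call the API once the container is up:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```

Setting `"stream": false` makes Ollama return a single JSON object instead of a stream of partial responses, which is easier to handle in shell scripts.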