# Technical Notes - ht-docker-ai

## Architecture

This project uses **Ollama** and **vLLM** as runtime frameworks for serving AI models:

### Ollama-based Images (MiniCPM-V, Qwen3-VL)

- Automatic model download and caching
- Unified REST API (compatible with OpenAI format)
- Built-in quantization support
- GPU auto-detection

### vLLM-based Images (Nanonets-OCR)

- High-performance inference server
- OpenAI-compatible API
- Optimized for VLM workloads

## Model Details

### MiniCPM-V 4.5

- **Source**: OpenBMB (https://github.com/OpenBMB/MiniCPM-V)
- **Base Models**: Qwen3-8B + SigLIP2-400M
- **Total Parameters**: 8B
- **Ollama Model Name**: `minicpm-v`

### VRAM Usage

| Mode | VRAM Required |
|------|---------------|
| Full precision (bf16) | 18GB |
| int4 quantized | 9GB |

## Container Startup Flow

### Ollama-based containers

1. `docker-entrypoint.sh` starts the Ollama server in the background
2. Waits for the server to be ready
3. Checks if the model already exists in the volume
4. Pulls the model if not present
5. Keeps the container running (a sketch of this flow is shown below)
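
The flow above could be implemented roughly like the following entrypoint sketch. This is an illustrative outline, not the exact `docker-entrypoint.sh` shipped in the images; the wait loop and model check are assumptions about how such a script is typically written.

```bash
#!/usr/bin/env bash
set -e

# 1. Start the Ollama server in the background
ollama serve &

# 2. Wait until the API answers
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# 3./4. Pull the model only if it is not already present in the mounted volume
MODEL_NAME="${MODEL_NAME:-minicpm-v}"
if ! ollama list | grep -q "${MODEL_NAME}"; then
  ollama pull "${MODEL_NAME}"
fi

# 5. Keep the container alive as long as the server runs
wait
```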

### vLLM-based containers

1. vLLM server starts with model auto-download (an example launch command is shown after this list)
2. Health check endpoint available at `/health`
3. OpenAI-compatible API at `/v1/chat/completions`
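
For reference, a vLLM OpenAI-compatible server along these lines can be started as shown below. The exact flags baked into the Nanonets image may differ; the context length here is an illustrative assumption.

```bash
# Serve Nanonets-OCR-s with vLLM's OpenAI-compatible API on port 8000
python -m vllm.entrypoints.openai.api_server \
  --model nanonets/Nanonets-OCR-s \
  --port 8000 \
  --max-model-len 8192
```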

## Volume Persistence

### Ollama volumes

Mount `/root/.ollama` to persist downloaded models:

```bash
-v ollama-data:/root/.ollama
```

Without this volume, the model will be re-downloaded on each container start (~5GB download).
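
A complete run command might look like the following; the image name is a placeholder, and the port and volume choices mirror the defaults used elsewhere in these notes.

```bash
docker run -d --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --name minicpm-test \
  <minicpm-v-image>
```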

### vLLM/HuggingFace volumes

Mount `/root/.cache/huggingface` for model caching:

```bash
-v hf-cache:/root/.cache/huggingface
```
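
For example (image name again a placeholder):

```bash
docker run -d --gpus all \
  -p 8000:8000 \
  -v hf-cache:/root/.cache/huggingface \
  --name nanonets-test \
  <nanonets-ocr-image>
```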

## API Endpoints

### Ollama API (MiniCPM-V, Qwen3-VL)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/tags` | GET | List available models |
| `/api/generate` | POST | Generate completion |
| `/api/chat` | POST | Chat completion |
| `/api/pull` | POST | Pull a model |
| `/api/show` | POST | Show model info |
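
For example, a vision completion against `/api/generate` looks roughly like this (the base64 image payload is elided):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "minicpm-v",
  "prompt": "Describe this document.",
  "images": ["<base64-encoded image>"],
  "stream": false
}'
```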

### vLLM API (Nanonets-OCR)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | OpenAI-compatible chat completions |
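
Quick smoke tests, assuming the default port 8000 used elsewhere in these notes:

```bash
curl -sf http://localhost:8000/health
curl -s http://localhost:8000/v1/models
```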

## Health Checks

All containers include Docker health checks:

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:11434/api/tags || exit 1
```
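
The reported health state can be inspected from the host, for example:

```bash
docker inspect --format '{{.State.Health.Status}}' <container-name>
```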

---

## Nanonets-OCR-s

### Overview

Nanonets-OCR-s is a Qwen2.5-VL-3B model fine-tuned specifically for document OCR tasks. It outputs structured markdown with semantic tags.

**Key features:**

- Based on Qwen2.5-VL-3B (~4B parameters)
- Fine-tuned for document OCR
- Outputs markdown with semantic HTML tags
- ~10GB VRAM

### Docker Images

| Tag | Description |
|-----|-------------|
| `nanonets-ocr` | GPU variant using vLLM (OpenAI-compatible API) |

### API Endpoints (OpenAI-compatible via vLLM)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | OpenAI-compatible chat completions |

### Request/Response Format

**POST /v1/chat/completions (OpenAI-compatible)**

```json
{
  "model": "nanonets/Nanonets-OCR-s",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
        {"type": "text", "text": "Extract the text from the above document..."}
      ]
    }
  ],
  "temperature": 0.0,
  "max_tokens": 4096
}
```
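
Such a request can be sent with curl; `request.json` here is just an illustrative file holding the payload above:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json
```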

### Nanonets OCR Prompt

The model is designed to work with a specific prompt format:

```
Extract the text from the above document as if you were reading it naturally.
Return the tables in html format.
Return the equations in LaTeX representation.
If there is an image in the document and image caption is not present, add a small description inside <img></img> tag.
Watermarks should be wrapped in brackets. Ex: <watermark>OFFICIAL COPY</watermark>.
Page numbers should be wrapped in brackets. Ex: <page_number>14</page_number>.
```

### Performance

- **GPU (vLLM)**: ~3-8 seconds per page
- **VRAM usage**: ~10GB

### Two-Stage Pipeline (Nanonets + Qwen3)

The Nanonets tests use a two-stage pipeline:

1. **Stage 1**: Nanonets-OCR-s converts images to markdown (via vLLM on port 8000)
2. **Stage 2**: Qwen3 8B extracts structured JSON from markdown (via Ollama on port 11434)

**GPU Limitation**: Both vLLM and Ollama require significant GPU memory. On a single-GPU system:

- Running both simultaneously causes memory contention
- For single GPU: Run services sequentially (stop the Nanonets container before starting Qwen3)
- For multi-GPU: Assign each service to a different GPU

**Sequential Execution**:

```bash
# Step 1: Run Nanonets OCR (converts to markdown)
docker start nanonets-test
# ... perform OCR ...
docker stop nanonets-test

# Step 2: Run Qwen3 extraction (from markdown)
docker start minicpm-test
# ... extract JSON ...
```

---

## Multi-Pass Extraction Strategy

The bank statement extraction uses a dual-VLM consensus approach:

### Architecture: Dual-VLM Consensus

| VLM | Model | Purpose |
|-----|-------|---------|
| **MiniCPM-V 4.5** | 8B params | Primary visual extraction |
| **Nanonets-OCR-s** | ~4B params | Document OCR with semantic output |

### Extraction Strategy

1. **Pass 1**: MiniCPM-V visual extraction (images → JSON)
2. **Pass 2**: Nanonets-OCR semantic extraction (images → markdown → JSON)
3. **Consensus**: If Pass 1 == Pass 2 → Done (fast path)
4. **Pass 3+**: Additional MiniCPM-V visual passes if no consensus is reached (a consensus-check sketch follows this list)
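
A minimal sketch of the consensus check, assuming each pass writes its extraction result to a JSON file (the file names are hypothetical):

```bash
# Compare the two passes with key-sorted JSON so formatting differences don't matter
if diff <(jq -S . pass1.json) <(jq -S . pass2.json) > /dev/null; then
  echo "Consensus reached - done"
else
  echo "No consensus - running an additional MiniCPM-V pass"
fi
```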

### Why Dual-VLM Works

- **Different architectures**: Two independent models cross-check each other
- **Specialized strengths**: Nanonets-OCR-s is optimized for document structure, MiniCPM-V for general vision
- **No structure loss**: Both VLMs see the original images directly
- **Fast consensus**: Most documents complete in 2 passes when the VLMs agree

---

## Adding New Models

To add a new model variant:

1. Create `Dockerfile_<modelname>_<runtime>_<hardware>_VRAM<size>` (an example build command is shown after this list)
2. Set the `MODEL_NAME` environment variable
3. Update `build-images.sh` with the new build target
4. Add documentation to `readme.md`
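
For example, a new variant could then be built locally along these lines (the Dockerfile name and tag are illustrative):

```bash
docker build -f Dockerfile_mymodel_ollama_gpu_VRAM12 -t ht-docker-ai:mymodel .
```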

## Troubleshooting

### Model download hangs

Check the container logs:

```bash
docker logs -f <container-name>
```

The model download is ~5GB and may take several minutes.

### Out of memory

- Use a quantized or smaller model variant to reduce VRAM requirements
- Consider a GPU with more VRAM or a multi-GPU setup

### API not responding

1. Check if the container is healthy: `docker ps`
2. Check the logs for errors: `docker logs <container>`
3. Verify the port mapping: `curl localhost:11434/api/tags`

## CI/CD Integration

Build and push using npmci:

```bash
npmci docker login
npmci docker build
npmci docker push code.foss.global
```

---

## Related Resources

- [Ollama Documentation](https://ollama.ai/docs)
- [MiniCPM-V GitHub](https://github.com/OpenBMB/MiniCPM-V)
- [Ollama API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [Nanonets-OCR-s on HuggingFace](https://huggingface.co/nanonets/Nanonets-OCR-s)