# Bank Statement Parsing with MiniCPM-V 4.5

Recipe for extracting transactions from bank statement PDFs with a vision-language model.

## Model

- **Model**: MiniCPM-V 4.5 (8B parameters)
- **Ollama Name**: `openbmb/minicpm-v4.5:q8_0`
- **Quantization**: Q8_0 (9.8GB VRAM)
- **Runtime**: Ollama on GPU

## Image Conversion

Convert PDF to PNG at 300 DPI for optimal OCR accuracy.

```bash
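convert -density 300 -quality 100 input.pdf \
  -background white -alpha remove \
  page-%d.png
```

**Parameters:**
- `-density 300`: Rasterize at 300 DPI (critical for accuracy); must appear before the input file
- `-quality 100`: For PNG output this sets the zlib compression level and filter type; PNG is lossless, so image fidelity is unaffected
- `-background white -alpha remove`: Flatten transparency onto a white background
- `page-%d.png`: Writes one file per page: page-0.png, page-1.png, etc.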

**Dependencies:**
```bash
# ImageMagick delegates PDF rendering to Ghostscript
apt-get install imagemagick ghostscript
```
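
If the conversion needs to run from a script, the same command can be wrapped in Python. The following is a minimal sketch, not part of the original recipe: the function names `build_convert_command` and `pdf_to_pages` and the `out_dir` parameter are illustrative assumptions, and it presumes the `convert` binary is on the `PATH`.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, dpi=300):
    """Return the ImageMagick argument list for rendering a PDF to PNG pages."""
    pattern = str(Path(out_dir) / "page-%d.png")
    return [
        "convert",
        "-density", str(dpi),   # placed before the input so it applies to rasterization
        "-quality", "100",
        str(pdf_path),
        "-background", "white",
        "-alpha", "remove",
        pattern,
    ]


def pdf_to_pages(pdf_path, out_dir, dpi=300):
    """Run the conversion and return the page images in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(pdf_path, out_dir, dpi), check=True)
    # Sort numerically so that page-10.png comes after page-9.png
    return sorted(
        Path(out_dir).glob("page-*.png"),
        key=lambda p: int(p.stem.split("-")[1]),
    )
```

Returning the pages in numeric order matters once a statement has ten or more pages, because a plain alphabetical sort would put `page-10.png` before `page-2.png`.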

## Prompt

```
You are a bank statement parser. Extract EVERY transaction from the table.

Read the Amount column carefully:
- "- 21,47 €" means DEBIT, output as: -21.47
- "+ 1.000,00 €" means CREDIT, output as: 1000.00
- European format: comma = decimal point

For each row output: {"date":"YYYY-MM-DD","counterparty":"NAME","amount":-21.47}

Do not skip any rows. Return complete JSON array:
```
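
The amount rules in the prompt can also be mirrored in code, for example to spot-check the model's output against amount strings copied from a statement. This is a minimal sketch under assumptions: `parse_eu_amount` is a hypothetical helper, and it only handles the two-decimal format shown in the prompt examples.

```python
from decimal import Decimal
import re


def parse_eu_amount(text):
    """Convert a European-format amount such as '- 21,47 €' to a Decimal."""
    cleaned = (
        text.replace("€", "")
        .replace("\u00a0", " ")   # non-breaking space
        .replace("\u2212", "-")   # Unicode minus sign
        .strip()
    )
    match = re.fullmatch(r"([+-])?\s*([\d.]+),(\d{2})", cleaned)
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    sign, whole, cents = match.groups()
    # Periods are thousands separators; the comma is the decimal separator
    value = Decimal(whole.replace(".", "") + "." + cents)
    return -value if sign == "-" else value
```

`Decimal` is used here instead of `float` so that comparisons between currency values are exact.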

## API Call

```python
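import base64
import requests

# The prompt text from the "Prompt" section, saved to a file
# (the file name prompt.txt is an assumption, not part of the recipe)
with open('prompt.txt', encoding='utf-8') as f:
    prompt = f.read()

# Load the page images and base64-encode them for the JSON payload
with open('page-0.png', 'rb') as f:
    page0 = base64.b64encode(f.read()).decode('utf-8')
with open('page-1.png', 'rb') as f:
    page1 = base64.b64encode(f.read()).decode('utf-8')

payload = {
    "model": "openbmb/minicpm-v4.5:q8_0",
    "prompt": prompt,
    "images": [page0, page1],  # Multiple pages supported
    "stream": False,           # Return a single JSON object, not a stream
    "options": {
        "num_predict": 16384,  # Upper bound on generated tokens
        "temperature": 0.1     # Low temperature for consistent output
    }
}

response = requests.post(
    'http://localhost:11434/api/generate',
    json=payload,
    timeout=600
)
response.raise_for_status()  # Fail early on HTTP errors

# The generated text is in the 'response' field
result = response.json()['response']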
```
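
`result` is a plain string, so it still has to be parsed. The sketch below assumes the reply might wrap the array in a Markdown code fence or surround it with extra text, which the recipe does not confirm; `extract_json_array` is a hypothetical helper name.

```python
import json


def extract_json_array(text):
    """Return the JSON array that starts at the first '[' in the model's reply."""
    start = text.find("[")
    if start == -1:
        raise ValueError("no JSON array in response")
    # raw_decode parses one JSON value and ignores whatever follows it
    rows, _ = json.JSONDecoder().raw_decode(text[start:])
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array")
    return rows
```

A call such as `transactions = extract_json_array(result)` would then yield a list of dicts. If the reply is truncated, for instance because `num_predict` was reached, parsing raises `json.JSONDecodeError`, which is a useful signal that rows may be missing.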

## Output Format

```json
[
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-21.47},
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-58.06},
  {"date":"2022-04-12","counterparty":"LOSSLESS GMBH","amount":1000.00}
]
```
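
Since the model is not guaranteed to follow this format, each row can be checked before it is used downstream. This is a minimal sketch: `validate_transaction` is a hypothetical helper, and the checks are inferred only from the fields shown above.

```python
import re
from datetime import date


def validate_transaction(row):
    """Return a list of problems found in one transaction dict (empty if none)."""
    problems = []

    d = row.get("date")
    if not (isinstance(d, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", d)):
        problems.append("date is not YYYY-MM-DD")
    else:
        try:
            date.fromisoformat(d)
        except ValueError:
            problems.append("date is not a real calendar date")

    name = row.get("counterparty")
    if not (isinstance(name, str) and name.strip()):
        problems.append("counterparty is missing or empty")

    amount = row.get("amount")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        problems.append("amount is not a number")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a row in one pass.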

## Running the Container

**GPU (recommended):**
```bash
docker run -d --gpus all -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q8_0" \
  ht-docker-ai:minicpm45v
```

**CPU (slower):**
```bash
docker run -d -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q4_0" \
  ht-docker-ai:minicpm45v-cpu
```
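
Before sending a statement, it can help to confirm that the server is reachable and the model is present. The sketch below uses Ollama's `/api/tags` endpoint, which lists locally available models. It assumes the server reports the model under the same name used in the request; `model_is_available` and `check_server` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def model_is_available(tags, model_name):
    """Check a parsed /api/tags response for the given model name."""
    return any(m.get("name") == model_name for m in tags.get("models", []))


def check_server(base_url="http://localhost:11434",
                 model_name="openbmb/minicpm-v4.5:q8_0"):
    """Query the running container and report whether the model is listed."""
    with urlopen(f"{base_url}/api/tags", timeout=10) as response:
        tags = json.load(response)
    return model_is_available(tags, model_name)
```

`check_server()` raises a `URLError` if the container is not listening yet, so a caller would typically retry for a while after `docker run`.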

## Hardware Requirements

| Quantization | VRAM/RAM | Speed |
|--------------|----------|-------|
| Q8_0 (GPU)   | 10GB     | Fast  |
| Q4_0 (CPU)   | 8GB      | Slow  |

## Test Results

| Statement | Pages | Transactions | Accuracy |
|-----------|-------|--------------|----------|
| bunq-2022-04 | 2 | 26 | 100% |
| bunq-2021-06 | 3 | 28 | 100% |
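
The recipe does not state how accuracy was measured. One plausible definition, shown here purely as an assumption, is the fraction of hand-verified transactions that appear exactly in the extracted output; `transaction_accuracy` is a hypothetical helper.

```python
from collections import Counter


def transaction_accuracy(expected, extracted):
    """Fraction of expected rows that appear in the extracted rows (multiset match)."""
    def key(row):
        return (row["date"], row["counterparty"], round(float(row["amount"]), 2))

    # Counters handle statements that contain identical rows more than once
    want = Counter(key(r) for r in expected)
    got = Counter(key(r) for r in extracted)
    matched = sum((want & got).values())
    return matched / sum(want.values()) if want else 1.0
```

This measure does not penalize extra rows that the model invents, so comparing `len(extracted)` with `len(expected)` alongside it would be a sensible additional check.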

## Tips

1. **DPI matters**: 150 DPI causes missed rows; 300 DPI is optimal
2. **PNG over JPEG**: PNG is lossless and preserves text clarity better
3. **Remove alpha**: Some models struggle with transparency
4. **Multi-page**: Pass all pages in a single request for context
5. **Temperature 0.1**: Use a low temperature for consistent output
6. **European format**: State explicitly in the prompt that the comma is the decimal separator

Updated 2026-01-16 03:58:39 +00:00