ht-docker-ai/recipes/document.md
2026-01-16 03:58:39 +00:00

Bank Statement Parsing with MiniCPM-V 4.5

A recipe for extracting transactions from bank statement PDFs with a vision-language model.

Model

  • Model: MiniCPM-V 4.5 (8B parameters)
  • Ollama Name: openbmb/minicpm-v4.5:q8_0
  • Quantization: Q8_0 (9.8GB VRAM)
  • Runtime: Ollama on GPU

Image Conversion

Convert PDF to PNG at 300 DPI for optimal OCR accuracy.

convert -density 300 -quality 100 input.pdf \
  -background white -alpha remove \
  page-%d.png

Parameters:

  • -density 300: Rasterize at 300 DPI (critical for accuracy); must appear before the input file
  • -quality 100: PNG compression setting; PNG is lossless, so this does not change image fidelity
  • -background white -alpha remove: Flatten transparency onto a white background
  • page-%d.png: Writes page-0.png, page-1.png, etc., one file per PDF page

Dependencies (ImageMagick delegates PDF rendering to Ghostscript):

apt-get install imagemagick ghostscript
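When the conversion needs to run from a script, the same command can be wrapped in Python. The following is a minimal sketch, not part of the original recipe: the function names `build_convert_command` and `pdf_to_pngs` are hypothetical, and it assumes the `convert` binary is on the PATH.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, output_prefix, dpi=300):
    """Return the ImageMagick argument list for rendering a PDF to PNG pages."""
    return [
        "convert",
        "-density", str(dpi),       # placed before the input so the PDF is rasterized at this DPI
        "-quality", "100",
        str(pdf_path),
        "-background", "white",
        "-alpha", "remove",         # flatten transparency onto the white background
        f"{output_prefix}-%d.png",  # ImageMagick expands %d to the page index
    ]


def pdf_to_pngs(pdf_path, output_prefix, dpi=300):
    """Run the conversion and return the generated files in page order."""
    subprocess.run(build_convert_command(pdf_path, output_prefix, dpi), check=True)
    prefix = Path(output_prefix)
    pages = prefix.parent.glob(f"{prefix.name}-*.png")
    # Sort by the numeric suffix so page-10 follows page-9 rather than page-1.
    return sorted(pages, key=lambda p: int(p.stem.rsplit("-", 1)[1]))
```

Calling `pdf_to_pngs("input.pdf", "page")` would produce the `page-0.png`, `page-1.png` files that the API call section loads.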

Prompt

You are a bank statement parser. Extract EVERY transaction from the table.

Read the Amount column carefully:
- "- 21,47 €" means DEBIT, output as: -21.47
- "+ 1.000,00 €" means CREDIT, output as: 1000.00
- European format: comma = decimal point

For each row output: {"date":"YYYY-MM-DD","counterparty":"NAME","amount":-21.47}

Do not skip any rows. Return complete JSON array:
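
The amount rules in the prompt can also be expressed as code, which is useful for checking the model's numbers against amounts read from another source. This is a minimal sketch under the assumption that amounts always use a period as the thousands separator and a comma as the decimal separator; `parse_eu_amount` is a hypothetical helper, not part of the recipe.

```python
import re


def parse_eu_amount(text):
    """Convert a European-formatted amount such as '- 21,47 €' to a float."""
    # Normalize the typographic minus sign, then drop spaces and currency symbols.
    cleaned = re.sub(r"[^\d,.+-]", "", text.replace("\u2212", "-"))
    sign = -1 if cleaned.startswith("-") else 1
    # Remove thousands separators, then turn the decimal comma into a point.
    digits = cleaned.lstrip("+-").replace(".", "").replace(",", ".")
    return sign * float(digits)


print(parse_eu_amount("- 21,47 €"))     # -21.47
print(parse_eu_amount("+ 1.000,00 €"))  # 1000.0
```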

API Call

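import base64
import requests

# Prompt text from the "Prompt" section above
prompt = """You are a bank statement parser. Extract EVERY transaction from the table.

Read the Amount column carefully:
- "- 21,47 €" means DEBIT, output as: -21.47
- "+ 1.000,00 €" means CREDIT, output as: 1000.00
- European format: comma = decimal point

For each row output: {"date":"YYYY-MM-DD","counterparty":"NAME","amount":-21.47}

Do not skip any rows. Return complete JSON array:"""

# Load each page image and base64-encode it, as the Ollama API expects
with open('page-0.png', 'rb') as f:
    page0 = base64.b64encode(f.read()).decode('utf-8')
with open('page-1.png', 'rb') as f:
    page1 = base64.b64encode(f.read()).decode('utf-8')

payload = {
    "model": "openbmb/minicpm-v4.5:q8_0",
    "prompt": prompt,
    "images": [page0, page1],  # Multiple pages supported
    "stream": False,           # Return one JSON object instead of a token stream
    "options": {
        "num_predict": 16384,  # Upper bound on generated tokens
        "temperature": 0.1
    }
}

response = requests.post(
    'http://localhost:11434/api/generate',
    json=payload,
    timeout=600
)
response.raise_for_status()  # Fail early on HTTP errors

# The generated text (expected to contain the JSON array)
result = response.json()['response']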
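
For statements with a different number of pages, the payload can be built from a list of image paths instead of one variable per page. The sketch below is an illustration only; `encode_image` and `build_payload` are hypothetical names, and the option values are copied from the call above.

```python
import base64
from pathlib import Path


def encode_image(path):
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


def build_payload(prompt, image_paths, model="openbmb/minicpm-v4.5:q8_0"):
    """Build the /api/generate request body for any number of page images."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [encode_image(p) for p in image_paths],  # order is preserved
        "stream": False,
        "options": {"num_predict": 16384, "temperature": 0.1},
    }
```

The result can be passed as `json=build_payload(prompt, ["page-0.png", "page-1.png", "page-2.png"])` to the same `requests.post` call.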

Output Format

[
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-21.47},
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-58.06},
  {"date":"2022-04-12","counterparty":"LOSSLESS GMBH","amount":1000.00}
]
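
The API returns the array as text, so it still has to be parsed and checked before use. The following sketch assumes the reply may surround the array with extra prose or a code fence (the source does not say whether it does) and validates only the three fields shown above; `extract_transactions` is a hypothetical helper.

```python
import json
import re

REQUIRED_KEYS = {"date", "counterparty", "amount"}
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def extract_transactions(response_text):
    """Parse the JSON array out of a model reply and validate each row."""
    # Take everything from the first '[' to the last ']' to skip surrounding text.
    start = response_text.find("[")
    end = response_text.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in model response")
    rows = json.loads(response_text[start:end + 1])
    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {index} is not an object")
        missing = REQUIRED_KEYS - set(row)
        if missing:
            raise ValueError(f"row {index} is missing {sorted(missing)}")
        if not DATE_PATTERN.match(str(row["date"])):
            raise ValueError(f"row {index} has a malformed date: {row['date']!r}")
        amount = row["amount"]
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            raise ValueError(f"row {index} has a non-numeric amount: {amount!r}")
    return rows
```

A truncated reply (for example, one cut off by `num_predict`) fails in `json.loads` or in the bracket search, which is preferable to silently accepting a partial list.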

Running the Container

GPU (recommended):

docker run -d --gpus all -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q8_0" \
  ht-docker-ai:minicpm45v

CPU (slower):

docker run -d -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q4_0" \
  ht-docker-ai:minicpm45v-cpu
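
After starting the container, a client may need to wait until the server answers and the model is listed. The sketch below polls Ollama's `/api/tags` endpoint using only the standard library. It assumes the image pulls the model named in `MODEL_NAME` at startup, which the source does not state; `model_is_listed` and `wait_for_model` are hypothetical helpers.

```python
import json
import time
import urllib.request


def model_is_listed(tags_response, model_name):
    """Check a parsed /api/tags response body for the given model name."""
    return any(m.get("name") == model_name for m in tags_response.get("models", []))


def wait_for_model(model_name, base_url="http://localhost:11434", timeout=300, interval=5):
    """Poll the server until the model is listed or the timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as reply:
                if model_is_listed(json.load(reply), model_name):
                    return True
        except (OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        time.sleep(interval)
    return False
```

For example, `wait_for_model("openbmb/minicpm-v4.5:q8_0")` returns `True` once the model appears and `False` if it has not appeared within five minutes.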

Hardware Requirements

| Quantization | VRAM/RAM | Speed |
|--------------|----------|-------|
| Q8_0 (GPU)   | 10 GB    | Fast  |
| Q4_0 (CPU)   | 8 GB     | Slow  |

Test Results

| Statement    | Pages | Transactions | Accuracy |
|--------------|-------|--------------|----------|
| bunq-2022-04 | 2     | 26           | 100%     |
| bunq-2021-06 | 3     | 28           | 100%     |
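
The source does not say how accuracy was measured. One plausible definition, shown here purely as an assumption, is the share of transactions that match a hand-checked reference exactly on date, counterparty, and amount. The helper names are hypothetical.

```python
from collections import Counter


def transaction_key(row):
    """Reduce a transaction to a hashable tuple for exact comparison."""
    return (row["date"], row["counterparty"], round(float(row["amount"]), 2))


def accuracy(expected, extracted):
    """Fraction of transactions that match exactly, counting duplicates."""
    expected_counts = Counter(map(transaction_key, expected))
    extracted_counts = Counter(map(transaction_key, extracted))
    # Counter intersection keeps the minimum count per key, so duplicate
    # rows (same day, same counterparty, same amount) are handled correctly.
    matched = sum((expected_counts & extracted_counts).values())
    # Dividing by the larger list penalizes both missed and invented rows.
    total = max(len(expected), len(extracted))
    return matched / total if total else 1.0
```

Under this definition, a statement with 26 reference rows scores 100% only if the model returns exactly those 26 rows and nothing else.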

Tips

  1. DPI matters: 150 DPI causes missed rows; 300 DPI is optimal
  2. PNG over JPEG: PNG is lossless, so small text stays sharp; JPEG compression artifacts can blur it
  3. Remove alpha: Some models struggle with transparency
  4. Multi-page: Pass all pages in a single request for context
  5. Temperature 0.1: Low temperature for consistent output
  6. European format: Explicitly explain comma=decimal in prompt