# Bank Statement Parsing with MiniCPM-V 4.5

Recipe for extracting transactions from bank statement PDFs using vision-language AI.

## Model

- **Model**: MiniCPM-V 4.5 (8B parameters)
- **Ollama Name**: `openbmb/minicpm-v4.5:q8_0`
- **Quantization**: Q8_0 (9.8GB VRAM)
- **Runtime**: Ollama on GPU

## Image Conversion

Convert the PDF to PNG at 300 DPI for optimal OCR accuracy:

```bash
convert -density 300 -quality 100 input.pdf \
  -background white -alpha remove \
  page-%d.png
```

**Parameters:**

- `-density 300`: 300 DPI resolution (critical for accuracy)
- `-quality 100`: maximum quality
- `-background white -alpha remove`: flatten transparency onto a white background
- `page-%d.png`: outputs page-0.png, page-1.png, etc.

**Dependencies:**

```bash
apt-get install imagemagick
```

## Prompt

```
You are a bank statement parser. Extract EVERY transaction from the table.

Read the Amount column carefully:
- "- 21,47 €" means DEBIT, output as: -21.47
- "+ 1.000,00 €" means CREDIT, output as: 1000.00
- European format: comma = decimal point

For each row output:
{"date":"YYYY-MM-DD","counterparty":"NAME","amount":-21.47}

Do not skip any rows. Return complete JSON array:
```

## API Call

```python
import base64
import requests

# Extraction prompt (see the "Prompt" section above)
prompt = """You are a bank statement parser. Extract EVERY transaction from the table.

Read the Amount column carefully:
- "- 21,47 €" means DEBIT, output as: -21.47
- "+ 1.000,00 €" means CREDIT, output as: 1000.00
- European format: comma = decimal point

For each row output:
{"date":"YYYY-MM-DD","counterparty":"NAME","amount":-21.47}

Do not skip any rows. Return complete JSON array:"""

# Load and base64-encode the page images
with open('page-0.png', 'rb') as f:
    page0 = base64.b64encode(f.read()).decode('utf-8')
with open('page-1.png', 'rb') as f:
    page1 = base64.b64encode(f.read()).decode('utf-8')

payload = {
    "model": "openbmb/minicpm-v4.5:q8_0",
    "prompt": prompt,
    "images": [page0, page1],  # Multiple pages supported
    "stream": False,
    "options": {
        "num_predict": 16384,
        "temperature": 0.1
    }
}

response = requests.post(
    'http://localhost:11434/api/generate',
    json=payload,
    timeout=600
)
result = response.json()['response']
```

## Output Format

```json
[
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-21.47},
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-58.06},
  {"date":"2022-04-12","counterparty":"LOSSLESS GMBH","amount":1000.00}
]
```

## Running the Container

**GPU (recommended):**

```bash
docker run -d --gpus all -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q8_0" \
  ht-docker-ai:minicpm45v
```

**CPU (slower):**

```bash
docker run -d -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q4_0" \
  ht-docker-ai:minicpm45v-cpu
```

## Hardware Requirements

| Quantization | VRAM/RAM | Speed |
|--------------|----------|-------|
| Q8_0 (GPU)   | 10GB     | Fast  |
| Q4_0 (CPU)   | 8GB      | Slow  |

## Test Results

| Statement    | Pages | Transactions | Accuracy |
|--------------|-------|--------------|----------|
| bunq-2022-04 | 2     | 26           | 100%     |
| bunq-2021-06 | 3     | 28           | 100%     |

## Tips

1. **DPI matters**: 150 DPI causes missed rows; 300 DPI is optimal
2. **PNG over JPEG**: PNG preserves text clarity better
3. **Remove alpha**: some models struggle with transparency
4. **Multi-page**: pass all pages in a single request so the model sees the full statement
5. **Temperature 0.1**: low temperature gives consistent output
6. **European format**: explicitly explain comma = decimal in the prompt
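## Scripting the Conversion

If you prefer to drive the PDF-to-PNG step from Python instead of the shell, the sketch below wraps the same ImageMagick `convert` command with `subprocess` and returns the pages base64-encoded, in order. The `pdf_to_pages` helper and the temporary-directory layout are illustrative additions, not part of the tested recipe.

```python
import base64
import glob
import subprocess
import tempfile

def pdf_to_pages(pdf_path: str) -> list[str]:
    """Convert a PDF to 300 DPI PNGs and return base64-encoded pages in page order.

    Hypothetical helper; it only wraps the `convert` invocation shown above.
    """
    workdir = tempfile.mkdtemp(prefix="statement-")
    subprocess.run(
        [
            "convert", "-density", "300", "-quality", "100", pdf_path,
            "-background", "white", "-alpha", "remove",
            f"{workdir}/page-%d.png",
        ],
        check=True,
    )
    pages = []
    # Sort numerically so page-10.png comes after page-9.png
    for path in sorted(glob.glob(f"{workdir}/page-*.png"),
                       key=lambda p: int(p.rsplit("-", 1)[1].split(".")[0])):
        with open(path, "rb") as f:
            pages.append(base64.b64encode(f.read()).decode("utf-8"))
    return pages
```

The result can replace the manual `page-0.png` / `page-1.png` loading in the API call, e.g. `payload["images"] = pdf_to_pages("statement.pdf")`.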
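## Parsing the Response

The model returns the transaction list as text, so the `result` string from the API call still needs to be parsed and sanity-checked against the output format above. A minimal sketch follows; the fence-stripping and the schema assertions are assumptions for robustness, not behavior guaranteed by the model.

```python
import json

def parse_transactions(result: str) -> list[dict]:
    """Parse the model's JSON array, tolerating an optional ```json fence."""
    text = result.strip()
    if text.startswith("```"):
        # Drop a leading ```json line and a trailing ``` fence if present
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    transactions = json.loads(text)

    # Basic sanity checks against the expected schema
    for tx in transactions:
        assert set(tx) == {"date", "counterparty", "amount"}, tx
        assert isinstance(tx["amount"], (int, float)), tx
    return transactions

transactions = parse_transactions(result)
print(f"{len(transactions)} transactions, net {sum(t['amount'] for t in transactions):+.2f}")
```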