# Bank Statement Parsing with MiniCPM-V 4.5

Recipe for extracting transactions from bank statement PDFs with a vision-language model.

## Model

- **Model**: MiniCPM-V 4.5 (8B parameters)
- **Ollama Name**: `openbmb/minicpm-v4.5:q8_0`
- **Quantization**: Q8_0 (9.8 GB VRAM)
- **Runtime**: Ollama on GPU

## Image Conversion

Convert each PDF page to a PNG image at 300 DPI for the best recognition accuracy.

```bash
convert -density 300 -quality 100 input.pdf \
  -background white -alpha remove \
  output-%d.png
```

**Parameters:**

- `-density 300`: render at 300 DPI (critical for accuracy)
- `-quality 100`: for PNG output this sets the zlib compression level and filter, not image fidelity (PNG is lossless either way)
- `-background white -alpha remove`: flatten transparency onto a white background
- `output-%d.png`: writes one file per page: `output-0.png`, `output-1.png`, etc.

**Dependencies:**

```bash
apt-get install imagemagick ghostscript
```

ImageMagick delegates PDF rendering to Ghostscript, so install it explicitly.

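The conversion can also be driven from Python, which makes it easy to feed the pages into the API call in the right order. The sketch below is minimal. It assumes ImageMagick's `convert` is on `PATH`, and the helper names `convert_cmd`, `page_files` and `convert_pdf` are hypothetical. Page order is the main pitfall: a plain string sort puts `output-10.png` before `output-2.png`, so the files are sorted by their numeric page index instead.

```python
import re
import subprocess
from pathlib import Path


def convert_cmd(pdf: str, pattern: str = "output-%d.png", dpi: int = 300) -> list[str]:
    """Build the ImageMagick command shown above (hypothetical helper)."""
    return [
        "convert", "-density", str(dpi), "-quality", "100", pdf,
        "-background", "white", "-alpha", "remove",
        pattern,
    ]


def page_number(path: Path) -> int:
    """Extract the page index from a file name like output-12.png."""
    match = re.search(r"-(\d+)\.png$", path.name)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    return int(match.group(1))


def page_files(paths: list[Path]) -> list[Path]:
    """Sort page images by numeric page index rather than lexicographically."""
    return sorted(paths, key=page_number)


def convert_pdf(pdf: str, out_dir: str = ".") -> list[Path]:
    """Render every page of `pdf` to PNG and return the files in page order."""
    subprocess.run(convert_cmd(pdf, str(Path(out_dir) / "output-%d.png")), check=True)
    return page_files(list(Path(out_dir).glob("output-*.png")))
```

Point `out_dir` at a fresh directory for each statement. The glob would otherwise also pick up pages left over from an earlier, longer statement.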
## Prompt

```
You are a bank statement parser. Extract EVERY transaction from the table.

Read the Amount column carefully:
- "- 21,47 €" means DEBIT, output as: -21.47
- "+ 1.000,00 €" means CREDIT, output as: 1000.00
- European format: comma = decimal point

For each row output: {"date":"YYYY-MM-DD","counterparty":"NAME","amount":-21.47}

Do not skip any rows. Return complete JSON array:
```

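The amount rules in the prompt can also be applied deterministically, for example to spot-check model output against text copied from the PDF. The sketch below is minimal and assumes amounts look like the two examples in the prompt: a sign, a space, a European-format number and a euro sign. `parse_eur_amount` is a hypothetical helper.

```python
from decimal import Decimal


def parse_eur_amount(text: str) -> Decimal:
    """Convert '- 21,47 €' to Decimal('-21.47') and '+ 1.000,00 €' to Decimal('1000.00')."""
    # Drop the currency sign, regular spaces and non-breaking spaces
    cleaned = text.replace("€", "").replace(" ", "").replace("\u00a0", "")
    sign = -1 if cleaned.startswith("-") else 1
    digits = cleaned.lstrip("+-")
    # European format: '.' groups thousands, ',' is the decimal point
    digits = digits.replace(".", "").replace(",", ".")
    return sign * Decimal(digits)
```

`Decimal` is used instead of `float` so that cent values are represented exactly when totals are compared.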
## API Call

```python
import base64

import requests

# Assumption: the prompt text from the Prompt section is saved as prompt.txt
with open('prompt.txt', encoding='utf-8') as f:
    prompt = f.read()

# Load the page images produced by the conversion step, base64-encoded for Ollama
pages = []
for path in ['output-0.png', 'output-1.png']:
    with open(path, 'rb') as f:
        pages.append(base64.b64encode(f.read()).decode('utf-8'))

payload = {
    "model": "openbmb/minicpm-v4.5:q8_0",
    "prompt": prompt,
    "images": pages,  # Multiple pages supported
    "stream": False,
    "options": {
        "num_predict": 16384,  # leave room for long JSON output
        "temperature": 0.1
    }
}

response = requests.post(
    'http://localhost:11434/api/generate',
    json=payload,
    timeout=600
)
response.raise_for_status()

# The generated text (expected to contain the JSON array)
result = response.json()['response']
```

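`result` is plain text, not parsed JSON. The sketch below turns it into Python objects. It assumes the model may wrap the array in a Markdown code fence or add a sentence around it. `extract_transactions` is a hypothetical helper that parses everything from the first `[` to the last `]` in the reply.

```python
import json


def extract_transactions(text: str) -> list[dict]:
    """Parse the JSON array of transactions out of the model's reply."""
    start = text.find("[")
    end = text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    # Slicing to the outermost brackets drops code fences and prose around the array
    return json.loads(text[start:end + 1])
```

A reply cut off by `num_predict` has no closing bracket for the array, so it fails loudly here instead of silently losing rows.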
## Output Format

```json
[
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-21.47},
  {"date":"2022-04-01","counterparty":"DIGITALOCEAN.COM","amount":-58.06},
  {"date":"2022-04-12","counterparty":"LOSSLESS GMBH","amount":1000.00}
]
```

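A quick schema check before importing the data catches rows with a missing field, a non-ISO date or an amount returned as a string. The sketch below is minimal and assumes exactly the three-field format shown above. `validate_transaction` is a hypothetical helper.

```python
from datetime import date


def validate_transaction(row: dict) -> list[str]:
    """Return a list of problems with one transaction; an empty list means it looks valid."""
    problems = []
    if set(row) != {"date", "counterparty", "amount"}:
        problems.append(f"unexpected keys: {sorted(row)}")
    try:
        date.fromisoformat(str(row.get("date")))
    except ValueError:
        problems.append(f"bad date: {row.get('date')!r}")
    counterparty = row.get("counterparty")
    if not isinstance(counterparty, str) or not counterparty.strip():
        problems.append(f"missing counterparty: {counterparty!r}")
    amount = row.get("amount")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        problems.append(f"amount is not a number: {amount!r}")
    return problems
```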
## Running the Container

**GPU (recommended):**

```bash
docker run -d --gpus all -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q8_0" \
  ht-docker-ai:minicpm45v
```

**CPU (slower):**

```bash
docker run -d -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e MODEL_NAME="openbmb/minicpm-v4.5:q4_0" \
  ht-docker-ai:minicpm45v-cpu
```

With the CPU image, also set `"model"` in the API payload to `openbmb/minicpm-v4.5:q4_0`.

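Before sending pages, it can help to confirm that the container is up and the model has finished downloading. The sketch below uses Ollama's `/api/tags` endpoint, which lists locally available models, and relies only on the standard library. `model_available` and `check_server` are hypothetical helpers. The comparison assumes the listed `name` matches the tag passed in `MODEL_NAME` exactly, so adjust it if your Ollama version reports names differently.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port published by the docker run commands above


def model_available(tags: dict, model: str) -> bool:
    """True if `model` appears in a /api/tags response body."""
    return any(entry.get("name") == model for entry in tags.get("models", []))


def check_server(model: str = "openbmb/minicpm-v4.5:q8_0") -> bool:
    """Query the running server; raises if it is unreachable or returns an HTTP error."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=10) as resp:
        return model_available(json.load(resp), model)
```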
## Hardware Requirements

| Quantization | Memory (VRAM/RAM) | Speed |
|--------------|-------------------|-------|
| Q8_0 (GPU) | 10 GB | Fast |
| Q4_0 (CPU) | 8 GB | Slow |

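The memory figures can be sanity-checked with a rough estimate of weight size. The sketch below assumes the GGML block quantizations used by Ollama's GGUF models:

- Q8_0 stores each block of 32 weights as 32 bytes plus a 16-bit scale, which is 8.5 bits per weight.
- Q4_0 stores each block as 16 bytes plus a 16-bit scale, which is 4.5 bits per weight.

The estimate covers weights only and treats every tensor as uniformly quantized. The remaining memory in the table goes to things it does not model, such as the context cache and runtime buffers.

```python
# Bits per weight for GGML block quantizations (32 weights per block + one fp16 scale)
BITS_PER_WEIGHT = {
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5
}


def weight_gb(params: float, quant: str) -> float:
    """Approximate size of the quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9
```

For the nominal 8B parameters this gives about 8.5 GB of weights for Q8_0 and 4.5 GB for Q4_0.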
## Test Results

| Statement | Pages | Transactions | Accuracy |
|-----------|-------|--------------|----------|
| bunq-2022-04 | 2 | 26 | 100% |
| bunq-2021-06 | 3 | 28 | 100% |

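The recipe does not state how accuracy was measured. The sketch below is one plausible approach, not the method behind the table. It counts how many expected transactions appear in the model's output, matching on all three fields and treating repeated identical rows as separate items. `row_key` and `extraction_accuracy` are hypothetical helpers.

```python
from collections import Counter
from decimal import Decimal


def row_key(row: dict) -> tuple:
    # Normalise so that 1000, 1000.0 and "1000.00" compare equal, and ignore counterparty case
    return (row["date"], row["counterparty"].strip().upper(), Decimal(str(row["amount"])))


def extraction_accuracy(expected: list[dict], extracted: list[dict]) -> float:
    """Fraction of expected transactions found in the extracted list (multiset match)."""
    if not expected:
        return 1.0
    want = Counter(row_key(r) for r in expected)
    got = Counter(row_key(r) for r in extracted)
    matched = sum((want & got).values())
    return matched / len(expected)
```

This measures recall only. A stricter score would also penalise extra rows that do not appear on the statement.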
## Tips

1. **DPI matters**: at 150 DPI the model missed rows; 300 DPI is optimal
2. **PNG over JPEG**: PNG is lossless, so small text stays free of JPEG compression artifacts
3. **Remove alpha**: some models handle transparent backgrounds poorly
4. **Multi-page**: pass all pages in a single request so the model sees the whole statement as context
5. **Temperature 0.1**: a low temperature keeps the output consistent between runs
6. **European format**: state explicitly in the prompt that the comma is the decimal separator

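The DPI tip becomes concrete when you compare pixel dimensions. Halving the DPI halves the pixels available for every character in both directions. The sketch below assumes an A4 page (210 × 297 mm), since the statements' actual page size is not given. `page_pixels` is a hypothetical helper.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at `dpi`, rounded to whole pixels."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))
```

At 300 DPI an A4 page is 2480 × 3508 pixels. At 150 DPI it is 1240 × 1754, a quarter of the pixel count.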