feat(vision): process pages separately and make Qwen3-VL vision extraction more robust; add per-page parsing, safer JSON handling, reduced token usage, and multi-query invoice extraction

This commit is contained in:
2026-01-18 04:50:57 +00:00
parent 63d72a52c9
commit e76768da55
3 changed files with 96 additions and 68 deletions

View File

@@ -1,5 +1,16 @@
# Changelog
## 2026-01-18 - 1.11.0 - feat(vision)
process pages separately and make Qwen3-VL vision extraction more robust; add per-page parsing, safer JSON handling, reduced token usage, and multi-query invoice extraction
- Bank statements: split extraction into extractTransactionsFromPage and sequentially process pages to avoid thinking-token exhaustion
- Bank statements: reduced num_predict from 8000 to 4000, send single image per request, added per-page logging and non-throwing handling for empty or non-JSON responses
- Bank statements: catch JSON.parse errors and return empty array instead of throwing
- Invoices: introduced queryField to request single values and perform multiple simple queries (reduces model thinking usage)
- Invoices: reduced num_predict for invoice queries from 4000 to 500 and parse amounts robustly (handles European formats like 1.234,56)
- Invoices: normalize currency to uppercase 3-letter code, return safe defaults (empty strings / 0) instead of nulls, and parse net/vat/total with fallbacks
- General: simplified Ollama API error messages to avoid including response body content in thrown errors
## 2026-01-18 - 1.10.1 - fix(tests)
improve Qwen3-VL invoice extraction test by switching to non-stream API, adding model availability/pull checks, simplifying response parsing, and tightening model options