feat(tests): add Ministral 3 vision tests and improve invoice extraction pipeline to use Ollama chat schema, sanitization, and multi-page support
changelog.md (10 additions)
@@ -1,5 +1,15 @@
# Changelog

## 2026-01-18 - 1.9.0 - feat(tests)
add Ministral 3 vision tests and improve invoice extraction pipeline to use Ollama chat schema, sanitization, and multi-page support

- Add new vision-based test suites for Ministral 3: test/test.invoices.ministral3.ts and test/test.bankstatements.ministral3.ts (model ministral-3:8b).
- Introduce an ensureMinistral3() helper in test/helpers/docker.ts to start/check the Ollama/MiniCPM model.
- Switch invoice extraction to Ollama's /api/chat endpoint, passing a JSON schema as format, with streaming support (reads message.content from each chunk).
- Improve HTML handling: add sanitizeHtml() to remove OCR artifacts, concatenate multi-page HTML with page markers, and increase truncation limits.
- Enhance response parsing: strip Markdown code fences, locate JSON object boundaries more robustly, and report clearer JSON parse errors.
- Add PDF-to-PNG conversion (ImageMagick) and a direct image-based extraction flow for the vision model tests.
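
The /api/chat switch described above can be sketched roughly as follows, assuming Node 18+ (global fetch). Ollama accepts a JSON schema object in `format` and, with `stream: true`, replies with newline-delimited JSON chunks whose message.content fragments concatenate into the full answer. The schema fields and the names `buildChatRequest`, `NdjsonContentAccumulator`, and `chatWithSchema` are hypothetical, chosen for illustration; this is not the project's actual code.

```typescript
// Sketch only: names and the schema below are hypothetical, not the project's code.

interface ChatChunk {
  message?: { role: string; content: string };
  done: boolean;
}

// Hypothetical invoice schema; the real extraction fields are not listed in the changelog.
const invoiceSchema = {
  type: "object",
  properties: {
    invoice_number: { type: "string" },
    total_amount: { type: "number" },
  },
  required: ["invoice_number", "total_amount"],
};

// Build an /api/chat body that constrains the reply to `schema` and requests streaming.
function buildChatRequest(model: string, prompt: string, schema: object) {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    format: schema,
    stream: true,
  };
}

// Collects message.content from NDJSON chunks; one JSON line may be split across reads.
class NdjsonContentAccumulator {
  private buffer = "";
  private parts: string[] = [];
  done = false;

  push(text: string): void {
    this.buffer += text;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line) this.handleLine(line);
      newline = this.buffer.indexOf("\n");
    }
  }

  // Flush a final line that arrived without a trailing newline, then return the text.
  end(): string {
    const rest = this.buffer.trim();
    this.buffer = "";
    if (rest) this.handleLine(rest);
    return this.parts.join("");
  }

  private handleLine(line: string): void {
    const chunk = JSON.parse(line) as ChatChunk;
    if (chunk.message?.content) this.parts.push(chunk.message.content);
    if (chunk.done) this.done = true;
  }
}

async function chatWithSchema(baseUrl: string, model: string, prompt: string, schema: object): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, prompt, schema)),
  });
  if (!res.ok || !res.body) throw new Error(`Ollama /api/chat failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const acc = new NdjsonContentAccumulator();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    acc.push(decoder.decode(value, { stream: true })); // keep multi-byte chars intact across reads
  }
  acc.push(decoder.decode()); // flush any bytes the decoder is still holding
  return acc.end();
}
```

A call against a local Ollama on its default port might look like `await chatWithSchema("http://localhost:11434", "ministral-3:8b", prompt, invoiceSchema)`. Buffering until each newline is what lets a chunk boundary fall in the middle of a JSON line without breaking the parse.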
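
The changelog does not say which OCR artifacts sanitizeHtml() removes, what the page markers look like, or what the new truncation limit is. The sketch below assumes HTML comments, stray control characters, and runs of whitespace as the artifacts, a `--- Page N ---` marker, and a hypothetical 60,000-character cap; `combinePages` and `MAX_HTML_CHARS` are illustrative names, not the project's.

```typescript
// Hypothetical cap; the changelog only says truncation limits were raised.
const MAX_HTML_CHARS = 60000;

// Assumed artifact patterns; the real sanitizeHtml() may remove different ones.
function sanitizeHtml(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // drop HTML comments
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "") // drop control chars, keep \t \n \r
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .replace(/\n{3,}/g, "\n\n") // collapse long runs of blank lines
    .trim();
}

// Sanitize each page, prefix it with a page marker, join the pages, then truncate.
function combinePages(pages: string[], maxChars = MAX_HTML_CHARS): string {
  const combined = pages
    .map((html, i) => `--- Page ${i + 1} ---\n${sanitizeHtml(html)}`)
    .join("\n\n");
  return combined.length > maxChars ? combined.slice(0, maxChars) : combined;
}
```

Adding the markers after sanitizing keeps them out of reach of the cleanup rules, so the model always sees where one page ends and the next begins.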
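
The response-parsing improvements listed above can be approximated with a small helper. This is a heuristic sketch, not the project's parser: `extractJson` is a hypothetical name, and taking the span from the first `{` to the last `}` assumes the reply holds a single top-level JSON object.

```typescript
// Extracts and parses the JSON object from a model reply (heuristic sketch).
function extractJson<T = unknown>(raw: string): T {
  // Strip a surrounding Markdown code fence (three backticks, optionally tagged "json").
  const unfenced = raw
    .trim()
    .replace(/^\x60{3}(?:json)?\s*/i, "")
    .replace(/\s*\x60{3}$/, "");

  // Take everything from the first "{" to the last "}" to skip any prose around it.
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error(`No JSON object found in model response: ${raw.slice(0, 200)}`);
  }

  const candidate = unfenced.slice(start, end + 1);
  try {
    return JSON.parse(candidate) as T;
  } catch (err) {
    // Include the parser's reason and a preview of the candidate for easier debugging.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Invalid JSON in model response (${reason}): ${candidate.slice(0, 200)}`);
  }
}
```

Distinguishing "no object found" from "object found but invalid" is one way to make parse failures clearer, since the two usually point at different problems (a model ignoring the schema versus a truncated or malformed reply).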
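
The changelog names ImageMagick for the PDF-to-PNG step but not the command line. The sketch below assumes ImageMagick 7's `magick` binary (ImageMagick 6 installs it as `convert`), Ghostscript available for PDF input, 150 DPI, and a white background; `buildMagickArgs` and `pdfToPngBase64` are hypothetical names. The base64 strings it returns are the form Ollama's chat API expects in a message's `images` array.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, readdirSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed settings; the actual density and background used by the tests are not stated.
function buildMagickArgs(pdfPath: string, outPattern: string, density = 150): string[] {
  return [
    "-density", String(density), // must precede the input so the PDF is rasterised at this DPI
    pdfPath,
    "-background", "white",
    "-alpha", "remove", // flatten transparency onto the white background
    outPattern, // e.g. /tmp/x/page-%d.png, one file per page
  ];
}

// Converts every page to PNG and returns base64 strings in page order.
function pdfToPngBase64(pdfPath: string, density = 150): string[] {
  const dir = mkdtempSync(join(tmpdir(), "pdf2png-"));
  try {
    execFileSync("magick", buildMagickArgs(pdfPath, join(dir, "page-%d.png"), density));
    const pageNumber = (name: string) => Number(name.match(/page-(\d+)\.png$/)?.[1] ?? -1);
    return readdirSync(dir)
      .filter((name) => name.endsWith(".png"))
      .sort((a, b) => pageNumber(a) - pageNumber(b)) // numeric, so page-10 sorts after page-2
      .map((name) => readFileSync(join(dir, name)).toString("base64"));
  } finally {
    rmSync(dir, { recursive: true, force: true }); // always clean up the temp directory
  }
}
```

Passing arguments through `execFileSync` rather than a shell string avoids quoting problems with file paths that contain spaces.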

## 2026-01-18 - 1.8.0 - feat(paddleocr-vl)
add structured HTML output and table parsing for PaddleOCR-VL, update API, tests, and README