fix(extraction): improve JSON extraction prompts and model options for invoice and bank statement tests

2026-01-19 21:19:37 +00:00
parent 235aa1352b
commit 09770d3177
4 changed files with 58 additions and 38 deletions

@@ -1,5 +1,14 @@
# Changelog
## 2026-01-19 - 1.14.1 - fix(extraction)
improve JSON extraction prompts and model options for invoice and bank statement tests
- Refactor JSON extraction prompts to be sent after the document text and add explicit 'WHERE TO FIND DATA' and 'RULES' sections for clearer extraction guidance
- Change chat message flow to: send document, assistant acknowledgement, then the JSON extraction prompt (avoids concatenating large prompts into one message)
- Add model options (num_ctx: 32768, temperature: 0) to give larger context windows and deterministic JSON output
- Simplify logging to avoid printing full prompt contents; log document and prompt lengths instead
- Increase timeouts for large documents to 600000ms (10 minutes) where applicable
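
The message flow and model options described in this entry can be sketched as follows. This is a minimal illustration, not the code from the commit: it assumes an Ollama-style chat request body (`model`, `messages`, `stream`, `options`), and the names `buildExtractionRequest`, `describeLengths`, `ACK_TEXT` and the acknowledgement wording are hypothetical. Only `num_ctx: 32768`, `temperature: 0` and the 600000 ms timeout come from the changelog.

```typescript
// Sketch only: function names and the acknowledgement text are hypothetical.

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  options: { num_ctx: number; temperature: number };
}

// Values taken from the changelog entry.
const MODEL_OPTIONS = { num_ctx: 32768, temperature: 0 };
const LARGE_DOCUMENT_TIMEOUT_MS = 600000; // 10 minutes

// Assumed wording; the changelog does not show the real acknowledgement.
const ACK_TEXT = "Document received. Ready for extraction instructions.";

// Build a three-message conversation: document, assistant acknowledgement,
// then the JSON extraction prompt as a separate user message.
function buildExtractionRequest(
  model: string,
  documentText: string,
  jsonPrompt: string,
): ChatRequest {
  return {
    model,
    messages: [
      { role: "user", content: documentText },
      { role: "assistant", content: ACK_TEXT },
      { role: "user", content: jsonPrompt },
    ],
    stream: false,
    options: { ...MODEL_OPTIONS },
  };
}

// Produce a log line with lengths only, so full prompt contents are never printed.
function describeLengths(documentText: string, jsonPrompt: string): string {
  return `document: ${documentText.length} chars, prompt: ${jsonPrompt.length} chars`;
}
```

A caller would pass the request body to its HTTP client and apply `LARGE_DOCUMENT_TIMEOUT_MS` as the request timeout for large documents. Keeping the document and the prompt in separate messages means the extraction instructions are the last thing the model reads, which is the ordering this entry describes.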
## 2026-01-19 - 1.14.0 - feat(docker-images)
add vLLM-based Nanonets-OCR2-3B image, Qwen3-VL Ollama image and refactor build/docs/tests to use new runtime/layout