fix(extraction): improve JSON extraction prompts and model options for invoice and bank statement tests

2026-01-19 21:19:37 +00:00
parent 235aa1352b
commit 09770d3177
4 changed files with 58 additions and 38 deletions
--- a/changelog.md
+++ b/changelog.md
@@ -1,5 +1,14 @@
 # Changelog

+## 2026-01-19 - 1.14.1 - fix(extraction)
+improve JSON extraction prompts and model options for invoice and bank statement tests
+
+- Refactor JSON extraction prompts to be sent after the document text and add explicit 'WHERE TO FIND DATA' and 'RULES' sections for clearer extraction guidance
+- Change the chat message flow to three turns: the document, an assistant acknowledgement, then the JSON extraction prompt (rather than concatenating large prompts into one message)
+- Add model options (num_ctx: 32768, temperature: 0) for a larger context window and more deterministic JSON output
+- Simplify logging to avoid printing full prompt contents; log document and prompt lengths instead
+- Increase timeouts for large documents to 600000ms (10 minutes) where applicable
+
 ## 2026-01-19 - 1.14.0 - feat(docker-images)
 add vLLM-based Nanonets-OCR2-3B image, Qwen3-VL Ollama image and refactor build/docs/tests to use new runtime/layout
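The message flow and model options in the 1.14.1 entry can be sketched as a request to Ollama's `/api/chat` REST endpoint. This is a minimal sketch under stated assumptions, not the code from this commit. The changelog supplies the three-message order, `num_ctx: 32768`, `temperature: 0`, the 600000 ms timeout, and length-only logging. Everything else is hypothetical: the `localhost:11434` URL, the `format: "json"` field, the acknowledgement wording, and the names `buildExtractionRequest`, `describeRequest`, `extractJson`, and `LARGE_DOC_TIMEOUT_MS`. The sketch also assumes Node 18+ for the global `fetch` and `AbortSignal.timeout`.

```typescript
// Hypothetical sketch of the 1.14.1 extraction flow; names, URL and
// acknowledgement text are assumptions, not taken from the commit.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  format: "json";
  options: { num_ctx: number; temperature: number };
}

// Hypothetical wording; the changelog only says an acknowledgement is sent.
const ACK = "I have read the document. What should I extract?";

// 10-minute timeout for large documents, as stated in the changelog.
const LARGE_DOC_TIMEOUT_MS = 600_000;

function buildExtractionRequest(
  model: string,
  documentText: string,
  jsonPrompt: string,
): ChatRequest {
  return {
    model,
    // Document first, then an assistant acknowledgement, then the JSON prompt,
    // instead of one large concatenated user message.
    messages: [
      { role: "user", content: documentText },
      { role: "assistant", content: ACK },
      { role: "user", content: jsonPrompt },
    ],
    stream: false, // ask for a single response object
    format: "json", // assumption: constrain the reply to JSON
    options: { num_ctx: 32768, temperature: 0 }, // values from the changelog
  };
}

// Log sizes instead of full prompt contents.
function describeRequest(req: ChatRequest): string {
  const doc = req.messages[0];
  const prompt = req.messages[req.messages.length - 1];
  return `document: ${doc.content.length} chars, prompt: ${prompt.content.length} chars`;
}

// Sends the request and parses the model's reply as JSON.
async function extractJson(
  req: ChatRequest,
  baseUrl = "http://localhost:11434",
): Promise<unknown> {
  console.log(describeRequest(req));
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
    signal: AbortSignal.timeout(LARGE_DOC_TIMEOUT_MS),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { message: { content: string } };
  return JSON.parse(data.message.content);
}
```

A call would look like `await extractJson(buildExtractionRequest("some-model", ocrText, prompt))`, where the model name is a placeholder. Keeping the request builder free of I/O means a test can check the message order and options without a running Ollama server.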