# Changelog

## 2026-01-20 - 1.14.3 - fix(repo) no changes detected in the diff; no files modified and no release required

- Diff contained no changes
- No files were added, removed, or modified
- No code, dependency, or documentation updates to release

## 2026-01-19 - 1.14.2 - fix(readme) update README to document Nanonets-OCR2-3B (replaces Nanonets-OCR-s), adjust VRAM and context defaults, expand feature docs, and update examples/test command

- Renamed Nanonets-OCR-s -> Nanonets-OCR2-3B throughout README and examples
- Updated Nanonets VRAM guidance from ~10GB to ~12-16GB and documented 30K context
- Changed documented MAX_MODEL_LEN default from 8192 to 30000
- Updated example model identifiers (model strings and curl/example snippets) to nanonets/Nanonets-OCR2-3B
- Added MiniCPM and Qwen feature bullets (multilingual, multi-image, flowchart support, expanded context notes)
- Replaced the README test command ./test-images.sh with pnpm test

## 2026-01-19 - 1.14.1 - fix(extraction) improve JSON extraction prompts and model options for invoice and bank statement tests

- Refactor JSON extraction prompts to be sent after the document text and add explicit 'WHERE TO FIND DATA' and 'RULES' sections for clearer extraction guidance
- Change the chat message flow to: send document, assistant acknowledgement, then the JSON extraction prompt (avoids concatenating large prompts into one message)
- Add model options (num_ctx: 32768, temperature: 0) for a larger context window and deterministic JSON output
- Simplify logging to avoid printing full prompt contents; log document and prompt lengths instead
- Increase timeouts for large documents to 600000ms (10 minutes) where applicable

## 2026-01-19 - 1.14.0 - feat(docker-images) add vLLM-based Nanonets-OCR2-3B image and Qwen3-VL Ollama image, and refactor build/docs/tests to use the new runtime/layout

- Add new Dockerfiles for Nanonets (Dockerfile_nanonets_vllm_gpu_VRAM10GB), Qwen3 (Dockerfile_qwen3vl_ollama_gpu_VRAM20GB), and a clarified MiniCPM Ollama variant (Dockerfile_minicpm45v_ollama_gpu_VRAM9GB); remove older, redundant Dockerfiles.
- Update build-images.sh to build the new image tags (minicpm45v, qwen3vl, nanonets-ocr) and adjust messaging/targets accordingly.
- Documentation overhaul: readme.md and readme.hints.md updated to reflect vLLM vs Ollama runtimes, corrected ports/VRAM estimates, volume recommendations, and API endpoint details.
- Tests updated to target the new model ID (nanonets/Nanonets-OCR2-3B), to process one page per batch, and to include a 10-minute AbortSignal timeout for OCR requests.
- Added focused extraction test suites (test/test.invoices.extraction.ts and test/test.invoices.failed.ts) for faster iteration and debugging of invoice extraction.
- Bump devDependencies: @git.zone/tsrun -> ^2.0.1 and @git.zone/tstest -> ^3.1.5.
- Misc: test helper references and docker compose/test port mapping fixed (nanonets uses 8000), and various README sections cleaned and reorganized.

## 2026-01-18 - 1.13.2 - fix(tests) stabilize OCR extraction tests and manage GPU containers

- Add stopAllGpuContainers() and call it before starting GPU images to free GPU memory.
- Remove PaddleOCR-VL image configs and associated ensure helpers from the docker test helper to simplify the images list.
- Split invoice/bankstatement tests into two sequential stages: Stage 1 runs Nanonets OCR to produce markdown files, Stage 2 stops Nanonets and runs model extraction from the saved markdown (avoids GPU contention).
- Introduce temporary markdown directory handling and cleanup; add stopNanonets() and container running checks in tests.
- Switch the bank statement extraction model from qwen3:8b to gpt-oss:20b; add request timeout and improved logging/console output across tests.
- Refactor extractWithConsensus and extraction functions to accept document identifiers, improve error messages and JSON extraction robustness.
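The 1.14.1 entry above describes a three-turn chat flow (document, assistant acknowledgement, then the extraction prompt) with num_ctx: 32768 and temperature: 0. A minimal sketch of what that request payload could look like; the function and interface names here are illustrative, not taken from the repo:

```typescript
// Hypothetical sketch of the three-message /api/chat payload described in
// the 1.14.1 entry. Only num_ctx/temperature values come from the changelog;
// everything else is an assumed shape.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  options: { num_ctx: number; temperature: number };
}

function buildExtractionRequest(
  model: string,
  documentText: string,
  extractionPrompt: string,
): ChatRequest {
  return {
    model,
    messages: [
      // 1) the raw document, kept separate from the prompt
      { role: "user", content: documentText },
      // 2) a short acknowledgement so the prompt arrives as its own turn
      { role: "assistant", content: "Document received." },
      // 3) the JSON extraction prompt with its guidance sections
      { role: "user", content: extractionPrompt },
    ],
    stream: false,
    // larger context window and deterministic JSON output
    options: { num_ctx: 32768, temperature: 0 },
  };
}
```

Keeping the document and prompt as separate turns avoids concatenating large prompts into one message, which is the failure mode the entry calls out.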
## 2026-01-18 - 1.13.1 - fix(image_support_files) remove PaddleOCR-VL server scripts from image_support_files

- Deleted files: image_support_files/paddleocr_vl_full_server.py (approx. 636 lines) and image_support_files/paddleocr_vl_server.py (approx. 465 lines)
- Cleanup/removal of legacy PaddleOCR-VL FastAPI server implementations; may affect users who relied on these local scripts

## 2026-01-18 - 1.13.0 - feat(tests) revamp tests and remove legacy Dockerfiles: adopt JSON/consensus workflows, switch MiniCPM model, and delete deprecated Docker/test variants

- Removed multiple Dockerfiles and related entrypoints for MiniCPM and PaddleOCR-VL (cpu/gpu/full), cleaning up legacy image recipes.
- Pruned many older test files (combined, ministral3, paddleocr-vl, and several invoice/test variants) to consolidate the test suite.
- Updated bank statement MiniCPM test: now uses MODEL='openbmb/minicpm-v4.5:q8_0', JSON per-page extraction prompt, consensus retry logic, expanded logging, and stricter result matching.
- Updated invoice MiniCPM test: switched to a consensus flow (fast JSON pass + thinking pass), increased PDF conversion quality, endpoints migrated to chat-style API calls with image-in-message payloads, and improved finalization logic.
- API usage changed from /api/generate to /api/chat with message-based payloads and embedded images; CI and local test runners will need model availability and possible pipeline adjustments.

## 2026-01-18 - 1.12.0 - feat(tests) switch vision tests to multi-query extraction (count then per-row/field queries) and add logging/summaries

- Replace the streaming + consensus pipeline with a multi-query approach: count rows per page, then query each transaction/field individually (batched parallel queries).
- Introduce unified helpers (queryVision / queryField / getTransaction / countTransactions) and simplify Ollama requests (stream:false, reduced num_predict, /no_think prompts).
- Improve parsing and normalization for amounts (European formats), invoice numbers, dates, and currency extraction.
- Adjust model checks to look for generic 'minicpm' and update test names/messages; add pass/fail counters and a summary test output.
- Remove previous consensus voting and streaming JSON accumulation logic, and add immediate per-transaction logging and batching.

## 2026-01-18 - 1.11.0 - feat(vision) process pages separately and make Qwen3-VL vision extraction more robust; add per-page parsing, safer JSON handling, reduced token usage, and multi-query invoice extraction

- Bank statements: split extraction into extractTransactionsFromPage and sequentially process pages to avoid thinking-token exhaustion
- Bank statements: reduced num_predict from 8000 to 4000, send a single image per request, added per-page logging and non-throwing handling for empty or non-JSON responses
- Bank statements: catch JSON.parse errors and return an empty array instead of throwing
- Invoices: introduced queryField to request single values and perform multiple simple queries (reduces model thinking usage)
- Invoices: reduced num_predict for invoice queries from 4000 to 500 and parse amounts robustly (handles European formats like 1.234,56)
- Invoices: normalize currency to an uppercase 3-letter code, return safe defaults (empty strings / 0) instead of nulls, and parse net/vat/total with fallbacks
- General: simplified Ollama API error messages to avoid including response body content in thrown errors

## 2026-01-18 - 1.10.1 - fix(tests) improve Qwen3-VL invoice extraction test by switching to the non-stream API, adding model availability/pull checks, simplifying response parsing, and tightening model options

- Replaced streaming reader logic with direct JSON parsing of the /api/chat response
- Added ensureQwen3Vl() to check and pull the Qwen3-VL:8b model from Ollama
- Switched to ensureMiniCpm() to verify the Ollama service is running before model checks
- Use /no_think prompt for direct JSON output and set temperature to 0.0 and num_predict to 512
- Removed retry loop and streaming parsing; improved error messages to include the response body
- Updated logging and test setup messages for clarity

## 2026-01-18 - 1.10.0 - feat(vision) add Qwen3-VL vision model support with Dockerfile and tests; improve invoice OCR conversion and prompts; simplify extraction flow by removing consensus voting

- Add Dockerfile_qwen3vl to provide an Ollama-based image for Qwen3-VL and expose the Ollama API on port 11434
- Introduce test/test.invoices.qwen3vl.ts and ensureQwen3Vl() helper to pull and test qwen3-vl:8b
- Improve PDF->PNG conversion and prompt in ministral3 tests (higher DPI, max quality, sharpen) and increase num_predict from 512 to 1024
- Simplify extraction pipeline: remove consensus voting, log single-pass results, and simplify OCR HTML sanitization/truncation logic

## 2026-01-18 - 1.9.0 - feat(tests) add Ministral 3 vision tests and improve invoice extraction pipeline to use Ollama chat schema, sanitization, and multi-page support

- Add new vision-based test suites for Ministral 3: test/test.invoices.ministral3.ts and test/test.bankstatements.ministral3.ts (model ministral-3:8b).
- Introduce ensureMinistral3() helper to start/check the Ollama/MiniCPM model in test/helpers/docker.ts.
- Switch invoice extraction to use Ollama /api/chat with a JSON schema (format) and streaming support (reads message.content).
- Improve HTML handling: sanitizeHtml() to remove OCR artifacts, concatenate multi-page HTML with page markers, and increase truncation limits.
- Enhance response parsing: strip Markdown code fences, robustly locate JSON object boundaries, and provide clearer JSON parse errors.
- Add PDF->PNG conversion (ImageMagick) and a direct image-based extraction flow for vision model tests.
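The 1.11.0 and 1.12.0 entries above mention parsing amounts robustly across European ("1.234,56") and US ("1,234.56") formats, with safe defaults instead of nulls. A small sketch of how such normalization could work; parseAmount here is a hypothetical helper, not the repo's actual implementation:

```typescript
// Illustrative amount normalizer, assuming the convention described in the
// changelog: handle both European and US separators, and fall back to 0
// (a safe default) for unparseable input.
function parseAmount(raw: string): number {
  // keep only digits, separators, and a sign
  const cleaned = raw.replace(/[^\d.,-]/g, "");
  const lastComma = cleaned.lastIndexOf(",");
  const lastDot = cleaned.lastIndexOf(".");
  let normalized: string;
  if (lastComma > lastDot) {
    // comma is the decimal separator (European): drop dots, comma -> dot
    normalized = cleaned.replace(/\./g, "").replace(",", ".");
  } else {
    // dot is the decimal separator (or no separator): drop commas
    normalized = cleaned.replace(/,/g, "");
  }
  const value = Number.parseFloat(normalized);
  return Number.isFinite(value) ? value : 0;
}
```

The heuristic treats whichever separator appears last as the decimal point, which covers both "1.234,56" and "1,234.56"; genuinely ambiguous inputs like a bare "1.234" are read as dot-decimal.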
## 2026-01-18 - 1.8.0 - feat(paddleocr-vl) add structured HTML output and table parsing for PaddleOCR-VL, update API, tests, and README

- Add result_to_html(), parse_markdown_table(), and parse_paddleocr_table() to emit semantic HTML and convert OCR/markdown tables to proper