Changelog

2026-01-17 - 1.7.1 - fix(docker)

standardize Dockerfile and entrypoint filenames; add GPU-specific Dockerfiles and update build and test references

Added Dockerfile_minicpm45v_gpu and image_support_files/minicpm45v_entrypoint.sh; removed the old Dockerfile_minicpm45v and docker-entrypoint.sh
Renamed and simplified PaddleOCR entrypoint to image_support_files/paddleocr_vl_entrypoint.sh and updated CPU/GPU Dockerfile references
Updated build-images.sh to use *_gpu Dockerfiles and clarified PaddleOCR GPU build log
Updated test/helpers/docker.ts to point to Dockerfile_minicpm45v_gpu so tests build the GPU variant

use Qwen2.5 (Ollama) for invoice extraction tests and add helpers for model management; normalize dates and coerce numeric fields

Added ensureOllamaModel and ensureQwen25 test helpers to pull/check Ollama models via localhost:11434
Updated invoices test to use qwen2.5:7b instead of MiniCPM and removed image payload from the text-only extraction step
Increased the Markdown truncation limit from 8000 to 12000 and reduced the model's num_predict from 2048 to 512
Rewrote extraction prompt to require strict JSON output and added post-processing to parse/convert numeric fields
Added normalizeDate and improved compareInvoice to normalize dates and handle numeric formatting/tolerance
Updated test setup to ensure Qwen2.5 is available and adjusted logging/messages to reflect the Qwen2.5-based workflow
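
The date and number normalization described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's normalizeDate or compareInvoice code: the helper names toIsoDate, parseAmount and amountsMatch, the accepted date formats (ISO and day-first), and the 0.01 default tolerance are all assumptions.

```typescript
// Hypothetical sketch; not the project's normalizeDate/compareInvoice implementation.

/** Normalize a date string to ISO "YYYY-MM-DD"; returns null for unrecognized input. */
function toIsoDate(input: string): string | null {
  const s = input.trim();
  // ISO-like input, e.g. "2024-03-07" or "2024-3-7".
  let m = /^(\d{4})-(\d{1,2})-(\d{1,2})$/.exec(s);
  if (m) return buildIso(m[1], m[2], m[3]);
  // Day-first input (assumption), e.g. "07.03.2024", "7/3/2024", "07-03-2024".
  m = /^(\d{1,2})[./-](\d{1,2})[./-](\d{4})$/.exec(s);
  if (m) return buildIso(m[3], m[2], m[1]);
  return null;
}

function buildIso(year: string, month: string, day: string): string | null {
  const mo = Number(month);
  const d = Number(day);
  // Coarse range check only; it does not reject e.g. 31 February.
  if (mo < 1 || mo > 12 || d < 1 || d > 31) return null;
  return `${year}-${String(mo).padStart(2, "0")}-${String(d).padStart(2, "0")}`;
}

/** Coerce a model-produced amount (number or string such as "1.234,50 EUR") to a number. */
function parseAmount(value: unknown): number | null {
  if (typeof value === "number") return Number.isFinite(value) ? value : null;
  if (typeof value !== "string") return null;
  // Drop currency symbols, letters and spaces; keep digits, separators and minus sign.
  let s = value.replace(/[^\d.,-]/g, "");
  if (s.lastIndexOf(",") > s.lastIndexOf(".")) {
    // Comma is the decimal separator: "1.234,50" -> "1234.50".
    // Note: an ambiguous "1,234" is therefore read as 1.234.
    s = s.replace(/\./g, "").replace(",", ".");
  } else {
    // Dot (or no separator) is the decimal separator: "1,234.50" -> "1234.50".
    s = s.replace(/,/g, "");
  }
  if (s === "" || s === "-") return null;
  const n = Number(s);
  return Number.isFinite(n) ? n : null;
}

/** Compare two amounts with an absolute tolerance (0.01 by default, an assumption). */
function amountsMatch(a: number, b: number, tolerance = 0.01): boolean {
  // The tiny epsilon keeps differences like 100.01 - 100 from failing on float rounding.
  return Math.abs(a - b) <= tolerance + 1e-9;
}
```

Coercing both sides to ISO strings and plain numbers before comparing means "07.03.2024" and "2024-03-07", or "1.234,50" and 1234.5, are treated as the same value.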

add PaddleOCR-VL full pipeline Docker image and API server, plus integration tests and docker helpers

Add Dockerfile_paddleocr_vl_full and entrypoint script to build a GPU-enabled image with PP-DocLayoutV2 + PaddleOCR-VL and a FastAPI server
Introduce image_support_files/paddleocr_vl_full_server.py implementing the full pipeline API (/parse, OpenAI-compatible /v1/chat/completions) and a /formats endpoint
Improve image handling: decode_image supports data URLs, HTTP(S), raw base64 and file paths; add optimize_image_resolution to auto-scale images into the recommended 1080-2048px range
Add test helpers (test/helpers/docker.ts) to build, start and health-check Docker images, plus a new ensurePaddleOcrVlFull workflow
Add comprehensive integration tests for bank statements and invoices (MiniCPM and PaddleOCR-VL variants) and update tests to ensure required containers are running before tests
Switch MiniCPM model references to 'minicpm-v:latest' and increase health/timeout expectations for the full pipeline
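
A minimal sketch of the resolution targeting mentioned for optimize_image_resolution, written in TypeScript rather than the server's Python. It assumes the 1080-2048px range applies to the longer image side and that the aspect ratio is preserved; fitToRecommendedRange and the constant names are hypothetical, and the actual resampling step is left out.

```typescript
// Hypothetical sketch of the scaling decision only; resampling is out of scope.
interface Size {
  width: number;
  height: number;
}

const MIN_LONG_SIDE = 1080; // lower end of the recommended range
const MAX_LONG_SIDE = 2048; // upper end of the recommended range

/** Return target dimensions whose longer side lies in [1080, 2048], keeping the aspect ratio. */
function fitToRecommendedRange(size: Size): Size {
  const longSide = Math.max(size.width, size.height);
  if (!(longSide > 0)) throw new Error("image dimensions must be positive");
  let scale = 1; // images already inside the range are left unchanged
  if (longSide < MIN_LONG_SIDE) scale = MIN_LONG_SIDE / longSide; // upscale small images
  else if (longSide > MAX_LONG_SIDE) scale = MAX_LONG_SIDE / longSide; // downscale large ones
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

Under these assumptions, a 4000x3000 page render would be reduced to 2048x1536, while an 800x600 crop would be enlarged to 1080x810.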

add PaddleOCR-VL GPU Dockerfile, pin vllm, update CPU image deps, and improve entrypoint and tests

Add a new GPU Dockerfile for PaddleOCR-VL (transformers-based) with CUDA support, healthcheck, and entrypoint.
Pin vllm to 0.11.1 in Dockerfile_paddleocr_vl to use the first stable release with PaddleOCR-VL support.
Update CPU image: add torchvision==0.20.1 and extra Python deps (protobuf, sentencepiece, einops) required by the transformers-based server.
Rewrite paddleocr-vl-entrypoint.sh to build the vllm arguments as an array, add MAX_MODEL_LEN and ENFORCE_EAGER env vars, pass --limit-mm-per-prompt and an optional --enforce-eager, and exec vllm with the constructed arguments.
Update tests to use the OpenAI-compatible PaddleOCR-VL chat completions API (/v1/chat/completions) with image+text message payload and model 'paddleocr-vl'.
Add @types/node to package.json dependencies and tidy devDependencies ordering.
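
For reference, an OpenAI-compatible chat completions request that pairs an image with a text instruction generally has the shape below. Only the model name 'paddleocr-vl' and the /v1/chat/completions path come from this entry; the helper names buildOcrRequest and firstMessageContent, the data-URL encoding of the image, and temperature 0 are assumptions for illustration.

```typescript
// Hypothetical sketch of an OpenAI-style image + text request body and response reader.
interface TextPart {
  type: "text";
  text: string;
}
interface ImagePart {
  type: "image_url";
  image_url: { url: string };
}
interface ChatMessage {
  role: "user";
  content: Array<ImagePart | TextPart>;
}
interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
}

/** Build a single-turn request that sends one base64 image followed by a text prompt. */
function buildOcrRequest(imageBase64: string, mimeType: string, prompt: string): ChatCompletionRequest {
  return {
    model: "paddleocr-vl",
    messages: [
      {
        role: "user",
        content: [
          // The image travels inline as a data URL (assumption: no separate upload step).
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
          { type: "text", text: prompt },
        ],
      },
    ],
    temperature: 0, // assumption: deterministic decoding for OCR
  };
}

interface ChatCompletionResponse {
  choices?: Array<{ message?: { content?: string | null } }>;
}

/** Pull the assistant text out of the first choice, failing loudly if it is missing. */
function firstMessageContent(res: ChatCompletionResponse): string {
  const content = res.choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("response has no message content");
  return content;
}
```

The request object would be serialized with JSON.stringify and POSTed to the server's /v1/chat/completions endpoint; the host and port depend on how the container is run.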

add hybrid OCR + vision invoice/document parsing with PaddleOCR, consensus voting, and prompt/test refactors

Add hybrid pipeline documentation and examples (PaddleOCR + MiniCPM-V) and an architecture diagram in recipes/document.md
Integrate PaddleOCR: new OCR extraction functions and OCR-only prompt flow in test/test.node.ts
Add consensus voting and parallel-pass optimization to improve reliability (multiple passes, hashing, and majority voting)
Refactor prompts and tests: introduce /nothink token, OCR truncation limits, separate visual and OCR-only prompts, and improved prompt building in test/test.invoices.ts
Update image conversion defaults (200 DPI, filename change) and add TypeScript helper functions for extraction and consensus handling
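
The consensus idea (several extraction passes run in parallel, each result hashed, then a majority vote over the hashes) can be sketched like this. It is a hypothetical outline, not the code in test/test.node.ts or test/test.invoices.ts: the names canonicalJson, hashResult and extractWithConsensus, the SHA-256 hash over key-sorted JSON, the default of three passes, and the first-seen tie-break are all assumptions.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

/** Serialize with sorted object keys so {a, b} and {b, a} produce the same string. */
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

/** Hash a result's canonical form so identical extractions share one vote bucket. */
function hashResult(value: Json): string {
  return createHash("sha256").update(canonicalJson(value)).digest("hex");
}

interface ConsensusOutcome<T extends Json> {
  result: T;
  votes: number;
  passes: number;
}

/** Run `passes` extraction attempts in parallel and return the most frequent result. */
async function extractWithConsensus<T extends Json>(
  extract: (pass: number) => Promise<T>,
  passes = 3,
): Promise<ConsensusOutcome<T>> {
  if (passes < 1) throw new Error("passes must be >= 1");
  const results = await Promise.all(Array.from({ length: passes }, (_, i) => extract(i)));
  const buckets = new Map<string, { result: T; votes: number }>();
  for (const result of results) {
    const key = hashResult(result);
    const bucket = buckets.get(key);
    if (bucket) bucket.votes += 1;
    else buckets.set(key, { result, votes: 1 });
  }
  // On a tie, the result seen first wins (Map iterates in insertion order).
  let best: { result: T; votes: number } | undefined;
  for (const bucket of buckets.values()) {
    if (!best || bucket.votes > best.votes) best = bucket;
  }
  if (!best) throw new Error("no extraction results");
  return { result: best.result, votes: best.votes, passes };
}
```

Hashing a canonical serialization means results that differ only in key order count as the same vote, while results that differ in any value, including whitespace inside strings, do not.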

add PaddleOCR OCR service (Docker images, server, tests, docs) and CI workflows

Add GPU and CPU PaddleOCR Dockerfiles; pin paddlepaddle/paddle and paddleocr to stable 2.x and install libgomp1 for CPU builds
Avoid pre-downloading OCR models at build-time to prevent build-time segfaults; models are downloaded on first run
Refactor PaddleOCR FastAPI server: respect CUDA_VISIBLE_DEVICES, support per-request language, cache default language instance and create temporary instances for other languages
Add comprehensive tests (test.paddleocr.ts) and improve invoice extraction tests (parallelize passes, JSON OCR API usage, prioritize certain test cases)
Add Gitea CI workflows for tag and non-tag Docker runs and release pipeline (docker build/push, metadata trigger)
Update documentation (readme.hints.md) with PaddleOCR usage and add docker registry entry to npmextra.json

add PaddleOCR support: Docker images, FastAPI server, entrypoint and tests

Add PaddleOCR FastAPI server implementation at image_support_files/paddleocr_server.py
Remove old image_support_files/paddleocr-server.py and update entrypoint to import paddleocr_server:app
Extend build-images.sh to build paddleocr (GPU) and paddleocr-cpu images and list them
Extend test-images.sh with paddleocr health/OCR tests, a new test_paddleocr_image function, port configuration and cleanup; rename test_image to test_minicpm_image

add PaddleOCR GPU Docker image and FastAPI OCR server with entrypoint; implement OCR endpoints and consensus extraction testing

Add Dockerfile_paddleocr for GPU-accelerated PaddleOCR image (pre-downloads PP-OCRv4 models, exposes port 5000, healthcheck, entrypoint)
Add image_support_files/paddleocr-server.py: FastAPI app providing /ocr (base64), /ocr/upload (file), and /health endpoints; model warm-up on startup; structured JSON responses and error handling
Add image_support_files/paddleocr-entrypoint.sh to configure environment, detect GPU/CPU mode, and launch uvicorn
Update test/test.node.ts to replace streaming extraction with a consensus-based extraction flow (multiple passes, hashing of results, majority voting) and improve logging/prompt text
Add test/test.invoices.ts: integration tests for invoice extraction that call PaddleOCR, build prompts with optional OCR text, run consensus extraction, and produce a summary report

Initial project files added with two small follow-up updates.