Files
ht-docker-ai/changelog.md

6.8 KiB

Changelog

2026-01-17 - 1.7.1 - fix(docker)

standardize Dockerfile and entrypoint filenames; add GPU-specific Dockerfiles and update build and test references

  • Added Dockerfile_minicpm45v_gpu and image_support_files/minicpm45v_entrypoint.sh; removed the old Dockerfile_minicpm45v and docker-entrypoint.sh
  • Renamed and simplified PaddleOCR entrypoint to image_support_files/paddleocr_vl_entrypoint.sh and updated CPU/GPU Dockerfile references
  • Updated build-images.sh to use *_gpu Dockerfiles and clarified PaddleOCR GPU build log
  • Updated test/helpers/docker.ts to point to Dockerfile_minicpm45v_gpu so tests build the GPU variant

2026-01-17 - 1.7.0 - feat(tests)

use Qwen2.5 (Ollama) for invoice extraction tests and add helpers for model management; normalize dates and coerce numeric fields

  • Added ensureOllamaModel and ensureQwen25 test helpers to pull/check Ollama models via localhost:11434
  • Updated invoices test to use qwen2.5:7b instead of MiniCPM and removed image payload from the text-only extraction step
  • Increased Markdown truncate limit from 8000 to 12000 and reduced model num_predict from 2048 to 512
  • Rewrote extraction prompt to require strict JSON output and added post-processing to parse/convert numeric fields
  • Added normalizeDate and improved compareInvoice to normalize dates and handle numeric formatting/tolerance
  • Updated test setup to ensure Qwen2.5 is available and adjusted logging/messages to reflect the Qwen2.5-based workflow

2026-01-17 - 1.6.0 - feat(paddleocr-vl)

add PaddleOCR-VL full pipeline Docker image and API server, plus integration tests and docker helpers

  • Add Dockerfile_paddleocr_vl_full and entrypoint script to build a GPU-enabled image with PP-DocLayoutV2 + PaddleOCR-VL and a FastAPI server
  • Introduce image_support_files/paddleocr_vl_full_server.py implementing the full pipeline API (/parse, OpenAI-compatible /v1/chat/completions) and a /formats endpoint
  • Improve image handling: decode_image supports data URLs, HTTP(S), raw base64 and file paths; add optimize_image_resolution to auto-scale images into the recommended 1080-2048px range
  • Add test helpers (test/helpers/docker.ts) to build/start/health-check Docker images and new ensurePaddleOcrVlFull workflow
  • Add comprehensive integration tests for bank statements and invoices (MiniCPM and PaddleOCR-VL variants) and update tests to ensure required containers are running before tests
  • Switch MiniCPM model references to 'minicpm-v:latest' and increase health/timeout expectations for the full pipeline

2026-01-17 - 1.5.0 - feat(paddleocr-vl)

add PaddleOCR-VL GPU Dockerfile, pin vllm, update CPU image deps, and improve entrypoint and tests

  • Add a new GPU Dockerfile for PaddleOCR-VL (transformers-based) with CUDA support, healthcheck, and entrypoint.
  • Pin vllm to 0.11.1 in Dockerfile_paddleocr_vl to use the first stable release with PaddleOCR-VL support.
  • Update CPU image: add torchvision==0.20.1 and extra Python deps (protobuf, sentencepiece, einops) required by the transformers-based server.
  • Rewrite paddleocr-vl-entrypoint.sh to build vllm args array, add MAX_MODEL_LEN and ENFORCE_EAGER env vars, include --limit-mm-per-prompt and optional --enforce-eager, and switch to exec vllm with constructed args.
  • Update tests to use the OpenAI-compatible PaddleOCR-VL chat completions API (/v1/chat/completions) with image+text message payload and model 'paddleocr-vl'.
  • Add @types/node to package.json dependencies and tidy devDependencies ordering.

2026-01-16 - 1.4.0 - feat(invoices)

add hybrid OCR + vision invoice/document parsing with PaddleOCR, consensus voting, and prompt/test refactors

  • Add hybrid pipeline documentation and examples (PaddleOCR + MiniCPM-V) and architecture diagram in recipes/document.md
  • Integrate PaddleOCR: new OCR extraction functions and OCR-only prompt flow in test/test.node.ts
  • Add consensus voting and parallel-pass optimization to improve reliability (multiple passes, hashing, and majority voting)
  • Refactor prompts and tests: introduce /nothink token, OCR truncation limits, separate visual and OCR-only prompts, and improved prompt building in test/test.invoices.ts
  • Update image conversion defaults (200 DPI, filename change) and add TypeScript helper functions for extraction and consensus handling

2026-01-16 - 1.3.0 - feat(paddleocr)

add PaddleOCR OCR service (Docker images, server, tests, docs) and CI workflows

  • Add GPU and CPU PaddleOCR Dockerfiles; pin paddlepaddle/paddle and paddleocr to stable 2.x and install libgomp1 for CPU builds
  • Avoid pre-downloading OCR models at build-time to prevent build-time segfaults; models are downloaded on first run
  • Refactor PaddleOCR FastAPI server: respect CUDA_VISIBLE_DEVICES, support per-request language, cache default language instance and create temporary instances for other languages
  • Add comprehensive tests (test.paddleocr.ts) and improve invoice extraction tests (parallelize passes, JSON OCR API usage, prioritize certain test cases)
  • Add Gitea CI workflows for tag and non-tag Docker runs and release pipeline (docker build/push, metadata trigger)
  • Update documentation (readme.hints.md) with PaddleOCR usage and add docker registry entry to npmextra.json

2026-01-16 - 1.2.0 - feat(paddleocr)

add PaddleOCR support: Docker images, FastAPI server, entrypoint and tests

  • Add PaddleOCR FastAPI server implementation at image_support_files/paddleocr_server.py
  • Remove old image_support_files/paddleocr-server.py and update entrypoint to import paddleocr_server:app
  • Extend build-images.sh to build paddleocr (GPU) and paddleocr-cpu images and list them
  • Extend test-images.sh to add paddleocr health/OCR tests, new test_paddleocr_image function, port config, and cleanup; rename test_image -> test_minicpm_image

2026-01-16 - 1.1.0 - feat(ocr)

add PaddleOCR GPU Docker image and FastAPI OCR server with entrypoint; implement OCR endpoints and consensus extraction testing

  • Add Dockerfile_paddleocr for GPU-accelerated PaddleOCR image (pre-downloads PP-OCRv4 models, exposes port 5000, healthcheck, entrypoint)
  • Add image_support_files/paddleocr-server.py: FastAPI app providing /ocr (base64), /ocr/upload (file), and /health endpoints; model warm-up on startup; structured JSON responses and error handling
  • Add image_support_files/paddleocr-entrypoint.sh to configure environment, detect GPU/CPU mode, and launch uvicorn
  • Update test/test.node.ts to replace streaming extraction with a consensus-based extraction flow (multiple passes, hashing of results, majority voting) and improve logging/prompt text
  • Add test/test.invoices.ts: integration tests for invoice extraction that call PaddleOCR, build prompts with optional OCR text, run consensus extraction, and produce a summary report

2026-01-16 - 1.0.0 - initial release

Initial project files added with two small follow-up updates.

  • initial: base project commit.
  • update: two minor follow-up updates refining the initial commit.