feat(paddleocr): add PaddleOCR OCR service (Docker images, server, tests, docs) and CI workflows

2026-01-16 13:23:01 +00:00
parent 67c38eeb67
commit bec379e9ca
10 changed files with 624 additions and 71 deletions
--- a/changelog.md
+++ b/changelog.md
@@ -1,5 +1,15 @@
 # Changelog

+## 2026-01-16 - 1.3.0 - feat(paddleocr)
+add PaddleOCR OCR service (Docker images, server, tests, docs) and CI workflows
+
+- Add GPU and CPU PaddleOCR Dockerfiles; pin paddlepaddle/paddle and paddleocr to stable 2.x and install libgomp1 for CPU builds
+- Avoid pre-downloading OCR models at build time to prevent build-time segfaults; models are downloaded on first run instead
+- Refactor the PaddleOCR FastAPI server: respect CUDA_VISIBLE_DEVICES, support a per-request language, cache the default-language instance, and create temporary instances for other languages
+- Add comprehensive tests (test.paddleocr.ts) and improve the invoice extraction tests (parallelize passes, use the JSON OCR API, prioritize certain test cases)
+- Add Gitea CI workflows for tag and non-tag Docker runs, plus a release pipeline (Docker build/push, metadata trigger)
+- Update documentation (readme.hints.md) with PaddleOCR usage and add a Docker registry entry to npmextra.json
+
 ## 2026-01-16 - 1.2.0 - feat(paddleocr)
 add PaddleOCR support: Docker images, FastAPI server, entrypoint and tests