feat(ocr): add smartai extraction support

2026-05-19 06:42:42 +00:00
parent d86a83d515
commit 30780e7514
8 changed files with 9864 additions and 3355 deletions
@@ -1,18 +1,19 @@
 # @push.rocks/smartocr
-an ocr module using ocrmypdf
+
+OCR utilities for PDF text-layer generation with `ocrmypdf` and optional SmartAI-powered text extraction.

 ## Install
-To install `@push.rocks/smartocr`, use the following command with npm:
+To install `@push.rocks/smartocr`, use pnpm:

 ```bash
-npm install @push.rocks/smartocr --save
+pnpm install @push.rocks/smartocr
 ```

 This module depends on a few external utilities like `ocrmypdf`, so make sure you have these installed and available in your system's PATH. Consult the `ocrmypdf` documentation for installation instructions suitable for your operating system.

 ## Usage

-This module provides a TypeScript interface for OCR processing of PDF documents using `ocrmypdf`, encapsulated in the `SmartOcr` class. Here's how to leverage it in your TypeScript project.
+This module provides a TypeScript interface for OCR processing of PDF documents using `ocrmypdf`, encapsulated in the `SmartOcr` class. The same class can also run SmartAI OCR on image buffers, or use SmartAI document analysis with a vision-capable model to extract text from PDFs.

 ### Preparing Your Project

@@ -45,6 +46,54 @@ await fs.promises.writeFile('./path/to/output/document_ocr.pdf', ocredPdfBuffer)

 In the example above, we import the `SmartOcr` class and use it to process a PDF by passing a `Buffer` of the PDF file to the `processPdfBuffer` method. The method returns a `Buffer` of the processed PDF which includes a text layer added by OCR.

+### SmartAI Image OCR
+
+For image inputs, use SmartAI's OCR engine. By default, `SmartOcr` uses the Mistral OCR engine from `@push.rocks/smartai/ocr`. To supply credentials, pass `mistralOcrOptions.apiKey` or set `MISTRAL_API_KEY`; alternatively, inject a custom `smartAiOcrEngine`.
+
+```typescript
+import { SmartOcr } from '@push.rocks/smartocr';
+import * as fs from 'fs';
+
+const smartOcr = await SmartOcr.createAndInit({
+  mistralOcrOptions: {
+    apiKey: process.env.MISTRAL_API_KEY,
+    confidenceScoresGranularity: 'page',
+  },
+});
+
+const imageBuffer = await fs.promises.readFile('./scan.png');
+const result = await smartOcr.recognizeImageBufferWithSmartAi(imageBuffer, {
+  mimeType: 'image/png',
+});
+
+console.log(result.text);
+console.log(result.confidence);
+```
+
+### SmartAI PDF Text Extraction
+
+When you want the extracted text rather than a searchable PDF, pass a SmartAI model to `extractTextFromPdfBufferWithSmartAi()`. This uses `@push.rocks/smartai/document`, which converts PDF pages to images and asks a vision-capable model to extract text.
+
+```typescript
+import { SmartOcr } from '@push.rocks/smartocr';
+import { getModel } from '@push.rocks/smartai';
+import * as fs from 'fs';
+
+const smartOcr = await SmartOcr.createAndInit();
+const model = getModel({
+  provider: 'anthropic',
+  model: 'claude-sonnet-4-5-20250929',
+  apiKey: process.env.ANTHROPIC_TOKEN,
+});
+
+const pdfBuffer = await fs.promises.readFile('./scan.pdf');
+const extractedText = await smartOcr.extractTextFromPdfBufferWithSmartAi(pdfBuffer, {
+  model,
+});
+
+console.log(extractedText);
+```
+
 ### Advanced Usage

 The `SmartOcr` class maintains an internal `smartshell` instance to interface with the `ocrmypdf` command. This setup is abstracted away, ensuring you don't need to manage or understand the underlying shell commands to use OCR functionality in your application.
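
For readers curious about what that abstraction hides, here is a minimal sketch of how a command line for `ocrmypdf` might be assembled. It is an illustration only: `buildOcrmypdfArgs` and `OcrmypdfOptions` are hypothetical names that do not come from `@push.rocks/smartocr`, and the README does not say which flags the module actually passes. The sketch assumes only the documented CLI shape `ocrmypdf [options] input output` and the `--language` and `--skip-text` options.

```typescript
// Hypothetical helper, not part of @push.rocks/smartocr.
interface OcrmypdfOptions {
  // Tesseract language code(s), e.g. 'eng' or 'eng+deu'.
  language?: string;
  // Leave pages that already contain a text layer untouched.
  skipText?: boolean;
}

// Builds the argument list for `ocrmypdf [options] input output`.
function buildOcrmypdfArgs(
  inputPath: string,
  outputPath: string,
  options: OcrmypdfOptions = {},
): string[] {
  const args: string[] = [];
  if (options.language) {
    args.push('--language', options.language);
  }
  if (options.skipText) {
    args.push('--skip-text');
  }
  // Positional arguments come last: input file, then output file.
  args.push(inputPath, outputPath);
  return args;
}

const exampleArgs = buildOcrmypdfArgs('in.pdf', 'out.pdf', {
  language: 'eng',
  skipText: true,
});
console.log(['ocrmypdf', ...exampleArgs].join(' '));
// prints "ocrmypdf --language eng --skip-text in.pdf out.pdf"
```

Keeping the arguments as an array rather than one concatenated string makes them easy to hand to a process spawner without shell-quoting concerns, though the module's internal `smartshell` usage may well differ.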