feat(ocr): add smartai extraction support

2026-05-19 06:42:42 +00:00
parent d86a83d515
commit 30780e7514
8 changed files with 9864 additions and 3355 deletions
+53 -4
@@ -1,18 +1,19 @@
# @push.rocks/smartocr
an ocr module using ocrmypdf
OCR utilities for PDF text-layer generation with `ocrmypdf` and optional SmartAI-powered text extraction.
## Install
To install `@push.rocks/smartocr`, use the following command with npm:
To install `@push.rocks/smartocr`, use pnpm:
```bash
npm install @push.rocks/smartocr --save
pnpm install @push.rocks/smartocr
```
This module relies on external utilities such as `ocrmypdf`. Make sure they are installed and available on your system's PATH. Consult the `ocrmypdf` documentation for installation instructions for your operating system.
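Because a missing `ocrmypdf` binary would otherwise only surface at processing time, it can help to check for it at startup. The following is a minimal sketch using Node's built-in `child_process` module; `isCommandAvailable` and `warnIfOcrmypdfMissing` are hypothetical helper names and are not part of `@push.rocks/smartocr`.

```typescript
import { execFile } from 'node:child_process';

// Hypothetical helper (not part of @push.rocks/smartocr): resolves to true
// when the command can be spawned and exits with code 0, false otherwise.
export function isCommandAvailable(
  command: string,
  args: string[] = ['--version'],
): Promise<boolean> {
  return new Promise((resolve) => {
    // A missing binary is reported through the callback's error (ENOENT),
    // so the promise never rejects.
    execFile(command, args, (error) => resolve(error === null));
  });
}

// Hypothetical usage: warn early instead of failing in the middle of OCR processing.
export async function warnIfOcrmypdfMissing(): Promise<boolean> {
  const available = await isCommandAvailable('ocrmypdf');
  if (!available) {
    console.warn('ocrmypdf was not found on PATH; install it before using SmartOcr.');
  }
  return available;
}
```

Calling `warnIfOcrmypdfMissing()` before `SmartOcr.createAndInit()` gives a clearer message than a failed shell command later on.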
## Usage
This module provides a TypeScript interface for OCR processing of PDF documents using `ocrmypdf`, encapsulated in the `SmartOcr` class. Here's how to leverage it in your TypeScript project.
This module provides a TypeScript interface for OCR processing of PDF documents using `ocrmypdf`, encapsulated in the `SmartOcr` class. It can also call SmartAI OCR for image buffers, or SmartAI document analysis for PDF text extraction with a vision-capable model.
### Preparing Your Project
@@ -45,6 +46,54 @@ await fs.promises.writeFile('./path/to/output/document_ocr.pdf', ocredPdfBuffer)
In the example above, we import the `SmartOcr` class and use it to process a PDF by passing a `Buffer` of the PDF file to the `processPdfBuffer` method. The method returns a `Buffer` of the processed PDF which includes a text layer added by OCR.
### SmartAI Image OCR
For image inputs, use SmartAI's OCR engine. By default this uses the Mistral OCR engine from `@push.rocks/smartai/ocr`; pass `mistralOcrOptions.apiKey`, set `MISTRAL_API_KEY`, or inject a custom `smartAiOcrEngine`.
```typescript
import { SmartOcr } from '@push.rocks/smartocr';
import * as fs from 'fs';

const smartOcr = await SmartOcr.createAndInit({
  mistralOcrOptions: {
    apiKey: process.env.MISTRAL_API_KEY,
    confidenceScoresGranularity: 'page',
  },
});

const imageBuffer = await fs.promises.readFile('./scan.png');
const result = await smartOcr.recognizeImageBufferWithSmartAi(imageBuffer, {
  mimeType: 'image/png',
});

console.log(result.text);
console.log(result.confidence);
```
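When relying on the default Mistral engine, a missing key is easier to diagnose if it is checked before `createAndInit()` is called. The sketch below is one way to do that; `resolveMistralApiKey` is a hypothetical helper, and the precedence it applies (explicit key first, then `MISTRAL_API_KEY`) is an assumption for illustration, not documented library behavior. It does not apply when a custom `smartAiOcrEngine` is injected.

```typescript
// Hypothetical helper, not part of @push.rocks/smartocr.
// Assumption: an explicitly passed key takes priority over the environment variable.
export function resolveMistralApiKey(
  explicitKey?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const key = explicitKey?.trim() || env.MISTRAL_API_KEY?.trim();
  if (!key) {
    throw new Error(
      'No Mistral API key found: pass mistralOcrOptions.apiKey or set MISTRAL_API_KEY.',
    );
  }
  return key;
}
```

The returned value can then be passed as `mistralOcrOptions.apiKey`.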
### SmartAI PDF Text Extraction
For cases where you want extracted text instead of a searchable PDF, pass a SmartAI model to `extractTextFromPdfBufferWithSmartAi()`. This uses `@push.rocks/smartai/document`, which converts PDF pages to images and asks a vision-capable model to extract text.
```typescript
import { SmartOcr } from '@push.rocks/smartocr';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';

const smartOcr = await SmartOcr.createAndInit();
const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const pdfBuffer = await fs.promises.readFile('./scan.pdf');
const extractedText = await smartOcr.extractTextFromPdfBufferWithSmartAi(pdfBuffer, {
  model,
});

console.log(extractedText);
```
### Advanced Usage
The `SmartOcr` class maintains an internal `smartshell` instance to interface with the `ocrmypdf` command. This setup is abstracted away, ensuring you don't need to manage or understand the underlying shell commands to use OCR functionality in your application.
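For readers curious about what such a wrapper roughly involves, the following sketch shows the general pattern of passing a PDF buffer through the `ocrmypdf` CLI: write the buffer to a temporary file, run `ocrmypdf <input> <output>`, and read the result back. This is an illustration only and not the actual `SmartOcr` implementation; it uses Node's `child_process` instead of `smartshell`, and `ocrPdfBufferViaCli` and `CommandRunner` are hypothetical names.

```typescript
import { execFile } from 'node:child_process';
import { promises as fs } from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Runs a CLI command. Injectable so the flow can be exercised without ocrmypdf installed.
type CommandRunner = (command: string, args: string[]) => Promise<void>;

const defaultRunner: CommandRunner = async (command, args) => {
  await execFileAsync(command, args);
};

// Hypothetical illustration, NOT the actual SmartOcr implementation.
export async function ocrPdfBufferViaCli(
  pdfBuffer: Buffer,
  run: CommandRunner = defaultRunner,
): Promise<Buffer> {
  // Work in a fresh temporary directory so parallel calls do not collide.
  const workDir = await fs.mkdtemp(path.join(os.tmpdir(), 'ocr-sketch-'));
  const inputPath = path.join(workDir, 'input.pdf');
  const outputPath = path.join(workDir, 'output.pdf');
  try {
    await fs.writeFile(inputPath, pdfBuffer);
    // Basic ocrmypdf invocation: ocrmypdf <input> <output>
    await run('ocrmypdf', [inputPath, outputPath]);
    return await fs.readFile(outputPath);
  } finally {
    // Remove the temporary files whether or not the command succeeded.
    await fs.rm(workDir, { recursive: true, force: true });
  }
}
```

In normal use, `processPdfBuffer` already covers this workflow, so a sketch like this is only relevant when debugging or when evaluating what the module does on your behalf.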