feat(ocr): add smartai extraction support

@@ -1,18 +1,19 @@
# @push.rocks/smartocr

OCR utilities for PDF text-layer generation with `ocrmypdf` and optional SmartAI-powered text extraction.

## Install

To install `@push.rocks/smartocr`, use pnpm:

```bash
pnpm install @push.rocks/smartocr
```

This module depends on external utilities such as `ocrmypdf`, so make sure they are installed and available on your system's `PATH`. Consult the `ocrmypdf` documentation for installation instructions for your operating system.
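You can check for `ocrmypdf` at startup and fail early with a clear message, instead of hitting a shell error on the first OCR call. The following is a sketch and not part of `@push.rocks/smartocr`. The helper name `isCommandAvailable` is hypothetical, and the check assumes that `ocrmypdf --version` exits successfully whenever the tool is installed and on `PATH`.

```typescript
import { execFileSync } from 'node:child_process';

/**
 * Hypothetical helper (not part of @push.rocks/smartocr).
 * Returns true if `<command> --version` can be launched and exits with code 0.
 * Any failure (command not found, non-zero exit, timeout) counts as unavailable.
 */
function isCommandAvailable(command: string): boolean {
  try {
    // execFile does not go through a shell; PATH is still used to locate the binary.
    execFileSync(command, ['--version'], { stdio: 'ignore', timeout: 10_000 });
    return true;
  } catch {
    return false;
  }
}

// Warn once at startup rather than failing later inside an OCR call.
if (!isCommandAvailable('ocrmypdf')) {
  console.warn('ocrmypdf was not found on PATH; PDF text-layer generation will not work.');
}
```

The check spawns a short-lived process, so run it once at startup rather than before every OCR call.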
## Usage

This module provides a TypeScript interface for OCR processing of PDF documents using `ocrmypdf`, encapsulated in the `SmartOcr` class. It can also run SmartAI OCR on image buffers, or use SmartAI document analysis with a vision-capable model to extract text from PDFs.
### Preparing Your Project

@@ -45,6 +46,54 @@ await fs.promises.writeFile('./path/to/output/document_ocr.pdf', ocredPdfBuffer)

In the example above, we import the `SmartOcr` class and use it to process a PDF by passing a `Buffer` of the PDF file to the `processPdfBuffer` method. The method returns a `Buffer` containing the processed PDF, which includes a text layer added by OCR.
### SmartAI Image OCR

For image inputs, use SmartAI's OCR engine. By default this is the Mistral OCR engine from `@push.rocks/smartai/ocr`. To authenticate, either pass `mistralOcrOptions.apiKey` or set the `MISTRAL_API_KEY` environment variable. Alternatively, inject a custom `smartAiOcrEngine`.
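The example that follows reads the key from `process.env.MISTRAL_API_KEY`, which is `undefined` when the variable is not set. A small guard like the one below makes that failure explicit before `SmartOcr.createAndInit()` runs. This is a sketch: `requireMistralApiKey` is a hypothetical helper. Preferring an explicit key over the environment variable is our own choice, not documented library behavior.

```typescript
/**
 * Hypothetical helper (not part of @push.rocks/smartocr).
 * Returns a non-empty Mistral API key, preferring an explicitly passed value
 * and falling back to MISTRAL_API_KEY. That ordering is an assumption of this
 * sketch. Throws a descriptive error if neither source provides a key.
 */
function requireMistralApiKey(
  explicitKey?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  // Blank or whitespace-only values are treated as missing.
  const key = explicitKey?.trim() || env.MISTRAL_API_KEY?.trim();
  if (!key) {
    throw new Error(
      'No Mistral API key found: pass mistralOcrOptions.apiKey or set MISTRAL_API_KEY.',
    );
  }
  return key;
}
```

You would then pass `mistralOcrOptions: { apiKey: requireMistralApiKey() }`. If you inject a custom `smartAiOcrEngine`, the guard is presumably unnecessary.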
```typescript
import { SmartOcr } from '@push.rocks/smartocr';
import * as fs from 'fs';

// Configure the default Mistral OCR engine.
const smartOcr = await SmartOcr.createAndInit({
  mistralOcrOptions: {
    apiKey: process.env.MISTRAL_API_KEY,
    // Request confidence scores at page granularity.
    confidenceScoresGranularity: 'page',
  },
});

// Read the image from disk and tell the engine which image type it is.
const imageBuffer = await fs.promises.readFile('./scan.png');
const result = await smartOcr.recognizeImageBufferWithSmartAi(imageBuffer, {
  mimeType: 'image/png',
});

console.log(result.text);
console.log(result.confidence);
```
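The example hard-codes `mimeType: 'image/png'`. If your inputs mix formats, you could derive the value from the file's leading bytes instead. The sketch below recognizes only PNG (signature `89 50 4E 47 0D 0A 1A 0A`) and JPEG (`FF D8 FF`). `sniffImageMimeType` is a hypothetical helper, and this sketch does not establish which image types the SmartAI OCR engine accepts.

```typescript
// Well-known file signatures ("magic bytes") for the two formats handled here.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

/** True if the buffer begins with every byte of the given signature. */
function hasSignature(buffer: Uint8Array, signature: number[]): boolean {
  return buffer.length >= signature.length && signature.every((byte, i) => buffer[i] === byte);
}

/**
 * Hypothetical helper (not part of @push.rocks/smartocr).
 * Infers an image MIME type from magic bytes; returns undefined when unrecognized.
 * Node's Buffer is a Uint8Array, so a Buffer from fs.promises.readFile works directly.
 */
function sniffImageMimeType(buffer: Uint8Array): 'image/png' | 'image/jpeg' | undefined {
  if (hasSignature(buffer, PNG_SIGNATURE)) return 'image/png';
  if (hasSignature(buffer, JPEG_SIGNATURE)) return 'image/jpeg';
  return undefined;
}
```

Check the result for `undefined` before passing it as `mimeType` to `recognizeImageBufferWithSmartAi()`.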
### SmartAI PDF Text Extraction

When you want extracted text rather than a searchable PDF, pass a SmartAI model to `extractTextFromPdfBufferWithSmartAi()`. This method uses `@push.rocks/smartai/document`, which converts the PDF pages to images and asks a vision-capable model to extract their text.
```typescript
import { SmartOcr } from '@push.rocks/smartocr';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';

const smartOcr = await SmartOcr.createAndInit();

// The model must be vision-capable; this example uses an Anthropic model.
const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

// The PDF pages are converted to images and passed to the model for extraction.
const pdfBuffer = await fs.promises.readFile('./scan.pdf');
const extractedText = await smartOcr.extractTextFromPdfBufferWithSmartAi(pdfBuffer, {
  model,
});

console.log(extractedText);
```
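This path sends page images to a model provider, so you may want to reject files that are not PDFs before calling `extractTextFromPdfBufferWithSmartAi()`. A PDF file starts with the ASCII header `%PDF-` (for example `%PDF-1.7`), so comparing the first five bytes is a cheap check. `looksLikePdf` and `assertPdf` are hypothetical helpers, not library functions. The check is deliberately strict: some PDF readers tolerate junk bytes before the header, and this check rejects such files.

```typescript
const PDF_HEADER = '%PDF-';

/**
 * Hypothetical pre-check (not part of @push.rocks/smartocr).
 * True only if the buffer starts exactly with the "%PDF-" header bytes.
 */
function looksLikePdf(buffer: Uint8Array): boolean {
  if (buffer.length < PDF_HEADER.length) return false;
  for (let i = 0; i < PDF_HEADER.length; i++) {
    if (buffer[i] !== PDF_HEADER.charCodeAt(i)) return false;
  }
  return true;
}

/** Throws a descriptive error when the buffer does not look like a PDF. */
function assertPdf(buffer: Uint8Array, label = 'input'): void {
  if (!looksLikePdf(buffer)) {
    throw new Error(`${label} does not start with a ${PDF_HEADER} header`);
  }
}
```

For example, call `assertPdf(pdfBuffer, './scan.pdf')` right after reading the file.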
### Advanced Usage

The `SmartOcr` class keeps an internal `smartshell` instance that it uses to run the `ocrmypdf` command. This is abstracted away, so you do not need to build or manage the underlying shell commands to use OCR in your application.
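The sketch below shows roughly what such a wrapper does when it drives `ocrmypdf` from Node. It uses Node's `child_process.execFile` rather than `smartshell`, and it does not reproduce the command line `SmartOcr` actually builds. `buildOcrmypdfArgs` and `runOcrmypdf` are hypothetical helpers. The flags themselves, `-l` (language) and `--skip-text`, are standard `ocrmypdf` options.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

/** Options for the hypothetical helpers below; each maps to a real ocrmypdf flag. */
interface OcrmypdfOptions {
  /** Tesseract language code(s), e.g. 'eng' or 'eng+deu' (ocrmypdf `-l`). */
  language?: string;
  /** Leave pages that already contain text untouched (ocrmypdf `--skip-text`). */
  skipText?: boolean;
}

/** Builds the argument list for `ocrmypdf [options] input output`. */
function buildOcrmypdfArgs(input: string, output: string, options: OcrmypdfOptions = {}): string[] {
  const args: string[] = [];
  if (options.language) args.push('-l', options.language);
  if (options.skipText) args.push('--skip-text');
  args.push(input, output);
  return args;
}

/** Runs ocrmypdf without a shell; rejects if the process exits with a non-zero code. */
async function runOcrmypdf(input: string, output: string, options?: OcrmypdfOptions): Promise<void> {
  await execFileAsync('ocrmypdf', buildOcrmypdfArgs(input, output, options));
}
```

Passing the arguments as an array to `execFile` avoids shell quoting problems with file names that contain spaces.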