feat(providers): Add vision and document processing capabilities to providers
This commit modifies: readme.md
@@ -17,8 +17,8 @@ This command installs the package and adds it to your project's dependencies.
 @push.rocks/smartai supports multiple AI providers, each with its own unique capabilities:
 
 ### OpenAI
-- Models: GPT-4, GPT-3.5-turbo
-- Features: Chat, Streaming, Audio Generation
+- Models: GPT-4, GPT-3.5-turbo, GPT-4-vision-preview
+- Features: Chat, Streaming, Audio Generation, Vision, Document Processing
 - Configuration:
 ```typescript
 openaiToken: 'your-openai-token'
@@ -49,12 +49,13 @@ This command installs the package and adds it to your project's dependencies.
 ```
 
 ### Ollama
-- Models: Configurable (default: llama2)
-- Features: Chat, Streaming
+- Models: Configurable (default: llama2, llava for vision/documents)
+- Features: Chat, Streaming, Vision, Document Processing
 - Configuration:
 ```typescript
 baseUrl: 'http://localhost:11434' // Optional
 model: 'llama2' // Optional
+visionModel: 'llava' // Optional, for vision and document tasks
 ```
 
 ## Usage
@@ -147,15 +148,47 @@ const audioStream = await smartAi.openaiProvider.audio({
 
 ### Document Processing
 
-For providers that support document processing (currently OpenAI):
+For providers that support document processing (OpenAI and Ollama):
 
 ```typescript
+// Using OpenAI
 const result = await smartAi.openaiProvider.document({
   systemMessage: 'Classify the document type',
   userMessage: 'What type of document is this?',
   messageHistory: [],
   pdfDocuments: [pdfBuffer] // Uint8Array of PDF content
 });
+
+// Using Ollama with llava
+const analysis = await smartAi.ollamaProvider.document({
+  systemMessage: 'You are a document analysis assistant',
+  userMessage: 'Extract the key information from this document',
+  messageHistory: [],
+  pdfDocuments: [pdfBuffer] // Uint8Array of PDF content
+});
 ```
+
+Both providers will:
+1. Convert PDF documents to images
+2. Process each page using their vision models
+3. Return a comprehensive analysis based on the system message and user query
+
+### Vision Processing
+
+For providers that support vision tasks (OpenAI and Ollama):
+
+```typescript
+// Using OpenAI's GPT-4 Vision
+const description = await smartAi.openaiProvider.vision({
+  image: imageBuffer, // Buffer containing the image data
+  prompt: 'What do you see in this image?'
+});
+
+// Using Ollama's Llava model
+const analysis = await smartAi.ollamaProvider.vision({
+  image: imageBuffer,
+  prompt: 'Analyze this image in detail'
+});
+```
 
 ## Error Handling