feat(providers): Add research API and image generation/editing support; extend providers and tests
changelog.md
@@ -1,5 +1,18 @@
# Changelog

## 2025-10-03 - 0.7.0 - feat(providers)
Add research API and image generation/editing support; extend providers and tests

- Introduce ResearchOptions and ResearchResponse to the MultiModalModel interface and implement research() where supported
- OpenAiProvider: implement research(), add imageGenerate() and imageEdit() methods (gpt-image-1 / DALL·E support), and expose imageModel option
- AnthropicProvider: implement research() and vision handling; explicitly throw for unsupported image generation/editing
- PerplexityProvider: implement research() (sonar / sonar-pro support) and expose citation parsing
- Add image/document-related interfaces (ImageGenerateOptions, ImageEditOptions, ImageResponse) to abstract API
- Add image generation/editing no-op stubs for other providers (Exo, Groq, Ollama, XAI) that throw informative errors to preserve API compatibility
- Add comprehensive OpenAI image generation tests and helper to save test outputs (test/test.image.openai.ts)
- Update README with Research & Web Search documentation, capability matrix, and roadmap entry for Research & Web Search API
- Add local Claude agent permissions file (.claude/settings.local.json) and various provider type/import updates

## 2025-09-28 - 0.6.1 - fix(provider.anthropic)
Fix Anthropic research tool identifier and add tests + local Claude permissions

readme.md
@@ -45,15 +45,15 @@ const response = await ai.openaiProvider.chat({

Choose the right provider for your use case:

| Provider | Chat | Streaming | TTS | Vision | Documents | Highlights |
|----------|:----:|:---------:|:---:|:------:|:---------:|------------|
| **OpenAI** | ✅ | ✅ | ✅ | ✅ | ✅ | • GPT-4, DALL-E 3<br>• Industry standard<br>• Most features |
| **Anthropic** | ✅ | ✅ | ❌ | ✅ | ✅ | • Claude 3 Opus<br>• Superior reasoning<br>• 200k context |
| **Ollama** | ✅ | ✅ | ❌ | ✅ | ✅ | • 100% local<br>• Privacy-first<br>• No API costs |
| **XAI** | ✅ | ✅ | ❌ | ❌ | ✅ | • Grok models<br>• Real-time data<br>• Uncensored |
| **Perplexity** | ✅ | ✅ | ❌ | ❌ | ❌ | • Web-aware<br>• Research-focused<br>• Citations |
| **Groq** | ✅ | ✅ | ❌ | ❌ | ❌ | • 10x faster<br>• LPU inference<br>• Low latency |
| **Exo** | ✅ | ✅ | ❌ | ❌ | ❌ | • Distributed<br>• P2P compute<br>• Decentralized |
| Provider | Chat | Streaming | TTS | Vision | Documents | Research | Highlights |
|----------|:----:|:---------:|:---:|:------:|:---------:|:--------:|------------|
| **OpenAI** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | • GPT-4, DALL-E 3<br>• Industry standard<br>• Deep research API |
| **Anthropic** | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | • Claude 3 Opus<br>• Superior reasoning<br>• Web search API |
| **Ollama** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | • 100% local<br>• Privacy-first<br>• No API costs |
| **XAI** | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | • Grok models<br>• Real-time data<br>• Uncensored |
| **Perplexity** | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | • Web-aware<br>• Research-focused<br>• Sonar Pro models |
| **Groq** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | • 10x faster<br>• LPU inference<br>• Low latency |
| **Exo** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | • Distributed<br>• P2P compute<br>• Decentralized |

## 🎮 Core Features

@@ -171,6 +171,51 @@ const taxAnalysis = await ai.anthropicProvider.document({
});
```

### 🔬 Research & Web Search

Perform deep research with web search capabilities across multiple providers:

```typescript
// OpenAI Deep Research - Comprehensive analysis
const deepResearch = await ai.openaiProvider.research({
  query: 'What are the latest developments in quantum computing?',
  searchDepth: 'deep',
  includeWebSearch: true
});

console.log(deepResearch.answer);
console.log('Sources:', deepResearch.sources);

// Anthropic Web Search - Domain-filtered research
const anthropic = new AnthropicProvider({
  anthropicToken: 'sk-ant-...',
  enableWebSearch: true,
  searchDomainAllowList: ['nature.com', 'science.org']
});

const scientificResearch = await anthropic.research({
  query: 'Latest breakthroughs in CRISPR gene editing',
  searchDepth: 'advanced'
});

// Perplexity - Research-focused with citations
const perplexityResearch = await ai.perplexityProvider.research({
  query: 'Current state of autonomous vehicle technology',
  searchDepth: 'deep' // Uses Sonar Pro model
});
```

**Research Options:**
- `searchDepth`: 'basic' | 'advanced' | 'deep'
- `maxSources`: Number of sources to include
- `includeWebSearch`: Enable web search (OpenAI)
- `background`: Run as background task (OpenAI)
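
For example, `maxSources` can cap how many citations come back from a quick lookup. This is a minimal sketch reusing the `ai` instance from above; the query and the cap of 5 are illustrative:

```typescript
// Sketch: a basic-depth query that limits the number of cited sources
const quickResearch = await ai.openaiProvider.research({
  query: 'Summarize recent progress in solid-state battery manufacturing',
  searchDepth: 'basic',
  maxSources: 5
});

// Each source carries a url, title, and snippet
for (const source of quickResearch.sources) {
  console.log(`${source.title} - ${source.url}`);
}
```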

**Supported Providers:**
- **OpenAI**: Deep Research API with specialized models (`o3-deep-research-2025-06-26`, `o4-mini-deep-research-2025-06-26`)
- **Anthropic**: Web Search API with domain filtering
- **Perplexity**: Sonar and Sonar Pro models with built-in citations

### 🔄 Persistent Conversations

Maintain context across interactions:

@@ -447,6 +492,7 @@ export PERPLEXITY_API_KEY=pplx-...
| **General Purpose** | OpenAI | Most features, stable, well-documented |
| **Complex Reasoning** | Anthropic | Superior logical thinking, safer outputs |
| **Research & Facts** | Perplexity | Web-aware, provides citations |
| **Deep Research** | OpenAI | Deep Research API with comprehensive analysis |
| **Speed Critical** | Groq | 10x faster inference, sub-second responses |
| **Privacy Critical** | Ollama | 100% local, no data leaves your servers |
| **Real-time Data** | XAI | Access to current information |
@@ -454,6 +500,7 @@ export PERPLEXITY_API_KEY=pplx-...

## 📈 Roadmap

- [x] Research & Web Search API
- [ ] Streaming function calls
- [ ] Image generation support
- [ ] Voice input processing

@@ -1,177 +0,0 @@
# SmartAI Research API Implementation

This document describes the new research capabilities added to the SmartAI library, enabling web search and deep research features for OpenAI and Anthropic providers.

## Features Added

### 1. Research Method Interface

Added a new `research()` method to the `MultiModalModel` abstract class with the following interfaces:

```typescript
interface ResearchOptions {
  query: string;
  searchDepth?: 'basic' | 'advanced' | 'deep';
  maxSources?: number;
  includeWebSearch?: boolean;
  background?: boolean;
}

interface ResearchResponse {
  answer: string;
  sources: Array<{
    url: string;
    title: string;
    snippet: string;
  }>;
  searchQueries?: string[];
  metadata?: any;
}
```

### 2. OpenAI Provider Research Implementation

The OpenAI provider now supports:
- **Deep Research API** with models:
  - `o3-deep-research-2025-06-26` (comprehensive analysis)
  - `o4-mini-deep-research-2025-06-26` (lightweight, faster)
- **Web Search** for standard models (gpt-5, o3, o3-pro, o4-mini)
- **Background processing** for async deep research tasks

### 3. Anthropic Provider Research Implementation

The Anthropic provider now supports:
- **Web Search API** with Claude models
- **Domain filtering** (allow/block lists)
- **Progressive searches** for comprehensive research
- **Citation extraction** from responses

### 4. Perplexity Provider Research Implementation

The Perplexity provider implements research using:
- **Sonar models** for standard searches
- **Sonar Pro** for deep research
- Built-in citation support

### 5. Other Providers

Added research method stubs to:
- Groq Provider
- Ollama Provider
- xAI Provider
- Exo Provider

These providers throw a "not yet supported" error when research is called, maintaining interface compatibility.
## Usage Examples

### Basic Research with OpenAI

```typescript
import { OpenAiProvider } from '@push.rocks/smartai';

const openai = new OpenAiProvider({
  openaiToken: 'your-api-key',
  researchModel: 'o4-mini-deep-research-2025-06-26'
});

await openai.start();

const result = await openai.research({
  query: 'What are the latest developments in quantum computing?',
  searchDepth: 'basic',
  includeWebSearch: true
});

console.log(result.answer);
console.log('Sources:', result.sources);
```

### Deep Research with OpenAI

```typescript
const deepResult = await openai.research({
  query: 'Comprehensive analysis of climate change mitigation strategies',
  searchDepth: 'deep',
  background: true
});
```

### Research with Anthropic

```typescript
import { AnthropicProvider } from '@push.rocks/smartai';

const anthropic = new AnthropicProvider({
  anthropicToken: 'your-api-key',
  enableWebSearch: true,
  searchDomainAllowList: ['nature.com', 'science.org']
});

await anthropic.start();

const result = await anthropic.research({
  query: 'Latest breakthroughs in CRISPR gene editing',
  searchDepth: 'advanced'
});
```

### Research with Perplexity

```typescript
import { PerplexityProvider } from '@push.rocks/smartai';

const perplexity = new PerplexityProvider({
  perplexityToken: 'your-api-key'
});

const result = await perplexity.research({
  query: 'Current state of autonomous vehicle technology',
  searchDepth: 'deep' // Uses Sonar Pro model
});
```

## Configuration Options

### OpenAI Provider
- `researchModel`: Specify deep research model (default: `o4-mini-deep-research-2025-06-26`)
- `enableWebSearch`: Enable web search for standard models

### Anthropic Provider
- `enableWebSearch`: Enable web search capabilities
- `searchDomainAllowList`: Array of allowed domains
- `searchDomainBlockList`: Array of blocked domains
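
A minimal sketch of how these options combine at construction time; the API keys and domain lists are placeholders, and the options shown are exactly those listed above:

```typescript
import { OpenAiProvider, AnthropicProvider } from '@push.rocks/smartai';

// OpenAI: enable web search for standard models and pin a deep research model
const openaiWithSearch = new OpenAiProvider({
  openaiToken: 'your-api-key',
  enableWebSearch: true,
  researchModel: 'o4-mini-deep-research-2025-06-26'
});

// Anthropic: allow scientific domains and block an unwanted one
const anthropicFiltered = new AnthropicProvider({
  anthropicToken: 'your-api-key',
  enableWebSearch: true,
  searchDomainAllowList: ['nature.com', 'science.org'],
  searchDomainBlockList: ['example-content-farm.com']
});
```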

## API Pricing

- **OpenAI Deep Research**: $10 per 1,000 calls
- **Anthropic Web Search**: $10 per 1,000 searches + standard token costs
- **Perplexity Sonar**: $5 per 1,000 searches (Sonar Pro)

## Testing

Run the test suite:

```bash
pnpm test test/test.research.ts
```

All providers have been tested to ensure:
- Research methods are properly exposed
- Interfaces are correctly typed
- Unsupported providers throw appropriate errors
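
The last point can be asserted with a short tap test like this sketch (assuming `GroqProvider` is exported from the package index alongside the other providers):

```typescript
import { expect, tap } from '@push.rocks/tapbundle';
import * as smartai from '../ts/index.js';

tap.test('research() on a stub provider should throw an informative error', async () => {
  const groq = new smartai.GroqProvider({ groqToken: 'dummy-token' });

  let caughtError: Error | null = null;
  try {
    await groq.research({ query: 'anything' });
  } catch (err) {
    caughtError = err as Error;
  }

  // Stub implementations reject with a "not yet supported" message
  expect(caughtError).toBeInstanceOf(Error);
  expect(caughtError?.message.includes('not yet supported')).toBeTruthy();
});

export default tap.start();
```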

## Next Steps

Future enhancements could include:
1. Implementing Google Gemini Grounding API support
2. Adding Brave Search API integration
3. Implementing retry logic for rate limits
4. Adding caching for repeated queries
5. Supporting batch research operations

## Notes

- The implementation maintains backward compatibility
- All existing methods continue to work unchanged
- Research capabilities are optional and don't affect existing functionality

test/test.image.openai.ts (new file)
@@ -0,0 +1,203 @@
|
||||
import { expect, tap } from '@push.rocks/tapbundle';
|
||||
import * as qenv from '@push.rocks/qenv';
|
||||
import * as smartai from '../ts/index.js';
|
||||
import * as path from 'path';
|
||||
import { promises as fs } from 'fs';
|
||||
|
||||
const testQenv = new qenv.Qenv('./', './.nogit/');
|
||||
|
||||
let openaiProvider: smartai.OpenAiProvider;
|
||||
|
||||
// Helper function to save image results
|
||||
async function saveImageResult(testName: string, result: any) {
|
||||
const sanitizedName = testName.replace(/[^a-z0-9]/gi, '_').toLowerCase();
|
||||
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
|
||||
const filename = `openai_${sanitizedName}_${timestamp}.json`;
|
||||
const filepath = path.join('.nogit', 'testresults', 'images', filename);
|
||||
|
||||
await fs.mkdir(path.dirname(filepath), { recursive: true });
|
||||
await fs.writeFile(filepath, JSON.stringify(result, null, 2), 'utf-8');
|
||||
|
||||
console.log(` 💾 Saved to: ${filepath}`);
|
||||
|
||||
// Also save the actual image if b64_json is present
|
||||
if (result.images && result.images[0]?.b64_json) {
|
||||
const imageFilename = `openai_${sanitizedName}_${timestamp}.png`;
|
||||
const imageFilepath = path.join('.nogit', 'testresults', 'images', imageFilename);
|
||||
await fs.writeFile(imageFilepath, Buffer.from(result.images[0].b64_json, 'base64'));
|
||||
console.log(` 🖼️ Image saved to: ${imageFilepath}`);
|
||||
}
|
||||
}
|
||||
|
||||
tap.test('OpenAI Image Generation: should initialize provider', async () => {
|
||||
const openaiToken = await testQenv.getEnvVarOnDemand('OPENAI_TOKEN');
|
||||
expect(openaiToken).toBeTruthy();
|
||||
|
||||
openaiProvider = new smartai.OpenAiProvider({
|
||||
openaiToken,
|
||||
imageModel: 'gpt-image-1'
|
||||
});
|
||||
|
||||
await openaiProvider.start();
|
||||
expect(openaiProvider).toBeInstanceOf(smartai.OpenAiProvider);
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image: Basic generation with gpt-image-1', async () => {
|
||||
const result = await openaiProvider.imageGenerate({
|
||||
prompt: 'A cute robot reading a book in a cozy library, digital art style',
|
||||
model: 'gpt-image-1',
|
||||
quality: 'medium',
|
||||
size: '1024x1024'
|
||||
});
|
||||
|
||||
console.log('Basic gpt-image-1 Generation:');
|
||||
console.log('- Images generated:', result.images.length);
|
||||
console.log('- Model used:', result.metadata?.model);
|
||||
console.log('- Quality:', result.metadata?.quality);
|
||||
console.log('- Size:', result.metadata?.size);
|
||||
console.log('- Tokens used:', result.metadata?.tokensUsed);
|
||||
|
||||
await saveImageResult('basic_generation_gptimage1', result);
|
||||
|
||||
expect(result.images).toBeTruthy();
|
||||
expect(result.images.length).toEqual(1);
|
||||
expect(result.images[0].b64_json).toBeTruthy();
|
||||
expect(result.metadata?.model).toEqual('gpt-image-1');
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image: High quality with transparent background', async () => {
|
||||
const result = await openaiProvider.imageGenerate({
|
||||
prompt: 'A simple geometric logo of a mountain peak, minimal design, clean lines',
|
||||
model: 'gpt-image-1',
|
||||
quality: 'high',
|
||||
size: '1024x1024',
|
||||
background: 'transparent',
|
||||
outputFormat: 'png'
|
||||
});
|
||||
|
||||
console.log('High Quality Transparent:');
|
||||
console.log('- Quality:', result.metadata?.quality);
|
||||
console.log('- Background: transparent');
|
||||
console.log('- Format:', result.metadata?.outputFormat);
|
||||
console.log('- Tokens used:', result.metadata?.tokensUsed);
|
||||
|
||||
await saveImageResult('high_quality_transparent', result);
|
||||
|
||||
expect(result.images.length).toEqual(1);
|
||||
expect(result.images[0].b64_json).toBeTruthy();
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image: WebP format with compression', async () => {
|
||||
const result = await openaiProvider.imageGenerate({
|
||||
prompt: 'A futuristic cityscape at sunset with flying cars, photorealistic',
|
||||
model: 'gpt-image-1',
|
||||
quality: 'high',
|
||||
size: '1536x1024',
|
||||
outputFormat: 'webp',
|
||||
outputCompression: 85
|
||||
});
|
||||
|
||||
console.log('WebP with Compression:');
|
||||
console.log('- Format:', result.metadata?.outputFormat);
|
||||
console.log('- Compression: 85%');
|
||||
console.log('- Size:', result.metadata?.size);
|
||||
|
||||
await saveImageResult('webp_compression', result);
|
||||
|
||||
expect(result.images.length).toEqual(1);
|
||||
expect(result.images[0].b64_json).toBeTruthy();
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image: Text rendering with gpt-image-1', async () => {
|
||||
const result = await openaiProvider.imageGenerate({
|
||||
prompt: 'A vintage cafe sign that says "COFFEE & CODE" in elegant hand-lettered typography, warm colors',
|
||||
model: 'gpt-image-1',
|
||||
quality: 'high',
|
||||
size: '1024x1024'
|
||||
});
|
||||
|
||||
console.log('Text Rendering:');
|
||||
console.log('- Prompt includes text: "COFFEE & CODE"');
|
||||
console.log('- gpt-image-1 has superior text rendering');
|
||||
console.log('- Tokens used:', result.metadata?.tokensUsed);
|
||||
|
||||
await saveImageResult('text_rendering', result);
|
||||
|
||||
expect(result.images.length).toEqual(1);
|
||||
expect(result.images[0].b64_json).toBeTruthy();
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image: Multiple images generation', async () => {
|
||||
const result = await openaiProvider.imageGenerate({
|
||||
prompt: 'Abstract colorful geometric patterns, modern minimalist art',
|
||||
model: 'gpt-image-1',
|
||||
n: 2,
|
||||
quality: 'medium',
|
||||
size: '1024x1024'
|
||||
});
|
||||
|
||||
console.log('Multiple Images:');
|
||||
console.log('- Images requested: 2');
|
||||
console.log('- Images generated:', result.images.length);
|
||||
|
||||
await saveImageResult('multiple_images', result);
|
||||
|
||||
expect(result.images.length).toEqual(2);
|
||||
expect(result.images[0].b64_json).toBeTruthy();
|
||||
expect(result.images[1].b64_json).toBeTruthy();
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image: Low moderation setting', async () => {
|
||||
const result = await openaiProvider.imageGenerate({
|
||||
prompt: 'A fantasy battle scene with warriors and dragons',
|
||||
model: 'gpt-image-1',
|
||||
moderation: 'low',
|
||||
quality: 'medium'
|
||||
});
|
||||
|
||||
console.log('Low Moderation:');
|
||||
console.log('- Moderation: low (less restrictive filtering)');
|
||||
console.log('- Tokens used:', result.metadata?.tokensUsed);
|
||||
|
||||
await saveImageResult('low_moderation', result);
|
||||
|
||||
expect(result.images.length).toEqual(1);
|
||||
expect(result.images[0].b64_json).toBeTruthy();
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image Editing: edit with gpt-image-1', async () => {
|
||||
// First, generate a base image
|
||||
const baseResult = await openaiProvider.imageGenerate({
|
||||
prompt: 'A simple white cat sitting on a red cushion',
|
||||
model: 'gpt-image-1',
|
||||
quality: 'low',
|
||||
size: '1024x1024'
|
||||
});
|
||||
|
||||
const baseImageBuffer = Buffer.from(baseResult.images[0].b64_json!, 'base64');
|
||||
|
||||
// Now edit it
|
||||
const editResult = await openaiProvider.imageEdit({
|
||||
image: baseImageBuffer,
|
||||
prompt: 'Change the cat to orange and add stylish sunglasses',
|
||||
model: 'gpt-image-1',
|
||||
quality: 'medium'
|
||||
});
|
||||
|
||||
console.log('Image Editing:');
|
||||
console.log('- Base image created');
|
||||
console.log('- Edit: change color and add sunglasses');
|
||||
console.log('- Result images:', editResult.images.length);
|
||||
|
||||
await saveImageResult('image_edit', editResult);
|
||||
|
||||
expect(editResult.images.length).toEqual(1);
|
||||
expect(editResult.images[0].b64_json).toBeTruthy();
|
||||
});
|
||||
|
||||
tap.test('OpenAI Image: should clean up provider', async () => {
|
||||
await openaiProvider.stop();
|
||||
console.log('OpenAI image provider stopped successfully');
|
||||
});
|
||||
|
||||
export default tap.start();
|
@@ -3,6 +3,6 @@
 */
export const commitinfo = {
  name: '@push.rocks/smartai',
  version: '0.6.1',
  version: '0.7.0',
  description: 'SmartAi is a versatile TypeScript library designed to facilitate integration and interaction with various AI models, offering functionalities for chat, audio generation, document processing, and vision tasks.'
}

@@ -50,6 +50,60 @@ export interface ResearchResponse {
  metadata?: any;
}

/**
 * Options for image generation
 */
export interface ImageGenerateOptions {
  prompt: string;
  model?: 'gpt-image-1' | 'dall-e-3' | 'dall-e-2';
  quality?: 'low' | 'medium' | 'high' | 'standard' | 'hd' | 'auto';
  size?: '256x256' | '512x512' | '1024x1024' | '1536x1024' | '1024x1536' | '1792x1024' | '1024x1792' | 'auto';
  style?: 'vivid' | 'natural';
  background?: 'transparent' | 'opaque' | 'auto';
  outputFormat?: 'png' | 'jpeg' | 'webp';
  outputCompression?: number; // 0-100 for webp/jpeg
  moderation?: 'low' | 'auto';
  n?: number; // Number of images to generate
  stream?: boolean;
  partialImages?: number; // 0-3 for streaming
}

/**
 * Options for image editing
 */
export interface ImageEditOptions {
  image: Buffer;
  prompt: string;
  mask?: Buffer;
  model?: 'gpt-image-1' | 'dall-e-2';
  quality?: 'low' | 'medium' | 'high' | 'standard' | 'auto';
  size?: '256x256' | '512x512' | '1024x1024' | '1536x1024' | '1024x1536' | 'auto';
  background?: 'transparent' | 'opaque' | 'auto';
  outputFormat?: 'png' | 'jpeg' | 'webp';
  outputCompression?: number;
  n?: number;
  stream?: boolean;
  partialImages?: number;
}

/**
 * Response format for image operations
 */
export interface ImageResponse {
  images: Array<{
    b64_json?: string;
    url?: string;
    revisedPrompt?: string;
  }>;
  metadata?: {
    model: string;
    quality?: string;
    size?: string;
    outputFormat?: string;
    tokensUsed?: number;
  };
}

/**
 * Abstract base class for multi-modal AI models.
 * Provides a common interface for different AI providers (OpenAI, Anthropic, Perplexity, Ollama)
@@ -131,4 +185,20 @@ export abstract class MultiModalModel {
   * @throws Error if the provider doesn't support research capabilities
   */
  public abstract research(optionsArg: ResearchOptions): Promise<ResearchResponse>;

  /**
   * Image generation from text prompts
   * @param optionsArg Options containing the prompt and generation parameters
   * @returns Promise resolving to the generated image(s)
   * @throws Error if the provider doesn't support image generation
   */
  public abstract imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse>;

  /**
   * Image editing and inpainting
   * @param optionsArg Options containing the image, prompt, and editing parameters
   * @returns Promise resolving to the edited image(s)
   * @throws Error if the provider doesn't support image editing
   */
  public abstract imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse>;
}

@@ -1,7 +1,16 @@
import * as plugins from './plugins.js';
import * as paths from './paths.js';
import { MultiModalModel } from './abstract.classes.multimodal.js';
import type { ChatOptions, ChatResponse, ChatMessage, ResearchOptions, ResearchResponse } from './abstract.classes.multimodal.js';
import type {
  ChatOptions,
  ChatResponse,
  ChatMessage,
  ResearchOptions,
  ResearchResponse,
  ImageGenerateOptions,
  ImageEditOptions,
  ImageResponse
} from './abstract.classes.multimodal.js';
import type { ImageBlockParam, TextBlockParam } from '@anthropic-ai/sdk/resources/messages';

type ContentBlock = ImageBlockParam | TextBlockParam;
@@ -379,4 +388,18 @@ export class AnthropicProvider extends MultiModalModel {
      throw new Error(`Failed to perform research: ${error.message}`);
    }
  }

  /**
   * Image generation is not supported by Anthropic
   */
  public async imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse> {
    throw new Error('Image generation is not supported by Anthropic. Claude can only analyze images, not generate them. Please use OpenAI provider for image generation.');
  }

  /**
   * Image editing is not supported by Anthropic
   */
  public async imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse> {
    throw new Error('Image editing is not supported by Anthropic. Claude can only analyze images, not edit them. Please use OpenAI provider for image editing.');
  }
}
@@ -1,7 +1,16 @@
import * as plugins from './plugins.js';
import * as paths from './paths.js';
import { MultiModalModel } from './abstract.classes.multimodal.js';
import type { ChatOptions, ChatResponse, ChatMessage, ResearchOptions, ResearchResponse } from './abstract.classes.multimodal.js';
import type {
  ChatOptions,
  ChatResponse,
  ChatMessage,
  ResearchOptions,
  ResearchResponse,
  ImageGenerateOptions,
  ImageEditOptions,
  ImageResponse
} from './abstract.classes.multimodal.js';
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions';

export interface IExoProviderOptions {
@@ -129,4 +138,18 @@ export class ExoProvider extends MultiModalModel {
  public async research(optionsArg: ResearchOptions): Promise<ResearchResponse> {
    throw new Error('Research capabilities are not yet supported by Exo provider.');
  }

  /**
   * Image generation is not supported by Exo
   */
  public async imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse> {
    throw new Error('Image generation is not supported by Exo. Please use OpenAI provider for image generation.');
  }

  /**
   * Image editing is not supported by Exo
   */
  public async imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse> {
    throw new Error('Image editing is not supported by Exo. Please use OpenAI provider for image editing.');
  }
}

@@ -1,7 +1,16 @@
import * as plugins from './plugins.js';
import * as paths from './paths.js';
import { MultiModalModel } from './abstract.classes.multimodal.js';
import type { ChatOptions, ChatResponse, ChatMessage, ResearchOptions, ResearchResponse } from './abstract.classes.multimodal.js';
import type {
  ChatOptions,
  ChatResponse,
  ChatMessage,
  ResearchOptions,
  ResearchResponse,
  ImageGenerateOptions,
  ImageEditOptions,
  ImageResponse
} from './abstract.classes.multimodal.js';

export interface IGroqProviderOptions {
  groqToken: string;
@@ -193,4 +202,18 @@ export class GroqProvider extends MultiModalModel {
  public async research(optionsArg: ResearchOptions): Promise<ResearchResponse> {
    throw new Error('Research capabilities are not yet supported by Groq provider.');
  }

  /**
   * Image generation is not supported by Groq
   */
  public async imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse> {
    throw new Error('Image generation is not supported by Groq. Please use OpenAI provider for image generation.');
  }

  /**
   * Image editing is not supported by Groq
   */
  public async imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse> {
    throw new Error('Image editing is not supported by Groq. Please use OpenAI provider for image editing.');
  }
}
@@ -1,7 +1,16 @@
import * as plugins from './plugins.js';
import * as paths from './paths.js';
import { MultiModalModel } from './abstract.classes.multimodal.js';
import type { ChatOptions, ChatResponse, ChatMessage, ResearchOptions, ResearchResponse } from './abstract.classes.multimodal.js';
import type {
  ChatOptions,
  ChatResponse,
  ChatMessage,
  ResearchOptions,
  ResearchResponse,
  ImageGenerateOptions,
  ImageEditOptions,
  ImageResponse
} from './abstract.classes.multimodal.js';

export interface IOllamaProviderOptions {
  baseUrl?: string;
@@ -255,4 +264,18 @@ export class OllamaProvider extends MultiModalModel {
  public async research(optionsArg: ResearchOptions): Promise<ResearchResponse> {
    throw new Error('Research capabilities are not yet supported by Ollama provider.');
  }

  /**
   * Image generation is not supported by Ollama
   */
  public async imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse> {
    throw new Error('Image generation is not supported by Ollama. Please use OpenAI provider for image generation.');
  }

  /**
   * Image editing is not supported by Ollama
   */
  public async imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse> {
    throw new Error('Image editing is not supported by Ollama. Please use OpenAI provider for image editing.');
  }
}
@@ -9,7 +9,13 @@ export type TChatCompletionRequestMessage = {
};

import { MultiModalModel } from './abstract.classes.multimodal.js';
import type { ResearchOptions, ResearchResponse } from './abstract.classes.multimodal.js';
import type {
  ResearchOptions,
  ResearchResponse,
  ImageGenerateOptions,
  ImageEditOptions,
  ImageResponse
} from './abstract.classes.multimodal.js';

export interface IOpenaiProviderOptions {
  openaiToken: string;
@@ -17,6 +23,7 @@ export interface IOpenaiProviderOptions {
  audioModel?: string;
  visionModel?: string;
  researchModel?: string;
  imageModel?: string;
  enableWebSearch?: boolean;
}

@@ -328,4 +335,121 @@ export class OpenAiProvider extends MultiModalModel {
      throw new Error(`Failed to perform research: ${error.message}`);
    }
  }

  /**
   * Image generation using OpenAI's gpt-image-1 or DALL-E models
   */
  public async imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse> {
    const model = optionsArg.model || this.options.imageModel || 'gpt-image-1';

    try {
      const requestParams: any = {
        model,
        prompt: optionsArg.prompt,
        n: optionsArg.n || 1,
      };

      // Add gpt-image-1 specific parameters
      if (model === 'gpt-image-1') {
        if (optionsArg.quality) requestParams.quality = optionsArg.quality;
        if (optionsArg.size) requestParams.size = optionsArg.size;
        if (optionsArg.background) requestParams.background = optionsArg.background;
        if (optionsArg.outputFormat) requestParams.output_format = optionsArg.outputFormat;
        if (optionsArg.outputCompression !== undefined) requestParams.output_compression = optionsArg.outputCompression;
        if (optionsArg.moderation) requestParams.moderation = optionsArg.moderation;
        if (optionsArg.stream !== undefined) requestParams.stream = optionsArg.stream;
        if (optionsArg.partialImages !== undefined) requestParams.partial_images = optionsArg.partialImages;
      } else if (model === 'dall-e-3') {
        // DALL-E 3 specific parameters
        if (optionsArg.quality) requestParams.quality = optionsArg.quality;
        if (optionsArg.size) requestParams.size = optionsArg.size;
        if (optionsArg.style) requestParams.style = optionsArg.style;
        requestParams.response_format = 'b64_json'; // Always use base64 for consistency
      } else if (model === 'dall-e-2') {
        // DALL-E 2 specific parameters
        if (optionsArg.size) requestParams.size = optionsArg.size;
        requestParams.response_format = 'b64_json';
      }

      const result = await this.openAiApiClient.images.generate(requestParams);

      const images = (result.data || []).map(img => ({
        b64_json: img.b64_json,
        url: img.url,
        revisedPrompt: img.revised_prompt
      }));

      return {
        images,
        metadata: {
          model,
          quality: result.quality,
          size: result.size,
          outputFormat: result.output_format,
          tokensUsed: result.usage?.total_tokens
        }
      };
    } catch (error) {
      console.error('Image generation error:', error);
      throw new Error(`Failed to generate image: ${error.message}`);
    }
  }

  /**
   * Image editing using OpenAI's gpt-image-1 or DALL-E 2 models
   */
  public async imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse> {
    const model = optionsArg.model || this.options.imageModel || 'gpt-image-1';

    try {
      const requestParams: any = {
        model,
        image: optionsArg.image,
        prompt: optionsArg.prompt,
        n: optionsArg.n || 1,
      };

      // Add mask if provided
      if (optionsArg.mask) {
        requestParams.mask = optionsArg.mask;
      }

      // Add gpt-image-1 specific parameters
      if (model === 'gpt-image-1') {
        if (optionsArg.quality) requestParams.quality = optionsArg.quality;
        if (optionsArg.size) requestParams.size = optionsArg.size;
        if (optionsArg.background) requestParams.background = optionsArg.background;
        if (optionsArg.outputFormat) requestParams.output_format = optionsArg.outputFormat;
        if (optionsArg.outputCompression !== undefined) requestParams.output_compression = optionsArg.outputCompression;
        if (optionsArg.stream !== undefined) requestParams.stream = optionsArg.stream;
        if (optionsArg.partialImages !== undefined) requestParams.partial_images = optionsArg.partialImages;
      } else if (model === 'dall-e-2') {
        // DALL-E 2 specific parameters
        if (optionsArg.size) requestParams.size = optionsArg.size;
        requestParams.response_format = 'b64_json';
      }

      const result = await this.openAiApiClient.images.edit(requestParams);

      const images = (result.data || []).map(img => ({
        b64_json: img.b64_json,
        url: img.url,
        revisedPrompt: img.revised_prompt
      }));

      return {
        images,
        metadata: {
          model,
          quality: result.quality,
          size: result.size,
          outputFormat: result.output_format,
          tokensUsed: result.usage?.total_tokens
        }
      };
    } catch (error) {
      console.error('Image edit error:', error);
      throw new Error(`Failed to edit image: ${error.message}`);
    }
  }
}
@@ -1,7 +1,16 @@
import * as plugins from './plugins.js';
import * as paths from './paths.js';
import { MultiModalModel } from './abstract.classes.multimodal.js';
import type { ChatOptions, ChatResponse, ChatMessage, ResearchOptions, ResearchResponse } from './abstract.classes.multimodal.js';
import type {
  ChatOptions,
  ChatResponse,
  ChatMessage,
  ResearchOptions,
  ResearchResponse,
  ImageGenerateOptions,
  ImageEditOptions,
  ImageResponse
} from './abstract.classes.multimodal.js';

export interface IPerplexityProviderOptions {
  perplexityToken: string;
@@ -233,4 +242,18 @@ export class PerplexityProvider extends MultiModalModel {
      throw new Error(`Failed to perform research: ${error.message}`);
    }
  }

  /**
   * Image generation is not supported by Perplexity
   */
  public async imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse> {
    throw new Error('Image generation is not supported by Perplexity. Please use OpenAI provider for image generation.');
  }

  /**
   * Image editing is not supported by Perplexity
   */
  public async imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse> {
    throw new Error('Image editing is not supported by Perplexity. Please use OpenAI provider for image editing.');
  }
}
@@ -1,7 +1,16 @@
import * as plugins from './plugins.js';
import * as paths from './paths.js';
import { MultiModalModel } from './abstract.classes.multimodal.js';
import type { ChatOptions, ChatResponse, ChatMessage, ResearchOptions, ResearchResponse } from './abstract.classes.multimodal.js';
import type {
  ChatOptions,
  ChatResponse,
  ChatMessage,
  ResearchOptions,
  ResearchResponse,
  ImageGenerateOptions,
  ImageEditOptions,
  ImageResponse
} from './abstract.classes.multimodal.js';
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions';

export interface IXAIProviderOptions {
@@ -185,4 +194,18 @@ export class XAIProvider extends MultiModalModel {
  public async research(optionsArg: ResearchOptions): Promise<ResearchResponse> {
    throw new Error('Research capabilities are not yet supported by xAI provider.');
  }

  /**
   * Image generation is not supported by xAI
   */
  public async imageGenerate(optionsArg: ImageGenerateOptions): Promise<ImageResponse> {
    throw new Error('Image generation is not supported by xAI. Please use OpenAI provider for image generation.');
  }

  /**
   * Image editing is not supported by xAI
   */
  public async imageEdit(optionsArg: ImageEditOptions): Promise<ImageResponse> {
    throw new Error('Image editing is not supported by xAI. Please use OpenAI provider for image editing.');
  }
}