BREAKING CHANGE(vercel-ai-sdk): migrate to Vercel AI SDK v6 and introduce provider registry (getModel) returning LanguageModelV3
# @push.rocks/smartai

**A unified provider registry for the Vercel AI SDK** 🧠⚡

[](https://www.npmjs.com/package/@push.rocks/smartai)
[](https://www.typescriptlang.org/)
[](https://opensource.org/licenses/MIT)

SmartAI gives you a single `getModel()` function that returns a standard `LanguageModelV3` for **any** supported provider — Anthropic, OpenAI, Google, Groq, Mistral, XAI, Perplexity, or Ollama. Use the returned model with the Vercel AI SDK's `generateText()`, `streamText()`, and tool ecosystem. Specialized capabilities like vision, audio, image generation, document analysis, and web research are available as dedicated subpath imports.

## Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit [community
## 🎯 Why SmartAI?

- **🔌 One function, eight providers** — `getModel()` returns a standard `LanguageModelV3`. Switch providers by changing a string.
- **🧱 Built on Vercel AI SDK** — Uses `ai` v6 under the hood. Your model works with `generateText()`, `streamText()`, tool calling, structured output, and everything else in the AI SDK ecosystem.
- **🏠 Custom Ollama provider** — A full `LanguageModelV3` implementation for Ollama with support for `think` mode, `num_ctx`, auto-tuned temperature for Qwen models, and native tool calling.
- **💰 Anthropic prompt caching** — Automatic `cacheControl` middleware reduces cost and latency on repeated calls. Enabled by default; opt out with `promptCaching: false`.
- **📦 Modular subpath exports** — Vision, audio, image, document, and research capabilities ship as separate imports. Only import what you need.
- **⚡ Zero lock-in** — Your code uses standard AI SDK types. Swap providers without touching application logic.
## 📦 Installation

```bash
npm install @push.rocks/smartai
# or
pnpm install @push.rocks/smartai
```
## 🚀 Quick Start

```typescript
import { getModel, generateText } from '@push.rocks/smartai';

// Get a model for any provider
const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

// Use it with the standard AI SDK functions
const result = await generateText({
  model,
  prompt: 'Explain quantum computing in simple terms.',
});

console.log(result.text);
```

That's it. Change `provider` to `'openai'` and `model` to `'gpt-4o'` and the rest of your code stays exactly the same.
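To make that switch concrete, here is a small hypothetical helper (not part of SmartAI) that centralizes the provider choice. It only builds the options object you would pass to `getModel()`; the default model names are illustrative picks, not an official mapping.

```typescript
// Hypothetical helper: centralize provider selection so swapping providers
// stays a one-line config change. Builds the options object only.
type ProviderName = 'anthropic' | 'openai' | 'google' | 'groq';

function buildModelOptions(provider: ProviderName) {
  // Illustrative default model per provider (assumption, not an official list)
  const defaults: Record<ProviderName, string> = {
    anthropic: 'claude-sonnet-4-5-20250929',
    openai: 'gpt-4o',
    google: 'gemini-2.0-flash',
    groq: 'llama-3.3-70b-versatile',
  };
  return {
    provider,
    model: defaults[provider],
    // e.g. OPENAI_TOKEN, ANTHROPIC_TOKEN — matches the env names used above
    apiKey: process.env[`${provider.toUpperCase()}_TOKEN`],
  };
}

console.log(buildModelOptions('openai').model); // "gpt-4o"
```

You would then call `getModel(buildModelOptions('openai'))` and keep the rest of your code unchanged.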
## 🔧 Core API

### `getModel(options): LanguageModelV3`

The primary export. Returns a standard `LanguageModelV3` you can use with any AI SDK function.
```typescript
import { getModel } from '@push.rocks/smartai';
import type { ISmartAiOptions } from '@push.rocks/smartai';

const options: ISmartAiOptions = {
  provider: 'anthropic', // 'anthropic' | 'openai' | 'google' | 'groq' | 'mistral' | 'xai' | 'perplexity' | 'ollama'
  model: 'claude-sonnet-4-5-20250929',
  apiKey: 'sk-ant-...',
  // Anthropic-only: prompt caching (default: true)
  promptCaching: true,
  // Ollama-only: base URL (default: http://localhost:11434)
  baseUrl: 'http://localhost:11434',
  // Ollama-only: model runtime options
  ollamaOptions: { think: true, num_ctx: 4096 },
};

const model = getModel(options);
```
### Re-exported AI SDK Functions

SmartAI re-exports the most commonly used functions from `ai` for convenience:

```typescript
import {
  getModel,
  generateText,
  streamText,
  tool,
  jsonSchema,
} from '@push.rocks/smartai';

import type {
  ModelMessage,
  ToolSet,
  StreamTextResult,
  LanguageModelV3,
} from '@push.rocks/smartai';
```
## 🤖 Supported Providers

| Provider | Package | Example Models |
|----------|---------|----------------|
| **Anthropic** | `@ai-sdk/anthropic` | `claude-sonnet-4-5-20250929`, `claude-opus-4-5-20250929` |
| **OpenAI** | `@ai-sdk/openai` | `gpt-4o`, `gpt-4o-mini`, `o3-mini` |
| **Google** | `@ai-sdk/google` | `gemini-2.0-flash`, `gemini-2.5-pro` |
| **Groq** | `@ai-sdk/groq` | `llama-3.3-70b-versatile`, `mixtral-8x7b-32768` |
| **Mistral** | `@ai-sdk/mistral` | `mistral-large-latest`, `mistral-small-latest` |
| **XAI** | `@ai-sdk/xai` | `grok-3`, `grok-3-mini` |
| **Perplexity** | `@ai-sdk/perplexity` | `sonar-pro`, `sonar` |
| **Ollama** | Custom `LanguageModelV3` | `qwen3:8b`, `llama3:8b`, `deepseek-r1` |
## 💬 Text Generation

### Generate Text

```typescript
import { getModel, generateText } from '@push.rocks/smartai';

const model = getModel({
  provider: 'openai',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_TOKEN,
});

const result = await generateText({
  model,
  system: 'You are a helpful assistant.',
  prompt: 'What is 2 + 2?',
});

console.log(result.text); // "4"
```
### Stream Text

```typescript
import { getModel, streamText } from '@push.rocks/smartai';

const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const result = await streamText({
  model,
  prompt: 'Count from 1 to 10.',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```
### Tool Calling

```typescript
import { getModel, generateText, tool, jsonSchema } from '@push.rocks/smartai';

const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const result = await generateText({
  model,
  prompt: 'What is the weather in London?',
  tools: {
    getWeather: tool({
      description: 'Get weather for a location',
      parameters: jsonSchema({
        type: 'object',
        properties: {
          location: { type: 'string' },
        },
        required: ['location'],
      }),
      execute: async ({ location }) => {
        return { temperature: 18, condition: 'cloudy' };
      },
    }),
  },
});
```
## 🏠 Ollama (Local Models)

The custom Ollama provider implements `LanguageModelV3` directly, calling Ollama's native `/api/chat` endpoint. This gives you features that generic OpenAI-compatible wrappers miss:

```typescript
import { getModel, generateText } from '@push.rocks/smartai';

const model = getModel({
  provider: 'ollama',
  model: 'qwen3:8b',
  baseUrl: 'http://localhost:11434', // default
  ollamaOptions: {
    think: true, // Enable thinking/reasoning mode
    num_ctx: 8192, // Context window size
    temperature: 0.7, // Override default (Qwen models auto-default to 0.55)
  },
});

const result = await generateText({
  model,
  prompt: 'Solve this step by step: what is 15% of 340?',
});

console.log(result.text);
```
### Ollama Features

- **`think` mode** — Enables reasoning for models that support it (Qwen3, QwQ, DeepSeek-R1). The `think` parameter is sent at the top level of the request body, as required by the Ollama API.
- **Auto-tuned temperature** — Qwen models automatically get `temperature: 0.55` when no explicit temperature is set, matching the recommended inference setting.
- **Native tool calling** — Full tool-call support via Ollama's native format (not shimmed through OpenAI-compatible endpoints).
- **Streaming with reasoning** — `doStream()` emits proper `reasoning-start`, `reasoning-delta`, and `reasoning-end` parts alongside text.
- **All Ollama options** — `num_ctx`, `top_k`, `top_p`, `repeat_penalty`, `num_predict`, `stop`, `seed`.
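For illustration, the request shape these features describe can be sketched as a plain object builder. This is a hypothetical sketch, not the provider's internal code: it shows `think` placed at the top level of the body (not nested), sampling options under `options`, and the Qwen temperature default mentioned above.

```typescript
// Hypothetical sketch of an Ollama /api/chat request body (assumed field names
// follow the Ollama API: `think` is top-level, sampling goes under `options`).
interface OllamaChatBody {
  model: string;
  messages: { role: string; content: string }[];
  stream: boolean;
  think?: boolean;
  options: { num_ctx?: number; temperature?: number };
}

function buildOllamaBody(model: string, prompt: string, think: boolean): OllamaChatBody {
  const body: OllamaChatBody = {
    model,
    messages: [{ role: 'user', content: prompt }],
    stream: false,
    options: { num_ctx: 8192 },
  };
  if (think) body.think = true; // top-level, not inside options
  // Mirror the auto-tuned default described above: Qwen models get 0.55
  if (model.startsWith('qwen')) body.options.temperature = 0.55;
  return body;
}
```

The actual provider handles streaming, tool calls, and the remaining options; this only illustrates where the fields live in the payload.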
## 💰 Anthropic Prompt Caching

When using the Anthropic provider, SmartAI automatically wraps the model with caching middleware that adds `cacheControl: { type: 'ephemeral' }` to the last system message and the last user message. This can significantly reduce cost and latency for repeated calls with the same system prompt.

```typescript
// Caching enabled by default
const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

// Opt out of caching
const modelNoCaching = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
  promptCaching: false,
});
```
You can also use the middleware directly:

```typescript
import { createAnthropicCachingMiddleware } from '@push.rocks/smartai';
import { wrapLanguageModel } from 'ai';

const middleware = createAnthropicCachingMiddleware();
const cachedModel = wrapLanguageModel({ model: baseModel, middleware });
```
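As a rough sketch of what the middleware does (an illustrative reimplementation for clarity, not the library source), the transform below tags the last system message and the last user message with ephemeral `cacheControl`:

```typescript
// Illustrative sketch: tag the last system message and last user message with
// Anthropic's ephemeral cacheControl so the prompt prefix can be reused.
type Msg = {
  role: 'system' | 'user' | 'assistant';
  content: string;
  providerOptions?: Record<string, unknown>;
};

function applyEphemeralCache(messages: Msg[]): Msg[] {
  const out = messages.map((m) => ({ ...m }));
  for (const role of ['system', 'user'] as const) {
    // Walk backwards so only the *last* message of each role is tagged
    for (let i = out.length - 1; i >= 0; i--) {
      if (out[i].role === role) {
        out[i].providerOptions = {
          anthropic: { cacheControl: { type: 'ephemeral' } },
        };
        break;
      }
    }
  }
  return out;
}
```

Earlier messages stay untouched, so only the stable prefix (system prompt plus the latest user turn) is marked cacheable.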
## 📦 Subpath Exports

SmartAI provides specialized capabilities as separate subpath imports. Each one is a focused utility that takes a model (or API key) and does one thing well.

### 👁️ Vision — `@push.rocks/smartai/vision`

Analyze images using any vision-capable model.
```typescript
import { analyzeImage } from '@push.rocks/smartai/vision';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';

const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const description = await analyzeImage({
  model,
  image: fs.readFileSync('photo.jpg'),
  prompt: 'Describe this image in detail.',
  mediaType: 'image/jpeg', // optional, defaults to 'image/jpeg'
});

console.log(description);
```
**`analyzeImage(options)`** accepts:

- `model` — Any `LanguageModelV3` with vision support
- `image` — `Buffer` or `Uint8Array`
- `prompt` — What to ask about the image
- `mediaType` — `'image/jpeg'` | `'image/png'` | `'image/webp'` | `'image/gif'`
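If you load images from disk, you may want to derive `mediaType` from the filename. A small hypothetical helper (not part of the package) covering the four supported types:

```typescript
// Hypothetical helper: pick the analyzeImage mediaType from a file extension.
type VisionMediaType = 'image/jpeg' | 'image/png' | 'image/webp' | 'image/gif';

function mediaTypeFor(filename: string): VisionMediaType {
  const ext = filename.toLowerCase().split('.').pop();
  switch (ext) {
    case 'png':
      return 'image/png';
    case 'webp':
      return 'image/webp';
    case 'gif':
      return 'image/gif';
    default:
      return 'image/jpeg'; // matches the documented default
  }
}
```

You could then write `mediaType: mediaTypeFor('photo.png')` instead of hard-coding the string.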
### 🎙️ Audio — `@push.rocks/smartai/audio`

Text-to-speech using OpenAI's TTS models.
```typescript
import { textToSpeech } from '@push.rocks/smartai/audio';
import * as fs from 'fs';

const stream = await textToSpeech({
  apiKey: process.env.OPENAI_TOKEN,
  text: 'Welcome to the future of AI development!',
  voice: 'nova', // 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
  model: 'tts-1-hd', // 'tts-1' | 'tts-1-hd'
  responseFormat: 'mp3', // 'mp3' | 'opus' | 'aac' | 'flac'
  speed: 1.0, // 0.25 to 4.0
});

stream.pipe(fs.createWriteStream('welcome.mp3'));
```
### 🎨 Image — `@push.rocks/smartai/image`

Generate and edit images using OpenAI's image models.
```typescript
import { generateImage, editImage } from '@push.rocks/smartai/image';

// Generate an image
const result = await generateImage({
  apiKey: process.env.OPENAI_TOKEN,
  prompt: 'A futuristic cityscape at sunset, digital art',
  model: 'gpt-image-1', // 'gpt-image-1' | 'dall-e-3' | 'dall-e-2'
  quality: 'high', // 'low' | 'medium' | 'high' | 'auto'
  size: '1024x1024',
  background: 'transparent', // gpt-image-1 only
  outputFormat: 'png', // 'png' | 'jpeg' | 'webp'
  n: 1,
});

// result.images[0].b64_json — base64-encoded image data
const imageBuffer = Buffer.from(result.images[0].b64_json!, 'base64');

// Edit an existing image
const edited = await editImage({
  apiKey: process.env.OPENAI_TOKEN,
  image: imageBuffer,
  prompt: 'Add a rainbow in the sky',
  model: 'gpt-image-1',
  quality: 'high',
});
```
### 📄 Document — `@push.rocks/smartai/document`

Analyze PDF documents by converting them to images and using a vision model. Uses `@push.rocks/smartpdf` for PDF-to-PNG conversion (requires Chromium/Puppeteer).
```typescript
import { analyzeDocuments, stopSmartpdf } from '@push.rocks/smartai/document';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';

const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const analysis = await analyzeDocuments({
  model,
  systemMessage: 'You are a legal document analyst.',
  userMessage: 'Summarize the key terms and conditions.',
  pdfDocuments: [fs.readFileSync('contract.pdf')],
  messageHistory: [], // optional: prior conversation context
});

console.log(analysis);

// Clean up the SmartPdf instance when done
await stopSmartpdf();
```
### 🔬 Research — `@push.rocks/smartai/research`

Perform web-search-powered research using Anthropic's `web_search_20250305` tool.

```typescript
import { research } from '@push.rocks/smartai/research';

const result = await research({
  apiKey: process.env.ANTHROPIC_TOKEN,
  query: 'What are the latest developments in quantum computing?',
  searchDepth: 'basic', // 'basic' | 'advanced' | 'deep'
  maxSources: 10, // optional: limit number of search results
  allowedDomains: ['nature.com', 'arxiv.org'], // optional: restrict to domains
  blockedDomains: ['reddit.com'], // optional: exclude domains
});

console.log(result.answer);
console.log('Sources:', result.sources); // Array<{ url, title, snippet }>
console.log('Queries:', result.searchQueries); // search queries the model used
```
|
||||
### Local AI for Sensitive Data
|
||||
## 🧪 Testing
|
||||
|
||||
```typescript
|
||||
const localAI = new SmartAi({
|
||||
ollama: {
|
||||
baseUrl: 'http://localhost:11434',
|
||||
model: 'llama2',
|
||||
visionModel: 'llava',
|
||||
},
|
||||
});
|
||||
```bash
|
||||
# All tests
|
||||
pnpm test
|
||||
|
||||
// Process sensitive documents without leaving your infrastructure
|
||||
async function analyzeSensitiveDoc(pdfBuffer: Buffer) {
|
||||
const analysis = await localAI.ollamaProvider.document({
|
||||
systemMessage: 'Extract and summarize key information.',
|
||||
userMessage: 'Analyze this confidential document',
|
||||
messageHistory: [],
|
||||
pdfDocuments: [pdfBuffer],
|
||||
});
|
||||
|
||||
// Data never leaves your servers
|
||||
return analysis.message;
|
||||
}
|
||||
# Individual test files
|
||||
tstest test/test.smartai.ts --verbose # Core getModel + generateText + streamText
|
||||
tstest test/test.ollama.ts --verbose # Ollama provider (mocked, no API needed)
|
||||
tstest test/test.vision.ts --verbose # Vision analysis
|
||||
tstest test/test.image.ts --verbose # Image generation
|
||||
tstest test/test.research.ts --verbose # Web research
|
||||
tstest test/test.audio.ts --verbose # Text-to-speech
|
||||
tstest test/test.document.ts --verbose # Document analysis (needs Chromium)
|
||||
```
Most tests skip gracefully when API keys are not set. The Ollama tests are fully mocked and require no external services.
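
The gating check behind that skip behavior can be sketched like this (assumed shape; the real test files may differ):

```typescript
// A live test runs only when its provider credential is present and non-empty.
function shouldRun(env: Record<string, string | undefined>, key: string): boolean {
  const value = env[key];
  return typeof value === 'string' && value.length > 0;
}

// e.g. shouldRun(process.env, 'ANTHROPIC_TOKEN') inside a test file
if (!shouldRun({}, 'ANTHROPIC_TOKEN')) {
  console.log('ANTHROPIC_TOKEN not set, skipping live test');
}
```
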

## 📐 Architecture

```
@push.rocks/smartai
├── ts/                                  # Core package
│   ├── index.ts                         # Re-exports getModel, AI SDK functions, types
│   ├── smartai.classes.smartai.ts       # getModel() — provider switch
│   ├── smartai.interfaces.ts            # ISmartAiOptions, TProvider, IOllamaModelOptions
│   ├── smartai.provider.ollama.ts       # Custom LanguageModelV3 for Ollama
│   ├── smartai.middleware.anthropic.ts  # Prompt caching middleware
│   └── plugins.ts                       # AI SDK provider factories
├── ts_vision/      # @push.rocks/smartai/vision
├── ts_audio/       # @push.rocks/smartai/audio
├── ts_image/       # @push.rocks/smartai/image
├── ts_document/    # @push.rocks/smartai/document
└── ts_research/    # @push.rocks/smartai/research
```

The core package is a thin registry. `getModel()` creates the appropriate `@ai-sdk/*` provider, calls it with the model ID, and returns the resulting `LanguageModelV3`. For Anthropic, it optionally wraps the model with prompt caching middleware. For Ollama, it returns a custom `LanguageModelV3` implementation that talks directly to Ollama's `/api/chat` endpoint.
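
In miniature, that dispatch can be sketched as follows. This is a simplified stand-in, not the actual source: the `@ai-sdk/*` factories are stubbed as plain functions and `getModelSketch` is a hypothetical name:

```typescript
type TProviderSketch = 'anthropic' | 'openai' | 'ollama';

interface IModelStub {
  provider: TProviderSketch;
  modelId: string;
}

// Stubbed factories standing in for the real @ai-sdk/* provider factories
// (createAnthropic(), createOpenAI(), ...), which return model objects.
const factories: Record<TProviderSketch, (modelId: string) => IModelStub> = {
  anthropic: (modelId) => ({ provider: 'anthropic', modelId }),
  openai: (modelId) => ({ provider: 'openai', modelId }),
  ollama: (modelId) => ({ provider: 'ollama', modelId }),
};

function getModelSketch(provider: TProviderSketch, modelId: string): IModelStub {
  const factory = factories[provider];
  if (!factory) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  // The real getModel() would additionally wrap Anthropic models with
  // prompt-caching middleware at this point.
  return factory(modelId);
}
```
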
Subpath modules are independent — they import `ai` and provider SDKs directly, not through the core package. This keeps the dependency graph clean and allows tree-shaking.
## License and Legal Information