BREAKING CHANGE(vercel-ai-sdk): migrate to Vercel AI SDK v6 and introduce provider registry (getModel) returning LanguageModelV3

2026-03-05 19:37:29 +00:00
parent 27cef60900
commit c24010c9bc
61 changed files with 4789 additions and 9083 deletions

readme.md

# @push.rocks/smartai
**A unified provider registry for the Vercel AI SDK** 🧠⚡
[![npm version](https://img.shields.io/npm/v/@push.rocks/smartai.svg)](https://www.npmjs.com/package/@push.rocks/smartai)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.x-blue.svg)](https://www.typescriptlang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
SmartAI gives you a single `getModel()` function that returns a standard `LanguageModelV3` for **any** supported provider — Anthropic, OpenAI, Google, Groq, Mistral, XAI, Perplexity, or Ollama. Use the returned model with the Vercel AI SDK's `generateText()`, `streamText()`, and tool ecosystem. Specialized capabilities like vision, audio, image generation, document analysis, and web research are available as dedicated subpath imports.
## Issue Reporting and Security
For reporting bugs, issues, or security vulnerabilities, please visit [community
## 🎯 Why SmartAI?
- **🔌 One function, eight providers** — `getModel()` returns a standard `LanguageModelV3`. Switch providers by changing a string.
- **🧱 Built on Vercel AI SDK** — Uses `ai` v6 under the hood. Your model works with `generateText()`, `streamText()`, tool calling, structured output, and everything else in the AI SDK ecosystem.
- **🏠 Custom Ollama provider** — A full `LanguageModelV3` implementation for Ollama with support for `think` mode, `num_ctx`, auto-tuned temperature for Qwen models, and native tool calling.
- **💰 Anthropic prompt caching** — Automatic `cacheControl` middleware reduces cost and latency on repeated calls. Enabled by default, opt out with `promptCaching: false`.
- **📦 Modular subpath exports** — Vision, audio, image, document, and research capabilities ship as separate imports. Only import what you need.
- **⚡ Zero lock-in** — Your code uses standard AI SDK types. Swap providers without touching application logic.
## 📦 Installation
```bash
npm install @push.rocks/smartai
# or
pnpm install @push.rocks/smartai
```
## 🚀 Quick Start
```typescript
import { getModel, generateText, streamText } from '@push.rocks/smartai';
// Get a model for any provider
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
// Use it with the standard AI SDK functions
const result = await generateText({
model,
prompt: 'Explain quantum computing in simple terms.',
});
console.log(result.text);
```
That's it. Change `provider` to `'openai'` and `model` to `'gpt-4o'` and the rest of your code stays exactly the same.
## 🔧 Core API
### `getModel(options): LanguageModelV3`
The primary export. Returns a standard `LanguageModelV3` you can use with any AI SDK function.
```typescript
import { getModel } from '@push.rocks/smartai';
import type { ISmartAiOptions } from '@push.rocks/smartai';
const options: ISmartAiOptions = {
provider: 'anthropic', // 'anthropic' | 'openai' | 'google' | 'groq' | 'mistral' | 'xai' | 'perplexity' | 'ollama'
model: 'claude-sonnet-4-5-20250929',
apiKey: 'sk-ant-...',
// Anthropic-only: prompt caching (default: true)
promptCaching: true,
// Ollama-only: base URL (default: http://localhost:11434)
baseUrl: 'http://localhost:11434',
// Ollama-only: model runtime options
ollamaOptions: { think: true, num_ctx: 4096 },
};
const model = getModel(options);
```
### Re-exported AI SDK Functions
SmartAI re-exports the most commonly used functions from `ai` for convenience:
```typescript
import {
getModel,
generateText,
streamText,
tool,
jsonSchema,
} from '@push.rocks/smartai';
import type {
ModelMessage,
ToolSet,
StreamTextResult,
LanguageModelV3,
} from '@push.rocks/smartai';
```
## 🤖 Supported Providers
| Provider | Package | Example Models |
|----------|---------|----------------|
| **Anthropic** | `@ai-sdk/anthropic` | `claude-sonnet-4-5-20250929`, `claude-opus-4-5-20250929` |
| **OpenAI** | `@ai-sdk/openai` | `gpt-4o`, `gpt-4o-mini`, `o3-mini` |
| **Google** | `@ai-sdk/google` | `gemini-2.0-flash`, `gemini-2.5-pro` |
| **Groq** | `@ai-sdk/groq` | `llama-3.3-70b-versatile`, `mixtral-8x7b-32768` |
| **Mistral** | `@ai-sdk/mistral` | `mistral-large-latest`, `mistral-small-latest` |
| **XAI** | `@ai-sdk/xai` | `grok-3`, `grok-3-mini` |
| **Perplexity** | `@ai-sdk/perplexity` | `sonar-pro`, `sonar` |
| **Ollama** | Custom `LanguageModelV3` | `qwen3:8b`, `llama3:8b`, `deepseek-r1` |
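Because every provider returns the same model type, falling back between providers is just a loop over interchangeable calls. The helper below is an illustrative sketch, not part of SmartAI: pass it thunks that each call `generateText()` with a model from a different provider.

```typescript
// Illustrative fallback helper (not part of SmartAI): run each async
// attempt in order and return the first one that succeeds.
async function firstSuccessful<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = new Error('no attempts provided');
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember the failure, then try the next provider
    }
  }
  throw lastError;
}
```

For example, the first attempt could call `generateText()` with an Anthropic model from `getModel()` and the second with an OpenAI model; application code only sees the winning result.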
## 💬 Text Generation
### Generate Text
```typescript
import { getModel, generateText } from '@push.rocks/smartai';
const model = getModel({
provider: 'openai',
model: 'gpt-4o',
apiKey: process.env.OPENAI_TOKEN,
});
const result = await generateText({
model,
system: 'You are a helpful assistant.',
prompt: 'What is 2 + 2?',
});
console.log(result.text); // "4"
```
### Stream Text
```typescript
import { getModel, streamText } from '@push.rocks/smartai';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const result = await streamText({
model,
prompt: 'Count from 1 to 10.',
});
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}
```
### Tool Calling
```typescript
import { getModel, generateText, tool, jsonSchema } from '@push.rocks/smartai';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const result = await generateText({
model,
prompt: 'What is the weather in London?',
tools: {
getWeather: tool({
description: 'Get weather for a location',
inputSchema: jsonSchema({
type: 'object',
properties: {
location: { type: 'string' },
},
required: ['location'],
}),
execute: async ({ location }) => {
return { temperature: 18, condition: 'cloudy' };
},
}),
},
});
```
## 🏠 Ollama (Local Models)
The custom Ollama provider implements `LanguageModelV3` directly, calling Ollama's native `/api/chat` endpoint. This gives you features that generic OpenAI-compatible wrappers miss:
```typescript
import { getModel, generateText } from '@push.rocks/smartai';
const model = getModel({
provider: 'ollama',
model: 'qwen3:8b',
baseUrl: 'http://localhost:11434', // default
ollamaOptions: {
think: true, // Enable thinking/reasoning mode
num_ctx: 8192, // Context window size
temperature: 0.7, // Override default (Qwen models auto-default to 0.55)
},
});
const result = await generateText({
model,
prompt: 'Solve this step by step: what is 15% of 340?',
});
console.log(result.text);
```
### Ollama Features
- **`think` mode** — Enables reasoning for models that support it (Qwen3, QwQ, DeepSeek-R1). The `think` parameter is sent at the top level of the request body as required by the Ollama API.
- **Auto-tuned temperature** — Qwen models automatically get `temperature: 0.55` when no explicit temperature is set, matching the recommended inference setting.
- **Native tool calling** — Full tool call support via Ollama's native format (not shimmed through OpenAI-compatible endpoints).
- **Streaming with reasoning** — `doStream()` emits proper `reasoning-start`, `reasoning-delta`, `reasoning-end` parts alongside text.
- **All Ollama options** — `num_ctx`, `top_k`, `top_p`, `repeat_penalty`, `num_predict`, `stop`, `seed`.
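As a rough sketch of how those reasoning parts can be consumed, the snippet below separates reasoning deltas from answer text. The part shapes are simplified stand-ins for the AI SDK's stream part types, not the exact interfaces:

```typescript
// Simplified stand-ins for the stream parts emitted during streaming
// (the real AI SDK part types carry additional fields).
type StreamPart =
  | { type: 'reasoning-delta'; delta: string }
  | { type: 'text-delta'; delta: string };

// Accumulate a finished stream into the model's reasoning and its answer.
function collectParts(parts: StreamPart[]): { reasoning: string; text: string } {
  let reasoning = '';
  let text = '';
  for (const part of parts) {
    if (part.type === 'reasoning-delta') reasoning += part.delta;
    else text += part.delta;
  }
  return { reasoning, text };
}
```

In a UI this lets you render the `think`-mode trace in a collapsible panel while streaming the answer text directly to the user.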
## 💰 Anthropic Prompt Caching
When using the Anthropic provider, SmartAI automatically wraps the model with caching middleware that adds `cacheControl: { type: 'ephemeral' }` to the last system message and last user message. This can significantly reduce cost and latency for repeated calls with the same system prompt.
```typescript
// Caching enabled by default
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
// Opt out of caching
const modelNoCaching = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
promptCaching: false,
});
```
You can also use the middleware directly:
```typescript
import { createAnthropicCachingMiddleware } from '@push.rocks/smartai';
import { wrapLanguageModel } from 'ai';
const middleware = createAnthropicCachingMiddleware();
const cachedModel = wrapLanguageModel({ model: baseModel, middleware });
```
## 📦 Subpath Exports
SmartAI provides specialized capabilities as separate subpath imports. Each one is a focused utility that takes a model (or API key) and does one thing well.
### 👁️ Vision — `@push.rocks/smartai/vision`
Analyze images using any vision-capable model.
```typescript
import { analyzeImage } from '@push.rocks/smartai/vision';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const description = await analyzeImage({
model,
image: fs.readFileSync('photo.jpg'),
prompt: 'Describe this image in detail.',
mediaType: 'image/jpeg', // optional, defaults to 'image/jpeg'
});
console.log(description);
```
**`analyzeImage(options)`** accepts:
- `model` — Any `LanguageModelV3` with vision support
- `image` — `Buffer` or `Uint8Array`
- `prompt` — What to ask about the image
- `mediaType` — `'image/jpeg'` | `'image/png'` | `'image/webp'` | `'image/gif'`
### 🎙️ Audio — `@push.rocks/smartai/audio`
Text-to-speech using OpenAI's TTS models.
```typescript
import { textToSpeech } from '@push.rocks/smartai/audio';
import * as fs from 'fs';
const stream = await textToSpeech({
apiKey: process.env.OPENAI_TOKEN,
text: 'Welcome to the future of AI development!',
voice: 'nova', // 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
model: 'tts-1-hd', // 'tts-1' | 'tts-1-hd'
responseFormat: 'mp3', // 'mp3' | 'opus' | 'aac' | 'flac'
speed: 1.0, // 0.25 to 4.0
});
stream.pipe(fs.createWriteStream('welcome.mp3'));
```
### 🎨 Image — `@push.rocks/smartai/image`
Generate and edit images using OpenAI's image models.
```typescript
import { generateImage, editImage } from '@push.rocks/smartai/image';
// Generate an image
const result = await generateImage({
apiKey: process.env.OPENAI_TOKEN,
prompt: 'A futuristic cityscape at sunset, digital art',
model: 'gpt-image-1', // 'gpt-image-1' | 'dall-e-3' | 'dall-e-2'
quality: 'high', // 'low' | 'medium' | 'high' | 'auto'
size: '1024x1024',
background: 'transparent', // gpt-image-1 only
outputFormat: 'png', // 'png' | 'jpeg' | 'webp'
n: 1,
});
// result.images[0].b64_json — base64-encoded image data
const imageBuffer = Buffer.from(result.images[0].b64_json!, 'base64');
// Edit an existing image
const edited = await editImage({
apiKey: process.env.OPENAI_TOKEN,
image: imageBuffer,
prompt: 'Add a rainbow in the sky',
model: 'gpt-image-1',
quality: 'high',
});
```
### 📄 Document — `@push.rocks/smartai/document`
Analyze PDF documents by converting them to images and using a vision model. Uses `@push.rocks/smartpdf` for PDF-to-PNG conversion (requires Chromium/Puppeteer).
```typescript
import { analyzeDocuments, stopSmartpdf } from '@push.rocks/smartai/document';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const analysis = await analyzeDocuments({
model,
systemMessage: 'You are a legal document analyst.',
userMessage: 'Summarize the key terms and conditions.',
pdfDocuments: [fs.readFileSync('contract.pdf')],
messageHistory: [], // optional: prior conversation context
});
console.log(analysis);
// Clean up the SmartPdf instance when done
await stopSmartpdf();
```
### 🔬 Research — `@push.rocks/smartai/research`
Perform web-search-powered research using Anthropic's `web_search_20250305` tool.
```typescript
import { research } from '@push.rocks/smartai/research';
const result = await research({
apiKey: process.env.ANTHROPIC_TOKEN,
query: 'What are the latest developments in quantum computing?',
searchDepth: 'basic', // 'basic' | 'advanced' | 'deep'
maxSources: 10, // optional: limit number of search results
allowedDomains: ['nature.com', 'arxiv.org'], // optional: restrict to domains
blockedDomains: ['reddit.com'], // optional: exclude domains
});
console.log(result.answer);
console.log('Sources:', result.sources); // Array<{ url, title, snippet }>
console.log('Queries:', result.searchQueries); // search queries the model used
```
## 🧪 Testing
```bash
# All tests
pnpm test
# Individual test files
tstest test/test.smartai.ts --verbose # Core getModel + generateText + streamText
tstest test/test.ollama.ts --verbose # Ollama provider (mocked, no API needed)
tstest test/test.vision.ts --verbose # Vision analysis
tstest test/test.image.ts --verbose # Image generation
tstest test/test.research.ts --verbose # Web research
tstest test/test.audio.ts --verbose # Text-to-speech
tstest test/test.document.ts --verbose # Document analysis (needs Chromium)
```
Most tests skip gracefully when API keys are not set. The Ollama tests are fully mocked and require no external services.
## 📐 Architecture
```
@push.rocks/smartai
├── ts/                                  # Core package
│   ├── index.ts                         # Re-exports getModel, AI SDK functions, types
│   ├── smartai.classes.smartai.ts       # getModel() — provider switch
│   ├── smartai.interfaces.ts            # ISmartAiOptions, TProvider, IOllamaModelOptions
│   ├── smartai.provider.ollama.ts       # Custom LanguageModelV3 for Ollama
│   ├── smartai.middleware.anthropic.ts  # Prompt caching middleware
│   └── plugins.ts                       # AI SDK provider factories
├── ts_vision/                           # @push.rocks/smartai/vision
├── ts_audio/                            # @push.rocks/smartai/audio
├── ts_image/                            # @push.rocks/smartai/image
├── ts_document/                         # @push.rocks/smartai/document
└── ts_research/                         # @push.rocks/smartai/research
```
The core package is a thin registry. `getModel()` creates the appropriate `@ai-sdk/*` provider, calls it with the model ID, and returns the resulting `LanguageModelV3`. For Anthropic, it optionally wraps the model with prompt caching middleware. For Ollama, it returns a custom `LanguageModelV3` implementation that talks directly to Ollama's `/api/chat` endpoint.
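That dispatch can be pictured as a single switch on the provider name. The function below is a hypothetical sketch of the mapping, not the actual source:

```typescript
// Provider names accepted by the registry.
type TProviderName =
  | 'anthropic' | 'openai' | 'google' | 'groq'
  | 'mistral' | 'xai' | 'perplexity' | 'ollama';

// Sketch: which implementation backs each provider name. Every branch
// except Ollama delegates to the matching @ai-sdk/* package.
function backingPackage(provider: TProviderName): string {
  return provider === 'ollama'
    ? 'built-in LanguageModelV3 implementation'
    : `@ai-sdk/${provider}`;
}
```

Because the mapping is this regular, adding a provider is mostly a matter of wiring in another `@ai-sdk/*` factory.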
Subpath modules are independent — they import `ai` and provider SDKs directly, not through the core package. This keeps the dependency graph clean and allows tree-shaking.
## License and Legal Information