BREAKING CHANGE(vercel-ai-sdk): migrate to Vercel AI SDK v6 and introduce provider registry (getModel) returning LanguageModelV3

2026-03-05 19:37:29 +00:00
parent 27cef60900
commit c24010c9bc
61 changed files with 4789 additions and 9083 deletions

readme.md

# @push.rocks/smartai
**A unified provider registry for the Vercel AI SDK** 🧠⚡
[![npm version](https://img.shields.io/npm/v/@push.rocks/smartai.svg)](https://www.npmjs.com/package/@push.rocks/smartai)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.x-blue.svg)](https://www.typescriptlang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
SmartAI gives you a single `getModel()` function that returns a standard `LanguageModelV3` for **any** supported provider — Anthropic, OpenAI, Google, Groq, Mistral, XAI, Perplexity, or Ollama. Use the returned model with the Vercel AI SDK's `generateText()`, `streamText()`, and tool ecosystem. Specialized capabilities like vision, audio, image generation, document analysis, and web research are available as dedicated subpath imports.
## Issue Reporting and Security
For reporting bugs, issues, or security vulnerabilities, please visit [community
## 🎯 Why SmartAI?
- **🔌 One function, eight providers** — `getModel()` returns a standard `LanguageModelV3`. Switch providers by changing a string.
- **🧱 Built on Vercel AI SDK** — Uses `ai` v6 under the hood. Your model works with `generateText()`, `streamText()`, tool calling, structured output, and everything else in the AI SDK ecosystem.
- **🏠 Custom Ollama provider** — A full `LanguageModelV3` implementation for Ollama with support for `think` mode, `num_ctx`, auto-tuned temperature for Qwen models, and native tool calling.
- **💰 Anthropic prompt caching** — Automatic `cacheControl` middleware reduces cost and latency on repeated calls. Enabled by default, opt out with `promptCaching: false`.
- **📦 Modular subpath exports** — Vision, audio, image, document, and research capabilities ship as separate imports. Only import what you need.
- **⚡ Zero lock-in** — Your code uses standard AI SDK types. Swap providers without touching application logic.
## 📦 Installation
```bash
npm install @push.rocks/smartai
# or
pnpm install @push.rocks/smartai
```
## 🚀 Quick Start
```typescript
import { getModel, generateText, streamText } from '@push.rocks/smartai';
// Get a model for any provider
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
// Use it with the standard AI SDK functions
const result = await generateText({
model,
prompt: 'Explain quantum computing in simple terms.',
});
console.log(result.text);
```
That's it. Change `provider` to `'openai'` and `model` to `'gpt-4o'` and the rest of your code stays exactly the same.
## 🔧 Core API
### `getModel(options): LanguageModelV3`
The primary export. Returns a standard `LanguageModelV3` you can use with any AI SDK function.
```typescript
import { getModel } from '@push.rocks/smartai';
import type { ISmartAiOptions } from '@push.rocks/smartai';
const options: ISmartAiOptions = {
provider: 'anthropic', // 'anthropic' | 'openai' | 'google' | 'groq' | 'mistral' | 'xai' | 'perplexity' | 'ollama'
model: 'claude-sonnet-4-5-20250929',
apiKey: 'sk-ant-...',
// Anthropic-only: prompt caching (default: true)
promptCaching: true,
// Ollama-only: base URL (default: http://localhost:11434)
baseUrl: 'http://localhost:11434',
// Ollama-only: model runtime options
ollamaOptions: { think: true, num_ctx: 4096 },
};
const model = getModel(options);
```
### Re-exported AI SDK Functions
SmartAI re-exports the most commonly used functions from `ai` for convenience:
```typescript
import {
getModel,
generateText,
streamText,
tool,
jsonSchema,
} from '@push.rocks/smartai';
import type {
ModelMessage,
ToolSet,
StreamTextResult,
LanguageModelV3,
} from '@push.rocks/smartai';
```
## 🤖 Supported Providers
| Provider | Package | Example Models |
|----------|---------|----------------|
| **Anthropic** | `@ai-sdk/anthropic` | `claude-sonnet-4-5-20250929`, `claude-opus-4-5-20250929` |
| **OpenAI** | `@ai-sdk/openai` | `gpt-4o`, `gpt-4o-mini`, `o3-mini` |
| **Google** | `@ai-sdk/google` | `gemini-2.0-flash`, `gemini-2.5-pro` |
| **Groq** | `@ai-sdk/groq` | `llama-3.3-70b-versatile`, `mixtral-8x7b-32768` |
| **Mistral** | `@ai-sdk/mistral` | `mistral-large-latest`, `mistral-small-latest` |
| **XAI** | `@ai-sdk/xai` | `grok-3`, `grok-3-mini` |
| **Perplexity** | `@ai-sdk/perplexity` | `sonar-pro`, `sonar` |
| **Ollama** | Custom `LanguageModelV3` | `qwen3:8b`, `llama3:8b`, `deepseek-r1` |
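Because every provider returns the same model type, falling back between providers is just a loop over interchangeable calls. The helper below is an illustrative sketch, not part of SmartAI: pass it thunks that each call `generateText()` with a model from a different provider.

```typescript
// Illustrative fallback helper (not part of SmartAI): run each async
// attempt in order and return the first one that succeeds.
async function firstSuccessful<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = new Error('no attempts provided');
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember the failure, then try the next provider
    }
  }
  throw lastError;
}
```

For example, the first attempt could call `generateText()` with an Anthropic model from `getModel()` and the second with an OpenAI model; application code only sees the winning result.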
## 💬 Text Generation
### Generate Text
```typescript
import { getModel, generateText } from '@push.rocks/smartai';
const model = getModel({
provider: 'openai',
model: 'gpt-4o',
apiKey: process.env.OPENAI_TOKEN,
});
const result = await generateText({
model,
system: 'You are a helpful assistant.',
prompt: 'What is 2 + 2?',
});
console.log(result.text); // "4"
```
### Stream Text
```typescript
import { getModel, streamText } from '@push.rocks/smartai';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const result = await streamText({
model,
prompt: 'Count from 1 to 10.',
});
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}
```
### Tool Calling
```typescript
import { getModel, generateText, tool, jsonSchema } from '@push.rocks/smartai';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const result = await generateText({
model,
prompt: 'What is the weather in London?',
tools: {
getWeather: tool({
description: 'Get weather for a location',
inputSchema: jsonSchema({
type: 'object',
properties: {
location: { type: 'string' },
},
required: ['location'],
}),
execute: async ({ location }) => {
return { temperature: 18, condition: 'cloudy' };
},
}),
},
});
```
## 🏠 Ollama (Local Models)
The custom Ollama provider implements `LanguageModelV3` directly, calling Ollama's native `/api/chat` endpoint. This gives you features that generic OpenAI-compatible wrappers miss:
```typescript
import { getModel, generateText } from '@push.rocks/smartai';
const model = getModel({
provider: 'ollama',
model: 'qwen3:8b',
baseUrl: 'http://localhost:11434', // default
ollamaOptions: {
think: true, // Enable thinking/reasoning mode
num_ctx: 8192, // Context window size
temperature: 0.7, // Override default (Qwen models auto-default to 0.55)
},
});
const result = await generateText({
model,
prompt: 'Solve this step by step: what is 15% of 340?',
});
console.log(result.text);
```
### Ollama Features
- **`think` mode** — Enables reasoning for models that support it (Qwen3, QwQ, DeepSeek-R1). The `think` parameter is sent at the top level of the request body as required by the Ollama API.
- **Auto-tuned temperature** — Qwen models automatically get `temperature: 0.55` when no explicit temperature is set, matching the recommended inference setting.
- **Native tool calling** — Full tool call support via Ollama's native format (not shimmed through OpenAI-compatible endpoints).
- **Streaming with reasoning** — `doStream()` emits proper `reasoning-start`, `reasoning-delta`, `reasoning-end` parts alongside text.
- **All Ollama options** — `num_ctx`, `top_k`, `top_p`, `repeat_penalty`, `num_predict`, `stop`, `seed`.
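As a rough sketch of how those reasoning parts can be consumed, the snippet below separates reasoning deltas from answer text. The part shapes are simplified stand-ins for the AI SDK's stream part types, not the exact interfaces:

```typescript
// Simplified stand-ins for the stream parts emitted during streaming
// (the real AI SDK part types carry additional fields).
type StreamPart =
  | { type: 'reasoning-delta'; delta: string }
  | { type: 'text-delta'; delta: string };

// Accumulate a finished stream into the model's reasoning and its answer.
function collectParts(parts: StreamPart[]): { reasoning: string; text: string } {
  let reasoning = '';
  let text = '';
  for (const part of parts) {
    if (part.type === 'reasoning-delta') reasoning += part.delta;
    else text += part.delta;
  }
  return { reasoning, text };
}
```

In a UI this lets you render the `think`-mode trace in a collapsible panel while streaming the answer text directly to the user.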
## 💰 Anthropic Prompt Caching
When using the Anthropic provider, SmartAI automatically wraps the model with caching middleware that adds `cacheControl: { type: 'ephemeral' }` to the last system message and last user message. This can significantly reduce cost and latency for repeated calls with the same system prompt.
```typescript
// Caching enabled by default
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
// Opt out of caching
const modelNoCaching = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
promptCaching: false,
});
```
You can also use the middleware directly:
```typescript
import { createAnthropicCachingMiddleware } from '@push.rocks/smartai';
import { wrapLanguageModel } from 'ai';
const middleware = createAnthropicCachingMiddleware();
const cachedModel = wrapLanguageModel({ model: baseModel, middleware });
```
## 📦 Subpath Exports
SmartAI provides specialized capabilities as separate subpath imports. Each one is a focused utility that takes a model (or API key) and does one thing well.
### 👁️ Vision — `@push.rocks/smartai/vision`
Analyze images using any vision-capable model.
```typescript
import { analyzeImage } from '@push.rocks/smartai/vision';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const description = await analyzeImage({
model,
image: fs.readFileSync('photo.jpg'),
prompt: 'Describe this image in detail.',
mediaType: 'image/jpeg', // optional, defaults to 'image/jpeg'
});
console.log(description);
```
**`analyzeImage(options)`** accepts:
- `model` — Any `LanguageModelV3` with vision support
- `image` — `Buffer` or `Uint8Array`
- `prompt` — What to ask about the image
- `mediaType` — `'image/jpeg'` | `'image/png'` | `'image/webp'` | `'image/gif'`
### 🎙️ Audio — `@push.rocks/smartai/audio`
Text-to-speech using OpenAI's TTS models.
```typescript
import { textToSpeech } from '@push.rocks/smartai/audio';
import * as fs from 'fs';
const stream = await textToSpeech({
apiKey: process.env.OPENAI_TOKEN,
text: 'Welcome to the future of AI development!',
voice: 'nova', // 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
model: 'tts-1-hd', // 'tts-1' | 'tts-1-hd'
responseFormat: 'mp3', // 'mp3' | 'opus' | 'aac' | 'flac'
speed: 1.0, // 0.25 to 4.0
});
stream.pipe(fs.createWriteStream('welcome.mp3'));
```
### 🎨 Image — `@push.rocks/smartai/image`
Generate and edit images using OpenAI's image models.
```typescript
import { generateImage, editImage } from '@push.rocks/smartai/image';
// Generate an image
const result = await generateImage({
apiKey: process.env.OPENAI_TOKEN,
prompt: 'A futuristic cityscape at sunset, digital art',
model: 'gpt-image-1', // 'gpt-image-1' | 'dall-e-3' | 'dall-e-2'
quality: 'high', // 'low' | 'medium' | 'high' | 'auto'
size: '1024x1024',
background: 'transparent', // gpt-image-1 only
outputFormat: 'png', // 'png' | 'jpeg' | 'webp'
n: 1,
});
// result.images[0].b64_json — base64-encoded image data
const imageBuffer = Buffer.from(result.images[0].b64_json!, 'base64');
// Edit an existing image
const edited = await editImage({
apiKey: process.env.OPENAI_TOKEN,
image: imageBuffer,
prompt: 'Add a rainbow in the sky',
model: 'gpt-image-1',
quality: 'high',
});
```
### 📄 Document — `@push.rocks/smartai/document`
Analyze PDF documents by converting them to images and using a vision model. Uses `@push.rocks/smartpdf` for PDF-to-PNG conversion (requires Chromium/Puppeteer).
```typescript
import { analyzeDocuments, stopSmartpdf } from '@push.rocks/smartai/document';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';
const model = getModel({
provider: 'anthropic',
model: 'claude-sonnet-4-5-20250929',
apiKey: process.env.ANTHROPIC_TOKEN,
});
const analysis = await analyzeDocuments({
model,
systemMessage: 'You are a legal document analyst.',
userMessage: 'Summarize the key terms and conditions.',
pdfDocuments: [fs.readFileSync('contract.pdf')],
messageHistory: [], // optional: prior conversation context
});
console.log(analysis);
// Clean up the SmartPdf instance when done
await stopSmartpdf();
```
### 🔬 Research — `@push.rocks/smartai/research`
Perform web-search-powered research using Anthropic's `web_search_20250305` tool.
```typescript
import { research } from '@push.rocks/smartai/research';
const result = await research({
apiKey: process.env.ANTHROPIC_TOKEN,
query: 'What are the latest developments in quantum computing?',
searchDepth: 'basic', // 'basic' | 'advanced' | 'deep'
maxSources: 10, // optional: limit number of search results
allowedDomains: ['nature.com', 'arxiv.org'], // optional: restrict to domains
blockedDomains: ['reddit.com'], // optional: exclude domains
});
console.log(result.answer);
console.log('Sources:', result.sources); // Array<{ url, title, snippet }>
console.log('Queries:', result.searchQueries); // search queries the model used
```
## 🧪 Testing
```bash
# All tests
pnpm test
# Individual test files
tstest test/test.smartai.ts --verbose # Core getModel + generateText + streamText
tstest test/test.ollama.ts --verbose # Ollama provider (mocked, no API needed)
tstest test/test.vision.ts --verbose # Vision analysis
tstest test/test.image.ts --verbose # Image generation
tstest test/test.research.ts --verbose # Web research
tstest test/test.audio.ts --verbose # Text-to-speech
tstest test/test.document.ts --verbose # Document analysis (needs Chromium)
```
Most tests skip gracefully when API keys are not set. The Ollama tests are fully mocked and require no external services.
## 📐 Architecture
```
@push.rocks/smartai
├── ts/                                  # Core package
│   ├── index.ts                         # Re-exports getModel, AI SDK functions, types
│   ├── smartai.classes.smartai.ts       # getModel() — provider switch
│   ├── smartai.interfaces.ts            # ISmartAiOptions, TProvider, IOllamaModelOptions
│   ├── smartai.provider.ollama.ts       # Custom LanguageModelV3 for Ollama
│   ├── smartai.middleware.anthropic.ts  # Prompt caching middleware
│   └── plugins.ts                       # AI SDK provider factories
├── ts_vision/                           # @push.rocks/smartai/vision
├── ts_audio/                            # @push.rocks/smartai/audio
├── ts_image/                            # @push.rocks/smartai/image
├── ts_document/                         # @push.rocks/smartai/document
└── ts_research/                         # @push.rocks/smartai/research
```
The core package is a thin registry. `getModel()` creates the appropriate `@ai-sdk/*` provider, calls it with the model ID, and returns the resulting `LanguageModelV3`. For Anthropic, it optionally wraps the model with prompt caching middleware. For Ollama, it returns a custom `LanguageModelV3` implementation that talks directly to Ollama's `/api/chat` endpoint.
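That dispatch can be pictured as a single switch on the provider name. The function below is a hypothetical sketch of the mapping, not the actual source:

```typescript
// Provider names accepted by the registry.
type TProviderName =
  | 'anthropic' | 'openai' | 'google' | 'groq'
  | 'mistral' | 'xai' | 'perplexity' | 'ollama';

// Sketch: which implementation backs each provider name. Every branch
// except Ollama delegates to the matching @ai-sdk/* package.
function backingPackage(provider: TProviderName): string {
  return provider === 'ollama'
    ? 'built-in LanguageModelV3 implementation'
    : `@ai-sdk/${provider}`;
}
```

Because the mapping is this regular, adding a provider is mostly a matter of wiring in another `@ai-sdk/*` factory.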
Subpath modules are independent — they import `ai` and provider SDKs directly, not through the core package. This keeps the dependency graph clean and allows tree-shaking.
## License and Legal Information