fix(docs): update documentation: clarify provider capabilities, add provider capabilities summary, polish examples and formatting, and remove Serena project config

2026-01-20 01:27:52 +00:00
parent ae8d3ccf33
commit 2040b3c629
6 changed files with 169 additions and 408 deletions

readme.md

[![TypeScript](https://img.shields.io/badge/TypeScript-5.x-blue.svg)](https://www.typescriptlang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
SmartAI unifies the world's leading AI providers (OpenAI, Anthropic, Mistral, Perplexity, Ollama, Groq, XAI, Exo, and ElevenLabs) under a single, elegant TypeScript interface. Build AI applications at lightning speed without vendor lock-in.
## Issue Reporting and Security
For reporting bugs, issues, or security vulnerabilities, please visit [community
## 🎯 Why SmartAI?
- **🔌 Universal Interface**: Write once, run with any AI provider. Switch between GPT-5, Claude, Llama, or Grok with a single line change.
- **🛡️ Type-Safe**: Full TypeScript support with comprehensive type definitions for all operations.
- **🌊 Streaming First**: Built for real-time applications with native streaming support.
- **🎨 Multi-Modal**: Seamlessly work with text, images, audio, and documents.
- **🏠 Local & Cloud**: Support for both cloud providers and local models via Ollama/Exo.
- **⚡ Zero Lock-In**: Your code remains portable across all AI providers.
## 📦 Installation
```bash
npm install @push.rocks/smartai
# or
pnpm install @push.rocks/smartai
```
## 🚀 Quick Start
```typescript
import { SmartAi } from '@push.rocks/smartai';
const ai = new SmartAi({
  openaiToken: 'sk-...',
});

await ai.start();

const response = await ai.openaiProvider.chat({
  systemMessage: 'You are a helpful assistant.',
  userMessage: 'Explain quantum computing in simple terms',
  messageHistory: [],
});

console.log(response.message);
```
## 📊 Provider Capabilities Matrix
Choose the right provider for your use case:
| Provider | Chat | Streaming | TTS | Vision | Documents | Research | Images | Highlights |
| -------------- | :--: | :-------: | :-: | :----: | :-------: | :------: | :----: | --------------------------------------------------------------- |
| **OpenAI** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | gpt-image-1 • DALL-E 3 • Deep Research API |
| **Anthropic** | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | Claude Sonnet 4.5 • Extended Thinking • Web Search API |
| **Mistral** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | Native PDF OCR • mistral-large • Fast inference |
| **ElevenLabs** | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | Premium TTS • 70+ languages • v3 model |
| **Ollama** | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | 100% local • Privacy-first • No API costs |
| **XAI** | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | Grok 2 • Real-time data |
| **Perplexity** | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | Web-aware • Research-focused • Sonar Pro |
| **Groq** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | 10x faster • LPU inference • Llama 3.3 |
| **Exo** | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | Distributed • P2P compute • Decentralized |
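For programmatic routing, the matrix can also be expressed as data. A minimal sketch (the capability flags below simply restate the table; they are not part of the SmartAI API):

```typescript
type Capability = 'chat' | 'streaming' | 'tts' | 'vision' | 'documents' | 'research' | 'images';

// Capability matrix as data, mirroring the table above
const capabilities: Record<string, Capability[]> = {
  openai: ['chat', 'streaming', 'tts', 'vision', 'documents', 'research', 'images'],
  anthropic: ['chat', 'streaming', 'vision', 'documents', 'research'],
  mistral: ['chat', 'streaming', 'vision', 'documents'],
  elevenlabs: ['tts'],
  ollama: ['chat', 'streaming', 'vision', 'documents'],
  xai: ['chat', 'streaming', 'documents'],
  perplexity: ['chat', 'streaming', 'research'],
  groq: ['chat', 'streaming'],
  exo: ['chat', 'streaming'],
};

// Return every provider that supports all requested capabilities
function providersFor(...required: Capability[]): string[] {
  return Object.keys(capabilities).filter((p) =>
    required.every((c) => capabilities[p].includes(c))
  );
}

console.log(providersFor('tts')); // → [ 'openai', 'elevenlabs' ]
```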
## 🎮 Core Features
Works identically across all providers:
```typescript
// Use GPT-5 for complex reasoning
const gptResponse = await ai.openaiProvider.chat({
  systemMessage: 'You are an expert physicist.',
  userMessage: 'Explain the implications of quantum entanglement',
  messageHistory: [],
});
```

```typescript
// OpenAI text-to-speech
const audioStream = await ai.openaiProvider.audio({
  message: 'Hello from SmartAI!', // illustrative message
});

// ElevenLabs premium TTS
const elevenLabsAudio = await ai.elevenlabsProvider.audio({
  message: 'Experience the most lifelike text to speech technology.',
  voiceId: '19STyYD15bswVz51nqLf', // Optional: Samara voice
  modelId: 'eleven_v3', // Optional: defaults to eleven_v3 (70+ languages)
  voiceSettings: {
    // Optional: fine-tune voice characteristics
    stability: 0.5, // 0-1: Speech consistency
    similarity_boost: 0.8, // 0-1: Voice similarity to original
    style: 0.0, // 0-1: Expressiveness
    use_speaker_boost: true, // Enhanced clarity
  },
});

// Stream directly to speakers or save to file
audioStream.pipe(fs.createWriteStream('welcome.mp3'));
```
```typescript
const image = fs.readFileSync('photo.jpg');

// OpenAI: General-purpose vision
const gptVision = await ai.openaiProvider.vision({
  image,
  prompt: 'Describe this product and suggest marketing angles',
});

// Anthropic: Detailed analysis with extended thinking
const claudeVision = await ai.anthropicProvider.vision({
  image,
  prompt: 'Identify any safety concerns or defects',
});
```
Extract insights from PDFs with AI:
```typescript
const contract = fs.readFileSync('contract.pdf');
const invoice = fs.readFileSync('invoice.pdf');

// Analyze documents with OpenAI
const analysis = await ai.openaiProvider.document({
  systemMessage: 'You are a legal expert.',
  userMessage: 'Compare these documents and highlight key differences',
  pdfDocuments: [contract, invoice],
});

// Multi-document analysis with Anthropic
const taxDocs = [form1099, w2, receipts];
const taxAnalysis = await ai.anthropicProvider.document({
  systemMessage: 'You are a tax advisor.',
  /* ... */
});
```
```typescript
console.log(deepResearch.answer);
console.log('Sources:', deepResearch.sources);

// Anthropic Web Search - Domain-filtered research
import { AnthropicProvider } from '@push.rocks/smartai';

const anthropic = new AnthropicProvider({
  anthropicToken: 'sk-ant-...',
  enableWebSearch: true,
});

// Perplexity: Sonar-based research
const perplexityResearch = await ai.perplexityProvider.research({
  /* ... */
});
```
**Research Options:**
- `searchDepth`: `'basic'` | `'advanced'` | `'deep'`
- `maxSources`: Number of sources to include
- `includeWebSearch`: Enable web search (OpenAI)
- `background`: Run as background task (OpenAI)
**Supported Providers:**
- **OpenAI**: Deep Research API with specialized models (`o3-deep-research-*`, `o4-mini-deep-research-*`)
- **Anthropic**: Web Search API with domain filtering
- **Perplexity**: Sonar and Sonar Pro models with built-in citations
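The OpenAI deep-research model names follow a recognizable prefix pattern, so a small guard can distinguish them from ordinary chat models (the regex is illustrative, derived from the names listed above):

```typescript
// Matches OpenAI deep-research model identifiers such as
// 'o3-deep-research-2025-06-26' or 'o4-mini-deep-research-2025-06-26'
function isDeepResearchModel(model: string): boolean {
  return /^o\d(-mini)?-deep-research-/.test(model);
}

console.log(isDeepResearchModel('o3-deep-research-2025-06-26')); // → true
console.log(isDeepResearchModel('gpt-image-1')); // → false
```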
```typescript
const response = await anthropic.chat({
  /* ... */
});
```
**Thinking Modes:**
| Mode       | Budget Tokens | Use Case                                        |
| ---------- | ------------- | ----------------------------------------------- |
| `'quick'`  | 2,048         | Lightweight reasoning for simple queries        |
| `'normal'` | 8,000         | **Default**: Balanced reasoning for most tasks  |
| `'deep'`   | 16,000        | Complex reasoning for difficult problems        |
| `'off'`    | 0             | Disable extended thinking                       |
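The budgets above can be encoded directly as a lookup. A minimal sketch (the mode names and token counts come from the table; the complexity heuristic is invented for illustration):

```typescript
type ThinkingMode = 'quick' | 'normal' | 'deep' | 'off';

// Budget tokens per mode, as listed in the table above
const thinkingBudget: Record<ThinkingMode, number> = {
  quick: 2_048,
  normal: 8_000,
  deep: 16_000,
  off: 0,
};

// Pick a mode from a rough complexity score in [0, 1]; thresholds are illustrative
function modeFor(complexity: number): ThinkingMode {
  if (complexity > 0.8) return 'deep';
  if (complexity > 0.3) return 'normal';
  return 'quick';
}

console.log(thinkingBudget[modeFor(0.9)]); // → 16000
```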
**Best Practices:**
### 📑 Native PDF OCR (Mistral)
Mistral provides native PDF document processing via their OCR API, with no image conversion required:
```typescript
import { MistralProvider } from '@push.rocks/smartai';

const mistral = new MistralProvider({
  mistralToken: 'your-api-key',
  chatModel: 'mistral-large-latest', // Default
  ocrModel: 'mistral-ocr-latest', // Default
  tableFormat: 'markdown', // 'markdown' | 'html'
});

await mistral.start();
const result = await mistral.document({
  /* ... */
});
```
**Key Advantage**: Unlike other providers that convert PDFs to images first, Mistral's OCR API processes PDFs natively, potentially offering faster and more accurate text extraction for document-heavy workloads.
**Supported Formats:**
- Native PDF processing via Files API
- Image OCR (JPEG, PNG, GIF, WebP) for vision tasks
- Table extraction with markdown or HTML output
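Routing a file to the right Mistral pipeline follows directly from this list; a sketch (the format lists mirror the bullets above, while the router function itself is illustrative, not part of SmartAI):

```typescript
// Route a file to native PDF processing or image OCR based on its extension
function mistralRoute(filename: string): 'pdf' | 'image' | 'unsupported' {
  if (/\.pdf$/i.test(filename)) return 'pdf'; // native PDF processing via Files API
  if (/\.(jpe?g|png|gif|webp)$/i.test(filename)) return 'image'; // image OCR for vision tasks
  return 'unsupported';
}

console.log(mistralRoute('contract.pdf')); // → pdf
console.log(mistralRoute('scan.webp')); // → image
```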
```typescript
const editedImage = await ai.openaiProvider.imageEdit({
  /* ... */
});
```
**Image Generation Options:**
- `model`: `'gpt-image-1'` | `'dall-e-3'` | `'dall-e-2'`
- `quality`: `'low'` | `'medium'` | `'high'` | `'auto'`
- `size`: Multiple aspect ratios up to 4096×4096
- `background`: `'transparent'` | `'opaque'` | `'auto'`
- `outputFormat`: `'png'` | `'jpeg'` | `'webp'`
- `outputCompression`: 0-100 for webp/jpeg
- `moderation`: `'low'` | `'auto'`
- `n`: Number of images (1-10)
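Since several of these options are numeric ranges, it can help to fill defaults and range-check values before calling the API. A sketch (the option names and ranges mirror the list above; the helper itself is not part of SmartAI):

```typescript
interface ImageGenOptions {
  model: 'gpt-image-1' | 'dall-e-3' | 'dall-e-2';
  quality: 'low' | 'medium' | 'high' | 'auto';
  outputFormat: 'png' | 'jpeg' | 'webp';
  outputCompression?: number; // 0-100, webp/jpeg only
  n: number; // 1-10
}

// Fill defaults and validate numeric ranges before hitting the API
function withDefaults(overrides: Partial<ImageGenOptions>): ImageGenOptions {
  const opts: ImageGenOptions = {
    model: 'gpt-image-1',
    quality: 'auto',
    outputFormat: 'png',
    n: 1,
    ...overrides,
  };
  if (opts.n < 1 || opts.n > 10) throw new Error('n must be 1-10');
  if (
    opts.outputCompression !== undefined &&
    (opts.outputCompression < 0 || opts.outputCompression > 100)
  ) {
    throw new Error('outputCompression must be 0-100');
  }
  return opts;
}

console.log(withDefaults({ n: 4 }).model); // → gpt-image-1
```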
**gpt-image-1 Advantages:**
```typescript
const supportBot = new SmartAi({
  anthropicToken: process.env.ANTHROPIC_KEY, // Claude for empathetic responses
});

async function handleCustomerQuery(query: string, history: ChatMessage[]) {
  try {
    const response = await supportBot.anthropicProvider.chat({
      systemMessage: `You are a helpful customer support agent.
        Be empathetic, professional, and solution-oriented.`,
      userMessage: query,
      messageHistory: history,
    });
    return response.message;
  } catch (error) {
    // Fallback to another provider if needed
    return await supportBot.openaiProvider.chat({ /* ... */ });
  }
}
```
```typescript
const codeReviewer = new SmartAi({
  groqToken: 'gsk_...',
});
async function reviewCode(code: string, language: string) {
  const review = await codeReviewer.groqProvider.chat({
    systemMessage: `You are a ${language} expert. Review code for:
      - Security vulnerabilities
      - Performance issues
      - Best practices
      - Potential bugs`,
    userMessage: `Review this code:\n\n${code}`,
    messageHistory: [],
  });

  return review.message;
}
```
```typescript
const researcher = new SmartAi({
  perplexityToken: 'pplx-...',
});
async function research(topic: string) {
  // Perplexity excels at web-aware research
  const findings = await researcher.perplexityProvider.research({
    query: `Research the latest developments in ${topic}`,
    searchDepth: 'deep',
  });

  return {
    answer: findings.answer,
    sources: findings.sources,
  };
}
```
```typescript
class SmartAIRouter {
  constructor(private ai: SmartAi) {}

  async query(
    message: string,
    requirements: {
      speed?: boolean;
      accuracy?: boolean;
      cost?: boolean;
      privacy?: boolean;
    }
  ) {
    if (requirements.privacy) {
      return this.ai.ollamaProvider.chat({ /* ... */ }); // Local only
    }
    if (requirements.speed) {
      return this.ai.groqProvider.chat({ /* ... */ }); // 10x faster
    }
    if (requirements.accuracy) {
      return this.ai.anthropicProvider.chat({ /* ... */ }); // Best reasoning
    }
    // Default fallback
    return this.ai.openaiProvider.chat({ /* ... */ });
  }
}
```
```typescript
// Don't wait for the entire response
async function streamResponse(userQuery: string) {
  const stream = await ai.openaiProvider.chatStream(
    createInputStream(userQuery)
  );
  // Process tokens as they arrive
  /* ... */
}
```
```typescript
// Get the best answer from multiple AIs
async function consensusQuery(question: string) {
  const providers = [
    ai.openaiProvider.chat({ /* ... */ }),
    ai.anthropicProvider.chat({ /* ... */ }),
    ai.perplexityProvider.chat({ /* ... */ }),
  ];

  const responses = await Promise.all(providers);
  /* ... */
}
```
## 🛠️ Advanced Configuration
### Provider-Specific Options
```typescript
const ai = new SmartAi({
  // OpenAI
  openaiToken: 'sk-...',

  // Anthropic with extended thinking
  anthropicToken: 'sk-ant-...',

  // Perplexity for research
  perplexityToken: 'pplx-...',

  // Groq for speed
  groqToken: 'gsk_...',

  // Mistral with OCR settings
  mistralToken: 'your-key',
  mistral: {
    chatModel: 'mistral-large-latest',
    ocrModel: 'mistral-ocr-latest',
    tableFormat: 'markdown',
  },

  // XAI (Grok)
  xaiToken: 'xai-...',

  // ElevenLabs TTS
  elevenlabsToken: 'sk-...',
  elevenlabs: {
    defaultVoiceId: '19STyYD15bswVz51nqLf',
    defaultModelId: 'eleven_v3',
  },

  // Ollama (local)
  ollama: {
    baseUrl: 'http://localhost:11434',
    model: 'llama2',
    visionModel: 'llava',
    defaultOptions: {
      num_ctx: 4096,
      temperature: 0.7,
      top_p: 0.9,
    },
    defaultTimeout: 120000,
  },

  // Exo (distributed)
  exo: {
    baseUrl: 'http://localhost:8080/v1',
    apiKey: 'optional-key',
  },
});
```
### Error Handling & Fallbacks
```typescript
class ResilientAI {
  /* ... */
}
```
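The core of such a wrapper can be sketched without depending on provider internals: try each provider in order and return the first success. `ChatFn` and the stubs below are illustrative, not part of the SmartAI API:

```typescript
type ChatFn = (userMessage: string) => Promise<string>;

// Try each provider in order, returning the first successful answer
async function chatWithFallback(providers: ChatFn[], userMessage: string): Promise<string> {
  let lastError: unknown;
  for (const chat of providers) {
    try {
      return await chat(userMessage);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next provider
    }
  }
  throw lastError;
}

// Usage with stubs; a real entry would be e.g. (m) => ai.groqProvider.chat({ /* ... */ })
const flaky: ChatFn = async () => { throw new Error('rate limited'); };
const stable: ChatFn = async (m) => `echo: ${m}`;
chatWithFallback([flaky, stable], 'hi').then(console.log); // → echo: hi
```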
## 🎯 Choosing the Right Provider
| Use Case | Recommended Provider | Why |
| --------------------- | -------------------- | --------------------------------------------------------- |
| **General Purpose** | OpenAI | Most features, stable, well-documented |
| **Complex Reasoning** | Anthropic | Superior logical thinking, extended thinking, safer outputs |
| **Document OCR** | Mistral | Native PDF processing, no image conversion overhead |
| **Research & Facts** | Perplexity | Web-aware, provides citations |
| **Deep Research** | OpenAI | Deep Research API with comprehensive analysis |
| **Premium TTS** | ElevenLabs | Most natural voices, 70+ languages, v3 model |
| **Speed Critical** | Groq | 10x faster inference, sub-second responses |
| **Privacy Critical** | Ollama | 100% local, no data leaves your servers |
| **Real-time Data** | XAI | Grok with access to current information |
| **Cost Sensitive** | Ollama/Exo | Free (local) or distributed compute |
## 📈 Roadmap
- [x] Research & Web Search API
- [x] Image generation support (gpt-image-1, DALL-E 3, DALL-E 2)
- [x] Extended thinking (Anthropic)
- [x] Native PDF OCR (Mistral)
- [ ] Streaming function calls
- [ ] Voice input processing
- [ ] Fine-tuning integration