Juergen Kunz 2040b3c629 fix(docs): update documentation: clarify provider capabilities, add provider capabilities summary, polish examples and formatting, and remove Serena project config

2026-01-20 01:27:52 +00:00

@push.rocks/smartai

One API to rule them all 🚀

SmartAI unifies the world's leading AI providers — OpenAI, Anthropic, Mistral, Perplexity, Ollama, Groq, XAI, Exo, and ElevenLabs — under a single, elegant TypeScript interface. Build AI applications at lightning speed without vendor lock-in.

Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit community.foss.global/. This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a code.foss.global/ account to submit Pull Requests directly.

🎯 Why SmartAI?

🔌 Universal Interface: Write once, run with any AI provider. Switch between GPT-5, Claude, Llama, or Grok with a single line change.
🛡️ Type-Safe: Full TypeScript support with comprehensive type definitions for all operations.
🌊 Streaming First: Built for real-time applications with native streaming support.
🎨 Multi-Modal: Seamlessly work with text, images, audio, and documents.
🏠 Local & Cloud: Support for both cloud providers and local models via Ollama/Exo.
⚡ Zero Lock-In: Your code remains portable across all AI providers.

📦 Installation

npm install @push.rocks/smartai
# or
pnpm install @push.rocks/smartai

🚀 Quick Start

import { SmartAi } from '@push.rocks/smartai';

// Initialize with your favorite providers
const ai = new SmartAi({
  openaiToken: 'sk-...',
  anthropicToken: 'sk-ant-...',
  elevenlabsToken: 'sk-...',
  elevenlabs: {
    defaultVoiceId: '19STyYD15bswVz51nqLf', // Optional: Samara voice
  },
});

await ai.start();

// Same API, multiple providers
const response = await ai.openaiProvider.chat({
  systemMessage: 'You are a helpful assistant.',
  userMessage: 'Explain quantum computing in simple terms',
  messageHistory: [],
});

console.log(response.message);
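
Hard-coded tokens are fine for a demo, but in practice they usually come from the environment. A minimal sketch, assuming environment variable names (OPENAI_TOKEN, ANTHROPIC_TOKEN, ELEVENLABS_TOKEN) and a helper called tokensFromEnv, neither of which is defined by SmartAI; only the option keys are taken from the example above:

```typescript
// Hypothetical helper: collects provider tokens from an environment-like map.
// The env variable names are assumptions; only the option keys
// (openaiToken, anthropicToken, elevenlabsToken) come from the Quick Start.
type Env = Record<string, string | undefined>;

interface TokenOptions {
  openaiToken?: string;
  anthropicToken?: string;
  elevenlabsToken?: string;
}

function tokensFromEnv(env: Env): TokenOptions {
  const options: TokenOptions = {};
  // Only set keys that are present and non-empty, so unset providers stay unconfigured.
  if (env.OPENAI_TOKEN) options.openaiToken = env.OPENAI_TOKEN;
  if (env.ANTHROPIC_TOKEN) options.anthropicToken = env.ANTHROPIC_TOKEN;
  if (env.ELEVENLABS_TOKEN) options.elevenlabsToken = env.ELEVENLABS_TOKEN;
  return options;
}

// Usage (with SmartAi imported as above):
//   const ai = new SmartAi(tokensFromEnv(process.env));
```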

📊 Provider Capabilities Matrix

Choose the right provider for your use case:

Provider	Chat	Streaming	TTS	Vision	Documents	Research	Images	Highlights
OpenAI	✅	✅	✅	✅	✅	✅	✅	gpt-image-1 • DALL-E 3 • Deep Research API
Anthropic	✅	✅	❌	✅	✅	✅	❌	Claude Sonnet 4.5 • Extended Thinking • Web Search API
Mistral	✅	✅	❌	✅	✅	❌	❌	Native PDF OCR • mistral-large • Fast inference
ElevenLabs	❌	❌	✅	❌	❌	❌	❌	Premium TTS • 70+ languages • v3 model
Ollama	✅	✅	❌	✅	✅	❌	❌	100% local • Privacy-first • No API costs
XAI	✅	✅	❌	❌	✅	❌	❌	Grok 2 • Real-time data
Perplexity	✅	✅	❌	❌	❌	✅	❌	Web-aware • Research-focused • Sonar Pro
Groq	✅	✅	❌	❌	❌	❌	❌	10x faster • LPU inference • Llama 3.3
Exo	✅	✅	❌	❌	❌	❌	❌	Distributed • P2P compute • Decentralized

🎮 Core Features

💬 Universal Chat Interface

Works identically across all providers:

// Use GPT-5 for complex reasoning
const gptResponse = await ai.openaiProvider.chat({
  systemMessage: 'You are an expert physicist.',
  userMessage: 'Explain the implications of quantum entanglement',
  messageHistory: [],
});

// Use Claude for safety-critical applications
const claudeResponse = await ai.anthropicProvider.chat({
  systemMessage: 'You are a medical advisor.',
  userMessage: 'Review this patient data for concerns',
  messageHistory: [],
});

// Use Groq for lightning-fast responses
const groqResponse = await ai.groqProvider.chat({
  systemMessage: 'You are a code reviewer.',
  userMessage: 'Quick! Find the bug in this code: ...',
  messageHistory: [],
});
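
Each call above passes an empty messageHistory. For a multi-turn exchange, the caller carries the history forward between calls. The sketch below shows one way to do that. The message shape ({ role, content }) and the ChatProvider interface are assumptions made for illustration, so check them against the types the library actually exports:

```typescript
// Assumed shapes for illustration only; not taken from the library's typings.
interface HistoryMessage {
  role: 'user' | 'assistant';
  content: string;
}

interface ChatProvider {
  chat(options: {
    systemMessage: string;
    userMessage: string;
    messageHistory: HistoryMessage[];
  }): Promise<{ message: string }>;
}

// Sends one turn and returns the reply plus the extended history.
async function chatTurn(
  provider: ChatProvider,
  systemMessage: string,
  userMessage: string,
  history: HistoryMessage[],
): Promise<{ reply: string; history: HistoryMessage[] }> {
  const response = await provider.chat({
    systemMessage,
    userMessage,
    messageHistory: history,
  });
  return {
    reply: response.message,
    // Build a new array instead of mutating the caller's history.
    history: [
      ...history,
      { role: 'user', content: userMessage },
      { role: 'assistant', content: response.message },
    ],
  };
}
```

Because chatTurn only depends on the structural ChatProvider shape, the same loop works with any provider that matches it.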

🌊 Real-Time Streaming

Build responsive chat interfaces with token-by-token streaming:

// Create a chat stream (inputStream is a stream you create that carries the chat input)
const stream = await ai.openaiProvider.chatStream(inputStream);
const reader = stream.getReader();

// Display responses as they arrive
while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Update UI in real-time
  process.stdout.write(value);
}
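
The read loop above can be wrapped in a small reusable helper. This is a sketch under one assumption: the stream yields string chunks. If your stream yields bytes, decode them first. The name readAll is illustrative and not part of SmartAI:

```typescript
// Drains a web ReadableStream of string chunks, calling onChunk for each one
// and returning the concatenated text.
async function readAll(
  stream: ReadableStream<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<string> {
  const reader = stream.getReader();
  let text = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(value); // e.g. update the UI
      text += value;
    }
  } finally {
    reader.releaseLock(); // Always release the lock, even if onChunk throws
  }
  return text;
}
```

Returning the full text alongside the per-chunk callback lets you render incrementally and still store the complete reply afterwards.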

🎙️ Text-to-Speech

Generate natural voices with OpenAI or ElevenLabs:

import * as fs from 'fs';

// OpenAI TTS
const audioStream = await ai.openaiProvider.audio({
  message: 'Welcome to the future of AI development!',
});

// ElevenLabs TTS - Premium quality, natural voices (uses v3 by default)
const elevenLabsAudio = await ai.elevenlabsProvider.audio({
  message: 'Experience the most lifelike text to speech technology.',
  voiceId: '19STyYD15bswVz51nqLf', // Optional: Samara voice
  modelId: 'eleven_v3', // Optional: defaults to eleven_v3 (70+ languages)
  voiceSettings: {
    // Optional: fine-tune voice characteristics
    stability: 0.5, // 0-1: Speech consistency
    similarity_boost: 0.8, // 0-1: Voice similarity to original
    style: 0.0, // 0-1: Expressiveness
    use_speaker_boost: true, // Enhanced clarity
  },
});

// Stream directly to speakers or save to file
audioStream.pipe(fs.createWriteStream('welcome.mp3'));
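
A bare .pipe() does not tell you when the file is complete or whether a write failed. If that matters, Node's pipeline() from node:stream/promises is one option. The sketch assumes the audio result is a Node.js Readable (which the .pipe() call above suggests); saveAudio is a hypothetical helper name:

```typescript
import { createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Writes an audio stream to disk and resolves once the file is fully written.
// pipeline() rejects if either the source or the file stream errors.
async function saveAudio(audio: Readable, filePath: string): Promise<void> {
  await pipeline(audio, createWriteStream(filePath));
}

// Usage: await saveAudio(audioStream, 'welcome.mp3');
```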

👁️ Vision Analysis

Understand images with multiple providers:

import * as fs from 'fs';

const image = fs.readFileSync('product-photo.jpg');

// OpenAI: General purpose vision
const gptVision = await ai.openaiProvider.vision({
  image,
  prompt: 'Describe this product and suggest marketing angles',
});

// Anthropic: Detailed analysis with extended thinking
const claudeVision = await ai.anthropicProvider.vision({
  image,
  prompt: 'Identify any safety concerns or defects',
});

// Ollama: Private, local analysis
const ollamaVision = await ai.ollamaProvider.vision({
  image,
  prompt: 'Extract all text and categorize the content',
});

📄 Document Intelligence

Extract insights from PDFs with AI:

import * as fs from 'fs';

const contract = fs.readFileSync('contract.pdf');
const invoice = fs.readFileSync('invoice.pdf');

// Analyze documents with OpenAI
const analysis = await ai.openaiProvider.document({
  systemMessage: 'You are a legal expert.',
  userMessage: 'Compare these documents and highlight key differences',
  messageHistory: [],
  pdfDocuments: [contract, invoice],
});

// Multi-document analysis with Anthropic
// form1099, w2, and receipts are PDF Buffers loaded the same way as above
const taxDocs = [form1099, w2, receipts];
const taxAnalysis = await ai.anthropicProvider.document({
  systemMessage: 'You are a tax advisor.',
  userMessage: 'Prepare a tax summary from these documents',
  messageHistory: [],
  pdfDocuments: taxDocs,
});
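
Since pdfDocuments takes raw buffers, a cheap sanity check before sending them can catch a wrong file early. PDF files start with the ASCII signature %PDF-, which is what the sketch below tests. The helper names are hypothetical and not part of SmartAI:

```typescript
// PDF files begin with the ASCII signature "%PDF-".
const PDF_SIGNATURE = Buffer.from('%PDF-', 'ascii');

function looksLikePdf(data: Buffer): boolean {
  return (
    data.length >= PDF_SIGNATURE.length &&
    data.subarray(0, PDF_SIGNATURE.length).equals(PDF_SIGNATURE)
  );
}

// Hypothetical guard: throws on the first buffer that is not a PDF,
// otherwise returns the documents unchanged.
function assertAllPdfs(documents: Buffer[]): Buffer[] {
  documents.forEach((doc, index) => {
    if (!looksLikePdf(doc)) {
      throw new Error(`Document at index ${index} does not look like a PDF`);
    }
  });
  return documents;
}

// Usage: pdfDocuments: assertAllPdfs([contract, invoice])
```

This only checks the header, so a truncated or corrupt PDF would still pass.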

🔬 Research & Web Search

Perform deep research with web search capabilities across multiple providers:

// OpenAI Deep Research - Comprehensive analysis
const deepResearch = await ai.openaiProvider.research({
  query: 'What are the latest developments in quantum computing?',
  searchDepth: 'deep',
  includeWebSearch: true,
});

console.log(deepResearch.answer);
console.log('Sources:', deepResearch.sources);

// Anthropic Web Search - Domain-filtered research
import { AnthropicProvider } from '@push.rocks/smartai';

const anthropic = new AnthropicProvider({
  anthropicToken: 'sk-ant-...',
  enableWebSearch: true,
  searchDomainAllowList: ['nature.com', 'science.org'],
});

const scientificResearch = await anthropic.research({
  query: 'Latest breakthroughs in CRISPR gene editing',
  searchDepth: 'advanced',
});

// Perplexity - Research-focused with citations
const perplexityResearch = await ai.perplexityProvider.research({
  query: 'Current state of autonomous vehicle technology',
  searchDepth: 'deep', // Uses Sonar Pro model
});

Research Options:

searchDepth: 'basic' | 'advanced' | 'deep'
maxSources: Number of sources to include
includeWebSearch: Enable web search (OpenAI)
background: Run as background task (OpenAI)
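
The options listed above can be written down as a TypeScript shape. This is a sketch only: the interface name, the helper, and the 'basic' default are assumptions for illustration, not the library's own definitions:

```typescript
type SearchDepth = 'basic' | 'advanced' | 'deep';

// Illustrative shape built from the option list; not the library's exported type.
interface ResearchOptionsSketch {
  query: string;
  searchDepth?: SearchDepth;
  maxSources?: number;
  includeWebSearch?: boolean; // OpenAI
  background?: boolean; // OpenAI
}

function buildResearchOptions(
  query: string,
  overrides: Omit<ResearchOptionsSketch, 'query'> = {},
): ResearchOptionsSketch {
  if (overrides.maxSources !== undefined && overrides.maxSources < 1) {
    throw new RangeError('maxSources must be at least 1');
  }
  // 'basic' is an assumed default for this sketch; overrides win over it.
  return { searchDepth: 'basic', ...overrides, query };
}
```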

Supported Providers:

OpenAI: Deep Research API with specialized models (o3-deep-research-*, o4-mini-deep-research-*)
Anthropic: Web Search API with domain filtering
Perplexity: Sonar and Sonar Pro models with built-in citations

🧠 Extended Thinking (Anthropic)

Enable Claude to spend more time reasoning about complex problems before generating responses:

import { AnthropicProvider } from '@push.rocks/smartai';

// Configure extended thinking mode at provider level
const anthropic = new AnthropicProvider({
  anthropicToken: 'sk-ant-...',
  extendedThinking: 'normal', // Options: 'quick' | 'normal' | 'deep' | 'off'
});

await anthropic.start();

// Extended thinking is automatically applied to all methods
const response = await anthropic.chat({
  systemMessage: 'You are an expert mathematician.',
  userMessage: 'Prove the Pythagorean theorem from first principles',
  messageHistory: [],
});

Thinking Modes:

Mode	Budget Tokens	Use Case
`'quick'`	2,048	Lightweight reasoning for simple queries
`'normal'`	8,000	Default — Balanced reasoning for most tasks
`'deep'`	16,000	Complex reasoning for difficult problems
`'off'`	0	Disable extended thinking

Best Practices:

Start with 'normal' (default) for general usage
Use 'deep' for complex analytical tasks, philosophy, mathematics, or research
Use 'quick' for simple factual queries where deep reasoning isn't needed
Thinking budget counts against total token usage
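
The mode table can be expressed as a lookup, which is handy when you want to derive the mode from a token budget. The budget values are copied from the table above; modeForBudget is a hypothetical helper, not part of SmartAI:

```typescript
type ThinkingMode = 'quick' | 'normal' | 'deep' | 'off';

// Budget values copied from the Thinking Modes table.
const THINKING_BUDGET_TOKENS: Record<ThinkingMode, number> = {
  quick: 2048,
  normal: 8000,
  deep: 16000,
  off: 0,
};

// Hypothetical helper: picks the smallest mode whose budget covers the
// desired number of thinking tokens, capped at 'deep'.
function modeForBudget(desiredTokens: number): ThinkingMode {
  if (desiredTokens <= 0) return 'off';
  if (desiredTokens <= THINKING_BUDGET_TOKENS.quick) return 'quick';
  if (desiredTokens <= THINKING_BUDGET_TOKENS.normal) return 'normal';
  return 'deep';
}
```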

📑 Native PDF OCR (Mistral)

Mistral provides native PDF document processing via their OCR API — no image conversion required:

import { MistralProvider } from '@push.rocks/smartai';

const mistral = new MistralProvider({
  mistralToken: 'your-api-key',
  chatModel: 'mistral-large-latest', // Default
  ocrModel: 'mistral-ocr-latest', // Default
  tableFormat: 'markdown', // 'markdown' | 'html'
});

await mistral.start();

// Direct PDF processing - no image conversion overhead
const result = await mistral.document({
  systemMessage: 'You are a document analyst.',
  userMessage: 'Extract all invoice details and calculate the total.',
  pdfDocuments: [invoicePdfBuffer],
  messageHistory: [],
});

Key Advantage: Unlike other providers that convert PDFs to images first, Mistral's OCR API processes PDFs natively, potentially offering faster and more accurate text extraction for document-heavy workloads.

Supported Formats:

Native PDF processing via Files API
Image OCR (JPEG, PNG, GIF, WebP) for vision tasks
Table extraction with markdown or HTML output

🎨 Image Generation & Editing

Generate and edit images with OpenAI's cutting-edge models:

import * as fs from 'fs';

// Basic image generation with gpt-image-1
const image = await ai.openaiProvider.imageGenerate({
  prompt: 'A futuristic robot assistant in a modern office, digital art',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1024x1024',
});

// Save the generated image
const imageBuffer = Buffer.from(image.images[0].b64_json!, 'base64');
fs.writeFileSync('robot.png', imageBuffer);

// Advanced: Transparent background with custom format
const logo = await ai.openaiProvider.imageGenerate({
  prompt: 'Minimalist mountain peak logo, geometric design',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1024x1024',
  background: 'transparent',
  outputFormat: 'png',
});

// WebP with compression for web use
const webImage = await ai.openaiProvider.imageGenerate({
  prompt: 'Product showcase: sleek smartphone on marble surface',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1536x1024',
  outputFormat: 'webp',
  outputCompression: 85,
});

// Superior text rendering (gpt-image-1's strength)
const signage = await ai.openaiProvider.imageGenerate({
  prompt:
    'Vintage cafe sign saying "COFFEE & CODE" in hand-lettered typography',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1024x1024',
});

// Generate multiple variations at once
const variations = await ai.openaiProvider.imageGenerate({
  prompt: 'Abstract geometric pattern, colorful minimalist art',
  model: 'gpt-image-1',
  n: 3,
  quality: 'medium',
  size: '1024x1024',
});

// Edit an existing image
const editedImage = await ai.openaiProvider.imageEdit({
  image: originalImageBuffer,
  prompt: 'Add sunglasses and change the background to a beach sunset',
  model: 'gpt-image-1',
  quality: 'high',
});

Image Generation Options:

model: 'gpt-image-1' | 'dall-e-3' | 'dall-e-2'
quality: 'low' | 'medium' | 'high' | 'auto'
size: '1024x1024' | '1536x1024' | '1024x1536' | 'auto' (gpt-image-1)
background: 'transparent' | 'opaque' | 'auto'
outputFormat: 'png' | 'jpeg' | 'webp'
outputCompression: 0–100 for webp/jpeg
moderation: 'low' | 'auto'
n: Number of images (1–10)
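
Some of these options constrain each other, so it can help to validate them before making a billable request. The sketch below checks only what the list above states (n from 1 to 10, outputCompression from 0 to 100 and only for webp/jpeg). The interface and function names are illustrative assumptions:

```typescript
type OutputFormat = 'png' | 'jpeg' | 'webp';

// Subset of the options above, for illustration only.
interface ImageOptionsSketch {
  prompt: string;
  n?: number;
  outputFormat?: OutputFormat;
  outputCompression?: number;
}

// Returns a list of problems; an empty list means the options pass these checks.
function validateImageOptions(options: ImageOptionsSketch): string[] {
  const problems: string[] = [];
  const { n, outputFormat, outputCompression } = options;

  if (n !== undefined && (!Number.isInteger(n) || n < 1 || n > 10)) {
    problems.push('n must be an integer from 1 to 10');
  }
  if (outputCompression !== undefined) {
    if (outputCompression < 0 || outputCompression > 100) {
      problems.push('outputCompression must be between 0 and 100');
    }
    if (outputFormat !== 'webp' && outputFormat !== 'jpeg') {
      problems.push('outputCompression only applies to webp or jpeg output');
    }
  }
  return problems;
}
```

Collecting all problems instead of throwing on the first one makes it easier to surface every issue in a single error message.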

gpt-image-1 Advantages:

Superior text rendering in images
Square, landscape (1536×1024), and portrait (1024×1536) sizes
Transparent background support
Advanced output formats (WebP with compression)
Better prompt understanding
Streaming support for progressive rendering

🔄 Persistent Conversations

Maintain context across interactions:

// Create a coding assistant conversation
const assistant = ai.createConversation('openai');
await assistant.setSystemMessage('You are an expert TypeScript developer.');

// First question
const inputWriter = assistant.getInputStreamWriter();
await inputWriter.write('How do I implement a singleton pattern?');

// Continue the conversation
await inputWriter.write('Now show me how to make it thread-safe');

// The assistant remembers the entire context

🚀 Real-World Examples

Build a Customer Support Bot

const supportBot = new SmartAi({
  anthropicToken: process.env.ANTHROPIC_KEY, // Claude for empathetic responses
  openaiToken: process.env.OPENAI_KEY, // Needed for the fallback below (variable name is an example)
});

await supportBot.start();

async function handleCustomerQuery(query: string, history: ChatMessage[]) {
  try {
    const response = await supportBot.anthropicProvider.chat({
      systemMessage: `You are a helpful customer support agent.
                      Be empathetic, professional, and solution-oriented.`,
      userMessage: query,
      messageHistory: history,
    });

    return response.message;
  } catch (error) {
    // Fall back to another provider and return the same type (the message string)
    const fallback = await supportBot.openaiProvider.chat({ /* ... */ });
    return fallback.message;
  }
}

Create a Code Review Assistant

const codeReviewer = new SmartAi({
  groqToken: process.env.GROQ_KEY, // Groq for speed
});

async function reviewCode(code: string, language: string) {
  const review = await codeReviewer.groqProvider.chat({
    systemMessage: `You are a ${language} expert. Review code for:
                    - Security vulnerabilities
                    - Performance issues
                    - Best practices
                    - Potential bugs`,
    userMessage: `Review this code:\n\n${code}`,
    messageHistory: [],
  });

  return review.message;
}

Build a Research Assistant

const researcher = new SmartAi({
  perplexityToken: process.env.PERPLEXITY_KEY,
});

async function research(topic: string) {
  // Perplexity excels at web-aware research
  const findings = await researcher.perplexityProvider.research({
    query: `Research the latest developments in ${topic}`,
    searchDepth: 'deep',
  });

  return {
    answer: findings.answer,
    sources: findings.sources,
  };
}

Local AI for Sensitive Data

const localAI = new SmartAi({
  ollama: {
    baseUrl: 'http://localhost:11434',
    model: 'llama2',
    visionModel: 'llava',
  },
});

// Process sensitive documents without leaving your infrastructure
async function analyzeSensitiveDoc(pdfBuffer: Buffer) {
  const analysis = await localAI.ollamaProvider.document({
    systemMessage: 'Extract and summarize key information.',
    userMessage: 'Analyze this confidential document',
    messageHistory: [],
    pdfDocuments: [pdfBuffer],
  });

  // Data never leaves your servers
  return analysis.message;
}

⚡ Performance Tips

1. Provider Selection Strategy

class SmartAIRouter {
  constructor(private ai: SmartAi) {}

  async query(
    message: string,
    requirements: {
      speed?: boolean;
      accuracy?: boolean;
      cost?: boolean;
      privacy?: boolean;
    }
  ) {
    if (requirements.privacy) {
      return this.ai.ollamaProvider.chat({ /* ... */ }); // Local only
    }
    if (requirements.speed) {
      return this.ai.groqProvider.chat({ /* ... */ }); // 10x faster
    }
    if (requirements.accuracy) {
      return this.ai.anthropicProvider.chat({ /* ... */ }); // Best reasoning
    }
    // Default fallback
    return this.ai.openaiProvider.chat({ /* ... */ });
  }
}

2. Streaming for Large Responses

// Don't wait for the entire response
async function streamResponse(userQuery: string) {
  const stream = await ai.openaiProvider.chatStream(
    createInputStream(userQuery)
  );

  // Process tokens as they arrive
  for await (const chunk of stream) {
    updateUI(chunk); // Immediate feedback
    await processChunk(chunk); // Completes before the next chunk is read
  }
}

3. Parallel Multi-Provider Queries

// Get the best answer from multiple AIs
async function consensusQuery(question: string) {
  const providers = [
    ai.openaiProvider.chat({ /* ... */ }),
    ai.anthropicProvider.chat({ /* ... */ }),
    ai.perplexityProvider.chat({ /* ... */ }),
  ];

  const responses = await Promise.all(providers);
  return synthesizeResponses(responses);
}
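
Promise.all rejects as soon as one provider fails, which discards the answers that did arrive. If partial results are acceptable, Promise.allSettled is an alternative. A minimal sketch, assuming each response exposes a message string as in the examples above; collectSuccessful is a hypothetical helper name:

```typescript
interface ChatResult {
  message: string;
}

// Runs all requests concurrently and keeps only the ones that succeeded.
// Throws only when every request failed.
async function collectSuccessful(
  requests: Array<Promise<ChatResult>>,
): Promise<ChatResult[]> {
  const settled = await Promise.allSettled(requests);
  const successes: ChatResult[] = [];
  for (const outcome of settled) {
    if (outcome.status === 'fulfilled') {
      successes.push(outcome.value);
    }
  }
  if (successes.length === 0) {
    throw new Error('All providers failed');
  }
  return successes;
}

// Usage: const responses = await collectSuccessful(providers);
```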

🛠️ Advanced Configuration

Provider-Specific Options

const ai = new SmartAi({
  // OpenAI
  openaiToken: 'sk-...',

  // Anthropic with extended thinking
  anthropicToken: 'sk-ant-...',

  // Perplexity for research
  perplexityToken: 'pplx-...',

  // Groq for speed
  groqToken: 'gsk_...',

  // Mistral with OCR settings
  mistralToken: 'your-key',
  mistral: {
    chatModel: 'mistral-large-latest',
    ocrModel: 'mistral-ocr-latest',
    tableFormat: 'markdown',
  },

  // XAI (Grok)
  xaiToken: 'xai-...',

  // ElevenLabs TTS
  elevenlabsToken: 'sk-...',
  elevenlabs: {
    defaultVoiceId: '19STyYD15bswVz51nqLf',
    defaultModelId: 'eleven_v3',
  },

  // Ollama (local)
  ollama: {
    baseUrl: 'http://localhost:11434',
    model: 'llama2',
    visionModel: 'llava',
    defaultOptions: {
      num_ctx: 4096,
      temperature: 0.7,
      top_p: 0.9,
    },
    defaultTimeout: 120000,
  },

  // Exo (distributed)
  exo: {
    baseUrl: 'http://localhost:8080/v1',
    apiKey: 'optional-key',
  },
});

Error Handling & Fallbacks

class ResilientAI {
  // Provider properties on SmartAi, tried in order
  private providers = ['openaiProvider', 'anthropicProvider', 'groqProvider'] as const;

  constructor(private ai: SmartAi) {}

  async query(opts: ChatOptions): Promise<ChatResponse> {
    for (const provider of this.providers) {
      try {
        return await this.ai[provider].chat(opts);
      } catch (error) {
        console.warn(`${provider} failed, trying next...`, error);
      }
    }
    throw new Error('All providers failed');
  }
}
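
Before moving on to the next provider, transient failures such as rate limits or timeouts are often worth retrying. The following is a generic sketch of retry with exponential backoff. It is not part of SmartAI, and the defaults (3 attempts, 200 ms base delay) are arbitrary assumptions:

```typescript
// Retries an async operation, doubling the wait after each failed attempt.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError('maxAttempts must be at least 1');
  }
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt follows: 200 ms, 400 ms, 800 ms, ...
      if (attempt < maxAttempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // Every attempt failed; surface the last error
}

// Usage: const response = await withRetry(() => ai.openaiProvider.chat(opts));
```

Retrying inside each provider and falling back across providers compose well: wrap the call in the fallback loop with withRetry so a provider is only skipped after its retries are exhausted.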

🎯 Choosing the Right Provider

Use Case	Recommended Provider	Why
General Purpose	OpenAI	Most features, stable, well-documented
Complex Reasoning	Anthropic	Superior logical thinking, extended thinking, safer
Document OCR	Mistral	Native PDF processing, no image conversion overhead
Research & Facts	Perplexity	Web-aware, provides citations
Deep Research	OpenAI	Deep Research API with comprehensive analysis
Premium TTS	ElevenLabs	Most natural voices, 70+ languages, v3 model
Speed Critical	Groq	10x faster inference, sub-second responses
Privacy Critical	Ollama	100% local, no data leaves your servers
Real-time Data	XAI	Grok with access to current information
Cost Sensitive	Ollama/Exo	Free (local) or distributed compute

📈 Roadmap

Completed:

Research & Web Search API
Image generation support (gpt-image-1, DALL-E 3, DALL-E 2)
Extended thinking (Anthropic)
Native PDF OCR (Mistral)

Planned:

Streaming function calls
Voice input processing
Fine-tuning integration
Embedding support
Agent framework
More providers (Cohere, AI21, etc.)

License and Legal Information

This repository contains open-source code licensed under the MIT License. A copy of the license can be found in the LICENSE file.

Please note: The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH or third parties, and are not included within the scope of the MIT license granted herein.

Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines or the guidelines of the respective third-party owners, and any usage must be approved in writing. Third-party trademarks used herein are the property of their respective owners and used only in a descriptive manner, e.g. for an implementation of an API or similar.
