# @push.rocks/smartai

**One API to rule them all** 🚀

[![npm version](https://img.shields.io/npm/v/@push.rocks/smartai.svg)](https://www.npmjs.com/package/@push.rocks/smartai)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.x-blue.svg)](https://www.typescriptlang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

SmartAI unifies the world's leading AI providers — OpenAI, Anthropic, Mistral, Perplexity, Ollama, Groq, XAI, Exo, and ElevenLabs — under a single, elegant TypeScript interface. Build AI applications at lightning speed without vendor lock-in.

## Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit [community.foss.global/](https://community.foss.global/). This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a [code.foss.global/](https://code.foss.global/) account to submit Pull Requests directly.

## 🎯 Why SmartAI?

- **🔌 Universal Interface**: Write once, run with any AI provider. Switch between GPT-5, Claude, Llama, or Grok with a single line change.
- **🛡️ Type-Safe**: Full TypeScript support with comprehensive type definitions for all operations.
- **🌊 Streaming First**: Built for real-time applications with native streaming support.
- **🎨 Multi-Modal**: Seamlessly work with text, images, audio, and documents.
- **🏠 Local & Cloud**: Support for both cloud providers and local models via Ollama/Exo.
- **⚡ Zero Lock-In**: Your code remains portable across all AI providers.

## 📦 Installation

```bash
npm install @push.rocks/smartai
# or
pnpm install @push.rocks/smartai
```

## 🚀 Quick Start

```typescript
import { SmartAi } from '@push.rocks/smartai';

// Initialize with your favorite providers
const ai = new SmartAi({
  openaiToken: 'sk-...',
  anthropicToken: 'sk-ant-...',
  elevenlabsToken: 'sk-...',
  elevenlabs: {
    defaultVoiceId: '19STyYD15bswVz51nqLf', // Optional: Samara voice
  },
});

await ai.start();

// Same API, multiple providers
const response = await ai.openaiProvider.chat({
  systemMessage: 'You are a helpful assistant.',
  userMessage: 'Explain quantum computing in simple terms',
  messageHistory: [],
});

console.log(response.message);
```

## 📊 Provider Capabilities Matrix

Choose the right provider for your use case:

| Provider       | Chat | Streaming | TTS | Vision | Documents | Research | Images | Highlights                                                      |
| -------------- | :--: | :-------: | :-: | :----: | :-------: | :------: | :----: | --------------------------------------------------------------- |
| **OpenAI**     |  ✅  |    ✅     | ✅  |   ✅   |    ✅     |    ✅    |   ✅   | gpt-image-1 • DALL-E 3 • Deep Research API                      |
| **Anthropic**  |  ✅  |    ✅     | ❌  |   ✅   |    ✅     |    ✅    |   ❌   | Claude Sonnet 4.5 • Extended Thinking • Web Search API          |
| **Mistral**    |  ✅  |    ✅     | ❌  |   ✅   |    ✅     |    ❌    |   ❌   | Native PDF OCR • mistral-large • Fast inference                 |
| **ElevenLabs** |  ❌  |    ❌     | ✅  |   ❌   |    ❌     |    ❌    |   ❌   | Premium TTS • 70+ languages • v3 model                          |
| **Ollama**     |  ✅  |    ✅     | ❌  |   ✅   |    ✅     |    ❌    |   ❌   | 100% local • Privacy-first • No API costs                       |
| **XAI**        |  ✅  |    ✅     | ❌  |   ❌   |    ✅     |    ❌    |   ❌   | Grok 2 • Real-time data                                         |
| **Perplexity** |  ✅  |    ✅     | ❌  |   ❌   |    ❌     |    ✅    |   ❌   | Web-aware • Research-focused • Sonar Pro                        |
| **Groq**       |  ✅  |    ✅     | ❌  |   ❌   |    ❌     |    ❌    |   ❌   | 10x faster • LPU inference • Llama 3.3                          |
| **Exo**        |  ✅  |    ✅     | ❌  |   ❌   |    ❌     |    ❌    |   ❌   | Distributed • P2P compute • Decentralized                       |

## 🎮 Core Features

### 💬 Universal Chat Interface

Works identically across all providers:

```typescript
// Use GPT-5 for complex reasoning
const gptResponse = await ai.openaiProvider.chat({
  systemMessage: 'You are an expert physicist.',
  userMessage: 'Explain the implications of quantum entanglement',
  messageHistory: [],
});

// Use Claude for safety-critical applications
const claudeResponse = await ai.anthropicProvider.chat({
  systemMessage: 'You are a medical advisor.',
  userMessage: 'Review this patient data for concerns',
  messageHistory: [],
});

// Use Groq for lightning-fast responses
const groqResponse = await ai.groqProvider.chat({
  systemMessage: 'You are a code reviewer.',
  userMessage: 'Quick! Find the bug in this code: ...',
  messageHistory: [],
});
```

### 🌊 Real-Time Streaming

Build responsive chat interfaces with token-by-token streaming:

```typescript
// Create a chat stream
const stream = await ai.openaiProvider.chatStream(inputStream);
const reader = stream.getReader();

// Display responses as they arrive
while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Update UI in real-time
  process.stdout.write(value);
}
```
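
When the complete text is also needed after streaming (for logging or caching, say), the chunks can be accumulated while they are displayed. The following is a minimal sketch, assuming the stream yields string chunks as the loop above suggests. `collectStream` and `demoStream` are hypothetical helpers and not part of SmartAI:

```typescript
// Hypothetical helper: drains a ReadableStream of string chunks,
// forwards each chunk to a callback, and returns the joined text.
async function collectStream(
  stream: ReadableStream<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<string> {
  const reader = stream.getReader();
  let fullText = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(value);
      fullText += value;
    }
  } finally {
    // Release the lock even if reading or the callback throws
    reader.releaseLock();
  }
  return fullText;
}

// Stand-in for a provider stream so the sketch runs without an API key
function demoStream(chunks: string[]): ReadableStream<string> {
  return new ReadableStream<string>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
}
```

With a real provider, the stream returned by `chatStream` would take the place of `demoStream(...)`, and `onChunk` could be `(c) => process.stdout.write(c)`.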

### 🎙️ Text-to-Speech

Generate natural voices with OpenAI or ElevenLabs:

```typescript
import * as fs from 'fs';

// OpenAI TTS
const audioStream = await ai.openaiProvider.audio({
  message: 'Welcome to the future of AI development!',
});

// ElevenLabs TTS - Premium quality, natural voices (uses v3 by default)
const elevenLabsAudio = await ai.elevenlabsProvider.audio({
  message: 'Experience the most lifelike text to speech technology.',
  voiceId: '19STyYD15bswVz51nqLf', // Optional: Samara voice
  modelId: 'eleven_v3', // Optional: defaults to eleven_v3 (70+ languages)
  voiceSettings: {
    // Optional: fine-tune voice characteristics
    stability: 0.5, // 0-1: Speech consistency
    similarity_boost: 0.8, // 0-1: Voice similarity to original
    style: 0.0, // 0-1: Expressiveness
    use_speaker_boost: true, // Enhanced clarity
  },
});

// Stream directly to speakers or save to file
audioStream.pipe(fs.createWriteStream('welcome.mp3'));
```
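
A plain `.pipe()` call neither reports when the file is complete nor forwards errors from the source stream. If that matters, Node's `pipeline` from `stream/promises` covers both. This is a sketch under the assumption that `audio()` returns a Node.js `Readable`, which the `.pipe()` call above suggests. `saveAudio` is a hypothetical helper name:

```typescript
import { createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
import type { Readable } from 'node:stream';

// Writes a Node.js readable stream to disk. Resolves once the write
// stream has finished; rejects if either side of the pipe errors.
async function saveAudio(audio: Readable, filePath: string): Promise<void> {
  await pipeline(audio, createWriteStream(filePath));
}
```

Usage would look like `await saveAudio(audioStream, 'welcome.mp3');` inside a `try`/`catch`.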

### 👁️ Vision Analysis

Understand images with multiple providers:

```typescript
import * as fs from 'fs';

const image = fs.readFileSync('product-photo.jpg');

// OpenAI: General purpose vision
const gptVision = await ai.openaiProvider.vision({
  image,
  prompt: 'Describe this product and suggest marketing angles',
});

// Anthropic: Detailed analysis with extended thinking
const claudeVision = await ai.anthropicProvider.vision({
  image,
  prompt: 'Identify any safety concerns or defects',
});

// Ollama: Private, local analysis
const ollamaVision = await ai.ollamaProvider.vision({
  image,
  prompt: 'Extract all text and categorize the content',
});
```

### 📄 Document Intelligence

Extract insights from PDFs with AI:

```typescript
import * as fs from 'fs';

const contract = fs.readFileSync('contract.pdf');
const invoice = fs.readFileSync('invoice.pdf');

// Analyze documents with OpenAI
const analysis = await ai.openaiProvider.document({
  systemMessage: 'You are a legal expert.',
  userMessage: 'Compare these documents and highlight key differences',
  messageHistory: [],
  pdfDocuments: [contract, invoice],
});

// Multi-document analysis with Anthropic
// (form1099, w2, and receipts are PDF Buffers loaded the same way as above)
const taxDocs = [form1099, w2, receipts];
const taxAnalysis = await ai.anthropicProvider.document({
  systemMessage: 'You are a tax advisor.',
  userMessage: 'Prepare a tax summary from these documents',
  messageHistory: [],
  pdfDocuments: taxDocs,
});
```

### 🔬 Research & Web Search

Perform deep research with web search capabilities across multiple providers:

```typescript
// OpenAI Deep Research - Comprehensive analysis
const deepResearch = await ai.openaiProvider.research({
  query: 'What are the latest developments in quantum computing?',
  searchDepth: 'deep',
  includeWebSearch: true,
});

console.log(deepResearch.answer);
console.log('Sources:', deepResearch.sources);

// Anthropic Web Search - Domain-filtered research
import { AnthropicProvider } from '@push.rocks/smartai';

const anthropic = new AnthropicProvider({
  anthropicToken: 'sk-ant-...',
  enableWebSearch: true,
  searchDomainAllowList: ['nature.com', 'science.org'],
});

const scientificResearch = await anthropic.research({
  query: 'Latest breakthroughs in CRISPR gene editing',
  searchDepth: 'advanced',
});

// Perplexity - Research-focused with citations
const perplexityResearch = await ai.perplexityProvider.research({
  query: 'Current state of autonomous vehicle technology',
  searchDepth: 'deep', // Uses Sonar Pro model
});
```

**Research Options:**

- `searchDepth`: `'basic'` | `'advanced'` | `'deep'`
- `maxSources`: Number of sources to include
- `includeWebSearch`: Enable web search (OpenAI)
- `background`: Run as background task (OpenAI)

**Supported Providers:**

- **OpenAI**: Deep Research API with specialized models (`o3-deep-research-*`, `o4-mini-deep-research-*`)
- **Anthropic**: Web Search API with domain filtering
- **Perplexity**: Sonar and Sonar Pro models with built-in citations

### 🧠 Extended Thinking (Anthropic)

Enable Claude to spend more time reasoning about complex problems before generating responses:

```typescript
import { AnthropicProvider } from '@push.rocks/smartai';

// Configure extended thinking mode at provider level
const anthropic = new AnthropicProvider({
  anthropicToken: 'sk-ant-...',
  extendedThinking: 'normal', // Options: 'quick' | 'normal' | 'deep' | 'off'
});

await anthropic.start();

// Extended thinking is automatically applied to all methods
const response = await anthropic.chat({
  systemMessage: 'You are an expert mathematician.',
  userMessage: 'Prove the Pythagorean theorem from first principles',
  messageHistory: [],
});
```

**Thinking Modes:**

| Mode       | Budget Tokens | Use Case                                         |
| ---------- | ------------- | ------------------------------------------------ |
| `'quick'`  | 2,048         | Lightweight reasoning for simple queries         |
| `'normal'` | 8,000         | **Default** — Balanced reasoning for most tasks  |
| `'deep'`   | 16,000        | Complex reasoning for difficult problems         |
| `'off'`    | 0             | Disable extended thinking                        |

**Best Practices:**

- Start with `'normal'` (default) for general usage
- Use `'deep'` for complex analytical tasks, philosophy, mathematics, or research
- Use `'quick'` for simple factual queries where deep reasoning isn't needed
- Thinking budget counts against total token usage
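
The table above can also be expressed as a lookup, which is handy when the mode is chosen programmatically. The mode names and budgets are copied from the table. `THINKING_BUDGET_TOKENS` and `modeForBudget` are hypothetical names for this sketch and not part of the SmartAI API:

```typescript
type ThinkingMode = 'quick' | 'normal' | 'deep' | 'off';

// Budgets as listed in the Thinking Modes table
const THINKING_BUDGET_TOKENS: Record<ThinkingMode, number> = {
  quick: 2048,
  normal: 8000,
  deep: 16000,
  off: 0,
};

// Picks the smallest mode whose budget covers an estimated reasoning need.
// Estimates above the 'deep' budget still map to 'deep', the largest mode.
function modeForBudget(estimatedTokens: number): ThinkingMode {
  if (estimatedTokens <= 0) return 'off';
  if (estimatedTokens <= THINKING_BUDGET_TOKENS.quick) return 'quick';
  if (estimatedTokens <= THINKING_BUDGET_TOKENS.normal) return 'normal';
  return 'deep';
}
```

The result can be passed as the `extendedThinking` value when constructing the provider.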

### 📑 Native PDF OCR (Mistral)

Mistral provides native PDF document processing via their OCR API — no image conversion required:

```typescript
import { MistralProvider } from '@push.rocks/smartai';

const mistral = new MistralProvider({
  mistralToken: 'your-api-key',
  chatModel: 'mistral-large-latest', // Default
  ocrModel: 'mistral-ocr-latest', // Default
  tableFormat: 'markdown', // 'markdown' | 'html'
});

await mistral.start();

// Direct PDF processing - no image conversion overhead
const result = await mistral.document({
  systemMessage: 'You are a document analyst.',
  userMessage: 'Extract all invoice details and calculate the total.',
  pdfDocuments: [invoicePdfBuffer],
  messageHistory: [],
});
```

**Key Advantage**: Unlike other providers that convert PDFs to images first, Mistral's OCR API processes PDFs natively, potentially offering faster and more accurate text extraction for document-heavy workloads.

**Supported Formats:**

- Native PDF processing via Files API
- Image OCR (JPEG, PNG, GIF, WebP) for vision tasks
- Table extraction with markdown or HTML output

### 🎨 Image Generation & Editing

Generate and edit images with OpenAI's cutting-edge models:

```typescript
import * as fs from 'fs';

// Basic image generation with gpt-image-1
const image = await ai.openaiProvider.imageGenerate({
  prompt: 'A futuristic robot assistant in a modern office, digital art',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1024x1024',
});

// Save the generated image
const imageBuffer = Buffer.from(image.images[0].b64_json!, 'base64');
fs.writeFileSync('robot.png', imageBuffer);

// Advanced: Transparent background with custom format
const logo = await ai.openaiProvider.imageGenerate({
  prompt: 'Minimalist mountain peak logo, geometric design',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1024x1024',
  background: 'transparent',
  outputFormat: 'png',
});

// WebP with compression for web use
const webImage = await ai.openaiProvider.imageGenerate({
  prompt: 'Product showcase: sleek smartphone on marble surface',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1536x1024',
  outputFormat: 'webp',
  outputCompression: 85,
});

// Superior text rendering (gpt-image-1's strength)
const signage = await ai.openaiProvider.imageGenerate({
  prompt:
    'Vintage cafe sign saying "COFFEE & CODE" in hand-lettered typography',
  model: 'gpt-image-1',
  quality: 'high',
  size: '1024x1024',
});

// Generate multiple variations at once
const variations = await ai.openaiProvider.imageGenerate({
  prompt: 'Abstract geometric pattern, colorful minimalist art',
  model: 'gpt-image-1',
  n: 3,
  quality: 'medium',
  size: '1024x1024',
});

// Edit an existing image
const editedImage = await ai.openaiProvider.imageEdit({
  image: originalImageBuffer,
  prompt: 'Add sunglasses and change the background to a beach sunset',
  model: 'gpt-image-1',
  quality: 'high',
});
```

**Image Generation Options:**

- `model`: `'gpt-image-1'` | `'dall-e-3'` | `'dall-e-2'`
- `quality`: `'low'` | `'medium'` | `'high'` | `'auto'`
- `size`: Multiple aspect ratios up to 4096×4096
- `background`: `'transparent'` | `'opaque'` | `'auto'`
- `outputFormat`: `'png'` | `'jpeg'` | `'webp'`
- `outputCompression`: 0–100 for webp/jpeg
- `moderation`: `'low'` | `'auto'`
- `n`: Number of images (1–10)
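
Several of these options constrain each other, so it can help to check them before spending an API call. The sketch below encodes only the ranges listed above plus the fact that JPEG has no alpha channel. `ImageOptionsSketch` and `checkImageOptions` are hypothetical and the real SmartAI types may differ:

```typescript
// Hypothetical option shape mirroring the list above
interface ImageOptionsSketch {
  prompt: string;
  n?: number;
  outputFormat?: 'png' | 'jpeg' | 'webp';
  outputCompression?: number;
  background?: 'transparent' | 'opaque' | 'auto';
}

// Returns a list of problems; an empty list means the options look consistent.
function checkImageOptions(opts: ImageOptionsSketch): string[] {
  const problems: string[] = [];
  if (opts.n !== undefined && (!Number.isInteger(opts.n) || opts.n < 1 || opts.n > 10)) {
    problems.push('n must be an integer from 1 to 10');
  }
  if (opts.outputCompression !== undefined) {
    if (opts.outputCompression < 0 || opts.outputCompression > 100) {
      problems.push('outputCompression must be between 0 and 100');
    }
    // The default format is not stated here, so the sketch asks for an explicit one
    if (opts.outputFormat !== 'webp' && opts.outputFormat !== 'jpeg') {
      problems.push('outputCompression only applies to webp or jpeg output');
    }
  }
  if (opts.background === 'transparent' && opts.outputFormat === 'jpeg') {
    problems.push('jpeg has no alpha channel; use png or webp for transparency');
  }
  return problems;
}
```

Calling `checkImageOptions(options)` before `imageGenerate(options)` and logging any returned problems keeps mistakes local and cheap.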

**gpt-image-1 Advantages:**

- Superior text rendering in images
- Up to 4096×4096 resolution
- Transparent background support
- Advanced output formats (WebP with compression)
- Better prompt understanding
- Streaming support for progressive rendering

### 🔄 Persistent Conversations

Maintain context across interactions:

```typescript
// Create a coding assistant conversation
const assistant = ai.createConversation('openai');
await assistant.setSystemMessage('You are an expert TypeScript developer.');

// First question
const inputWriter = assistant.getInputStreamWriter();
await inputWriter.write('How do I implement a singleton pattern?');

// Continue the conversation
await inputWriter.write('Now show me how to make it thread-safe');

// The assistant remembers the entire context
```
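
For the stateless `chat()` calls used elsewhere in this README, context can also be carried by hand through `messageHistory`. The sketch below wraps any chat-shaped function so that each call sees the earlier turns. The `{ role, content }` entry shape is an assumption, since this README does not show the real `ChatMessage` type, and `createSession` is a hypothetical helper:

```typescript
// Assumed message shape (not confirmed by this README)
interface HistoryEntry {
  role: 'user' | 'assistant';
  content: string;
}

// Any function with the chat() call shape used throughout this README
type ChatFn = (opts: {
  systemMessage: string;
  userMessage: string;
  messageHistory: HistoryEntry[];
}) => Promise<{ message: string }>;

// Wraps a stateless chat function so each call receives all earlier turns.
function createSession(chat: ChatFn, systemMessage: string) {
  const history: HistoryEntry[] = [];
  return {
    history,
    async ask(userMessage: string): Promise<string> {
      const { message } = await chat({
        systemMessage,
        userMessage,
        messageHistory: [...history], // copy so the callee cannot mutate our state
      });
      // Record the turn only after the call succeeded
      history.push({ role: 'user', content: userMessage });
      history.push({ role: 'assistant', content: message });
      return message;
    },
  };
}
```

A provider method could be plugged in as `createSession((opts) => ai.openaiProvider.chat(opts), '...')`, provided its message type matches the assumed shape.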

## 🚀 Real-World Examples

### Build a Customer Support Bot

```typescript
const supportBot = new SmartAi({
  anthropicToken: process.env.ANTHROPIC_KEY, // Claude for empathetic responses
});

async function handleCustomerQuery(query: string, history: ChatMessage[]) {
  try {
    const response = await supportBot.anthropicProvider.chat({
      systemMessage: `You are a helpful customer support agent.
                      Be empathetic, professional, and solution-oriented.`,
      userMessage: query,
      messageHistory: history,
    });

    return response.message;
  } catch (error) {
    // Fall back to another provider (this requires openaiToken to be configured too)
    const fallback = await supportBot.openaiProvider.chat({ /* ... */ });
    return fallback.message;
  }
}
```

### Create a Code Review Assistant

```typescript
const codeReviewer = new SmartAi({
  groqToken: process.env.GROQ_KEY, // Groq for speed
});

async function reviewCode(code: string, language: string) {
  const review = await codeReviewer.groqProvider.chat({
    systemMessage: `You are a ${language} expert. Review code for:
                    - Security vulnerabilities
                    - Performance issues
                    - Best practices
                    - Potential bugs`,
    userMessage: `Review this code:\n\n${code}`,
    messageHistory: [],
  });

  return review.message;
}
```

### Build a Research Assistant

```typescript
const researcher = new SmartAi({
  perplexityToken: process.env.PERPLEXITY_KEY,
});

async function research(topic: string) {
  // Perplexity excels at web-aware research
  const findings = await researcher.perplexityProvider.research({
    query: `Research the latest developments in ${topic}`,
    searchDepth: 'deep',
  });

  return {
    answer: findings.answer,
    sources: findings.sources,
  };
}
```

### Local AI for Sensitive Data

```typescript
const localAI = new SmartAi({
  ollama: {
    baseUrl: 'http://localhost:11434',
    model: 'llama2',
    visionModel: 'llava',
  },
});

// Process sensitive documents without leaving your infrastructure
async function analyzeSensitiveDoc(pdfBuffer: Buffer) {
  const analysis = await localAI.ollamaProvider.document({
    systemMessage: 'Extract and summarize key information.',
    userMessage: 'Analyze this confidential document',
    messageHistory: [],
    pdfDocuments: [pdfBuffer],
  });

  // Data never leaves your servers
  return analysis.message;
}
```

## ⚡ Performance Tips

### 1. Provider Selection Strategy

```typescript
class SmartAIRouter {
  constructor(private ai: SmartAi) {}

  async query(
    message: string,
    requirements: {
      speed?: boolean;
      accuracy?: boolean;
      cost?: boolean;
      privacy?: boolean;
    }
  ) {
    if (requirements.privacy) {
      return this.ai.ollamaProvider.chat({ /* ... */ }); // Local only
    }
    if (requirements.speed) {
      return this.ai.groqProvider.chat({ /* ... */ }); // 10x faster
    }
    if (requirements.accuracy) {
      return this.ai.anthropicProvider.chat({ /* ... */ }); // Best reasoning
    }
    // Default fallback
    return this.ai.openaiProvider.chat({ /* ... */ });
  }
}
```
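
Separating the routing decision from the actual call makes the precedence easy to unit-test. The sketch below keeps the same order as the router above (privacy, then speed, then accuracy). The router declares `cost` but never checks it, so mapping it to Ollama here follows the provider table in this README and is an assumption about intent. `pickProvider` is a hypothetical name:

```typescript
type ProviderName = 'ollama' | 'groq' | 'anthropic' | 'openai';

interface Requirements {
  speed?: boolean;
  accuracy?: boolean;
  cost?: boolean;
  privacy?: boolean;
}

// Pure decision function: no network calls, so it can be tested in isolation.
function pickProvider(req: Requirements): ProviderName {
  if (req.privacy) return 'ollama'; // local only
  if (req.speed) return 'groq';
  if (req.accuracy) return 'anthropic';
  if (req.cost) return 'ollama'; // assumption: local inference as the low-cost option
  return 'openai'; // default
}
```

The router's `query` method could then reduce to a single lookup of the chosen provider followed by one `chat()` call.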

### 2. Streaming for Large Responses

```typescript
// Don't wait for the entire response
async function streamResponse(userQuery: string) {
  const stream = await ai.openaiProvider.chatStream(
    createInputStream(userQuery)
  );

  // Process tokens as they arrive
  for await (const chunk of stream) {
    updateUI(chunk); // Immediate feedback
    await processChunk(chunk); // Awaited, so the next chunk is read only after this finishes
  }
  }
}
```

### 3. Parallel Multi-Provider Queries

```typescript
// Get the best answer from multiple AIs
async function consensusQuery(question: string) {
  const providers = [
    ai.openaiProvider.chat({ /* ... */ }),
    ai.anthropicProvider.chat({ /* ... */ }),
    ai.perplexityProvider.chat({ /* ... */ }),
  ];

  // Note: Promise.all rejects as soon as any single provider call fails
  const responses = await Promise.all(providers);
  return synthesizeResponses(responses);
}
```
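
If one slow or failing provider should not sink the whole query, `Promise.allSettled` keeps the answers that did arrive. This is a minimal sketch in which each task is any function returning a promise of text, for example `() => ai.openaiProvider.chat({ ... }).then((r) => r.message)`. `gatherAnswers` is a hypothetical helper:

```typescript
type AnswerTask = () => Promise<string>;

// Runs all tasks in parallel and separates successful answers from failures.
async function gatherAnswers(tasks: Record<string, AnswerTask>) {
  const entries = Object.entries(tasks);
  // The async wrapper turns synchronous throws into rejections as well
  const settled = await Promise.allSettled(entries.map(async ([, task]) => task()));

  const answers: Record<string, string> = {};
  const failed: string[] = [];
  settled.forEach((result, i) => {
    const name = entries[i][0];
    if (result.status === 'fulfilled') answers[name] = result.value;
    else failed.push(name);
  });
  return { answers, failed };
}
```

The `answers` map can then be handed to whatever synthesis step is in use, while `failed` is available for logging or retries.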

## 🛠️ Advanced Configuration

### Provider-Specific Options

```typescript
const ai = new SmartAi({
  // OpenAI
  openaiToken: 'sk-...',

  // Anthropic
  anthropicToken: 'sk-ant-...',

  // Perplexity for research
  perplexityToken: 'pplx-...',

  // Groq for speed
  groqToken: 'gsk_...',

  // Mistral with OCR settings
  mistralToken: 'your-key',
  mistral: {
    chatModel: 'mistral-large-latest',
    ocrModel: 'mistral-ocr-latest',
    tableFormat: 'markdown',
  },

  // XAI (Grok)
  xaiToken: 'xai-...',

  // ElevenLabs TTS
  elevenlabsToken: 'sk-...',
  elevenlabs: {
    defaultVoiceId: '19STyYD15bswVz51nqLf',
    defaultModelId: 'eleven_v3',
  },

  // Ollama (local)
  ollama: {
    baseUrl: 'http://localhost:11434',
    model: 'llama2',
    visionModel: 'llava',
    defaultOptions: {
      num_ctx: 4096,
      temperature: 0.7,
      top_p: 0.9,
    },
    defaultTimeout: 120000,
  },

  // Exo (distributed)
  exo: {
    baseUrl: 'http://localhost:8080/v1',
    apiKey: 'optional-key',
  },
});
```

### Error Handling & Fallbacks

```typescript
class ResilientAI {
  // Providers are tried in this order
  private providers = ['openai', 'anthropic', 'groq'] as const;

  constructor(private ai: SmartAi) {}

  async query(opts: ChatOptions): Promise<ChatResponse> {
    for (const provider of this.providers) {
      try {
        // `as const` keeps the key typed as a union of provider property names
        const key = `${provider}Provider` as const;
        return await this.ai[key].chat(opts);
      } catch (error) {
        console.warn(`${provider} failed, trying next...`, error);
      }
    }
    throw new Error('All providers failed');
  }
}
```
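
The same fallback idea can be written without depending on provider property names, which also preserves each failure reason for debugging. A minimal sketch, with `firstSuccessful` as a hypothetical helper that accepts any list of named async calls:

```typescript
type ChatCall<T> = () => Promise<T>;

// Tries each named call in order and returns the first success.
// If every call fails, the thrown error lists each provider's failure reason.
async function firstSuccessful<T>(
  calls: Array<[name: string, call: ChatCall<T>]>,
): Promise<T> {
  const failures: string[] = [];
  for (const [name, call] of calls) {
    try {
      return await call();
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${failures.join('; ')})`);
}
```

Each entry would typically look like `['openai', () => ai.openaiProvider.chat(opts)]`, so the order of the array is the fallback order.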

## 🎯 Choosing the Right Provider

| Use Case              | Recommended Provider | Why                                                       |
| --------------------- | -------------------- | --------------------------------------------------------- |
| **General Purpose**   | OpenAI               | Most features, stable, well-documented                    |
| **Complex Reasoning** | Anthropic            | Superior logical thinking, extended thinking, safer       |
| **Document OCR**      | Mistral              | Native PDF processing, no image conversion overhead       |
| **Research & Facts**  | Perplexity           | Web-aware, provides citations                             |
| **Deep Research**     | OpenAI               | Deep Research API with comprehensive analysis             |
| **Premium TTS**       | ElevenLabs           | Most natural voices, 70+ languages, v3 model              |
| **Speed Critical**    | Groq                 | 10x faster inference, sub-second responses                |
| **Privacy Critical**  | Ollama               | 100% local, no data leaves your servers                   |
| **Real-time Data**    | XAI                  | Grok with access to current information                   |
| **Cost Sensitive**    | Ollama/Exo           | Free (local) or distributed compute                       |

## 📈 Roadmap

- [x] Research & Web Search API
- [x] Image generation support (gpt-image-1, DALL-E 3, DALL-E 2)
- [x] Extended thinking (Anthropic)
- [x] Native PDF OCR (Mistral)
- [ ] Streaming function calls
- [ ] Voice input processing
- [ ] Fine-tuning integration
- [ ] Embedding support
- [ ] Agent framework
- [ ] More providers (Cohere, AI21, etc.)

## License and Legal Information

This repository contains open-source code licensed under the MIT License. A copy of the license can be found in the [LICENSE](./LICENSE) file.

**Please note:** The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

### Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH or third parties, and are not included within the scope of the MIT license granted herein.

Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines or the guidelines of the respective third-party owners, and any usage must be approved in writing. Third-party trademarks used herein are the property of their respective owners and used only in a descriptive manner, e.g. for an implementation of an API or similar.

### Company Information

Task Venture Capital GmbH
Registered at District Court Bremen HRB 35230 HB, Germany

For any legal inquiries or further information, please contact us via email at hello@task.vc.

By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.