feat(OllamaProvider): add model options, streaming support, and thinking tokens

- Add IOllamaModelOptions interface for runtime options (num_ctx, temperature, etc.)
- Extend IOllamaProviderOptions with defaultOptions and defaultTimeout
- Add IOllamaChatOptions for per-request overrides
- Add IOllamaStreamChunk and IOllamaChatResponse interfaces
- Add chatStreamResponse() for async iteration with options
- Add collectStreamResponse() for streaming with progress callback
- Add chatWithOptions() for non-streaming with full options
- Update chat() to use defaultOptions and defaultTimeout
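
The sketch below illustrates how per-request overrides might layer over `defaultOptions`, and how a streamed response could be collected with a progress callback. Only the interface names `IOllamaModelOptions` and `IOllamaStreamChunk` come from this commit message. Their field shapes here, and the `mergeOptions`, `fakeStream`, and `collect` helpers, are assumptions made for illustration and not the provider's actual implementation.

```typescript
// Hypothetical shapes: only the interface names appear in the commit message.
interface IOllamaModelOptions {
  num_ctx?: number;
  temperature?: number;
}

interface IOllamaStreamChunk {
  content: string;
  done: boolean;
}

// Assumed helper: per-request options take precedence over provider defaults.
function mergeOptions(
  defaults: IOllamaModelOptions,
  overrides: IOllamaModelOptions = {},
): IOllamaModelOptions {
  return { ...defaults, ...overrides };
}

// Stand-in for a streaming response; a real provider would read from the network.
async function* fakeStream(parts: string[]): AsyncGenerator<IOllamaStreamChunk> {
  let index = 0;
  for (const part of parts) {
    index += 1;
    yield { content: part, done: index === parts.length };
  }
}

// Assumed analogue of collectStreamResponse(): drain the stream, report each
// chunk to the optional callback, and return the concatenated text.
async function collect(
  stream: AsyncIterable<IOllamaStreamChunk>,
  onProgress?: (chunk: IOllamaStreamChunk) => void,
): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.content;
    onProgress?.(chunk);
  }
  return text;
}
```

Under these assumptions, a caller would compute `mergeOptions(defaultOptions, requestOptions)` once per request and either iterate the stream directly with `for await` or pass it to `collect` when only the final text and progress updates are needed.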
2026-01-20 00:02:45 +00:00
parent a556053510
commit 126e9b239b
12 changed files with 320 additions and 74 deletions
--- a/readme.md
+++ b/readme.md
@@ -6,7 +6,7 @@
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.x-blue.svg)](https://www.typescriptlang.org/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

-SmartAI unifies the world's leading AI providers - OpenAI, Anthropic, Perplexity, Ollama, Groq, XAI, Exo, and ElevenLabs - under a single, elegant TypeScript interface. Build AI applications at lightning speed without vendor lock-in.
+SmartAI unifies the world's leading AI providers - OpenAI, Anthropic, Mistral, Perplexity, Ollama, Groq, XAI, Exo, and ElevenLabs - under a single, elegant TypeScript interface. Build AI applications at lightning speed without vendor lock-in.

 ## Issue Reporting and Security

@@ -58,6 +58,7 @@ Choose the right provider for your use case:
 | -------------- | :--: | :-------: | :-: | :----: | :-------: | :------: | :----: | --------------------------------------------------------------- |
 | **OpenAI**     |  ✅  |    ✅     | ✅  |   ✅   |    ✅     |    ✅    |   ✅   | • gpt-image-1<br>• DALL-E 3<br>• Deep research API              |
 | **Anthropic**  |  ✅  |    ✅     | ❌  |   ✅   |    ✅     |    ✅    |   ❌   | • Claude Sonnet 4.5<br>• Superior reasoning<br>• Web search API |
+| **Mistral**    |  ✅  |    ✅     | ❌  |   ✅   |    ✅     |    ❌    |   ❌   | • Native PDF OCR<br>• mistral-large<br>• Fast inference         |
 | **ElevenLabs** |  ❌  |    ❌     | ✅  |   ❌   |    ❌     |    ❌    |   ❌   | • Premium TTS<br>• 70+ languages<br>• Natural voices            |
 | **Ollama**     |  ✅  |    ✅     | ❌  |   ✅   |    ✅     |    ❌    |   ❌   | • 100% local<br>• Privacy-first<br>• No API costs               |
 | **XAI**        |  ✅  |    ✅     | ❌  |   ❌   |    ✅     |    ❌    |   ❌   | • Grok models<br>• Real-time data<br>• Uncensored               |
@@ -282,6 +283,38 @@ const response = await anthropic.chat({
 - Use `'quick'` for simple factual queries where deep reasoning isn't needed
 - Thinking budget counts against total token usage

+### 📑 Native PDF OCR (Mistral)
+
+Mistral provides native PDF document processing via their OCR API - no image conversion required:
+
+```typescript
+import { MistralProvider } from '@push.rocks/smartai';
+
+const mistral = new MistralProvider({
+  mistralToken: 'your-api-key',
+  chatModel: 'mistral-large-latest',  // Default
+  ocrModel: 'mistral-ocr-latest',     // Default
+  tableFormat: 'markdown',             // 'markdown' | 'html'
+});
+
+await mistral.start();
+
+// Direct PDF processing - no image conversion overhead
+const result = await mistral.document({
+  systemMessage: 'You are a document analyst.',
+  userMessage: 'Extract all invoice details and calculate the total.',
+  pdfDocuments: [invoicePdfBuffer],
+  messageHistory: [],
+});
+```
+
+**Key Advantage**: Unlike other providers that convert PDFs to images first, Mistral's OCR API processes PDFs natively, potentially offering faster and more accurate text extraction for document-heavy workloads.
+
+**Supported Formats:**
+- Native PDF processing via Files API
+- Image OCR (JPEG, PNG, GIF, WebP) for vision tasks
+- Table extraction with markdown or HTML output
+
 ### 🎨 Image Generation & Editing

 Generate and edit images with OpenAI's cutting-edge models:
@@ -645,6 +678,7 @@ export ELEVENLABS_API_KEY=sk-...
 | --------------------- | -------------------- | --------------------------------------------------------- |
 | **General Purpose**   | OpenAI               | Most features, stable, well-documented                    |
 | **Complex Reasoning** | Anthropic            | Superior logical thinking, safer outputs                  |
+| **Document OCR**      | Mistral              | Native PDF processing, no image conversion overhead       |
 | **Research & Facts**  | Perplexity           | Web-aware, provides citations                             |
 | **Deep Research**     | OpenAI               | Deep Research API with comprehensive analysis             |
 | **Premium TTS**       | ElevenLabs           | Most natural voices, 70+ languages, superior quality (v3) |