Compare commits

2 Commits

Author SHA1 Message Date
Juergen Kunz
be574df599 feat(image): add progressive JPEG generation support
Some checks failed
Default (tags) / security (push) Failing after 24s
Default (tags) / test (push) Failing after 12s
Default (tags) / release (push) Has been skipped
Default (tags) / metadata (push) Has been skipped
- Add convertPDFToJpegBytes method for progressive JPEG images
- Integrate @push.rocks/smartjimp for true progressive encoding
- Update readme with comprehensive documentation
- Update legal section to Task Venture Capital GmbH
2025-08-02 17:29:38 +00:00
Juergen Kunz
6a4aeed3e1 BREAKING CHANGE(smartpdf): improve image generation quality and API consistency
- Renamed convertPDFToWebpPreviews to convertPDFToWebpBytes for consistency
- Added configurable scale options with DPI support
- Changed default scale to 3.0 (216 DPI) for better quality
- Added DPI helper methods and scale constants
2025-08-02 12:37:48 +00:00
8 changed files with 1625 additions and 293 deletions

@@ -1,5 +1,17 @@
# Changelog # Changelog
## 2025-08-02 - 4.0.0 - BREAKING CHANGE(smartpdf)
Improve image generation quality and API consistency
- BREAKING: Renamed `convertPDFToWebpPreviews` to `convertPDFToWebpBytes` for API consistency
- Added configurable scale options to `convertPDFToPngBytes` method
- Changed default scale from 1.0 to 3.0 for PNG generation (216 DPI)
- Changed default scale from 0.5 to 3.0 for WebP generation (216 DPI)
- Added DPI helper methods: `getScaleForDPI()` and scale constants (SCALE_SCREEN, SCALE_HIGH, SCALE_PRINT)
- Added maxWidth/maxHeight constraints for both PNG and WebP generation
- Improved test file organization with clear naming conventions
- Updated documentation with DPI/scale guidance and examples
## 2025-08-01 - 3.3.0 - feat(smartpdf) ## 2025-08-01 - 3.3.0 - feat(smartpdf)
Add automatic port allocation and multi-instance support Add automatic port allocation and multi-instance support
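
The scale and DPI figures in the 4.0.0 entry follow from one rule that the diff itself documents: a scale of 1.0 corresponds to 72 DPI, so scale = DPI / 72. The sketch below restates that rule and the maxWidth/maxHeight clamping as stand-alone functions. Only `getScaleForDPI` is a name taken from the diff; `getDPIForScale`, `constrainScale`, `PageSize`, and `Constraints` are hypothetical names used for illustration, and the page size is assumed to be given in points at scale 1.0.

```typescript
// Illustrative sketch, not the library's code. Only getScaleForDPI mirrors a
// method that appears in the diff; every other name here is hypothetical.
const BASE_DPI = 72; // per the diff's comment: scale 1.0 renders at 72 DPI

function getScaleForDPI(dpi: number): number {
  return dpi / BASE_DPI;
}

// Hypothetical inverse helper, handy for checking the DPI comments.
function getDPIForScale(scale: number): number {
  return scale * BASE_DPI;
}

interface PageSize {
  width: number; // page width in points at scale 1.0 (assumption)
  height: number; // page height in points at scale 1.0 (assumption)
}

interface Constraints {
  maxWidth?: number; // maximum output width in pixels
  maxHeight?: number; // maximum output height in pixels
}

// Reduce the requested scale so the rendered page fits inside the limits.
// The scale is never increased, only lowered when a limit is exceeded.
function constrainScale(
  page: PageSize,
  scale: number,
  limits: Constraints = {},
): number {
  let finalScale = scale;
  if (limits.maxWidth && page.width * scale > limits.maxWidth) {
    finalScale = limits.maxWidth / page.width;
  }
  if (limits.maxHeight && page.height * scale > limits.maxHeight) {
    finalScale = Math.min(finalScale, limits.maxHeight / page.height);
  }
  return finalScale;
}
```

With these definitions, the constants named in the changelog line up with their stated resolutions: 2.0 maps to 144 DPI, 3.0 to 216 DPI, and 6.0 to 432 DPI.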

@@ -1,6 +1,6 @@
{ {
"name": "@push.rocks/smartpdf", "name": "@push.rocks/smartpdf",
"version": "3.3.0", "version": "4.0.0",
"private": false, "private": false,
"description": "A library for creating PDFs dynamically from HTML or websites with additional features like merging PDFs.", "description": "A library for creating PDFs dynamically from HTML or websites with additional features like merging PDFs.",
"main": "dist_ts/index.js", "main": "dist_ts/index.js",
@@ -9,7 +9,7 @@
"author": "Lossless GmbH", "author": "Lossless GmbH",
"license": "MIT", "license": "MIT",
"scripts": { "scripts": {
"test": "(tstest test/ --verbose --timeout 60)", "test": "(tstest test/ --verbose --timeout 120)",
"build": "(tsbuild tsfolders --allowimplicitany)", "build": "(tsbuild tsfolders --allowimplicitany)",
"buildDocs": "tsdoc" "buildDocs": "tsdoc"
}, },
@@ -24,6 +24,7 @@
"@push.rocks/smartbuffer": "^3.0.5", "@push.rocks/smartbuffer": "^3.0.5",
"@push.rocks/smartdelay": "^3.0.5", "@push.rocks/smartdelay": "^3.0.5",
"@push.rocks/smartfile": "^11.2.5", "@push.rocks/smartfile": "^11.2.5",
"@push.rocks/smartjimp": "^1.2.0",
"@push.rocks/smartnetwork": "^4.1.2", "@push.rocks/smartnetwork": "^4.1.2",
"@push.rocks/smartpath": "^6.0.0", "@push.rocks/smartpath": "^6.0.0",
"@push.rocks/smartpromise": "^4.2.3", "@push.rocks/smartpromise": "^4.2.3",

839
pnpm-lock.yaml generated

File diff suppressed because it is too large

2
pnpm-workspace.yaml Normal file

@@ -0,0 +1,2 @@
onlyBuiltDependencies:
- sharp

530
readme.md

@@ -1,317 +1,409 @@
# @push.rocks/smartpdf # @push.rocks/smartpdf 📄✨
Create PDFs on the fly from HTML, websites, or existing PDFs with advanced features like text extraction, PDF merging, and PNG conversion.
## Install > **Transform HTML, websites, and PDFs into beautiful documents with just a few lines of code!**
To install `@push.rocks/smartpdf`, use npm or yarn:
[![npm version](https://img.shields.io/npm/v/@push.rocks/smartpdf.svg?style=flat-square)](https://www.npmjs.com/package/@push.rocks/smartpdf)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.x-blue.svg?style=flat-square)](https://www.typescriptlang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](./license)
## 🚀 Why SmartPDF?
SmartPDF is your Swiss Army knife for PDF operations in Node.js. Whether you're generating invoices, creating reports, or converting web pages to PDFs, we've got you covered with a simple, powerful API.
### ✨ Features at a Glance
- 📝 **HTML to PDF** - Transform HTML strings with full CSS support
- 🌐 **Website to PDF** - Capture any website as a perfectly formatted PDF
- 🔀 **PDF Merging** - Combine multiple PDFs into one
- 🖼️ **PDF to Images** - Convert PDFs to PNG, WebP, or progressive JPEG
- 📑 **Text Extraction** - Pull text content from existing PDFs
- 🎯 **Smart Port Management** - Automatic port allocation for concurrent instances
- 💪 **TypeScript First** - Full type safety and IntelliSense support
- **High Performance** - Optimized for speed and reliability
## 📦 Installation
```bash ```bash
# Using npm
npm install @push.rocks/smartpdf --save npm install @push.rocks/smartpdf --save
```
Or with yarn: # Using yarn
```bash
yarn add @push.rocks/smartpdf yarn add @push.rocks/smartpdf
# Using pnpm (recommended)
pnpm add @push.rocks/smartpdf
``` ```
## Requirements ## 🎯 Quick Start
This package requires a Chrome or Chromium installation to be available on the system, as it uses Puppeteer for rendering. The package will automatically detect and use the appropriate executable.
## Usage
`@push.rocks/smartpdf` provides a powerful interface for PDF generation and manipulation. All examples use ESM syntax and TypeScript.
### Getting Started
First, import the necessary classes:
```typescript ```typescript
import { SmartPdf, IPdf } from '@push.rocks/smartpdf'; import { SmartPdf } from '@push.rocks/smartpdf';
```
### Basic Setup with Automatic Port Allocation // Create and start SmartPdf
SmartPdf automatically finds an available port between 20000-30000 for its internal server:
```typescript
async function setupSmartPdf() {
const smartPdf = await SmartPdf.create(); const smartPdf = await SmartPdf.create();
await smartPdf.start(); await smartPdf.start();
// Your PDF operations here // Generate a PDF from HTML
const pdf = await smartPdf.getA4PdfResultForHtmlString(`
<h1>Hello, PDF World! 🌍</h1>
<p>This is my first SmartPDF document.</p>
`);
// Save it
await fs.writeFile('my-first-pdf.pdf', pdf.buffer);
// Don't forget to clean up!
await smartPdf.stop(); await smartPdf.stop();
}
``` ```
### Advanced Setup with Custom Port Configuration ## 📚 Core Concepts
You can specify custom port settings to avoid conflicts or meet specific requirements:
### 🏗️ Instance Management
SmartPDF uses a client-server architecture for maximum performance. Always remember:
1. **Create** an instance
2. **Start** the server
3. **Do your PDF magic**
4. **Stop** the server
```typescript ```typescript
// Use a specific port const smartPdf = await SmartPdf.create();
const smartPdf = await SmartPdf.create({ port: 3000 }); await smartPdf.start();
// ... your PDF operations ...
await smartPdf.stop();
```
// Use a custom port range ### 🔌 Smart Port Allocation
const smartPdf = await SmartPdf.create({
portRangeStart: 4000, Run multiple instances without port conflicts:
```typescript
// Each instance automatically finds a free port
const instance1 = await SmartPdf.create(); // Port: 20000
const instance2 = await SmartPdf.create(); // Port: 20001
const instance3 = await SmartPdf.create(); // Port: 20002
// Or specify custom settings
const customInstance = await SmartPdf.create({
port: 3000, // Use specific port
portRangeStart: 4000, // Or define a range
portRangeEnd: 5000 portRangeEnd: 5000
}); });
// The server will find an available port in your specified range
await smartPdf.start();
console.log(`Server running on port: ${smartPdf.serverPort}`);
``` ```
### Creating PDFs from HTML Strings ## 🎨 PDF Generation
Generate PDFs from HTML content with full CSS support:
### 📝 From HTML String
Create beautiful PDFs from HTML with full CSS support:
```typescript ```typescript
async function createPdfFromHtml() {
const smartPdf = await SmartPdf.create(); const smartPdf = await SmartPdf.create();
await smartPdf.start(); await smartPdf.start();
const htmlString = ` const pdf = await smartPdf.getA4PdfResultForHtmlString(`
<!DOCTYPE html> <!DOCTYPE html>
<html> <html>
<head> <head>
<style> <style>
body { font-family: Arial, sans-serif; margin: 40px; } @import url('https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;700&display=swap');
h1 { color: #333; }
.highlight { background-color: yellow; } body {
font-family: 'Roboto', sans-serif;
margin: 40px;
color: #333;
}
.header {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 30px;
border-radius: 10px;
text-align: center;
}
.content {
margin-top: 30px;
line-height: 1.6;
}
.highlight {
background-color: #ffd93d;
padding: 2px 6px;
border-radius: 3px;
}
</style> </style>
</head> </head>
<body> <body>
<h1>Professional PDF Document</h1> <div class="header">
<p>This PDF was generated from <span class="highlight">HTML content</span>.</p> <h1>Invoice #2024-001</h1>
<p>Generated on ${new Date().toLocaleDateString()}</p>
</div>
<div class="content">
<h2>Bill To:</h2>
<p>Acme Corporation</p>
<p>Total: <span class="highlight">$1,234.56</span></p>
</div>
</body> </body>
</html> </html>
`; `);
const pdf: IPdf = await smartPdf.getA4PdfResultForHtmlString(htmlString);
// pdf.buffer contains the PDF data
// pdf.id contains a unique identifier
// pdf.name contains the filename
// pdf.metadata contains additional information like extracted text
await fs.writeFile('invoice.pdf', pdf.buffer);
await smartPdf.stop(); await smartPdf.stop();
}
``` ```
### Generating PDFs from Websites ### 🌐 From Website
Capture web pages as PDFs with two different approaches:
#### A4 Format PDF from Website Capture any website as a PDF with two powerful methods:
Captures the viewable area formatted for A4 paper:
#### Standard A4 Format
Perfect for articles and documents:
```typescript ```typescript
async function createA4PdfFromWebsite() { const pdf = await smartPdf.getPdfResultForWebsite('https://example.com');
const smartPdf = await SmartPdf.create();
await smartPdf.start();
const pdf: IPdf = await smartPdf.getPdfResultForWebsite('https://example.com');
// Save to file
await fs.writeFile('website-a4.pdf', pdf.buffer);
await smartPdf.stop();
}
``` ```
#### Full Webpage as Single PDF #### Full Page Capture
Captures the entire webpage in a single PDF, regardless of length: Capture the entire scrollable area:
```typescript ```typescript
async function createFullPdfFromWebsite() { const fullPagePdf = await smartPdf.getFullWebsiteAsSinglePdf('https://example.com');
const smartPdf = await SmartPdf.create();
await smartPdf.start();
const pdf: IPdf = await smartPdf.getFullWebsiteAsSinglePdf('https://example.com');
// This captures the entire scrollable area
await fs.writeFile('website-full.pdf', pdf.buffer);
await smartPdf.stop();
}
``` ```
### Merging Multiple PDFs ### 🔀 Merge Multiple PDFs
Combine multiple PDF files into a single document:
Combine PDFs like a pro:
```typescript ```typescript
async function mergePdfs() { // Load your PDFs
const smartPdf = await SmartPdf.create(); const invoice = await smartPdf.readFileToPdfObject('./invoice.pdf');
await smartPdf.start(); const terms = await smartPdf.readFileToPdfObject('./terms.pdf');
const contract = await smartPdf.getA4PdfResultForHtmlString('<h1>Contract</h1>...');
// Create or load your PDFs // Merge them in order
const pdf1 = await smartPdf.getA4PdfResultForHtmlString('<h1>Document 1</h1>'); const mergedPdf = await smartPdf.mergePdfs([
const pdf2 = await smartPdf.getA4PdfResultForHtmlString('<h1>Document 2</h1>'); contract.buffer,
const pdf3 = await smartPdf.readFileToPdfObject('./existing-document.pdf'); invoice.buffer,
terms.buffer
// Merge PDFs - order matters!
const mergedPdf: Uint8Array = await smartPdf.mergePdfs([
pdf1.buffer,
pdf2.buffer,
pdf3.buffer
]); ]);
// Save the merged PDF await fs.writeFile('complete-document.pdf', mergedPdf);
await fs.writeFile('merged-document.pdf', mergedPdf);
await smartPdf.stop();
}
``` ```
### Reading PDFs and Extracting Text ## 🖼️ Image Generation
Extract text content from existing PDFs:
### 🎨 Convert PDF to Images
SmartPDF supports three image formats, each with its own strengths:
#### PNG - Crystal Clear Quality
```typescript ```typescript
async function extractTextFromPdf() { const pngImages = await smartPdf.convertPDFToPngBytes(pdf.buffer, {
const smartPdf = await SmartPdf.create(); scale: SmartPdf.SCALE_HIGH // 216 DPI - perfect for most uses
// Read PDF from disk
const pdf: IPdf = await smartPdf.readFileToPdfObject('/path/to/document.pdf');
// Extract all text
const extractedText = await smartPdf.extractTextFromPdfBuffer(pdf.buffer);
console.log('Extracted text:', extractedText);
// The pdf object also contains metadata with text extraction
console.log('Metadata:', pdf.metadata);
}
```
### Converting PDF to PNG Images
Convert each page of a PDF into PNG images:
```typescript
async function convertPdfToPng() {
const smartPdf = await SmartPdf.create();
await smartPdf.start();
// Load a PDF
const pdf = await smartPdf.readFileToPdfObject('./document.pdf');
// Convert to PNG images (one per page)
const pngImages: Uint8Array[] = await smartPdf.convertPDFToPngBytes(pdf.buffer);
// Save each page as a PNG
pngImages.forEach((pngBuffer, index) => {
fs.writeFileSync(`page-${index + 1}.png`, pngBuffer);
}); });
await smartPdf.stop(); // Save each page
} pngImages.forEach((png, index) => {
``` fs.writeFileSync(`page-${index + 1}.png`, png);
### Using External Browser Instance
For advanced use cases, you can provide your own Puppeteer browser instance:
```typescript
import puppeteer from 'puppeteer';
async function useExternalBrowser() {
// Create your own browser instance with custom options
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox']
}); });
const smartPdf = await SmartPdf.create();
await smartPdf.start(browser);
// Use SmartPdf normally
const pdf = await smartPdf.getA4PdfResultForHtmlString('<h1>Hello</h1>');
// SmartPdf will not close the browser when stopping
await smartPdf.stop();
// You control the browser lifecycle
await browser.close();
}
``` ```
### Running Multiple Instances #### WebP - Modern & Efficient
Thanks to automatic port allocation, you can run multiple SmartPdf instances simultaneously:
```typescript ```typescript
async function runMultipleInstances() { const webpImages = await smartPdf.convertPDFToWebpBytes(pdf.buffer, {
// Each instance automatically finds its own free port quality: 90, // 0-100 quality scale
const instance1 = await SmartPdf.create(); scale: 2.0 // 144 DPI - great for web
const instance2 = await SmartPdf.create(); });
const instance3 = await SmartPdf.create();
// Start all instances
await Promise.all([
instance1.start(),
instance2.start(),
instance3.start()
]);
console.log(`Instance 1 running on port: ${instance1.serverPort}`);
console.log(`Instance 2 running on port: ${instance2.serverPort}`);
console.log(`Instance 3 running on port: ${instance3.serverPort}`);
// Use instances independently
const pdfs = await Promise.all([
instance1.getA4PdfResultForHtmlString('<h1>PDF 1</h1>'),
instance2.getA4PdfResultForHtmlString('<h1>PDF 2</h1>'),
instance3.getA4PdfResultForHtmlString('<h1>PDF 3</h1>')
]);
// Clean up all instances
await Promise.all([
instance1.stop(),
instance2.stop(),
instance3.stop()
]);
}
``` ```
### Error Handling #### JPEG - Progressive Loading
Always wrap SmartPdf operations in try-catch blocks and ensure proper cleanup:
```typescript
const jpegImages = await smartPdf.convertPDFToJpegBytes(pdf.buffer, {
quality: 85, // Balance between size and quality
scale: SmartPdf.SCALE_SCREEN, // 144 DPI
maxWidth: 1920 // Constrain dimensions
});
```
### 📏 DPI & Scale Guide
SmartPDF makes it easy to get the right resolution:
```typescript
// Built-in scale constants
SmartPdf.SCALE_SCREEN // 2.0 = ~144 DPI (web display)
SmartPdf.SCALE_HIGH // 3.0 = ~216 DPI (high quality, default)
SmartPdf.SCALE_PRINT // 6.0 = ~432 DPI (print quality)
// Or calculate your own
const scale = SmartPdf.getScaleForDPI(300); // Get scale for 300 DPI
```
### 🖼️ Thumbnail Generation
Create perfect thumbnails for document previews:
```typescript
const thumbnails = await smartPdf.convertPDFToWebpBytes(pdf.buffer, {
scale: 0.5, // Small but readable
quality: 70, // Lower quality for tiny files
maxWidth: 200, // Constrain to thumbnail size
maxHeight: 200
});
```
## 📊 Format Comparison
Choose the right format for your needs:
| Format | File Size | Best For | Special Features |
|--------|-----------|----------|------------------|
| **PNG** | Largest | Screenshots, diagrams, text | Lossless, transparency |
| **JPEG** | 30-50% of PNG | Photos, complex images | Progressive loading |
| **WebP** | 25-40% of PNG | Modern web apps | Best compression |
## 🛡️ Best Practices
### 1. Always Use Try-Finally
```typescript ```typescript
async function safePdfGeneration() {
let smartPdf: SmartPdf; let smartPdf: SmartPdf;
try { try {
smartPdf = await SmartPdf.create(); smartPdf = await SmartPdf.create();
await smartPdf.start(); await smartPdf.start();
const pdf = await smartPdf.getA4PdfResultForHtmlString('<h1>Hello</h1>'); // Your PDF operations
// Process PDF...
} catch (error) {
console.error('PDF generation failed:', error);
// Handle error appropriately
} finally { } finally {
// Always cleanup
if (smartPdf) { if (smartPdf) {
await smartPdf.stop(); await smartPdf.stop(); // Always cleanup!
}
} }
} }
``` ```
### IPdf Interface ### 2. Optimize HTML for PDFs
The `IPdf` interface represents a PDF with its metadata:
```typescript
const optimizedHtml = `
<style>
/* Use print-friendly styles */
@media print {
.no-print { display: none; }
}
/* Avoid page breaks in wrong places */
h1, h2, h3 { page-break-after: avoid; }
table { page-break-inside: avoid; }
</style>
${yourContent}
`;
```
### 3. Handle Large Documents
For documents with many pages:
```typescript
// Process in batches
const pages = await smartPdf.convertPDFToPngBytes(largePdf.buffer);
for (let i = 0; i < pages.length; i += 10) {
const batch = pages.slice(i, i + 10);
await processBatch(batch);
}
```
## 🎯 Advanced Usage
### 🌐 Custom Browser Instance
Bring your own Puppeteer instance:
```typescript
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({
headless: 'new',
args: ['--no-sandbox', '--disable-dev-shm-usage']
});
const smartPdf = await SmartPdf.create();
await smartPdf.start(browser);
// SmartPdf won't close your browser
await smartPdf.stop();
await browser.close(); // You manage it
```
### ⚡ Parallel Processing
Process multiple PDFs concurrently:
```typescript
const urls = ['https://example1.com', 'https://example2.com', 'https://example3.com'];
const pdfs = await Promise.all(
urls.map(url => smartPdf.getFullWebsiteAsSinglePdf(url))
);
// Or with multiple instances for maximum performance
const instances = await Promise.all(
Array(3).fill(null).map(() => SmartPdf.create())
);
await Promise.all(instances.map(i => i.start()));
// Process in parallel across instances
const results = await Promise.all(
urls.map((url, i) => instances[i % instances.length].getFullWebsiteAsSinglePdf(url))
);
// Cleanup all instances
await Promise.all(instances.map(i => i.stop()));
```
## 📝 API Reference
### Class: SmartPdf
#### Static Methods
- `create(options?: ISmartPdfOptions)` - Create a new SmartPdf instance
- `getScaleForDPI(dpi: number)` - Calculate scale factor for desired DPI
#### Instance Methods
- `start(browser?: Browser)` - Start the PDF server
- `stop()` - Stop the PDF server
- `getA4PdfResultForHtmlString(html: string)` - Generate A4 PDF from HTML
- `getPdfResultForWebsite(url: string)` - Generate A4 PDF from website
- `getFullWebsiteAsSinglePdf(url: string)` - Capture full webpage as PDF
- `mergePdfs(buffers: Uint8Array[])` - Merge multiple PDFs
- `readFileToPdfObject(path: string)` - Read PDF file from disk
- `extractTextFromPdfBuffer(buffer: Buffer)` - Extract text from PDF
- `convertPDFToPngBytes(buffer: Uint8Array, options?)` - Convert to PNG
- `convertPDFToWebpBytes(buffer: Uint8Array, options?)` - Convert to WebP
- `convertPDFToJpegBytes(buffer: Uint8Array, options?)` - Convert to JPEG
### Interface: IPdf
```typescript ```typescript
interface IPdf { interface IPdf {
name: string; // Filename of the PDF name: string; // Filename
buffer: Buffer; // PDF content as buffer buffer: Buffer; // PDF content
id: string | null; // Unique identifier id: string | null; // Unique identifier
metadata?: { metadata?: {
textExtraction?: string; // Extracted text content textExtraction?: string; // Extracted text
}; };
} }
``` ```
## Best Practices ## 🤝 Contributing
1. **Always start and stop**: Initialize with `start()` and cleanup with `stop()` to properly manage resources. We love contributions! Please feel free to submit a Pull Request.
2. **Port management**: Use the automatic port allocation feature to avoid conflicts when running multiple instances.
3. **Error handling**: Always implement proper error handling as PDF generation can fail due to various reasons.
4. **Resource cleanup**: Ensure `stop()` is called even if an error occurs to prevent memory leaks.
5. **HTML optimization**: When creating PDFs from HTML, ensure your HTML is well-formed and CSS is embedded or inlined.
## License and Legal Information ## License and Legal Information
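
The "Always Use Try-Finally" advice in the new readme can be sketched as a small wrapper that guarantees cleanup. This is a minimal sketch under stated assumptions: `Startable` and `withStarted` are hypothetical names, and the interface only models the `start()`/`stop()` pair shown in the readme. Declaring the variable as `T | undefined` also sidesteps the use-before-assignment error that strict TypeScript would report for a bare `let smartPdf: SmartPdf;` read inside `finally`.

```typescript
// Hypothetical minimal shape: only the start/stop pair shown in the readme.
// The real SmartPdf class has more members and may use different signatures.
interface Startable {
  start(): Promise<void>;
  stop(): Promise<void>;
}

// Create an instance, start it, run the work, and always stop it afterwards,
// whether the work succeeds or throws.
async function withStarted<T extends Startable, R>(
  create: () => Promise<T>,
  work: (instance: T) => Promise<R>,
): Promise<R> {
  let instance: T | undefined;
  try {
    instance = await create();
    await instance.start();
    return await work(instance);
  } finally {
    // Runs even if start() or work() threw; skipped only if create() failed.
    if (instance) {
      await instance.stop();
    }
  }
}
```

Assuming `SmartPdf` fits this shape, a call might look like `withStarted(() => SmartPdf.create(), (pdf) => pdf.getA4PdfResultForHtmlString('<h1>Hello</h1>'))`. Note also that the readme snippets that call `fs.writeFile` or `fs.writeFileSync` presumably rely on an `fs` import that the diff does not show.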

@@ -15,6 +15,13 @@ function ensureDir(dirPath: string): void {
} }
} }
// Clean test results directory at start
const testResultsDir = path.join('.nogit', 'testresults');
if (fs.existsSync(testResultsDir)) {
fs.rmSync(testResultsDir, { recursive: true, force: true });
}
ensureDir(testResultsDir);
tap.test('should create a valid instance of SmartPdf', async () => { tap.test('should create a valid instance of SmartPdf', async () => {
testSmartPdf = new smartpdf.SmartPdf(); testSmartPdf = new smartpdf.SmartPdf();
expect(testSmartPdf).toBeInstanceOf(smartpdf.SmartPdf); expect(testSmartPdf).toBeInstanceOf(smartpdf.SmartPdf);
@@ -65,19 +72,215 @@ tap.test('should create PNG images from combined PDF using Puppeteer conversion'
}); });
tap.test('should store PNG results from both conversion functions in .nogit/testresults', async () => { tap.test('should store PNG results from both conversion functions in .nogit/testresults', async () => {
const testResultsDir = path.join('.nogit', 'testresults');
ensureDir(testResultsDir);
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/combined.pdf'); const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/combined.pdf');
// Convert using Puppeteer-based function and store images // Convert using Puppeteer-based function and store images
const imagesPuppeteer = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer); const imagesPuppeteer = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer);
imagesPuppeteer.forEach((img, index) => { imagesPuppeteer.forEach((img, index) => {
const filePath = path.join(testResultsDir, `puppeteer_method_page_${index + 1}.png`); const filePath = path.join(testResultsDir, `png_combined_page${index + 1}.png`);
fs.writeFileSync(filePath, Buffer.from(img)); fs.writeFileSync(filePath, Buffer.from(img));
}); });
}); });
tap.test('should create WebP preview images from PDF', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
const webpPreviews = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer);
expect(webpPreviews.length).toBeGreaterThan(0);
console.log('WebP preview sizes:', webpPreviews.map(img => img.length));
// Also create PNG previews for comparison
const pngPreviews = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer);
console.log('PNG preview sizes:', pngPreviews.map(img => img.length));
// Save the first page as both WebP and PNG preview
fs.writeFileSync(path.join(testResultsDir, 'webp_default_page1.webp'), Buffer.from(webpPreviews[0]));
fs.writeFileSync(path.join(testResultsDir, 'png_default_page1.png'), Buffer.from(pngPreviews[0]));
});
tap.test('should create WebP previews with custom scale and quality', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Create smaller previews with lower quality for thumbnails
const thumbnails = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: 0.5, // Create readable thumbnails at ~36 DPI
quality: 70
});
expect(thumbnails.length).toBeGreaterThan(0);
console.log('Thumbnail sizes:', thumbnails.map(img => img.length));
// Save thumbnails
thumbnails.forEach((thumb, index) => {
fs.writeFileSync(path.join(testResultsDir, `webp_thumbnail_page${index + 1}.webp`), Buffer.from(thumb));
});
});
tap.test('should create WebP previews with max dimensions', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Create previews with maximum dimensions (will use high scale but constrain to max size)
const constrainedPreviews = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: smartpdf.SmartPdf.SCALE_HIGH, // Start with high quality
quality: 90,
maxWidth: 800,
maxHeight: 1000
});
expect(constrainedPreviews.length).toBeGreaterThan(0);
console.log('Constrained preview sizes:', constrainedPreviews.map(img => img.length));
// Save constrained preview
fs.writeFileSync(path.join(testResultsDir, 'webp_constrained_page1.webp'), Buffer.from(constrainedPreviews[0]));
});
tap.test('should verify WebP files are smaller than PNG', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Generate both PNG and WebP versions at the same scale for fair comparison
const comparisonScale = smartpdf.SmartPdf.SCALE_HIGH; // Both use 3.0 scale
const pngImages = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer, {
scale: comparisonScale
});
const webpImages = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: comparisonScale,
quality: 85
});
expect(pngImages.length).toEqual(webpImages.length);
// Compare sizes
let totalPngSize = 0;
let totalWebpSize = 0;
pngImages.forEach((png, index) => {
const pngSize = png.length;
const webpSize = webpImages[index].length;
totalPngSize += pngSize;
totalWebpSize += webpSize;
const reduction = ((pngSize - webpSize) / pngSize * 100).toFixed(1);
console.log(`Page ${index + 1}: PNG=${pngSize} bytes, WebP=${webpSize} bytes, Reduction=${reduction}%`);
// Save comparison files
fs.writeFileSync(path.join(testResultsDir, `comparison_png_page${index + 1}.png`), Buffer.from(png));
fs.writeFileSync(path.join(testResultsDir, `comparison_webp_page${index + 1}.webp`), Buffer.from(webpImages[index]));
});
const totalReduction = ((totalPngSize - totalWebpSize) / totalPngSize * 100).toFixed(1);
console.log(`Total size reduction: ${totalReduction}% (PNG: ${totalPngSize} bytes, WebP: ${totalWebpSize} bytes)`);
// WebP should be smaller
expect(totalWebpSize).toBeLessThan(totalPngSize);
});
tap.test('should create JPEG images from PDF', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
const jpegImages = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer);
expect(jpegImages.length).toBeGreaterThan(0);
console.log('JPEG image sizes:', jpegImages.map(img => img.length));
// Save the first page as JPEG
fs.writeFileSync(path.join(testResultsDir, 'jpeg_default_page1.jpg'), Buffer.from(jpegImages[0]));
});
tap.test('should create JPEG images with different quality levels', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Test different quality levels
const qualityLevels = [50, 70, 85, 95];
for (const quality of qualityLevels) {
const jpegImages = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer, {
scale: smartpdf.SmartPdf.SCALE_HIGH,
quality: quality
});
console.log(`JPEG quality ${quality}: ${jpegImages[0].length} bytes`);
// Save first page at each quality level
fs.writeFileSync(
path.join(testResultsDir, `jpeg_quality_${quality}_page1.jpg`),
Buffer.from(jpegImages[0])
);
}
});
tap.test('should create JPEG images with max dimensions', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Create constrained JPEG images
const constrainedJpegs = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer, {
scale: smartpdf.SmartPdf.SCALE_HIGH,
quality: 85,
maxWidth: 1200,
maxHeight: 1200
});
expect(constrainedJpegs.length).toBeGreaterThan(0);
console.log('Constrained JPEG sizes:', constrainedJpegs.map(img => img.length));
// Save constrained JPEG
fs.writeFileSync(path.join(testResultsDir, 'jpeg_constrained_page1.jpg'), Buffer.from(constrainedJpegs[0]));
});
tap.test('should compare file sizes between PNG, WebP, and JPEG', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Generate all three formats at the same scale
const comparisonScale = smartpdf.SmartPdf.SCALE_HIGH; // 3.0 scale
const pngImages = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer, {
scale: comparisonScale
});
const webpImages = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: comparisonScale,
quality: 85
});
const jpegImages = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer, {
scale: comparisonScale,
quality: 85
});
expect(pngImages.length).toEqual(webpImages.length);
expect(pngImages.length).toEqual(jpegImages.length);
// Compare sizes
let totalPngSize = 0;
let totalWebpSize = 0;
let totalJpegSize = 0;
pngImages.forEach((png, index) => {
const pngSize = png.length;
const webpSize = webpImages[index].length;
const jpegSize = jpegImages[index].length;
totalPngSize += pngSize;
totalWebpSize += webpSize;
totalJpegSize += jpegSize;
const webpReduction = ((pngSize - webpSize) / pngSize * 100).toFixed(1);
const jpegReduction = ((pngSize - jpegSize) / pngSize * 100).toFixed(1);
console.log(`Page ${index + 1}:`);
console.log(` PNG: ${pngSize} bytes`);
console.log(` WebP: ${webpSize} bytes (${webpReduction}% smaller than PNG)`);
console.log(` JPEG: ${jpegSize} bytes (${jpegReduction}% smaller than PNG)`);
});
const totalWebpReduction = ((totalPngSize - totalWebpSize) / totalPngSize * 100).toFixed(1);
const totalJpegReduction = ((totalPngSize - totalJpegSize) / totalPngSize * 100).toFixed(1);
console.log('\nTotal size comparison:');
console.log(`PNG: ${totalPngSize} bytes`);
console.log(`WebP: ${totalWebpSize} bytes (${totalWebpReduction}% reduction)`);
console.log(`JPEG: ${totalJpegSize} bytes (${totalJpegReduction}% reduction)`);
// JPEG and WebP should both be smaller than PNG
expect(totalJpegSize).toBeLessThan(totalPngSize);
expect(totalWebpSize).toBeLessThan(totalPngSize);
});
tap.test('should close the SmartPdf instance properly', async () => { tap.test('should close the SmartPdf instance properly', async () => {
await testSmartPdf.stop(); await testSmartPdf.stop();
}); });
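
The size-comparison tests above repeat the same percentage arithmetic for each format. A possible way to factor it out is sketched below; `reductionPercent` and `totalBytes` are hypothetical helper names, not part of the test file or the library, and the zero-baseline guard is an addition that the original inline expressions do not have.

```typescript
// Hypothetical helpers mirroring the arithmetic used inline in the tests.

// Percentage by which candidateBytes is smaller than baselineBytes,
// formatted with one decimal place like the tests' toFixed(1) calls.
function reductionPercent(baselineBytes: number, candidateBytes: number): string {
  if (baselineBytes <= 0) {
    // Added guard: the inline version would yield NaN or Infinity here.
    throw new RangeError('baselineBytes must be positive');
  }
  return (((baselineBytes - candidateBytes) / baselineBytes) * 100).toFixed(1);
}

// Sum of the byte lengths of all page images for one format.
function totalBytes(images: Uint8Array[]): number {
  return images.reduce((sum, img) => sum + img.length, 0);
}
```

A negative result means the candidate format came out larger than the baseline, which the tests' `toBeLessThan` assertions would reject.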

@@ -14,6 +14,19 @@ export interface ISmartPdfOptions {
} }
export class SmartPdf { export class SmartPdf {
// STATIC SCALE CONSTANTS
public static readonly SCALE_SCREEN = 2.0; // ~144 DPI - Good for screen display
public static readonly SCALE_HIGH = 3.0; // ~216 DPI - High quality (default)
public static readonly SCALE_PRINT = 6.0; // ~432 DPI - Print quality
/**
* Calculate scale factor for desired DPI
* PDF.js default is 72 DPI, so scale = desiredDPI / 72
*/
public static getScaleForDPI(dpi: number): number {
return dpi / 72;
}
// STATIC // STATIC
public static async create(optionsArg?: ISmartPdfOptions) { public static async create(optionsArg?: ISmartPdfOptions) {
const smartpdfInstance = new SmartPdf(optionsArg); const smartpdfInstance = new SmartPdf(optionsArg);
@@ -318,10 +331,14 @@ export class SmartPdf {
*/
public async convertPDFToPngBytes(
pdfBytes: Uint8Array,
- options: { width?: number; height?: number; quality?: number } = {}
+ options: {
+ scale?: number; // Scale factor for output size (default: 3.0 for 216 DPI)
+ maxWidth?: number; // Maximum width in pixels (optional)
+ maxHeight?: number; // Maximum height in pixels (optional)
+ } = {}
): Promise<Uint8Array[]> {
- // Note: options.width, options.height, and options.quality are not applied here,
- // as the rendered canvas size is determined by the PDF page dimensions.
+ // Set default scale for higher quality output (3.0 = ~216 DPI)
+ const scale = options.scale || 3.0;
// Create a new page using the headless browser.
const page = await this.headlessBrowser.newPage();
@@ -354,12 +371,31 @@ export class SmartPdf {
const numPages = pdf.numPages;
for (let pageNum = 1; pageNum <= numPages; pageNum++) {
const page = await pdf.getPage(pageNum);
- const viewport = page.getViewport({ scale: 1.0 });
+ // Apply scale factor to viewport
+ const viewport = page.getViewport({ scale: ${scale} });
+ // Apply max width/height constraints if specified
+ let finalScale = ${scale};
+ ${options.maxWidth ? `
+ if (viewport.width > ${options.maxWidth}) {
+ finalScale = ${options.maxWidth} / (viewport.width / ${scale});
+ }` : ''}
+ ${options.maxHeight ? `
+ if (viewport.height > ${options.maxHeight}) {
+ const heightScale = ${options.maxHeight} / (viewport.height / ${scale});
+ finalScale = Math.min(finalScale, heightScale);
+ }` : ''}
+ // Get final viewport with adjusted scale
+ const finalViewport = page.getViewport({ scale: finalScale });
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
- canvas.width = viewport.width;
- canvas.height = viewport.height;
- await page.render({ canvasContext: context, viewport: viewport }).promise;
+ canvas.width = finalViewport.width;
+ canvas.height = finalViewport.height;
+ canvas.setAttribute('data-page', pageNum);
+ await page.render({ canvasContext: context, viewport: finalViewport }).promise;
document.body.appendChild(canvas);
}
window.renderComplete = true;
@@ -391,4 +427,163 @@ export class SmartPdf {
await page.close();
return pngBuffers;
}
/**
* Converts a PDF to WebP bytes for each page.
* This method creates web-optimized images using WebP format.
* WebP files are typically about 25-35% smaller than comparable JPEG or PNG files at similar visual quality.
*/
public async convertPDFToWebpBytes(
pdfBytes: Uint8Array,
options: {
scale?: number; // Scale factor for preview size (default: 3.0 for 216 DPI)
quality?: number; // WebP quality 0-100 (default: 85)
maxWidth?: number; // Maximum width in pixels (optional)
maxHeight?: number; // Maximum height in pixels (optional)
} = {}
): Promise<Uint8Array[]> {
// Set default options for higher quality output (3.0 = ~216 DPI)
const scale = options.scale || 3.0;
const quality = options.quality || 85;
// Create a new page using the headless browser
const page = await this.headlessBrowser.newPage();
// Prepare PDF data as a base64 string
const base64Pdf: string = Buffer.from(pdfBytes).toString('base64');
// HTML template that loads PDF.js and renders the PDF with scaling
const htmlTemplate: string = `
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>PDF to WebP Preview Converter</title>
<style>
body { margin: 0; }
canvas { display: block; margin: 10px auto; }
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.min.js"></script>
</head>
<body>
<script>
(async function() {
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.worker.min.js';
const pdfData = "__PDF_DATA__";
const raw = atob(pdfData);
const pdfArray = new Uint8Array([...raw].map(c => c.charCodeAt(0)));
const loadingTask = pdfjsLib.getDocument({data: pdfArray});
const pdf = await loadingTask.promise;
const numPages = pdf.numPages;
for (let pageNum = 1; pageNum <= numPages; pageNum++) {
const page = await pdf.getPage(pageNum);
// Apply scale factor to viewport
const viewport = page.getViewport({ scale: ${scale} });
// Apply max width/height constraints if specified
let finalScale = ${scale};
${options.maxWidth ? `
if (viewport.width > ${options.maxWidth}) {
finalScale = ${options.maxWidth} / (viewport.width / ${scale});
}` : ''}
${options.maxHeight ? `
if (viewport.height > ${options.maxHeight}) {
const heightScale = ${options.maxHeight} / (viewport.height / ${scale});
finalScale = Math.min(finalScale, heightScale);
}` : ''}
// Get final viewport with adjusted scale
const finalViewport = page.getViewport({ scale: finalScale });
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
canvas.width = finalViewport.width;
canvas.height = finalViewport.height;
canvas.setAttribute('data-page', pageNum);
await page.render({ canvasContext: context, viewport: finalViewport }).promise;
document.body.appendChild(canvas);
}
window.renderComplete = true;
})();
</script>
</body>
</html>
`;
// Replace the placeholder with the actual base64 PDF data
const htmlContent: string = htmlTemplate.replace("__PDF_DATA__", base64Pdf);
// Set the page content
await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
// Wait until the PDF.js rendering is complete
await page.waitForFunction(() => (window as any).renderComplete === true, { timeout: 30000 });
// Query all canvas elements (each representing a rendered PDF page)
const canvasElements = await page.$$('canvas');
const webpBuffers: Uint8Array[] = [];
for (const canvasElement of canvasElements) {
// Screenshot the canvas element as WebP
const screenshotBuffer = (await canvasElement.screenshot({
type: 'webp',
quality: quality,
encoding: 'binary'
})) as Buffer;
webpBuffers.push(new Uint8Array(screenshotBuffer));
}
await page.close();
return webpBuffers;
}
/**
* Converts a PDF to progressive JPEG bytes for each page.
* This method creates progressive JPEG images that load in multiple passes,
* showing a low-quality preview first, then progressively improving.
* Uses SmartJimp for true progressive JPEG encoding.
*/
public async convertPDFToJpegBytes(
pdfBytes: Uint8Array,
options: {
scale?: number; // Scale factor for output size (default: 3.0 for 216 DPI)
quality?: number; // JPEG quality 0-100 (default: 85)
maxWidth?: number; // Maximum width in pixels (optional)
maxHeight?: number; // Maximum height in pixels (optional)
} = {}
): Promise<Uint8Array[]> {
// First, convert PDF to PNG using our existing method
const pngBuffers = await this.convertPDFToPngBytes(pdfBytes, {
scale: options.scale,
maxWidth: options.maxWidth,
maxHeight: options.maxHeight
});
// Initialize SmartJimp in sharp mode for progressive JPEG support
const smartJimpInstance = new plugins.smartjimp.SmartJimp({ mode: 'sharp' });
// Convert each PNG to progressive JPEG
const jpegBuffers: Uint8Array[] = [];
const quality = options.quality || 85;
for (const pngBuffer of pngBuffers) {
// Convert PNG buffer to progressive JPEG
const jpegBuffer = await smartJimpInstance.computeAssetVariation(
Buffer.from(pngBuffer),
{
format: 'jpeg',
progressive: true,
// TODO: SmartJimp may use a different quality scale; verify whether an adjustment is needed.
// For now, the quality value is passed through unchanged.
quality
}
);
jpegBuffers.push(new Uint8Array(jpegBuffer));
}
return jpegBuffers;
}
}

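The scale handling added in this file combines three ideas: a DPI-to-scale conversion (PDF units are 72 per inch), a default scale of 3.0, and optional `maxWidth`/`maxHeight` caps that can only lower the scale. The following is a minimal standalone sketch of that arithmetic, not the code that runs in the browser template; `ScaleOptions`, `scaleForDpi`, and `computeFinalScale` are hypothetical names, and the page size is assumed to be known in PDF points (the size at scale 1.0).

```typescript
// Hypothetical standalone version of the scale logic; all names are illustrative.
interface ScaleOptions {
  scale?: number;     // requested scale factor (defaults to 3.0, about 216 DPI)
  maxWidth?: number;  // optional cap on output width in pixels
  maxHeight?: number; // optional cap on output height in pixels
}

const PDF_BASE_DPI = 72; // one PDF point is 1/72 inch

// Mirrors the idea behind SmartPdf.getScaleForDPI: scale = desiredDPI / 72.
function scaleForDpi(dpi: number): number {
  return dpi / PDF_BASE_DPI;
}

// baseWidth/baseHeight are the page dimensions at scale 1.0 (in PDF points).
function computeFinalScale(
  baseWidth: number,
  baseHeight: number,
  options: ScaleOptions = {}
): number {
  const scale = options.scale || 3.0;
  let finalScale = scale;
  // Shrink to fit the width cap if the scaled page would exceed it.
  if (options.maxWidth && baseWidth * scale > options.maxWidth) {
    finalScale = options.maxWidth / baseWidth;
  }
  // Shrink further if the height cap is the tighter constraint.
  if (options.maxHeight && baseHeight * scale > options.maxHeight) {
    finalScale = Math.min(finalScale, options.maxHeight / baseHeight);
  }
  return finalScale;
}

// US Letter is 612 x 792 points.
console.log(scaleForDpi(216));                               // 3
console.log(computeFinalScale(612, 792));                    // 3 (default scale)
console.log(computeFinalScale(612, 792, { maxWidth: 1224 })); // 2
```

Because each cap is applied with a "shrink only" rule, a page that already fits within `maxWidth` and `maxHeight` keeps the requested scale, and the aspect ratio is preserved in every case.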

@@ -13,6 +13,7 @@ import * as smartpath from '@push.rocks/smartpath';
import * as smartpuppeteer from '@push.rocks/smartpuppeteer';
import * as smartnetwork from '@push.rocks/smartnetwork';
import * as smartunique from '@push.rocks/smartunique';
import * as smartjimp from '@push.rocks/smartjimp';
export {
smartbuffer,
@@ -23,6 +24,7 @@ export {
smartpuppeteer,
smartunique,
smartnetwork,
smartjimp,
};
// tsclass scope