Juergen Kunz 6a4aeed3e1 BREAKING CHANGE(smartpdf): improve image generation quality and API consistency
- Renamed convertPDFToWebpPreviews to convertPDFToWebpBytes for consistency
- Added configurable scale options with DPI support
- Changed default scale to 3.0 (216 DPI) for better quality
- Added DPI helper methods and scale constants
2025-08-02 12:37:48 +00:00

@push.rocks/smartpdf

Create PDFs on the fly from HTML, websites, or existing PDFs with advanced features like text extraction, PDF merging, and PNG conversion.

Install

To install @push.rocks/smartpdf, use npm or yarn:

npm install @push.rocks/smartpdf --save

Or with yarn:

yarn add @push.rocks/smartpdf

Requirements

This package requires a Chrome or Chromium installation to be available on the system, as it uses Puppeteer for rendering. The package will automatically detect and use the appropriate executable.

Usage

@push.rocks/smartpdf provides a powerful interface for PDF generation and manipulation. All examples use ESM syntax and TypeScript.

Getting Started

First, import the necessary classes:

import { SmartPdf, type IPdf } from '@push.rocks/smartpdf';

Basic Setup with Automatic Port Allocation

SmartPdf automatically finds an available port between 20000 and 30000 for its internal server:

async function setupSmartPdf() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  // Your PDF operations here
  
  await smartPdf.stop();
}

Advanced Setup with Custom Port Configuration

You can specify custom port settings to avoid conflicts or meet specific requirements:

// Use a specific port
const fixedPortPdf = await SmartPdf.create({ port: 3000 });

// Or use a custom port range (a separate instance, hence a separate variable name)
const smartPdf = await SmartPdf.create({ 
  portRangeStart: 4000, 
  portRangeEnd: 5000 
});

// The server will find an available port in your specified range
await smartPdf.start();
console.log(`Server running on port: ${smartPdf.serverPort}`);
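
How SmartPdf scans the range internally is not described in this README. As a rough sketch of the general idea only, the hypothetical helper below (`findFreePort` and its `isFree` probe are illustrative names, not part of the SmartPdf API) walks a range in ascending order and returns the first port the probe accepts. In a real setup the probe might attempt to bind a `node:net` server to the port.

```typescript
// Hypothetical sketch: not SmartPdf's actual implementation.
// The probe decides whether a port is usable; it is injected so the
// scanning logic stays independent of any networking API.
type PortProbe = (port: number) => Promise<boolean>;

async function findFreePort(
  rangeStart: number,
  rangeEnd: number,
  isFree: PortProbe
): Promise<number> {
  if (rangeStart > rangeEnd) {
    throw new Error(`Invalid port range: ${rangeStart}-${rangeEnd}`);
  }
  // Try ports in ascending order and return the first one the probe accepts.
  for (let port = rangeStart; port <= rangeEnd; port++) {
    if (await isFree(port)) {
      return port;
    }
  }
  throw new Error(`No free port found in range ${rangeStart}-${rangeEnd}`);
}
```

Injecting the probe keeps the sketch testable without opening sockets; the range bounds correspond to the portRangeStart and portRangeEnd options shown above.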

Creating PDFs from HTML Strings

Generate PDFs from HTML content with full CSS support:

async function createPdfFromHtml() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  const htmlString = `
    <!DOCTYPE html>
    <html>
      <head>
        <style>
          body { font-family: Arial, sans-serif; margin: 40px; }
          h1 { color: #333; }
          .highlight { background-color: yellow; }
        </style>
      </head>
      <body>
        <h1>Professional PDF Document</h1>
        <p>This PDF was generated from <span class="highlight">HTML content</span>.</p>
      </body>
    </html>
  `;
  
  const pdf: IPdf = await smartPdf.getA4PdfResultForHtmlString(htmlString);
  
  // pdf.buffer contains the PDF data
  // pdf.id contains a unique identifier
  // pdf.name contains the filename
  // pdf.metadata contains additional information like extracted text
  
  await smartPdf.stop();
}

Generating PDFs from Websites

Capture web pages as PDFs with two different approaches:

A4 Format PDF from Website

Captures the viewable area formatted for A4 paper:

import * as fs from 'node:fs/promises';

async function createA4PdfFromWebsite() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  const pdf: IPdf = await smartPdf.getPdfResultForWebsite('https://example.com');
  
  // Save to file
  await fs.writeFile('website-a4.pdf', pdf.buffer);
  
  await smartPdf.stop();
}

Full Webpage as Single PDF

Captures the entire webpage in a single PDF, regardless of length:

import * as fs from 'node:fs/promises';

async function createFullPdfFromWebsite() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  const pdf: IPdf = await smartPdf.getFullWebsiteAsSinglePdf('https://example.com');
  
  // This captures the entire scrollable area
  await fs.writeFile('website-full.pdf', pdf.buffer);
  
  await smartPdf.stop();
}

Merging Multiple PDFs

Combine multiple PDF files into a single document:

import * as fs from 'node:fs/promises';

async function mergePdfs() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  // Create or load your PDFs
  const pdf1 = await smartPdf.getA4PdfResultForHtmlString('<h1>Document 1</h1>');
  const pdf2 = await smartPdf.getA4PdfResultForHtmlString('<h1>Document 2</h1>');
  const pdf3 = await smartPdf.readFileToPdfObject('./existing-document.pdf');
  
  // Merge PDFs - order matters!
  const mergedPdf: Uint8Array = await smartPdf.mergePdfs([
    pdf1.buffer,
    pdf2.buffer,
    pdf3.buffer
  ]);
  
  // Save the merged PDF
  await fs.writeFile('merged-document.pdf', mergedPdf);
  
  await smartPdf.stop();
}

Reading PDFs and Extracting Text

Extract text content from existing PDFs:

async function extractTextFromPdf() {
  const smartPdf = await SmartPdf.create();
  
  // Read PDF from disk
  const pdf: IPdf = await smartPdf.readFileToPdfObject('/path/to/document.pdf');
  
  // Extract all text
  const extractedText = await smartPdf.extractTextFromPdfBuffer(pdf.buffer);
  console.log('Extracted text:', extractedText);
  
  // The pdf object also contains metadata with text extraction
  console.log('Metadata:', pdf.metadata);
}

Converting PDF to PNG Images

Convert each page of a PDF into PNG images with configurable quality:

import * as fs from 'node:fs';

async function convertPdfToPng() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  // Load a PDF
  const pdf = await smartPdf.readFileToPdfObject('./document.pdf');
  
  // Convert to PNG images with default high quality (216 DPI)
  const pngImages: Uint8Array[] = await smartPdf.convertPDFToPngBytes(pdf.buffer);
  
  // Or specify custom scale/DPI
  const highResPngs = await smartPdf.convertPDFToPngBytes(pdf.buffer, {
    scale: SmartPdf.SCALE_PRINT,  // 6.0 scale = ~432 DPI
    maxWidth: 3000,               // Optional: limit maximum width
    maxHeight: 4000               // Optional: limit maximum height
  });
  
  // Save each page as a PNG
  pngImages.forEach((pngBuffer, index) => {
    fs.writeFileSync(`page-${index + 1}.png`, pngBuffer);
  });
  
  await smartPdf.stop();
}

Understanding Scale and DPI

PDF.js renders at 72 DPI at a scale of 1.0, so the effective DPI is the scale factor multiplied by 72. Use these scale factors for different quality levels:

  • SmartPdf.SCALE_SCREEN (2.0): ~144 DPI - Good for screen display
  • SmartPdf.SCALE_HIGH (3.0): ~216 DPI - High quality (default)
  • SmartPdf.SCALE_PRINT (6.0): ~432 DPI - Print quality
  • Custom DPI: scale = SmartPdf.getScaleForDPI(300) for 300 DPI
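
The relationship between scale and DPI is plain arithmetic on the 72 DPI baseline. The sketch below spells it out; `scaleForDpi` and `dpiForScale` are hypothetical local helpers written for illustration and are not claimed to match the implementation of SmartPdf.getScaleForDPI.

```typescript
// Illustrative helpers only: they mirror the arithmetic described above
// (72 DPI at scale 1.0) and are not part of the SmartPdf API.
const BASE_DPI = 72;

// Scale factor needed to reach a target DPI.
function scaleForDpi(dpi: number): number {
  if (!(dpi > 0)) {
    throw new RangeError('dpi must be a positive number');
  }
  return dpi / BASE_DPI;
}

// Effective DPI produced by a given scale factor.
function dpiForScale(scale: number): number {
  if (!(scale > 0)) {
    throw new RangeError('scale must be a positive number');
  }
  return scale * BASE_DPI;
}

console.log(dpiForScale(3.0));            // 216
console.log(scaleForDpi(300).toFixed(2)); // "4.17"
```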

Converting PDF to WebP Images

Generate web-optimized images using the WebP format. WebP files are typically around 25-35% smaller than comparable PNG or JPEG images at similar visual quality:

import * as fs from 'node:fs';

async function createWebPImages() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  // Load a PDF
  const pdf = await smartPdf.readFileToPdfObject('./document.pdf');
  
  // Create high-quality WebP images (default: 3.0 scale = 216 DPI, 85% quality)
  const webpImages = await smartPdf.convertPDFToWebpBytes(pdf.buffer);
  
  // Save WebP images
  webpImages.forEach((webpBuffer, index) => {
    fs.writeFileSync(`page-${index + 1}.webp`, webpBuffer);
  });
  
  await smartPdf.stop();
}

Creating Thumbnails

Generate small thumbnail images for PDF galleries or document lists:

import * as fs from 'node:fs';

async function createThumbnails() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  const pdf = await smartPdf.readFileToPdfObject('./document.pdf');
  
  // Create small thumbnails (0.5 scale = ~36 DPI, 70% quality)
  const thumbnails = await smartPdf.convertPDFToWebpBytes(pdf.buffer, {
    scale: 0.5,      // Small readable thumbnails
    quality: 70      // Lower quality for smaller files
  });
  
  // Save thumbnails
  thumbnails.forEach((thumb, index) => {
    fs.writeFileSync(`thumb-${index + 1}.webp`, thumb);
  });
  
  await smartPdf.stop();
}

Constrained Dimensions

Create previews with maximum width/height constraints, useful for responsive layouts:

import * as fs from 'node:fs';

async function createConstrainedPreviews() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
  
  const pdf = await smartPdf.readFileToPdfObject('./document.pdf');
  
  // Create previews that fit within 800x600 pixels
  const previews = await smartPdf.convertPDFToWebpBytes(pdf.buffer, {
    scale: 1.0,          // Start with full size
    quality: 90,         // High quality
    maxWidth: 800,       // Maximum 800px wide
    maxHeight: 600       // Maximum 600px tall
  });
  
  // The method automatically scales down to fit within constraints
  previews.forEach((preview, index) => {
    fs.writeFileSync(`preview-constrained-${index + 1}.webp`, preview);
  });
  
  await smartPdf.stop();
}
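
To make the "scales down to fit" behaviour concrete, here is a minimal sketch of one way such a constraint could be computed. It assumes the requested scale is simply capped so that neither dimension exceeds its limit; the library's actual logic may differ, and `fitScale` and `FitOptions` are hypothetical names.

```typescript
// Hypothetical sketch of fit-within-bounds scaling; SmartPdf's internal
// logic may differ. Page sizes are in PDF points (1/72 inch).
interface FitOptions {
  scale: number;
  maxWidth?: number;
  maxHeight?: number;
}

function fitScale(pageWidthPt: number, pageHeightPt: number, opts: FitOptions): number {
  let effectiveScale = opts.scale;
  // Cap the scale so the rendered width stays within maxWidth.
  if (opts.maxWidth !== undefined) {
    effectiveScale = Math.min(effectiveScale, opts.maxWidth / pageWidthPt);
  }
  // Cap the scale so the rendered height stays within maxHeight.
  if (opts.maxHeight !== undefined) {
    effectiveScale = Math.min(effectiveScale, opts.maxHeight / pageHeightPt);
  }
  return effectiveScale;
}

// A4 portrait is roughly 595 x 842 points.
const a4Scale = fitScale(595, 842, { scale: 1.0, maxWidth: 800, maxHeight: 600 });
const widthPx = Math.round(595 * a4Scale);
const heightPx = Math.round(842 * a4Scale);
console.log(widthPx, heightPx); // 424 600
```

Under this assumption an A4 portrait page is limited by the 600 pixel height, so the width ends up well below the 800 pixel maximum.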

WebP Options

The convertPDFToWebpBytes method accepts these options:

  • scale: Scale factor for preview size (default: 3.0 for ~216 DPI)
  • quality: WebP compression quality (default: 85, range: 0-100)
  • maxWidth: Maximum width in pixels (optional)
  • maxHeight: Maximum height in pixels (optional)

Common scale values:

  • 0.5: Thumbnails (~36 DPI)
  • 2.0: Screen display (~144 DPI)
  • 3.0: High quality (~216 DPI, default)
  • 6.0: Print quality (~432 DPI)
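
As an illustration of how the documented defaults combine with caller-supplied values, the following sketch resolves an options object. The default values (scale 3.0, quality 85) and the 0-100 quality range come from the list above; the `resolveWebpOptions` function, the interface names, and the validation behaviour are assumptions made for this example.

```typescript
// Hypothetical option resolution. Only the default values and the quality
// range are taken from the documentation above; everything else is illustrative.
interface WebpOptions {
  scale?: number;
  quality?: number;
  maxWidth?: number;
  maxHeight?: number;
}

interface ResolvedWebpOptions {
  scale: number;
  quality: number;
  maxWidth: number | undefined;
  maxHeight: number | undefined;
}

function resolveWebpOptions(options: WebpOptions = {}): ResolvedWebpOptions {
  const scale = options.scale ?? 3.0;   // documented default (~216 DPI)
  const quality = options.quality ?? 85; // documented default
  if (!(scale > 0)) {
    throw new RangeError('scale must be greater than 0');
  }
  if (!(quality >= 0 && quality <= 100)) {
    throw new RangeError('quality must be between 0 and 100');
  }
  return { scale, quality, maxWidth: options.maxWidth, maxHeight: options.maxHeight };
}
```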

Using External Browser Instance

For advanced use cases, you can provide your own Puppeteer browser instance:

import puppeteer from 'puppeteer';

async function useExternalBrowser() {
  // Create your own browser instance with custom options
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  
  const smartPdf = await SmartPdf.create();
  await smartPdf.start(browser);
  
  // Use SmartPdf normally
  const pdf = await smartPdf.getA4PdfResultForHtmlString('<h1>Hello</h1>');
  
  // SmartPdf will not close the browser when stopping
  await smartPdf.stop();
  
  // You control the browser lifecycle
  await browser.close();
}

Running Multiple Instances

Thanks to automatic port allocation, you can run multiple SmartPdf instances simultaneously:

async function runMultipleInstances() {
  // Each instance automatically finds its own free port
  const instance1 = await SmartPdf.create();
  const instance2 = await SmartPdf.create();
  const instance3 = await SmartPdf.create();
  
  // Start all instances
  await Promise.all([
    instance1.start(),
    instance2.start(),
    instance3.start()
  ]);
  
  console.log(`Instance 1 running on port: ${instance1.serverPort}`);
  console.log(`Instance 2 running on port: ${instance2.serverPort}`);
  console.log(`Instance 3 running on port: ${instance3.serverPort}`);
  
  // Use instances independently
  const pdfs = await Promise.all([
    instance1.getA4PdfResultForHtmlString('<h1>PDF 1</h1>'),
    instance2.getA4PdfResultForHtmlString('<h1>PDF 2</h1>'),
    instance3.getA4PdfResultForHtmlString('<h1>PDF 3</h1>')
  ]);
  
  // Clean up all instances
  await Promise.all([
    instance1.stop(),
    instance2.stop(),
    instance3.stop()
  ]);
}
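
When a batch of documents has to be spread over several instances, one simple approach is round-robin assignment. The helper below is a generic sketch with a hypothetical name (`distributeRoundRobin`); it knows nothing about SmartPdf and only partitions a job list.

```typescript
// Hypothetical helper: splits a list of jobs into one bucket per worker,
// assigning jobs in round-robin order.
function distributeRoundRobin<T>(jobs: T[], workerCount: number): T[][] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const buckets: T[][] = Array.from({ length: workerCount }, () => [] as T[]);
  jobs.forEach((job, index) => {
    buckets[index % workerCount].push(job);
  });
  return buckets;
}
```

Each bucket could then be handed to one instance, for example by calling getA4PdfResultForHtmlString for every HTML string in that bucket.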

Error Handling

Always wrap SmartPdf operations in try-catch blocks and ensure proper cleanup:

async function safePdfGeneration() {
  // Declared as possibly undefined so the check in `finally` type-checks
  let smartPdf: SmartPdf | undefined;
  
  try {
    smartPdf = await SmartPdf.create();
    await smartPdf.start();
    
    const pdf = await smartPdf.getA4PdfResultForHtmlString('<h1>Hello</h1>');
    // Process PDF...
    
  } catch (error) {
    console.error('PDF generation failed:', error);
    // Handle error appropriately
  } finally {
    // Always cleanup
    if (smartPdf) {
      await smartPdf.stop();
    }
  }
}
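
If the try/finally pattern above is repeated in many places, it can be factored into a small wrapper. The sketch below is one possible shape; `withStarted` and the `Lifecycle` interface are hypothetical and not provided by the package. It relies only on the start() and stop() methods used throughout this README.

```typescript
// Minimal shape of anything with a start/stop lifecycle. SmartPdf's
// start() and stop(), as used above, would fit this shape.
interface Lifecycle {
  start(): Promise<unknown>;
  stop(): Promise<unknown>;
}

// Hypothetical helper, not part of the SmartPdf API: creates an instance,
// starts it, runs the work, and always calls stop() afterwards.
async function withStarted<T extends Lifecycle, R>(
  create: () => Promise<T>,
  work: (instance: T) => Promise<R>
): Promise<R> {
  const instance = await create();
  try {
    await instance.start();
    return await work(instance);
  } finally {
    // Runs whether start() or work() succeeded or threw.
    await instance.stop();
  }
}
```

With SmartPdf this might be used as `withStarted(() => SmartPdf.create(), (s) => s.getA4PdfResultForHtmlString('<h1>Hello</h1>'))`; errors still propagate to the caller, so a surrounding try/catch remains necessary where failures must be handled.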

IPdf Interface

The IPdf interface represents a PDF with its metadata:

interface IPdf {
  name: string;           // Filename of the PDF
  buffer: Buffer;         // PDF content as buffer
  id: string | null;      // Unique identifier
  metadata?: {
    textExtraction?: string;  // Extracted text content
  };
}
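
Because metadata and textExtraction are both optional, code that reads the extracted text needs to handle their absence. A minimal sketch, using a local mirror of the interface (`PdfLike`) and a hypothetical helper (`getExtractedText`), neither of which is exported by the package:

```typescript
// Local mirror of the IPdf shape shown above. Uint8Array is used so the
// sketch has no Node.js type dependency (Buffer is a Uint8Array subclass).
interface PdfLike {
  name: string;
  buffer: Uint8Array;
  id: string | null;
  metadata?: {
    textExtraction?: string;
  };
}

// Hypothetical helper: returns the extracted text, or an empty string
// when no metadata or no text extraction is present.
function getExtractedText(pdf: PdfLike): string {
  return pdf.metadata?.textExtraction?.trim() ?? '';
}
```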

Best Practices

  1. Always start and stop: Initialize with start() and cleanup with stop() to properly manage resources.
  2. Port management: Use the automatic port allocation feature to avoid conflicts when running multiple instances.
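  3. Error handling: Always implement proper error handling, since PDF generation can fail, for example when a page cannot be loaded or the browser fails to launch.
  4. Resource cleanup: Ensure stop() is called even if an error occurs, so that resources such as the browser and the internal server are released.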
  5. HTML optimization: When creating PDFs from HTML, ensure your HTML is well-formed and CSS is embedded or inlined.
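
For the HTML optimization point, one way to keep documents well-formed and self-contained is to assemble them from a template with the CSS placed in a style element. The helpers below are a hypothetical sketch (`escapeHtml` and `buildHtmlDocument` are not part of the package); the resulting string could be passed to getA4PdfResultForHtmlString.

```typescript
// Hypothetical helpers for building a self-contained HTML document.

// Escapes characters that have special meaning in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Wraps trusted body markup and CSS in a complete document.
// The title is escaped; bodyHtml and css are inserted as-is.
function buildHtmlDocument(title: string, bodyHtml: string, css: string): string {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '  <head>',
    '    <meta charset="utf-8">',
    `    <title>${escapeHtml(title)}</title>`,
    `    <style>${css}</style>`,
    '  </head>',
    `  <body>${bodyHtml}</body>`,
    '</html>',
  ].join('\n');
}
```

Any user-supplied text placed into bodyHtml should be passed through escapeHtml first, since the template inserts the body markup unchanged.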

This repository contains open-source code that is licensed under the MIT License. A copy of the MIT License can be found in the license file within this repository.

Please note: The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH and are not included within the scope of the MIT license granted herein. Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines, and any usage must be approved in writing by Task Venture Capital GmbH.

Company Information

Task Venture Capital GmbH
Registered at District court Bremen HRB 35230 HB, Germany

For any legal inquiries or if you require further information, please contact us via email at hello@task.vc.

By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.
