feat(smartpdf): add automatic port allocation and multi-instance support

2025-08-01 16:09:17 +00:00
parent f535eacd97
commit a4c3415838
7 changed files with 2102 additions and 2176 deletions
--- a/readme.md
+++ b/readme.md
@@ -1,8 +1,8 @@
 # @push.rocks/smartpdf
-Create PDFs on the fly
+Create PDFs on the fly from HTML, websites, or existing PDFs with advanced features like text extraction, PDF merging, and PNG conversion.

 ## Install
-To install `@push.rocks/smartpdf`, use the following command with npm:
+To install `@push.rocks/smartpdf`, use npm or yarn:

 ```bash
 npm install @push.rocks/smartpdf --save
@@ -14,87 +14,304 @@ Or with yarn:
 yarn add @push.rocks/smartpdf
 ```

+## Requirements
+This package uses Puppeteer for rendering, so a Chrome or Chromium installation must be available on the system. The package detects and uses a suitable executable automatically.
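
Executable detection can be thought of as a search over likely install locations. The sketch below is a hypothetical illustration, not the package's actual logic: the helper name and the candidate list are assumptions (common default paths on Linux and macOS), and the existence check is injected so the search can run without touching the file system.

```typescript
// Hypothetical candidate list: common default install paths, not the package's own list.
const CANDIDATE_CHROME_PATHS: readonly string[] = [
  '/usr/bin/google-chrome',
  '/usr/bin/chromium',
  '/usr/bin/chromium-browser',
  '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
];

// Returns the first candidate for which `exists` reports true, or null if none match.
// In Node you could pass `fs.existsSync` as `exists`.
function findChromeExecutable(
  exists: (path: string) => boolean,
  candidates: readonly string[] = CANDIDATE_CHROME_PATHS,
): string | null {
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  return null;
}
```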
+
 ## Usage
-This documentation will guide you through using `@push.rocks/smartpdf` to create PDFs in various ways, such as from HTML strings or full web pages, and provides examples on how to merge multiple PDFs into one. Remember, all examples provided here use ESM syntax and TypeScript.
+`@push.rocks/smartpdf` provides an interface for generating and manipulating PDFs. All examples use ESM syntax and TypeScript.

 ### Getting Started
-First, ensure you have the package installed and you can import it into your TypeScript project:
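+First, import the necessary classes. The examples that save files also use Node's `fs` promises API:

 ```typescript
 import { SmartPdf, IPdf } from '@push.rocks/smartpdf';
+import { promises as fs } from 'fs';
 ```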

-### Creating a PDF from an HTML String
-To create a PDF from a simple HTML string, you’ll need to instantiate `SmartPdf` and call `getA4PdfResultForHtmlString`.
+### Basic Setup with Automatic Port Allocation
+SmartPdf automatically finds an available port between 20000 and 30000 for its internal server:
+
+```typescript
+async function setupSmartPdf() {
+  const smartPdf = await SmartPdf.create();
+  await smartPdf.start();
+  
+  // Your PDF operations here
+  
+  await smartPdf.stop();
+}
+```
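
The allocation step can be sketched with Node's `net` module: try to bind a throwaway server to each port in the range and keep the first one that succeeds. This is a hypothetical illustration (`isPortFree` and `findFreePort` are not part of the package's API), and SmartPdf may probe ports in a different order, for example randomly instead of sequentially.

```typescript
import { createServer } from 'node:net';

// Resolves true if a TCP server can bind `port` on localhost, false otherwise.
function isPortFree(port: number, host = '127.0.0.1'): Promise<boolean> {
  return new Promise((resolve) => {
    const server = createServer();
    server.once('error', () => resolve(false));
    server.listen(port, host, () => {
      // Binding worked, so the port was free; release it again.
      server.close(() => resolve(true));
    });
  });
}

// Scans [start, end] in ascending order and returns the first free port.
// `probe` is injectable so the scan can be tested without real sockets.
async function findFreePort(
  start = 20000,
  end = 30000,
  probe: (port: number) => Promise<boolean> = isPortFree,
): Promise<number> {
  for (let port = start; port <= end; port++) {
    if (await probe(port)) {
      return port;
    }
  }
  throw new Error(`No free port between ${start} and ${end}`);
}
```

Probe-then-bind is inherently racy: another process can take the port between the check and the real `listen`, so a robust server should still handle `EADDRINUSE` on startup.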
+
+### Advanced Setup with Custom Port Configuration
+You can specify custom port settings to avoid conflicts or meet specific requirements:
+
+```typescript
+// Use a specific port
+const fixedPortPdf = await SmartPdf.create({ port: 3000 });
+
+// Or use a custom port range
+const rangedPdf = await SmartPdf.create({
+  portRangeStart: 4000,
+  portRangeEnd: 5000
+});
+
+// The server will find an available port in your specified range
+await rangedPdf.start();
+console.log(`Server running on port: ${rangedPdf.serverPort}`);
+await rangedPdf.stop();
+```
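
One way to picture how these options combine is to resolve them into a single inclusive range to scan. The sketch below is hypothetical: the precedence rule (an explicit `port` wins over a range) and the fallback to the documented 20000 to 30000 default are assumptions, and the package's behavior when a fixed port is already taken is not covered here.

```typescript
// Hypothetical option shape mirroring the fields used above.
interface PortOptions {
  port?: number;
  portRangeStart?: number;
  portRangeEnd?: number;
}

// Turns the options into an inclusive [start, end] range to scan.
// Assumption: `port` takes precedence; missing bounds fall back to 20000-30000.
function resolvePortRange(options: PortOptions = {}): [number, number] {
  if (options.port !== undefined) {
    return [options.port, options.port];
  }
  const start = options.portRangeStart ?? 20000;
  const end = options.portRangeEnd ?? 30000;
  if (start > end) {
    throw new RangeError(`portRangeStart (${start}) must not exceed portRangeEnd (${end})`);
  }
  return [start, end];
}
```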
+
+### Creating PDFs from HTML Strings
+Generate PDFs from HTML content with full CSS support:

 ```typescript
 async function createPdfFromHtml() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
-  const htmlString = `<h1>Hello World</h1>`;
+  
+  const htmlString = `
+    <!DOCTYPE html>
+    <html>
+      <head>
+        <style>
+          body { font-family: Arial, sans-serif; margin: 40px; }
+          h1 { color: #333; }
+          .highlight { background-color: yellow; }
+        </style>
+      </head>
+      <body>
+        <h1>Professional PDF Document</h1>
+        <p>This PDF was generated from <span class="highlight">HTML content</span>.</p>
+      </body>
+    </html>
+  `;
+  
  const pdf: IPdf = await smartPdf.getA4PdfResultForHtmlString(htmlString);
-  console.log(pdf.buffer); // This is your PDF buffer
+  
+  // pdf.buffer contains the PDF data
+  // pdf.id contains a unique identifier
+  // pdf.name contains the filename
+  // pdf.metadata contains additional information like extracted text
+  
  await smartPdf.stop();
 }
-createPdfFromHtml();
 ```
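
When the HTML string is assembled from application data, escaping that data keeps the document well-formed, since a stray `<` or `&` can otherwise break the markup. A minimal sketch with hypothetical helper names (`escapeHtml`, `buildReportHtml`); the returned string is the kind of input `getA4PdfResultForHtmlString` expects.

```typescript
// Escapes the five characters that are significant in HTML text and attributes.
// '&' is replaced first so the entities produced later are not double-escaped.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: builds a complete HTML document with embedded CSS,
// escaping the caller-supplied title and paragraphs.
function buildReportHtml(
  title: string,
  paragraphs: string[],
  css = 'body { font-family: Arial, sans-serif; margin: 40px; }',
): string {
  const body = paragraphs.map((p) => `<p>${escapeHtml(p)}</p>`).join('\n');
  return `<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>${escapeHtml(title)}</title>
    <style>${css}</style>
  </head>
  <body>
    <h1>${escapeHtml(title)}</h1>
    ${body}
  </body>
</html>`;
}
```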

-### Generating a PDF from a Website
-You may want to capture a full webpage as a PDF. `SmartPdf` provides two methods to accomplish this. One captures the viewable area as an A4 pdf, and the other captures the entire webpage.
+### Generating PDFs from Websites
+Capture web pages as PDFs with two different approaches:

-#### A4 PDF from a Website
+#### A4 Format PDF from Website
+Captures the viewable area formatted for A4 paper:

 ```typescript
 async function createA4PdfFromWebsite() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
+  
  const pdf: IPdf = await smartPdf.getPdfResultForWebsite('https://example.com');
-  console.log(pdf.buffer); // PDF buffer of the webpage
+  
+  // Save to file
+  await fs.writeFile('website-a4.pdf', pdf.buffer);
+  
  await smartPdf.stop();
 }
-createA4PdfFromWebsite();
 ```

-#### Full Webpage as a Single PDF
+#### Full Webpage as Single PDF
+Captures the entire webpage in a single PDF, regardless of length:

 ```typescript
 async function createFullPdfFromWebsite() {
  const smartPdf = await SmartPdf.create();
  await smartPdf.start();
+  
  const pdf: IPdf = await smartPdf.getFullWebsiteAsSinglePdf('https://example.com');
-  console.log(pdf.buffer); // PDF buffer with the full webpage
+  
+  // This captures the entire scrollable area
+  await fs.writeFile('website-full.pdf', pdf.buffer);
+  
  await smartPdf.stop();
 }
-createFullPdfFromWebsite();
 ```

 ### Merging Multiple PDFs
-If you have multiple PDF objects (`IPdf`) that you wish to merge into a single PDF file, you can use the `mergePdfs` method.
+Combine multiple PDF files into a single document:

 ```typescript
 async function mergePdfs() {
  const smartPdf = await SmartPdf.create();
-  // Assume pdf1 and pdf2 are objects of type IPdf that you want to merge
-  const mergedPdf: IPdf = await smartPdf.mergePdfs([pdf1, pdf2]);
-  console.log(mergedPdf.buffer); // Buffer of the merged PDF
+  await smartPdf.start();
+  
+  // Create or load your PDFs
+  const pdf1 = await smartPdf.getA4PdfResultForHtmlString('<h1>Document 1</h1>');
+  const pdf2 = await smartPdf.getA4PdfResultForHtmlString('<h1>Document 2</h1>');
+  const pdf3 = await smartPdf.readFileToPdfObject('./existing-document.pdf');
+  
+  // Merge PDFs in array order: order matters!
+  const mergedPdf: Uint8Array = await smartPdf.mergePdfs([
+    pdf1.buffer,
+    pdf2.buffer,
+    pdf3.buffer
+  ]);
+  
+  // Save the merged PDF
+  await fs.writeFile('merged-document.pdf', mergedPdf);
+  
+  await smartPdf.stop();
 }
-mergePdfs();
 ```
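
Because the merged document follows the order of the input array, files named with page or chapter numbers should be sorted naturally before they are read and merged. A small hypothetical helper using `localeCompare` with numeric collation; you would then load each file with `readFileToPdfObject` in the returned order and pass the `buffer`s to `mergePdfs`.

```typescript
// Hypothetical helper: orders file names "naturally" so that chapter-2.pdf
// comes before chapter-10.pdf (a plain string sort puts chapter-10 first).
function orderForMerge(fileNames: readonly string[]): string[] {
  return [...fileNames].sort((a, b) =>
    a.localeCompare(b, undefined, { numeric: true, sensitivity: 'base' }),
  );
}
```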

-### Reading PDF from Disk and Extracting Text
-To read a PDF from the disk and extract its text content:
+### Reading PDFs and Extracting Text
+Extract text content from existing PDFs:

 ```typescript
-async function readAndExtractFromPdf() {
+async function extractTextFromPdf() {
  const smartPdf = await SmartPdf.create();
-  const pdf: IPdf = await smartPdf.readFileToPdfObject('/path/to/your/pdf/file.pdf');
+  
+  // Read PDF from disk
+  const pdf: IPdf = await smartPdf.readFileToPdfObject('/path/to/document.pdf');
+  
+  // Extract all text
  const extractedText = await smartPdf.extractTextFromPdfBuffer(pdf.buffer);
-  console.log(extractedText);  // Extracted text from the PDF
+  console.log('Extracted text:', extractedText);
+  
+  // The pdf object may also carry the extracted text in pdf.metadata.textExtraction
+  console.log('Metadata:', pdf.metadata);
 }
-readAndExtractFromPdf();
 ```
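
If a `pdf` object already carries text in `metadata.textExtraction`, you can reuse it and only run an extraction when it is missing. A hedged sketch: `PdfLike` is a simplified local stand-in for `IPdf`, `getPdfText` is hypothetical, and with SmartPdf the `extract` callback would wrap `extractTextFromPdfBuffer`.

```typescript
// Minimal local stand-in for the IPdf shape (buffer type simplified).
interface PdfLike {
  buffer: Uint8Array;
  metadata?: { textExtraction?: string };
}

// Hypothetical helper: return text already stored on the object, and only
// fall back to a (possibly expensive) extraction call when it is missing or empty.
async function getPdfText(
  pdf: PdfLike,
  extract: (buffer: Uint8Array) => Promise<string>,
): Promise<string> {
  const cached = pdf.metadata?.textExtraction;
  if (cached !== undefined && cached !== '') {
    return cached;
  }
  return extract(pdf.buffer);
}
```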

-This guide provides a comprehensive overview of generating PDFs using `@push.rocks/smartpdf`. Remember to start and stop your `SmartPdf` instance to properly initialize and clean up resources, especially when working with server-side rendering or capturing web pages.
+### Converting PDF to PNG Images
+Convert each page of a PDF into PNG images:
+
+```typescript
+async function convertPdfToPng() {
+  const smartPdf = await SmartPdf.create();
+  await smartPdf.start();
+  
+  // Load a PDF
+  const pdf = await smartPdf.readFileToPdfObject('./document.pdf');
+  
+  // Convert to PNG images (one per page)
+  const pngImages: Uint8Array[] = await smartPdf.convertPDFToPngBytes(pdf.buffer);
+  
+  // Save each page as a PNG
+  for (const [index, pngBuffer] of pngImages.entries()) {
+    await fs.writeFile(`page-${index + 1}.png`, pngBuffer);
+  }
+  
+  await smartPdf.stop();
+}
+```
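
With `page-1.png` … `page-10.png`, many file listings sort `page-10` before `page-2`. Zero-padding the page number to the width of the page count avoids that. A hypothetical naming helper whose output can be paired index by index with the array returned by `convertPDFToPngBytes`:

```typescript
// Hypothetical helper: zero-pads page numbers so that file listings sort
// in page order (page-02.png before page-10.png).
function pngPageFileNames(pageCount: number, prefix = 'page'): string[] {
  const width = String(pageCount).length;
  return Array.from({ length: pageCount }, (_, i) =>
    `${prefix}-${String(i + 1).padStart(width, '0')}.png`,
  );
}
```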
+
+### Using External Browser Instance
+For advanced use cases, you can provide your own Puppeteer browser instance:
+
+```typescript
+import puppeteer from 'puppeteer';
+
+async function useExternalBrowser() {
+  // Create your own browser instance with custom options
+  const browser = await puppeteer.launch({
+    headless: true,
+    args: ['--no-sandbox', '--disable-setuid-sandbox']
+  });
+  
+  const smartPdf = await SmartPdf.create();
+  await smartPdf.start(browser);
+  
+  // Use SmartPdf normally
+  const pdf = await smartPdf.getA4PdfResultForHtmlString('<h1>Hello</h1>');
+  
+  // SmartPdf will not close the browser when stopping
+  await smartPdf.stop();
+  
+  // You control the browser lifecycle
+  await browser.close();
+}
+```
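
The lifecycle rule above (a browser passed in is left open, one created internally is closed on `stop()`) is a common ownership pattern. A hypothetical, library-independent sketch of it; `ResourceHolder` and `Closable` are illustrative names, not SmartPdf internals.

```typescript
// Anything with an async close(), e.g. a browser handle.
interface Closable {
  close(): Promise<void>;
}

// A resource passed in by the caller is borrowed and left open;
// one launched internally is owned and closed on stop().
class ResourceHolder<T extends Closable> {
  readonly resource: T;
  private readonly owned: boolean;

  private constructor(resource: T, owned: boolean) {
    this.resource = resource;
    this.owned = owned;
  }

  static async acquire<U extends Closable>(
    external: U | undefined,
    launch: () => Promise<U>,
  ): Promise<ResourceHolder<U>> {
    return external !== undefined
      ? new ResourceHolder(external, false)
      : new ResourceHolder(await launch(), true);
  }

  async stop(): Promise<void> {
    if (this.owned) {
      await this.resource.close();
    }
  }
}
```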
+
+### Running Multiple Instances
+Thanks to automatic port allocation, you can run multiple SmartPdf instances simultaneously:
+
+```typescript
+async function runMultipleInstances() {
+  // Each instance automatically finds its own free port
+  const instance1 = await SmartPdf.create();
+  const instance2 = await SmartPdf.create();
+  const instance3 = await SmartPdf.create();
+  
+  // Start all instances
+  await Promise.all([
+    instance1.start(),
+    instance2.start(),
+    instance3.start()
+  ]);
+  
+  console.log(`Instance 1 running on port: ${instance1.serverPort}`);
+  console.log(`Instance 2 running on port: ${instance2.serverPort}`);
+  console.log(`Instance 3 running on port: ${instance3.serverPort}`);
+  
+  // Use instances independently
+  const pdfs = await Promise.all([
+    instance1.getA4PdfResultForHtmlString('<h1>PDF 1</h1>'),
+    instance2.getA4PdfResultForHtmlString('<h1>PDF 2</h1>'),
+    instance3.getA4PdfResultForHtmlString('<h1>PDF 3</h1>')
+  ]);
+  
+  // Clean up all instances
+  await Promise.all([
+    instance1.stop(),
+    instance2.stop(),
+    instance3.stop()
+  ]);
+}
+```
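
For larger batches, one simple way to spread work across several instances is round-robin assignment: job `i` goes to instance `i % n`. A hypothetical helper; each bucket could then be processed sequentially by its own instance while the instances run in parallel via `Promise.all`.

```typescript
// Hypothetical helper: splits jobs across `workerCount` buckets in round-robin
// order, e.g. one bucket of HTML strings per SmartPdf instance.
function distributeRoundRobin<T>(jobs: readonly T[], workerCount: number): T[][] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const buckets: T[][] = Array.from({ length: workerCount }, () => []);
  jobs.forEach((job, index) => {
    buckets[index % workerCount].push(job);
  });
  return buckets;
}
```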
+
+### Error Handling
+Always wrap SmartPdf operations in try-catch blocks and ensure proper cleanup:
+
+```typescript
+async function safePdfGeneration() {
+  let smartPdf: SmartPdf | undefined;
+  
+  try {
+    smartPdf = await SmartPdf.create();
+    await smartPdf.start();
+    
+    const pdf = await smartPdf.getA4PdfResultForHtmlString('<h1>Hello</h1>');
+    // Process PDF...
+    
+  } catch (error) {
+    console.error('PDF generation failed:', error);
+    // Handle error appropriately
+  } finally {
+    // Always cleanup
+    if (smartPdf) {
+      await smartPdf.stop();
+    }
+  }
+}
+```
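
The create/start/try/finally/stop sequence can be folded into a reusable helper. A hypothetical sketch against a minimal `Startable` shape; assuming SmartPdf's `start()` and `stop()` fit that shape, it could be called as `withStarted(() => SmartPdf.create(), (pdf) => pdf.getA4PdfResultForHtmlString('<h1>Hello</h1>'))`.

```typescript
// Minimal shape of anything that needs start()/stop(), such as a SmartPdf instance.
interface Startable {
  start(): Promise<void>;
  stop(): Promise<void>;
}

// Hypothetical helper: creates a resource, starts it, runs `work` with it, and
// always calls stop() afterwards, even when start() or `work` throws.
async function withStarted<R extends Startable, T>(
  create: () => Promise<R>,
  work: (resource: R) => Promise<T>,
): Promise<T> {
  const resource = await create();
  try {
    await resource.start();
    return await work(resource);
  } finally {
    await resource.stop();
  }
}
```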
+
+### IPdf Interface
+The `IPdf` interface represents a PDF with its metadata:
+
+```typescript
+interface IPdf {
+  name: string;           // Filename of the PDF
+  buffer: Buffer;         // PDF content as buffer
+  id: string | null;      // Unique identifier
+  metadata?: {
+    textExtraction?: string;  // Extracted text content
+  };
+}
+```
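
Because `buffer` holds raw bytes, a quick check before writing or merging can catch cases where something other than a PDF ended up in it. A hedged sketch with a hypothetical helper name; it relies only on the fact that a PDF file's header starts with `%PDF-`, and it accepts a Node `Buffer` because `Buffer` is a `Uint8Array` subclass.

```typescript
// "%PDF-" as bytes; a PDF header line starts with this marker, followed by a version such as 1.7.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: cheap sanity check on the first bytes only.
// It does not validate the rest of the file.
function looksLikePdf(bytes: Uint8Array): boolean {
  return bytes.length >= PDF_MAGIC.length && PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}
```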
+
+## Best Practices
+
+1. **Always start and stop**: Initialize with `start()` and clean up with `stop()` to properly manage resources.
+2. **Port management**: Rely on automatic port allocation to avoid conflicts when running multiple instances.
+3. **Error handling**: Always implement proper error handling, as PDF generation can fail for various reasons.
+4. **Resource cleanup**: Ensure `stop()` is called even if an error occurs (for example in a `finally` block) so the internal server and any browser SmartPdf launched are released.
+5. **HTML optimization**: When creating PDFs from HTML, ensure your HTML is well-formed and CSS is embedded or inlined.

 ## License and Legal Information

@@ -113,4 +330,4 @@ Registered at District court Bremen HRB 35230 HB, Germany

 For any legal inquiries or if you require further information, please contact us via email at hello@task.vc.

-By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.
+By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.