Compare commits

16 Commits

Author SHA1 Message Date
Juergen Kunz
be574df599 feat(image): add progressive JPEG generation support
- Add convertPDFToJpegBytes method for progressive JPEG images
- Integrate @push.rocks/smartjimp for true progressive encoding
- Update readme with comprehensive documentation
- Update legal section to Task Venture Capital GmbH
2025-08-02 17:29:38 +00:00
Juergen Kunz
6a4aeed3e1 BREAKING CHANGE(smartpdf): improve image generation quality and API consistency
- Renamed convertPDFToWebpPreviews to convertPDFToWebpBytes for consistency
- Added configurable scale options with DPI support
- Changed default scale to 3.0 (216 DPI) for better quality
- Added DPI helper methods and scale constants
2025-08-02 12:37:48 +00:00
Juergen Kunz
a4c3415838 feat(smartpdf): add automatic port allocation and multi-instance support 2025-08-01 16:09:17 +00:00
f535eacd97 3.2.2 2025-02-25 18:22:06 +00:00
9908897aa2 fix(SmartPdf): Fix buffer handling for PDF conversion and text extraction 2025-02-25 18:22:06 +00:00
29d3cbb0b6 3.2.1 2025-02-25 18:06:45 +00:00
babc20649a fix(SmartPdf): Fix type for extractTextFromPdfBuffer function 2025-02-25 18:06:45 +00:00
1188643c4b 3.2.0 2025-02-25 18:03:27 +00:00
6b74301588 feat(smartpdf): Improve dependency versions and optimize PDF to PNG conversion. 2025-02-25 18:03:27 +00:00
168527573c 3.1.8 2024-11-30 20:43:05 +01:00
3d7bb37849 fix(core): Fix candidate handling in PDF generation 2024-11-30 20:43:05 +01:00
12a581ced9 3.1.7 2024-09-27 23:21:31 +02:00
857e1717a5 fix(dependencies): Update dependencies to latest versions 2024-09-27 23:21:30 +02:00
186bfb9d12 update description 2024-05-29 14:15:22 +02:00
c5bc354f65 3.1.6 2024-04-30 17:48:12 +02:00
c48bb0428f fix(core): update 2024-04-30 17:48:11 +02:00
10 changed files with 10991 additions and 3600 deletions

154
changelog.md Normal file

@@ -0,0 +1,154 @@
# Changelog
## 2025-08-02 - 4.0.0 - BREAKING CHANGE(smartpdf)
Improve image generation quality and API consistency
- BREAKING: Renamed `convertPDFToWebpPreviews` to `convertPDFToWebpBytes` for API consistency
- Added configurable scale options to `convertPDFToPngBytes` method
- Changed default scale from 1.0 to 3.0 for PNG generation (216 DPI)
- Changed default scale from 0.5 to 3.0 for WebP generation (216 DPI)
- Added DPI helper methods: `getScaleForDPI()` and scale constants (SCALE_SCREEN, SCALE_HIGH, SCALE_PRINT)
- Added maxWidth/maxHeight constraints for both PNG and WebP generation
- Improved test file organization with clear naming conventions
- Updated documentation with DPI/scale guidance and examples
## 2025-08-01 - 3.3.0 - feat(smartpdf)
Add automatic port allocation and multi-instance support
- Added ISmartPdfOptions interface with port configuration options
- Implemented automatic port allocation between 20000-30000 by default
- Added support for custom port ranges via portRangeStart/portRangeEnd options
- Added support for specific port assignment via port option
- Fixed resource cleanup when port allocation fails
- Multiple SmartPdf instances can now run simultaneously without port conflicts
- Updated readme with comprehensive documentation for all features
## 2025-02-25 - 3.2.2 - fix(SmartPdf)
Fix buffer handling for PDF conversion and text extraction
- Ensure Uint8Array is converted to Node Buffer for PDF conversion.
- Correct the PDF page viewport handling by using document dimensions.
- Fix extractTextFromPdfBuffer argument type from Uint8Array to Buffer.
## 2025-02-25 - 3.2.1 - fix(SmartPdf)
Fix type for extractTextFromPdfBuffer function
- Corrected the parameter type from Buffer to Uint8Array for extractTextFromPdfBuffer function.
## 2025-02-25 - 3.2.0 - feat(smartpdf)
Improve dependency versions and optimize PDF to PNG conversion.
- Update several dependencies to newer versions for better stability and performance.
- Refactor tests to enhance readability and add directory creation validations.
- Optimize PDF to PNG conversion by switching to a more efficient Puppeteer and PDF.js-based method.
- Add checks for presence of required dependencies (GraphicsMagick and Ghostscript).
- Fix media emulation issue by properly awaiting the emulateMediaType function.
## 2024-11-30 - 3.1.8 - fix(core)
Fix candidate handling in PDF generation
- Added error handling for missing PDF candidates in server requests.
- Updated devDependencies and dependencies to latest versions for better stability and new features.
- Patched header retrieval logic during PDF generation for security check.
## 2024-09-27 - 3.1.7 - fix(dependencies)
Update dependencies to latest versions
- Updated @git.zone/tsbuild to version ^2.1.84
- Updated @git.zone/tsdoc to version ^1.3.12
- Updated @git.zone/tsrun to version ^1.2.49
- Updated @push.rocks/tapbundle to version ^5.3.0
- Updated @types/node to version ^22.7.4
- Updated @push.rocks/smartfile to version ^11.0.21
- Updated @push.rocks/smartpromise to version ^4.0.4
- Updated @tsclass/tsclass to version ^4.1.2
- Updated express to version ^4.21.0
- Updated pdf2pic to version ^3.1.3
## 2024-05-29 - 3.1.6 - Core
Updated description
- Minor changes to documentation and internal text.
## 2024-04-25 to 2024-04-30 - 3.1.0 to 3.1.5 - Core
Fix updates in core functionality
- Fixes and updates in core function in versions 3.1.0 to 3.1.5.
## 2024-04-25 - 3.0.17 - Feature
Now supports PDF to JPG conversion
- Added support for converting PDF files to JPG format.
## 2024-03-19 to 2024-04-14 - 3.0.17 - Maintenance
Various updates to project configuration files
- Updated `tsconfig`.
- Updated `npmextra.json`.
## 2023-07-11 to 2024-03-19 - 3.0.15 to 3.0.16 - Organization
Switch to new organization scheme and core updates
- Switched to new organization scheme.
- Applied core updates and bug fixes.
## 2022-11-07 to 2023-07-10 - 3.0.13 to 3.0.14 - Core
Fixes and updates to core functionality
- Various minor bug fixes and updates to core components.
## 2022-09-13 to 2022-11-07 - 3.0.10 to 3.0.12 - Core
Ongoing core updates and maintenance
- Regular fixes and operational improvements in core functionalities.
## 2022-06-12 to 2022-09-13 - 3.0.7 to 3.0.9 - Core
Continued focus on high-priority bug fixes and core functionalities
- Regular fixes for critical bugs and enhancements.
## 2022-03-24 to 2022-06-29 - 3.0.3 to 3.0.6 - Core
Further optimization and maintenance releases
- Further improvements and refinements of issues in core functionalities.
## 2022-01-05 to 2022-03-25 - 3.0.0 to 3.0.2 - Major Version Release
Major release for version 3.0.x, including core fixes
- Increased version from 2.x to 3.0. New significant changes and fixes.
## 2022-01-05 to 2022-03-24 - 2.0.13 to 2.0.19 - Core
Routine core updates and bug fixes
- Regular bug fixes in core components.
## 2019-11-19 to 2022-01-06 - 2.0.0 to 2.0.11 - Core
Multiple core updates and a few performance improvements
- Some performance enhancements and multiple bug fixes.
## 2019-11-16 to 2019-11-19 - 1.0.27 to 1.0.29 - API
Breaking change in API
- Naming PDF results to better represent their content.
## 2019-05-29 to 2019-11-15 - 1.0.13 to 1.0.26 - Core
Core functional updates and some major restructuring
- Introduced multiple updates to the core, addressing bugs and improving stability.
## 2019-04-10 to 2019-05-28 - 1.0.4 to 1.0.12 - Core
Fixes and updates in the core
- Implementation of multiple essential fixes for core components.
## 2018-10-06 - 1.0.1 to 1.0.3 - Core and Typings
Initial implementation and core fixes
- Initial implementation of the project.
- Fixed compilation problems in typings.
## 2016-01-29 - unknown - Initial
Initial commit
- Initial commit for the project setup.


@@ -1,6 +1,6 @@
{
"name": "@push.rocks/smartpdf",
"version": "3.1.5",
"version": "4.0.0",
"private": false,
"description": "A library for creating PDFs dynamically from HTML or websites with additional features like merging PDFs.",
"main": "dist_ts/index.js",
@@ -9,33 +9,32 @@
"author": "Lossless GmbH",
"license": "MIT",
"scripts": {
"test": "(tstest test/ --web)",
"build": "(tsbuild --web --allowimplicitany)",
"test": "(tstest test/ --verbose --timeout 120)",
"build": "(tsbuild tsfolders --allowimplicitany)",
"buildDocs": "tsdoc"
},
"devDependencies": {
"@git.zone/tsbuild": "^2.1.66",
"@git.zone/tsdoc": "^1.1.12",
"@git.zone/tsrun": "^1.2.44",
"@git.zone/tstest": "^1.0.77",
"@push.rocks/tapbundle": "^5.0.23",
"@types/node": "^20.12.7"
"@git.zone/tsbuild": "^2.6.4",
"@git.zone/tsdoc": "^1.5.0",
"@git.zone/tsrun": "^1.3.3",
"@git.zone/tstest": "^2.3.2",
"@types/node": "^24.1.0"
},
"dependencies": {
"@push.rocks/smartbuffer": "^3.0.4",
"@push.rocks/smartbuffer": "^3.0.5",
"@push.rocks/smartdelay": "^3.0.5",
"@push.rocks/smartfile": "^11.0.14",
"@push.rocks/smartnetwork": "^3.0.0",
"@push.rocks/smartpath": "^5.0.18",
"@push.rocks/smartpromise": "^4.0.3",
"@push.rocks/smartpuppeteer": "^2.0.2",
"@push.rocks/smartfile": "^11.2.5",
"@push.rocks/smartjimp": "^1.2.0",
"@push.rocks/smartnetwork": "^4.1.2",
"@push.rocks/smartpath": "^6.0.0",
"@push.rocks/smartpromise": "^4.2.3",
"@push.rocks/smartpuppeteer": "^2.0.5",
"@push.rocks/smartunique": "^3.0.9",
"@tsclass/tsclass": "^4.0.54",
"@types/express": "^4.17.21",
"express": "^4.19.2",
"@tsclass/tsclass": "^9.2.0",
"@types/express": "^5.0.3",
"express": "^5.1.0",
"pdf-lib": "^1.17.1",
"pdf2json": "^3.0.5",
"pdf2pic": "^3.1.1"
"pdf2json": "3.2.0"
},
"files": [
"ts/**/*",
@@ -65,5 +64,11 @@
"PDF merging",
"text extraction",
"PDF management"
]
],
"homepage": "https://code.foss.global/push.rocks/smartpdf",
"repository": {
"type": "git",
"url": "https://code.foss.global/push.rocks/smartpdf.git"
},
"packageManager": "pnpm@10.11.0+sha512.6540583f41cc5f628eb3d9773ecee802f4f9ef9923cc45b69890fb47991d4b092964694ec3a4f738a420c918a333062c8b925d312f42e4f0c263eb603551f977"
}

13128
pnpm-lock.yaml generated

File diff suppressed because it is too large

2
pnpm-workspace.yaml Normal file

@@ -0,0 +1,2 @@
onlyBuiltDependencies:
- sharp

435
readme.md

@@ -1,100 +1,409 @@
# @push.rocks/smartpdf
Create PDFs on the fly
# @push.rocks/smartpdf 📄✨
## Install
To install `@push.rocks/smartpdf`, use the following command with npm:
> **Transform HTML, websites, and PDFs into beautiful documents with just a few lines of code!**
[![npm version](https://img.shields.io/npm/v/@push.rocks/smartpdf.svg?style=flat-square)](https://www.npmjs.com/package/@push.rocks/smartpdf)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.x-blue.svg?style=flat-square)](https://www.typescriptlang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](./license)
## 🚀 Why SmartPDF?
SmartPDF is your Swiss Army knife for PDF operations in Node.js. Whether you're generating invoices, creating reports, or converting web pages to PDFs, we've got you covered with a simple, powerful API.
### ✨ Features at a Glance
- 📝 **HTML to PDF** - Transform HTML strings with full CSS support
- 🌐 **Website to PDF** - Capture any website as a perfectly formatted PDF
- 🔀 **PDF Merging** - Combine multiple PDFs into one
- 🖼️ **PDF to Images** - Convert PDFs to PNG, WebP, or progressive JPEG
- 📑 **Text Extraction** - Pull text content from existing PDFs
- 🎯 **Smart Port Management** - Automatic port allocation for concurrent instances
- 💪 **TypeScript First** - Full type safety and IntelliSense support
- **High Performance** - Optimized for speed and reliability
## 📦 Installation
```bash
# Using npm
npm install @push.rocks/smartpdf --save
```
Or with yarn:
```bash
# Using yarn
yarn add @push.rocks/smartpdf
# Using pnpm (recommended)
pnpm add @push.rocks/smartpdf
```
## Usage
This documentation will guide you through using `@push.rocks/smartpdf` to create PDFs in various ways, such as from HTML strings or full web pages, and provides examples on how to merge multiple PDFs into one. Remember, all examples provided here use ESM syntax and TypeScript.
### Getting Started
First, ensure you have the package installed and you can import it into your TypeScript project:
## 🎯 Quick Start
```typescript
import { SmartPdf, IPdf } from '@push.rocks/smartpdf';
import { SmartPdf } from '@push.rocks/smartpdf';
import { promises as fs } from 'fs'; // needed for fs.writeFile below
// Create and start SmartPdf
const smartPdf = await SmartPdf.create();
await smartPdf.start();
// Generate a PDF from HTML
const pdf = await smartPdf.getA4PdfResultForHtmlString(`
<h1>Hello, PDF World! 🌍</h1>
<p>This is my first SmartPDF document.</p>
`);
// Save it
await fs.writeFile('my-first-pdf.pdf', pdf.buffer);
// Don't forget to clean up!
await smartPdf.stop();
```
### Creating a PDF from an HTML String
To create a PDF from a simple HTML string, you'll need to instantiate `SmartPdf` and call `getA4PdfResultForHtmlString`.
## 📚 Core Concepts
### 🏗️ Instance Management
SmartPDF uses a client-server architecture for maximum performance. Always remember:
1. **Create** an instance
2. **Start** the server
3. **Do your PDF magic**
4. **Stop** the server
```typescript
async function createPdfFromHtml() {
const smartPdf = await SmartPdf.create();
const smartPdf = await SmartPdf.create();
await smartPdf.start();
// ... your PDF operations ...
await smartPdf.stop();
```
### 🔌 Smart Port Allocation
Run multiple instances without port conflicts:
```typescript
// Each instance automatically finds a free port
const instance1 = await SmartPdf.create(); // e.g. port 20000
const instance2 = await SmartPdf.create(); // e.g. port 20001
const instance3 = await SmartPdf.create(); // e.g. port 20002
// Or specify custom settings
const customInstance = await SmartPdf.create({
port: 3000, // Use specific port
portRangeStart: 4000, // Or define a range
portRangeEnd: 5000
});
```
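The allocation behaviour described here can be pictured with a small sketch. It is not SmartPdf's actual implementation: `pickPort`, `PortOptions` and the injected `isFree` probe are hypothetical names, and scanning the range in ascending order is an assumption. Only the option names (`port`, `portRangeStart`, `portRangeEnd`) and the default 20000-30000 range come from the package's own documentation.

```typescript
// Hypothetical option shape; only the three field names come from the docs.
interface PortOptions {
  port?: number;
  portRangeStart?: number;
  portRangeEnd?: number;
}

// Reports whether a port is currently free. Injected so the selection
// logic can be exercised without opening real sockets.
type PortProbe = (port: number) => Promise<boolean>;

async function pickPort(options: PortOptions, isFree: PortProbe): Promise<number> {
  // A specific port is all-or-nothing: fail instead of silently falling back.
  if (options.port !== undefined) {
    if (await isFree(options.port)) {
      return options.port;
    }
    throw new Error(`Port ${options.port} is already in use`);
  }

  // Otherwise walk the (inclusive) range and take the first free port.
  const start = options.portRangeStart ?? 20000;
  const end = options.portRangeEnd ?? 30000;
  for (let candidate = start; candidate <= end; candidate++) {
    if (await isFree(candidate)) {
      return candidate;
    }
  }
  throw new Error(`No free port found between ${start} and ${end}`);
}
```

A real probe would typically try to bind the port (for example with Node's `net` module) and report whether binding succeeded.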
## 🎨 PDF Generation
### 📝 From HTML String
Create beautiful PDFs from HTML with full CSS support:
```typescript
const smartPdf = await SmartPdf.create();
await smartPdf.start();
const pdf = await smartPdf.getA4PdfResultForHtmlString(`
<!DOCTYPE html>
<html>
<head>
<style>
@import url('https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;700&display=swap');
body {
font-family: 'Roboto', sans-serif;
margin: 40px;
color: #333;
}
.header {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 30px;
border-radius: 10px;
text-align: center;
}
.content {
margin-top: 30px;
line-height: 1.6;
}
.highlight {
background-color: #ffd93d;
padding: 2px 6px;
border-radius: 3px;
}
</style>
</head>
<body>
<div class="header">
<h1>Invoice #2024-001</h1>
<p>Generated on ${new Date().toLocaleDateString()}</p>
</div>
<div class="content">
<h2>Bill To:</h2>
<p>Acme Corporation</p>
<p>Total: <span class="highlight">$1,234.56</span></p>
</div>
</body>
</html>
`);
await fs.writeFile('invoice.pdf', pdf.buffer);
await smartPdf.stop();
```
### 🌐 From Website
Capture any website as a PDF with two powerful methods:
#### Standard A4 Format
Perfect for articles and documents:
```typescript
const pdf = await smartPdf.getPdfResultForWebsite('https://example.com');
```
#### Full Page Capture
Capture the entire scrollable area:
```typescript
const fullPagePdf = await smartPdf.getFullWebsiteAsSinglePdf('https://example.com');
```
### 🔀 Merge Multiple PDFs
Combine PDFs like a pro:
```typescript
// Load your PDFs
const invoice = await smartPdf.readFileToPdfObject('./invoice.pdf');
const terms = await smartPdf.readFileToPdfObject('./terms.pdf');
const contract = await smartPdf.getA4PdfResultForHtmlString('<h1>Contract</h1>...');
// Merge them in order
const mergedPdf = await smartPdf.mergePdfs([
contract.buffer,
invoice.buffer,
terms.buffer
]);
await fs.writeFile('complete-document.pdf', mergedPdf);
```
## 🖼️ Image Generation
### 🎨 Convert PDF to Images
SmartPDF supports three image formats, each with its own strengths:
#### PNG - Crystal Clear Quality
```typescript
const pngImages = await smartPdf.convertPDFToPngBytes(pdf.buffer, {
scale: SmartPdf.SCALE_HIGH // 216 DPI - perfect for most uses
});
// Save each page
pngImages.forEach((png, index) => {
fs.writeFileSync(`page-${index + 1}.png`, png);
});
```
#### WebP - Modern & Efficient
```typescript
const webpImages = await smartPdf.convertPDFToWebpBytes(pdf.buffer, {
quality: 90, // 0-100 quality scale
scale: 2.0 // 144 DPI - great for web
});
```
#### JPEG - Progressive Loading
```typescript
const jpegImages = await smartPdf.convertPDFToJpegBytes(pdf.buffer, {
quality: 85, // Balance between size and quality
scale: SmartPdf.SCALE_SCREEN, // 144 DPI
maxWidth: 1920 // Constrain dimensions
});
```
### 📏 DPI & Scale Guide
SmartPDF makes it easy to get the right resolution:
```typescript
// Built-in scale constants
SmartPdf.SCALE_SCREEN // 2.0 = ~144 DPI (web display)
SmartPdf.SCALE_HIGH // 3.0 = ~216 DPI (high quality, default)
SmartPdf.SCALE_PRINT // 6.0 = ~432 DPI (print quality)
// Or calculate your own
const scale = SmartPdf.getScaleForDPI(300); // Get scale for 300 DPI
```
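The scale constants line up with a 72 DPI baseline (2.0 ≈ 144 DPI, 3.0 ≈ 216 DPI), which matches the PDF convention of 72 units per inch. Assuming that relationship, a conversion helper might look like the sketch below. `scaleForDpi` and `dpiForScale` are illustrative names, not the library's `getScaleForDPI()` implementation.

```typescript
// PDF user space is defined at 72 units per inch, so scale 1.0 maps to 72 DPI.
const PDF_BASE_DPI = 72;

// Convert a target DPI into a render scale factor.
function scaleForDpi(dpi: number): number {
  if (!Number.isFinite(dpi) || dpi <= 0) {
    throw new RangeError('dpi must be a positive, finite number');
  }
  return dpi / PDF_BASE_DPI;
}

// Convert a render scale factor back into the DPI it corresponds to.
function dpiForScale(scale: number): number {
  return scale * PDF_BASE_DPI;
}
```

Under this assumption, 300 DPI corresponds to a scale of roughly 4.17.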
### 🖼️ Thumbnail Generation
Create perfect thumbnails for document previews:
```typescript
const thumbnails = await smartPdf.convertPDFToWebpBytes(pdf.buffer, {
scale: 0.5, // Small but readable
quality: 70, // Lower quality for tiny files
maxWidth: 200, // Constrain to thumbnail size
maxHeight: 200
});
```
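How `maxWidth` and `maxHeight` interact with `scale` internally is not shown here. One plausible reading, consistent with the test comment "will use high scale but constrain to max size", is that the requested scale is reduced until the rendered page fits both limits while keeping its aspect ratio. The sketch below illustrates that idea; `effectiveScale`, `PageSize` and `ScaleLimits` are hypothetical names, and page dimensions are assumed to be in PDF points.

```typescript
// Page dimensions in PDF points (1/72 inch); A4 is roughly 595 x 842.
interface PageSize {
  width: number;
  height: number;
}

// Requested scale plus optional pixel limits for the rendered image.
interface ScaleLimits {
  scale: number;
  maxWidth?: number;
  maxHeight?: number;
}

// Return the largest scale that does not exceed the requested scale
// and keeps the rendered page within maxWidth / maxHeight.
function effectiveScale(page: PageSize, limits: ScaleLimits): number {
  let scale = limits.scale;
  if (limits.maxWidth !== undefined) {
    scale = Math.min(scale, limits.maxWidth / page.width);
  }
  if (limits.maxHeight !== undefined) {
    scale = Math.min(scale, limits.maxHeight / page.height);
  }
  return scale;
}
```

For an A4 page with `scale: 0.5` and a 200 x 200 limit, the height limit wins under this model, so the thumbnail would come out about 141 x 200 pixels.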
## 📊 Format Comparison
Choose the right format for your needs:
| Format | File Size | Best For | Special Features |
|--------|-----------|----------|------------------|
| **PNG** | Largest | Screenshots, diagrams, text | Lossless, transparency |
| **JPEG** | 30-50% of PNG | Photos, complex images | Progressive loading |
| **WebP** | 25-40% of PNG | Modern web apps | Best compression |
## 🛡️ Best Practices
### 1. Always Use Try-Finally
```typescript
let smartPdf: SmartPdf | undefined;
try {
smartPdf = await SmartPdf.create();
await smartPdf.start();
const htmlString = `<h1>Hello World</h1>`;
const pdf: IPdf = await smartPdf.getA4PdfResultForHtmlString(htmlString);
console.log(pdf.buffer); // This is your PDF buffer
await smartPdf.stop();
// Your PDF operations
} finally {
if (smartPdf) {
await smartPdf.stop(); // Always cleanup!
}
}
createPdfFromHtml();
```
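The same start/stop discipline can be wrapped in a small helper so callers cannot forget the cleanup. This is a sketch, not part of the library: `withInstance` and the `Lifecycle` interface are hypothetical, and the helper only assumes an object exposing async `start()` and `stop()` methods, as the examples here do.

```typescript
// Minimal shape the helper relies on: something that can be started and stopped.
interface Lifecycle {
  start(): Promise<unknown>;
  stop(): Promise<unknown>;
}

// Create an instance, start it, run the work, and always stop it afterwards,
// even if starting or the work itself throws.
async function withInstance<T extends Lifecycle, R>(
  create: () => Promise<T>,
  work: (instance: T) => Promise<R>
): Promise<R> {
  const instance = await create();
  try {
    await instance.start();
    return await work(instance);
  } finally {
    await instance.stop();
  }
}
```

Because the instance is created before the `try`, the `finally` block never has to check for an unassigned variable.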
### Generating a PDF from a Website
You may want to capture a full webpage as a PDF. `SmartPdf` provides two methods to accomplish this. One captures the viewable area as an A4 pdf, and the other captures the entire webpage.
#### A4 PDF from a Website
### 2. Optimize HTML for PDFs
```typescript
async function createA4PdfFromWebsite() {
const smartPdf = await SmartPdf.create();
await smartPdf.start();
const pdf: IPdf = await smartPdf.getPdfResultForWebsite('https://example.com');
console.log(pdf.buffer); // PDF buffer of the webpage
await smartPdf.stop();
}
createA4PdfFromWebsite();
const optimizedHtml = `
<style>
/* Use print-friendly styles */
@media print {
.no-print { display: none; }
}
/* Avoid page breaks in wrong places */
h1, h2, h3 { page-break-after: avoid; }
table { page-break-inside: avoid; }
</style>
${yourContent}
`;
```
#### Full Webpage as a Single PDF
### 3. Handle Large Documents
For documents with many pages:
```typescript
async function createFullPdfFromWebsite() {
const smartPdf = await SmartPdf.create();
await smartPdf.start();
const pdf: IPdf = await smartPdf.getFullWebsiteAsSinglePdf('https://example.com');
console.log(pdf.buffer); // PDF buffer with the full webpage
await smartPdf.stop();
// Process in batches
const pages = await smartPdf.convertPDFToPngBytes(largePdf.buffer);
for (let i = 0; i < pages.length; i += 10) {
const batch = pages.slice(i, i + 10);
await processBatch(batch);
}
createFullPdfFromWebsite();
```
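The slicing logic in the loop can also be factored into a reusable helper. The sketch below is illustrative only; `toBatches` is a hypothetical name and not something the library exports.

```typescript
// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Note that batching after conversion limits how much work each downstream step handles at once, but all page images are still held in memory by the time the loop starts.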
### Merging Multiple PDFs
If you have multiple PDF objects (`IPdf`) that you wish to merge into a single PDF file, you can use the `mergePdfs` method.
## 🎯 Advanced Usage
### 🌐 Custom Browser Instance
Bring your own Puppeteer instance:
```typescript
async function mergePdfs() {
const smartPdf = await SmartPdf.create();
// Assume pdf1 and pdf2 are objects of type IPdf that you want to merge
const mergedPdf: IPdf = await smartPdf.mergePdfs([pdf1, pdf2]);
console.log(mergedPdf.buffer); // Buffer of the merged PDF
}
mergePdfs();
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({
headless: 'new',
args: ['--no-sandbox', '--disable-dev-shm-usage']
});
const smartPdf = await SmartPdf.create();
await smartPdf.start(browser);
// SmartPdf won't close your browser
await smartPdf.stop();
await browser.close(); // You manage it
```
### Reading PDF from Disk and Extracting Text
To read a PDF from the disk and extract its text content:
### ⚡ Parallel Processing
Process multiple PDFs concurrently:
```typescript
async function readAndExtractFromPdf() {
const smartPdf = await SmartPdf.create();
const pdf: IPdf = await smartPdf.readFileToPdfObject('/path/to/your/pdf/file.pdf');
const extractedText = await smartPdf.extractTextFromPdfBuffer(pdf.buffer);
console.log(extractedText); // Extracted text from the PDF
}
readAndExtractFromPdf();
const urls = ['https://example1.com', 'https://example2.com', 'https://example3.com'];
const pdfs = await Promise.all(
urls.map(url => smartPdf.getFullWebsiteAsSinglePdf(url))
);
// Or with multiple instances for maximum performance
const instances = await Promise.all(
Array(3).fill(null).map(() => SmartPdf.create())
);
await Promise.all(instances.map(i => i.start()));
// Process in parallel across instances
const results = await Promise.all(
urls.map((url, i) => instances[i % instances.length].getFullWebsiteAsSinglePdf(url))
);
// Cleanup all instances
await Promise.all(instances.map(i => i.stop()));
```
This guide provides a comprehensive overview of generating PDFs using `@push.rocks/smartpdf`. Remember to start and stop your `SmartPdf` instance to properly initialize and clean up resources, especially when working with server-side rendering or capturing web pages.
## 📝 API Reference
### Class: SmartPdf
#### Static Methods
- `create(options?: ISmartPdfOptions)` - Create a new SmartPdf instance
- `getScaleForDPI(dpi: number)` - Calculate scale factor for desired DPI
#### Instance Methods
- `start(browser?: Browser)` - Start the PDF server
- `stop()` - Stop the PDF server
- `getA4PdfResultForHtmlString(html: string)` - Generate A4 PDF from HTML
- `getPdfResultForWebsite(url: string)` - Generate A4 PDF from website
- `getFullWebsiteAsSinglePdf(url: string)` - Capture full webpage as PDF
- `mergePdfs(buffers: Uint8Array[])` - Merge multiple PDFs
- `readFileToPdfObject(path: string)` - Read PDF file from disk
- `extractTextFromPdfBuffer(buffer: Buffer)` - Extract text from PDF
- `convertPDFToPngBytes(buffer: Uint8Array, options?)` - Convert to PNG
- `convertPDFToWebpBytes(buffer: Uint8Array, options?)` - Convert to WebP
- `convertPDFToJpegBytes(buffer: Uint8Array, options?)` - Convert to JPEG
### Interface: IPdf
```typescript
interface IPdf {
name: string; // Filename
buffer: Buffer; // PDF content
id: string | null; // Unique identifier
metadata?: {
textExtraction?: string; // Extracted text
};
}
```
## 🤝 Contributing
We love contributions! Please feel free to submit a Pull Request.
## License and Legal Information
@@ -113,4 +422,4 @@ Registered at District court Bremen HRB 35230 HB, Germany
For any legal inquiries or if you require further information, please contact us via email at hello@task.vc.
By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.

97
test/test.port.ts Normal file

@@ -0,0 +1,97 @@
import { expect, tap } from '@git.zone/tstest/tapbundle';
import * as smartpdf from '../ts/index.js';
tap.test('should create multiple SmartPdf instances with automatic port allocation', async () => {
const instance1 = new smartpdf.SmartPdf();
const instance2 = new smartpdf.SmartPdf();
const instance3 = new smartpdf.SmartPdf();
// Start all instances
await instance1.start();
await instance2.start();
await instance3.start();
// Verify all instances have different ports
expect(instance1.serverPort).toBeGreaterThanOrEqual(20000);
expect(instance1.serverPort).toBeLessThanOrEqual(30000);
expect(instance2.serverPort).toBeGreaterThanOrEqual(20000);
expect(instance2.serverPort).toBeLessThanOrEqual(30000);
expect(instance3.serverPort).toBeGreaterThanOrEqual(20000);
expect(instance3.serverPort).toBeLessThanOrEqual(30000);
// Ensure all ports are different
expect(instance1.serverPort).not.toEqual(instance2.serverPort);
expect(instance1.serverPort).not.toEqual(instance3.serverPort);
expect(instance2.serverPort).not.toEqual(instance3.serverPort);
console.log(`Instance 1 port: ${instance1.serverPort}`);
console.log(`Instance 2 port: ${instance2.serverPort}`);
console.log(`Instance 3 port: ${instance3.serverPort}`);
// Test that all instances work correctly
const pdf1 = await instance1.getA4PdfResultForHtmlString('<h1>Instance 1</h1>');
const pdf2 = await instance2.getA4PdfResultForHtmlString('<h1>Instance 2</h1>');
const pdf3 = await instance3.getA4PdfResultForHtmlString('<h1>Instance 3</h1>');
expect(pdf1.buffer).toBeInstanceOf(Buffer);
expect(pdf2.buffer).toBeInstanceOf(Buffer);
expect(pdf3.buffer).toBeInstanceOf(Buffer);
// Clean up
await instance1.stop();
await instance2.stop();
await instance3.stop();
});
tap.test('should create SmartPdf instance with custom port range', async () => {
const customInstance = new smartpdf.SmartPdf({
portRangeStart: 25000,
portRangeEnd: 26000
});
await customInstance.start();
expect(customInstance.serverPort).toBeGreaterThanOrEqual(25000);
expect(customInstance.serverPort).toBeLessThanOrEqual(26000);
console.log(`Custom range instance port: ${customInstance.serverPort}`);
await customInstance.stop();
});
tap.test('should create SmartPdf instance with specific port', async () => {
const specificPortInstance = new smartpdf.SmartPdf({
port: 28888
});
await specificPortInstance.start();
expect(specificPortInstance.serverPort).toEqual(28888);
console.log(`Specific port instance: ${specificPortInstance.serverPort}`);
await specificPortInstance.stop();
});
tap.test('should throw error when specific port is already in use', async () => {
const instance1 = new smartpdf.SmartPdf({ port: 29999 });
await instance1.start();
const instance2 = new smartpdf.SmartPdf({ port: 29999 });
let errorThrown = false;
try {
await instance2.start();
} catch (error) {
errorThrown = true;
expect(error.message).toInclude('already in use');
}
expect(errorThrown).toBeTrue();
await instance1.stop();
});
export default tap.start();


@@ -1,66 +1,288 @@
import { expect, tap } from '@push.rocks/tapbundle';
import { expect, tap } from '@git.zone/tstest/tapbundle';
import * as smartpdf from '../ts/index.js';
import * as fs from 'fs';
import * as path from 'path';
let testSmartPdf: smartpdf.SmartPdf;
tap.test('should create a valid instance of smartpdf', async () => {
/**
* Ensures that a directory exists.
* @param dirPath - The directory path to ensure.
*/
function ensureDir(dirPath: string): void {
if (!fs.existsSync(dirPath)) {
fs.mkdirSync(dirPath, { recursive: true });
}
}
// Clean test results directory at start
const testResultsDir = path.join('.nogit', 'testresults');
if (fs.existsSync(testResultsDir)) {
fs.rmSync(testResultsDir, { recursive: true, force: true });
}
ensureDir(testResultsDir);
tap.test('should create a valid instance of SmartPdf', async () => {
testSmartPdf = new smartpdf.SmartPdf();
expect(testSmartPdf).toBeInstanceOf(smartpdf.SmartPdf);
});
tap.test('should start the instance', async () => {
tap.test('should start the SmartPdf instance', async () => {
await testSmartPdf.start();
});
tap.test('should create a pdf from html string', async () => {
await testSmartPdf.getA4PdfResultForHtmlString('hi');
tap.test('should create PDFs from HTML string', async () => {
const pdf1 = await testSmartPdf.getA4PdfResultForHtmlString('hi');
const pdf2 = await testSmartPdf.getA4PdfResultForHtmlString('hello');
expect(pdf1.buffer).toBeInstanceOf(Buffer);
expect(pdf2.buffer).toBeInstanceOf(Buffer);
});
tap.test('should create a pdf from html string', async () => {
await testSmartPdf.getA4PdfResultForHtmlString('hi');
tap.test('should create PDFs from websites', async () => {
const pdfA4 = await testSmartPdf.getPdfResultForWebsite('https://www.wikipedia.org');
const pdfSingle = await testSmartPdf.getFullWebsiteAsSinglePdf('https://www.wikipedia.org');
expect(pdfA4.buffer).toBeInstanceOf(Buffer);
expect(pdfSingle.buffer).toBeInstanceOf(Buffer);
});
tap.test('should create a pdf from website as A4', async () => {
await testSmartPdf.getPdfResultForWebsite('https://www.wikipedia.org');
});
tap.test('should create a pdf from website as single page PDF', async () => {
await testSmartPdf.getFullWebsiteAsSinglePdf('https://www.wikipedia.org');
});
tap.test('should create a valid PDFResult', async () => {
const writePDfToDisk = async (urlArg: string, fileName: string) => {
tap.test('should create valid PDF results and write them to disk', async () => {
const writePdfToDisk = async (urlArg: string, fileName: string) => {
const pdfResult = await testSmartPdf.getFullWebsiteAsSinglePdf(urlArg);
expect(pdfResult.buffer).toBeInstanceOf(Buffer);
const fs = await import('fs');
if (!fs.existsSync('.nogit/')) {
fs.mkdirSync('.nogit/');
}
fs.writeFileSync(`.nogit/${fileName}`, pdfResult.buffer as Buffer);
ensureDir('.nogit');
fs.writeFileSync(path.join('.nogit', fileName), pdfResult.buffer as Buffer);
};
await writePDfToDisk('https://lossless.com/', '1.pdf');
await writePDfToDisk('https://layer.io', '2.pdf');
await writePdfToDisk('https://lossless.com/', '1.pdf');
await writePdfToDisk('https://layer.io', '2.pdf');
});
tap.test('should merge pdfs', async () => {
const fs = await import('fs');
tap.test('should merge PDFs into a combined PDF', async () => {
const pdf1 = await testSmartPdf.readFileToPdfObject('.nogit/1.pdf');
const pdf2 = await testSmartPdf.readFileToPdfObject('.nogit/2.pdf');
fs.writeFileSync(
`.nogit/combined.pdf`,
await testSmartPdf.mergePdfs([pdf1.buffer, pdf2.buffer])
);
const mergedBuffer = await testSmartPdf.mergePdfs([pdf1.buffer, pdf2.buffer]);
ensureDir('.nogit');
fs.writeFileSync(path.join('.nogit', 'combined.pdf'), mergedBuffer);
});
tap.test('should create images from an pdf', async () => {
tap.test('should create PNG images from combined PDF using Puppeteer conversion', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/combined.pdf');
const images = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer);
console.log(images.map((val) => val.length));
expect(images.length).toBeGreaterThan(0);
console.log('Puppeteer-based conversion image sizes:', images.map(img => img.length));
});
tap.test('should be able to close properly', async () => {
tap.test('should store PNG results from both conversion functions in .nogit/testresults', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/combined.pdf');
// Convert using Puppeteer-based function and store images
const imagesPuppeteer = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer);
imagesPuppeteer.forEach((img, index) => {
const filePath = path.join(testResultsDir, `png_combined_page${index + 1}.png`);
fs.writeFileSync(filePath, Buffer.from(img));
});
});
tap.test('should create WebP preview images from PDF', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
const webpPreviews = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer);
expect(webpPreviews.length).toBeGreaterThan(0);
console.log('WebP preview sizes:', webpPreviews.map(img => img.length));
// Also create PNG previews for comparison
const pngPreviews = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer);
console.log('PNG preview sizes:', pngPreviews.map(img => img.length));
// Save the first page as both WebP and PNG preview
fs.writeFileSync(path.join(testResultsDir, 'webp_default_page1.webp'), Buffer.from(webpPreviews[0]));
fs.writeFileSync(path.join(testResultsDir, 'png_default_page1.png'), Buffer.from(pngPreviews[0]));
});
tap.test('should create WebP previews with custom scale and quality', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Create smaller previews with lower quality for thumbnails
const thumbnails = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: 0.5, // Create readable thumbnails at ~36 DPI
quality: 70
});
expect(thumbnails.length).toBeGreaterThan(0);
console.log('Thumbnail sizes:', thumbnails.map(img => img.length));
// Save thumbnails
thumbnails.forEach((thumb, index) => {
fs.writeFileSync(path.join(testResultsDir, `webp_thumbnail_page${index + 1}.webp`), Buffer.from(thumb));
});
});
tap.test('should create WebP previews with max dimensions', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Create previews with maximum dimensions (will use high scale but constrain to max size)
const constrainedPreviews = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: smartpdf.SmartPdf.SCALE_HIGH, // Start with high quality
quality: 90,
maxWidth: 800,
maxHeight: 1000
});
expect(constrainedPreviews.length).toBeGreaterThan(0);
console.log('Constrained preview sizes:', constrainedPreviews.map(img => img.length));
// Save constrained preview
fs.writeFileSync(path.join(testResultsDir, 'webp_constrained_page1.webp'), Buffer.from(constrainedPreviews[0]));
});
tap.test('should verify WebP files are smaller than PNG', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Generate both PNG and WebP versions at the same scale for fair comparison
const comparisonScale = smartpdf.SmartPdf.SCALE_HIGH; // Both use 3.0 scale
const pngImages = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer, {
scale: comparisonScale
});
const webpImages = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: comparisonScale,
quality: 85
});
expect(pngImages.length).toEqual(webpImages.length);
// Compare sizes
let totalPngSize = 0;
let totalWebpSize = 0;
pngImages.forEach((png, index) => {
const pngSize = png.length;
const webpSize = webpImages[index].length;
totalPngSize += pngSize;
totalWebpSize += webpSize;
const reduction = ((pngSize - webpSize) / pngSize * 100).toFixed(1);
console.log(`Page ${index + 1}: PNG=${pngSize} bytes, WebP=${webpSize} bytes, Reduction=${reduction}%`);
// Save comparison files
fs.writeFileSync(path.join(testResultsDir, `comparison_png_page${index + 1}.png`), Buffer.from(png));
fs.writeFileSync(path.join(testResultsDir, `comparison_webp_page${index + 1}.webp`), Buffer.from(webpImages[index]));
});
const totalReduction = ((totalPngSize - totalWebpSize) / totalPngSize * 100).toFixed(1);
console.log(`Total size reduction: ${totalReduction}% (PNG: ${totalPngSize} bytes, WebP: ${totalWebpSize} bytes)`);
// WebP should be smaller
expect(totalWebpSize).toBeLessThan(totalPngSize);
});
tap.test('should create JPEG images from PDF', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
const jpegImages = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer);
expect(jpegImages.length).toBeGreaterThan(0);
console.log('JPEG image sizes:', jpegImages.map(img => img.length));
// Save the first page as JPEG
fs.writeFileSync(path.join(testResultsDir, 'jpeg_default_page1.jpg'), Buffer.from(jpegImages[0]));
});
tap.test('should create JPEG images with different quality levels', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Test different quality levels
const qualityLevels = [50, 70, 85, 95];
for (const quality of qualityLevels) {
const jpegImages = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer, {
scale: smartpdf.SmartPdf.SCALE_HIGH,
quality: quality
});
console.log(`JPEG quality ${quality}: ${jpegImages[0].length} bytes`);
// Save first page at each quality level
fs.writeFileSync(
path.join(testResultsDir, `jpeg_quality_${quality}_page1.jpg`),
Buffer.from(jpegImages[0])
);
}
});
tap.test('should create JPEG images with max dimensions', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Create constrained JPEG images
const constrainedJpegs = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer, {
scale: smartpdf.SmartPdf.SCALE_HIGH,
quality: 85,
maxWidth: 1200,
maxHeight: 1200
});
expect(constrainedJpegs.length).toBeGreaterThan(0);
console.log('Constrained JPEG sizes:', constrainedJpegs.map(img => img.length));
// Save constrained JPEG
fs.writeFileSync(path.join(testResultsDir, 'jpeg_constrained_page1.jpg'), Buffer.from(constrainedJpegs[0]));
});
tap.test('should compare file sizes between PNG, WebP, and JPEG', async () => {
const pdfObject = await testSmartPdf.readFileToPdfObject('.nogit/3.pdf');
// Generate all three formats at the same scale
const comparisonScale = smartpdf.SmartPdf.SCALE_HIGH; // 3.0 scale
const pngImages = await testSmartPdf.convertPDFToPngBytes(pdfObject.buffer, {
scale: comparisonScale
});
const webpImages = await testSmartPdf.convertPDFToWebpBytes(pdfObject.buffer, {
scale: comparisonScale,
quality: 85
});
const jpegImages = await testSmartPdf.convertPDFToJpegBytes(pdfObject.buffer, {
scale: comparisonScale,
quality: 85
});
expect(pngImages.length).toEqual(webpImages.length);
expect(pngImages.length).toEqual(jpegImages.length);
// Compare sizes
let totalPngSize = 0;
let totalWebpSize = 0;
let totalJpegSize = 0;
pngImages.forEach((png, index) => {
const pngSize = png.length;
const webpSize = webpImages[index].length;
const jpegSize = jpegImages[index].length;
totalPngSize += pngSize;
totalWebpSize += webpSize;
totalJpegSize += jpegSize;
const webpReduction = ((pngSize - webpSize) / pngSize * 100).toFixed(1);
const jpegReduction = ((pngSize - jpegSize) / pngSize * 100).toFixed(1);
console.log(`Page ${index + 1}:`);
console.log(` PNG: ${pngSize} bytes`);
console.log(` WebP: ${webpSize} bytes (${webpReduction}% smaller than PNG)`);
console.log(` JPEG: ${jpegSize} bytes (${jpegReduction}% smaller than PNG)`);
});
const totalWebpReduction = ((totalPngSize - totalWebpSize) / totalPngSize * 100).toFixed(1);
const totalJpegReduction = ((totalPngSize - totalJpegSize) / totalPngSize * 100).toFixed(1);
console.log('\nTotal size comparison:');
console.log(`PNG: ${totalPngSize} bytes`);
console.log(`WebP: ${totalWebpSize} bytes (${totalWebpReduction}% reduction)`);
console.log(`JPEG: ${totalJpegSize} bytes (${totalJpegReduction}% reduction)`);
// JPEG and WebP should both be smaller than PNG
expect(totalJpegSize).toBeLessThan(totalPngSize);
expect(totalWebpSize).toBeLessThan(totalPngSize);
});
tap.test('should close the SmartPdf instance properly', async () => {
await testSmartPdf.stop();
});
tap.start();
tap.start();
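
Review note: the size-comparison tests above repeat the same percent-reduction arithmetic for each format. A minimal standalone sketch of that calculation follows; the helper name `percentReduction` and the byte counts are hypothetical and not part of the test file.

```typescript
// Hypothetical helper (not in the test file): percentage by which
// `candidateBytes` is smaller than `baselineBytes`, formatted with one
// decimal place, mirroring the `.toFixed(1)` calls in the tests.
function percentReduction(baselineBytes: number, candidateBytes: number): string {
  if (baselineBytes <= 0) {
    throw new RangeError('baselineBytes must be positive');
  }
  return (((baselineBytes - candidateBytes) / baselineBytes) * 100).toFixed(1);
}

// Made-up byte counts, for illustration only.
const examplePngBytes = 200_000;
const exampleWebpBytes = 50_000;
console.log(`WebP is ${percentReduction(examplePngBytes, exampleWebpBytes)}% smaller than PNG`);
```

A negative result means the candidate format came out larger than the baseline, which is the case the `toBeLessThan` assertions guard against.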

@@ -1,8 +1,8 @@
/**
* autocreated commitinfo by @pushrocks/commitinfo
* autocreated commitinfo by @push.rocks/commitinfo
*/
export const commitinfo = {
name: '@push.rocks/smartpdf',
version: '3.1.5',
version: '3.2.2',
description: 'A library for creating PDFs dynamically from HTML or websites with additional features like merging PDFs.'
}

@@ -3,13 +3,33 @@ import * as paths from './smartpdf.paths.js';
import { Server } from 'http';
import { PdfCandidate } from './smartpdf.classes.pdfcandidate.js';
import { type IPdf } from '@tsclass/tsclass/dist_ts/business/pdf.js';
import { execFile } from 'child_process';
declare const document: any;
export interface ISmartPdfOptions {
port?: number;
portRangeStart?: number;
portRangeEnd?: number;
}
export class SmartPdf {
// STATIC SCALE CONSTANTS
public static readonly SCALE_SCREEN = 2.0; // ~144 DPI - Good for screen display
public static readonly SCALE_HIGH = 3.0; // ~216 DPI - High quality (default)
public static readonly SCALE_PRINT = 6.0; // ~432 DPI - Print quality
/**
* Calculate scale factor for desired DPI
* PDF.js default is 72 DPI, so scale = desiredDPI / 72
*/
public static getScaleForDPI(dpi: number): number {
return dpi / 72;
}
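
Review note: the scale constants and `getScaleForDPI` all follow from the 72 DPI baseline stated in the comment. A small sketch of the conversion in both directions, assuming only that baseline; `scaleForDpi` and `dpiForScale` are illustrative names, not part of the class.

```typescript
// Baseline taken from the comment above: a scale of 1.0 corresponds to 72 DPI.
const BASE_DPI = 72;

// Same arithmetic as SmartPdf.getScaleForDPI, written as a free function.
function scaleForDpi(dpi: number): number {
  return dpi / BASE_DPI;
}

// Inverse direction (hypothetical helper): effective DPI for a scale factor.
function dpiForScale(scale: number): number {
  return scale * BASE_DPI;
}

console.log(scaleForDpi(144)); // matches SCALE_SCREEN
console.log(scaleForDpi(216)); // matches SCALE_HIGH
console.log(scaleForDpi(432)); // matches SCALE_PRINT
console.log(dpiForScale(0.5)); // the thumbnail scale used in the tests
```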
// STATIC
public static async create() {
const smartpdfInstance = new SmartPdf();
public static async create(optionsArg?: ISmartPdfOptions) {
const smartpdfInstance = new SmartPdf(optionsArg);
return smartpdfInstance;
}
@@ -20,9 +40,15 @@ export class SmartPdf {
externalBrowserBool: boolean = false;
private _readyDeferred: plugins.smartpromise.Deferred<void>;
private _candidates: { [key: string]: PdfCandidate } = {};
private _options: ISmartPdfOptions;
constructor() {
constructor(optionsArg?: ISmartPdfOptions) {
this._readyDeferred = new plugins.smartpromise.Deferred();
this._options = {
portRangeStart: 20000,
portRangeEnd: 30000,
...optionsArg
};
}
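
Review note: the constructor merges defaults with an object spread. One property of that pattern worth knowing is that a key passed explicitly as `undefined` overrides the default. The sketch below demonstrates this with a local copy of the options shape; `PortOptions`, `mergeBySpread`, and `mergeIgnoringUndefined` are hypothetical names, and the second function is only one possible alternative, not what the commit does.

```typescript
// Local stand-in for ISmartPdfOptions, so the sketch is self-contained.
interface PortOptions {
  port?: number;
  portRangeStart?: number;
  portRangeEnd?: number;
}

const PORT_DEFAULTS = { portRangeStart: 20000, portRangeEnd: 30000 };

// Same merge pattern as the constructor: later spreads win,
// including keys whose value is explicitly `undefined`.
function mergeBySpread(opts?: PortOptions): PortOptions {
  return { ...PORT_DEFAULTS, ...opts };
}

// Hypothetical alternative: only copy keys that carry a defined value.
function mergeIgnoringUndefined(opts: PortOptions = {}): PortOptions {
  const merged: PortOptions = { ...PORT_DEFAULTS };
  if (opts.port !== undefined) merged.port = opts.port;
  if (opts.portRangeStart !== undefined) merged.portRangeStart = opts.portRangeStart;
  if (opts.portRangeEnd !== undefined) merged.portRangeEnd = opts.portRangeEnd;
  return merged;
}

console.log(mergeBySpread({ portRangeStart: undefined }));
console.log(mergeIgnoringUndefined({ portRangeStart: undefined }));
```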
async start(headlessBrowserArg?: plugins.smartpuppeteer.puppeteer.Browser) {
@@ -34,21 +60,56 @@ export class SmartPdf {
this.externalBrowserBool = true;
} else {
this.headlessBrowser = await plugins.smartpuppeteer.getEnvAwareBrowserInstance({
forceNoSandbox: true,
forceNoSandbox: false,
});
}
// setup server
// Find an available port BEFORE creating server
const smartnetworkInstance = new plugins.smartnetwork.SmartNetwork();
if (this._options.port) {
// If a specific port is requested, check if it's available
const isPortAvailable = await smartnetworkInstance.isLocalPortUnused(this._options.port);
if (isPortAvailable) {
this.serverPort = this._options.port;
} else {
// Clean up browser if we created one
if (!this.externalBrowserBool && this.headlessBrowser) {
await this.headlessBrowser.close();
}
throw new Error(`Requested port ${this._options.port} is already in use`);
}
} else {
// Find a free port in the specified range
this.serverPort = await smartnetworkInstance.findFreePort(
this._options.portRangeStart,
this._options.portRangeEnd
);
if (!this.serverPort) {
// Clean up browser if we created one
if (!this.externalBrowserBool && this.headlessBrowser) {
await this.headlessBrowser.close();
}
throw new Error(`No free ports available in range ${this._options.portRangeStart}-${this._options.portRangeEnd}`);
}
}
// Now setup server after we know we have a valid port
const app = plugins.express();
app.get('/:pdfId', (req, res) => {
res.setHeader('PDF-ID', this._candidates[req.params.pdfId].pdfId);
res.send(this._candidates[req.params.pdfId].htmlString);
const wantedCandidate = this._candidates[req.params.pdfId];
if (!wantedCandidate) {
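console.log(`${req.url} not attached to a candidate`);
// Send a response so the request does not hang when no candidate matches.
res.status(404).send('No PDF candidate found for this id');
return;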
}
res.setHeader('pdf-id', wantedCandidate.pdfId);
res.send(wantedCandidate.htmlString);
});
this.htmlServerInstance = plugins.http.createServer(app);
const smartnetworkInstance = new plugins.smartnetwork.SmartNetwork();
const portAvailable = smartnetworkInstance.isLocalPortUnused(3210);
this.htmlServerInstance.listen(3210, 'localhost');
this.htmlServerInstance.listen(this.serverPort, 'localhost');
this.htmlServerInstance.on('listening', () => {
console.log(`SmartPdf server listening on port ${this.serverPort}`);
this._readyDeferred.resolve();
done.resolve();
});
@@ -70,7 +131,7 @@ export class SmartPdf {
}
/**
* returns a pdf for a given html string;
* Returns a PDF for a given HTML string.
*/
async getA4PdfResultForHtmlString(htmlStringArg: string): Promise<plugins.tsclass.business.IPdf> {
await this._readyDeferred.promise;
@@ -81,10 +142,9 @@ export class SmartPdf {
width: 794,
height: 1122,
});
const response = await page.goto(`http://localhost:3210/${pdfCandidate.pdfId}`, {
const response = await page.goto(`http://localhost:${this.serverPort}/${pdfCandidate.pdfId}`, {
waitUntil: 'networkidle2',
});
// await plugins.smartdelay.delayFor(1000);
const headers = response.headers();
if (headers['pdf-id'] !== pdfCandidate.pdfId) {
console.log('Error! Headers do not match. For security reasons no pdf is being emitted!');
@@ -99,6 +159,8 @@ export class SmartPdf {
printBackground: true,
displayHeaderFooter: false,
});
// Convert Uint8Array to Node Buffer
const nodePdfBuffer = Buffer.from(pdfBuffer);
await page.close();
delete this._candidates[pdfCandidate.pdfId];
pdfCandidate.doneDeferred.resolve();
@@ -107,9 +169,9 @@ export class SmartPdf {
id: pdfCandidate.pdfId,
name: `${pdfCandidate.pdfId}.js`,
metadata: {
textExtraction: await this.extractTextFromPdfBuffer(pdfBuffer),
textExtraction: await this.extractTextFromPdfBuffer(nodePdfBuffer),
},
buffer: pdfBuffer,
buffer: nodePdfBuffer,
};
}
@@ -134,14 +196,16 @@ export class SmartPdf {
printBackground: true,
displayHeaderFooter: false,
});
// Convert Uint8Array to Node Buffer
const nodePdfBuffer = Buffer.from(pdfBuffer);
await page.close();
return {
id: pdfId,
name: `${pdfId}.js`,
metadata: {
textExtraction: await this.extractTextFromPdfBuffer(pdfBuffer),
textExtraction: await this.extractTextFromPdfBuffer(nodePdfBuffer),
},
buffer: pdfBuffer,
buffer: nodePdfBuffer,
};
}
@@ -151,15 +215,23 @@ export class SmartPdf {
width: 1920,
height: 1200,
});
page.emulateMediaType('screen');
await page.emulateMediaType('screen');
const response = await page.goto(websiteUrl, { waitUntil: 'networkidle2' });
const pdfId = plugins.smartunique.shortId();
// Use both document.body and document.documentElement to ensure we have a valid height and width.
const { documentHeight, documentWidth } = await page.evaluate(() => {
return {
documentHeight: document.body.scrollHeight,
documentWidth: document.body.clientWidth,
documentHeight: Math.max(
document.body.scrollHeight,
document.documentElement.scrollHeight
) || 1200,
documentWidth: Math.max(
document.body.clientWidth,
document.documentElement.clientWidth
) || 1920,
};
});
// Update viewport height to the full document height.
await page.setViewport({
width: 1920,
height: documentHeight,
@@ -172,14 +244,16 @@ export class SmartPdf {
scale: 1,
pageRanges: '1',
});
// Convert Uint8Array to Node Buffer
const nodePdfBuffer = Buffer.from(pdfBuffer);
await page.close();
return {
id: pdfId,
name: `${pdfId}.js`,
metadata: {
textExtraction: await this.extractTextFromPdfBuffer(pdfBuffer),
textExtraction: await this.extractTextFromPdfBuffer(nodePdfBuffer),
},
buffer: pdfBuffer,
buffer: nodePdfBuffer,
};
}
@@ -196,9 +270,9 @@ export class SmartPdf {
}
public async readFileToPdfObject(pathArg: string): Promise<plugins.tsclass.business.IPdf> {
const path = plugins.smartpath.transform.makeAbsolute(pathArg);
const parsedPath = plugins.path.parse(path);
const buffer = await plugins.smartfile.fs.toBuffer(path);
const absolutePath = plugins.smartpath.transform.makeAbsolute(pathArg);
const parsedPath = plugins.path.parse(absolutePath);
const buffer = await plugins.smartfile.fs.toBuffer(absolutePath);
return {
name: parsedPath.base,
buffer,
@@ -225,40 +299,291 @@ export class SmartPdf {
return deferred.promise;
}
/**
* Checks for the presence of required dependencies: GraphicsMagick and Ghostscript.
*/
private async checkDependencies(): Promise<void> {
await Promise.all([
this.checkCommandExists('gm', ['version']),
this.checkCommandExists('gs', ['--version']),
]);
}
/**
* Checks if a given command exists by trying to execute it.
*/
private checkCommandExists(command: string, args: string[]): Promise<void> {
return new Promise((resolve, reject) => {
execFile(command, args, (error, stdout, stderr) => {
if (error) {
reject(new Error(`Dependency check failed: ${command} is not installed or not in the PATH. ${error.message}`));
} else {
resolve();
}
});
});
}
/**
* Converts a PDF to PNG bytes for each page using Puppeteer and PDF.js.
* This method creates a temporary HTML page that loads PDF.js from a CDN,
* renders each PDF page to a canvas, and then screenshots each canvas element.
*/
public async convertPDFToPngBytes(
pdfBytes: Uint8Array,
options: {
width?: number;
height?: number;
quality?: number;
scale?: number; // Scale factor for output size (default: 3.0 for 216 DPI)
maxWidth?: number; // Maximum width in pixels (optional)
maxHeight?: number; // Maximum height in pixels (optional)
} = {}
) {
const { width = 1024, height = 768, quality = 100 } = options;
): Promise<Uint8Array[]> {
// Set default scale for higher quality output (3.0 = ~216 DPI)
const scale = options.scale || 3.0;
// Load the PDF document
const pdfDoc = await plugins.pdfLib.PDFDocument.load(pdfBytes);
// Create a new page using the headless browser.
const page = await this.headlessBrowser.newPage();
const converter = plugins.pdf2pic.fromBuffer(Buffer.from(pdfBytes), {
density: 100, // Image density (DPI)
format: 'png', // Image format
width, // Output image width
height, // Output image height
quality, // Output image quality
});
// Prepare PDF data as a base64 string.
const base64Pdf: string = Buffer.from(pdfBytes).toString('base64');
// Get array promises that resolve to JPG buffers
const imagePromises: Promise<Buffer>[] = [];
const numPages = pdfDoc.getPageCount();
// HTML template that loads PDF.js and renders the PDF.
const htmlTemplate: string = `
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>PDF to PNG Converter</title>
<style>
body { margin: 0; }
canvas { display: block; margin: 10px auto; }
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.min.js"></script>
</head>
<body>
<script>
(async function() {
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.worker.min.js';
const pdfData = "__PDF_DATA__";
const raw = atob(pdfData);
const pdfArray = new Uint8Array([...raw].map(c => c.charCodeAt(0)));
const loadingTask = pdfjsLib.getDocument({data: pdfArray});
const pdf = await loadingTask.promise;
const numPages = pdf.numPages;
for (let pageNum = 1; pageNum <= numPages; pageNum++) {
const page = await pdf.getPage(pageNum);
// Apply scale factor to viewport
const viewport = page.getViewport({ scale: ${scale} });
// Apply max width/height constraints if specified
let finalScale = ${scale};
${options.maxWidth ? `
if (viewport.width > ${options.maxWidth}) {
finalScale = ${options.maxWidth} / (viewport.width / ${scale});
}` : ''}
${options.maxHeight ? `
if (viewport.height > ${options.maxHeight}) {
const heightScale = ${options.maxHeight} / (viewport.height / ${scale});
finalScale = Math.min(finalScale, heightScale);
}` : ''}
// Get final viewport with adjusted scale
const finalViewport = page.getViewport({ scale: finalScale });
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
canvas.width = finalViewport.width;
canvas.height = finalViewport.height;
canvas.setAttribute('data-page', pageNum);
await page.render({ canvasContext: context, viewport: finalViewport }).promise;
document.body.appendChild(canvas);
}
window.renderComplete = true;
})();
</script>
</body>
</html>
`;
for (let i = 0; i < numPages; i++) {
imagePromises.push(converter(i + 1, {
responseType: 'buffer',
}).then((output) => output.buffer));
// Replace the placeholder with the actual base64 PDF data.
const htmlContent: string = htmlTemplate.replace("__PDF_DATA__", base64Pdf);
// Set the page content.
await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
// Wait until the PDF.js rendering is complete.
await page.waitForFunction(() => (window as any).renderComplete === true, { timeout: 30000 });
// Query all canvas elements (each representing a rendered PDF page).
const canvasElements = await page.$$('canvas');
const pngBuffers: Uint8Array[] = [];
for (const canvasElement of canvasElements) {
// Screenshot the canvas element. The screenshot will be a PNG buffer.
const screenshotBuffer = (await canvasElement.screenshot({ encoding: 'binary' })) as Buffer;
pngBuffers.push(new Uint8Array(screenshotBuffer));
}
// Resolve all promises and return the array of buffers
const imageBuffers = await Promise.all(imagePromises);
const imageUint8Arrays = imageBuffers.map((buffer) => buffer);
return imageUint8Arrays;
await page.close();
return pngBuffers;
}
}
/**
* Converts a PDF to WebP bytes for each page.
* This method creates web-optimized images using WebP format.
* WebP typically produces smaller files than JPEG or PNG at comparable visual quality.
*/
public async convertPDFToWebpBytes(
pdfBytes: Uint8Array,
options: {
scale?: number; // Scale factor for preview size (default: 3.0 for 216 DPI)
quality?: number; // WebP quality 0-100 (default: 85)
maxWidth?: number; // Maximum width in pixels (optional)
maxHeight?: number; // Maximum height in pixels (optional)
} = {}
): Promise<Uint8Array[]> {
// Set default options for higher quality output (3.0 = ~216 DPI)
const scale = options.scale || 3.0;
const quality = options.quality || 85;
// Create a new page using the headless browser
const page = await this.headlessBrowser.newPage();
// Prepare PDF data as a base64 string
const base64Pdf: string = Buffer.from(pdfBytes).toString('base64');
// HTML template that loads PDF.js and renders the PDF with scaling
const htmlTemplate: string = `
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>PDF to WebP Preview Converter</title>
<style>
body { margin: 0; }
canvas { display: block; margin: 10px auto; }
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.min.js"></script>
</head>
<body>
<script>
(async function() {
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.worker.min.js';
const pdfData = "__PDF_DATA__";
const raw = atob(pdfData);
const pdfArray = new Uint8Array([...raw].map(c => c.charCodeAt(0)));
const loadingTask = pdfjsLib.getDocument({data: pdfArray});
const pdf = await loadingTask.promise;
const numPages = pdf.numPages;
for (let pageNum = 1; pageNum <= numPages; pageNum++) {
const page = await pdf.getPage(pageNum);
// Apply scale factor to viewport
const viewport = page.getViewport({ scale: ${scale} });
// Apply max width/height constraints if specified
let finalScale = ${scale};
${options.maxWidth ? `
if (viewport.width > ${options.maxWidth}) {
finalScale = ${options.maxWidth} / (viewport.width / ${scale});
}` : ''}
${options.maxHeight ? `
if (viewport.height > ${options.maxHeight}) {
const heightScale = ${options.maxHeight} / (viewport.height / ${scale});
finalScale = Math.min(finalScale, heightScale);
}` : ''}
// Get final viewport with adjusted scale
const finalViewport = page.getViewport({ scale: finalScale });
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
canvas.width = finalViewport.width;
canvas.height = finalViewport.height;
canvas.setAttribute('data-page', pageNum);
await page.render({ canvasContext: context, viewport: finalViewport }).promise;
document.body.appendChild(canvas);
}
window.renderComplete = true;
})();
</script>
</body>
</html>
`;
// Replace the placeholder with the actual base64 PDF data
const htmlContent: string = htmlTemplate.replace("__PDF_DATA__", base64Pdf);
// Set the page content
await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
// Wait until the PDF.js rendering is complete
await page.waitForFunction(() => (window as any).renderComplete === true, { timeout: 30000 });
// Query all canvas elements (each representing a rendered PDF page)
const canvasElements = await page.$$('canvas');
const webpBuffers: Uint8Array[] = [];
for (const canvasElement of canvasElements) {
// Screenshot the canvas element as WebP
const screenshotBuffer = (await canvasElement.screenshot({
type: 'webp',
quality: quality,
encoding: 'binary'
})) as Buffer;
webpBuffers.push(new Uint8Array(screenshotBuffer));
}
await page.close();
return webpBuffers;
}
/**
* Converts a PDF to progressive JPEG bytes for each page.
* This method creates progressive JPEG images that load in multiple passes,
* showing a low-quality preview first, then progressively improving.
* Uses SmartJimp for true progressive JPEG encoding.
*/
public async convertPDFToJpegBytes(
pdfBytes: Uint8Array,
options: {
scale?: number; // Scale factor for output size (default: 3.0 for 216 DPI)
quality?: number; // JPEG quality 0-100 (default: 85)
maxWidth?: number; // Maximum width in pixels (optional)
maxHeight?: number; // Maximum height in pixels (optional)
} = {}
): Promise<Uint8Array[]> {
// First, convert PDF to PNG using our existing method
const pngBuffers = await this.convertPDFToPngBytes(pdfBytes, {
scale: options.scale,
maxWidth: options.maxWidth,
maxHeight: options.maxHeight
});
// Initialize SmartJimp in sharp mode for progressive JPEG support
const smartJimpInstance = new plugins.smartjimp.SmartJimp({ mode: 'sharp' });
// Convert each PNG to progressive JPEG
const jpegBuffers: Uint8Array[] = [];
const quality = options.quality || 85;
for (const pngBuffer of pngBuffers) {
// Convert PNG buffer to progressive JPEG
const jpegBuffer = await smartJimpInstance.computeAssetVariation(
Buffer.from(pngBuffer),
{
format: 'jpeg',
progressive: true,
// SmartJimp may use a different quality scale; whether an adjustment is needed is unverified.
// For now, pass the quality value through unchanged.
quality
}
);
jpegBuffers.push(new Uint8Array(jpegBuffer));
}
return jpegBuffers;
}
}
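
Review note: the `maxWidth`/`maxHeight` handling lives inside the HTML template string in the PNG and WebP converters, which makes it hard to read in isolation. A minimal sketch of the same arithmetic as a pure function, assuming the page size is known at scale 1.0; `PageSize` and `constrainScale` are hypothetical names and the page dimensions are made up.

```typescript
// Page dimensions at scale 1.0 (hypothetical shape for this sketch).
interface PageSize {
  width: number;
  height: number;
}

// Mirrors the template logic: start from the requested scale, then shrink it
// if the scaled page would exceed maxWidth and/or maxHeight.
function constrainScale(
  page: PageSize,
  scale: number,
  maxWidth?: number,
  maxHeight?: number
): number {
  let finalScale = scale;
  if (maxWidth && page.width * scale > maxWidth) {
    finalScale = maxWidth / page.width;
  }
  if (maxHeight && page.height * scale > maxHeight) {
    finalScale = Math.min(finalScale, maxHeight / page.height);
  }
  return finalScale;
}

// Illustrative page of 600 x 800 units at scale 1.0.
const examplePage: PageSize = { width: 600, height: 800 };
console.log(constrainScale(examplePage, 3)); // no limits: requested scale is kept
console.log(constrainScale(examplePage, 3, 800, 1000)); // height is the tighter limit
```

Because the smaller of the two candidate scales wins, the aspect ratio is preserved and both limits are respected.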

@@ -13,6 +13,7 @@ import * as smartpath from '@push.rocks/smartpath';
import * as smartpuppeteer from '@push.rocks/smartpuppeteer';
import * as smartnetwork from '@push.rocks/smartnetwork';
import * as smartunique from '@push.rocks/smartunique';
import * as smartjimp from '@push.rocks/smartjimp';
export {
smartbuffer,
@@ -23,6 +24,7 @@ export {
smartpuppeteer,
smartunique,
smartnetwork,
smartjimp,
};
// tsclass scope
@@ -33,7 +35,6 @@ export { tsclass };
// thirdparty
import express from 'express';
import pdf2json from 'pdf2json';
import pdf2pic from 'pdf2pic';
import pdfLib from 'pdf-lib';
export { express, pdf2json, pdf2pic, pdfLib, };
export { express, pdf2json, pdfLib, };