# @push.rocks/smartarchive 📦

Powerful archive manipulation for modern Node.js applications.

`@push.rocks/smartarchive` is a versatile library for handling archive files with a focus on developer experience. Work with **zip**, **tar**, **gzip**, and **bzip2** formats through a unified, streaming-optimized API.

## Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit [community.foss.global/](https://community.foss.global/). This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a [code.foss.global/](https://code.foss.global/) account to submit Pull Requests directly.

## Features 🚀

- 📁 **Multi-format support** – Handle `.zip`, `.tar`, `.tar.gz`, `.tgz`, and `.bz2` archives
- 🌊 **Streaming-first architecture** – Process large archives without loading them fully into memory
- 🔄 **Unified API** – Consistent interface across different archive formats
- 🎯 **Smart detection** – Automatically identifies archive types via magic bytes
- ⚡ **High performance** – Built on `tar-stream` and `fflate` for speed
- 🔧 **Flexible I/O** – Work with files, URLs, and streams seamlessly
- 🛠️ **Modern TypeScript** – Full type safety and excellent IDE support

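The magic-byte detection mentioned in the feature list can be illustrated with a minimal sketch. This is not the library's internal implementation; `detectArchiveKind` and `matchesAt` are hypothetical names, and the sketch only checks the well-known signatures for the formats listed above.

```typescript
type ArchiveKind = 'zip' | 'gzip' | 'bzip2' | 'tar' | 'unknown';

// Returns true when `signature` appears in `bytes` at `offset`
function matchesAt(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((value, i) => bytes[offset + i] === value);
}

// Hypothetical helper: classify a buffer by its leading bytes
function detectArchiveKind(bytes: Uint8Array): ArchiveKind {
  if (matchesAt(bytes, [0x50, 0x4b, 0x03, 0x04])) return 'zip'; // "PK\x03\x04"
  if (matchesAt(bytes, [0x1f, 0x8b])) return 'gzip';
  if (matchesAt(bytes, [0x42, 0x5a, 0x68])) return 'bzip2'; // "BZh"
  // ustar archives carry the string "ustar" at offset 257 of the first header
  if (matchesAt(bytes, [0x75, 0x73, 0x74, 0x61, 0x72], 257)) return 'tar';
  return 'unknown';
}
```

Only the first few hundred bytes of an input are needed for this kind of check, which is why detection works on streams as well as on files.
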
## Installation 📥

```bash
# Using pnpm (recommended)
pnpm add @push.rocks/smartarchive

# Using npm
npm install @push.rocks/smartarchive

# Using yarn
yarn add @push.rocks/smartarchive
```

## Quick Start 🎯

### Extract an archive from URL

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

// Extract a .tar.gz archive from a URL directly to the filesystem
const archive = await SmartArchive.fromArchiveUrl(
  'https://registry.npmjs.org/some-package/-/some-package-1.0.0.tgz'
);
await archive.exportToFs('./extracted');
```

### Process archive as a stream

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

// Stream-based processing for memory efficiency
const archive = await SmartArchive.fromArchiveFile('./large-archive.zip');
const streamOfFiles = await archive.exportToStreamOfStreamFiles();

// Process each file in the archive
streamOfFiles.on('data', async (streamFile) => {
  console.log(`Processing ${streamFile.relativeFilePath}`);
  const readStream = await streamFile.createReadStream();
  // Handle individual file stream
});

streamOfFiles.on('end', () => {
  console.log('Extraction complete');
});
```

## Core Concepts 💡

### Archive Sources

`SmartArchive` accepts archives from three sources:

| Source | Method | Use Case |
|--------|--------|----------|
| **URL** | `SmartArchive.fromArchiveUrl(url)` | Download and process archives from the web |
| **File** | `SmartArchive.fromArchiveFile(path)` | Load archives from the local filesystem |
| **Stream** | `SmartArchive.fromArchiveStream(stream)` | Process archives from any Node.js stream |

### Export Destinations

| Destination | Method | Use Case |
|-------------|--------|----------|
| **Filesystem** | `exportToFs(targetDir, fileName?)` | Extract directly to a directory |
| **Stream of files** | `exportToStreamOfStreamFiles()` | Process files individually as `StreamFile` objects |

## Usage Examples 🔨

### Working with ZIP files

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

// Extract a ZIP file
const zipArchive = await SmartArchive.fromArchiveFile('./archive.zip');
await zipArchive.exportToFs('./output');

// Stream ZIP contents for processing
const fileStream = await zipArchive.exportToStreamOfStreamFiles();

fileStream.on('data', async (streamFile) => {
  if (streamFile.relativeFilePath.endsWith('.json')) {
    const readStream = await streamFile.createReadStream();
    // Process JSON files from the archive
  }
});
```

### Working with TAR archives

```typescript
import { SmartArchive, TarTools } from '@push.rocks/smartarchive';
import { createWriteStream } from 'fs';

// Extract a .tar.gz file
const tarGzArchive = await SmartArchive.fromArchiveFile('./archive.tar.gz');
await tarGzArchive.exportToFs('./extracted');

// Create a TAR archive using TarTools directly
const tarTools = new TarTools();
const pack = await tarTools.getPackStream();

// Add files to the pack
await tarTools.addFileToPack(pack, {
  fileName: 'hello.txt',
  content: 'Hello, World!'
});

await tarTools.addFileToPack(pack, {
  fileName: 'data.json',
  content: Buffer.from(JSON.stringify({ foo: 'bar' }))
});

// Finalize and pipe to destination
pack.finalize();
pack.pipe(createWriteStream('./output.tar'));
```

### Pack a directory into TAR

```typescript
import { TarTools } from '@push.rocks/smartarchive';
import { createWriteStream } from 'fs';

const tarTools = new TarTools();

// Pack an entire directory
const pack = await tarTools.packDirectory('./src');
pack.finalize();
pack.pipe(createWriteStream('./source.tar'));
```

### Extracting from URLs

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

// Download and extract npm packages
const npmPackage = await SmartArchive.fromArchiveUrl(
  'https://registry.npmjs.org/@push.rocks/smartfile/-/smartfile-11.2.7.tgz'
);
await npmPackage.exportToFs('./node_modules/@push.rocks/smartfile');

// Or process as stream for memory efficiency
const stream = await npmPackage.exportToStreamOfStreamFiles();
stream.on('data', async (file) => {
  console.log(`Extracted: ${file.relativeFilePath}`);
});
```

### Working with GZIP files

```typescript
import { SmartArchive, GzipTools } from '@push.rocks/smartarchive';
import { createReadStream, createWriteStream } from 'fs';

// Decompress a .gz file - pass the output file name explicitly,
// since a gzip stream does not reliably carry one
const gzipArchive = await SmartArchive.fromArchiveFile('./data.json.gz');
await gzipArchive.exportToFs('./decompressed', 'data.json');

// Use GzipTools directly for streaming decompression
const gzipTools = new GzipTools();
const decompressStream = gzipTools.getDecompressionStream();

createReadStream('./compressed.gz')
  .pipe(decompressStream)
  .pipe(createWriteStream('./decompressed.txt'));
```

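Some background on why the file name is passed explicitly: the gzip header (RFC 1952) has an optional FNAME field, and many producers leave it out. The sketch below shows how that field could be read and how a fallback name could be derived from the archive path. `gzipOriginalName` and `fallbackName` are hypothetical helpers for illustration, not part of `@push.rocks/smartarchive`.

```typescript
import { basename } from 'node:path';

// Returns the optional FNAME field of a gzip header, or null when it is absent
function gzipOriginalName(header: Uint8Array): string | null {
  // Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4) XFL OS
  if (header.length < 10 || header[0] !== 0x1f || header[1] !== 0x8b) return null;
  const flags = header[3];
  const FEXTRA = 0x04;
  const FNAME = 0x08;
  if ((flags & FNAME) === 0) return null;

  let offset = 10;
  if ((flags & FEXTRA) !== 0) {
    // The extra field is prefixed by a 2-byte little-endian length
    if (offset + 2 > header.length) return null;
    offset += 2 + (header[offset] | (header[offset + 1] << 8));
  }

  // FNAME is a zero-terminated Latin-1 string
  let name = '';
  for (let i = offset; i < header.length; i++) {
    if (header[i] === 0) return name;
    name += String.fromCharCode(header[i]);
  }
  return null; // terminator not found in the bytes provided
}

// Derives an output name from the archive path, e.g. "data.json.gz" -> "data.json"
function fallbackName(archivePath: string): string {
  return basename(archivePath).replace(/\.(gz|bz2)$/i, '');
}
```

A caller could try `gzipOriginalName` first and use `fallbackName` when it returns `null`, then pass the result as the second argument to `exportToFs`.
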
### Working with BZIP2 files

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

// Handle .bz2 files
const bzipArchive = await SmartArchive.fromArchiveUrl(
  'https://example.com/data.bz2'
);
await bzipArchive.exportToFs('./extracted', 'data.txt');
```

### In-memory processing (no filesystem)

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';
import { Readable } from 'stream';

// Process archives entirely in memory
// fetchCompressedData() is a placeholder for however you obtain the archive bytes
const compressedBuffer = await fetchCompressedData();
const memoryStream = Readable.from(compressedBuffer);

const archive = await SmartArchive.fromArchiveStream(memoryStream);
const streamFiles = await archive.exportToStreamOfStreamFiles();

const extractedFiles: Array<{ name: string; content: Buffer }> = [];
const pendingReads: Promise<void>[] = [];

streamFiles.on('data', (streamFile) => {
  // Track each async read so none is still running when we report the result
  pendingReads.push((async () => {
    const chunks: Buffer[] = [];
    const readStream = await streamFile.createReadStream();

    for await (const chunk of readStream) {
      chunks.push(chunk);
    }

    extractedFiles.push({
      name: streamFile.relativeFilePath,
      content: Buffer.concat(chunks)
    });
  })());
});

await new Promise((resolve) => streamFiles.on('end', resolve));
await Promise.all(pendingReads);
console.log(`Extracted ${extractedFiles.length} files in memory`);
```

### Nested archive handling (e.g., .tar.gz)

The library automatically handles nested compression. A `.tar.gz` file is:

1. First decompressed from gzip
2. Then unpacked from tar

This happens transparently:

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

// Automatically handles gzip → tar extraction chain
const tgzArchive = await SmartArchive.fromArchiveFile('./package.tar.gz');
await tgzArchive.exportToFs('./extracted');
```

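To make the layering concrete, here is a small sketch that peels compression layers off a buffer using only Node's built-in `zlib`. It is an illustration of the gzip → tar idea, not the library's code; `describeLayers`, `isGzip`, and `isTar` are hypothetical names, and the "tar" used in the usage lines is just an empty 512-byte header block with the `ustar` marker set.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// gzip streams start with the bytes 0x1f 0x8b
const isGzip = (b: Uint8Array): boolean =>
  b.length >= 2 && b[0] === 0x1f && b[1] === 0x8b;

// ustar headers carry the string "ustar" at offset 257
const isTar = (b: Uint8Array): boolean =>
  b.length >= 262 && Buffer.from(b.subarray(257, 262)).toString('latin1') === 'ustar';

// Hypothetical helper: list the layers from the outside in
function describeLayers(input: Uint8Array): string[] {
  const layers: string[] = [];
  let current: Uint8Array = input;
  while (isGzip(current)) {
    layers.push('gzip');
    current = gunzipSync(current); // unwrap one compression layer
  }
  layers.push(isTar(current) ? 'tar' : 'raw');
  return layers;
}

// Usage: a fake tar header block, gzip-compressed
const fakeTar = Buffer.alloc(512);
fakeTar.write('ustar', 257, 'latin1');
console.log(describeLayers(gzipSync(fakeTar))); // gzip first, then tar
```

This sketch buffers everything for brevity; a streaming implementation would apply the same checks to the first bytes of each stage instead.
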
## API Reference 📚

### SmartArchive Class

The main entry point for archive operations.

#### Static Factory Methods

```typescript
// Create from URL - downloads and processes archive
SmartArchive.fromArchiveUrl(url: string): Promise<SmartArchive>

// Create from local file path
SmartArchive.fromArchiveFile(path: string): Promise<SmartArchive>

// Create from any Node.js readable stream
SmartArchive.fromArchiveStream(stream: Readable | Duplex | Transform): Promise<SmartArchive>
```

#### Instance Methods

```typescript
// Extract all files to a directory
// fileName is optional - used for single-file archives (like .gz) that don't store filename
exportToFs(targetDir: string, fileName?: string): Promise<void>

// Get a stream that emits StreamFile objects for each file in the archive
exportToStreamOfStreamFiles(): Promise<StreamIntake<StreamFile>>

// Get the raw archive stream (useful for piping)
getArchiveStream(): Promise<Readable>
```

#### Instance Properties

```typescript
archive.tarTools        // TarTools instance for TAR-specific operations
archive.zipTools        // ZipTools instance for ZIP-specific operations
archive.gzipTools       // GzipTools instance for GZIP-specific operations
archive.bzip2Tools      // Bzip2Tools instance for BZIP2-specific operations
archive.archiveAnalyzer // ArchiveAnalyzer for inspecting archive type
```

### TarTools Class

TAR-specific operations for creating and extracting TAR archives.

```typescript
import { TarTools } from '@push.rocks/smartarchive';

const tarTools = new TarTools();

// Get a tar pack stream for creating archives
const pack = await tarTools.getPackStream();

// Add files to a pack stream
await tarTools.addFileToPack(pack, {
  fileName: 'file.txt',   // Name in archive
  content: 'Hello World', // String, Buffer, Readable, SmartFile, or StreamFile
  // Optional fields:
  //   byteLength?: number  - specify size for streams
  //   filePath?: string    - path to file on disk
});

// Pack an entire directory
const directoryPack = await tarTools.packDirectory('./src');

// Get extraction stream
const extract = tarTools.getDecompressionStream();
```

### ZipTools Class

ZIP-specific operations.

```typescript
import { ZipTools } from '@push.rocks/smartarchive';

const zipTools = new ZipTools();

// Get compression stream (for creating ZIP)
const compressor = zipTools.getCompressionStream();

// Get decompression stream (for extracting ZIP)
const decompressor = zipTools.getDecompressionStream();
```

### GzipTools Class

GZIP compression/decompression streams.

```typescript
import { GzipTools } from '@push.rocks/smartarchive';

const gzipTools = new GzipTools();

// Get compression stream
const compressor = gzipTools.getCompressionStream();

// Get decompression stream
const decompressor = gzipTools.getDecompressionStream();
```

## Supported Formats 📋

| Format | Extension(s) | Extract | Create |
|--------|--------------|---------|--------|
| TAR | `.tar` | ✅ | ✅ |
| TAR.GZ / TGZ | `.tar.gz`, `.tgz` | ✅ | ⚠️ |
| ZIP | `.zip` | ✅ | ⚠️ |
| GZIP | `.gz` | ✅ | ✅ |
| BZIP2 | `.bz2` | ✅ | ❌ |

✅ Full support | ⚠️ Partial/basic support | ❌ Not supported

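If you want to check support programmatically before choosing a format, the table above can be mirrored as data. The following is a sketch for application code, not an API exported by the library; `FORMATS` and `lookupFormat` are hypothetical names.

```typescript
type Support = 'full' | 'partial' | 'none';

interface FormatInfo {
  name: string;
  extensions: string[];
  extract: Support;
  create: Support;
}

// Mirrors the support table above
const FORMATS: FormatInfo[] = [
  { name: 'TAR', extensions: ['.tar'], extract: 'full', create: 'full' },
  { name: 'TAR.GZ / TGZ', extensions: ['.tar.gz', '.tgz'], extract: 'full', create: 'partial' },
  { name: 'ZIP', extensions: ['.zip'], extract: 'full', create: 'partial' },
  { name: 'GZIP', extensions: ['.gz'], extract: 'full', create: 'full' },
  { name: 'BZIP2', extensions: ['.bz2'], extract: 'full', create: 'none' },
];

// Picks the format whose extension is the longest match,
// so "backup.tar.gz" resolves to TAR.GZ rather than GZIP
function lookupFormat(fileName: string): FormatInfo | undefined {
  const lower = fileName.toLowerCase();
  let best: FormatInfo | undefined;
  let bestLength = 0;
  for (const format of FORMATS) {
    for (const ext of format.extensions) {
      if (lower.endsWith(ext) && ext.length > bestLength) {
        best = format;
        bestLength = ext.length;
      }
    }
  }
  return best;
}
```

Extension matching is only a convenience for file names you control; for untrusted input, content-based detection is the safer signal.
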
## Performance Tips 🏎️

1. **Use streaming for large files** – Use `exportToStreamOfStreamFiles()` to avoid loading entire archives into memory
2. **Provide byte lengths when known** – When adding streams to TAR, provide `byteLength` for better performance
3. **Process files as they stream** – Don't collect all files into an array unless necessary
4. **Choose the right format** – TAR.GZ for Unix-centric workflows where compression matters, ZIP for cross-platform compatibility

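Some context for tip 2: in the tar format, each entry's header comes before its content and records the content size as an octal string, so the size has to be known before the first content byte is written. The sketch below shows that encoding. `encodeTarSize` and `decodeTarSize` are hypothetical helpers for illustration; how the library behaves when `byteLength` is omitted is not shown here.

```typescript
// Largest size that fits in the classic 11-digit octal field (8 GiB - 1)
const MAX_TAR_SIZE = Math.pow(8, 11) - 1;

// Encodes a byte length as a ustar size field: 11 octal digits plus a NUL terminator
function encodeTarSize(byteLength: number): string {
  if (!Number.isInteger(byteLength) || byteLength < 0 || byteLength > MAX_TAR_SIZE) {
    throw new RangeError(`size out of range for a ustar header: ${byteLength}`);
  }
  return byteLength.toString(8).padStart(11, '0') + '\0';
}

// Decodes the 11 octal digits of a size field back into a number
function decodeTarSize(field: string): number {
  return parseInt(field.slice(0, 11), 8);
}

// Usage: 1024 bytes is 2000 in octal
const sizeField = encodeTarSize(1024);
console.log(JSON.stringify(sizeField), decodeTarSize(sizeField));
```

When the size of a stream is unknown, a tar writer generally has to buffer the content to measure it first, which is the cost that supplying `byteLength` can avoid.
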
## Error Handling 🛡️

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

try {
  const archive = await SmartArchive.fromArchiveUrl('https://example.com/file.zip');
  await archive.exportToFs('./output');
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading fields
  const code = (error as NodeJS.ErrnoException)?.code;
  const message = error instanceof Error ? error.message : String(error);

  if (code === 'ENOENT') {
    console.error('Archive file not found');
  } else if (code === 'EACCES') {
    console.error('Permission denied');
  } else if (message.includes('fetch')) {
    console.error('Network error downloading archive');
  } else {
    console.error('Archive extraction failed:', message);
  }
}
```

## Real-World Use Cases 🌍

### CI/CD: Download & Extract Build Artifacts

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';
const { CI_SERVER, BUILD_ID } = process.env; // assumed to be set by the CI environment

const artifacts = await SmartArchive.fromArchiveUrl(
  `${CI_SERVER}/artifacts/build-${BUILD_ID}.zip`
);
await artifacts.exportToFs('./dist');
```

### Backup System: Restore from Archive

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

const backup = await SmartArchive.fromArchiveFile('./backup-2024.tar.gz');
await backup.exportToFs('/restore/location');
```

### NPM Package Inspection

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

const pkg = await SmartArchive.fromArchiveUrl(
  'https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz'
);
const files = await pkg.exportToStreamOfStreamFiles();

files.on('data', async (file) => {
  if (file.relativeFilePath.includes('package.json')) {
    const stream = await file.createReadStream();
    // Read and analyze package.json
  }
});
```

### Data Pipeline: Process Compressed Datasets

```typescript
import { SmartArchive } from '@push.rocks/smartarchive';

const dataset = await SmartArchive.fromArchiveUrl(
  'https://data.source/dataset.tar.gz'
);

const files = await dataset.exportToStreamOfStreamFiles();
files.on('data', async (file) => {
  if (file.relativeFilePath.endsWith('.csv')) {
    const stream = await file.createReadStream();
    // Stream CSV processing
  }
});
```

## License and Legal Information

This repository contains open-source code that is licensed under the MIT License. A copy of the MIT License can be found in the [license](license) file within this repository.

**Please note:** The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

### Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH and are not included within the scope of the MIT license granted herein. Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines, and any usage must be approved in writing by Task Venture Capital GmbH.

### Company Information

Task Venture Capital GmbH
Registered at District court Bremen HRB 35230 HB, Germany

For any legal inquiries or if you require further information, please contact us via email at hello@task.vc.

By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.