docs: refresh readme and legal info

commit 904318531a (parent bb9b2bc74a)
2026-05-07 20:22:12 +00:00
3 changed files with 170 additions and 240 deletions
+2 -2
```diff
@@ -1,6 +1,6 @@
-The MIT License (MIT)
+MIT License
-Copyright (c) 2026 Lossless GmbH
+Copyright (c) 2026 Task Venture Capital GmbH
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
```
+1 -1
```diff
@@ -14,7 +14,7 @@
     "type": "git",
     "url": "https://code.foss.global/serve.zone/containerarchive.git"
   },
-  "author": "Lossless GmbH",
+  "author": "Task Venture Capital GmbH",
   "license": "MIT",
   "bugs": {
     "url": "https://code.foss.global/serve.zone/containerarchive/issues"
```
+166 -236
# @serve.zone/containerarchive

`@serve.zone/containerarchive` is a content-addressed incremental backup engine with a Rust core and TypeScript API for deduplicated, compressed, optionally encrypted, parity-protected snapshots of arbitrary Node.js streams.

## Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit [community.foss.global/](https://community.foss.global/). This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a [code.foss.global/](https://code.foss.global/) account to submit Pull Requests directly.
## Why It Exists
Container workloads do not only need file copies. They need repeatable point-in-time snapshots, low storage amplification, safe restores, and integrity checks that can run in automation. `containerarchive` packages those primitives behind a small TypeScript interface while leaving chunking, hashing, pack I/O, encryption, and repair work to Rust.
## Highlights
- 📦 Immutable snapshot manifests with tags and multi-item backup support
- 🧩 FastCDC content-defined chunking with SHA-256 content addressing
- ♻️ Cross-snapshot deduplication through a global chunk index
- 🗜️ gzip by default with zstd support in the Rust core
- 🔐 Optional AES-256-GCM encryption with Argon2id-derived passphrase wrapping
- 🧱 8 MB target pack files with sidecar `.idx` lookup data
- 🛟 Reed-Solomon parity, default RS(20,1), to recover one missing/corrupt pack per group
- 🔍 Quick, standard, and full repository verification modes
- 🧹 Retention pruning, stale lock handling, index rebuilds, and parity repair
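Content-defined chunking is what makes the deduplication in the list above work across shifted data. As a rough illustration of the gear-hash idea behind FastCDC (a toy sketch with invented names and parameters, not the tuned implementation in the Rust core):

```typescript
// Toy content-defined chunker in the spirit of FastCDC: a gear-based rolling
// hash declares a chunk boundary whenever its low bits are all zero, so
// boundaries depend on content, not on absolute offsets.
// Illustrative only; real parameters are 64 KB - 1 MB chunks, avg 256 KB.
const GEAR: number[] = (() => {
  // Deterministic pseudo-random gear table via xorshift32.
  let s = 0x9e3779b9;
  return Array.from({ length: 256 }, () => {
    s ^= s << 13; s >>>= 0;
    s ^= s >>> 17;
    s ^= s << 5; s >>>= 0;
    return s;
  });
})();

function chunkBoundaries(data: Uint8Array, maskBits = 6, minSize = 16): number[] {
  const mask = (1 << maskBits) - 1;
  const boundaries: number[] = [];
  let hash = 0;
  let start = 0;
  for (let i = 0; i < data.length; i++) {
    hash = ((hash << 1) + GEAR[data[i]]) >>> 0;
    if (i - start + 1 >= minSize && (hash & mask) === 0) {
      boundaries.push(i + 1); // end offset of the chunk
      start = i + 1;
      hash = 0;
    }
  }
  if (start < data.length) boundaries.push(data.length);
  return boundaries;
}
```

Because a boundary depends only on the last few bytes of content, an insertion early in a stream disturbs only nearby boundaries; later chunks realign and deduplicate against previous snapshots.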
## Install

```bash
pnpm add @serve.zone/containerarchive
```

## Architecture

The TypeScript class manages the developer-facing API and uses [`@push.rocks/smartrust`](https://code.foss.global/push.rocks/smartrust) to control the compiled Rust binary. Large data does not travel through JSON IPC; the TypeScript side opens temporary Unix sockets and streams bytes directly to or from Rust.

```
Node.js app
  |
  | TypeScript API: ContainerArchive
  |
  | JSON IPC for commands, Unix sockets for data streams
  v
Rust engine
  |
  | chunk -> hash -> compress -> encrypt -> pack -> snapshot
  v
repository directory
```
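To make the control/data split concrete, here is a sketch of a length-prefixed JSON command frame of the kind a command channel might use while bulk bytes travel over the separate Unix socket. The framing, field names, and helpers are illustrative assumptions, not smartrust's actual wire protocol:

```typescript
// Hypothetical command-channel framing: 4-byte big-endian length followed by
// a UTF-8 JSON payload. Bulk data never rides in these frames; it goes over
// a dedicated Unix socket instead.
function encodeFrame(command: object): Buffer {
  const payload = Buffer.from(JSON.stringify(command), 'utf8');
  const frame = Buffer.alloc(4 + payload.length);
  frame.writeUInt32BE(payload.length, 0);
  payload.copy(frame, 4);
  return frame;
}

// Drain all complete frames from a receive buffer; return leftover bytes.
function decodeFrames(buffer: Buffer): { commands: any[]; rest: Buffer } {
  const commands: any[] = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const len = buffer.readUInt32BE(offset);
    if (buffer.length - offset - 4 < len) break; // incomplete trailing frame
    commands.push(JSON.parse(buffer.subarray(offset + 4, offset + 4 + len).toString('utf8')));
    offset += 4 + len;
  }
  return { commands, rest: buffer.subarray(offset) };
}
```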
## Quick Start

```typescript
import { createReadStream, createWriteStream } from 'node:fs';
import { ContainerArchive } from '@serve.zone/containerarchive';

const repo = await ContainerArchive.init('/backups/my-service', {
  passphrase: process.env.ARCHIVE_PASSPHRASE,
});

const snapshot = await repo.ingest(createReadStream('/tmp/database.sql'), {
  tags: {
    service: 'postgres',
    environment: 'production',
  },
  items: [{ name: 'database.sql', type: 'database-dump' }],
});
console.log(snapshot.id, snapshot.newChunks, snapshot.reusedChunks);

const restored = await repo.restore(snapshot.id, { item: 'database.sql' });
restored.pipe(createWriteStream('/tmp/restored-database.sql'));

await repo.close();
```
## Open an Existing Repository

```typescript
import { ContainerArchive } from '@serve.zone/containerarchive';

const repo = await ContainerArchive.open('/backups/my-service', {
  passphrase: process.env.ARCHIVE_PASSPHRASE,
});

const snapshots = await repo.listSnapshots({
  tags: { service: 'postgres' },
});
```

Repositories initialized without a passphrase are unencrypted. Encrypted repositories require the passphrase on `open()`.
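The passphrase handling follows a standard wrap-the-master-key pattern: a random master key encrypts data, and the passphrase only unlocks the master key. A sketch using Node's built-in crypto. Note that Node's stdlib has no Argon2id, so `scryptSync` stands in for the Argon2id KDF here, and the helper names are invented, not part of this library's API:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from 'node:crypto';

// Derive a key-encryption key from the passphrase, then seal the random
// master key with AES-256-GCM so a wrong passphrase fails authentication
// instead of silently decrypting garbage.
function wrapMasterKey(masterKey: Buffer, passphrase: string) {
  const salt = randomBytes(16);
  const kek = scryptSync(passphrase, salt, 32); // stand-in for Argon2id
  const nonce = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', kek, nonce);
  const wrapped = Buffer.concat([cipher.update(masterKey), cipher.final()]);
  return { salt, nonce, wrapped, tag: cipher.getAuthTag() };
}

function unwrapMasterKey(blob: ReturnType<typeof wrapMasterKey>, passphrase: string): Buffer {
  const kek = scryptSync(passphrase, blob.salt, 32);
  const decipher = createDecipheriv('aes-256-gcm', kek, blob.nonce);
  decipher.setAuthTag(blob.tag);
  return Buffer.concat([decipher.update(blob.wrapped), decipher.final()]);
}
```

A design consequence worth noting: changing the passphrase only requires re-wrapping the master key, never re-encrypting the stored chunks.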
## Multi-Item Snapshots

Use `ingestMulti()` when a single restore point needs several streams, for example a DB dump plus a config archive.

```typescript
import { createReadStream } from 'node:fs';

const snapshot = await repo.ingestMulti([
  {
    name: 'database.sql',
    type: 'database-dump',
    stream: createReadStream('/tmp/database.sql'),
  },
  {
    name: 'volumes.tar',
    type: 'volume-tar',
    stream: createReadStream('/tmp/volumes.tar'),
  },
], {
  tags: { service: 'nextcloud', kind: 'full-backup' },
});
console.log(snapshot.items.map((item) => item.name));
```
## Listing, Filtering, and Restore

```typescript
const allSnapshots = await repo.listSnapshots();

const recentProductionSnapshots = await repo.listSnapshots({
  tags: { environment: 'production' },
  after: '2026-05-01T00:00:00Z',
});

const snapshot = await repo.getSnapshot(recentProductionSnapshots[0].id);

const stream = await repo.restore(snapshot.id, {
  item: snapshot.items[0].name,
});
```
## Verification and Repair

```typescript
const quick = await repo.verify({ level: 'quick' });
const full = await repo.verify({ level: 'full' });
if (!full.ok) {
  console.error(full.errors);
}

const repair = await repo.repair();
console.log(repair.indexRebuilt, repair.packsRepaired, repair.errors);

await repo.reindex();
await repo.unlock();
```

Verification levels are intentionally different tradeoffs: quick checks index consistency, standard reads pack metadata and checksums, and full rehydrates chunk content for the strongest validation.
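Conceptually, the full level boils down to re-deriving each chunk's content address and comparing it to the hash the chunk is stored under. A minimal sketch with an invented helper, not the engine's internal API:

```typescript
import { createHash } from 'node:crypto';

// A chunk is valid iff the SHA-256 of its decompressed (and decrypted) bytes
// equals the content address it is indexed under.
function verifyChunk(content: Uint8Array, expectedHash: string): boolean {
  return createHash('sha256').update(content).digest('hex') === expectedHash;
}
```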
## Retention Pruning

Always dry-run retention policies before deleting data.

```typescript
const preview = await repo.prune({ keepLast: 7, keepDays: 30 }, true);
console.log('would free bytes', preview.freedBytes);

const result = await repo.prune({
  keepLast: 7,
  keepDays: 30,
  keepWeeks: 12,
  keepMonths: 6,
});
console.log(result.removedSnapshots, result.removedPacks, result.freedBytes);
```
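One plausible reading of how `keepLast` and `keepDays` compose is as a union: a snapshot survives if any rule keeps it. This sketch encodes that assumption with invented names; the engine's exact semantics may differ:

```typescript
// Hypothetical retention selection: keep the union of the newest N snapshots
// and every snapshot younger than keepDays. Timestamps are ISO-8601 strings,
// so lexicographic sort equals chronological sort.
interface RetentionPolicy { keepLast?: number; keepDays?: number; }

function selectKept(createdAt: string[], policy: RetentionPolicy, now: Date): string[] {
  const sorted = [...createdAt].sort().reverse(); // newest first
  const keep = new Set<string>();
  for (const ts of sorted.slice(0, policy.keepLast ?? 0)) keep.add(ts);
  if (policy.keepDays !== undefined) {
    const cutoff = now.getTime() - policy.keepDays * 24 * 60 * 60 * 1000;
    for (const ts of sorted) if (new Date(ts).getTime() >= cutoff) keep.add(ts);
  }
  return sorted.filter((ts) => keep.has(ts));
}
```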
## Events

`ContainerArchive#on()` exposes RxJS subscriptions for progress and integrity signals.

```typescript
const subscription = repo.on('ingest:progress', (event) => {
  console.log(event.operation, event.percentage, event.message);
});

repo.on('ingest:complete', (event) => {
  console.log('snapshot complete', event.snapshotId);
});

repo.on('verify:error', (event) => {
  console.error('verification error', event.pack, event.chunk, event.error);
});

subscription.unsubscribe();
```
## Repository Layout

An initialized repository is a directory with predictable data stores.

```text
repo/
  config.json
  packs/
    data/
    parity/
  snapshots/
  index/
  keys/
  locks/
```

| Path | Purpose |
| --- | --- |
| `config.json` | Repository ID, chunking config, compression, encryption, pack target size, and parity config. |
| `packs/data` | Binary pack files and pack indexes. |
| `packs/parity` | Reed-Solomon parity shards and parity manifests. |
| `snapshots` | Immutable JSON snapshot manifests. |
| `index` | Global content-addressed chunk index. |
| `keys` | Wrapped encryption keys for passphrase-protected repositories. |
| `locks` | Advisory lock records for write operations. |
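For intuition about the parity scheme behind `packs/parity`: with a single parity shard, Reed-Solomon RS(n,1) degenerates to byte-wise XOR, so the parity pack is the XOR of its group and any one lost pack is the XOR of the survivors plus the parity. A toy sketch over small equal-length buffers (the real engine operates on 8 MB pack files):

```typescript
// Single-parity erasure coding: parity = shard[0] ^ shard[1] ^ ... ^ shard[n-1].
// XORing the parity with all surviving shards reconstructs the missing one.
function xorParity(shards: Uint8Array[]): Uint8Array {
  const parity = new Uint8Array(shards[0].length);
  for (const shard of shards) {
    for (let i = 0; i < parity.length; i++) parity[i] ^= shard[i];
  }
  return parity;
}

function recoverShard(survivors: Uint8Array[], parity: Uint8Array): Uint8Array {
  return xorParity([...survivors, parity]);
}
```

This is why RS(20,1) tolerates exactly one missing or corrupt pack per group of 20; losing two packs in the same group is unrecoverable from parity alone.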
## API Surface

| API | Purpose |
| --- | --- |
| `ContainerArchive.init(path, options?)` | Create a new repository and return an open instance. |
| `ContainerArchive.open(path, options?)` | Open an existing repository. |
| `ingest(stream, options?)` | Store one stream as a snapshot. |
| `ingestMulti(items, options?)` | Store several streams as one snapshot. |
| `restore(snapshotId, options?)` | Return a readable stream for a full snapshot or item. |
| `listSnapshots(filter?)` | List snapshots, optionally filtered by tags or date. |
| `getSnapshot(id)` | Load one snapshot manifest. |
| `verify(options?)` | Verify repository integrity. |
| `prune(retention, dryRun?)` | Apply retention rules and garbage collect unreferenced packs. |
| `repair()` | Rebuild index data, remove stale locks, and attempt parity recovery. |
| `reindex()` | Rebuild the global index from pack `.idx` files. |
| `unlock(options?)` | Remove advisory locks. |
| `on(event, handler)` | Subscribe to ingest/verify events. |
| `close()` | Close the repository and terminate the Rust process. |

## Development

```bash
pnpm run build
pnpm test
```

Useful source entry points:

- `ts/index.ts` exports the public API.
- `ts/classes.containerarchive.ts` owns the TypeScript facade and stream socket handling.
- `ts/interfaces.ts` defines snapshot, retention, verification, repair, and IPC shapes.
- `rust/src/main.rs` starts the Rust management loop.
- `rust/src/ingest.rs`, `restore.rs`, `verify.rs`, `prune.rs`, and `repair.rs` implement the core workflows.
## License and Legal Information

This repository contains open-source code licensed under the MIT License. A copy of the license can be found in the [license](./license) file.

**Please note:** The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.