
@push.rocks/smartstorage

A high-performance, S3-compatible storage server powered by a Rust core with a clean TypeScript API. Runs standalone for dev/test — or scales out as a distributed, erasure-coded cluster with QUIC-based inter-node communication. No cloud, no Docker. Just npm install and go. 🚀

Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit community.foss.global/. This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a code.foss.global/ account to submit Pull Requests directly.

Why smartstorage?

| Feature | smartstorage | MinIO | s3rver |
|---|---|---|---|
| Install | pnpm add | Docker / binary | npm install |
| Startup time | ~20ms | seconds | ~200ms |
| Large file uploads | Streaming, zero-copy | Yes | OOM risk |
| Range requests | Seek-based | Yes | Full read |
| Language | Rust + TypeScript | Go | JavaScript |
| Multipart uploads | Full support | Yes | No |
| Auth | AWS SigV4 (full verification) | Full IAM | Basic |
| Bucket policies | IAM-style evaluation | Yes | No |
| Clustering | Erasure-coded, QUIC | Yes | No |
| Multi-drive awareness | Per-drive health | Yes | No |

Core Features

  • 🦀 Rust-powered HTTP server — hyper 1.x with streaming I/O, zero-copy, backpressure
  • 📦 Full S3-compatible API — works with AWS SDK v3, SmartBucket, any S3 client
  • 💾 Filesystem-backed storage — buckets map to directories, objects to files
  • 📤 Streaming multipart uploads — large files without memory pressure
  • 📐 Byte-range requests — seek() directly to the requested byte offset
  • 🔐 AWS SigV4 authentication — full signature verification with constant-time comparison
  • 📋 Bucket policies — IAM-style JSON policies with Allow/Deny evaluation and wildcard matching
  • 🌐 CORS middleware — configurable cross-origin support
  • 🧹 Clean slate mode — wipe storage on startup for test isolation
  • Test-first design — start/stop in milliseconds, no port conflicts

Clustering Features

  • 🔗 Erasure coding — Reed-Solomon (configurable k data + m parity shards) for storage efficiency and fault tolerance
  • 🚄 QUIC transport — multiplexed, encrypted inter-node communication via quinn with zero head-of-line blocking
  • 💽 Multi-drive awareness — each node manages multiple independent storage paths with health monitoring
  • 🤝 Cluster membership — static seed config + runtime join, heartbeat-based failure detection
  • ✍️ Quorum writes — data is only acknowledged after k+1 shards are persisted
  • 📖 Quorum reads — reconstruct from any k available shards, local-first fast path
  • 🩹 Self-healing — background scanner detects and reconstructs missing/corrupt shards

Installation

pnpm add @push.rocks/smartstorage -D

Note: The package ships with precompiled Rust binaries for linux_amd64 and linux_arm64. No Rust toolchain needed on your machine.

Quick Start

Standalone Mode (Dev & Test)

import { SmartStorage } from '@push.rocks/smartstorage';

// Start a local S3-compatible storage server
const storage = await SmartStorage.createAndStart({
  server: { port: 3000 },
  storage: { cleanSlate: true },
});

// Create a bucket
await storage.createBucket('my-bucket');

// Get connection details for any S3 client
const descriptor = await storage.getStorageDescriptor();
// → { endpoint: 'localhost', port: 3000, accessKey: 'STORAGE', accessSecret: 'STORAGE', useSsl: false }

// When done
await storage.stop();

Cluster Mode (Distributed)

import { SmartStorage } from '@push.rocks/smartstorage';

const storage = await SmartStorage.createAndStart({
  server: { port: 3000 },
  cluster: {
    enabled: true,
    nodeId: 'node-1',
    quicPort: 4000,
    seedNodes: ['192.168.1.11:4000', '192.168.1.12:4000'],
    erasure: {
      dataShards: 4,      // k: minimum shards to reconstruct data
      parityShards: 2,    // m: fault tolerance (can lose up to m shards)
    },
    drives: {
      paths: ['/mnt/disk1', '/mnt/disk2', '/mnt/disk3'],
    },
  },
});

Objects are automatically split into chunks (default 4 MB), erasure-coded into 6 shards (4 data + 2 parity), and distributed across drives/nodes. Any 4 of 6 shards can reconstruct the original data.
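
The arithmetic in the paragraph above can be sketched as a small helper. This is an illustration only: planShards and its field names are hypothetical, and two details are assumptions rather than documented behavior (an empty object still occupies one chunk, and a chunk is divided evenly across the k data shards).

```typescript
interface ErasureConfig {
  dataShards: number;     // k
  parityShards: number;   // m
  chunkSizeBytes: number; // documented default: 4 MB
}

interface ShardLayout {
  chunks: number;
  shardsPerChunk: number;
  totalShards: number;
  maxShardBytes: number;
}

// Hypothetical helper: estimates how an object is chunked and sharded.
function planShards(objectBytes: number, cfg: ErasureConfig): ShardLayout {
  // Assumption: even an empty object is stored as one chunk.
  const chunks = Math.max(1, Math.ceil(objectBytes / cfg.chunkSizeBytes));
  const shardsPerChunk = cfg.dataShards + cfg.parityShards;
  // Assumption: a chunk is split evenly across the k data shards.
  const largestChunk = Math.min(objectBytes, cfg.chunkSizeBytes);
  const maxShardBytes = Math.ceil(largestChunk / cfg.dataShards);
  return { chunks, shardsPerChunk, totalShards: chunks * shardsPerChunk, maxShardBytes };
}

// A 10 MB object with the 4+2 configuration shown above.
const layout = planShards(10 * 1024 * 1024, {
  dataShards: 4,
  parityShards: 2,
  chunkSizeBytes: 4 * 1024 * 1024,
});
console.log(layout);
// → { chunks: 3, shardsPerChunk: 6, totalShards: 18, maxShardBytes: 1048576 }
```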

Configuration

All config fields are optional — sensible defaults are applied automatically.

import { SmartStorage, ISmartStorageConfig } from '@push.rocks/smartstorage';

const config: ISmartStorageConfig = {
  server: {
    port: 3000,              // Default: 3000
    address: '0.0.0.0',      // Default: '0.0.0.0'
    silent: false,           // Default: false
    region: 'us-east-1',     // Default: 'us-east-1' — used for SigV4 signing
  },
  storage: {
    directory: './my-data',  // Default: .nogit/bucketsDir
    cleanSlate: false,       // Default: false — set true to wipe on start
  },
  auth: {
    enabled: false,          // Default: false
    credentials: [{
      accessKeyId: 'MY_KEY',
      secretAccessKey: 'MY_SECRET',
    }],
  },
  cors: {
    enabled: false,          // Default: false
    allowedOrigins: ['*'],
    allowedMethods: ['GET', 'POST', 'PUT', 'DELETE', 'HEAD', 'OPTIONS'],
    allowedHeaders: ['*'],
    exposedHeaders: ['ETag', 'x-amz-request-id', 'x-amz-version-id'],
    maxAge: 86400,
    allowCredentials: false,
  },
  logging: {
    level: 'info',           // 'error' | 'warn' | 'info' | 'debug'
    format: 'text',          // 'text' | 'json'
    enabled: true,
  },
  limits: {
    maxObjectSize: 5 * 1024 * 1024 * 1024, // 5 GB
    maxMetadataSize: 2048,
    requestTimeout: 300000,  // 5 minutes
  },
  multipart: {
    expirationDays: 7,
    cleanupIntervalMinutes: 60,
  },
  cluster: {                 // Optional — omit for standalone mode
    enabled: true,
    nodeId: 'node-1',        // Auto-generated UUID if omitted
    quicPort: 4000,          // Default: 4000
    seedNodes: [],           // Addresses of existing cluster members
    erasure: {
      dataShards: 4,         // Default: 4
      parityShards: 2,       // Default: 2
      chunkSizeBytes: 4194304, // Default: 4 MB
    },
    drives: {
      paths: ['/mnt/disk1', '/mnt/disk2'],
    },
    heartbeatIntervalMs: 5000,  // Default: 5000
    heartbeatTimeoutMs: 30000,  // Default: 30000
  },
};

const storage = await SmartStorage.createAndStart(config);
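
To make "all fields are optional" concrete, here is a minimal sketch of how a partial config could be merged with the documented defaults. It covers only the server and storage sections, and the names SketchConfig, DEFAULTS, and withDefaults are hypothetical. The package's actual merge logic may differ.

```typescript
interface ServerConfig { port: number; address: string; silent: boolean; region: string; }
interface StorageConfig { directory: string; cleanSlate: boolean; }
interface SketchConfig { server: ServerConfig; storage: StorageConfig; }

// Every section and every field inside a section may be omitted.
type PartialConfig = { [K in keyof SketchConfig]?: Partial<SketchConfig[K]> };

// Defaults copied from the comments in the configuration listing above.
const DEFAULTS: SketchConfig = {
  server: { port: 3000, address: '0.0.0.0', silent: false, region: 'us-east-1' },
  storage: { directory: '.nogit/bucketsDir', cleanSlate: false },
};

// Hypothetical helper: user values override defaults section by section.
function withDefaults(user: PartialConfig = {}): SketchConfig {
  return {
    server: { ...DEFAULTS.server, ...user.server },
    storage: { ...DEFAULTS.storage, ...user.storage },
  };
}

const resolved = withDefaults({ server: { port: 9999, silent: true } });
console.log(resolved.server.port, resolved.server.region, resolved.storage.cleanSlate);
// → 9999 us-east-1 false
```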

Common Configurations

CI/CD testing — silent, clean, fast:

const storage = await SmartStorage.createAndStart({
  server: { port: 9999, silent: true },
  storage: { cleanSlate: true },
});

Auth enabled:

const storage = await SmartStorage.createAndStart({
  auth: {
    enabled: true,
    credentials: [{ accessKeyId: 'test', secretAccessKey: 'test123' }],
  },
});

CORS for local web dev:

const storage = await SmartStorage.createAndStart({
  cors: {
    enabled: true,
    allowedOrigins: ['http://localhost:5173'],
    allowCredentials: true,
  },
});

Usage with AWS SDK v3

import { S3Client, PutObjectCommand, GetObjectCommand, DeleteObjectCommand } from '@aws-sdk/client-s3';

const descriptor = await storage.getStorageDescriptor();

const client = new S3Client({
  endpoint: `http://${descriptor.endpoint}:${descriptor.port}`,
  region: 'us-east-1',
  credentials: {
    accessKeyId: descriptor.accessKey,
    secretAccessKey: descriptor.accessSecret,
  },
  forcePathStyle: true,  // Required for path-style access
});

// Upload
await client.send(new PutObjectCommand({
  Bucket: 'my-bucket',
  Key: 'hello.txt',
  Body: 'Hello, Storage!',
  ContentType: 'text/plain',
}));

// Download
const { Body } = await client.send(new GetObjectCommand({
  Bucket: 'my-bucket',
  Key: 'hello.txt',
}));
const content = await Body?.transformToString(); // "Hello, Storage!" (Body is optional in the SDK types)

// Delete
await client.send(new DeleteObjectCommand({
  Bucket: 'my-bucket',
  Key: 'hello.txt',
}));

Usage with SmartBucket

import { SmartBucket } from '@push.rocks/smartbucket';

const smartbucket = new SmartBucket(await storage.getStorageDescriptor());
const bucket = await smartbucket.createBucket('my-bucket');
const dir = await bucket.getBaseDirectory();

// Upload
await dir.fastPut({ path: 'docs/readme.txt', contents: 'Hello!' });

// Download
const content = await dir.fastGet('docs/readme.txt');

// List
const files = await dir.listFiles();

Multipart Uploads

For files larger than 5 MB, use multipart uploads. smartstorage handles them with streaming I/O — parts are written directly to disk, never buffered in memory. In cluster mode, each part is independently erasure-coded and distributed.

import {
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} from '@aws-sdk/client-s3';

// 1. Initiate
const { UploadId } = await client.send(new CreateMultipartUploadCommand({
  Bucket: 'my-bucket',
  Key: 'large-file.bin',
}));

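// 2. Upload parts (`chunks` is an array of buffers prepared by the caller;
//    real S3 requires every part except the last to be at least 5 MB)
const parts: { PartNumber: number; ETag?: string }[] = [];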
for (let i = 0; i < chunks.length; i++) {
  const { ETag } = await client.send(new UploadPartCommand({
    Bucket: 'my-bucket',
    Key: 'large-file.bin',
    UploadId,
    PartNumber: i + 1,
    Body: chunks[i],
  }));
  parts.push({ PartNumber: i + 1, ETag });
}

// 3. Complete
await client.send(new CompleteMultipartUploadCommand({
  Bucket: 'my-bucket',
  Key: 'large-file.bin',
  UploadId,
  MultipartUpload: { Parts: parts },
}));
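
The upload loop above assumes a chunks array that is never defined. One way to produce it from an in-memory buffer is sketched below. splitIntoParts is a hypothetical helper, and the 5 MB minimum is the rule real S3 applies to every part except the last; whether smartstorage enforces the same minimum is not stated here. For very large files, a streaming reader would be the better choice, since this sketch expects the whole payload in memory.

```typescript
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last

// Hypothetical helper: slices a buffer into multipart-sized views.
function splitIntoParts(data: Uint8Array, partSize: number = MIN_PART_SIZE): Uint8Array[] {
  if (partSize < MIN_PART_SIZE) {
    throw new RangeError(`partSize must be at least ${MIN_PART_SIZE} bytes`);
  }
  const parts: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += partSize) {
    // subarray() returns a view over the same memory, so nothing is copied.
    parts.push(data.subarray(offset, offset + partSize));
  }
  return parts;
}

// A 12 MB payload becomes two full parts and one shorter final part.
const chunks = splitIntoParts(new Uint8Array(12 * 1024 * 1024));
console.log(chunks.map((c) => c.length));
// → [ 5242880, 5242880, 2097152 ]
```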

Bucket Policies

smartstorage supports AWS-style bucket policies for fine-grained access control. Policies use the same IAM JSON format as real S3 — so you can develop and test your policy logic locally before deploying.

When auth.enabled is true, the auth pipeline works as follows:

  1. Authenticate — verify the AWS SigV4 signature (anonymous requests skip this step)
  2. Authorize — evaluate bucket policies against the request action, resource, and caller identity
  3. Default — authenticated users get full access; anonymous requests are denied unless a policy explicitly allows them

Setting a Bucket Policy

import { PutBucketPolicyCommand } from '@aws-sdk/client-s3';

// Allow anonymous read access to all objects in a bucket
await client.send(new PutBucketPolicyCommand({
  Bucket: 'public-assets',
  Policy: JSON.stringify({
    Version: '2012-10-17',
    Statement: [{
      Sid: 'PublicRead',
      Effect: 'Allow',
      Principal: '*',
      Action: ['s3:GetObject'],
      Resource: ['arn:aws:s3:::public-assets/*'],
    }],
  }),
}));

Policy Features

  • Effect: Allow and Deny (explicit Deny always wins)
  • Principal: "*" (everyone) or { "AWS": ["arn:..."] } for specific identities
  • Action: IAM-style actions like s3:GetObject, s3:PutObject, s3:*, or prefix wildcards like s3:Get*
  • Resource: ARN patterns with * and ? wildcards (e.g. arn:aws:s3:::my-bucket/*)
  • Persistence: Policies survive server restarts — stored as JSON on disk alongside your data
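
The evaluation rules listed above (wildcard matching, explicit Deny wins) can be sketched in a few lines. This is not the engine inside the Rust binary. It is a simplified model with hypothetical names (wildcardMatch, evaluate), it treats actions as case-sensitive, and it returns 'NoMatch' so the caller can apply the default rule described earlier.

```typescript
interface Statement {
  Effect: 'Allow' | 'Deny';
  Principal: '*' | { AWS: string[] };
  Action: string[];
  Resource: string[];
}
type Decision = 'Allow' | 'Deny' | 'NoMatch';

// '*' matches any run of characters, '?' matches exactly one.
function wildcardMatch(pattern: string, value: string): boolean {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const source = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp(`^${source}$`).test(value);
}

function principalMatches(principal: Statement['Principal'], caller: string | null): boolean {
  if (principal === '*') return true;
  return caller !== null && principal.AWS.includes(caller);
}

// caller is null for anonymous requests.
function evaluate(statements: Statement[], caller: string | null, action: string, resource: string): Decision {
  let decision: Decision = 'NoMatch';
  for (const s of statements) {
    const applies =
      principalMatches(s.Principal, caller) &&
      s.Action.some((a) => wildcardMatch(a, action)) &&
      s.Resource.some((r) => wildcardMatch(r, resource));
    if (!applies) continue;
    if (s.Effect === 'Deny') return 'Deny'; // explicit Deny always wins
    decision = 'Allow';
  }
  return decision;
}

const statements: Statement[] = [
  { Effect: 'Allow', Principal: '*', Action: ['s3:Get*'], Resource: ['arn:aws:s3:::public-assets/*'] },
  { Effect: 'Deny', Principal: '*', Action: ['s3:GetObject'], Resource: ['arn:aws:s3:::public-assets/private/*'] },
];

console.log(evaluate(statements, null, 's3:GetObject', 'arn:aws:s3:::public-assets/logo.png'));      // → Allow
console.log(evaluate(statements, null, 's3:GetObject', 'arn:aws:s3:::public-assets/private/key.pem')); // → Deny
console.log(evaluate(statements, null, 's3:PutObject', 'arn:aws:s3:::public-assets/logo.png'));      // → NoMatch
```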

Policy CRUD Operations

| Operation | AWS SDK Command | HTTP |
|---|---|---|
| Get policy | GetBucketPolicyCommand | GET /{bucket}?policy |
| Set policy | PutBucketPolicyCommand | PUT /{bucket}?policy |
| Delete policy | DeleteBucketPolicyCommand | DELETE /{bucket}?policy |

Deleting a bucket automatically removes its associated policy.

Clustering Deep Dive 🔗

smartstorage can run as a distributed storage cluster where multiple nodes cooperate to store and retrieve data with built-in redundancy.

How It Works

Client ──HTTP PUT──▶ Node A (coordinator)
                       │
                       ├─ Split object into 4 MB chunks
                       ├─ Erasure-code each chunk (4 data + 2 parity = 6 shards)
                       │
                       ├──QUIC──▶ Node B (shard writes)
                       ├──QUIC──▶ Node C (shard writes)
                       └─ Local disk (shard writes)

  1. Any node can coordinate — the client connects to any cluster member
  2. Objects are chunked — large objects split into fixed-size pieces (default 4 MB)
  3. Each chunk is erasure-coded — Reed-Solomon produces k data + m parity shards
  4. Shards are distributed — placed across different nodes and drives for fault isolation
  5. Quorum guarantees consistency — writes need k+1 acks, reads need k shards
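
The quorum rule in step 5 reduces to two comparisons. The sketch below uses hypothetical function names and only restates the thresholds given above (k+1 acknowledgements for writes, k shards for reads).

```typescript
interface ErasureParams {
  dataShards: number;   // k
  parityShards: number; // m
}

// A write is acknowledged once k+1 shards have been persisted.
function canAcknowledgeWrite(persistedShards: number, p: ErasureParams): boolean {
  return persistedShards >= p.dataShards + 1;
}

// A read succeeds as long as any k shards are available.
function canReconstruct(availableShards: number, p: ErasureParams): boolean {
  return availableShards >= p.dataShards;
}

const defaults: ErasureParams = { dataShards: 4, parityShards: 2 };
console.log(canAcknowledgeWrite(4, defaults)); // → false (one ack short)
console.log(canAcknowledgeWrite(5, defaults)); // → true
console.log(canReconstruct(4, defaults));      // → true
console.log(canReconstruct(3, defaults));      // → false
```

Requiring one shard more than the read threshold means a freshly acknowledged write can still lose a shard and remain readable.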

Erasure Coding

With the default 4+2 configuration:

  • Storage overhead: 50% extra capacity (2 parity shards per 4 data shards), compared with 200% for 3x replication
  • Fault tolerance: any 2 drives/nodes can fail simultaneously
  • Read efficiency: only 4 of 6 shards needed to reconstruct data

| Config | Total Shards | Overhead | Tolerance | Min Nodes |
|---|---|---|---|---|
| 4+2 | 6 | 50% | 2 failures | 3 |
| 6+3 | 9 | 50% | 3 failures | 5 |
| 2+1 | 3 | 50% | 1 failure | 2 |

QUIC Transport

Inter-node communication uses QUIC via the quinn library:

  • 🔒 Built-in TLS — self-signed certs auto-generated at cluster init
  • 🔀 Multiplexed streams — concurrent shard transfers without head-of-line blocking
  • Connection pooling — persistent connections to peer nodes
  • 🌊 Natural backpressure — QUIC flow control prevents overloading slow peers

Cluster Membership

  • Static seed nodes — initial cluster defined in config
  • Runtime join — new nodes can join a running cluster
  • Heartbeat monitoring — every 5s (configurable), with suspect/offline detection
  • Split-brain prevention — nodes only mark peers offline when they have majority

Self-Healing

A background scanner runs periodically (default: every 24h) and performs the following steps:

  1. Checks shard checksums (CRC32C) for bit-rot detection
  2. Identifies shards on offline nodes
  3. Reconstructs missing shards from remaining data using Reed-Solomon
  4. Places healed shards on healthy drives

Healing runs at low priority to avoid impacting foreground I/O.
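
Step 1 relies on CRC32C (the Castagnoli polynomial). For readers who want to see what such a check involves, here is a plain table-driven implementation. It is a reference sketch, not the code used by the Rust binary, which may well use hardware-accelerated instructions. The helper name isShardIntact is hypothetical.

```typescript
// Lookup table for the reflected Castagnoli polynomial 0x82F63B78.
const CRC32C_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let i = 0; i < 256; i++) {
    let c = i;
    for (let bit = 0; bit < 8; bit++) {
      c = (c & 1) ? (c >>> 1) ^ 0x82f63b78 : c >>> 1;
    }
    table[i] = c >>> 0;
  }
  return table;
})();

function crc32c(data: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < data.length; i++) {
    crc = CRC32C_TABLE[(crc ^ data[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0; // force an unsigned 32-bit result
}

// Hypothetical helper: compare a shard against the checksum stored with it.
function isShardIntact(shard: Uint8Array, expectedCrc: number): boolean {
  return crc32c(shard) === expectedCrc;
}

const shard = Uint8Array.from('123456789', (ch) => ch.charCodeAt(0));
const stored = crc32c(shard);
console.log(stored.toString(16)); // → e3069283 (the standard CRC-32C check value)

shard[0] ^= 0x01; // simulate a single flipped bit
console.log(isShardIntact(shard, stored)); // → false
```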

Erasure Set Formation

Drives are organized into fixed erasure sets at cluster initialization:

3 nodes × 4 drives each = 12 drives total
With 6-shard erasure sets → 2 erasure sets

Set 0: Node1-Disk0, Node2-Disk0, Node3-Disk0, Node1-Disk1, Node2-Disk1, Node3-Disk1
Set 1: Node1-Disk2, Node2-Disk2, Node3-Disk2, Node1-Disk3, Node2-Disk3, Node3-Disk3

Drives are interleaved across nodes for maximum fault isolation. New nodes form new erasure sets — existing data is never rebalanced.
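
The interleaving shown above can be reproduced with a short function. This sketch uses hypothetical names and assumes that drives which do not fill a complete set are simply left unassigned; the actual placement logic may handle leftovers differently.

```typescript
interface Drive {
  node: string;
  disk: number;
}

// Hypothetical helper: groups drives into fixed-size erasure sets.
function formErasureSets(nodes: string[], drivesPerNode: number, setSize: number): Drive[][] {
  // Interleave: disk 0 from every node, then disk 1 from every node, and so on.
  const interleaved: Drive[] = [];
  for (let disk = 0; disk < drivesPerNode; disk++) {
    for (const node of nodes) interleaved.push({ node, disk });
  }
  // Assumption: only complete sets are formed.
  const sets: Drive[][] = [];
  for (let i = 0; i + setSize <= interleaved.length; i += setSize) {
    sets.push(interleaved.slice(i, i + setSize));
  }
  return sets;
}

const sets = formErasureSets(['Node1', 'Node2', 'Node3'], 4, 6);
const labels = sets.map((set) => set.map((d) => `${d.node}-Disk${d.disk}`).join(', '));
labels.forEach((line, i) => console.log(`Set ${i}: ${line}`));
// → Set 0: Node1-Disk0, Node2-Disk0, Node3-Disk0, Node1-Disk1, Node2-Disk1, Node3-Disk1
// → Set 1: Node1-Disk2, Node2-Disk2, Node3-Disk2, Node1-Disk3, Node2-Disk3, Node3-Disk3
```

Because consecutive entries come from different nodes, each set of six holds exactly two drives per node, so losing a whole node costs a set two shards, which is within the tolerance of the 4+2 configuration.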

Testing Integration

import { SmartStorage } from '@push.rocks/smartstorage';
import { tap, expect } from '@git.zone/tstest/tapbundle';

let storage: SmartStorage;

tap.test('setup', async () => {
  storage = await SmartStorage.createAndStart({
    server: { port: 4567, silent: true },
    storage: { cleanSlate: true },
  });
});

tap.test('should store and retrieve objects', async () => {
  await storage.createBucket('test');
  // ... your test logic using AWS SDK or SmartBucket
});

tap.test('teardown', async () => {
  await storage.stop();
});

export default tap.start();

API Reference

SmartStorage Class

static createAndStart(config?: ISmartStorageConfig): Promise<SmartStorage>

Create and start a server in one call.

start(): Promise<void>

Spawn the Rust binary and start the HTTP server.

stop(): Promise<void>

Gracefully stop the server and kill the Rust process.

createBucket(name: string): Promise<{ name: string }>

Create a storage bucket.

getStorageDescriptor(options?): Promise<IS3Descriptor>

Get connection details for S3-compatible clients. Returns:

| Field | Type | Description |
|---|---|---|
| endpoint | string | Server hostname (localhost by default) |
| port | number | Server port |
| accessKey | string | Access key from first configured credential |
| accessSecret | string | Secret key from first configured credential |
| useSsl | boolean | Always false (plain HTTP) |

Architecture

smartstorage uses a hybrid Rust + TypeScript architecture:

┌──────────────────────────────────────────────┐
│  Your Code (AWS SDK, SmartBucket, etc.)       │
│  ↕ HTTP (localhost:3000)                     │
├──────────────────────────────────────────────┤
│  ruststorage binary (Rust)                    │
│  ├─ hyper 1.x HTTP server                   │
│  ├─ S3 path-style routing                   │
│  ├─ StorageBackend (Standalone or Clustered) │
│  │   ├─ FileStore (single-node mode)        │
│  │   └─ DistributedStore (cluster mode)     │
│  │       ├─ ErasureCoder (Reed-Solomon)     │
│  │       ├─ ShardStore (per-drive storage)  │
│  │       ├─ QuicTransport (quinn)           │
│  │       ├─ ClusterState & Membership       │
│  │       └─ HealingService                  │
│  ├─ SigV4 auth + policy engine              │
│  ├─ CORS middleware                          │
│  └─ S3 XML response builder                 │
├──────────────────────────────────────────────┤
│  TypeScript (thin IPC wrapper)               │
│  ├─ SmartStorage class                       │
│  ├─ RustBridge (stdin/stdout JSON IPC)       │
│  └─ Config & S3 descriptor                  │
└──────────────────────────────────────────────┘

Why Rust? The original TypeScript implementation had critical performance issues: OOM on multipart uploads (parts buffered in memory), double stream copying, file descriptor leaks on HEAD requests, full-file reads for range requests, and no backpressure. The Rust binary solves all of these with streaming I/O, zero-copy, and direct seek() for range requests.

IPC Protocol: TypeScript spawns the ruststorage binary with --management and communicates via newline-delimited JSON over stdin/stdout. Commands: start, stop, createBucket, clusterStatus.
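
Newline-delimited JSON framing is simple to get subtly wrong when a message arrives split across several stdout chunks. The sketch below shows one way to frame and decode such a stream. The message shape (id, method, params) is an assumption made for illustration; only the command names come from the description above.

```typescript
// Hypothetical message shape; the real wire format is not documented here.
interface IpcMessage {
  id: number;
  method: string;
  params?: unknown;
}

// JSON.stringify never emits a raw newline, so one message is always one line.
function encodeMessage(msg: IpcMessage): string {
  return JSON.stringify(msg) + '\n';
}

// Buffers partial input until a full line has arrived.
class LineDecoder {
  private buffer = '';

  push(chunk: string): IpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? ''; // keep the trailing, incomplete line
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line) as IpcMessage);
  }
}

const wire =
  encodeMessage({ id: 1, method: 'createBucket', params: { name: 'my-bucket' } }) +
  encodeMessage({ id: 2, method: 'clusterStatus' });

const decoder = new LineDecoder();
const first = decoder.push(wire.slice(0, 10)); // partial line: nothing yet
const rest = decoder.push(wire.slice(10));     // completes both messages
console.log(first.length, rest.map((m) => m.method));
// → 0 [ 'createBucket', 'clusterStatus' ]
```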

S3-Compatible Operations

| Operation | Method | Path | Notes |
|---|---|---|---|
| ListBuckets | GET | / | |
| CreateBucket | PUT | /{bucket} | |
| DeleteBucket | DELETE | /{bucket} | |
| HeadBucket | HEAD | /{bucket} | |
| ListObjects (v1/v2) | GET | /{bucket} | ?list-type=2 for v2 |
| PutObject | PUT | /{bucket}/{key} | |
| GetObject | GET | /{bucket}/{key} | Supports Range header |
| HeadObject | HEAD | /{bucket}/{key} | |
| DeleteObject | DELETE | /{bucket}/{key} | |
| CopyObject | PUT | /{bucket}/{key} | x-amz-copy-source header |
| InitiateMultipartUpload | POST | /{bucket}/{key}?uploads | |
| UploadPart | PUT | /{bucket}/{key}?partNumber&uploadId | |
| CompleteMultipartUpload | POST | /{bucket}/{key}?uploadId | |
| AbortMultipartUpload | DELETE | /{bucket}/{key}?uploadId | |
| ListMultipartUploads | GET | /{bucket}?uploads | |
| GetBucketPolicy | GET | /{bucket}?policy | |
| PutBucketPolicy | PUT | /{bucket}?policy | |
| DeleteBucketPolicy | DELETE | /{bucket}?policy | |
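
Several operations share a method and path and differ only in query parameters or headers. The sketch below shows how path-style routing could tell them apart. resolveOperation is a hypothetical function written for illustration and is not the router inside the Rust binary; the hasCopySource flag stands in for the presence of the x-amz-copy-source header.

```typescript
const BUCKET_OPS: Record<string, string> = {
  GET: 'ListObjects', PUT: 'CreateBucket', DELETE: 'DeleteBucket', HEAD: 'HeadBucket',
};
const POLICY_OPS: Record<string, string> = {
  GET: 'GetBucketPolicy', PUT: 'PutBucketPolicy', DELETE: 'DeleteBucketPolicy',
};
const OBJECT_OPS: Record<string, string> = {
  GET: 'GetObject', PUT: 'PutObject', DELETE: 'DeleteObject', HEAD: 'HeadObject',
};

function resolveOperation(method: string, rawUrl: string, hasCopySource = false): string {
  const url = new URL(rawUrl, 'http://localhost');
  // Path-style: the first segment is the bucket, everything after it is the key.
  const path = url.pathname.replace(/^\//, '');
  const slash = path.indexOf('/');
  const bucket = slash === -1 ? path : path.slice(0, slash);
  const key = slash === -1 ? '' : path.slice(slash + 1);
  const query = url.searchParams;

  if (bucket === '') return method === 'GET' ? 'ListBuckets' : 'Unknown';

  if (key === '') {
    if (query.has('policy')) return POLICY_OPS[method] ?? 'Unknown';
    if (query.has('uploads') && method === 'GET') return 'ListMultipartUploads';
    return BUCKET_OPS[method] ?? 'Unknown';
  }

  // Multipart sub-resources are checked before the plain object operations.
  if (query.has('uploads') && method === 'POST') return 'InitiateMultipartUpload';
  if (query.has('uploadId')) {
    if (method === 'PUT' && query.has('partNumber')) return 'UploadPart';
    if (method === 'POST') return 'CompleteMultipartUpload';
    if (method === 'DELETE') return 'AbortMultipartUpload';
  }
  if (method === 'PUT' && hasCopySource) return 'CopyObject';
  return OBJECT_OPS[method] ?? 'Unknown';
}

console.log(resolveOperation('GET', '/'));                                        // → ListBuckets
console.log(resolveOperation('GET', '/my-bucket?policy'));                        // → GetBucketPolicy
console.log(resolveOperation('PUT', '/my-bucket/a/b.bin?partNumber=1&uploadId=x')); // → UploadPart
console.log(resolveOperation('PUT', '/my-bucket/copy.txt', true));                // → CopyObject
```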

On-Disk Format

Standalone mode:

{storage.directory}/
  {bucket}/
    {key}._storage_object                # Object data
    {key}._storage_object.metadata.json  # Metadata (content-type, x-amz-meta-*, etc.)
    {key}._storage_object.md5            # Cached MD5 hash
  .multipart/
    {upload-id}/
      metadata.json                      # Upload metadata
      part-1, part-2, ...               # Part data files
  .policies/
    {bucket}.policy.json                 # Bucket policy (IAM JSON format)

Cluster mode:

{drive_path}/.smartstorage/
  format.json                            # Drive metadata (cluster ID, erasure set)
  data/{bucket}/{key_hash}/{key}/
    chunk-{N}/shard-{M}.dat              # Erasure-coded shard data
    chunk-{N}/shard-{M}.meta             # Shard metadata (checksum, size)

{storage.directory}/
  .manifests/{bucket}/
    {key}.manifest.json                  # Object manifest (shard placements, checksums)
  .buckets/{bucket}/                     # Bucket metadata
  .policies/{bucket}.policy.json         # Bucket policies
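
As a quick illustration of the standalone layout, the helpers below build the file paths for one object and one bucket policy. The function names are hypothetical, and the sketch assumes keys are used verbatim in the path (so a key containing slashes maps to nested directories), which the layout above suggests but does not state.

```typescript
interface ObjectPaths {
  data: string;
  metadata: string;
  md5: string;
}

// Hypothetical helper: where one object's files live in standalone mode.
function standaloneObjectPaths(storageDir: string, bucket: string, key: string): ObjectPaths {
  const data = `${storageDir}/${bucket}/${key}._storage_object`;
  return { data, metadata: `${data}.metadata.json`, md5: `${data}.md5` };
}

// Hypothetical helper: where a bucket's policy document is stored.
function policyPath(storageDir: string, bucket: string): string {
  return `${storageDir}/.policies/${bucket}.policy.json`;
}

const paths = standaloneObjectPaths('./my-data', 'my-bucket', 'docs/readme.txt');
console.log(paths.data);     // → ./my-data/my-bucket/docs/readme.txt._storage_object
console.log(paths.metadata); // → ./my-data/my-bucket/docs/readme.txt._storage_object.metadata.json
console.log(policyPath('./my-data', 'my-bucket')); // → ./my-data/.policies/my-bucket.policy.json
```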

This repository contains open-source code licensed under the MIT License. A copy of the license can be found in the LICENSE file.

Please note: The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH or third parties, and are not included within the scope of the MIT license granted herein.

Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines or the guidelines of the respective third-party owners, and any usage must be approved in writing. Third-party trademarks used herein are the property of their respective owners and used only in a descriptive manner, e.g. for an implementation of an API or similar.

Company Information

Task Venture Capital GmbH Registered at District Court Bremen HRB 35230 HB, Germany

For any legal inquiries or further information, please contact us via email at hello@task.vc.

By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.
