feat: enhance storage stats and cluster health reporting

- Introduced new data structures for bucket and storage statistics, including BucketSummary, StorageStats, and ClusterHealth.
- Implemented runtime statistics tracking for buckets, including object count and total size.
- Added methods to retrieve storage stats and bucket summaries in the FileStore.
- Enhanced the SmartStorage interface to expose storage stats and cluster health.
- Implemented tests for runtime stats, cluster health, and credential management.
- Added support for runtime-managed credentials with atomic replacement.
- Improved filesystem usage reporting for storage locations.
2026-04-19 11:57:28 +00:00
parent c683b02e8c
commit 0e9862efca
16 changed files with 1803 additions and 85 deletions
+109
@@ -32,6 +32,8 @@ For reporting bugs, issues, or security vulnerabilities, please visit [community
- 📋 **Bucket policies** — IAM-style JSON policies with Allow/Deny evaluation and wildcard matching
- 🌐 **CORS middleware** — configurable cross-origin support
- 🧹 **Clean slate mode** — wipe storage on startup for test isolation
- 📊 **Runtime storage stats** — cheap bucket summaries and global counts without S3 list scans
- 🔑 **Runtime credential rotation** — list and replace active auth credentials without mutating internals
- **Test-first design** — start/stop in milliseconds, no port conflicts
### Clustering Features
@@ -39,6 +41,7 @@ For reporting bugs, issues, or security vulnerabilities, please visit [community
- 🔗 **Erasure coding** — Reed-Solomon (configurable k data + m parity shards) for storage efficiency and fault tolerance
- 🚄 **QUIC transport** — multiplexed, encrypted inter-node communication via `quinn` with zero head-of-line blocking
- 💽 **Multi-drive awareness** — each node manages multiple independent storage paths with health monitoring
- 🩺 **Cluster health introspection** — query native node, drive, quorum, and healing status for product dashboards
- 🤝 **Cluster membership** — static seed config + runtime join, heartbeat-based failure detection
- ✍️ **Quorum writes** — data is only acknowledged after k+1 shards are persisted
- 📖 **Quorum reads** — reconstruct from any k available shards, local-first fast path
@@ -201,6 +204,112 @@ const storage = await SmartStorage.createAndStart({
});
```
## Runtime Credentials
```typescript
const credentials = await storage.listCredentials();
await storage.replaceCredentials([
  {
    accessKeyId: 'ADMINA',
    secretAccessKey: 'super-secret-a',
  },
  {
    accessKeyId: 'ADMINB',
    secretAccessKey: 'super-secret-b',
  },
]);
```
```typescript
interface IStorageCredential {
  accessKeyId: string;
  secretAccessKey: string;
}
```
- `listCredentials()` returns the Rust core's current runtime credential set.
- `replaceCredentials()` swaps the full set atomically. On success, new requests use the new set immediately and the old credentials stop authenticating immediately.
- Requests that were already authenticated before the replacement keep running; auth is evaluated when each request starts.
- No restart is required.
- Replacement input must contain at least one credential, each `accessKeyId` and `secretAccessKey` must be non-empty, and `accessKeyId` values must be unique.
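The input rules above can be checked on the caller side before invoking `replaceCredentials()`. The following is a minimal sketch, not part of smartstorage: `findCredentialProblems` is a hypothetical helper name, and the messages it returns are illustrative. The Rust core presumably performs its own validation regardless.

```typescript
// Mirrors the IStorageCredential shape documented above.
interface IStorageCredential {
  accessKeyId: string;
  secretAccessKey: string;
}

// Hypothetical helper (an assumption, not a smartstorage API): returns a list
// of problems, which is empty when the input satisfies the documented rules.
function findCredentialProblems(credentials: IStorageCredential[]): string[] {
  const problems: string[] = [];
  // Rule 1: the replacement set must contain at least one credential.
  if (credentials.length === 0) {
    problems.push('at least one credential is required');
  }
  const seenAccessKeyIds = new Set<string>();
  credentials.forEach((credential, index) => {
    // Rule 2: both fields must be non-empty.
    if (credential.accessKeyId.length === 0) {
      problems.push(`credential ${index}: accessKeyId is empty`);
    }
    if (credential.secretAccessKey.length === 0) {
      problems.push(`credential ${index}: secretAccessKey is empty`);
    }
    // Rule 3: accessKeyId values must be unique within the set.
    if (seenAccessKeyIds.has(credential.accessKeyId)) {
      problems.push(`credential ${index}: duplicate accessKeyId`);
    }
    seenAccessKeyIds.add(credential.accessKeyId);
  });
  return problems;
}
```

Returning every problem instead of throwing on the first one lets a caller report all issues in a single pass before attempting the atomic swap.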
## Runtime Stats
```typescript
const stats = await storage.getStorageStats();
const bucketSummaries = await storage.listBucketSummaries();
console.log(stats.bucketCount);
console.log(stats.totalObjectCount);
console.log(stats.totalStorageBytes);
console.log(bucketSummaries[0]?.name, bucketSummaries[0]?.objectCount);
```
```typescript
interface IBucketSummary {
  name: string;
  objectCount: number;
  totalSizeBytes: number;
  creationDate?: number;
}

interface IStorageLocationSummary {
  path: string;
  totalBytes?: number;
  availableBytes?: number;
  usedBytes?: number;
}

interface IStorageStats {
  bucketCount: number;
  totalObjectCount: number;
  totalStorageBytes: number;
  buckets: IBucketSummary[];
  storageDirectory: string;
  storageLocations?: IStorageLocationSummary[];
}
```
- `bucketCount`, `totalObjectCount`, `totalStorageBytes`, and per-bucket totals are logical object stats maintained by the Rust runtime. They count object payload bytes, not sidecar files or erasure-coded shard overhead.
- smartstorage initializes these values from native on-disk state at startup, then keeps them in memory and updates them when bucket/object mutations succeed. Stats reads do not issue S3 `ListObjects` or rescan every object.
- Values are exact for mutations performed through smartstorage after startup. Direct filesystem edits outside smartstorage are not watched; restart the server to resync.
- `storageLocations` is a cheap filesystem-capacity snapshot. Standalone mode reports the storage directory. Cluster mode reports the configured drive paths.
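The bookkeeping described above (seed once at startup, then adjust counters on each successful mutation) can be illustrated with a small in-memory tracker. This is a sketch of the general technique only: `BucketStatsTracker` and its methods are hypothetical names, and the actual tracking lives in the Rust runtime and may differ in detail.

```typescript
interface IBucketSummary {
  name: string;
  objectCount: number;
  totalSizeBytes: number;
  creationDate?: number;
}

// Hypothetical tracker (an assumption, not the smartstorage implementation).
class BucketStatsTracker {
  private readonly buckets = new Map<string, IBucketSummary>();

  // Seed from a one-time scan of on-disk state at startup.
  constructor(initial: IBucketSummary[] = []) {
    for (const summary of initial) {
      this.buckets.set(summary.name, { ...summary });
    }
  }

  createBucket(name: string): void {
    if (!this.buckets.has(name)) {
      this.buckets.set(name, { name, objectCount: 0, totalSizeBytes: 0 });
    }
  }

  // Call only after the write succeeded. previousSize is undefined for a new
  // key; for an overwrite it is the payload size being replaced.
  recordPut(bucket: string, newSize: number, previousSize?: number): void {
    const summary = this.require(bucket);
    if (previousSize === undefined) {
      summary.objectCount += 1;
    }
    summary.totalSizeBytes += newSize - (previousSize ?? 0);
  }

  // Call only after the delete succeeded.
  recordDelete(bucket: string, size: number): void {
    const summary = this.require(bucket);
    summary.objectCount -= 1;
    summary.totalSizeBytes -= size;
  }

  // Reading totals touches only the in-memory map, never the objects.
  totals(): { bucketCount: number; totalObjectCount: number; totalStorageBytes: number } {
    let totalObjectCount = 0;
    let totalStorageBytes = 0;
    for (const summary of this.buckets.values()) {
      totalObjectCount += summary.objectCount;
      totalStorageBytes += summary.totalSizeBytes;
    }
    return { bucketCount: this.buckets.size, totalObjectCount, totalStorageBytes };
  }

  private require(bucket: string): IBucketSummary {
    const summary = this.buckets.get(bucket);
    if (!summary) {
      throw new Error(`unknown bucket: ${bucket}`);
    }
    return summary;
  }
}
```

Because counters change only when a mutation goes through the tracker, edits made behind its back are invisible until the state is rebuilt, which matches the restart-to-resync caveat above.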
## Cluster Health
```typescript
const clusterHealth = await storage.getClusterHealth();
if (!clusterHealth.enabled) {
  console.log('Cluster mode is disabled');
} else {
  console.log(clusterHealth.nodeId, clusterHealth.quorumHealthy);
  console.log(clusterHealth.peers);
  console.log(clusterHealth.drives);
}
```
```typescript
interface IClusterHealth {
  enabled: boolean;
  nodeId?: string;
  quorumHealthy?: boolean;
  majorityHealthy?: boolean;
  peers?: IClusterPeerHealth[];
  drives?: IClusterDriveHealth[];
  erasure?: IClusterErasureHealth;
  repairs?: IClusterRepairHealth;
}
```
- `getClusterHealth()` is served by the Rust core. The TypeScript wrapper does not infer values from static config.
- Standalone mode returns `{ enabled: false }`.
- Peer status is the local node's current view of cluster membership and heartbeats, so it is best-effort and may lag real network state.
- Drive health is based on live native probe checks on the configured local drive paths. Capacity values are cheap filesystem snapshots.
- `quorumHealthy` means the local node currently sees majority quorum and enough available placements in every erasure set to satisfy the configured write quorum.
- Repair fields expose the background healer's currently available runtime state. They are best-effort and limited to what the engine tracks today, such as whether a scan is active, the last completed run, and the last error.
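One way to read the `quorumHealthy` description above is as the conjunction of two checks: a strict node majority, and a per-erasure-set placement check against the write quorum. The sketch below encodes that reading. Every name in it (`IErasureSetView`, `availablePlacements`, `isQuorumHealthy`) is hypothetical, and whether the node counts include the local node is an assumption; the real evaluation happens in the Rust core.

```typescript
// Hypothetical view of one erasure set (an assumption, not a smartstorage type).
interface IErasureSetView {
  // Number of shard placements the local node currently considers usable.
  availablePlacements: number;
}

// Sketch of the two-part condition described above.
// onlineNodes and totalNodes are assumed to include the local node.
function isQuorumHealthy(
  onlineNodes: number,
  totalNodes: number,
  erasureSets: IErasureSetView[],
  writeQuorum: number,
): boolean {
  // Part 1: the local node sees a strict majority of the cluster.
  const hasMajority = onlineNodes > totalNodes / 2;
  // Part 2: every erasure set has enough placements to satisfy a write.
  const everySetWritable = erasureSets.every(
    (set) => set.availablePlacements >= writeQuorum,
  );
  return hasMajority && everySetWritable;
}
```

Under this reading, a single degraded erasure set is enough to report an unhealthy quorum even while most nodes are reachable, which is why `majorityHealthy` is useful as a separate field.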
## Usage with AWS SDK v3
```typescript