# Improvement Plan
## Goal
Make this repository a solid product wrapper for `@push.rocks/smartstorage`: reliable Docker image, predictable runtime behavior, durable admin configuration, and a management surface that scales past toy datasets.
## Current Position
The repository already has a clear structure and decent coverage for auth, buckets, objects, policies, credentials, status/config, lifecycle, and S3 compatibility.
The main gaps are product hardening and dependency boundaries:
- the Docker build still references `npmextra.json` even though the repo migrated to `.smartconfig.json`
- some admin/runtime behavior is still in-memory only
- status and bucket summaries are computed by rescanning storage through S3 calls
- the product surface does not yet expose real cluster and drive health
- the UI error model is still mostly `console.error()` plus stale state retention, so failures are logged but never shown to the user
## Repo-Owned Work
### 1. Fix Packaging Regressions First
- Update `Dockerfile` to stop copying `npmextra.json` and use the current build config layout.
- Add a Docker smoke test that proves the image builds and the container starts with both ports exposed.
Reason:
- This repo is explicitly about shipping a usable container. A broken or stale Docker build path is a product failure, not a nice-to-have cleanup.
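
A cheap guard can complement the smoke test by failing fast when the `Dockerfile` still mentions a retired config file. The sketch below is illustrative only: `findStaleReferences` and the sample Dockerfile text are hypothetical, and the check does not replace actually building and starting the image.

```typescript
// Hypothetical helper, not part of this repository.
// Reports Dockerfile lines that still mention config files removed from the repo.
interface StaleReference {
  line: number; // 1-based line number in the Dockerfile
  text: string; // the offending line, trimmed
  file: string; // the stale file name that matched
}

function findStaleReferences(
  dockerfile: string,
  staleFiles: string[],
): StaleReference[] {
  const hits: StaleReference[] = [];
  dockerfile.split(/\r?\n/).forEach((raw, index) => {
    const text = raw.trim();
    // Comment lines cannot break the build, so they are ignored.
    if (text.startsWith("#")) return;
    for (const file of staleFiles) {
      if (text.includes(file)) {
        hits.push({ line: index + 1, text, file });
      }
    }
  });
  return hits;
}

// Assumed Dockerfile fragment for illustration, not the real file.
const exampleDockerfile = [
  "FROM example-base:latest",
  "# npmextra.json was replaced by .smartconfig.json",
  "COPY npmextra.json ./",
  "COPY .smartconfig.json ./",
].join("\n");

const staleHits = findStaleReferences(exampleDockerfile, ["npmextra.json"]);
```

A CI step could fail whenever `staleHits` is non-empty, which catches this class of regression before the slower image build runs.
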
### 2. Make Admin Configuration Durable
- Decide which runtime-managed settings must survive restart.
- Credentials are the biggest gap today: add/remove currently mutates runtime config only.
- Align credential durability with the existing named-policy persistence model.
- Add restart-persistence tests.
Reason:
- A product container should not pretend to manage credentials if those changes disappear on restart.
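
One way to picture durable credentials is a write-through store that persists before it commits the in-memory change and reloads on startup. This is a minimal sketch under assumptions: `Credential`, `StateFile`, `MemoryFile`, and `DurableCredentialStore` are hypothetical names, and the real credential shape and storage layer in this repo or in `smartstorage` may differ.

```typescript
// Hypothetical credential shape for illustration.
interface Credential {
  accessKeyId: string;
  secretAccessKey: string;
}

// Minimal persistence seam so the sketch runs without touching disk.
interface StateFile {
  read(): string | null;
  write(contents: string): void;
}

// In-memory stand-in for a file that outlives a process restart.
class MemoryFile implements StateFile {
  private contents: string | null = null;
  read(): string | null {
    return this.contents;
  }
  write(contents: string): void {
    this.contents = contents;
  }
}

class DurableCredentialStore {
  private credentials = new Map<string, Credential>();
  private readonly file: StateFile;

  constructor(file: StateFile) {
    this.file = file;
    // Reload persisted credentials on startup so they survive a restart.
    const raw = file.read();
    if (raw !== null) {
      for (const cred of JSON.parse(raw) as Credential[]) {
        this.credentials.set(cred.accessKeyId, cred);
      }
    }
  }

  add(cred: Credential): void {
    const next = new Map(this.credentials);
    next.set(cred.accessKeyId, cred);
    this.commit(next);
  }

  remove(accessKeyId: string): boolean {
    if (!this.credentials.has(accessKeyId)) return false;
    const next = new Map(this.credentials);
    next.delete(accessKeyId);
    this.commit(next);
    return true;
  }

  listKeyIds(): string[] {
    return Array.from(this.credentials.keys()).sort();
  }

  // Persist first; only adopt the new state if the write did not throw.
  private commit(next: Map<string, Credential>): void {
    this.file.write(JSON.stringify(Array.from(next.values())));
    this.credentials = next;
  }
}

// Simulated restart: a second store is built over the same persisted state.
const sharedFile = new MemoryFile();
const beforeRestart = new DurableCredentialStore(sharedFile);
beforeRestart.add({ accessKeyId: "example-key", secretAccessKey: "example-secret" });
const afterRestart = new DurableCredentialStore(sharedFile);
```

A restart-persistence test would follow the same pattern against the real storage path: mutate, restart, then assert the credential list is unchanged.
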
### 3. Tighten Management API Behavior
- Fix bucket/policy cleanup ordering so deleting a bucket does not trigger follow-up work against a missing bucket.
- Return cleaner typed errors from handlers where failure is expected.
- Surface actionable errors in the UI instead of only logging and keeping stale state.
Reason:
- The current tests pass, but post-test output already shows avoidable `NoSuchBucket` noise during policy cleanup.
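
The ordering and typed-error points above can be sketched together. The example assumes a hypothetical in-memory `BucketRegistry` and a hypothetical `DeleteResult` union; it only illustrates detaching policies before the bucket disappears and returning expected failures as data, not the actual handler code.

```typescript
// Hypothetical result shape for handlers where failure is an expected outcome.
type DeleteResult =
  | { ok: true; detachedPolicies: string[] }
  | { ok: false; code: "NoSuchBucket"; message: string };

// Hypothetical in-memory stand-in for bucket and policy-attachment state.
class BucketRegistry {
  private buckets = new Set<string>();
  private attachments = new Map<string, string[]>(); // bucket -> policy names
  readonly steps: string[] = []; // records the order of operations

  createBucket(bucket: string, policies: string[]): void {
    this.buckets.add(bucket);
    this.attachments.set(bucket, policies.slice());
  }

  deleteBucket(bucket: string): DeleteResult {
    if (!this.buckets.has(bucket)) {
      // Expected failure: report it as data instead of throwing.
      return {
        ok: false,
        code: "NoSuchBucket",
        message: `bucket "${bucket}" does not exist`,
      };
    }
    // 1. Detach policies while the bucket still exists.
    const detachedPolicies = this.attachments.get(bucket) ?? [];
    this.attachments.delete(bucket);
    this.steps.push(`detach-policies:${bucket}`);
    // 2. Only then remove the bucket, so no follow-up work targets a missing bucket.
    this.buckets.delete(bucket);
    this.steps.push(`delete-bucket:${bucket}`);
    return { ok: true, detachedPolicies };
  }
}

const registry = new BucketRegistry();
registry.createBucket("demo", ["read-only"]);
const firstDelete = registry.deleteBucket("demo");
const secondDelete = registry.deleteBucket("demo");
```

Because the failure case carries a `code`, a UI caller can branch on it and show an actionable message rather than logging and keeping stale state.
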
### 4. Reduce Large-Object and Large-Dataset Pain
- Keep inline editing and preview for small objects.
- Add explicit size thresholds for inline/base64 flows.
- Prefer direct object URLs or streaming paths for large-object download/preview paths.
- Stop treating expensive full-dataset scans as normal refresh behavior.
Reason:
- The product should stay usable once users have real buckets and non-trivial object counts.
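
An explicit threshold could be expressed as a small decision function. The 1 MiB value and the names `INLINE_MAX_BYTES`, `chooseDeliveryMode`, and `base64Length` are assumptions for illustration; the plan does not fix a number. The second helper shows why base64 flows deserve a lower ceiling: every 3 input bytes become 4 output characters.

```typescript
type DeliveryMode = "inline" | "stream";

// Assumed threshold for illustration only; tune it against real usage.
const INLINE_MAX_BYTES = 1024 * 1024; // 1 MiB

// Small objects stay on the inline edit/preview path; larger ones should use
// a direct object URL or a streaming path instead.
function chooseDeliveryMode(
  sizeBytes: number,
  inlineMaxBytes: number = INLINE_MAX_BYTES,
): DeliveryMode {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError(`invalid object size: ${sizeBytes}`);
  }
  return sizeBytes <= inlineMaxBytes ? "inline" : "stream";
}

// Length of the padded base64 encoding of a payload of the given size.
function base64Length(sizeBytes: number): number {
  return Math.ceil(sizeBytes / 3) * 4;
}

const smallObjectMode = chooseDeliveryMode(512);
const largeObjectMode = chooseDeliveryMode(INLINE_MAX_BYTES + 1);
const encodedLimitLength = base64Length(INLINE_MAX_BYTES);
```

Under these assumptions a 1 MiB object grows to 1,398,104 characters once base64-encoded, which is the size the management API and browser actually have to carry.
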
### 5. Add Product Acceptance Coverage
- Docker build/start smoke test.
- Restart-persistence test for credentials and policies.
- Cluster-config smoke test.
- One browser-level smoke test for login, bucket browse, and config view.
Reason:
- The repo is productization code. Acceptance coverage matters more here than unit-test density.
## Dependency-Owned Work Completed
The dependency boundary work landed in `@push.rocks/smartstorage` `v6.4.0` and is now consumed here through the Deno import map.
- Runtime storage stats and bucket summaries come from `smartstorage` instead of product-side S3 scans.
- Runtime credential listing/replacement uses supported `smartstorage` APIs instead of mutating engine internals.
- Cluster and drive health are exposed through `smartstorage` and surfaced through the management API/UI.
## Suggested Execution Order
1. Fix Dockerfile and add a container smoke test.
2. Make credential changes durable and test restart behavior.
3. Clean up bucket/policy deletion behavior and UI error reporting.
4. Keep product code on supported `smartstorage` runtime APIs for stats, credentials, and cluster health.
5. Add browser-level smoke coverage for login, bucket browse, and config views.
## Success Criteria
- Docker image builds from a clean checkout.
- Runtime-managed credentials survive restart or are explicitly documented as ephemeral until fixed.
- Status and bucket views no longer require full object scans for routine refreshes.
- Bucket deletion and policy cleanup complete without noisy missing-bucket follow-up errors.
- Cluster mode exposes live health, not just configured values.
## Enterprise Readiness Plan
### Acceptance Criteria Implemented In-Repo
- Runtime health: expose unauthenticated `/livez`, `/readyz`, `/healthz`, and `/metrics` on the management server.
- Container health: Docker `HEALTHCHECK` must use `/readyz` and the image must run as a non-root user.
- Admin auditability: login, logout, bucket mutation, object mutation, credential mutation, and named-policy mutation must emit append-only audit events under `.objectstorage/audit.log`.
- Audit access: admins can query recent audit entries through a typed management API.
- Session security: JWT role is validated from verified claims, logout revokes the active token, and repeated failed logins are rate-limited.
- Secret persistence: admin metadata files are written with restrictive file mode where the OS supports it.
- Default-secret guardrail: persistent `/data` deployments refuse `admin/admin` defaults unless explicitly allowed for disposable development.
- Frontend session storage: admin identity is no longer stored in persistent browser state.
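
As an illustration of the failed-login rate limiting mentioned under session security, the following sketch uses a sliding window keyed by username or client address. `FailedLoginLimiter`, its parameters, and the chosen limits are hypothetical; the in-repo implementation may use a different strategy.

```typescript
// Hypothetical sliding-window limiter. The clock is passed in as a
// millisecond timestamp so the behavior is easy to test.
class FailedLoginLimiter {
  private failures = new Map<string, number[]>(); // key -> failure timestamps (ms)
  private readonly maxFailures: number;
  private readonly windowMs: number;

  constructor(maxFailures: number, windowMs: number) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
  }

  // Drops timestamps that have aged out of the window and returns the rest.
  private recent(key: string, nowMs: number): number[] {
    const cutoff = nowMs - this.windowMs;
    const kept = (this.failures.get(key) ?? []).filter((t) => t > cutoff);
    if (kept.length === 0) {
      this.failures.delete(key);
    } else {
      this.failures.set(key, kept);
    }
    return kept;
  }

  // True when the key has reached the failure limit inside the window.
  isBlocked(key: string, nowMs: number): boolean {
    return this.recent(key, nowMs).length >= this.maxFailures;
  }

  recordFailure(key: string, nowMs: number): void {
    const kept = this.recent(key, nowMs);
    kept.push(nowMs);
    this.failures.set(key, kept);
  }

  // A successful login clears the history for that key.
  recordSuccess(key: string): void {
    this.failures.delete(key);
  }
}

// Assumed limits: 3 failures per 60 seconds.
const limiter = new FailedLoginLimiter(3, 60000);
limiter.recordFailure("admin", 0);
limiter.recordFailure("admin", 1000);
const blockedAfterTwo = limiter.isBlocked("admin", 1500);
limiter.recordFailure("admin", 2000);
const blockedAfterThree = limiter.isBlocked("admin", 3000);
const blockedAfterWindow = limiter.isBlocked("admin", 61000);
```

A login handler would call `isBlocked` before verifying the password and `recordFailure` or `recordSuccess` afterwards. State here lives in memory only, so a restart resets the counters.
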
### Dependency Work Implemented In `@push.rocks/smartstorage`
- Cluster identity and topology snapshots persist under `.smartstorage/cluster/`.
- Cluster startup refuses unsafe seed-node fallback instead of silently forming a split-brain cluster.
- Heartbeats probe all known peers, including suspect/offline peers, using the configured timeout.
- Operational S3-side endpoints expose `/-/live`, `/-/ready`, `/-/health`, and `/-/metrics`.
- Runtime credential listing returns metadata only; secrets are write-only.
- Multi-node tests cover topology convergence, remote drive routing, and restart/rejoin identity durability.
### Production Requirements Outside This Repo
- Run the management UI behind TLS or add native typedserver TLS configuration for the deployment.
- Configure real admin and S3 credentials through secrets management, not image defaults.
- Decide cluster transport policy: pinned CA or mTLS with operational certificate rotation. Do not run the QUIC insecure-dev transport on untrusted networks.
- Define backup/restore procedures for `/data`, `.objectstorage`, `.smartstorage/cluster`, bucket manifests, and policies.
- Add load, soak, and failure-injection runs to release qualification for large datasets and network partitions.