# @push.rocks/smartmigration
A unified migration runner for MongoDB (via [@push.rocks/smartdata](https://code.foss.global/push.rocks/smartdata)) and S3 (via [@push.rocks/smartbucket](https://code.foss.global/push.rocks/smartbucket)) — designed to be invoked at SaaS app startup.
## Installation
```bash
pnpm add @push.rocks/smartmigration
# plus whichever resources you need
pnpm add @push.rocks/smartdata @push.rocks/smartbucket
```
`@push.rocks/smartdata` and `@push.rocks/smartbucket` are declared as **optional peer dependencies** — install whichever ones your migrations actually touch.
## Issue Reporting and Security
Report bugs and security issues at [community.foss.global](https://community.foss.global).
## What is smartmigration?
`smartmigration` is the missing piece for SaaS apps in the push.rocks ecosystem: a deterministic, idempotent way to evolve persistent data in lockstep with code releases. You define a chain of migration steps with `from`/`to` semver versions, hand the runner your `SmartdataDb` and/or `Bucket`, and call `run()` from your app's startup path. The runner figures out which steps need to execute, runs them sequentially, and stamps progress into a ledger so subsequent boots are fast no-ops.
### Key features
| Feature | Description |
|---|---|
| **Builder-style API** | Fluent step definition: `step('id').from('1.0.0').to('1.1.0').up(handler)` |
| **Unified mongo + S3** | A single semver represents your combined data version. One step can touch both. |
| **Drivers exposed via context** | `ctx.db`, `ctx.mongo`, `ctx.bucket`, `ctx.s3` — anything you could do against the drivers directly, you can do inside a step |
| **Idempotent** | `run()` is a no-op when data is at `targetVersion`. Costs one ledger read on the happy path. |
| **Resumable** | Mark a step `.resumable()` and it gets `ctx.checkpoint.read/write/clear` for restartable bulk operations |
| **Lockable** | Mongo-backed lock with TTL serializes concurrent SaaS instances — safe for rolling deploys |
| **Fresh-install fast path** | Configure `freshInstallVersion` to skip migrations on a brand-new database |
| **Dry-run** | `dryRun: true` or `.plan()` returns the execution plan without writing anything |
| **Structured errors** | All failures throw `SmartMigrationError` with a stable `code` field for branching |
## Quick start
```ts
import { SmartdataDb } from '@push.rocks/smartdata';
import { SmartBucket } from '@push.rocks/smartbucket';
import { SmartMigration } from '@push.rocks/smartmigration';
import { commitinfo } from './00_commitinfo_data.js';
// 1. set up your resources as you normally would
const db = new SmartdataDb({
  mongoDbUrl: process.env.MONGODB_URL!,
  mongoDbName: 'myapp',
});
await db.init();

const sb = new SmartBucket({
  accessKey: process.env.S3_ACCESSKEY!,
  accessSecret: process.env.S3_SECRETKEY!,
  endpoint: process.env.S3_ENDPOINT!,
  region: 'us-east-1',
});
const bucket = await sb.getBucketByName('myapp-uploads');

// 2. construct the runner with your app's target version
const migration = new SmartMigration({
  targetVersion: commitinfo.version,
  db,
  bucket,
});

// 3. register migrations in execution order, with from/to validating the chain
migration
  .step('lowercase-emails')
  .from('1.0.0').to('1.1.0')
  .description('Lowercase all user emails')
  .up(async (ctx) => {
    await ctx.mongo!.collection('users').updateMany(
      {},
      [{ $set: { email: { $toLower: '$email' } } }],
    );
  })
  .step('reorganize-uploads')
  .from('1.1.0').to('2.0.0')
  .description('Move uploads/ to media/')
  .resumable()
  .up(async (ctx) => {
    const cursor = ctx.bucket!.createCursor('uploads/');
    const startToken = await ctx.checkpoint!.read<string>('cursorToken');
    if (startToken) cursor.setToken(startToken);
    while (await cursor.hasMore()) {
      for (const key of (await cursor.next()) ?? []) {
        await ctx.bucket!.fastMove({
          sourcePath: key,
          destinationPath: 'media/' + key.slice('uploads/'.length),
          overwrite: true,
        });
      }
      await ctx.checkpoint!.write('cursorToken', cursor.getToken());
    }
  });
// 4. run on startup — fast no-op once the data is at targetVersion
const result = await migration.run();
console.log(`smartmigration: ${result.currentVersionBefore ?? 'fresh'} → ${result.currentVersionAfter} (${result.stepsApplied.length} steps in ${result.totalDurationMs}ms)`);
```
## Core concepts
### The data version
A single semver string represents the combined state of your mongo and S3 data. It is **not** the same as your app version (though you usually want them to match): the app version is what's running, the data version is what's on disk. The runner's job is to bring the data version up to the app version.
### Steps and the chain
Each migration is a `step` with:
- A unique **id** (string) — used as the ledger key
- A **`from`** semver — the data version this step expects to start from
- A **`to`** semver — the data version this step produces
- An **`up`** handler — the actual migration logic
- An optional **description**, **resumable** flag
Steps execute in **registration order**. The runner validates that the chain is contiguous: `step[N].to === step[N+1].from`. This catches gaps and overlaps at definition time, before any handler runs.
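The contiguity check can be pictured with a small standalone sketch. This is illustrative only, not the library's internals; `IStepSpec` and `validateChain` are hypothetical names:

```ts
// Hypothetical standalone version of the definition-time chain validation:
// ids must be unique, and each step must start where the previous one ended.
interface IStepSpec {
  id: string;
  from: string;
  to: string;
}

function validateChain(steps: IStepSpec[]): void {
  const seen = new Set<string>();
  for (let i = 0; i < steps.length; i++) {
    if (seen.has(steps[i].id)) {
      throw new Error(`DUPLICATE_STEP_ID: ${steps[i].id}`);
    }
    seen.add(steps[i].id);
    if (i > 0 && steps[i - 1].to !== steps[i].from) {
      throw new Error(
        `CHAIN_GAP: ${steps[i - 1].id} ends at ${steps[i - 1].to} ` +
        `but ${steps[i].id} starts at ${steps[i].from}`,
      );
    }
  }
}

// A contiguous chain passes silently:
validateChain([
  { id: 'a', from: '1.0.0', to: '1.1.0' },
  { id: 'b', from: '1.1.0', to: '2.0.0' },
]);
```

Because this runs before any handler, a typo in a `from`/`to` version fails fast at boot rather than mid-migration.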
### The ledger
The ledger is the source of truth for "what data version are we at, what steps have been applied, who holds the lock right now." It is persisted in one of two backends:
- **`mongo` (default when `db` is provided)** — backed by smartdata's `EasyStore`, stored as a single document. Lock semantics work safely across multiple SaaS instances. **Recommended.**
- **`s3` (default when only `bucket` is provided)** — a single JSON object at `<bucket>/.smartmigration/<ledgerName>.json`. Lock is best-effort because S3 has no atomic CAS without additional infrastructure; do not use for multi-instance deployments without external coordination.
If you pass both `db` and `bucket`, mongo is used.
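The selection rules above can be sketched as a standalone function (hypothetical, for illustration; `pickLedgerBackend` is not a library export):

```ts
type LedgerBackend = 'mongo' | 's3';

// Hypothetical sketch mirroring the documented rules: an explicit choice must
// match a provided resource, otherwise mongo wins whenever a db is present.
function pickLedgerBackend(opts: {
  hasDb: boolean;
  hasBucket: boolean;
  explicit?: LedgerBackend;
}): LedgerBackend {
  if (opts.explicit) {
    if (opts.explicit === 'mongo' && !opts.hasDb) throw new Error('LEDGER_BACKEND_MISMATCH');
    if (opts.explicit === 's3' && !opts.hasBucket) throw new Error('LEDGER_BACKEND_MISMATCH');
    return opts.explicit;
  }
  if (opts.hasDb) return 'mongo'; // mongo wins when both resources are present
  if (opts.hasBucket) return 's3';
  throw new Error('NO_RESOURCES');
}

const backend = pickLedgerBackend({ hasDb: true, hasBucket: true }); // 'mongo'
```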
### The migration context
Each `up` handler receives an `IMigrationContext` with:
```ts
interface IMigrationContext {
  db?: SmartdataDb;      // high-level smartdata
  bucket?: Bucket;       // high-level smartbucket Bucket
  mongo?: mongodb.Db;    // raw mongo Db (db.mongoDb)
  s3?: S3Client;         // raw AWS SDK v3 S3Client
  step: { id, fromVersion, toVersion, description?, isResumable };
  log: Smartlog;
  isDryRun: boolean;
  checkpoint?: IMigrationCheckpoint; // only when step is .resumable()
  startSession(): mongodb.ClientSession; // throws if no db
}
```
Use whichever level fits your migration:
- **High-level** (`ctx.db`, `ctx.bucket`) for working with smartdata models / smartbucket files
- **Raw drivers** (`ctx.mongo`, `ctx.s3`) for any operation smartdata/smartbucket don't wrap (renameCollection, dropIndex, S3 lifecycle policies, etc.)
- **`startSession()`** for transactional mongo migrations
## Common use cases
### Mongo schema rename
```ts
migration
  .step('rename-user-email-field')
  .from('1.0.0').to('1.1.0')
  .up(async (ctx) => {
    await ctx.mongo!.collection('users').updateMany(
      { emailAddress: { $exists: true } },
      { $rename: { emailAddress: 'email' } },
    );
  });
```
### Backfilling a derived field with a transaction
```ts
migration
  .step('backfill-account-tier')
  .from('1.5.0').to('2.0.0')
  .up(async (ctx) => {
    const session = ctx.startSession();
    try {
      await session.withTransaction(async () => {
        const users = ctx.mongo!.collection('users');
        await users.updateMany(
          { tier: { $exists: false }, plan: 'free' },
          { $set: { tier: 'free' } },
          { session },
        );
        await users.updateMany(
          { tier: { $exists: false }, plan: { $in: ['pro', 'enterprise'] } },
          { $set: { tier: 'paid' } },
          { session },
        );
      });
    } finally {
      await session.endSession();
    }
  });
```
### Resumable S3 reprefix
```ts
migration
  .step('move-attachments')
  .from('2.0.0').to('2.1.0')
  .resumable()
  .up(async (ctx) => {
    const cursor = ctx.bucket!.createCursor('attachments/legacy/', { pageSize: 200 });
    const start = await ctx.checkpoint!.read<string>('cursor');
    if (start) cursor.setToken(start);
    while (await cursor.hasMore()) {
      const batch = (await cursor.next()) ?? [];
      for (const key of batch) {
        await ctx.bucket!.fastMove({
          sourcePath: key,
          destinationPath: 'attachments/' + key.slice('attachments/legacy/'.length),
          overwrite: true,
        });
      }
      await ctx.checkpoint!.write('cursor', cursor.getToken());
    }
  });
```
If the process crashes mid-migration, the next call to `run()` will resume from the last persisted cursor token.
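The crash-and-resume mechanics reduce to a simple pattern: persist a progress token after each completed batch, and start from the stored token on the next run. A standalone sketch with an in-memory checkpoint store (hypothetical helper names; the real `ctx.checkpoint` persists to the ledger between runs):

```ts
// In-memory stand-in for ctx.checkpoint, just for this sketch.
type Checkpoint = {
  read<T>(k: string): T | undefined;
  write(k: string, v: unknown): void;
};

function makeCheckpoint(store: Record<string, unknown> = {}): Checkpoint {
  return {
    read: <T>(k: string) => store[k] as T | undefined,
    write: (k, v) => { store[k] = v; },
  };
}

// Process pages 0..4, persisting progress after each page; optionally
// "crash" at a given page to simulate a process dying mid-migration.
function processPages(cp: Checkpoint, crashAt?: number): number[] {
  const done: number[] = [];
  const start = cp.read<number>('page') ?? 0;
  for (let page = start; page < 5; page++) {
    if (page === crashAt) throw new Error('simulated crash');
    done.push(page);            // do the work for this page
    cp.write('page', page + 1); // persist progress only after the work is done
  }
  return done;
}

const store: Record<string, unknown> = {};
try { processPages(makeCheckpoint(store), 3); } catch {} // first run dies at page 3
const resumed = processPages(makeCheckpoint(store));     // second run does only [3, 4]
```

Writing the checkpoint only after the batch completes means a crash re-runs at most one batch, which is why each batch should itself be idempotent.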
### Mongo + S3 in lockstep
```ts
migration
  .step('split-avatars-out-of-users')
  .from('2.1.0').to('2.5.0')
  .up(async (ctx) => {
    // 1. for every user with an inline avatarBytes field, write the bytes to S3
    const users = ctx.mongo!.collection('users');
    const cursor = users.find({ avatarBytes: { $exists: true } });
    for await (const user of cursor) {
      const buf = Buffer.from(user.avatarBytes.buffer);
      await ctx.bucket!.fastPut({
        path: `avatars/${user._id}.bin`,
        contents: buf,
        overwrite: true,
      });
      // 2. point the document at the S3 key and drop the inline bytes
      await users.updateOne(
        { _id: user._id },
        { $set: { avatarKey: `avatars/${user._id}.bin` }, $unset: { avatarBytes: '' } },
      );
    }
  });
```
### Fresh-install fast path
```ts
const migration = new SmartMigration({
  targetVersion: '5.0.0',
  db,
  freshInstallVersion: '5.0.0', // brand-new DB jumps straight to 5.0.0
});
// ... register your full chain of steps ...
await migration.run(); // for a fresh DB, runs zero steps and stamps version 5.0.0
```
### Dry run / planning
```ts
const planned = await migration.plan();
console.log(
  `would apply ${planned.stepsSkipped.length} steps:`,
  planned.stepsSkipped.map((s) => `${s.id}(${s.fromVersion}→${s.toVersion})`).join(' → '),
);

// or — same thing, via the dryRun option
const m = new SmartMigration({ targetVersion: '2.0.0', db, dryRun: true });
// ...
const result = await m.run(); // returns the plan, doesn't write
```
## API reference
### `new SmartMigration(options: ISmartMigrationOptions)`
| Option | Type | Default | Description |
|---|---|---|---|
| `targetVersion` | `string` | — (required) | Semver representing the data version this code expects |
| `db` | `SmartdataDb` | undefined | Required if any step uses `ctx.db`/`ctx.mongo` or for the mongo ledger |
| `bucket` | `Bucket` | undefined | Required if any step uses `ctx.bucket`/`ctx.s3` or for the S3 ledger |
| `ledgerName` | `string` | `"smartmigration"` | Logical name; lets multiple migrations coexist on the same db/bucket |
| `ledgerBackend` | `'mongo'` \| `'s3'` | mongo if db, else s3 | Where to persist the ledger |
| `freshInstallVersion` | `string` | undefined | When the resource is empty, jump straight to this version |
| `lockWaitMs` | `number` | `60_000` | How long `run()` waits for another instance's lock to clear before throwing `LOCK_TIMEOUT` |
| `lockTtlMs` | `number` | `600_000` | TTL after which a held lock auto-expires, so a crashed holder cannot block forever |
| `dryRun` | `boolean` | `false` | If true, `run()` returns the plan without executing or locking |
| `logger` | `Smartlog` | module logger | Custom logger; defaults to a Smartlog with a local destination |
The constructor throws `SmartMigrationError` with one of these `code`s on bad input:
- `INVALID_OPTIONS` — options object missing
- `MISSING_TARGET_VERSION` — `targetVersion` not set
- `INVALID_VERSION` — `targetVersion` is not a valid semver
- `NO_RESOURCES` — neither `db` nor `bucket` provided
- `LEDGER_BACKEND_MISMATCH` — explicit `ledgerBackend` doesn't match the resources you provided
### `migration.step(id: string).from(v).to(v).[description(t)].[resumable()].up(handler)`
The step builder. `.from()`, `.to()`, and `.up()` are required; `.description()` and `.resumable()` are optional. `.up()` is the terminal call: it commits the step to the parent runner and returns the runner so you can chain `.step('next')`.
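The shape of that fluent chain can be sketched standalone. These are simplified, hypothetical classes (the real builder also carries `description`, `resumable`, the handler, and validation); they only illustrate why `.up()` can be followed by `.step()`:

```ts
// Hypothetical simplified builder: .up() is terminal, commits the spec to the
// parent runner, and returns the runner so .step('next') chains directly.
class Runner {
  steps: { id: string; from?: string; to?: string }[] = [];
  step(id: string): StepBuilder {
    return new StepBuilder(this, id);
  }
}

class StepBuilder {
  private spec: { id: string; from?: string; to?: string };
  constructor(private runner: Runner, id: string) {
    this.spec = { id };
  }
  from(v: string): this { this.spec.from = v; return this; }
  to(v: string): this { this.spec.to = v; return this; }
  up(_handler: unknown): Runner {
    this.runner.steps.push(this.spec); // commit the step in registration order
    return this.runner;
  }
}

const r = new Runner();
r.step('a').from('1.0.0').to('1.1.0').up(async () => {})
 .step('b').from('1.1.0').to('2.0.0').up(async () => {});
// r.steps now holds both specs in registration order
```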
### `migration.run(): Promise<IMigrationRunResult>`
The startup entry point. Acquires a lock, computes the plan, executes pending steps in order, and releases the lock. Returns:
```ts
interface IMigrationRunResult {
  currentVersionBefore: string | null;
  currentVersionAfter: string;
  targetVersion: string;
  wasUpToDate: boolean;     // true if no steps ran
  wasFreshInstall: boolean; // true if freshInstallVersion was used
  stepsApplied: IMigrationStepResult[];
  stepsSkipped: IMigrationStepResult[];
  totalDurationMs: number;
}
```
Throws `SmartMigrationError` with these `code`s:
- `LOCK_TIMEOUT` — could not acquire lock within `lockWaitMs`
- `STEP_FAILED` — a step's handler threw; the failure is persisted to the ledger
- `CHAIN_*`, `DUPLICATE_STEP_ID`, `NON_INCREASING_STEP`, `TARGET_NOT_REACHABLE`, `DOWNGRADE_NOT_SUPPORTED` — chain validation / planning errors
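At startup you typically branch on `code`: `LOCK_TIMEOUT` is worth retrying, while chain and step failures need a human. A standalone sketch of that decision (the error is modeled as a plain `{ code }` shape so the snippet is self-contained; `startupAction` is a hypothetical helper, not a library export):

```ts
// Hypothetical helper: decide what a booting instance should do with a
// migration failure, based on the stable `code` field.
function startupAction(err: { code: string }): 'retry-later' | 'crash' {
  switch (err.code) {
    case 'LOCK_TIMEOUT':
      // another instance holds the lock; back off and retry instead of dying
      return 'retry-later';
    default:
      // STEP_FAILED and chain/validation errors need a code fix, not a retry
      return 'crash';
  }
}
```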
### `migration.plan(): Promise<IMigrationRunResult>`
Same as `run()` but does not acquire the lock or execute anything. Useful for `--dry-run` style probes in CI.
### `migration.getCurrentVersion(): Promise<string | null>`
Returns the current data version from the ledger, or `null` if the ledger has never been initialized.
## Troubleshooting
### `LOCK_TIMEOUT` on every startup
Most likely another instance crashed while holding the lock. Either wait for `lockTtlMs` (default 10 minutes) so the stale lock expires on its own, or manually clear the `lock` field on the ledger document.
### `CHAIN_GAP` at startup
Two adjacent steps have mismatched versions: `step[N].to !== step[N+1].from`. Steps must form a contiguous chain in registration order. Fix the version on the offending step.
### `CHAIN_NOT_AT_CURRENT`
The ledger says the data is at version X, but no registered step starts at X. This usually happens when you delete a step that has already been applied to production data. Either keep the step or manually update the ledger's `currentVersion`.
### S3-only deployments and concurrent instances
The S3 ledger's lock is best-effort. If you run multiple SaaS instances against the same S3 bucket, use external coordination (e.g. Redis lock, leader election) before calling `run()`. The mongo backend has no such limitation.
## Best practices
1. **Prefer idempotent migrations.** Even with the lock, machines crash, networks partition, and migrations should be safe to re-run. Use `$set` over `$inc`, check existence before insert, etc.
2. **Use `.resumable()` for any step that processes more than a few hundred records.** Crashes happen; resumability avoids redoing work.
3. **Use transactions for multi-collection mongo migrations** via `ctx.startSession()` + `session.withTransaction()`.
4. **Don't mix S3 and Mongo writes inside a single conceptual operation** — S3 has no transactions, so design migrations so each S3 write is observably idempotent on its own.
5. **Keep the `targetVersion` linked to your app's package.json `version`** via the autocreated `00_commitinfo_data.ts`. That way every release automatically signals the data version it expects.
6. **Freeze applied migrations in source control** — don't edit a step that has been applied to production data, even if the new version is "more correct." Add a new step instead.
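Practice 1 can be made concrete with a tiny standalone sketch (plain objects standing in for mongo documents; `setTier` and `bumpCounter` are hypothetical illustrations, not library code):

```ts
type User = { plan: string; tier?: string; migrationsRun?: number };

// Idempotent, $set-style: running it twice produces the same document.
function setTier(u: User): User {
  return { ...u, tier: u.plan === 'free' ? 'free' : 'paid' };
}

// NOT idempotent, $inc-style: every re-run changes the document again.
function bumpCounter(u: User): User {
  return { ...u, migrationsRun: (u.migrationsRun ?? 0) + 1 };
}

const once = setTier({ plan: 'pro' });
const twice = setTier(once);
// once and twice are identical, so the step is safe to re-run after a crash
```

The same reasoning applies to S3 writes: `fastMove`/`fastPut` with `overwrite: true` to a deterministic key converges to the same state on re-run, which is exactly what a resumed or retried step needs.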
## License and Legal Information
This repository contains open-source code that is licensed under the MIT License. A copy of the MIT License can be found in the [license](license) file within this repository.
**Please note:** The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.
### Trademarks
This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH and are not included within the scope of the MIT license granted herein. Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines, and any usage must be approved in writing by Task Venture Capital GmbH.
### Company Information
Task Venture Capital GmbH
Registered at District court Bremen HRB 35230 HB, Germany
For any legal inquiries or if you require further information, please contact us via email at hello@task.vc.
By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.