tstest/readme.plan.md

# Improvement Plan for tstest and tapbundle

!! FIRST: Reread /home/philkunz/.claude/CLAUDE.md to ensure all guidelines are followed !!

## Improved Internal Protocol (NEW - Critical) ✅ COMPLETED

### Current Issues ✅ RESOLVED
- ✅ TAP protocol uses `#` for metadata which conflicts with test descriptions containing `#`
- ✅ Fragile regex parsing that breaks with special characters
- ✅ Limited extensibility for new metadata types

### Proposed Solution: Protocol V2 ✅ IMPLEMENTED
- ✅ Use Unicode delimiters `⟦TSTEST:META:{}⟧` that won't appear in test names
- ✅ Structured JSON metadata format
- ✅ Separate protocol blocks for complex data (errors, snapshots)
- ✅ Complete replacement of v1 (no backwards compatibility needed)
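
A minimal sketch of how a metadata block wrapped in `⟦TSTEST:META:` and `⟧` could be emitted and parsed without tripping over `#` in test descriptions. The helper names `emitMeta` and `parseLine` and the metadata fields (`time`, `tags`) are assumptions for illustration only; `readme.protocol.md` remains the authoritative specification.

```typescript
// Hypothetical helpers: names and metadata fields are assumptions, not the real API.
const META_OPEN = '⟦TSTEST:META:';
const META_CLOSE = '⟧';

interface ParsedLine {
  text: string;                          // line content with the metadata block removed
  meta: Record<string, unknown> | null;  // decoded JSON metadata, or null if absent
}

// Append a JSON metadata block to a result line.
function emitMeta(text: string, meta: Record<string, unknown>): string {
  return `${text} ${META_OPEN}${JSON.stringify(meta)}${META_CLOSE}`;
}

// Split a line into its human-readable text and its metadata.
function parseLine(line: string): ParsedLine {
  const start = line.lastIndexOf(META_OPEN);
  const end = line.lastIndexOf(META_CLOSE);
  if (start === -1 || end < start) {
    return { text: line, meta: null };
  }
  const json = line.slice(start + META_OPEN.length, end);
  try {
    const meta = JSON.parse(json) as Record<string, unknown>;
    const text = (line.slice(0, start) + line.slice(end + META_CLOSE.length)).trim();
    return { text, meta };
  } catch {
    // Malformed metadata: treat the whole line as plain text.
    return { text: line, meta: null };
  }
}

// A description containing '#' survives the round trip untouched.
const line = emitMeta('ok 1 - handles issue #42', { time: 12, tags: ['unit'] });
const parsed = parseLine(line);
```

Because the delimiters are located by position rather than by a regex over the description, characters such as `#` in test names need no escaping in this sketch.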

### Implementation ✅ COMPLETED
- ✅ Phase 1: Create protocol v2 implementation in ts_tapbundle_protocol
- ✅ Phase 2: Replace all v1 code in both tstest and tapbundle with v2
- ✅ Phase 3: Delete all v1 parsing and generation code

#### ts_tapbundle_protocol Directory
The protocol v2 implementation lives in the `ts_tapbundle_protocol` directory as isomorphic TypeScript code:
- **Isomorphic Design**: All code must work in both browser and Node.js environments
- **No Node.js Imports**: No Node.js-specific modules allowed (no fs, path, child_process, etc.)
- **Protocol Classes**: Contains classes implementing all sides of the protocol:
  - ✅ `ProtocolEmitter`: For generating protocol v2 messages (used by tapbundle)
  - ✅ `ProtocolParser`: For parsing protocol v2 messages (used by tstest)
  - ✅ `ProtocolMessage`: Base classes for different message types
  - ✅ `ProtocolTypes`: TypeScript interfaces and types for protocol structures
- **Pure TypeScript**: Only browser-compatible APIs and pure TypeScript/JavaScript code
- **Build Integration**:
  - Compiled by `pnpm build` (via tsbuild) to `dist_ts_tapbundle_protocol/`
  - Build order defined in tspublish.json files
  - Imported by ts and ts_tapbundle modules from the compiled dist directory

See `readme.protocol.md` for detailed specification.

## Test Configuration System (NEW)

### Global Test Configuration via 00init.ts
- **Discovery**: Check for `test/00init.ts` before running tests
- **Execution**: Import and execute before any test files if found
- **Purpose**: Define project-wide default test settings
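
The discovery step could be sketched as a pure function that separates `00init.ts` from the regular test files so it can be loaded first. The function name `planTestRun` and the `RunPlan` shape are hypothetical; how tstest actually loads the file is not shown here.

```typescript
// Hypothetical helper: the name and return shape are assumptions for illustration.
const INIT_FILE = 'test/00init.ts';

interface RunPlan {
  initFile: string | null;  // loaded before everything else when present
  testFiles: string[];      // regular test files, in their original order
}

function planTestRun(discoveredFiles: string[]): RunPlan {
  // Normalize Windows separators so the comparison works on any platform.
  const normalized = discoveredFiles.map((f) => f.replace(/\\/g, '/'));
  const initFile =
    normalized.find((f) => f === INIT_FILE || f.endsWith('/' + INIT_FILE)) ?? null;
  return {
    initFile,
    testFiles: normalized.filter((f) => f !== initFile),
  };
}

const plan = planTestRun(['test/test.basic.ts', 'test/00init.ts', 'test/test.other.ts']);
```

Keeping the init file out of `testFiles` means it is imported once for its settings and is never reported as a test file itself.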

### tap.settings() API
```typescript
interface TapSettings {
  // Timing
  timeout?: number;              // Default timeout for all tests (ms)
  slowThreshold?: number;        // Mark tests as slow if they exceed this (ms)

  // Execution Control
  bail?: boolean;                // Stop on first test failure
  retries?: number;              // Number of retries for failed tests
  retryDelay?: number;           // Delay between retries (ms)

  // Output Control
  suppressConsole?: boolean;     // Suppress console output in passing tests
  verboseErrors?: boolean;       // Show full stack traces
  showTestDuration?: boolean;    // Show duration for each test

  // Parallel Execution
  maxConcurrency?: number;       // Max parallel tests (for .para files)
  isolateTests?: boolean;        // Run each test in fresh context

  // Lifecycle Hooks
  beforeAll?: () => Promise<void> | void;
  afterAll?: () => Promise<void> | void;
  beforeEach?: (testName: string) => Promise<void> | void;
  afterEach?: (testName: string, passed: boolean) => Promise<void> | void;

  // Environment
  env?: Record<string, string>;  // Additional environment variables

  // Features
  enableSnapshots?: boolean;     // Enable snapshot testing
  snapshotDirectory?: string;    // Custom snapshot directory
  updateSnapshots?: boolean;     // Update snapshots instead of comparing
}
```

### Settings Inheritance
- Global (00init.ts) → File level → Test level
- More specific settings override less specific ones
- Arrays/objects are merged, primitives are replaced
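
A sketch of the merge rule, using a reduced `Settings` type as a stand-in for `TapSettings`. The function names are hypothetical, and treating "merged" arrays as concatenation is an assumption: the plan does not say how arrays combine.

```typescript
// Simplified stand-in for TapSettings; only a few fields are shown.
interface Settings {
  timeout?: number;
  retries?: number;
  env?: Record<string, string>;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Plain objects merge recursively, arrays concatenate (assumption),
// and everything else (primitives, functions) is replaced by the override.
function mergeValue(base: unknown, override: unknown): unknown {
  if (override === undefined) return base;
  if (Array.isArray(base) && Array.isArray(override)) return [...base, ...override];
  if (isPlainObject(base) && isPlainObject(override)) {
    const result: Record<string, unknown> = { ...base };
    for (const key of Object.keys(override)) {
      result[key] = mergeValue(base[key], override[key]);
    }
    return result;
  }
  return override;
}

// Later layers are more specific and win: global -> file -> test.
function resolveSettings(...layers: Settings[]): Settings {
  return layers.reduce<Settings>((acc, layer) => mergeValue(acc, layer) as Settings, {});
}

const globalSettings: Settings = { timeout: 5000, retries: 1, env: { CI: 'true' } };
const fileSettings: Settings = { timeout: 10000, env: { DB: 'mock' } };
const testSettings: Settings = { retries: 3 };

const effective = resolveSettings(globalSettings, fileSettings, testSettings);
// effective: { timeout: 10000, retries: 3, env: { CI: 'true', DB: 'mock' } }
```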

### Implementation Phases
1. **Core Infrastructure**: Settings storage and merge logic
2. **Discovery**: 00init.ts loading mechanism
3. **Application**: Apply settings to test execution
4. **Advanced**: Parallel execution and snapshot configuration

## 1. Enhanced Communication Between tapbundle and tstest ✅ COMPLETED

### 1.1 Real-time Test Progress API ✅ COMPLETED
- ✅ Create a bidirectional communication channel between tapbundle and tstest
- ✅ Emit events for test lifecycle stages (start, progress, completion)
- ✅ Allow tstest to subscribe to tapbundle events for better progress reporting
- ✅ Implement a standardized message format for test metadata
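
The lifecycle events can be pictured as a small typed event bus. This is an in-process, one-directional sketch only; the event names, payload fields, and the `TestEventBus` class are assumptions, and the real message format is defined by Protocol V2.

```typescript
// Hypothetical event shapes for illustration.
type TestEvent =
  | { type: 'test:started'; name: string }
  | { type: 'test:progress'; name: string; percent: number }
  | { type: 'test:completed'; name: string; passed: boolean; durationMs: number };

type Listener = (event: TestEvent) => void;

class TestEventBus {
  private listeners: Listener[] = [];

  // Returns an unsubscribe function so subscribers can detach cleanly.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emit(event: TestEvent): void {
    // Iterate over a copy so listeners may unsubscribe while handling an event.
    for (const listener of [...this.listeners]) listener(event);
  }
}

// Usage: a subscriber records events until it unsubscribes.
const bus = new TestEventBus();
const seen: string[] = [];
const unsubscribe = bus.subscribe((e) => seen.push(`${e.type}:${e.name}`));

bus.emit({ type: 'test:started', name: 'adds numbers' });
bus.emit({ type: 'test:completed', name: 'adds numbers', passed: true, durationMs: 4 });
unsubscribe();
bus.emit({ type: 'test:started', name: 'ignored' });
```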

### 1.2 Rich Error Reporting ✅ COMPLETED
- ✅ Pass structured error objects from tapbundle to tstest
- ✅ Include stack traces, code snippets, and contextual information
- ✅ Support for error categorization (assertion failures, timeouts, uncaught exceptions)
- ✅ Visual diff output for failed assertions
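
One way to picture a structured error object with a category attached. The `StructuredError` shape and the heuristics in `toStructuredError` (checking `name`, `expected`/`actual`, and a timeout-like message) are assumptions for illustration, not the format tapbundle actually sends.

```typescript
type ErrorCategory = 'assertion' | 'timeout' | 'uncaught';

// Hypothetical wire shape for an error passed from tapbundle to tstest.
interface StructuredError {
  category: ErrorCategory;
  message: string;
  stack?: string;
  expected?: unknown;
  actual?: unknown;
}

// Heuristic categorization; the properties inspected here are assumptions.
function toStructuredError(err: unknown): StructuredError {
  const e = err instanceof Error ? err : new Error(String(err));
  const extra = e as Error & { expected?: unknown; actual?: unknown };

  let category: ErrorCategory = 'uncaught';
  if (e.name === 'AssertionError' || 'expected' in extra || 'actual' in extra) {
    category = 'assertion';
  } else if (/timed? ?out/i.test(e.message)) {
    category = 'timeout';
  }

  return {
    category,
    message: e.message,
    stack: e.stack,
    expected: extra.expected,
    actual: extra.actual,
  };
}

const timeoutError = toStructuredError(new Error('Test timed out after 500ms'));
const assertionError = toStructuredError(
  Object.assign(new Error('values differ'), { expected: 1, actual: 2 }),
);
const thrownString = toStructuredError('boom');
```

Carrying `expected` and `actual` as data rather than as preformatted text leaves the diff rendering to the receiving side.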

## 2. Enhanced toolsArg Functionality

### 2.3 Test Data and Context Sharing (Partial)
```typescript
tap.test('data-driven test', async (toolsArg) => {
  // Parameterized test data (not yet implemented)
  // TestInput, processData, and expected are placeholders supplied by the test author
  const testData = toolsArg.data<TestInput>();
  expect(processData(testData)).toEqual(expected);
});
```

## 3. Nested Tests and Test Suites

### 3.2 Hierarchical Test Organization (Not yet implemented)
- Support for multiple levels of nesting
- Inherited context and configuration from parent suites
- Aggregated reporting for test suites
- Suite-level lifecycle hooks

## 4. Advanced Test Features

### 4.1 Snapshot Testing ✅ (Basic implementation complete)

### 4.2 Performance Benchmarking
```typescript
tap.test('performance test', async (toolsArg) => {
  const benchmark = toolsArg.benchmark();

  // Run operation
  await expensiveOperation();

  // Assert performance constraints
  benchmark.expect({
    maxDuration: 1000,
    maxMemory: '100MB'
  });
});
```
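
Behind a call like `benchmark.expect()`, the constraint check might look like the sketch below. The helper names, the `BenchmarkSample` shape, and the use of binary units (1 MB = 1024 × 1024 bytes) are assumptions; the benchmarking API itself is not implemented yet.

```typescript
interface BenchmarkLimits {
  maxDuration?: number;  // milliseconds
  maxMemory?: string;    // e.g. '100MB'
}

interface BenchmarkSample {
  durationMs: number;
  memoryBytes: number;
}

// Binary units are an assumption; the plan does not define what 'MB' means.
const UNITS: Record<string, number> = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

function parseSize(size: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)$/i.exec(size.trim());
  if (!match) throw new Error(`Invalid size: ${size}`);
  return Number(match[1]) * UNITS[match[2].toUpperCase()];
}

// Returns one message per violated limit; an empty array means all limits hold.
function checkLimits(sample: BenchmarkSample, limits: BenchmarkLimits): string[] {
  const violations: string[] = [];
  if (limits.maxDuration !== undefined && sample.durationMs > limits.maxDuration) {
    violations.push(`duration ${sample.durationMs}ms exceeds ${limits.maxDuration}ms`);
  }
  if (limits.maxMemory !== undefined && sample.memoryBytes > parseSize(limits.maxMemory)) {
    violations.push(`memory ${sample.memoryBytes}B exceeds ${limits.maxMemory}`);
  }
  return violations;
}

const violations = checkLimits(
  { durationMs: 1200, memoryBytes: 50 * 1024 * 1024 },
  { maxDuration: 1000, maxMemory: '100MB' },
);
```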


## 5. Test Execution Improvements


### 5.2 Watch Mode ✅ COMPLETED
- Automatically re-run tests on file changes
- Debounced file change detection (300ms)
- Clear console output between runs
- Shows which files triggered re-runs
- Graceful exit with Ctrl+C
- `--watch-ignore` option for excluding patterns
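
The debounce behaviour can be sketched as a batcher that collects changed paths and flushes them once no change has arrived for 300ms. `createChangeBatcher` is a hypothetical helper, not the function tstest uses; the scheduler is injectable so the logic can be exercised without real timers.

```typescript
type Schedule = (fn: () => void, ms: number) => unknown;
type Cancel = (handle: unknown) => void;

// Collects changed paths and flushes them after a quiet period of `delayMs`.
function createChangeBatcher(
  onFlush: (files: string[]) => void,
  delayMs = 300,
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
  cancel: Cancel = (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
): (file: string) => void {
  const pending = new Set<string>();
  let timer: unknown = null;

  return (file: string): void => {
    pending.add(file);                   // duplicates collapse into one entry
    if (timer !== null) cancel(timer);   // restart the quiet period
    timer = schedule(() => {
      timer = null;
      const files = Array.from(pending);
      pending.clear();
      onFlush(files);                    // these files triggered the re-run
    }, delayMs);
  };
}

// Usage with a fake scheduler: three rapid changes produce a single flush.
const tasks: Array<() => void> = [];
const flushed: string[][] = [];
const notify = createChangeBatcher(
  (files) => { flushed.push(files); },
  300,
  (fn) => { tasks.push(fn); return tasks.length; },
  () => { tasks.length = 0; },
);

notify('ts/index.ts');
notify('test/test.basic.ts');
notify('ts/index.ts');
tasks[0]();  // simulate the quiet period elapsing
```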

### 5.3 Advanced Test Filtering (Partial) ⚠️
```bash
# Exclude tests by pattern (not yet implemented)
tstest --exclude "**/slow/**"

# Run only failed tests from last run (not yet implemented)
tstest --failed

# Run tests modified in git (not yet implemented)
tstest --changed
```

## 6. Reporting and Analytics

### 6.1 Custom Reporters
- Plugin architecture for custom reporters
- Built-in reporters: JSON, JUnit, HTML, Markdown
- Real-time streaming reporters
- Aggregated test metrics and trends
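
A possible shape for the reporter plugin contract, with a JSON reporter as the simplest built-in. The `Reporter` interface, its hook names, and `runReporters` are assumptions sketched for discussion; no such API exists yet.

```typescript
interface TestResult {
  name: string;
  passed: boolean;
  durationMs: number;
}

// Hypothetical plugin contract; hook names are assumptions.
interface Reporter {
  onTestResult?(result: TestResult): void;        // optional streaming hook
  onRunComplete(results: TestResult[]): string;   // final report as text
}

class JsonReporter implements Reporter {
  onRunComplete(results: TestResult[]): string {
    const failed = results.filter((r) => !r.passed).length;
    return JSON.stringify(
      { total: results.length, passed: results.length - failed, failed, results },
      null,
      2,
    );
  }
}

// Feeds every result to every reporter, then collects the final reports.
function runReporters(reporters: Reporter[], results: TestResult[]): string[] {
  for (const result of results) {
    for (const reporter of reporters) reporter.onTestResult?.(result);
  }
  return reporters.map((reporter) => reporter.onRunComplete(results));
}

const reports = runReporters(
  [new JsonReporter()],
  [
    { name: 'adds numbers', passed: true, durationMs: 3 },
    { name: 'parses input', passed: false, durationMs: 11 },
  ],
);
```

Making `onTestResult` optional lets a streaming reporter react to each result while a summary-only reporter implements just `onRunComplete`.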

### 6.2 Coverage Integration
- Built-in code coverage collection
- Coverage thresholds and enforcement
- Coverage trending over time
- Integration with CI/CD pipelines
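
Threshold enforcement could reduce to comparing a coverage summary against per-metric minimums. The metric names and the `checkCoverage` helper are assumptions; the plan does not name a coverage tool or summary format.

```typescript
type Metric = 'lines' | 'branches' | 'functions' | 'statements';
type CoverageSummary = Record<Metric, number>;      // percentages, 0-100
type Thresholds = Partial<Record<Metric, number>>;  // minimum percentages

// Returns one message per metric that falls below its threshold.
function checkCoverage(summary: CoverageSummary, thresholds: Thresholds): string[] {
  const failures: string[] = [];
  for (const metric of Object.keys(thresholds) as Metric[]) {
    const required = thresholds[metric];
    if (required !== undefined && summary[metric] < required) {
      failures.push(`${metric}: ${summary[metric]}% is below the required ${required}%`);
    }
  }
  return failures;
}

const coverageFailures = checkCoverage(
  { lines: 91.2, branches: 74.5, functions: 88, statements: 90 },
  { lines: 90, branches: 80 },
);
```

A CI job could fail the run whenever the returned array is non-empty.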

### 6.3 Test Analytics Dashboard
- Web-based dashboard for test results
- Historical test performance data
- Flaky test detection
- Test impact analysis
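
Flaky test detection could start from a simple rule over historical results: a test that both passed and failed across recorded runs is a candidate. The `RunRecord` shape and `findFlakyTests` are hypothetical, and the rule assumes the runs being compared cover the same code.

```typescript
interface RunRecord {
  test: string;
  passed: boolean;
}

// A test is treated as flaky when it both passed and failed in the history.
function findFlakyTests(history: RunRecord[]): string[] {
  const outcomes = new Map<string, { passes: number; failures: number }>();
  for (const record of history) {
    const entry = outcomes.get(record.test) ?? { passes: 0, failures: 0 };
    if (record.passed) entry.passes++;
    else entry.failures++;
    outcomes.set(record.test, entry);
  }
  return Array.from(outcomes.entries())
    .filter(([, outcome]) => outcome.passes > 0 && outcome.failures > 0)
    .map(([name]) => name)
    .sort();
}

const flaky = findFlakyTests([
  { test: 'uploads file', passed: true },
  { test: 'uploads file', passed: false },
  { test: 'adds numbers', passed: true },
  { test: 'adds numbers', passed: true },
  { test: 'parses input', passed: false },
]);
```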

## 7. Developer Experience

### 7.1 Better Error Messages
- Clear, actionable error messages
- Suggestions for common issues
- Links to documentation
- Code examples in error output

## Implementation Phases

### Phase 1: Improved Internal Protocol (Priority: Critical) ✅ COMPLETED
1. ✅ Create ts_tapbundle_protocol directory with isomorphic protocol v2 implementation
   - ✅ Implement ProtocolEmitter class for message generation
   - ✅ Implement ProtocolParser class for message parsing
   - ✅ Define ProtocolMessage types and interfaces
   - ✅ Ensure all code is browser and Node.js compatible
   - ✅ Add tspublish.json to configure build order
2. ✅ Update build configuration to compile ts_tapbundle_protocol first
3. ✅ Replace TAP parser in tstest with Protocol V2 parser importing from dist_ts_tapbundle_protocol
4. ✅ Replace TAP generation in tapbundle with Protocol V2 emitter importing from dist_ts_tapbundle_protocol
5. ✅ Delete all v1 TAP parsing code from tstest
6. ✅ Delete all v1 TAP generation code from tapbundle
7. ✅ Test with real-world test suites containing special characters

### Phase 2: Test Configuration System (Priority: High) ✅ COMPLETED
1. ✅ Implement tap.settings() API with TypeScript interfaces
2. ✅ Add 00init.ts discovery and loading mechanism
3. ✅ Implement settings inheritance and merge logic
4. ✅ Apply settings to test execution (timeouts, retries, etc.)

### Phase 3: Enhanced Communication (Priority: High) ✅ COMPLETED
1. ✅ Build on Protocol V2 for richer communication
2. ✅ Implement real-time test progress API
3. ✅ Add structured error reporting with diffs and traces

### Phase 4: Developer Experience (Priority: Medium) ⚠️ PARTIALLY COMPLETED
1. ✅ Add watch mode
2. Implement custom reporters
3. Complete advanced test filtering options
4. Add performance benchmarking API

### Phase 5: Analytics and Performance (Priority: Low) ❌ NOT STARTED
1. Build test analytics dashboard
2. Implement coverage integration
3. Create trend analysis tools
4. Add test impact analysis

## Technical Considerations

### API Design Principles
- Clean, modern API design without legacy constraints
- Progressive enhancement approach
- Well-documented features and APIs
- Clear, simple interfaces

### Performance Goals
- Minimal overhead for test execution
- Efficient parallel execution
- Fast test discovery
- Optimized browser test bundling

### Integration Points
- Clean interfaces between tstest and tapbundle
- Extensible plugin architecture
- Standard test result format
- Compatible with existing CI/CD tools

## Summary of Remaining Work

### ✅ Completed
- **Protocol V2**: Full implementation with Unicode delimiters, structured metadata, and special character handling
- **Test Configuration System**: tap.settings() API, 00init.ts discovery, settings inheritance, lifecycle hooks
- **Enhanced Communication**: Event-based test lifecycle reporting, visual diff output for assertion failures, real-time test progress API
- **Rich Error Reporting**: Stack traces, error metadata, and visual diffs through protocol
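- **Tags Filtering**: `--tags` option for running specific tagged tests
- **Watch Mode**: Automatic re-runs on file changes with 300ms debouncing and `--watch-ignore` patterns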

### ✅ Existing Features (Not in Plan)
- **Timeout Support**: `--timeout` option and per-test timeouts
- **Test Retries**: `tap.retry()` for flaky test handling
- **Parallel Tests**: `.testParallel()` for concurrent execution
- **Snapshot Testing**: Basic implementation with `toMatchSnapshot()`
- **Test Lifecycle**: `describe()` blocks with `beforeEach`/`afterEach`
- **Skip Tests**: `tap.skip.test()` for skipping individual tests
- **Log Files**: `--logfile` option saves output to `.nogit/testlogs/`
- **Test Range**: `--startFrom` and `--stopAt` for partial runs

### ⚠️ Partially Completed
- **Advanced Test Filtering**: Have `--tags` but missing `--exclude`, `--failed`, `--changed`

### ❌ Not Started

#### High Priority
- None remaining; Phases 1-3 are complete

#### Medium Priority
1. **Developer Experience**
   - Custom reporters (JSON, JUnit, HTML, Markdown)
   - Performance benchmarking API
   - Better error messages with suggestions

2. **Enhanced toolsArg**
   - Test data injection
   - Context sharing between tests
   - Parameterized tests

3. **Test Organization**
   - Hierarchical test suites
   - Nested describe blocks
   - Suite-level lifecycle hooks

#### Low Priority
4. **Analytics and Performance**
   - Test analytics dashboard
   - Code coverage integration
   - Trend analysis
   - Flaky test detection

### Recently Fixed Issues ✅
- **tap.todo()**: Now fully implemented with test object creation
- **tap.skip.test()**: Now creates test objects and maintains accurate test count
- **tap.only.test()**: Works correctly - when .only tests exist, only those run

### Remaining Minor Issues
- **Protocol Output**: Some protocol messages still appear in console output

### Next Recommended Steps
1. Implement Custom Reporters - important for CI/CD integration
2. Implement performance benchmarking API
3. Add better error messages with suggestions