tsdoc/readme.plan.md

TSDocs Context Optimization Plan

Problem Statement

For large TypeScript projects, the context generated for AI-based documentation creation becomes too large, potentially exceeding even o4-mini's 200K token limit. This affects the ability to effectively generate:

  • Project documentation (README.md)
  • API descriptions and keywords
  • Commit messages and changelogs

The current implementation simply includes all TypeScript files and key project files, but it lacks intelligent selection, prioritization, and content-reduction mechanisms.

Analysis of Approaches

1. Smart Content Selection

Description: Intelligently select only files that are necessary for the specific task being performed, using heuristic rules.

Advantages:

  • Simple to implement
  • Predictable behavior
  • Can be fine-tuned for different operations

Disadvantages:

  • Requires manual tuning of rules
  • May miss important context in complex projects
  • Static approach lacks adaptability

Implementation Complexity: Medium

2. File Prioritization

Description: Rank files by relevance using git history, file size, import/export analysis, and relationship to the current task.

Advantages:

  • Adaptively includes the most relevant files first
  • Maintains context for frequently changed or central files
  • Can leverage git history for additional signals

Disadvantages:

  • Complexity in determining accurate relevance scores
  • Requires analyzing project structure
  • May require scanning imports/exports for dependency analysis

Implementation Complexity: High

3. Chunking Strategy

Description: Process the project in logical segments, generating intermediate results that are then combined to create the final output.

Advantages:

  • Can handle projects of any size
  • Focused context for each specific part
  • May improve quality by focusing on specific areas deeply

Disadvantages:

  • Complex orchestration of multiple AI calls
  • Challenge in maintaining consistency across chunks
  • May increase time and cost for processing

Implementation Complexity: High

4. Dynamic Context Trimming

Description: Automatically reduce context by removing non-essential code while preserving structure. Techniques include:

  • Removing implementation details but keeping interfaces and type definitions
  • Truncating large functions while keeping signatures
  • Removing comments and whitespace (except JSDoc)
  • Keeping only imports/exports for context files

Advantages:

  • Preserves full project structure
  • Flexible token usage based on importance
  • Good balance between completeness and token efficiency

Disadvantages:

  • Potential to remove important implementation details
  • Risk of missing context needed for specific tasks
  • Complex rules for what to trim vs keep

Implementation Complexity: Medium
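The body-stripping technique above can be sketched as follows. This is a minimal illustration, not the planned implementation: the function name and stub text are placeholders, and the naive regex/brace matching does not handle braces inside strings or comments — a production trimmer would use the TypeScript compiler API instead.

```typescript
// Illustrative sketch: remove function bodies, keep signatures.
// Interfaces, type definitions, and imports are untouched because
// the pattern only matches `function` declarations.
function stripFunctionBodies(source: string): string {
  const headerRe =
    /(^|\n)((?:export\s+)?(?:async\s+)?function\s+\w+\s*\([^)]*\)(?:\s*:\s*[\w<>\[\], .|]+)?\s*)\{/g;
  let result = "";
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = headerRe.exec(source)) !== null) {
    const braceStart = m.index + m[0].length - 1; // position of the '{'
    // Naive brace matching to find the end of the body.
    let depth = 1;
    let j = braceStart + 1;
    while (j < source.length && depth > 0) {
      if (source[j] === "{") depth++;
      else if (source[j] === "}") depth--;
      j++;
    }
    result += source.slice(last, braceStart) + "{ /* trimmed */ }";
    last = j;
    headerRe.lastIndex = j; // resume scanning after the stripped body
  }
  return result + source.slice(last);
}
```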

5. Embeddings-Based Retrieval

Description: Create vector embeddings of project files and retrieve only the most relevant ones for a specific task using semantic similarity.

Advantages:

  • Highly adaptive to different types of requests
  • Leverages semantic understanding of content
  • Can scale to extremely large projects

Disadvantages:

  • Requires setting up and managing embeddings database
  • Added complexity of running vector similarity searches
  • Higher resource requirements for maintaining embeddings

Implementation Complexity: Very High

6. Task-Specific Contexts

Description: Create separate optimized contexts for different tasks (readme, commit messages, etc.) with distinct file selection and processing strategies.

Advantages:

  • Highly optimized for each specific task
  • Efficient token usage for each operation
  • Improved quality through task-focused contexts

Disadvantages:

  • Maintenance of multiple context building strategies
  • More complex configuration
  • Potential duplication in implementation

Implementation Complexity: Medium

7. Recursive Summarization

Description: Summarize larger files first, then combine those summaries with the smaller files (included in full) to form the final context.

Advantages:

  • Can handle arbitrary project sizes
  • Preserves essential information from all files
  • Balanced approach to token usage

Disadvantages:

  • Quality loss from summarization
  • Increased processing time from multiple AI calls
  • Complex orchestration logic

Implementation Complexity: High
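The hybrid full/summary split can be sketched with a simple size threshold. The threshold value and type names here are assumptions for illustration, not part of the existing codebase; the actual summarization call is out of scope.

```typescript
// Files under the threshold go into the context verbatim; larger ones
// are routed to a separate summarization pass.
interface SourceFile {
  path: string;
  content: string;
}

const SUMMARY_THRESHOLD_CHARS = 2000; // assumption; roughly ~500 tokens

function planSummarization(files: SourceFile[]): {
  full: SourceFile[];
  toSummarize: SourceFile[];
} {
  const full: SourceFile[] = [];
  const toSummarize: SourceFile[] = [];
  for (const f of files) {
    (f.content.length <= SUMMARY_THRESHOLD_CHARS ? full : toSummarize).push(f);
  }
  return { full, toSummarize };
}
```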

Implementation Strategy

We propose a phased implementation approach, starting with the most impactful and straightforward approaches, then building toward more complex solutions as needed:

Phase 1: Foundation (1-2 weeks)

  1. Implement Dynamic Context Trimming

    • Create a ContextProcessor class that takes SmartFile objects and applies trimming rules
    • Implement configurable trimming rules (remove implementations, keep signatures)
    • Add a configuration option to control trimming aggressiveness
    • Support preserving JSDoc comments while removing other comments
  2. Enhance Token Monitoring

    • Track token usage per file to identify problematic files
    • Implement token budgeting to stay within limits
    • Add detailed token reporting for optimization
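Token budgeting might look like the following sketch. `BudgetedFile` is an illustrative stand-in for the project's `SmartFile`, and per-file token counts are assumed to be precomputed by the token monitoring described above.

```typescript
// Greedily include files (already ordered by priority) until the
// token budget is exhausted; files that would overflow are skipped.
interface BudgetedFile {
  path: string;
  tokens: number; // estimated token count for the file's contents
}

function fitToBudget(files: BudgetedFile[], maxTokens: number): BudgetedFile[] {
  const selected: BudgetedFile[] = [];
  let used = 0;
  for (const file of files) {
    if (used + file.tokens > maxTokens) continue; // would exceed the budget
    selected.push(file);
    used += file.tokens;
  }
  return selected;
}
```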

Phase 2: Smart Selection (2-3 weeks)

  1. Implement Task-Specific Contexts

    • Create specialized context builders for readme, commit messages, and descriptions
    • Customize file selection rules for each task
    • Add configuration options for task-specific settings
  2. Add Smart Content Selection

    • Implement heuristic rules for file importance
    • Create configuration for inclusion/exclusion patterns
    • Add ability to focus on specific directories or modules
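A minimal version of the inclusion/exclusion rules could look like this. Prefix matching is used only to keep the sketch dependency-free; a real implementation would likely support glob patterns.

```typescript
// Keep a path only if it matches an include prefix and no exclude prefix.
function selectFiles(
  paths: string[],
  include: string[],
  exclude: string[],
): string[] {
  return paths.filter(
    (p) =>
      include.some((prefix) => p.startsWith(prefix)) &&
      !exclude.some((prefix) => p.startsWith(prefix)),
  );
}
```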

Phase 3: Advanced Techniques (3-4 weeks)

  1. Implement File Prioritization

    • Add git history analysis to identify frequently changed files
    • Implement dependency analysis to identify central files
    • Create a scoring system for file relevance
  2. Add Optional Recursive Summarization

    • Implement file summarization for large files
    • Create a hybrid approach that mixes full files and summaries
    • Add configuration to control summarization thresholds
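A relevance score combining the two Phase 3 signals might be sketched as below. The weights and field names are assumptions; actual values would need tuning against real projects.

```typescript
// Combine git churn (commits touching the file) with dependency
// centrality (how many files import it) into a single score.
interface FileSignals {
  path: string;
  commitCount: number; // from git log analysis
  importedBy: number;  // from import/export dependency analysis
}

function relevanceScore(s: FileSignals): number {
  // Log-dampen both signals so one very hot file cannot dominate.
  return 0.6 * Math.log1p(s.commitCount) + 0.4 * Math.log1p(s.importedBy);
}

function rankByRelevance(files: FileSignals[]): FileSignals[] {
  return [...files].sort((a, b) => relevanceScore(b) - relevanceScore(a));
}
```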

Phase 4: Research-Based Approaches (Future Consideration)

  1. Research and Evaluate Embeddings-Based Retrieval

    • Prototype embeddings creation for TypeScript files
    • Evaluate performance and accuracy
    • Implement if benefits justify the complexity
  2. Explore Chunking Strategies

    • Research effective chunking approaches for documentation
    • Prototype and evaluate performance
    • Implement if benefits justify the complexity

Technical Design

Core Components

  1. ContextBuilder - Enhanced version of current ProjectContext

    ```typescript
    interface IContextBuilder {
      buildContext(): Promise<string>;
      getTokenCount(): number;
      setContextMode(mode: 'normal' | 'trimmed' | 'summarized'): void;
      setTokenBudget(maxTokens: number): void;
      setPrioritizationStrategy(strategy: IPrioritizationStrategy): void;
    }
    ```
  2. FileProcessor - Handles per-file processing and trimming

    ```typescript
    interface IFileProcessor {
      processFile(file: SmartFile): Promise<string>;
      setProcessingMode(mode: 'full' | 'trim' | 'summarize'): void;
      getTokenCount(): number;
    }
    ```
  3. PrioritizationStrategy - Ranks files by importance

    ```typescript
    interface IPrioritizationStrategy {
      rankFiles(files: SmartFile[], context: string): Promise<SmartFile[]>;
      setImportanceMetrics(metrics: IImportanceMetrics): void;
    }
    ```
  4. TaskContextFactory - Creates optimized contexts for specific tasks

    ```typescript
    interface ITaskContextFactory {
      createContextForReadme(projectDir: string): Promise<string>;
      createContextForCommit(projectDir: string, diff: string): Promise<string>;
      createContextForDescription(projectDir: string): Promise<string>;
    }
    ```

Configuration Options

The system will support configuration via a new section in npmextra.json:

```json
{
  "tsdoc": {
    "context": {
      "maxTokens": 190000,
      "defaultMode": "dynamic",
      "taskSpecificSettings": {
        "readme": {
          "mode": "full",
          "includePaths": ["src/", "lib/"],
          "excludePaths": ["test/", "examples/"]
        },
        "commit": {
          "mode": "trimmed",
          "focusOnChangedFiles": true
        },
        "description": {
          "mode": "summarized",
          "includePackageInfo": true
        }
      },
      "trimming": {
        "removeImplementations": true,
        "preserveInterfaces": true,
        "preserveTypeDefs": true,
        "preserveJSDoc": true,
        "maxFunctionLines": 5
      }
    }
  }
}
```
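Reading the trimming settings with sensible fallbacks could look like the sketch below. The loader name is hypothetical; only the npmextra.json shape above is taken from this plan.

```typescript
// Merge user-supplied trimming settings over the documented defaults.
interface TrimmingConfig {
  removeImplementations: boolean;
  preserveInterfaces: boolean;
  preserveTypeDefs: boolean;
  preserveJSDoc: boolean;
  maxFunctionLines: number;
}

const trimmingDefaults: TrimmingConfig = {
  removeImplementations: true,
  preserveInterfaces: true,
  preserveTypeDefs: true,
  preserveJSDoc: true,
  maxFunctionLines: 5,
};

function loadTrimmingConfig(raw: unknown): TrimmingConfig {
  // Walk the npmextra.json shape: tsdoc -> context -> trimming.
  const section = (raw as any)?.tsdoc?.context?.trimming ?? {};
  return { ...trimmingDefaults, ...section };
}
```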

Cost-Benefit Analysis

Cost Considerations

  1. Development costs

    • Initial implementation of foundational components (~30-40 hours)
    • Testing and validation across different project sizes (~10-15 hours)
    • Documentation and configuration examples (~5 hours)
  2. Operational costs

    • Potential increased processing time for context preparation
    • Additional API calls for summarization or embeddings approaches
    • Monitoring and maintenance of the system

Benefits

  1. Scalability

  • Support for projects of any size, including those that would otherwise exceed o4-mini's 200K token limit
    • Future-proof design that can adapt to different models and token limits
  2. Quality improvements

    • More focused contexts lead to better AI outputs
    • Task-specific optimization improves relevance
    • Consistent performance regardless of project size
  3. User experience

    • Predictable behavior for all project sizes
    • Transparent token usage reporting
    • Configuration options for different usage patterns

First Deliverable

For immediate improvements, we recommend implementing Dynamic Context Trimming and Task-Specific Contexts first, as these offer the best balance of impact and implementation complexity.

Implementation Plan for Dynamic Context Trimming

  1. Create a basic ContextTrimmer class that processes TypeScript files:

    • Remove function bodies but keep signatures
    • Preserve interface and type definitions
    • Keep imports and exports
    • Preserve JSDoc comments
  2. Integrate with the existing ProjectContext class:

    • Add a trimming mode option
    • Apply trimming during the context building process
    • Track and report token savings
  3. Modify the CLI to support trimming options:

    • Add a --trim flag to enable trimming
    • Add a --trim-level option for controlling aggressiveness
    • Show token usage with and without trimming

This approach could reduce token usage by 40-70% while preserving the essential structure of the codebase, making it suitable for large projects while maintaining high-quality AI outputs.