feat(docs): Update project metadata and documentation to reflect comprehensive AI-enhanced features and improved installation and usage instructions

2025-05-14 11:27:38 +00:00
parent 620737566f
commit ab273ea75c
21 changed files with 2305 additions and 258 deletions
--- a/readme.plan.md
+++ b/readme.plan.md
@@ -0,0 +1,314 @@
+# TSDocs Context Optimization Plan
+
+## Problem Statement
+
+For large TypeScript projects, the context generated for AI-based documentation can grow too large and may exceed even o4-mini's 200K token limit. This limits the ability to generate:
+
+- Project documentation (README.md)
+- API descriptions and keywords
+- Commit messages and changelogs
+
+The current implementation simply includes all TypeScript files and key project files; it has no mechanism for intelligent selection, prioritization, or content reduction.
+
+## Analysis of Approaches
+
+### 1. Smart Content Selection
+
+**Description:** Intelligently select only files that are necessary for the specific task being performed, using heuristic rules.
+
+**Advantages:**
+- Simple to implement
+- Predictable behavior
+- Can be fine-tuned for different operations
+
+**Disadvantages:**
+- Requires manual tuning of rules
+- May miss important context in complex projects
+- Static approach lacks adaptability
+
+**Implementation Complexity:** Medium
+
+### 2. File Prioritization
+
+**Description:** Rank files by relevance using git history, file size, import/export analysis, and relationship to the current task.
+
+**Advantages:**
+- Adaptively includes the most relevant files first
+- Maintains context for frequently changed or central files
+- Can leverage git history for additional signals
+
+**Disadvantages:**
+- Complexity in determining accurate relevance scores
+- Requires analyzing project structure
+- May require scanning imports/exports for dependency analysis
+
+**Implementation Complexity:** High
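
A minimal sketch of how the relevance signals above might be combined into one score. The `FileSignals` shape, the weights, and the function names are hypothetical and not part of the plan; a real implementation would have to derive the signals from git history and import analysis first.

```typescript
// Hypothetical per-file signals; none of these names come from the plan itself.
interface FileSignals {
  path: string;
  commitCount: number; // how often the file changed in git history
  importedBy: number; // how many other files import it
  sizeInTokens: number; // estimated token cost of including it
}

// Illustrative weights, chosen arbitrarily for this sketch.
const WEIGHTS = { churn: 0.4, centrality: 0.5, sizePenalty: 0.1 };

// Scale a value into [0, 1] relative to the largest value seen.
function normalize(value: number, max: number): number {
  return max > 0 ? value / max : 0;
}

function rankFiles(files: FileSignals[]): FileSignals[] {
  const maxCommits = Math.max(0, ...files.map((f) => f.commitCount));
  const maxImports = Math.max(0, ...files.map((f) => f.importedBy));
  const maxSize = Math.max(0, ...files.map((f) => f.sizeInTokens));

  // Frequently changed and widely imported files score higher; large files score lower.
  const score = (f: FileSignals): number =>
    WEIGHTS.churn * normalize(f.commitCount, maxCommits) +
    WEIGHTS.centrality * normalize(f.importedBy, maxImports) -
    WEIGHTS.sizePenalty * normalize(f.sizeInTokens, maxSize);

  // Copy before sorting so the caller's array is left untouched.
  return [...files].sort((a, b) => score(b) - score(a));
}
```

Keeping the weights in one object would make it easy to expose them as configuration later, if tuning per project turns out to be necessary.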
+
+### 3. Chunking Strategy
+
+**Description:** Process the project in logical segments, generating intermediate results that are then combined to create the final output.
+
+**Advantages:**
+- Can handle projects of any size
+- Focused context for each specific part
+- May improve quality by focusing on specific areas deeply
+
+**Disadvantages:**
+- Complex orchestration of multiple AI calls
+- Challenge in maintaining consistency across chunks
+- May increase time and cost for processing
+
+**Implementation Complexity:** High
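
One possible reading of "logical segments" is grouping by top-level directory and then splitting each group to fit a per-chunk token budget. The sketch below assumes that reading; `SourceFile`, `chunkFiles`, and the directory-based grouping are illustrative choices, not something the plan specifies.

```typescript
// Hypothetical input shape: a path plus a precomputed token estimate.
interface SourceFile {
  path: string;
  tokens: number;
}

// Use the first path segment as the "logical segment"; root-level files share '.'.
function topLevelDir(path: string): string {
  const slash = path.indexOf('/');
  return slash === -1 ? '.' : path.slice(0, slash);
}

function chunkFiles(files: SourceFile[], maxTokensPerChunk: number): SourceFile[][] {
  // Group files by top-level directory, preserving first-seen order.
  const byDir = new Map<string, SourceFile[]>();
  for (const file of files) {
    const dir = topLevelDir(file.path);
    const group = byDir.get(dir) ?? [];
    group.push(file);
    byDir.set(dir, group);
  }

  const chunks: SourceFile[][] = [];
  byDir.forEach((group) => {
    let current: SourceFile[] = [];
    let used = 0;
    for (const file of group) {
      // Start a new chunk when adding this file would exceed the budget.
      if (current.length > 0 && used + file.tokens > maxTokensPerChunk) {
        chunks.push(current);
        current = [];
        used = 0;
      }
      // A single file larger than the budget still gets a chunk of its own.
      current.push(file);
      used += file.tokens;
    }
    if (current.length > 0) chunks.push(current);
  });
  return chunks;
}
```

Each chunk would then be sent to the model separately, and the intermediate results combined in a final call; that orchestration is the costly part and is not shown here.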
+
+### 4. Dynamic Context Trimming
+
+**Description:** Automatically reduce context by removing non-essential code while preserving structure. Techniques include:
+- Removing implementation details but keeping interfaces and type definitions
+- Truncating large functions while keeping signatures
+- Removing comments and whitespace (except JSDoc)
+- Keeping only imports/exports for context files
+
+**Advantages:**
+- Preserves full project structure
+- Flexible token usage based on importance
+- Good balance between completeness and token efficiency
+
+**Disadvantages:**
+- Potential to remove important implementation details
+- Risk of missing context needed for specific tasks
+- Complex rules for what to trim vs keep
+
+**Implementation Complexity:** Medium
+
+### 5. Embeddings-Based Retrieval
+
+**Description:** Create vector embeddings of project files and retrieve only the most relevant ones for a specific task using semantic similarity.
+
+**Advantages:**
+- Highly adaptive to different types of requests
+- Leverages semantic understanding of content
+- Can scale to extremely large projects
+
+**Disadvantages:**
+- Requires setting up and managing embeddings database
+- Added complexity of running vector similarity searches
+- Higher resource requirements for maintaining embeddings
+
+**Implementation Complexity:** Very High
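
The retrieval step itself is small once vectors exist. The sketch below assumes embeddings have already been computed elsewhere and are held in memory; `EmbeddedFile` and `retrieveTopK` are hypothetical names, and a real setup would more likely query a vector store than sort an array.

```typescript
// Hypothetical shape: a file path plus its precomputed embedding vector.
interface EmbeddedFile {
  path: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat zero vectors as having no similarity to anything.
  return denominator === 0 ? 0 : dot / denominator;
}

// Return the k files whose embeddings are most similar to the query embedding.
function retrieveTopK(query: number[], files: EmbeddedFile[], k: number): EmbeddedFile[] {
  return files
    .map((file) => ({ file, score: cosineSimilarity(query, file.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.file);
}
```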
+
+### 6. Task-Specific Contexts
+
+**Description:** Create separate optimized contexts for different tasks (readme, commit messages, etc.) with distinct file selection and processing strategies.
+
+**Advantages:**
+- Highly optimized for each specific task
+- Efficient token usage for each operation
+- Improved quality through task-focused contexts
+
+**Disadvantages:**
+- Maintenance of multiple context building strategies
+- More complex configuration
+- Potential duplication in implementation
+
+**Implementation Complexity:** Medium
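
A sketch of what per-task file selection could look like. The `readme` rules mirror the `includePaths` and `excludePaths` values shown in the configuration example later in this plan; the `description` rules, the prefix matching, and all function names are assumptions made for illustration.

```typescript
type Task = 'readme' | 'commit' | 'description';

interface PathRules {
  includePaths: string[]; // path prefixes to include
  excludePaths: string[]; // path prefixes to exclude
}

// 'commit' has no static rules in this sketch: it focuses on changed files instead.
const PATH_RULES: Record<Exclude<Task, 'commit'>, PathRules> = {
  readme: { includePaths: ['src/', 'lib/'], excludePaths: ['test/', 'examples/'] },
  // Hypothetical: a description probably needs only package metadata and the entry point.
  description: { includePaths: ['package.json', 'src/index.ts'], excludePaths: [] },
};

function selectPaths(task: Task, allPaths: string[], changedPaths: string[] = []): string[] {
  if (task === 'commit') {
    // For commit messages, keep only files touched by the current change.
    const changed = new Set(changedPaths);
    return allPaths.filter((p) => changed.has(p));
  }
  const rules = PATH_RULES[task];
  return allPaths.filter(
    (p) =>
      rules.includePaths.some((prefix) => p.startsWith(prefix)) &&
      !rules.excludePaths.some((prefix) => p.startsWith(prefix)),
  );
}
```

Simple prefix matching keeps the sketch dependency-free; glob patterns would be a natural replacement if the configuration ends up supporting them.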
+
+### 7. Recursive Summarization
+
+**Description:** Summarize larger files first, then include these summaries in the final context along with smaller files included in full.
+
+**Advantages:**
+- Can handle arbitrary project sizes
+- Preserves essential information from all files
+- Balanced approach to token usage
+
+**Disadvantages:**
+- Quality loss from summarization
+- Increased processing time from multiple AI calls
+- Complex orchestration logic
+
+**Implementation Complexity:** High
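
The decision of which files to summarize can be separated from the summarization call itself. The sketch below covers only that thresholding step, under the assumption that token counts are already known; `SizedFile`, `PlannedFile`, and `planProcessing` are hypothetical names, and the AI call that would produce each summary is deliberately left out.

```typescript
// Processing modes for this sketch: include the file as-is, or replace it with a summary.
type ProcessingMode = 'full' | 'summarize';

// Hypothetical input: a path plus a precomputed token estimate.
interface SizedFile {
  path: string;
  tokens: number;
}

interface PlannedFile {
  path: string;
  mode: ProcessingMode;
}

// Files above the threshold are marked for summarization; the rest are included in full.
function planProcessing(files: SizedFile[], summarizeAboveTokens: number): PlannedFile[] {
  return files.map(
    (file): PlannedFile => ({
      path: file.path,
      mode: file.tokens > summarizeAboveTokens ? 'summarize' : 'full',
    }),
  );
}
```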
+
+## Implementation Strategy
+
+We propose a phased implementation, starting with the most impactful and straightforward approaches and building toward more complex solutions as needed:
+
+### Phase 1: Foundation (1-2 weeks)
+
+1. **Implement Dynamic Context Trimming**
+   - Create a `ContextProcessor` class that takes SmartFile objects and applies trimming rules
+   - Implement configurable trimming rules (remove implementations, keep signatures)
+   - Add a configuration option to control trimming aggressiveness
+   - Support preserving JSDoc comments while removing other comments
+
+2. **Enhance Token Monitoring**
+   - Track token usage per file to identify problematic files
+   - Implement token budgeting to stay within limits
+   - Add detailed token reporting for optimization
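
Per-file tracking and budgeting could be sketched as follows. The four-characters-per-token estimate is a rough assumption standing in for a real tokenizer, the input is assumed to be ordered by priority already, and `ContextFile`, `BudgetReport`, and `applyTokenBudget` are hypothetical names.

```typescript
interface ContextFile {
  path: string;
  content: string;
}

interface FileUsage {
  path: string;
  tokens: number;
}

interface BudgetReport {
  included: FileUsage[];
  skipped: FileUsage[];
  totalTokens: number;
}

// Rough heuristic (an assumption): about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily include files in the given order until the budget is used up.
// Files that do not fit are recorded rather than silently dropped.
function applyTokenBudget(files: ContextFile[], maxTokens: number): BudgetReport {
  const report: BudgetReport = { included: [], skipped: [], totalTokens: 0 };
  for (const file of files) {
    const tokens = estimateTokens(file.content);
    if (report.totalTokens + tokens <= maxTokens) {
      report.included.push({ path: file.path, tokens });
      report.totalTokens += tokens;
    } else {
      report.skipped.push({ path: file.path, tokens });
    }
  }
  return report;
}
```

The `skipped` list doubles as the "problematic files" report: anything that lands there is a candidate for trimming or summarization.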
+
+### Phase 2: Smart Selection (2-3 weeks)
+
+3. **Implement Task-Specific Contexts**
+   - Create specialized context builders for readme, commit messages, and descriptions
+   - Customize file selection rules for each task
+   - Add configuration options for task-specific settings
+
+4. **Add Smart Content Selection**
+   - Implement heuristic rules for file importance
+   - Create configuration for inclusion/exclusion patterns
+   - Add ability to focus on specific directories or modules
+
+### Phase 3: Advanced Techniques (3-4 weeks)
+
+5. **Implement File Prioritization**
+   - Add git history analysis to identify frequently changed files
+   - Implement dependency analysis to identify central files
+   - Create a scoring system for file relevance
+
+6. **Add Optional Recursive Summarization**
+   - Implement file summarization for large files
+   - Create a hybrid approach that mixes full files and summaries
+   - Add configuration to control summarization thresholds
+
+### Phase 4: Research-Based Approaches (Future Consideration)
+
+7. **Research and Evaluate Embeddings-Based Retrieval**
+   - Prototype embeddings creation for TypeScript files
+   - Evaluate performance and accuracy
+   - Implement if benefits justify the complexity
+
+8. **Explore Chunking Strategies**
+   - Research effective chunking approaches for documentation
+   - Prototype and evaluate performance
+   - Implement if benefits justify the complexity
+
+## Technical Design
+
+### Core Components
+
+1. **ContextBuilder** - Enhanced version of current ProjectContext
+   ```typescript
+   interface IContextBuilder {
+     buildContext(): Promise<string>;
+     getTokenCount(): number;
+     setContextMode(mode: 'normal' | 'trimmed' | 'summarized'): void;
+     setTokenBudget(maxTokens: number): void;
+     setPrioritizationStrategy(strategy: IPrioritizationStrategy): void;
+   }
+   ```
+
+2. **FileProcessor** - Handles per-file processing and trimming
+   ```typescript
+   interface IFileProcessor {
+     processFile(file: SmartFile): Promise<string>;
+     setProcessingMode(mode: 'full' | 'trim' | 'summarize'): void;
+     getTokenCount(): number;
+   }
+   ```
+
+3. **PrioritizationStrategy** - Ranks files by importance
+   ```typescript
+   interface IPrioritizationStrategy {
+     rankFiles(files: SmartFile[], context: string): Promise<SmartFile[]>;
+     setImportanceMetrics(metrics: IImportanceMetrics): void;
+   }
+   ```
+
+4. **TaskContextFactory** - Creates optimized contexts for specific tasks
+   ```typescript
+   interface ITaskContextFactory {
+     createContextForReadme(projectDir: string): Promise<string>;
+     createContextForCommit(projectDir: string, diff: string): Promise<string>;
+     createContextForDescription(projectDir: string): Promise<string>;
+   }
+   ```
+
+### Configuration Options
+
+The system will support configuration via a new section in `npmextra.json`:
+
+```json
+{
+  "tsdoc": {
+    "context": {
+      "maxTokens": 190000,
+      "defaultMode": "dynamic",
+      "taskSpecificSettings": {
+        "readme": {
+          "mode": "full",
+          "includePaths": ["src/", "lib/"],
+          "excludePaths": ["test/", "examples/"]
+        },
+        "commit": {
+          "mode": "trimmed",
+          "focusOnChangedFiles": true
+        },
+        "description": {
+          "mode": "summarized",
+          "includePackageInfo": true
+        }
+      },
+      "trimming": {
+        "removeImplementations": true,
+        "preserveInterfaces": true,
+        "preserveTypeDefs": true,
+        "preserveJSDoc": true,
+        "maxFunctionLines": 5
+      }
+    }
+  }
+}
+```
+
+## Cost-Benefit Analysis
+
+### Cost Considerations
+
+1. **Development costs**
+   - Initial implementation of foundational components (~30-40 hours)
+   - Testing and validation across different project sizes (~10-15 hours)
+   - Documentation and configuration examples (~5 hours)
+
+2. **Operational costs**
+   - Potential increased processing time for context preparation
+   - Additional API calls for summarization or embeddings approaches
+   - Monitoring and maintenance of the system
+
+### Benefits
+
+1. **Scalability**
+   - Support for projects of any size, including those whose full source exceeds o4-mini's 200K token limit
+   - Future-proof design that can adapt to different models and token limits
+
+2. **Quality improvements**
+   - More focused contexts lead to better AI outputs
+   - Task-specific optimization improves relevance
+   - Consistent performance regardless of project size
+
+3. **User experience**
+   - Predictable behavior for all project sizes
+   - Transparent token usage reporting
+   - Configuration options for different usage patterns
+
+## First Deliverable
+
+For immediate improvements, we recommend implementing Dynamic Context Trimming and Task-Specific Contexts first, as these offer the best balance of impact and implementation complexity.
+
+### Implementation Plan for Dynamic Context Trimming
+
+1. Create a basic `ContextTrimmer` class that processes TypeScript files:
+   - Remove function bodies but keep signatures
+   - Preserve interface and type definitions
+   - Keep imports and exports
+   - Preserve JSDoc comments
+
+2. Integrate with the existing ProjectContext class:
+   - Add a trimming mode option
+   - Apply trimming during the context building process
+   - Track and report token savings
+
+3. Modify the CLI to support trimming options:
+   - Add a `--trim` flag to enable trimming
+   - Add a `--trim-level` option for controlling aggressiveness
+   - Show token usage with and without trimming
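
To make the first step concrete, here is a deliberately naive sketch of body removal. It handles only `function` declarations that open their body at the end of the signature line, and it counts braces without understanding strings or comments, so it is an illustration of the idea rather than a proposed implementation. `trimFunctionBodies` is a hypothetical name; a production `ContextTrimmer` would more likely parse the file with the TypeScript compiler API.

```typescript
// Matches the first line of a `function` declaration that opens its body on that line.
const FUNCTION_START = /^\s*(export\s+)?(default\s+)?(async\s+)?function\b.*\{\s*$/;

// Count occurrences of a single character in a line.
function countChar(line: string, ch: string): number {
  return line.split(ch).length - 1;
}

function trimFunctionBodies(source: string): string {
  const output: string[] = [];
  let depth = 0; // brace depth inside the body currently being skipped
  let skipping = false;

  for (const line of source.split('\n')) {
    if (skipping) {
      // Drop body lines until the opening brace of the function is closed again.
      depth += countChar(line, '{') - countChar(line, '}');
      if (depth <= 0) skipping = false;
      continue;
    }
    if (FUNCTION_START.test(line)) {
      // Keep the signature and replace the body with a placeholder.
      output.push(line.replace(/\{\s*$/, '{ /* trimmed */ }'));
      depth = countChar(line, '{') - countChar(line, '}');
      skipping = depth > 0;
    } else {
      // Imports, JSDoc comments, interfaces, and type definitions pass through unchanged.
      output.push(line);
    }
  }
  return output.join('\n');
}
```

Comparing the estimated token count of the input and output of such a function is also the simplest way to report token savings.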
+
+This approach could reduce token usage by an estimated 40-70% while preserving the essential structure of the codebase, making it suitable for large projects while maintaining high-quality AI outputs.