feat(docs): Update project metadata and documentation to reflect comprehensive AI-enhanced features and improved installation and usage instructions
readme.plan.md
Normal file
@@ -0,0 +1,314 @@
# TSDocs Context Optimization Plan

## Problem Statement

For large TypeScript projects, the context assembled for AI-based documentation generation can become too large, potentially exceeding even o4-mini's 200K token limit. This limits the ability to generate:

- Project documentation (README.md)
- API descriptions and keywords
- Commit messages and changelogs

The current implementation simply includes all TypeScript files and key project files; it has no intelligent selection, prioritization, or content-reduction mechanisms.

## Analysis of Approaches

### 1. Smart Content Selection

**Description:** Intelligently select only files that are necessary for the specific task being performed, using heuristic rules.

**Advantages:**
- Simple to implement
- Predictable behavior
- Can be fine-tuned for different operations

**Disadvantages:**
- Requires manual tuning of rules
- May miss important context in complex projects
- Static approach lacks adaptability

**Implementation Complexity:** Medium
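
To make the idea of heuristic selection concrete, here is a minimal sketch that filters a list of paths by prefix and extension. The `SelectionRules` shape, the `selectFiles` function, and the example paths are all hypothetical and only illustrate what such rules could look like; they are not part of the existing tool.

```typescript
// Hypothetical rule shape for illustration only.
interface SelectionRules {
  includePrefixes: string[]; // a path must start with one of these
  excludePrefixes: string[]; // ...and with none of these
  extensions: string[];      // ...and end with one of these
}

// Keep only the paths that satisfy all three heuristic rules.
function selectFiles(paths: string[], rules: SelectionRules): string[] {
  return paths.filter((filePath) => {
    const included = rules.includePrefixes.some((prefix) => filePath.startsWith(prefix));
    const excluded = rules.excludePrefixes.some((prefix) => filePath.startsWith(prefix));
    const hasExtension = rules.extensions.some((ext) => filePath.endsWith(ext));
    return included && !excluded && hasExtension;
  });
}

// Example rules for a README-style task (assumed values).
const readmeRules: SelectionRules = {
  includePrefixes: ['src/', 'lib/'],
  excludePrefixes: ['test/', 'src/generated/'],
  extensions: ['.ts'],
};

const selectedPaths = selectFiles(
  ['src/index.ts', 'src/generated/api.ts', 'test/index.test.ts', 'lib/util.ts', 'src/logo.png'],
  readmeRules,
);
```

Because the rules are plain data, a different rule set can be supplied per operation, which is where the manual tuning mentioned above would happen.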

### 2. File Prioritization

**Description:** Rank files by relevance using git history, file size, import/export analysis, and relationship to the current task.

**Advantages:**
- Adaptively includes the most relevant files first
- Maintains context for frequently changed or central files
- Can leverage git history for additional signals

**Disadvantages:**
- Complexity in determining accurate relevance scores
- Requires analyzing project structure
- May require scanning imports/exports for dependency analysis

**Implementation Complexity:** High
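
One possible way to combine these signals is a weighted score per file. The sketch below assumes the signals (commit count, number of importers, file size) have already been collected elsewhere; the `FileSignals` and `ScoreWeights` shapes and the weight values are illustrative assumptions, not a tuned formula.

```typescript
// Hypothetical, pre-computed signals for one file.
interface FileSignals {
  path: string;
  commitCount: number; // how often the file changed (e.g. from git history)
  importedBy: number;  // how many other files import it
  sizeBytes: number;
}

// Hypothetical weights; real values would need tuning.
interface ScoreWeights {
  churn: number;
  centrality: number;
  sizePenalty: number;
}

// Logarithms dampen outliers so one huge value does not dominate the score.
function scoreFile(file: FileSignals, weights: ScoreWeights): number {
  return (
    weights.churn * Math.log(1 + file.commitCount) +
    weights.centrality * Math.log(1 + file.importedBy) -
    weights.sizePenalty * Math.log(1 + file.sizeBytes / 1024)
  );
}

// Return a new array sorted from most to least relevant.
function rankBySignals(files: FileSignals[], weights: ScoreWeights): FileSignals[] {
  return files.slice().sort((a, b) => scoreFile(b, weights) - scoreFile(a, weights));
}

const rankedFiles = rankBySignals(
  [
    { path: 'src/legacy.ts', commitCount: 1, importedBy: 1, sizeBytes: 65536 },
    { path: 'src/cli.ts', commitCount: 10, importedBy: 0, sizeBytes: 2048 },
    { path: 'src/core.ts', commitCount: 40, importedBy: 12, sizeBytes: 8192 },
  ],
  { churn: 1, centrality: 2, sizePenalty: 0.5 },
);
```

With these assumed weights, the frequently changed and widely imported `src/core.ts` ranks first and the large, rarely touched `src/legacy.ts` ranks last.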

### 3. Chunking Strategy

**Description:** Process the project in logical segments, generating intermediate results that are then combined to create the final output.

**Advantages:**
- Can handle projects of any size
- Focused context for each specific part
- May improve quality by focusing on specific areas deeply

**Disadvantages:**
- Complex orchestration of multiple AI calls
- Challenge in maintaining consistency across chunks
- May increase time and cost for processing

**Implementation Complexity:** High
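
As a rough illustration, chunking can be sketched as greedy packing of files into groups that each stay under a per-chunk token budget, followed by a combine step. Everything here is an assumption made for the example: the `SourceUnit` shape, the four-characters-per-token estimate, and the synchronous callbacks (real per-chunk processing would be asynchronous model calls).

```typescript
// Hypothetical file shape for illustration only.
interface SourceUnit {
  path: string;
  content: string;
}

// Rough heuristic (about 4 characters per token); a real tokenizer would replace this.
const estimateChunkTokens = (text: string): number => Math.ceil(text.length / 4);

// Greedily pack files, in order, into chunks that stay under the budget.
function chunkByBudget(units: SourceUnit[], maxTokensPerChunk: number): SourceUnit[][] {
  const chunks: SourceUnit[][] = [];
  let current: SourceUnit[] = [];
  let used = 0;
  for (const unit of units) {
    const cost = estimateChunkTokens(unit.content);
    if (current.length > 0 && used + cost > maxTokensPerChunk) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(unit); // an oversized file still gets a chunk of its own
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Produce one intermediate result per chunk, then combine them.
function processInChunks(
  chunks: SourceUnit[][],
  perChunk: (chunk: SourceUnit[]) => string,
  combine: (parts: string[]) => string,
): string {
  return combine(chunks.map((chunk) => perChunk(chunk)));
}

const demoUnits: SourceUnit[] = [
  { path: 'src/a.ts', content: 'x'.repeat(400) }, // ~100 tokens
  { path: 'src/b.ts', content: 'x'.repeat(400) }, // ~100 tokens
  { path: 'src/c.ts', content: 'x'.repeat(800) }, // ~200 tokens
  { path: 'src/d.ts', content: 'x'.repeat(40) },  // ~10 tokens
];
const demoChunks = chunkByBudget(demoUnits, 250);
const demoOutline = processInChunks(
  demoChunks,
  (chunk) => chunk.map((unit) => unit.path).join(', '),
  (parts) => parts.join(' | '),
);
```

The consistency problem listed above shows up in the `combine` step: each chunk is processed without seeing the others, so the combiner has to reconcile their outputs.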

### 4. Dynamic Context Trimming

**Description:** Automatically reduce context by removing non-essential code while preserving structure. Techniques include:
- Removing implementation details but keeping interfaces and type definitions
- Truncating large functions while keeping signatures
- Removing comments and whitespace (except JSDoc)
- Keeping only imports/exports for context files

**Advantages:**
- Preserves full project structure
- Flexible token usage based on importance
- Good balance between completeness and token efficiency

**Disadvantages:**
- Potential to remove important implementation details
- Risk of missing context needed for specific tasks
- Complex rules for what to trim vs keep

**Implementation Complexity:** Medium
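
The simplest of these techniques, removing comments and blank lines while keeping JSDoc, can be sketched with a few regular expressions. This is a deliberately naive, text-based illustration (the function name `stripNonJsDocComments` is hypothetical); it only removes whole-line `//` comments so that `//` inside strings or URLs is left alone. Removing function bodies reliably would need a real parser, such as the TypeScript compiler API, and is not attempted here.

```typescript
// Naive, regex-based sketch: strips non-JSDoc comments and blank lines.
function stripNonJsDocComments(source: string): string {
  return source
    // Remove block comments that do NOT start with "/**" (JSDoc is kept).
    .replace(/\/\*(?!\*)[\s\S]*?\*\//g, '')
    // Remove whole-line "//" comments only; trailing comments are left untouched.
    .replace(/^[ \t]*\/\/.*$/gm, '')
    // Collapse whitespace-only gaps into a single line break.
    .replace(/\n\s*\n+/g, '\n')
    .trim();
}

const sampleSource = [
  '/** Adds two numbers. */',
  '// internal note',
  'export function add(a: number, b: number): number {',
  '  /* temp */',
  '  return a + b;',
  '}',
].join('\n');

const trimmedSource = stripNonJsDocComments(sampleSource);
```

In this example the JSDoc line and the function survive, while the `// internal note` and `/* temp */` comments are dropped.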

### 5. Embeddings-Based Retrieval

**Description:** Create vector embeddings of project files and retrieve only the most relevant ones for a specific task using semantic similarity.

**Advantages:**
- Highly adaptive to different types of requests
- Leverages semantic understanding of content
- Can scale to extremely large projects

**Disadvantages:**
- Requires setting up and managing embeddings database
- Added complexity of running vector similarity searches
- Higher resource requirements for maintaining embeddings

**Implementation Complexity:** Very High
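
The retrieval step itself is small once vectors exist. The sketch below assumes each file already has an embedding vector produced by some embedding model (not shown) and uses cosine similarity to pick the top matches; the `EmbeddedFile` shape and the tiny three-dimensional vectors are purely illustrative.

```typescript
// Hypothetical shape: a file path plus its pre-computed embedding vector.
interface EmbeddedFile {
  path: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k files whose vectors are most similar to the query vector.
function topKRelevant(query: number[], files: EmbeddedFile[], k: number): EmbeddedFile[] {
  return files
    .map((file) => ({ file, score: cosineSimilarity(query, file.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.file);
}

const retrievedFiles = topKRelevant(
  [1, 0, 0],
  [
    { path: 'src/db.ts', vector: [0, 1, 0] },
    { path: 'src/cli.ts', vector: [0.5, 0.5, 0] },
    { path: 'src/auth.ts', vector: [0.9, 0.1, 0] },
  ],
  2,
);
```

A linear scan like this is adequate for a few thousand files; the database and maintenance costs listed above only become relevant at larger scale or when vectors must be persisted between runs.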

### 6. Task-Specific Contexts

**Description:** Create separate optimized contexts for different tasks (readme, commit messages, etc.) with distinct file selection and processing strategies.

**Advantages:**
- Highly optimized for each specific task
- Efficient token usage for each operation
- Improved quality through task-focused contexts

**Disadvantages:**
- Maintenance of multiple context building strategies
- More complex configuration
- Potential duplication in implementation

**Implementation Complexity:** Medium

### 7. Recursive Summarization

**Description:** Summarize larger files first, then include these summaries in the final context along with smaller files included in full.

**Advantages:**
- Can handle arbitrary project sizes
- Preserves essential information from all files
- Balanced approach to token usage

**Disadvantages:**
- Quality loss from summarization
- Increased processing time from multiple AI calls
- Complex orchestration logic

**Implementation Complexity:** High
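
The hybrid of summaries and full files can be sketched as a single pass with a size threshold. In this illustration the `SizedFile` shape, the four-characters-per-token estimate, and the output layout are assumptions, and the summarizer is a synchronous stub; a real summarizer would be an asynchronous model call.

```typescript
// Hypothetical file shape for illustration only.
interface SizedFile {
  path: string;
  content: string;
}

// Stand-in for a model call; the real thing would be asynchronous.
type Summarizer = (file: SizedFile) => string;

// Files above the threshold are replaced by a summary; the rest are kept in full.
function buildHybridContext(
  files: SizedFile[],
  thresholdTokens: number,
  summarize: Summarizer,
): string {
  return files
    .map((file) => {
      const tokens = Math.ceil(file.content.length / 4); // rough estimate
      const body = tokens > thresholdTokens ? `[summary] ${summarize(file)}` : file.content;
      return `--- ${file.path} ---\n${body}`;
    })
    .join('\n\n');
}

// Stub summarizer: keeps only the first line of the file.
const firstLineSummarizer: Summarizer = (file) => `${file.path}: ${file.content.split('\n')[0]}`;

const hybridContext = buildHybridContext(
  [
    { path: 'src/small.ts', content: 'export const a = 1;' },
    { path: 'src/big.ts', content: 'export class Big {}\n' + 'x'.repeat(4000) },
  ],
  100,
  firstLineSummarizer,
);
```

Here the small file appears verbatim while the large one is reduced to its one-line summary, which is where the quality loss noted above would come from.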

## Implementation Strategy

We propose a phased implementation approach, starting with the most impactful and straightforward approaches, then building toward more complex solutions as needed:

### Phase 1: Foundation (1-2 weeks)

1. **Implement Dynamic Context Trimming**
   - Create a `ContextProcessor` class that takes SmartFile objects and applies trimming rules
   - Implement configurable trimming rules (remove implementations, keep signatures)
   - Add a configuration option to control trimming aggressiveness
   - Support preserving JSDoc comments while removing other comments

2. **Enhance Token Monitoring**
   - Track token usage per file to identify problematic files
   - Implement token budgeting to stay within limits
   - Add detailed token reporting for optimization
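
Per-file tracking and budgeting could look roughly like the following. This is a sketch under stated assumptions: the `BudgetedFile` and `BudgetReport` shapes are hypothetical, files are assumed to arrive already ordered by priority, and the token count is a crude four-characters-per-token estimate rather than a real tokenizer.

```typescript
// Hypothetical shapes for illustration only.
interface BudgetedFile {
  path: string;
  content: string;
}
interface FileTokenUsage {
  path: string;
  tokens: number;
}
interface BudgetReport {
  included: FileTokenUsage[];
  skipped: FileTokenUsage[];
  totalTokens: number;
}

// Rough heuristic; a real tokenizer would give exact counts.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Walk files in priority order, including each one only if it still fits.
function applyTokenBudget(files: BudgetedFile[], maxTokens: number): BudgetReport {
  const report: BudgetReport = { included: [], skipped: [], totalTokens: 0 };
  for (const file of files) {
    const usage: FileTokenUsage = { path: file.path, tokens: approxTokens(file.content) };
    if (report.totalTokens + usage.tokens <= maxTokens) {
      report.included.push(usage);
      report.totalTokens += usage.tokens;
    } else {
      report.skipped.push(usage); // too large for the remaining budget
    }
  }
  return report;
}

const budgetDemo = applyTokenBudget(
  [
    { path: 'src/index.ts', content: 'x'.repeat(400) }, // ~100 tokens
    { path: 'src/big.ts', content: 'x'.repeat(1000) },  // ~250 tokens
    { path: 'src/util.ts', content: 'x'.repeat(600) },  // ~150 tokens
  ],
  300,
);
```

The `skipped` list doubles as the report of problematic files: anything that lands there is a candidate for trimming or summarization.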

### Phase 2: Smart Selection (2-3 weeks)

3. **Implement Task-Specific Contexts**
   - Create specialized context builders for readme, commit messages, and descriptions
   - Customize file selection rules for each task
   - Add configuration options for task-specific settings

4. **Add Smart Content Selection**
   - Implement heuristic rules for file importance
   - Create configuration for inclusion/exclusion patterns
   - Add ability to focus on specific directories or modules

### Phase 3: Advanced Techniques (3-4 weeks)

5. **Implement File Prioritization**
   - Add git history analysis to identify frequently changed files
   - Implement dependency analysis to identify central files
   - Create a scoring system for file relevance

6. **Add Optional Recursive Summarization**
   - Implement file summarization for large files
   - Create a hybrid approach that mixes full files and summaries
   - Add configuration to control summarization thresholds

### Phase 4: Research-Based Approaches (Future Consideration)

7. **Research and Evaluate Embeddings-Based Retrieval**
   - Prototype embeddings creation for TypeScript files
   - Evaluate performance and accuracy
   - Implement if benefits justify the complexity

8. **Explore Chunking Strategies**
   - Research effective chunking approaches for documentation
   - Prototype and evaluate performance
   - Implement if benefits justify the complexity

## Technical Design

### Core Components

1. **ContextBuilder** - Enhanced version of current ProjectContext
   ```typescript
   interface IContextBuilder {
     buildContext(): Promise<string>;
     getTokenCount(): number;
     setContextMode(mode: 'normal' | 'trimmed' | 'summarized'): void;
     setTokenBudget(maxTokens: number): void;
     setPrioritizationStrategy(strategy: IPrioritizationStrategy): void;
   }
   ```

2. **FileProcessor** - Handles per-file processing and trimming
   ```typescript
   interface IFileProcessor {
     processFile(file: SmartFile): Promise<string>;
     setProcessingMode(mode: 'full' | 'trim' | 'summarize'): void;
     getTokenCount(): number;
   }
   ```

3. **PrioritizationStrategy** - Ranks files by importance
   ```typescript
   interface IPrioritizationStrategy {
     rankFiles(files: SmartFile[], context: string): Promise<SmartFile[]>;
     setImportanceMetrics(metrics: IImportanceMetrics): void;
   }
   ```

4. **TaskContextFactory** - Creates optimized contexts for specific tasks
   ```typescript
   interface ITaskContextFactory {
     createContextForReadme(projectDir: string): Promise<string>;
     createContextForCommit(projectDir: string, diff: string): Promise<string>;
     createContextForDescription(projectDir: string): Promise<string>;
   }
   ```

### Configuration Options

The system will support configuration via a new section in `npmextra.json`:

```json
{
  "tsdoc": {
    "context": {
      "maxTokens": 190000,
      "defaultMode": "dynamic",
      "taskSpecificSettings": {
        "readme": {
          "mode": "full",
          "includePaths": ["src/", "lib/"],
          "excludePaths": ["test/", "examples/"]
        },
        "commit": {
          "mode": "trimmed",
          "focusOnChangedFiles": true
        },
        "description": {
          "mode": "summarized",
          "includePackageInfo": true
        }
      },
      "trimming": {
        "removeImplementations": true,
        "preserveInterfaces": true,
        "preserveTypeDefs": true,
        "preserveJSDoc": true,
        "maxFunctionLines": 5
      }
    }
  }
}
```
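
A typed loader for part of this section might look like the sketch below. It covers only `maxTokens`, `defaultMode`, and `trimming` (the task-specific settings are omitted for brevity), takes the defaults from the example values above, and treats the interface and function names as hypothetical. `defaultMode` is typed as a plain string because the plan does not fix the set of allowed values.

```typescript
// Hypothetical types mirroring part of the example configuration above.
interface TrimmingConfig {
  removeImplementations: boolean;
  preserveInterfaces: boolean;
  preserveTypeDefs: boolean;
  preserveJSDoc: boolean;
  maxFunctionLines: number;
}
interface ContextConfig {
  maxTokens: number;
  defaultMode: string;
  trimming: TrimmingConfig;
}
interface UserContextConfig {
  maxTokens?: number;
  defaultMode?: string;
  trimming?: Partial<TrimmingConfig>;
}

// Defaults taken from the example values; assumed, not confirmed by the tool.
const defaultContextConfig: ContextConfig = {
  maxTokens: 190000,
  defaultMode: 'dynamic',
  trimming: {
    removeImplementations: true,
    preserveInterfaces: true,
    preserveTypeDefs: true,
    preserveJSDoc: true,
    maxFunctionLines: 5,
  },
};

// Parse the JSON text and fill in any missing values from the defaults.
function loadContextConfig(rawJson: string): ContextConfig {
  const parsed = JSON.parse(rawJson);
  const user: UserContextConfig = parsed?.tsdoc?.context ?? {};
  return {
    maxTokens: user.maxTokens ?? defaultContextConfig.maxTokens,
    defaultMode: user.defaultMode ?? defaultContextConfig.defaultMode,
    trimming: { ...defaultContextConfig.trimming, ...user.trimming },
  };
}

const resolvedConfig = loadContextConfig(
  '{"tsdoc":{"context":{"maxTokens":120000,"trimming":{"maxFunctionLines":10}}}}',
);
const fallbackConfig = loadContextConfig('{}');
```

Merging per key rather than replacing the whole `trimming` object lets a project override a single option, such as `maxFunctionLines`, without restating the rest.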

## Cost-Benefit Analysis

### Cost Considerations

1. **Development costs**
   - Initial implementation of foundational components (~30-40 hours)
   - Testing and validation across different project sizes (~10-15 hours)
   - Documentation and configuration examples (~5 hours)

2. **Operational costs**
   - Potential increased processing time for context preparation
   - Additional API calls for summarization or embeddings approaches
   - Monitoring and maintenance of the system

### Benefits

1. **Scalability**
   - Support for projects of any size, including those whose full context would exceed o4-mini's 200K token limit
   - Future-proof design that can adapt to different models and token limits

2. **Quality improvements**
   - More focused contexts lead to better AI outputs
   - Task-specific optimization improves relevance
   - Consistent performance regardless of project size

3. **User experience**
   - Predictable behavior for all project sizes
   - Transparent token usage reporting
   - Configuration options for different usage patterns

## First Deliverable

For immediate improvements, we recommend implementing Dynamic Context Trimming and Task-Specific Contexts first, as these offer the best balance of impact and implementation complexity.

### Implementation Plan for Dynamic Context Trimming

1. Create a basic `ContextTrimmer` class that processes TypeScript files:
   - Remove function bodies but keep signatures
   - Preserve interface and type definitions
   - Keep imports and exports
   - Preserve JSDoc comments

2. Integrate with the existing ProjectContext class:
   - Add a trimming mode option
   - Apply trimming during the context building process
   - Track and report token savings

3. Modify the CLI to support trimming options:
   - Add a `--trim` flag to enable trimming
   - Add a `--trim-level` option for controlling aggressiveness
   - Show token usage with and without trimming

This approach could reduce token usage by 40-70% while preserving the essential structure of the codebase, making it suitable for large projects while maintaining high-quality AI outputs.
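
Reporting usage with and without trimming comes down to a small comparison. The sketch below is one possible shape for such a report; the `SavingsReport` type, the function name, and the token counts in the example are made up for illustration and are not measurements.

```typescript
// Hypothetical report shape for illustration only.
interface SavingsReport {
  originalTokens: number;
  trimmedTokens: number;
  savedTokens: number;
  savedPercent: number; // rounded to one decimal place
}

// Compare token counts before and after trimming.
function reportTokenSavings(originalTokens: number, trimmedTokens: number): SavingsReport {
  const savedTokens = originalTokens - trimmedTokens;
  const savedPercent =
    originalTokens === 0 ? 0 : Math.round((savedTokens * 1000) / originalTokens) / 10;
  return { originalTokens, trimmedTokens, savedTokens, savedPercent };
}

// Example with made-up numbers: 120,000 tokens trimmed down to 48,000.
const savingsDemo = reportTokenSavings(120000, 48000);
const savingsLine =
  `Tokens: ${savingsDemo.originalTokens} -> ${savingsDemo.trimmedTokens} ` +
  `(saved ${savingsDemo.savedPercent}%)`;
```

Recording these figures per project would also make it possible to check the 40-70% estimate against real codebases.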