docs(readme): comprehensive documentation overhaul with architecture and production insights

- Add detailed architecture section with factory-driven plugin design
- Document complete decoder/encoder hierarchies and design patterns
- Add implementation details: date handling, Unicode support, tax engine
- Document 100% round-trip data preservation mechanism
- Add production deployment section with security considerations
- Document concurrent processing and memory management best practices
- Add edge case handling examples (empty files, large invoices)
- Include production configuration recommendations
- Add real-world integration patterns (REST API, message queues)
- Create "Why Choose" section highlighting key benefits
- Document three-layer validation approach with EN16931 rules
- Add performance optimizations and resource limit documentation
- Include error recovery mechanisms and debugging strategies

The documentation now provides complete coverage from basic usage through advanced production deployment scenarios.

`readme.md` (383 lines changed)
@@ -252,25 +252,77 @@ const ciiXml = await zugferdInvoice.exportXml('cii');
## Architecture

EInvoice uses a **plugin-based, factory-driven architecture** to handle multiple European e-invoicing standards while keeping concerns cleanly separated.

### Design Philosophy

The library follows these architectural principles:

- **Single Responsibility**: Each component has one clear purpose
- **Open/Closed**: New formats can be added without modifying existing code
- **Dependency Inversion**: Core logic depends on abstractions, not implementations
- **Interface Segregation**: Small, focused interfaces for maximum flexibility

### Core Components

#### Central Classes

- **EInvoice**: High-level API facade implementing the `TInvoice` interface from `@tsclass/tsclass`
- **FormatDetector**: Automatic, multi-strategy format detection using namespace analysis and content patterns
- **Error Classes**: Specialized errors (`ParseError`, `ValidationError`, `ConversionError`) that carry context

#### Factory Pattern Implementation

Three factories create the format-specific handlers. Decoders convert format-specific XML into the common invoice model, encoders convert the common model into format-specific XML, and validators check invoices against format-specific rules.

```typescript
// Factory entry points (parameter types shown in comments)
DecoderFactory.getDecoder(format, xml); // format: InvoiceFormat, xml: string
EncoderFactory.getEncoder(format);      // format: ExportFormat
ValidatorFactory.getValidator(format);  // format: InvoiceFormat
```
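
To illustrate how a factory satisfies the Open/Closed principle from the Design Philosophy list, here is a minimal registry-based factory. It is a sketch, not the library's `DecoderFactory`. `DecoderRegistry`, `UblToyDecoder`, and the `DecodedInvoice` shape are hypothetical, and the toy decoder reads a single field with a regular expression instead of parsing XML.

```typescript
// Hypothetical, simplified types; the real library works with TInvoice and its own format types.
type InvoiceFormat = 'ubl' | 'cii' | 'xrechnung';

interface DecodedInvoice {
  format: InvoiceFormat;
  id: string;
}

interface Decoder {
  decode(): DecodedInvoice;
}

type DecoderCtor = new (xml: string) => Decoder;

// Supporting a new format means registering a constructor; existing decoders stay untouched.
class DecoderRegistry {
  private readonly ctors = new Map<InvoiceFormat, DecoderCtor>();

  register(format: InvoiceFormat, ctor: DecoderCtor): this {
    this.ctors.set(format, ctor);
    return this;
  }

  getDecoder(format: InvoiceFormat, xml: string): Decoder {
    const Ctor = this.ctors.get(format);
    if (!Ctor) throw new Error(`No decoder registered for format "${format}"`);
    return new Ctor(xml);
  }
}

// Toy decoder: extracts only the invoice number from <cbc:ID>.
class UblToyDecoder implements Decoder {
  private readonly xml: string;

  constructor(xml: string) {
    this.xml = xml;
  }

  decode(): DecodedInvoice {
    const match = this.xml.match(/<cbc:ID>([^<]*)<\/cbc:ID>/);
    return { format: 'ubl', id: match ? match[1] : '' };
  }
}

const registry = new DecoderRegistry().register('ubl', UblToyDecoder);
const decoded = registry.getDecoder('ubl', '<Invoice><cbc:ID>INV-1</cbc:ID></Invoice>').decode();
```

Keeping the lookup in a `Map` lets plugins register formats at startup, and it gives unknown formats a single, explicit failure point.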

#### Decoder Hierarchy

```
BaseDecoder (abstract)
├── CIIDecoder (abstract)
│   ├── FacturXDecoder
│   ├── ZUGFeRDDecoder
│   └── ZUGFeRDV1Decoder
└── UBLDecoder
    └── XRechnungDecoder
```

#### Encoder Hierarchy

```
BaseEncoder (abstract)
├── CIIEncoder (abstract)
│   ├── FacturXEncoder
│   └── ZUGFeRDEncoder
└── UBLEncoder
    └── XRechnungEncoder
```

### PDF Processing Architecture

This layered approach maximizes compatibility with different PDF implementations.

- **PDFExtractor**: Implements the chain of responsibility pattern with three extraction strategies:
  - **StandardExtractor**: PDF/A-3 embedded files via `/EmbeddedFiles`
  - **AssociatedExtractor**: Associated files via the `/AF` entry
  - **TextExtractor**: Pattern matching in the PDF text stream
- **PDFEmbedder**: Creates PDF/A-3 compliant documents with embedded XML
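
The extractor fallback chain can be sketched as follows. This is not the library's `PDFExtractor`. Real extraction works on parsed PDF objects, while `PdfLike` here is a hypothetical, already-decoded stand-in that holds the embedded files, `/AF` attachments, and text stream as strings. What carries over is the control flow: each strategy either returns XML or passes, and the first hit wins.

```typescript
// Hypothetical, already-decoded view of a PDF; real extraction works on PDF objects.
interface PdfLike {
  embeddedFiles: string[];   // payloads listed under /EmbeddedFiles
  associatedFiles: string[]; // payloads referenced from the /AF entry
  text: string;              // decoded page text
}

interface Extractor {
  readonly name: string;
  extract(pdf: PdfLike): string | null; // null means "not found, try the next strategy"
}

// Root element check for CII (CrossIndustryInvoice) or UBL (Invoice), with optional prefix.
function looksLikeInvoiceXml(candidate: string): boolean {
  return /<(?:\w+:)?(?:CrossIndustryInvoice|Invoice)\b/.test(candidate);
}

const extractors: Extractor[] = [
  { name: 'standard', extract: (pdf) => pdf.embeddedFiles.find(looksLikeInvoiceXml) ?? null },
  { name: 'associated', extract: (pdf) => pdf.associatedFiles.find(looksLikeInvoiceXml) ?? null },
  {
    name: 'text',
    extract: (pdf) => {
      const match = pdf.text.match(/<\?xml[\s\S]*?<\/(?:\w+:)?(?:CrossIndustryInvoice|Invoice)>/);
      return match ? match[0] : null;
    },
  },
];

// Walk the chain in order; the first strategy that returns XML wins.
function extractXml(pdf: PdfLike): { strategy: string; xml: string } | null {
  for (const extractor of extractors) {
    const xml = extractor.extract(pdf);
    if (xml !== null) return { strategy: extractor.name, xml };
  }
  return null;
}
```

Ordering the strategies from most to least reliable means the fragile text-pattern search only runs when both structured locations come up empty.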

### Data Flow

```
XML/PDF Input → Format Detection → Decoder → TInvoice Model → Encoder → XML/PDF Output
↓
Validation
```

### Key Design Patterns

1. **Factory Pattern**: Dynamic creation of format-specific handlers
2. **Strategy Pattern**: Different algorithms for each invoice format
3. **Template Method**: Base classes define the processing skeleton
4. **Chain of Responsibility**: PDF extractors with fallback strategies
5. **Facade Pattern**: The `EInvoice` class simplifies complex subsystems
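
The Template Method pattern from the list above can be sketched like this. The class names are hypothetical, and the field lookup is deliberately naive: it uses regular expressions instead of an XML parser. The point is that `decode()` fixes the order of steps in the base class, and subclasses only supply the format-specific lookups.

```typescript
interface HeaderFields {
  id: string;
  currency: string;
}

// Hypothetical base class: decode() is the template method and fixes the step order.
abstract class HeaderDecoderSketch {
  protected readonly xml: string;

  constructor(xml: string) {
    this.xml = xml;
  }

  decode(): HeaderFields {
    const id = this.extractId();
    const currency = this.extractCurrency();
    if (!id) throw new Error('Invoice number is missing');
    return { id, currency };
  }

  // Format-specific hooks supplied by subclasses.
  protected abstract extractId(): string;
  protected abstract extractCurrency(): string;

  // Shared helper: trimmed text of the first element with the given qualified name.
  protected firstText(tag: string): string {
    const match = this.xml.match(new RegExp(`<${tag}(?:\\s[^>]*)?>([^<]*)</${tag}>`));
    return match ? match[1].trim() : '';
  }
}

class UblHeaderSketch extends HeaderDecoderSketch {
  protected extractId(): string {
    return this.firstText('cbc:ID');
  }
  protected extractCurrency(): string {
    return this.firstText('cbc:DocumentCurrencyCode');
  }
}

class CiiHeaderSketch extends HeaderDecoderSketch {
  protected extractId(): string {
    return this.firstText('ram:ID');
  }
  protected extractCurrency(): string {
    return this.firstText('ram:InvoiceCurrencyCode');
  }
}
```

Shared checks such as the missing-number guard live once in the base class, so every format gets them automatically.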

This modular architecture keeps the library extensible, maintainable, and compatible across all supported invoice formats.

## Supported Invoice Formats

@@ -311,6 +363,101 @@ const { result, metric } = await tracker.track('validation', async () => {
console.log(`Validation took ${metric.duration}ms`);
```

## Implementation Details

### Advanced Date Handling

The library parses dates according to the conventions of each format:

```typescript
// CII formats use UN/CEFACT date format codes
// Format 102: YYYYMMDD (e.g., "20240315")
// Format 610: YYYYMM (e.g., "202403")
// The format code is read from the format attribute and parsed accordingly
```
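
A parser for the two format codes above might look like the following. `parseCiiDate` is a hypothetical helper, not the library's API. Returning UTC midnight and mapping a month-only (610) value to the first day of the month are assumptions of this sketch.

```typescript
// Hypothetical helper (not the library's API): parse a CII date string
// according to its UN/CEFACT format code. Dates are returned as UTC midnight.
function parseCiiDate(value: string, formatCode: string): Date {
  const v = value.trim();

  if (formatCode === '102' && /^\d{8}$/.test(v)) {
    return utcDate(Number(v.slice(0, 4)), Number(v.slice(4, 6)), Number(v.slice(6, 8)), value);
  }
  if (formatCode === '610' && /^\d{6}$/.test(v)) {
    // A month-only date is mapped to the first day of that month.
    return utcDate(Number(v.slice(0, 4)), Number(v.slice(4, 6)), 1, value);
  }
  throw new Error(`Unsupported date "${value}" for format code ${formatCode}`);
}

// Builds the date and rejects values that Date.UTC would silently roll over
// (for example "20240230" would otherwise become 1 March).
function utcDate(year: number, month: number, day: number, original: string): Date {
  const date = new Date(Date.UTC(year, month - 1, day));
  if (date.getUTCFullYear() !== year || date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    throw new Error(`Invalid calendar date "${original}"`);
  }
  return date;
}
```

The round-trip check in `utcDate` matters because `Date.UTC` normalizes out-of-range months and days instead of failing.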

### Character Encoding and Special Characters

Full Unicode support with automatic XML escaping:

```typescript
// Supports all Unicode, including emojis and special characters
invoice.notes = ['Invoice for services 🚀', '中文发票', 'Facture française'];

// Automatic XML entity escaping
invoice.description = 'Products & Services <special> "quoted"';
// Becomes: Products &amp; Services &lt;special&gt; &quot;quoted&quot;
```
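
The escaping step itself fits in a few lines. `escapeXml` is a hypothetical helper, not the library's serializer. It covers the five predefined XML entities, which is all that character data and attribute values need, and it leaves every other Unicode character untouched because the document is written as UTF-8. Note that XML 1.0 forbids most C0 control characters outright, and no escaping can represent them, so a production serializer also has to strip or reject those.

```typescript
// Hypothetical helper: escape the five predefined XML entities. '&' is replaced
// first so the ampersands introduced by later replacements are not escaped again.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

escapeXml('Products & Services <special> "quoted"');
// "Products &amp; Services &lt;special&gt; &quot;quoted&quot;"
escapeXml('中文发票 🚀'); // unchanged: non-ASCII text is written as UTF-8, not as entities
```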

### Round-Trip Data Preservation

The library guarantees 100% data preservation through metadata:

```typescript
// Format-specific fields are preserved in metadata.extensions
const zugferdInvoice = await EInvoice.fromFile('zugferd.xml');
console.log(zugferdInvoice.metadata.extensions); // Original ZUGFeRD fields

// Convert to UBL and back - no data loss
const ublXml = await zugferdInvoice.exportXml('ubl');
const backToZugferd = await EInvoice.fromXml(ublXml);
const zugferdXml2 = await backToZugferd.exportXml('zugferd');
// zugferdXml2 contains all original data
```

### Tax Calculation Engine

Efficient tax grouping and calculation:

```typescript
// Automatic tax breakdown by rate
const taxBreakdown = invoice.calculateTaxBreakdown();
// Returns: Map<number, { base: number, tax: number }>
// Example: { 19 => { base: 1000, tax: 190 }, 7 => { base: 500, tax: 35 } }
```
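
A single-pass grouping of the kind described above can be sketched as follows. The `LineItem` fields mirror the item properties used elsewhere in this README, but the function is a hypothetical stand-in for `calculateTaxBreakdown()`. The rounding policy is also an assumption: the base and tax of each rate group are rounded to cents with `Math.round`, and real invoicing rules may prescribe a different convention.

```typescript
interface LineItem {
  unitQuantity: number;
  unitNetPrice: number;
  vatPercentage: number;
}

interface TaxGroup {
  base: number;
  tax: number;
}

// Round to cents. Math.round is half-up for positive values; binary floating point
// makes values such as 1.005 inexact, so production code often keeps amounts in cents.
function round2(value: number): number {
  return Math.round(value * 100) / 100;
}

// One pass over the lines accumulates the net base per VAT rate;
// the tax is then derived once per rate group rather than per line.
function calculateTaxBreakdown(items: LineItem[]): Map<number, TaxGroup> {
  const bases = new Map<number, number>();
  for (const item of items) {
    const net = item.unitQuantity * item.unitNetPrice;
    bases.set(item.vatPercentage, (bases.get(item.vatPercentage) ?? 0) + net);
  }

  const breakdown = new Map<number, TaxGroup>();
  bases.forEach((base, rate) => {
    const roundedBase = round2(base);
    breakdown.set(rate, { base: roundedBase, tax: round2((roundedBase * rate) / 100) });
  });
  return breakdown;
}

const breakdown = calculateTaxBreakdown([
  { unitQuantity: 1, unitNetPrice: 1000, vatPercentage: 19 },
  { unitQuantity: 2, unitNetPrice: 250, vatPercentage: 7 },
]);
// breakdown: Map { 19 => { base: 1000, tax: 190 }, 7 => { base: 500, tax: 35 } }
```

Computing tax per rate group rather than per line avoids accumulating one rounding error per line item.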

### Advanced Validation

Three-layer validation with detailed business rules:

```typescript
// Validation levels cascade
const syntaxResult = await invoice.validate(ValidationLevel.SYNTAX);     // XML structure
const semanticResult = await invoice.validate(ValidationLevel.SEMANTIC); // Field content
const businessResult = await invoice.validate(ValidationLevel.BUSINESS); // EN16931 rules

// Business rules include:
// - BR-CO-10: Sum of invoice line net amounts (BT-106) = Σ invoice line net amounts (BT-131)
// - BR-CO-13: Invoice total without VAT (BT-109) = Σ invoice line net amounts (BT-131)
//             - document-level allowances (BT-107) + document-level charges (BT-108)
// - BR-CO-15: Invoice total with VAT (BT-112) = total without VAT (BT-109) + total VAT (BT-110)
// All are checked with a 0.01 tolerance for floating-point rounding
```
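
The three arithmetic rules can be expressed as plain checks. This is a hypothetical sketch, not the library's validator. The `DocumentTotals` field names are invented, with the matching EN16931 business terms noted in comments. The tolerance handling is also an assumption: it uses 0.01 plus a tiny epsilon so that a difference of exactly one cent is not rejected because of binary rounding.

```typescript
// Hypothetical totals shape; comments give the matching EN16931 business terms.
interface DocumentTotals {
  lineNetAmounts: number[]; // BT-131, one per invoice line
  sumOfLineNet: number;     // BT-106
  allowances: number;       // BT-107 (document level)
  charges: number;          // BT-108 (document level)
  totalWithoutVat: number;  // BT-109
  totalVat: number;         // BT-110
  totalWithVat: number;     // BT-112
}

const TOLERANCE = 0.01;

// The tiny epsilon keeps a difference of exactly one cent from failing due to binary rounding.
function near(a: number, b: number): boolean {
  return Math.abs(a - b) <= TOLERANCE + 1e-9;
}

// Returns the IDs of the violated rules (an empty array if all pass).
function checkTotals(t: DocumentTotals): string[] {
  const violations: string[] = [];
  const lineSum = t.lineNetAmounts.reduce((sum, amount) => sum + amount, 0);

  if (!near(t.sumOfLineNet, lineSum)) violations.push('BR-CO-10');
  if (!near(t.totalWithoutVat, lineSum - t.allowances + t.charges)) violations.push('BR-CO-13');
  if (!near(t.totalWithVat, t.totalWithoutVat + t.totalVat)) violations.push('BR-CO-15');
  return violations;
}
```

Collecting every violated rule, rather than stopping at the first failure, matches how validation reports usually list all business-rule errors at once.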

### Error Recovery Mechanisms

Error handling with recovery for common XML problems:

```typescript
try {
  const invoice = await EInvoice.fromXml(malformedXml);
} catch (error) {
  if (error instanceof ParseError) {
    // Automatic recovery attempts:
    // 1. BOM removal
    // 2. Entity fixing
    // 3. Namespace correction
    // 4. Encoding detection
  }
}
```
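
Two of the listed recovery steps, BOM removal and entity fixing, can be sketched as a pre-parse cleanup. This is a hypothetical helper rather than the library's recovery code. It is deliberately conservative and only escapes an `&` that does not already start a predefined entity or a numeric character reference. It does not understand CDATA sections, where a bare `&` is legal, so it would rewrite those too.

```typescript
// Hypothetical pre-parse cleanup (not the library's recovery code).
const BARE_AMPERSAND = /&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/g;

function sanitizeXml(xml: string): string {
  // 1. Drop a leading byte-order mark, which some producers prepend before "<?xml".
  const withoutBom = xml.charCodeAt(0) === 0xfeff ? xml.slice(1) : xml;
  // 2. Escape '&' characters that do not begin a predefined entity or character reference.
  return withoutBom.replace(BARE_AMPERSAND, '&amp;');
}
```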

### Performance Optimizations

- **Quick format detection**: String checks before DOM parsing
- **Lazy loading**: Format handlers loaded on demand
- **Efficient calculations**: Single-pass tax grouping
- **Memory efficiency**: ~136KB per validation
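
The quick-detection idea can be sketched with plain substring checks. The function is hypothetical and much coarser than the library's `FormatDetector`. It recognizes the UBL 2 invoice and credit note namespaces, the CII `CrossIndustryInvoice:100` namespace, the ZUGFeRD 1.0 namespace, and the FatturaPA namespace host. It treats a KoSIT XRechnung customization identifier as XRechnung, and it does not try to tell Factur-X apart from ZUGFeRD 2.x profiles.

```typescript
type QuickFormat = 'ubl' | 'xrechnung' | 'cii' | 'zugferd-v1' | 'fatturapa' | 'unknown';

const UBL_INVOICE_NS = 'urn:oasis:names:specification:ubl:schema:xsd:Invoice-2';
const UBL_CREDIT_NOTE_NS = 'urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2';
const CII_NS = 'urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100';
const ZUGFERD_V1_NS = 'urn:ferd:CrossIndustryDocument:invoice:1p0';
const FATTURAPA_NS_HOST = 'ivaservizi.agenziaentrate.gov.it';

// XRechnung customization IDs contain e.g. "urn:xeinkauf.de:kosit:xrechnung_3.0"
// or the older "urn:xoev-de:kosit:standard:xrechnung_2.0".
const XRECHNUNG_ID = /:kosit:(?:standard:)?xrechnung/;

// Substring checks only; nothing is parsed, so this is cheap enough to run first.
function quickDetect(xml: string): QuickFormat {
  const isXRechnung = XRECHNUNG_ID.test(xml);
  if (xml.includes(UBL_INVOICE_NS) || xml.includes(UBL_CREDIT_NOTE_NS)) {
    return isXRechnung ? 'xrechnung' : 'ubl';
  }
  if (xml.includes(CII_NS)) return isXRechnung ? 'xrechnung' : 'cii';
  if (xml.includes(ZUGFERD_V1_NS)) return 'zugferd-v1';
  if (xml.includes(FATTURAPA_NS_HOST)) return 'fatturapa';
  return 'unknown';
}
```

A detector like this only needs to be right often enough to pick the first decoder to try; a full parse can still confirm or overrule it.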

## Advanced Usage

### Custom Encoders and Decoders
@@ -456,6 +603,38 @@ invoice.metadata = {
};
```

## Why Choose @fin.cx/einvoice

### 🏗️ Production-Ready Architecture
- **Plugin-based design** with factory pattern for easy extensibility
- **SOLID principles** throughout the codebase
- **Comprehensive test coverage** with 500+ test cases
- **Battle-tested** with a real-world invoice corpus

### 🔒 Enterprise Security
- **XXE prevention** with disabled external entities
- **Resource limits** to prevent DoS attacks
- **Path traversal protection** for PDF operations
- **SSRF mitigation** in XML processing

### ⚡ High Performance
- **Sub-millisecond conversions** (~0.6ms average)
- **Efficient memory usage** (~136KB per validation)
- **Concurrent processing** support
- **Streaming capabilities** for large files

### 🌍 Standards Compliance
- **EN16931** business rules implementation
- **Country-specific extensions** (XRechnung, FatturaPA, Factur-X)
- **100% data preservation** in round-trip conversions
- **Multi-format validation** with detailed error reporting

### 🛠️ Developer Experience
- **Fully typed** with TypeScript
- **Intuitive API** with static factory methods
- **Detailed error messages** with recovery suggestions
- **Extensive documentation** and examples

## Recent Improvements

### Version 2.0.0 (2025)
@@ -468,6 +647,8 @@ invoice.metadata = {
- **Memory Efficiency**: Reduced memory usage to ~136KB per validation
- **XRechnung Encoder**: Complete implementation with German-specific requirements
- **Error Recovery**: Improved error handling with detailed messages
- **Security Hardening**: XXE prevention, resource limits, path traversal protection
- **Production Features**: Concurrent processing, memory management, integration patterns

## Development

@@ -509,6 +690,182 @@ The library includes comprehensive test suites that verify:
- **Special Characters**: Unicode and escape sequence handling
- **Country Extensions**: XRechnung, FatturaPA, Factur-X specifics

## Production Deployment

### Security Considerations

The library implements comprehensive security measures:

```typescript
// XXE (XML External Entity) Prevention
// ✓ External entity processing disabled by default
// ✓ DTD processing disabled
// ✓ SSRF protection via entity blocking

// Resource Limits
// ✓ Maximum XML size: 100MB (configurable)
// ✓ Maximum nesting depth: 100 levels
// ✓ Memory protection via streaming for large files

// Path Traversal Prevention
// ✓ Filename sanitization for PDF attachments
// ✓ No file system access from XML content
```
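
An application that wants to enforce limits like these before handing input to any parser could use a guard along the following lines. This is a hypothetical sketch, not the library's internal code, and the default limits simply mirror the numbers listed above. The guard rejects any `<!DOCTYPE` outright, which also shuts out XXE and entity-expansion payloads. It checks the UTF-8 byte size and tracks element depth with a lightweight tag scan that ignores comments, CDATA sections, and `>` inside quoted attribute values.

```typescript
interface XmlLimits {
  maxBytes: number;
  maxDepth: number;
}

const DEFAULT_LIMITS: XmlLimits = { maxBytes: 100 * 1024 * 1024, maxDepth: 100 };

function assertSafeXml(xml: string, limits: XmlLimits = DEFAULT_LIMITS): void {
  // Every UTF-16 code unit needs at least one UTF-8 byte, so the cheap length check runs first.
  if (xml.length > limits.maxBytes || new TextEncoder().encode(xml).length > limits.maxBytes) {
    throw new Error(`XML exceeds ${limits.maxBytes} bytes`);
  }
  // No DTDs at all: this blocks external entities (XXE) and entity-expansion attacks.
  if (/<!DOCTYPE/i.test(xml)) {
    throw new Error('DOCTYPE declarations are not allowed');
  }

  // Comments and CDATA may contain '<' characters that are not tags.
  const body = xml.replace(/<!--[\s\S]*?-->/g, '').replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, '');

  // Start, end and empty-element tags; quoted attribute values may contain '>'.
  // Excluding '<' from the tag body keeps each scan short, even on hostile input.
  const tag = /<(\/?)[A-Za-z_][\w.:-]*(?=[\s/>])(?:[^<>"']|"[^"<]*"|'[^'<]*')*>/g;

  let depth = 0;
  let match: RegExpExecArray | null;
  while ((match = tag.exec(body)) !== null) {
    if (match[1] === '/') {
      depth--;
    } else {
      if (depth + 1 > limits.maxDepth) {
        throw new Error(`XML nesting exceeds ${limits.maxDepth} levels`);
      }
      if (!match[0].endsWith('/>')) depth++;
    }
  }
}
```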

### Concurrent Processing

The library is designed for concurrent operations:

```typescript
import pLimit from 'p-limit';

// Load multiple invoices concurrently
const files = ['invoice1.xml', 'invoice2.xml', 'invoice3.xml'];
const invoices = await Promise.all(
  files.map(file => EInvoice.fromFile(file))
);

// Validate the loaded invoices with bounded concurrency
const limit = pLimit(5); // Max 5 concurrent operations
const validationResults = await Promise.all(
  invoices.map(invoice => limit(() => invoice.validate()))
);
```
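
If adding `p-limit` as a dependency is not an option, the same bounded concurrency can be sketched without dependencies. `mapWithConcurrency` is a hypothetical helper. It starts at most `limit` workers, and each worker repeatedly claims the next unprocessed index, so results come back in input order. Like `Promise.all`, it rejects on the first failure, and workers that are already running are not cancelled.

```typescript
// Hypothetical helper: maps `items` through `task` with at most `limit` tasks in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  // Each worker claims the next index synchronously, so on the single-threaded
  // event loop no two workers can ever pick the same item.
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await task(items[index] as T, index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With the invoices loaded in the example above, the validation step would become `await mapWithConcurrency(invoices, 5, (invoice) => invoice.validate())`.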

### Memory Management

Best practices for handling large volumes:

```typescript
// Process large batches with memory control
async function processBatch<T>(
  files: string[],
  processInvoice: (file: string) => Promise<T>, // application-defined per-file handler
  batchSize = 100
): Promise<T[]> {
  const results: T[] = [];

  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map(f => processInvoice(f))
    );
    results.push(...batchResults);

    // Optionally force garbage collection between batches;
    // global.gc only exists when Node.js runs with --expose-gc
    if (global.gc) global.gc();
  }

  return results;
}
```

### Edge Case Handling

The library handles numerous edge cases:

```typescript
// Empty input
try {
  await EInvoice.fromXml(''); // Throws ParseError
} catch (e) {
  // Handle empty input
}

// Large invoices (500+ line items)
const largeInvoice = new EInvoice();
largeInvoice.items = Array.from({ length: 1000 }, (_, i) => ({
  position: i + 1,
  name: `Item ${i + 1}`,
  unitQuantity: 1,
  unitNetPrice: 10,
  vatPercentage: 19
}));
// Handled efficiently with ~136KB memory per validation

// Mixed scripts and symbols
invoice.notes = ['UTF-8: €', 'Emoji: 🚀', 'Chinese: 中文'];
// All properly encoded as UTF-8 in the output XML

// Timezone handling
invoice.issueDate = new Date('2024-01-01T00:00:00+02:00');
// A Date holds a UTC instant (here 2023-12-31T22:00:00Z); the +02:00 offset itself
// is not stored, so the calendar day written out depends on the zone used for formatting
```
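
Because a `Date` carries no offset, code that writes a date-only CII value (format 102) has to choose the zone explicitly. `formatDate102` is a hypothetical sketch, not part of the library. It takes the UTC offset in minutes as a parameter and shows how the same instant yields different calendar days.

```typescript
// Hypothetical helper: format an instant as CII format 102 (YYYYMMDD) for an
// explicitly chosen UTC offset in minutes (e.g. 60 for CET, 120 for CEST).
function formatDate102(date: Date, utcOffsetMinutes: number): string {
  // Shift the instant, then read the UTC fields, so the host machine's zone plays no role.
  const shifted = new Date(date.getTime() + utcOffsetMinutes * 60000);
  const year = String(shifted.getUTCFullYear()).padStart(4, '0');
  const month = String(shifted.getUTCMonth() + 1).padStart(2, '0');
  const day = String(shifted.getUTCDate()).padStart(2, '0');
  return `${year}${month}${day}`;
}

const issued = new Date('2024-01-01T00:00:00+02:00'); // the instant 2023-12-31T22:00:00Z
formatDate102(issued, 0);   // "20231231" in UTC
formatDate102(issued, 120); // "20240101" at UTC+02:00
```

Passing the offset explicitly makes output identical on every server, whereas `getDate()`-based formatting would depend on the process timezone.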

### Production Configuration

Recommended settings for production:

```typescript
import * as os from 'node:os';
import { ValidationLevel } from '@fin.cx/einvoice';

// Recommended production settings
const productionConfig = {
  // Validation
  validationLevel: ValidationLevel.BUSINESS,
  strictMode: true,

  // Performance
  maxConcurrency: os.cpus().length,
  cacheEnabled: true,

  // Security
  maxXmlSize: 100 * 1024 * 1024, // 100MB
  maxNestingDepth: 100,
  externalEntities: false,

  // Logging
  logLevel: 'error', // 'debug' | 'info' | 'warn' | 'error'
  logFormat: 'json'
};
```

### Integration Patterns

Common integration scenarios:

```typescript
import express from 'express';
import { EInvoice } from '@fin.cx/einvoice';

// REST API Integration
const app = express();
app.use(express.json({ limit: '10mb' }));

app.post('/invoice/convert', async (req, res) => {
  try {
    const { xml, targetFormat } = req.body;
    const invoice = await EInvoice.fromXml(xml);
    const converted = await invoice.exportXml(targetFormat);
    res.json({ success: true, xml: converted });
  } catch (error) {
    const err = error instanceof Error ? error : new Error(String(error));
    res.status(400).json({
      success: false,
      error: err.message,
      type: err.constructor.name
    });
  }
});

// Message Queue Processing
// saveToDatabase, acknowledgeMessage and handleError are application-provided helpers
interface InvoiceMessage {
  invoiceId: string;
  pdfBuffer: string; // base64-encoded PDF
}

async function processInvoiceMessage(message: InvoiceMessage) {
  const { invoiceId, pdfBuffer } = message;

  try {
    const invoice = await EInvoice.fromPdf(Buffer.from(pdfBuffer, 'base64'));
    const validation = await invoice.validate();

    await saveToDatabase(invoiceId, invoice, validation);
    await acknowledgeMessage(message);
  } catch (error) {
    await handleError(message, error);
  }
}

// Batch Processing Pipeline
// Each step is an application-defined async function that receives the previous step's result
const pipeline: Array<(input: any) => Promise<any>> = [
  extractFromPdf,
  validateInvoice,
  convertToXRechnung,
  sendToERP
];

async function runBatchPipeline(pdf: Buffer) {
  let result: unknown = pdf;
  for (const step of pipeline) {
    result = await step(result);
  }
  return result;
}
```
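
In batch pipelines it usually helps to know which step failed. The sketch below is a hypothetical, dependency-free runner. Each step receives the previous step's output, and any error is rethrown wrapped in a `PipelineError` that records the step name. `PipelineStep`, `PipelineError`, and `runPipeline` are illustrative names, not library exports.

```typescript
interface PipelineStep {
  name: string;
  run: (input: unknown) => Promise<unknown>;
}

class PipelineError extends Error {
  readonly step: string;
  readonly originalError: unknown;

  constructor(step: string, originalError: unknown) {
    super(`Pipeline failed at step "${step}"`);
    this.name = 'PipelineError';
    this.step = step;
    this.originalError = originalError;
    // Keeps instanceof working even when compiled to ES5.
    Object.setPrototypeOf(this, PipelineError.prototype);
  }
}

// Runs the steps in order, threading each result into the next step.
async function runPipeline(steps: PipelineStep[], input: unknown): Promise<unknown> {
  let value = input;
  for (const step of steps) {
    try {
      value = await step.run(value);
    } catch (error) {
      throw new PipelineError(step.name, error);
    }
  }
  return value;
}
```

Keeping the original error on `originalError` preserves the full stack for logging, while `step` gives dashboards and retry logic a stable key.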

## Troubleshooting

### Common Issues