# Project Implementation Notes

This file contains technical implementation details for PyPI and RubyGems protocols.

## Python (PyPI) Protocol Implementation ✅

### PEP 503: Simple Repository API (HTML-based)

**URL Structure:**
- Root: `//` - Lists all projects
- Project: `///` - Lists all files for a project
- All URLs MUST end with `/` (redirect if missing)

**Package Name Normalization:**
- Lowercase all characters
- Replace runs of `.`, `-`, `_` with single `-`
- Implementation: `re.sub(r"[-_.]+", "-", name).lower()`

**HTML Format:**
- Root: One anchor per project
- Project: One anchor per file
- Anchor text must match final filename
- Anchor href links to download URL

**Hash Fragments:**

Format: `#<hashname>=<hashvalue>`
- hashname: lowercase hash function name (recommend `sha256`)
- hashvalue: hex-encoded digest

**Data Attributes:**
- `data-gpg-sig`: `true`/`false` for GPG signature presence
- `data-requires-python`: PEP 345 requirement string (HTML-encode `<` as `&lt;`, `>` as `&gt;`)

### PEP 691: JSON-based Simple API

**Content Types:**
- `application/vnd.pypi.simple.v1+json` - JSON format
- `application/vnd.pypi.simple.v1+html` - HTML format
- `text/html` - Alias for HTML (backwards compat)

**Root Endpoint JSON:**
```json
{
  "meta": {"api-version": "1.0"},
  "projects": [{"name": "ProjectName"}]
}
```

**Project Endpoint JSON:**
```json
{
  "name": "normalized-name",
  "meta": {"api-version": "1.0"},
  "files": [
    {
      "filename": "package-1.0-py3-none-any.whl",
      "url": "https://example.com/path/to/file",
      "hashes": {"sha256": "..."},
      "requires-python": ">=3.7",
      "dist-info-metadata": true | {"sha256": "..."},
      "gpg-sig": true,
      "yanked": false | "reason string"
    }
  ]
}
```

**Content Negotiation:**
- Use `Accept` header for format selection
- Server responds with `Content-Type` header
- Support both JSON and HTML formats

### PyPI Upload API (Legacy /legacy/)

**Endpoint:**
- URL: `https://upload.pypi.org/legacy/`
- Method: `POST`
- Content-Type: `multipart/form-data`

**Required Form Fields:**
- `:action` = `file_upload`
- `protocol_version` = `1`
- `content` = Binary file data with filename
- `filetype` = `bdist_wheel` | `sdist`
- `pyversion` = Python tag (e.g., `py3`, `py2.py3`) or `source` for sdist
- `metadata_version` = Metadata standard version
- `name` = Package name
- `version` = Version string

**Hash Digest (one required):**
- `md5_digest`: urlsafe base64 without padding
- `sha256_digest`: hexadecimal
- `blake2_256_digest`: hexadecimal

**Optional Fields:**
- `attestations`: JSON array of attestation objects
- Any Core Metadata fields (lowercase, hyphens → underscores)
  - Example: `Description-Content-Type` → `description_content_type`

**Authentication:**
- Username/password or API token in HTTP Basic Auth
- API tokens: username = `__token__`, password = token value

**Behavior:**
- First file uploaded creates the release
- Multiple files uploaded sequentially for same version

### PEP 694: Upload 2.0 API

**Status:** Draft (not yet required, legacy API still supported)
- Multi-step workflow with sessions
- Async upload support with resumption
- JSON-based API
- Standard HTTP auth (RFC 7235)
- Not implementing initially (legacy API sufficient)

---

## Ruby (RubyGems) Protocol Implementation ✅

### Compact Index Format

**Endpoints:**
- `/versions` - Master list of all gems and versions
- `/info/` - Detailed info for specific gem
- `/names` - Simple list of gem names

**Authentication:**
- UUID tokens similar to NPM pattern
- API key in `Authorization` header
- Scope format: `rubygems:gem:{name}:{read|write|yank}`

### `/versions` File Format

**Structure:**
```
created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5
```

**Details:**
- Metadata lines before `---` delimiter
- One line per gem with comma-separated versions
- `[-]` prefix indicates yanked version
- `MD5`: Checksum of corresponding `/info/` file
- Append-only during month, recalculated monthly

### `/info/` File Format

**Structure:**
```
---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]
```

**Dependency Format:**
```
GEM:CONSTRAINT[&CONSTRAINT]
```
- Examples: `actionmailer:= 2.2.2`, `parser:>= 3.2.2.3`
- Operators: `=`, `>`, `<`, `>=`, `<=`, `~>`, `!=`
- Multiple constraints: `unicode-display_width:< 3.0&>= 2.4.0`

**Requirement Format:**
```
checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT
```

**Platform:**
- Default platform is `ruby`
- Non-default platforms: `VERSION-PLATFORM` (e.g., `3.2.1-arm64-darwin`)

**Yanked Gems:**
- Listed with `-` prefix in `/versions`
- Excluded entirely from `/info/` file

### `/names` File Format

```
---
gemname1
gemname2
gemname3
```

### HTTP Range Support

**Headers:**
- `Range: bytes=#{start}-`: Request from byte position
- `If-None-Match`: ETag conditional request
- `Repr-Digest`: SHA256 checksum in response

**Caching Strategy:**
1. Store file with last byte position
2. Request range from last position
3. Append response to existing file
4. Verify SHA256 against `Repr-Digest`

### RubyGems Upload/Management API

**Upload Gem:**
- `POST /api/v1/gems`
- Binary `.gem` file in request body
- `Authorization` header with API key

**Yank Version:**
- `DELETE /api/v1/gems/yank`
- Parameters: `gem_name`, `version`

**Unyank Version:**
- `PUT /api/v1/gems/unyank`
- Parameters: `gem_name`, `version`

**Version Metadata:**
- `GET /api/v1/versions/.json`
- Returns JSON array of versions

**Dependencies:**
- `GET /api/v1/dependencies?gems=`
- Returns dependency information for resolution

---

## Implementation Details

### Completed Protocols

- ✅ OCI Distribution Spec v1.1
- ✅ NPM Registry API
- ✅ Maven Repository
- ✅ Cargo/crates.io Registry
- ✅ Composer/Packagist
- ✅ PyPI (Python Package Index) - PEP 503/691
- ✅ RubyGems - Compact Index

### Storage Paths

**PyPI:**
```
pypi/
├── simple/                        # PEP 503 HTML files
│   ├── index.html                 # All packages list
│   └── {package}/index.html       # Package versions list
├── packages/
│   └── {package}/{filename}       # .whl and .tar.gz files
└── metadata/
    └── {package}/metadata.json    # Package metadata
```

**RubyGems:**
```
rubygems/
├── versions                       # Master versions file
├── info/{gemname}                 # Per-gem info files
├── names                          # All gem names
└── gems/{gemname}-{version}.gem   # .gem files
```

### Authentication Pattern

Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, Composer:

```typescript
// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean

createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean
```

### Scope Format

```
pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
```

### Common Patterns

1. **Package name normalization** - Critical for PyPI
2. **Checksum calculation** - SHA256 for both protocols
3. **Append-only files** - RubyGems compact index
4. **Content negotiation** - PyPI JSON vs HTML
5. **Multipart upload parsing** - PyPI file uploads
6. **Binary file handling** - Both protocols (.whl, .tar.gz, .gem)

---

## Key Differences from Existing Protocols

**PyPI vs NPM:**
- PyPI uses Simple API (HTML) + JSON API
- PyPI requires package name normalization
- PyPI uses multipart form data for uploads (not JSON)
- PyPI supports multiple file types per release (wheel + sdist)

**RubyGems vs Cargo:**
- RubyGems uses compact index (append-only text files)
- RubyGems uses checksums in index files (not just filenames)
- RubyGems has HTTP Range support for incremental updates
- RubyGems uses MD5 for index checksums, SHA256 for .gem files

---

## Testing Requirements

### PyPI Tests Must Cover:
- Package upload (wheel and sdist)
- Package name normalization
- Simple API HTML generation (PEP 503)
- JSON API responses (PEP 691)
- Content negotiation
- Hash calculation and verification
- Authentication (tokens)
- Multi-file releases
- Yanked packages

### RubyGems Tests Must Cover:
- Gem upload
- Compact index generation
- `/versions` file updates (append-only)
- `/info/` file generation
- `/names` file generation
- Checksum calculations (MD5 and SHA256)
- Platform-specific gems
- Yanking/unyanking
- HTTP Range requests
- Authentication (API keys)

---

## Security Considerations

1. **Package name validation** - Prevent path traversal
2. **File size limits** - Prevent DoS via large uploads
3. **Content-Type validation** - Verify file types
4. **Checksum verification** - Ensure file integrity
5. **Token scope enforcement** - Read vs write permissions
6. **HTML escaping** - Prevent XSS in generated HTML
7. **Metadata sanitization** - Clean user-provided strings
8. **Rate limiting** - Consider upload frequency limits

---

## Implementation Status (Completed)

### PyPI Implementation ✅

- **Files Created:**
  - `ts/pypi/interfaces.pypi.ts` - Type definitions (354 lines)
  - `ts/pypi/helpers.pypi.ts` - Helper functions (280 lines)
  - `ts/pypi/classes.pypiregistry.ts` - Main registry (650 lines)
  - `ts/pypi/index.ts` - Module exports
- **Features Implemented:**
  - ✅ PEP 503 Simple API (HTML)
  - ✅ PEP 691 JSON API
  - ✅ Content negotiation (Accept header)
  - ✅ Package name normalization
  - ✅ File upload with multipart/form-data
  - ✅ Hash verification (SHA256, MD5, Blake2b)
  - ✅ Package metadata management
  - ✅ JSON API endpoints (/pypi/{package}/json)
  - ✅ Token-based authentication
  - ✅ Scope-based permissions (read/write/delete)
- **Security Enhancements:**
  - ✅ Hash verification on upload (validates client-provided hashes)
  - ✅ Package name validation (regex check)
  - ✅ HTML escaping in generated pages
  - ✅ Permission checks on all mutating operations

### RubyGems Implementation ✅

- **Files Created:**
  - `ts/rubygems/interfaces.rubygems.ts` - Type definitions (215 lines)
  - `ts/rubygems/helpers.rubygems.ts` - Helper functions (350 lines)
  - `ts/rubygems/classes.rubygemsregistry.ts` - Main registry (580 lines)
  - `ts/rubygems/index.ts` - Module exports
- **Features Implemented:**
  - ✅ Compact Index format (modern Bundler)
  - ✅ /versions endpoint (all gems list)
  - ✅ /info/{gem} endpoint (gem-specific metadata)
  - ✅ /names endpoint (gem names list)
  - ✅ Gem upload API
  - ✅ Yank/unyank functionality
  - ✅ Platform-specific gems support
  - ✅ JSON API endpoints
  - ✅ Legacy endpoints (specs.4.8.gz, Marshal.4.8)
  - ✅ Token-based authentication
  - ✅ Scope-based permissions

### Integration ✅

- **Core Updates:**
  - ✅ Updated `IRegistryConfig` interface
  - ✅ Updated `TRegistryProtocol` type
  - ✅ Added authentication methods to `AuthManager`
  - ✅ Added 30+ storage methods to `RegistryStorage`
  - ✅ Updated `SmartRegistry` initialization and routing
  - ✅ Module exports from `ts/index.ts`
- **Test Coverage:**
  - ✅ `test/test.pypi.ts` - 25+ tests covering all PyPI endpoints
  - ✅ `test/test.rubygems.ts` - 30+ tests covering all RubyGems endpoints
  - ✅ `test/test.integration.pypi-rubygems.ts` - Integration tests
  - ✅ Updated test helpers with PyPI and RubyGems support

### Known Limitations

1. **PyPI:**
   - Does not implement legacy XML-RPC API
   - No support for PGP signatures (data-gpg-sig always false)
   - Metadata extraction from wheel files not implemented
2. **RubyGems:**
   - Gem spec extraction from .gem files returns placeholder (Ruby Marshal parsing not implemented)
   - Legacy Marshal endpoints return basic data only
   - No support for gem dependency resolution

### Configuration Example

```typescript
{
  pypi: {
    enabled: true,
    basePath: '/pypi', // Also handles /simple
  },
  rubygems: {
    enabled: true,
    basePath: '/rubygems',
  },
  auth: {
    pypiTokens: { enabled: true },
    rubygemsTokens: { enabled: true },
  }
}
```
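
### Example: PEP 503 Anchor Generation (Sketch)

The PEP 503 notes earlier in this file (name normalization, hash fragments, HTML-encoding of `data-requires-python`) can be illustrated with a minimal sketch. The function and interface names below (`normalizePypiName`, `escapeHtml`, `renderAnchor`, `SimpleFileEntry`) are hypothetical and are not taken from `ts/pypi/helpers.pypi.ts`; the `example.com` URL is a placeholder.

```typescript
// Hypothetical helpers for illustration only; the real registry code may differ.

/** PEP 503 normalization: collapse runs of ".", "-", "_" into "-" and lowercase. */
function normalizePypiName(name: string): string {
  return name.replace(/[-_.]+/g, '-').toLowerCase();
}

/** Escape a string for use in HTML text nodes and double-quoted attributes. */
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

/** Assumed shape of one file entry on a project page. */
interface SimpleFileEntry {
  filename: string;
  url: string;
  sha256: string; // hex-encoded digest
  requiresPython?: string; // e.g. ">=3.7"
}

/** Render one anchor: text is the filename, href carries the #sha256=<hex> fragment. */
function renderAnchor(file: SimpleFileEntry): string {
  const href = escapeHtml(`${file.url}#sha256=${file.sha256}`);
  const requires = file.requiresPython
    ? ` data-requires-python="${escapeHtml(file.requiresPython)}"`
    : '';
  return `<a href="${href}"${requires}>${escapeHtml(file.filename)}</a>`;
}

// Usage: normalize a project name and render an anchor for one wheel.
console.log(normalizePypiName('Foo.Bar__baz')); // foo-bar-baz
console.log(
  renderAnchor({
    filename: 'pkg-1.0-py3-none-any.whl',
    url: 'https://example.com/packages/pkg/pkg-1.0-py3-none-any.whl',
    sha256: 'abc123',
    requiresPython: '>=3.7',
  }),
);
```

Escaping the filename and the attribute values in one place also covers the "HTML escaping" item under Security Considerations, since both values originate from uploaded metadata.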