# Project Readme Hints

## Python (PyPI) Protocol Implementation Notes

### PEP 503: Simple Repository API (HTML-based)

**URL Structure:**
- Root: `/<base>/` - Lists all projects
- Project: `/<base>/<project>/` - Lists all files for a project
- All URLs MUST end with `/` (redirect if missing)

**Package Name Normalization:**
- Lowercase all characters
- Replace runs of `.`, `-`, `_` with a single `-`
- Implementation: `re.sub(r"[-_.]+", "-", name).lower()`

**HTML Format:**
- Root: One anchor per project
- Project: One anchor per file
- Anchor text must match the final filename
- Anchor href links to the download URL

**Hash Fragments:**

Format: `#<hashname>=<hashvalue>`
- hashname: lowercase hash function name (recommend `sha256`)
- hashvalue: hex-encoded digest

**Data Attributes:**
- `data-gpg-sig`: `true`/`false` for GPG signature presence
- `data-requires-python`: PEP 345 requirement string (HTML-encode `<` as `&lt;`, `>` as `&gt;`)

### PEP 691: JSON-based Simple API

**Content Types:**
- `application/vnd.pypi.simple.v1+json` - JSON format
- `application/vnd.pypi.simple.v1+html` - HTML format
- `text/html` - Alias for HTML (backwards compat)

**Root Endpoint JSON:**
```json
{
  "meta": {"api-version": "1.0"},
  "projects": [{"name": "ProjectName"}]
}
```

**Project Endpoint JSON:**
```json
{
  "name": "normalized-name",
  "meta": {"api-version": "1.0"},
  "files": [
    {
      "filename": "package-1.0-py3-none-any.whl",
      "url": "https://example.com/path/to/file",
      "hashes": {"sha256": "..."},
      "requires-python": ">=3.7",
      "dist-info-metadata": {"sha256": "..."},
      "gpg-sig": true,
      "yanked": false
    }
  ]
}
```

- `dist-info-metadata` is either `true` or a hash object such as `{"sha256": "..."}`
- `yanked` is either `false` or a reason string

**Content Negotiation:**
- Use the `Accept` header for format selection
- Server responds with a `Content-Type` header
- Support both JSON and HTML formats

### PyPI Upload API (Legacy /legacy/)

**Endpoint:**
- URL: `https://upload.pypi.org/legacy/`
- Method: `POST`
- Content-Type: `multipart/form-data`

**Required Form Fields:**
- `:action` = `file_upload`
- `protocol_version` = `1`
- `content` = Binary file data with filename
- `filetype` = `bdist_wheel` | `sdist`
- `pyversion` = Python tag (e.g., `py3`, `py2.py3`) or `source` for sdist
- `metadata_version` = Metadata standard version
- `name` = Package name
- `version` = Version string

**Hash Digest (one required):**
- `md5_digest`: urlsafe base64 without padding
- `sha256_digest`: hexadecimal
- `blake2_256_digest`: hexadecimal

**Optional Fields:**
- `attestations`: JSON array of attestation objects
- Any Core Metadata fields (lowercase, hyphens → underscores)
  - Example: `Description-Content-Type` → `description_content_type`

**Authentication:**
- Username/password or API token in HTTP Basic Auth
- API tokens: username = `__token__`, password = token value

**Behavior:**
- The first file uploaded for a version creates the release
- Additional files for the same version are uploaded sequentially, one request per file

### PEP 694: Upload 2.0 API

**Status:** Draft (not yet required, legacy API still supported)
- Multi-step workflow with sessions
- Async upload support with resumption
- JSON-based API
- Standard HTTP auth (RFC 7235)
- Not implementing initially (legacy API sufficient)

---

## Ruby (RubyGems) Protocol Implementation Notes

### Compact Index Format

**Endpoints:**
- `/versions` - Master list of all gems and versions
- `/info/<gemname>` - Detailed info for a specific gem
- `/names` - Simple list of gem names

**Authentication:**
- UUID tokens similar to NPM pattern
- API key in `Authorization` header
- Scope format: `rubygems:gem:{name}:{read|write|yank}`

### `/versions` File Format

**Structure:**
```
created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5
```

**Details:**
- Metadata lines come before the `---` delimiter
- One line per gem with comma-separated versions
- `[-]` prefix indicates a yanked version
- `MD5`: Checksum of the corresponding `/info/` file
- Append-only within a month; the file is recalculated monthly

### `/info/` File Format

**Structure:**
```
---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]
```

**Dependency Format:**
```
GEM:CONSTRAINT[&CONSTRAINT]
```
- Examples: `actionmailer:= 2.2.2`, `parser:>= 3.2.2.3`
- Operators: `=`, `>`, `<`, `>=`, `<=`, `~>`, `!=`
- Multiple constraints: `unicode-display_width:< 3.0&>= 2.4.0`

**Requirement Format:**
```
checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT
```

**Platform:**
- Default platform is `ruby`
- Non-default platforms: `VERSION-PLATFORM` (e.g., `3.2.1-arm64-darwin`)

**Yanked Gems:**
- Listed with `-` prefix in `/versions`
- Excluded entirely from the `/info/` file

### `/names` File Format

```
---
gemname1
gemname2
gemname3
```

### HTTP Range Support

**Headers:**
- `Range: bytes=#{start}-`: Request from byte position
- `If-None-Match`: ETag conditional request
- `Repr-Digest`: SHA256 checksum in response

**Caching Strategy:**
1. Store file with last byte position
2. Request range from last position
3. Append response to existing file
4. Verify SHA256 against `Repr-Digest`

### RubyGems Upload/Management API

**Upload Gem:**
- `POST /api/v1/gems`
- Binary `.gem` file in request body
- `Authorization` header with API key

**Yank Version:**
- `DELETE /api/v1/gems/yank`
- Parameters: `gem_name`, `version`

**Unyank Version:**
- `PUT /api/v1/gems/unyank`
- Parameters: `gem_name`, `version`

**Version Metadata:**
- `GET /api/v1/versions/<gemname>.json`
- Returns JSON array of versions

**Dependencies:**
- `GET /api/v1/dependencies?gems=<comma-separated gem names>`
- Returns dependency information for resolution

---

## Implementation Strategy

### Storage Paths

**PyPI:**
```
pypi/
├── simple/                       # PEP 503 HTML files
│   ├── index.html                # All packages list
│   └── {package}/index.html      # Package versions list
├── packages/
│   └── {package}/{filename}      # .whl and .tar.gz files
└── metadata/
    └── {package}/metadata.json   # Package metadata
```

**RubyGems:**
```
rubygems/
├── versions                          # Master versions file
├── info/{gemname}                    # Per-gem info files
├── names                             # All gem names
└── gems/{gemname}-{version}.gem      # .gem files
```

### Authentication Pattern

Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, Composer:

```typescript
// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean

createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean
```

### Scope Format

```
pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
```

### Common Patterns

1. **Package name normalization** - Critical for PyPI
2. **Checksum calculation** - SHA256 for both protocols
3. **Append-only files** - RubyGems compact index
4. **Content negotiation** - PyPI JSON vs HTML
5. **Multipart upload parsing** - PyPI file uploads
6. **Binary file handling** - Both protocols (.whl, .tar.gz, .gem)

---

## Key Differences from Existing Protocols

**PyPI vs NPM:**
- PyPI uses Simple API (HTML) + JSON API
- PyPI requires package name normalization
- PyPI uses multipart form data for uploads (not JSON)
- PyPI supports multiple file types per release (wheel + sdist)

**RubyGems vs Cargo:**
- RubyGems uses compact index (append-only text files)
- RubyGems uses checksums in index files (not just filenames)
- RubyGems has HTTP Range support for incremental updates
- RubyGems uses MD5 for index checksums, SHA256 for .gem files

---

## Testing Requirements

### PyPI Tests Must Cover:
- Package upload (wheel and sdist)
- Package name normalization
- Simple API HTML generation (PEP 503)
- JSON API responses (PEP 691)
- Content negotiation
- Hash calculation and verification
- Authentication (tokens)
- Multi-file releases
- Yanked packages

### RubyGems Tests Must Cover:
- Gem upload
- Compact index generation
- `/versions` file updates (append-only)
- `/info/` file generation
- `/names` file generation
- Checksum calculations (MD5 and SHA256)
- Platform-specific gems
- Yanking/unyanking
- HTTP Range requests
- Authentication (API keys)

---

## Security Considerations

1. **Package name validation** - Prevent path traversal
2. **File size limits** - Prevent DoS via large uploads
3. **Content-Type validation** - Verify file types
4. **Checksum verification** - Ensure file integrity
5. **Token scope enforcement** - Read vs write permissions
6. **HTML escaping** - Prevent XSS in generated HTML
7. **Metadata sanitization** - Clean user-provided strings
8. **Rate limiting** - Consider upload frequency limits
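
Items 1 and 6 of the security list (name validation and HTML escaping) could be combined with PEP 503 normalization roughly as follows. This is a minimal sketch: the function names are hypothetical, and the validity pattern is the PyPA name-format rule, which these notes do not themselves mandate.

```typescript
// Hypothetical helpers for illustration; none of these names come from an existing module.

/** PEP 503 normalization: collapse runs of ".", "-", "_" into "-" and lowercase. */
function normalizePypiName(name: string): string {
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

/**
 * Reject names that could escape the storage root. The pattern (letters, digits,
 * ".", "-", "_", starting and ending with a letter or digit) is the PyPA
 * name-format rule; applying it at this point is an assumption of this sketch.
 */
function isValidPypiName(name: string): boolean {
  return /^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$/i.test(name);
}

/** Escape the characters that matter inside HTML text and attribute values. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

/** Render one file anchor for a PEP 503 project page, with a sha256 hash fragment. */
function renderFileAnchor(
  filename: string,
  url: string,
  sha256: string,
  requiresPython?: string,
): string {
  const href = escapeHtml(`${url}#sha256=${sha256}`);
  const attr =
    requiresPython !== undefined
      ? ` data-requires-python="${escapeHtml(requiresPython)}"`
      : "";
  return `<a href="${href}"${attr}>${escapeHtml(filename)}</a>`;
}
```

With these helpers, `normalizePypiName("My_Package..Name")` yields `my-package-name`, and a name such as `../etc/passwd` fails `isValidPypiName` before it can reach any storage path.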
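
Token scope enforcement (item 5) depends on parsing the scope strings from the Scope Format section. The sketch below assumes exact matching, so a `write` scope does not imply `read`, and assumes PyPI names are already normalized before comparison. `parseScope` and `tokenAllows` are hypothetical names, not existing `AuthManager` methods.

```typescript
type Protocol = "pypi" | "rubygems";

interface ParsedScope {
  protocol: Protocol;
  name: string;
  action: string;
}

// Resource kind and permitted actions per protocol, taken from the scope formats
// pypi:package:{name}:{read|write} and rubygems:gem:{name}:{read|write|yank}.
const SCOPE_RULES: Record<Protocol, { kind: string; actions: string[] }> = {
  pypi: { kind: "package", actions: ["read", "write"] },
  rubygems: { kind: "gem", actions: ["read", "write", "yank"] },
};

function isProtocol(value: string): value is Protocol {
  return value === "pypi" || value === "rubygems";
}

/** Parse "protocol:kind:name:action"; returns null for anything malformed. */
function parseScope(scope: string): ParsedScope | null {
  const parts = scope.split(":");
  if (parts.length !== 4) return null;
  const [protocol, kind, name, action] = parts as [string, string, string, string];
  if (!isProtocol(protocol)) return null;
  const rule = SCOPE_RULES[protocol];
  if (kind !== rule.kind || name === "" || rule.actions.indexOf(action) === -1) {
    return null;
  }
  return { protocol, name, action };
}

/** True if any scope on the token grants exactly this action on this package or gem. */
function tokenAllows(
  scopes: string[],
  protocol: Protocol,
  name: string,
  action: string,
): boolean {
  return scopes.some((raw) => {
    const parsed = parseScope(raw);
    return (
      parsed !== null &&
      parsed.protocol === protocol &&
      parsed.name === name &&
      parsed.action === action
    );
  });
}
```

Rejecting malformed scopes outright (rather than ignoring the bad field) keeps a typo such as `pypi:package:requests:yank` from silently granting anything.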
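
The content negotiation described in the PEP 691 notes can be sketched as a small `Accept` header parser. The choices here are assumptions of the sketch rather than requirements from these notes: a missing header or `*/*` falls back to `text/html`, ties go to the first listed type, and a `null` result is left for the caller to turn into a 406 response or a default.

```typescript
const JSON_V1 = "application/vnd.pypi.simple.v1+json";
const HTML_V1 = "application/vnd.pypi.simple.v1+html";
const TEXT_HTML = "text/html";
const SUPPORTED = [JSON_V1, HTML_V1, TEXT_HTML];

/**
 * Pick the response content type for a Simple API request.
 * Returns null when the client accepts none of the supported types.
 */
function negotiateSimpleFormat(accept: string | undefined): string | null {
  // Assumption: clients that send no Accept header are legacy HTML clients.
  if (accept === undefined || accept.trim() === "") return TEXT_HTML;

  let best: string | null = null;
  let bestQ = 0;
  for (const range of accept.split(",")) {
    const pieces = range.split(";").map((p) => p.trim());
    const type = (pieces[0] ?? "").toLowerCase();

    // Quality value defaults to 1 when no q parameter is present.
    let q = 1;
    for (const param of pieces.slice(1)) {
      const match = /^q=([0-9.]+)$/i.exec(param);
      if (match) q = parseFloat(match[1] ?? "1");
    }

    const candidate =
      type === "*/*" ? TEXT_HTML : SUPPORTED.indexOf(type) !== -1 ? type : null;

    // Strictly greater, so earlier entries win ties and q=0 is never selected.
    if (candidate !== null && q > bestQ) {
      best = candidate;
      bestQ = q;
    }
  }
  return best;
}
```

The returned string can be used directly as the `Content-Type` of the response.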
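
Because the `/versions` file is append-only, a gem can appear on several lines, and a reader has to fold them in order to find the versions that are currently live. The sketch below illustrates that fold under two assumptions: a leading `-` on any listed version marks it as yanked, and the `VERSION_PLATFORM` token is kept as an opaque string rather than split. The names `parseVersionsLine` and `liveVersions` are hypothetical.

```typescript
interface VersionsLine {
  gem: string;
  versions: { version: string; yanked: boolean }[];
  infoChecksum: string; // MD5 of the corresponding /info/ file
}

/** Parse one "RUBYGEM [-]VERSION[,VERSION,...] MD5" line; null if it has another shape. */
function parseVersionsLine(line: string): VersionsLine | null {
  const fields = line.trim().split(" ");
  if (fields.length !== 3) return null;
  const [gem, versionList, infoChecksum] = fields as [string, string, string];
  const versions = versionList.split(",").map((token) => {
    const yanked = token.charAt(0) === "-";
    return { version: yanked ? token.slice(1) : token, yanked };
  });
  return { gem, versions, infoChecksum };
}

/** Fold every line after the "---" delimiter into the set of non-yanked versions per gem. */
function liveVersions(fileBody: string): Map<string, string[]> {
  const lines = fileBody.split("\n");
  const start = lines.indexOf("---") + 1; // 0 when there is no delimiter
  const live = new Map<string, string[]>();
  for (const line of lines.slice(start)) {
    const parsed = parseVersionsLine(line);
    if (parsed === null) continue; // blank or malformed line
    let current = live.get(parsed.gem) ?? [];
    for (const { version, yanked } of parsed.versions) {
      // Later lines override earlier ones: drop any previous entry, then re-add if live.
      current = current.filter((v) => v !== version);
      if (!yanked) current.push(version);
    }
    live.set(parsed.gem, current);
  }
  return live;
}
```

For a body whose lines after `---` are `rails 1.0.0,1.0.1 aaa`, `rack 2.0.0-java bbb`, and `rails -1.0.0 ccc`, the fold leaves `rails` with `1.0.1` only and `rack` with `2.0.0-java`.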