Project Implementation Notes

This file contains technical implementation details for PyPI and RubyGems protocols.

Python (PyPI) Protocol Implementation

PEP 503: Simple Repository API (HTML-based)

URL Structure:

  • Root: /<base>/ - Lists all projects
  • Project: /<base>/<project>/ - Lists all files for a project
  • All URLs MUST end with / (redirect if missing)

Package Name Normalization:

  • Lowercase all characters
  • Replace runs of ., -, _ with single -
  • Implementation: re.sub(r"[-_.]+", "-", name).lower()
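
Since the registry itself is written in TypeScript, the same rule can be sketched there as well. The function name `normalizePypiName` is a hypothetical choice for illustration, not necessarily the helper used in the codebase.

```typescript
// Hypothetical helper mirroring the Python one-liner above:
// collapse runs of ".", "-" and "_" into a single "-" and lowercase the result.
function normalizePypiName(name: string): string {
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

// "Friendly_.Bard" and "friendly-bard" map to the same project key.
const normalizedExample = normalizePypiName("Friendly_.Bard"); // "friendly-bard"
```

Normalizing once at the routing layer keeps lookups, storage keys, and scope checks consistent with each other.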

HTML Format:

  • Root: One anchor per project
  • Project: One anchor per file
  • Anchor text must match final filename
  • Anchor href links to download URL

Hash Fragments:

  • Format: #<hashname>=<hashvalue>
  • hashname: lowercase hash function name (recommend sha256)
  • hashvalue: hex-encoded digest

Data Attributes:

  • data-gpg-sig: true/false for GPG signature presence
  • data-requires-python: PEP 345 requirement string (HTML-encode < as &lt;, > as &gt;)
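
As a rough illustration of how the anchor rules, hash fragment, and data attributes fit together, here is a minimal sketch. The `SimpleFile` shape and the function names are assumptions made for this example only.

```typescript
// Hypothetical input shape for one downloadable file.
interface SimpleFile {
  filename: string;
  url: string;
  sha256?: string;         // hex-encoded digest, if known
  requiresPython?: string; // e.g. ">=3.7"
}

// Escape characters that are unsafe inside HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one PEP 503 anchor: the text is the filename, the href is the
// download URL with an optional #sha256=<hex> fragment.
function renderFileAnchor(file: SimpleFile): string {
  const href = file.sha256 ? `${file.url}#sha256=${file.sha256}` : file.url;
  const attrs = [`href="${escapeHtml(href)}"`];
  if (file.requiresPython !== undefined) {
    attrs.push(`data-requires-python="${escapeHtml(file.requiresPython)}"`);
  }
  return `<a ${attrs.join(" ")}>${escapeHtml(file.filename)}</a>`;
}
```

Escaping the attribute value covers the `<` and `>` encoding requirement for data-requires-python and also guards against markup injection through filenames.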

PEP 691: JSON-based Simple API

Content Types:

  • application/vnd.pypi.simple.v1+json - JSON format
  • application/vnd.pypi.simple.v1+html - HTML format
  • text/html - Alias for HTML (backwards compat)

Root Endpoint JSON:

{
  "meta": {"api-version": "1.0"},
  "projects": [{"name": "ProjectName"}]
}

Project Endpoint JSON:

{
  "name": "normalized-name",
  "meta": {"api-version": "1.0"},
  "files": [
    {
      "filename": "package-1.0-py3-none-any.whl",
      "url": "https://example.com/path/to/file",
      "hashes": {"sha256": "..."},
      "requires-python": ">=3.7",
      "dist-info-metadata": true | {"sha256": "..."},
      "gpg-sig": true,
      "yanked": false | "reason string"
    }
  ]
}
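
The two JSON shapes above could be typed roughly as follows. The interface and function names are illustrative assumptions; only the field names come from the examples above.

```typescript
// Field names follow the JSON examples above; type names are hypothetical.
interface SimpleMeta {
  "api-version": string;
}

interface SimpleRootIndex {
  meta: SimpleMeta;
  projects: { name: string }[];
}

interface SimpleFileEntry {
  filename: string;
  url: string;
  hashes: Record<string, string>;
  "requires-python"?: string;
  "dist-info-metadata"?: boolean | Record<string, string>;
  "gpg-sig"?: boolean;
  yanked?: boolean | string; // false, true, or a reason string
}

interface SimpleProjectIndex {
  name: string;
  meta: SimpleMeta;
  files: SimpleFileEntry[];
}

// Build the root endpoint payload from a list of project names.
function buildRootIndex(names: string[]): SimpleRootIndex {
  return { meta: { "api-version": "1.0" }, projects: names.map((name) => ({ name })) };
}

// Build the project endpoint payload; the caller passes an already normalized name.
function buildProjectIndex(normalizedName: string, files: SimpleFileEntry[]): SimpleProjectIndex {
  return { name: normalizedName, meta: { "api-version": "1.0" }, files };
}
```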

Content Negotiation:

  • Use Accept header for format selection
  • Server responds with Content-Type header
  • Support both JSON and HTML formats
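
One possible way to pick the response format from the Accept header is sketched below. The function name, the HTML default for a missing header, and returning `null` to signal an unsupported request are all assumptions of this sketch.

```typescript
const JSON_TYPE = "application/vnd.pypi.simple.v1+json";
const HTML_TYPE = "application/vnd.pypi.simple.v1+html";

interface Negotiated {
  format: "json" | "html";
  contentType: string;
}

// Pick the best supported media type from an Accept header, honoring q-values.
// Assumption: a missing or empty header falls back to plain text/html.
function negotiateSimpleFormat(accept: string | undefined): Negotiated | null {
  if (accept === undefined || accept.trim() === "") {
    return { format: "html", contentType: "text/html" };
  }
  const ranked = accept
    .split(",")
    .map((part) => {
      const pieces = part.split(";").map((s) => s.trim());
      const type = (pieces[0] ?? "").toLowerCase();
      const qParam = pieces.slice(1).find((p) => p.startsWith("q="));
      const q = qParam === undefined ? 1 : Number(qParam.slice(2));
      return { type, q: Number.isNaN(q) ? 0 : q };
    })
    .filter((candidate) => candidate.q > 0)
    .sort((a, b) => b.q - a.q);

  for (const candidate of ranked) {
    if (candidate.type === JSON_TYPE) return { format: "json", contentType: JSON_TYPE };
    if (candidate.type === HTML_TYPE) return { format: "html", contentType: HTML_TYPE };
    if (candidate.type === "text/html" || candidate.type === "*/*") {
      return { format: "html", contentType: "text/html" };
    }
  }
  return null; // nothing acceptable; the caller could answer 406 Not Acceptable
}
```

The returned `contentType` is what the server would echo back in its Content-Type header.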

PyPI Upload API (Legacy /legacy/)

Endpoint:

  • URL: https://upload.pypi.org/legacy/
  • Method: POST
  • Content-Type: multipart/form-data

Required Form Fields:

  • :action = file_upload
  • protocol_version = 1
  • content = Binary file data with filename
  • filetype = bdist_wheel | sdist
  • pyversion = Python tag (e.g., py3, py2.py3) or source for sdist
  • metadata_version = Metadata standard version
  • name = Package name
  • version = Version string

Hash Digest (one required):

  • md5_digest: urlsafe base64 without padding
  • sha256_digest: hexadecimal
  • blake2_256_digest: hexadecimal

Optional Fields:

  • attestations: JSON array of attestation objects
  • Any Core Metadata fields (lowercase, hyphens → underscores)
    • Example: Description-Content-Type → description_content_type
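
The form-field rules above lend themselves to a small validation sketch. The helper names are hypothetical, and `form` stands in for whatever the multipart parser produces.

```typescript
// Required form fields as listed above.
const REQUIRED_UPLOAD_FIELDS = [
  ":action",
  "protocol_version",
  "content",
  "filetype",
  "pyversion",
  "metadata_version",
  "name",
  "version",
] as const;

const DIGEST_FIELDS = ["md5_digest", "sha256_digest", "blake2_256_digest"] as const;

// Convert a Core Metadata header name to its form-field name:
// lowercase, hyphens replaced by underscores.
function toUploadFieldName(coreMetadataName: string): string {
  return coreMetadataName.toLowerCase().replace(/-/g, "_");
}

// Return the required fields that are absent or empty in a parsed form.
function missingUploadFields(form: Record<string, unknown>): string[] {
  return REQUIRED_UPLOAD_FIELDS.filter((field) => form[field] === undefined || form[field] === "");
}

// At least one digest field must be present.
function hasAnyDigest(form: Record<string, unknown>): boolean {
  return DIGEST_FIELDS.some((field) => typeof form[field] === "string" && form[field] !== "");
}
```

A handler could reject the upload with a 400 response when `missingUploadFields` is non-empty or `hasAnyDigest` is false, before touching storage.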

Authentication:

  • Username/password or API token in HTTP Basic Auth
  • API tokens: username = __token__, password = token value

Behavior:

  • First file uploaded creates the release
  • Multiple files uploaded sequentially for same version

PEP 694: Upload 2.0 API

Status: Draft (not yet required, legacy API still supported)

  • Multi-step workflow with sessions
  • Async upload support with resumption
  • JSON-based API
  • Standard HTTP auth (RFC 7235)
  • Not implementing initially (legacy API sufficient)

Ruby (RubyGems) Protocol Implementation

Compact Index Format

Endpoints:

  • /versions - Master list of all gems and versions
  • /info/<RUBYGEM> - Detailed info for specific gem
  • /names - Simple list of gem names

Authentication:

  • UUID tokens similar to NPM pattern
  • API key in Authorization header
  • Scope format: rubygems:gem:{name}:{read|write|yank}

/versions File Format

Structure:

created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5

Details:

  • Metadata lines before --- delimiter
  • One line per gem with comma-separated versions
  • [-] prefix indicates yanked version
  • MD5: Checksum of corresponding /info/<RUBYGEM> file
  • Append-only within a month; the whole file is recalculated monthly
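
To make the line format concrete, here is a minimal parser sketch for the /versions file. The type and function names are assumptions for illustration, and malformed lines are simply skipped.

```typescript
interface GemVersionRef {
  version: string; // VERSION or VERSION-PLATFORM, without the yank marker
  yanked: boolean;
}

interface VersionsEntry {
  name: string;
  versions: GemVersionRef[];
  infoMd5: string; // MD5 of the matching /info/<RUBYGEM> file
}

interface VersionsFile {
  createdAt: string | null;
  entries: VersionsEntry[];
}

// Parse a /versions document: metadata lines, a "---" delimiter, then one
// "RUBYGEM VERSIONS MD5" line per entry.
function parseVersionsFile(text: string): VersionsFile {
  const lines = text.split("\n");
  const sep = lines.indexOf("---");
  const header = sep >= 0 ? lines.slice(0, sep) : [];
  const body = sep >= 0 ? lines.slice(sep + 1) : lines;

  const created = header.find((line) => line.startsWith("created_at: "));
  const entries: VersionsEntry[] = [];
  for (const line of body) {
    const parts = line.trim().split(" ");
    if (parts.length !== 3) continue; // skip blank or malformed lines
    const [name, versionList, infoMd5] = parts as [string, string, string];
    const versions = versionList.split(",").map((v) =>
      v.startsWith("-") ? { version: v.slice(1), yanked: true } : { version: v, yanked: false },
    );
    entries.push({ name, versions, infoMd5 });
  }
  return { createdAt: created ? created.slice("created_at: ".length) : null, entries };
}
```

Because the file is append-only, a gem can appear on several lines; a consumer would merge entries by name, with later lines taking precedence for the MD5.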

/info/<RUBYGEM> File Format

Structure:

---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]

Dependency Format:

GEM:CONSTRAINT[&CONSTRAINT]
  • Examples: actionmailer:= 2.2.2, parser:>= 3.2.2.3
  • Operators: =, >, <, >=, <=, ~>, !=
  • Multiple constraints: unicode-display_width:< 3.0&>= 2.4.0

Requirement Format:

checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT

Platform:

  • Default platform is ruby
  • Non-default platforms: VERSION-PLATFORM (e.g., 3.2.1-arm64-darwin)
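
A single /info line can be taken apart as sketched below. The names are hypothetical, and the sketch assumes the first hyphen in the leading token separates version from platform.

```typescript
interface InfoLine {
  version: string;
  platform: string; // "ruby" when no platform suffix is present
  dependencies: { name: string; constraints: string[] }[];
  requirements: Record<string, string>; // e.g. checksum, ruby, rubygems
}

// Split on the first occurrence of `sep`; the second element is "" if absent.
function splitOnce(input: string, sep: string): [string, string] {
  const index = input.indexOf(sep);
  return index < 0 ? [input, ""] : [input.slice(0, index), input.slice(index + sep.length)];
}

// Parse "VERSION[-PLATFORM] [DEPS]|REQS". Constraints contain spaces
// ("= 2.2.2"), so only the first space separates the version from the rest.
function parseInfoLine(line: string): InfoLine {
  const [versionPart, rest] = splitOnce(line, " ");
  const [version, platform] = splitOnce(versionPart, "-");
  const [depsPart, reqsPart] = splitOnce(rest, "|");

  const dependencies = depsPart
    .split(",")
    .filter((dep) => dep !== "")
    .map((dep) => {
      const [name, constraint] = splitOnce(dep, ":");
      return { name, constraints: constraint.split("&").map((c) => c.trim()) };
    });

  const requirements: Record<string, string> = {};
  for (const req of reqsPart.split(",").filter((r) => r !== "")) {
    const [key, value] = splitOnce(req, ":");
    requirements[key] = value;
  }
  return { version, platform: platform === "" ? "ruby" : platform, dependencies, requirements };
}
```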

Yanked Gems:

  • Listed with - prefix in /versions
  • Excluded entirely from /info/<RUBYGEM> file

/names File Format

---
gemname1
gemname2
gemname3

HTTP Range Support

Headers:

  • Range: bytes=#{start}-: Request from byte position
  • If-None-Match: ETag conditional request
  • Repr-Digest: SHA256 checksum in response

Caching Strategy:

  1. Store file with last byte position
  2. Request range from last position
  3. Append response to existing file
  4. Verify SHA256 against Repr-Digest
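
The client-side steps above can be sketched with a few pure helpers. All names here are hypothetical; the handling of 200, 304, and 416 responses is an assumption of this sketch, and the SHA256 computation itself is left to the caller.

```typescript
// Step 2: ask only for bytes past what is already cached.
function rangeHeaderFor(cachedLength: number): string {
  return `bytes=${cachedLength}-`;
}

// Step 3: combine the cached bytes with the server's answer.
// Assumptions: 304 (ETag matched) and 416 (no new bytes) keep the cache as-is,
// and 200 means the server ignored the Range header and sent the full file.
function mergeRangeResponse(cached: Uint8Array, status: number, body: Uint8Array): Uint8Array {
  if (status === 304 || status === 416) return cached;
  if (status === 200) return body;
  if (status === 206) {
    const merged = new Uint8Array(cached.length + body.length);
    merged.set(cached, 0);
    merged.set(body, cached.length);
    return merged;
  }
  throw new Error(`unexpected status ${status}`);
}

// Step 4: extract the base64 SHA-256 value from a Repr-Digest header of the
// form "sha-256=:<base64>:". The caller compares it with its own digest of
// the merged bytes and refetches the whole file on mismatch.
function reprDigestSha256(header: string): string | null {
  const match = /sha-256=:([A-Za-z0-9+\/]+=*):/.exec(header);
  return match?.[1] ?? null;
}
```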

RubyGems Upload/Management API

Upload Gem:

  • POST /api/v1/gems
  • Binary .gem file in request body
  • Authorization header with API key

Yank Version:

  • DELETE /api/v1/gems/yank
  • Parameters: gem_name, version

Unyank Version:

  • PUT /api/v1/gems/unyank
  • Parameters: gem_name, version

Version Metadata:

  • GET /api/v1/versions/<gem>.json
  • Returns JSON array of versions

Dependencies:

  • GET /api/v1/dependencies?gems=<comma-list>
  • Returns dependency information for resolution

Implementation Details

Completed Protocols

  • OCI Distribution Spec v1.1
  • NPM Registry API
  • Maven Repository
  • Cargo/crates.io Registry
  • Composer/Packagist
  • PyPI (Python Package Index) - PEP 503/691
  • RubyGems - Compact Index

Storage Paths

PyPI:

pypi/
├── simple/                          # PEP 503 HTML files
│   ├── index.html                  # All packages list
│   └── {package}/index.html        # Package versions list
├── packages/
│   └── {package}/{filename}        # .whl and .tar.gz files
└── metadata/
    └── {package}/metadata.json     # Package metadata

RubyGems:

rubygems/
├── versions                         # Master versions file
├── info/{gemname}                   # Per-gem info files
├── names                            # All gem names
└── gems/{gemname}-{version}.gem    # .gem files
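
The layouts above map naturally onto a small set of path builders. This is only a sketch: `storagePaths` and `safeSegment` are hypothetical names, and the segment check is a minimal guard rather than full validation.

```typescript
// Reject segments that could escape the intended directory.
function safeSegment(segment: string): string {
  if (segment === "" || segment === "." || segment === ".." || /[\/\\]/.test(segment)) {
    throw new Error(`unsafe path segment: ${JSON.stringify(segment)}`);
  }
  return segment;
}

// Path builders following the directory trees shown above.
const storagePaths = {
  pypiSimpleRoot: (): string => "pypi/simple/index.html",
  pypiSimpleProject: (pkg: string): string => `pypi/simple/${safeSegment(pkg)}/index.html`,
  pypiFile: (pkg: string, filename: string): string =>
    `pypi/packages/${safeSegment(pkg)}/${safeSegment(filename)}`,
  pypiMetadata: (pkg: string): string => `pypi/metadata/${safeSegment(pkg)}/metadata.json`,
  gemVersions: (): string => "rubygems/versions",
  gemNames: (): string => "rubygems/names",
  gemInfo: (gem: string): string => `rubygems/info/${safeSegment(gem)}`,
  gemFile: (gem: string, version: string): string =>
    `rubygems/gems/${safeSegment(gem)}-${safeSegment(version)}.gem`,
};
```

Keeping every path behind one helper object makes it easier to apply the same traversal check to all user-supplied names.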

Authentication Pattern

Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, Composer:

// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean

createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean

Scope Format

pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
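
A scope string in this format can be parsed and checked with a sketch like the following. The function names are assumptions, and the check is a plain exact match with no wildcard support.

```typescript
interface ParsedScope {
  protocol: "pypi" | "rubygems";
  name: string;
  action: "read" | "write" | "yank";
}

// Parse a scope string in one of the two formats above; null if malformed.
function parseScope(scope: string): ParsedScope | null {
  const pypi = /^pypi:package:([^:]+):(read|write)$/.exec(scope);
  if (pypi) {
    return { protocol: "pypi", name: pypi[1] as string, action: pypi[2] as "read" | "write" };
  }
  const gem = /^rubygems:gem:([^:]+):(read|write|yank)$/.exec(scope);
  if (gem) {
    return { protocol: "rubygems", name: gem[1] as string, action: gem[2] as "read" | "write" | "yank" };
  }
  return null;
}

// Exact-match check: the token must carry the required, well-formed scope.
function scopeAllows(granted: string[], required: string): boolean {
  return parseScope(required) !== null && granted.indexOf(required) >= 0;
}
```

Note that `yank` is only accepted for the rubygems format, matching the two patterns above.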

Common Patterns

  1. Package name normalization - Critical for PyPI
  2. Checksum calculation - SHA256 for both protocols
  3. Append-only files - RubyGems compact index
  4. Content negotiation - PyPI JSON vs HTML
  5. Multipart upload parsing - PyPI file uploads
  6. Binary file handling - Both protocols (.whl, .tar.gz, .gem)

Key Differences from Existing Protocols

PyPI vs NPM:

  • PyPI uses Simple API (HTML) + JSON API
  • PyPI requires package name normalization
  • PyPI uses multipart form data for uploads (not JSON)
  • PyPI supports multiple file types per release (wheel + sdist)

RubyGems vs Cargo:

  • RubyGems uses compact index (append-only text files)
  • RubyGems uses checksums in index files (not just filenames)
  • RubyGems has HTTP Range support for incremental updates
  • RubyGems uses MD5 for index checksums, SHA256 for .gem files

Testing Requirements

PyPI Tests Must Cover:

  • Package upload (wheel and sdist)
  • Package name normalization
  • Simple API HTML generation (PEP 503)
  • JSON API responses (PEP 691)
  • Content negotiation
  • Hash calculation and verification
  • Authentication (tokens)
  • Multi-file releases
  • Yanked packages

RubyGems Tests Must Cover:

  • Gem upload
  • Compact index generation
  • /versions file updates (append-only)
  • /info/<gem> file generation
  • /names file generation
  • Checksum calculations (MD5 and SHA256)
  • Platform-specific gems
  • Yanking/unyanking
  • HTTP Range requests
  • Authentication (API keys)

Security Considerations

  1. Package name validation - Prevent path traversal
  2. File size limits - Prevent DoS via large uploads
  3. Content-Type validation - Verify file types
  4. Checksum verification - Ensure file integrity
  5. Token scope enforcement - Read vs write permissions
  6. HTML escaping - Prevent XSS in generated HTML
  7. Metadata sanitization - Clean user-provided strings
  8. Rate limiting - Consider upload frequency limits

Implementation Status (Completed)

PyPI Implementation

  • Files Created:

    • ts/pypi/interfaces.pypi.ts - Type definitions (354 lines)
    • ts/pypi/helpers.pypi.ts - Helper functions (280 lines)
    • ts/pypi/classes.pypiregistry.ts - Main registry (650 lines)
    • ts/pypi/index.ts - Module exports
  • Features Implemented:

    • PEP 503 Simple API (HTML)
    • PEP 691 JSON API
    • Content negotiation (Accept header)
    • Package name normalization
    • File upload with multipart/form-data
    • Hash verification (SHA256, MD5, Blake2b)
    • Package metadata management
    • JSON API endpoints (/pypi/{package}/json)
    • Token-based authentication
    • Scope-based permissions (read/write/delete)
  • Security Enhancements:

    • Hash verification on upload (validates client-provided hashes)
    • Package name validation (regex check)
    • HTML escaping in generated pages
    • Permission checks on all mutating operations

RubyGems Implementation

  • Files Created:

    • ts/rubygems/interfaces.rubygems.ts - Type definitions (215 lines)
    • ts/rubygems/helpers.rubygems.ts - Helper functions (350 lines)
    • ts/rubygems/classes.rubygemsregistry.ts - Main registry (580 lines)
    • ts/rubygems/index.ts - Module exports
  • Features Implemented:

    • Compact Index format (modern Bundler)
    • /versions endpoint (all gems list)
    • /info/{gem} endpoint (gem-specific metadata)
    • /names endpoint (gem names list)
    • Gem upload API
    • Yank/unyank functionality
    • Platform-specific gems support
    • JSON API endpoints
    • Legacy endpoints (specs.4.8.gz, Marshal.4.8)
    • Token-based authentication
    • Scope-based permissions

Integration

  • Core Updates:

    • Updated IRegistryConfig interface
    • Updated TRegistryProtocol type
    • Added authentication methods to AuthManager
    • Added 30+ storage methods to RegistryStorage
    • Updated SmartRegistry initialization and routing
    • Module exports from ts/index.ts
  • Test Coverage:

    • test/test.pypi.ts - 25+ tests covering all PyPI endpoints
    • test/test.rubygems.ts - 30+ tests covering all RubyGems endpoints
    • test/test.integration.pypi-rubygems.ts - Integration tests
    • Updated test helpers with PyPI and RubyGems support

Known Limitations

  1. PyPI:

    • Does not implement legacy XML-RPC API
    • No support for PGP signatures (data-gpg-sig always false)
    • Metadata extraction from wheel files not implemented
  2. RubyGems:

    • Gem spec extraction from .gem files returns placeholder (Ruby Marshal parsing not implemented)
    • Legacy Marshal endpoints return basic data only
    • No support for gem dependency resolution

Configuration Example

{
  pypi: {
    enabled: true,
    basePath: '/pypi', // Also handles /simple
  },
  rubygems: {
    enabled: true,
    basePath: '/rubygems',
  },
  auth: {
    pypiTokens: { enabled: true },
    rubygemsTokens: { enabled: true },
  }
}