Project Readme Hints

Python (PyPI) Protocol Implementation Notes

PEP 503: Simple Repository API (HTML-based)

URL Structure:

  • Root: /<base>/ - Lists all projects
  • Project: /<base>/<project>/ - Lists all files for a project
  • All URLs serving HTML pages MUST end with /; requests missing the trailing slash SHOULD be redirected

Package Name Normalization:

  • Lowercase all characters
  • Replace runs of ., -, _ with single -
  • Implementation: re.sub(r"[-_.]+", "-", name).lower()
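A minimal TypeScript equivalent, as a sketch; `normalizePypiName` is a hypothetical helper name rather than an existing smartregistry export.

```typescript
// Hypothetical helper: PEP 503 project-name normalization.
// Collapses every run of "-", "_" or "." into a single "-" and lowercases the result,
// mirroring re.sub(r"[-_.]+", "-", name).lower().
function normalizePypiName(name: string): string {
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

// All three spellings resolve to the same /<base>/<project>/ URL.
console.log(normalizePypiName("Friendly-Bard")); // friendly-bard
console.log(normalizePypiName("FRIENDLY__BARD")); // friendly-bard
console.log(normalizePypiName("friendly.-_bard")); // friendly-bard
```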

HTML Format:

  • Root: One anchor per project
  • Project: One anchor per file
  • Anchor text must match final filename
  • Anchor href links to download URL
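A sketch of the root page under these rules, assuming a hypothetical `renderSimpleRoot` helper and hrefs resolved relative to /<base>/. The anchor text keeps the display name while the href uses the normalized name plus the required trailing slash; a project page follows the same pattern with one anchor per file.

```typescript
// Escape text for HTML element content and double-quoted attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical PEP 503 root page: one anchor per project.
// Anchor text = project name; href = normalized name + trailing slash (relative to /<base>/).
function renderSimpleRoot(projects: string[]): string {
  const anchors = projects.map((name) => {
    const normalized = name.replace(/[-_.]+/g, "-").toLowerCase();
    return `    <a href="${escapeHtml(normalized)}/">${escapeHtml(name)}</a><br/>`;
  });
  return [
    "<!DOCTYPE html>",
    "<html>",
    "  <head><title>Simple index</title></head>",
    "  <body>",
    ...anchors,
    "  </body>",
    "</html>",
    "",
  ].join("\n");
}

console.log(renderSimpleRoot(["Friendly.Bard", "requests"]));
```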

Hash Fragments:

  • Format: #<hashname>=<hashvalue>
  • hashname: lowercase hash function name (recommend sha256)
  • hashvalue: hex-encoded digest

Data Attributes:

  • data-gpg-sig: true or false, indicating whether a GPG signature exists for the file
  • data-requires-python: the file's Requires-Python specifier (PEP 345); HTML-encode < as &lt; and > as &gt;
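Putting the hash fragment and data attributes together for a single file link, as a sketch. `SimpleFileLink` and `renderFileAnchor` are hypothetical names; the sketch always uses sha256 and omits attributes whose values are unknown. Escaping `<` and `>` in attribute values also covers the requires-python encoding rule.

```typescript
// Hypothetical input record for one file on a project page.
interface SimpleFileLink {
  filename: string;
  url: string; // download URL without a fragment
  sha256: string; // hex-encoded digest
  requiresPython?: string; // e.g. ">=3.7"
  hasGpgSig?: boolean;
}

// Escape text for HTML element content and double-quoted attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderFileAnchor(file: SimpleFileLink): string {
  const href = `${file.url}#sha256=${file.sha256}`; // #<hashname>=<hashvalue>
  const attrs = [`href="${escapeHtml(href)}"`];
  if (file.requiresPython !== undefined) {
    attrs.push(`data-requires-python="${escapeHtml(file.requiresPython)}"`);
  }
  if (file.hasGpgSig !== undefined) {
    attrs.push(`data-gpg-sig="${file.hasGpgSig ? "true" : "false"}"`);
  }
  // Anchor text is the filename, which should equal the URL's last path component.
  return `<a ${attrs.join(" ")}>${escapeHtml(file.filename)}</a>`;
}

console.log(
  renderFileAnchor({
    filename: "pkg-1.0-py3-none-any.whl",
    url: "https://example.com/packages/pkg-1.0-py3-none-any.whl",
    sha256: "abc123",
    requiresPython: ">=3.7",
  }),
);
```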

PEP 691: JSON-based Simple API

Content Types:

  • application/vnd.pypi.simple.v1+json - JSON format
  • application/vnd.pypi.simple.v1+html - HTML format
  • text/html - Alias for HTML (backwards compat)

Root Endpoint JSON:

{
  "meta": {"api-version": "1.0"},
  "projects": [{"name": "ProjectName"}]
}

Project Endpoint JSON:

{
  "name": "normalized-name",
  "meta": {"api-version": "1.0"},
  "files": [
    {
      "filename": "package-1.0-py3-none-any.whl",
      "url": "https://example.com/path/to/file",
      "hashes": {"sha256": "..."},
      "requires-python": ">=3.7",
      "dist-info-metadata": {"sha256": "..."},
      "gpg-sig": true,
      "yanked": false
    }
  ]
}

dist-info-metadata is either true or an object mapping hash names to hex digests; yanked is either a boolean or a non-empty reason string.
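Typed builders for both JSON documents, as a sketch assuming hypothetical `buildRootJson` and `buildProjectJson` helpers; the serialized result would be sent with Content-Type application/vnd.pypi.simple.v1+json.

```typescript
// Shapes taken from the PEP 691 examples above; optional keys may be omitted.
interface Pep691File {
  filename: string;
  url: string;
  hashes: Record<string, string>;
  "requires-python"?: string;
  "dist-info-metadata"?: boolean | Record<string, string>;
  "gpg-sig"?: boolean;
  yanked?: boolean | string;
}

interface Pep691Project {
  name: string; // normalized project name
  meta: { "api-version": string };
  files: Pep691File[];
}

function buildRootJson(projects: string[]) {
  return { meta: { "api-version": "1.0" }, projects: projects.map((name) => ({ name })) };
}

function buildProjectJson(rawName: string, files: Pep691File[]): Pep691Project {
  return {
    name: rawName.replace(/[-_.]+/g, "-").toLowerCase(),
    meta: { "api-version": "1.0" },
    files,
  };
}

const body = JSON.stringify(
  buildProjectJson("Friendly.Bard", [
    {
      filename: "friendly_bard-1.0-py3-none-any.whl",
      url: "https://example.com/packages/friendly_bard-1.0-py3-none-any.whl",
      hashes: { sha256: "..." },
      "requires-python": ">=3.7",
      yanked: false,
    },
  ]),
);
console.log(body);
```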

Content Negotiation:

  • Use Accept header for format selection
  • Server responds with Content-Type header
  • Support both JSON and HTML formats
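One way to pick the response format from the Accept header, as a sketch. `negotiateSimpleFormat` is hypothetical; defaulting to text/html for a missing header or a wildcard is an assumption chosen for backwards compatibility, and a null result could be answered with 406 Not Acceptable.

```typescript
// Media types from the PEP 691 list above.
const SIMPLE_TYPES = [
  "application/vnd.pypi.simple.v1+json",
  "application/vnd.pypi.simple.v1+html",
  "text/html",
] as const;
type SimpleType = (typeof SIMPLE_TYPES)[number];

// Assumption: missing Accept headers and wildcards get the legacy HTML format.
const DEFAULT_SIMPLE_TYPE: SimpleType = "text/html";

function negotiateSimpleFormat(accept: string | undefined): SimpleType | null {
  if (accept === undefined || accept.trim() === "") return DEFAULT_SIMPLE_TYPE;
  let best: SimpleType | null = null;
  let bestQ = 0;
  for (const entry of accept.split(",")) {
    const [mediaRange, ...params] = entry.split(";").map((s) => s.trim().toLowerCase());
    let q = 1;
    for (const param of params) {
      const [key, value] = param.split("=").map((s) => s.trim());
      if (key === "q") q = Number(value);
    }
    if (!(q > 0)) continue; // q=0 means "not acceptable"; unparsable q values are skipped
    let candidate: SimpleType | null = null;
    if ((SIMPLE_TYPES as readonly string[]).includes(mediaRange)) {
      candidate = mediaRange as SimpleType;
    } else if (mediaRange === "*/*" || mediaRange === "text/*") {
      candidate = DEFAULT_SIMPLE_TYPE;
    }
    // Strictly greater: on equal q, the entry listed first in the header wins.
    if (candidate !== null && q > bestQ) {
      best = candidate;
      bestQ = q;
    }
  }
  return best;
}
```

For example, pip-style headers that list the JSON type first with q=1 get JSON, while a plain browser request falls through to text/html.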

PyPI Upload API (Legacy /legacy/)

Endpoint:

  • URL: https://upload.pypi.org/legacy/
  • Method: POST
  • Content-Type: multipart/form-data

Required Form Fields:

  • :action = file_upload
  • protocol_version = 1
  • content = Binary file data with filename
  • filetype = bdist_wheel | sdist
  • pyversion = Python tag (e.g., py3, py2.py3) or source for sdist
  • metadata_version = Metadata standard version
  • name = Package name
  • version = Version string

Hash Digest (one required):

  • md5_digest: urlsafe base64 without padding
  • sha256_digest: hexadecimal
  • blake2_256_digest: hexadecimal
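A validation pass over a parsed upload, as a sketch assuming a hypothetical `LegacyUpload` shape produced by whatever multipart parser the server uses. Only the hex sha256_digest is checked against the content; md5 and blake2 verification are left out.

```typescript
import { createHash } from "node:crypto";

// Required text fields from the list above; `content` is checked separately as bytes.
const REQUIRED_UPLOAD_FIELDS = [
  ":action",
  "protocol_version",
  "filetype",
  "pyversion",
  "metadata_version",
  "name",
  "version",
] as const;

const DIGEST_FIELDS = ["md5_digest", "sha256_digest", "blake2_256_digest"] as const;

// Hypothetical shape: form fields plus the bytes of the `content` file part.
interface LegacyUpload {
  fields: Record<string, string>;
  content: Uint8Array;
}

function validateLegacyUpload(upload: LegacyUpload): string[] {
  const { fields, content } = upload;
  const errors: string[] = [];
  for (const key of REQUIRED_UPLOAD_FIELDS) {
    if (!fields[key]) errors.push(`missing field: ${key}`);
  }
  if (fields[":action"] && fields[":action"] !== "file_upload") {
    errors.push(`unsupported :action: ${fields[":action"]}`);
  }
  if (fields.protocol_version && fields.protocol_version !== "1") {
    errors.push(`unsupported protocol_version: ${fields.protocol_version}`);
  }
  if (fields.filetype && fields.filetype !== "bdist_wheel" && fields.filetype !== "sdist") {
    errors.push(`unsupported filetype: ${fields.filetype}`);
  }
  if (content.length === 0) errors.push("missing field: content");
  if (!DIGEST_FIELDS.some((key) => fields[key])) {
    errors.push("at least one of md5_digest, sha256_digest, blake2_256_digest is required");
  }
  // Verify the hex SHA256 digest when the client supplied one.
  if (fields.sha256_digest) {
    const actual = createHash("sha256").update(content).digest("hex");
    if (actual !== fields.sha256_digest.toLowerCase()) {
      errors.push("sha256_digest does not match content");
    }
  }
  return errors;
}
```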

Optional Fields:

  • attestations: JSON array of attestation objects
  • Any Core Metadata fields (lowercase, hyphens → underscores)
    • Example: Description-Content-Type → description_content_type
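The key mapping as a one-line helper (hypothetical name `metadataFieldToFormKey`):

```typescript
// Core Metadata field name -> legacy upload form key: lowercase, hyphens to underscores.
function metadataFieldToFormKey(field: string): string {
  return field.toLowerCase().replace(/-/g, "_");
}

console.log(metadataFieldToFormKey("Description-Content-Type")); // description_content_type
```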

Authentication:

  • Username/password or API token in HTTP Basic Auth
  • API tokens: username = __token__, password = token value
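Extracting the token from the Authorization header, as a sketch with hypothetical helper names (`parseBasicAuth`, `extractApiToken`):

```typescript
import { Buffer } from "node:buffer";

interface BasicCredentials {
  username: string;
  password: string;
}

// Decode an `Authorization: Basic <base64(username:password)>` header.
function parseBasicAuth(header: string | undefined): BasicCredentials | null {
  if (!header) return null;
  const match = /^Basic\s+([A-Za-z0-9+/=]+)$/i.exec(header.trim());
  if (!match) return null;
  const decoded = Buffer.from(match[1], "base64").toString("utf8");
  const sep = decoded.indexOf(":"); // split on the first colon only; passwords may contain ":"
  if (sep < 0) return null;
  return { username: decoded.slice(0, sep), password: decoded.slice(sep + 1) };
}

// API-token uploads use the literal username `__token__` and the token as password.
function extractApiToken(header: string | undefined): string | null {
  const creds = parseBasicAuth(header);
  if (creds === null || creds.username !== "__token__" || creds.password === "") return null;
  return creds.password;
}
```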

Behavior:

  • The first file uploaded for a version creates the release
  • Additional files for that version (e.g., wheel and sdist) are uploaded one per request

PEP 694: Upload 2.0 API

Status: Draft (not yet required, legacy API still supported)

  • Multi-step workflow with sessions
  • Async upload support with resumption
  • JSON-based API
  • Standard HTTP auth (RFC 7235)
  • Not implementing initially (legacy API sufficient)

Ruby (RubyGems) Protocol Implementation Notes

Compact Index Format

Endpoints:

  • /versions - Master list of all gems and versions
  • /info/<RUBYGEM> - Detailed info for specific gem
  • /names - Simple list of gem names

Authentication:

  • UUID tokens similar to NPM pattern
  • API key in Authorization header
  • Scope format: rubygems:gem:{name}:{read|write|yank}

/versions File Format

Structure:

created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5

Details:

  • Metadata lines before --- delimiter
  • One line per gem with comma-separated versions
  • [-] prefix indicates yanked version
  • MD5: Checksum of corresponding /info/<RUBYGEM> file
  • Append-only within a month; the whole file is regenerated monthly
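A sketch of formatting one /versions line and the MD5 that ends it. `VersionEntry`, `formatVersionsLine`, and `infoChecksum` are hypothetical names; treating an omitted platform and `ruby` alike follows the default-platform rule described under /info. Under the append-only scheme, a new release or yank would append a line listing only the changed versions together with the new info checksum; that is an assumption about how updates are expressed.

```typescript
import { createHash } from "node:crypto";

interface VersionEntry {
  number: string;
  platform?: string; // omitted or "ruby" means the default platform
  yanked?: boolean;
}

// MD5 (hex) of the /info/<gem> file body, as listed at the end of each /versions line.
function infoChecksum(infoFile: string): string {
  return createHash("md5").update(infoFile, "utf8").digest("hex");
}

// RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5
function formatVersionsLine(gem: string, entries: VersionEntry[], infoMd5: string): string {
  const versions = entries.map((e) => {
    const id = e.platform && e.platform !== "ruby" ? `${e.number}-${e.platform}` : e.number;
    return e.yanked ? `-${id}` : id;
  });
  return `${gem} ${versions.join(",")} ${infoMd5}`;
}

console.log(formatVersionsLine("rack", [{ number: "3.0.0" }, { number: "3.0.0", platform: "java" }], infoChecksum("---\n")));
```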

/info/<RUBYGEM> File Format

Structure:

---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]

Dependency Format:

GEM:CONSTRAINT[&CONSTRAINT]

  • Examples: actionmailer:= 2.2.2, parser:>= 3.2.2.3
  • Operators: =, >, <, >=, <=, ~>, !=
  • Multiple constraints: unicode-display_width:< 3.0&>= 2.4.0

Requirement Format:

checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT

Platform:

  • Default platform is ruby
  • Non-default platforms: VERSION-PLATFORM (e.g., 3.2.1-arm64-darwin)

Yanked Gems:

  • Listed with - prefix in /versions
  • Excluded entirely from /info/<RUBYGEM> file
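Generating an /info file from version records, as a sketch with hypothetical types and names. Yanked versions are filtered out per the rule above. Emitting the space after the version even when there are no dependencies (`1.0.0 |checksum:...`) is an assumption about the exact layout.

```typescript
interface GemDependency {
  name: string;
  constraints: string[]; // e.g. ["< 3.0", ">= 2.4.0"], joined with "&"
}

interface InfoVersion {
  number: string;
  platform?: string; // omitted or "ruby" means the default platform
  dependencies: GemDependency[];
  checksum: string; // SHA256 hex of the .gem file
  ruby?: string; // required Ruby version, e.g. ">= 2.7"
  rubygems?: string; // required RubyGems version
  yanked?: boolean;
}

// VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]
function formatInfoLine(v: InfoVersion): string {
  const id = v.platform && v.platform !== "ruby" ? `${v.number}-${v.platform}` : v.number;
  const deps = v.dependencies.map((d) => `${d.name}:${d.constraints.join("&")}`).join(",");
  const reqs = [`checksum:${v.checksum}`];
  if (v.ruby) reqs.push(`ruby:${v.ruby}`);
  if (v.rubygems) reqs.push(`rubygems:${v.rubygems}`);
  return `${id} ${deps}|${reqs.join(",")}`;
}

// Whole file: "---" header, then one line per non-yanked version.
function buildInfoFile(versions: InfoVersion[]): string {
  const lines = versions.filter((v) => !v.yanked).map(formatInfoLine);
  return `---\n${lines.map((line) => `${line}\n`).join("")}`;
}
```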

/names File Format

---
gemname1
gemname2
gemname3
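Building the /names body, as a sketch (`buildNamesFile` is hypothetical; de-duplicating and sorting the names are assumptions, not stated above):

```typescript
// "---" header followed by one gem name per line.
function buildNamesFile(gemNames: Iterable<string>): string {
  const unique = Array.from(new Set(gemNames)).sort();
  return `---\n${unique.map((name) => `${name}\n`).join("")}`;
}

console.log(buildNamesFile(["rack", "actionmailer", "rack"]));
```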

HTTP Range Support

Headers:

  • Range: bytes=<start>-: request the file from byte <start> onward
  • If-None-Match: ETag conditional request
  • Repr-Digest: SHA256 digest of the full file in the response (sha-256=:<base64>:, RFC 9530)

Caching Strategy:

  1. Store the file locally and record its size (the last byte position)
  2. Request Range: bytes=<size>- starting from that position
  3. Append the response body to the stored file
  4. Verify the SHA256 of the combined file against Repr-Digest
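Those steps describe the client. On the server side, a minimal handler sketch for serving an index file could look like this; `serveIndexFile` and its request/response shapes are hypothetical, using a quoted MD5 hex as the ETag is an assumption, and only the open-ended `bytes=<start>-` form is honored (other Range forms fall back to a full 200).

```typescript
import { createHash } from "node:crypto";

interface IndexResponse {
  status: 200 | 206 | 304 | 416;
  headers: Record<string, string>;
  body: Uint8Array;
}

function serveIndexFile(file: Uint8Array, req: { range?: string; ifNoneMatch?: string }): IndexResponse {
  const etag = `"${createHash("md5").update(file).digest("hex")}"`;
  // Repr-Digest always describes the full file, even for partial responses.
  const reprDigest = `sha-256=:${createHash("sha256").update(file).digest("base64")}:`;
  const headers: Record<string, string> = { ETag: etag, "Repr-Digest": reprDigest, "Accept-Ranges": "bytes" };

  if (req.ifNoneMatch === etag) return { status: 304, headers, body: new Uint8Array(0) };

  const match = req.range ? /^bytes=(\d+)-$/.exec(req.range.trim()) : null;
  if (match) {
    const start = Number(match[1]);
    if (start >= file.length) {
      // Nothing new past the client's last byte.
      return { status: 416, headers: { ...headers, "Content-Range": `bytes */${file.length}` }, body: new Uint8Array(0) };
    }
    headers["Content-Range"] = `bytes ${start}-${file.length - 1}/${file.length}`;
    return { status: 206, headers, body: file.subarray(start) };
  }
  return { status: 200, headers, body: file };
}
```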

RubyGems Upload/Management API

Upload Gem:

  • POST /api/v1/gems
  • Binary .gem file in request body
  • Authorization header with API key

Yank Version:

  • DELETE /api/v1/gems/yank
  • Parameters: gem_name, version

Unyank Version:

  • PUT /api/v1/gems/unyank
  • Parameters: gem_name, version

Version Metadata:

  • GET /api/v1/versions/<gem>.json
  • Returns JSON array of versions

Dependencies:

  • GET /api/v1/dependencies?gems=<comma-list>
  • Returns dependency information for resolution
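A route matcher for these endpoints, as a sketch; `matchRubyGemsRoute` and the route shapes are hypothetical, and the yank/unyank handlers would still need to read gem_name and version from the request parameters.

```typescript
type RubyGemsRoute =
  | { kind: "upload" }
  | { kind: "yank" }
  | { kind: "unyank" }
  | { kind: "versions"; gem: string }
  | { kind: "dependencies"; gems: string[] };

// Returns null for anything outside the endpoints listed above.
function matchRubyGemsRoute(method: string, requestUrl: string): RubyGemsRoute | null {
  // A placeholder base lets relative request paths parse with the WHATWG URL class.
  const { pathname, searchParams } = new URL(requestUrl, "http://registry.invalid");
  const verb = method.toUpperCase();
  if (verb === "POST" && pathname === "/api/v1/gems") return { kind: "upload" };
  if (verb === "DELETE" && pathname === "/api/v1/gems/yank") return { kind: "yank" };
  if (verb === "PUT" && pathname === "/api/v1/gems/unyank") return { kind: "unyank" };
  if (verb === "GET" && pathname === "/api/v1/dependencies") {
    const gems = (searchParams.get("gems") ?? "")
      .split(",")
      .map((g) => g.trim())
      .filter((g) => g !== "");
    return { kind: "dependencies", gems };
  }
  const versions = /^\/api\/v1\/versions\/([^/]+)\.json$/.exec(pathname);
  if (verb === "GET" && versions) return { kind: "versions", gem: decodeURIComponent(versions[1]) };
  return null;
}
```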

Implementation Strategy

Storage Paths

PyPI:

pypi/
├── simple/                          # PEP 503 HTML files
│   ├── index.html                  # All packages list
│   └── {package}/index.html        # Package versions list
├── packages/
│   └── {package}/{filename}        # .whl and .tar.gz files
└── metadata/
    └── {package}/metadata.json     # Package metadata

RubyGems:

rubygems/
├── versions                         # Master versions file
├── info/{gemname}                   # Per-gem info files
├── names                            # All gem names
└── gems/{gemname}-{version}.gem    # .gem files
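Path helpers matching the two trees, as a sketch with hypothetical names. PyPI keys use the normalized project name, and every segment is checked so a user-supplied filename cannot climb out of its directory.

```typescript
// Reject anything that could escape its directory when used as a single path segment.
function safeSegment(segment: string): string {
  if (segment === "" || segment === "." || segment === ".." || /[\/\\\0]/.test(segment)) {
    throw new Error(`unsafe path segment: ${JSON.stringify(segment)}`);
  }
  return segment;
}

const normalizePypiName = (name: string): string => name.replace(/[-_.]+/g, "-").toLowerCase();

const pypiPaths = {
  rootIndex: () => "pypi/simple/index.html",
  projectIndex: (pkg: string) => `pypi/simple/${safeSegment(normalizePypiName(pkg))}/index.html`,
  packageFile: (pkg: string, filename: string) =>
    `pypi/packages/${safeSegment(normalizePypiName(pkg))}/${safeSegment(filename)}`,
  metadata: (pkg: string) => `pypi/metadata/${safeSegment(normalizePypiName(pkg))}/metadata.json`,
};

const rubygemsPaths = {
  versions: () => "rubygems/versions",
  names: () => "rubygems/names",
  info: (gem: string) => `rubygems/info/${safeSegment(gem)}`,
  gemFile: (gem: string, version: string) => `rubygems/gems/${safeSegment(`${gem}-${version}`)}.gem`,
};
```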

Authentication Pattern

Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, Composer:

// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean

createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean
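An in-memory sketch of these six methods. `TokenInfoSketch` is a hypothetical stand-in for `ITokenInfo`, whose real fields may differ, and the Map-backed store is illustrative only; a real AuthManager would persist tokens.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical stand-in for the project's ITokenInfo.
interface TokenInfoSketch {
  userId: string;
  protocol: "pypi" | "rubygems";
  readOnly: boolean;
}

class ProtocolTokenStore {
  private readonly tokens = new Map<string, TokenInfoSketch>();

  private create(userId: string, protocol: TokenInfoSketch["protocol"], readOnly: boolean): string {
    const token = randomUUID(); // random v4 UUID used as an opaque bearer token
    this.tokens.set(token, { userId, protocol, readOnly });
    return token;
  }

  private validate(token: string, protocol: TokenInfoSketch["protocol"]): TokenInfoSketch | null {
    const info = this.tokens.get(token);
    // A token minted for one protocol is not accepted by the other.
    return info !== undefined && info.protocol === protocol ? info : null;
  }

  private revoke(token: string, protocol: TokenInfoSketch["protocol"]): boolean {
    return this.validate(token, protocol) !== null && this.tokens.delete(token);
  }

  createPypiToken(userId: string, readOnly: boolean): string { return this.create(userId, "pypi", readOnly); }
  validatePypiToken(token: string): TokenInfoSketch | null { return this.validate(token, "pypi"); }
  revokePypiToken(token: string): boolean { return this.revoke(token, "pypi"); }

  createRubyGemsToken(userId: string, readOnly: boolean): string { return this.create(userId, "rubygems", readOnly); }
  validateRubyGemsToken(token: string): TokenInfoSketch | null { return this.validate(token, "rubygems"); }
  revokeRubyGemsToken(token: string): boolean { return this.revoke(token, "rubygems"); }
}
```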

Scope Format

pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
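Checking a required scope against a token's granted scopes, as a sketch with hypothetical helpers. It requires an exact action match (write does not imply read here) and compares PyPI names in normalized form.

```typescript
type ScopeProtocol = "pypi" | "rubygems";

interface ParsedScope {
  protocol: ScopeProtocol;
  name: string;
  action: string;
}

// Parse the two scope formats above; anything else (including pypi:...:yank) is rejected.
function parseScope(scope: string): ParsedScope | null {
  const m =
    /^(pypi):package:([^:]+):(read|write)$/.exec(scope) ??
    /^(rubygems):gem:([^:]+):(read|write|yank)$/.exec(scope);
  if (m === null) return null;
  return { protocol: m[1] as ScopeProtocol, name: m[2], action: m[3] };
}

// PyPI names are compared in normalized form; gem names are compared as-is.
function canonicalName(protocol: ScopeProtocol, name: string): string {
  return protocol === "pypi" ? name.replace(/[-_.]+/g, "-").toLowerCase() : name;
}

// Exact match on protocol, canonical name and action; no implied permissions.
function hasScope(granted: string[], required: string): boolean {
  const want = parseScope(required);
  if (want === null) return false;
  return granted.some((s) => {
    const have = parseScope(s);
    return (
      have !== null &&
      have.protocol === want.protocol &&
      have.action === want.action &&
      canonicalName(have.protocol, have.name) === canonicalName(want.protocol, want.name)
    );
  });
}
```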

Common Patterns

  1. Package name normalization - Critical for PyPI
  2. Checksum calculation - SHA256 for both protocols
  3. Append-only files - RubyGems compact index
  4. Content negotiation - PyPI JSON vs HTML
  5. Multipart upload parsing - PyPI file uploads
  6. Binary file handling - Both protocols (.whl, .tar.gz, .gem)

Key Differences from Existing Protocols

PyPI vs NPM:

  • PyPI uses Simple API (HTML) + JSON API
  • PyPI requires package name normalization
  • PyPI uses multipart form data for uploads (not JSON)
  • PyPI supports multiple file types per release (wheel + sdist)

RubyGems vs Cargo:

  • RubyGems uses compact index (append-only text files)
  • RubyGems /versions lists an MD5 checksum of each /info file, so clients can tell which per-gem files changed
  • RubyGems has HTTP Range support for incremental updates
  • RubyGems uses MD5 for index checksums, SHA256 for .gem files

Testing Requirements

PyPI Tests Must Cover:

  • Package upload (wheel and sdist)
  • Package name normalization
  • Simple API HTML generation (PEP 503)
  • JSON API responses (PEP 691)
  • Content negotiation
  • Hash calculation and verification
  • Authentication (tokens)
  • Multi-file releases
  • Yanked packages

RubyGems Tests Must Cover:

  • Gem upload
  • Compact index generation
  • /versions file updates (append-only)
  • /info/<gem> file generation
  • /names file generation
  • Checksum calculations (MD5 and SHA256)
  • Platform-specific gems
  • Yanking/unyanking
  • HTTP Range requests
  • Authentication (API keys)

Security Considerations

  1. Package name validation - Prevent path traversal
  2. File size limits - Prevent DoS via large uploads
  3. Content-Type validation - Verify file types
  4. Checksum verification - Ensure file integrity
  5. Token scope enforcement - Read vs write permissions
  6. HTML escaping - Prevent XSS in generated HTML
  7. Metadata sanitization - Clean user-provided strings
  8. Rate limiting - Consider upload frequency limits
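Items 1 and 2 could start from checks like these, as a sketch with hypothetical names. The PyPI pattern is the valid-name rule from the packaging name specification (PEP 508: ASCII letters, digits, `.`, `_`, `-`, starting and ending with a letter or digit); the gem-name pattern is an assumption, and the Content-Length pre-check does not replace counting bytes while streaming.

```typescript
// Valid PyPI project name, case-insensitive.
const PYPI_NAME = /^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$/i;
// Assumed gem-name rule: ASCII letters, digits, ".", "-", "_".
const GEM_NAME = /^[A-Za-z0-9._-]+$/;

function isValidPypiName(name: string): boolean {
  return PYPI_NAME.test(name);
}

function isValidGemName(name: string): boolean {
  // Dot-only names such as ".." are rejected so they can never act as path segments.
  return GEM_NAME.test(name) && !/^\.+$/.test(name);
}

// Reject uploads whose declared size is malformed or over the limit.
function exceedsUploadLimit(contentLength: string | undefined, maxBytes: number): boolean {
  if (contentLength === undefined) return false; // unknown size: enforce while streaming instead
  const n = Number(contentLength);
  return !Number.isFinite(n) || n < 0 || n > maxBytes;
}
```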