Project Implementation Notes
This file contains technical implementation details for PyPI and RubyGems protocols.
Python (PyPI) Protocol Implementation ✅
PEP 503: Simple Repository API (HTML-based)
URL Structure:
- Root: /<base>/ - Lists all projects
- Project: /<base>/<project>/ - Lists all files for a project
- All URLs MUST end with / (redirect if missing)
Package Name Normalization:
- Lowercase all characters
- Replace runs of ., -, _ with a single -
- Implementation: re.sub(r"[-_.]+", "-", name).lower()
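Since the registry code lives under ts/, the same rule can be sketched in TypeScript. The function name normalizePackageName is a hypothetical choice for illustration, not necessarily the helper exported from ts/pypi/helpers.pypi.ts.

```typescript
// Hypothetical helper: collapse runs of ".", "-", "_" into a single "-"
// and lowercase the result, mirroring the Python one-liner.
function normalizePackageName(name: string): string {
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

console.log(normalizePackageName("Friendly_Bard.Tools"));
```

Applying the function to every incoming project name before any lookup keeps URLs, storage keys, and scope checks consistent.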
HTML Format:
- Root: One anchor per project
- Project: One anchor per file
- Anchor text must match final filename
- Anchor href links to download URL
Hash Fragments:
Format: #<hashname>=<hashvalue>
- hashname: lowercase hash function name (recommend sha256)
- hashvalue: hex-encoded digest
Data Attributes:
- data-gpg-sig: true/false for GPG signature presence
- data-requires-python: PEP 345 requirement string (HTML-encode < as &lt;, > as &gt;)
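The anchor, hash fragment, and data attribute rules can be combined into one rendering step. The following is a minimal sketch; SimpleFile, escapeHtml, and renderFileAnchor are hypothetical names introduced here, and only the sha256 fragment and data-requires-python attribute are covered.

```typescript
// Hypothetical shape of one file entry on a project page.
interface SimpleFile {
  filename: string;
  url: string;
  sha256?: string;
  requiresPython?: string;
}

// Escape the characters that matter inside attribute values and text nodes.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one anchor: the text is the filename, the href carries the hash fragment.
function renderFileAnchor(file: SimpleFile): string {
  const href = file.sha256 ? `${file.url}#sha256=${file.sha256}` : file.url;
  const attrs = [`href="${escapeHtml(href)}"`];
  if (file.requiresPython) {
    attrs.push(`data-requires-python="${escapeHtml(file.requiresPython)}"`);
  }
  return `<a ${attrs.join(" ")}>${escapeHtml(file.filename)}</a>`;
}

console.log(
  renderFileAnchor({
    filename: "pkg-1.0.tar.gz",
    url: "/packages/pkg/pkg-1.0.tar.gz",
    sha256: "abc",
    requiresPython: ">=3.7",
  }),
);
```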
PEP 691: JSON-based Simple API
Content Types:
- application/vnd.pypi.simple.v1+json - JSON format
- application/vnd.pypi.simple.v1+html - HTML format
- text/html - Alias for HTML (backwards compat)
Root Endpoint JSON:
{
"meta": {"api-version": "1.0"},
"projects": [{"name": "ProjectName"}]
}
Project Endpoint JSON:
{
"name": "normalized-name",
"meta": {"api-version": "1.0"},
"files": [
{
"filename": "package-1.0-py3-none-any.whl",
"url": "https://example.com/path/to/file",
"hashes": {"sha256": "..."},
"requires-python": ">=3.7",
"dist-info-metadata": true | {"sha256": "..."},
"gpg-sig": true,
"yanked": false | "reason string"
}
]
}
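The `true | {...}` and `false | "reason string"` entries above denote alternatives, not literal JSON. One way to capture them is with union types; the interface names below are hypothetical and need not match ts/pypi/interfaces.pypi.ts.

```typescript
// Hypothetical types for the PEP 691 project response.
interface SimpleProjectFile {
  filename: string;
  url: string;
  hashes: Record<string, string>;
  "requires-python"?: string;
  "dist-info-metadata"?: boolean | Record<string, string>;
  "gpg-sig"?: boolean;
  yanked?: boolean | string; // a non-empty string is the yank reason
}

interface SimpleProjectResponse {
  name: string;
  meta: { "api-version": string };
  files: SimpleProjectFile[];
}

// Any truthy value of "yanked" means the file is yanked.
function isYanked(file: SimpleProjectFile): boolean {
  return Boolean(file.yanked);
}

const exampleResponse: SimpleProjectResponse = {
  name: "normalized-name",
  meta: { "api-version": "1.0" },
  files: [
    {
      filename: "package-1.0-py3-none-any.whl",
      url: "https://example.com/path/to/file",
      hashes: { sha256: "..." },
      "requires-python": ">=3.7",
      "dist-info-metadata": { sha256: "..." },
      yanked: "broken build",
    },
  ],
};

console.log(exampleResponse.files.map(isYanked));
```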
Content Negotiation:
- Use Accept header for format selection
- Server responds with Content-Type header
- Support both JSON and HTML formats
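A minimal sketch of the negotiation step follows. The function name negotiateSimpleFormat is hypothetical, and two behaviours are assumptions rather than something these notes specify: a missing Accept header and the */* wildcard both fall back to HTML. Only the q parameter is interpreted.

```typescript
type SimpleFormat = "json" | "html";

const FORMAT_BY_TYPE = new Map<string, SimpleFormat>([
  ["application/vnd.pypi.simple.v1+json", "json"],
  ["application/vnd.pypi.simple.v1+html", "html"],
  ["text/html", "html"],
  ["*/*", "html"], // assumption: wildcard falls back to HTML
]);

// Returns the preferred format, or null when nothing acceptable was offered
// (the caller could then answer 406 Not Acceptable).
function negotiateSimpleFormat(accept?: string): SimpleFormat | null {
  if (accept === undefined || accept.trim() === "") return "html";
  let bestFormat: SimpleFormat | null = null;
  let bestQ = 0;
  for (const part of accept.split(",")) {
    const pieces = part.split(";").map((piece) => piece.trim());
    const format = FORMAT_BY_TYPE.get((pieces[0] ?? "").toLowerCase());
    if (format === undefined) continue;
    let q = 1; // default quality when no q parameter is given
    for (const param of pieces.slice(1)) {
      const match = /^q=([0-9.]+)$/i.exec(param);
      if (match) q = Number.parseFloat(match[1] ?? "0");
    }
    if (q > bestQ) {
      bestFormat = format;
      bestQ = q;
    }
  }
  return bestFormat;
}

console.log(negotiateSimpleFormat("text/html;q=0.5, application/vnd.pypi.simple.v1+json;q=0.9"));
```

On equal quality values the first listed type wins, which keeps the result deterministic.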
PyPI Upload API (Legacy /legacy/)
Endpoint:
- URL: https://upload.pypi.org/legacy/
- Method: POST
- Content-Type: multipart/form-data
Required Form Fields:
- :action = file_upload
- protocol_version = 1
- content = Binary file data with filename
- filetype = bdist_wheel | sdist
- pyversion = Python tag (e.g., py3, py2.py3) or source for sdist
- metadata_version = Metadata standard version
- name = Package name
- version = Version string
Hash Digest (one required):
- md5_digest: urlsafe base64 without padding
- sha256_digest: hexadecimal
- blake2_256_digest: hexadecimal
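The two encodings are easy to mix up, so a short sketch may help. It assumes a Node.js runtime with node:crypto; computeUploadDigests and matchesDeclaredDigests are hypothetical names. blake2_256_digest is left out because a 256-bit BLAKE2b is not assumed to be available from createHash.

```typescript
import { createHash } from "node:crypto";

interface UploadDigests {
  md5_digest: string; // urlsafe base64, no padding
  sha256_digest: string; // lowercase hex
}

// Compute the digests in the encodings the upload form expects.
function computeUploadDigests(content: Uint8Array): UploadDigests {
  return {
    md5_digest: createHash("md5").update(content).digest("base64url"),
    sha256_digest: createHash("sha256").update(content).digest("hex"),
  };
}

// True when at least one digest was declared and every declared digest matches.
function matchesDeclaredDigests(content: Uint8Array, declared: Partial<UploadDigests>): boolean {
  const actual = computeUploadDigests(content);
  const checks: boolean[] = [];
  if (declared.md5_digest !== undefined) checks.push(declared.md5_digest === actual.md5_digest);
  if (declared.sha256_digest !== undefined) {
    checks.push(declared.sha256_digest.toLowerCase() === actual.sha256_digest);
  }
  return checks.length > 0 && checks.every(Boolean);
}

console.log(computeUploadDigests(new TextEncoder().encode("hello")));
```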
Optional Fields:
- attestations: JSON array of attestation objects
- Any Core Metadata fields (lowercase, hyphens → underscores)
  - Example: Description-Content-Type → description_content_type
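The Core Metadata naming rule is a plain string transform. A small sketch, with the hypothetical helper name metadataFieldToFormName:

```typescript
// Convert a Core Metadata header name to its upload form field name:
// lowercase everything and turn hyphens into underscores.
function metadataFieldToFormName(header: string): string {
  return header.toLowerCase().replace(/-/g, "_");
}

console.log(metadataFieldToFormName("Description-Content-Type"));
```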
Authentication:
- Username/password or API token in HTTP Basic Auth
- API tokens: username = __token__, password = token value
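Telling a token upload apart from a username/password upload comes down to decoding the Basic credentials. The sketch below uses the global atob, which decodes to Latin-1, so non-ASCII passwords would need extra handling. UploadCredentials and parseBasicAuth are hypothetical names.

```typescript
interface UploadCredentials {
  kind: "token" | "password";
  username: string;
  secret: string;
}

// Parse an "Authorization: Basic ..." header value.
function parseBasicAuth(header: string | undefined): UploadCredentials | null {
  if (!header) return null;
  const match = /^Basic\s+([A-Za-z0-9+/=]+)$/i.exec(header.trim());
  if (!match) return null;
  let decoded: string;
  try {
    decoded = atob(match[1] ?? "");
  } catch {
    return null; // not valid base64
  }
  // Only the first ":" separates user from secret; the secret may contain more.
  const separator = decoded.indexOf(":");
  if (separator < 0) return null;
  const username = decoded.slice(0, separator);
  const secret = decoded.slice(separator + 1);
  return { kind: username === "__token__" ? "token" : "password", username, secret };
}

console.log(parseBasicAuth("Basic " + btoa("__token__:example-token-value")));
```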
Behavior:
- First file uploaded creates the release
- Multiple files uploaded sequentially for same version
PEP 694: Upload 2.0 API
Status: Draft (not yet required, legacy API still supported)
- Multi-step workflow with sessions
- Async upload support with resumption
- JSON-based API
- Standard HTTP auth (RFC 7235)
- Not implementing initially (legacy API sufficient)
Ruby (RubyGems) Protocol Implementation ✅
Compact Index Format
Endpoints:
- /versions - Master list of all gems and versions
- /info/<RUBYGEM> - Detailed info for specific gem
- /names - Simple list of gem names
Authentication:
- UUID tokens similar to NPM pattern
- API key in Authorization header
- Scope format: rubygems:gem:{name}:{read|write|yank}
/versions File Format
Structure:
created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5
Details:
- Metadata lines before --- delimiter
- One line per gem with comma-separated versions
- [-] prefix indicates yanked version
- MD5: Checksum of corresponding /info/<RUBYGEM> file
- Append-only during month, recalculated monthly
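A reader for this format might look like the sketch below. VersionsEntry, parseVersionsLine, and parseVersionsFile are hypothetical names, and the platform suffix is kept inside the version string rather than split out.

```typescript
interface VersionsEntry {
  name: string;
  versions: { version: string; yanked: boolean }[];
  infoChecksum: string; // MD5 of the matching /info/<RUBYGEM> file
}

// Parse one "RUBYGEM VERSIONS MD5" line; returns null for anything else.
function parseVersionsLine(line: string): VersionsEntry | null {
  const parts = line.trim().split(" ");
  if (parts.length !== 3) return null;
  const [name, versionList, infoChecksum] = parts as [string, string, string];
  const versions = versionList.split(",").map((entry) =>
    entry.startsWith("-")
      ? { version: entry.slice(1), yanked: true }
      : { version: entry, yanked: false },
  );
  return { name, versions, infoChecksum };
}

// Skip the metadata header up to the "---" delimiter, then parse each line.
function parseVersionsFile(text: string): VersionsEntry[] {
  const lines = text.split("\n");
  const start = lines.indexOf("---") + 1; // 0 when the delimiter is missing
  const entries: VersionsEntry[] = [];
  for (const line of lines.slice(start)) {
    const entry = parseVersionsLine(line);
    if (entry !== null) entries.push(entry);
  }
  return entries;
}

const sampleVersions = "created_at: 2024-04-01T00:00:05Z\n---\nrack 1.0.0,-1.0.1,1.1.0-java abc123\n";
console.log(JSON.stringify(parseVersionsFile(sampleVersions)));
```

Because the file is append-only within a month, a gem can appear on several lines; a consumer would merge entries by name, with later lines taking precedence.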
/info/<RUBYGEM> File Format
Structure:
---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]
Dependency Format:
GEM:CONSTRAINT[&CONSTRAINT]
- Examples: actionmailer:= 2.2.2, parser:>= 3.2.2.3
- Operators: =, >, <, >=, <=, ~>, !=
- Multiple constraints: unicode-display_width:< 3.0&>= 2.4.0
Requirement Format:
checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT
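Putting the dependency and requirement formats together, one line of an /info file can be parsed as sketched below. All names are hypothetical. The sketch assumes that versions contain no hyphen, so the first hyphen separates version from platform.

```typescript
interface GemDependency {
  name: string;
  constraints: string[];
}

interface InfoEntry {
  version: string;
  platform: string;
  dependencies: GemDependency[];
  requirements: Record<string, string[]>;
}

// Split on the first occurrence of sep only.
function splitOnce(text: string, sep: string): [string, string] {
  const index = text.indexOf(sep);
  return index < 0 ? [text, ""] : [text.slice(0, index), text.slice(index + sep.length)];
}

function parseInfoLine(line: string): InfoEntry {
  const [versionPart, rest] = splitOnce(line.trim(), " ");
  const [version, platform] = splitOnce(versionPart, "-");
  const [dependencyPart, requirementPart] = splitOnce(rest, "|");

  const dependencies: GemDependency[] = [];
  for (const raw of dependencyPart.split(",")) {
    if (raw.trim() === "") continue;
    const [name, constraintList] = splitOnce(raw.trim(), ":");
    dependencies.push({ name, constraints: constraintList.split("&").map((c) => c.trim()) });
  }

  const requirements: Record<string, string[]> = {};
  for (const raw of requirementPart.split(",")) {
    if (raw.trim() === "") continue;
    const [key, value] = splitOnce(raw.trim(), ":");
    requirements[key] = value.split("&").map((c) => c.trim());
  }

  // An absent platform suffix means the default platform, "ruby".
  return { version, platform: platform || "ruby", dependencies, requirements };
}

console.log(JSON.stringify(parseInfoLine("1.0.0 parser:>= 3.2.2.3,unicode-display_width:< 3.0&>= 2.4.0|checksum:abc,ruby:>= 2.7")));
```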
Platform:
- Default platform is ruby
- Non-default platforms: VERSION-PLATFORM (e.g., 3.2.1-arm64-darwin)
Yanked Gems:
- Listed with - prefix in /versions
- Excluded entirely from /info/<RUBYGEM> file
/names File Format
---
gemname1
gemname2
gemname3
HTTP Range Support
Headers:
- Range: bytes=#{start}-: Request from byte position
- If-None-Match: ETag conditional request
- Repr-Digest: SHA256 checksum in response
Caching Strategy:
- Store file with last byte position
- Request range from last position
- Append response to existing file
- Verify SHA256 against Repr-Digest
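The client side of this strategy can be sketched without any network code. The sketch assumes a Node.js runtime and assumes the Repr-Digest value uses the form sha-256=:<base64>:, which these notes do not spell out. rangeHeaderFor and appendAndVerify are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Build the Range header that asks for everything after the cached bytes.
function rangeHeaderFor(cached: Uint8Array): string {
  return `bytes=${cached.length}-`;
}

// Append the ranged response to the cached copy and verify the full file.
// Returns the merged bytes, or null when the digest is missing or does not
// match (the caller would then re-download the whole file).
function appendAndVerify(cached: Uint8Array, tail: Uint8Array, reprDigest: string): Uint8Array | null {
  const match = /sha-256=:([A-Za-z0-9+/=]+):/.exec(reprDigest);
  if (!match) return null;
  const merged = new Uint8Array(cached.length + tail.length);
  merged.set(cached, 0);
  merged.set(tail, cached.length);
  const actual = createHash("sha256").update(merged).digest("base64");
  return actual === match[1] ? merged : null;
}

const cachedBytes = new TextEncoder().encode("---\nrack 1.0.0 abc\n");
console.log(rangeHeaderFor(cachedBytes));
```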
RubyGems Upload/Management API
Upload Gem:
- POST /api/v1/gems
- Binary .gem file in request body
- Authorization header with API key
Yank Version:
- DELETE /api/v1/gems/yank
- Parameters: gem_name, version
Unyank Version:
- PUT /api/v1/gems/unyank
- Parameters: gem_name, version
Version Metadata:
- GET /api/v1/versions/<gem>.json
- Returns JSON array of versions
Dependencies:
- GET /api/v1/dependencies?gems=<comma-list>
- Returns dependency information for resolution
Implementation Details
Completed Protocols
- ✅ OCI Distribution Spec v1.1
- ✅ NPM Registry API
- ✅ Maven Repository
- ✅ Cargo/crates.io Registry
- ✅ Composer/Packagist
- ✅ PyPI (Python Package Index) - PEP 503/691
- ✅ RubyGems - Compact Index
Storage Paths
PyPI:
pypi/
├── simple/ # PEP 503 HTML files
│ ├── index.html # All packages list
│ └── {package}/index.html # Package versions list
├── packages/
│ └── {package}/{filename} # .whl and .tar.gz files
└── metadata/
└── {package}/metadata.json # Package metadata
RubyGems:
rubygems/
├── versions # Master versions file
├── info/{gemname} # Per-gem info files
├── names # All gem names
└── gems/{gemname}-{version}.gem # .gem files
Authentication Pattern
Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, Composer:
// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean
createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean
Scope Format
pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
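A scope check against these two formats could be sketched as follows. The names are hypothetical, and the sketch matches scopes exactly: it does not assume wildcards or that write implies read, since these notes do not define either.

```typescript
type Protocol = "pypi" | "rubygems";

interface ParsedScope {
  protocol: Protocol;
  name: string;
  action: string;
}

// One pattern per protocol, following the two scope formats.
const SCOPE_PATTERNS: RegExp[] = [
  /^(pypi):package:([^:]+):(read|write)$/,
  /^(rubygems):gem:([^:]+):(read|write|yank)$/,
];

function parseScope(scope: string): ParsedScope | null {
  for (const pattern of SCOPE_PATTERNS) {
    const match = pattern.exec(scope);
    if (match) {
      return { protocol: match[1] as Protocol, name: match[2] ?? "", action: match[3] ?? "" };
    }
  }
  return null;
}

// True when one of the granted scopes matches the request exactly.
function isAllowed(granted: string[], protocol: Protocol, name: string, action: string): boolean {
  return granted.some((raw) => {
    const scope = parseScope(raw);
    return scope !== null && scope.protocol === protocol && scope.name === name && scope.action === action;
  });
}

console.log(isAllowed(["rubygems:gem:rack:yank"], "rubygems", "rack", "yank"));
```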
Common Patterns
- Package name normalization - Critical for PyPI
- Checksum calculation - SHA256 for both protocols
- Append-only files - RubyGems compact index
- Content negotiation - PyPI JSON vs HTML
- Multipart upload parsing - PyPI file uploads
- Binary file handling - Both protocols (.whl, .tar.gz, .gem)
Key Differences from Existing Protocols
PyPI vs NPM:
- PyPI uses Simple API (HTML) + JSON API
- PyPI requires package name normalization
- PyPI uses multipart form data for uploads (not JSON)
- PyPI supports multiple file types per release (wheel + sdist)
RubyGems vs Cargo:
- RubyGems uses compact index (append-only text files)
- RubyGems uses checksums in index files (not just filenames)
- RubyGems has HTTP Range support for incremental updates
- RubyGems uses MD5 for index checksums, SHA256 for .gem files
Testing Requirements
PyPI Tests Must Cover:
- Package upload (wheel and sdist)
- Package name normalization
- Simple API HTML generation (PEP 503)
- JSON API responses (PEP 691)
- Content negotiation
- Hash calculation and verification
- Authentication (tokens)
- Multi-file releases
- Yanked packages
RubyGems Tests Must Cover:
- Gem upload
- Compact index generation
- /versions file updates (append-only)
- /info/<gem> file generation
- /names file generation
- Checksum calculations (MD5 and SHA256)
- Platform-specific gems
- Yanking/unyanking
- HTTP Range requests
- Authentication (API keys)
Security Considerations
- Package name validation - Prevent path traversal
- File size limits - Prevent DoS via large uploads
- Content-Type validation - Verify file types
- Checksum verification - Ensure file integrity
- Token scope enforcement - Read vs write permissions
- HTML escaping - Prevent XSS in generated HTML
- Metadata sanitization - Clean user-provided strings
- Rate limiting - Consider upload frequency limits
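For the first item, a sketch of name validation combined with storage path construction is given below. The pattern is the project name rule from the Python packaging name specification; the path layout follows the PyPI storage tree in these notes. isValidProjectName and packageStoragePath are hypothetical names.

```typescript
// A valid name starts and ends with a letter or digit and may contain
// ".", "_" and "-" in between (case-insensitive).
const PROJECT_NAME = /^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$/i;

function isValidProjectName(name: string): boolean {
  return PROJECT_NAME.test(name);
}

// Build the storage key for an uploaded file, or return null when the
// name or filename could escape the package directory.
function packageStoragePath(name: string, filename: string): string | null {
  if (!isValidProjectName(name)) return null;
  const unsafeFilename =
    filename === "" ||
    filename === "." ||
    filename === ".." ||
    filename.includes("/") ||
    filename.includes("\\") ||
    filename.includes("\0");
  if (unsafeFilename) return null;
  const normalized = name.replace(/[-_.]+/g, "-").toLowerCase();
  return `pypi/packages/${normalized}/${filename}`;
}

console.log(packageStoragePath("My_Pkg", "my_pkg-1.0.tar.gz"));
```

Rejecting the request outright, instead of trying to sanitize the input, keeps the behaviour easy to test and avoids surprising rewrites of user-supplied names.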
Implementation Status (Completed)
PyPI Implementation ✅
Files Created:
- ts/pypi/interfaces.pypi.ts - Type definitions (354 lines)
- ts/pypi/helpers.pypi.ts - Helper functions (280 lines)
- ts/pypi/classes.pypiregistry.ts - Main registry (650 lines)
- ts/pypi/index.ts - Module exports
Features Implemented:
- ✅ PEP 503 Simple API (HTML)
- ✅ PEP 691 JSON API
- ✅ Content negotiation (Accept header)
- ✅ Package name normalization
- ✅ File upload with multipart/form-data
- ✅ Hash verification (SHA256, MD5, Blake2b)
- ✅ Package metadata management
- ✅ JSON API endpoints (/pypi/{package}/json)
- ✅ Token-based authentication
- ✅ Scope-based permissions (read/write/delete)
Security Enhancements:
- ✅ Hash verification on upload (validates client-provided hashes)
- ✅ Package name validation (regex check)
- ✅ HTML escaping in generated pages
- ✅ Permission checks on all mutating operations
RubyGems Implementation ✅
Files Created:
- ts/rubygems/interfaces.rubygems.ts - Type definitions (215 lines)
- ts/rubygems/helpers.rubygems.ts - Helper functions (350 lines)
- ts/rubygems/classes.rubygemsregistry.ts - Main registry (580 lines)
- ts/rubygems/index.ts - Module exports
Features Implemented:
- ✅ Compact Index format (modern Bundler)
- ✅ /versions endpoint (all gems list)
- ✅ /info/{gem} endpoint (gem-specific metadata)
- ✅ /names endpoint (gem names list)
- ✅ Gem upload API
- ✅ Yank/unyank functionality
- ✅ Platform-specific gems support
- ✅ JSON API endpoints
- ✅ Legacy endpoints (specs.4.8.gz, Marshal.4.8)
- ✅ Token-based authentication
- ✅ Scope-based permissions
Integration ✅
Core Updates:
- ✅ Updated IRegistryConfig interface
- ✅ Updated TRegistryProtocol type
- ✅ Added authentication methods to AuthManager
- ✅ Added 30+ storage methods to RegistryStorage
- ✅ Updated SmartRegistry initialization and routing
- ✅ Module exports from ts/index.ts
Test Coverage:
- ✅ test/test.pypi.ts - 25+ tests covering all PyPI endpoints
- ✅ test/test.rubygems.ts - 30+ tests covering all RubyGems endpoints
- ✅ test/test.integration.pypi-rubygems.ts - Integration tests
- ✅ Updated test helpers with PyPI and RubyGems support
Known Limitations
PyPI:
- Does not implement legacy XML-RPC API
- No support for PGP signatures (data-gpg-sig always false)
- Metadata extraction from wheel files not implemented
RubyGems:
- Gem spec extraction from .gem files returns placeholder (Ruby Marshal parsing not implemented)
- Legacy Marshal endpoints return basic data only
- No support for gem dependency resolution
Configuration Example
{
pypi: {
enabled: true,
basePath: '/pypi', // Also handles /simple
},
rubygems: {
enabled: true,
basePath: '/rubygems',
},
auth: {
pypiTokens: { enabled: true },
rubygemsTokens: { enabled: true },
}
}