Project Readme Hints
Python (PyPI) Protocol Implementation Notes
PEP 503: Simple Repository API (HTML-based)
URL Structure:
- Root: /<base>/ - Lists all projects
- Project: /<base>/<project>/ - Lists all files for a project
- All URLs MUST end with / (redirect if missing)
Package Name Normalization:
- Lowercase all characters
- Replace runs of ., -, _ with a single -
- Implementation: re.sub(r"[-_.]+", "-", name).lower()
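The Python one-liner above translates directly to TypeScript. This is a minimal sketch; the function name normalizePypiName is hypothetical and not part of any existing module.

```typescript
// Normalize a project name per PEP 503: collapse runs of ".", "-" and "_"
// into a single "-" and lowercase the result.
// normalizePypiName is a hypothetical helper name used for illustration.
function normalizePypiName(name: string): string {
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

console.log(normalizePypiName("Foo.Bar__baz")); // foo-bar-baz
```

Applying the same function to both stored names and incoming request paths keeps lookups consistent regardless of how the client spelled the name.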
HTML Format:
- Root: One anchor per project
- Project: One anchor per file
- Anchor text must match final filename
- Anchor href links to download URL
Hash Fragments:
Format: #<hashname>=<hashvalue>
- hashname: lowercase hash function name (recommend sha256)
- hashvalue: hex-encoded digest
Data Attributes:
- data-gpg-sig: true/false for GPG signature presence
- data-requires-python: PEP 345 requirement string (HTML-encode < as &lt;, > as &gt;)
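As an illustration of how the anchor rules, hash fragment, and data attributes fit together, here is a sketch that renders one file anchor. The SimpleFile shape and the function names are assumptions made for this example, not an existing API.

```typescript
// Hypothetical description of one downloadable file (illustration only).
interface SimpleFile {
  filename: string;
  url: string;
  sha256?: string;
  requiresPython?: string;
  hasGpgSig?: boolean;
}

// Escape the characters that are unsafe in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one PEP 503 anchor: the text is the filename, the href is the
// download URL plus an optional #sha256=<hex> fragment.
function renderFileAnchor(file: SimpleFile): string {
  const href = file.sha256 ? `${file.url}#sha256=${file.sha256}` : file.url;
  const attrs = [`href="${escapeHtml(href)}"`];
  if (file.requiresPython !== undefined) {
    attrs.push(`data-requires-python="${escapeHtml(file.requiresPython)}"`);
  }
  if (file.hasGpgSig !== undefined) {
    attrs.push(`data-gpg-sig="${file.hasGpgSig}"`);
  }
  return `<a ${attrs.join(" ")}>${escapeHtml(file.filename)}</a>`;
}
```

Escaping every interpolated value, not only the requires-python string, also covers the XSS concern for user-controlled filenames.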
PEP 691: JSON-based Simple API
Content Types:
- application/vnd.pypi.simple.v1+json - JSON format
- application/vnd.pypi.simple.v1+html - HTML format
- text/html - Alias for HTML (backwards compat)
Root Endpoint JSON:
{
"meta": {"api-version": "1.0"},
"projects": [{"name": "ProjectName"}]
}
Project Endpoint JSON:
{
"name": "normalized-name",
"meta": {"api-version": "1.0"},
"files": [
{
"filename": "package-1.0-py3-none-any.whl",
"url": "https://example.com/path/to/file",
"hashes": {"sha256": "..."},
"requires-python": ">=3.7",
"dist-info-metadata": true | {"sha256": "..."},
"gpg-sig": true,
"yanked": false | "reason string"
}
]
}
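The project endpoint payload above could be typed roughly as follows. The interface and function names are hypothetical; the field names come from the JSON shown above, and the choice of which fields are optional is an assumption.

```typescript
// One entry of the "files" array. Keys with hyphens must be quoted.
interface SimpleProjectFile {
  filename: string;
  url: string;
  hashes: Record<string, string>;
  "requires-python"?: string;
  "dist-info-metadata"?: boolean | Record<string, string>;
  "gpg-sig"?: boolean;
  yanked?: boolean | string; // false, or a reason string when yanked
}

interface SimpleProjectResponse {
  name: string;
  meta: { "api-version": string };
  files: SimpleProjectFile[];
}

// Assemble the response body for an already-normalized project name.
function buildProjectResponse(
  normalizedName: string,
  files: SimpleProjectFile[],
): SimpleProjectResponse {
  return { name: normalizedName, meta: { "api-version": "1.0" }, files };
}
```

Serializing the result with JSON.stringify yields the structure shown in the Project Endpoint JSON example.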
Content Negotiation:
- Use Accept header for format selection
- Server responds with Content-Type header
- Support both JSON and HTML formats
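A simplified sketch of Accept-based selection between the content types listed above. It honours q-values but ignores other media-type parameters; the function name, the HTML default for a missing header, and the treatment of */* as HTML are assumptions made for this example.

```typescript
type SimpleFormat = "json" | "html";

const CONTENT_TYPES: Record<string, SimpleFormat> = {
  "application/vnd.pypi.simple.v1+json": "json",
  "application/vnd.pypi.simple.v1+html": "html",
  "text/html": "html",
};

// Returns the preferred format, or null when nothing acceptable was offered
// (a caller might answer 406 Not Acceptable in that case).
function negotiateSimpleFormat(accept: string | undefined): SimpleFormat | null {
  if (!accept || accept.trim() === "") return "html"; // assumed default
  let best: { format: SimpleFormat; q: number } | null = null;
  for (const part of accept.split(",")) {
    const [rawType, ...params] = part.split(";").map((s) => s.trim());
    const mediaType = (rawType ?? "").toLowerCase();
    let q = 1;
    for (const p of params) {
      const m = /^q=([0-9.]+)$/i.exec(p);
      if (m) q = Number(m[1]);
    }
    const format: SimpleFormat | undefined =
      mediaType === "*/*" ? "html" : CONTENT_TYPES[mediaType];
    if (format === undefined || !(q > 0)) continue; // unknown type or q=0
    if (best === null || q > best.q) best = { format, q };
  }
  return best ? best.format : null;
}
```

The response Content-Type should then name the concrete type that was selected, for example application/vnd.pypi.simple.v1+json when the result is "json".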
PyPI Upload API (Legacy /legacy/)
Endpoint:
- URL: https://upload.pypi.org/legacy/
- Method: POST
- Content-Type: multipart/form-data
Required Form Fields:
- :action = file_upload
- protocol_version = 1
- content = Binary file data with filename
- filetype = bdist_wheel | sdist
- pyversion = Python tag (e.g., py3, py2.py3) or source for sdist
- metadata_version = Metadata standard version
- name = Package name
- version = Version string
Hash Digest (one required):
- md5_digest: urlsafe base64 without padding
- sha256_digest: hexadecimal
- blake2_256_digest: hexadecimal
Optional Fields:
- attestations: JSON array of attestation objects
- Any Core Metadata fields (lowercase, hyphens → underscores)
  - Example: Description-Content-Type → description_content_type
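The field rules above can be checked after the multipart body has been parsed. The sketch below assumes the text fields are already available as a plain key/value object and that the content file part is handled separately; all names are hypothetical.

```typescript
const REQUIRED_UPLOAD_FIELDS = [
  ":action", "protocol_version", "filetype", "pyversion",
  "metadata_version", "name", "version",
];
const DIGEST_FIELDS = ["md5_digest", "sha256_digest", "blake2_256_digest"];

// Convert a Core Metadata header name to its form-field key.
function metadataFieldToFormKey(field: string): string {
  return field.toLowerCase().replace(/-/g, "_");
}

// Returns a list of problems; an empty list means the fields look acceptable.
function validateUploadFields(fields: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  for (const key of REQUIRED_UPLOAD_FIELDS) {
    if (!fields[key]) errors.push(`missing required field: ${key}`);
  }
  const action = fields[":action"];
  if (action && action !== "file_upload") {
    errors.push(`unsupported :action: ${action}`);
  }
  const filetype = fields["filetype"];
  if (filetype && filetype !== "bdist_wheel" && filetype !== "sdist") {
    errors.push(`unsupported filetype: ${filetype}`);
  }
  if (!DIGEST_FIELDS.some((key) => fields[key])) {
    errors.push("one of md5_digest, sha256_digest, blake2_256_digest is required");
  }
  return errors;
}
```

Collecting all errors rather than failing on the first one makes it easier to return a single descriptive 400 response to the uploading client.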
Authentication:
- Username/password or API token in HTTP Basic Auth
- API tokens: username = __token__, password = token value
Behavior:
- First file uploaded creates the release
- Multiple files uploaded sequentially for same version
PEP 694: Upload 2.0 API
Status: Draft (not yet required, legacy API still supported)
- Multi-step workflow with sessions
- Async upload support with resumption
- JSON-based API
- Standard HTTP auth (RFC 7235)
- Not implementing initially (legacy API sufficient)
Ruby (RubyGems) Protocol Implementation Notes
Compact Index Format
Endpoints:
- /versions - Master list of all gems and versions
- /info/<RUBYGEM> - Detailed info for specific gem
- /names - Simple list of gem names
Authentication:
- UUID tokens similar to NPM pattern
- API key in Authorization header
- Scope format: rubygems:gem:{name}:{read|write|yank}
/versions File Format
Structure:
created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5
Details:
- Metadata lines before --- delimiter
- One line per gem with comma-separated versions
- [-] prefix indicates yanked version
- MD5: Checksum of corresponding /info/<RUBYGEM> file
- Append-only during month, recalculated monthly
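A single gem line of the /versions file can be parsed and written back as sketched below. The VersionsLine shape and function names are assumptions for illustration; the sketch handles only the lines after the --- delimiter and keeps each VERSION_PLATFORM entry as an opaque string.

```typescript
interface VersionsLine {
  gem: string;
  versions: { version: string; yanked: boolean }[];
  infoMd5: string; // MD5 of the corresponding /info/<RUBYGEM> file
}

// Parse "RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5".
function parseVersionsLine(line: string): VersionsLine {
  const parts = line.trim().split(" ");
  if (parts.length !== 3) throw new Error(`malformed /versions line: ${line}`);
  const [gem, versionList, infoMd5] = parts as [string, string, string];
  const versions = versionList.split(",").map((v) =>
    v.startsWith("-")
      ? { version: v.slice(1), yanked: true }
      : { version: v, yanked: false },
  );
  return { gem, versions, infoMd5 };
}

// Inverse of parseVersionsLine, suitable for appending to the file.
function formatVersionsLine(entry: VersionsLine): string {
  const list = entry.versions
    .map((v) => (v.yanked ? `-${v.version}` : v.version))
    .join(",");
  return `${entry.gem} ${list} ${entry.infoMd5}`;
}
```

Because the file is append-only within a month, a yank would be recorded by appending a new line for the gem rather than rewriting an earlier one.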
/info/<RUBYGEM> File Format
Structure:
---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]
Dependency Format:
GEM:CONSTRAINT[&CONSTRAINT]
- Examples: actionmailer:= 2.2.2, parser:>= 3.2.2.3
- Operators: =, >, <, >=, <=, ~>, !=
- Multiple constraints: unicode-display_width:< 3.0&>= 2.4.0
Requirement Format:
checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT
Platform:
- Default platform is ruby
- Non-default platforms: VERSION-PLATFORM (e.g., 3.2.1-arm64-darwin)
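Putting the dependency, requirement, and platform rules together, one /info line could be parsed as in this sketch. The types and function names are hypothetical, and splitting VERSION-PLATFORM at the first hyphen is an assumption that holds only if version strings themselves contain no hyphen.

```typescript
interface InfoDependency { gem: string; constraints: string[] }
interface InfoLine {
  version: string;
  platform: string;
  dependencies: InfoDependency[];
  requirements: Record<string, string[]>;
}

// Split at the first occurrence of sep; the second part is "" if sep is absent.
function splitOnce(text: string, sep: string): [string, string] {
  const i = text.indexOf(sep);
  return i === -1 ? [text, ""] : [text.slice(0, i), text.slice(i + sep.length)];
}

// Parse "VERSION[-PLATFORM] [DEPENDENCY,...]|REQUIREMENT,...".
function parseInfoLine(line: string): InfoLine {
  const [versionPlatform, rest] = splitOnce(line.trim(), " ");
  const [version, platform] = splitOnce(versionPlatform, "-");
  const [depsPart, reqsPart] = splitOnce(rest, "|");
  const dependencies = depsPart
    .split(",")
    .filter((d) => d !== "")
    .map((d) => {
      const [gem, constraints] = splitOnce(d, ":");
      return { gem, constraints: constraints.split("&") };
    });
  const requirements: Record<string, string[]> = {};
  for (const r of reqsPart.split(",").filter((x) => x !== "")) {
    const [key, value] = splitOnce(r, ":");
    requirements[key] = value.split("&");
  }
  return { version, platform: platform || "ruby", dependencies, requirements };
}
```

For example, parsing a line for 3.2.1-arm64-darwin yields version 3.2.1 and platform arm64-darwin, while a bare 1.0.0 falls back to the default ruby platform.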
Yanked Gems:
- Listed with - prefix in /versions
- Excluded entirely from /info/<RUBYGEM> file
/names File Format
---
gemname1
gemname2
gemname3
HTTP Range Support
Headers:
- Range: bytes=#{start}-: Request from byte position
- If-None-Match: ETag conditional request
- Repr-Digest: SHA256 checksum in response
Caching Strategy:
- Store file with last byte position
- Request range from last position
- Append response to existing file
- Verify SHA256 against Repr-Digest
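The caching steps above can be sketched as two pure functions, one that builds the request headers from the cached copy and one that merges the response into it. The names, the CachedIndexFile shape, and the handling of status codes (206 appends, 200 replaces, 304 keeps the cache) are assumptions for this example; the SHA256 check against Repr-Digest is left to the caller.

```typescript
interface CachedIndexFile {
  body: Uint8Array;   // bytes stored so far
  etag: string | null;
}

// Build conditional/range headers from the cached copy, if any.
function buildRangeHeaders(cached: CachedIndexFile | null): Record<string, string> {
  const headers: Record<string, string> = {};
  if (cached === null) return headers; // first fetch: plain GET
  headers["Range"] = `bytes=${cached.body.length}-`;
  if (cached.etag !== null) headers["If-None-Match"] = cached.etag;
  return headers;
}

// Merge a response into the cache according to its status code.
function applyRangeResponse(
  cached: CachedIndexFile | null,
  status: number,
  responseBody: Uint8Array,
  etag: string | null,
): CachedIndexFile {
  if (status === 304 && cached !== null) return cached; // unchanged
  if (status === 206 && cached !== null) {
    // Partial content: append the new bytes to what is already stored.
    const merged = new Uint8Array(cached.body.length + responseBody.length);
    merged.set(cached.body, 0);
    merged.set(responseBody, cached.body.length);
    return { body: merged, etag };
  }
  if (status === 200) return { body: responseBody, etag }; // full replacement
  throw new Error(`unexpected status ${status}`);
}
```

If the digest of the merged bytes does not match Repr-Digest, discarding the cache and refetching the whole file is a reasonable fallback.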
RubyGems Upload/Management API
Upload Gem:
- POST /api/v1/gems
- Binary .gem file in request body
- Authorization header with API key
Yank Version:
- DELETE /api/v1/gems/yank
- Parameters: gem_name, version
Unyank Version:
- PUT /api/v1/gems/unyank
- Parameters: gem_name, version
Version Metadata:
- GET /api/v1/versions/<gem>.json
- Returns JSON array of versions
Dependencies:
- GET /api/v1/dependencies?gems=<comma-list>
- Returns dependency information for resolution
Implementation Strategy
Storage Paths
PyPI:
pypi/
├── simple/ # PEP 503 HTML files
│ ├── index.html # All packages list
│ └── {package}/index.html # Package versions list
├── packages/
│ └── {package}/{filename} # .whl and .tar.gz files
└── metadata/
└── {package}/metadata.json # Package metadata
RubyGems:
rubygems/
├── versions # Master versions file
├── info/{gemname} # Per-gem info files
├── names # All gem names
└── gems/{gemname}-{version}.gem # .gem files
Authentication Pattern
Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, Composer:
// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean
createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean
Scope Format
pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
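One way to build and validate scope strings in the two formats above is sketched here. The types and function names are hypothetical and are not taken from the existing AuthManager; only exact package or gem names are handled.

```typescript
type Protocol = "pypi" | "rubygems";
type Action = "read" | "write" | "yank";

interface ParsedScope { protocol: Protocol; name: string; action: Action }

const RESOURCE_KIND: Record<Protocol, string> = { pypi: "package", rubygems: "gem" };
const ALLOWED_ACTIONS: Record<Protocol, Action[]> = {
  pypi: ["read", "write"],
  rubygems: ["read", "write", "yank"],
};

function formatScope(scope: ParsedScope): string {
  return `${scope.protocol}:${RESOURCE_KIND[scope.protocol]}:${scope.name}:${scope.action}`;
}

// Returns null for anything that does not match one of the two formats.
function parseScope(scope: string): ParsedScope | null {
  const parts = scope.split(":");
  if (parts.length !== 4) return null;
  const [protocol, kind, name, action] = parts as [string, string, string, string];
  if (protocol !== "pypi" && protocol !== "rubygems") return null;
  if (kind !== RESOURCE_KIND[protocol] || name === "") return null;
  const allowed = ALLOWED_ACTIONS[protocol].find((a) => a === action);
  return allowed ? { protocol, name, action: allowed } : null;
}
```

For PyPI scopes it is probably safest to store and compare the normalized package name, so that a token issued for one spelling also matches requests using another.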
Common Patterns
- Package name normalization - Critical for PyPI
- Checksum calculation - SHA256 for both protocols
- Append-only files - RubyGems compact index
- Content negotiation - PyPI JSON vs HTML
- Multipart upload parsing - PyPI file uploads
- Binary file handling - Both protocols (.whl, .tar.gz, .gem)
Key Differences from Existing Protocols
PyPI vs NPM:
- PyPI uses Simple API (HTML) + JSON API
- PyPI requires package name normalization
- PyPI uses multipart form data for uploads (not JSON)
- PyPI supports multiple file types per release (wheel + sdist)
RubyGems vs Cargo:
- RubyGems uses compact index (append-only text files)
- RubyGems uses checksums in index files (not just filenames)
- RubyGems has HTTP Range support for incremental updates
- RubyGems uses MD5 for index checksums, SHA256 for .gem files
Testing Requirements
PyPI Tests Must Cover:
- Package upload (wheel and sdist)
- Package name normalization
- Simple API HTML generation (PEP 503)
- JSON API responses (PEP 691)
- Content negotiation
- Hash calculation and verification
- Authentication (tokens)
- Multi-file releases
- Yanked packages
RubyGems Tests Must Cover:
- Gem upload
- Compact index generation
- /versions file updates (append-only)
- /info/<gem> file generation
- /names file generation
- Checksum calculations (MD5 and SHA256)
- Platform-specific gems
- Yanking/unyanking
- HTTP Range requests
- Authentication (API keys)
Security Considerations
- Package name validation - Prevent path traversal
- File size limits - Prevent DoS via large uploads
- Content-Type validation - Verify file types
- Checksum verification - Ensure file integrity
- Token scope enforcement - Read vs write permissions
- HTML escaping - Prevent XSS in generated HTML
- Metadata sanitization - Clean user-provided strings
- Rate limiting - Consider upload frequency limits
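For the package name validation item, a minimal sketch is to accept only names that match the standard Python project-name pattern (letters, digits, ".", "-", "_", starting and ending with a letter or digit). Such names cannot contain path separators or start with a dot. The function name and the length cap of 214 characters are assumptions for this example.

```typescript
// Python project-name pattern: alphanumeric at both ends, with ".", "-", "_"
// allowed in between. Case-insensitive.
const PYPI_NAME_RE = /^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$/i;

// Reject names that could escape the storage directory or are implausibly long.
function isSafePackageName(name: string): boolean {
  return name.length <= 214 && PYPI_NAME_RE.test(name);
}

console.log(isSafePackageName("requests"));       // true
console.log(isSafePackageName("../etc/passwd"));  // false
```

Running this check before normalization and before building any storage path keeps traversal sequences out of the pypi/ tree entirely.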