2025-11-19 14:41:19 +00:00

# Project Readme Hints

2025-11-21 14:23:18 +00:00

## Python (PyPI) Protocol Implementation Notes

### PEP 503: Simple Repository API (HTML-based)

**URL Structure:**
- Root: `/<base>/` - Lists all projects
- Project: `/<base>/<project>/` - Lists all files for a project
- All URLs MUST end with `/` (redirect if missing)

**Package Name Normalization:**
- Lowercase all characters
- Replace runs of `.`, `-`, `_` with single `-`
- Implementation: `re.sub(r"[-_.]+", "-", name).lower()`

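
The normalization rule and the trailing-slash requirement above can be combined into a small helper. This is a minimal sketch: `normalize_name` wraps the regex quoted in the notes, while `project_url` and its `base` parameter are hypothetical names used only for illustration.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name using the PEP 503 regex quoted above."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_url(base: str, name: str) -> str:
    """Build a project page path (hypothetical helper); always ends with '/'."""
    return f"/{base.strip('/')}/{normalize_name(name)}/"
```

A server would typically normalize the requested name before any lookup, so that `Foo.Bar` and `foo-bar` resolve to the same project.
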
**HTML Format:**
- Root: One anchor per project
- Project: One anchor per file
- Anchor text must match final filename
- Anchor href links to download URL

**Hash Fragments:**
Format: `#<hashname>=<hashvalue>`
- hashname: lowercase hash function name (recommend `sha256`)
- hashvalue: hex-encoded digest

**Data Attributes:**
- `data-gpg-sig`: `true`/`false` for GPG signature presence
- `data-requires-python`: PEP 345 requirement string (HTML-encode `<` as `&lt;`, `>` as `&gt;`)

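
As an illustration of the anchor rules above, the following sketch renders one file anchor with a `sha256` hash fragment and an escaped `data-requires-python` attribute. The function name `file_anchor` and its parameters are assumptions made for this example, not part of any existing code.

```python
import hashlib
import html
from typing import Optional


def file_anchor(filename: str, url: str, data: bytes,
                requires_python: Optional[str] = None) -> str:
    """Render one PEP 503 file anchor (hypothetical helper)."""
    digest = hashlib.sha256(data).hexdigest()
    # Hash fragment format: #<hashname>=<hashvalue>
    href = f"{url}#sha256={digest}"
    attrs = [f'href="{html.escape(href)}"']
    if requires_python is not None:
        # html.escape turns '<' into '&lt;' and '>' into '&gt;'
        attrs.append(f'data-requires-python="{html.escape(requires_python)}"')
    # Anchor text must match the final filename
    return f"<a {' '.join(attrs)}>{html.escape(filename)}</a>"
```

Escaping the filename and URL as well as the attribute value also covers the HTML-escaping concern for generated pages.
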
### PEP 691: JSON-based Simple API

**Content Types:**
- `application/vnd.pypi.simple.v1+json` - JSON format
- `application/vnd.pypi.simple.v1+html` - HTML format
- `text/html` - Alias for HTML (backwards compat)

**Root Endpoint JSON:**
```json
{
  "meta": {"api-version": "1.0"},
  "projects": [{"name": "ProjectName"}]
}
```

**Project Endpoint JSON:**
```json
{
  "name": "normalized-name",
  "meta": {"api-version": "1.0"},
  "files": [
    {
      "filename": "package-1.0-py3-none-any.whl",
      "url": "https://example.com/path/to/file",
      "hashes": {"sha256": "..."},
      "requires-python": ">=3.7",
      "dist-info-metadata": {"sha256": "..."},
      "gpg-sig": true,
      "yanked": false
    }
  ]
}
```
- `dist-info-metadata` may be `true` or a hash dictionary such as `{"sha256": "..."}`
- `yanked` may be `false` or a reason string

**Content Negotiation:**
- Use `Accept` header for format selection
- Server responds with `Content-Type` header
- Support both JSON and HTML formats

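
One way to implement the `Accept`-based selection described above is sketched below. It is a simplified negotiation that only honours `q` values and `*/*`; the function name, the choice of `text/html` as the default, and returning `None` to signal "406 Not Acceptable" are all assumptions for illustration.

```python
from typing import Optional

JSON_V1 = "application/vnd.pypi.simple.v1+json"
HTML_V1 = "application/vnd.pypi.simple.v1+html"
LEGACY_HTML = "text/html"
SUPPORTED = (JSON_V1, HTML_V1, LEGACY_HTML)


def choose_content_type(accept: Optional[str],
                        default: str = LEGACY_HTML) -> Optional[str]:
    """Pick a response Content-Type from an Accept header (simplified sketch)."""
    if not accept:
        return default
    candidates = []
    for position, item in enumerate(accept.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header
            candidates.append((-quality, position, media_type))
    for _, _, media_type in sorted(candidates):
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return None  # the caller would answer 406 Not Acceptable
```
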
### PyPI Upload API (Legacy /legacy/)

**Endpoint:**
- URL: `https://upload.pypi.org/legacy/`
- Method: `POST`
- Content-Type: `multipart/form-data`

**Required Form Fields:**
- `:action` = `file_upload`
- `protocol_version` = `1`
- `content` = Binary file data with filename
- `filetype` = `bdist_wheel` | `sdist`
- `pyversion` = Python tag (e.g., `py3`, `py2.py3`) or `source` for sdist
- `metadata_version` = Metadata standard version
- `name` = Package name
- `version` = Version string

**Hash Digest (one required):**
- `md5_digest`: urlsafe base64 without padding
- `sha256_digest`: hexadecimal
- `blake2_256_digest`: hexadecimal

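
The three digest encodings listed above differ, which is easy to get wrong. The sketch below computes all three for an uploaded file and checks whichever ones the client supplied; `upload_digests` and `verify_upload` are hypothetical helper names, and `form` is assumed to be a plain dictionary of already-parsed form fields.

```python
import base64
import hashlib


def upload_digests(data: bytes) -> dict:
    """Compute the digests in the encodings the upload form expects."""
    md5_raw = hashlib.md5(data).digest()
    return {
        # urlsafe base64 with the '=' padding removed
        "md5_digest": base64.urlsafe_b64encode(md5_raw).rstrip(b"=").decode("ascii"),
        "sha256_digest": hashlib.sha256(data).hexdigest(),
        # BLAKE2b with a 256-bit (32-byte) digest
        "blake2_256_digest": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def verify_upload(data: bytes, form: dict) -> bool:
    """Require at least one digest field and reject any mismatch."""
    expected = upload_digests(data)
    provided = {key: form[key] for key in expected if form.get(key)}
    return bool(provided) and all(expected[k] == v for k, v in provided.items())
```
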
**Optional Fields:**
- `attestations`: JSON array of attestation objects
- Any Core Metadata fields (lowercase, hyphens → underscores)
  - Example: `Description-Content-Type` → `description_content_type`

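
The field-name mapping for Core Metadata is mechanical, as this one-function sketch shows (`form_field_name` is a hypothetical name):

```python
def form_field_name(metadata_field: str) -> str:
    """Map a Core Metadata field name to its upload form field name."""
    return metadata_field.lower().replace("-", "_")
```
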
**Authentication:**
- Username/password or API token in HTTP Basic Auth
- API tokens: username = `__token__`, password = token value

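
A possible way to pull an API token out of the Basic Auth header is sketched below. Both function names are assumptions for illustration; the only protocol detail taken from the notes is the `__token__` username convention.

```python
import base64
import binascii
from typing import Optional, Tuple


def parse_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Decode an 'Authorization: Basic ...' value into (username, password)."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # Only the first ':' separates the fields; passwords may contain ':'
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


def extract_api_token(header: str) -> Optional[str]:
    """Return the token when the username is the '__token__' marker."""
    credentials = parse_basic_auth(header)
    if credentials and credentials[0] == "__token__":
        return credentials[1]
    return None
```
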
**Behavior:**
- First file uploaded creates the release
- Multiple files uploaded sequentially for same version

### PEP 694: Upload 2.0 API

**Status:** Draft (not yet required, legacy API still supported)
- Multi-step workflow with sessions
- Async upload support with resumption
- JSON-based API
- Standard HTTP auth (RFC 7235)
- Not implementing initially (legacy API sufficient)

---

## Ruby (RubyGems) Protocol Implementation Notes

### Compact Index Format

**Endpoints:**
- `/versions` - Master list of all gems and versions
- `/info/<RUBYGEM>` - Detailed info for specific gem
- `/names` - Simple list of gem names

**Authentication:**
- UUID tokens similar to NPM pattern
- API key in `Authorization` header
- Scope format: `rubygems:gem:{name}:{read|write|yank}`

### `/versions` File Format

**Structure:**
```
created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5
```

**Details:**
- Metadata lines before `---` delimiter
- One line per gem with comma-separated versions
- `[-]` prefix indicates yanked version
- `MD5`: Checksum of corresponding `/info/<RUBYGEM>` file
- Append-only during month, recalculated monthly

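
To make the line format concrete, here is a minimal parsing sketch. It assumes that, because the file is append-only, later lines for the same gem add versions or yank earlier ones; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class VersionsEntry:
    name: str
    versions: List[str]   # versions added by this line
    yanked: List[str]     # versions carrying the '-' prefix
    info_md5: str         # MD5 of the matching /info/<RUBYGEM> file


def parse_versions_line(line: str) -> VersionsEntry:
    """Parse one 'RUBYGEM VERSIONS MD5' line."""
    name, version_list, info_md5 = line.strip().split(" ")
    versions, yanked = [], []
    for item in version_list.split(","):
        if item.startswith("-"):
            yanked.append(item[1:])
        else:
            versions.append(item)
    return VersionsEntry(name, versions, yanked, info_md5)


def parse_versions_file(text: str) -> Dict[str, Set[str]]:
    """Fold all lines after the '---' delimiter into live versions per gem."""
    _, _, body = text.partition("---\n")
    gems: Dict[str, Set[str]] = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        entry = parse_versions_line(line)
        live = gems.setdefault(entry.name, set())
        live.update(entry.versions)
        live.difference_update(entry.yanked)
    return gems
```
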
### `/info/<RUBYGEM>` File Format

**Structure:**
```
---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]
```

**Dependency Format:**
```
GEM:CONSTRAINT[&CONSTRAINT]
```
- Examples: `actionmailer:= 2.2.2`, `parser:>= 3.2.2.3`
- Operators: `=`, `>`, `<`, `>=`, `<=`, `~>`, `!=`
- Multiple constraints: `unicode-display_width:< 3.0&>= 2.4.0`

**Requirement Format:**
```
checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT
```

**Platform:**
- Default platform is `ruby`
- Non-default platforms: `VERSION-PLATFORM` (e.g., `3.2.1-arm64-darwin`)

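
The pieces above (version, platform, dependencies, requirements) can be pulled apart with plain string splitting. The sketch below assumes the version itself contains no hyphen, so the first `-` marks the start of the platform; `parse_info_line` is a hypothetical name.

```python
from typing import Dict, List


def parse_info_line(line: str) -> dict:
    """Parse 'VERSION[-PLATFORM] [DEPS]|REQS' into its components."""
    left, _, right = line.strip().partition("|")
    version_part, _, deps_part = left.partition(" ")
    # Assumption: the first '-' separates version from platform
    version, _, platform = version_part.partition("-")

    dependencies: Dict[str, List[str]] = {}
    for dep in filter(None, deps_part.split(",")):
        gem, _, constraints = dep.partition(":")
        dependencies[gem] = constraints.split("&")

    requirements: Dict[str, List[str]] = {}
    for req in filter(None, right.split(",")):
        key, _, value = req.partition(":")
        requirements[key] = value.split("&")

    return {
        "version": version,
        "platform": platform or "ruby",  # default platform is 'ruby'
        "dependencies": dependencies,
        "requirements": requirements,
    }
```
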
**Yanked Gems:**
- Listed with `-` prefix in `/versions`
- Excluded entirely from `/info/<RUBYGEM>` file

### `/names` File Format

```
---
gemname1
gemname2
gemname3
```

### HTTP Range Support

**Headers:**
- `Range: bytes=#{start}-`: Request from byte position
- `If-None-Match`: ETag conditional request
- `Repr-Digest`: SHA256 checksum in response

**Caching Strategy:**
1. Store file with last byte position
2. Request range from last position
3. Append response to existing file
4. Verify SHA256 against `Repr-Digest`

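
The four caching steps can be sketched from the client's point of view, which also helps when writing tests for the server side. This example assumes the `Repr-Digest` value uses the `sha-256=:<base64>:` form; that encoding, the function names, and the handling of status codes 200, 206, and 304 are assumptions rather than details confirmed by these notes.

```python
import base64
import hashlib
from typing import Optional


def range_request_headers(cached_size: int, etag: Optional[str] = None) -> dict:
    """Steps 1-2: ask only for bytes after what is already cached."""
    headers = {"Range": f"bytes={cached_size}-"}
    if etag:
        headers["If-None-Match"] = etag
    return headers


def sha256_from_repr_digest(header: str) -> Optional[bytes]:
    """Extract the raw SHA-256 digest (assumed form: sha-256=:<base64>:)."""
    for item in header.split(","):
        algorithm, _, value = item.strip().partition("=")
        if algorithm.lower() == "sha-256":
            return base64.b64decode(value.strip(":"))
    return None


def apply_response(cached: bytes, status: int, body: bytes, repr_digest: str) -> bytes:
    """Steps 3-4: append a partial response, then verify the whole file."""
    if status == 304:
        return cached  # not modified
    updated = cached + body if status == 206 else body
    expected = sha256_from_repr_digest(repr_digest)
    if expected is None or hashlib.sha256(updated).digest() != expected:
        raise ValueError("checksum mismatch; refetch the full file")
    return updated
```
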
### RubyGems Upload/Management API

**Upload Gem:**
- `POST /api/v1/gems`
- Binary `.gem` file in request body
- `Authorization` header with API key

**Yank Version:**
- `DELETE /api/v1/gems/yank`
- Parameters: `gem_name`, `version`

**Unyank Version:**
- `PUT /api/v1/gems/unyank`
- Parameters: `gem_name`, `version`

**Version Metadata:**
- `GET /api/v1/versions/<gem>.json`
- Returns JSON array of versions

**Dependencies:**
- `GET /api/v1/dependencies?gems=<comma-list>`
- Returns dependency information for resolution

---

## Implementation Strategy

### Storage Paths

**PyPI:**
```
pypi/
├── simple/                       # PEP 503 HTML files
│   ├── index.html                # All packages list
│   └── {package}/index.html      # Package versions list
├── packages/
│   └── {package}/{filename}      # .whl and .tar.gz files
└── metadata/
    └── {package}/metadata.json   # Package metadata
```

**RubyGems:**
```
rubygems/
├── versions                      # Master versions file
├── info/{gemname}                # Per-gem info files
├── names                         # All gem names
└── gems/{gemname}-{version}.gem  # .gem files
```

### Authentication Pattern

Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, Composer:

```typescript
// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean

createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean
```

### Scope Format

```
pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
```

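
A scope check for the two formats above might look like the following language-agnostic sketch, written in Python for brevity. It assumes exact matching only (a `write` grant does not imply `read`), which is a design choice these notes do not settle; `Scope`, `parse_scope`, and `is_allowed` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Scope(NamedTuple):
    protocol: str
    kind: str
    name: str
    action: str


# Actions permitted per (protocol, kind), taken from the scope formats above
ALLOWED_ACTIONS = {
    ("pypi", "package"): {"read", "write"},
    ("rubygems", "gem"): {"read", "write", "yank"},
}


def parse_scope(scope: str) -> Scope:
    """Split 'protocol:kind:name:action' and validate the action."""
    parts = scope.split(":")
    if len(parts) != 4:
        raise ValueError(f"malformed scope: {scope!r}")
    parsed = Scope(*parts)
    allowed = ALLOWED_ACTIONS.get((parsed.protocol, parsed.kind), set())
    if parsed.action not in allowed:
        raise ValueError(f"unsupported scope: {scope!r}")
    return parsed


def is_allowed(granted: Iterable[str], required: str) -> bool:
    """Exact-match check: no action implies any other action."""
    needed = parse_scope(required)
    return any(parse_scope(item) == needed for item in granted)
```
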
### Common Patterns

1. **Package name normalization** - Critical for PyPI
2. **Checksum calculation** - SHA256 for both protocols
3. **Append-only files** - RubyGems compact index
4. **Content negotiation** - PyPI JSON vs HTML
5. **Multipart upload parsing** - PyPI file uploads
6. **Binary file handling** - Both protocols (.whl, .tar.gz, .gem)

---

## Key Differences from Existing Protocols

**PyPI vs NPM:**
- PyPI uses Simple API (HTML) + JSON API
- PyPI requires package name normalization
- PyPI uses multipart form data for uploads (not JSON)
- PyPI supports multiple file types per release (wheel + sdist)

**RubyGems vs Cargo:**
- RubyGems uses compact index (append-only text files)
- RubyGems uses checksums in index files (not just filenames)
- RubyGems has HTTP Range support for incremental updates
- RubyGems uses MD5 for index checksums, SHA256 for .gem files

---

## Testing Requirements

### PyPI Tests Must Cover:
- Package upload (wheel and sdist)
- Package name normalization
- Simple API HTML generation (PEP 503)
- JSON API responses (PEP 691)
- Content negotiation
- Hash calculation and verification
- Authentication (tokens)
- Multi-file releases
- Yanked packages

### RubyGems Tests Must Cover:
- Gem upload
- Compact index generation
- `/versions` file updates (append-only)
- `/info/<gem>` file generation
- `/names` file generation
- Checksum calculations (MD5 and SHA256)
- Platform-specific gems
- Yanking/unyanking
- HTTP Range requests
- Authentication (API keys)

---

## Security Considerations

1. **Package name validation** - Prevent path traversal
2. **File size limits** - Prevent DoS via large uploads
3. **Content-Type validation** - Verify file types
4. **Checksum verification** - Ensure file integrity
5. **Token scope enforcement** - Read vs write permissions
6. **HTML escaping** - Prevent XSS in generated HTML
7. **Metadata sanitization** - Clean user-provided strings
8. **Rate limiting** - Consider upload frequency limits
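
For the first item, a minimal sketch of name and filename validation before building a storage path is shown below. The name pattern follows the commonly used Python project-name rule (letters, digits, `.`, `_`, `-`, starting and ending with a letter or digit); the function name is hypothetical, and the `pypi/packages/` prefix is taken from the storage layout in these notes.

```python
import re
from pathlib import PurePosixPath

# Letters/digits at both ends; '.', '_' and '-' allowed in between
_NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def safe_package_path(name: str, filename: str) -> str:
    """Validate inputs, then build 'pypi/packages/{package}/{filename}'."""
    if not _NAME_RE.match(name):
        raise ValueError(f"invalid package name: {name!r}")
    # Reject anything that could escape the package directory
    if (filename in ("", ".", "..")
            or "/" in filename or "\\" in filename or "\x00" in filename):
        raise ValueError(f"invalid filename: {filename!r}")
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    return str(PurePosixPath("pypi/packages") / normalized / filename)
```

Rejecting bad input outright, rather than trying to clean it, keeps the check easy to reason about and to test.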