# Project Readme Hints
## Python (PyPI) Protocol Implementation Notes
### PEP 503: Simple Repository API (HTML-based)
**URL Structure:**
- Root: `/<base>/` - Lists all projects
- Project: `/<base>/<project>/` - Lists all files for a project
- All URLs MUST end with `/` (redirect if missing)
**Package Name Normalization:**
- Lowercase all characters
- Replace runs of `.`, `-`, `_` with single `-`
- Implementation: `re.sub(r"[-_.]+", "-", name).lower()`
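
The Python one-liner above translates directly to TypeScript. This is a minimal sketch; the function name `normalizePypiName` is a hypothetical helper, not an existing API in this project.

```typescript
// Normalize a project name as described above (PEP 503):
// collapse runs of ".", "-", "_" into a single "-" and lowercase the result.
function normalizePypiName(name: string): string {
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

// Example: both spellings resolve to the same project path segment.
const normalizedA = normalizePypiName("Foo.Bar__baz");
const normalizedB = normalizePypiName("foo-bar-baz");
console.log(normalizedA === normalizedB);
```
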
**HTML Format:**
- Root: One anchor per project
- Project: One anchor per file
- Anchor text must match final filename
- Anchor href links to download URL
**Hash Fragments:**
Format: `#<hashname>=<hashvalue>`
- hashname: lowercase hash function name (recommend `sha256`)
- hashvalue: hex-encoded digest
**Data Attributes:**
- `data-gpg-sig`: `true`/`false` for GPG signature presence
- `data-requires-python`: PEP 345 requirement string (HTML-encode `<` as `&lt;`, `>` as `&gt;`)
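
One possible way to render a single file anchor with the hash fragment and data attributes described above. The `SimpleFileEntry` shape and the helper names are assumptions made for illustration; only the attribute names and the fragment format come from the notes.

```typescript
// Hypothetical shape for one file entry; not an existing type in this project.
interface SimpleFileEntry {
  filename: string;
  url: string;
  sha256?: string;
  requiresPython?: string;
  hasGpgSig?: boolean;
}

// Escape characters that are unsafe in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one <a> element: href carries "#sha256=<hex>", anchor text is the filename.
function renderFileAnchor(file: SimpleFileEntry): string {
  const href = file.sha256 ? `${file.url}#sha256=${file.sha256}` : file.url;
  const attrs: string[] = [`href="${escapeHtml(href)}"`];
  if (file.requiresPython !== undefined) {
    attrs.push(`data-requires-python="${escapeHtml(file.requiresPython)}"`);
  }
  if (file.hasGpgSig !== undefined) {
    attrs.push(`data-gpg-sig="${file.hasGpgSig ? "true" : "false"}"`);
  }
  return `<a ${attrs.join(" ")}>${escapeHtml(file.filename)}</a>`;
}
```
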
### PEP 691: JSON-based Simple API
**Content Types:**
- `application/vnd.pypi.simple.v1+json` - JSON format
- `application/vnd.pypi.simple.v1+html` - HTML format
- `text/html` - Alias for HTML (backwards compat)
**Root Endpoint JSON:**
```json
{
  "meta": {"api-version": "1.0"},
  "projects": [{"name": "ProjectName"}]
}
```
**Project Endpoint JSON:**
```json
{
  "name": "normalized-name",
  "meta": {"api-version": "1.0"},
  "files": [
    {
      "filename": "package-1.0-py3-none-any.whl",
      "url": "https://example.com/path/to/file",
      "hashes": {"sha256": "..."},
      "requires-python": ">=3.7",
      "dist-info-metadata": {"sha256": "..."},
      "gpg-sig": true,
      "yanked": false
    }
  ]
}
```
Two fields accept more than one type: `dist-info-metadata` is either `true` or a hash dictionary such as `{"sha256": "..."}`, and `yanked` is either `false` or a string giving the reason.

**Content Negotiation:**
- Use `Accept` header for format selection
- Server responds with `Content-Type` header
- Support both JSON and HTML formats
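
A rough sketch of `Accept`-based format selection for the three content types listed above. The function name, the fallback to `text/html` when no `Accept` header is sent, and the `null` return for "nothing acceptable" are assumptions of this sketch; the caller would decide how to answer in that case.

```typescript
const JSON_TYPE = "application/vnd.pypi.simple.v1+json";
const HTML_TYPE = "application/vnd.pypi.simple.v1+html";
const LEGACY_HTML_TYPE = "text/html";
const SUPPORTED_TYPES = [JSON_TYPE, HTML_TYPE, LEGACY_HTML_TYPE];

// Pick the response Content-Type for a Simple API request.
// Returns null when none of the requested types is supported.
function selectSimpleContentType(accept: string | undefined): string | null {
  if (accept === undefined || accept.trim() === "") {
    return LEGACY_HTML_TYPE; // assumed default for clients that send no Accept header
  }
  let best: string | null = null;
  let bestQ = 0;
  for (const part of accept.split(",")) {
    const pieces = part.split(";").map((p) => p.trim());
    const mediaType = (pieces[0] ?? "").toLowerCase();
    let q = 1; // quality defaults to 1 when no q parameter is given
    for (const param of pieces.slice(1)) {
      if (param.toLowerCase().indexOf("q=") === 0) {
        const parsed = Number.parseFloat(param.slice(2));
        q = Number.isNaN(parsed) ? 0 : parsed;
      }
    }
    const candidate: string | null =
      mediaType === "*/*"
        ? LEGACY_HTML_TYPE
        : SUPPORTED_TYPES.indexOf(mediaType) >= 0
          ? mediaType
          : null;
    // Strict comparison: on equal quality the type listed first wins.
    if (candidate !== null && q > bestQ) {
      best = candidate;
      bestQ = q;
    }
  }
  return best;
}
```
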
### PyPI Upload API (Legacy /legacy/)
**Endpoint:**
- URL: `https://upload.pypi.org/legacy/`
- Method: `POST`
- Content-Type: `multipart/form-data`
**Required Form Fields:**
- `:action` = `file_upload`
- `protocol_version` = `1`
- `content` = Binary file data with filename
- `filetype` = `bdist_wheel` | `sdist`
- `pyversion` = Python tag (e.g., `py3`, `py2.py3`) or `source` for sdist
- `metadata_version` = Metadata standard version
- `name` = Package name
- `version` = Version string
**Hash Digest (one required):**
- `md5_digest`: urlsafe base64 without padding
- `sha256_digest`: hexadecimal
- `blake2_256_digest`: hexadecimal
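
The two most common digests can be computed with Node's built-in `crypto` module, as sketched below. `computeUploadDigests` is a hypothetical helper name, and `blake2_256_digest` is left out to keep the sketch short.

```typescript
import { createHash } from "node:crypto";

interface UploadDigests {
  md5_digest: string; // urlsafe base64, no padding
  sha256_digest: string; // lowercase hexadecimal
}

// Compute the digests in the encodings the upload form expects.
function computeUploadDigests(content: Uint8Array): UploadDigests {
  const md5Base64 = createHash("md5").update(content).digest("base64");
  // Convert standard base64 to the urlsafe alphabet and strip "=" padding.
  const md5UrlSafe = md5Base64
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
  const sha256Hex = createHash("sha256").update(content).digest("hex");
  return { md5_digest: md5UrlSafe, sha256_digest: sha256Hex };
}
```

A server can recompute these values from the uploaded `content` field and reject the upload when they differ from the submitted form values.
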
**Optional Fields:**
- `attestations`: JSON array of attestation objects
- Any Core Metadata fields (lowercase, hyphens → underscores)
- Example: `Description-Content-Type` → `description_content_type`
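
The field-name mapping above is a one-line transformation. A minimal sketch, with `coreMetadataFieldToFormKey` as a hypothetical helper name:

```typescript
// Map a Core Metadata field name to its upload form key:
// lowercase the name and replace hyphens with underscores.
function coreMetadataFieldToFormKey(fieldName: string): string {
  return fieldName.toLowerCase().replace(/-/g, "_");
}
```
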
**Authentication:**
- Username/password or API token in HTTP Basic Auth
- API tokens: username = `__token__`, password = token value
**Behavior:**
- First file uploaded creates the release
- Multiple files uploaded sequentially for same version
### PEP 694: Upload 2.0 API
**Status:** Draft (not yet required, legacy API still supported)
- Multi-step workflow with sessions
- Async upload support with resumption
- JSON-based API
- Standard HTTP auth (RFC 7235)
- Not implementing initially (legacy API sufficient)
---
## Ruby (RubyGems) Protocol Implementation Notes
### Compact Index Format
**Endpoints:**
- `/versions` - Master list of all gems and versions
- `/info/<RUBYGEM>` - Detailed info for specific gem
- `/names` - Simple list of gem names
**Authentication:**
- UUID tokens similar to NPM pattern
- API key in `Authorization` header
- Scope format: `rubygems:gem:{name}:{read|write|yank}`
### `/versions` File Format
**Structure:**
```
created_at: 2024-04-01T00:00:05Z
---
RUBYGEM [-]VERSION_PLATFORM[,VERSION_PLATFORM,...] MD5
```
**Details:**
- Metadata lines before `---` delimiter
- One line per gem with comma-separated versions
- `[-]` prefix indicates yanked version
- `MD5`: Checksum of corresponding `/info/<RUBYGEM>` file
- Append-only within a month; the whole file is recalculated monthly
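
A parsing sketch for the `/versions` format described above. The type and function names are hypothetical, and each `VERSION_PLATFORM` token is kept as a raw string rather than split further.

```typescript
interface VersionsEntry {
  gem: string;
  versions: { versionPlatform: string; yanked: boolean }[];
  infoChecksum: string; // MD5 of the matching /info/<RUBYGEM> file
}

// Parse one "RUBYGEM [-]VERSION_PLATFORM[,...] MD5" line.
function parseVersionsLine(line: string): VersionsEntry {
  const fields = line.trim().split(" ");
  if (fields.length !== 3) {
    throw new Error(`Malformed /versions line: ${line}`);
  }
  const versions = fields[1].split(",").map((entry) => {
    const yanked = entry.charAt(0) === "-"; // "-" prefix marks a yanked version
    return { versionPlatform: yanked ? entry.slice(1) : entry, yanked };
  });
  return { gem: fields[0], versions, infoChecksum: fields[2] };
}

// Parse a whole file: skip metadata lines up to and including the "---" delimiter.
function parseVersionsFile(text: string): VersionsEntry[] {
  const lines = text.split("\n");
  const delimiter = lines.indexOf("---");
  if (delimiter < 0) {
    throw new Error("Missing --- delimiter in /versions file");
  }
  return lines
    .slice(delimiter + 1)
    .filter((l) => l.trim() !== "")
    .map(parseVersionsLine);
}
```

Because the file is append-only, the same gem can appear on several lines; a consumer would merge entries by gem name and treat the last checksum as current.
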
### `/info/<RUBYGEM>` File Format
**Structure:**
```
---
VERSION[-PLATFORM] [DEPENDENCY[,DEPENDENCY,...]]|REQUIREMENT[,REQUIREMENT,...]
```
**Dependency Format:**
```
GEM:CONSTRAINT[&CONSTRAINT]
```
- Examples: `actionmailer:= 2.2.2`, `parser:>= 3.2.2.3`
- Operators: `=`, `>`, `<`, `>=`, `<=`, `~>`, `!=`
- Multiple constraints: `unicode-display_width:< 3.0&>= 2.4.0`
**Requirement Format:**
```
checksum:SHA256_HEX
ruby:CONSTRAINT
rubygems:CONSTRAINT
```
**Platform:**
- Default platform is `ruby`
- Non-default platforms: `VERSION-PLATFORM` (e.g., `3.2.1-arm64-darwin`)
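
The `/info/<RUBYGEM>` line format above can be parsed roughly as follows. All names are hypothetical, and the sketch assumes the version number itself contains no `-`, so the first `-` separates version from platform.

```typescript
interface GemDependency {
  name: string;
  constraints: string[];
}

interface InfoEntry {
  version: string;
  platform: string;
  dependencies: GemDependency[];
  requirements: Record<string, string[]>;
}

// Split on the first occurrence of a separator only.
function splitOnce(text: string, separator: string): [string, string] {
  const index = text.indexOf(separator);
  return index < 0
    ? [text, ""]
    : [text.slice(0, index), text.slice(index + separator.length)];
}

// Parse "VERSION[-PLATFORM] [DEPENDENCY,...]|REQUIREMENT,..."
function parseInfoLine(line: string): InfoEntry {
  const [versionPlatform, rest] = splitOnce(line.trim(), " ");
  const [version, platform] = splitOnce(versionPlatform, "-");
  const [dependencyPart, requirementPart] = splitOnce(rest, "|");

  const dependencies: GemDependency[] =
    dependencyPart.trim() === ""
      ? []
      : dependencyPart.split(",").map((dep) => {
          const [name, constraints] = splitOnce(dep, ":");
          return { name, constraints: constraints.split("&") };
        });

  const requirements: Record<string, string[]> = {};
  for (const requirement of requirementPart.split(",")) {
    if (requirement === "") continue;
    const [key, value] = splitOnce(requirement, ":");
    requirements[key] = value.split("&");
  }

  // An absent platform suffix means the default "ruby" platform.
  return {
    version,
    platform: platform === "" ? "ruby" : platform,
    dependencies,
    requirements,
  };
}
```
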
**Yanked Gems:**
- Listed with `-` prefix in `/versions`
- Excluded entirely from `/info/<RUBYGEM>` file
### `/names` File Format
```
---
gemname1
gemname2
gemname3
```
### HTTP Range Support
**Headers:**
- `Range: bytes=#{start}-`: Request from byte position
- `If-None-Match`: ETag conditional request
- `Repr-Digest`: SHA256 checksum in response
**Caching Strategy:**
1. Store file with last byte position
2. Request range from last position
3. Append response to existing file
4. Verify SHA256 against `Repr-Digest`
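
The four steps above could be sketched as two small helpers: one that builds the request headers and one that merges the response into the cache. The function names are hypothetical. The sketch also assumes the `Repr-Digest` value uses the `sha-256=:<base64>:` form, and that a `200` response carries the full file while `206` carries only the appended tail.

```typescript
import { createHash } from "node:crypto";

// Step 2: headers for an incremental request starting at the cached length.
function buildRangeHeaders(cachedLength: number, etag?: string): Record<string, string> {
  const headers: Record<string, string> = { Range: `bytes=${cachedLength}-` };
  if (etag !== undefined) {
    headers["If-None-Match"] = etag;
  }
  return headers;
}

// Extract the base64 SHA-256 value from a Repr-Digest header (assumed format).
function parseReprDigest(header: string): string | null {
  const match = /sha-256=:([A-Za-z0-9+\/]+=*):/.exec(header);
  return match === null ? null : match[1];
}

// Steps 3 and 4: merge the response into the cached bytes and verify the result.
function applyRangeResponse(
  cached: Uint8Array,
  status: number,
  body: Uint8Array,
  reprDigest?: string,
): Uint8Array {
  if (status === 304) {
    return cached; // not modified: keep the cached copy as-is
  }
  let combined: Uint8Array;
  if (status === 206) {
    combined = new Uint8Array(cached.length + body.length);
    combined.set(cached, 0);
    combined.set(body, cached.length);
  } else if (status === 200) {
    combined = body; // server sent the full file; replace the cache
  } else {
    throw new Error(`Unexpected status ${status}`);
  }
  if (reprDigest !== undefined) {
    const expected = parseReprDigest(reprDigest);
    const actual = createHash("sha256").update(combined).digest("base64");
    if (expected === null || expected !== actual) {
      throw new Error("Repr-Digest mismatch; discard the cache and refetch the full file");
    }
  }
  return combined;
}
```
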
### RubyGems Upload/Management API
**Upload Gem:**
- `POST /api/v1/gems`
- Binary `.gem` file in request body
- `Authorization` header with API key
**Yank Version:**
- `DELETE /api/v1/gems/yank`
- Parameters: `gem_name`, `version`
**Unyank Version:**
- `PUT /api/v1/gems/unyank`
- Parameters: `gem_name`, `version`
**Version Metadata:**
- `GET /api/v1/versions/<gem>.json`
- Returns JSON array of versions
**Dependencies:**
- `GET /api/v1/dependencies?gems=<comma-list>`
- Returns dependency information for resolution
---
## Implementation Strategy
### Storage Paths
**PyPI:**
```
pypi/
├── simple/                      # PEP 503 HTML files
│   ├── index.html               # All packages list
│   └── {package}/index.html     # Package versions list
├── packages/
│   └── {package}/{filename}     # .whl and .tar.gz files
└── metadata/
    └── {package}/metadata.json  # Package metadata
```
**RubyGems:**
```
rubygems/
├── versions # Master versions file
├── info/{gemname} # Per-gem info files
├── names # All gem names
└── gems/{gemname}-{version}.gem # .gem files
```
### Authentication Pattern
Both protocols should follow the existing UUID token pattern used by NPM, Maven, Cargo, and Composer:
```typescript
// AuthManager additions
createPypiToken(userId: string, readonly: boolean): string
validatePypiToken(token: string): ITokenInfo | null
revokePypiToken(token: string): boolean
createRubyGemsToken(userId: string, readonly: boolean): string
validateRubyGemsToken(token: string): ITokenInfo | null
revokeRubyGemsToken(token: string): boolean
```
### Scope Format
```
pypi:package:{name}:{read|write}
rubygems:gem:{name}:{read|write|yank}
```
### Common Patterns
1. **Package name normalization** - Critical for PyPI
2. **Checksum calculation** - SHA256 for both protocols
3. **Append-only files** - RubyGems compact index
4. **Content negotiation** - PyPI JSON vs HTML
5. **Multipart upload parsing** - PyPI file uploads
6. **Binary file handling** - Both protocols (.whl, .tar.gz, .gem)
---
## Key Differences from Existing Protocols
**PyPI vs NPM:**
- PyPI uses Simple API (HTML) + JSON API
- PyPI requires package name normalization
- PyPI uses multipart form data for uploads (not JSON)
- PyPI supports multiple file types per release (wheel + sdist)
**RubyGems vs Cargo:**
- RubyGems uses compact index (append-only text files)
- RubyGems uses checksums in index files (not just filenames)
- RubyGems has HTTP Range support for incremental updates
- RubyGems uses MD5 for index checksums, SHA256 for .gem files
---
## Testing Requirements
### PyPI Tests Must Cover:
- Package upload (wheel and sdist)
- Package name normalization
- Simple API HTML generation (PEP 503)
- JSON API responses (PEP 691)
- Content negotiation
- Hash calculation and verification
- Authentication (tokens)
- Multi-file releases
- Yanked packages
### RubyGems Tests Must Cover:
- Gem upload
- Compact index generation
- `/versions` file updates (append-only)
- `/info/<gem>` file generation
- `/names` file generation
- Checksum calculations (MD5 and SHA256)
- Platform-specific gems
- Yanking/unyanking
- HTTP Range requests
- Authentication (API keys)
---
## Security Considerations
1. **Package name validation** - Prevent path traversal
2. **File size limits** - Prevent DoS via large uploads
3. **Content-Type validation** - Verify file types
4. **Checksum verification** - Ensure file integrity
5. **Token scope enforcement** - Read vs write permissions
6. **HTML escaping** - Prevent XSS in generated HTML
7. **Metadata sanitization** - Clean user-provided strings
8. **Rate limiting** - Consider upload frequency limits
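
For item 1, a minimal validation sketch. The name pattern follows the project-name rule from the Python packaging specifications (letters, digits, `.`, `_`, `-`, starting and ending with a letter or digit); the filename check and both function names are assumptions of this sketch.

```typescript
// Valid project names: ASCII letters, digits, ".", "_" and "-",
// beginning and ending with a letter or digit.
const PYPI_NAME_PATTERN = /^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$/i;

function isValidPypiName(name: string): boolean {
  return PYPI_NAME_PATTERN.test(name);
}

// Reject filenames that could escape the storage directory.
function isSafeFilename(filename: string): boolean {
  return (
    filename !== "" &&
    filename.indexOf("/") < 0 &&
    filename.indexOf("\\") < 0 &&
    filename.indexOf("\0") < 0 &&
    filename !== "." &&
    filename !== ".."
  );
}
```

Validating before building a storage path means `{package}` and `{filename}` never contain path separators, so joined paths stay inside the registry's storage root.
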