An easy-to-host PDS on the ATProtocol, MacOS. Grandma-approved.

docs: add MM-92 relay signing key generation design plan

Completed brainstorming session. Design includes:
- RustCrypto stack (p256 + aes-gcm + multibase), all audited
- AES-256-GCM encryption at rest with per-key OsRng nonces
- did:key URI format for key IDs (P-256 multicodec 0x1200)
- 3 implementation phases: crypto crate, config+migration, relay endpoint

authored by malpercio.dev and committed by

Tangled 2349133b 3d8e2da5

+189
docs/design-plans/2026-03-11-MM-92.md
# Relay Signing Key Generation Design

## Summary

This ticket adds signing key generation to the relay, the first step toward the relay acting as an ATProto signing authority. A new `POST /v1/relay/keys` endpoint accepts an algorithm name, generates a P-256 ECDSA keypair, encrypts the private key using AES-256-GCM with an operator-supplied master key, persists the encrypted key in SQLite, and returns the key's `did:key` identifier and compressed public key to the caller. The endpoint is gated behind a static admin Bearer token so only the server operator can invoke it.

The work is organized across three layers following the Functional Core / Imperative Shell pattern already established in this codebase. All cryptographic operations (keypair generation, AES-256-GCM encryption/decryption, and `did:key` URI derivation) are implemented as pure functions in `crates/crypto` with no I/O or async code. The `relay` crate acts as the sole imperative shell: it reads configuration, calls the crypto functions, writes to SQLite via a new V003 migration, and returns the HTTP response. Two new optional config fields (`EZPDS_ADMIN_TOKEN` and `EZPDS_SIGNING_KEY_MASTER_KEY`) are added to `crates/common` following the existing three-layer config pattern; both are `Option<_>` so existing deployments continue to start without error.

## Definition of Done

- `POST /v1/relay/keys` accepts `{ "algorithm": "p256" }`, requires a valid `Authorization: Bearer <admin_token>`, generates a P-256 keypair, stores the private key AES-256-GCM encrypted in SQLite, and returns `{ "key_id", "public_key", "algorithm" }`
- `crates/crypto` implements pure key generation, encryption, decryption, and did:key ID derivation (no I/O, no async)
- Private key encrypted at rest using a 32-byte master key from config (`EZPDS_SIGNING_KEY_MASTER_KEY`); private key never appears in API responses or logs
- `EZPDS_ADMIN_TOKEN` required to call the endpoint; missing or wrong token returns 401
- V003 migration adds `relay_signing_keys` table; `cargo test`, `cargo clippy`, `cargo fmt` all pass

## Acceptance Criteria

### MM-92.AC1: POST generates a valid signing key
- **MM-92.AC1.1 Success:** Valid request with `{ "algorithm": "p256" }` and correct Bearer token returns 200 with `key_id`, `public_key`, and `algorithm` fields
- **MM-92.AC1.2 Success:** `key_id` is a valid `did:key:z...` URI whose base58btc payload decodes to bytes starting with the P-256 multicodec prefix
- **MM-92.AC1.3 Success:** `public_key` is the multibase base58btc-encoded compressed point (no `did:key:` prefix)
- **MM-92.AC1.4 Success:** `algorithm` is `"p256"` in the response
- **MM-92.AC1.5 Success:** A row exists in `relay_signing_keys` with matching `id`, `public_key`, `algorithm`, and non-null `private_key_encrypted`

### MM-92.AC2: Private key is never exposed
- **MM-92.AC2.1 Success:** API response body contains no `private_key` or raw key bytes field
- **MM-92.AC2.2 Success:** `private_key_encrypted` in DB is base64-encoded (80 chars) and differs from the raw key bytes

### MM-92.AC3: Encryption at rest is correct
- **MM-92.AC3.1 Success:** `encrypt_private_key` / `decrypt_private_key` round-trip produces the original 32 bytes
- **MM-92.AC3.2 Failure:** Decryption with
wrong master key returns `CryptoError::Decryption` (authentication tag mismatch)
- **MM-92.AC3.3 Failure:** Decryption of malformed base64 returns `CryptoError::Decryption`
- **MM-92.AC3.4 Success:** Two calls to `encrypt_private_key` with the same key produce different ciphertexts (random nonce)

### MM-92.AC4: Authentication
- **MM-92.AC4.1 Failure:** Missing `Authorization` header returns 401
- **MM-92.AC4.2 Failure:** Wrong Bearer token returns 401
- **MM-92.AC4.3 Failure:** `Bearer` prefix missing (bare token) returns 401

### MM-92.AC5: Unsupported algorithm
- **MM-92.AC5.1 Failure:** `{ "algorithm": "k256" }` returns 400
- **MM-92.AC5.2 Failure:** `{ "algorithm": "" }` returns 400
- **MM-92.AC5.3 Failure:** Missing `algorithm` field returns 400

### MM-92.AC6: Master key not configured
- **MM-92.AC6.1 Failure:** Request with valid token but no `EZPDS_SIGNING_KEY_MASTER_KEY` configured returns 503

### MM-92.AC7: Config parsing
- **MM-92.AC7.1 Success:** `EZPDS_ADMIN_TOKEN` env var sets `config.admin_token`
- **MM-92.AC7.2 Success:** `EZPDS_SIGNING_KEY_MASTER_KEY` with 64 valid hex chars parses to `[u8; 32]`
- **MM-92.AC7.3 Failure:** `EZPDS_SIGNING_KEY_MASTER_KEY` with wrong length returns `ConfigError::Invalid`
- **MM-92.AC7.4 Failure:** `EZPDS_SIGNING_KEY_MASTER_KEY` with non-hex chars returns `ConfigError::Invalid`
- **MM-92.AC7.5 Success:** Both fields absent: config loads without error, both are `None`

### MM-92.AC8: Toolchain checks
- **MM-92.AC8.1 Success:** `cargo test --workspace` passes
- **MM-92.AC8.2 Success:** `cargo clippy --workspace -- -D warnings` passes
- **MM-92.AC8.3 Success:** `cargo fmt --all --check` passes

## Glossary

- **P-256**: An elliptic curve (also known as `secp256r1` or `prime256v1`) standardized by NIST. Used here for ECDSA signing keypairs.
ATProto supports both P-256 and K-256; this ticket implements P-256 only.
- **AES-256-GCM**: AES (Advanced Encryption Standard) in Galois/Counter Mode with a 256-bit key. An authenticated encryption scheme: the decryption step verifies an authentication tag, so tampering with the ciphertext is detected. Used here to encrypt private key bytes at rest.
- **Nonce**: A number used once: a unique value required by AES-GCM to ensure that encrypting the same plaintext twice with the same key produces different ciphertexts. Here a fresh 12-byte nonce is generated from `OsRng` for each key and prepended to the stored ciphertext.
- **Authentication tag**: A fixed-length value appended to an AES-GCM ciphertext that lets the decryptor verify the ciphertext has not been tampered with. If the tag does not verify, decryption returns an error rather than a potentially corrupted plaintext.
- **`did:key`**: A W3C DID method that encodes a public key directly into a URI. No registry lookup is required; the key material is self-contained in the identifier. In ATProto, `did:key` URIs serve as cryptographic identifiers for signing keys.
- **Multicodec**: A table of numeric prefixes (varints) that identify the type of data following them. ATProto uses multicodec prefixes to distinguish key types inside a `did:key` URI; P-256 keys use the prefix `0x1200`, which is varint-encoded as the two bytes `0x80 0x24`.
- **Multibase**: An encoding scheme that prepends a single character to a byte string to identify the encoding used. `did:key` URIs use multibase with the `z` prefix to indicate base58btc encoding.
- **base58btc**: A binary-to-text encoding scheme that uses a 58-character alphabet (no `0`, `O`, `I`, `l` to avoid visual ambiguity). The encoding used for the key bytes inside a `did:key` URI.
- **`zeroize`**: A Rust crate (and trait) that overwrites sensitive memory with zeroes when a value is dropped.
The `p256` crate's `SecretKey` type implements `zeroize`, ensuring private key bytes are scrubbed from memory after use.
- **`OsRng`**: The operating-system-provided cryptographically secure random number generator, exposed in Rust via `rand_core`. Used here to generate both the P-256 private key scalar and the per-key AES-GCM nonce.
- **Compressed public key**: A compact representation of an elliptic-curve public key point. A P-256 compressed point is 33 bytes: a single prefix byte (`0x02` or `0x03`) indicating which of two possible y-coordinates applies, followed by the 32-byte x-coordinate. The uncompressed form is 65 bytes.
- **V003 migration**: The third entry in the relay's append-only, forward-only migration sequence. Adds the `relay_signing_keys` table. Follows the same convention as V001 and V002: a new `.sql` file is added and never modified after it is applied.
- **`WITHOUT ROWID`**: A SQLite table option that eliminates the implicit integer rowid that SQLite normally adds to every table. Appropriate here because signing keys are always looked up by their `did:key` primary key, so a separate hidden rowid index would be redundant.
- **Admin Bearer token**: A static shared secret sent in the `Authorization: Bearer <token>` HTTP header. Used here as a lightweight operator authentication mechanism for management endpoints that must not be publicly accessible. The token is compared directly against `EZPDS_ADMIN_TOKEN` in config.
- **503 Service Unavailable**: The HTTP status code returned when the server is running but a required dependency is not configured. Used here when `EZPDS_SIGNING_KEY_MASTER_KEY` is absent: the server is healthy but cannot fulfill key-generation requests.
- **Functional Core / Imperative Shell (FCIS)**: The architectural pattern used throughout this codebase.
Pure logic with no side effects (the functional core) lives in library crates such as `crypto`; code that performs I/O lives exclusively in the `relay` crate (the imperative shell).
- **`p256` crate**: A Rust implementation of P-256 elliptic-curve operations, including keypair generation and ECDSA signing, from the RustCrypto project. Audited by zkSecurity (April 2025).
- **`aes-gcm` crate**: A Rust implementation of AES-GCM authenticated encryption from the RustCrypto project. Audited by NCC Group (no significant findings). The `decrypt` API (used here) verifies the authentication tag before returning plaintext, avoiding the GHSA-423w-p2w9-r7vq vulnerability in `decrypt_in_place_detached`.
- **`multibase` crate**: A Rust crate implementing the multibase encoding specification. Used to produce the `z`-prefixed base58btc strings required by the `did:key` format.
- **`CryptoError`**: The typed error enum introduced in `crates/crypto/src/error.rs` for this ticket. Variants: `KeyGeneration`, `Encryption`, `Decryption`, `InvalidKeyId`.
- **`RawConfig` / `validate_and_build`**: The two-step config pattern in `crates/common`. `RawConfig` deserializes all fields as `Option<_>` with no validation; `validate_and_build` applies defaults, parses complex types (such as hex strings to `[u8; 32]`), and returns a validated `Config` or a `ConfigError`.

## Architecture

Signing key generation follows the Functional Core / Imperative Shell pattern already established in this codebase. All crypto operations (keypair generation, encryption, decryption, key ID derivation) live in `crates/crypto` as pure functions with no I/O. The relay crate is the sole imperative shell: it owns the HTTP handler, reads config, writes to SQLite, and returns the HTTP response.

The `POST /v1/relay/keys` route is a relay management endpoint, not an XRPC endpoint.
It lives at `/v1/relay/keys` (not `/xrpc/...`) and is registered directly in `app.rs` alongside the existing routes.

**Authentication:** A static admin Bearer token (`EZPDS_ADMIN_TOKEN`) gates the endpoint. The relay extracts the `Authorization` header, compares the token against config, and returns 401 if absent or mismatched. This establishes the admin auth pattern for all future operator endpoints.

**Key ID derivation:** ATProto uses `did:key` URIs as cryptographic identifiers. For P-256, the key ID is constructed as:
```
did:key:z + base58btc( [0x80, 0x24] || compressed_public_key_bytes )
        ↑ multibase    ↑ P-256 multicodec varint (0x1200)
```
The `public_key` field in the response carries the same multibase-encoded bytes without the `did:key:` prefix.

**Private key encryption:** AES-256-GCM with a 32-byte master key sourced from `EZPDS_SIGNING_KEY_MASTER_KEY`. Each key gets a fresh 12-byte nonce from `OsRng`. Stored format: `base64( nonce || ciphertext || tag )`, which is 12 + 32 + 16 = 60 bytes and encodes to 80 base64 chars. The `p256` crate's `SecretKey` type implements `zeroize`, so private key bytes are wiped from memory on drop.

**Config:** Both new fields are `Option<_>` to avoid breaking existing deployments. The endpoint returns 503 if the master key is not configured.

```
                        ┌─────────────────────────────────────────────┐
                        │ crates/relay                                │
                        │                                             │
POST /v1/relay/keys ──►   create_signing_key handler                  │
                        │   │                                         │
                        │   ├── check Bearer token (config)           │
                        │   ├── check master key (config → 503)       │
                        │   ├── crypto::generate_p256_keypair()       │◄─── crates/crypto
                        │   ├── crypto::encrypt_private_key()         │◄─── crates/crypto
                        │   └── INSERT relay_signing_keys (DB)        │
                        └─────────────────────────────────────────────┘
```

## Existing Patterns

**FCIS boundary:** Matches the pattern established in MM-72 and MM-138.
`crates/relay` is the only crate that touches SQLite; `crates/crypto` is a pure functional core.

**Migration runner:** V003 follows the same append-only convention as V001 and V002: a new `.sql` file under `crates/relay/src/db/migrations/`, applied by the existing custom runner. Applied migrations are never modified.

**`WITHOUT ROWID`:** The `relay_signing_keys` table uses `WITHOUT ROWID` since keys are always fetched by their `id` (the did:key URI). This matches the pattern used for `did_documents`, `handles`, and `oauth_clients` in V002.

**Config pattern:** New optional fields follow the existing three-layer pattern in `common/src/config.rs`: `RawConfig` (Deserialize, all-optional), `apply_env_overrides` (env var map), `validate_and_build` (validation + defaults).

**Route registration:** The new route is added to the `Router::new()` chain in `app.rs`, following the pattern of `describe_server` and `health`.

**Test helper:** `test_state()` in `app.rs` provides a fully-migrated in-memory pool for handler tests. This function will need updating to include the two new config fields (both `None` by default in test context).

## Implementation Phases

<!-- START_PHASE_1 -->
### Phase 1: Crypto crate

**Goal:** Implement the pure functional core: P-256 key generation, AES-256-GCM encryption/decryption, and did:key ID derivation. No relay changes yet.
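
As a rough illustration of the did:key ID derivation this phase calls for, the sketch below builds the identifier using only the standard library. The function names (`encode_varint`, `base58btc`, `p256_did_key`) are hypothetical, and the hand-rolled base58btc encoder is a stand-in for illustration: the plan itself relies on the `multibase` crate for that step.

```rust
/// Bitcoin base58 alphabet (no `0`, `O`, `I`, `l`).
const BASE58_ALPHABET: &[u8; 58] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Encode an unsigned integer as a multicodec varint
/// (7 bits per byte, least significant group first, high bit = "more follows").
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// base58btc-encode a byte string; each leading zero byte becomes a '1'.
fn base58btc(input: &[u8]) -> String {
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    // Base-58 digits of the big-endian number, least significant digit first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in &input[zeros..] {
        // digits = digits * 256 + byte
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| BASE58_ALPHABET[d as usize] as char));
    out
}

/// Hypothetical helper: derive the did:key URI for a 33-byte compressed P-256 point.
fn p256_did_key(compressed_public_key: &[u8; 33]) -> String {
    let mut bytes = encode_varint(0x1200); // P-256 public key multicodec
    bytes.extend_from_slice(compressed_public_key);
    format!("did:key:z{}", base58btc(&bytes))
}
```

Encoding `0x1200` this way yields the two prefix bytes `0x80 0x24`, and because those bytes lead the payload, every compressed P-256 key produces an identifier beginning `did:key:zDn`.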

**Components:**
- Workspace `Cargo.toml`: add `p256 = { version = "0.13", features = ["ecdsa"] }`, `aes-gcm = "0.10"`, `multibase = "0.9"`, `rand_core = { version = "0.6", features = ["getrandom"] }` to `[workspace.dependencies]`
- `crates/crypto/Cargo.toml`: opt in to all four new deps plus `thiserror = { workspace = true }`
- `crates/crypto/src/error.rs`: `CryptoError` enum: `KeyGeneration`, `Encryption`, `Decryption`, `InvalidKeyId`
- `crates/crypto/src/keys.rs`: `P256Keypair` struct (key_id, public_key, private_key_bytes); `generate_p256_keypair()`, `encrypt_private_key(key_bytes, master_key)`, `decrypt_private_key(encrypted, master_key)`
- `crates/crypto/src/lib.rs`: re-export public surface

**Dependencies:** None (Phase 1 is standalone)

**Done when:** `cargo test -p crypto` passes; `cargo clippy -p crypto -- -D warnings` passes; `cargo fmt --all --check` passes. Tests cover: keypair generation produces valid compressed point and did:key ID; encrypt/decrypt round-trips correctly; decryption with wrong master key returns error; invalid base64 input returns error.
<!-- END_PHASE_1 -->

<!-- START_PHASE_2 -->
### Phase 2: Config additions and V003 migration

**Goal:** Add the two new config fields and the `relay_signing_keys` table. No endpoint yet.
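
The master-key parsing that this phase adds (64 hex chars to `[u8; 32]`) could look roughly like the sketch below. The `MasterKeyError` enum and the `parse_master_key` / `hex_val` names are hypothetical stand-ins; the plan maps both failure cases to `ConfigError::Invalid` inside `validate_and_build`, whose exact shape is not shown in this document.

```rust
/// Hypothetical error type for this sketch only; the plan reports
/// both cases as `ConfigError::Invalid`.
#[derive(Debug, PartialEq)]
enum MasterKeyError {
    WrongLength(usize),
    NonHex,
}

/// Map one ASCII hex digit to its 4-bit value.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Parse exactly 64 hex characters into a 32-byte master key.
fn parse_master_key(hex: &str) -> Result<[u8; 32], MasterKeyError> {
    let bytes = hex.as_bytes();
    if bytes.len() != 64 {
        return Err(MasterKeyError::WrongLength(bytes.len()));
    }
    let mut key = [0u8; 32];
    for (i, pair) in bytes.chunks_exact(2).enumerate() {
        let hi = hex_val(pair[0]).ok_or(MasterKeyError::NonHex)?;
        let lo = hex_val(pair[1]).ok_or(MasterKeyError::NonHex)?;
        key[i] = (hi << 4) | lo;
    }
    Ok(key)
}
```

Checking the length before decoding keeps the two failure modes from AC7.3 and AC7.4 distinct in tests, even if both end up as the same config error variant.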

**Components:**
- `crates/common/src/config.rs`: add `admin_token: Option<String>` and `signing_key_master_key: Option<[u8; 32]>` to `Config`; add corresponding fields to `RawConfig`; add `EZPDS_ADMIN_TOKEN` and `EZPDS_SIGNING_KEY_MASTER_KEY` (64 hex chars → `[u8; 32]`) to `apply_env_overrides`; add parsing/validation to `validate_and_build`
- `crates/relay/src/db/migrations/V003__relay_signing_keys.sql`: `relay_signing_keys` table (`WITHOUT ROWID`)
- `crates/relay/src/app.rs`: update `test_state()` to include `admin_token: None, signing_key_master_key: None`

**Dependencies:** Phase 1 (workspace Cargo.toml must already be updated)

**Done when:** `cargo test` passes; config tests cover: `EZPDS_ADMIN_TOKEN` env override; `EZPDS_SIGNING_KEY_MASTER_KEY` with valid 64-char hex parses to `[u8; 32]`; invalid hex or wrong length returns `ConfigError::Invalid`; both fields default to `None` when absent. Migration tests confirm `relay_signing_keys` table exists after running migrations on an in-memory DB.
<!-- END_PHASE_2 -->

<!-- START_PHASE_3 -->
### Phase 3: Relay endpoint

**Goal:** Wire up the HTTP handler. The relay can now generate and store signing keys.
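
The Bearer-token check at the front of the handler can be sketched as a pure function, independent of any HTTP framework. The names `bearer_token` and `check_admin` are hypothetical, and returning 401 when no admin token is configured at all is an assumption of this sketch; the plan only specifies 401 for a missing or wrong token.

```rust
/// Extract the token from an `Authorization: Bearer <token>` header value.
/// Returns `None` for a missing header, a missing `Bearer ` prefix, or an empty token.
fn bearer_token(header: Option<&str>) -> Option<&str> {
    header?.strip_prefix("Bearer ").filter(|t| !t.is_empty())
}

/// Compare the presented token with the configured admin token.
/// `Err(401)` covers every rejection case, including (by assumption)
/// an unset `EZPDS_ADMIN_TOKEN`.
fn check_admin(header: Option<&str>, admin_token: Option<&str>) -> Result<(), u16> {
    match (bearer_token(header), admin_token) {
        (Some(presented), Some(expected)) if presented == expected => Ok(()),
        _ => Err(401),
    }
}
```

Keeping this check free of I/O means the three AC4 failure cases (no header, wrong token, bare token) can be covered by plain unit tests before the route is wired up.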

**Components:**
- `crates/relay/src/routes/create_signing_key.rs`: handler for `POST /v1/relay/keys`; extracts Bearer token, validates against `config.admin_token`; checks `config.signing_key_master_key`; calls `crypto::generate_p256_keypair()`; calls `crypto::encrypt_private_key()`; inserts into `relay_signing_keys`; returns JSON response
- `crates/relay/src/routes/mod.rs`: add `pub mod create_signing_key`
- `crates/relay/src/app.rs`: register `.route("/v1/relay/keys", post(create_signing_key))` in `app()`
- `crates/relay/Cargo.toml`: add `crypto = { workspace = true }`

**Dependencies:** Phase 1 (crypto functions), Phase 2 (config fields + migration)

**Done when:** Integration tests pass for all cases: 200 with valid token and master key configured (response contains key_id, public_key, algorithm; private key not present); 400 for unsupported algorithm; 401 for missing Authorization header; 401 for wrong token; 503 when master key not in config; 500 not tested directly but error path compiles. `cargo test --workspace`, `cargo clippy --workspace -- -D warnings`, `cargo fmt --all --check` all pass.
<!-- END_PHASE_3 -->

## Additional Considerations

**aes-gcm advisory GHSA-423w-p2w9-r7vq:** Affects `decrypt_in_place_detached`: plaintext is exposed before tag verification completes. The implementation must use the safe `decrypt` API (tag verified before plaintext is returned) and never call `decrypt_in_place_detached`.

**Nonce reuse:** Nonce reuse with the same AES-GCM key is catastrophic (breaks both confidentiality and integrity). At expected key volumes (~thousands of relay signing keys), random 12-byte nonces via `OsRng` are safe: by the birthday bound, the collision probability is roughly 6 × 10⁻²⁴ at 1,000 keys and still only about 6 × 10⁻¹⁸ at 1M keys. No counter-based nonce management required.

**Follow-up tickets:** Key listing (`GET /v1/relay/keys`) and rotation/invalidation are explicitly out of scope.
The `relay_signing_keys` schema has no `active` or `revoked_at` column; those are Wave 3 concerns.

**Master key rotation:** Rotating the master key requires re-encrypting all stored private keys. Not in scope for this ticket; noted here because the stored format (`base64(nonce||ct||tag)`) is forward-compatible with a key-version prefix if needed later.
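
The size and probability figures quoted in this plan can be sanity-checked with a few lines of arithmetic. The sketch below uses hypothetical helper names and the standard birthday approximation n(n − 1) / (2 · 2⁹⁶) for random 96-bit nonces; it is a check on the numbers, not part of the planned code.

```rust
/// Length in characters of padded base64 for `len` input bytes.
fn base64_len(len: usize) -> usize {
    (len + 2) / 3 * 4
}

/// Stored blob size: 12-byte nonce + 32-byte ciphertext + 16-byte GCM tag.
fn stored_blob_len() -> usize {
    12 + 32 + 16
}

/// Approximate probability that at least two of `keys` random
/// 96-bit nonces collide: n(n - 1) / (2 * 2^96).
fn nonce_collision_probability(keys: u64) -> f64 {
    let n = keys as f64;
    n * (n - 1.0) / 2.0_f64.powi(97)
}
```

With these helpers, the stored blob comes to 60 bytes and 80 base64 characters, and the collision estimate is on the order of 10⁻²³ for a thousand keys and 10⁻¹⁷ for a million, so random nonces remain comfortably safe at either volume.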