commits
Completed brainstorming session. Design includes:
- crypto crate: pure build_did_plc_genesis_op function (CBOR, ECDSA P-256, RFC 6979, base32 DID derivation)
- relay crate: POST /v1/dids with pending_session auth, pre-store retry resilience, atomic account promotion
- 2 implementation phases
Critical: correct V006 migration comment — SQLite does not auto-update FK
references in child tables on RENAME; the migration is safe because all
tables are empty (no DML-time FK checks fire).
Important:
- Add UNIQUE INDEX idx_devices_token_hash on devices.device_token_hash
- Add max-length check (512 chars) on devicePublicKey input
- Add #[tracing::instrument] + claim_code field to redeem_and_register;
distinguish RowNotFound from other errors in log messages
- Fix seed_pending_account helper to generate unique codes/email/handle
per call so it is safe to invoke multiple times on the same pool
- Add orphaned_claim_code_returns_500_and_does_not_redeem_code test
(verifies atomicity: transaction rolls back if pending_accounts lookup
fails, leaving claim code unredeemed)
- Extend closed_db_pool_returns_500 and platform_is_case_sensitive tests
to assert error code in response body
- Add oversized_public_key_returns_400 test (boundary test for devicePublicKey, mirrors register_device.rs analogue)
- Add empty_email_returns_400 test (present-but-empty email returns 400, not 422)
- Document V007 pending_sessions migration in crates/relay/src/db/CLAUDE.md
Combined mobile account creation endpoint for the iOS identity wallet
onboarding flow. Atomically redeems a claim code, creates a pending
account, registers the device, and issues a pending session token in a
single transaction — with full rollback on any step failure.
- V007 migration: pending_sessions table (token_hash UNIQUE, FKs to
pending_accounts and devices) for pre-DID session tokens
- ClaimCodeRedeemed ErrorCode (409) to distinguish already-redeemed
codes from invalid/expired ones (404) per spec
- validate_handle and is_valid_platform promoted to pub(crate) for reuse
- Bruno collection entry for the new route
Device registration via claim code: validates and redeems a single-use claim
code, stores the device public key, generates an opaque device_token (stored
as SHA-256 hash, returned once), and enforces platform validation.
V006 migration rebuilds the devices table to reference pending_accounts.id
instead of accounts.did (registration precedes DID assignment), adding
platform, public_key, and device_token_hash columns. sessions, oauth_tokens,
and refresh_tokens are also rebuilt to maintain correct FK targets after the
cascading rename.
Critical:
- Run cargo fmt --all (formatting violations in auth.rs, create_account.rs)
- Add unit tests for require_admin_token() in auth.rs (6 tests covering all
branches including the non-UTF-8 Authorization header path)
- Add unit tests for generate_code() in code_gen.rs (4 tests: length, charset,
character set membership, non-constant output)
Important:
- Narrow pub mod auth to pub(crate) mod auth in routes/mod.rs
- Drop pub from CODE_LEN and CHARSET in code_gen.rs (no external consumers)
- Switch OR EXISTS queries from bool to i64 + CAST AS INTEGER to avoid
sqlx type-affinity ambiguity on untyped SQLite expressions
- Narrow auth.rs doc comment: presence/prefix checks are conventional
short-circuits; only the final comparison uses subtle::ct_eq
- Remove stale "handle_in_handles query coverage" comment from test
- Log constraint name in unique_violation_source default arm so unexpected
future constraints are visible in traces
Suggestions (high-value):
- Use bool::from(ct_eq(...)) instead of unwrap_u8() != 1 per subtle docs
- Upgrade non-UTF-8 Authorization header log from debug to warn
Extract shared admin Bearer token validation, code generation, and test
helpers that were duplicated across claim_codes, create_account, and
create_signing_key.
- routes/auth.rs: require_admin_token() replaces 37-line auth block copied 3×;
also fixes create_signing_key which was missing the inspect_err debug log
for non-UTF-8 Authorization headers
- routes/code_gen.rs: generate_code() + CODE_LEN/CHARSET moved here from
claim_codes and create_account where they were defined identically
- routes/test_utils.rs: test_state_with_admin_token() shared instead of
duplicated in each route's test module
- create_account: consolidate 4 separate pre-check queries into 2 OR EXISTS
queries (email and handle each check both tables in one round-trip)
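The shared check described above can be sketched with the standard library alone. This is a minimal illustration, not the relay's code: the `AuthError` variants and the function signature are assumptions, and the hand-written `ct_eq` loop stands in for `subtle::ConstantTimeEq`, which the commits actually use (a plain loop is not guaranteed to stay constant-time under optimization).

```rust
/// Failure reasons for the admin check (hypothetical names for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    MissingHeader,
    NotBearer,
    WrongToken,
}

/// Stand-in for `subtle::ConstantTimeEq`: XOR-folds every byte pair so the
/// loop does not exit early on the first mismatch. Length is not treated as
/// secret here.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Presence and prefix checks short-circuit conventionally; only the final
/// token comparison goes through the constant-time helper.
pub fn require_admin_token(
    authorization: Option<&str>,
    expected: &str,
) -> Result<(), AuthError> {
    let header = authorization.ok_or(AuthError::MissingHeader)?;
    let presented = header
        .strip_prefix("Bearer ")
        .ok_or(AuthError::NotBearer)?;
    if ct_eq(presented.as_bytes(), expected.as_bytes()) {
        Ok(())
    } else {
        Err(AuthError::WrongToken)
    }
}
```

Taking the header as `Option<&str>` keeps the sketch free of HTTP types; the non-UTF-8 header path mentioned in the commits is left out.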
Critical fixes:
- Distinguish unique constraint violations by inspecting db_err.message():
pending_accounts.email → AccountExists (409), pending_accounts.handle
→ HandleTaken (409), claim_codes.code → retry. Prevents TOCTOU races
from being silently swallowed and retried as claim-code collisions.
- Add AccountExists, HandleTaken, InvalidHandle to status_code_mapping test
Important fixes:
- Add duplicate_handle_in_handles_returns_409 test covering the
handle_in_handles SELECT path (was untested)
- Assert json["error"]["code"] in all four 409 conflict tests so
AccountExists vs HandleTaken body swaps are caught
- Add inspect_err logging to both execute calls inside
insert_pending_account for per-operation failure attribution
Suggestion:
- Replace retry-exhaustion message "failed to generate unique claim code
after retries" with generic "failed to create account"; move the
detail to a tracing::error! before the return
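The constraint dispatch described above could look roughly like the following. SQLite reports unique violations as "UNIQUE constraint failed: table.column", so matching on the qualified column name is enough for this sketch. The `UniqueViolation` enum and the function signature are assumptions; only the name `unique_violation_source` and the three outcomes come from the commits.

```rust
/// Which constraint fired (hypothetical enum for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum UniqueViolation {
    /// pending_accounts.email -> AccountExists (409)
    Email,
    /// pending_accounts.handle -> HandleTaken (409)
    Handle,
    /// claim_codes.code -> regenerate the code and retry
    ClaimCode,
    /// Anything else: keep the message so it can be logged.
    Other(String),
}

/// Classify a SQLite unique-violation message by the column it names.
pub fn unique_violation_source(message: &str) -> UniqueViolation {
    if message.contains("pending_accounts.email") {
        UniqueViolation::Email
    } else if message.contains("pending_accounts.handle") {
        UniqueViolation::Handle
    } else if message.contains("claim_codes.code") {
        UniqueViolation::ClaimCode
    } else {
        UniqueViolation::Other(message.to_string())
    }
}
```

Only the `ClaimCode` case should feed the retry loop; the other two map to 409 responses so a concurrent duplicate is reported instead of retried.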
Adds operator-authenticated account provisioning endpoint that creates
a pending account slot with a 24h claim code before DID assignment.
- V005 migration: pending_accounts staging table (id, email, handle,
tier, claim_code FK → claim_codes, created_at); unique indices on
email and handle
- New ErrorCode variants: AccountExists (409), HandleTaken (409),
InvalidHandle (400)
- POST /v1/accounts handler: auth → handle validation → email/handle
uniqueness across both pending and active tables → single-TX insert
into claim_codes + pending_accounts → 201 with {accountId, did: null,
claimCode, status: "pending"}
- 26 tests covering happy path, DB persistence, duplicate email/handle,
handle format, tier validation, missing fields, auth, and 500 path
- Bruno create_account.bru collection entry
- uuid v1 workspace dependency for account_id generation
Adds a Bruno HTTP client collection covering all four relay endpoints
(health, describeServer, claim-codes, create-signing-key) with a local
environment template and a mandatory update rule in CLAUDE.md.
- Remove `&& attempt < 2` guard: unique violations now always retry;
post-loop error becomes the exhaustion case (was dead code before)
- Fix comment: "Retry up to 3 times" → "Attempt up to 3 times total (2 retries)"
- Add expires_at window assertion to persistence tests (5s tolerance)
- Add non_unique_db_error_returns_500_without_retry test (closes pool before request)
- Annotate begin/commit in insert_claim_codes with inspect_err logging
- Log non-UTF-8 Authorization header at debug level
- Add doc comment to ClaimCodesResponse.codes field
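A minimal sketch of the retry shape described above: three attempts in total, unique violations always retry, any other error returns immediately, and the post-loop error is the exhaustion case. `InsertError`, `MAX_ATTEMPTS`, and `insert_with_retry` are illustrative names, and the closure stands in for the real insert.

```rust
/// Error classes the loop distinguishes (hypothetical for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum InsertError {
    UniqueViolation,
    Other(String),
}

/// Attempt up to 3 times total (2 retries).
const MAX_ATTEMPTS: u32 = 3;

/// Runs `try_insert` until it succeeds, fails with a non-unique error, or
/// the attempt budget is spent. Returns how many attempts were used.
pub fn insert_with_retry<F>(mut try_insert: F) -> Result<u32, InsertError>
where
    F: FnMut(u32) -> Result<(), InsertError>,
{
    for attempt in 0..MAX_ATTEMPTS {
        match try_insert(attempt) {
            Ok(()) => return Ok(attempt + 1),
            // A collision means a fresh code is needed: always retry.
            Err(InsertError::UniqueViolation) => continue,
            // Any other failure is not retryable.
            Err(other) => return Err(other),
        }
    }
    // Reached only when every attempt collided.
    Err(InsertError::UniqueViolation)
}
```

Without an extra guard inside the match arm, the code after the loop is reachable and becomes the single place where exhaustion is reported.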
Adds operator-authenticated endpoint for generating batch invite codes
before account creation exists. Fixes the Wave 1 schema which incorrectly
required a NOT NULL DID FK on claim_codes, making pre-account invite codes
structurally impossible.
- V004 migration: recreates claim_codes without did FK; adds expires_at index
- POST /v1/accounts/claim-codes: Bearer-auth, count 1–10, configurable expiry
- 6-char uppercase alphanumeric codes via OsRng, batch-inserted in one tx
- Status derived from redeemed_at/expires_at columns (no status enum)
- 15 handler tests covering happy path, format, persistence, validation, auth
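Deriving status from the two timestamp columns, rather than storing a status enum, can be sketched as a pure function. The variant names, the use of integer Unix seconds, and the choice to treat `now == expires_at` as expired are all assumptions made for illustration.

```rust
/// Derived claim-code state (hypothetical names; nothing like this is
/// stored in the table).
#[derive(Debug, PartialEq, Eq)]
pub enum ClaimCodeStatus {
    Redeemed,
    Expired,
    Active,
}

/// Compute the status from the redeemed_at and expires_at values.
/// Timestamps are Unix seconds in this sketch.
pub fn claim_code_status(
    redeemed_at: Option<i64>,
    expires_at: i64,
    now: i64,
) -> ClaimCodeStatus {
    if redeemed_at.is_some() {
        // Redemption wins even if the code has since passed its expiry.
        ClaimCodeStatus::Redeemed
    } else if now >= expires_at {
        ClaimCodeStatus::Expired
    } else {
        ClaimCodeStatus::Active
    }
}
```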
It is the random coefficient (coeffs[i]), not the secret byte directly,
that passes through gf_mul. The secret byte only goes through gf_add
(XOR, inherently branchless). Security intent unchanged.
- Replace branching GF(2^8) reduction with branchless mask:
((a as i8) >> 7) as u8 yields 0xFF or 0x00, which is ANDed with 0x1b so
the reduction never branches on secret bits
- Add upper-bound index check (> 3) in combine_shares; silent wrong
reconstruction on out-of-range indices was not caught before
- Switch fill_bytes -> try_fill_bytes so RNG failure returns
CryptoError::SecretSharing instead of panicking
- Remove #[derive(Clone)] from ShamirShare — no call site uses it and
Clone on a secret-bearing type is inconsistent with P256Keypair
- Expand combine_with_index_zero_fails to test both argument positions
- Add combine_with_index_out_of_range_fails test (index: 4)
- Expand gf_mul_is_commutative to exhaustive 256×256 check
- Update gf_mul/gf_inv doc comments: describe branchless reduction,
fix "repeated squaring" -> "binary exponentiation (square-and-multiply)",
add standard -> GF(2^8) Lagrange derivation step
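The branchless reduction above can be illustrated with a small multiply routine over GF(2^8) with the AES polynomial 0x11b. This is a sketch under assumptions: `xtime` is a helper name introduced here, and the crate's actual `gf_mul` may be structured differently.

```rust
/// Multiply `a` by x modulo 0x11b. The arithmetic shift turns the high bit
/// into an all-ones or all-zeros mask, so no branch depends on `a`.
fn xtime(a: u8) -> u8 {
    let mask = ((a as i8) >> 7) as u8; // 0xFF if bit 7 is set, else 0x00
    (a << 1) ^ (mask & 0x1b)
}

/// Shift-and-add multiplication in GF(2^8). Each bit of `b` is expanded to
/// a mask instead of being tested with `if`.
pub fn gf_mul(a: u8, b: u8) -> u8 {
    let mut acc = 0u8;
    let mut a = a;
    let mut b = b;
    for _ in 0..8 {
        let take = 0u8.wrapping_sub(b & 1); // 0xFF if the low bit is set
        acc ^= a & take;
        a = xtime(a);
        b >>= 1;
    }
    acc
}
```

The loop always runs eight iterations regardless of the operands, which is what makes an exhaustive 256×256 commutativity check cheap to run.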
Adds split_secret and combine_shares to crates/crypto using GF(2^8)
arithmetic (AES irreducible polynomial 0x11b). Any 2 of the 3 returned
shares reconstruct the original 32-byte secret; a single share reveals
nothing (information-theoretic security). Share data is zeroized on drop.
Closes MM-93
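A self-contained sketch of the 2-of-3 scheme described above: each secret byte is the constant term of a degree-1 polynomial over GF(2^8), evaluated at x = 1, 2, 3. Several things are assumptions here: the `Share` type, the `String` error, and passing the random coefficients in as a parameter (the crate draws them from an RNG). Zeroize-on-drop is omitted because it needs an external crate.

```rust
/// Branchless GF(2^8) multiply modulo 0x11b.
fn gf_mul(a: u8, b: u8) -> u8 {
    let (mut acc, mut a, mut b) = (0u8, a, b);
    for _ in 0..8 {
        acc ^= a & 0u8.wrapping_sub(b & 1);
        let mask = ((a as i8) >> 7) as u8;
        a = (a << 1) ^ (mask & 0x1b);
        b >>= 1;
    }
    acc
}

/// Inverse via binary exponentiation: a^254 = a^-1 for a != 0.
/// The exponent is a public constant, so branching on its bits is fine.
fn gf_inv(a: u8) -> u8 {
    let (mut result, mut base, mut exp) = (1u8, a, 254u8);
    while exp != 0 {
        if exp & 1 == 1 {
            result = gf_mul(result, base);
        }
        base = gf_mul(base, base);
        exp >>= 1;
    }
    result
}

/// One share: the evaluation point and f(index) for each secret byte.
#[derive(Debug)]
pub struct Share {
    pub index: u8,
    pub data: [u8; 32],
}

/// f_i(x) = secret[i] ^ coeffs[i]*x. Only the coefficient goes through
/// gf_mul; the secret byte is only XORed.
pub fn split_secret(secret: &[u8; 32], coeffs: &[u8; 32]) -> [Share; 3] {
    let make = |index: u8| {
        let mut data = [0u8; 32];
        for i in 0..32 {
            data[i] = secret[i] ^ gf_mul(coeffs[i], index);
        }
        Share { index, data }
    };
    [make(1), make(2), make(3)]
}

/// Lagrange interpolation at x = 0 from any two distinct shares.
pub fn combine_shares(a: &Share, b: &Share) -> Result<[u8; 32], String> {
    for s in [a, b] {
        if s.index == 0 || s.index > 3 {
            return Err(format!("share index {} out of range", s.index));
        }
    }
    if a.index == b.index {
        return Err("duplicate share index".to_string());
    }
    // In GF(2^8) subtraction is XOR, so L_a(0) = b.index / (a.index ^ b.index).
    let denom_inv = gf_inv(a.index ^ b.index);
    let (la, lb) = (gf_mul(b.index, denom_inv), gf_mul(a.index, denom_inv));
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = gf_mul(a.data[i], la) ^ gf_mul(b.data[i], lb);
    }
    Ok(out)
}
```

Any pair of the three shares, in either argument order, yields the same 32 bytes, while indices 0 and 4 are rejected rather than silently producing a wrong result.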
- Fix 1: Replace wrong sqlx error source for corrupt migration version. Use
Protocol(format!(...)) instead of RowNotFound to accurately describe the
i64-to-u32 conversion failure.
- Fix 2: Remove InvalidKeyId from CryptoError variants list in CLAUDE.md (never
implemented).
- Fix 3: Rename three test functions to match their assertions:
unsupported_algorithm_returns_422, empty_algorithm_returns_422,
missing_algorithm_field_returns_422.
- Fix 4: Add test for null algorithm field. Null deserialization returns 400
(Bad Request) from Axum's default JSON rejection, distinct from missing/invalid
enum variants (422).
- Fix 5: Add key_id context to DB insert error log for better debugging when
signing key persistence fails.
- Fix 6: Add comment explaining why Sensitive<T> has pub T field — deliberate
design choice to make raw value access visible in source for code review.
- Fix #5: Config.signing_key_master_key leaks via Debug and clone
  - Wrap signing_key_master_key in Sensitive<Zeroizing<[u8; 32]>>
  - Adds Sensitive newtype that redacts Debug output to "***"
  - Zeroizing ensures key bytes are securely zeroized on drop
  - Never copies key into non-zeroizing allocation
- Fix #6: CreateSigningKeyRequest.algorithm should be one-variant enum
  - Replace Option<String> with Algorithm enum (single P256 variant)
  - Serde validates at deserialization time, not runtime
  - Remove dead runtime algorithm matching code
  - Updated tests to expect 422 (Unprocessable Entity) for invalid enum
- Fix #7: Remove dead CryptoError::InvalidKeyId variant
  - Variant was never constructed in this PR
- Fix #8: Wrap raw_bytes in Zeroizing in keys.rs
  - Ensures intermediate GenericArray from secret_key.to_bytes() is zeroized
  - Guards against future changes to p256 library behavior
- Fix #9: Add PRAGMA table_info test for V003 relay_signing_keys columns
  - Validates exact column order and names: id, algorithm, public_key,
    private_key_encrypted, created_at
- Fix #10: Add V003 PRIMARY KEY uniqueness constraint test
  - Verifies duplicate id inserts fail with constraint violation
- Fix #11: Introduce DidKeyUri newtype for P256Keypair.key_id
  - Prevents silent positional swap bugs in SQL binds and API responses
  - Type-safe distinction between key_id (did:key:z...) and public_key (z...)
  - Converts to string for DB inserts and JSON responses
Changes:
- crates/common: Add zeroize dependency, Sensitive<T> wrapper, export it
- crates/crypto: Add DidKeyUri newtype, remove InvalidKeyId CryptoError variant
- crates/relay: Add zeroize dependency, update handler to use new types, add V003 tests
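The Debug-redacting wrapper from Fix #5 can be sketched in a few lines. This is an illustration under assumptions: the `Config` struct is reduced to two fields, and the inner `Zeroizing<...>` layer is left out because it comes from the external zeroize crate.

```rust
use std::fmt;

/// Wrapper whose Debug output is always "***". The inner field is `pub` on
/// purpose, so every raw access shows up as `.0` in the source.
/// No `Clone` is derived, so the value cannot be duplicated implicitly.
pub struct Sensitive<T>(pub T);

impl<T> fmt::Debug for Sensitive<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***")
    }
}

/// Reduced stand-in for the relay config: deriving Debug is safe because
/// the sensitive field formats itself as "***".
#[derive(Debug)]
pub struct Config {
    pub public_url: String,
    pub signing_key_master_key: Sensitive<[u8; 32]>,
}
```

Formatting a `Config` with `{:?}` then prints the URL as usual and `***` in place of the key bytes.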
- Issue #1 (Critical): Replace non-constant-time Bearer token comparison with subtle::ConstantTimeEq to prevent timing attacks in create_signing_key.rs:57
- Issue #2 (Critical): Move zeroize and subtle dependencies to [workspace.dependencies] in root Cargo.toml; update crates to use { workspace = true } per project conventions
- Issue #3 (High): Fix migration infrastructure to return DbError instead of silently mapping corrupt schema_migrations version numbers to 0; now propagates parse errors with ? operator in mod.rs:99-107
- Issue #4 (High): Add sentinel field signing_key_master_key_toml_sentinel to RawConfig to detect and reject misconfigured operators who set the security-sensitive field in relay.toml instead of env var EZPDS_SIGNING_KEY_MASTER_KEY; includes validation check and regression test in config.rs
Add crypto crate CLAUDE.md (new public API: P-256 keygen, AES-256-GCM
encrypt/decrypt, did:key derivation). Update db CLAUDE.md with V003
migration. Update root CLAUDE.md crypto crate description.
Completed brainstorming session. Design includes:
- RustCrypto stack (p256 + aes-gcm + multibase), all audited
- AES-256-GCM encryption at rest with per-key OsRng nonces
- did:key URI format for key IDs (P-256 multicodec 0x1200)
- 3 implementation phases: crypto crate, config+migration, relay endpoint
Critical:
- Enable foreign_keys=true in open_pool via SqliteConnectOptions so all
FK constraints are enforced at runtime, not just in tests
- Add 4 missing did indexes: idx_handles_did, idx_signing_keys_did,
idx_devices_did, idx_sessions_did
Important:
- Add comment to did_documents explaining no FK to accounts (intentional:
caches external DIDs from remote PDSs, not only local accounts)
- Expand FK test coverage: sessions.device_id, refresh_tokens.session_id,
oauth_authorization_codes.client_id
- Add core auth chain insert test: accounts → devices → sessions →
refresh_tokens (validates column names and NOT NULL constraints end-to-end)
Suggestions:
- Replace v002_migrations_are_idempotent (duplicate of generic test) with
v002_tables_survive_second_migration_run (behavioral, not count-based)
- Remove redundant PRAGMA foreign_keys = ON from FK test (pool handles it)
- Add EXPLAIN QUERY PLAN tests for all 4 new did indexes
- Update db/CLAUDE.md Invariants and Key Files sections
Creates V002__auth_identity.sql with all 12 Wave 2 tables: accounts,
handles, did_documents, signing_keys, devices, claim_codes, sessions,
refresh_tokens, oauth_clients, oauth_authorization_codes, oauth_tokens,
and oauth_par_requests. Adds the 5 required indexes (unique email,
claim_codes/refresh_tokens/oauth_tokens did lookups). Uses WITHOUT ROWID
on tables with PK-only access paths. Updates MIGRATIONS static and
existing row-count assertions to be future-proof via MIGRATIONS.len().
Adds 9 V002 tests covering table existence, idempotency, FK enforcement,
unique email, and EXPLAIN QUERY PLAN for all 4 non-trivial indexes.
Critical:
- Move #[tracing::instrument] to after doc comment blocks in db/mod.rs
(was splitting /// paragraphs, breaking rustdoc)
- Add skip(url) to open_pool's instrument attribute to avoid recording
the database URL as a span field
Important:
- Replace eprintln! in OtelGuard::drop with tracing::error! — the
subscriber is still live at drop time, so structured logging is correct
- Add tracing::debug! in HeaderMapCarrier::get for non-UTF-8 header
values instead of silently discarding them
- Add comment at _otel_guard binding explaining the naming requirement:
bare _ drops immediately, which would shut down the exporter early
- Add otlp_endpoint validation in validate_and_build: must be non-empty
and start with http:// or https://
- Add unit tests for HeaderMapCarrier (get, absent, case-insensitive,
keys) including the non-UTF-8 silent-drop behaviour being intentional
Suggestions:
- Add eprintln warning when RUST_LOG is invalid (subscriber not up yet)
- Update apply_env_overrides doc to mention OTEL_SERVICE_NAME
- Update validate_and_build doc to include invite_code_required default
and telemetry validation rules
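The naming requirement on the `_otel_guard` binding comes from Rust's drop rules, which a small std-only experiment makes visible. `Guard`, `run`, and the shared log are hypothetical stand-ins for OtelGuard and the exporter shutdown.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for a guard that flushes something when dropped.
pub struct Guard {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Records the order in which the two guards are dropped relative to the
/// work done in the function body.
pub fn run(log: &Rc<RefCell<Vec<String>>>) {
    // Bare `_` does not bind: the guard is dropped at the end of this statement.
    let _ = Guard { name: "discarded", log: Rc::clone(log) };
    // A named binding, even one starting with `_`, lives until scope end.
    let _kept = Guard { name: "kept", log: Rc::clone(log) };
    log.borrow_mut().push("work".to_string());
}
```

After `run` returns, the log reads "drop discarded", "work", "drop kept": the guard bound to bare `_` is gone before any work happens.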
- Add `[telemetry]` config section (enabled, otlp_endpoint, service_name)
with env overrides: EZPDS_TELEMETRY_ENABLED, EZPDS_OTLP_ENDPOINT,
OTEL_SERVICE_NAME
- New crates/relay/src/telemetry.rs: layered subscriber init with
conditional OTel layer; OtelGuard flushes spans on graceful shutdown
- OtelMakeSpan in app.rs: custom TraceLayer make_span_with that extracts
W3C traceparent/tracestate headers and links upstream traces via
TraceContextPropagator
- DB instrumentation: #[tracing::instrument] on open_pool and
run_migrations with db.system = "sqlite" span attribute
- 7 new config tests covering telemetry TOML parsing, all three env
overrides, env-wins-over-toml precedence, and invalid bool error
- Zero overhead when telemetry.enabled = false (default)
- relay.dev.toml: change available_user_domains to ["localhost"] so dev
startup doesn't fail validation out of the box
- config.rs: return ConfigError::Invalid (not MissingField) when
available_user_domains is present but empty; update doc comment to
list it as a required field
- app.rs: remove stale #[allow(dead_code)] on config field now that
describe_server reads it
- describe_server.rs: remove stale TODO/Parameters/trade-offs comment
from resolve_did; add two handler-level tests asserting the did field
(derived from public_url and from explicit server_did config)
Adds the ATProto service discovery endpoint required by Bluesky clients
during login. Config is extended with available_user_domains (required),
invite_code_required (default true), optional server_did, links, and
contact sections. The handler derives did:web:<host> from public_url as
a placeholder until Wave 3 generates a real DID.
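Deriving the placeholder DID from public_url can be sketched with plain string handling. The function name and the exact parsing rules are assumptions: the sketch strips the scheme, keeps the authority, and percent-encodes a port colon as `%3A`, as the did:web method requires. Userinfo and IPv6 literals are not handled.

```rust
/// Build `did:web:<host>` from a URL such as "https://pds.example.com".
/// Returns None when the scheme is not http(s) or the host is empty.
/// Hypothetical helper; the handler's own resolve_did may differ.
pub fn did_web_from_public_url(public_url: &str) -> Option<String> {
    let rest = public_url
        .strip_prefix("https://")
        .or_else(|| public_url.strip_prefix("http://"))?;
    // Everything up to the first '/' is the authority (host[:port]).
    let authority = rest.split('/').next()?;
    if authority.is_empty() {
        return None;
    }
    // did:web uses ':' as its own separator, so a port colon is encoded.
    Some(format!("did:web:{}", authority.replace(':', "%3A")))
}
```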
- Log error details when DB health check fails (tracing::error!)
- Assert content-type: application/json on 503 response in test
- Add health_post_returns_405 test (consistent with existing XRPC pattern)
- Clarify SELECT 1 scope in handler comment (liveness only, not schema)
- Fix test_state() doc comment to note pool is fully migrated
Adds the first real XRPC route establishing the pattern for all subsequent
endpoint additions. Returns {"version":"0.1.0","db":"ok"} on 200 or
{"version":"0.1.0","db":"error"} on 503 when the SQLite pool is unreachable.
- Add routes/ module with one file per endpoint (routes/health.rs)
- Register /xrpc/_health before the catch-all /xrpc/:method route
- Promote test_state() to pub(crate) so per-endpoint test modules can share it
- Remove dead_code suppression on AppState.db now that a handler uses it
CRITICAL (5):
- Add DbError::InvalidUrl variant to distinguish URL parsing failures from pool-open failures
- Remove #[from] on DbError::Pool, add explicit error handling at each call site
- Add DbError::Setup.step field to provide detailed context (4 distinct failure stages)
- Update run_migrations doc comment to accurately describe bootstrap vs transaction scope
- Update DbError::Setup doc comment to list all 4 failure stages (bootstrap DDL, fetch versions, begin, commit)
IMPORTANT (9):
- Extract db_url normalization into to_sqlite_url() function with comprehensive unit tests
- Fix i32/u32 type mismatch: fetch as i64, convert with u32::try_from(), bind as i64
- Add tracing::info!() at: migrations start, each migration apply, commit success; debug for no-op
- Add tracing::error!() before fatal errors propagate to main() eprintln
- Update migration failure context to include database URL
- Add test for server_metadata PRIMARY KEY uniqueness constraint enforcement
- Add tracing::warn!() for relative-path URL normalization (CWD sensitivity)
- Fix open_pool doc: 're-issues' not 'tracks' for WAL PRAGMA behavior
- Update db/CLAUDE.md Guarantees section to accurately describe bootstrap vs transaction scope
MINOR (7):
- Document that commit() failure triggers Drop rollback; no partial schema, safe to re-run
- Document that pool creation succeeds even if file path is invalid; failure surfaces at first query
- Add distinct purpose doc to migrations_apply_on_first_run (row count = 1 after first run)
- Add doc comment to select_one_succeeds explaining it's a pool-connectivity smoke test
- Strengthen applied_at assertion: check length=19 and ISO-8601 format (starts with '20')
- Remove duplicate '// pattern: Imperative Shell' comment from top of db/mod.rs
- Consolidate duplicate #[allow(dead_code)] comments in AppState struct
All tests pass; build clean; clippy and fmt verified.
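The i32/u32 fix above (fetch as i64, convert with u32::try_from(), bind as i64) can be shown in isolation. `VersionError` and both function names are hypothetical; in the relay the failure is wrapped in a DbError instead.

```rust
use std::convert::TryFrom;

/// Conversion failure for a stored migration version (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum VersionError {
    OutOfRange(i64),
}

/// SQLite integers come back as i64; a negative or oversized value means
/// the schema_migrations row is corrupt, so it is reported, not mapped to 0.
pub fn version_from_row(raw: i64) -> Result<u32, VersionError> {
    u32::try_from(raw).map_err(|_| VersionError::OutOfRange(raw))
}

/// Widening u32 -> i64 is lossless, so binding needs no fallible step.
pub fn version_to_bind(version: u32) -> i64 {
    i64::from(version)
}
```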
- Root CLAUDE.md: add SQLite/sqlx to tech stack, correct rusqlite
references to sqlx's libsqlite3-sys, update freshness date
- New crates/relay/src/db/CLAUDE.md: document db module contracts,
invariants, key decisions (custom migration runner, single-connection
pool, decoupled open_pool signature)
Completed brainstorming session. Design includes:
- Custom forward-only migration runner with schema_migrations tracking
- WAL-mode SqlitePool (max 1 connection) via SqliteConnectOptions
- Flat db/mod.rs with include_str!-embedded SQL migrations
- AppState extended with SqlitePool; open_pool accepts plain URL for future per-user DB reuse
- 3 implementation phases: sqlx dep, db module + V001 schema, AppState integration
Adds processes.relay to devenv.nix so `devenv up` launches the relay via
cargo run, with EZPDS_* env vars providing dev defaults for data_dir and
public_url. relay.dev.toml holds committed non-secret defaults; state lands
in .devenv/state/relay (already gitignored). Override per-machine in
devenv.local.nix.
Critical: correct V006 migration comment — SQLite does not auto-update FK
references in child tables on RENAME; the migration is safe because all
tables are empty (no DML-time FK checks fire).
Important:
- Add UNIQUE INDEX idx_devices_token_hash on devices.device_token_hash
- Add max-length check (512 chars) on devicePublicKey input
- Add #[tracing::instrument] + claim_code field to redeem_and_register;
distinguish RowNotFound from other errors in log messages
- Fix seed_pending_account helper to generate unique codes/email/handle
per call so it is safe to invoke multiple times on the same pool
- Add orphaned_claim_code_returns_500_and_does_not_redeem_code test
(verifies atomicity: transaction rolls back if pending_accounts lookup
fails, leaving claim code unredeemed)
- Extend closed_db_pool_returns_500 and platform_is_case_sensitive tests
to assert error code in response body
- Add oversized_public_key_returns_400 test
Combined mobile account creation endpoint for the iOS identity wallet
onboarding flow. Atomically redeems a claim code, creates a pending
account, registers the device, and issues a pending session token in a
single transaction — with full rollback on any step failure.
- V007 migration: pending_sessions table (token_hash UNIQUE, FKs to
pending_accounts and devices) for pre-DID session tokens
- ClaimCodeRedeemed ErrorCode (409) to distinguish already-redeemed
codes from invalid/expired ones (404) per spec
- validate_handle and is_valid_platform promoted to pub(crate) for reuse
- Bruno collection entry for the new route
Critical: correct V006 migration comment — SQLite does not auto-update FK
references in child tables on RENAME; the migration is safe because all
tables are empty (no DML-time FK checks fire).
Important:
- Add UNIQUE INDEX idx_devices_token_hash on devices.device_token_hash
- Add max-length check (512 chars) on devicePublicKey input
- Add #[tracing::instrument] + claim_code field to redeem_and_register;
distinguish RowNotFound from other errors in log messages
- Fix seed_pending_account helper to generate unique codes/email/handle
per call so it is safe to invoke multiple times on the same pool
- Add orphaned_claim_code_returns_500_and_does_not_redeem_code test
(verifies atomicity: transaction rolls back if pending_accounts lookup
fails, leaving claim code unredeemed)
- Extend closed_db_pool_returns_500 and platform_is_case_sensitive tests
to assert error code in response body
- Add oversized_public_key_returns_400 test
Device registration via claim code: validates and redeems a single-use claim
code, stores the device public key, generates an opaque device_token (stored
as SHA-256 hash, returned once), and enforces platform validation.
V006 migration rebuilds the devices table to reference pending_accounts.id
instead of accounts.did (registration precedes DID assignment), adding
platform, public_key, and device_token_hash columns. sessions, oauth_tokens,
and refresh_tokens are also rebuilt to maintain correct FK targets after the
cascading rename.
Critical:
- Run cargo fmt --all (formatting violations in auth.rs, create_account.rs)
- Add unit tests for require_admin_token() in auth.rs (6 tests covering all
branches including the non-UTF-8 Authorization header path)
- Add unit tests for generate_code() in code_gen.rs (4 tests: length, charset,
character set membership, non-constant output)
Important:
- Narrow pub mod auth to pub(crate) mod auth in routes/mod.rs
- Drop pub from CODE_LEN and CHARSET in code_gen.rs (no external consumers)
- Switch OR EXISTS queries from bool to i64 + CAST AS INTEGER to avoid
sqlx type-affinity ambiguity on untyped SQLite expressions
- Narrow auth.rs doc comment: presence/prefix checks are conventional
short-circuits; only the final comparison uses subtle::ct_eq
- Remove stale "handle_in_handles query coverage" comment from test
- Log constraint name in unique_violation_source default arm so unexpected
future constraints are visible in traces
Suggestions (high-value):
- Use bool::from(ct_eq(...)) instead of unwrap_u8() != 1 per subtle docs
- Upgrade non-UTF-8 Authorization header log from debug to warn
Extract shared admin Bearer token validation, code generation, and test
helpers that were duplicated across claim_codes, create_account, and
create_signing_key.
- routes/auth.rs: require_admin_token() replaces 37-line auth block copied 3×;
also fixes create_signing_key which was missing the inspect_err debug log
for non-UTF-8 Authorization headers
- routes/code_gen.rs: generate_code() + CODE_LEN/CHARSET moved here from
claim_codes and create_account where they were defined identically
- routes/test_utils.rs: test_state_with_admin_token() shared instead of
duplicated in each route's test module
- create_account: consolidate 4 separate pre-check queries into 2 OR EXISTS
queries (email and handle each check both tables in one round-trip)
Critical fixes:
- Distinguish unique constraint violations by inspecting db_err.message():
pending_accounts.email → AccountExists (409), pending_accounts.handle
→ HandleTaken (409), claim_codes.code → retry. Prevents TOCTOU races
from being silently swallowed and retried as claim-code collisions.
- Add AccountExists, HandleTaken, InvalidHandle to status_code_mapping test
Important fixes:
- Add duplicate_handle_in_handles_returns_409 test covering the
handle_in_handles SELECT path (was untested)
- Assert json["error"]["code"] in all four 409 conflict tests so
AccountExists vs HandleTaken body swaps are caught
- Add inspect_err logging to both execute calls inside
insert_pending_account for per-operation failure attribution
Suggestion:
- Replace retry-exhaustion message "failed to generate unique claim code
after retries" with generic "failed to create account"; move the
detail to a tracing::error! before the return
Adds operator-authenticated account provisioning endpoint that creates
a pending account slot with a 24h claim code before DID assignment.
- V005 migration: pending_accounts staging table (id, email, handle,
tier, claim_code FK → claim_codes, created_at); unique indices on
email and handle
- New ErrorCode variants: AccountExists (409), HandleTaken (409),
InvalidHandle (400)
- POST /v1/accounts handler: auth → handle validation → email/handle
uniqueness across both pending and active tables → single-TX insert
into claim_codes + pending_accounts → 201 with {accountId, did: null,
claimCode, status: "pending"}
- 26 tests covering happy path, DB persistence, duplicate email/handle,
handle format, tier validation, missing fields, auth, and 500 path
- Bruno create_account.bru collection entry
- uuid v1 workspace dependency for account_id generation
- Remove `&& attempt < 2` guard: unique violations now always retry;
post-loop error becomes the exhaustion case (was dead code before)
- Fix comment: "Retry up to 3 times" → "Attempt up to 3 times total (2 retries)"
- Add expires_at window assertion to persistence tests (5s tolerance)
- Add non_unique_db_error_returns_500_without_retry test (closes pool before request)
- Annotate begin/commit in insert_claim_codes with inspect_err logging
- Log non-UTF-8 Authorization header at debug level
- Add doc comment to ClaimCodesResponse.codes field
Adds operator-authenticated endpoint for generating batch invite codes
before account creation exists. Fixes the Wave 1 schema which incorrectly
required a NOT NULL DID FK on claim_codes, making pre-account invite codes
structurally impossible.
- V004 migration: recreates claim_codes without did FK; adds expires_at index
- POST /v1/accounts/claim-codes: Bearer-auth, count 1–10, configurable expiry
- 6-char uppercase alphanumeric codes via OsRng, batch-inserted in one tx
- Status derived from redeemed_at/expires_at columns (no status enum)
- 15 handler tests covering happy path, format, persistence, validation, auth
- Replace branching GF(2^8) reduction with branchless mask:
(a as i8 >> 7) as u8 selects 0x1b without branching on secret bits
- Add upper-bound index check (> 3) in combine_shares; silent wrong
reconstruction on out-of-range indices was not caught before
- Switch fill_bytes -> try_fill_bytes so RNG failure returns
CryptoError::SecretSharing instead of panicking
- Remove #[derive(Clone)] from ShamirShare — no call site uses it and
Clone on a secret-bearing type is inconsistent with P256Keypair
- Expand combine_with_index_zero_fails to test both argument positions
- Add combine_with_index_out_of_range_fails test (index: 4)
- Expand gf_mul_is_commutative to exhaustive 256×256 check
- Update gf_mul/gf_inv doc comments: describe branchless reduction,
fix "repeated squaring" -> "binary exponentiation (square-and-multiply)",
add standard -> GF(2^8) Lagrange derivation step
- Fix 1: Replace wrong sqlx error source for corrupt migration version. Use
Protocol(format!(...)) instead of RowNotFound to accurately describe the
i64-to-u32 conversion failure.
- Fix 2: Remove InvalidKeyId from CryptoError variants list in CLAUDE.md (never
implemented).
- Fix 3: Rename three test functions to match their assertions:
unsupported_algorithm_returns_422, empty_algorithm_returns_422,
missing_algorithm_field_returns_422.
- Fix 4: Add test for null algorithm field. Null deserialization returns 400
(Bad Request) from Axum's default JSON rejection, distinct from missing/invalid
enum variants (422).
- Fix 5: Add key_id context to DB insert error log for better debugging when
signing key persistence fails.
- Fix 6: Add comment explaining why Sensitive<T> has pub T field — deliberate
design choice to make raw value access visible in source for code review.
- Fix #5: Config.signing_key_master_key leaks via Debug and clone
- Wrap signing_key_master_key in Sensitive<Zeroizing<[u8; 32]>>
- Adds Sensitive newtype that redacts Debug output to "***"
- Zeroizing ensures key bytes are securely zeroized on drop
- Never copies key into non-zeroizing allocation
- Fix #6: CreateSigningKeyRequest.algorithm should be one-variant enum
- Replace Option<String> with Algorithm enum (single P256 variant)
- Serde validates at deserialization time, not runtime
- Remove dead runtime algorithm matching code
- Updated tests to expect 422 (Unprocessable Entity) for invalid enum
- Fix #7: Remove dead CryptoError::InvalidKeyId variant
- Variant was never constructed in this PR
- Fix #8: Wrap raw_bytes in Zeroizing in keys.rs
- Ensures intermediate GenericArray from secret_key.to_bytes() is zeroized
- Guards against future changes to p256 library behavior
- Fix #9: Add PRAGMA table_info test for V003 relay_signing_keys columns
- Validates exact column order and names: id, algorithm, public_key,
private_key_encrypted, created_at
- Fix #10: Add V003 PRIMARY KEY uniqueness constraint test
- Verifies duplicate id inserts fail with constraint violation
- Fix #11: Introduce DidKeyUri newtype for P256Keypair.key_id
- Prevents silent positional swap bugs in SQL binds and API responses
- Type-safe distinction between key_id (did:key:z...) and public_key (z...)
- Converts to string for DB inserts and JSON responses
Changes:
- crates/common: Add zeroize dependency, Sensitive<T> wrapper, export it
- crates/crypto: Add DidKeyUri newtype, remove InvalidKeyId CryptoError variant
- crates/relay: Add zeroize dependency, update handler to use new types, add V003 tests
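A small sketch of how a newtype such as DidKeyUri can rule out positional swaps between key_id and public_key. The constructor's prefix check and the bind_params helper are assumptions made for illustration; the log only states that the type exists and converts to a string.

```rust
use std::fmt;

/// Newtype for a did:key URI such as "did:key:z...".
#[derive(Debug, PartialEq, Eq)]
pub struct DidKeyUri(String);

impl DidKeyUri {
    /// Assumption: a value is accepted only if it starts with "did:key:z".
    pub fn new(value: &str) -> Option<Self> {
        if value.starts_with("did:key:z") {
            Some(DidKeyUri(value.to_string()))
        } else {
            None
        }
    }
}

impl fmt::Display for DidKeyUri {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Hypothetical helper standing in for a SQL bind site. Passing the
/// arguments in the wrong order is now a type error, not a runtime bug.
fn bind_params(key_id: &DidKeyUri, public_key: &str) -> Vec<String> {
    vec![key_id.to_string(), public_key.to_string()]
}
```

Calling bind_params(public_key, &key_id) does not compile, whereas two plain &str parameters would accept the swap silently.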
- Issue #1 (Critical): Replace non-constant-time Bearer token comparison with subtle::ConstantTimeEq to prevent timing attacks in create_signing_key.rs:57
- Issue #2 (Critical): Move zeroize and subtle dependencies to [workspace.dependencies] in root Cargo.toml; update crates to use { workspace = true } per project conventions
- Issue #3 (High): Fix migration infrastructure to return DbError instead of silently mapping corrupt schema_migrations version numbers to 0; now propagates parse errors with ? operator in mod.rs:99-107
- Issue #4 (High): Add sentinel field signing_key_master_key_toml_sentinel to RawConfig to detect and reject configurations that set the security-sensitive field in relay.toml instead of the env var EZPDS_SIGNING_KEY_MASTER_KEY; includes validation check and regression test in config.rs
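To illustrate the idea behind Issue #1, here is a hand-rolled comparison that always inspects every byte. This is not the subtle crate's API, only a dependency-free sketch of the principle; an optimizing compiler may undo this pattern, which is one reason to prefer subtle::ConstantTimeEq in real code.

```rust
/// Compare two byte strings without returning at the first mismatch.
/// The length check does leak whether the lengths differ.
/// Illustrative only: unlike the `subtle` crate, nothing here stops the
/// optimizer from reintroducing an early exit.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences instead of branching on them.
        diff |= x ^ y;
    }
    diff == 0
}
```

A plain == on the token can stop at the first differing byte, so response timing can reveal how long a matching prefix an attacker has guessed.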
Critical:
- Enable foreign_keys=true in open_pool via SqliteConnectOptions so all
FK constraints are enforced at runtime, not just in tests
- Add 4 missing did indexes: idx_handles_did, idx_signing_keys_did,
idx_devices_did, idx_sessions_did
Important:
- Add comment to did_documents explaining no FK to accounts (intentional:
caches external DIDs from remote PDSs, not only local accounts)
- Expand FK test coverage: sessions.device_id, refresh_tokens.session_id,
oauth_authorization_codes.client_id
- Add core auth chain insert test: accounts → devices → sessions →
refresh_tokens (validates column names and NOT NULL constraints end-to-end)
Suggestions:
- Replace v002_migrations_are_idempotent (duplicate of generic test) with
v002_tables_survive_second_migration_run (behavioral, not count-based)
- Remove redundant PRAGMA foreign_keys = ON from FK test (pool handles it)
- Add EXPLAIN QUERY PLAN tests for all 4 new did indexes
- Update db/CLAUDE.md Invariants and Key Files sections
Creates V002__auth_identity.sql with all 12 Wave 2 tables: accounts,
handles, did_documents, signing_keys, devices, claim_codes, sessions,
refresh_tokens, oauth_clients, oauth_authorization_codes, oauth_tokens,
and oauth_par_requests. Adds the 5 required indexes (unique email,
claim_codes/refresh_tokens/oauth_tokens did lookups). Uses WITHOUT ROWID
on tables with PK-only access paths. Updates MIGRATIONS static and
existing row-count assertions to be future-proof via MIGRATIONS.len().
Adds 9 V002 tests covering table existence, idempotency, FK enforcement,
unique email, and EXPLAIN QUERY PLAN for all 4 non-trivial indexes.
Critical:
- Move #[tracing::instrument] to after doc comment blocks in db/mod.rs
(was splitting /// paragraphs, breaking rustdoc)
- Add skip(url) to open_pool's instrument attribute to avoid recording
the database URL as a span field
Important:
- Replace eprintln! in OtelGuard::drop with tracing::error! — the
subscriber is still live at drop time, so structured logging is correct
- Add tracing::debug! in HeaderMapCarrier::get for non-UTF-8 header
values instead of silently discarding them
- Add comment at _otel_guard binding explaining the naming requirement:
bare _ drops immediately, which would shut down the exporter early
- Add otlp_endpoint validation in validate_and_build: must be non-empty
and start with http:// or https://
- Add unit tests for HeaderMapCarrier (get, absent, case-insensitive,
keys), including one documenting that dropping non-UTF-8 values is intentional
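The naming requirement on the _otel_guard binding follows from Rust's drop rules, which this standalone sketch demonstrates. The Guard type and the drop_order function are hypothetical stand-ins for OtelGuard and the relay's startup code.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for a guard whose Drop shuts something down.
struct Guard {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Records the order in which the two guards are dropped.
fn drop_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        // A bare `_` does not bind: the guard is dropped at the end of
        // this statement.
        let _ = Guard { name: "bare", log: Rc::clone(&log) };
        log.borrow_mut().push("after bare".to_string());

        // `_named` is a real binding: the guard lives until the scope ends.
        let _named = Guard { name: "named", log: Rc::clone(&log) };
        log.borrow_mut().push("after named".to_string());
    }
    let events = log.borrow().clone();
    events
}
```

The bare guard is dropped before "after bare" is recorded, while the named guard survives until the end of the block.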
Suggestions:
- Add eprintln! warning when RUST_LOG is invalid (subscriber not up yet)
- Update apply_env_overrides doc to mention OTEL_SERVICE_NAME
- Update validate_and_build doc to include invite_code_required default
and telemetry validation rules
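The otlp_endpoint rule above (non-empty, http:// or https:// scheme) can be sketched as a pure function. The EndpointError type and the function name are hypothetical; the real check lives inside validate_and_build and presumably reports through ConfigError.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum EndpointError {
    Empty,
    MissingScheme,
}

/// Accept only non-empty endpoints that start with http:// or https://.
fn validate_otlp_endpoint(endpoint: &str) -> Result<(), EndpointError> {
    if endpoint.is_empty() {
        return Err(EndpointError::Empty);
    }
    if !(endpoint.starts_with("http://") || endpoint.starts_with("https://")) {
        return Err(EndpointError::MissingScheme);
    }
    Ok(())
}
```

A value such as localhost:4317 is rejected because it lacks a scheme, a mistake that would otherwise surface only when the exporter first tries to connect.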
- Add `[telemetry]` config section (enabled, otlp_endpoint, service_name)
with env overrides: EZPDS_TELEMETRY_ENABLED, EZPDS_OTLP_ENDPOINT,
OTEL_SERVICE_NAME
- New crates/relay/src/telemetry.rs: layered subscriber init with
conditional OTel layer; OtelGuard flushes spans on graceful shutdown
- OtelMakeSpan in app.rs: custom TraceLayer make_span_with that extracts
W3C traceparent/tracestate headers and links upstream traces via
TraceContextPropagator
- DB instrumentation: #[tracing::instrument] on open_pool and
run_migrations with db.system = "sqlite" span attribute
- 7 new config tests covering telemetry TOML parsing, all three env
overrides, env-wins-over-toml precedence, and invalid bool error
- Zero overhead when telemetry.enabled = false (default)
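For readers unfamiliar with the header that OtelMakeSpan consumes, here is a hand-rolled parser for a version-00 W3C traceparent value. It is not the TraceContextPropagator API and not the relay's code; the TraceParent struct and function names are illustrative.

```rust
/// Parsed fields of a version-00 traceparent header.
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: String,
    parent_id: String,
    sampled: bool,
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
/// Only version 00 is accepted in this sketch.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero IDs are invalid per the W3C Trace Context specification.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        // Bit 0 of the flags byte is the "sampled" flag.
        sampled: flags & 1 == 1,
    })
}
```

A request carrying a valid header can be linked to its upstream trace; a missing or malformed header simply starts a new trace.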
- relay.dev.toml: change available_user_domains to ["localhost"] so dev
startup doesn't fail validation out of the box
- config.rs: return ConfigError::Invalid (not MissingField) when
available_user_domains is present but empty; update doc comment to
list it as a required field
- app.rs: remove stale #[allow(dead_code)] on config field now that
describe_server reads it
- describe_server.rs: remove stale TODO/Parameters/trade-offs comment
from resolve_did; add two handler-level tests asserting the did field
(derived from public_url and from explicit server_did config)
Adds the ATProto service discovery endpoint required by Bluesky clients
during login. Config is extended with available_user_domains (required),
invite_code_required (default true), optional server_did, links, and
contact sections. The handler derives did:web:<host> from public_url as
a placeholder until Wave 3 generates a real DID.
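A sketch of deriving the did:web placeholder from public_url. The function name is hypothetical, and the parsing is deliberately naive (no userinfo or IPv6 handling); percent-encoding the port colon as %3A follows the did:web method specification and is an assumption about what the handler does.

```rust
/// Derive "did:web:<host>" from a URL such as "https://relay.example.com".
/// Returns None when the URL has no http(s) scheme or no host.
fn did_web_from_public_url(public_url: &str) -> Option<String> {
    let rest = public_url
        .strip_prefix("https://")
        .or_else(|| public_url.strip_prefix("http://"))?;
    // Keep only the authority part: everything before a path, query, or fragment.
    let authority = rest.split(|c| c == '/' || c == '?' || c == '#').next()?;
    if authority.is_empty() {
        return None;
    }
    // did:web encodes an optional port with a percent-encoded colon.
    Some(format!("did:web:{}", authority.replace(':', "%3A")))
}
```

For a dev setup with public_url set to http://localhost:8080/, this produces did:web:localhost%3A8080.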
- Log error details when DB health check fails (tracing::error!)
- Assert content-type: application/json on 503 response in test
- Add health_post_returns_405 test (consistent with existing XRPC pattern)
- Clarify SELECT 1 scope in handler comment (liveness only, not schema)
- Fix test_state() doc comment to note pool is fully migrated
Adds the first real XRPC route establishing the pattern for all subsequent
endpoint additions. Returns {"version":"0.1.0","db":"ok"} on 200 or
{"version":"0.1.0","db":"error"} on 503 when the SQLite pool is unreachable.
- Add routes/ module with one file per endpoint (routes/health.rs)
- Register /xrpc/_health before the catch-all /xrpc/:method route
- Promote test_state() to pub(crate) so per-endpoint test modules can share it
- Remove dead_code suppression on AppState.db now that a handler uses it
CRITICAL (5):
- Add DbError::InvalidUrl variant to distinguish URL parsing failures from pool-open failures
- Remove #[from] on DbError::Pool, add explicit error handling at each call site
- Add DbError::Setup.step field to provide detailed context (4 distinct failure stages)
- Update run_migrations doc comment to accurately describe bootstrap vs transaction scope
- Update DbError::Setup doc comment to list all 4 failure stages (bootstrap DDL, fetch versions, begin, commit)
IMPORTANT (9):
- Extract db_url normalization into to_sqlite_url() function with comprehensive unit tests
- Fix i32/u32 type mismatch: fetch as i64, convert with u32::try_from(), bind as i64
- Add tracing::info!() at: migrations start, each migration apply, commit success; debug for no-op
- Add tracing::error!() before fatal errors propagate to the eprintln! in main()
- Update migration failure context to include database URL
- Add test for server_metadata PRIMARY KEY uniqueness constraint enforcement
- Add tracing::warn!() for relative-path URL normalization (CWD sensitivity)
- Fix open_pool doc: 're-issues' not 'tracks' for WAL PRAGMA behavior
- Update db/CLAUDE.md Guarantees section to accurately describe bootstrap vs transaction scope
MINOR (7):
- Document that commit() failure triggers Drop rollback; no partial schema, safe to re-run
- Document that pool creation succeeds even if file path is invalid; failure surfaces at first query
- Add distinct purpose doc to migrations_apply_on_first_run (row count = 1 after first run)
- Add doc comment to select_one_succeeds explaining it's a pool-connectivity smoke test
- Strengthen applied_at assertion: check length=19 and ISO-8601 format (starts with '20')
- Remove duplicate '// pattern: Imperative Shell' comment from top of db/mod.rs
- Consolidate duplicate #[allow(dead_code)] comments in AppState struct
All tests pass; build clean; clippy and fmt verified.
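The i64-to-u32 version handling mentioned above can be sketched as follows. CorruptVersion is a hypothetical stand-in for the DbError variant the runner actually returns; the point is that an out-of-range stored value becomes an error instead of being truncated or mapped to 0.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct CorruptVersion(String);

/// SQLite integers are read as i64; migration versions are u32 in memory.
fn version_from_row(raw: i64) -> Result<u32, CorruptVersion> {
    u32::try_from(raw).map_err(|e| {
        CorruptVersion(format!("invalid migration version {}: {}", raw, e))
    })
}
```

Negative values and values above u32::MAX both fail the conversion, so a corrupt schema_migrations row stops the run rather than being misread as an unapplied migration.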
Completed brainstorming session. Design includes:
- Custom forward-only migration runner with schema_migrations tracking
- WAL-mode SqlitePool (max 1 connection) via SqliteConnectOptions
- Flat db/mod.rs with include_str!-embedded SQL migrations
- AppState extended with SqlitePool; open_pool accepts plain URL for future per-user DB reuse
- 3 implementation phases: sqlx dep, db module + V001 schema, AppState integration
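A sketch of how a forward-only runner might select pending migrations. The Migration struct, its field names, and the inline SQL placeholders are assumptions; the design embeds real SQL via include_str! and reads applied versions from schema_migrations.

```rust
/// One embedded migration. In the real design the SQL would come from
/// include_str!; inline placeholders keep this sketch self-contained.
struct Migration {
    version: u32,
    name: &'static str,
    sql: &'static str,
}

/// Ordered list of all known migrations (ascending by version).
static MIGRATIONS: &[Migration] = &[
    Migration { version: 1, name: "V001__init", sql: "-- placeholder SQL" },
    Migration { version: 2, name: "V002__auth_identity", sql: "-- placeholder SQL" },
];

/// Forward-only selection: return every migration whose version is not
/// yet recorded as applied, preserving the declared order.
fn pending<'a>(all: &'a [Migration], applied: &[u32]) -> Vec<&'a Migration> {
    all.iter()
        .filter(|m| !applied.contains(&m.version))
        .collect()
}
```

Running the selection twice against an up-to-date database yields an empty list, which is what makes repeated startup runs a no-op.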
Adds processes.relay to devenv.nix so `devenv up` launches the relay via
cargo run, with EZPDS_* env vars providing dev defaults for data_dir and
public_url. relay.dev.toml holds committed non-secret defaults; state lands
in .devenv/state/relay (already gitignored). Override per-machine in
devenv.local.nix.