* uncategorized notes

** sync
- each client keeps the full data set
- dexie sync and observable let us stream change sets
- we can publish the "latest" to all peers
- on first pull, if not the first client, we can request a dump out of band

*** rss feed data
- do we want to back up feed data?
  - conceptually, this should be refetchable
  - but feeds go away, and some will only show recent stories
  - so yes, we'll need this
  - but server side, we can dedupe
    - content-addressed server-side cache?
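
the dedupe idea above can be sketched in a few lines: key every fetched body by its sha256, and identical bodies collapse into one stored blob. this is a minimal sketch; ~storeContent~ and ~contentStore~ are illustrative names, not a real API.

```typescript
// sketch of a content-addressed store: blobs keyed by sha256 of their bytes
import { createHash } from "node:crypto";

const contentStore = new Map<string, Buffer>();

function hashContent(content: Buffer): string {
  return createHash("sha256").update(content).digest("hex");
}

// identical feed bodies fetched from different urls collapse to one blob
function storeContent(content: Buffer): string {
  const key = hashContent(content);
  if (!contentStore.has(key)) contentStore.set(key, content);
  return key;
}
```

a url→hash history table (as described later) can then point many fetches at the same blob.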

- server side does RSS pulling
  - can feeds be marked private, such that they won't be pulled through the proxy?
    - but then we require everything to be fetchable via cors
    - client-configured proxy settings?

*** peer connection
- on startup, check for current realm-id and key pair
- if not present, ask to login or start new
  - if login, run through the [[* pairing]] process
  - if start new, run through the [[* registration]] process
- use keypair to authenticate to server
  - response includes list of active peers to connect
- clients negotiate sync from there
- an identity is a keypair and a realm

- realm is a uuid
  - realm on the server is the socket connection for peer discovery
    - keeps a list of verified public keys
    - and manages the /current/ ~public-key->peer ids~ mapping
  - realm on the client side is the first piece of info required for sync
    - when connecting to the signalling server, you present a realm and a signed public key
    - server accepts/rejects based on signature and current verified keys

- a new keypair can create a realm

- a new keypair can double sign an invitation
  - invite = ~{ realm:, nonce:, not_before:, not_after:, authorizer: }~, signed with a verified key
  - exchanging an invite = ~{ invite: }~, signed with my key
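
the double-sign above could look like this with ed25519: a verified member signs the invite, and the joiner's new keypair signs an exchange that wraps it. a hedged sketch using node's ~sign~/~verify~; the field names follow the note, the object shapes and variable names are illustrative only.

```typescript
// sketch: double-signing an invite with ed25519 (node:crypto)
import { generateKeyPairSync, sign, verify } from "node:crypto";

// a verified realm member signs the invite
const inviter = generateKeyPairSync("ed25519");
const invite = { realm: "realm-uuid", nonce: "n1", not_before: 0, not_after: 9e12, authorizer: "inviter-fp" };
const inviteBytes = Buffer.from(JSON.stringify(invite));
const inviteSig = sign(null, inviteBytes, inviter.privateKey);

// the new keypair signs the exchange wrapping that invite
const joiner = generateKeyPairSync("ed25519");
const exchange = { invite, inviteSig: inviteSig.toString("base64") };
const exchangeBytes = Buffer.from(JSON.stringify(exchange));
const exchangeSig = sign(null, exchangeBytes, joiner.privateKey);

// server-side checks: both signatures must hold before the new key is added
const inviteOk = verify(null, inviteBytes, inviter.publicKey, inviteSig);
const exchangeOk = verify(null, exchangeBytes, joiner.publicKey, exchangeSig);
```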

- on startup
  - start stand-alone (no syncing required, usually the case on first run)
    - generate a keypair
  - want server backup?
    - sign a "setup" message with the new keypair and send to the server
    - server responds with a new realm that this keypair is already verified for
    - move along
  - exchange invite to sync to other devices
    - generate a keypair
    - sign the exchange message with the invite and send to the server
    - server verifies the invite
      - adds the new public key to the peer list and publishes downstream
    - move along

***** standalone
in this mode, there is no syncing. this is the most likely first-time run option.

- generate a keypair on startup, so we have a stable fingerprint in the future
- done

***** pairing
in this mode, there is syncing to a named realm, but not necessarily server resources consumed.
we don't need an email, since the server is just doing signalling and peer management.

- generate an invite from an existing verified peer
  - ~{ realm:, not_before:, not_after:, inviter: peer.public_key }~
  - sign that invitation with the existing verified peer's key

- standalone -> paired
  - get the invitation somehow (QR code?)
  - sign an invite exchange with the standalone's public key
  - send to server
    - server verifies the invite
    - adds the new public key to the peer list and publishes downstream

***** server backup
in this mode, there is syncing to a named realm by email.

goal of server backup mode is that we can go from email -> fully working client with the latest data, without needing any surviving clients to participate in the sync.

- generate a keypair on startup
- sign a registration message sent to the server
  - send a verification email
    - if the email/realm already exists, this is authorization
    - if not, it's email validation
  - server starts a realm and associates the public key
  - server acts as a peer for the realm, and stores private data

- since dexie is publishing change sets, we should be able to just store deltas
- but we'll need to store _all_ deltas, unless we're materializing on the server side too
  - should we use an IndexedDB shim so we can import/export from the server for a clean start?
  - how much materialization does the server need?

** summarized architecture design (may 28-29) :ai:claude:

key decisions and system design:

*** sync model
- device-specific records for playback state/queues to avoid conflicts
- content-addressed server cache with deduplication
- dual-JWT invitation flow for secure realm joining

*** data structures
- tag-based filtering system instead of rigid hierarchies
- regex patterns for episode title parsing and organization
- service worker caching with background download support

*** core schemas
**** client (dexie)
- Channel/ChannelEntry for RSS feeds and episodes
- PlayRecord/QueueItem scoped by deviceId
- FilterView for virtual feed organization

**** server (drizzle)
- ContentStore for deduplicated content by hash
- Realm/PeerConnection for sync authorization
- HttpCache with health tracking and TTL

*** push sync strategy
- revision-based sync (just send revision ranges in push notifications)
- background fetch API for large downloads where supported
- graceful degradation to reactive caching
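
the revision-range idea can be made concrete with a tiny decision function: the push carries only a range, and the client compares it against its own last-applied revision. the payload shape here is an assumption, not a finalized protocol.

```typescript
// sketch: deciding what to pull from a revision range in a push payload
interface RevisionPush {
  realmId: string;
  fromRevision: number; // lowest revision covered by this notification
  toRevision: number;   // newest revision on the server
}

// returns the next revision the client needs, or null if already caught up;
// fromRevision is ignored because a lagging client resumes from its own
// position regardless of what window the notification happens to cover
function nextRevisionToPull(localRevision: number, push: RevisionPush): number | null {
  if (localRevision >= push.toRevision) return null;
  return localRevision + 1;
}
```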

*** research todos :ai:claude:
**** sync and data management
***** DONE identity and signature management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors

** <2025-05-28 Wed>
getting everything set up

the biggest open question I have is what sort of privacy/encryption guarantee I need. I want the server to be able to do things like cache and store feed data long-term.

Is "if you want full privacy, self-host" valid?

*** possibilities

- fully PWA
  - CON: cors, which would require a proxy anyway
  - CON: audio analysis, llm-based stuff for categorization, etc. won't work
  - PRO: private as all get out
    - can still do WebRTC p2p sync for resiliency
    - can still do server backups if the sync stream is encrypted, but no compaction would be available
    - could do _explicit_ server backups as dump files

- self-hostable
  - PRO: can do bunches of private stuff on the server, because if you don't want me to see it, do it elsewhere
  - CON: hard for folks to use

*** sync conflict resolution design discussion :ai:claude:

discussed the sync architecture and dexie conflict handling:

*dexie syncable limitations*:
- logical clocks handle causally-related changes well
- basic timestamp-based conflict resolution for concurrent updates
- last-writer-wins for same-field conflicts
- no sophisticated CRDT or vector clock support

*solutions for podcast-specific conflicts*:

- play records: device-specific approach
  - store separate ~play_records~ per ~device_id~
  - each record: ~{ episode_id, device_id, position, completed, timestamp }~
  - UI handles conflict resolution with "continue from X device?" prompts
  - avoids arbitrary timestamp wins, gives users control
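
the "continue from X device?" prompt reduces to picking the most recently updated, not-yet-completed record from another device. a minimal sketch; ~resumeCandidate~ is a hypothetical helper, and the record shape follows the note above.

```typescript
// sketch: choosing which device's position to offer on resume
interface PlayRecord {
  episode_id: string;
  device_id: string;
  position: number;
  completed: boolean;
  timestamp: number;
}

// most recently updated, not-completed record from another device, if any
function resumeCandidate(records: PlayRecord[], thisDevice: string): PlayRecord | null {
  const foreign = records
    .filter((r) => r.device_id !== thisDevice && !r.completed)
    .sort((a, b) => b.timestamp - a.timestamp);
  return foreign[0] ?? null;
}
```

the UI then prompts with the candidate's device name and position rather than silently letting a timestamp win.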

- subscription trees
  - store ~parent_path~ as a single string field ("/Tech/Programming")
  - simpler than managing folder membership tables
  - conflicts still possible but contained to a single field
  - could store move operations as events for richer resolution

*other sync considerations*:
- settings/preferences: distinguish device-local vs global
- bulk operations: "mark all played" can create duplicate operations
- metadata updates: server RSS updates vs local renames
- temporal ordering: recently played lists, queue reordering
- storage limits: cleanup operations conflicting across devices
- feed state: refresh timestamps, error states

*approach*: prefer an "events not state" pattern and device-specific records where semantic conflicts are likely

*** data model brainstorm :ai:claude:

core entities designed with sync in mind:

**** ~Feed~ :: RSS/podcast subscription
- ~parent_path~ field for folder structure (e.g. ~/Tech/Programming~)
- ~is_private~ flag to skip server proxy
- ~refresh_interval~ for custom update frequencies

**** ~Episode~ :: individual podcast episodes
- standard RSS metadata (guid, title, description, media url)
- duration and file info for playback

**** ~PlayRecord~ :: device-specific playback state
- separate record per ~device_id~ to avoid timestamp conflicts
- position, completed status, playback speed
- UI can prompt "continue from X device?" for resolution

**** ~QueueItem~ :: device-specific episode queue
- ordered list with position field
- ~device_id~ scoped to avoid queue conflicts

**** ~Subscription~ :: feed membership settings
- can be global or device-specific
- auto-download preferences per device

**** ~Settings~ :: split global vs device-local
- theme, default speed = global
- download path, audio device = device-local

**** Event tables for complex operations:
- ~FeedMoveEvent~ for folder reorganization
- ~BulkMarkPlayedEvent~ for "mark all read" operations
- better conflict resolution than direct state updates

**** sync considerations
- device identity established on first run
- dexie syncable handles basic timestamp conflicts
- prefer device-scoped records for semantic conflicts
- event-driven pattern for bulk operations

*** schema evolution from previous iteration :ai:claude:

reviewed existing schema from tmp/feed.ts - a well-designed foundation:

**** keep from original
- Channel/ChannelEntry naming and structure
- ~refreshHP~ adaptive refresh system (much better than simple intervals)
- rich podcast metadata (people, tags, enclosure, podcast object)
- HTTP caching with etag/status tracking
- epoch millisecond timestamps
- ~hashId()~ approach for entry IDs

**** add for multi-device sync
- ~PlayState~ table (device-scoped position/completion)
- Subscription table (with ~parentPath~ for folders, device-scoped settings)
- ~QueueItem~ table (device-scoped episode queues)
- Device table (identity management)

**** migration considerations
- existing Channel/ChannelEntry can be preserved
- new tables are additive
- ~fetchAndUpsert~ method works well with server proxy architecture
- dexie sync vs rxdb - need to evaluate change tracking capabilities

*** content-addressed caching for offline resilience :ai:claude:

designed a caching system for when upstream feeds fail/disappear, building on the existing cache-schema.ts:

**** server-side schema evolution (drizzle sqlite):
- keep existing ~httpCacheTable~ design (health tracking, http headers, ttl)
- add ~contentHash~ field pointing to deduplicated content
- new ~contentStoreTable~: deduplicated blobs by sha256 hash
- new ~contentHistoryTable~: url -> contentHash timeline with isLatest flag
- reference counting for garbage collection

**** client-side OPFS storage
- ~/cache/content/{contentHash}.xml~ for raw feeds
- ~/cache/media/{contentHash}.mp3~ for podcast episodes
- ~LocalCacheEntry~ metadata tracks expiration and offline-only flags
- maintains last N versions per feed for historical access
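
"keep the last N versions" is a simple pure function over the version metadata: sort by fetch time, keep the newest N, and the rest are candidates for OPFS deletion. a sketch; the ~CachedVersion~ shape is illustrative.

```typescript
// sketch: retain only the newest N cached versions of one feed's content
interface CachedVersion {
  contentHash: string;
  cachedAt: number; // epoch millis
}

function pruneVersions(versions: CachedVersion[], keep: number): CachedVersion[] {
  return [...versions]
    .sort((a, b) => b.cachedAt - a.cachedAt) // newest first
    .slice(0, keep);
}
```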

**** fetch strategy & fallback
1. check local OPFS cache first (fastest)
2. try server proxy ~/api/feed?url={feedUrl}~ (deduplicated)
3. server checks ~contentHistory~, serves latest or fetches upstream
4. server returns ~{contentHash, content, cached: boolean}~
5. client stores with content hash as filename
6. emergency mode: serve stale content when upstream fails

- preserves existing health tracking and HTTP caching logic
- popular feeds cached once on server, many clients benefit
- bandwidth savings via content hash comparison
- historical feed state preservation (feeds disappear!)
- true offline operation after initial sync
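
the fallback order above is just a chain of lookups; sketching it with the i/o stubbed out as plain functions keeps the ordering visible. ~FeedSource~ and ~fetchFeed~ are illustrative names under the assumption that each layer returns the body or null.

```typescript
// sketch: local OPFS first, then server proxy, then stale emergency copy
interface FeedSource {
  localCache: (url: string) => string | null;  // step 1: OPFS hit
  serverProxy: (url: string) => string | null; // steps 2-4: /api/feed
  staleCache: (url: string) => string | null;  // step 6: emergency mode
}

function fetchFeed(url: string, src: FeedSource): string | null {
  return src.localCache(url) ?? src.serverProxy(url) ?? src.staleCache(url);
}
```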
297+298+** <2025-05-29 Thu> :ai:claude:
299+e2e encryption and invitation flow design
300+301+worked through the crypto and invitation architecture. key decisions:
302+303+*** keypair strategy
304+- use jwk format for interoperability (server stores public keys)
305+- ed25519 for signing, separate x25519 for encryption if needed
306+- zustand lazy initialization pattern: ~ensureKeypair()~ on first use
307+- store private jwk in persisted zustand state
308+309+*** invitation flow: dual-jwt approach
310+solved the chicken-and-egg problem of sharing encryption keys securely.
311+312+**** qr code contains two signed jwts:
313+1. invitation token: ~{iss: inviter_fingerprint, sub: invitation_id, purpose: "realm_invite"}~
314+2. encryption key token: ~{iss: inviter_fingerprint, ephemeral_private: base64_key, purpose: "ephemeral_key"}~
315+316+**** exchange process:
317+1. invitee posts jwt1 + their public keys to ~/invitations~
318+2. server verifies jwt1 signature against realm members
319+3. if valid: adds invitee to realm, returns ~{realm_id, realm_members, encrypted_realm_key}~
320+4. invitee verifies jwt2 signature against returned realm members
321+5. invitee extracts ephemeral private key, decrypts realm encryption key
322+323+**** security properties:
324+- server never has decryption capability (missing ephemeral private key)
325+- both jwts must be signed by verified realm member
326+- if first exchange fails, second jwt is cryptographically worthless
327+- atomic operation: identity added only if invitation valid
328+- built-in expiration and tamper detection via jwt standard
329+330+**** considered alternatives:
331+- raw ephemeral keys in qr: simpler but no authenticity
332+- ecdh key agreement: chicken-and-egg problem with public key exchange
333+- server escrow: good but missing authentication layer
334+- password-based: requires secure out-of-band sharing
335+336+the dual-jwt approach provides proper authenticated invitations while maintaining e2e encryption properties.

**** refined dual-jwt with ephemeral signing
simplified the approach by using the ephemeral key for the second jwt signature:

**setup**:
1. inviter generates an ephemeral keypair
2. encrypts the realm key with the ephemeral private key
3. posts to server: ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~

**qr code contains**:
#+BEGIN_SRC json
// JWT 1: signed with inviter's realm signing key
{
  "realm_id": "uuid",
  "invitation_id": "uuid",
  "iss": "inviter_fingerprint"
}

// JWT 2: signed with ephemeral private key
{
  "ephemeral_private": "base64_key",
  "invitation_id": "uuid"
}
#+END_SRC

**exchange flow**:
1. submit jwt1 → server verifies against realm members → returns ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
2. verify jwt2 signature using ~ephemeral_public~ from the server response
3. extract ~ephemeral_private~ from jwt2, decrypt the realm key

**benefits over previous version**:
- no premature key disclosure (invitee keys shared via normal webrtc peering)
- self-contained verification (ephemeral public key verifies jwt2)
- cleaner separation of realm auth vs encryption key distribution
- simpler flow (no need to return the realm member list)

**crypto verification principle**: digital signatures work as sign-with-private/verify-with-public, while encryption works as encrypt-with-public/decrypt-with-private. jwt2 verification uses signature verification, not decryption.

**invitation flow diagram**:
#+BEGIN_SRC mermaid
sequenceDiagram
    participant I as Inviter
    participant S as Server
    participant E as Invitee

    Note over I: Generate ephemeral keypair
    I->>I: ephemeral_private, ephemeral_public

    Note over I: Encrypt realm key
    I->>I: encrypted_realm_key = encrypt(realm_key, ephemeral_private)

    I->>S: POST /invitations<br/>{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}
    S-->>I: OK

    Note over I: Create JWTs for QR code
    I->>I: jwt1 = sign({realm_id, invitation_id}, inviter_private)
    I->>I: jwt2 = sign({ephemeral_private, invitation_id}, ephemeral_private)

    Note over I,E: QR code contains [jwt1, jwt2]

    E->>S: POST /invitations/exchange<br/>{jwt1}
    Note over S: Verify jwt1 signature<br/>against realm members
    S-->>E: {invitation_id, realm_id, ephemeral_public, encrypted_realm_key}

    Note over E: Verify jwt2 signature<br/>using ephemeral_public
    E->>E: verify_signature(jwt2, ephemeral_public)

    Note over E: Extract key and decrypt
    E->>E: ephemeral_private = decode(jwt2)
    E->>E: realm_key = decrypt(encrypted_realm_key, ephemeral_private)

    Note over E: Now member of realm!
#+END_SRC

**** jwk keypair generation and validation :ai:claude:

discussed jwk vs raw crypto.subtle for keypair storage. since public keys need server storage for realm authorization, jwk is better for interoperability.

**keypair generation**:
#+BEGIN_SRC typescript
const keypair = (await crypto.subtle.generateKey(
  { name: "Ed25519" },
  true,
  ["sign", "verify"]
)) as CryptoKeyPair;

const publicJWK = await crypto.subtle.exportKey("jwk", keypair.publicKey);
const privateJWK = await crypto.subtle.exportKey("jwk", keypair.privateKey);

// JWK format:
// {
//   "kty": "OKP",
//   "crv": "Ed25519",
//   "x": "base64url-encoded-public-key",
//   "d": "base64url-encoded-private-key" // only in private JWK
// }
#+END_SRC

**client validation**:
#+BEGIN_SRC typescript
function isValidEd25519PublicJWK(jwk: any): boolean {
  return (
    typeof jwk === 'object' &&
    jwk !== null &&
    jwk.kty === 'OKP' &&
    jwk.crv === 'Ed25519' &&
    typeof jwk.x === 'string' &&
    jwk.x.length === 43 && // base64url Ed25519 public key length
    !jwk.d && // public key shouldn't have private component
    (!jwk.use || jwk.use === 'sig') // parens matter: without them, use === 'sig' would bypass all other checks
  );
}

async function validatePublicKey(publicJWK: JsonWebKey): Promise<CryptoKey | null> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return null;

    const key = await crypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return key;
  } catch {
    return null;
  }
}
#+END_SRC

**server validation (node.js)**:
#+BEGIN_SRC typescript
import { webcrypto } from 'node:crypto';

async function validateClientPublicKey(publicJWK: JsonWebKey): Promise<boolean> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return false;

    await webcrypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return true;
  } catch {
    return false;
  }
}
#+END_SRC

**authentication flow**:
#+BEGIN_SRC typescript
// client signs message
const authMessage = {
  realm: 'uuid-here',
  timestamp: Date.now(),
  action: 'join'
};

const signature = await crypto.subtle.sign(
  'Ed25519',
  privateKey,
  new TextEncoder().encode(JSON.stringify(authMessage))
);

// server verifies
async function verifyAuth(req: AuthRequest): Promise<boolean> {
  const publicKey = await webcrypto.subtle.importKey(
    'jwk',
    req.publicKey,
    { name: 'Ed25519' },
    false,
    ['verify']
  );

  const messageBytes = new TextEncoder().encode(JSON.stringify(req.message));
  const signatureBytes = new Uint8Array(req.signature);

  return await webcrypto.subtle.verify(
    'Ed25519',
    publicKey,
    signatureBytes,
    messageBytes
  );
}
#+END_SRC

**** proposed schemas :ai:claude:

***** client-side schema (dexie)

#+BEGIN_SRC typescript
// Core RSS/Podcast data (from your existing design)
interface Channel {
  id: string;
  feedUrl: string;
  htmlUrl?: string;
  imageUrl?: string;
  title?: string;
  description?: string;
  language?: string;
  people?: Record<string, string>;
  tags?: string[];

  // Refresh management
  refreshHP: number;
  nextRefreshAt?: number;
  lastRefreshAt?: number;
  lastRefreshStatus?: string;
  lastRefreshHttpStatus?: number;
  lastRefreshHttpEtag?: string;

  // Cache info
  contentHash?: string;
  lastFetchedAt?: number;
}

interface ChannelEntry {
  id: string;
  channelId: string;
  guid: string;
  title: string;
  linkUrl?: string;
  imageUrl?: string;
  snippet?: string;
  content?: string;

  enclosure?: {
    url: string;
    type?: string;
    length?: number;
  };

  podcast?: {
    explicit?: boolean;
    duration?: string;
    seasonNum?: number;
    episodeNum?: number;
    transcriptUrl?: string;
  };

  publishedAt?: number;
  fetchedAt?: number;
}

// Device-specific sync tables
interface PlayRecord {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  duration?: number;
  completed: boolean;
  speed: number;
  updatedAt: number;
}

interface Subscription {
  id: string;
  channelId: string;
  deviceId?: string;
  parentPath: string; // "/Tech/Programming"
  autoDownload: boolean;
  downloadLimit?: number;
  isActive: boolean;
  createdAt: number;
  updatedAt: number;
}

interface QueueItem {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  addedAt: number;
}

interface Device {
  id: string;
  name: string;
  platform: string;
  lastSeen: number;
}

// Local cache metadata
interface LocalCache {
  id: string;
  url: string;
  contentHash: string;
  filePath: string; // OPFS path
  cachedAt: number;
  expiresAt?: number;
  size: number;
  isOfflineOnly: boolean;
}

// Dexie schema
const db = new Dexie('SkypodDB');
db.version(1).stores({
  channels: '&id, feedUrl, contentHash',
  channelEntries: '&id, channelId, publishedAt',
  playRecords: '&id, [entryId+deviceId], deviceId, updatedAt',
  subscriptions: '&id, channelId, deviceId, parentPath',
  queueItems: '&id, entryId, deviceId, position',
  devices: '&id, lastSeen',
  localCache: '&id, url, contentHash, expiresAt'
});
#+END_SRC

***** server-side schema

#+BEGIN_SRC typescript
// Content-addressed cache
interface ContentStore {
  contentHash: string; // Primary key
  content: Buffer; // Raw feed content
  contentType: string;
  contentLength: number;
  firstSeenAt: number;
  referenceCount: number;
}

interface ContentHistory {
  id: string;
  url: string;
  contentHash: string;
  fetchedAt: number;
  isLatest: boolean;
}

// HTTP cache with health tracking (from your existing design)
interface HttpCache {
  key: string; // URL hash, primary key
  url: string;

  status: 'alive' | 'dead';
  lastFetchedAt: number;
  lastFetchError?: string;
  lastFetchErrorStreak: number;

  lastHttpStatus: number;
  lastHttpEtag?: string;
  lastHttpHeaders: Record<string, string>;
  expiresAt: number;
  expirationTtl: number;

  contentHash: string; // Points to ContentStore
}

// Sync/auth tables
interface Realm {
  id: string; // UUID
  createdAt: number;
  verifiedKeys: string[]; // Public key list
}

interface PeerConnection {
  id: string;
  realmId: string;
  publicKey: string;
  lastSeen: number;
  isOnline: boolean;
}

// Media cache for podcast episodes
interface MediaCache {
  contentHash: string; // Primary key
  originalUrl: string;
  mimeType: string;
  fileSize: number;
  content: Buffer;
  cachedAt: number;
  accessCount: number;
}
#+END_SRC

**** episode title parsing for sub-feed groupings :ai:claude:

*problem*: some podcast feeds contain multiple shows; we need hierarchical organization within a feed

*example*: "Apocalypse Players" podcast
- episode title: "A Term of Art 6 - Winston's Hollow"
- desired grouping: "Apocalypse Players > A Term of Art > 6 - Winston's Hollow"
- UI shows sub-shows within the main feed

***** approaches considered

1. *manual regex patterns* (short-term solution)
   - user provides regex with capture groups = tags
   - reliable, immediate, user-controlled
   - requires manual setup per feed

2. *LLM-generated regex* (automation goal)
   - analyze last 100 episode titles
   - generate regex pattern automatically
   - good balance of automation + reliability

3. *NER model training* (experimental)
   - train spacy model for episode title parsing
   - current prototype: 150 labelled examples, limited success
   - needs more training data to be viable

***** data model implications

- add regex pattern field to Channel/Feed
- store extracted groupings as hierarchical tags on ~ChannelEntry~
- maybe add grouping/series field to episodes

***** plan

*preference*: start with manual regex, evolve toward LLM automation

*implementation design*:
- if no title pattern: episodes are direct children of the feed
- title pattern = regex with named capture groups + path template

*example configuration*:
- regex: ~^(?<series>[^0-9]+)\s*(?<episode>\d+)\s*-\s*(?<title>.+)$~
- path template: ~{series} > Episode {episode} - {title}~
- result: "A Term of Art 6 - Winston's Hollow" → "A Term of Art > Episode 6 - Winston's Hollow"

*schema additions*:
#+BEGIN_SRC typescript
interface Channel {
  // ... existing fields
  titlePatterns?: Array<{
    name: string; // "Main Episodes", "Bonus Content", etc.
    regex: string; // named capture groups
    pathTemplate: string; // interpolation template
    priority: number; // order to try patterns (lower = first)
    isActive: boolean; // can disable without deleting
  }>;
  fallbackPath?: string; // template for unmatched episodes
}

interface ChannelEntry {
  // ... existing fields
  parsedPath?: string; // computed from titlePattern
  parsedGroups?: Record<string, string>; // captured groups
  matchedPatternName?: string; // which pattern was used
}
#+END_SRC

*pattern matching logic*:
1. try patterns in priority order (lower number = higher priority)
2. first matching pattern wins
3. if no patterns match, use the fallbackPath template (e.g., "Misc > {title}")
4. if no fallbackPath, the episode stays a direct child of the feed

*example multi-pattern setup*:
- Pattern 1: "Main Episodes" - ~^(?<series>[^0-9]+)\s*(?<episode>\d+)~ → ~{series} > Episode {episode}~
- Pattern 2: "Bonus Content" - ~^Bonus:\s*(?<title>.+)~ → ~Bonus > {title}~
- Fallback: ~Misc > {title}~
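
the matching logic above fits in one function: try active patterns by priority, interpolate the path template from the named capture groups, and fall back if nothing matches. a sketch; ~parseTitle~ is a hypothetical helper, and group values are trimmed on interpolation (the series capture keeps a trailing space otherwise).

```typescript
// sketch: priority-ordered title patterns with template interpolation
interface TitlePattern {
  name: string;
  regex: string;        // named capture groups
  pathTemplate: string; // "{series} > Episode {episode} - {title}"
  priority: number;     // lower = tried first
  isActive: boolean;
}

function parseTitle(title: string, patterns: TitlePattern[], fallbackPath?: string) {
  const ordered = patterns.filter((p) => p.isActive).sort((a, b) => a.priority - b.priority);
  for (const p of ordered) {
    const m = new RegExp(p.regex).exec(title);
    if (m?.groups) {
      // replace {name} placeholders with trimmed capture values
      const path = p.pathTemplate.replace(/\{(\w+)\}/g, (_, k) => (m.groups![k] ?? "").trim());
      return { path, matchedPatternName: p.name, groups: m.groups };
    }
  }
  if (fallbackPath) return { path: fallbackPath.replace("{title}", title), matchedPatternName: null, groups: {} };
  return null; // stays a direct child of the feed
}
```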

**** scoped tags and filter-based UI evolution :ai:claude:

*generalization*: move from rigid hierarchies to a tag-based filtering system

*tag scoping*:
- feed-level tags: "Tech", "Gaming", "D&D"
- episode-level tags: from regex captures like "series:CriticalRole", "campaign:2", "type:main"
- user tags: manual additions like "favorites", "todo"

*UI as tag filtering*:
- default view: all episodes grouped by feed
- filter by ~series:CriticalRole~ → shows only CR episodes across all feeds
- filter by ~type:bonus~ → shows bonus content from all podcasts
- combine filters: ~series:CriticalRole AND type:main~ → main CR episodes only
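
combined filters are AND semantics over an episode's tags, which evaluates in a few lines. a sketch with ~matchesFilters~ as an illustrative name; the operator set follows the FilterView shape used elsewhere in these notes.

```typescript
// sketch: AND-combining tag filters over one episode's tags
interface Tag { key: string; value: string; }
interface Filter { key: string; value: string; operator: "equals" | "contains" | "not"; }

function matchesFilters(tags: Tag[], filters: Filter[]): boolean {
  return filters.every((f) => {
    const hit = tags.some(
      (t) =>
        t.key === f.key &&
        (f.operator === "contains" ? t.value.includes(f.value) : t.value === f.value)
    );
    // "not" inverts an equality hit; the other operators require a hit
    return f.operator === "not" ? !hit : hit;
  });
}
```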

*benefits*:
- no rigid hierarchy - users create their own views
- regex patterns become automated episode taggers
- same filtering system works for search, organization, queues
- tags are syncable metadata, views are client-side

*schema evolution*:
#+BEGIN_SRC typescript
interface Tag {
  scope: 'feed' | 'episode' | 'user';
  key: string; // "series", "type", "campaign"
  value: string; // "CriticalRole", "bonus", "2"
}

interface ChannelEntry {
  // ... existing
  tags: Tag[]; // includes regex-generated + manual
}

interface FilterView {
  id: string;
  name: string;
  folderPath: string; // "/Channels/Critical Role"
  filters: Array<{
    key: string;
    value: string;
    operator: 'equals' | 'contains' | 'not';
  }>;
  isDefault: boolean;
  createdAt: number;
}
#+END_SRC

**** default UI construction and feed merging :ai:claude:

*auto-generated views on subscribe*:
- subscribe to "Critical Role" → creates ~/Channels/Critical Role~ folder
- default filter view: ~feed:CriticalRole~ (shows all episodes from that feed)
- user can customize, split into sub-views, or delete

*smart view suggestions*:
- after regex patterns generate tags, suggest splitting views
- "I noticed episodes with ~series:Campaign2~ and ~series:Campaign3~ - create separate views?"
- "Create view for ~type:bonus~ episodes?"

*view management UX*:
- right-click feed → "Split by series", "Split by type"
- drag episodes between views to create manual filters
- views can be nested: ~/Channels/Critical Role/Campaign 2/Main Episodes~

*feed merging for multi-source shows*:
problem: patreon feed + main show feed for the same podcast

#+BEGIN_EXAMPLE
/Channels/
  Critical Role/
    All Episodes    # merged view: feed:CriticalRole OR feed:CriticalRolePatreon
    Main Feed       # filter: feed:CriticalRole
    Patreon Feed    # filter: feed:CriticalRolePatreon
#+END_EXAMPLE

*deduplication strategy*:
- episodes matched by ~guid~ or similar content hash
- duplicate episodes get ~source:main,patreon~ tags
- UI shows single episode with source indicators
- user can choose preferred source for playback
- play state syncs across all sources of the same episode
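
the guid-based dedup above amounts to grouping entries by guid, keeping one canonical entry, and recording every source feed. a minimal sketch; ~dedupeByGuid~ and the slim ~Entry~ shape are illustrative.

```typescript
// sketch: merge entries from multiple feeds of the same show by guid
interface Entry { id: string; feedId: string; guid: string; }
interface Merged { canonical: Entry; sources: string[]; }

function dedupeByGuid(entries: Entry[]): Merged[] {
  const byGuid = new Map<string, Merged>();
  for (const e of entries) {
    const hit = byGuid.get(e.guid);
    if (hit) hit.sources.push(e.feedId); // duplicate: just record the extra source
    else byGuid.set(e.guid, { canonical: e, sources: [e.feedId] });
  }
  return [...byGuid.values()];
}
```

play state can then attach to the canonical entry's guid, so it syncs across all sources automatically.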

*feed relationship schema*:
#+BEGIN_SRC typescript
interface FeedGroup {
  id: string;
  name: string; // "Critical Role"
  feedIds: string[]; // [mainFeedId, patreonFeedId]
  mergeStrategy: 'guid' | 'title' | 'contentHash';
  defaultView: FilterView;
}

interface ChannelEntry {
  // ... existing
  duplicateOf?: string; // points to canonical episode ID
  sources: string[]; // feed IDs where this episode appears
}
#+END_SRC
894+895+**per-view settings and state**:
896+each filter view acts like a virtual feed with its own:
897+- unread counts (episodes matching filter that haven't been played)
898+- notification settings (notify for new episodes in this view)
899+- muted state (hide notifications, mark as read automatically)
900+- auto-download preferences (download episodes that match this filter)
901+- play queue integration (add new episodes to queue)
902+903+**use cases**:
904+- mute "Bonus Content" view but keep notifications for main episodes
905+- auto-download only "Campaign 2" episodes, skip everything else
906+- separate unread counts: "5 unread in Main Episodes, 2 in Bonus"
907+- queue only certain series automatically
908+909+**schema additions**:
910+#+BEGIN_SRC typescript
911+interface FilterView {
912+ // ... existing fields
913+ settings: {
914+ notificationsEnabled: boolean;
915+ isMuted: boolean;
916+ autoDownload: boolean;
917+ autoQueue: boolean;
918+ downloadLimit?: number; // max episodes to keep
919+ };
920+ state: {
921+ unreadCount: number;
922+ lastViewedAt?: number;
923+ isCollapsed: boolean; // in sidebar
924+ };
925+}
926+#+END_SRC
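A rough sketch of how the ~settings~ and ~state~ fields might drive behavior; the flat ~feedId~ matcher here stands in for the real filter logic, and both function names are hypothetical:

```typescript
// Per-view unread counting and notification gating, simplified.
interface ViewSettings {
  notificationsEnabled: boolean;
  isMuted: boolean;
}

interface EpisodeLite {
  feedId: string;
  played: boolean;
}

function unreadCount(feedId: string, episodes: EpisodeLite[]): number {
  return episodes.filter((e) => e.feedId === feedId && !e.played).length;
}

function shouldNotify(settings: ViewSettings): boolean {
  // muted views stay silent even when notifications are enabled
  return settings.notificationsEnabled && !settings.isMuted;
}
```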
927+928+*inheritance behavior*:
929+- new filter views inherit settings from parent feed/group
930+- user can override per-view
931+- "mute all Critical Role" vs "mute only bonus episodes"
932+933+**** client-side episode caching strategy :ai:claude:
934+935+*architecture*: service worker-based transparent caching
936+937+*flow*:
938+1. audio player requests ~/audio?url={episodeUrl}~
939+2. service worker intercepts request
940+3. if present in cache (with Range header support):
941+ - serve from cache
942+4. else:
943+ - let request continue to server (immediate playback)
944+ - simultaneously start background fetch of full audio file
945+ - when complete, broadcast "episode-cached" event
946+ - audio player catches event and reloads its source → now uses cached version
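Serving cached audio with Range support (step 3) needs a range parser. A minimal single-range sketch, assuming multipart ranges are deliberately unsupported:

```typescript
// Parse a "bytes=start-end" Range header against a known file size.
// Returns null for anything unsatisfiable.
function parseRange(
  header: string,
  size: number
): { start: number; end: number } | null {
  const m = /^bytes=(\d*)-(\d*)$/.exec(header);
  if (!m || (m[1] === "" && m[2] === "")) return null;
  if (m[1] === "") {
    // suffix form "bytes=-N": the last N bytes
    const n = Number(m[2]);
    if (n === 0) return null;
    return { start: Math.max(0, size - n), end: size - 1 };
  }
  const start = Number(m[1]);
  const end = m[2] === "" ? size - 1 : Math.min(Number(m[2]), size - 1);
  return start <= end && start < size ? { start, end } : null;
}
```

The service worker would slice the cached blob with the returned offsets and answer with a 206 and matching ~Content-Range~ header.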
947+948+**benefits**:
949+- no playback interruption (streaming starts immediately)
950+- seamless transition to cached version
951+- Range header support for seeking/scrubbing
952+- transparent to audio player implementation
953+954+*implementation considerations*:
955+- cache storage limits and cleanup policies
956+- partial download resumption if interrupted
957+- cache invalidation when episode URLs change
958+- offline playback support
959+- progress tracking for background downloads
960+961+**schema additions**:
962+#+BEGIN_SRC typescript
963+interface CachedEpisode {
964+ episodeId: string;
965+ originalUrl: string;
966+ cacheKey: string; // for cache API
967+ fileSize: number;
968+ cachedAt: number;
969+ lastAccessedAt: number;
970+ downloadProgress?: number; // 0-100 for in-progress downloads
971+}
972+#+END_SRC
973+974+**service worker events**:
975+- ~episode-cache-started~ - background download began
976+- ~episode-cache-progress~ - download progress update
977+- ~episode-cache-complete~ - ready to switch to cached version
978+- ~episode-cache-error~ - download failed, stay with streaming
979+980+**background sync for proactive downloads**:
981+982+**browser support reality**:
983+- Background Sync API: good support (Chrome/Edge, limited Safari)
984+- Periodic Background Sync: very limited (Chrome only, requires PWA install)
985+- Push notifications: good support, but requires user permission
986+987+**hybrid approach**:
988+1. **foreground sync** (reliable): when app is open, check for new episodes
989+2. **background sync** (opportunistic): register sync event when app closes
990+3. **push notifications** (fallback): server pushes "new episodes available"
991+4. **manual sync** (always works): pull-to-refresh, settings toggle
992+993+**implementation strategy**:
994+#+BEGIN_SRC typescript
995+// Register background sync when app becomes hidden
996+document.addEventListener('visibilitychange', () => {
997+ if (document.hidden && 'serviceWorker' in navigator) {
998+ navigator.serviceWorker.ready.then(registration => {
999+ return registration.sync.register('download-episodes');
1000+ });
1001+ }
1002+});
1003+1004+// Service worker handles sync event
1005+self.addEventListener('sync', event => {
1006+ if (event.tag === 'download-episodes') {
1007+ event.waitUntil(syncEpisodes());
1008+ }
1009+});
1010+#+END_SRC
1011+1012+**realistic expectations**:
1013+- iOS Safari: very limited background processing
1014+- Android Chrome: decent background sync support
1015+- Desktop: mostly works
1016+- battery/data saver modes: disabled by OS
1017+1018+**fallback strategy**: rely primarily on foreground sync + push notifications, treat background sync as nice-to-have enhancement
1019+1020+**push notification sync workflow**:
1021+1022+**server-side trigger**:
1023+1. server detects new episodes during RSS refresh
1024+2. check which users are subscribed to that feed
1025+3. send push notification with episode metadata payload
1026+4. notification wakes up service worker on client
1027+1028+**service worker notification handler**:
1029+#+BEGIN_SRC typescript
1030+self.addEventListener('push', event => {
1031+ const data = event.data?.json();
1032+
1033+ if (data?.type === 'new-episodes') {
1034+ event.waitUntil(
1035+ // Start background download of new episodes
1036+ downloadNewEpisodes(data.episodes)
1037+ .then(() => {
1038+ // Show notification to user
1039+ return self.registration.showNotification('New episodes available', {
1040+ body: `${data.episodes.length} new episodes downloaded`,
1041+ icon: '/icon-192.png',
1042+ badge: '/badge-72.png',
1043+ tag: 'new-episodes',
1044+ data: { episodeIds: data.episodes.map(e => e.id) }
1045+ });
1046+ })
1047+ );
1048+ }
1049+});
1050+1051+// Handle notification click
1052+self.addEventListener('notificationclick', event => {
1053+ event.notification.close();
1054+1055+ // Open app to specific episode or feed
1056+ event.waitUntil(
1057+ clients.openWindow(`/episodes/${event.notification.data.episodeIds[0]}`)
1058+ );
1059+});
1060+#+END_SRC
1061+1062+**server push logic**:
1063+- batch notifications (don't spam for every episode)
1064+- respect user notification preferences from FilterView settings
1065+- include episode metadata in payload to avoid round-trip
1066+- throttle notifications (max 1 per feed per hour?)
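The per-feed throttle could be as simple as the following; the one-hour window comes from the note above, and ~shouldPush~ is a hypothetical name:

```typescript
// Allow at most one push per feed per hour.
const lastNotified = new Map<string, number>();
const HOUR_MS = 60 * 60 * 1000;

function shouldPush(feedId: string, nowMillis: number): boolean {
  const last = lastNotified.get(feedId);
  if (last !== undefined && nowMillis - last < HOUR_MS) return false;
  lastNotified.set(feedId, nowMillis);
  return true;
}
```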
1067+1068+**user flow**:
1069+1. new episode published → server pushes notification
1070+2. service worker downloads episode in background
1071+3. user sees "New episodes downloaded" notification
1072+4. tap notification → opens app to new episode, ready to play offline
1073+1074+*benefits*:
1075+- true background downloading without user interaction
1076+- works even when app is closed
1077+- respects per-feed notification settings
1078+1079+**push payload size constraints**:
1080+- **limit**: ~4KB (4,096 bytes) across most services
1081+- **practical limit**: ~3KB to account for service overhead
1082+- **implications for episode metadata**:
1083+1084+#+BEGIN_SRC json
1085+{
1086+ "type": "new-episodes",
1087+ "episodes": [
1088+ {
1089+ "id": "ep123",
1090+ "channelId": "ch456",
1091+ "title": "Episode Title",
1092+ "url": "https://...",
1093+ "duration": 3600,
1094+ "size": 89432112
1095+ }
1096+ ]
1097+}
1098+#+END_SRC
1099+1100+**payload optimization strategies**:
1101+- minimal episode metadata in push (id, url, basic info)
1102+- batch multiple episodes in single notification
1103+- full episode details fetched after service worker wakes up
1104+- URL shortening for long episode URLs
1105+- compress JSON payload if needed
1106+1107+**alternative for large payloads**:
1108+- push notification contains only "new episodes available" signal
1109+- service worker makes API call to get full episode list
1110+- trade-off: requires network round-trip but unlimited data
1111+1112+**logical clock sync optimization**:
1113+1114+much simpler approach using sync revisions:
1115+1116+#+BEGIN_SRC json
1117+{
1118+ "type": "sync-available",
1119+ "fromRevision": 12345,
1120+ "toRevision": 12389,
1121+ "changeCount": 8
1122+}
1123+#+END_SRC
1124+1125+**service worker sync flow**:
1126+1. push notification wakes service worker with revision range
1127+2. service worker fetches ~/sync?from=12345&to=12389~
1128+3. server returns only changes in that range (episodes, feed updates, etc)
1129+4. service worker applies changes to local dexie store
1130+5. service worker queues background downloads for new episodes
1131+6. updates local revision to 12389
1132+1133+**benefits of revision-based approach**:
1134+- tiny push payload (just revision numbers)
1135+- server can efficiently return only changes in range
1136+- automatic deduplication (revision already applied = skip)
1137+- works for any sync data (episodes, feed metadata, user settings)
1138+- handles offline gaps gracefully (fetch missing revision ranges)
1139+1140+**sync API response**:
1141+#+BEGIN_SRC typescript
1142+interface SyncResponse {
1143+ fromRevision: number;
1144+ toRevision: number;
1145+ changes: Array<{
1146+ type: 'episode' | 'channel' | 'subscription';
1147+ operation: 'create' | 'update' | 'delete';
1148+ data: any;
1149+ revision: number;
1150+ }>;
1151+}
1152+#+END_SRC
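Applying a ~SyncResponse~ idempotently might look like this sketch: changes at or below the local revision are skipped, so re-delivered pushes are harmless. It assumes the server returns changes sorted by revision; the ~Map~ store stands in for the dexie tables:

```typescript
// Apply sync changes in revision order, skipping anything already applied.
interface SyncChange {
  type: string;
  operation: "create" | "update" | "delete";
  data: { id: string };
  revision: number;
}

function applyChanges(
  store: Map<string, unknown>,
  localRevision: number,
  changes: SyncChange[]
): number {
  let rev = localRevision;
  for (const c of changes) {
    if (c.revision <= rev) continue; // already applied: automatic deduplication
    if (c.operation === "delete") store.delete(c.data.id);
    else store.set(c.data.id, c.data);
    rev = c.revision;
  }
  return rev; // new local revision to persist
}
```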
1153+1154+**integration with episode downloads**:
1155+- service worker processes sync changes
1156+- identifies new episodes that match user's auto-download filters
1157+- queues those for background cache fetching
1158+- much more efficient than sending episode metadata in push payload
1159+1160+**service worker processing time constraints**:
1161+1162+**hard limits**:
1163+- **30 seconds idle timeout**: service worker terminates after 30s of inactivity
1164+- **5 minutes event processing**: single event/request must complete within 5 minutes
1165+- **30 seconds fetch timeout**: individual network requests timeout after 30s
1166+- **notification requirement**: push events MUST display notification before promise settles
1167+1168+**practical implications**:
1169+- sync API call (~/sync?from=X&to=Y~) must complete within 30s
1170+- large episode downloads must be queued, not started immediately in push handler
1171+- use ~event.waitUntil()~ to keep service worker alive during processing
1172+- break large operations into smaller chunks
1173+1174+**recommended push event flow**:
1175+#+BEGIN_SRC typescript
1176+self.addEventListener('push', event => {
1177+ const data = event.data?.json();
1178+1179+ event.waitUntil(
1180+ // Must complete within 5 minutes total
1181+ handlePushSync(data)
1182+ .then(() => {
1183+ // Required: show notification before promise settles
1184+ return self.registration.showNotification('Episodes synced');
1185+ })
1186+ );
1187+});
1188+1189+async function handlePushSync(data) {
1190+ // 1. Quick sync API call (< 30s)
1191+ const changes = await (await fetch(`/sync?from=${data.fromRevision}&to=${data.toRevision}`)).json();
1192+1193+ // 2. Apply changes to dexie store (fast, local)
1194+ await applyChangesToStore(changes);
1195+1196+ // 3. Queue episode downloads for later (don't start here)
1197+ await queueEpisodeDownloads(changes.newEpisodes);
1198+1199+ // Total time: < 5 minutes, preferably < 30s
1200+}
1201+#+END_SRC
1202+1203+*download strategy*: use push event for sync + queuing, separate background tasks for actual downloads
1204+1205+*background fetch API for large downloads*:
1206+1207+*progressive enhancement approach*:
1208+#+BEGIN_SRC typescript
1209+async function queueEpisodeDownloads(episodes) {
1210+ for (const episode of episodes) {
1211+ if ('serviceWorker' in navigator && 'BackgroundFetchManager' in self) {
1212+ // Chrome/Edge: use Background Fetch API for true background downloading
1213+ await navigator.serviceWorker.ready.then(registration => {
1214+ return registration.backgroundFetch.fetch(
1215+ `episode-${episode.id}`,
1216+ episode.url,
1217+ {
1218+ icons: [{ src: '/icon-256.png', sizes: '256x256', type: 'image/png' }],
1219+ title: `Downloading: ${episode.title}`,
1220+ downloadTotal: episode.fileSize
1221+ }
1222+ );
1223+ });
1224+ } else {
1225+ // Fallback: queue for reactive download (download while streaming)
1226+ await queueReactiveDownload(episode);
1227+ }
1228+ }
1229+}
1230+1231+// Handle background fetch completion
1232+self.addEventListener('backgroundfetchsuccess', event => {
1233+ if (event.registration.id.startsWith('episode-')) {
1234+ event.waitUntil(handleEpisodeDownloadComplete(event));
1235+ }
1236+});
1237+#+END_SRC
1238+1239+*browser support reality*:
1240+- *Chrome/Edge*: Background Fetch API supported
1241+- *Firefox/Safari*: not supported, fallback to reactive caching
1242+- *mobile*: varies by platform and browser
1243+1244+*benefits when available*:
1245+- true background downloading (survives app close, browser close)
1246+- built-in download progress UI
1247+- automatic retry on network failure
1248+- no service worker time limits during download
1249+1250+*graceful degradation*:
1251+- detect support, use when available
1252+- fallback to reactive caching (download while streaming)
1253+- user gets best experience possible on their platform
1254+1255+*** research todos :ai:claude:
1256+1257+high-level unanswered questions from architecture brainstorming:
1258+1259+**** sync and data management
1260+***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
1261+***** TODO webrtc p2p sync implementation patterns and reliability
1262+***** TODO conflict resolution strategies for device-specific data in distributed sync
1263+***** TODO content-addressed deduplication algorithms for rss/podcast content
1264+**** client-side storage and caching
1265+***** TODO opfs storage limits and cleanup strategies for client-side caching
1266+***** TODO practical background fetch api limits and edge cases for podcast downloads
1267+**** automation and intelligence
1268+***** TODO llm-based regex generation for episode title parsing automation
1269+***** TODO push notification subscription management and realm authentication
1270+**** platform and browser capabilities
1271+***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
1272+***** TODO progressive web app installation and platform-specific behaviors
1273+1274+* webtorrent brainstorming 6/16 :ai:claude:
1275+1276+** WebTorrent + Event Log CRDT Architecture
1277+1278+*** Core Concept Split
1279+We identified two fundamentally different types of data that need different sync strategies:
1280+1281+**** 1. Dynamic Metadata (Event Log CRDT)
1282+- **Data**: Play state, scroll position, settings, subscriptions
1283+- **Characteristics**: Frequently changing, small, device-specific
1284+- **Solution**: Event log with Hybrid Logical Clocks (HLC)
1285+- **Sync**: Merkle tree efficient diff + P2P exchange via realm
1286+1287+**** 2. Static Content (WebTorrent)
1288+- **Data**: RSS feeds, podcast episodes (audio files)
1289+- **Characteristics**: Immutable, large, content-addressable
1290+- **Solution**: WebTorrent with infohash references
1291+- **Storage**: IndexedDB chunk store (idb-chunk-store npm package)
1292+1293+*** Event Log CRDT Design
1294+1295+**** Hybrid Logical Clock (HLC)
1296+Based on James Long's crdt-example-app implementation:
1297+#+BEGIN_SRC typescript
1298+interface HLC {
1299+ millis: number; // physical time
1300+ counter: number; // logical counter (0-65535)
1301+ node: string; // device identity ID
1302+}
1303+1304+interface SyncEvent {
1305+ timestamp: HLC;
1306+ type: 'subscribe' | 'unsubscribe' | 'markPlayed' | 'updatePosition' | ...
1307+ payload: any;
1308+}
1309+#+END_SRC
1310+1311+**Benefits**:
1312+- Causality preserved even with clock drift
1313+- Compact representation (vs full vector clocks)
1314+- Total ordering via (millis, counter, node) comparison
1315+- No merge conflicts - just union of events
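The HLC update rules from crdt-example-app, roughly; this is a simplified sketch (the real implementation also guards against excessive clock drift and counter overflow):

```typescript
// Hybrid Logical Clock: advance past both the wall clock and any observed timestamp.
interface HLC {
  millis: number;  // physical time
  counter: number; // logical counter
  node: string;    // device identity ID
}

// On a local event: bump past the wall clock, or increment the counter
// if physical time hasn't advanced.
function sendHLC(clock: HLC, nowMillis: number): HLC {
  const millis = Math.max(clock.millis, nowMillis);
  const counter = millis === clock.millis ? clock.counter + 1 : 0;
  return { millis, counter, node: clock.node };
}

// On receiving a remote event: merge with the remote timestamp too.
function recvHLC(clock: HLC, remote: HLC, nowMillis: number): HLC {
  const millis = Math.max(clock.millis, remote.millis, nowMillis);
  let counter = 0;
  if (millis === clock.millis && millis === remote.millis) {
    counter = Math.max(clock.counter, remote.counter) + 1;
  } else if (millis === clock.millis) {
    counter = clock.counter + 1;
  } else if (millis === remote.millis) {
    counter = remote.counter + 1;
  }
  return { millis, counter, node: clock.node };
}
```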
1316+1317+**** Merkle Tree Sync
1318+Efficient sync using merkle trees over time ranges:
1319+1320+#+BEGIN_SRC typescript
1321+interface RangeMerkleNode {
1322+ startTime: HLC;
1323+ endTime: HLC;
1324+ hash: string;
1325+ eventCount: number;
1326+}
1327+#+END_SRC
1328+1329+**Sync Protocol**:
1330+1. Exchange merkle roots
1331+2. If different, drill down to find divergent ranges
1332+3. Exchange only missing events
1333+4. Apply in HLC order
1334+1335+**Key insight**: No merge conflicts because events are immutable and ordered by HLC
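A toy version of the range diff: bucket events by time window, hash each bucket, and exchange only the buckets whose hashes differ. The string hash below is a non-cryptographic stand-in, and a real implementation would drill down a tree of ranges rather than compare one flat level:

```typescript
// Order-independent bucket hash (sorted before hashing so peers agree).
function hashBucket(eventIds: string[]): string {
  let h = 0;
  for (const id of [...eventIds].sort()) {
    for (let i = 0; i < id.length; i++) h = (h * 31 + id.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// Compare my buckets against a peer's bucket hashes; return buckets to exchange.
function divergentBuckets(
  mine: Map<number, string[]>,
  peerHashes: Map<number, string>
): number[] {
  const out = new Set<number>();
  for (const [bucket, ids] of mine) {
    if (peerHashes.get(bucket) !== hashBucket(ids)) out.add(bucket);
  }
  for (const bucket of peerHashes.keys()) {
    if (!mine.has(bucket)) out.add(bucket); // peer has events I don't
  }
  return [...out].sort((x, y) => x - y);
}
```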
1336+1337+**** Progressive Compaction
1338+Use idle time to compact old events:
1339+- Recent (< 5 min): Individual events for active sync
1340+- Hourly chunks: After 5 minutes
1341+- Daily chunks: After 24 hours
1342+- Monthly chunks: After 30 days
1343+1344+Benefits:
1345+- Fast recent sync
1346+- Efficient storage of history
1347+- Old chunks can move to OPFS as blobs
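The tier thresholds above, as a sketch (~tierFor~ is a hypothetical helper; compaction itself would group events per tier and re-serialize them):

```typescript
// Map an event's age to the compaction tier described in the notes
// (5 min / 24 h / 30 days).
type Tier = "individual" | "hourly" | "daily" | "monthly";

function tierFor(eventMillis: number, nowMillis: number): Tier {
  const MIN = 60_000;
  const HOUR = 3_600_000;
  const DAY = 86_400_000;
  const age = nowMillis - eventMillis;
  if (age < 5 * MIN) return "individual";
  if (age < 24 * HOUR) return "hourly";
  if (age < 30 * DAY) return "daily";
  return "monthly";
}
```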
1348+1349+*** WebTorrent Integration
1350+1351+**** Content Flow
1352+1. **CORS-friendly feeds**:
1353+ - Browser fetches directly
1354+ - Creates torrent with original URL as webseed
1355+ - Broadcasts infohash to realm
1356+1357+2. **CORS-blocked feeds**:
1358+ - Server fetches and hashes
1359+ - Returns infohash (server doesn't store content)
1360+ - Client uses WebTorrent with original URL as webseed
1361+1362+**** Realm as Private Tracker
1363+- Realm members announce infohashes they have
1364+- No need for DHT or public trackers
1365+- Existing WebRTC signaling used for peer discovery
1366+- Private swarm for each realm
1367+1368+**** Storage via Chunk Store
869+Use ~idb-chunk-store~ (or similar) for persistence:
1370+- WebTorrent handles chunking/verification
1371+- IndexedDB provides persistence across sessions
1372+- Abstract-chunk-store interface allows swapping implementations
1373+1374+*** Bootstrap & History Sharing
1375+1376+**** History Snapshots as Torrents
1377+Serialize event history into content-addressed chunks:
1378+1379+#+BEGIN_SRC typescript
1380+interface HistorySnapshot {
1381+ period: "2024-05";
1382+ events: SyncEvent[];
1383+ merkleRoot: string;
1384+ deviceStates: Record<string, DeviceState>;
1385+}
1386+1387+// Share via WebTorrent
1388+const blob = await serializeSnapshot(events);
1389+const infohash = await createTorrent(blob);
1390+realm.broadcast({ type: "historySnapshot", period, infohash });
1391+#+END_SRC
1392+1393+**** Materialized State Snapshots
1394+Using dexie-export-import for database snapshots:
1395+1396+#+BEGIN_SRC typescript
1397+const dbBlob = await exportDB(db, {
1398+ tables: ['channels', 'channelEntries'],
1399+ filter: (table, value) => !isDeviceSpecific(table, value)
1400+});
1401+1402+const infohash = await createTorrent(dbBlob);
1403+#+END_SRC
1404+1405+**** New Device Bootstrap
1406+1. Download latest DB snapshot → Instant UI
1407+2. Download recent events → Apply updates
1408+3. Background: fetch historical event logs
1409+4. Result: Fast startup with complete history
1410+1411+*** Implementation Benefits
1412+1413+1. **Privacy**: No server sees listening history
1414+2. **Offline-first**: Everything works locally
1415+3. **Efficient sync**: Only exchange missing data
1416+4. **P2P content**: Reduce server bandwidth
1417+5. **Scalable**: Torrents for bulk data transfer
1418+6. **Verifiable**: Merkle trees ensure consistency
1419+1420+*** Next Steps
1421+- [ ] Implement HLC timestamps
1422+- [ ] Build merkle tree sync protocol
1423+- [ ] Integrate WebTorrent with realm signaling
1424+- [ ] Create history snapshot system
1425+- [ ] Test cross-device sync scenarios
1426+1427+** Additional Architecture Insights
1428+1429+*** Unified Infohash Approach
1430+Instead of having separate hashes for merkle tree and WebTorrent, use infohashes throughout:
1431+1432+**** Hierarchical Infohash Structure
1433+#+BEGIN_SRC typescript
1434+// Leaf level: individual files
1435+const episode1Hash = await createTorrent(episode1.mp3);
1436+const feedXmlHash = await createTorrent(feed.xml);
1437+1438+// Directory level: multi-file torrent
1439+const feedTorrent = await createTorrent({
1440+ name: 'example.com.rss',
1441+ files: [
1442+ { path: 'rss.xml', infohash: feedXmlHash },
1443+ { path: 'episode-1.mp3', infohash: episode1Hash }
1444+ ]
1445+});
1446+1447+// Root level: torrent of feed torrents
1448+const rootTorrent = await createTorrent({
1449+ name: 'feeds',
1450+ folders: [
1451+ { path: 'example.com.rss', infohash: feedTorrent.infoHash }
1452+ ]
1453+});
1454+#+END_SRC
1455+1456+Benefits:
1457+- Single hash type throughout system
1458+- Progressive loading (directory structure first, then files)
1459+- Natural deduplication
1460+- WebTorrent native sharing of folder structures
1461+1462+*** Long-term Event Log Scaling
1463+1464+**** Checkpoint + Delta Pattern
1465+For handling millions of events, use periodic checkpoints:
1466+1467+#+BEGIN_SRC typescript
1468+interface EventCheckpoint {
1469+ hlc: HLC;
1470+ stateSnapshot: {
1471+ subscriptions: Channel[];
1472+ playStates: PlayRecord[];
1473+ settings: Settings;
1474+ };
1475+ eventCount: number;
1476+ infohash: string; // torrent of this checkpoint
1477+}
1478+1479+// Every 10k events or monthly
1480+async function createCheckpoint(): Promise<EventCheckpoint> {
1481+ const currentHLC = getLatestEventHLC();
1482+1483+ // Export materialized state using dexie-export-import
1484+ const dbBlob = await exportDB(db, {
1485+ filter: (table, value) => {
1486+ return !['activeSyncs', 'tempData'].includes(table);
1487+ }
1488+ });
1489+1490+ const infohash = await createTorrent(dbBlob);
1491+ return { hlc: currentHLC, dbExport: dbBlob, infohash };
1492+}
1493+#+END_SRC
1494+1495+**** Bootstrap Flow with Checkpoints
1496+1. New device downloads latest checkpoint via WebTorrent
1497+2. Imports directly to IndexedDB: ~await importDB(checkpoint.blob)~
1498+3. Requests only recent events since checkpoint
1499+4. Applies recent events to catch up
1500+1501+Benefits:
1502+- Fast bootstrap (one checkpoint instead of million events)
1503+- No double materialization (IndexedDB is already materialized state)
1504+- P2P distribution of checkpoints
1505+- Clear version migration path
1506+1507+*** Sync State Management
1508+1509+**** Catching Up vs Live Events
1510+#+BEGIN_SRC typescript
1511+interface SyncState {
1512+ localHLC: HLC;
1513+ remoteHLC: HLC;
1514+ mode: 'catching-up' | 'live';
1515+}
1516+1517+// Separate handlers for historical vs live events
1518+async function replayHistoricalEvents(from: HLC, to: HLC) {
1519+ const events = await fetchEvents(from, to);
1520+1521+ // Process in batches without UI updates
1522+ await db.transaction('rw', db.tables, async () => {
1523+ for (const batch of chunks(events, 1000)) {
1524+ await Promise.all(batch.map(applyEventSilently));
1525+ }
1526+ });
1527+1528+ // One UI update at the end
1529+ notifyUI('Sync complete', { newEpisodes: 47 });
1530+}
1531+1532+function handleLiveEvent(event: SyncEvent) {
1533+ // Real-time event - update UI immediately
1534+ applyEvent(event);
1535+ if (event.type === 'newEpisode') {
1536+ showNotification(`New episode: ${event.title}`);
1537+ }
1538+}
1539+#+END_SRC
1540+1541+**** HLC Comparison for Ordering
1542+#+BEGIN_SRC typescript
1543+function compareHLC(a: HLC, b: HLC): number {
1544+ if (a.millis !== b.millis) return a.millis - b.millis;
1545+ if (a.counter !== b.counter) return a.counter - b.counter;
1546+ return a.node.localeCompare(b.node);
1547+}
1548+1549+// Determine if caught up
1550+function isCaughtUp(myHLC: HLC, peerHLC: HLC): boolean {
1551+ return compareHLC(myHLC, peerHLC) >= 0;
1552+}
1553+#+END_SRC
1554+1555+*** Handling Out-of-Order Events
1556+1557+**** Idempotent Reducers (No Replay Needed)
1558+Design reducers to handle events arriving out of order:
1559+1560+#+BEGIN_SRC typescript
1561+// HLC-aware reducer that handles out-of-order events
1562+function reducePlayPosition(state, event) {
1563+ if (event.type === 'updatePosition') {
1564+ const existing = state.positions[event.episodeId];
1565+ // Only update if this event is newer
1566+ if (!existing || compareHLC(event.hlc, existing.hlc) > 0) {
1567+ state.positions[event.episodeId] = {
1568+ position: event.position,
1569+ hlc: event.hlc // Track which event set this
1570+ };
1571+ }
1572+ }
1573+}
1574+#+END_SRC
1575+1576+**** Example: Offline Device Rejoining
1577+#+BEGIN_SRC typescript
1578+// Device A offline for a week, comes back with old events
1579+const deviceAEvents = [
1580+  { hlc: "1000:0:A", type: "markPlayed", episode: "ep1" },
1581+  { hlc: "1100:0:A", type: "updatePosition", episode: "ep1", position: 500 }
1582+];
1583+
1584+// Device B already has a newer event
1585+const deviceBEvents = [
1586+  { hlc: "1050:0:B", type: "updatePosition", episode: "ep1", position: 1000 }
1587+];
1588+
1589+// Smart reducer produces correct final state
1590+const finalState = {
1591+  ep1: {
1592+    played: true,          // from 1000:0:A
1593+    position: 1000,        // from 1050:0:B (newer HLC wins)
1594+    lastPositionHLC: "1050:0:B"
1595+  }
1596+};
1597+#+END_SRC
1598+1599+Key principles:
1600+- Store HLC with state changes
1601+- Use "last write wins" with HLC comparison
1602+- Make operations commutative when possible
1603+- No need for full replay when inserting old events
readme-devlog.org
···1#+PROPERTY: COOKIE_DATA recursive
2#+STARTUP: overview
34-most of this is old, I need to rework it
5-6-* design
7-8-** frontend (packages/app)
9-- http://localhost:7891
10-- proxies ~/api~ and ~/sync~ to the backend in development
11-- uses Dexie for local storage with sync plugin
12-- custom sync replication implementation using PeerJS through the signalling server
13-14-** backend (packages/server)
15-- http://localhost:7890
16-- serves ~/dist~ if the directory is present (see ~dist~ script)
17-- serves ~/api~ for RSS caching proxy
18- - file-based routing under the api directory
19-- serves ~/sync~ which is a ~peerjs~ signalling server
20-21-** sync
22-- each client keeps the full data set
23-- dexie sync and observable let us stream change sets
24-- we can publish the "latest" to all peers
25-- on first pull, if not the first client, we can request a dump out of band
26-27-*** rss feed data
28-- do we want to backup feed data?
29- - conceptually, this should be refetchable
30- - but feeds go away, and some will only show recent stories
31- - so yes, we'll need this
32- - but server side, we can dedupe
33- - content-addressed server-side cache?
34-35-- server side does RSS pulling
36- - can feeds be marked private, such that they won't be pulled through the proxy?
37- - but then we require everything to be fetchable via cors
38- - client configured proxy settings?
39-40-*** peer connection
41-- on startup, check for current realm-id and key pair
42-- if not present, ask to login or start new
43- - if login, run through the [[* pairing]] process
44- - if start new, run through the [[* registration]] process
45-- use keypair to authenticate to server
46- - response includes list of active peers to connect
47-- clients negotiate sync from there
48-- an identity is a keypair and a realm
49-50-- realm is uuid
51- - realm on the server is the socket connection for peer discovery
52- - keeps a list of verified public keys
53- - and manages the /current/ ~public-key->peer ids~ mapping
54- - realm on the client side is first piece of info required for sync
55- - when connecting to the signalling server, you present a realm, and a signed public key
56- - server accepts/rejects based on signature and current verified keys
57-58-- a new keypair can create a realm
59-60-- a new keypair can double sign an invitation
61- - invite = ~{ realm:, nonce:, not_before:, not_after:, authorizer: }~, signed with verified key
62- - exchanging an invite = ~{ invite: }~, signed with my key
63-64-- on startup
65- - start stand-alone (no syncing required, usually the case on first-run)
66- - generate a keypair
67- - want server backup?
68- - sign a "setup" message with new keypair and send to the server
69- - server responds with a new realm, that this keypair is already verified for
70- - move along
71- - exchange invite to sync to other devices
72- - generate a keypair
73- - sign the exchange message with the invite and send to the server
74- - server verifies the invite
75- - adds the new public key to the peer list and publishes downstream
76- - move along
77-78-***** standalone
79-in this mode, there is no syncing. this is the most likely first-time run option.
80-81-- generate a keypair on startup, so we have a stable fingerprint in the future
82-- done
83-84-***** pairing
85-in this mode, there is syncing to a named realm, but not necessarily server resources consumed
86-we don't need an email, since the server is just doing signalling and peer management
87-88-- generate an invite from an existing verified peer
89- - ~{ realm:, not_before:, not_after:, inviter: peer.public_key }~
90- - sign that invitation from the existing verified peer
91-92-- standalone -> paired
93- - get the invitation somehow (QR code?)
94- - sign an invite exchange with the standalone's public key
95- - send to server
96- - server verifies the invite
97- - adds the new public key to the peer list and publishes downstream
98-99-***** server backup
100-in this mode, there is syncing to a named realm by email.
101-102-goal of server backup mode is that we can go from email->fully working client with latest data without having to have any clients left around that could participate in the sync.
103-104-- generate a keypair on startup
105-- sign a registration message sent to the server
106- - send a verification email
107- - if email/realm already exists, this is authorization
108- - if not, it's email validation
109- - server starts a realm and associates the public key
110- - server acts as a peer for the realm, and stores private data
111-112-- since dexie is publishing change sets, we should be able to just store deltas
113-- but we'll need to store _all_ deltas, unless we're materializing on the server side too
114- - should we use an indexeddb shim so we can import/export from the server for a clean start?
115- - how much materialization does the server need?
116-117-* ai instructions
118-- when writing to the devlog, add tags to your entries specifying ~:ai:~ and what tool did it.
119-- false starts and prototypes are in ~./devlog/~
120-121-* notes and decision record [1/11]
122-** architecture design (may 28-29) :ai:claude:
124-detailed notes are in [[./devlog/may-29.org]]
125-key decisions and system design:
126-127-*** sync model
128-- device-specific records for playback state/queues to avoid conflicts
129-- content-addressed server cache with deduplication
130-- dual-JWT invitation flow for secure realm joining
131-132-*** data structures
133-- tag-based filtering system instead of rigid hierarchies
134-- regex patterns for episode title parsing and organization
135-- service worker caching with background download support
136-137-*** core schemas
138-**** client (dexie)
139-- Channel/ChannelEntry for RSS feeds and episodes
140-- PlayRecord/QueueItem scoped by deviceId
141-- FilterView for virtual feed organization
142-143-**** server (drizzle)
144-- ContentStore for deduplicated content by hash
145-- Realm/PeerConnection for sync authorization
146-- HttpCache with health tracking and TTL
147-148-*** push sync strategy
149-- revision-based sync (just send revision ranges in push notifications)
150-- background fetch API for large downloads where supported
151-- graceful degradation to reactive caching

*** research todos :ai:claude:

**** sync and data management
***** DONE identity and signature management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors

# Local Variables:
# org-hierarchical-todo-statistics: nil
# org-checkbox-hierarchical-statistics: nil
# End:

** <2025-05-28 Wed>
getting everything set up

the biggest open question I have is what sort of privacy/encryption guarantee I need. I want the server to be able to do things like cache and store feed data long-term.

Is "if you want full privacy, self-host" valid?

*** possibilities

- fully PWA
  - CON: cors, which would require a proxy anyway
  - CON: audio analysis, llm based stuff for categorization, etc. won't work
  - PRO: private as all get out
  - can still do WebRTC p2p sync for resiliency
  - can still do server backups, if sync stream is encrypted, but no compaction would be available
  - could do _explicit_ server backups as dump files

- self hostable
  - PRO: can do bunches of private stuff on the server, because if you don't want me to see it, do it elsewhere
  - CON: hard for folk to use

*** brainstorm :ai:claude:
**** sync conflict resolution design discussion :ai:claude:

discussed the sync architecture and dexie conflict handling:

*dexie syncable limitations*:
- logical clocks handle causally-related changes well
- basic timestamp-based conflict resolution for concurrent updates
- last-writer-wins for same field conflicts
- no sophisticated CRDT or vector clock support

*solutions for podcast-specific conflicts*:

- play records: device-specific approach
  - store separate ~play_records~ per ~device_id~
  - each record: ~{ episode_id, device_id, position, completed, timestamp }~
  - UI handles conflict resolution with "continue from X device?" prompts
  - avoids arbitrary timestamp wins, gives users control

- subscription trees
  - store ~parent_path~ as single string field ("/Tech/Programming")
  - simpler than managing folder membership tables
  - conflicts still possible but contained to single field
  - could store move operations as events for richer resolution

*other sync considerations*:
- settings/preferences: distinguish device-local vs global
- bulk operations: "mark all played" can create duplicate operations
- metadata updates: server RSS updates vs local renames
- temporal ordering: recently played lists, queue reordering
- storage limits: cleanup operations conflicting across devices
- feed state: refresh timestamps, error states

*approach*: prefer "events not state" pattern and device-specific records where semantic conflicts are likely
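
to make the "events not state" pattern concrete, a minimal sketch of replaying a bulk operation as an event (the field names here are assumptions; the point is that applying the event is idempotent, so devices converge regardless of delivery order):

#+BEGIN_SRC typescript
interface BulkMarkPlayedEvent {
  id: string;
  channelId: string;
  beforeTimestamp: number; // mark everything published at or before this
}

interface EpisodeState {
  entryId: string;
  channelId: string;
  publishedAt: number;
  completed: boolean;
}

// Replaying the event is idempotent: applying it once or twice yields the
// same state, so two devices that both receive it converge without conflict.
function applyBulkMarkPlayed(
  states: EpisodeState[],
  event: BulkMarkPlayedEvent
): EpisodeState[] {
  return states.map(s =>
    s.channelId === event.channelId && s.publishedAt <= event.beforeTimestamp
      ? { ...s, completed: true }
      : s
  );
}
#+END_SRC

syncing the small event record instead of N mutated rows also sidesteps the duplicate-operation problem from "mark all played".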

**** data model brainstorm :ai:claude:

core entities designed with sync in mind:

***** ~Feed~ :: RSS/podcast subscription
- ~parent_path~ field for folder structure (eg. ~/Tech/Programming~)
- ~is_private~ flag to skip server proxy
- ~refresh_interval~ for custom update frequencies

***** ~Episode~ :: individual podcast episodes
- standard RSS metadata (guid, title, description, media url)
- duration and file info for playback

***** ~PlayRecord~ :: device-specific playback state
- separate record per ~device_id~ to avoid timestamp conflicts
- position, completed status, playback speed
- UI can prompt "continue from X device?" for resolution

***** ~QueueItem~ :: device-specific episode queue
- ordered list with position field
- ~device_id~ scoped to avoid queue conflicts

***** ~Subscription~ :: feed membership settings
- can be global or device-specific
- auto-download preferences per device

***** ~Settings~ :: split global vs device-local
- theme, default speed = global
- download path, audio device = device-local

***** Event tables for complex operations:
- ~FeedMoveEvent~ for folder reorganization
- ~BulkMarkPlayedEvent~ for "mark all read" operations
- better conflict resolution than direct state updates

***** sync considerations
- device identity established on first run
- dexie syncable handles basic timestamp conflicts
- prefer device-scoped records for semantic conflicts
- event-driven pattern for bulk operations

**** schema evolution from previous iteration :ai:claude:

reviewed existing schema from tmp/feed.ts - well designed foundation:

***** keep from original
- Channel/ChannelEntry naming and structure
- ~refreshHP~ adaptive refresh system (much better than simple intervals)
- rich podcast metadata (people, tags, enclosure, podcast object)
- HTTP caching with etag/status tracking
- epoch millisecond timestamps
- ~hashId()~ approach for entry IDs

***** add for multi-device sync
- ~PlayState~ table (device-scoped position/completion)
- Subscription table (with ~parentPath~ for folders, device-scoped settings)
- ~QueueItem~ table (device-scoped episode queues)
- Device table (identity management)

***** migration considerations
- existing Channel/ChannelEntry can be preserved
- new tables are additive
- ~fetchAndUpsert~ method works well with server proxy architecture
- dexie sync vs rxdb - need to evaluate change tracking capabilities

**** content-addressed caching for offline resilience :ai:claude:

designed caching system for when upstream feeds fail/disappear, building on existing cache-schema.ts:

***** server-side schema evolution (drizzle sqlite):
- keep existing ~httpCacheTable~ design (health tracking, http headers, ttl)
- add ~contentHash~ field pointing to deduplicated content
- new ~contentStoreTable~: deduplicated blobs by sha256 hash
- new ~contentHistoryTable~: url -> contentHash timeline with isLatest flag
- reference counting for garbage collection

***** client-side OPFS storage
- ~/cache/content/{contentHash}.xml~ for raw feeds
- ~/cache/media/{contentHash}.mp3~ for podcast episodes
- ~LocalCacheEntry~ metadata tracks expiration and offline-only flags
- maintains last N versions per feed for historical access
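
a sketch of deriving those content-addressed file names with Web Crypto SHA-256 (the helper name is an assumption; the path layout follows the ~/cache/...~ convention above):

#+BEGIN_SRC typescript
// Derive the OPFS path for a blob from its own bytes: identical content
// always maps to the same file, which is what makes deduplication work.
async function contentCachePath(
  content: Uint8Array,
  kind: "content" | "media",
  ext: string
): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", content);
  const hash = Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, "0"))
    .join("");
  return `/cache/${kind}/${hash}.${ext}`;
}
#+END_SRC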

***** fetch strategy & fallback
1. check local OPFS cache first (fastest)
2. try server proxy ~/api/feed?url={feedUrl}~ (deduplicated)
3. server checks ~contentHistory~, serves latest or fetches upstream
4. server returns ~{contentHash, content, cached: boolean}~
5. client stores with content hash as filename
6. emergency mode: serve stale content when upstream fails

- preserves existing health tracking and HTTP caching logic
- popular feeds cached once on server, many clients benefit
- bandwidth savings via content hash comparison
- historical feed state preservation (feeds disappear!)
- true offline operation after initial sync
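
the tiered lookup can be sketched as a generic fallback chain (the ~Fetcher~ abstraction is an assumption; in practice the tiers would be the OPFS cache, the server proxy, and stale/emergency content):

#+BEGIN_SRC typescript
type Fetcher = (url: string) => Promise<string | null>;

// Walk the tiers in order; the first one that yields content wins.
// A tier signals "miss" by returning null and "failure" by throwing --
// both fall through to the next tier.
async function fetchWithFallback(
  url: string,
  tiers: Fetcher[]
): Promise<string | null> {
  for (const tier of tiers) {
    try {
      const content = await tier(url);
      if (content !== null) return content;
    } catch {
      // e.g. network error talking to the proxy; try the next tier
    }
  }
  return null; // full miss: nothing cached anywhere, upstream dead
}
#+END_SRC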

** <2025-05-29 Thu> :ai:claude:
e2e encryption and invitation flow design

worked through the crypto and invitation architecture. key decisions:

*** keypair strategy
- use jwk format for interoperability (server stores public keys)
- ed25519 for signing, separate x25519 for encryption if needed
- zustand lazy initialization pattern: ~ensureKeypair()~ on first use
- store private jwk in persisted zustand state

*** invitation flow: dual-jwt approach
solved the chicken-and-egg problem of sharing encryption keys securely.

**** qr code contains two signed jwts:
1. invitation token: ~{iss: inviter_fingerprint, sub: invitation_id, purpose: "realm_invite"}~
2. encryption key token: ~{iss: inviter_fingerprint, ephemeral_private: base64_key, purpose: "ephemeral_key"}~

**** exchange process:
1. invitee posts jwt1 + their public keys to ~/invitations~
2. server verifies jwt1 signature against realm members
3. if valid: adds invitee to realm, returns ~{realm_id, realm_members, encrypted_realm_key}~
4. invitee verifies jwt2 signature against returned realm members
5. invitee extracts ephemeral private key, decrypts realm encryption key

**** security properties:
- server never has decryption capability (missing ephemeral private key)
- both jwts must be signed by verified realm member
- if first exchange fails, second jwt is cryptographically worthless
- atomic operation: identity added only if invitation valid
- built-in expiration and tamper detection via jwt standard

**** considered alternatives:
- raw ephemeral keys in qr: simpler but no authenticity
- ecdh key agreement: chicken-and-egg problem with public key exchange
- server escrow: good but missing authentication layer
- password-based: requires secure out-of-band sharing

the dual-jwt approach provides proper authenticated invitations while maintaining e2e encryption properties.

**** refined dual-jwt with ephemeral signing
simplified the approach by using the ephemeral key for the second jwt signature:

**setup**:
1. inviter generates ephemeral keypair
2. encrypts realm key with ephemeral private key
3. posts to server: ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~

**qr code contains**:
#+BEGIN_SRC json
// JWT 1: signed with inviter's realm signing key
{
  "realm_id": "uuid",
  "invitation_id": "uuid",
  "iss": "inviter_fingerprint"
}

// JWT 2: signed with ephemeral private key
{
  "ephemeral_private": "base64_key",
  "invitation_id": "uuid"
}
#+END_SRC

**exchange flow**:
1. submit jwt1 → server verifies against realm members → returns ~{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}~
2. verify jwt2 signature using ~ephemeral_public~ from server response
3. extract ~ephemeral_private~ from jwt2, decrypt realm key

**benefits over previous version**:
- no premature key disclosure (invitee keys shared via normal webrtc peering)
- self-contained verification (ephemeral public key verifies jwt2)
- cleaner separation of realm auth vs encryption key distribution
- simpler flow (no need to return realm member list)

**crypto verification principle**: digital signatures work as sign-with-private/verify-with-public, while encryption works as encrypt-with-public/decrypt-with-private. jwt2 verification uses signature verification, not decryption.

**invitation flow diagram**:
#+BEGIN_SRC mermaid
sequenceDiagram
  participant I as Inviter
  participant S as Server
  participant E as Invitee

  Note over I: Generate ephemeral keypair
  I->>I: ephemeral_private, ephemeral_public

  Note over I: Encrypt realm key
  I->>I: encrypted_realm_key = encrypt(realm_key, ephemeral_private)

  I->>S: POST /invitations<br/>{invitation_id, realm_id, ephemeral_public, encrypted_realm_key}
  S-->>I: OK

  Note over I: Create JWTs for QR code
  I->>I: jwt1 = sign({realm_id, invitation_id}, inviter_private)
  I->>I: jwt2 = sign({ephemeral_private, invitation_id}, ephemeral_private)

  Note over I,E: QR code contains [jwt1, jwt2]

  E->>S: POST /invitations/exchange<br/>{jwt1}
  Note over S: Verify jwt1 signature<br/>against realm members
  S-->>E: {invitation_id, realm_id, ephemeral_public, encrypted_realm_key}

  Note over E: Verify jwt2 signature<br/>using ephemeral_public
  E->>E: verify_signature(jwt2, ephemeral_public)

  Note over E: Extract key and decrypt
  E->>E: ephemeral_private = decode(jwt2)
  E->>E: realm_key = decrypt(encrypted_realm_key, ephemeral_private)

  Note over E: Now member of realm!
#+END_SRC

**** jwk keypair generation and validation :ai:claude:

discussed jwk vs raw crypto.subtle for keypair storage. since public keys need server storage for realm authorization, jwk is better for interoperability.

**keypair generation**:
#+BEGIN_SRC typescript
const keypair = await crypto.subtle.generateKey(
  { name: "Ed25519" },
  true,
  ["sign", "verify"]
);

const publicJWK = await crypto.subtle.exportKey("jwk", keypair.publicKey);
const privateJWK = await crypto.subtle.exportKey("jwk", keypair.privateKey);

// JWK format:
// {
//   "kty": "OKP",
//   "crv": "Ed25519",
//   "x": "base64url-encoded-public-key",
//   "d": "base64url-encoded-private-key" // only in private JWK
// }
#+END_SRC

**client validation**:
#+BEGIN_SRC typescript
function isValidEd25519PublicJWK(jwk: any): boolean {
  return (
    typeof jwk === 'object' &&
    jwk !== null &&
    jwk.kty === 'OKP' &&
    jwk.crv === 'Ed25519' &&
    typeof jwk.x === 'string' &&
    jwk.x.length === 43 && // base64url Ed25519 public key length
    !jwk.d && // public key shouldn't have private component
    (!jwk.use || jwk.use === 'sig') // parens matter: && binds tighter than ||
  );
}

async function validatePublicKey(publicJWK: JsonWebKey): Promise<CryptoKey | null> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return null;

    const key = await crypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return key;
  } catch {
    return null;
  }
}
#+END_SRC

**server validation (node.js)**:
#+BEGIN_SRC typescript
import { webcrypto } from 'node:crypto';

async function validateClientPublicKey(publicJWK: JsonWebKey): Promise<boolean> {
  try {
    if (!isValidEd25519PublicJWK(publicJWK)) return false;

    await webcrypto.subtle.importKey(
      'jwk',
      publicJWK,
      { name: 'Ed25519' },
      false,
      ['verify']
    );

    return true;
  } catch {
    return false;
  }
}
#+END_SRC

**authentication flow**:
#+BEGIN_SRC typescript
// client signs message
const authMessage = {
  realm: 'uuid-here',
  timestamp: Date.now(),
  action: 'join'
};

const signature = await crypto.subtle.sign(
  'Ed25519',
  privateKey,
  new TextEncoder().encode(JSON.stringify(authMessage))
);

// server verifies
// (AuthRequest shape implied by the usage below)
interface AuthRequest {
  publicKey: JsonWebKey;
  message: unknown;
  signature: number[];
}

async function verifyAuth(req: AuthRequest): Promise<boolean> {
  const publicKey = await webcrypto.subtle.importKey(
    'jwk',
    req.publicKey,
    { name: 'Ed25519' },
    false,
    ['verify']
  );

  const messageBytes = new TextEncoder().encode(JSON.stringify(req.message));
  const signatureBytes = new Uint8Array(req.signature);

  return await webcrypto.subtle.verify(
    'Ed25519',
    publicKey,
    signatureBytes,
    messageBytes
  );
}
#+END_SRC

**** proposed schemas :ai:claude:

***** client-side schema (dexie)

#+BEGIN_SRC typescript
// Core RSS/Podcast data (from your existing design)
interface Channel {
  id: string;
  feedUrl: string;
  htmlUrl?: string;
  imageUrl?: string;
  title?: string;
  description?: string;
  language?: string;
  people?: Record<string, string>;
  tags?: string[];

  // Refresh management
  refreshHP: number;
  nextRefreshAt?: number;
  lastRefreshAt?: number;
  lastRefreshStatus?: string;
  lastRefreshHttpStatus?: number;
  lastRefreshHttpEtag?: string;

  // Cache info
  contentHash?: string;
  lastFetchedAt?: number;
}

interface ChannelEntry {
  id: string;
  channelId: string;
  guid: string;
  title: string;
  linkUrl?: string;
  imageUrl?: string;
  snippet?: string;
  content?: string;

  enclosure?: {
    url: string;
    type?: string;
    length?: number;
  };

  podcast?: {
    explicit?: boolean;
    duration?: string;
    seasonNum?: number;
    episodeNum?: number;
    transcriptUrl?: string;
  };

  publishedAt?: number;
  fetchedAt?: number;
}

// Device-specific sync tables
interface PlayRecord {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  duration?: number;
  completed: boolean;
  speed: number;
  updatedAt: number;
}

interface Subscription {
  id: string;
  channelId: string;
  deviceId?: string;
  parentPath: string; // "/Tech/Programming"
  autoDownload: boolean;
  downloadLimit?: number;
  isActive: boolean;
  createdAt: number;
  updatedAt: number;
}

interface QueueItem {
  id: string;
  entryId: string;
  deviceId: string;
  position: number;
  addedAt: number;
}

interface Device {
  id: string;
  name: string;
  platform: string;
  lastSeen: number;
}

// Local cache metadata
interface LocalCache {
  id: string;
  url: string;
  contentHash: string;
  filePath: string; // OPFS path
  cachedAt: number;
  expiresAt?: number;
  size: number;
  isOfflineOnly: boolean;
}

// Dexie schema
const db = new Dexie('SkypodDB');
db.version(1).stores({
  channels: '&id, feedUrl, contentHash',
  channelEntries: '&id, channelId, publishedAt',
  playRecords: '&id, [entryId+deviceId], deviceId, updatedAt',
  subscriptions: '&id, channelId, deviceId, parentPath',
  queueItems: '&id, entryId, deviceId, position',
  devices: '&id, lastSeen',
  localCache: '&id, url, contentHash, expiresAt'
});
#+END_SRC

***** server-side schema

#+BEGIN_SRC typescript
// Content-addressed cache
interface ContentStore {
  contentHash: string; // Primary key
  content: Buffer; // Raw feed content
  contentType: string;
  contentLength: number;
  firstSeenAt: number;
  referenceCount: number;
}

interface ContentHistory {
  id: string;
  url: string;
  contentHash: string;
  fetchedAt: number;
  isLatest: boolean;
}

// HTTP cache with health tracking (from your existing design)
interface HttpCache {
  key: string; // URL hash, primary key
  url: string;

  status: 'alive' | 'dead';
  lastFetchedAt: number;
  lastFetchError?: string;
  lastFetchErrorStreak: number;

  lastHttpStatus: number;
  lastHttpEtag?: string;
  lastHttpHeaders: Record<string, string>;
  expiresAt: number;
  expirationTtl: number;

  contentHash: string; // Points to ContentStore
}

// Sync/auth tables
interface Realm {
  id: string; // UUID
  createdAt: number;
  verifiedKeys: string[]; // Public key list
}

interface PeerConnection {
  id: string;
  realmId: string;
  publicKey: string;
  lastSeen: number;
  isOnline: boolean;
}

// Media cache for podcast episodes
interface MediaCache {
  contentHash: string; // Primary key
  originalUrl: string;
  mimeType: string;
  fileSize: number;
  content: Buffer;
  cachedAt: number;
  accessCount: number;
}
#+END_SRC

**** episode title parsing for sub-feed groupings :ai:claude:

*problem*: some podcast feeds contain multiple shows, need hierarchical organization within a feed

*example*: "Apocalypse Players" podcast
- episode title: "A Term of Art 6 - Winston's Hollow"
- desired grouping: "Apocalypse Players > A Term of Art > 6 - Winston's Hollow"
- UI shows sub-shows within the main feed

***** approaches considered

1. *manual regex patterns* (short-term solution)
   - user provides regex with capture groups = tags
   - reliable, immediate, user-controlled
   - requires manual setup per feed

2. *LLM-generated regex* (automation goal)
   - analyze last 100 episode titles
   - generate regex pattern automatically
   - good balance of automation + reliability

3. *NER model training* (experimental)
   - train spacy model for episode title parsing
   - current prototype: 150 labelled examples, limited success
   - needs more training data to be viable

***** data model implications

- add regex pattern field to Channel/Feed
- store extracted groupings as hierarchical tags on ~ChannelEntry~
- maybe add grouping/series field to episodes

***** plan

*preference*: start with manual regex, evolve toward LLM automation

*implementation design*:
- if no title pattern: episodes are direct children of the feed
- title pattern = regex with named capture groups + path template

*example configuration*:
- regex: ~^(?<series>[^0-9]+)\s*(?<episode>\d+)\s*-\s*(?<title>.+)$~
- path template: ~{series} > Episode {episode} - {title}~
- result: "A Term of Art 6 - Winston's Hollow" → "A Term of Art > Episode 6 - Winston's Hollow"

*schema additions*:
#+BEGIN_SRC typescript
interface Channel {
  // ... existing fields
  titlePatterns?: Array<{
    name: string; // "Main Episodes", "Bonus Content", etc.
    regex: string; // named capture groups
    pathTemplate: string; // interpolation template
    priority: number; // order to try patterns (lower = first)
    isActive: boolean; // can disable without deleting
  }>;
  fallbackPath?: string; // template for unmatched episodes
}

interface ChannelEntry {
  // ... existing fields
  parsedPath?: string; // computed from titlePattern
  parsedGroups?: Record<string, string>; // captured groups
  matchedPatternName?: string; // which pattern was used
}
#+END_SRC

*pattern matching logic*:
1. try patterns in priority order (lower number = higher priority)
2. first matching pattern wins
3. if no patterns match, use fallbackPath template (e.g., "Misc > {title}")
4. if no fallbackPath, episode stays direct child of feed

*example multi-pattern setup*:
- Pattern 1: "Main Episodes" - ~^(?<series>[^0-9]+)\s*(?<episode>\d+)~ → ~{series} > Episode {episode}~
- Pattern 2: "Bonus Content" - ~^Bonus:\s*(?<title>.+)~ → ~Bonus > {title}~
- Fallback: ~Misc > {title}~
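
a sketch of that matching logic against the ~titlePatterns~ schema above (the helper name and the ~{group}~ interpolation details are assumptions based on the path-template examples):

#+BEGIN_SRC typescript
interface TitlePattern {
  name: string;
  regex: string;        // named capture groups
  pathTemplate: string; // interpolation template
  priority: number;     // lower = tried first
  isActive: boolean;
}

// Try active patterns in priority order; first match wins.
// Returns the computed path plus which pattern matched, if any.
function parseEpisodeTitle(
  title: string,
  patterns: TitlePattern[],
  fallbackPath?: string
): { parsedPath?: string; matchedPatternName?: string } {
  const ordered = patterns
    .filter(p => p.isActive)
    .sort((a, b) => a.priority - b.priority);
  for (const p of ordered) {
    const m = new RegExp(p.regex).exec(title);
    if (m && m.groups) {
      const groups = m.groups;
      const parsedPath = p.pathTemplate.replace(
        /\{(\w+)\}/g,
        (_: string, key: string) => (groups[key] ?? "").trim()
      );
      return { parsedPath, matchedPatternName: p.name };
    }
  }
  if (fallbackPath) {
    return { parsedPath: fallbackPath.replace("{title}", title) };
  }
  return {}; // stays a direct child of the feed
}
#+END_SRC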

**** scoped tags and filter-based UI evolution :ai:claude:

*generalization*: move from rigid hierarchies to tag-based filtering system

*tag scoping*:
- feed-level tags: "Tech", "Gaming", "D&D"
- episode-level tags: from regex captures like "series:CriticalRole", "campaign:2", "type:main"
- user tags: manual additions like "favorites", "todo"

*UI as tag filtering*:
- default view: all episodes grouped by feed
- filter by ~series:CriticalRole~ → shows only CR episodes across all feeds
- filter by ~type:bonus~ → shows bonus content from all podcasts
- combine filters: ~series:CriticalRole AND type:main~ → main CR episodes only

*benefits*:
- no rigid hierarchy - users create their own views
- regex patterns become automated episode taggers
- same filtering system works for search, organization, queues
- tags are syncable metadata, views are client-side

*schema evolution*:
#+BEGIN_SRC typescript
interface Tag {
  scope: 'feed' | 'episode' | 'user';
  key: string; // "series", "type", "campaign"
  value: string; // "CriticalRole", "bonus", "2"
}

interface ChannelEntry {
  // ... existing
  tags: Tag[]; // includes regex-generated + manual
}

interface FilterView {
  id: string;
  name: string;
  folderPath: string; // "/Channels/Critical Role"
  filters: Array<{
    key: string;
    value: string;
    operator: 'equals' | 'contains' | 'not';
  }>;
  isDefault: boolean;
  createdAt: number;
}
#+END_SRC
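
a sketch of evaluating a view's ~filters~ against an entry's tags, with the AND semantics from the combined-filter example above (the helper itself is an assumption):

#+BEGIN_SRC typescript
type Tag = { scope: "feed" | "episode" | "user"; key: string; value: string };
type FilterClause = { key: string; value: string; operator: "equals" | "contains" | "not" };

// An entry belongs to a view when every clause holds against its tags.
function matchesView(tags: Tag[], filters: FilterClause[]): boolean {
  return filters.every(clause => {
    const values = tags.filter(t => t.key === clause.key).map(t => t.value);
    switch (clause.operator) {
      case "equals":
        return values.includes(clause.value);
      case "contains":
        return values.some(v => v.includes(clause.value));
      case "not":
        return !values.includes(clause.value);
    }
  });
}
#+END_SRC

because the check is pure, the same function can back sidebar views, search, and auto-queue rules.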

**** default UI construction and feed merging :ai:claude:

*auto-generated views on subscribe*:
- subscribe to "Critical Role" → creates ~/Channels/Critical Role~ folder
- default filter view: ~feed:CriticalRole~ (shows all episodes from that feed)
- user can customize, split into sub-views, or delete

*smart view suggestions*:
- after regex patterns generate tags, suggest splitting views
- "I noticed episodes with ~series:Campaign2~ and ~series:Campaign3~ - create separate views?"
- "Create view for ~type:bonus~ episodes?"

*view management UX*:
- right-click feed → "Split by series", "Split by type"
- drag episodes between views to create manual filters
- views can be nested: ~/Channels/Critical Role/Campaign 2/Main Episodes~

*feed merging for multi-source shows*:
problem: patreon feed + main show feed for same podcast

#+BEGIN_EXAMPLE
/Channels/
  Critical Role/
    All Episodes    # merged view: feed:CriticalRole OR feed:CriticalRolePatreon
    Main Feed       # filter: feed:CriticalRole
    Patreon Feed    # filter: feed:CriticalRolePatreon
#+END_EXAMPLE

*deduplication strategy*:
- episodes matched by ~guid~ or similar content hash
- duplicate episodes get ~source:main,patreon~ tags
- UI shows single episode with source indicators
- user can choose preferred source for playback
- play state syncs across all sources of same episode
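
a sketch of the guid-based matching (field names mirror the ~duplicateOf~/~sources~ schema; treating the first-seen entry as canonical is an assumption):

#+BEGIN_SRC typescript
interface EpisodeRow {
  id: string;
  guid: string;
  feedId: string;
  duplicateOf?: string;
  sources: string[];
}

// Collapse entries that share a guid: the first-seen row becomes canonical
// and accumulates the source feed ids; later rows point back at it.
function dedupeByGuid(entries: EpisodeRow[]): EpisodeRow[] {
  const canonicalByGuid = new Map<string, EpisodeRow>();
  const out: EpisodeRow[] = [];
  for (const e of entries) {
    const canonical = canonicalByGuid.get(e.guid);
    if (!canonical) {
      const row = { ...e, sources: [e.feedId] };
      canonicalByGuid.set(e.guid, row);
      out.push(row);
    } else {
      canonical.sources.push(e.feedId);
      out.push({ ...e, duplicateOf: canonical.id, sources: [e.feedId] });
    }
  }
  return out;
}
#+END_SRC

play state would then key off the canonical id, so progress syncs across every source of the same episode.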

*feed relationship schema*:
#+BEGIN_SRC typescript
interface FeedGroup {
  id: string;
  name: string; // "Critical Role"
  feedIds: string[]; // [mainFeedId, patreonFeedId]
  mergeStrategy: 'guid' | 'title' | 'contentHash';
  defaultView: FilterView;
}

interface ChannelEntry {
  // ... existing
  duplicateOf?: string; // points to canonical episode ID
  sources: string[]; // feed IDs where this episode appears
}
#+END_SRC

**per-view settings and state**:
each filter view acts like a virtual feed with its own:
- unread counts (episodes matching filter that haven't been played)
- notification settings (notify for new episodes in this view)
- muted state (hide notifications, mark as read automatically)
- auto-download preferences (download episodes that match this filter)
- play queue integration (add new episodes to queue)

**use cases**:
- mute "Bonus Content" view but keep notifications for main episodes
- auto-download only "Campaign 2" episodes, skip everything else
- separate unread counts: "5 unread in Main Episodes, 2 in Bonus"
- queue only certain series automatically

**schema additions**:
#+BEGIN_SRC typescript
interface FilterView {
  // ... existing fields
  settings: {
    notificationsEnabled: boolean;
    isMuted: boolean;
    autoDownload: boolean;
    autoQueue: boolean;
    downloadLimit?: number; // max episodes to keep
  };
  state: {
    unreadCount: number;
    lastViewedAt?: number;
    isCollapsed: boolean; // in sidebar
  };
}
#+END_SRC

*inheritance behavior*:
- new filter views inherit settings from parent feed/group
- user can override per-view
- "mute all Critical Role" vs "mute only bonus episodes"

**** client-side episode caching strategy :ai:claude:

*architecture*: service worker-based transparent caching

*flow*:
1. audio player requests ~/audio?url={episodeUrl}~
2. service worker intercepts request
3. if present in cache (with Range header support):
   - serve from cache
4. else:
   - let request continue to server (immediate playback)
   - simultaneously start background fetch of full audio file
   - when complete, broadcast "episode-cached" event
   - audio player catches event and restarts feed → now uses cached version

**benefits**:
- no playback interruption (streaming starts immediately)
- seamless transition to cached version
- Range header support for seeking/scrubbing
- transparent to audio player implementation
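
the Range handling in the cache-hit path reduces to slicing the cached bytes; a pure helper sketch (handles single and suffix ranges, ignores multipart ranges):

#+BEGIN_SRC typescript
// Slice a fully-cached body according to an HTTP Range header.
// Returns 206 + Content-Range for a valid range, 200 + full body otherwise.
function sliceRange(
  body: Uint8Array,
  rangeHeader: string | null
): { status: number; chunk: Uint8Array; contentRange?: string } {
  const m = rangeHeader && /^bytes=(\d*)-(\d*)$/.exec(rangeHeader);
  if (!m || (!m[1] && !m[2])) return { status: 200, chunk: body };
  let start: number;
  let end: number;
  if (!m[1]) {
    // suffix range, e.g. "bytes=-500" = last 500 bytes
    start = Math.max(0, body.length - parseInt(m[2], 10));
    end = body.length - 1;
  } else {
    start = parseInt(m[1], 10);
    end = m[2] ? Math.min(parseInt(m[2], 10), body.length - 1) : body.length - 1;
  }
  return {
    status: 206,
    chunk: body.slice(start, end + 1),
    contentRange: `bytes ${start}-${end}/${body.length}`,
  };
}
#+END_SRC

the service worker would wrap the returned chunk in a ~Response~ with the computed status and ~Content-Range~ header.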

*implementation considerations*:
- cache storage limits and cleanup policies
- partial download resumption if interrupted
- cache invalidation when episode URLs change
- offline playback support
- progress tracking for background downloads

**schema additions**:
#+BEGIN_SRC typescript
interface CachedEpisode {
  episodeId: string;
  originalUrl: string;
  cacheKey: string; // for cache API
  fileSize: number;
  cachedAt: number;
  lastAccessedAt: number;
  downloadProgress?: number; // 0-100 for in-progress downloads
}
#+END_SRC

**service worker events**:
- ~episode-cache-started~ - background download began
- ~episode-cache-progress~ - download progress update
- ~episode-cache-complete~ - ready to switch to cached version
- ~episode-cache-error~ - download failed, stay with streaming

**background sync for proactive downloads**:

**browser support reality**:
- Background Sync API: good support (Chrome/Edge, limited Safari)
- Periodic Background Sync: very limited (Chrome only, requires PWA install)
- Push notifications: good support, but requires user permission

**hybrid approach**:
1. **foreground sync** (reliable): when app is open, check for new episodes
2. **background sync** (opportunistic): register sync event when app closes
3. **push notifications** (fallback): server pushes "new episodes available"
4. **manual sync** (always works): pull-to-refresh, settings toggle

**implementation strategy**:
#+BEGIN_SRC typescript
// Register background sync when app becomes hidden
document.addEventListener('visibilitychange', () => {
  if (document.hidden && 'serviceWorker' in navigator) {
    navigator.serviceWorker.ready.then(registration => {
      return registration.sync.register('download-episodes');
    });
  }
});

// Service worker handles sync event
self.addEventListener('sync', event => {
  if (event.tag === 'download-episodes') {
    event.waitUntil(syncEpisodes());
  }
});
#+END_SRC

**realistic expectations**:
- iOS Safari: very limited background processing
- Android Chrome: decent background sync support
- Desktop: mostly works
- battery/data saver modes: disabled by OS

**fallback strategy**: rely primarily on foreground sync + push notifications, treat background sync as nice-to-have enhancement

**push notification sync workflow**:

**server-side trigger**:
1. server detects new episodes during RSS refresh
2. check which users are subscribed to that feed
3. send push notification with episode metadata payload
4. notification wakes up service worker on client

**service worker notification handler**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();

  if (data.type === 'new-episodes') {
    event.waitUntil(
      // Start background download of new episodes
      downloadNewEpisodes(data.episodes)
        .then(() => {
          // Show notification to user
          return self.registration.showNotification('New episodes available', {
            body: `${data.episodes.length} new episodes downloaded`,
            icon: '/icon-192.png',
            badge: '/badge-72.png',
            tag: 'new-episodes',
            data: { episodeIds: data.episodes.map(e => e.id) }
          });
        })
    );
  }
});

// Handle notification click
self.addEventListener('notificationclick', event => {
  event.notification.close();

  // Open app to specific episode or feed
  event.waitUntil(
    clients.openWindow(`/episodes/${event.notification.data.episodeIds[0]}`)
  );
});
#+END_SRC
1091-1092-**server push logic**:
1093-- batch notifications (don't spam for every episode)
1094-- respect user notification preferences from FilterView settings
1095-- include episode metadata in payload to avoid round-trip
1096-- throttle notifications (max 1 per feed per hour?)

**user flow**:
1. new episode published → server pushes notification
2. service worker downloads episode in background
3. user sees "New episodes downloaded" notification
4. tap notification → opens app to new episode, ready to play offline

*benefits*:
- true background downloading without user interaction
- works even when app is closed
- respects per-feed notification settings

**push payload size constraints**:
- **limit**: ~4KB (4,096 bytes) across most services
- **practical limit**: ~3KB to account for service overhead
- **implications for episode metadata**:

#+BEGIN_SRC json
{
  "type": "new-episodes",
  "episodes": [
    {
      "id": "ep123",
      "channelId": "ch456",
      "title": "Episode Title",
      "url": "https://...",
      "duration": 3600,
      "size": 89432112
    }
  ]
}
#+END_SRC

**payload optimization strategies**:
- minimal episode metadata in push (id, url, basic info)
- batch multiple episodes in single notification
- full episode details fetched after service worker wakes up
- URL shortening for long episode URLs
- compress JSON payload if needed

**alternative for large payloads**:
- push notification contains only "new episodes available" signal
- service worker makes API call to get full episode list
- trade-off: requires network round-trip but unlimited data

**logical clock sync optimization**:

much simpler approach using sync revisions:

#+BEGIN_SRC json
{
  "type": "sync-available",
  "fromRevision": 12345,
  "toRevision": 12389,
  "changeCount": 8
}
#+END_SRC

**service worker sync flow**:
1. push notification wakes service worker with revision range
2. service worker fetches ~/sync?from=12345&to=12389~
3. server returns only changes in that range (episodes, feed updates, etc)
4. service worker applies changes to local dexie store
5. service worker queues background downloads for new episodes
6. updates local revision to 12389

**benefits of revision-based approach**:
- tiny push payload (just revision numbers)
- server can efficiently return only changes in range
- automatic deduplication (revision already applied = skip)
- works for any sync data (episodes, feed metadata, user settings)
- handles offline gaps gracefully (fetch missing revision ranges)

**sync API response**:
#+BEGIN_SRC typescript
interface SyncResponse {
  fromRevision: number;
  toRevision: number;
  changes: Array<{
    type: 'episode' | 'channel' | 'subscription';
    operation: 'create' | 'update' | 'delete';
    data: any;
    revision: number;
  }>;
}
#+END_SRC
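
Applying a ~SyncResponse~ can be a pure fold that skips already-applied revisions and upserts/deletes rows; this sketch uses an in-memory map as a stand-in for the dexie tables, and assumes the server returns changes ordered by revision:

#+BEGIN_SRC typescript
interface Change {
  type: 'episode' | 'channel' | 'subscription';
  operation: 'create' | 'update' | 'delete';
  data: { id: string; [k: string]: unknown };
  revision: number;
}

// Sketch: fold a batch of changes into a local store, skipping anything
// at or below the locally applied revision (the dedup property above).
// Returns the new local revision cursor.
function applyChanges(
  store: Map<string, object>,
  localRevision: number,
  changes: Change[]
): number {
  for (const c of changes) {
    if (c.revision <= localRevision) continue; // already applied, skip
    const key = `${c.type}:${c.data.id}`;
    if (c.operation === 'delete') store.delete(key);
    else store.set(key, c.data); // create and update are both upserts
    localRevision = c.revision;
  }
  return localRevision;
}
#+END_SRC

The same shape handles offline gaps: fetch ~/sync?from=localRevision~ and fold whatever comes back.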

**integration with episode downloads**:
- service worker processes sync changes
- identifies new episodes that match user's auto-download filters
- queues those for background cache fetching
- much more efficient than sending episode metadata in push payload

**service worker processing time constraints**:

**hard limits**:
- **30 seconds idle timeout**: service worker terminates after 30s of inactivity
- **5 minutes event processing**: single event/request must complete within 5 minutes
- **30 seconds fetch timeout**: individual network requests timeout after 30s
- **notification requirement**: push events MUST display notification before promise settles

**practical implications**:
- sync API call (~/sync?from=X&to=Y~) must complete within 30s
- large episode downloads must be queued, not started immediately in push handler
- use ~event.waitUntil()~ to keep service worker alive during processing
- break large operations into smaller chunks

**recommended push event flow**:
#+BEGIN_SRC typescript
self.addEventListener('push', event => {
  const data = event.data?.json();

  event.waitUntil(
    // Must complete within 5 minutes total
    handlePushSync(data)
      .then(() => {
        // Required: show notification before promise settles
        return self.registration.showNotification('Episodes synced');
      })
  );
});

async function handlePushSync(data) {
  // 1. Quick sync API call (< 30s)
  const response = await fetch(`/sync?from=${data.fromRevision}&to=${data.toRevision}`);
  const changes = await response.json();

  // 2. Apply changes to dexie store (fast, local)
  await applyChangesToStore(changes);

  // 3. Queue episode downloads for later (don't start here)
  await queueEpisodeDownloads(changes.newEpisodes);

  // Total time: < 5 minutes, preferably < 30s
}
#+END_SRC

*download strategy*: use push event for sync + queuing, separate background tasks for actual downloads

*background fetch API for large downloads*:

*progressive enhancement approach*:
#+BEGIN_SRC typescript
async function queueEpisodeDownloads(episodes) {
  for (const episode of episodes) {
    if ('serviceWorker' in navigator && 'BackgroundFetchManager' in self) {
      // Chrome/Edge: use Background Fetch API for true background downloading
      const registration = await navigator.serviceWorker.ready;
      await registration.backgroundFetch.fetch(
        `episode-${episode.id}`,
        episode.url,
        {
          icons: [{ src: '/icon-256.png', sizes: '256x256', type: 'image/png' }],
          title: `Downloading: ${episode.title}`,
          downloadTotal: episode.fileSize
        }
      );
    } else {
      // Fallback: queue for reactive download (download while streaming)
      await queueReactiveDownload(episode);
    }
  }
}

// Handle background fetch completion (in the service worker)
self.addEventListener('backgroundfetchsuccess', event => {
  if (event.registration.id.startsWith('episode-')) {
    event.waitUntil(handleEpisodeDownloadComplete(event));
  }
});
#+END_SRC
1267-#+END_SRC
1268-1269-*browser support reality*:
1270-- *Chrome/Edge*: Background Fetch API supported
1271-- *Firefox/Safari*: not supported, fallback to reactive caching
1272-- *mobile*: varies by platform and browser
1273-1274-*benefits when available*:
1275-- true background downloading (survives app close, browser close)
1276-- built-in download progress UI
1277-- automatic retry on network failure
1278-- no service worker time limits during download
1279-1280-*graceful degradation*:
1281-- detect support, use when available
1282-- fallback to reactive caching (download while streaming)
1283-- user gets best experience possible on their platform

*** research todos :ai:claude:

high-level unanswered questions from architecture brainstorming:

**** sync and data management
***** TODO dexie sync capabilities vs rxdb for multi-device sync implementation
***** TODO webrtc p2p sync implementation patterns and reliability
***** TODO conflict resolution strategies for device-specific data in distributed sync
***** TODO content-addressed deduplication algorithms for rss/podcast content
**** client-side storage and caching
***** TODO opfs storage limits and cleanup strategies for client-side caching
***** TODO practical background fetch api limits and edge cases for podcast downloads
**** automation and intelligence
***** TODO llm-based regex generation for episode title parsing automation
***** TODO push notification subscription management and realm authentication
**** platform and browser capabilities
***** TODO browser audio api capabilities for podcast-specific features (speed, silence skip)
***** TODO progressive web app installation and platform-specific behaviors

# Local Variables:
# org-hierarchical-todo-statistics: nil
# org-checkbox-hierarchical-statistics: nil
# End:
#+PROPERTY: COOKIE_DATA recursive
#+STARTUP: overview

* concepts [0/4]

the skypod architecture is broken into pieces:

** p2p with realms

In order to sync device and playback state, we need to have devices communicate with each
other, which requires a signaling server, for peer discovery at the very least.

*** realm

A realm is a collection of known identities, verified with signed JWTs, where those
identities can communicate outside of the server's purview.

A realm is not publicly routable; to gain access, one must have the realm id and an
invitation to the realm from an already existing member; new realms are created on
demand with a random realm id supplied by the client.

**** realm server

A realm at the signalling server is only a collection of known identity public keys, and
the currently connected sockets.

It acts mostly as a smart router: socket connection, authentication, and directed and
realm-wide broadcast.

***** TODO webrtc requirements

- [ ] what needs to happen for webrtc? I think SDP messages will just flow like normal
  - check out whatever simple-peer does

**** realm client

In order to keep private data off of the server, the realm client takes on the additional
task of maintaining a shared encryption key for the realm, which can be used to encrypt
data going over broadcasts.

***** TODO key exchange protocol

*** identity

Identity in the realm system is just an id and a keypair.

The private key stays local to the installed device, and is used to send signed tokens
over the wire, either to the realm server to manage authentication, or over other channels
to other members in the realm.

The /public/ key is stored by all members of the realm, and by the server, in order to
perform signature validation (which is also authentication).

**** browser private key storage

There is no good way to store private keys in a browser, but there are less bad ways.

- private keys are ~CryptoKey~ objects, created with ~{ extractable: false }~
- native WebCrypto ~CryptoKey~ objects are structured-cloneable, which means they can be saved to indexeddb

At the end of the day, this is a podcast app.
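
A minimal sketch of the idea: generate an ECDSA keypair with ~extractable: false~, sign a payload, and verify with the public key. The algorithm choice is an assumption, not a settled decision; in the app the ~CryptoKey~ objects would be structured-cloned into IndexedDB, which is elided here:

#+BEGIN_SRC typescript
// Sketch: a non-extractable signing keypair via WebCrypto.
async function makeIdentity(): Promise<CryptoKeyPair> {
  return crypto.subtle.generateKey(
    { name: 'ECDSA', namedCurve: 'P-256' },
    false, // extractable: false -- the private key can sign but never export
    ['sign', 'verify']
  );
}

async function signMessage(keys: CryptoKeyPair, message: string): Promise<ArrayBuffer> {
  return crypto.subtle.sign(
    { name: 'ECDSA', hash: 'SHA-256' },
    keys.privateKey,
    new TextEncoder().encode(message)
  );
}

async function verifyMessage(keys: CryptoKeyPair, message: string, sig: ArrayBuffer): Promise<boolean> {
  return crypto.subtle.verify(
    { name: 'ECDSA', hash: 'SHA-256' },
    keys.publicKey,
    new TextEncoder().encode(message),
    sig
  );
}
#+END_SRC

The non-extractable private key is the whole point: even script injection can use the key, but cannot steal its bytes.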

***** TODO are there other ways to do this?

Could we use webauthn, or some way to use a yubikey to sign something?

*** sync

Once a device has authenticated to the realm server over a websocket connection, it can
send and broadcast any message it likes to the other online members of the realm, via the
websocket announcement channel.

This sets it up to be used as a signaling server for WebRTC, allowing realm devices to
communicate p2p without a dependency on the realm server outside of authentication and
signaling.
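
The "smart router" behaviour reduces to picking recipients per message; a sketch of the routing decision (the message shape and field names are assumptions):

#+BEGIN_SRC typescript
// Sketch: given the realm's currently connected socket ids, pick the
// recipients for a directed message or a realm-wide broadcast.
interface RealmMessage {
  from: string;   // sender's identity id
  to?: string;    // present for directed messages (e.g. SDP offers)
  payload: unknown;
}

function recipients(connected: Set<string>, msg: RealmMessage): string[] {
  if (msg.to !== undefined) {
    // directed: deliver only if that peer is currently connected
    return connected.has(msg.to) ? [msg.to] : [];
  }
  // broadcast: everyone in the realm except the sender
  return [...connected].filter(id => id !== msg.from);
}
#+END_SRC

WebRTC signaling then needs nothing extra: SDP offers/answers and ICE candidates are just directed messages.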

** feed proxy server

Due to ~CORS~, we'll need to help clients fetch the contents of feeds by running a caching
proxy server for various HTTP requests.

- help bypass ~CORS~ restrictions, so clients can access the content of the response
- cache feeds, especially with regards to running transformations
- perform transformations on responses:
  - text feeds: reader mode, detect reading time
  - podcast feeds: extract episode metadata, audio analysis for silence skips, etc
  - all feeds: extract title tags, etc.
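
The ~CORS~ part of the proxy is mostly header surgery; a sketch of the response-header mapping (the function name and default cache window are assumptions about the approach, not a spec):

#+BEGIN_SRC typescript
// Sketch: build response headers for the caching proxy. Drop the
// upstream's CORS and hop-by-hop headers, then set our own policy,
// since the proxy is now the CORS boundary.
function proxyHeaders(upstream: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(upstream)) {
    const n = name.toLowerCase();
    if (n.startsWith('access-control-') || n === 'connection') continue;
    out[n] = value;
  }
  out['access-control-allow-origin'] = '*';
  // default cache window if upstream didn't set one (assumption: 5 min)
  out['cache-control'] = out['cache-control'] ?? 'public, max-age=300';
  return out;
}
#+END_SRC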

*** TODO open question: is the client able to not use the proxy?

I'm not sure yet if we want the PWA to be able to pull feeds directly when the server
isn't present. It would be much easier to keep it around, but…

** feed management

With a solid p2p WebRTC connection, we can use something like ~dexie~ or ~rxdb~ to get a
synced document database that we use to manage feeds.

* flow

- user goes to https://skypod.accidental.cc
  - pwa runs, prompts to do full install for storage and offline
  - pwa is installed, sets up caches

- first run
  - identity is generated (id + keypair per device)
  - do you want to sync to an existing install?
    - if yes, go to invitee flow
    - otherwise, new realm is generated and registered
      - pubkey and id get stored in the realm, to make future sync easier

- subsequent runs
  - identity already exists, so we just go about our day

- invitee flow
  - already generated identity
  - qr code pops
    - scanned by inviter, see inviter flow
    - done button after
  - camera pops, scan inviter's QR codes
  - sends invitation+registration token to server
  - added to the realm
  - go to subsequent runs
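
The invitation+registration token check in the invitee flow can be sketched as a pure validity test; the invite field names follow the double-signed invite shape noted in the sync notes, and signature verification is elided:

#+BEGIN_SRC typescript
// Sketch: check an invite's validity window and that its authorizer
// is a verified realm key. Signature checks happen before this step.
interface Invite {
  realm: string;
  nonce: string;
  not_before: number; // epoch seconds
  not_after: number;
  authorizer: string; // public key id of the inviting member
}

function inviteValid(
  invite: Invite,
  realm: string,
  verifiedKeys: Set<string>,
  now: number
): boolean {
  return (
    invite.realm === realm &&
    verifiedKeys.has(invite.authorizer) &&
    now >= invite.not_before &&
    now <= invite.not_after
  );
}
#+END_SRC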
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

#+BEGIN_SRC diff
readme.org (+5 -5)
@@ -29,12 +29,12 @@
 - Common
   - ES2024 Javascript, running on in modern browsers or [[https://nodejs.org][Node v24]]
   - [[https://github.com/panva/jose][~jose~]] for cross-platform webcrypto and JWT management
-  - [[https://zod.dev/][Zod v4]] describes schema and builds transforamtion pipelines
+  - [[https://zod.dev/][Zod v4]] describes schema and builds transformation pipelines
 - Backend
   - [[https://expressjs.com/][Express]] and Node's ~stdlib~ for HTTP and WebSocket servers
 - Frontend
   - [[https://vite.dev/][Vite]] does FE builds
-  - [[https://react.dev][React]] + [[https://zustand.docs.pmnd.rs][Zustand]] for UI
+  - [[https://preactjs.com/][Preact]] + [[https://zustand.docs.pmnd.rs][Zustand]] for UI
 - Build & DX
   - [[https://github.com/google/wireit][Wireit]] does script dependencies and services
   - [[https://jsdoc.app/][JSDoc]], along with [[https://www.typescriptlang.org/docs/handbook/jsdoc-supported-types.html][Typescript's JSDoc support]] does typechecking
@@ -46,14 +46,14 @@
 - per-realm SQLite databases with node's native sqlite support
 - docker compose for deployment with self-hosted realm storage
 
-See [[./devlog.org]] for design and architecture thoughts.
+See [[./readme-devlog.org]] for design and architecture thoughts.
 
 ** Scripts
 
 All scripts can have ~--watch~ passed as an argument to have ~wireit~ rerun when inputs change.
 This is not useful for everything.
 
-- ~npm run dev~ :: alias for ~npm run start:dev --watch~
+- ~npm run dev~ :: alias for ~npm run start:dev~
 - ~npm run lint~ :: runs ~eslint~
 - ~npm run types~ :: runs ~tsc~ (no emitting, just typechecking)
 - ~npm run docs~ :: runs ~jsdoc~ to generate docs in ~./docs~
@@ -73,4 +73,4 @@
 This program is free software: you can redistribute it and/or modify it under the terms of
 the **Affero General Public License verson 3 or later** (AGPLv3+).
 
-Please see [[./license.txt]] for a copy of the full license.
+Please see [[./readme-license.txt]] for a copy of the full license.
#+END_SRC

#+BEGIN_SRC diff
src/client/page-app.spec.jsx (+2 -2)
@@ -17,8 +17,8 @@
 
   // Check the JSX structure without full rendering
   expect(component.type).toBe(Fragment)
-  expect(component.props.children).toHaveProperty('type', 'h1')
-  expect(component.props.children.props.children).toBe('whatever')
+  expect(component.props.children[0]).toHaveProperty('type', 'h1')
+  expect(component.props.children[0].props.children).toBe('whatever')
  })
 })
 
#+END_SRC