A community-based topic aggregation platform built on atproto

perf(votes): use cache-first lookup for vote existence checks

Replace O(n) PDS pagination with O(1) cache lookups for vote existence
checks in CreateVote and DeleteVote. The first operation populates the
cache from the PDS; subsequent operations use fast hashmap lookups.

- Add findExistingVoteWithCache for cache-first lookups
- Rename findExistingVote to findExistingVoteFromPDS (fallback only)
- Propagate auth errors immediately instead of attempting doomed fallback
- Update comments to accurately reflect complexity characteristics

Closes Coves-fqg

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+59 -11
+1 -1
.beads/issues.jsonl
··· 5 {"id":"Coves-e16","content_hash":"7c5d0fc8f0e7f626be3dad62af0e8412467330bad01a244e5a7e52ac5afff1c1","title":"Complete post creation and moderation features","description":"","status":"open","priority":1,"issue_type":"feature","created_at":"2025-11-17T20:30:12.885991306-08:00","updated_at":"2025-11-17T20:30:12.885991306-08:00","source_repo":"."} 6 {"id":"Coves-f9q","content_hash":"a1a38759edc37d11227d5992cdbed1b8cf27e09496165e45c542b208f58d34ce","title":"Apply functional options pattern to NewGetTimelineHandler and RegisterTimelineRoutes","description":"Locations:\n- internal/api/handlers/timeline/get.go (NewGetTimelineHandler)\n- internal/api/routes/timeline.go (RegisterTimelineRoutes)\n\nApply functional options pattern for optional dependencies (votes, bluesky).\n\nUpdate RegisterTimelineRoutes last after handlers are refactored.\n\nDepends on: Coves-jdf, Coves-8b1, Coves-iw5\nParent: Coves-8k1","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-22T21:35:27.420117481-08:00","updated_at":"2025-12-22T21:35:58.166765845-08:00","source_repo":"."} 7 {"id":"Coves-fce","content_hash":"26b3e16b99f827316ee0d741cc959464bd0c813446c95aef8105c7fd1e6b09ff","title":"Implement aggregator feed federation","description":"","status":"open","priority":1,"issue_type":"feature","created_at":"2025-11-17T20:30:21.453326012-08:00","updated_at":"2025-11-17T20:30:21.453326012-08:00","source_repo":"."} 8 - {"id":"Coves-fqg","content_hash":"715c3a860b5000385f2787f89ea199d940dc06ef6b50902dd5fcdea71c96edbb","title":"Performance: PDS ListRecords bottleneck causing slow votes and feed loads","description":"## Problem Summary\n\nMultiple user-facing operations are slow due to repeated `listRecords` calls to the user's PDS (Personal Data Server). This affects both vote creation (~2-3s) and initial feed loads (~800ms).\n\n## Root Cause\n\nThe AppView queries the user's PDS to list ALL vote records whenever it needs to:\n1. 
Check if a vote already exists (for toggle logic)\n2. Populate the vote cache for viewer state\n\n**The problematic code path** (`service_impl.go:317-374`):\n\n```go\nfunc (s *voteService) findExistingVote(ctx context.Context, pdsClient pds.Client, subjectURI string) (*existingVote, error) {\n cursor := \"\"\n for {\n // Fetches 100 records per page from user's PDS\n result, err := pdsClient.ListRecords(ctx, voteCollection, pageSize, cursor)\n // Iterates through ALL records looking for matching subject URI\n for _, rec := range result.Records {\n // Linear search through every vote...\n }\n }\n}\n```\n\n## Affected Operations\n\n| Operation | Latency | Cause |\n|-----------|---------|-------|\n| Vote create/toggle | 2-3s | `getPDSClient` → token refresh w/ DPoP retry → `listRecords` (find existing) → `createRecord` |\n| First feed load | ~800ms | `listRecords` to populate vote cache |\n| Subsequent feeds | ~100ms | Cache hit (no PDS call) |\n\n## Why Token Refresh Adds Latency\n\nLogs show DPoP nonce mismatches requiring retries:\n```\n22:13:09 [AUTH_SUCCESS]\n22:13:11 WARN auth server request failed request=token-refresh statusCode=400 body=\"use_dpop_nonce\"\n22:13:12 INFO vote created\n```\n\nThe OAuth DPoP flow requires a server-provided nonce. On first request, the server rejects with the nonce, client retries with it. 
This adds ~1s per token refresh.\n\n## Scaling Concern\n\nThe `findExistingVote` function paginates through ALL votes (100 per page):\n\n| User's Vote Count | PDS Calls | Estimated Latency |\n|-------------------|-----------|-------------------|\n| 36 (current) | 1 | ~1s |\n| 200 | 2 | ~2s |\n| 500 | 5 | ~5s |\n| 1000 | 10 | ~10s |\n| 2000+ | 20+ | Timeout risk (30s limit) |\n\n**Active user projection**: 20 votes/day × 30 days = 600 votes/month → UX degradation within weeks of active use.\n\n## Current Architecture Flow\n\n```\n┌─────────────┐ ┌─────────────┐ ┌─────────────┐\n│ Mobile App │────▶│ AppView │────▶│ User's PDS │\n└─────────────┘ └─────────────┘ └─────────────┘\n │\n ▼\n ┌─────────────┐\n │ Vote Cache │ (in-memory, per-user)\n └─────────────┘\n```\n\n**Vote Creation Flow:**\n1. Mobile sends vote request to AppView\n2. AppView refreshes OAuth token (DPoP nonce retry) → **+1s**\n3. AppView calls `listRecords` on PDS to find existing vote → **+1s per 100 votes**\n4. AppView creates/deletes record on PDS → **+0.5s**\n5. Total: **2-3s+ for 36 votes**\n\n**First Feed Load Flow:**\n1. Mobile requests feed\n2. AppView checks if vote cache populated\n3. If not, calls `listRecords` to fetch ALL votes → **+800ms**\n4. 
Returns feed with viewer vote state\n\n## The Irony\n\nA vote cache already exists and is properly maintained:\n- Updated on every vote create/delete (`service_impl.go:157-159`, `service_impl.go:209-215`)\n- Indexed by subject URI for O(1) lookup\n- Cleared on sign-out\n\nBut it's **bypassed for vote existence checks** because the code treats PDS as \"source of truth\" to avoid eventual consistency issues.","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-13T14:29:00.821389478-08:00","updated_at":"2026-01-13T14:29:13.666892869-08:00","source_repo":"."} 9 {"id":"Coves-iw5","content_hash":"d3379c617b7583f6b88a0523b3cdd1e4415176877ab00b48710819f2484c4856","title":"Apply functional options pattern to NewGetCommunityHandler","description":"Location: internal/api/handlers/communityFeed/get.go\n\nApply functional options pattern for optional dependencies (votes, bluesky).\n\nDepends on: Coves-jdf (NewPostService refactor should be done first to establish pattern)\nParent: Coves-8k1","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-22T21:35:27.369297201-08:00","updated_at":"2025-12-22T21:35:58.115771178-08:00","source_repo":"."} 10 {"id":"Coves-jdf","content_hash":"cb27689d71f44fd555e29d2988f2ad053efb6c565cd4f803ff68eaade59c7546","title":"Apply functional options pattern to NewPostService","description":"Location: internal/core/posts/service.go\n\nCurrent constructor (7 params, 4 optional):\n```go\nfunc NewPostService(repo Repository, communityService communities.Service, aggregatorService aggregators.Service, blobService blobs.Service, unfurlService unfurl.Service, blueskyService blueskypost.Service, pdsURL string) Service\n```\n\nRefactor to:\n```go\ntype Option func(*postService)\n\nfunc WithAggregatorService(svc aggregators.Service) Option\nfunc WithBlobService(svc blobs.Service) Option\nfunc WithUnfurlService(svc unfurl.Service) Option\nfunc WithBlueskyService(svc blueskypost.Service) Option\n\nfunc NewPostService(repo 
Repository, communityService communities.Service, pdsURL string, opts ...Option) Service\n```\n\nFiles to update:\n- internal/core/posts/service.go (define Option type and With* functions)\n- cmd/server/main.go (production caller)\n- ~15 test files with call sites\n\nStart with this one as it has the most params and is most impacted.\nParent: Coves-8k1","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-22T21:35:27.264325344-08:00","updated_at":"2025-12-22T21:35:58.003863381-08:00","source_repo":"."} 11 {"id":"Coves-p44","content_hash":"6f12091f6e5f1ad9812f8da4ecd720e0f9df1afd1fdb593b3e52c32be0193d94","title":"Bluesky embed conversion Phase 2: resolve post and populate CID","description":"When converting a Bluesky URL to a social.coves.embed.post, we need to:\n\n1. Call blueskyService.ResolvePost() to get the full post data including CID\n2. Populate both URI and CID in the strongRef\n3. Consider caching/re-using resolved post data for rendering\n\nCurrently disabled in Phase 1 (text-only) because:\n- social.coves.embed.post requires a valid CID in com.atproto.repo.strongRef\n- Empty CID causes PDS to reject the record creation\n\nRelated files:\n- internal/core/posts/service.go:tryConvertBlueskyURLToPostEmbed()\n- internal/atproto/lexicon/social/coves/embed/post.json\n\nThis is part of the Bluesky post cross-posting feature (images/embeds phase).","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-22T21:25:23.540135876-08:00","updated_at":"2025-12-23T14:41:49.014541876-08:00","closed_at":"2025-12-23T14:41:49.014541876-08:00","source_repo":"."}
··· 5 {"id":"Coves-e16","content_hash":"7c5d0fc8f0e7f626be3dad62af0e8412467330bad01a244e5a7e52ac5afff1c1","title":"Complete post creation and moderation features","description":"","status":"open","priority":1,"issue_type":"feature","created_at":"2025-11-17T20:30:12.885991306-08:00","updated_at":"2025-11-17T20:30:12.885991306-08:00","source_repo":"."} 6 {"id":"Coves-f9q","content_hash":"a1a38759edc37d11227d5992cdbed1b8cf27e09496165e45c542b208f58d34ce","title":"Apply functional options pattern to NewGetTimelineHandler and RegisterTimelineRoutes","description":"Locations:\n- internal/api/handlers/timeline/get.go (NewGetTimelineHandler)\n- internal/api/routes/timeline.go (RegisterTimelineRoutes)\n\nApply functional options pattern for optional dependencies (votes, bluesky).\n\nUpdate RegisterTimelineRoutes last after handlers are refactored.\n\nDepends on: Coves-jdf, Coves-8b1, Coves-iw5\nParent: Coves-8k1","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-22T21:35:27.420117481-08:00","updated_at":"2025-12-22T21:35:58.166765845-08:00","source_repo":"."} 7 {"id":"Coves-fce","content_hash":"26b3e16b99f827316ee0d741cc959464bd0c813446c95aef8105c7fd1e6b09ff","title":"Implement aggregator feed federation","description":"","status":"open","priority":1,"issue_type":"feature","created_at":"2025-11-17T20:30:21.453326012-08:00","updated_at":"2025-11-17T20:30:21.453326012-08:00","source_repo":"."} 8 + {"id":"Coves-fqg","content_hash":"b3e4af5d914ad9fa222a2216603e6f089afa88cbc8f65c8a593e571e7f926ccb","title":"Performance: PDS ListRecords bottleneck causing slow votes and feed loads","description":"## Problem Summary\n\nMultiple user-facing operations are slow due to repeated `listRecords` calls to the user's PDS (Personal Data Server). This affects both vote creation (~2-3s) and initial feed loads (~800ms).\n\n## Root Cause\n\nThe AppView queries the user's PDS to list ALL vote records whenever it needs to:\n1. 
Check if a vote already exists (for toggle logic)\n2. Populate the vote cache for viewer state\n\n**The problematic code path** (`service_impl.go:317-374`):\n\n```go\nfunc (s *voteService) findExistingVote(ctx context.Context, pdsClient pds.Client, subjectURI string) (*existingVote, error) {\n cursor := \"\"\n for {\n // Fetches 100 records per page from user's PDS\n result, err := pdsClient.ListRecords(ctx, voteCollection, pageSize, cursor)\n // Iterates through ALL records looking for matching subject URI\n for _, rec := range result.Records {\n // Linear search through every vote...\n }\n }\n}\n```\n\n## Affected Operations\n\n| Operation | Latency | Cause |\n|-----------|---------|-------|\n| Vote create/toggle | 2-3s | `getPDSClient` → token refresh w/ DPoP retry → `listRecords` (find existing) → `createRecord` |\n| First feed load | ~800ms | `listRecords` to populate vote cache |\n| Subsequent feeds | ~100ms | Cache hit (no PDS call) |\n\n## Why Token Refresh Adds Latency\n\nLogs show DPoP nonce mismatches requiring retries:\n```\n22:13:09 [AUTH_SUCCESS]\n22:13:11 WARN auth server request failed request=token-refresh statusCode=400 body=\"use_dpop_nonce\"\n22:13:12 INFO vote created\n```\n\nThe OAuth DPoP flow requires a server-provided nonce. On first request, the server rejects with the nonce, client retries with it. 
This adds ~1s per token refresh.\n\n## Scaling Concern\n\nThe `findExistingVote` function paginates through ALL votes (100 per page):\n\n| User's Vote Count | PDS Calls | Estimated Latency |\n|-------------------|-----------|-------------------|\n| 36 (current) | 1 | ~1s |\n| 200 | 2 | ~2s |\n| 500 | 5 | ~5s |\n| 1000 | 10 | ~10s |\n| 2000+ | 20+ | Timeout risk (30s limit) |\n\n**Active user projection**: 20 votes/day × 30 days = 600 votes/month → UX degradation within weeks of active use.\n\n## Current Architecture Flow\n\n```\n┌─────────────┐ ┌─────────────┐ ┌─────────────┐\n│ Mobile App │────▶│ AppView │────▶│ User's PDS │\n└─────────────┘ └─────────────┘ └─────────────┘\n │\n ▼\n ┌─────────────┐\n │ Vote Cache │ (in-memory, per-user)\n └─────────────┘\n```\n\n**Vote Creation Flow:**\n1. Mobile sends vote request to AppView\n2. AppView refreshes OAuth token (DPoP nonce retry) → **+1s**\n3. AppView calls `listRecords` on PDS to find existing vote → **+1s per 100 votes**\n4. AppView creates/deletes record on PDS → **+0.5s**\n5. Total: **2-3s+ for 36 votes**\n\n**First Feed Load Flow:**\n1. Mobile requests feed\n2. AppView checks if vote cache populated\n3. If not, calls `listRecords` to fetch ALL votes → **+800ms**\n4. 
Returns feed with viewer vote state\n\n## The Irony\n\nA vote cache already exists and is properly maintained:\n- Updated on every vote create/delete (`service_impl.go:157-159`, `service_impl.go:209-215`)\n- Indexed by subject URI for O(1) lookup\n- Cleared on sign-out\n\nBut it's **bypassed for vote existence checks** because the code treats PDS as \"source of truth\" to avoid eventual consistency issues.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-13T14:29:00.821389478-08:00","updated_at":"2026-01-21T19:54:45.676280382-08:00","closed_at":"2026-01-21T19:54:45.676280382-08:00","source_repo":"."} 9 {"id":"Coves-iw5","content_hash":"d3379c617b7583f6b88a0523b3cdd1e4415176877ab00b48710819f2484c4856","title":"Apply functional options pattern to NewGetCommunityHandler","description":"Location: internal/api/handlers/communityFeed/get.go\n\nApply functional options pattern for optional dependencies (votes, bluesky).\n\nDepends on: Coves-jdf (NewPostService refactor should be done first to establish pattern)\nParent: Coves-8k1","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-22T21:35:27.369297201-08:00","updated_at":"2025-12-22T21:35:58.115771178-08:00","source_repo":"."} 10 {"id":"Coves-jdf","content_hash":"cb27689d71f44fd555e29d2988f2ad053efb6c565cd4f803ff68eaade59c7546","title":"Apply functional options pattern to NewPostService","description":"Location: internal/core/posts/service.go\n\nCurrent constructor (7 params, 4 optional):\n```go\nfunc NewPostService(repo Repository, communityService communities.Service, aggregatorService aggregators.Service, blobService blobs.Service, unfurlService unfurl.Service, blueskyService blueskypost.Service, pdsURL string) Service\n```\n\nRefactor to:\n```go\ntype Option func(*postService)\n\nfunc WithAggregatorService(svc aggregators.Service) Option\nfunc WithBlobService(svc blobs.Service) Option\nfunc WithUnfurlService(svc unfurl.Service) Option\nfunc WithBlueskyService(svc 
blueskypost.Service) Option\n\nfunc NewPostService(repo Repository, communityService communities.Service, pdsURL string, opts ...Option) Service\n```\n\nFiles to update:\n- internal/core/posts/service.go (define Option type and With* functions)\n- cmd/server/main.go (production caller)\n- ~15 test files with call sites\n\nStart with this one as it has the most params and is most impacted.\nParent: Coves-8k1","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-22T21:35:27.264325344-08:00","updated_at":"2025-12-22T21:35:58.003863381-08:00","source_repo":"."} 11 {"id":"Coves-p44","content_hash":"6f12091f6e5f1ad9812f8da4ecd720e0f9df1afd1fdb593b3e52c32be0193d94","title":"Bluesky embed conversion Phase 2: resolve post and populate CID","description":"When converting a Bluesky URL to a social.coves.embed.post, we need to:\n\n1. Call blueskyService.ResolvePost() to get the full post data including CID\n2. Populate both URI and CID in the strongRef\n3. Consider caching/re-using resolved post data for rendering\n\nCurrently disabled in Phase 1 (text-only) because:\n- social.coves.embed.post requires a valid CID in com.atproto.repo.strongRef\n- Empty CID causes PDS to reject the record creation\n\nRelated files:\n- internal/core/posts/service.go:tryConvertBlueskyURLToPostEmbed()\n- internal/atproto/lexicon/social/coves/embed/post.json\n\nThis is part of the Bluesky post cross-posting feature (images/embeds phase).","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-22T21:25:23.540135876-08:00","updated_at":"2025-12-23T14:41:49.014541876-08:00","closed_at":"2025-12-23T14:41:49.014541876-08:00","source_repo":"."}
+58 -10
internal/core/votes/service_impl.go
···
  2
  3   import (
  4       "context"
  5       "fmt"
  6       "log/slog"
  7       "strings"
···
121       // handles orphaned votes correctly by only updating counts for non-deleted subjects.
122       // This avoids race conditions and eventual consistency issues.
123
124 -     // Check for existing vote by querying PDS directly (source of truth)
125 -     // This avoids eventual consistency issues with the AppView database
126 -     existing, err := s.findExistingVote(ctx, pdsClient, req.Subject.URI)
127       if err != nil {
128           s.logger.Error("failed to check existing vote on PDS",
129               "error", err,
···
239           return fmt.Errorf("failed to create PDS client: %w", err)
240       }
241
242 -     // Find existing vote by querying PDS directly (source of truth)
243 -     // This avoids eventual consistency issues with the AppView database
244 -     existing, err := s.findExistingVote(ctx, pdsClient, req.Subject.URI)
245       if err != nil {
246           s.logger.Error("failed to find vote on PDS",
247               "error", err,
···
310       Direction string
311   }
312
313 - // findExistingVote queries the user's PDS directly to find an existing vote for a subject.
314 - // This avoids eventual consistency issues with the AppView database populated by Jetstream.
315 - // Paginates through all vote records to handle users with >100 votes.
316   // Returns the vote record with rkey, or nil if no vote exists for the subject.
317 - func (s *voteService) findExistingVote(ctx context.Context, pdsClient pds.Client, subjectURI string) (*existingVote, error) {
318       cursor := ""
319       const pageSize = 100
320
···
  2
  3   import (
  4       "context"
  5 +     "errors"
  6       "fmt"
  7       "log/slog"
  8       "strings"
···
122       // handles orphaned votes correctly by only updating counts for non-deleted subjects.
123       // This avoids race conditions and eventual consistency issues.
124
125 +     // Check for existing vote using cache with PDS fallback
126 +     // First check populates cache from PDS, subsequent checks are O(1) lookups
127 +     existing, err := s.findExistingVoteWithCache(ctx, pdsClient, session.AccountDID.String(), req.Subject.URI)
128       if err != nil {
129           s.logger.Error("failed to check existing vote on PDS",
130               "error", err,
···
240           return fmt.Errorf("failed to create PDS client: %w", err)
241       }
242
243 +     // Find existing vote using cache with PDS fallback
244 +     // First check populates cache from PDS, subsequent checks are O(1) lookups
245 +     existing, err := s.findExistingVoteWithCache(ctx, pdsClient, session.AccountDID.String(), req.Subject.URI)
246       if err != nil {
247           s.logger.Error("failed to find vote on PDS",
248               "error", err,
···
311       Direction string
312   }
313
314 + // findExistingVoteWithCache uses the vote cache for O(1) lookups when available.
315 + // Falls back to direct PDS pagination if cache is unavailable or cannot be populated.
316 + func (s *voteService) findExistingVoteWithCache(ctx context.Context, pdsClient pds.Client, userDID string, subjectURI string) (*existingVote, error) {
317 +     if s.cache != nil {
318 +         if !s.cache.IsCached(userDID) {
319 +             // Populate cache first (fetches all votes via pagination, then cached for subsequent O(1) lookups)
320 +             if err := s.cache.FetchAndCacheFromPDS(ctx, pdsClient); err != nil {
321 +                 // Auth errors won't succeed on fallback either - propagate immediately
322 +                 if errors.Is(err, ErrNotAuthorized) {
323 +                     return nil, err
324 +                 }
325 +                 // Log warning for other errors and fall back to direct PDS query
326 +                 s.logger.Warn("failed to populate vote cache, falling back to PDS pagination",
327 +                     "error", err,
328 +                     "user", userDID,
329 +                     "subject", subjectURI)
330 +             }
331 +         }
332 +
333 +         if s.cache.IsCached(userDID) {
334 +             cached := s.cache.GetVote(userDID, subjectURI)
335 +             if cached == nil {
336 +                 s.logger.Debug("vote existence check via cache: not found",
337 +                     "user", userDID,
338 +                     "subject", subjectURI)
339 +                 return nil, nil // No vote exists
340 +             }
341 +             s.logger.Debug("vote existence check via cache: found",
342 +                 "user", userDID,
343 +                 "subject", subjectURI,
344 +                 "direction", cached.Direction)
345 +             return &existingVote{
346 +                 URI:       cached.URI,
347 +                 RKey:      cached.RKey,
348 +                 Direction: cached.Direction,
349 +                 // CID not cached - not needed for toggle/delete operations
350 +             }, nil
351 +         }
352 +     }
353 +
354 +     // Fallback: query PDS directly via pagination
355 +     s.logger.Debug("vote existence check via PDS pagination (cache unavailable)",
356 +         "user", userDID,
357 +         "subject", subjectURI)
358 +     return s.findExistingVoteFromPDS(ctx, pdsClient, subjectURI)
359 + }
360 +
361 + // findExistingVoteFromPDS queries the user's PDS directly to find an existing vote for a subject.
362 + // This is the slow fallback path that paginates through all vote records.
363 + // Prefer findExistingVoteWithCache for production use.
364   // Returns the vote record with rkey, or nil if no vote exists for the subject.
365 + func (s *voteService) findExistingVoteFromPDS(ctx context.Context, pdsClient pds.Client, subjectURI string) (*existingVote, error) {
366       cursor := ""
367       const pageSize = 100
368