# ATCR Quota System

This document describes ATCR's storage quota implementation, inspired by Harbor's proven approach to per-project blob tracking with deduplication.

## Table of Contents

- [Overview](#overview)
- [Harbor's Approach (Reference Implementation)](#harbors-approach-reference-implementation)
- [Storage Options](#storage-options)
- [Quota Data Model](#quota-data-model)
- [Push Flow (Detailed)](#push-flow-detailed)
- [Delete Flow](#delete-flow)
- [Garbage Collection](#garbage-collection)
- [Quota Reconciliation](#quota-reconciliation)
- [Configuration](#configuration)
- [Trade-offs & Design Decisions](#trade-offs--design-decisions)
- [Future Enhancements](#future-enhancements)

## Overview

ATCR implements per-user storage quotas to:

1. **Limit storage consumption** on shared hold services
2. **Track actual S3 costs** (what new data was added)
3. **Benefit from deduplication** (users only pay once per layer)
4. **Provide transparency** (show users their storage usage)

**Key principle:** Users pay for layers they've uploaded, but only ONCE per layer regardless of how many images reference it.

### Example Scenario

```
Alice pushes myapp:v1 (layers A, B, C - each 100MB)
→ Alice's quota: +300MB (all new layers)

Alice pushes myapp:v2 (layers A, B, D)
→ Layers A, B already claimed by Alice
→ Layer D is new (100MB)
→ Alice's quota: +100MB (only D is new)
→ Total: 400MB

Bob pushes his-app:latest (layers A, E)
→ Layer A already exists in S3 (uploaded by Alice)
→ Bob claims it for the first time → +100MB to Bob's quota
→ Layer E is new → +100MB to Bob's quota
→ Bob's quota: 200MB

Physical S3 storage:   500MB (A, B, C, D, E)
Claimed storage:       600MB (Alice: 400MB, Bob: 200MB)
Deduplication savings: 100MB (layer A shared)
```

## Harbor's Approach (Reference Implementation)

Harbor is built on distribution/distribution (same as ATCR) and implements quotas as middleware. Their approach:

### Key Insights from Harbor
1. **"Shared blobs are only computed once per project"**
   - Each project tracks which blobs it has uploaded
   - The same blob used in multiple images counts only once per project
   - Different projects claiming the same blob each pay for it

2. **Quota checked when the manifest is pushed**
   - Blobs upload first (presigned URLs, can't intercept)
   - The manifest is pushed last → the quota check happens here
   - Can reject the manifest if quota is exceeded (orphaned blobs cleaned by GC)

3. **Middleware-based implementation**
   - distribution/distribution has NO built-in quota support
   - Harbor added it as request preprocessing middleware
   - Uses a database (PostgreSQL) or Redis for quota storage

4. **Per-project ownership model**
   - Blobs are physically deduplicated globally
   - Quota accounting is logical (per-project claims)
   - Total claimed storage can exceed physical storage

### References

- Harbor Quota Documentation: https://goharbor.io/docs/1.10/administration/configure-project-quotas/
- Harbor Source: https://github.com/goharbor/harbor (see `src/controller/quota`)

## Storage Options

The hold service needs to store quota data somewhere. Two options:

### Option 1: S3-Based Storage (Recommended for BYOS)

Store quota metadata alongside blobs in the same S3 bucket:

```
Bucket structure:
/docker/registry/v2/blobs/sha256/ab/abc123.../data  ← actual blobs
/atcr/quota/did:plc:alice.json                      ← quota tracking
/atcr/quota/did:plc:bob.json
```

**Pros:**
- ✅ No separate database needed
- ✅ Single S3 bucket (better UX - no second bucket to configure)
- ✅ Quota data lives with the blobs
- ✅ Hold service stays relatively stateless
- ✅ Works with any S3-compatible service (Storj, Minio, Upcloud, Fly.io)

**Cons:**
- ❌ Slower than a local database (network round-trip)
- ❌ Eventual consistency issues
- ❌ Race conditions on concurrent updates
- ❌ Extra S3 API costs (GET/PUT per upload)

**Performance:**
- Each blob upload: 1 HEAD (blob exists?) + 1 GET (quota) + 1 PUT (update quota)
- Typical latency: 100-200ms total overhead
- For high-throughput registries, consider SQLite

### Option 2: SQLite Database (Recommended for Shared Holds)

A local database in the hold service:

```bash
/var/lib/atcr/hold-quota.db
```

**Pros:**
- ✅ Fast local queries (no network latency)
- ✅ ACID transactions (no race conditions)
- ✅ Efficient for high-throughput registries
- ✅ Can use foreign keys and joins

**Cons:**
- ❌ Makes the hold service stateful (persistent volume needed)
- ❌ Not ideal for ephemeral BYOS deployments
- ❌ Backup/restore complexity
- ❌ Multi-instance scaling requires a shared database

**Schema:**

```sql
CREATE TABLE user_quotas (
    did         TEXT PRIMARY KEY,
    quota_limit INTEGER NOT NULL DEFAULT 10737418240, -- 10GB
    quota_used  INTEGER NOT NULL DEFAULT 0,
    updated_at  TIMESTAMP
);

CREATE TABLE claimed_layers (
    did        TEXT NOT NULL,
    digest     TEXT NOT NULL,
    size       INTEGER NOT NULL,
    claimed_at TIMESTAMP,
    PRIMARY KEY (did, digest)
);
```

### Recommendation

- **BYOS (user-owned holds):** S3-based (keeps the hold service ephemeral)
- **Shared holds (multi-user):** SQLite (better performance and consistency)
- **High-traffic production:** SQLite or PostgreSQL (Harbor uses this)

## Quota Data Model

### Quota File Format (S3-based)

```json
{
  "did": "did:plc:alice123",
  "limit": 10737418240,
  "used": 5368709120,
  "claimed_layers": {
    "sha256:abc123...": 104857600,
    "sha256:def456...": 52428800,
    "sha256:789ghi...": 209715200
  },
  "last_updated": "2025-10-09T12:34:56Z",
  "version": 1
}
```

**Fields:**
- `did`: User's ATProto DID
- `limit`: Maximum storage in bytes (default: 10GB)
- `used`: Current storage usage in bytes (sum of `claimed_layers`)
- `claimed_layers`: Map of digest → size for all layers the user has uploaded
- `last_updated`: Timestamp of the last quota update
- `version`: Schema version for future migrations

### Why Track Individual Layers?

**Q: Can't we just track a counter?**

**A: We need layer tracking for:**
1. **Deduplication detection**
   - Check if the user already claimed a layer → free upload
   - Example: updating an image reuses most layers

2. **Accurate deletes**
   - When a manifest is deleted, only decrement unclaimed layers
   - A user may have 5 images sharing layer A; deleting 1 image doesn't free layer A

3. **Quota reconciliation**
   - Verify quota matches reality by listing the user's manifests
   - Recalculate from layers in manifests vs the `claimed_layers` map

4. **Auditing**
   - "Show me what I'm storing"
   - Users can see which layers consume their quota

## Push Flow (Detailed)

### Step-by-Step: User Pushes Image

```
┌──────────┐                 ┌──────────┐                 ┌──────────┐
│  Client  │                 │   Hold   │                 │    S3    │
│ (Docker) │                 │ Service  │                 │  Bucket  │
└──────────┘                 └──────────┘                 └──────────┘
     │                            │                            │
     │ 1. PUT /v2/.../blobs/      │                            │
     │    upload?digest=sha256:abc│                            │
     ├───────────────────────────>│                            │
     │                            │ 2. Check if blob exists    │
     │                            │    (Stat/HEAD request)     │
     │                            ├───────────────────────────>│
     │                            │<───────────────────────────┤
     │                            │    200 OK (exists) or      │
     │                            │    404 Not Found           │
     │                            │ 3. Read user quota         │
     │                            │    GET /atcr/quota/{did}   │
     │                            ├───────────────────────────>│
     │                            │<───────────────────────────┤
     │                            │    quota.json              │
     │                            │ 4. Calculate quota impact  │
     │                            │    - If digest in          │
     │                            │      claimed_layers: 0     │
     │                            │    - Else: size            │
     │                            │ 5. Check quota limit       │
     │                            │    used + impact <= limit? │
     │                            │ 6. Update quota            │
     │                            │    PUT /atcr/quota/{did}   │
     │                            ├───────────────────────────>│
     │                            │<───────────────────────────┤
     │                            │    200 OK                  │
     │ 7. Presigned URL           │                            │
     │<───────────────────────────┤                            │
     │    {url: "https://s3..."}  │                            │
     │ 8. Upload blob to S3       │                            │
     ├────────────────────────────┼───────────────────────────>│
     │ 9. 200 OK                  │                            │
     │<───────────────────────────┼────────────────────────────┤
     │                            │                            │
```
### Implementation (Pseudocode)

```go
// cmd/hold/main.go - HandlePutPresignedURL
func (s *HoldService) HandlePutPresignedURL(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()

	var req PutPresignedURLRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}

	// Step 1: Check if blob already exists in S3
	// req.Digest is "<algorithm>:<hex>", e.g. "sha256:abc123..."
	algorithm, hex, _ := strings.Cut(req.Digest, ":")
	blobPath := fmt.Sprintf("/docker/registry/v2/blobs/%s/%s/%s/data",
		algorithm, hex[:2], hex)
	_, err := s.driver.Stat(ctx, blobPath)
	blobExists := err == nil

	// Step 2: Read quota from S3 (or SQLite)
	quota, err := s.quotaManager.GetQuota(req.DID)
	if err != nil {
		// First upload - create quota with defaults
		quota = &Quota{
			DID:           req.DID,
			Limit:         s.config.QuotaDefaultLimit,
			Used:          0,
			ClaimedLayers: make(map[string]int64),
		}
	}

	// Step 3: Calculate quota impact
	quotaImpact := req.Size // Default: assume new layer
	if _, alreadyClaimed := quota.ClaimedLayers[req.Digest]; alreadyClaimed {
		// User already uploaded this layer before
		quotaImpact = 0
		log.Printf("Layer %s already claimed by %s, no quota impact", req.Digest, req.DID)
	} else if blobExists {
		// Blob exists in S3 (uploaded by another user), but this user is
		// claiming it for the first time - still counts against their quota
		log.Printf("Layer %s exists globally but new to %s, quota impact: %d",
			req.Digest, req.DID, quotaImpact)
	} else {
		// Brand new blob - will be uploaded to S3
		log.Printf("New layer %s for %s, quota impact: %d", req.Digest, req.DID, quotaImpact)
	}

	// Step 4: Check quota limit
	if quota.Used+quotaImpact > quota.Limit {
		http.Error(w, fmt.Sprintf(
			"quota exceeded: used=%d, impact=%d, limit=%d",
			quota.Used, quotaImpact, quota.Limit,
		), http.StatusPaymentRequired) // 402
		return
	}

	// Step 5: Update quota (optimistic - before the upload completes)
	quota.Used += quotaImpact
	if quotaImpact > 0 {
		quota.ClaimedLayers[req.Digest] = req.Size
	}
	quota.LastUpdated = time.Now()
	if err := s.quotaManager.SaveQuota(quota); err != nil {
		http.Error(w, "failed to update quota", http.StatusInternalServerError)
		return
	}

	// Step 6: Generate presigned URL
	presignedURL, err := s.getUploadURL(ctx, req.Digest, req.Size, req.DID)
	if err != nil {
		// Roll back the quota update on error
		quota.Used -= quotaImpact
		if quotaImpact > 0 {
			delete(quota.ClaimedLayers, req.Digest)
		}
		s.quotaManager.SaveQuota(quota)
		http.Error(w, "failed to generate presigned URL", http.StatusInternalServerError)
		return
	}

	// Step 7: Return presigned URL + quota info
	resp := PutPresignedURLResponse{
		URL:       presignedURL,
		ExpiresAt: time.Now().Add(15 * time.Minute),
		QuotaInfo: QuotaInfo{
			Used:           quota.Used,
			Limit:          quota.Limit,
			Available:      quota.Limit - quota.Used,
			Impact:         quotaImpact,
			AlreadyClaimed: quotaImpact == 0,
		},
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(resp)
}
```

### Race Condition Handling

**Problem:** Two concurrent uploads of the same blob

```
Time    User A                     User B
0ms     Upload layer X (100MB)
10ms                               Upload layer X (100MB)
20ms    Check exists: NO           Check exists: NO
30ms    Quota impact: 100MB        Quota impact: 100MB
40ms    Update quota A: +100MB     Update quota B: +100MB
50ms    Generate presigned URL     Generate presigned URL
100ms   Upload to S3 completes     Upload to S3 (overwrites A's)
```

**Result:** Both users are charged 100MB, but only 100MB is stored in S3.

**Mitigation strategies:**

1. **Accept eventual consistency** (recommended for S3-based)
   - Run periodic reconciliation to fix discrepancies
   - A small inconsistency window (minutes) is acceptable
   - Reconciliation uses the PDS as source of truth

2. **Optimistic locking** (S3 ETags)

   ```go
   // Use S3 ETags for conditional writes
   oldETag := getQuotaFileETag(did)
   err := putQuotaFileWithCondition(quota, oldETag)
   if err == PreconditionFailed {
       // Retry with a fresh read
   }
   ```

3. **Database transactions** (SQLite-based)

   ```sql
   -- SQLite has no SELECT ... FOR UPDATE; BEGIN IMMEDIATE takes the
   -- write lock up front so the read-modify-write is serialized
   BEGIN IMMEDIATE;
   SELECT quota_used, quota_limit FROM user_quotas WHERE did = ?;
   UPDATE user_quotas SET quota_used = quota_used + ? WHERE did = ?;
   COMMIT;
   ```
## Delete Flow

### Manifest Deletion via AppView UI

When a user deletes a manifest through the AppView web interface:

```
┌──────────┐      ┌──────────┐      ┌──────────┐      ┌──────────┐
│   User   │      │ AppView  │      │   Hold   │      │   PDS    │
│    UI    │      │ Database │      │ Service  │      │          │
└──────────┘      └──────────┘      └──────────┘      └──────────┘
     │                 │                 │                 │
     │ DELETE manifest │                 │                 │
     ├────────────────>│                 │                 │
     │                 │ 1. Get manifest and layers        │
     │                 │ 2. Check which layers are still   │
     │                 │    referenced by the user's       │
     │                 │    other manifests                │
     │                 │ 3. DELETE manifest from PDS       │
     │                 ├─────────────────┼────────────────>│
     │                 │ 4. POST /quota/decrement          │
     │                 │    {layers: [...]}                │
     │                 ├────────────────>│                 │
     │                 │                 │ 5. Update quota │
     │                 │                 │    Remove       │
     │                 │                 │    unclaimed    │
     │                 │                 │    layers       │
     │                 │ 6. 200 OK       │                 │
     │                 │<────────────────┤                 │
     │                 │ 7. Delete from DB                 │
     │ 8. Success      │                 │                 │
     │<────────────────┤                 │                 │
     │                 │                 │                 │
```

### AppView Implementation

```go
// pkg/appview/handlers/manifest.go
func (h *ManifestHandler) DeleteManifest(w http.ResponseWriter, r *http.Request) {
	did := r.Context().Value("auth.did").(string)
	repository := chi.URLParam(r, "repository")
	digest := chi.URLParam(r, "digest")

	// Step 1: Get manifest and its layers from database
	manifest, err := db.GetManifest(h.db, digest)
	if err != nil {
		http.Error(w, "manifest not found", 404)
		return
	}
	layers, err := db.GetLayersForManifest(h.db, manifest.ID)
	if err != nil {
		http.Error(w, "failed to get layers", 500)
		return
	}

	// Step 2: For each layer, check if the user still references it
	// in other manifests
	layersToDecrement := []LayerInfo{}
	for _, layer := range layers {
		// Query: does this user have other manifests using this layer?
		stillReferenced, err := db.CheckLayerReferencedByUser(
			h.db, did, repository, layer.Digest, manifest.ID,
		)
		if err != nil {
			http.Error(w, "failed to check layer references", 500)
			return
		}
		if !stillReferenced {
			// This layer is no longer used by the user
			layersToDecrement = append(layersToDecrement, LayerInfo{
				Digest: layer.Digest,
				Size:   layer.Size,
			})
		}
	}

	// Step 3: Delete manifest from the user's PDS
	// (access token and record key resolution elided in this pseudocode)
	atprotoClient := atproto.NewClient(manifest.PDSEndpoint, did, accessToken)
	err = atprotoClient.DeleteRecord(ctx, atproto.ManifestCollection, manifestRKey)
	if err != nil {
		http.Error(w, "failed to delete from PDS", 500)
		return
	}

	// Step 4: Notify hold service to decrement quota
	if len(layersToDecrement) > 0 {
		holdClient := &http.Client{}
		decrementReq := QuotaDecrementRequest{
			DID:    did,
			Layers: layersToDecrement,
		}
		body, _ := json.Marshal(decrementReq)
		resp, err := holdClient.Post(
			manifest.HoldEndpoint+"/quota/decrement",
			"application/json",
			bytes.NewReader(body),
		)
		if err != nil || resp.StatusCode != 200 {
			log.Printf("Warning: failed to update quota on hold service: %v", err)
			// Continue anyway - GC reconciliation will fix it
		}
		if resp != nil {
			resp.Body.Close()
		}
	}

	// Step 5: Delete from AppView database
	err = db.DeleteManifest(h.db, did, repository, digest)
	if err != nil {
		http.Error(w, "failed to delete from database", 500)
		return
	}

	w.WriteHeader(http.StatusNoContent)
}
```

### Hold Service Decrement Endpoint

```go
// cmd/hold/main.go
type QuotaDecrementRequest struct {
	DID    string      `json:"did"`
	Layers []LayerInfo `json:"layers"`
}

type LayerInfo struct {
	Digest string `json:"digest"`
	Size   int64  `json:"size"`
}

func (s *HoldService) HandleQuotaDecrement(w http.ResponseWriter, r *http.Request) {
	var req QuotaDecrementRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request", 400)
		return
	}

	// Read current quota
	quota, err := s.quotaManager.GetQuota(req.DID)
	if err != nil {
		http.Error(w, "quota not found", 404)
		return
	}

	// Decrement quota for each layer
	for _, layer := range req.Layers {
		if size, claimed := quota.ClaimedLayers[layer.Digest]; claimed {
			// Remove from claimed layers
			delete(quota.ClaimedLayers, layer.Digest)
			quota.Used -= size
			log.Printf("Decremented quota for %s: layer %s (%d bytes)",
				req.DID, layer.Digest, size)
		} else {
			log.Printf("Warning: layer %s not in claimed_layers for %s",
				layer.Digest, req.DID)
		}
	}

	// Ensure quota.Used doesn't go negative (defensive)
	if quota.Used < 0 {
		log.Printf("Warning: quota.Used went negative for %s, resetting to 0", req.DID)
		quota.Used = 0
	}

	// Save updated quota
	quota.LastUpdated = time.Now()
	if err := s.quotaManager.SaveQuota(quota); err != nil {
		http.Error(w, "failed to save quota", 500)
		return
	}

	// Return updated quota info
	json.NewEncoder(w).Encode(map[string]any{
		"used":  quota.Used,
		"limit": quota.Limit,
	})
}
```

### SQL Query: Check Layer References

```sql
-- pkg/appview/db/queries.go
-- Check if the user still references this layer in other manifests
SELECT COUNT(*)
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.did = ?      -- User's DID
  AND l.digest = ?   -- Layer digest
  AND m.id != ?      -- Exclude the manifest being deleted
```

## Garbage Collection

### Background: Orphaned Blobs

Orphaned blobs accumulate when:

1. A manifest push fails after blobs are uploaded (presigned URLs bypass the hold)
2. Quota is exceeded - manifest rejected, blobs already in S3
3. A user deletes a manifest - blobs are no longer referenced

**GC periodically cleans these up.**

### GC Cron Implementation

Similar to AppView's backfill worker, the hold service can run periodic GC:

```go
// cmd/hold/gc/gc.go
type GarbageCollector struct {
	driver       storagedriver.StorageDriver
	appviewURL   string
	holdURL      string
	quotaManager *quota.Manager
}

// Run garbage collection
func (gc *GarbageCollector) Run(ctx context.Context) error {
	log.Println("Starting garbage collection...")

	// Step 1: Get list of referenced blobs from AppView
	referenced, err := gc.getReferencedBlobs()
	if err != nil {
		return fmt.Errorf("failed to get referenced blobs: %w", err)
	}
	referencedSet := make(map[string]bool)
	for _, digest := range referenced {
		referencedSet[digest] = true
	}
	log.Printf("AppView reports %d referenced blobs", len(referenced))

	// Step 2: Walk S3 blobs
	deletedCount := 0
	reclaimedBytes := int64(0)
	err = gc.driver.Walk(ctx, "/docker/registry/v2/blobs", func(fileInfo storagedriver.FileInfo) error {
		if fileInfo.IsDir() {
			return nil // Skip directories
		}
		// Extract digest from path
		// Path: /docker/registry/v2/blobs/sha256/ab/abc123.../data
		digest := extractDigestFromPath(fileInfo.Path())
		if !referencedSet[digest] {
			// Unreferenced blob - delete it
			size := fileInfo.Size()
			if err := gc.driver.Delete(ctx, fileInfo.Path()); err != nil {
				log.Printf("Failed to delete blob %s: %v", digest, err)
				return nil // Continue anyway
			}
			deletedCount++
			reclaimedBytes += size
			log.Printf("GC: Deleted unreferenced blob %s (%d bytes)", digest, size)
		}
		return nil
	})
	if err != nil {
		return fmt.Errorf("failed to walk blobs: %w", err)
	}

	log.Printf("GC complete: deleted %d blobs, reclaimed %d bytes",
		deletedCount, reclaimedBytes)
	return nil
}

// Get referenced blobs from AppView
func (gc *GarbageCollector) getReferencedBlobs() ([]string, error) {
	// Query AppView for all blobs referenced by manifests
	// stored in THIS hold service
	url := fmt.Sprintf("%s/internal/blobs/referenced?hold=%s",
		gc.appviewURL, url.QueryEscape(gc.holdURL))
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var result struct {
		Blobs []string `json:"blobs"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	return result.Blobs, nil
}

// extractDigestFromPath recovers "<algorithm>:<hex>" from a blob data path
// like /docker/registry/v2/blobs/sha256/ab/abc123.../data
func extractDigestFromPath(path string) string {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) < 4 {
		return ""
	}
	// parts end with: <algorithm>, <prefix>, <hex>, "data"
	return parts[len(parts)-4] + ":" + parts[len(parts)-2]
}
```

### AppView Internal API

```go
// pkg/appview/handlers/internal.go
// Get all referenced blobs for a specific hold
func (h *InternalHandler) GetReferencedBlobs(w http.ResponseWriter, r *http.Request) {
	holdEndpoint := r.URL.Query().Get("hold")
	if holdEndpoint == "" {
		http.Error(w, "missing hold parameter", 400)
		return
	}

	// Query database for all layers in manifests stored in this hold
	query := `
		SELECT DISTINCT l.digest
		FROM layers l
		JOIN manifests m ON l.manifest_id = m.id
		WHERE m.hold_endpoint = ?
	`
	rows, err := h.db.Query(query, holdEndpoint)
	if err != nil {
		http.Error(w, "database error", 500)
		return
	}
	defer rows.Close()

	blobs := []string{}
	for rows.Next() {
		var digest string
		if err := rows.Scan(&digest); err != nil {
			continue
		}
		blobs = append(blobs, digest)
	}

	json.NewEncoder(w).Encode(map[string]any{
		"blobs": blobs,
		"count": len(blobs),
		"hold":  holdEndpoint,
	})
}
```

### GC Cron Schedule

```go
// cmd/hold/main.go
func main() {
	// ... service setup ...

	// Start GC cron if enabled
	if os.Getenv("GC_ENABLED") == "true" {
		gcInterval := 24 * time.Hour // Daily by default
		go func() {
			ticker := time.NewTicker(gcInterval)
			defer ticker.Stop()
			for range ticker.C {
				if err := garbageCollector.Run(context.Background()); err != nil {
					log.Printf("GC error: %v", err)
				}
			}
		}()
		log.Printf("GC cron started: runs every %v", gcInterval)
	}

	// Start server...
}
```

## Quota Reconciliation

### PDS as Source of Truth

**Key insight:** Manifest records in the PDS are publicly readable (no OAuth needed for reads).

Each manifest contains:
- Repository name
- Digest
- Layers array with digest + size
- Hold endpoint

The hold service can query the PDS to calculate the user's true quota:
```
1. List all io.atcr.manifest records for the user
2. Filter manifests where holdEndpoint == this hold service
3. Extract unique layers (deduplicate by digest)
4. Sum layer sizes = true quota usage
5. Compare to the quota file
6. Fix discrepancies
```

### Implementation

```go
// cmd/hold/quota/reconcile.go
type Reconciler struct {
	quotaManager    *Manager
	atprotoResolver *atproto.Resolver
	holdURL         string
}

// ReconcileUser recalculates quota from PDS manifests
func (r *Reconciler) ReconcileUser(ctx context.Context, did string) error {
	log.Printf("Reconciling quota for %s", did)

	// Step 1: Resolve the user's PDS endpoint
	identity, err := r.atprotoResolver.ResolveIdentity(ctx, did)
	if err != nil {
		return fmt.Errorf("failed to resolve DID: %w", err)
	}

	// Step 2: Create an unauthenticated ATProto client
	// (manifest records are public - no OAuth needed)
	client := atproto.NewClient(identity.PDSEndpoint, did, "")

	// Step 3: List all manifest records for this user
	manifests, err := client.ListRecords(ctx, atproto.ManifestCollection, 1000)
	if err != nil {
		return fmt.Errorf("failed to list manifests: %w", err)
	}

	// Step 4: Filter manifests stored in THIS hold service
	// and extract unique layers
	uniqueLayers := make(map[string]int64) // digest -> size
	for _, record := range manifests {
		var manifest atproto.ManifestRecord
		if err := json.Unmarshal(record.Value, &manifest); err != nil {
			log.Printf("Warning: failed to parse manifest: %v", err)
			continue
		}
		// Only count manifests stored in this hold
		if manifest.HoldEndpoint != r.holdURL {
			continue
		}
		// Add config blob
		if manifest.Config.Digest != "" {
			uniqueLayers[manifest.Config.Digest] = manifest.Config.Size
		}
		// Add layer blobs
		for _, layer := range manifest.Layers {
			uniqueLayers[layer.Digest] = layer.Size
		}
	}

	// Step 5: Calculate true quota usage
	trueUsage := int64(0)
	for _, size := range uniqueLayers {
		trueUsage += size
	}
	log.Printf("User %s true usage from PDS: %d bytes (%d unique layers)",
		did, trueUsage, len(uniqueLayers))

	// Step 6: Compare with current quota file
	quota, err := r.quotaManager.GetQuota(did)
	if err != nil {
		log.Printf("No existing quota for %s, creating new", did)
		quota = &Quota{
			DID:           did,
			Limit:         r.quotaManager.DefaultLimit,
			ClaimedLayers: make(map[string]int64),
		}
	}

	// Step 7: Fix discrepancies
	if quota.Used != trueUsage || len(quota.ClaimedLayers) != len(uniqueLayers) {
		log.Printf("Quota mismatch for %s: recorded=%d, actual=%d (diff=%d)",
			did, quota.Used, trueUsage, trueUsage-quota.Used)

		// Update quota to match PDS truth
		quota.Used = trueUsage
		quota.ClaimedLayers = uniqueLayers
		quota.LastUpdated = time.Now()
		if err := r.quotaManager.SaveQuota(quota); err != nil {
			return fmt.Errorf("failed to save reconciled quota: %w", err)
		}
		log.Printf("Reconciled quota for %s: %d bytes", did, trueUsage)
	} else {
		log.Printf("Quota for %s is accurate", did)
	}
	return nil
}

// ReconcileAll reconciles all users (run periodically)
func (r *Reconciler) ReconcileAll(ctx context.Context) error {
	// Get a list of all users with quota files
	users, err := r.quotaManager.ListUsers()
	if err != nil {
		return err
	}
	log.Printf("Starting reconciliation for %d users", len(users))
	for _, did := range users {
		if err := r.ReconcileUser(ctx, did); err != nil {
			log.Printf("Failed to reconcile %s: %v", did, err)
			// Continue with other users
		}
	}
	log.Println("Reconciliation complete")
	return nil
}
```

### Reconciliation Cron

```go
// cmd/hold/main.go
func main() {
	// ... setup ...

	// Start reconciliation cron
	if os.Getenv("QUOTA_RECONCILE_ENABLED") == "true" {
		reconcileInterval := 24 * time.Hour // Daily
		go func() {
			ticker := time.NewTicker(reconcileInterval)
			defer ticker.Stop()
			for range ticker.C {
				if err := reconciler.ReconcileAll(context.Background()); err != nil {
					log.Printf("Reconciliation error: %v", err)
				}
			}
		}()
		log.Printf("Quota reconciliation cron started: runs every %v", reconcileInterval)
	}

	// ... start server ...
}
```

### Why PDS as Source of Truth Works
1. **Manifests are canonical** - if a manifest exists in the PDS, the user owns those layers
2. **Public reads** - no OAuth needed, just resolve DID → PDS endpoint
3. **ATProto durability** - the PDS is the user's authoritative data store
4. **AppView is a cache** - the AppView database might lag or have inconsistencies
5. **Reconciliation fixes drift** - periodic sync from the PDS ensures accuracy

**Example reconciliation scenarios:**

- **Orphaned quota entries:** User deleted a manifest from the PDS, but the hold quota still has it
  → Reconciliation removes it from `claimed_layers`
- **Missing quota entries:** User pushed a manifest, but the quota update failed
  → Reconciliation adds it to `claimed_layers`
- **Race condition duplicates:** Two concurrent pushes double-counted a layer
  → Reconciliation fixes to actual usage

## Configuration

### Hold Service Environment Variables

```bash
# .env.hold

# ============================================================================
# Quota Configuration
# ============================================================================

# Enable quota enforcement
QUOTA_ENABLED=true

# Default quota limit per user (bytes)
# 10GB  = 10737418240
# 50GB  = 53687091200
# 100GB = 107374182400
QUOTA_DEFAULT_LIMIT=10737418240

# Storage backend for quota data
# Options: s3, sqlite
QUOTA_STORAGE_BACKEND=s3

# For S3-based storage:
# Quota files stored in the same bucket as blobs
QUOTA_STORAGE_PREFIX=/atcr/quota/

# For SQLite-based storage:
QUOTA_DB_PATH=/var/lib/atcr/hold-quota.db

# ============================================================================
# Garbage Collection
# ============================================================================

# Enable periodic garbage collection
GC_ENABLED=true

# GC interval (default: 24h)
GC_INTERVAL=24h

# AppView URL for GC reference checking
APPVIEW_URL=https://atcr.io

# ============================================================================
# Quota Reconciliation
# ============================================================================

# Enable quota reconciliation from PDS
QUOTA_RECONCILE_ENABLED=true

# Reconciliation interval (default: 24h)
QUOTA_RECONCILE_INTERVAL=24h

# ============================================================================
# Hold Service Identity (Required)
# ============================================================================

# Public URL of this hold service
HOLD_PUBLIC_URL=https://hold1.example.com

# Owner DID (for auto-registration)
HOLD_OWNER=did:plc:xyz123
```

### AppView Configuration

```bash
# .env.appview

# Internal API endpoint for hold services
# Used for GC reference checking
ATCR_INTERNAL_API_ENABLED=true

# Optional: authentication token for internal APIs
ATCR_INTERNAL_API_TOKEN=secret123
```

## Trade-offs & Design Decisions

### 1. Claimed Storage vs Physical Storage

**Decision:** Track claimed storage (logical accounting)

**Why:**
- Predictable for users: "you pay for what you upload"
- No complex cross-user dependencies
- Delete always gives you quota back
- Matches Harbor's proven model

**Trade-off:**
- Total claimed can exceed physical storage
- Users might complain "I uploaded 10GB but S3 only has 6GB"

**Mitigation:**
- Show a deduplication savings metric
- Educate users: "You claimed 10GB, but deduplication saved 4GB"

### 2. S3 vs SQLite for Quota Storage

**Decision:** Support both, recommend based on use case

**S3 Pros:**
- No database to manage
- Quota data lives with the blobs
- Better for ephemeral BYOS

**SQLite Pros:**
- Faster (no network)
- ACID transactions (no race conditions)
- Better for high-traffic shared holds

**Trade-off:**
- S3: eventual consistency, race conditions
- SQLite: stateful service, scaling challenges

**Mitigation:**
- Reconciliation fixes S3 inconsistencies
- SQLite can use a shared DB for multi-instance
### 3. Optimistic Quota Update

**Decision:** Update quota BEFORE the upload completes

**Why:**
- Prevents race conditions (two users uploading simultaneously)
- Can reject before the presigned URL is generated
- Simpler flow

**Trade-off:**
- If the upload fails, quota is already incremented (user "paid" for nothing)

**Mitigation:**
- Reconciliation from the PDS fixes orphaned quota entries
- Acceptable for MVP (upload failures are rare)

### 4. AppView as Intermediary

**Decision:** AppView notifies the hold service on deletes

**Why:**
- AppView already has the manifest/layer database
- Can efficiently check if a layer is still referenced
- Hold service doesn't need to query the PDS on every delete

**Trade-off:**
- AppView → Hold dependency
- Network hop on delete

**Mitigation:**
- If the notification fails, reconciliation fixes the quota
- Eventually consistent is acceptable

### 5. PDS as Source of Truth

**Decision:** Use PDS manifests for reconciliation

**Why:**
- Manifests in the PDS are canonical user data
- Public reads (no OAuth for reconciliation)
- The AppView database might lag or be inconsistent

**Trade-off:**
- Reconciliation requires PDS queries (slower)
- Limited to 1000 manifests per query

**Mitigation:**
- Run reconciliation daily (not real-time)
- Paginate if a user has >1000 manifests

## Future Enhancements

### 1. Quota API Endpoints

```
GET  /quota/usage      - Get current user's quota
GET  /quota/breakdown  - Get storage by repository
POST /quota/limit      - Update user's quota limit (admin)
GET  /quota/stats      - Get hold-wide statistics
```

### 2. Quota Alerts

Notify users when approaching the limit:
- Email/webhook at 80%, 90%, 95%
- Reject uploads at 100% (currently implemented)
- Grace period: allow 105% temporarily

### 3. Tiered Quotas

Different limits based on user tier:
- Free: 10GB
- Pro: 100GB
- Enterprise: unlimited

### 4. Quota Purchasing

Allow users to buy additional storage:
- Stripe integration
- $0.10/GB/month pricing
- Dynamic limit updates
### 5. Cross-Hold Deduplication

If multiple holds share the same S3 bucket:
- Track blob ownership globally
- Split costs proportionally
- More complex, but maximizes deduplication

### 6. Manifest-Based Quota (Alternative Model)

Instead of tracking layers, track manifests:
- Simpler: just count manifest sizes
- No deduplication benefits for users
- Might be acceptable for some use cases

### 7. Redis-Based Quota (High Performance)

For high-traffic registries:
- Use Redis instead of S3/SQLite
- Sub-millisecond quota checks
- Harbor-proven approach

### 8. Quota Visualizations

Web UI showing:
- Storage usage over time
- Top consumers by repository
- Deduplication savings graph
- Layer size distribution

## Appendix: SQL Queries

### Check if User Still References Layer

```sql
-- After deleting a manifest, check if the user has other manifests using this layer
SELECT COUNT(*)
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.did = ?      -- User's DID
  AND l.digest = ?   -- Layer digest to check
  AND m.id != ?      -- Exclude the manifest being deleted
```

### Get All Unique Layers for User

```sql
-- Calculate true quota usage for a user
SELECT DISTINCT l.digest, l.size
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.did = ?
  AND m.hold_endpoint = ?
```

### Get Referenced Blobs for Hold

```sql
-- For GC: get all blobs still referenced by any user of this hold
SELECT DISTINCT l.digest
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.hold_endpoint = ?
```

### Get Storage Stats by Repository

```sql
-- User's storage broken down by repository
SELECT
    m.repository,
    COUNT(DISTINCT m.id)     AS manifest_count,
    COUNT(DISTINCT l.digest) AS unique_layers,
    SUM(l.size)              AS total_size
FROM manifests m
JOIN layers l ON l.manifest_id = m.id
WHERE m.did = ?
  AND m.hold_endpoint = ?
GROUP BY m.repository
ORDER BY total_size DESC
```
## References

- **Harbor Quotas:** https://goharbor.io/docs/1.10/administration/configure-project-quotas/
- **Harbor Source:** https://github.com/goharbor/harbor
- **ATProto Spec:** https://atproto.com/specs/record
- **OCI Distribution Spec:** https://github.com/opencontainers/distribution-spec
- **S3 API Reference:** https://docs.aws.amazon.com/AmazonS3/latest/API/
- **Distribution GC:** https://github.com/distribution/distribution/blob/main/registry/storage/garbagecollect.go

---

**Document Version:** 1.0
**Last Updated:** 2025-10-09
**Author:** Generated from implementation research and Harbor analysis