A container registry that uses the AT Protocol for manifest storage and S3 for blob storage. atcr.io
# ATCR Quota System

This document describes ATCR's storage quota implementation, inspired by Harbor's proven approach to per-project blob tracking with deduplication.

## Table of Contents

- [Overview](#overview)
- [Harbor's Approach (Reference Implementation)](#harbors-approach-reference-implementation)
- [Storage Options](#storage-options)
- [Quota Data Model](#quota-data-model)
- [Push Flow (Detailed)](#push-flow-detailed)
- [Delete Flow](#delete-flow)
- [Garbage Collection](#garbage-collection)
- [Quota Reconciliation](#quota-reconciliation)
- [Configuration](#configuration)
- [Trade-offs & Design Decisions](#trade-offs--design-decisions)
- [Future Enhancements](#future-enhancements)

## Overview

ATCR implements per-user storage quotas to:
1. **Limit storage consumption** on shared hold services
2. **Track actual S3 costs** (what new data was added)
3. **Benefit from deduplication** (users only pay once per layer)
4. **Provide transparency** (show users their storage usage)

**Key principle:** Users pay for layers they've uploaded, but only ONCE per layer, regardless of how many images reference it.

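To make the once-per-layer rule concrete, here is a minimal, self-contained sketch of claimed-layer accounting. The `userQuota` type and `claim` method are illustrative names for this document, not ATCR's actual API:

```go
package main

import "fmt"

// userQuota tracks one user's claimed layers: digest -> size in bytes.
type userQuota struct {
	claimed map[string]int64
	used    int64
}

// claim charges the user for a layer unless they have already claimed it,
// returning the quota impact in bytes. Claiming is idempotent per digest.
func (q *userQuota) claim(digest string, size int64) int64 {
	if _, ok := q.claimed[digest]; ok {
		return 0 // already paid for this layer
	}
	q.claimed[digest] = size
	q.used += size
	return size
}

func main() {
	alice := &userQuota{claimed: map[string]int64{}}

	// myapp:v1 -> layers A, B, C: all new
	for _, d := range []string{"A", "B", "C"} {
		alice.claim(d, 100)
	}
	// myapp:v2 -> layers A, B, D: only D adds quota
	for _, d := range []string{"A", "B", "D"} {
		alice.claim(d, 100)
	}

	fmt.Println(alice.used) // 400 (A+B+C+D, each counted once)
}
```

Deleting an image would reverse `claim` only for digests no longer referenced by any of the user's remaining manifests, which is what the delete flow later in this document implements.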
### Example Scenario

```
Alice pushes myapp:v1 (layers A, B, C - each 100MB)
→ Alice's quota: +300MB (all new layers)

Alice pushes myapp:v2 (layers A, B, D)
→ Layers A, B already claimed by Alice
→ Layer D is new (100MB)
→ Alice's quota: +100MB (only D is new)
→ Total: 400MB

Bob pushes his-app:latest (layers A, E)
→ Layer A already exists in S3 (uploaded by Alice)
→ Bob claims it for the first time → +100MB to Bob's quota
→ Layer E is new → +100MB to Bob's quota
→ Bob's quota: 200MB

Physical S3 storage: 500MB (A, B, C, D, E)
Claimed storage: 600MB (Alice: 400MB, Bob: 200MB)
Deduplication savings: 100MB (layer A shared)
```

## Harbor's Approach (Reference Implementation)

Harbor is built on distribution/distribution (the same registry ATCR uses) and implements quotas as middleware. Their approach:

### Key Insights from Harbor

1. **"Shared blobs are only computed once per project"**
   - Each project tracks which blobs it has uploaded
   - The same blob used in multiple images counts only once per project
   - Different projects claiming the same blob each pay for it

2. **Quota is checked when the manifest is pushed**
   - Blobs upload first (presigned URLs, so uploads can't be intercepted)
   - The manifest is pushed last → the quota check happens there
   - The manifest can be rejected if the quota is exceeded (orphaned blobs are cleaned up by GC)

3. **Middleware-based implementation**
   - distribution/distribution has NO built-in quota support
   - Harbor added it as request-preprocessing middleware
   - Uses a database (PostgreSQL) or Redis for quota storage

4. **Per-project ownership model**
   - Blobs are physically deduplicated globally
   - Quota accounting is logical (per-project claims)
   - Total claimed storage can exceed physical storage

### References

- Harbor Quota Documentation: https://goharbor.io/docs/1.10/administration/configure-project-quotas/
- Harbor Source: https://github.com/goharbor/harbor (see `src/controller/quota`)

## Storage Options

The hold service needs to store quota data somewhere. There are two options:

### Option 1: S3-Based Storage (Recommended for BYOS)

Store quota metadata alongside blobs in the same S3 bucket:

```
Bucket structure:
/docker/registry/v2/blobs/sha256/ab/abc123.../data   ← actual blobs
/atcr/quota/did:plc:alice.json                       ← quota tracking
/atcr/quota/did:plc:bob.json
```

**Pros:**
- ✅ No separate database needed
- ✅ Single S3 bucket (better UX - no second bucket to configure)
- ✅ Quota data lives with the blobs
- ✅ Hold service stays relatively stateless
- ✅ Works with any S3-compatible service (Storj, MinIO, UpCloud, Fly.io)

**Cons:**
- ❌ Slower than a local database (network round trip)
- ❌ Eventual-consistency issues
- ❌ Race conditions on concurrent updates
- ❌ Extra S3 API costs (GET/PUT per upload)

**Performance:**
- Each blob upload costs 1 HEAD (blob exists?) + 1 GET (quota) + 1 PUT (update quota)
- Typical latency: 100-200ms total overhead
- For high-throughput registries, consider SQLite

### Option 2: SQLite Database (Recommended for Shared Holds)

A local database in the hold service:

```bash
/var/lib/atcr/hold-quota.db
```

**Pros:**
- ✅ Fast local queries (no network latency)
- ✅ ACID transactions (no race conditions)
- ✅ Efficient for high-throughput registries
- ✅ Can use foreign keys and joins

**Cons:**
- ❌ Makes the hold service stateful (persistent volume needed)
- ❌ Not ideal for ephemeral BYOS deployments
- ❌ Backup/restore complexity
- ❌ Multi-instance scaling requires a shared database

**Schema:**
```sql
CREATE TABLE user_quotas (
    did         TEXT PRIMARY KEY,
    quota_limit INTEGER NOT NULL DEFAULT 10737418240, -- 10GB
    quota_used  INTEGER NOT NULL DEFAULT 0,
    updated_at  TIMESTAMP
);

CREATE TABLE claimed_layers (
    did        TEXT NOT NULL,
    digest     TEXT NOT NULL,
    size       INTEGER NOT NULL,
    claimed_at TIMESTAMP,
    PRIMARY KEY (did, digest)
);
```

### Recommendation

- **BYOS (user-owned holds):** S3-based (keeps the hold service ephemeral)
- **Shared holds (multi-user):** SQLite (better performance and consistency)
- **High-traffic production:** SQLite or PostgreSQL (the latter is what Harbor uses)

## Quota Data Model

### Quota File Format (S3-based)

```json
{
  "did": "did:plc:alice123",
  "limit": 10737418240,
  "used": 5368709120,
  "claimed_layers": {
    "sha256:abc123...": 104857600,
    "sha256:def456...": 52428800,
    "sha256:789ghi...": 209715200
  },
  "last_updated": "2025-10-09T12:34:56Z",
  "version": 1
}
```

**Fields:**
- `did`: The user's ATProto DID
- `limit`: Maximum storage in bytes (default: 10GB)
- `used`: Current storage usage in bytes (the sum of `claimed_layers`)
- `claimed_layers`: Map of digest → size for every layer the user has uploaded
- `last_updated`: Timestamp of the last quota update
- `version`: Schema version for future migrations

### Why Track Individual Layers?

**Q: Can't we just track a counter?**

**A:** We need per-layer tracking for:

1. **Deduplication detection**
   - Check whether the user has already claimed a layer → free upload
   - Example: updating an image reuses most of its layers

2. **Accurate deletes**
   - When a manifest is deleted, decrement only the layers that are no longer claimed
   - A user may have 5 images sharing layer A - deleting 1 image doesn't free layer A

3. **Quota reconciliation**
   - Verify the quota matches reality by listing the user's manifests
   - Recalculate from the layers in manifests vs. the claimed_layers map

4. **Auditing**
   - "Show me what I'm storing"
   - Users can see which layers consume their quota

## Push Flow (Detailed)

### Step-by-Step: User Pushes Image

```
┌──────────┐                  ┌──────────┐                  ┌──────────┐
│  Client  │                  │   Hold   │                  │    S3    │
│ (Docker) │                  │  Service │                  │  Bucket  │
└──────────┘                  └──────────┘                  └──────────┘
      │                             │                             │
      │ 1. PUT /v2/.../blobs/       │                             │
      │    upload?digest=sha256:abc │                             │
      ├────────────────────────────>│                             │
      │                             │                             │
      │                             │ 2. Check if blob exists     │
      │                             │    (Stat/HEAD request)      │
      │                             ├────────────────────────────>│
      │                             │<────────────────────────────┤
      │                             │    200 OK (exists) or       │
      │                             │    404 Not Found            │
      │                             │                             │
      │                             │ 3. Read user quota          │
      │                             │    GET /atcr/quota/{did}    │
      │                             ├────────────────────────────>│
      │                             │<────────────────────────────┤
      │                             │    quota.json               │
      │                             │                             │
      │                             │ 4. Calculate quota impact   │
      │                             │    - If digest in           │
      │                             │      claimed_layers: 0      │
      │                             │    - Else: size             │
      │                             │                             │
      │                             │ 5. Check quota limit        │
      │                             │    used + impact <= limit?  │
      │                             │                             │
      │                             │ 6. Update quota             │
      │                             │    PUT /atcr/quota/{did}    │
      │                             ├────────────────────────────>│
      │                             │<────────────────────────────┤
      │                             │    200 OK                   │
      │                             │                             │
      │ 7. Presigned URL            │                             │
      │<────────────────────────────┤                             │
      │    {url: "https://s3..."}   │                             │
      │                             │                             │
      │ 8. Upload blob to S3        │                             │
      ├─────────────────────────────┼────────────────────────────>│
      │                             │                             │
      │ 9. 200 OK                   │                             │
      │<────────────────────────────┼─────────────────────────────┤
      │                             │                             │
```

### Implementation (Pseudocode)

```go
// cmd/hold/main.go - HandlePutPresignedURL

func (s *HoldService) HandlePutPresignedURL(w http.ResponseWriter, r *http.Request) {
	var req PutPresignedURLRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}

	// Step 1: Check whether the blob already exists in S3
	// (algorithm and digest are parsed from req.Digest; ctx comes from r.Context())
	blobPath := fmt.Sprintf("/docker/registry/v2/blobs/%s/%s/%s/data",
		algorithm, digest[:2], digest)

	_, err := s.driver.Stat(ctx, blobPath)
	blobExists := err == nil

	// Step 2: Read the quota from S3 (or SQLite)
	quota, err := s.quotaManager.GetQuota(req.DID)
	if err != nil {
		// First upload - create a quota with defaults
		quota = &Quota{
			DID:           req.DID,
			Limit:         s.config.QuotaDefaultLimit,
			Used:          0,
			ClaimedLayers: make(map[string]int64),
		}
	}

	// Step 3: Calculate the quota impact
	quotaImpact := req.Size // Default: assume a new layer

	if _, alreadyClaimed := quota.ClaimedLayers[req.Digest]; alreadyClaimed {
		// The user already uploaded this layer before
		quotaImpact = 0
		log.Printf("Layer %s already claimed by %s, no quota impact",
			req.Digest, req.DID)
	} else if blobExists {
		// The blob exists in S3 (uploaded by another user), but this user
		// is claiming it for the first time - it still counts against
		// their quota
		log.Printf("Layer %s exists globally but new to %s, quota impact: %d",
			req.Digest, req.DID, quotaImpact)
	} else {
		// Brand-new blob - it will be uploaded to S3
		log.Printf("New layer %s for %s, quota impact: %d",
			req.Digest, req.DID, quotaImpact)
	}

	// Step 4: Check the quota limit
	if quota.Used+quotaImpact > quota.Limit {
		http.Error(w, fmt.Sprintf(
			"quota exceeded: used=%d, impact=%d, limit=%d",
			quota.Used, quotaImpact, quota.Limit,
		), http.StatusPaymentRequired) // 402
		return
	}

	// Step 5: Update the quota (optimistic - before the upload completes)
	quota.Used += quotaImpact
	if quotaImpact > 0 {
		quota.ClaimedLayers[req.Digest] = req.Size
	}
	quota.LastUpdated = time.Now()

	if err := s.quotaManager.SaveQuota(quota); err != nil {
		http.Error(w, "failed to update quota", http.StatusInternalServerError)
		return
	}

	// Step 6: Generate a presigned URL
	presignedURL, err := s.getUploadURL(ctx, req.Digest, req.Size, req.DID)
	if err != nil {
		// Roll back the quota update on error
		quota.Used -= quotaImpact
		delete(quota.ClaimedLayers, req.Digest)
		s.quotaManager.SaveQuota(quota)

		http.Error(w, "failed to generate presigned URL", http.StatusInternalServerError)
		return
	}

	// Step 7: Return the presigned URL + quota info
	resp := PutPresignedURLResponse{
		URL:       presignedURL,
		ExpiresAt: time.Now().Add(15 * time.Minute),
		QuotaInfo: QuotaInfo{
			Used:           quota.Used,
			Limit:          quota.Limit,
			Available:      quota.Limit - quota.Used,
			Impact:         quotaImpact,
			AlreadyClaimed: quotaImpact == 0,
		},
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(resp)
}
```

### Race Condition Handling

**Problem:** Two concurrent uploads of the same blob.

```
Time    User A                    User B
0ms     Upload layer X (100MB)
10ms                              Upload layer X (100MB)
20ms    Check exists: NO          Check exists: NO
30ms    Quota impact: 100MB       Quota impact: 100MB
40ms    Update quota A: +100MB    Update quota B: +100MB
50ms    Generate presigned URL    Generate presigned URL
100ms   Upload to S3 completes    Upload to S3 (overwrites A's)
```

**Result:** Both users are charged 100MB, but only 100MB is stored in S3.

**Mitigation strategies:**

1. **Accept eventual consistency** (recommended for S3-based storage)
   - Run periodic reconciliation to fix discrepancies
   - A small inconsistency window (minutes) is acceptable
   - Reconciliation uses the PDS as the source of truth

2. **Optimistic locking** (S3 ETags)

   ```go
   // Use S3 ETags for conditional writes
   oldETag := getQuotaFileETag(did)
   err := putQuotaFileWithCondition(quota, oldETag)
   if err == PreconditionFailed {
       // Retry with a fresh read
   }
   ```

3. **Database transactions** (SQLite-based)

   ```sql
   -- SQLite has no SELECT ... FOR UPDATE; BEGIN IMMEDIATE takes the
   -- write lock up front so the read-modify-write is atomic
   BEGIN IMMEDIATE;
   SELECT quota_used FROM user_quotas WHERE did = ?;
   UPDATE user_quotas SET quota_used = quota_used + ? WHERE did = ?;
   COMMIT;
   ```

## Delete Flow

### Manifest Deletion via AppView UI

When a user deletes a manifest through the AppView web interface:

```
┌──────────┐               ┌──────────┐               ┌──────────┐               ┌──────────┐
│   User   │               │ AppView  │               │   Hold   │               │   PDS    │
│    UI    │               │ Database │               │ Service  │               │          │
└──────────┘               └──────────┘               └──────────┘               └──────────┘
      │                           │                           │                           │
      │ DELETE manifest           │                           │                           │
      ├──────────────────────────>│                           │                           │
      │                           │                           │                           │
      │                           │ 1. Get manifest           │                           │
      │                           │    and layers             │                           │
      │                           │                           │                           │
      │                           │ 2. Check which            │                           │
      │                           │    layers are still       │                           │
      │                           │    referenced by the      │                           │
      │                           │    user's other           │                           │
      │                           │    manifests              │                           │
      │                           │                           │                           │
      │                           │ 3. DELETE manifest        │                           │
      │                           │    from PDS               │                           │
      │                           ├───────────────────────────┼──────────────────────────>│
      │                           │                           │                           │
      │                           │ 4. POST /quota/decrement  │                           │
      │                           ├──────────────────────────>│                           │
      │                           │    {layers: [...]}        │                           │
      │                           │                           │                           │
      │                           │                           │ 5. Update quota:          │
      │                           │                           │    remove unclaimed       │
      │                           │                           │    layers                 │
      │                           │                           │                           │
      │                           │ 6. 200 OK                 │                           │
      │                           │<──────────────────────────┤                           │
      │                           │                           │                           │
      │                           │ 7. Delete from DB         │                           │
      │                           │                           │                           │
      │ 8. Success                │                           │                           │
      │<──────────────────────────┤                           │                           │
      │                           │                           │                           │
```

### AppView Implementation

```go
// pkg/appview/handlers/manifest.go

func (h *ManifestHandler) DeleteManifest(w http.ResponseWriter, r *http.Request) {
	did := r.Context().Value("auth.did").(string)
	repository := chi.URLParam(r, "repository")
	digest := chi.URLParam(r, "digest")

	// Step 1: Get the manifest and its layers from the database
	manifest, err := db.GetManifest(h.db, digest)
	if err != nil {
		http.Error(w, "manifest not found", http.StatusNotFound)
		return
	}

	layers, err := db.GetLayersForManifest(h.db, manifest.ID)
	if err != nil {
		http.Error(w, "failed to get layers", http.StatusInternalServerError)
		return
	}

	// Step 2: For each layer, check whether the user still references it
	// in other manifests
	layersToDecrement := []LayerInfo{}

	for _, layer := range layers {
		// Query: does this user have other manifests using this layer?
		stillReferenced, err := db.CheckLayerReferencedByUser(
			h.db, did, repository, layer.Digest, manifest.ID,
		)
		if err != nil {
			http.Error(w, "failed to check layer references", http.StatusInternalServerError)
			return
		}

		if !stillReferenced {
			// This layer is no longer used by the user
			layersToDecrement = append(layersToDecrement, LayerInfo{
				Digest: layer.Digest,
				Size:   layer.Size,
			})
		}
	}

	// Step 3: Delete the manifest from the user's PDS
	// (accessToken and manifestRKey come from the session / record; elided here)
	atprotoClient := atproto.NewClient(manifest.PDSEndpoint, did, accessToken)
	err = atprotoClient.DeleteRecord(r.Context(), atproto.ManifestCollection, manifestRKey)
	if err != nil {
		http.Error(w, "failed to delete from PDS", http.StatusInternalServerError)
		return
	}

	// Step 4: Notify the hold service to decrement the quota
	if len(layersToDecrement) > 0 {
		holdClient := &http.Client{}

		decrementReq := QuotaDecrementRequest{
			DID:    did,
			Layers: layersToDecrement,
		}

		body, _ := json.Marshal(decrementReq)
		resp, err := holdClient.Post(
			manifest.HoldEndpoint+"/quota/decrement",
			"application/json",
			bytes.NewReader(body),
		)
		if err != nil || resp.StatusCode != http.StatusOK {
			log.Printf("Warning: failed to update quota on hold service: %v", err)
			// Continue anyway - GC/reconciliation will fix it
		}
	}

	// Step 5: Delete from the AppView database
	err = db.DeleteManifest(h.db, did, repository, digest)
	if err != nil {
		http.Error(w, "failed to delete from database", http.StatusInternalServerError)
		return
	}

	w.WriteHeader(http.StatusNoContent)
}
```

### Hold Service Decrement Endpoint

```go
// cmd/hold/main.go

type QuotaDecrementRequest struct {
	DID    string      `json:"did"`
	Layers []LayerInfo `json:"layers"`
}

type LayerInfo struct {
	Digest string `json:"digest"`
	Size   int64  `json:"size"`
}

func (s *HoldService) HandleQuotaDecrement(w http.ResponseWriter, r *http.Request) {
	var req QuotaDecrementRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}

	// Read the current quota
	quota, err := s.quotaManager.GetQuota(req.DID)
	if err != nil {
		http.Error(w, "quota not found", http.StatusNotFound)
		return
	}

	// Decrement the quota for each layer
	for _, layer := range req.Layers {
		if size, claimed := quota.ClaimedLayers[layer.Digest]; claimed {
			// Remove from the claimed layers
			delete(quota.ClaimedLayers, layer.Digest)
			quota.Used -= size

			log.Printf("Decremented quota for %s: layer %s (%d bytes)",
				req.DID, layer.Digest, size)
		} else {
			log.Printf("Warning: layer %s not in claimed_layers for %s",
				layer.Digest, req.DID)
		}
	}

	// Ensure quota.Used doesn't go negative (defensive)
	if quota.Used < 0 {
		log.Printf("Warning: quota.Used went negative for %s, resetting to 0", req.DID)
		quota.Used = 0
	}

	// Save the updated quota
	quota.LastUpdated = time.Now()
	if err := s.quotaManager.SaveQuota(quota); err != nil {
		http.Error(w, "failed to save quota", http.StatusInternalServerError)
		return
	}

	// Return the updated quota info
	json.NewEncoder(w).Encode(map[string]any{
		"used":  quota.Used,
		"limit": quota.Limit,
	})
}
```

### SQL Query: Check Layer References

```sql
-- pkg/appview/db/queries.go

-- Check whether the user still references this layer in other manifests
SELECT COUNT(*)
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.did = ?        -- User's DID
  AND l.digest = ?     -- Layer digest
  AND m.id != ?        -- Exclude the manifest being deleted
```

## Garbage Collection

### Background: Orphaned Blobs

Orphaned blobs accumulate when:
1. A manifest push fails after blobs are uploaded (presigned URLs bypass the hold)
2. The quota is exceeded - the manifest is rejected, but blobs are already in S3
3. A user deletes a manifest - its blobs may no longer be referenced

**GC periodically cleans these up.**

### GC Cron Implementation

Similar to AppView's backfill worker, the hold service can run periodic GC:

```go
// cmd/hold/gc/gc.go

type GarbageCollector struct {
	driver       storagedriver.StorageDriver
	appviewURL   string
	holdURL      string
	quotaManager *quota.Manager
}

// Run performs one garbage-collection pass.
func (gc *GarbageCollector) Run(ctx context.Context) error {
	log.Println("Starting garbage collection...")

	// Step 1: Get the list of referenced blobs from AppView
	referenced, err := gc.getReferencedBlobs()
	if err != nil {
		return fmt.Errorf("failed to get referenced blobs: %w", err)
	}

	referencedSet := make(map[string]bool)
	for _, digest := range referenced {
		referencedSet[digest] = true
	}

	log.Printf("AppView reports %d referenced blobs", len(referenced))

	// Step 2: Walk the S3 blobs
	deletedCount := 0
	reclaimedBytes := int64(0)

	err = gc.driver.Walk(ctx, "/docker/registry/v2/blobs",
		func(fileInfo storagedriver.FileInfo) error {
			if fileInfo.IsDir() {
				return nil // Skip directories
			}

			// Extract the digest from the path:
			// /docker/registry/v2/blobs/sha256/ab/abc123.../data
			digest := extractDigestFromPath(fileInfo.Path())

			if !referencedSet[digest] {
				// Unreferenced blob - delete it
				size := fileInfo.Size()

				if err := gc.driver.Delete(ctx, fileInfo.Path()); err != nil {
					log.Printf("Failed to delete blob %s: %v", digest, err)
					return nil // Continue anyway
				}

				deletedCount++
				reclaimedBytes += size

				log.Printf("GC: deleted unreferenced blob %s (%d bytes)", digest, size)
			}

			return nil
		})

	if err != nil {
		return fmt.Errorf("failed to walk blobs: %w", err)
	}

	log.Printf("GC complete: deleted %d blobs, reclaimed %d bytes",
		deletedCount, reclaimedBytes)

	return nil
}

// getReferencedBlobs asks AppView for every blob referenced by manifests
// stored in THIS hold service.
func (gc *GarbageCollector) getReferencedBlobs() ([]string, error) {
	// Note: the local variable must not be named "url", or it would shadow
	// the net/url package used below
	reqURL := fmt.Sprintf("%s/internal/blobs/referenced?hold=%s",
		gc.appviewURL, url.QueryEscape(gc.holdURL))

	resp, err := http.Get(reqURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var result struct {
		Blobs []string `json:"blobs"`
	}

	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}

	return result.Blobs, nil
}
```

### AppView Internal API

```go
// pkg/appview/handlers/internal.go

// GetReferencedBlobs returns all referenced blobs for a specific hold.
func (h *InternalHandler) GetReferencedBlobs(w http.ResponseWriter, r *http.Request) {
	holdEndpoint := r.URL.Query().Get("hold")
	if holdEndpoint == "" {
		http.Error(w, "missing hold parameter", http.StatusBadRequest)
		return
	}

	// Query the database for all layers in manifests stored in this hold
	query := `
		SELECT DISTINCT l.digest
		FROM layers l
		JOIN manifests m ON l.manifest_id = m.id
		WHERE m.hold_endpoint = ?
	`

	rows, err := h.db.Query(query, holdEndpoint)
	if err != nil {
		http.Error(w, "database error", http.StatusInternalServerError)
		return
	}
	defer rows.Close()

	blobs := []string{}
	for rows.Next() {
		var digest string
		if err := rows.Scan(&digest); err != nil {
			continue
		}
		blobs = append(blobs, digest)
	}

	json.NewEncoder(w).Encode(map[string]any{
		"blobs": blobs,
		"count": len(blobs),
		"hold":  holdEndpoint,
	})
}
```

### GC Cron Schedule

```go
// cmd/hold/main.go

func main() {
	// ... service setup ...

	// Start the GC cron if enabled
	if os.Getenv("GC_ENABLED") == "true" {
		gcInterval := 24 * time.Hour // Daily by default

		go func() {
			ticker := time.NewTicker(gcInterval)
			defer ticker.Stop()

			for range ticker.C {
				if err := garbageCollector.Run(context.Background()); err != nil {
					log.Printf("GC error: %v", err)
				}
			}
		}()

		log.Printf("GC cron started: runs every %v", gcInterval)
	}

	// Start the server...
}
```

## Quota Reconciliation

### PDS as Source of Truth

**Key insight:** Manifest records in the PDS are publicly readable (no OAuth needed for reads).

Each manifest contains:
- Repository name
- Digest
- A layers array with digest + size
- The hold endpoint

The hold service can query the PDS to calculate the user's true quota:

```
1. List all io.atcr.manifest records for the user
2. Filter manifests where holdEndpoint == this hold service
3. Extract unique layers (deduplicate by digest)
4. Sum the layer sizes = true quota usage
5. Compare to the quota file
6. Fix discrepancies
```

### Implementation

```go
// cmd/hold/quota/reconcile.go

type Reconciler struct {
	quotaManager    *Manager
	atprotoResolver *atproto.Resolver
	holdURL         string
}

// ReconcileUser recalculates a user's quota from their PDS manifests.
func (r *Reconciler) ReconcileUser(ctx context.Context, did string) error {
	log.Printf("Reconciling quota for %s", did)

	// Step 1: Resolve the user's PDS endpoint
	identity, err := r.atprotoResolver.ResolveIdentity(ctx, did)
	if err != nil {
		return fmt.Errorf("failed to resolve DID: %w", err)
	}

	// Step 2: Create an unauthenticated ATProto client
	// (manifest records are public - no OAuth needed)
	client := atproto.NewClient(identity.PDSEndpoint, did, "")

	// Step 3: List all manifest records for this user
	manifests, err := client.ListRecords(ctx, atproto.ManifestCollection, 1000)
	if err != nil {
		return fmt.Errorf("failed to list manifests: %w", err)
	}

	// Step 4: Filter manifests stored in THIS hold service
	// and extract the unique layers
	uniqueLayers := make(map[string]int64) // digest -> size

	for _, record := range manifests {
		var manifest atproto.ManifestRecord
		if err := json.Unmarshal(record.Value, &manifest); err != nil {
			log.Printf("Warning: failed to parse manifest: %v", err)
			continue
		}

		// Only count manifests stored in this hold
		if manifest.HoldEndpoint != r.holdURL {
			continue
		}

		// Add the config blob
		if manifest.Config.Digest != "" {
			uniqueLayers[manifest.Config.Digest] = manifest.Config.Size
		}

		// Add the layer blobs
		for _, layer := range manifest.Layers {
			uniqueLayers[layer.Digest] = layer.Size
		}
	}

	// Step 5: Calculate the true quota usage
	trueUsage := int64(0)
	for _, size := range uniqueLayers {
		trueUsage += size
	}

	log.Printf("User %s true usage from PDS: %d bytes (%d unique layers)",
		did, trueUsage, len(uniqueLayers))

	// Step 6: Compare with the current quota file
	quota, err := r.quotaManager.GetQuota(did)
	if err != nil {
		log.Printf("No existing quota for %s, creating new", did)
		quota = &Quota{
			DID:           did,
			Limit:         r.quotaManager.DefaultLimit,
			ClaimedLayers: make(map[string]int64),
		}
	}

	// Step 7: Fix discrepancies
	if quota.Used != trueUsage || len(quota.ClaimedLayers) != len(uniqueLayers) {
		log.Printf("Quota mismatch for %s: recorded=%d, actual=%d (diff=%d)",
			did, quota.Used, trueUsage, trueUsage-quota.Used)

		// Update the quota to match the PDS truth
		quota.Used = trueUsage
		quota.ClaimedLayers = uniqueLayers
		quota.LastUpdated = time.Now()

		if err := r.quotaManager.SaveQuota(quota); err != nil {
			return fmt.Errorf("failed to save reconciled quota: %w", err)
		}

		log.Printf("Reconciled quota for %s: %d bytes", did, trueUsage)
	} else {
		log.Printf("Quota for %s is accurate", did)
	}

	return nil
}

// ReconcileAll reconciles every user (run periodically).
func (r *Reconciler) ReconcileAll(ctx context.Context) error {
	// Get the list of all users with quota files
	users, err := r.quotaManager.ListUsers()
	if err != nil {
		return err
	}

	log.Printf("Starting reconciliation for %d users", len(users))

	for _, did := range users {
		if err := r.ReconcileUser(ctx, did); err != nil {
			log.Printf("Failed to reconcile %s: %v", did, err)
			// Continue with the other users
		}
	}

	log.Println("Reconciliation complete")
	return nil
}
```

### Reconciliation Cron

```go
// cmd/hold/main.go

func main() {
	// ... setup ...

	// Start the reconciliation cron if enabled
	if os.Getenv("QUOTA_RECONCILE_ENABLED") == "true" {
		reconcileInterval := 24 * time.Hour // Daily

		go func() {
			ticker := time.NewTicker(reconcileInterval)
			defer ticker.Stop()

			for range ticker.C {
				if err := reconciler.ReconcileAll(context.Background()); err != nil {
					log.Printf("Reconciliation error: %v", err)
				}
			}
		}()

		log.Printf("Quota reconciliation cron started: runs every %v", reconcileInterval)
	}

	// ... start the server ...
}
```

### Why PDS as Source of Truth Works

1. **Manifests are canonical** - if a manifest exists in the PDS, the user owns those layers
2. **Public reads** - no OAuth needed; just resolve DID → PDS endpoint
3. **ATProto durability** - the PDS is the user's authoritative data store
4. **AppView is a cache** - the AppView database might lag or have inconsistencies
5. **Reconciliation fixes drift** - periodic sync from the PDS ensures accuracy

**Example reconciliation scenarios:**

- **Orphaned quota entries:** the user deleted a manifest from the PDS, but the hold quota still has it
  → Reconciliation removes it from claimed_layers

- **Missing quota entries:** the user pushed a manifest, but the quota update failed
  → Reconciliation adds it to claimed_layers

- **Race-condition duplicates:** two concurrent pushes double-counted a layer
  → Reconciliation fixes the quota to the actual usage

## Configuration

### Hold Service Environment Variables

```bash
# .env.hold

# ============================================================================
# Quota Configuration
# ============================================================================

# Enable quota enforcement
QUOTA_ENABLED=true

# Default quota limit per user (bytes)
# 10GB  = 10737418240
# 50GB  = 53687091200
# 100GB = 107374182400
QUOTA_DEFAULT_LIMIT=10737418240

# Storage backend for quota data
# Options: s3, sqlite
QUOTA_STORAGE_BACKEND=s3

# For S3-based storage:
# quota files are stored in the same bucket as the blobs
QUOTA_STORAGE_PREFIX=/atcr/quota/

# For SQLite-based storage:
QUOTA_DB_PATH=/var/lib/atcr/hold-quota.db

# ============================================================================
# Garbage Collection
# ============================================================================

# Enable periodic garbage collection
GC_ENABLED=true

# GC interval (default: 24h)
GC_INTERVAL=24h

# AppView URL for GC reference checking
APPVIEW_URL=https://atcr.io

# ============================================================================
# Quota Reconciliation
# ============================================================================

# Enable quota reconciliation from the PDS
QUOTA_RECONCILE_ENABLED=true

# Reconciliation interval (default: 24h)
QUOTA_RECONCILE_INTERVAL=24h

# ============================================================================
# Hold Service Identity (Required)
# ============================================================================

# Public URL of this hold service
HOLD_PUBLIC_URL=https://hold1.example.com

# Owner DID (for auto-registration)
HOLD_OWNER=did:plc:xyz123
```

### AppView Configuration

```bash
# .env.appview

# Internal API endpoint for hold services
# (used for GC reference checking)
ATCR_INTERNAL_API_ENABLED=true

# Optional: authentication token for internal APIs
ATCR_INTERNAL_API_TOKEN=secret123
```

## Trade-offs & Design Decisions

### 1. Claimed Storage vs. Physical Storage

**Decision:** Track claimed storage (logical accounting)

**Why:**
- Predictable for users: "you pay for what you upload"
- No complex cross-user dependencies
- A delete always gives quota back
- Matches Harbor's proven model

**Trade-off:**
- Total claimed storage can exceed physical storage
- Users might complain: "I uploaded 10GB but S3 only has 6GB"

**Mitigation:**
- Show a deduplication-savings metric
- Educate users: "You claimed 10GB, but deduplication saved 4GB"

### 2. S3 vs. SQLite for Quota Storage

**Decision:** Support both; recommend based on the use case

**S3 pros:**
- No database to manage
- Quota data lives with the blobs
- Better for ephemeral BYOS

**SQLite pros:**
- Faster (no network)
- ACID transactions (no race conditions)
- Better for high-traffic shared holds

**Trade-off:**
- S3: eventual consistency, race conditions
- SQLite: stateful service, scaling challenges

**Mitigation:**
- Reconciliation fixes S3 inconsistencies
- SQLite can be swapped for a shared database in multi-instance setups

### 3. Optimistic Quota Update

**Decision:** Update the quota BEFORE the upload completes

**Why:**
- Prevents race conditions (two users uploading simultaneously)
- Can reject before the presigned URL is generated
- Simpler flow

**Trade-off:**
- If the upload fails, the quota is already incremented (the user "paid" for nothing)

**Mitigation:**
- Reconciliation from the PDS fixes orphaned quota entries
- Acceptable for an MVP (upload failures are rare)

### 4. AppView as Intermediary
### 4. AppView as Intermediary

**Decision:** AppView notifies the hold service on deletes

**Why:**
- AppView already has the manifest/layer database
- Can efficiently check if a layer is still referenced
- Hold service doesn't need to query the PDS on every delete

**Trade-off:**
- AppView → Hold dependency
- Network hop on delete

**Mitigation:**
- If notification fails, reconciliation fixes the quota
- Eventually consistent is acceptable

### 5. PDS as Source of Truth

**Decision:** Use PDS manifests for reconciliation

**Why:**
- Manifests in the PDS are canonical user data
- Public reads (no OAuth for reconciliation)
- The AppView database might lag or be inconsistent

**Trade-off:**
- Reconciliation requires PDS queries (slower)
- Limited to 1000 manifests per query

**Mitigation:**
- Run reconciliation daily (not real-time)
- Paginate if a user has >1000 manifests

## Future Enhancements

### 1. Quota API Endpoints

```
GET  /quota/usage      - Get current user's quota
GET  /quota/breakdown  - Get storage by repository
POST /quota/limit      - Update user's quota limit (admin)
GET  /quota/stats      - Get hold-wide statistics
```

### 2. Quota Alerts

Notify users when approaching their limit:
- Email/webhook at 80%, 90%, 95%
- Reject uploads at 100% (currently implemented)
- Grace period: allow 105% temporarily

### 3. Tiered Quotas

Different limits based on user tier:
- Free: 10GB
- Pro: 100GB
- Enterprise: unlimited

### 4. Quota Purchasing

Allow users to buy additional storage:
- Stripe integration
- $0.10/GB/month pricing
- Dynamic limit updates
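The alert thresholds (item 2) and tiered limits (item 3) above could fit together as in the following sketch; the tier names, sizes, and percentages are the illustrative values from those lists, and `quotaAlert` is a hypothetical helper, not existing ATCR code:

```go
package main

import "fmt"

// tierLimits holds the per-tier limits from the list above,
// in bytes; 0 means unlimited (Enterprise).
var tierLimits = map[string]int64{
	"free":       10 << 30,  // 10GB
	"pro":        100 << 30, // 100GB
	"enterprise": 0,         // unlimited
}

// alertThresholds are the usage fractions at which a user
// would receive an email/webhook notification.
var alertThresholds = []float64{0.80, 0.90, 0.95}

// quotaAlert returns the highest threshold the user has crossed,
// or 0 if no alert is due.
func quotaAlert(used, limit int64) float64 {
	if limit == 0 {
		return 0 // unlimited tier never alerts
	}
	frac := float64(used) / float64(limit)
	crossed := 0.0
	for _, th := range alertThresholds {
		if frac >= th {
			crossed = th
		}
	}
	return crossed
}

func main() {
	limit := tierLimits["free"]
	used := int64(9 << 30) // 9GB of a 10GB free tier
	fmt.Printf("alert at %.0f%%\n", quotaAlert(used, limit)*100) // alert at 90%
}
```

A real implementation would also remember which threshold was last notified per user, so crossing 90% does not re-send the 80% alert.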
### 5. Cross-Hold Deduplication

If multiple holds share the same S3 bucket:
- Track blob ownership globally
- Split costs proportionally
- More complex, but maximizes deduplication

### 6. Manifest-Based Quota (Alternative Model)

Instead of tracking layers, track manifests:
- Simpler: just count manifest sizes
- No deduplication benefits for users
- Might be acceptable for some use cases

### 7. Redis-Based Quota (High Performance)

For high-traffic registries:
- Use Redis instead of S3/SQLite
- Sub-millisecond quota checks
- Harbor-proven approach

### 8. Quota Visualizations

Web UI showing:
- Storage usage over time
- Top consumers by repository
- Deduplication savings graph
- Layer size distribution

## Appendix: SQL Queries

### Check if User Still References Layer

```sql
-- After deleting a manifest, check if the user has other manifests using this layer
SELECT COUNT(*)
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.did = ?      -- User's DID
  AND l.digest = ?   -- Layer digest to check
  AND m.id != ?      -- Exclude the manifest being deleted
```

### Get All Unique Layers for User

```sql
-- Calculate true quota usage for a user
SELECT DISTINCT l.digest, l.size
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.did = ?
  AND m.hold_endpoint = ?
```

### Get Referenced Blobs for Hold

```sql
-- For GC: get all blobs still referenced by any user of this hold
SELECT DISTINCT l.digest
FROM layers l
JOIN manifests m ON l.manifest_id = m.id
WHERE m.hold_endpoint = ?
```

### Get Storage Stats by Repository

```sql
-- User's storage broken down by repository
-- Note: SUM(l.size) counts a shared layer once per manifest that uses it;
-- for claimed-storage (deduplicated) semantics, sum over distinct digests instead
SELECT
    m.repository,
    COUNT(DISTINCT m.id) AS manifest_count,
    COUNT(DISTINCT l.digest) AS unique_layers,
    SUM(l.size) AS total_size
FROM manifests m
JOIN layers l ON l.manifest_id = m.id
WHERE m.did = ?
  AND m.hold_endpoint = ?
GROUP BY m.repository
ORDER BY total_size DESC
```

## References

- **Harbor Quotas:** https://goharbor.io/docs/1.10/administration/configure-project-quotas/
- **Harbor Source:** https://github.com/goharbor/harbor
- **ATProto Spec:** https://atproto.com/specs/record
- **OCI Distribution Spec:** https://github.com/opencontainers/distribution-spec
- **S3 API Reference:** https://docs.aws.amazon.com/AmazonS3/latest/API/
- **Distribution GC:** https://github.com/distribution/distribution/blob/main/registry/storage/garbagecollect.go

---

**Document Version:** 1.0
**Last Updated:** 2025-10-09
**Author:** Generated from implementation research and Harbor analysis