A container registry that uses the AT Protocol for manifest storage and S3 for blob storage. atcr.io
# Layer Records in ATProto

## Overview

This document describes the architecture for storing container layer metadata as ATProto records in the hold service's embedded PDS. This makes blob storage more "ATProto-native" by creating discoverable records for each unique layer.

## TL;DR

**Status: BUG FIXED ✅ | Layer Records Feature PLANNED 🔮**

### Quick Fix (IMPLEMENTED)

The critical bug where S3Native multipart uploads didn't move from temp → final location is now **FIXED**.

**What was fixed:**
1. ✅ AppView sends the real digest in the complete request (not just tempDigest)
2. ✅ Hold's CompleteMultipartUploadWithManager now accepts a finalDigest parameter
3. ✅ S3Native mode copies temp → final and deletes temp
4. ✅ Buffered mode writes directly to the final location

**Files changed:**
- `pkg/appview/storage/proxy_blob_store.go` - Send real digest
- `pkg/hold/s3.go` - Add copyBlobS3() and deleteBlobS3()
- `pkg/hold/multipart.go` - Use finalDigest and move blob
- `pkg/hold/blobstore_adapter.go` - Pass finalDigest through
- `pkg/hold/pds/xrpc.go` - Update interface and handler

### Layer Records Feature (PLANNED)

Building on the quick fix, layer records will add:
1. 🔮 Hold creates an ATProto record for each unique layer
2. 🔮 Deduplication: check whether a layer record exists before finalizing the upload
3. 🔮 Manifest backlinks: include layer record AT-URIs
4. 🔮 Discovery: `listRecords(io.atcr.manifest.layers)` shows all unique blobs

**Benefits:**
- Makes blobs discoverable via the ATProto protocol
- Enables garbage collection (find unreferenced layers)
- Foundation for per-layer access control
- Audit trail for storage operations

## Motivation

**Goal:** Make hold services more ATProto-native by tracking unique blobs as records.
**Benefits:**
- **Discovery:** Query `listRecords(io.atcr.manifest.layers)` to see all unique layers in a hold
- **Auditing:** Track when unique content arrived, sizes, media types
- **Deduplication:** One record per unique digest (not per upload)
- **Migration:** Enumerate all blobs for moving between storage backends
- **Future:** Foundation for per-blob access control, retention policies

**Key Design Decision:** Store records for **unique digests only**, not every blob upload. This mirrors the content-addressed deduplication already happening in S3.

## Current Upload Flow

### OCI Distribution Spec Pattern

The OCI distribution spec uses a two-phase upload:

1. **Initiate Upload**
   ```
   POST /v2/<name>/blobs/uploads/
   → Returns upload UUID (digest unknown at this point!)
   ```

2. **Upload Data**
   ```
   PATCH/PUT to temp location: uploads/temp-<uuid>
   → Client streams blob data
   → Digest not yet known
   ```

3. **Finalize Upload**
   ```
   PUT /v2/<name>/blobs/uploads/<uuid>?digest=sha256:abc123
   → Digest provided at finalization time
   → Registry moves: temp → final location at digest path
   ```

**Critical insight:** In standard OCI distribution, the digest is only known at **finalization time**, not during upload. This allows clients to compute the digest as they stream data.

### Current ATCR Implementation

**Multipart Upload Flow:**

```
1. Start multipart (XRPC POST with action=start, digest=sha256:abc...)
   - Client provides digest upfront (xrpc.go:849 requires req.Digest)
   - Generate uploadID (UUID)
   - S3Native: Create S3 multipart upload at FINAL path blobPath(digest)
   - Buffered: Create in-memory session with digest
   - Session stores: uploadID, digest, mode

2. Upload parts (XRPC POST with action=part, uploadId, partNumber)
   - S3Native: Returns presigned URLs to upload parts to final location
   - Buffered: Returns XRPC endpoint with X-Upload-Id/X-Part-Number headers
   - Parts go to final digest location (S3Native) or memory (Buffered)

3. Complete (XRPC POST with action=complete, uploadId, parts[])
   - S3Native: S3 CompleteMultipartUpload at final location
   - Buffered: Assemble parts, write to final location blobPath(digest)
```

**Current paths:**
- Final: `/docker/registry/v2/blobs/{algorithm}/{xx}/{hash}/data`
- Example: `/docker/registry/v2/blobs/sha256/ab/abc123.../data`
- Temp: `/docker/registry/v2/uploads/temp-<uuid>/data` (used during upload, then moved to final)

**Key insight:** Unlike the standard OCI distribution spec (where the digest is provided at finalization), ATCR's XRPC multipart flow requires the digest upfront at start time. This is fine, but we should still use temp paths for atomic deduplication with layer records.

**Note:** The move operation bug described below has been fixed. The rest of this document describes the planned layer records feature.

## The Bug (FIXED)

### How It Was Fixed

The bug was fixed by:

1. **AppView** sends the real digest in the complete request (not tempDigest)
   - `pkg/appview/storage/proxy_blob_store.go:740-745`

2. **Hold** accepts a finalDigest parameter in CompleteMultipartUpload
   - `pkg/hold/multipart.go:281` - Added finalDigest parameter
   - `pkg/hold/s3.go:223-285` - Added copyBlobS3() and deleteBlobS3()

3. **S3Native mode** now moves the blob from temp → final location
   - Complete multipart at temp location
   - Copy to final digest location
   - Delete temp

4. **Buffered mode** writes directly to the final location (no change needed)

**Result:** Blobs are now correctly placed at final digest paths, and downloads work correctly.
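The temp → final move hinges on how a digest (or temp identifier) maps to a registry path. A minimal sketch of that mapping, following the path layout described above (`blobPath` here is illustrative, not the actual hold implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// blobPath maps an OCI digest like "sha256:abc..." to its
// content-addressed final location, using the first two hex
// characters as a shard directory. Anything without an
// "algorithm:" prefix (e.g. "uploads/temp-<uuid>") is treated
// as a temp identifier and stored under that path directly.
func blobPath(digest string) string {
	algo, hash, ok := strings.Cut(digest, ":")
	if !ok || len(hash) < 2 {
		return fmt.Sprintf("/docker/registry/v2/%s/data", digest)
	}
	return fmt.Sprintf("/docker/registry/v2/blobs/%s/%s/%s/data", algo, hash[:2], hash)
}

func main() {
	fmt.Println(blobPath("sha256:abc123def456"))
	// → /docker/registry/v2/blobs/sha256/ab/abc123def456/data
	fmt.Println(blobPath("uploads/temp-42"))
	// → /docker/registry/v2/uploads/temp-42/data
}
```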
### The Problem (Historical Context)

Looking at the old `pkg/hold/multipart.go:278-317`, the `CompleteMultipartUploadWithManager` function:

**S3Native mode (lines 282-289):**
```go
if session.Mode == S3Native {
	parts := session.GetCompletedParts()
	if err := s.completeMultipartUpload(ctx, session.Digest, session.S3UploadID, parts); err != nil {
		return fmt.Errorf("failed to complete S3 multipart: %w", err)
	}
	log.Printf("Completed S3 native multipart: uploadID=%s, parts=%d", session.UploadID, len(parts))
	return nil // ❌ Missing move operation!
}
```

**What's missing:**
1. S3 CompleteMultipartUpload assembles parts at the temp location: `uploads/temp-<uuid>`
2. **MISSING:** S3 CopyObject from `uploads/temp-<uuid>` → `blobs/sha256/ab/abc123.../data`
3. **MISSING:** Delete temp blob

**Buffered mode works correctly** (lines 292-316) because it writes the assembled data directly to the final path `blobPath(session.Digest)`.

### Evidence from Design Doc

From `docs/XRPC_BLOB_MIGRATION.md` (lines 105-114):
```
1. Multipart parts uploaded → uploads/temp-{uploadID}
2. Complete multipart → S3 assembles parts at uploads/temp-{uploadID}
3. **Move operation** → S3 copy from uploads/temp-{uploadID} → blobs/sha256/ab/abc123...
```

The move was supposed to be internalized into the complete action (lines 308-311):
```
Call service.CompleteMultipartUploadWithManager(ctx, session, multipartMgr)
  - This internally calls S3 CompleteMultipartUpload to assemble parts
  - Then performs server-side S3 copy from temp location to final digest location
  - Equivalent to legacy /move endpoint operation
```

### The Actual Flow (Before the Fix, Broken for S3Native)

**AppView sent tempDigest:**
```go
// proxy_blob_store.go
tempDigest := fmt.Sprintf("uploads/temp-%s", writerID)
uploadID, err := p.startMultipartUpload(ctx, tempDigest)
// Passes tempDigest to hold via XRPC
```

**Hold received and used tempDigest:**
```go
// xrpc.go:854
uploadID, mode, err := h.blobStore.StartMultipartUpload(ctx, req.Digest)
// req.Digest = "uploads/temp-<writerID>" from AppView

// blobstore_adapter.go → multipart.go → s3.go:93
path := blobPath(digest) // digest = "uploads/temp-<writerID>"
// Returns: "/docker/registry/v2/uploads/temp-<writerID>/data"

// S3 multipart created at temp path ✅
```

**Parts uploaded to temp location ✅**

**Complete called:**
```go
// proxy_blob_store.go
// Complete multipart upload - XRPC complete action handles move internally
if err := w.store.completeMultipartUpload(ctx, tempDigest, w.uploadID, w.parts); err != nil
```

**Hold's CompleteMultipartUploadWithManager for S3Native:**
```go
// multipart.go:282-289
if session.Mode == S3Native {
	parts := session.GetCompletedParts()
	if err := s.completeMultipartUpload(ctx, session.Digest, session.S3UploadID, parts); err != nil {
		return fmt.Errorf("failed to complete S3 multipart: %w", err)
	}
	log.Printf("Completed S3 native multipart: uploadID=%s, parts=%d", session.UploadID, len(parts))
	return nil // ❌ BUG: No move operation!
}
```

**Result:**
- The blob ended up at: `/docker/registry/v2/uploads/temp-<writerID>/data` (temp location)
- The blob should have been at: `/docker/registry/v2/blobs/sha256/ab/abc123.../data` (final location)
- **Downloads would fail** because the AppView looks for the blob at the final digest path

**Why this might have appeared to work:**
- Buffered mode writes directly to the final path (no temp used)
- Or S3Native wasn't being used in deployments at the time
- Or there was a workaround somewhere else

## Proposed Flow with Layer Records (Future Feature)

### High-Level Flow

**Building on the quick fix above, layer records will add:**
1. PDS record creation for each unique layer digest
2. A deduplication check before finalizing storage
3. Manifest backlinks to layer records

**Note:** The quick fix already implements sending finalDigest in the complete request. The layer records feature extends this to create ATProto records.

```
1. Start multipart upload (XRPC action=start with tempDigest)
   - AppView provides tempDigest: "uploads/temp-<writerID>"
   - S3Native: Create S3 multipart at temp path: /uploads/temp-<writerID>/data
   - Buffered: Create in-memory session with temp identifier
   - Store in MultipartSession:
     * TempDigest: "uploads/temp-<writerID>" (upload location)
     * FinalDigest: null (not known yet at start time!)

   NOTE: AppView knows the real digest (desc.Digest), but doesn't send it at start

2. Upload parts (XRPC action=part)
   - S3Native: Presigned URLs to temp path (uploads/temp-<uuid>)
   - Buffered: Buffer parts in memory with temp identifier
   - All parts go to temp location (not final digest location yet)

3. Complete upload (XRPC action=complete, uploadId, finalDigest, parts)
   - AppView NOW sends:
     * uploadId: the session ID
     * finalDigest: "sha256:abc123..." (the real digest for the final location)
     * parts: array of {partNumber, etag}

   - Hold looks up the session by uploadId
   - Updates session.FinalDigest = finalDigest

   a. Try PutRecord(io.atcr.manifest.layers, digestHash, layerRecord)
      - digestHash = finalDigest without "sha256:" prefix
      - Record key = digestHash (content-addressed, naturally idempotent)

   b. If the record already exists (PDS returns ErrRecordAlreadyExists):
      - DEDUPLICATION! Layer already tracked
      - Delete temp blob (S3 or buffered data)
      - Return existing layerRecord AT-URI
      - Storage saved (data was uploaded to temp, but not stored twice)

   c. If record creation succeeds (new layer!):
      - Finalize storage:
        * S3Native: S3 CopyObject(uploads/temp-<uuid> → blobs/sha256/ab/abc123.../data)
        * Buffered: Write assembled data to final path (blobs/sha256/ab/abc123.../data)
      - Delete temp
      - Return new layerRecord AT-URI + metadata

   d. If record creation fails (PDS error):
      - Delete temp blob
      - Return error (upload failed, no storage consumed)
```

**Why use temp paths if the digest is known?**
- The deduplication check happens BEFORE committing the blob to storage
- If the layer already exists, we avoid an expensive S3 copy to the final location
- Atomic: record creation + blob finalization together

### Atomic Commit Logic

The key is making record creation + blob finalization atomic:

```go
// In CompleteMultipartUploadWithManager
func (s *HoldService) CompleteMultipartUploadWithManager(
	ctx context.Context,
	session *MultipartSession,
	manager *MultipartManager,
) (layerRecordURI string, err error) {
	defer manager.DeleteSession(session.UploadID)

	// Session now has both temp and final digests
	tempDigest := session.TempDigest   // "uploads/temp-<writerID>"
	finalDigest := session.FinalDigest // "sha256:abc123..." (set during complete)

	tempPath := blobPath(tempDigest)   // /uploads/temp-<writerID>/data
	finalPath := blobPath(finalDigest) // /blobs/sha256/ab/abc123.../data

	// Extract digest hash for record key
	digestHash := strings.TrimPrefix(finalDigest, "sha256:")

	// Build layer record
	layerRecord := &atproto.ManifestLayerRecord{
		Type:       "io.atcr.manifest.layers",
		Digest:     finalDigest,
		Size:       session.TotalSize,
		MediaType:  "application/vnd.oci.image.layer.v1.tar+gzip",
		UploadedAt: time.Now().Format(time.RFC3339),
	}

	// Try to create the layer record (idempotent with digest as rkey)
	err = s.holdPDS.PutRecord(ctx, atproto.ManifestLayersCollection, digestHash, layerRecord)

	if err == atproto.ErrRecordAlreadyExists {
		// Dedupe! Layer already tracked
		log.Printf("Layer already exists, deduplicating: digest=%s", finalDigest)
		s.deleteBlob(ctx, tempPath)

		// Return the existing record URI
		return fmt.Sprintf("at://%s/%s/%s",
			s.holdPDS.DID(),
			atproto.ManifestLayersCollection,
			digestHash), nil
	} else if err != nil {
		// PDS error - abort upload
		log.Printf("Failed to create layer record: %v", err)
		s.deleteBlob(ctx, tempPath)
		return "", fmt.Errorf("failed to create layer record: %w", err)
	}

	// New layer! Finalize storage
	if session.Mode == S3Native {
		// S3 multipart already uploaded to temp path
		// Copy to final location
		if err := s.copyBlob(ctx, tempPath, finalPath); err != nil {
			// Rollback: delete layer record
			s.holdPDS.DeleteRecord(ctx, atproto.ManifestLayersCollection, digestHash)
			s.deleteBlob(ctx, tempPath)
			return "", fmt.Errorf("failed to copy blob: %w", err)
		}
		s.deleteBlob(ctx, tempPath)
	} else {
		// Buffered mode: assemble and write to final location
		data, size, err := session.AssembleBufferedParts()
		if err != nil {
			s.holdPDS.DeleteRecord(ctx, atproto.ManifestLayersCollection, digestHash)
			return "", fmt.Errorf("failed to assemble parts: %w", err)
		}

		if err := s.writeBlob(ctx, finalPath, data); err != nil {
			s.holdPDS.DeleteRecord(ctx, atproto.ManifestLayersCollection, digestHash)
			return "", fmt.Errorf("failed to write blob: %w", err)
		}

		log.Printf("Wrote blob to final location: size=%d", size)
	}

	// Success! Return the new layer record URI
	layerRecordURI = fmt.Sprintf("at://%s/%s/%s",
		s.holdPDS.DID(),
		atproto.ManifestLayersCollection,
		digestHash)

	log.Printf("Created new layer record: %s", layerRecordURI)
	return layerRecordURI, nil
}
```

## Lexicon Schema

### io.atcr.manifest.layers

```json
{
  "lexicon": 1,
  "id": "io.atcr.manifest.layers",
  "defs": {
    "main": {
      "type": "record",
      "key": "any",
      "record": {
        "type": "object",
        "required": ["digest", "size", "mediaType", "uploadedAt"],
        "properties": {
          "digest": {
            "type": "string",
            "description": "Full OCI digest (sha256:abc123...)"
          },
          "size": {
            "type": "integer",
            "description": "Size in bytes"
          },
          "mediaType": {
            "type": "string",
            "description": "Media type (e.g., application/vnd.oci.image.layer.v1.tar+gzip)"
          },
          "uploadedAt": {
            "type": "string",
            "format": "datetime",
            "description": "When this unique layer first arrived"
          }
        }
      }
    }
  }
}
```

**Record key:** Digest hash (without the algorithm prefix)
- Example: `sha256:abc123...` → record key `abc123...`
- This makes records content-addressed and naturally deduplicated

### Example Record

```json
{
  "$type": "io.atcr.manifest.layers",
  "digest": "sha256:abc123def456...",
  "size": 12345678,
  "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
  "uploadedAt": "2025-10-18T12:34:56Z"
}
```

**AT-URI:** `at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123def456...`

## Implementation Details

### Files to Modify

1. **pkg/atproto/lexicon.go**
   - Add `ManifestLayersCollection = "io.atcr.manifest.layers"`
   - Add `ManifestLayerRecord` struct

2. **pkg/hold/multipart.go**
   - Update `MultipartSession` struct:
     - Rename `Digest` to `TempDigest` - temp identifier (e.g., "uploads/temp-<writerID>")
     - Add `FinalDigest string` - final digest (e.g., "sha256:abc123..."), set during complete
   - Update `StartMultipartUploadWithManager` to:
     - Receive tempDigest from AppView (not the final digest)
     - Create the S3 multipart at the temp path
     - Store TempDigest in the session (FinalDigest is null at start)
   - Modify `CompleteMultipartUploadWithManager` to:
     - Try PutRecord to create the layer record
     - If it exists: delete temp, return the existing record (dedupe)
     - If new: finalize storage (copy/move temp → final)
     - Handle rollback on errors

3. **pkg/hold/s3.go**
   - Add `copyBlob(src, dst)` for S3 CopyObject
   - Add `deleteBlob(path)` for cleanup

4. **pkg/hold/storage.go**
   - Update `blobPath()` to handle temp digests
   - Add a helper for final path generation

5. **pkg/hold/pds/server.go**
   - Add `PutRecord(ctx, collection, rkey, record)` method to HoldPDS
     - Wraps `repomgr.CreateRecord()` or `repomgr.UpdateRecord()`
     - Returns `ErrRecordAlreadyExists` if the rkey exists (for deduplication)
     - Similar pattern to the existing `AddCrewMember()` method
   - Add `DeleteRecord(ctx, collection, rkey)` method (for rollback)
     - Wraps `repomgr.DeleteRecord()`
   - Add error constant: `var ErrRecordAlreadyExists = errors.New("record already exists")`

6. **pkg/hold/pds/xrpc.go**
   - Update `BlobStore` interface:
     - Change the `CompleteMultipartUpload` signature:
       * Was: `CompleteMultipartUpload(ctx, uploadID, parts) error`
       * New: `CompleteMultipartUpload(ctx, uploadID, finalDigest, parts) (*LayerMetadata, error)`
       * Takes finalDigest to know where to move the blob + create the layer record
   - Update the `handleMultipartOperation` complete action to:
     - Parse `finalDigest` from the request body (NEW)
     - Look up the session by uploadID
     - Set session.FinalDigest = finalDigest
     - Call CompleteMultipartUpload (returns LayerMetadata)
     - Include the layerRecord AT-URI in the response
   - Add `LayerMetadata` struct:
     ```go
     type LayerMetadata struct {
         LayerRecord  string // AT-URI
         Digest       string
         Size         int64
         Deduplicated bool
     }
     ```

7. **pkg/appview/storage/proxy_blob_store.go**
   - Update `ProxyBlobWriter.Commit()` to send finalDigest in the complete request:
     ```go
     // Current: only sends tempDigest
     completeMultipartUpload(ctx, tempDigest, uploadID, parts)

     // New: also sends finalDigest
     completeMultipartUpload(ctx, uploadID, finalDigest, parts)
     ```
   - The writer already has `w.desc.Digest` (the real digest)
   - Pass both uploadID (to find the session) and finalDigest (for the move + layer record)

### API Changes

#### Complete Multipart Request (XRPC) - UPDATED

**Before:**
```json
{
  "action": "complete",
  "uploadId": "upload-1634567890",
  "parts": [
    { "partNumber": 1, "etag": "abc123" },
    { "partNumber": 2, "etag": "def456" }
  ]
}
```

**After (with finalDigest):**
```json
{
  "action": "complete",
  "uploadId": "upload-1634567890",
  "digest": "sha256:abc123...",
  "parts": [
    { "partNumber": 1, "etag": "abc123" },
    { "partNumber": 2, "etag": "def456" }
  ]
}
```

#### Complete Multipart Response (XRPC)

**Before:**
```json
{
  "status": "completed"
}
```

**After:**
```json
{
  "status": "completed",
  "layerRecord": "at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123...",
  "digest": "sha256:abc123...",
  "size": 12345678,
  "deduplicated": false
}
```

**Deduplication case:**
```json
{
  "status": "completed",
  "layerRecord": "at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123...",
  "digest": "sha256:abc123...",
  "size": 12345678,
  "deduplicated": true
}
```

### S3 Operations

**S3 Native Mode:**
```go
// Start: Create multipart upload at TEMP path
uploadID = s3.CreateMultipartUpload(bucket, "uploads/temp-<uuid>")

// Upload parts: to temp location
s3.UploadPart(bucket, "uploads/temp-<uuid>", partNum, data)

// Complete: Copy temp → final
s3.CopyObject(
	bucket, "uploads/temp-<uuid>",           // source
	bucket, "blobs/sha256/ab/abc123.../data" // dest
)
s3.DeleteObject(bucket, "uploads/temp-<uuid>")
```

**Buffered Mode:**
```go
// Parts buffered in memory
session.Parts[partNum] = data

// Complete: Write to final location
assembledData = session.AssembleBufferedParts()
driver.Writer("blobs/sha256/ab/abc123.../data").Write(assembledData)
```

## Manifest Integration

### Manifest Record Enhancement

When the AppView writes manifests to the user's PDS, include layer record references:

```json
{
  "$type": "io.atcr.manifest",
  "repository": "myapp",
  "digest": "sha256:manifest123...",
  "holdEndpoint": "https://hold1.atcr.io",
  "holdDid": "did:web:hold1.atcr.io",
  "layers": [
    {
      "digest": "sha256:abc123...",
      "size": 12345678,
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "layerRecord": "at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123..."
    }
  ]
}
```

**Cross-repo references:** Manifests in the user's PDS point to layer records in the hold's PDS.
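The `layerRecord` values above follow the standard AT-URI shape, with the digest hash as the record key. A small sketch of how a hold might derive them (the helper name is hypothetical, not the actual ATCR code):

```go
package main

import (
	"fmt"
	"strings"
)

// layerRecordURI builds the AT-URI for a layer record: the hold's
// DID as the repo, io.atcr.manifest.layers as the collection, and
// the digest hash (without the "sha256:" prefix) as the rkey.
func layerRecordURI(holdDID, digest string) string {
	rkey := strings.TrimPrefix(digest, "sha256:")
	return fmt.Sprintf("at://%s/io.atcr.manifest.layers/%s", holdDID, rkey)
}

func main() {
	fmt.Println(layerRecordURI("did:web:hold1.atcr.io", "sha256:abc123def456"))
	// → at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123def456
}
```

Because the rkey is derived purely from the digest, any party holding a manifest can reconstruct the layer record URI without an extra lookup.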
### AppView Flow

1. Client pushes a layer to the hold
2. Hold returns the `layerRecord` AT-URI in the response
3. AppView caches: `digest → layerRecord AT-URI`
4. When writing the manifest to the user's PDS:
   - Add a `layerRecord` field to each layer
   - Add `holdDid` to the manifest root

## Benefits

1. **ATProto Discovery**
   - `listRecords(io.atcr.manifest.layers)` shows all unique layers
   - Standard ATProto queries work

2. **Automatic Deduplication**
   - PutRecord with the digest as rkey is naturally idempotent
   - Concurrent uploads of the same layer are handled gracefully

3. **Audit Trail**
   - Track when each unique layer first arrived
   - Monitor storage growth by unique content

4. **Migration Support**
   - Enumerate all blobs via ATProto queries
   - Verify blob existence before migration

5. **Cross-Repo References**
   - Manifests link to layer records via AT-URI
   - Verifiable blob existence

6. **Future Features**
   - Per-layer access control
   - Retention policies
   - Layer tagging/metadata

## Trade-offs

### Complexity
- Additional PDS writes during upload
- An S3 copy operation (temp → final)
- Rollback logic if record creation succeeds but storage fails

### Performance
- Extra latency: PDS write + S3 copy
- BUT: Deduplication saves storage and the final copy on repeated uploads

### Storage
- Minimal: Layer records are just metadata (~200 bytes each)
- The S3 temp → final copy stays within the same S3 account (no egress cost)

### Consistency
- Must keep layer records and S3 blobs in sync
- Rollback deletes the layer record if storage fails
- Orphaned records are possible if the process crashes mid-commit

## Future Considerations

### Garbage Collection

Layer records enable GC:
```
1. List all layer records in the hold
2. For each layer:
   - Query manifests that reference it (via the AppView)
   - If no references, mark for deletion
3. Delete unreferenced layers (record + blob)
```

### Private Layers

Currently, holds are public or crew-only (hold-level auth). Future:
- Per-layer permissions via layer record metadata
- A reference from a manifest proves the user has access

### Layer Provenance

Track additional metadata:
- First uploader DID
- Upload source (manifest URI)
- Verification status

## Configuration

Add an environment variable:
```
HOLD_TRACK_LAYERS=true  # Enable layer record creation (default: true)
```

If disabled, the hold service works as before (no layer records).

## Testing Strategy

1. **Deduplication Test**
   - Upload the same layer twice
   - Verify only one record is created
   - Verify the second upload returns the same AT-URI

2. **Concurrent Upload Test**
   - Upload the same layer from 2 clients simultaneously
   - Verify one succeeds, one dedupes
   - Verify only one blob in S3

3. **Rollback Test**
   - Mock an S3 failure after record creation
   - Verify the layer record is deleted (rollback)

4. **Migration Test**
   - Upload multiple layers
   - List all layer records
   - Verify the blobs exist in S3

## Open Questions

1. **What happens if the S3 copy fails after record creation?**
   - Current plan: Delete the layer record (rollback)
   - Alternative: Leave the record, retry the copy on the next request?

2. **Should we verify that the blob digest matches the record?**
   - On upload: the client provides the digest, but we trust it
   - Could compute the digest during upload to verify

3. **How to handle orphaned layer records?**
   - Record exists but the blob is missing from S3
   - Background job to verify and clean up?

4. **Should manifests store layer records?**
   - Yes: strong references, verifiable
   - No: extra complexity, larger manifests
   - **Decision:** Yes, for ATProto graph completeness

## Testing & Verification

### Verify the Quick Fix Works (Bug is Fixed)

After the quick fix implementation:

1. **Push a test image** with S3Native mode enabled
2. **Verify the blob is at the final location:**
   ```bash
   aws s3 ls s3://bucket/docker/registry/v2/blobs/sha256/ab/abc123.../data
   ```
3. **Verify temp is cleaned up:**
   ```bash
   aws s3 ls s3://bucket/docker/registry/v2/uploads/  # Should list no temp-* objects
   ```
4. **Pull the image** → should succeed ✅

### Test Layer Records Feature (When Implemented)

After implementing the full layer records feature:

1. **Push an image**
2. **Verify the layer record is created:**
   ```
   GET /xrpc/com.atproto.repo.getRecord?repo={holdDID}&collection=io.atcr.manifest.layers&rkey=abc123...
   ```
3. **Verify the blob is at the final location** (same as quick fix)
4. **Verify temp is deleted** (same as quick fix)
5. **Pull the image** → should succeed

### Test Deduplication (Layer Records Feature)

1. Push the same layer from a different client
2. Verify only one layer record exists
3. Verify complete returns `deduplicated: true`
4. Verify there are no duplicate blobs in S3
5. Verify the temp blob was deleted without copying (dedupe path)

## Summary

### Current State (Quick Fix Implemented)

The critical bug is **FIXED**:
- ✅ S3Native mode correctly moves blobs from temp → final digest location
- ✅ AppView sends the real digest in complete requests
- ✅ Blobs are stored at correct paths, downloads work
- ✅ Temp uploads are cleaned up properly

### Future State (Layer Records Feature)

When implemented, layer records will make ATCR more ATProto-native by:
- 🔮 Storing unique blobs as discoverable ATProto records
- 🔮 Enabling deduplication via idempotent PutRecord (check before storing)
- 🔮 Creating cross-repo references (manifest → layer records)
- 🔮 Providing a foundation for GC, access control, provenance tracking

**Next Steps:**
1. Test the quick fix in production
2. Plan the layer records implementation (requires PDS record creation)
3. Implement deduplication logic
4. Add manifest backlinks to layer records
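As a closing illustration, the deduplication contract the whole plan leans on — PutRecord with a content-addressed rkey is write-once — can be modeled in a few lines (an in-memory stand-in, not the real PDS API):

```go
package main

import (
	"errors"
	"fmt"
)

var ErrRecordAlreadyExists = errors.New("record already exists")

// memPDS models the idempotent PutRecord behaviour the design
// relies on: one record per digest rkey; a second write for the
// same rkey reports a conflict instead of duplicating state.
type memPDS struct {
	records map[string]any
}

func (p *memPDS) PutRecord(rkey string, rec any) error {
	if _, ok := p.records[rkey]; ok {
		return ErrRecordAlreadyExists
	}
	p.records[rkey] = rec
	return nil
}

func main() {
	pds := &memPDS{records: map[string]any{}}
	fmt.Println(pds.PutRecord("abc123", "layer") == nil)                    // true: new layer
	fmt.Println(pds.PutRecord("abc123", "layer") == ErrRecordAlreadyExists) // true: dedupe path
}
```

The hold's complete handler branches on exactly this distinction: a nil error finalizes storage at the digest path, while `ErrRecordAlreadyExists` deletes the temp blob and returns the existing record URI.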