# Layer Records in ATProto

## Overview

This document describes the architecture for storing container layer metadata as ATProto records in the hold service's embedded PDS. This makes blob storage more "ATProto-native" by creating discoverable records for each unique layer.

## TL;DR

**Status: BUG FIXED ✅ | Layer Records Feature PLANNED 🔮**

### Quick Fix (IMPLEMENTED)

The critical bug where S3Native multipart uploads didn't move from temp → final location is now **FIXED**.

**What was fixed:**
1. ✅ AppView sends the real digest in the complete request (not just tempDigest)
2. ✅ Hold's CompleteMultipartUploadWithManager now accepts a finalDigest parameter
3. ✅ S3Native mode copies temp → final and deletes temp
4. ✅ Buffered mode writes directly to the final location

**Files changed:**
- `pkg/appview/storage/proxy_blob_store.go` - Send real digest
- `pkg/hold/s3.go` - Add copyBlobS3() and deleteBlobS3()
- `pkg/hold/multipart.go` - Use finalDigest and move blob
- `pkg/hold/blobstore_adapter.go` - Pass finalDigest through
- `pkg/hold/pds/xrpc.go` - Update interface and handler

### Layer Records Feature (PLANNED)

Building on the quick fix, layer records will add:
1. 🔮 Hold creates an ATProto record for each unique layer
2. 🔮 Deduplication: check whether a layer record exists before finalizing an upload
3. 🔮 Manifest backlinks: include layer record AT-URIs
4. 🔮 Discovery: `listRecords(io.atcr.manifest.layers)` shows all unique blobs

**Benefits:**
- Makes blobs discoverable via the ATProto protocol
- Enables garbage collection (find unreferenced layers)
- Foundation for per-layer access control
- Audit trail for storage operations

## Motivation

**Goal:** Make hold services more ATProto-native by tracking unique blobs as records.

**Benefits:**
- **Discovery:** Query `listRecords(io.atcr.manifest.layers)` to see all unique layers in a hold
- **Auditing:** Track when unique content arrived, sizes, media types
- **Deduplication:** One record per unique digest (not per upload)
- **Migration:** Enumerate all blobs for moving between storage backends
- **Future:** Foundation for per-blob access control, retention policies

**Key Design Decision:** Store records for **unique digests only**, not every blob upload. This mirrors the content-addressed deduplication already happening in S3.

## Current Upload Flow

### OCI Distribution Spec Pattern

The OCI distribution spec uses a multi-step upload:

1. **Initiate Upload**
   ```
   POST /v2/<name>/blobs/uploads/
   → Returns upload UUID (digest unknown at this point!)
   ```

2. **Upload Data**
   ```
   PATCH/PUT to temp location: uploads/temp-<uuid>
   → Client streams blob data
   → Digest not yet known
   ```

3. **Finalize Upload**
   ```
   PUT /v2/<name>/blobs/uploads/<uuid>?digest=sha256:abc123
   → Digest provided at finalization time
   → Registry moves: temp → final location at digest path
   ```

**Critical insight:** In standard OCI distribution, the digest is only known at **finalization time**, not during upload. This allows clients to compute the digest as they stream data.

### Current ATCR Implementation

**Multipart Upload Flow:**

```
1. Start multipart (XRPC POST with action=start, digest=sha256:abc...)
   - Client provides digest upfront (xrpc.go:849 requires req.Digest)
   - Generate uploadID (UUID)
   - S3Native: Create S3 multipart upload at FINAL path blobPath(digest)
   - Buffered: Create in-memory session with digest
   - Session stores: uploadID, digest, mode

2. Upload parts (XRPC POST with action=part, uploadId, partNumber)
   - S3Native: Returns presigned URLs to upload parts to final location
   - Buffered: Returns XRPC endpoint with X-Upload-Id/X-Part-Number headers
   - Parts go to final digest location (S3Native) or memory (Buffered)

3. Complete (XRPC POST with action=complete, uploadId, parts[])
   - S3Native: S3 CompleteMultipartUpload at final location
   - Buffered: Assemble parts, write to final location blobPath(digest)
```

**Current paths:**
- Final: `/docker/registry/v2/blobs/{algorithm}/{xx}/{hash}/data`
- Example: `/docker/registry/v2/blobs/sha256/ab/abc123.../data`
- Temp: `/docker/registry/v2/uploads/temp-<uuid>/data` (used during upload, then moved to final)

**Key insight:** Unlike the standard OCI distribution spec (where the digest is provided at finalization), ATCR's XRPC multipart flow requires the digest upfront at start time. This is fine, but we should still use temp paths for atomic deduplication with layer records.

**Note:** The move operation bug described below has been fixed. The rest of this document describes the planned layer records feature.

## The Bug (FIXED)

### How It Was Fixed

The bug was fixed by:

1. **AppView** sends the real digest in the complete request (not tempDigest)
   - `pkg/appview/storage/proxy_blob_store.go:740-745`

2. **Hold** accepts a finalDigest parameter in CompleteMultipartUpload
   - `pkg/hold/multipart.go:281` - Added finalDigest parameter
   - `pkg/hold/s3.go:223-285` - Added copyBlobS3() and deleteBlobS3()

3. **S3Native mode** now moves the blob from temp → final location
   - Complete multipart at temp location
   - Copy to final digest location
   - Delete temp

4. **Buffered mode** writes directly to the final location (no change needed)

**Result:** Blobs are now correctly placed at final digest paths, and downloads work correctly.

### The Problem (Historical Context)

Looking at the old `pkg/hold/multipart.go:278-317`, the `CompleteMultipartUploadWithManager` function:

**S3Native mode (lines 282-289):**
```go
if session.Mode == S3Native {
    parts := session.GetCompletedParts()
    if err := s.completeMultipartUpload(ctx, session.Digest, session.S3UploadID, parts); err != nil {
        return fmt.Errorf("failed to complete S3 multipart: %w", err)
    }
    log.Printf("Completed S3 native multipart: uploadID=%s, parts=%d", session.UploadID, len(parts))
    return nil // ❌ Missing move operation!
}
```

**What's missing:**
1. S3 CompleteMultipartUpload assembles parts at the temp location: `uploads/temp-<uuid>`
2. **MISSING:** S3 CopyObject from `uploads/temp-<uuid>` → `blobs/sha256/ab/abc123.../data`
3. **MISSING:** Delete the temp blob

**Buffered mode works correctly** (lines 292-316) because it writes assembled data directly to the final path `blobPath(session.Digest)`.

### Evidence from Design Doc

From `docs/XRPC_BLOB_MIGRATION.md` (lines 105-114):
```
1. Multipart parts uploaded → uploads/temp-{uploadID}
2. Complete multipart → S3 assembles parts at uploads/temp-{uploadID}
3. **Move operation** → S3 copy from uploads/temp-{uploadID} → blobs/sha256/ab/abc123...
```

The move was supposed to be internalized into the complete action (lines 308-311):
```
Call service.CompleteMultipartUploadWithManager(ctx, session, multipartMgr)
  - This internally calls S3 CompleteMultipartUpload to assemble parts
  - Then performs server-side S3 copy from temp location to final digest location
  - Equivalent to legacy /move endpoint operation
```

### The Actual Flow (Before the Fix)

**AppView sent tempDigest:**
```go
// proxy_blob_store.go
tempDigest := fmt.Sprintf("uploads/temp-%s", writerID)
uploadID, err := p.startMultipartUpload(ctx, tempDigest)
// Passes tempDigest to hold via XRPC
```

**Hold received and used tempDigest:**
```go
// xrpc.go:854
uploadID, mode, err := h.blobStore.StartMultipartUpload(ctx, req.Digest)
// req.Digest = "uploads/temp-<writerID>" from AppView

// blobstore_adapter.go → multipart.go → s3.go:93
path := blobPath(digest) // digest = "uploads/temp-<writerID>"
// Returns: "/docker/registry/v2/uploads/temp-<writerID>/data"

// S3 multipart created at temp path ✅
```

**Parts uploaded to temp location ✅**

**Complete called:**
```go
// proxy_blob_store.go (comment on line):
// Complete multipart upload - XRPC complete action handles move internally
if err := w.store.completeMultipartUpload(ctx, tempDigest, w.uploadID, w.parts); err != nil
```

**Hold's CompleteMultipartUploadWithManager for S3Native:**
```go
// multipart.go:282-289
if session.Mode == S3Native {
    parts := session.GetCompletedParts()
    if err := s.completeMultipartUpload(ctx, session.Digest, session.S3UploadID, parts); err != nil {
        return fmt.Errorf("failed to complete S3 multipart: %w", err)
    }
    log.Printf("Completed S3 native multipart: uploadID=%s, parts=%d", session.UploadID, len(parts))
    return nil // ❌ BUG: No move operation!
}
```

**Result:**
- The blob ended up at: `/docker/registry/v2/uploads/temp-<writerID>/data` (temp location)
- The blob should have been at: `/docker/registry/v2/blobs/sha256/ab/abc123.../data` (final location)
- **Downloads would fail** because AppView looks for the blob at the final digest path

**Why this might have appeared to work:**
- Buffered mode writes directly to the final path (no temp used)
- Or S3Native wasn't being used in deployments at the time
- Or there was a workaround somewhere else

## Proposed Flow with Layer Records (Future Feature)

### High-Level Flow

**Building on the quick fix above, layer records will add:**
1. PDS record creation for each unique layer digest
2. A deduplication check before finalizing storage
3. Manifest backlinks to layer records

**Note:** The quick fix already implements sending finalDigest in the complete request. The layer records feature extends this to create ATProto records.

```
1. Start multipart upload (XRPC action=start with tempDigest)
   - AppView provides tempDigest: "uploads/temp-<writerID>"
   - S3Native: Create S3 multipart at temp path: /uploads/temp-<writerID>/data
   - Buffered: Create in-memory session with temp identifier
   - Store in MultipartSession:
     * TempDigest: "uploads/temp-<writerID>" (upload location)
     * FinalDigest: null (not known yet at start time!)

   NOTE: AppView knows the real digest (desc.Digest), but doesn't send it at start

2. Upload parts (XRPC action=part)
   - S3Native: Presigned URLs to temp path (uploads/temp-<uuid>)
   - Buffered: Buffer parts in memory with temp identifier
   - All parts go to temp location (not final digest location yet)

3. Complete upload (XRPC action=complete, uploadId, finalDigest, parts)
   - AppView NOW sends:
     * uploadId: the session ID
     * finalDigest: "sha256:abc123..." (the real digest for final location)
     * parts: array of {partNumber, etag}

   - Hold looks up session by uploadId
   - Updates session.FinalDigest = finalDigest

   a. Try PutRecord(io.atcr.manifest.layers, digestHash, layerRecord)
      - digestHash = finalDigest without "sha256:" prefix
      - Record key = digestHash (content-addressed, naturally idempotent)

   b. If record already exists (PDS returns ErrRecordAlreadyExists):
      - DEDUPLICATION! Layer already tracked
      - Delete temp blob (S3 or buffered data)
      - Return existing layerRecord AT-URI
      - Storage saved (data was uploaded to temp, but is not stored twice)

   c. If record creation succeeds (new layer!):
      - Finalize storage:
        * S3Native: S3 CopyObject(uploads/temp-<uuid> → blobs/sha256/ab/abc123.../data)
        * Buffered: Write assembled data to final path (blobs/sha256/ab/abc123.../data)
      - Delete temp
      - Return new layerRecord AT-URI + metadata

   d. If record creation fails (PDS error):
      - Delete temp blob
      - Return error (upload failed, no storage consumed)
```

**Why use temp paths if the digest is known?**
- The deduplication check happens BEFORE committing the blob to storage
- If the layer already exists, we avoid an expensive S3 copy to the final location
- Atomic: record creation + blob finalization happen together

### Atomic Commit Logic

The key is making record creation + blob finalization atomic:

```go
// In CompleteMultipartUploadWithManager
func (s *HoldService) CompleteMultipartUploadWithManager(
    ctx context.Context,
    session *MultipartSession,
    manager *MultipartManager,
) (layerRecordURI string, err error) {
    defer manager.DeleteSession(session.UploadID)

    // Session now has both temp and final digests
    tempDigest := session.TempDigest   // "uploads/temp-<writerID>"
    finalDigest := session.FinalDigest // "sha256:abc123..." (set during complete)

    tempPath := blobPath(tempDigest)   // /uploads/temp-<writerID>/data
    finalPath := blobPath(finalDigest) // /blobs/sha256/ab/abc123.../data

    // Extract digest hash for record key
    digestHash := strings.TrimPrefix(finalDigest, "sha256:")

    // Build layer record
    layerRecord := &atproto.ManifestLayerRecord{
        Type:       "io.atcr.manifest.layers",
        Digest:     finalDigest,
        Size:       session.TotalSize,
        MediaType:  "application/vnd.oci.image.layer.v1.tar+gzip",
        UploadedAt: time.Now().Format(time.RFC3339),
    }

    // Try to create layer record (idempotent with digest as rkey)
    err = s.holdPDS.PutRecord(ctx, atproto.ManifestLayersCollection, digestHash, layerRecord)

    if errors.Is(err, atproto.ErrRecordAlreadyExists) {
        // Dedupe! Layer already tracked
        log.Printf("Layer already exists, deduplicating: digest=%s", finalDigest)
        s.deleteBlob(ctx, tempPath)

        // Return existing record URI
        return fmt.Sprintf("at://%s/%s/%s",
            s.holdPDS.DID(),
            atproto.ManifestLayersCollection,
            digestHash), nil
    } else if err != nil {
        // PDS error - abort upload
        log.Printf("Failed to create layer record: %v", err)
        s.deleteBlob(ctx, tempPath)
        return "", fmt.Errorf("failed to create layer record: %w", err)
    }

    // New layer! Finalize storage
    if session.Mode == S3Native {
        // S3 multipart already uploaded to temp path.
        // Copy to final location.
        if err := s.copyBlob(ctx, tempPath, finalPath); err != nil {
            // Rollback: delete layer record
            s.holdPDS.DeleteRecord(ctx, atproto.ManifestLayersCollection, digestHash)
            s.deleteBlob(ctx, tempPath)
            return "", fmt.Errorf("failed to copy blob: %w", err)
        }
        s.deleteBlob(ctx, tempPath)
    } else {
        // Buffered mode: assemble and write to final location
        data, size, err := session.AssembleBufferedParts()
        if err != nil {
            s.holdPDS.DeleteRecord(ctx, atproto.ManifestLayersCollection, digestHash)
            return "", fmt.Errorf("failed to assemble parts: %w", err)
        }

        if err := s.writeBlob(ctx, finalPath, data); err != nil {
            s.holdPDS.DeleteRecord(ctx, atproto.ManifestLayersCollection, digestHash)
            return "", fmt.Errorf("failed to write blob: %w", err)
        }

        log.Printf("Wrote blob to final location: size=%d", size)
    }

    // Success! Return new layer record URI
    layerRecordURI = fmt.Sprintf("at://%s/%s/%s",
        s.holdPDS.DID(),
        atproto.ManifestLayersCollection,
        digestHash)

    log.Printf("Created new layer record: %s", layerRecordURI)
    return layerRecordURI, nil
}
```

## Lexicon Schema

### io.atcr.manifest.layers

```json
{
  "lexicon": 1,
  "id": "io.atcr.manifest.layers",
  "defs": {
    "main": {
      "type": "record",
      "key": "any",
      "record": {
        "type": "object",
        "required": ["digest", "size", "mediaType", "uploadedAt"],
        "properties": {
          "digest": {
            "type": "string",
            "description": "Full OCI digest (sha256:abc123...)"
          },
          "size": {
            "type": "integer",
            "description": "Size in bytes"
          },
          "mediaType": {
            "type": "string",
            "description": "Media type (e.g., application/vnd.oci.image.layer.v1.tar+gzip)"
          },
          "uploadedAt": {
            "type": "string",
            "format": "datetime",
            "description": "When this unique layer first arrived"
          }
        }
      }
    }
  }
}
```

**Record key:** Digest hash (without the algorithm prefix); the schema uses `"key": "any"` so digest-derived rkeys are allowed (a `literal:` key would force a single record per repo)
- Example: `sha256:abc123...` → record key `abc123...`
- This makes records content-addressed and naturally deduplicating

### Example Record

```json
{
  "$type": "io.atcr.manifest.layers",
  "digest": "sha256:abc123def456...",
  "size": 12345678,
  "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
  "uploadedAt": "2025-10-18T12:34:56Z"
}
```

**AT-URI:** `at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123def456...`

## Implementation Details

### Files to Modify

1. **pkg/atproto/lexicon.go**
   - Add `ManifestLayersCollection = "io.atcr.manifest.layers"`
   - Add `ManifestLayerRecord` struct

2. **pkg/hold/multipart.go**
   - Update `MultipartSession` struct:
     - Rename `Digest` to `TempDigest` - temp identifier (e.g., "uploads/temp-<writerID>")
     - Add `FinalDigest string` - final digest (e.g., "sha256:abc123..."), set during complete
   - Update `StartMultipartUploadWithManager` to:
     - Receive tempDigest from AppView (not the final digest)
     - Create S3 multipart at temp path
     - Store TempDigest in session (FinalDigest is null at start)
   - Modify `CompleteMultipartUploadWithManager` to:
     - Try PutRecord to create the layer record
     - If it exists: delete temp, return existing record (dedupe)
     - If new: finalize storage (copy/move temp → final)
     - Handle rollback on errors

3. **pkg/hold/s3.go**
   - Add `copyBlob(src, dst)` for S3 CopyObject
   - Add `deleteBlob(path)` for cleanup

4. **pkg/hold/storage.go**
   - Update `blobPath()` to handle temp digests
   - Add a helper for final path generation

5. **pkg/hold/pds/server.go**
   - Add `PutRecord(ctx, collection, rkey, record)` method to HoldPDS
     - Wraps `repomgr.CreateRecord()` or `repomgr.UpdateRecord()`
     - Returns `ErrRecordAlreadyExists` if the rkey exists (for deduplication)
     - Similar pattern to the existing `AddCrewMember()` method
   - Add `DeleteRecord(ctx, collection, rkey)` method (for rollback)
     - Wraps `repomgr.DeleteRecord()`
   - Add error constant: `var ErrRecordAlreadyExists = errors.New("record already exists")`

6. **pkg/hold/pds/xrpc.go**
   - Update `BlobStore` interface:
     - Change the `CompleteMultipartUpload` signature:
       * Was: `CompleteMultipartUpload(ctx, uploadID, parts) error`
       * New: `CompleteMultipartUpload(ctx, uploadID, finalDigest, parts) (*LayerMetadata, error)`
       * Takes finalDigest to know where to move the blob + create the layer record
   - Update the `handleMultipartOperation` complete action to:
     - Parse `finalDigest` from the request body (NEW)
     - Look up the session by uploadID
     - Set session.FinalDigest = finalDigest
     - Call CompleteMultipartUpload (returns LayerMetadata)
     - Include the layerRecord AT-URI in the response
   - Add `LayerMetadata` struct:
     ```go
     type LayerMetadata struct {
         LayerRecord  string // AT-URI
         Digest       string
         Size         int64
         Deduplicated bool
     }
     ```

7. **pkg/appview/storage/proxy_blob_store.go**
   - Update `ProxyBlobWriter.Commit()` to send finalDigest in the complete request:
     ```go
     // Current: only sends tempDigest
     completeMultipartUpload(ctx, tempDigest, uploadID, parts)

     // New: also sends finalDigest
     completeMultipartUpload(ctx, uploadID, finalDigest, parts)
     ```
   - The writer already has `w.desc.Digest` (the real digest)
   - Pass both uploadID (to find the session) and finalDigest (for the move + layer record)

### API Changes

#### Complete Multipart Request (XRPC) - UPDATED

**Before:**
```json
{
  "action": "complete",
  "uploadId": "upload-1634567890",
  "parts": [
    { "partNumber": 1, "etag": "abc123" },
    { "partNumber": 2, "etag": "def456" }
  ]
}
```

**After (with finalDigest):**
```json
{
  "action": "complete",
  "uploadId": "upload-1634567890",
  "digest": "sha256:abc123...",
  "parts": [
    { "partNumber": 1, "etag": "abc123" },
    { "partNumber": 2, "etag": "def456" }
  ]
}
```

#### Complete Multipart Response (XRPC)

**Before:**
```json
{
  "status": "completed"
}
```

**After:**
```json
{
  "status": "completed",
  "layerRecord": "at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123...",
  "digest": "sha256:abc123...",
  "size": 12345678,
  "deduplicated": false
}
```

**Deduplication case:**
```json
{
  "status": "completed",
  "layerRecord": "at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123...",
  "digest": "sha256:abc123...",
  "size": 12345678,
  "deduplicated": true
}
```

### S3 Operations

**S3 Native Mode:**
```go
// Start: Create multipart upload at TEMP path
uploadID = s3.CreateMultipartUpload(bucket, "uploads/temp-<uuid>")

// Upload parts: to temp location
s3.UploadPart(bucket, "uploads/temp-<uuid>", partNum, data)

// Complete: Copy temp → final
s3.CopyObject(
    bucket, "uploads/temp-<uuid>",           // source
    bucket, "blobs/sha256/ab/abc123.../data" // dest
)
s3.DeleteObject(bucket, "uploads/temp-<uuid>")
```

**Buffered Mode:**
```go
// Parts buffered in memory
session.Parts[partNum] = data

// Complete: Write to final location
assembledData = session.AssembleBufferedParts()
driver.Writer("blobs/sha256/ab/abc123.../data").Write(assembledData)
```

## Manifest Integration

### Manifest Record Enhancement

When AppView writes manifests to the user's PDS, include layer record references:

```json
{
  "$type": "io.atcr.manifest",
  "repository": "myapp",
  "digest": "sha256:manifest123...",
  "holdEndpoint": "https://hold1.atcr.io",
  "holdDid": "did:web:hold1.atcr.io",
  "layers": [
    {
      "digest": "sha256:abc123...",
      "size": 12345678,
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "layerRecord": "at://did:web:hold1.atcr.io/io.atcr.manifest.layers/abc123..."
    }
  ]
}
```

**Cross-repo references:** Manifests in the user's PDS point to layer records in the hold's PDS.

### AppView Flow

1. Client pushes layer to hold
2. Hold returns the `layerRecord` AT-URI in its response
3. AppView caches: `digest → layerRecord AT-URI`
4. When writing the manifest to the user's PDS:
   - Add a `layerRecord` field to each layer
   - Add `holdDid` to the manifest root

## Benefits

1. **ATProto Discovery**
   - `listRecords(io.atcr.manifest.layers)` shows all unique layers
   - Standard ATProto queries work

2. **Automatic Deduplication**
   - PutRecord with the digest as rkey is naturally idempotent
   - Concurrent uploads of the same layer are handled gracefully

3. **Audit Trail**
   - Track when each unique layer first arrived
   - Monitor storage growth by unique content

4. **Migration Support**
   - Enumerate all blobs via ATProto queries
   - Verify blob existence before migration

5. **Cross-Repo References**
   - Manifests link to layer records via AT-URI
   - Verifiable blob existence

6. **Future Features**
   - Per-layer access control
   - Retention policies
   - Layer tagging/metadata

## Trade-offs

### Complexity
- Additional PDS writes during upload
- S3 copy operation (temp → final)
- Rollback logic if record creation succeeds but storage fails

### Performance
- Extra latency: PDS write + S3 copy
- BUT: Deduplication avoids duplicate storage and the final-copy step on repeated uploads

### Storage
- Minimal: Layer records are just metadata (~200 bytes each)
- The S3 temp → final copy stays within the same S3 account (no egress cost)

### Consistency
- Must keep layer records and S3 blobs in sync
- Rollback deletes the layer record if storage fails
- Orphaned records are possible if the process crashes mid-commit

## Future Considerations

### Garbage Collection

Layer records enable GC:
```
1. List all layer records in hold
2. For each layer:
   - Query manifests that reference it (via AppView)
   - If no references, mark for deletion
3. Delete unreferenced layers (record + blob)
```

### Private Layers

Currently, holds are public or crew-only (hold-level auth). Future:
- Per-layer permissions via layer record metadata
- A reference from a manifest proves the user has access

### Layer Provenance

Track additional metadata:
- First uploader DID
- Upload source (manifest URI)
- Verification status

## Configuration

Add an environment variable:
```
HOLD_TRACK_LAYERS=true # Enable layer record creation (default: true)
```

If disabled, the hold service works as before (no layer records).

## Testing Strategy

1. **Deduplication Test**
   - Upload the same layer twice
   - Verify only one record is created
   - Verify the second upload returns the same AT-URI

2. **Concurrent Upload Test**
   - Upload the same layer from 2 clients simultaneously
   - Verify one succeeds and one dedupes
   - Verify only one blob in S3

3. **Rollback Test**
   - Mock an S3 failure after record creation
   - Verify the layer record is deleted (rollback)

4. **Migration Test**
   - Upload multiple layers
   - List all layer records
   - Verify the blobs exist in S3

## Open Questions

1. **What happens if the S3 copy fails after record creation?**
   - Current plan: Delete the layer record (rollback)
   - Alternative: Leave the record and retry the copy on the next request?

2. **Should we verify that the blob digest matches the record?**
   - On upload: The client provides the digest, but we trust it
   - Could compute the digest during upload to verify

3. **How to handle orphaned layer records?**
   - Record exists but the blob is missing from S3
   - Background job to verify and clean up?

4. **Should manifests store layer records?**
   - Yes: Strong references, verifiable
   - No: Extra complexity, larger manifests
   - **Decision:** Yes, for ATProto graph completeness

## Testing & Verification

### Verify the Quick Fix Works (Bug is Fixed)

After the quick fix implementation:

1. **Push a test image** with S3Native mode enabled
2. **Verify the blob is at the final location:**
   ```bash
   aws s3 ls s3://bucket/docker/registry/v2/blobs/sha256/ab/abc123.../data
   ```
3. **Verify temp is cleaned up:**
   ```bash
   aws s3 ls s3://bucket/docker/registry/v2/uploads/  # Should list no temp- objects
   ```
   (`aws s3 ls` matches by prefix, not glob, so list the `uploads/` prefix.)
4. **Pull the image** → should succeed ✅

### Test Layer Records Feature (When Implemented)

After implementing the full layer records feature:

1. **Push an image**
2. **Verify the layer record was created:**
   ```
   GET /xrpc/com.atproto.repo.getRecord?repo={holdDID}&collection=io.atcr.manifest.layers&rkey=abc123...
   ```
3. **Verify the blob is at the final location** (same as quick fix)
4. **Verify temp deleted** (same as quick fix)
5. **Pull the image** → should succeed

### Test Deduplication (Layer Records Feature)

1. Push the same layer from a different client
2. Verify only one layer record exists
3. Verify complete returns `deduplicated: true`
4. Verify no duplicate blobs in S3
5. Verify the temp blob was deleted without copying (dedupe path)

## Summary

### Current State (Quick Fix Implemented)

The critical bug is **FIXED**:
- ✅ S3Native mode correctly moves blobs from temp → final digest location
- ✅ AppView sends the real digest in complete requests
- ✅ Blobs are stored at the correct paths, and downloads work
- ✅ Temp uploads are cleaned up properly

### Future State (Layer Records Feature)

When implemented, layer records will make ATCR more ATProto-native by:
- 🔮 Storing unique blobs as discoverable ATProto records
- 🔮 Enabling deduplication via idempotent PutRecord (check before finalizing)
- 🔮 Creating cross-repo references (manifest → layer records)
- 🔮 Providing a foundation for GC, access control, and provenance tracking

**Next Steps:**
1. Test the quick fix in production
2. Plan the layer records implementation (requires PDS record creation)
3. Implement the deduplication logic
4. Add manifest backlinks to layer records