# Blob Support Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add blob (image/video) upload, storage, and retrieval to the PDS using Cloudflare R2.

**Architecture:** Blobs are stored in an R2 bucket keyed by `{did}/{cid}`. Metadata is tracked in SQLite tables (`blob`, `record_blob`) within each Durable Object. Orphan cleanup runs via a DO alarm. MIME sniffing for security.

**Tech Stack:** Cloudflare R2, Durable Object SQLite, Web Crypto API (SHA-256 for CID generation)

---

## Task 1: Add R2 Bucket Binding

**Files:**

- Modify: `wrangler.toml`

**Step 1: Add R2 binding to wrangler.toml**

Add after the existing migrations section:

```toml
[[r2_buckets]]
binding = "BLOBS"
bucket_name = "pds-blobs"
```

**Step 2: Create the R2 bucket (if it doesn't exist)**

Run: `npx wrangler r2 bucket create pds-blobs`

**Step 3: Commit**

```bash
git add wrangler.toml
git commit -m "feat: add R2 bucket binding for blob storage"
```

---

## Task 2: Add Blob Database Schema

**Files:**

- Modify: `src/pds.js:1162-1190` (constructor schema initialization)

**Step 1: Add blob and record_blob tables**

In the `PersonalDataServer` constructor, after the existing `CREATE TABLE` statements (around line 1186), add:

```sql
CREATE TABLE IF NOT EXISTS blob (
  cid TEXT PRIMARY KEY,
  mimeType TEXT NOT NULL,
  size INTEGER NOT NULL,
  createdAt TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS record_blob (
  blobCid TEXT NOT NULL,
  recordUri TEXT NOT NULL,
  PRIMARY KEY (blobCid, recordUri)
);
```
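These statements are plain SQL; a minimal sketch of how they might be executed in the constructor, assuming the class holds a `this.sql` handle to the Durable Object's SQLite storage (the same handle the later handlers use):

```javascript
// Sketch only; the real constructor in src/pds.js is authoritative.
// `this.sql` is assumed to be the DO's SqlStorage (ctx.storage.sql);
// exec() accepts multiple semicolon-separated statements.
this.sql.exec(`
  CREATE TABLE IF NOT EXISTS blob (
    cid TEXT PRIMARY KEY,
    mimeType TEXT NOT NULL,
    size INTEGER NOT NULL,
    createdAt TEXT NOT NULL
  );
  CREATE TABLE IF NOT EXISTS record_blob (
    blobCid TEXT NOT NULL,
    recordUri TEXT NOT NULL,
    PRIMARY KEY (blobCid, recordUri)
  );
`);
```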
**Step 2: Test schema creation manually**

Deploy and verify the tables exist:

```bash
npx wrangler deploy
```

**Step 3: Commit**

```bash
git add src/pds.js
git commit -m "feat: add blob and record_blob tables to schema"
```

---

## Task 3: Implement MIME Type Sniffing

**Files:**

- Modify: `src/pds.js` (add after error helper, around line 30)
- Test: `test/pds.test.js`

**Step 1: Write the failing test**

Add to `test/pds.test.js`:

```javascript
import {
  // ... existing imports ...
  sniffMimeType,
} from '../src/pds.js';

describe('MIME Type Sniffing', () => {
  test('detects JPEG', () => {
    const bytes = new Uint8Array([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]);
    assert.strictEqual(sniffMimeType(bytes), 'image/jpeg');
  });

  test('detects PNG', () => {
    const bytes = new Uint8Array([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]);
    assert.strictEqual(sniffMimeType(bytes), 'image/png');
  });

  test('detects GIF', () => {
    const bytes = new Uint8Array([0x47, 0x49, 0x46, 0x38, 0x39, 0x61]);
    assert.strictEqual(sniffMimeType(bytes), 'image/gif');
  });

  test('detects WebP', () => {
    const bytes = new Uint8Array([
      0x52, 0x49, 0x46, 0x46, // RIFF
      0x00, 0x00, 0x00, 0x00, // size (ignored)
      0x57, 0x45, 0x42, 0x50, // WEBP
    ]);
    assert.strictEqual(sniffMimeType(bytes), 'image/webp');
  });

  test('detects MP4', () => {
    const bytes = new Uint8Array([
      0x00, 0x00, 0x00, 0x18, // size
      0x66, 0x74, 0x79, 0x70, // ftyp
    ]);
    assert.strictEqual(sniffMimeType(bytes), 'video/mp4');
  });

  test('returns null for unknown', () => {
    const bytes = new Uint8Array([0x00, 0x01, 0x02, 0x03]);
    assert.strictEqual(sniffMimeType(bytes), null);
  });
});
```

**Step 2: Run test to verify it fails**

Run: `npm test`

Expected: FAIL with "sniffMimeType is not exported"

**Step 3: Write minimal implementation**

Add to `src/pds.js` after the error helper (around line 30):

```javascript
// === MIME TYPE SNIFFING ===
// Detect file type from magic bytes (first 12 bytes)

/**
 * Sniff MIME type from file magic bytes
 * @param {Uint8Array|ArrayBuffer} bytes - File bytes (only first 12 needed)
 * @returns {string|null} Detected MIME type or null if unknown
 */
export function sniffMimeType(bytes) {
  const arr = new Uint8Array(bytes.slice(0, 12));

  // JPEG: FF D8 FF
  if (arr[0] === 0xff && arr[1] === 0xd8 && arr[2] === 0xff) {
    return 'image/jpeg';
  }

  // PNG: 89 50 4E 47 0D 0A 1A 0A
  if (
    arr[0] === 0x89 && arr[1] === 0x50 && arr[2] === 0x4e && arr[3] === 0x47 &&
    arr[4] === 0x0d && arr[5] === 0x0a && arr[6] === 0x1a && arr[7] === 0x0a
  ) {
    return 'image/png';
  }

  // GIF: 47 49 46 38 (GIF8)
  if (arr[0] === 0x47 && arr[1] === 0x49 && arr[2] === 0x46 && arr[3] === 0x38) {
    return 'image/gif';
  }

  // WebP: RIFF....WEBP
  if (
    arr[0] === 0x52 && arr[1] === 0x49 && arr[2] === 0x46 && arr[3] === 0x46 &&
    arr[8] === 0x57 && arr[9] === 0x45 && arr[10] === 0x42 && arr[11] === 0x50
  ) {
    return 'image/webp';
  }

  // MP4/MOV: ....ftyp at byte 4
  if (arr[4] === 0x66 && arr[5] === 0x74 && arr[6] === 0x79 && arr[7] === 0x70) {
    return 'video/mp4';
  }

  return null;
}
```

**Step 4: Run test to verify it passes**

Run: `npm test`

Expected: PASS

**Step 5: Commit**

```bash
git add src/pds.js test/pds.test.js
git commit -m "feat: add MIME type sniffing from magic bytes"
```
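Sniffing exists so the stored MIME type is derived from the bytes themselves rather than an attacker-controlled header; Task 5 applies it with the header only as a fallback. A quick hypothetical illustration:

```javascript
// Hypothetical usage: a PNG payload uploaded with a misleading Content-Type.
const pngBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const claimedType = 'text/html'; // what the client claims
const mimeType = sniffMimeType(pngBytes) || claimedType;
console.log(mimeType); // 'image/png': the magic bytes win
```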
---

## Task 4: Implement Blob Ref Detection

**Files:**

- Modify: `src/pds.js` (add after sniffMimeType)
- Test: `test/pds.test.js`

**Step 1: Write the failing test**

Add to `test/pds.test.js`:

```javascript
import {
  // ... existing imports ...
  findBlobRefs,
} from '../src/pds.js';

describe('Blob Ref Detection', () => {
  test('finds blob ref in simple object', () => {
    const record = {
      $type: 'app.bsky.feed.post',
      text: 'Hello',
      embed: {
        $type: 'app.bsky.embed.images',
        images: [
          {
            image: {
              $type: 'blob',
              ref: { $link: 'bafkreiabc123' },
              mimeType: 'image/jpeg',
              size: 1234,
            },
            alt: 'test image',
          },
        ],
      },
    };
    const refs = findBlobRefs(record);
    assert.deepStrictEqual(refs, ['bafkreiabc123']);
  });

  test('finds multiple blob refs', () => {
    const record = {
      images: [
        { image: { $type: 'blob', ref: { $link: 'cid1' }, mimeType: 'image/png', size: 100 } },
        { image: { $type: 'blob', ref: { $link: 'cid2' }, mimeType: 'image/png', size: 200 } },
      ],
    };
    const refs = findBlobRefs(record);
    assert.deepStrictEqual(refs, ['cid1', 'cid2']);
  });

  test('returns empty array when no blobs', () => {
    const record = { text: 'Hello world', count: 42 };
    const refs = findBlobRefs(record);
    assert.deepStrictEqual(refs, []);
  });

  test('handles null and primitives', () => {
    assert.deepStrictEqual(findBlobRefs(null), []);
    assert.deepStrictEqual(findBlobRefs('string'), []);
    assert.deepStrictEqual(findBlobRefs(42), []);
  });
});
```

**Step 2: Run test to verify it fails**

Run: `npm test`

Expected: FAIL with "findBlobRefs is not exported"

**Step 3: Write minimal implementation**

Add to `src/pds.js` after sniffMimeType:

```javascript
// === BLOB REF DETECTION ===
// Recursively find blob references in records

/**
 * Find all blob CID references in a record
 * @param {*} obj - Record value to scan
 * @param {string[]} refs - Accumulator array (internal)
 * @returns {string[]} Array of blob CID strings
 */
export function findBlobRefs(obj, refs = []) {
  if (!obj || typeof obj !== 'object') {
    return refs;
  }

  // Check if this object is a blob ref
  if (obj.$type === 'blob' && obj.ref?.$link) {
    refs.push(obj.ref.$link);
  }

  // Recurse into arrays and objects
  if (Array.isArray(obj)) {
    for (const item of obj) {
      findBlobRefs(item, refs);
    }
  } else {
    for (const value of Object.values(obj)) {
      findBlobRefs(value, refs);
    }
  }

  return refs;
}
```

**Step 4: Run test to verify it passes**

Run: `npm test`

Expected: PASS

**Step 5: Commit**

```bash
git add src/pds.js test/pds.test.js
git commit -m "feat: add blob ref detection for records"
```

---

## Task 5: Implement uploadBlob Endpoint

**Files:**

- Modify: `src/pds.js` (add route and handler)

**Step 1: Add route to pdsRoutes**

In the `pdsRoutes` object (around line 1055), add:

```javascript
'/xrpc/com.atproto.repo.uploadBlob': {
  method: 'POST',
  handler: (pds, req, _url) => pds.handleUploadBlob(req),
},
```

**Step 2: Add handler method to PersonalDataServer class**

Add the method to the class (after the existing handlers):

```javascript
async handleUploadBlob(request) {
  // Require auth
  const authResult = await this.requireAuth(request);
  if (authResult instanceof Response) return authResult;

  const did = await this.getDid();
  if (!did) {
    return errorResponse('InvalidRequest', 'PDS not initialized', 400);
  }

  // Read body as ArrayBuffer
  const bodyBytes = await request.arrayBuffer();
  const size = bodyBytes.byteLength;

  // Check size limit (50MB)
  const MAX_BLOB_SIZE = 50 * 1024 * 1024;
  if (size > MAX_BLOB_SIZE) {
    return errorResponse(
      'BlobTooLarge',
      `Blob size ${size} exceeds maximum ${MAX_BLOB_SIZE}`,
      400,
    );
  }

  // Sniff MIME type, fall back to Content-Type header
  const contentType =
    request.headers.get('Content-Type') || 'application/octet-stream';
  const sniffed = sniffMimeType(bodyBytes);
  const mimeType = sniffed || contentType;

  // Compute CID (reuse existing createCid)
  const cid = await createCid(new Uint8Array(bodyBytes));
  const cidStr = cidToString(cid);

  // Check if blob already exists
  const existing = this.sql
    .exec('SELECT cid FROM blob WHERE cid = ?', cidStr)
    .toArray();

  if (existing.length === 0) {
    // Upload to R2
    const r2Key = `${did}/${cidStr}`;
    await this.env.BLOBS.put(r2Key, bodyBytes, {
      httpMetadata: { contentType: mimeType },
    });

    // Insert metadata
    const createdAt = new Date().toISOString();
    this.sql.exec(
      'INSERT INTO blob (cid, mimeType, size, createdAt) VALUES (?, ?, ?, ?)',
      cidStr,
      mimeType,
      size,
      createdAt,
    );
  }

  // Return BlobRef
  return Response.json({
    blob: {
      $type: 'blob',
      ref: { $link: cidStr },
      mimeType,
      size,
    },
  });
}
```
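`createCid` and `cidToString` are existing helpers in `src/pds.js`, so the handler simply reuses them. For orientation only, a plausible sketch of what they are assumed to produce, consistent with the tech-stack note (Web Crypto SHA-256) and the standard atproto blob CID (CIDv1, raw codec, sha2-256 multihash, base32-encoded with a leading `b`, i.e. the familiar `bafkrei...` form):

```javascript
// Sketch only: the real createCid/cidToString in src/pds.js are
// authoritative. This shows the assumed byte layout of a blob CID.
async function sketchCreateCid(bytes) {
  // SHA-256 digest via Web Crypto
  const digest = new Uint8Array(await crypto.subtle.digest('SHA-256', bytes));
  // CIDv1 = <version 0x01><codec 0x55 raw><multihash 0x12 sha2-256, 0x20 length><digest>
  const cid = new Uint8Array(4 + digest.length);
  cid.set([0x01, 0x55, 0x12, 0x20]);
  cid.set(digest, 4);
  return cid;
}
// cidToString is assumed to base32-encode these bytes (RFC 4648 lowercase,
// no padding) and prefix 'b', the multibase marker for base32.
```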
**Step 3: Verify deployment**

Run: `npx wrangler deploy`

**Step 4: Test manually with curl**

```bash
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: image/png" \
  --data-binary @test-image.png \
  https://your-pds.workers.dev/xrpc/com.atproto.repo.uploadBlob
```

Expected: JSON response with a blob ref

**Step 5: Commit**

```bash
git add src/pds.js
git commit -m "feat: implement uploadBlob endpoint with R2 storage"
```

---

## Task 6: Implement getBlob Endpoint

**Files:**

- Modify: `src/pds.js` (add route and handler)

**Step 1: Add route to pdsRoutes**

```javascript
'/xrpc/com.atproto.sync.getBlob': {
  handler: (pds, _req, url) => pds.handleGetBlob(url),
},
```

**Step 2: Add handler method**

```javascript
async handleGetBlob(url) {
  const did = url.searchParams.get('did');
  const cid = url.searchParams.get('cid');

  if (!did || !cid) {
    return errorResponse('InvalidRequest', 'missing did or cid parameter', 400);
  }

  // Verify DID matches this DO
  const myDid = await this.getDid();
  if (did !== myDid) {
    return errorResponse('InvalidRequest', 'DID does not match this repo', 400);
  }

  // Look up blob metadata
  const rows = this.sql
    .exec('SELECT mimeType, size FROM blob WHERE cid = ?', cid)
    .toArray();
  if (rows.length === 0) {
    return errorResponse('BlobNotFound', 'blob not found', 404);
  }
  const { mimeType, size } = rows[0];

  // Fetch from R2
  const r2Key = `${did}/${cid}`;
  const object = await this.env.BLOBS.get(r2Key);
  if (!object) {
    return errorResponse('BlobNotFound', 'blob not found in storage', 404);
  }

  // Return blob with security headers
  return new Response(object.body, {
    headers: {
      'Content-Type': mimeType,
      'Content-Length': String(size),
      'X-Content-Type-Options': 'nosniff',
      'Content-Security-Policy': "default-src 'none'; sandbox",
      'Cache-Control': 'public, max-age=31536000, immutable',
    },
  });
}
```

**Step 3: Deploy and test**

Run: `npx wrangler deploy`

Test:

```bash
curl "https://your-pds.workers.dev/xrpc/com.atproto.sync.getBlob?did=did:plc:xxx&cid=bafkrei..."
```
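To confirm the security headers along with the body, you can dump the response headers with curl (same placeholder DID and CID as above):

```bash
# -D - prints response headers to stdout; -o /dev/null discards the body
curl -s -D - -o /dev/null \
  "https://your-pds.workers.dev/xrpc/com.atproto.sync.getBlob?did=did:plc:xxx&cid=bafkrei..."
# Expect Content-Type to match the stored MIME type, plus
# X-Content-Type-Options: nosniff and the CSP sandbox header.
```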
**Step 4: Commit**

```bash
git add src/pds.js
git commit -m "feat: implement getBlob endpoint"
```

---

## Task 7: Implement listBlobs Endpoint

**Files:**

- Modify: `src/pds.js` (add route and handler)

**Step 1: Add route to pdsRoutes**

```javascript
'/xrpc/com.atproto.sync.listBlobs': {
  handler: (pds, _req, url) => pds.handleListBlobs(url),
},
```

**Step 2: Add handler method**

```javascript
async handleListBlobs(url) {
  const did = url.searchParams.get('did');
  const cursor = url.searchParams.get('cursor');
  const limit = Math.min(Number(url.searchParams.get('limit')) || 500, 1000);

  if (!did) {
    return errorResponse('InvalidRequest', 'missing did parameter', 400);
  }

  // Verify DID matches this DO
  const myDid = await this.getDid();
  if (did !== myDid) {
    return errorResponse('InvalidRequest', 'DID does not match this repo', 400);
  }

  // Query blobs with pagination
  let query = 'SELECT cid, createdAt FROM blob';
  const params = [];
  if (cursor) {
    query += ' WHERE createdAt > ?';
    params.push(cursor);
  }
  query += ' ORDER BY createdAt ASC LIMIT ?';
  params.push(limit + 1); // Fetch one extra to detect if there's more

  const rows = this.sql.exec(query, ...params).toArray();

  // Determine if there's a next page
  let nextCursor = null;
  if (rows.length > limit) {
    rows.pop(); // Remove the extra row
    nextCursor = rows[rows.length - 1].createdAt;
  }

  return Response.json({
    cids: rows.map((r) => r.cid),
    cursor: nextCursor,
  });
}
```

**Step 3: Deploy and test**

Run: `npx wrangler deploy`

Test:

```bash
curl "https://your-pds.workers.dev/xrpc/com.atproto.sync.listBlobs?did=did:plc:xxx"
```
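Expected response shape (the CIDs and timestamp here are hypothetical placeholders): `cursor` echoes the `createdAt` of the last returned blob and is `null` on the final page; pass it back as `?cursor=` to fetch the next page.

```json
{
  "cids": ["bafkreiaaa...", "bafkreibbb..."],
  "cursor": "2026-01-06T12:00:00.000Z"
}
```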
**Step 4: Commit**

```bash
git add src/pds.js
git commit -m "feat: implement listBlobs endpoint"
```

---

## Task 8: Integrate Blob Association with createRecord

**Files:**

- Modify: `src/pds.js:1253` (createRecord method)

**Step 1: Add blob association after record storage**

In the `createRecord` method, after storing the record in the `records` table (around line 1280), add:

```javascript
// Associate blobs with this record
const blobRefs = findBlobRefs(record);
for (const blobCid of blobRefs) {
  // Verify blob exists
  const blobExists = this.sql
    .exec('SELECT cid FROM blob WHERE cid = ?', blobCid)
    .toArray();
  if (blobExists.length === 0) {
    throw new Error(`BlobNotFound: ${blobCid}`);
  }

  // Create association
  this.sql.exec(
    'INSERT OR IGNORE INTO record_blob (blobCid, recordUri) VALUES (?, ?)',
    blobCid,
    uri,
  );
}
```

**Step 2: Deploy and test**

Test by uploading a blob, then creating a post that references it:

```bash
# Upload blob
BLOB=$(curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: image/png" --data-binary @test.png \
  https://your-pds.workers.dev/xrpc/com.atproto.repo.uploadBlob)
echo $BLOB # Note the CID in the response

# Create post with image (substitute the CID from the upload response)
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://your-pds.workers.dev/xrpc/com.atproto.repo.createRecord \
  -d '{
    "repo": "did:plc:xxx",
    "collection": "app.bsky.feed.post",
    "record": {
      "$type": "app.bsky.feed.post",
      "text": "Hello with image!",
      "createdAt": "2026-01-06T12:00:00.000Z",
      "embed": {
        "$type": "app.bsky.embed.images",
        "images": [{
          "image": {
            "$type": "blob",
            "ref": {"$link": "<CID>"},
            "mimeType": "image/png",
            "size": 1234
          },
          "alt": "test"
        }]
      }
    }
  }'
```

**Step 3: Commit**

```bash
git add src/pds.js
git commit -m "feat: associate blobs with records on createRecord"
```

---

## Task 9: Implement Blob Cleanup on deleteRecord

**Files:**

- Modify: `src/pds.js:1391` (deleteRecord method)

**Step 1: Add blob cleanup after record deletion**

In the `deleteRecord` method, after deleting the record from the `records` table, add:

```javascript
// Get blobs associated with this record
const associatedBlobs = this.sql
  .exec('SELECT blobCid FROM record_blob WHERE recordUri = ?', uri)
  .toArray();

// Remove associations for this record
this.sql.exec('DELETE FROM record_blob WHERE recordUri = ?', uri);

// Check each blob for orphan status and delete if unreferenced
for (const { blobCid } of associatedBlobs) {
  const stillReferenced = this.sql
    .exec('SELECT 1 FROM record_blob WHERE blobCid = ? LIMIT 1', blobCid)
    .toArray();
  if (stillReferenced.length === 0) {
    // Blob is orphaned, delete from R2 and database
    const did = await this.getDid();
    await this.env.BLOBS.delete(`${did}/${blobCid}`);
    this.sql.exec('DELETE FROM blob WHERE cid = ?', blobCid);
  }
}
```
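As a possible simplification (a sketch under the same schema, not something the plan requires), the per-blob check above could be collapsed into a single query that mirrors the LEFT JOIN used by the alarm cleanup in Task 10:

```javascript
// Sketch: find orphans among the just-unlinked blobs in one query.
const did = await this.getDid();
const cids = associatedBlobs.map((r) => r.blobCid);
if (cids.length > 0) {
  // Generate one ? placeholder per candidate CID
  const placeholders = cids.map(() => '?').join(', ');
  const orphans = this.sql
    .exec(
      `SELECT b.cid FROM blob b
       LEFT JOIN record_blob rb ON b.cid = rb.blobCid
       WHERE rb.blobCid IS NULL AND b.cid IN (${placeholders})`,
      ...cids,
    )
    .toArray();
  for (const { cid } of orphans) {
    await this.env.BLOBS.delete(`${did}/${cid}`);
    this.sql.exec('DELETE FROM blob WHERE cid = ?', cid);
  }
}
```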
**Step 2: Deploy and test**

Test by creating a post with an image, then deleting it:

```bash
# Delete the post (substitute the rkey from the created post's URI)
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://your-pds.workers.dev/xrpc/com.atproto.repo.deleteRecord \
  -d '{
    "repo": "did:plc:xxx",
    "collection": "app.bsky.feed.post",
    "rkey": "<rkey>"
  }'

# Verify the blob is gone
curl "https://your-pds.workers.dev/xrpc/com.atproto.sync.listBlobs?did=did:plc:xxx"
```

**Step 3: Commit**

```bash
git add src/pds.js
git commit -m "feat: cleanup orphaned blobs on record deletion"
```

---

## Task 10: Implement Orphan Cleanup Alarm

**Files:**

- Modify: `src/pds.js` (add alarm handler and scheduling)

**Step 1: Add alarm scheduling in initIdentity**

In the `initIdentity` method (or after successful init), add:

```javascript
// Schedule blob cleanup alarm (runs daily)
const currentAlarm = await this.state.storage.getAlarm();
if (!currentAlarm) {
  await this.state.storage.setAlarm(Date.now() + 24 * 60 * 60 * 1000);
}
```

**Step 2: Add alarm handler to PersonalDataServer class**

```javascript
async alarm() {
  await this.cleanupOrphanedBlobs();
  // Reschedule for next day
  await this.state.storage.setAlarm(Date.now() + 24 * 60 * 60 * 1000);
}

async cleanupOrphanedBlobs() {
  const did = await this.getDid();
  if (!did) return;

  // Find orphans: blobs not in record_blob, older than 24h
  const cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();
  const orphans = this.sql
    .exec(
      `SELECT b.cid FROM blob b
       LEFT JOIN record_blob rb ON b.cid = rb.blobCid
       WHERE rb.blobCid IS NULL AND b.createdAt < ?`,
      cutoff,
    )
    .toArray();

  for (const { cid } of orphans) {
    await this.env.BLOBS.delete(`${did}/${cid}`);
    this.sql.exec('DELETE FROM blob WHERE cid = ?', cid);
  }

  if (orphans.length > 0) {
    console.log(`Cleaned up ${orphans.length} orphaned blobs`);
  }
}
```

**Step 3: Deploy**

Run: `npx wrangler deploy`

**Step 4: Commit**

```bash
git add src/pds.js
git commit -m "feat: add DO alarm for orphaned blob cleanup"
```

---

## Task 11: Update README

**Files:**

- Modify: `README.md`

**Step 1: Update feature checklist**

Change:

```markdown
- [ ] Blob storage (uploadBlob, getBlob, listBlobs)
```

To:

```markdown
- [x] Blob storage (uploadBlob, getBlob, listBlobs)
```

**Step 2: Add blob configuration section**

Add under configuration:

```markdown
### Blob Storage

Blobs (images, videos) are stored in Cloudflare R2:

1. Create an R2 bucket: `npx wrangler r2 bucket create pds-blobs`
2. The binding is already configured in `wrangler.toml`

Supported formats: JPEG, PNG, GIF, WebP, MP4
Max size: 50MB

Orphaned blobs are automatically cleaned up after 24 hours.
```

**Step 3: Commit**

```bash
git add README.md
git commit -m "docs: update README with blob storage feature"
```

---

## Summary

| Task | Description | Files Modified |
|------|-------------|----------------|
| 1 | Add R2 bucket binding | `wrangler.toml` |
| 2 | Add blob database schema | `src/pds.js` |
| 3 | Implement MIME sniffing | `src/pds.js`, `test/pds.test.js` |
| 4 | Implement blob ref detection | `src/pds.js`, `test/pds.test.js` |
| 5 | Implement uploadBlob endpoint | `src/pds.js` |
| 6 | Implement getBlob endpoint | `src/pds.js` |
| 7 | Implement listBlobs endpoint | `src/pds.js` |
| 8 | Integrate blob association | `src/pds.js` |
| 9 | Cleanup blobs on delete | `src/pds.js` |
| 10 | Add orphan cleanup alarm | `src/pds.js` |
| 11 | Update README | `README.md` |

**Estimated additions:** ~250 lines to `src/pds.js`, ~60 lines to `test/pds.test.js`