
docs: add zig atproto sdk wishlist

wishlist based on building bufo-bot - covers typed lexicons,
session management, blob handling, jetstream client, and more.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+573
docs/zig-atproto-sdk-wishlist.md
# zig atproto sdk wishlist

a pie-in-the-sky wishlist for what a zig AT protocol sdk could provide, based on building [bufo-bot](../bot) - a bluesky firehose bot that quote-posts matching images.

---

## 1. typed lexicon schemas

the single biggest pain point: everything is `json.Value` with manual field extraction.

### what we have now

```zig
const parsed = try json.parseFromSlice(json.Value, allocator, response.items, .{});
const root = parsed.value.object;
const jwt_val = root.get("accessJwt") orelse return error.NoJwt;
if (jwt_val != .string) return error.NoJwt;
self.access_jwt = try self.allocator.dupe(u8, jwt_val.string);
```

this pattern repeats hundreds of times. it's verbose, error-prone, and provides zero compile-time safety.

### what we want

```zig
const atproto = @import("atproto");

// codegen from lexicon json schemas
const session = try atproto.server.createSession(allocator, .{
    .identifier = handle,
    .password = app_password,
});
// session.accessJwt is already []const u8
// session.did is already []const u8
// session.handle is already []const u8
```

ideally:
- generate zig structs from lexicon json files at build time (build.zig integration)
- full type safety - if a field is optional in the lexicon, it's `?T` in zig
- proper union types for lexicon unions (e.g., embed types)
- automatic serialization/deserialization

### lexicon unions are especially painful

```zig
// current: manual $type dispatch
const embed_type = record.object.get("$type") orelse return error.NoType;
if (mem.eql(u8, embed_type.string, "app.bsky.embed.images")) {
    // handle images...
} else if (mem.eql(u8, embed_type.string, "app.bsky.embed.video")) {
    // handle video...
} else if (mem.eql(u8, embed_type.string, "app.bsky.embed.record")) {
    // handle quote...
} else if (mem.eql(u8, embed_type.string, "app.bsky.embed.recordWithMedia")) {
    // handle quote with media...
}

// wanted: tagged union
switch (record.embed) {
    .images => |imgs| { ... },
    .video => |vid| { ... },
    .record => |quote| { ... },
    .recordWithMedia => |rwm| { ... },
}
```

---

## 2. session management

authentication is surprisingly complex and we had to handle it all manually.

### what we had to build

- login with identifier + app password
- store access JWT and refresh JWT
- detect `ExpiredToken` errors in response bodies
- re-login on expiration (we just re-login, didn't implement refresh)
- resolve DID to PDS host via plc.directory lookup
- get service auth tokens for video upload

### what we want

```zig
const atproto = @import("atproto");

var agent = try atproto.Agent.init(allocator, .{
    .service = "https://bsky.social",
});

// login with automatic token refresh
try agent.login(handle, app_password);

// agent automatically:
// - refreshes tokens before expiration
// - retries on ExpiredToken errors
// - resolves DID -> PDS host
// - handles service auth for video.bsky.app

// just use it, auth is handled
const blob = try agent.uploadBlob(data, "image/png");
```

### service auth is particularly gnarly

for video uploads, you need:
1. get a service auth token scoped to `did:web:video.bsky.app` with lexicon `com.atproto.repo.uploadBlob`
2. use that token (not your session token) for the upload
3. the endpoint is different (`video.bsky.app` not `bsky.social`)

we had to figure this out from reading other implementations. an sdk should abstract this entirely.
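
as a rough illustration of step 1, here is a minimal sketch of building the `com.atproto.server.getServiceAuth` query with the `aud` and `lxm` values described above. `serviceAuthUrl` and its parameter names are hypothetical helpers invented for this example, not part of bufo-bot or any existing sdk, and the sketch assumes the values need no percent-encoding.

```zig
const std = @import("std");

/// hypothetical helper: formats the getServiceAuth query url into `buf`.
/// `aud` is the audience DID and `lxm` the lexicon method the token is scoped to.
/// assumes none of the inputs need percent-encoding.
pub fn serviceAuthUrl(
    buf: []u8,
    pds_host: []const u8,
    aud: []const u8,
    lxm: []const u8,
) ![]u8 {
    return std.fmt.bufPrint(
        buf,
        "https://{s}/xrpc/com.atproto.server.getServiceAuth?aud={s}&lxm={s}",
        .{ pds_host, aud, lxm },
    );
}

pub fn main() !void {
    var buf: [256]u8 = undefined;
    const url = try serviceAuthUrl(
        &buf,
        "bsky.social",
        "did:web:video.bsky.app",
        "com.atproto.repo.uploadBlob",
    );
    std.debug.print("{s}\n", .{url});
}
```

the token returned by that request would then be used as the bearer token for the upload itself, per steps 2 and 3.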

---

## 3. blob and media handling

uploading media requires too much manual work.

### current pain

```zig
// upload blob, get back raw json string
const blob_json = try client.uploadBlob(data, content_type);
// later, interpolate that json string into another json blob
try body_buf.print(allocator,
    \\{{"image":{s},"alt":"{s}"}}
, .{ blob_json, alt_text });
```

we're passing around json strings and interpolating them. this is fragile.

### what we want

```zig
// upload returns a typed BlobRef
const blob = try agent.uploadBlob(data, .{ .mime_type = "image/png" });

// use it directly in a struct
const post = atproto.feed.Post{
    .text = "",
    .embed = .{ .images = .{
        .images = &[_]atproto.embed.Image{
            .{ .image = blob, .alt = "a bufo" },
        },
    }},
};
try agent.createRecord("app.bsky.feed.post", post);
```

### video upload is even worse

```zig
// current: manual job polling
const job_id = try client.uploadVideo(data, filename);
var attempts: u32 = 0;
while (attempts < 60) : (attempts += 1) {
    // poll job status
    // check for JOB_STATE_COMPLETED or JOB_STATE_FAILED
    // sleep 1 second between polls
}

// wanted: one call that handles the async nature
const video_blob = try agent.uploadVideo(data, .{
    .filename = "bufo.gif",
    .mime_type = "image/gif",
    // sdk handles polling internally
});
```

---

## 4. AT-URI utilities

we parse AT-URIs by hand with string splitting.

```zig
// current
var parts = mem.splitScalar(u8, uri[5..], '/'); // skip "at://"
const did = parts.next() orelse return error.InvalidUri;
_ = parts.next(); // skip collection
const rkey = parts.next() orelse return error.InvalidUri;

// wanted
const parsed = atproto.AtUri.parse(uri);
// parsed.repo (the DID)
// parsed.collection
// parsed.rkey
```

also want:
- `AtUri.format()` to construct URIs
- validation (is this a valid DID? valid rkey?)
- CID parsing/validation

---

## 5. jetstream / firehose client

we used a separate websocket library and manually parsed jetstream messages.

### current

```zig
const websocket = @import("websocket"); // third party

// manual connection with exponential backoff
// manual message parsing
// manual event dispatch
```

### what we want

```zig
const atproto = @import("atproto");

var jetstream = atproto.Jetstream.init(allocator, .{
    .endpoint = "jetstream2.us-east.bsky.network",
    .collections = &[_][]const u8{"app.bsky.feed.post"},
});

// typed events!
while (try jetstream.next()) |event| {
    switch (event) {
        .commit => |commit| {
            switch (commit.operation) {
                .create => |record| {
                    // record is already typed based on collection
                    if (commit.collection == .feed_post) {
                        const post: atproto.feed.Post = record;
                        std.debug.print("new post: {s}\n", .{post.text});
                    }
                },
                .delete => { ... },
            }
        },
        .identity => |identity| { ... },
        .account => |account| { ... },
    }
}
```

bonus points:
- automatic reconnection with configurable backoff
- cursor support for resuming from a position
- filtering (dids, collections) built-in
- automatic decompression if using zstd streams

---

## 6. record operations

CRUD for records is manual json construction.

### current

```zig
var body_buf: std.ArrayList(u8) = .{};
try body_buf.print(allocator,
    \\{{"repo":"{s}","collection":"app.bsky.feed.post","record":{{...}}}}
, .{ did, ... });

const result = client.fetch(.{
    .location = .{ .url = "https://bsky.social/xrpc/com.atproto.repo.createRecord" },
    .method = .POST,
    .headers = .{ .content_type = .{ .override = "application/json" }, ... },
    .payload = body_buf.items,
    ...
});
```

### what we want

```zig
// create
const result = try agent.createRecord("app.bsky.feed.post", .{
    .text = "hello world",
    .createdAt = atproto.Datetime.now(),
});
// result.uri, result.cid are typed

// read
const record = try agent.getRecord(atproto.feed.Post, uri);

// delete
try agent.deleteRecord(uri);

// list
var iter = agent.listRecords("app.bsky.feed.post", .{ .limit = 50 });
while (try iter.next()) |record| { ... }
```

---

## 7. rich text / facets

we avoided facets entirely because they're complex. an sdk should make them easy.
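
part of that complexity is that a facet's range indexes into the UTF-8 bytes of the post text (`byteStart` inclusive, `byteEnd` exclusive), not into characters. a minimal sketch of computing such a range follows; `ByteSlice` and `findByteSlice` are hypothetical names for illustration only. zig string slices are already byte-indexed, so `std.mem.indexOf` yields the right offsets directly.

```zig
const std = @import("std");

/// hypothetical type mirroring the byteStart/byteEnd pair a facet carries.
pub const ByteSlice = struct {
    byte_start: usize, // inclusive
    byte_end: usize, // exclusive
};

/// returns the UTF-8 byte range of the first occurrence of `needle` in `text`.
pub fn findByteSlice(text: []const u8, needle: []const u8) ?ByteSlice {
    const start = std.mem.indexOf(u8, text, needle) orelse return null;
    return ByteSlice{ .byte_start = start, .byte_end = start + needle.len };
}

pub fn main() void {
    // "\u{e9}" (e with acute accent) is one character but two UTF-8 bytes,
    // so "#zig" starts at byte 7 even though it is the 7th character (index 6).
    const text = "h\u{e9}llo #zig";
    if (findByteSlice(text, "#zig")) |s| {
        std.debug.print("byteStart={d} byteEnd={d}\n", .{ s.byte_start, s.byte_end });
    }
}
```

languages whose strings index by UTF-16 code units or code points need an explicit conversion here, which is where most facet bugs tend to come from.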

### what we want

```zig
var rt = atproto.RichText.init(allocator);
try rt.append("check out ");
try rt.appendLink("this repo", "https://github.com/...");
try rt.append(" by ");
try rt.appendMention("@someone.bsky.social");
try rt.append(" ");
try rt.appendTag("zig");

const post = atproto.feed.Post{
    .text = rt.text(),
    .facets = rt.facets(),
};
```

the sdk should:
- handle unicode byte offsets correctly (this is notoriously tricky)
- auto-detect links/mentions/tags in plain text
- validate handles resolve to real DIDs

---

## 8. rate limiting and retries

we have no rate limiting. when we hit limits, we just fail.

### what we want

```zig
var agent = atproto.Agent.init(allocator, .{
    .rate_limit = .{
        .strategy = .wait, // or .error
        .max_retries = 3,
    },
});

// agent automatically:
// - respects rate limit headers
// - waits and retries on 429
// - exponential backoff on transient errors
```

---

## 9. pagination helpers

listing records or searching requires manual cursor handling.

```zig
// current: manual
var cursor: ?[]const u8 = null;
while (true) {
    const response = try fetch(cursor);
    for (response.records) |record| { ... }
    cursor = response.cursor orelse break;
}

// wanted: iterator
var iter = agent.listRecords("app.bsky.feed.post", .{});
while (try iter.next()) |record| {
    // handles pagination transparently
}

// or collect all
const all_records = try iter.collect(); // fetches all pages
```

---

## 10. did resolution

we manually hit plc.directory to resolve DIDs.

```zig
// current
var url_buf: [256]u8 = undefined;
const url = try std.fmt.bufPrint(&url_buf, "https://plc.directory/{s}", .{did});
// fetch, parse, find service endpoint...

// wanted
const doc = try atproto.resolveDid(did);
// doc.pds - the PDS endpoint
// doc.handle - verified handle
// doc.signingKey, doc.rotationKeys, etc.
```

should support:
- did:plc via plc.directory
- did:web via .well-known
- caching with TTL

---

## 11. build.zig integration

### lexicon codegen

```zig
// build.zig
const atproto = @import("atproto");

pub fn build(b: *std.Build) void {
    // generate zig types from lexicon schemas
    const lexicons = atproto.addLexiconCodegen(b, .{
        .lexicon_dirs = &.{"lexicons/"},
        // or fetch from network
        .fetch_lexicons = &.{
            "app.bsky.feed.*",
            "app.bsky.actor.*",
            "com.atproto.repo.*",
        },
    });

    exe.root_module.addImport("lexicons", lexicons);
}
```

### bundled CA certs

TLS in zig requires CA certs. would be nice if the sdk bundled mozilla's CA bundle or made it easy to configure.

---

## 12. testing utilities

### mocks

```zig
const atproto = @import("atproto");

test "bot responds to matching posts" {
    var mock = atproto.testing.MockAgent.init(allocator);
    defer mock.deinit();

    // set up expected calls
    mock.expectCreateRecord("app.bsky.feed.post", .{
        .text = "",
        // ...
    });

    // run test code
    try handlePost(&mock, test_post);

    // verify
    try mock.verify();
}
```

### jetstream replay

```zig
// replay recorded jetstream events for testing
var replay = atproto.testing.JetstreamReplay.init("testdata/events.jsonl");
while (try replay.next()) |event| {
    try handleEvent(event);
}
```

---

## 13. logging / observability

### structured logging

```zig
var agent = atproto.Agent.init(allocator, .{
    .logger = myLogger, // compatible with std.log or custom
});

// logs requests, responses, retries, rate limits
```

### metrics

```zig
var agent = atproto.Agent.init(allocator, .{
    .metrics = .{
        .requests_total = &my_counter,
        .request_duration = &my_histogram,
        .rate_limit_waits = &my_counter,
    },
});
```

---

## 14. error handling

### typed errors with context

```zig
// current: generic errors
error.PostFailed

// wanted: rich errors
atproto.Error.RateLimit => |e| {
    std.debug.print("rate limited, reset at {}\n", .{e.reset_at});
},
atproto.Error.InvalidRecord => |e| {
    std.debug.print("validation failed: {s}\n", .{e.message});
},
atproto.Error.ExpiredToken => {
    // sdk should handle this automatically, but if not...
},
```

---

## 15. moderation / labels

we didn't need this for bufo-bot, but a complete sdk should support:

```zig
// applying labels
try agent.createLabels(.{
    .src = agent.did,
    .uri = post_uri,
    .val = "spam",
});

// reading labels on content
const labels = try agent.getLabels(uri);
for (labels) |label| {
    if (mem.eql(u8, label.val, "nsfw")) {
        // handle...
    }
}
```

---

## 16. feed generators and custom feeds

```zig
// serving a feed generator
var server = atproto.FeedGenerator.init(allocator, .{
    .did = my_feed_did,
    .hostname = "feed.example.com",
});

server.addFeed("trending-bufos", struct {
    fn getFeed(ctx: *Context, params: GetFeedParams) !GetFeedResponse {
        // return skeleton
    }
}.getFeed);

try server.listen(8080);
```

---

## summary

the core theme: **let us write application logic, not protocol plumbing**.

right now building an atproto app in zig means:
- manual json construction/parsing everywhere
- hand-rolling authentication flows
- string interpolation for record creation
- manual http request management
- third-party websocket libraries for firehose
- no compile-time safety for lexicon types

a good sdk would give us:
- typed lexicon schemas (codegen)
- managed sessions with automatic refresh
- high-level record CRUD
- built-in jetstream client with typed events
- utilities for rich text, AT-URIs, DIDs
- rate limiting and retry logic
- testing helpers

the dream is writing a bot like bufo-bot in ~100 lines instead of ~1000.
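
to make the AT-URI utility item concrete, here is a minimal sketch of the kind of parser wished for in section 4. this `AtUri` type is an assumption written for illustration, not an existing library api. it only accepts the full `at://<repo>/<collection>/<rkey>` form and does no DID or rkey validation.

```zig
const std = @import("std");

/// hypothetical AT-URI type; the returned slices borrow from the input string.
pub const AtUri = struct {
    repo: []const u8,
    collection: []const u8,
    rkey: []const u8,

    /// parses "at://<repo>/<collection>/<rkey>" and rejects anything else.
    pub fn parse(uri: []const u8) error{InvalidUri}!AtUri {
        const prefix = "at://";
        if (!std.mem.startsWith(u8, uri, prefix)) return error.InvalidUri;

        var parts = std.mem.splitScalar(u8, uri[prefix.len..], '/');
        const repo = parts.next() orelse return error.InvalidUri;
        const collection = parts.next() orelse return error.InvalidUri;
        const rkey = parts.next() orelse return error.InvalidUri;

        // reject empty segments and trailing extra segments
        if (repo.len == 0 or collection.len == 0 or rkey.len == 0) return error.InvalidUri;
        if (parts.next() != null) return error.InvalidUri;

        return AtUri{ .repo = repo, .collection = collection, .rkey = rkey };
    }
};

pub fn main() !void {
    const parsed = try AtUri.parse("at://did:plc:abc123/app.bsky.feed.post/3kxyz");
    std.debug.print("repo={s} collection={s} rkey={s}\n", .{
        parsed.repo,
        parsed.collection,
        parsed.rkey,
    });
}
```

unlike the hand-rolled `uri[5..]` version, this checks the scheme and fails on malformed input instead of silently slicing.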