prototypey.org - atproto lexicon typescript toolkit - mirror https://github.com/tylersayshi/prototypey

remove md

Tyler 9f6e6d82 80ef5398

-544
aislop/plan-emit.md
# Plan: Lexicon Emission for Prototypey

## Current State

- **Project**: Type-safe lexicon inference library (similar to Arktype's approach)
- **Structure**: TypeScript library with `src/`, `lib/` (compiled output), `samples/` (example JSON lexicons)
- **Build**: Uses `tsdown` for bundling, pnpm for package management

## Emission Strategy

### 1. Two-Track Approach

Since prototypey is about **type inference** from lexicons (not traditional codegen), we should support both tracks:

#### Track A: Traditional Code Generation (compatibility)

- Install `@atproto/lex-cli` as a dev dependency
- Emit standard TypeScript files like other atproto projects
- Useful for projects that want traditional generated types

#### Track B: Type Inference (prototypey's core value)

- Leverage your existing inference engine (`src/infer.ts`)
- Generate minimal runtime code with inferred types
- This is your differentiator from standard atproto tooling

### 2. Directory Structure

```
prototypey/
├── lexicons/                  # NEW: Input lexicon schemas
│   └── (empty initially, users add their schemas here)
├── samples/                   # Keep existing samples
│   └── *.json
├── src/
│   ├── cli/                   # NEW: CLI tool for codegen
│   │   ├── index.ts           # Main CLI entry
│   │   ├── commands/
│   │   │   ├── gen-types.ts   # Track A: Standard codegen
│   │   │   └── gen-inferred.ts # Track B: Inference-based
│   │   └── templates/
│   └── ...existing code
├── generated/                 # NEW: Default output directory
│   ├── types/                 # Track A output
│   └── inferred/              # Track B output
└── package.json
```

### 3. CLI Commands

Add to `package.json` (the `**` glob is quoted so the CLI expands it; an unquoted `**` is treated like `*` by `sh` and only matches one directory level):

```json
{
  "bin": {
    "prototypey": "./lib/cli/index.js"
  },
  "scripts": {
    "codegen": "prototypey gen-inferred ./generated/inferred \"./lexicons/**/*.json\""
  }
}
```

Provide these commands:

- `prototypey gen-inferred <outdir> <schemas...>` - Generate type-inferred code (your unique approach)
- `prototypey gen-types <outdir> <schemas...>` - Generate standard TypeScript (delegates to @atproto/lex-cli)
- `prototypey init` - Initialize a new lexicon project with sample configs

### 4. Track B: Inferred Code Generation (Your Secret Sauce)

Generate minimal runtime code that leverages your inference:

```typescript
// Example output: generated/inferred/app/bsky/feed/post.ts
import type { Infer } from "prototypey";
// Five levels up from generated/inferred/app/bsky/feed/ reaches the project root
import schema from "../../../../../lexicons/app/bsky/feed/post.json" with { type: "json" };

export type Post = Infer<typeof schema>;

// Minimal runtime helpers
export const PostSchema = schema;
// Checks only the $type discriminator; full validation is delegated to @atproto/lexicon
export const isPost = (v: unknown): v is Post => {
  return (
    typeof v === "object" &&
    v !== null &&
    "$type" in v &&
    v.$type === "app.bsky.feed.post"
  );
};
```

Benefits:

- **No validation code duplication** - reuse @atproto/lexicon at runtime
- **Type inference magic** - your core competency
- **Smaller bundle size** - minimal generated code
- **Simpler output** - easier to understand

### 5. Dependencies to Add

```json
{
  "dependencies": {
    "@atproto/lexicon": "^0.3.0"
  },
  "devDependencies": {
    "@atproto/lex-cli": "^0.9.1",
    "commander": "^12.0.0",
    "glob": "^10.0.0"
  },
  "peerDependencies": {
    "typescript": ">=5.0.0"
  }
}
```

### 6. Build Pipeline Integration

Update `package.json` scripts:

```json
{
  "scripts": {
    "build": "tsdown",
    "build:cli": "tsdown --entry src/cli/index.ts --format esm --dts false",
    "codegen:samples": "prototypey gen-inferred ./generated/samples ./samples/*.json",
    "prepack": "pnpm build && pnpm build:cli"
  }
}
```

### 7. Configuration File (optional)

`prototypey.config.json`:

```json
{
  "lexicons": "./lexicons",
  "output": {
    "inferred": "./generated/inferred",
    "types": "./generated/types"
  },
  "include": ["**/*.json"],
  "exclude": ["**/node_modules/**"]
}
```

### 8. Documentation Updates

Create docs for:

1. **Quick Start**: How to run codegen on your lexicons
2. **Track Comparison**: When to use inferred vs. standard generation
3. **Migration Guide**: Moving from @atproto/lex-cli to prototypey
4. **Type Inference Deep Dive**: How your inference works (marketing!)

## Key Differentiators

### Prototypey's Unique Value

1. **Compile-time type inference** - No generated validation code; runtime validation is delegated to @atproto/lexicon
2. **Smaller bundles** - Minimal generated code
3. **Better DX** - Types are inferred, not generated boilerplate
4. **Same compile-time guarantees** - Full TypeScript type checking

### vs. Standard @atproto/lex-cli

- **Standard**: Generates verbose validation code
- **Prototypey**: Generates minimal code + type inference
- **Both**: Same compile-time type safety, but prototypey emits far less code

## Implementation Priority

1. ✅ **Phase 1**: Basic CLI structure + Track B (inferred generation) - COMPLETE
2. ✅ **Phase 2**: File organization + output directory structure - COMPLETE
3. ✅ **Phase 3**: Convert to pnpm workspaces monorepo - COMPLETE (note: this was marked complete, but the repo still has both `src/` and `packages/`)
4. **Phase 4**: Track A (standard generation, delegate to lex-cli)
5. **Phase 5**: Configuration file support
6. **Phase 6**: Documentation + examples

## Phase 1 & 2 Implementation Notes

### ✅ Completed (2025-10-16)

**Tech Stack Choices:**

- Used `sade` instead of `commander` (modern, minimal CLI framework from awesome-e18e)
- Used `tinyglobby` instead of `glob` (faster, modern alternative)
- Built with `tsdown` for CLI bundling

**Structure Created:**

```
prototypey/
├── src/cli/
│   ├── index.ts               # CLI entry with sade
│   ├── commands/
│   │   └── gen-inferred.ts    # Track B implementation
│   └── templates/
│       └── inferred.ts        # Code generation template
├── generated/
│   └── inferred/              # Generated type files
├── lexicons/                  # Input directory (empty, ready for user schemas)
└── lib/cli/                   # Built CLI output
```

**Generated Code Pattern:**

```typescript
// generated/inferred/app/bsky/actor/profile.ts
import type { Infer } from "prototypey";
import schema from "../../../../../samples/demo.json" with { type: "json" };

export type Profile = Infer<typeof schema>;
export const ProfileSchema = schema;
export function isProfile(v: unknown): v is Profile { ... }
```

**CLI Usage:**

```bash
# Build CLI
pnpm build:cli

# Generate from samples
pnpm codegen:samples

# Direct usage
node lib/cli/index.js gen-inferred ./generated/inferred './samples/*.json'
```

**Key Features:**

- Converts NSID to file paths: `app.bsky.feed.post` → `app/bsky/feed/post.ts`
- Generates minimal runtime code with type inference
- Auto-creates directory structure
- Skips invalid schemas gracefully
- Type guard functions for runtime checks

**Testing:**

- Successfully generated types from sample lexicons
- Runtime type guards work (tested with Node.js)
- Schema imports work correctly with JSON modules

## Phase 3: Monorepo Strategy

### Why Monorepo?

The CLI tool should be a separate package from the core inference library for several reasons:

1. **Separation of concerns**: Core inference types vs. code generation tooling
2. **Dependency isolation**: The CLI needs `sade`, `tinyglobby`, etc.; consumers of the core library don't
3. **Bundle size**: Users importing just types don't want CLI bloat
4. **Independent versioning**: The CLI can evolve separately from type inference
5. **Better organization**: Clear boundaries between runtime and build-time code

### Proposed Structure

```
prototypey/
├── package.json               # Root workspace config
├── pnpm-workspace.yaml        # Workspace definition
├── packages/
│   ├── prototypey/            # Core inference library
│   │   ├── package.json       # Main package (prototypey)
│   │   ├── src/
│   │   │   ├── index.ts
│   │   │   ├── infer.ts
│   │   │   ├── lib.ts
│   │   │   └── type-utils.ts
│   │   ├── lib/               # Built output
│   │   └── tests/
│   │
│   └── cli/                   # CLI package
│       ├── package.json       # Separate package (@prototypey/cli)
│       ├── src/
│       │   ├── index.ts
│       │   ├── commands/
│       │   │   └── gen-inferred.ts
│       │   └── templates/
│       │       └── inferred.ts
│       └── lib/               # Built CLI output
│
├── samples/                   # Shared samples
├── generated/                 # Generated output (gitignored)
└── lexicons/                  # Input lexicons (gitignored)
```

### Package Configurations

**Root `pnpm-workspace.yaml`:**

```yaml
packages:
  - "packages/*"
```

**Root `package.json`:**

```json
{
  "name": "prototypey-monorepo",
  "private": true,
  "scripts": {
    "build": "pnpm -r build",
    "test": "pnpm -r test",
    "lint": "pnpm -r lint",
    "format": "prettier . --write"
  }
}
```

**`packages/prototypey/package.json`:**

```json
{
  "name": "prototypey",
  "version": "0.0.0",
  "main": "lib/index.js",
  "exports": {
    ".": "./lib/index.js",
    "./infer": "./lib/infer.js"
  },
  "dependencies": {},
  "scripts": {
    "build": "tsdown",
    "test": "vitest run"
  }
}
```

**`packages/cli/package.json`:**

```json
{
  "name": "@prototypey/cli",
  "version": "0.0.0",
  "bin": {
    "prototypey": "./lib/index.js"
  },
  "dependencies": {
    "prototypey": "workspace:*",
    "sade": "^1.8.1",
    "tinyglobby": "^0.2.15"
  },
  "scripts": {
    "build": "tsdown --entry src/index.ts --format esm --dts false"
  }
}
```

### Migration Steps

1. Create `pnpm-workspace.yaml` at root
2. Create `packages/prototypey/` and move core files
3. Create `packages/cli/` and move CLI files
4. Update import paths in CLI to use `prototypey` package
5. Update root `package.json` to be private workspace root
6. Update build scripts to use `pnpm -r` (recursive)
7. Test both packages build independently
8. Update documentation

### Benefits

- **Cleaner dependency tree**: Core has zero dependencies
- **Better DX**: Users can `npm install prototypey` for types only
- **CLI as optional tool**: `npm install -D @prototypey/cli` when needed
- **Easier testing**: Each package can have its own test suite
- **Future expansion**: Easy to add more packages (e.g., `@prototypey/validator`)

## ATProto Lexicon Background Research

### Official Tooling: @atproto/lex-cli

ATProto projects use **lexicon schemas** (JSON files) to define data structures, API endpoints, and event streams.
These schemas are then automatically transformed into type-safe TypeScript code using the **@atproto/lex-cli** code generation tool.

#### Installation

```bash
npm install --save-dev @atproto/lex-cli
```

#### Available Commands

- **`lex gen-api <outdir> <schemas...>`** - Generate TypeScript client API
- **`lex gen-server <outdir> <schemas...>`** - Generate TypeScript server API
- **`lex gen-ts-obj <schemas...>`** - Generate a TS file that exports an array of schemas
- **`lex gen-md <schemas...>`** - Generate markdown documentation
- **`lex new [options] <nsid> [outfile]`** - Create a new schema JSON file

#### Common Options

- **`--yes`** - Auto-confirm overwrites during generation

### Typical Project Structure

```
project-root/
├── lexicons/                  # Input: JSON schema definitions
│   ├── com/
│   │   └── atproto/
│   │       ├── repo/
│   │       │   ├── getRecord.json
│   │       │   └── createRecord.json
│   │       └── server/
│   │           └── defs.json
│   └── app/
│       └── bsky/
│           ├── feed/
│           │   └── post.json
│           └── richtext/
│               └── facet.json
├── src/
│   ├── client/                # Output: Generated client code
│   │   └── types/
│   │       ├── com/
│   │       │   └── atproto/
│   │       │       └── repo/
│   │       │           └── getRecord.ts
│   │       └── app/
│   │           └── bsky/
│   │               └── richtext/
│   │                   └── facet.ts
│   └── lexicon/               # Output: Generated server code
└── package.json
```

### Naming Conventions

**NSIDs (Namespaced Identifiers)**:

- Format: Reverse-DNS authority + name (e.g., `com.atproto.repo.getRecord`)
- Domain authority: every segment before the name, here `com.atproto.repo` (reverse DNS of `repo.atproto.com`)
- Name segment: `getRecord`
- File path mirrors NSID: `lexicons/com/atproto/repo/getRecord.json`

**Definition Naming**:

- Records: Single nouns, not pluralized (e.g., `post`, `profile`)
- XRPC methods: verbNoun format (e.g., `getProfile`, `createRecord`)
- Shared definitions: Use `*.defs` lexicons (e.g., `com.atproto.server.defs`)

### Generated TypeScript Code Structure

The generated TypeScript file includes:

1. **TypeScript Interfaces** with explicit `$type` properties
2. **Type Guard Functions** (`is*`) for runtime type checking
3. **Validation Functions** (`validate*`) for schema validation

Example:

```typescript
/**
 * GENERATED CODE - DO NOT MODIFY
 */
import { ValidationResult, BlobRef } from "@atproto/lexicon";
import { lexicons } from "../../../../lexicons";
import { isObj, hasProp } from "../../../../util";
import { CID } from "multiformats/cid";

// ByteSlice, Mention, Link and Tag are defined elsewhere in the same generated file (omitted here)
export interface Main {
  $type?: "app.bsky.richtext.facet";
  index: ByteSlice;
  features: (Mention | Link | Tag | { $type: string; [k: string]: unknown })[];
  [k: string]: unknown;
}

export function isMain(v: unknown): v is Main {
  return (
    isObj(v) &&
    hasProp(v, "$type") &&
    (v.$type === "app.bsky.richtext.facet#main" ||
      v.$type === "app.bsky.richtext.facet")
  );
}

export function validateMain(v: unknown): ValidationResult {
  return lexicons.validate("app.bsky.richtext.facet#main", v);
}
```

### Build Scripts & Integration

Example `package.json` scripts:

```json
{
  "scripts": {
    "codegen": "lex gen-api --yes ./src/client ../../lexicons/com/atproto/*/* ../../lexicons/app/bsky/*/*",
    "build": "tsc --build tsconfig.build.json"
  },
  "devDependencies": {
    "@atproto/lex-cli": "^0.9.1"
  }
}
```

### Best Practices

1. **Use reverse-DNS NSIDs** for your domain (e.g., `com.example.*`)
2. **Group related schemas** by namespace hierarchy
3. **Create `*.defs` lexicons** for shared definitions used across multiple schemas
4. **Store lexicons in `/lexicons` directory** at repository root
5. **Mirror NSID structure in filesystem** (e.g., `lexicons/com/example/thing.json`)
6. **Run codegen before build** in your npm scripts
7. **Generate to predictable directories** (e.g., `./src/client`, `./src/lexicon`)

### Schema Evolution Rules

1. **New fields must be optional** to maintain backward compatibility
2. **Cannot remove non-optional fields** without breaking changes
3. **Cannot change field types** without creating a new lexicon
4. **Cannot rename fields** - must deprecate and add a new field
5. **Breaking changes require a new NSID** (e.g., `v2` suffix)

### Type Categories in Lexicons

#### Primary Types (one per file)

- **record** - Repository-stored objects
- **query** - XRPC HTTP GET endpoints
- **procedure** - XRPC HTTP POST endpoints
- **subscription** - WebSocket event streams

#### Field Types

- **Primitives**: null, boolean, integer, string, bytes
- **Special**: cid-link, blob, unknown
- **Structures**: array, object, params
- **References**: ref, union, token

### Real-World Examples

- **Official ATProto Repository**: https://github.com/bluesky-social/atproto
  - Lexicons: `/lexicons/com/atproto/*`, `/lexicons/app/bsky/*`
  - Generated Client: `/packages/api/src/client/`
  - Generated Server: `/packages/pds/src/lexicon/`

## Next Steps

Start with **Phase 1**: building the CLI and the inferred code generation, since that is prototypey's core differentiator.