# data

atproto's data model: each user is a signed database.

## repos

a repository is a user's data store. it contains all their records - posts, likes, follows, whatever the applications define.

repos are merkle trees. every commit is signed by the user's key and can be verified by anyone. this is what enables authenticated data gossip - you don't need to trust the messenger, you verify the signature.

## records

records are JSON documents stored in collections:

```
at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a
     └── DID ──┘ └── collection ──┘ └── rkey ───┘
```

- **DID**: whose repo
- **collection**: the record type (lexicon NSID)
- **rkey**: record key within the collection

record keys are typically TIDs (timestamp-based IDs) in collections where a user has many records (posts, likes). for singletons like profiles, the literal `self` is used.

## AT-URIs

the `at://` URI scheme identifies records:

```
at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a
at://zzstoatzz.io/app.bsky.feed.post/3jui7akfj2k2a  # handle also works
```

the DID form is a stable reference: it uniquely identifies a record across the network. the handle form is convenient, but a handle can change, so use the DID when the reference needs to last.

## CIDs

a CID (Content Identifier) is a hash of a specific version of a record:

```
bafyreig2fjxi3qbp5jvyqx2i4djxfkp...
```

URIs identify *what*, CIDs identify *which version*. when you reference another record and care about the exact content, you include both.

## strongRef

the standard pattern for cross-record references:

```json
{
  "subject": {
    "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
    "cid": "bafyreig..."
  }
}
```

used in likes (referencing tracks), comments (referencing tracks), lists (referencing any records). the CID proves you're referencing a specific version.

from [plyr.fm lexicons](https://github.com/zzstoatzz/plyr.fm/tree/main/lexicons) - likes, comments, and lists all use strongRef.
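
to make the URI and strongRef shapes above concrete, here is a minimal sketch in Python. `AtUri`, `parse_at_uri`, and `strong_ref` are hypothetical names made up for illustration, not part of any atproto SDK. the parser only handles the three-part record form shown above and does not validate DID, NSID, or rkey syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUri:
    authority: str   # DID (or handle) naming whose repo
    collection: str  # lexicon NSID, e.g. "app.bsky.feed.post"
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split a record-level AT-URI into authority, collection, and rkey."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return AtUri(*parts)


def strong_ref(uri: str, cid: str) -> dict:
    """Build the {uri, cid} pair used for cross-record references."""
    parse_at_uri(uri)  # reject malformed URIs early
    return {"uri": uri, "cid": cid}


parsed = parse_at_uri("at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a")

# the CID is the same elided placeholder used in the JSON example above
like = {"subject": strong_ref("at://did:plc:xyz/fm.plyr.track/abc123", "bafyreig...")}
```

keeping the URI as a plain string inside the strongRef, rather than storing the parsed parts, matches the JSON shape above: consumers re-parse when they need the pieces.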

## collections

records are grouped into collections by type:

```
repo/
├── app.bsky.feed.post/
│   ├── 3jui7akfj2k2a
│   └── 3jui8bklg3l3b
├── app.bsky.feed.like/
│   └── ...
└── fm.plyr.track/
    └── ...
```

each collection corresponds to a lexicon. applications read and write to collections they understand.

## local indexing

querying across PDSes is slow. applications maintain local indexes:

```sql
-- plyr.fm indexes fm.plyr.track records
CREATE TABLE tracks (
    id SERIAL PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    -- ... application-specific fields
    UNIQUE(did, rkey)
);
```

when users log in, sync their records from PDS to local database. background jobs keep indexes fresh.

from [plyr.fm](https://github.com/zzstoatzz/plyr.fm) - indexes tracks, likes, comments, playlists locally.

## why this matters

the "each user is one database" model is the foundation of **atmospheric computing**:

- **portability**: your "personal cloud" is yours. if a host fails, you move your data elsewhere.
- **verification**: trust is cryptographic. you verify the data signature, not the provider.
- **aggregation**: applications weave together data from millions of personal clouds into a cohesive "atmosphere."
- **interop**: apps share schemas, so my music player can read your social graph.
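
to make the aggregation point concrete, here is a minimal sketch of the local indexing step described earlier: an upsert keyed on `(did, rkey)`, so re-syncing a record refreshes the row instead of duplicating it. this is not plyr.fm's actual code. it assumes an in-memory SQLite database in place of the real one, a hypothetical `index_track` helper, records that carry `title` and `artist` fields, and placeholder strings instead of real CIDs.

```python
import sqlite3

# in-memory SQLite stands in for the application's real database (assumption)
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE tracks (
        id INTEGER PRIMARY KEY,
        did TEXT NOT NULL,
        rkey TEXT NOT NULL,
        uri TEXT NOT NULL,
        cid TEXT,
        title TEXT NOT NULL,
        artist TEXT NOT NULL,
        UNIQUE(did, rkey)
    )
    """
)


def index_track(db, did, rkey, cid, record):
    """Insert a track, or refresh cid/title/artist if (did, rkey) is already indexed."""
    uri = f"at://{did}/fm.plyr.track/{rkey}"
    db.execute(
        """
        INSERT INTO tracks (did, rkey, uri, cid, title, artist)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(did, rkey) DO UPDATE SET
            cid = excluded.cid,
            title = excluded.title,
            artist = excluded.artist
        """,
        (did, rkey, uri, cid, record["title"], record["artist"]),
    )


# the same record synced twice: the second sync carries a newer version
index_track(db, "did:plc:xyz", "abc123", "cid-v1", {"title": "demo", "artist": "someone"})
index_track(db, "did:plc:xyz", "abc123", "cid-v2", {"title": "demo (remaster)", "artist": "someone"})

rows = db.execute("SELECT uri, cid, title FROM tracks").fetchall()
```

storing the CID alongside each row lets a background job skip records whose CID has not changed since the last sync.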