# data

atproto's data model: each user is a signed database.

## repos

a repository is a user's data store. it contains all their records - posts, likes, follows, whatever the applications define.

repos are merkle trees. every commit is signed by the user's key and can be verified by anyone. this is what enables authenticated data gossip - you don't need to trust the messenger, you verify the signature.

## records

records are JSON documents stored in collections:

```
at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a
     └── DID ──┘ └── collection ───┘ └── rkey ──┘
```

- **DID**: whose repo
- **collection**: the record type (lexicon NSID)
- **rkey**: record key within the collection

record keys are typically TIDs (timestamp-based IDs) for records where users have many (posts, likes). for singletons like profiles, the literal `self` is used.

## AT-URIs

the `at://` URI scheme identifies records:

```
at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a
at://zzstoatzz.io/app.bsky.feed.post/3jui7akfj2k2a  # handle also works
```

these are stable references. the URI uniquely identifies a record across the network.

## CIDs

a CID (Content Identifier) is a hash of a specific version of a record:

```
bafyreig2fjxi3qbp5jvyqx2i4djxfkp...
```

URIs identify *what*, CIDs identify *which version*. when you reference another record and care about the exact content, you include both.

## strongRef

the standard pattern for cross-record references:

```json
{
  "subject": {
    "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
    "cid": "bafyreig..."
  }
}
```

used in likes (referencing tracks), comments (referencing tracks), lists (referencing any records). the CID proves you're referencing a specific version.

from [plyr.fm lexicons](https://github.com/zzstoatzz/plyr.fm/tree/main/lexicons) - likes, comments, and lists all use strongRef.

## collections

records are grouped into collections by type:

```
repo/
├── app.bsky.feed.post/
│   ├── 3jui7akfj2k2a
│   └── 3jui8bklg3l3b
├── app.bsky.feed.like/
│   └── ...
└── fm.plyr.track/
    └── ...
```

each collection corresponds to a lexicon. applications read and write to collections they understand.

## local indexing

querying across PDSes is slow. applications maintain local indexes:

```sql
-- plyr.fm indexes fm.plyr.track records
CREATE TABLE tracks (
    id SERIAL PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    -- ... application-specific fields
    UNIQUE(did, rkey)
);
```

when users log in, sync their records from PDS to local database. background jobs keep indexes fresh.

from [plyr.fm](https://github.com/zzstoatzz/plyr.fm) - indexes tracks, likes, comments, playlists locally.

## why this matters

the "each user is one database" model is the foundation of **atmospheric computing**:

- **portability**: your "personal cloud" is yours. if a host fails, you move your data elsewhere.
- **verification**: trust is cryptographic. you verify the data signature, not the provider.
- **aggregation**: applications weave together data from millions of personal clouds into a cohesive "atmosphere."
- **interop**: apps share schemas, so my music player can read your social graph.
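
to make the AT-URI and strongRef shapes described earlier concrete, here is a minimal python sketch. it assumes the simple three-part `at://authority/collection/rkey` form shown in this page. `AtUri`, `parse_at_uri`, and `strong_ref` are hypothetical names for illustration, not part of any atproto SDK or of plyr.fm, and the parser does not validate DIDs, handles, or NSIDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUri:
    """Hypothetical container for the three parts of a record URI."""
    authority: str   # DID or handle: whose repo
    collection: str  # lexicon NSID: the record type
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split at://authority/collection/rkey into its parts.

    Only the three-part record form is handled; anything else raises.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    return AtUri(*parts)


def strong_ref(uri: str, cid: str) -> dict:
    """Build the {uri, cid} pair used for cross-record references."""
    parse_at_uri(uri)  # reject malformed URIs before building the ref
    return {"uri": uri, "cid": cid}


parsed = parse_at_uri("at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a")

# a like-style record body pointing at a specific version of a track;
# the CID is the elided placeholder from the example above
like = {"subject": strong_ref("at://did:plc:xyz/fm.plyr.track/abc123", "bafyreig...")}
```

the parsed `authority` and `rkey` line up with the `did` and `rkey` columns of a local index table, with the caveat that a handle would need to be resolved to a DID before it could serve as a stable key.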