# data

atproto's data model: each user is a signed database.

## repos

a repository is a user's data store. it contains all their records - posts, likes, follows, whatever the applications define.

repos are merkle trees (specifically, merkle search trees). every commit is signed by the user's key and can be verified by anyone. this is what enables authenticated data gossip - you don't need to trust the messenger, you verify the signature.
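
to make the "verify instead of trust" idea concrete, here is a minimal sketch of a merkle root over a list of records. it is a simplified binary tree over SHA-256 hashes, not the merkle search tree that real repos use, and it leaves out the commit signature entirely. `merkle_root` is a hypothetical helper name.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> str:
    """Return the hex root hash of a simple binary merkle tree.

    Simplification: real atproto repos use a merkle search tree over
    DAG-CBOR nodes, and the signed commit points at that tree's root.
    """
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    # Hash every leaf, then repeatedly hash adjacent pairs until one node is left.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

changing any single record changes the root, so a signature over the root covers every record in the tree.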

## records

records are JSON documents (encoded as DAG-CBOR inside the repo) stored in collections:

```
at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a
     └── DID ──┘ └── collection ──┘ └─── rkey ──┘
```

- **DID**: whose repo
- **collection**: the record type (lexicon NSID)
- **rkey**: record key within the collection

record keys are typically TIDs (timestamp-based IDs) for collections where a user has many records (posts, likes). for singletons like profiles, the literal `self` is used.
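
a sketch of how a TID can be packed and unpacked, assuming the layout in the atproto spec: a 64-bit integer holding 53 bits of microseconds since the UNIX epoch plus a 10-bit clock id, written as 13 characters of a sortable base32 alphabet. `tid_encode` and `tid_decode` are hypothetical helper names, not part of any SDK.

```python
B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz"


def tid_encode(timestamp_us: int, clock_id: int = 0) -> str:
    """Pack a microsecond timestamp and a clock id into a 13-character TID."""
    value = ((timestamp_us & ((1 << 53) - 1)) << 10) | (clock_id & 0x3FF)
    chars = []
    for _ in range(13):
        chars.append(B32_SORTABLE[value & 31])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def tid_decode(tid: str) -> tuple[int, int]:
    """Return (microseconds since the UNIX epoch, clock id) for a TID."""
    if len(tid) != 13:
        raise ValueError(f"TID must be 13 characters: {tid!r}")
    value = 0
    for ch in tid:
        value = (value << 5) | B32_SORTABLE.index(ch)  # ValueError on bad chars
    return value >> 10, value & 0x3FF
```

because the alphabet is in ASCII order, sorting TIDs as plain strings sorts them by creation time.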

## AT-URIs

the `at://` URI scheme identifies records:

```
at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a
at://zzstoatzz.io/app.bsky.feed.post/3jui7akfj2k2a  # handle also works
```

DID-based URIs are stable references: the URI uniquely identifies a record across the network. handle-based URIs resolve to the same record but break if the handle changes, so store the DID form.
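
a minimal sketch of splitting an AT-URI into its parts. `AtUri` and `parse_at_uri` are hypothetical names; the sketch only handles the `authority/collection/rkey` shape shown above and does not validate DIDs, handles, or NSIDs.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str             # a DID or a handle
    collection: Optional[str]  # lexicon NSID, e.g. app.bsky.feed.post
    rkey: Optional[str]        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split at://authority[/collection[/rkey]] into named parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"malformed AT-URI: {uri!r}")
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None
    return AtUri(parts[0], collection, rkey)


def is_did_based(uri: str) -> bool:
    """True when the authority is a DID rather than a handle."""
    return parse_at_uri(uri).authority.startswith("did:")
```

for example, `parse_at_uri("at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a")` yields authority `did:plc:xyz`, collection `fm.plyr.track`, and rkey `3jui7akfj2k2a`.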

## CIDs

a CID (Content Identifier) is a hash of a specific version of a record:

```
bafyreig2fjxi3qbp5jvyqx2i4djxfkp...
```

URIs identify *what*, CIDs identify *which version*. when you reference another record and care about the exact content, you include both.
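
a sketch of how a CID string of this shape is assembled, assuming CIDv1 with the DAG-CBOR codec and a SHA-256 multihash, rendered as lowercase base32. the input here is arbitrary bytes standing in for the DAG-CBOR encoding of a record; the encoding step itself is not implemented, so the output will not match a real record's CID. `cid_for_bytes` is a hypothetical helper name.

```python
import base64
import hashlib

CID_VERSION = 0x01
DAG_CBOR = 0x71   # multicodec code for DAG-CBOR
SHA2_256 = 0x12   # multihash code for SHA-256
DIGEST_LEN = 0x20 # 32-byte digest


def cid_for_bytes(encoded_record: bytes) -> str:
    """Build a CIDv1 string for already-encoded record bytes."""
    digest = hashlib.sha256(encoded_record).digest()
    raw = bytes([CID_VERSION, DAG_CBOR, SHA2_256, DIGEST_LEN]) + digest
    # Multibase prefix "b" marks lowercase base32 without padding.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

the fixed four-byte header is why these CIDs all start with `bafyrei`. any edit to the record bytes produces a different digest, and so a different CID.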

## strongRef

the standard pattern for cross-record references (`com.atproto.repo.strongRef`):

```json
{
  "subject": {
    "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
    "cid": "bafyreig..."
  }
}
```

used in likes (referencing tracks), comments (referencing tracks), lists (referencing any records). the CID proves you're referencing a specific version.

from [plyr.fm lexicons](https://github.com/zzstoatzz/plyr.fm/tree/main/lexicons) - likes, comments, and lists all use strongRef.
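
a sketch of building a record that carries a strongRef. the collection name `fm.plyr.like` and the helpers `strong_ref` and `make_like` are assumptions for illustration; check the linked lexicons for the real record shapes.

```python
from datetime import datetime, timezone


def strong_ref(uri: str, cid: str) -> dict:
    """Pair an AT-URI with the CID of the exact version being referenced."""
    if not uri.startswith("at://"):
        raise ValueError(f"not an AT-URI: {uri!r}")
    if not cid:
        raise ValueError("a strongRef needs a CID")
    return {"uri": uri, "cid": cid}


def make_like(subject_uri: str, subject_cid: str,
              collection: str = "fm.plyr.like") -> dict:
    """Build a like record body. The collection name is hypothetical."""
    created_at = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "$type": collection,
        "subject": strong_ref(subject_uri, subject_cid),
        "createdAt": created_at,
    }
```

if the track is later edited, its CID changes, so a reader can tell that the like pointed at an earlier version.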

## collections

records are grouped into collections by type:

```
repo/
├── app.bsky.feed.post/
│   ├── 3jui7akfj2k2a
│   └── 3jui8bklg3l3b
├── app.bsky.feed.like/
│   └── ...
└── fm.plyr.track/
    └── ...
```

each collection corresponds to a lexicon. applications read and write to collections they understand.
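
the tree above can be sketched as a plain mapping from collection NSID to record keys. `group_by_collection` is a hypothetical helper, and the sketch assumes every URI has the full `authority/collection/rkey` shape and belongs to the same repo.

```python
from collections import defaultdict


def group_by_collection(uris: list[str]) -> dict[str, list[str]]:
    """Group AT-URIs from a single repo into {collection: [rkey, ...]}."""
    repo: dict[str, list[str]] = defaultdict(list)
    for uri in uris:
        # Drop the scheme, then split into authority / collection / rkey.
        _authority, collection, rkey = uri[len("at://"):].split("/")
        repo[collection].append(rkey)
    return dict(repo)
```

an application that only understands `fm.plyr.track` would read that one key and ignore the rest.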

## local indexing

querying across PDSes is slow. applications maintain local indexes:

```sql
-- plyr.fm indexes fm.plyr.track records
CREATE TABLE tracks (
    id SERIAL PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    -- ... application-specific fields
    UNIQUE(did, rkey)
);
```

when a user logs in, the app syncs their records from their PDS into the local database. background jobs keep the indexes fresh.

from [plyr.fm](https://github.com/zzstoatzz/plyr.fm) - indexes tracks, likes, comments, playlists locally.
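
a sketch of the sync step, not plyr.fm's actual code. it assumes records arrive in the shape returned by `com.atproto.repo.listRecords` (`uri`, `cid`, `value`), that the record value carries `title` and `artist` fields matching the table above, and it swaps in SQLite so the example runs without a server. `index_tracks` is a hypothetical helper name.

```python
import sqlite3

# SQLite version of the tracks table (INTEGER PRIMARY KEY replaces SERIAL).
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id INTEGER PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    UNIQUE(did, rkey)
)
"""


def index_tracks(conn: sqlite3.Connection, records: list[dict]) -> int:
    """Upsert listRecords-shaped track records; return how many were processed."""
    rows = []
    for rec in records:
        did, _collection, rkey = rec["uri"][len("at://"):].split("/")
        value = rec["value"]  # assumed to hold title and artist
        rows.append((did, rkey, rec["uri"], rec["cid"],
                     value["title"], value["artist"]))
    conn.executemany(
        """INSERT INTO tracks (did, rkey, uri, cid, title, artist)
           VALUES (?, ?, ?, ?, ?, ?)
           ON CONFLICT(did, rkey) DO UPDATE SET
               cid = excluded.cid,
               title = excluded.title,
               artist = excluded.artist""",
        rows,
    )
    conn.commit()
    return len(rows)
```

keying the upsert on `(did, rkey)` means re-syncing the same repo updates rows in place, and the stored `cid` shows which version of each record the index reflects.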

## why this matters

the "each user is one database" model is the foundation of **atmospheric computing**:

- **portability**: your "personal cloud" is yours. if a host fails, you move your data elsewhere.
- **verification**: trust is cryptographic. you verify the data signature, not the provider.
- **aggregation**: applications weave together data from millions of personal clouds into a cohesive "atmosphere."
- **interop**: apps share schemas, so my music player can read your social graph.