# data

atproto's data model: each user is a signed database.

## repos

a repository is a user's data store. it contains all their records - posts, likes, follows, whatever the applications define.

repos are merkle trees (specifically a Merkle Search Tree). every commit is signed with the user's signing key and can be verified by anyone. this is what enables authenticated data gossip - you don't need to trust the messenger, you verify the signature.

## records

records are JSON documents stored in collections (inside the repo they are encoded as DAG-CBOR). each record has an address:

```
at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a
     └── DID ──┘ └── collection ──┘ └── rkey ───┘
```

- **DID**: whose repo
- **collection**: the record type (lexicon NSID)
- **rkey**: record key within the collection

record keys are typically TIDs (timestamp-based IDs) in collections where a user has many records (posts, likes). for singletons like profiles, the literal `self` is used.
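to make the TID idea concrete, here is a minimal sketch of encoding and decoding one. it assumes the layout from the atproto record-key spec (a 64-bit integer holding a microsecond timestamp plus a 10-bit clock id, written as 13 characters of a sortable base32 alphabet). the function names `encode_tid` and `decode_tid` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# sortable base32 alphabet used by TIDs (digits sort before letters in ASCII)
TID_ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_tid(micros: int, clock_id: int = 0) -> str:
    """Pack a microsecond timestamp and a 10-bit clock id into a 13-char TID."""
    if not (0 <= micros < 1 << 53) or not (0 <= clock_id < 1 << 10):
        raise ValueError("timestamp or clock id out of range")
    value = (micros << 10) | clock_id
    chars = []
    for _ in range(13):
        chars.append(TID_ALPHABET[value & 0x1F])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def decode_tid(tid: str) -> tuple[datetime, int]:
    """Return (creation time in UTC, clock id) for a TID string."""
    if len(tid) != 13 or any(ch not in TID_ALPHABET for ch in tid):
        raise ValueError(f"not a TID: {tid!r}")
    value = 0
    for ch in tid:
        value = (value << 5) | TID_ALPHABET.index(ch)
    micros, clock_id = value >> 10, value & 0x3FF
    return EPOCH + timedelta(microseconds=micros), clock_id


when, clock_id = decode_tid("3jui7akfj2k2a")
print(when.isoformat(), clock_id)
```

because the timestamp sits in the high bits, sorting TIDs as strings sorts records by creation time.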

## AT-URIs

the `at://` URI scheme identifies records:

```
at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a
at://zzstoatzz.io/app.bsky.feed.post/3jui7akfj2k2a  # handle also works
```

DID-based URIs are stable references: the URI uniquely identifies a record across the network, even if the user moves hosts or changes handles. handle-based URIs are convenient but stop resolving if the handle changes, so store the DID form.
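splitting an AT-URI into its parts is simple string handling. the sketch below is a simplified parser, not a full implementation of the URI spec (it ignores query strings and fragments). `AtUri` and `parse_at_uri` are hypothetical names.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str              # a DID or a handle
    collection: Optional[str]   # lexicon NSID, if present
    rkey: Optional[str]         # record key, if present

    @property
    def is_durable(self) -> bool:
        # only DID-based URIs survive a handle change
        return self.authority.startswith("did:")


def parse_at_uri(uri: str) -> AtUri:
    """Split at://authority[/collection[/rkey]] into its parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) > 3 or any(part == "" for part in parts):
        raise ValueError(f"malformed AT-URI: {uri!r}")
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None
    return AtUri(parts[0], collection, rkey)


track = parse_at_uri("at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a")
print(track.collection, track.rkey, track.is_durable)
```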

## CIDs

a CID (Content Identifier) is a hash of a specific version of a record:

```
bafyreig2fjxi3qbp5jvyqx2i4djxfkp...
```

URIs identify *what*, CIDs identify *which version*. editing a record changes its CID but not its URI. when you reference another record and care about the exact content, you include both.
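the `bafyrei` prefix comes from the CID layout itself. a rough sketch, assuming the common case of a CIDv1 with the dag-cbor codec and a SHA-256 hash: the input must be the record's DAG-CBOR bytes, and the placeholder bytes here are not a real encoded record. `cid_for_dag_cbor` is a made-up name.

```python
import base64
import hashlib


def cid_for_dag_cbor(encoded: bytes) -> str:
    """Build a CIDv1 string from already-encoded DAG-CBOR bytes."""
    digest = hashlib.sha256(encoded).digest()
    raw = bytes([
        0x01,  # CID version 1
        0x71,  # content codec: dag-cbor
        0x12,  # multihash function: sha2-256
        0x20,  # digest length: 32 bytes
    ]) + digest
    # multibase prefix "b" means lowercase base32 without padding
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


# placeholder bytes standing in for two versions of an encoded record
v1 = cid_for_dag_cbor(b"record, version 1")
v2 = cid_for_dag_cbor(b"record, version 2")
print(v1)
print(v1 == v2)
```

any change to the bytes yields a different CID, which is why a CID identifies one exact version.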

## strongRef

the standard pattern for cross-record references:

```json
{
  "subject": {
    "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
    "cid": "bafyreig..."
  }
}
```

used in likes and comments (both referencing tracks) and lists (referencing any record). the CID pins the reference to a specific version, so a later edit to the target is detectable.

from [plyr.fm lexicons](https://github.com/zzstoatzz/plyr.fm/tree/main/lexicons) - likes, comments, and lists all use strongRef.
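as an illustration, a strongRef can be modeled as a small value type. this is a sketch and is not taken from the plyr.fm codebase: `StrongRef`, `make_record`, and `is_stale` are hypothetical helpers, and the `example.app.like` record type is a placeholder NSID.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StrongRef:
    uri: str  # which record
    cid: str  # which version of it


def make_record(record_type: str, subject: StrongRef, created_at: str) -> dict:
    """Build a record body that points at another record via strongRef."""
    return {
        "$type": record_type,
        "subject": asdict(subject),
        "createdAt": created_at,
    }


def is_stale(ref: StrongRef, current_cid: str) -> bool:
    """True if the target changed since the reference was made."""
    return ref.cid != current_cid


ref = StrongRef("at://did:plc:xyz/fm.plyr.track/abc123", "bafyreig...")
like = make_record("example.app.like", ref, "2024-01-01T00:00:00Z")
print(like["subject"])
```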

## collections

records are grouped into collections by type:

```
repo/
├── app.bsky.feed.post/
│   ├── 3jui7akfj2k2a
│   └── 3jui8bklg3l3b
├── app.bsky.feed.like/
│   └── ...
└── fm.plyr.track/
    └── ...
```

each collection is named by the NSID of the lexicon that defines its record schema. applications read and write the collections they understand.
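reading a collection goes through the `com.atproto.repo.listRecords` XRPC endpoint on the user's PDS. the sketch below only builds the request URL. the host `pds.example.com` is a placeholder, and `list_records_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    pds_host: str,
    repo: str,
    collection: str,
    limit: int = 50,
    cursor: Optional[str] = None,
) -> str:
    """Build a listRecords URL for one collection in one repo."""
    params = {"repo": repo, "collection": collection, "limit": limit}
    if cursor is not None:
        params["cursor"] = cursor  # continue from a previous page
    query = urlencode(params)
    return f"https://{pds_host}/xrpc/com.atproto.repo.listRecords?{query}"


print(list_records_url("pds.example.com", "did:plc:xyz", "fm.plyr.track"))
```

each page of results carries a `cursor` to pass back for the next page, until the collection is exhausted.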

## local indexing

querying every PDS at request time is slow, so applications maintain local indexes:

```sql
-- plyr.fm indexes fm.plyr.track records
CREATE TABLE tracks (
    id SERIAL PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    -- ... application-specific fields
    UNIQUE(did, rkey)
);
```

when a user logs in, the app syncs their records from their PDS into the local database. background jobs keep the indexes fresh.

from [plyr.fm](https://github.com/zzstoatzz/plyr.fm) - indexes tracks, likes, comments, playlists locally.
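the `UNIQUE(did, rkey)` constraint is what makes re-syncing safe: the same record can be written again without creating duplicates. a minimal sketch of that upsert, using in-memory SQLite in place of Postgres so it runs anywhere. `upsert_track` is a hypothetical helper, not plyr.fm's actual sync code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    UNIQUE(did, rkey)
)
"""


def upsert_track(db: sqlite3.Connection, did: str, rkey: str, cid: str, record: dict) -> None:
    """Insert a track, or refresh the indexed copy if it already exists."""
    db.execute(
        """
        INSERT INTO tracks (did, rkey, uri, cid, title, artist)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(did, rkey) DO UPDATE SET
            cid = excluded.cid,
            title = excluded.title,
            artist = excluded.artist
        """,
        (did, rkey, f"at://{did}/fm.plyr.track/{rkey}", cid,
         record["title"], record["artist"]),
    )


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
# the same record synced twice: the second sync carries an edited version
upsert_track(db, "did:plc:xyz", "3jui7akfj2k2a", "cid-v1", {"title": "demo", "artist": "someone"})
upsert_track(db, "did:plc:xyz", "3jui7akfj2k2a", "cid-v2", {"title": "demo (edit)", "artist": "someone"})
print(db.execute("SELECT COUNT(*), MAX(cid) FROM tracks").fetchone())
```

storing the CID alongside each row lets a background job skip records whose CID has not changed.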

## why this matters

the "each user is one database" model is the foundation of **atmospheric computing**:

- **portability**: your "personal cloud" is yours. if a host fails, you move your data elsewhere.
- **verification**: trust is cryptographic. you verify the data signature, not the provider.
- **aggregation**: applications weave together data from millions of personal clouds into a cohesive "atmosphere."
- **interop**: apps share schemas, so my music player can read your social graph.