data#
atproto's data model: each user is a signed database.
repos#
a repository is a user's data store. it contains all their records - posts, likes, follows, whatever the applications define.
repos are merkle trees (specifically a Merkle Search Tree keyed by collection and record key). every commit is signed by the account's signing key and can be verified by anyone. this is what enables authenticated data gossip - you don't need to trust the messenger, you verify the signature.
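a toy illustration of why a hash tree makes tampering visible. this is a plain binary hash tree over JSON, not the Merkle Search Tree or DAG-CBOR encoding atproto actually uses, and signing is left out entirely; `leaf_hash` and `root_hash` are hypothetical names for this sketch:

```python
import hashlib
import json


def leaf_hash(path: str, record: dict) -> bytes:
    """Hash one record together with its path (collection/rkey)."""
    payload = json.dumps({"path": path, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).digest()


def root_hash(records: dict[str, dict]) -> str:
    """Fold the sorted leaf hashes pairwise until one root hash remains."""
    level = [leaf_hash(path, records[path]) for path in sorted(records)]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last hash with itself
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


repo = {
    "app.bsky.feed.post/3jui7akfj2k2a": {"text": "hello"},
    "app.bsky.feed.post/3jui8bklg3l3b": {"text": "world"},
}
before = root_hash(repo)

# changing any record changes the root, so a signature made over the
# old root would no longer match what the messenger handed you
repo["app.bsky.feed.post/3jui7akfj2k2a"]["text"] = "HELLO"
after = root_hash(repo)
print(before != after)  # True
```

in the real protocol the signature covers the commit, which points at the tree root, so verifying one signature vouches for every record underneath it.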
records#
records are JSON-shaped documents (encoded as DAG-CBOR inside the repo) stored in collections:
at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a
     └── DID ──┘ └── collection ──┘ └── rkey ───┘
- DID: whose repo
- collection: the record type (lexicon NSID)
- rkey: record key within the collection
record keys are typically TIDs (timestamp-based IDs) for collections where a user has many records (posts, likes). for singletons like profiles, the literal key "self" is used.
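a minimal sketch of how a TID packs a timestamp, assuming the layout from the atproto TID spec as i understand it: a 64-bit integer with 53 bits of microseconds since the unix epoch followed by a 10-bit clock id, written as 13 characters of a sortable base32 alphabet. `encode_tid` and `decode_tid` are hypothetical helper names:

```python
from datetime import datetime, timezone

# sortable base32 alphabet (assumed from the TID spec)
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"


def encode_tid(timestamp_us: int, clock_id: int) -> str:
    """Pack microseconds-since-epoch and a 10-bit clock id into 13 chars."""
    if not 0 <= timestamp_us < 2**53:
        raise ValueError("timestamp must fit in 53 bits")
    if not 0 <= clock_id < 1024:
        raise ValueError("clock_id must fit in 10 bits")
    n = (timestamp_us << 10) | clock_id
    # emit thirteen 5-bit groups, most significant first
    return "".join(ALPHABET[(n >> (5 * i)) & 31] for i in reversed(range(13)))


def decode_tid(tid: str) -> tuple[int, int]:
    """Return (timestamp in microseconds, clock id) for a 13-char TID."""
    if len(tid) != 13:
        raise ValueError("a TID is exactly 13 characters")
    n = 0
    for ch in tid:
        n = (n << 5) | ALPHABET.index(ch)  # ValueError on invalid chars
    return n >> 10, n & 0x3FF


timestamp_us, clock_id = decode_tid("3jui7akfj2k2a")
created = datetime.fromtimestamp(timestamp_us / 1_000_000, tz=timezone.utc)
print(created.isoformat(), clock_id)
```

because the alphabet is in ascending ASCII order and the length is fixed, sorting TIDs as strings sorts them by time, which is why they work well as keys for posts and likes.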
AT-URIs#
the at:// URI scheme identifies records:
at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a
at://zzstoatzz.io/app.bsky.feed.post/3jui7akfj2k2a # handle also works
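DID-based URIs are stable references: the URI uniquely identifies a record across the network, even if the account changes its handle or host. handle-based URIs are convenient to type but stop resolving to the same repo if the handle changes.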
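splitting an at:// URI into its parts can be sketched as below. this is a deliberately small parser: it ignores query strings and fragments and does no validation of DID, NSID, or rkey syntax. `AtUri` and `parse_at_uri` are hypothetical names, not part of any SDK:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtUri:
    authority: str             # DID or handle
    collection: Optional[str]  # lexicon NSID, if present
    rkey: Optional[str]        # record key, if present

    @property
    def is_durable(self) -> bool:
        """DID authorities survive handle changes; handles may not."""
        return self.authority.startswith("did:")


def parse_at_uri(uri: str) -> AtUri:
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if not parts[0] or len(parts) > 3 or "" in parts[1:]:
        raise ValueError(f"malformed at:// URI: {uri!r}")
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None
    return AtUri(parts[0], collection, rkey)


track = parse_at_uri("at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a")
print(track.collection, track.rkey, track.is_durable)
# fm.plyr.track 3jui7akfj2k2a True
```

an application storing references long-term would typically resolve a handle authority to its DID first and store the DID form.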
CIDs#
a CID (Content Identifier) is a hash of a specific version of a record:
bafyreig2fjxi3qbp5jvyqx2i4djxfkp...
URIs identify what, CIDs identify which version. when you reference another record and care about the exact content, you include both.
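a sketch of how a CID string like the one above is assembled, assuming CIDv1 with the dag-cbor codec and a sha2-256 multihash, rendered as lowercase base32. the input must already be the DAG-CBOR bytes of the record; producing those bytes needs a DAG-CBOR library, which this sketch skips. `cid_for_block` is a hypothetical name:

```python
import base64
import hashlib

CID_VERSION = 0x01  # CIDv1
DAG_CBOR = 0x71     # multicodec code for dag-cbor
SHA2_256 = 0x12     # multihash code for sha2-256
DIGEST_LEN = 0x20   # 32-byte digest


def cid_for_block(block: bytes) -> str:
    """Build a base32 CIDv1 for bytes that are already DAG-CBOR encoded."""
    digest = hashlib.sha256(block).digest()
    raw = bytes([CID_VERSION, DAG_CBOR, SHA2_256, DIGEST_LEN]) + digest
    b32 = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for base32


# hand-encoded CBOR for {"text": "hello"}, standing in for a real record
block = b"\xa1dtextehello"
cid = cid_for_block(block)
print(cid[:7], len(cid))  # bafyrei 59
```

the fixed four-byte header is why these CIDs all start with "bafyrei"; everything after that is the hash, so any change to the record content yields a different CID.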
strongRef#
the standard pattern for cross-record references:
{
"subject": {
"uri": "at://did:plc:xyz/fm.plyr.track/abc123",
"cid": "bafyreig..."
}
}
used in likes (referencing tracks), comments (referencing tracks), and lists (referencing any record). the CID pins the reference to a specific version: if the record is later edited, its CID changes and the mismatch is detectable.
from plyr.fm lexicons - likes, comments, and lists all use strongRef.
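a small sketch of what the CID buys you when reading a strongRef back. the helper names are hypothetical and the CID strings are readable placeholders, not real CIDs:

```python
def make_strong_ref(uri: str, cid: str) -> dict:
    """Build the {uri, cid} pair used for cross-record references."""
    return {"uri": uri, "cid": cid}


def ref_is_current(ref: dict, current_cid: str) -> bool:
    """True if the referenced record still has the version that was pinned."""
    return ref["cid"] == current_cid


# a like pins the track as it existed when the user liked it
like = {
    "subject": make_strong_ref(
        "at://did:plc:xyz/fm.plyr.track/abc123",
        "placeholder-cid-v1",
    )
}

print(ref_is_current(like["subject"], "placeholder-cid-v1"))  # True
print(ref_is_current(like["subject"], "placeholder-cid-v2"))  # False
```

what an application does on a mismatch is its own decision: it might keep showing the like, flag the target as edited, or re-fetch the record.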
collections#
records are grouped into collections by type:
repo/
├── app.bsky.feed.post/
│ ├── 3jui7akfj2k2a
│ └── 3jui8bklg3l3b
├── app.bsky.feed.like/
│ └── ...
└── fm.plyr.track/
└── ...
each collection corresponds to a lexicon. applications read and write to collections they understand.
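reading one collection from a repo goes through the `com.atproto.repo.listRecords` XRPC query. the sketch below only builds the request URL; `pds.example.com` is a placeholder host and `list_records_url` is a hypothetical helper:

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    pds_host: str,
    repo: str,
    collection: str,
    limit: int = 50,
    cursor: Optional[str] = None,
) -> str:
    """Build a listRecords URL for one collection in one repo."""
    params = {"repo": repo, "collection": collection, "limit": limit}
    if cursor:
        params["cursor"] = cursor  # continue from a previous page
    return f"https://{pds_host}/xrpc/com.atproto.repo.listRecords?{urlencode(params)}"


print(list_records_url("pds.example.com", "did:plc:xyz", "fm.plyr.track"))
```

the response is paginated: each page carries a list of records (each with its uri, cid, and value) plus a cursor to pass back for the next page.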
local indexing#
querying many PDSes at request time is slow, so applications maintain local indexes:
-- plyr.fm indexes fm.plyr.track records
CREATE TABLE tracks (
id SERIAL PRIMARY KEY,
did TEXT NOT NULL,
rkey TEXT NOT NULL,
uri TEXT NOT NULL,
cid TEXT,
title TEXT NOT NULL,
artist TEXT NOT NULL,
-- ... application-specific fields
UNIQUE(did, rkey)
);
when a user logs in, sync their records from their PDS into the local database. background jobs keep the indexes fresh.
from plyr.fm - indexes tracks, likes, comments, playlists locally.
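a minimal sketch of that sync step, not plyr.fm's actual code. it uses sqlite in place of postgres so it runs standalone, takes the page fetcher as a parameter so no network is involved, and assumes track records carry `title` and `artist` fields to match the table above:

```python
import sqlite3
from typing import Callable, Optional

# a page looks like {"records": [{"uri", "cid", "value"}], "cursor": "..."}
FetchPage = Callable[[Optional[str]], dict]

SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id INTEGER PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    UNIQUE(did, rkey)
)
"""

UPSERT = """
INSERT INTO tracks (did, rkey, uri, cid, title, artist)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT(did, rkey) DO UPDATE SET
    uri = excluded.uri, cid = excluded.cid,
    title = excluded.title, artist = excluded.artist
"""


def sync_tracks(db: sqlite3.Connection, fetch_page: FetchPage) -> int:
    """Page through a user's track records and upsert each one locally."""
    db.execute(SCHEMA)
    cursor, count = None, 0
    while True:
        page = fetch_page(cursor)
        records = page.get("records", [])
        for rec in records:
            did, _collection, rkey = rec["uri"][len("at://"):].split("/")
            value = rec["value"]
            db.execute(
                UPSERT,
                (did, rkey, rec["uri"], rec["cid"], value["title"], value["artist"]),
            )
            count += 1
        cursor = page.get("cursor")
        if not cursor or not records:
            break
    db.commit()
    return count


# fake PDS responses keyed by cursor (placeholder data)
pages = {
    None: {
        "records": [{
            "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
            "cid": "placeholder-cid-v1",
            "value": {"title": "first track", "artist": "someone"},
        }],
        "cursor": "abc123",
    },
    "abc123": {"records": []},
}
db = sqlite3.connect(":memory:")
print(sync_tracks(db, lambda cursor: pages[cursor]))  # 1
```

keying the upsert on (did, rkey) makes the sync safe to re-run: an edited record overwrites its row and picks up the new CID instead of creating a duplicate.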
why this matters#
the "each user is one database" model is the foundation of atmospheric computing:
- portability: your "personal cloud" is yours. if a host fails, you move your data elsewhere.
- verification: trust is cryptographic. you verify the data signature, not the provider.
- aggregation: applications weave together data from millions of personal clouds into a cohesive "atmosphere."
- interop: apps share schemas, so my music player can read your social graph.