data#

atproto's data model: each user is a signed database.

repos#

a repository is a user's data store. it contains all their records - posts, likes, follows, whatever the applications define.

repos are merkle trees (specifically, merkle search trees). every commit is signed by the user's key and can be verified by anyone. this is what enables authenticated data gossip: you don't need to trust the messenger, because you can verify the signature yourself.
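to make the "any change is detectable" property concrete, here is a minimal sketch of a merkle root over a set of records. it is a plain binary merkle tree, not the merkle search tree that real repos use, and it leaves out signing entirely; `leaf_hash` and `merkle_root` are hypothetical names for illustration.

```python
import hashlib


def leaf_hash(key: str, value: bytes) -> bytes:
    # Hash the record path together with its content.
    return hashlib.sha256(b"leaf:" + key.encode() + b"\x00" + value).digest()


def merkle_root(records: dict[str, bytes]) -> bytes:
    # Sort by key so the same set of records always yields the same root.
    level = [leaf_hash(k, v) for k, v in sorted(records.items())]
    if not level:
        return hashlib.sha256(b"empty").digest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(b"node:" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

a commit only has to sign the root: if any record changes, the root changes, and the old signature no longer matches.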

records#

records are JSON-shaped documents (encoded as DAG-CBOR inside the repo), stored in collections:

at://did:plc:xyz/app.bsky.feed.post/3jui7akfj2k2a
     └── DID ──┘ └── collection ──┘ └─── rkey ──┘
  • DID: whose repo
  • collection: the record type (lexicon NSID)
  • rkey: record key within the collection

record keys are typically TIDs (timestamp-based IDs) for collections where a user has many records (posts, likes). for singletons like profiles, the literal key "self" is used.
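a rough sketch of how a TID can be built and read back, assuming the layout as i understand it from the atproto specs: a 64-bit integer holding 53 bits of microseconds since the unix epoch plus a 10-bit clock id, written as 13 characters of a sortable base32 alphabet. `encode_tid` and `decode_tid` are hypothetical helper names, not a library API.

```python
B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz"


def encode_tid(timestamp_us: int, clock_id: int) -> str:
    if not 0 <= timestamp_us < (1 << 53):
        raise ValueError("timestamp must fit in 53 bits")
    if not 0 <= clock_id < (1 << 10):
        raise ValueError("clock id must fit in 10 bits")
    value = (timestamp_us << 10) | clock_id
    chars = []
    for _ in range(13):
        # Take 5 bits at a time, least significant first.
        chars.append(B32_SORTABLE[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def decode_tid(tid: str) -> tuple[int, int]:
    if len(tid) != 13:
        raise ValueError("a TID is 13 characters long")
    value = 0
    for ch in tid:
        value = (value << 5) | B32_SORTABLE.index(ch)
    # Returns (microseconds since epoch, clock id).
    return value >> 10, value & 0x3FF
```

because the alphabet is in ascending ASCII order and the width is fixed, sorting TIDs as strings sorts them by time.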

AT-URIs#

the at:// URI scheme identifies records:

at://did:plc:xyz/fm.plyr.track/3jui7akfj2k2a
at://zzstoatzz.io/app.bsky.feed.post/3jui7akfj2k2a  # handle also works

these are stable references as long as they use the DID: a handle can change, a DID does not. the URI uniquely identifies a record across the network.
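splitting an at:// URI into its parts is mostly string handling. a minimal sketch, assuming only the simple `at://authority/collection/rkey` shape shown here (no query or fragment handling); `AtUri` and `parse_at_uri` are hypothetical names.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str              # DID or handle
    collection: Optional[str]   # lexicon NSID, if present
    rkey: Optional[str]         # record key, if present


def parse_at_uri(uri: str) -> AtUri:
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("not an at:// URI")
    parts = uri[len(prefix):].split("/")
    if len(parts) > 3 or any(p == "" for p in parts):
        raise ValueError("expected at://authority[/collection[/rkey]]")
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None
    return AtUri(parts[0], collection, rkey)


def uses_did(uri: AtUri) -> bool:
    # DID-based URIs survive handle changes; handle-based ones may not.
    return uri.authority.startswith("did:")
```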

CIDs#

a CID (Content Identifier) is a hash of a specific version of a record:

bafyreig2fjxi3qbp5jvyqx2i4djxfkp...

URIs identify what, CIDs identify which version. when you reference another record and care about the exact content, you include both.
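a sketch of where the `bafyrei...` shape comes from, assuming the record has already been encoded to DAG-CBOR bytes by some library (that step is not shown): a CIDv1 is a version byte, a codec byte, and a multihash of the block, written in lowercase base32 with a `b` prefix. `cid_for_block` is a hypothetical helper name.

```python
import base64
import hashlib

CID_VERSION = 0x01     # CIDv1
DAG_CBOR_CODEC = 0x71  # multicodec code for dag-cbor
SHA2_256 = 0x12        # multihash code for sha2-256
DIGEST_LEN = 0x20      # 32-byte digest


def cid_for_block(block: bytes) -> str:
    # `block` is assumed to be the DAG-CBOR encoding of a record.
    digest = hashlib.sha256(block).digest()
    raw = bytes([CID_VERSION, DAG_CBOR_CODEC, SHA2_256, DIGEST_LEN]) + digest
    b32 = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" marks base32 in multibase
```

since the CID is derived from the bytes, editing a record in any way produces a different CID while its URI stays the same.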

strongRef#

the standard pattern for cross-record references:

{
  "subject": {
    "uri": "at://did:plc:xyz/fm.plyr.track/abc123",
    "cid": "bafyreig..."
  }
}

strongRefs are used in likes and comments (both referencing tracks) and in lists (referencing any record). the CID pins the exact version you referenced: if the record is later edited, its CID changes and the mismatch is detectable.

from plyr.fm lexicons - likes, comments, and lists all use strongRef.
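a small sketch of building a strongRef-style reference and checking whether it still points at the current version of its target. the function names and the CID strings in the usage note are placeholders for illustration; only the `subject` / `uri` / `cid` shape comes from the example above.

```python
def make_reference(uri: str, cid: str) -> dict:
    if not uri.startswith("at://"):
        raise ValueError("subject uri must be an at:// URI")
    if not cid:
        raise ValueError("a strongRef needs a cid")
    return {"subject": {"uri": uri, "cid": cid}}


def points_at_current_version(record: dict, current_cid: str) -> bool:
    # True if the referenced record has not changed since the reference was made.
    return record["subject"]["cid"] == current_cid
```

for example, `points_at_current_version(make_reference("at://did:plc:xyz/fm.plyr.track/abc123", "cid-v1"), "cid-v2")` is false: the track was edited after it was liked.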

collections#

records are grouped into collections by type:

repo/
├── app.bsky.feed.post/
│   ├── 3jui7akfj2k2a
│   └── 3jui8bklg3l3b
├── app.bsky.feed.like/
│   └── ...
└── fm.plyr.track/
    └── ...

each collection corresponds to a lexicon. applications read and write to collections they understand.
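the grouping can be pictured as a dict keyed by collection. a toy sketch, assuming record paths arrive as `collection/rkey` strings; `group_by_collection` and `records_for_app` are hypothetical names, and a real repo stores this in a merkle tree rather than a dict.

```python
from collections import defaultdict


def group_by_collection(paths: list[str]) -> dict[str, list[str]]:
    repo = defaultdict(list)
    for path in paths:
        collection, rkey = path.split("/", 1)
        repo[collection].append(rkey)
    return dict(repo)


def records_for_app(repo: dict[str, list[str]], understood: set[str]) -> dict[str, list[str]]:
    # An app only looks at the collections whose lexicons it understands.
    return {c: rkeys for c, rkeys in repo.items() if c in understood}
```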

local indexing#

querying many PDSes at request time is slow, so applications maintain local indexes:

-- plyr.fm indexes fm.plyr.track records
CREATE TABLE tracks (
    id SERIAL PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    -- ... application-specific fields
    UNIQUE(did, rkey)
);

when a user logs in, the app syncs their records from their PDS into the local database. background jobs keep the indexes fresh.

from plyr.fm - indexes tracks, likes, comments, playlists locally.
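a minimal sketch of the sync step as an upsert keyed on (did, rkey), so re-syncing the same record updates the row instead of duplicating it. this is an illustration, not plyr.fm's actual code: it uses sqlite in place of postgres, and the shape of the incoming `records` dicts (`rkey`, `cid`, `title`, `artist`) is an assumption. fetching the records from the PDS is not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id INTEGER PRIMARY KEY,
    did TEXT NOT NULL,
    rkey TEXT NOT NULL,
    uri TEXT NOT NULL,
    cid TEXT,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    UNIQUE(did, rkey)
);
"""

UPSERT = """
INSERT INTO tracks (did, rkey, uri, cid, title, artist)
VALUES (:did, :rkey, :uri, :cid, :title, :artist)
ON CONFLICT(did, rkey) DO UPDATE SET
    uri = excluded.uri,
    cid = excluded.cid,
    title = excluded.title,
    artist = excluded.artist;
"""


def sync_tracks(conn: sqlite3.Connection, did: str, records: list[dict]) -> None:
    # One transaction per sync: either every record lands or none do.
    with conn:
        for rec in records:
            conn.execute(UPSERT, {
                "did": did,
                "rkey": rec["rkey"],
                "uri": f"at://{did}/fm.plyr.track/{rec['rkey']}",
                "cid": rec["cid"],
                "title": rec["title"],
                "artist": rec["artist"],
            })
```

storing the cid alongside each row lets a background job skip records whose cid has not changed since the last sync.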

why this matters#

the "each user is one database" model is the foundation of atmospheric computing:

  • portability: your "personal cloud" is yours. if a host fails, you move your data elsewhere.
  • verification: trust is cryptographic. you verify the data signature, not the provider.
  • aggregation: applications weave together data from millions of personal clouds into a cohesive "atmosphere."
  • interop: apps share schemas, so my music player can read your social graph.