···110110111111- `record`: The atproto record. Its CID can be computed over the bytes of its `block` (see below).
112112113113-### node / base
114113115115-```
116116-|----- node -----|
117117-[ len | mst node ]
118118-```
119119-120120-- `len` (varint): the length of the following CBOR block, in bytes.
121121-122122-- `mst node` (DAG-CBOR): object with the following schema
123123- - `l` (hash link, nullable)
124124-125125-note1: it's a bit tempting to redesign the MST nodes, because the _reason_ (and lack of special-ness) for `l` being separate from the entries in `e` took a long time for me to understand. but the existing format definitely works so maybe sticking close to it is the move?
126126-127127-note2: a magic special zero hash-link is a pretty gross way to shoehorn in a sentinel! null was already taken because subtrees always are optional
128128-129129-(this section is very much in flux)
130130-131131-was thinking of making base (depth=0) nodes special (implicit cid) and then further simplifying to a simple array of entries since they can't have subtrees (`l` or `t`s).
132132-133133-buuuutttt it's probably simpler just to give the node a nullable `cid` property that's required when depth=0.
134134-135135-on the other track, i was thinking nodes could be rewritten as a pair of arrays
136136-137137-```
138138-index: [ 0 , 1 , 2 , 3 ]
139139-140140-new
141141-entries: [ (keyA, linkA) , (keyB, linkB) , (keyC, linkC) ] xxxxxxxxxxxxxxx
142142-trees: [ * tree before A , * tree before B , <null> , *tree after C ]
143143-144144-vs old repo spec
145145-mst node:[ tree in `l` , keyA's `t` , keyB's null `t`, keyC's `t` ]
146146-```
147147-148148-i think most languages can handle a pair of arrays ok with zip? but the equal-or-one-shorter length of `entries` compared to `trees` seems like asking for bugs.
149149-150150-so let's keep it simple (similar to the repo spec), trying again:
151151-114114+### node
152115153116```
154117|----- node -----|
···158121- `len` (varint): the length of the following CBOR block, in bytes.
159122160123- `mst node` (DAG-CBOR): object with the following schema
161161- - `cid` (hash link, nullable): the CID of this MST node. must be `null` for nodes at `depth=0`; required to be non-null for nodes at any higher `depth`.
162162- - `l` (hash link, nullable): reference to a subtree at a lower depth containing only keys to the left of this node. when the referenced node is included in the archive, it must be given a special zeroed-out link reference (all zero bytes (deal with hash link prefixes or whatever... probably can assume sha256 but careful for lossless reversibility back to CAR))
124124+ - `l` (hash link, **optional and nullable**): reference to a subtree at a lower depth containing only keys to the left of this node.
125125+ - when **absent**: there is no left subtree
126126+ - when **null**: the left subtree is present and will follow in the archive (implicit CID)
127127+ - when **non-null**: the left subtree exists but is absent from the archive
163128 - `e` (array, required): ordered array of entry objects, each containing:
164129 - `p` (integer, required): number of bytes shared with the previous entry (TODO key compression actually)
165130 - `k` (byte string, required): key suffix remaining
166166- - `v` (hash link, **nullable**): reference to the record data for this key. must be null if the STAR includes the record; must _not_ be null if the record is not included in the STAR
167167- - `t` (hash link, nullable): link to a subtree that sorts to the right of this entry's key and to the left of the next entry's key. see `l` above.
168168-169169-NOTE: the option to not include `v` (and requiring its hash link to be present in that case) keeps the option open for `key->CID`-only archives, which can be nice for things like diffing a repo to handle a firehose `#sync` event, or perhaps to exclude large records specifically from the archive. (make this cohesive with optional vs null handling if using that)
170170-171171-TODO: nullable vs optional? (in general??)
172172-173173-tempting to do something like:
174174-175175-- omitted means there is no subtree
176176-- null means there is a subtree and it's included (CID to-calculate)
177177-- non-null means there is a subtree and it's *not* included (MST slice or sparse tree)
178178-179179-hmmm: having separate optional and null cases might make deserializing into some languages tricky. i'm not sure if serde can handle that well? omitempty + nullable => `Option<Option<T>>`? should probably check other languages.
131131+ - `v` (hash link, **nullable**): reference to the record data for this key.
132132+ - when **null**: the record is included in the archive and will follow (implicit CID)
133133+ - when **non-null**: the record exists but is not included in the archive
134134+ - `t` (hash link, **optional and nullable**): link to a subtree that sorts to the right of this entry's key and to the left of the next entry's key. same rules as `l`:
135135+ - when **absent**: there is no right subtree
136136+ - when **null**: the right subtree is present and will follow in the archive (implicit CID)
137137+ - when **non-null**: the right subtree exists but is absent from the archive
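
a rough sketch of how a reader might interpret a decoded node under the rules above. it assumes the DAG-CBOR object has already been decoded into a plain Python `dict` (absent key = field omitted, `None` = null). the names `LinkState`, `link_state`, and `expand_keys` are made up for illustration and are not part of any spec or library. `expand_keys` takes the `p`/`k` description literally (shared-prefix length plus suffix), which may change given the key compression TODO.

```python
from enum import Enum


class LinkState(Enum):
    ABSENT = "absent"      # field omitted: there is no subtree
    INLINE = "inline"      # field is null: target follows in the archive (implicit CID)
    EXTERNAL = "external"  # field is a link: target exists but is not in the archive


def link_state(obj: dict, field: str) -> LinkState:
    """Classify an optional-and-nullable link field such as `l` or `t`."""
    if field not in obj:
        return LinkState.ABSENT
    if obj[field] is None:
        return LinkState.INLINE
    return LinkState.EXTERNAL


def expand_keys(entries: list) -> list:
    """Rebuild full keys from (`p`, `k`) pairs, assuming `p` counts the
    bytes shared with the previous entry's full key."""
    keys = []
    prev = b""
    for entry in entries:
        key = prev[: entry["p"]] + entry["k"]
        keys.append(key)
        prev = key
    return keys


# hypothetical decoded node: no left subtree, two entries
node = {
    "e": [
        {"p": 0, "k": b"app.bsky.feed.post/3abc", "v": None},
        {"p": 19, "k": b"3abd", "v": b"\x01", "t": None},
    ]
}
```

here `link_state(node, "l")` reports no left subtree, and `link_state(node["e"][1], "t")` reports a right subtree that follows in the archive. the same check works for `v`, except that `v` is only nullable, so an absent `v` would presumably be treated as an error by a real decoder.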
180138181139182140### record