Fast and robust atproto CAR file processing in rust

update where it's at

+27 -2
readme.md
 # repo-stream

-a futures atproto record stream from CAR file
+Fast and (aspirationally) robust atproto CAR file processing in rust
+
+
+todo
+
+- [ ] car file test fixtures & validation tests
+- [ ] make sure we can get the did and signature out for verification
+- [ ] spec compliance todos
+  - [ ] assert that keys are ordered and fail if not
+  - [ ] verify node mst depth from key (possibly pending [interop test fixes](https://github.com/bluesky-social/atproto-interop-tests/issues/5))
+- [ ] performance todos
+  - [ ] consume the serialized nodes into a mutable efficient format
+    - [ ] maybe customize the deserialize impl to do that directly?
+  - [ ] benchmark and profile
+- [ ] robustness todos
+  - [ ] swap the blocks hashmap for a BlockStore trait that can be dumped to redb
+    - [ ] maybe keep the redb function behind a feature flag?
+  - [ ] can we assert a max size for node blocks?
+  - [ ] figure out why asserting the upper nibble of the fourth byte of a node fails fingerprinting
+  - [ ] max mst depth (there is actually a hard limit but a malicious repo could do anything)
+  - [ ] i don't think we need a max recursion depth for processing cbor contents since we leave records to the user to decode

-current notes
+newer ideas
+
+- fixing the interleaved mst walk/ block load actually does perform ok: just need the walker to tell the block loader which block we actually need next, so that the block loader can go ahead and load all blocks until that one without checking back with the walker. so i think we're streaming-order ready!
+
+
+later ideas

 - just buffering all the blocks is 2.5x faster than interleaving optimistic walking
   - at least, this is true on huge CARs with the current (stream-unfriendly) pds export behaviour
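
The "newer ideas" note describes a hand-off where the walker names the block it needs next and the loader reads ahead until it finds that block. A minimal sketch of that idea follows. It is not the repo-stream implementation: `Cid` (a plain `u32` here), `BlockLoader`, and `load_until` are hypothetical names, and the block stream is modelled as an ordinary in-memory iterator rather than an async CAR reader.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a CID; a real crate would use a proper CID type.
type Cid = u32;

/// Hypothetical loader that pulls (cid, bytes) pairs off a CAR-like block stream.
struct BlockLoader<I: Iterator<Item = (Cid, Vec<u8>)>> {
    stream: I,
    /// Blocks that arrived before the walker asked for them.
    buffered: HashMap<Cid, Vec<u8>>,
}

impl<I: Iterator<Item = (Cid, Vec<u8>)>> BlockLoader<I> {
    fn new(stream: I) -> Self {
        Self { stream, buffered: HashMap::new() }
    }

    /// Return the block the walker needs next, or None if the stream ends first.
    /// Every other block read on the way is buffered, so the loader never has
    /// to check back with the walker in the middle of a read.
    fn load_until(&mut self, wanted: Cid) -> Option<Vec<u8>> {
        // The block may already have been read while looking for an earlier one.
        if let Some(bytes) = self.buffered.remove(&wanted) {
            return Some(bytes);
        }
        for (cid, bytes) in self.stream.by_ref() {
            if cid == wanted {
                return Some(bytes);
            }
            self.buffered.insert(cid, bytes);
        }
        None
    }
}

fn main() {
    // Blocks arrive in an order that does not match the walk order.
    let stream = vec![(2, b"b".to_vec()), (1, b"a".to_vec()), (3, b"c".to_vec())].into_iter();
    let mut loader = BlockLoader::new(stream);

    // The walker wants block 1 first; block 2 is buffered along the way.
    assert_eq!(loader.load_until(1), Some(b"a".to_vec()));
    // Block 2 is now served from the buffer without touching the stream.
    assert_eq!(loader.load_until(2), Some(b"b".to_vec()));
    assert_eq!(loader.load_until(3), Some(b"c".to_vec()));
    // A block that never appears drains the stream and yields None.
    assert_eq!(loader.load_until(4), None);
}
```

If the CAR is already in walk order, the buffer in this sketch stays empty and each call reads exactly one block. With out-of-order exports the buffer grows, which is where a size limit or a disk-backed store would come in.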