A simple tool for incremental sync of atproto repo CAR files

carsync

A Python script to efficiently† refresh an outdated copy of an atproto repo CAR file using com.atproto.sync.getBlocks.

$ carsync
Usage: carsync <src_car> <dst_car> <pds_url>
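For context, com.atproto.sync.getBlocks is an XRPC query that takes a repo DID and a list of block CIDs and returns the requested blocks as a CAR file. The sketch below only shows how such a request URL could be built; the helper name `get_blocks_url` and all values are placeholders, and this is not necessarily how carsync builds its requests.

```python
from urllib.parse import urlencode


def get_blocks_url(pds_url: str, did: str, cids: list[str]) -> str:
    """Build the URL for a com.atproto.sync.getBlocks XRPC query (hypothetical helper)."""
    # `cids` is an array parameter, so the key is repeated once per CID.
    query = urlencode([("did", did)] + [("cids", cid) for cid in cids])
    return f"{pds_url.rstrip('/')}/xrpc/com.atproto.sync.getBlocks?{query}"


# Placeholder values, for illustration only.
print(get_blocks_url("https://pds.example.com", "did:plc:example", ["bafyexamplecid"]))
```

A GET request to the resulting URL should return a CAR file containing the requested blocks.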

†Caveats:

  • Every missing block is fetched sequentially via getBlocks; there is no batching or concurrency.
  • The whole CAR file is read and re-written.

The latter can be solved by storing the repo in SQLite (or maybe RocksDB) instead of a CAR file, and by doing an MST diff rather than the full MST traversal it currently does. Solving the former would probably require some galaxy-brain concurrent MST diff impl.
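As a rough illustration of the SQLite idea, here is a minimal sketch of a block store keyed by CID, where a sync only inserts the blocks that are missing instead of rewriting everything. The table layout and the function names (`open_block_store`, `has_block`, `put_block`) are assumptions made up for this example, the byte strings stand in for real CIDs and block data, and the MST diff itself is not covered.

```python
import sqlite3


def open_block_store(path: str) -> sqlite3.Connection:
    """Open (or create) a block store: one row per block, keyed by CID bytes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks (cid BLOB PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def has_block(conn: sqlite3.Connection, cid: bytes) -> bool:
    row = conn.execute("SELECT 1 FROM blocks WHERE cid = ?", (cid,)).fetchone()
    return row is not None


def put_block(conn: sqlite3.Connection, cid: bytes, data: bytes) -> None:
    # Blocks are content-addressed, so an existing row never needs to change.
    conn.execute("INSERT OR IGNORE INTO blocks (cid, data) VALUES (?, ?)", (cid, data))


store = open_block_store(":memory:")
put_block(store, b"cid-1", b"old block")

# A later sync only writes the block that was missing.
for cid, data in [(b"cid-1", b"old block"), (b"cid-2", b"new block")]:
    if not has_block(store, cid):
        put_block(store, cid, data)
store.commit()
```

With a layout like this, refreshing a repo touches only the new rows, whereas a CAR file has to be read and written in full.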

Despite these limitations, it's still practical and fast even for large-ish repos.

P.S. In theory it could resolve the PDS URL automatically, but I didn't implement that, so you have to pass it manually.
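For reference, automatic resolution could look roughly like the sketch below: fetch the DID document for the repo's DID and read the endpoint of its `#atproto_pds` service entry. This is not part of carsync; the function names are hypothetical, and it assumes a did:plc DID resolved through plc.directory or a plain hostname did:web DID.

```python
import json
from urllib.request import urlopen


def did_doc_url(did: str) -> str:
    """Return the URL where the DID document is assumed to be served."""
    if did.startswith("did:plc:"):
        return f"https://plc.directory/{did}"
    if did.startswith("did:web:"):
        return f"https://{did[len('did:web:'):]}/.well-known/did.json"
    raise ValueError(f"unsupported DID method: {did}")


def pds_endpoint(did_doc: dict) -> str:
    """Pick the PDS URL out of an already-fetched DID document."""
    for service in did_doc.get("service", []):
        if (
            service.get("id", "").endswith("#atproto_pds")
            and service.get("type") == "AtprotoPersonalDataServer"
        ):
            return service["serviceEndpoint"]
    raise ValueError("no #atproto_pds service in DID document")


def resolve_pds(did: str) -> str:
    """Fetch the DID document over the network and return the PDS URL."""
    with urlopen(did_doc_url(did)) as resp:
        return pds_endpoint(json.load(resp))


# Offline example with a made-up DID document.
sample_doc = {
    "id": "did:plc:example",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://pds.example.com",
        }
    ],
}
print(pds_endpoint(sample_doc))
```

The value returned by `resolve_pds` is what would be passed as `<pds_url>` today.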