cozy-setup (move to another repo).md
legacy/cozy-setup (move to another repo).md
+35
legacy/old-readme-details.md
···11+[Constellation](./constellation/)
22+--------------------------------------------
33+44+A global atproto backlink index ✨
55+66+- Self hostable: handles the full write throughput of the global atproto firehose on a raspberry pi 4b + single SSD
77+- Storage efficient: less than 2GB/day disk consumption indexing all references in all lexicons and all non-atproto URLs
88+- Handles record deletion, account de/re-activation, and account deletion, ensuring accurate link counts and respecting users' data choices
99+- Simple JSON API
1010+1111+All social interactions in atproto tend to be represented by links (or references) between PDS records. This index can answer questions like "how many likes does a bsky post have", "who follows an account", "what are all the comments on a [frontpage](https://frontpage.fyi/) post", and more.
1212+1313+- **status**: works! api is unstable and likely to change, and no known instances have a full network backfill yet.
1414+- source: [./constellation/](./constellation/)
1515+- public instance: [constellation.microcosm.blue](https://constellation.microcosm.blue/)
1616+1717+_note: the public instance currently runs on a little raspberry pi in my house, feel free to use it! it comes with only best-effort uptime, no commitment to not breaking the api for now, and possible rate-limiting. if you want to be nice you can put your project name and bsky username (or email) in your user-agent header for api requests._
1818+1919+2020+App: Spacedust
2121+--------------
2222+2323+A notification subscription service 💫
2424+2525+using the same "link source" concept as [constellation](./constellation/), offer webhook notifications for new references created to records
2626+2727+- **status**: in design
2828+2929+3030+Library: [links](./links/)
3131+------------------------------------
3232+3333+A rust crate (not published on crates.io yet) for optimistically parsing links out of arbitrary atproto PDS records, and potentially canonicalizing them
3434+3535+- **status**: unstable, might remain an internal lib for constellation (and spacedust, soon)
+123
legacy/original-notes.md
···11+---
22+33+44+old notes follow, ignore
55+------------------------
66+77+88+as far as i can tell, atproto lexicons today don't follow much of a convention for referencing across documents: sometimes it's a StrongRef, sometimes it's a DID, sometimes it's a bare at-uri. lexicon authors choose any old link-sounding key name for the key in their document.
99+1010+it's pretty messy so embrace the mess: atproto wants to be part of the web, so this library will also extract URLs and other URIs if you want it to. all the links.
1111+1212+1313+why
1414+---
1515+1616+the atproto firehose that bluesky sprays at you will contain raw _contents_ from people's pdses. these are isolated, decontextualized updates. it's very easy to build some kinds of interesting downstream apps off of this feed.
1717+1818+- bluesky posts (firesky, deletions, )
1919+- bluesky post stats (emojis, )
2020+- trending keywords ()
2121+2222+but bringing almost any kind of _context_ into your project requires a big step up in complexity and potentially cost: you're entering "appview" territory. _how many likes does a post have? who follows this account?_
2323+2424+you own your atproto data: it's kept in your personal data repository (PDS) and no one else can write to it. when someone likes your post, they create a "like" record in their _own_ pds, and that like belongs to _them_, not to you/your post.
2525+2626+in the firehose you'll see an `app.bsky.feed.post` record created, with no details about who has liked it. then you'll see separate `app.bsky.feed.like` records show up for each like that comes in on that post, with no context about the post except a random-looking reference to it. storing these and connecting them back to the post is up to you!
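
a minimal sketch of what "storing these" could look like, assuming an in-memory map and nothing about how constellation actually does it. `BacklinkIndex` and its methods are hypothetical names for illustration; the source at-uris in the usage note are made up too.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory backlink index (illustration only).
/// Maps a target at-uri to the set of source records that link to it.
#[derive(Default)]
struct BacklinkIndex {
    by_target: HashMap<String, HashSet<String>>,
    // source -> target, so a delete event (which carries no record body)
    // can still find the link it needs to remove.
    by_source: HashMap<String, String>,
}

impl BacklinkIndex {
    /// Call when a linking record (like, follow, ...) is created or updated.
    fn record_created(&mut self, source: &str, target: &str) {
        // Treat an update as delete + create so stale links don't linger.
        self.record_deleted(source);
        self.by_target
            .entry(target.to_string())
            .or_default()
            .insert(source.to_string());
        self.by_source.insert(source.to_string(), target.to_string());
    }

    /// Call when a linking record is deleted.
    fn record_deleted(&mut self, source: &str) {
        if let Some(target) = self.by_source.remove(source) {
            let now_empty = match self.by_target.get_mut(&target) {
                Some(sources) => {
                    sources.remove(source);
                    sources.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.by_target.remove(&target);
            }
        }
    }

    /// "how many likes does a post have?"
    fn count(&self, target: &str) -> usize {
        self.by_target.get(target).map_or(0, |s| s.len())
    }
}
```

feeding it two like records for the same post and then deleting one leaves `count` at 1. a real index would also need to key by collection and json path, persist to disk, and handle account-level events.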
2727+2828+**so, why**
2929+3030+everything is links, and they're a mess, but they all kinda work the same, so maybe some tooling can bring down that big step in complexity from firehose raw-content apps -> apps requiring any social context.
3131+3232+everything is links:
3333+3434+- likes
3535+- follows
3636+- blocks
3737+- reposts
3838+- quotes
3939+4040+some low-level things you could make from links:
4141+4242+- notification streams (part of ucosm)
4343+- a global reverse index (part of ucosm)
4444+4545+i think that making these low-level services as easy to use as jetstream could open up pathways for building more atproto apps that operate at full scale with interesting features for reasonable effort at low cost to operate.
4646+4747+4848+extracting links
4949+---------------
5050+5151+5252+- low-level: pass a &str of a field value and get a parsed link back
5353+5454+- med-level: pass a &str of record in json form and get a list of parsed links + json paths back. (todo: should also handle dag-cbor prob?)
5555+5656+- high-ish level: pass the json record and maybe apply some pre-loaded rules based on known lexicons to get the best result.
5757+5858+for now, a link is only considered if it matches for the entire value of the record's field -- links embedded in text content are not included. note that urls in bluesky posts _will_ still be extracted, since they are broken out into facets.
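
a rough sketch of the low-level case, assuming only the standard library. `Link` and `parse_link` are hypothetical names, not the crate's actual API, and the validation is much looser than the real specs. it follows the whole-value rule above: anything with whitespace in it is treated as text, not a link.

```rust
/// Hypothetical link kinds; the real `links` crate may model these differently.
#[derive(Debug, PartialEq)]
enum Link<'a> {
    AtUri(&'a str),
    Did(&'a str),
    Uri(&'a str),
}

/// Optimistically classify a whole field value as a link.
/// Returns None when the value does not look like a link at all.
fn parse_link(value: &str) -> Option<Link<'_>> {
    // The entire value must be the link: embedded links in text are skipped.
    if value.is_empty() || value.chars().any(char::is_whitespace) {
        return None;
    }
    if let Some(rest) = value.strip_prefix("at://") {
        return (!rest.is_empty()).then_some(Link::AtUri(value));
    }
    if let Some(rest) = value.strip_prefix("did:") {
        // did:<method>:<identifier>, accepting unknown methods on purpose.
        let (method, id) = rest.split_once(':')?;
        let method_ok = !method.is_empty()
            && method
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit());
        return (method_ok && !id.is_empty()).then_some(Link::Did(value));
    }
    // Anything else with a plausible URI scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    let (scheme, rest) = value.split_once(':')?;
    let mut chars = scheme.chars();
    let first_ok = chars.next().is_some_and(|c| c.is_ascii_alphabetic());
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    (first_ok && rest_ok && !rest.is_empty()).then_some(Link::Uri(value))
}
```

the med-level function would walk a json record and call something like this on every string value, remembering the json path it came from.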
5959+6060+6161+resolving / canonicalizing links
6262+--------------------------------
6363+6464+6565+### at-uris
6666+6767+every at-uri has at least two equivalent forms, one with a `DID`, and one with an account handle. the at-uri spec [illustrates this by example](https://atproto.com/specs/at-uri-scheme):
6868+6969+- `at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jwdwj2ctlk26`
7070+- `at://bnewbold.bsky.team/app.bsky.feed.post/3jwdwj2ctlk26`
7171+7272+some applications, like a reverse link index, may wish to canonicalize at-uris to a single form. the `DID`-form is stable as an account changes its handle and probably the right choice to canonicalize to, but maybe some apps would actually prefer to canonicalize to handles?
7373+7474+hopefully atrium will make it easy to resolve at-uris.
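
a sketch of canonicalizing to the `DID` form, assuming the handle has already been resolved somehow. the `handles` map stands in for real handle resolution (which needs network access and is not shown), and `canonicalize_at_uri` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Rewrite an at-uri so its authority is a DID.
/// `handles` maps lowercase handle -> DID and is a stand-in for real resolution.
/// Returns None if the input is not an at-uri or the handle is unknown.
fn canonicalize_at_uri(uri: &str, handles: &HashMap<String, String>) -> Option<String> {
    let rest = uri.strip_prefix("at://")?;
    // Split "authority/collection/rkey" into authority and the remaining path.
    let (authority, path) = match rest.split_once('/') {
        Some((a, p)) => (a, Some(p)),
        None => (rest, None),
    };
    if authority.is_empty() {
        return None;
    }
    let did: &str = if authority.starts_with("did:") {
        // Already in DID form: keep as-is.
        authority
    } else {
        // Handles are domain names, so compare case-insensitively.
        let key = authority.to_ascii_lowercase();
        handles.get(key.as_str())?.as_str()
    };
    Some(match path {
        Some(p) => format!("at://{did}/{p}"),
        None => format!("at://{did}"),
    })
}
```

with a map containing `bnewbold.bsky.team`, the handle-form example above comes out as the `DID`-form one. what to do when resolution fails (drop the link, keep the handle form, retry later) is still an open question.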
7575+7676+7777+### urls
7878+7979+canonicalizing URLs is more annoying but also a bit more established. lots of details.
8080+8181+- do we have to deal with punycode?
8282+- follow redirects (todo: only permanent ones, or all?)
8383+- check for rel=canonical http header and possibly follow it
8484+- check link rel=canonical meta tag and possibly follow it
8585+- do we need to check site maps??
8686+- do we have to care at all about AMP?
8787+- do we want anything to do with url shorteners??
8888+- how do multilingual sites affect this?
8989+- do we have to care about `script type="application/ld+json"` ???
9090+9191+ugh. is there a crate for this.
9292+9393+9494+### relative uris?
9595+9696+links might be relative, in which case they might need to be made absolute before being useful. is that a concern for this library, or up to the user? (seems like we might not have context here to determine its absolute form)
9797+9898+9999+### canonicalizing
100100+101101+there should be a few async functions available to canonicalize already-parsed links.
102102+103103+- what happens if a link can't be resolved?
104104+105105+106106+---
107107+108108+- using `tinyjson` because it's nice -- maybe should switch to serde_json to share deps with atrium?
109109+110110+- would use atrium for parsing at-uris, but it's not in there. there's a did-only version in the non-lib commands.rs. its identifier parser is strict to did + handle, which makes sense, but for our purposes we might want to allow unknown methods too?
111111+112112+ - rsky-syntax has an aturi
113113+ - adenosine also
114114+ - might come back to these
115115+116116+117117+-------
118118+119119+rocks
120120+121121+```bash
122122+ROCKSDB_LIB_DIR=/nix/store/z2chn0hsik0clridr8mlprx1cngh1g3c-rocksdb-9.7.3/lib/ cargo build
123123+```
+41-137
readme.md
···11-microcosm: links
22-================
33-44-this repo contains libraries and apps for working with cross-record references in at-protocol.
55-11+microcosm
22+=========
6377-App: [Constellation](./constellation/)
88---------------------------------------------
44+This repo contains APIs and libraries for [atproto](https://atproto.com/) services from [microcosm](https://microcosm.blue):
951010-A global atproto backlink index ✨
1161212-- Self hostable: handles the full write throughput of the global atproto firehose on a raspberry pi 4b + single SSD
1313-- Storage efficient: less than 2GB/day disk consumption indexing all references in all lexicons and all non-atproto URLs
1414-- Handles record deletion, account de/re-activation, and account deletion, ensuring accurate link counts and respecting users data choices
1515-- Simple JSON API
1616-1717-All social interactions in atproto tend to be represented by links (or references) between PDS records. This index can answer questions like "how many likes does a bsky post have", "who follows an account", "what are all the comments on a [frontpage](https://frontpage.fyi/) post", and more.
1818-1919-- **status**: works! api is unstable and likely to change, and no known instances have a full network backfill yet.
2020-- source: [./constellation/](./constellation/)
2121-- public instance: [constellation.microcosm.blue](https://constellation.microcosm.blue/)
2222-2323-_note: the public instance currently runs on a little raspberry pi in my house, feel free to use it! it comes with only with best-effort uptime, no commitment to not breaking the api for now, and possible rate-limiting. if you want to be nice you can put your project name and bsky username (or email) in your user-agent header for api requests._
2424-2525-2626-App: Spacedust
2727---------------
2828-2929-A notification subscription service 💫
3030-3131-using the same "link source" concept as [constellation](./constellation/), offer webhook notifications for new references created to records
3232-3333-- **status**: in design
3434-3535-3636-Library: [links](./links/)
77+🌌 [Constellation](./constellation/)
378------------------------------------
3893939-A rust crate (not published on crates.io yet) for optimistically parsing links out of arbitrary atproto PDS records, and potentially canonicalizing them
4040-4141-- **status**: unstable, might remain an internal lib for constellation (and spacedust, soon)
4242-4343-4444-4545----
4646-4747-4848-old notes follow, ignore
4949-------------------------
5050-5151-5252-as far as i can tell, atproto lexicons today don't follow much of a convention for referencing across documents: sometimes it's a StrongRef, sometimes it's a DID, sometimes it's a bare at-uri. lexicon authors choose any old link-sounding key name for the key in their document.
5353-5454-it's pretty messy so embrace the mess: atproto wants to be part of the web, so this library will also extract URLs and other URIs if you want it to. all the links.
5555-5656-5757-why
5858----
5959-6060-the atproto firehose that bluesky sprays at you will contain raw _contents_ from peoples' pdses. these are isolated, decontextualized updates. it's very easy to build some kinds of interesting downstream apps off of this feed.
6161-6262-- bluesky posts (firesky, deletions, )
6363-- blueksy post stats (emojis, )
6464-- trending keywords ()
6565-6666-but bringing almost kind of _context_ into your project requires a big step up in complexity and potentially cost: you're entering "appview" territory. _how many likes does a post have? who follows this account?_
6767-6868-you own your atproto data: it's kept in your personal data repository (PDS) and noone else can write to it. when someone likes your post, they create a "like" record in their _own_ pds, and that like belongs to _them_, not to you/your post.
6969-7070-in the firehose you'll see a `app.bsky.feed.post` record created, with no details about who has liked it. then you'll see separate `app.bsky.feed.like` records show up for each like that comes in on that post, with no context about the post except a random-looking reference to it. storing these in order to do so is up to you!
7171-7272-**so, why**
7373-7474-everything is links, and they're a mess, but they all kinda work the same, so maybe some tooling can bring down that big step in complexity from firehose raw-content apps -> apps requiring any social context.
7575-7676-everything is links:
7777-7878-- likes
7979-- follows
8080-- blocks
8181-- reposts
8282-- quotes
8383-8484-some low-level things you could make from links:
8585-8686-- notification streams (part of ucosm)
8787-- a global reverse index (part of ucosm)
8888-8989-i think that making these low-level services as easy to use as jetstream could open up pathways for building more atproto apps that operate at full scale with interesting features for reasonable effort at low cost to operate.
9090-1010+A global atproto interactions backlink index as a simple JSON API. Works with every lexicon, runs on a raspberry pi, consumes less than 2GiB of disk per day. Handles record deletion, account de/re-activation, and account deletion, ensuring accurate link counts while respecting users' data choices.
91119292-extracting links
9393----------------
1212+- source: [./constellation/](./constellation/)
1313+- [public instance + API docs](https://constellation.microcosm.blue/)
1414+- status: used in production. APIs will change but backwards compatibility will be maintained as long as needed.
941595169696-- low-level: pass a &str of a field value and get a parsed link back
1717+🎇 [Spacedust](./spacedust/)
1818+----------------------------
97199898-- med-level: pass a &str of record in json form and get a list of parsed links + json paths back. (todo: should also handle dag-cbor prob?)
2020+A global atproto interactions firehose. Extracts all at-uris, DIDs, and URLs from every lexicon in the firehose, and exposes them over a websocket modelled after [jetstream](https://github.com/bluesky-social/jetstream).
9921100100-- high-ish level: pass the json record and maybe apply some pre-loaded rules based on known lexicons to get the best result.
2222+- source: [./spacedust/](./spacedust/)
2323+- [public instance + API docs](https://spacedust.microcosm.blue/)
2424+- status: v0: the basics work and the APIs are in place! missing cursor replay, forward link storage, and delete event link hydration.
10125102102-for now, a link is only considered if it matches for the entire value of the record's field -- links embedded in text content are not included. note that urls in bluesky posts _will_ still be extracted, since they are broken out into facets.
2626+Demos:
103272828+- [Spacedust notifications](https://notifications.microcosm.blue/): web push notifications for _every_ atproto app
2929+- [Zero-Bluesky real-time interaction-updating post embed](https://bsky.bad-example.com/zero-bluesky-realtime-embed/)
10430105105-resolving / canonicalizing links
106106---------------------------------
107313232+🛰️ [Slingshot](./slingshot)
3333+---------------------------
10834109109-### at-uris
3535+A fast, eager, production-grade edge cache for atproto records and identities. Pre-caches all records from the firehose and maintains a longer-term cache of requested records on disk.
11036111111-every at-uri has at least two equivalent forms, one with a `DID`, and one with an account handle. the at-uri spec [illustrates this by example](https://atproto.com/specs/at-uri-scheme):
3737+- source: [./slingshot/](./slingshot/)
3838+- [public instance + API docs](https://slingshot.microcosm.blue/)
3939+- status: v0: most XRPC APIs are working. cache storage is being reworked.
11240113113-- `at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jwdwj2ctlk26`
114114-- `at://bnewbold.bsky.team/app.bsky.feed.post/3jwdwj2ctlk26`
11541116116-some applications, like a reverse link index, may wish to canonicalize at-uris to a single form. the `DID`-form is stable as an account changes its handle and probably the right choice to canonicalize to, but maybe some apps would actually perfer to canonicalise to handles?
4242+🛸 [UFOs API](./ufos)
4343+---------------------
11744118118-hopefully atrium will make it easy to resolve at-uris.
4545+Timeseries stats and sample records for every [collection](https://atproto.com/guides/glossary#collection) ever seen in the atproto firehose. Unique users are counted in hyperloglog sketches enabling arbitrary cardinality aggregation across time buckets and/or NSIDs.
119464747+- source: [./ufos/](./ufos/)
4848+- [public instance + API docs](https://ufos-api.microcosm.blue/)
4949+- status: Used in production. It has APIs and they work! Needs improvement on indexing: more indexes, plus some more APIs for the data it exposes.
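
Why sketches make the aggregation "arbitrary": two hyperloglog sketches merge by taking the register-wise maximum, and the result is identical to a sketch built from the union of both inputs. The block below is a toy illustration of that property using only the standard library; `Hll`, the register count, and the hash choice are assumptions for the example, not how UFOs implements it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const P: u32 = 10; // 2^10 = 1024 registers (arbitrary choice for the example)
const M: usize = 1 << P;

/// Toy HyperLogLog sketch (illustration only).
#[derive(Clone, Debug, PartialEq)]
struct Hll {
    registers: Vec<u8>,
}

impl Hll {
    fn new() -> Self {
        Hll { registers: vec![0; M] }
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let x = hasher.finish();
        let idx = (x >> (64 - P)) as usize; // top P bits choose the register
        let rest = x << P; // remaining bits, left-aligned
        // Rank = position of the first 1 bit, capped at the number of remaining bits + 1.
        let rank = (rest.leading_zeros() + 1).min(64 - P + 1) as u8;
        self.registers[idx] = self.registers[idx].max(rank);
    }

    /// Union of two sketches: register-wise max. Order does not matter.
    fn merge(&mut self, other: &Hll) {
        for (a, b) in self.registers.iter_mut().zip(&other.registers) {
            *a = (*a).max(*b);
        }
    }

    /// Standard raw estimate with the small-range (linear counting) correction.
    fn estimate(&self) -> f64 {
        let m = M as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Keeping one sketch per (collection, time bucket) and merging on read gives a unique-user estimate for any combination of buckets without double-counting users who appear in more than one.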
12050121121-### urls
5151+See also: [UFOs atproto explorer](https://ufos.microcosm.blue/) built on UFOs API. ([source](https://github.com/at-microcosm/spacedust-utils))
12252123123-canonicalizing URLs is more annoying but also a bit more established. lots of details.
12453125125-- do we have to deal with punycode?
126126-- follow redirects (todo: only permanent ones, or all?)
127127-- check for rel=canonical http header and possibly follow it
128128-- check link rel=canonical meta tag and possibly follow it
129129-- do we need to check site maps??
130130-- do we have to care at all about AMP?
131131-- do we want anything to do with url shorteners??
132132-- how do multilingual sites affect this?
133133-- do we have to care about `script type="application/ld+json"` ???
134134-135135-ugh. is there a crate for this.
136136-137137-138138-### relative uris?
139139-140140-links might be relative, in which case they might need to be made absolute before being useful. is that a concern for this library, or up to the user? (seems like we might not have context here to determine its absolute)
141141-142142-143143-### canonicalizing
144144-145145-there should be a few async functions available to canonicalize already-parsed links.
146146-147147-- what happens if a link can't be resolved?
5454+💫 [Links](./links)
5555+-------------------
14856149149-150150----
5757+Rust library for parsing and extracting links (at-uris, DIDs, and URLs) from atproto records.
15158152152-- using `tinyjson` because it's nice -- maybe should switch to serde_json to share deps with atrium?
5959+- source: [./links/](./links/)
6060+- status: not yet published to crates.io; needs some rework
15361154154-- would use atrium for parsing at-uris, but it's not in there. there's a did-only version in the non-lib commands.rs. its identifier parser is strict to did + handle, which makes sense, but for our purposes we might want to allow unknown methods too?
155155-156156- - rsky-syntax has an aturi
157157- - adenosyne also
158158- - might come back to these
159626363+🔭 Deprecated: [Who am I](./who-am-i)
6464+-------------------------------------
16065161161--------
6666+An identity bridge for microcosm demos, that kinda worked. Fixing its problems is about equivalent to reinventing a lot of OIDC, so it's being retired.
16267163163-rocks
6868+- source: [./who-am-i/](./who-am-i/)
6969+- status: ready for retirement.
16470165165-```bash
166166-ROCKSDB_LIB_DIR=/nix/store/z2chn0hsik0clridr8mlprx1cngh1g3c-rocksdb-9.7.3/lib/ cargo build
167167-```
7171+Still in use for the Spacedust Notifications demo, but that will hopefully be migrated to use atproto oauth directly instead.
ufos ops (move to micro-ops).md
legacy/ufos ops (move to micro-ops).md