Add new get_many_to_many XRPC endpoint #7

+131 -127

9 changed files

Interdiff #1 → #2

constellation/src/lib.rs

This file has not been changed.

constellation/src/server/mod.rs

This patch was likely rebased, as context lines do not match.

constellation/src/storage/mem_store.rs

This patch was likely rebased, as context lines do not match.

+131 -127

constellation/src/storage/mod.rs

Trait method declarations (elided lines shown as ···):

@@ old 104-109, 144-154 / new 104-140 @@
 fn get_all_record_counts(&self, _target: &str)
     -> Result<HashMap<String, HashMap<String, u64>>>;

+fn get_all_counts(
+    &self,
 ···
+fn get_all_record_counts(&self, _target: &str)
+    -> Result<HashMap<String, HashMap<String, u64>>>;
+
 fn get_many_to_many(
     &self,
     target: &str,
 ···
-fn get_all_record_counts(&self, _target: &str)
-    -> Result<HashMap<String, HashMap<String, u64>>>;
-
-fn get_all_counts(
-    &self,
 ···

Tests (a block of about 120 lines moved from old 1794-1912 to new 1563-1681; its contents are elided except for the closing lines):

@@ old 1560-1565 / new 1560-1567 @@
 ···
+            next: None,
+        }
+    );
+    });
+}
 ···

@@ old 1737 / new 1856 @@
         .unwrap();
     assert_eq!(b_group.subject, "b.com");
     assert_eq!(b_group.records.len(), 2);
-    assert!(b_group.records
+    assert!(b_group
+        .records
         .iter()
         .any(|r| r.did.0 == "did:plc:asdf" && r.rkey == "asdf"));
-    assert!(b_group.records
+    assert!(b_group
+        .records
         .iter()
         .any(|r| r.did.0 == "did:plc:asdf" && r.rkey == "asdf2"));
     // Find c.com group

@@ old 1751 / new 1872 @@
         .unwrap();
     assert_eq!(c_group.subject, "c.com");
     assert_eq!(c_group.records.len(), 2);
-    assert!(c_group.records
+    assert!(c_group
+        .records
         .iter()
         .any(|r| r.did.0 == "did:plc:fdsa" && r.rkey == "fdsa"));
-    assert!(c_group.records
+    assert!(c_group
+        .records
         .iter()
         .any(|r| r.did.0 == "did:plc:fdsa" && r.rkey == "fdsa2"));

@@ old 1791 / new 1914 @@
     assert_eq!(group.subject, "b.com");
     assert_eq!(group.records.len(), 2);
     assert!(group.records.iter().all(|r| r.did.0 == "did:plc:asdf"));
-    });
-}
 ···
-            next: None,
-        }
-    );
     });
 }

constellation/src/storage/rocks_store.rs

This patch was likely rebased, as context lines do not match.

constellation/templates/get-many-to-many.html.j2

This file has not been changed.

constellation/templates/hello.html.j2

This patch was likely rebased, as context lines do not match.

constellation/templates/try-it-macros.html.j2

This patch was likely rebased, as context lines do not match.

lexicons/blue.microcosm/links/getManyToMany.json

This file has not been changed.

History

8 rounds 13 comments

seoul.systems submitted #7 3w

11 commits

45e798b8

wip: m2m

f3144f8d

Add tests for new get_many_to_many query handler

21057eca

Fix get_m2m_empty test

4d686c35

Replace tuple with RecordsBySubject struct

4e358f1a

Fix conflicts after rebasing on main

bb107c03

Use record_id/subject tuple as return type for get_many_to_many

90220d0b

Fix get_many_to_many pagination with composite cursor

373d0744

Fix get_many_to_many_counts pagination with fetch N+1

46f5fae4

wip

6d22f695

Fix rocks-store to match mem-store composite cursor

c2d01284

Address feedback from fig

0 comments

pull request successfully merged

seoul.systems submitted #6 4w

10 commits

45e798b8

wip: m2m

f3144f8d

Add tests for new get_many_to_many query handler

21057eca

Fix get_m2m_empty test

4d686c35

Replace tuple with RecordsBySubject struct

4e358f1a

Fix conflicts after rebasing on main

bb107c03

Use record_id/subject tuple as return type for get_many_to_many

90220d0b

Fix get_many_to_many pagination with composite cursor

373d0744

Fix get_many_to_many_counts pagination with fetch N+1

46f5fae4

wip

6d22f695

Fix rocks-store to match mem-store composite cursor

0 comments

seoul.systems submitted #5 4w

8 commits

45e798b8

wip: m2m

f3144f8d

Add tests for new get_many_to_many query handler

21057eca

Fix get_m2m_empty test

4d686c35

Replace tuple with RecordsBySubject struct

4e358f1a

Fix conflicts after rebasing on main

bb107c03

Use record_id/subject tuple as return type for get_many_to_many

90220d0b

Fix get_many_to_many pagination with composite cursor

373d0744

Fix get_many_to_many_counts pagination with fetch N+1

1 comment

seoul.systems 4w

Okay. I wrapped my head around the composite cursor you proposed and am working on refactoring both storage implementations towards that. I think I might re-submit another round tomorrow :)

seoul.systems submitted #4 4w

6 commits

45e798b8

wip: m2m

f3144f8d

Add tests for new get_many_to_many query handler

21057eca

Fix get_m2m_empty test

4d686c35

Replace tuple with RecordsBySubject struct

4e358f1a

Fix conflicts after rebasing on main

bb107c03

Use record_id/subject tuple as return type for get_many_to_many

3 comments

seoul.systems 4w

Found a bug in how we handle the pagination logic when the number of items and the user-selected limit are identical or very close to each other (already working on a fix).
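For context, the "fetch N+1" approach named in the commit list is a common way to avoid this class of bug. The following is only a minimal sketch of the general technique, not the code from this PR: `Page` and `paginate` are hypothetical names, and a plain offset stands in for the real cursor.

```rust
/// One page of results plus an optional cursor for the next page.
/// (Hypothetical type for illustration; not the PR's actual struct.)
#[derive(Debug, PartialEq)]
pub struct Page<T> {
    pub items: Vec<T>,
    /// Offset of the first item of the next page, or `None` on the last page.
    pub next: Option<usize>,
}

/// Return at most `limit` items of `all`, starting at `offset`.
pub fn paginate<T: Clone>(all: &[T], offset: usize, limit: usize) -> Page<T> {
    // Fetch one item more than requested. The extra item is never returned;
    // its presence only tells us whether another page exists.
    let mut items: Vec<T> = all
        .iter()
        .skip(offset)
        .take(limit.saturating_add(1))
        .cloned()
        .collect();
    let has_more = items.len() > limit;
    items.truncate(limit);
    // Without the extra item, a result set of exactly `limit` items would
    // yield a cursor that points at an empty page.
    let next = if has_more { Some(offset + limit) } else { None };
    Page { items, next }
}
```

With four items and a limit of four, `next` is `None`, which is the case that is easy to get wrong when only `limit` items are fetched.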

bad-example.com 4w

thanks for the rebase! i tried to write things in the tiny text box but ended up needing to make a diagram: https://bsky.app/profile/did:plc:hdhoaan3xa3jiuq4fg4mefid/post/3mejuq44twc2t

key thing is that where the focus of getManyToManyCounts was the other subject (aggregation was against that, so grouping happened with it),

i think the focus of disaggregated many-to-many is on the linking records themselves

to me that takes me toward a few things

- i don't think we should need to group the links by target (does the current code build up the full aggregation on every requested page? we should be able to avoid doing that)
- i think the order of the response should actually be based on the linking record itself (since we have a row in the output), not the other subject, unlike with the aggregated/count version. this means you get eg. list items in order they were added instead of the order of the listed things being created. (i haven't fully wrapped my head around the grouping/ordering code here yet)
- since any linking record can have a path_to_other with multiple links, i think a composite cursor could work here:

a 2-tuple of (backlink_vec_idx, forward_vec_idx).

for normal cases where the many-to-many record points to exactly one other subject, it would just be advancing backlink_vec_idx like normal backlinks

for cases where the many-to-many record actually has multiple forward links at the given path_to_other, the second part of the tuple would track progress through that list

i think that allows us to hold the necessary state between calls without needing to reconstruct too much in memory each time?
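A rough sketch of how such a 2-tuple cursor could drive pagination, under stated assumptions: `LinkingRecord`, `Cursor`, and `page` are hypothetical names, the backlinks are held in an in-memory slice, and each record's forward links at path_to_other are already resolved. The real storage layers may look quite different.

```rust
/// A linking record together with the forward links found at `path_to_other`.
/// (Hypothetical shape for illustration.)
pub struct LinkingRecord {
    pub rkey: String,
    pub others: Vec<String>,
}

/// Composite cursor: position in the backlink list, plus position inside
/// the forward links of that backlink.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Cursor {
    pub backlink_idx: usize,
    pub forward_idx: usize,
}

/// Return up to `limit` flat (rkey, other subject) rows starting at `start`,
/// and a cursor for the next row if one exists.
pub fn page(
    backlinks: &[LinkingRecord],
    start: Cursor,
    limit: usize,
) -> (Vec<(String, String)>, Option<Cursor>) {
    let mut items = Vec::new();
    let mut forward_start = start.forward_idx;
    for (b, rec) in backlinks.iter().enumerate().skip(start.backlink_idx) {
        for (f, other) in rec.others.iter().enumerate().skip(forward_start) {
            if items.len() == limit {
                // The page is full and another row exists: resume exactly here.
                return (items, Some(Cursor { backlink_idx: b, forward_idx: f }));
            }
            items.push((rec.rkey.clone(), other.clone()));
        }
        // Only the first record can resume mid-list; later ones start at 0.
        forward_start = 0;
    }
    (items, None)
}
```

In the common case of one forward link per record, `forward_idx` stays at 0 and the cursor behaves like a plain backlink index. Nothing is grouped or aggregated, so each call only touches the rows it returns plus one.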

(also it's hard to write in this tiny tiny textbox and have a sense of whether what i'm saying makes sense)

seoul.systems 4w

Interesting approach! I have to think through this for a bit to be honest. Maybe I tried to follow the existing counts implementation too closely

Having said that, I added a new composite cursor to fix a couple of bugs that would arise in some edge cases of the pagination logic. This affects both the new get-many-to-many endpoint and the existing get-many-to-many-counts endpoint. As the changes are split over two distinct commits, things should be straightforward to review.

Your assumption is still correct in the sense that we do indeed have to build up the aggregation again for every request. I have to double-check the get-backlinks endpoint to get a better sense of what you're getting at.

Finally, I agree that the interface here doesn't necessarily make the whole thing easier to understand, unfortunately

seoul.systems submitted #3 4w

6 commits

dbbef261

wip: m2m

52039af4

Add tests for new get_many_to_many query handler

da6fb97c

Fix get_m2m_empty test

9bd3a217

Replace tuple with RecordsBySubject struct

6f1351b9

Fix conflicts after rebasing on main

235fb506

Use record_id/subject tuple as return type for get_many_to_many

2 comments

bad-example.com 4w

i think something got funky with a rebase or the way tangled is showing it -- some of my changes on main seem to be getting shown (reverted) in the diff.

i don't mind sorting it locally but will mostly get to it tomorrow, in case you want to see what's up before i do.

seoul.systems 4w

That's one on me, sorry! Rebased again on main and now everything seems fine

seoul.systems submitted #2 7w

5 commits

dbbef261

wip: m2m

52039af4

Add tests for new get_many_to_many query handler

da6fb97c

Fix get_m2m_empty test

9bd3a217

Replace tuple with RecordsBySubject struct

6f1351b9

Fix conflicts after rebasing on main

5 comments

seoul.systems 7w

Rebased on main. As we discussed in the PR for the order query parameter, I didn't include it here as it's not a particularly sensible fit.

bad-example.com 6w

i need to get into the code properly but my initial thought is that this endpoint should return a flat list of results, like

{
  "items": [
    {
      "link": { did, collection, rkey }, // the m2m link record
      "subject": "a.com"
    },
    {
      "link": { did, collection, rkey },
      "subject": "a.com"
    },
    {
      "link": { did, collection, rkey },
      "subject": "b.com"
    },
  ]
}

this will require a bit of tricks in the cursor to track pages across half-finished groups of links

(also this isn't an immediate change request, just getting it down for discussion!)

(and separately, i've also been wondering about moving more toward returning at-uris instead of broken-out did/collection/rkey objects. which isn't specifically about this PR, but if that happens then switching before releasing it is nice)
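To make the at-uri idea concrete, here is a small sketch of converting between the two representations. It assumes the plain `at://<did>/<collection>/<rkey>` form seen in the subjects elsewhere in this thread; `RecordParts`, `to_at_uri`, and `from_at_uri` are hypothetical names, and this is a simplified splitter, not a full AT-URI parser.

```rust
/// Broken-out record identifier, as the endpoints currently return it.
/// (Hypothetical struct name for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub struct RecordParts {
    pub did: String,
    pub collection: String,
    pub rkey: String,
}

/// Join the three parts into a single at-uri string.
pub fn to_at_uri(r: &RecordParts) -> String {
    format!("at://{}/{}/{}", r.did, r.collection, r.rkey)
}

/// Split an at-uri of the form `at://<did>/<collection>/<rkey>` back into
/// its parts. Returns `None` for anything that does not have that shape.
pub fn from_at_uri(uri: &str) -> Option<RecordParts> {
    let rest = uri.strip_prefix("at://")?;
    let mut parts = rest.splitn(3, '/');
    let did = parts.next()?;
    let collection = parts.next()?;
    let rkey = parts.next()?;
    if did.is_empty() || collection.is_empty() || rkey.is_empty() {
        return None;
    }
    Some(RecordParts {
        did: did.to_string(),
        collection: collection.to_string(),
        rkey: rkey.to_string(),
    })
}
```

Since the round trip is this cheap on the client side, returning at-uris would not take anything away from clients that prefer the broken-out form.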

seoul.systems 6w

Hmm, I wonder how this would then work with the path_to_other parameter. Currently we have this nested grouping in order to show and disambiguate different relationships between different links.

For instance, take the following query and its results:

http://localhost:6789/xrpc/blue.microcosm.links.getManyToMany?subject=at://did:plc:2w45zyhuklwihpdc7oj3mi63/app.bsky.feed.post/3mdbbkuq6t32y&source=app.bsky.feed.post:reply.root.uri&pathToOther=reply.parent.uri&limit=16

This query asks: "Show me all posts in this thread, grouped by who they're responding to."

A flat list would just give us all the posts in the thread. The nested structure answers a richer question: who's talking to whom? Some posts are direct responses to the original article. Others are replies to other commenters, forming side conversations that branch off from the main thread.

The pathToOther grouping preserves that distinction. Without it, we'd lose the information about who's talking to whom.

{
  "linking_records": [
    {
      "subject": "at://did:plc:2w45zyhuklwihpdc7oj3mi63/app.bsky.feed.post/3mdbbkuq6t32y",
      "records": [
        {
          "did": "did:plc:lznqwrsbnyf6fdxohikqj6h3",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd27pja7s2y"
        },
        {
          "did": "did:plc:uffx77au6hoauuuumkbuvqdr",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd2tt5efc2a"
        },
        {
          "did": "did:plc:y7qyxzo7dns5m54dlq3youu3",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd2wtjxgc2d"
        },
        {
          "did": "did:plc:yaakslxyqydb76ybgkhrr4jk",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd35hyads22"
        },
        {
          "did": "did:plc:fia7w2kbnrdjwp6zvxywt7qv",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd37j3ldk2m"
        },
        {
          "did": "did:plc:xtecipifublblkomwau5x2ok",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd3dbtbz22n"
        },
        {
          "did": "did:plc:hl5lhiy2qr4nf5e4eefldvme",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd42hpw7c2e"
        },
        {
          "did": "did:plc:fgquypfh32pewivn3bcmzseb",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd46jteoc2m"
        }
      ]
    },
    {
      "subject": "at://did:plc:3rhjxwwui6wwfokh4at3q2dl/app.bsky.feed.post/3mdczc7c4gk2i",
      "records": [
        {
          "did": "did:plc:3rhjxwwui6wwfokh4at3q2dl",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdczt7cwhk2i"
        }
      ]
    },
    {
      "subject": "at://did:plc:6buibzhkqr4vkqu75ezr2uv2/app.bsky.feed.post/3mdby25hbbk2v",
      "records": [
        {
          "did": "did:plc:fgeie2bmzlmx37iglj3xbzuj",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd26ulf4k2j"
        }
      ]
    },
    {
      "subject": "at://did:plc:lwgvv5oqh5stzb6dxa5d7z3n/app.bsky.feed.post/3mdcxqbkkfk2i",
      "records": [
        {
          "did": "did:plc:hl5lhiy2qr4nf5e4eefldvme",
          "collection": "app.bsky.feed.post",
          "rkey": "3mdd45u56sk2e"
        }
      ]
    }
  ],
  "cursor": null
}

Correct me if I'm somehow wrong here!

Regarding returning at-uris: I think this might be a nice idea, as users can split these up anyway when they feel the need to, and it feels conceptually more complete. But it might be easier to do this in a different PR across all existing XRPC endpoints. This would allow us to add this new endpoint already while working on the updated return values in the meantime. I'd like to avoid doing too much distinct stuff in one PR. :)

bad-example.com 6w

at-uris: totally fair, holding off for a follow-up.

flat list: i might have messed it up in my example but i think what i meant is actually equivalent to the grouped version: flattened, with the subject ("group by") included with every item in the flattened list.

clients can collect the flat list and group on subject to get back to your structured example, if they want.
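As a sketch of that client-side step, assuming items shaped like the flat example earlier in this thread (`RecordId`, `Item`, and `group_by_subject` are hypothetical names, not part of the endpoint):

```rust
use std::collections::BTreeMap;

/// The many-to-many link record, broken out as did/collection/rkey.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordId {
    pub did: String,
    pub collection: String,
    pub rkey: String,
}

/// One row of the flat response: a link record and the subject it points to.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub link: RecordId,
    pub subject: String,
}

/// Rebuild the nested "records by subject" view from a flat list.
/// Items may come from several pages; order within each group is preserved.
pub fn group_by_subject(items: Vec<Item>) -> BTreeMap<String, Vec<RecordId>> {
    let mut groups: BTreeMap<String, Vec<RecordId>> = BTreeMap::new();
    for item in items {
        groups.entry(item.subject).or_default().push(item.link);
    }
    groups
}
```

A `BTreeMap` is used here only so that the groups come out in a stable, sorted order; a client that cares about first-seen order would pick a different container.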

my motivations are probably part sql-brain, part flat-list-enjoyer, and part cursor-related. i'm trying to disregard the first two, and i'm curious about your thoughts about how to handle the cursor:

with a flat list it's easy (from the client perspective at least) -- just keep chasing the cursor for as much of the data as you need. (cursors can happen in the middle of a subject)

with nested results grouped by subject it's less obvious to me. correct me if i'm wrong (need another block of time to actually get into the code) but i think the grouped item sub-list is unbounded size in the proposed code here? so cursors are only limiting the number of groups.

if we go with the grouped nested response, i think maybe we'd want something like:

- a cursor at the end for fetching more groups, and
- a cursor for each group-list that lets you fetch more items from just that group-list.

(i think this kind of nested paging is pretty neat!)
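For comparison, a minimal sketch of what the two-level paging could look like. Everything here is an assumption for discussion: `Group`, `GroupedPage`, and `grouped_page` are hypothetical names, plain offsets stand in for opaque cursors, and the data is an in-memory slice of already grouped records.

```rust
/// One group in the nested response, with its own cursor.
pub struct Group {
    pub subject: String,
    pub records: Vec<String>,
    /// Offset for fetching more records of this group, if any remain.
    pub cursor: Option<usize>,
}

/// A page of groups, with a cursor for fetching more groups.
pub struct GroupedPage {
    pub groups: Vec<Group>,
    pub cursor: Option<usize>,
}

/// Return up to `group_limit` groups starting at `group_offset`, each
/// truncated to `item_limit` records.
pub fn grouped_page(
    all: &[(String, Vec<String>)],
    group_offset: usize,
    group_limit: usize,
    item_limit: usize,
) -> GroupedPage {
    let groups: Vec<Group> = all
        .iter()
        .skip(group_offset)
        .take(group_limit)
        .map(|(subject, records)| Group {
            subject: subject.clone(),
            records: records.iter().take(item_limit).cloned().collect(),
            // Inner cursor: only present when this group was truncated.
            cursor: (records.len() > item_limit).then_some(item_limit),
        })
        .collect();
    let end = group_offset.saturating_add(group_limit);
    GroupedPage {
        groups,
        // Outer cursor: only present when more groups follow this page.
        cursor: (all.len() > end).then_some(end),
    }
}
```

The client then has to track one outer cursor plus one cursor per truncated group, which is the extra bookkeeping the flat list avoids.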

seoul.systems 6w

Interesting. Now that you mention it, I feel I kinda get what you're getting at!

I think the whole cursor thing, albeit possible for sure, is kinda creating more unnecessary complexity so I'll probably go with your suggestion.

It seems easier to create custom groupings on their own for most users (having more freedom is always great) and I think from an ergonomic perspective the two cursors might create more friction.

seoul.systems submitted #1 2mo

4 commits

a4177589

wip: m2m

81b7df55

Add tests for new get_many_to_many query handler

c6742917

Fix get_m2m_empty test

f0b16b50

Replace tuple with RecordsBySubject struct

1 comment

seoul.systems 2mo

Added the missing lexicon entry for the new endpoint and changed the return type as well. I mistakenly posted this comment on the other PR I was working on. Sorry about that lol.

seoul.systems submitted #0 2mo

3 commits

a4177589

wip: m2m

81b7df55

Add tests for new get_many_to_many query handler

c6742917

Fix get_m2m_empty test

1 comment

seoul.systems 2mo

I think the existing get_many_to_many_counts handler and the new get_many_to_many handler are similar enough that we might extract the bulk of their logic into a shared helper. One way to go about it might be a method that takes the existing identical function parameters plus an additional callback parameter that decides what to do with found matches, i.e. calculate counts or join URIs.

I am not too sure yet though if this is indeed the right thing to do as the new shared implementation might be a bit complicated. But given the strong similarities between the two I think it's worth at least considering.
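A rough sketch of the callback idea, to make the trade-off easier to judge. All names are hypothetical (`walk_matches`, `counts`, `records`), and a slice of (subject, rkey) pairs stands in for whatever the storage layer actually iterates over.

```rust
use std::collections::HashMap;

/// Shared walk over every (subject, rkey) match. The `on_match` callback
/// decides what gets accumulated. (Hypothetical helper for illustration.)
pub fn walk_matches<A>(
    matches: &[(&str, &str)],
    mut acc: A,
    mut on_match: impl FnMut(&mut A, &str, &str),
) -> A {
    for &(subject, rkey) in matches {
        on_match(&mut acc, subject, rkey);
    }
    acc
}

/// Counts variant: number of linking records per subject.
pub fn counts(matches: &[(&str, &str)]) -> HashMap<String, u64> {
    walk_matches(matches, HashMap::<String, u64>::new(), |acc, subject, _rkey| {
        *acc.entry(subject.to_string()).or_insert(0) += 1;
    })
}

/// Records variant: the linking records themselves, per subject.
pub fn records(matches: &[(&str, &str)]) -> HashMap<String, Vec<String>> {
    walk_matches(
        matches,
        HashMap::<String, Vec<String>>::new(),
        |acc, subject, rkey| {
            acc.entry(subject.to_string()).or_default().push(rkey.to_string());
        },
    )
}
```

The two variants shrink to a few lines each, but the shared function becomes generic over the accumulator, which is the added complexity to weigh against the duplication.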