# Osprey ATProto Ruleset

> [!WARNING]
> The content in this repository drives social network moderation decisions. As such, there may be items that you find offensive, such as lists of slurs.

This is a ruleset for [Osprey](https://github.com/roostorg/osprey) for use with ATProto, and specifically for [Bluesky](https://bsky.app). It is the ruleset used live on the [labeler that I personally run](https://bsky.app/profile/labeler.hailey.at). It may be used in conjunction with [my fork of Osprey](https://github.com/haileyok/osprey), which implements various components required by these rules (ATProto labels output sink, ML model calling, Redis counter system, etc.).

## Using This Ruleset

The easiest way to get started with these rules is to clone them into your Osprey rules directory, wherever that is located. For example, cloning the official (or forked) version of Osprey will leave you with an `example_rules/` directory. Replacing the contents of that directory with this repository's contents will allow these rules to run.

Again, note that you will need the sinks and UDFs that these rules require, which are maintained inside of my Osprey fork for the time being.

# Writing Rules

> [!NOTE]
> This documentation is a WIP for _generic_ rule writing, not ATProto-specific rules. More documentation will come for ATProto-specific rules.

Osprey rules are written in SML, a sort of subset of Python (think Starlark). You can write rules that are specific to certain types of events that happen on a network, or rules that take effect regardless of event type, depending on the behavior or patterns you are looking for.

## Structuring Rules

You will likely find it useful to maintain two subdirectories inside of your main rules directory: a `rules` directory that holds the actual rule logic, and a `models` directory that defines the features available in all events or in specific event types. For example, your structure may look something like this:

```bash
example_rules/
|  rules/
|  |  record/
|  |  |  post/
|  |  |  |  first_post_link.sml
|  |  |  |  index.sml
|  |  |  like/
|  |  |  |  like_own_post.sml
|  |  |  |  index.sml
|  |  account/
|  |  |  signup/
|  |  |  |  high_risk_signup.sml
|  |  |  |  index.sml
|  |  index.sml
|  models/
|  |  record/
|  |  |  post.sml
|  |  |  like.sml
|  |  account/
|  |  |  signup.sml
|  |  base.sml
|  main.sml
```

This sort of structure lets you define rules and models that are specific to certain event types, so that only the necessary rules run for each event type. For example, you likely have some rules that should only run on a `post` event, since only a `post` will have features like `text` or `mention_count`.

Inside of each directory, you may maintain an `index.sml` file that defines the conditions under which the rules inside that directory are included for execution. Although you could handle all of this conditional logic inside of a single file, maintaining a separate `index.sml` per directory keeps things neatly organized.

## Models

Before you actually write a rule, you’ll need to define a “model” for an event type. For this example, we will assume that you run a social media website that lets users create posts, either at the “top level” or as a reply to another top-level post. Each post may include text, mentions of other users on your network, and an optional link embed. Let’s say that the event’s JSON structure looks like this:

```json
{
  "eventType": "userPost",
  "user": {
    "userId": "user_id_789",
    "handle": "carol",
    "postCount": 3,
    "accountAgeSeconds": 9002
  },
  "postId": "abc123xyz",
  "replyId": null,
  "text": "Is anyone online right now? @alice or @bob, you there? If so check this video out",
  "mentionIds": ["user_id_123", "user_id_456"],
  "embedLink": "https://youtube.com/watch?id=1"
}
```

Inside of our `models/record` directory, we should now create a `post.sml` file where we will define the features for a post.

```python
PostId: Entity[str] = EntityJson(
  type='PostId',
  path='$.postId',
)

PostText: str = JsonData(
  path='$.text',
)

MentionIds: List[str] = JsonData(
  path='$.mentionIds',
)

# Optional features: `None` when the value is absent from the event
EmbedLink: Optional[str] = JsonData(
  path='$.embedLink',
  required=False,
)

ReplyId: Optional[str] = JsonData(
  path='$.replyId',
  required=False,
)
```

The `JsonData` UDF (more on UDFs to follow) lets us take the event’s JSON and define features based on its contents. These features can then be referenced in any rule that imports the `models/record/post.sml` model. If a value may not always be present in the JSON object, set `required` to `False`, and the feature will be `None` whenever that value is missing.
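
To make the `required` behavior concrete, here is a minimal plain-Python sketch of path-based extraction. It is not Osprey's implementation: `json_data` is a hypothetical stand-in for `JsonData` that only handles simple dotted paths, and the event is a trimmed copy of the sample above with `embedLink` removed. Raising `KeyError` for a missing required value is an assumption made for illustration.

```python
import json

# Hypothetical stand-in for Osprey's `JsonData` UDF. It only illustrates the
# `required` flag and supports simple dotted paths such as `$.user.handle`.
def json_data(event, path, required=True):
    keys = (path[2:] if path.startswith("$.") else path).split(".")
    value = event
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            if required:
                # Assumption: a missing required value is treated as an error.
                raise KeyError(f"required feature missing at {path}")
            return None
        value = value[key]
    return value

# Trimmed copy of the sample event, with `embedLink` left out on purpose.
event = json.loads("""
{
  "eventType": "userPost",
  "user": {"userId": "user_id_789", "handle": "carol", "postCount": 3},
  "postId": "abc123xyz",
  "replyId": null,
  "text": "Is anyone online right now?",
  "mentionIds": ["user_id_123", "user_id_456"]
}
""")

post_text = json_data(event, "$.text")
mention_ids = json_data(event, "$.mentionIds")
handle = json_data(event, "$.user.handle")
embed_link = json_data(event, "$.embedLink", required=False)  # key is absent
reply_id = json_data(event, "$.replyId", required=False)      # value is JSON null
```

In this sketch both optional features come back as `None`, one because the key is absent and one because the value is JSON `null`, while asking for `$.embedLink` without `required=False` raises.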

Note that we did not create any features for things like `userId` or `handle`. That is because these values will be present in *any* event, and it wouldn’t be very nice to have to copy them into each event type’s model. Instead, we will create a `base.sml` model that defines the features which are always present. Inside of `models/base.sml`, let’s define these.

```python
EventType: str = JsonData(
  path='$.eventType',
)

UserId: Entity[str] = EntityJson(
  type='UserId',
  path='$.user.userId',
)

Handle: Entity[str] = EntityJson(
  type='Handle',
  path='$.user.handle',
)

PostCount: int = JsonData(
  path='$.user.postCount',
)

AccountAgeSeconds: int = JsonData(
  path='$.user.accountAgeSeconds',
)
```

Here, instead of simply using `JsonData`, we use the `EntityJson` UDF. More on this later, but as a rule of thumb, you will likely want values such as a user’s ID to be entities. This pays off later, for example when doing data explorations within the Osprey UI.

### Model Hierarchy

In practice, you may find it useful to create a hierarchy of base models:

- `base.sml` for features present in every event (user IDs, handles, account stats, etc.)
- `account_base.sml` for features that appear only in account-related events, but appear in every one of them. Similarly, you may add a `record_base.sml` for features that appear in all record events.

This type of hierarchy prevents duplicate feature definitions (which Osprey does not allow) and ensures features are defined at the appropriate level of abstraction.

## Rules

More in-depth documentation on rule writing can be found in `docs/WRITING-RULES.md`, but we’ll offer a brief overview here.

Let's imagine we want to flag accounts whose first post mentions at least one user and includes a link. We’ll create a `.sml` file at `rules/record/post/first_post_link.sml` for our rule logic. This file will include both the conditions that cause the rule to evaluate to `True` and the actions we want to take when it does.

```python
# First, import the models that you will need inside of this rule
Import(
  rules=[
    'models/base.sml',
    'models/record/post.sml',
  ],
)

# Next, define a variable that uses the `Rule` UDF
FirstPostLinkRule = Rule(
  # Set the conditions in which this rule will be `True`
  when_all=[
    PostCount == 1, # if this is the user's first post
    EmbedLink != None, # if there is a link inside of the post
    ListLength(list=MentionIds) >= 1, # if there is at least one mention in the post
  ],
  description='First post for user includes a link embed',
)

# Finally, set which effect UDFs (more on this later) will be triggered
WhenRules(
  rules_any=[FirstPostLinkRule],
  then=[
    ReportRecord(
      entity=PostId,
      comment='This was the first post by a user and included a link',
      severity=3,
    ),
  ],
)
```
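
As a rough illustration of what `when_all` means for `FirstPostLinkRule`, the three conditions can be mirrored in plain Python and checked against a few events. This is only a sketch: `first_post_link_rule` is a hypothetical helper, the events are reduced to the fields the rule reads, and nothing here reflects how Osprey evaluates rules internally.

```python
# Hypothetical helper mirroring the three `when_all` conditions of
# FirstPostLinkRule; every condition must hold for the rule to be True.
def first_post_link_rule(event):
    conditions = [
        event["user"]["postCount"] == 1,        # PostCount == 1
        event.get("embedLink") is not None,     # EmbedLink != None
        len(event.get("mentionIds", [])) >= 1,  # ListLength(list=MentionIds) >= 1
    ]
    return all(conditions)

# The sample event from the Models section, reduced to the relevant fields.
sample_event = {
    "user": {"postCount": 3},
    "embedLink": "https://youtube.com/watch?id=1",
    "mentionIds": ["user_id_123", "user_id_456"],
}

# Variations: the same post as a first post, and a first post without a link.
first_post = {**sample_event, "user": {"postCount": 1}}
no_link = {**first_post, "embedLink": None}

results = {
    "sample": first_post_link_rule(sample_event),
    "first_post": first_post_link_rule(first_post),
    "no_link": first_post_link_rule(no_link),
}
```

The sample event does not match because its `postCount` is 3. Only the `first_post` variation satisfies all three conditions, so only that one would trigger the `ReportRecord` effect.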

We also want to make sure this rule runs *only* when the event is a post event. Since we have a well-defined project structure, this is pretty easy. We’ll start by modifying the `main.sml` at the project root to include a single, simple `Require` statement.

```python
Require(
  rule='rules/index.sml',
)
```

Next, inside of `rules/index.sml`, we will define the conditions under which the post rules execute:

```python
Import(
  rules=[
    'models/base.sml',
  ],
)

Require(
  rule='rules/record/post/index.sml',
  require_if=EventType == 'userPost',
)
```
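
One way to picture how `require_if` gates a whole subtree is to model the `Require` chain as a small graph and walk it for a given event. The sketch below is illustrative only: the `REQUIRES` table and `files_to_run` are hypothetical, the `userLike` event type is made up for contrast, and the table already includes the post-level `index.sml` that is described next.

```python
# Hypothetical model of `Require(rule=..., require_if=...)`: each file lists
# the files it requires, plus an optional predicate on the event.
REQUIRES = {
    "main.sml": [("rules/index.sml", None)],
    "rules/index.sml": [
        ("rules/record/post/index.sml", lambda e: e["eventType"] == "userPost"),
    ],
    "rules/record/post/index.sml": [
        ("rules/record/post/first_post_link.sml", None),
    ],
}

def files_to_run(event, entry="main.sml"):
    """Return every file reached from `entry` for this event, in visit order."""
    reached = [entry]
    for target, require_if in REQUIRES.get(entry, []):
        # An unconditional Require always follows; a conditional one only
        # follows when its predicate is true for the event.
        if require_if is None or require_if(event):
            reached.extend(files_to_run(event, target))
    return reached

post_files = files_to_run({"eventType": "userPost"})
like_files = files_to_run({"eventType": "userLike"})  # made-up event type
```

For the post event the walk reaches `first_post_link.sml`; for the other event it stops at `rules/index.sml`, so none of the post rules are considered.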

Finally, inside of `rules/record/post/index.sml`, we will require the new rule that we have written.

```python
Import(
  rules=[
    'models/base.sml',
    'models/record/post.sml',
  ],
)

Require(
  rule='rules/record/post/first_post_link.sml',
)
```