# SML Rules Guide for AI Agents

This document provides a comprehensive guide to writing SML (Some Madeup Language) rules in Osprey. SML is a statically-typed subset of Python designed for writing detection and classification rules.

## Table of Contents

1. [Overview](#overview)
2. [Basic Concepts](#basic-concepts)
3. [Rule Structure](#rule-structure)
4. [Data Types](#data-types)
5. [Operators](#operators)
6. [File Organization](#file-organization)
7. [Wiring Rules to Effects](#wiring-rules-to-effects)
8. [Null Handling](#null-handling-critical)
9. [Complete Examples](#complete-examples)
10. [Common Patterns](#common-patterns)
11. [Validation Rules](#validation-rules)
12. [Quick Reference](#quick-reference)

---

## Overview

SML rules are used to evaluate incoming action data and trigger effects like verdicts and labels. Key characteristics:

- **Statically typed**: All types are checked at validation time
- **Python-like syntax**: Familiar syntax with restrictions for safety
- **Stateless logic**: Rules define conditions, not procedures
- **Event-driven**: Rules evaluate against incoming action JSON data

---

## Basic Concepts

### What is a Rule?

A rule is a named boolean expression that evaluates conditions against action data:

```python
MyRule = Rule(
    when_all=[
        # List of conditions - ALL must be True for the rule to pass
        Condition1,
        Condition2,
    ],
    description='Human-readable description of what this rule detects'
)
```

### What is an Effect?

Effects are actions taken when rules pass. They are triggered through `WhenRules()`:

- `DeclareVerdict(verdict='reject')` - Returns a verdict to the caller
- `LabelAdd(entity=UserId, label='flagged')` - Adds a label to an entity
- `LabelRemove(entity=UserId, label='flagged')` - Removes a label from an entity

### What is an Entity?

Entities are typed identifiers (like User IDs, emails, IPs) that can have labels attached:

```python
UserId: Entity[str] = EntityJson(type='User', path='$.user_id')
```

---

## Rule Structure

### Basic Rule Definition

```python
RuleName = Rule(
    when_all=[
        # Conditions go here - ALL must be True
    ],
    description='Description string or f-string'
)
```

### Syntax Requirements

1. **Rule names must be non-local variables** (cannot start with `_`):
   ```python
   # Valid
   MyRule = Rule(...)

   # Invalid - will fail validation
   _MyRule = Rule(...)
   ```

2. **Descriptions must be string or f-string literals** (not variables):
   ```python
   # Valid
   description='Static description'
   description=f'User {UserId} triggered rule'

   # Invalid - will fail validation
   my_desc = 'description'
   description=my_desc
   ```

3. **when_all accepts a list of boolean conditions**:
   - All conditions must evaluate to True for the rule to pass
   - Conditions can be comparisons, function calls, other rules, or boolean values

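
The first two requirements are purely syntactic, so they can be checked without evaluating anything. The following is a minimal sketch in plain Python, not Osprey's validator. It assumes SML source can be parsed with Python's standard `ast` module (plausible, since SML is described as a subset of Python). `check_rule_syntax` and its messages are hypothetical names used only for illustration.

```python
import ast


def check_rule_syntax(source: str) -> list[str]:
    """Return problems found in top-level `Name = Rule(...)` assignments."""
    problems = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target, call = node.targets[0], node.value
        if not isinstance(target, ast.Name) or not isinstance(call, ast.Call):
            continue
        if not isinstance(call.func, ast.Name) or call.func.id != "Rule":
            continue
        # Requirement 1: rule names may not start with an underscore.
        if target.id.startswith("_"):
            problems.append(f"{target.id}: rule name starts with '_'")
        # Requirement 2: description must be a str literal or an f-string.
        for kw in call.keywords:
            if kw.arg != "description":
                continue
            is_plain = isinstance(kw.value, ast.Constant) and isinstance(kw.value.value, str)
            is_fstring = isinstance(kw.value, ast.JoinedStr)
            if not (is_plain or is_fstring):
                problems.append(f"{target.id}: description is not a string or f-string literal")
    return problems


# Usage: the second rule breaks both requirements, the first breaks none.
sample = (
    "Good = Rule(when_all=[], description=f'user {UserId}')\n"
    "_Bad = Rule(when_all=[], description=my_desc)\n"
)
problems = check_rule_syntax(sample)
```
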
---

## Data Types

### Basic Types

```python
# Integer
Count: int = JsonData(path='$.count')

# String
Name: str = JsonData(path='$.name')

# Boolean
IsActive: bool = JsonData(path='$.active')

# List
Items: list = JsonData(path='$.items')
```

### Entity Types

Entities are special types for identifiers that can have labels:

```python
# Entity with string ID
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True  # Optional: convert value to expected type
)

# Entity with integer ID
PostId: Entity[int] = EntityJson(
    type='Post',
    path='$.post_id'
)

# Manually created entity
MyEntity = Entity(type='MyType', id='some_value')
```

### Optional Types

For fields that may not exist:

```python
# Optional field - won't fail if missing
OptionalField: Optional[str] = JsonData(path='$.maybe_exists', required=False)
```

---

## Operators

### Comparison Operators

```python
Value == 5              # Equals
Value != 5              # Not equals
Value > 5               # Greater than
Value >= 5              # Greater than or equal
Value < 5               # Less than
Value <= 5              # Less than or equal
Value in [1, 2, 3]      # In list
Value not in [1, 2, 3]  # Not in list
```

### Arithmetic Operators

```python
5 + 3       # Addition
5 - 3       # Subtraction
5 * 3       # Multiplication
5 / 3       # Division
5 // 3      # Floor division
5 % 3       # Modulo
5 ** 3      # Power
```

### Boolean Operators

```python
Condition1 and Condition2    # Logical AND
Condition1 or Condition2     # Logical OR
not Condition1               # Logical NOT

# In when_all, conditions are implicitly AND-ed:
Rule(when_all=[
    Cond1,
    Cond2,  # Both must be True
])

# Use 'or' for explicit OR logic:
Rule(when_all=[
    (Cond1 or Cond2),  # Either Cond1 or Cond2
    Cond3,              # AND Cond3
])
```

### Null Checking

```python
Value != Null    # Check if value is NOT null
Value == Null    # Check if value IS null
```

---

## File Organization

### Import - Include Other Files

Use `Import` to include rules/features from other files:

```python
Import(rules=[
    'models/base.sml',
    'models/user.sml',
    'rules/common.sml',
])
```

**Requirements:**
- File paths must be relative
- List must be **lexicographically sorted**
- No duplicates allowed
- Imported variables/rules are accessible in current file

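
The list requirements above are easy to check mechanically before running validation. Here is a minimal Python sketch under two assumptions: `check_import_list` is a hypothetical helper (not part of Osprey), and "lexicographically sorted" is taken to mean Python's default string ordering, which may differ from Osprey's comparison for unusual characters.

```python
def check_import_list(paths: list[str]) -> list[str]:
    """Return the requirements that an Import(rules=[...]) list violates."""
    problems = []
    # Relative paths only: reject anything that starts at the filesystem root.
    if any(path.startswith("/") for path in paths):
        problems.append("paths must be relative")
    # No duplicates allowed.
    if len(set(paths)) != len(paths):
        problems.append("duplicate paths")
    # The list must already be in sorted order.
    if paths != sorted(paths):
        problems.append("paths are not lexicographically sorted")
    return problems


# Usage: the list from the example above passes, a reordered one does not.
ok = check_import_list(["models/base.sml", "models/user.sml", "rules/common.sml"])
unsorted = check_import_list(["rules/common.sml", "models/base.sml"])
```
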
### Require - Conditionally Include Files

Use `Require` for conditional or template-based includes:

```python
# Always include
Require(rule='expensive_check.sml')

# Conditional include
Require(rule='ai_check.sml', require_if=ActionName == 'register')

# Template-based (f-string)
Require(rule=f'actions/{ActionName}.sml')
```

**Note:** Unlike Import, outputs from Required files are NOT accessible in the parent file.

### Typical File Structure

```
rules/
├── main.sml                 # Entry point
├── models/
│   ├── base.sml            # Common entities (UserId, etc.)
│   ├── user.sml            # User-related features
│   └── content.sml         # Content-related features
├── rules/
│   ├── spam.sml            # Spam detection rules
│   └── abuse.sml           # Abuse detection rules
└── actions/
    ├── register.sml        # Action-specific rules
    └── send_message.sml
```

---

## Wiring Rules to Effects

Rules by themselves don't do anything. Use `WhenRules()` to connect rules to effects:

```python
WhenRules(
    rules_any=[
        Rule1,
        Rule2,
        Rule3,
    ],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='flagged'),
    ],
)
```

**Semantics:**
- If **ANY** rule in `rules_any` evaluates to True, **ALL** effects in `then` execute
- Failed rules don't prevent other rules from being checked
- Failed effects don't prevent other effects from executing

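
The any/all semantics can be modelled in a few lines of Python. This is an illustrative sketch rather than Osprey's implementation: `fired_effects` is a hypothetical helper, rule results are represented as `True`, `False`, or `None` (standing in for Null), and it assumes that a Null rule does not trigger effects.

```python
def fired_effects(rule_results, effects):
    """Return every effect if any rule result is True, otherwise nothing."""
    # `is True` keeps both False and None (Null) from triggering effects.
    if any(result is True for result in rule_results):
        return list(effects)
    return []


# Usage: one passing rule is enough to run all effects.
some_passed = fired_effects([False, True, None], ["reject", "flagged"])
none_passed = fired_effects([False, None], ["reject", "flagged"])
```
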
### Conditional Effects

Use `apply_if` to make individual effects conditional:

```python
WhenRules(
    rules_any=[MainRule],
    then=[
        LabelAdd(entity=UserId, label='basic_flag'),
        LabelAdd(entity=UserId, label='severe_flag', apply_if=SevereRule),
        LabelAdd(entity=UserId, label='repeat_offender', apply_if=RepeatOffenderRule),
    ],
)
```
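
As a rough Python model of `apply_if` (a sketch under assumptions, not Osprey's code): an effect runs only when some rule in `rules_any` passed and its own condition is true, and an effect without `apply_if` is assumed to always apply. The `Effect` class and `select_effects` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    """Hypothetical stand-in for LabelAdd / DeclareVerdict."""
    name: str
    apply_if: bool = True  # assumed: effects without apply_if always apply


def select_effects(rule_results, effects):
    """Return the names of the effects that run for one evaluation."""
    # Nothing runs unless at least one rule in rules_any is True.
    if not any(result is True for result in rule_results):
        return []
    # Each effect is then filtered by its own apply_if condition.
    return [effect.name for effect in effects if effect.apply_if is True]


# Usage: MainRule passed, SevereRule passed, RepeatOffenderRule did not.
applied = select_effects(
    rule_results=[True],
    effects=[
        Effect("basic_flag"),
        Effect("severe_flag", apply_if=True),
        Effect("repeat_offender", apply_if=False),
    ],
)
```
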

---

## Null Handling (CRITICAL)

SML has unique null semantics that differ from most languages. **Understanding this is critical.**

### Null Propagation Rule

If a value evaluates to Null:
1. The containing rule evaluates to **Null** (not False!)
2. Any rule depending on it also becomes **Null**
3. This propagates through the entire dependency chain

### Example of Null Propagation

```python
# If $.missing_property doesn't exist...
Thing: int = JsonData(path='$.missing_property')

# This rule becomes Null (NOT False)
MyRule = Rule(when_all=[
    Thing > 1,
])

# This rule ALSO becomes Null (propagates!)
DependentRule = Rule(when_all=[
    MyRule,
])
```
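
To make the propagation concrete, here is a small Python model that uses `None` for Null. It is a sketch, not Osprey's evaluator. `greater_than` and `when_all` are hypothetical helpers, and the in-order evaluation (an earlier `False` ends the rule before a later Null is reached) is an assumption chosen so that an explicit null guard placed first in `when_all` behaves as this guide describes.

```python
def greater_than(value, threshold):
    """Comparison that yields None (Null) when its input is None."""
    return None if value is None else value > threshold


def when_all(conditions):
    """Evaluate conditions in order: Null propagates, False fails the rule."""
    for condition in conditions:
        if condition is None:
            return None   # Null, not False
        if condition is False:
            return False  # assumed: a failed guard stops evaluation here
    return True


thing = None  # models a missing $.missing_property
my_rule = when_all([greater_than(thing, 1)])   # Null
dependent_rule = when_all([my_rule])            # Null propagates
guarded_rule = when_all([thing is not None, greater_than(thing, 1)])  # False
```
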

### Solutions to Null Issues

**Solution 1: Use `required=False`**
```python
Thing: Optional[int] = JsonData(path='$.maybe_exists', required=False)
```

**Solution 2: Explicit null checks**
```python
SafeRule = Rule(when_all=[
    Thing != Null,  # Guard against null
    Thing > 1,
])
```

**Solution 3: Use ResolveOptional**
```python
# MaybeThing is an optional feature, declared as in Solution 1
MaybeThing: Optional[int] = JsonData(path='$.maybe_exists', required=False)

SafeThing: int = ResolveOptional(
    optional_value=MaybeThing,
    default_value=0
)
```
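
In Python terms, the behaviour described for `ResolveOptional` amounts to a null-coalescing default. The sketch below is an assumption about its semantics based on the parameter names, with `None` standing in for Null; `resolve_optional` is a hypothetical helper, not the Osprey UDF.

```python
def resolve_optional(optional_value, default_value):
    """Return default_value only when optional_value is None (Null)."""
    # An explicit None check is used instead of `or`, so falsy values
    # such as 0 or '' are kept rather than replaced by the default.
    return default_value if optional_value is None else optional_value


missing = resolve_optional(None, 0)
present = resolve_optional(5, 0)
zero_kept = resolve_optional(0, 7)
```
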

---

## Complete Examples

### Example 1: Basic Spam Detection

```python
# models/base.sml
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True
)

MessageText: str = JsonData(path='$.message.text')
EventType: str = JsonData(path='$.event_type')

# rules/spam.sml
Import(rules=['models/base.sml'])

MessageLength = StringLength(s=MessageText)

ContainsSpamWords = RegexMatch(
    target=MessageText,
    pattern=r'(free money|click here|buy now)',
    case_insensitive=True
)

SpamMessage = Rule(
    when_all=[
        EventType == 'send_message',
        ContainsSpamWords,
        MessageLength > 50,
    ],
    description=f'Spam detected from user {UserId}'
)

WhenRules(
    rules_any=[SpamMessage],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='spammer', expires_after=TimeDelta(days=30)),
    ],
)
```

### Example 2: New Account Risk Detection

```python
# models/user.sml
Import(rules=['models/base.sml'])

AccountCreatedAt: str = JsonData(path='$.user.created_at')
AccountAge = TimeSince(timestamp=AccountCreatedAt)
IsNewAccount = AccountAge < TimeDelta(days=7)

EmailAddress: Entity[str] = EntityJson(
    type='Email',
    path='$.user.email',
    coerce_type=True
)

EmailDomainStr = EmailDomain(email=JsonData(path='$.user.email'))

# rules/new_account.sml
Import(rules=['models/base.sml', 'models/user.sml'])

IsSuspiciousEmailDomain = EmailDomainStr in ['tempmail.com', 'throwaway.net']

HighRiskNewAccount = Rule(
    when_all=[
        IsNewAccount,
        IsSuspiciousEmailDomain,
        not HasLabel(entity=UserId, label='verified'),
    ],
    description=f'High-risk new account: {UserId} using {EmailAddress}'
)

WhenRules(
    rules_any=[HighRiskNewAccount],
    then=[
        DeclareVerdict(verdict='challenge'),
        LabelAdd(entity=UserId, label='needs_verification'),
        LabelAdd(entity=EmailAddress, label='suspicious_domain'),
    ],
)
```

### Example 3: Multi-Tier Detection

```python
Import(rules=['models/base.sml', 'models/user.sml'])

# Tier 1: Basic suspicious activity
BasicSuspicious = Rule(
    when_all=[
        HasLabel(entity=UserId, label='previously_warned'),
        EventType == 'create_post',
    ],
    description=f'Previously warned user {UserId} creating content'
)

# Tier 2: Escalated risk
EscalatedRisk = Rule(
    when_all=[
        BasicSuspicious,
        HasLabel(entity=UserId, label='multiple_violations'),
    ],
    description=f'Repeat offender {UserId} detected'
)

# Tier 3: Severe risk
SevereRisk = Rule(
    when_all=[
        EscalatedRisk,
        IsNewAccount,
    ],
    description=f'Severe risk: new repeat offender {UserId}'
)

WhenRules(
    rules_any=[BasicSuspicious, EscalatedRisk, SevereRisk],
    then=[
        # Always apply basic flag
        LabelAdd(entity=UserId, label='flagged'),
        # Conditional escalations
        LabelAdd(entity=UserId, label='review_queue', apply_if=EscalatedRisk),
        DeclareVerdict(verdict='reject', apply_if=SevereRisk),
    ],
)
```

---

## Common Patterns

### Pattern 1: Safe Field Access

```python
# For potentially missing fields, use required=False
MaybeField: Optional[str] = JsonData(path='$.optional.field', required=False)

# Then check for null before using
SafeRule = Rule(when_all=[
    MaybeField != Null,
    StringLength(s=MaybeField) > 10,
])
```

### Pattern 2: Action-Specific Rules

```python
# main.sml
ActionName = GetActionName()
Require(rule=f'actions/{ActionName}.sml')
```

### Pattern 3: Reusable Feature Definitions

```python
# models/features.sml
MessageLength = StringLength(s=MessageText)
IsLongMessage = MessageLength > 500
IsShortMessage = MessageLength < 10
ContainsUrls = ListLength(list=StringExtractURLs(s=MessageText)) > 0

# rules/detection.sml
Import(rules=['models/features.sml'])

SuspiciousLongMessage = Rule(when_all=[
    IsLongMessage,
    ContainsUrls,
])
```

### Pattern 4: Label-Based State Machine

```python
# First offense
FirstOffense = Rule(when_all=[
    ViolatesPolicy,
    not HasLabel(entity=UserId, label='warned'),
])

# Second offense
SecondOffense = Rule(when_all=[
    ViolatesPolicy,
    HasLabel(entity=UserId, label='warned'),
    not HasLabel(entity=UserId, label='suspended'),
])

WhenRules(
    rules_any=[FirstOffense],
    then=[
        LabelAdd(entity=UserId, label='warned', expires_after=TimeDelta(days=30)),
    ],
)

WhenRules(
    rules_any=[SecondOffense],
    then=[
        LabelAdd(entity=UserId, label='suspended'),
        DeclareVerdict(verdict='reject'),
    ],
)
```
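
To see how the labels act as state, the pattern can be traced in plain Python. This is a simplified simulation, not Osprey behaviour: `handle_violation` is a hypothetical function, both rules are assumed to read the labels as they were before any effect runs, and label expiry is ignored.

```python
from typing import Optional


def handle_violation(labels: set[str]) -> tuple[set[str], Optional[str]]:
    """Return (labels after effects, verdict) for one policy-violating action."""
    # Both rules are evaluated against the same label snapshot.
    first_offense = "warned" not in labels
    second_offense = "warned" in labels and "suspended" not in labels

    new_labels = set(labels)
    verdict = None
    if first_offense:
        new_labels.add("warned")
    if second_offense:
        new_labels.add("suspended")
        verdict = "reject"
    return new_labels, verdict


# Three violations in a row by the same user.
labels: set[str] = set()
history = []
for _ in range(3):
    labels, verdict = handle_violation(labels)
    history.append((sorted(labels), verdict))

# On the third violation neither rule matches, so these two rules alone
# produce no verdict for a user who is already suspended.
```
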

---

## Validation Rules

SML validates rules at compile time. Common validation errors:

| Error | Cause | Fix |
|-------|-------|-----|
| "rules must be stored in non-local features" | Rule name starts with `_` | Remove underscore prefix |
| "use local features when possible" | Feature that isn't useful to a moderator or evaluator in the UI does not start with `_` | Add underscore prefix |
| "requires either a string literal or an f-string" | Using a variable for description | Use string or f-string literal |
| "import rules are not sorted" | Import list not alphabetized | Sort imports lexicographically |
| "imported file not found" | Invalid path in Import | Check file path exists |
| "unknown label" | Label not in config | Add label to labels config |
| "invalid regex pattern" | Bad regex syntax | Fix regex pattern |

---

## Quick Reference

### Rule Definition
```python
RuleName = Rule(
    when_all=[conditions],
    description='string',  # or f'f-string with {Variable}'
)
```

### Effect Wiring
```python
WhenRules(
    rules_any=[Rule1, Rule2],
    then=[Effect1, Effect2],
)
```

### Null Safety Checklist
- [ ] Check if optional fields use `required=False`
- [ ] Add explicit `!= Null` checks before using potentially null values
- [ ] Consider using `ResolveOptional` for default values
- [ ] Test rules with missing data scenarios