An MCP server for Osprey
# SML Rules Guide for AI Agents

This document provides a comprehensive guide to writing SML (Some Madeup Language) rules in Osprey. SML is a statically-typed subset of Python designed for writing detection and classification rules.

## Table of Contents

1. [Overview](#overview)
2. [Basic Concepts](#basic-concepts)
3. [Rule Structure](#rule-structure)
4. [Data Types](#data-types)
5. [Operators](#operators)
6. [Built-in Functions (UDFs)](#built-in-functions-udfs)
7. [File Organization](#file-organization)
8. [Wiring Rules to Effects](#wiring-rules-to-effects)
9. [Null Handling](#null-handling-critical)
10. [Complete Examples](#complete-examples)
11. [Common Patterns](#common-patterns)
12. [Validation Rules](#validation-rules)

---

## Overview

SML rules are used to evaluate incoming action data and trigger effects like verdicts and labels. Key characteristics:

- **Statically typed**: All types are checked at validation time
- **Python-like syntax**: Familiar syntax with restrictions for safety
- **Stateless logic**: Rules define conditions, not procedures
- **Event-driven**: Rules evaluate against incoming action JSON data

---

## Basic Concepts

### What is a Rule?

A rule is a named boolean expression that evaluates conditions against action data:

```python
MyRule = Rule(
    when_all=[
        # List of conditions - ALL must be True for the rule to pass
        Condition1,
        Condition2,
    ],
    description='Human-readable description of what this rule detects'
)
```

### What is an Effect?

Effects are actions taken when rules pass. They are triggered through `WhenRules()`:

- `DeclareVerdict(verdict='reject')` - Returns a verdict to the caller
- `LabelAdd(entity=UserId, label='flagged')` - Adds a label to an entity
- `LabelRemove(entity=UserId, label='flagged')` - Removes a label from an entity

### What is an Entity?

Entities are typed identifiers (like User IDs, emails, IPs) that can have labels attached:

```python
UserId: Entity[str] = EntityJson(type='User', path='$.user_id')
```

---

## Rule Structure

### Basic Rule Definition

```python
RuleName = Rule(
    when_all=[
        # Conditions go here - ALL must be True
    ],
    description='Description string or f-string'
)
```

### Syntax Requirements

1. **Rule names must be non-local variables** (cannot start with `_`):
   ```python
   # Valid
   MyRule = Rule(...)

   # Invalid - will fail validation
   _MyRule = Rule(...)
   ```

2. **Descriptions must be string or f-string literals** (not variables):
   ```python
   # Valid
   description='Static description'
   description=f'User {UserId} triggered rule'

   # Invalid - will fail validation
   my_desc = 'description'
   description=my_desc
   ```

3. **when_all accepts a list of boolean conditions**:
   - All conditions must evaluate to True for the rule to pass
   - Conditions can be comparisons, function calls, other rules, or boolean values

---

## Data Types

### Basic Types

```python
# Integer
Count: int = JsonData(path='$.count')

# String
Name: str = JsonData(path='$.name')

# Boolean
IsActive: bool = JsonData(path='$.active')

# List
Items: list = JsonData(path='$.items')
```

### Entity Types

Entities are special types for identifiers that can have labels:

```python
# Entity with string ID
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True  # Optional: convert value to expected type
)

# Entity with integer ID
PostId: Entity[int] = EntityJson(
    type='Post',
    path='$.post_id'
)

# Manually created entity
MyEntity = Entity(type='MyType', id='some_value')
```

### Optional Types

For fields that may not exist:

```python
# Optional field - won't fail if missing
OptionalField: Optional[str] = JsonData(path='$.maybe_exists', required=False)
```

---

## Operators

### Comparison Operators

```python
Value == 5              # Equals
Value != 5              # Not equals
Value > 5               # Greater than
Value >= 5              # Greater than or equal
Value < 5               # Less than
Value <= 5              # Less than or equal
Value in [1, 2, 3]      # In list
Value not in [1, 2, 3]  # Not in list
```

### Arithmetic Operators

```python
5 + 3   # Addition
5 - 3   # Subtraction
5 * 3   # Multiplication
5 / 3   # Division
5 // 3  # Floor division
5 % 3   # Modulo
5 ** 3  # Power
```

### Boolean Operators

```python
Condition1 and Condition2  # Logical AND
Condition1 or Condition2   # Logical OR
not Condition1             # Logical NOT

# In when_all, conditions are implicitly AND-ed:
Rule(when_all=[
    Cond1,
    Cond2,  # Both must be True
])

# Use 'or' for explicit OR logic:
Rule(when_all=[
    (Cond1 or Cond2),  # Either Cond1 or Cond2
    Cond3,             # AND Cond3
])
```

### Null Checking

```python
Value != Null  # Check if value is NOT null
Value == Null  # Check if value IS null
```

---

## File Organization

### Import - Include Other Files

Use `Import` to include rules/features from other files:

```python
Import(rules=[
    'models/base.sml',
    'models/user.sml',
    'rules/common.sml',
])
```

**Requirements:**
- File paths must be relative
- List must be **lexicographically sorted**
- No duplicates allowed
- Imported variables/rules are accessible in current file

### Require - Conditionally Include Files

Use `Require` for conditional or template-based includes:

```python
# Always include
Require(rule='expensive_check.sml')

# Conditional include
Require(rule='ai_check.sml', require_if=ActionName == 'register')

# Template-based (f-string)
Require(rule=f'actions/{ActionName}.sml')
```

**Note:** Unlike Import, outputs from Required files are NOT accessible in the parent file.

### Typical File Structure

```
rules/
├── main.sml              # Entry point
├── models/
│   ├── base.sml          # Common entities (UserId, etc.)
│   ├── user.sml          # User-related features
│   └── content.sml       # Content-related features
├── rules/
│   ├── spam.sml          # Spam detection rules
│   └── abuse.sml         # Abuse detection rules
└── actions/
    ├── register.sml      # Action-specific rules
    └── send_message.sml
```

---

## Wiring Rules to Effects

Rules by themselves don't do anything. Use `WhenRules()` to connect rules to effects:

```python
WhenRules(
    rules_any=[
        Rule1,
        Rule2,
        Rule3,
    ],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='flagged'),
    ],
)
```

**Semantics:**
- If **ANY** rule in `rules_any` evaluates to True, **ALL** effects in `then` execute
- Failed rules don't prevent other rules from being checked
- Failed effects don't prevent other effects from executing

### Conditional Effects

Use `apply_if` to make individual effects conditional:

```python
WhenRules(
    rules_any=[MainRule],
    then=[
        LabelAdd(entity=UserId, label='basic_flag'),
        LabelAdd(entity=UserId, label='severe_flag', apply_if=SevereRule),
        LabelAdd(entity=UserId, label='repeat_offender', apply_if=RepeatOffenderRule),
    ],
)
```

---

## Null Handling (CRITICAL)

SML has unique null semantics that differ from most languages. **Understanding this is critical.**

### Null Propagation Rule

If a value evaluates to Null:
1. The containing rule evaluates to **Null** (not False!)
2. Any rule depending on it also becomes **Null**
3. This propagates through the entire dependency chain

### Example of Null Propagation

```python
# If $.missing_property doesn't exist...
Thing: int = JsonData(path='$.missing_property')

# This rule becomes Null (NOT False)
MyRule = Rule(when_all=[
    Thing > 1,
])

# This rule ALSO becomes Null (propagates!)
DependentRule = Rule(when_all=[
    MyRule,
])
```

### Solutions to Null Issues

**Solution 1: Use `required=False`**
```python
Thing: Optional[int] = JsonData(path='$.maybe_exists', required=False)
```

**Solution 2: Explicit null checks**
```python
SafeRule = Rule(when_all=[
    Thing != Null,  # Guard against null
    Thing > 1,
])
```

**Solution 3: Use ResolveOptional**
```python
SafeThing: int = ResolveOptional(
    optional_value=MaybeThing,
    default_value=0
)
```

---

## Complete Examples

### Example 1: Basic Spam Detection

```python
# models/base.sml
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True
)

MessageText: str = JsonData(path='$.message.text')
EventType: str = JsonData(path='$.event_type')

# rules/spam.sml
Import(rules=['models/base.sml'])

MessageLength = StringLength(s=MessageText)

ContainsSpamWords = RegexMatch(
    target=MessageText,
    pattern=r'(free money|click here|buy now)',
    case_insensitive=True
)

SpamMessage = Rule(
    when_all=[
        EventType == 'send_message',
        ContainsSpamWords,
        MessageLength > 50,
    ],
    description=f'Spam detected from user {UserId}'
)

WhenRules(
    rules_any=[SpamMessage],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='spammer', expires_after=TimeDelta(days=30)),
    ],
)
```

### Example 2: New Account Risk Detection

```python
# models/user.sml
Import(rules=['models/base.sml'])

AccountCreatedAt: str = JsonData(path='$.user.created_at')
AccountAge = TimeSince(timestamp=AccountCreatedAt)
IsNewAccount = AccountAge < TimeDelta(days=7)

EmailAddress: Entity[str] = EntityJson(
    type='Email',
    path='$.user.email',
    coerce_type=True
)

EmailDomainStr = EmailDomain(email=JsonData(path='$.user.email'))

# rules/new_account.sml
Import(rules=['models/base.sml', 'models/user.sml'])

IsSuspiciousEmailDomain = EmailDomainStr in ['tempmail.com', 'throwaway.net']

HighRiskNewAccount = Rule(
    when_all=[
        IsNewAccount,
        IsSuspiciousEmailDomain,
        not HasLabel(entity=UserId, label='verified'),
    ],
    description=f'High-risk new account: {UserId} using {EmailAddress}'
)

WhenRules(
    rules_any=[HighRiskNewAccount],
    then=[
        DeclareVerdict(verdict='challenge'),
        LabelAdd(entity=UserId, label='needs_verification'),
        LabelAdd(entity=EmailAddress, label='suspicious_domain'),
    ],
)
```

### Example 3: Multi-Tier Detection

```python
Import(rules=['models/base.sml', 'models/user.sml'])

# Tier 1: Basic suspicious activity
BasicSuspicious = Rule(
    when_all=[
        HasLabel(entity=UserId, label='previously_warned'),
        EventType == 'create_post',
    ],
    description=f'Previously warned user {UserId} creating content'
)

# Tier 2: Escalated risk
EscalatedRisk = Rule(
    when_all=[
        BasicSuspicious,
        HasLabel(entity=UserId, label='multiple_violations'),
    ],
    description=f'Repeat offender {UserId} detected'
)

# Tier 3: Severe risk
SevereRisk = Rule(
    when_all=[
        EscalatedRisk,
        IsNewAccount,
    ],
    description=f'Severe risk: new repeat offender {UserId}'
)

WhenRules(
    rules_any=[BasicSuspicious, EscalatedRisk, SevereRisk],
    then=[
        # Always apply basic flag
        LabelAdd(entity=UserId, label='flagged'),
        # Conditional escalations
        LabelAdd(entity=UserId, label='review_queue', apply_if=EscalatedRisk),
        DeclareVerdict(verdict='reject', apply_if=SevereRisk),
    ],
)
```

---

## Common Patterns

### Pattern 1: Safe Field Access

```python
# For potentially missing fields, use required=False
MaybeField: Optional[str] = JsonData(path='$.optional.field', required=False)

# Then check for null before using
SafeRule = Rule(when_all=[
    MaybeField != Null,
    StringLength(s=MaybeField) > 10,
])
```

### Pattern 2: Action-Specific Rules

```python
# main.sml
ActionName = GetActionName()
Require(rule=f'actions/{ActionName}.sml')
```

### Pattern 3: Reusable Feature Definitions

```python
# models/features.sml
MessageLength = StringLength(s=MessageText)
IsLongMessage = MessageLength > 500
IsShortMessage = MessageLength < 10
ContainsUrls = ListLength(list=StringExtractURLs(s=MessageText)) > 0

# rules/detection.sml
Import(rules=['models/features.sml'])

SuspiciousLongMessage = Rule(when_all=[
    IsLongMessage,
    ContainsUrls,
])
```

### Pattern 4: Label-Based State Machine

```python
# First offense
FirstOffense = Rule(when_all=[
    ViolatesPolicy,
    not HasLabel(entity=UserId, label='warned'),
])

# Second offense
SecondOffense = Rule(when_all=[
    ViolatesPolicy,
    HasLabel(entity=UserId, label='warned'),
    not HasLabel(entity=UserId, label='suspended'),
])

WhenRules(
    rules_any=[FirstOffense],
    then=[
        LabelAdd(entity=UserId, label='warned', expires_after=TimeDelta(days=30)),
    ],
)

WhenRules(
    rules_any=[SecondOffense],
    then=[
        LabelAdd(entity=UserId, label='suspended'),
        DeclareVerdict(verdict='reject'),
    ],
)
```

---

## Validation Rules

SML validates rules at compile time.
Common validation errors:

| Error | Cause | Fix |
|-------|-------|-----|
| "rules must be stored in non-local features" | Rule name starts with `_` | Remove underscore prefix |
| "use local features when possible" | Feature that isn't useful to a moderator or evaluator in the UI lacks the `_` prefix | Add underscore prefix |
| "requires either a string literal or an f-string" | Using a variable for description | Use string or f-string literal |
| "import rules are not sorted" | Import list not alphabetized | Sort imports lexicographically |
| "imported file not found" | Invalid path in Import | Check file path exists |
| "unknown label" | Label not in config | Add label to labels config |
| "invalid regex pattern" | Bad regex syntax | Fix regex pattern |

---

## Quick Reference

### Rule Definition
```python
RuleName = Rule(
    when_all=[conditions],
    description='string'  # or an f-string literal: f'f-string with {Variable}'
)
```

### Effect Wiring
```python
WhenRules(
    rules_any=[Rule1, Rule2],
    then=[Effect1, Effect2],
)
```

### Null Safety Checklist
- [ ] Check if optional fields use `required=False`
- [ ] Add explicit `!= Null` checks before using potentially null values
- [ ] Consider using `ResolveOptional` for default values
- [ ] Test rules with missing data scenarios
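
To reason about the last checklist item before running anything in Osprey, the null propagation rule from this guide can be rehearsed in plain Python. The sketch below is not SML and not Osprey's evaluator: `NULL`, `json_data`, `gt`, `rule`, and `resolve_optional` are hypothetical names invented here, and the model covers only the stated rule (any Null condition makes the rule Null) plus a default in the spirit of `ResolveOptional`. How Osprey treats an explicit `!= Null` guard is deliberately not modeled.

```python
# Plain-Python model of SML null propagation (illustrative only; not Osprey code).
# NULL, json_data, gt, rule, and resolve_optional are hypothetical helpers.


class _Null:
    """Sentinel standing in for SML's Null."""

    def __repr__(self):
        return "Null"


NULL = _Null()


def json_data(action, key):
    """Return action[key], or NULL when the key is missing."""
    return action.get(key, NULL)


def gt(value, threshold):
    """Greater-than that propagates NULL instead of returning False."""
    if value is NULL:
        return NULL
    return value > threshold


def rule(when_all):
    """NULL if any condition is NULL, otherwise the AND of all conditions."""
    if any(cond is NULL for cond in when_all):
        return NULL
    return all(when_all)


def resolve_optional(optional_value, default_value):
    """Substitute a default for NULL, mirroring the idea of ResolveOptional."""
    return default_value if optional_value is NULL else optional_value


action = {"user_id": "u1"}  # the 'count' field is missing from this action

thing = json_data(action, "count")
my_rule = rule([gt(thing, 1)])      # Null, not False
dependent_rule = rule([my_rule])    # Null propagates to the dependent rule

safe_thing = resolve_optional(thing, 0)
safe_rule = rule([gt(safe_thing, 1)])  # False: the default makes it decidable

print(my_rule, dependent_rule, safe_rule)  # prints: Null Null False
```

Swapping in an action that contains `count` (for example `{"count": 5}`) makes all three rules evaluate to `True`, which is a quick way to compare the present-data and missing-data cases side by side.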