# SML Rules Guide for AI Agents

This document provides a comprehensive guide to writing SML (Some Madeup Language) rules in Osprey. SML is a statically-typed subset of Python designed for writing detection and classification rules.

## Table of Contents

1. [Overview](#overview)
2. [Basic Concepts](#basic-concepts)
3. [Rule Structure](#rule-structure)
4. [Data Types](#data-types)
5. [Operators](#operators)
6. [File Organization](#file-organization)
7. [Wiring Rules to Effects](#wiring-rules-to-effects)
8. [Null Handling](#null-handling-critical)
9. [Complete Examples](#complete-examples)
10. [Common Patterns](#common-patterns)
11. [Validation Rules](#validation-rules)
12. [Quick Reference](#quick-reference)

---

## Overview

SML rules evaluate incoming action data and trigger effects such as verdicts and labels. Key characteristics:

- **Statically typed**: All types are checked at validation time
- **Python-like syntax**: Familiar syntax with restrictions for safety
- **Stateless logic**: Rules define conditions, not procedures
- **Event-driven**: Rules evaluate against incoming action JSON data

---

## Basic Concepts

### What is a Rule?

A rule is a named boolean expression that evaluates conditions against action data:

```python
MyRule = Rule(
    when_all=[
        # List of conditions - ALL must be True for the rule to pass
        Condition1,
        Condition2,
    ],
    description='Human-readable description of what this rule detects'
)
```

### What is an Effect?

Effects are actions taken when rules pass. They are triggered through `WhenRules()`:

- `DeclareVerdict(verdict='reject')` - Returns a verdict to the caller
- `LabelAdd(entity=UserId, label='flagged')` - Adds a label to an entity
- `LabelRemove(entity=UserId, label='flagged')` - Removes a label from an entity

### What is an Entity?
Entities are typed identifiers (like User IDs, emails, IPs) that can have labels attached:

```python
UserId: Entity[str] = EntityJson(type='User', path='$.user_id')
```

---

## Rule Structure

### Basic Rule Definition

```python
RuleName = Rule(
    when_all=[
        # Conditions go here - ALL must be True
    ],
    description='Description string or f-string'
)
```

### Syntax Requirements

1. **Rule names must be non-local variables** (cannot start with `_`):

   ```python
   # Valid
   MyRule = Rule(...)

   # Invalid - will fail validation
   _MyRule = Rule(...)
   ```

2. **Descriptions must be string or f-string literals** (not variables):

   ```python
   # Valid
   description='Static description'
   description=f'User {UserId} triggered rule'

   # Invalid - will fail validation
   my_desc = 'description'
   description=my_desc
   ```

3. **when_all accepts a list of boolean conditions**:
   - All conditions must evaluate to True for the rule to pass
   - Conditions can be comparisons, function calls, other rules, or boolean values

---

## Data Types

### Basic Types

```python
# Integer
Count: int = JsonData(path='$.count')

# String
Name: str = JsonData(path='$.name')

# Boolean
IsActive: bool = JsonData(path='$.active')

# List
Items: list = JsonData(path='$.items')
```

### Entity Types

Entities are special types for identifiers that can have labels:

```python
# Entity with string ID
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True  # Optional: convert value to expected type
)

# Entity with integer ID
PostId: Entity[int] = EntityJson(
    type='Post',
    path='$.post_id'
)

# Manually created entity
MyEntity = Entity(type='MyType', id='some_value')
```

### Optional Types

For fields that may not exist:

```python
# Optional field - won't fail if missing
OptionalField: Optional[str] = JsonData(path='$.maybe_exists', required=False)
```

---

## Operators

### Comparison Operators

```python
Value == 5              # Equals
Value != 5              # Not equals
Value > 5               # Greater than
Value >= 5              # Greater than or equal
Value < 5               # Less than
Value <= 5              # Less than or equal
Value in [1, 2, 3]      # In list
Value not in [1, 2, 3]  # Not in list
```

### Arithmetic Operators

```python
5 + 3   # Addition
5 - 3   # Subtraction
5 * 3   # Multiplication
5 / 3   # Division
5 // 3  # Floor division
5 % 3   # Modulo
5 ** 3  # Power
```

### Boolean Operators

```python
Condition1 and Condition2  # Logical AND
Condition1 or Condition2   # Logical OR
not Condition1             # Logical NOT

# In when_all, conditions are implicitly AND-ed:
Rule(when_all=[
    Cond1,
    Cond2,  # Both must be True
])

# Use 'or' for explicit OR logic:
Rule(when_all=[
    (Cond1 or Cond2),  # Either Cond1 or Cond2
    Cond3,             # AND Cond3
])
```

### Null Checking

```python
Value != Null  # Check if value is NOT null
Value == Null  # Check if value IS null
```

---

## File Organization

### Import - Include Other Files

Use `Import` to include rules/features from other files:

```python
Import(rules=[
    'models/base.sml',
    'models/user.sml',
    'rules/common.sml',
])
```

**Requirements:**

- File paths must be relative
- List must be **lexicographically sorted**
- No duplicates allowed
- Imported variables/rules are accessible in current file

### Require - Conditionally Include Files

Use `Require` for conditional or template-based includes:

```python
# Always include
Require(rule='expensive_check.sml')

# Conditional include
Require(rule='ai_check.sml', require_if=ActionName == 'register')

# Template-based (f-string)
Require(rule=f'actions/{ActionName}.sml')
```

**Note:** Unlike Import, outputs from Required files are NOT accessible in the parent file.

### Typical File Structure

```
rules/
├── main.sml              # Entry point
├── models/
│   ├── base.sml          # Common entities (UserId, etc.)
│   ├── user.sml          # User-related features
│   └── content.sml       # Content-related features
├── rules/
│   ├── spam.sml          # Spam detection rules
│   └── abuse.sml         # Abuse detection rules
└── actions/
    ├── register.sml      # Action-specific rules
    └── send_message.sml
```

---

## Wiring Rules to Effects

Rules by themselves don't do anything.
Use `WhenRules()` to connect rules to effects:

```python
WhenRules(
    rules_any=[
        Rule1,
        Rule2,
        Rule3,
    ],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='flagged'),
    ],
)
```

**Semantics:**

- If **ANY** rule in `rules_any` evaluates to True, **ALL** effects in `then` execute
- Failed rules don't prevent other rules from being checked
- Failed effects don't prevent other effects from executing

### Conditional Effects

Use `apply_if` to make individual effects conditional:

```python
WhenRules(
    rules_any=[MainRule],
    then=[
        LabelAdd(entity=UserId, label='basic_flag'),
        LabelAdd(entity=UserId, label='severe_flag', apply_if=SevereRule),
        LabelAdd(entity=UserId, label='repeat_offender', apply_if=RepeatOffenderRule),
    ],
)
```

---

## Null Handling (CRITICAL)

SML has unique null semantics that differ from most languages. **Understanding this is critical.**

### Null Propagation Rule

If a value evaluates to Null:

1. The containing rule evaluates to **Null** (not False!)
2. Any rule depending on it also becomes **Null**
3. This propagates through the entire dependency chain

### Example of Null Propagation

```python
# If $.missing_property doesn't exist...
Thing: int = JsonData(path='$.missing_property')

# This rule becomes Null (NOT False)
MyRule = Rule(when_all=[
    Thing > 1,
])

# This rule ALSO becomes Null (propagates!)
DependentRule = Rule(when_all=[
    MyRule,
])
```

### Solutions to Null Issues

**Solution 1: Use `required=False`**

```python
Thing: Optional[int] = JsonData(path='$.maybe_exists', required=False)
```

**Solution 2: Explicit null checks**

```python
SafeRule = Rule(when_all=[
    Thing != Null,  # Guard against null
    Thing > 1,
])
```

**Solution 3: Use ResolveOptional**

```python
# Optional source value (same pattern as Solution 1)
MaybeThing: Optional[int] = JsonData(path='$.maybe_exists', required=False)

SafeThing: int = ResolveOptional(
    optional_value=MaybeThing,
    default_value=0
)
```

---

## Complete Examples

### Example 1: Basic Spam Detection

```python
# models/base.sml
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True
)
MessageText: str = JsonData(path='$.message.text')
EventType: str = JsonData(path='$.event_type')

# rules/spam.sml
Import(rules=['models/base.sml'])

MessageLength = StringLength(s=MessageText)
ContainsSpamWords = RegexMatch(
    target=MessageText,
    pattern=r'(free money|click here|buy now)',
    case_insensitive=True
)

SpamMessage = Rule(
    when_all=[
        EventType == 'send_message',
        ContainsSpamWords,
        MessageLength > 50,
    ],
    description=f'Spam detected from user {UserId}'
)

WhenRules(
    rules_any=[SpamMessage],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='spammer', expires_after=TimeDelta(days=30)),
    ],
)
```

### Example 2: New Account Risk Detection

```python
# models/user.sml
Import(rules=['models/base.sml'])

AccountCreatedAt: str = JsonData(path='$.user.created_at')
AccountAge = TimeSince(timestamp=AccountCreatedAt)
IsNewAccount = AccountAge < TimeDelta(days=7)

EmailAddress: Entity[str] = EntityJson(
    type='Email',
    path='$.user.email',
    coerce_type=True
)
EmailDomainStr = EmailDomain(email=JsonData(path='$.user.email'))

# rules/new_account.sml
Import(rules=['models/base.sml', 'models/user.sml'])

IsSuspiciousEmailDomain = EmailDomainStr in ['tempmail.com', 'throwaway.net']

HighRiskNewAccount = Rule(
    when_all=[
        IsNewAccount,
        IsSuspiciousEmailDomain,
        not HasLabel(entity=UserId, label='verified'),
    ],
    description=f'High-risk new account: {UserId} using {EmailAddress}'
)

WhenRules(
    rules_any=[HighRiskNewAccount],
    then=[
        DeclareVerdict(verdict='challenge'),
        LabelAdd(entity=UserId, label='needs_verification'),
        LabelAdd(entity=EmailAddress, label='suspicious_domain'),
    ],
)
```

### Example 3: Multi-Tier Detection

```python
Import(rules=['models/base.sml', 'models/user.sml'])

# Tier 1: Basic suspicious activity
BasicSuspicious = Rule(
    when_all=[
        HasLabel(entity=UserId, label='previously_warned'),
        EventType == 'create_post',
    ],
    description=f'Previously warned user {UserId} creating content'
)

# Tier 2: Escalated risk
EscalatedRisk = Rule(
    when_all=[
        BasicSuspicious,
        HasLabel(entity=UserId, label='multiple_violations'),
    ],
    description=f'Repeat offender {UserId} detected'
)

# Tier 3: Severe risk
SevereRisk = Rule(
    when_all=[
        EscalatedRisk,
        IsNewAccount,
    ],
    description=f'Severe risk: new repeat offender {UserId}'
)

WhenRules(
    rules_any=[BasicSuspicious, EscalatedRisk, SevereRisk],
    then=[
        # Always apply basic flag
        LabelAdd(entity=UserId, label='flagged'),
        # Conditional escalations
        LabelAdd(entity=UserId, label='review_queue', apply_if=EscalatedRisk),
        DeclareVerdict(verdict='reject', apply_if=SevereRisk),
    ],
)
```

---

## Common Patterns

### Pattern 1: Safe Field Access

```python
# For potentially missing fields, use required=False
MaybeField: Optional[str] = JsonData(path='$.optional.field', required=False)

# Then check for null before using
SafeRule = Rule(when_all=[
    MaybeField != Null,
    StringLength(s=MaybeField) > 10,
])
```

### Pattern 2: Action-Specific Rules

```python
# main.sml
ActionName = GetActionName()
Require(rule=f'actions/{ActionName}.sml')
```

### Pattern 3: Reusable Feature Definitions

```python
# models/features.sml
MessageLength = StringLength(s=MessageText)
IsLongMessage = MessageLength > 500
IsShortMessage = MessageLength < 10
ContainsUrls = ListLength(list=StringExtractURLs(s=MessageText)) > 0

# rules/detection.sml
Import(rules=['models/features.sml'])

SuspiciousLongMessage = Rule(when_all=[
    IsLongMessage,
    ContainsUrls,
])
```

### Pattern 4: Label-Based State Machine

```python
# First offense
FirstOffense = Rule(when_all=[
    ViolatesPolicy,
    not HasLabel(entity=UserId, label='warned'),
])

# Second offense
SecondOffense = Rule(when_all=[
    ViolatesPolicy,
    HasLabel(entity=UserId, label='warned'),
    not HasLabel(entity=UserId, label='suspended'),
])

WhenRules(
    rules_any=[FirstOffense],
    then=[
        LabelAdd(entity=UserId, label='warned', expires_after=TimeDelta(days=30)),
    ],
)

WhenRules(
    rules_any=[SecondOffense],
    then=[
        LabelAdd(entity=UserId, label='suspended'),
        DeclareVerdict(verdict='reject'),
    ],
)
```

---

## Validation Rules

SML validates rules at compile time. Common validation errors:

| Error | Cause | Fix |
|-------|-------|-----|
| "rules must be stored in non-local features" | Rule name starts with `_` | Remove underscore prefix |
| "use local feaures when possible" | A feature that isn't useful to a moderator or evaluator in the UI does not start with `_` | Add underscore prefix |
| "requires either a string literal or an f-string" | Using a variable for description | Use string or f-string literal |
| "import rules are not sorted" | Import list not alphabetized | Sort imports lexicographically |
| "imported file not found" | Invalid path in Import | Check file path exists |
| "unknown label" | Label not in config | Add label to labels config |
| "invalid regex pattern" | Bad regex syntax | Fix regex pattern |

---

## Quick Reference

### Rule Definition

```python
RuleName = Rule(
    when_all=[conditions],
    description='string'  # or f'f-string with {Variable}'
)
```

### Effect Wiring

```python
WhenRules(
    rules_any=[Rule1, Rule2],
    then=[Effect1, Effect2],
)
```

### Null Safety Checklist

- [ ] Check if optional fields use `required=False`
- [ ] Add explicit `!= Null` checks before using potentially null values
- [ ] Consider using `ResolveOptional` for default values
- [ ] Test rules with missing data scenarios
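
Before testing rules against missing data, it can help to model the null-propagation behavior from the Null Handling section in plain Python. The sketch below is hypothetical and does not reproduce Osprey's implementation. Python's `None` stands in for `Null`. The helpers `gt`, `is_not_null`, `resolve_optional`, and `rule` are illustrative names only. The sketch also assumes that `when_all` conditions are evaluated left to right and that evaluation stops at the first False or Null condition. That assumption is consistent with the `!= Null` guard in Solution 2, but this guide does not state it.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model of SML null semantics: Python's None plays the role of Null.
NULL = None
TriBool = Optional[bool]  # True, False, or Null


def gt(value: Optional[int], threshold: int) -> TriBool:
    """Model of `value > threshold`: a Null operand yields Null, not False."""
    if value is NULL:
        return NULL
    return value > threshold


def is_not_null(value: object) -> bool:
    """Model of `value != Null`: an explicit null check is always a plain bool."""
    return value is not NULL


def resolve_optional(optional_value: Optional[int], default_value: int) -> int:
    """Model of ResolveOptional: substitute a default when the value is Null."""
    return default_value if optional_value is NULL else optional_value


def rule(when_all: Sequence[Callable[[], TriBool]]) -> TriBool:
    """Model of Rule(when_all=[...]) under the assumed left-to-right evaluation.

    Returns False at the first False condition, Null at the first Null
    condition, and True only if every condition is True.
    """
    for condition in when_all:
        result = condition()
        if result is NULL:
            return NULL
        if not result:
            return False
    return True


# Action JSON without $.missing_property, so the extracted value is Null.
action: dict = {}
thing = action.get('missing_property')

my_rule = rule([lambda: gt(thing, 1)])            # Null, not False
dependent_rule = rule([lambda: my_rule])          # Null propagates to dependents
safe_rule = rule([lambda: is_not_null(thing),     # guard is False first...
                  lambda: gt(thing, 1)])          # ...so this is never evaluated
defaulted_rule = rule([lambda: gt(resolve_optional(thing, 0), 1)])

print(my_rule, dependent_rule, safe_rule, defaulted_rule)  # None None False False
```

In this model, the unguarded rule and its dependent both end up Null. The guarded rule and the `ResolveOptional`-style rule fail cleanly with False instead.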