SML Rules Guide for AI Agents#
This document provides a comprehensive guide to writing SML (Some Madeup Language) rules in Osprey. SML is a statically-typed subset of Python designed for writing detection and classification rules.
Table of Contents#
- Overview
- Basic Concepts
- Rule Structure
- Data Types
- Operators
- Built-in Functions (UDFs)
- File Organization
- Wiring Rules to Effects
- Null Handling
- Complete Examples
- Common Patterns
- Validation Rules
Overview#
SML rules are used to evaluate incoming action data and trigger effects like verdicts and labels. Key characteristics:
- Statically typed: All types are checked at validation time
- Python-like syntax: Familiar syntax with restrictions for safety
- Stateless logic: Rules define conditions, not procedures
- Event-driven: Rules evaluate against incoming action JSON data
Basic Concepts#
What is a Rule?#
A rule is a named boolean expression that evaluates conditions against action data:
MyRule = Rule(
when_all=[
# List of conditions - ALL must be True for the rule to pass
Condition1,
Condition2,
],
description='Human-readable description of what this rule detects'
)
What is an Effect?#
Effects are actions taken when rules pass. They are triggered through WhenRules():
- DeclareVerdict(verdict='reject') - Returns a verdict to the caller
- LabelAdd(entity=UserId, label='flagged') - Adds a label to an entity
- LabelRemove(entity=UserId, label='flagged') - Removes a label from an entity
What is an Entity?#
Entities are typed identifiers (like User IDs, emails, IPs) that can have labels attached:
UserId: Entity[str] = EntityJson(type='User', path='$.user_id')
Rule Structure#
Basic Rule Definition#
RuleName = Rule(
when_all=[
# Conditions go here - ALL must be True
],
description='Description string or f-string'
)
Syntax Requirements#
- Rule names must be non-local variables (cannot start with _):
# Valid
MyRule = Rule(...)
# Invalid - will fail validation
_MyRule = Rule(...)
- Descriptions must be string or f-string literals (not variables):
# Valid
description='Static description'
description=f'User {UserId} triggered rule'
# Invalid - will fail validation
my_desc = 'description'
description=my_desc
- when_all accepts a list of boolean conditions:
  - All conditions must evaluate to True for the rule to pass
  - Conditions can be comparisons, function calls, other rules, or boolean values
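The two literal-level requirements above (no leading underscore on rule names, literal descriptions) can be illustrated with a small Python sketch. It relies on SML sharing Python's syntax and uses the standard `ast` module to inspect the source. The helper name `check_rule_syntax` and its messages are hypothetical; this is not Osprey's validator, only a model of the checks described.

```python
import ast


def check_rule_syntax(source: str) -> list[str]:
    """Return a list of problems found in `Name = Rule(...)` assignments."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # Only look at plain assignments whose value is a call.
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        call = node.value
        if not (isinstance(call.func, ast.Name) and call.func.id == "Rule"):
            continue
        # Rule names must be non-local, i.e. must not start with '_'.
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id.startswith("_"):
                problems.append(f"{target.id}: rule name must not start with '_'")
        # description must be a string literal or an f-string literal.
        for kw in call.keywords:
            if kw.arg != "description":
                continue
            is_str = isinstance(kw.value, ast.Constant) and isinstance(kw.value.value, str)
            is_fstring = isinstance(kw.value, ast.JoinedStr)
            if not (is_str or is_fstring):
                problems.append("description must be a string or f-string literal")
    return problems


good = "MyRule = Rule(when_all=[A], description=f'User {UserId} triggered rule')"
bad = "_MyRule = Rule(when_all=[A], description=my_desc)"
print(check_rule_syntax(good))
print(check_rule_syntax(bad))
```

The `good` snippet produces no problems, while `bad` is reported twice: once for the underscore prefix and once for passing a variable as the description.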
Data Types#
Basic Types#
# Integer
Count: int = JsonData(path='$.count')
# String
Name: str = JsonData(path='$.name')
# Boolean
IsActive: bool = JsonData(path='$.active')
# List
Items: list = JsonData(path='$.items')
Entity Types#
Entities are special types for identifiers that can have labels:
# Entity with string ID
UserId: Entity[str] = EntityJson(
type='User',
path='$.user_id',
coerce_type=True # Optional: convert value to expected type
)
# Entity with integer ID
PostId: Entity[int] = EntityJson(
type='Post',
path='$.post_id'
)
# Manually created entity
MyEntity = Entity(type='MyType', id='some_value')
Optional Types#
For fields that may not exist:
# Optional field - won't fail if missing
OptionalField: Optional[str] = JsonData(path='$.maybe_exists', required=False)
Operators#
Comparison Operators#
Value == 5 # Equals
Value != 5 # Not equals
Value > 5 # Greater than
Value >= 5 # Greater than or equal
Value < 5 # Less than
Value <= 5 # Less than or equal
Value in [1, 2, 3] # In list
Value not in [1, 2, 3] # Not in list
Arithmetic Operators#
5 + 3 # Addition
5 - 3 # Subtraction
5 * 3 # Multiplication
5 / 3 # Division
5 // 3 # Floor division
5 % 3 # Modulo
5 ** 3 # Power
Boolean Operators#
Condition1 and Condition2 # Logical AND
Condition1 or Condition2 # Logical OR
not Condition1 # Logical NOT
# In when_all, conditions are implicitly AND-ed:
Rule(when_all=[
Cond1,
Cond2, # Both must be True
])
# Use 'or' for explicit OR logic:
Rule(when_all=[
(Cond1 or Cond2), # Either Cond1 or Cond2
Cond3, # AND Cond3
])
Null Checking#
Value != Null # Check if value is NOT null
Value == Null # Check if value IS null
File Organization#
Import - Include Other Files#
Use Import to include rules/features from other files:
Import(rules=[
'models/base.sml',
'models/user.sml',
'rules/common.sml',
])
Requirements:
- File paths must be relative
- List must be lexicographically sorted
- No duplicates allowed
- Imported variables/rules are accessible in current file
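The first three requirements are mechanical and can be checked before a file is submitted. The following Python sketch models them under the assumption that "lexicographically sorted" means plain string ordering; `check_import_list` is a hypothetical helper, not part of Osprey.

```python
import posixpath


def check_import_list(paths: list[str]) -> list[str]:
    """Return the requirements violated by an Import(rules=[...]) list."""
    problems = []
    if any(posixpath.isabs(p) for p in paths):
        problems.append("file paths must be relative")
    if len(set(paths)) != len(paths):
        problems.append("duplicates are not allowed")
    # Python compares strings lexicographically, so sorted() gives the expected order.
    if paths != sorted(paths):
        problems.append("import rules are not sorted")
    return problems


print(check_import_list(["models/base.sml", "models/user.sml", "rules/common.sml"]))
print(check_import_list(["rules/common.sml", "models/base.sml"]))
```

The first list is the one from the example above and passes cleanly; the second is reported as unsorted.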
Require - Conditionally Include Files#
Use Require for conditional or template-based includes:
# Always include
Require(rule='expensive_check.sml')
# Conditional include
Require(rule='ai_check.sml', require_if=ActionName == 'register')
# Template-based (f-string)
Require(rule=f'actions/{ActionName}.sml')
Note: Unlike Import, outputs from Required files are NOT accessible in the parent file.
Typical File Structure#
rules/
├── main.sml # Entry point
├── models/
│ ├── base.sml # Common entities (UserId, etc.)
│ ├── user.sml # User-related features
│ └── content.sml # Content-related features
├── rules/
│ ├── spam.sml # Spam detection rules
│ └── abuse.sml # Abuse detection rules
└── actions/
├── register.sml # Action-specific rules
└── send_message.sml
Wiring Rules to Effects#
Rules by themselves don't do anything. Use WhenRules() to connect rules to effects:
WhenRules(
rules_any=[
Rule1,
Rule2,
Rule3,
],
then=[
DeclareVerdict(verdict='reject'),
LabelAdd(entity=UserId, label='flagged'),
],
)
Semantics:
- If ANY rule in rules_any evaluates to True, ALL effects in then execute
- Failed rules don't prevent other rules from being checked
- Failed effects don't prevent other effects from executing
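These semantics can be modeled in a few lines of Python. The sketch below treats rules as already-evaluated booleans and effects as callables, and interprets a "failed effect" as one that raises an exception; that interpretation, and the function names, are assumptions made for illustration rather than a description of Osprey's internals.

```python
from typing import Callable


def when_rules(rules_any: list[bool], then: list[Callable[[], None]]) -> list[str]:
    """Run every effect if at least one rule passed; return errors from failed effects."""
    errors = []
    if not any(rules_any):
        return errors
    for effect in then:
        try:
            effect()
        except Exception as exc:
            # A failed effect is recorded but does not stop the remaining effects.
            errors.append(str(exc))
    return errors


applied = []


def declare_verdict():
    applied.append("verdict:reject")


def broken_effect():
    raise RuntimeError("label service unavailable")


def label_add():
    applied.append("label:flagged")


errors = when_rules(
    rules_any=[False, True, False],
    then=[declare_verdict, broken_effect, label_add],
)
print(applied, errors)
```

Only one of the three rules passes, yet both working effects run, and the failing effect in the middle does not block the one after it.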
Conditional Effects#
Use apply_if to make individual effects conditional:
WhenRules(
rules_any=[MainRule],
then=[
LabelAdd(entity=UserId, label='basic_flag'),
LabelAdd(entity=UserId, label='severe_flag', apply_if=SevereRule),
LabelAdd(entity=UserId, label='repeat_offender', apply_if=RepeatOffenderRule),
],
)
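As a rough Python model of the block above: each effect carries an optional condition, and an effect runs only when the WhenRules block triggers and its own condition holds. The `Effect` dataclass and `effects_to_run` function are hypothetical names; the sketch assumes an effect without apply_if behaves as if the condition were True.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    name: str
    apply_if: bool = True  # unconditional unless a condition is supplied


def effects_to_run(rules_any: list[bool], then: list[Effect]) -> list[str]:
    """Names of the effects that would execute for the given rule results."""
    if not any(rules_any):
        return []
    return [effect.name for effect in then if effect.apply_if]


# Assumed rule outcomes for one incoming action.
main_rule, severe_rule, repeat_offender_rule = True, False, True

selected = effects_to_run(
    rules_any=[main_rule],
    then=[
        Effect("basic_flag"),
        Effect("severe_flag", apply_if=severe_rule),
        Effect("repeat_offender", apply_if=repeat_offender_rule),
    ],
)
print(selected)
```

With these outcomes the basic flag and the repeat-offender label are applied, and the severe flag is skipped.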
Null Handling (CRITICAL)#
SML has unique null semantics that differ from most languages. Understanding this is critical.
Null Propagation Rule#
If a value evaluates to Null:
- The containing rule evaluates to Null (not False!)
- Any rule depending on it also becomes Null
- This propagates through the entire dependency chain
Example of Null Propagation#
# If $.missing_property doesn't exist...
Thing: int = JsonData(path='$.missing_property')
# This rule becomes Null (NOT False)
MyRule = Rule(when_all=[
Thing > 1,
])
# This rule ALSO becomes Null (propagates!)
DependentRule = Rule(when_all=[
MyRule,
])
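The propagation chain in this example can be traced with a small Python model that uses None for Null. It implements only the propagation rule as stated in this section (a Null input makes the comparison Null, and a Null condition makes the rule Null); it does not claim to reproduce how Osprey evaluates guards or other special cases, and all function names are hypothetical.

```python
from typing import Optional


def json_data(payload: dict, key: str) -> Optional[int]:
    """Stand-in for JsonData on a top-level key: a missing field becomes None (Null)."""
    return payload.get(key)


def greater_than(value: Optional[int], threshold: int) -> Optional[bool]:
    """A comparison on Null is itself Null."""
    return None if value is None else value > threshold


def rule(when_all: list) -> Optional[bool]:
    """Null if any condition is Null, otherwise the AND of all conditions."""
    if any(cond is None for cond in when_all):
        return None
    return all(when_all)


def evaluate(payload: dict):
    thing = json_data(payload, "missing_property")
    my_rule = rule([greater_than(thing, 1)])
    dependent_rule = rule([my_rule])
    return my_rule, dependent_rule


print(evaluate({}))                       # field absent
print(evaluate({"missing_property": 5}))  # field present
```

When the field is absent both rules come out as None rather than False, which is the distinction this section warns about.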
Solutions to Null Issues#
Solution 1: Use required=False
Thing: Optional[int] = JsonData(path='$.maybe_exists', required=False)
Solution 2: Explicit null checks
SafeRule = Rule(when_all=[
Thing != Null, # Guard against null
Thing > 1,
])
Solution 3: Use ResolveOptional
SafeThing: int = ResolveOptional(
optional_value=MaybeThing,
default_value=0
)
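Judging by its parameter names, ResolveOptional substitutes a default when the optional value is Null. A minimal Python sketch of that assumed behavior, with None standing in for Null:

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_optional(optional_value: Optional[T], default_value: T) -> T:
    """Return the value if present, otherwise the default (assumed semantics)."""
    return default_value if optional_value is None else optional_value


print(resolve_optional(None, 0))
print(resolve_optional(7, 0))
```

Note that the check is against None specifically, so a legitimate value such as 0 or an empty string is kept rather than replaced by the default.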
Complete Examples#
Example 1: Basic Spam Detection#
# models/base.sml
UserId: Entity[str] = EntityJson(
type='User',
path='$.user_id',
coerce_type=True
)
MessageText: str = JsonData(path='$.message.text')
EventType: str = JsonData(path='$.event_type')
# rules/spam.sml
Import(rules=['models/base.sml'])
MessageLength = StringLength(s=MessageText)
ContainsSpamWords = RegexMatch(
target=MessageText,
pattern=r'(free money|click here|buy now)',
case_insensitive=True
)
SpamMessage = Rule(
when_all=[
EventType == 'send_message',
ContainsSpamWords,
MessageLength > 50,
],
description=f'Spam detected from user {UserId}'
)
WhenRules(
rules_any=[SpamMessage],
then=[
DeclareVerdict(verdict='reject'),
LabelAdd(entity=UserId, label='spammer', expires_after=TimeDelta(days=30)),
],
)
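To see which payloads this rule would flag, the three conditions can be traced in plain Python. The sketch assumes RegexMatch matches anywhere in the target string and StringLength counts characters; the payloads are invented for illustration.

```python
import re

SPAM_PATTERN = re.compile(r"(free money|click here|buy now)", re.IGNORECASE)


def spam_message(payload: dict) -> bool:
    """Mirror of the SpamMessage rule: all three conditions must hold."""
    text = payload["message"]["text"]
    return all([
        payload["event_type"] == "send_message",
        SPAM_PATTERN.search(text) is not None,
        len(text) > 50,
    ])


long_spam = {
    "event_type": "send_message",
    "user_id": "u1",
    "message": {"text": "Click here for FREE MONEY, limited offer ends tonight, do not miss out!"},
}
short_spam = {
    "event_type": "send_message",
    "user_id": "u2",
    "message": {"text": "free money"},
}
other_event = dict(long_spam, event_type="create_post")

for sample in (long_spam, short_spam, other_event):
    print(spam_message(sample))
```

Only the first payload passes: the second contains a spam phrase but is too short, and the third has the wrong event type.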
Example 2: New Account Risk Detection#
# models/user.sml
Import(rules=['models/base.sml'])
AccountCreatedAt: str = JsonData(path='$.user.created_at')
AccountAge = TimeSince(timestamp=AccountCreatedAt)
IsNewAccount = AccountAge < TimeDelta(days=7)
EmailAddress: Entity[str] = EntityJson(
type='Email',
path='$.user.email',
coerce_type=True
)
EmailDomainStr = EmailDomain(email=JsonData(path='$.user.email'))
# rules/new_account.sml
Import(rules=['models/base.sml', 'models/user.sml'])
IsSuspiciousEmailDomain = EmailDomainStr in ['tempmail.com', 'throwaway.net']
HighRiskNewAccount = Rule(
when_all=[
IsNewAccount,
IsSuspiciousEmailDomain,
not HasLabel(entity=UserId, label='verified'),
],
description=f'High-risk new account: {UserId} using {EmailAddress}'
)
WhenRules(
rules_any=[HighRiskNewAccount],
then=[
DeclareVerdict(verdict='challenge'),
LabelAdd(entity=UserId, label='needs_verification'),
LabelAdd(entity=EmailAddress, label='suspicious_domain'),
],
)
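The conditions in HighRiskNewAccount can be approximated in Python as follows. The sketch assumes the created_at field is an ISO 8601 timestamp and that EmailDomain returns the part after the last "@"; neither is confirmed by this guide, and the sample data is made up.

```python
from datetime import datetime, timedelta, timezone

SUSPICIOUS_DOMAINS = ["tempmail.com", "throwaway.net"]


def high_risk_new_account(user: dict, labels: set, now: datetime) -> bool:
    """Mirror of the HighRiskNewAccount rule for one user record."""
    created = datetime.fromisoformat(user["created_at"])
    domain = user["email"].rsplit("@", 1)[-1]
    return all([
        now - created < timedelta(days=7),  # IsNewAccount
        domain in SUSPICIOUS_DOMAINS,       # IsSuspiciousEmailDomain
        "verified" not in labels,           # not HasLabel(..., label='verified')
    ])


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
new_user = {"created_at": "2024-01-05T00:00:00+00:00", "email": "a@tempmail.com"}
old_user = {"created_at": "2023-12-01T00:00:00+00:00", "email": "b@tempmail.com"}

print(high_risk_new_account(new_user, set(), now))
print(high_risk_new_account(old_user, set(), now))
print(high_risk_new_account(new_user, {"verified"}, now))
```

Passing `now` explicitly keeps the example deterministic; a five-day-old account on a listed domain is flagged unless it already carries the verified label.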
Example 3: Multi-Tier Detection#
Import(rules=['models/base.sml', 'models/user.sml'])
# Tier 1: Basic suspicious activity
BasicSuspicious = Rule(
when_all=[
HasLabel(entity=UserId, label='previously_warned'),
EventType == 'create_post',
],
description=f'Previously warned user {UserId} creating content'
)
# Tier 2: Escalated risk
EscalatedRisk = Rule(
when_all=[
BasicSuspicious,
HasLabel(entity=UserId, label='multiple_violations'),
],
description=f'Repeat offender {UserId} detected'
)
# Tier 3: Severe risk
SevereRisk = Rule(
when_all=[
EscalatedRisk,
IsNewAccount,
],
description=f'Severe risk: new repeat offender {UserId}'
)
WhenRules(
rules_any=[BasicSuspicious, EscalatedRisk, SevereRisk],
then=[
# Always apply basic flag
LabelAdd(entity=UserId, label='flagged'),
# Conditional escalations
LabelAdd(entity=UserId, label='review_queue', apply_if=EscalatedRisk),
DeclareVerdict(verdict='reject', apply_if=SevereRisk),
],
)
Common Patterns#
Pattern 1: Safe Field Access#
# For potentially missing fields, use required=False
MaybeField: Optional[str] = JsonData(path='$.optional.field', required=False)
# Then check for null before using
SafeRule = Rule(when_all=[
MaybeField != Null,
StringLength(s=MaybeField) > 10,
])
Pattern 2: Action-Specific Rules#
# main.sml
ActionName = GetActionName()
Require(rule=f'actions/{ActionName}.sml')
Pattern 3: Reusable Feature Definitions#
# models/features.sml
MessageLength = StringLength(s=MessageText)
IsLongMessage = MessageLength > 500
IsShortMessage = MessageLength < 10
ContainsUrls = ListLength(list=StringExtractURLs(s=MessageText)) > 0
# rules/detection.sml
Import(rules=['models/features.sml'])
SuspiciousLongMessage = Rule(when_all=[
IsLongMessage,
ContainsUrls,
])
Pattern 4: Label-Based State Machine#
# First offense
FirstOffense = Rule(when_all=[
ViolatesPolicy,
not HasLabel(entity=UserId, label='warned'),
])
# Second offense
SecondOffense = Rule(when_all=[
ViolatesPolicy,
HasLabel(entity=UserId, label='warned'),
not HasLabel(entity=UserId, label='suspended'),
])
WhenRules(
rules_any=[FirstOffense],
then=[
LabelAdd(entity=UserId, label='warned', expires_after=TimeDelta(days=30)),
],
)
WhenRules(
rules_any=[SecondOffense],
then=[
LabelAdd(entity=UserId, label='suspended'),
DeclareVerdict(verdict='reject'),
],
)
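The progression this pattern produces over repeated violations can be simulated in Python. The sketch assumes both rules read the label state as it was before any effect of the current evaluation ran, and it ignores label expiry; `handle_violation` is a hypothetical helper.

```python
from typing import Optional


def handle_violation(labels: set) -> Optional[str]:
    """Apply both WhenRules blocks to one policy violation and return the verdict."""
    # Evaluate both rules against the labels as they stood before any effect ran.
    first_offense = "warned" not in labels
    second_offense = "warned" in labels and "suspended" not in labels

    verdict = None
    if first_offense:
        labels.add("warned")
    if second_offense:
        labels.add("suspended")
        verdict = "reject"
    return verdict


labels = set()
history = [handle_violation(labels) for _ in range(3)]
print(history, sorted(labels))
```

In this model the first violation only warns and the second suspends and rejects. The third violation matches neither rule, because SecondOffense excludes users who are already suspended, so a further rule would be needed if suspended users should keep being rejected.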
Validation Rules#
SML validates rules at compile time. Common validation errors:
| Error | Cause | Fix |
|---|---|---|
| "rules must be stored in non-local features" | Rule name starts with _ | Remove underscore prefix |
| "use local feaures when possible" | Feature that isn't useful to a moderator or evaluator in the UI does not start with _ | Add underscore prefix |
| "requires either a string literal or an f-string" | Using a variable for description | Use string or f-string literal |
| "import rules are not sorted" | Import list not alphabetized | Sort imports lexicographically |
| "imported file not found" | Invalid path in Import | Check file path exists |
| "unknown label" | Label not in config | Add label to labels config |
| "invalid regex pattern" | Bad regex syntax | Fix regex pattern |
Quick Reference#
Rule Definition#
RuleName = Rule(
when_all=[conditions],
description='string' or f'f-string with {Variable}'
)
Effect Wiring#
WhenRules(
rules_any=[Rule1, Rule2],
then=[Effect1, Effect2],
)
Null Safety Checklist#
- [ ] Check if optional fields use `required=False`
- [ ] Add explicit `!= Null` checks before using potentially null values
- [ ] Consider using `ResolveOptional` for default values
- [ ] Test rules with missing data scenarios