An MCP server for Osprey

SML Rules Guide for AI Agents

This document provides a comprehensive guide to writing SML (Some Madeup Language) rules in Osprey. SML is a statically-typed subset of Python designed for writing detection and classification rules.

Table of Contents

  1. Overview
  2. Basic Concepts
  3. Rule Structure
  4. Data Types
  5. Operators
  6. Built-in Functions (UDFs)
  7. File Organization
  8. Wiring Rules to Effects
  9. Null Handling
  10. Complete Examples
  11. Common Patterns
  12. Validation Rules

Overview

SML rules evaluate incoming action data and trigger effects such as verdicts and labels. Key characteristics:

  • Statically typed: All types are checked at validation time
  • Python-like syntax: Familiar syntax with restrictions for safety
  • Stateless logic: Rules define conditions, not procedures
  • Event-driven: Rules evaluate against incoming action JSON data

Basic Concepts

What is a Rule?

A rule is a named boolean expression that evaluates conditions against action data:

MyRule = Rule(
    when_all=[
        # List of conditions - ALL must be True for the rule to pass
        Condition1,
        Condition2,
    ],
    description='Human-readable description of what this rule detects'
)

What is an Effect?

Effects are actions taken when rules pass. They are triggered through WhenRules():

  • DeclareVerdict(verdict='reject') - Returns a verdict to the caller
  • LabelAdd(entity=UserId, label='flagged') - Adds a label to an entity
  • LabelRemove(entity=UserId, label='flagged') - Removes a label from an entity

What is an Entity?

Entities are typed identifiers (like User IDs, emails, IPs) that can have labels attached:

UserId: Entity[str] = EntityJson(type='User', path='$.user_id')

Rule Structure

Basic Rule Definition

RuleName = Rule(
    when_all=[
        # Conditions go here - ALL must be True
    ],
    description='Description string or f-string'
)

Syntax Requirements

  1. Rule names must be non-local variables (cannot start with _):

    # Valid
    MyRule = Rule(...)
    
    # Invalid - will fail validation
    _MyRule = Rule(...)
    
  2. Descriptions must be string or f-string literals (not variables):

    # Valid
    description='Static description'
    description=f'User {UserId} triggered rule'
    
    # Invalid - will fail validation
    my_desc = 'description'
    description=my_desc
    
  3. when_all accepts a list of boolean conditions:

    • All conditions must evaluate to True for the rule to pass
    • Conditions can be comparisons, function calls, other rules, or boolean values
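The implicit AND of when_all can be pictured with a few lines of plain Python. This is only a sketch of the semantics described above, not Osprey's implementation: evaluate_when_all is a hypothetical helper, and it ignores Null values, which are covered in the Null Handling section.

```python
from typing import Iterable


def evaluate_when_all(conditions: Iterable[bool]) -> bool:
    """Hypothetical model: a rule passes only if every condition is True."""
    return all(conditions)


# Conditions may be comparisons, membership tests, or other rule results.
count = 7
name = "alice"
other_rule_passed = True

print(evaluate_when_all([count > 5, name in ["alice", "bob"], other_rule_passed]))  # True
print(evaluate_when_all([count > 5, name == "carol"]))  # False
```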

Data Types

Basic Types

# Integer
Count: int = JsonData(path='$.count')

# String
Name: str = JsonData(path='$.name')

# Boolean
IsActive: bool = JsonData(path='$.active')

# List
Items: list = JsonData(path='$.items')

Entity Types

Entities are special types for identifiers that can have labels:

# Entity with string ID
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True  # Optional: convert value to expected type
)

# Entity with integer ID
PostId: Entity[int] = EntityJson(
    type='Post',
    path='$.post_id'
)

# Manually created entity
MyEntity = Entity(type='MyType', id='some_value')

Optional Types

For fields that may not exist:

# Optional field - won't fail if missing
OptionalField: Optional[str] = JsonData(path='$.maybe_exists', required=False)
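To make the effect of required=False concrete, here is a rough Python model of a path lookup. The function json_data is a hypothetical stand-in, it handles only simple '$.a.b' paths, and the KeyError merely represents "the lookup fails". How the real engine reports a missing required field is not specified here.

```python
from typing import Any, Optional


def json_data(action: dict, path: str, required: bool = True) -> Optional[Any]:
    """Hypothetical model of a '$.a.b' lookup (no wildcards or array indexes)."""
    node: Any = action
    for key in path.split(".")[1:]:  # drop the leading '$'
        if not isinstance(node, dict) or key not in node:
            if required:
                # Assumption: stands in for whatever failure the engine records.
                raise KeyError(f"required path {path!r} is missing")
            return None  # optional field: missing simply becomes null
        node = node[key]
    return node


action = {"user": {"id": "u1"}}
print(json_data(action, "$.user.id"))                      # u1
print(json_data(action, "$.maybe_exists", required=False))  # None
```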

Operators

Comparison Operators

Value == 5              # Equals
Value != 5              # Not equals
Value > 5               # Greater than
Value >= 5              # Greater than or equal
Value < 5               # Less than
Value <= 5              # Less than or equal
Value in [1, 2, 3]      # In list
Value not in [1, 2, 3]  # Not in list

Arithmetic Operators

5 + 3       # Addition
5 - 3       # Subtraction
5 * 3       # Multiplication
5 / 3       # Division
5 // 3      # Floor division
5 % 3       # Modulo
5 ** 3      # Power

Boolean Operators

Condition1 and Condition2    # Logical AND
Condition1 or Condition2     # Logical OR
not Condition1               # Logical NOT

# In when_all, conditions are implicitly AND-ed:
Rule(when_all=[
    Cond1,
    Cond2,  # Both must be True
])

# Use 'or' for explicit OR logic:
Rule(when_all=[
    (Cond1 or Cond2),  # Either Cond1 or Cond2
    Cond3,              # AND Cond3
])

Null Checking

Value != Null    # Check if value is NOT null
Value == Null    # Check if value IS null

File Organization

Import - Include Other Files

Use Import to include rules/features from other files:

Import(rules=[
    'models/base.sml',
    'models/user.sml',
    'rules/common.sml',
])

Requirements:

  • File paths must be relative
  • List must be lexicographically sorted
  • No duplicates allowed
  • Imported variables/rules are accessible in current file
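The first three requirements are easy to check mechanically before handing a file to the validator. The following Python sketch is a hypothetical pre-flight check, not the validator's actual code, and the problem strings are invented for illustration rather than copied from Osprey's error messages.

```python
from pathlib import PurePosixPath
from typing import List


def check_import_list(paths: List[str]) -> List[str]:
    """Return a list of problems found in an Import(rules=[...]) list."""
    problems: List[str] = []
    if any(PurePosixPath(p).is_absolute() for p in paths):
        problems.append("paths must be relative")
    if len(set(paths)) != len(paths):
        problems.append("duplicate paths")
    if paths != sorted(paths):  # plain string ordering, i.e. lexicographic
        problems.append("paths are not lexicographically sorted")
    return problems


print(check_import_list(["models/base.sml", "models/user.sml", "rules/common.sml"]))  # []
print(check_import_list(["rules/common.sml", "models/base.sml"]))
```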

Require - Conditionally Include Files

Use Require for conditional or template-based includes:

# Always include
Require(rule='expensive_check.sml')

# Conditional include
Require(rule='ai_check.sml', require_if=ActionName == 'register')

# Template-based (f-string)
Require(rule=f'actions/{ActionName}.sml')

Note: Unlike Import, outputs from Required files are NOT accessible in the parent file.
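One way to think about Require is as a two-step decision: evaluate the require_if guard, then fill in the path template. The Python sketch below models only that decision. resolve_require and the features dictionary are hypothetical, and the sketch does not load or execute any file.

```python
from typing import Optional


def resolve_require(template: str, features: dict, require_if: bool = True) -> Optional[str]:
    """Return the path that would be loaded, or None if the guard is False."""
    if not require_if:
        return None
    return template.format(**features)  # fills {ActionName}-style placeholders


features = {"ActionName": "register"}
print(resolve_require("actions/{ActionName}.sml", features))  # actions/register.sml
print(resolve_require("ai_check.sml", features,
                      require_if=features["ActionName"] == "register"))  # ai_check.sml
print(resolve_require("ai_check.sml", {"ActionName": "login"}, require_if=False))  # None
```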

Typical File Structure

rules/
├── main.sml                 # Entry point
├── models/
│   ├── base.sml            # Common entities (UserId, etc.)
│   ├── user.sml            # User-related features
│   └── content.sml         # Content-related features
├── rules/
│   ├── spam.sml            # Spam detection rules
│   └── abuse.sml           # Abuse detection rules
└── actions/
    ├── register.sml        # Action-specific rules
    └── send_message.sml

Wiring Rules to Effects

Rules by themselves don't do anything. Use WhenRules() to connect rules to effects:

WhenRules(
    rules_any=[
        Rule1,
        Rule2,
        Rule3,
    ],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='flagged'),
    ],
)

Semantics:

  • If ANY rule in rules_any evaluates to True, ALL effects in then execute
  • A rule that fails does not stop the remaining rules from being checked
  • An effect that fails does not stop the remaining effects from executing
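These semantics can be modelled in a few lines of Python. The sketch is an illustration under assumptions, not the engine's code: when_rules is a hypothetical function, rule results are plain booleans (or None for Null), and effects are ordinary callables whose exceptions represent "failed effects".

```python
from typing import Callable, List, Optional


def when_rules(rules_any: List[Optional[bool]], then: List[Callable[[], None]]) -> List[str]:
    """Run every effect if any rule is True; collect failures instead of stopping."""
    errors: List[str] = []
    if not any(result is True for result in rules_any):
        return errors  # no rule passed, so no effect runs
    for effect in then:
        try:
            effect()
        except Exception as exc:  # a failed effect must not block the others
            errors.append(f"{effect.__name__}: {exc}")
    return errors


applied: List[str] = []

def declare_verdict() -> None:
    applied.append("verdict:reject")

def broken_effect() -> None:
    raise RuntimeError("label service unavailable")

def label_add() -> None:
    applied.append("label:flagged")

errors = when_rules([False, True, False], [declare_verdict, broken_effect, label_add])
print(applied)  # ['verdict:reject', 'label:flagged']
print(errors)   # ['broken_effect: label service unavailable']
```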

Conditional Effects

Use apply_if to make individual effects conditional:

WhenRules(
    rules_any=[MainRule],
    then=[
        LabelAdd(entity=UserId, label='basic_flag'),
        LabelAdd(entity=UserId, label='severe_flag', apply_if=SevereRule),
        LabelAdd(entity=UserId, label='repeat_offender', apply_if=RepeatOffenderRule),
    ],
)
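Read as plain logic, an effect with apply_if runs only when the WhenRules block fires and its own condition holds. The Python sketch below illustrates that reading. The Effect class and effects_to_run are hypothetical names, and the boolean apply_if field stands in for the result of the referenced rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Effect:
    name: str
    apply_if: bool = True  # stand-in for the result of the apply_if rule


def effects_to_run(rules_any: List[bool], then: List[Effect]) -> List[str]:
    """Return the names of effects that would execute."""
    if not any(rules_any):
        return []
    return [effect.name for effect in then if effect.apply_if]


main_rule, severe_rule, repeat_offender_rule = True, False, True
print(effects_to_run(
    [main_rule],
    [
        Effect("basic_flag"),
        Effect("severe_flag", apply_if=severe_rule),
        Effect("repeat_offender", apply_if=repeat_offender_rule),
    ],
))  # ['basic_flag', 'repeat_offender']
```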

Null Handling (CRITICAL)

SML has unique null semantics that differ from most languages. Understanding this is critical.

Null Propagation Rule

If a value evaluates to Null:

  1. The containing rule evaluates to Null (not False!)
  2. Any rule depending on it also becomes Null
  3. This propagates through the entire dependency chain

Example of Null Propagation

# If $.missing_property doesn't exist...
Thing: int = JsonData(path='$.missing_property')

# This rule becomes Null (NOT False)
MyRule = Rule(when_all=[
    Thing > 1,
])

# This rule ALSO becomes Null (propagates!)
DependentRule = Rule(when_all=[
    MyRule,
])
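The propagation in this example can be reproduced with a small three-valued model in Python, using None for Null. This is a sketch of the behaviour described above, with hypothetical helpers rule and greater_than. It assumes that a single Null condition makes the whole rule Null, even if another condition is False.

```python
from typing import List, Optional


def greater_than(value: Optional[int], threshold: int) -> Optional[bool]:
    """A comparison on a Null value is itself Null."""
    return None if value is None else value > threshold


def rule(when_all: List[Optional[bool]]) -> Optional[bool]:
    """Null if any condition is Null (assumption), otherwise the AND of all."""
    if any(condition is None for condition in when_all):
        return None
    return all(when_all)


thing = None  # '$.missing_property' was not present in the action
my_rule = rule([greater_than(thing, 1)])
dependent_rule = rule([my_rule])
print(my_rule, dependent_rule)  # None None

thing = 0  # present but too small: an ordinary False, not Null
print(rule([greater_than(thing, 1)]))  # False
```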

Solutions to Null Issues

Solution 1: Use required=False

Thing: Optional[int] = JsonData(path='$.maybe_exists', required=False)

Solution 2: Explicit null checks

SafeRule = Rule(when_all=[
    Thing != Null,  # Guard against null
    Thing > 1,
])

Solution 3: Use ResolveOptional

# MaybeThing is an Optional[int] feature, e.g. one declared with required=False
SafeThing: int = ResolveOptional(
    optional_value=MaybeThing,
    default_value=0
)
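In plain Python terms, ResolveOptional behaves like a "value or default" helper that treats only Null as missing. The function below is a hypothetical equivalent written for illustration. It is not the UDF's source.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_optional(optional_value: Optional[T], default_value: T) -> T:
    """Return the default only when the value is None (Null)."""
    return default_value if optional_value is None else optional_value


print(resolve_optional(None, 0))  # 0
print(resolve_optional(5, 0))     # 5
print(resolve_optional(0, 7))     # 0  (a real zero is kept, not replaced)
```

Comparing against None rather than relying on truthiness matters here: a legitimate value such as 0 or an empty string should not be swapped for the default.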

Complete Examples

Example 1: Basic Spam Detection

# models/base.sml
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True
)

MessageText: str = JsonData(path='$.message.text')
EventType: str = JsonData(path='$.event_type')

# rules/spam.sml
Import(rules=['models/base.sml'])

MessageLength = StringLength(s=MessageText)

ContainsSpamWords = RegexMatch(
    target=MessageText,
    pattern=r'(free money|click here|buy now)',
    case_insensitive=True
)

SpamMessage = Rule(
    when_all=[
        EventType == 'send_message',
        ContainsSpamWords,
        MessageLength > 50,
    ],
    description=f'Spam detected from user {UserId}'
)

WhenRules(
    rules_any=[SpamMessage],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='spammer', expires_after=TimeDelta(days=30)),
    ],
)
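For readers who want to sanity-check the conditions before deploying, the SpamMessage rule can be approximated in ordinary Python against a sample action. This is a rough equivalent under assumptions: RegexMatch is modelled as an unanchored re.search, StringLength as len, and the sample payloads are invented.

```python
import re

# Same pattern as the rule, compiled case-insensitively.
SPAM_PATTERN = re.compile(r"(free money|click here|buy now)", re.IGNORECASE)


def spam_message(action: dict) -> bool:
    """Approximate the SpamMessage rule for a single action payload."""
    text = action["message"]["text"]
    return (
        action["event_type"] == "send_message"
        and SPAM_PATTERN.search(text) is not None
        and len(text) > 50
    )


spam = {
    "event_type": "send_message",
    "user_id": "u1",
    "message": {"text": "Click HERE to claim your FREE MONEY before the offer ends tonight!"},
}
short = {"event_type": "send_message", "user_id": "u2", "message": {"text": "buy now"}}

print(spam_message(spam))   # True
print(spam_message(short))  # False (matches the pattern but is too short)
```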

Example 2: New Account Risk Detection

# models/user.sml
Import(rules=['models/base.sml'])

AccountCreatedAt: str = JsonData(path='$.user.created_at')
AccountAge = TimeSince(timestamp=AccountCreatedAt)
IsNewAccount = AccountAge < TimeDelta(days=7)

EmailAddress: Entity[str] = EntityJson(
    type='Email',
    path='$.user.email',
    coerce_type=True
)

EmailDomainStr = EmailDomain(email=JsonData(path='$.user.email'))

# rules/new_account.sml
Import(rules=['models/base.sml', 'models/user.sml'])

IsSuspiciousEmailDomain = EmailDomainStr in ['tempmail.com', 'throwaway.net']

HighRiskNewAccount = Rule(
    when_all=[
        IsNewAccount,
        IsSuspiciousEmailDomain,
        not HasLabel(entity=UserId, label='verified'),
    ],
    description=f'High-risk new account: {UserId} using {EmailAddress}'
)

WhenRules(
    rules_any=[HighRiskNewAccount],
    then=[
        DeclareVerdict(verdict='challenge'),
        LabelAdd(entity=UserId, label='needs_verification'),
        LabelAdd(entity=EmailAddress, label='suspicious_domain'),
    ],
)

Example 3: Multi-Tier Detection

Import(rules=['models/base.sml', 'models/user.sml'])

# Tier 1: Basic suspicious activity
BasicSuspicious = Rule(
    when_all=[
        HasLabel(entity=UserId, label='previously_warned'),
        EventType == 'create_post',
    ],
    description=f'Previously warned user {UserId} creating content'
)

# Tier 2: Escalated risk
EscalatedRisk = Rule(
    when_all=[
        BasicSuspicious,
        HasLabel(entity=UserId, label='multiple_violations'),
    ],
    description=f'Repeat offender {UserId} detected'
)

# Tier 3: Severe risk
SevereRisk = Rule(
    when_all=[
        EscalatedRisk,
        IsNewAccount,
    ],
    description=f'Severe risk: new repeat offender {UserId}'
)

WhenRules(
    rules_any=[BasicSuspicious, EscalatedRisk, SevereRisk],
    then=[
        # Always apply basic flag
        LabelAdd(entity=UserId, label='flagged'),
        # Conditional escalations
        LabelAdd(entity=UserId, label='review_queue', apply_if=EscalatedRisk),
        DeclareVerdict(verdict='reject', apply_if=SevereRisk),
    ],
)

Common Patterns

Pattern 1: Safe Field Access

# For potentially missing fields, use required=False
MaybeField: Optional[str] = JsonData(path='$.optional.field', required=False)

# Then check for null before using
SafeRule = Rule(when_all=[
    MaybeField != Null,
    StringLength(s=MaybeField) > 10,
])

Pattern 2: Action-Specific Rules

# main.sml
ActionName = GetActionName()
Require(rule=f'actions/{ActionName}.sml')

Pattern 3: Reusable Feature Definitions

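# models/features.sml
# MessageText is assumed to come from models/base.sml, as in Example 1
Import(rules=['models/base.sml'])

MessageLength = StringLength(s=MessageText)
IsLongMessage = MessageLength > 500
IsShortMessage = MessageLength < 10
ContainsUrls = ListLength(list=StringExtractURLs(s=MessageText)) > 0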

# rules/detection.sml
Import(rules=['models/features.sml'])

SuspiciousLongMessage = Rule(when_all=[
    IsLongMessage,
    ContainsUrls,
])

Pattern 4: Label-Based State Machine

# ViolatesPolicy is assumed to be a rule or boolean feature defined elsewhere
# First offense
FirstOffense = Rule(when_all=[
    ViolatesPolicy,
    not HasLabel(entity=UserId, label='warned'),
])

# Second offense
SecondOffense = Rule(when_all=[
    ViolatesPolicy,
    HasLabel(entity=UserId, label='warned'),
    not HasLabel(entity=UserId, label='suspended'),
])

WhenRules(
    rules_any=[FirstOffense],
    then=[
        LabelAdd(entity=UserId, label='warned', expires_after=TimeDelta(days=30)),
    ],
)

WhenRules(
    rules_any=[SecondOffense],
    then=[
        LabelAdd(entity=UserId, label='suspended'),
        DeclareVerdict(verdict='reject'),
    ],
)
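The progression this pattern produces can be traced with a short Python simulation. It is a simplified model under assumptions: handle_violation is a hypothetical function, labels are applied before the next violation is evaluated, and label expiry (the 30-day 'warned' label) is ignored.

```python
from typing import List, Set


def handle_violation(labels: Set[str]) -> List[str]:
    """Return the effects for one policy violation and update the labels."""
    if "warned" not in labels:            # FirstOffense
        labels.add("warned")
        return ["LabelAdd(warned)"]
    if "suspended" not in labels:         # SecondOffense
        labels.add("suspended")
        return ["LabelAdd(suspended)", "DeclareVerdict(reject)"]
    return []                             # neither rule passes any more


user_labels: Set[str] = set()
for attempt in range(1, 4):
    print(attempt, handle_violation(user_labels))
# 1 ['LabelAdd(warned)']
# 2 ['LabelAdd(suspended)', 'DeclareVerdict(reject)']
# 3 []
```

The third line of output highlights a property of the pattern as written: once a user carries both labels, neither rule matches, so a further rule would be needed to keep rejecting suspended users.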

Validation Rules

SML validates rules at compile time. Common validation errors:

| Error | Cause | Fix |
|---|---|---|
| "rules must be stored in non-local features" | Rule name starts with _ | Remove underscore prefix |
| "use local features when possible" | Feature that isn't useful to a moderator or evaluator in the UI does not start with _ | Add underscore prefix |
| "requires either a string literal or an f-string" | Using a variable for description | Use string or f-string literal |
| "import rules are not sorted" | Import list not alphabetized | Sort imports lexicographically |
| "imported file not found" | Invalid path in Import | Check file path exists |
| "unknown label" | Label not in config | Add label to labels config |
| "invalid regex pattern" | Bad regex syntax | Fix regex pattern |

Quick Reference

Rule Definition

RuleName = Rule(
    when_all=[conditions],
    description='string'  # or f'f-string with {Variable}'
)

Effect Wiring

WhenRules(
    rules_any=[Rule1, Rule2],
    then=[Effect1, Effect2],
)


Null Safety Checklist

  • Check if optional fields use required=False
  • Add explicit != Null checks before using potentially null values
  • Consider using ResolveOptional for default values
  • Test rules with missing data scenarios