# SML Rules Guide for AI Agents

This document provides a comprehensive guide to writing SML (Some Madeup Language) rules in Osprey. SML is a statically-typed subset of Python designed for writing detection and classification rules.

## Table of Contents

1. [Overview](#overview)
2. [Basic Concepts](#basic-concepts)
3. [Rule Structure](#rule-structure)
4. [Data Types](#data-types)
5. [Operators](#operators)
6. [Built-in Functions (UDFs)](#built-in-functions-udfs)
7. [File Organization](#file-organization)
8. [Wiring Rules to Effects](#wiring-rules-to-effects)
9. [Null Handling](#null-handling-critical)
10. [Complete Examples](#complete-examples)
11. [Common Patterns](#common-patterns)
12. [Validation Rules](#validation-rules)
13. [Quick Reference](#quick-reference)

---

## Overview

SML rules are used to evaluate incoming action data and trigger effects like verdicts and labels. Key characteristics:

- **Statically typed**: All types are checked at validation time
- **Python-like syntax**: Familiar syntax with restrictions for safety
- **Stateless logic**: Rules define conditions, not procedures
- **Event-driven**: Rules evaluate against incoming action JSON data

---

## Basic Concepts

### What is a Rule?

A rule is a named boolean expression that evaluates conditions against action data:

```python
MyRule = Rule(
    when_all=[
        # List of conditions - ALL must be True for the rule to pass
        Condition1,
        Condition2,
    ],
    description='Human-readable description of what this rule detects'
)
```

### What is an Effect?

Effects are actions taken when rules pass. They are triggered through `WhenRules()`:

- `DeclareVerdict(verdict='reject')` - Returns a verdict to the caller
- `LabelAdd(entity=UserId, label='flagged')` - Adds a label to an entity
- `LabelRemove(entity=UserId, label='flagged')` - Removes a label from an entity

### What is an Entity?

Entities are typed identifiers (like User IDs, emails, IPs) that can have labels attached:

```python
UserId: Entity[str] = EntityJson(type='User', path='$.user_id')
```

---

## Rule Structure

### Basic Rule Definition

```python
RuleName = Rule(
    when_all=[
        # Conditions go here - ALL must be True
    ],
    description='Description string or f-string'
)
```

### Syntax Requirements

1. **Rule names must be non-local variables** (cannot start with `_`):
   ```python
   # Valid
   MyRule = Rule(...)

   # Invalid - will fail validation
   _MyRule = Rule(...)
   ```

2. **Descriptions must be string or f-string literals** (not variables):
   ```python
   # Valid
   description='Static description'
   description=f'User {UserId} triggered rule'

   # Invalid - will fail validation
   my_desc = 'description'
   description=my_desc
   ```

3. **when_all accepts a list of boolean conditions**:
   - All conditions must evaluate to True for the rule to pass
   - Conditions can be comparisons, function calls, other rules, or boolean values
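
The first two requirements can be checked mechanically because SML uses Python syntax. The following is a minimal sketch in plain Python using the standard `ast` module; it is not Osprey's validator, and the function name `check_rule_syntax` and the message wording are hypothetical.

```python
import ast


def check_rule_syntax(source: str) -> list:
    """Return a list of problems found in `Name = Rule(...)` assignments."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # Only inspect assignments whose right-hand side is a call to Rule(...).
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        func = node.value.func
        if not (isinstance(func, ast.Name) and func.id == 'Rule'):
            continue
        # Requirement 1: the rule name must not start with an underscore.
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id.startswith('_'):
                problems.append(f'{target.id}: rule name must not start with "_"')
        # Requirement 2: description must be a string or f-string literal.
        for kw in node.value.keywords:
            if kw.arg != 'description':
                continue
            is_str = isinstance(kw.value, ast.Constant) and isinstance(kw.value.value, str)
            is_fstr = isinstance(kw.value, ast.JoinedStr)
            if not (is_str or is_fstr):
                problems.append('description must be a string or f-string literal')
    return problems


print(check_rule_syntax("_MyRule = Rule(when_all=[A], description=my_desc)"))
```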

---

## Data Types

### Basic Types

```python
# Integer
Count: int = JsonData(path='$.count')

# String
Name: str = JsonData(path='$.name')

# Boolean
IsActive: bool = JsonData(path='$.active')

# List
Items: list = JsonData(path='$.items')
```

### Entity Types

Entities are special types for identifiers that can have labels:

```python
# Entity with string ID
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True  # Optional: convert value to expected type
)

# Entity with integer ID
PostId: Entity[int] = EntityJson(
    type='Post',
    path='$.post_id'
)

# Manually created entity
MyEntity = Entity(type='MyType', id='some_value')
```

### Optional Types

For fields that may not exist:

```python
# Optional field - won't fail if missing
OptionalField: Optional[str] = JsonData(path='$.maybe_exists', required=False)
```

---

## Operators

### Comparison Operators

```python
Value == 5              # Equals
Value != 5              # Not equals
Value > 5               # Greater than
Value >= 5              # Greater than or equal
Value < 5               # Less than
Value <= 5              # Less than or equal
Value in [1, 2, 3]      # In list
Value not in [1, 2, 3]  # Not in list
```

### Arithmetic Operators

```python
5 + 3       # Addition
5 - 3       # Subtraction
5 * 3       # Multiplication
5 / 3       # Division
5 // 3      # Floor division
5 % 3       # Modulo
5 ** 3      # Power
```
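
The difference between `/`, `//`, and `%` is easiest to see with concrete values. The snippet below is plain Python, and it assumes SML arithmetic follows Python's semantics, which this guide implies but does not state explicitly.

```python
# Plain Python, shown only to illustrate how the operators differ.
true_div = 5 / 3     # true division yields a float
floor_div = 5 // 3   # floor division rounds down to an int
remainder = 5 % 3    # what is left after floor division
power = 5 ** 3       # exponentiation

# Quotient and remainder always recombine to the dividend.
assert floor_div * 3 + remainder == 5
print(true_div, floor_div, remainder, power)
```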

### Boolean Operators

```python
Condition1 and Condition2    # Logical AND
Condition1 or Condition2     # Logical OR
not Condition1               # Logical NOT

# In when_all, conditions are implicitly AND-ed:
Rule(when_all=[
    Cond1,
    Cond2,  # Both must be True
])

# Use 'or' for explicit OR logic:
Rule(when_all=[
    (Cond1 or Cond2),  # Either Cond1 or Cond2
    Cond3,              # AND Cond3
])
```

### Null Checking

```python
Value != Null    # Check if value is NOT null
Value == Null    # Check if value IS null
```

---

## File Organization

### Import - Include Other Files

Use `Import` to include rules/features from other files:

```python
Import(rules=[
    'models/base.sml',
    'models/user.sml',
    'rules/common.sml',
])
```

**Requirements:**
- File paths must be relative
- List must be **lexicographically sorted**
- No duplicates allowed
- Imported variables/rules are accessible in current file
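
The first three requirements can be sketched as a small check in plain Python. This is an illustration, not Osprey's validator: `check_import_list` is a hypothetical helper, and it assumes "lexicographically sorted" matches the ordering of Python's `sorted()` on strings.

```python
from pathlib import PurePosixPath


def check_import_list(paths: list) -> list:
    """Return the problems found in an Import(rules=[...]) list."""
    problems = []
    # Paths must be relative.
    for path in paths:
        if PurePosixPath(path).is_absolute():
            problems.append(f'{path}: path must be relative')
    # No duplicates allowed.
    if len(set(paths)) != len(paths):
        problems.append('duplicate entries')
    # The list must already be in sorted order.
    if paths != sorted(paths):
        problems.append('import rules are not sorted')
    return problems


print(check_import_list(['rules/common.sml', 'models/base.sml']))
```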

### Require - Conditionally Include Files

Use `Require` for conditional or template-based includes:

```python
# Always include
Require(rule='expensive_check.sml')

# Conditional include
Require(rule='ai_check.sml', require_if=ActionName == 'register')

# Template-based (f-string)
Require(rule=f'actions/{ActionName}.sml')
```

**Note:** Unlike Import, outputs from Required files are NOT accessible in the parent file.

### Typical File Structure

```
rules/
├── main.sml                 # Entry point
├── models/
│   ├── base.sml            # Common entities (UserId, etc.)
│   ├── user.sml            # User-related features
│   └── content.sml         # Content-related features
├── rules/
│   ├── spam.sml            # Spam detection rules
│   └── abuse.sml           # Abuse detection rules
└── actions/
    ├── register.sml        # Action-specific rules
    └── send_message.sml
```

---

## Wiring Rules to Effects

Rules by themselves don't do anything. Use `WhenRules()` to connect rules to effects:

```python
WhenRules(
    rules_any=[
        Rule1,
        Rule2,
        Rule3,
    ],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='flagged'),
    ],
)
```

**Semantics:**
- If **ANY** rule in `rules_any` evaluates to True, **ALL** effects in `then` execute
- Failed rules don't prevent other rules from being checked
- Failed effects don't prevent other effects from executing
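
These semantics can be modeled in a few lines of plain Python. The sketch below is a simplified model, not Osprey's implementation: rules are represented as already-evaluated booleans, effects as callables, and the names `when_rules`, `add_label`, and `broken_effect` are hypothetical.

```python
from typing import Callable


def when_rules(rules_any: list, then: list) -> list:
    """Run every effect if any rule passed; return messages from failed effects."""
    errors = []
    if not any(rules_any):
        return errors  # no rule passed, so no effect runs
    for effect in then:
        try:
            effect()
        except Exception as exc:
            # A failing effect is recorded but does not block the remaining effects.
            errors.append(str(exc))
    return errors


applied = []


def add_label() -> None:
    applied.append('flagged')


def broken_effect() -> None:
    raise RuntimeError('label service unavailable')


# One passing rule is enough; the broken effect does not stop add_label.
errors = when_rules([False, True, False], [broken_effect, add_label])
print(applied, errors)
```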

### Conditional Effects

Use `apply_if` to make individual effects conditional:

```python
WhenRules(
    rules_any=[MainRule],
    then=[
        LabelAdd(entity=UserId, label='basic_flag'),
        LabelAdd(entity=UserId, label='severe_flag', apply_if=SevereRule),
        LabelAdd(entity=UserId, label='repeat_offender', apply_if=RepeatOffenderRule),
    ],
)
```
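
One way to picture `apply_if` is as a per-effect filter that runs only after the `rules_any` check has passed. The sketch below models that in plain Python; the `Effect` dataclass and `effects_to_run` function are hypothetical stand-ins, and the boolean inputs represent rules that have already been evaluated.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    label: str
    apply_if: bool = True  # an effect without apply_if always applies


def effects_to_run(rules_any: list, then: list) -> list:
    """Return the labels of effects that would execute."""
    if not any(rules_any):
        return []
    return [effect.label for effect in then if effect.apply_if]


# MainRule passed, SevereRule did not, RepeatOffenderRule did.
labels = effects_to_run(
    rules_any=[True],
    then=[
        Effect('basic_flag'),
        Effect('severe_flag', apply_if=False),
        Effect('repeat_offender', apply_if=True),
    ],
)
print(labels)
```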

---

## Null Handling (CRITICAL)

SML has unique null semantics that differ from most languages. **Understanding this is critical.**

### Null Propagation Rule

If a value evaluates to Null:
1. The containing rule evaluates to **Null** (not False!)
2. Any rule depending on it also becomes **Null**
3. This propagates through the entire dependency chain

### Example of Null Propagation

```python
# If $.missing_property doesn't exist...
Thing: int = JsonData(path='$.missing_property')

# This rule becomes Null (NOT False)
MyRule = Rule(when_all=[
    Thing > 1,
])

# This rule ALSO becomes Null (propagates!)
DependentRule = Rule(when_all=[
    MyRule,
])
```
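
The propagation in this example can be modeled with three-valued logic in plain Python, using `None` to stand in for Null. This is a sketch of the rule as stated above, not Osprey's evaluator. The guide does not say how a False condition combines with a Null one, so the sketch assumes Null wins, and it does not model guards such as `Thing != Null`.

```python
from typing import Optional


def greater_than(value: Optional[int], threshold: int) -> Optional[bool]:
    """Comparison that yields None (Null) when its input is None."""
    if value is None:
        return None
    return value > threshold


def rule(when_all: list) -> Optional[bool]:
    """None if any condition is None (assumption), else the AND of all conditions."""
    if any(condition is None for condition in when_all):
        return None
    return all(when_all)


thing = None                                # $.missing_property is absent
my_rule = rule([greater_than(thing, 1)])    # None, not False
dependent_rule = rule([my_rule])            # None as well: the Null propagates
print(my_rule, dependent_rule)
```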

### Solutions to Null Issues

**Solution 1: Use `required=False`**
```python
Thing: Optional[int] = JsonData(path='$.maybe_exists', required=False)
```

**Solution 2: Explicit null checks**
```python
SafeRule = Rule(when_all=[
    Thing != Null,  # Guard against null
    Thing > 1,
])
```

**Solution 3: Use ResolveOptional**
```python
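# MaybeThing is an optional feature, declared as in Solution 1
MaybeThing: Optional[int] = JsonData(path='$.maybe_exists', required=False)

SafeThing: int = ResolveOptional(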
    optional_value=MaybeThing,
    default_value=0
)
```
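
Conceptually, `ResolveOptional` substitutes a default only when the value is Null. A minimal plain-Python model, with `None` standing in for Null and `resolve_optional` as a hypothetical name, looks like this:

```python
from typing import Optional, TypeVar

T = TypeVar('T')


def resolve_optional(optional_value: Optional[T], default_value: T) -> T:
    """Return the value if present, otherwise the default."""
    # Test against None explicitly so falsy values such as 0 or '' are kept.
    return default_value if optional_value is None else optional_value


print(resolve_optional(None, 0), resolve_optional(5, 0))
```

The explicit `is None` test matters: a value of `0` is present and is returned as-is, not replaced by the default.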

---

## Complete Examples

### Example 1: Basic Spam Detection

```python
# models/base.sml
UserId: Entity[str] = EntityJson(
    type='User',
    path='$.user_id',
    coerce_type=True
)

MessageText: str = JsonData(path='$.message.text')
EventType: str = JsonData(path='$.event_type')

# rules/spam.sml
Import(rules=['models/base.sml'])

MessageLength = StringLength(s=MessageText)

ContainsSpamWords = RegexMatch(
    target=MessageText,
    pattern=r'(free money|click here|buy now)',
    case_insensitive=True
)

SpamMessage = Rule(
    when_all=[
        EventType == 'send_message',
        ContainsSpamWords,
        MessageLength > 50,
    ],
    description=f'Spam detected from user {UserId}'
)

WhenRules(
    rules_any=[SpamMessage],
    then=[
        DeclareVerdict(verdict='reject'),
        LabelAdd(entity=UserId, label='spammer', expires_after=TimeDelta(days=30)),
    ],
)
```

### Example 2: New Account Risk Detection

```python
# models/user.sml
Import(rules=['models/base.sml'])

AccountCreatedAt: str = JsonData(path='$.user.created_at')
AccountAge = TimeSince(timestamp=AccountCreatedAt)
IsNewAccount = AccountAge < TimeDelta(days=7)

EmailAddress: Entity[str] = EntityJson(
    type='Email',
    path='$.user.email',
    coerce_type=True
)

EmailDomainStr = EmailDomain(email=JsonData(path='$.user.email'))

# rules/new_account.sml
Import(rules=['models/base.sml', 'models/user.sml'])

IsSuspiciousEmailDomain = EmailDomainStr in ['tempmail.com', 'throwaway.net']

HighRiskNewAccount = Rule(
    when_all=[
        IsNewAccount,
        IsSuspiciousEmailDomain,
        not HasLabel(entity=UserId, label='verified'),
    ],
    description=f'High-risk new account: {UserId} using {EmailAddress}'
)

WhenRules(
    rules_any=[HighRiskNewAccount],
    then=[
        DeclareVerdict(verdict='challenge'),
        LabelAdd(entity=UserId, label='needs_verification'),
        LabelAdd(entity=EmailAddress, label='suspicious_domain'),
    ],
)
```

### Example 3: Multi-Tier Detection

```python
Import(rules=['models/base.sml', 'models/user.sml'])

# Tier 1: Basic suspicious activity
BasicSuspicious = Rule(
    when_all=[
        HasLabel(entity=UserId, label='previously_warned'),
        EventType == 'create_post',
    ],
    description=f'Previously warned user {UserId} creating content'
)

# Tier 2: Escalated risk
EscalatedRisk = Rule(
    when_all=[
        BasicSuspicious,
        HasLabel(entity=UserId, label='multiple_violations'),
    ],
    description=f'Repeat offender {UserId} detected'
)

# Tier 3: Severe risk
SevereRisk = Rule(
    when_all=[
        EscalatedRisk,
        IsNewAccount,
    ],
    description=f'Severe risk: new repeat offender {UserId}'
)

WhenRules(
    rules_any=[BasicSuspicious, EscalatedRisk, SevereRisk],
    then=[
        # Always apply basic flag
        LabelAdd(entity=UserId, label='flagged'),
        # Conditional escalations
        LabelAdd(entity=UserId, label='review_queue', apply_if=EscalatedRisk),
        DeclareVerdict(verdict='reject', apply_if=SevereRisk),
    ],
)
```

---

## Common Patterns

### Pattern 1: Safe Field Access

```python
# For potentially missing fields, use required=False
MaybeField: Optional[str] = JsonData(path='$.optional.field', required=False)

# Then check for null before using
SafeRule = Rule(when_all=[
    MaybeField != Null,
    StringLength(s=MaybeField) > 10,
])
```

### Pattern 2: Action-Specific Rules

```python
# main.sml
ActionName = GetActionName()
Require(rule=f'actions/{ActionName}.sml')
```

### Pattern 3: Reusable Feature Definitions

```python
# models/features.sml
Import(rules=['models/base.sml'])  # provides MessageText

MessageLength = StringLength(s=MessageText)
IsLongMessage = MessageLength > 500
IsShortMessage = MessageLength < 10
ContainsUrls = ListLength(list=StringExtractURLs(s=MessageText)) > 0

# rules/detection.sml
Import(rules=['models/features.sml'])

SuspiciousLongMessage = Rule(when_all=[
    IsLongMessage,
    ContainsUrls,
])
```

### Pattern 4: Label-Based State Machine

```python
# ViolatesPolicy is assumed to be a boolean feature or rule defined elsewhere

# First offense
FirstOffense = Rule(when_all=[
    ViolatesPolicy,
    not HasLabel(entity=UserId, label='warned'),
])

# Second offense
SecondOffense = Rule(when_all=[
    ViolatesPolicy,
    HasLabel(entity=UserId, label='warned'),
    not HasLabel(entity=UserId, label='suspended'),
])

WhenRules(
    rules_any=[FirstOffense],
    then=[
        LabelAdd(entity=UserId, label='warned', expires_after=TimeDelta(days=30)),
    ],
)

WhenRules(
    rules_any=[SecondOffense],
    then=[
        LabelAdd(entity=UserId, label='suspended'),
        DeclareVerdict(verdict='reject'),
    ],
)
```
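
To see how the labels drive the progression, the two `WhenRules` blocks can be simulated over successive actions. The sketch below is plain Python with a hypothetical `handle_action` helper; it keeps labels in an in-memory set and does not model label expiry (`expires_after`).

```python
from typing import Optional


def handle_action(labels: set, violates_policy: bool) -> Optional[str]:
    """Apply both rule blocks to one action; return a verdict or None."""
    # Evaluate both rules against the labels as they are before any effect runs.
    first_offense = violates_policy and 'warned' not in labels
    second_offense = (
        violates_policy and 'warned' in labels and 'suspended' not in labels
    )
    verdict = None
    if first_offense:
        labels.add('warned')
    if second_offense:
        labels.add('suspended')
        verdict = 'reject'
    return verdict


labels = set()
first = handle_action(labels, violates_policy=True)    # adds 'warned'
second = handle_action(labels, violates_policy=True)   # adds 'suspended' and rejects
third = handle_action(labels, violates_policy=True)    # already suspended: neither rule passes
print(first, second, third, sorted(labels))
```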

---

## Validation Rules

SML validates rules statically, before any rule runs. Common validation errors:

| Error | Cause | Fix |
|-------|-------|-----|
| "rules must be stored in non-local features" | Rule name starts with `_` | Remove underscore prefix |
| "use local features when possible" | Feature that isn't useful to a moderator or evaluator in the UI is missing the `_` prefix | Add underscore prefix |
| "requires either a string literal or an f-string" | Using a variable for description | Use string or f-string literal |
| "import rules are not sorted" | Import list not alphabetized | Sort imports lexicographically |
| "imported file not found" | Invalid path in Import | Check file path exists |
| "unknown label" | Label not in config | Add label to labels config |
| "invalid regex pattern" | Bad regex syntax | Fix regex pattern |

---

## Quick Reference

### Rule Definition
```python
RuleName = Rule(
    when_all=[conditions],
    description='string',  # or f'f-string with {Variable}'
)
```

### Effect Wiring
```python
WhenRules(
    rules_any=[Rule1, Rule2],
    then=[Effect1, Effect2],
)
```

### Null Safety Checklist
- [ ] Check if optional fields use `required=False`
- [ ] Add explicit `!= Null` checks before using potentially null values
- [ ] Consider using `ResolveOptional` for default values
- [ ] Test rules with missing data scenarios