OR-1 dataflow CPU sketch

feat: implement frame matching lanes for concurrent PE matching

Orual 264195f0 65613978

+3952 -139
+30 -13
CLAUDE.md
··· 106 106 - `tests/test_migration_cleanup.py` — Verifies removed types (SysToken, CfgOp, etc.) are absent from codebase 107 107 - `tests/test_pe_events.py` — PE event emission tests (TokenReceived, Matched, Executed, Emitted, IRAMWritten, FrameAllocated, FrameFreed, FrameSlotWritten, TokenRejected) 108 108 - `tests/test_pe_frames.py` — Frame-based PE matching, routing, and lifecycle tests 109 + - `tests/test_pe_lanes.py` — Lane-based matching tests (ALLOC_SHARED, FREE_LANE, smart FREE, lane exhaustion, pipelining) 109 110 - `tests/test_sm_events.py` — SM event emission tests (CellWritten, DeferredRead, DeferredSatisfied, ResultSent) 110 111 - `tests/test_cycle_timing.py` — Cycle-accurate timing verification tests 111 112 - `tests/test_network_events.py` — Network-level event propagation tests ··· 163 164 - `FrameSlotValue = int | FrameDest | None` -- type alias for frame slot contents 164 165 - `OutputStyle` enum -- INHERIT, CHANGE_TAG, SINK for output routing decisions 165 166 - `TokenKind` enum -- DYADIC, MONADIC, INLINE for token kind classification 166 - - `FrameOp(IntEnum)` -- ALLOC, FREE for frame lifecycle control tokens 167 + - `FrameOp(IntEnum)` -- ALLOC, FREE, ALLOC_SHARED, FREE_LANE for frame lifecycle control tokens 167 168 - `is_monadic_alu(op: ALUOp) -> bool` -- canonical source of truth for monadic ALU op classification (used by `emu/pe.py` and `asm/opcodes.py`) 168 169 169 170 ### ALU (emu/alu.py) ··· 181 182 Frame-based processing element with activation context management. 
182 183 183 184 **Frame Storage:** 184 - - `frames: list[list[FrameSlotValue]]` -- 2D array [frame_id][slot_idx] holding FrameDest objects and constants 185 - - `tag_store: dict[int, int]` -- maps act_id → frame_id for activation-to-frame lookup 186 - - `presence: list[list[bool]]` -- [frame_id][match_slot] for dyadic operand waiting state 187 - - `port_store: list[list[Optional[Port]]]` -- [frame_id][match_slot] for operand port metadata 185 + - `frames: list[list[FrameSlotValue]]` -- 2D array [frame_id][slot_idx] holding FrameDest objects and constants (shared across all lanes) 186 + - `tag_store: dict[int, tuple[int, int]]` -- maps act_id → (frame_id, lane) for activation-to-frame-and-lane lookup 187 + - `match_data: list[list[list[Optional[int]]]]` -- 3D array [frame_id][match_slot][lane] for operand values waiting for partner 188 + - `presence: list[list[list[bool]]]` -- 3D array [frame_id][match_slot][lane] for dyadic operand waiting state 189 + - `port_store: list[list[list[Optional[Port]]]]` -- 3D array [frame_id][match_slot][lane] for operand port metadata 190 + - `lane_count: int` -- number of matching lanes per frame 191 + - `lane_free: dict[int, set[int]]` -- per-frame set of available lane IDs (created on ALLOC, deleted on full FREE) 188 192 - `free_frames: list[int]` -- pool of unallocated frame IDs 189 193 - `iram: dict[int, Instruction]` -- instruction memory indexed by offset 190 194 ··· 194 198 - Monadic CMToken: 4 cycles (dequeue + IFETCH + EXECUTE + EMIT) 195 199 196 200 **Matching Logic:** 197 - - DyadToken arrives with act_id: look up frame_id via tag_store, then check presence[frame_id][iram_offset] 198 - - If slot empty: store token.data and token.port, set presence bit, wait for partner 199 - - If slot occupied: retrieve partner data and port, clear presence bit, fire instruction with both operands 201 + - DyadToken arrives with act_id: look up (frame_id, lane) via tag_store 202 + - Match slot is derived from token.offset: match_slot = 
token.offset % matchable_offsets 203 + - If presence[frame_id][match_slot][lane] is False: store token.data in match_data[frame_id][match_slot][lane], store token.port in port_store[frame_id][match_slot][lane], set presence bit to True, wait for partner 204 + - If presence[frame_id][match_slot][lane] is True: retrieve partner data and port from match_data and port_store, clear presence bit, fire instruction with both operands 200 205 - Port ordering: partner with Port.L goes to left operand; Port.R to right operand 206 + - Match data, presence, and port storage are per-lane; frame constants/destinations (in frames) remain shared across all lanes 201 207 202 208 **Output Routing** (determined by `Instruction.output`): 203 209 - `OutputStyle.INHERIT` -- routes to destinations specified in frame slots ··· 213 219 **Output logging:** 214 220 - `PE.output_log: list` records every token emitted (for testing and tracing) 215 221 222 + **Frame Control Operations** (`_handle_frame_control`): 223 + - `ALLOC` -- allocates a fresh frame from free_frames, assigns lane 0, initializes lane_free with remaining lanes 224 + - `FREE` -- smart free: removes act_id from tag_store, clears lane match state. If other activations share the frame, returns lane to lane_free (frame_freed=False). If last lane, returns frame to free_frames and clears frame slots (frame_freed=True) 225 + - `ALLOC_SHARED` -- shared allocation: looks up parent act_id (from payload) in tag_store, finds parent's frame_id, assigns next free lane from lane_free. Rejects if parent not found or no free lanes 226 + - `FREE_LANE` -- lane-only free: removes act_id, clears lane match state, returns lane to lane_free. 
Never returns frame to free_frames (frame_freed always False) 227 + 228 + **ALLOC_REMOTE** (RoutingOp in `_run` pipeline): 229 + - Reads fref+0 (target PE), fref+1 (target act_id), fref+2 (parent act_id) from frame constants 230 + - If fref+2 is non-zero: emits FrameControlToken with ALLOC_SHARED op and parent act_id as payload 231 + - If fref+2 is zero: emits FrameControlToken with ALLOC op (fresh frame allocation) 232 + 216 233 **PELocalWriteToken handling:** 217 234 - Writes data to frame slot at specified region/slot within the act_id's frame (1 cycle) 218 235 ··· 262 279 - `System.load(tokens: list[Token])` -- spawns SimPy process that calls send() for each token in order 263 280 264 281 **PEConfig (emu/types.py):** 265 - - `pe_id: int`, `iram: dict[int, Instruction] | None`, `frame_count: int = 8`, `frame_slots: int = 64`, `matchable_offsets: int = 8` 282 + - `pe_id: int`, `iram: dict[int, Instruction] | None`, `frame_count: int = 8`, `frame_slots: int = 64`, `matchable_offsets: int = 8`, `lane_count: int = 4` 266 283 - `initial_frames: Optional[dict[int, list[FrameSlotValue]]]` -- pre-loaded frame data 267 - - `initial_tag_store: Optional[dict[int, int]]` -- pre-loaded act_id -> frame_id mappings 284 + - `initial_tag_store: Optional[dict[int, tuple[int, int]]]` -- pre-loaded act_id → (frame_id, lane) mappings 268 285 - `allowed_pe_routes: Optional[set[int]]` -- if set, restrict PE route_table to these PE IDs 269 286 - `allowed_sm_routes: Optional[set[int]]` -- if set, restrict PE sm_routes to these SM IDs 270 287 - `on_event: EventCallback | None` -- if set, PE fires `SimEvent` for every token receive, match, execute, emit, frame alloc/free, slot write, and rejection ··· 285 302 - `Executed(time, component, op, result, bool_out)` -- PE executed an ALU instruction 286 303 - `Emitted(time, component, token)` -- PE emitted an output token 287 304 - `IRAMWritten(time, component, offset, count)` -- PE wrote instructions to IRAM 288 - - `FrameAllocated(time, 
component, act_id, frame_id)` -- PE allocated a frame 289 - - `FrameFreed(time, component, act_id, frame_id)` -- PE freed a frame 305 + - `FrameAllocated(time, component, act_id, frame_id, lane)` -- PE allocated a frame (lane indicates which matching lane was assigned) 306 + - `FrameFreed(time, component, act_id, frame_id, lane, frame_freed)` -- PE freed a frame lane (frame_freed=True if physical frame returned to pool) 290 307 - `FrameSlotWritten(time, component, frame_id, slot, value)` -- PE wrote to a frame slot 291 308 - `TokenRejected(time, component, token, reason)` -- PE rejected a token (e.g., act_id not in tag store) 292 309 - `CellWritten(time, component, addr, old_pres, new_pres)` -- SM cell presence changed ··· 321 338 **StateSnapshot (monitor/snapshot.py):** 322 339 - `capture(system) -> StateSnapshot` reads live PE/SM state into frozen dataclasses 323 340 - `StateSnapshot(sim_time, next_time, pes: dict[int, PESnapshot], sms: dict[int, SMSnapshot])` 324 - - `PESnapshot(pe_id, frames, tag_store, presence, port_store, free_frames, iram, input_queue, output_log)` -- frame-based PE state 341 + - `PESnapshot(pe_id, iram, frames, tag_store, presence, port_store, match_data, free_frames, lane_count, input_queue, output_log)` -- frame-based PE state with 3D match storage (presence, port_store, match_data are all [frame_id][match_slot][lane]), tag_store mapping act_id → (frame_id, lane) tuples, and lane_count field 325 342 - `SMSnapshot(sm_id, cells: dict[int, SMCellSnapshot], deferred_read, t0_store, input_queue)` 326 343 327 344 **WebSocket protocol (monitor/server.py):**
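Reviewer sketch: the lane-aware matching rules documented in the CLAUDE.md hunk above can be illustrated as follows. This is a minimal sketch, not the code in `emu/pe.py`. The class name `LaneMatcher`, the method `offer`, and the local `Port` enum are hypothetical; only the field names (`tag_store`, `match_data`, `presence`, `port_store`) and the `offset % matchable_offsets` rule come from the diff. A missing `act_id` raises `KeyError` here, whereas the PE is described as emitting `TokenRejected`.

```python
from enum import Enum
from typing import Optional


class Port(Enum):
    L = 0
    R = 1


class LaneMatcher:
    """Hypothetical stand-in for the PE's lane-aware match storage."""

    def __init__(self, frame_count: int = 8, matchable_offsets: int = 8,
                 lane_count: int = 4) -> None:
        self.matchable_offsets = matchable_offsets
        # act_id -> (frame_id, lane), as in the updated tag_store.
        self.tag_store: dict[int, tuple[int, int]] = {}

        def cube(fill):
            # Build an independent [frame_id][match_slot][lane] array.
            return [[[fill] * lane_count for _ in range(matchable_offsets)]
                    for _ in range(frame_count)]

        self.match_data = cube(None)
        self.presence = cube(False)
        self.port_store = cube(None)

    def offer(self, act_id: int, offset: int, port: Port,
              data: int) -> Optional[tuple[int, int]]:
        """Return (left, right) when a pair completes, else None."""
        if act_id not in self.tag_store:
            raise KeyError(f"act_id {act_id} not in tag store")
        frame_id, lane = self.tag_store[act_id]
        slot = offset % self.matchable_offsets

        if not self.presence[frame_id][slot][lane]:
            # First operand: park it in this lane and wait for the partner.
            self.match_data[frame_id][slot][lane] = data
            self.port_store[frame_id][slot][lane] = port
            self.presence[frame_id][slot][lane] = True
            return None

        # Second operand: retrieve the partner and clear the presence bit.
        partner_data = self.match_data[frame_id][slot][lane]
        partner_port = self.port_store[frame_id][slot][lane]
        self.presence[frame_id][slot][lane] = False
        # The operand that arrived on Port.L becomes the left operand.
        if partner_port is Port.L:
            return (partner_data, data)
        return (data, partner_data)
```

Two activations that share frame 2 on lanes 0 and 1 can each park an L operand at offset 3 without colliding, because the lane index separates their match slots.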
+2 -2
asm/codegen.py
···
         layout = act_nodes[0].frame_layout
         if layout is None:
             initial_frames[frame_id] = {}
-            initial_tag_store[act_id] = frame_id
+            initial_tag_store[act_id] = (frame_id, 0)
             continue

         # Build frame slot values for this activation as a sparse dict.
···
                 frame_slots_dict[slot + i] = pack_flit1(fd)

         initial_frames[frame_id] = frame_slots_dict
-        initial_tag_store[act_id] = frame_id
+        initial_tag_store[act_id] = (frame_id, 0)

     # Create PEConfig
     config = PEConfig(
+2
cm_inst.py
···
 class FrameOp(IntEnum):
     ALLOC = 0
     FREE = 1
+    ALLOC_SHARED = 2
+    FREE_LANE = 3


 @dataclass(frozen=True)
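Reviewer sketch: the design note added in this commit splits the 1-bit `op` field into four operations using one spare bit. The new enum values line up with that table if the spare bit is treated as the high bit of the enum value. That reading is an assumption, as are the helper names `frame_op_from_bits` and `frame_op_to_bits`; the diff does not show how the emulator packs these bits.

```python
from enum import IntEnum


class FrameOp(IntEnum):
    ALLOC = 0         # ALLOC_NEW in the design note: op=0, spare=0
    FREE = 1          # FREE_FRAME in the design note: op=1, spare=0
    ALLOC_SHARED = 2  # op=0, spare=1
    FREE_LANE = 3     # op=1, spare=1


def frame_op_from_bits(op_bit: int, spare_bit: int) -> FrameOp:
    """Combine the flit-1 op bit and one spare bit into a FrameOp.

    Assumed layout: enum value = op_bit | (spare_bit << 1).
    """
    return FrameOp((op_bit & 1) | ((spare_bit & 1) << 1))


def frame_op_to_bits(op: FrameOp) -> tuple[int, int]:
    """Split a FrameOp back into (op_bit, spare_bit)."""
    return (int(op) & 1, (int(op) >> 1) & 1)
```

Under this layout a decoder that ignores the spare bit still sees ALLOC_SHARED as an alloc and FREE_LANE as a free, which matches how the note pairs the operations.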
+497
design-notes/frame-lanes-for-concurrent-matching.md
··· 1 + # Frame Lanes: Concurrent Matching Within a Single Activation 2 + 3 + Design note extending `pe-redesign-frames-and-pipeline.md` with per-frame 4 + matching lanes. Addresses the problem of multiple simultaneous pending operands 5 + for the same dyadic instruction within a single activation — required for loops 6 + and recursion. 7 + 8 + ## Companion Documents 9 + 10 + - `pe-redesign-frames-and-pipeline.md` — base architecture (frames, pipeline, 11 + approaches A/B/C) 12 + 13 + --- 14 + 15 + ## Problem Statement 16 + 17 + The current frame model maps each `activation_id` to exactly one `frame_id`. 18 + Within that frame, each dyadic instruction gets one matching slot indexed by 19 + `offset % matchable_offsets`. This means at most one pending operand per 20 + instruction per activation at any time. 21 + 22 + This fails for loops. Consider a counted loop where a dyadic ADD instruction 23 + receives feedback from its own INC output: 24 + 25 + ``` 26 + iteration 1: L operand arrives at offset 3 → stored at presence[frame][3] 27 + iteration 2: L operand arrives at offset 3 → collision! presence[frame][3] 28 + is already set, so the hardware thinks this is a MATCH 29 + (pairing two L operands from different iterations) 30 + ``` 31 + 32 + Without disambiguation, the second iteration's L operand is incorrectly paired 33 + with the first iteration's L operand instead of waiting for its own R partner. 34 + 35 + The original design solved this with a 2-bit generation counter per context 36 + slot, but that consumed token payload bits (6 bits total for ctx+gen) and 37 + limited the instruction offset field. The frame redesign dropped the generation 38 + field to widen offset to 8 bits and simplify the token format. 39 + 40 + We need a mechanism that provides generation-like disambiguation without 41 + re-adding a token field. 
42 + 43 + --- 44 + 45 + ## Proposed Solution: Matching Lanes 46 + 47 + ### Core Idea 48 + 49 + Split the tag store mapping from one-to-one into many-to-one. Multiple 50 + `activation_id` values can map to the **same physical frame** but with different 51 + **lane indices**. Lanes share the frame's constants and destinations (written 52 + once at setup) but provide independent matching slots. 53 + 54 + ``` 55 + tag_store[act_id] → (frame_id, lane) 56 + 57 + Constants/dests: frames[frame_id][slot] — shared across all lanes 58 + Match data: match_data[frame_id][offset][lane] — per-lane 59 + Presence: presence[frame_id][offset][lane] — per-lane (1 bit) 60 + Port: port_store[frame_id][offset][lane] — per-lane (1 bit) 61 + ``` 62 + 63 + ### Frame Control Token Extensions 64 + 65 + The frame control token format (prefix `011+00`) has 3 spare bits in flit 1 66 + and a 16-bit payload in flit 2: 67 + 68 + ``` 69 + Frame control flit 1: [0][1][1][PE:2][00][op:1][act_id:3][spare:3] = 16 bits 70 + Frame control flit 2: [payload:16] 71 + ``` 72 + 73 + The `op` field currently encodes ALLOC (0) and FREE (1). We split these into 74 + four operations using 1 spare bit: 75 + 76 + ``` 77 + op spare[2] operation 78 + ── ──────── ───────── 79 + 0 0 ALLOC_NEW allocate fresh frame, assign lane 0 80 + 0 1 ALLOC_SHARED share existing frame, assign next free lane 81 + 1 0 FREE_FRAME release lane AND return frame to free list 82 + 1 1 FREE_LANE release lane only, frame stays allocated 83 + ``` 84 + 85 + **ALLOC_SHARED** uses flit 2 to carry the parent activation_id whose frame 86 + should be shared: 87 + 88 + ``` 89 + ALLOC_SHARED flit 2: [parent_act_id:3][spare:13] 90 + ``` 91 + 92 + The PE looks up `tag_store[parent_act_id] → (frame_id, _)`, picks the next 93 + free lane from `lane_free[frame_id]`, and records 94 + `tag_store[act_id] = (frame_id, lane)`. 
95 + 96 + **FREE_LANE** removes the act_id → (frame, lane) mapping, clears that lane's 97 + presence/port bits across all matchable offsets, and marks the lane as free. 98 + The frame remains allocated (constants/dests preserved). 99 + 100 + **FREE_FRAME** does the same as FREE_LANE, then additionally returns the 101 + frame to the free list. Should only be issued when all lanes for that frame 102 + are free (or the PE can force-clear remaining lanes). 103 + 104 + ### Lifecycle for a Loop 105 + 106 + ``` 107 + 1. ALLOC_NEW(act_id=0) → frame 2, lane 0 108 + 2. Setup: write constants/dests to frame 2 109 + 3. Iteration 1 seed tokens use act_id=0 110 + 111 + 4. Before iteration 2: 112 + ALLOC_SHARED(act_id=1, parent=0) → frame 2, lane 1 113 + Iteration 2 seed tokens use act_id=1 114 + 115 + 5. When iteration 1 completes: 116 + FREE_LANE(act_id=0) → lane 0 freed, frame 2 stays 117 + 118 + 6. Before iteration 3: 119 + ALLOC_SHARED(act_id=2, parent=1) → frame 2, lane 0 (recycled) 120 + Iteration 3 seed tokens use act_id=2 121 + 122 + 7. When all iterations done: 123 + FREE_FRAME(act_id=last) → frame 2 returned to free list 124 + ``` 125 + 126 + Constants are written once in step 2. Each iteration gets its own matching 127 + lanes via a different act_id sharing the same frame. 128 + 129 + ### ABA Safety 130 + 131 + The ABA concern: a stale token from iteration 1 (act_id=0) arrives after 132 + act_id=0 has been freed and re-allocated for iteration 3 (act_id=2). 133 + 134 + This is safe because: 135 + - FREE_LANE removes act_id=0 from the tag store entirely 136 + - When act_id=2 is allocated for iteration 3, it uses act_id=2 (not act_id=0) 137 + - Any stale token with act_id=0 hits "act_id not in tag store" → rejected 138 + 139 + With 3-bit act_id (8 values) and at most 4 lanes per frame, there are 4 IDs 140 + of ABA distance between allocation and re-use of the same act_id value. Given 141 + that stale tokens drain within single-digit cycles, this is sufficient. 
142 + 143 + --- 144 + 145 + ## Hardware Impact by Approach 146 + 147 + ### Lane Count 148 + 149 + L = number of lanes per frame. Practical values: 2 (1 bit) or 4 (2 bits). 150 + 151 + With L=4 and 4 frames: 16 possible (frame, lane) pairs. With 8 matchable 152 + offsets: 128 match slots total. This provides 4 simultaneous pending operands 153 + per instruction per frame — enough for most loop depths. Deeply nested 154 + recursion beyond L would require frame splitting across PEs (the assembler 155 + already supports this). 156 + 157 + ### Approach C: 74LS670 Lookup (Recommended v0) 158 + 159 + **Tag store changes:** 160 + 161 + Currently: 2× 670, act_id → {valid:1, frame_id:2, spare:1} 162 + 163 + With lanes: 2× 670, act_id → {valid:1, frame_id:2, lane:1} for L=2 164 + (the spare bit becomes the lane index). For L=4, we need 165 + {valid:1, frame_id:2, lane:2} = 5 bits, which exceeds one 670's 4-bit width. 166 + 167 + Options: 168 + - **L=2 (1-bit lane):** fits in existing 670 layout. Zero additional chips for 169 + tag store. The spare bit becomes the lane bit. 170 + - **L=4 (2-bit lane):** need a third 670 to hold the extra lane bit (and 171 + valid moves there too). +1 chip. 172 + 173 + **Presence/port metadata changes:** 174 + 175 + Currently: 4× 670 indexed by frame_id, each word holds presence+port for 176 + 2 offsets across 4 frames. Layout: 177 + 178 + ``` 179 + 670 chip N (offsets 2N, 2N+1): 180 + word[frame_id] = {pres_2N:1, port_2N:1, pres_2N+1:1, port_2N+1:1} 181 + ``` 182 + 183 + With lanes, the index becomes `[frame_id:2][lane]` instead of just 184 + `[frame_id:2]`. The 670 has 4 words, so: 185 + 186 + - **L=2:** index is `[frame_id:2][lane:1]` = 3 bits, but the 670 only has 187 + 2-bit addressing (4 words). We need to double the 670 count: 8× 670 for 188 + presence+port, with lane as chip-select. 
**+4 chips.** 189 + 190 + Alternatively, re-pack: each 670 word holds presence+port for 1 offset 191 + across 2 lanes: `{pres_L0:1, port_L0:1, pres_L1:1, port_L1:1}`. Then 192 + we need 8 offsets × 1 chip each = 8× 670, indexed by frame_id (2 bits), 193 + with offset selecting the chip. This is the same +4 chips but cleaner. 194 + 195 + - **L=4:** index is `[frame_id:2][lane:2]` = 4 bits. The 670 has 4 words 196 + (2-bit address), so we'd need 4 670s per offset-pair, one per frame_id. 197 + That's 4 × 4 = 16 670s for presence+port. Impractical. At L=4, the 198 + presence/port metadata should move to SRAM or use a different register 199 + approach. 200 + 201 + **Match operand data:** 202 + 203 + Currently in frame SRAM at `[1][frame_id:2][match_slot:3]` (match slots are 204 + the low 8 offsets within the frame). With lanes, the address becomes 205 + `[1][frame_id:2][match_slot:3][lane]`. 206 + 207 + - **L=2:** address is `[1][frame_id:2][match_slot:3][lane:1]` = 7 bits including 208 + the region-select bit (6 index bits). 64 entries × 16 bits = 128 bytes. Well within SRAM 209 + capacity. No additional chips. 210 + 211 + - **L=4:** address is `[1][frame_id:2][match_slot:3][lane:2]` = 8 bits including the region-select bit (7 index bits). 212 + 128 entries × 16 bits = 256 bytes. Still fits in SRAM. 213 + 214 + **Lane free tracking:** 215 + 216 + Per frame, track which lanes are free. For L=2: 1 flip-flop per frame × 4 217 + frames = 4 bits. For L=4: a 2-bit counter or 4-bit bitmask per frame = 8–16 218 + bits. Either fits in a single 74LS174 (hex D flip-flop) or similar.
**+1 chip.** 219 + 220 + **Approach C summary (L=2):** 221 + 222 + | Component | Before | After | Delta | 223 + |----------------------------|--------|--------|-------| 224 + | act_id → (frame_id, lane) | 2× 670 | 2× 670 | 0 | 225 + | Presence + port metadata | 4× 670 | 8× 670 | +4 | 226 + | Bit select mux | 1–2 | 1–2 | 0 | 227 + | Lane free tracking | 0 | 1 | +1 | 228 + | Frame SRAM | 2 | 2 | 0 | 229 + | **Total delta** | | | **+5 chips** | 230 + 231 + **Approach C summary (L=4):** 232 + 233 + Presence/port at L=4 exceeds practical 670 count. Two options: 234 + 235 + (a) Move presence/port to SRAM. Pack all 4 lanes' presence+port for one 236 + (frame, offset) into a single 16-bit word: 237 + `{pres0:1, port0:1, pres1:1, port1:1, ..., pres3:1, port3:1, spare:8}`. 238 + Read in 1 SRAM cycle, same SRAM chip as frame data. Adds 1 cycle to matching 239 + (read presence word before reading/writing match data). **+0 chips, +1 cycle.** 240 + 241 + (b) Use 74LS189 register files instead of 670s for presence/port. 189s are 242 + 16-word × 4-bit, addressed by `[frame_id:2][offset:2]` = 4 bits. Two 189s 243 + (8 bits) hold presence+port for 4 lanes at one (frame, low-2-offset) combo. 244 + With offset[2] as chip-select, that's 4× 189. **+4 chips (replacing 4× 670 245 + with 4× 189)**, net change depends on baseline. 246 + 247 + Option (a) is simpler and fits the v0 "minimal chips" philosophy. The extra 248 + SRAM cycle for presence is identical to Approach A's tag read — it just 249 + applies to the lane dimension instead. 250 + 251 + ### Approach A: Set-Associative Tags in Frame SRAM 252 + 253 + Approach A already uses SRAM for tag storage. Lanes change the tag word format. 254 + 255 + Currently, each tag word packs 4-way set-associative entries: 256 + 257 + ``` 258 + {way0_valid:1, way0_act:3, way1_valid:1, way1_act:3, ...} = 16 bits 259 + ``` 260 + 261 + With lanes, the tag word already supports the concept — each way IS effectively 262 + a lane. 
The act_id comparison finds the matching way, and the way index IS the 263 + lane. The only change: ALLOC_SHARED must write the same frame region for 264 + multiple act_ids. 265 + 266 + Actually, Approach A's set-associative structure already provides something 267 + very close to lanes. The ways in the tag word serve the same purpose — multiple 268 + act_ids can have pending operands at the same offset, disambiguated by act_id 269 + comparison. The 4-way associativity gives 4 simultaneous pending matches per 270 + offset across ALL activations. 271 + 272 + **Key difference:** in Approach A, the ways are shared across all activations 273 + at that offset (global pool). The lane model gives per-frame isolation. Under 274 + Approach A, if two different functions both have a pending operand at offset 3, 275 + they consume 2 of the 4 ways. Under the lane model, each frame has its own L 276 + lanes — no cross-activation contention. 277 + 278 + **Approach A with lanes:** the tag word becomes: 279 + 280 + ``` 281 + {way0_valid:1, way0_act:3, way0_lane:1, way1_valid:1, way1_act:3, way1_lane:1, ...} 282 + ``` 283 + 284 + This doesn't fit in 16 bits for 4 ways with L=2 (5 bits × 4 = 20 bits). Would 285 + require wider tag words (32-bit SRAM or 2 reads per tag lookup), or reducing 286 + to 2 ways. 287 + 288 + Alternatively, since act_id already implies frame_id (via the tag store), 289 + Approach A doesn't benefit from lanes in the same way. The set-associative 290 + structure already provides the disambiguation — adding lanes on top is 291 + redundant. **Approach A doesn't need lanes; its ways serve the same purpose.** 292 + 293 + The real question for Approach A is: does 4-way associativity (global, shared) 294 + provide enough concurrent matching depth? For loops: yes, as long as no more 295 + than 4 iterations have pending operands at the same offset simultaneously. 296 + For mixed workloads with multiple activations: depends on access patterns. 
297 + 298 + ### Approach B: Full Register-File Match Pool 299 + 300 + Original Approach B: 8-entry global pool with `{valid:1, act_id:3, offset:6, 301 + port:1, data:16}` per entry, fully associative. 302 + 303 + **With lanes, the pool needs a lane field:** `{valid:1, act_id:3, offset:3, 304 + lane:1, port:1, data:16}` for L=2. The comparator now matches on 305 + `(act_id, offset, lane)` — but lane is derived from act_id via the tag store, 306 + not carried in the token. So the comparator actually still matches on 307 + `(act_id, offset)` as before. 308 + 309 + Wait — that's the key insight. Since the token carries act_id (not frame_id + 310 + lane), and different iterations use different act_ids, the existing Approach B 311 + pool already disambiguates correctly without any lane concept at all: 312 + 313 + - Iteration 1 (act_id=0): L operand stored as `{act_id=0, offset=3, ...}` 314 + - Iteration 2 (act_id=1): L operand stored as `{act_id=1, offset=3, ...}` 315 + - These don't match because act_id differs. 316 + 317 + **Approach B already handles concurrent matching across iterations, provided 318 + each iteration uses a distinct act_id.** The only addition is the 319 + ALLOC_SHARED/FREE_LANE mechanism to allow multiple act_ids to share one frame. 320 + No changes to the match pool hardware at all. 321 + 322 + The constraint: the global pool has 8 entries total. With 4 iterations × 2 323 + pending operands each = 8 entries consumed. A tight but functional limit. 324 + 325 + **B+670 variants:** same analysis. The 670s resolve act_id → frame_id for 326 + constant/dest access. The match pool (whether fully indexed or semi-CAM) uses 327 + act_id directly and already disambiguates. **Zero additional match hardware 328 + for lanes.** 329 + 330 + ### Approach B+670 Indexed (Dedicated Register Slots) 331 + 332 + Currently: `[frame_id:2][offset:2:0]` = 5-bit address, 32 entries dedicated. 333 + One entry per (frame, offset) pair. 
334 + 335 + With ALLOC_SHARED: multiple act_ids map to the same frame_id, but they get 336 + different lanes. The match data must be indexed by `[frame_id:2][offset:3] 337 + [lane]` instead of just `[frame_id:2][offset:3]`. 338 + 339 + - **L=2:** 6-bit address, 64 entries. 8× 189 chips (up from 8). Actually, 340 + the original B+670 indexed already uses 8× 189 for 32 entries of 16-bit 341 + data. Doubling to 64 entries means 16× 189. That's a lot. Alternatively, 342 + use SRAM for the doubled range: `[frame_id:2][offset:3][lane:1]` = 6 bits 343 + within the match region. 64 entries × 16 bits = 128 bytes. Easily fits in 344 + the shared SRAM chip. But then we lose the "zero SRAM cycles for matching" 345 + advantage. 346 + 347 + Better option: keep register file, use the 670 presence bits to encode lane. 348 + The 670 already stores `{presence, port}` per (frame, offset). With L=2, 349 + expand to `{presence_L0, port_L0, presence_L1, port_L1}`. This is exactly 350 + the same as the Approach C lane expansion above: 8× 670 for 351 + presence+port. **The match data register file doubles, the presence 670s 352 + double. +8 register chips, +4 670 chips = +12 chips.** Steep. 353 + 354 + For B+670 indexed, the more practical approach at L>1 is to fall back to 355 + SRAM for match data and keep the 670s only for act_id resolution and 356 + presence tracking. This effectively converts B+670 indexed into Approach C 357 + with lanes — SRAM match data, 670 metadata. 358 + 359 + ### B+670 Semi-CAM (Associative Within Frame) 360 + 361 + Currently: per-frame associative pool with W ways. Tag stores 362 + `{valid:1, offset:3}` per way. Comparators search offset within frame. 363 + 364 + With lanes: each entry's tag becomes `{valid:1, offset:3, lane:1}` for L=2. 365 + The comparator matches on `(offset, lane)` where lane comes from the 670 366 + lookup. 
**+1 bit per comparator.** For 3-bit offset + 1-bit lane = 4-bit 367 + compare, each 74LS85 (4-bit comparator) handles one entry exactly. 368 + 369 + The pool's way count (W) determines how many simultaneous pending matches 370 + per frame. With L=2 and W=4: 4 pending matches, shared across 2 lanes. 371 + Each lane can use up to W entries (the pool is shared within the frame, 372 + not partitioned per lane). This is actually better than strict per-lane 373 + isolation — if lane 0 has 3 pending and lane 1 has 1, they use 4 entries 374 + total without wasting any. 375 + 376 + **Semi-CAM hardware delta for L=2:** 377 + 378 + | Component | Before (W=2) | After (W=2, L=2) | Delta | 379 + |-------------------|--------------|-------------------|-------| 380 + | Tag registers | 2 chips | 2 chips | 0 | 381 + | Comparators | 2 chips | 2 chips | 0 | 382 + | Data registers | 4 chips | 4 chips | 0 | 383 + | **Total** | | | **0** | 384 + 385 + The only change is the tag width grows by 1 bit (offset:3 → offset:3 + 386 + lane:1 = 4 bits), which fits in the same comparator. **Zero additional chips 387 + for the semi-CAM itself.** 388 + 389 + The ALLOC_SHARED / FREE_LANE logic: +1 chip (lane free tracking). 390 + The 670 tag store: +0 chips (lane bit fits in spare bit). 391 + 392 + **This makes B+670 semi-CAM the most natural fit for lanes.** The 393 + associative pool already handles variable-occupancy matching; adding a 394 + lane bit to the tag is free in hardware. 395 + 396 + --- 397 + 398 + ## Approach Comparison with Lanes 399 + 400 + | Property | A | C (L=2) | C (L=4) | B+670 semi W=2 | B+670 semi W=4 | 401 + |---------------------------|--------------|-------------|--------------|----------------|----------------| 402 + | Needs lanes at all? 
| no (ways) | yes | yes | yes | yes | 403 + | Extra chips for lanes | 0 | +5 | +1 (SRAM) | +1 | +1 | 404 + | Pending matches/frame | 4 (shared) | 2 per lane | 4 per lane | W (shared) | W (shared) | 405 + | Extra SRAM cycles | 0 | 0 | +1 (pres) | 0 | 0 | 406 + | Cross-activation contention | yes (global) | no | no | no | no | 407 + | Implementation complexity | none | moderate | moderate | minimal | minimal | 408 + 409 + **Winner for lanes: B+670 semi-CAM.** Zero additional match hardware, lanes 410 + come free via the existing associative tag. The 670 tag store absorbs the 411 + lane bit in its spare capacity. Only cost is 1 chip for lane free tracking 412 + and the ALLOC_SHARED/FREE_LANE control logic. 413 + 414 + **Runner-up: Approach C with L=2.** +5 chips (all 670s for doubled 415 + presence/port). Simple, well-understood, but the 670 count is getting high 416 + (10 670s per PE). 417 + 418 + --- 419 + 420 + ## SRAM Address Map Update 421 + 422 + With L=2 lanes, the frame SRAM match region doubles: 423 + 424 + ``` 425 + v0 address space with lanes (L=2): 426 + 427 + IRAM region: [0][offset:8] instruction templates 428 + capacity: 256 instructions (512 bytes) 429 + 430 + Frame region: [1][frame_id:2][slot:6] per-activation storage 431 + capacity: 4 frames × 64 slots = 256 entries (512 bytes) 432 + (constants, destinations, accumulators — shared across lanes) 433 + 434 + Match region: (Approach C / SRAM-based) 435 + [1][1][frame_id:2][offset:3][lane:1] match operand data 436 + capacity: 4 × 8 × 2 = 64 entries (128 bytes) 437 + (carved from frame region address space, or separate region) 438 + ``` 439 + 440 + Total: 512 + 512 + 128 = 1152 bytes. Still well under 32Kx8 capacity. 441 + 442 + For B+670 semi-CAM: match data lives in register files, not SRAM. The SRAM 443 + address map is unchanged (frame region only holds shared constants/dests). 444 + 445 + --- 446 + 447 + ## Assembler Impact 448 + 449 + The assembler must: 450 + 451 + 1. 
**Detect loops and recursion** that require concurrent matching. Static 452 + analysis of feedback arcs in the dataflow graph. 453 + 454 + 2. **Allocate activation IDs per iteration.** The loop prologue emits 455 + ALLOC_SHARED for each new iteration's act_id before injecting seed tokens. 456 + The loop epilogue emits FREE_LANE when an iteration completes. 457 + 458 + 3. **Track lane depth.** If a loop's concurrency exceeds L (or W for 459 + semi-CAM), the assembler must either: 460 + - Insert synchronisation barriers (drain iteration N before starting N+L) 461 + - Split the loop body across PEs to reduce per-PE concurrency 462 + - Report a warning (analogous to matchable_offsets exceedance, AC5.8) 463 + 464 + 4. **Generate setup tokens.** ALLOC_SHARED tokens carry the parent act_id 465 + in their payload. The codegen pass already generates frame control tokens; 466 + this extends the format. 467 + 468 + --- 469 + 470 + ## Open Questions 471 + 472 + 1. **L=2 vs L=4 for v0.** L=2 is cheaper (+5 670s for Approach C, +0 for 473 + semi-CAM) and handles 2-deep loop pipelining. L=4 handles deeper nesting 474 + but costs more in metadata storage. Recommendation: L=2 for v0, upgradable. 475 + 476 + 2. **Loop iteration management.** Who manages the ALLOC_SHARED / FREE_LANE 477 + sequence? Options: 478 + - **Compiler-generated:** the assembler statically emits alloc/free tokens 479 + as part of the loop control flow. Simple, but inflexible. 480 + - **PE-internal:** a loop counter mechanism in the PE automatically 481 + rotates lanes. More complex hardware, but simpler programs. 482 + - **Hybrid:** compiler generates the control flow, PE provides the 483 + lane allocation hardware. (Recommended.) 484 + 485 + 3. **Semi-CAM way count vs lane count.** With B+670 semi-CAM, W (ways per 486 + frame) and L (lanes per frame) interact. W=4 with L=2 gives 4 pending 487 + matches shared across 2 lanes — 2 pending per lane on average, more if 488 + one lane is quiet. Is W=2 sufficient? 
Depends on the number of dyadic 489 + instructions with simultaneously pending operands. 490 + 491 + 4. **Interaction with SC arc execution.** Strongly-connected arc blocks 492 + execute sequential instructions within a single activation. Lanes are 493 + orthogonal — SC arcs don't need concurrent matching (they're sequential). 494 + But the frame_id latch for SC arcs must also latch the lane. Trivial 495 + addition. 496 + 497 + <!-- freshness: 2026-03-07 -->
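The byte totals in the SRAM address map can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the commit: it assumes 2-byte entries (inferred from "256 instructions (512 bytes)"), and `region_bytes` is a hypothetical helper introduced only for illustration.

```python
# Sketch only: field widths come from the address map above; the 2-byte
# entry width is an assumption inferred from "256 instructions (512 bytes)".
ENTRY_BYTES = 2
SRAM_BYTES = 32 * 1024  # 32Kx8 part


def region_bytes(*field_bits: int) -> int:
    """Bytes needed for a region addressed by the given bit fields."""
    entries = 1
    for bits in field_bits:
        entries *= 1 << bits
    return entries * ENTRY_BYTES


iram = region_bytes(8)         # [0][offset:8]
frame = region_bytes(2, 6)     # [1][frame_id:2][slot:6]
match = region_bytes(2, 3, 1)  # [1][1][frame_id:2][offset:3][lane:1], L=2
total = iram + frame + match

print(iram, frame, match, total, total <= SRAM_BYTES)
```

Widening the lane field from 1 bit to 2 (L=4) doubles only the match region, which is why the frame region figure does not change with L.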
+270
docs/design-plans/2026-03-07-frame-lanes.md
··· 1 + # Frame Matching Lanes Design 2 + 3 + ## Summary 4 + 5 + Extend the PE's frame-based matching to support multiple simultaneous pending 6 + operands per instruction within a single activation. Multiple `activation_id` 7 + values share one physical frame (constants/destinations) while maintaining 8 + independent matching state per lane. Required for loop pipelining and recursion. 9 + Changes span token types, PE internals, codegen, monitor, and tests. Assembler 10 + macro expansion for automatic loop pipelining is out of scope. 11 + 12 + ## Definition of Done 13 + 14 + The PE emulator supports matching lanes — multiple activation IDs sharing one 15 + physical frame with independent match/presence/port storage per lane. FrameOp 16 + gains ALLOC_SHARED and FREE_LANE. FREE_FRAME auto-detects last lane. 17 + ALLOC_REMOTE is data-driven (frame constant flag for shared vs new). Existing 18 + tests pass with the updated tag_store tuple API. New tests demonstrate 19 + shared-frame matching, lane exhaustion rejection, smart free behaviour, and 20 + data-driven ALLOC_REMOTE. Assembler macro expansion for automatic loop 21 + pipelining is explicitly out of scope. 22 + 23 + ## Acceptance Criteria 24 + 25 + ### AC1: Tag Store Tuple API 26 + 27 + - **frame-lanes.AC1.1:** `tag_store` maps `act_id → (frame_id, lane)` where 28 + `lane` is an `int` in range `[0, lane_count)`. 29 + - **frame-lanes.AC1.2:** `PEConfig.initial_tag_store` type is 30 + `dict[int, tuple[int, int]]`. PE constructor initialises tag_store from it. 31 + - **frame-lanes.AC1.3:** `PEConfig.lane_count` field exists with default 4. 32 + Controls third dimension of match arrays. 33 + - **frame-lanes.AC1.4:** All existing tests pass with updated tuple API. 34 + 35 + ### AC2: Separate Match Data Storage 36 + 37 + - **frame-lanes.AC2.1:** Match operand data lives in 38 + `match_data[frame_id][offset][lane]`, separate from `frames[frame_id][slot]`. 
39 + - **frame-lanes.AC2.2:** `presence[frame_id][offset][lane]` is a 3D bool 40 + array. `port_store[frame_id][offset][lane]` likewise. 41 + - **frame-lanes.AC2.3:** `_match_frame()` uses `(frame_id, match_slot, lane)` 42 + to read/write match data, presence, and port. 43 + - **frame-lanes.AC2.4:** `frames[frame_id][slot]` remains shared across all 44 + lanes. Constants and destinations are NOT per-lane. 45 + 46 + ### AC3: FrameOp Extensions 47 + 48 + - **frame-lanes.AC3.1:** `FrameOp.ALLOC_SHARED` added. When received, 49 + PE looks up `parent_act_id` (from payload), finds parent's `frame_id`, 50 + assigns next free lane from that frame's lane pool, records 51 + `tag_store[act_id] = (frame_id, lane)`. Clears only that lane's 52 + presence/port bits. 53 + - **frame-lanes.AC3.2:** `FrameOp.FREE_LANE` added. Removes tag_store entry, 54 + clears that lane's presence/port/match_data across all matchable offsets. 55 + Does NOT return frame to free list. 56 + - **frame-lanes.AC3.3:** `FrameOp.FREE` (existing) becomes smart: removes 57 + tag_store entry, clears lane data. If no other tag_store entries reference 58 + the same frame_id, returns frame to free list and clears frame slots. If 59 + other entries exist, behaves like FREE_LANE. 60 + - **frame-lanes.AC3.4:** `FrameOp.ALLOC` (existing) unchanged — allocates 61 + fresh frame, assigns lane 0. 62 + - **frame-lanes.AC3.5:** `FrameAllocated` event gains `lane: int` field. 63 + `FrameFreed` event gains `lane: int` and `frame_freed: bool` fields. 64 + - **frame-lanes.AC3.6:** When all lanes for a frame are occupied and 65 + ALLOC_SHARED is received, PE emits `TokenRejected` with reason 66 + "no free lanes" and drops the token. 67 + 68 + ### AC4: ALLOC_REMOTE Data-Driven 69 + 70 + - **frame-lanes.AC4.1:** ALLOC_REMOTE reads `fref+2` from frame. If value 71 + is non-zero, emits `FrameControlToken` with `op=ALLOC_SHARED` and 72 + `payload=parent_act_id`. If zero, emits `op=ALLOC` as before. 
73 + - **frame-lanes.AC4.2:** No new opcodes. Behaviour is entirely data-driven 74 + from frame constants. 75 + 76 + ### AC5: FREE_FRAME Instruction 77 + 78 + - **frame-lanes.AC5.1:** `FREE_FRAME` opcode uses the smart FREE behaviour 79 + from AC3.3. Frees the executing token's activation lane; returns frame to 80 + free list only if last lane. 81 + 82 + ### AC6: Monitor and Snapshot Updates 83 + 84 + - **frame-lanes.AC6.1:** `PESnapshot.tag_store` type becomes 85 + `dict[int, tuple[int, int]]`. 86 + - **frame-lanes.AC6.2:** `PESnapshot` gains `match_data`, `lane_count` fields 87 + reflecting the separated match storage. 88 + - **frame-lanes.AC6.3:** Monitor REPL `pe` command displays lane info in 89 + tag_store output. 90 + - **frame-lanes.AC6.4:** Monitor graph JSON serialises lane info correctly. 91 + 92 + ### AC7: Codegen Updates 93 + 94 + - **frame-lanes.AC7.1:** `codegen.py` generates `initial_tag_store` with 95 + `(frame_id, lane)` tuples. Existing single-activation code uses lane 0. 96 + - **frame-lanes.AC7.2:** No codegen changes needed for ALLOC_SHARED (manual 97 + construction only for now). 98 + 99 + ### AC8: Test Coverage 100 + 101 + - **frame-lanes.AC8.1:** Test: two act_ids sharing a frame via ALLOC_SHARED 102 + have independent matching — L operand for act_id 0 does not interfere with 103 + L operand for act_id 1 at the same offset. 104 + - **frame-lanes.AC8.2:** Test: ALLOC_SHARED with all lanes occupied emits 105 + TokenRejected. 106 + - **frame-lanes.AC8.3:** Test: FREE on a shared frame frees only the lane; 107 + other lanes' data is preserved. FREE on last lane frees the frame. 108 + - **frame-lanes.AC8.4:** Test: ALLOC_REMOTE emits ALLOC_SHARED when 109 + `fref+2` is non-zero. 110 + - **frame-lanes.AC8.5:** Test: ALLOC_REMOTE emits ALLOC when `fref+2` is 111 + zero (backwards compatible). 
112 + - **frame-lanes.AC8.6:** Test: full loop pipelining scenario — two 113 + iterations of a dyadic instruction running concurrently on different 114 + lanes, both producing correct results. 115 + 116 + ## Architecture 117 + 118 + ### Current Model 119 + 120 + ``` 121 + tag_store[act_id] → frame_id (1:1 mapping) 122 + frames[frame_id][slot] (constants + dests + match data mixed) 123 + presence[frame_id][offset] (1 pending operand per instruction) 124 + port_store[frame_id][offset] (port of pending operand) 125 + ``` 126 + 127 + ### New Model 128 + 129 + ``` 130 + tag_store[act_id] → (frame_id, lane) (many:1, multiple act_ids per frame) 131 + frames[frame_id][slot] (constants + dests ONLY, shared) 132 + match_data[frame_id][offset][lane] (per-lane operand storage) 133 + presence[frame_id][offset][lane] (per-lane presence bits) 134 + port_store[frame_id][offset][lane] (per-lane port metadata) 135 + lane_free[frame_id] → set[int] (available lanes per frame) 136 + ``` 137 + 138 + ### Frame Control Token Payload Convention 139 + 140 + ``` 141 + ALLOC: payload ignored (or return routing) 142 + ALLOC_SHARED: payload = parent_act_id (low 3 bits) 143 + FREE: payload ignored 144 + FREE_LANE: payload ignored 145 + ``` 146 + 147 + ### ALLOC_REMOTE Frame Slot Convention 148 + 149 + ``` 150 + fref+0: target_pe (int) 151 + fref+1: target_act_id (int) 152 + fref+2: parent_act_id (0 = ALLOC, non-zero = ALLOC_SHARED) 153 + ``` 154 + 155 + ### Smart FREE Behaviour 156 + 157 + When FREE or FREE_FRAME executes for an act_id: 158 + 1. Look up `(frame_id, lane)` from tag_store 159 + 2. Remove tag_store entry for act_id 160 + 3. Clear match_data/presence/port_store for that lane across all offsets 161 + 4. Return lane to `lane_free[frame_id]` 162 + 5. Scan tag_store: does any other entry reference frame_id?
163 + - No → return frame to free_frames, clear all frame slots 164 + - Yes → frame stays allocated, constants/dests preserved 165 + 166 + ### Lifecycle Example: Loop Pipelining 167 + 168 + ``` 169 + 1. ALLOC(act_id=0) → frame 2, lane 0 170 + 2. Setup: write constants/dests to frame 2 171 + 3. Iteration 1 seeds use act_id=0 172 + 173 + 4. ALLOC_SHARED(act_id=1, parent=0) → frame 2, lane 1 174 + 5. Iteration 2 seeds use act_id=1 175 + (act_id=0 and act_id=1 match independently at same offsets) 176 + 177 + 6. Iteration 1 completes: FREE(act_id=0) → lane 0 freed, frame stays 178 + 7. ALLOC_SHARED(act_id=2, parent=1) → frame 2, lane 0 (recycled) 179 + 8. Iteration 3 seeds use act_id=2 180 + 181 + 9. All done: FREE(act_id=last) → last lane, frame returned 182 + ``` 183 + 184 + ## Existing Patterns 185 + 186 + - **Frame control handling:** `_handle_frame_control()` in `emu/pe.py` already 187 + dispatches on `FrameOp` enum values. Adding ALLOC_SHARED/FREE_LANE follows 188 + the same pattern. 189 + - **Token rejection:** `TokenRejected` event already exists and is emitted for 190 + invalid act_ids. Lane exhaustion follows the same pattern. 191 + - **Smart free (precedent):** The existing FREE already validates act_id 192 + presence in tag_store before freeing. The smart-free extension adds a 193 + scan step after removal. 194 + - **Data-driven opcode behaviour:** ALLOC_REMOTE already reads frame slots 195 + to determine target. Reading an additional slot for shared-vs-new is the 196 + same pattern. 197 + - **Codegen initial_tag_store:** Already generates `act_id → frame_id` 198 + mappings. Extending to tuples is mechanical. 199 + 200 + ## Implementation Phases 201 + 202 + ### Phase 1: Foundation Types and Tag Store API (2 tasks) 203 + 204 + Update FrameOp enum, PEConfig, and tag_store type across the codebase. 205 + All existing tests adapted to tuple API. No new behaviour yet. 
206 + 207 + ### Phase 2: Separated Match Storage (2 tasks) 208 + 209 + Extract match_data from frames into its own 3D array. Update _match_frame 210 + to use lane dimension (always lane 0 for now). Update presence/port_store 211 + to 3D. Verify matching still works identically. 212 + 213 + ### Phase 3: ALLOC_SHARED, FREE_LANE, Smart FREE (3 tasks) 214 + 215 + Implement new FrameOp handlers. Add lane_free tracking. Implement smart 216 + FREE behaviour. Add events with lane fields. Write tests for all new ops. 217 + 218 + ### Phase 4: ALLOC_REMOTE Data-Driven and FREE_FRAME Update (2 tasks) 219 + 220 + Update ALLOC_REMOTE to read fref+2 for shared-vs-new. Update FREE_FRAME 221 + opcode to use smart free. Write tests. 222 + 223 + ### Phase 5: Monitor, Snapshot, and Codegen Updates (2 tasks) 224 + 225 + Update PESnapshot, capture(), REPL formatting, graph JSON. Update codegen 226 + initial_tag_store to emit tuples. Verify monitor displays lane info. 227 + 228 + ### Phase 6: Integration Tests (1 task) 229 + 230 + Full loop pipelining scenario test. Two concurrent iterations on shared 231 + frame, both producing correct results. E2E verification. 232 + 233 + ## Additional Considerations 234 + 235 + ### ABA Safety 236 + 237 + With 3-bit act_id (8 values) and at most 4 lanes per frame, there are 4 IDs 238 + of ABA distance between allocation and re-use of the same act_id value. 239 + FREE removes the act_id from tag_store entirely, so stale tokens with freed 240 + act_ids hit rejection. Re-allocation uses a different act_id value. 241 + 242 + ### Hardware Mapping 243 + 244 + This design maps cleanly to Approach C (670 lookup) with L=2 at +5 chips, 245 + or to B+670 semi-CAM with zero additional match hardware (lane bit fits in 246 + existing comparator width). See `design-notes/frame-lanes-for-concurrent- 247 + matching.md` for full hardware analysis. 
248 + 249 + ### Future Work 250 + 251 + - **Assembler loop macro:** `#loop_counted` and `#loop_while` could auto- 252 + generate ALLOC_SHARED/FREE_LANE control flow with act_id rotation. 253 + - **Lane depth analysis:** static analysis in the allocator to warn when 254 + loop concurrency exceeds lane_count. 255 + - **SC arc interaction:** frame_id latch for strongly-connected arc execution 256 + must also latch the lane. Trivial addition when SC arcs are implemented. 257 + 258 + ## Glossary 259 + 260 + - **Lane:** An independent matching slot within a shared frame. Multiple 261 + act_ids can map to the same frame_id with different lane indices, 262 + providing concurrent matching without duplicating constants/destinations. 263 + - **Lane pool:** The set of available lanes per frame, tracked by 264 + `lane_free[frame_id]`. Initially all lanes are free; ALLOC assigns lane 0, 265 + ALLOC_SHARED assigns the next free lane. 266 + - **Smart free:** FREE behaviour that auto-detects whether the freed lane 267 + is the last one using that frame. If last, returns frame to free list. 268 + If not, preserves frame for remaining lanes. 269 + - **Parent act_id:** The activation ID whose frame should be shared during 270 + ALLOC_SHARED. Used to look up the target frame_id.
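The lane pool and smart-free rules described in this plan can be modelled in isolation. The sketch below is a toy, not the `emu/pe.py` implementation: `LaneModel` and its method names are hypothetical, it tracks only `tag_store`, `lane_free`, and `free_frames`, and it returns `None` where the real PE would emit `TokenRejected`. "Next free lane" is assumed to mean the lowest-numbered free lane.

```python
from __future__ import annotations


class LaneModel:
    """Toy model of tag_store / lane_free bookkeeping (hypothetical)."""

    def __init__(self, frame_count: int = 4, lane_count: int = 2) -> None:
        self.lane_count = lane_count
        self.free_frames: list[int] = list(range(frame_count))
        self.tag_store: dict[int, tuple[int, int]] = {}
        self.lane_free: dict[int, set[int]] = {}

    def alloc(self, act_id: int) -> tuple[int, int]:
        """ALLOC: take a fresh frame and assign lane 0."""
        frame_id = self.free_frames.pop(0)
        self.lane_free[frame_id] = set(range(1, self.lane_count))
        self.tag_store[act_id] = (frame_id, 0)
        return self.tag_store[act_id]

    def alloc_shared(self, act_id: int, parent_act_id: int) -> tuple[int, int] | None:
        """ALLOC_SHARED: join the parent's frame on the lowest free lane."""
        frame_id, _ = self.tag_store[parent_act_id]
        if not self.lane_free[frame_id]:
            return None  # stands in for TokenRejected("no free lanes")
        lane = min(self.lane_free[frame_id])
        self.lane_free[frame_id].remove(lane)
        self.tag_store[act_id] = (frame_id, lane)
        return self.tag_store[act_id]

    def free(self, act_id: int) -> bool:
        """Smart FREE: returns True only if the frame itself was released."""
        frame_id, lane = self.tag_store.pop(act_id)
        self.lane_free[frame_id].add(lane)
        if any(f == frame_id for f, _ in self.tag_store.values()):
            return False  # other lanes still use this frame
        del self.lane_free[frame_id]
        self.free_frames.append(frame_id)
        return True


pe = LaneModel(lane_count=2)
first = pe.alloc(0)                 # fresh frame, lane 0
second = pe.alloc_shared(1, 0)      # same frame, lane 1
rejected = pe.alloc_shared(3, 0)    # both lanes occupied
freed_a = pe.free(0)                # lane 0 freed, frame stays
recycled = pe.alloc_shared(2, 1)    # lane 0 reused by a new act_id
freed_b = pe.free(1)
freed_c = pe.free(2)                # last lane, frame returned
```

The sequence follows the loop pipelining lifecycle in the plan, with `lane_count=2` so that exhaustion is visible after the second allocation.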
+271
docs/implementation-plans/2026-03-07-frame-lanes/phase_01.md
··· 1 + # Frame Matching Lanes Implementation Plan 2 + 3 + **Goal:** Extend the PE's frame-based matching to support multiple simultaneous pending operands per instruction within a single activation via matching lanes. 4 + 5 + **Architecture:** Multiple `activation_id` values share one physical frame (constants/destinations) while maintaining independent matching state per lane. Tag store maps `act_id → (frame_id, lane)`. Match data, presence, and port storage gain a lane dimension. 6 + 7 + **Tech Stack:** Python 3.12, SimPy 4.1, pytest + hypothesis 8 + 9 + **Scope:** 6 phases from original design (phases 1-6) 10 + 11 + **Codebase verified:** 2026-03-07 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + This phase implements and tests: 18 + 19 + ### frame-lanes.AC1: Tag Store Tuple API 20 + - **frame-lanes.AC1.1 Success:** `tag_store` maps `act_id → (frame_id, lane)` where `lane` is an `int` in range `[0, lane_count)`. 21 + - **frame-lanes.AC1.2 Success:** `PEConfig.initial_tag_store` type is `dict[int, tuple[int, int]]`. PE constructor initialises tag_store from it. 22 + - **frame-lanes.AC1.3 Success:** `PEConfig.lane_count` field exists with default 4. Controls third dimension of match arrays. 23 + - **frame-lanes.AC1.4 Success:** All existing tests pass with updated tuple API. 24 + 25 + --- 26 + 27 + <!-- START_SUBCOMPONENT_A (tasks 1-2) --> 28 + 29 + <!-- START_TASK_1 --> 30 + ### Task 1: Update FrameOp enum with ALLOC_SHARED and FREE_LANE 31 + 32 + **Verifies:** None (enum additions only, no behaviour change) 33 + 34 + **Files:** 35 + - Modify: `cm_inst.py:88-90` 36 + 37 + **Implementation:** 38 + 39 + Add two new enum members to `FrameOp`. Existing values stay unchanged: 40 + 41 + ```python 42 + class FrameOp(IntEnum): 43 + ALLOC = 0 44 + FREE = 1 45 + ALLOC_SHARED = 2 46 + FREE_LANE = 3 47 + ``` 48 + 49 + **Testing:** 50 + 51 + No tests needed: Python has no compile step to verify this, but a misspelled or missing enum member raises `AttributeError` the first time it is referenced.
Existing tests that use `FrameOp.ALLOC` and `FrameOp.FREE` remain unaffected. 52 + 53 + **Verification:** 54 + Run: `python -m pytest tests/ -v -x` 55 + Expected: All 1277+ existing tests pass unchanged. 56 + 57 + **Commit:** `jj commit -m "feat: add ALLOC_SHARED and FREE_LANE to FrameOp enum"` 58 + <!-- END_TASK_1 --> 59 + 60 + <!-- START_TASK_2 --> 61 + ### Task 2: Add lane_count to PEConfig 62 + 63 + **Verifies:** frame-lanes.AC1.3 64 + 65 + **Files:** 66 + - Modify: `emu/types.py:17-27` 67 + 68 + **Implementation:** 69 + 70 + Add `lane_count` field to `PEConfig` with default 4. Place it after `matchable_offsets` (line 22) to group dimensional config together: 71 + 72 + ```python 73 + @dataclass(frozen=True) 74 + class PEConfig: 75 + pe_id: int = 0 76 + iram: dict[int, Instruction] | None = None 77 + frame_count: int = 8 78 + frame_slots: int = 64 79 + matchable_offsets: int = 8 80 + lane_count: int = 4 81 + initial_frames: Optional[dict[int, list[FrameSlotValue]]] = None 82 + initial_tag_store: Optional[dict[int, int]] = None 83 + allowed_pe_routes: Optional[set[int]] = None 84 + allowed_sm_routes: Optional[set[int]] = None 85 + on_event: EventCallback | None = None 86 + ``` 87 + 88 + Note: `initial_tag_store` type stays `dict[int, int]` for now — the next task (Task 3) changes it to tuples. 89 + 90 + **Testing:** 91 + 92 + No dedicated tests — `lane_count` has a default value so all existing PEConfig constructions remain valid. The field's effect is tested when match arrays gain the lane dimension (Phase 2). 93 + 94 + **Verification:** 95 + Run: `python -m pytest tests/ -v -x` 96 + Expected: All existing tests pass unchanged. 
97 + 98 + **Commit:** `jj commit -m "feat: add lane_count field to PEConfig with default 4"` 99 + <!-- END_TASK_2 --> 100 + 101 + <!-- END_SUBCOMPONENT_A --> 102 + 103 + <!-- START_SUBCOMPONENT_B (tasks 3-5) --> 104 + 105 + <!-- START_TASK_3 --> 106 + ### Task 3: Update PEConfig.initial_tag_store to tuple type 107 + 108 + **Verifies:** frame-lanes.AC1.2 109 + 110 + **Files:** 111 + - Modify: `emu/types.py:24` — change type annotation 112 + 113 + **Implementation:** 114 + 115 + Change the `initial_tag_store` type from `dict[int, int]` to `dict[int, tuple[int, int]]`: 116 + 117 + ```python 118 + initial_tag_store: Optional[dict[int, tuple[int, int]]] = None 119 + ``` 120 + 121 + Each entry is now `act_id → (frame_id, lane)`. 122 + 123 + **Testing:** 124 + 125 + No dedicated tests — this is a type change. Downstream call sites are updated in Tasks 4 and 5. 126 + 127 + **Verification:** 128 + 129 + This change alone will break tests that construct `PEConfig` with `initial_tag_store={0: 0}` etc. Do NOT run tests yet — proceed to Task 4 immediately. 130 + 131 + **Commit:** Do not commit yet — combine with Task 4. 132 + <!-- END_TASK_3 --> 133 + 134 + <!-- START_TASK_4 --> 135 + ### Task 4: Update PE constructor and internals for tuple tag_store 136 + 137 + **Verifies:** frame-lanes.AC1.1 138 + 139 + **Files:** 140 + - Modify: `emu/pe.py:72` — tag_store initialization 141 + - Modify: `emu/pe.py:88-90` — free_frames removal from tag_store values 142 + - Modify: `emu/pe.py:169-176` — CMToken act_id lookup 143 + - Modify: `emu/pe.py:260-261` — FREE_FRAME opcode handler 144 + - Modify: `emu/pe.py:289` — ALLOC frame control handler 145 + - Modify: `emu/pe.py:304-305` — FREE frame control handler 146 + - Modify: `emu/pe.py:321-322` — PELocalWriteToken handler 147 + 148 + **Implementation:** 149 + 150 + The internal `tag_store` type changes from `dict[int, int]` to `dict[int, tuple[int, int]]`. Every access point must be updated. 
151 + 152 + **Line 72 — Initialization:** 153 + ```python 154 + # Tag store: act_id → (frame_id, lane) 155 + self.tag_store: dict[int, tuple[int, int]] = dict(config.initial_tag_store or {}) 156 + ``` 157 + 158 + **Lines 88-90 — Free frames removal:** 159 + The values are now tuples `(frame_id, lane)`. Extract `frame_id`: 160 + ```python 161 + for frame_id, _lane in self.tag_store.values(): 162 + if frame_id in self.free_frames: 163 + self.free_frames.remove(frame_id) 164 + ``` 165 + 166 + **Lines 169-176 — CMToken pipeline (act_id lookup):** 167 + Where the code currently does `frame_id = self.tag_store[token.act_id]`, change to: 168 + ```python 169 + frame_id, lane = self.tag_store[token.act_id] 170 + ``` 171 + The `lane` value is not used yet in Phase 1 — matching still uses the 2D presence/port arrays. Phase 2 adds the lane dimension to match storage. 172 + 173 + **Lines 260-261 — FREE_FRAME opcode:** 174 + Where the code does `freed_frame = self.tag_store.pop(token.act_id)`, change to: 175 + ```python 176 + freed_frame, _lane = self.tag_store.pop(token.act_id) 177 + ``` 178 + 179 + **Line 289 — ALLOC handler:** 180 + Where the code stores `self.tag_store[token.act_id] = frame_id`, change to: 181 + ```python 182 + self.tag_store[token.act_id] = (frame_id, 0) 183 + ``` 184 + New allocations always get lane 0. 185 + 186 + **Lines 304-305 — FREE handler:** 187 + Where the code does `frame_id = self.tag_store.pop(token.act_id)`, change to: 188 + ```python 189 + frame_id, _lane = self.tag_store.pop(token.act_id) 190 + ``` 191 + 192 + **Lines 321-322 — PELocalWriteToken handler:** 193 + Where the code checks `token.act_id in self.tag_store` and then does `frame_id = self.tag_store[token.act_id]`, change the lookup to: 194 + ```python 195 + frame_id, _lane = self.tag_store[token.act_id] 196 + ``` 197 + 198 + **Testing:** 199 + 200 + No new tests in this task — AC1.1 is verified by the existing test suite passing with the new tuple type (Task 5 updates those tests). 
201 + 202 + **Verification:** 203 + 204 + Do NOT run tests yet — existing tests still pass `dict[int, int]` values to `initial_tag_store`. Proceed to Task 5 immediately. 205 + 206 + **Commit:** Do not commit yet — combine with Task 5. 207 + <!-- END_TASK_4 --> 208 + 209 + <!-- START_TASK_5 --> 210 + ### Task 5: Update all test files and downstream code for tuple tag_store API 211 + 212 + **Verifies:** frame-lanes.AC1.2, frame-lanes.AC1.4 213 + 214 + **Files:** 215 + - Modify: `tests/test_pe_frames.py` — ~21 `pe.tag_store[N]` value access sites need tuple unpacking (e.g., `frame_id = pe.tag_store[0]` → `frame_id, _lane = pe.tag_store[0]` or `frame_id = pe.tag_store[0][0]`). Also fix `pe.tag_store[0] in range(pe.frame_count)` at line 99 to `pe.tag_store[0][0] in range(pe.frame_count)`. Note: this file does NOT use `initial_tag_store` — changes are to value access patterns only. 216 + - Modify: `tests/test_pe_events.py` — 9 `initial_tag_store` call sites: all `{0: 0}` → `{0: (0, 0)}` 217 + - Modify: `tests/test_network_routing.py` — 2 tests with `initial_tag_store` construction and `pe.tag_store` value assertions 218 + - Modify: `tests/test_snapshot.py` — tag_store capture assertions and PESnapshot type 219 + - Modify: `tests/test_pe.py` — 18 `initial_tag_store` call sites: 17 `{0: 0}` → `{0: (0, 0)}`, one `{1: 0}` → `{1: (0, 0)}`. Note: `pe.presence` indexing changes are deferred to Phase 2 Task 2 (this task changes ONLY `initial_tag_store` values in this file). 
220 + - Modify: `tests/test_monitor_graph_json.py` — PESnapshot constructions with tag_store field 221 + - Modify: `monitor/snapshot.py:25` — PESnapshot.tag_store type annotation 222 + - Modify: `monitor/snapshot.py:81` — capture() tag_store copy 223 + - Modify: `asm/codegen.py:371-422` — initial_tag_store generation 224 + - Modify: `tests/conftest.py` — frame_control_token strategy (if it constructs tag_store) 225 + 226 + **Implementation:** 227 + 228 + This is a mechanical find-and-replace across the codebase. Every place that constructs `initial_tag_store` must change from `{act_id: frame_id}` to `{act_id: (frame_id, lane)}` where lane is 0 for all existing code. 229 + 230 + **Pattern for test files:** 231 + 232 + Every `initial_tag_store={0: 0}` becomes `initial_tag_store={0: (0, 0)}`. 233 + Every `initial_tag_store={1: 0}` becomes `initial_tag_store={1: (0, 0)}`. 234 + Every `initial_tag_store={0: 2, 1: 3}` becomes `initial_tag_store={0: (2, 0), 1: (3, 0)}`. 235 + 236 + **Pattern for assertions on tag_store values:** 237 + 238 + Where tests assert `pe.tag_store[0] == 2`, change to `pe.tag_store[0] == (2, 0)`. 239 + Where tests assert `pe.tag_store[0]` (existence check), no change needed. 240 + 241 + **monitor/snapshot.py line 25:** 242 + ```python 243 + tag_store: dict[int, tuple[int, int]] 244 + ``` 245 + 246 + **monitor/snapshot.py line 81:** 247 + No code change needed — `dict(pe.tag_store)` already copies tuples correctly. 248 + 249 + **asm/codegen.py lines 371-422:** 250 + 251 + Where `initial_tag_store[act_id] = frame_id` is set, change to: 252 + ```python 253 + initial_tag_store[act_id] = (frame_id, 0) 254 + ``` 255 + 256 + This applies at approximately lines 383 and 412. 257 + 258 + **Testing:** 259 + 260 + This task verifies AC1.4 — all existing tests must pass with the updated tuple API. No new test functions are needed; the existing suite IS the verification. 
261 + 262 + **Verification:** 263 + Run: `python -m pytest tests/ -v -x` 264 + Expected: All existing tests pass. Zero failures. 265 + 266 + **Commit:** `jj commit -m "feat: update tag_store to tuple API (act_id → frame_id, lane)"` 267 + 268 + This single commit covers Tasks 3, 4, and 5 together since they form an atomic change — the type, the internals, and all call sites must change together. 269 + <!-- END_TASK_5 --> 270 + 271 + <!-- END_SUBCOMPONENT_B -->
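The Phase 1 migration is mechanical enough to express as two small functions. This is a sketch under stated assumptions: `to_tuple_tag_store` and `remaining_free_frames` are hypothetical helpers that do not exist in the codebase. They only illustrate the `{act_id: frame_id}` to `{act_id: (frame_id, 0)}` rewrite and the tuple unpacking used when the constructor removes occupied frames from the free list.

```python
def to_tuple_tag_store(legacy: dict[int, int]) -> dict[int, tuple[int, int]]:
    """Rewrite a legacy act_id -> frame_id map so every entry uses lane 0."""
    return {act_id: (frame_id, 0) for act_id, frame_id in legacy.items()}


def remaining_free_frames(
    frame_count: int, tag_store: dict[int, tuple[int, int]]
) -> list[int]:
    """Frames not referenced by any tag_store entry (the lane is ignored)."""
    used = {frame_id for frame_id, _lane in tag_store.values()}
    return [f for f in range(frame_count) if f not in used]


migrated = to_tuple_tag_store({0: 2, 1: 3})
free_frames = remaining_free_frames(4, migrated)
print(migrated, free_frames)
```

Because every pre-existing entry lands on lane 0, equality assertions such as `pe.tag_store[0] == 2` become `pe.tag_store[0] == (2, 0)` and nothing else about the tests' intent changes.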
+268
docs/implementation-plans/2026-03-07-frame-lanes/phase_02.md
··· 1 + # Frame Matching Lanes Implementation Plan 2 + 3 + **Goal:** Extend the PE's frame-based matching to support multiple simultaneous pending operands per instruction within a single activation via matching lanes. 4 + 5 + **Architecture:** Multiple `activation_id` values share one physical frame (constants/destinations) while maintaining independent matching state per lane. Tag store maps `act_id → (frame_id, lane)`. Match data, presence, and port storage gain a lane dimension. 6 + 7 + **Tech Stack:** Python 3.12, SimPy 4.1, pytest + hypothesis 8 + 9 + **Scope:** 6 phases from original design (phases 1-6) 10 + 11 + **Codebase verified:** 2026-03-07 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + This phase implements and tests: 18 + 19 + ### frame-lanes.AC2: Separate Match Data Storage 20 + - **frame-lanes.AC2.1 Success:** Match operand data lives in `match_data[frame_id][offset][lane]`, separate from `frames[frame_id][slot]`. 21 + - **frame-lanes.AC2.2 Success:** `presence[frame_id][offset][lane]` is a 3D bool array. `port_store[frame_id][offset][lane]` likewise. 22 + - **frame-lanes.AC2.3 Success:** `_match_frame()` uses `(frame_id, match_slot, lane)` to read/write match data, presence, and port. 23 + - **frame-lanes.AC2.4 Success:** `frames[frame_id][slot]` remains shared across all lanes. Constants and destinations are NOT per-lane. 24 + 25 + Also partially satisfies (structural type changes only, full testing in Phase 5): 26 + ### frame-lanes.AC6: Monitor and Snapshot Updates (partial) 27 + - **frame-lanes.AC6.1:** `PESnapshot.tag_store` type updated to `dict[int, tuple[int, int]]`. 28 + - **frame-lanes.AC6.2:** `PESnapshot` gains `match_data`, `lane_count` fields. 
29 + 30 + --- 31 + 32 + <!-- START_SUBCOMPONENT_A (tasks 1-3) --> 33 + 34 + <!-- START_TASK_1 --> 35 + ### Task 1: Add match_data 3D array and convert presence/port_store to 3D 36 + 37 + **Verifies:** frame-lanes.AC2.1, frame-lanes.AC2.2 38 + 39 + **Files:** 40 + - Modify: `emu/pe.py:66-84` — PE constructor storage initialization 41 + - Modify: `emu/pe.py:290-296` — ALLOC handler reset logic 42 + 43 + **Implementation:** 44 + 45 + Add a new `match_data` 3D array and extend `presence` and `port_store` from 2D to 3D by adding the lane dimension. After Phase 1, `config.lane_count` is available. 46 + 47 + **Constructor changes (replace lines 74-84):** 48 + 49 + Replace the current 2D presence and port_store initialization with 3D versions, and add a match_data array: 50 + 51 + ```python 52 + # Match data: [frame_id][match_slot][lane] - operand values waiting for partner 53 + self.match_data: list[list[list[Optional[int]]]] = [ 54 + [ 55 + [None for _ in range(config.lane_count)] 56 + for _ in range(config.matchable_offsets) 57 + ] 58 + for _ in range(config.frame_count) 59 + ] 60 + 61 + # Presence bits: [frame_id][match_slot][lane] - True if operand waiting for partner 62 + self.presence: list[list[list[bool]]] = [ 63 + [ 64 + [False for _ in range(config.lane_count)] 65 + for _ in range(config.matchable_offsets) 66 + ] 67 + for _ in range(config.frame_count) 68 + ] 69 + 70 + # Port store: [frame_id][match_slot][lane] - port of waiting operand 71 + self.port_store: list[list[list[Optional[Port]]]] = [ 72 + [ 73 + [None for _ in range(config.lane_count)] 74 + for _ in range(config.matchable_offsets) 75 + ] 76 + for _ in range(config.frame_count) 77 + ] 78 + 79 + self.lane_count = config.lane_count 80 + ``` 81 + 82 + **ALLOC handler reset (lines 293-296):** 83 + 84 + Update the presence/port_store reset loop to iterate all lanes, and also clear match_data: 85 + 86 + ```python 87 + for i in range(self.matchable_offsets): 88 + for ln in range(self.lane_count): 89 + 
self.match_data[frame_id][i][ln] = None 90 + self.presence[frame_id][i][ln] = False 91 + self.port_store[frame_id][i][ln] = None 92 + ``` 93 + 94 + **Testing:** 95 + 96 + No dedicated tests for storage structure — AC2.1 and AC2.2 are verified by the existing test suite continuing to pass after Task 2 updates `_match_frame()`. The 3D structure is exercised through matching behaviour. 97 + 98 + **Verification:** 99 + 100 + Do NOT run tests yet — `_match_frame()` still uses 2D indexing. Proceed to Task 2 immediately. 101 + 102 + **Commit:** Do not commit yet — combine with Task 2. 103 + <!-- END_TASK_1 --> 104 + 105 + <!-- START_TASK_2 --> 106 + ### Task 2: Update _match_frame() to use lane dimension 107 + 108 + **Verifies:** frame-lanes.AC2.3, frame-lanes.AC2.4 109 + 110 + **Files:** 111 + - Modify: `emu/pe.py:343-383` — `_match_frame()` method 112 + - Modify: `emu/pe.py:169-176` — CMToken pipeline where `_match_frame()` is called 113 + 114 + **Implementation:** 115 + 116 + Update `_match_frame()` to accept and use the `lane` parameter. After Phase 1, the CMToken pipeline already unpacks `frame_id, lane = self.tag_store[token.act_id]` — now pass `lane` through. 117 + 118 + **Update the call site (in the CMToken processing pipeline):** 119 + 120 + Where `_match_frame()` is currently called with `(token, inst, frame_id)`, add `lane`: 121 + ```python 122 + result = self._match_frame(token, inst, frame_id, lane) 123 + ``` 124 + 125 + **Updated `_match_frame()` signature and body:** 126 + 127 + ```python 128 + def _match_frame( 129 + self, 130 + token: DyadToken, 131 + inst: Instruction, 132 + frame_id: int, 133 + lane: int, 134 + ) -> Optional[tuple[int, int]]: 135 + """Frame-based dyadic matching with lane support. 136 + 137 + Derives match slot from low bits of token.offset: 138 + match_slot = token.offset % matchable_offsets 139 + 140 + Match data, presence, and port are per-lane. 141 + Frame constants/destinations remain shared. 
142 + """ 143 + match_slot = token.offset % self.matchable_offsets 144 + 145 + if self.presence[frame_id][match_slot][lane]: 146 + # Partner already waiting — pair them 147 + partner_data = self.match_data[frame_id][match_slot][lane] 148 + partner_port = self.port_store[frame_id][match_slot][lane] 149 + self.presence[frame_id][match_slot][lane] = False 150 + self.match_data[frame_id][match_slot][lane] = None 151 + 152 + # Use port metadata to determine left/right ordering 153 + if partner_port == Port.L: 154 + left, right = partner_data, token.data 155 + else: 156 + left, right = token.data, partner_data 157 + 158 + self._on_event(Matched( 159 + time=self.env.now, component=self._component, 160 + left=left, right=right, act_id=token.act_id, 161 + offset=token.offset, frame_id=frame_id, 162 + )) 163 + return left, right 164 + else: 165 + # Store and wait for partner 166 + self.match_data[frame_id][match_slot][lane] = token.data 167 + self.port_store[frame_id][match_slot][lane] = token.port 168 + self.presence[frame_id][match_slot][lane] = True 169 + return None 170 + ``` 171 + 172 + Key changes from current code: 173 + - All `self.frames[frame_id][match_slot]` reads/writes for match data → `self.match_data[frame_id][match_slot][lane]` 174 + - All `self.presence[frame_id][match_slot]` → `self.presence[frame_id][match_slot][lane]` 175 + - All `self.port_store[frame_id][match_slot]` → `self.port_store[frame_id][match_slot][lane]` 176 + - `self.frames` is NOT touched — constants and destinations remain shared (AC2.4) 177 + 178 + **Testing:** 179 + 180 + AC2.3 and AC2.4 are verified by the existing test suite passing. All existing tests use lane 0 (set by Phase 1's tuple tag_store), so matching behaviour is identical. 181 + 182 + Two tests directly check `pe.presence`: 183 + - `tests/test_pe.py:160` — `assert pe.presence[frame_id][0] is True` → change to `pe.presence[frame_id][0][0]` 184 + - `tests/test_pe.py:624` — `assert pe.presence[frame_id][...] 
is False` → change to `pe.presence[frame_id][...][0]` 185 + 186 + **Verification:** 187 + Run: `python -m pytest tests/ -v -x` 188 + Expected: All existing tests pass. 189 + 190 + **Commit:** `jj commit -m "feat: separate match_data from frames, add lane dimension to presence/port_store"` 191 + 192 + This single commit covers Tasks 1 and 2 since they form an atomic change. 193 + <!-- END_TASK_2 --> 194 + 195 + <!-- START_TASK_3 --> 196 + ### Task 3: Update snapshot capture for 3D match storage 197 + 198 + **Verifies:** None (snapshot updates for AC6 are in Phase 5, but this keeps snapshot working) 199 + 200 + **Files:** 201 + - Modify: `monitor/snapshot.py:18-30` — PESnapshot dataclass 202 + - Modify: `monitor/snapshot.py:82-89` — capture() presence/port_store conversion 203 + - Modify: `tests/test_snapshot.py` — snapshot assertion updates 204 + 205 + **Implementation:** 206 + 207 + Update PESnapshot to reflect the new 3D storage shapes and add match_data field. 208 + 209 + **PESnapshot dataclass updates:** 210 + 211 + ```python 212 + @dataclass(frozen=True) 213 + class PESnapshot: 214 + pe_id: int 215 + iram: dict[int, Instruction] 216 + frames: tuple[tuple[FrameSlotValue, ...], ...] 217 + tag_store: dict[int, tuple[int, int]] 218 + presence: tuple[tuple[tuple[bool, ...], ...], ...] 219 + port_store: tuple[tuple[tuple[Port | None, ...], ...], ...] 220 + match_data: tuple[tuple[tuple[int | None, ...], ...], ...] 221 + free_frames: tuple[int, ...] 222 + lane_count: int 223 + input_queue: tuple[Token, ...] 224 + output_log: tuple[Token, ...] 
225 + ``` 226 + 227 + **capture() updates:** 228 + 229 + Replace the 2D presence/port_store capture with 3D, and add match_data capture: 230 + 231 + ```python 232 + presence = tuple( 233 + tuple( 234 + tuple(lane_val for lane_val in offset_lanes) 235 + for offset_lanes in frame_presence 236 + ) 237 + for frame_presence in pe.presence 238 + ) 239 + port_store = tuple( 240 + tuple( 241 + tuple(lane_val for lane_val in offset_lanes) 242 + for offset_lanes in frame_ports 243 + ) 244 + for frame_ports in pe.port_store 245 + ) 246 + match_data = tuple( 247 + tuple( 248 + tuple(lane_val for lane_val in offset_lanes) 249 + for offset_lanes in frame_match 250 + ) 251 + for frame_match in pe.match_data 252 + ) 253 + ``` 254 + 255 + Pass `match_data=match_data` and `lane_count=pe.lane_count` to the PESnapshot constructor. 256 + 257 + **Testing:** 258 + 259 + Update any snapshot tests that assert on `presence` or `port_store` shape to expect 3D tuples. Update tests that construct PESnapshot directly to include `match_data` and `lane_count` fields. 260 + 261 + **Verification:** 262 + Run: `python -m pytest tests/ -v -x` 263 + Expected: All tests pass. 264 + 265 + **Commit:** `jj commit -m "feat: update PESnapshot for 3D match storage and match_data field"` 266 + <!-- END_TASK_3 --> 267 + 268 + <!-- END_SUBCOMPONENT_A -->
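The lane-indexed matching and snapshot conversion described in this phase can be sketched in isolation. This is a minimal standalone model, not the emulator code: `LaneMatchStore`, `freeze_3d`, and the local `Port` enum are hypothetical stand-ins, and event emission and the shared frame constants are left out.

```python
from enum import Enum
from typing import Optional


class Port(Enum):
    """Stand-in for the emulator's Port enum (assumed to have L and R members)."""
    L = 0
    R = 1


class LaneMatchStore:
    """Hypothetical model of match storage indexed [frame_id][match_slot][lane]."""

    def __init__(self, frame_count: int, matchable_offsets: int, lane_count: int) -> None:
        self.matchable_offsets = matchable_offsets

        def grid(fill):
            # Fresh nested lists so no two slots share the same inner list.
            return [
                [[fill] * lane_count for _ in range(matchable_offsets)]
                for _ in range(frame_count)
            ]

        self.match_data = grid(None)
        self.presence = grid(False)
        self.port_store = grid(None)

    def match(
        self, frame_id: int, offset: int, lane: int, data: int, port: Port
    ) -> Optional[tuple[int, int]]:
        """Return (left, right) when a partner was waiting, else store and return None."""
        slot = offset % self.matchable_offsets
        if self.presence[frame_id][slot][lane]:
            partner_data = self.match_data[frame_id][slot][lane]
            partner_port = self.port_store[frame_id][slot][lane]
            self.presence[frame_id][slot][lane] = False
            self.match_data[frame_id][slot][lane] = None
            # The stored port decides which operand is the left one.
            if partner_port == Port.L:
                return partner_data, data
            return data, partner_data
        self.match_data[frame_id][slot][lane] = data
        self.port_store[frame_id][slot][lane] = port
        self.presence[frame_id][slot][lane] = True
        return None


def freeze_3d(grid):
    """Convert a 3D list into nested tuples, as a frozen snapshot would need."""
    return tuple(tuple(tuple(lanes) for lanes in slots) for slots in grid)
```

In this sketch, two operands stored at the same frame and match slot but on different lanes never pair with each other, which is the property AC2.3 relies on. `freeze_3d` is equivalent in effect to the triple-nested generator shown for `capture()`.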
+363
docs/implementation-plans/2026-03-07-frame-lanes/phase_03.md
··· 1 + # Frame Matching Lanes Implementation Plan 2 + 3 + **Goal:** Extend the PE's frame-based matching to support multiple simultaneous pending operands per instruction within a single activation via matching lanes. 4 + 5 + **Architecture:** Multiple `activation_id` values share one physical frame (constants/destinations) while maintaining independent matching state per lane. Tag store maps `act_id → (frame_id, lane)`. Match data, presence, and port storage gain a lane dimension. 6 + 7 + **Tech Stack:** Python 3.12, SimPy 4.1, pytest + hypothesis 8 + 9 + **Scope:** 6 phases from original design (phases 1-6) 10 + 11 + **Codebase verified:** 2026-03-07 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + This phase implements and tests: 18 + 19 + ### frame-lanes.AC3: FrameOp Extensions 20 + - **frame-lanes.AC3.1 Success:** `FrameOp.ALLOC_SHARED` added. When received, PE looks up `parent_act_id` (from payload), finds parent's `frame_id`, assigns next free lane from that frame's lane pool, records `tag_store[act_id] = (frame_id, lane)`. Clears only that lane's presence/port bits. 21 + - **frame-lanes.AC3.2 Success:** `FrameOp.FREE_LANE` added. Removes tag_store entry, clears that lane's presence/port/match_data across all matchable offsets. Does NOT return frame to free list. 22 + - **frame-lanes.AC3.3 Success:** `FrameOp.FREE` (existing) becomes smart: removes tag_store entry, clears lane data. If no other tag_store entries reference the same frame_id, returns frame to free list and clears frame slots. If other entries exist, behaves like FREE_LANE. 23 + - **frame-lanes.AC3.4 Success:** `FrameOp.ALLOC` (existing) unchanged — allocates fresh frame, assigns lane 0. 24 + - **frame-lanes.AC3.5 Success:** `FrameAllocated` event gains `lane: int` field. `FrameFreed` event gains `lane: int` and `frame_freed: bool` fields. 
25 + - **frame-lanes.AC3.6 Success:** When all lanes for a frame are occupied and ALLOC_SHARED is received, PE emits `TokenRejected` with reason "no free lanes" and drops the token. 26 + 27 + ### frame-lanes.AC8: Test Coverage (partial) 28 + - **frame-lanes.AC8.1 Success:** Test: two act_ids sharing a frame via ALLOC_SHARED have independent matching — L operand for act_id 0 does not interfere with L operand for act_id 1 at the same offset. 29 + - **frame-lanes.AC8.2 Success:** Test: ALLOC_SHARED with all lanes occupied emits TokenRejected. 30 + - **frame-lanes.AC8.3 Success:** Test: FREE on a shared frame frees only the lane; other lanes' data is preserved. FREE on last lane frees the frame. 31 + 32 + --- 33 + 34 + <!-- START_SUBCOMPONENT_A (tasks 1-2) --> 35 + 36 + <!-- START_TASK_1 --> 37 + ### Task 1: Add lane_free tracking and update FrameAllocated/FrameFreed events 38 + 39 + **Verifies:** frame-lanes.AC3.5 40 + 41 + **Files:** 42 + - Modify: `emu/events.py:84-97` — add lane fields to FrameAllocated and FrameFreed 43 + - Modify: `emu/pe.py` — add lane_free data structure, update all event emissions 44 + - Modify: `tests/test_pe_events.py` — update assertions for new event fields 45 + - Modify: `tests/test_pe_frames.py` — update assertions for new event fields 46 + 47 + **Implementation:** 48 + 49 + **emu/events.py — Update event dataclasses:** 50 + 51 + ```python 52 + @dataclass(frozen=True) 53 + class FrameAllocated: 54 + time: float 55 + component: str 56 + act_id: int 57 + frame_id: int 58 + lane: int 59 + 60 + @dataclass(frozen=True) 61 + class FrameFreed: 62 + time: float 63 + component: str 64 + act_id: int 65 + frame_id: int 66 + lane: int 67 + frame_freed: bool 68 + ``` 69 + 70 + **emu/pe.py — Add lane_free tracking in constructor:** 71 + 72 + After the `free_frames` initialization (line 87), add: 73 + 74 + ```python 75 + # Lane tracking: which lanes are free per frame 76 + self.lane_free: dict[int, set[int]] = {} 77 + ``` 78 + 79 + `lane_free` is 
populated lazily — when a frame is allocated via ALLOC, its lanes are set up. 80 + 81 + **emu/pe.py — Update existing ALLOC handler event emission:** 82 + 83 + After Phase 1, ALLOC stores `(frame_id, 0)` in tag_store. Update to also set up lane tracking and emit `lane=0`: 84 + 85 + ```python 86 + if token.op == FrameOp.ALLOC: 87 + if self.free_frames: 88 + frame_id = self.free_frames.pop() 89 + self.tag_store[token.act_id] = (frame_id, 0) 90 + # Set up lane tracking: lane 0 is taken, rest are free 91 + self.lane_free[frame_id] = set(range(1, self.lane_count)) 92 + # Initialize frame slots to None 93 + for i in range(self.frame_slots): 94 + self.frames[frame_id][i] = None 95 + # Reset all lanes' match state 96 + for i in range(self.matchable_offsets): 97 + for ln in range(self.lane_count): 98 + self.match_data[frame_id][i][ln] = None 99 + self.presence[frame_id][i][ln] = False 100 + self.port_store[frame_id][i][ln] = None 101 + self._on_event(FrameAllocated( 102 + time=self.env.now, component=self._component, 103 + act_id=token.act_id, frame_id=frame_id, lane=0, 104 + )) 105 + else: 106 + logger.warning(f"PE {self.pe_id}: no free frames available") 107 + ``` 108 + 109 + **emu/pe.py — Update existing FREE handler for smart behaviour:** 110 + 111 + Note: This replaces the Phase 1 FREE handler wholesale. The Phase 1 version only added tuple unpacking (`frame_id, _lane = self.tag_store.pop(...)`). This version adds lane data clearing, frame-in-use checking, and conditional frame return. 
112 + 113 + ```python 114 + elif token.op == FrameOp.FREE: 115 + if token.act_id in self.tag_store: 116 + frame_id, lane = self.tag_store.pop(token.act_id) 117 + # Clear this lane's match state 118 + for i in range(self.matchable_offsets): 119 + self.match_data[frame_id][i][lane] = None 120 + self.presence[frame_id][i][lane] = False 121 + self.port_store[frame_id][i][lane] = None 122 + # Check if any other activations use this frame 123 + frame_in_use = any(fid == frame_id for fid, _ in self.tag_store.values()) 124 + if frame_in_use: 125 + # Return lane to pool, keep frame 126 + self.lane_free[frame_id].add(lane) 127 + self._on_event(FrameFreed( 128 + time=self.env.now, component=self._component, 129 + act_id=token.act_id, frame_id=frame_id, 130 + lane=lane, frame_freed=False, 131 + )) 132 + else: 133 + # Last lane — return frame to free list 134 + self.free_frames.append(frame_id) 135 + if frame_id in self.lane_free: 136 + del self.lane_free[frame_id] 137 + # Clear frame slots 138 + for i in range(self.frame_slots): 139 + self.frames[frame_id][i] = None 140 + self._on_event(FrameFreed( 141 + time=self.env.now, component=self._component, 142 + act_id=token.act_id, frame_id=frame_id, 143 + lane=lane, frame_freed=True, 144 + )) 145 + ``` 146 + 147 + **emu/pe.py — Update FREE_FRAME opcode handler (lines 259-266):** 148 + 149 + Same smart free logic applies here: 150 + 151 + ```python 152 + if token.act_id in self.tag_store: 153 + freed_frame, lane = self.tag_store.pop(token.act_id) 154 + # Clear this lane's match state 155 + for i in range(self.matchable_offsets): 156 + self.match_data[freed_frame][i][lane] = None 157 + self.presence[freed_frame][i][lane] = False 158 + self.port_store[freed_frame][i][lane] = None 159 + # Check if any other activations use this frame 160 + frame_in_use = any(fid == freed_frame for fid, _ in self.tag_store.values()) 161 + if frame_in_use: 162 + self.lane_free[freed_frame].add(lane) 163 + self._on_event(FrameFreed( 164 + 
time=self.env.now, component=self._component, 165 + act_id=token.act_id, frame_id=freed_frame, 166 + lane=lane, frame_freed=False, 167 + )) 168 + else: 169 + self.free_frames.append(freed_frame) 170 + if freed_frame in self.lane_free: 171 + del self.lane_free[freed_frame] 172 + for i in range(self.frame_slots): 173 + self.frames[freed_frame][i] = None 174 + self._on_event(FrameFreed( 175 + time=self.env.now, component=self._component, 176 + act_id=token.act_id, frame_id=freed_frame, 177 + lane=lane, frame_freed=True, 178 + )) 179 + ``` 180 + 181 + **Also update constructor initialisation for pre-loaded tag_store entries:** 182 + 183 + After Phase 1, the constructor removes allocated frames from `free_frames`. Also initialise `lane_free` for those frames: 184 + 185 + ```python 186 + for act_id, (frame_id, lane) in self.tag_store.items(): 187 + if frame_id in self.free_frames: 188 + self.free_frames.remove(frame_id) 189 + if frame_id not in self.lane_free: 190 + # First time seeing this frame — set up lane tracking 191 + all_lanes = set(range(self.lane_count)) 192 + self.lane_free[frame_id] = all_lanes - {lane} 193 + else: 194 + self.lane_free[frame_id].discard(lane) 195 + ``` 196 + 197 + **Test updates:** 198 + 199 + All existing tests that assert on FrameAllocated or FrameFreed events need updated assertions to include the new fields. For existing single-activation tests: 200 + - `FrameAllocated` assertions add `lane=0` 201 + - `FrameFreed` assertions add `lane=0, frame_freed=True` 202 + 203 + Specific test files affected: 204 + - `tests/test_pe_frames.py` lines 505-507 (test_alloc_remote), 550-552 (test_free_frame_opcode), 122-125 (test_free_frame_control_token) 205 + - `tests/test_pe_events.py` — any tests asserting on FrameAllocated/FrameFreed event fields 206 + 207 + **Verification:** 208 + Run: `python -m pytest tests/ -v -x` 209 + Expected: All existing tests pass with updated event assertions. 
210 + 211 + **Commit:** `jj commit -m "feat: add lane tracking and update FrameAllocated/FrameFreed events with lane fields"` 212 + <!-- END_TASK_1 --> 213 + 214 + <!-- START_TASK_2 --> 215 + ### Task 2: Implement ALLOC_SHARED and FREE_LANE handlers 216 + 217 + **Verifies:** frame-lanes.AC3.1, frame-lanes.AC3.2, frame-lanes.AC3.3, frame-lanes.AC3.4, frame-lanes.AC3.6 218 + 219 + **Files:** 220 + - Modify: `emu/pe.py` — add ALLOC_SHARED and FREE_LANE cases to `_handle_frame_control()` 221 + 222 + **Implementation:** 223 + 224 + Add two new cases to `_handle_frame_control()` after the existing ALLOC and FREE handlers: 225 + 226 + ```python 227 + elif token.op == FrameOp.ALLOC_SHARED: 228 + # Shared allocation: find parent's frame, assign next free lane 229 + parent_act_id = token.payload 230 + if parent_act_id not in self.tag_store: 231 + self._on_event(TokenRejected( 232 + time=self.env.now, component=self._component, 233 + token=token, reason=f"parent act_id {parent_act_id} not in tag store", 234 + )) 235 + return 236 + parent_frame_id, _ = self.tag_store[parent_act_id] 237 + free_lanes = self.lane_free.get(parent_frame_id, set()) 238 + if not free_lanes: 239 + self._on_event(TokenRejected( 240 + time=self.env.now, component=self._component, 241 + token=token, reason="no free lanes", 242 + )) 243 + return 244 + lane = min(free_lanes) # Deterministic: pick lowest free lane 245 + free_lanes.remove(lane) 246 + self.tag_store[token.act_id] = (parent_frame_id, lane) 247 + # Clear only this lane's match state 248 + for i in range(self.matchable_offsets): 249 + self.match_data[parent_frame_id][i][lane] = None 250 + self.presence[parent_frame_id][i][lane] = False 251 + self.port_store[parent_frame_id][i][lane] = None 252 + self._on_event(FrameAllocated( 253 + time=self.env.now, component=self._component, 254 + act_id=token.act_id, frame_id=parent_frame_id, lane=lane, 255 + )) 256 + 257 + elif token.op == FrameOp.FREE_LANE: 258 + # Free lane only — never returns frame to 
free list 259 + if token.act_id in self.tag_store: 260 + frame_id, lane = self.tag_store.pop(token.act_id) 261 + for i in range(self.matchable_offsets): 262 + self.match_data[frame_id][i][lane] = None 263 + self.presence[frame_id][i][lane] = False 264 + self.port_store[frame_id][i][lane] = None 265 + self.lane_free[frame_id].add(lane) 266 + self._on_event(FrameFreed( 267 + time=self.env.now, component=self._component, 268 + act_id=token.act_id, frame_id=frame_id, 269 + lane=lane, frame_freed=False, 270 + )) 271 + ``` 272 + 273 + **Testing:** 274 + 275 + No new tests in this task. AC3.1, AC3.2, and AC3.6 are tested in Task 3; AC3.3 and AC3.4 are tested in Task 4. 276 + 277 + **Verification:** 278 + Run: `python -m pytest tests/ -v -x` 279 + Expected: All existing tests pass (new handlers only activate for new FrameOp values). 280 + 281 + **Commit:** `jj commit -m "feat: implement ALLOC_SHARED and FREE_LANE frame control handlers"` 282 + <!-- END_TASK_2 --> 283 + 284 + <!-- END_SUBCOMPONENT_A --> 285 + 286 + <!-- START_SUBCOMPONENT_B (tasks 3-4) --> 287 + 288 + <!-- START_TASK_3 --> 289 + ### Task 3: Tests for ALLOC_SHARED, lane exhaustion, and FREE_LANE 290 + 291 + **Verifies:** frame-lanes.AC3.1, frame-lanes.AC3.2, frame-lanes.AC3.6, frame-lanes.AC8.1, frame-lanes.AC8.2 292 + 293 + **Files:** 294 + - Create: `tests/test_pe_lanes.py` 295 + 296 + **Implementation:** 297 + 298 + Create a new test file dedicated to lane functionality. Follow the existing test patterns from `tests/test_pe_frames.py`: 299 + - `simpy.Environment()` setup 300 + - `PEConfig` with `on_event=events.append` 301 + - `ProcessingElement(env, pe_id, config)` construction 302 + - Token injection via `pe.input_store.put(token)` in a SimPy process 303 + - Event collection via the events list 304 + 305 + **Testing:** 306 + 307 + Tests must verify these specific AC cases: 308 + 309 + - **frame-lanes.AC3.1 (ALLOC_SHARED):** Send `FrameControlToken(op=FrameOp.ALLOC, act_id=0, payload=0)` to allocate a frame.
Then send `FrameControlToken(op=FrameOp.ALLOC_SHARED, act_id=1, payload=0)` where payload is the parent act_id. Verify `tag_store[1]` has the same `frame_id` as `tag_store[0]` but a different lane. Verify the `FrameAllocated` event for act_id 1 carries that lane. 310 + 311 + - **frame-lanes.AC3.2 (FREE_LANE):** After ALLOC_SHARED, send `FrameControlToken(op=FrameOp.FREE_LANE, act_id=1, payload=0)`. Verify `tag_store` no longer has act_id 1. Verify act_id 0 is still present. Verify frame is NOT in `free_frames`. Verify `FrameFreed` event has `frame_freed=False`. 312 + 313 + - **frame-lanes.AC3.6 (lane exhaustion):** Allocate a frame, then ALLOC_SHARED until all `lane_count` lanes are occupied. Send one more ALLOC_SHARED. Verify `TokenRejected` event with reason "no free lanes". 314 + 315 + - **frame-lanes.AC8.1 (independent matching):** Set up two act_ids sharing a frame (ALLOC + ALLOC_SHARED). Load a dyadic instruction into IRAM. Send L operand via DyadToken for act_id 0 and L operand via DyadToken for act_id 1 at the same offset. Verify both presence bits are set independently — neither token triggers a match (both are waiting for their R partner). Then send R for act_id 0 — verify only act_id 0 matches and fires, act_id 1's L is still pending. 316 + 317 + - **frame-lanes.AC8.2 (exhaustion):** Same as AC3.6 but via the test coverage AC numbering — allocate all lanes, attempt one more, verify TokenRejected. 318 + 319 + **Verification:** 320 + Run: `python -m pytest tests/test_pe_lanes.py -v` 321 + Expected: All new tests pass. 322 + 323 + Run: `python -m pytest tests/ -v -x` 324 + Expected: All tests pass (new and existing).
325 + 326 + **Commit:** `jj commit -m "test: add tests for ALLOC_SHARED, FREE_LANE, and lane exhaustion"` 327 + <!-- END_TASK_3 --> 328 + 329 + <!-- START_TASK_4 --> 330 + ### Task 4: Tests for smart FREE behaviour 331 + 332 + **Verifies:** frame-lanes.AC3.3, frame-lanes.AC3.4, frame-lanes.AC8.3 333 + 334 + **Files:** 335 + - Modify: `tests/test_pe_lanes.py` — add smart FREE test class 336 + 337 + **Implementation:** 338 + 339 + Add tests to the lane test file created in Task 3. 340 + 341 + **Testing:** 342 + 343 + Tests must verify these specific AC cases: 344 + 345 + - **frame-lanes.AC3.3 (smart FREE on shared frame):** Allocate a frame (act_id=0, lane 0). ALLOC_SHARED (act_id=1, lane 1). Send a single unpaired operand (for example an L DyadToken) to act_id=1 so that a presence bit is set on lane 1 without completing a match. FREE act_id=0. Verify: act_id=0 removed from tag_store, act_id=1 still present, frame NOT in free_frames, act_id=1's pending match data is preserved (presence bit still True for lane 1). Verify `FrameFreed` event has `frame_freed=False`. 346 + 347 + - **frame-lanes.AC3.3 (smart FREE on last lane):** Same setup. FREE act_id=0, then FREE act_id=1. After second FREE: frame IS returned to free_frames, `lane_free` entry for that frame is cleaned up. Verify the second `FrameFreed` event has `frame_freed=True`. 348 + 349 + - **frame-lanes.AC3.4 (ALLOC unchanged):** Verify that regular ALLOC still works — allocates fresh frame, assigns lane 0, no parent required. This is a regression check. 350 + 351 + - **frame-lanes.AC8.3 (data preservation):** Set up shared frame with two act_ids. Store a DyadToken L operand on act_id=1's lane. FREE act_id=0. Verify act_id=1's match_data and presence are untouched — the pending operand is still there. 352 + 353 + **Verification:** 354 + Run: `python -m pytest tests/test_pe_lanes.py -v` 355 + Expected: All tests pass. 356 + 357 + Run: `python -m pytest tests/ -v -x` 358 + Expected: All tests pass.
359 + 360 + **Commit:** `jj commit -m "test: add tests for smart FREE behaviour and data preservation across lanes"` 361 + <!-- END_TASK_4 --> 362 + 363 + <!-- END_SUBCOMPONENT_B -->
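The lane bookkeeping that ALLOC, ALLOC_SHARED, and the smart FREE share can be sketched without SimPy or tokens. This is a minimal model under stated assumptions: `LaneAllocator` and its method names are hypothetical, match-state clearing and event emission are omitted, and rejection is represented by returning `None` rather than emitting `TokenRejected`.

```python
from typing import Optional


class LaneAllocator:
    """Hypothetical model of tag_store / lane_free / free_frames bookkeeping."""

    def __init__(self, frame_count: int, lane_count: int) -> None:
        self.lane_count = lane_count
        self.free_frames: list[int] = list(range(frame_count))
        self.tag_store: dict[int, tuple[int, int]] = {}
        self.lane_free: dict[int, set[int]] = {}

    def alloc(self, act_id: int) -> Optional[tuple[int, int]]:
        """Fresh frame: the activation takes lane 0, the other lanes become free."""
        if not self.free_frames:
            return None
        frame_id = self.free_frames.pop()
        self.tag_store[act_id] = (frame_id, 0)
        self.lane_free[frame_id] = set(range(1, self.lane_count))
        return frame_id, 0

    def alloc_shared(self, act_id: int, parent_act_id: int) -> Optional[tuple[int, int]]:
        """Join the parent's frame on the lowest free lane, or None if rejected."""
        if parent_act_id not in self.tag_store:
            return None
        frame_id, _ = self.tag_store[parent_act_id]
        free_lanes = self.lane_free.get(frame_id, set())
        if not free_lanes:
            return None  # corresponds to the "no free lanes" rejection
        lane = min(free_lanes)
        free_lanes.remove(lane)
        self.tag_store[act_id] = (frame_id, lane)
        return frame_id, lane

    def free(self, act_id: int) -> Optional[bool]:
        """Smart FREE: True if the frame was returned, False if only the lane was."""
        if act_id not in self.tag_store:
            return None
        frame_id, lane = self.tag_store.pop(act_id)
        if any(fid == frame_id for fid, _ in self.tag_store.values()):
            self.lane_free[frame_id].add(lane)
            return False
        self.free_frames.append(frame_id)
        self.lane_free.pop(frame_id, None)
        return True
```

The boolean returned by `free` plays the role of the `frame_freed` field on `FrameFreed`: it is `False` while any other activation still references the frame and `True` only for the last one.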
+191
docs/implementation-plans/2026-03-07-frame-lanes/phase_04.md
··· 1 + # Frame Matching Lanes Implementation Plan 2 + 3 + **Goal:** Extend the PE's frame-based matching to support multiple simultaneous pending operands per instruction within a single activation via matching lanes. 4 + 5 + **Architecture:** Multiple `activation_id` values share one physical frame (constants/destinations) while maintaining independent matching state per lane. Tag store maps `act_id → (frame_id, lane)`. Match data, presence, and port storage gain a lane dimension. 6 + 7 + **Tech Stack:** Python 3.12, SimPy 4.1, pytest + hypothesis 8 + 9 + **Scope:** 6 phases from original design (phases 1-6) 10 + 11 + **Codebase verified:** 2026-03-07 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + This phase implements and tests: 18 + 19 + ### frame-lanes.AC4: ALLOC_REMOTE Data-Driven 20 + - **frame-lanes.AC4.1 Success:** ALLOC_REMOTE reads `fref+2` from frame. If value is non-zero, emits `FrameControlToken` with `op=ALLOC_SHARED` and `payload=parent_act_id`. If zero, emits `op=ALLOC` as before. 21 + - **frame-lanes.AC4.2 Success:** No new opcodes. Behaviour is entirely data-driven from frame constants. 22 + 23 + ### frame-lanes.AC5: FREE_FRAME Instruction 24 + - **frame-lanes.AC5.1 Success:** `FREE_FRAME` opcode uses the smart FREE behaviour from AC3.3. Frees the executing token's activation lane; returns frame to free list only if last lane. 25 + 26 + ### frame-lanes.AC8: Test Coverage (partial) 27 + - **frame-lanes.AC8.4 Success:** Test: ALLOC_REMOTE emits ALLOC_SHARED when `fref+2` is non-zero. 28 + - **frame-lanes.AC8.5 Success:** Test: ALLOC_REMOTE emits ALLOC when `fref+2` is zero (backwards compatible). 
29 + 30 + --- 31 + 32 + <!-- START_SUBCOMPONENT_A (tasks 1-2) --> 33 + 34 + <!-- START_TASK_1 --> 35 + ### Task 1: Update ALLOC_REMOTE to read fref+2 for data-driven shared allocation 36 + 37 + **Verifies:** frame-lanes.AC4.1, frame-lanes.AC4.2 38 + 39 + **Files:** 40 + - Modify: `emu/pe.py:231-248` — ALLOC_REMOTE handler in `_process_token()` 41 + 42 + **Implementation:** 43 + 44 + The current ALLOC_REMOTE handler reads `fref+0` (target PE) and `fref+1` (target act_id) from frame constants, then emits a `FrameControlToken(op=FrameOp.ALLOC)`. Extend it to also read `fref+2` (parent act_id for shared allocation). 45 + 46 + **Updated handler:** 47 + 48 + ```python 49 + elif inst.opcode == RoutingOp.ALLOC_REMOTE: 50 + # PE-level: read target PE, act_id, and optional parent act_id from frame constants 51 + # fref+0: target PE 52 + # fref+1: target act_id 53 + # fref+2: parent act_id (0 = fresh ALLOC, non-zero = ALLOC_SHARED) 54 + target_pe = self.frames[frame_id][inst.fref] if inst.fref < len(self.frames[frame_id]) else 0 55 + target_act = self.frames[frame_id][inst.fref + 1] if inst.fref + 1 < len(self.frames[frame_id]) else 0 56 + parent_act = self.frames[frame_id][inst.fref + 2] if inst.fref + 2 < len(self.frames[frame_id]) else 0 57 + 58 + if parent_act: 59 + alloc_op = FrameOp.ALLOC_SHARED 60 + payload = parent_act 61 + else: 62 + alloc_op = FrameOp.ALLOC 63 + payload = 0 64 + 65 + fct = FrameControlToken( 66 + target=target_pe, 67 + act_id=target_act, 68 + op=alloc_op, 69 + payload=payload, 70 + ) 71 + self._on_event(Executed( 72 + time=self.env.now, component=self._component, 73 + op=inst.opcode, result=0, bool_out=False, 74 + )) 75 + yield self.env.timeout(1) # EXECUTE cycle 76 + yield self.env.timeout(1) # EMIT cycle 77 + self.env.process(self._deliver(self.route_table[target_pe], fct)) 78 + ``` 79 + 80 + Key changes from current code: 81 + - Added `parent_act` read from `fref+2` 82 + - Conditional: if `parent_act` is non-zero, use `ALLOC_SHARED` with 
`payload=parent_act`; otherwise, use `ALLOC` with `payload=0` (backwards compatible) 83 + - No new opcodes (AC4.2) 84 + - Note: Frame slots at `fref+0`, `fref+1`, `fref+2` must be `int` values (not `FrameDest`). The codegen guarantees this for properly assembled programs. No runtime type check added, consistent with the existing ALLOC_REMOTE pattern at `fref+0` and `fref+1`. 85 + 86 + **Testing:** 87 + 88 + No new tests in this task — AC4.1 and AC4.2 are tested in Task 3. 89 + 90 + **Verification:** 91 + Run: `python -m pytest tests/ -v -x` 92 + Expected: All existing tests pass. Existing ALLOC_REMOTE tests set up frames with only `fref+0` and `fref+1` populated — `fref+2` defaults to `None` which is falsy, so existing tests get `op=ALLOC` as before. 93 + 94 + **Commit:** `jj commit -m "feat: ALLOC_REMOTE reads fref+2 for data-driven ALLOC_SHARED"` 95 + <!-- END_TASK_1 --> 96 + 97 + <!-- START_TASK_2 --> 98 + ### Task 2: Verify FREE_FRAME uses smart FREE behaviour 99 + 100 + **Verifies:** frame-lanes.AC5.1 101 + 102 + **Files:** 103 + - Verify: `emu/pe.py:249-267` — FREE_FRAME handler 104 + 105 + **Implementation:** 106 + 107 + After Phase 3, the FREE_FRAME opcode handler in `_process_token()` already uses the smart FREE behaviour (tag_store.pop with tuple unpacking, lane data clearing, frame-in-use check). This task verifies that the Phase 3 changes correctly cover the FREE_FRAME path. 108 + 109 + If Phase 3 was implemented correctly, the FREE_FRAME handler at lines 260-266 should already: 110 + 1. Unpack `frame_id, lane = self.tag_store.pop(token.act_id)` 111 + 2. Clear the lane's match_data/presence/port_store 112 + 3. Check if other activations reference the same frame_id 113 + 4. Return frame to free_frames only if last lane 114 + 5. Emit FrameFreed with `lane` and `frame_freed` fields 115 + 116 + If the Phase 3 implementation only updated `_handle_frame_control()` FREE and forgot the FREE_FRAME opcode path, this task is where you fix it. 
Both paths must have identical smart FREE logic. 117 + 118 + **Testing:** 119 + 120 + No new tests in this task. AC5.1 is AC3.3 applied to a different code path. The existing `test_free_frame_opcode` test in `tests/test_pe_frames.py` verifies the basic path; the shared smart FREE logic is exercised by the AC8.3 tests in Phase 3, and the opcode path itself gets a lane-aware test in Task 4. 121 + 122 + **Verification:** 123 + Run: `python -m pytest tests/test_pe_frames.py -v -k "free_frame"` 124 + Expected: All FREE_FRAME tests pass. 125 + 126 + **Commit:** No commit needed if Phase 3 already handled this path. If a fix is needed: `jj commit -m "fix: ensure FREE_FRAME opcode uses smart FREE behaviour"` 127 + <!-- END_TASK_2 --> 128 + 129 + <!-- END_SUBCOMPONENT_A --> 130 + 131 + <!-- START_SUBCOMPONENT_B (tasks 3-4) --> 132 + 133 + <!-- START_TASK_3 --> 134 + ### Task 3: Tests for data-driven ALLOC_REMOTE 135 + 136 + **Verifies:** frame-lanes.AC8.4, frame-lanes.AC8.5 137 + 138 + **Files:** 139 + - Modify: `tests/test_pe_lanes.py` — add ALLOC_REMOTE data-driven tests 140 + 141 + **Implementation:** 142 + 143 + Add tests to the lane test file. 144 + 145 + **Testing:** 146 + 147 + Tests must verify these specific AC cases: 148 + 149 + - **frame-lanes.AC8.4 (ALLOC_REMOTE emits ALLOC_SHARED):** Set up PE0 with an allocated frame. Write frame slots: `fref+0 = 1` (target PE 1), `fref+1 = 5` (target act_id), `fref+2 = 3` (parent act_id, non-zero). Load ALLOC_REMOTE instruction. Give PE0 a route table entry for PE 1 so the emitted token can be captured, then send a MonadToken to trigger the instruction. Verify the FrameControlToken sent to PE1 has `op=FrameOp.ALLOC_SHARED` and `payload=3`. 150 + 151 + For capturing the emitted FrameControlToken: use `output_store = simpy.Store(env)` and set `pe.route_table[1] = output_store`, then check `output_store.items[0]`. 152 + 153 + - **frame-lanes.AC8.5 (ALLOC_REMOTE emits ALLOC when fref+2 is zero):** Same setup but `fref+2 = 0` (or slot is None).
Verify the FrameControlToken has `op=FrameOp.ALLOC` and `payload=0`. This is the backwards-compatible path. 154 + 155 + **Verification:** 156 + Run: `python -m pytest tests/test_pe_lanes.py -v -k "alloc_remote"` 157 + Expected: All new tests pass. 158 + 159 + Run: `python -m pytest tests/ -v -x` 160 + Expected: All tests pass. 161 + 162 + **Commit:** `jj commit -m "test: add tests for data-driven ALLOC_REMOTE (ALLOC_SHARED vs ALLOC)"` 163 + <!-- END_TASK_3 --> 164 + 165 + <!-- START_TASK_4 --> 166 + ### Task 4: Test FREE_FRAME with shared frame (smart FREE via opcode) 167 + 168 + **Verifies:** frame-lanes.AC5.1 169 + 170 + **Files:** 171 + - Modify: `tests/test_pe_lanes.py` — add FREE_FRAME smart free test 172 + 173 + **Implementation:** 174 + 175 + Add a test that exercises the FREE_FRAME opcode path specifically (not the FrameControlToken FREE path, which is already tested in Phase 3). 176 + 177 + **Testing:** 178 + 179 + - **frame-lanes.AC5.1 (FREE_FRAME smart free):** Set up PE with a shared frame (two act_ids on the same frame via initial_tag_store with different lanes). Load FREE_FRAME instruction at the offset used by act_id=0. Send a MonadToken to act_id=0 to trigger FREE_FRAME execution. Verify: act_id=0's lane is freed, act_id=1 is still in tag_store with the same frame, frame is NOT in free_frames, FrameFreed event has `frame_freed=False`. Then trigger FREE_FRAME for act_id=1. Verify: frame IS returned to free_frames, FrameFreed event has `frame_freed=True`. 180 + 181 + **Verification:** 182 + Run: `python -m pytest tests/test_pe_lanes.py -v -k "free_frame"` 183 + Expected: All tests pass. 184 + 185 + Run: `python -m pytest tests/ -v -x` 186 + Expected: All tests pass. 187 + 188 + **Commit:** `jj commit -m "test: add test for FREE_FRAME opcode with smart free on shared frame"` 189 + <!-- END_TASK_4 --> 190 + 191 + <!-- END_SUBCOMPONENT_B -->
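The data-driven choice that ALLOC_REMOTE makes between ALLOC and ALLOC_SHARED reduces to a small pure function, sketched here outside the PE. The names `read_slot` and `choose_alloc` are hypothetical, and the `FrameOp` values below are illustrative only; the real enum lives in the emulator and its numeric values are not given in this plan.

```python
from enum import IntEnum


class FrameOp(IntEnum):
    """Stand-in for the emulator's FrameOp; numeric values here are assumptions."""
    ALLOC = 0
    FREE = 1
    ALLOC_SHARED = 2
    FREE_LANE = 3


def read_slot(slots, index):
    """Return slots[index], or 0 when the index is past the end of the frame."""
    return slots[index] if index < len(slots) else 0


def choose_alloc(slots, fref):
    """Return (target_pe, target_act, op, payload) for an ALLOC_REMOTE at fref."""
    target_pe = read_slot(slots, fref)
    target_act = read_slot(slots, fref + 1)
    parent_act = read_slot(slots, fref + 2)
    if parent_act:
        # Non-zero parent act_id selects shared allocation on the parent's frame.
        return target_pe, target_act, FrameOp.ALLOC_SHARED, parent_act
    # Zero, None, or a missing slot all fall back to a fresh ALLOC.
    return target_pe, target_act, FrameOp.ALLOC, 0
```

One consequence that appears to follow from this encoding: because 0 selects the fresh-ALLOC path, a parent whose act_id is 0 cannot be named through `fref+2`, so a shared allocation under act_id 0 would need a directly constructed `FrameControlToken`, as in the Phase 3 tests.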
+195
docs/implementation-plans/2026-03-07-frame-lanes/phase_05.md
···
+# Frame Matching Lanes Implementation Plan
+
+**Goal:** Extend the PE's frame-based matching to support multiple simultaneous pending operands per instruction within a single activation via matching lanes.
+
+**Architecture:** Multiple `activation_id` values share one physical frame (constants/destinations) while maintaining independent matching state per lane. Tag store maps `act_id → (frame_id, lane)`. Match data, presence, and port storage gain a lane dimension.
+
+**Tech Stack:** Python 3.12, SimPy 4.1, pytest + hypothesis
+
+**Scope:** 6 phases from original design (phases 1-6)
+
+**Codebase verified:** 2026-03-07
+
+---
+
+## Acceptance Criteria Coverage
+
+This phase implements and tests:
+
+### frame-lanes.AC6: Monitor and Snapshot Updates
+- **frame-lanes.AC6.1 Success:** `PESnapshot.tag_store` type becomes `dict[int, tuple[int, int]]`.
+- **frame-lanes.AC6.2 Success:** `PESnapshot` gains `match_data`, `lane_count` fields reflecting the separated match storage.
+- **frame-lanes.AC6.3 Success:** Monitor REPL `pe` command displays lane info in tag_store output.
+- **frame-lanes.AC6.4 Success:** Monitor graph JSON serialises lane info correctly.
+
+### frame-lanes.AC7: Codegen Updates
+- **frame-lanes.AC7.1 Success:** `codegen.py` generates `initial_tag_store` with `(frame_id, lane)` tuples. Existing single-activation code uses lane 0.
+- **frame-lanes.AC7.2 Success:** No codegen changes needed for ALLOC_SHARED (manual construction only for now).
+
+---
+
+Note: AC6.1 and AC6.2 (PESnapshot type changes) are structurally completed in Phase 2 Task 3 as part of the match storage separation. This phase covers the remaining monitor/codegen pieces that depend on those type changes.
+
+<!-- START_SUBCOMPONENT_A (tasks 1-2) -->
+
+<!-- START_TASK_1 -->
+### Task 1: Update Monitor REPL formatting for lane info
+
+**Verifies:** frame-lanes.AC6.3
+
+**Files:**
+- Modify: `monitor/formatting.py:194-202` — `format_pe_state()` tag_store display
+
+**Implementation:**
+
+The current tag_store display at `monitor/formatting.py:194-202` formats entries as `{key: value}`. After Phase 1, tag_store values are `(frame_id, lane)` tuples. Update the formatting to show lane info clearly.
+
+**Current code (lines 194-202):**
+```python
+if pe_snapshot.tag_store:
+    tag_str = ", ".join(
+        f"{colour(str(k), 'white')}: {colour(str(v), 'white')}"
+        for k, v in sorted(pe_snapshot.tag_store.items())
+    )
+    lines.append(f" Tag store: {{{tag_str}}}")
+else:
+    lines.append(" Tag store: (empty)")
+```
+
+**Updated code:**
+```python
+if pe_snapshot.tag_store:
+    tag_str = ", ".join(
+        f"{colour(str(k), 'white')}: frame {colour(str(fid), 'white')} lane {colour(str(lane), 'white')}"
+        for k, (fid, lane) in sorted(pe_snapshot.tag_store.items())
+    )
+    lines.append(f" Tag store: {{{tag_str}}}")
+else:
+    lines.append(" Tag store: (empty)")
+```
+
+This changes the display from `{0: 0}` to `{0: frame 0 lane 0}`, making lane assignments visible at a glance.
+
+**Testing:**
+
+The REPL tests at `tests/test_repl.py:456-473` verify that `do_pe()` produces output but don't assert on specific tag_store formatting content. The formatting change is verified by manual inspection. Existing tests remain valid because they only check `len(out) > 0`.
+
+**Verification:**
+Run: `python -m pytest tests/test_repl.py -v`
+Expected: All tests pass.
+
+**Commit:** `jj commit -m "feat: display lane info in monitor REPL pe command"`
+<!-- END_TASK_1 -->
+
+<!-- START_TASK_2 -->
+### Task 2: Update Monitor graph JSON for lane serialisation
+
+**Verifies:** frame-lanes.AC6.4
+
+**Files:**
+- Modify: `monitor/graph_json.py:104-127` — `_serialise_pe_state()` function
+- Modify: `tests/test_monitor_graph_json.py` — update PESnapshot constructions
+
+**Implementation:**
+
+The current `_serialise_pe_state()` at `monitor/graph_json.py:124` passes `tag_store` directly to JSON:
+```python
+"tag_store": pe_snap.tag_store,
+```
+
+After Phase 1, tag_store values are `(frame_id, lane)` tuples. Use explicit dict format for self-documenting JSON that the TypeScript frontend can easily type:
+
+```python
+"tag_store": {
+    str(act_id): {"frame_id": fid, "lane": lane}
+    for act_id, (fid, lane) in pe_snap.tag_store.items()
+},
+"lane_count": pe_snap.lane_count,
+```
+
+This produces JSON like `{"0": {"frame_id": 2, "lane": 0}}` instead of `{"0": [2, 0]}`, which is more explicit and easier to type in TypeScript.
+
+**Test updates:**
+
+Update `tests/test_monitor_graph_json.py` PESnapshot constructions to include the new fields (`match_data`, `lane_count`). Tests that construct `PESnapshot` directly with `tag_store={}` will work as-is (empty dict). Tests that use non-empty tag_store must change values from `int` to `tuple[int, int]`.
+
+**Verification:**
+Run: `python -m pytest tests/test_monitor_graph_json.py -v`
+Expected: All tests pass.
+
+**Commit:** `jj commit -m "feat: serialise lane info in monitor graph JSON"`
+<!-- END_TASK_2 -->
+
+<!-- END_SUBCOMPONENT_A -->
+
+<!-- START_SUBCOMPONENT_B (tasks 3-4) -->
+
+<!-- START_TASK_3 -->
+### Task 3: Update codegen to emit tuple initial_tag_store
+
+**Verifies:** frame-lanes.AC7.1
+
+**Files:**
+- Modify: `asm/codegen.py:383` — first initial_tag_store assignment
+- Modify: `asm/codegen.py:412` — second initial_tag_store assignment
+
+**Implementation:**
+
+The codegen at `asm/codegen.py` builds `initial_tag_store` as `dict[int, int]` mapping `act_id → frame_id`. Change both assignment sites to produce `dict[int, tuple[int, int]]` mapping `act_id → (frame_id, lane)` with lane 0 for all existing single-activation code.
+
+**Line 383 (empty layout path):**
+```python
+initial_tag_store[act_id] = (frame_id, 0)
+```
+
+**Line 412 (populated layout path):**
+```python
+initial_tag_store[act_id] = (frame_id, 0)
+```
+
+No other changes needed. The type annotation for the local variable can be updated:
+```python
+initial_tag_store: dict[int, tuple[int, int]] = {}
+```
+
+**Testing:**
+
+Existing codegen tests at `tests/test_codegen_frames.py` don't explicitly assert on `initial_tag_store` contents — they test IRAM, setup_tokens, and seed_tokens. The type change is validated by the overall test suite passing (PEConfig now expects tuple values from Phase 1).
+
+**Verification:**
+Run: `python -m pytest tests/test_codegen_frames.py -v`
+Expected: All tests pass.
+
+Run: `python -m pytest tests/ -v -x`
+Expected: All tests pass.
+
+**Commit:** `jj commit -m "feat: codegen emits initial_tag_store with (frame_id, lane) tuples"`
+<!-- END_TASK_3 -->
+
+<!-- START_TASK_4 -->
+### Task 4: Verify no codegen changes needed for ALLOC_SHARED
+
+**Verifies:** frame-lanes.AC7.2
+
+**Files:**
+- No files to modify
+
+**Implementation:**
+
+AC7.2 states: "No codegen changes needed for ALLOC_SHARED (manual construction only for now)." This is a verification task — confirm that the codegen does not attempt to generate ALLOC_SHARED tokens or any lane-related control flow. ALLOC_SHARED is invoked only via manual test construction or hand-crafted assembly.
+
+**Verification:**
+
+Grep codegen for any reference to ALLOC_SHARED or FREE_LANE:
+```bash
+grep -r "ALLOC_SHARED\|FREE_LANE" asm/
+```
+Expected: No results. The codegen is unaware of these new FrameOp values.
+
+Run: `python -m pytest tests/ -v -x`
+Expected: All tests pass.
+
+**Commit:** No commit needed — this is a verification-only task.
+<!-- END_TASK_4 -->
+
+<!-- END_SUBCOMPONENT_B -->
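The two monitor renderings described in Tasks 1 and 2 of this plan can be sketched side by side as a standalone snippet. This is only an illustration: `format_tag_store` and `serialise_tag_store` are hypothetical helper names (not functions in `monitor/`), and the `colour()` wrapping used by the real formatter is left out.

```python
import json

# Hypothetical stand-ins for the monitor code paths; not the real monitor/ API.
TagStore = dict[int, tuple[int, int]]  # act_id -> (frame_id, lane)


def format_tag_store(tag_store: TagStore) -> str:
    """Render a tag store in the 'frame F lane L' style (colour codes omitted)."""
    if not tag_store:
        return "Tag store: (empty)"
    body = ", ".join(
        f"{act_id}: frame {fid} lane {lane}"
        for act_id, (fid, lane) in sorted(tag_store.items())
    )
    return f"Tag store: {{{body}}}"


def serialise_tag_store(tag_store: TagStore) -> dict[str, dict[str, int]]:
    """Explicit-dict JSON shape: string act_id keys, named frame_id/lane fields."""
    return {
        str(act_id): {"frame_id": fid, "lane": lane}
        for act_id, (fid, lane) in tag_store.items()
    }


if __name__ == "__main__":
    shared = {0: (2, 0), 1: (2, 1)}  # two activations sharing frame 2
    print(format_tag_store(shared))
    print(json.dumps(serialise_tag_store(shared)))
```

The named-field form costs a few more bytes than a bare `[frame_id, lane]` pair, but a frontend can type it without relying on positional meaning.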
+121
docs/implementation-plans/2026-03-07-frame-lanes/phase_06.md
···
+# Frame Matching Lanes Implementation Plan
+
+**Goal:** Extend the PE's frame-based matching to support multiple simultaneous pending operands per instruction within a single activation via matching lanes.
+
+**Architecture:** Multiple `activation_id` values share one physical frame (constants/destinations) while maintaining independent matching state per lane. Tag store maps `act_id → (frame_id, lane)`. Match data, presence, and port storage gain a lane dimension.
+
+**Tech Stack:** Python 3.12, SimPy 4.1, pytest + hypothesis
+
+**Scope:** 6 phases from original design (phases 1-6)
+
+**Codebase verified:** 2026-03-07
+
+---
+
+## Acceptance Criteria Coverage
+
+This phase implements and tests:
+
+### frame-lanes.AC8: Test Coverage (final)
+- **frame-lanes.AC8.6 Success:** Test: full loop pipelining scenario — two iterations of a dyadic instruction running concurrently on different lanes, both producing correct results.
+
+---
+
+<!-- START_TASK_1 -->
+### Task 1: Full loop pipelining integration test
+
+**Verifies:** frame-lanes.AC8.6
+
+**Files:**
+- Modify: `tests/test_pe_lanes.py` — add integration test class
+
+**Implementation:**
+
+Add an integration test that simulates the complete loop pipelining lifecycle from the design plan's Architecture section. This test exercises every Phase 1-5 feature together.
+
+**Testing:**
+
+The test must verify frame-lanes.AC8.6 by simulating this lifecycle:
+
+```
+1. ALLOC(act_id=0) → frame, lane 0
+2. Setup: write constants/dests to frame
+3. Iteration 1: inject L and R DyadTokens for act_id=0
+4. ALLOC_SHARED(act_id=1, parent=0) → same frame, lane 1
+5. Iteration 2: inject L and R DyadTokens for act_id=1
+6. Both iterations match independently, both produce correct results
+7. FREE(act_id=0) → lane 0 freed, frame stays
+8. FREE(act_id=1) → last lane, frame returned to free list
+```
+
+**Test structure:**
+
+Follow the established test patterns from `tests/test_pe_frames.py`:
+- Use `simpy.Environment()` and `PEConfig(on_event=events.append)`
+- Direct PE construction (no `build_topology` needed for single-PE test)
+- Use `inject_and_run()` helper for sequential token injection
+- Use a `simpy.Store` as `pe.route_table[target]` to capture output tokens
+
+**Detailed scenario:**
+
+1. **PE setup:** Create PE with `frame_count=4`, `matchable_offsets=4`, no pre-loaded frames or tag_store. Install a dyadic ADD instruction at IRAM offset 0 with `OutputStyle.INHERIT`, `dest_count=1`, `fref=8`.
+
+2. **Allocate frame for iteration 1:**
+   - Inject `FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0)`
+   - Verify `FrameAllocated` event with `lane=0`
+   - Get `frame_id` from `pe.tag_store[0]`
+
+3. **Write destination to frame:**
+   - Write a `FrameDest(target_pe=1, offset=0, act_id=0, port=Port.L, token_kind=TokenKind.MONADIC)` to `pe.frames[frame_id][8]`
+   - Set up `pe.route_table[1] = simpy.Store(env)` to capture output
+
+4. **Allocate shared frame for iteration 2:**
+   - Inject `FrameControlToken(target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0)` (payload=0 is parent act_id)
+   - Verify `FrameAllocated` event with `lane=1` and same `frame_id`
+   - Verify `pe.tag_store[1][0] == pe.tag_store[0][0]` (same frame_id)
+
+5. **Inject iteration 1 operands (act_id=0):**
+   - Inject `DyadToken(target=0, offset=0, act_id=0, data=100, port=Port.L)`
+   - Inject `DyadToken(target=0, offset=0, act_id=0, data=200, port=Port.R)`
+   - Verify `Matched` event for `act_id=0` with `left=100, right=200`
+   - Verify output token emitted with `data=300` (100+200)
+
+6. **Inject iteration 2 operands (act_id=1):**
+   - Inject `DyadToken(target=0, offset=0, act_id=1, data=1000, port=Port.L)`
+   - Inject `DyadToken(target=0, offset=0, act_id=1, data=2000, port=Port.R)`
+   - Verify `Matched` event for `act_id=1` with `left=1000, right=2000`
+   - Verify output token emitted with `data=3000` (1000+2000)
+
+7. **Interleaved verification:**
+   - Confirm that iteration 1's L operand (injected first) did NOT interfere with iteration 2's matching — they're on different lanes
+   - Both `Matched` events should have independent operand values
+
+8. **Free iteration 1 (not last lane):**
+   - Inject `FrameControlToken(target=0, act_id=0, op=FrameOp.FREE, payload=0)`
+   - Verify `FrameFreed` with `frame_freed=False`
+   - Verify `0 not in pe.tag_store`
+   - Verify `1 in pe.tag_store` (iteration 2 still active)
+   - Verify `frame_id not in pe.free_frames` (frame stays allocated)
+
+9. **Free iteration 2 (last lane):**
+   - Inject `FrameControlToken(target=0, act_id=1, op=FrameOp.FREE, payload=0)`
+   - Verify `FrameFreed` with `frame_freed=True`
+   - Verify `1 not in pe.tag_store`
+   - Verify `frame_id in pe.free_frames` (frame returned to pool)
+
+**Key assertions for AC8.6:**
+- Both iterations produce mathematically correct results (100+200=300, 1000+2000=3000)
+- Both iterations ran on the SAME frame (shared constants/destinations)
+- Both iterations used DIFFERENT lanes (lane 0 and lane 1)
+- Freeing one iteration preserved the other's state
+- Freeing the last iteration returned the frame
+
+**Verification:**
+Run: `python -m pytest tests/test_pe_lanes.py -v -k "loop_pipelining"`
+Expected: Test passes.
+
+Run: `python -m pytest tests/ -v -x`
+Expected: All tests pass.
+
+**Commit:** `jj commit -m "test: add full loop pipelining integration test (AC8.6)"`
+<!-- END_TASK_1 -->
+40
docs/implementation-plans/2026-03-07-frame-lanes/test-requirements.md
···
+# Test Requirements: Frame Matching Lanes
+
+## Automated Test Coverage
+
+| AC ID | Criterion | Test Type | Expected Test File | Implementation Phase |
+|-------|-----------|-----------|-------------------|---------------------|
+| frame-lanes.AC1.1 | `tag_store` maps `act_id -> (frame_id, lane)` where `lane` is an `int` in range `[0, lane_count)` | unit | `tests/test_pe_frames.py` (existing tests adapted to tuple API) | Phase 1 |
+| frame-lanes.AC1.2 | `PEConfig.initial_tag_store` type is `dict[int, tuple[int, int]]`. PE constructor initialises tag_store from it. | unit | `tests/test_pe_frames.py`, `tests/test_pe_events.py`, `tests/test_pe.py` (all existing tests updated to pass tuple values) | Phase 1 |
+| frame-lanes.AC1.3 | `PEConfig.lane_count` field exists with default 4. Controls third dimension of match arrays. | unit | `tests/test_pe_lanes.py` (verified structurally when match arrays gain lane dimension in Phase 2) | Phase 1 |
+| frame-lanes.AC1.4 | All existing tests pass with updated tuple API. | integration | `tests/` (full test suite regression run) | Phase 1 |
+| frame-lanes.AC2.1 | Match operand data lives in `match_data[frame_id][offset][lane]`, separate from `frames[frame_id][slot]`. | unit | `tests/test_pe_frames.py`, `tests/test_pe.py` (existing matching tests exercise 3D match_data via lane 0) | Phase 2 |
+| frame-lanes.AC2.2 | `presence[frame_id][offset][lane]` is a 3D bool array. `port_store[frame_id][offset][lane]` likewise. | unit | `tests/test_pe.py` (existing presence assertions updated from 2D to 3D indexing with `[0]` lane suffix) | Phase 2 |
+| frame-lanes.AC2.3 | `_match_frame()` uses `(frame_id, match_slot, lane)` to read/write match data, presence, and port. | unit | `tests/test_pe_frames.py`, `tests/test_pe.py` (existing matching tests pass through `_match_frame` with lane parameter) | Phase 2 |
+| frame-lanes.AC2.4 | `frames[frame_id][slot]` remains shared across all lanes. Constants and destinations are NOT per-lane. | unit | `tests/test_pe_lanes.py` (verified in Phase 3 AC8.1 test: two act_ids sharing a frame read the same frame slot constants) | Phase 3 |
+| frame-lanes.AC3.1 | `FrameOp.ALLOC_SHARED`: PE looks up `parent_act_id` from payload, finds parent's `frame_id`, assigns next free lane, records `tag_store[act_id] = (frame_id, lane)`. Clears only that lane's presence/port bits. | unit | `tests/test_pe_lanes.py` | Phase 3 |
+| frame-lanes.AC3.2 | `FrameOp.FREE_LANE`: Removes tag_store entry, clears that lane's presence/port/match_data. Does NOT return frame to free list. | unit | `tests/test_pe_lanes.py` | Phase 3 |
+| frame-lanes.AC3.3 | `FrameOp.FREE` becomes smart: removes tag_store entry, clears lane data. Returns frame to free list only if no other tag_store entries reference the same frame_id. | unit | `tests/test_pe_lanes.py` | Phase 3 |
+| frame-lanes.AC3.4 | `FrameOp.ALLOC` unchanged: allocates fresh frame, assigns lane 0. | unit | `tests/test_pe_lanes.py` (regression check), `tests/test_pe_frames.py` (existing ALLOC tests) | Phase 3 |
+| frame-lanes.AC3.5 | `FrameAllocated` event gains `lane: int` field. `FrameFreed` event gains `lane: int` and `frame_freed: bool` fields. | unit | `tests/test_pe_events.py`, `tests/test_pe_frames.py` (existing event assertions updated with new fields) | Phase 3 |
+| frame-lanes.AC3.6 | When all lanes for a frame are occupied and ALLOC_SHARED is received, PE emits `TokenRejected` with reason "no free lanes". | unit | `tests/test_pe_lanes.py` | Phase 3 |
+| frame-lanes.AC4.1 | ALLOC_REMOTE reads `fref+2` from frame. If non-zero, emits `FrameControlToken` with `op=ALLOC_SHARED` and `payload=parent_act_id`. If zero, emits `op=ALLOC`. | unit | `tests/test_pe_lanes.py` | Phase 4 |
+| frame-lanes.AC4.2 | No new opcodes. Behaviour is entirely data-driven from frame constants. | unit | Verified by absence: `grep -r "ALLOC_SHARED\|FREE_LANE" asm/` returns no results. No dedicated test file. | Phase 5 |
+| frame-lanes.AC5.1 | `FREE_FRAME` opcode uses the smart FREE behaviour from AC3.3. Frees the executing token's activation lane; returns frame to free list only if last lane. | unit | `tests/test_pe_lanes.py` | Phase 4 |
+| frame-lanes.AC6.1 | `PESnapshot.tag_store` type becomes `dict[int, tuple[int, int]]`. | unit | `tests/test_snapshot.py`, `tests/test_monitor_graph_json.py` (PESnapshot constructions updated) | Phase 2 |
+| frame-lanes.AC6.2 | `PESnapshot` gains `match_data`, `lane_count` fields reflecting the separated match storage. | unit | `tests/test_snapshot.py` (snapshot capture assertions updated for new fields) | Phase 2 |
+| frame-lanes.AC6.4 | Monitor graph JSON serialises lane info correctly. | unit | `tests/test_monitor_graph_json.py` (assertions on serialised tag_store JSON structure with frame_id/lane keys) | Phase 5 |
+| frame-lanes.AC7.1 | `codegen.py` generates `initial_tag_store` with `(frame_id, lane)` tuples. Existing single-activation code uses lane 0. | unit | `tests/test_codegen_frames.py` (existing codegen tests pass with tuple-valued initial_tag_store) | Phase 5 |
+| frame-lanes.AC7.2 | No codegen changes needed for ALLOC_SHARED (manual construction only for now). | unit | Verified by absence: `grep -r "ALLOC_SHARED\|FREE_LANE" asm/` returns no results. No dedicated test file. | Phase 5 |
+| frame-lanes.AC8.1 | Two act_ids sharing a frame via ALLOC_SHARED have independent matching: L operand for act_id 0 does not interfere with L operand for act_id 1 at the same offset. | unit | `tests/test_pe_lanes.py` | Phase 3 |
+| frame-lanes.AC8.2 | ALLOC_SHARED with all lanes occupied emits TokenRejected. | unit | `tests/test_pe_lanes.py` | Phase 3 |
+| frame-lanes.AC8.3 | FREE on a shared frame frees only the lane; other lanes' data is preserved. FREE on last lane frees the frame. | unit | `tests/test_pe_lanes.py` | Phase 3 |
+| frame-lanes.AC8.4 | ALLOC_REMOTE emits ALLOC_SHARED when `fref+2` is non-zero. | unit | `tests/test_pe_lanes.py` | Phase 4 |
+| frame-lanes.AC8.5 | ALLOC_REMOTE emits ALLOC when `fref+2` is zero (backwards compatible). | unit | `tests/test_pe_lanes.py` | Phase 4 |
+| frame-lanes.AC8.6 | Full loop pipelining scenario: two iterations of a dyadic instruction running concurrently on different lanes, both producing correct results. | e2e | `tests/test_pe_lanes.py` | Phase 6 |
+
+## Criteria Requiring Human Verification
+
+| AC ID | Criterion | Justification | Verification Approach |
+|-------|-----------|---------------|----------------------|
+| frame-lanes.AC6.3 | Monitor REPL `pe` command displays lane info in tag_store output. | The existing REPL tests at `tests/test_repl.py` only assert that `do_pe()` produces non-empty output (`len(out) > 0`); they do not assert on specific formatting content. The formatting change from `{0: 0}` to `{0: frame 0 lane 0}` is a display concern that is most reliably verified by visual inspection. While a string-matching test could be added, the REPL formatting is intentionally loosely tested to allow cosmetic changes without test churn. | Run `python -m monitor` with a loaded program, execute the `pe 0` command, and confirm tag_store entries display as `act_id: frame F lane L` format. Verify that multi-lane scenarios (after ALLOC_SHARED) show distinct lane numbers per act_id. |
+76
docs/test-plans/2026-03-07-frame-lanes.md
···
+# Frame Matching Lanes — Human Test Plan
+
+## Overview
+
+This test plan covers manual verification steps for the frame matching lanes implementation (28 automated acceptance criteria + 1 human verification criterion).
+
+**Automated coverage:** 28/28 acceptance criteria have automated tests across `tests/test_pe_lanes.py`, `tests/test_pe_frames.py`, `tests/test_pe_events.py`, `tests/test_pe.py`, `tests/test_snapshot.py`, `tests/test_monitor_graph_json.py`, `tests/test_codegen_frames.py`, and `tests/test_repl.py`.
+
+**Test count:** 1300 tests collected (20 new in `test_pe_lanes.py`, remainder updated for tuple API).
+
+---
+
+## Manual Verification Required
+
+### frame-lanes.AC6.3: Monitor REPL Lane Display
+
+**Criterion:** Monitor REPL `pe` command displays lane info in tag_store output.
+
+**Why manual:** REPL tests assert non-empty output only (`len(out) > 0`), not specific formatting. Display formatting is intentionally loosely tested to allow cosmetic changes without test churn.
+
+**Steps:**
+
+1. Start the monitor with a program that uses frame allocation:
+   ```bash
+   python -m monitor examples/simple_add.dfasm
+   ```
+
+2. Load and step the simulation:
+   ```
+   (monitor) load examples/simple_add.dfasm
+   (monitor) step
+   ```
+
+3. Inspect PE state:
+   ```
+   (monitor) pe 0
+   ```
+
+4. **Verify:** Tag store entries display in the format:
+   ```
+   Tag store: {0: frame 0 lane 0}
+   ```
+   Not the old format `{0: 0}`.
+
+5. **Multi-lane verification** (requires manual token injection or a program that uses ALLOC_SHARED):
+   - After ALLOC_SHARED creates a second activation on the same frame, verify output shows distinct lane numbers:
+     ```
+     Tag store: {0: frame 2 lane 0, 1: frame 2 lane 1}
+     ```
+
+**Expected result:** Lane info is clearly visible in tag_store display for all PE state inspections.
+
+---
+
+## Automated Test Summary by Acceptance Criterion
+
+| AC Group | Criteria Count | Primary Test File | Key Tests |
+|----------|---------------|-------------------|-----------|
+| AC1: Tag Store Tuple API | 4 | `test_pe_frames.py`, `test_pe_events.py`, `test_pe.py` | Existing tests adapted to `dict[int, tuple[int, int]]` |
+| AC2: Match Data Separation | 4 | `test_pe.py`, `test_pe_frames.py` | 3D presence/port/match_data indexing |
+| AC3: FrameOp Extensions | 6 | `test_pe_lanes.py` | ALLOC_SHARED, FREE_LANE, smart FREE, lane exhaustion |
+| AC4: ALLOC_REMOTE Data-Driven | 2 | `test_pe_lanes.py` | `fref+2` read for ALLOC_SHARED vs ALLOC |
+| AC5: FREE_FRAME Smart Free | 1 | `test_pe_lanes.py` | FREE_FRAME opcode delegates to smart free |
+| AC6: Monitor/Snapshot | 4 | `test_snapshot.py`, `test_monitor_graph_json.py`, `test_repl.py` | 3D snapshot, JSON lane serialisation |
+| AC7: Codegen | 2 | `test_codegen_frames.py` | Tuple initial_tag_store generation |
+| AC8: Test Coverage | 6 | `test_pe_lanes.py` | Independent matching, lane exhaustion, loop pipelining |
+
+---
+
+## Regression Checklist
+
+- [ ] All 1300 tests pass: `python -m pytest tests/ -v`
+- [ ] No FrameOp references to ALLOC_SHARED/FREE_LANE in `asm/`: `grep -r "ALLOC_SHARED\|FREE_LANE" asm/` returns empty
+- [ ] Existing frame allocation tests still pass with tuple API
+- [ ] Monitor web UI loads without errors (if available)
+- [ ] REPL `pe` command shows lane info (AC6.3 manual check above)
+3
emu/events.py
···
     component: str
     act_id: int
     frame_id: int
+    lane: int


 @dataclass(frozen=True)
···
     component: str
     act_id: int
     frame_id: int
+    lane: int
+    frame_freed: bool


 @dataclass(frozen=True)
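With the new `frame_freed` flag, a consumer of the event stream can tell a lane-only release from a release that returned the frame to the pool. A minimal sketch of such a consumer follows. The `FrameFreed` class here is a stand-in that mirrors the fields visible in this hunk, the `time: float` type is an assumption (the diff only shows it being set from `env.now`), and `frames_returned` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameFreed:
    """Stand-in mirroring the event fields shown in the diff; not emu/events.py."""

    time: float  # assumed type
    component: str
    act_id: int
    frame_id: int
    lane: int
    frame_freed: bool


def frames_returned(events: list[object]) -> list[int]:
    """Frame IDs that actually went back to the free pool, in event order."""
    return [
        e.frame_id
        for e in events
        if isinstance(e, FrameFreed) and e.frame_freed
    ]
```

A list like this could be collected by passing `events.append` as the `on_event` callback and filtering afterwards.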
+175 -46
emu/pe.py
··· 39 39 40 40 Manages: 41 41 - Frame store: [frame_count][frame_slots] dense per-activation data 42 - - Tag store: act_id → frame_id mapping 43 - - Presence bits: [frame_count][matchable_offsets] for dyadic matching 44 - - Port store: [frame_count][matchable_offsets] for port metadata 42 + - Tag store: act_id → (frame_id, lane) mapping 43 + - Match data: [frame_id][matchable_offsets][lane_count] for operand values 44 + - Presence bits: [frame_id][matchable_offsets][lane_count] for dyadic matching 45 + - Port store: [frame_id][matchable_offsets][lane_count] for port metadata 46 + - Lane free: per-frame set of available lane IDs 45 47 - Free frames: pool of available frame IDs 46 48 47 49 Pipeline (per token): ··· 68 70 for _ in range(config.frame_count) 69 71 ] 70 72 71 - # Tag store: act_id → frame_id 72 - self.tag_store: dict[int, int] = dict(config.initial_tag_store or {}) 73 + # Tag store: act_id → (frame_id, lane) 74 + self.tag_store: dict[int, tuple[int, int]] = dict(config.initial_tag_store or {}) 73 75 74 - # Presence bits: [frame_id][match_slot] - True if operand waiting for partner 75 - self.presence: list[list[bool]] = [ 76 - [False for _ in range(config.matchable_offsets)] 76 + # Match data: [frame_id][match_slot][lane] - operand values waiting for partner 77 + self.match_data: list[list[list[Optional[int]]]] = [ 78 + [ 79 + [None for _ in range(config.lane_count)] 80 + for _ in range(config.matchable_offsets) 81 + ] 82 + for _ in range(config.frame_count) 83 + ] 84 + 85 + # Presence bits: [frame_id][match_slot][lane] - True if operand waiting for partner 86 + self.presence: list[list[list[bool]]] = [ 87 + [ 88 + [False for _ in range(config.lane_count)] 89 + for _ in range(config.matchable_offsets) 90 + ] 77 91 for _ in range(config.frame_count) 78 92 ] 79 93 80 - # Port store: [frame_id][match_slot] - port of waiting operand 81 - self.port_store: list[list[Optional[Port]]] = [ 82 - [None for _ in range(config.matchable_offsets)] 94 + # Port store: 
[frame_id][match_slot][lane] - port of waiting operand 95 + self.port_store: list[list[list[Optional[Port]]]] = [ 96 + [ 97 + [None for _ in range(config.lane_count)] 98 + for _ in range(config.matchable_offsets) 99 + ] 83 100 for _ in range(config.frame_count) 84 101 ] 85 102 103 + self.lane_count = config.lane_count 104 + 86 105 # Free frames pool 87 106 self.free_frames = list(range(config.frame_count)) 88 - for frame_id in self.tag_store.values(): 107 + for frame_id, _lane in self.tag_store.values(): 89 108 if frame_id in self.free_frames: 90 109 self.free_frames.remove(frame_id) 110 + 111 + # Lane tracking: which lanes are free per frame 112 + self.lane_free: dict[int, set[int]] = {} 113 + 114 + # Initialize lane_free for pre-loaded tag_store entries 115 + for act_id, (frame_id, lane) in self.tag_store.items(): 116 + if frame_id not in self.lane_free: 117 + # First time seeing this frame — set up lane tracking 118 + all_lanes = set(range(self.lane_count)) 119 + self.lane_free[frame_id] = all_lanes - {lane} 120 + else: 121 + self.lane_free[frame_id].discard(lane) 91 122 92 123 # Load initial frame data 93 124 if config.initial_frames: ··· 173 204 )) 174 205 return 175 206 176 - frame_id = self.tag_store[token.act_id] 207 + frame_id, lane = self.tag_store[token.act_id] 177 208 178 209 # Determine if monadic or dyadic instruction 179 210 is_monadic = ( ··· 192 223 left, right = token.data, None 193 224 else: 194 225 # Dyadic matching via presence bits 195 - operands = self._match_frame(token, inst, frame_id) 226 + operands = self._match_frame(token, inst, frame_id, lane) 196 227 yield self.env.timeout(1) # match cycle 197 228 if operands is None: 198 229 return # waiting for partner ··· 229 260 yield self.env.timeout(1) # EMIT cycle 230 261 self._do_emit_new(inst, result, False, token.act_id, frame_id) 231 262 elif inst.opcode == RoutingOp.ALLOC_REMOTE: 232 - # PE-level: read target PE and act_id from frame constants 263 + # PE-level: read target PE, act_id, and 
optional parent act_id from frame constants 264 + # fref+0: target PE 265 + # fref+1: target act_id 266 + # fref+2: parent act_id (0 = fresh ALLOC, non-zero = ALLOC_SHARED) 233 267 # Total: 4 cycles (dequeue + IFETCH + EXECUTE + EMIT) 234 268 target_pe = self.frames[frame_id][inst.fref] if inst.fref < len(self.frames[frame_id]) else 0 235 269 target_act = self.frames[frame_id][inst.fref + 1] if inst.fref + 1 < len(self.frames[frame_id]) else 0 270 + parent_act = self.frames[frame_id][inst.fref + 2] if inst.fref + 2 < len(self.frames[frame_id]) else 0 271 + 272 + # Guard against None slot values 273 + if target_pe is None or target_act is None: 274 + logger.warning(f"PE {self.pe_id}: ALLOC_REMOTE has None at fref slots, skipping") 275 + return 276 + 277 + if parent_act: 278 + alloc_op = FrameOp.ALLOC_SHARED 279 + payload = parent_act 280 + else: 281 + alloc_op = FrameOp.ALLOC 282 + payload = 0 283 + 236 284 fct = FrameControlToken( 237 285 target=target_pe, 238 286 act_id=target_act, 239 - op=FrameOp.ALLOC, 240 - payload=0, 287 + op=alloc_op, 288 + payload=payload, 241 289 ) 242 290 self._on_event(Executed( 243 291 time=self.env.now, component=self._component, ··· 256 304 )) 257 305 yield self.env.timeout(1) # EXECUTE cycle 258 306 yield self.env.timeout(1) # EMIT cycle (no output token) 259 - # Frame deallocation happens during EMIT cycle 307 + # Frame deallocation happens during EMIT cycle with smart FREE logic 260 308 if token.act_id in self.tag_store: 261 - freed_frame = self.tag_store.pop(token.act_id) 262 - self.free_frames.append(freed_frame) 263 - self._on_event(FrameFreed( 264 - time=self.env.now, component=self._component, 265 - act_id=token.act_id, frame_id=freed_frame, 266 - )) 309 + self._smart_free(token.act_id) 310 + else: 311 + logger.warning(f"PE {self.pe_id}: FREE_FRAME for unknown act_id {token.act_id}") 267 312 else: 268 313 # Normal ALU execute 269 314 # MINOR FIX: Restructure const_val handling to avoid dead code ··· 281 326 yield 
self.env.timeout(1) # EMIT cycle 282 327 self._do_emit_new(inst, result, bool_out, token.act_id, frame_id, left=left) 283 328 329 + def _smart_free(self, act_id: int) -> None: 330 + """Smart FREE helper: deallocate lane, possibly returning frame to free list. 331 + 332 + Does NOT yield. Caller handles timing. Emits FrameFreed event. 333 + """ 334 + if act_id not in self.tag_store: 335 + return # Caller should have checked, but skip silently 336 + 337 + frame_id, lane = self.tag_store.pop(act_id) 338 + # Clear this lane's match state 339 + for i in range(self.matchable_offsets): 340 + self.match_data[frame_id][i][lane] = None 341 + self.presence[frame_id][i][lane] = False 342 + self.port_store[frame_id][i][lane] = None 343 + # Check if any other activations use this frame 344 + frame_in_use = any(fid == frame_id for fid, _ in self.tag_store.values()) 345 + if frame_in_use: 346 + # Return lane to pool, keep frame 347 + self.lane_free[frame_id].add(lane) 348 + self._on_event(FrameFreed( 349 + time=self.env.now, component=self._component, 350 + act_id=act_id, frame_id=frame_id, 351 + lane=lane, frame_freed=False, 352 + )) 353 + else: 354 + # Last lane — return frame to free list 355 + self.free_frames.append(frame_id) 356 + if frame_id in self.lane_free: 357 + del self.lane_free[frame_id] 358 + # Clear frame slots 359 + for i in range(self.frame_slots): 360 + self.frames[frame_id][i] = None 361 + self._on_event(FrameFreed( 362 + time=self.env.now, component=self._component, 363 + act_id=act_id, frame_id=frame_id, 364 + lane=lane, frame_freed=True, 365 + )) 366 + 284 367 def _handle_frame_control(self, token: FrameControlToken) -> None: 285 - """Handle ALLOC and FREE operations.""" 368 + """Handle ALLOC, FREE, ALLOC_SHARED, and FREE_LANE operations.""" 286 369 if token.op == FrameOp.ALLOC: 287 370 if self.free_frames: 288 371 frame_id = self.free_frames.pop() 289 - self.tag_store[token.act_id] = frame_id 372 + self.tag_store[token.act_id] = (frame_id, 0) 373 + # Set up 
lane tracking: lane 0 is taken, rest are free 374 + self.lane_free[frame_id] = set(range(1, self.lane_count)) 290 375 # Initialize frame slots to None 291 376 for i in range(self.frame_slots): 292 377 self.frames[frame_id][i] = None 293 - # CRITICAL FIX: Reset stale presence bits and port_store from previous activation 378 + # Reset all lanes' match state 294 379 for i in range(self.matchable_offsets): 295 - self.presence[frame_id][i] = False 296 - self.port_store[frame_id][i] = None 380 + for ln in range(self.lane_count): 381 + self.match_data[frame_id][i][ln] = None 382 + self.presence[frame_id][i][ln] = False 383 + self.port_store[frame_id][i][ln] = None 297 384 self._on_event(FrameAllocated( 298 385 time=self.env.now, component=self._component, 299 - act_id=token.act_id, frame_id=frame_id, 386 + act_id=token.act_id, frame_id=frame_id, lane=0, 300 387 )) 301 388 else: 302 389 logger.warning(f"PE {self.pe_id}: no free frames available") 303 390 elif token.op == FrameOp.FREE: 304 391 if token.act_id in self.tag_store: 305 - frame_id = self.tag_store.pop(token.act_id) 306 - self.free_frames.append(frame_id) 307 - self._on_event(FrameFreed( 392 + self._smart_free(token.act_id) 393 + else: 394 + logger.warning(f"PE {self.pe_id}: FREE for unknown act_id {token.act_id}") 395 + elif token.op == FrameOp.ALLOC_SHARED: 396 + # Shared allocation: find parent's frame, assign next free lane 397 + # Guard against self-referential act_id (would leak old lane) 398 + if token.act_id in self.tag_store: 399 + self._on_event(TokenRejected( 308 400 time=self.env.now, component=self._component, 309 - act_id=token.act_id, frame_id=frame_id, 401 + token=token, reason=f"act_id {token.act_id} already in tag store", 310 402 )) 403 + return 404 + parent_act_id = token.payload 405 + if parent_act_id not in self.tag_store: 406 + self._on_event(TokenRejected( 407 + time=self.env.now, component=self._component, 408 + token=token, reason=f"parent act_id {parent_act_id} not in tag store", 409 + 
)) 410 + return 411 + parent_frame_id, _ = self.tag_store[parent_act_id] 412 + free_lanes = self.lane_free.get(parent_frame_id, set()) 413 + if not free_lanes: 414 + self._on_event(TokenRejected( 415 + time=self.env.now, component=self._component, 416 + token=token, reason="no free lanes", 417 + )) 418 + return 419 + lane = min(free_lanes) # Deterministic: pick lowest free lane 420 + free_lanes.remove(lane) 421 + self.tag_store[token.act_id] = (parent_frame_id, lane) 422 + # Clear only this lane's match state 423 + for i in range(self.matchable_offsets): 424 + self.match_data[parent_frame_id][i][lane] = None 425 + self.presence[parent_frame_id][i][lane] = False 426 + self.port_store[parent_frame_id][i][lane] = None 427 + self._on_event(FrameAllocated( 428 + time=self.env.now, component=self._component, 429 + act_id=token.act_id, frame_id=parent_frame_id, lane=lane, 430 + )) 431 + elif token.op == FrameOp.FREE_LANE: 432 + # Free lane with smart frame deallocation. 433 + # If this is the last lane using the frame, the frame is returned to free_frames. 434 + # Otherwise, just the lane is returned to the pool. 435 + if token.act_id in self.tag_store: 436 + self._smart_free(token.act_id) 437 + else: 438 + logger.warning(f"PE {self.pe_id}: FREE_LANE for unknown act_id {token.act_id}") 311 439 312 440 def _handle_local_write(self, token: PELocalWriteToken) -> None: 313 441 """Handle IRAM write and frame write.""" ··· 319 447 )) 320 448 elif token.region == 1: # Frame 321 449 if token.act_id in self.tag_store: 322 - frame_id = self.tag_store[token.act_id] 450 + frame_id, _lane = self.tag_store[token.act_id] 323 451 if token.is_dest: 324 452 # Decode flit 1 to FrameDest 325 453 dest = unpack_flit1(token.data) ··· 345 473 token: DyadToken, 346 474 inst: Instruction, 347 475 frame_id: int, 476 + lane: int, 348 477 ) -> Optional[tuple[int, int]]: 349 - """Frame-based dyadic matching. 478 + """Frame-based dyadic matching with lane support. 
350 479 351 480 Derives match slot from low bits of token.offset: 352 481 match_slot = token.offset % matchable_offsets 353 482 354 - Both L and R tokens write to frames[frame_id][match_slot]. 355 - Port metadata determines left/right ordering when second arrives. 483 + Match data, presence, and port are per-lane. 484 + Frame constants/destinations remain shared. 356 485 """ 357 486 match_slot = token.offset % self.matchable_offsets 358 487 359 - if self.presence[frame_id][match_slot]: 488 + if self.presence[frame_id][match_slot][lane]: 360 489 # Partner already waiting — pair them 361 - partner_data = self.frames[frame_id][match_slot] 362 - partner_port = self.port_store[frame_id][match_slot] 363 - self.presence[frame_id][match_slot] = False 364 - self.frames[frame_id][match_slot] = None 490 + partner_data = self.match_data[frame_id][match_slot][lane] 491 + partner_port = self.port_store[frame_id][match_slot][lane] 492 + self.presence[frame_id][match_slot][lane] = False 493 + self.match_data[frame_id][match_slot][lane] = None 365 494 366 495 # Use port metadata to determine left/right ordering 367 496 if partner_port == Port.L: ··· 377 506 return left, right 378 507 else: 379 508 # Store and wait for partner 380 - self.frames[frame_id][match_slot] = token.data 381 - self.port_store[frame_id][match_slot] = token.port 382 - self.presence[frame_id][match_slot] = True 509 + self.match_data[frame_id][match_slot][lane] = token.data 510 + self.port_store[frame_id][match_slot][lane] = token.port 511 + self.presence[frame_id][match_slot][lane] = True 383 512 return None 384 513 385 514 def _do_emit_new(
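The per-lane matching in the hunk above can be illustrated with a small standalone model. This is a sketch only, not the `ProcessingElement` code: `LaneMatcher` and its `match` method are hypothetical names, the model covers a single frame, and `Port` is a local stand-in for the enum imported from `cm_inst`. It shows that two lanes sharing a match slot pair their operands independently.

```python
from enum import Enum
from typing import Optional


class Port(Enum):
    """Local stand-in for cm_inst.Port (assumption: only L and R matter here)."""
    L = 0
    R = 1


class LaneMatcher:
    """Toy model of dyadic matching for ONE frame, indexed [match_slot][lane]."""

    def __init__(self, matchable_offsets: int = 8, lane_count: int = 4) -> None:
        self.matchable_offsets = matchable_offsets
        self.match_data = [[None] * lane_count for _ in range(matchable_offsets)]
        self.port_store = [[None] * lane_count for _ in range(matchable_offsets)]
        self.presence = [[False] * lane_count for _ in range(matchable_offsets)]

    def match(self, offset: int, lane: int, port: Port, data: int) -> Optional[tuple[int, int]]:
        """Return (left, right) when a partner was waiting, else store and return None."""
        slot = offset % self.matchable_offsets
        if self.presence[slot][lane]:
            # Partner already waiting in this lane: pair them and clear the slot.
            partner_data = self.match_data[slot][lane]
            partner_port = self.port_store[slot][lane]
            self.presence[slot][lane] = False
            self.match_data[slot][lane] = None
            # The stored port decides which operand is the left one.
            if partner_port is Port.L:
                return partner_data, data
            return data, partner_data
        # First operand for this (slot, lane): store it and wait.
        self.match_data[slot][lane] = data
        self.port_store[slot][lane] = port
        self.presence[slot][lane] = True
        return None


matcher = LaneMatcher()
first = matcher.match(offset=0, lane=0, port=Port.L, data=5)   # waits in lane 0
other = matcher.match(offset=0, lane=1, port=Port.R, data=7)   # waits in lane 1, no pairing
fired0 = matcher.match(offset=0, lane=0, port=Port.R, data=3)  # pairs within lane 0
fired1 = matcher.match(offset=0, lane=1, port=Port.L, data=9)  # pairs within lane 1
```

Because every index carries the lane as its last dimension, an operand waiting in lane 0 is never visible to a token arriving on lane 1, even at the same offset.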
+2 -1
emu/types.py
···
     frame_count: int = 8
     frame_slots: int = 64
     matchable_offsets: int = 8
+    lane_count: int = 4
     initial_frames: Optional[dict[int, list[FrameSlotValue]]] = None
-    initial_tag_store: Optional[dict[int, int]] = None
+    initial_tag_store: Optional[dict[int, tuple[int, int]]] = None
     allowed_pe_routes: Optional[set[int]] = None
     allowed_sm_routes: Optional[set[int]] = None
     on_event: EventCallback | None = None
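With `lane_count` in the config, the match state becomes a 3D structure indexed `[frame_id][match_slot][lane]`. A minimal sketch of how such arrays could be sized from the config follows. `LaneConfig` and `make_lane_array` are hypothetical names used for illustration; the real `PEConfig` has more fields and the PE may build its arrays differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneConfig:
    """Hypothetical subset of the PE config fields that determine array shape."""
    frame_count: int = 8
    matchable_offsets: int = 8
    lane_count: int = 4


def make_lane_array(cfg: LaneConfig, fill):
    """Build a [frame_id][match_slot][lane] nested list with no shared inner lists."""
    return [
        [[fill for _ in range(cfg.lane_count)] for _ in range(cfg.matchable_offsets)]
        for _ in range(cfg.frame_count)
    ]


cfg = LaneConfig()
presence = make_lane_array(cfg, False)
presence[2][3][1] = True  # mark frame 2, match slot 3, lane 1 as waiting
```

Nested comprehensions are used rather than list multiplication so that each lane list is a distinct object; writing one lane's presence bit must not leak into another frame or slot.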
+1 -1
monitor/CLAUDE.md
···
 - `__init__.py` -- Public API exports
 - `backend.py` -- `SimulationBackend` class with thread lifecycle and command dispatch
 - `commands.py` -- All command and result frozen dataclasses, `SimCommand` union type
-- `snapshot.py` -- `StateSnapshot`, `PESnapshot` (frame-based: frames, tag_store, presence, port_store, free_frames), `SMSnapshot`, `SMCellSnapshot`, `capture()`
+- `snapshot.py` -- `StateSnapshot`, `PESnapshot` (frame-based: frames, tag_store mapping act_id → (frame_id, lane) tuples, presence, port_store, match_data all 3D [frame_id][match_slot][lane], lane_count, free_frames), `SMSnapshot`, `SMCellSnapshot`, `capture()`
 - `graph_json.py` -- JSON serialization with execution overlay (extends dfgraph patterns)
 - `server.py` -- `create_app(backend)` FastAPI factory, `ConnectionManager`, WebSocket handler
 - `repl.py` -- `MonitorREPL(cmd.Cmd)` interactive CLI
+2 -2
monitor/formatting.py
···
     # Tag store
     if pe_snapshot.tag_store:
         tag_str = ", ".join(
-            f"{colour(str(k), 'white')}: {colour(str(v), 'white')}"
-            for k, v in sorted(pe_snapshot.tag_store.items())
+            f"{colour(str(k), 'white')}: frame {colour(str(fid), 'white')} lane {colour(str(lane), 'white')}"
+            for k, (fid, lane) in sorted(pe_snapshot.tag_store.items())
         )
         lines.append(f" Tag store: {{{tag_str}}}")
     else:
+10 -1
monitor/graph_json.py
··· 117 117 118 118 frames_json = [[_serialise_slot(s) for s in frame] for frame in pe_snap.frames] 119 119 120 + tag_store_json = { 121 + str(act_id): {"frame_id": fid, "lane": lane} 122 + for act_id, (fid, lane) in pe_snap.tag_store.items() 123 + } 124 + 120 125 return { 121 126 "pe_id": pe_snap.pe_id, 122 127 "iram": iram_json, 123 128 "frames": frames_json, 124 - "tag_store": pe_snap.tag_store, 129 + "tag_store": tag_store_json, 130 + "lane_count": pe_snap.lane_count, 125 131 "free_frames": list(pe_snap.free_frames), 126 132 "input_queue_size": len(pe_snap.input_queue), 127 133 } ··· 186 192 base["details"] = { 187 193 "act_id": event.act_id, 188 194 "frame_id": event.frame_id, 195 + "lane": event.lane, 189 196 } 190 197 elif isinstance(event, FrameFreed): 191 198 base["details"] = { 192 199 "act_id": event.act_id, 193 200 "frame_id": event.frame_id, 201 + "lane": event.lane, 202 + "frame_freed": event.frame_freed, 194 203 } 195 204 elif isinstance(event, FrameSlotWritten): 196 205 base["details"] = {
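The explicit `tag_store_json` mapping matters because the standard `json` module would otherwise turn each `(frame_id, lane)` tuple into an anonymous two-element list. The sketch below contrasts the two shapes. `tag_store_to_json` is a hypothetical helper name that mirrors the comprehension in the hunk; it is not a function in the repository.

```python
import json


def tag_store_to_json(tag_store: dict[int, tuple[int, int]]) -> dict[str, dict[str, int]]:
    """Mirror of the hunk's comprehension: string keys, named frame_id/lane fields."""
    return {
        str(act_id): {"frame_id": fid, "lane": lane}
        for act_id, (fid, lane) in tag_store.items()
    }


tag_store = {0: (2, 1), 5: (7, 2)}

# Dumping the raw dict works, but int keys become strings and tuples become lists.
raw_round_trip = json.loads(json.dumps(tag_store))

# The explicit mapping keeps the field names visible to the frontend.
named_round_trip = json.loads(json.dumps(tag_store_to_json(tag_store)))
```

With the named form a consumer reads `tag_store["0"]["lane"]` instead of relying on positional index `[1]`, which is the shape the new `TestTagStoreSerialisation` tests assert.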
+22 -5
monitor/snapshot.py
··· 22 22 pe_id: int 23 23 iram: dict[int, Instruction] 24 24 frames: tuple[tuple[FrameSlotValue, ...], ...] 25 - tag_store: dict[int, int] 26 - presence: tuple[tuple[bool, ...], ...] 27 - port_store: tuple[tuple[Port | None, ...], ...] 25 + tag_store: dict[int, tuple[int, int]] 26 + presence: tuple[tuple[tuple[bool, ...], ...], ...] 27 + port_store: tuple[tuple[tuple[Port | None, ...], ...], ...] 28 + match_data: tuple[tuple[tuple[int | None, ...], ...], ...] 28 29 free_frames: tuple[int, ...] 30 + lane_count: int 29 31 input_queue: tuple[Token, ...] 30 32 output_log: tuple[Token, ...] 31 33 ··· 80 82 ) 81 83 tag_store = dict(pe.tag_store) 82 84 presence = tuple( 83 - tuple(p for p in frame_presence) 85 + tuple( 86 + tuple(lane_val for lane_val in offset_lanes) 87 + for offset_lanes in frame_presence 88 + ) 84 89 for frame_presence in pe.presence 85 90 ) 86 91 port_store = tuple( 87 - tuple(p for p in frame_ports) 92 + tuple( 93 + tuple(lane_val for lane_val in offset_lanes) 94 + for offset_lanes in frame_ports 95 + ) 88 96 for frame_ports in pe.port_store 89 97 ) 98 + match_data = tuple( 99 + tuple( 100 + tuple(lane_val for lane_val in offset_lanes) 101 + for offset_lanes in frame_match 102 + ) 103 + for frame_match in pe.match_data 104 + ) 90 105 free_frames = tuple(pe.free_frames) 91 106 92 107 pes[pe_id] = PESnapshot( ··· 96 111 tag_store=tag_store, 97 112 presence=presence, 98 113 port_store=port_store, 114 + match_data=match_data, 99 115 free_frames=free_frames, 116 + lane_count=pe.lane_count, 100 117 input_queue=tuple(pe.input_store.items), 101 118 output_log=tuple(pe.output_log), 102 119 )
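The three nested generator expressions in `capture()` all perform the same conversion: a 3D list of lists becomes a 3D tuple of tuples, so the snapshot cannot change when the PE mutates its state later. One way to express that conversion once is a small recursive helper. This is only a sketch of an alternative; `freeze` is a hypothetical name and the commit itself spells the comprehensions out inline.

```python
def freeze(value):
    """Recursively convert nested lists into nested tuples; leave other values as-is."""
    if isinstance(value, list):
        return tuple(freeze(item) for item in value)
    return value


# Live PE state: [frame_id][match_slot][lane], here 2 frames x 1 slot x 2 lanes.
live_presence = [[[False, True]], [[False, False]]]
snapshot_presence = freeze(live_presence)

# Later mutation of the live state must not show up in the snapshot.
live_presence[0][0][0] = True
```

The helper works for `presence`, `port_store`, and `match_data` alike because it only inspects list nesting and passes leaf values (bools, `Port` members, ints, `None`) through untouched.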
+11
tests/test_codegen_frames.py
··· 324 324 assert len(alloc_tokens) == 2 325 325 assert all(t.op == FrameOp.ALLOC for t in alloc_tokens) 326 326 327 + # Verify that PE configs have initial_tag_store with tuple values 328 + assert len(result.pe_configs) == 1 329 + pe_cfg = result.pe_configs[0] 330 + assert pe_cfg.initial_tag_store, "initial_tag_store should not be empty for PE with activations" 331 + for act_id, val in pe_cfg.initial_tag_store.items(): 332 + assert isinstance(val, tuple) and len(val) == 2, \ 333 + f"initial_tag_store[{act_id}] should be (frame_id, lane) tuple, got {val}" 334 + frame_id, lane = val 335 + assert isinstance(frame_id, int), f"frame_id should be int, got {type(frame_id)}" 336 + assert isinstance(lane, int), f"lane should be int, got {type(lane)}" 337 + 327 338 328 339 class TestTask3SeedTokens: 329 340 """Task 3: Seed token generation with act_id."""
+110
tests/test_monitor_graph_json.py
··· 46 46 tag_store={}, 47 47 presence=(), 48 48 port_store=(), 49 + match_data=(), 49 50 free_frames=(), 51 + lane_count=4, 50 52 input_queue=(), 51 53 output_log=(), 52 54 ) ··· 95 97 tag_store={}, 96 98 presence=(), 97 99 port_store=(), 100 + match_data=(), 98 101 free_frames=(), 102 + lane_count=4, 99 103 input_queue=(), 100 104 output_log=(), 101 105 ) ··· 141 145 tag_store={}, 142 146 presence=(), 143 147 port_store=(), 148 + match_data=(), 144 149 free_frames=(), 150 + lane_count=4, 145 151 input_queue=(), 146 152 output_log=(), 147 153 ) ··· 220 226 tag_store={}, 221 227 presence=(), 222 228 port_store=(), 229 + match_data=(), 223 230 free_frames=(), 231 + lane_count=4, 224 232 input_queue=(), 225 233 output_log=(), 226 234 ) ··· 261 269 tag_store={}, 262 270 presence=(), 263 271 port_store=(), 272 + match_data=(), 264 273 free_frames=(), 274 + lane_count=4, 265 275 input_queue=(), 266 276 output_log=(), 267 277 ) ··· 300 310 tag_store={}, 301 311 presence=(), 302 312 port_store=(), 313 + match_data=(), 303 314 free_frames=(), 315 + lane_count=4, 304 316 input_queue=(), 305 317 output_log=(), 306 318 ) ··· 472 484 assert result["finished"] is True 473 485 474 486 487 + class TestTagStoreSerialisation: 488 + """Test non-empty tag_store JSON serialization in PE state.""" 489 + 490 + def test_non_empty_tag_store_serialization(self): 491 + """Test that tag_store with non-empty mapping serializes correctly.""" 492 + node = IRNode( 493 + name="&add", 494 + opcode=ArithOp.ADD, 495 + pe=0, 496 + iram_offset=0, 497 + act_id=0, 498 + ) 499 + ir_graph = IRGraph(nodes={"&add": node}) 500 + 501 + inst = Instruction(opcode=ArithOp.ADD, output=OutputStyle.INHERIT, has_const=False, dest_count=2, wide=False, fref=0) 502 + 503 + # Create PESnapshot with non-empty tag_store: act_id 0 maps to frame 2, lane 1 504 + pe_snap = PESnapshot( 505 + pe_id=0, 506 + iram={0: inst}, 507 + frames=(), 508 + tag_store={0: (2, 1)}, # act_id=0 -> (frame_id=2, lane=1) 509 + presence=(), 510 + 
port_store=(), 511 + match_data=(), 512 + free_frames=(), 513 + lane_count=4, 514 + input_queue=(), 515 + output_log=(), 516 + ) 517 + snapshot = StateSnapshot( 518 + sim_time=0.0, 519 + next_time=1.0, 520 + pes={0: pe_snap}, 521 + sms={}, 522 + ) 523 + 524 + result = graph_loaded_json(ir_graph, snapshot) 525 + pe_state = result["state"]["pes"]["0"] 526 + 527 + # Verify tag_store is correctly serialized 528 + assert "tag_store" in pe_state 529 + assert "0" in pe_state["tag_store"] # act_id 0 should be a string key "0" 530 + assert pe_state["tag_store"]["0"]["frame_id"] == 2 531 + assert pe_state["tag_store"]["0"]["lane"] == 1 532 + assert pe_state["lane_count"] == 4 533 + 534 + def test_multiple_entries_tag_store_serialization(self): 535 + """Test tag_store with multiple act_id entries serializes correctly.""" 536 + ir_graph = IRGraph() 537 + 538 + inst = Instruction(opcode=ArithOp.ADD, output=OutputStyle.INHERIT, has_const=False, dest_count=2, wide=False, fref=0) 539 + 540 + # Create PESnapshot with multiple tag_store entries 541 + pe_snap = PESnapshot( 542 + pe_id=0, 543 + iram={0: inst}, 544 + frames=(), 545 + tag_store={ 546 + 0: (2, 1), 547 + 1: (3, 0), 548 + 5: (7, 2), 549 + }, 550 + presence=(), 551 + port_store=(), 552 + match_data=(), 553 + free_frames=(), 554 + lane_count=4, 555 + input_queue=(), 556 + output_log=(), 557 + ) 558 + snapshot = StateSnapshot( 559 + sim_time=0.0, 560 + next_time=1.0, 561 + pes={0: pe_snap}, 562 + sms={}, 563 + ) 564 + 565 + result = graph_loaded_json(ir_graph, snapshot) 566 + pe_state = result["state"]["pes"]["0"] 567 + 568 + # Verify all entries are correctly serialized 569 + assert "tag_store" in pe_state 570 + assert pe_state["tag_store"]["0"]["frame_id"] == 2 571 + assert pe_state["tag_store"]["0"]["lane"] == 1 572 + assert pe_state["tag_store"]["1"]["frame_id"] == 3 573 + assert pe_state["tag_store"]["1"]["lane"] == 0 574 + assert pe_state["tag_store"]["5"]["frame_id"] == 7 575 + assert pe_state["tag_store"]["5"]["lane"] 
== 2 576 + assert pe_state["lane_count"] == 4 577 + 578 + 475 579 class TestAC72_NewEventTypesSerialization: 476 580 """AC7.2: Verify new frame-based event types serialize correctly to JSON.""" 477 581 ··· 484 588 component="pe:0", 485 589 act_id=1, 486 590 frame_id=2, 591 + lane=0, 487 592 ) 488 593 489 594 ir_graph = IRGraph() ··· 508 613 assert "details" in event_json 509 614 assert event_json["details"]["act_id"] == 1 510 615 assert event_json["details"]["frame_id"] == 2 616 + assert event_json["details"]["lane"] == 0 511 617 512 618 def test_frame_freed_event_serialization(self): 513 619 """FrameFreed event should serialize with correct type and fields.""" ··· 518 624 component="pe:1", 519 625 act_id=3, 520 626 frame_id=4, 627 + lane=0, 628 + frame_freed=True, 521 629 ) 522 630 523 631 ir_graph = IRGraph() ··· 540 648 assert "details" in event_json 541 649 assert event_json["details"]["act_id"] == 3 542 650 assert event_json["details"]["frame_id"] == 4 651 + assert event_json["details"]["lane"] == 0 652 + assert event_json["details"]["frame_freed"] == True 543 653 544 654 def test_frame_slot_written_event_serialization(self): 545 655 """FrameSlotWritten event should serialize with correct type and fields."""
+8 -8
tests/test_network_routing.py
··· 173 173 """build_topology loads initial_tag_store into PE and removes frames from free_frames.""" 174 174 env = simpy.Environment() 175 175 initial_tag_store = { 176 - 0: 2, # act_id 0 → frame 2 177 - 1: 3, # act_id 1 → frame 3 176 + 0: (2, 0), # act_id 0 → frame 2, lane 0 177 + 1: (3, 0), # act_id 1 → frame 3, lane 0 178 178 } 179 179 pe_configs = [ 180 180 PECfg( ··· 187 187 system = build_topology(env, pe_configs, sm_configs) 188 188 189 189 pe = system.pes[0] 190 - assert pe.tag_store[0] == 2 191 - assert pe.tag_store[1] == 3 190 + assert pe.tag_store[0] == (2, 0) 191 + assert pe.tag_store[1] == (3, 0) 192 192 # Frames 2 and 3 should be removed from free_frames 193 193 assert 2 not in pe.free_frames 194 194 assert 3 not in pe.free_frames ··· 205 205 2: [7, 8, 9], 206 206 } 207 207 initial_tag_store = { 208 - 0: 0, # act_id 0 uses frame 0 209 - 1: 1, # act_id 1 uses frame 1 208 + 0: (0, 0), # act_id 0 uses frame 0, lane 0 209 + 1: (1, 0), # act_id 1 uses frame 1, lane 0 210 210 } 211 211 pe_configs = [ 212 212 PECfg( ··· 225 225 assert pe.frames[1][:3] == [4, 5, 6] 226 226 assert pe.frames[2][:3] == [7, 8, 9] 227 227 # Verify tag_store loaded 228 - assert pe.tag_store[0] == 0 229 - assert pe.tag_store[1] == 1 228 + assert pe.tag_store[0] == (0, 0) 229 + assert pe.tag_store[1] == (1, 0) 230 230 # Verify free_frames reflects allocations 231 231 assert 0 not in pe.free_frames 232 232 assert 1 not in pe.free_frames
+22 -22
tests/test_pe.py
··· 98 98 pe_id=0, 99 99 iram={0: pass_inst}, 100 100 initial_frames={0: {8: dest}}, 101 - initial_tag_store={0: 0}, 101 + initial_tag_store={0: (0, 0)}, 102 102 ) 103 103 104 104 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 141 141 pe_id=0, 142 142 iram={0: add_inst}, 143 143 initial_frames={0: {8: dest}}, 144 - initial_tag_store={0: 0}, 144 + initial_tag_store={0: (0, 0)}, 145 145 ) 146 146 147 147 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 156 156 # No output from first token 157 157 assert len(output_store.items) == 0 158 158 # Matching store should have the operand 159 - frame_id = pe.tag_store[0] 160 - assert pe.presence[frame_id][0] is True 159 + frame_id, _lane = pe.tag_store[0] 160 + assert pe.presence[frame_id][0][0] is True 161 161 162 162 def test_second_dyadic_fires_left_first(self): 163 163 """AC1.3: Second dyadic token fires when partner found (L then R).""" ··· 181 181 pe_id=0, 182 182 iram={0: add_inst}, 183 183 initial_frames={0: {8: dest}}, 184 - initial_tag_store={0: 0}, 184 + initial_tag_store={0: (0, 0)}, 185 185 ) 186 186 187 187 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 219 219 pe_id=0, 220 220 iram={0: add_inst}, 221 221 initial_frames={0: {8: dest}}, 222 - initial_tag_store={0: 0}, 222 + initial_tag_store={0: (0, 0)}, 223 223 ) 224 224 225 225 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 263 263 pe_id=0, 264 264 iram={0: add_inst}, 265 265 initial_frames={0: {8: dest}}, 266 - initial_tag_store={0: 0}, 266 + initial_tag_store={0: (0, 0)}, 267 267 ) 268 268 269 269 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 310 310 pe_id=0, 311 311 iram={0: add_inst}, 312 312 initial_frames={0: {8: dest_l, 9: dest_r}}, 313 - initial_tag_store={0: 0}, 313 + initial_tag_store={0: (0, 0)}, 314 314 ) 315 315 316 316 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 361 361 pe_id=0, 362 362 iram={0: sweq_inst}, 363 363 initial_frames={0: {8: dest_l, 9: dest_r}}, 364 - 
initial_tag_store={0: 0}, 364 + initial_tag_store={0: (0, 0)}, 365 365 ) 366 366 367 367 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 414 414 pe_id=0, 415 415 iram={0: sweq_inst}, 416 416 initial_frames={0: {8: dest_l, 9: dest_r}}, 417 - initial_tag_store={0: 0}, 417 + initial_tag_store={0: (0, 0)}, 418 418 ) 419 419 420 420 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 462 462 config = PEConfig( 463 463 pe_id=0, 464 464 iram={0: free_inst}, 465 - initial_tag_store={0: 0}, 465 + initial_tag_store={0: (0, 0)}, 466 466 ) 467 467 468 468 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 498 498 pe_id=0, 499 499 iram={0: gate_inst}, 500 500 initial_frames={0: {8: dest}}, 501 - initial_tag_store={0: 0}, 501 + initial_tag_store={0: (0, 0)}, 502 502 ) 503 503 504 504 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 536 536 pe_id=0, 537 537 iram={0: gate_inst}, 538 538 initial_frames={0: {8: dest}}, 539 - initial_tag_store={0: 0}, 539 + initial_tag_store={0: (0, 0)}, 540 540 ) 541 541 542 542 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 601 601 pe_id=0, 602 602 iram={5: add_inst}, 603 603 initial_frames={0: {8: dest}}, 604 - initial_tag_store={1: 0}, 604 + initial_tag_store={1: (0, 0)}, 605 605 ) 606 606 607 607 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 620 620 inject_two_and_run(env, pe, token_l, token_r) 621 621 622 622 # After firing, matching store should be clear 623 - frame_id = pe.tag_store[token_l.act_id] 624 - assert pe.presence[frame_id][token_l.offset % pe.matchable_offsets] is False 623 + frame_id, _lane = pe.tag_store[token_l.act_id] 624 + assert pe.presence[frame_id][token_l.offset % pe.matchable_offsets][0] is False 625 625 626 626 627 627 class TestOutputTokenCountMatchesMode: ··· 644 644 config = PEConfig( 645 645 pe_id=0, 646 646 iram={0: free_inst}, 647 - initial_tag_store={0: 0}, 647 + initial_tag_store={0: (0, 0)}, 648 648 ) 649 649 650 650 pe = 
ProcessingElement(env=env, pe_id=0, config=config) ··· 687 687 pe_id=0, 688 688 iram={0: add_inst}, 689 689 initial_frames={0: {8: dest}}, 690 - initial_tag_store={0: 0}, 690 + initial_tag_store={0: (0, 0)}, 691 691 ) 692 692 693 693 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 734 734 pe_id=0, 735 735 iram={0: add_inst}, 736 736 initial_frames={0: {8: dest_l, 9: dest_r}}, 737 - initial_tag_store={0: 0}, 737 + initial_tag_store={0: (0, 0)}, 738 738 ) 739 739 740 740 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 784 784 pe_id=0, 785 785 iram={0: sweq_inst}, 786 786 initial_frames={0: {8: dest_l, 9: dest_r}}, 787 - initial_tag_store={0: 0}, 787 + initial_tag_store={0: (0, 0)}, 788 788 ) 789 789 790 790 pe = ProcessingElement(env=env, pe_id=0, config=config) ··· 835 835 pe_id=0, 836 836 iram={0: add_inst}, 837 837 initial_frames={0: {8: dest}}, 838 - initial_tag_store={0: 0}, 838 + initial_tag_store={0: (0, 0)}, 839 839 # Note: only act_id 0 is allocated; other act_ids will be invalid 840 840 ) 841 841 ··· 893 893 pe_id=0, 894 894 iram={0: add_inst, 1: inc_inst}, 895 895 initial_frames={0: {8: dest}}, 896 - initial_tag_store={0: 0}, 896 + initial_tag_store={0: (0, 0)}, 897 897 ) 898 898 899 899 pe = ProcessingElement(env=env, pe_id=0, config=config)
+9 -9
tests/test_pe_events.py
··· 82 82 pe_id=0, 83 83 iram={0: add_inst}, 84 84 initial_frames={0: {8: dest}}, 85 - initial_tag_store={0: 0}, 85 + initial_tag_store={0: (0, 0)}, 86 86 on_event=events.append, 87 87 ) 88 88 ··· 125 125 pe_id=0, 126 126 iram={0: add_inst}, 127 127 initial_frames={0: {8: dest}}, 128 - initial_tag_store={0: 0}, 128 + initial_tag_store={0: (0, 0)}, 129 129 on_event=events.append, 130 130 ) 131 131 ··· 168 168 pe_id=0, 169 169 iram={5: add_inst}, 170 170 initial_frames={0: {8: dest}}, 171 - initial_tag_store={1: 0}, 171 + initial_tag_store={1: (0, 0)}, 172 172 on_event=events.append, 173 173 ) 174 174 ··· 215 215 pe_id=0, 216 216 iram={0: add_inst}, 217 217 initial_frames={0: {8: dest}}, 218 - initial_tag_store={0: 0}, 218 + initial_tag_store={0: (0, 0)}, 219 219 on_event=events.append, 220 220 ) 221 221 ··· 258 258 pe_id=0, 259 259 iram={0: eq_inst}, 260 260 initial_frames={0: {8: dest}}, 261 - initial_tag_store={0: 0}, 261 + initial_tag_store={0: (0, 0)}, 262 262 on_event=events.append, 263 263 ) 264 264 ··· 305 305 pe_id=0, 306 306 iram={0: add_inst}, 307 307 initial_frames={0: {8: dest}}, 308 - initial_tag_store={0: 0}, 308 + initial_tag_store={0: (0, 0)}, 309 309 on_event=events.append, 310 310 ) 311 311 ··· 350 350 pe_id=0, 351 351 iram={0: add_inst}, 352 352 initial_frames={0: {8: dest_l, 9: dest_r}}, 353 - initial_tag_store={0: 0}, 353 + initial_tag_store={0: (0, 0)}, 354 354 on_event=events.append, 355 355 ) 356 356 ··· 396 396 pe_id=0, 397 397 iram={0: sweq_inst}, 398 398 initial_frames={0: {8: dest_l, 9: dest_r}}, 399 - initial_tag_store={0: 0}, 399 + initial_tag_store={0: (0, 0)}, 400 400 on_event=events.append, 401 401 ) 402 402 ··· 433 433 config = PEConfig( 434 434 pe_id=0, 435 435 iram={0: free_inst}, 436 - initial_tag_store={0: 0}, 436 + initial_tag_store={0: (0, 0)}, 437 437 on_event=events.append, 438 438 ) 439 439
+82 -27
tests/test_pe_frames.py
··· 96 96 frame_allocated = [e for e in events if isinstance(e, FrameAllocated)] 97 97 assert len(token_received) > 0 98 98 assert len(frame_allocated) > 0 99 - assert pe.tag_store[0] in range(pe.frame_count) 99 + assert pe.tag_store[0][0] in range(pe.frame_count) 100 + assert frame_allocated[0].lane == 0 100 101 101 102 def test_free_frame_control_token(self): 102 103 env = simpy.Environment() ··· 112 113 fct_alloc = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 113 114 inject_and_run(env, pe, fct_alloc) 114 115 115 - frame_id = pe.tag_store[0] 116 + frame_id, _lane = pe.tag_store[0] 116 117 117 118 # Now deallocate 118 119 fct_free = FrameControlToken(target=0, act_id=0, op=FrameOp.FREE, payload=0) ··· 121 122 # Should have FrameFreed event and tag_store should be cleared 122 123 frame_freed = [e for e in events if isinstance(e, FrameFreed)] 123 124 assert len(frame_freed) > 0 125 + assert frame_freed[0].lane == 0 126 + assert frame_freed[0].frame_freed == True 124 127 assert 0 not in pe.tag_store 125 128 assert frame_id in pe.free_frames 126 129 ··· 143 146 inject_and_run(env, pe, fct) 144 147 145 148 # Set up: install dyadic instruction at offset 0 146 - # Mode 0: no const, dest_count=1 149 + # Mode SINK: no output emission, just execution and matching verification 147 150 inst = Instruction( 148 151 opcode=ArithOp.ADD, 149 - output=OutputStyle.INHERIT, 152 + output=OutputStyle.SINK, 150 153 has_const=False, 151 - dest_count=1, 154 + dest_count=0, 152 155 wide=False, 153 156 fref=0, 154 157 ) ··· 162 165 port=Port.L, 163 166 token_kind=TokenKind.DYADIC, 164 167 ) 165 - pe.frames[pe.tag_store[0]][0] = dest 168 + pe.frames[pe.tag_store[0][0]][0] = dest 166 169 167 170 # Inject first dyadic token (port=L, data=5) 168 171 tok1 = DyadToken( ··· 214 217 # Allocate frame 215 218 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 216 219 inject_and_run(env, pe, fct) 217 - frame_id = pe.tag_store[0] 220 + frame_id, _lane = 
pe.tag_store[0] 218 221 219 222 # Set up instruction: mode 0 (no const, dest_count=1), fref=8 220 223 inst = Instruction( ··· 345 348 # Allocate frame 346 349 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 347 350 inject_and_run(env, pe, fct) 348 - frame_id = pe.tag_store[0] 351 + frame_id, _lane = pe.tag_store[0] 349 352 350 353 # Set up instruction: SINK output, mode 6 (no const, dest_count=0), fref=10 351 354 inst = Instruction( ··· 413 416 port=Port.L, 414 417 token_kind=TokenKind.MONADIC, 415 418 ) 416 - pe.frames[pe.tag_store[0]][0] = dest 419 + pe.frames[pe.tag_store[0][0]][0] = dest 417 420 418 421 # Wire route table 419 422 pe.route_table[0] = simpy.Store(env) ··· 487 490 pe.iram[6] = inst 488 491 489 492 # Write target PE and target act_id to frame slots 8 and 9 490 - frame_id = pe.tag_store[0] 493 + frame_id, _lane = pe.tag_store[0] 491 494 pe.frames[frame_id][8] = 1 # target PE 492 495 pe.frames[frame_id][9] = 2 # target act_id 493 496 ··· 505 508 frame_allocated = [e for e in pe_events if isinstance(e, FrameAllocated)] 506 509 assert len(frame_allocated) > 0 507 510 assert frame_allocated[0].act_id == 2 511 + assert frame_allocated[0].lane == 0 508 512 509 513 510 514 class TestFreeFrameOpcode: ··· 523 527 # Allocate frame 524 528 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 525 529 inject_and_run(env, pe, fct) 526 - frame_id = pe.tag_store[0] 530 + frame_id, _lane = pe.tag_store[0] 527 531 528 532 # Set up FREE_FRAME instruction 529 533 inst = Instruction( ··· 550 554 frame_freed = [e for e in events if isinstance(e, FrameFreed)] 551 555 assert len(frame_freed) > 0 552 556 assert frame_freed[0].frame_id == frame_id 557 + assert frame_freed[0].lane == 0 558 + assert frame_freed[0].frame_freed == True 553 559 554 560 # tag_store should be cleared 555 561 assert 0 not in pe.tag_store ··· 562 568 assert len(emitted) == 0 563 569 564 570 571 + class TestFreeLane: 572 + """AC3.8: FREE_LANE deallocates 
lane, potentially returning frame to free list.""" 573 + 574 + def test_free_lane_on_last_lane_returns_frame(self): 575 + """When FREE_LANE is called on the last remaining lane, frame should be returned.""" 576 + env = simpy.Environment() 577 + events = [] 578 + config = PEConfig(frame_count=4, lane_count=2, on_event=events.append) 579 + pe = ProcessingElement( 580 + env=env, 581 + pe_id=0, 582 + config=config, 583 + ) 584 + 585 + # Allocate a frame with act_id=1 (gets lane 0) 586 + fct_alloc1 = FrameControlToken(target=0, act_id=1, op=FrameOp.ALLOC, payload=0) 587 + inject_and_run(env, pe, fct_alloc1) 588 + 589 + frame_id, lane1 = pe.tag_store[1] 590 + assert lane1 == 0 591 + 592 + # Allocate shared child with act_id=2 (gets lane 1) 593 + fct_alloc_shared = FrameControlToken(target=0, act_id=2, op=FrameOp.ALLOC_SHARED, payload=1) 594 + inject_and_run(env, pe, fct_alloc_shared) 595 + 596 + frame_id2, lane2 = pe.tag_store[2] 597 + assert frame_id2 == frame_id 598 + assert lane2 == 1 599 + 600 + # Now FREE_LANE the child (act_id=2) — should not return frame (still in use) 601 + fct_free_lane_child = FrameControlToken(target=0, act_id=2, op=FrameOp.FREE_LANE, payload=0) 602 + inject_and_run(env, pe, fct_free_lane_child) 603 + 604 + # Lane should be freed, frame still in use 605 + assert 2 not in pe.tag_store 606 + assert frame_id in pe.lane_free or frame_id not in [fid for fid, _ in pe.tag_store.values()] 607 + frame_freed_child = [e for e in events if isinstance(e, FrameFreed) and e.act_id == 2] 608 + assert len(frame_freed_child) > 0 609 + assert frame_freed_child[0].frame_freed == False # Lane freed, not frame 610 + 611 + # Now FREE_LANE the parent (act_id=1) — this is the last lane, should return frame 612 + fct_free_lane_parent = FrameControlToken(target=0, act_id=1, op=FrameOp.FREE_LANE, payload=0) 613 + inject_and_run(env, pe, fct_free_lane_parent) 614 + 615 + # Frame should now be in free_frames 616 + assert 1 not in pe.tag_store 617 + assert frame_id in 
pe.free_frames 618 + frame_freed_parent = [e for e in events if isinstance(e, FrameFreed) and e.act_id == 1] 619 + assert len(frame_freed_parent) > 0 620 + assert frame_freed_parent[0].frame_freed == True # Last lane, frame returned 621 + 622 + 565 623 class TestPELocalWriteToken: 566 624 """AC3.9: PELocalWriteToken with is_dest=True decodes data to FrameDest.""" 567 625 ··· 613 671 # Allocate frame 614 672 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 615 673 inject_and_run(env, pe, fct) 616 - frame_id = pe.tag_store[0] 674 + frame_id, _lane = pe.tag_store[0] 617 675 618 676 # Write FrameDest to frame slot 15, is_dest=True 619 677 dest = FrameDest( ··· 682 740 assert len(rejected) > 0 683 741 assert rejected[0].token == tok 684 742 685 - # Should not crash 686 - assert True 687 - 688 743 689 744 class TestDualDestInherit: 690 745 """IMPORTANT 2: dest_count=2 non-switch: verify both destinations receive same result.""" ··· 702 757 # Allocate frame 703 758 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 704 759 inject_and_run(env, pe, fct) 705 - frame_id = pe.tag_store[0] 760 + frame_id, _lane = pe.tag_store[0] 706 761 707 762 # Set up instruction: mode 2 (no const, dest_count=2), fref=8 708 763 inst = Instruction( ··· 766 821 # Allocate frame 767 822 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 768 823 inject_and_run(env, pe, fct) 769 - frame_id = pe.tag_store[0] 824 + frame_id, _lane = pe.tag_store[0] 770 825 771 826 # Set up SWEQ instruction with dest_count=2, fref=8 772 827 inst = Instruction( ··· 829 884 # Allocate frame 830 885 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 831 886 inject_and_run(env, pe, fct) 832 - frame_id = pe.tag_store[0] 887 + frame_id, _lane = pe.tag_store[0] 833 888 834 889 # Set up SWEQ instruction with dest_count=2, fref=8 835 890 inst = Instruction( ··· 897 952 # Allocate frame 898 953 fct = FrameControlToken(target=0, act_id=0, 
op=FrameOp.ALLOC, payload=0) 899 954 inject_and_run(env, pe, fct) 900 - frame_id = pe.tag_store[0] 955 + frame_id, _lane = pe.tag_store[0] 901 956 902 957 # Set up GATE instruction 903 958 inst = Instruction( ··· 946 1001 # Allocate frame 947 1002 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 948 1003 inject_and_run(env, pe, fct) 949 - frame_id = pe.tag_store[0] 1004 + frame_id, _lane = pe.tag_store[0] 950 1005 951 1006 # Set up GATE instruction 952 1007 inst = Instruction( ··· 999 1054 # Allocate frame 1000 1055 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 1001 1056 inject_and_run(env, pe, fct) 1002 - frame_id = pe.tag_store[0] 1057 + frame_id, _lane = pe.tag_store[0] 1003 1058 1004 1059 # Set up SM READ instruction with const (mode 1: const, dest with return route), fref=8 1005 1060 # Const slot contains the SM target, dest slot contains return route ··· 1090 1145 # Allocate frame 1091 1146 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 1092 1147 inject_and_run(env, pe, fct) 1093 - frame_id = pe.tag_store[0] 1148 + frame_id, _lane = pe.tag_store[0] 1094 1149 1095 1150 # Set up dyadic instruction 1096 1151 inst = Instruction( ··· 1172 1227 # Allocate frame 1173 1228 fct = FrameControlToken(target=0, act_id=0, op=FrameOp.ALLOC, payload=0) 1174 1229 inject_and_run(env, pe, fct) 1175 - frame_id = pe.tag_store[0] 1230 + frame_id, _lane = pe.tag_store[0] 1176 1231 1177 1232 # Set up monadic instruction 1178 1233 inst = Instruction( ··· 1282 1337 # Allocate frame 1283 1338 fct = FrameControlToken(target=0, act_id=5, op=FrameOp.ALLOC, payload=0) 1284 1339 inject_and_run(env, pe, fct) 1285 - frame_id = pe.tag_store[5] 1340 + frame_id, _lane = pe.tag_store[5] 1286 1341 1287 1342 # Set up EXTRACT_TAG instruction 1288 1343 inst = Instruction( ··· 1354 1409 # Allocate frame 1355 1410 fct = FrameControlToken(target=0, act_id=7, op=FrameOp.ALLOC, payload=0) 1356 1411 inject_and_run(env, pe, fct) 
1357 - frame_id = pe.tag_store[7] 1412 + frame_id, _lane = pe.tag_store[7] 1358 1413 1359 1414 # Set up SM READ instruction (monadic in terms of PE pipeline) 1360 1415 inst = Instruction( ··· 1428 1483 # Allocate frame 1429 1484 fct = FrameControlToken(target=0, act_id=10, op=FrameOp.ALLOC, payload=0) 1430 1485 inject_and_run(env, pe, fct) 1431 - frame_id = pe.tag_store[10] 1486 + frame_id, _lane = pe.tag_store[10] 1432 1487 1433 1488 # Set up FREE_FRAME instruction 1434 1489 inst = Instruction( ··· 1491 1546 # Allocate frame 1492 1547 fct = FrameControlToken(target=0, act_id=12, op=FrameOp.ALLOC, payload=0) 1493 1548 inject_and_run(env, pe, fct) 1494 - frame_id = pe.tag_store[12] 1549 + frame_id, _lane = pe.tag_store[12] 1495 1550 1496 1551 # Set up ALLOC_REMOTE instruction 1497 1552 inst = Instruction(
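The lifecycle exercised by `TestFreeLane` (ALLOC takes lane 0, ALLOC_SHARED takes the lowest free lane, and a free returns the whole frame only when no other activation still uses it) can be modelled without the simulator. The sketch below is an assumption-laden toy: `LaneAllocator` and its methods are hypothetical names, the body of `_smart_free` is not shown in this part of the diff, and the frame selection order (`pop(0)`) is chosen here purely for determinism.

```python
from typing import Optional


class LaneAllocator:
    """Toy model of tag_store / lane_free / free_frames bookkeeping."""

    def __init__(self, frame_count: int = 4, lane_count: int = 2) -> None:
        self.lane_count = lane_count
        self.free_frames = list(range(frame_count))
        self.tag_store: dict[int, tuple[int, int]] = {}  # act_id -> (frame_id, lane)
        self.lane_free: dict[int, set[int]] = {}         # frame_id -> free lane IDs

    def alloc(self, act_id: int) -> Optional[tuple[int, int]]:
        """Fresh frame: the allocating activation always gets lane 0."""
        if not self.free_frames:
            return None
        frame_id = self.free_frames.pop(0)  # assumption: lowest frame first
        self.tag_store[act_id] = (frame_id, 0)
        self.lane_free[frame_id] = set(range(1, self.lane_count))
        return frame_id, 0

    def alloc_shared(self, act_id: int, parent_act_id: int) -> Optional[tuple[int, int]]:
        """Share the parent's frame on the lowest free lane; None models TokenRejected."""
        if act_id in self.tag_store or parent_act_id not in self.tag_store:
            return None
        frame_id, _ = self.tag_store[parent_act_id]
        free_lanes = self.lane_free.get(frame_id, set())
        if not free_lanes:
            return None
        lane = min(free_lanes)
        free_lanes.remove(lane)
        self.tag_store[act_id] = (frame_id, lane)
        return frame_id, lane

    def free(self, act_id: int) -> bool:
        """Release act_id; return True only if the whole frame went back to the pool."""
        frame_id, lane = self.tag_store.pop(act_id)
        still_used = any(fid == frame_id for fid, _ in self.tag_store.values())
        if still_used:
            self.lane_free[frame_id].add(lane)
            return False
        del self.lane_free[frame_id]
        self.free_frames.append(frame_id)
        return True


alloc = LaneAllocator(frame_count=4, lane_count=2)
parent = alloc.alloc(act_id=1)
child = alloc.alloc_shared(act_id=2, parent_act_id=1)
exhausted = alloc.alloc_shared(act_id=3, parent_act_id=1)  # both lanes taken
child_freed_frame = alloc.free(2)
parent_freed_frame = alloc.free(1)
```

Returning a boolean from `free` corresponds to the `frame_freed` field the tests check on the `FrameFreed` event: `False` when only a lane was released, `True` when the last user left and the frame returned to `free_frames`.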
+1165
tests/test_pe_lanes.py
+"""
+Lane-based PE rewrite tests.
+
+Verifies frame-lanes.AC3, frame-lanes.AC4, frame-lanes.AC5, and frame-lanes.AC8:
+- AC3.1: FrameOp.ALLOC_SHARED assigns next free lane from parent frame
+- AC3.2: FrameOp.FREE_LANE removes tag_store entry, clears lane data, keeps frame
+- AC3.3: FrameOp.FREE on shared frame returns lane if frame still in use
+- AC3.4: FrameOp.ALLOC unchanged: allocates fresh frame, assigns lane 0
+- AC3.5: FrameAllocated event gains lane field
+- AC3.6: ALLOC_SHARED with all lanes occupied emits TokenRejected
+- AC4: ALLOC_REMOTE reads fref+2 for data-driven ALLOC_SHARED vs ALLOC
+- AC5.1: FREE_FRAME opcode uses smart FREE behaviour on shared frames
+- AC8.1: Two act_ids sharing a frame have independent matching
+- AC8.2: ALLOC_SHARED with exhausted lanes emits TokenRejected
+- AC8.3: FREE on shared frame preserves other lanes' data
+- AC8.4: ALLOC_REMOTE emits ALLOC_SHARED when fref+2 is non-zero
+- AC8.5: ALLOC_REMOTE emits ALLOC when fref+2 is zero (backwards compatible)
+- AC8.6: Full loop pipelining scenario: two iterations concurrent on different lanes
+"""
+
+import pytest
+import simpy
+
+from cm_inst import (
+    ArithOp, FrameDest, FrameOp, Instruction, Port, TokenKind, OutputStyle,
+    RoutingOp,
+)
+from emu.events import (
+    FrameAllocated, FrameFreed, TokenReceived, TokenRejected, Matched, Emitted,
+)
+from emu.pe import ProcessingElement
+from emu.types import PEConfig
+from tokens import DyadToken, FrameControlToken
+
+
+def inject_and_run(env, pe, token):
+    """Helper: inject token and run simulation."""
+    def _put():
+        yield pe.input_store.put(token)
+    env.process(_put())
+    env.run()
+
+
+class TestAllocShared:
+    """AC3.1: ALLOC_SHARED assigns next free lane from parent frame."""
+
+    def test_alloc_shared_basic(self):
+        """Parent allocates frame, child allocates shared lane."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+
+        parent_frame_id, parent_lane = pe.tag_store[0]
+        assert parent_lane == 0, "Parent should allocate lane 0"
+
+        # Child ALLOC_SHARED with parent_act_id=0
+        fct_child = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child)
+
+        child_frame_id, child_lane = pe.tag_store[1]
+        assert child_frame_id == parent_frame_id, "Child should share parent's frame"
+        assert child_lane == 1, "Child should allocate lane 1"
+        assert child_lane != parent_lane, "Child lane should differ from parent"
+
+        # Verify FrameAllocated event for child
+        frame_allocated = [e for e in events if isinstance(e, FrameAllocated)]
+        assert len(frame_allocated) >= 2, "Should have 2 FrameAllocated events"
+        assert frame_allocated[0].lane == 0, "Parent allocated lane 0"
+        assert frame_allocated[1].lane == 1, "Child allocated lane 1"
+
+    def test_alloc_shared_multiple_lanes(self):
+        """Multiple children allocate different lanes from same parent frame."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+        parent_frame_id, _parent_lane = pe.tag_store[0]
+
+        # Child 1 ALLOC_SHARED
+        fct_child1 = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child1)
+        _child1_frame_id, child1_lane = pe.tag_store[1]
+
+        # Child 2 ALLOC_SHARED
+        fct_child2 = FrameControlToken(
+            target=0, act_id=2, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child2)
+        _child2_frame_id, child2_lane = pe.tag_store[2]
+
+        # All should share same frame
+        assert pe.tag_store[0][0] == parent_frame_id
+        assert pe.tag_store[1][0] == parent_frame_id
+        assert pe.tag_store[2][0] == parent_frame_id
+
+        # Lanes should differ: 0, 1, 2
+        assert child1_lane != 0, "Child1 lane should not be 0"
+        assert child2_lane != 0, "Child2 lane should not be 0"
+        assert child1_lane != child2_lane, "Child1 and child2 lanes should differ"
+
+    def test_alloc_shared_invalid_parent(self):
+        """ALLOC_SHARED with non-existent parent emits TokenRejected."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Try ALLOC_SHARED with non-existent parent_act_id=999
+        fct = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC_SHARED, payload=999
+        )
+        inject_and_run(env, pe, fct)
+
+        rejected = [e for e in events if isinstance(e, TokenRejected)]
+        assert len(rejected) > 0, "Should have TokenRejected event"
+        assert "not in tag store" in rejected[0].reason, "Reason should mention tag_store"
+
+        # Parent should not be in tag_store
+        assert 999 not in pe.tag_store
+
+    def test_alloc_shared_self_referential_guard(self):
+        """ALLOC_SHARED with act_id already in tag_store emits TokenRejected."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # First ALLOC to establish act_id=0 in tag_store
+        fct_alloc = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_alloc)
+        assert 0 in pe.tag_store, "act_id=0 should be in tag_store after ALLOC"
+        frame_id_0, lane_0 = pe.tag_store[0]
+
+        # Second ALLOC to establish act_id=1 as a valid parent
+        fct_alloc_parent = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_alloc_parent)
+        assert 1 in pe.tag_store, "act_id=1 should be in tag_store after ALLOC"
+
+        # Now try ALLOC_SHARED with act_id=0 and payload=1 (parent_act_id=1)
+        # This should be rejected because act_id=0 already exists
+        events.clear()
+        fct_shared = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC_SHARED, payload=1
+        )
+        inject_and_run(env, pe, fct_shared)
+
+        rejected = [e for e in events if isinstance(e, TokenRejected)]
+        assert len(rejected) > 0, "Should have TokenRejected event"
+        assert "already in tag store" in rejected[0].reason, "Reason should mention already in tag store"
+
+        # Frame and lane should be unchanged
+        assert pe.tag_store[0] == (frame_id_0, lane_0), "act_id=0 state should be unchanged"
+
+
+class TestLaneExhaustion:
+    """AC3.6, AC8.2: Lane exhaustion and TokenRejected."""
+
+    def test_alloc_shared_exhausts_all_lanes(self):
+        """Allocate all lanes, then ALLOC_SHARED fails with TokenRejected."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC uses lane 0
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+
+        # Allocate lanes 1, 2, 3
+        for i in range(1, 4):
+            fct = FrameControlToken(
+                target=0, act_id=i, op=FrameOp.ALLOC_SHARED, payload=0
+            )
+            inject_and_run(env, pe, fct)
+            assert i in pe.tag_store, f"Child {i} should be allocated"
+
+        # Try to allocate one more (all lanes exhausted)
+        fct_fail = FrameControlToken(
+            target=0, act_id=4, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_fail)
+
+        rejected = [e for e in events if isinstance(e, TokenRejected)]
+        assert len(rejected) > 0, "Should have TokenRejected event"
+        assert "no free lanes" in rejected[0].reason, "Reason should be 'no free lanes'"
+
+        # act_id=4 should not be in tag_store
+        assert 4 not in pe.tag_store, "Failed allocation should not add to tag_store"
+
+    def test_lane_exhaustion_with_multiple_frames(self):
+        """Lane exhaustion is per-frame; different frames have independent lanes."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Frame 1: Parent 0 allocates lane 0
+        fct1 = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct1)
+        frame1_id, _lane = pe.tag_store[0]
+
+        # Frame 2: Parent 10 allocates lane 0
+        fct2 = FrameControlToken(
+            target=0, act_id=10, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct2)
+        frame2_id, _lane = pe.tag_store[10]
+
+        assert frame1_id != frame2_id, "Should allocate different frames"
+
+        # Frame 1: Exhaust all lanes
+        for i in range(1, 4):
+            fct = FrameControlToken(
+                target=0, act_id=i, op=FrameOp.ALLOC_SHARED, payload=0
+            )
+            inject_and_run(env, pe, fct)
+
+        # Frame 2: Can still allocate more lanes (independent)
+        for i in range(11, 14):
+            fct = FrameControlToken(
+                target=0, act_id=i, op=FrameOp.ALLOC_SHARED, payload=10
+            )
+            inject_and_run(env, pe, fct)
+            assert i in pe.tag_store, f"Frame2 child {i} should be allocated"
+
+
+class TestFreeLane:
+    """AC3.2: FREE_LANE clears lane data, keeps frame, returns lane to pool."""
+
+    def test_free_lane_basic(self):
+        """FREE_LANE removes act_id from tag_store, clears lane data, keeps frame."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+        parent_frame_id, _parent_lane = pe.tag_store[0]
+
+        # Child ALLOC_SHARED
+        fct_child = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child)
+        _child_frame_id, child_lane = pe.tag_store[1]
+
+        # FREE_LANE for child
+        fct_free = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.FREE_LANE, payload=0
+        )
+        inject_and_run(env, pe, fct_free)
+
+        # Child should be removed from tag_store
+        assert 1 not in pe.tag_store, "Child should be removed from tag_store"
+
+        # Parent should still be present
+        assert 0 in pe.tag_store, "Parent should still be in tag_store"
+
+        # Frame should NOT be in free_frames (still used by parent)
+        assert parent_frame_id not in pe.free_frames, "Frame should not be free"
+
+        # FrameFreed event should have frame_freed=False
+        frame_freed = [e for e in events if isinstance(e, FrameFreed)]
+        assert len(frame_freed) > 0, "Should have FrameFreed event"
+        assert frame_freed[-1].frame_freed == False, "frame_freed should be False"
+        assert frame_freed[-1].lane == child_lane, "Event should report correct lane"
+
+    def test_free_lane_returns_lane_to_pool(self):
+        """After FREE_LANE, freed lane can be reused by ALLOC_SHARED."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+        parent_frame_id, _parent_lane = pe.tag_store[0]
+
+        # Child 1 ALLOC_SHARED (lane 1)
+        fct_child1 = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child1)
+        _child1_frame_id, child1_lane = pe.tag_store[1]
+        assert child1_lane == 1
+
+        # FREE_LANE child 1
+        fct_free = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.FREE_LANE, payload=0
+        )
+        inject_and_run(env, pe, fct_free)
+
+        # Child 2 ALLOC_SHARED (should get lane 1 again)
+        fct_child2 = FrameControlToken(
+            target=0, act_id=2, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child2)
+        _child2_frame_id, child2_lane = pe.tag_store[2]
+
+        # Lane 1 should be reused for child 2
+        assert child2_lane == 1, "Freed lane 1 should be reused"
+
+
+class TestIndependentMatching:
+    """AC8.1: Two act_ids sharing a frame have independent matching."""
+
+    def test_independent_matching_same_offset(self):
+        """L operand for act_id 0 does not interfere with L for act_id 1."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(
+            frame_count=4, lane_count=4, matchable_offsets=4, on_event=events.append
+        )
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+        parent_frame_id, _parent_lane = pe.tag_store[0]
+
+        # Child ALLOC_SHARED
+        fct_child = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child)
+        _child_frame_id, child_lane = pe.tag_store[1]
+
+        # Install dyadic instruction at offset 0
+        inst = Instruction(
+            opcode=ArithOp.ADD,
+            output=OutputStyle.SINK,
+            has_const=False,
+            dest_count=0,
+            wide=False,
+            fref=0,
+        )
+        pe.iram[0] = inst
+
+        # Send L operand for act_id=0
+        tok_l_0 = DyadToken(
+            target=0, offset=0, act_id=0, data=5, port=Port.L
+        )
+        inject_and_run(env, pe, tok_l_0)
+
+        # Should have 1 TokenReceived, 0 Matched (waiting for R)
+        matched = [e for e in events if isinstance(e, Matched)]
+        assert len(matched) == 0, "Should not match yet (waiting for R)"
+
+        # Send L operand for act_id=1 at same offset
+        tok_l_1 = DyadToken(
+            target=0, offset=0, act_id=1, data=7, port=Port.L
+        )
+        inject_and_run(env, pe, tok_l_1)
+
+        # Should still have 0 Matched (both waiting for R)
+        matched = [e for e in events if isinstance(e, Matched)]
+        assert len(matched) == 0, "Both should be waiting for R"
+
+        # Send R for act_id=0
+        tok_r_0 = DyadToken(
+            target=0, offset=0, act_id=0, data=3, port=Port.R
+        )
+        inject_and_run(env, pe, tok_r_0)
+
+        # Should now have 1 Matched for act_id=0
+        matched = [e for e in events if isinstance(e, Matched)]
+        assert len(matched) == 1, "Should have 1 match for act_id=0"
+        assert matched[0].act_id == 0, "Match should be for act_id=0"
+        assert matched[0].left == 5, "Left should be 5"
+        assert matched[0].right == 3, "Right should be 3"
+
+        # Send R for act_id=1
+        tok_r_1 = DyadToken(
+            target=0, offset=0, act_id=1, data=2, port=Port.R
+        )
+        inject_and_run(env, pe, tok_r_1)
+
+        # Should now have 2 Matched
+        matched = [e for e in events if isinstance(e, Matched)]
+        assert len(matched) == 2, "Should have 2 matches total"
+        m1 = [m for m in matched if m.act_id == 1][0]
+        assert m1.left == 7, "act_id=1 left should be 7"
+        assert m1.right == 2, "act_id=1 right should be 2"
+
+    def test_independent_matching_different_offsets(self):
+        """Different offsets per lane maintain independence."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(
+            frame_count=4, lane_count=4, matchable_offsets=4, on_event=events.append
+        )
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+
+        # Child ALLOC_SHARED
+        fct_child = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child)
+
+        # Install dyadic instructions at offsets 0 and 1
+        inst0 = Instruction(
+            opcode=ArithOp.ADD, output=OutputStyle.SINK,
+            has_const=False, dest_count=0, wide=False, fref=0
+        )
+        inst1 = Instruction(
+            opcode=ArithOp.SUB, output=OutputStyle.SINK,
+            has_const=False, dest_count=0, wide=False, fref=0
+        )
+        pe.iram[0] = inst0
+        pe.iram[1] = inst1
+
+        # Send L for act_id=0 at offset 0
+        tok_l_0_off0 = DyadToken(
+            target=0, offset=0, act_id=0, data=10, port=Port.L
+        )
+        inject_and_run(env, pe, tok_l_0_off0)
+
+        # Send L for act_id=1 at offset 1
+        tok_l_1_off1 = DyadToken(
+            target=0, offset=1, act_id=1, data=20, port=Port.L
+        )
+        inject_and_run(env, pe, tok_l_1_off1)
+
+        # Neither should match yet
+        matched = [e for e in events if isinstance(e, Matched)]
+        assert len(matched) == 0, "No matches yet"
+
+        # Send R for act_id=0 at offset 0
+        tok_r_0_off0 = DyadToken(
+            target=0, offset=0, act_id=0, data=5, port=Port.R
+        )
+        inject_and_run(env, pe, tok_r_0_off0)
+
+        # Should match for offset 0
+        matched = [e for e in events if isinstance(e, Matched)]
+        assert len(matched) == 1, "Should have 1 match"
+        assert matched[0].offset == 0, "Match should be at offset 0"
+
+        # Send R for act_id=1 at offset 1
+        tok_r_1_off1 = DyadToken(
+            target=0, offset=1, act_id=1, data=15, port=Port.R
+        )
+        inject_and_run(env, pe, tok_r_1_off1)
+
+        # Should match for offset 1
+        matched = [e for e in events if isinstance(e, Matched)]
+        assert len(matched) == 2, "Should have 2 matches"
+        m1 = [m for m in matched if m.offset == 1][0]
+        assert m1.act_id == 1, "Offset 1 match should be act_id=1"
+
+
+class TestSmartFree:
+    """AC3.3, AC8.3: Smart FREE on shared frames preserves data and manages lanes."""
+
+    def test_free_on_shared_frame_preserves_other_lanes(self):
+        """FREE on act_id=0 when act_id=1 uses frame; lane 1 data preserved."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(
+            frame_count=4, lane_count=4, matchable_offsets=4, on_event=events.append
+        )
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+        parent_frame_id, _parent_lane = pe.tag_store[0]
+
+        # Child ALLOC_SHARED
+        fct_child = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child)
+        _child_frame_id, child_lane = pe.tag_store[1]
+
+        # Install instruction
+        inst = Instruction(
+            opcode=ArithOp.ADD, output=OutputStyle.SINK,
+            has_const=False, dest_count=0, wide=False, fref=0
+        )
+        pe.iram[0] = inst
+
+        # Store L operand on child's lane
+        tok_l_1 = DyadToken(
+            target=0, offset=0, act_id=1, data=7, port=Port.L
+        )
+        inject_and_run(env, pe, tok_l_1)
+
+        # Verify child's match slot has data
+        frame_id, lane = pe.tag_store[1]
+        assert pe.match_data[frame_id][0][lane] == 7, "Child lane should have L operand"
+        assert pe.presence[frame_id][0][lane] == True, "Child presence should be set"
+
+        # FREE parent
+        fct_free_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.FREE, payload=0
+        )
+        inject_and_run(env, pe, fct_free_parent)
+
+        # Parent should be removed, child should still be present
+        assert 0 not in pe.tag_store, "Parent should be removed"
+        assert 1 in pe.tag_store, "Child should still be present"
+
+        # Frame should NOT be in free_frames
+        assert parent_frame_id not in pe.free_frames, "Frame should not be free"
+
+        # Child's match data should be preserved
+        assert pe.match_data[frame_id][0][lane] == 7, "Child data should be preserved"
+        assert pe.presence[frame_id][0][lane] == True, "Child presence should be preserved"
+
+        # FrameFreed event should have frame_freed=False
+        frame_freed = [e for e in events if isinstance(e, FrameFreed)]
+        assert any(e.frame_freed == False for e in frame_freed), "Should have frame_freed=False"
+
+    def test_free_last_lane_returns_frame(self):
+        """FREE on last act_id using frame returns frame to free_frames."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+        parent_frame_id, _parent_lane = pe.tag_store[0]
+
+        # Child ALLOC_SHARED
+        fct_child = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child)
+
+        # FREE_LANE child
+        fct_free_child = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.FREE_LANE, payload=0
+        )
+        inject_and_run(env, pe, fct_free_child)
+
+        # Frame should still not be free (parent still using it)
+        assert parent_frame_id not in pe.free_frames, "Frame should not be free yet"
+
+        # FREE parent
+        fct_free_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.FREE, payload=0
+        )
+        inject_and_run(env, pe, fct_free_parent)
+
+        # Now frame should be free
+        assert parent_frame_id in pe.free_frames, "Frame should be free"
+
+        # tag_store should be empty
+        assert len(pe.tag_store) == 0, "tag_store should be empty"
+
+        # lane_free entry should be cleaned up
+        assert parent_frame_id not in pe.lane_free, "lane_free entry should be cleaned"
+
+        # FrameFreed event should have frame_freed=True
+        frame_freed = [e for e in events if isinstance(e, FrameFreed)]
+        assert any(e.frame_freed == True for e in frame_freed), "Should have frame_freed=True"
+
+    def test_alloc_unchanged_allocates_fresh_frame(self):
+        """Regular ALLOC still works: allocates fresh frame, lane 0."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # First ALLOC
+        fct1 = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct1)
+        frame_id_0, lane_0 = pe.tag_store[0]
+        assert lane_0 == 0, "First ALLOC should assign lane 0"
+
+        # Second ALLOC (different frame)
+        fct2 = FrameControlToken(
+            target=0, act_id=10, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct2)
+        frame_id_10, lane_10 = pe.tag_store[10]
+        assert lane_10 == 0, "Second ALLOC should assign lane 0"
+
+        # Frames should be different
+        assert frame_id_0 != frame_id_10, "Different ALLOC should get different frames"
+
+    def test_data_preservation_across_free_lanes(self):
+        """Match data on one lane not affected by FREE of another lane."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(
+            frame_count=4, lane_count=4, matchable_offsets=4, on_event=events.append
+        )
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Parent ALLOC
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_parent)
+        frame_id, _parent_lane = pe.tag_store[0]
+
+        # Child 1 ALLOC_SHARED
+        fct_child1 = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child1)
+        _frame_id_1, lane_1 = pe.tag_store[1]
+
+        # Child 2 ALLOC_SHARED
+        fct_child2 = FrameControlToken(
+            target=0, act_id=2, op=FrameOp.ALLOC_SHARED, payload=0
+        )
+        inject_and_run(env, pe, fct_child2)
+        _frame_id_2, lane_2 = pe.tag_store[2]
+
+        # Install instruction
+        inst = Instruction(
+            opcode=ArithOp.ADD, output=OutputStyle.SINK,
+            has_const=False, dest_count=0, wide=False, fref=0
+        )
+        pe.iram[0] = inst
+
+        # Store L operand on lane 1
+        tok_l_1 = DyadToken(
+            target=0, offset=0, act_id=1, data=7, port=Port.L
+        )
+        inject_and_run(env, pe, tok_l_1)
+
+        # Store L operand on lane 2
+        tok_l_2 = DyadToken(
+            target=0, offset=0, act_id=2, data=11, port=Port.L
+        )
+        inject_and_run(env, pe, tok_l_2)
+
+        # FREE lane 1
+        fct_free_1 = FrameControlToken(
+            target=0, act_id=1, op=FrameOp.FREE_LANE, payload=0
+        )
+        inject_and_run(env, pe, fct_free_1)
+
+        # Lane 2's data should be untouched
+        assert pe.match_data[frame_id][0][lane_2] == 11, "Lane 2 data should be preserved"
+        assert pe.presence[frame_id][0][lane_2] == True, "Lane 2 presence should be preserved"
+
+        # Lane 1 should be cleared
+        assert pe.match_data[frame_id][0][lane_1] is None, "Lane 1 data should be cleared"
+        assert pe.presence[frame_id][0][lane_1] == False, "Lane 1 presence should be cleared"
+
+
+class TestAllocRemoteDataDriven:
+    """AC8.4, AC8.5: ALLOC_REMOTE reads fref+2 for data-driven ALLOC_SHARED vs ALLOC."""
+
+    def test_alloc_remote_emits_alloc_shared_when_parent_nonzero(self):
+        """AC8.4: ALLOC_REMOTE emits ALLOC_SHARED when fref+2 is non-zero."""
+        env = simpy.Environment()
+        events = []
+        output_store = simpy.Store(env)
+
+        # PE0: source of ALLOC_REMOTE
+        config0 = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe0 = ProcessingElement(env=env, pe_id=0, config=config0)
+        pe0.route_table[1] = output_store  # Capture emitted token
+
+        # Allocate a frame for act_id=0 on PE0
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe0, fct_parent)
+        frame_id, _lane = pe0.tag_store[0]
+
+        # Set up ALLOC_REMOTE instruction with fref pointing to frame constants
+        # fref+0: target PE=1, fref+1: target act_id=5, fref+2: parent act_id=3
+        inst = Instruction(
+            opcode=RoutingOp.ALLOC_REMOTE,
+            output=OutputStyle.SINK,  # Not used for ALLOC_REMOTE
+            has_const=False,
+            dest_count=0,
+            wide=False,
+            fref=10,
+        )
+        pe0.iram[0] = inst
+
+        # Load frame slots with constants
+        pe0.frames[frame_id][10] = 1  # target PE
+        pe0.frames[frame_id][11] = 5  # target act_id
+        pe0.frames[frame_id][12] = 3  # parent act_id (non-zero = ALLOC_SHARED)
+
+        # Send a token to trigger ALLOC_REMOTE (built here as a DyadToken on port L)
+        tok = DyadToken(
+            target=0, offset=0, act_id=0, data=0, port=Port.L
+        )
+        inject_and_run(env, pe0, tok)
+
+        # Verify FrameControlToken was emitted with ALLOC_SHARED
+        assert len(output_store.items) > 0, "Should have emitted a token"
+        emitted = output_store.items[0]
+        assert isinstance(emitted, FrameControlToken), "Should emit FrameControlToken"
+        assert emitted.op == FrameOp.ALLOC_SHARED, "Should emit ALLOC_SHARED"
+        assert emitted.payload == 3, "Payload should be parent act_id=3"
+        assert emitted.target == 1, "Should target PE 1"
+        assert emitted.act_id == 5, "Should target act_id 5"
+
+    def test_alloc_remote_emits_alloc_when_parent_zero(self):
+        """AC8.5: ALLOC_REMOTE emits ALLOC when fref+2 is zero (backwards compatible)."""
+        env = simpy.Environment()
+        events = []
+        output_store = simpy.Store(env)
+
+        # PE0: source of ALLOC_REMOTE
+        config0 = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe0 = ProcessingElement(env=env, pe_id=0, config=config0)
+        pe0.route_table[1] = output_store  # Capture emitted token
+
+        # Allocate a frame for act_id=0 on PE0
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe0, fct_parent)
+        frame_id, _lane = pe0.tag_store[0]
+
+        # Set up ALLOC_REMOTE instruction
+        # fref+0: target PE=1, fref+1: target act_id=5, fref+2: parent act_id=0
+        inst = Instruction(
+            opcode=RoutingOp.ALLOC_REMOTE,
+            output=OutputStyle.SINK,
+            has_const=False,
+            dest_count=0,
+            wide=False,
+            fref=10,
+        )
+        pe0.iram[0] = inst
+
+        # Load frame slots with constants
+        pe0.frames[frame_id][10] = 1  # target PE
+        pe0.frames[frame_id][11] = 5  # target act_id
+        pe0.frames[frame_id][12] = 0  # parent act_id (zero = ALLOC)
+
+        # Send a token to trigger ALLOC_REMOTE (built here as a DyadToken on port L)
+        tok = DyadToken(
+            target=0, offset=0, act_id=0, data=0, port=Port.L
+        )
+        inject_and_run(env, pe0, tok)
+
+        # Verify FrameControlToken was emitted with ALLOC (not ALLOC_SHARED)
+        assert len(output_store.items) > 0, "Should have emitted a token"
+        emitted = output_store.items[0]
+        assert isinstance(emitted, FrameControlToken), "Should emit FrameControlToken"
+        assert emitted.op == FrameOp.ALLOC, "Should emit ALLOC"
+        assert emitted.payload == 0, "Payload should be 0 for ALLOC"
+        assert emitted.target == 1, "Should target PE 1"
+        assert emitted.act_id == 5, "Should target act_id 5"
+
+    def test_alloc_remote_fref_plus_2_missing_defaults_to_zero(self):
+        """ALLOC_REMOTE gracefully handles fref+2 outside frame bounds (defaults to 0)."""
+        env = simpy.Environment()
+        events = []
+        output_store = simpy.Store(env)
+
+        # PE0: source of ALLOC_REMOTE
+        config0 = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe0 = ProcessingElement(env=env, pe_id=0, config=config0)
+        pe0.route_table[1] = output_store
+
+        # Allocate frame
+        fct_parent = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe0, fct_parent)
+        frame_id, _lane = pe0.tag_store[0]
+
+        # Set up ALLOC_REMOTE with fref pointing near end of frame
+        inst = Instruction(
+            opcode=RoutingOp.ALLOC_REMOTE,
+            output=OutputStyle.SINK,
+            has_const=False,
+            dest_count=0,
+            wide=False,
+            fref=62,  # frame_slots defaults to 64, so fref+2=64 is outside
+        )
+        pe0.iram[0] = inst
+
+        # Load only fref+0 and fref+1 (fref+2 is beyond frame bounds)
+        pe0.frames[frame_id][62] = 1
+        pe0.frames[frame_id][63] = 7
+
+        # Send a token to trigger ALLOC_REMOTE (built here as a DyadToken on port L)
+        tok = DyadToken(
+            target=0, offset=0, act_id=0, data=0, port=Port.L
+        )
+        inject_and_run(env, pe0, tok)
+
+        # Should emit ALLOC (not ALLOC_SHARED) because fref+2 is missing/falsy
+        assert len(output_store.items) > 0, "Should have emitted a token"
+        emitted = output_store.items[0]
+        assert emitted.op == FrameOp.ALLOC, "Should emit ALLOC when fref+2 is missing"
+
+
+class TestFreeFrameOpcode:
+    """AC5.1: FREE_FRAME opcode uses smart FREE behaviour on shared frames."""
+
+    def test_free_frame_opcode_shared_frame_partial_free(self):
+        """FREE_FRAME smart free: partial frame free when other lanes remain."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Pre-allocate frame with two act_ids on different lanes
+        # This simulates ALLOC for act_id=0 and ALLOC_SHARED for act_id=1
+        frame_id = 0
+        pe.frames[frame_id] = [None] * pe.frame_slots
+        pe.tag_store[0] = (frame_id, 0)  # act_id=0 on lane 0
+        pe.tag_store[1] = (frame_id, 1)  # act_id=1 on lane 1
+        pe.lane_free[frame_id] = {2, 3}  # Lanes 2 and 3 are free
+        # Remove frame_id from free_frames (it's in use)
+        if frame_id in pe.free_frames:
+            pe.free_frames.remove(frame_id)
+
+        # Install FREE_FRAME instruction
+        inst = Instruction(
+            opcode=RoutingOp.FREE_FRAME,
+            output=OutputStyle.SINK,
+            has_const=False,
+            dest_count=0,
+            wide=False,
+            fref=0,
+        )
+        pe.iram[0] = inst
+
+        # Send a token for act_id=0 to trigger FREE_FRAME (built here as a DyadToken on port L)
+        tok = DyadToken(
+            target=0, offset=0, act_id=0, data=0, port=Port.L
+        )
+        inject_and_run(env, pe, tok)
+
+        # Verify act_id=0 is removed from tag_store
+        assert 0 not in pe.tag_store, "act_id=0 should be removed from tag_store"
+
+        # Verify act_id=1 is still in tag_store
+        assert 1 in pe.tag_store, "act_id=1 should still be in tag_store"
+
+        # Verify frame is NOT returned to free_frames (still in use by act_id=1)
+        assert frame_id not in pe.free_frames, "Frame should not be in free_frames"
+
+        # Verify FrameFreed event has frame_freed=False
+        frame_freed = [e for e in events if isinstance(e, FrameFreed)]
+        assert any(e.frame_freed == False for e in frame_freed), \
+            "Should have FrameFreed event with frame_freed=False"
+        last_frame_freed = [e for e in frame_freed if e.act_id == 0][-1]
+        assert last_frame_freed.frame_freed == False, "Frame should not be marked as freed"
+
+    def test_free_frame_opcode_shared_frame_full_free(self):
+        """FREE_FRAME smart free: full frame free when last lane is freed."""
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(frame_count=4, lane_count=4, on_event=events.append)
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # Pre-allocate frame with two act_ids
+        frame_id = 0
+        pe.frames[frame_id] = [None] * pe.frame_slots
+        pe.tag_store[0] = (frame_id, 0)  # act_id=0 on lane 0
+        pe.tag_store[1] = (frame_id, 1)  # act_id=1 on lane 1
+        pe.lane_free[frame_id] = {2, 3}
+        if frame_id in pe.free_frames:
+            pe.free_frames.remove(frame_id)
+
+        # Install FREE_FRAME instruction
+        inst = Instruction(
+            opcode=RoutingOp.FREE_FRAME,
+            output=OutputStyle.SINK,
+            has_const=False,
+            dest_count=0,
+            wide=False,
+            fref=0,
+        )
+        pe.iram[0] = inst
+
+        # First: free act_id=0
+        tok0 = DyadToken(
+            target=0, offset=0, act_id=0, data=0, port=Port.L
+        )
+        inject_and_run(env, pe, tok0)
+
+        # Verify frame still not free
+        assert frame_id not in pe.free_frames, "Frame should not be free after first FREE_FRAME"
+        assert 1 in pe.tag_store, "act_id=1 should still be present"
+
+        # Second: free act_id=1 (last lane on frame)
+        tok1 = DyadToken(
+            target=0, offset=0, act_id=1, data=0, port=Port.L
+        )
+        inject_and_run(env, pe, tok1)
+
+        # Verify frame is now freed
+        assert frame_id in pe.free_frames, "Frame should be in free_frames after last FREE_FRAME"
+        assert 1 not in pe.tag_store, "act_id=1 should be removed from tag_store"
+
+        # Verify tag_store is empty
+        assert len(pe.tag_store) == 0, "tag_store should be empty"
+
+        # Verify lane_free is cleaned up
+        assert frame_id not in pe.lane_free, "lane_free entry should be deleted"
+
+        # Verify FrameFreed event has frame_freed=True
+        frame_freed = [e for e in events if isinstance(e, FrameFreed)]
+        last_frame_freed = [e for e in frame_freed if e.act_id == 1][-1]
+        assert last_frame_freed.frame_freed == True, \
+            "Last FREE_FRAME should emit FrameFreed with frame_freed=True"
+
+
+class TestLoopPipelining:
+    """AC8.6: Full loop pipelining integration test with multiple lanes."""
+
+    def test_full_loop_pipelining_scenario(self):
+        """
+        Complete loop pipelining lifecycle: two iterations of a dyadic instruction
+        running concurrently on different lanes, both producing correct results.
+
+        Simulates:
+        1. ALLOC(act_id=0) → frame, lane 0
+        2. Setup: write destination to frame
+        3. Iteration 1: inject L and R DyadTokens for act_id=0
+        4. ALLOC_SHARED(act_id=1, parent=0) → same frame, lane 1
+        5. Iteration 2: inject L and R DyadTokens for act_id=1
+        6. Both iterations match independently, both produce correct results
+        7. FREE(act_id=0) → lane 0 freed, frame stays
+        8. FREE(act_id=1) → last lane, frame returned to free list
+        """
+        env = simpy.Environment()
+        events = []
+        config = PEConfig(
+            frame_count=4, lane_count=4, matchable_offsets=4,
+            on_event=events.append
+        )
+        pe = ProcessingElement(env=env, pe_id=0, config=config)
+
+        # 1. ALLOC(act_id=0) → frame, lane 0
+        fct_alloc_0 = FrameControlToken(
+            target=0, act_id=0, op=FrameOp.ALLOC, payload=0
+        )
+        inject_and_run(env, pe, fct_alloc_0)
+
+        # Verify act_id=0 is allocated
+        assert 0 in pe.tag_store, "act_id=0 should be in tag_store"
+        frame_id, lane_0 = pe.tag_store[0]
+        assert lane_0 == 0, "First ALLOC should assign lane 0"
+
+        # Verify FrameAllocated event for iteration 1
+        frame_allocated = [e for e in events if isinstance(e, FrameAllocated)]
+        assert len(frame_allocated) >= 1, "Should have FrameAllocated event"
+        assert frame_allocated[0].frame_id == frame_id, "Event should report correct frame_id"
+        assert frame_allocated[0].lane == 0, "Event should report lane 0"
+
+        # 2. Setup: write destination to frame at slot 8
+        dest = FrameDest(
+            target_pe=1, offset=0, act_id=0, port=Port.L,
+            token_kind=TokenKind.MONADIC
+        )
+        pe.frames[frame_id][8] = dest
+
+        # Set up route to capture output
+        pe.route_table[1] = simpy.Store(env)
+
+        # 3. Install ADD instruction at IRAM offset 0
+        inst = Instruction(
+            opcode=ArithOp.ADD,
+            output=OutputStyle.INHERIT,
+            has_const=False,
+            dest_count=1,
+            wide=False,
+            fref=8,
+        )
+        pe.iram[0] = inst
+
+        # 4.
ALLOC_SHARED(act_id=1, parent=0) → same frame, lane 1 1021 + fct_alloc_shared = FrameControlToken( 1022 + target=0, act_id=1, op=FrameOp.ALLOC_SHARED, payload=0 1023 + ) 1024 + inject_and_run(env, pe, fct_alloc_shared) 1025 + 1026 + # Verify act_id=1 is allocated on same frame, different lane 1027 + assert 1 in pe.tag_store, "act_id=1 should be in tag_store" 1028 + frame_id_1, lane_1 = pe.tag_store[1] 1029 + assert frame_id_1 == frame_id, "Both should share same frame" 1030 + assert lane_1 == 1, "Second allocation should assign lane 1" 1031 + assert lane_1 != lane_0, "Lanes should be different" 1032 + 1033 + # Verify FrameAllocated event for iteration 2 1034 + frame_allocated = [e for e in events if isinstance(e, FrameAllocated)] 1035 + assert len(frame_allocated) >= 2, "Should have 2 FrameAllocated events" 1036 + assert frame_allocated[1].frame_id == frame_id, "Event should report correct frame_id" 1037 + assert frame_allocated[1].lane == 1, "Event should report lane 1" 1038 + 1039 + # 5. 
Inject iteration 1 operands (act_id=0, lane 0) 1040 + tok_l_0 = DyadToken( 1041 + target=0, offset=0, act_id=0, data=100, port=Port.L 1042 + ) 1043 + inject_and_run(env, pe, tok_l_0) 1044 + 1045 + tok_r_0 = DyadToken( 1046 + target=0, offset=0, act_id=0, data=200, port=Port.R 1047 + ) 1048 + inject_and_run(env, pe, tok_r_0) 1049 + 1050 + # Verify Matched event for iteration 1 1051 + matched = [e for e in events if isinstance(e, Matched)] 1052 + assert len(matched) >= 1, "Should have Matched event for iteration 1" 1053 + match_0 = [m for m in matched if m.act_id == 0][-1] 1054 + assert match_0.left == 100, "Iteration 1 left operand should be 100" 1055 + assert match_0.right == 200, "Iteration 1 right operand should be 200" 1056 + assert match_0.offset == 0, "Iteration 1 offset should be 0" 1057 + 1058 + # Verify output token with correct data (100+200=300) 1059 + emitted = [e for e in events if isinstance(e, Emitted)] 1060 + assert len(emitted) >= 1, "Should have Emitted event for iteration 1" 1061 + out_tok_0 = emitted[-1].token 1062 + assert out_tok_0.data == 300, "Iteration 1 output should be 300 (100+200)" 1063 + assert out_tok_0.target == 1, "Output should route to target_pe=1" 1064 + 1065 + # 6. 
Inject iteration 2 operands (act_id=1, lane 1) 1066 + tok_l_1 = DyadToken( 1067 + target=0, offset=0, act_id=1, data=1000, port=Port.L 1068 + ) 1069 + inject_and_run(env, pe, tok_l_1) 1070 + 1071 + tok_r_1 = DyadToken( 1072 + target=0, offset=0, act_id=1, data=2000, port=Port.R 1073 + ) 1074 + inject_and_run(env, pe, tok_r_1) 1075 + 1076 + # Verify Matched event for iteration 2 1077 + matched = [e for e in events if isinstance(e, Matched)] 1078 + assert len(matched) >= 2, "Should have Matched events for both iterations" 1079 + match_1 = [m for m in matched if m.act_id == 1][-1] 1080 + assert match_1.left == 1000, "Iteration 2 left operand should be 1000" 1081 + assert match_1.right == 2000, "Iteration 2 right operand should be 2000" 1082 + assert match_1.offset == 0, "Iteration 2 offset should be 0" 1083 + 1084 + # Verify output token with correct data (1000+2000=3000) 1085 + emitted = [e for e in events if isinstance(e, Emitted)] 1086 + assert len(emitted) >= 2, "Should have Emitted events for both iterations" 1087 + out_tok_1 = emitted[-1].token 1088 + assert out_tok_1.data == 3000, "Iteration 2 output should be 3000 (1000+2000)" 1089 + assert out_tok_1.target == 1, "Output should route to target_pe=1" 1090 + 1091 + # Interleaved verification: confirm independent lanes 1092 + matches_by_id = {} 1093 + for m in matched: 1094 + if m.act_id not in matches_by_id: 1095 + matches_by_id[m.act_id] = [] 1096 + matches_by_id[m.act_id].append(m) 1097 + 1098 + assert 0 in matches_by_id, "Should have match for iteration 1 (act_id=0)" 1099 + assert 1 in matches_by_id, "Should have match for iteration 2 (act_id=1)" 1100 + assert matches_by_id[0][-1].left == 100, "Iteration 1 left should be 100" 1101 + assert matches_by_id[1][-1].left == 1000, "Iteration 2 left should be 1000" 1102 + 1103 + # 7. 
FREE(act_id=0) → lane 0 freed, frame stays 1104 + fct_free_0 = FrameControlToken( 1105 + target=0, act_id=0, op=FrameOp.FREE, payload=0 1106 + ) 1107 + inject_and_run(env, pe, fct_free_0) 1108 + 1109 + # Verify act_id=0 removed, act_id=1 still present 1110 + assert 0 not in pe.tag_store, "act_id=0 should be removed from tag_store" 1111 + assert 1 in pe.tag_store, "act_id=1 should still be in tag_store" 1112 + 1113 + # Verify frame not returned (still used by act_id=1) 1114 + assert frame_id not in pe.free_frames, "Frame should not be in free_frames" 1115 + 1116 + # Verify FrameFreed event with frame_freed=False 1117 + frame_freed = [e for e in events if isinstance(e, FrameFreed)] 1118 + freed_0 = [f for f in frame_freed if f.act_id == 0][-1] 1119 + assert freed_0.frame_freed == False, "frame_freed should be False (not last lane)" 1120 + assert freed_0.lane == lane_0, "Event should report lane 0" 1121 + 1122 + # 8. FREE(act_id=1) → last lane, frame returned to free list 1123 + fct_free_1 = FrameControlToken( 1124 + target=0, act_id=1, op=FrameOp.FREE, payload=0 1125 + ) 1126 + inject_and_run(env, pe, fct_free_1) 1127 + 1128 + # Verify act_id=1 removed from tag_store 1129 + assert 1 not in pe.tag_store, "act_id=1 should be removed from tag_store" 1130 + 1131 + # Verify tag_store is now empty 1132 + assert len(pe.tag_store) == 0, "tag_store should be empty" 1133 + 1134 + # Verify frame returned to free_frames 1135 + assert frame_id in pe.free_frames, "Frame should be in free_frames" 1136 + 1137 + # Verify lane_free entry cleaned up 1138 + assert frame_id not in pe.lane_free, "lane_free entry should be deleted" 1139 + 1140 + # Verify FrameFreed event with frame_freed=True 1141 + frame_freed = [e for e in events if isinstance(e, FrameFreed)] 1142 + freed_1 = [f for f in frame_freed if f.act_id == 1][-1] 1143 + assert freed_1.frame_freed == True, "frame_freed should be True (last lane)" 1144 + assert freed_1.lane == lane_1, "Event should report lane 1" 1145 + 1146 + # 
Summary: verify AC8.6 acceptance criteria 1147 + # Both iterations produce mathematically correct results 1148 + assert matches_by_id[0][-1].left + matches_by_id[0][-1].right == 300, \ 1149 + "Iteration 1 arithmetic correct" 1150 + assert matches_by_id[1][-1].left + matches_by_id[1][-1].right == 3000, \ 1151 + "Iteration 2 arithmetic correct" 1152 + 1153 + # Both iterations ran on SAME frame (verified at allocation, re-confirmed) 1154 + assert frame_id_1 == frame_id, "Both iterations ran on same frame" 1155 + 1156 + # Both iterations used DIFFERENT lanes 1157 + assert lane_0 != lane_1, "Iterations used different lanes" 1158 + assert lane_0 == 0 and lane_1 == 1, "Lanes are 0 and 1 respectively" 1159 + 1160 + # Freeing one iteration preserved the other 1161 + frame_freed_events = [e for e in events if isinstance(e, FrameFreed)] 1162 + assert len(frame_freed_events) >= 2, "Should have 2 FrameFreed events" 1163 + 1164 + # Freeing the last iteration returned the frame 1165 + assert frame_id in pe.free_frames, "Frame returned to pool after last FREE"
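The bookkeeping that the pipelining test above exercises can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code in `emu/pe.py`: the class name `LaneTable` and its method names are hypothetical, the choice of the lowest free lane and of popping the first free frame is assumed (the tests only confirm that the first two allocations land on lanes 0 and 1), and events, match storage, and error handling for lane exhaustion are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LaneTable:
    """Toy model of per-frame lane bookkeeping (hypothetical, not emu/pe.py)."""

    frame_count: int = 4
    lane_count: int = 4
    # act_id -> (frame_id, lane), mirroring the tag_store shape in the diff
    tag_store: dict[int, tuple[int, int]] = field(default_factory=dict)
    # frame_id -> set of lanes still available on that frame
    lane_free: dict[int, set[int]] = field(default_factory=dict)
    free_frames: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.free_frames = list(range(self.frame_count))

    def alloc(self, act_id: int) -> tuple[int, int]:
        """ALLOC: take a fresh frame and claim lane 0 on it."""
        frame_id = self.free_frames.pop(0)  # assumed policy: first free frame
        self.lane_free[frame_id] = set(range(1, self.lane_count))
        self.tag_store[act_id] = (frame_id, 0)
        return frame_id, 0

    def alloc_shared(self, act_id: int, parent_act_id: int) -> tuple[int, int]:
        """ALLOC_SHARED: reuse the parent's frame, claim another lane."""
        frame_id, _ = self.tag_store[parent_act_id]
        lane = min(self.lane_free[frame_id])  # assumed policy: lowest free lane
        self.lane_free[frame_id].remove(lane)
        self.tag_store[act_id] = (frame_id, lane)
        return frame_id, lane

    def free(self, act_id: int) -> bool:
        """Smart FREE: release the lane; return True if the frame was freed too."""
        frame_id, lane = self.tag_store.pop(act_id)
        still_used = any(f == frame_id for f, _ in self.tag_store.values())
        if still_used:
            self.lane_free[frame_id].add(lane)
            return False
        del self.lane_free[frame_id]
        self.free_frames.append(frame_id)
        return True
```

In this model, `free` returning `False` corresponds to a `FrameFreed` event with `frame_freed=False` (lane released, frame retained), and `True` corresponds to the last lane being released and the frame going back to `free_frames`.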
+2
tests/test_repl.py
···
         out = output.getvalue()
         # Should show PE state or not found message
         assert len(out) > 0
+        # Verify formatting includes lane information or empty tag store marker (case-insensitive)
+        assert "lane" in out.lower() or "tag store: (empty)" in out.lower()

     def test_pe_invalid_id(self, repl, temp_dfasm_file):
         """pe with non-integer ID should error."""
+2 -2
tests/test_snapshot.py
···
             assert isinstance(frame, tuple)
             # Each frame has slots

-        # tag_store should be dict mapping act_id to frame_id
+        # tag_store should be dict mapping act_id to (frame_id, lane) tuple
         assert isinstance(pe_snap.tag_store, dict)

         # presence should be a tuple of tuples (frame_count x matchable_offsets)
···
         snapshot = capture(system)

         pe_snap = snapshot.pes[0]
-        # tag_store should be a dict mapping act_id to frame_id
+        # tag_store should be a dict mapping act_id to (frame_id, lane) tuple
         assert isinstance(pe_snap.tag_store, dict)
         assert hasattr(pe_snap, 'frames')
         assert hasattr(pe_snap, 'free_frames')