feat: model inference, tokenizer, and full NER pipeline
- model.zig: weight loading from a binary file + forward pass (embed → CNN → linear → parser scoring)
- tokenizer.zig: full port of spaCy's tokenizer algorithm (whitespace split → iterative prefix/suffix stripping → infix splitting → special cases)
- tokenizer_data.zig: generated Unicode tables, pattern data, and 1347 special cases from spaCy's en_core_web_sm
- parser.zig: label enum reordered to match spaCy training order, N_ACTIONS=74, decodeAction rewritten
- spacez.zig: top-level recognize() function connecting tokenizer → model → byte-offset entities
- scripts/gen_tokenizer_data.py: extracts tokenizer patterns from spaCy via sre_parse
- scripts/compare.py: cross-validation against spaCy
- 15/15 tokenizer cross-validation tests pass
- model NER tests match spaCy: "Barack Obama visited Paris" → PERSON[0..2), GPE[3..4)
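The tokenizer loop ported in tokenizer.zig can be sketched in Python roughly as follows. The patterns and the special-case table below are simplified placeholders for illustration only; the real tables are the generated ones in tokenizer_data.zig:

```python
import re

# Placeholder patterns -- stand-ins for the generated spaCy tables.
PREFIXES = re.compile(r"[\(\[\"']")            # one opening punct at chunk start
SUFFIXES = re.compile(r"(\.\.\.|[\)\]\"',.!?])$")  # one closing punct (or ...) at end
INFIXES = re.compile(r"(--|\.\.\.)")           # split points inside a chunk
SPECIAL_CASES = {"don't": ["do", "n't"]}       # exceptions that override the rules

def tokenize(text):
    tokens = []
    for chunk in text.split():                 # 1. split on whitespace
        suffixes = []
        while chunk:
            if chunk in SPECIAL_CASES:         # 4. special case wins at any point
                tokens.extend(SPECIAL_CASES[chunk])
                chunk = ""
                break
            m = PREFIXES.match(chunk)          # 2a. strip one prefix, retry
            if m:
                tokens.append(m.group())
                chunk = chunk[m.end():]
                continue
            m = SUFFIXES.search(chunk)         # 2b. strip one suffix, retry
            if m:
                suffixes.insert(0, m.group())
                chunk = chunk[:m.start()]
                continue
            break
        if chunk:
            # 3. split the remainder on infixes, keeping the delimiters
            tokens.extend(p for p in INFIXES.split(chunk) if p)
        tokens.extend(suffixes)
    return tokens

print(tokenize('He said, "don\'t stop..." twice.'))
```

The key invariant (and the reason the Zig port mirrors spaCy exactly) is that prefix/suffix stripping is iterative and the special-case table is re-checked after every strip.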
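The conversion recognize() does from the parser's token-index spans (e.g. PERSON[0..2)) to byte-offset entities can be illustrated like this. Function names here are hypothetical, and the example is ASCII so byte and character offsets coincide:

```python
def char_offsets(text, tokens):
    """Map each token to its (start, end) offsets in text, scanning left to right."""
    spans, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)
        spans.append((start, start + len(tok)))
        pos = start + len(tok)
    return spans

def entity_to_offsets(text, tokens, tok_start, tok_end):
    """Turn a half-open token span [tok_start..tok_end) into text offsets."""
    spans = char_offsets(text, tokens)
    return spans[tok_start][0], spans[tok_end - 1][1]

text = "Barack Obama visited Paris"
tokens = text.split()
print(entity_to_offsets(text, tokens, 0, 2))  # PERSON -> (0, 12), "Barack Obama"
print(entity_to_offsets(text, tokens, 3, 4))  # GPE    -> (21, 26), "Paris"
```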
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>