Implement HTML5 tokenizer state machine

Add a spec-compliant HTML5 tokenizer per WHATWG HTML §13.2.5
(Tokenization), with:
- Full state machine: Data, TagOpen, EndTagOpen, TagName, attributes,
  comments, DOCTYPE (with PUBLIC/SYSTEM), character references
- Named character references: 2125 entities with binary search lookup
- Numeric character references: decimal and hex, with Windows-1252
  replacements for the 0x80-0x9F range
- Proper token coalescing (adjacent Character tokens merged)
- 34 unit tests; 6661/6807 html5lib tokenizer tests passing (97.9%)
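
For reference, a minimal Python sketch of the numeric character
reference handling listed above, following the "numeric character
reference end state" in the WHATWG spec. The names C1_REPLACEMENTS and
resolve_numeric_reference are hypothetical and are not taken from this
change; parse-error reporting is omitted.

```python
# Illustrative sketch only; names are hypothetical, not from this commit.
# Windows-1252 replacements for C1 control code points, per the WHATWG
# numeric character reference end state. Code points 0x81, 0x8D, 0x8F,
# 0x90 and 0x9D have no replacement and are passed through unchanged.
C1_REPLACEMENTS = {
    0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E,
    0x85: 0x2026, 0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6,
    0x89: 0x2030, 0x8A: 0x0160, 0x8B: 0x2039, 0x8C: 0x0152,
    0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019, 0x93: 0x201C,
    0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A,
    0x9C: 0x0153, 0x9E: 0x017E, 0x9F: 0x0178,
}

REPLACEMENT_CHARACTER = "\uFFFD"


def resolve_numeric_reference(code: int) -> str:
    """Map an already-parsed numeric reference value to the character to emit."""
    # Null, out-of-range values and surrogates all become U+FFFD.
    if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return REPLACEMENT_CHARACTER
    # C1 controls with a Windows-1252 mapping are replaced; everything
    # else (including noncharacters, which are only a parse error) is kept.
    return chr(C1_REPLACEMENTS.get(code, code))
```

Under this sketch, &#x80; resolves to U+20AC and &#0; resolves to
U+FFFD.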

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>