Punycode (RFC3492) in OCaml
# puny - RFC 3492 Punycode and IDNA for OCaml

`puny` implements RFC 3492 (Punycode) with IDNA (Internationalized Domain Names in Applications) support for OCaml. It encodes and decodes internationalized domain names with proper Unicode normalization.

## Key Features

- **RFC 3492 Punycode**: Complete implementation of the Bootstring algorithm for encoding Unicode in ASCII-compatible form
- **IDNA Support**: ToASCII and ToUnicode operations per RFC 5891 (IDNA 2008) for internationalized domain names
- **Unicode Normalization**: Automatic NFC normalization using `uunf` for proper IDNA compliance
- **Mixed-Case Annotation**: Optional case preservation through Punycode encoding round-trips
- **Domain Integration**: Native support for the `domain-name` library
- **Comprehensive Error Handling**: Detailed position tracking and RFC-compliant error reporting

## Usage

### Basic Punycode Encoding/Decoding

```ocaml
(* Encode a UTF-8 string to Punycode *)
let encoded = Punycode.encode_utf8 "münchen"
(* = Ok "mnchen-3ya" *)

(* Decode Punycode back to UTF-8 *)
let decoded = Punycode.decode_utf8 "mnchen-3ya"
(* = Ok "münchen" *)
```

### Domain Label Operations

```ocaml
(* Encode a domain label with ACE prefix *)
let label = Punycode.encode_label "münchen"
(* = Ok "xn--mnchen-3ya" *)

(* Decode an ACE-prefixed label *)
let original = Punycode.decode_label "xn--mnchen-3ya"
(* = Ok "münchen" *)
```

### IDNA Domain Name Conversion

```ocaml
(* Convert internationalized domain to ASCII for DNS lookup *)
let ascii_domain = Punycode_idna.to_ascii "münchen.example.com"
(* = Ok "xn--mnchen-3ya.example.com" *)

(* Convert ASCII domain back to Unicode for display *)
let unicode_domain = Punycode_idna.to_unicode "xn--mnchen-3ya.example.com"
(* = Ok "münchen.example.com" *)
```

### Working with Unicode Code Points

```ocaml
(* Encode an array of Unicode code points *)
let codepoints = [| Uchar.of_int 0x4ED6; Uchar.of_int 0x4EEC |]
let encoded = Punycode.encode codepoints
(* Result is Punycode string *)

(* Decode to code points *)
let decoded = Punycode.decode "ihqwcrb4cv8a8dqg056pqjye"
(* Result is Uchar.t array *)
```

### Integration with domain-name Library

```ocaml
(* Convert a Domain_name.t to ASCII *)
let domain = Domain_name.of_string_exn "münchen.example.com"
let ascii = Punycode_idna.domain_to_ascii domain
(* = Ok (Domain_name for "xn--mnchen-3ya.example.com") *)

(* Convert back to Unicode. [ascii] is a result, so chain the call with
   Result.bind (this assumes both functions share one error type). *)
let unicode = Result.bind ascii Punycode_idna.domain_to_unicode
(* = Ok (original domain) *)
```

## Installation

```
opam install puny
```

## Documentation

API documentation is available at https://tangled.org/@anil.recoil.org/ocaml-punycode or via:

```
opam install puny
odig doc puny
```

## Limitations

The following IDNA 2008 features are not yet implemented:

- **Bidi rules** (RFC 5893): Bidirectional text validation for right-to-left scripts
- **Contextual joiners** (RFC 5892 Appendix A.1): Zero-width joiner/non-joiner validation

These checks are disabled by default in the API. Most common use cases (European languages, CJK) work correctly without them.

## References

- [RFC 3492](https://datatracker.ietf.org/doc/html/rfc3492) - Punycode: A Bootstring encoding of Unicode for IDNA
- [RFC 5891](https://datatracker.ietf.org/doc/html/rfc5891) - Internationalized Domain Names in Applications (IDNA): Protocol
- [RFC 5892](https://datatracker.ietf.org/doc/html/rfc5892) - Unicode Code Points and IDNA
- [RFC 5893](https://datatracker.ietf.org/doc/html/rfc5893) - Right-to-Left Scripts for IDNA
- [RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035) - Domain Names Implementation and Specification

## License

ISC
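
## Appendix: Bootstring Encoding Sketch

For readers curious how the Bootstring algorithm mentioned under Key Features produces a string like `mnchen-3ya`, the following is a minimal, standalone sketch of the RFC 3492 encoding procedure using only the OCaml standard library. It is not the code of this library: `sketch_encode`, `adapt`, and `digit_char` are hypothetical names chosen for illustration, the input is a plain `int array` of code points instead of `Uchar.t`, and overflow checks, input validation, and mixed-case annotation are all omitted.

```ocaml
(* Bootstring parameters fixed for Punycode by RFC 3492, section 5. *)
let base = 36
let tmin = 1
let tmax = 26
let skew = 38
let damp = 700
let initial_bias = 72
let initial_n = 128

(* Bias adaptation, RFC 3492 section 6.1. *)
let adapt delta num_points first_time =
  let delta = if first_time then delta / damp else delta / 2 in
  let delta = ref (delta + delta / num_points) in
  let k = ref 0 in
  while !delta > ((base - tmin) * tmax) / 2 do
    delta := !delta / (base - tmin);
    k := !k + base
  done;
  !k + ((base - tmin + 1) * !delta) / (!delta + skew)

(* Map a digit value 0..35 to 'a'..'z' followed by '0'..'9'. *)
let digit_char d =
  if d < 26 then Char.chr (Char.code 'a' + d)
  else Char.chr (Char.code '0' + d - 26)

(* Hypothetical helper: encode code points (given as ints) to Punycode.
   Illustration only: no overflow checks, no validation of the input. *)
let sketch_encode (input : int array) : string =
  let buf = Buffer.create 16 in
  (* Copy the basic (ASCII) code points first, in order. *)
  Array.iter (fun c -> if c < 0x80 then Buffer.add_char buf (Char.chr c)) input;
  let b = Buffer.length buf in
  if b > 0 then Buffer.add_char buf '-';
  let n = ref initial_n and delta = ref 0 and bias = ref initial_bias in
  let h = ref b in
  let len = Array.length input in
  while !h < len do
    (* Smallest code point that has not been handled yet. *)
    let m =
      Array.fold_left
        (fun acc c -> if c >= !n && c < acc then c else acc)
        max_int input
    in
    delta := !delta + (m - !n) * (!h + 1);
    n := m;
    Array.iter
      (fun c ->
        if c < !n then incr delta
        else if c = !n then begin
          (* Emit delta as a generalized variable-length integer. *)
          let q = ref !delta and k = ref base and fin = ref false in
          while not !fin do
            let t =
              if !k <= !bias then tmin
              else if !k >= !bias + tmax then tmax
              else !k - !bias
            in
            if !q < t then fin := true
            else begin
              Buffer.add_char buf (digit_char (t + (!q - t) mod (base - t)));
              q := (!q - t) / (base - t);
              k := !k + base
            end
          done;
          Buffer.add_char buf (digit_char !q);
          bias := adapt !delta (!h + 1) (!h = b);
          delta := 0;
          incr h
        end)
      input;
    incr delta;
    incr n
  done;
  Buffer.contents buf

(* "münchen" as code points *)
let example = sketch_encode [| 0x6D; 0xFC; 0x6E; 0x63; 0x68; 0x65; 0x6E |]
(* = "mnchen-3ya" *)
```

The sketch returns a bare `string` for brevity. A production encoder would report overflow and invalid input as errors instead of ignoring them.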