# puny - RFC 3492 Punycode and IDNA for OCaml

An implementation of RFC 3492 (Punycode) for OCaml, with IDNA (Internationalized Domain Names in Applications) support. It encodes and decodes internationalized domain names and applies Unicode NFC normalization.

## Key Features

- **RFC 3492 Punycode**: Complete implementation of the Bootstring algorithm for encoding Unicode in ASCII-compatible form
- **IDNA Support**: ToASCII and ToUnicode operations per RFC 5891 (IDNA 2008) for internationalized domain names
- **Unicode Normalization**: Automatic NFC normalization using `uunf` for proper IDNA compliance
- **Mixed-Case Annotation**: Optional case preservation through Punycode encoding round-trips
- **Domain Integration**: Native support for the `domain-name` library
- **Comprehensive Error Handling**: Detailed position tracking and RFC-compliant error reporting

## Usage

### Basic Punycode Encoding/Decoding

```ocaml
(* Encode a UTF-8 string to Punycode *)
let encoded = Punycode.encode_utf8 "münchen"
(* = Ok "mnchen-3ya" *)

(* Decode Punycode back to UTF-8 *)
let decoded = Punycode.decode_utf8 "mnchen-3ya"
(* = Ok "münchen" *)
```
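
To make the encoding step less opaque, here is a minimal, stdlib-only sketch of the RFC 3492 encoding procedure (sections 5, 6.1 and 6.3). It is not this library's code: `encode_sketch`, `adapt` and `digit_char` are names chosen for illustration, the input is a plain `int array` of code points, and overflow checks and mixed-case annotation are left out.

```ocaml
(* Bootstring parameters for Punycode, RFC 3492 section 5. *)
let base = 36
let tmin = 1
let tmax = 26
let skew = 38
let damp = 700
let initial_bias = 72
let initial_n = 0x80

(* Map a digit value 0..35 to 'a'..'z' then '0'..'9'. *)
let digit_char d =
  if d < 26 then Char.chr (Char.code 'a' + d)
  else Char.chr (Char.code '0' + d - 26)

(* Bias adaptation, RFC 3492 section 6.1. *)
let adapt ~delta ~numpoints ~first =
  let delta = if first then delta / damp else delta / 2 in
  let delta = delta + (delta / numpoints) in
  let rec loop delta k =
    if delta > ((base - tmin) * tmax) / 2
    then loop (delta / (base - tmin)) (k + base)
    else k + (((base - tmin + 1) * delta) / (delta + skew))
  in
  loop delta 0

(* Encode code points to Punycode, without the "xn--" prefix.
   Illustrative only: no overflow detection, no input validation. *)
let encode_sketch (input : int array) : string =
  let buf = Buffer.create 16 in
  (* Copy the basic (ASCII) code points first, then the delimiter. *)
  Array.iter (fun c -> if c < 0x80 then Buffer.add_char buf (Char.chr c)) input;
  let b = Buffer.length buf in
  if b > 0 then Buffer.add_char buf '-';
  let n = ref initial_n and delta = ref 0 and bias = ref initial_bias in
  let h = ref b in
  while !h < Array.length input do
    (* Smallest code point that is >= n and not yet handled. *)
    let m =
      Array.fold_left
        (fun acc c -> if c >= !n && c < acc then c else acc)
        max_int input
    in
    delta := !delta + ((m - !n) * (!h + 1));
    n := m;
    Array.iter
      (fun c ->
        if c < !n then incr delta
        else if c = !n then begin
          (* Emit delta as a generalized variable-length integer. *)
          let q = ref !delta and k = ref base and go = ref true in
          while !go do
            let t =
              if !k <= !bias then tmin
              else if !k >= !bias + tmax then tmax
              else !k - !bias
            in
            if !q < t then go := false
            else begin
              Buffer.add_char buf (digit_char (t + ((!q - t) mod (base - t))));
              q := (!q - t) / (base - t);
              k := !k + base
            end
          done;
          Buffer.add_char buf (digit_char !q);
          bias := adapt ~delta:(!delta) ~numpoints:(!h + 1) ~first:(!h = b);
          delta := 0;
          incr h
        end)
      input;
    incr delta;
    incr n
  done;
  Buffer.contents buf

(* "münchen" as code points: m ü n c h e n *)
let example = encode_sketch [| 109; 252; 110; 99; 104; 101; 110 |]
(* = "mnchen-3ya" *)
```

The sketch returns a bare string rather than a `result`; a hardened encoder would also reject deltas that overflow and code points outside the Unicode range.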

### Domain Label Operations

```ocaml
(* Encode a domain label with ACE prefix *)
let label = Punycode.encode_label "münchen"
(* = Ok "xn--mnchen-3ya" *)

(* Decode an ACE-prefixed label *)
let original = Punycode.decode_label "xn--mnchen-3ya"
(* = Ok "münchen" *)
```
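
As a rough illustration of what label-level encoding adds on top of raw Punycode, the sketch below shows ACE prefix handling and the 63-octet DNS label limit from RFC 1035. It uses only the standard library (OCaml 4.13 or later for `String.for_all`). The names `is_ascii`, `has_ace_prefix` and `to_ace_label` are hypothetical, and the `encode` argument stands in for any Punycode encoder; none of this is taken from the library's source.

```ocaml
let ace_prefix = "xn--"

(* True if every byte is ASCII, in which case no Punycode step is needed. *)
let is_ascii s = String.for_all (fun c -> Char.code c < 0x80) s

(* Case-insensitive test for the "xn--" prefix. *)
let has_ace_prefix label =
  String.length label >= 4
  && String.lowercase_ascii (String.sub label 0 4) = ace_prefix

(* Wrap a caller-supplied [encode] function (an assumption of this sketch):
   ASCII labels pass through unchanged, other labels get the ACE prefix,
   and the result must fit in a 63-octet DNS label. *)
let to_ace_label ~(encode : string -> (string, string) result) label =
  if is_ascii label then Ok label
  else
    match encode label with
    | Error _ as e -> e
    | Ok body ->
      let ace = ace_prefix ^ body in
      if String.length ace > 63 then Error "label exceeds 63 octets"
      else Ok ace
```

Passing the encoder as an argument keeps the sketch independent of any particular Punycode implementation.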

### IDNA Domain Name Conversion

```ocaml
(* Convert internationalized domain to ASCII for DNS lookup *)
let ascii_domain = Punycode_idna.to_ascii "münchen.example.com"
(* = Ok "xn--mnchen-3ya.example.com" *)

(* Convert ASCII domain back to Unicode for display *)
let unicode_domain = Punycode_idna.to_unicode "xn--mnchen-3ya.example.com"
(* = Ok "münchen.example.com" *)
```
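
Domain conversion works label by label, as the output above suggests: only the non-ASCII label changes. A minimal sketch of that structure follows, assuming a hypothetical `map_labels` helper and a caller-supplied per-label function. It splits only on the ASCII dot and skips normalization and validation, so it illustrates the shape of the operation rather than the library's behaviour.

```ocaml
(* Apply [f] to each dot-separated label and rejoin the results,
   stopping at the first label for which [f] returns an error. *)
let map_labels (f : string -> (string, 'e) result) (domain : string)
  : (string, 'e) result =
  let rec go acc = function
    | [] -> Ok (String.concat "." (List.rev acc))
    | label :: rest ->
      (match f label with
       | Ok label' -> go (label' :: acc) rest
       | Error _ as e -> e)
  in
  go [] (String.split_on_char '.' domain)

(* A stand-in per-label function used only for this example. *)
let stub_to_ascii label =
  if label = "münchen" then Ok "xn--mnchen-3ya" else Ok label

let converted = map_labels stub_to_ascii "münchen.example.com"
(* = Ok "xn--mnchen-3ya.example.com" *)
```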

### Working with Unicode Code Points

```ocaml
(* Encode an array of Unicode code points *)
let codepoints = [| Uchar.of_int 0x4ED6; Uchar.of_int 0x4EEC |]
let encoded = Punycode.encode codepoints
(* Result is Punycode string *)

(* Decode to code points *)
let decoded = Punycode.decode "ihqwcrb4cv8a8dqg056pqjye"
(* Result is Uchar.t array *)
```
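
When working at the code point level it is often necessary to move between UTF-8 strings and `Uchar.t` arrays. The sketch below does this with the standard library alone (OCaml 4.14 or later). `uchars_of_utf8` and `utf8_of_uchars` are hypothetical helper names, not part of this library.

```ocaml
(* Decode a UTF-8 string into code points. On malformed input, return the
   byte offset of the first invalid sequence. *)
let uchars_of_utf8 (s : string) : (Uchar.t array, int) result =
  let rec go i acc =
    if i >= String.length s then Ok (Array.of_list (List.rev acc))
    else
      let d = String.get_utf_8_uchar s i in
      if not (Uchar.utf_decode_is_valid d) then Error i
      else go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* Encode code points back into a UTF-8 string. *)
let utf8_of_uchars (us : Uchar.t array) : string =
  let b = Buffer.create (Array.length us) in
  Array.iter (Buffer.add_utf_8_uchar b) us;
  Buffer.contents b

let cps = uchars_of_utf8 "他们"
(* = Ok [| Uchar.of_int 0x4ED6; Uchar.of_int 0x4EEC |] *)
```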

### Integration with domain-name Library

```ocaml
(* Convert a Domain_name.t to ASCII *)
let domain = Domain_name.of_string_exn "münchen.example.com"
let ascii = Punycode_idna.domain_to_ascii domain
(* = Ok (Domain_name for "xn--mnchen-3ya.example.com") *)

(* Convert back to Unicode. [ascii] is a result, so unwrap it first
   (this assumes both functions share the same error type). *)
let unicode =
  match ascii with
  | Ok d -> Punycode_idna.domain_to_unicode d
  | Error _ as e -> e
(* = Ok (original domain) *)
```

## Installation

```
opam install puny
```

## Documentation

API documentation is available at https://tangled.org/@anil.recoil.org/ocaml-punycode or via:

```
opam install puny
odig doc puny
```

## Limitations

The following IDNA 2008 features are not yet implemented:

- **Bidi rules** (RFC 5893): Bidirectional text validation for right-to-left scripts
- **Contextual joiners** (RFC 5892 Appendix A.1): Zero-width joiner/non-joiner validation

These checks are disabled by default in the API. Most common use cases (European languages, CJK) work correctly without them.
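
Callers who handle untrusted input may want to flag labels for which the missing checks would matter. The following is a coarse, stdlib-only heuristic sketch. The block ranges only approximate where right-to-left characters live and are not a substitute for the Unicode Bidi_Class property that RFC 5893 relies on. All function names are hypothetical and not part of this library.

```ocaml
(* Rough test for code points in blocks dominated by right-to-left scripts.
   This is an approximation, not the Bidi_Class property. *)
let in_rtl_block cp =
  (cp >= 0x0590 && cp <= 0x08FF)       (* Hebrew, Arabic, Syriac, Thaana, ... *)
  || (cp >= 0xFB1D && cp <= 0xFDFF)    (* Hebrew and Arabic presentation forms *)
  || (cp >= 0xFE70 && cp <= 0xFEFF)    (* Arabic Presentation Forms-B *)
  || (cp >= 0x10800 && cp <= 0x10FFF)  (* supplementary right-to-left scripts *)
  || (cp >= 0x1E800 && cp <= 0x1EFFF)

(* ZERO WIDTH NON-JOINER (U+200C) and ZERO WIDTH JOINER (U+200D). *)
let is_joiner cp = cp = 0x200C || cp = 0x200D

(* True if a label contains code points that the Bidi or contextual
   joiner rules would need to examine. *)
let needs_unimplemented_checks (cps : int array) : bool =
  Array.exists (fun cp -> in_rtl_block cp || is_joiner cp) cps
```

A label flagged this way is not necessarily invalid; it only means that the Bidi or joiner rules would have had something to inspect.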

## References

- [RFC 3492](https://datatracker.ietf.org/doc/html/rfc3492) - Punycode: A Bootstring encoding of Unicode for IDNA
- [RFC 5891](https://datatracker.ietf.org/doc/html/rfc5891) - Internationalized Domain Names in Applications (IDNA): Protocol
- [RFC 5892](https://datatracker.ietf.org/doc/html/rfc5892) - Unicode Code Points and IDNA
- [RFC 5893](https://datatracker.ietf.org/doc/html/rfc5893) - Right-to-Left Scripts for IDNA
- [RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035) - Domain Names Implementation and Specification

## License

ISC