ISC License

Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>

Permission to use, copy, modify, and distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+102
README.md
# puny - RFC 3492 Punycode and IDNA for OCaml

An implementation of RFC 3492 (Punycode) with IDNA (Internationalized Domain Names in Applications) support for OCaml. It encodes and decodes internationalized domain names and applies Unicode NFC normalization.

## Key Features

- **RFC 3492 Punycode**: Complete implementation of the Bootstring algorithm for encoding Unicode in ASCII-compatible form
- **IDNA Support**: ToASCII and ToUnicode operations per RFC 5891 (IDNA 2008) for internationalized domain names
- **Unicode Normalization**: Automatic NFC normalization using `uunf` for IDNA compliance
- **Mixed-Case Annotation**: Optional case preservation through Punycode encoding round-trips
- **Domain Integration**: Native support for the `domain-name` library
- **Comprehensive Error Handling**: Detailed position tracking and RFC-compliant error reporting

## Usage

### Basic Punycode Encoding/Decoding

```ocaml
(* Encode a UTF-8 string to Punycode *)
let encoded = Punycode.encode_utf8 "münchen"
(* = Ok "mnchen-3ya" *)

(* Decode Punycode back to UTF-8 *)
let decoded = Punycode.decode_utf8 "mnchen-3ya"
(* = Ok "münchen" *)
```
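
To show what an encoder has to do underneath, here is a minimal sketch of the RFC 3492 encoding procedure using only the standard library. It is not this library's implementation: the names `digit`, `adapt`, and `encode` are illustrative, code points are plain `int`s, and the overflow checks the RFC requires are left out.

```ocaml
(* Bootstring parameters for Punycode (RFC 3492, section 5). *)
let base = 36
let tmin = 1
let tmax = 26
let skew = 38
let damp = 700
let initial_bias = 72
let initial_n = 0x80

(* Map a digit value 0..35 to its ASCII form: a-z, then 0-9. *)
let digit d = if d < 26 then Char.chr (d + 97) else Char.chr (d - 26 + 48)

(* Bias adaptation (RFC 3492, section 6.1). *)
let adapt delta numpoints first =
  let delta = if first then delta / damp else delta / 2 in
  let delta = ref (delta + delta / numpoints) in
  let k = ref 0 in
  while !delta > (base - tmin) * tmax / 2 do
    delta := !delta / (base - tmin);
    k := !k + base
  done;
  !k + (base - tmin + 1) * !delta / (!delta + skew)

(* Illustrative encoder over raw code points; no overflow checks. *)
let encode (input : int array) : string =
  let buf = Buffer.create 16 in
  (* Basic (ASCII) code points are copied through first. *)
  Array.iter (fun c -> if c < 0x80 then Buffer.add_char buf (Char.chr c)) input;
  let b = Buffer.length buf in
  if b > 0 then Buffer.add_char buf '-';
  let n = ref initial_n and delta = ref 0 and bias = ref initial_bias in
  let h = ref b in
  while !h < Array.length input do
    (* Smallest code point that has not been handled yet. *)
    let m =
      Array.fold_left (fun m c -> if c >= !n && c < m then c else m)
        max_int input
    in
    delta := !delta + (m - !n) * (!h + 1);
    n := m;
    Array.iter (fun c ->
      if c < !n then incr delta
      else if c = !n then begin
        (* Emit delta as a generalized variable-length integer. *)
        let q = ref !delta and k = ref base and more = ref true in
        while !more do
          let t =
            if !k <= !bias then tmin
            else if !k >= !bias + tmax then tmax
            else !k - !bias
          in
          if !q < t then more := false
          else begin
            Buffer.add_char buf (digit (t + (!q - t) mod (base - t)));
            q := (!q - t) / (base - t);
            k := !k + base
          end
        done;
        Buffer.add_char buf (digit !q);
        bias := adapt !delta (!h + 1) (!h = b);
        delta := 0;
        incr h
      end) input;
    incr delta;
    incr n
  done;
  Buffer.contents buf
```

Under these assumptions, `encode [|0x6d; 0xfc; 0x6e; 0x63; 0x68; 0x65; 0x6e|]` (the code points of "münchen") yields `"mnchen-3ya"`, the same string as in the example above.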

### Domain Label Operations

```ocaml
(* Encode a domain label with ACE prefix *)
let label = Punycode.encode_label "münchen"
(* = Ok "xn--mnchen-3ya" *)

(* Decode an ACE-prefixed label *)
let original = Punycode.decode_label "xn--mnchen-3ya"
(* = Ok "münchen" *)
```
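
The only difference between a raw Punycode string and an encoded label is the `xn--` ACE prefix. The sketch below shows one way to detect and strip it with the standard library alone. The helper names `has_ace_prefix` and `strip_ace_prefix` are hypothetical and not part of this library's API; the prefix is compared case-insensitively.

```ocaml
let ace_prefix = "xn--"

(* True when [label] starts with the ACE prefix, ignoring ASCII case. *)
let has_ace_prefix label =
  String.length label >= 4
  && String.lowercase_ascii (String.sub label 0 4) = ace_prefix

(* Return the Punycode payload after the prefix, if the prefix is present. *)
let strip_ace_prefix label =
  if has_ace_prefix label
  then Some (String.sub label 4 (String.length label - 4))
  else None
```

A label without the prefix returns `None`, which a caller could treat as "already plain ASCII, pass it through unchanged".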

### IDNA Domain Name Conversion

```ocaml
(* Convert internationalized domain to ASCII for DNS lookup *)
let ascii_domain = Punycode_idna.to_ascii "münchen.example.com"
(* = Ok "xn--mnchen-3ya.example.com" *)

(* Convert ASCII domain back to Unicode for display *)
let unicode_domain = Punycode_idna.to_unicode "xn--mnchen-3ya.example.com"
(* = Ok "münchen.example.com" *)
```
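
Conversion of a whole domain works label by label: ASCII labels pass through, other labels are encoded, and length limits are checked. The following is a rough sketch of that flow, not the library's actual code. `to_ascii_with` and its `encode_label` parameter are hypothetical, errors are plain strings, NFC normalization is omitted, and a trailing dot is rejected as an empty label. The 63-octet label limit comes from RFC 1035; 253 matches `max_domain_length` in `lib/punycode_idna.ml`.

```ocaml
let max_label_length = 63
let max_domain_length = 253

let is_ascii s = String.for_all (fun c -> Char.code c < 0x80) s

(* [encode_label] stands in for whatever turns a non-ASCII label
   into its "xn--" form; it is supplied by the caller. *)
let to_ascii_with ~(encode_label : string -> (string, string) result) domain =
  (* Validate the length of an already-ASCII label. *)
  let check label =
    let n = String.length label in
    if n = 0 then Error "empty label"
    else if n > max_label_length then Error ("label too long: " ^ label)
    else Ok label
  in
  let convert label =
    if is_ascii label then check label
    else Result.bind (encode_label label) check
  in
  (* Convert labels left to right, stopping at the first error. *)
  let rec go acc = function
    | [] -> Ok (String.concat "." (List.rev acc))
    | l :: rest ->
      (match convert l with
       | Ok a -> go (a :: acc) rest
       | Error _ as e -> e)
  in
  match go [] (String.split_on_char '.' domain) with
  | Ok s when String.length s > max_domain_length -> Error "domain too long"
  | r -> r
```

Passing the encoder as an argument keeps the sketch self-contained; in real use the label encoder would be the library's own.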

### Working with Unicode Code Points

```ocaml
(* Encode an array of Unicode code points *)
let codepoints = [| Uchar.of_int 0x4ED6; Uchar.of_int 0x4EEC |]
let encoded = Punycode.encode codepoints
(* Result is Punycode string *)

(* Decode to code points *)
let decoded = Punycode.decode "ihqwcrb4cv8a8dqg056pqjye"
(* Result is Uchar.t array *)
```
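
When the input starts out as a UTF-8 string, a `Uchar.t array` like the one above can be produced with the standard library's UTF-8 decoder. This is a small sketch assuming OCaml 4.14 or later (for `String.get_utf_8_uchar`); the function name `code_points_of_utf8` is illustrative and not part of this library.

```ocaml
(* Decode a UTF-8 string into Unicode scalar values.
   Returns None if the input contains a malformed sequence. *)
let code_points_of_utf8 (s : string) : Uchar.t array option =
  let rec go i acc =
    if i >= String.length s then Some (Array.of_list (List.rev acc))
    else
      let d = String.get_utf_8_uchar s i in
      if Uchar.utf_decode_is_valid d
      then go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
      else None
  in
  go 0 []
```

Returning an option rather than substituting U+FFFD keeps malformed input from being silently encoded.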

### Integration with domain-name Library

```ocaml
(* Convert a Domain_name.t to ASCII *)
let domain = Domain_name.of_string_exn "münchen.example.com"
let ascii = Punycode_idna.domain_to_ascii domain
(* = Ok (Domain_name for "xn--mnchen-3ya.example.com") *)

(* Convert back to Unicode. [ascii] is a result, so chain with Result.bind
   (this assumes both functions report the same error type). *)
let unicode = Result.bind ascii Punycode_idna.domain_to_unicode
(* = Ok (original domain) *)
```

## Installation

```
opam install puny
```

## Documentation

API documentation is available at https://tangled.org/@anil.recoil.org/ocaml-punycode or via:

```
opam install puny
odig doc puny
```

## References

- [RFC 3492](https://datatracker.ietf.org/doc/html/rfc3492) - Punycode: A Bootstring encoding of Unicode for IDNA
- [RFC 5891](https://datatracker.ietf.org/doc/html/rfc5891) - Internationalized Domain Names in Applications (IDNA): Protocol
- [RFC 5892](https://datatracker.ietf.org/doc/html/rfc5892) - Unicode Code Points and IDNA
- [RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035) - Domain Names Implementation and Specification

## License

ISC
(*---------------------------------------------------------------------------
   Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
   SPDX-License-Identifier: ISC
  ---------------------------------------------------------------------------*)

(** RFC 3492 Punycode: A Bootstring encoding of Unicode for IDNA.

    This module implements the Punycode algorithm as specified in
+5
lib/punycode_idna.ml
(*---------------------------------------------------------------------------
   Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
   SPDX-License-Identifier: ISC
  ---------------------------------------------------------------------------*)

(* IDNA (Internationalized Domain Names in Applications) Implementation *)

let max_domain_length = 253
+5
lib/punycode_idna.mli
(*---------------------------------------------------------------------------
   Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>. All rights reserved.
   SPDX-License-Identifier: ISC
  ---------------------------------------------------------------------------*)

(** IDNA (Internationalized Domain Names in Applications) support.

    This module provides ToASCII and ToUnicode operations as specified