Internet Engineering Task Force (IETF)                        J. Klensin
Request for Comments: 5891                                   August 2010
Obsoletes: 3490, 3491
Updates: 3492
Category: Standards Track
ISSN: 2070-1721


     Internationalized Domain Names in Applications (IDNA): Protocol

Abstract

   This document is the revised protocol definition for
   Internationalized Domain Names (IDNs). The rationale for changes,
   the relationship to the older specification, and important
   terminology are provided in other documents. This document specifies
   the protocol mechanism, called Internationalized Domain Names in
   Applications (IDNA), for registering and looking up IDNs in a way
   that does not require changes to the DNS itself. IDNA is only meant
   for processing domain names, not free text.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc5891.

Klensin                      Standards Track                    [Page 1]

RFC 5891                    IDNA2008 Protocol                August 2010

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1. Introduction
   2. Terminology
   3. Requirements and Applicability
     3.1. Requirements
     3.2. Applicability
       3.2.1. DNS Resource Records
       3.2.2. Non-Domain-Name Data Types Stored in the DNS
   4. Registration Protocol
     4.1. Input to IDNA Registration
     4.2. Permitted Character and Label Validation
       4.2.1. Input Format
       4.2.2. Rejection of Characters That Are Not Permitted
       4.2.3. Label Validation
       4.2.4. Registration Validation Requirements
     4.3. Registry Restrictions
     4.4. Punycode Conversion
     4.5. Insertion in the Zone
   5. Domain Name Lookup Protocol
     5.1. Label String Input
     5.2. Conversion to Unicode
     5.3. A-label Input
     5.4. Validation and Character List Testing
     5.5. Punycode Conversion
     5.6. DNS Name Resolution
   6. Security Considerations
   7. IANA Considerations
   8. Contributors
   9. Acknowledgments
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A. Summary of Major Changes from IDNA2003

1. Introduction

   This document supplies the protocol definition for Internationalized
   Domain Names in Applications (IDNA), with the version specified here
   known as IDNA2008. Essential definitions and terminology for
   understanding this document and a road map of the collection of
   documents that make up IDNA2008 appear in a separate Definitions
   document [RFC5890].
   Appendix A discusses the relationship between
   this specification and the earlier version of IDNA (referred to here
   as "IDNA2003"). The rationale for these changes, along with
   considerable explanatory material and advice to zone administrators
   who support IDNs, is provided in another document, known informally
   in this series as the "Rationale document" [RFC5894].

   IDNA works by allowing applications to use certain ASCII [ASCII]
   string labels (beginning with a special prefix) to represent
   non-ASCII name labels. Lower-layer protocols need not be aware of
   this; therefore, IDNA does not change any infrastructure. In
   particular, IDNA does not depend on any changes to DNS servers,
   resolvers, or DNS protocol elements, because the ASCII name service
   provided by the existing DNS can be used for IDNA.

   IDNA applies only to a specific subset of DNS labels. The base DNS
   standards [RFC1034] [RFC1035] and their various updates specify how
   to combine labels into fully-qualified domain names and parse labels
   out of those names.

   This document describes two separate protocols, one for IDN
   registration (Section 4) and one for IDN lookup (Section 5). These
   two protocols share some terminology, reference data, and operations.

2. Terminology

   As mentioned above, terminology used as part of the definition of
   IDNA appears in the Definitions document [RFC5890]. It is worth
   noting that some of this terminology overlaps with, and is consistent
   with, that used in Unicode or other character set standards and the
   DNS. Readers of this document are assumed to be familiar with the
   associated Definitions document and with the DNS-specific terminology
   in RFC 1034 [RFC1034].
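
   To make the A-label and U-label terminology concrete, the following
   Python sketch converts between the two forms of a single label. It
   is illustrative only and not part of the protocol definition. It
   assumes Python's built-in "punycode" codec as the RFC 3492
   implementation; the helper names to_a_label and to_u_label are
   hypothetical, and no IDNA2008 validation is performed.

```python
ACE_PREFIX = "xn--"  # the ACE prefix named in this document


def to_a_label(u_label: str) -> str:
    """Encode a presumed-valid U-label as an A-label (no validation)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label to Unicode, lowercasing it first."""
    lowered = a_label.lower()
    if not lowered.startswith(ACE_PREFIX):
        raise ValueError("not an A-label: missing ACE prefix")
    return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    print(to_a_label("b\u00fccher"))
    print(to_u_label("XN--BCHER-KVA"))
```

   A string that merely decodes this way is not thereby a U-label; the
   tests in Sections 4 and 5 still apply.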

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

3. Requirements and Applicability

3.1. Requirements

   IDNA makes the following requirements:

   1. Whenever a domain name is put into a domain name slot that is not
      IDNA-aware (see Section 2.3.2.6 of the Definitions document
      [RFC5890]), it MUST contain only ASCII characters (i.e., its
      labels must be either A-labels or NR-LDH labels), unless the DNS
      application is not subject to historical recommendations for
      "hostname"-style names (see RFC 1034 [RFC1034] and
      Section 3.2.1).

   2. Labels MUST be compared using equivalent forms: either both
      A-label forms or both U-label forms. Because A-labels and
      U-labels can be transformed into each other without loss of
      information, these comparisons are equivalent (however, in
      practice, comparison of U-labels requires first verifying that
      they actually are U-labels and not just Unicode strings). A pair
      of A-labels MUST be compared as case-insensitive ASCII (as with
      all comparisons of ASCII DNS labels). U-labels MUST be compared
      as-is, without case folding or other intermediate steps. While
      it is not necessary to validate labels in order to compare them,
      successful comparison does not imply validity. In many cases,
      not limited to comparison, validation may be important for other
      reasons and SHOULD be performed.

   3. Labels being registered MUST conform to the requirements of
      Section 4. Labels being looked up and the lookup process MUST
      conform to the requirements of Section 5.

3.2. Applicability

   IDNA applies to all domain names in all domain name slots in
   protocols except where it is explicitly excluded. It does not apply
   to domain name slots that do not use the LDH syntax rules as
   described in the Definitions document [RFC5890].

   Because it uses the DNS, IDNA applies to many protocols that were
   specified before it was designed. IDNs occupying domain name slots
   in those older protocols MUST be in A-label form until and unless
   those protocols and their implementations are explicitly upgraded to
   be aware of IDNs and to accept the U-label form. IDNs actually
   appearing in DNS queries or responses MUST be A-labels.

   IDNA-aware protocols and implementations MAY accept U-labels,
   A-labels, or both as those particular protocols specify. IDNA is not
   defined for extended label types (see RFC 2671 [RFC2671], Section 3).

3.2.1. DNS Resource Records

   IDNA applies only to domain names in the NAME and RDATA fields of DNS
   resource records whose CLASS is IN. See the DNS specification
   [RFC1035] for precise definitions of these terms.

   The application of IDNA to DNS resource records depends entirely on
   the CLASS of the record, and not on the TYPE except as noted below.
   This will remain true, even as new TYPEs are defined, unless a new
   TYPE defines TYPE-specific rules. Special naming conventions for SRV
   records (and "underscore labels" more generally) are incompatible
   with IDNA coding as discussed in the Definitions document [RFC5890],
   especially Section 2.3.2.3. Of course, underscore labels may be part
   of a domain that uses IDN labels at higher levels in the tree.

3.2.2. Non-Domain-Name Data Types Stored in the DNS

   Although IDNA enables the representation of non-ASCII characters in
   domain names, that does not imply that IDNA enables the
   representation of non-ASCII characters in other data types that are
   stored in domain names, specifically in the RDATA field for types
   that have structured RDATA format. For example, an email address
   local part is stored in a domain name in the RNAME field as part of
   the RDATA of an SOA record (e.g., hostmaster@example.com would be
   represented as hostmaster.example.com). IDNA does not update the
   existing email standards, which allow only ASCII characters in local
   parts. Even though work is in progress to define
   internationalization for email addresses [RFC4952], changes to the
   email address part of the SOA RDATA would require action in, or
   updates to, other standards, specifically those that specify the
   format of the SOA RR.

4. Registration Protocol

   This section defines the model for registering an IDN. The model is
   implementation independent; any sequence of steps that produces
   exactly the same result for all labels is considered a valid
   implementation.

   Note that, while the registration (this section) and lookup protocols
   (Section 5) are very similar in most respects, they are not
   identical, and implementers should carefully follow the steps
   described in this specification.

4.1. Input to IDNA Registration

   Registration processes, especially processing by entities (often
   called "registrars") who deal with registrants before the request
   actually reaches the zone manager ("registry") are outside the scope
   of this definition and may differ significantly depending on local
   needs.
   By the time a string enters the IDNA registration process as
   described in this specification, it MUST be in Unicode and in
   Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
   zone files ("registries") MUST accept only the exact string for which
   registration is requested, free of any mappings or local adjustments.
   They MAY accept that input in any of three forms:

   1. As a pair of A-label and U-label.

   2. As an A-label only.

   3. As a U-label only.

   The first two of these forms are RECOMMENDED because the use of
   A-labels avoids any possibility of ambiguity. The first is normally
   preferred over the second because it permits further verification of
   user intent (see Section 4.2.1).

4.2. Permitted Character and Label Validation

4.2.1. Input Format

   If both the U-label and A-label forms are available, the registry
   MUST ensure that the A-label form is in lowercase, perform a
   conversion to a U-label, perform the steps and tests described below
   on that U-label, and then verify that the A-label produced by the
   step in Section 4.4 matches the one provided as input. In addition,
   the U-label that was provided as input and the one obtained by
   conversion of the A-label MUST match exactly. If, for some reason,
   these tests fail, the registration MUST be rejected.

   If only an A-label was provided and the conversion to a U-label is
   not performed, the registry MUST still verify that the A-label is
   superficially valid, i.e., that it does not violate any of the rules
   of Punycode encoding [RFC3492] such as the prohibition on trailing
   hyphen-minus, the requirement that all characters be ASCII, and so
   on.
   Strings that appear to be A-labels (e.g., they start with
   "xn--") and strings that are supplied to the registry in a context
   reserved for A-labels (such as a field in a form to be filled out),
   but that are not valid A-labels as described in this paragraph, MUST
   NOT be placed in DNS zones that support IDNA.

   If only an A-label is provided, the conversion to a U-label is not
   performed, but the superficial tests described in the previous
   paragraph are performed, registration procedures MAY, and usually
   will, bypass the tests and actions in the balance of Section 4.2 and
   in Sections 4.3 and 4.4.

4.2.2. Rejection of Characters That Are Not Permitted

   The candidate Unicode string MUST NOT contain characters that appear
   in the "DISALLOWED" and "UNASSIGNED" lists specified in the Tables
   document [RFC5892].

4.2.3. Label Validation

   The proposed label (in the form of a Unicode string, i.e., a string
   that at least superficially appears to be a U-label) is then examined
   using tests that require examination of more than one character.
   Character order is considered to be the on-the-wire order. That
   order may not be the same as the display order.

4.2.3.1. Hyphen Restrictions

   The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
   the third and fourth character positions and MUST NOT start or end
   with a "-" (hyphen).

4.2.3.2. Leading Combining Marks

   The Unicode string MUST NOT begin with a combining mark or combining
   character (see The Unicode Standard, Section 2.11 [Unicode] for an
   exact definition).

4.2.3.3. Contextual Rules

   The Unicode string MUST NOT contain any characters whose validity is
   context-dependent, unless the validity is positively confirmed by a
   contextual rule.
   To check this, each code point identified as
   CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a
   non-null rule. If such a code point is missing a rule, the label is
   invalid. If the rule exists but the result of applying the rule is
   negative or inconclusive, the proposed label is invalid.

4.2.3.4. Labels Containing Characters Written Right to Left

   If the proposed label contains any characters from scripts that are
   written from right to left, it MUST meet the Bidi criteria [RFC5893].

4.2.4. Registration Validation Requirements

   Strings that contain at least one non-ASCII character, have been
   produced by the steps above, whose contents pass all of the tests in
   Section 4.2.3, and are 63 or fewer characters long in
   ASCII-compatible encoding (ACE) form (see Section 4.4), are U-labels.

   To summarize, tests are made in Section 4.2 for invalid characters,
   invalid combinations of characters, for labels that are invalid even
   if the characters they contain are valid individually, and for labels
   that do not conform to the restrictions for strings containing
   right-to-left characters.

4.3. Registry Restrictions

   In addition to the rules and tests above, there are many reasons why
   a registry could reject a label. Registries at all levels of the
   DNS, not just the top level, are expected to establish policies about
   label registrations. Policies are likely to be informed by the local
   languages and the scripts that are used to write them and may depend
   on many factors including what characters are in the label (for
   example, a label may be rejected based on other labels already
   registered). See the Rationale document [RFC5894], Section 3.2, for
   further discussion and recommendations about registry policies.
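
   Before local policy is applied, the table-free tests of Sections
   4.1, 4.2.3.1, 4.2.3.2, and 4.2.4 can be sketched in Python as shown
   below. This is a partial illustration under stated assumptions, not
   a conforming validator: the function name structural_problems is
   hypothetical, "combining mark" is approximated by Unicode General
   Category M, Python's built-in "punycode" codec stands in for RFC
   3492, and the DISALLOWED/UNASSIGNED, contextual-rule, and Bidi tests
   are deliberately omitted because they need the tables of [RFC5892]
   and [RFC5893].

```python
import unicodedata
from typing import List

MAX_ACE_LENGTH = 63  # limit on the ACE form, per Section 4.2.4


def structural_problems(label: str) -> List[str]:
    """List the table-free reasons a candidate label is not a U-label."""
    if not label:
        return ["empty label"]
    problems = []
    if label.isascii():
        problems.append("contains no non-ASCII character")
    if unicodedata.normalize("NFC", label) != label:
        problems.append("not in NFC")
    if label[2:4] == "--":
        problems.append("'--' in third and fourth positions")
    if label.startswith("-") or label.endswith("-"):
        problems.append("starts or ends with a hyphen")
    # Assumption: General Category M (Mn, Mc, Me) approximates
    # "combining mark or combining character".
    if unicodedata.category(label[0]).startswith("M"):
        problems.append("begins with a combining mark")
    ace = "xn--" + label.encode("punycode").decode("ascii")
    if len(ace) > MAX_ACE_LENGTH:
        problems.append("ACE form longer than 63 characters")
    return problems


if __name__ == "__main__":
    for candidate in ["b\u00fccher", "ab--\u00e9", "\u0301a"]:
        print(repr(candidate), structural_problems(candidate))
```

   An empty result means only that these particular tests found
   nothing; the label may still fail the omitted table-driven tests or
   the registry's own restrictions.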

   The string produced by the steps in Section 4.2 is checked and
   processed as appropriate to local registry restrictions. Application
   of those registry restrictions may result in the rejection of some
   labels or the application of special restrictions to others.

4.4. Punycode Conversion

   The resulting U-label is converted to an A-label (defined in Section
   2.3.2.1 of the Definitions document [RFC5890]). The A-label is the
   encoding of the U-label according to the Punycode algorithm [RFC3492]
   with the ACE prefix "xn--" added at the beginning of the string. The
   resulting string must, of course, conform to the length limits
   imposed by the DNS. This document does not update or alter the
   Punycode algorithm specified in RFC 3492 in any way. RFC 3492 does
   make a non-normative reference to the information about the value and
   construction of the ACE prefix that appears in RFC 3490 or Nameprep
   [RFC3491]. For consistency and reader convenience, IDNA2008
   effectively updates that reference to point to this document. That
   change does not alter the prefix itself. The prefix, "xn--", is the
   same in both sets of documents.

   With the exception of the maximum string length test on Punycode
   output, the failure conditions identified in the Punycode encoding
   procedure cannot occur if the input is a U-label as determined by the
   steps in Sections 4.1 through 4.3 above.

4.5. Insertion in the Zone

   The label is registered in the DNS by inserting the A-label into a
   zone.

5. Domain Name Lookup Protocol

   Lookup is different from registration and different tests are applied
   on the client.
   Although some validity checks are necessary to avoid
   serious problems with the protocol, the lookup-side tests are more
   permissive and rely on the assumption that names that are present in
   the DNS are valid. That assumption is, however, a weak one because
   the presence of wildcards in the DNS might cause a string that is not
   actually registered in the DNS to be successfully looked up.

5.1. Label String Input

   The user supplies a string in the local character set, for example,
   by typing it, clicking on it, or copying and pasting it from a
   resource identifier, e.g., a Uniform Resource Identifier (URI)
   [RFC3986] or an Internationalized Resource Identifier (IRI)
   [RFC3987], from which the domain name is extracted. Alternately,
   some process not directly involving the user may read the string from
   a file or obtain it in some other way. Processing in this step and
   the one specified in Section 5.2 are local matters, to be
   accomplished prior to actual invocation of IDNA.

5.2. Conversion to Unicode

   The string is converted from the local character set into Unicode, if
   it is not already in Unicode. Depending on local needs, this
   conversion may involve mapping some characters into other characters
   as well as coding conversions. Those issues are discussed in the
   mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the
   Rationale document [RFC5894] and in the separate Mapping document
   [IDNA2008-Mapping]. The result MUST be a Unicode string in NFC form.

5.3. A-label Input

   If the input to this procedure appears to be an A-label (i.e., it
   starts in "xn--", interpreted case-insensitively), the lookup
   application MAY attempt to convert it to a U-label, first ensuring
   that the A-label is entirely in lowercase (converting it to lowercase
   if necessary), and apply the tests of Section 5.4 and the conversion
   of Section 5.5 to that form. If the label is converted to Unicode
   (i.e., to U-label form) using the Punycode decoding algorithm, then
   the processing specified in those two sections MUST be performed, and
   the label MUST be rejected if the resulting label is not identical to
   the original. See Section 8.1 of the Rationale document [RFC5894]
   for additional discussion on this topic.

   Conversion from the A-label and testing that the result is a U-label
   SHOULD be performed if the domain name will later be presented to the
   user in native character form (this requires that the lookup
   application be IDNA-aware). If those steps are not performed, the
   lookup process SHOULD at least test to determine that the string is
   actually an A-label, examining it for the invalid formats specified
   in the Punycode decoding specification. Applications that are not
   IDNA-aware will obviously omit that testing; others MAY treat the
   string as opaque to avoid the additional processing at the expense of
   providing less protection and information to users.

5.4. Validation and Character List Testing

   As with the registration procedure described in Section 4, the
   Unicode string is checked to verify that all characters that appear
   in it are valid as input to IDNA lookup processing. As discussed
   above and in the Rationale document [RFC5894], the lookup check is
   more liberal than the registration one.
   Labels that have not been
   fully evaluated for conformance to the applicable rules are referred
   to as "putative" labels as discussed in Section 2.3.2.1 of the
   Definitions document [RFC5890]. Putative U-labels with any of the
   following characteristics MUST be rejected prior to DNS lookup:

   o  Labels that are not in NFC [Unicode-UAX15].

   o  Labels containing "--" (two consecutive hyphens) in the third and
      fourth character positions.

   o  Labels whose first character is a combining mark (see The Unicode
      Standard, Section 2.11 [Unicode]).

   o  Labels containing prohibited code points, i.e., those that are
      assigned to the "DISALLOWED" category of the Tables document
      [RFC5892].

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTJ", i.e., requiring exceptional contextual
      rule processing on lookup, but that do not conform to those rules.
      Note that this implies that a rule must be defined, not null: a
      character that requires a contextual rule but for which the rule
      is null is treated in this step as having failed to conform to the
      rule.

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTO", but for which no such rule appears in the
      table of rules. Applications resolving DNS names or carrying out
      equivalent operations are not required to test contextual rules
      for "CONTEXTO" characters, only to verify that a rule is defined
      (although they MAY make such tests to provide better protection or
      give better information to the user).

   o  Labels containing code points that are unassigned in the version
      of Unicode being used by the application, i.e., in the UNASSIGNED
      category of the Tables document.

      This requirement means that the application must use a list of
      unassigned characters that is matched to the version of Unicode
      that is being used for the other requirements in this section. It
      is not required that the application know which version of Unicode
      is being used; that information might be part of the operating
      environment in which the application is running.

   In addition, the application SHOULD apply the following test.

   o  Verification that the string is compliant with the requirements
      for right-to-left characters specified in the Bidi document
      [RFC5893].

   This test may be omitted in special circumstances, such as when the
   lookup application knows that the conditions are enforced elsewhere,
   because an attempt to look up and resolve such strings will almost
   certainly lead to a DNS lookup failure except when wildcards are
   present in the zone. However, applying the test is likely to give
   much better information about the reason for a lookup failure --
   information that may be usefully passed to the user when that is
   feasible -- than DNS resolution failure information alone.

   For all other strings, the lookup application MUST rely on the
   presence or absence of labels in the DNS to determine the validity of
   those labels and the validity of the characters they contain. If
   they are registered, they are presumed to be valid; if they are not,
   their possible validity is not relevant. While a lookup application
   may reasonably issue warnings about strings it believes may be
   problematic, applications that decline to process a string that
   conforms to the rules above (i.e., does not look it up in the DNS)
   are not in conformance with this protocol.

5.5. Punycode Conversion

   The string that has now been validated for lookup is converted to ACE
   form by applying the Punycode algorithm to the string and then adding
   the ACE prefix ("xn--").

5.6. DNS Name Resolution

   The A-label resulting from the conversion in Section 5.5 or supplied
   directly (see Section 5.3) is combined with other labels as needed to
   form a fully-qualified domain name that is then looked up in the DNS,
   using normal DNS resolver procedures. The lookup can obviously
   either succeed (returning information) or fail.

6. Security Considerations

   Security Considerations for this version of IDNA are described in the
   Definitions document [RFC5890], except for the special issues
   associated with right-to-left scripts and characters. The latter are
   discussed in the Bidi document [RFC5893].

   In order to avoid intentional or accidental attacks from labels that
   might be confused with others, special problems in rendering, and so
   on, the IDNA model requires that registries exercise care and
   thoughtfulness about what labels they choose to permit. That issue
   is discussed in Section 4.3 of this document which, in turn, points
   to a somewhat more extensive discussion in the Rationale document
   [RFC5894].

7. IANA Considerations

   IANA actions for this version of IDNA are specified in the Tables
   document [RFC5892] and discussed informally in the Rationale document
   [RFC5894]. The components of IDNA described in this document do not
   require any IANA actions.

8. Contributors

   While the listed editor held the pen, the original versions of this
   document represent the joint work and conclusions of an ad hoc design
   team consisting of the editor and, in alphabetic order, Harald
   Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp.
   This document draws significantly on the original version of IDNA
   [RFC3490] both conceptually and for specific text. This
   second-generation version would not have been possible without the
   work that went into that first version and especially the
   contributions of its authors Patrik Faltstrom, Paul Hoffman, and Adam
   Costello. While Faltstrom was actively involved in the creation of
   this version, Hoffman and Costello were not and should not be held
   responsible for any errors or omissions.

9. Acknowledgments

   This revision to IDNA would have been impossible without the
   accumulated experience since RFC 3490 was published and resulting
   comments and complaints of many people in the IETF, ICANN, and other
   communities (too many people to list here). Nor would it have been
   possible without RFC 3490 itself and the efforts of the Working Group
   that defined it. Those people whose contributions are acknowledged
   in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894]
   were particularly important.

   Specific textual changes were incorporated into this document after
   suggestions from the other contributors, Stephane Bortzmeyer, Vint
   Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell,
   Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken
   Whistler, Chris Wright, and other WG participants and reviewers
   including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter
   Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific
   errors and recommended corrections. Special thanks are due to Paul
   Hoffman for permission to extract material to form the basis for
   Appendix A from a draft document that he prepared.

10. References

10.1. Normative References

   [RFC1034] Mockapetris, P., "Domain names - concepts and
             facilities", STD 13, RFC 1034, November 1987.

   [RFC1035] Mockapetris, P., "Domain names - implementation and
             specification", STD 13, RFC 1035, November 1987.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
             Unicode for Internationalized Domain Names in
             Applications (IDNA)", RFC 3492, March 2003.

   [RFC5890] Klensin, J., "Internationalized Domain Names for
             Applications (IDNA): Definitions and Document
             Framework", RFC 5890, August 2010.

   [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
             Internationalized Domain Names for Applications (IDNA)",
             RFC 5892, August 2010.

   [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
             for Internationalized Domain Names for Applications
             (IDNA)", RFC 5893, August 2010.

   [Unicode-UAX15]
             The Unicode Consortium, "Unicode Standard Annex #15:
             Unicode Normalization Forms", September 2009,
             <http://www.unicode.org/reports/tr15/>.

10.2. Informative References

   [ASCII]   American National Standards Institute (formerly United
             States of America Standards Institute), "USA Code for
             Information Interchange", ANSI X3.4-1968, 1968. ANSI
             X3.4-1968 has been replaced by newer versions with
             slight modifications, but the 1968 version remains
             definitive for the Internet.

   [IDNA2008-Mapping]
             Resnick, P. and P. Hoffman, "Mapping Characters in
             Internationalized Domain Names for Applications (IDNA)",
             Work in Progress, April 2010.

   [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
             RFC 2671, August 1999.

   [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
             "Internationalizing Domain Names in Applications
             (IDNA)", RFC 3490, March 2003.

   [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
             Profile for Internationalized Domain Names (IDN)",
             RFC 3491, March 2003.

   [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
             Resource Identifier (URI): Generic Syntax", STD 66,
             RFC 3986, January 2005.

   [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
             Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
             and Recommendations for Internationalized Domain Names
             (IDNs)", RFC 4690, September 2006.

   [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
             Internationalized Email", RFC 4952, July 2007.

   [RFC5894] Klensin, J., "Internationalized Domain Names for
             Applications (IDNA): Background, Explanation, and
             Rationale", RFC 5894, August 2010.

   [Unicode] The Unicode Consortium, "The Unicode Standard, Version
             5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN
             0-321-48091-0. This printed reference has now been
             updated online to reflect additional code points. For
             code points, the reference at the time this document was
             published is to Unicode 5.2.

Appendix A. Summary of Major Changes from IDNA2003

   1.  Update base character set from Unicode 3.2 to Unicode version
       agnostic.

   2.  Separate the definitions for the "registration" and "lookup"
       activities.

   3.  Disallow symbol and punctuation characters except where special
       exceptions are necessary.

   4.  Remove the mapping and normalization steps from the protocol and
       have them, instead, done by the applications themselves,
       possibly in a local fashion, before invoking the protocol.

   5.  Change the way that the protocol specifies which characters are
       allowed in labels from "humans decide what the table of code
       points contains" to "decision about code points are based on
       Unicode properties plus a small exclusion list created by
       humans".

   6.  Introduce the new concept of characters that can be used only in
       specific contexts.

   7.  Allow typical words and names in languages such as Dhivehi and
       Yiddish to be expressed.

   8.  Make bidirectional domain names (delimited strings of labels,
       not just labels standing on their own) display in a less
       surprising fashion, whether they appear in obvious domain name
       contexts or as part of running text in paragraphs.

   9.  Remove the dot separator from the mandatory part of the
       protocol.

   10. Make some currently valid labels that are not actually IDNA
       labels invalid.

Author's Address

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA 02140
   USA

   Phone: +1 617 245 1457
   EMail: john+ietf@jck.com