Network Working Group                                     P. Mockapetris
Request for Comments: 1035                                           ISI
                                                           November 1987
Obsoletes: RFCs 882, 883, 973

            DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION


1. STATUS OF THIS MEMO

This RFC describes the details of the domain system and protocol, and
assumes that the reader is familiar with the concepts discussed in a
companion RFC, "Domain Names - Concepts and Facilities" [RFC-1034].

The domain system is a mixture of functions and data types which are an
official protocol and functions and data types which are still
experimental. Since the domain system is intentionally extensible, new
data types and experimental behavior should always be expected in parts
of the system beyond the official protocol. The official protocol parts
include standard queries, responses and the Internet class RR data
formats (e.g., host addresses). Since the previous RFC set, several
definitions have changed, so some previous definitions are obsolete.

Experimental or obsolete features are clearly marked in these RFCs, and
such information should be used with caution.

The reader is especially cautioned not to depend on the values which
appear in examples to be current or complete, since their purpose is
primarily pedagogical. Distribution of this memo is unlimited.

                           Table of Contents

    1. STATUS OF THIS MEMO
    2. INTRODUCTION
        2.1. Overview
        2.2. Common configurations
        2.3. Conventions
            2.3.1. Preferred name syntax
            2.3.2. Data Transmission Order
            2.3.3. Character Case
            2.3.4. Size limits
    3. DOMAIN NAME SPACE AND RR DEFINITIONS
        3.1. Name space definitions
        3.2. RR definitions
            3.2.1. Format
            3.2.2. TYPE values
            3.2.3. QTYPE values
            3.2.4. CLASS values

Mockapetris                                                     [Page 1]

RFC 1035        Domain Implementation and Specification    November 1987

            3.2.5. QCLASS values
        3.3. Standard RRs
            3.3.1. CNAME RDATA format
            3.3.2. HINFO RDATA format
            3.3.3. MB RDATA format (EXPERIMENTAL)
            3.3.4. MD RDATA format (Obsolete)
            3.3.5. MF RDATA format (Obsolete)
            3.3.6. MG RDATA format (EXPERIMENTAL)
            3.3.7. MINFO RDATA format (EXPERIMENTAL)
            3.3.8. MR RDATA format (EXPERIMENTAL)
            3.3.9. MX RDATA format
            3.3.10. NULL RDATA format (EXPERIMENTAL)
            3.3.11. NS RDATA format
            3.3.12. PTR RDATA format
            3.3.13. SOA RDATA format
            3.3.14. TXT RDATA format
        3.4. ARPA Internet specific RRs
            3.4.1. A RDATA format
            3.4.2. WKS RDATA format
        3.5. IN-ADDR.ARPA domain
        3.6. Defining new types, classes, and special namespaces
    4. MESSAGES
        4.1. Format
            4.1.1. Header section format
            4.1.2. Question section format
            4.1.3. Resource record format
            4.1.4. Message compression
        4.2. Transport
            4.2.1. UDP usage
            4.2.2. TCP usage
    5. MASTER FILES
        5.1. Format
        5.2. Use of master files to define zones
        5.3. Master file example
    6. NAME SERVER IMPLEMENTATION
        6.1. Architecture
            6.1.1. Control
            6.1.2. Database
            6.1.3. Time
        6.2. Standard query processing
        6.3. Zone refresh and reload processing
        6.4. Inverse queries (Optional)
            6.4.1. The contents of inverse queries and responses
            6.4.2. Inverse query and response example
            6.4.3. Inverse query processing
        6.5. Completion queries and responses
    7. RESOLVER IMPLEMENTATION
        7.1. Transforming a user request into a query
        7.2. Sending the queries
        7.3. Processing responses
        7.4. Using the cache
    8. MAIL SUPPORT
        8.1. Mail exchange binding
        8.2. Mailbox binding (Experimental)
    9. REFERENCES and BIBLIOGRAPHY
    Index

2. INTRODUCTION

2.1. Overview

The goal of domain names is to provide a mechanism for naming resources
in such a way that the names are usable in different hosts, networks,
protocol families, internets, and administrative organizations.

From the user's point of view, domain names are useful as arguments to a
local agent, called a resolver, which retrieves information associated
with the domain name. Thus a user might ask for the host address or
mail information associated with a particular domain name. To enable
the user to request a particular type of information, an appropriate
query type is passed to the resolver with the domain name. To the user,
the domain tree is a single information space; the resolver is
responsible for hiding the distribution of data among name servers from
the user.

From the resolver's point of view, the database that makes up the domain
space is distributed among various name servers. Different parts of the
domain space are stored in different name servers, although a particular
data item will be stored redundantly in two or more name servers. The
resolver starts with knowledge of at least one name server. When the
resolver processes a user query it asks a known name server for the
information; in return, the resolver either receives the desired
information or a referral to another name server. Using these
referrals, resolvers learn the identities and contents of other name
servers. Resolvers are responsible for dealing with the distribution of
the domain space and dealing with the effects of name server failure by
consulting redundant databases in other servers.

Name servers manage two kinds of data. The first kind of data is held in
sets called zones; each zone is the complete database for a particular
"pruned" subtree of the domain space. This data is called
authoritative. A name server periodically checks to make sure that its
zones are up to date, and if not, obtains a new copy of updated zones
from master files stored locally or in another name server. The second
kind of data is cached data which was acquired by a local resolver.
This data may be incomplete, but improves the performance of the
retrieval process when non-local data is repeatedly accessed. Cached
data is eventually discarded by a timeout mechanism.
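
As an illustration of the timeout mechanism, the Python sketch below
shows one way a resolver's cache might discard entries once their TTL
has elapsed. The class name TTLCache, the injectable clock, and the
key shape are assumptions made for this example, not part of the
protocol; the zero-TTL behaviour follows the TTL definition in section
3.2.1.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Hypothetical resolver cache whose entries expire after their TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so expiry can be exercised without sleeping.
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[object, float]] = {}

    def add(self, key: Hashable, value: object, ttl: int) -> None:
        # A zero TTL means the data is for the current transaction only,
        # so it is never stored.
        if ttl <= 0:
            return
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Timed out: drop it so the next lookup consults a name server.
            del self._entries[key]
            return None
        return value
```

With a fake clock, an entry added with a TTL of 300 is still returned at
t=299 and is gone at t=300.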

This functional structure isolates the problems of user interface,
failure recovery, and distribution in the resolvers and isolates the
database update and refresh problems in the name servers.

2.2. Common configurations

A host can participate in the domain name system in a number of ways,
depending on whether the host runs programs that retrieve information
from the domain system, name servers that answer queries from other
hosts, or various combinations of both functions. The simplest, and
perhaps most typical, configuration is shown below:

                 Local Host                        |  Foreign
                                                   |
    +---------+               +----------+         |  +--------+
    |         | user queries  |          |queries  |  |        |
    |  User   |-------------->|          |---------|->|Foreign |
    | Program |               | Resolver |         |  |  Name  |
    |         |<--------------|          |<--------|--| Server |
    |         | user responses|          |responses|  |        |
    +---------+               +----------+         |  +--------+
                                |     A            |
                cache additions |     | references |
                                V     |            |
                              +----------+         |
                              |  cache   |         |
                              +----------+         |

User programs interact with the domain name space through resolvers; the
format of user queries and user responses is specific to the host and
its operating system. User queries will typically be operating system
calls, and the resolver and its cache will be part of the host operating
system. Less capable hosts may choose to implement the resolver as a
subroutine to be linked in with every program that needs its services.
Resolvers answer user queries with information they acquire via queries
to foreign name servers and the local cache.

Note that the resolver may have to make several queries to several
different foreign name servers to answer a particular user query, and
hence the resolution of a user query may involve several network
accesses and an arbitrary amount of time. The queries to foreign name
servers and the corresponding responses have a standard format described
in this memo, and may be datagrams.

Depending on its capabilities, a name server could be a stand alone
program on a dedicated machine or a process or processes on a large
timeshared host. A simple configuration might be:

                 Local Host                        |  Foreign
                                                   |
      +---------+                                  |
     /         /|                                  |
    +---------+ |             +----------+         |  +--------+
    |         | |             |          |responses|  |        |
    |         | |             |   Name   |---------|->|Foreign |
    |  Master |-------------->|  Server  |         |  |Resolver|
    |  files  | |             |          |<--------|--|        |
    |         |/              |          | queries |  +--------+
    +---------+               +----------+         |

Here a primary name server acquires information about one or more zones
by reading master files from its local file system, and answers queries
about those zones that arrive from foreign resolvers.

The DNS requires that all zones be redundantly supported by more than
one name server. Designated secondary servers can acquire zones and
check for updates from the primary server using the zone transfer
protocol of the DNS. This configuration is shown below:

                 Local Host                        |  Foreign
                                                   |
      +---------+                                  |
     /         /|                                  |
    +---------+ |             +----------+         |  +--------+
    |         | |             |          |responses|  |        |
    |         | |             |   Name   |---------|->|Foreign |
    |  Master |-------------->|  Server  |         |  |Resolver|
    |  files  | |             |          |<--------|--|        |
    |         |/              |          | queries |  +--------+
    +---------+               +----------+         |
                                A     |maintenance |  +--------+
                                |     +------------|->|        |
                                |      queries     |  |Foreign |
                                |                  |  |  Name  |
                                +------------------|--| Server |
                             maintenance responses |  +--------+

In this configuration, the name server periodically establishes a
virtual circuit to a foreign name server to acquire a copy of a zone or
to check that an existing copy has not changed. The messages sent for
these maintenance activities follow the same form as queries and
responses, but the message sequences are somewhat different.

The information flow in a host that supports all aspects of the domain
name system is shown below:

                 Local Host                        |  Foreign
                                                   |
    +---------+               +----------+         |  +--------+
    |         | user queries  |          |queries  |  |        |
    |  User   |-------------->|          |---------|->|Foreign |
    | Program |               | Resolver |         |  |  Name  |
    |         |<--------------|          |<--------|--| Server |
    |         | user responses|          |responses|  |        |
    +---------+               +----------+         |  +--------+
                                |     A            |
                cache additions |     | references |
                                V     |            |
                              +----------+         |
                              |  Shared  |         |
                              | database |         |
                              +----------+         |
                                A     |            |
      +---------+     refreshes |     | references |
     /         /|               |     V            |
    +---------+ |             +----------+         |  +--------+
    |         | |             |          |responses|  |        |
    |         | |             |   Name   |---------|->|Foreign |
    |  Master |-------------->|  Server  |         |  |Resolver|
    |  files  | |             |          |<--------|--|        |
    |         |/              |          | queries |  +--------+
    +---------+               +----------+         |
                                A     |maintenance |  +--------+
                                |     +------------|->|        |
                                |      queries     |  |Foreign |
                                |                  |  |  Name  |
                                +------------------|--| Server |
                             maintenance responses |  +--------+

The shared database holds domain space data for the local name server
and resolver. The contents of the shared database will typically be a
mixture of authoritative data maintained by the periodic refresh
operations of the name server and cached data from previous resolver
requests. The structure of the domain data and the necessity for
synchronization between name servers and resolvers imply the general
characteristics of this database, but the actual format is up to the
local implementor.

Information flow can also be tailored so that a group of hosts act
together to optimize activities. Sometimes this is done to offload less
capable hosts so that they do not have to implement a full resolver.
This can be appropriate for PCs or hosts which want to minimize the
amount of new network code which is required. This scheme can also
allow a group of hosts to share a small number of caches rather than
maintaining a large number of separate caches, on the premise that the
centralized caches will have a higher hit ratio. In either case,
resolvers are replaced with stub resolvers which act as front ends to
resolvers located in a recursive server in one or more name servers
known to perform that service:

                 Local Hosts                       |  Foreign
                                                   |
    +---------+                                    |
    |         | responses                          |
    | Stub    |<--------------------+              |
    | Resolver|                     |              |
    |         |----------------+    |              |
    +---------+ recursive      |    |              |
                queries        |    |              |
                               V    |              |
    +---------+ recursive     +----------+         |  +--------+
    |         | queries       |          |queries  |  |        |
    | Stub    |-------------->| Recursive|---------|->|Foreign |
    | Resolver|               | Server   |         |  |  Name  |
    |         |<--------------|          |<--------|--| Server |
    +---------+ responses     |          |responses|  |        |
                              +----------+         |  +--------+
                              |  Central |         |
                              |  cache   |         |
                              +----------+         |

In any case, note that domain components are always replicated for
reliability whenever possible.

2.3. Conventions

The domain system has several conventions dealing with low-level, but
fundamental, issues. While the implementor is free to violate these
conventions WITHIN HIS OWN SYSTEM, he must observe these conventions in
ALL behavior observed from other hosts.

2.3.1. Preferred name syntax

The DNS specifications attempt to be as general as possible in the rules
for constructing domain names. The idea is that the name of any
existing object can be expressed as a domain name with minimal changes.
However, when assigning a domain name for an object, the prudent user
will select a name which satisfies both the rules of the domain system
and any existing rules for the object, whether these rules are published
or implied by existing programs.

For example, when naming a mail domain, the user should satisfy both the
rules of this memo and those in RFC-822. When creating a new host name,
the old rules for HOSTS.TXT should be followed. This avoids problems
when old software is converted to use domain names.

The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET).

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain
names, no significance is attached to the case. That is, two names with
the same spelling but different case are to be treated as if identical.

The labels must follow the rules for ARPANET host names. They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen. There are also some
restrictions on the length. Labels must be 63 characters or less.

For example, the following strings identify hosts in the Internet:

A.ISI.EDU  XX.LCS.MIT.EDU  SRI-NIC.ARPA
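
The grammar above can be checked mechanically. The Python sketch below
is one possible rendering, assuming names arrive as dotted ASCII text;
the function names are hypothetical. The regular expression encodes
the <label> rule, the length test adds the 63-character limit, and the
" " alternative of <domain> is read here as the empty name.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# i.e. a letter, optionally followed by letters/digits/hyphens that end
# in a letter or digit. The character classes are ASCII-only.
_LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_preferred_label(label: str) -> bool:
    """True if label matches the preferred syntax and is at most 63 characters."""
    return len(label) <= 63 and _LABEL_RE.fullmatch(label) is not None


def is_preferred_name(name: str) -> bool:
    """True if a dotted name follows the preferred syntax (empty name allowed)."""
    if name == "":
        return True
    return all(is_preferred_label(label) for label in name.split("."))
```

The three example host names above pass, while a label such as 3COM
(leading digit) or A- (trailing hyphen) is rejected, as is an empty
label produced by two adjacent dots.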

2.3.2. Data Transmission Order

The order of transmission of the header and data described in this
document is resolved to the octet level. Whenever a diagram shows a
group of octets, the order of transmission of those octets is the normal
order in which they are read in English. For example, in the following
diagram, the octets are transmitted in the order they are numbered.

                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |       1       |       2       |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |       3       |       4       |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |       5       |       6       |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Whenever an octet represents a numeric quantity, the left most bit in
the diagram is the high order or most significant bit. That is, the bit
labeled 0 is the most significant bit. For example, the following
diagram represents the value 170 (decimal).

                 0 1 2 3 4 5 6 7
                +-+-+-+-+-+-+-+-+
                |1 0 1 0 1 0 1 0|
                +-+-+-+-+-+-+-+-+

Similarly, whenever a multi-octet field represents a numeric quantity
the left most bit of the whole field is the most significant bit. When
a multi-octet quantity is transmitted the most significant octet is
transmitted first.
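
In Python, for instance, the struct module's "!" prefix selects network
byte order, which is exactly this most-significant-octet-first rule.
The helper names below are illustrative only.

```python
import struct


def pack_u16(value: int) -> bytes:
    """16-bit field, most significant octet first ("!" = network byte order)."""
    return struct.pack("!H", value)


def pack_u32(value: int) -> bytes:
    """32-bit field, most significant octet first."""
    return struct.pack("!I", value)


def octet_bits(octet: int) -> str:
    """Bits of one octet in diagram order: bit 0 (leftmost) is most significant."""
    return format(octet, "08b")
```

Here octet_bits(170) yields "10101010", matching the diagram above, and
pack_u16(0x1234) yields the octet 0x12 followed by the octet 0x34.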

2.3.3. Character Case

For all parts of the DNS that are part of the official protocol, all
comparisons between character strings (e.g., labels, domain names, etc.)
are done in a case-insensitive manner. At present, this rule is in
force throughout the domain system without exception. However, future
additions beyond current usage may need to use the full binary octet
capabilities in names, so attempts to store domain names in 7-bit ASCII
or use of special bytes to terminate labels, etc., should be avoided.

When data enters the domain system, its original case should be
preserved whenever possible. In certain circumstances this cannot be
done. For example, if two RRs are stored in a database, one at x.y and
one at X.Y, they are actually stored at the same place in the database,
and hence only one casing would be preserved. The basic rule is that
case can be discarded only when data is used to define structure in a
database, and two names are identical when compared in a case
insensitive manner.
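
Section 3.1 states the comparison rule precisely: letters compare
case-insensitively assuming ASCII, and all other octet values must match
exactly. A minimal Python sketch, assuming labels are held as bytes, can
rely on bytes.lower(), which maps only the ASCII letters A-Z to a-z;
str.lower() would be the wrong tool because it applies Unicode case
rules. The comparison folds case without altering the stored data, so
the original case can still be preserved. The function names are
hypothetical.

```python
from typing import Sequence


def labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two labels case-insensitively.

    bytes.lower() only maps the ASCII letters A-Z to a-z; digits, hyphen,
    and any other 8-bit values are left alone and so must match exactly.
    """
    return a.lower() == b.lower()


def names_equal(a: Sequence[bytes], b: Sequence[bytes]) -> bool:
    """Two names are equal when they have the same labels, compared as above."""
    return len(a) == len(b) and all(labels_equal(x, y) for x, y in zip(a, b))
```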

Loss of case sensitive data must be minimized. Thus while data for x.y
and X.Y may both be stored under a single location x.y or X.Y, data for
a.x and B.X would never be stored under A.x, A.X, b.x, or b.X. In
general, this preserves the case of the first label of a domain name,
but forces standardization of interior node labels.

Systems administrators who enter data into the domain database should
take care to represent the data they supply to the domain system in a
case-consistent manner if their system is case-sensitive. The data
distribution system in the domain system will ensure that consistent
representations are preserved.

2.3.4. Size limits

Various objects and parameters in the DNS have size limits. They are
listed below. Some could be easily changed, others are more
fundamental.

labels          63 octets or less

names           255 octets or less

TTL             positive values of a signed 32 bit number.

UDP messages    512 octets or less

3. DOMAIN NAME SPACE AND RR DEFINITIONS

3.1. Name space definitions

Domain names in messages are expressed in terms of a sequence of labels.
Each label is represented as a one octet length field followed by that
number of octets. Since every domain name ends with the null label of
the root, a domain name is terminated by a length byte of zero. The
high order two bits of every length octet must be zero, and the
remaining six bits of the length field limit the label to 63 octets or
less.

To simplify implementations, the total length of a domain name (i.e.,
label octets and label length octets) is restricted to 255 octets or
less.
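
A minimal Python sketch of this encoding follows. It assumes the input
is dotted ASCII text with no escaped dots and treats "" or "." as the
root; the function name is hypothetical, and message compression
(section 4.1.4) is not attempted.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels ending in a zero octet."""
    if name in ("", "."):
        labels = []  # the root: only the terminating null label
    else:
        labels = (name[:-1] if name.endswith(".") else name).split(".")

    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            # An empty label would look like the root; a longer one would need
            # the two high order bits of the length octet, which must be zero.
            raise ValueError(f"label length {len(raw)} not in 1..63: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # null label of the root

    # The 255-octet limit counts both label octets and length octets.
    if len(out) > 255:
        raise ValueError(f"encoded name is {len(out)} octets; the limit is 255")
    return bytes(out)
```

For example, F.ISI.ARPA encodes as the octets 1 "F" 3 "ISI" 4 "ARPA" 0,
twelve octets in all.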

Although labels can contain any 8 bit values in octets that make up a
label, it is strongly recommended that labels follow the preferred
syntax described in section 2.3.1, which is compatible with
existing host naming conventions. Name servers and resolvers must
compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII
with zero parity. Non-alphabetic codes must match exactly.

3.2. RR definitions

3.2.1. Format

All RRs have the same top level format shown below:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                                               /
    /                      NAME                     /
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     CLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TTL                      |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                   RDLENGTH                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                     RDATA                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

NAME            an owner name, i.e., the name of the node to which this
                resource record pertains.

TYPE            two octets containing one of the RR TYPE codes.

CLASS           two octets containing one of the RR CLASS codes.

TTL             a 32 bit signed integer that specifies the time interval
                that the resource record may be cached before the source
                of the information should again be consulted. Zero
                values are interpreted to mean that the RR can only be
                used for the transaction in progress, and should not be
                cached. For example, SOA records are always distributed
                with a zero TTL to prohibit caching. Zero values can
                also be used for extremely volatile data.

RDLENGTH        an unsigned 16 bit integer that specifies the length in
                octets of the RDATA field.

RDATA           a variable length string of octets that describes the
                resource. The format of this information varies
                according to the TYPE and CLASS of the resource record.
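
Putting the fields together, the Python sketch below packs one RR in the
top level format above. It is a simplified illustration under stated
assumptions: the owner name is dotted ASCII text whose labels are
already valid, no name compression is performed, and the caller
supplies RDATA already encoded for the given TYPE and CLASS. The
function names are hypothetical. The TTL check follows the size limits
in section 2.3.4, and zero is accepted because the TTL definition above
gives it a meaning.

```python
import struct


def encode_name(name: str) -> bytes:
    """Uncompressed <domain-name>; assumes labels of 1..63 ASCII characters."""
    parts = [p for p in name.split(".") if p]  # "" or "." yields the root
    return b"".join(bytes([len(p)]) + p.encode("ascii") for p in parts) + b"\x00"


def pack_rr(name: str, rtype: int, rclass: int, ttl: int, rdata: bytes) -> bytes:
    """Pack NAME, TYPE, CLASS, TTL, RDLENGTH and RDATA in network byte order."""
    if not 0 <= ttl <= 0x7FFFFFFF:
        raise ValueError("TTL must be a non-negative signed 32-bit value")
    if len(rdata) > 0xFFFF:
        raise ValueError("RDATA too long for a 16-bit RDLENGTH")
    # "!HHiH": TYPE and CLASS (unsigned 16-bit), TTL (signed 32-bit), RDLENGTH.
    fixed = struct.pack("!HHiH", rtype, rclass, ttl, len(rdata))
    return encode_name(name) + fixed + rdata
```

For example, an A record (TYPE 1) in class IN (1) for F.ISI.ARPA with a
TTL of 3600 and an illustrative 4-octet address packs to 12 octets of
name, 10 octets of fixed fields, and 4 octets of RDATA.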

3.2.2. TYPE values

TYPE fields are used in resource records. Note that these types are a
subset of QTYPEs.

TYPE            value and meaning

A               1 a host address

NS              2 an authoritative name server

MD              3 a mail destination (Obsolete - use MX)

MF              4 a mail forwarder (Obsolete - use MX)

CNAME           5 the canonical name for an alias

SOA             6 marks the start of a zone of authority

MB              7 a mailbox domain name (EXPERIMENTAL)

MG              8 a mail group member (EXPERIMENTAL)

MR              9 a mail rename domain name (EXPERIMENTAL)

NULL            10 a null RR (EXPERIMENTAL)

WKS             11 a well known service description

PTR             12 a domain name pointer

HINFO           13 host information

MINFO           14 mailbox or mail list information

MX              15 mail exchange

TXT             16 text strings
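
One convenient way to carry this table in code is an enumeration. The
Python sketch below is illustrative; the class and function names are
hypothetical. A TYPE field read off the wire (two octets, most
significant first) converts directly, and codes outside the table are
rejected.

```python
from enum import IntEnum


class RRType(IntEnum):
    """TYPE codes from the table above."""
    A = 1
    NS = 2
    MD = 3      # obsolete: use MX
    MF = 4      # obsolete: use MX
    CNAME = 5
    SOA = 6
    MB = 7      # experimental
    MG = 8      # experimental
    MR = 9      # experimental
    NULL = 10   # experimental
    WKS = 11
    PTR = 12
    HINFO = 13
    MINFO = 14
    MX = 15
    TXT = 16


def type_from_wire(two_octets: bytes) -> RRType:
    """Decode a TYPE field; raises ValueError for codes not in the table."""
    return RRType(int.from_bytes(two_octets, "big"))
```

Note that a QTYPE-only code such as AXFR (252) is not a TYPE, so
type_from_wire rejects it.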

3.2.3. QTYPE values

QTYPE fields appear in the question part of a query. QTYPES are a
superset of TYPEs, hence all TYPEs are valid QTYPEs. In addition, the
following QTYPEs are defined:

AXFR            252 A request for a transfer of an entire zone

MAILB           253 A request for mailbox-related records (MB, MG or MR)

MAILA           254 A request for mail agent RRs (Obsolete - see MX)

*               255 A request for all records

3.2.4. CLASS values

CLASS fields appear in resource records. The following CLASS mnemonics
and values are defined:

IN              1 the Internet

CS              2 the CSNET class (Obsolete - used only for examples in
                some obsolete RFCs)

CH              3 the CHAOS class

HS              4 Hesiod [Dyer 87]

3.2.5. QCLASS values

QCLASS fields appear in the question section of a query. QCLASS values
are a superset of CLASS values; every CLASS is a valid QCLASS. In
addition to CLASS values, the following QCLASSes are defined:

*               255 any class
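
The superset relationship suggests a simple matching rule when a
question is compared with stored RRs. The Python sketch below assumes
that a QCLASS of * (255) matches an RR of any class and that any other
QCLASS must equal the RR's CLASS exactly; the names are hypothetical.

```python
from enum import IntEnum


class RRClass(IntEnum):
    IN = 1  # the Internet
    CS = 2  # CSNET (obsolete)
    CH = 3  # CHAOS
    HS = 4  # Hesiod


QCLASS_ANY = 255  # "*": any class


def qclass_matches(qclass: int, rr_class: int) -> bool:
    """True if an RR of class rr_class is relevant to a question's QCLASS."""
    return qclass == QCLASS_ANY or qclass == rr_class
```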

3.3. Standard RRs

The following RR definitions are expected to occur, at least
potentially, in all classes. In particular, NS, SOA, CNAME, and PTR
will be used in all classes, and have the same format in all classes.
Because their RDATA format is known, all domain names in the RDATA
section of these RRs may be compressed.

<domain-name> is a domain name represented as a series of labels, and
terminated by a label with zero length. <character-string> is a single
length octet followed by that number of characters. <character-string>
is treated as binary information, and can be up to 256 characters in
length (including the length octet).
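
A <character-string> is easy to encode and decode. The Python sketch
below is a minimal illustration with hypothetical function names; the
255-octet data limit follows from the single length octet (256 octets
including that octet).

```python
from typing import Tuple


def encode_character_string(data: bytes) -> bytes:
    """A length octet followed by that many octets of binary data."""
    if len(data) > 255:
        # One length octet cannot describe more than 255 data octets.
        raise ValueError(f"{len(data)} octets do not fit in a <character-string>")
    return bytes([len(data)]) + data


def decode_character_string(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (data, offset just past the string) for the string at buf[offset]."""
    if offset >= len(buf):
        raise ValueError("no length octet at offset")
    end = offset + 1 + buf[offset]
    if end > len(buf):
        raise ValueError("truncated <character-string>")
    return buf[offset + 1:end], end
```

Returning the next offset lets a caller walk several consecutive
strings in one RDATA field.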

3.3.1. CNAME RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                     CNAME                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

CNAME           A <domain-name> which specifies the canonical or primary
                name for the owner. The owner name is an alias.

CNAME RRs cause no additional section processing, but name servers may
choose to restart the query at the canonical name in certain cases. See
the description of name server logic in [RFC-1034] for details.

3.3.2. HINFO RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                      CPU                      /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                       OS                      /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

CPU             A <character-string> which specifies the CPU type.

OS              A <character-string> which specifies the operating
                system type.

Standard values for CPU and OS can be found in [RFC-1010].

HINFO records are used to acquire general information about a host. The
main use is for protocols such as FTP that can use special procedures
when talking between machines or operating systems of the same type.
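
Since HINFO RDATA is just two consecutive <character-string>s, it can
be built and parsed as in the Python sketch below. The function names
are hypothetical, and the values "VAX-11/780" and "UNIX" are chosen
for illustration only; check [RFC-1010] for the standard spellings.

```python
from typing import Tuple


def encode_hinfo(cpu: bytes, os_type: bytes) -> bytes:
    """HINFO RDATA: the CPU <character-string> followed by the OS one."""
    out = bytearray()
    for field in (cpu, os_type):
        if len(field) > 255:
            raise ValueError("field too long for a <character-string>")
        out.append(len(field))
        out += field
    return bytes(out)


def decode_hinfo(rdata: bytes) -> Tuple[bytes, bytes]:
    """Split HINFO RDATA back into (CPU, OS); rejects truncated or extra data."""
    fields = []
    offset = 0
    for _ in range(2):
        if offset >= len(rdata):
            raise ValueError("HINFO RDATA ends before both fields")
        end = offset + 1 + rdata[offset]
        if end > len(rdata):
            raise ValueError("truncated <character-string> in HINFO RDATA")
        fields.append(rdata[offset + 1:end])
        offset = end
    if offset != len(rdata):
        raise ValueError("unexpected octets after the OS field")
    return fields[0], fields[1]
```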

3.3.3. MB RDATA format (EXPERIMENTAL)

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    MADNAME                    /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

MADNAME         A <domain-name> which specifies a host which has the
                specified mailbox.

MB records cause additional section processing which looks up A type
RRs corresponding to MADNAME.

3.3.4. MD RDATA format (Obsolete)

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    MADNAME                    /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

MADNAME         A <domain-name> which specifies a host which has a mail
                agent for the domain which should be able to deliver
                mail for the domain.

MD records cause additional section processing which looks up an A type
record corresponding to MADNAME.

MD is obsolete. See the definition of MX and [RFC-974] for details of
the new scheme. The recommended policy for dealing with MD RRs found in
a master file is to reject them, or to convert them to MX RRs with a
preference of 0.

3.3.5. MF RDATA format (Obsolete)

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    MADNAME                    /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

MADNAME         A <domain-name> which specifies a host which has a mail
                agent for the domain which will accept mail for
                forwarding to the domain.

MF records cause additional section processing which looks up an A type
record corresponding to MADNAME.

MF is obsolete. See the definition of MX and [RFC-974] for details of
the new scheme. The recommended policy for dealing with MF RRs found in
a master file is to reject them, or to convert them to MX RRs with a
preference of 10.
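
For a master file loader that takes the conversion route rather than
rejecting these records, the mapping is small. The Python sketch below
is hypothetical: MXRecord and the function name are invented for
illustration, and a real loader would also need to parse the master
file syntax described in section 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MXRecord:
    """Hypothetical in-memory form of an MX RR read from a master file."""
    owner: str
    preference: int
    exchange: str


# Preferences recommended in sections 3.3.4 and 3.3.5: 0 for MD, 10 for MF.
_PREFERENCE_FOR_OBSOLETE = {"MD": 0, "MF": 10}


def convert_obsolete_mail_rr(owner: str, rr_type: str, madname: str) -> MXRecord:
    """Turn an MD or MF RR (owner, type, MADNAME) into the equivalent MX RR."""
    try:
        preference = _PREFERENCE_FOR_OBSOLETE[rr_type.upper()]
    except KeyError:
        raise ValueError(f"{rr_type!r} is not MD or MF") from None
    return MXRecord(owner=owner, preference=preference, exchange=madname)
```

Because lower MX preference values are preferred, a host converted from
an MD RR (0) is tried before one converted from an MF RR (10).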

3.3.6. MG RDATA format (EXPERIMENTAL)

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    MGMNAME                    /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

MGMNAME         A <domain-name> which specifies a mailbox which is a
                member of the mail group specified by the domain name.

MG records cause no additional section processing.

3.3.7. MINFO RDATA format (EXPERIMENTAL)

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    RMAILBX                    /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    EMAILBX                    /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

RMAILBX         A <domain-name> which specifies a mailbox which is
                responsible for the mailing list or mailbox. If this
                domain name names the root, the owner of the MINFO RR is
                responsible for itself. Note that many existing mailing
                lists use a mailbox X-request for the RMAILBX field of
                mailing list X, e.g., Msgroup-request for Msgroup. This
                field provides a more general mechanism.

EMAILBX         A <domain-name> which specifies a mailbox which is to
                receive error messages related to the mailing list or
                mailbox specified by the owner of the MINFO RR (similar
                to the ERRORS-TO: field which has been proposed). If
                this domain name names the root, errors should be
                returned to the sender of the message.

MINFO records cause no additional section processing. Although these
records can be associated with a simple mailbox, they are usually used
with a mailing list.

3.3.8. MR RDATA format (EXPERIMENTAL)

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    NEWNAME                    /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

NEWNAME         A <domain-name> which specifies a mailbox which is the
                proper rename of the specified mailbox.

MR records cause no additional section processing. The main use for MR
is as a forwarding entry for a user who has moved to a different
mailbox.

3.3.9. MX RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                  PREFERENCE                   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                   EXCHANGE                    /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

PREFERENCE      A 16 bit integer which specifies the preference given to
                this RR among others at the same owner. Lower values
                are preferred.

EXCHANGE        A <domain-name> which specifies a host willing to act as
                a mail exchange for the owner name.

MX records cause type A additional section processing for the host
specified by EXCHANGE. The use of MX RRs is explained in detail in
[RFC-974].
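
Here is a minimal Python sketch of reading MX RDATA and ordering the exchanges so that lower PREFERENCE values come first. It assumes EXCHANGE is stored without compression pointers, although section 4.1.4 permits them in real messages. The function names are illustrative and not taken from any existing resolver.

```python
import struct
from typing import List, Tuple


def read_uncompressed_name(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read length-prefixed labels up to the root's zero octet.

    Hypothetical helper: returns the dotted name and the offset just past it.
    """
    labels: List[str] = []
    while True:
        if pos >= len(buf):
            raise ValueError("name runs past end of RDATA")
        length = buf[pos]
        pos += 1
        if length == 0:
            return ".".join(labels) + ".", pos
        if length > 63:
            raise ValueError("compression pointers are not handled in this sketch")
        if pos + length > len(buf):
            raise ValueError("label runs past end of RDATA")
        labels.append(buf[pos:pos + length].decode("ascii"))
        pos += length


def parse_mx_rdata(rdata: bytes) -> Tuple[int, str]:
    """Split MX RDATA into (PREFERENCE, EXCHANGE)."""
    (preference,) = struct.unpack_from("!H", rdata, 0)  # 16 bit, network order
    exchange, end = read_uncompressed_name(rdata, 2)
    if end != len(rdata):
        raise ValueError("trailing octets after EXCHANGE")
    return preference, exchange


def order_exchanges(rdatas: List[bytes]) -> List[str]:
    """Return exchange host names, most preferred (lowest value) first."""
    parsed = [parse_mx_rdata(r) for r in rdatas]
    return [name for _, name in sorted(parsed, key=lambda p: p[0])]
```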
934934+935935+3.3.10. NULL RDATA format (EXPERIMENTAL)
936936+937937+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
938938+ / <anything> /
939939+ / /
940940+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
941941+942942+Anything at all may be in the RDATA field so long as it is 65535 octets
943943+or less.

NULL records cause no additional section processing. NULL RRs are not
allowed in master files. NULLs are used as placeholders in some
experimental extensions of the DNS.

3.3.11. NS RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                    NSDNAME                    /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

NSDNAME         A <domain-name> which specifies a host which should be
                authoritative for the specified class and domain.

NS records cause both the usual additional section processing to locate
a type A record, and, when used in a referral, a special search of the
zone in which they reside for glue information.

The NS RR states that the named host should be expected to have a zone
starting at owner name of the specified class. Note that the class may
not indicate the protocol family which should be used to communicate
with the host, although it is typically a strong hint. For example,
hosts which are name servers for either Internet (IN) or Hesiod (HS)
class information are normally queried using IN class protocols.

3.3.12. PTR RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                   PTRDNAME                    /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

PTRDNAME        A <domain-name> which points to some location in the
                domain name space.

PTR records cause no additional section processing. These RRs are used
in special domains to point to some other location in the domain space.
These records are simple data, and don't imply any special processing
similar to that performed by CNAME, which identifies aliases. See the
description of the IN-ADDR.ARPA domain for an example.

3.3.13. SOA RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                     MNAME                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                     RNAME                     /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    SERIAL                     |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    REFRESH                    |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     RETRY                     |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    EXPIRE                     |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    MINIMUM                    |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

MNAME           The <domain-name> of the name server that was the
                original or primary source of data for this zone.

RNAME           A <domain-name> which specifies the mailbox of the
                person responsible for this zone.

SERIAL          The unsigned 32 bit version number of the original copy
                of the zone. Zone transfers preserve this value. This
                value wraps and should be compared using sequence space
                arithmetic.

REFRESH         A 32 bit time interval before the zone should be
                refreshed.

RETRY           A 32 bit time interval that should elapse before a
                failed refresh should be retried.

EXPIRE          A 32 bit time value that specifies the upper limit on
                the time interval that can elapse before the zone is no
                longer authoritative.
10551055+10561056+10571057+10581058+10591059+10601060+Mockapetris [Page 19]
10611061+10621062+RFC 1035 Domain Implementation and Specification November 1987

MINIMUM         The unsigned 32 bit minimum TTL field that should be
                exported with any RR from this zone.

SOA records cause no additional section processing.

All times are in units of seconds.

Most of these fields are pertinent only for name server maintenance
operations. However, MINIMUM is used in all query operations that
retrieve RRs from a zone. Whenever an RR is sent in a response to a
query, the TTL field is set to the maximum of the TTL field from the RR
and the MINIMUM field in the appropriate SOA. Thus MINIMUM is a lower
bound on the TTL field for all RRs in a zone. Note that this use of
MINIMUM should occur when the RRs are copied into the response and not
when the zone is loaded from a master file or via a zone transfer. The
reason for this provision is to allow future dynamic update facilities
to change the SOA RR with known semantics.
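
The fixed part of the SOA RDATA and the two rules stated for it can be sketched in Python as follows. The sketch makes three assumptions. First, MNAME and RNAME have already been parsed, so parse_soa_tail starts at the 20 octets that follow them. Second, serial_newer uses the 32 bit wraparound comparison later written down in RFC 1982. Third, response_ttl applies the MINIMUM floor exactly as this section describes it. All names are hypothetical.

```python
import struct
from typing import NamedTuple


class SOATimers(NamedTuple):
    # Hypothetical container for the five fields after MNAME and RNAME.
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa_tail(tail: bytes) -> SOATimers:
    """Unpack SERIAL, REFRESH, RETRY, EXPIRE and MINIMUM (unsigned 32 bit each)."""
    if len(tail) != 20:
        raise ValueError("the SOA fields after RNAME occupy exactly 20 octets")
    return SOATimers(*struct.unpack("!5I", tail))


def serial_newer(s1: int, s2: int) -> bool:
    """True if serial s1 follows s2 in 32 bit sequence space.

    The difference is taken modulo 2**32 and counts as "newer" only when it
    falls in the lower half of the space; a difference of exactly 2**31 is
    ambiguous and is reported as not newer.
    """
    diff = (s1 - s2) % 2**32
    return 0 < diff < 2**31


def response_ttl(rr_ttl: int, soa: SOATimers) -> int:
    """TTL placed in a response: MINIMUM acts as a lower bound."""
    return max(rr_ttl, soa.minimum)
```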

3.3.14. TXT RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                   TXT-DATA                    /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

TXT-DATA        One or more <character-string>s.

TXT RRs are used to hold descriptive text. The semantics of the text
depend on the domain where it is found.
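
TXT RDATA can be split and rebuilt as shown in the sketch below. It assumes the usual <character-string> layout: one length octet followed by that many octets. The function names are illustrative, and the "one or more" rule above is enforced in both directions.

```python
from typing import List

# Hypothetical helpers; not taken from any DNS library.


def parse_txt_rdata(rdata: bytes) -> List[bytes]:
    """Split TXT-DATA into its <character-string>s."""
    strings: List[bytes] = []
    pos = 0
    while pos < len(rdata):
        length = rdata[pos]            # a single length octet, so at most 255
        end = pos + 1 + length
        if end > len(rdata):
            raise ValueError("character-string runs past the end of RDATA")
        strings.append(rdata[pos + 1:end])
        pos = end
    if not strings:
        raise ValueError("TXT-DATA holds one or more <character-string>s")
    return strings


def build_txt_rdata(strings: List[bytes]) -> bytes:
    """Concatenate length-prefixed <character-string>s."""
    if not strings:
        raise ValueError("TXT-DATA holds one or more <character-string>s")
    out = bytearray()
    for s in strings:
        if len(s) > 255:
            raise ValueError("a <character-string> holds at most 255 octets")
        out.append(len(s))
        out += s
    return bytes(out)
```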

3.4. Internet specific RRs

3.4.1. A RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ADDRESS                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

ADDRESS         A 32 bit Internet address.

Hosts that have multiple Internet addresses will have multiple A
records.

A records cause no additional section processing. The RDATA section of
an A line in a master file is an Internet address expressed as four
decimal numbers separated by dots without any imbedded spaces (e.g.,
"10.2.0.52" or "192.0.5.6").
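
Here is a small Python sketch of converting between the master-file text form and the 4 octet RDATA. Matching the rule above, it accepts only four dotted decimal numbers with no embedded spaces. The function names are hypothetical.

```python
def a_rdata_from_text(text: str) -> bytes:
    """Master-file A RDATA (e.g. "10.2.0.52") to the 4 octet wire form."""
    parts = text.split(".")
    # Exactly four ASCII decimal fields; spaces or empty fields are rejected.
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not four dotted decimal numbers: {text!r}")
    octets = [int(p) for p in parts]
    if any(o > 255 for o in octets):
        raise ValueError(f"octet out of range in {text!r}")
    return bytes(octets)


def a_rdata_to_text(rdata: bytes) -> str:
    """4 octet wire form back to dotted decimal text."""
    if len(rdata) != 4:
        raise ValueError("A RDATA must be exactly 4 octets")
    return ".".join(str(o) for o in rdata)
```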

3.4.2. WKS RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ADDRESS                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |       PROTOCOL        |                       |
    +--+--+--+--+--+--+--+--+                       |
    |                                               |
    /                   <BIT MAP>                   /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

ADDRESS         A 32 bit Internet address

PROTOCOL        An 8 bit IP protocol number

<BIT MAP>       A variable length bit map. The bit map must be a
                multiple of 8 bits long.

The WKS record is used to describe the well known services supported by
a particular protocol on a particular internet address. The PROTOCOL
field specifies an IP protocol number, and the bit map has one bit per
port of the specified protocol. The first bit corresponds to port 0,
the second to port 1, etc. If the bit map does not include a bit for a
port of interest, that bit is assumed zero. The appropriate values
and mnemonics for ports and protocols are specified in [RFC-1010].

For example, if PROTOCOL=TCP (6), the 26th bit corresponds to TCP port
25 (SMTP). If this bit is set, an SMTP server should be listening on TCP
port 25; if zero, SMTP service is not supported on the specified
address.
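
The bit numbering above can be sketched in Python. Port 0 is the first, most significant bit of the first octet, so TCP port 25 falls in the fourth octet. build_wks_bitmap, port_listed and wks_rdata are hypothetical helpers. The sketch pads the map to a whole number of octets, as the format requires.

```python
from typing import Iterable


def build_wks_bitmap(ports: Iterable[int]) -> bytes:
    """Set the bit for each port: octet port // 8, bit port % 8 from the MSB."""
    wanted = sorted(set(ports))
    if not wanted:
        return b""
    if wanted[0] < 0 or wanted[-1] > 65535:
        raise ValueError("port numbers are 0-65535")
    bitmap = bytearray(wanted[-1] // 8 + 1)   # a whole number of octets
    for port in wanted:
        bitmap[port // 8] |= 0x80 >> (port % 8)
    return bytes(bitmap)


def port_listed(bitmap: bytes, port: int) -> bool:
    """Ports past the end of the bit map are assumed zero."""
    index = port // 8
    if index >= len(bitmap):
        return False
    return bool(bitmap[index] & (0x80 >> (port % 8)))


def wks_rdata(address: bytes, protocol: int, ports: Iterable[int]) -> bytes:
    """ADDRESS, then PROTOCOL, then the bit map."""
    if len(address) != 4 or not 0 <= protocol <= 255:
        raise ValueError("need a 4 octet address and an 8 bit protocol number")
    return address + bytes([protocol]) + build_wks_bitmap(ports)
```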

The purpose of WKS RRs is to provide availability information for
servers for TCP and UDP. If a server supports both TCP and UDP, or has
multiple Internet addresses, then multiple WKS RRs are used.

WKS RRs cause no additional section processing.

In master files, both ports and protocols are expressed using mnemonics
or decimal numbers.

3.5. IN-ADDR.ARPA domain

The Internet uses a special domain to support gateway location and
Internet address to host mapping. Other classes may employ a similar
strategy in other domains. The intent of this domain is to provide a
guaranteed method to perform host address to host name mapping, and to
facilitate queries to locate all gateways on a particular network in the
Internet.

Note that both of these services are similar to functions that could be
performed by inverse queries; the difference is that this part of the
domain name space is structured according to address, and hence can
guarantee that the appropriate data can be located without an exhaustive
search of the domain space.

The domain begins at IN-ADDR.ARPA and has a substructure which follows
the Internet addressing structure.

Domain names in the IN-ADDR.ARPA domain are defined to have up to four
labels in addition to the IN-ADDR.ARPA suffix. Each label represents
one octet of an Internet address, and is expressed as a character string
for a decimal value in the range 0-255 (with leading zeros omitted
except in the case of a zero octet which is represented by a single
zero).

Host addresses are represented by domain names that have all four labels
specified. Thus data for Internet address 10.2.0.52 is located at
domain name 52.0.2.10.IN-ADDR.ARPA. The reversal, though awkward to
read, allows zones to be delegated which are exactly one network of
address space. For example, 10.IN-ADDR.ARPA can be a zone containing
data for the ARPANET, while 26.IN-ADDR.ARPA can be a separate zone for
MILNET. Address nodes are used to hold pointers to primary host names
in the normal domain space.

Network numbers correspond to some non-terminal nodes at various depths
in the IN-ADDR.ARPA domain, since Internet network numbers are either 1,
2, or 3 octets. Network nodes are used to hold pointers to the primary
host names of gateways attached to that network. Since a gateway is, by
definition, on more than one network, it will typically have two or more
network nodes which point at it. Gateways will also have host level
pointers at their fully qualified addresses.

Both the gateway pointers at network nodes and the normal host pointers
at full address nodes use the PTR RR to point back to the primary domain
names of the corresponding hosts.
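
Mapping an address to its IN-ADDR.ARPA node can be sketched in Python as follows. The helper names are hypothetical. The caller is assumed to know how many octets form the network number (1, 2, or 3), and the sketch does not try to derive that count.

```python
from typing import List


def _octet_labels(address: str) -> List[str]:
    """Validate a dotted decimal address and return one label per octet."""
    parts = address.split(".")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a dotted decimal address: {address!r}")
    values = [int(p) for p in parts]
    if any(v > 255 for v in values):
        raise ValueError(f"octet out of range in {address!r}")
    # str(int(...)) drops leading zeros; a zero octet stays a single "0".
    return [str(v) for v in values]


def host_ptr_name(address: str) -> str:
    """Node holding the host PTR: all four labels, most specific first."""
    return ".".join(reversed(_octet_labels(address))) + ".IN-ADDR.ARPA."


def network_ptr_name(address: str, network_octets: int) -> str:
    """Non-terminal node for a network number of 1, 2, or 3 octets."""
    if network_octets not in (1, 2, 3):
        raise ValueError("network numbers are 1, 2, or 3 octets")
    labels = _octet_labels(address)[:network_octets]
    return ".".join(reversed(labels)) + ".IN-ADDR.ARPA."
```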

For example, the IN-ADDR.ARPA domain will contain information about the
ISI gateway between net 10 and 26, an MIT gateway from net 10 to MIT's
net 18, and hosts A.ISI.EDU and MULTICS.MIT.EDU. Assuming that the ISI
gateway has addresses 10.2.0.22 and 26.0.0.103, and a name
MILNET-GW.ISI.EDU, and the MIT gateway has addresses 10.0.0.77 and
18.10.0.4 and a name GW.LCS.MIT.EDU, the domain database would contain:

    10.IN-ADDR.ARPA.           PTR MILNET-GW.ISI.EDU.
    10.IN-ADDR.ARPA.           PTR GW.LCS.MIT.EDU.
    18.IN-ADDR.ARPA.           PTR GW.LCS.MIT.EDU.
    26.IN-ADDR.ARPA.           PTR MILNET-GW.ISI.EDU.
    22.0.2.10.IN-ADDR.ARPA.    PTR MILNET-GW.ISI.EDU.
    103.0.0.26.IN-ADDR.ARPA.   PTR MILNET-GW.ISI.EDU.
    77.0.0.10.IN-ADDR.ARPA.    PTR GW.LCS.MIT.EDU.
    4.0.10.18.IN-ADDR.ARPA.    PTR GW.LCS.MIT.EDU.
    103.0.3.26.IN-ADDR.ARPA.   PTR A.ISI.EDU.
    6.0.0.10.IN-ADDR.ARPA.     PTR MULTICS.MIT.EDU.

Thus a program which wanted to locate gateways on net 10 would originate
a query of the form QTYPE=PTR, QCLASS=IN, QNAME=10.IN-ADDR.ARPA. It
would receive two RRs in response:

    10.IN-ADDR.ARPA.           PTR MILNET-GW.ISI.EDU.
    10.IN-ADDR.ARPA.           PTR GW.LCS.MIT.EDU.

The program could then originate QTYPE=A, QCLASS=IN queries for
MILNET-GW.ISI.EDU. and GW.LCS.MIT.EDU. to discover the Internet
addresses of these gateways.

A resolver which wanted to find the host name corresponding to Internet
host address 10.0.0.6 would pursue a query of the form QTYPE=PTR,
QCLASS=IN, QNAME=6.0.0.10.IN-ADDR.ARPA, and would receive:

    6.0.0.10.IN-ADDR.ARPA.     PTR MULTICS.MIT.EDU.

Several cautions apply to the use of these services:
   - Since the IN-ADDR.ARPA special domain and the normal domain
     for a particular host or gateway will be in different zones,
     the possibility exists that the data may be inconsistent.

   - Gateways will often have two names in separate domains, only
     one of which can be primary.

   - Systems that use the domain database to initialize their
     routing tables must start with enough gateway information to
     guarantee that they can access the appropriate name server.

   - The gateway data only reflects the existence of a gateway in a
     manner equivalent to the current HOSTS.TXT file. It doesn't
     replace the dynamic availability information from GGP or EGP.

3.6. Defining new types, classes, and special namespaces

The previously defined types and classes are the ones in use as of the
date of this memo. New definitions should be expected. This section
makes some recommendations to designers considering additions to the
existing facilities. The mailing list NAMEDROPPERS@SRI-NIC.ARPA is the
forum where general discussion of design issues takes place.

In general, a new type is appropriate when new information is to be
added to the database about an existing object, or when new data
formats are needed for some totally new object. Designers should
attempt to define types and their RDATA formats that are generally
applicable to all classes, and which avoid duplication of information.
New classes are appropriate when the DNS is to be used for a new
protocol, etc., which requires new class-specific data formats, or when
a copy of the existing name space is desired, but a separate management
domain is necessary.

New types and classes need mnemonics for master files; the format of the
master files requires that the mnemonics for type and class be disjoint.

TYPE and CLASS values must be a proper subset of QTYPEs and QCLASSes
respectively.

The present system uses multiple RRs to represent multiple values of a
type rather than storing multiple values in the RDATA section of a
single RR. This is less efficient for most applications, but does keep
RRs shorter. The multiple RRs assumption is incorporated in some
experimental work on dynamic update methods.

The present system attempts to minimize the duplication of data in the
database in order to ensure consistency. Thus, in order to find the
address of the host for a mail exchange, you map the mail domain name to
a host name, then the host name to addresses, rather than a direct
mapping to host address. This approach is preferred because it avoids
the opportunity for inconsistency.

In defining a new type of data, multiple RR types should not be used to
create an ordering between entries or express different formats for
equivalent bindings; instead, this information should be carried in the
body of the RR and a single type used. This policy avoids problems with
caching multiple types and defining QTYPEs to match multiple types.

For example, the original form of mail exchange binding used two RR
types: one to represent a "closer" exchange (MD) and one to represent a
"less close" exchange (MF). The difficulty is that the presence of one
RR type in a cache doesn't convey any information about the other
because the query which acquired the cached information might have used
a QTYPE of MF, MD, or MAILA (which matched both). The redesigned
service used a single type (MX) with a "preference" value in the RDATA
section which can order different RRs. However, if any MX RRs are found
in the cache, then all should be there.

4. MESSAGES

4.1. Format

All communications inside of the domain protocol are carried in a single
format called a message. The top level format of a message is divided
into 5 sections (some of which are empty in certain cases) shown below:

    +---------------------+
    |        Header       |
    +---------------------+
    |       Question      | the question for the name server
    +---------------------+
    |        Answer       | RRs answering the question
    +---------------------+
    |      Authority      | RRs pointing toward an authority
    +---------------------+
    |      Additional     | RRs holding additional information
    +---------------------+

The header section is always present. The header includes fields that
specify which of the remaining sections are present, and also specify
whether the message is a query or a response, a standard query or some
other opcode, etc.

The names of the sections after the header are derived from their use in
standard queries. The question section contains fields that describe a
question to a name server. These fields are a query type (QTYPE), a
query class (QCLASS), and a query domain name (QNAME). The last three
sections have the same format: a possibly empty list of concatenated
resource records (RRs). The answer section contains RRs that answer the
question; the authority section contains RRs that point toward an
authoritative name server; the additional records section contains RRs
which relate to the query, but are not strictly answers for the
question.

4.1.1. Header section format

The header contains the following fields:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QDCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ANCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    NSCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

ID              A 16 bit identifier assigned by the program that
                generates any kind of query. This identifier is copied
                into the corresponding reply and can be used by the
                requester to match up replies to outstanding queries.

QR              A one bit field that specifies whether this message is a
                query (0), or a response (1).

OPCODE          A four bit field that specifies the kind of query in
                this message. This value is set by the originator of a
                query and copied into the response. The values are:

                0               a standard query (QUERY)

                1               an inverse query (IQUERY)

                2               a server status request (STATUS)

                3-15            reserved for future use

AA              Authoritative Answer - this bit is valid in responses,
                and specifies that the responding name server is an
                authority for the domain name in the question section.

                Note that the contents of the answer section may have
                multiple owner names because of aliases. The AA bit
                corresponds to the name which matches the query name, or
                the first owner name in the answer section.

TC              TrunCation - specifies that this message was truncated
                due to length greater than that permitted on the
                transmission channel.

RD              Recursion Desired - this bit may be set in a query and
                is copied into the response. If RD is set, it directs
                the name server to pursue the query recursively.
                Recursive query support is optional.

RA              Recursion Available - this bit is set or cleared in a
                response, and denotes whether recursive query support is
                available in the name server.

Z               Reserved for future use. Must be zero in all queries
                and responses.

RCODE           Response code - this 4 bit field is set as part of
                responses. The values have the following
                interpretation:

                0               No error condition

                1               Format error - The name server was
                                unable to interpret the query.

                2               Server failure - The name server was
                                unable to process this query due to a
                                problem with the name server.

                3               Name Error - Meaningful only for
                                responses from an authoritative name
                                server, this code signifies that the
                                domain name referenced in the query does
                                not exist.

                4               Not Implemented - The name server does
                                not support the requested kind of query.

                5               Refused - The name server refuses to
                                perform the specified operation for
                                policy reasons. For example, a name
                                server may not wish to provide the
                                information to the particular requester,
                                or a name server may not wish to perform
                                a particular operation (e.g., zone
                                transfer) for particular data.

                6-15            Reserved for future use.

QDCOUNT         an unsigned 16 bit integer specifying the number of
                entries in the question section.

ANCOUNT         an unsigned 16 bit integer specifying the number of
                resource records in the answer section.

NSCOUNT         an unsigned 16 bit integer specifying the number of name
                server resource records in the authority records
                section.

ARCOUNT         an unsigned 16 bit integer specifying the number of
                resource records in the additional records section.
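
Here is a minimal Python sketch of packing and unpacking the 12 octet header. It assumes the transmission order of section 2.3.2 (most significant octet first). It also assumes that bit 0 in the diagram is the most significant bit of the second 16 bit word. The Header type and the function names are illustrative.

```python
import struct
from typing import NamedTuple


class Header(NamedTuple):
    # Hypothetical in-memory view of the header fields, in diagram order.
    id: int
    qr: int
    opcode: int
    aa: int
    tc: int
    rd: int
    ra: int
    z: int
    rcode: int
    qdcount: int
    ancount: int
    nscount: int
    arcount: int


def pack_header(h: Header) -> bytes:
    # Diagram bit 0 (QR) is the most significant bit of the flags word.
    flags = (
        (h.qr & 0x1) << 15
        | (h.opcode & 0xF) << 11
        | (h.aa & 0x1) << 10
        | (h.tc & 0x1) << 9
        | (h.rd & 0x1) << 8
        | (h.ra & 0x1) << 7
        | (h.z & 0x7) << 4
        | (h.rcode & 0xF)
    )
    return struct.pack("!6H", h.id, flags, h.qdcount, h.ancount, h.nscount, h.arcount)


def unpack_header(data: bytes) -> Header:
    ident, flags, qd, an, ns, ar = struct.unpack_from("!6H", data, 0)
    return Header(
        ident,
        flags >> 15,
        (flags >> 11) & 0xF,
        (flags >> 10) & 0x1,
        (flags >> 9) & 0x1,
        (flags >> 8) & 0x1,
        (flags >> 7) & 0x1,
        (flags >> 4) & 0x7,
        flags & 0xF,
        qd, an, ns, ar,
    )
```

For instance, a response whose flags word is 0x8183 decodes as QR=1, OPCODE=0 (QUERY), RD=1, RA=1 and RCODE=3 (Name Error).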

4.1.2. Question section format

The question section is used to carry the "question" in most queries,
i.e., the parameters that define what is being asked. The section
contains QDCOUNT (usually 1) entries, each of the following format:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                     QNAME                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QTYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QCLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

QNAME           a domain name represented as a sequence of labels, where
                each label consists of a length octet followed by that
                number of octets. The domain name terminates with the
                zero length octet for the null label of the root. Note
                that this field may be an odd number of octets; no
                padding is used.

QTYPE           a two octet code which specifies the type of the query.
                The values for this field include all codes valid for a
                TYPE field, together with some more general codes which
                can match more than one type of RR.

QCLASS          a two octet code that specifies the class of the query.
                For example, the QCLASS field is IN for the Internet.
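
For illustration, a question entry can be encoded in Python as below. The sketch writes QNAME without compression. The constants QTYPE_A and QCLASS_IN assume the value 1 given for A and IN in sections 3.2.2 and 3.2.4. The function names are hypothetical.

```python
import struct

QTYPE_A = 1    # TYPE value of A (section 3.2.2)
QCLASS_IN = 1  # CLASS value of IN (section 3.2.4)


def encode_name(name: str) -> bytes:
    """Uncompressed name: length-prefixed labels, then the root's zero octet."""
    stripped = name.rstrip(".")
    labels = stripped.split(".") if stripped else []   # "." is the root alone
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label length must be 1-63 octets in {name!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero length octet: the null label of the root
    if len(out) > 255:
        raise ValueError("domain names are limited to 255 octets")
    return bytes(out)


def encode_question(qname: str, qtype: int, qclass: int) -> bytes:
    # No padding: QTYPE and QCLASS follow QNAME directly, even at odd offsets.
    return encode_name(qname) + struct.pack("!HH", qtype, qclass)
```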

4.1.3. Resource record format

The answer, authority, and additional sections all share the same
format: a variable number of resource records, where the number of
records is specified in the corresponding count field in the header.
Each resource record has the following format:
                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                                               /
    /                     NAME                      /
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     TYPE                      |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     CLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TTL                      |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                   RDLENGTH                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                     RDATA                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

NAME            a domain name to which this resource record pertains.

TYPE            two octets containing one of the RR type codes. This
                field specifies the meaning of the data in the RDATA
                field.

CLASS           two octets which specify the class of the data in the
                RDATA field.

TTL             a 32 bit unsigned integer that specifies the time
                interval (in seconds) that the resource record may be
                cached before it should be discarded. Zero values are
                interpreted to mean that the RR can only be used for the
                transaction in progress, and should not be cached.

RDLENGTH        an unsigned 16 bit integer that specifies the length in
                octets of the RDATA field.

RDATA           a variable length string of octets that describes the
                resource. The format of this information varies
                according to the TYPE and CLASS of the resource record.
                For example, if the TYPE is A and the CLASS is IN,
                the RDATA field is a 4 octet ARPA Internet address.
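
Here is a Python sketch of reading the fixed RR fields. It assumes the caller has already consumed NAME, which may be compressed (see section 4.1.4), and passes in the offset just after it. RRFixed and parse_rr_after_name are illustrative names.

```python
import struct
from typing import NamedTuple, Tuple


class RRFixed(NamedTuple):
    # Hypothetical view of the fields that follow NAME.
    rtype: int
    rclass: int
    ttl: int
    rdata: bytes


def parse_rr_after_name(msg: bytes, pos: int) -> Tuple[RRFixed, int]:
    """Parse TYPE, CLASS, TTL, RDLENGTH and RDATA starting at pos.

    Returns the fields and the offset where the next RR begins.
    """
    if pos + 10 > len(msg):
        raise ValueError("truncated RR: fewer than 10 octets after NAME")
    # 2 + 2 + 4 + 2 octets, network order, no padding.
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", msg, pos)
    start = pos + 10
    end = start + rdlength
    if end > len(msg):
        raise ValueError("RDLENGTH runs past the end of the message")
    return RRFixed(rtype, rclass, ttl, msg[start:end]), end
```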

4.1.4. Message compression

In order to reduce the size of messages, the domain system utilizes a
compression scheme which eliminates the repetition of domain names in a
message.  In this scheme, an entire domain name or a list of labels at
the end of a domain name is replaced with a pointer to a prior
occurrence of the same name.

The pointer takes the form of a two octet sequence:

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    | 1  1|                OFFSET                   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The first two bits are ones.  This allows a pointer to be distinguished
from a label, since the label must begin with two zero bits because
labels are restricted to 63 octets or less.  (The 10 and 01 combinations
are reserved for future use.)  The OFFSET field specifies an offset from
the start of the message (i.e., the first octet of the ID field in the
domain header).  A zero offset specifies the first byte of the ID field,
etc.
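The two-bit tag and 14-bit OFFSET can be built and recognized with simple bit masks. A minimal Python sketch follows; `make_pointer` and `read_pointer` are hypothetical helper names.

```python
from typing import Optional


def make_pointer(offset: int) -> bytes:
    """Build the two octet pointer: top two bits 11, then a 14-bit OFFSET."""
    if not 0 <= offset <= 0x3FFF:
        raise ValueError("OFFSET must fit in 14 bits")
    return (0xC000 | offset).to_bytes(2, "big")


def read_pointer(msg: bytes, pos: int) -> Optional[int]:
    """Classify the octet at `pos`: return the OFFSET if it starts a
    pointer, or None if it is an ordinary label length (top bits 00)."""
    first = msg[pos]
    top_bits = first & 0xC0
    if top_bits == 0xC0:
        return ((first & 0x3F) << 8) | msg[pos + 1]
    if top_bits == 0x00:
        return None
    raise ValueError("label type 01 or 10 is reserved")


print(make_pointer(20).hex())         # prints: c014
print(read_pointer(b"\xc0\x14", 0))   # prints: 20
print(read_pointer(b"\x03FOO", 0))    # prints: None
```

Because a length octet never exceeds 63 (binary 00111111), testing the top two bits is enough to tell the two cases apart.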

The compression scheme allows a domain name in a message to be
represented as either:

    - a sequence of labels ending in a zero octet

    - a pointer

    - a sequence of labels ending with a pointer

Pointers can only be used for occurrences of a domain name where the
format is not class specific.  If this were not the case, a name server
or resolver would be required to know the format of all RRs it handled.
As yet, there are no such cases, but they may occur in future RDATA
formats.

If a domain name is contained in a part of the message subject to a
length field (such as the RDATA section of an RR), and compression is
used, the length of the compressed name is used in the length
calculation, rather than the length of the expanded name.

Programs are free to avoid using pointers in messages they generate,
although this will reduce datagram capacity, and may cause truncation.
However, all programs are required to understand arriving messages that
contain pointers.
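A generating program that does use pointers can remember the offset of every label suffix it has written and emit a pointer as soon as a suffix repeats. The Python sketch below is one way to do this, not a required algorithm; `encode_names` and its `base` parameter are hypothetical, and matching suffixes case-sensitively (so each name keeps its exact case) is a choice made by this sketch.

```python
from typing import Dict, List, Tuple


def encode_names(names: List[str], base: int) -> bytes:
    """Encode domain names back to back, replacing any label suffix that was
    already written with a pointer to it.  `base` is the message offset at
    which the first name will be placed (the header alone is 12 octets)."""
    out = bytearray()
    written: Dict[Tuple[str, ...], int] = {}  # label suffix -> its offset
    for name in names:
        labels = [label for label in name.rstrip(".").split(".") if label]
        for i in range(len(labels)):
            suffix = tuple(labels[i:])
            if suffix in written:
                # Pointer: top two bits set, 14-bit offset of the prior copy.
                out += (0xC000 | written[suffix]).to_bytes(2, "big")
                break
            here = base + len(out)
            if here <= 0x3FFF:          # only offsets a pointer can express
                written[suffix] = here
            raw = labels[i].encode("ascii")
            if len(raw) > 63:
                raise ValueError("label longer than 63 octets")
            out.append(len(raw))
            out += raw
        else:
            out.append(0)               # no pointer used: end with the root
    return bytes(out)


encoded = encode_names(["F.ISI.ARPA", "FOO.F.ISI.ARPA", "ARPA", "."], base=20)
print(encoded.hex(" "))
```

With the names packed contiguously from offset 20, FOO.F.ISI.ARPA starts at offset 32 (not 40 as in the example that follows, which leaves room for other fields), but it still ends in a pointer to offset 20, and ARPA becomes a pointer to offset 26.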

For example, a datagram might need to use the domain names F.ISI.ARPA,
FOO.F.ISI.ARPA, ARPA, and the root.  Ignoring the other fields of the
message, these domain names might be represented as:

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    20 |           1           |           F           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    22 |           3           |           I           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    24 |           S           |           I           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    26 |           4           |           A           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    28 |           R           |           P           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    30 |           A           |           0           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    40 |           3           |           F           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    42 |           O           |           O           |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    44 | 1  1|                20                       |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    64 | 1  1|                26                       |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    92 |           0           |                       |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The domain name for F.ISI.ARPA is shown at offset 20.  The domain name
FOO.F.ISI.ARPA is shown at offset 40; this definition uses a pointer to
concatenate a label for FOO to the previously defined F.ISI.ARPA.  The
domain name ARPA is defined at offset 64 using a pointer to the ARPA
component of the name F.ISI.ARPA at 20; note that this pointer relies on
ARPA being the last label in the string at 20.  The root domain name is
defined by a single octet of zeros at 92; the root domain name has no
labels.
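A receiver expands such names by reading labels until it meets either a zero octet or a pointer, and then continuing at the pointer's OFFSET. The Python sketch below rebuilds the example above and decodes each name; `decode_name` is a hypothetical helper, and rejecting forward pointers and pointer loops is a safeguard this sketch adds, following the "prior occurrence" wording.

```python
from typing import List, Set, Tuple


def decode_name(msg: bytes, pos: int) -> Tuple[str, int]:
    """Decode the domain name starting at `pos`, following pointers.
    Returns the dotted name ("" for the root) and the offset just past the
    name as it is stored at `pos`."""
    labels: List[str] = []
    end = -1                    # set when the first pointer is followed
    visited: Set[int] = set()   # positions of pointers already followed
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past the end of the message")
        length = msg[pos]
        if length & 0xC0 == 0xC0:                       # pointer
            if pos + 1 >= len(msg):
                raise ValueError("truncated pointer")
            target = ((length & 0x3F) << 8) | msg[pos + 1]
            if target >= pos or pos in visited:
                raise ValueError("pointer is not to a prior occurrence")
            visited.add(pos)
            if end < 0:
                end = pos + 2
            pos = target
        elif length & 0xC0:
            raise ValueError("reserved label type")
        elif length == 0:                               # root: name ends
            return ".".join(labels), (end if end >= 0 else pos + 1)
        else:                                           # ordinary label
            if pos + 1 + length > len(msg):
                raise ValueError("label runs past the end of the message")
            labels.append(msg[pos + 1:pos + 1 + length].decode("latin-1"))
            pos += 1 + length


# Rebuild the example: names at offsets 20, 40, 64 and 92.
buf = bytearray(93)
buf[20:32] = b"\x01F\x03ISI\x04ARPA\x00"   # F.ISI.ARPA
buf[40:46] = b"\x03FOO\xc0\x14"            # FOO + pointer to 20
buf[64:66] = b"\xc0\x1a"                   # pointer to 26 (the ARPA label)
buf[92] = 0                                # the root
example = bytes(buf)
for start in (20, 40, 64, 92):
    print(start, decode_name(example, start))
```

The second value returned is where parsing of the enclosing RR or question should resume; after a pointer that is two octets past the pointer, regardless of how far the expansion wandered.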

4.2. Transport

The DNS assumes that messages will be transmitted as datagrams or in a
byte stream carried by a virtual circuit.  While virtual circuits can be
used for any DNS activity, datagrams are preferred for queries due to
their lower overhead and better performance.  Zone refresh activities
must use virtual circuits because of the need for reliable transfer.

The Internet supports name server access using TCP [RFC-793] on server
port 53 (decimal) as well as datagram access using UDP [RFC-768] on UDP
port 53 (decimal).

4.2.1. UDP usage

Messages sent using UDP use server port 53 (decimal).

Messages carried by UDP are restricted to 512 bytes (not counting the IP
or UDP headers).  Longer messages are truncated and the TC bit is set in
the header.
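One way to honour the 512 byte limit is to drop whole RRs from the end of the response and set TC when anything was dropped. The Python sketch below assumes every RR belongs to the answer section and is already encoded; `fit_udp`, `UDP_LIMIT` and `TC_FLAG` are hypothetical names, and a real server would also have to account for the authority and additional sections.

```python
import struct
from typing import List

UDP_LIMIT = 512    # octets of DNS message, not counting IP or UDP headers
TC_FLAG = 0x0200   # TC bit in the 16-bit flags word at header octets 2-3


def fit_udp(prefix: bytes, answers: List[bytes]) -> bytes:
    """Build a UDP response from `prefix` (12-octet header plus question
    section) and pre-encoded answer RRs.  Whole RRs are dropped from the end
    until the message fits; if any are dropped, TC is set.  ANCOUNT (header
    octets 6-7) is rewritten to the number of RRs kept."""
    if len(prefix) > UDP_LIMIT:
        raise ValueError("header and question alone exceed 512 octets")
    kept: List[bytes] = []
    size = len(prefix)
    for rr in answers:
        if size + len(rr) > UDP_LIMIT:
            break
        kept.append(rr)
        size += len(rr)
    header = bytearray(prefix)
    if len(kept) < len(answers):
        (flags,) = struct.unpack_from("!H", header, 2)
        struct.pack_into("!H", header, 2, flags | TC_FLAG)
    struct.pack_into("!H", header, 6, len(kept))
    return bytes(header) + b"".join(kept)


# Ten hypothetical 60-octet RRs: only eight fit after a 12-octet header.
reply = fit_udp(bytes(12), [bytes(60)] * 10)
print(len(reply), bool(reply[2] & 0x02), struct.unpack_from("!H", reply, 6)[0])
# prints: 492 True 8
```

Dropping whole RRs, rather than cutting the datagram at octet 512, keeps the message parseable for a client that chooses to use the partial answer.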

UDP is not acceptable for zone transfers, but is the recommended method
for standard queries in the Internet.  Queries sent using UDP may be
lost, and hence a retransmission strategy is required.  Queries or their
responses may be reordered by the network, or by processing in name
servers, so resolvers should not depend on them being returned in order.

The optimal UDP retransmission policy will vary with performance of the
Internet and the needs of the client, but the following are recommended:

    - The client should try other servers and server addresses
      before repeating a query to a specific address of a server.

    - The retransmission interval should be based on prior
      statistics if possible.  Too aggressive retransmission can
      easily slow responses for the community at large.  Depending
      on how well connected the client is to its expected servers,
      the minimum retransmission interval should be 2-5 seconds.

More suggestions on server selection and retransmission policy can be
found in the resolver section of this memo.
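The two recommendations above can be combined into a simple send schedule: cycle through every server address before repeating one, and never let the interval fall below the chosen minimum. The Python sketch below uses a fixed step with no backoff, which is a simplification; `retransmission_schedule`, its parameters, and the addresses `ns-a`/`ns-b` are hypothetical.

```python
from typing import List, Tuple


def retransmission_schedule(addresses: List[str], rounds: int,
                            estimate: float,
                            minimum: float = 2.0) -> List[Tuple[str, float]]:
    """Plan (address, send_time) pairs for one query.

    Every known server address is tried once before any address is
    repeated.  `estimate` stands in for an interval derived from prior
    statistics; it is never allowed below `minimum`, which should lie in
    the 2-5 second range depending on connectivity."""
    if not 2.0 <= minimum <= 5.0:
        raise ValueError("minimum retransmission interval should be 2-5 s")
    step = max(estimate, minimum)
    schedule: List[Tuple[str, float]] = []
    t = 0.0
    for _ in range(rounds):
        for address in addresses:
            schedule.append((address, t))
            t += step
    return schedule


# Two hypothetical server addresses, two rounds, an optimistic 1 s estimate.
for address, when in retransmission_schedule(["ns-a", "ns-b"], 2, 1.0):
    print(f"{when:4.1f}s  {address}")
```

The 1 second estimate is raised to the 2 second floor, so each address is retried only after the other has had a chance to answer.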

4.2.2. TCP usage

Messages sent over TCP connections use server port 53 (decimal).  The
message is prefixed with a two byte length field which gives the message
length, excluding the two byte length field.  This length field allows
the low-level processing to assemble a complete message before beginning
to parse it.
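The length prefix can be written and read as follows. In this Python sketch, `frame`, `read_exact` and `read_message` are hypothetical names; `stream` is any binary file-like object, and a connected socket can be adapted with `sock.makefile("rb")`. The loop in `read_exact` matters because TCP may deliver one message in several pieces.

```python
import io
import struct
from typing import BinaryIO


def frame(msg: bytes) -> bytes:
    """Prefix a DNS message with its two-octet length (network order)."""
    if len(msg) > 0xFFFF:
        raise ValueError("message too long for a 16-bit length prefix")
    return struct.pack("!H", len(msg)) + msg


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n octets, or fail if the connection closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-message")
        buf += chunk
    return bytes(buf)


def read_message(stream: BinaryIO) -> bytes:
    """Read one length-prefixed message; the prefix is not returned."""
    (length,) = struct.unpack("!H", read_exact(stream, 2))
    return read_exact(stream, length)


stream = io.BytesIO(frame(b"query-1") + frame(b"q2"))
print(read_message(stream), read_message(stream))   # b'query-1' b'q2'
```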

Several connection management policies are recommended:

    - The server should not block other activities waiting for TCP
      data.

    - The server should support multiple connections.

    - The server should assume that the client will initiate
      connection closing, and should delay closing its end of the
      connection until all outstanding client requests have been
      satisfied.

    - If the server needs to close a dormant connection to reclaim
      resources, it should wait until the connection has been idle
      for a period on the order of two minutes.  In particular, the
      server should allow the SOA and AXFR request sequence (which
      begins a refresh operation) to be made on a single connection.
      Since the server would be unable to answer queries anyway, a
      unilateral close or reset may be used instead of a graceful
      close.

5. MASTER FILES

Master files are text files that contain RRs in text form.  Since the
contents of a zone can be expressed in the form of a list of RRs, a
master file is most often used to define a zone, though it can be used
to list a cache's contents.  Hence, this section first discusses the
format of RRs in a master file, and then the special considerations when
a master file is used to create a zone in some name server.

5.1. Format

The format of these files is a sequence of entries.  Entries are
predominantly line-oriented, though parentheses can be used to continue
a list of items across a line boundary, and text literals can contain
CRLF within the text.  Any combination of tabs and spaces acts as a
delimiter between the separate items that make up an entry.  Any line
in the master file can end with a comment.  The comment starts with a
";" (semicolon).

The following entries are defined:

    <blank>[<comment>]

    $ORIGIN <domain-name> [<comment>]

    $INCLUDE <file-name> [<domain-name>] [<comment>]

    <domain-name><rr> [<comment>]

    <blank><rr> [<comment>]

Blank lines, with or without comments, are allowed anywhere in the file.

Two control entries are defined: $ORIGIN and $INCLUDE.  $ORIGIN is
followed by a domain name, and resets the current origin for relative
domain names to the stated name.  $INCLUDE inserts the named file into
the current file, and may optionally specify a domain name that sets the
relative domain name origin for the included file.  $INCLUDE may also
have a comment.  Note that a $INCLUDE entry never changes the relative
origin of the parent file, regardless of changes to the relative origin
made within the included file.

The last two forms represent RRs.  If an entry for an RR begins with a
blank, then the RR is assumed to be owned by the last stated owner.  If
an RR entry begins with a <domain-name>, then the owner name is reset.

<rr> contents take one of the following forms:

    [<TTL>] [<class>] <type> <RDATA>

    [<class>] [<TTL>] <type> <RDATA>

The RR begins with optional TTL and class fields, followed by a type and
RDATA field appropriate to the type and class.  Class and type use the
standard mnemonics; TTL is a decimal integer.  Omitted class and TTL
values default to the last explicitly stated values.  Since type and
class mnemonics are disjoint, the parse is unique.  (Note that this
order is different from the order used in examples and the order used in
the actual RRs; the given order allows easier parsing and defaulting.)
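Because TTLs are decimal numbers and class and type mnemonics are disjoint, the optional leading fields can be recognized one token at a time. The Python sketch below assumes the entry has already been split into tokens with the owner removed (parentheses and quoting handled elsewhere), and, as an assumption of its own, rejects an RR that omits TTL or class before any value has been stated. `RRParser`, `CLASSES` and `TYPES` are hypothetical names; the mnemonics are those defined in this memo.

```python
from typing import List, Optional, Tuple

# Class and type mnemonics defined in this memo.
CLASSES = {"IN", "CS", "CH", "HS"}
TYPES = {"A", "NS", "MD", "MF", "CNAME", "SOA", "MB", "MG", "MR", "NULL",
         "WKS", "PTR", "HINFO", "MINFO", "MX", "TXT"}


class RRParser:
    """Parse the <rr> part of an entry, defaulting TTL and class to the
    last explicitly stated values."""

    def __init__(self) -> None:
        self.last_ttl: Optional[int] = None
        self.last_class: Optional[str] = None

    def parse(self, tokens: List[str]) -> Tuple[int, str, str, List[str]]:
        ttl: Optional[int] = None
        rclass: Optional[str] = None
        i = 0
        # Up to two leading fields, TTL and class, in either order.
        while i < len(tokens) and i < 2:
            word = tokens[i].upper()
            if word.isdecimal() and ttl is None:
                ttl = int(word)
            elif word in CLASSES and rclass is None:
                rclass = word
            else:
                break
            i += 1
        if i >= len(tokens) or tokens[i].upper() not in TYPES:
            raise ValueError("expected a type mnemonic")
        rtype = tokens[i].upper()
        if ttl is None:
            ttl = self.last_ttl
        if rclass is None:
            rclass = self.last_class
        if ttl is None or rclass is None:
            raise ValueError("no earlier TTL or class to default to")
        self.last_ttl, self.last_class = ttl, rclass
        return ttl, rclass, rtype, tokens[i + 1:]


p = RRParser()
print(p.parse(["3600", "IN", "A", "10.1.0.52"]))
print(p.parse(["IN", "MX", "10", "VENERA"]))   # TTL defaults to 3600
print(p.parse(["60", "NS", "VAXA"]))           # class defaults to IN
```

The RDATA tokens are returned unparsed, since their interpretation depends on the type and class just determined.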

<domain-name>s make up a large share of the data in the master file.
The labels in the domain name are expressed as character strings and
separated by dots.  Quoting conventions allow arbitrary characters to be
stored in domain names.  Domain names that end in a dot are called
absolute, and are taken as complete.  Domain names which do not end in a
dot are called relative; the actual domain name is the concatenation of
the relative part with an origin specified in a $ORIGIN, $INCLUDE, or as
an argument to the master file loading routine.  A relative name is an
error when no origin is available.
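The absolute/relative rule amounts to a small function. The Python sketch below also handles the free-standing @ described later in this section; `qualify` is a hypothetical name, and it does not interpret backslash escapes, so a name ending in an escaped dot would be misclassified.

```python
from typing import Optional


def qualify(name: str, origin: Optional[str]) -> str:
    """Return `name` as an absolute domain name (ending in a dot)."""
    if name.endswith("."):
        return name                      # absolute: taken as complete
    if origin is None:
        raise ValueError(f"relative name {name!r} but no origin is available")
    if not origin.endswith("."):
        raise ValueError("the origin must itself be absolute")
    if name == "@":
        return origin                    # free-standing @ = current origin
    if origin == ".":
        return name + "."                # origin is the root
    return name + "." + origin


print(qualify("VENERA", "ISI.EDU."))     # VENERA.ISI.EDU.
print(qualify("A.ISI.EDU.", None))       # A.ISI.EDU.
print(qualify("@", "ISI.EDU."))          # ISI.EDU.
```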

<character-string> is expressed in one of two ways: as a contiguous set
of characters without interior spaces, or as a string beginning with a "
and ending with a ".  Inside a " delimited string any character can
occur, except for a " itself, which must be quoted using \ (back slash).

Because these files are text files several special encodings are
necessary to allow arbitrary data to be loaded.  In particular:

                of the root.

@               A free standing @ is used to denote the current origin.

\X              where X is any character other than a digit (0-9), is
                used to quote that character so that its special meaning
                does not apply.  For example, "\." can be used to place
                a dot character in a label.

\DDD            where each D is a digit, is the octet corresponding to
                the decimal number described by DDD.  The resulting
                octet is assumed to be text and is not checked for
                special meaning.

( )             Parentheses are used to group data that crosses a line
                boundary.  In effect, line terminations are not
                recognized within parentheses.

;               Semicolon is used to start a comment; the remainder of
                the line is ignored.
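The \X and \DDD escapes matter most when a name is split into labels, since an escaped dot must not end a label. The Python sketch below shows one way to do this; `split_labels` is a hypothetical name, and it does not handle "@", origins, or the 63-octet label limit.

```python
from typing import List

DIGITS = "0123456789"


def split_labels(text: str) -> List[bytes]:
    """Split a master-file domain name into label octets, applying the
    \\X and \\DDD escapes.  A trailing dot (absolute name) yields no empty
    final label."""
    if text == ".":
        return []                        # the root has no labels
    labels: List[bytes] = []
    current = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text):
                raise ValueError("backslash at end of name")
            if text[i + 1] in DIGITS:
                ddd = text[i + 1:i + 4]
                if (len(ddd) != 3 or any(d not in DIGITS for d in ddd)
                        or int(ddd) > 255):
                    raise ValueError(f"bad \\DDD escape at position {i}")
                current.append(int(ddd))                 # octet taken as-is
                i += 4
            else:
                current += text[i + 1].encode("latin-1")  # quoted character
                i += 2
        elif ch == ".":
            labels.append(bytes(current))    # unescaped dot ends a label
            current = bytearray()
            i += 1
        else:
            current += ch.encode("latin-1")
            i += 1
    if current:
        labels.append(bytes(current))
    return labels


print(split_labels("Action\\.domains"))    # [b'Action.domains']
print(split_labels("VENERA.ISI.EDU."))     # [b'VENERA', b'ISI', b'EDU']
print(split_labels("\\065BC"))             # [b'ABC']
```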

5.2. Use of master files to define zones

When a master file is used to load a zone, the operation should be
suppressed if any errors are encountered in the master file.  The
rationale for this is that a single error can have widespread
consequences.  For example, suppose that the RRs defining a delegation
have syntax errors; then the server will return authoritative name
errors for all names in the subzone (except in the case where the
subzone is also present on the server).

Several other validity checks should be performed in addition to
ensuring that the file is syntactically correct:

    1. All RRs in the file should have the same class.

    2. Exactly one SOA RR should be present at the top of the zone.

    3. If delegations are present and glue information is required,
       it should be present.

    4. Information present outside of the authoritative nodes in the
       zone should be glue information, rather than the result of an
       origin or similar error.
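The first two checks need only the owner, class and type of each RR. The Python sketch below reports problems rather than raising, so a loader can refuse the whole zone when the list is non-empty; `ZoneRR` and `check_zone` are hypothetical names, and checks 3 and 4 are omitted because they need knowledge of the zone's delegations.

```python
from typing import List, NamedTuple


class ZoneRR(NamedTuple):
    owner: str    # absolute name, e.g. "VENERA.ISI.EDU."
    rclass: str   # class mnemonic, e.g. "IN"
    rtype: str    # type mnemonic, e.g. "SOA"


def check_zone(rrs: List[ZoneRR], top: str) -> List[str]:
    """Apply checks 1 and 2 and return a list of problems (empty if none).
    Owner names are compared case-insensitively."""
    problems: List[str] = []
    classes = sorted({rr.rclass for rr in rrs})
    if len(classes) > 1:
        problems.append("RRs of more than one class: " + ", ".join(classes))
    soas_at_top = [rr for rr in rrs
                   if rr.rtype == "SOA" and rr.owner.lower() == top.lower()]
    if len(soas_at_top) != 1:
        problems.append(f"{len(soas_at_top)} SOA RRs at {top}, expected 1")
    return problems


zone = [ZoneRR("ISI.EDU.", "IN", "SOA"), ZoneRR("ISI.EDU.", "IN", "NS"),
        ZoneRR("VENERA.ISI.EDU.", "IN", "A")]
print(check_zone(zone, "ISI.EDU."))   # []
print(check_zone(zone[1:] + [ZoneRR("VAXA.ISI.EDU.", "CH", "A")], "isi.edu."))
```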

5.3. Master file example

The following is an example file which might be used to define the
ISI.EDU zone, and is loaded with an origin of ISI.EDU:

@   IN  SOA     VENERA      Action\.domains (
                                 20     ; SERIAL
                                 7200   ; REFRESH
                                 600    ; RETRY
                                 3600000; EXPIRE
                                 60)    ; MINIMUM

        NS      A.ISI.EDU.
        NS      VENERA
        NS      VAXA
        MX      10      VENERA
        MX      20      VAXA

A       A       26.3.0.103

VENERA  A       10.1.0.52
        A       128.9.0.32

VAXA    A       10.2.0.27
        A       128.9.0.33


$INCLUDE <SUBSYS>ISI-MAILBOXES.TXT

Where the file <SUBSYS>ISI-MAILBOXES.TXT is:

    MOE     MB      A.ISI.EDU.
    LARRY   MB      A.ISI.EDU.
    CURLEY  MB      A.ISI.EDU.
    STOOGES MG      MOE
            MG      LARRY
            MG      CURLEY

Note the use of the \ character in the SOA RR to specify the responsible
person mailbox "Action.domains@ISI.EDU".

6. NAME SERVER IMPLEMENTATION

6.1. Architecture

The optimal structure for the name server will depend on the host
operating system and whether the name server is integrated with resolver
operations, either by supporting recursive service, or by sharing its
database with a resolver.  This section discusses implementation
considerations for a name server which shares a database with a
resolver, but most of these concerns are present in any name server.

6.1.1. Control

A name server must employ multiple concurrent activities, whether they
are implemented as separate tasks in the host's OS or by multiplexing
inside a single name server program.  It is simply not acceptable for a
name server to block the service of UDP requests while it waits for TCP
data for refreshing or query activities.  Similarly, a name server
should not attempt to provide recursive service without processing such
requests in parallel, though it may choose to serialize requests from a
single client, or to regard identical requests from the same client as
duplicates.  A name server should not substantially delay requests while
it reloads a zone from master files or while it incorporates a newly
refreshed zone into its database.

6.1.2. Database

While name server implementations are free to use any internal data
structures they choose, the suggested structure consists of three major
parts:

    - A "catalog" data structure which lists the zones available to
      this server, and a "pointer" to each zone's data structure.
      The main purpose of this structure is to find the nearest
      ancestor zone, if any, for arriving standard queries.

    - Separate data structures for each of the zones held by the
      name server.

    - A data structure for cached data (or perhaps separate caches
      for different classes).

All of these data structures can be implemented in an identical tree
structure format, with different data chained off the nodes in different
parts: in the catalog the data is pointers to zones, while in the zone
and cache data structures, the data will be RRs.  In designing the tree
framework the designer should recognize that query processing will need
to traverse the tree using case-insensitive label comparisons; and that
in real data, a few nodes have a very high branching factor (100-1000 or
more), but the vast majority have a very low branching factor (0-1).

One way to solve the case problem is to store the labels for each node
in two pieces: a standardized-case representation of the label where all
ASCII characters are in a single case, together with a bit mask that
denotes which characters are actually of a different case.  The
branching factor diversity can be handled using a simple linked list for
a node until the branching factor exceeds some threshold, and
transitioning to a hash structure after the threshold is exceeded.  In
any case, hash structures used to store tree sections must ensure that
hash functions and procedures preserve the casing conventions of the
DNS.
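Both suggestions can be combined in one tree node: store the label folded to lower case plus a bit mask of the positions that were upper case, and keep children in a list until a threshold is crossed, then switch to a hash keyed by the folded label. The Python sketch below is illustrative only; `Node`, `fold` and the threshold value `SWITCH_AT = 8` are hypothetical, and a Python list stands in for the linked list.

```python
from typing import Dict, List, Optional, Tuple, Union

SWITCH_AT = 8   # hypothetical threshold for moving from a list to a hash


def fold(label: str) -> Tuple[str, int]:
    """Split a label into its lower-case form and a bit mask marking the
    positions that were upper case.  Only ASCII letters are folded."""
    mask = 0
    chars = []
    for i, ch in enumerate(label):
        if "A" <= ch <= "Z":
            mask |= 1 << i
            ch = chr(ord(ch) + 32)
        chars.append(ch)
    return "".join(chars), mask


class Node:
    def __init__(self, label: str = "") -> None:
        self.key, self.case_mask = fold(label)
        self.children: Union[List["Node"], Dict[str, "Node"]] = []
        self.data: list = []    # RRs, or a zone pointer in the catalog

    @property
    def label(self) -> str:
        """Rebuild the label with its original case."""
        return "".join(c.upper() if self.case_mask >> i & 1 else c
                       for i, c in enumerate(self.key))

    def find(self, label: str) -> Optional["Node"]:
        key, _ = fold(label)             # compare on the folded form only
        if isinstance(self.children, dict):
            return self.children.get(key)
        for child in self.children:      # short linear scan for low fan-out
            if child.key == key:
                return child
        return None

    def add(self, label: str) -> "Node":
        child = self.find(label)
        if child is not None:
            return child
        child = Node(label)
        if isinstance(self.children, dict):
            self.children[child.key] = child
        else:
            self.children.append(child)
            if len(self.children) > SWITCH_AT:
                # Hash on the folded key so case never affects bucketing.
                self.children = {c.key: c for c in self.children}
        return child


root = Node()
venera = root.add("EDU").add("ISI").add("Venera")
print(root.find("edu").find("isi").find("VENERA") is venera, venera.label)
for n in range(20):
    root.add(f"host{n}")
print(type(root.children).__name__)   # dict
```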

The use of separate structures for the different parts of the database
is motivated by several factors:

    - The catalog structure can be an almost static structure that
      needs to change only when the system administrator changes the
      zones supported by the server.  This structure can also be
      used to store parameters used to control refreshing
      activities.

    - The individual data structures for zones allow a zone to be
      replaced simply by changing a pointer in the catalog.  Zone
      refresh operations can build a new structure and, when
      complete, splice it into the database via a simple pointer
      replacement.  It is very important that when a zone is
      refreshed, queries should not use old and new data
      simultaneously.

    - With the proper search procedures, authoritative data in zones
      will always "hide", and hence take precedence over, cached
      data.

    - Errors in zone definitions that cause overlapping zones, etc.,
      may cause erroneous responses to queries, but problem
      determination is simplified, and the contents of one "bad"
      zone can't corrupt another.

    - Since the cache is most frequently updated, it is most
      vulnerable to corruption during system restarts.  It can also
      become full of expired RR data.  In either case, it can easily
      be discarded without disturbing zone data.
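The pointer-replacement idea in the second point can be sketched with copy-on-write: a refresh builds a complete new zone object, and the catalog swaps in a new mapping with a single assignment, while each query fetches its zone once and uses only that object. The Python sketch below relies on reference assignment being atomic in CPython; `Zone`, `Catalog`, `install` and `snapshot` are hypothetical names.

```python
import threading
from typing import Dict, List, Tuple


class Zone:
    """An immutable, fully built copy of one zone (hypothetical shape)."""

    def __init__(self, serial: int,
                 rrs: Dict[str, List[Tuple[str, str]]]) -> None:
        self.serial = serial
        self.rrs = rrs


class Catalog:
    def __init__(self) -> None:
        self._zones: Dict[str, Zone] = {}
        self._writer = threading.Lock()   # serializes installers only

    def install(self, origin: str, zone: Zone) -> None:
        """Splice a completed zone in by replacing one reference."""
        with self._writer:
            zones = dict(self._zones)     # copy, then swap the whole map
            zones[origin.lower()] = zone
            self._zones = zones

    def snapshot(self, origin: str) -> Zone:
        """A query fetches the zone once and uses only that object, so it
        sees either the old or the new data, never a mixture."""
        return self._zones[origin.lower()]


catalog = Catalog()
catalog.install("ISI.EDU.", Zone(20, {"VENERA.ISI.EDU.": [("A", "10.1.0.52")]}))
in_flight = catalog.snapshot("isi.edu.")
catalog.install("ISI.EDU.", Zone(21, {}))
print(in_flight.serial, catalog.snapshot("ISI.EDU.").serial)   # 20 21
```

Readers never take the lock; only concurrent installers are serialized, so a long refresh does not delay queries.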

A major aspect of database design is selecting a structure which allows
the name server to deal with crashes of the name server's host.  State
information which a name server should save across system crashes
includes the catalog structure (including the state of refreshing for
each zone) and the zone data itself.

6.1.3. Time

Both the TTL data for RRs and the timing data for refreshing activities
depend on 32 bit timers in units of seconds.  Inside the database,
refresh timers and TTLs for cached data conceptually "count down", while
data in the zone stays with constant TTLs.

A recommended implementation strategy is to store time in two ways: as
a relative increment and as an absolute time.  One way to do this is to
use positive 32 bit numbers for one type and negative numbers for the
other.  The RRs in zones use relative times; the refresh timers and
cache data use absolute times.  Absolute numbers are taken with respect
to some known origin and converted to relative values when placed in the
response to a query.  When an absolute TTL is negative after conversion
to relative, then the data is expired and should be ignored.
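The sign convention can be sketched as two small conversions. This Python sketch assumes one particular assignment (non-negative values are relative zone TTLs, negative values are negated absolute expiry times) and a time origin chosen so that the current time is always positive; `to_cache` and `ttl_for_response` are hypothetical names, and Python integers do not wrap, so 32 bit overflow is not modelled.

```python
from typing import Optional


def to_cache(ttl: int, now: int) -> int:
    """Store a TTL received in a response as an absolute expiry time,
    negated so it cannot be confused with a zone's relative TTL."""
    return -(now + ttl)


def ttl_for_response(stored: int, now: int) -> Optional[int]:
    """Convert a stored time into the TTL to place in a response, or
    return None when cached data has expired."""
    if stored >= 0:
        return stored                 # zone data: constant relative TTL
    remaining = -stored - now         # absolute expiry -> relative
    return remaining if remaining >= 0 else None


stored = to_cache(3600, now=1000)
print(stored, ttl_for_response(stored, 1600), ttl_for_response(stored, 5000))
# prints: -4600 3000 None
print(ttl_for_response(86400, 5000))   # zone TTL unchanged: 86400
```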

6.2. Standard query processing

The major algorithm for standard query processing is presented in
[RFC-1034].

When processing queries with QCLASS=*, or some other QCLASS which
matches multiple classes, the response should never be authoritative
unless the server can guarantee that the response covers all classes.

When composing a response, RRs which are to be inserted in the
additional section, but duplicate RRs in the answer or authority
sections, may be omitted from the additional section.

When a response is so long that truncation is required, the truncation
should start at the end of the response and work forward in the
datagram.  Thus if there is any data for the authority section, the
answer section is guaranteed to be complete.

The MINIMUM value in the SOA should be used to set a floor on the TTL of
data distributed from a zone.  This floor function should be done when
the data is copied into a response.  This will allow future dynamic
update protocols to change the SOA MINIMUM field without ambiguous
semantics.
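The MINIMUM floor and the optional removal of duplicate additional RRs both happen while a response is being assembled. The Python sketch below applies the floor to copies only, leaving stored TTLs untouched; `RR`, `copy_for_response` and `trim_additional` are hypothetical names, and comparing RRs on owner (case-insensitively), type, class and RDATA is an assumption of the sketch.

```python
from typing import List, NamedTuple


class RR(NamedTuple):
    owner: str
    rtype: str
    rclass: str
    ttl: int
    rdata: str


def copy_for_response(rrs: List[RR], soa_minimum: int) -> List[RR]:
    """Copy zone RRs into a response, raising each TTL to at least the SOA
    MINIMUM.  The stored RRs keep their own TTLs, so a later change to the
    SOA MINIMUM takes effect on the next response without rewriting data."""
    return [rr._replace(ttl=max(rr.ttl, soa_minimum)) for rr in rrs]


def trim_additional(answer: List[RR], authority: List[RR],
                    additional: List[RR]) -> List[RR]:
    """Drop additional-section RRs that duplicate answer or authority RRs."""
    def key(rr: RR):
        return (rr.owner.lower(), rr.rtype, rr.rclass, rr.rdata)
    present = {key(rr) for rr in answer + authority}
    return [rr for rr in additional if key(rr) not in present]


venera = RR("VENERA.ISI.EDU.", "A", "IN", 30, "10.1.0.52")
vaxa = RR("VAXA.ISI.EDU.", "A", "IN", 86400, "10.2.0.27")
print([rr.ttl for rr in copy_for_response([venera, vaxa], 60)])   # [60, 86400]
print(trim_additional([venera], [], [venera, vaxa]) == [vaxa])    # True
```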

6.3. Zone refresh and reload processing

In spite of a server's best efforts, it may be unable to load zone data
from a master file due to syntax errors, etc., or be unable to refresh a
zone within its expiration parameter.  In this case, the name server
should answer queries as if it were not supposed to possess the zone.

If a master is sending a zone out via AXFR, and a new version is created
during the transfer, the master should continue to send the old version
if possible.  In any case, it should never send part of one version and
part of another.  If completion is not possible, the master should reset
the connection on which the zone transfer is taking place.

6.4. Inverse queries (Optional)

Inverse queries are an optional part of the DNS.  Name servers are not
required to support any form of inverse queries.  If a name server
receives an inverse query that it does not support, it returns an error
response with the "Not Implemented" error set in the header.  While
inverse query support is optional, all name servers must be at least
able to return the error response.

6.4.1. The contents of inverse queries and responses

Inverse queries reverse the mappings performed by standard query
operations; while a standard query maps a domain name to a resource, an
inverse query maps a resource to a domain name.  For example, a standard
query might bind a domain name to a host address; the corresponding
inverse query binds the host address to a domain name.

Inverse queries take the form of a single RR in the answer section of
the message, with an empty question section.  The owner name of the
query RR and its TTL are not significant.  The response carries
questions in the question section which identify all names possessing
the query RR WHICH THE NAME SERVER KNOWS.  Since no name server knows
about all of the domain name space, the response can never be assumed to
be complete.  Thus inverse queries are primarily useful for database
management and debugging activities.  Inverse queries are NOT an
acceptable method of mapping host addresses to host names; use the
IN-ADDR.ARPA domain instead.
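For comparison, the IN-ADDR.ARPA approach turns an address into an ordinary domain name that a standard PTR query can ask about: the four octets in reverse order, followed by IN-ADDR.ARPA. The Python sketch below builds that name by hand (`in_addr_name` is a hypothetical helper) and cross-checks it against the standard library's `ipaddress` module, whose `reverse_pointer` attribute yields the same name in lower case without the final dot.

```python
import ipaddress


def in_addr_name(address: str) -> str:
    """Return the IN-ADDR.ARPA domain name for a dotted IPv4 address."""
    addr = ipaddress.IPv4Address(address)      # rejects malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + ".IN-ADDR.ARPA."


print(in_addr_name("10.1.0.52"))                          # 52.0.1.10.IN-ADDR.ARPA.
print(ipaddress.ip_address("10.1.0.52").reverse_pointer)  # 52.0.1.10.in-addr.arpa
```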

Where possible, name servers should provide case-insensitive comparisons
for inverse queries.  Thus an inverse query asking for an MX RR of
"Venera.isi.edu" should get the same response as a query for
"VENERA.ISI.EDU"; an inverse query for HINFO RR "IBM-PC UNIX" should
produce the same result as an inverse query for "IBM-pc unix".  However,
this cannot be guaranteed because name servers may possess RRs that
contain character strings but the name server does not know that the
data is character.

When a name server processes an inverse query, it either returns:

    1. zero, one, or multiple domain names for the specified
       resource as QNAMEs in the question section

    2. an error code indicating that the name server doesn't support
       inverse mapping of the specified resource type.

When the response to an inverse query contains one or more QNAMEs, the
owner name and TTL of the RR in the answer section which defines the
inverse query are modified to exactly match an RR found at the first
QNAME.

RRs returned in the inverse queries cannot be cached using the same
mechanism as is used for the replies to standard queries.  One reason
for this is that a name might have multiple RRs of the same type, and
only one would appear.  For example, an inverse query for a single
address of a multiply homed host might create the impression that only
one address existed.

6.4.2. Inverse query and response example

The overall structure of an inverse query for retrieving the domain name
that corresponds to Internet address 10.1.0.52 is shown below:

                  +-----------------------------------------+
    Header        |          OPCODE=IQUERY, ID=997          |
                  +-----------------------------------------+
    Question      |                 <empty>                 |
                  +-----------------------------------------+
    Answer        |        <anyname> A IN 10.1.0.52         |
                  +-----------------------------------------+
    Authority     |                 <empty>                 |
                  +-----------------------------------------+
    Additional    |                 <empty>                 |
                  +-----------------------------------------+

This query asks for a question whose answer is the Internet style
address 10.1.0.52.  Since the owner name is not known, any domain name
can be used as a placeholder (and is ignored).  A single octet of zero,
signifying the root, is usually used because it minimizes the length of
the message.  The TTL of the RR is not significant.  The response to
this query might be:

                  +-----------------------------------------+
    Header        |         OPCODE=RESPONSE, ID=997         |
                  +-----------------------------------------+
    Question      |QTYPE=A, QCLASS=IN, QNAME=VENERA.ISI.EDU |
                  +-----------------------------------------+
    Answer        |      VENERA.ISI.EDU A IN 10.1.0.52      |
                  +-----------------------------------------+
    Authority     |                 <empty>                 |
                  +-----------------------------------------+
    Additional    |                 <empty>                 |
                  +-----------------------------------------+

Note that the QTYPE in a response to an inverse query is the same as the
TYPE field in the answer section of the inverse query.  Responses to
inverse queries may contain multiple questions when the inverse is not
unique.  If the question section in the response is not empty, then the
RR in the answer section is modified to be an exact copy of an RR at the
first QNAME.

6.4.3. Inverse query processing

Name servers that support inverse queries can support these operations
through exhaustive searches of their databases, but this becomes
impractical as the size of the database increases.  An alternative
approach is to invert the database according to the search key.

For name servers that support multiple zones and a large amount of data,
the recommended approach is separate inversions for each zone.  When a
particular zone is changed during a refresh, only its inversions need to
be redone.
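A per-zone inversion can be as simple as a map from (type, class, RDATA) to the owner names holding that resource, rebuilt whenever that zone is refreshed. In the Python sketch below, `invert_zone` and `InverseIndex` are hypothetical names, RRs are simplified to text tuples, and RDATA is case-folded, which, as noted above, is only correct for data the server knows to be text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ZoneRR = Tuple[str, str, str, str]   # (owner, type, class, rdata) as text
Key = Tuple[str, str, str]           # (type, class, case-folded rdata)


def invert_zone(rrs: Iterable[ZoneRR]) -> Dict[Key, List[str]]:
    """Map each resource in one zone to the owner names that hold it."""
    index: Dict[Key, List[str]] = defaultdict(list)
    for owner, rtype, rclass, rdata in rrs:
        index[(rtype, rclass, rdata.lower())].append(owner)
    return dict(index)


class InverseIndex:
    """One inversion per zone, so a refresh rebuilds only that zone's."""

    def __init__(self) -> None:
        self.by_zone: Dict[str, Dict[Key, List[str]]] = {}

    def refresh_zone(self, zone: str, rrs: Iterable[ZoneRR]) -> None:
        self.by_zone[zone] = invert_zone(rrs)

    def lookup(self, rtype: str, rclass: str, rdata: str) -> List[str]:
        """Owner names holding the resource, across all held zones."""
        key = (rtype, rclass, rdata.lower())
        names: List[str] = []
        for inversion in self.by_zone.values():
            names.extend(inversion.get(key, []))
        return names


idx = InverseIndex()
idx.refresh_zone("ISI.EDU.", [
    ("VENERA.ISI.EDU.", "A", "IN", "10.1.0.52"),
    ("VENERA.ISI.EDU.", "A", "IN", "128.9.0.32"),
    ("VAXA.ISI.EDU.", "A", "IN", "10.2.0.27"),
])
print(idx.lookup("A", "IN", "10.1.0.52"))   # ['VENERA.ISI.EDU.']
```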

Support for transfer of this type of inversion may be included in future
versions of the domain system, but is not supported in this version.

6.5. Completion queries and responses

The optional completion services described in RFC-882 and RFC-883 have
been deleted.  Redesigned services may become available in the future.

7. RESOLVER IMPLEMENTATION

The top levels of the recommended resolver algorithm are discussed in
[RFC-1034].  This section discusses implementation details assuming the
database structure suggested in the name server implementation section
of this memo.

7.1. Transforming a user request into a query

The first step a resolver takes is to transform the client's request,
stated in a format suitable to the local OS, into a search specification
for RRs at a specific name which match a specific QTYPE and QCLASS.
Where possible, the QTYPE and QCLASS should correspond to a single type
and a single class, because this makes the use of cached data much
simpler.  The reason for this is that the presence of data of one type
in a cache doesn't confirm the existence or non-existence of data of
other types; hence the only way to be sure is to consult an
authoritative source.  If QCLASS=* is used, then authoritative answers
won't be available.
23722372+23732373+Since a resolver must be able to multiplex multiple requests if it is to
23742374+perform its function efficiently, each pending request is usually
23752375+represented in some block of state information. This state block will
23762376+typically contain:
23772377+23782378+ - A timestamp indicating the time the request began.
23792379+ The timestamp is used to decide whether RRs in the database
23802380+ can be used or are out of date. This timestamp uses the
23812381+ absolute time format previously discussed for RR storage in
23822382+ zones and caches. Note that when an RR's TTL indicates a
23832383+ relative time, the RR must be timely, since it is part of a
23842384+ zone. When the RR has an absolute time, it is part of a
23852385+ cache, and the TTL of the RR is compared against the timestamp
23862386+ for the start of the request.
23872387+23882388+ Note that using the timestamp is superior to using a current
23892389+ time, since it allows RRs with TTLs of zero to be entered in
23902390+ the cache in the usual manner, but still used by the current
23912391+ request, even after intervals of many seconds due to system
23922392+ load, query retransmission timeouts, etc.
23932393+23942394+ - Some sort of parameters to limit the amount of work which will
23952395+ be performed for this request.
23962396+23972397+ The amount of work which a resolver will do in response to a
23982398+ client request must be limited to guard against errors in the
23992399+ database, such as circular CNAME references, and operational
24002400+ problems, such as network partition which prevents the
24092409+ resolver from accessing the name servers it needs. While
24102410+ local limits on the number of times a resolver will retransmit
24112411+ a particular query to a particular name server address are
24122412+ essential, the resolver should have a global per-request
24132413+ counter to limit work on a single request. The counter should
24142414+ be set to some initial value and decremented whenever the
24152415+ resolver performs any action (retransmission timeout,
24162416+ retransmission, etc.). If the counter passes zero, the request
24172417+ is terminated with a temporary error.
24182418+24192419+ Note that if the resolver structure allows one request to
24202420+ start others in parallel, such as when the need to access a
24212421+ name server for one request causes a parallel resolve for the
24222422+ name server's addresses, the spawned request should be started
24232423+ with a lower counter. This prevents circular references in
24242424+ the database from starting a chain reaction of resolver
24252425+ activity.
24262426+24272427+ - The SLIST data structure discussed in [RFC-1034].
24282428+24292429+ This structure keeps track of the state of a request if it
24302430+ must wait for answers from foreign name servers.
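The state block described above could be sketched in Python as follows. The names RequestState, CachedRR and TemporaryError are assumptions made for this illustration. So is reading "passes zero" as "drops below zero", and so is giving a spawned request half of the parent's remaining budget.

```python
from dataclasses import dataclass, field


class TemporaryError(Exception):
    """Raised when a request exhausts its work budget."""


@dataclass
class CachedRR:
    name: str
    rtype: str
    expires_at: float  # absolute time, the format used for cached RRs


@dataclass
class RequestState:
    started_at: float  # timestamp taken when the request began
    work_left: int     # global per-request work counter
    slist: list = field(default_factory=list)  # SLIST placeholder

    def rr_usable(self, rr: CachedRR) -> bool:
        # Compare with the request's start time rather than "now", so an RR
        # cached with TTL 0 during this request can still serve it.
        return rr.expires_at >= self.started_at

    def charge(self, cost: int = 1) -> None:
        """Decrement the counter for any action (send, retransmit, timeout)."""
        self.work_left -= cost
        if self.work_left < 0:  # assumed reading of "passes zero"
            raise TemporaryError("per-request work limit exceeded")

    def spawn(self, started_at: float, divisor: int = 2) -> "RequestState":
        # Assumed policy: a child request (e.g. resolving a name server's
        # address) gets only a fraction of the parent's remaining budget.
        return RequestState(started_at=started_at, work_left=self.work_left // divisor)
```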
24312431+24322432+7.2. Sending the queries
24332433+24342434+As described in [RFC-1034], the basic task of the resolver is to
24352435+formulate a query which will answer the client's request and direct that
24362436+query to name servers which can provide the information. The resolver
24372437+will usually only have very strong hints about which servers to ask, in
24382438+the form of NS RRs, and may have to revise the query, in response to
24392439+CNAMEs, or revise the set of name servers the resolver is asking, in
24402440+response to delegation responses which point the resolver to name
24412441+servers closer to the desired information. In addition to the
24422442+information requested by the client, the resolver may have to call upon
24432443+its own services to determine the address of name servers it wishes to
24442444+contact.
24452445+24462446+In any case, the model used in this memo assumes that the resolver is
24472447+multiplexing attention between multiple requests, some from the client,
24482448+and some internally generated. Each request is represented by some
24492449+state information, and the desired behavior is that the resolver
24502450+transmit queries to name servers in a way that maximizes the probability
24512451+that the request is answered, minimizes the time that the request takes,
24522452+and avoids excessive transmissions. The key algorithm uses the state
24532453+information of the request to select the next name server address to
24542454+query, and also computes a timeout which will cause the next action
24552455+should a response not arrive. The next action will usually be a
24562456+transmission to some other server, but may be a temporary error to the
24652465+client.
24662466+24672467+The resolver always starts with a list of server names to query (SLIST).
24682468+This list will be all NS RRs which correspond to the nearest ancestor
24692469+zone that the resolver knows about. To avoid startup problems, the
24702470+resolver should have a set of default servers which it will ask should
24712471+it have no current NS RRs which are appropriate. The resolver then adds
24722472+to SLIST all of the known addresses for the name servers, and may start
24732473+parallel requests to acquire the addresses of the servers when the
24742474+resolver has the name, but no addresses, for the name servers.
24752475+24762476+To complete initialization of SLIST, the resolver attaches whatever
24772477+history information it has to each address in SLIST. This will
24782478+usually consist of some sort of weighted averages for the response time
24792479+of the address, and the batting average of the address (i.e., how often
24802480+the address responded at all to the request). Note that this
24812481+information should be kept on a per address basis, rather than on a per
24822482+name server basis, because the response time and batting average of a
24832483+particular server may vary considerably from address to address. Note
24842484+also that this information is actually specific to a resolver address /
24852485+server address pair, so a resolver with multiple addresses may wish to
24862486+keep separate histories for each of its addresses. Part of this step
24872487+must deal with addresses which have no such history; in this case an
24882488+expected round trip time of 5-10 seconds should be the worst case, with
24892489+lower estimates for the same local network, etc.
24902490+24912491+Note that whenever a delegation is followed, the resolver algorithm
24922492+reinitializes SLIST.
24932493+24942494+The information establishes a partial ranking of the available name
24952495+server addresses. Each time an address is chosen, the state should
24962496+be altered to prevent its selection again until all other addresses have
24972497+been tried. The timeout for each transmission should be 50-100% greater
24982498+than the average predicted value to allow for variance in response.
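Below is a minimal Python sketch of SLIST initialization and address selection. It assumes history is stored per address as a (predicted RTT, batting average) pair. The 5 second default RTT follows the worst-case guidance above. The default batting average of 0.5, the ranking order, and the 75% timeout slack are hypothetical choices within the ranges the text allows.

```python
from dataclasses import dataclass

DEFAULT_RTT = 5.0      # seconds; worst-case guess for an address with no history
DEFAULT_BATTING = 0.5  # assumed prior for an address never queried before


@dataclass
class SlistEntry:
    server: str
    address: str
    predicted_rtt: float
    batting_avg: float  # fraction of queries this address answered
    tried: bool = False


def build_slist(servers, history):
    """servers: [(ns_name, [addresses])]; history: {address: (rtt, batting_avg)}.

    History is looked up per address, not per server name.
    """
    return [
        SlistEntry(name, addr, *history.get(addr, (DEFAULT_RTT, DEFAULT_BATTING)))
        for name, addrs in servers
        for addr in addrs
    ]


def next_address(slist, slack=0.75):
    """Pick the best untried address; return (entry, timeout) or None."""
    if not slist:
        return None
    if not 0.5 <= slack <= 1.0:
        raise ValueError("slack should be between 0.5 and 1.0 (50-100%)")
    candidates = [e for e in slist if not e.tried]
    if not candidates:  # every address has been tried: start another round
        for e in slist:
            e.tried = False
        candidates = slist
    # Hypothetical ranking: higher batting average first, then lower RTT.
    best = min(candidates, key=lambda e: (-e.batting_avg, e.predicted_rtt))
    best.tried = True
    return best, best.predicted_rtt * (1.0 + slack)
```

Marking an entry as tried keeps it from being selected again until the other addresses have had their turn. The round then restarts.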
24992499+25002500+Some fine points:
25012501+25022502+ - The resolver may encounter a situation where no addresses are
25032503+ available for any of the name servers named in SLIST, and
25042504+ where the servers in the list are precisely those which would
25052505+ normally be used to look up their own addresses. This
25062506+ situation typically occurs when the glue address RRs have a
25072507+ smaller TTL than the NS RRs marking delegation, or when the
25082508+ resolver caches the result of a NS search. The resolver
25092509+ should detect this condition and restart the search at the
25102510+ next ancestor zone, or alternatively at the root.
25192519+25202520+25212521+ - If a resolver gets a server error or other bizarre response
25222522+ from a name server, it should remove it from SLIST, and may
25232523+ wish to schedule an immediate transmission to the next
25242524+ candidate server address.
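The missing-glue condition in the first fine point can be approximated by a simple test. The test checks that SLIST has no addresses at all and that every server name lies inside the zone being asked about. The helper names below are hypothetical. Treating "inside the zone" as the self-reference test is a simplification, not the exact rule.

```python
def is_in_zone(name: str, zone: str) -> bool:
    """True if name equals zone or lies below it (label-wise, case-insensitive)."""
    n = name.rstrip(".").lower().split(".")
    z = zone.rstrip(".").lower().split(".") if zone.strip(".") else []  # [] = root
    return len(n) >= len(z) and n[len(n) - len(z):] == z


def parent_zone(zone: str) -> str:
    """Next ancestor zone; "" stands for the root."""
    labels = zone.rstrip(".").split(".")
    return ".".join(labels[1:])


def slist_is_stuck(zone, ns_names, known_addresses):
    """Hypothetical check for the missing-glue deadlock.

    zone: zone whose NS RRs filled SLIST; ns_names: names from those NS RRs;
    known_addresses: {server name: [addresses]} currently available.
    """
    if not ns_names:
        return False
    no_addresses = all(not known_addresses.get(ns) for ns in ns_names)
    # Approximation: a server named inside the zone can only be looked up
    # by asking that same zone's servers.
    self_referential = all(is_in_zone(ns, zone) for ns in ns_names)
    return no_addresses and self_referential
```

When slist_is_stuck() returns true, a resolver following the text would restart at parent_zone(zone) or at the root.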
25252525+25262526+7.3. Processing responses
25272527+25282528+The first step in processing arriving response datagrams is to parse the
25292529+response. This procedure should include:
25302530+25312531+ - Check the header for reasonableness. Discard datagrams which
25322532+ are queries when responses are expected.
25332533+25342534+ - Parse the sections of the message, and ensure that all RRs are
25352535+ correctly formatted.
25362536+25372537+ - As an optional step, check the TTLs of arriving data looking
25382538+ for RRs with excessively long TTLs. If a RR has an
25392539+ excessively long TTL, say greater than 1 week, either discard
25402540+ the whole response, or limit all TTLs in the response to 1
25412541+ week.
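The header check and the optional TTL limit could be written in Python as shown below. The 12-byte header layout and the QR bit (the high bit of the flags word) are as defined in section 4.1.1. The (name, type, ttl) tuple shape for parsed RRs and the function names are assumptions for this sketch.

```python
import struct

ONE_WEEK = 7 * 24 * 3600  # 604800 seconds


def check_header(msg: bytes, expect_response: bool = True):
    """Unpack the 12-byte header; return (id, flags, counts) or None to discard."""
    if len(msg) < 12:
        return None
    msg_id, flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    is_response = bool(flags & 0x8000)  # QR bit
    if is_response != expect_response:
        return None  # e.g. a query arrived where a response was expected
    return msg_id, flags, (qd, an, ns, ar)


def cap_ttls(rrs, limit=ONE_WEEK, discard=False):
    """rrs: [(name, rtype, ttl)]. Limit excessive TTLs, or reject the response."""
    if any(ttl > limit for _, _, ttl in rrs):
        if discard:
            return None  # caller drops the whole response
        return [(name, rtype, min(ttl, limit)) for name, rtype, ttl in rrs]
    return rrs
```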
25422542+25432543+The next step is to match the response to a current resolver request.
25442544+The recommended strategy is to do a preliminary matching using the ID
25452545+field in the domain header, and then to verify that the question section
25462546+corresponds to the information currently desired. This requires that
25472547+the transmission algorithm devote several bits of the domain ID field to
25482548+a request identifier of some sort. This step has several fine points:
25492549+25502550+ - Some name servers send their responses from different
25512551+ addresses than the one used to receive the query. That is, a
25522552+ resolver cannot rely on a response coming from the same
25532553+ address to which it sent the corresponding query. This name
25542554+ server bug is typically encountered in UNIX systems.
25552555+25562556+ - If the resolver retransmits a particular request to a name
25572557+ server it should be able to use a response from any of the
25582558+ transmissions. However, if it is using the response to sample
25592559+ the round trip time to access the name server, it must be able
25602560+ to determine which transmission matches the response (and keep
25612561+ transmission times for each outgoing message), or only
25622562+ calculate round trip times based on initial transmissions.
25632563+25642564+ - A name server will occasionally not have a current copy of a
25652565+ zone which it should have according to some NS RRs. The
25662566+ resolver should simply remove the name server from the current
25672567+ SLIST, and continue.
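Here is a Python sketch of two-stage matching. It first looks up the pending request by the ID field, then checks the question section. It deliberately ignores the source address, as the first fine point requires. It takes an RTT sample only when exactly one transmission was made, which is one way to satisfy the second fine point. The Matcher and Pending names are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Pending:
    qname: str
    qtype: int
    qclass: int
    sent_times: list = field(default_factory=list)  # one entry per transmission


class Matcher:
    def __init__(self):
        self.pending = {}  # DNS header ID -> Pending

    def record_send(self, msg_id, qname, qtype, qclass, now=None):
        p = self.pending.setdefault(msg_id, Pending(qname, qtype, qclass))
        p.sent_times.append(time.monotonic() if now is None else now)

    def match(self, msg_id, qname, qtype, qclass, now=None):
        """Return (Pending, rtt_sample or None); the source address is not checked."""
        p = self.pending.get(msg_id)
        if p is None:
            return None, None
        # Preliminary match by ID, then verify the question section.
        if (p.qname.lower(), p.qtype, p.qclass) != (qname.lower(), qtype, qclass):
            return None, None
        now = time.monotonic() if now is None else now
        # After a retransmission we cannot tell which copy was answered,
        # so only a single-transmission exchange yields an RTT sample.
        rtt = now - p.sent_times[0] if len(p.sent_times) == 1 else None
        del self.pending[msg_id]
        return p, rtt
```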
25682568+25692569+25702570+25712571+25722572+Mockapetris [Page 46]
25732573+25742574+RFC 1035 Domain Implementation and Specification November 1987
25752575+25762576+25772577+7.4. Using the cache
25782578+25792579+In general, we expect a resolver to cache all data which it receives in
25802580+responses since it may be useful in answering future client requests.
25812581+However, there are several types of data which should not be cached:
25822582+25832583+ - When several RRs of the same type are available for a
25842584+ particular owner name, the resolver should either cache them
25852585+ all or none at all. When a response is truncated, and a
25862586+ resolver doesn't know whether it has a complete set, it should
25872587+ not cache a possibly partial set of RRs.
25882588+25892589+ - Cached data should never be used in preference to
25902590+ authoritative data, so if caching would cause this to happen
25912591+ the data should not be cached.
25922592+25932593+ - The results of an inverse query should not be cached.
25942594+25952595+ - The results of standard queries where the QNAME contains "*"
25962596+ labels if the data might be used to construct wildcards. The
25972597+ reason is that the cache does not necessarily contain existing
25982598+ RRs or zone boundary information which is necessary to
25992599+ restrict the application of the wildcard RRs.
26002600+26012601+ - RR data in responses of dubious reliability. When a resolver
26022602+ receives unsolicited responses or RR data other than that
26032603+ requested, it should discard it without caching it. The basic
26042604+ implication is that all sanity checks on a packet should be
26052605+ performed before any of it is cached.
26062606+26072607+In a similar vein, when a resolver has a set of RRs for some name in a
26082608+response, and wants to cache the RRs, it should check its cache for
26092609+already existing RRs. Depending on the circumstances, either the data
26102610+in the response or the cache is preferred, but the two should never be
26112611+combined. If the data in the response is from authoritative data in the
26122612+answer section, it is always preferred.
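The all-or-none and never-combine rules can be sketched in Python as a function that stores either a complete RRset or nothing. Several details are assumptions: the RRSet type, the cache layout keyed by (owner, type, class), and the choice to skip every RRset from a truncated response. The text leaves the non-authoritative case to circumstances. Keeping authoritative data already held is only one possible policy.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class RRSet:
    rdatas: FrozenSet[str]  # every RR of one type at one owner name
    authoritative: bool     # from authoritative data in the answer section


Key = Tuple[str, str, str]  # (owner name, type, class)


def cache_rrset(cache: Dict[Key, RRSet], key: Key, incoming: RRSet, truncated: bool) -> bool:
    """Store a whole RRset or nothing; never merge it with a cached one."""
    if truncated:
        # Simplification: treat every set from a truncated response as
        # possibly partial and skip it.
        return False
    name, rtype, rclass = key
    key = (name.lower(), rtype, rclass)  # owner names compare case-insensitively
    existing = cache.get(key)
    if existing is not None and existing.authoritative and not incoming.authoritative:
        return False  # assumed policy: keep authoritative data already held
    cache[key] = incoming  # replace outright; the two sets are never combined
    return True
```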
26132613+26142614+8. MAIL SUPPORT
26152615+26162616+The domain system defines a standard for mapping mailboxes into domain
26172617+names, and two methods for using the mailbox information to derive mail
26182618+routing information. The first method is called mail exchange binding
26192619+and the other method is mailbox binding. The mailbox encoding standard
26202620+and mail exchange binding are part of the DNS official protocol, and are
26212621+the recommended method for mail routing in the Internet. Mailbox
26222622+binding is an experimental feature which is still under development and
26232623+subject to change.
26242624+26252625+26262626+26272627+26282628+Mockapetris [Page 47]
26292629+26302630+RFC 1035 Domain Implementation and Specification November 1987
26312631+26322632+26332633+The mailbox encoding standard assumes a mailbox name of the form
26342634+"<local-part>@<mail-domain>". While the syntax allowed in each of these
26352635+sections varies substantially between the various mail internets, the
26362636+preferred syntax for the ARPA Internet is given in [RFC-822].
26372637+26382638+The DNS encodes the <local-part> as a single label, and encodes the
26392639+<mail-domain> as a domain name. The single label from the <local-part>
26402640+is prefaced to the domain name from <mail-domain> to form the domain
26412641+name corresponding to the mailbox. Thus the mailbox
26422642+HOSTMASTER@SRI-NIC.ARPA is mapped into the domain name HOSTMASTER.SRI-NIC.ARPA. If the
26432643+<local-part> contains dots or other special characters, its
26442644+representation in a master file will require the use of backslash
26452645+quoting to ensure that the domain name is properly encoded. For
26462646+example, the mailbox Action.domains@ISI.EDU would be represented as
26472647+Action\.domains.ISI.EDU.
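The mailbox-to-domain-name mapping can be sketched in Python as follows. This sketch escapes only dots and backslashes in the <local-part>. It does not handle other characters that need quoting in master files. The function name is hypothetical.

```python
def mailbox_to_domain(mailbox: str) -> str:
    """Map "<local-part>@<mail-domain>" to master-file domain-name syntax."""
    local, sep, domain = mailbox.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected <local-part>@<mail-domain>")
    # The whole local-part becomes ONE label, so its dots (and any
    # backslashes) are escaped; the mail-domain's dots stay label separators.
    escaped = local.replace("\\", "\\\\").replace(".", "\\.")
    return f"{escaped}.{domain}"
```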
26482648+26492649+8.1. Mail exchange binding
26502650+26512651+Mail exchange binding uses the <mail-domain> part of a mailbox
26522652+specification to determine where mail should be sent. The <local-part>
26532653+is not even consulted. [RFC-974] specifies this method in detail, and
26542654+should be consulted before attempting to use mail exchange support.
26552655+26562656+One of the advantages of this method is that it decouples mail
26572657+destination naming from the hosts used to support mail service, at the
26582658+cost of another layer of indirection in the lookup function. However,
26592659+the additional layer should eliminate the need for complicated "%", "!",
26602660+etc. encodings in <local-part>.
26612661+26622662+The essence of the method is that the <mail-domain> is used as a domain
26632663+name to locate type MX RRs which list hosts willing to accept mail for
26642664+<mail-domain>, together with preference values which rank the hosts
26652665+according to an order specified by the administrators for <mail-domain>.
26662666+26672667+In this memo, the <mail-domain> ISI.EDU is used in examples, together
26682668+with the hosts VENERA.ISI.EDU and VAXA.ISI.EDU as mail exchanges for
26692669+ISI.EDU. If a mailer had a message for Mockapetris@ISI.EDU, it would
26702670+route it by looking up MX RRs for ISI.EDU. The MX RRs at ISI.EDU name
26712671+VENERA.ISI.EDU and VAXA.ISI.EDU, and type A queries can find the host
26722672+addresses.
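Ordering the mail exchanges amounts to sorting on the PREFERENCE field, where lower values are preferred. In the Python sketch below, the preference values for VENERA.ISI.EDU and VAXA.ISI.EDU are hypothetical. Ties simply keep the order in which the RRs arrived. [RFC-974] should be consulted for the full procedure.

```python
from typing import List, Tuple


def order_mx(mx_rrs: List[Tuple[int, str]]) -> List[str]:
    """Return exchange host names, lowest (most preferred) PREFERENCE first."""
    # sorted() is stable, so equal preferences keep their arrival order.
    return [host for pref, host in sorted(mx_rrs, key=lambda rr: rr[0])]
```

After ordering, the mailer issues type A queries for each exchange in turn.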
26732673+26742674+8.2. Mailbox binding (Experimental)
26752675+26762676+In mailbox binding, the mailer uses the entire mail destination
26772677+specification to construct a domain name. The encoded domain name for
26782678+the mailbox is used as the QNAME field in a QTYPE=MAILB query.
26792679+26802680+Several outcomes are possible for this query:
26872687+26882688+26892689+ 1. The query can return a name error indicating that the mailbox
26902690+ does not exist as a domain name.
26912691+26922692+ In the long term, this would indicate that the specified
26932693+ mailbox doesn't exist. However, until the use of mailbox
26942694+ binding is universal, this error condition should be
26952695+ interpreted to mean that the organization identified by the
26962696+ global part does not support mailbox binding. The
26972697+ appropriate procedure is to revert to exchange binding at
26982698+ this point.
26992699+27002700+ 2. The query can return a Mail Rename (MR) RR.
27022702+27032703+ The MR RR carries a new mailbox specification in its RDATA
27032703+ field. The mailer should replace the old mailbox with the
27042704+ new one and retry the operation.
27052705+27062706+ 3. The query can return a MB RR.
27072707+27082708+ The MB RR carries a domain name for a host in its RDATA
27092709+ field. The mailer should deliver the message to that host
27102710+ via whatever protocol is applicable, e.g., SMTP.
27112711+27122712+ 4. The query can return one or more Mail Group (MG) RRs.
27132713+27142714+ This condition means that the mailbox was actually a mailing
27152715+ list or mail group, rather than a single mailbox. Each MG RR
27162716+ has a RDATA field that identifies a mailbox that is a member
27172717+ of the group. The mailer should deliver a copy of the
27182718+ message to each member.
27192719+27202720+ 5. The query can return a MB RR as well as one or more MG RRs.
27212721+27222722+ This condition means the mailbox was actually a mailing
27232723+ list. The mailer can either deliver the message to the host
27242724+ specified by the MB RR, which will in turn do the delivery to
27252725+ all members, or the mailer can use the MG RRs to do the
27262726+ expansion itself.
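The five outcomes listed above map naturally onto a small dispatch function. In this Python sketch, the MailbResult fields and the action strings are hypothetical names. MINFO handling, described next, is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MailbResult:
    name_error: bool = False
    mr: Optional[str] = None  # new mailbox from an MR RR
    mb: Optional[str] = None  # host domain name from an MB RR
    mg: List[str] = field(default_factory=list)  # member mailboxes from MG RRs


def next_step(result: MailbResult, expand_groups: bool = False):
    """Translate a QTYPE=MAILB outcome into an (action, argument) pair."""
    if result.name_error:
        return ("use-exchange-binding", None)  # outcome 1
    if result.mr is not None:
        return ("retry-with-mailbox", result.mr)  # outcome 2
    if result.mb is not None and result.mg:
        # Outcome 5: let the MB host expand the list, or expand it locally.
        if expand_groups:
            return ("deliver-to-members", list(result.mg))
        return ("deliver-to-host", result.mb)
    if result.mb is not None:
        return ("deliver-to-host", result.mb)  # outcome 3
    if result.mg:
        return ("deliver-to-members", list(result.mg))  # outcome 4
    return ("no-data", None)
```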
27272727+27282728+In any of these cases, the response may include a Mail Information
27292729+(MINFO) RR. This RR is usually associated with a mail group, but is
27302730+legal with a MB. The MINFO RR identifies two mailboxes. One of these
27312731+identifies a responsible person for the original mailbox name. This
27322732+mailbox should be used for requests to be added to a mail group, etc.
27332733+The second mailbox name in the MINFO RR identifies a mailbox that should
27342734+receive error messages for mail failures. This is particularly
27352735+appropriate for mailing lists when errors in member names should be
27362736+reported to a person other than the one who sends a message to the list.
27432743+27442744+27452745+New fields may be added to this RR in the future.
27462746+27472747+27482748+9. REFERENCES and BIBLIOGRAPHY
27492749+27502750+[Dyer 87] S. Dyer, F. Hsu, "Hesiod", Project Athena
27512751+ Technical Plan - Name Service, April 1987, version 1.9.
27522752+27532753+ Describes the fundamentals of the Hesiod name service.
27542754+27552755+[IEN-116] J. Postel, "Internet Name Server", IEN-116,
27562756+ USC/Information Sciences Institute, August 1979.
27572757+27582758+ A name service obsoleted by the Domain Name System, but
27592759+ still in use.
27602760+27612761+[Quarterman 86] J. Quarterman, and J. Hoskins, "Notable Computer Networks",
27622762+ Communications of the ACM, October 1986, volume 29, number
27632763+ 10.
27642764+27652765+[RFC-742] K. Harrenstien, "NAME/FINGER", RFC-742, Network
27662766+ Information Center, SRI International, December 1977.
27672767+27682768+[RFC-768] J. Postel, "User Datagram Protocol", RFC-768,
27692769+ USC/Information Sciences Institute, August 1980.
27702770+27712771+[RFC-793] J. Postel, "Transmission Control Protocol", RFC-793,
27722772+ USC/Information Sciences Institute, September 1981.
27732773+27742774+[RFC-799] D. Mills, "Internet Name Domains", RFC-799, COMSAT,
27752775+ September 1981.
27762776+27772777+ Suggests introduction of a hierarchy in place of a flat
27782778+ name space for the Internet.
27792779+27802780+[RFC-805] J. Postel, "Computer Mail Meeting Notes", RFC-805,
27812781+ USC/Information Sciences Institute, February 1982.
27822782+27832783+[RFC-810] E. Feinler, K. Harrenstien, Z. Su, and V. White, "DOD
27842784+ Internet Host Table Specification", RFC-810, Network
27852785+ Information Center, SRI International, March 1982.
27862786+27872787+ Obsolete. See RFC-952.
27882788+27892789+[RFC-811] K. Harrenstien, V. White, and E. Feinler, "Hostnames
27902790+ Server", RFC-811, Network Information Center, SRI
27912791+ International, March 1982.
27992799+28002800+28012801+ Obsolete. See RFC-953.
28022802+28032803+[RFC-812] K. Harrenstien, and V. White, "NICNAME/WHOIS", RFC-812,
28042804+ Network Information Center, SRI International, March
28052805+ 1982.
28062806+28072807+[RFC-819] Z. Su, and J. Postel, "The Domain Naming Convention for
28082808+ Internet User Applications", RFC-819, Network
28092809+ Information Center, SRI International, August 1982.
28102810+28112811+ Early thoughts on the design of the domain system.
28122812+ Current implementation is completely different.
28132813+28142814+[RFC-821] J. Postel, "Simple Mail Transfer Protocol", RFC-821,
28152815+ USC/Information Sciences Institute, August 1982.
28162816+28172817+[RFC-830] Z. Su, "A Distributed System for Internet Name Service",
28182818+ RFC-830, Network Information Center, SRI International,
28192819+ October 1982.
28202820+28212821+ Early thoughts on the design of the domain system.
28222822+ Current implementation is completely different.
28232823+28242824+[RFC-882] P. Mockapetris, "Domain names - Concepts and
28252825+ Facilities," RFC-882, USC/Information Sciences
28262826+ Institute, November 1983.
28272827+28282828+ Superseded by this memo.
28292829+28302830+[RFC-883] P. Mockapetris, "Domain names - Implementation and
28312831+ Specification," RFC-883, USC/Information Sciences
28322832+ Institute, November 1983.
28332833+28342834+ Superseded by this memo.
28352835+28362836+[RFC-920] J. Postel and J. Reynolds, "Domain Requirements",
28372837+ RFC-920, USC/Information Sciences Institute,
28382838+ October 1984.
28392839+28402840+ Explains the naming scheme for top level domains.
28412841+28422842+[RFC-952] K. Harrenstien, M. Stahl, E. Feinler, "DoD Internet Host
28432843+ Table Specification", RFC-952, SRI, October 1985.
28442844+28452845+ Specifies the format of HOSTS.TXT, the host/address
28462846+ table replaced by the DNS.
28472847+28482848+28492849+28502850+28512851+28522852+Mockapetris [Page 51]
28532853+28542854+RFC 1035 Domain Implementation and Specification November 1987
28552855+28562856+28572857+[RFC-953] K. Harrenstien, M. Stahl, E. Feinler, "HOSTNAME Server",
28582858+ RFC-953, SRI, October 1985.
28592859+28602860+ This RFC contains the official specification of the
28612861+ hostname server protocol, which is obsoleted by the DNS.
28622862+ This TCP based protocol accesses information stored in
28632863+ the RFC-952 format, and is used to obtain copies of the
28642864+ host table.
28652865+28662866+[RFC-973] P. Mockapetris, "Domain System Changes and
28672867+ Observations", RFC-973, USC/Information Sciences
28682868+ Institute, January 1986.
28692869+28702870+ Describes changes to RFC-882 and RFC-883 and reasons for
28712871+ them.
28722872+28732873+[RFC-974] C. Partridge, "Mail routing and the domain system",
28742874+ RFC-974, CSNET CIC BBN Labs, January 1986.
28752875+28762876+ Describes the transition from HOSTS.TXT based mail
28772877+ addressing to the more powerful MX system used with the
28782878+ domain system.
28792879+28802880+[RFC-1001] NetBIOS Working Group, "Protocol standard for a NetBIOS
28812881+ service on a TCP/UDP transport: Concepts and Methods",
28822882+ RFC-1001, March 1987.
28832883+28842884+ This RFC and RFC-1002 are a preliminary design for
28852885+ NETBIOS on top of TCP/IP which proposes to base NetBIOS
28862886+ name service on top of the DNS.
28872887+28882888+[RFC-1002] NetBIOS Working Group, "Protocol standard for a NetBIOS
28892889+ service on a TCP/UDP transport: Detailed
28902890+ Specifications", RFC-1002, March 1987.
28912891+28922892+[RFC-1010] J. Reynolds, and J. Postel, "Assigned Numbers", RFC-1010,
28932893+ USC/Information Sciences Institute, May 1987.
28942894+28952895+ Contains socket numbers and mnemonics for host names,
28962896+ operating systems, etc.
28972897+28982898+[RFC-1031] W. Lazear, "MILNET Name Domain Transition", RFC-1031,
28992899+ November 1987.
29002900+29012901+ Describes a plan for converting the MILNET to the DNS.
29022902+29032903+[RFC-1032] M. Stahl, "Establishing a Domain - Guidelines for
29042904+ Administrators", RFC-1032, November 1987.
29052905+29062906+29072907+29082908+Mockapetris [Page 52]
29092909+29102910+RFC 1035 Domain Implementation and Specification November 1987
29112911+29122912+29132913+ Describes the registration policies used by the NIC to
29142914+ administer the top level domains and delegate subzones.
29152915+29162916+[RFC-1033] M. Lottor, "Domain Administrators Operations Guide",
29172917+ RFC-1033, November 1987.
29182918+29192919+ A cookbook for domain administrators.
29202920+29212921+[Solomon 82] M. Solomon, L. Landweber, and D. Neuhengen, "The CSNET
29222922+ Name Server", Computer Networks, vol 6, nr 3, July 1982.
29232923+29242924+ Describes a name service for CSNET which is independent
29252925+ from the DNS and DNS use in the CSNET.
29672967+29682968+29692969+Index
29702970+29712971+ * 13
29722972+29732973+ ; 33, 35
29742974+29752975+ <character-string> 35
29762976+ <domain-name> 34
29772977+29782978+ @ 35
29792979+29802980+ \ 35
29812981+29822982+ A 12
29832983+29842984+ Byte order 8
29852985+29862986+ CH 13
29872987+ Character case 9
29882988+ CLASS 11
29892989+ CNAME 12
29902990+ Completion 42
29912991+ CS 13
29922992+29932993+ Hesiod 13
29942994+ HINFO 12
29952995+ HS 13
29962996+29972997+ IN 13
29982998+ IN-ADDR.ARPA domain 22
29992999+ Inverse queries 40
30003000+30013001+ Mailbox names 47
30023002+ MB 12
30033003+ MD 12
30043004+ MF 12
30053005+ MG 12
30063006+ MINFO 12
30073007+ MINIMUM 20
30083008+ MR 12
30093009+ MX 12
30103010+30113011+ NS 12
30123012+ NULL 12
30133013+30143014+ Port numbers 32
30153015+ Primary server 5
30163016+ PTR 12, 18
30233023+30243024+30253025+ QCLASS 13
30263026+ QTYPE 12
30273027+30283028+ RDATA 12
30293029+ RDLENGTH 11
30303030+30313031+ Secondary server 5
30323032+ SOA 12
30333033+ Stub resolvers 7
30343034+30353035+ TCP 32
30363036+ TXT 12
30373037+ TYPE 11
30383038+30393039+ UDP 32
30403040+30413041+ WKS 12
30423042+30433043+30443044+30453045+30463046+30473047+30483048+30493049+30503050+30513051+30523052+30533053+30543054+30553055+30563056+30573057+30583058+30593059+30603060+30613061+30623062+30633063+30643064+30653065+30663066+30673067+30683068+30693069+30703070+30713071+30723072+30733073+30743074+30753075+30763076+Mockapetris [Page 55]
30773077+
+955
spec/rfc5891.txt
···11+22+33+44+55+66+77+Internet Engineering Task Force (IETF) J. Klensin
88+Request for Comments: 5891 August 2010
99+Obsoletes: 3490, 3491
1010+Updates: 3492
1111+Category: Standards Track
1212+ISSN: 2070-1721
1313+1414+1515+ Internationalized Domain Names in Applications (IDNA): Protocol
1616+1717+Abstract
1818+1919+ This document is the revised protocol definition for
2020+ Internationalized Domain Names (IDNs). The rationale for changes,
2121+ the relationship to the older specification, and important
2222+ terminology are provided in other documents. This document specifies
2323+ the protocol mechanism, called Internationalized Domain Names in
2424+ Applications (IDNA), for registering and looking up IDNs in a way
2525+ that does not require changes to the DNS itself. IDNA is only meant
2626+ for processing domain names, not free text.
2727+2828+Status of This Memo
2929+3030+ This is an Internet Standards Track document.
3131+3232+ This document is a product of the Internet Engineering Task Force
3333+ (IETF). It represents the consensus of the IETF community. It has
3434+ received public review and has been approved for publication by the
3535+ Internet Engineering Steering Group (IESG). Further information on
3636+ Internet Standards is available in Section 2 of RFC 5741.
3737+3838+ Information about the current status of this document, any errata,
3939+ and how to provide feedback on it may be obtained at
4040+ http://www.rfc-editor.org/info/rfc5891.
4141+4242+4343+4444+4545+4646+4747+4848+4949+5050+5151+5252+5353+5454+5555+5656+5757+5858+Klensin Standards Track [Page 1]
5959+6060+RFC 5891 IDNA2008 Protocol August 2010

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Requirements and Applicability
     3.1.  Requirements
     3.2.  Applicability
       3.2.1.  DNS Resource Records
       3.2.2.  Non-Domain-Name Data Types Stored in the DNS
   4.  Registration Protocol
     4.1.  Input to IDNA Registration
     4.2.  Permitted Character and Label Validation
       4.2.1.  Input Format
       4.2.2.  Rejection of Characters That Are Not Permitted
       4.2.3.  Label Validation
       4.2.4.  Registration Validation Requirements
     4.3.  Registry Restrictions
     4.4.  Punycode Conversion
     4.5.  Insertion in the Zone
   5.  Domain Name Lookup Protocol
     5.1.  Label String Input
     5.2.  Conversion to Unicode
     5.3.  A-label Input
     5.4.  Validation and Character List Testing
     5.5.  Punycode Conversion
     5.6.  DNS Name Resolution
   6.  Security Considerations
   7.  IANA Considerations
   8.  Contributors
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Summary of Major Changes from IDNA2003

1. Introduction

   This document supplies the protocol definition for Internationalized
   Domain Names in Applications (IDNA), with the version specified here
   known as IDNA2008. Essential definitions and terminology for
   understanding this document and a road map of the collection of
   documents that make up IDNA2008 appear in a separate Definitions
   document [RFC5890]. Appendix A discusses the relationship between
   this specification and the earlier version of IDNA (referred to here
   as "IDNA2003"). The rationale for these changes, along with
   considerable explanatory material and advice to zone administrators
   who support IDNs, is provided in another document, known informally
   in this series as the "Rationale document" [RFC5894].

   IDNA works by allowing applications to use certain ASCII [ASCII]
   string labels (beginning with the special prefix "xn--") to
   represent non-ASCII name labels. Lower-layer protocols need not be
   aware of this; therefore, IDNA does not change any infrastructure.
   In particular, IDNA does not depend on any changes to DNS servers,
   resolvers, or DNS protocol elements, because the ASCII name service
   provided by the existing DNS can be used for IDNA.

   IDNA applies only to a specific subset of DNS labels. The base DNS
   standards [RFC1034] [RFC1035] and their various updates specify how
   to combine labels into fully-qualified domain names and parse labels
   out of those names.

   This document describes two separate protocols, one for IDN
   registration (Section 4) and one for IDN lookup (Section 5). These
   two protocols share some terminology, reference data, and operations.

2. Terminology

   As mentioned above, terminology used as part of the definition of
   IDNA appears in the Definitions document [RFC5890]. It is worth
   noting that some of this terminology overlaps with, and is consistent
   with, that used in Unicode or other character set standards and the
   DNS. Readers of this document are assumed to be familiar with the
   associated Definitions document and with the DNS-specific terminology
   in RFC 1034 [RFC1034].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

3. Requirements and Applicability

3.1. Requirements

   IDNA makes the following requirements:

   1.  Whenever a domain name is put into a domain name slot that is not
       IDNA-aware (see Section 2.3.2.6 of the Definitions document
       [RFC5890]), it MUST contain only ASCII characters (i.e., its
       labels must be either A-labels or NR-LDH labels), unless the DNS
       application is not subject to historical recommendations for
       "hostname"-style names (see RFC 1034 [RFC1034] and
       Section 3.2.1).

   2.  Labels MUST be compared using equivalent forms: either both
       A-label forms or both U-label forms. Because A-labels and
       U-labels can be transformed into each other without loss of
       information, these comparisons are equivalent (however, in
       practice, comparison of U-labels requires first verifying that
       they actually are U-labels and not just Unicode strings). A pair
       of A-labels MUST be compared as case-insensitive ASCII (as with
       all comparisons of ASCII DNS labels). U-labels MUST be compared
       as-is, without case folding or other intermediate steps. While
       it is not necessary to validate labels in order to compare them,
       successful comparison does not imply validity. In many cases,
       not limited to comparison, validation may be important for other
       reasons and SHOULD be performed.

   3.  Labels being registered MUST conform to the requirements of
       Section 4. Labels being looked up and the lookup process MUST
       conform to the requirements of Section 5.
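
   The comparison rules in requirement 2 can be illustrated with a short
   Python sketch. It is not part of this specification: the function
   names are hypothetical, no validation is performed (as noted above,
   comparison does not imply validity), and Python's built-in
   "punycode" codec is assumed to be an acceptable RFC 3492
   implementation. Any non-ASCII input is assumed to be a U-label and
   is converted to A-label form, so both labels are always compared in
   equivalent forms.

```python
def to_comparison_form(label: str) -> str:
    """Return an ASCII form of a label suitable for comparison.

    ASCII labels (A-labels or NR-LDH labels) are lowercased, because ASCII
    DNS labels compare case-insensitively.  Any other label is assumed to
    be a U-label and is Punycode-encoded exactly as given (no case folding).
    """
    if label.isascii():
        return label.lower()
    return "xn--" + label.encode("punycode").decode("ascii")


def labels_equal(x: str, y: str) -> bool:
    """Compare two labels in equivalent (A-label) form; no validation."""
    return to_comparison_form(x) == to_comparison_form(y)


if __name__ == "__main__":
    print(labels_equal("XN--BCHER-KVA", "bücher"))  # True
    print(labels_equal("bücher", "Bücher"))  # False
```

   Lowercasing is applied only to ASCII input. Two U-labels that differ
   only in case therefore still compare as different, which matches the
   "as-is" rule for U-labels.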

3.2. Applicability

   IDNA applies to all domain names in all domain name slots in
   protocols except where it is explicitly excluded. It does not apply
   to domain name slots that do not use the LDH syntax rules as
   described in the Definitions document [RFC5890].

   Because it uses the DNS, IDNA applies to many protocols that were
   specified before it was designed. IDNs occupying domain name slots
   in those older protocols MUST be in A-label form until and unless
   those protocols and their implementations are explicitly upgraded to
   be aware of IDNs and to accept the U-label form. IDNs actually
   appearing in DNS queries or responses MUST be A-labels.

   IDNA-aware protocols and implementations MAY accept U-labels,
   A-labels, or both as those particular protocols specify. IDNA is not
   defined for extended label types (see RFC 2671 [RFC2671], Section 3).

3.2.1. DNS Resource Records

   IDNA applies only to domain names in the NAME and RDATA fields of DNS
   resource records whose CLASS is IN. See the DNS specification
   [RFC1035] for precise definitions of these terms.

   The application of IDNA to DNS resource records depends entirely on
   the CLASS of the record, and not on the TYPE except as noted below.
   This will remain true, even as new TYPEs are defined, unless a new
   TYPE defines TYPE-specific rules. Special naming conventions for SRV
   records (and "underscore labels" more generally) are incompatible
   with IDNA coding as discussed in the Definitions document [RFC5890],
   especially Section 2.3.2.3. Of course, underscore labels may be part
   of a domain that uses IDN labels at higher levels in the tree.

3.2.2. Non-Domain-Name Data Types Stored in the DNS

   Although IDNA enables the representation of non-ASCII characters in
   domain names, that does not imply that IDNA enables the
   representation of non-ASCII characters in other data types that are
   stored in domain names, specifically in the RDATA field for types
   that have structured RDATA format. For example, an email address
   local part is stored in a domain name in the RNAME field as part of
   the RDATA of an SOA record (e.g., hostmaster@example.com would be
   represented as hostmaster.example.com). IDNA does not update the
   existing email standards, which allow only ASCII characters in local
   parts. Even though work is in progress to define
   internationalization for email addresses [RFC4952], changes to the
   email address part of the SOA RDATA would require action in, or
   updates to, other standards, specifically those that specify the
   format of the SOA RR.
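
   The SOA example above can be sketched as a small Python helper that
   produces the master-file (presentation) form of an RNAME. The helper
   name is hypothetical. Escaping a literal "." in the local part with a
   backslash follows the usual master-file convention and is an
   assumption beyond this document. Consistent with the paragraph
   above, non-ASCII local parts are rejected rather than encoded,
   because IDNA does not apply to them.

```python
def mailbox_to_rname(address: str) -> str:
    """Return the SOA RNAME (master-file form) for a mailbox address.

    The "@" that separates the local part from the domain becomes a
    label separator, so the local part turns into the first label.  A
    literal "." inside the local part is escaped with a backslash so it
    is not read as a separator.  Non-ASCII local parts are rejected,
    because IDNA does not extend to them.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox address: {address!r}")
    if not local.isascii():
        raise ValueError("non-ASCII local parts cannot be stored in RNAME")
    return local.replace(".", "\\.") + "." + domain


if __name__ == "__main__":
    print(mailbox_to_rname("hostmaster@example.com"))  # hostmaster.example.com
```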

4. Registration Protocol

   This section defines the model for registering an IDN. The model is
   implementation independent; any sequence of steps that produces
   exactly the same result for all labels is considered a valid
   implementation.

   Note that, while the registration (this section) and lookup protocols
   (Section 5) are very similar in most respects, they are not
   identical, and implementers should carefully follow the steps
   described in this specification.

4.1. Input to IDNA Registration

   Registration processes, especially processing by entities (often
   called "registrars") who deal with registrants before the request
   actually reaches the zone manager ("registry"), are outside the scope
   of this definition and may differ significantly depending on local
   needs. By the time a string enters the IDNA registration process as
   described in this specification, it MUST be in Unicode and in
   Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
   zone files ("registries") MUST accept only the exact string for which
   registration is requested, free of any mappings or local adjustments.
   They MAY accept that input in any of three forms:

   1.  As a pair of A-label and U-label.

   2.  As an A-label only.

   3.  As a U-label only.

   The first two of these forms are RECOMMENDED because the use of
   A-labels avoids any possibility of ambiguity. The first is normally
   preferred over the second because it permits further verification of
   user intent (see Section 4.2.1).

4.2. Permitted Character and Label Validation

4.2.1. Input Format

   If both the U-label and A-label forms are available, the registry
   MUST ensure that the A-label form is in lowercase, perform a
   conversion to a U-label, perform the steps and tests described below
   on that U-label, and then verify that the A-label produced by the
   step in Section 4.4 matches the one provided as input. In addition,
   the U-label that was provided as input and the one obtained by
   conversion of the A-label MUST match exactly. If, for some reason,
   these tests fail, the registration MUST be rejected.
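
   The cross-check in the preceding paragraph might look like this in
   Python. This is a sketch under stated assumptions: the function name
   is hypothetical, Python's "punycode" codec stands in for RFC 3492,
   mixed-case A-label input is lowercased rather than rejected, and the
   character and label tests of Sections 4.2.2 and 4.2.3 (and registry
   policy, Section 4.3) are left out.

```python
import unicodedata


def verify_registration_pair(a_label: str, u_label: str) -> bool:
    """Cross-check a registrant-supplied A-label/U-label pair.

    Sketch only: the tests of Sections 4.2.2-4.2.3 and registry policy
    (Section 4.3) would run on the decoded U-label but are omitted here.
    """
    a_label = a_label.lower()  # assumption: lowercase rather than reject
    if not a_label.isascii() or not a_label.startswith("xn--"):
        return False
    try:
        decoded = a_label[4:].encode("ascii").decode("punycode")
    except UnicodeError:  # malformed Punycode
        return False
    # Registration input must already be in NFC (Section 4.1).
    if not unicodedata.is_normalized("NFC", u_label):
        return False
    # The supplied U-label and the decoded A-label must match exactly.
    if decoded != u_label:
        return False
    # Re-encoding (the Section 4.4 step) must reproduce the supplied A-label.
    return "xn--" + decoded.encode("punycode").decode("ascii") == a_label


if __name__ == "__main__":
    print(verify_registration_pair("xn--bcher-kva", "bücher"))  # True
    print(verify_registration_pair("xn--bcher-kva", "Bücher"))  # False
```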

   If only an A-label was provided and the conversion to a U-label is
   not performed, the registry MUST still verify that the A-label is
   superficially valid, i.e., that it does not violate any of the rules
   of Punycode encoding [RFC3492] such as the prohibition on trailing
   hyphen-minus, the requirement that all characters be ASCII, and so
   on. Strings that appear to be A-labels (e.g., they start with
   "xn--") and strings that are supplied to the registry in a context
   reserved for A-labels (such as a field in a form to be filled out),
   but that are not valid A-labels as described in this paragraph, MUST
   NOT be placed in DNS zones that support IDNA.
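
   A rough Python sketch of these superficial checks follows. The exact
   set of checks is an assumption. It looks only at syntax (ASCII LDH
   characters, the "xn--" prefix, no trailing hyphen-minus, and the
   63-octet DNS label limit). It also checks whether the Punycode
   payload decodes, via Python's "punycode" codec, to a string
   containing at least one non-ASCII character. It does not establish
   that the result is a U-label, and the function name is hypothetical.

```python
import re

LDH_LOWER = re.compile(r"[a-z0-9-]+")


def is_superficially_valid_a_label(label: str) -> bool:
    """Syntax-only checks on a string claimed to be an A-label."""
    if not label.isascii():
        return False
    label = label.lower()
    if len(label) > 63 or not label.startswith("xn--"):
        return False
    if LDH_LOWER.fullmatch(label) is None or label.endswith("-"):
        return False
    try:
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:  # malformed Punycode payload
        return False
    # An A-label must stand for a string with at least one non-ASCII character.
    return not decoded.isascii()


if __name__ == "__main__":
    print(is_superficially_valid_a_label("xn--bcher-kva"))  # True
    print(is_superficially_valid_a_label("xn--bcher-kva-"))  # False
```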

   If only an A-label is provided, the conversion to a U-label is not
   performed, and the superficial tests described in the previous
   paragraph are performed, then registration procedures MAY, and
   usually will, bypass the tests and actions in the balance of
   Section 4.2 and in Sections 4.3 and 4.4.

4.2.2. Rejection of Characters That Are Not Permitted

   The candidate Unicode string MUST NOT contain characters that appear
   in the "DISALLOWED" or "UNASSIGNED" lists specified in the Tables
   document [RFC5892].
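
   The Tables document [RFC5892] derives each code point's category
   from Unicode properties. The Python sketch below is a deliberately
   simplified, hypothetical approximation of that derivation. It is
   shown only to illustrate the kind of test this section requires. It
   keeps the Unassigned, LDH, JoinControl, Unstable, and LetterDigits
   rules and only three of the Exceptions. It omits the
   BackwardCompatible, IgnorableProperties, IgnorableBlocks, and
   OldHangulJamo rules, so it accepts some code points that the real
   tables disallow. Its results also depend on the Unicode version of
   the Python runtime. A conforming implementation would use the
   published tables instead.

```python
import unicodedata

# Three of the RFC 5892 Exceptions (the full list is longer).
EXCEPTIONS = {
    0x00DF: "PVALID",    # LATIN SMALL LETTER SHARP S
    0x03C2: "PVALID",    # GREEK SMALL LETTER FINAL SIGMA
    0x00B7: "CONTEXTO",  # MIDDLE DOT
}
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_noncharacter(cp: int) -> bool:
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def approximate_derived_property(ch: str) -> str:
    """Simplified, incomplete stand-in for the RFC 5892 derivation."""
    cp = ord(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if unicodedata.category(ch) == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if ch == "-" or "0" <= ch <= "9" or "a" <= ch <= "z":  # LDH
        return "PVALID"
    if cp in (0x200C, 0x200D):  # JoinControl
        return "CONTEXTJ"
    # Unstable: the code point changes under NFKC, case folding, NFKC.
    nfkc = unicodedata.normalize("NFKC", ch)
    if unicodedata.normalize("NFKC", nfkc.casefold()) != ch:
        return "DISALLOWED"
    # IgnorableProperties, IgnorableBlocks and OldHangulJamo omitted here.
    if unicodedata.category(ch) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


def contains_disallowed_or_unassigned(label: str) -> bool:
    return any(
        approximate_derived_property(ch) in ("DISALLOWED", "UNASSIGNED")
        for ch in label
    )


if __name__ == "__main__":
    print(contains_disallowed_or_unassigned("bücher"))  # False
    print(contains_disallowed_or_unassigned("Bücher"))  # True: "B" is unstable
```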

4.2.3. Label Validation

   The proposed label (in the form of a Unicode string, i.e., a string
   that at least superficially appears to be a U-label) is then examined
   using tests that require examination of more than one character.
   Character order is considered to be the on-the-wire order. That
   order may not be the same as the display order.

4.2.3.1. Hyphen Restrictions

   The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
   the third and fourth character positions and MUST NOT start or end
   with a "-" (hyphen).
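
   The hyphen restrictions translate directly into code. In this
   minimal Python sketch, the function name is hypothetical. Character
   positions are counted in code points and are 1-based in the text
   above, so the third and fourth positions correspond to the slice
   label[2:4].

```python
def violates_hyphen_restrictions(label: str) -> bool:
    """Section 4.2.3.1: "--" in positions 3-4, or a leading/trailing "-"."""
    # Positions 3 and 4 (1-based, in code points) are label[2:4].
    return label[2:4] == "--" or label.startswith("-") or label.endswith("-")


if __name__ == "__main__":
    for candidate in ("bücher", "ab--cd", "-bücher", "bücher-", "a--b"):
        print(candidate, violates_hyphen_restrictions(candidate))
```

   For example, "ab--cd" is rejected while "a--b" is not, because the
   hyphens in "a--b" occupy the second and third positions.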

4.2.3.2. Leading Combining Marks

   The Unicode string MUST NOT begin with a combining mark or combining
   character (see The Unicode Standard, Section 2.11 [Unicode] for an
   exact definition).
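
   A minimal Python check follows. It assumes, as is common in
   implementations, that "combining mark" can be approximated by the
   Unicode General_Category values Mn, Mc, and Me (all beginning with
   "M"). The Unicode Standard remains the authoritative definition, and
   the function name is hypothetical.

```python
import unicodedata


def begins_with_combining_mark(label: str) -> bool:
    """Section 4.2.3.2, approximating "combining mark" as category M*."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


if __name__ == "__main__":
    print(begins_with_combining_mark("\u0301abc"))  # True: COMBINING ACUTE ACCENT
    print(begins_with_combining_mark("a\u0301bc"))  # False
```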

4.2.3.3. Contextual Rules

   The Unicode string MUST NOT contain any characters whose validity is
   context-dependent, unless the validity is positively confirmed by a
   contextual rule. To check this, each code point identified as
   CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a
   non-null rule. If such a code point is missing a rule, the label is
   invalid. If the rule exists but the result of applying the rule is
   negative or inconclusive, the proposed label is invalid.
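
   The logic of this section can be sketched in Python: a missing rule
   fails, and a rule that does not return a positive result fails. Only
   two rules from Appendix A of the Tables document [RFC5892] are
   modeled. ZERO WIDTH JOINER (U+200D, CONTEXTJ) is valid only after a
   character whose Canonical_Combining_Class is Virama (9). MIDDLE DOT
   (U+00B7, CONTEXTO) is valid only between two "l" characters. The
   rule table, its names, and the reduced set of context code points
   are assumptions made for illustration.

```python
import unicodedata
from typing import Callable, Mapping

Rule = Callable[[str, int], bool]
VIRAMA = 9  # Canonical_Combining_Class value of a virama


def zero_width_joiner_rule(label: str, i: int) -> bool:
    # U+200D is valid only immediately after a virama.
    return i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA


def middle_dot_rule(label: str, i: int) -> bool:
    # U+00B7 is valid only between two "l" (U+006C) characters.
    return (
        0 < i < len(label) - 1
        and label[i - 1] == "l"
        and label[i + 1] == "l"
    )


# Hypothetical, reduced tables: code points needing context, and their rules.
CONTEXT_CODE_POINTS = {0x200D, 0x00B7}
RULES: Mapping[int, Rule] = {
    0x200D: zero_width_joiner_rule,
    0x00B7: middle_dot_rule,
}


def contextual_rules_pass(label: str, rules: Mapping[int, Rule] = RULES) -> bool:
    for i, ch in enumerate(label):
        if ord(ch) not in CONTEXT_CODE_POINTS:
            continue
        rule = rules.get(ord(ch))
        if rule is None:  # no rule defined: the label is invalid
            return False
        if not rule(label, i):  # negative result: the label is invalid
            return False
    return True


if __name__ == "__main__":
    print(contextual_rules_pass("l\u00b7l"))  # True
    print(contextual_rules_pass("a\u00b7b"))  # False
    print(contextual_rules_pass("\u0915\u094d\u200d\u0937"))  # True: ZWJ after virama
```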

4.2.3.4. Labels Containing Characters Written Right to Left

   If the proposed label contains any characters from scripts that are
   written from right to left, it MUST meet the Bidi criteria [RFC5893].
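
   The Bidi criteria themselves are defined in RFC 5893. The following
   Python sketch is a non-authoritative restatement of the six
   per-label conditions of the Bidi rule in that document (Section 2).
   It uses the Bidi_Class values reported by Python's unicodedata
   module and should be checked against RFC 5893 before use. The
   function names are hypothetical. In RFC 5893 the rule applies to
   every label of a domain name that contains at least one RTL label,
   so a caller would normally run satisfies_bidi_rule() on all labels
   once is_rtl_label() is true for any of them.

```python
import unicodedata

RTL_TYPES = {"R", "AL", "AN"}
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def is_rtl_label(label: str) -> bool:
    """True if the label contains at least one R, AL, or AN character."""
    return any(unicodedata.bidirectional(ch) in RTL_TYPES for ch in label)


def satisfies_bidi_rule(label: str) -> bool:
    """Per-label Bidi rule, restated from RFC 5893, Section 2."""
    types = [unicodedata.bidirectional(ch) for ch in label]
    if not types:
        return False
    # Ignore trailing NSM characters when checking how the label ends.
    end = len(types)
    while end > 0 and types[end - 1] == "NSM":
        end -= 1
    last = types[end - 1] if end > 0 else None
    if types[0] in ("R", "AL"):  # RTL label: conditions 2-4
        return (
            all(t in RTL_ALLOWED for t in types)
            and last in ("R", "AL", "EN", "AN")
            and not ("EN" in types and "AN" in types)
        )
    if types[0] == "L":  # LTR label: conditions 5-6
        return all(t in LTR_ALLOWED for t in types) and last in ("L", "EN")
    return False  # condition 1: must start with L, R, or AL


if __name__ == "__main__":
    print(satisfies_bidi_rule("\u05d0\u05d1"))  # True: two Hebrew letters
    print(satisfies_bidi_rule("1\u05d0"))  # False: starts with a digit
```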

4.2.4. Registration Validation Requirements

   Strings that contain at least one non-ASCII character, that have
   been produced by the steps above, whose contents pass all of the
   tests in Section 4.2.3, and that are 63 or fewer characters long in
   ASCII-compatible encoding (ACE) form (see Section 4.4) are U-labels.

   To summarize, tests are made in Section 4.2 for invalid characters,
   for invalid combinations of characters, for labels that are invalid
   even if the characters they contain are valid individually, and for
   labels that do not conform to the restrictions for strings
   containing right-to-left characters.

4.3. Registry Restrictions

   In addition to the rules and tests above, there are many reasons why
   a registry could reject a label. Registries at all levels of the
   DNS, not just the top level, are expected to establish policies about
   label registrations. Policies are likely to be informed by the local
   languages and the scripts that are used to write them and may depend
   on many factors including what characters are in the label (for
   example, a label may be rejected based on other labels already
   registered). See the Rationale document [RFC5894], Section 3.2, for
   further discussion and recommendations about registry policies.

   The string produced by the steps in Section 4.2 is checked and
   processed as appropriate to local registry restrictions. Application
   of those registry restrictions may result in the rejection of some
   labels or the application of special restrictions to others.

4.4. Punycode Conversion

   The resulting U-label is converted to an A-label (defined in Section
   2.3.2.1 of the Definitions document [RFC5890]). The A-label is the
   encoding of the U-label according to the Punycode algorithm [RFC3492]
   with the ACE prefix "xn--" added at the beginning of the string. The
   resulting string must, of course, conform to the length limits
   imposed by the DNS (at most 63 octets per label [RFC1035]). This
   document does not update or alter the Punycode algorithm specified in
   RFC 3492 in any way. RFC 3492 does make a non-normative reference to
   the information about the value and construction of the ACE prefix
   that appears in RFC 3490 or Nameprep [RFC3491]. For consistency and
   reader convenience, IDNA2008 effectively updates that reference to
   point to this document. That change does not alter the prefix
   itself. The prefix, "xn--", is the same in both sets of documents.
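
   As a sketch, the conversion can be written with Python's built-in
   "punycode" codec, which implements the RFC 3492 algorithm. Python's
   separate "idna" codec implements IDNA2003 and is therefore not used.
   The function name is hypothetical, and the input is assumed to have
   already passed the tests of Section 4.2.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS label length limit (RFC 1035)


def u_label_to_a_label(u_label: str) -> str:
    """Punycode-encode a (pre-validated) U-label and add the ACE prefix."""
    if u_label.isascii():
        raise ValueError("a U-label needs at least one non-ASCII character")
    a_label = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    if len(a_label) > MAX_LABEL_OCTETS:  # ASCII, so 1 char == 1 octet
        raise ValueError(f"A-label is {len(a_label)} octets; the limit is 63")
    return a_label


if __name__ == "__main__":
    print(u_label_to_a_label("bücher"))  # xn--bcher-kva
```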

   With the exception of the maximum string length test on Punycode
   output, the failure conditions identified in the Punycode encoding
   procedure cannot occur if the input is a U-label as determined by the
   steps in Sections 4.1 through 4.3 above.

4.5. Insertion in the Zone

   The label is registered in the DNS by inserting the A-label into a
   zone.

5. Domain Name Lookup Protocol

   Lookup is different from registration, and different tests are
   applied on the client. Although some validity checks are necessary
   to avoid serious problems with the protocol, the lookup-side tests
   are more permissive and rely on the assumption that names that are
   present in the DNS are valid. That assumption is, however, a weak
   one because the presence of wildcards in the DNS might cause a
   string that is not actually registered in the DNS to be successfully
   looked up.

5.1. Label String Input

   The user supplies a string in the local character set, for example,
   by typing it, clicking on it, or copying and pasting it from a
   resource identifier, e.g., a Uniform Resource Identifier (URI)
   [RFC3986] or an Internationalized Resource Identifier (IRI)
   [RFC3987], from which the domain name is extracted. Alternatively,
   some process not directly involving the user may read the string from
   a file or obtain it in some other way. Processing in this step and
   the one specified in Section 5.2 are local matters, to be
   accomplished prior to actual invocation of IDNA.

5.2. Conversion to Unicode

   The string is converted from the local character set into Unicode, if
   it is not already in Unicode. Depending on local needs, this
   conversion may involve mapping some characters into other characters
   as well as coding conversions. Those issues are discussed in the
   mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the
   Rationale document [RFC5894] and in the separate Mapping document
   [IDNA2008-Mapping]. The result MUST be a Unicode string in NFC form.
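
   A minimal Python sketch of this step assumes that the local
   character set is known by a name that Python's codecs understand
   (for example, "utf-8" or "latin-1"). It performs only the decoding
   and NFC normalization. Any local mapping step is omitted, and the
   function name is hypothetical.

```python
import unicodedata


def to_nfc_unicode(raw: bytes, local_charset: str) -> str:
    """Decode input from the local character set and normalize to NFC."""
    text = raw.decode(local_charset)  # coding conversion only, no mapping
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # "u" + COMBINING DIAERESIS composes to the single code point U+00FC.
    print(to_nfc_unicode("bu\u0308cher".encode("utf-8"), "utf-8") == "bücher")  # True
    print(to_nfc_unicode(b"b\xfccher", "latin-1"))  # bücher
```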

5.3. A-label Input

   If the input to this procedure appears to be an A-label (i.e., it
   starts with "xn--", interpreted case-insensitively), the lookup
   application MAY attempt to convert it to a U-label, first ensuring
   that the A-label is entirely in lowercase (converting it to lowercase
   if necessary), and apply the tests of Section 5.4 and the conversion
   of Section 5.5 to that form. If the label is converted to Unicode
   (i.e., to U-label form) using the Punycode decoding algorithm, then
   the processing specified in those two sections MUST be performed, and
   the label MUST be rejected if the resulting label is not identical to
   the original. See Section 8.1 of the Rationale document [RFC5894]
   for additional discussion on this topic.

   Conversion from the A-label and testing that the result is a U-label
   SHOULD be performed if the domain name will later be presented to the
   user in native character form (this requires that the lookup
   application be IDNA-aware). If those steps are not performed, the
   lookup process SHOULD at least test to determine that the string is
   actually an A-label, examining it for the invalid formats specified
   in the Punycode decoding specification. Applications that are not
   IDNA-aware will obviously omit that testing; others MAY treat the
   string as opaque to avoid the additional processing at the expense of
   providing less protection and information to users.
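
   The lookup-side A-label handling can be sketched in Python as below.
   The function name is hypothetical, and Python's "punycode" codec is
   assumed as the RFC 3492 implementation. The tests of Section 5.4 are
   marked as omitted, so a real implementation would apply them to the
   decoded string before the re-encoding comparison.

```python
def a_label_to_u_label_for_lookup(label: str) -> str:
    """Lowercase, decode, and round-trip check an A-label (Section 5.3)."""
    lowered = label.lower()
    if not lowered.startswith("xn--"):
        raise ValueError("not an A-label")
    try:
        u_label = lowered[4:].encode("ascii").decode("punycode")
    except UnicodeError as exc:  # non-ASCII input or malformed Punycode
        raise ValueError(f"invalid A-label {label!r}") from exc
    if u_label.isascii():
        raise ValueError("A-label does not decode to a non-ASCII string")
    # The Section 5.4 tests would be applied to u_label here (omitted).
    if "xn--" + u_label.encode("punycode").decode("ascii") != lowered:
        raise ValueError("re-encoding does not reproduce the A-label")
    return u_label


if __name__ == "__main__":
    print(a_label_to_u_label_for_lookup("XN--BCHER-KVA"))  # bücher
```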

5.4. Validation and Character List Testing

   As with the registration procedure described in Section 4, the
   Unicode string is checked to verify that all characters that appear
   in it are valid as input to IDNA lookup processing. As discussed
   above and in the Rationale document [RFC5894], the lookup check is
   more liberal than the registration one. Labels that have not been
   fully evaluated for conformance to the applicable rules are referred
   to as "putative" labels as discussed in Section 2.3.2.1 of the
   Definitions document [RFC5890]. Putative U-labels with any of the
   following characteristics MUST be rejected prior to DNS lookup:

   o  Labels that are not in NFC [Unicode-UAX15].

   o  Labels containing "--" (two consecutive hyphens) in the third and
      fourth character positions.

   o  Labels whose first character is a combining mark (see The Unicode
      Standard, Section 2.11 [Unicode]).

   o  Labels containing prohibited code points, i.e., those that are
      assigned to the "DISALLOWED" category of the Tables document
      [RFC5892].

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTJ", i.e., requiring exceptional contextual
      rule processing on lookup, but that do not conform to those rules.
      Note that this implies that a rule must be defined, not null: a
      character that requires a contextual rule but for which the rule
      is null is treated in this step as having failed to conform to the
      rule.

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTO", but for which no such rule appears in the
      table of rules. Applications resolving DNS names or carrying out
      equivalent operations are not required to test contextual rules
      for "CONTEXTO" characters, only to verify that a rule is defined
      (although they MAY make such tests to provide better protection or
      give better information to the user).

   o  Labels containing code points that are unassigned in the version
      of Unicode being used by the application, i.e., in the UNASSIGNED
      category of the Tables document.

      This requirement means that the application must use a list of
      unassigned characters that is matched to the version of Unicode
      that is being used for the other requirements in this section. It
      is not required that the application know which version of Unicode
      is being used; that information might be part of the operating
      environment in which the application is running.
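
   Several of the checks above can be expressed with Python's
   unicodedata module, as in the sketch below. It is an illustration
   with assumptions, not a conforming implementation. The function name
   is hypothetical. "Unassigned" is approximated by General_Category Cn
   in the Python runtime's Unicode version. That approximation also
   catches noncharacters, which the Tables document classifies as
   DISALLOWED but which are rejected either way. The DISALLOWED,
   CONTEXTJ, and CONTEXTO tests need the Tables document and are not
   implemented. Unlike registration, the sketch does not reject a
   leading or trailing hyphen, which reflects the more liberal lookup
   rules.

```python
import unicodedata
from typing import Optional


def lookup_rejection_reason(label: str) -> Optional[str]:
    """Return why a putative U-label fails some Section 5.4 tests, else None.

    DISALLOWED, CONTEXTJ, and CONTEXTO checks need the RFC 5892 tables and
    are not implemented in this sketch.
    """
    if not unicodedata.is_normalized("NFC", label):
        return "not in NFC"
    if label[2:4] == "--":
        return "hyphens in the third and fourth positions"
    if label and unicodedata.category(label[0]).startswith("M"):
        return "begins with a combining mark"
    for ch in label:
        # Cn = unassigned in this runtime's Unicode version (noncharacters too).
        if unicodedata.category(ch) == "Cn":
            return f"unassigned code point U+{ord(ch):04X}"
    return None


if __name__ == "__main__":
    for candidate in ("bücher", "bu\u0308cher", "ab--cd", "-abc"):
        print(repr(candidate), lookup_rejection_reason(candidate))
```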

   In addition, the application SHOULD apply the following test.

   o  Verification that the string is compliant with the requirements
      for right-to-left characters specified in the Bidi document
      [RFC5893].

   This test may be omitted in special circumstances, such as when the
   lookup application knows that the conditions are enforced elsewhere,
   because an attempt to look up and resolve such strings will almost
   certainly lead to a DNS lookup failure except when wildcards are
   present in the zone. However, applying the test is likely to give
   much better information about the reason for a lookup failure --
   information that may be usefully passed to the user when that is
   feasible -- than DNS resolution failure information alone.

   For all other strings, the lookup application MUST rely on the
   presence or absence of labels in the DNS to determine the validity of
   those labels and the validity of the characters they contain. If
   they are registered, they are presumed to be valid; if they are not,
   their possible validity is not relevant. While a lookup application
   may reasonably issue warnings about strings it believes may be
   problematic, applications that decline to process a string that
   conforms to the rules above (i.e., does not look it up in the DNS)
   are not in conformance with this protocol.

5.5. Punycode Conversion

   The string that has now been validated for lookup is converted to ACE
   form by applying the Punycode algorithm to the string and then adding
   the ACE prefix ("xn--").

5.6. DNS Name Resolution

   The A-label resulting from the conversion in Section 5.5 or supplied
   directly (see Section 5.3) is combined with other labels as needed to
   form a fully-qualified domain name that is then looked up in the DNS,
   using normal DNS resolver procedures. The lookup can obviously
   either succeed (returning information) or fail.
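
   Sections 5.5 and 5.6 together can be sketched as a small Python
   helper that converts each non-ASCII label and rejoins the name. The
   helper name is hypothetical. It assumes every non-ASCII label has
   already passed the Section 5.4 tests and is in NFC. It passes ASCII
   labels through unchanged and uses Python's "punycode" codec. The
   resulting ASCII name can then be handed to an ordinary resolver call
   such as socket.getaddrinfo().

```python
def domain_to_ascii(domain: str) -> str:
    """Convert each non-ASCII label to an A-label and rejoin the name.

    Assumes every non-ASCII label has already passed the Section 5.4 tests.
    ASCII labels (A-labels or NR-LDH labels) are passed through unchanged.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    print(domain_to_ascii("bücher.example"))  # xn--bcher-kva.example
```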

6. Security Considerations

   Security Considerations for this version of IDNA are described in the
   Definitions document [RFC5890], except for the special issues
   associated with right-to-left scripts and characters. The latter are
   discussed in the Bidi document [RFC5893].

   In order to avoid intentional or accidental attacks from labels that
   might be confused with others, special problems in rendering, and so
   on, the IDNA model requires that registries exercise care and
   thoughtfulness about what labels they choose to permit. That issue
   is discussed in Section 4.3 of this document, which, in turn, points
   to a somewhat more extensive discussion in the Rationale document
   [RFC5894].

7. IANA Considerations

   IANA actions for this version of IDNA are specified in the Tables
   document [RFC5892] and discussed informally in the Rationale document
   [RFC5894]. The components of IDNA described in this document do not
   require any IANA actions.

8. Contributors

   While the listed editor held the pen, the original versions of this
   document represent the joint work and conclusions of an ad hoc design
   team consisting of the editor and, in alphabetic order, Harald
   Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
   draws significantly on the original version of IDNA [RFC3490] both
   conceptually and for specific text. This second-generation version
   would not have been possible without the work that went into that
   first version and especially the contributions of its authors Patrik
   Faltstrom, Paul Hoffman, and Adam Costello. While Faltstrom was
   actively involved in the creation of this version, Hoffman and
   Costello were not and should not be held responsible for any errors
   or omissions.

9. Acknowledgments

   This revision to IDNA would have been impossible without the
   accumulated experience since RFC 3490 was published and resulting
   comments and complaints of many people in the IETF, ICANN, and other
   communities (too many people to list here). Nor would it have been
   possible without RFC 3490 itself and the efforts of the Working Group
   that defined it. Those people whose contributions are acknowledged
   in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894]
   were particularly important.

   Specific textual changes were incorporated into this document after
   suggestions from the other contributors, Stephane Bortzmeyer, Vint
   Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell,
   Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken
   Whistler, Chris Wright, and other WG participants and reviewers
   including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter
   Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific
   errors and recommended corrections. Special thanks are due to Paul
   Hoffman for permission to extract material to form the basis for
   Appendix A from a draft document that he prepared.

10. References

10.1. Normative References

   [RFC1034]  Mockapetris, P., "Domain names - concepts and
              facilities", STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of
              Unicode for Internationalized Domain Names in
              Applications (IDNA)", RFC 3492, March 2003.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document
              Framework", RFC 5890, August 2010.

 [RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
            Internationalized Domain Names for Applications (IDNA)",
            RFC 5892, August 2010.

 [RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
            for Internationalized Domain Names for Applications
            (IDNA)", RFC 5893, August 2010.

 [Unicode-UAX15]
            The Unicode Consortium, "Unicode Standard Annex #15:
            Unicode Normalization Forms", September 2009,
            <http://www.unicode.org/reports/tr15/>.

10.2. Informative References

 [ASCII]    American National Standards Institute (formerly United
            States of America Standards Institute), "USA Code for
            Information Interchange", ANSI X3.4-1968, 1968. ANSI
            X3.4-1968 has been replaced by newer versions with
            slight modifications, but the 1968 version remains
            definitive for the Internet.

 [IDNA2008-Mapping]
            Resnick, P. and P. Hoffman, "Mapping Characters in
            Internationalized Domain Names for Applications (IDNA)",
            Work in Progress, April 2010.

 [RFC2671]  Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
            RFC 2671, August 1999.

 [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
            "Internationalizing Domain Names in Applications
            (IDNA)", RFC 3490, March 2003.

 [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
            Profile for Internationalized Domain Names (IDN)",
            RFC 3491, March 2003.

 [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
            Resource Identifier (URI): Generic Syntax", STD 66,
            RFC 3986, January 2005.

 [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
            Identifiers (IRIs)", RFC 3987, January 2005.

 [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
            and Recommendations for Internationalized Domain Names
            (IDNs)", RFC 4690, September 2006.

 [RFC4952]  Klensin, J. and Y. Ko, "Overview and Framework for
            Internationalized Email", RFC 4952, July 2007.

 [RFC5894]  Klensin, J., "Internationalized Domain Names for
            Applications (IDNA): Background, Explanation, and
            Rationale", RFC 5894, August 2010.

 [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
            5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN
            0-321-48091-0. This printed reference has now been
            updated online to reflect additional code points. For
            code points, the reference at the time this document was
            published is to Unicode 5.2.

Appendix A. Summary of Major Changes from IDNA2003

 1.  Update the base character set from Unicode 3.2 to one that is
     Unicode version agnostic.

 2.  Separate the definitions for the "registration" and "lookup"
     activities.

 3.  Disallow symbol and punctuation characters except where special
     exceptions are necessary.

 4.  Remove the mapping and normalization steps from the protocol and
     have them, instead, done by the applications themselves,
     possibly in a local fashion, before invoking the protocol.

 5.  Change the way that the protocol specifies which characters are
     allowed in labels from "humans decide what the table of code
     points contains" to "decisions about code points are based on
     Unicode properties plus a small exclusion list created by
     humans".

 6.  Introduce the new concept of characters that can be used only in
     specific contexts.

 7.  Allow typical words and names in languages such as Dhivehi and
     Yiddish to be expressed.

 8.  Make bidirectional domain names (delimited strings of labels,
     not just labels standing on their own) display in a less
     surprising fashion, whether they appear in obvious domain name
     contexts or as part of running text in paragraphs.

 9.  Remove the dot separator from the mandatory part of the
     protocol.

 10. Make some currently valid labels that are not actually IDNA
     labels invalid.

Author's Address

 John C Klensin
 1770 Massachusetts Ave, Ste 322
 Cambridge, MA 02140
 USA

 Phone: +1 617 245 1457
 EMail: john+ietf@jck.com


Internet Engineering Task Force (IETF) P. Faltstrom, Ed.
Request for Comments: 5892 Cisco
Category: Standards Track August 2010
ISSN: 2070-1721


 The Unicode Code Points and
 Internationalized Domain Names for Applications (IDNA)

Abstract

 This document specifies rules for deciding whether a code point,
 considered in isolation or in context, is a candidate for inclusion
 in an Internationalized Domain Name (IDN).

 It is part of the specification of Internationalizing Domain Names in
 Applications 2008 (IDNA2008).

Status of This Memo

 This is an Internet Standards Track document.

 This document is a product of the Internet Engineering Task Force
 (IETF). It represents the consensus of the IETF community. It has
 received public review and has been approved for publication by the
 Internet Engineering Steering Group (IESG). Further information on
 Internet Standards is available in Section 2 of RFC 5741.

 Information about the current status of this document, any errata,
 and how to provide feedback on it may be obtained at
 http://www.rfc-editor.org/info/rfc5892.

Copyright Notice

 Copyright (c) 2010 IETF Trust and the persons identified as the
 document authors. All rights reserved.

 This document is subject to BCP 78 and the IETF Trust's Legal
 Provisions Relating to IETF Documents
 (http://trustee.ietf.org/license-info) in effect on the date of
 publication of this document. Please review these documents
 carefully, as they describe your rights and restrictions with respect
 to this document. Code Components extracted from this document must
 include Simplified BSD License text as described in Section 4.e of
 the Trust Legal Provisions and are provided without warranty as
 described in the Simplified BSD License.

Faltstrom Standards Track [Page 1]

RFC 5892 IDNA Code Points August 2010

Table of Contents

 1. Introduction
 2. Category Definitions Used to Calculate Derived Property Value
    2.1. LetterDigits (A)
    2.2. Unstable (B)
    2.3. IgnorableProperties (C)
    2.4. IgnorableBlocks (D)
    2.5. LDH (E)
    2.6. Exceptions (F)
    2.7. BackwardCompatible (G)
    2.8. JoinControl (H)
    2.9. OldHangulJamo (I)
    2.10. Unassigned (J)
 3. Calculation of the Derived Property
 4. Code Points
 5. IANA Considerations
    5.1. IDNA-Derived Property Value Registry
    5.2. IDNA Context Registry
       5.2.1. Template for Context Registry
 6. Security Considerations
 7. Acknowledgements
 Appendix A. Contextual Rules Registry
 Appendix A.1. ZERO WIDTH NON-JOINER
 Appendix A.2. ZERO WIDTH JOINER
 Appendix A.3. MIDDLE DOT
 Appendix A.4. GREEK LOWER NUMERAL SIGN (KERAIA)
 Appendix A.5. HEBREW PUNCTUATION GERESH
 Appendix A.6. HEBREW PUNCTUATION GERSHAYIM
 Appendix A.7. KATAKANA MIDDLE DOT
 Appendix A.8. ARABIC-INDIC DIGITS
 Appendix A.9. EXTENDED ARABIC-INDIC DIGITS
 Appendix B. Code Points 0x0000 - 0x10FFFF
 Appendix B.1. Code Points in Unicode Character Database (UCD) Format
 8. References
    8.1. Normative References
    8.2. Informative References

1. Introduction

 RFC 4690 [RFC4690] suggests an inclusion-based approach for selecting
 the code points from The Unicode Standard [Unicode52] that should be
 included in the list of code points that may be used in
 Internationalized Domain Names.

 Specifically, RFC 4690 [RFC4690] says the following:

    The IAB has concluded that there is a consensus within the broader
    community that lists of code points should be specified by the use
    of an inclusion-based mechanism (i.e., identifying the characters
    that are permitted), rather than by excluding a small number of
    characters from the total Unicode set as Stringprep [RFC3454] and
    Nameprep [RFC3491] do today. That conclusion should be reviewed
    by the IETF community and action taken as appropriate.

 This document reviews and classifies the collections of code points
 in the Unicode character set by examining various properties of the
 code points. It then defines an algorithm for determining a derived
 property value. It specifies a procedure, rather than a table of code
 points, so that the algorithm can be used to determine code point
 sets independently of the version of Unicode that is in use.

 This document is not intended to specify precisely how these property
 values are to be applied in IDN labels. That information appears in
 the Protocol document [RFC5891], but it is important to understand
 that the assignment of a value of this property to a particular
 character is not sufficient to determine whether it can be used in a
 given label. In particular, some combinations of allowed code points
 are not advisable for use in IDNs due to rules specific to a script
 or class of characters. The requirement for such rules is linked to
 the operations in the Protocol document and especially to the
 characters designated as requiring contextual rules.

 The value of the property is to be interpreted as follows.

 o PROTOCOL VALID: Those that are allowed to be used in IDNs. Code
   points with this property value are permitted for general use in
   IDNs. However, the fact that a label consists only of code points
   with this property value does not imply that the label can be used
   in the DNS. See the Protocol document for algorithms to make
   decisions about labels in domain names. The abbreviated term
   PVALID is used to refer to this value in the rest of this
   document.

 o CONTEXTUAL RULE REQUIRED: Some characteristics of the character,
   such as it being invisible in certain contexts or problematic in
   others, require that it not be used in labels unless specific
   other characters or properties are present. The abbreviated term
   CONTEXT is used to refer to this value in the rest of this
   document. There are two subdivisions of CONTEXTUAL RULE REQUIRED:
   one for Join_controls (called CONTEXTJ) and one for other
   characters (called CONTEXTO). These are discussed in more detail
   below and in the Protocol document.

 o DISALLOWED: Those that should clearly not be included in IDNs.
   Code points with this property value are not permitted in IDNs.

 o UNASSIGNED: Those code points that are not designated (i.e., are
   unassigned) in the Unicode Standard.

 The mechanisms described here allow determination of the value of the
 property for future versions of Unicode (including characters added
 after Unicode 5.2). Changes in Unicode properties that do not affect
 the outcome of this process do not affect IDN. For example, a
 character can have its Unicode General_Category value (see
 [Unicode52]) change from So to Sm or from Lo to Ll without affecting
 the algorithm results. Moreover, even if such changes did affect the
 results, the BackwardCompatible list (Section 2.7) can be adjusted to
 ensure the stability of the results.

 Some code points need to be allowed in exceptional circumstances but
 should be excluded in all other cases; these rules are also described
 in other documents. The most notable of these are the Join Control
 characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
 NON-JOINER. Both of them have the derived property value CONTEXTJ.
 A character with the derived property value CONTEXTJ or CONTEXTO
 (CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate
 rule has been established and the context of the character is
 consistent with that rule. It is invalid to register, or even to look
 up, a string containing these characters unless such a contextual
 rule is found and satisfied. Please see Appendix A, "Contextual Rules
 Registry", for more information.

 This document is part of a series of documents that, together,
 constitute a proposal for updating the IDNA standards to resolve
 issues uncovered in recent years, cover a broader range of scripts,
 and provide for migration to newer versions of Unicode. See the
 Rationale document [RFC5894] for a broader discussion.

 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
 document are to be interpreted as described in RFC 2119 [RFC2119].

2. Category Definitions Used to Calculate Derived Property Value

 The derived property obtains its value based on a two-step procedure.
 First, characters are placed in one or more character categories
 based either on core properties defined by the Unicode Standard or on
 treating the code point as an exception and addressing it by its code
 point value. These categories are not mutually exclusive.

 In the second step, set operations are used with these categories to
 determine the values for an IDN-specific property. Those operations
 are specified in Section 3.

 Unicode property names and property value names may have short
 abbreviations, such as gc for the General_Category property, and Ll
 for the Lowercase_Letter property value of the gc property.

 In the following specification of categories, the operation that
 returns the value of a particular Unicode character property for a
 code point is designated by using the formal name of that property
 (from PropertyAliases.txt) followed by '(cp)'. For example, the
 value of the General_Category property for a code point is indicated
 by General_Category(cp).

2.1. LetterDigits (A)

 A: General_Category(cp) is in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}

 This rule identifies characters commonly used in mnemonics and often
 informally described as "language characters". In general, only code
 points assigned to this category are suitable for use in IDNs.

 For more information, see Section 4.5 of The Unicode Standard
 [Unicode].

 The categories used in this rule are:

 o Ll - Lowercase_Letter

 o Lu - Uppercase_Letter

 o Lo - Other_Letter

 o Nd - Decimal_Number

 o Lm - Modifier_Letter

 o Mn - Nonspacing_Mark

 o Mc - Spacing_Mark
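
 Category A reduces to a single General_Category lookup. The following
 Python sketch assumes the standard unicodedata module. That module
 follows the Unicode version of the running Python, which is normally
 newer than 5.2, so results for recently added characters may differ
 from the tables published with this document.

```python
import unicodedata

# General_Category values that make up category A (LetterDigits).
LETTER_DIGITS_GC = frozenset({"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"})


def in_letter_digits(cp: int) -> bool:
    """True if General_Category(cp) is in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}."""
    return unicodedata.category(chr(cp)) in LETTER_DIGITS_GC


if __name__ == "__main__":
    # 'a' (Ll), 'A' (Lu), '0' (Nd), COMBINING ACUTE ACCENT (Mn), '-' (Pd)
    for cp in (0x0061, 0x0041, 0x0030, 0x0301, 0x002D):
        print(f"U+{cp:04X} {unicodedata.category(chr(cp))} {in_letter_digits(cp)}")
```

 U+0041 is in category A even though it ultimately becomes DISALLOWED.
 The categories are not mutually exclusive, and Section 3 tests
 Unstable before LetterDigits.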

2.2. Unstable (B)

 B: toNFKC(toCaseFold(toNFKC(cp))) != cp

 This category is used to group the characters that are not stable
 under Normalization Form KC (NFKC) and case folding. In general,
 these code points are not suitable for use in IDNs.

 The toCaseFold() operation is defined in Section 3.13 of The Unicode
 Standard [Unicode].

 The toNFKC() operation returns the normalization form KC of the code
 point, which may be a sequence of more than one code point. For more
 information, see Section 5 of Unicode Standard Annex #15 [TR15].

 Note that this rule uses NFKC, whereas the "IDNA Protocol" document
 [RFC5891] uses Normalization Form C (NFC).
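
 The Unstable test can be sketched with the same module. The sketch
 assumes that Python's str.casefold(), which applies Unicode default
 (full) case folding, is an adequate stand-in for toCaseFold(). Each
 step may return more than one code point, so the comparison is done
 on strings rather than on single code points.

```python
import unicodedata


def _nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def is_unstable(cp: int) -> bool:
    """Category B: toNFKC(toCaseFold(toNFKC(cp))) != cp."""
    original = chr(cp)
    return _nfkc(_nfkc(original).casefold()) != original


if __name__ == "__main__":
    # U+00DF folds to "ss" and U+03C2 folds to U+03C3, so both are unstable;
    # that is why Section 2.6 lists them as PVALID exceptions.
    for cp in (0x0061, 0x0041, 0x00DF, 0x03C2, 0x2126, 0x00E9):
        print(f"U+{cp:04X} unstable={is_unstable(cp)}")
```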

2.3. IgnorableProperties (C)

 C: Default_Ignorable_Code_Point(cp) = True or
    White_Space(cp) = True or
    Noncharacter_Code_Point(cp) = True

 This category is used to group code points that are not recommended
 for use in identifiers. In general, these code points are not
 suitable for use in an IDN.

 The definition for Default_Ignorable_Code_Point can be found in
 DerivedCoreProperties.txt [DerivedCoreProperties]. As of Unicode 5.2,
 it is:

    Other_Default_Ignorable_Code_Point + Cf (Format characters)
    + Variation_Selector - White_Space - FFF9..FFFB (Annotation
    Characters) - 0600..0603, 06DD, 070F (exceptional Cf characters
    that should be visible)
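
 Python's unicodedata module does not expose Default_Ignorable_Code_Point
 or White_Space, so a sketch of category C has to read them from the
 UCD data files. The parser below assumes the usual
 "range ; property # comment" line layout of PropList.txt and
 DerivedCoreProperties.txt. Noncharacter_Code_Point is computed
 directly, because the noncharacters are a fixed set: U+FDD0..U+FDEF
 plus the last two code points of every plane.

```python
import re
from typing import Dict, Set

# A data line in PropList.txt or DerivedCoreProperties.txt, e.g.
# "0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>"
_PROP_LINE = re.compile(
    r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*([A-Za-z_]+)"
)


def load_binary_properties(text: str) -> Dict[str, Set[int]]:
    """Map each binary property name to the set of code points having it."""
    props: Dict[str, Set[int]] = {}
    for line in text.splitlines():
        m = _PROP_LINE.match(line.strip())
        if not m:
            continue  # blank line or comment
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        props.setdefault(m.group(3), set()).update(range(first, last + 1))
    return props


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def in_ignorable_properties(cp: int, props: Dict[str, Set[int]]) -> bool:
    """Category C, given properties loaded from the UCD files."""
    return (
        cp in props.get("Default_Ignorable_Code_Point", set())
        or cp in props.get("White_Space", set())
        or is_noncharacter(cp)
    )
```

 A caller would typically load PropList.txt and
 DerivedCoreProperties.txt for the Unicode version in use and merge the
 two resulting dictionaries. Their property names are distinct, so
 merging loses nothing.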

2.4. IgnorableBlocks (D)

 D: Block(cp) is in {Combining Diacritical Marks for Symbols,
    Musical Symbols, Ancient Greek Musical Notation}

 This category is used to identify code points that are not useful in
 mnemonics or that are otherwise impractical for IDN use. In general,
 these code points are not suitable for use in IDNs.

 The definition of blocks can be found in Blocks.txt [BlockNames].
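
 Block membership is also not available from unicodedata. The
 following minimal Python sketch assumes the "first..last; Block Name"
 layout of Blocks.txt. It loads the ranges and tests category D by
 exact name comparison. The UCD asks for loose matching of block
 names, which this sketch does not implement.

```python
import re
from typing import List, Optional, Tuple

# Data lines in Blocks.txt look like "1D100..1D1FF; Musical Symbols".
_BLOCK_LINE = re.compile(r"^([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+)\s*;\s*(.+?)\s*$")

IGNORABLE_BLOCKS = frozenset({
    "Combining Diacritical Marks for Symbols",
    "Musical Symbols",
    "Ancient Greek Musical Notation",
})

Block = Tuple[int, int, str]


def load_blocks(text: str) -> List[Block]:
    """Parse Blocks.txt-style text into (first, last, name) tuples."""
    blocks: List[Block] = []
    for line in text.splitlines():
        m = _BLOCK_LINE.match(line.strip())
        if m:  # comments and blank lines do not match
            blocks.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return blocks


def block_of(cp: int, blocks: List[Block]) -> Optional[str]:
    """Return the name of the block containing cp, or None."""
    for first, last, name in blocks:
        if first <= cp <= last:
            return name
    return None


def in_ignorable_blocks(cp: int, blocks: List[Block]) -> bool:
    """Category D: Block(cp) is one of the three listed blocks."""
    return block_of(cp, blocks) in IGNORABLE_BLOCKS
```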

2.5. LDH (E)

 E: cp is in {002D, 0030..0039, 0061..007A}

 This category is used in the second step to preserve the traditional
 "hostname" (LDH -- as described in the Definitions document
 [RFC5890]) characters ('-', 0-9, and a-z). In general, these code
 points are suitable for use in IDNs. Note that there are other rules
 regarding the code point U+002D HYPHEN-MINUS that are specified in
 the IDNA Protocol Specification [RFC5891].
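
 Category E is a fixed set, so it can be written down directly. In the
 Python sketch below, uppercase A-Z are deliberately absent. When the
 rules of Section 3 are applied, they reach DISALLOWED through the
 Unstable test, because each folds to its lowercase form.

```python
# Category E (LDH): U+002D, U+0030..U+0039, U+0061..U+007A.
LDH = frozenset([0x002D, *range(0x0030, 0x003A), *range(0x0061, 0x007B)])


def in_ldh(cp: int) -> bool:
    """True for '-', '0'-'9', and 'a'-'z' only."""
    return cp in LDH
```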

2.6. Exceptions (F)

 F: cp is in {00B7, 00DF, 0375, 03C2, 05F3, 05F4, 0640, 0660,
              0661, 0662, 0663, 0664, 0665, 0666, 0667, 0668,
              0669, 06F0, 06F1, 06F2, 06F3, 06F4, 06F5, 06F6,
              06F7, 06F8, 06F9, 06FD, 06FE, 07FA, 0F0B, 3007,
              302E, 302F, 3031, 3032, 3033, 3034, 3035, 303B,
              30FB}

 This category explicitly lists code points for which the category
 cannot be assigned using only the core property values that exist in
 the Unicode Standard. The values assigned to these code points are
 given in the table below:

 PVALID -- Would otherwise have been DISALLOWED

 00DF; PVALID # LATIN SMALL LETTER SHARP S
 03C2; PVALID # GREEK SMALL LETTER FINAL SIGMA
 06FD; PVALID # ARABIC SIGN SINDHI AMPERSAND
 06FE; PVALID # ARABIC SIGN SINDHI POSTPOSITION MEN
 0F0B; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG
 3007; PVALID # IDEOGRAPHIC NUMBER ZERO

 CONTEXTO -- Would otherwise have been DISALLOWED

 00B7; CONTEXTO # MIDDLE DOT
 0375; CONTEXTO # GREEK LOWER NUMERAL SIGN (KERAIA)
 05F3; CONTEXTO # HEBREW PUNCTUATION GERESH
 05F4; CONTEXTO # HEBREW PUNCTUATION GERSHAYIM
 30FB; CONTEXTO # KATAKANA MIDDLE DOT

 CONTEXTO -- Would otherwise have been PVALID

 0660; CONTEXTO # ARABIC-INDIC DIGIT ZERO
 0661; CONTEXTO # ARABIC-INDIC DIGIT ONE
 0662; CONTEXTO # ARABIC-INDIC DIGIT TWO
 0663; CONTEXTO # ARABIC-INDIC DIGIT THREE
 0664; CONTEXTO # ARABIC-INDIC DIGIT FOUR
 0665; CONTEXTO # ARABIC-INDIC DIGIT FIVE
 0666; CONTEXTO # ARABIC-INDIC DIGIT SIX
 0667; CONTEXTO # ARABIC-INDIC DIGIT SEVEN
 0668; CONTEXTO # ARABIC-INDIC DIGIT EIGHT
 0669; CONTEXTO # ARABIC-INDIC DIGIT NINE
 06F0; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ZERO
 06F1; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ONE
 06F2; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT TWO
 06F3; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT THREE
 06F4; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FOUR
 06F5; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FIVE
 06F6; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SIX
 06F7; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SEVEN
 06F8; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT EIGHT
 06F9; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT NINE

 DISALLOWED -- Would otherwise have been PVALID

 0640; DISALLOWED # ARABIC TATWEEL
 07FA; DISALLOWED # NKO LAJANYALAN
 302E; DISALLOWED # HANGUL SINGLE DOT TONE MARK
 302F; DISALLOWED # HANGUL DOUBLE DOT TONE MARK
 3031; DISALLOWED # VERTICAL KANA REPEAT MARK
 3032; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK
 3033; DISALLOWED # VERTICAL KANA REPEAT MARK UPPER HALF
 3034; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF
 3035; DISALLOWED # VERTICAL KANA REPEAT MARK LOWER HALF
 303B; DISALLOWED # VERTICAL IDEOGRAPHIC ITERATION MARK
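
 For use with the algorithm in Section 3, the table above can be
 transcribed into a lookup from code point to value. The Python sketch
 below is a hand transcription, not an IANA artifact, and should be
 checked against the table. Its keys should be exactly the set F
 listed at the start of this section.

```python
# Category F (Exceptions), transcribed by hand from the table in Section 2.6.
_PVALID = [0x00DF, 0x03C2, 0x06FD, 0x06FE, 0x0F0B, 0x3007]
_CONTEXTO = [
    0x00B7, 0x0375, 0x05F3, 0x05F4, 0x30FB,  # would otherwise be DISALLOWED
    *range(0x0660, 0x066A),  # ARABIC-INDIC DIGIT ZERO..NINE
    *range(0x06F0, 0x06FA),  # EXTENDED ARABIC-INDIC DIGIT ZERO..NINE
]
_DISALLOWED = [0x0640, 0x07FA, 0x302E, 0x302F, *range(0x3031, 0x3036), 0x303B]

EXCEPTIONS = {
    **{cp: "PVALID" for cp in _PVALID},
    **{cp: "CONTEXTO" for cp in _CONTEXTO},
    **{cp: "DISALLOWED" for cp in _DISALLOWED},
}

# Sanity check: the three groups must not overlap (41 entries in total).
assert len(EXCEPTIONS) == len(_PVALID) + len(_CONTEXTO) + len(_DISALLOWED) == 41
```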

2.7. BackwardCompatible (G)

 G: cp is in {}

 This category includes the code points whose property values in
 versions of Unicode after 5.2 have changed in such a way that their
 derived property value would no longer be PVALID or DISALLOWED. If
 changes made in future versions of Unicode would cause a code point's
 derived property value to change from PVALID or DISALLOWED, then this
 table can be updated with special exception values so that the
 property values for those code points stay stable.

2.8. JoinControl (H)

 H: Join_Control(cp) = True

 This category consists of the Join Control characters, which are not
 in LetterDigits (Section 2.1) but are nonetheless required in IDN
 labels under some circumstances.

2.9. OldHangulJamo (I)

 I: Hangul_Syllable_Type(cp) is in {L, V, T}

 This category consists of all conjoining Hangul Jamo (Leading Jamo,
 Vowel Jamo, and Trailing Jamo).

 Eliminating conjoining Hangul Jamo from the set of PVALID characters
 restricts the set of Korean PVALID characters to precomposed, modern
 Hangul syllable characters. Old Hangul syllables, which must be
 spelled with sequences of conjoining Hangul Jamo, are not PVALID for
 IDNs.
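
 The effect described above can be observed with NFC. In the Python
 sketch below, a modern leading consonant plus vowel composes into a
 single precomposed syllable. A sequence using the archaic vowel
 U+119E HANGUL JUNGSEONG ARAEA stays as two conjoining jamo. The L, V,
 and T ranges are believed to match HangulSyllableType.txt for
 Unicode 5.2; re-check them against the UCD version in use.

```python
import unicodedata

# Hangul_Syllable_Type L, V, and T ranges, believed to match
# HangulSyllableType.txt for Unicode 5.2 (re-check for other versions).
OLD_HANGUL_JAMO_RANGES = [
    (0x1100, 0x115F), (0xA960, 0xA97C),  # L: leading consonants
    (0x1160, 0x11A7), (0xD7B0, 0xD7C6),  # V: vowels
    (0x11A8, 0x11FF), (0xD7CB, 0xD7FB),  # T: trailing consonants
]


def in_old_hangul_jamo(cp: int) -> bool:
    """Category I: Hangul_Syllable_Type(cp) is L, V, or T."""
    return any(first <= cp <= last for first, last in OLD_HANGUL_JAMO_RANGES)


if __name__ == "__main__":
    modern = unicodedata.normalize("NFC", "\u1100\u1161")  # KIYEOK + A
    archaic = unicodedata.normalize("NFC", "\u1100\u119E")  # KIYEOK + ARAEA
    print([f"U+{ord(c):04X}" for c in modern])   # ['U+AC00']
    print([f"U+{ord(c):04X}" for c in archaic])  # ['U+1100', 'U+119E']
```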

2.10. Unassigned (J)

 J: General_Category(cp) is in {Cn} and
    Noncharacter_Code_Point(cp) = False

 This category consists of code points in the Unicode character set
 that are not (yet) assigned. Note that Unicode distinguishes between
 "unassigned code points" and "unassigned characters". The unassigned
 code points are those in (Cn - Noncharacters), while the unassigned
 *characters* are those in (Cn + Cs).
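
 Category J combines a General_Category lookup with the noncharacter
 exclusion. In the Python sketch below, U+FFFF has General_Category Cn
 but is a noncharacter, so it is not in J. A surrogate such as U+D800
 (Cs) is not in J either. As elsewhere, unicodedata reflects the
 Unicode version of the running Python, so a code point that was
 unassigned in Unicode 5.2 may be assigned here.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def in_unassigned(cp: int) -> bool:
    """Category J: General_Category(cp) is Cn and cp is not a noncharacter."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)
```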

3. Calculation of the Derived Property

 As described above (Section 1) and in more detail in the IDNA
 Protocol document [RFC5891], the possible values of the IDN property
 are:

 o PVALID

 o CONTEXTJ

 o CONTEXTO

 o DISALLOWED

 o UNASSIGNED

 The algorithm to calculate the value of the derived property is as
 follows. When the name of a rule (such as Exceptions) is used, it
 denotes the set of code points that the rule defines. The same name
 used as a function call (such as Exceptions(cp)) denotes the value
 that cp has in that rule's table (for example, the Exceptions table
 in Section 2.6). The tests are applied in order, and the first one
 that matches determines the value.

    If .cp. .in. Exceptions Then Exceptions(cp);
    Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
    Else If .cp. .in. Unassigned Then UNASSIGNED;
    Else If .cp. .in. LDH Then PVALID;
    Else If .cp. .in. JoinControl Then CONTEXTJ;
    Else If .cp. .in. Unstable Then DISALLOWED;
    Else If .cp. .in. IgnorableProperties Then DISALLOWED;
    Else If .cp. .in. IgnorableBlocks Then DISALLOWED;
    Else If .cp. .in. OldHangulJamo Then DISALLOWED;
    Else If .cp. .in. LetterDigits Then PVALID;
    Else DISALLOWED;
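
 The ordered tests above translate directly into a chain of early
 returns. The Python sketch below is one possible rendering, not a
 reference implementation. Exceptions and BackwardCompatible are
 passed in as dictionaries, for example one transcribed from
 Section 2.6. IgnorableProperties, IgnorableBlocks, and OldHangulJamo
 are passed in as predicates, because Python's unicodedata module does
 not expose the underlying properties. These predicates default to
 "never matches", so the result is too permissive until real UCD data
 is supplied. JoinControl is hard-coded as U+200C and U+200D, which is
 the Join_Control set as of Unicode 5.2.

```python
import unicodedata
from typing import Callable, Dict, Optional

Predicate = Callable[[int], bool]

LDH = frozenset([0x002D, *range(0x0030, 0x003A), *range(0x0061, 0x007B)])
JOIN_CONTROL = frozenset([0x200C, 0x200D])  # ZWNJ and ZWJ
LETTER_DIGITS_GC = frozenset({"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"})


def _never(cp: int) -> bool:
    """Placeholder predicate used when no UCD data is supplied."""
    return False


def _nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def _is_noncharacter(cp: int) -> bool:
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def _is_unassigned(cp: int) -> bool:
    return unicodedata.category(chr(cp)) == "Cn" and not _is_noncharacter(cp)


def _is_unstable(cp: int) -> bool:
    return _nfkc(_nfkc(chr(cp)).casefold()) != chr(cp)


def derived_property(
    cp: int,
    exceptions: Dict[int, str],
    backward_compatible: Optional[Dict[int, str]] = None,
    ignorable_properties: Predicate = _never,
    ignorable_blocks: Predicate = _never,
    old_hangul_jamo: Predicate = _never,
) -> str:
    """Apply the tests of Section 3 in order; the first match decides."""
    backward_compatible = backward_compatible or {}
    if cp in exceptions:
        return exceptions[cp]
    if cp in backward_compatible:
        return backward_compatible[cp]
    if _is_unassigned(cp):
        return "UNASSIGNED"
    if cp in LDH:
        return "PVALID"
    if cp in JOIN_CONTROL:
        return "CONTEXTJ"
    # Unstable, IgnorableProperties, IgnorableBlocks, OldHangulJamo.
    if (_is_unstable(cp) or ignorable_properties(cp)
            or ignorable_blocks(cp) or old_hangul_jamo(cp)):
        return "DISALLOWED"
    if unicodedata.category(chr(cp)) in LETTER_DIGITS_GC:
        return "PVALID"
    return "DISALLOWED"
```

 With an empty exceptions table, derived_property(0x00DF, {}) returns
 DISALLOWED because U+00DF is unstable. Passing a table that maps
 0x00DF to "PVALID" changes the result, which shows why Exceptions is
 tested first.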

4. Code Points

 The categories and rules defined in Sections 2 and 3 apply to all
 Unicode code points. The table in Appendix B shows, for illustrative
 purposes, the consequences of the categories and classification
 rules, and the resulting property values.

 The list of code points that can be found in Appendix B is
 non-normative. Sections 2 and 3 are normative.

5. IANA Considerations

5.1. IDNA-Derived Property Value Registry

 IANA has created a registry with the derived properties for the
 versions of Unicode released after (and including) version 5.2. The
 derived property value is to be calculated in cooperation with a
 designated expert [RFC5226] according to the specifications in
 Sections 2 and 3 and not by copying the non-normative table found in
 Appendix B.

 If non-backward-compatible changes or other problems arise during the
 creation or designated expert review of the table of derived property
 values, they should be flagged for the IESG. Changes to the rules
 (as specified in Sections 2 and 3), including BackwardCompatible
 (Section 2.7) (a set that is empty as of the publication of this
 document), require IETF Review, as described in RFC 5226 [RFC5226].

5.2. IDNA Context Registry

 For characters that are defined in the IDNA derived property value
 registry (Section 5.1) as CONTEXTO or CONTEXTJ and that therefore
 require a contextual rule, IANA has created and now maintains a list
 of approved contextual rules. Additions or changes to these rules
 require IETF Review, as described in [RFC5226].

 Appendix A contains further discussion and a table from which that
 registry can be initialized.

5.2.1. Template for Context Registry

 The following information is to be given when a new rule is created.

 Name: Unique name of the rule

 Code point: The code point whose presence in a label requires that
   this rule be applied

 Overview: Description in plain English of what the rule verifies

 Lookup: Should the rule be applied at the time of lookup?

 Rule Set: The set of rules, with a reference to the defining
   document.

6. Security Considerations

 Security Considerations for this version of IDNA, except for the
 special issues associated with right-to-left scripts and characters,
 are described in the Definitions document [RFC5890]. Specific issues
 for labels containing characters associated with scripts written
 right to left appear in the Bidi document [RFC5893].

7. Acknowledgements

 This document would not have been possible to produce without input
 from many people. The main contributors are (in alphabetical order)
 Harald Alvestrand, Vint Cerf, Tina Dam, Mark Davis, Gihan Dias,
 Mouhammet Diop, Michael Everson, Asmus Freytag, Debbie Garside, Paul
 Hoffman, Kent Karlsson, Cary Karp, Jaeyoun Kim, John Klensin, Olaf
 Kolkman, Gervase Markham, Ram Mohan, Lisa Moore, Yngve Pettersen,
 Erik van der Poel, Hualin Qian, Rick Reed, Pete Resnick, Lakmal
 Silva, Michel Suignard, Andrew Sullivan, Wil Tan, Kenneth Whistler,
 Chris Wright, and Yoshiro Yoneya.
642642+643643+644644+645645+646646+647647+648648+649649+650650+651651+652652+653653+654654+655655+656656+657657+658658+659659+660660+661661+662662+663663+664664+665665+666666+667667+668668+669669+670670+671671+672672+673673+674674+Faltstrom Standards Track [Page 12]
675675+676676+RFC 5892 IDNA Code Points August 2010

Appendix A. Contextual Rules Registry

   As discussed in Section 5.2 and in the IANA Considerations section of
   the Rationale document [RFC5894], IANA maintains a registry of rules
   that define the contexts in which particular PROTOCOL-VALID
   characters, namely those associated with a requirement for
   Contextual Information, are permitted. These rules are expressed as
   tests on the label in which the characters appear (all, or any part
   of, the label may be tested).

   The grammatical rules are expressed in pseudo-code. The conventions
   used for that pseudo-code are explained here.

   Each rule is constructed as a Boolean expression that evaluates to
   either True or False. A simple "True;" or "False;" rule sets the
   default result value for the rule set. Subsequent conditional rules
   that evaluate to True or False may reset the result value.

   A special value "Undefined" is used to deal with any error
   conditions, such as an attempt to test a character before the start
   of a label or after the end of a label. If any term of a rule
   evaluates to Undefined, further evaluation of the rule immediately
   terminates, as the result value of the rule will itself be Undefined.

   cp represents the code point to be tested.

   FirstChar is a special term that denotes the first code point in a
   label.

   LastChar is a special term that denotes the last code point in a
   label.

   .eq. represents the equality relation.

      A .eq. B evaluates to True if A equals B.

   .is. represents checking the position in a label.

      A .is. B evaluates to True if A and B have the same position in
      the same label.

   .ne. represents the non-equality relation.

      A .ne. B evaluates to True if A is not equal to B.

   .in. represents the set inclusion relation.

      A .in. B evaluates to True if A is a member of the set B.
726726+727727+728728+729729+730730+Faltstrom Standards Track [Page 13]
731731+732732+RFC 5892 IDNA Code Points August 2010

   A functional notation, Function_Name(cp), is used to express string
   positions within a label, Boolean character property tests of a code
   point, or a regular expression match. When such function names refer
   to Boolean character property tests, the function names use the exact
   Unicode character property name for the property in question, and
   "cp" is evaluated as the Unicode value of the code point to be
   tested, rather than as its position in the label. When such function
   names refer to string positions within a label, "cp" is evaluated as
   its position in the label.

   RegExpMatch(X) takes as its parameter X a schematic regular
   expression consisting of a mix of Unicode character property values
   and literal Unicode code points.

   Script(cp) returns the value of the Unicode Script property, as
   defined in Scripts.txt in the Unicode Character Database.

   Canonical_Combining_Class(cp) returns the value of the Unicode
   Canonical_Combining_Class property, as defined in UnicodeData.txt in
   the Unicode Character Database.
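   Python's standard `unicodedata` module exposes the canonical combining class directly through `unicodedata.combining()`, but it has no Script lookup. The minimal sketch below works around that gap with a deliberately partial range table. The names `canonical_combining_class`, `script`, and `_PARTIAL_SCRIPTS` are hypothetical. A real implementation would load Scripts.txt instead. Note also that property values come from the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), not necessarily Unicode 5.2.

```python
import unicodedata
from typing import Optional

VIRAMA = 9  # the Canonical_Combining_Class value named "Virama" in the UCD


def canonical_combining_class(ch: str) -> int:
    """Canonical_Combining_Class(cp), via the standard library."""
    return unicodedata.combining(ch)


# Hypothetical, deliberately partial stand-in for Scripts.txt.
_PARTIAL_SCRIPTS = (
    (0x03B1, 0x03C9, "Greek"),     # GREEK SMALL LETTER ALPHA..OMEGA
    (0x05D0, 0x05EA, "Hebrew"),    # HEBREW LETTER ALEF..TAV
    (0x3041, 0x3096, "Hiragana"),  # HIRAGANA LETTER SMALL A..SMALL KE
    (0x30A1, 0x30FA, "Katakana"),  # KATAKANA LETTER SMALL A..VO
    (0x4E00, 0x9FFF, "Han"),       # CJK Unified Ideographs block (approximate)
)


def script(ch: str) -> Optional[str]:
    """Script(cp) for the few ranges above; None means "not in this table"."""
    cp = ord(ch)
    for lo, hi, name in _PARTIAL_SCRIPTS:
        if lo <= cp <= hi:
            return name
    return None
```

   Returning `None` outside the table keeps the sketch from guessing. In particular, U+30FB KATAKANA MIDDLE DOT, whose Script value is Common, is not reported as Katakana.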

   Before(cp) returns the code point of the character immediately
   preceding cp in logical order in the string representing the label.
   Before(FirstChar) evaluates to Undefined.

   After(cp) returns the code point of the character immediately
   following cp in logical order in the string representing the label.
   After(LastChar) evaluates to Undefined.
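   Python strings are sequences of code points in logical order, so Before and After map onto index arithmetic. This is a minimal sketch under two assumptions: the label is a Python `str`, and cp is identified by its index. The hypothetical `Undefined` exception stands in for the pseudo-code value Undefined.

```python
class Undefined(Exception):
    """Stands for the pseudo-code value Undefined."""


def before(label: str, pos: int) -> str:
    """Before(cp): the code point preceding position pos, in logical order."""
    if pos == 0:  # Before(FirstChar) is Undefined
        raise Undefined("Before(FirstChar)")
    return label[pos - 1]


def after(label: str, pos: int) -> str:
    """After(cp): the code point following position pos, in logical order."""
    if pos == len(label) - 1:  # After(LastChar) is Undefined
        raise Undefined("After(LastChar)")
    return label[pos + 1]
```

   Indexing the string, rather than any rendered form, matches the network order required here, regardless of how the bidirectional algorithm displays the label.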

   Note that "Before" and "After" do not refer to the visual display
   order of the character in a label, which may be reversed or otherwise
   modified by the bidirectional algorithm for labels including
   characters from scripts written right to left. Instead, "Before" and
   "After" refer to the network order of the character in the label.

   The clauses "Then True" and "Then False" imply exit from the
   pseudo-code routine with the corresponding result.

   Repeated evaluation for all characters in a label makes use of the
   special construct:

      For All Characters:

         Expression;

      End For;

   This construct requires repeated evaluation of "Expression" for each
   code point in the label, starting from FirstChar and proceeding to
   LastChar.
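   The evaluation conventions can be sketched in Python. These are: a default set by the leading "True;" or "False;", an immediate exit on "Then True" or "Then False", termination with Undefined, and the For All Characters loop. The encoding is an assumption made for illustration. Each "If" clause becomes a callable that returns the result it would exit with, or `None` when its condition does not hold. A `None` returned from `evaluate` stands for Undefined. `evaluate`, `for_all_characters`, `arabic_indic_rule`, and `Undefined` are hypothetical names.

```python
from typing import Callable, Iterable, Optional


class Undefined(Exception):
    """Stands for the pseudo-code value Undefined."""


# Hypothetical encoding: each "If ... Then" clause is a callable returning
# the result it exits with (True/False), or None if its condition is false.
Clause = Callable[[], Optional[bool]]


def evaluate(default: bool, clauses: Iterable[Clause]) -> Optional[bool]:
    """Evaluate a rule set; a return value of None represents Undefined."""
    try:
        for clause in clauses:
            outcome = clause()
            if outcome is not None:
                return outcome  # "Then True" / "Then False": exit at once
        return default          # value set by the leading "True;" or "False;"
    except Undefined:
        return None             # an Undefined term makes the whole rule Undefined


def for_all_characters(
    label: str, expression: Callable[[int], Optional[bool]]
) -> Clause:
    """For All Characters: Expression; End For; from FirstChar to LastChar."""

    def clause() -> Optional[bool]:
        for pos in range(len(label)):
            outcome = expression(pos)
            if outcome is not None:
                return outcome  # an exit inside the loop exits the whole rule
        return None

    return clause


def arabic_indic_rule(label: str) -> Optional[bool]:
    """The rule set of Appendix A.8 written with these helpers."""

    def extended_digit(pos: int) -> Optional[bool]:
        return False if 0x06F0 <= ord(label[pos]) <= 0x06F9 else None

    return evaluate(True, [for_all_characters(label, extended_digit)])
```

   Modelling clauses as callables keeps evaluation lazy. A clause after an exit is never run, so a later term that would be Undefined cannot affect a result that has already been decided.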

   The different fields in the rules are to be interpreted as follows:

   Code point:
      The code point, or code points, to which this rule is to be
      applied. Normally, this means that the rule is applied whenever
      any of these code points appears in a label. If the rule
      evaluates to True, the code point is OK as used; if it evaluates
      to False, it is not OK.

   Overview:
      A description of the goal of the rule, in plain English.

   Lookup:
      True if application of this rule is recommended at lookup time;
      False otherwise.

   Rule Set:
      The rule set itself, as described above.

Appendix A.1. ZERO WIDTH NON-JOINER

   Code point:
      U+200C

   Overview:
      This may occur in a formally cursive script (such as Arabic) in a
      context where it breaks a cursive connection as required for
      orthographic rules, as in the Persian language, for example. It
      also may occur in Indic scripts in a consonant-conjunct context
      (immediately following a virama), to control required display of
      such conjuncts.

   Lookup:
      True

   Rule Set:

      False;

      If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;

      If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C

         (Joining_Type:T)*(Joining_Type:{R,D})) Then True;
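   This is a minimal Python sketch of this rule set, and it rests on three stated assumptions. First, Joining_Type comes from a tiny hypothetical table (`_JOINING_TYPE`) that holds the values the UCD assigns to a handful of Arabic-script letters. Second, unlisted nonspacing and enclosing marks are approximated as Transparent. Third, a rule that evaluates to Undefined is treated as not satisfied. The RegExpMatch is implemented by skipping Transparent characters outward from the ZWNJ rather than with a regular-expression engine. Around that ZWNJ, both approaches accept the same contexts.

```python
import unicodedata

VIRAMA = 9  # Canonical_Combining_Class value named "Virama"

# Hypothetical, deliberately partial Joining_Type table; a real
# implementation would load the complete UCD data.
_JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062f": "R",  # ARABIC LETTER DAL
    "\u0631": "R",  # ARABIC LETTER REH
    "\u0648": "R",  # ARABIC LETTER WAW
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u062e": "D",  # ARABIC LETTER KHAH
    "\u0633": "D",  # ARABIC LETTER SEEN
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
}


def joining_type(ch: str) -> str:
    if ch in _JOINING_TYPE:
        return _JOINING_TYPE[ch]
    # Approximation: unlisted nonspacing/enclosing marks are Transparent.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return "T"
    return "U"  # Non_Joining


def zwnj_ok(label: str, pos: int) -> bool:
    """Appendix A.1 for the U+200C at label[pos]; Undefined counts as False."""
    if pos == 0:
        return False  # Before(FirstChar) is Undefined, so the rule is Undefined
    if unicodedata.combining(label[pos - 1]) == VIRAMA:
        return True
    # RegExpMatch: skip Transparent characters on both sides of the ZWNJ,
    # then require Joining_Type L or D before it and R or D after it.
    i = pos - 1
    while i >= 0 and joining_type(label[i]) == "T":
        i -= 1
    j = pos + 1
    while j < len(label) and joining_type(label[j]) == "T":
        j += 1
    return (
        i >= 0
        and joining_type(label[i]) in ("L", "D")
        and j < len(label)
        and joining_type(label[j]) in ("R", "D")
    )
```

   For example, the ZWNJ in the Persian word "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645" sits between FARSI YEH (D) and KHAH (D), so the sketch accepts it. A ZWNJ after ALEF (R) is rejected.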

Appendix A.2. ZERO WIDTH JOINER

   Code point:
      U+200D

   Overview:
      This may occur in Indic scripts in a consonant-conjunct context
      (immediately following a virama), to control required display of
      such conjuncts.

   Lookup:
      True

   Rule Set:

      False;

      If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
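   In Python this rule reduces to a single call to `unicodedata.combining()`, which returns the canonical combining class (9 for a virama). The minimal sketch below treats the Undefined result for a label-initial ZWJ as a failure, which is an assumption. The function name `zwj_ok` is hypothetical.

```python
import unicodedata

VIRAMA = 9  # Canonical_Combining_Class value "Virama"


def zwj_ok(label: str, pos: int) -> bool:
    """Appendix A.2 for the U+200D at label[pos].

    Before(FirstChar) is Undefined; this sketch treats that as failure.
    """
    if pos == 0:
        return False
    return unicodedata.combining(label[pos - 1]) == VIRAMA
```

   For example, in Devanagari "\u0915\u094d\u200d\u0937" the ZWJ follows DEVANAGARI SIGN VIRAMA (U+094D) and is accepted.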

Appendix A.3. MIDDLE DOT

   Code point:
      U+00B7

   Overview:
      Between 'l' (U+006C) characters only, used to permit the Catalan
      character ela geminada to be expressed.

   Lookup:
      False

   Rule Set:

      False;

      If Before(cp) .eq. U+006C And

         After(cp) .eq. U+006C Then True;
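   A minimal Python sketch of this rule follows. It assumes cp is given by its index in the label and that an Undefined result (a middle dot at either end of the label) counts as a failure. The name `middle_dot_ok` is hypothetical.

```python
MIDDLE_DOT = "\u00b7"


def middle_dot_ok(label: str, pos: int) -> bool:
    """Appendix A.3 for the U+00B7 at label[pos]; Undefined counts as False."""
    if pos == 0 or pos == len(label) - 1:
        return False  # Before(FirstChar) or After(LastChar) is Undefined
    return label[pos - 1] == "\u006c" and label[pos + 1] == "\u006c"
```

   The Catalan word "col\u00b7lecci\u00f3" passes because its middle dot sits between two 'l' characters.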

Appendix A.4. GREEK LOWER NUMERAL SIGN (KERAIA)

   Code point:
      U+0375

   Overview:
      The script of the following character MUST be Greek.

   Lookup:
      False

   Rule Set:

      False;

      If Script(After(cp)) .eq. Greek Then True;
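   Python's standard library has no Script property, so this minimal sketch substitutes a hypothetical and deliberately partial `is_greek` test that covers only GREEK SMALL LETTER ALPHA..OMEGA. A conforming implementation would consult Scripts.txt. An Undefined result (keraia as the last character) is assumed to count as a failure.

```python
def is_greek(ch: str) -> bool:
    """Hypothetical, partial stand-in for Script(ch) .eq. Greek.

    Covers only GREEK SMALL LETTER ALPHA..OMEGA (U+03B1..U+03C9).
    """
    return 0x03B1 <= ord(ch) <= 0x03C9


def keraia_ok(label: str, pos: int) -> bool:
    """Appendix A.4 for the U+0375 at label[pos]; Undefined counts as False."""
    if pos == len(label) - 1:
        return False  # After(LastChar) is Undefined
    return is_greek(label[pos + 1])
```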

Appendix A.5. HEBREW PUNCTUATION GERESH

   Code point:
      U+05F3

   Overview:
      The script of the preceding character MUST be Hebrew.

   Lookup:
      False

   Rule Set:

      False;

      If Script(Before(cp)) .eq. Hebrew Then True;

Appendix A.6. HEBREW PUNCTUATION GERSHAYIM

   Code point:
      U+05F4

   Overview:
      The script of the preceding character MUST be Hebrew.

   Lookup:
      False

   Rule Set:

      False;

      If Script(Before(cp)) .eq. Hebrew Then True;
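   Appendices A.5 and A.6 share one rule set, so a single check can serve both code points. This is a minimal Python sketch. The `is_hebrew` test is a hypothetical and deliberately partial stand-in for the Script property that covers only HEBREW LETTER ALEF..TAV. A geresh or gershayim at the start of the label yields Undefined, which is assumed here to count as a failure.

```python
GERESH, GERSHAYIM = "\u05f3", "\u05f4"


def is_hebrew(ch: str) -> bool:
    """Hypothetical, partial stand-in for Script(ch) .eq. Hebrew.

    Covers only HEBREW LETTER ALEF..TAV (U+05D0..U+05EA).
    """
    return 0x05D0 <= ord(ch) <= 0x05EA


def hebrew_punctuation_ok(label: str, pos: int) -> bool:
    """Appendices A.5 and A.6 for the geresh or gershayim at label[pos]."""
    if pos == 0:
        return False  # Before(FirstChar) is Undefined; treated as failure
    return is_hebrew(label[pos - 1])
```

   For example, the abbreviation "\u05e6\u05d4\u05f4\u05dc" passes because its gershayim follows HEBREW LETTER HE.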

Appendix A.7. KATAKANA MIDDLE DOT

   Code point:
      U+30FB

   Overview:
      Note that the Script of Katakana Middle Dot is not any of
      "Hiragana", "Katakana", or "Han". The effect of this rule is to
      require at least one character in the label to be in one of those
      scripts.

   Lookup:
      False

   Rule Set:

      False;

      For All Characters:

         If Script(cp) .in. {Hiragana, Katakana, Han} Then True;

      End For;
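   This rule tests the whole label instead of the neighbours of cp, so it never calls Before or After and cannot be Undefined. This is a minimal Python sketch. `_JAPANESE_RANGES` is a hypothetical and deliberately partial approximation of the three Script values; a real implementation would use Scripts.txt.

```python
# Hypothetical, partial stand-in for Script(cp) .in. {Hiragana, Katakana, Han}.
_JAPANESE_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters (U+30FB itself is excluded)
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs block (Han)
)


def in_japanese_script(ch: str) -> bool:
    return any(lo <= ord(ch) <= hi for lo, hi in _JAPANESE_RANGES)


def katakana_middle_dot_ok(label: str) -> bool:
    """Appendix A.7: False unless some character is Hiragana, Katakana, or Han."""
    return any(in_japanese_script(ch) for ch in label)
```

   Excluding U+30FB from the Katakana range matters: otherwise a label consisting only of middle dots would satisfy the rule by itself.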

Appendix A.8. ARABIC-INDIC DIGITS

   Code point:
      0660..0669

   Overview:
      Cannot be mixed with Extended Arabic-Indic Digits.

   Lookup:
      False

   Rule Set:

      True;

      For All Characters:

         If cp .in. 06F0..06F9 Then False;

      End For;

Appendix A.9. EXTENDED ARABIC-INDIC DIGITS

   Code point:
      06F0..06F9

   Overview:
      Cannot be mixed with Arabic-Indic Digits.

   Lookup:
      False

   Rule Set:

      True;

      For All Characters:

         If cp .in. 0660..0669 Then False;

      End For;
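   Rules A.8 and A.9 are mirror images, and each applies only when the label contains one of its own code points. This minimal Python sketch encodes both. The function names are hypothetical, and `digits_ok` is an illustrative combination rather than a procedure defined here.

```python
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9


def arabic_indic_ok(label: str) -> bool:
    """Appendix A.8: True unless some cp is in 06F0..06F9."""
    return not any(ord(ch) in EXTENDED_ARABIC_INDIC for ch in label)


def extended_arabic_indic_ok(label: str) -> bool:
    """Appendix A.9: True unless some cp is in 0660..0669."""
    return not any(ord(ch) in ARABIC_INDIC for ch in label)


def digits_ok(label: str) -> bool:
    """Apply each rule only when the label contains a code point it covers."""
    has_ai = any(ord(ch) in ARABIC_INDIC for ch in label)
    has_ext = any(ord(ch) in EXTENDED_ARABIC_INDIC for ch in label)
    if has_ai and not arabic_indic_ok(label):
        return False
    if has_ext and not extended_arabic_indic_ok(label):
        return False
    return True
```

   The net effect is that a label may use either digit set, but not both.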

Appendix B. Code Points 0x0000 - 0x10FFFF

   Applying the rules (Section 3) to the code points 0x0000 to 0x10FFFF
   of Unicode 5.2 gives the following result.

   This list is non-normative and is included only for illustrative
   purposes. In particular, what is displayed in the third column is not
   always the formal name of the code point (as defined in Section 4.8
   of The Unicode Standard [Unicode52]). Differences arise, for example,
   for code points that have the code point value as part of the name
   (such as CJK UNIFIED IDEOGRAPH-4E00) and for Hangul syllables. Long
   names and name ranges are also truncated to fit the line width. For
   many code points, what is shown is the official name.
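   Each entry follows the UCD data-file layout: a code point or range, a semicolon, the derived property value, then a comment. Entries in this layout can be loaded with a few lines of code. This is a minimal Python sketch that assumes input lines formatted like those below. The class `DerivedPropertyTable` is hypothetical and is not an API of any existing library. Because the list is non-normative, such a table suits tests and illustration, but it is not an authoritative source.

```python
import re
from bisect import bisect_right
from typing import Iterable, List, Optional, Tuple

# "0000..002C  ; DISALLOWED  # <control>..COMMA" -> (0x0000, 0x002C, "DISALLOWED")
_ENTRY = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*([A-Z]+)")


class DerivedPropertyTable:
    """Range lookup over lines in the UCD-style format of Appendix B.1."""

    def __init__(self, lines: Iterable[str]) -> None:
        rows: List[Tuple[int, int, str]] = []
        for line in lines:
            m = _ENTRY.match(line)
            if m is None:
                continue  # skip headings, blank lines, and other prose
            start = int(m.group(1), 16)
            end = int(m.group(2) or m.group(1), 16)
            rows.append((start, end, m.group(3)))
        rows.sort()
        self._rows = rows
        self._starts = [start for start, _, _ in rows]

    def lookup(self, cp: int) -> Optional[str]:
        """Return the property value for cp, or None if no entry covers it."""
        i = bisect_right(self._starts, cp) - 1
        if i >= 0 and self._rows[i][0] <= cp <= self._rows[i][1]:
            return self._rows[i][2]
        return None
```

   Binary search over sorted range starts keeps each lookup logarithmic in the number of entries, which matters for a table that spans the whole code space.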

Appendix B.1. Code Points in Unicode Character Database (UCD) Format

0000..002C  ; DISALLOWED  # <control>..COMMA
002D        ; PVALID      # HYPHEN-MINUS
002E..002F  ; DISALLOWED  # FULL STOP..SOLIDUS
0030..0039  ; PVALID      # DIGIT ZERO..DIGIT NINE
003A..0060  ; DISALLOWED  # COLON..GRAVE ACCENT
0061..007A  ; PVALID      # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
007B..00B6  ; DISALLOWED  # LEFT CURLY BRACKET..PILCROW SIGN
00B7        ; CONTEXTO    # MIDDLE DOT
00B8..00DE  ; DISALLOWED  # CEDILLA..LATIN CAPITAL LETTER THORN
00DF..00F6  ; PVALID      # LATIN SMALL LETTER SHARP S..LATIN SMALL LETT
00F7        ; DISALLOWED  # DIVISION SIGN
00F8..00FF  ; PVALID      # LATIN SMALL LETTER O WITH STROKE..LATIN SMAL
0100        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH MACRON
0101        ; PVALID      # LATIN SMALL LETTER A WITH MACRON
0102        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH BREVE
0103        ; PVALID      # LATIN SMALL LETTER A WITH BREVE
0104        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH OGONEK
0105        ; PVALID      # LATIN SMALL LETTER A WITH OGONEK
0106        ; DISALLOWED  # LATIN CAPITAL LETTER C WITH ACUTE
0107        ; PVALID      # LATIN SMALL LETTER C WITH ACUTE
0108        ; DISALLOWED  # LATIN CAPITAL LETTER C WITH CIRCUMFLEX
0109        ; PVALID      # LATIN SMALL LETTER C WITH CIRCUMFLEX
010A        ; DISALLOWED  # LATIN CAPITAL LETTER C WITH DOT ABOVE
010B        ; PVALID      # LATIN SMALL LETTER C WITH DOT ABOVE
010C        ; DISALLOWED  # LATIN CAPITAL LETTER C WITH CARON
010D        ; PVALID      # LATIN SMALL LETTER C WITH CARON
010E        ; DISALLOWED  # LATIN CAPITAL LETTER D WITH CARON
010F        ; PVALID      # LATIN SMALL LETTER D WITH CARON
0110        ; DISALLOWED  # LATIN CAPITAL LETTER D WITH STROKE
0111        ; PVALID      # LATIN SMALL LETTER D WITH STROKE
0112        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH MACRON
0113        ; PVALID      # LATIN SMALL LETTER E WITH MACRON
0114        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH BREVE
0115        ; PVALID      # LATIN SMALL LETTER E WITH BREVE
0116        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH DOT ABOVE
0117        ; PVALID      # LATIN SMALL LETTER E WITH DOT ABOVE
0118        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH OGONEK
0119        ; PVALID      # LATIN SMALL LETTER E WITH OGONEK
011A        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH CARON
011B        ; PVALID      # LATIN SMALL LETTER E WITH CARON
011C        ; DISALLOWED  # LATIN CAPITAL LETTER G WITH CIRCUMFLEX
011D        ; PVALID      # LATIN SMALL LETTER G WITH CIRCUMFLEX
011E        ; DISALLOWED  # LATIN CAPITAL LETTER G WITH BREVE
011F        ; PVALID      # LATIN SMALL LETTER G WITH BREVE
0120        ; DISALLOWED  # LATIN CAPITAL LETTER G WITH DOT ABOVE
0121        ; PVALID      # LATIN SMALL LETTER G WITH DOT ABOVE
0122        ; DISALLOWED  # LATIN CAPITAL LETTER G WITH CEDILLA
0123        ; PVALID      # LATIN SMALL LETTER G WITH CEDILLA
0124        ; DISALLOWED  # LATIN CAPITAL LETTER H WITH CIRCUMFLEX
0125        ; PVALID      # LATIN SMALL LETTER H WITH CIRCUMFLEX
0126        ; DISALLOWED  # LATIN CAPITAL LETTER H WITH STROKE
0127        ; PVALID      # LATIN SMALL LETTER H WITH STROKE
0128        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH TILDE
0129        ; PVALID      # LATIN SMALL LETTER I WITH TILDE
012A        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH MACRON
012B        ; PVALID      # LATIN SMALL LETTER I WITH MACRON
012C        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH BREVE
012D        ; PVALID      # LATIN SMALL LETTER I WITH BREVE
012E        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH OGONEK
012F        ; PVALID      # LATIN SMALL LETTER I WITH OGONEK
0130        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH DOT ABOVE
0131        ; PVALID      # LATIN SMALL LETTER DOTLESS I
0132..0134  ; DISALLOWED  # LATIN CAPITAL LIGATURE IJ..LATIN CAPITAL LET
0135        ; PVALID      # LATIN SMALL LETTER J WITH CIRCUMFLEX
0136        ; DISALLOWED  # LATIN CAPITAL LETTER K WITH CEDILLA
0137..0138  ; PVALID      # LATIN SMALL LETTER K WITH CEDILLA..LATIN SMA
0139        ; DISALLOWED  # LATIN CAPITAL LETTER L WITH ACUTE
013A        ; PVALID      # LATIN SMALL LETTER L WITH ACUTE
013B        ; DISALLOWED  # LATIN CAPITAL LETTER L WITH CEDILLA
013C        ; PVALID      # LATIN SMALL LETTER L WITH CEDILLA
013D        ; DISALLOWED  # LATIN CAPITAL LETTER L WITH CARON
013E        ; PVALID      # LATIN SMALL LETTER L WITH CARON
013F..0141  ; DISALLOWED  # LATIN CAPITAL LETTER L WITH MIDDLE DOT..LATI
0142        ; PVALID      # LATIN SMALL LETTER L WITH STROKE
0143        ; DISALLOWED  # LATIN CAPITAL LETTER N WITH ACUTE
0144        ; PVALID      # LATIN SMALL LETTER N WITH ACUTE
0145        ; DISALLOWED  # LATIN CAPITAL LETTER N WITH CEDILLA
0146        ; PVALID      # LATIN SMALL LETTER N WITH CEDILLA
0147        ; DISALLOWED  # LATIN CAPITAL LETTER N WITH CARON
0148        ; PVALID      # LATIN SMALL LETTER N WITH CARON
0149..014A  ; DISALLOWED  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE.
014B        ; PVALID      # LATIN SMALL LETTER ENG
014C        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH MACRON
014D        ; PVALID      # LATIN SMALL LETTER O WITH MACRON
014E        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH BREVE
014F        ; PVALID      # LATIN SMALL LETTER O WITH BREVE
0150        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0151        ; PVALID      # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0152        ; DISALLOWED  # LATIN CAPITAL LIGATURE OE
0153        ; PVALID      # LATIN SMALL LIGATURE OE
0154        ; DISALLOWED  # LATIN CAPITAL LETTER R WITH ACUTE
0155        ; PVALID      # LATIN SMALL LETTER R WITH ACUTE
0156        ; DISALLOWED  # LATIN CAPITAL LETTER R WITH CEDILLA
0157        ; PVALID      # LATIN SMALL LETTER R WITH CEDILLA
0158        ; DISALLOWED  # LATIN CAPITAL LETTER R WITH CARON
0159        ; PVALID      # LATIN SMALL LETTER R WITH CARON
015A        ; DISALLOWED  # LATIN CAPITAL LETTER S WITH ACUTE
015B        ; PVALID      # LATIN SMALL LETTER S WITH ACUTE
015C        ; DISALLOWED  # LATIN CAPITAL LETTER S WITH CIRCUMFLEX
015D        ; PVALID      # LATIN SMALL LETTER S WITH CIRCUMFLEX
015E        ; DISALLOWED  # LATIN CAPITAL LETTER S WITH CEDILLA
015F        ; PVALID      # LATIN SMALL LETTER S WITH CEDILLA
0160        ; DISALLOWED  # LATIN CAPITAL LETTER S WITH CARON
0161        ; PVALID      # LATIN SMALL LETTER S WITH CARON
0162        ; DISALLOWED  # LATIN CAPITAL LETTER T WITH CEDILLA
0163        ; PVALID      # LATIN SMALL LETTER T WITH CEDILLA
0164        ; DISALLOWED  # LATIN CAPITAL LETTER T WITH CARON
0165        ; PVALID      # LATIN SMALL LETTER T WITH CARON
0166        ; DISALLOWED  # LATIN CAPITAL LETTER T WITH STROKE
0167        ; PVALID      # LATIN SMALL LETTER T WITH STROKE
0168        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH TILDE
0169        ; PVALID      # LATIN SMALL LETTER U WITH TILDE
016A        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH MACRON
016B        ; PVALID      # LATIN SMALL LETTER U WITH MACRON
016C        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH BREVE
016D        ; PVALID      # LATIN SMALL LETTER U WITH BREVE
016E        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH RING ABOVE
016F        ; PVALID      # LATIN SMALL LETTER U WITH RING ABOVE
0170        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0171        ; PVALID      # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0172        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH OGONEK
0173        ; PVALID      # LATIN SMALL LETTER U WITH OGONEK
0174        ; DISALLOWED  # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
0175        ; PVALID      # LATIN SMALL LETTER W WITH CIRCUMFLEX
0176        ; DISALLOWED  # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
0177        ; PVALID      # LATIN SMALL LETTER Y WITH CIRCUMFLEX
0178..0179  ; DISALLOWED  # LATIN CAPITAL LETTER Y WITH DIAERESIS..LATIN
017A        ; PVALID      # LATIN SMALL LETTER Z WITH ACUTE
017B        ; DISALLOWED  # LATIN CAPITAL LETTER Z WITH DOT ABOVE
017C        ; PVALID      # LATIN SMALL LETTER Z WITH DOT ABOVE
017D        ; DISALLOWED  # LATIN CAPITAL LETTER Z WITH CARON
017E        ; PVALID      # LATIN SMALL LETTER Z WITH CARON
017F        ; DISALLOWED  # LATIN SMALL LETTER LONG S
0180        ; PVALID      # LATIN SMALL LETTER B WITH STROKE
0181..0182  ; DISALLOWED  # LATIN CAPITAL LETTER B WITH HOOK..LATIN CAPI
0183        ; PVALID      # LATIN SMALL LETTER B WITH TOPBAR
0184        ; DISALLOWED  # LATIN CAPITAL LETTER TONE SIX
0185        ; PVALID      # LATIN SMALL LETTER TONE SIX
0186..0187  ; DISALLOWED  # LATIN CAPITAL LETTER OPEN O..LATIN CAPITAL L
0188        ; PVALID      # LATIN SMALL LETTER C WITH HOOK
0189..018B  ; DISALLOWED  # LATIN CAPITAL LETTER AFRICAN D..LATIN CAPITA
018C..018D  ; PVALID      # LATIN SMALL LETTER D WITH TOPBAR..LATIN SMAL
018E..0191  ; DISALLOWED  # LATIN CAPITAL LETTER REVERSED E..LATIN CAPIT
0192        ; PVALID      # LATIN SMALL LETTER F WITH HOOK
0193..0194  ; DISALLOWED  # LATIN CAPITAL LETTER G WITH HOOK..LATIN CAPI
0195        ; PVALID      # LATIN SMALL LETTER HV
0196..0198  ; DISALLOWED  # LATIN CAPITAL LETTER IOTA..LATIN CAPITAL LET
0199..019B  ; PVALID      # LATIN SMALL LETTER K WITH HOOK..LATIN SMALL
019C..019D  ; DISALLOWED  # LATIN CAPITAL LETTER TURNED M..LATIN CAPITAL
019E        ; PVALID      # LATIN SMALL LETTER N WITH LONG RIGHT LEG
019F..01A0  ; DISALLOWED  # LATIN CAPITAL LETTER O WITH MIDDLE TILDE..LA
01A1        ; PVALID      # LATIN SMALL LETTER O WITH HORN
01A2        ; DISALLOWED  # LATIN CAPITAL LETTER OI
01A3        ; PVALID      # LATIN SMALL LETTER OI
01A4        ; DISALLOWED  # LATIN CAPITAL LETTER P WITH HOOK
01A5        ; PVALID      # LATIN SMALL LETTER P WITH HOOK
01A6..01A7  ; DISALLOWED  # LATIN LETTER YR..LATIN CAPITAL LETTER TONE T
01A8        ; PVALID      # LATIN SMALL LETTER TONE TWO
01A9        ; DISALLOWED  # LATIN CAPITAL LETTER ESH
01AA..01AB  ; PVALID      # LATIN LETTER REVERSED ESH LOOP..LATIN SMALL
01AC        ; DISALLOWED  # LATIN CAPITAL LETTER T WITH HOOK
01AD        ; PVALID      # LATIN SMALL LETTER T WITH HOOK
01AE..01AF  ; DISALLOWED  # LATIN CAPITAL LETTER T WITH RETROFLEX HOOK..
01B0        ; PVALID      # LATIN SMALL LETTER U WITH HORN
01B1..01B3  ; DISALLOWED  # LATIN CAPITAL LETTER UPSILON..LATIN CAPITAL
01B4        ; PVALID      # LATIN SMALL LETTER Y WITH HOOK
01B5        ; DISALLOWED  # LATIN CAPITAL LETTER Z WITH STROKE
01B6        ; PVALID      # LATIN SMALL LETTER Z WITH STROKE
01B7..01B8  ; DISALLOWED  # LATIN CAPITAL LETTER EZH..LATIN CAPITAL LETT
01B9..01BB  ; PVALID      # LATIN SMALL LETTER EZH REVERSED..LATIN LETTE
01BC        ; DISALLOWED  # LATIN CAPITAL LETTER TONE FIVE
01BD..01C3  ; PVALID      # LATIN SMALL LETTER TONE FIVE..LATIN LETTER R
01C4..01CD  ; DISALLOWED  # LATIN CAPITAL LETTER DZ WITH CARON..LATIN CA
01CE        ; PVALID      # LATIN SMALL LETTER A WITH CARON
01CF        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH CARON
01D0        ; PVALID      # LATIN SMALL LETTER I WITH CARON
01D1        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH CARON
01D2        ; PVALID      # LATIN SMALL LETTER O WITH CARON
01D3        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH CARON
01D4        ; PVALID      # LATIN SMALL LETTER U WITH CARON
01D5        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH DIAERESIS AND MA
01D6        ; PVALID      # LATIN SMALL LETTER U WITH DIAERESIS AND MACR
01D7        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH DIAERESIS AND AC
01D8        ; PVALID      # LATIN SMALL LETTER U WITH DIAERESIS AND ACUT
01D9        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH DIAERESIS AND CA
01DA        ; PVALID      # LATIN SMALL LETTER U WITH DIAERESIS AND CARO
01DB        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH DIAERESIS AND GR
01DC..01DD  ; PVALID      # LATIN SMALL LETTER U WITH DIAERESIS AND GRAV
01DE        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH DIAERESIS AND MA
01DF        ; PVALID      # LATIN SMALL LETTER A WITH DIAERESIS AND MACR
01E0        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH DOT ABOVE AND MA
01E1        ; PVALID      # LATIN SMALL LETTER A WITH DOT ABOVE AND MACR
01E2        ; DISALLOWED  # LATIN CAPITAL LETTER AE WITH MACRON
01E3        ; PVALID      # LATIN SMALL LETTER AE WITH MACRON
01E4        ; DISALLOWED  # LATIN CAPITAL LETTER G WITH STROKE
01E5        ; PVALID      # LATIN SMALL LETTER G WITH STROKE
01E6        ; DISALLOWED  # LATIN CAPITAL LETTER G WITH CARON
01E7        ; PVALID      # LATIN SMALL LETTER G WITH CARON
01E8        ; DISALLOWED  # LATIN CAPITAL LETTER K WITH CARON
01E9        ; PVALID      # LATIN SMALL LETTER K WITH CARON
01EA        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH OGONEK
01EB        ; PVALID      # LATIN SMALL LETTER O WITH OGONEK
01EC        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH OGONEK AND MACRO
01ED        ; PVALID      # LATIN SMALL LETTER O WITH OGONEK AND MACRON
01EE        ; DISALLOWED  # LATIN CAPITAL LETTER EZH WITH CARON
01EF..01F0  ; PVALID      # LATIN SMALL LETTER EZH WITH CARON..LATIN SMA
01F1..01F4  ; DISALLOWED  # LATIN CAPITAL LETTER DZ..LATIN CAPITAL LETTE
01F5        ; PVALID      # LATIN SMALL LETTER G WITH ACUTE
01F6..01F8  ; DISALLOWED  # LATIN CAPITAL LETTER HWAIR..LATIN CAPITAL LE
01F9        ; PVALID      # LATIN SMALL LETTER N WITH GRAVE
01FA        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH RING ABOVE AND A
01FB        ; PVALID      # LATIN SMALL LETTER A WITH RING ABOVE AND ACU
01FC        ; DISALLOWED  # LATIN CAPITAL LETTER AE WITH ACUTE
01FD        ; PVALID      # LATIN SMALL LETTER AE WITH ACUTE
01FE        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01FF        ; PVALID      # LATIN SMALL LETTER O WITH STROKE AND ACUTE
0200        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH DOUBLE GRAVE
0201        ; PVALID      # LATIN SMALL LETTER A WITH DOUBLE GRAVE
0202        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH INVERTED BREVE
0203        ; PVALID      # LATIN SMALL LETTER A WITH INVERTED BREVE
0204        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH DOUBLE GRAVE
0205        ; PVALID      # LATIN SMALL LETTER E WITH DOUBLE GRAVE
0206        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH INVERTED BREVE
0207        ; PVALID      # LATIN SMALL LETTER E WITH INVERTED BREVE
0208        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE
0209        ; PVALID      # LATIN SMALL LETTER I WITH DOUBLE GRAVE
020A        ; DISALLOWED  # LATIN CAPITAL LETTER I WITH INVERTED BREVE
020B        ; PVALID      # LATIN SMALL LETTER I WITH INVERTED BREVE
020C        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH DOUBLE GRAVE
020D        ; PVALID      # LATIN SMALL LETTER O WITH DOUBLE GRAVE
020E        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH INVERTED BREVE
020F        ; PVALID      # LATIN SMALL LETTER O WITH INVERTED BREVE
0210        ; DISALLOWED  # LATIN CAPITAL LETTER R WITH DOUBLE GRAVE
0211        ; PVALID      # LATIN SMALL LETTER R WITH DOUBLE GRAVE
0212        ; DISALLOWED  # LATIN CAPITAL LETTER R WITH INVERTED BREVE
0213        ; PVALID      # LATIN SMALL LETTER R WITH INVERTED BREVE
0214        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH DOUBLE GRAVE
0215        ; PVALID      # LATIN SMALL LETTER U WITH DOUBLE GRAVE
0216        ; DISALLOWED  # LATIN CAPITAL LETTER U WITH INVERTED BREVE
0217        ; PVALID      # LATIN SMALL LETTER U WITH INVERTED BREVE
0218        ; DISALLOWED  # LATIN CAPITAL LETTER S WITH COMMA BELOW
0219        ; PVALID      # LATIN SMALL LETTER S WITH COMMA BELOW
021A        ; DISALLOWED  # LATIN CAPITAL LETTER T WITH COMMA BELOW
021B        ; PVALID      # LATIN SMALL LETTER T WITH COMMA BELOW
021C        ; DISALLOWED  # LATIN CAPITAL LETTER YOGH
021D        ; PVALID      # LATIN SMALL LETTER YOGH
021E        ; DISALLOWED  # LATIN CAPITAL LETTER H WITH CARON
021F        ; PVALID      # LATIN SMALL LETTER H WITH CARON
0220        ; DISALLOWED  # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
0221        ; PVALID      # LATIN SMALL LETTER D WITH CURL
0222        ; DISALLOWED  # LATIN CAPITAL LETTER OU
0223        ; PVALID      # LATIN SMALL LETTER OU
0224        ; DISALLOWED  # LATIN CAPITAL LETTER Z WITH HOOK
0225        ; PVALID      # LATIN SMALL LETTER Z WITH HOOK
0226        ; DISALLOWED  # LATIN CAPITAL LETTER A WITH DOT ABOVE
0227        ; PVALID      # LATIN SMALL LETTER A WITH DOT ABOVE
0228        ; DISALLOWED  # LATIN CAPITAL LETTER E WITH CEDILLA
0229        ; PVALID      # LATIN SMALL LETTER E WITH CEDILLA
022A        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH DIAERESIS AND MA
022B        ; PVALID      # LATIN SMALL LETTER O WITH DIAERESIS AND MACR
022C        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH TILDE AND MACRON
022D        ; PVALID      # LATIN SMALL LETTER O WITH TILDE AND MACRON
022E        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH DOT ABOVE
022F        ; PVALID      # LATIN SMALL LETTER O WITH DOT ABOVE
0230        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH DOT ABOVE AND MA
0231        ; PVALID      # LATIN SMALL LETTER O WITH DOT ABOVE AND MACR
0232        ; DISALLOWED  # LATIN CAPITAL LETTER Y WITH MACRON
0233..0239  ; PVALID      # LATIN SMALL LETTER Y WITH MACRON..LATIN SMAL
023A..023B  ; DISALLOWED  # LATIN CAPITAL LETTER A WITH STROKE..LATIN CA
023C        ; PVALID      # LATIN SMALL LETTER C WITH STROKE
023D..023E  ; DISALLOWED  # LATIN CAPITAL LETTER L WITH BAR..LATIN CAPIT
023F..0240  ; PVALID      # LATIN SMALL LETTER S WITH SWASH TAIL..LATIN
0241        ; DISALLOWED  # LATIN CAPITAL LETTER GLOTTAL STOP
0242        ; PVALID      # LATIN SMALL LETTER GLOTTAL STOP
0243..0246  ; DISALLOWED  # LATIN CAPITAL LETTER B WITH STROKE..LATIN CA
0247        ; PVALID      # LATIN SMALL LETTER E WITH STROKE
0248        ; DISALLOWED  # LATIN CAPITAL LETTER J WITH STROKE
0249        ; PVALID      # LATIN SMALL LETTER J WITH STROKE
024A        ; DISALLOWED  # LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL
024B        ; PVALID      # LATIN SMALL LETTER Q WITH HOOK TAIL
024C        ; DISALLOWED  # LATIN CAPITAL LETTER R WITH STROKE
024D        ; PVALID      # LATIN SMALL LETTER R WITH STROKE
024E        ; DISALLOWED  # LATIN CAPITAL LETTER Y WITH STROKE
024F..02AF  ; PVALID      # LATIN SMALL LETTER Y WITH STROKE..LATIN SMAL
02B0..02B8  ; DISALLOWED  # MODIFIER LETTER SMALL H..MODIFIER LETTER SMA
02B9..02C1  ; PVALID      # MODIFIER LETTER PRIME..MODIFIER LETTER REVER
02C2..02C5  ; DISALLOWED  # MODIFIER LETTER LEFT ARROWHEAD..MODIFIER LET
02C6..02D1  ; PVALID      # MODIFIER LETTER CIRCUMFLEX ACCENT..MODIFIER
02D2..02EB  ; DISALLOWED  # MODIFIER LETTER CENTRED RIGHT HALF RING..MOD
02EC        ; PVALID      # MODIFIER LETTER VOICING
02ED        ; DISALLOWED  # MODIFIER LETTER UNASPIRATED
02EE        ; PVALID      # MODIFIER LETTER DOUBLE APOSTROPHE
02EF..02FF  ; DISALLOWED  # MODIFIER LETTER LOW DOWN ARROWHEAD..MODIFIER
0300..033F  ; PVALID      # COMBINING GRAVE ACCENT..COMBINING DOUBLE OVE
0340..0341  ; DISALLOWED  # COMBINING GRAVE TONE MARK..COMBINING ACUTE T
0342        ; PVALID      # COMBINING GREEK PERISPOMENI
0343..0345  ; DISALLOWED  # COMBINING GREEK KORONIS..COMBINING GREEK YPO
0346..034E  ; PVALID      # COMBINING BRIDGE ABOVE..COMBINING UPWARDS AR
034F        ; DISALLOWED  # COMBINING GRAPHEME JOINER
0350..036F  ; PVALID      # COMBINING RIGHT ARROWHEAD ABOVE..COMBINING L
0370        ; DISALLOWED  # GREEK CAPITAL LETTER HETA
0371        ; PVALID      # GREEK SMALL LETTER HETA
0372        ; DISALLOWED  # GREEK CAPITAL LETTER ARCHAIC SAMPI
0373        ; PVALID      # GREEK SMALL LETTER ARCHAIC SAMPI
0374        ; DISALLOWED  # GREEK NUMERAL SIGN
0375        ; CONTEXTO    # GREEK LOWER NUMERAL SIGN
0376        ; DISALLOWED  # GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA
0377        ; PVALID      # GREEK SMALL LETTER PAMPHYLIAN DIGAMMA
0378..0379  ; UNASSIGNED  # <reserved>..<reserved>
037A        ; DISALLOWED  # GREEK YPOGEGRAMMENI
037B..037D  ; PVALID      # GREEK SMALL REVERSED LUNATE SIGMA SYMBOL..GR
037E        ; DISALLOWED  # GREEK QUESTION MARK
037F..0383  ; UNASSIGNED  # <reserved>..<reserved>
0384..038A  ; DISALLOWED  # GREEK TONOS..GREEK CAPITAL LETTER IOTA WITH
038B        ; UNASSIGNED  # <reserved>
038C        ; DISALLOWED  # GREEK CAPITAL LETTER OMICRON WITH TONOS
038D        ; UNASSIGNED  # <reserved>
038E..038F  ; DISALLOWED  # GREEK CAPITAL LETTER UPSILON WITH TONOS..GRE
0390        ; PVALID      # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND T
0391..03A1  ; DISALLOWED  # GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LE
03A2 ; UNASSIGNED # <reserved>
03A3..03AB ; DISALLOWED # GREEK CAPITAL LETTER SIGMA..GREEK CAPITAL LE
03AC..03CE ; PVALID # GREEK SMALL LETTER ALPHA WITH TONOS..GREEK S
03CF..03D6 ; DISALLOWED # GREEK CAPITAL KAI SYMBOL..GREEK PI SYMBOL
03D7 ; PVALID # GREEK KAI SYMBOL
03D8 ; DISALLOWED # GREEK LETTER ARCHAIC KOPPA
03D9 ; PVALID # GREEK SMALL LETTER ARCHAIC KOPPA
03DA ; DISALLOWED # GREEK LETTER STIGMA
03DB ; PVALID # GREEK SMALL LETTER STIGMA
03DC ; DISALLOWED # GREEK LETTER DIGAMMA
03DD ; PVALID # GREEK SMALL LETTER DIGAMMA
03DE ; DISALLOWED # GREEK LETTER KOPPA
03DF ; PVALID # GREEK SMALL LETTER KOPPA
03E0 ; DISALLOWED # GREEK LETTER SAMPI
03E1 ; PVALID # GREEK SMALL LETTER SAMPI
03E2 ; DISALLOWED # COPTIC CAPITAL LETTER SHEI
03E3 ; PVALID # COPTIC SMALL LETTER SHEI
03E4 ; DISALLOWED # COPTIC CAPITAL LETTER FEI
03E5 ; PVALID # COPTIC SMALL LETTER FEI
03E6 ; DISALLOWED # COPTIC CAPITAL LETTER KHEI
03E7 ; PVALID # COPTIC SMALL LETTER KHEI
03E8 ; DISALLOWED # COPTIC CAPITAL LETTER HORI
03E9 ; PVALID # COPTIC SMALL LETTER HORI
03EA ; DISALLOWED # COPTIC CAPITAL LETTER GANGIA
03EB ; PVALID # COPTIC SMALL LETTER GANGIA
03EC ; DISALLOWED # COPTIC CAPITAL LETTER SHIMA
03ED ; PVALID # COPTIC SMALL LETTER SHIMA
03EE ; DISALLOWED # COPTIC CAPITAL LETTER DEI
03EF ; PVALID # COPTIC SMALL LETTER DEI
03F0..03F2 ; DISALLOWED # GREEK KAPPA SYMBOL..GREEK LUNATE SIGMA SYMBO
03F3 ; PVALID # GREEK LETTER YOT
03F4..03F7 ; DISALLOWED # GREEK CAPITAL THETA SYMBOL..GREEK CAPITAL LE
03F8 ; PVALID # GREEK SMALL LETTER SHO
03F9..03FA ; DISALLOWED # GREEK CAPITAL LUNATE SIGMA SYMBOL..GREEK CAP
03FB..03FC ; PVALID # GREEK SMALL LETTER SAN..GREEK RHO WITH STROK
03FD..042F ; DISALLOWED # GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL..
0430..045F ; PVALID # CYRILLIC SMALL LETTER A..CYRILLIC SMALL LETT
0460 ; DISALLOWED # CYRILLIC CAPITAL LETTER OMEGA
0461 ; PVALID # CYRILLIC SMALL LETTER OMEGA
0462 ; DISALLOWED # CYRILLIC CAPITAL LETTER YAT
0463 ; PVALID # CYRILLIC SMALL LETTER YAT
0464 ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED E
0465 ; PVALID # CYRILLIC SMALL LETTER IOTIFIED E
0466 ; DISALLOWED # CYRILLIC CAPITAL LETTER LITTLE YUS
0467 ; PVALID # CYRILLIC SMALL LETTER LITTLE YUS
0468 ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS
0469 ; PVALID # CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS
046A ; DISALLOWED # CYRILLIC CAPITAL LETTER BIG YUS
046B ; PVALID # CYRILLIC SMALL LETTER BIG YUS
046C ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS
046D ; PVALID # CYRILLIC SMALL LETTER IOTIFIED BIG YUS
046E ; DISALLOWED # CYRILLIC CAPITAL LETTER KSI
046F ; PVALID # CYRILLIC SMALL LETTER KSI
0470 ; DISALLOWED # CYRILLIC CAPITAL LETTER PSI
0471 ; PVALID # CYRILLIC SMALL LETTER PSI
0472 ; DISALLOWED # CYRILLIC CAPITAL LETTER FITA
0473 ; PVALID # CYRILLIC SMALL LETTER FITA
0474 ; DISALLOWED # CYRILLIC CAPITAL LETTER IZHITSA
0475 ; PVALID # CYRILLIC SMALL LETTER IZHITSA
0476 ; DISALLOWED # CYRILLIC CAPITAL LETTER IZHITSA WITH DOUBLE
0477 ; PVALID # CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GR
0478 ; DISALLOWED # CYRILLIC CAPITAL LETTER UK
0479 ; PVALID # CYRILLIC SMALL LETTER UK
047A ; DISALLOWED # CYRILLIC CAPITAL LETTER ROUND OMEGA
047B ; PVALID # CYRILLIC SMALL LETTER ROUND OMEGA
047C ; DISALLOWED # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
047D ; PVALID # CYRILLIC SMALL LETTER OMEGA WITH TITLO
047E ; DISALLOWED # CYRILLIC CAPITAL LETTER OT
047F ; PVALID # CYRILLIC SMALL LETTER OT
0480 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOPPA
0481 ; PVALID # CYRILLIC SMALL LETTER KOPPA
0482 ; DISALLOWED # CYRILLIC THOUSANDS SIGN
0483..0487 ; PVALID # COMBINING CYRILLIC TITLO..COMBINING CYRILLIC
0488..048A ; DISALLOWED # COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..C
048B ; PVALID # CYRILLIC SMALL LETTER SHORT I WITH TAIL
048C ; DISALLOWED # CYRILLIC CAPITAL LETTER SEMISOFT SIGN
048D ; PVALID # CYRILLIC SMALL LETTER SEMISOFT SIGN
048E ; DISALLOWED # CYRILLIC CAPITAL LETTER ER WITH TICK
048F ; PVALID # CYRILLIC SMALL LETTER ER WITH TICK
0490 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0491 ; PVALID # CYRILLIC SMALL LETTER GHE WITH UPTURN
0492 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH STROKE
0493 ; PVALID # CYRILLIC SMALL LETTER GHE WITH STROKE
0494 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
0495 ; PVALID # CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
0496 ; DISALLOWED # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
0497 ; PVALID # CYRILLIC SMALL LETTER ZHE WITH DESCENDER
0498 ; DISALLOWED # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER
0499 ; PVALID # CYRILLIC SMALL LETTER ZE WITH DESCENDER
049A ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH DESCENDER
049B ; PVALID # CYRILLIC SMALL LETTER KA WITH DESCENDER
049C ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STR
049D ; PVALID # CYRILLIC SMALL LETTER KA WITH VERTICAL STROK
049E ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH STROKE
049F ; PVALID # CYRILLIC SMALL LETTER KA WITH STROKE
04A0 ; DISALLOWED # CYRILLIC CAPITAL LETTER BASHKIR KA
04A1 ; PVALID # CYRILLIC SMALL LETTER BASHKIR KA
04A2 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH DESCENDER
04A3 ; PVALID # CYRILLIC SMALL LETTER EN WITH DESCENDER
04A4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE EN GHE
04A5 ; PVALID # CYRILLIC SMALL LIGATURE EN GHE
04A6 ; DISALLOWED # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
04A7 ; PVALID # CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
04A8 ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN HA
04A9 ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN HA
04AA ; DISALLOWED # CYRILLIC CAPITAL LETTER ES WITH DESCENDER
04AB ; PVALID # CYRILLIC SMALL LETTER ES WITH DESCENDER
04AC ; DISALLOWED # CYRILLIC CAPITAL LETTER TE WITH DESCENDER
04AD ; PVALID # CYRILLIC SMALL LETTER TE WITH DESCENDER
04AE ; DISALLOWED # CYRILLIC CAPITAL LETTER STRAIGHT U
04AF ; PVALID # CYRILLIC SMALL LETTER STRAIGHT U
04B0 ; DISALLOWED # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STRO
04B1 ; PVALID # CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE
04B2 ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH DESCENDER
04B3 ; PVALID # CYRILLIC SMALL LETTER HA WITH DESCENDER
04B4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE TE TSE
04B5 ; PVALID # CYRILLIC SMALL LIGATURE TE TSE
04B6 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER
04B7 ; PVALID # CYRILLIC SMALL LETTER CHE WITH DESCENDER
04B8 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL ST
04B9 ; PVALID # CYRILLIC SMALL LETTER CHE WITH VERTICAL STRO
04BA ; DISALLOWED # CYRILLIC CAPITAL LETTER SHHA
04BB ; PVALID # CYRILLIC SMALL LETTER SHHA
04BC ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN CHE
04BD ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN CHE
04BE ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH D
04BF ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DES
04C0..04C1 ; DISALLOWED # CYRILLIC LETTER PALOCHKA..CYRILLIC CAPITAL L
04C2 ; PVALID # CYRILLIC SMALL LETTER ZHE WITH BREVE
04C3 ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH HOOK
04C4 ; PVALID # CYRILLIC SMALL LETTER KA WITH HOOK
04C5 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH TAIL
04C6 ; PVALID # CYRILLIC SMALL LETTER EL WITH TAIL
04C7 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH HOOK
04C8 ; PVALID # CYRILLIC SMALL LETTER EN WITH HOOK
04C9 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH TAIL
04CA ; PVALID # CYRILLIC SMALL LETTER EN WITH TAIL
04CB ; DISALLOWED # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
04CC ; PVALID # CYRILLIC SMALL LETTER KHAKASSIAN CHE
04CD ; DISALLOWED # CYRILLIC CAPITAL LETTER EM WITH TAIL
04CE..04CF ; PVALID # CYRILLIC SMALL LETTER EM WITH TAIL..CYRILLIC
04D0 ; DISALLOWED # CYRILLIC CAPITAL LETTER A WITH BREVE
04D1 ; PVALID # CYRILLIC SMALL LETTER A WITH BREVE
04D2 ; DISALLOWED # CYRILLIC CAPITAL LETTER A WITH DIAERESIS
04D3 ; PVALID # CYRILLIC SMALL LETTER A WITH DIAERESIS
04D4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE A IE
04D5 ; PVALID # CYRILLIC SMALL LIGATURE A IE
04D6 ; DISALLOWED # CYRILLIC CAPITAL LETTER IE WITH BREVE
04D7 ; PVALID # CYRILLIC SMALL LETTER IE WITH BREVE
04D8 ; DISALLOWED # CYRILLIC CAPITAL LETTER SCHWA
04D9 ; PVALID # CYRILLIC SMALL LETTER SCHWA
04DA ; DISALLOWED # CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS
04DB ; PVALID # CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS
04DC ; DISALLOWED # CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
04DD ; PVALID # CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
04DE ; DISALLOWED # CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
04DF ; PVALID # CYRILLIC SMALL LETTER ZE WITH DIAERESIS
04E0 ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN DZE
04E1 ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN DZE
04E2 ; DISALLOWED # CYRILLIC CAPITAL LETTER I WITH MACRON
04E3 ; PVALID # CYRILLIC SMALL LETTER I WITH MACRON
04E4 ; DISALLOWED # CYRILLIC CAPITAL LETTER I WITH DIAERESIS
04E5 ; PVALID # CYRILLIC SMALL LETTER I WITH DIAERESIS
04E6 ; DISALLOWED # CYRILLIC CAPITAL LETTER O WITH DIAERESIS
04E7 ; PVALID # CYRILLIC SMALL LETTER O WITH DIAERESIS
04E8 ; DISALLOWED # CYRILLIC CAPITAL LETTER BARRED O
04E9 ; PVALID # CYRILLIC SMALL LETTER BARRED O
04EA ; DISALLOWED # CYRILLIC CAPITAL LETTER BARRED O WITH DIAERE
04EB ; PVALID # CYRILLIC SMALL LETTER BARRED O WITH DIAERESI
04EC ; DISALLOWED # CYRILLIC CAPITAL LETTER E WITH DIAERESIS
04ED ; PVALID # CYRILLIC SMALL LETTER E WITH DIAERESIS
04EE ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH MACRON
04EF ; PVALID # CYRILLIC SMALL LETTER U WITH MACRON
04F0 ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH DIAERESIS
04F1 ; PVALID # CYRILLIC SMALL LETTER U WITH DIAERESIS
04F2 ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
04F3 ; PVALID # CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
04F4 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
04F5 ; PVALID # CYRILLIC SMALL LETTER CHE WITH DIAERESIS
04F6 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH DESCENDER
04F7 ; PVALID # CYRILLIC SMALL LETTER GHE WITH DESCENDER
04F8 ; DISALLOWED # CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
04F9 ; PVALID # CYRILLIC SMALL LETTER YERU WITH DIAERESIS
04FA ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH STROKE AND
04FB ; PVALID # CYRILLIC SMALL LETTER GHE WITH STROKE AND HO
04FC ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH HOOK
04FD ; PVALID # CYRILLIC SMALL LETTER HA WITH HOOK
04FE ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH STROKE
04FF ; PVALID # CYRILLIC SMALL LETTER HA WITH STROKE
0500 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DE
0501 ; PVALID # CYRILLIC SMALL LETTER KOMI DE
0502 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DJE
0503 ; PVALID # CYRILLIC SMALL LETTER KOMI DJE
0504 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI ZJE
0505 ; PVALID # CYRILLIC SMALL LETTER KOMI ZJE
0506 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DZJE
0507 ; PVALID # CYRILLIC SMALL LETTER KOMI DZJE
0508 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI LJE
0509 ; PVALID # CYRILLIC SMALL LETTER KOMI LJE
050A ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI NJE
050B ; PVALID # CYRILLIC SMALL LETTER KOMI NJE
050C ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI SJE
050D ; PVALID # CYRILLIC SMALL LETTER KOMI SJE
050E ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI TJE
050F ; PVALID # CYRILLIC SMALL LETTER KOMI TJE
0510 ; DISALLOWED # CYRILLIC CAPITAL LETTER REVERSED ZE
0511 ; PVALID # CYRILLIC SMALL LETTER REVERSED ZE
0512 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH HOOK
0513 ; PVALID # CYRILLIC SMALL LETTER EL WITH HOOK
0514 ; DISALLOWED # CYRILLIC CAPITAL LETTER LHA
0515 ; PVALID # CYRILLIC SMALL LETTER LHA
0516 ; DISALLOWED # CYRILLIC CAPITAL LETTER RHA
0517 ; PVALID # CYRILLIC SMALL LETTER RHA
0518 ; DISALLOWED # CYRILLIC CAPITAL LETTER YAE
0519 ; PVALID # CYRILLIC SMALL LETTER YAE
051A ; DISALLOWED # CYRILLIC CAPITAL LETTER QA
051B ; PVALID # CYRILLIC SMALL LETTER QA
051C ; DISALLOWED # CYRILLIC CAPITAL LETTER WE
051D ; PVALID # CYRILLIC SMALL LETTER WE
051E ; DISALLOWED # CYRILLIC CAPITAL LETTER ALEUT KA
051F ; PVALID # CYRILLIC SMALL LETTER ALEUT KA
0520 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH MIDDLE HOOK
0521 ; PVALID # CYRILLIC SMALL LETTER EL WITH MIDDLE HOOK
0522 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK
0523 ; PVALID # CYRILLIC SMALL LETTER EN WITH MIDDLE HOOK
0524 ; DISALLOWED # CYRILLIC CAPITAL LETTER PE WITH DESCENDER
0525 ; PVALID # CYRILLIC SMALL LETTER PE WITH DESCENDER
0526..0530 ; UNASSIGNED # <reserved>..<reserved>
0531..0556 ; DISALLOWED # ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITA
0557..0558 ; UNASSIGNED # <reserved>..<reserved>
0559 ; PVALID # ARMENIAN MODIFIER LETTER LEFT HALF RING
055A..055F ; DISALLOWED # ARMENIAN APOSTROPHE..ARMENIAN ABBREVIATION M
0560 ; UNASSIGNED # <reserved>
0561..0586 ; PVALID # ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LE
0587 ; DISALLOWED # ARMENIAN SMALL LIGATURE ECH YIWN
0588 ; UNASSIGNED # <reserved>
0589..058A ; DISALLOWED # ARMENIAN FULL STOP..ARMENIAN HYPHEN
058B..0590 ; UNASSIGNED # <reserved>..<reserved>
0591..05BD ; PVALID # HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
05BE ; DISALLOWED # HEBREW PUNCTUATION MAQAF
05BF ; PVALID # HEBREW POINT RAFE
05C0 ; DISALLOWED # HEBREW PUNCTUATION PASEQ
05C1..05C2 ; PVALID # HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
05C3 ; DISALLOWED # HEBREW PUNCTUATION SOF PASUQ
05C4..05C5 ; PVALID # HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT
05C6 ; DISALLOWED # HEBREW PUNCTUATION NUN HAFUKHA
05C7 ; PVALID # HEBREW POINT QAMATS QATAN
05C8..05CF ; UNASSIGNED # <reserved>..<reserved>
05D0..05EA ; PVALID # HEBREW LETTER ALEF..HEBREW LETTER TAV
05EB..05EF ; UNASSIGNED # <reserved>..<reserved>
05F0..05F2 ; PVALID # HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW L
05F3..05F4 ; CONTEXTO # HEBREW PUNCTUATION GERESH..HEBREW PUNCTUATIO
05F5..05FF ; UNASSIGNED # <reserved>..<reserved>
0600..0603 ; DISALLOWED # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
0604..0605 ; UNASSIGNED # <reserved>..<reserved>
0606..060F ; DISALLOWED # ARABIC-INDIC CUBE ROOT..ARABIC SIGN MISRA
0610..061A ; PVALID # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..AR
061B ; DISALLOWED # ARABIC SEMICOLON
061C..061D ; UNASSIGNED # <reserved>..<reserved>
061E..061F ; DISALLOWED # ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC Q
0620 ; UNASSIGNED # <reserved>
0621..063F ; PVALID # ARABIC LETTER HAMZA..ARABIC LETTER FARSI YEH
0640 ; DISALLOWED # ARABIC TATWEEL
0641..065E ; PVALID # ARABIC LETTER FEH..ARABIC FATHA WITH TWO DOT
065F ; UNASSIGNED # <reserved>
0660..0669 ; CONTEXTO # ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT
066A..066D ; DISALLOWED # ARABIC PERCENT SIGN..ARABIC FIVE POINTED STA
066E..0674 ; PVALID # ARABIC LETTER DOTLESS BEH..ARABIC LETTER HIG
0675..0678 ; DISALLOWED # ARABIC LETTER HIGH HAMZA ALEF..ARABIC LETTER
0679..06D3 ; PVALID # ARABIC LETTER TTEH..ARABIC LETTER YEH BARREE
06D4 ; DISALLOWED # ARABIC FULL STOP
06D5..06DC ; PVALID # ARABIC LETTER AE..ARABIC SMALL HIGH SEEN
06DD..06DE ; DISALLOWED # ARABIC END OF AYAH..ARABIC START OF RUB EL H
06DF..06E8 ; PVALID # ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL
06E9 ; DISALLOWED # ARABIC PLACE OF SAJDAH
06EA..06EF ; PVALID # ARABIC EMPTY CENTRE LOW STOP..ARABIC LETTER
06F0..06F9 ; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED A
06FA..06FF ; PVALID # ARABIC LETTER SHEEN WITH DOT BELOW..ARABIC L
0700..070D ; DISALLOWED # SYRIAC END OF PARAGRAPH..SYRIAC HARKLEAN AST
070E ; UNASSIGNED # <reserved>
070F ; DISALLOWED # SYRIAC ABBREVIATION MARK
0710..074A ; PVALID # SYRIAC LETTER ALAPH..SYRIAC BARREKH
074B..074C ; UNASSIGNED # <reserved>..<reserved>
074D..07B1 ; PVALID # SYRIAC LETTER SOGDIAN ZHAIN..THAANA LETTER N
07B2..07BF ; UNASSIGNED # <reserved>..<reserved>
07C0..07F5 ; PVALID # NKO DIGIT ZERO..NKO LOW TONE APOSTROPHE
07F6..07FA ; DISALLOWED # NKO SYMBOL OO DENNEN..NKO LAJANYALAN
07FB..07FF ; UNASSIGNED # <reserved>..<reserved>
0800..082D ; PVALID # SAMARITAN LETTER ALAF..SAMARITAN MARK NEQUDA
082E..082F ; UNASSIGNED # <reserved>..<reserved>
0830..083E ; DISALLOWED # SAMARITAN PUNCTUATION NEQUDAA..SAMARITAN PUN
083F..08FF ; UNASSIGNED # <reserved>..<reserved>
0900..0939 ; PVALID # DEVANAGARI SIGN INVERTED CANDRABINDU..DEVANA
093A..093B ; UNASSIGNED # <reserved>..<reserved>
093C..094E ; PVALID # DEVANAGARI SIGN NUKTA..DEVANAGARI VOWEL SIGN
094F ; UNASSIGNED # <reserved>
0950..0955 ; PVALID # DEVANAGARI OM..DEVANAGARI VOWEL SIGN CANDRA
0956..0957 ; UNASSIGNED # <reserved>..<reserved>
0958..095F ; DISALLOWED # DEVANAGARI LETTER QA..DEVANAGARI LETTER YYA
0960..0963 ; PVALID # DEVANAGARI LETTER VOCALIC RR..DEVANAGARI VOW
0964..0965 ; DISALLOWED # DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0966..096F ; PVALID # DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE
0970 ; DISALLOWED # DEVANAGARI ABBREVIATION SIGN
0971..0972 ; PVALID # DEVANAGARI SIGN HIGH SPACING DOT..DEVANAGARI
0973..0978 ; UNASSIGNED # <reserved>..<reserved>
0979..097F ; PVALID # DEVANAGARI LETTER ZHA..DEVANAGARI LETTER BBA
0980 ; UNASSIGNED # <reserved>
0981..0983 ; PVALID # BENGALI SIGN CANDRABINDU..BENGALI SIGN VISAR
0984 ; UNASSIGNED # <reserved>
0985..098C ; PVALID # BENGALI LETTER A..BENGALI LETTER VOCALIC L
098D..098E ; UNASSIGNED # <reserved>..<reserved>
098F..0990 ; PVALID # BENGALI LETTER E..BENGALI LETTER AI
0991..0992 ; UNASSIGNED # <reserved>..<reserved>
0993..09A8 ; PVALID # BENGALI LETTER O..BENGALI LETTER NA
09A9 ; UNASSIGNED # <reserved>
09AA..09B0 ; PVALID # BENGALI LETTER PA..BENGALI LETTER RA
09B1 ; UNASSIGNED # <reserved>
09B2 ; PVALID # BENGALI LETTER LA
09B3..09B5 ; UNASSIGNED # <reserved>..<reserved>
09B6..09B9 ; PVALID # BENGALI LETTER SHA..BENGALI LETTER HA
09BA..09BB ; UNASSIGNED # <reserved>..<reserved>
09BC..09C4 ; PVALID # BENGALI SIGN NUKTA..BENGALI VOWEL SIGN VOCAL
09C5..09C6 ; UNASSIGNED # <reserved>..<reserved>
09C7..09C8 ; PVALID # BENGALI VOWEL SIGN E..BENGALI VOWEL SIGN AI
09C9..09CA ; UNASSIGNED # <reserved>..<reserved>
09CB..09CE ; PVALID # BENGALI VOWEL SIGN O..BENGALI LETTER KHANDA
09CF..09D6 ; UNASSIGNED # <reserved>..<reserved>
09D7 ; PVALID # BENGALI AU LENGTH MARK
09D8..09DB ; UNASSIGNED # <reserved>..<reserved>
09DC..09DD ; DISALLOWED # BENGALI LETTER RRA..BENGALI LETTER RHA
09DE ; UNASSIGNED # <reserved>
09DF ; DISALLOWED # BENGALI LETTER YYA
09E0..09E3 ; PVALID # BENGALI LETTER VOCALIC RR..BENGALI VOWEL SIG
09E4..09E5 ; UNASSIGNED # <reserved>..<reserved>
09E6..09F1 ; PVALID # BENGALI DIGIT ZERO..BENGALI LETTER RA WITH L
09F2..09FB ; DISALLOWED # BENGALI RUPEE MARK..BENGALI GANDA MARK
09FC..0A00 ; UNASSIGNED # <reserved>..<reserved>
0A01..0A03 ; PVALID # GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN VISA
0A04 ; UNASSIGNED # <reserved>
0A05..0A0A ; PVALID # GURMUKHI LETTER A..GURMUKHI LETTER UU
0A0B..0A0E ; UNASSIGNED # <reserved>..<reserved>
0A0F..0A10 ; PVALID # GURMUKHI LETTER EE..GURMUKHI LETTER AI
0A11..0A12 ; UNASSIGNED # <reserved>..<reserved>
0A13..0A28 ; PVALID # GURMUKHI LETTER OO..GURMUKHI LETTER NA
0A29 ; UNASSIGNED # <reserved>
0A2A..0A30 ; PVALID # GURMUKHI LETTER PA..GURMUKHI LETTER RA
0A31 ; UNASSIGNED # <reserved>
0A32 ; PVALID # GURMUKHI LETTER LA
0A33 ; DISALLOWED # GURMUKHI LETTER LLA
0A34 ; UNASSIGNED # <reserved>
0A35 ; PVALID # GURMUKHI LETTER VA
0A36 ; DISALLOWED # GURMUKHI LETTER SHA
0A37 ; UNASSIGNED # <reserved>
0A38..0A39 ; PVALID # GURMUKHI LETTER SA..GURMUKHI LETTER HA
0A3A..0A3B ; UNASSIGNED # <reserved>..<reserved>
0A3C ; PVALID # GURMUKHI SIGN NUKTA
0A3D ; UNASSIGNED # <reserved>
0A3E..0A42 ; PVALID # GURMUKHI VOWEL SIGN AA..GURMUKHI VOWEL SIGN
0A43..0A46 ; UNASSIGNED # <reserved>..<reserved>
0A47..0A48 ; PVALID # GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN
0A49..0A4A ; UNASSIGNED # <reserved>..<reserved>
0A4B..0A4D ; PVALID # GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA
0A4E..0A50 ; UNASSIGNED # <reserved>..<reserved>
0A51 ; PVALID # GURMUKHI SIGN UDAAT
0A52..0A58 ; UNASSIGNED # <reserved>..<reserved>
0A59..0A5B ; DISALLOWED # GURMUKHI LETTER KHHA..GURMUKHI LETTER ZA
0A5C ; PVALID # GURMUKHI LETTER RRA
0A5D ; UNASSIGNED # <reserved>
0A5E ; DISALLOWED # GURMUKHI LETTER FA
0A5F..0A65 ; UNASSIGNED # <reserved>..<reserved>
0A66..0A75 ; PVALID # GURMUKHI DIGIT ZERO..GURMUKHI SIGN YAKASH
0A76..0A80 ; UNASSIGNED # <reserved>..<reserved>
0A81..0A83 ; PVALID # GUJARATI SIGN CANDRABINDU..GUJARATI SIGN VIS
0A84 ; UNASSIGNED # <reserved>
0A85..0A8D ; PVALID # GUJARATI LETTER A..GUJARATI VOWEL CANDRA E
0A8E ; UNASSIGNED # <reserved>
0A8F..0A91 ; PVALID # GUJARATI LETTER E..GUJARATI VOWEL CANDRA O
0A92 ; UNASSIGNED # <reserved>
0A93..0AA8 ; PVALID # GUJARATI LETTER O..GUJARATI LETTER NA
0AA9 ; UNASSIGNED # <reserved>
0AAA..0AB0 ; PVALID # GUJARATI LETTER PA..GUJARATI LETTER RA
0AB1 ; UNASSIGNED # <reserved>
0AB2..0AB3 ; PVALID # GUJARATI LETTER LA..GUJARATI LETTER LLA
0AB4 ; UNASSIGNED # <reserved>
0AB5..0AB9 ; PVALID # GUJARATI LETTER VA..GUJARATI LETTER HA
0ABA..0ABB ; UNASSIGNED # <reserved>..<reserved>
0ABC..0AC5 ; PVALID # GUJARATI SIGN NUKTA..GUJARATI VOWEL SIGN CAN
0AC6 ; UNASSIGNED # <reserved>
0AC7..0AC9 ; PVALID # GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN C
0ACA ; UNASSIGNED # <reserved>
0ACB..0ACD ; PVALID # GUJARATI VOWEL SIGN O..GUJARATI SIGN VIRAMA
0ACE..0ACF ; UNASSIGNED # <reserved>..<reserved>
0AD0 ; PVALID # GUJARATI OM
0AD1..0ADF ; UNASSIGNED # <reserved>..<reserved>
0AE0..0AE3 ; PVALID # GUJARATI LETTER VOCALIC RR..GUJARATI VOWEL S
0AE4..0AE5 ; UNASSIGNED # <reserved>..<reserved>
0AE6..0AEF ; PVALID # GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE
0AF0 ; UNASSIGNED # <reserved>
0AF1 ; DISALLOWED # GUJARATI RUPEE SIGN
0AF2..0B00 ; UNASSIGNED # <reserved>..<reserved>
0B01..0B03 ; PVALID # ORIYA SIGN CANDRABINDU..ORIYA SIGN VISARGA
0B04 ; UNASSIGNED # <reserved>
0B05..0B0C ; PVALID # ORIYA LETTER A..ORIYA LETTER VOCALIC L
0B0D..0B0E ; UNASSIGNED # <reserved>..<reserved>
0B0F..0B10 ; PVALID # ORIYA LETTER E..ORIYA LETTER AI
0B11..0B12 ; UNASSIGNED # <reserved>..<reserved>
0B13..0B28 ; PVALID # ORIYA LETTER O..ORIYA LETTER NA
0B29 ; UNASSIGNED # <reserved>
0B2A..0B30 ; PVALID # ORIYA LETTER PA..ORIYA LETTER RA
0B31 ; UNASSIGNED # <reserved>
0B32..0B33 ; PVALID # ORIYA LETTER LA..ORIYA LETTER LLA
0B34 ; UNASSIGNED # <reserved>
0B35..0B39 ; PVALID # ORIYA LETTER VA..ORIYA LETTER HA
0B3A..0B3B ; UNASSIGNED # <reserved>..<reserved>
0B3C..0B44 ; PVALID # ORIYA SIGN NUKTA..ORIYA VOWEL SIGN VOCALIC R
0B45..0B46 ; UNASSIGNED # <reserved>..<reserved>
0B47..0B48 ; PVALID # ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
0B49..0B4A ; UNASSIGNED # <reserved>..<reserved>
0B4B..0B4D ; PVALID # ORIYA VOWEL SIGN O..ORIYA SIGN VIRAMA
0B4E..0B55 ; UNASSIGNED # <reserved>..<reserved>
0B56..0B57 ; PVALID # ORIYA AI LENGTH MARK..ORIYA AU LENGTH MARK
0B58..0B5B ; UNASSIGNED # <reserved>..<reserved>
0B5C..0B5D ; DISALLOWED # ORIYA LETTER RRA..ORIYA LETTER RHA
0B5E ; UNASSIGNED # <reserved>
0B5F..0B63 ; PVALID # ORIYA LETTER YYA..ORIYA VOWEL SIGN VOCALIC L
0B64..0B65 ; UNASSIGNED # <reserved>..<reserved>
0B66..0B6F ; PVALID # ORIYA DIGIT ZERO..ORIYA DIGIT NINE
0B70 ; DISALLOWED # ORIYA ISSHAR
0B71 ; PVALID # ORIYA LETTER WA
0B72..0B81 ; UNASSIGNED # <reserved>..<reserved>
0B82..0B83 ; PVALID # TAMIL SIGN ANUSVARA..TAMIL SIGN VISARGA
0B84 ; UNASSIGNED # <reserved>
0B85..0B8A ; PVALID # TAMIL LETTER A..TAMIL LETTER UU
0B8B..0B8D ; UNASSIGNED # <reserved>..<reserved>
0B8E..0B90 ; PVALID # TAMIL LETTER E..TAMIL LETTER AI
0B91 ; UNASSIGNED # <reserved>
0B92..0B95 ; PVALID # TAMIL LETTER O..TAMIL LETTER KA
0B96..0B98 ; UNASSIGNED # <reserved>..<reserved>
0B99..0B9A ; PVALID # TAMIL LETTER NGA..TAMIL LETTER CA
0B9B ; UNASSIGNED # <reserved>
0B9C ; PVALID # TAMIL LETTER JA
0B9D ; UNASSIGNED # <reserved>
0B9E..0B9F ; PVALID # TAMIL LETTER NYA..TAMIL LETTER TTA
0BA0..0BA2 ; UNASSIGNED # <reserved>..<reserved>
0BA3..0BA4 ; PVALID # TAMIL LETTER NNA..TAMIL LETTER TA
0BA5..0BA7 ; UNASSIGNED # <reserved>..<reserved>
0BA8..0BAA ; PVALID # TAMIL LETTER NA..TAMIL LETTER PA
0BAB..0BAD ; UNASSIGNED # <reserved>..<reserved>
0BAE..0BB9 ; PVALID # TAMIL LETTER MA..TAMIL LETTER HA
0BBA..0BBD ; UNASSIGNED # <reserved>..<reserved>
0BBE..0BC2 ; PVALID # TAMIL VOWEL SIGN AA..TAMIL VOWEL SIGN UU
0BC3..0BC5 ; UNASSIGNED # <reserved>..<reserved>
0BC6..0BC8 ; PVALID # TAMIL VOWEL SIGN E..TAMIL VOWEL SIGN AI
0BC9 ; UNASSIGNED # <reserved>
0BCA..0BCD ; PVALID # TAMIL VOWEL SIGN O..TAMIL SIGN VIRAMA
0BCE..0BCF ; UNASSIGNED # <reserved>..<reserved>
0BD0 ; PVALID # TAMIL OM
0BD1..0BD6 ; UNASSIGNED # <reserved>..<reserved>
0BD7 ; PVALID # TAMIL AU LENGTH MARK
0BD8..0BE5 ; UNASSIGNED # <reserved>..<reserved>
0BE6..0BEF ; PVALID # TAMIL DIGIT ZERO..TAMIL DIGIT NINE
0BF0..0BFA ; DISALLOWED # TAMIL NUMBER TEN..TAMIL NUMBER SIGN
0BFB..0C00 ; UNASSIGNED # <reserved>..<reserved>
0C01..0C03 ; PVALID # TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA
0C04 ; UNASSIGNED # <reserved>
0C05..0C0C ; PVALID # TELUGU LETTER A..TELUGU LETTER VOCALIC L
0C0D ; UNASSIGNED # <reserved>
0C0E..0C10 ; PVALID # TELUGU LETTER E..TELUGU LETTER AI
0C11 ; UNASSIGNED # <reserved>
0C12..0C28 ; PVALID # TELUGU LETTER O..TELUGU LETTER NA
0C29 ; UNASSIGNED # <reserved>
0C2A..0C33 ; PVALID # TELUGU LETTER PA..TELUGU LETTER LLA
0C34 ; UNASSIGNED # <reserved>
0C35..0C39 ; PVALID # TELUGU LETTER VA..TELUGU LETTER HA
0C3A..0C3C ; UNASSIGNED # <reserved>..<reserved>
0C3D..0C44 ; PVALID # TELUGU SIGN AVAGRAHA..TELUGU VOWEL SIGN VOCA
0C45 ; UNASSIGNED # <reserved>
0C46..0C48 ; PVALID # TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
0C49 ; UNASSIGNED # <reserved>
0C4A..0C4D ; PVALID # TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
0C4E..0C54 ; UNASSIGNED # <reserved>..<reserved>
0C55..0C56 ; PVALID # TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
0C57 ; UNASSIGNED # <reserved>
0C58..0C59 ; PVALID # TELUGU LETTER TSA..TELUGU LETTER DZA
0C5A..0C5F ; UNASSIGNED # <reserved>..<reserved>
0C60..0C63 ; PVALID # TELUGU LETTER VOCALIC RR..TELUGU VOWEL SIGN
0C64..0C65 ; UNASSIGNED # <reserved>..<reserved>
0C66..0C6F ; PVALID # TELUGU DIGIT ZERO..TELUGU DIGIT NINE
0C70..0C77 ; UNASSIGNED # <reserved>..<reserved>
0C78..0C7F ; DISALLOWED # TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF
0C80..0C81 ; UNASSIGNED # <reserved>..<reserved>
0C82..0C83 ; PVALID # KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
0C84 ; UNASSIGNED # <reserved>
0C85..0C8C ; PVALID # KANNADA LETTER A..KANNADA LETTER VOCALIC L
0C8D ; UNASSIGNED # <reserved>
0C8E..0C90 ; PVALID # KANNADA LETTER E..KANNADA LETTER AI
0C91 ; UNASSIGNED # <reserved>
0C92..0CA8 ; PVALID # KANNADA LETTER O..KANNADA LETTER NA
0CA9 ; UNASSIGNED # <reserved>
0CAA..0CB3 ; PVALID # KANNADA LETTER PA..KANNADA LETTER LLA
0CB4 ; UNASSIGNED # <reserved>
0CB5..0CB9 ; PVALID # KANNADA LETTER VA..KANNADA LETTER HA
0CBA..0CBB ; UNASSIGNED # <reserved>..<reserved>
0CBC..0CC4 ; PVALID # KANNADA SIGN NUKTA..KANNADA VOWEL SIGN VOCAL
0CC5 ; UNASSIGNED # <reserved>
0CC6..0CC8 ; PVALID # KANNADA VOWEL SIGN E..KANNADA VOWEL SIGN AI
0CC9 ; UNASSIGNED # <reserved>
0CCA..0CCD ; PVALID # KANNADA VOWEL SIGN O..KANNADA SIGN VIRAMA
0CCE..0CD4 ; UNASSIGNED # <reserved>..<reserved>
0CD5..0CD6 ; PVALID # KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0CD7..0CDD ; UNASSIGNED # <reserved>..<reserved>
0CDE ; PVALID # KANNADA LETTER FA
0CDF ; UNASSIGNED # <reserved>
0CE0..0CE3 ; PVALID # KANNADA LETTER VOCALIC RR..KANNADA VOWEL SIG
0CE4..0CE5 ; UNASSIGNED # <reserved>..<reserved>
0CE6..0CEF ; PVALID # KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0CF0 ; UNASSIGNED # <reserved>
0CF1..0CF2 ; DISALLOWED # KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADH
0CF3..0D01 ; UNASSIGNED # <reserved>..<reserved>
0D02..0D03 ; PVALID # MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISA
0D04 ; UNASSIGNED # <reserved>
0D05..0D0C ; PVALID # MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC
0D0D ; UNASSIGNED # <reserved>
0D0E..0D10 ; PVALID # MALAYALAM LETTER E..MALAYALAM LETTER AI
0D11 ; UNASSIGNED # <reserved>
0D12..0D28 ; PVALID # MALAYALAM LETTER O..MALAYALAM LETTER NA
0D29 ; UNASSIGNED # <reserved>
0D2A..0D39 ; PVALID # MALAYALAM LETTER PA..MALAYALAM LETTER HA
0D3A..0D3C ; UNASSIGNED # <reserved>..<reserved>
0D3D..0D44 ; PVALID # MALAYALAM SIGN AVAGRAHA..MALAYALAM VOWEL SIG
0D45 ; UNASSIGNED # <reserved>
0D46..0D48 ; PVALID # MALAYALAM VOWEL SIGN E..MALAYALAM VOWEL SIGN
0D49 ; UNASSIGNED # <reserved>
0D4A..0D4D ; PVALID # MALAYALAM VOWEL SIGN O..MALAYALAM SIGN VIRAM
0D4E..0D56 ; UNASSIGNED # <reserved>..<reserved>
0D57 ; PVALID # MALAYALAM AU LENGTH MARK
0D58..0D5F ; UNASSIGNED # <reserved>..<reserved>
0D60..0D63 ; PVALID # MALAYALAM LETTER VOCALIC RR..MALAYALAM VOWEL
0D64..0D65 ; UNASSIGNED # <reserved>..<reserved>
0D66..0D6F ; PVALID # MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0D70..0D75 ; DISALLOWED # MALAYALAM NUMBER TEN..MALAYALAM FRACTION THR
0D76..0D78 ; UNASSIGNED # <reserved>..<reserved>
0D79 ; DISALLOWED # MALAYALAM DATE MARK
0D7A..0D7F ; PVALID # MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER
0D80..0D81 ; UNASSIGNED # <reserved>..<reserved>
0D82..0D83 ; PVALID # SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARG
0D84 ; UNASSIGNED # <reserved>
0D85..0D96 ; PVALID # SINHALA LETTER AYANNA..SINHALA LETTER AUYANN
0D97..0D99 ; UNASSIGNED # <reserved>..<reserved>
0D9A..0DB1 ; PVALID # SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA L
0DB2 ; UNASSIGNED # <reserved>
0DB3..0DBB ; PVALID # SINHALA LETTER SANYAKA DAYANNA..SINHALA LETT
0DBC ; UNASSIGNED # <reserved>
0DBD ; PVALID # SINHALA LETTER DANTAJA LAYANNA
0DBE..0DBF ; UNASSIGNED # <reserved>..<reserved>
0DC0..0DC6 ; PVALID # SINHALA LETTER VAYANNA..SINHALA LETTER FAYAN
0DC7..0DC9 ; UNASSIGNED # <reserved>..<reserved>
0DCA ; PVALID # SINHALA SIGN AL-LAKUNA
0DCB..0DCE ; UNASSIGNED # <reserved>..<reserved>
0DCF..0DD4 ; PVALID # SINHALA VOWEL SIGN AELA-PILLA..SINHALA VOWEL
0DD5 ; UNASSIGNED # <reserved>
0DD6 ; PVALID # SINHALA VOWEL SIGN DIGA PAA-PILLA
0DD7 ; UNASSIGNED # <reserved>
0DD8..0DDF ; PVALID # SINHALA VOWEL SIGN GAETTA-PILLA..SINHALA VOW
0DE0..0DF1 ; UNASSIGNED # <reserved>..<reserved>
0DF2..0DF3 ; PVALID # SINHALA VOWEL SIGN DIGA GAETTA-PILLA..SINHAL
0DF4 ; DISALLOWED # SINHALA PUNCTUATION KUNDDALIYA
0DF5..0E00 ; UNASSIGNED # <reserved>..<reserved>
0E01..0E32 ; PVALID # THAI CHARACTER KO KAI..THAI CHARACTER SARA A
0E33 ; DISALLOWED # THAI CHARACTER SARA AM
0E34..0E3A ; PVALID # THAI CHARACTER SARA I..THAI CHARACTER PHINTH
0E3B..0E3E ; UNASSIGNED # <reserved>..<reserved>
0E3F ; DISALLOWED # THAI CURRENCY SYMBOL BAHT
0E40..0E4E ; PVALID # THAI CHARACTER SARA E..THAI CHARACTER YAMAKK
0E4F ; DISALLOWED # THAI CHARACTER FONGMAN
0E50..0E59 ; PVALID # THAI DIGIT ZERO..THAI DIGIT NINE
0E5A..0E5B ; DISALLOWED # THAI CHARACTER ANGKHANKHU..THAI CHARACTER KH
0E5C..0E80 ; UNASSIGNED # <reserved>..<reserved>
0E81..0E82 ; PVALID # LAO LETTER KO..LAO LETTER KHO SUNG
0E83 ; UNASSIGNED # <reserved>
0E84 ; PVALID # LAO LETTER KHO TAM
0E85..0E86 ; UNASSIGNED # <reserved>..<reserved>
0E87..0E88 ; PVALID # LAO LETTER NGO..LAO LETTER CO
0E89 ; UNASSIGNED # <reserved>
0E8A ; PVALID # LAO LETTER SO TAM
0E8B..0E8C ; UNASSIGNED # <reserved>..<reserved>
0E8D ; PVALID # LAO LETTER NYO
0E8E..0E93 ; UNASSIGNED # <reserved>..<reserved>
0E94..0E97 ; PVALID # LAO LETTER DO..LAO LETTER THO TAM
0E98 ; UNASSIGNED # <reserved>
0E99..0E9F ; PVALID # LAO LETTER NO..LAO LETTER FO SUNG
0EA0 ; UNASSIGNED # <reserved>
0EA1..0EA3 ; PVALID # LAO LETTER MO..LAO LETTER LO LING
0EA4 ; UNASSIGNED # <reserved>
0EA5 ; PVALID # LAO LETTER LO LOOT
0EA6 ; UNASSIGNED # <reserved>
0EA7 ; PVALID # LAO LETTER WO
0EA8..0EA9 ; UNASSIGNED # <reserved>..<reserved>
0EAA..0EAB ; PVALID # LAO LETTER SO SUNG..LAO LETTER HO SUNG
0EAC ; UNASSIGNED # <reserved>
0EAD..0EB2 ; PVALID # LAO LETTER O..LAO VOWEL SIGN AA
0EB3 ; DISALLOWED # LAO VOWEL SIGN AM
0EB4..0EB9 ; PVALID # LAO VOWEL SIGN I..LAO VOWEL SIGN UU
0EBA ; UNASSIGNED # <reserved>
0EBB..0EBD ; PVALID # LAO VOWEL SIGN MAI KON..LAO SEMIVOWEL SIGN N
0EBE..0EBF ; UNASSIGNED # <reserved>..<reserved>
0EC0..0EC4 ; PVALID # LAO VOWEL SIGN E..LAO VOWEL SIGN AI
0EC5 ; UNASSIGNED # <reserved>
0EC6 ; PVALID # LAO KO LA
0EC7 ; UNASSIGNED # <reserved>
0EC8..0ECD ; PVALID # LAO TONE MAI EK..LAO NIGGAHITA
0ECE..0ECF ; UNASSIGNED # <reserved>..<reserved>
0ED0..0ED9 ; PVALID # LAO DIGIT ZERO..LAO DIGIT NINE
0EDA..0EDB ; UNASSIGNED # <reserved>..<reserved>
0EDC..0EDD ; DISALLOWED # LAO HO NO..LAO HO MO
0EDE..0EFF ; UNASSIGNED # <reserved>..<reserved>
0F00 ; PVALID # TIBETAN SYLLABLE OM
0F01..0F0A ; DISALLOWED # TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBET
0F0B ; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG
0F0C..0F17 ; DISALLOWED # TIBETAN MARK DELIMITER TSHEG BSTAR..TIBETAN
0F18..0F19 ; PVALID # TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN
0F1A..0F1F ; DISALLOWED # TIBETAN SIGN RDEL DKAR GCIG..TIBETAN SIGN RD
0F20..0F29 ; PVALID # TIBETAN DIGIT ZERO..TIBETAN DIGIT NINE
0F2A..0F34 ; DISALLOWED # TIBETAN DIGIT HALF ONE..TIBETAN MARK BSDUS R
0F35 ; PVALID # TIBETAN MARK NGAS BZUNG NYI ZLA
0F36 ; DISALLOWED # TIBETAN MARK CARET -DZUD RTAGS BZHI MIG CAN
0F37 ; PVALID # TIBETAN MARK NGAS BZUNG SGOR RTAGS
0F38 ; DISALLOWED # TIBETAN MARK CHE MGO
0F39 ; PVALID # TIBETAN MARK TSA -PHRU
0F3A..0F3D ; DISALLOWED # TIBETAN MARK GUG RTAGS GYON..TIBETAN MARK AN
0F3E..0F42 ; PVALID # TIBETAN SIGN YAR TSHES..TIBETAN LETTER GA
0F43 ; DISALLOWED # TIBETAN LETTER GHA
0F44..0F47 ; PVALID # TIBETAN LETTER NGA..TIBETAN LETTER JA
0F48 ; UNASSIGNED # <reserved>
0F49..0F4C ; PVALID # TIBETAN LETTER NYA..TIBETAN LETTER DDA
0F4D ; DISALLOWED # TIBETAN LETTER DDHA
0F4E..0F51 ; PVALID # TIBETAN LETTER NNA..TIBETAN LETTER DA
0F52 ; DISALLOWED # TIBETAN LETTER DHA
0F53..0F56 ; PVALID # TIBETAN LETTER NA..TIBETAN LETTER BA
0F57 ; DISALLOWED # TIBETAN LETTER BHA
0F58..0F5B ; PVALID # TIBETAN LETTER MA..TIBETAN LETTER DZA
0F5C ; DISALLOWED # TIBETAN LETTER DZHA
0F5D..0F68 ; PVALID # TIBETAN LETTER WA..TIBETAN LETTER A
0F69 ; DISALLOWED # TIBETAN LETTER KSSA
0F6A..0F6C ; PVALID # TIBETAN LETTER FIXED-FORM RA..TIBETAN LETTER
0F6D..0F70 ; UNASSIGNED # <reserved>..<reserved>
0F71..0F72 ; PVALID # TIBETAN VOWEL SIGN AA..TIBETAN VOWEL SIGN I
0F73 ; DISALLOWED # TIBETAN VOWEL SIGN II
0F74 ; PVALID # TIBETAN VOWEL SIGN U
0F75..0F79 ; DISALLOWED # TIBETAN VOWEL SIGN UU..TIBETAN VOWEL SIGN VO
0F7A..0F80 ; PVALID # TIBETAN VOWEL SIGN E..TIBETAN VOWEL SIGN REV
0F81 ; DISALLOWED # TIBETAN VOWEL SIGN REVERSED II
0F82..0F84 ; PVALID # TIBETAN SIGN NYI ZLA NAA DA..TIBETAN MARK HA
0F85 ; DISALLOWED # TIBETAN MARK PALUTA
0F86..0F8B ; PVALID # TIBETAN SIGN LCI RTAGS..TIBETAN SIGN GRU MED
0F8C..0F8F ; UNASSIGNED # <reserved>..<reserved>
0F90..0F92 ; PVALID # TIBETAN SUBJOINED LETTER KA..TIBETAN SUBJOIN
0F93 ; DISALLOWED # TIBETAN SUBJOINED LETTER GHA
0F94..0F97 ; PVALID # TIBETAN SUBJOINED LETTER NGA..TIBETAN SUBJOI
0F98 ; UNASSIGNED # <reserved>
0F99..0F9C ; PVALID # TIBETAN SUBJOINED LETTER NYA..TIBETAN SUBJOI
0F9D ; DISALLOWED # TIBETAN SUBJOINED LETTER DDHA
0F9E..0FA1 ; PVALID # TIBETAN SUBJOINED LETTER NNA..TIBETAN SUBJOI
0FA2 ; DISALLOWED # TIBETAN SUBJOINED LETTER DHA
0FA3..0FA6 ; PVALID # TIBETAN SUBJOINED LETTER NA..TIBETAN SUBJOIN
0FA7 ; DISALLOWED # TIBETAN SUBJOINED LETTER BHA
0FA8..0FAB ; PVALID # TIBETAN SUBJOINED LETTER MA..TIBETAN SUBJOIN
0FAC ; DISALLOWED # TIBETAN SUBJOINED LETTER DZHA
0FAD..0FB8 ; PVALID # TIBETAN SUBJOINED LETTER WA..TIBETAN SUBJOIN
0FB9 ; DISALLOWED # TIBETAN SUBJOINED LETTER KSSA
0FBA..0FBC ; PVALID # TIBETAN SUBJOINED LETTER FIXED-FORM WA..TIBE
0FBD ; UNASSIGNED # <reserved>
0FBE..0FC5 ; DISALLOWED # TIBETAN KU RU KHA..TIBETAN SYMBOL RDO RJE
0FC6 ; PVALID # TIBETAN SYMBOL PADMA GDAN
0FC7..0FCC ; DISALLOWED # TIBETAN SYMBOL RDO RJE RGYA GRAM..TIBETAN SY
0FCD ; UNASSIGNED # <reserved>
0FCE..0FD8 ; DISALLOWED # TIBETAN SIGN RDEL NAG RDEL DKAR..LEFT-FACING
0FD9..0FFF ; UNASSIGNED # <reserved>..<reserved>
1000..1049 ; PVALID # MYANMAR LETTER KA..MYANMAR DIGIT NINE
104A..104F ; DISALLOWED # MYANMAR SIGN LITTLE SECTION..MYANMAR SYMBOL
1050..109D ; PVALID # MYANMAR LETTER SHA..MYANMAR VOWEL SIGN AITON
109E..10C5 ; DISALLOWED # MYANMAR SYMBOL SHAN ONE..GEORGIAN CAPITAL LE
10C6..10CF ; UNASSIGNED # <reserved>..<reserved>
10D0..10FA ; PVALID # GEORGIAN LETTER AN..GEORGIAN LETTER AIN
10FB..10FC ; DISALLOWED # GEORGIAN PARAGRAPH SEPARATOR..MODIFIER LETTE
10FD..10FF ; UNASSIGNED # <reserved>..<reserved>
1100..11FF ; DISALLOWED # HANGUL CHOSEONG KIYEOK..HANGUL JONGSEONG SSA
1200..1248 ; PVALID # ETHIOPIC SYLLABLE HA..ETHIOPIC SYLLABLE QWA
1249 ; UNASSIGNED # <reserved>
124A..124D ; PVALID # ETHIOPIC SYLLABLE QWI..ETHIOPIC SYLLABLE QWE
124E..124F ; UNASSIGNED # <reserved>..<reserved>
1250..1256 ; PVALID # ETHIOPIC SYLLABLE QHA..ETHIOPIC SYLLABLE QHO
1257 ; UNASSIGNED # <reserved>
1258 ; PVALID # ETHIOPIC SYLLABLE QHWA
1259 ; UNASSIGNED # <reserved>
125A..125D ; PVALID # ETHIOPIC SYLLABLE QHWI..ETHIOPIC SYLLABLE QH
125E..125F ; UNASSIGNED # <reserved>..<reserved>
1260..1288 ; PVALID # ETHIOPIC SYLLABLE BA..ETHIOPIC SYLLABLE XWA
1289 ; UNASSIGNED # <reserved>
128A..128D ; PVALID # ETHIOPIC SYLLABLE XWI..ETHIOPIC SYLLABLE XWE
128E..128F ; UNASSIGNED # <reserved>..<reserved>
1290..12B0 ; PVALID # ETHIOPIC SYLLABLE NA..ETHIOPIC SYLLABLE KWA
12B1 ; UNASSIGNED # <reserved>
12B2..12B5 ; PVALID # ETHIOPIC SYLLABLE KWI..ETHIOPIC SYLLABLE KWE
12B6..12B7 ; UNASSIGNED # <reserved>..<reserved>
12B8..12BE ; PVALID # ETHIOPIC SYLLABLE KXA..ETHIOPIC SYLLABLE KXO
12BF ; UNASSIGNED # <reserved>
12C0 ; PVALID # ETHIOPIC SYLLABLE KXWA
12C1 ; UNASSIGNED # <reserved>
12C2..12C5 ; PVALID # ETHIOPIC SYLLABLE KXWI..ETHIOPIC SYLLABLE KX
12C6..12C7 ; UNASSIGNED # <reserved>..<reserved>
12C8..12D6 ; PVALID # ETHIOPIC SYLLABLE WA..ETHIOPIC SYLLABLE PHAR
12D7 ; UNASSIGNED # <reserved>
12D8..1310 ; PVALID # ETHIOPIC SYLLABLE ZA..ETHIOPIC SYLLABLE GWA
1311 ; UNASSIGNED # <reserved>
1312..1315 ; PVALID # ETHIOPIC SYLLABLE GWI..ETHIOPIC SYLLABLE GWE
1316..1317 ; UNASSIGNED # <reserved>..<reserved>
1318..135A ; PVALID # ETHIOPIC SYLLABLE GGA..ETHIOPIC SYLLABLE FYA
135B..135E ; UNASSIGNED # <reserved>..<reserved>
135F ; PVALID # ETHIOPIC COMBINING GEMINATION MARK
1360..137C ; DISALLOWED # ETHIOPIC SECTION MARK..ETHIOPIC NUMBER TEN T
137D..137F ; UNASSIGNED # <reserved>..<reserved>
1380..138F ; PVALID # ETHIOPIC SYLLABLE SEBATBEIT MWA..ETHIOPIC SY
1390..1399 ; DISALLOWED # ETHIOPIC TONAL MARK YIZET..ETHIOPIC TONAL MA
139A..139F ; UNASSIGNED # <reserved>..<reserved>
13A0..13F4 ; PVALID # CHEROKEE LETTER A..CHEROKEE LETTER YV
13F5..13FF ; UNASSIGNED # <reserved>..<reserved>
1400 ; DISALLOWED # CANADIAN SYLLABICS HYPHEN
1401..166C ; PVALID # CANADIAN SYLLABICS E..CANADIAN SYLLABICS CAR
166D..166E ; DISALLOWED # CANADIAN SYLLABICS CHI SIGN..CANADIAN SYLLAB
166F..167F ; PVALID # CANADIAN SYLLABICS QAI..CANADIAN SYLLABICS B
1680 ; DISALLOWED # OGHAM SPACE MARK
1681..169A ; PVALID # OGHAM LETTER BEITH..OGHAM LETTER PEITH
169B..169C ; DISALLOWED # OGHAM FEATHER MARK..OGHAM REVERSED FEATHER M
169D..169F ; UNASSIGNED # <reserved>..<reserved>
16A0..16EA ; PVALID # RUNIC LETTER FEHU FEOH FE F..RUNIC LETTER X
16EB..16F0 ; DISALLOWED # RUNIC SINGLE PUNCTUATION..RUNIC BELGTHOR SYM
16F1..16FF ; UNASSIGNED # <reserved>..<reserved>
1700..170C ; PVALID # TAGALOG LETTER A..TAGALOG LETTER YA
170D ; UNASSIGNED # <reserved>
170E..1714 ; PVALID # TAGALOG LETTER LA..TAGALOG SIGN VIRAMA
1715..171F ; UNASSIGNED # <reserved>..<reserved>
1720..1734 ; PVALID # HANUNOO LETTER A..HANUNOO SIGN PAMUDPOD
1735..1736 ; DISALLOWED # PHILIPPINE SINGLE PUNCTUATION..PHILIPPINE DO
1737..173F ; UNASSIGNED # <reserved>..<reserved>
1740..1753 ; PVALID # BUHID LETTER A..BUHID VOWEL SIGN U
1754..175F ; UNASSIGNED # <reserved>..<reserved>
1760..176C ; PVALID # TAGBANWA LETTER A..TAGBANWA LETTER YA
176D ; UNASSIGNED # <reserved>
176E..1770 ; PVALID # TAGBANWA LETTER LA..TAGBANWA LETTER SA
1771 ; UNASSIGNED # <reserved>
1772..1773 ; PVALID # TAGBANWA VOWEL SIGN I..TAGBANWA VOWEL SIGN U
1774..177F ; UNASSIGNED # <reserved>..<reserved>
1780..17B3 ; PVALID # KHMER LETTER KA..KHMER INDEPENDENT VOWEL QAU
17B4..17B5 ; DISALLOWED # KHMER VOWEL INHERENT AQ..KHMER VOWEL INHEREN
17B6..17D3 ; PVALID # KHMER VOWEL SIGN AA..KHMER SIGN BATHAMASAT
17D4..17D6 ; DISALLOWED # KHMER SIGN KHAN..KHMER SIGN CAMNUC PII KUUH
17D7 ; PVALID # KHMER SIGN LEK TOO
17D8..17DB ; DISALLOWED # KHMER SIGN BEYYAL..KHMER CURRENCY SYMBOL RIE
17DC..17DD ; PVALID # KHMER SIGN AVAKRAHASANYA..KHMER SIGN ATTHACA
17DE..17DF ; UNASSIGNED # <reserved>..<reserved>
17E0..17E9 ; PVALID # KHMER DIGIT ZERO..KHMER DIGIT NINE
17EA..17EF ; UNASSIGNED # <reserved>..<reserved>
17F0..17F9 ; DISALLOWED # KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK
17FA..17FF ; UNASSIGNED # <reserved>..<reserved>
1800..180E ; DISALLOWED # MONGOLIAN BIRGA..MONGOLIAN VOWEL SEPARATOR
180F ; UNASSIGNED # <reserved>
1810..1819 ; PVALID # MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
181A..181F ; UNASSIGNED # <reserved>..<reserved>
1820..1877 ; PVALID # MONGOLIAN LETTER A..MONGOLIAN LETTER MANCHU
1878..187F ; UNASSIGNED # <reserved>..<reserved>
1880..18AA ; PVALID # MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONG
18AB..18AF ; UNASSIGNED # <reserved>..<reserved>
18B0..18F5 ; PVALID # CANADIAN SYLLABICS OY..CANADIAN SYLLABICS CA
18F6..18FF ; UNASSIGNED # <reserved>..<reserved>
1900..191C ; PVALID # LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER HA
191D..191F ; UNASSIGNED # <reserved>..<reserved>
1920..192B ; PVALID # LIMBU VOWEL SIGN A..LIMBU SUBJOINED LETTER W
192C..192F ; UNASSIGNED # <reserved>..<reserved>
1930..193B ; PVALID # LIMBU SMALL LETTER KA..LIMBU SIGN SA-I
193C..193F ; UNASSIGNED # <reserved>..<reserved>
1940 ; DISALLOWED # LIMBU SIGN LOO
1941..1943 ; UNASSIGNED # <reserved>..<reserved>
1944..1945 ; DISALLOWED # LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK
1946..196D ; PVALID # LIMBU DIGIT ZERO..TAI LE LETTER AI
196E..196F ; UNASSIGNED # <reserved>..<reserved>
1970..1974 ; PVALID # TAI LE LETTER TONE-2..TAI LE LETTER TONE-6
1975..197F ; UNASSIGNED # <reserved>..<reserved>
1980..19AB ; PVALID # NEW TAI LUE LETTER HIGH QA..NEW TAI LUE LETT
19AC..19AF ; UNASSIGNED # <reserved>..<reserved>
19B0..19C9 ; PVALID # NEW TAI LUE VOWEL SIGN VOWEL SHORTENER..NEW
19CA..19CF ; UNASSIGNED # <reserved>..<reserved>
19D0..19DA ; PVALID # NEW TAI LUE DIGIT ZERO..NEW TAI LUE THAM DIG
19DB..19DD ; UNASSIGNED # <reserved>..<reserved>
19DE..19FF ; DISALLOWED # NEW TAI LUE SIGN LAE..KHMER SYMBOL DAP-PRAM
1A00..1A1B ; PVALID # BUGINESE LETTER KA..BUGINESE VOWEL SIGN AE
1A1C..1A1D ; UNASSIGNED # <reserved>..<reserved>
1A1E..1A1F ; DISALLOWED # BUGINESE PALLAWA..BUGINESE END OF SECTION
1A20..1A5E ; PVALID # TAI THAM LETTER HIGH KA..TAI THAM CONSONANT
1A5F ; UNASSIGNED # <reserved>
1A60..1A7C ; PVALID # TAI THAM SIGN SAKOT..TAI THAM SIGN KHUEN-LUE
1A7D..1A7E ; UNASSIGNED # <reserved>..<reserved>
1A7F..1A89 ; PVALID # TAI THAM COMBINING CRYPTOGRAMMIC DOT..TAI TH
1A8A..1A8F ; UNASSIGNED # <reserved>..<reserved>
1A90..1A99 ; PVALID # TAI THAM THAM DIGIT ZERO..TAI THAM THAM DIGI
1A9A..1A9F ; UNASSIGNED # <reserved>..<reserved>
1AA0..1AA6 ; DISALLOWED # TAI THAM SIGN WIANG..TAI THAM SIGN REVERSED
1AA7 ; PVALID # TAI THAM SIGN MAI YAMOK
1AA8..1AAD ; DISALLOWED # TAI THAM SIGN KAAN..TAI THAM SIGN CAANG
1AAE..1AFF ; UNASSIGNED # <reserved>..<reserved>
1B00..1B4B ; PVALID # BALINESE SIGN ULU RICEM..BALINESE LETTER ASY
1B4C..1B4F ; UNASSIGNED # <reserved>..<reserved>
1B50..1B59 ; PVALID # BALINESE DIGIT ZERO..BALINESE DIGIT NINE
1B5A..1B6A ; DISALLOWED # BALINESE PANTI..BALINESE MUSICAL SYMBOL DANG
1B6B..1B73 ; PVALID # BALINESE MUSICAL SYMBOL COMBINING TEGEH..BAL
1B74..1B7C ; DISALLOWED # BALINESE MUSICAL SYMBOL RIGHT-HAND OPEN DUG.
1B7D..1B7F ; UNASSIGNED # <reserved>..<reserved>
1B80..1BAA ; PVALID # SUNDANESE SIGN PANYECEK..SUNDANESE SIGN PAMA
1BAB..1BAD ; UNASSIGNED # <reserved>..<reserved>
1BAE..1BB9 ; PVALID # SUNDANESE LETTER KHA..SUNDANESE DIGIT NINE
1BBA..1BFF ; UNASSIGNED # <reserved>..<reserved>
1C00..1C37 ; PVALID # LEPCHA LETTER KA..LEPCHA SIGN NUKTA
1C38..1C3A ; UNASSIGNED # <reserved>..<reserved>
1C3B..1C3F ; DISALLOWED # LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATIO
1C40..1C49 ; PVALID # LEPCHA DIGIT ZERO..LEPCHA DIGIT NINE
1C4A..1C4C ; UNASSIGNED # <reserved>..<reserved>
1C4D..1C7D ; PVALID # LEPCHA LETTER TTA..OL CHIKI AHAD
1C7E..1C7F ; DISALLOWED # OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTU
1C80..1CCF ; UNASSIGNED # <reserved>..<reserved>
1CD0..1CD2 ; PVALID # VEDIC TONE KARSHANA..VEDIC TONE PRENKHA
1CD3 ; DISALLOWED # VEDIC SIGN NIHSHVASA
1CD4..1CF2 ; PVALID # VEDIC SIGN YAJURVEDIC MIDLINE SVARITA..VEDIC
1CF3..1CFF ; UNASSIGNED # <reserved>..<reserved>
1D00..1D2B ; PVALID # LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTE
1D2C..1D2E ; DISALLOWED # MODIFIER LETTER CAPITAL A..MODIFIER LETTER C
1D2F ; PVALID # MODIFIER LETTER CAPITAL BARRED B
1D30..1D3A ; DISALLOWED # MODIFIER LETTER CAPITAL D..MODIFIER LETTER C
1D3B ; PVALID # MODIFIER LETTER CAPITAL REVERSED N
1D3C..1D4D ; DISALLOWED # MODIFIER LETTER CAPITAL O..MODIFIER LETTER S
1D4E ; PVALID # MODIFIER LETTER SMALL TURNED I
1D4F..1D6A ; DISALLOWED # MODIFIER LETTER SMALL K..GREEK SUBSCRIPT SMA
1D6B..1D77 ; PVALID # LATIN SMALL LETTER UE..LATIN SMALL LETTER TU
1D78 ; DISALLOWED # MODIFIER LETTER CYRILLIC EN
1D79..1D9A ; PVALID # LATIN SMALL LETTER INSULAR G..LATIN SMALL LE
1D9B..1DBF ; DISALLOWED # MODIFIER LETTER SMALL TURNED ALPHA..MODIFIER
1DC0..1DE6 ; PVALID # COMBINING DOTTED GRAVE ACCENT..COMBINING LAT
1DE7..1DFC ; UNASSIGNED # <reserved>..<reserved>
1DFD..1DFF ; PVALID # COMBINING ALMOST EQUAL TO BELOW..COMBINING R
1E00 ; DISALLOWED # LATIN CAPITAL LETTER A WITH RING BELOW
1E01 ; PVALID # LATIN SMALL LETTER A WITH RING BELOW
1E02 ; DISALLOWED # LATIN CAPITAL LETTER B WITH DOT ABOVE
1E03 ; PVALID # LATIN SMALL LETTER B WITH DOT ABOVE
1E04 ; DISALLOWED # LATIN CAPITAL LETTER B WITH DOT BELOW
1E05 ; PVALID # LATIN SMALL LETTER B WITH DOT BELOW
1E06 ; DISALLOWED # LATIN CAPITAL LETTER B WITH LINE BELOW
1E07 ; PVALID # LATIN SMALL LETTER B WITH LINE BELOW
1E08 ; DISALLOWED # LATIN CAPITAL LETTER C WITH CEDILLA AND ACUT
1E09 ; PVALID # LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
1E0A ; DISALLOWED # LATIN CAPITAL LETTER D WITH DOT ABOVE
1E0B ; PVALID # LATIN SMALL LETTER D WITH DOT ABOVE
1E0C ; DISALLOWED # LATIN CAPITAL LETTER D WITH DOT BELOW
1E0D ; PVALID # LATIN SMALL LETTER D WITH DOT BELOW
1E0E ; DISALLOWED # LATIN CAPITAL LETTER D WITH LINE BELOW
1E0F ; PVALID # LATIN SMALL LETTER D WITH LINE BELOW
1E10 ; DISALLOWED # LATIN CAPITAL LETTER D WITH CEDILLA
1E11 ; PVALID # LATIN SMALL LETTER D WITH CEDILLA
1E12 ; DISALLOWED # LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW
1E13 ; PVALID # LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW
1E14 ; DISALLOWED # LATIN CAPITAL LETTER E WITH MACRON AND GRAVE
1E15 ; PVALID # LATIN SMALL LETTER E WITH MACRON AND GRAVE
1E16 ; DISALLOWED # LATIN CAPITAL LETTER E WITH MACRON AND ACUTE
1E17 ; PVALID # LATIN SMALL LETTER E WITH MACRON AND ACUTE
1E18 ; DISALLOWED # LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW
1E19 ; PVALID # LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW
1E1A ; DISALLOWED # LATIN CAPITAL LETTER E WITH TILDE BELOW
1E1B ; PVALID # LATIN SMALL LETTER E WITH TILDE BELOW
1E1C ; DISALLOWED # LATIN CAPITAL LETTER E WITH CEDILLA AND BREV
1E1D ; PVALID # LATIN SMALL LETTER E WITH CEDILLA AND BREVE
1E1E ; DISALLOWED # LATIN CAPITAL LETTER F WITH DOT ABOVE
1E1F ; PVALID # LATIN SMALL LETTER F WITH DOT ABOVE
1E20 ; DISALLOWED # LATIN CAPITAL LETTER G WITH MACRON
1E21 ; PVALID # LATIN SMALL LETTER G WITH MACRON
1E22 ; DISALLOWED # LATIN CAPITAL LETTER H WITH DOT ABOVE
1E23 ; PVALID # LATIN SMALL LETTER H WITH DOT ABOVE
1E24 ; DISALLOWED # LATIN CAPITAL LETTER H WITH DOT BELOW
1E25 ; PVALID # LATIN SMALL LETTER H WITH DOT BELOW
1E26 ; DISALLOWED # LATIN CAPITAL LETTER H WITH DIAERESIS
1E27 ; PVALID # LATIN SMALL LETTER H WITH DIAERESIS
1E28 ; DISALLOWED # LATIN CAPITAL LETTER H WITH CEDILLA
1E29 ; PVALID # LATIN SMALL LETTER H WITH CEDILLA
1E2A ; DISALLOWED # LATIN CAPITAL LETTER H WITH BREVE BELOW
1E2B ; PVALID # LATIN SMALL LETTER H WITH BREVE BELOW
1E2C ; DISALLOWED # LATIN CAPITAL LETTER I WITH TILDE BELOW
1E2D ; PVALID # LATIN SMALL LETTER I WITH TILDE BELOW
1E2E ; DISALLOWED # LATIN CAPITAL LETTER I WITH DIAERESIS AND AC
1E2F ; PVALID # LATIN SMALL LETTER I WITH DIAERESIS AND ACUT
1E30 ; DISALLOWED # LATIN CAPITAL LETTER K WITH ACUTE
1E31 ; PVALID # LATIN SMALL LETTER K WITH ACUTE
1E32 ; DISALLOWED # LATIN CAPITAL LETTER K WITH DOT BELOW
1E33 ; PVALID # LATIN SMALL LETTER K WITH DOT BELOW
1E34 ; DISALLOWED # LATIN CAPITAL LETTER K WITH LINE BELOW
1E35 ; PVALID # LATIN SMALL LETTER K WITH LINE BELOW
1E36 ; DISALLOWED # LATIN CAPITAL LETTER L WITH DOT BELOW
1E37 ; PVALID # LATIN SMALL LETTER L WITH DOT BELOW
1E38 ; DISALLOWED # LATIN CAPITAL LETTER L WITH DOT BELOW AND MA
1E39 ; PVALID # LATIN SMALL LETTER L WITH DOT BELOW AND MACR
1E3A ; DISALLOWED # LATIN CAPITAL LETTER L WITH LINE BELOW
1E3B ; PVALID # LATIN SMALL LETTER L WITH LINE BELOW
1E3C ; DISALLOWED # LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW
1E3D ; PVALID # LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW
1E3E ; DISALLOWED # LATIN CAPITAL LETTER M WITH ACUTE
1E3F ; PVALID # LATIN SMALL LETTER M WITH ACUTE
1E40 ; DISALLOWED # LATIN CAPITAL LETTER M WITH DOT ABOVE
1E41 ; PVALID # LATIN SMALL LETTER M WITH DOT ABOVE
1E42 ; DISALLOWED # LATIN CAPITAL LETTER M WITH DOT BELOW
1E43 ; PVALID # LATIN SMALL LETTER M WITH DOT BELOW
1E44 ; DISALLOWED # LATIN CAPITAL LETTER N WITH DOT ABOVE
1E45 ; PVALID # LATIN SMALL LETTER N WITH DOT ABOVE
1E46 ; DISALLOWED # LATIN CAPITAL LETTER N WITH DOT BELOW
1E47 ; PVALID # LATIN SMALL LETTER N WITH DOT BELOW
1E48 ; DISALLOWED # LATIN CAPITAL LETTER N WITH LINE BELOW
1E49 ; PVALID # LATIN SMALL LETTER N WITH LINE BELOW
1E4A ; DISALLOWED # LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW
1E4B ; PVALID # LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW
1E4C ; DISALLOWED # LATIN CAPITAL LETTER O WITH TILDE AND ACUTE
1E4D ; PVALID # LATIN SMALL LETTER O WITH TILDE AND ACUTE
1E4E ; DISALLOWED # LATIN CAPITAL LETTER O WITH TILDE AND DIAERE
1E4F ; PVALID # LATIN SMALL LETTER O WITH TILDE AND DIAERESI
1E50 ; DISALLOWED # LATIN CAPITAL LETTER O WITH MACRON AND GRAVE
1E51 ; PVALID # LATIN SMALL LETTER O WITH MACRON AND GRAVE
1E52 ; DISALLOWED # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE
1E53 ; PVALID # LATIN SMALL LETTER O WITH MACRON AND ACUTE
1E54 ; DISALLOWED # LATIN CAPITAL LETTER P WITH ACUTE
1E55 ; PVALID # LATIN SMALL LETTER P WITH ACUTE
1E56 ; DISALLOWED # LATIN CAPITAL LETTER P WITH DOT ABOVE
1E57 ; PVALID # LATIN SMALL LETTER P WITH DOT ABOVE
1E58 ; DISALLOWED # LATIN CAPITAL LETTER R WITH DOT ABOVE
1E59 ; PVALID # LATIN SMALL LETTER R WITH DOT ABOVE
1E5A ; DISALLOWED # LATIN CAPITAL LETTER R WITH DOT BELOW
1E5B ; PVALID # LATIN SMALL LETTER R WITH DOT BELOW
1E5C ; DISALLOWED # LATIN CAPITAL LETTER R WITH DOT BELOW AND MA
1E5D ; PVALID # LATIN SMALL LETTER R WITH DOT BELOW AND MACR
1E5E ; DISALLOWED # LATIN CAPITAL LETTER R WITH LINE BELOW
1E5F ; PVALID # LATIN SMALL LETTER R WITH LINE BELOW
1E60 ; DISALLOWED # LATIN CAPITAL LETTER S WITH DOT ABOVE
1E61 ; PVALID # LATIN SMALL LETTER S WITH DOT ABOVE
1E62 ; DISALLOWED # LATIN CAPITAL LETTER S WITH DOT BELOW
1E63 ; PVALID # LATIN SMALL LETTER S WITH DOT BELOW
1E64 ; DISALLOWED # LATIN CAPITAL LETTER S WITH ACUTE AND DOT AB
1E65 ; PVALID # LATIN SMALL LETTER S WITH ACUTE AND DOT ABOV
1E66 ; DISALLOWED # LATIN CAPITAL LETTER S WITH CARON AND DOT AB
1E67 ; PVALID # LATIN SMALL LETTER S WITH CARON AND DOT ABOV
1E68 ; DISALLOWED # LATIN CAPITAL LETTER S WITH DOT BELOW AND DO
1E69 ; PVALID # LATIN SMALL LETTER S WITH DOT BELOW AND DOT
1E6A ; DISALLOWED # LATIN CAPITAL LETTER T WITH DOT ABOVE
1E6B ; PVALID # LATIN SMALL LETTER T WITH DOT ABOVE
1E6C ; DISALLOWED # LATIN CAPITAL LETTER T WITH DOT BELOW
1E6D ; PVALID # LATIN SMALL LETTER T WITH DOT BELOW
1E6E ; DISALLOWED # LATIN CAPITAL LETTER T WITH LINE BELOW
1E6F ; PVALID # LATIN SMALL LETTER T WITH LINE BELOW
1E70 ; DISALLOWED # LATIN CAPITAL LETTER T WITH CIRCUMFLEX BELOW
1E71 ; PVALID # LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW
1E72 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS BELOW
1E73 ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS BELOW
1E74 ; DISALLOWED # LATIN CAPITAL LETTER U WITH TILDE BELOW
1E75 ; PVALID # LATIN SMALL LETTER U WITH TILDE BELOW
1E76 ; DISALLOWED # LATIN CAPITAL LETTER U WITH CIRCUMFLEX BELOW
1E77 ; PVALID # LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW
1E78 ; DISALLOWED # LATIN CAPITAL LETTER U WITH TILDE AND ACUTE
1E79 ; PVALID # LATIN SMALL LETTER U WITH TILDE AND ACUTE
1E7A ; DISALLOWED # LATIN CAPITAL LETTER U WITH MACRON AND DIAER
1E7B ; PVALID # LATIN SMALL LETTER U WITH MACRON AND DIAERES
1E7C ; DISALLOWED # LATIN CAPITAL LETTER V WITH TILDE
1E7D ; PVALID # LATIN SMALL LETTER V WITH TILDE
1E7E ; DISALLOWED # LATIN CAPITAL LETTER V WITH DOT BELOW
1E7F ; PVALID # LATIN SMALL LETTER V WITH DOT BELOW
1E80 ; DISALLOWED # LATIN CAPITAL LETTER W WITH GRAVE
1E81 ; PVALID # LATIN SMALL LETTER W WITH GRAVE
1E82 ; DISALLOWED # LATIN CAPITAL LETTER W WITH ACUTE
1E83 ; PVALID # LATIN SMALL LETTER W WITH ACUTE
1E84 ; DISALLOWED # LATIN CAPITAL LETTER W WITH DIAERESIS
1E85 ; PVALID # LATIN SMALL LETTER W WITH DIAERESIS
1E86 ; DISALLOWED # LATIN CAPITAL LETTER W WITH DOT ABOVE
1E87 ; PVALID # LATIN SMALL LETTER W WITH DOT ABOVE
1E88 ; DISALLOWED # LATIN CAPITAL LETTER W WITH DOT BELOW
1E89 ; PVALID # LATIN SMALL LETTER W WITH DOT BELOW
1E8A ; DISALLOWED # LATIN CAPITAL LETTER X WITH DOT ABOVE
1E8B ; PVALID # LATIN SMALL LETTER X WITH DOT ABOVE
1E8C ; DISALLOWED # LATIN CAPITAL LETTER X WITH DIAERESIS
1E8D ; PVALID # LATIN SMALL LETTER X WITH DIAERESIS
1E8E ; DISALLOWED # LATIN CAPITAL LETTER Y WITH DOT ABOVE
1E8F ; PVALID # LATIN SMALL LETTER Y WITH DOT ABOVE
1E90 ; DISALLOWED # LATIN CAPITAL LETTER Z WITH CIRCUMFLEX
1E91 ; PVALID # LATIN SMALL LETTER Z WITH CIRCUMFLEX
1E92 ; DISALLOWED # LATIN CAPITAL LETTER Z WITH DOT BELOW
1E93 ; PVALID # LATIN SMALL LETTER Z WITH DOT BELOW
1E94 ; DISALLOWED # LATIN CAPITAL LETTER Z WITH LINE BELOW
1E95..1E99 ; PVALID # LATIN SMALL LETTER Z WITH LINE BELOW..LATIN
1E9A..1E9B ; DISALLOWED # LATIN SMALL LETTER A WITH RIGHT HALF RING..L
1E9C..1E9D ; PVALID # LATIN SMALL LETTER LONG S WITH DIAGONAL STRO
1E9E ; DISALLOWED # LATIN CAPITAL LETTER SHARP S
1E9F ; PVALID # LATIN SMALL LETTER DELTA
1EA0 ; DISALLOWED # LATIN CAPITAL LETTER A WITH DOT BELOW
1EA1 ; PVALID # LATIN SMALL LETTER A WITH DOT BELOW
1EA2 ; DISALLOWED # LATIN CAPITAL LETTER A WITH HOOK ABOVE
1EA3 ; PVALID # LATIN SMALL LETTER A WITH HOOK ABOVE
1EA4 ; DISALLOWED # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND A
1EA5 ; PVALID # LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACU
1EA6 ; DISALLOWED # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND G
1EA7 ; PVALID # LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRA
1EA8 ; DISALLOWED # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND H
1EA9 ; PVALID # LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOO
1EAA ; DISALLOWED # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND T
1EAB ; PVALID # LATIN SMALL LETTER A WITH CIRCUMFLEX AND TIL
1EAC ; DISALLOWED # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND D
1EAD ; PVALID # LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT
1EAE ; DISALLOWED # LATIN CAPITAL LETTER A WITH BREVE AND ACUTE
1EAF ; PVALID # LATIN SMALL LETTER A WITH BREVE AND ACUTE
1EB0 ; DISALLOWED # LATIN CAPITAL LETTER A WITH BREVE AND GRAVE
1EB1 ; PVALID # LATIN SMALL LETTER A WITH BREVE AND GRAVE
1EB2 ; DISALLOWED # LATIN CAPITAL LETTER A WITH BREVE AND HOOK A
1EB3 ; PVALID # LATIN SMALL LETTER A WITH BREVE AND HOOK ABO
1EB4 ; DISALLOWED # LATIN CAPITAL LETTER A WITH BREVE AND TILDE
1EB5 ; PVALID # LATIN SMALL LETTER A WITH BREVE AND TILDE
1EB6 ; DISALLOWED # LATIN CAPITAL LETTER A WITH BREVE AND DOT BE
1EB7 ; PVALID # LATIN SMALL LETTER A WITH BREVE AND DOT BELO
1EB8 ; DISALLOWED # LATIN CAPITAL LETTER E WITH DOT BELOW
1EB9 ; PVALID # LATIN SMALL LETTER E WITH DOT BELOW
1EBA ; DISALLOWED # LATIN CAPITAL LETTER E WITH HOOK ABOVE
1EBB ; PVALID # LATIN SMALL LETTER E WITH HOOK ABOVE
1EBC ; DISALLOWED # LATIN CAPITAL LETTER E WITH TILDE
1EBD ; PVALID # LATIN SMALL LETTER E WITH TILDE
1EBE ; DISALLOWED # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND A
1EBF ; PVALID # LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACU
1EC0 ; DISALLOWED # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND G
1EC1 ; PVALID # LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRA
1EC2 ; DISALLOWED # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND H
1EC3 ; PVALID # LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOO
1EC4 ; DISALLOWED # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND T
1EC5 ; PVALID # LATIN SMALL LETTER E WITH CIRCUMFLEX AND TIL
1EC6 ; DISALLOWED # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND D
1EC7 ; PVALID # LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT
1EC8 ; DISALLOWED # LATIN CAPITAL LETTER I WITH HOOK ABOVE
1EC9 ; PVALID # LATIN SMALL LETTER I WITH HOOK ABOVE
1ECA ; DISALLOWED # LATIN CAPITAL LETTER I WITH DOT BELOW
1ECB ; PVALID # LATIN SMALL LETTER I WITH DOT BELOW
1ECC ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOT BELOW
1ECD ; PVALID # LATIN SMALL LETTER O WITH DOT BELOW
1ECE ; DISALLOWED # LATIN CAPITAL LETTER O WITH HOOK ABOVE
1ECF ; PVALID # LATIN SMALL LETTER O WITH HOOK ABOVE
1ED0 ; DISALLOWED # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND A
1ED1 ; PVALID # LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACU
1ED2 ; DISALLOWED # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND G
1ED3 ; PVALID # LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRA
1ED4 ; DISALLOWED # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND H
1ED5 ; PVALID # LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOO
1ED6 ; DISALLOWED # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND T
1ED7 ; PVALID # LATIN SMALL LETTER O WITH CIRCUMFLEX AND TIL
1ED8 ; DISALLOWED # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND D
1ED9 ; PVALID # LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT
1EDA ; DISALLOWED # LATIN CAPITAL LETTER O WITH HORN AND ACUTE
1EDB ; PVALID # LATIN SMALL LETTER O WITH HORN AND ACUTE
1EDC ; DISALLOWED # LATIN CAPITAL LETTER O WITH HORN AND GRAVE
1EDD ; PVALID # LATIN SMALL LETTER O WITH HORN AND GRAVE
1EDE ; DISALLOWED # LATIN CAPITAL LETTER O WITH HORN AND HOOK AB
1EDF ; PVALID # LATIN SMALL LETTER O WITH HORN AND HOOK ABOV
1EE0 ; DISALLOWED # LATIN CAPITAL LETTER O WITH HORN AND TILDE
1EE1 ; PVALID # LATIN SMALL LETTER O WITH HORN AND TILDE
1EE2 ; DISALLOWED # LATIN CAPITAL LETTER O WITH HORN AND DOT BEL
1EE3 ; PVALID # LATIN SMALL LETTER O WITH HORN AND DOT BELOW
1EE4 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DOT BELOW
1EE5 ; PVALID # LATIN SMALL LETTER U WITH DOT BELOW
1EE6 ; DISALLOWED # LATIN CAPITAL LETTER U WITH HOOK ABOVE
1EE7 ; PVALID # LATIN SMALL LETTER U WITH HOOK ABOVE
1EE8 ; DISALLOWED # LATIN CAPITAL LETTER U WITH HORN AND ACUTE
1EE9 ; PVALID # LATIN SMALL LETTER U WITH HORN AND ACUTE
1EEA ; DISALLOWED # LATIN CAPITAL LETTER U WITH HORN AND GRAVE
1EEB ; PVALID # LATIN SMALL LETTER U WITH HORN AND GRAVE
1EEC ; DISALLOWED # LATIN CAPITAL LETTER U WITH HORN AND HOOK AB
1EED ; PVALID # LATIN SMALL LETTER U WITH HORN AND HOOK ABOV
1EEE ; DISALLOWED # LATIN CAPITAL LETTER U WITH HORN AND TILDE
1EEF ; PVALID # LATIN SMALL LETTER U WITH HORN AND TILDE
1EF0 ; DISALLOWED # LATIN CAPITAL LETTER U WITH HORN AND DOT BEL
1EF1 ; PVALID # LATIN SMALL LETTER U WITH HORN AND DOT BELOW
1EF2 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH GRAVE
1EF3 ; PVALID # LATIN SMALL LETTER Y WITH GRAVE
1EF4 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH DOT BELOW
1EF5 ; PVALID # LATIN SMALL LETTER Y WITH DOT BELOW
1EF6 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH HOOK ABOVE
1EF7 ; PVALID # LATIN SMALL LETTER Y WITH HOOK ABOVE
1EF8 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH TILDE
1EF9 ; PVALID # LATIN SMALL LETTER Y WITH TILDE
1EFA ; DISALLOWED # LATIN CAPITAL LETTER MIDDLE-WELSH LL
1EFB ; PVALID # LATIN SMALL LETTER MIDDLE-WELSH LL
1EFC ; DISALLOWED # LATIN CAPITAL LETTER MIDDLE-WELSH V
1EFD ; PVALID # LATIN SMALL LETTER MIDDLE-WELSH V
1EFE ; DISALLOWED # LATIN CAPITAL LETTER Y WITH LOOP
1EFF..1F07 ; PVALID # LATIN SMALL LETTER Y WITH LOOP..GREEK SMALL
1F08..1F0F ; DISALLOWED # GREEK CAPITAL LETTER ALPHA WITH PSILI..GREEK
1F10..1F15 ; PVALID # GREEK SMALL LETTER EPSILON WITH PSILI..GREEK
1F16..1F17 ; UNASSIGNED # <reserved>..<reserved>
1F18..1F1D ; DISALLOWED # GREEK CAPITAL LETTER EPSILON WITH PSILI..GRE
1F1E..1F1F ; UNASSIGNED # <reserved>..<reserved>
1F20..1F27 ; PVALID # GREEK SMALL LETTER ETA WITH PSILI..GREEK SMA
1F28..1F2F ; DISALLOWED # GREEK CAPITAL LETTER ETA WITH PSILI..GREEK C
1F30..1F37 ; PVALID # GREEK SMALL LETTER IOTA WITH PSILI..GREEK SM
1F38..1F3F ; DISALLOWED # GREEK CAPITAL LETTER IOTA WITH PSILI..GREEK
1F40..1F45 ; PVALID # GREEK SMALL LETTER OMICRON WITH PSILI..GREEK
1F46..1F47 ; UNASSIGNED # <reserved>..<reserved>
1F48..1F4D ; DISALLOWED # GREEK CAPITAL LETTER OMICRON WITH PSILI..GRE
1F4E..1F4F ; UNASSIGNED # <reserved>..<reserved>
1F50..1F57 ; PVALID # GREEK SMALL LETTER UPSILON WITH PSILI..GREEK
1F58 ; UNASSIGNED # <reserved>
1F59 ; DISALLOWED # GREEK CAPITAL LETTER UPSILON WITH DASIA
1F5A ; UNASSIGNED # <reserved>
1F5B ; DISALLOWED # GREEK CAPITAL LETTER UPSILON WITH DASIA AND
1F5C ; UNASSIGNED # <reserved>
1F5D ; DISALLOWED # GREEK CAPITAL LETTER UPSILON WITH DASIA AND
1F5E ; UNASSIGNED # <reserved>
1F5F ; DISALLOWED # GREEK CAPITAL LETTER UPSILON WITH DASIA AND
1F60..1F67 ; PVALID # GREEK SMALL LETTER OMEGA WITH PSILI..GREEK S
1F68..1F6F ; DISALLOWED # GREEK CAPITAL LETTER OMEGA WITH PSILI..GREEK
1F70 ; PVALID # GREEK SMALL LETTER ALPHA WITH VARIA
1F71 ; DISALLOWED # GREEK SMALL LETTER ALPHA WITH OXIA
1F72 ; PVALID # GREEK SMALL LETTER EPSILON WITH VARIA
1F73 ; DISALLOWED # GREEK SMALL LETTER EPSILON WITH OXIA
1F74 ; PVALID # GREEK SMALL LETTER ETA WITH VARIA
1F75 ; DISALLOWED # GREEK SMALL LETTER ETA WITH OXIA
1F76 ; PVALID # GREEK SMALL LETTER IOTA WITH VARIA
1F77 ; DISALLOWED # GREEK SMALL LETTER IOTA WITH OXIA
1F78 ; PVALID # GREEK SMALL LETTER OMICRON WITH VARIA
1F79 ; DISALLOWED # GREEK SMALL LETTER OMICRON WITH OXIA
1F7A ; PVALID # GREEK SMALL LETTER UPSILON WITH VARIA
1F7B ; DISALLOWED # GREEK SMALL LETTER UPSILON WITH OXIA
1F7C ; PVALID # GREEK SMALL LETTER OMEGA WITH VARIA
1F7D ; DISALLOWED # GREEK SMALL LETTER OMEGA WITH OXIA
1F7E..1F7F ; UNASSIGNED # <reserved>..<reserved>
1F80..1FAF ; DISALLOWED # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOG
1FB0..1FB1 ; PVALID # GREEK SMALL LETTER ALPHA WITH VRACHY..GREEK
1FB2..1FB4 ; DISALLOWED # GREEK SMALL LETTER ALPHA WITH VARIA AND YPOG
1FB5 ; UNASSIGNED # <reserved>
1FB6 ; PVALID # GREEK SMALL LETTER ALPHA WITH PERISPOMENI
1FB7..1FC4 ; DISALLOWED # GREEK SMALL LETTER ALPHA WITH PERISPOMENI AN
1FC5 ; UNASSIGNED # <reserved>
1FC6 ; PVALID # GREEK SMALL LETTER ETA WITH PERISPOMENI
1FC7..1FCF ; DISALLOWED # GREEK SMALL LETTER ETA WITH PERISPOMENI AND
1FD0..1FD2 ; PVALID # GREEK SMALL LETTER IOTA WITH VRACHY..GREEK S
1FD3 ; DISALLOWED # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND O
1FD4..1FD5 ; UNASSIGNED # <reserved>..<reserved>
1FD6..1FD7 ; PVALID # GREEK SMALL LETTER IOTA WITH PERISPOMENI..GR
1FD8..1FDB ; DISALLOWED # GREEK CAPITAL LETTER IOTA WITH VRACHY..GREEK
1FDC ; UNASSIGNED # <reserved>
1FDD..1FDF ; DISALLOWED # GREEK DASIA AND VARIA..GREEK DASIA AND PERIS
1FE0..1FE2 ; PVALID # GREEK SMALL LETTER UPSILON WITH VRACHY..GREE
1FE3 ; DISALLOWED # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AN
1FE4..1FE7 ; PVALID # GREEK SMALL LETTER RHO WITH PSILI..GREEK SMA
1FE8..1FEF ; DISALLOWED # GREEK CAPITAL LETTER UPSILON WITH VRACHY..GR
1FF0..1FF1 ; UNASSIGNED # <reserved>..<reserved>
1FF2..1FF4 ; DISALLOWED # GREEK SMALL LETTER OMEGA WITH VARIA AND YPOG
1FF5 ; UNASSIGNED # <reserved>
1FF6 ; PVALID # GREEK SMALL LETTER OMEGA WITH PERISPOMENI
1FF7..1FFE ; DISALLOWED # GREEK SMALL LETTER OMEGA WITH PERISPOMENI AN
1FFF ; UNASSIGNED # <reserved>
2000..200B ; DISALLOWED # EN QUAD..ZERO WIDTH SPACE
200C..200D ; CONTEXTJ # ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
200E..2064 ; DISALLOWED # LEFT-TO-RIGHT MARK..INVISIBLE PLUS
2065..2069 ; UNASSIGNED # <reserved>..<reserved>
206A..2071 ; DISALLOWED # INHIBIT SYMMETRIC SWAPPING..SUPERSCRIPT LATI
2072..2073 ; UNASSIGNED # <reserved>..<reserved>
2074..208E ; DISALLOWED # SUPERSCRIPT FOUR..SUBSCRIPT RIGHT PARENTHESI
208F ; UNASSIGNED # <reserved>
2090..2094 ; DISALLOWED # LATIN SUBSCRIPT SMALL LETTER A..LATIN SUBSCR
2095..209F ; UNASSIGNED # <reserved>..<reserved>
20A0..20B8 ; DISALLOWED # EURO-CURRENCY SIGN..TENGE SIGN
20B9..20CF ; UNASSIGNED # <reserved>..<reserved>
20D0..20F0 ; DISALLOWED # COMBINING LEFT HARPOON ABOVE..COMBINING ASTE
20F1..20FF ; UNASSIGNED # <reserved>..<reserved>
2100..214D ; DISALLOWED # ACCOUNT OF..AKTIESELSKAB
214E ; PVALID # TURNED SMALL F
214F..2183 ; DISALLOWED # SYMBOL FOR SAMARITAN SOURCE..ROMAN NUMERAL R
2184 ; PVALID # LATIN SMALL LETTER REVERSED C
2185..2189 ; DISALLOWED # ROMAN NUMERAL SIX LATE FORM..VULGAR FRACTION
218A..218F ; UNASSIGNED # <reserved>..<reserved>
2190..23E8 ; DISALLOWED # LEFTWARDS ARROW..DECIMAL EXPONENT SYMBOL
23E9..23FF ; UNASSIGNED # <reserved>..<reserved>
2400..2426 ; DISALLOWED # SYMBOL FOR NULL..SYMBOL FOR SUBSTITUTE FORM
2427..243F ; UNASSIGNED # <reserved>..<reserved>
2440..244A ; DISALLOWED # OCR HOOK..OCR DOUBLE BACKSLASH
244B..245F ; UNASSIGNED # <reserved>..<reserved>
2460..26CD ; DISALLOWED # CIRCLED DIGIT ONE..DISABLED CAR
26CE ; UNASSIGNED # <reserved>
26CF..26E1 ; DISALLOWED # PICK..RESTRICTED LEFT ENTRY-2
26E2 ; UNASSIGNED # <reserved>
26E3 ; DISALLOWED # HEAVY CIRCLE WITH STROKE AND TWO DOTS ABOVE
26E4..26E7 ; UNASSIGNED # <reserved>..<reserved>
26E8..26FF ; DISALLOWED # BLACK CROSS ON SHIELD..WHITE FLAG WITH HORIZ
2700 ; UNASSIGNED # <reserved>
2701..2704 ; DISALLOWED # UPPER BLADE SCISSORS..WHITE SCISSORS
2705 ; UNASSIGNED # <reserved>
2706..2709 ; DISALLOWED # TELEPHONE LOCATION SIGN..ENVELOPE
270A..270B ; UNASSIGNED # <reserved>..<reserved>
270C..2727 ; DISALLOWED # VICTORY HAND..WHITE FOUR POINTED STAR
2728 ; UNASSIGNED # <reserved>
2729..274B ; DISALLOWED # STRESS OUTLINED WHITE STAR..HEAVY EIGHT TEAR
274C ; UNASSIGNED # <reserved>
274D ; DISALLOWED # SHADOWED WHITE CIRCLE
274E ; UNASSIGNED # <reserved>
274F..2752 ; DISALLOWED # LOWER RIGHT DROP-SHADOWED WHITE SQUARE..UPPE
2753..2755 ; UNASSIGNED # <reserved>..<reserved>
2756..275E ; DISALLOWED # BLACK DIAMOND MINUS WHITE X..HEAVY DOUBLE CO
275F..2760 ; UNASSIGNED # <reserved>..<reserved>
2761..2794 ; DISALLOWED # CURVED STEM PARAGRAPH SIGN ORNAMENT..HEAVY W
2795..2797 ; UNASSIGNED # <reserved>..<reserved>
2798..27AF ; DISALLOWED # HEAVY SOUTH EAST ARROW..NOTCHED LOWER RIGHT-
27B0 ; UNASSIGNED # <reserved>
27B1..27BE ; DISALLOWED # NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARD
27BF ; UNASSIGNED # <reserved>
27C0..27CA ; DISALLOWED # THREE DIMENSIONAL ANGLE..VERTICAL BAR WITH H
27CB ; UNASSIGNED # <reserved>
27CC ; DISALLOWED # LONG DIVISION
27CD..27CF ; UNASSIGNED # <reserved>..<reserved>
27D0..2B4C ; DISALLOWED # WHITE DIAMOND WITH CENTRED DOT..RIGHTWARDS A
2B4D..2B4F ; UNASSIGNED # <reserved>..<reserved>
2B50..2B59 ; DISALLOWED # WHITE MEDIUM STAR..HEAVY CIRCLED SALTIRE
2B5A..2BFF ; UNASSIGNED # <reserved>..<reserved>
2C00..2C2E ; DISALLOWED # GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CA
2C2F ; UNASSIGNED # <reserved>
2C30..2C5E ; PVALID # GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMAL
2C5F ; UNASSIGNED # <reserved>
2C60 ; DISALLOWED # LATIN CAPITAL LETTER L WITH DOUBLE BAR
2C61 ; PVALID # LATIN SMALL LETTER L WITH DOUBLE BAR
2C62..2C64 ; DISALLOWED # LATIN CAPITAL LETTER L WITH MIDDLE TILDE..LA
2C65..2C66 ; PVALID # LATIN SMALL LETTER A WITH STROKE..LATIN SMAL
2C67 ; DISALLOWED # LATIN CAPITAL LETTER H WITH DESCENDER
2C68 ; PVALID # LATIN SMALL LETTER H WITH DESCENDER
2C69 ; DISALLOWED # LATIN CAPITAL LETTER K WITH DESCENDER
2C6A ; PVALID # LATIN SMALL LETTER K WITH DESCENDER
2C6B ; DISALLOWED # LATIN CAPITAL LETTER Z WITH DESCENDER
2C6C ; PVALID # LATIN SMALL LETTER Z WITH DESCENDER
2C6D..2C70 ; DISALLOWED # LATIN CAPITAL LETTER ALPHA..LATIN CAPITAL LE
2C71 ; PVALID # LATIN SMALL LETTER V WITH RIGHT HOOK
2C72 ; DISALLOWED # LATIN CAPITAL LETTER W WITH HOOK
2C73..2C74 ; PVALID # LATIN SMALL LETTER W WITH HOOK..LATIN SMALL
2C75 ; DISALLOWED # LATIN CAPITAL LETTER HALF H
2C76..2C7B ; PVALID # LATIN SMALL LETTER HALF H..LATIN LETTER SMAL
2C7C..2C80 ; DISALLOWED # LATIN SUBSCRIPT SMALL LETTER J..COPTIC CAPIT
2C81 ; PVALID # COPTIC SMALL LETTER ALFA
2C82 ; DISALLOWED # COPTIC CAPITAL LETTER VIDA
2C83 ; PVALID # COPTIC SMALL LETTER VIDA
2C84 ; DISALLOWED # COPTIC CAPITAL LETTER GAMMA
2C85 ; PVALID # COPTIC SMALL LETTER GAMMA
2C86 ; DISALLOWED # COPTIC CAPITAL LETTER DALDA
2C87 ; PVALID # COPTIC SMALL LETTER DALDA
2C88 ; DISALLOWED # COPTIC CAPITAL LETTER EIE
2C89 ; PVALID # COPTIC SMALL LETTER EIE
2C8A ; DISALLOWED # COPTIC CAPITAL LETTER SOU
2C8B ; PVALID # COPTIC SMALL LETTER SOU
2C8C ; DISALLOWED # COPTIC CAPITAL LETTER ZATA
2C8D ; PVALID # COPTIC SMALL LETTER ZATA
2C8E ; DISALLOWED # COPTIC CAPITAL LETTER HATE
2C8F ; PVALID # COPTIC SMALL LETTER HATE
2C90 ; DISALLOWED # COPTIC CAPITAL LETTER THETHE
2C91 ; PVALID # COPTIC SMALL LETTER THETHE
2C92 ; DISALLOWED # COPTIC CAPITAL LETTER IAUDA
2C93 ; PVALID # COPTIC SMALL LETTER IAUDA
2C94 ; DISALLOWED # COPTIC CAPITAL LETTER KAPA
2C95 ; PVALID # COPTIC SMALL LETTER KAPA
2C96 ; DISALLOWED # COPTIC CAPITAL LETTER LAULA
2C97 ; PVALID # COPTIC SMALL LETTER LAULA
2C98 ; DISALLOWED # COPTIC CAPITAL LETTER MI
2C99 ; PVALID # COPTIC SMALL LETTER MI
2C9A ; DISALLOWED # COPTIC CAPITAL LETTER NI
2C9B ; PVALID # COPTIC SMALL LETTER NI
2C9C ; DISALLOWED # COPTIC CAPITAL LETTER KSI
2C9D ; PVALID # COPTIC SMALL LETTER KSI
2C9E ; DISALLOWED # COPTIC CAPITAL LETTER O
2C9F ; PVALID # COPTIC SMALL LETTER O
2CA0 ; DISALLOWED # COPTIC CAPITAL LETTER PI
2CA1 ; PVALID # COPTIC SMALL LETTER PI
2CA2 ; DISALLOWED # COPTIC CAPITAL LETTER RO
2CA3 ; PVALID # COPTIC SMALL LETTER RO
2CA4 ; DISALLOWED # COPTIC CAPITAL LETTER SIMA
2CA5 ; PVALID # COPTIC SMALL LETTER SIMA
2CA6 ; DISALLOWED # COPTIC CAPITAL LETTER TAU
2CA7 ; PVALID # COPTIC SMALL LETTER TAU
2CA8 ; DISALLOWED # COPTIC CAPITAL LETTER UA
2CA9 ; PVALID # COPTIC SMALL LETTER UA
2CAA ; DISALLOWED # COPTIC CAPITAL LETTER FI
2CAB ; PVALID # COPTIC SMALL LETTER FI
2CAC ; DISALLOWED # COPTIC CAPITAL LETTER KHI
2CAD ; PVALID # COPTIC SMALL LETTER KHI
2CAE ; DISALLOWED # COPTIC CAPITAL LETTER PSI
2CAF ; PVALID # COPTIC SMALL LETTER PSI
2CB0 ; DISALLOWED # COPTIC CAPITAL LETTER OOU
2CB1 ; PVALID # COPTIC SMALL LETTER OOU
2CB2 ; DISALLOWED # COPTIC CAPITAL LETTER DIALECT-P ALEF
2CB3 ; PVALID # COPTIC SMALL LETTER DIALECT-P ALEF
2CB4 ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC AIN
2CB5 ; PVALID # COPTIC SMALL LETTER OLD COPTIC AIN
2CB6 ; DISALLOWED # COPTIC CAPITAL LETTER CRYPTOGRAMMIC EIE
2CB7 ; PVALID # COPTIC SMALL LETTER CRYPTOGRAMMIC EIE
2CB8 ; DISALLOWED # COPTIC CAPITAL LETTER DIALECT-P KAPA
2CB9 ; PVALID # COPTIC SMALL LETTER DIALECT-P KAPA
2CBA ; DISALLOWED # COPTIC CAPITAL LETTER DIALECT-P NI
2CBB ; PVALID # COPTIC SMALL LETTER DIALECT-P NI
2CBC ; DISALLOWED # COPTIC CAPITAL LETTER CRYPTOGRAMMIC NI
2CBD ; PVALID # COPTIC SMALL LETTER CRYPTOGRAMMIC NI
2CBE ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC OOU
2CBF ; PVALID # COPTIC SMALL LETTER OLD COPTIC OOU
2CC0 ; DISALLOWED # COPTIC CAPITAL LETTER SAMPI
2CC1 ; PVALID # COPTIC SMALL LETTER SAMPI
2CC2 ; DISALLOWED # COPTIC CAPITAL LETTER CROSSED SHEI
2CC3 ; PVALID # COPTIC SMALL LETTER CROSSED SHEI
2CC4 ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC SHEI
2CC5 ; PVALID # COPTIC SMALL LETTER OLD COPTIC SHEI
2CC6 ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC ESH
2CC7 ; PVALID # COPTIC SMALL LETTER OLD COPTIC ESH
2CC8 ; DISALLOWED # COPTIC CAPITAL LETTER AKHMIMIC KHEI
2CC9 ; PVALID # COPTIC SMALL LETTER AKHMIMIC KHEI
2CCA ; DISALLOWED # COPTIC CAPITAL LETTER DIALECT-P HORI
2CCB ; PVALID # COPTIC SMALL LETTER DIALECT-P HORI
2CCC ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC HORI
2CCD ; PVALID # COPTIC SMALL LETTER OLD COPTIC HORI
2CCE ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC HA
2CCF ; PVALID # COPTIC SMALL LETTER OLD COPTIC HA
2CD0 ; DISALLOWED # COPTIC CAPITAL LETTER L-SHAPED HA
2CD1 ; PVALID # COPTIC SMALL LETTER L-SHAPED HA
2CD2 ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC HEI
2CD3 ; PVALID # COPTIC SMALL LETTER OLD COPTIC HEI
2CD4 ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC HAT
2CD5 ; PVALID # COPTIC SMALL LETTER OLD COPTIC HAT
2CD6 ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC GANGIA
2CD7 ; PVALID # COPTIC SMALL LETTER OLD COPTIC GANGIA
2CD8 ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC DJA
2CD9 ; PVALID # COPTIC SMALL LETTER OLD COPTIC DJA
2CDA ; DISALLOWED # COPTIC CAPITAL LETTER OLD COPTIC SHIMA
2CDB ; PVALID # COPTIC SMALL LETTER OLD COPTIC SHIMA
2CDC ; DISALLOWED # COPTIC CAPITAL LETTER OLD NUBIAN SHIMA
2CDD ; PVALID # COPTIC SMALL LETTER OLD NUBIAN SHIMA
2CDE ; DISALLOWED # COPTIC CAPITAL LETTER OLD NUBIAN NGI
2CDF ; PVALID # COPTIC SMALL LETTER OLD NUBIAN NGI
2CE0 ; DISALLOWED # COPTIC CAPITAL LETTER OLD NUBIAN NYI
2CE1 ; PVALID # COPTIC SMALL LETTER OLD NUBIAN NYI
2CE2 ; DISALLOWED # COPTIC CAPITAL LETTER OLD NUBIAN WAU
2CE3..2CE4 ; PVALID # COPTIC SMALL LETTER OLD NUBIAN WAU..COPTIC S
2CE5..2CEB ; DISALLOWED # COPTIC SYMBOL MI RO..COPTIC CAPITAL LETTER C
2CEC ; PVALID # COPTIC SMALL LETTER CRYPTOGRAMMIC SHEI
2CED ; DISALLOWED # COPTIC CAPITAL LETTER CRYPTOGRAMMIC GANGIA
2CEE..2CF1 ; PVALID # COPTIC SMALL LETTER CRYPTOGRAMMIC GANGIA..CO
2CF2..2CF8 ; UNASSIGNED # <reserved>..<reserved>
2CF9..2CFF ; DISALLOWED # COPTIC OLD NUBIAN FULL STOP..COPTIC MORPHOLO
2D00..2D25 ; PVALID # GEORGIAN SMALL LETTER AN..GEORGIAN SMALL LET
2D26..2D2F ; UNASSIGNED # <reserved>..<reserved>
2D30..2D65 ; PVALID # TIFINAGH LETTER YA..TIFINAGH LETTER YAZZ
2D66..2D6E ; UNASSIGNED # <reserved>..<reserved>
2D6F ; DISALLOWED # TIFINAGH MODIFIER LETTER LABIALIZATION MARK
2D70..2D7F ; UNASSIGNED # <reserved>..<reserved>
2D80..2D96 ; PVALID # ETHIOPIC SYLLABLE LOA..ETHIOPIC SYLLABLE GGW
2D97..2D9F ; UNASSIGNED # <reserved>..<reserved>
2DA0..2DA6 ; PVALID # ETHIOPIC SYLLABLE SSA..ETHIOPIC SYLLABLE SSO
2DA7 ; UNASSIGNED # <reserved>
2DA8..2DAE ; PVALID # ETHIOPIC SYLLABLE CCA..ETHIOPIC SYLLABLE CCO
2DAF ; UNASSIGNED # <reserved>
2DB0..2DB6 ; PVALID # ETHIOPIC SYLLABLE ZZA..ETHIOPIC SYLLABLE ZZO
2DB7 ; UNASSIGNED # <reserved>
2DB8..2DBE ; PVALID # ETHIOPIC SYLLABLE CCHA..ETHIOPIC SYLLABLE CC
2DBF ; UNASSIGNED # <reserved>
2DC0..2DC6 ; PVALID # ETHIOPIC SYLLABLE QYA..ETHIOPIC SYLLABLE QYO
2DC7 ; UNASSIGNED # <reserved>
2DC8..2DCE ; PVALID # ETHIOPIC SYLLABLE KYA..ETHIOPIC SYLLABLE KYO
2DCF ; UNASSIGNED # <reserved>
2DD0..2DD6 ; PVALID # ETHIOPIC SYLLABLE XYA..ETHIOPIC SYLLABLE XYO
2DD7 ; UNASSIGNED # <reserved>
2DD8..2DDE ; PVALID # ETHIOPIC SYLLABLE GYA..ETHIOPIC SYLLABLE GYO
2DDF ; UNASSIGNED # <reserved>
2DE0..2DFF ; PVALID # COMBINING CYRILLIC LETTER BE..COMBINING CYRI
2E00..2E2E ; DISALLOWED # RIGHT ANGLE SUBSTITUTION MARKER..REVERSED QU
2E2F ; PVALID # VERTICAL TILDE
2E30..2E31 ; DISALLOWED # RING POINT..WORD SEPARATOR MIDDLE DOT
2E32..2E7F ; UNASSIGNED # <reserved>..<reserved>
2E80..2E99 ; DISALLOWED # CJK RADICAL REPEAT..CJK RADICAL RAP
2E9A ; UNASSIGNED # <reserved>
2E9B..2EF3 ; DISALLOWED # CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED
2EF4..2EFF ; UNASSIGNED # <reserved>..<reserved>
2F00..2FD5 ; DISALLOWED # KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
2FD6..2FEF ; UNASSIGNED # <reserved>..<reserved>
2FF0..2FFB ; DISALLOWED # IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RI
2FFC..2FFF ; UNASSIGNED # <reserved>..<reserved>
3000..3004 ; DISALLOWED # IDEOGRAPHIC SPACE..JAPANESE INDUSTRIAL STAND
3005..3007 ; PVALID # IDEOGRAPHIC ITERATION MARK..IDEOGRAPHIC NUMB
3008..3029 ; DISALLOWED # LEFT ANGLE BRACKET..HANGZHOU NUMERAL NINE
302A..302D ; PVALID # IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENT
302E..303B ; DISALLOWED # HANGUL SINGLE DOT TONE MARK..VERTICAL IDEOGR
303C ; PVALID # MASU MARK
303D..303F ; DISALLOWED # PART ALTERNATION MARK..IDEOGRAPHIC HALF FILL
3040 ; UNASSIGNED # <reserved>
3041..3096 ; PVALID # HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMA
3097..3098 ; UNASSIGNED # <reserved>..<reserved>
3099..309A ; PVALID # COMBINING KATAKANA-HIRAGANA VOICED SOUND MAR
309B..309C ; DISALLOWED # KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKAN
309D..309E ; PVALID # HIRAGANA ITERATION MARK..HIRAGANA VOICED ITE
309F..30A0 ; DISALLOWED # HIRAGANA DIGRAPH YORI..KATAKANA-HIRAGANA DOU
30A1..30FA ; PVALID # KATAKANA LETTER SMALL A..KATAKANA LETTER VO
30FB ; CONTEXTO # KATAKANA MIDDLE DOT
30FC..30FE ; PVALID # KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATA
30FF ; DISALLOWED # KATAKANA DIGRAPH KOTO
3100..3104 ; UNASSIGNED # <reserved>..<reserved>
3105..312D ; PVALID # BOPOMOFO LETTER B..BOPOMOFO LETTER IH
312E..3130 ; UNASSIGNED # <reserved>..<reserved>
3131..318E ; DISALLOWED # HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
318F ; UNASSIGNED # <reserved>
3190..319F ; DISALLOWED # IDEOGRAPHIC ANNOTATION LINKING MARK..IDEOGRA
31A0..31B7 ; PVALID # BOPOMOFO LETTER BU..BOPOMOFO FINAL LETTER H
31B8..31BF ; UNASSIGNED # <reserved>..<reserved>
31C0..31E3 ; DISALLOWED # CJK STROKE T..CJK STROKE Q
31E4..31EF ; UNASSIGNED # <reserved>..<reserved>
31F0..31FF ; PVALID # KATAKANA LETTER SMALL KU..KATAKANA LETTER SM
3200..321E ; DISALLOWED # PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED K
321F ; UNASSIGNED # <reserved>
3220..32FE ; DISALLOWED # PARENTHESIZED IDEOGRAPH ONE..CIRCLED KATAKAN
32FF ; UNASSIGNED # <reserved>
3300..33FF ; DISALLOWED # SQUARE APAATO..SQUARE GAL
3400..4DB5 ; PVALID # <CJK Ideograph Extension A>..<CJK Ideograph
4DB6..4DBF ; UNASSIGNED # <reserved>..<reserved>
4DC0..4DFF ; DISALLOWED # HEXAGRAM FOR THE CREATIVE HEAVEN..HEXAGRAM F
4E00..9FCB ; PVALID # <CJK Ideograph>..<CJK Ideograph>
9FCC..9FFF ; UNASSIGNED # <reserved>..<reserved>
A000..A48C ; PVALID # YI SYLLABLE IT..YI SYLLABLE YYR
A48D..A48F ; UNASSIGNED # <reserved>..<reserved>
A490..A4C6 ; DISALLOWED # YI RADICAL QOT..YI RADICAL KE
A4C7..A4CF ; UNASSIGNED # <reserved>..<reserved>
A4D0..A4FD ; PVALID # LISU LETTER BA..LISU LETTER TONE MYA JEU
A4FE..A4FF ; DISALLOWED # LISU PUNCTUATION COMMA..LISU PUNCTUATION FUL
A500..A60C ; PVALID # VAI SYLLABLE EE..VAI SYLLABLE LENGTHENER
A60D..A60F ; DISALLOWED # VAI COMMA..VAI QUESTION MARK
A610..A62B ; PVALID # VAI SYLLABLE NDOLE FA..VAI SYLLABLE NDOLE DO
A62C..A63F ; UNASSIGNED # <reserved>..<reserved>
A640 ; DISALLOWED # CYRILLIC CAPITAL LETTER ZEMLYA
A641 ; PVALID # CYRILLIC SMALL LETTER ZEMLYA
A642 ; DISALLOWED # CYRILLIC CAPITAL LETTER DZELO
A643        ; PVALID      # CYRILLIC SMALL LETTER DZELO
A644        ; DISALLOWED  # CYRILLIC CAPITAL LETTER REVERSED DZE
A645        ; PVALID      # CYRILLIC SMALL LETTER REVERSED DZE
A646        ; DISALLOWED  # CYRILLIC CAPITAL LETTER IOTA
A647        ; PVALID      # CYRILLIC SMALL LETTER IOTA
A648        ; DISALLOWED  # CYRILLIC CAPITAL LETTER DJERV
A649        ; PVALID      # CYRILLIC SMALL LETTER DJERV
A64A        ; DISALLOWED  # CYRILLIC CAPITAL LETTER MONOGRAPH UK
A64B        ; PVALID      # CYRILLIC SMALL LETTER MONOGRAPH UK
A64C        ; DISALLOWED  # CYRILLIC CAPITAL LETTER BROAD OMEGA
A64D        ; PVALID      # CYRILLIC SMALL LETTER BROAD OMEGA
A64E        ; DISALLOWED  # CYRILLIC CAPITAL LETTER NEUTRAL YER
A64F        ; PVALID      # CYRILLIC SMALL LETTER NEUTRAL YER
A650        ; DISALLOWED  # CYRILLIC CAPITAL LETTER YERU WITH BACK YER
A651        ; PVALID      # CYRILLIC SMALL LETTER YERU WITH BACK YER
A652        ; DISALLOWED  # CYRILLIC CAPITAL LETTER IOTIFIED YAT
A653        ; PVALID      # CYRILLIC SMALL LETTER IOTIFIED YAT
A654        ; DISALLOWED  # CYRILLIC CAPITAL LETTER REVERSED YU
A655        ; PVALID      # CYRILLIC SMALL LETTER REVERSED YU
A656        ; DISALLOWED  # CYRILLIC CAPITAL LETTER IOTIFIED A
A657        ; PVALID      # CYRILLIC SMALL LETTER IOTIFIED A
A658        ; DISALLOWED  # CYRILLIC CAPITAL LETTER CLOSED LITTLE YUS
A659        ; PVALID      # CYRILLIC SMALL LETTER CLOSED LITTLE YUS
A65A        ; DISALLOWED  # CYRILLIC CAPITAL LETTER BLENDED YUS
A65B        ; PVALID      # CYRILLIC SMALL LETTER BLENDED YUS
A65C        ; DISALLOWED  # CYRILLIC CAPITAL LETTER IOTIFIED CLOSED LITT
A65D        ; PVALID      # CYRILLIC SMALL LETTER IOTIFIED CLOSED LITTLE
A65E        ; DISALLOWED  # CYRILLIC CAPITAL LETTER YN
A65F        ; PVALID      # CYRILLIC SMALL LETTER YN
A660..A661  ; UNASSIGNED  # <reserved>..<reserved>
A662        ; DISALLOWED  # CYRILLIC CAPITAL LETTER SOFT DE
A663        ; PVALID      # CYRILLIC SMALL LETTER SOFT DE
A664        ; DISALLOWED  # CYRILLIC CAPITAL LETTER SOFT EL
A665        ; PVALID      # CYRILLIC SMALL LETTER SOFT EL
A666        ; DISALLOWED  # CYRILLIC CAPITAL LETTER SOFT EM
A667        ; PVALID      # CYRILLIC SMALL LETTER SOFT EM
A668        ; DISALLOWED  # CYRILLIC CAPITAL LETTER MONOCULAR O
A669        ; PVALID      # CYRILLIC SMALL LETTER MONOCULAR O
A66A        ; DISALLOWED  # CYRILLIC CAPITAL LETTER BINOCULAR O
A66B        ; PVALID      # CYRILLIC SMALL LETTER BINOCULAR O
A66C        ; DISALLOWED  # CYRILLIC CAPITAL LETTER DOUBLE MONOCULAR O
A66D..A66F  ; PVALID      # CYRILLIC SMALL LETTER DOUBLE MONOCULAR O..CO
A670..A673  ; DISALLOWED  # COMBINING CYRILLIC TEN MILLIONS SIGN..SLAVON
A674..A67B  ; UNASSIGNED  # <reserved>..<reserved>
A67C..A67D  ; PVALID      # COMBINING CYRILLIC KAVYKA..COMBINING CYRILLI
A67E        ; DISALLOWED  # CYRILLIC KAVYKA
A67F        ; PVALID      # CYRILLIC PAYEROK
A680        ; DISALLOWED  # CYRILLIC CAPITAL LETTER DWE
A681        ; PVALID      # CYRILLIC SMALL LETTER DWE
A682        ; DISALLOWED  # CYRILLIC CAPITAL LETTER DZWE
A683        ; PVALID      # CYRILLIC SMALL LETTER DZWE
A684        ; DISALLOWED  # CYRILLIC CAPITAL LETTER ZHWE
A685        ; PVALID      # CYRILLIC SMALL LETTER ZHWE
A686        ; DISALLOWED  # CYRILLIC CAPITAL LETTER CCHE
A687        ; PVALID      # CYRILLIC SMALL LETTER CCHE
A688        ; DISALLOWED  # CYRILLIC CAPITAL LETTER DZZE
A689        ; PVALID      # CYRILLIC SMALL LETTER DZZE
A68A        ; DISALLOWED  # CYRILLIC CAPITAL LETTER TE WITH MIDDLE HOOK
A68B        ; PVALID      # CYRILLIC SMALL LETTER TE WITH MIDDLE HOOK
A68C        ; DISALLOWED  # CYRILLIC CAPITAL LETTER TWE
A68D        ; PVALID      # CYRILLIC SMALL LETTER TWE
A68E        ; DISALLOWED  # CYRILLIC CAPITAL LETTER TSWE
A68F        ; PVALID      # CYRILLIC SMALL LETTER TSWE
A690        ; DISALLOWED  # CYRILLIC CAPITAL LETTER TSSE
A691        ; PVALID      # CYRILLIC SMALL LETTER TSSE
A692        ; DISALLOWED  # CYRILLIC CAPITAL LETTER TCHE
A693        ; PVALID      # CYRILLIC SMALL LETTER TCHE
A694        ; DISALLOWED  # CYRILLIC CAPITAL LETTER HWE
A695        ; PVALID      # CYRILLIC SMALL LETTER HWE
A696        ; DISALLOWED  # CYRILLIC CAPITAL LETTER SHWE
A697        ; PVALID      # CYRILLIC SMALL LETTER SHWE
A698..A69F  ; UNASSIGNED  # <reserved>..<reserved>
A6A0..A6E5  ; PVALID      # BAMUM LETTER A..BAMUM LETTER KI
A6E6..A6EF  ; DISALLOWED  # BAMUM LETTER MO..BAMUM LETTER KOGHOM
A6F0..A6F1  ; PVALID      # BAMUM COMBINING MARK KOQNDON..BAMUM COMBININ
A6F2..A6F7  ; DISALLOWED  # BAMUM NJAEMLI..BAMUM QUESTION MARK
A6F8..A6FF  ; UNASSIGNED  # <reserved>..<reserved>
A700..A716  ; DISALLOWED  # MODIFIER LETTER CHINESE TONE YIN PING..MODIF
A717..A71F  ; PVALID      # MODIFIER LETTER DOT VERTICAL BAR..MODIFIER L
A720..A722  ; DISALLOWED  # MODIFIER LETTER STRESS AND HIGH TONE..LATIN
A723        ; PVALID      # LATIN SMALL LETTER EGYPTOLOGICAL ALEF
A724        ; DISALLOWED  # LATIN CAPITAL LETTER EGYPTOLOGICAL AIN
A725        ; PVALID      # LATIN SMALL LETTER EGYPTOLOGICAL AIN
A726        ; DISALLOWED  # LATIN CAPITAL LETTER HENG
A727        ; PVALID      # LATIN SMALL LETTER HENG
A728        ; DISALLOWED  # LATIN CAPITAL LETTER TZ
A729        ; PVALID      # LATIN SMALL LETTER TZ
A72A        ; DISALLOWED  # LATIN CAPITAL LETTER TRESILLO
A72B        ; PVALID      # LATIN SMALL LETTER TRESILLO
A72C        ; DISALLOWED  # LATIN CAPITAL LETTER CUATRILLO
A72D        ; PVALID      # LATIN SMALL LETTER CUATRILLO
A72E        ; DISALLOWED  # LATIN CAPITAL LETTER CUATRILLO WITH COMMA
A72F..A731  ; PVALID      # LATIN SMALL LETTER CUATRILLO WITH COMMA..LAT
A732        ; DISALLOWED  # LATIN CAPITAL LETTER AA
A733        ; PVALID      # LATIN SMALL LETTER AA
A734        ; DISALLOWED  # LATIN CAPITAL LETTER AO
A735        ; PVALID      # LATIN SMALL LETTER AO
A736        ; DISALLOWED  # LATIN CAPITAL LETTER AU
A737        ; PVALID      # LATIN SMALL LETTER AU
A738        ; DISALLOWED  # LATIN CAPITAL LETTER AV
A739        ; PVALID      # LATIN SMALL LETTER AV
A73A        ; DISALLOWED  # LATIN CAPITAL LETTER AV WITH HORIZONTAL BAR
A73B        ; PVALID      # LATIN SMALL LETTER AV WITH HORIZONTAL BAR
A73C        ; DISALLOWED  # LATIN CAPITAL LETTER AY
A73D        ; PVALID      # LATIN SMALL LETTER AY
A73E        ; DISALLOWED  # LATIN CAPITAL LETTER REVERSED C WITH DOT
A73F        ; PVALID      # LATIN SMALL LETTER REVERSED C WITH DOT
A740        ; DISALLOWED  # LATIN CAPITAL LETTER K WITH STROKE
A741        ; PVALID      # LATIN SMALL LETTER K WITH STROKE
A742        ; DISALLOWED  # LATIN CAPITAL LETTER K WITH DIAGONAL STROKE
A743        ; PVALID      # LATIN SMALL LETTER K WITH DIAGONAL STROKE
A744        ; DISALLOWED  # LATIN CAPITAL LETTER K WITH STROKE AND DIAGO
A745        ; PVALID      # LATIN SMALL LETTER K WITH STROKE AND DIAGONA
A746        ; DISALLOWED  # LATIN CAPITAL LETTER BROKEN L
A747        ; PVALID      # LATIN SMALL LETTER BROKEN L
A748        ; DISALLOWED  # LATIN CAPITAL LETTER L WITH HIGH STROKE
A749        ; PVALID      # LATIN SMALL LETTER L WITH HIGH STROKE
A74A        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH LONG STROKE OVER
A74B        ; PVALID      # LATIN SMALL LETTER O WITH LONG STROKE OVERLA
A74C        ; DISALLOWED  # LATIN CAPITAL LETTER O WITH LOOP
A74D        ; PVALID      # LATIN SMALL LETTER O WITH LOOP
A74E        ; DISALLOWED  # LATIN CAPITAL LETTER OO
A74F        ; PVALID      # LATIN SMALL LETTER OO
A750        ; DISALLOWED  # LATIN CAPITAL LETTER P WITH STROKE THROUGH D
A751        ; PVALID      # LATIN SMALL LETTER P WITH STROKE THROUGH DES
A752        ; DISALLOWED  # LATIN CAPITAL LETTER P WITH FLOURISH
A753        ; PVALID      # LATIN SMALL LETTER P WITH FLOURISH
A754        ; DISALLOWED  # LATIN CAPITAL LETTER P WITH SQUIRREL TAIL
A755        ; PVALID      # LATIN SMALL LETTER P WITH SQUIRREL TAIL
A756        ; DISALLOWED  # LATIN CAPITAL LETTER Q WITH STROKE THROUGH D
A757        ; PVALID      # LATIN SMALL LETTER Q WITH STROKE THROUGH DES
A758        ; DISALLOWED  # LATIN CAPITAL LETTER Q WITH DIAGONAL STROKE
A759        ; PVALID      # LATIN SMALL LETTER Q WITH DIAGONAL STROKE
A75A        ; DISALLOWED  # LATIN CAPITAL LETTER R ROTUNDA
A75B        ; PVALID      # LATIN SMALL LETTER R ROTUNDA
A75C        ; DISALLOWED  # LATIN CAPITAL LETTER RUM ROTUNDA
A75D        ; PVALID      # LATIN SMALL LETTER RUM ROTUNDA
A75E        ; DISALLOWED  # LATIN CAPITAL LETTER V WITH DIAGONAL STROKE
A75F        ; PVALID      # LATIN SMALL LETTER V WITH DIAGONAL STROKE
A760        ; DISALLOWED  # LATIN CAPITAL LETTER VY
A761        ; PVALID      # LATIN SMALL LETTER VY
A762        ; DISALLOWED  # LATIN CAPITAL LETTER VISIGOTHIC Z
A763        ; PVALID      # LATIN SMALL LETTER VISIGOTHIC Z
A764        ; DISALLOWED  # LATIN CAPITAL LETTER THORN WITH STROKE
A765        ; PVALID      # LATIN SMALL LETTER THORN WITH STROKE
A766        ; DISALLOWED  # LATIN CAPITAL LETTER THORN WITH STROKE THROU
A767        ; PVALID      # LATIN SMALL LETTER THORN WITH STROKE THROUGH
A768        ; DISALLOWED  # LATIN CAPITAL LETTER VEND
A769        ; PVALID      # LATIN SMALL LETTER VEND
A76A        ; DISALLOWED  # LATIN CAPITAL LETTER ET
A76B        ; PVALID      # LATIN SMALL LETTER ET
A76C        ; DISALLOWED  # LATIN CAPITAL LETTER IS
A76D        ; PVALID      # LATIN SMALL LETTER IS
A76E        ; DISALLOWED  # LATIN CAPITAL LETTER CON
A76F        ; PVALID      # LATIN SMALL LETTER CON
A770        ; DISALLOWED  # MODIFIER LETTER US
A771..A778  ; PVALID      # LATIN SMALL LETTER DUM..LATIN SMALL LETTER U
A779        ; DISALLOWED  # LATIN CAPITAL LETTER INSULAR D
A77A        ; PVALID      # LATIN SMALL LETTER INSULAR D
A77B        ; DISALLOWED  # LATIN CAPITAL LETTER INSULAR F
A77C        ; PVALID      # LATIN SMALL LETTER INSULAR F
A77D..A77E  ; DISALLOWED  # LATIN CAPITAL LETTER INSULAR G..LATIN CAPITA
A77F        ; PVALID      # LATIN SMALL LETTER TURNED INSULAR G
A780        ; DISALLOWED  # LATIN CAPITAL LETTER TURNED L
A781        ; PVALID      # LATIN SMALL LETTER TURNED L
A782        ; DISALLOWED  # LATIN CAPITAL LETTER INSULAR R
A783        ; PVALID      # LATIN SMALL LETTER INSULAR R
A784        ; DISALLOWED  # LATIN CAPITAL LETTER INSULAR S
A785        ; PVALID      # LATIN SMALL LETTER INSULAR S
A786        ; DISALLOWED  # LATIN CAPITAL LETTER INSULAR T
A787..A788  ; PVALID      # LATIN SMALL LETTER INSULAR T..MODIFIER LETTE
A789..A78B  ; DISALLOWED  # MODIFIER LETTER COLON..LATIN CAPITAL LETTER
A78C        ; PVALID      # LATIN SMALL LETTER SALTILLO
A78D..A7FA  ; UNASSIGNED  # <reserved>..<reserved>
A7FB..A827  ; PVALID      # LATIN EPIGRAPHIC LETTER REVERSED F..SYLOTI N
A828..A82B  ; DISALLOWED  # SYLOTI NAGRI POETRY MARK-1..SYLOTI NAGRI POE
A82C..A82F  ; UNASSIGNED  # <reserved>..<reserved>
A830..A839  ; DISALLOWED  # NORTH INDIC FRACTION ONE QUARTER..NORTH INDI
A83A..A83F  ; UNASSIGNED  # <reserved>..<reserved>
A840..A873  ; PVALID      # PHAGS-PA LETTER KA..PHAGS-PA LETTER CANDRABI
A874..A877  ; DISALLOWED  # PHAGS-PA SINGLE HEAD MARK..PHAGS-PA MARK DOU
A878..A87F  ; UNASSIGNED  # <reserved>..<reserved>
A880..A8C4  ; PVALID      # SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VI
A8C5..A8CD  ; UNASSIGNED  # <reserved>..<reserved>
A8CE..A8CF  ; DISALLOWED  # SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA
A8D0..A8D9  ; PVALID      # SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE
A8DA..A8DF  ; UNASSIGNED  # <reserved>..<reserved>
A8E0..A8F7  ; PVALID      # COMBINING DEVANAGARI DIGIT ZERO..DEVANAGARI
A8F8..A8FA  ; DISALLOWED  # DEVANAGARI SIGN PUSHPIKA..DEVANAGARI CARET
A8FB        ; PVALID      # DEVANAGARI HEADSTROKE
A8FC..A8FF  ; UNASSIGNED  # <reserved>..<reserved>
A900..A92D  ; PVALID      # KAYAH LI DIGIT ZERO..KAYAH LI TONE CALYA PLO
A92E..A92F  ; DISALLOWED  # KAYAH LI SIGN CWI..KAYAH LI SIGN SHYA
A930..A953  ; PVALID      # REJANG LETTER KA..REJANG VIRAMA
A954..A95E  ; UNASSIGNED  # <reserved>..<reserved>
A95F..A97C  ; DISALLOWED  # REJANG SECTION MARK..HANGUL CHOSEONG SSANGYE
A97D..A97F  ; UNASSIGNED  # <reserved>..<reserved>
A980..A9C0  ; PVALID      # JAVANESE SIGN PANYANGGA..JAVANESE PANGKON
A9C1..A9CD  ; DISALLOWED  # JAVANESE LEFT RERENGGAN..JAVANESE TURNED PAD
A9CE        ; UNASSIGNED  # <reserved>
A9CF..A9D9  ; PVALID      # JAVANESE PANGRANGKEP..JAVANESE DIGIT NINE
A9DA..A9DD  ; UNASSIGNED  # <reserved>..<reserved>
A9DE..A9DF  ; DISALLOWED  # JAVANESE PADA TIRTA TUMETES..JAVANESE PADA I
A9E0..A9FF  ; UNASSIGNED  # <reserved>..<reserved>
AA00..AA36  ; PVALID      # CHAM LETTER A..CHAM CONSONANT SIGN WA
AA37..AA3F  ; UNASSIGNED  # <reserved>..<reserved>
AA40..AA4D  ; PVALID      # CHAM LETTER FINAL K..CHAM CONSONANT SIGN FIN
AA4E..AA4F  ; UNASSIGNED  # <reserved>..<reserved>
AA50..AA59  ; PVALID      # CHAM DIGIT ZERO..CHAM DIGIT NINE
AA5A..AA5B  ; UNASSIGNED  # <reserved>..<reserved>
AA5C..AA5F  ; DISALLOWED  # CHAM PUNCTUATION SPIRAL..CHAM PUNCTUATION TR
AA60..AA76  ; PVALID      # MYANMAR LETTER KHAMTI GA..MYANMAR LOGOGRAM K
AA77..AA79  ; DISALLOWED  # MYANMAR SYMBOL AITON EXCLAMATION..MYANMAR SY
AA7A..AA7B  ; PVALID      # MYANMAR LETTER AITON RA..MYANMAR SIGN PAO KA
AA7C..AA7F  ; UNASSIGNED  # <reserved>..<reserved>
AA80..AAC2  ; PVALID      # TAI VIET LETTER LOW KO..TAI VIET TONE MAI SO
AAC3..AADA  ; UNASSIGNED  # <reserved>..<reserved>
AADB..AADD  ; PVALID      # TAI VIET SYMBOL KON..TAI VIET SYMBOL SAM
AADE..AADF  ; DISALLOWED  # TAI VIET SYMBOL HO HOI..TAI VIET SYMBOL KOI
AAE0..ABBF  ; UNASSIGNED  # <reserved>..<reserved>
ABC0..ABEA  ; PVALID      # MEETEI MAYEK LETTER KOK..MEETEI MAYEK VOWEL
ABEB        ; DISALLOWED  # MEETEI MAYEK CHEIKHEI
ABEC..ABED  ; PVALID      # MEETEI MAYEK LUM IYEK..MEETEI MAYEK APUN IYE
ABEE..ABEF  ; UNASSIGNED  # <reserved>..<reserved>
ABF0..ABF9  ; PVALID      # MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DIGIT
ABFA..ABFF  ; UNASSIGNED  # <reserved>..<reserved>
AC00..D7A3  ; PVALID      # <Hangul Syllable>..<Hangul Syllable>
D7A4..D7AF  ; UNASSIGNED  # <reserved>..<reserved>
D7B0..D7C6  ; DISALLOWED  # HANGUL JUNGSEONG O-YEO..HANGUL JUNGSEONG ARA
D7C7..D7CA  ; UNASSIGNED  # <reserved>..<reserved>
D7CB..D7FB  ; DISALLOWED  # HANGUL JONGSEONG NIEUN-RIEUL..HANGUL JONGSEO
D7FC..D7FF  ; UNASSIGNED  # <reserved>..<reserved>
D800..FA0D  ; DISALLOWED  # <Non Private Use High Surrogate>..CJK COMPAT
FA0E..FA0F  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA0E..CJK COMPAT
FA10        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA10
FA11        ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA11
FA12        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA12
FA13..FA14  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA13..CJK COMPAT
FA15..FA1E  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA15..CJK COMPAT
FA1F        ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA1F
FA20        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA20
FA21        ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA21
FA22        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA22
FA23..FA24  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA23..CJK COMPAT
FA25..FA26  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA25..CJK COMPAT
FA27..FA29  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA27..CJK COMPAT
FA2A..FA2D  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA2A..CJK COMPAT
FA2E..FA2F  ; UNASSIGNED  # <reserved>..<reserved>
FA30..FA6D  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA30..CJK COMPAT
FA6E..FA6F  ; UNASSIGNED  # <reserved>..<reserved>
FA70..FAD9  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPAT
FADA..FAFF  ; UNASSIGNED  # <reserved>..<reserved>
FB00..FB06  ; DISALLOWED  # LATIN SMALL LIGATURE FF..LATIN SMALL LIGATUR
FB07..FB12  ; UNASSIGNED  # <reserved>..<reserved>
FB13..FB17  ; DISALLOWED  # ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SM
FB18..FB1C  ; UNASSIGNED  # <reserved>..<reserved>
FB1D        ; DISALLOWED  # HEBREW LETTER YOD WITH HIRIQ
FB1E        ; PVALID      # HEBREW POINT JUDEO-SPANISH VARIKA
FB1F..FB36  ; DISALLOWED  # HEBREW LIGATURE YIDDISH YOD YOD PATAH..HEBRE
FB37        ; UNASSIGNED  # <reserved>
FB38..FB3C  ; DISALLOWED  # HEBREW LETTER TET WITH DAGESH..HEBREW LETTER
FB3D        ; UNASSIGNED  # <reserved>
FB3E        ; DISALLOWED  # HEBREW LETTER MEM WITH DAGESH
FB3F        ; UNASSIGNED  # <reserved>
FB40..FB41  ; DISALLOWED  # HEBREW LETTER NUN WITH DAGESH..HEBREW LETTER
FB42        ; UNASSIGNED  # <reserved>
FB43..FB44  ; DISALLOWED  # HEBREW LETTER FINAL PE WITH DAGESH..HEBREW L
FB45        ; UNASSIGNED  # <reserved>
FB46..FBB1  ; DISALLOWED  # HEBREW LETTER TSADI WITH DAGESH..ARABIC LETT
FBB2..FBD2  ; UNASSIGNED  # <reserved>..<reserved>
FBD3..FD3F  ; DISALLOWED  # ARABIC LETTER NG ISOLATED FORM..ORNATE RIGHT
FD40..FD4F  ; UNASSIGNED  # <reserved>..<reserved>
FD50..FD8F  ; DISALLOWED  # ARABIC LIGATURE TEH WITH JEEM WITH MEEM INIT
FD90..FD91  ; UNASSIGNED  # <reserved>..<reserved>
FD92..FDC7  ; DISALLOWED  # ARABIC LIGATURE MEEM WITH JEEM WITH KHAH INI
FDC8..FDCF  ; UNASSIGNED  # <reserved>..<reserved>
FDD0..FDFD  ; DISALLOWED  # <noncharacter>..ARABIC LIGATURE BISMILLAH AR
FDFE..FDFF  ; UNASSIGNED  # <reserved>..<reserved>
FE00..FE19  ; DISALLOWED  # VARIATION SELECTOR-1..PRESENTATION FORM FOR
FE1A..FE1F  ; UNASSIGNED  # <reserved>..<reserved>
FE20..FE26  ; PVALID      # COMBINING LIGATURE LEFT HALF..COMBINING CONJ
FE27..FE2F  ; UNASSIGNED  # <reserved>..<reserved>
FE30..FE52  ; DISALLOWED  # PRESENTATION FORM FOR VERTICAL TWO DOT LEADE
FE53        ; UNASSIGNED  # <reserved>
FE54..FE66  ; DISALLOWED  # SMALL SEMICOLON..SMALL EQUALS SIGN
FE67        ; UNASSIGNED  # <reserved>
FE68..FE6B  ; DISALLOWED  # SMALL REVERSE SOLIDUS..SMALL COMMERCIAL AT
FE6C..FE6F  ; UNASSIGNED  # <reserved>..<reserved>
FE70..FE72  ; DISALLOWED  # ARABIC FATHATAN ISOLATED FORM..ARABIC DAMMAT
FE73        ; PVALID      # ARABIC TAIL FRAGMENT
FE74        ; DISALLOWED  # ARABIC KASRATAN ISOLATED FORM
FE75        ; UNASSIGNED  # <reserved>
FE76..FEFC  ; DISALLOWED  # ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE
FEFD..FEFE  ; UNASSIGNED  # <reserved>..<reserved>
FEFF        ; DISALLOWED  # ZERO WIDTH NO-BREAK SPACE
FF00        ; UNASSIGNED  # <reserved>
FF01..FFBE  ; DISALLOWED  # FULLWIDTH EXCLAMATION MARK..HALFWIDTH HANGUL
FFBF..FFC1  ; UNASSIGNED  # <reserved>..<reserved>
FFC2..FFC7  ; DISALLOWED  # HALFWIDTH HANGUL LETTER A..HALFWIDTH HANGUL
FFC8..FFC9  ; UNASSIGNED  # <reserved>..<reserved>
FFCA..FFCF  ; DISALLOWED  # HALFWIDTH HANGUL LETTER YEO..HALFWIDTH HANGU
FFD0..FFD1  ; UNASSIGNED  # <reserved>..<reserved>
FFD2..FFD7  ; DISALLOWED  # HALFWIDTH HANGUL LETTER YO..HALFWIDTH HANGUL
FFD8..FFD9  ; UNASSIGNED  # <reserved>..<reserved>
FFDA..FFDC  ; DISALLOWED  # HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
FFDD..FFDF  ; UNASSIGNED  # <reserved>..<reserved>
FFE0..FFE6  ; DISALLOWED  # FULLWIDTH CENT SIGN..FULLWIDTH WON SIGN
FFE7        ; UNASSIGNED  # <reserved>
FFE8..FFEE  ; DISALLOWED  # HALFWIDTH FORMS LIGHT VERTICAL..HALFWIDTH WH
FFEF..FFF8  ; UNASSIGNED  # <reserved>..<reserved>
FFF9..FFFF  ; DISALLOWED  # INTERLINEAR ANNOTATION ANCHOR..<noncharacter
10000..1000B; PVALID      # LINEAR B SYLLABLE B008 A..LINEAR B SYLLABLE
1000C       ; UNASSIGNED  # <reserved>
1000D..10026; PVALID      # LINEAR B SYLLABLE B036 JO..LINEAR B SYLLABLE
10027       ; UNASSIGNED  # <reserved>
10028..1003A; PVALID      # LINEAR B SYLLABLE B060 RA..LINEAR B SYLLABLE
1003B       ; UNASSIGNED  # <reserved>
1003C..1003D; PVALID      # LINEAR B SYLLABLE B017 ZA..LINEAR B SYLLABLE
1003E       ; UNASSIGNED  # <reserved>
1003F..1004D; PVALID      # LINEAR B SYLLABLE B020 ZO..LINEAR B SYLLABLE
1004E..1004F; UNASSIGNED  # <reserved>..<reserved>
10050..1005D; PVALID      # LINEAR B SYMBOL B018..LINEAR B SYMBOL B089
1005E..1007F; UNASSIGNED  # <reserved>..<reserved>
10080..100FA; PVALID      # LINEAR B IDEOGRAM B100 MAN..LINEAR B IDEOGRA
100FB..100FF; UNASSIGNED  # <reserved>..<reserved>
10100..10102; DISALLOWED  # AEGEAN WORD SEPARATOR LINE..AEGEAN CHECK MAR
10103..10106; UNASSIGNED  # <reserved>..<reserved>
10107..10133; DISALLOWED  # AEGEAN NUMBER ONE..AEGEAN NUMBER NINETY THOU
10134..10136; UNASSIGNED  # <reserved>..<reserved>
10137..1018A; DISALLOWED  # AEGEAN WEIGHT BASE UNIT..GREEK ZERO SIGN
1018B..1018F; UNASSIGNED  # <reserved>..<reserved>
10190..1019B; DISALLOWED  # ROMAN SEXTANS SIGN..ROMAN CENTURIAL SIGN
1019C..101CF; UNASSIGNED  # <reserved>..<reserved>
101D0..101FC; DISALLOWED  # PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC
101FD       ; PVALID      # PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE
101FE..1027F; UNASSIGNED  # <reserved>..<reserved>
10280..1029C; PVALID      # LYCIAN LETTER A..LYCIAN LETTER X
1029D..1029F; UNASSIGNED  # <reserved>..<reserved>
102A0..102D0; PVALID      # CARIAN LETTER A..CARIAN LETTER UUU3
102D1..102FF; UNASSIGNED  # <reserved>..<reserved>
10300..1031E; PVALID      # OLD ITALIC LETTER A..OLD ITALIC LETTER UU
1031F       ; UNASSIGNED  # <reserved>
10320..10323; DISALLOWED  # OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL F
10324..1032F; UNASSIGNED  # <reserved>..<reserved>
10330..10340; PVALID      # GOTHIC LETTER AHSA..GOTHIC LETTER PAIRTHRA
10341       ; DISALLOWED  # GOTHIC LETTER NINETY
10342..10349; PVALID      # GOTHIC LETTER RAIDA..GOTHIC LETTER OTHAL
1034A       ; DISALLOWED  # GOTHIC LETTER NINE HUNDRED
1034B..1037F; UNASSIGNED  # <reserved>..<reserved>
10380..1039D; PVALID      # UGARITIC LETTER ALPA..UGARITIC LETTER SSU
1039E       ; UNASSIGNED  # <reserved>
1039F       ; DISALLOWED  # UGARITIC WORD DIVIDER
103A0..103C3; PVALID      # OLD PERSIAN SIGN A..OLD PERSIAN SIGN HA
103C4..103C7; UNASSIGNED  # <reserved>..<reserved>
103C8..103CF; PVALID      # OLD PERSIAN SIGN AURAMAZDAA..OLD PERSIAN SIG
103D0..103D5; DISALLOWED  # OLD PERSIAN WORD DIVIDER..OLD PERSIAN NUMBER
103D6..103FF; UNASSIGNED  # <reserved>..<reserved>
10400..10427; DISALLOWED  # DESERET CAPITAL LETTER LONG I..DESERET CAPIT
10428..1049D; PVALID      # DESERET SMALL LETTER LONG I..OSMANYA LETTER
1049E..1049F; UNASSIGNED  # <reserved>..<reserved>
104A0..104A9; PVALID      # OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE
104AA..107FF; UNASSIGNED  # <reserved>..<reserved>
10800..10805; PVALID      # CYPRIOT SYLLABLE A..CYPRIOT SYLLABLE JA
10806..10807; UNASSIGNED  # <reserved>..<reserved>
10808       ; PVALID      # CYPRIOT SYLLABLE JO
10809       ; UNASSIGNED  # <reserved>
1080A..10835; PVALID      # CYPRIOT SYLLABLE KA..CYPRIOT SYLLABLE WO
10836       ; UNASSIGNED  # <reserved>
10837..10838; PVALID      # CYPRIOT SYLLABLE XA..CYPRIOT SYLLABLE XE
10839..1083B; UNASSIGNED  # <reserved>..<reserved>
1083C       ; PVALID      # CYPRIOT SYLLABLE ZA
1083D..1083E; UNASSIGNED  # <reserved>..<reserved>
1083F..10855; PVALID      # CYPRIOT SYLLABLE ZO..IMPERIAL ARAMAIC LETTER
10856       ; UNASSIGNED  # <reserved>
10857..1085F; DISALLOWED  # IMPERIAL ARAMAIC SECTION SIGN..IMPERIAL ARAM
10860..108FF; UNASSIGNED  # <reserved>..<reserved>
10900..10915; PVALID      # PHOENICIAN LETTER ALF..PHOENICIAN LETTER TAU
10916..1091B; DISALLOWED  # PHOENICIAN NUMBER ONE..PHOENICIAN NUMBER THR
1091C..1091E; UNASSIGNED  # <reserved>..<reserved>
1091F       ; DISALLOWED  # PHOENICIAN WORD SEPARATOR
10920..10939; PVALID      # LYDIAN LETTER A..LYDIAN LETTER C
1093A..1093E; UNASSIGNED  # <reserved>..<reserved>
1093F       ; DISALLOWED  # LYDIAN TRIANGULAR MARK
10940..109FF; UNASSIGNED  # <reserved>..<reserved>
10A00..10A03; PVALID      # KHAROSHTHI LETTER A..KHAROSHTHI VOWEL SIGN V
10A04       ; UNASSIGNED  # <reserved>
10A05..10A06; PVALID      # KHAROSHTHI VOWEL SIGN E..KHAROSHTHI VOWEL SI
10A07..10A0B; UNASSIGNED  # <reserved>..<reserved>
10A0C..10A13; PVALID      # KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI LET
10A14       ; UNASSIGNED  # <reserved>
10A15..10A17; PVALID      # KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA
10A18       ; UNASSIGNED  # <reserved>
10A19..10A33; PVALID      # KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER TTT
10A34..10A37; UNASSIGNED  # <reserved>..<reserved>
10A38..10A3A; PVALID      # KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN D
10A3B..10A3E; UNASSIGNED  # <reserved>..<reserved>
10A3F       ; PVALID      # KHAROSHTHI VIRAMA
10A40..10A47; DISALLOWED  # KHAROSHTHI DIGIT ONE..KHAROSHTHI NUMBER ONE
10A48..10A4F; UNASSIGNED  # <reserved>..<reserved>
10A50..10A58; DISALLOWED  # KHAROSHTHI PUNCTUATION DOT..KHAROSHTHI PUNCT
10A59..10A5F; UNASSIGNED  # <reserved>..<reserved>
10A60..10A7C; PVALID      # OLD SOUTH ARABIAN LETTER HE..OLD SOUTH ARABI
10A7D..10A7F; DISALLOWED  # OLD SOUTH ARABIAN NUMBER ONE..OLD SOUTH ARAB
10A80..10AFF; UNASSIGNED  # <reserved>..<reserved>
10B00..10B35; PVALID      # AVESTAN LETTER A..AVESTAN LETTER HE
10B36..10B38; UNASSIGNED  # <reserved>..<reserved>
10B39..10B3F; DISALLOWED  # AVESTAN ABBREVIATION MARK..LARGE ONE RING OV
10B40..10B55; PVALID      # INSCRIPTIONAL PARTHIAN LETTER ALEPH..INSCRIP
10B56..10B57; UNASSIGNED  # <reserved>..<reserved>
10B58..10B5F; DISALLOWED  # INSCRIPTIONAL PARTHIAN NUMBER ONE..INSCRIPTI
10B60..10B72; PVALID      # INSCRIPTIONAL PAHLAVI LETTER ALEPH..INSCRIPT
10B73..10B77; UNASSIGNED  # <reserved>..<reserved>
10B78..10B7F; DISALLOWED  # INSCRIPTIONAL PAHLAVI NUMBER ONE..INSCRIPTIO
10B80..10BFF; UNASSIGNED  # <reserved>..<reserved>
10C00..10C48; PVALID      # OLD TURKIC LETTER ORKHON A..OLD TURKIC LETTE
10C49..10E5F; UNASSIGNED  # <reserved>..<reserved>
10E60..10E7E; DISALLOWED  # RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS
10E7F..1107F; UNASSIGNED  # <reserved>..<reserved>
11080..110BA; PVALID      # KAITHI SIGN CANDRABINDU..KAITHI SIGN NUKTA
110BB..110C1; DISALLOWED  # KAITHI ABBREVIATION SIGN..KAITHI DOUBLE DAND
110C2..11FFF; UNASSIGNED  # <reserved>..<reserved>
12000..1236E; PVALID      # CUNEIFORM SIGN A..CUNEIFORM SIGN ZUM
1236F..123FF; UNASSIGNED  # <reserved>..<reserved>
12400..12462; DISALLOWED  # CUNEIFORM NUMERIC SIGN TWO ASH..CUNEIFORM NU
12463..1246F; UNASSIGNED  # <reserved>..<reserved>
12470..12473; DISALLOWED  # CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD
12474..12FFF; UNASSIGNED  # <reserved>..<reserved>
13000..1342E; PVALID      # EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYP
1342F..1CFFF; UNASSIGNED  # <reserved>..<reserved>
1D000..1D0F5; DISALLOWED  # BYZANTINE MUSICAL SYMBOL PSILI..BYZANTINE MU
1D0F6..1D0FF; UNASSIGNED  # <reserved>..<reserved>
1D100..1D126; DISALLOWED  # MUSICAL SYMBOL SINGLE BARLINE..MUSICAL SYMBO
1D127..1D128; UNASSIGNED  # <reserved>..<reserved>
1D129..1D1DD; DISALLOWED  # MUSICAL SYMBOL MULTIPLE MEASURE REST..MUSICA
1D1DE..1D1FF; UNASSIGNED  # <reserved>..<reserved>
1D200..1D245; DISALLOWED  # GREEK VOCAL NOTATION SYMBOL-1..GREEK MUSICAL
1D246..1D2FF; UNASSIGNED  # <reserved>..<reserved>
1D300..1D356; DISALLOWED  # MONOGRAM FOR EARTH..TETRAGRAM FOR FOSTERING
1D357..1D35F; UNASSIGNED  # <reserved>..<reserved>
1D360..1D371; DISALLOWED  # COUNTING ROD UNIT DIGIT ONE..COUNTING ROD TE
1D372..1D3FF; UNASSIGNED  # <reserved>..<reserved>
1D400..1D454; DISALLOWED  # MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL IT
1D455       ; UNASSIGNED  # <reserved>
1D456..1D49C; DISALLOWED  # MATHEMATICAL ITALIC SMALL I..MATHEMATICAL SC
1D49D       ; UNASSIGNED  # <reserved>
1D49E..1D49F; DISALLOWED  # MATHEMATICAL SCRIPT CAPITAL C..MATHEMATICAL
1D4A0..1D4A1; UNASSIGNED  # <reserved>..<reserved>
1D4A2       ; DISALLOWED  # MATHEMATICAL SCRIPT CAPITAL G
1D4A3..1D4A4; UNASSIGNED  # <reserved>..<reserved>
1D4A5..1D4A6; DISALLOWED  # MATHEMATICAL SCRIPT CAPITAL J..MATHEMATICAL
1D4A7..1D4A8; UNASSIGNED  # <reserved>..<reserved>
1D4A9..1D4AC; DISALLOWED  # MATHEMATICAL SCRIPT CAPITAL N..MATHEMATICAL
1D4AD       ; UNASSIGNED  # <reserved>
1D4AE..1D4B9; DISALLOWED  # MATHEMATICAL SCRIPT CAPITAL S..MATHEMATICAL
1D4BA       ; UNASSIGNED  # <reserved>
1D4BB       ; DISALLOWED  # MATHEMATICAL SCRIPT SMALL F
1D4BC       ; UNASSIGNED  # <reserved>
1D4BD..1D4C3; DISALLOWED  # MATHEMATICAL SCRIPT SMALL H..MATHEMATICAL SC
1D4C4       ; UNASSIGNED  # <reserved>
1D4C5..1D505; DISALLOWED  # MATHEMATICAL SCRIPT SMALL P..MATHEMATICAL FR
1D506       ; UNASSIGNED  # <reserved>
1D507..1D50A; DISALLOWED  # MATHEMATICAL FRAKTUR CAPITAL D..MATHEMATICAL
1D50B..1D50C; UNASSIGNED  # <reserved>..<reserved>
1D50D..1D514; DISALLOWED  # MATHEMATICAL FRAKTUR CAPITAL J..MATHEMATICAL
1D515       ; UNASSIGNED  # <reserved>
1D516..1D51C; DISALLOWED  # MATHEMATICAL FRAKTUR CAPITAL S..MATHEMATICAL
1D51D       ; UNASSIGNED  # <reserved>
1D51E..1D539; DISALLOWED  # MATHEMATICAL FRAKTUR SMALL A..MATHEMATICAL D
1D53A       ; UNASSIGNED  # <reserved>
1D53B..1D53E; DISALLOWED  # MATHEMATICAL DOUBLE-STRUCK CAPITAL D..MATHEM
1D53F       ; UNASSIGNED  # <reserved>
1D540..1D544; DISALLOWED  # MATHEMATICAL DOUBLE-STRUCK CAPITAL I..MATHEM
1D545       ; UNASSIGNED  # <reserved>
1D546       ; DISALLOWED  # MATHEMATICAL DOUBLE-STRUCK CAPITAL O
1D547..1D549; UNASSIGNED  # <reserved>..<reserved>
1D54A..1D550; DISALLOWED  # MATHEMATICAL DOUBLE-STRUCK CAPITAL S..MATHEM
1D551       ; UNASSIGNED  # <reserved>
1D552..1D6A5; DISALLOWED  # MATHEMATICAL DOUBLE-STRUCK SMALL A..MATHEMAT
1D6A6..1D6A7; UNASSIGNED  # <reserved>..<reserved>
1D6A8..1D7CB; DISALLOWED  # MATHEMATICAL BOLD CAPITAL ALPHA..MATHEMATICA
1D7CC..1D7CD; UNASSIGNED # <reserved>..<reserved>
1D7CE..1D7FF; DISALLOWED # MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL M
1D800..1EFFF; UNASSIGNED # <reserved>..<reserved>
1F000..1F02B; DISALLOWED # MAHJONG TILE EAST WIND..MAHJONG TILE BACK
1F02C..1F02F; UNASSIGNED # <reserved>..<reserved>
1F030..1F093; DISALLOWED # DOMINO TILE HORIZONTAL BACK..DOMINO TILE VER
1F094..1F0FF; UNASSIGNED # <reserved>..<reserved>
1F100..1F10A; DISALLOWED # DIGIT ZERO FULL STOP..DIGIT NINE COMMA
1F10B..1F10F; UNASSIGNED # <reserved>..<reserved>
1F110..1F12E; DISALLOWED # PARENTHESIZED LATIN CAPITAL LETTER A..CIRCLE
1F12F..1F130; UNASSIGNED # <reserved>..<reserved>
1F131 ; DISALLOWED # SQUARED LATIN CAPITAL LETTER B
1F132..1F13C; UNASSIGNED # <reserved>..<reserved>
1F13D ; DISALLOWED # SQUARED LATIN CAPITAL LETTER N
1F13E ; UNASSIGNED # <reserved>
1F13F ; DISALLOWED # SQUARED LATIN CAPITAL LETTER P
1F140..1F141; UNASSIGNED # <reserved>..<reserved>
1F142 ; DISALLOWED # SQUARED LATIN CAPITAL LETTER S
1F143..1F145; UNASSIGNED # <reserved>..<reserved>
1F146 ; DISALLOWED # SQUARED LATIN CAPITAL LETTER W
1F147..1F149; UNASSIGNED # <reserved>..<reserved>
1F14A..1F14E; DISALLOWED # SQUARED HV..SQUARED PPV
1F14F..1F156; UNASSIGNED # <reserved>..<reserved>
1F157 ; DISALLOWED # NEGATIVE CIRCLED LATIN CAPITAL LETTER H
1F158..1F15E; UNASSIGNED # <reserved>..<reserved>
1F15F ; DISALLOWED # NEGATIVE CIRCLED LATIN CAPITAL LETTER P
1F160..1F178; UNASSIGNED # <reserved>..<reserved>
1F179 ; DISALLOWED # NEGATIVE SQUARED LATIN CAPITAL LETTER J
1F17A ; UNASSIGNED # <reserved>
1F17B..1F17C; DISALLOWED # NEGATIVE SQUARED LATIN CAPITAL LETTER L..NEG
1F17D..1F17E; UNASSIGNED # <reserved>..<reserved>
1F17F ; DISALLOWED # NEGATIVE SQUARED LATIN CAPITAL LETTER P
1F180..1F189; UNASSIGNED # <reserved>..<reserved>
1F18A..1F18D; DISALLOWED # CROSSED NEGATIVE SQUARED LATIN CAPITAL LETTE
1F18E..1F18F; UNASSIGNED # <reserved>..<reserved>
1F190 ; DISALLOWED # SQUARE DJ
1F191..1F1FF; UNASSIGNED # <reserved>..<reserved>
1F200 ; DISALLOWED # SQUARE HIRAGANA HOKA
1F201..1F20F; UNASSIGNED # <reserved>..<reserved>
1F210..1F231; DISALLOWED # SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED
1F232..1F23F; UNASSIGNED # <reserved>..<reserved>
1F240..1F248; DISALLOWED # TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRA
1F249..1FFFD; UNASSIGNED # <reserved>..<reserved>
1FFFE..1FFFF; DISALLOWED # <noncharacter>..<noncharacter>
20000..2A6D6; PVALID # <CJK Ideograph Extension B>..<CJK Ideograph
2A6D7..2A6FF; UNASSIGNED # <reserved>..<reserved>
2A700..2B734; PVALID # <CJK Ideograph Extension C>..<CJK Ideograph
2B735..2F7FF; UNASSIGNED # <reserved>..<reserved>
2F800..2FA1D; DISALLOWED # CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPA
2FA1E..2FFFD; UNASSIGNED # <reserved>..<reserved>
2FFFE..2FFFF; DISALLOWED # <noncharacter>..<noncharacter>
30000..3FFFD; UNASSIGNED # <reserved>..<reserved>
3FFFE..3FFFF; DISALLOWED # <noncharacter>..<noncharacter>
40000..4FFFD; UNASSIGNED # <reserved>..<reserved>
4FFFE..4FFFF; DISALLOWED # <noncharacter>..<noncharacter>
50000..5FFFD; UNASSIGNED # <reserved>..<reserved>
5FFFE..5FFFF; DISALLOWED # <noncharacter>..<noncharacter>
60000..6FFFD; UNASSIGNED # <reserved>..<reserved>
6FFFE..6FFFF; DISALLOWED # <noncharacter>..<noncharacter>
70000..7FFFD; UNASSIGNED # <reserved>..<reserved>
7FFFE..7FFFF; DISALLOWED # <noncharacter>..<noncharacter>
80000..8FFFD; UNASSIGNED # <reserved>..<reserved>
8FFFE..8FFFF; DISALLOWED # <noncharacter>..<noncharacter>
90000..9FFFD; UNASSIGNED # <reserved>..<reserved>
9FFFE..9FFFF; DISALLOWED # <noncharacter>..<noncharacter>
A0000..AFFFD; UNASSIGNED # <reserved>..<reserved>
AFFFE..AFFFF; DISALLOWED # <noncharacter>..<noncharacter>
B0000..BFFFD; UNASSIGNED # <reserved>..<reserved>
BFFFE..BFFFF; DISALLOWED # <noncharacter>..<noncharacter>
C0000..CFFFD; UNASSIGNED # <reserved>..<reserved>
CFFFE..CFFFF; DISALLOWED # <noncharacter>..<noncharacter>
D0000..DFFFD; UNASSIGNED # <reserved>..<reserved>
DFFFE..DFFFF; DISALLOWED # <noncharacter>..<noncharacter>
E0000 ; UNASSIGNED # <reserved>
E0001 ; DISALLOWED # LANGUAGE TAG
E0002..E001F; UNASSIGNED # <reserved>..<reserved>
E0020..E007F; DISALLOWED # TAG SPACE..CANCEL TAG
E0080..E00FF; UNASSIGNED # <reserved>..<reserved>
E0100..E01EF; DISALLOWED # VARIATION SELECTOR-17..VARIATION SELECTOR-25
E01F0..EFFFD; UNASSIGNED # <reserved>..<reserved>
EFFFE..10FFFF; DISALLOWED # <noncharacter>..<noncharacter>
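
Each row of the table above has the form "range ; property # comment", where the range is either a single code point or "first..last" in hexadecimal. The following Python sketch, which is not part of this specification, parses rows in that shape and looks up the derived property of a code point by binary search. The function names and the sample excerpt are illustrative assumptions; because the comments in this appendix are truncated, only the range and property fields are used.

```python
import bisect
from typing import List, Optional, Tuple

# A few rows copied from the table above (illustrative excerpt only).
SAMPLE = """\
1D6A6..1D6A7; UNASSIGNED # <reserved>..<reserved>
1D6A8..1D7CB; DISALLOWED # MATHEMATICAL BOLD CAPITAL ALPHA..MATHEMATICA
20000..2A6D6; PVALID # <CJK Ideograph Extension B>..<CJK Ideograph
2A6D7..2A6FF; UNASSIGNED # <reserved>..<reserved>
E0001 ; DISALLOWED # LANGUAGE TAG
"""

Row = Tuple[int, int, str]  # (first code point, last code point, property)


def parse_table(text: str) -> List[Row]:
    """Parse 'FIRST[..LAST] ; PROPERTY # comment' rows into sorted tuples."""
    rows: List[Row] = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the (truncated) comment
        if not data:
            continue
        span, prop = (field.strip() for field in data.split(";", 1))
        first, _, last = span.partition("..")
        rows.append((int(first, 16), int(last or first, 16), prop))
    rows.sort()
    return rows


def lookup(rows: List[Row], cp: int) -> Optional[str]:
    """Return the derived property of code point cp, or None if no row covers it."""
    starts = [row[0] for row in rows]  # rebuilt per call for brevity
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and rows[i][0] <= cp <= rows[i][1]:
        return rows[i][2]
    return None


rows = parse_table(SAMPLE)
print(lookup(rows, 0x20000))  # PVALID
print(lookup(rows, 0xE0001))  # DISALLOWED
```

A real consumer would load every row of the appendix (or the corresponding IANA registry) rather than this excerpt, and would build the list of start points once.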

8. References

8.1. Normative References

 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
 Requirement Levels", BCP 14, RFC 2119, March 1997.

 [TR15] Davis, M. and M. Duerst, "Unicode Standard Annex #15,
 Unicode Normalization Forms, an integral part of the
 Unicode Standard",
 <http://unicode.org/unicode/reports/tr15/>.

 [Unicode] The Unicode Consortium, "The Unicode Standard, Version
 5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN
 0-321-48091-0. This printed reference has now been
 updated online to reflect additional code points. For
 code points, the reference at the time this document was
 published is to Unicode 5.2.

 [Unicode52] The Unicode Consortium. The Unicode Standard, Version
 5.2.0, defined by: "The Unicode Standard, Version
 5.2.0", (Mountain View, CA: The Unicode Consortium,
 2009. ISBN 978-1-936213-00-9).
 <http://www.unicode.org/versions/Unicode5.2.0/>.

8.2. Informative References

 [BlockNames] "Blocks-5.2.0.txt", Unicode Character Database,
 May 2009,
 <http://unicode.org/Public/5.2.0/ucd/Blocks.txt>.

 [DerivedCoreProperties]
 "DerivedCoreProperties-5.2.0.txt", Unicode Character
 Database, August 2009,
 <http://unicode.org/Public/5.2.0/ucd/DerivedCoreProperties.txt>.

 [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
 Internationalized Strings ("stringprep")", RFC 3454,
 December 2002.

 [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
 Profile for Internationalized Domain Names (IDN)",
 RFC 3491, March 2003.

 [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
 and Recommendations for Internationalized Domain Names
 (IDNs)", RFC 4690, September 2006.

 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
 IANA Considerations Section in RFCs", BCP 26, RFC 5226,
 May 2008.

 [RFC5890] Klensin, J., "Internationalized Domain Names for
 Applications (IDNA): Definitions and Document
 Framework", RFC 5890, August 2010.

 [RFC5891] Klensin, J., "Internationalized Domain Names in
 Applications (IDNA): Protocol", RFC 5891, August 2010.

 [RFC5893] Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
 for Internationalized Domain Names for Applications
 (IDNA)", RFC 5893, August 2010.

 [RFC5894] Klensin, J., "Internationalized Domain Names for
 Applications (IDNA): Background, Explanation, and
 Rationale", RFC 5894, August 2010.

Author's Address

 Patrik Faltstrom (editor)
 Cisco

 EMail: paf@cisco.com
spec/rfc5893.txt
Internet Engineering Task Force (IETF) H. Alvestrand, Ed.
Request for Comments: 5893 Google
Category: Standards Track C. Karp
ISSN: 2070-1721 Swedish Museum of Natural History
 August 2010


 Right-to-Left Scripts for
 Internationalized Domain Names for Applications (IDNA)

Abstract

 The use of right-to-left scripts in Internationalized Domain Names
 (IDNs) has presented several challenges. This memo provides a new
 Bidi rule for Internationalized Domain Names for Applications (IDNA)
 labels, based on the encountered problems with some scripts and some
 shortcomings in the 2003 IDNA Bidi criterion.

Status of This Memo

 This is an Internet Standards Track document.

 This document is a product of the Internet Engineering Task Force
 (IETF). It represents the consensus of the IETF community. It has
 received public review and has been approved for publication by the
 Internet Engineering Steering Group (IESG). Further information on
 Internet Standards is available in Section 2 of RFC 5741.

 Information about the current status of this document, any errata,
 and how to provide feedback on it may be obtained at
 http://www.rfc-editor.org/info/rfc5893.

Copyright Notice

 Copyright (c) 2010 IETF Trust and the persons identified as the
 document authors. All rights reserved.

 This document is subject to BCP 78 and the IETF Trust's Legal
 Provisions Relating to IETF Documents
 (http://trustee.ietf.org/license-info) in effect on the date of
 publication of this document. Please review these documents
 carefully, as they describe your rights and restrictions with respect
 to this document. Code Components extracted from this document must
 include Simplified BSD License text as described in Section 4.e of
 the Trust Legal Provisions and are provided without warranty as
 described in the Simplified BSD License.

Alvestrand & Karp Standards Track [Page 1]

RFC 5893 IDNA Right to Left August 2010


Table of Contents

 1. Introduction
    1.1. Purpose and Applicability
    1.2. Background and History
    1.3. Structure of the Rest of This Document
    1.4. Terminology
 2. The Bidi Rule
 3. The Requirement Set for the Bidi Rule
 4. Examples of Issues Found with RFC 3454
    4.1. Dhivehi
    4.2. Yiddish
    4.3. Strings with Numbers
 5. Troublesome Situations and Guidelines
 6. Other Issues in Need of Resolution
 7. Compatibility Considerations
    7.1. Backwards Compatibility Considerations
    7.2. Forward Compatibility Considerations
 8. Security Considerations
 9. Acknowledgements
 10. References
    10.1. Normative References
    10.2. Informative References

1. Introduction

1.1. Purpose and Applicability

 The purpose of this document is to establish a rule that can be
 applied to Internationalized Domain Name (IDN) labels in Unicode form
 (U-labels) containing characters from scripts that are written from
 right to left. It is part of the revised IDNA protocol [RFC5891].

 When labels satisfy the rule, and when certain other conditions are
 satisfied, there is only a minimal chance of these labels being
 displayed in a confusing way by the Unicode bidirectional display
 algorithm.

 The other normative documents in the IDNA2008 document set establish
 criteria for valid labels, including listing the permitted
 characters. This document establishes additional validity criteria
 for labels in scripts normally written from right to left.

 This specification is not intended to place any requirements on
 domain names that do not contain characters from such scripts.

1.2. Background and History

 The "Stringprep" specification [RFC3454], part of IDNA2003, made the
 following statement in its Section 6 on the Bidi algorithm:

    3) If a string contains any RandALCat character, a RandALCat
       character MUST be the first character of the string, and a
       RandALCat character MUST be the last character of the string.

 (A RandALCat character is a character with unambiguously
 right-to-left directionality.)

 The reasoning behind this prohibition was to ensure that every
 component of a displayed domain name has an unambiguously preferred
 direction. However, this made certain words in languages written
 with right-to-left scripts invalid as IDN labels, and in at least one
 case (Dhivehi) meant that all the words of an entire language were
 forbidden as IDN labels.

 This is illustrated below with examples taken from the Dhivehi and
 Yiddish languages, as written with the Thaana and Hebrew scripts,
 respectively.
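
Requirement 3 quoted above can be checked mechanically. The following Python sketch is illustrative only: it approximates RandALCat with the R and AL Bidi classes reported by Python's unicodedata module, whereas RFC 3454 fixes its RandALCat table at Unicode 3.2, and the function names are hypothetical. A label whose last character is a combining mark fails the check even when every base character is right-to-left.

```python
import unicodedata


def is_randalcat(ch: str) -> bool:
    # Assumption: RandALCat approximated as Bidi class R or AL from the
    # running Python's Unicode database, not the Unicode 3.2 table of RFC 3454.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def stringprep_requirement_3(label: str) -> bool:
    """If any RandALCat character is present, the first and last must be RandALCat."""
    if not any(is_randalcat(ch) for ch in label):
        return True  # the requirement does not apply to this string
    return is_randalcat(label[0]) and is_randalcat(label[-1])


print(stringprep_requirement_3("\u05d0\u05d1"))  # HEBREW ALEF, BET: True
print(stringprep_requirement_3("\u05d0\u05b8"))  # ALEF + POINT QAMATS (NSM): False
```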

 RFC 3454 did not explicitly state the requirement to be fulfilled.
 Therefore, it is impossible to determine whether a simple relaxation
 of the rule would continue to fulfill the requirement.

 While this document specifies rules quite different from RFC 3454,
 most reasonable labels that were allowed under RFC 3454 will also be
 allowed under this specification (the most important example of
 non-permitted labels being labels that mix Arabic and European digits
 (AN and EN) inside an RTL label, and labels that use AN in an LTR
 label -- see Section 1.4 for terminology), so the operational impact
 of using the new rule in the updated IDNA specification is limited.

1.3. Structure of the Rest of This Document

 Section 2 defines a rule, the "Bidi rule", which can be used on a
 domain name label to check how safe it is to use in a domain name of
 possibly mixed directionality. The primary initial use of this rule
 is as part of the IDNA2008 protocol [RFC5891].

 Section 3 sets out the requirements for defining the Bidi rule.

 Section 4 gives detailed examples that serve as justification for the
 new rule.

 Section 5 to Section 8 describe various situations that can occur
 when dealing with domain names with characters of different
 directionality.

 Only Section 1.4 and Section 2 are normative.

1.4. Terminology

 The terminology used to describe IDNA concepts is defined in the
 Definitions document [RFC5890].

 The terminology used for the Bidi properties of Unicode characters is
 taken from the Unicode Standard [Unicode52].

 The Unicode Standard specifies a Bidi property for each character.
 That property controls the character's behavior in the Unicode
 bidirectional algorithm [Unicode-UAX9]. For reference, here are the
 values that the Unicode Bidi property can have:

 o L - Left to right - most letters in LTR scripts

 o R - Right to left - most letters in non-Arabic RTL scripts

 o AL - Arabic letters - most letters in the Arabic script

 o EN - European Number (0-9, and Extended Arabic-Indic numbers)

 o ES - European Number Separator (+ and -)

 o ET - European Number Terminator (currency symbols, the hash sign,
   the percent sign and so on)

 o AN - Arabic Number; this encompasses the Arabic-Indic numbers, but
   not the Extended Arabic-Indic numbers

 o CS - Common Number Separator (. , / : et al)

 o NSM - Nonspacing Mark - most combining accents

 o BN - Boundary Neutral - control characters (ZWNJ, ZWJ, and others)

 o B - Paragraph Separator

 o S - Segment Separator

 o WS - Whitespace, including the SPACE character

 o ON - Other Neutrals, including @, &, parentheses, MIDDLE DOT

 o LRE, LRO, RLE, RLO, PDF - these are "directional control
   characters" and are not used in IDNA labels.

 In this memo, we use "network order" to describe the sequence of
 characters as transmitted on the wire or stored in a file; the terms
 "first", "next", "previous", "beginning", "end", "before", and
 "after" are used to refer to the relationship of characters and
 labels in network order.

 We use "display order" to talk about the sequence of characters as
 imaged on a display medium; the terms "left" and "right" are used to
 refer to the relationship of characters and labels in display order.

 Most of the time, the examples use the abbreviations for the Unicode
 Bidi classes to denote the directionality of the characters; the
 example string CS L consists of one character of class CS and one
 character of class L. In some examples, the convention that
 uppercase characters are of class R or AL, and lowercase characters
 are of class L is used -- thus, the example string ABC.abc would
 consist of three right-to-left characters and three left-to-right
 characters.

 The directionality of such examples is determined by context -- for
 instance, in the sentence "ABC.abc is displayed as CBA.abc", the
 first example string is in network order, the second example string
 is in display order.

 The term "paragraph" is used in the sense of the Unicode Bidi
 specification [Unicode-UAX9]. It means "a block of text that has an
 overall direction, either left to right or right to left",
 approximately; see the "Unicode Bidirectional Algorithm"
 [Unicode-UAX9] for details.

 "RTL" and "LTR" are abbreviations for "right to left" and "left to
 right", respectively.

 An RTL label is a label that contains at least one character of type
 R, AL, or AN.

 An LTR label is any label that is not an RTL label.

 A "Bidi domain name" is a domain name that contains at least one RTL
 label. (Note: This definition includes domain names containing only
 dots and right-to-left characters. Providing a separate category of
 "RTL domain names" would not make this specification simpler, so it
 has not been done.)
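
The label and domain-name definitions above translate directly into predicates. The sketch below is an illustration under two assumptions: labels are U-labels separated only by FULL STOP (U+002E), and Bidi classes are read from Python's unicodedata module. The function names are hypothetical.

```python
import unicodedata
from typing import Set

RTL_CLASSES: Set[str] = {"R", "AL", "AN"}


def is_rtl_label(label: str) -> bool:
    """An RTL label contains at least one character of type R, AL, or AN."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def is_ltr_label(label: str) -> bool:
    """An LTR label is any label that is not an RTL label."""
    return not is_rtl_label(label)


def is_bidi_domain_name(domain: str) -> bool:
    """A Bidi domain name contains at least one RTL label."""
    # Assumption: labels are separated by U+002E FULL STOP only.
    return any(is_rtl_label(label) for label in domain.split("."))


print(is_bidi_domain_name("example.com"))           # False
print(is_bidi_domain_name("example.\u05d0\u05d1"))  # True
```

Under these definitions a label such as "a" followed by U+0661 ARABIC-INDIC DIGIT ONE counts as an RTL label because it contains an AN character, even though it starts with a left-to-right letter.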

2. The Bidi Rule

 The following rule, consisting of six conditions, applies to labels
 in Bidi domain names. The requirements that this rule satisfies are
 described in Section 3. All of the conditions must be satisfied for
 the rule to be satisfied.

 1. The first character must be a character with Bidi property L, R,
    or AL. If it has the R or AL property, it is an RTL label; if it
    has the L property, it is an LTR label.

 2. In an RTL label, only characters with the Bidi properties R, AL,
    AN, EN, ES, CS, ET, ON, BN, or NSM are allowed.

 3. In an RTL label, the end of the label must be a character with
    Bidi property R, AL, EN, or AN, followed by zero or more
    characters with Bidi property NSM.

 4. In an RTL label, if an EN is present, no AN may be present, and
    vice versa.

 5. In an LTR label, only characters with the Bidi properties L, EN,
    ES, CS, ET, ON, BN, or NSM are allowed.

 6. In an LTR label, the end of the label must be a character with
    Bidi property L or EN, followed by zero or more characters with
    Bidi property NSM.

 The following guarantees can be made based on the above:

 o In a domain name consisting of only labels that satisfy the rule,
   the requirements of Section 3 are satisfied. Note that even LTR
   labels and pure ASCII labels have to be tested.

 o In a domain name consisting of only LDH labels (as defined in the
   Definitions document [RFC5890]) and labels that satisfy the rule,
   the requirements of Section 3 are satisfied as long as a label
   that starts with an ASCII digit does not come after a
   right-to-left label.

 No guarantee is given for other combinations.
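
The six conditions and the second guarantee above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: Bidi classes come from Python's unicodedata module (a newer Unicode version than 5.2), the LDH test is simplified to "ASCII letters, digits, and hyphen", "comes after" is read as "anywhere later in network order", and all function names are hypothetical. Checks required elsewhere in IDNA2008, such as which code points are permitted at all, are not performed.

```python
import unicodedata
from typing import List

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_rule(label: str) -> bool:
    """Return True if the label satisfies all six conditions of the Bidi rule."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return False
    # Condition 1: the first character must be L, R, or AL and sets the direction.
    if classes[0] in ("R", "AL"):
        rtl = True
    elif classes[0] == "L":
        rtl = False
    else:
        return False
    # Conditions 3 and 6 look at the last character that is not an NSM.
    # classes[0] is never NSM at this point, so `end` stays >= 1.
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if rtl:
        if any(c not in RTL_ALLOWED for c in classes):  # condition 2
            return False
        if last not in ("R", "AL", "EN", "AN"):  # condition 3
            return False
        return not ("EN" in classes and "AN" in classes)  # condition 4
    if any(c not in LTR_ALLOWED for c in classes):  # condition 5
        return False
    return last in ("L", "EN")  # condition 6


def is_ldh_label(label: str) -> bool:
    # Simplified: ASCII letters, digits, and hyphen only; see [RFC5890] for
    # the full definition, including rules on hyphen placement.
    return label != "" and all(
        ch.isascii() and (ch.isalnum() or ch == "-") for ch in label
    )


def is_rtl_label(label: str) -> bool:
    return any(unicodedata.bidirectional(ch) in ("R", "AL", "AN") for ch in label)


def meets_second_guarantee(labels: List[str]) -> bool:
    """Every label is LDH or satisfies the Bidi rule, and no label that starts
    with an ASCII digit comes after an RTL label."""
    seen_rtl = False
    for label in labels:
        if not (is_ldh_label(label) or bidi_rule(label)):
            return False
        if seen_rtl and label[0] in "0123456789":
            return False
        seen_rtl = seen_rtl or is_rtl_label(label)
    return True


print(bidi_rule("\u05d0\u05d1"))                        # True
print(meets_second_guarantee(["\u05d0\u05d1", "123"]))  # False
```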

3. The Requirement Set for the Bidi Rule

 This document, unlike RFC 3454 [RFC3454], provides an explicit
 justification for the Bidi rule, and states a set of requirements for
 which it is possible to test whether or not the modified rule
 fulfills the requirement.

 All the text in this document assumes that text containing the labels
 under consideration will be displayed using the Unicode bidirectional
 algorithm [Unicode-UAX9].

 The requirements proposed are these:

 o Label Uniqueness: No two labels, when presented in display order
   in the same paragraph, should have the same sequence of characters
   without also having the same sequence of characters in network
   order, both when the paragraph has LTR direction and when the
   paragraph has RTL direction. (This is the criterion that is
   explicit in RFC 3454). (Note that a label displayed in an RTL
   paragraph may display the same as a different label displayed in
   an LTR paragraph and still satisfy this criterion.)

 o Character Grouping: When displaying a string of labels, using the
   Unicode Bidi algorithm to reorder the characters for display, the
   characters of each label should remain grouped between the
   characters delimiting the labels, both when the string is embedded
   in a paragraph with LTR direction and when it is embedded in a
   paragraph with RTL direction.

 Several stronger statements were considered and rejected, because
 they seem to be impossible to fulfill within the constraints of the
 Unicode bidirectional algorithm. These include:

 o The appearance of a label should be unaffected by its embedding
   context. This proved impossible even for ASCII labels; the label
   "123-A" will have a different display order in an RTL context than
   in an LTR context. (This particular example is, however,
   disallowed anyway.)

 o The sequence of labels should be consistent with network order.
   This proved impossible -- a domain name consisting of the labels
   (in network order) L1.R2.R3.L4 will be displayed as L1.R3.R2.L4 in
   an LTR context. (In an RTL context, it will be displayed as
   L4.R3.R2.L1).

 o No two domain names should be displayed the same, even under
   differing directionality. This was shown to be unsound, since the
   domain name (in network order) ABC.abc will have display order
   CBA.abc in an LTR context and abc.CBA in an RTL context, while the
   domain name (network) abc.ABC will have display order abc.CBA in
   an LTR context and CBA.abc in an RTL context.

 One possible requirement was thought to be problematic, but turned
 out to be satisfied by a string that obeys the proposed rules:

 o The Character Grouping requirement should be satisfied when
   directional controls (LRE, RLE, RLO, LRO, PDF) are used in the
   same paragraph (outside of the labels). Because these controls
   affect presentation order in non-obvious ways, by affecting the
   "sor" and "eor" properties of the Unicode Bidi algorithm, the
   conditions above require extra testing in order to figure out
   whether or not they influence the display of the domain name.
   Testing found that for the strings allowed under the rule
   presented in this document, directional controls do not influence
   the display of the domain name.

 This is still not stated as a requirement, since it did not seem as
 important as the stated requirements, but it is useful to know that
 Bidi domain names where the labels satisfy the rule have this
 property.

 In the following descriptions, first-level bullets are used to
 indicate rules or normative statements; second-level bullets are
 commentary.

 The Character Grouping requirement can be more formally stated as:

 o Let "Delimiterchars" be a set of characters with the Unicode Bidi
   properties CS, WS, ON. (These are commonly used to delimit labels
   -- both the FULL STOP and the space are included. They are not
   allowed in domain labels.)

    * ET, though it commonly occurs next to domain names in practice,
      is problematic: the context R CS L EN ET (for instance A.a1%)
      makes the label L EN not satisfy the character grouping
      requirement.

    * ES commonly occurs in labels as HYPHEN-MINUS, but could also be
      used as a delimiter (for instance, the plus sign). It is left
      out here.

 o Let "unproblematic label" be a label that either satisfies the
   requirements or does not contain any character with the Bidi
   properties R, AL, or AN and does not begin with a character with
   the Bidi property EN. (Informally, "it does not start with a
   number".)

 A label X satisfies the Character Grouping requirement when, for any
 Delimiter Character D1 and D2, and for any label S1 and S2 that is an
 unproblematic label or an empty string, the following holds true:

 If the string formed by concatenating S1, D1, X, D2, and S2 is
 reordered according to the Bidi algorithm, then all the characters of
 X in the reordered string are between D1 and D2, and no other
 characters are between D1 and D2, both if the overall paragraph
 direction is LTR and if the overall paragraph direction is RTL.
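
The formal statement above suggests a test harness. Python's standard library does not include an implementation of the Unicode Bidirectional Algorithm, so the sketch below takes the reordering step as a parameter: reorder(text, base) is a hypothetical callback, to be backed by a real UAX #9 implementation, that returns the logical indices of text in visual order for base direction "L" or "R". Because the definition quantifies over all unproblematic labels, the harness can only check a finite sample of S1/S2 candidates (plus the empty string) and of delimiter characters supplied by the caller. The function names are illustrative.

```python
from typing import Callable, Iterable, List

# Hypothetical interface to a UAX #9 implementation: the logical indices of
# `text` listed in visual (left-to-right display) order.
Reorder = Callable[[str, str], List[int]]


def grouped(s1: str, d1: str, x: str, d2: str, s2: str,
            reorder: Reorder, base: str) -> bool:
    """One instance of the test: are exactly X's characters between D1 and D2?"""
    text = s1 + d1 + x + d2 + s2
    order = reorder(text, base)
    i_d1 = len(s1)            # logical index of D1
    i_d2 = i_d1 + 1 + len(x)  # logical index of D2
    visual_pos = {logical: v for v, logical in enumerate(order)}
    lo, hi = sorted((visual_pos[i_d1], visual_pos[i_d2]))
    between = {order[v] for v in range(lo + 1, hi)}
    return between == set(range(i_d1 + 1, i_d2))


def satisfies_character_grouping(x: str, delimiters: Iterable[str],
                                 candidates: Iterable[str],
                                 reorder: Reorder) -> bool:
    """Check X for every sampled D1, D2, S1, S2 and both paragraph directions."""
    delims = list(delimiters)         # single characters of class CS, WS, or ON
    others = [""] + list(candidates)  # unproblematic labels; "" is the zero-length case
    return all(
        grouped(s1, d1, x, d2, s2, reorder, base)
        for d1 in delims for d2 in delims
        for s1 in others for s2 in others
        for base in ("L", "R")
    )


# Toy stand-ins for a real Bidi implementation, only to exercise the harness.
def identity(text: str, base: str) -> List[int]:
    return list(range(len(text)))


def rotate(text: str, base: str) -> List[int]:
    # Moves the last logical character to the far left; clearly not UAX #9.
    return [len(text) - 1] + list(range(len(text) - 1))


print(satisfies_character_grouping("abc", ".", ["xyz"], identity))  # True
print(satisfies_character_grouping("abc", ".", ["xyz"], rotate))    # False
```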

 Note that the definition is self-referential, since S1 and S2 are
 constrained to be "legal" by this definition. This makes testing
 changes to proposed rules a little complex, but does not create
 problems for testing whether or not a given proposed rule satisfies
 the criterion.

 The "zero-length" case represents the case where a domain name is
 next to something that isn't a domain name, separated by a delimiter
 character.

 Note about the position of BN: The Unicode bidirectional algorithm
 specifies that BN characters affect adjoining characters in network
 order, not in display order, and BN characters are therefore treated
 as if removed during Bidi processing ([Unicode-UAX9], Section 3.3.2,
 rule X9 and Section 5.3). Therefore, the question of "what position
 does a BN have after reordering" is not meaningful; it has been
 ignored while developing the rules here.

 The Label Uniqueness requirement can be formally stated as:

 If two non-identical labels X and Y, embedded as for the test above,
 displayed in paragraphs with the same directionality, are reordered
 by the Bidi algorithm into the same sequence of code points, the
 labels X and Y cannot both be legal.
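
The Label Uniqueness statement can likewise be turned into a search for counterexamples among labels that a proposed rule accepts. This simplified Python sketch compares labels in isolation, that is, with empty S1 and S2 and without delimiters, so it covers only part of the embeddings the formal statement requires. The reorder parameter is a hypothetical callback to a UAX #9 implementation that returns logical indices in visual order; the function names are illustrative.

```python
from itertools import combinations
from typing import Callable, Iterable, Iterator, List, Tuple

Reorder = Callable[[str, str], List[int]]  # hypothetical UAX #9 interface


def display_form(label: str, reorder: Reorder, base: str) -> str:
    """The label's code points in visual order for paragraph direction `base`."""
    return "".join(label[i] for i in reorder(label, base))


def uniqueness_violations(legal_labels: Iterable[str],
                          reorder: Reorder) -> Iterator[Tuple[str, str, str]]:
    """Yield (X, Y, base) for distinct legal labels that display identically."""
    unique = list(dict.fromkeys(legal_labels))  # drop duplicates, keep order
    for x, y in combinations(unique, 2):
        for base in ("L", "R"):
            if display_form(x, reorder, base) == display_form(y, reorder, base):
                yield x, y, base


def sort_by_char(text: str, base: str) -> List[int]:
    # Deliberately wrong "reordering" that sorts characters, so that "ab"
    # and "ba" collide; it only exercises the checker.
    return sorted(range(len(text)), key=lambda i: text[i])


print(list(uniqueness_violations(["ab", "ba"], sort_by_char)))
# [('ab', 'ba', 'L'), ('ab', 'ba', 'R')]
```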

4. Examples of Issues Found with RFC 3454

4.1. Dhivehi

 Dhivehi, the official language of the Maldives, is written with the
 Thaana script. This script displays some of the characteristics of
 the Arabic script, including its directional properties, and the
 indication of vowels by the diacritical marking of consonantal base
 characters. This marking is obligatory, and both two consecutive
 vowels and syllable-final consonants are indicated with unvoiced
 combining marks. Every Dhivehi word therefore ends with a combining
 mark.

 The word for "computer", which is romanized as "konpeetaru", is
 written with the following sequence of Unicode code points:

 U+0786 THAANA LETTER KAAFU (AL)

 U+07AE THAANA OBOFILI (NSM)

 U+0782 THAANA LETTER NOONU (AL)

 U+07B0 THAANA SUKUN (NSM)

 U+0795 THAANA LETTER PAVIYANI (AL)
523523+524524+ U+07A9 THAANA EEBEEFILI (NSM)
525525+526526+ U+0793 THAANA LETTER TAVIYANI (AL)
527527+528528+ U+07A6 THAANA ABAFILI (NSM)
529529+530530+ U+0783 THAANA LETTER RAA (AL)
531531+532532+ U+07AA THAANA UBUFILI (NSM)
533533+534534+ The directionality class of U+07AA in the Unicode database
535535+ [Unicode52] is NSM (Nonspacing Mark), which is not R or AL; a
536536+ conformant implementation of the IDNA2003 algorithm will say that
537537+ "this is not in RandALCat" and refuse to encode the string.
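The rejection can be reproduced with Python's standard `unicodedata` module. The following minimal sketch implements only the bidirectional requirements of RFC 3454, Section 6 (RandALCat and LCat characters must not be mixed, and a RandALCat character must be both first and last). The prohibited-character check and the rest of Stringprep are omitted, and the function name `stringprep_bidi_ok` is hypothetical. Python ships a newer Unicode database than [Unicode52], but the Bidi classes of these Thaana characters are expected to be the same.

```python
import unicodedata

RAND_AL = {"R", "AL"}  # RFC 3454 "RandALCat": Bidi class R or AL


def stringprep_bidi_ok(label: str) -> bool:
    """Bidi requirements 2 and 3 of RFC 3454, Section 6 (requirement 1 not checked)."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not any(c in RAND_AL for c in classes):
        return True  # no RandALCat characters: these requirements do not apply
    if "L" in classes:
        return False  # RandALCat and LCat characters must not be mixed
    # A RandALCat character must be both the first and the last character.
    return classes[0] in RAND_AL and classes[-1] in RAND_AL


# "konpeetaru" ("computer"), using the code points listed above
KONPEETARU = "\u0786\u07AE\u0782\u07B0\u0795\u07A9\u0793\u07A6\u0783\u07AA"

print([unicodedata.bidirectional(ch) for ch in KONPEETARU])
print(stringprep_bidi_ok(KONPEETARU))  # False: the final U+07AA is NSM
```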
538538+539539+4.2. Yiddish
540540+541541+ Yiddish is one of several languages written with the Hebrew script
542542+ (others include Hebrew and Ladino). This is basically a consonantal
543543+ alphabet (also termed an "abjad"), but Yiddish is written using an
544544+ extended form that is fully vocalic. The vowels are indicated in
545545+ several ways, one of which is by repurposing letters that are
546546+ consonants in Hebrew. Other letters are used both as vowels and
547547+ consonants, with combining marks, called "points", used to
548548+ differentiate between them. Finally, some base characters can
549549+ indicate several different vowels, which are also disambiguated by
550550+ combining marks. Pointed characters can appear in word-final
551551+ position and may therefore also be needed at the end of labels. This
552552+ is not an invariable attribute of a Yiddish string and there is thus
553553+ greater latitude here than there is with Dhivehi.
554554+555555+ The organization now known as the "YIVO Institute for Jewish
556556+ Research" developed orthographic rules for modern Standard Yiddish
557557+ during the 1930s on the basis of work conducted in several venues
558558+ since earlier in that century. These are given in "The Standardized
565565+566566+567567+ Yiddish Orthography: Rules of Yiddish Spelling" [SYO], and are taken
568568+ as normatively descriptive of modern Standard Yiddish in any context
569569+ where that notion is deemed relevant. They have been applied
570570+ exclusively in all formal Yiddish dictionaries published since their
571571+ establishment, and are similarly dominant in academic and
572572+ bibliographic regards.
573573+574574+ It therefore appears appropriate for this repertoire also to be
575575+ supported fully by IDNA. This presents no difficulty with characters
576576+ in initial and medial positions, but pointed characters are regularly
577577+ used in final position as well. All of the characters in the SYO
578578+ repertoire appear in both marked and unmarked form with one
579579+ exception: the HEBREW LETTER PE (U+05E4). The SYO only permits this
580580+ with a HEBREW POINT DAGESH (U+05BC), providing the Yiddish equivalent
581581+ to the Latin letter "p", or a HEBREW POINT RAFE (U+05BF), equivalent
582582+ to the Latin letter "f". There is, however, a separate unpointed
583583+ allograph, the HEBREW LETTER FINAL PE (U+05E3), for the latter
584584+ character when it appears in final position. The constraint on the
585585+ use of the SYO repertoire resulting from the proscription of
586586+ combining marks at the end of RTL strings thus reduces to nothing
587587+ more, or less, than the equivalent of saying that a string of Latin
588588+ characters cannot end with the letter "p". It must also be noted
589589+ that the HEBREW LETTER PE with the HEBREW POINT DAGESH is
590590+ characteristic of almost all traditional Yiddish orthographies that
591591+ predate (or remain in use in parallel to) the SYO, being the first
592592+ pointed character to appear in any of them.
593593+594594+ A more general instantiation of the basic problem can be seen in the
595595+ representation of the YIVO acronym. This acronym is written with the
596596+ Hebrew letters YOD YOD HIRIQ VAV VAV ALEF QAMATS, where HIRIQ and
597597+ QAMATS are combining points. The Unicode code points are:
598598+599599+ U+05D9 HEBREW LETTER YOD (R)
 U+05D9 HEBREW LETTER YOD (R)
600600+601601+ U+05B4 HEBREW POINT HIRIQ (NSM)
602602+603603+ U+05D5 HEBREW LETTER VAV (R)
 U+05D5 HEBREW LETTER VAV (R)
604604+605605+ U+05D0 HEBREW LETTER ALEF (R)
606606+607607+ U+05B8 HEBREW POINT QAMATS (NSM)
608608+609609+ The directionality class of U+05B8 HEBREW POINT QAMATS in the Unicode
610610+ database is NSM, which again causes the IDNA2003 algorithm to reject
611611+ the string.
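The same contrast can be sketched for the YIVO acronym, spelled as described above (YOD YOD HIRIQ VAV VAV ALEF QAMATS). The sketch compares the RFC 3454 requirement that the last character be R or AL with the end-of-label condition that the Bidi rule in Section 2 of this document places on RTL labels (a character of class R, AL, EN, or AN, followed by zero or more NSM characters). Only that one condition is modeled, and both helper names are illustrative.

```python
import unicodedata

# YOD YOD HIRIQ VAV VAV ALEF QAMATS, in network (logical) order
YIVO = "\u05D9\u05D9\u05B4\u05D5\u05D5\u05D0\u05B8"


def rfc3454_end_ok(label: str) -> bool:
    """RFC 3454: the last character must be RandALCat (Bidi class R or AL)."""
    return unicodedata.bidirectional(label[-1]) in {"R", "AL"}


def rfc5893_rtl_end_ok(label: str) -> bool:
    """RTL end condition: R, AL, EN, or AN, then zero or more NSM characters."""
    core = label
    while core and unicodedata.bidirectional(core[-1]) == "NSM":
        core = core[:-1]  # skip trailing nonspacing marks
    return bool(core) and unicodedata.bidirectional(core[-1]) in {"R", "AL", "EN", "AN"}


print([unicodedata.bidirectional(ch) for ch in YIVO])
print(rfc3454_end_ok(YIVO), rfc5893_rtl_end_ok(YIVO))  # False True
```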
621621+622622+623623+ It may also be noted that all of the combined characters mentioned
624624+ above exist in precomposed form at separate positions in the Unicode
625625+ chart. However, by invoking Stringprep, the IDNA2003 algorithm also
626626+ rejects those code points, for reasons not discussed here.
627627+628628+4.3. Strings with Numbers
629629+630630+ By requiring that the first and the last character of a string each be a member
631631+ of category R or AL, the Stringprep specification [RFC3454]
632632+ prohibited a string containing right-to-left characters from ending
633633+ with a number.
634634+635635+ Consider the strings ALEF 5 (HEBREW LETTER ALEF + DIGIT FIVE) and 5
636636+ ALEF. Displayed in an LTR context, the first one will be displayed
637637+ from left to right as 5 ALEF (with the 5 being considered right to
638638+ left because of the leading ALEF), while 5 ALEF will be displayed in
639639+ exactly the same order (5 taking the direction from context).
640640+ Clearly, only one of those should be permitted as a registered label,
641641+ but barring them both seems unnecessary.
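For comparison, here is a compact sketch of all six conditions of the Bidi rule in Section 2 of this document, applied to a single label with Python's `unicodedata`. The function name `bidi_rule_ok` is hypothetical, and the rule is only meant to be applied to labels in a Bidi domain name. It accepts ALEF 5, whose trailing EN digit is permitted at the end of an RTL label, and rejects 5 ALEF, whose first character is not of class L, R, or AL.

```python
import unicodedata

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_rule_ok(label: str) -> bool:
    """Sketch of the six conditions of the Bidi rule (RFC 5893, Section 2) for one label."""
    if not label:
        return False
    classes = [unicodedata.bidirectional(ch) for ch in label]
    # The "end" of the label is its last character that is not an NSM.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1] if end > 0 else ""

    if classes[0] in ("R", "AL"):  # condition 1: an RTL label
        if any(c not in RTL_ALLOWED for c in classes):  # condition 2
            return False
        if last not in ("R", "AL", "EN", "AN"):  # condition 3
            return False
        return not ("EN" in classes and "AN" in classes)  # condition 4
    if classes[0] == "L":  # condition 1: an LTR label
        if any(c not in LTR_ALLOWED for c in classes):  # condition 5
            return False
        return last in ("L", "EN")  # condition 6
    return False  # condition 1: first character must be L, R, or AL


ALEF = "\u05D0"
print(bidi_rule_ok(ALEF + "5"), bidi_rule_ok("5" + ALEF))  # True False
```

Under this sketch the Dhivehi and Yiddish examples above are also accepted, because trailing NSM characters are skipped before the end of the label is tested.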
642642+643643+5. Troublesome Situations and Guidelines
644644+645645+ There are situations in which labels that satisfy the rule above will
646646+ be displayed in a surprising fashion. The most important of these is
647647+ the case where a label ending in a character with Bidi property AL,
648648+ AN, or R occurs before a label beginning with a character of Bidi
649649+ property EN. In that case, the number will appear to move into the
650650+ label containing the right-to-left character, violating the Character
651651+ Grouping requirement.
652652+653653+ If the label that occurs after the right-to-left label itself
654654+ satisfies the Bidi criterion, the requirements will be satisfied in
655655+ all cases (this is the reason why the criterion talks about strings
656656+ containing L in some cases). However, the IDNABIS WG concluded that
657657+ this could not be required for several reasons:
658658+659659+ o There is a large current deployment of ASCII domain names starting
660660+ with digits. These cannot possibly be invalidated.
661661+662662+ o Domain names are often constructed piecemeal, for instance, by
663663+ combining a string with the content of a search list. This may
664664+ occur after IDNA processing, and thus in part of the code that is
665665+ not IDNA-aware, making detection of the undesirable combination
666666+ impossible.
677677+678678+679679+ o Even if a label is registered under a "safe" label, there may be a
680680+ DNAME [RFC2672] with an "unsafe" label that points to the "safe"
681681+ label, thus creating seemingly valid names that would not satisfy
682682+ the criterion.
683683+684684+ o Wildcards create the odd situation where a label is "valid" (can
685685+ be looked up successfully) without the zone owner knowing that
686686+ this label exists. So an owner of a zone whose name starts with a
687687+ digit and contains a wildcard has no way of controlling whether or
688688+ not names with RTL labels in them are looked up in his zone.
689689+690690+ Rather than trying to suggest rules that disallow all such
691691+ undesirable situations, this document merely warns about the
692692+ possibility, and leaves it to application developers to take whatever
693693+ measures they deem appropriate to avoid problematic situations.
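As one illustration of such a measure, an application could flag any domain name in which a label ending in a character of Bidi class R, AL, or AN is immediately followed, in network order, by a label beginning with a character of class EN. The function names below are hypothetical, and skipping trailing NSM characters before testing the end of a label is an assumption of this sketch rather than something the text above specifies.

```python
import unicodedata


def final_class_ignoring_nsm(label: str) -> str:
    """Bidi class of the last non-NSM character ("" if there is none)."""
    for ch in reversed(label):
        cls = unicodedata.bidirectional(ch)
        if cls != "NSM":
            return cls
    return ""


def digit_after_rtl(domain: str) -> list[tuple[str, str]]:
    """Pairs of adjacent labels where an R/AL/AN-final label precedes an EN-initial one."""
    labels = domain.split(".")
    flagged = []
    for left, right in zip(labels, labels[1:]):
        if (final_class_ignoring_nsm(left) in ("R", "AL", "AN")
                and right
                and unicodedata.bidirectional(right[0]) == "EN"):
            flagged.append((left, right))
    return flagged


print(digit_after_rtl("\u05D0\u05D1.123.example"))  # flags the (ALEF BET, "123") pair
print(digit_after_rtl("abc.123.example"))           # []
```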
694694+695695+6. Other Issues in Need of Resolution
696696+697697+ This document concerns itself only with the rules that are needed
698698+ when dealing with domain names with characters that have differing
699699+ Bidi properties, and considers characters only in terms of their Bidi
700700+ properties. All other issues with scripts that are written from
701701+ right to left must be considered in other contexts.
702702+703703+ One such issue is the need to keep numbers separate. Several scripts
704704+ are used with multiple sets of numbers -- most commonly they use
705705+ Latin numbers and a script-specific set of numbers, but in the case
706706+ of Arabic, there are two sets of "Arabic-Indic" digits involved.
707707+708708+ The algorithm in this document disallows occurrences of AN-class
709709+ characters ("Arabic-Indic digits", U+0660 to U+0669) together with
710710+ EN-class characters (which include "European" digits, U+0030 to
711711+ U+0039 and "extended Arabic-Indic digits", U+06F0 to U+06F9), but
712712+ does not help in preventing the mixing of, for instance, Bengali
713713+ digits (U+09E6 to U+09EF) and Gujarati digits (U+0AE6 to U+0AEF),
714714+ both of which have Bidi class L. A registry or script community that
715715+ wishes to create rules restricting the mixing of digits in a label
716716+ will be able to specify these restrictions at the registry level.
717717+ Some rules are also specified at the protocol level.
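The gap is visible in the Bidi classes of the digit sets named above: of these, only the Arabic-Indic digits are AN, the European and extended Arabic-Indic digits are both EN, and the Bengali and Gujarati digits are both L. A registry wanting a stricter policy could, for example, require that all decimal digits in a label come from a single set. The sketch below expresses such a policy with hypothetical function names; it relies on Unicode's practice of encoding each set of decimal digits (general category Nd) as a contiguous run from 0 to 9, so subtracting a digit's value from its code point identifies its set.

```python
import unicodedata

# Bidi classes of the ZERO of each digit set mentioned above
for ch in ("\u0660", "0", "\u06F0", "\u09E6", "\u0AE6"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")


def digit_sets(label: str) -> set[int]:
    """Code point of the ZERO of every decimal-digit set used in the label."""
    return {ord(ch) - unicodedata.decimal(ch)
            for ch in label if unicodedata.category(ch) == "Nd"}


def single_digit_set(label: str) -> bool:
    """Hypothetical registry policy: at most one set of decimal digits per label."""
    return len(digit_sets(label)) <= 1


print(single_digit_set("\u09E6\u0AE7"))  # False: Bengali zero mixed with Gujarati one
```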
718718+719719+ Another set of issues concerns the proper display of IDNs with a
720720+ mixture of LTR and RTL labels, or only RTL labels.
721721+722722+ It is unrealistic to expect that applications will display domain
723723+ names using embedded formatting codes between their labels (for one
724724+ thing, no reliable algorithms for identifying domain names in running
725725+ text exist); thus, the display order will be determined by the Bidi
726726+ algorithm. For example, a sequence (in network order) of R1.R2.ltr will be
733733+734734+735735+ displayed in the order 2R.1R.ltr in an LTR context, which might
736736+ surprise someone expecting to see labels displayed in hierarchical
737737+ order. People used to working with text that mixes LTR and RTL
738738+ strings might not be so surprised by this. Again, this memo does not
739739+ attempt to suggest a solution to this problem.
740740+741741+7. Compatibility Considerations
742742+743743+7.1. Backwards Compatibility Considerations
744744+745745+ As with any change to an existing standard, it is important to
746746+ consider what happens with existing implementations when the change
747747+ is introduced. Some troublesome cases include:
748748+749749+ o An old program is used to input the newly allowed label. If the old
750750+ program checks the input against RFC 3454, some labels will not be
751751+ allowed, and domain names containing those labels will remain
752752+ inaccessible.
753753+754754+ o An old program is asked to display the newly allowed label, and
755755+ checks it against RFC 3454 before displaying. The program will
756756+ perform some kind of fallback, most likely displaying the label in
757757+ A-label form.
758758+759759+ o An old program tries to display the newly allowed label. If the
760760+ old program has code for displaying the last character of a label
761761+ that is different from the code used to display the characters in
762762+ the middle of the label, the display may be inconsistent and cause
763763+ confusion.
764764+765765+ One particular example of the last case is if a program chooses to
766766+ examine the last character (in network order) of a string in order to
767767+ determine its directionality, rather than its first. If it finds an
768768+ NSM character and tries to display the string as if it were a
769769+ left-to-right string, the resulting display may be interesting, but
770770+ not useful.
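The difference can be illustrated with the Dhivehi word from Section 4.1. The function below is a hypothetical, deliberately simplified stand-in for legacy display code: it treats a label as right to left only when the examined character is of class R or AL, and otherwise falls back to left to right.

```python
import unicodedata

# "konpeetaru" from Section 4.1; it starts with AL and ends with NSM
KONPEETARU = "\u0786\u07AE\u0782\u07B0\u0795\u07A9\u0793\u07A6\u0783\u07AA"


def display_direction(ch: str) -> str:
    """Hypothetical legacy logic: RTL only for class R or AL, otherwise fall back to LTR."""
    return "rtl" if unicodedata.bidirectional(ch) in ("R", "AL") else "ltr"


print(display_direction(KONPEETARU[0]))   # rtl: decided from the first character (AL)
print(display_direction(KONPEETARU[-1]))  # ltr: decided from the last character (NSM)
```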
771771+772772+ The editors believe that these cases will have a less harmful impact
773773+ in practice than continuing to deny the use of words from the
774774+ languages for which these strings are necessary as IDN labels.
775775+776776+ This specification does not forbid using leading European digits in
777777+ ASCII-only labels, since this would conflict with a large installed
778778+ base of such labels, and would increase the scope of the
779779+ specification from RTL labels to all labels. The harm resulting from
780780+ this limitation of scope is described in Section 5. Registries and
781781+ private zone managers can check for this particular condition before
782782+ they allow registration of any RTL label. Generally, it is best to
789789+790790+791791+ disallow registration of any right-to-left strings in a zone where
792792+ the label at the level above begins with a digit.
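Such a pre-registration check could look like the following minimal sketch. The function names are hypothetical; an RTL string is treated here as any label containing a character of Bidi class R, AL, or AN, and "begins with a digit" is interpreted as beginning with a character of class EN, which covers the European digits 0 to 9.

```python
import unicodedata

RTL_CLASSES = {"R", "AL", "AN"}


def is_rtl_label(label: str) -> bool:
    """True if any character has Bidi class R, AL, or AN."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def may_register(new_label: str, zone: str) -> bool:
    """Refuse an RTL label directly below a zone whose own label starts with an EN digit."""
    parent = zone.split(".")[0]  # the label at the level above the new one
    parent_starts_with_digit = bool(parent) and unicodedata.bidirectional(parent[0]) == "EN"
    return not (is_rtl_label(new_label) and parent_starts_with_digit)


print(may_register("\u05D0\u05D1", "123.example"))  # False
print(may_register("\u05D0\u05D1", "example"))      # True
print(may_register("abc", "123.example"))           # True
```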
793793+794794+7.2. Forward Compatibility Considerations
795795+796796+ This text is intentionally specified strictly in terms of the Unicode
797797+ Bidi properties. The determination that the condition is sufficient
798798+ to fulfill the criteria depends on the Unicode Bidi algorithm; it is
799799+ unlikely that drastic changes will be made to this algorithm.
800800+801801+ However, the determination of validity for any string depends on the
802802+ Unicode Bidi property values, which are not declared immutable by the
803803+ Unicode Consortium. Furthermore, the behavior of the algorithm for
804804+ any given character is likely to be linguistically and culturally
805805+ sensitive, so while it should occur rarely, it is possible that later
806806+ versions of the Unicode Standard may change the Bidi properties
807807+ assigned to certain Unicode characters.
808808+809809+ This memo does not propose a solution for this problem.
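The version dependency can be observed directly. Python's `unicodedata` module exposes its current Unicode database and, as `unicodedata.ucd_3_2_0`, the Unicode 3.2 data on which RFC 3454 (and therefore IDNA2003) is based. This sketch, with a hypothetical function name, lists the code points in a range whose Bidi class differs between the two; characters that were unassigned in Unicode 3.2 report an empty class there, so newly added characters show up as differences too.

```python
import unicodedata


def bidi_class_changes(start: int, stop: int) -> list[tuple[str, str, str]]:
    """(code point, Unicode 3.2 class, current class) for each difference in [start, stop)."""
    changes = []
    for cp in range(start, stop):
        ch = chr(cp)
        old = unicodedata.ucd_3_2_0.bidirectional(ch)  # "" if unassigned in Unicode 3.2
        new = unicodedata.bidirectional(ch)
        if old != new:
            changes.append((f"U+{cp:04X}", old, new))
    return changes


print("current Unicode database:", unicodedata.unidata_version)
print(bidi_class_changes(0x08A0, 0x08A2))  # Arabic letters added after Unicode 3.2
```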
810810+811811+8. Security Considerations
812812+813813+ The display behavior of mixed-direction text can be extremely
814814+ surprising to users who are not used to it; for instance, cut and
815815+ paste of a piece of text can cause the text to display differently at
816816+ the destination, if the destination is in another directionality
817817+ context, and adding a character in one place of a text can cause
818818+ characters some distance from the point of insertion to change their
819819+ display position. This is, however, not a phenomenon unique to the
820820+ display of domain names.
821821+822822+ The new IDNA protocol, and particularly these new Bidi rules, will
823823+ allow some strings to be used in IDNA contexts that are not allowed
824824+ today. It is possible that differences in the interpretation of
825825+ labels between implementations of IDNA2003 and IDNA2008 could pose a
826826+ security risk, but it is difficult to envision any specific
827827+ instantiation of this.
828828+829829+ Any rational attempt to compute, for instance, a hash over an
830830+ identifier processed by IDNA would use network order for its
831831+ computation, and thus be unaffected by the new rules proposed here.
832832+833833+ While it is not believed to pose a problem, if display routines had
834834+ been written with specific knowledge of the RFC 3454 IDNA
835835+ prohibitions, it is possible that the potential problems noted under
836836+ "Backwards Compatibility Considerations" could cause new kinds of
837837+ confusion.
845845+846846+847847+9. Acknowledgements
848848+849849+ While the listed editors held the pen, this document represents the
850850+ joint work and conclusions of an ad hoc design team. In addition to
851851+ the editors, this consisted of, in alphabetic order, Tina Dam, Patrik
852852+ Faltstrom, and John Klensin. Many further specific contributions and
853853+ helpful comments were received from the people listed below, and
854854+ others who have contributed to the development and use of the IDNA
855855+ protocols.
856856+857857+ The particular formulation of the Bidi rule in Section 2 was
858858+ suggested by Matitiahu Allouche.
859859+860860+ The team wishes, in particular, to thank Roozbeh Pournader for
861861+ calling its attention to the issue with the Thaana script, Paul
862862+ Hoffman for pointing out the need to be explicit about backwards
863863+ compatibility considerations, Ken Whistler for suggesting the basis
864864+ of the formalized "Character Grouping" requirement, Mark Davis for
865865+ commentary, Erik van der Poel for careful review, comments, and
866866+ verification of the rulesets, Marcos Sanz, Andrew Sullivan, and Pete
867867+ Resnick for reviews, and Vint Cerf for chairing the working group and
868868+ contributing massively to getting the documents finished.
869869+870870+10. References
871871+872872+10.1. Normative References
873873+874874+ [RFC5890] Klensin, J., "Internationalized Domain Names for
875875+ Applications (IDNA): Definitions and Document
876876+ Framework", RFC 5890, August 2010.
877877+878878+ [Unicode-UAX9] The Unicode Consortium, "Unicode Standard Annex #9:
879879+ Unicode Bidirectional Algorithm", September 2009,
880880+ <http://www.unicode.org/reports/tr9/>.
881881+882882+ [Unicode52] The Unicode Consortium. The Unicode Standard, Version
883883+ 5.2.0, defined by: "The Unicode Standard, Version
884884+ 5.2.0", (Mountain View, CA: The Unicode Consortium,
885885+ 2009. ISBN 978-1-936213-00-9).
886886+ <http://www.unicode.org/versions/Unicode5.2.0/>.
901901+902902+903903+10.2. Informative References
904904+905905+ [RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection",
906906+ RFC 2672, August 1999.
907907+908908+ [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
909909+ Internationalized Strings ("stringprep")", RFC 3454,
910910+ December 2002.
911911+912912+ [RFC5891] Klensin, J., "Internationalized Domain Names in
913913+ Applications (IDNA): Protocol", RFC 5891, August 2010.
914914+915915+ [SYO] "The Standardized Yiddish Orthography: Rules of
916916+ Yiddish Spelling, 6th ed., New York, ISBN
917917+ 0-914512-25-0", 1999.
918918+919919+Authors' Addresses
920920+921921+ Harald Tveit Alvestrand (editor)
922922+ Google
923923+ Beddingen 10
924924+ Trondheim, 7014
925925+ Norway
926926+927927+ EMail: harald@alvestrand.no
928928+929929+930930+ Cary Karp
931931+ Swedish Museum of Natural History
932932+ Frescativ. 40
933933+ Stockholm, 10405
934934+ Sweden
935935+936936+ Phone: +46 8 5195 4055
937937+ Fax:
938938+ EMail: ck@nic.museum