Second-Level Reference Label Generation Rules

ICANN has developed second-level Internationalized Domain Name (IDN) tables in a machine-readable format, known as Label Generation Rules (LGRs), that registry operators can reference when designing their IDN tables. ICANN org will use these reference LGRs when reviewing IDN tables submitted for use with generic top-level domains (gTLDs).

The reference LGRs have been developed using guidelines that the community has reviewed. They are provided below in XML format along with a more readable HTML format.
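As an illustration of what "machine-readable" means here, the following is a minimal sketch of reading an LGR file with Python's standard library. It assumes the XML follows the LGR format specified in RFC 7940 (namespace `urn:ietf:params:xml:ns:lgr-1.0`, with `char` and `var` elements carrying hexadecimal `cp` attributes). The embedded sample and the helper names `cp_to_text` and `read_repertoire` are hypothetical and are not taken from any ICANN reference LGR.

```python
import xml.etree.ElementTree as ET

# XML namespace that RFC 7940 defines for LGR documents.
LGR_NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}

# Hypothetical, heavily trimmed sample for illustration only.
SAMPLE_LGR = """\
<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <meta>
    <version>1</version>
    <language>und-Latn</language>
  </meta>
  <data>
    <char cp="0061"/>
    <char cp="0062"/>
    <char cp="00E9">
      <var cp="0065" type="blocked"/>
    </char>
  </data>
</lgr>
"""


def cp_to_text(cp_attr):
    """Convert a 'cp' attribute (space-separated hex code points) to a string."""
    return "".join(chr(int(cp, 16)) for cp in cp_attr.split())


def read_repertoire(xml_text):
    """Map each <char> in the LGR to a list of (variant, type) pairs."""
    root = ET.fromstring(xml_text)
    repertoire = {}
    for char in root.findall("lgr:data/lgr:char", LGR_NS):
        variants = [
            (cp_to_text(var.get("cp")), var.get("type"))
            for var in char.findall("lgr:var", LGR_NS)
        ]
        repertoire[cp_to_text(char.get("cp"))] = variants
    return repertoire


if __name__ == "__main__":
    for char, variants in read_repertoire(SAMPLE_LGR).items():
        print(f"U+{ord(char[0]):04X} {char!r}: {variants}")
```

A real reference LGR also contains code point ranges, whole-label rules, and richer metadata, none of which this sketch handles.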

If you have questions or feedback regarding these reference LGRs, please send an email to IDNprogram@icann.org.

Current Version (25 October 2024)

The current version of the Second-Level Reference LGRs is developed in consultation with the respective script communities. Other resources are also consulted where available, e.g., the Root Zone Label Generation Rules (RZ-LGR). The LGRs are finalized after a Public Comment proceeding.

This version adds the Balinese script and Thaana script LGRs as well as the Inuktitut language LGR. The Common LGR is updated to incorporate these new LGRs.

See the Overview and Summary document for further details about these LGRs. The package of all LGRs is available here [ZIP, 13.6 MB]. LGRs with normative updates are marked with (*), and the changes are documented in the LGR document.

Script-based LGRs

Name | Language Tag [1] | LGR Document
Arabic | und-Arab | HTML, XML
Armenian | und-Armn | HTML, XML
Balinese | und-Bali | HTML, XML, Supporting Document
Bangla (Bengali) | und-Beng | HTML, XML
Cyrillic | und-Cyrl | HTML, XML
Devanagari | und-Deva | HTML, XML
Ethiopic | und-Ethi | HTML, XML
Georgian | und-Geor | HTML, XML
Greek | und-Grek | HTML, XML
Gujarati | und-Gujr | HTML, XML
Gurmukhi | und-Guru | HTML, XML
Hebrew | und-Hebr | HTML, XML
Japanese | und-Jpan | HTML, XML
Kannada | und-Knda | HTML, XML
Khmer | und-Khmr | HTML, XML
Lao | und-Laoo | HTML, XML
Latin | und-Latn | HTML, XML
Malayalam | und-Mlym | HTML, XML
Myanmar | und-Mymr | HTML, XML
Oriya | und-Orya | HTML, XML
Sinhala | und-Sinh | HTML, XML
Tamil | und-Taml | HTML, XML
Telugu | und-Telu | HTML, XML
Thaana | und-Thaa | HTML, XML, Supporting Document
Thai | und-Thai | HTML, XML

1: The prefix 'und' (Undetermined) identifies linguistic content whose language is not determined. See RFC 5646 for details of the language tag syntax and the IANA Language Subtag Registry for the available language tags.
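To show how the tags on this page break down, here is a small sketch that splits a tag into its language and script subtags. It deliberately covers only the two shapes used on this page (a 2-3 letter language subtag, optionally followed by a 4-letter script subtag); the full RFC 5646 grammar allows more (regions, variants, extensions). The names `TAG_RE` and `split_tag` are hypothetical.

```python
import re

# Simplified pattern: language subtag plus optional script subtag.
# This is an assumption-laden subset of RFC 5646, not a full parser.
TAG_RE = re.compile(r"^(?P<language>[A-Za-z]{2,3})(?:-(?P<script>[A-Za-z]{4}))?$")


def split_tag(tag):
    """Return (language, script) for tags such as 'und-Arab' or 'bs'."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    language = match.group("language").lower()
    script = match.group("script")
    # Script subtags are conventionally written in title case, e.g. 'Cyrl'.
    return language, (script.capitalize() if script else None)


if __name__ == "__main__":
    for tag in ["und-Arab", "bs-Cyrl", "bs", "iu-Cans"]:
        language, script = split_tag(tag)
        kind = "script-based" if language == "und" else "language-based"
        print(f"{tag}: language={language}, script={script} ({kind})")
```

Under this reading, a tag whose language subtag is `und` identifies a script-based LGR, while any other language subtag identifies a language-based one.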

Full Variant Set LGRs and Common LGR

A set of "full-variant" LGRs has been defined that collectively contains the cross-script variants identified to mitigate whole-script homograph labels, mostly within related scripts.

Name | Language Tag | Script Collection | LGR Document
Chinese (Full Variant Set) | und-Hani | Han used in Chinese, Korean, Japanese scripts | HTML, XML
Devanagari (Full Variant Set) | und-Deva | Devanagari, Bengali, and Gurmukhi | HTML, XML
Korean (Full Variant Set) | und-Kore | Hangul and Han used in Chinese and Korean script | HTML, XML
Latin (Full Variant Set) | und-Latn | Armenian, Cyrillic, Greek, Hebrew, and Latin | HTML, XML
Myanmar (Full Variant Set)* | und-Mymr | Georgian, Latin, Malayalam, Myanmar, and Oriya | HTML, XML
Tamil (Full Variant Set) | und-Taml | Tamil and Malayalam | HTML, XML
Telugu (Full Variant Set) | und-Telu | Kannada and Telugu | HTML, XML
Common LGR | Multiple Tags | All scripts | HTML, XML
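The idea of a whole-script homograph can be sketched in a few lines: two labels written entirely in different scripts look the same, and a cross-script variant mapping lets a registry detect the collision. The mapping below is a hypothetical handful of Cyrillic-to-Latin look-alikes chosen for illustration; it is not the variant set from any reference LGR, and real LGRs also attach a disposition (such as blocked) to each variant.

```python
# Hypothetical cross-script variant pairs (Cyrillic -> Latin), illustration only.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def index_label(label):
    """Replace each code point by its mapped variant to get a comparison key."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in label)


def are_variants(label_a, label_b):
    """Two labels are treated as variants if their comparison keys match."""
    return index_label(label_a) == index_label(label_b)


if __name__ == "__main__":
    cyrillic = "\u0441\u043e\u0440\u0435"  # four Cyrillic letters resembling "cope"
    latin = "cope"
    print(cyrillic == latin)              # False: different code points
    print(are_variants(cyrillic, latin))  # True: same key under the mapping
```

With such a mapping in place, registering one of the two labels would let the registry block or otherwise handle the other, which is the mitigation the full-variant LGRs are meant to support.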

Language-based LGRs

Name | Language Tag [2] | LGR Document
Arabic | ar | HTML, XML
Belarusian | be | HTML, XML
Bosnian (Cyrillic) | bs-Cyrl | HTML, XML
Bosnian (Latin) | bs | HTML, XML
Bulgarian | bg | HTML, XML
Chinese | zh | HTML, XML
Danish | da | HTML, XML
English | en | HTML, XML
Finnish | fi | HTML, XML
French | fr | HTML, XML
German | de | HTML, XML
Hebrew | he | HTML, XML
Hindi | hi | HTML, XML
Hungarian | hu | HTML, XML
Icelandic | is | HTML, XML
Inuktitut | iu-Cans | HTML, XML, Supporting Document
Italian | it | HTML, XML
Japanese (Standalone) | ja | HTML, XML
Korean (Hangul) | ko | HTML, XML
Latvian | lv | HTML, XML
Lithuanian | lt | HTML, XML
Macedonian | mk | HTML, XML
Montenegrin | cnr-Cyrl | HTML, XML
Norwegian | no | HTML, XML
Polish | pl | HTML, XML
Portuguese | pt | HTML, XML
Russian | ru | HTML, XML
Serbian | sr-Cyrl | HTML, XML
Spanish* | es | HTML, XML
Swedish | sv | HTML, XML
Thai | th | HTML, XML
Ukrainian | uk | HTML, XML

2: Where the default script is not identified, the script information is included to avoid ambiguity.

Full Variant Set LGRs for RSP Evaluation Program

All languages and scripts have associated "rsp-full-variant" LGRs which include the injected cross-repertoire variant sets. They are used as part of the Registry Service Provider (RSP) Evaluation Program. Further details are available at: https://newgtldprogram.icann.org/en/application-rounds/round2/rsp/full-variant-set-lgrs

Archives

Version 24 January 2024

Versions before 24 January 2024

Domain Name System
Internationalized Domain Name (IDN): IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di. A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH-labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.
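The U-label to A-label conversion described above can be reproduced with Python's built-in `idna` codec, as in the sketch below. Note the assumption: that codec implements the older IDNA 2003 rules (RFC 3490), so it may accept or map some labels differently from current registry practice; it is used here only to illustrate the encoding step. The helper names `to_a_label` and `to_u_label` are hypothetical.

```python
# The Devanagari U-label quoted in the glossary text, written with escapes
# so the exact code points are unambiguous.
U_LABEL = "\u092a\u0930\u0940\u0915\u093e"


def to_a_label(label):
    """Encode a single label to its ASCII-compatible (A-label) form."""
    return label.encode("idna").decode("ascii")


def to_u_label(label):
    """Decode an A-label back to its Unicode (U-label) form."""
    return label.encode("ascii").decode("idna")


if __name__ == "__main__":
    a_label = to_a_label(U_LABEL)
    print(a_label)                         # xn--11b5bs1di
    print(to_u_label(a_label) == U_LABEL)  # True: the encoding round-trips
    print(to_a_label("icann"))             # icann (an LDH label is unchanged)
```

The `xn--` prefix marks the label as ACE-encoded; the remainder is the Punycode form of the Unicode code points.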