UASG Recommendations
*User interface elements that require the user to type a domain name or email address must support Unicode and strings of up to 256 characters.
*Users should be allowed to enter ASCII-Compatible Encoding (“Punycoded”) text in place of its Unicode equivalent.
*Unicode should be shown by default.
*Punycoded text should be shown only when it provides a benefit.
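The points above can be sketched as a pair of helpers that accept either form as input and return the form needed. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, and it assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules rather than the IDNA 2008 documents recommended elsewhere in this guide.

```python
def to_a_label(domain: str) -> str:
    """Return the ASCII-compatible (Punycoded) form of a domain name.

    Assumption: uses Python's built-in "idna" codec (IDNA 2003).
    """
    return domain.encode("idna").decode("ascii")


def to_u_label(domain: str) -> str:
    """Return the Unicode form of a domain name, suitable for display."""
    return domain.encode("idna").decode("idna")


# The user may type either form; both normalize to the same pair of values.
typed_unicode = "everyone.みんな"
typed_punycode = "everyone.xn--q9jyb4c"
print(to_a_label(typed_unicode))
print(to_u_label(typed_punycode))
```

Accepting both forms at the input boundary and converting once keeps the rest of the application free to show Unicode by default.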
Many programmers have been trained to validate by following heuristics that require checking that a top-level domain has the “correct” number of letters, or that the letters are from the ASCII character set. These heuristics are no longer applicable because of the introduction of top-level domains with more than three characters and of domain names containing Unicode (non-ASCII) characters.
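As an illustration, a structural check can avoid both heuristics by testing only label and name lengths in the ASCII-compatible form. The sketch below is an assumption-laden example: `is_plausible_domain` is a hypothetical name, the built-in "idna" codec follows IDNA 2003, and the check does not apply the IDNA 2008 code point rules or confirm that the top-level domain actually exists.

```python
def is_plausible_domain(domain: str) -> bool:
    """Structural check only: no assumptions about TLD length or script."""
    try:
        # Assumption: built-in "idna" codec (IDNA 2003), which rejects
        # empty labels and labels longer than 63 octets.
        ascii_form = domain.encode("idna")
    except UnicodeError:
        return False
    labels = ascii_form.rstrip(b".").split(b".")
    return (
        len(labels) >= 2                      # at least "name.tld"
        and len(ascii_form) <= 253            # overall length limit
        and all(0 < len(label) <= 63 for label in labels)
    )


print(is_plausible_domain("example.photography"))  # long ASCII TLD
print(is_plausible_domain("everyone.みんな"))        # non-ASCII TLD
print(is_plausible_domain("a..b"))                 # empty label
```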
Regardless of the lifetime of the data, it should be stored in RFC-defined formats (preferred) or in other formats that can be transformed to and from RFC-defined formats.
Store data in UTF-8 (Unicode Transformation Format). Some systems may require support for UTF-16 as well, but generally UTF-8 is preferred. UTF-7 and UTF-32 should be avoided.
Consider all end-to-end scenarios before converting A-Labels to U-Labels and vice versa when storing. It may be desirable to maintain only U-Labels in a file or database, because it simplifies searching and sorting. However, conversion may have implications when interoperating with older, non-Unicode-enabled applications and services. Consider storing in both formats.
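One way to store both formats is to derive them together from a single input, so the two values cannot drift apart. The following is a minimal sketch under stated assumptions: the `StoredDomain` class and its field names are hypothetical, and the conversion relies on Python's built-in "idna" codec (IDNA 2003).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredDomain:
    u_label: str  # Unicode form: convenient for searching, sorting, display
    a_label: str  # ASCII form: for older, non-Unicode-enabled systems

    @classmethod
    def from_input(cls, domain: str) -> "StoredDomain":
        # Assumption: built-in "idna" codec; lowercase so that equal
        # names compare equal regardless of how they were typed.
        a_label = domain.encode("idna").decode("ascii").lower()
        u_label = a_label.encode("ascii").decode("idna")
        return cls(u_label=u_label, a_label=a_label)


record = StoredDomain.from_input("everyone.xn--q9jyb4c")
print(record.u_label, record.a_label)
```

Because both fields come from the same normalized value, a record created from the Unicode form equals one created from the Punycoded form.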
When email addresses and domain names have been filed under the “author” field of a document or the “contact info” field of a log file, the fact that they were addresses has been lost.
Activity (e.g., searching or sorting a list)
Alternate format (e.g., storing ASCII as Unicode).
Additional validation may also occur during processing. Domain names and email addresses can be processed in an unlimited number of ways*, which reinforces the need for conventions that ensure data is being understood and classified consistently. *Examples: Identify people in New Zealand by searching within the .nz ccTLD; identify pharmacists by searching for user@*.pharmacist email addresses.
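The search examples above depend on every address being classified the same way. A minimal sketch, assuming a hypothetical helper `tld_of`, made-up sample addresses, and Python's built-in "idna" codec (IDNA 2003), normalizes the domain part before comparing top-level domains:

```python
def tld_of(address: str) -> str:
    """Return the lowercased A-label TLD of a domain name or email address."""
    domain = address.rsplit("@", 1)[-1]          # drop the local part, if any
    ascii_domain = domain.encode("idna").decode("ascii").lower()
    return ascii_domain.rstrip(".").rsplit(".", 1)[-1]


# Hypothetical sample data, for illustration only.
addresses = [
    "alice@example.nz",
    "bob@shop.pharmacist",
    "carol@everyone.みんな",
    "dave@EXAMPLE.NZ",
]

in_nz = [a for a in addresses if tld_of(a) == "nz"]
pharmacists = [a for a in addresses if tld_of(a) == "pharmacist"]
print(in_nz)
print(pharmacists)
```

Comparing on a single normalized form means that case differences and the choice between Unicode and Punycoded input do not change the search result.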
Recommendations:
Since the Unicode standard is continually expanding, code points not defined when the application or service was created should be checked to ensure they will not “break” the user experience. Missing fonts in the underlying operating system may result in non-displayable characters (frequently the “” character is used to represent these), but this situation should not result in a fatal crash.
Use supported Unicode-enabled APIs.
Use the latest Internationalized Domain Names in Applications (IDNA) Protocol [http://tools.ietf.org/html/rfc5891] and Tables [http://tools.ietf.org/html/rfc5892] documents for Internationalized Domain Names (IDNs).
Process in UTF-8 format wherever possible.
Ensure that the product or feature handles numbers as expected. For example, ASCII numerals and Asian ideographic number representations should be treated as numbers. [RFC5892, link above]
Upgrade applications and servers/services together. If the server is Unicode and client is non-Unicode or vice versa, the data will need to be converted to each code page every time the data travels between server and client.
Perform code reviews to avoid buffer overflow attacks. When doing character transformation, text strings may grow or shrink substantially.
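The last recommendation can be made concrete by measuring the same label in several encodings. This is a small illustrative sketch that assumes Python's built-in "idna" codec (IDNA 2003) for the A-label conversion; the variable names are arbitrary.

```python
u_label = "みんな"
a_label = u_label.encode("idna").decode("ascii")

# The same label occupies a different amount of space in each form,
# so buffers and fixed-width fields must be sized for the largest one.
sizes = {
    "code points": len(u_label),
    "UTF-8 bytes": len(u_label.encode("utf-8")),
    "UTF-16 bytes": len(u_label.encode("utf-16-le")),
    "A-label bytes": len(a_label),
}
for form, size in sizes.items():
    print(f"{form}: {size}")
```

Here three code points become 9 bytes in UTF-8 and 11 bytes as the A-label "xn--q9jyb4c", which is why a field sized by character count can be too small after transformation.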
Displaying domain names and email addresses is usually straightforward when the scripts used are supported in the underlying OS and strings are stored in Unicode; however, application-specific transformations may be required otherwise.
Recommendations
Display all Unicode code points which are supported by the underlying operating system. If an application maintains its own font sets, comprehensive Unicode support should be offered to the collection of fonts available from the operating system.
When developing an app or a service, or when operating a registry, consider the languages supported and make sure OS and applications cover those languages.
Convert non-Unicode data to Unicode before display. For example, the end user should see “everyone.みんな” as opposed to “everyone.xn--q9jyb4c”. (This conversion is an example of UA-ready processing).
Display Unicode by default. Show Punycoded text to the user only when it provides a benefit. As a mitigation, augment the Unicode display with Punycoded hover text.
Consider that mixed-script addresses will become more common. Some Unicode characters may look the same to the human eye, but different to computers. Don’t assume that mixed-script strings are intended for malicious purposes, such as phishing, and if the user interface calls the strings to the user’s attention, be sure that it does so in a way which is not prejudicial to users of non-Latin scripts. Learn more about Unicode Security Considerations at: http://unicode.org/reports/tr36/.
Use Unicode IDNA Compatibility Processing in order to match user expectations. To learn more, go to: http://unicode.org/reports/tr46/.
Be aware of unassigned and disallowed characters. Learn more at RFC 5892: https://tools.ietf.org/rfc/rfc5892.txt.
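The conversion of stored Punycoded text to Unicode before display can be sketched as follows. The function name `display_form` is hypothetical, and the sketch assumes Python's built-in "idna" codec, which follows IDNA 2003 rather than the Unicode IDNA Compatibility Processing referenced above. A label that cannot be converted is shown as given, so a display problem does not become a crash.

```python
def display_form(domain: str) -> str:
    """Return a domain name with Punycoded labels converted to Unicode."""
    labels = []
    for label in domain.split("."):
        try:
            # Assumption: built-in "idna" codec (IDNA 2003).
            labels.append(label.encode("ascii").decode("idna"))
        except UnicodeError:
            # Already Unicode, or not convertible: keep the text as given.
            labels.append(label)
    return ".".join(labels)


print(display_form("everyone.xn--q9jyb4c"))
```

The original Punycoded string can still be kept alongside the result, for example to supply hover text.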
Content Credit: UASG Tech.