Consultor Eletrônico

Status: Unverified

GOAL:

I18N: What is compatibility decomposition?

FIX:

Compatibility decomposition is the process of taking a string, recursively replacing composite characters using both the Unicode canonical decomposition mappings and the Unicode compatibility decomposition mappings, and putting the result in canonical order. The result is the string in Unicode Normalization Form KD (NFKD).
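
As a rough illustration of the difference between the two kinds of mapping, the sketch below uses Python's standard unicodedata module. The function names compatibility_decompose and canonical_decompose are hypothetical and chosen only for this example. The ligature U+FB01 is used because it has a compatibility mapping but no canonical one.

```python
import unicodedata


def compatibility_decompose(text: str) -> str:
    """Apply canonical and compatibility mappings, then canonical ordering (NFKD)."""
    return unicodedata.normalize("NFKD", text)


def canonical_decompose(text: str) -> str:
    """Apply canonical mappings only, then canonical ordering (NFD)."""
    return unicodedata.normalize("NFD", text)


ligature = "\ufb01"  # LATIN SMALL LIGATURE FI, mapped as <compat> 0066 0069

# Compatibility decomposition replaces the ligature with the letters f and i.
print(compatibility_decompose(ligature))  # fi

# Canonical decomposition leaves the ligature untouched.
print(canonical_decompose(ligature) == ligature)  # True
```

Characters that have only canonical mappings, such as a-acute, give the same result under both functions.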

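Example of decomposition:

Take the string "ác´¸" (a-acute, c, acute, cedilla), that is, the code points 00E1 0063 0301 0327. The acute and cedilla that follow 'c' are the combining marks 0301 and 0327. None of these characters has a compatibility mapping, so in this example only the canonical mappings apply.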

The Unicode character data file (UnicodeData.txt) contains the following relevant information:
code; name; ... combining class; ... decomposition.
0061;LATIN SMALL LETTER A;...0;...
0063;LATIN SMALL LETTER C;...0;...
00E1;LATIN SMALL LETTER A WITH ACUTE;...0;...0061 0301;...
0107;LATIN SMALL LETTER C WITH ACUTE;...0;...0063 0301;...
0301;COMBINING ACUTE ACCENT;...230;...
0327;COMBINING CEDILLA;...202;...
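
The same fields can be inspected without opening the data file. The sketch below assumes Python's standard unicodedata module, which exposes the name, combining class, and decomposition mapping of a character. The helper name describe is hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, int, str]:
    """Return (code, name, combining class, decomposition) for one character."""
    return (
        f"{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.combining(ch),      # 0 for starters
        unicodedata.decomposition(ch),  # empty string when there is no mapping
    )


# The six characters listed in the excerpt above.
for ch in "\u0061\u0063\u00e1\u0107\u0301\u0327":
    print(";".join(str(field) for field in describe(ch)))
```

The values come from the Unicode database bundled with the Python interpreter, so the exact Unicode version depends on the Python release in use.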

Applying the canonical decomposition mappings, we get "a´c´¸" (a, acute, c, acute, cedilla).

This is because 00E1 (a-acute) has a canonical decomposition mapping to 0061 0301 (a, acute).
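
The mapping step can be sketched by hand as follows. This is a simplified illustration, not a complete implementation. It assumes Python's unicodedata module, uses a hypothetical function name decompose_canonical, applies canonical mappings only, and does not perform canonical ordering or handle Hangul syllables (which are decomposed algorithmically rather than through the data file).

```python
import unicodedata


def decompose_canonical(text: str) -> str:
    """Recursively apply canonical decomposition mappings, without reordering."""
    out = []
    for ch in text:
        mapping = unicodedata.decomposition(ch)
        # Compatibility mappings carry a tag such as "<compat>"; skip them here.
        if mapping and not mapping.startswith("<"):
            expanded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            # A mapped character may itself decompose further.
            out.append(decompose_canonical(expanded))
        else:
            out.append(ch)
    return "".join(out)


original = "\u00e1c\u0301\u0327"  # a-acute, c, acute, cedilla
decomposed = decompose_canonical(original)
print([f"{ord(ch):04X}" for ch in decomposed])
# ['0061', '0301', '0063', '0301', '0327']
```

To extend the sketch to compatibility decomposition, the tagged mappings would also be expanded after stripping the leading tag.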

Applying the canonical ordering, we get "a´c¸´" (a, acute, c, cedilla, acute).

This is because cedilla has a lower combining class (202) than acute (230), so it is placed first. The positions of 'a' and 'c' are not affected, since they are starters (combining class 0).
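
The ordering step can be sketched as a stable sort of each run of combining marks by combining class. The code below is a minimal illustration under that assumption. The function name canonical_order is hypothetical, and the final line compares the result with the output of Python's unicodedata.normalize.

```python
import unicodedata


def canonical_order(text: str) -> str:
    """Sort each run of non-starters by combining class; starters stay in place."""
    chars = list(text)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) == 0:
            i += 1  # starter: never moved
            continue
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal classes keep their order.
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)


decomposed = "a\u0301c\u0301\u0327"  # a, acute, c, acute, cedilla
ordered = canonical_order(decomposed)
print([f"{ord(ch):04X}" for ch in ordered])
# ['0061', '0301', '0063', '0327', '0301']

# The library result for the original string matches the hand-ordered one.
print(ordered == unicodedata.normalize("NFKD", "\u00e1c\u0301\u0327"))  # True
```

A stable sort matters here because marks with the same combining class must keep their original relative order.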