Kbase P82165: I18N What is Canonical decomposition
Author   Progress Software Corporation - Progress
Access   Public
Published   27/05/2004
Status: Unverified

GOAL:

I18N What is Canonical decomposition

FIX:

Canonical decomposition is the process of taking a string, recursively replacing composite characters using the Unicode canonical decomposition mappings (including the algorithmic Hangul canonical decomposition mappings), and putting the result in canonical order.
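The algorithmic Hangul mappings mentioned above are computed by arithmetic on the code point rather than looked up in a table. The following is a minimal sketch in Python, assuming the constants published in the Unicode Standard for Hangul syllable decomposition; the function name decompose_hangul is hypothetical and used only for illustration.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def decompose_hangul(ch: str) -> str:
    """Return the canonical decomposition of one Hangul syllable.

    Characters outside the precomposed syllable block are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    # A trailing index of 0 means the syllable has no trailing consonant.
    return chr(lead) + chr(vowel) + (chr(trail) if trail != T_BASE else "")


print([f"{ord(c):04X}" for c in decompose_hangul("\uD55C")])
# ['1112', '1161', '11AB']
```

For syllables in this block, the result should agree with what Python's standard unicodedata.normalize("NFD", ...) returns.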

Example of decomposition:

Take the string "ác´¸" (a-acute, c, combining acute, combining cedilla), that is, the code points 00E1 0063 0301 0327.

The Unicode Character Database data file (UnicodeData.txt) contains the following relevant information:
code; name; ... combining class; ... decomposition.
0061;LATIN SMALL LETTER A;...0;...
0063;LATIN SMALL LETTER C;...0;...
00E1;LATIN SMALL LETTER A WITH ACUTE;...0;...0061 0301;...
0107;LATIN SMALL LETTER C WITH ACUTE;...0;...0063 0301;...
0301;COMBINING ACUTE ACCENT;...230;...
0327;COMBINING CEDILLA;...202;...
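The same fields can be inspected without parsing the data file by hand. As a small illustration, Python's standard unicodedata module exposes the name, combining class, and decomposition mapping of a character; the helper name lookup is hypothetical, and the values returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def lookup(code_point: int) -> tuple[str, int, str]:
    """Return (name, combining class, decomposition mapping) for a code point."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.combining(ch),      # 0 for starters
        unicodedata.decomposition(ch),  # empty string when there is no mapping
    )


for cp in (0x0061, 0x0063, 0x00E1, 0x0107, 0x0301, 0x0327):
    name, ccc, mapping = lookup(cp)
    print(f"{cp:04X};{name};{ccc};{mapping}")
```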

Applying the canonical decomposition mappings, we get "a´c´¸" (a, acute, c, acute, cedilla).

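This is because 00E1 (a-acute) has a canonical decomposition mapping to 0061 0301 (a, acute). The remaining characters (c, acute, cedilla) have no decomposition mapping and are left unchanged.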
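The mapping step can be sketched in Python as a recursive replacement. This is an illustrative sketch, not a complete implementation: the function name decompose is hypothetical, it relies on the standard unicodedata module, it skips compatibility mappings (those tagged with "<...>"), and it does not cover Hangul syllables, whose mappings are algorithmic and are not returned by unicodedata.decomposition.

```python
import unicodedata


def decompose(s: str) -> str:
    """Recursively apply canonical decomposition mappings (no reordering)."""
    out = []
    for ch in s:
        mapping = unicodedata.decomposition(ch)
        # Mappings starting with "<" are compatibility mappings, not canonical.
        if mapping and not mapping.startswith("<"):
            expanded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            # Recurse, because a mapping may itself contain composite characters.
            out.append(decompose(expanded))
        else:
            out.append(ch)
    return "".join(out)


result = decompose("\u00E1c\u0301\u0327")
print([f"{ord(c):04X}" for c in result])
# ['0061', '0301', '0063', '0301', '0327']
```

Note that the output is not yet in canonical order; the acute still precedes the cedilla after the c.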

Applying the canonical ordering, we get "a´c¸´" (a, acute, c, cedilla, acute).

This is because cedilla has a lower combining class (202) than acute (230) does.

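The positions of 'a' and 'c' are not affected, since they are starters (characters with combining class 0). Combining marks are reordered only among the adjacent non-starters that follow a starter.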
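The ordering step can be sketched as a stable sort of each run of non-starters by combining class. The sketch below is in Python and assumes its input has already been through the mapping step; the function name canonical_order is hypothetical, and the final comparison against unicodedata.normalize is included only as a sanity check.

```python
import unicodedata


def canonical_order(s: str) -> str:
    """Sort each run of non-starters by combining class; starters stay in place."""
    out = []
    run = []  # pending combining marks (combining class != 0)
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run; sorted() is stable, so marks
            # with equal combining class keep their original relative order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


ordered = canonical_order("a\u0301c\u0301\u0327")
print([f"{ord(c):04X}" for c in ordered])
# ['0061', '0301', '0063', '0327', '0301']
print(ordered == unicodedata.normalize("NFD", "\u00E1c\u0301\u0327"))
# True
```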