Kbase P130103: Only first character of string displayed after using CODEPAGE-CONVERT function with code page UTF-16
Author   Progress Software Corporation - Progress
Access   Public
Published   02/04/2008
Status: Unverified

FACT(s) (Environment):

OpenEdge 10.x
All Supported Operating Systems

SYMPTOM(s):

Converting string to UTF-16 code page

Using CODEPAGE-CONVERT with UTF-16 code page

Only first character of string is displayed after operation

CODEPAGE-CONVERT("abc","UTF-16") appears to return "a" when the result is displayed

CODEPAGE-CONVERT("abc","UTF-8") will return "abc"

CAUSE:

This is expected behaviour. Standard Western single-byte characters (code points 255 and below, of which ASCII covers 0 to 127) are 8-bit. Characters in the UTF-16 code page are 16-bit. When a string is converted to UTF-16, each Western character must be padded out to 16 bits, and this is achieved by adding an extra zero byte (0x00).
There are 2 forms of UTF-16: UTF-16LE (the default) adds the extra zero byte after each character; UTF-16BE adds the extra zero byte before each character.
A UTF-16 client recognizes a double zero (two consecutive zero bytes) as a string terminator.
A UTF-8 client recognizes a single zero byte as a string terminator.
The Progress client is a UTF-8 client.
The Progress client therefore interprets the first zero byte (added as padding to make the character 16-bit) as a string terminator.
As a result, a standard MESSAGE or DISPLAY will only show the first character of the string, although the rest of the string data is still stored in the variable.
If no byte order is specified, UTF-16 defaults to UTF-16LE. If UTF-16BE is specified, no value is displayed at all, because the very first byte of the converted string is already a padding zero.
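
The byte layout described above can be reproduced outside Progress. The following is a minimal sketch in plain Python (not Progress ABL); the helper name visible_to_single_zero_client is hypothetical and only mimics a client that stops reading at the first zero byte. It is not how the Progress client is implemented.

```python
# Illustration only: plain Python, not Progress ABL.

def visible_to_single_zero_client(data: bytes) -> bytes:
    """Return the bytes seen by a client that stops at the first zero byte.

    Hypothetical helper that mimics a single-zero string terminator.
    """
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]


text = "abc"
utf16_le = text.encode("utf-16-le")  # zero byte after each character
utf16_be = text.encode("utf-16-be")  # zero byte before each character

print(utf16_le.hex(" "))  # 61 00 62 00 63 00
print(utf16_be.hex(" "))  # 00 61 00 62 00 63

print(visible_to_single_zero_client(utf16_le))  # b'a'
print(visible_to_single_zero_client(utf16_be))  # b''
```

With UTF-16LE the first zero byte follows "a", so only "a" is visible. With UTF-16BE the first byte is already zero, so nothing is visible.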

FIX:

Option #1
Parse the data to remove the padding added by UTF-16.
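
As a rough sketch of what removing the padding involves, the following plain Python (not Progress ABL) drops the zero byte from each 16-bit unit of UTF-16LE data. The function name strip_utf16le_padding is hypothetical. The approach assumes every character is 255 or below, which is the only case where the second byte is pure padding; for anything else it raises an error rather than corrupting the data.

```python
# Illustration only: plain Python, not Progress ABL.

def strip_utf16le_padding(data: bytes) -> bytes:
    """Drop the zero high byte of each UTF-16LE unit (hypothetical helper).

    Only valid when every character is 255 or below, so that the
    high byte of each 16-bit unit is padding and nothing else.
    """
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    low, high = data[0::2], data[1::2]  # UTF-16LE: low byte first
    if any(high):
        raise ValueError("data contains characters above 255")
    return low


padded = "abc".encode("utf-16-le")
print(strip_utf16le_padding(padded))  # b'abc'
```

For UTF-16BE data the zero byte comes first, so the two slices would swap roles.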

Option #2
Use XML to transfer data, setting the encoding of the XML document to UTF-16. The information will be parsed by the SAX parser rather than the Progress client.

Option #3
Use UTF-8 instead.
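
UTF-8 avoids the problem because it only produces a zero byte for the NUL character itself, so ordinary text never contains an embedded terminator. A small check in plain Python (not Progress ABL), with sample strings chosen purely for illustration:

```python
# Illustration only: plain Python, not Progress ABL.
samples = ["abc", "São Paulo", "Zürich €"]

for text in samples:
    encoded = text.encode("utf-8")
    # Report whether the UTF-8 bytes contain a zero byte anywhere.
    has_zero = b"\x00" in encoded
    print(text, "->", encoded.hex(" "), "| zero byte present:", has_zero)
```

None of the samples contains a zero byte in UTF-8, including those with accented characters, whereas the UTF-16 form of even "abc" does.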