Kbase P130103: Only first character of string displayed after using CODEPAGE-CONVERT function with code page UTF-16
Author |
  Progress Software Corporation - Progress |
Access |
  Public |
Published |
  02/04/2008 |
Status: Unverified
FACT(s) (Environment):
OpenEdge 10.x
All Supported Operating Systems
SYMPTOM(s):
Converting string to UTF-16 code page
Using CODEPAGE-CONVERT with UTF-16 code page
Only first character of string is displayed after operation
The result of CODEPAGE-CONVERT("abc","UTF-16") is displayed as "a"
The result of CODEPAGE-CONVERT("abc","UTF-8") is displayed as "abc"
CAUSE:
This is expected behaviour. Standard Western characters (character codes 255 and below, which include ASCII) are 8-bit. Characters in the UTF-16 code page are 16-bit. When a string is converted to UTF-16, Western characters need to be padded out to 16 bits, and this is achieved by adding an extra zero byte (0) to each character.
There are 2 forms of UTF-16: UTF-16LE (default) adds the extra zero after a character; UTF-16BE adds the extra zero before a character.
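The padding described above can be made visible by dumping the encoded bytes. The following is a minimal sketch in Python rather than ABL, used only to illustrate the byte layout; the helper name utf16_bytes is hypothetical and is not part of any Progress API.

```python
def utf16_bytes(text: str, big_endian: bool = False) -> bytes:
    """Encode text as UTF-16 without a byte-order mark.

    Hypothetical helper for illustration only.
    """
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return text.encode(codec)


little = utf16_bytes("abc")                 # zero byte follows each character
big = utf16_bytes("abc", big_endian=True)   # zero byte precedes each character

print(little.hex(" "))  # 61 00 62 00 63 00
print(big.hex(" "))     # 00 61 00 62 00 63
```

In the little-endian form the second byte of the buffer is already zero; in the big-endian form the very first byte is zero.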
A UTF-16 client recognizes double-zero (00) as a string terminator.
A UTF-8 client recognizes a single zero (0) as a string terminator.
The Progress client is a UTF-8 client.
The Progress client interprets the first zero (added as padding to make the character 16-bit) as a string terminator.
Therefore a standard MESSAGE or DISPLAY will only show the first character of the string, even though the rest of the string data is still stored in the variable.
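If the byte order is not specified, UTF-16 defaults to UTF-16LE. If UTF-16BE is specified, no value is displayed at all, because the padding zero comes before the first character and is read as the string terminator.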
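The truncation can be reproduced outside Progress by modelling a reader that stops at the first zero byte. This is a rough sketch in Python, assuming the display logic behaves like a simple single-zero-terminated reader; read_until_nul is a hypothetical name, not a Progress function.

```python
def read_until_nul(buffer: bytes) -> bytes:
    """Return the bytes before the first zero byte.

    Models (as an assumption) a client that treats a single zero
    byte as the end of the string.
    """
    end = buffer.find(b"\x00")
    return buffer if end == -1 else buffer[:end]


print(read_until_nul("abc".encode("utf-16-le")))  # b'a'
print(read_until_nul("abc".encode("utf-16-be")))  # b''
print(read_until_nul("abc".encode("utf-8")))      # b'abc'
```

The three results match the symptoms: one character for UTF-16LE, nothing for UTF-16BE, and the full string for UTF-8.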
FIX:
Option #1
Parse the data to remove the padding added by UTF-16.
Option #2
Use XML to transfer data, setting the encoding of the XML document to UTF-16. The information will be parsed by the SAX parser rather than the Progress client.
Option #3
Use UTF-8 instead.
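As an illustration of Option #1, the padding can be removed by keeping only the data byte of each 16-bit unit. The sketch below is in Python rather than ABL and assumes every character has a code of 255 or less, so that the other byte of each unit is always zero; strip_utf16_padding is a hypothetical helper, and the equivalent ABL would need to work on the raw bytes in the same way.

```python
def strip_utf16_padding(buffer: bytes, big_endian: bool = False) -> bytes:
    """Drop the zero padding byte from each 16-bit unit.

    Only valid when every character fits in 8 bits; raises
    ValueError if any padding byte is non-zero.
    """
    if len(buffer) % 2 != 0:
        raise ValueError("UTF-16 data must have an even number of bytes")
    data_start = 1 if big_endian else 0   # position of the data byte
    pad_start = 1 - data_start            # position of the padding byte
    if any(buffer[pad_start::2]):
        raise ValueError("buffer contains characters wider than 8 bits")
    return buffer[data_start::2]


print(strip_utf16_padding("abc".encode("utf-16-le")))                   # b'abc'
print(strip_utf16_padding("abc".encode("utf-16-be"), big_endian=True))  # b'abc'
```

Characters above 255 use both bytes of the unit, so this approach rejects them rather than silently corrupting the data; for such text a real UTF-16 decoder or one of the other options is the safer choice.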