Consultor Eletrônico



Kbase P124474: SAX parser does not return extended characters correctly when using -cpinternal UTF-8
Author: Progress Software Corporation - Progress
Access: Public
Published: 19/06/2007
Status: Unverified

FACT(s) (Environment):

OpenEdge 10.x

SYMPTOM(s):

Using SAX Parser

XML document contains extended characters

Incorrect data returned when session is started with -cpinternal UTF-8

ALBANIË is returned as ALBANI@

Data is retrieved with the GET-STRING function

The numChars parameter of the Characters callback procedure is used to determine the amount of data retrieved

Problem does not occur when a different code page is specified with -cpinternal

Problem does not occur when the LONGCHAR data type is used in place of MEMPTR in the Characters callback procedure

Problem does not occur when the numChars parameter is not used to specify the amount of data retrieved by the GET-STRING function

CAUSE:

This is expected behaviour caused by using numChars with extended characters. The numChars parameter contains the number of characters, not the number of bytes, so when a character needs more than one byte to encode, numChars does not match the value returned by GET-SIZE(charData). For example, ALBANIË is 7 characters long but occupies 8 bytes in UTF-8, because Ë is encoded in two bytes. Passing numChars as the numbytes argument of GET-STRING therefore reads only the first 7 bytes, which in this example cuts Ë in half.
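The following sketch illustrates the byte-versus-character mismatch. It is written in Python, not ABL, purely as a language-neutral illustration: a Python `bytes` object stands in for the MEMPTR, `len(value)` stands in for numChars, and `len(buffer)` stands in for GET-SIZE(charData). The exact character ABL displays for the truncated byte may differ from Python's replacement character (the symptom above shows `@`).

```python
# Illustration only: Python stands in for ABL to show why a character count
# (like numChars) is the wrong length for reading a UTF-8 byte buffer (like a MEMPTR).
value = "ALBANIË"
buffer = value.encode("utf-8")      # bytes as they would sit in the MEMPTR

num_chars = len(value)              # 7 characters, analogous to numChars
num_bytes = len(buffer)             # 8 bytes, analogous to GET-SIZE(charData)
print(num_chars, num_bytes)         # 7 8

# Reading only num_chars bytes cuts the two-byte "Ë" (0xC3 0x8B) in half;
# the dangling 0xC3 byte decodes to the U+FFFD replacement character.
truncated = buffer[:num_chars].decode("utf-8", errors="replace")
print(truncated == "ALBANI\ufffd")  # True

# Reading the full byte length (the Option #1 approach) restores the value.
restored = buffer[:num_bytes].decode("utf-8")
print(restored == value)            # True
```

The same reasoning explains why the problem disappears with a single-byte -cpinternal code page: there, every character is one byte, so the character count and the byte count are equal.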

FIX:

Option #1
Use the GET-SIZE function instead of numChars to specify the number of bytes to read:

cValue = GET-STRING(charData,1,GET-SIZE(charData)).

Option #2
Omit the optional numbytes argument of the GET-STRING function:

cValue = GET-STRING(charData,1).

Option #3
Use the LONGCHAR data type instead of MEMPTR for the data parameter of the Characters callback procedure, which avoids the GET-STRING function altogether.