Kbase P124474: SAX parser does not return extended characters correctly when using -cpinternal UTF-8
Author: Progress Software Corporation - Progress
Access: Public
Published: 19/06/2007
Status: Unverified
FACT(s) (Environment):
OpenEdge 10.x
SYMPTOM(s):
Using SAX Parser
XML document contains extended characters
Incorrect data returned when session is started with -cpinternal UTF-8
ALBANIË is returned as ALBANI@
Retrieving data using the GET-STRING function
The numChars parameter of the Characters callback procedure is used to determine the amount of data retrieved
Problem does not occur when another code-page is specified with -cpinternal
Problem does not occur when using LONGCHAR data-type in place of MEMPTR in the Characters callback procedure
Problem does not occur when numChars parameter is not used to specify amount of data retrieved by GET-STRING function
CAUSE:
This is expected behaviour when numChars is used with extended characters. The numChars parameter contains the number of characters in charData, not the number of bytes, while the third argument of GET-STRING is a number of bytes. If a character requires more than one byte to encode, numChars is smaller than the value returned by GET-SIZE(charData), so GET-STRING stops part-way through the data. For example, ALBANIË is 7 characters long but occupies 8 bytes in UTF-8, because Ë is encoded in two bytes; reading only 7 bytes cuts Ë in half.
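The mismatch can be reproduced without the SAX parser. The sketch below is an illustration only: it assumes a session started with -cpinternal UTF-8 and a source file read in a code page that can represent Ë. It compares the character and byte lengths of ALBANIË, then reads the string back from a MEMPTR using each length. The variable names are hypothetical.

```abl
/* Assumes the session was started with -cpinternal UTF-8. */
DEFINE VARIABLE cWord  AS CHARACTER NO-UNDO INITIAL "ALBANIË".
DEFINE VARIABLE mBuf   AS MEMPTR    NO-UNDO.
DEFINE VARIABLE iChars AS INTEGER   NO-UNDO.
DEFINE VARIABLE iBytes AS INTEGER   NO-UNDO.

iChars = LENGTH(cWord, "CHARACTER").  /* 7: the kind of count numChars holds */
iBytes = LENGTH(cWord, "RAW").        /* 8: Ë needs two bytes in UTF-8 */

/* Copy the string into a MEMPTR, as the parser does for charData.
   PUT-STRING also writes a null terminator, hence the extra byte. */
SET-SIZE(mBuf) = iBytes + 1.
PUT-STRING(mBuf, 1) = cWord.

/* Reading iChars bytes stops inside Ë; reading iBytes bytes does not. */
MESSAGE "Characters:" iChars SKIP
        "Bytes:" iBytes SKIP
        "Read with character count:" GET-STRING(mBuf, 1, iChars) SKIP
        "Read with byte count:" GET-STRING(mBuf, 1, iBytes)
    VIEW-AS ALERT-BOX.

SET-SIZE(mBuf) = 0.  /* release the MEMPTR */
```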
FIX:
Option #1
Use the GET-SIZE function in place of the value of numChars:
cValue = GET-STRING(charData,1,GET-SIZE(charData)).
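For context, here is a minimal sketch of Option #1 inside a complete SAX parse. The file name countries.xml and its content (a UTF-8 file containing `<countries><country>ALBANIË</country></countries>`) are hypothetical, as is the cValue variable. Because the parser can deliver the text of one element across more than one Characters call, the callback appends each chunk instead of overwriting it.

```abl
/* Hypothetical input: countries.xml, saved as UTF-8, containing
   <countries><country>ALBANIË</country></countries> */
DEFINE VARIABLE hParser AS HANDLE    NO-UNDO.
DEFINE VARIABLE cValue  AS CHARACTER NO-UNDO.

CREATE SAX-READER hParser.
hParser:HANDLER = THIS-PROCEDURE.  /* callbacks are internal procedures below */
hParser:SET-INPUT-SOURCE("FILE", "countries.xml").
hParser:SAX-PARSE() NO-ERROR.
IF ERROR-STATUS:ERROR THEN
    MESSAGE ERROR-STATUS:GET-MESSAGE(1) VIEW-AS ALERT-BOX ERROR.
DELETE OBJECT hParser.

MESSAGE cValue VIEW-AS ALERT-BOX.

PROCEDURE Characters:
    DEFINE INPUT PARAMETER charData AS MEMPTR  NO-UNDO.
    DEFINE INPUT PARAMETER numChars AS INTEGER NO-UNDO.

    /* numChars is a character count, but GET-STRING's third argument
       is a byte count, so take the length from GET-SIZE instead. */
    cValue = cValue + GET-STRING(charData, 1, GET-SIZE(charData)).
END PROCEDURE.
```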
Option #2
Do not specify a value for the numbytes argument of the GET-STRING function:
cValue = GET-STRING(charData,1).
Option #3
Use a LONGCHAR data type instead of MEMPTR for the charData parameter of the Characters callback procedure, thereby avoiding the GET-STRING function altogether.
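A minimal sketch of Option #3 follows, under the same assumptions as a hypothetical UTF-8 file countries.xml containing `<countries><country>ALBANIË</country></countries>` and an illustrative cValue variable. Only the charData parameter type changes; the parser hands over decoded text, so no byte count is involved.

```abl
/* Hypothetical input: countries.xml, saved as UTF-8, containing
   <countries><country>ALBANIË</country></countries> */
DEFINE VARIABLE hParser AS HANDLE    NO-UNDO.
DEFINE VARIABLE cValue  AS CHARACTER NO-UNDO.

CREATE SAX-READER hParser.
hParser:HANDLER = THIS-PROCEDURE.  /* callbacks are internal procedures below */
hParser:SET-INPUT-SOURCE("FILE", "countries.xml").
hParser:SAX-PARSE() NO-ERROR.
IF ERROR-STATUS:ERROR THEN
    MESSAGE ERROR-STATUS:GET-MESSAGE(1) VIEW-AS ALERT-BOX ERROR.
DELETE OBJECT hParser.

MESSAGE cValue VIEW-AS ALERT-BOX.

PROCEDURE Characters:
    /* LONGCHAR instead of MEMPTR: the text arrives already decoded. */
    DEFINE INPUT PARAMETER charData AS LONGCHAR NO-UNDO.
    DEFINE INPUT PARAMETER numChars AS INTEGER  NO-UNDO.
    DEFINE VARIABLE cChunk AS CHARACTER NO-UNDO.

    cChunk = charData.          /* LONGCHAR-to-CHARACTER assignment; fine for short text */
    cValue = cValue + cChunk.   /* append: one element's text may arrive in pieces */
END PROCEDURE.
```

Since numChars is not used here, the character/byte mismatch described under CAUSE cannot arise.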