Kbase 20010: How to Estimate the Expansion of Database Size With Multilingual Applications Using Unicode Standard
Author: Progress Software Corporation - Progress
Access: Public
Published: 26/11/2008
Status: Verified
GOAL:
How to estimate the expansion of database size for multilingual applications that use the Unicode Standard character set.
FACT(s) (Environment):
Progress 9.x
FIX:
Progress 9.x uses the UTF-8 multi-byte encoding for Unicode data.
Database size expands according to the following rules:
1) Only character fields expand. Numeric, date, raw, and other data types do not expand, so the space they occupy in the database does not change.
2) The portion of the database used for schema, indexes (apart from index keys on character fields, covered in item 6), and other kinds of content does not change.
3) Any English or other ASCII-only data is converted with no expansion, so whatever portion of the data is ASCII keeps its size. (Part numbers and similar data are often ASCII only.)
4) ISO 8859-1 data that is outside the ASCII range generally doubles in size, from one byte to two bytes per character. For example, in the word "résumé", each accented "é" becomes two bytes, while the other characters remain one byte each because they are ASCII. The 6-byte word therefore becomes 8 bytes.
5) Double-byte characters generally become 3-byte characters, so they expand by 50 percent (to 150 percent of their original size).
6) Index keys for character fields grow in proportion to the text byte size. In the "résumé" example above, the expansion of the 6-byte word to 8 bytes means its key also expands by 2 bytes.
7) For SQL-92 databases, a character field's width "(n)" always means the maximum number of characters, not bytes, in that field. Any SQL-92 character field with width "n" in a Unicode UTF-8 database therefore allows up to "n" UTF-8 characters, which may require up to 3*n bytes. There is no need to change the schema; the SQL-92 engine handles this internally.
The only restriction is that SQL-92 does not allow data longer than "n" characters. If a SQL-92 application connects to a 4GL database (UTF-8 or otherwise), 4GL records whose character values exceed the SQL-92 maximum of "n" characters are not returned. A solution is to go to the Data Dictionary and increase the SQL-92 widths of those fields beyond the default (= 2*(4GL format length)).
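EXAMPLE (illustrative only):
The rules above can be turned into a rough estimate with a short script. The following is a minimal Python sketch, not a Progress utility. It assumes you can export a representative sample of character data and know its current code page. The function names (utf8_expansion, estimate_db_size, default_sql_width, exceeds_sql_width) and the "char_fraction" parameter are hypothetical names introduced here for illustration, and the size model simply assumes that only character data and its index keys grow.

```python
def utf8_expansion(text, source_encoding="iso-8859-1"):
    """Return (bytes_before, bytes_after) for text stored in
    source_encoding versus UTF-8."""
    before = len(text.encode(source_encoding))
    after = len(text.encode("utf-8"))
    return before, after


def estimate_db_size(total_bytes, char_fraction, char_expansion):
    """Hypothetical model: only character fields and their index keys grow.

    total_bytes    -- current database size in bytes
    char_fraction  -- assumed share (0..1) of the database occupied by
                      character fields and their index keys
    char_expansion -- bytes_after / bytes_before measured on a data sample
    """
    unchanged = total_bytes * (1 - char_fraction)
    expanded = total_bytes * char_fraction * char_expansion
    return unchanged + expanded


def default_sql_width(format_length):
    """Default SQL-92 width as stated in the article: 2 * 4GL format length."""
    return 2 * format_length


def exceeds_sql_width(value, sql_width):
    """True if the value has more characters (not bytes) than the SQL-92 width."""
    return len(value) > sql_width


if __name__ == "__main__":
    # Rule 4: two non-ASCII ISO 8859-1 characters double, the rest stay 1 byte.
    print(utf8_expansion("résumé"))                 # (6, 8)
    # Rule 3: ASCII-only data does not expand.
    print(utf8_expansion("ABC-123"))                # (7, 7)
    # Rule 5: double-byte characters (Shift JIS here) become 3 bytes each.
    print(utf8_expansion("日本語", "shift_jis"))     # (6, 9)
    # Rule 7: width counts characters, so 6 characters fit in width 6.
    print(exceeds_sql_width("résumé", 6))           # False
```

For a whole-database figure, measure the before/after ratio on a sample of your own character data and pass it to estimate_db_size together with your own estimate of the character-data share. For example, a sample that grows from 6 to 9 bytes gives a ratio of 1.5; with 40 percent character data, a 1000 MB database would be estimated at roughly 1200 MB.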