Kbase 16242: I18N. How CONVMAP Relates to Code Page and Collation Settings
Author |
  Progress Software Corporation - Progress |
Access |
  Public |
Published |
  15/10/2008 |
Status: Verified
GOAL:
Relationship of the convmap (Conversion Map) client internationalization startup parameter to code pages and collation settings.
CAUSE:
The convmap.cp file is a binary file that contains all of the tables that Progress uses for character management.
FIX:
If you work in an international environment, you use the convmap startup parameter. Progress usually uses the parameter behind the scenes; however, there might be times when you need to modify the convmap file for a particular environment.
The compiled version of CONVMAP (the file that the Progress executable accesses) is named convmap.cp and is located in the \DLC directory. The text source of this file is convmap.dat and is located in the \DLC\PROLANG directory.
If you need to modify CONVMAP, use PROUTIL to recompile it.
Before you recompile, make sure that all of the DLC/prolang/convmap/*.dat files are in the directory where you plan to run the codepage-compiler.
The syntax for the recompile is:
proutil -C codepage-compiler convmap.dat convmap.cp
Place the resulting file (convmap.cp) in your \DLC directory. The next time you run Progress, it accesses the new file.
The following describes the format of the convmap file:
The convmap file has several sections. Each section contains entries for several languages. Each entry consists of one value for each of the 256 characters supported by 8-bit character sets. In attribute tables such as ISALPHA the value is a flag of 0 or 1; the COLLATION and CONVERT tables use values from 0 to 255.
Valid sections are:
code page - code page definition section.
ISALPHA - Defines alphabetic characters.
LEAD-BYTE - Defines lead byte range(s).
TRAIL-BYTE - Defines trail byte range(s).
CASE - Used in case translation functions.
UPPERCASE-MAP - Upper case translation definitions.
LOWERCASE-MAP - Lower case translation definitions.
COLLATION - Sort ordering definition section.
CASE-INSENSITIVE-SORT - Insensitive sort order (weight).
CASE-SENSITIVE-SORT - Sensitive sort order (weight).
CONVERT - Conversion map between code pages.
Each section contains a header, keywords, and the 256-character flags.
The code page definition section:
NOTE: In the following example of a code page definition section, the ellipses (...) stand for columns that were removed because the actual section is too wide to fit on this page:
#-----------------------------------------------------------------
# This table contains the attributes for code page iso8859-1
code page
CODEPAGE-NAME "ISO8859-1"
TYPE "1"
ISALPHA
/*000-015*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*016-031*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*032-047*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*048-063*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*064-079*/ 000 001 001 001 001 001 001 001 ... 001 001 001 001
/*080-095*/ 001 001 001 001 001 001 001 001 ... 000 000 000 000
/*096-111*/ 000 001 001 001 001 001 001 001 ... 001 001 001 001
/*112-127*/ 001 001 001 001 001 001 001 001 ... 000 000 000 000
/*128-143*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*144-159*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*160-175*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*176-191*/ 000 000 000 000 000 000 000 000 ... 000 000 000 000
/*192-207*/ 001 001 001 001 001 001 001 001 ... 001 001 001 001
/*208-223*/ 001 001 001 001 001 001 001 000 ... 001 001 001 001
/*224-239*/ 001 001 001 001 001 001 001 001 ... 001 001 001 001
/*240-255*/ 001 001 001 001 001 001 001 000 ... 001 001 001 001
ENDTABLE
ENDCODEPAGE
The header of the above code page definition section consists of the code page keyword (which indicates the beginning of the code page definition), the CODEPAGE-NAME keyword (which indicates the code page name), the TYPE keyword (which indicates whether the code page is single or double byte), and the ISALPHA keyword (which indicates the beginning of the alphabetic character map).
The map of character flags indicates whether or not each character is (001) or is not (000) alphabetic. This block of the convmap file is followed by the ENDTABLE keyword that indicates the map block is complete.
The ENDCODEPAGE keyword indicates the end of the definition for this code page.
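As an illustration of how a flag table like the one above can be read, the following Python sketch collects the values from table rows and uses them for an alphabetic test. It is not Progress code. The function names are hypothetical, and the two sample rows are written out in full following the usual ASCII layout (an assumption) rather than copied from convmap.dat.

```python
import re

def parse_flag_table(text):
    """Collect the numeric values from table rows, skipping /*...*/ range labels."""
    values = []
    for line in text.splitlines():
        line = re.sub(r"/\*.*?\*/", "", line)  # drop labels such as /*064-079*/
        values.extend(int(token) for token in line.split())
    return values

# Hypothetical sample: two complete 16-value rows covering codes 64 to 95.
# The values follow the ASCII layout (an assumption), not the real convmap.dat.
SAMPLE = """\
/*064-079*/ 000 001 001 001 001 001 001 001 001 001 001 001 001 001 001 001
/*080-095*/ 001 001 001 001 001 001 001 001 001 001 001 000 000 000 000 000
"""

FIRST_CODE = 64  # character code of the first value in SAMPLE
flags = parse_flag_table(SAMPLE)

def is_alpha(code):
    """Return True if the flag for this character code is 001."""
    return flags[code - FIRST_CODE] == 1
```

With this sample, is_alpha(ord("A")) is true and is_alpha(ord("@")) is false. A full table would hold 256 values and would not need the FIRST_CODE offset.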
The CODE PAGE section:
This section of the convmap.cp file defines the code page name, type (1 = single byte, 2 = double byte) and related attributes.
- ISALPHA maps the characters that are considered to be alphabetic or non-alphabetic.
- LEAD-BYTE and TRAIL-BYTE map the characters that are used as lead or trail bytes for double-byte languages (such as Japanese, Korean, traditional and simplified Chinese).
The most likely reason to modify this section is to add support for Gaiji characters.
The CASE section:
This section provides the character maps used for uppercase and lowercase conversion. When the CAPS function is used, Progress refers to the UPPERCASE-MAP table to determine which lowercase characters should be converted and which character each one is converted to. The LOWERCASE-MAP table is used by the LC function.
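The idea of a case map can be sketched in Python as a 256-entry lookup table, where each position holds the code of the character it converts to. This is only an illustration. The table contents and the name apply_case_map are assumptions and are not taken from convmap.dat.

```python
# Start from an identity map: every byte value converts to itself.
uppercase_map = list(range(256))

# ASCII lowercase letters convert to the matching uppercase letters.
for code in range(ord("a"), ord("z") + 1):
    uppercase_map[code] = code - 32

# Hypothetical ISO8859-1 entry: e-acute (233) converts to E-acute (201).
uppercase_map[233] = 201

def apply_case_map(data, case_map):
    """Translate each byte through the 256-entry case map."""
    return bytes(case_map[b] for b in data)

result = apply_case_map("caf\u00e9".encode("latin-1"), uppercase_map)
```

Characters that have no uppercase form, such as digits, keep their identity entry and pass through unchanged.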
The COLLATION section:
The primary purpose of the collation table in the convmap.cp file is to provide the Progress sorting algorithm with a sort weight for each character. The sort weight determines the sort order for a character. The heavier the character, the further down it falls in the sort order.
The collation table is also used for character equality testing. In some languages, a character that appears physically different might be interchangeable with another character that is similar (such as capital A and capital A-accent in the ISO8859-1 code page). Therefore, these two characters sort with the same weight, and in an equality test they come out as equal.
NOTE: In this section, the character positions are not limited to 001 and 000 to denote a true or false state. Instead, numbers from 0 to 255 denote the sort weight of each character position. Also, since different character positions can have the same sort weight, some of the numbers in this range might go unused.
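A minimal Python sketch of weight-based sorting and equality follows. It assumes a made-up weight table in which lowercase letters share the weight of their uppercase form and capital A-acute shares the weight of capital A. The real weights live in convmap.dat and may differ, and the function names are hypothetical.

```python
# Assumption: by default, a character's weight is its own code value.
weights = list(range(256))

# Case-insensitive weights: lowercase letters weigh the same as uppercase.
for code in range(ord("a"), ord("z") + 1):
    weights[code] = code - 32

# Capital A-acute (0xC1 in ISO8859-1) gets the same weight as capital A.
weights[0xC1] = weights[ord("A")]

def sort_key(text):
    """Map each character to its sort weight."""
    return [weights[b] for b in text.encode("latin-1")]

def collation_equal(left, right):
    """Two strings are equal when their weight sequences match."""
    return sort_key(left) == sort_key(right)

words = ["Zebra", "\u00c1ngel", "apple", "Banana"]
ordered = sorted(words, key=sort_key)
```

Because two positions now share one weight, the weights that A-acute and the lowercase letters would otherwise occupy go unused, which matches the note above.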
The CONVERT section:
The conversion tables help facilitate running Progress across multiple platforms and/or configurations.
For example, if you run a Progress U.S. Windows 3.x client against a U.S. Progress UNIX server, there are at least two (maybe more) different code pages being used. The Windows client uses the ISO8859-1 code page to display and input characters, and the UNIX server is likely to use the same code page but might use a different one.
If the Windows client needs to perform any disk I/O on its local drive, it uses the IBM850 code page.
Since there are potentially several code pages that it makes use of at a given time, Progress needs a method of converting the character values between the code pages. The CONVERT tables are used for this.
One thing to keep in mind is that Progress can only convert between code pages that use the same basic characters (such as the LATIN 1 character set that is used in IBM850 and ISO8859-1). These conversions are meant for converting between operating systems and are not used to convert between languages.
For example, since there are many more characters in the Russian alphabet than in the English alphabet, there is no way to provide a one-to-one map (or conversion). To move between such languages, the application (and/or data) needs to be actually translated.
NOTE: As in the COLLATION section, this section of the convmap file uses the number range of 0 to 255. Since multiple characters from the source code page might convert to the same character in the target code page, there might be duplicate values in this table.
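To show what a 256-entry conversion table with duplicate values looks like, the following Python sketch builds one from IBM850 to ISO8859-1. It uses Python's own cp850 and latin-1 codecs as a stand-in, so the values are Python's, not necessarily those in the Progress CONVERT tables. Mapping unconvertible characters to "?" is also an assumption of this sketch.

```python
def build_convert_table(source, target):
    """Build a 256-entry byte table mapping source code page to target."""
    table = []
    for code in range(256):
        char = bytes([code]).decode(source)
        # Characters missing from the target all collapse to "?".
        table.append(char.encode(target, errors="replace")[0])
    return bytes(table)

table = build_convert_table("cp850", "latin-1")

def convert(data):
    """Convert IBM850 bytes to ISO8859-1 bytes through the table."""
    return data.translate(table)

# e-acute is 0x82 in IBM850 and 0xE9 in ISO8859-1.
converted = convert(b"caf\x82")
```

The IBM850 box-drawing characters have no ISO8859-1 equivalent, so several source positions end up with the same target value. This is the kind of duplication the note above describes.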
Reference to Written Documentation:
Progress System Reference Guide, Appendix A.