Consultor Eletrônico

Status: Verified

GOAL:

I18N. Code page conversions and where they happen

GOAL:

Where do the code page conversions happen?

FACT(s) (Environment):

All Supported Operating Systems
Progress/OpenEdge Versions

FIX:

Code page conversions happen in many places in PROGRESS version 7.3A and above. Each of the following locations is identified with its own code page. A conversion happens when two locations that exchange data use different code pages. If the code pages of the two locations exchanging data are the same, no conversion happens. If either of the two locations uses "undefined" as its code page, the data is not converted.
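The rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not PROGRESS code: the function name needs_conversion is hypothetical, and comparing code page names case-insensitively is an assumption made for the sketch.

```python
UNDEFINED = "undefined"


def needs_conversion(source_cp: str, target_cp: str) -> bool:
    """Return True when data moving between two locations would be converted."""
    # Assumption for this sketch: names are compared case-insensitively.
    src = source_cp.strip().lower()
    dst = target_cp.strip().lower()
    # "undefined" on either side means the data is passed through untouched.
    if UNDEFINED in (src, dst):
        return False
    # Identical code pages need no conversion.
    return src != dst


print(needs_conversion("ISO8859-1", "IBM850"))     # True
print(needs_conversion("ISO8859-1", "ISO8859-1"))  # False
print(needs_conversion("undefined", "IBM850"))     # False
```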

The user is expected to declare to PROGRESS what code page is used in each of these locations. Of course, each of these has defaults, based on the Western European language family. Given this knowledge, PROGRESS will automatically convert the data from one code page to another as needed.

Some of these conversions must be reversible without any loss of data. For example, if the database and the client internal processing use different code pages, PROGRESS will do the conversion between them. However, this conversion must be reversible so that no data is lost. A reversible conversion means a 1-to-1 mapping between the code pages.

A non-reversible conversion is one where more than one character in one code page maps to a single character in the other code page. The ISO8859-1 to GERMAN-7-BIT conversion is an example of this. This conversion table is used for 7-bit terminals. It maps u-umlaut to a }. It also maps a } to a }. When a u-umlaut is in memory and needs to be displayed on the 7-bit terminal, it is converted to a }. This, of course, assumes that the memory has been identified as using the ISO8859-1 code page with the -cpinternal parameter.
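The difference between the two kinds of conversion can be checked mechanically: a table is reversible only if no two source characters share a target. The sketch below uses Python and two toy tables that contain only the characters discussed above; they are illustrative excerpts, not the actual PROGRESS conversion tables, and is_reversible is a hypothetical helper name.

```python
def is_reversible(table: dict) -> bool:
    """A conversion is reversible when no two source entries share a target."""
    targets = list(table.values())
    return len(set(targets)) == len(targets)


# Toy excerpt of an 8-bit to German 7-bit mapping: u-umlaut and } both become }.
to_german_7bit = {"\u00fc": "}", "}": "}"}

# Toy excerpt of an ISO8859-1 to IBM850 mapping (byte values):
# u-umlaut is 0xFC in ISO8859-1 and 0x81 in IBM850; } is 0x7D in both.
iso_to_ibm850 = {0xFC: 0x81, 0x7D: 0x7D}

print(is_reversible(to_german_7bit))  # False
print(is_reversible(iso_to_ibm850))   # True
```

Once the u-umlaut has been written as a }, nothing in the output says which of the two source characters it came from, which is why such a table is acceptable for display but not for stored data.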

Data locations:

Version 7.3 and higher databases:

The database administrator must decide which code page will be used in the database and must label the database with the correct code page name. The database administrator must also make sure that the database's _db record contains the correct collation table for that code page. The database administrator can change the database code page and convert the database at any time within a language family without affecting the rest of the system in any way. If the user attempts to change code pages between language families, he or she may experience data loss because not all characters appear in both language families.

The code page for a database is specified with the utility:

proutil <dbname> -C convchar convert <new-code-page-name>

The default for the empty database is ISO8859-1. The default for a database converted from version 6 is IBM850. There are a number of empty databases in the dlc/prolang subdirectories that have been already identified with code pages that are appropriate for the language family.

Version 6 databases:

The database administrator must determine which code page is contained in the database. Version 6 provides no way to label the database directly. Version 7 clients, however, must always be aware of which code page the database contains (so that the correct collation table can be chosen and the correct data conversion can happen). The database connection parameter which version 7 and above clients use to connect to a version 6 database is -cpdb <code page>. This specifies the code page of the version 6 database. Data read from and written to the database will be converted from and to this code page. If this parameter is not specified, the default is IBM850.
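As an illustration of what such a conversion does to the bytes, the following Python sketch converts a value from IBM850 to ISO8859-1 and back. It uses Python's own cp850 and iso8859-1 codecs as stand-ins for the PROGRESS conversion tables, and the convert helper is a hypothetical name.

```python
def convert(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Re-encode bytes from one code page to another."""
    return data.decode(source_cp).encode(target_cp)


# "M\u00fcller" as stored in an IBM850 database: u-umlaut is byte 0x81.
stored = b"M\x81ller"

# What a client running with an ISO8859-1 internal code page would hold.
in_memory = convert(stored, "cp850", "iso8859-1")
print(in_memory)  # b'M\xfcller'

# Writing the value back restores the original bytes, so no data is lost.
written_back = convert(in_memory, "iso8859-1", "cp850")
print(written_back == stored)  # True
```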

Internal Client Data:

This is the data which the client uses internally, such as the data that the 4GL sees. Functions like ASC, CHR and LASTKEY are directly affected by the choice of code page. The code page is identified by the -charset or the -cpinternal startup parameter. This startup parameter is used by _progres, _mprosrv, _mproshut, _proutil, _dbutil, _rfutil, and prolib to specify the code page used for internal data processing.
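To see why functions such as ASC and CHR depend on the internal code page, consider the numeric value of the same character under two code pages. The Python sketch below defines rough analogues of those functions; asc and chr_in are hypothetical helpers written for this illustration and are not the 4GL functions themselves.

```python
def asc(ch: str, codepage: str) -> int:
    """Numeric value of a single character in the given single-byte code page."""
    encoded = ch.encode(codepage)
    if len(encoded) != 1:
        raise ValueError("expected exactly one byte for one character")
    return encoded[0]


def chr_in(value: int, codepage: str) -> str:
    """Character represented by a byte value in the given code page."""
    return bytes([value]).decode(codepage)


u_umlaut = "\u00fc"
print(asc(u_umlaut, "iso8859-1"))       # 252
print(asc(u_umlaut, "cp850"))           # 129
print(chr_in(129, "cp850") == u_umlaut)  # True
```

The same character yields 252 under ISO8859-1 and 129 under IBM850, so code that hard-codes such values only behaves as intended under the code page it was written for.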

GUI Screen Input/Output:

This is always the same as the client's internal data. This is the code page used by the fonts and keyboard driver. This generally forces the choice of -cpinternal.

PROMSGS:

The PROMSGS file is labelled internally with the code page which it contains. The user has no control over this.

Stream Input and Output:

This includes INPUT FROM, OUTPUT TO, INPUT-OUTPUT THROUGH, standard in and standard out for batch jobs.

The default code page is IBM850. To override it, specify the code page for these operations with the -stream or -cpstream startup parameter.

Character mode terminals:

The default code page used to input and output data to a character terminal is the one identified by the -stream or -cpstream startup parameter. However, this can be overridden with the -cpterm startup parameter.

Operating system data:

This includes filenames and environment variables. This depends on the operating system. For UNIX, we treat them the same as -cpinternal. For MS-Windows, the values are converted from the DOS code page to the MS-Windows code page.

Version 7 .p and .r files:

.r files always contain the data in the internal code page (-cpinternal). .p files always contain data in the -cpstream code page. Data in both cases refers to any literal values in the code such as labels, initial values, and constant values.

Version 8 and higher .p and .r files:

By default, .r files contain the data in the internal code page (-cpinternal). This default can be overridden. When writing a .r file, the code page can be indicated with the -cprcodeout startup parameter. When reading a .r file, the code page can be indicated with the -cprcodein startup parameter.

.p files always contain data in the -cpstream code page.

Data in both cases refers to any literal values in the code such as labels, initial values, and constant values.

.d and .df files:

These files follow the same basic rules as stream input and output do. However, the PROGRESS dictionary application implements some of its own rules for handling this information.

When creating a .d or a .df file, the dictionary prompts the user for the code page to use when writing the data. The file's code page is identified in the trailer.

When the dictionary application is used to read one of these files, it uses the code page value in the trailer.
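A reader that honors the trailer could look roughly like the Python sketch below. The trailer layout shown (key=value lines at the end of the file) and the key name cpstream are assumptions made for this illustration, and the sample content is invented; check an actual dump file before relying on either.

```python
from typing import List, Optional


def codepage_from_trailer(lines: List[str], key: str = "cpstream") -> Optional[str]:
    """Scan from the end of the file for a 'key=value' line and return the value."""
    for line in reversed(lines):
        name, sep, value = line.strip().partition("=")
        if sep and name == key:
            return value
    return None


# Invented sample: one data line followed by a hypothetical trailer.
sample = [
    '1 "Lift Line Skiing"',
    ".",
    "PSC",
    "filename=customer",
    "cpstream=ISO8859-1",
    ".",
]

print(codepage_from_trailer(sample))          # ISO8859-1
print(codepage_from_trailer(["no trailer"]))  # None
```

Scanning from the end means the trailer entry is found before any data line that happens to contain an equals sign.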

Version 7 .lg file:

The data in the version 7.3 .lg file was written using the -cpinternal code page.

Version 8 and higher .lg file:

The data in the version 8 .lg file is written using the value of the startup parameter -cplog. This defaults to -cpinternal. It is best for each user and server to use the same value for this parameter.

Version 7 OUTPUT TO PRINTER statement:

The data written to the printer in version 7 used the code page identified by the -cpstream startup parameter.

Version 8 and higher OUTPUT TO PRINTER statement:

The data written to the printer in version 8 uses the code page identified by the -cpprint startup parameter. It defaults to the value of the -cpstream startup parameter.
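Several of the parameters described so far fall back to another parameter when they are not given. The Python sketch below collects only the fallbacks stated in this article (-cpterm and -cpprint fall back to -cpstream; -cplog and -cprcodeout fall back to -cpinternal; -cpstream defaults to IBM850). The function name is hypothetical, and -cpinternal is left as a required argument because its default is not stated here.

```python
from typing import Dict, Optional


def effective_code_pages(
    cpinternal: str,
    cpstream: str = "IBM850",
    cpterm: Optional[str] = None,
    cpprint: Optional[str] = None,
    cplog: Optional[str] = None,
    cprcodeout: Optional[str] = None,
) -> Dict[str, str]:
    """Resolve the code page used at each location, applying the stated fallbacks."""
    return {
        "cpinternal": cpinternal,
        "cpstream": cpstream,
        "cpterm": cpterm or cpstream,             # character terminals
        "cpprint": cpprint or cpstream,           # OUTPUT TO PRINTER (version 8+)
        "cplog": cplog or cpinternal,             # .lg file (version 8+)
        "cprcodeout": cprcodeout or cpinternal,   # .r files written (version 8+)
    }


resolved = effective_code_pages("ISO8859-1", cpterm="GERMAN-7-BIT")
for name, value in resolved.items():
    print(f"-{name}: {value}")
```

With only -cpinternal and -cpterm given, the printer follows -cpstream (IBM850) while the log and compiled code follow -cpinternal (ISO8859-1).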

Version 7 filenames stored in prolib:

The filenames stored in prolib are stored using the -cpinternal code page.

Version 8 and higher filenames stored in prolib:

In version 8, the user can mark the library with a code page when it is created. The filenames are then stored in prolib using this code page.