Kbase 15369: Code page conversions and where they happen
Author: Progress Software Corporation - Progress
Access: Public
Published: 5/10/1998
Code page conversions and where they happen
Code page conversions happen in many places in PROGRESS version 7.3A
and above. Each of the following items can be identified with a
unique code page. As a result, a conversion happens when the code
pages are different. If the code pages of the two locations
exchanging data are the same, no conversion happens. If one of the
two locations exchanging data uses "undefined" as its code page,
the data are not converted.
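
The rule above can be sketched in a few lines. This is only an illustration of the decision, not PROGRESS code: the function name is hypothetical, and the case-insensitive comparison of code page names is an assumption.

```python
UNDEFINED = "undefined"

def needs_conversion(source_cp, target_cp):
    """Return True only when both code pages are defined and differ."""
    # Assumption: code page names are compared without regard to case.
    src = source_cp.strip().lower()
    dst = target_cp.strip().lower()
    if UNDEFINED in (src, dst):
        return False  # one side is "undefined": data passes through as-is
    return src != dst  # identical code pages need no conversion

print(needs_conversion("ISO8859-1", "IBM850"))
print(needs_conversion("IBM850", "IBM850"))
print(needs_conversion("undefined", "IBM850"))
```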
The user is expected to declare to PROGRESS what code page is
used in each of these locations. Of course, each of these has
defaults, based on the Western European language family. Given
this knowledge, PROGRESS will automatically convert the data from
one code page to another as needed.
Some of these conversions must be reversible without any loss of
data. For example, if the database and the client internal
processing use different code pages, PROGRESS will do the
conversion between them. However, this conversion must be
reversible so that no data is lost. A reversible conversion means
a 1-to-1 mapping between the code pages.
A non-reversible conversion is one where more than one character in
one code page maps to a single character in the other code page.
The ISO8859-1 to GERMAN-7-BIT conversion is an example of
this. This conversion table is used for 7-bit terminals. It maps
u-umlaut to }. It also maps } to }. When a u-umlaut is
in memory and needs to be displayed on the 7-bit terminal, it
is converted to a }. This, of course, assumes that the memory
has been identified as using the ISO8859-1 code page with
the -cpinternal parameter.
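
The difference between a reversible and a non-reversible table can be checked mechanically. The sketch below uses two toy excerpts, each with two entries. They are illustrative assumptions, not the real PROGRESS conversion tables, and the helper names are hypothetical.

```python
def is_reversible(table):
    """A mapping is reversible when no two source characters share a target."""
    targets = list(table.values())
    return len(set(targets)) == len(targets)

def invert(table):
    """Build the reverse mapping; colliding targets keep only one source."""
    return {target: source for source, target in table.items()}

U_UMLAUT = "\u00fc"  # u-umlaut

# Toy excerpt of a 7-bit German terminal mapping: two sources, one target.
to_7bit = {U_UMLAUT: "}", "}": "}"}

# Toy excerpt of a 1-to-1 mapping: every source has its own target.
one_to_one = {U_UMLAUT: "\x81", "}": "}"}

print(is_reversible(to_7bit))     # False
print(is_reversible(one_to_one))  # True
print(len(invert(to_7bit)))       # 1: the reverse table cannot tell the two apart
```

Because the reverse of the 7-bit table has a single entry for }, a value that has passed through it can no longer be restored to the original character. This is why such a table is acceptable for display but not for data stored in the database.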
Data locations:
===============
Version 7.3 and above databases:
================================
The database administrator must decide which code page will
be used in the database and must label the database with the
correct code page name. The database administrator must also
make sure that the database's _db record contains the correct
collation table for that code page. The database administrator
can change the database code page and convert the database at
any time within a language family without affecting the rest of
the system in any way. If the user attempts to change code pages
between language families, he or she may experience data loss because
not all characters appear in both language families.
The code page for a database is specified with the utility:
proutil <dbname> -C convchar convert <new-code-page-name>
The default for the empty database is ISO8859-1. The default for
a database converted from version 6 is IBM850. There are a number
of empty databases in the dlc/prolang subdirectories that have
been already identified with code pages that are appropriate for
the language family.
Version 6 databases:
====================
The database administrator must determine which code page is
contained in the database. Version 6 provides no way to label
the database directly. Version 7 clients, however, must always
be aware of which code page the database contains (so that
the correct collation table can be chosen and the correct data
conversion can happen). The database connection parameter which
version 7 and above clients use to connect to a version 6
database is -cpdb <code page>. This specifies the code page of
the version 6 database. Data read from and written to the database
will be converted from and to this code page. If this parameter
is not specified, the default is IBM850.
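
To make the effect of such a conversion concrete, the sketch below converts a name between IBM850 and ISO8859-1. It uses Python's built-in codecs (cp850 and iso8859-1) as stand-ins. This is an assumption for illustration and not the conversion tables PROGRESS itself ships; the function name is hypothetical.

```python
def convert(data, source_cp, target_cp):
    """Re-encode bytes from one code page to another."""
    return data.decode(source_cp).encode(target_cp)

# "Mueller" spelled with u-umlaut, as an IBM850 database would store it.
stored = "M\u00fcller".encode("cp850")

# What a client running with an ISO8859-1 internal code page would hold.
in_memory = convert(stored, "cp850", "iso8859-1")

print(stored)      # b'M\x81ller'
print(in_memory)   # b'M\xfcller'

# Writing the value back reverses the conversion without loss.
print(convert(in_memory, "iso8859-1", "cp850") == stored)  # True
```

The same character occupies byte 0x81 in IBM850 and byte 0xFC in ISO8859-1, which is why a client must be told the database code page in order to read the data correctly.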
Internal Client Data:
=====================
This is the data which the client uses internally such as the
data that the 4GL sees. Functions like ASC, CHR and LASTKEY are
directly affected by the choice of code page. It is identified by
the -charset or the -cpinternal startup parameter. This startup
parameter is used by _progres, _mprosrv, _mproshut, _proutil,
_dbutil, _rfutil, and prolib to specify the code page used for
internal data processing.
GUI Screen Input/Output:
========================
This is always the same as the client's internal data. This is
the code page used by the fonts and keyboard driver. This
generally forces the choice of -cpinternal.
PROMSGS:
========
The PROMSGS file is labelled internally with the code page which
it contains. The user has no control over this.
Stream Input and Output:
========================
This includes INPUT FROM, OUTPUT TO, INPUT-OUTPUT THROUGH,
standard in and standard out for batch jobs.
The default code page is IBM850. To override this, the user uses
the startup parameter -stream or -cpstream to indicate what code
page to use for these operations.
Character mode terminals:
=========================
The default code page used to input and output data to a
character terminal is the one identified by the -stream or
-cpstream startup parameter. However, this can be overridden
with the -cpterm startup parameter.
Operating system data:
======================
This includes filenames and environment variables. This depends
on the operating system. For UNIX, we treat them the same as
-cpinternal. For MS-Windows, the values are converted from the
DOS code page to the MS-Windows code page.
Version 7 .p and .r files:
==========================
.r files always contain the data in the internal code page
(-cpinternal). .p files always contain data in the -cpstream
code page. Data in both cases refers to any literal values in
the code such as labels, initial values, and constant values.
Version 8 .p and .r files:
==========================
As a default, .r files always contain the data in the internal
code page (-cpinternal). This default can be overridden. When
writing a .r file, the code page can be indicated with the
-cprcodeout startup parameter. When reading a .r file, the code
page can be indicated with the -cprcodein startup parameter.
.p files always contain data in the -cpstream code page.
Data in both cases refers to any literal values in the code such
as labels, initial values, and constant values.
.d and .df files:
=================
These files follow the same basic rules as stream input and
output do. However, the PROGRESS dictionary application
implements some of its own rules for handling this information.
When creating a .d or a .df file, the dictionary prompts the
user for the code page to use when writing the data. The file's
code page is identified in the trailer.
When the dictionary application is used to read one of these
files, it uses the code page value in the trailer.
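
As a rough illustration of reading the code page from a trailer, the sketch below scans a dump file from the end for a key=value line. The trailer layout, the `cpstream` key name, the sample content, and the function name are all assumptions made for this example. Check an actual .d or .df file for the real layout.

```python
def trailer_code_page(text, key="cpstream"):
    """Return the value of the last `key=value` line in text, or None."""
    # Scan from the end so that the trailer is found before any data line.
    for line in reversed(text.splitlines()):
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None

# Hypothetical dump file: one data line followed by an assumed trailer.
SAMPLE_DUMP = """\
1 "Lift Line Skiing"
.
PSC
filename=customer
cpstream=ISO8859-1
.
"""

print(trailer_code_page(SAMPLE_DUMP))
print(trailer_code_page("no trailer here"))
```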
Version 7 .lg file:
===================
The data in the version 7.3 .lg file was written using the
-cpinternal code page.
Version 8 .lg file:
===================
The data in the version 8 .lg file is written using the value of
the startup parameter -cplog. This defaults to -cpinternal. It
is best for each user and server to use the same value for this
parameter.
Version 7 OUTPUT TO PRINTER statement:
======================================
The data written to the printer in version 7 used the code
page identified by the -cpstream startup parameter.
Version 8 OUTPUT TO PRINTER statement:
======================================
The data written to the printer in version 8 uses the code
page identified by the -cpprint startup parameter. It defaults
to the value of the -cpstream startup parameter.
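
The version 8 defaults described so far form a small fallback chain, which can be summarized as follows. This is a sketch of the rules as stated in this note only. The function and dictionary layout are hypothetical, and -cpinternal is treated as a required input because this note does not state its default.

```python
def resolve_code_pages(params):
    """Fill in unspecified code page parameters from their fallbacks."""
    cpinternal = params["cpinternal"]            # required in this sketch
    cpstream = params.get("cpstream", "IBM850")  # stream default is IBM850
    return {
        "cpinternal": cpinternal,
        "cpstream": cpstream,
        "cpterm": params.get("cpterm", cpstream),            # terminal follows stream
        "cpprint": params.get("cpprint", cpstream),          # printer follows stream
        "cplog": params.get("cplog", cpinternal),            # .lg file follows internal
        "cprcodeout": params.get("cprcodeout", cpinternal),  # .r output follows internal
    }

resolved = resolve_code_pages({"cpinternal": "ISO8859-1"})
print(resolved["cpterm"])   # IBM850
print(resolved["cplog"])    # ISO8859-1

overridden = resolve_code_pages({"cpinternal": "ISO8859-1", "cpstream": "ISO8859-1"})
print(overridden["cpprint"])  # ISO8859-1
```

Setting -cpstream alone changes the terminal and printer code pages as well, unless -cpterm or -cpprint is given explicitly.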
Version 7 filenames stored in prolib:
=====================================
The filenames stored in prolib are stored using the -cpinternal
code page.
Version 8 filenames stored in prolib:
=====================================
In version 8, the user can mark the library with a code page
when it is created. The filenames are then stored in prolib
using this code page.
Progress Software Technical Support Note # 15369