Kbase 16399: Definitions of internationalization (i18n) terms
Author: Progress Software Corporation - Progress
Access: Public
Published: 09/02/2000
INTRODUCTION
============
This knowledge base article explains some of the standard terminology used in internationalization:
1. DBE & SBE
2. DBCS & SBCS
3. Lead-byte & Trail-byte
4. IME
Note: All information below assumes an 8-bit system.
STEP BY STEP DETAILS
====================
1. DBE = Double Byte Enabled
SBE = Single Byte Enabled
-----------------------------
Whether an application (or an operating system) is SBE
or DBE determines the maximum number of distinct characters
that it can display. Usually, only DBE applications are
explicitly labeled as such: if an application is not
referred to as being DBE, it is assumed to be SBE.
An SBE application uses one byte (a "single" byte) to
represent each character. Therefore, a maximum of 256
characters can be displayed.
On the other hand, a DBE application can use two bytes (a
"double" byte) to represent a character, so a maximum of
65,536 characters can be displayed. However, most
applications do not use the entire range of 256 values for
each of the two bytes, so only several thousand characters
are available (see Lead-Byte and Trail-Byte below).
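The two limits above follow directly from the number of bits in a byte. A minimal sketch of the arithmetic, assuming 8-bit bytes as stated in the introduction (the variable names are illustrative only):

```python
BITS_PER_BYTE = 8  # assumption: 8-bit system, as stated in the introduction

# One byte can hold 2**8 distinct values.
sbe_max = 2 ** BITS_PER_BYTE

# Two bytes taken together can hold 2**16 distinct values.
dbe_max = 2 ** (2 * BITS_PER_BYTE)

print(sbe_max)  # 256
print(dbe_max)  # 65536
```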
2. DBCS = Double Byte Character Set
SBCS = Single Byte Character Set
------------------------------------
A character set is the visual representation of byte values.
In other words, the letters, punctuation and symbols that
we see are representative of numeric values that are stored in
a Progress database (or some other system file). Some
examples of character sets are ASCII and EBCDIC. Also, the
extended characters (sometimes referred to as "extended ASCII",
which incidentally is incorrect) that reside in the 128 - 255
range are usually defined in a character set.
The "map" between a character set and the numeric values
that the set represents is called a codepage. Some examples
of codepages are: ibm850, ISO-8859-1, ISO-8859-5 and
SHIFT-JIS.
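To illustrate that a codepage is only a map, the sketch below decodes the same byte value under two of the codepages named above. It uses Python's built-in codecs as a stand-in; this is not how Progress performs the mapping, and the variable names are illustrative.

```python
# A single byte whose numeric value is 233 (0xE9).
value = bytes([0xE9])

# The same value maps to different characters under different codepages.
latin1_char = value.decode("iso-8859-1")    # Western European map
cyrillic_char = value.decode("iso-8859-5")  # Cyrillic map

print(latin1_char)    # é
print(cyrillic_char)  # щ
```

The stored value never changes; only the codepage used to interpret it does.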
Take, for example, the ASCII character "A." To you and me
this is the letter "capital A." To the application,
however, it is the byte whose value is 65 (this is true for
any codepage that maps the ASCII character set).
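This can be checked with a short sketch that encodes "A" under several codepages. It relies on Python's built-in codecs (where "cp850" is Python's name for ibm850) rather than on Progress itself:

```python
# Python codec names for the codepages mentioned in this article.
codepages = ["cp850", "iso-8859-1", "iso-8859-5", "shift_jis"]

# Encode the letter "A" under each codepage and record the byte values.
values = {cp: list("A".encode(cp)) for cp in codepages}

for cp, byte_values in values.items():
    print(cp, byte_values)  # each line shows [65]
```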
An SBCS is a character set that is usable in an SBE
environment. A DBCS is a character set that can only be
represented in a DBE environment.
An SBCS codepage maps each individual value to a particular
character. A DBCS codepage maps most individual values to a
particular character but also maps combinations of two values
to a particular character.
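As a rough illustration of the difference, the sketch below encodes one ASCII letter and one Japanese character under SHIFT-JIS and compares their lengths in bytes. It uses Python's "shift_jis" codec as an assumed stand-in for a DBCS codepage:

```python
# An ASCII letter: a single value maps to the character.
single = "A".encode("shift_jis")

# HIRAGANA LETTER A: a combination of two values maps to the character.
double = "\u3042".encode("shift_jis")

print(len(single))  # 1
print(len(double))  # 2
print(double)       # b'\x82\xa0'
```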
Korean, Simplified Chinese, Traditional Chinese and Japanese
are represented with DBCS's. All other languages are
represented with SBCS's.
3. Lead-Byte and Trail-Byte
----------------------------
The lead-byte is the first value of a double byte character.
The trail-byte is the second value in a double byte character.
When the application comes across a lead-byte, it knows to
expect a trail-byte to follow. If a trail-byte does not
follow, an error occurs.
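The same kind of failure can be reproduced with Python's "shift_jis" codec, used here only as an analogy; the exact error that Progress reports is not shown. The helper name try_decode is hypothetical.

```python
complete = b"\x82\xa0"    # lead-byte 0x82 followed by trail-byte 0xA0
truncated = complete[:1]  # the lead-byte alone, trail-byte missing

def try_decode(data):
    """Return the decoded text, or None if the bytes are not valid SHIFT-JIS."""
    try:
        return data.decode("shift_jis")
    except UnicodeDecodeError:
        return None

print(try_decode(complete))   # the hiragana character U+3042
print(try_decode(truncated))  # None
```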
With all codepages that Progress supports, the lead-bytes are
in the range of values above 127 (all DBCS codepages that
Progress supports define the ASCII character set in the lower
128 values {0 - 127}). In many of these codepages the
trail-bytes are also in the upper range of 128 values,
although some codepages, such as SHIFT-JIS, also allow
trail-bytes below 128.
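Because a lead-byte can be recognized from its value alone, a byte stream can be split into characters by a simple scan. The following is a minimal sketch, assuming the SHIFT-JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC; other codepages use different ranges, and the function names are hypothetical.

```python
def is_lead_byte(b):
    # Assumed lead-byte ranges for SHIFT-JIS; all are above 127.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def split_characters(data):
    """Split encoded bytes into one-byte and two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            # A lead-byte must be followed by a trail-byte.
            if i + 1 >= len(data):
                raise ValueError("lead-byte without a trail-byte")
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars

encoded = "A\u3042B".encode("shift_jis")
print(split_characters(encoded))  # [b'A', b'\x82\xa0', b'B']
```

Since no value in 0 - 127 is treated as a lead-byte, plain ASCII text passes through the scan one byte at a time.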
4. IME = Input Method Editor
-----------------------------
The IME is what enables you to enter double byte characters
using a "standard" keyboard. You can configure the IME to
allow you to enter single byte characters or double byte
characters. If the IME is configured for double byte entry,
the character pairs that you enter are converted to double
byte characters instead (assuming that what you enter is a
valid character combination).
References to written documentation
====================================
Kbase 16392 Double Byte Character Set facts and information
Progress Software Technical Support Note # 16399