Extended alphabet / characters with Progress V6

EXTENDED CHARACTERS WITH PROGRESS VERSION 6
=======================================================

Progress v6 uses the IBM codepage 850 internally. This causes
problems whenever a terminal or printer implements extended
characters differently. When dealing with extended characters you
must know which character sets your terminals and printers use,
because that information is needed to set up conversions. The
ISO 8859-1 codepage is used as the example in this text, but you
can set up other codepages in the same way.

This document only describes how to deal with terminals and printers
that require a codepage other than ibm850. It does not contain
information about using a database with a codepage other than
ibm850 or a collation other than the ones supplied with the
"proutil -C language" parameter.

This document has been written to clear up the mysteries surrounding
international characters and codepages. An extended character is
really just a byte value that a codepage maps to a character. What
makes this difficult is that you never know how a software package
handles these bytes until you try it. To get the best results you
should always know what a printer queue or file transfer program
does to these characters. Some of them silently change the codepage,
and you can spend hours trying to figure out why the printer prints
funny characters.

Terminals & terminal emulators
=======================================================

Most terminals can use the ISO 8859-1 (Latin-1) character set. Some
can use IBM codepage 850. When setting up the environment, it is
almost always easier to use iso8859-1, because it has been adopted
as a standard in many Unix environments. If your terminal cannot
handle 8-bit characters, you must set up a conversion from 7-bit
national characters. In many cases you will then lose the curly
braces and the backslash and pipe characters, because the 7-bit
national characters occupy their code positions. DEC terminals (and
most VT??? emulators) also offer a character set called "DEC
Multinational", which is very close to iso8859-1. You can probably
use this character set on the terminal and set it up as iso8859-1
on the Progress side.
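
To see why a 7-bit national set costs you the braces, backslash and
pipe, here is a minimal Python sketch (Python is used only for
illustration; it is not part of Progress). It assumes the
Swedish/Finnish 7-bit variant of ISO 646, in which six ASCII code
positions carry national letters. The table and the function name
are illustrative assumptions, and other national variants reassign
different positions.

```python
# Assumption: Swedish/Finnish 7-bit national set (an ISO 646 variant).
# Six ASCII code positions are reused for national letters.
NATIONAL_7BIT = {
    0x5B: "\u00c4",  # ASCII [  carries A-umlaut
    0x5C: "\u00d6",  # ASCII \  carries O-umlaut
    0x5D: "\u00c5",  # ASCII ]  carries A-ring
    0x7B: "\u00e4",  # ASCII {  carries a-umlaut
    0x7C: "\u00f6",  # ASCII |  carries o-umlaut
    0x7D: "\u00e5",  # ASCII }  carries a-ring
}


def national_7bit_to_cp850(data: bytes) -> bytes:
    """Convert 7-bit national bytes to IBM codepage 850 bytes."""
    out = bytearray()
    for byte in data:
        if byte in NATIONAL_7BIT:
            # Re-encode the national letter as its ibm850 byte.
            out += NATIONAL_7BIT[byte].encode("cp850")
        else:
            # Every other 7-bit code is identical in ibm850.
            out.append(byte)
    return bytes(out)


# The terminal sends 0x7B for a-umlaut, so after conversion a literal
# "{" can no longer be entered from that terminal.
print(national_7bit_to_cp850(b"{|}").hex())
```

Because the conversion is applied to every byte the terminal sends,
there is no way to tell a national letter from the ASCII character
that shares its code, which is the loss described above.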

To determine the character set used in your particular terminal
do the following on a Unix box:

$ cat > umlauts

Type some extended characters like Aumlaut or Oumlaut, then press
Enter followed by CTRL-D. Then do a hex dump of the file:

$ od -x umlauts

You should see something like this:

0000000 d6d6 d6c4 c4c4 0a00
0000007

This file contains (in the iso8859-1 character set) three Oumlauts
and three Aumlauts, followed by a newline. Note that od -x prints
16-bit words, so on some machines the two bytes of each group appear
swapped. Compare the hex codes to the ones listed in a codepage
table to determine the character set. You can find a codepage table
for ibm850 in the v6 Programming Handbook, chapter 2. For other
codepages you can try finding tables in printer or terminal manuals.
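
If no printed table is at hand, the comparison can also be sketched
in Python, whose standard codecs include both codepages. This is
only an illustration under assumptions: the byte string is the one
from the hex dump above, and the function name is hypothetical.

```python
import unicodedata

# Bytes from the hex dump above, in the order they were typed.
captured = bytes.fromhex("d6d6d6c4c4c40a")


def describe(data: bytes, codepage: str) -> list:
    """Name the character each byte stands for under one codepage."""
    # Control characters such as newline have no Unicode name.
    return [unicodedata.name(ch, "control") for ch in data.decode(codepage)]


for codepage in ("latin-1", "cp850"):
    names = describe(captured, codepage)
    # Bytes 0 and 3 are the first of each group of three letters.
    print(codepage, "->", names[0], "/", names[3])
```

The codepage under which the names match the keys you actually
pressed is the one your terminal is using.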

Protermcap
=======================================================

To use anything other than IBM codepage 850 with Progress v6 you must
set up translations in the Protermcap file. Chapter 2 of the
Programming Handbook contains information on how to do this.

Basically, you must build a conversion table from your character set
to ibm850, which Progress uses internally. If you have determined
that your terminal uses the iso8859-1 character set, follow these
steps.

1) Create the translation table and add it to Protermcap

You should create a sequence of IN (and possibly OUT) definitions
in protermcap that define how to convert characters. The following
is an example of an iso8859-1 conversion:

iso8859-1:latin-1:ISO 8859-1 character set:\
    :IN(\304)=\216:\
    :IN(\344)=\204:\
    :IN(\326)=\231:\
    :IN(\366)=\224:\
    :IN(\334)=\232:\
    :IN(\374)=\201:

This defines the conversion for Aumlaut, Oumlaut and Uumlaut.
The numbers are in octal, and you must set up both the upper and
lower case forms. Note that you do not have to specify any OUT
mappings unless you want to map two or more different characters
to one on output. You can then add this conversion table to the
definition of a terminal type by adding ":tc=iso8859-1:" to the end
of the terminal definition in protermcap (remember to add a
backslash to the end of the preceding line). Alternatively, you can
set TERM=vt220/iso8859-1 to do the conversion, but this may break
some Unix shell commands, because no such terminal exists in
termcap or terminfo.

2) Try the conversion

You can write a small program that uses readkey and displays the
character code of each key pressed. Here is an example:

/* Read one key at a time and display its character code */
repeat:
    readkey.
    disp asc(keyfunction(lastkey)).
end.

Now you should see ibm850 character codes displayed.
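
Both the IN entries from step 1 and the codes to expect in step 2
can be derived from codepage tables rather than looked up by hand.
The following Python sketch does this with Python's standard
"latin-1" and "cp850" codecs. It is an illustration only: the
function name and the default terminal codepage are assumptions,
and the output should still be checked against your own tables.

```python
# Letters to translate, written as Unicode escapes:
# A, a, O, o, U, u, each with umlaut.
LETTERS = "\u00c4\u00e4\u00d6\u00f6\u00dc\u00fc"


def in_entry(letter: str, terminal_codepage: str = "latin-1") -> str:
    """Build one protermcap IN entry: terminal byte -> ibm850 byte."""
    src = letter.encode(terminal_codepage)[0]
    dst = letter.encode("cp850")[0]
    # protermcap expects both values as three-digit octal escapes.
    return f":IN(\\{src:03o})=\\{dst:03o}:"


for letter in LETTERS:
    # The decimal ibm850 value is the code the test program in
    # step 2 should display for this key.
    print(in_entry(letter), "expected code:", letter.encode("cp850")[0])
```

For Aumlaut this prints the entry :IN(\304)=\216: with an expected
code of 142, which matches the first entry of the table in step 1.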

Database
=======================================================

If you have data in your database that is not in the ibm850 codepage,
you should write a program that displays the character codes of the
characters in strings and determine which character set the data
uses. Then you can either write a conversion program or dump the
data, run it through a filter program (such as tr in Unix) and
reload it.
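
As an alternative to tr, the filter step can be sketched in Python.
This assumes the dumped data is entirely in iso8859-1 and contains
only text; the function name is hypothetical. A dump that mixes
codepages or holds binary data would be damaged by this approach.

```python
def latin1_to_cp850(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as IBM codepage 850 bytes."""
    # Characters with no ibm850 equivalent become "?" rather than
    # stopping the conversion.
    return data.decode("latin-1").encode("cp850", errors="replace")


# Sample line in iso8859-1: "K", a-umlaut, "se, ", O-umlaut, "l".
sample = b"K\xe4se, \xd6l"
print(latin1_to_cp850(sample).hex())
```

Plain 7-bit characters pass through unchanged, so only the extended
characters in the dump are rewritten.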

Printing and input/output files
=======================================================

At this point your terminals and database should be fine. Some
important points to remember:

When outputting to a file or printer, you can use the MAP option
of the OUTPUT TO statement to specify the character set. Without
this option Progress uses the one you have set up for your terminal.
You can use the NO-MAP option to output (or input) data without
any conversion, in which case the data is expected to be in the
ibm850 codepage.

You can create as many translations as you like and refer to them in
your programs, so you should be able to handle different printers
and transfer files to other systems.
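
Before writing a translation for a printer, it helps to know which
ibm850 characters the printer's character set cannot represent at
all. The following Python sketch lists them for a given target
codepage. It relies on Python's codec tables, not on anything
Progress provides, and the function name is an assumption.

```python
def unmappable_bytes(target_codepage: str) -> list:
    """List ibm850 byte values with no equivalent in the target."""
    missing = []
    for value in range(256):
        char = bytes([value]).decode("cp850")
        try:
            char.encode(target_codepage)
        except UnicodeEncodeError:
            # The target codepage has no code for this character.
            missing.append(value)
    return missing


missing = unmappable_bytes("latin-1")
print(len(missing), "ibm850 codes have no iso8859-1 equivalent")
print("first few (octal):", " ".join(f"{v:03o}" for v in missing[:5]))
```

The codes reported are mostly line-drawing characters, so a report
that draws boxes needs substitute characters in its translation.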

Suggested reading
=======================================================

Progress v6 Programming Handbook chapter 2

Any character set tables you are able to obtain (these are sometimes
quite handy, and you can usually find them in printer manuals etc.)

Progress Software Technical Support Note # 15400