Consultor Eletrônico



Kbase 30109: I18N. How to use the new Version 9 word-break table feature for those countries that need a collation other than Basic.
Author   Progress Software Corporation - Progress
Access   Public
Published   5/10/1998
Solution ID: P109

GOAL:

I18N. How to use the new Version 9 word-break table feature for those countries that need a collation other than Basic (for example, Swedish, Norwegian, Danish).

GOAL:

QBW syntax error - an asterisk (*) is allowed only at the end of a word. (4686)

FACT(s) (Environment):

Progress 9.x

FIX:

When Progress maintains a word index, it splits the word-indexed field or array into words each time a record is created or updated, and it adds or updates one word-index entry for each word. To decide which characters in the field belong to a word and which do not, Progress uses word-break tables. For collations other than Basic, there is an additional step: converting each character in the field to its sort weight.
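The following minimal Python sketch shows the word-splitting step under simplifying assumptions. It models the word-break table as a hypothetical 256-entry list of booleans indexed by character value, where True means the character is part of a word. The names `toy_basic_table`, `split_words` and `split_array` are illustrative only and are not Progress APIs. Real word-break tables use `word_attr` entries with richer attributes.

```python
# Minimal sketch of word splitting driven by a word-break table.
# ASSUMPTION: the table is a hypothetical 256-entry list of booleans indexed
# by character value (True = part of a word). Real Progress tables use
# word_attr entries with richer attributes; this is not Progress code.
import string


def toy_basic_table() -> list[bool]:
    """Hypothetical table: ASCII letters and digits are word characters."""
    table = [False] * 256
    for ch in string.ascii_letters + string.digits:
        table[ord(ch)] = True
    return table


def split_words(value: str, table: list[bool]) -> list[str]:
    """Split a field value into words; each word gets one word-index entry."""
    words, current = [], []
    for ch in value:
        code = ord(ch)
        # Values outside the table are treated as word breaks.
        if code < len(table) and table[code]:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def split_array(elements: list[str], table: list[bool]) -> list[str]:
    """For a word-indexed array, split every element the same way."""
    return [word for element in elements for word in split_words(element, table)]


if __name__ == "__main__":
    table = toy_basic_table()
    print(split_words("Quick brown-fox, 42 times", table))
    # prints ['Quick', 'brown', 'fox', '42', 'times']
    print(split_array(["red car", "blue"], table))
    # prints ['red', 'car', 'blue']
```

In this toy table, a character such as "å" is not marked as a word character, so it breaks a word.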

Prior to Version 9, Progress first converted the characters in the word-indexed field into their sort weights and then broke the field into words. This poses no problem for Basic collation, because no conversion is performed. For non-Basic collations, however, the word-break table is consulted with sort weights rather than with the characters themselves. As a result, the default word-break table causes the CONTAINS operator not to work as expected, and building a customized word-break table can be tricky.

Progress Solution 18429 explains how to work around this problem for Version 8.X: in a nutshell, word-break tables must be built by referring to characters' sort weights for the target collation, rather than to characters' ASCII values.

Version 9 introduces a new type of word-break table that resolves the problem altogether. With this new type of table, you always refer to characters' ASCII values, no matter what the target collation is.
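The difference in processing order can be illustrated with a small Python sketch, in which everything is hypothetical. `TOY_WEIGHTS` is an invented Swedish-like collation in which å, ä and ö sort after z and happen to receive the values of "{", "|" and "}". Tables are modeled as 256 booleans indexed by whatever value is looked up. The sketch shows three cases:

- the pre-Version 9 order failing with a default table;
- the Version 8 workaround of building the table from sort weights;
- a Version 9-style table that refers to plain character values.

```python
# Sketch contrasting the pre-Version 9 and Version 9 processing order.
# ASSUMPTIONS (hypothetical, illustration only):
# - TOY_WEIGHTS is an invented Swedish-like collation: a-z keep their own
#   values and å, ä, ö sort after z. Real collation tables differ.
# - A word-break table is 256 booleans (True = part of a word), indexed by
#   whatever value is looked up.
import string

TOY_WEIGHTS = {"å": 0x7B, "ä": 0x7C, "ö": 0x7D}  # '{', '|', '}' in ASCII
BASIC_WORD_CHARS = string.ascii_letters + string.digits
SWEDISH_WORD_CHARS = BASIC_WORD_CHARS + "åäö"


def weight(ch: str) -> int:
    """Toy sort weight: å/ä/ö are remapped, every other character keeps ord()."""
    return TOY_WEIGHTS.get(ch, ord(ch))


def make_table(word_values) -> list[bool]:
    """Mark the given values (0-255) as word values."""
    table = [False] * 256
    for value in word_values:
        table[value] = True
    return table


def split(text: str, values: list[int], table: list[bool]) -> list[str]:
    """Return the substrings of text whose looked-up values are word values."""
    words, start = [], None
    for i, v in enumerate(values):
        is_word = v < len(table) and table[v]
        if is_word and start is None:
            start = i
        elif not is_word and start is not None:
            words.append(text[start:i])
            start = None
    if start is not None:
        words.append(text[start:])
    return words


def break_pre_v9(text: str, table: list[bool]) -> list[str]:
    """Pre-Version 9: convert to sort weights first, then consult the table."""
    return split(text, [weight(ch) for ch in text], table)


def break_v9(text: str, table: list[bool]) -> list[str]:
    """Version 9-style table (as modeled here): consult plain character values."""
    return split(text, [ord(ch) for ch in text], table)


# Default table, built from character values.
default_table = make_table(ord(c) for c in BASIC_WORD_CHARS)
# Version 8 workaround (Solution 18429): build the table from sort weights.
weight_table = make_table(weight(c) for c in SWEDISH_WORD_CHARS)
# Version 9-style table: refer to plain character values (ISO 8859-1 here).
v9_table = make_table(ord(c) for c in SWEDISH_WORD_CHARS)

if __name__ == "__main__":
    print(break_pre_v9("språk", default_table))  # ['spr', 'k'] (word broken)
    print(break_pre_v9("språk", weight_table))   # ['språk']
    print(break_v9("språk", v9_table))           # ['språk']
```

In this toy, the weight-built table also treats "{" as a word character, because "{" shares weight 0x7B with "å". The character-value table marks 0xE5 for "å" instead, so "{" remains a word break.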

What distinguishes a new table from an old one is the presence of a header at the beginning of the word-break table.

/*
* Sample word-break table in version 9
*/

/* This header was introduced in version 9 */
version = 9
codepage = <target code page, eg: iso8859-1>
wordrules-name = <generic name for the word-break table, eg: swedish>
type = 3

/* The word_attr field has the same format as before */
word_attr = { ..... }

/*
* End of sample word-break table
*/

Currently, "version" must be "9" and "type" must be "3". The "codepage" field must match the code page in the target database.
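Because the header rules are simple, they can be checked before compiling. The following Python sketch is a hypothetical pre-flight check and is not part of Progress. It assumes that header lines are plain `key = value` pairs appearing before `word_attr`, and it treats a file without a header as an old-style table. It also compares code page names case-insensitively, which is an assumption. The names `read_header` and `check_header` are illustrative.

```python
# Hypothetical pre-flight check for the Version 9 word-break table header.
# ASSUMPTIONS: header fields are simple "key = value" lines before word_attr;
# comments and the word_attr body are ignored; code page names are compared
# case-insensitively. This is not the Progress word-break compiler.
import re

HEADER_LINE = re.compile(r"^\s*(version|codepage|wordrules-name|type)\s*=\s*(\S+)\s*$")


def read_header(source: str) -> dict[str, str]:
    """Collect header fields that appear before the word_attr table."""
    fields = {}
    for line in source.splitlines():
        if line.strip().startswith("word_attr"):
            break
        match = HEADER_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def check_header(source: str, database_codepage: str) -> list[str]:
    """Return a list of problems; an empty list means the header looks valid."""
    fields = read_header(source)
    if not fields:
        return ["no header: this is an old-style (pre-Version 9) table"]
    problems = []
    if fields.get("version") != "9":
        problems.append('"version" must be 9')
    if fields.get("type") != "3":
        problems.append('"type" must be 3')
    if fields.get("codepage", "").lower() != database_codepage.lower():
        problems.append('"codepage" must match the database code page')
    return problems


SAMPLE = """
version = 9
codepage = iso8859-1
wordrules-name = swedish
type = 3
word_attr = { ... }
"""

if __name__ == "__main__":
    print(check_header(SAMPLE, "iso8859-1"))  # []
    print(check_header(SAMPLE, "ibm850"))     # one code page problem
```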

Except for this header, new word-break tables must be compiled and used just like the old ones.

NOTES:
-- If you migrate from an old word-break table to a new one, word indexes must be rebuilt once the new word-break table is in place.

-- The new word-break table also works with the -ttwrdrul startup parameter (Word-break rule for temp-tables).

FIX:

References to Written Documentation:

Progress Internationalization Guide for Version 9, Section 3.2

Progress Solutions:

18429, "Creating a word-break File for International Collation"
20441, "-ttwrdrul and Applying Word Rules to Temp Tables"