Kbase 30109: I18N. How to use the new Version 9 word-break table feature for those countries that need a collation other than Basic.
Author: Progress Software Corporation - Progress
Access: Public
Published: 5/10/1998
Solution ID: P109
GOAL:
I18N. How to use the new Version 9 word-break table feature for those countries that need a collation other than Basic (for example, Swedish, Norwegian, Danish).
GOAL:
QBW syntax error - an asterisk (*) is allowed only at the end of a word. (4686)
FACT(s) (Environment):
Progress 9.x
FIX:
When Progress maintains a word index, each time a record is created or updated it splits the word-indexed field or array into words and adds or updates one word-index entry for each word. To tell which characters in the field are part of a word and which are not, Progress uses word-break tables. For collations other than Basic, there is a further step: each character in the field is converted to its sort weight.
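The following Python sketch models the process just described for Basic collation, where no sort-weight conversion takes place. It is not Progress code: the word-break table is a hypothetical Python dictionary keyed by character code, and the names `LETTER`, `TERMINATOR`, `split_words` and `index_record` are illustrative assumptions, not part of the real word_attr format.

```python
from __future__ import annotations

# Illustrative attribute names; the real word_attr format is different.
LETTER, TERMINATOR = "LETTER", "TERMINATOR"

# Hypothetical Basic-style table keyed by character code (0-255):
# alphanumeric characters belong to words, everything else separates them.
BASIC_TABLE = {c: LETTER if chr(c).isalnum() else TERMINATOR for c in range(256)}


def split_words(field: str, table: dict[int, str]) -> list[str]:
    """Break a field value into words by looking up each character's code."""
    words: list[str] = []
    current: list[str] = []
    for ch in field:
        if table.get(ord(ch), TERMINATOR) == LETTER:
            current.append(ch)
        elif current:  # a separator ends the word in progress
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def index_record(index: dict[str, set[int]], rowid: int, field: str,
                 table: dict[int, str]) -> None:
    """Add one word-index entry (word -> record id) per word in the field."""
    for word in split_words(field, table):
        index.setdefault(word, set()).add(rowid)


word_index: dict[str, set[int]] = {}
index_record(word_index, 1, "red-hot chili", BASIC_TABLE)
index_record(word_index, 2, "hot sauce", BASIC_TABLE)
print(sorted(word_index["hot"]))  # [1, 2]
```

Because the hyphen is a separator in this table, record 1 contributes the three entries "red", "hot" and "chili".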
Prior to Version 9, Progress first converted the characters in the word-indexed field into their sort weights and then broke the field into words. This poses no problem for Basic collation, where no conversion is performed. For non-Basic collations, however, the word-break table is consulted with sort weights instead of the original character values, so the default word-break table makes the CONTAINS operator behave unexpectedly. Building a customized word-break table to compensate can be tricky.
Progress Solution 18429 explains how to work around this problem for Version 8.X: in a nutshell, word-break tables must be built by referring to characters' sort weights for the target collation, rather than to characters' ASCII values.
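To illustrate why the pre-Version 9 order breaks CONTAINS, and what the Solution 18429 workaround amounts to, here is a toy Python model. The sort weights (ASCII letters folded to upper case; å, ä and ö given weights 0x80-0x82) are invented for illustration and are not the real Swedish collation; the table format and all function names are likewise assumptions.

```python
from __future__ import annotations

LETTER, TERMINATOR = "LETTER", "TERMINATOR"

# Hypothetical default table keyed by character code: every alphanumeric
# Latin-1 character (including å, ä, ö) is a word character.
DEFAULT_TABLE = {c: LETTER if chr(c).isalnum() else TERMINATOR for c in range(256)}


def sort_weight(ch: str) -> int:
    """Hypothetical Swedish-like weights, for illustration only: ASCII
    letters fold to upper case; å, ä, ö get weights 0x80-0x82."""
    extra = {"å": 0x80, "ä": 0x81, "ö": 0x82}
    if ch.lower() in extra:
        return extra[ch.lower()]
    return ord(ch.upper()) if ch.isascii() else ord(ch)


def split_v8(field: str, table: dict[int, str]) -> list[list[int]]:
    """Pre-Version 9 order: convert each character to its sort weight
    first, then look the weight up in the word-break table."""
    words: list[list[int]] = []
    current: list[int] = []
    for ch in field:
        w = sort_weight(ch)
        if table.get(w, TERMINATOR) == LETTER:
            current.append(w)
        elif current:
            words.append(current)
            current = []
    if current:
        words.append(current)
    return words


def rekey_by_weight(char_table: dict[int, str]) -> dict[int, str]:
    """Workaround in the spirit of Solution 18429: build the table on sort
    weights. A weight counts as a word character if any letter maps to it."""
    table: dict[int, str] = {}
    for code, attr in char_table.items():
        w = sort_weight(chr(code))
        if attr == LETTER:
            table[w] = LETTER
        else:
            table.setdefault(w, TERMINATOR)
    return table


print(len(split_v8("smörgås bröd", DEFAULT_TABLE)))                   # 5 fragments
print(len(split_v8("smörgås bröd", rekey_by_weight(DEFAULT_TABLE))))  # 2 words
```

With the default table, "smörgås" falls apart into three fragments because the weights of ö and å land on codes the table treats as separators. Re-keying the table by sort weight keeps each word whole. In this toy model, if a letter and a separator ever shared a weight, the re-keyed table could not tell them apart, which is one way a hand-built table can go wrong.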
Version 9 introduces a new type of word-break table that resolves the problem altogether. With this new type of table, you always refer to characters' ASCII values, whatever the target collation is.
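Assuming that a Version 9 table is consulted on the original character codes and that sort weights are applied to each word afterwards (the text implies this but does not describe the internals), the same Swedish example can be modelled as below. The weights, the table and the function names are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

LETTER, TERMINATOR = "LETTER", "TERMINATOR"

# Hypothetical Version 9-style table: keyed by plain character codes,
# so å, ä, ö are simply marked as word characters.
CHAR_TABLE = {c: LETTER if chr(c).isalnum() else TERMINATOR for c in range(256)}


def sort_weight(ch: str) -> int:
    """Hypothetical Swedish-like weights, for illustration only: ASCII
    letters fold to upper case; å, ä, ö get weights 0x80-0x82."""
    extra = {"å": 0x80, "ä": 0x81, "ö": 0x82}
    if ch.lower() in extra:
        return extra[ch.lower()]
    return ord(ch.upper()) if ch.isascii() else ord(ch)


def split_v9(field: str, table: dict[int, str]) -> list[tuple[str, list[int]]]:
    """Break on character codes first, then convert each word to weights."""
    words: list[str] = []
    current: list[str] = []
    for ch in field:
        if table.get(ord(ch), TERMINATOR) == LETTER:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return [(word, [sort_weight(ch) for ch in word]) for word in words]


for word, weights in split_v9("smörgås bröd", CHAR_TABLE):
    print(word, [hex(w) for w in weights])
```

Because the table never sees sort weights in this model, it can be written in terms of ordinary character values and still yield whole words for any collation.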
What distinguishes a new table from an old one is the presence of a header at the beginning of the word-break table.
/*
* Sample word-break table in version 9
*/
/* This header was introduced in version 9 */
version = 9
codepage = <target code page, eg: iso8859-1>
wordrules-name = <generic name for the word-break table, eg: swedish>
type = 3
/* The word_attr field has the same format as before */
word_attr = { ..... }
/*
* End of sample word-break table
*/
Currently, "version" must be "9" and "type" must be "3". The "codepage" field must match the code page in the target database.
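These header rules can be checked mechanically before compiling. The helper below is a hypothetical Python sketch, not a Progress utility; it assumes the header lines look exactly like the sample (one `key = value` per line, with /* */ comments), and comparing code page names case-insensitively is an assumption.

```python
from __future__ import annotations

import re

HEADER_KEYS = {"version", "codepage", "wordrules-name", "type"}


def parse_header(source: str) -> dict[str, str]:
    """Collect the 'key = value' header lines, ignoring /* ... */ comments."""
    text = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    header: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in HEADER_KEYS:
            header[key] = value.strip()
    return header


def check_header(header: dict[str, str], db_codepage: str) -> list[str]:
    """Return the problems found; an empty list means the header looks valid."""
    if not header:
        return ["no header: this is an old-style (pre-Version 9) table"]
    problems = []
    if header.get("version") != "9":
        problems.append('"version" must be "9"')
    if header.get("type") != "3":
        problems.append('"type" must be "3"')
    # Assumption: code page names compare case-insensitively.
    if header.get("codepage", "").lower() != db_codepage.lower():
        problems.append(f'"codepage" must match the database code page ({db_codepage})')
    return problems


SAMPLE = """
/* Sample word-break table in version 9 */
version = 9
codepage = iso8859-1
wordrules-name = swedish
type = 3
word_attr = { }
"""
print(check_header(parse_header(SAMPLE), "iso8859-1"))  # []
print(check_header(parse_header(SAMPLE), "ibm850"))
```

The second call reports a single problem, the code page mismatch.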
Except for this header, new word-break tables must be compiled and used just like the old ones.
NOTES:
-- If you migrate from an old word-break table to a new one, you must rebuild the word indexes once the new word-break table is set up.
-- The new word-break table does work in conjunction with the -ttwrdrul startup parameter (Word-break rule for temp-tables).
FIX:
References to Written Documentation:
Progress Internationalization Guide for Version 9, Section 3.2
Progress Solutions:
18429, "Creating a word-break File for International Collation"
20441, "-ttwrdrul and Applying Word Rules to Temp Tables"