Non-standard hyphenation
------------------------

Some languages use non-standard hyphenation: `discretionary'
character changes at hyphenation points. For example,
Catalan: paral·lel -> paral-lel,
Dutch: omaatje -> oma-tje,
German (before the new orthography): Schiffahrt -> Schiff-fahrt,
Hungarian: asszonnyal -> asz-szony-nyal (multiple occurrences!),
Swedish: tillata -> till-lata.

Using this extended library, you can define
non-standard hyphenation patterns.
For example:

l·1l/l=l
a1atje./a=t,1,3
.schif1fahrt/ff=f,5,2
.as3szon/sz=sz,2,3
n1nyal./ny=ny,1,3
.til1lata./ll=l,3,2

or with narrow boundaries:

l·1l/l=,1,2
a1atje./a=,1,1
.schif1fahrt/ff=,5,1
.as3szon/sz=,2,1
n1nyal./ny=,1,1
.til1lata./ll=,3,1

Note: Libhnj uses modified patterns prepared by substrings.pl.
Unfortunately, the conversion step (non-standard -> standard pattern
conversion) can currently generate bad non-standard patterns, so using
narrow boundaries may be better for recent Libhnj.
For example, substrings.pl generates a few bad patterns from the Hungarian
hyphenation patterns, resulting in bad non-standard hyphenation in a few
cases. Using narrow boundaries solves this problem. The Java HyFo module
can check for this problem.

Syntax of the non-standard hyphenation patterns
------------------------------------------------

pat1tern/change[,start,cut]

If this pattern matches the word, and this pattern wins (see README.hyphen)
in the change region of the pattern, then the pattern[start, start + cut - 1]
substring will be replaced with the "change".
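
As a rough illustration of the replacement rule, the following Python sketch
applies a rule to the letters of its own pattern. The helper name apply_rule
is hypothetical and is not part of Libhnj. The sketch ignores pattern
matching and the win rule, and it assumes that a rule without start and cut
replaces the whole pattern (so that f1f/ff=f behaves like f1f/ff=f,1,2).

```python
import re


def apply_rule(rule):
    """Apply 'pattern/change[,start,cut]' to the letters of the pattern.

    Hypothetical helper for illustration only. Digits and boundary dots
    of the pattern are not counted, and start is 1-based.
    """
    pattern, _, rest = rule.partition("/")
    change, *region = rest.split(",")
    # Drop the hyphenation values and the word boundary dots.
    letters = re.sub(r"[0-9]", "", pattern).strip(".")
    if region:
        start, cut = int(region[0]), int(region[1])
    else:
        # Assumption: without start and cut, the whole pattern is replaced.
        start, cut = 1, len(letters)
    begin = start - 1  # convert the 1-based start to a 0-based index
    return letters[:begin] + change + letters[begin + cut:]


print(apply_rule(".schif1fahrt/ff=f,5,2"))
print(apply_rule(".as3szon/sz=sz,2,3"))
```

Here the two calls print schiff=fahrt and asz=szon; the narrow form
.schif1fahrt/ff=,5,1 yields the same German result.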

For example, a German ff -> ff-f hyphenation:

f1f/ff=f

or with expansion

f1f/ff=f,1,2

will replace every "ff" with "ff=f" at hyphenation.

A more realistic example:

% simple ff -> f-f hyphenation
f1f
% Schiffahrt -> Schiff-fahrt hyphenation
%
schif3fahrt/ff=f,5,2

Specification

- Pattern: matching patterns of the original Liang's algorithm
  - patterns must contain only one hyphenation point in the change region,
    marked with a one-digit odd number (1, 3, 5, 7 or 9).
    This point may be at a subregion boundary: schif3fahrt/ff=,5,1
  - only the greater value guarantees the win (don't mix standard and
    non-standard patterns with the same value; for example,
    instead of f3f and schif3fahrt/ff=f,5,2 use f3f and schif5fahrt/ff=f,5,2)

- Change: new characters.
  Arbitrary character sequence. The equals sign (=) marks hyphenation points
  for OpenOffice.org (as in the example). (In a possible German LaTeX
  preprocessor, ff could be replaced with "ff; in a Hungarian one, ssz
  with `ssz, according to the German and Hungarian Babel settings.)

- Start: starting position of the change region.
  - begins with 1 (not 0): schif3fahrt/ff=f,5,2
  - start dot doesn't matter: .schif3fahrt/ff=f,5,2
  - numbers don't matter: .s2c2h2i2f3f2ahrt/ff=f,5,2
  - In UTF-8 encoding, use Unicode character positions: össze/sz=sz,2,3
    ("össze" looks like "Ã¶ssze" in an ISO 8859-1 8-bit editor).

- Cut: length of the removed character sequence in the original word.
  - In UTF-8 encoding, use Unicode character length: paral·1lel/l=l,5,3
    ("paral·1lel" looks like "paralÂ·1lel" in an ISO 8859-1 8-bit editor).

Dictionary development
----------------------

There is no extended PatGen pattern generator for non-standard
hyphenation patterns yet.

Fortunately, non-standard hyphenation points are forbidden in the
PatGen-generated hyphenation patterns, so with a small patch
non-standard hyphenation patterns can be developed in this case, too.
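
When writing UTF-8 patterns by hand, it may help to check that start and cut
are counted in characters rather than bytes. The Python sketch below compares
both counts for the össze and paral·lel examples. The function names are
hypothetical, and the sketch assumes precomposed characters (U+00F6 for ö,
U+00B7 for the middle dot).

```python
def char_and_byte_start(word, fragment):
    """Return the 1-based (character, UTF-8 byte) start of fragment in word."""
    index = word.index(fragment)
    byte_index = len(word[:index].encode("utf-8"))
    return index + 1, byte_index + 1


def char_and_byte_length(fragment):
    """Return the (character, UTF-8 byte) length of fragment."""
    return len(fragment), len(fragment.encode("utf-8"))


def as_latin1_editor(text):
    """Show how UTF-8 text looks when read as ISO 8859-1."""
    return text.encode("utf-8").decode("latin-1")


# "ssz" starts at character 2 but at byte 3, because "ö" takes two bytes.
print(char_and_byte_start("\u00f6ssze", "ssz"))
# "l·l" is 3 characters but 4 bytes, because "·" takes two bytes.
print(char_and_byte_length("l\u00b7l"))
print(as_latin1_editor("\u00f6ssze"))
```

The character counts (2 and 3) are the values to use in össze/sz=sz,2,3 and
paral·1lel/l=l,5,3; the byte counts are shown only for contrast.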

Warning: If you use UTF-8 Unicode encoding in your patterns, call
substrings.pl with the UTF-8 parameter to calculate the right
character positions for non-standard hyphenation:

./substrings.pl input output UTF-8

Programming
-----------

Use hyphenate2() or hyphenate3() to handle non-standard hyphenation.
See hyphen.h for the documentation of the hyphenate*() functions.
See example.c for processing the output of the hyphenate*() functions.

Warning: change characters are lower-cased in the source, so you may need
case conversion of the change characters based on input word case detection.
For an example, see the OpenOffice.org source
(lingucomponent/source/hyphenator/altlinuxhyph/hyphen/hyphenimp.cxx).
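
The case conversion mentioned in the warning could look roughly like the
following Python sketch. It is only an illustration under simple assumptions
and is not taken from hyphenimp.cxx: the function adapt_change_case and its
word_pos parameter (the 1-based position of the change region in the word)
are hypothetical.

```python
def adapt_change_case(word, change, word_pos=None):
    """Adapt lower-case change characters to the case of the input word.

    Assumptions of this sketch: an all-capitals word gets an all-capitals
    change, and a capitalized word gets a capitalized change only when
    the change region starts at the first character of the word.
    """
    if word.isupper():
        return change.upper()
    if word_pos == 1 and word[:1].isupper():
        return change[:1].upper() + change[1:]
    return change


print(adapt_change_case("SCHIFFAHRT", "ff=f"))     # all capitals
print(adapt_change_case("Schiffahrt", "ff=f", 5))  # change inside the word
```

With these inputs the change becomes FF=F for the all-capitals word and stays
ff=f for the capitalized one. Language-specific case rules are left out of
the sketch.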

László Németh
<nemeth (at) openoffice.org>