Non-standard hyphenation
------------------------

Some languages use non-standard hyphenation, i.e. `discretionary'
character changes at hyphenation points. For example,
Catalan: paral·lel -> paral-lel,
Dutch: omaatje -> oma-tje,
German (before the new orthography): Schiffahrt -> Schiff-fahrt,
Hungarian: asszonnyal -> asz-szony-nyal (multiple occurrences!),
Swedish: tillata -> till-lata.

Using this extended library, you can define
non-standard hyphenation patterns. For example:

l·1l/l=l
a1atje./a=t,1,3
.schif1fahrt/ff=f,5,2
.as3szon/sz=sz,2,3
n1nyal./ny=ny,1,3
.til1lata./ll=l,3,2

or with narrow boundaries:

l·1l/l=,1,2
a1atje./a=,1,2
.schif1fahrt/ff=,5,1
.as3szon/sz=,2,1
n1nyal./ny=,1,1
.til1lata./ll=,3,1
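The wide and narrow forms above are meant to produce the same hyphenated word. A minimal C sketch of the replacement step, assuming the pattern has already matched and that `match` is the word offset of the pattern's first letter (a leading dot matches the word start, so it adds no offset). `apply_change()` is a hypothetical helper, not part of the libhnj API, and it treats every byte as one character, i.e. it assumes a single-byte encoding:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not libhnj API): once a non-standard pattern has
 * matched `word` at byte offset `match`, replace the pattern letters
 * start..start+cut-1 (1-based) with `change` and write the result to
 * `out`. Single-byte encoding assumed: one byte = one character.
 * Returns 0 on success, -1 if the region or the output does not fit.
 */
static int apply_change(const char *word, size_t match, const char *change,
                        size_t start, size_t cut, char *out, size_t outsize)
{
    size_t len, from;
    int n;

    assert(word != NULL && change != NULL && out != NULL);
    len = strlen(word);
    if (start < 1)
        return -1;
    from = match + start - 1;          /* first byte being replaced */
    if (from + cut > len)
        return -1;                     /* region runs past the word */

    /* prefix + change + rest of the word after the removed bytes */
    n = snprintf(out, outsize, "%.*s%s%s",
                 (int)from, word, change, word + from + cut);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

For "schiffahrt", both `apply_change(w, 0, "ff=f", 5, 2, ...)` (wide) and `apply_change(w, 0, "ff=", 5, 1, ...)` (narrow) yield "schiff=fahrt"; the narrow form simply leaves the second "f" untouched instead of rewriting it.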

Note: Libhnj uses patterns preprocessed by substrings.pl.
Unfortunately, this conversion step (non-standard -> standard pattern
conversion) can currently generate bad non-standard patterns, so narrow
boundaries may work better with recent versions of Libhnj. For example,
substrings.pl generates a few bad patterns from the Hungarian hyphenation
patterns, resulting in incorrect non-standard hyphenation in a few cases.
Using narrow boundaries solves this problem. The Java HyFo module can
check for this problem.

Syntax of the non-standard hyphenation patterns
------------------------------------------------

pat1tern/change[,start,cut]

If this pattern matches the word and wins (see README.hyphen) in its
change region, then the substring pattern[start, start + cut - 1] is
replaced with the "change".
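A minimal C sketch of reading one rule in this syntax. The names `struct nonstd_rule` and `parse_rule()` are hypothetical, not libhnj's own loader. The sketch assumes a single-byte encoding and a change string without commas. When ",start,cut" is omitted it assumes the whole pattern is the change region (start 1, cut = number of letters), which is what the f1f/ff=f example suggests. It also checks the Specification's rule about the hyphenation point, reading "only one" as exactly one odd digit inside the change region, boundaries included:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "pat1tern/change[,start,cut]". */
struct nonstd_rule {
    char letters[64];  /* pattern letters, digits and dots removed */
    char change[64];   /* replacement text; '=' marks the break */
    int start;         /* 1-based first letter of the change region */
    int cut;           /* number of letters replaced */
};

/* Returns 0 on success, -1 on malformed input (single-byte encoding). */
static int parse_rule(const char *line, struct nonstd_rule *r)
{
    const char *slash, *comma, *p;
    unsigned long long odd_at = 0; /* bit k: odd digit after k letters */
    size_t n = 0, clen;
    int hits = 0;

    assert(line != NULL && r != NULL);
    slash = strchr(line, '/');
    if (slash == NULL)
        return -1;

    /* Pattern part: keep letters, drop dots, remember odd digits. */
    for (p = line; p < slash; p++) {
        if (isdigit((unsigned char)*p)) {
            if ((*p - '0') % 2 == 1)
                odd_at |= 1ULL << n;
        } else if (*p != '.') {
            if (n + 1 >= sizeof r->letters)
                return -1;
            r->letters[n++] = *p;
        }
    }
    r->letters[n] = '\0';

    /* Change part: up to the first comma, trailing blanks trimmed. */
    comma = strchr(slash + 1, ',');
    clen = comma ? (size_t)(comma - (slash + 1)) : strlen(slash + 1);
    while (clen > 0 && isspace((unsigned char)slash[clen]))
        clen--;
    if (clen >= sizeof r->change)
        return -1;
    memcpy(r->change, slash + 1, clen);
    r->change[clen] = '\0';

    if (comma != NULL) {
        char *end;
        r->start = (int)strtol(comma + 1, &end, 10);
        if (*end != ',')
            return -1;
        r->cut = (int)strtol(end + 1, &end, 10);
    } else {
        /* Assumed default: the whole pattern is the change region. */
        r->start = 1;
        r->cut = (int)n;
    }
    if (r->start < 1 || r->cut < 1 || (size_t)(r->start - 1 + r->cut) > n)
        return -1;

    /* Count odd digits inside the change region, boundaries included. */
    for (int k = r->start - 1; k <= r->start - 1 + r->cut; k++)
        if (odd_at & (1ULL << k))
            hits++;
    return hits == 1 ? 0 : -1;
}
```

With this reading, ".schif1fahrt/ff=f,5,2" and ".s2c2h2i2f3f2ahrt/ff=f,5,2" both parse to the letters "schiffahrt", change "ff=f", start 5, cut 2, while "f3f/ff=f,5,2" is rejected because its change region runs past the two-letter pattern.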

For example, a German ff -> ff-f hyphenation:

f1f/ff=f

or, with explicit start and cut values,

f1f/ff=f,1,2

replaces every "ff" with "ff=f" at a hyphenation point.
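In "ff=f" the "=" is the break itself: "Schiffahrt" is set as "Schiff-" at the end of one line and "fahrt" at the start of the next. A small C sketch of that split, using a hypothetical helper `split_change()` (not a libhnj function) and assuming, as in the OpenOffice.org convention, that only the first "=" marks the break:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: split a change string at its first '=' into the
 * text that ends the current line (`pre`) and the text that begins the
 * next line (`post`), e.g. "ff=f" -> "ff" + "f".
 * Returns -1 if there is no '=' or a buffer is too small.
 */
static int split_change(const char *change, char *pre, size_t presize,
                        char *post, size_t postsize)
{
    const char *eq;
    size_t plen;

    assert(change != NULL && pre != NULL && post != NULL);
    eq = strchr(change, '=');
    if (eq == NULL)
        return -1;
    plen = (size_t)(eq - change);
    if (plen >= presize || strlen(eq + 1) >= postsize)
        return -1;
    memcpy(pre, change, plen);
    pre[plen] = '\0';
    strcpy(post, eq + 1);  /* remainder after '=' (may be empty) */
    return 0;
}
```

A narrow-boundary change such as "ff=" gives an empty `post`, because the characters after the break are left in the original word.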

A more realistic example:

% simple ff -> f-f hyphenation
f1f
% Schiffahrt -> Schiff-fahrt hyphenation
%
schif3fahrt/ff=f,5,2
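Here schif3fahrt overrides f1f because, in Liang's algorithm, the highest digit among all matching patterns decides each position, and 3 > 1. A minimal C sketch of that scoring step for plain patterns (the "/change" part stripped). `liang_score()` and its `owner[]` bookkeeping are hypothetical illustrations, not how libhnj stores its data. In this sketch a tie keeps the earlier pattern, which shows why equal values give no guaranteed winner:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXT 66  /* ".word." plus NUL, for words of up to 63 letters */

/*
 * Liang-style scoring sketch. text = "." word "."; value[g] is the
 * winning digit for the gap just before text[g], owner[g] the index of
 * the pattern that supplied it (-1 if none). Only a strictly greater
 * digit replaces the current value, so a tie keeps the earlier pattern.
 */
static void liang_score(const char *word, const char *const *pats,
                        int npats, int value[MAXT], int owner[MAXT])
{
    char text[MAXT];
    size_t tlen;

    assert(strlen(word) + 3 <= MAXT);
    tlen = (size_t)snprintf(text, sizeof text, ".%s.", word);
    for (size_t g = 0; g <= tlen; g++) {
        value[g] = 0;
        owner[g] = -1;
    }
    for (int p = 0; p < npats; p++) {
        char letters[MAXT];
        int digit[MAXT] = {0};  /* digit[j]: value before letter j */
        size_t m = 0;

        assert(strlen(pats[p]) < MAXT);
        for (const char *c = pats[p]; *c != '\0'; c++) {
            if (isdigit((unsigned char)*c))
                digit[m] = *c - '0';
            else
                letters[m++] = *c;
        }
        letters[m] = '\0';

        /* Try every alignment of the pattern letters against the text. */
        for (size_t i = 0; i + m <= tlen; i++) {
            if (strncmp(text + i, letters, m) != 0)
                continue;
            for (size_t j = 0; j <= m; j++) {
                if (digit[j] > value[i + j]) {
                    value[i + j] = digit[j];
                    owner[i + j] = p;
                }
            }
        }
    }
}
```

In ".schiffahrt." the gap between "schif" and "fahrt" is gap 6. With {"f1f", "schif3fahrt"} it gets value 3 from pattern 1; with {"f3f", "schif3fahrt"} the tie leaves it with f3f in this sketch; with {"f3f", "schif5fahrt"} the non-standard pattern wins again with 5. An odd winning value means a hyphenation point.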

Specification

- Pattern: a matching pattern of Liang's original algorithm.
  - The pattern must contain only one hyphenation point in the change
    region, marked with a single odd digit (1, 3, 5, 7 or 9).
    This point may lie on a boundary of the change region:
    schif3fahrt/ff=,5,1
  - Only a greater value guarantees the win (don't mix standard and
    non-standard patterns with the same value; for example,
    instead of f3f and schif3fahrt/ff=f,5,2, use f3f and
    schif5fahrt/ff=f,5,2).

- Change: the new characters, an arbitrary character sequence.
  The equal sign (=) marks hyphenation points for OpenOffice.org (as in
  the example). (In a possible German LaTeX preprocessor, ff could be
  replaced with "ff, and in a Hungarian one, ssz with `ssz, according to
  the German and Hungarian Babel settings.)

- Start: starting position of the change region.
  - Positions begin at 1 (not 0): schif3fahrt/ff=f,5,2
  - The leading dot doesn't count: .schif3fahrt/ff=f,5,2
  - Digits don't count: .s2c2h2i2f3f2ahrt/ff=f,5,2
  - In UTF-8 encoding, use Unicode character positions: össze/sz=sz,2,3
    ("össze" looks like "Ã¶ssze" in an ISO 8859-1 8-bit editor).

- Cut: length of the removed character sequence in the original word.
  - In UTF-8 encoding, use Unicode character length: paral·1lel/l=l,5,3
    ("paral·1lel" looks like "paralÂ·1lel" in an ISO 8859-1 8-bit editor).
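The character/byte distinction matters because a byte-oriented reader sees "l·l" as four bytes, not three characters. A small C sketch, assuming valid UTF-8 input, that converts the 1-based character positions used by Start and Cut into byte offsets. The helper names `utf8_length()`, `utf8_byte_offset()` and `utf8_span_bytes()` are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Number of characters (code points) in a valid UTF-8 string: count
 * every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_length(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Byte offset of the 1-based character position `pos`, or -1. */
static long utf8_byte_offset(const char *s, size_t pos)
{
    size_t chars = 0;

    for (size_t i = 0; s[i] != '\0'; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80 && ++chars == pos)
            return (long)i;
    return -1;
}

/* Bytes occupied by `cut` characters starting at character `start`,
 * or -1 if the region does not fit in the string. */
static long utf8_span_bytes(const char *s, size_t start, size_t cut)
{
    size_t len;
    long from, to;

    assert(s != NULL);
    len = utf8_length(s);
    if (start < 1 || start - 1 + cut > len)
        return -1;
    from = utf8_byte_offset(s, start);
    to = (start + cut <= len) ? utf8_byte_offset(s, start + cut)
                              : (long)strlen(s);
    return to - from;
}
```

For "paral·lel" (10 bytes, 9 characters), Start 5 and Cut 3 select "l·l", which spans 4 bytes; for "össze", character 2 ("s") starts at byte offset 2 because "ö" takes two bytes.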

Dictionary development
----------------------

There is no extended PatGen pattern generator for non-standard
hyphenation patterns yet.

Fortunately, non-standard hyphenation points are forbidden in
PatGen-generated hyphenation patterns, so with a small patch you can
develop non-standard hyphenation patterns in this case as well.

Warning: If you use UTF-8 Unicode encoding in your patterns, call
substrings.pl with the UTF-8 parameter to calculate the correct
character positions for non-standard hyphenation:

./substrings.pl input output UTF-8

Programming
-----------

Use hyphenate2() or hyphenate3() to handle non-standard hyphenation.
See hyphen.h for the documentation of the hyphenate*() functions.
See example.c for processing the output of the hyphenate*() functions.

Warning: change characters are lowercase in the source, so you may need
to convert the case of the change characters based on the detected case
of the input word. For an example, see the OpenOffice.org source
(lingucomponent/source/hyphenator/altlinuxhyph/hyphen/hyphenimp.cxx).
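A minimal C sketch of such case detection, ASCII only. `match_case()` is a hypothetical helper, not a port of the OpenOffice.org code above; it assumes the caller knows whether the change region starts at the first letter of the word:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper, ASCII only: adapt a lowercase change string to
 * the case of the input word before it is inserted.
 *  - all-uppercase word: uppercase the whole change string;
 *  - capitalized word and the change region starts at its first
 *    letter: uppercase the first character of the change string;
 *  - otherwise: leave the change string lowercase.
 */
static void match_case(const char *word, int region_at_start, char *change)
{
    int upper = 0, lower = 0;

    assert(word != NULL && change != NULL);
    for (const char *p = word; *p != '\0'; p++) {
        if (isupper((unsigned char)*p))
            upper++;
        else if (islower((unsigned char)*p))
            lower++;
    }
    if (upper > 0 && lower == 0) {
        for (char *c = change; *c != '\0'; c++)
            *c = (char)toupper((unsigned char)*c);
    } else if (region_at_start && isupper((unsigned char)word[0])) {
        change[0] = (char)toupper((unsigned char)change[0]);
    }
}
```

For "SCHIFFAHRT" the change "ff=f" becomes "FF=F", while for "Schiffahrt" it stays "ff=f", because the change region does not touch the capital S.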

László Németh
<nemeth (at) openoffice.org>