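I am looking at the hyphenation algorithm downloaded from the OpenOffice site, but after reading the comments I cannot work out what the parameters rep, pos, and cut are for. Could someone who knows tell me what these parameters do? The comment is below.

From the example it looks as though ff can be replaced by a single f, but what does that have to do with hyphenation?

Thanks,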


/*
 int hnj_hyphen_hyphenate2(): non-standard hyphenation.

 (It supports Catalan, Dutch, German, Hungarian, Norwegian, Swedish
  etc. orthography, see documentation.)

 input data:
 word:      input word
 word_size: byte length of the input word

 hyphens:   allocated character buffer (size = word_size + 5)
 hyphenated_word: allocated character buffer (size ~ word_size * 2) or NULL
 rep, pos, cut: pointers (point to the allocated and zeroed buffers
                (size=word_size) or with NULL value) or NULL

 output data:
 hyphens:   hyphenation vector (hyphenation points signed with odd numbers)
 hyphenated_word: hyphenated input word (hyphens signed with `='),
                  optional (NULL input)
 rep:       NULL (only standard hyph.), or replacements (hyphenation points
            signed with `=' in replacements);
 pos:       NULL, or difference of the actual position and the beginning
            positions of the change in input words;
 cut:       NULL, or counts of the removed characters of the original words
            at hyphenation,

 Note: rep, pos, cut are complementary arrays to the hyphens, indexed with the
       character positions of the input word.

 For example:
 Schiffahrt -> Schiff=fahrt,
 pattern: f1f/ff=f,1,2
 output: rep[5] = "ff=f", pos[5] = 1, cut[5] = 2

 Note: hnj_hyphen_hyphenate2() can allocate rep, pos, cut (word_size
       length arrays):

 char ** rep = NULL;
 int * pos = NULL;
 int * cut = NULL;
 char hyphens[MAXWORDLEN];
 hnj_hyphen_hyphenate2(dict, "example", 7, hyphens, NULL, &rep, &pos, &cut);

 See example in the source distribution.

*/

int hnj_hyphen_hyphenate2 (HyphenDict *dict,
        const char *word, int word_size, char * hyphens,
        char *hyphenated_word, char *** rep, int ** pos, int ** cut);

1 Answer

I believe you are referring to the following comment:

// For example:
// Schiffahrt -> Schiff=fahrt,
// pattern: f1f/ff=f,1,2
// output: rep[5]="ff=f", pos[5] = 1, cut[5] = 2

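The example refers to the German hyphenation rules in force before the spelling reform of the 1990s. German compound nouns are written as a single word, and under the old rules, when three identical consonants met and a vowel followed, the third consonant was dropped. The compound of 'Schiff' and 'Fahrt' would be 'Schifffahrt', but it was written 'Schiffahrt' with one 'f' omitted. When the word was hyphenated, however, the omitted letter was written again.

So the example does not mean that 'ff' can be replaced by a single 'f'; it means that 'ff' is replaced by 'ff-f' when the word is hyphenated there.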

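The parameters therefore mean:

  • rep: contains the replacement 'ff-f' (stored as "ff=f", with '=' marking the hyphenation point) that is used in place of 'ff'
  • pos: the value 1 means the replacement starts one letter before the hyphenation position 5
  • cut: the value 2 means that 2 characters have to be removed from the input word.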
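
To make the arithmetic concrete, here is a minimal sketch of how one such triple could be applied to a word. `apply_replacement` is a hypothetical helper written for this answer, not a function of the hyphenation library, and it assumes that "position 5" means the gap before the 0-based character index 5, so the replaced span starts at index 5 - pos = 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the hyphenation library):
 * replace `cut` bytes of `word`, starting at byte offset `start`,
 * with the string `rep`, and write the result to `out`.
 * Returns 0 on success, -1 if the span is out of range or `out` is too small. */
int apply_replacement(const char *word, size_t start, size_t cut,
                      const char *rep, char *out, size_t out_size)
{
    assert(word != NULL && rep != NULL && out != NULL);

    size_t len = strlen(word);
    size_t rep_len = strlen(rep);

    /* The span [start, start + cut) must lie inside the word. */
    if (start > len || cut > len - start)
        return -1;
    /* Result length is len - cut + rep_len, plus the terminating NUL. */
    if (len - cut + rep_len + 1 > out_size)
        return -1;

    memcpy(out, word, start);                            /* text before the span */
    memcpy(out + start, rep, rep_len);                   /* the replacement      */
    strcpy(out + start + rep_len, word + start + cut);   /* text after the span  */
    return 0;
}
```

With the values from the comment, the call `apply_replacement("Schiffahrt", 5 - 1, 2, "ff=f", out, sizeof out)` removes the two characters "ff" at offsets 4 and 5, inserts "ff=f" in their place, and leaves "Schiff=fahrt" in `out`. The library's own indexing convention may differ from the assumption made here, so check the example in the source distribution before relying on it.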

These parameters appear to be needed only in the rare cases where a word is spelled differently when it is hyphenated.

Answered 2010-11-11T22:34:08.360