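I have two tab-separated files. One contains lemmas and their stems; the other contains the suffix patterns needed to build the grammatical forms.
File (lemmas and stems):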
Lemma Stem Pos
ablakzár ablakz noun
adminisztrátorlány adminisztrátorl noun
...
....
File (suffixes):
suffix
[r]as
[r][r]er
...
.....
Rules to follow, and the expected output:
Lemma Stem Suffix Output
ablakzár ablakz [r]as ablakzras
adminisztrátorlány adminisztrátorl [r][r]er adminisztrátorlnnyer
These are the grammatical forms I need to generate from the two lemmas:
ablakzras
adminisztrátorlnnyer
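That is, if the suffix starts with a single bracketed letter, I take the last consonant of the lemma and append it to the stem; if it starts with two bracketed letters, I double the last consonant of the lemma and append that to the stem. Whatever follows the bracketed letters is appended as well.
Table of doubled consonants: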
Single: b c cs d dz dzs f g gy h j k l ly m n ny p q r s sz t ty v w x y z zs
Doubles: bb cc ccs dd ddz ddzs ff gg ggy hh jj kk ll lly mm nn nny pp qq rr ss ssz tt tty vv ww xx yy zz zzs
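The rule and the table above can be sketched as a small awk program wrapped in a shell function. This is only an illustration under stated assumptions: `make_form` is a hypothetical name, the lemma is assumed to end in a consonant, and each bracket group is treated as a plain marker (one group means single, two groups mean doubled). It also relies on an observation about the table: every doubled form is the single form with its first letter repeated (ny becomes nny, dzs becomes ddzs).

```sh
# Hypothetical helper: make_form LEMMA STEM SUFFIX_PATTERN
make_form() {
    awk -v lemma="$1" -v stem="$2" -v suffix="$3" '
    BEGIN {
        # Single consonants (including digraphs and trigraphs) from the table.
        n = split("b c cs d dz dzs f g gy h j k l ly m n ny p q r s sz t ty v w x y z zs", list, " ")
        for (i = 1; i <= n; i++) single[list[i]] = 1

        # Longest match first: try the last 3, then 2, then 1 characters.
        len = length(lemma)
        cons = ""
        for (k = 3; k >= 1 && cons == ""; k--)
            if (len >= k && (substr(lemma, len - k + 1) in single))
                cons = substr(lemma, len - k + 1)

        # Strip and count the leading bracket groups: "[r]" or "[r][r]".
        groups = 0
        while (substr(suffix, 1, 1) == "[") {
            close_pos = index(suffix, "]")
            if (close_pos == 0) break
            suffix = substr(suffix, close_pos + 1)
            groups++
        }

        # Doubling repeats only the first letter of the consonant.
        if (groups >= 2) cons = substr(cons, 1, 1) cons
        if (groups == 0) cons = ""
        print stem cons suffix
    }'
}

make_form "ablakzár" "ablakz" "[r]as"                          # prints: ablakzras
make_form "adminisztrátorlány" "adminisztrátorl" "[r][r]er"    # prints: adminisztrátorlnnyer
```

Because the consonants in the table are all ASCII, matching from the end of the lemma should work even in an awk that counts bytes rather than characters.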
In the end, I solved the problem myself. I'm posting the solution here in case it helps anyone else:
# Requires gawk (switch is a gawk extension).
# Run as: gawk -v file=SUFFIX_FILE -f SCRIPT LEMMA_FILE
BEGIN {
    OFS=FS="\t";
    # Read every line of the suffix file into models[1..c].
    while ((getline line < file) > 0)
    {
        models[++c]=line;
    }
    close(file);
    # Vowels (not used further in this script).
    v="a o u ö ü e i á ó ú ő ű é í";
    a1=split(v,vocals," ");
    # Map each consonant to its doubled form.
    doubled_consonants["b"]="bb"; doubled_consonants["c"]="cc";
    doubled_consonants["cs"]="ccs"; doubled_consonants["d"]="dd";
    doubled_consonants["dz"]="ddz"; doubled_consonants["dzs"]="ddzs";
    doubled_consonants["f"]="ff"; doubled_consonants["g"]="gg";
    doubled_consonants["gy"]="ggy"; doubled_consonants["h"]="hh";
    doubled_consonants["j"]="jj"; doubled_consonants["k"]="kk";
    doubled_consonants["l"]="ll"; doubled_consonants["ly"]="lly";
    doubled_consonants["m"]="mm"; doubled_consonants["n"]="nn";
    doubled_consonants["ny"]="nny"; doubled_consonants["p"]="pp";
    doubled_consonants["q"]="qq"; doubled_consonants["r"]="rr";
    doubled_consonants["s"]="ss"; doubled_consonants["sz"]="ssz";
    doubled_consonants["t"]="tt"; doubled_consonants["ty"]="tty";
    doubled_consonants["v"]="vv"; doubled_consonants["w"]="ww";
    doubled_consonants["x"]="xx"; doubled_consonants["y"]="yy";
    doubled_consonants["z"]="zz"; doubled_consonants["zs"]="zzs";
}
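{
    s1=split($1,lemma_letters,"");    # s1 - lemma length
    for (i=1; i<=c; i++)
    {
        s2=split(models[i],model,"\t");
        # The sample suffix file has a single column, so the suffix is model[1].
        s3=split(model[1],suffix_letters,"");
        tp=$2;    # start each form from the stem
        for (j=1; j<=s3; j++)
        {
            switch (suffix_letters[j]) {
            case "[":
                wz=extrac_consonant($1,s1,doubled_consonants);
                wa=double_single(j,s3,suffix_letters);
                if (wa == 0)
                {
                    tp=tp wz;
                    j+=2;    # skip "r]" of "[r]"
                }
                else
                {
                    tp=tp doubled_consonants[wz];
                    j+=5;    # skip "r][r]" of "[r][r]"
                }
                break;
            default:
                tp=tp suffix_letters[j];
                break;
            }
        }
        print $1, $2, model[1], tp;    # Lemma, Stem, Suffix, Output
    }
}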
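function extrac_consonant(string,leng,double,    q1,q2,q3,cons)
{
    # string - lemma
    # leng - lemma length
    # double - array (doubled_consonants)
    # Returns the final consonant of the lemma, preferring the longest match
    # (trigraph, then digraph, then single letter).
    # Assumes the lemma ends in a consonant.
    q1=substr(string,(leng-2));
    q2=substr(string,(leng-1));
    q3=substr(string,leng);
    if (q1 in double)
    {
        cons=q1;
    }
    else if (q2 in double)
    {
        cons=q2;
    }
    else
    {
        cons=q3;
    }
    return cons;
}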
function double_single(x5,x6,arr5,    g,flag)
{
    # x5 - j value in switch statement
    # x6 - suffix length
    # arr5 - array (suffix_letters)
    flag=0;
    for (g=(x5+1); g<=x6; g++)
    {
        if (arr5[g] == "[")
        {
            flag=1;
        }
    }
    return flag; # 1 if the consonant must be doubled ([r][r]), 0 if not ([r])
}