I'm extracting data from the EDICT dictionary file used by wwwjdic. Here are some sample lines:
相同器官 [そうどうきかん] /(n) homologous organ/
相同染色体 [そうどうせんしょくたい] /(n) homologous chromosome/
相同組換え [そうどうくみかえ] /(n) homologous recombination/
相同的組み換え [そうどうてきくみかえ] /(n) homologous recombination/
相同的組換 [そうどうてきくみかえ] /(n) homologous recombination/
相同的組換え [そうどうてきくみかえ] /(n) homologous recombination/
相入れない [あいいれない] /(iK) (exp,adj-i) in conflict/incompatible/out of harmony/running counter/mutually exclusive/clashing with/
相年 [あいどし] /(n,adj-no) the same age/
相伴 [しょうばん] /(n,vs) partaking/participating/taking part in/sharing (something with someone)/
相伴う [あいともなう] /(v5u) to accompany/
相判 [あいはん] /(n,vs) (1) official seal/verification seal/affixing a seal to an official document/(2) making a joint signature or seal/
相判 [あいばん] /(n) (1) medium-sized paper (approx. 15x21 cm, used for notebooks)/(2) medium-sized photo print (approx. 10x13 cm)/
相判 [あいばん] /(n,vs) (1) official
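Each line specifies the part of speech of its entry, e.g. /(n) for nouns and /(adj) for adjectives. I want to get every entry tagged with one of the parts of speech in this array: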
["n", "n-adv", "n-pref", "n-suf", "n-t", "num", "pn", "adj-no", "adj-f", "adv-n", "vs"]
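One way to make the check robust is to avoid matching tag names with one big regex. Instead, pull out every comma-separated tag group, split it into individual tags, and compare whole tags against the wanted list. This is a minimal sketch under a few assumptions: POS tags always appear as `(tag,tag,...)` groups after the first `/`, and an entry should be kept when *any* of its tags is wanted. The names `WANTED_POS`, `TAG_GROUP`, `pos_tags` and `wanted_entry?` are hypothetical.

```ruby
require "set"

# Parts of speech to keep (the array from the question).
WANTED_POS = Set["n", "n-adv", "n-pref", "n-suf", "n-t", "num", "pn",
                 "adj-no", "adj-f", "adv-n", "vs"].freeze

# A "(...)" group whose contents look like comma-separated tags, e.g.
# "(n)", "(n,vs)", "(adj-no,n-adv,n-t)". Groups such as "(1)" or
# "(approx. 15x21 cm, used for notebooks)" do not match, because each tag
# must start with a letter and contain only letters, digits, "_" or "-".
# Assumption: every POS tag in EDICT has this shape.
TAG_GROUP = /\(([A-Za-z][\w-]*(?:,[A-Za-z][\w-]*)*)\)/

# Returns every tag found in the gloss part of an EDICT line
# (everything after the first "/"), in order of appearance.
def pos_tags(line)
  gloss = line.partition("/").last
  gloss.scan(TAG_GROUP).flatten.flat_map { |group| group.split(",") }
end

# True when at least one of the line's tags is in WANTED_POS.
# Whole tags are compared, so "adj" != "adj-no" and "vs" != "vs-c".
def wanted_entry?(line)
  pos_tags(line).any? { |tag| WANTED_POS.include?(tag) }
end

p pos_tags("相入れない [あいいれない] /(iK) (exp,adj-i) in conflict/incompatible/")
# => ["iK", "exp", "adj-i"]
```

Because the comparison is against whole tags in a `Set`, prefixes like `n` vs `n-adv` or `vs` vs `vs-c` can no longer collide. Changing which parts of speech are wanted only means editing the set. If you need *all* tags to be wanted instead of any, swap `any?` for `all?`.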
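I'm going through the file line by line like this:
File.open("EDICT.txt") do |file|
  file.each_line do |line|
    if line[regex]  # regex: the pattern I'm trying to build (see below)
      # ...
    end
  end
end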
I'm using a regular expression, but the furthest I've gotten is:
/\/[(](n|n-adv|n-pref|n-suf|n-t|num|pn|adj-no|adj-f|adv-n|vs|n,vs)[)]/
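This isn't robust. Also, some lines carry several tags in one group, like this:
/(adj-no,n-adv,n-t)
which the regex above does not match. At the same time, it must not match any of these tags: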
["adj-i", "adj-na", "adj-pn", "adj-t", "adj", "adv", "adv-to", "aux", "aux-v", "aux-adj", "conj",
"ctr", "exp", "int", "iv", "pref", "prt", "suf", "v1", "v2a-s", "v4h", "v4r", "v5", "v5argu",
"v5b", "v5g", "v5k", "v5k-s", "v5m", "v5n", "v5r", "v5r-i", "v5s", "v5t", "v5u", "v5u-s", "v5uru",
"v5z", "vz", "vi", "vk", "vn", "vs-c", "vs-i", "vs-s", "vt"]
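If you'd rather keep a single regex, the key is to accept a wanted tag only as a whole element of a `(...)` group. That means it must be preceded by `(` or `,` and followed by `,` or `)`. Below is a minimal sketch, assuming every tag is a letter followed by letters, digits, `_` or `-`. The names `WANTED`, `TAG` and `POS_GROUP` are hypothetical. The alternation is built from the array with `Regexp.union`, which escapes each string, instead of being typed by hand.

```ruby
# Parts of speech to keep (the array from the question).
WANTED = %w[n n-adv n-pref n-suf n-t num pn adj-no adj-f adv-n vs].freeze

# A single tag: a letter, then letters, digits, "_" or "-".
# Assumption: all EDICT POS tags have this shape.
TAG = /[A-Za-z][\w-]*/

# A "(...)" group that contains at least one wanted tag as a whole element:
# any number of "tag," before it, the wanted tag, then any ",tag" after it,
# then ")". Because the wanted tag must be followed by "," or ")", "n" cannot
# match just the front of "n-adv", and "vs" never matches "vs-c".
POS_GROUP = /\((?:#{TAG},)*(?:#{Regexp.union(WANTED)})(?:,#{TAG})*\)/

samples = [
  "相年 [あいどし] /(n,adj-no) the same age/",
  "相入れない [あいいれない] /(iK) (exp,adj-i) in conflict/incompatible/",
  "相伴う [あいともなう] /(v5u) to accompany/",
]
p samples.map { |line| line.match?(POS_GROUP) }
# => [true, false, false]
```

With this constant, the loop reduces to `File.foreach("EDICT.txt").grep(POS_GROUP)`, which returns the matching lines. If your copy of the file is EUC-JP rather than UTF-8, open it with `File.foreach("EDICT.txt", encoding: "EUC-JP:UTF-8")`. Otherwise matching may fail on invalid byte sequences.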
What's a better, more reliable way to check whether a line contains one of the wanted /() tags?