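There is one definition per line, in the format "WORD PartOfSpeech".

The task is to process a document and append the part of speech to every word that has a definition. No other reformatting should take place.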

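For example, if the lexicon is

THE article
BIG adjective
BALL noun

and the document is

The big red ball fell.

then the output should be

The/article big/adjective red ball/noun fell.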

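If I put the lexicon into a database table as 2 fields and then run an SQL select whose output is 1 comma-separated row in the format "The/article,big/adjective,ball/noun", how would I take that row and process it against the document so that the output looks like the above?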

1 Answer

You should modify your SQL query to preserve any words that don't match a term in your lexicon (perhaps by using an outer join; if you show us that query, we could give you more specific advice). Then, assuming your output looks something like this (with just a / following each term that didn't match your lexicon):

The/article big/adjective red/ ball/noun fell/.

You could clean it up with sed like this (assuming the string has been saved in a variable called $variablename):

sed 's_\/\([ .]\)_\1_g' <(echo "$variablename")

Explanation:

  • I used _ instead of / to delimit my s command for legibility. The syntax s/search/replace/g is synonymous with s_search_replace_g.

  • \/\([ .]\) tells sed to match a literal / (escaped as \/) followed by either a space or a period ([ .]). The space or period is stored for later reference because of the \( and \) surrounding [ .].

  • \1 in the replacement pattern is the backreference I mentioned earlier. This works like a variable storing the matched portion we surrounded with parentheses in the search pattern. In effect, I've told sed to strip out any forward slashes that are followed by a space or a period, without stripping away the space or period itself.
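To try the substitution yourself, here is a minimal sketch. The sample string is the assumed query output from above, and the second sed call is the same substitution written with the conventional / delimiter, for comparison only:

```shell
# Sample string in the shape assumed above: a bare "/" follows unmatched terms.
variablename='The/article big/adjective red/ ball/noun fell/.'

# Same expression as in the answer, with "_" as the s-command delimiter.
cleaned=$(printf '%s\n' "$variablename" | sed 's_\/\([ .]\)_\1_g')

# Equivalent form with "/" as the delimiter; here the literal slash must be escaped.
cleaned_slash=$(printf '%s\n' "$variablename" | sed 's/\/\([ .]\)/\1/g')

# Print the result once, and only if both forms agree.
[ "$cleaned" = "$cleaned_slash" ] && echo "$cleaned"
```

The string is piped in rather than passed through <( ), which is a bash feature, so the sketch should also run under a plain POSIX sh.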

Output:

The/article big/adjective red ball/noun fell.
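
If you want to experiment without a database, a string of the shape assumed above can be approximated with awk. This is only a sketch of a different technique (an awk lookup standing in for the SQL outer join), not the query itself. The file names lexicon.txt and document.txt are hypothetical, the "WORD PartOfSpeech" one-per-line layout is taken from the question's example, and only trailing punctuation is handled:

```shell
# Hypothetical input files, created here so the sketch is self-contained.
workdir=$(mktemp -d)
printf '%s\n' 'THE article' 'BIG adjective' 'BALL noun' > "$workdir/lexicon.txt"
printf '%s\n' 'The big red ball fell.' > "$workdir/document.txt"

# Pass 1 (NR == FNR): load the lexicon into an array keyed by upper-cased word.
# Pass 2: emit word "/" tag for every word; unmatched words get an empty tag,
# which mimics the unmatched rows an outer join would preserve.
tagged=$(awk '
    NR == FNR { pos[toupper($1)] = $2; next }
    {
        line = ""
        for (i = 1; i <= NF; i++) {
            word = $i
            punct = ""
            # Split off trailing punctuation so "fell." is looked up as "fell".
            if (match(word, /[.,;:!?]+$/)) {
                punct = substr(word, RSTART)
                word = substr(word, 1, RSTART - 1)
            }
            line = line (i > 1 ? " " : "") word "/" pos[toupper(word)] punct
        }
        print line
    }
' "$workdir/lexicon.txt" "$workdir/document.txt")

# Strip the bare slashes with the same sed expression as in the answer.
result=$(printf '%s\n' "$tagged" | sed 's_\/\([ .]\)_\1_g')

echo "$tagged"
echo "$result"
rm -r "$workdir"
```

The first line printed is the intermediate string with bare slashes, the second is the cleaned result. Note that the sed pattern only removes a slash that is followed by a space or a period, so a bare slash at the very end of a line, or before other punctuation, would survive and need a broader pattern.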
Answered 2013-02-04T14:59:24.320