sed - process a document, adding parts of speech

Question

I have a lexicon with one definition per line, in the format "WORD PartOfSpeech".

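The task is to process a document, adding the part of speech to every word that is defined in the lexicon. No reformatting should be done.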

For example, if the lexicon is

THE article
BIG adjective
BALL noun

and the document is

The big red ball fell.

then the output should be

The/article big/adjective red ball/noun fell.

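If I put the lexicon into a database table as 2 fields and then run an SQL select that outputs 1 comma-separated line in this format: "The/article,big/adjective,ball/noun", how would I take that line and process it against the document so that the output looks like the one above?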

score 0 · Accepted Answer

You should modify your SQL query to preserve any words that don't match a term in your lexicon (perhaps by using an outer join; if you show us that query, we could give you more specific advice). Then, assuming your output looks something like this (with just a / following each term that didn't match your lexicon):

The/article big/adjective red/ ball/noun fell/.

You could clean it up with sed like this (assuming the string has been saved in a variable called $variablename):

sed 's_\/\([ .]\)_\1_g' <(echo "$variablename")

Explanation:

I used _ instead of / to delimit my s command for legibility. The syntax s/search/replace/g is synonymous with s_search_replace_g.
\/\([ .]\) tells sed to match a literal / (escaped as \/) followed by either a space or a period, [ .]. The space or period that matched is stored in a backreference because of the \( and \) surrounding [ .].
\1 in the replacement pattern is the backreference I mentioned earlier. This works like a variable storing the matched portion we surrounded with parentheses in the search pattern. In effect, I've told sed to strip out any forward slashes that are followed by a space or a period, without stripping away the space or period itself.
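
To make the delimiter point concrete, here is a small sketch that runs the same substitution twice, once with _ and once with / as the delimiter, on the sample string assumed above. The variable names with_underscore and with_slash are only illustrative, and printf with a pipe is used instead of <(echo ...) so the snippet does not depend on bash.

```shell
#!/bin/sh
# Sample input in the form assumed above: unmatched words carry a bare "/".
variablename='The/article big/adjective red/ ball/noun fell/.'

# Underscore as the delimiter: the literal slash needs no escaping.
with_underscore=$(printf '%s\n' "$variablename" | sed 's_/\([ .]\)_\1_g')

# Slash as the delimiter: the literal slash must be escaped as \/.
with_slash=$(printf '%s\n' "$variablename" | sed 's/\/\([ .]\)/\1/g')

# Both commands print the same cleaned-up line.
printf '%s\n' "$with_underscore"
printf '%s\n' "$with_slash"
```

In both cases \1 puts back the space or period that followed the slash, so only the slash itself is removed.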

Output:

The/article big/adjective red ball/noun fell.
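
If changing the query is not an option, a rough sketch of working from the comma-separated line directly might look like the following. It is only an illustration under several assumptions: GNU sed (for the \< and \> word boundaries), lexicon words that contain no regex metacharacters or | characters, the same capitalisation in the lexicon line and the document, and no tag name that is itself a lexicon word. The variable names lexicon_line, document, sed_script and tagged are made up for this example.

```shell
#!/bin/sh
# Hypothetical inputs: the comma-separated line from the SQL select
# and the document text to be tagged.
lexicon_line='The/article,big/adjective,ball/noun'
document='The big red ball fell.'

# Split the line into one "word/tag" entry per line, then rewrite each
# entry into a sed command of the form:  s|\<word\>|word/tag|g
# (| is the delimiter of the generated commands, so the / in the
# replacement needs no escaping).
sed_script=$(printf '%s\n' "$lexicon_line" \
  | tr ',' '\n' \
  | sed 's_^\([^/]*\)/\(.*\)$_s|\\<\1\\>|\1/\2|g_')

# Apply the generated commands to the document. Words without a
# lexicon entry (here "red" and "fell") are left untouched.
tagged=$(printf '%s\n' "$document" | sed "$sed_script")

printf '%s\n' "$tagged"
```

With the sample inputs this prints The/article big/adjective red ball/noun fell. and no clean-up pass is needed, because unmatched words never receive a slash in the first place.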
