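The lexicon has one definition per line, in the format "WORD PartOfSpeech".
The task is to process a document, adding the part of speech wherever a word is defined. No reformatting should occur.
For example, if the lexicon is
THE article
BIG adjective
BALL noun
and the document is
The big red ball fell.
then the output should be
The/article big/adjective red ball/noun fell.
If I put the lexicon into a database table as 2 fields and then run an SQL select that outputs 1 comma-separated line in a format like "The/article,big/adjective,ball/noun", how would I use that line to process the document so that it produces output like the above?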
You should modify your SQL query to preserve any words that don't match a term in your lexicon (perhaps by using an outer join; if you show us that query, we could give you more specific advice). Then, assuming your output looks something like this (with just a / following each term that didn't match your lexicon):
The/article big/adjective red/ ball/noun fell/.
You could clean it up with sed like this (assuming the string has been saved in a variable called $variablename):
sed 's_\/\([ .]\)_\1_g' <(echo "$variablename")
Explanation:
I used _ instead of / to delimit my s command for legibility. The syntax s/search/replace/g is synonymous with s_search_replace_g.
\/\([ .]\) tells sed to match a literal / (escaped as \/) followed by either a space or a period, [ .]. Only the space or period is stored for later use as a backreference, because of the \( and \) surrounding [ .].
\1 in the replacement pattern is the backreference I mentioned earlier. This works like a variable storing the matched portion we surrounded with parentheses in the search pattern. In effect, I've told sed to strip out any forward slashes that are followed by a space or a period, without stripping away the space or period itself.
Output:
The/article big/adjective red ball/noun fell.
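If changing the SQL query is not an option, a different route is to apply the comma-separated lexicon line to the document directly. The following is only a rough sketch using awk rather than sed. The variable names and the tag_words function are made up for illustration, and it assumes that words consist of ASCII letters only and that lookups should be case-insensitive.

```shell
#!/bin/sh
# Hypothetical inputs: the lexicon line from the question and a sample document.
lexicon='The/article,big/adjective,ball/noun'
document='The big red ball fell.'

# tag_words LEXICON_LINE: reads the document on standard input and prints it
# with /PartOfSpeech appended to every word found in the lexicon.
tag_words() {
    awk -v lexicon="$1" '
    BEGIN {
        # Build a lowercase word -> part-of-speech lookup table
        n = split(lexicon, entries, ",")
        for (i = 1; i <= n; i++) {
            slash = index(entries[i], "/")
            if (slash > 0)
                pos[tolower(substr(entries[i], 1, slash - 1))] = substr(entries[i], slash + 1)
        }
    }
    {
        rest = $0
        out = ""
        # Walk the line one alphabetic word at a time, copying the text
        # between words unchanged so spacing and punctuation are preserved
        while (match(rest, /[A-Za-z]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            out = out substr(rest, 1, RSTART - 1) word
            key = tolower(word)
            if (key in pos)
                out = out "/" pos[key]
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

result=$(printf '%s\n' "$document" | tag_words "$lexicon")
printf '%s\n' "$result"
```

For the sample document this prints The/article big/adjective red ball/noun fell. Because unmatched words are copied through untouched, no cleanup pass with sed is needed afterwards. Words containing apostrophes, digits, or non-ASCII letters would need a wider pattern than [A-Za-z]+.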