So, we all know that sed is great at finding and replacing every occurrence of a word in a file:
sed -i 's/original_word/new_word/g' file.txt
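But can someone tell me how to feed sed a list of "original_words" from a file (similar to grep -f)? I just want to replace all of them with '' (that is, delete them).
The word list file is just a set of stop words, one per line (wordlist.txt):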
a
about
above
according
across
after
afterwards
This would be a simple way to take a list of stop words and remove them from a corpus (useful for cleaning data).
file.txt looks like this:
05ricardo RT @shakira: Immigration reform isn't about politics. It's about people mothers, kids. Obama is working for all of them. http://t.co/rAW ... 0
05ricardo ?@ItsReginaG: Don't vote Obama. Because you will lose jobs, and die.? Lol 0
05ricardo ?@shakira: Obama doubles Pell Grants - 700,000 more Latinos get help to go to college. Meet Johanny Adames http://t.co/EMg8NLGl Shak?. ? -1
05rodriguez_a My Comm teacher gave me a copy of Obama's speech that he gave the other night and I cried while reading it. It was that moving. -3
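For reference, here is a rough sketch of the kind of thing I have in mind: generate one s/word//g command per line of wordlist.txt, then hand the generated script to sed with -f. It assumes GNU sed (for the \b word-boundary escape) and a word list made of plain words with no slashes or regex metacharacters. The name remove_stopwords is just a hypothetical helper for illustration, not an existing tool.

```shell
# Sketch only: assumes GNU sed (\b is a GNU extension) and that every
# line of the word list is a plain word (no "/" or regex metacharacters).
# remove_stopwords is a hypothetical helper name.
remove_stopwords() {
    wordlist=$1
    input=$2
    script=$(mktemp) || return 1

    # Turn each non-empty line "word" into the sed command s/\bword\b//g
    sed -n '/./s/.*/s\/\\b&\\b\/\/g/p' "$wordlist" > "$script"

    # Apply all generated substitutions to the input; result goes to stdout
    sed -f "$script" "$input"

    rm -f "$script"
}
```

Usage would be something like remove_stopwords wordlist.txt file.txt > cleaned.txt. Matching is case-sensitive as written, and each removed word leaves its surrounding spaces behind, so the output may contain runs of spaces that need squeezing afterwards (for example with tr -s ' ').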