sed - Remove alphanumeric words from a file

Question

I have a file containing a lot of text, and I want to remove all alphanumeric words from it.

Example of words to be removed:

gr8  
2006  
sdlfj435ljsa  
232asa  
asld213  
ladj2343asda
asd!32

What is the best way to do this?

score 6 · Accepted Answer

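If you want to remove every word that contains both letters and digits, leaving only the words made up entirely of digits or entirely of letters: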

sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' inputfile

Example:

$ echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g'
abc def ghi 111 222
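
To see how that command treats words like the ones in the question, here is a minimal sketch. It assumes GNU sed (in basic regular expressions, \<, \+, \| and \? are GNU extensions), and strip_mixed_words is only a hypothetical wrapper name, not something from the answer:

```shell
# Hypothetical helper: delete words that mix letters and digits.
# Assumes GNU sed; reads from stdin or from the files given as arguments.
strip_mixed_words() {
    sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' "$@"
}

printf '%s\n' 'gr8 2006 asld213 hello' | strip_mixed_words
# prints: 2006 hello

printf '%s\n' 'asd!32' | strip_mixed_words
# prints: asd!32
```

Note that asd!32 is left alone: the ! splits it into a letters-only run and a digits-only run, and neither of those is a mixed word.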

score 2 · Accepted Answer

Assuming the only output you want from the sample text is 2006, and that there is one word per line:

 sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' /path/to/alnum/file

Input

$ cat alnum
gr8
2006
sdlFj435ljsa
232asa
asld213
ladj2343asda
asd!32
alpha

Output

$ sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' ./alnum
2006
alpha
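
The \+ quantifiers are not strictly needed here, since a line that contains one or more letters simply contains a letter. A sketch of the same idea without them, which should also run on non-GNU sed (drop_mixed_lines is a hypothetical name):

```shell
# Hypothetical helper: delete every line that contains both a letter and a digit.
# Reads from stdin or from the files given as arguments.
drop_mixed_lines() {
    sed '/[[:alpha:]]/{/[[:digit:]]/d;}' "$@"
}

printf '%s\n' gr8 2006 232asa 'asd!32' alpha | drop_mixed_lines
# prints:
# 2006
# alpha
```

Keep in mind that this works on whole lines, so it only fits the one-word-per-line case described above.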

score 0 · Accepted Answer

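If the goal really is to remove all alphanumeric words (strings consisting entirely of letters and digits), this sed command will do it. It replaces every alphanumeric string with nothing.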

sed 's/[[:alnum:]]*//g' < inputfile

Note that character classes other than alnum are also available (see man 7 regex).

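For the sample data you gave, this leaves only 6 blank lines and a single ! (since that is the only non-alphanumeric character in the sample data). Is this really what you want to do?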
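
A quick way to check that claim, as a sketch (strip_alnum is a hypothetical wrapper name, and the sample words are fed in without trailing spaces):

```shell
# Hypothetical helper: delete every run of letters and digits.
# Reads from stdin or from the files given as arguments.
strip_alnum() {
    sed 's/[[:alnum:]]*//g' "$@"
}

# Count the lines that end up empty.
printf '%s\n' gr8 2006 sdlfj435ljsa 232asa asld213 ladj2343asda 'asd!32' \
    | strip_alnum | grep -c '^$'
# prints: 6

printf '%s\n' 'asd!32' | strip_alnum
# prints: !
```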

score 0 · Accepted Answer

An AWK solution:

BEGIN { # Statement that will be executed once at the beginning.
    FS="[ \t]" # Set space and tab characters to be treated as word separator.
}
# Code below will execute for each line in file.
{
    x=1  # Set initial word index to 1 (0 is the original string in array)
    fw=1 # Indicate that future matched word is a first word. This is needed to put newline and spaces correctly.
    while ( x<=NF )
    {
        gsub(/[ \t]*/,"",$x) # Strip word. Remove any leading and trailing white-spaces.
        if (!match($x,"^[A-Za-z0-9]*$")) # Print word only if it does not match pure alphanumeric set of characters.
        {
            if (fw == 0)
            {
                printf (" %s", $x) # Print the word offsetting it with space in case if this is not a first match.
            }
            else
            {
                printf ("%s", $x) # Print word as is...
                fw=0 # ...and indicate that future matches are not first occurrences
            }
        }
        x++ # Increase word index number.
    }
    if (fw == 0) # Print newline only if we had matched some words and printed something.
    {
        printf ("\n")
    }
}

Assuming you have this script in script.awk and the data in data.txt, invoke awk like this:

awk -f ./script.awk ./data.txt

For your file, it will produce:

asd!32

For a more complex case like this:

gr8
2006
sdlfj435ljsa
232asa  he!he lol
asld213  f
ladj2343asda
asd!32  ab acd!s

...it will produce this:

he!he
asd!32 acd!s
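
For comparison, the same filtering can be sketched more compactly by letting awk split on runs of whitespace with its default field separator, so no empty fields appear and no stripping is needed. This is only a sketch under that assumption, and keep_non_alnum_words is a hypothetical name:

```shell
# Hypothetical helper: keep only the words that are NOT purely alphanumeric.
# Reads from stdin or from the files given as arguments.
keep_non_alnum_words() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)              # visit every word on the line
            if ($i !~ /^[A-Za-z0-9]+$/)        # skip purely alphanumeric words
                out = out (out == "" ? "" : " ") $i
        if (out != "") print out               # print only if something was kept
    }' "$@"
}

printf '%s\n' '232asa  he!he lol' 'asld213  f' 'asd!32  ab acd!s' | keep_non_alnum_words
# prints:
# he!he
# asd!32 acd!s
```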

Hope this helps. Good luck!
