regex - Replacing specific characters within a pattern

Question

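I am looking for a way to remove specific characters from strings that match a regex pattern. I store text containing newlines in a tab-delimited file that is supposed to have one record per line, and I am trying to replace all the newlines inside records with spaces. Newlines never occur in the last column (a short column holding an alphanumeric key).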

IMHO, the way to solve it would be to replace every instance of \n inside the following pattern:

[^\t]*\t[^\t]*

My solution so far uses three steps:

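1. Replace each "good" \n with a special string that does not occur anywhere else in the text (e.g. a long number), using s/\([^\t]*\t{x}[^\t]*\)\n/\1#12398754987235649876234#/g, where x is one less than the expected number of columns in my file
2. Replace all remaining ("bad") \n with spaces
3. Replace the long number with a newline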

But I have quite a lot of text files, and I am looking for a way to do this in one step with sed.

Example input:

foo \t Each multiplex has screens allocated \n
to each studio. \t abc \n
bar \t The screens need filling. \t bcd \n
123 \t Studios have to create product to fill \n
their screen, and the amount of good product is limited. \t cde \n

Output:

foo \t Each multiplex has screens allocated to each studio. \t abc \n
bar \t The screens need filling. \t bcd \n
123 \t Studios have to create product to fill their screen, and the amount of good product is limited. \t cde \n

score 1 · Accepted Answer

This might work for you (GNU sed):

sed -r ':a;$!N;s/\n([^\t]+)$/\1/;ta;P;D' file

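Read two lines into the pattern space (PS). If the last line does not contain a tab, remove the newline, read in the next line, and repeat. If the line does contain a tab, print the first line, delete it, and repeat.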
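Note that the substitution above deletes the newline without putting anything in its place, while the question asks for a space. A possible variant, offered only as a sketch: it assumes GNU sed, and `join_wrapped` is a hypothetical helper name, not something from the original answer. The only change is the space in front of `\1`.

```shell
# Hypothetical helper: joins each line that contains no tab onto the
# previous line, separated by a single space.
# Assumes GNU sed (-r, \t inside brackets, labels followed by ";").
join_wrapped() {
    sed -r ':a;$!N;s/\n([^\t]+)$/ \1/;ta;P;D' "$@"
}

# Sample input reduced to two columns, as in the other answers here.
printf 'foo\tEach multiplex has screens allocated\nto each studio.\nbar\tThe screens need filling.\n' |
    join_wrapped
```

Like the original one-liner, this treats any line without a tab as a continuation, so it does not cover input where the continuation line itself ends with a tab and the key column.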

score 1 · Accepted Answer

Using awk

cat file
foo     Each multiplex has screens allocated
to each studio.
bar     The screens need filling.
123     Studios have to create product to fill
their screen, and the amount of good product is limited.

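If a line does not contain a tab (\t), it is joined to the previous line with a space; a line that does contain a tab starts a new output line.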

awk 'NR>1 {s=/\t/?"\n":" "}{printf s"%s",$0} END {print ""}' file
foo     Each multiplex has screens allocated to each studio.
bar     The screens need filling.
123     Studios have to create product to fill their screen, and the amount of good product is limited.
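The sample above leaves out the final key column. When that column is present, as in the question's input, the continuation line contains a tab too, so testing for "has a tab" no longer separates records. One possible alternative is to count columns instead: keep appending lines until the record holds the expected number of tab-separated fields. This is a minimal sketch, not part of the original answer; `join_records` and its `cols` argument are hypothetical names, and it relies on the question's statement that the last column never contains a newline.

```shell
# Hypothetical helper: join_records COLS
# Reads stdin and joins physical lines with a space until the assembled
# record has COLS tab-separated fields, then prints it.
join_records() {
    awk -v cols="$1" '
    {
        # Append this physical line to the record being assembled.
        rec = (pending ? rec " " $0 : $0)
        pending = 1
        # split() returns the number of tab-separated fields in rec.
        if (split(rec, parts, "\t") >= cols) {
            print rec
            pending = 0
        }
    }
    # Flush an incomplete trailing record rather than dropping it.
    END { if (pending) print rec }
    '
}

# Three-column sample modelled on the question (real tabs, no padding).
printf 'foo\tEach multiplex has screens allocated\nto each studio.\tabc\nbar\tThe screens need filling.\tbcd\n' |
    join_records 3
```

Because the record is complete as soon as the last tab has been seen, this needs no placeholder string and runs in a single pass.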

score 0 · Accepted Answer

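Handling previous lines with sed is always tricky because of its limitations: few buffers, no non-greedy quantifiers, no look-ahead, and so on. Still, here is one way to do it. The script is commented, but I know it is not easy to follow.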

sed -n '
    ## Label "a"
    :a;
    ## Enter this section after joining all lines without a tab.
    /\t.*\t/ {
        ## Loop to replace every newline but the last one with a space, because
        ## the last one precedes the next line with a tab, which is not printed yet.
        :b;
        /\n[^\n]*\n/ { 
            s/\n/ /; 
            bb 
        }; 
        ## Print until newline (all joined lines) and delete them
        P; 
        D;
    };
    ## Append next line to buffer and repeat loop.
    N; 
    $! ba;
    ## Special case for last line, remove extra newlines and print. 
    s/\n/ /g; 
    p
' infile

Assuming infile has the following content:

foo     Each multiplex has screens allocated
to each studio.
bar     The screens need filling.
123     Studios have to create product to fill
their screen, and the amount of good product is limited.

It yields:

foo     Each multiplex has screens allocated to each studio.
bar     The screens need filling.
123     Studios have to create product to fill their screen, and the amount of good product is limited.
