regex - Replace the first character before a match

Question

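For each line, I need to add a semicolon right before the first alphanumeric character, but only counting alphanumeric characters that come after the first semicolon on the line.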

Example:

Input:

00000001;Root;;
00000002;  Documents;;
00000003;    oracle-advanced_plsql.zip;file;
00000004;  Public;;
00000005;  backup;;
00000006;    20110323-JM-F.7z.001;file;
00000007;    20110426-JM-F.7z.001;file;
00000008;    20110603-JM-F.7z.001;file;
00000009;    20110701-JM-F-via-summer_school;;
00000010;      20110701-JM-F-via-summer_school.7z.001;file;

Desired output:

00000001;;Root;;
00000002;  ;Documents;;
00000003;    ;oracle-advanced_plsql.zip;file;
00000004;  ;Public;;
00000005;  ;backup;;
00000006;    ;20110323-JM-F.7z.001;file;
00000007;    ;20110426-JM-F.7z.001;file;
00000008;    ;20110603-JM-F.7z.001;file;
00000009;    ;20110701-JM-F-via-summer_school;;
00000010;      ;20110701-JM-F-via-summer_school.7z.001;file;

Can someone help me create the Perl regular expression? I need it inside a program, not as a one-liner.

score 3 · Accepted Answer

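Here is one way to insert a semicolon after the first semicolon and any whitespace that follows it, but before the first non-whitespace character.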

s/;\s*\K(?=\S)/;/

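You can use \w instead of \S if you feel it is necessary, but I think that is an unnecessary restriction for this input.

The \K (keep) escape is similar to a lookbehind assertion in that it does not remove what it matched. The same is true of the lookahead assertion, so all this substitution does is insert a semicolon at the specified position.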
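Since the question asks for a program rather than a one-liner, here is a minimal sketch of how that substitution might be wrapped in a script. The helper name `insert_semicolon` and the hard-coded sample lines are only for illustration; note that \K needs Perl 5.10 or newer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: inserts one semicolon before the first non-space
# character that follows the first semicolon on the line.
sub insert_semicolon {
    my ($line) = @_;
    # \K drops everything matched so far from the part being replaced,
    # and (?=\S) only checks that a non-space character comes next.
    $line =~ s/;\s*\K(?=\S)/;/;
    return $line;
}

# A few lines taken from the question's sample input.
my @sample = (
    '00000001;Root;;',
    '00000002;  Documents;;',
    '00000003;    oracle-advanced_plsql.zip;file;',
);

print insert_semicolon($_), "\n" for @sample;
```

Because both \K and the lookahead leave the matched text in place, the replacement string is just the single semicolon that gets inserted.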

score 1 · Accepted Answer

First, here is a program that seems to do what you asked for:

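#!/usr/bin/perl -w
# Read each input line, insert a semicolon before the first word
# character that follows the first semicolon, and print the result.
while (<>) {
  s/^(.*?;.*?)(\w)/$1;$2/;
  print $_;
}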

Save it in a file named "program.pl", make it executable with "chmod u+x program.pl", then run it on the input data like this:

./program.pl input-data.txt

Here is an explanation of the regular expression:

s/        # start search-and-replace regexp
  ^       # start at the beginning of this line
  (       # save the matched characters until ')' in $1
    .*?;  # go forward until finding the first semicolon
    .*?   # go forward until finding... (to be continued below)
  )
  (       # save the matched characters until ')' in $2
    \w    # ... the next alphanumeric character.
  )
/         # continue with the replace part
  $1;$2   # write all characters found above, but insert a ; before $2
/         # finish the search-and-replace regexp.
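The annotated layout above is meant for reading and would not run as written, because comments are not allowed in the replacement part. A runnable approximation, assuming the /x modifier and brace delimiters, could look like this (the helper name `insert_semicolon_verbose` is made up for the example):

```perl
use strict;
use warnings;

# Hypothetical helper: same substitution as above, written with /x so the
# pattern can carry whitespace and comments.
sub insert_semicolon_verbose {
    my ($line) = @_;
    $line =~ s{
        ^           # start at the beginning of the line
        ( .*? ;     # capture 1: everything up to the first semicolon,
          .*? )     #            plus whatever precedes the next word character
        ( \w )      # capture 2: the first word character after that semicolon
    }{$1;$2}x;      # put both captures back with a semicolon in between
    return $line;
}

print insert_semicolon_verbose('00000006;    20110323-JM-F.7z.001;file;'), "\n";
```

Only the pattern half is affected by /x; the replacement half is still an ordinary interpolated string.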

Based on your sample input, I would use a more specific regular expression:

s/^(\d*; *)(\w)/$1;$2/;

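This expression starts at the beginning of the line, skips the digits (\d*), then the first semicolon and any spaces after it, and inserts a semicolon before the word character that follows.

Choose whichever suits your needs best!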
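To make the difference between the two patterns concrete, here is a small sketch that runs both on one record from the question and on one line that does not start with digits. The second line and the helper names `insert_general` and `insert_specific` are invented for the comparison.

```perl
use strict;
use warnings;

# General pattern: any text may precede the first semicolon.
sub insert_general {
    my ($line) = @_;
    $line =~ s/^(.*?;.*?)(\w)/$1;$2/;
    return $line;
}

# Specific pattern: only digits may precede the first semicolon.
sub insert_specific {
    my ($line) = @_;
    $line =~ s/^(\d*; *)(\w)/$1;$2/;
    return $line;
}

my $record = '00000004;  Public;;';      # from the question's sample
my $other  = 'comment; not a record';    # hypothetical non-record line

for my $line ($record, $other) {
    print "general:  ", insert_general($line), "\n";
    print "specific: ", insert_specific($line), "\n";
}
```

Both patterns rewrite the record the same way, but only the general one touches the non-record line, so the specific pattern is the more conservative choice if the file might contain anything other than numbered records.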

score 0 · Accepted Answer

First of all, thank you for the excellent answers!

My actual code snippet looks like this:

 our $seperator=";" # at the beginning of the file
 #...
 sub insert {
    my ( $seperator, $line, @all_lines, $count, @all_out );
    $count     = 0;
    @all_lines = read_file($filename);

    foreach $line (@all_lines) {
        $count = sprintf( "%08d", $count );
        chomp $line;
        $line =~ s/\:/$seperator/;                          # works
        $line =~ s/\ file/file/;                            # works

        #$line=~s/;\s*\K(?=\S)/;/;                          # doesn't work
        $line =~ s/^(.*?$seperator.*?)(\w)/$1$seperator$2/; # doesn't work
        say $count . $seperator . $line . $seperator; 

        $count++; # btw, is there maybe a hidden index variable in a foreach-loop I could use instead of a new variable??
        push( @all_out, $count . $seperator . $line . $seperator . "\n" );
    }

    write_file( $csvfile, @all_out ); # using File::Slurp
}

To get the input I gave you, I had already made a few small substitutions, which you can see at the beginning of the foreach loop.
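On the side question in the loop's comment: as far as I know, a plain foreach loop does not expose a hidden index variable. A common alternative is to loop over the indices and look up each element. The sketch below assumes a made-up helper `number_lines` and made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: prefixes each line with a zero-padded, 1-based counter.
sub number_lines {
    my ($separator, @lines) = @_;
    my @out;
    # $#lines is the last valid index, so this visits every element once.
    for my $i (0 .. $#lines) {
        push @out, sprintf('%08d', $i + 1) . $separator . $lines[$i];
    }
    return @out;
}

# Hypothetical sample data, not the asker's real file contents.
my @all_lines = ('Root', '  Documents', '  Public');
print "$_\n" for number_lines(';', @all_lines);
```

On Perl 5.12 or newer, `each @array` also yields index and value pairs, though the index range form is the more widely seen idiom.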

I'm curious why the regular expressions proposed by TLP and Yaakov don't work in my code. In general they do work, but only when written as in Yaakov's example:

while(<>) {                                                           
  s/^(.*?;.*?)(\w)/$1;$2/;                                            
  print $_;                                                           
}
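One possible explanation, offered as a guess rather than a confirmed diagnosis: inside the foreach loop, $line does not yet carry the numeric prefix, so its first semicolon is the one after the name, not the one before it. The sketch below applies the substitution only after the full output line has been built. The helper `format_line` and the raw line format ("name:" or "name: file") are assumptions inferred from the substitutions in the snippet.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes raw lines look like "  Documents:" or
# "    name.zip: file", which is a guess based on the snippet above.
sub format_line {
    my ($count, $separator, $line) = @_;
    $line =~ s/:/$separator/;
    $line =~ s/ file/file/;
    my $out = sprintf('%08d', $count) . $separator . $line . $separator;
    # Only now does the first separator sit in front of the name.
    $out =~ s/\Q$separator\E\s*\K(?=\S)/$separator/;
    return $out;
}

print format_line(2, ';', '  Documents:'), "\n";
print format_line(3, ';', '    oracle-advanced_plsql.zip: file'), "\n";
```

It may also be worth checking that the `my ( $seperator, ... )` declaration inside the sub is not hiding the file-level `our $seperator`, since a lexical declared that way starts out undefined.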
