perl - Omit part of a text file delimited by marker lines

Question

Given a test file settings.py that looks like this:

# Django settings for x project.
DEBUG = True
TEMPLATE_DEBUG = DEBUG
ADMINS = (
    # ('Your Name', 'your_email@example.com'),
)
MANAGERS = ADMINS
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': '',                      # Or path to database file if using sqlite3.
        'USER': '',                      # Not used with sqlite3.
        'PASSWORD': '',                  # Not used with sqlite3.
        'HOST': '',                      # Set to empty string for localhost. Not used with sqlite3.
        'PORT': '',                      # Set to empty string for default. Not used with sqlite3.
    }
}
# Hosts/domain names that are valid for this site; required if DEBUG is False
# See https://docs.djangoproject.com/en/1.3/ref/settings/#allowed-hosts
ALLOWED_HOSTS = []

I want to programmatically (from a shell script) replace the part between the lines:

DATABASES = {

and:

}

with some text held in the variable k:

declare -r k='foo bar baz'

I'm a Perl beginner, but I cobbled this together:

perl -ne 'if(!$f && /DATABASES/){$f=1} if(!$f){print} if($f && /^}$/){$f=0}' < settings.py

This is different from my usual sed/awk hackery:

# e.g.
sed '/DATABASES/,/^}$/ d' < settings.py

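I'd like to improve my Perl one-liner!

How can I do in Perl what sed does so neatly?

What is the absolute best way to:

watch stdin go by and copy it to stdout,
detect the sentinel "stop printing" line and stop copying, and
re-enable the stdin->stdout pass-through on encountering the second sentinel line?

I left out the replacement part of the task; I'd appreciate some help with that too.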

score 2 · Accepted Answer

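I can't imagine why you would want to use perl for simple text manipulation, since that is what awk was designed for and, like all good UNIX tools, awk does one thing and does it well.

With GNU awk: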

$ k="<<<< foo >>>>"
$ gawk -v k="$k" -v RS='\0' '{sub(/DATABASES = {.*\n}/,k)}1' file
# Django settings for x project.
DEBUG = True
TEMPLATE_DEBUG = DEBUG
ADMINS = (
    # ('Your Name', 'your_email@example.com'),
)
MANAGERS = ADMINS
<<<< foo >>>>
# Hosts/domain names that are valid for this site; required if DEBUG is False
# See https://docs.djangoproject.com/en/1.3/ref/settings/#allowed-hosts
ALLOWED_HOSTS = []

Explanation:

gawk
-v k="$k"     = set the awk variable k to the value of the shell variable k
-v RS='\0'    = set the Record Separator to the NULL string so gawk reads the whole file
'
{sub(/DATABASES = {.*\n}/,k)}     = replace the text between "DATABASES = {" and "}" at the start of a line inclusive with the contents of the awk variable k.
1     = set a true condition which invokes the default action of printing the current record (the whole file in this case)
' file

If you can't read the whole file at once because of memory limits, or you simply prefer this style or don't have GNU awk, change the script to (untested):

$ awk -v k="$k" '
    /DATABASES = {/ { skip=1 }
    skip && /^}/    { skip=0; $0=k }
    !skip
  ' file

Hopefully what it does is obvious. Note that dropping the RS='\0' setting means the script is no longer gawk-specific.
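As a quick sanity check of the line-by-line version, here is a minimal sketch that runs it on a tiny stand-in file. The file name sample.txt and its contents are made up for illustration, and the opening brace is written as [{] only to sidestep differences between awk dialects; neither is part of the original script.

```shell
# Hypothetical sample input (a stand-in, not the real settings.py).
printf '%s\n' 'A = 1' 'DATABASES = {' "    'x': 1," '}' 'B = 2' > sample.txt

k='<<<< foo >>>>'

# Start skipping at the opening line; on the closing "}" line stop
# skipping and replace that line with k, so it is printed by !skip.
out=$(awk -v k="$k" '
    /DATABASES = [{]/ { skip=1 }
    skip && /^}/      { skip=0; $0=k }
    !skip
' sample.txt)
printf '%s\n' "$out"
```

On this input the three lines from DATABASES = { through } should collapse into the single line held in k, with the surrounding lines passed through unchanged.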

If you need to keep the delimiter lines, that is just a small tweak:

$ awk -v k="$k" '
    skip && /^}/    { skip=0; print k }
    !skip
    /DATABASES = {/ { skip=1 }
  ' file
# Django settings for x project.
DEBUG = True
TEMPLATE_DEBUG = DEBUG
ADMINS = (
    # ('Your Name', 'your_email@example.com'),
)
MANAGERS = ADMINS
DATABASES = {
<<<< foo >>>>
}
# Hosts/domain names that are valid for this site; required if DEBUG is False
# See https://docs.djangoproject.com/en/1.3/ref/settings/#allowed-hosts
ALLOWED_HOSTS = []

score 1 · Accepted Answer

To delete the part between DATABASES and }, you can use:

perl -ne 'print unless (/DATABASES/../^}$/)' settings.py

For the replacement, something like this:

$ export VAR="foo bar baz"
$ perl -ne 'print $ENV{VAR},"\n" if /DATABASES/; print unless /DATABASES/../^}$/' settings.py

score 1 · Accepted Answer

I thought I'd show you how to convert an awk script into a Perl script.

To start, I took Ed Morton's awk version and ran it through a2p.

$ a2p
/DATABASES = {/ { skip=1 }
skip && /^}/    { skip=0; $0=k }
!skip
^d

Note that ^d means pressing Ctrl+D.

#!/opt/perl-5.14.1/bin/perl
eval 'exec /opt/perl-5.14.1/bin/perl -S $0 ${1+"$@"}'
    if $running_under_some_shell;
            # this emulates #! processing on NIH machines.
            # (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
            # process any FOO=bar switches

while (<>) {
    chomp;  # strip record separator
    if (/DATABASES = {/) {
    $skip = 1;
    }
    if ($skip && /^}/) {
    $skip = 0;
    $_ = $k;
    }
    print $_ if !$skip;
}

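We can throw away the eval 'exec ... line. I doubt you will ever need it.

Since we only need to handle k="$k", the eval '$'.$1.'$2;' ... line can be thrown out as well. We just need to set $k, or replace it with $ENV{k}. (Note that you have to export k for the latter to work. You could also run the script as env k="$k" perl test.pl.)

Since the line is chomped, we would need to replace print $_ if !$skip; with print $_, "\n" if !$skip;, or set $\ to "\n". I think we can get away with simply not calling chomp.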

Also, to prevent hard-to-find bugs, I'll add use strict; and use warnings; at the beginning.

#!/usr/bin/env perl
use strict;
use warnings;

my $skip; # prevents printing when true
while (<>) {
  if (/DATABASES = \{/) { # brace escaped: newer Perls warn about a bare {
    $skip = 1;
  }
  if ($skip && /^}/) {
    $skip = 0;
    $_ = $ENV{k}."\n";
  }
  print $_ if !$skip;
}

I think we can mix in a sed-ism here: the range operator (...).

#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
  if( my $r = /DATABASES = \{/ ... /^}/ ){

    if( $r == 1 ){ # first time it matches
      print $ENV{k}, "\n";
    }

    next; # don't print
  }

  print;
}

The only thing is, I think the OP wants to replace the text between DATABASES = { and }. So we have to add code that allows those two lines to be printed.

#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
  if( my $r = /DATABASES = \{/ ... /^}/ ){

    if( $r == 1 ){
      # append the replacement to the first line
      $_ .= $ENV{k}."\n";

    }elsif( $r !~ /E/ ){ # rest of the matches, except the last one
      next;
    }
  }

  print;
}

You know, I don't really like putting the replacement text in an environment variable. How about putting it in a __DATA__ section instead?

use strict;
use warnings;

my $replacement = do{ local $/; <DATA> }; # slurp
close DATA;

while (<>) {
  if( my $r = /DATABASES = \{/ .. /^}/ ){
    if( $r == 1 ){
      $_ .= $replacement;
    }elsif( $r !~ /E/ ){
      next
    }
  }
  print;
}

__DATA__
<<< FOO >>>
