regex - Multi-line regex replacement if more than two

Question

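I'm having a hard time solving the following problem.

I have a Word file with questions and answers that I need to import into Moodle (an online quiz site) in a specific format. Everything is black except the correct answers, which are green. The starting format looks like this: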

1. Question example

a. Wrong

b. Wrong

C. Wrong

D. Right

The output should become:

:Question example

:Question example

{

~ Wrong

~ Wrong

~ Wrong

= Right

}

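I opened the file in Word and replaced all red paragraph marks with * (I couldn't replace with groups). After that I exported the .docx file as text, opened it on my Linux machine, and ran the following sed commands on it.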

sed -i -e 's/^\r/\n/g' tmp #OS X blank line replacement
sed -i -e 's/\r//g' tmp #remove remaining carriage returns
sed -i -e 's:^[a-z]\.:~:' tmp #replace leading answer letters with a tilde
sed -i -e 's/\(^[0-9]*\.\ \)\(.*\)/}\n::\2\n::\2\n{/' tmp #regenerate title
sed -i -n '${p;q};N;/\n\*/{s/"\?\n//p;b};P;D' tmp #if the next line starts with *, join it to the current line
sed -i -e 's:^~\(.*\)\(\*.*\)$:=\1:' tmp #drop the trailing * and turn the leading ~ into =
sed -i -e 's:^\*:=:' tmp #replace any remaining leading * with =
sed '/^$/d' tmp #delete any remaining blank lines

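It's not pretty, but it works reasonably well. The questions were written by hand and contain many errors, so I still have to go through them manually. The hard part is when there are multiple correct answers. The output should then look like this: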

:Question example

:Question example

{

~%-100% Wrong

~%-100% Wrong

~%50% Right

~%50% Right

}

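Ideally I would have a sed or Perl regex that counts the number of = signs between { and } and replaces each of them with ~%50%. All the ~ signs would get a weight of %-100%. I could also use this code for 3 correct answers, where each correct answer becomes ~%33%.

Is this doable? I have more than 1000 questions, so automating this would definitely help. Multi-line replacement with sed is already a bit tricky for two lines, so I guess four or more lines requires Perl? I have no experience with Perl.

Can someone help me with this? Please excuse my poor English; I'm not a native speaker.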
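
The counting step described above could be sketched in shell with awk instead of a single regex: buffer the answer lines between { and }, count those that start with =, and rewrite the prefixes only when more than one answer is correct. This is a minimal sketch, not a tested importer. The function name reweight is hypothetical, and it assumes the earlier sed pipeline has already produced answer lines starting with = or ~, with { and } alone on their own lines.

```shell
#!/bin/sh
# reweight: hypothetical helper; reads the =/~ output on stdin, writes weighted output.
# Assumes "{" and "}" sit alone on their lines and answers start with "=" or "~".
reweight() {
  awk '
    $0 == "{" { inblock = 1; n = 0; right = 0; print; next }
    $0 == "}" {
      # Equal share per correct answer, truncated (2 -> 50, 3 -> 33).
      pct = (right > 0) ? int(100 / right) : 0
      for (i = 1; i <= n; i++) {
        line = buf[i]
        first = substr(line, 1, 1)
        if (right > 1 && first == "=") {
          line = "~%" pct "%" substr(line, 2)
        } else if (right > 1 && first == "~") {
          line = "~%-100%" substr(line, 2)
        }
        print line
      }
      inblock = 0; n = 0; right = 0
      print
      next
    }
    # Inside a block: hold the line back until the closing brace is seen.
    inblock { buf[++n] = $0; if (substr($0, 1, 1) == "=") right++; next }
    { print }
  '
}
```

It would be run as something like reweight < tmp > tmp.weighted, where tmp.weighted is just an example output name. Blocks with exactly one = are passed through unchanged, so single-answer questions keep their plain = and ~ prefixes.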

score 1 · Accepted Answer

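The program below works according to my best guess at what you need. It reads all the information into an array and then formats it.

As it stands, the data is embedded in the source and read from the DATA file handle. Changing the loop to while (<>) { ... } will allow you to specify the data file on the command line.

If my guess is wrong, you will have to correct me.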

use strict;
use warnings;

my @questions;

while (<DATA>) {
  next unless /\S/;
  s/\s+$//;
  if (/^\d+\.\s*(.+)/) {
    push @questions, [$1];
  }
  elsif (/^[A-Za-z]\.\s*(.+)/i) {
    push @{$questions[-1]}, $1;
  }
}

for my $question (@questions) {

  my ($text, @answers) = @$question;

  print "::$text\n" for 1, 2;

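  # Number of answers marked as right (grep in scalar context returns a count).
  my $correct = grep /right/i, @answers;

  print "{\n";

  if ($correct == 1) {
    # Single correct answer: plain = / ~ prefixes.
    printf "%s %s\n", /right/i ? '=' : '~', $_ for @answers;
  }
  else {
    # Several correct answers share 100% equally; guard against none at all.
    my $percent = $correct ? int(100/$correct) : 0;
    printf "~%%%d%%~ %s\n", /right/i ? $percent : -100, $_ for @answers;
  }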

  print "}\n";
}

__DATA__
1. Question one

a. Wrong

b. Wrong

c. Right

d. Wrong

2. Question two

a. Right

b. Wrong

c. Right

d. Wrong

3. Question three

a. Right

b. Right

c. Wrong

d. Right

Output

::Question one
::Question one
{
~ Wrong
~ Wrong
= Right
~ Wrong
}
::Question two
::Question two
{
~%50%~ Right
~%-100%~ Wrong
~%50%~ Right
~%-100%~ Wrong
}
::Question three
::Question three
{
~%33%~ Right
~%33%~ Right
~%-100%~ Wrong
~%33%~ Right
}

score 1 · Accepted Answer

use strict;
use warnings;

# Slurp the whole input, then split it before each line that starts a numbered question.
my $file = do { local $/; <> };
my @questions = split /^(?=[0-9]+\.)/m, $file;
for (@questions) {
   my @lines = split /^/m;

   my $title = shift(@lines);
   $title =~ s/^\S+\s*/:/;

   my $num_right = 0;
   my $num_wrong = 0;
   for (@lines) {
      if    (/Right/) { ++$num_right; }
      elsif (/Wrong/) { ++$num_wrong; }
   }

   # Each correct answer gets an equal share of 100% (2 right => 50, 3 right => 33).
   my $right_pct = $num_right ? sprintf('%.0f', 100/$num_right) : 0;
   my $right_prefix = $num_right == 1 ? "=" : "~%$right_pct%";
   my $wrong_prefix = $num_right == 1 ? "~" : "~%-100%";

   for (@lines) {
      if    (/Right/) { s/^\S+/$right_prefix/; }
      elsif (/Wrong/) { s/^\S+/$wrong_prefix/; }
   }

   print(
      $title,
      "\n",
      $title,
      "\n{\n",
      @lines,
      "\n}\n",
   );
}

Replace /Right/ and /Wrong/ with something appropriate.

score 1 · Accepted Answer

This might work for you:

cat <<\! >file.sed
> # On encountering a digit in the first character position
> /^[0-9]/{
>   # Create a label to cater for last line processing
>   :end
>   # Swap to hold space
>   x
>   # Check hold space for contents.
>   # If none delete it and begin a new cycle
>   # This is to cater for the first question line
>   /./!d
>   # Remove any carriage returns
>   s/\r//g
>   # Remove any blank lines
>   s/\n\n*/\n/g
>   # Double the question line, replacing the question number by a ':'
>   # Also append a { followed by a newline
>   s/^[0-9]*\.\([^\n]*\n\)/:\1:\1{\n/
>   # Coalesce lines beginning with a * and remove optional preceding "
>   s/"\?\n\*/*/g
>   # Replace the wrong answers a,b,c...  with ~%-100%
>   s/\n[a-zA-Z]*\. \(Wrong\)/\n~%-100% \1/g
>   # Replace the right answers a,B,c... with ~%100%
>   s/\n[a-zA-Z]*\. \(Right\)/\n~%100% \1/g
>   # Assuming no more than 4 answers:
>   # Replace 4 correct answers prefix with ~%25%
>   s/\(~%100%\)\(.*\)\1\(.*\)\1\(.*\)\1/~%25%\2~%25%\3~%25%\4~%25%/
>   # Replace 3 correct answers prefix with ~%33%
>   s/\(~%100%\)\(.*\)\1\(.*\)\1/~%33%\2~%33%\3~%33%/
>   # Replace 2 correct answers prefix with ~%50%
>   s/\(~%100%\)\(.*\)\1/~%50%\2~%50%/
>   # Append a newline and a }
>   s/$/\n}/
>   # Break and so print newly formatted string
>   b
>   }
> # Append pattern space to hold space
> H
> # On last line jump to end label
> $b end
> # Delete all lines from pattern space
> d
> !

Then run:

sed -f file.sed file

score 0 · Accepted Answer

Your example doesn't match this documentation: http://docs.moodle.org/22/en/GIFT. The question title and the question are delimited by two colons, not one:

//Comment line 
::Question title 
:: Question {
=A correct answer
~Wrong answer1
#A response to wrong answer1
~Wrong answer2
#A response to wrong answer2
~Wrong answer3
#A response to wrong answer3
~Wrong answer4
#A response to wrong answer4
}

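Some people naively gave you answers based on your example instead of looking up the actual specification. Oops.

Your question can't be answered as posed, because your format doesn't show which answers are correct. That is: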
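
If the single leading colons in the question are what the conversion really produces, a small sed pass could normalize them to the double-colon form. This is a minimal sketch, assuming titles are the only lines that start with a colon; fix_title_colons is a made-up name, and it does not attempt to add the closing :: after the title that the documentation also describes.

```shell
#!/bin/sh
# fix_title_colons: hypothetical helper; turns a single leading ":" into "::".
# Lines that already start with "::" are left unchanged.
fix_title_colons() {
  sed 's/^:\([^:]\)/::\1/'
}
```

For instance, fix_title_colons < tmp would rewrite :Question example as ::Question example and leave answer lines alone.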

1. Question

a. Is this right?

b. Or this?

c. Or this?

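You say these are identified by color in the original Word document, and that you did some substitution on it to preserve that information; however, you don't show an example of that! Oops...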
