5

I am looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.


Example 1:

echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"HAL,"
"said that everything was going extremely well."

Example 2:

cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"

etc. (link to the corresponding text).

4 Answers

7

I like this:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It is a bit verbose, but it handles escaped quotes and backtracking better than the simplest implementation. What it says is:

my $re = qr{
   "               # Begin it with literal quote
   ( 
     (?>           # prevent backtracking once the alternation has been
                   # satisfied. It either agrees or it does not. This expression
                   # only needs one direction, or we fail out of the branch

         [^"\\]    # a character that is not a dquote or a backslash
     |   \\+       # OR if a backslash, then any number of backslashes followed by 
         [^"]      # something that is not a quote
     |   \\        # OR again a backslash
         (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
         "         # and a quote
     )*            # any number of *set* qualifying phrases
  )                # all batched up together
  "                # Ended by a literal quote
}x;

If you don't need that much power, say because the input is likely to be plain dialogue rather than structured quotations, then

/"([^"]*)"/ 

will probably work about as well as anything else.
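For readers who prefer Python, the difference between the two approaches can be sketched as follows. This is not a port of the Perl pattern above: it swaps in a simpler escape-aware pattern without atomic groups (Python's re module only supports those from version 3.11), and the names ESCAPE_AWARE, SIMPLE, and sample are made up for illustration.

```python
import re

# Escape-aware (simplified, an assumption of this sketch): an opening quote,
# then any run of characters that are neither a quote nor a backslash, or a
# backslash followed by any one character, then the closing quote.
ESCAPE_AWARE = re.compile(r'"((?:[^"\\]|\\.)*)"')

# Simple: an opening quote, anything that is not a quote, a closing quote.
SIMPLE = re.compile(r'"([^"]*)"')

# Hypothetical input containing backslash-escaped quotes.
sample = r'He typed "say \"hi\" now" and "done".'

print(ESCAPE_AWARE.findall(sample))  # keeps the escaped quotes inside one match
print(SIMPLE.findall(sample))        # splits the first quotation at the escapes
```

On this sample the escape-aware pattern returns two quotations, while the simple one returns three fragments because it treats every quote character as a delimiter. For plain dialogue without backslash escapes the two behave the same.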

answered 2008-12-05T13:32:44.607
5

No regex solution will work if you have nested quotes, but for your examples this works fine:

$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\" \
  | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"

$ cat eula.txt| perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
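The nested-quote caveat can be seen in a small Python sketch that uses the same non-greedy pattern as the Perl command above. The names NON_GREEDY, flat, and nested and the sample sentences are invented for illustration.

```python
import re

# Same idea as the Perl m/(".*?")/g above: the shortest run between two quotes.
NON_GREEDY = re.compile(r'".*?"')

# Flat quotations, as in the question's first example.
flat = '"HAL," noted Frank, "said that everything was going extremely well"'

# Hypothetical input with a quotation nested inside another one.
nested = 'She said "he said "hi" to me" twice'

print(NON_GREEDY.findall(flat))
# ['"HAL,"', '"said that everything was going extremely well"']

print(NON_GREEDY.findall(nested))
# ['"he said "', '" to me"']
```

With identical opening and closing characters the pattern has no way to tell which quote opens and which closes, so the nested input is cut into two fragments and the inner "hi" is lost.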
answered 2008-12-05T11:26:19.493
4
grep -o "\"[^\"]*\""

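This greps for a ", followed by any number of characters that are not a quote, followed by a closing ".

-o makes it output only the matched text rather than the whole line.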
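Since the question also allows Python, a rough equivalent of the grep command above might look like the sketch below. It applies the same pattern line by line, as grep does; the names QUOTED and extract_quotes are hypothetical, not part of any library.

```python
import re

# Same pattern as the grep command: a quote, any non-quote characters, a quote.
QUOTED = re.compile(r'"[^"]*"')


def extract_quotes(lines):
    """Yield every double-quoted span, quotes included, one line at a time."""
    for line in lines:
        yield from QUOTED.findall(line)


# Small demonstration on an in-memory list of lines.
demo = ['"HAL," noted Frank, "said that everything was going extremely well."\n']
for quote in extract_quotes(demo):
    print(quote)
```

To use it as a filter, pass sys.stdin (after importing sys) instead of the demo list. Like grep, it never matches a quotation that spans more than one line.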

answered 2008-12-05T11:23:56.457
0
grep -o '"[^"]*"' file

The -o option prints only the part of each line that matches the pattern.

answered 2010-03-31T11:19:53.100