ruby - Regex matching confusion in Ruby

Question

Can anyone explain this to me?

str = "org-id:         N/A\n"

puts str[/org-id:\s+(.+)\n/]
# => "org-id:         N/A\n"
str =~ /org-id:\s+(.+)\n/
puts $1
# => "N/A"

I just need

str =~ /org-id:\s+(.+)\n/
puts $1

on a single line. But str[/org-id:\s+(.+)\n/] and str.slice(/org-id:\s+(.+)\n/) return "org-id:         N/A\n", while str.scan(/org-id:\s+(.+)\n/).first returns ["N/A"] (an array). Why do all these matches behave differently?

score 3 · Accepted Answer

From the fine manual:

str[regexp] → new_str or nil
str[regexp, fixnum] → new_str or nil

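If a Regexp is supplied, the matching portion of str is returned. If a numeric or name parameter follows the regular expression, that component of the MatchData is returned instead.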

So if you do str[/org-id:\s+(.+)\n/] you get the entire matched portion (AKA $&); if you want the first capture group (AKA $1), you can say:

puts str[/org-id:\s+(.+)\n/, 1]
# 'N/A'

If your regex had a second capture group and you wanted what it captured, you could say str[regex, 2]. You can also use named capture groups and symbols:

puts str[/org-id:\s+(?<want>.+)\n/, :want]

So, with the right pattern and arguments, String#[] is a convenient way to pull a single regex-based chunk out of a string.
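Here is a small sketch of those String#[] forms side by side, using the string from the question. The missing case (a string with no org-id at all) is an extra illustration I added, not something from the manual excerpt: it shows that the indexed form quietly gives nil instead of raising.

```ruby
str = "org-id:         N/A\n"

numbered = /org-id:\s+(.+)\n/
named    = /org-id:\s+(?<want>.+)\n/

whole   = str[numbered]             # the entire match, like $&
first   = str[numbered, 1]          # only capture group 1, like $1
by_name = str[named, :want]         # a named capture group, looked up by symbol
missing = "no id here"[numbered, 1] # nil (no exception) when the pattern doesn't match

p whole   # => "org-id:         N/A\n"
p first   # => "N/A"
p by_name # => "N/A"
p missing # => nil
```

So for the question's one-liner, puts str[/org-id:\s+(.+)\n/, 1] does the job without touching $1 at all.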

If you look at the manual, you'll notice that String#[] and String#slice are the same thing.
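A quick check of that claim, assuming the same string and pattern as in the question: both spellings accept the same arguments and return the same values.

```ruby
str = "org-id:         N/A\n"
re  = /org-id:\s+(.+)\n/

# String#slice is an alias of String#[], so these pairs are identical.
p str.slice(re) == str[re]       # => true
p str.slice(re, 1) == str[re, 1] # => true
p str.slice(re, 1)               # => "N/A"
```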

If we look at String#=~, we see:

str =~ obj → fixnum or nil

Match: if obj is a Regexp, uses it as a pattern to match against str, and returns the position where the match starts, or nil if there is no match.

So when you say:

str =~ /org-id:\s+(.+)\n/

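you get "org-id:         N/A\n" in $& and "N/A" in $1, and the operator's return value is the number zero (the index where the match starts); if your regex had another capture group, you'd find its contents in $2. Because =~ returns either nil or not nil, you can say things like: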

make_pancakes_for($1) if(str =~ /some pattern that makes (us) happy/)

So =~ is handy for combining parsing and a boolean test in one go.
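To make the =~ behaviour concrete, here's a sketch with the question's pattern showing the return value and the globals it sets. The variable names are just for illustration; note that 0 is truthy in Ruby, so a match at the very start of the string still counts as "not nil".

```ruby
str = "org-id:         N/A\n"
re  = /org-id:\s+(.+)\n/

pos   = (str =~ re) # index where the match starts
whole = $&          # the entire match, same as str[re]
first = $1          # capture group 1, same as str[re, 1]
p pos               # => 0
p $~.class          # => MatchData (also reachable as Regexp.last_match)

# =~ as the condition of a modifier if; 0 is truthy, so this assigns.
id = $1 if str =~ re
p id                # => "N/A"

no_match = ("no id here" =~ re)
p no_match          # => nil
```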

The String#scan method:

scan(pattern) → array
scan(pattern) {|match, ...| block } → str

Both forms iterate through str, matching the pattern (which may be a Regexp or a String). For each match, a result is generated and either added to the result array or passed to the block. If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.

So scan gives you a simple list of matches, or an AoA (array of arrays) of matches if capture groups are involved. scan is meant to pull a string apart into all its component pieces in one go (sort of like a more complicated version of String#split).
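A sketch of the group/no-group difference, using a made-up two-line input (the A1/B2 values are placeholder data, not from the question). The last line shows why .first on the question's scan result is ["N/A"] rather than "N/A".

```ruby
# Made-up sample input with two org-id lines.
text = "org-id: A1\norg-id: B2\n"

# No groups: each result is the whole matched string ($&).
no_groups = text.scan(/org-id:\s+\S+/)
p no_groups  # => ["org-id: A1", "org-id: B2"]

# One group: each result is a one-element array, so you get an array of arrays.
one_group = text.scan(/org-id:\s+(\S+)/)
p one_group  # => [["A1"], ["B2"]]

# Two groups: each inner array has one entry per group.
two_groups = text.scan(/(\w+)-id:\s+(\S+)/)
p two_groups # => [["org", "A1"], ["org", "B2"]]

# The question's case: the first match is itself a one-element array.
question = "org-id:         N/A\n".scan(/org-id:\s+(.+)\n/).first
p question   # => ["N/A"]
```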

If you wanted to grab all of the (.+) matches from your string you'd use scan and map:

array_of_ids = str.scan(/org-id:\s+(.+)\n/).map(&:first)

but you'd only bother with that if you knew there would be several org-ids in str. Scan will also leave $&, $1, ... set to the values for the last match in the scan; but if you're using scan you'll be looking for several matches at once so those globals won't be terribly useful.
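For instance, with a hypothetical input holding two org-ids ("ACME-42" is invented sample data), the scan/map combination collects both, and afterwards $1 only reflects the last match:

```ruby
# Hypothetical sample with two org-id lines.
str = "org-id:         N/A\norg-id:   ACME-42\n"
re  = /org-id:\s+(.+)\n/

array_of_ids = str.scan(re).map(&:first)
p array_of_ids # => ["N/A", "ACME-42"]

# After scan returns, the match globals describe only the last match.
last_id = $1
p last_id      # => "ACME-42"
```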

The three regex approaches ([], =~, and scan) offer similar functionality, but they fill different niches. You could do it all with scan, but that would be pointlessly cumbersome unless you were an orthogonality bigot, and then you certainly wouldn't be working in Ruby except under extreme duress, so it wouldn't matter.

score 0 · Accepted Answer

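This is the difference between matching and capturing. str[regex] returns the entire substring that the whole regex matched. $1 holds only the part captured by the first () group.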
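One way to see matching versus capturing side by side is to look at the MatchData object directly (String#match returns one, or nil when nothing matches). This is a sketch using the question's string and pattern:

```ruby
str = "org-id:         N/A\n"
m = str.match(/org-id:\s+(.+)\n/) # a MatchData object, or nil if nothing matched

p m[0]       # => "org-id:         N/A\n"  (the whole match: what str[regex] returns)
p m[1]       # => "N/A"  (capture group 1: what $1 holds)
p m.captures # => ["N/A"]  (all capture groups, shaped like one scan result)
```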
