ruby - Extracting a substring from a Ruby string that contains "["

Question

I have a file containing data like this:

[date,ip]:string{[0.892838,1.28820,8.828823]}

I want to extract the data 0.892838, 1.28820, 8.828823 into a string for later processing.

I used the pattern line = String ~= /\\[/ to get the position where "[" occurs, but for the input above I get the following error message:

premature end of char-class /\\[/

score 2 · Accepted Answer

How about this?

str = 'date,ip]:string{[0.892838,1.28820,8.828823]}'
str.scan(/\d+.\d+/)
# => ["0.892838", "1.28820", "8.828823"]
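One caveat: the unescaped `.` in that pattern matches any character, not only a decimal point, so it would also accept runs such as `12a34`. A slightly stricter sketch, assuming the values are always plain decimals of the form digits, dot, digits (the variable names are illustrative):

```ruby
str = '[date,ip]:string{[0.892838,1.28820,8.828823]}'

# \. matches a literal decimal point; a bare . would match any character.
numbers = str.scan(/\d+\.\d+/)

# Join the matches if a single string is wanted for later processing.
joined = numbers.join(', ')
```

Here `numbers` holds the three values as separate strings, and `joined` combines them into one comma-separated string like the one the question asks for.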

score 1 · Accepted Answer

Use a capture group:

'[date,ip]:string{[0.892838,1.28820,8.828823]}' =~ /{\[(.*?)\]}/
# => 16
$1            # => "0.892838,1.28820,8.828823"
$1.split(',') # => ["0.892838", "1.28820", "8.828823"]
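As for the error in the question: inside a regex literal, `\\` is an escaped backslash, so the `[` that follows opens a character class that is never closed. A single backslash is enough to match a literal bracket. A minimal sketch, with illustrative variable names:

```ruby
line = '[date,ip]:string{[0.892838,1.28820,8.828823]}'

# \[ matches a literal "["; =~ returns the index of the first match, or nil.
first_bracket = line =~ /\[/

# String#index with a plain string avoids regex escaping entirely.
inner_bracket = line.index('{[') + 1

# String#[] with a regex and a group number returns only the captured text.
values = line[/\{\[(.*?)\]\}/, 1]
```

`String#[]` returns `nil` when nothing matches, so guard the result before calling `split` on input that may not contain the `{[...]}` block.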

score 0 · Accepted Answer

As I'm inclined to do:

require 'fruity'

str = '[date,ip]:' + ('string' * 1) + '{[0.892838,1.28820,8.828823]}'
compare do
  arup { str.scan(/\d+.\d+/) }
  falsetrue { str =~ /{\[(.*?)\]}/; $1.split(',') }
  ttm { str[/\[([^\]]+)\]}$/, 1].split(',') }
end

# >> Running each test 2048 times. Test will take about 1 second.
# >> falsetrue is similar to ttm
# >> ttm is faster than arup by 2x ± 0.1

The longer the 'string' part gets, the more the run times of the various attempts diverge:

require 'fruity'

str = '[date,ip]:' + ('string' * 1000) + '{[0.892838,1.28820,8.828823]}'
compare do
  arup { str.scan(/\d+.\d+/) }
  falsetrue { str =~ /{\[(.*?)\]}/; $1.split(',') }
  ttm { str[/\[([^\]]+)\]}$/, 1].split(',') }
end

# >> Running each test 512 times. Test will take about 2 seconds.
# >> ttm is faster than falsetrue by 60.00000000000001% ± 10.0%
# >> falsetrue is faster than arup by 13x ± 1.0

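The reason "ttm" is faster is the '$'. That anchor gives the regex engine the information it needs to know where to search right away. Without it, the engine starts at the beginning of the string and searches forward, so the longer the 'string' component is, the longer it takes to find the desired pattern.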

By benchmarking the candidate expressions, you can find the one with the best average speed for a particular task.
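If the fruity gem is not available, a rough comparison can be sketched with Ruby's standard `benchmark` library. The iteration count and the labels below are arbitrary choices, and the timings will differ by machine and Ruby version:

```ruby
require 'benchmark'

str = '[date,ip]:' + ('string' * 1000) + '{[0.892838,1.28820,8.828823]}'
n = 1_000 # arbitrary iteration count; raise it for steadier numbers

# Each lambda wraps one of the three approaches from this thread.
candidates = {
  'scan'     => -> { str.scan(/\d+.\d+/) },
  'capture'  => -> { str =~ /{\[(.*?)\]}/; $1.split(',') },
  'anchored' => -> { str[/\[([^\]]+)\]}$/, 1].split(',') }
}

# Sanity check: run each approach once so the outputs can be compared.
results = candidates.transform_values(&:call)

# Benchmark.realtime returns the elapsed wall-clock seconds for its block.
timings = candidates.transform_values do |code|
  Benchmark.realtime { n.times { code.call } }
end

timings.each { |label, secs| puts format('%-9s %.4fs', label, secs) }
```

Unlike fruity, this does not estimate a margin of error, so treat small differences between runs as noise.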

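If the 'string' portion is always short, the difference for a single pass is so small that it doesn't matter, and it makes sense to use the code that is easiest to read (and maintain), which would be str.scan(/\d+.\d+/). If the code sits in a loop and runs millions of times, the difference starts to add up, and one of the other approaches may be the wiser choice.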
