ruby - How to split a string that contains delimiters and escaped delimiters?

Question

My string delimiter is ";", and a delimiter inside a field is escaped as "\;". For example,

irb(main):018:0> s = "a;b;;d\\;e"
=> "a;b;;d\\;e"
irb(main):019:0> s.split(';')
=> ["a", "b", "", "d\\", "e"]

Can someone suggest a regex so that the output of split is ["a", "b", "", "d\\;e"]? I'm using Ruby 1.8.7.

score 6 · Accepted Answer

1.8.7 doesn't have negative lookbehind without Oniguruma (which may be compiled in).

1.9.3; yay:

> s = "a;b;c\\;d"
=> "a;b;c\\;d"
> s.split /(?<!\\);/
=> ["a", "b", "c\\;d"]
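Applied to the string from the question (Ruby 1.9+ only, since it relies on lookbehind), the same pattern also keeps the empty field between ";;". One thing to watch: String#split drops trailing empty fields unless it is given a negative limit. The constant name UNESCAPED_SEMICOLON is only for illustration.

```ruby
# Requires Ruby 1.9+ (regex engine with lookbehind support).
# Match ";" only when the preceding character is not a backslash.
UNESCAPED_SEMICOLON = /(?<!\\);/

s = "a;b;;d\\;e"
fields = s.split(UNESCAPED_SEMICOLON)
p fields                                  # => ["a", "b", "", "d\\;e"]

# Trailing empty fields are removed by default...
p "a;b;".split(UNESCAPED_SEMICOLON)       # => ["a", "b"]
# ...but kept when a negative limit is passed.
p "a;b;".split(UNESCAPED_SEMICOLON, -1)   # => ["a", "b", ""]
```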

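1.8.7 with Oniguruma doesn't offer a trivial split, but you can get the match offsets and pull the substrings apart that way. I assume there's a better way to do this that I'm not remembering: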

> require 'oniguruma'
> re = Oniguruma::ORegexp.new "(?<!\\\\);"
> s = "hello;there\\;nope;yestho"
> re.match_all s
=> [#<MatchData ";">, #<MatchData ";">]
> mds = re.match_all s
=> [#<MatchData ";">, #<MatchData ";">]
> mds.collect {|md| md.offset}
=> [[5, 6], [17, 18]]
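The offset idea can be finished in a few lines. This is a minimal sketch assuming Ruby 1.9 or later, where the built-in regex engine already supports lookbehind, so Regexp.last_match.offset(0) stands in for the oniguruma gem's match_all/offset calls above; split_by_offsets is a hypothetical helper name.

```ruby
# Hypothetical helper: split str by slicing between the offsets of every
# delimiter match. Assumes Ruby 1.9+ (built-in lookbehind, no oniguruma gem).
def split_by_offsets(str, delim_re)
  # Collect the [start, end] offsets of each delimiter match.
  offsets = []
  str.scan(delim_re) { offsets << Regexp.last_match.offset(0) }

  fields = []
  pos = 0
  offsets.each do |start, finish|
    fields << str[pos...start]   # text before this delimiter
    pos = finish                 # continue after the delimiter
  end
  fields << str[pos..-1]         # text after the last delimiter
  fields
end

s = "hello;there\\;nope;yestho"
p split_by_offsets(s, /(?<!\\);/)   # => ["hello", "there\\;nope", "yestho"]
```

The offsets collected here are the same [[5, 6], [17, 18]] as in the irb session above; unlike split, this keeps a trailing empty field.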

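Other options include:

Splitting on ; and post-processing the results to look for a trailing \\, or
Doing a char-by-char loop, keeping some simple state, and splitting manually.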
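Both options can be sketched in plain Ruby without lookbehind, so they should also run on 1.8.7 (not tested there). The method names split_and_rejoin and split_escaped are hypothetical. Both keep the backslash in the field, matching the regex answers' output.

```ruby
# Option 1 (hypothetical helper): split on every ";", then glue a piece back
# onto the previous one when the previous piece ends in a backslash.
def split_and_rejoin(str)
  fields = []
  str.split(";", -1).each do |piece|   # -1 keeps trailing empty fields
    if !fields.empty? && fields.last.end_with?("\\")
      fields[-1] = fields.last + ";" + piece   # that ";" was escaped
    else
      fields << piece
    end
  end
  fields
end

# Option 2 (hypothetical helper): walk the string one character at a time,
# remembering whether the previous character was an escaping backslash.
def split_escaped(str)
  fields = ["".dup]
  escaped = false
  str.each_char do |ch|
    if escaped
      fields.last << ch          # escaped char is kept literally
      escaped = false
    elsif ch == "\\"
      fields.last << ch          # keep the backslash itself
      escaped = true
    elsif ch == ";"
      fields << "".dup           # unescaped delimiter starts a new field
    else
      fields.last << ch
    end
  end
  fields
end

s = "a;b;;d\\;e"
p split_and_rejoin(s)  # => ["a", "b", "", "d\\;e"]
p split_escaped(s)     # => ["a", "b", "", "d\\;e"]
```

The two differ when an escaped backslash comes right before a delimiter ("a\\\\;b" in Ruby source): split_escaped tracks the escape state and returns two fields, while split_and_rejoin, like the lookbehind regex, sees a backslash before the ";" and returns one.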

score 2 · Accepted Answer

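As @dave-newton answered, you can use a negative lookbehind, but that isn't supported in 1.8. An alternative that works in both 1.8 and 1.9 is to use String#scan instead of split, with a pattern that matches either a character that is not a semicolon or backslash, or any character prefixed by a backslash: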

$ irb
>> RUBY_VERSION
=> "1.8.7"
>> s = "a;b;c\\;d"
=> "a;b;c\\;d"
>> s.scan /(?:[^;\\]|\\.)+/
=> ["a", "b", "c\\;d"]
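A few extra inputs make the trade-offs of this pattern concrete (checked against a modern Ruby; the FIELD constant name is only for illustration). Because of the +, empty fields such as the one between ";;" in the question are not returned. On the other hand, the \\. alternative consumes an escaped backslash, so a ";" after it still splits, which the lookbehind version does not do.

```ruby
# The scan pattern from this answer: a non-";" non-"\" char, or "\" plus any char.
FIELD = /(?:[^;\\]|\\.)+/

p "a;b;c\\;d".scan(FIELD)       # => ["a", "b", "c\\;d"]

# "+" needs at least one char, so the empty field between ";;" is skipped.
p "a;b;;d\\;e".scan(FIELD)      # => ["a", "b", "d\\;e"]

# An escaped backslash followed by ";" is split correctly by scan...
p "a\\\\;b".scan(FIELD)         # => ["a\\\\", "b"]
# ...but not by the lookbehind split (Ruby 1.9+).
p "a\\\\;b".split(/(?<!\\);/)   # => ["a\\\\;b"]
```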
