ruby - How can I improve this small Ruby regex snippet?

Question

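How can I improve this?

The purpose of this code is a method that captures hash_tags (#twittertype strings) from a form: it parses a list of words and makes sure every word is separated out.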

WORD_TEST = "123 sunset #2d2-apple,#home,#star #Babyclub, #apple_surprise #apple,cats    mustard#dog , #basic_cable safety #222 #dog-D#DOG#2D "
SECOND_TEST = 'orion#Orion#oRion,Mike'

Here is my problem area, the regexps:

_string_rgx = /([a-zA-Z0-9]+(-|_)?\w+|#?[a-zA-Z0-9]+(-|_)?\w+)/

add_pound_sign = lambda { |a| a[0].chr == '#' ? a : a='#' + a; a}

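I don't know regular expressions well, hence the need to collect the first [element] from each scan result: scan produces some odd-looking output, but the first element is always what I want.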

 t_word = WORD_TEST.scan(_string_rgx).collect {|i| i[0] }
 s_word = SECOND_TEST.scan(_string_rgx).collect {|i| i[0] }
 t_word.map! { |a| a = add_pound_sign.call(a); a }
 s_word.map! { |a| a = add_pound_sign.call(a); a }

The results are what I want. I'd just like some insight from the Ruby/regex gurus out there.

puts t_word.inspect

[ 
"#123", "#sunset", "#2d2-apple", "#home", "#star", "#Babyclub", 
"#apple_surprise", "#apple", "#cats", "#mustard", "#dog", 
"#basic_cable", "#safety", "#222", "#dog-D", "#DOG", "#2D"
]

puts s_word.inspect

[
"#orion", "#Orion", "#oRion", "#Mike"
]

Thanks in advance.

score 2 · Accepted Answer

Let's expand the regex:

(
   [a-zA-Z0-9]+ (-|_)? \w+
   | #? [a-zA-Z0-9]+ (-|_)? \w+
)

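( start capture group

[a-zA-Z0-9]+ match one or more alphanumeric characters

(-|_)? match hyphen or underscore and capture. may fail.

\w+ match one or more "word" characters (alphanumerics + underscore)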

| OR match this:

#? match optional # character

[a-zA-Z0-9]+ match one or more alphanumeric characters

(-|_)? match hyphen or underscore and capture. may fail.

\w+ match one or more word characters

) end capture
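The breakdown above also explains why the question needs `i[0]`: when a pattern contains capture groups, `String#scan` returns one array per match, with one entry per group. A minimal sketch follows; the short sample string and the constant names are made up for illustration, and the non-capturing rewrite is one possible way to avoid the extra step, not something the asker wrote.

```ruby
# The pattern from the question: three capture groups in total
# (the outer group plus one inner group in each alternative).
ORIGINAL_RGX = /([a-zA-Z0-9]+(-|_)?\w+|#?[a-zA-Z0-9]+(-|_)?\w+)/

# Same pattern with the outer group dropped and the inner groups
# made non-capturing via (?:...).
NON_CAPTURING_RGX = /[a-zA-Z0-9]+(?:-|_)?\w+|#?[a-zA-Z0-9]+(?:-|_)?\w+/

SAMPLE = "#dog-D cats" # hypothetical input, shorter than WORD_TEST

# With groups, each match is an array of group values (nil if unused).
p SAMPLE.scan(ORIGINAL_RGX)
# => [["#dog-D", nil, "-"], ["cats", nil, nil]]

# Without groups, scan returns the matched strings directly.
p SAMPLE.scan(NON_CAPTURING_RGX)
# => ["#dog-D", "cats"]
```

With the non-capturing form, the `.collect { |i| i[0] }` step is no longer needed.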

I'd rather write this regex like this:

(#? [a-zA-Z0-9]+ (-|_)? \w+)

or

( #? [a-zA-Z0-9]+ (-?\w+)? )

or

( #? [a-zA-Z0-9]+ -? \w+ )

(all are reasonably equivalent)
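The spaced-out notation above can be kept in real Ruby code with the `x` (free-spacing) flag. The sketch below writes the second variant that way; the constant name `TAG_RGX` and the non-capturing group are choices made for this example. One thing to watch: in `x` mode a bare `#` starts a comment, so the literal pound sign has to be escaped.

```ruby
SECOND_TEST = 'orion#Orion#oRion,Mike'

# Second simplified variant, in free-spacing mode.
TAG_RGX = /
  \#?             # optional pound sign; escaped because a bare # opens a comment in x mode
  [a-zA-Z0-9]+    # one or more alphanumeric characters
  (?: -? \w+ )?   # optional tail: optional hyphen, then word characters
/x

p SECOND_TEST.scan(TAG_RGX)
# => ["orion", "#Orion", "#oRion", "Mike"]
```

Because the tail group is optional in this variant, it also accepts a single-character tag such as `#a`, which the other two forms do not.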

You should note that this regex will fail on hashtags with Unicode characters, e.g. #Ü-Umlaut, #façade, etc. You are also limited to a two-character minimum length (#a fails, #ab matches), and a tag may contain only one hyphen (#a-b-c fails: the match would return #a-b).
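These limits are easy to check in irb. The sketch below uses the first simplified pattern with a non-capturing group. `UNICODE_HASHTAG` is a hypothetical alternative that is not part of this answer: it assumes UTF-8 input, that tags are made of letters and digits from any script, and that parts may be joined by any number of single hyphens or underscores.

```ruby
# First simplified pattern from above, without capture groups.
HASHTAG = /#?[a-zA-Z0-9]+(?:-|_)?\w+/

p "#a".scan(HASHTAG)       # => []                (needs at least two characters)
p "#ab".scan(HASHTAG)      # => ["#ab"]
p "#a-b-c".scan(HASHTAG)   # => ["#a-b"]          (second hyphen ends the match)
p "#façade".scan(HASHTAG)  # => ["#fa", "ade"]    (split at the non-ASCII letter)

# Hypothetical Unicode-aware variant (an assumption, not the answer's regex):
# \p{Alnum} matches letters and digits in any script, and the repeated
# group allows any number of hyphen- or underscore-separated parts.
UNICODE_HASHTAG = /#?\p{Alnum}+(?:[-_]\p{Alnum}+)*/

p "#façade #a-b-c #a".scan(UNICODE_HASHTAG)
# => ["#façade", "#a-b-c", "#a"]
```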

score 0 · Accepted Answer

I would reduce your regex pattern to something like this:

WORD_TEST = "123 sunset #2d2-apple,#home,#star #Babyclub, #apple_surprise #apple,cats    mustard#dog , #basic_cable safety #222 #dog-D#DOG#2D "
foo = []
WORD_TEST.scan(/#?[-\w]+\b/) do |s|
    foo.push( s[0] != '#' ? '#' + s : s )
end
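The same idea can be written with `map` instead of pushing into an array. This is only a sketch: the method name `normalize_tags` is made up for this example, and it assumes a Ruby version that has `String#start_with?`.

```ruby
# Hypothetical helper: scan for tags, then prefix '#' where it is missing.
def normalize_tags(text)
  text.scan(/#?[-\w]+\b/).map { |s| s.start_with?('#') ? s : "##{s}" }
end

p normalize_tags('orion#Orion#oRion,Mike')
# => ["#orion", "#Orion", "#oRion", "#Mike"]
```

Unlike the pattern in the question, this one has no capture groups, so `scan` yields plain strings, and it keeps tags with several hyphens (`#a-b-c`) in one piece.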
