-1

I want to extract #hashtags from a string, also those that have special characters such as #1+1.

Currently I'm using:

@hashtags ||= string.scan(/#\w+/)

But it doesn't work with those special characters. Also, I want it to be UTF-8 compatible.

How do I do this?

EDIT:
If the last character is a special character it should be removed, such as #hashtag, #hashtag. #hashtag! #hashtag? etc...

Also, the hash sign at the beginning should be removed.

4

3 Answers

1

Solution

You probably want something like this:

'#hash+tag'.encode('UTF-8').scan /\b(?<=#)[^#[:punct:]]+\b/
=> ["hash+tag"]

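Note that the zero-width assertion at the start (the lookbehind (?<=#)) is needed to avoid capturing the hash sign as part of the match.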
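To illustrate what the lookbehind changes, here is a minimal sketch that uses plain \w+ instead of the character class above, so it does not depend on how a given Ruby version defines [[:punct:]]. The sample string and variable names are made up for the example.

```ruby
text = "#hashtag and #ruby"

# Without a lookbehind, the hash sign is part of each match.
with_hash = text.scan(/#\w+/)

# (?<=#) only asserts that a "#" precedes the match; it consumes nothing,
# so the "#" is left out of the result.
without_hash = text.scan(/(?<=#)\w+/)

p with_hash     # => ["#hashtag", "#ruby"]
p without_hash  # => ["hashtag", "ruby"]
```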


answered 2012-06-05T14:03:55.050
0

How about this:

@hashtags ||=string.match(/(#[[:alpha:]]+)|#[\d\+-]+\d+/).to_s[1..-1]

This takes care of #alphabets as well as #2323+2323, #2323-2323 and #2323+65656-67676.

It also removes the # at the beginning.

Or, if you want the result as an array:

 @hashtags ||=string.scan(/#[[:alpha:]]+|#[\d\+-]+\d+/).collect{|x| x[1..-1]}

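Wow, this took a long time, and I still don't understand why scan(/#[[:alpha:]]+|#[\d\+-]+\d+/) works but scan(/(#[[:alpha:]]+)|#[\d\+-]+\d+/) doesn't on my machine. The only difference is the () in the second scan statement. The parentheses make no difference when I use the pattern with the match method.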
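A likely explanation, going by how String#scan is documented: when the pattern contains capture groups, scan returns the captures for each match instead of the whole match, so an alternative without a group yields [nil]. MatchData#to_s always returns the whole match, which would be why match looks unaffected. The sketch below uses an invented sample string and writes the character class as [\d+\-], which should be equivalent to [\d\+-].

```ruby
text = "#abc #12+3"

# No capture groups: scan returns each whole match as a string.
flat = text.scan(/#[[:alpha:]]+|#[\d+\-]+\d+/)

# One capture group: scan returns one array of captures per match.
# The second match goes through the ungrouped alternative, so its capture is nil.
grouped = text.scan(/(#[[:alpha:]]+)|#[\d+\-]+\d+/)

# A non-capturing group (?:...) keeps the grouping but restores the flat result.
non_capturing = text.scan(/(?:#[[:alpha:]]+)|#[\d+\-]+\d+/)

# MatchData#to_s is the entire first match, regardless of groups.
whole = text.match(/(#[[:alpha:]]+)|#[\d+\-]+\d+/).to_s

p flat           # => ["#abc", "#12+3"]
p grouped        # => [["#abc"], [nil]]
p non_capturing  # => ["#abc", "#12+3"]
p whole          # => "#abc"
```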

answered 2012-06-05T13:54:22.203
0

This should work:

@hashtags = str.scan(/#([[:graph:]]*[[:alnum:]])/).flatten

Or, if you don't want hashtags to start with a special character:

@hashtags = str.scan(/#((?:[[:alnum:]][[:graph:]]*)?[[:alnum:]])/).flatten
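As a quick check of both patterns against the cases from the question, here is a small sketch. The sample string is invented, and "caf\u00e9" is only there to exercise a non-ASCII letter in a UTF-8 string.

```ruby
str = "#hashtag, #1+1 #caf\u00e9! #+tag #x"

# Any visible characters after "#", but the tag must end in a letter or digit,
# so trailing punctuation such as "," or "!" is dropped.
loose = str.scan(/#([[:graph:]]*[[:alnum:]])/).flatten

# Same idea, but the tag must also start with a letter or digit,
# so "#+tag" is skipped entirely.
strict = str.scan(/#((?:[[:alnum:]][[:graph:]]*)?[[:alnum:]])/).flatten

p loose   # => ["hashtag", "1+1", "café", "+tag", "x"]
p strict  # => ["hashtag", "1+1", "café", "x"]
```

One thing to be aware of: [[:graph:]] also matches "#", so two tags written without a space between them (for example "#a#b") come back as a single tag.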
answered 2012-06-05T14:52:07.143