What is a good way to remove the hashtags from a string and then join the hashtag words into another, comma-separated string?
'Some interesting tweet #hash #tags'
The result would be:
'Some interesting tweet'
and:
'hash,tags'
str = 'Some interesting tweet #hash #tags'
a,b = str.split.partition{|e| e.start_with?("#")}
# => [["#hash", "#tags"], ["Some", "interesting", "tweet"]]
a
# => ["#hash", "#tags"]
b
# => ["Some", "interesting", "tweet"]
a.join(",").delete("#")
# => "hash,tags"
b.join(" ")
# => "Some interesting tweet"
An alternative is to use scan to collect the hash tags, then remove them from the string:
tweet = 'Some interesting tweet #hash #tags'
tags = tweet.scan(/#\w+/).uniq
tweet = tweet.gsub(/(?:#{ Regexp.union(tags).source })\b/, '').strip.squeeze(' ') # => "Some interesting tweet"
tags.join(',').tr('#', '') # => "hash,tags"
Dissecting it shows:
tweet.scan(/#\w+/) returns an array, ["#hash", "#tags"].
.uniq would remove any duplicated tags.
Regexp.union(tags) returns (?-mix:\#hash|\#tags).
Regexp.union(tags).source returns \#hash|\#tags. We don't want the pattern flags at the start, so using source fixes that.
/(?:#{ Regexp.union(tags).source })\b/ returns the regular expression /(?:\#hash|\#tags)\b/.
tr is an extremely fast way to translate one character or characters to another, or strip them.

The final regex isn't the most optimized that can be generated. I'd actually write code to generate:
/#(?:hash|tags)\b/
but how to do that is left as an exercise for you. And, for short strings it won't make much difference as far as speed goes.
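For readers who would rather see one possible solution to that exercise, here is a minimal sketch. It is only one way to do it, not necessarily what the answer's author had in mind: it captures the tag words without the leading '#' and lets Regexp.union build the alternation. The variable names words, pattern, text and tag_list are chosen here for illustration.

```ruby
tweet = 'Some interesting tweet #hash #tags'

# Capture only the word after each '#'; scan returns one sub-array per match
# when the pattern has a capture group, so flatten it.
words = tweet.scan(/#(\w+)/).flatten.uniq

# Regexp.union escapes each word; source drops the (?-mix:...) wrapper,
# leaving a pattern equivalent to /#(?:hash|tags)\b/.
pattern = /#(?:#{Regexp.union(words).source})\b/

# Remove the tags, then tidy up the leftover whitespace.
text = tweet.gsub(pattern, '').strip.squeeze(' ')

# The words no longer carry a '#', so no tr or delete is needed.
tag_list = words.join(',')
```

Because the '#' is factored out of the alternation, the tag words can be joined directly.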
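This starts with two empty arrays, splits the string on whitespace, then looks for hash tags, grabs the rest of each tagged word, and stores it in an array: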
array_of_hashetags = []
array_of_words = []
str = "Some interesting tweet #hash #tags"
str.split.each do |x|
  if /\#\w+/ =~ x
    array_of_hashetags << x.gsub(/\#/, "")
  else
    array_of_words << x
  end
end
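The loop leaves the results in two arrays, so a join is still needed to get the two strings asked for in the question. The sketch below is a variation rather than the code above: it assumes that only words beginning with '#' should count as tags, so it anchors the pattern with \A (the unanchored version would also treat a word like foo#bar as a tag). The names hashtags, words, text and tag_list are illustrative.

```ruby
hashtags = []
words = []

"Some interesting tweet #hash #tags".split.each do |word|
  # \A anchors the match at the start of the word; the capture group
  # holds the tag text without the leading '#'.
  if word =~ /\A#(\w+)/
    hashtags << $1
  else
    words << word
  end
end

text = words.join(" ")         # the tweet without its tags
tag_list = hashtags.join(",")  # the tags as one comma-separated string
```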
Hope that helps.