ruby - How do I separate hashtags from a tweet?

Question

What is a good way to remove the hashtags from a string and then join the hashtag words into another, comma-separated string:

'Some interesting tweet #hash #tags'

The result would be:

'Some interesting tweet'

and:

'hash,tags'

score 6 · Accepted Answer

str = 'Some interesting tweet #hash #tags'
a,b = str.split.partition{|e| e.start_with?("#")}
# => [["#hash", "#tags"], ["Some", "interesting", "tweet"]]
a
# => ["#hash", "#tags"]
b
# => ["Some", "interesting", "tweet"]
a.join(",").delete("#")
# => "hash,tags"
b.join(" ")
# => "Some interesting tweet"

score 2 · Accepted Answer

An alternate path is to use scan then remove the hash tags:

tweet = 'Some interesting tweet #hash #tags'

tags = tweet.scan(/#\w+/).uniq
tweet = tweet.gsub(/(?:#{ Regexp.union(tags).source })\b/, '').strip.squeeze(' ') # => "Some interesting tweet"
tags.join(',').tr('#', '') # => "hash,tags"

Dissecting it shows:

tweet.scan(/#\w+/) returns an array ["#hash", "#tags"].
uniq would remove any duplicated tags.
Regexp.union(tags) returns the pattern /\#hash|\#tags/, which turns into (?-mix:\#hash|\#tags) when interpolated into another regex.
Regexp.union(tags).source returns \#hash|\#tags. We don't want the pattern flags at the start, so using source avoids them.
/(?:#{ Regexp.union(tags).source })\b/ returns the regular expression /(?:\#hash|\#tags)\b/.
tr is an extremely fast way to translate one or more characters to others, or to strip them, as it does here with an empty replacement string.

The final regex isn't the most optimized that can be generated. I'd actually write code to generate:

/#(?:hash|tags)\b/

but how to do that is left as an exercise for you. And, for short strings it won't make much difference as far as speed goes.
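For what it's worth, one possible way to build that tighter pattern is sketched below. It captures the tag names without the leading #, escapes them, and factors the # out of the alternation. The variable names (names, pattern, stripped, tag_list) are mine, not part of the answer above, and this is only a sketch for the sample tweet, not a tuned implementation.

```ruby
tweet = 'Some interesting tweet #hash #tags'

# Capture only the word part of each tag, so the '#' is already gone.
names = tweet.scan(/#(\w+)/).flatten.uniq

# Factor the '#' out in front of a non-capturing alternation of tag names.
# Regexp.escape is a precaution; \w+ captures contain nothing special.
pattern = /#(?:#{ names.map { |n| Regexp.escape(n) }.join('|') })\b/

stripped = tweet.gsub(pattern, '').strip.squeeze(' ')
tag_list = names.join(',')
```

Since the names are captured without the #, the final tr call is no longer needed.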

score 0 · Accepted Answer

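This starts with two empty arrays, splits the string on whitespace, then checks each word for a hashtag; if it finds one, it grabs the rest of the word and stores it in the hashtag array, otherwise it stores the word in the other array.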

array_of_hashetags = []
array_of_words = []

str = "Some interesting tweet #hash #tags"

str.split.each do |x|
  if /\#\w+/ =~ x
    array_of_hashetags << x.gsub(/\#/, "")
  else 
    array_of_words << x
  end
end
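
To get the two strings the question asks for, the arrays still need to be joined. A minimal sketch that repeats the loop above so it runs on its own; tags_string and text_string are names I made up for illustration:

```ruby
array_of_hashetags = []
array_of_words = []

str = "Some interesting tweet #hash #tags"

# Same loop as above: sort each word into one of the two arrays.
str.split.each do |x|
  if /\#\w+/ =~ x
    array_of_hashetags << x.gsub(/\#/, "")
  else
    array_of_words << x
  end
end

# Join the collected pieces back into the two result strings.
tags_string = array_of_hashetags.join(",")
text_string = array_of_words.join(" ")
```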

Hope that helps.
