ruby-on-rails - Rails: getting a teaser/excerpt for an article

Question

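I have a page that will list news articles. To cut down on the page's length, I only want to display a teaser (the first 200 words / 600 letters of the article) followed by a "more..." link that, when clicked, expands the rest of the article with jQuery/JavaScript. I've got that part figured out, and I even found the following helper method on a paste site, which makes sure the news article (a string) is not chopped off in the middle of a word: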

  def shorten(string, count = 30)
    if string.length >= count
      shortened = string[0, count]
      splitted = shortened.split(/\s/)
      words = splitted.length
      splitted[0, words - 1].join(" ") + ' ...'
    else
      string
    end
  end

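The problem is that the news article bodies I get from the database are formatted HTML. So if I'm unlucky, the helper above will chop my article string in the middle of an HTML tag and insert the "more..." string there (for example, partway through a tag), which breaks the HTML on my page.

Is there any way around this, or is there a plugin I can use to generate excerpts/teasers from an HTML string?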

score 16 · Accepted Answer

You can use a combination of Sanitize and Truncate.

truncate("And they found that many people were sleeping better.", 
  :omission => "... (continued)", :length => 15)
# => And they found... (continued)

I'm working on a similar task: I have blog posts and I just want to show a quick excerpt. So in my view I simply do:

sanitize(truncate(blog_post.body, length: 150))

This strips out the HTML tags, gives me the first 150 characters, and is handled in the view, so it's MVC friendly.
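One caveat with that ordering, as far as I can tell: `truncate` runs on the raw HTML first, so the cut can still land inside a tag, and Rails' `sanitize` keeps a default whitelist of tags rather than removing all of them. Stripping the markup first and truncating the plain text avoids both; in a Rails view that would be roughly `truncate(strip_tags(blog_post.body), length: 150, separator: ' ')`. The sketch below shows the same idea in plain Ruby so it can be run outside Rails. `plain_excerpt` is a made-up name, and the regex is only a crude stand-in for `strip_tags`, not a real sanitizer.

```ruby
# Hypothetical helper: plain-text excerpt of an HTML string.
# The regex is a rough stand-in for Rails' strip_tags, not a real sanitizer.
def plain_excerpt(html, length: 150, omission: "...")
  # Replace each tag with a space, then collapse runs of whitespace.
  text = html.gsub(/<[^>]*>/, " ").gsub(/\s+/, " ").strip
  return text if text.length <= length

  # Leave room for the omission, then back up to the last space
  # so the excerpt never ends in a partial word.
  cut = text[0, length - omission.length]
  cut = cut[0...cut.rindex(" ")] if cut.include?(" ")
  cut + omission
end

puts plain_excerpt("<p>And they found that <b>many people</b> were sleeping better.</p>", length: 30)
# => And they found that many...
```

Because the result is plain text, it can be escaped normally in the view and cannot leave a tag half open.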

Good luck!

score 3 · Accepted Answer

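My answer here should work. The original question (er, asked by me) was about truncating Markdown, but I ended up converting the Markdown to HTML and then truncating that, so it should work.

Of course, if your site gets a lot of traffic, you should cache the excerpt (perhaps store it in the database when the post is created or updated?). That would also mean you could let users modify or enter their own excerpt.

Usage: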

>> puts "<p><b><a href=\"hi\">Something</a></p>".truncate_html(5, at_end = "...")
=> <p><b><a href="hi">Someth...</a></b></p>

...and the code (copied from the other answer):

require 'rexml/parsers/pullparser'

class String
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # 0..new_len is an inclusive range, so up to new_len + 1 characters are kept
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the caller-supplied omission marker, if one was given
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end
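The caching suggestion earlier in this answer could look something like the following. This is a plain-Ruby sketch so that it runs on its own; `Post`, `excerpt`, `custom_excerpt` and `make_excerpt` are all hypothetical names, and in a Rails app the same logic would more likely live in a `before_save` callback that writes to an excerpt column. The excerpt is computed once when the body is set rather than on every page view, and a user-supplied excerpt takes precedence.

```ruby
# Hypothetical model: the excerpt is computed once, when the body changes,
# instead of on every page view.
class Post
  attr_reader :body, :excerpt
  attr_accessor :custom_excerpt # optional, entered by the user

  def initialize(body, custom_excerpt: nil)
    @custom_excerpt = custom_excerpt
    self.body = body
  end

  def body=(html)
    @body = html
    @excerpt = make_excerpt(html)
  end

  # What the listing page would display.
  def teaser
    custom_excerpt || excerpt
  end

  private

  # Placeholder truncation; swap in truncate_html or any other strategy.
  def make_excerpt(html, length = 40)
    text = html.gsub(/<[^>]*>/, "")
    text.length > length ? text[0, length] + "..." : text
  end
end

post = Post.new("<p>Short body.</p>")
puts post.teaser # => Short body.
post.custom_excerpt = "Hand-written summary"
puts post.teaser # => Hand-written summary
```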

score 2 · Accepted Answer

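Thanks a lot for your answers! In the meantime, however, I stumbled upon the jQuery HTML Truncator plugin, which fits my purposes perfectly and moves the truncation to the client side. It doesn't get any easier than that :-)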

score 1 · Accepted Answer

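If you don't want to split in the middle of HTML elements, you would have to write a more complex parser. It would have to keep track of whether it is in the middle of a <> block and whether it is between two tags.

Even if you did that, you would still have problems. If someone put the whole article into a single HTML element, the parser couldn't split it anywhere because the closing tag would be missing.

If possible, I would try not to put any tags into the articles at all, or to limit them to tags that don't contain anything (no <div>, and so on). That way you only have to check whether you are in the middle of a tag, which is pretty simple: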

  def shorten(string, count = 30)
    if string.length >= count
      shortened = string[0, count]
      splitted = shortened.split(/\s/)
      words = splitted.length
      # If the last fragment contains "<", the cut landed inside a simple tag,
      # so drop that fragment as well as the preceding word
      if splitted[words - 1].include?("<")
        splitted[0, words - 2].join(" ") + ' ...'
      else
        splitted[0, words - 1].join(" ") + ' ...'
      end
    else
      string
    end
  end
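The bookkeeping described in this answer (know whether you are inside a tag, and which tags are still open) can be sketched without a full parser. The function below is an illustration under stated assumptions, not a robust HTML parser: `truncate_html_text` and `VOID_TAGS` are made-up names, tags are recognised with a simple regex, the input is assumed to be reasonably well-formed, and entities such as `&amp;` count as several characters.

```ruby
# Elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = %w[br hr img input meta link].freeze

# Hypothetical sketch: keep at most `limit` characters of visible text,
# never cut inside a tag, and close whatever tags are still open.
def truncate_html_text(html, limit, omission = "...")
  out = +""
  open_tags = []
  remaining = limit

  # Each token is either a whole tag or a run of text between tags.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?("<")
      out << token
      name = token[/\A<\/?([a-zA-Z][a-zA-Z0-9]*)/, 1]
      next if name.nil? # comment, doctype, etc.: copy through, don't track

      name = name.downcase
      if token.start_with?("</")
        open_tags.pop if open_tags.last == name
      elsif !token.end_with?("/>") && !VOID_TAGS.include?(name)
        open_tags.push(name)
      end
    elsif token.length <= remaining
      out << token
      remaining -= token.length
    else
      # Budget exhausted: keep what fits, mark the cut, stop scanning.
      out << token[0, remaining] << omission
      break
    end
  end

  # Close the still-open tags, innermost first.
  open_tags.reverse_each { |tag| out << "</#{tag}>" }
  out
end

puts truncate_html_text('<p><b><a href="hi">Something</a></b> else</p>', 5)
# => <p><b><a href="hi">Somet...</a></b></p>
```

Counting only text characters means the markup never eats into the length budget, and the stack of open tags is what lets the cut fall in the middle of an element without leaving it unclosed.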

score 1 · Accepted Answer

I would sanitize the HTML and extract the first sentence. Assuming you have an Article model with a 'body' attribute that contains the HTML:

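# lib/core_ext/string.rb
class String
  # Everything from the start of the string up to the first ".", "!" or "?"
  def first_sentence
    self[/\A[^.!?]+/]
  end
end

# app/models/article.rb
def teaser
  # HTML::FullSanitizer is the older Rails class name; from Rails 4.2 on
  # the equivalent is Rails::Html::FullSanitizer
  HTML::FullSanitizer.new.sanitize(body).first_sentence
end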

This would turn "<b>This</b> is an <em>important</em> article! And here is the rest of the article." into "This is an important article".

score 0 · Accepted Answer

I solved this with the following approach.

Install the 'sanitize' gem:

gem install sanitize

Then use the following code, where body is the text containing HTML tags:

<%= content_tag :div, Sanitize.clean(truncate(body, length: 200, separator: ' ', omission: "... #{ link_to '(continue)', '#' }"), Sanitize::Config::BASIC).html_safe %>

This gives an excerpt that is valid HTML. I hope it helps someone.

score 0 · Accepted Answer

There is now a gem called HTMLTruncator that takes care of this for you. I've used it to display post excerpts and the like, and it is very robust.
