
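I recently needed to truncate post content that contains HTML (for post excerpts/summaries, etc.). This is usually done by entering an excerpt for the post manually, but for this particular project I need to do it automatically.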

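I tried writing a simple function that just takes a character count and returns that many characters of the content. However, this doesn't always work, because the cut can land inside an HTML tag or attribute.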

For example:

<?php
function truncateText($string, $chars) { return substr($string, 0, $chars); }

$content = "<div><p>some content</p><a href='http://google.com'>Let's go to google</a></div>";    

echo truncateText($content, 40); // returns "<div><p>some content</p><a href='http://"

As you can see, it returns broken HTML that won't render correctly. How can I truncate the content while keeping the HTML tags intact?


2 Answers


Your approach raises many problems. Do you want to truncate at 40 characters and then add as many closing tags as needed? Or do you prefer to truncate at 40 and then trim back as far as needed to make the tags valid? Do the tags count toward the 40 characters, or are they ignored when counting? As you can see, there are many open questions here. However, there's an alternative commonly used for summaries:

Strip the tags and truncate the plain text. A summary is normally just a short extract of text, a single paragraph with simple formatting. You don't want lists there, and in most cases losing a link or two is fine.
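A minimal sketch of this approach, assuming UTF-8 input and the mbstring extension. `makeSummary()` and its `$ellipsis` parameter are hypothetical names, not part of any library. It strips the tags, decodes entities so that `&amp;` counts as one character, and backs off to the last space so a word is not cut in half.

```php
<?php
// Hypothetical helper (not a library function): strip tags, then cut the plain text.
// Assumes UTF-8 input and the mbstring extension.
function makeSummary(string $html, int $limit, string $ellipsis = '...'): string
{
    // Remove the markup, then decode entities so "&amp;" counts as one character.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs (newlines, indentation) left behind by the removed tags.
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // already short enough, no ellipsis needed
    }

    $cut = mb_substr($text, 0, $limit, 'UTF-8');
    // If the cut falls inside a word, back off to the last space (when there is one).
    if (mb_substr($text, $limit, 1, 'UTF-8') !== ' ') {
        $space = mb_strrpos($cut, ' ', 0, 'UTF-8');
        if ($space !== false && $space > 0) {
            $cut = mb_substr($cut, 0, $space, 'UTF-8');
        }
    }
    return rtrim($cut) . $ellipsis;
}
```

Because the entities are decoded, the result is plain text; escape it with `htmlspecialchars()` before echoing it into a page.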

However, if you really want to go down that road, I'd recommend parsing the HTML with a proper DOM parser, but before you can decide how to do that you first need to answer the questions I raised above.
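A rough sketch of the DOM route under one possible answer to those questions: tags do not count, only visible text does, and the markup enclosing the surviving text is kept. It uses PHP's `DOMDocument` (ext-dom) plus mbstring; `truncateHtmlDom()` and `truncateTextNodes()` are hypothetical names. Because the parser re-serialises the tree, every element that survives the cut comes out properly closed.

```php
<?php
// Hypothetical helper: walks the tree depth-first, spending $remaining on text only.
function truncateTextNodes(DOMNode $node, int &$remaining): void
{
    // Copy the live NodeList first; removing children while iterating it would skip nodes.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child); // budget spent: drop everything after the cut
        } elseif ($child instanceof DOMText) {
            $len = mb_strlen($child->data, 'UTF-8');
            if ($len > $remaining) {
                $child->data = mb_substr($child->data, 0, $remaining, 'UTF-8');
            }
            $remaining -= min($len, $remaining);
        } elseif ($child->hasChildNodes()) {
            truncateTextNodes($child, $remaining); // descend into elements
        }
    }
}

// Hypothetical entry point: returns the fragment with at most $limit visible characters.
function truncateHtmlDom(string $html, int $limit): string
{
    $doc = new DOMDocument();
    // The http-equiv meta tells libxml the fragment is UTF-8; parser warnings are silenced.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
            . $html . '</body></html>',
        LIBXML_NOERROR | LIBXML_NOWARNING
    );
    $body = $doc->getElementsByTagName('body')->item(0);

    $remaining = $limit;
    truncateTextNodes($body, $remaining);

    // Serialise only the body's children, not the wrapper added above.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Since libxml re-serialises the markup, attribute quoting may differ from the input (single quotes become double quotes), and whitespace-only text nodes between tags count toward the limit.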

answered 2013-07-02T20:58:52.110

If you don't mind losing the formatting, just pass the string through PHP's strip_tags() function before doing anything else (it is documented in the PHP manual).
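Applied to the question's own code, this is a one-line change. Two caveats: `strip_tags()` removes tags without inserting whitespace, so text from adjacent elements runs together (`contentLet's` below), and `substr()` counts bytes, so for UTF-8 content `mb_substr()` is the safer choice.

```php
<?php
// The question's function, unchanged.
function truncateText($string, $chars) { return substr($string, 0, $chars); }

$content = "<div><p>some content</p><a href='http://google.com'>Let's go to google</a></div>";

// Strip the markup first, so the cut can only land in plain text.
$summary = truncateText(strip_tags($content), 20);
echo $summary; // some contentLet's go
```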

answered 2013-07-02T22:06:13.620