php - Automatically creating (X)HTML lists from a text file in PHP

Question

As you may already know, I'm building a CMS, and I've now run into a small problem.

Say I have a text file with the following content:

[b]Some bold text[/b]
[i]Italic[/i]
- List item 1
- List item 2
- List item 3
# List item 1
# List item 2
# List item 3

I want to convert it to:

<b>Some bold text</b>
<i>Italic</i>
<ul>
    <li>List item 1</li>
    <li>List item 2</li>
    <li>List item 3</li>
</ul>
<ol>
    <li>List item 1</li>
    <li>List item 2</li>
    <li>List item 3</li>
</ol>

Bold and italic already work (using regular expressions), but how do I handle the lists?

The '-' list should be converted to

<ul>
    <li>List item 1</li>
    <li>List item 2</li>
    <li>List item 3</li>
</ul>

The '#' list should be converted to

<ol>
    <li>List item 1</li>
    <li>List item 2</li>
    <li>List item 3</li>
</ol>

Does anyone have experience with this? Please help. I'm using PHP 5.2.9.

score 3 · Accepted Answer

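If you don't want to use an existing parsing library, you have to parse the file line by line and keep the current state somewhere.

If a line starts with "-" and the state tells you that you are not in a list yet, output a <ul> plus an <li>. If you are already in a list, just output the <li>.

The same goes for lines starting with "#".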
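The line-by-line approach described above can be sketched roughly as follows. This is only a minimal illustration, not code from the answer: the function name `convertLists` and the variable `$open` (which holds the state) are made up for the example. It assumes each list item sits on a single line, that lists are not nested, and that any non-list line ends the current list.

```php
<?php
// Hypothetical helper (not from the answer): converts "- " and "# " lines
// into <ul>/<ol> lists by walking the text one line at a time.
function convertLists($text) {
    // Marker character => list tag, as used in the question.
    $tags = array('-' => 'ul', '#' => 'ol');
    $open = null;       // the state: tag of the currently open list, or null
    $out  = array();

    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if (preg_match('/^([-#])\s+(.*)$/', $line, $m)) {
            $tag = $tags[$m[1]];
            if ($open !== $tag) {
                // Entering a list, or switching from one list type to the other.
                if ($open !== null) {
                    $out[] = "</$open>";
                }
                $out[] = "<$tag>";
                $open  = $tag;
            }
            $out[] = '    <li>' . $m[2] . '</li>';
        } else {
            // Any non-list line ends the current list.
            if ($open !== null) {
                $out[] = "</$open>";
                $open  = null;
            }
            $out[] = $line;
        }
    }
    // Close a list that runs to the end of the input.
    if ($open !== null) {
        $out[] = "</$open>";
    }
    return implode("\n", $out);
}
```

Calling `convertLists(file_get_contents($path))` after the bold/italic replacements would be one way to use it, where `$path` is whatever file the CMS reads. The item text is inserted as-is so that tags produced by earlier replacements survive; escape it first if the input is untrusted.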

score 2 · Accepted Answer

You could consider using another markup language, such as Markdown or Textile. Then you would only have to deal with a library.

score 0 · Accepted Answer

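OK, so you don't need all of Markdown...
But why not just use the features you do need?

I adapted the list-handling functions to work procedurally.
To keep the file size down, I condensed the verbose regular expressions
and removed the comments related to other parts of Markdown.
It clocks in at 3308 bytes.

Maybe that's still too bloated for your taste... d-:
But you're free to cut corners and strip things out.
For example, maybe you don't need nested lists.

The modified code follows.
First, the license: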

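PHP Markdown Copyright (c) 2004-2009 Michel Fortin
http://michelf.com/
All rights reserved.

Based on Markdown
Copyright (c) 2003-2006 John Gruber
http://daringfireball.net/
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name "Markdown" nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.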

<?php
@define('MARKDOWN_TAB_WIDTH', 4);
$tab_width = MARKDOWN_TAB_WIDTH;
$list_level = 0;

function doLists($text) {
# Form HTML ordered (numbered) and unordered (bulleted) lists.
    global $tab_width, $list_level;
    $less_than_tab = $tab_width - 1;
    # Re-usable patterns to match list item bullets and number markers:
    $marker_ul_re  = '[*+-]';
    $marker_ol_re  = '\d+[\.]';
    $marker_any_re = "(?:$marker_ul_re|$marker_ol_re)";
    $markers_relist = array(
        $marker_ul_re => $marker_ol_re,
        $marker_ol_re => $marker_ul_re,
        );
    foreach ($markers_relist as $marker_re => $other_marker_re) {
        # Re-usable pattern to match any entire ul or ol list:
        $whole_list_re = '((([ ]{0,'.$less_than_tab.'})('.$marker_re.')[ ]+)(?s:.+?)(\z|\n{2,}(?=\S)(?![ ]*'.$marker_re.'[ ]+)|(?=\n\3'.$other_marker_re.'[ ]+)))'; // mx
        # We use a different prefix before nested lists than top-level lists
        # ($list_level is non-zero while processListItems() is running).
        if ($list_level) {
            $text = preg_replace_callback('{
                    ^
                    '.$whole_list_re.'
                }mx',
                '_doLists_callback', $text);
            }
        else {
            $text = preg_replace_callback('{
                    (?:(?<=\n)\n|\A\n?) # Must eat the newline
                    '.$whole_list_re.'
                }mx',
                '_doLists_callback', $text);
            }
        }
    return $text;
    }

function _doLists_callback($matches) {
# Re-usable patterns to match list item bullets and number markers:
    $marker_ul_re  = '[*+-]';
    $marker_ol_re  = '\d+[\.]';
    $marker_any_re = "(?:$marker_ul_re|$marker_ol_re)";
    $list = $matches[1];
    $list_type = preg_match("/$marker_ul_re/", $matches[4]) ? "ul" : "ol";
    $marker_any_re = ( $list_type == "ul" ? $marker_ul_re : $marker_ol_re );
    $list .= "\n";
    $result = processListItems($list, $marker_any_re);
    $result = "<$list_type>\n" . $result . "</$list_type>";
    return "\n". $result ."\n\n";
    }

function processListItems($list_str, $marker_any_re) {
#   Process the contents of a single ordered or unordered list, splitting it
#   into individual list items.
# The $list_level global keeps track of when we're inside a list.
# Each time we enter a list, we increment it; when we leave a list,
# we decrement. If it's zero, we're not in a list anymore.
    global $list_level;
    $list_level++;
# trim trailing blank lines:
    $list_str = preg_replace("/\n{2,}\\z/", "\n", $list_str);
    $list_str = preg_replace_callback('{(\n)?(^[ ]*)('.$marker_any_re.'(?:[ ]+|(?=\n)))((?s:.*?))(?:(\n+(?=\n))|\n)(?= \n* (\z | \2 ('.$marker_any_re.') (?:[ ]+|(?=\n))))}xm','_processListItems_callback', $list_str);
    $list_level--;
    return $list_str;
    }

function _processListItems_callback($matches) {
    $item = $matches[4];
    $leading_line =& $matches[1];
    $leading_space =& $matches[2];
    $marker_space = $matches[3];
    $tailing_blank_line =& $matches[5];
    if ($leading_line || $tailing_blank_line
    || preg_match('/\n{2,}/', $item))
        { # Replace marker with the appropriate whitespace indentation
        $item = $leading_space . str_repeat(' ', strlen($marker_space)) . $item;
        $item = outdent($item)."\n";
        }
    else { # Recursion for sub-lists:
        $item = doLists(outdent($item));
        $item = preg_replace('/\n+$/', '', $item);
        }
    return "<li>" . $item . "</li>\n";
    }

function outdent($text) {
# Remove one level of line-leading tabs or spaces
    global $tab_width;
    return preg_replace('/^(\t|[ ]{1,'.$tab_width.'})/m', '', $text);
    }
?>
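If nested lists really are not needed, a much smaller regex-based variant may be enough. The sketch below is not part of PHP Markdown; the names `flatLists`, `flatListBlock`, `flatUlCallback` and `flatOlCallback` are invented for illustration. It assumes "\n" line endings, one item per line, and the '-' and '#' markers from the question. Named callbacks are used instead of closures so that it stays within what PHP 5.2 supports.

```php
<?php
// Hypothetical helpers (not part of PHP Markdown): flat, non-nested lists only.

// Wraps one run of marker lines in the given list tag.
function flatListBlock($block, $tag) {
    // Keep a trailing newline only if the matched run had one.
    $tail  = (substr($block, -1) === "\n") ? "\n" : '';
    $items = array();
    foreach (explode("\n", rtrim($block, "\n")) as $line) {
        // Drop the one-character marker and the whitespace after it.
        $items[] = '    <li>' . ltrim(substr($line, 1)) . '</li>';
    }
    return "<$tag>\n" . implode("\n", $items) . "\n</$tag>" . $tail;
}

// preg_replace_callback() handlers for each list type.
function flatUlCallback($m) {
    return flatListBlock($m[0], 'ul');
}

function flatOlCallback($m) {
    return flatListBlock($m[0], 'ol');
}

function flatLists($text) {
    // A run is one or more consecutive lines that start with the marker.
    $text = preg_replace_callback('/(?:^-[ \t]+.*(?:\n|\z))+/m', 'flatUlCallback', $text);
    $text = preg_replace_callback('/(?:^#[ \t]+.*(?:\n|\z))+/m', 'flatOlCallback', $text);
    return $text;
}
```

The trade-off is the one hinted at above: this version is short, but indented sub-lists and multi-line items are left untouched rather than converted.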
