1

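I am trying to write a function that sanitizes HTML text. Problem definition:

function f(txt) return txt:gsub("%s","&nbsp;") end

Now this works for the following case:

f(" hello  buddy!") ---> "&nbsp;hello&nbsp;&nbsp;buddy!"

But according to the HTML spec, spaces only need to be replaced with &nbsp; when there are two or more of them in a row. A single space does not need to be replaced. In a longer run, the first space is left as-is and the rest are converted to &nbsp;. In other words, I need a function such that:

f(" hello  buddy!") ---> " hello &nbsp;buddy!"
f("   ") ---> " &nbsp;&nbsp;"
f(" ") ---> " "
f("hello buddy!") ---> "hello buddy!"

Any idea how to write f()?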

4

3 Answers

2

You could try something like this:

txt:gsub("( +)", function(c) return " "..("&nbsp;"):rep(#c-1) end)
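As a quick check against the four cases from the question, the call above can be wrapped in a function. This is only a sketch: the wrapper name `f` comes from the question, and the extra parentheses around the `gsub` call are my addition, used to drop the match count that `gsub` returns as a second value.

```lua
-- Sketch: wrap the gsub call so it can be tested against the question's cases.
function f(txt)
    -- The outer parentheses keep only the first return value (the new string).
    return (txt:gsub(" +", function(run)
        -- Keep the first space of the run, convert the remaining ones.
        return " " .. ("&nbsp;"):rep(#run - 1)
    end))
end

assert(f(" hello  buddy!") == " hello &nbsp;buddy!")
assert(f("   ") == " &nbsp;&nbsp;")
assert(f(" ") == " ")
assert(f("hello buddy!") == "hello buddy!")
```

Note that the pattern `" +"` only matches the plain space character, whereas `%s` in the question also matches tabs and newlines.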
answered 2011-08-29T08:15:28.093
2

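(A note on Alex's answer. Posted here as an answer so that I can include formatted code.)

The first 4 gsub calls can be replaced with a single call that takes a lookup table as its second argument. That should be noticeably faster than making 4 separate passes over the text.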

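function sanitize(txt)
    -- Map each special character to its HTML replacement.
    local replacements = {
        ['&' ] = '&amp;',
        ['<' ] = '&lt;',
        ['>' ] = '&gt;',
        ['\n'] = '<br/>'
    }
    return txt
        -- One pass: every matched character is looked up in the table.
        :gsub('[&<>\n]', replacements)
        -- Keep the first space of each run; turn the rest into &nbsp;.
        :gsub(' +', function(s) return ' '..('&nbsp;'):rep(#s-1) end)
end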
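To show the lookup-table form of `gsub` in isolation, here is a small sketch. The function name `escape_html` is hypothetical and not part of the answer. Because each input character is visited once, the `&` inside a freshly produced entity such as `&lt;` is never escaped a second time.

```lua
-- Sketch: single-pass escaping with a lookup table (escape_html is a made-up name).
function escape_html(txt)
    local replacements = {
        ['&']  = '&amp;',
        ['<']  = '&lt;',
        ['>']  = '&gt;',
        ['\n'] = '<br/>',
    }
    -- Each character matched by the class is replaced by replacements[char].
    -- Parentheses drop gsub's second return value (the substitution count).
    return (txt:gsub('[&<>\n]', replacements))
end

print(escape_html('a < b & c\n'))  --> a &lt; b &amp; c<br/>
```

If a matched string has no entry in the table, `gsub` leaves it unchanged, so the character class and the table keys should be kept in sync.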
answered 2011-08-29T21:27:00.620
0

Thanks to jpjacobs for the hint about passing a function to gsub. Here is the complete function plus an example:

--- This function sanitizes an HTML string so that the following characters will be shown
-- correctly when the output is rendered in a browser:
-- & will be replaced by &amp;
-- < will be replaced by &lt;
-- > will be replaced by &gt;
-- \n will be replaced by <br/>
-- in a run of several spaces, every space after the first will be replaced by &nbsp;
-- @param txt the input text which may have HTML formatting characters
-- @return the sanitized HTML code
function sanitize(txt)
    txt=txt:gsub("%&","&amp;")
    txt=txt:gsub("%<","&lt;")
    txt=txt:gsub("%>","&gt;")
    txt=txt:gsub("\n","<br/>")
    txt=txt:gsub("(% +)", function(c) return " "..("&nbsp;"):rep(#c-1) end)
    return txt
end

text=[[    <html>   hello  &bye </html> ]]

print("Text='"..text.."'")
print("sanitize='"..sanitize(text).."'")

Output:

Text='    <html>   hello  &bye </html> '
sanitize=' &nbsp;&nbsp;&nbsp;&lt;html&gt; &nbsp;&nbsp;hello &nbsp;&amp;bye &lt;/html&gt; '
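One thing worth noting about the sequential version is that the order of the calls matters: `&` has to be escaped first, and the space replacement has to come last, otherwise the `&` in an already generated entity gets escaped again. A minimal sketch of the effect, using two hypothetical helper functions that are not part of the code above:

```lua
-- Sketch: why '&' must be escaped before the other characters.
-- Both function names are made up for this illustration.
function escape_wrong_order(txt)
    txt = txt:gsub("<", "&lt;")
    txt = txt:gsub("&", "&amp;")  -- also hits the '&' of the new "&lt;"
    return txt
end

function escape_right_order(txt)
    txt = txt:gsub("&", "&amp;")  -- only escapes '&' from the original input
    txt = txt:gsub("<", "&lt;")
    return txt
end

print(escape_wrong_order("a<b"))  --> a&amp;lt;b
print(escape_right_order("a<b"))  --> a&lt;b
```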
answered 2011-08-29T08:28:49.260