
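I'm writing a MUSHclient plugin in Lua. MUSHclient includes a PCRE module that lets me compile regular expressions with the rex.new function. I'm not sure whether I need it for what I'm trying to do, but I suspect I might, even though I'd rather not.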

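Basically, I want to split a string into a table, using "," or " and " as separators. However, in some cases these "separators" appear inside items that I want to keep whole (e.g. "Felix, the Cat"). Here's what I have so far: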

false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
-- lazily capture each item up to the next "," or " and "
separators = rex.new(" ?(.+?)(?:,| and )")
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
index = 1
matches = {}
separators:gmatch(sample_text, function (m, t)
    -- m is the whole match, t is the table of captures
    for k, v in pairs(t) do
        print(v)
        table.insert(matches, v)
    end
end)

This outputs:

a black
white cat
a tabby cat
a giant cat
Felix
the Cat
an orange

This has two problems. First, the last item isn't included. Second, I haven't figured out how to make use of my false_separators table. The output I want is:

a black and white cat
a tabby cat
a giant cat
Felix, the Cat
an orange and tan cat
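A minimal sketch of one way to get the output above, in plain Lua rather than with rex (so it does not depend on MUSHclient's PCRE module): scan the string one position at a time, skip over a protected phrase whole when one starts at the current position, and close the current item when ", " or " and " starts there. Whatever follows the last separator becomes the final item, which addresses the missing last entry. The function name `split_items` and its parameters are hypothetical, and dropping a single trailing `.` is an assumption made to match the desired output.

```lua
-- Hypothetical helper: split `text` at any of `separators`, but never inside
-- one of the `protected` phrases. All comparisons are plain substring checks.
local function split_items(text, separators, protected)
    -- Assumption: one trailing "." ends the sentence and is not part of the last item.
    text = text:gsub("%.$", "")
    local items, start, pos = {}, 1, 1
    while pos <= #text do
        local skipped = false
        -- If a protected phrase begins here, jump past it so its separators are ignored.
        for _, phrase in ipairs(protected) do
            if text:sub(pos, pos + #phrase - 1) == phrase then
                pos = pos + #phrase
                skipped = true
                break
            end
        end
        if not skipped then
            local matched = false
            -- If a real separator begins here, close the current item.
            for _, sep in ipairs(separators) do
                if text:sub(pos, pos + #sep - 1) == sep then
                    items[#items + 1] = text:sub(start, pos - 1)
                    pos = pos + #sep
                    start = pos
                    matched = true
                    break
                end
            end
            if not matched then
                pos = pos + 1
            end
        end
    end
    -- Whatever follows the last separator is the final item.
    if start <= #text then
        items[#items + 1] = text:sub(start)
    end
    return items
end

local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
local false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
for _, item in ipairs(split_items(sample_text, {", ", " and "}, false_separators)) do
    print(item)
end
```

Because phrases are compared as plain substrings with `string.sub`, characters such as `.` or `%` in a protected phrase need no escaping, and no placeholder character is needed. Matching is case-sensitive, and if two protected phrases start at the same position, the one listed first wins.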

I can get there with a lot of gsub calls, but that looks inelegant and may be exploitable or slow:

false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."

function split_cats(text, false_sep)
    for _, v in ipairs(false_sep) do
        -- replace spaces in false-separator matches with underscores; the extra parentheses
        -- keep only gsub's first result, otherwise its match count would become the 'n' limit
        text = text:gsub(v, (v:gsub(" ", "_")))
    end
    -- protected " and " is now "_and_", so only real " and " separators become ", ";
    -- protected commas are now ",_", so only real ", " become ";", the true delimiter
    text = text:gsub(" and ", ", "):gsub(", ", ";")
    local m = utils.split(text, ";") or {} -- split at semicolons
    for i, v in ipairs(m) do
        m[i] = v:gsub("_", " ") -- turn the underscores back into spaces
    end
    return m
end

table.foreach(split_cats(sample_text, false_separators), print)

Output:

1 a black and white cat
2 a tabby cat
3 a giant cat
4 Felix, the Cat
5 an orange and tan cat.
4
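One concrete way the gsub approach can misbehave: each false separator is passed to gsub as a Lua pattern, and its underscored form as a replacement string. In a pattern, characters such as `.`, `-` and `%` are special, and in a replacement `%` is special, so a protected phrase like "Mr. T-Bone" or "100% cat" would not be matched literally. A minimal sketch of escaping both sides, using hypothetical helper names `escape_pattern`, `escape_replacement` and `protect`:

```lua
-- Hypothetical helpers (not part of MUSHclient): make gsub treat text literally.

-- Prefix every Lua pattern magic character with "%" so it matches itself.
local function escape_pattern(s)
    return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- In a gsub replacement string only "%" is special, so double it.
local function escape_replacement(s)
    return (s:gsub("%%", "%%%%"))
end

-- Protect one false separator by replacing its spaces with underscores,
-- treating both the phrase and its replacement as literal text.
local function protect(text, phrase)
    local protected = (phrase:gsub(" ", "_"))
    return (text:gsub(escape_pattern(phrase), escape_replacement(protected)))
end

print(protect("a Mr. T-Bone and a cat", "Mr. T-Bone")) -- a Mr._T-Bone and a cat
```

Inside split_cats, the loop body could then become `text = protect(text, v)`. This does not fix the other weakness of the placeholder approach: any underscores already present in the input are still turned into spaces at the end.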
