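I'm writing a MUSHclient plugin in Lua. MUSHclient includes a PCRE module, which lets me compile regular expressions with the rex.new function. I'm not sure whether I need it for what I'm trying to do; I suspect I might, although I'd rather avoid it.
Basically, I want to split a string into a table, using either "," or "and" as the separator. However, in some cases these "separators" appear inside items that I want to keep unsplit (e.g. "Felix, the Cat"). Here is what I have so far: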
false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
separators = rex.new(" ?(.+?)(?:,| and )")
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
index = 1
matches = {}
separators:gmatch(sample_text, function (m, t)
    for k, v in pairs(t) do
        print(v)
        table.insert(matches, v)
    end
end)
This outputs:
a black
white cat
a tabby cat
a giant cat
Felix
the Cat
an orange
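There are two problems with this. First, the last item is not included. Second, I haven't figured out how to make use of my false_separators table. The output I want is: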
a black and white cat
a tabby cat
a giant cat
Felix, the Cat
an orange and tan cat
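As far as I can tell, the first problem comes from the pattern requiring a separator after every item: the text after the last " and " is followed only by the end of the string, so it never matches. In PCRE, adding an end-of-string alternative, as in (?:,| and |$), should cover that case. Below is a minimal sketch of the same idea using only Lua's built-in string functions, so it does not depend on rex. The name split_simple is hypothetical, and it deliberately ignores false_separators, so it only addresses the missing last item.

```lua
-- Hypothetical helper: split on ", " or " and " without losing the last item.
-- It does NOT protect phrases such as "Felix, the Cat".
local function split_simple(text)
  local items = {}
  -- Normalise " and " to ", ", then append one trailing separator so that
  -- the final item is also followed by a separator.
  local s = text:gsub(" and ", ", ") .. ", "
  -- Lazily capture everything up to each ", ".
  for item in s:gmatch("(.-), ") do
    items[#items + 1] = item
  end
  return items
end

for i, item in ipairs(split_simple("a tabby cat, a giant cat and a small cat")) do
  print(i, item)
end
```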
I can get there with a lot of gsub calls, but it seems inelegant, and it might be exploitable or slow:
false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
function split_cats(text, false_sep)
    for k, v in ipairs(false_sep) do
        -- replace spaces in false separator matches with underscores;
        -- the extra parentheses drop gsub's second return value (the match count)
        text = text:gsub(v, (v:gsub(" ", "_")))
    end
    -- replace " and " with ", " (protected phrases now contain "_and_" and ",_",
    -- so they are not touched), then replace every ", " with a semicolon;
    -- the semicolon is now the true delimiter
    text = text:gsub(" and ", ", "):gsub(", ", ";")
    local m = utils.split(text, ";") or {} -- split at semicolons
    for i, v in ipairs(m) do
        m[i] = v:gsub("_", " ") -- turn the underscores back into spaces
    end
    return m
end
table.foreach(split_cats(sample_text, false_separators), print)
Output:
1 a black and white cat
2 a tabby cat
3 a giant cat
4 Felix, the Cat
5 an orange and tan cat.
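One alternative to the underscore substitution is a single left-to-right scan that checks, at each position, whether a protected phrase starts there before looking for a separator. The sketch below is only an illustration under a few assumptions: split_protected, phrase_at and separator_at are hypothetical names, the separators are exactly ", " and " and ", and a trailing period on the last item is dropped to match the desired output. It compares plain substrings rather than Lua patterns, so magic characters in a protected phrase are not interpreted, and it needs neither rex nor utils.split.

```lua
-- Hypothetical helper: split `text` on ", " or " and ", but never inside
-- one of the phrases listed in `protected`.
local function split_protected(text, protected)
  local separators = { ", ", " and " }
  local items, current = {}, {}
  local i, n = 1, #text

  -- Return the protected phrase that starts at position `pos`, if any.
  local function phrase_at(pos)
    for _, p in ipairs(protected) do
      if text:sub(pos, pos + #p - 1) == p then return p end
    end
  end

  -- Return the separator that starts at position `pos`, if any.
  local function separator_at(pos)
    for _, sep in ipairs(separators) do
      if text:sub(pos, pos + #sep - 1) == sep then return sep end
    end
  end

  while i <= n do
    local p = phrase_at(i)
    local sep = not p and separator_at(i)
    if p then
      -- Copy the whole protected phrase without looking inside it.
      current[#current + 1] = p
      i = i + #p
    elseif sep then
      -- A real separator: close the current item and start a new one.
      items[#items + 1] = table.concat(current)
      current = {}
      i = i + #sep
    else
      -- Ordinary character: append it to the current item.
      current[#current + 1] = text:sub(i, i)
      i = i + 1
    end
  end

  -- Add the final item, dropping one trailing period if present.
  items[#items + 1] = (table.concat(current):gsub("%.$", ""))
  return items
end

local false_separators = { "Felix, the Cat", "orange and tan cat", "black and white cat" }
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
for i, item in ipairs(split_protected(sample_text, false_separators)) do
  print(i, item)
end
```

Protected phrases are tried in list order, so if one phrase is a prefix of another, the longer one would need to come first. Adjacent separators such as ", and " would produce an empty item, which this sketch does not filter out.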