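Is there a shlex alternative for Java? I'd like to split quote-delimited strings the way a shell would process them. For example, if I send:
one two "three four"
and perform a split, I'd like to receive the tokens:
one
two
three four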
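I had a similar problem today, and none of the standard options such as StringTokenizer, StrTokenizer, or Scanner looked like a good fit. However, the basics are not hard to implement.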
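To illustrate why a plain whitespace tokenizer falls short, here is a minimal sketch using the JDK's java.util.StringTokenizer. The class name NaiveSplitDemo and the helper naiveSplit are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class NaiveSplitDemo {
    // Splits on whitespace only; quote characters get no special treatment.
    static List<String> naiveSplit(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The quoted phrase is broken in two and the quotes stay in the tokens.
        System.out.println(naiveSplit("one two \"three four\""));
        // prints [one, two, "three, four"]
    }
}
```

The result has four tokens instead of three, which is the behavior a shell-style splitter has to avoid.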
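The following example handles all the edge cases currently mentioned in comments on the other answers. Be warned that I haven't checked it for full POSIX compliance yet. A Gist including unit tests is available on GitHub, released into the public domain via the Unlicense.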
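import java.util.ArrayList;
import java.util.List;

public List<String> shellSplit(CharSequence string) {
    List<String> tokens = new ArrayList<String>();
    boolean escaping = false;                     // previous character was an active backslash
    char quoteChar = ' ';                         // quote character that opened the current quoted run
    boolean quoting = false;                      // currently inside '...' or "..."
    int lastCloseQuoteIndex = Integer.MIN_VALUE;  // lets an empty "" still yield a token
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char c = string.charAt(i);
        if (escaping) {
            // Take the escaped character literally.
            current.append(c);
            escaping = false;
        } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
            // Backslash escapes the next character, except inside single quotes.
            escaping = true;
        } else if (quoting && c == quoteChar) {
            // Closing quote.
            quoting = false;
            lastCloseQuoteIndex = i;
        } else if (!quoting && (c == '\'' || c == '"')) {
            // Opening quote.
            quoting = true;
            quoteChar = c;
        } else if (!quoting && Character.isWhitespace(c)) {
            // Unquoted whitespace ends the current token, if there is one.
            if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                tokens.add(current.toString());
                current = new StringBuilder();
            }
        } else {
            current.append(c);
        }
    }
    // Flush the last token, including a trailing empty quoted string.
    if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
        tokens.add(current.toString());
    }
    return tokens;
}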
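As a rough check of the edge cases, here is a small usage sketch. The wrapper class ShellSplitDemo is hypothetical, and the method body is repeated from the answer only so that the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

final class ShellSplitDemo {
    // Same state machine as shellSplit in the answer, repeated for a standalone build.
    static List<String> shellSplit(CharSequence s) {
        List<String> tokens = new ArrayList<>();
        boolean escaping = false;
        boolean quoting = false;
        char quoteChar = ' ';
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (escaping) {
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                escaping = true;
            } else if (quoting && c == quoteChar) {
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                if (current.length() > 0 || lastCloseQuoteIndex == i - 1) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || lastCloseQuoteIndex == s.length() - 1) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Quoted phrase becomes one token.
        System.out.println(shellSplit("one two \"three four\""));    // [one, two, three four]
        // An empty pair of quotes still produces an (empty) token.
        System.out.println(shellSplit("a \"\" b").size());             // 3
        // Backslash-escaped quotes inside double quotes are kept.
        System.out.println(shellSplit("say \"he said \\\"hi\\\"\""));  // [say, he said "hi"]
        // Backslash is literal inside single quotes; adjacent quoted and bare text join.
        System.out.println(shellSplit("'a\\b' foo\"bar baz\"qux"));    // [a\b, foobar bazqux]
    }
}
```

Note that a backslash inside double quotes escapes any following character here, which is looser than what POSIX shells do.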
org.apache.commons.lang.text.StrTokenizer should be able to do what you want:
new StrTokenizer("one two \"three four\"", ' ', '"').getTokenArray();
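If adding a library dependency is not an option, the JDK's java.io.StreamTokenizer can be configured to give a similar result. This is only a sketch of a different technique, not the Commons Lang API; the class name StreamTokenizerSplit and the method split are invented for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class StreamTokenizerSplit {
    static List<String> split(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();            // drop the default number and comment parsing
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        st.wordChars('!', 255);      // every other character is part of a word
        st.quoteChar('"');           // double-quoted strings become one token
        st.quoteChar('\'');          // same for single-quoted strings
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                tokens.add(st.sval); // set for both word and quoted tokens
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("one two \"three four\""));
        // prints [one, two, three four]
    }
}
```

This is not shell-compatible: StreamTokenizer interprets C-style escapes such as \n inside quotes, ends a quoted string at a line break, and splits foo"bar" into two tokens rather than joining them.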
I've had success with fastparse using the following Scala code. I can't guarantee that it is complete:
val kvParser = {
  import fastparse._
  import NoWhitespace._
  def nonQuoteChar[_:P] = P(CharPred(_ != '"'))
  def quotedQuote[_:P] = P("\\\"")
  // Try the escaped quote first; otherwise nonQuoteChar consumes the backslash
  // and the following '"' closes the string too early.
  def quotedElement[_:P] = P(quotedQuote | nonQuoteChar)
  def quotedContent[_:P] = P(quotedElement.rep)
  def quotedString[_:P] = P("\"" ~/ quotedContent.! ~ "\"")
  def alpha[_:P] = P(CharIn("a-zA-Z"))
  def digit[_:P] = P(CharIn("0-9"))
  def hyphen[_:P] = P("-")
  def underscore[_:P] = P("_")
  def bareStringChar[_:P] = P(alpha | digit | hyphen | underscore)
  def bareString[_:P] = P(bareStringChar.rep.!)
  def string[_:P] = P(quotedString | bareString)
  // A key=value pair, where each side is a quoted or bare string.
  def kvPair[_:P] = P(string ~ "=" ~ string)
  def commaAndSpace[_:P] = P(CharIn(" \t\n\r").rep ~ "," ~ CharIn(" \t\n\r").rep)
  def kvPairList[_:P] = P(kvPair.rep(sep = commaAndSpace))
  def fullLang[_:P] = P(kvPairList ~ End)
  def res(str: String) = {
    parse(str, fullLang(_))
  }
  res _
}