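I have a set of picocli-based applications whose usage output I'd like to parse into structured data. So far I've written three different output parsers, but I'm not happy with any of them (fragility, complexity, difficulty of extension, and so on). Any ideas on how to cleanly parse this kind of semi-structured output?
The usage output typically looks like this: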
Usage: taker-mvo-2 [-hV] [-C=file] [-E=file] [-p=payoffs] [-s=millis] PENALTY
(ASSET SPREAD)...
Submits liquidity-taking orders based on mean-variance optimization of multiple
assets.
PENALTY risk penalty for payoff variance
(ASSET SPREAD)... Spread for creating market above fundamental value
for assets
-C, --credential=file credential file
-E, --endpoint=file marketplace endpoint file
-h, --help display this help message
-p, --payoffs=payoffs payoff states and probabilities (default: .fm/payoffs)
-s, --sleep=millis sleep milliseconds before acting (default: 2000)
-V, --version print product version and exit
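I'd like to capture the program name and description, the options, the parameters, and the parameter groups, along with their descriptions, into an Agent: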
public class Agent {
private String name;
private String description = "";
private List<Option> options;
private List<Parameter> parameters;
private List<ParameterGroup> parameterGroups;
}
The program name is taker-mvo-2, and the (possibly multi-line) description comes after the (possibly multi-line) argument list:
Submits liquidity-taking orders based on mean-variance optimization of multiple assets.
The options (in square brackets) should be parsed into:
public class Option {
private String shortName;
private String parameter;
private String longName;
private String description;
}
The JSON for the parsed options would be:
options: [ {
"shortName": "h",
"parameter": null,
"longName": "help",
"description": "display this help message"
}, {
"shortName": "V",
"parameter": null,
"longName": "version",
"description": "print product version and exit"
}, {
"shortName": "C",
"parameter": "file",
"longName": "credential",
"description": "credential file"
}, {
"shortName": "E",
"parameter": "file",
"longName": "endpoint",
"description": "marketplace endpoint file"
}, {
"shortName": "p",
"parameter": "payoffs",
"longName": "payoffs",
"description": "payoff states and probabilities (default: ~/.fm/payoffs)"
}]
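For illustration, here is a minimal sketch of how one option line from the usage output might be mapped onto those four fields with a single regular expression. The class name OptionLineParser, the record ParsedOption, and the pattern itself are hypothetical and not part of picocli. The sketch assumes every option line has the form "-x, --long[=param] description", and it does not handle options that have only a long name or descriptions that wrap onto a second line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of picocli.
final class OptionLineParser {

    // Mirrors the fields of the Option class in the question.
    record ParsedOption(String shortName, String parameter, String longName, String description) {}

    // Assumed line shape: "-x, --long[=param] description"
    // group 1 = short name, group 2 = long name,
    // group 3 = parameter label (absent for flags), group 4 = description
    private static final Pattern OPTION_LINE = Pattern.compile(
            "^\\s*-(\\w),\\s+--([\\w-]+)(?:=(\\S+))?\\s+(.*)$");

    // Returns empty when the line is not an option line (for example a parameter line).
    static Optional<ParsedOption> parse(String line) {
        Matcher m = OPTION_LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        // m.group(3) is null when the option takes no parameter.
        return Optional.of(new ParsedOption(
                m.group(1), m.group(3), m.group(2), m.group(4).trim()));
    }
}
```

Calling OptionLineParser.parse on each line of the usage text and keeping the non-empty results would yield the option list. Lines such as the PENALTY description simply fail to match and can be passed on to a separate parameter matcher.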
Similarly, the parameters should be parsed into:
public class Parameter {
private String name;
private String description;
}
and the parameter groups, which are enclosed in ( and )..., should be parsed into:
public class ParameterGroup {
private List<String> parameters;
private String description;
}
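As a rough sketch of how the synopsis line could be split into the program name, the plain parameters, and the parameter groups, a small regex tokenizer might look like the following. The names SynopsisParser and Synopsis are hypothetical. The sketch assumes that bracketed tokens are always options, that groups are written as "(A B)...", and that brackets and parentheses are never nested, which may not hold for every picocli synopsis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of picocli.
final class SynopsisParser {

    record Synopsis(String name, List<String> parameters, List<List<String>> groups) {}

    // Alternatives are tried left to right at each position:
    //   [ ... ]                 bracketed option, no capture group
    //   ( ... ) optionally ...  parameter group, contents in group 1
    //   any other word          program name or plain parameter, group 2
    private static final Pattern TOKEN = Pattern.compile(
            "\\[[^\\]]*\\]|\\(([^)]*)\\)(?:\\.\\.\\.)?|(\\S+)");

    static Synopsis parse(String synopsis) {
        // Drop the leading "Usage:" label; line breaks count as ordinary whitespace.
        String text = synopsis.replaceFirst("^\\s*Usage:\\s*", "");
        String name = null;
        List<String> parameters = new ArrayList<>();
        List<List<String>> groups = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                // Parameter group: split its contents on whitespace.
                groups.add(List.of(m.group(1).trim().split("\\s+")));
            } else if (m.group(2) != null) {
                // The first bare word is the program name, later ones are parameters.
                if (name == null) {
                    name = m.group(2);
                } else {
                    parameters.add(m.group(2));
                }
            }
            // Bracketed options are skipped; their details come from the option lines.
        }
        return new Synopsis(name, parameters, groups);
    }
}
```

Applied to the two synopsis lines shown earlier, joined with a newline, this would give taker-mvo-2 as the name, PENALTY as the only plain parameter, and one group containing ASSET and SPREAD. The descriptions would still have to be matched up from the lines that follow the synopsis.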
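The first hand-written parser walks the buffer and captures data as it goes. It works fine, but it looks horrible and is horrible to extend. The second hand-written parser uses regular expressions while walking the buffer. It looks nicer than the first, but it is still ugly and hard to extend. The third parser uses regular expressions. It is probably the best-looking of the bunch, but still ugly and hard to manage.
I thought parsing this text by hand would be simple enough, but now I'm wondering whether ANTLR might be a better tool. Any thoughts or alternative ideas?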