c++ - 如何在编译时解析 DSL 的文本？

Question

是的。这是正确的。我希望能够粘贴如下表达式：

"a && b || c"

直接作为字符串进入源代码：

const std::string expression_text("a && b || c");

用它创建一个惰性求值结构：

Expr expr(magical_function(expression_text));

然后稍后评估替换已知值：

evaluate(expr, a, b, c);

我想稍后扩展这个小 DSL，所以使用一些非 C++ 语法做一些更复杂的事情，所以我不能简单地用简单的方式硬编码我的表达式。用例是我将能够从另一个模块中复制和粘贴相同的逻辑，该模块用于另一种语言的不同开发领域，而不是每次都必须调整它以遵循 C++ 语法。

如果有人可以让我至少开始如何执行上述简单的 1 个表达式和 2 个布尔运算符的概念，那将不胜感激。

注意：由于我发布的另一个问题的反馈，我发布了这个问题：如何将 DSL 输入解析为高性能表达式模板。在这里，我实际上想要一个稍微不同的问题的答案，但是评论引发了这个我认为值得发布的特定问题，因为潜在的答案确实值得记录。

score 49 · Accepted Answer

免责声明：我对元解析一无所知，对原型也知之甚少。以下代码是我尝试（主要通过反复试验）修改此示例以执行与您想要的类似的操作。

代码可以很容易地分为几个部分：

1. 语法

1.1代币定义

typedef token < lit_c < 'a' > > arg1_token;
typedef token < lit_c < 'b' > > arg2_token;
typedef token < lit_c < 'c' > > arg3_token;

token<Parser>:
token 是一个解析器组合器，用于Parser解析输入，然后使用（并丢弃）所有空格。解析的结果是Parser.
lit_c<char>:
lit_c 匹配特定char的并且解析的结果是相同的字符。在语法中，这个结果被always.

typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;

keyword<metaparse_string,result_type=undefined>:
关键字匹配特定的metaparse_string（_S("true")返回 metaparse::string<'t','r','u','e'>Metaparse 在内部使用它来发挥它的魔力）并且解析的结果是result_type.

typedef token < keyword < _S ( "&&" ) > > and_token;
typedef token < keyword < _S ( "||" ) > > or_token;
typedef token < lit_c < '!' > > not_token;

在 and 的情况下，and_token结果or_token是未定义的，并且在下面的语法中它被忽略。

1.2语法的“规则”

struct paren_exp;

首先paren_exp是前向声明。

typedef one_of< 
        paren_exp, 
        transform<true_token, build_value>,
        transform<false_token, build_value>, 
        always<arg1_token, arg<0> >,
        always<arg2_token, arg<1> >, 
        always<arg3_token, arg<2> > 
    >
    value_exp;

one_of<Parsers...>:
one_of 是一个解析器组合器，它尝试将输入与其参数之一匹配。结果是匹配的第一个解析器返回的结果。
transform<Parser,SemanticAction>:
transform 是匹配的解析器组合器Parser。结果类型是Parser转换后的结果类型SemanticAction。

always<Parser,NewResultType>:
匹配Parser，返回NewResultType。

等效的精神规则是：

value_exp = paren_exp [ _val=_1 ]
    | true_token      [ _val=build_value(_1) ]
    | false_token     [ _val=build_value(_1) ]
    | argN_token      [ _val=phx::construct<arg<N>>() ];

typedef one_of< 
        transform<last_of<not_token, value_exp>, build_not>, 
        value_exp
    >
    not_exp;

last_of<Parsers...>:
last_of 匹配顺序中的每一个，Parsers其结果类型是最后一个解析器的结果类型。

等效的精神规则是：
```
not_exp = (omit[not_token] >> value_exp) [ _val=build_not(_1) ] 
    | value_exp                          [ _val=_1 ];
```

typedef
foldl_start_with_parser<
        last_of<and_token, not_exp>,
        not_exp,
        build_and
    > and_exp; // and_exp = not_exp >> *(omit[and_token] >> not_exp);

typedef
foldl_start_with_parser<
    last_of<or_token, and_exp>,
    and_exp,
    build_or
> or_exp;     // or_exp = and_exp >> *(omit[or_token] >> and_exp);

foldl_start_with_parser<RepeatingParser,InitialParser,SemanticAction>:
这个解析器组合器匹配InitialParser然后RepeatingParser多次匹配直到它失败。结果类型是的结果mpl::fold<RepeatingParserSequence, InitialParserResult, SemanticAction>，其中RepeatingParserSequence是每个应用程序的结果类型的序列RepeatingParser。如果RepeatingParser从未成功，则结果类型为 simple InitialParserResult。

我相信（xd）等效的精神规则是：
```
or_exp = and_exp[_a=_1] 
    >> *( omit[or_token] >> and_exp [ _val = build_or(_1,_a), _a = _val ]);  
```

struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; 
   // paren_exp = '(' >> or_exp >> ')';

middle_of<Parsers...>：
这匹配的序列Parsers和结果类型是在中间的解析器的结果。

typedef last_of<repeated<space>, or_exp> expression; 
   //expression = omit[*space] >> or_exp;

repeated<Parser>:
这个解析器组合器尝试匹配Parser多次。结果是解析器的每个应用程序的结果类型的序列，如果解析器第一次尝试失败，则结果是一个空序列。此规则只是删除任何前导空格。

typedef build_parser<entire_input<expression> > function_parser;

这一行创建了一个接受输入字符串并返回解析结果的元函数。

2. 表达式的构建

让我们看一个构建表达式的示例演练。这分两步完成：首先，语法构造一个依赖于build_or、build_and、build_value和build_not的树arg<N>。获得该类型后，您可以使用proto_typetypedef 获得原型表达式。

“一个||！b”

我们开始or_expr：

or_expr: 我们试试它的 InitialParser，它是and_expr.
- and_expr: 我们试试它的 InitialParser，它是not_expr.
- - not_expr: not_token 失败，所以我们尝试value_expr。
- - - value_expr: arg1_token 成功。返回类型是arg<0>，我们回到not_expr.
- - not_expr：此步骤不修改返回类型。我们回到and_expr.
- and_expr: 我们尝试了它的RepeatingParser，它失败了。and_expr 成功，它的返回类型是它的 InitialParser: 的返回类型arg<0>。我们回到or_expr.
- or_expr: 我们试试它的 RepeatingParser，or_token 匹配，我们试试and_expr。
- and_expr: 我们试试它的 InitialParser not_expr。
- - not_expr: not_token 成功了，我们试试value_expr。
- - - value_expr: arg2_token 成功。返回类型是arg<1>，我们回到not_expr.
- - not_expr：返回类型通过使用 build_not: build_not::apply< arg<1> >进行转换。我们回到and_expr.
- and_expr: 我们尝试了它的RepeatingParser，它失败了。and_expr 成功并返回build_not::apply< arg<1> >。我们回到or_expr.
or_expr：RepeatingParser 成功，foldlp 使用 build_orbuild_not::apply< arg<1> >和arg<0>，获得build_or::apply< build_not::apply< arg<1> >, arg<0> >.

一旦我们构建了这棵树，我们就会得到它proto_type：

build_or::apply< build_not::apply< arg<1> >, arg<0> >::proto_type;
proto::logical_or< arg<0>::proto_type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, build_not::apply< arg<1> >::proto_type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< arg<1>::proto_type >::type >::type;
proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< proto::terminal< placeholder<1> >::type >::type >::type;

完整示例代码（在 Wandbox 上运行）

#include <iostream>
#include <vector>

#include <boost/metaparse/repeated.hpp>
#include <boost/metaparse/sequence.hpp>
#include <boost/metaparse/lit_c.hpp>
#include <boost/metaparse/last_of.hpp>
#include <boost/metaparse/middle_of.hpp>
#include <boost/metaparse/space.hpp>
#include <boost/metaparse/foldl_start_with_parser.hpp>
#include <boost/metaparse/one_of.hpp>
#include <boost/metaparse/token.hpp>
#include <boost/metaparse/entire_input.hpp>
#include <boost/metaparse/string.hpp>
#include <boost/metaparse/transform.hpp>
#include <boost/metaparse/always.hpp>
#include <boost/metaparse/build_parser.hpp>
#include <boost/metaparse/keyword.hpp>

#include <boost/mpl/apply_wrap.hpp>
#include <boost/mpl/front.hpp>
#include <boost/mpl/back.hpp>
#include <boost/mpl/bool.hpp>

#include <boost/proto/proto.hpp>
#include <boost/fusion/include/at.hpp>
#include <boost/fusion/include/make_vector.hpp>

using boost::metaparse::sequence;
using boost::metaparse::lit_c;
using boost::metaparse::last_of;
using boost::metaparse::middle_of;
using boost::metaparse::space;
using boost::metaparse::repeated;
using boost::metaparse::build_parser;
using boost::metaparse::foldl_start_with_parser;
using boost::metaparse::one_of;
using boost::metaparse::token;
using boost::metaparse::entire_input;
using boost::metaparse::transform;
using boost::metaparse::always;
using boost::metaparse::keyword;

using boost::mpl::apply_wrap1;
using boost::mpl::front;
using boost::mpl::back;
using boost::mpl::bool_;


struct build_or
{
    typedef build_or type;

    template <class C, class State>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_or<typename State::proto_type, typename C::proto_type >::type proto_type;
    };
};

struct build_and
{
    typedef build_and type;

    template <class C, class State>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_and<typename State::proto_type, typename C::proto_type >::type proto_type;
    };
};



template<bool I>
struct value //helper struct that will be used during the evaluation in the proto context
{};

struct build_value
{
    typedef build_value type;

    template <class V>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::terminal<value<V::type::value> >::type proto_type;
    };
};

struct build_not
{
    typedef build_not type;

    template <class V>
    struct apply
    {
        typedef apply type;
        typedef typename boost::proto::logical_not<typename V::proto_type >::type proto_type;
    };
};

template<int I>
struct placeholder //helper struct that will be used during the evaluation in the proto context
{};

template<int I>
struct arg
{
    typedef arg type;
    typedef typename boost::proto::terminal<placeholder<I> >::type proto_type;
};

#ifdef _S
#error _S already defined
#endif
#define _S BOOST_METAPARSE_STRING

typedef token < keyword < _S ( "&&" ) > > and_token;
typedef token < keyword < _S ( "||" ) > > or_token;
typedef token < lit_c < '!' > > not_token;

typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;

typedef token < lit_c < 'a' > > arg1_token;
typedef token < lit_c < 'b' > > arg2_token;
typedef token < lit_c < 'c' > > arg3_token;


struct paren_exp;

typedef
one_of< paren_exp, transform<true_token, build_value>, transform<false_token, build_value>, always<arg1_token, arg<0> >, always<arg2_token, arg<1> >, always<arg3_token, arg<2> > >
value_exp; //value_exp = paren_exp | true_token | false_token | arg1_token | arg2_token | arg3_token;

typedef
one_of< transform<last_of<not_token, value_exp>, build_not>, value_exp>
not_exp; //not_exp = (omit[not_token] >> value_exp) | value_exp;

typedef
foldl_start_with_parser <
last_of<and_token, not_exp>,
         not_exp,
         build_and
         >
         and_exp; // and_exp = not_exp >> *(and_token >> not_exp);

typedef
foldl_start_with_parser <
last_of<or_token, and_exp>,
         and_exp,
         build_or
         >
         or_exp; // or_exp = and_exp >> *(or_token >> and_exp);

struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; //paren_exp = lit('(') >> or_exp >> lit('(');

typedef last_of<repeated<space>, or_exp> expression; //expression = omit[*space] >> or_exp;

typedef build_parser<entire_input<expression> > function_parser;


template <typename Args>
struct calculator_context
        : boost::proto::callable_context< calculator_context<Args> const >
{
    calculator_context ( const Args& args ) : args_ ( args ) {}
    // Values to replace the placeholders
    const Args& args_;

    // Define the result type of the calculator.
    // (This makes the calculator_context "callable".)
    typedef bool result_type;

    // Handle the placeholders:
    template<int I>
    bool operator() ( boost::proto::tag::terminal, placeholder<I> ) const
    {
        return boost::fusion::at_c<I> ( args_ );
    }

    template<bool I>
    bool operator() ( boost::proto::tag::terminal, value<I> ) const
    {
        return I;
    }
};

template <typename Args>
calculator_context<Args> make_context ( const Args& args )
{
    return calculator_context<Args> ( args );
}

template <typename Expr, typename ... Args>
int evaluate ( const Expr& expr, const Args& ... args )
{
    return boost::proto::eval ( expr, make_context ( boost::fusion::make_vector ( args... ) ) );
}

#ifdef LAMBDA
#error LAMBDA already defined
#endif
#define LAMBDA(exp) apply_wrap1<function_parser, _S(exp)>::type::proto_type{}

int main()
{
    using std::cout;
    using std::endl;

    cout << evaluate ( LAMBDA ( "true&&false" ) ) << endl;
    cout << evaluate ( LAMBDA ( "true&&a" ), false ) << endl;
    cout << evaluate ( LAMBDA ( "true&&a" ), true ) << endl;
    cout << evaluate ( LAMBDA ( "a&&b" ), true, false ) << endl;
    cout << evaluate ( LAMBDA ( "a&&(b||c)" ), true, false, true ) << endl;
    cout << evaluate ( LAMBDA ( "!a&&(false||(b&&!c||false))" ), false, true, false ) << endl;
}

/*int main(int argc , char** argv)
{
    using std::cout;
    using std::endl;

    bool a=false, b=false, c=false;

    if(argc==4)
    {
        a=(argv[1][0]=='1');
        b=(argv[2][0]=='1');
        c=(argv[3][0]=='1');
    }

    LAMBDA("a && b || c") expr;

    cout << evaluate(expr, true, true, false) << endl;
    cout << evaluate(expr, a, b, c) << endl;

    return 0;
}*/

score 4 · Accepted Answer

长期以来，编译时解析意味着使用模板元编程——对于大多数初学者甚至中级 C++ 程序员来说，这似乎是一种干扰。

但是，在 C++11 中，我们得到了 constexpr，而在 C++14 中，对 constexpr 的许多限制都被删除了。C++17 甚至使一些标准库的东西 constexpr。

在尝试学习先进的现代 C++ 时——我决定编写一个编译时 HTML 解析器——这个想法是创建一个快速的 HTML 模板引擎。

完整的代码可以在这里找到：https ://github.com/rep-movsd/see-phit

我将简要解释一下我在实现这一点时学到的东西。

处理动态数据结构

我需要解析一个 const char* 并将其转换为多路树 - 但动态分配在 constexpr 领域是一个禁忌。

解决方案？使用一个节点数组，索引指向子节点和兄弟节点——本质上你可以在 FORTRAN 中如何做！

需要注意的是，您的节点列表最初必须具有固定大小。保持它非常大似乎会使 gcc 大大减慢编译速度。如果最终超出数组的末尾，编译器将抛出错误。我写了一个小的 std::array 像完全 constexpr 的包装器。

解析

您为运行时解析编写的几乎所有标准代码都将在编译时工作！循环、递归、条件 - 一切都很完美。

一个问题是 - 如何表示字符串？使用上面提到的方法 - 一个字符数组 - 是非常消耗内存，乏味的做事方式。幸运的是，我所需要的只是原始 const char* 输入的子字符串。所以我写了一个小的 constexpr string_view 类，它只保存指向相关解析标记的开始和结束的指针。创建新的文字字符串只是将这些视图变成 const char* 文字。

错误报告

处理 constexpr 代码中的错误的基本方法是调用不是 constexpr 的函数 - 编译器停止并打印违规行，该行很容易包含错误字符串。

但是我想要更多——我希望解析器也显示行和列。我挣扎了一会儿，最终认为这是不可能的。但我又回到了它，尝试了我能想到的所有可能的事情。最后我找到了一种让 gcc 打印 2 个数字和错误描述消息的方法。它本质上涉及创建一个带有两个整数参数（row 和 col）的模板，其值来自 constexpr 解析器。

表现

对于哪种 constexpr 代码往往会降低编译器的速度，我无法清楚地找到任何模式，但默认性能一点也不差。我能够在 gcc 上在大约 1.5 秒内解析一个 1000 个节点的 HTML 文件。

铿锵声有点快。

我打算在 github repo 的 wiki 中更详细地描述代码的工作原理 - 请继续关注。

score 0 · Accepted Answer

虽然在技术上可以使用模板或 constexpr 进行这种元编程，但我不推荐这种方法。你最终会得到大量非常复杂的代码。很难调试，维护和扩展成本很高。

相反，使用任何其他语言从您的表达式生成 C++ 代码。

如果您使用的是 Visual Studio，那么一种好的内置方式是 T4 文本模板。这里有更多细节。

否则，请使用您平台上可用的任何其他语言，例如我在 Python 中做了类似的事情。

c++ - 如何在编译时解析 DSL 的文本？

3 回答 3

1. 语法

1.1代币定义

1.2语法的“规则”

2. 表达式的构建

完整示例代码（在 Wandbox 上运行）

Related

Reference