parsing - Boost::Spirit: Case-Insensitive String Comparison of Token Text in a Semantic Action

Question

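I have a tokeniser and a parser. There is one special token type, KEYWORD, used for keywords (there are around 50 of them). In my parser I want to make sure that each token is the one I expect, so I have a rule for every keyword, like this: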

KW_A = tok.KEYWORDS[_pass = (_1 == "A")];
KW_B = tok.KEYWORDS[_pass = (_1 == "B")];
KW_C = tok.KEYWORDS[_pass = (_1 == "C")];

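This works well enough, but the comparison is case-sensitive, whereas the grammar I need to handle is not. I would like to use boost::iequals, but attempting to convert _1 to std::string produces the following error: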

error: no viable conversion from 'const _1_type' (aka 'const actor<argument<0> >') to 'std::string' (aka 'basic_string<char>')

How can I treat these keywords as strings and check that they contain the expected text regardless of case?
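For reference, the comparison being asked for is simple once the token text is available as a plain string. The following is a minimal standard-library sketch, not Spirit code: the name `iequals_ascii` is hypothetical, and it only approximates what boost::iequals does for plain ASCII text in the default "C" locale.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if a and b are equal when ASCII case is ignored.
inline bool iequals_ascii(const std::string& a, const std::string& b)
{
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               // Cast through unsigned char before calling std::toupper;
               // passing a negative char value is undefined behaviour.
               return std::toupper(static_cast<unsigned char>(x)) ==
                      std::toupper(static_cast<unsigned char>(y));
           });
}

// Usage example: keyword text matches regardless of case.
inline void demo_iequals()
{
    assert(iequals_ascii("select", "SELECT"));
    assert(!iequals_ascii("select", "from"));
}
```

The difficulty in the question is not the comparison itself but that _1 is a lazy Phoenix placeholder, so it cannot be passed directly to an ordinary function such as this one.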

score 2 · Accepted Answer

A little learning goes a long way. I added the following to my lexer:

struct normalise_keyword_impl
{
    template <typename Value>
    struct result
    {
        typedef void type;
    };

    template <typename Value>
    void operator()(Value const& val) const
    {
        // This modifies the original input string.
        typedef boost::iterator_range<std::string::iterator> iterpair_type;
        iterpair_type const& ip = boost::get<iterpair_type>(val);
        std::for_each(ip.begin(), ip.end(),
            [](char& in)
            {
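                // Cast through unsigned char: std::toupper is undefined for negative values.
                in = static_cast<char>(std::toupper(static_cast<unsigned char>(in)));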
            });
    }
};

    boost::phoenix::function<normalise_keyword_impl> normalise_keyword;

    // The rest...
};
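Stripped of the Spirit and Phoenix machinery, the in-place approach above amounts to the following. This is only a sketch using the standard library; `upper_in_place` and `demo_in_place` are hypothetical names, and the iterators stand in for the pair held by the token's iterator_range.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases the characters in [first, last) in place.
// Requires mutable iterators, which is why the lexer had to use
// std::string::iterator rather than const_iterator.
template <typename Iterator>
void upper_in_place(Iterator first, Iterator last)
{
    std::transform(first, last, first, [](char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    });
}

// Usage example: the keyword is normalised, but the caller's buffer changes too.
inline std::string demo_in_place()
{
    std::string input = "select a from b";
    upper_in_place(input.begin(), input.begin() + 6);
    assert(input == "SELECT a from b");
    return input;
}
```

The side effect on `input` is exactly the drawback of this version: the original input sequence is overwritten.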

I then used phoenix to bind the action to the keyword token in my constructor, like so:

this->self =
    KEYWORD [normalise_keyword(_val)]
    // The rest...
    ;

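While this achieves what I was after, it modifies the original input sequence. Is there some modification I could make so that I can use const_iterator instead of iterator and avoid modifying my input sequence?

I tried returning a std::string copied from ip.begin() to ip.end(), uppercased with boost::to_upper(...), and assigning it to _val. Although it compiled and ran, there were clearly some problems with what it produced: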

Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
result is SELECT
Token: 0: KEYWORD ('KEYWOR')
Token: 1: REGULAR_IDENTIFIER ('a')
result is FROM
Token: 0: KEYWORD ('KEYW')
Token: 1: REGULAR_IDENTIFIER ('b')

Strange. It seems I have more learning to do.

Final solution

OK, I ended up using this function:

struct normalise_keyword_impl
{
    template <typename Value>
    struct result
    {
        typedef std::string type;
    };

    template <typename Value>
    std::string operator()(Value const& val) const
    {
        // Copy the token and update the attribute value.
        typedef boost::iterator_range<std::string::const_iterator> iterpair_type;
        iterpair_type const& ip = boost::get<iterpair_type>(val);

        auto result = std::string(ip.begin(), ip.end());
        result = boost::to_upper_copy(result);
        return result;
    }
};
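The copying approach can likewise be sketched without Spirit. The helper below is an illustration only (`upper_copy` and `demo_copy` are hypothetical names, and it uses the standard library rather than boost::to_upper_copy): it reads through const iterators and returns a new string, so the input is never written to.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns an upper-cased copy of [first, last).
// Only reads through the iterators, so const_iterator is sufficient.
template <typename ConstIterator>
std::string upper_copy(ConstIterator first, ConstIterator last)
{
    std::string result;
    std::transform(first, last, std::back_inserter(result), [](char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    });
    return result;
}

// Usage example: the keyword is normalised and the source text is untouched.
inline std::string demo_copy()
{
    const std::string input = "select a from b";
    const std::string keyword = upper_copy(input.begin(), input.begin() + 6);
    assert(keyword == "SELECT");
    assert(input == "select a from b");
    return keyword;
}
```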

And this semantic action:

KEYWORD [_val = normalise_keyword(_val)]

This uses a modified token_type (which is what sorted things out):

typedef std::string::const_iterator base_iterator;
typedef boost::spirit::lex::lexertl::token<base_iterator, boost::mpl::vector<std::string> > token_type;
typedef boost::spirit::lex::lexertl::actor_lexer<token_type> lexer_type;
typedef type_system::Tokens<lexer_type> tokens_type;
typedef tokens_type::iterator_type iterator_type;
typedef type_system::Grammar<iterator_type> grammar_type;

// Establish our lexer and our parser.
tokens_type lexer;
grammar_type parser(lexer);

// ...

The important addition is boost::mpl::vector<std::string> >. The result:

Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
Token: 0: KEYWORD ('SELECT')
Token: 1: REGULAR_IDENTIFIER ('a')
Token: 0: KEYWORD ('FROM')
Token: 1: REGULAR_IDENTIFIER ('b')

I don't know why this fixed the problem, so if anyone can chime in with their expertise, I am a willing student.
