c++ - How can I use Boost::Spirit::Lex to lex a file without first reading the whole file into memory?

Question

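I'm looking at writing a lexer with boost::spirit::lex, but all the examples I can find seem to assume that you have already read the entire file into RAM. I'd like to write a lexer that doesn't require the whole string to be in RAM. Is that possible, or do I need to use something else?

I tried using istream_iterator, but boost gives me a compile error unless I use const char* as the iterator type.

For example, all the examples I can find basically do this: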

lex_functor_type< lex::lexertl::lexer<> > lex_functor;

// assumes entire file is in memory
char const* first = str.c_str();
char const* last = &first[str.size()];

bool r = lex::tokenize(first, last, lex_functor, 
    boost::bind(lex_callback_functor(), _1, ... ));

Also, is it possible to determine line/column numbers from lex tokens somehow?

Thanks!

score 6 · Accepted Answer

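Spirit Lex works with any iterator, as long as it conforms to the requirements of a standard forward iterator. That means you can feed the lexer (by calling lex::tokenize()) with any conforming iterator. A plain std::istream_iterator is only an input iterator (single pass), so it does not meet that requirement. If you want to read from a std::istream, you can wrap it in a boost::spirit::istream_iterator instead: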

bool tokenize(std::istream& is, ...)
{
    lex_functor_type< lex::lexertl::lexer<> > lex_functor;

    boost::spirit::istream_iterator first(is);
    boost::spirit::istream_iterator last;

    return lex::tokenize(first, last, lex_functor,
        boost::bind (lex_callback_functor(), _1, ... ));   
}

That should work.
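One detail worth checking when lexing from a stream: iterators that read through formatted extraction (`operator>>`) skip whitespace unless the stream's `skipws` flag is cleared. The sketch below uses only the standard library and `std::istream_iterator<char>` to show the effect. It assumes the same flag matters for the Spirit wrapper in your setup, which you should verify; `read_all` is a hypothetical helper, not part of Boost.

```cpp
#include <cassert>
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: collects every character the stream iterator yields.
// When keep_whitespace is true, the skipws flag is cleared first so that
// spaces and newlines are delivered instead of being silently dropped.
std::string read_all(std::istream& is, bool keep_whitespace)
{
    if (keep_whitespace)
        is.unsetf(std::ios::skipws);

    std::istream_iterator<char> first(is);
    std::istream_iterator<char> last;  // default-constructed: end of stream
    return std::string(first, last);
}
```

If the lexer has token definitions for whitespace or newlines, calling `is.unsetf(std::ios::skipws)` before constructing the iterators is probably what you want.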

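For the second part of your question (the line/column number of the input): yes, it is possible to track the input position with the lexer. It's not trivial, though. You need to create your own token type that stores the line/column information and use it instead of the predefined token type. Many people have been asking for this, so I might just go ahead and create an example.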
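To make the idea concrete, here is a minimal standard-library sketch of a token that carries its starting line and column. All names (`source_position`, `positioned_token`, `advance_position`, `make_token`) are hypothetical and are not Spirit.Lex types. How such a token is plugged into the lexer is left open here, so treat this as an illustration of the bookkeeping only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical position record; lines and columns are 1-based.
struct source_position {
    std::size_t line = 1;
    std::size_t column = 1;
};

// Hypothetical token type that remembers where the token started.
struct positioned_token {
    std::size_t id;         // token id as assigned by the lexer
    std::string text;       // matched characters
    source_position start;  // position of the first matched character
};

// Moves pos past the characters in text, resetting the column on newlines.
inline void advance_position(source_position& pos, const std::string& text)
{
    for (char c : text) {
        if (c == '\n') {
            ++pos.line;
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
}

// Builds a token at the current position, then advances the position.
inline positioned_token make_token(std::size_t id, const std::string& text,
                                   source_position& pos)
{
    positioned_token tok{id, text, pos};
    advance_position(pos, text);
    return tok;
}
```

Every matched token, including whitespace and newlines, has to pass through `make_token`. Otherwise the running position drifts away from the real input.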
