In an application, I basically want to have a "pre-parsing" phase where I adjust the token stream before a Qi parser can see it.
One way to do this would be to have some kind of "lexer adaptor" which is constructed from a lexer and is itself a lexer, which wraps and modifies the behavior of the inner lexer. However, it would be simpler and easier to debug if instead I just lex the entire input stream with the inner lexer first and store the results in a std::vector<token_type>, then modify as desired, then pass the result to the parser. (In my application I don't think that there would even be any performance concern with this.)
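The three-phase plan described above (lex everything into a vector, adjust the buffer, hand iterators to the parser) can be sketched without Spirit at all. Everything in this block is a hypothetical stand-in introduced for illustration: Token, lex_all, drop_blanks and parse_identifier are not part of Boost.Spirit, and the "parser" is a toy that only shows that the consumer sees nothing but a pair of token iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token type for illustration; NOT lex::lexertl::token.
struct Token {
    enum Kind { Alpha, Num, Blank, Other };
    Kind kind;
    char text;
};

// Phase 1: lex the entire input eagerly into a buffer (one token per
// character, loosely mirroring the single-character tokens in this post).
std::vector<Token> lex_all(const std::string& input) {
    std::vector<Token> out;
    out.reserve(input.size());
    for (char c : input) {
        const unsigned char u = static_cast<unsigned char>(c);
        Token::Kind k = Token::Other;
        if (std::isalpha(u))                         k = Token::Alpha;
        else if (std::isdigit(u))                    k = Token::Num;
        else if (c == ' ' || c == '\t' || c == '\r') k = Token::Blank;
        out.push_back(Token{k, c});
    }
    assert(out.size() == input.size());  // exactly one token per character
    return out;
}

// Phase 2: the "pre-parsing" adjustment; here it simply drops blank tokens.
void drop_blanks(std::vector<Token>& toks) {
    toks.erase(std::remove_if(toks.begin(), toks.end(),
                              [](const Token& t) { return t.kind == Token::Blank; }),
               toks.end());
}

// Phase 3: a stand-in "parser" that only ever sees a pair of token iterators.
// It accepts one Alpha token followed by any number of Alpha or Num tokens.
template <typename It>
bool parse_identifier(It first, It last) {
    if (first == last || first->kind != Token::Alpha) return false;
    for (++first; first != last; ++first) {
        if (first->kind != Token::Alpha && first->kind != Token::Num) return false;
    }
    return true;
}
```

A caller would run lex_all on the input, apply drop_blanks (or any other adjustment) to the resulting vector, and then pass toks.cbegin() and toks.cend() to parse_identifier. Whether Qi can consume a buffer this directly is exactly what the rest of this question is about.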
In an email exchange from a few years back, someone described exactly this question and Hartmut said that it should be trivial. http://comments.gmane.org/gmane.comp.parsers.spirit.general/24899
However, I didn't find any code examples or instructions on how to do this beyond "look at the headers in spirit::lex and figure it out." That will likely occupy me for quite a while unless you, dear reader, can assist.
The specific question is: how can I make a "shim" lexer which wraps a pair of std::vector<token_type>::iterator's and looks to spirit::qi just like a standard spirit::lex lexer?
Edit: To be clear, this is not a duplicate of this question: Using Boost.Spirit.Qi with custom lexer
My token_types are attributed, and the details of the extra things that Hartmut says I need to do are the substance of this question.
Edit: Okay, I made an SSCCE. This version does not have attributed lexer tokens, but even without that I still can't get it to work, and this seems like as good an SSCCE as any to get started with.
Highlights:
"Token buffer" type:
template<typename TokenType>
struct token_buffer {
    std::vector<TokenType> tokens_;
    token_buffer() = default;
    // Callback for the tokenizer: store each token, return true to continue.
    bool operator()(TokenType t) {
        tokens_.push_back(t);
        return true;
    }
    void print(std::ostream & o) const { ... }
};
My first attempt at making a "buffer lexer" which looks like a lex::lexer to Qi, but in fact serves tokens from a buffer. This one derives from lex_basic (defined in the complete source below); I'm not sure if that's correct.
template<typename LexerType>
class buffer_lexer : public lex_basic<LexerType> {
public:
    typedef std::vector<token_type> buff_type;
    typedef typename buff_type::const_iterator iterator_type;
private:
    const buff_type & buff_;
public:
    buffer_lexer(const buff_type & b) : lex_basic<LexerType>(), buff_(b) {}
    iterator_type begin() const { return buff_.begin(); }
    iterator_type end() const { return buff_.end(); }
    // for consistency with regular lexer `begin` signature, not sure if this is needed
    template<typename T>
    iterator_type begin(T, T) { return begin(); }
};
My second attempt at making a buffer lexer. This one does not derive from lex_basic and instead tries to follow these instructions found in the header boost/spirit/home/lex/lexer/lexertl/lexer.hpp:
///////////////////////////////////////////////////////////////////////////
//
//  Every lexer type to be used as a lexer for Spirit has to conform to
//  the following public interface:
//
//    typedefs:
//        iterator_type   The type of the iterator exposed by this lexer.
//        token_type      The type of the tokens returned from the exposed
//                        iterators.
//
//    functions:
//        default constructor
//                        Since lexers are instantiated as base classes
//                        only it might be a good idea to make this
//                        constructor protected.
//        begin, end      Return a pair of iterators, when dereferenced
//                        returning the sequence of tokens recognized in
//                        the input stream given as the parameters to the
//                        begin() function.
//        add_token       Should add the definition of a token to be
//                        recognized by this lexer.
//        clear           Should delete all current token definitions
//                        associated with the given state of this lexer
//                        object.
//
//    template parameters:
//        Iterator        The type of the iterator used to access the
//                        underlying character stream.
//        Token           The type of the tokens to be returned from the
//                        exposed token iterator.
//        Functor         The type of the InputPolicy to use to instantiate
//                        the multi_pass iterator type to be used as the
//                        token iterator (returned from begin()/end()).
//
///////////////////////////////////////////////////////////////////////////
Here's the "buffer_lexer_raw" that I came up with:
template<typename Iterator,
         typename TokenType,
         typename Functor = lex::lexertl::functor<TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
public:
    // The header comment above asks for these typedefs to be public.
    typedef TokenType token_type;
    typedef std::vector<token_type> buff_type;
    typedef typename buff_type::const_iterator iterator_type;
    typedef typename boost::detail::iterator_traits<typename token_type::iterator_type>::value_type char_type;
private:
    buff_type buff_;
public:
    buffer_lexer_raw() {}
    void set_buffer(const buff_type & b) { buff_ = b; }
    iterator_type begin() const { return buff_.begin(); }
    iterator_type end() const { return buff_.end(); }
    // for consistency with regular lexer `begin` signature, not sure if this is needed
    template<typename T>
    iterator_type begin(T, T) { return begin(); }
    // Stub: token definitions are ignored because tokens come from the buffer.
    std::size_t add_token(char_type const* state, char_type tokendef,
                          std::size_t token_id, char_type const* targetstate)
    {
        return 1;
    }
    // Stub: there are no token definitions to delete.
    void clear(char_type const* state) {}
};
The test code responds to a macro defined at the top of the file.
// Use the type "buffer_lexer" which derives from lex_basic<Lexer>
//#define WHICH_LEXER_TYPE 1
// Use the type "buffer_lexer_raw" which does not derive from anything
//#define WHICH_LEXER_TYPE 2
// Use the "placebo" lexer, which is just lex_basic<Lexer>, as a sanity test of our lex:: api calls
#define WHICH_LEXER_TYPE 0
The test code will:
- Run the lexer on a simple test case and make a detailed dump of the lexed token sequence.
- Run the lexer and grammar in tandem on a few simple test cases using lex::tokenize_and_parse, and dump the resulting AST.
- Try lexing and parsing again, using the lexer selected by the macro to generate iterators for use with qi::parse. It will check that the resulting AST is the same as the AST generated the "easy" way.
Currently the #define WHICH_LEXER_TYPE 0 option compiles and works great for me with both gcc-4.8 and clang-3.6.
I can't actually get it to compile with the #define WHICH_LEXER_TYPE 1 or #define WHICH_LEXER_TYPE 2 options. With type 1, clang gives the following error message, which I don't have the foggiest idea about:
In file included from main.cpp:1:
In file included from /usr/include/boost/spirit/include/lex_lexertl.hpp:16:
In file included from /usr/include/boost/spirit/home/lex/lexer_lexertl.hpp:15:
In file included from /usr/include/boost/spirit/home/lex.hpp:13:
In file included from /usr/include/boost/spirit/home/lex/lexer.hpp:14:
In file included from /usr/include/boost/spirit/home/lex/lexer/token_def.hpp:21:
In file included from /usr/include/boost/spirit/home/lex/reference.hpp:16:
/usr/include/boost/spirit/home/qi/reference.hpp:43:30: error: no matching member function for call to 'parse'
return ref.get().parse(first, last, context, skipper, attr);
~~~~~~~~~~^~~~~
/usr/include/boost/spirit/home/qi/parse.hpp:86:42: note: in instantiation of function template specialization 'boost::spirit::qi::reference<const
boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const
char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > >, ast::Body (),
boost::spirit::locals<std::basic_string<char>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
boost::spirit::unused_type, boost::spirit::unused_type> >::parse<__gnu_cxx::__normal_iterator<const
boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >, boost::spirit::context<boost::fusion::cons<ast::Body &, boost::fusion::nil>,
boost::spirit::locals<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> >, boost::spirit::unused_type,
ast::Body>' requested here
return compile<qi::domain>(expr).parse(first, last, context, unused, attr);
^
main.cpp:414:12: note: in instantiation of function template specialization 'boost::spirit::qi::parse<__gnu_cxx::__normal_iterator<const
boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >,
basic_grammar<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const
char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > > >, ast::Body>' requested here
if (!qi::parse(it, fin, bgram, tree2)) {
^
/usr/include/boost/spirit/home/qi/nonterminal/rule.hpp:273:14: note: candidate function [with Context = boost::spirit::context<boost::fusion::cons<ast::Body &,
boost::fusion::nil>, boost::spirit::locals<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> >, Skipper =
boost::spirit::unused_type, Attribute = ast::Body] not viable: no known conversion from '__gnu_cxx::__normal_iterator<const
boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >' to
'boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *,
std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > > &' for 1st argument
bool parse(Iterator& first, Iterator const& last
^
/usr/include/boost/spirit/home/qi/nonterminal/rule.hpp:319:14: note: candidate function template not viable: requires 6 arguments, but 5 were provided
bool parse(Iterator& first, Iterator const& last
^
1 error generated.
The "2" option gives essentially the same error message. gcc doesn't seem to give a better error message.
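In the clang output above, the rejected candidate is rule::parse(Iterator& first, Iterator const& last), and the note reports no known conversion from the std::vector iterator to lexertl::iterator<...>& for the first argument. One possible reading, offered as an assumption and not verified against Spirit, is that basic_grammar<iterator_type> was instantiated with the lexer's iterator type while qi::parse received the buffer's iterators. The sketch below reproduces that shape with hypothetical stand-ins (Token, StreamIter, ToyGrammar, parse_buffer) and no Boost; it is not Spirit code.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Spirit types.
struct Token { int id; };

// Stand-in for the iterator type a streaming lexer would expose.
struct StreamIter {
    const Token* p;
    const Token& operator*() const { return *p; }
    StreamIter& operator++() { ++p; return *this; }
    bool operator==(const StreamIter& o) const { return p == o.p; }
    bool operator!=(const StreamIter& o) const { return p != o.p; }
};

// As with the rule::parse candidate in the error output, parse() takes exactly
// the Iterator type the template was instantiated with.
template <typename Iterator>
struct ToyGrammar {
    typedef Iterator iterator_type;
    // Matches one or more tokens with id == 1, advancing `first` past them.
    bool parse(Iterator& first, const Iterator& last) const {
        if (first == last || (*first).id != 1) return false;
        while (first != last && (*first).id == 1) ++first;
        return true;
    }
};

typedef std::vector<Token>::const_iterator BufferIter;

// Instantiated for the streaming iterator: passing a BufferIter to parse()
// would be rejected by the compiler, much like the diagnostic above.
typedef ToyGrammar<StreamIter> StreamGrammar;
// Instantiated for the buffer's iterator: parse() accepts a BufferIter.
typedef ToyGrammar<BufferIter> BufferGrammar;

static_assert(!std::is_same<StreamGrammar::iterator_type, BufferIter>::value,
              "streaming grammar does not take buffer iterators");
static_assert(std::is_same<BufferGrammar::iterator_type, BufferIter>::value,
              "buffer grammar takes buffer iterators");

// Runs the buffer-iterator grammar over a token vector and reports whether it
// matched and consumed every token.
bool parse_buffer(const std::vector<Token>& toks) {
    BufferIter it = toks.begin();
    const BufferIter fin = toks.end();
    return BufferGrammar().parse(it, fin) && it == fin;
}
```

If this reading is right, the grammar in the test code would have to be instantiated with the buffer's iterator type instead of the lexer's iterator_type. Whether Spirit's token parsers then work on plain vector iterators is not something this sketch shows.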
Here's the complete source code:
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>
#include <boost/variant/get.hpp>
#include <boost/variant/variant.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <boost/preprocessor/stringize.hpp>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
typedef unsigned int uint;
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace mpl = boost::mpl;
// Use the type "buffer_lexer" which derives from lex_basic<Lexer>
//#define WHICH_LEXER_TYPE 1
// Use the type "buffer_lexer_raw" which does not derive from anything
//#define WHICH_LEXER_TYPE 2
// Use the "placebo" lexer, which is just lex_basic<Lexer>, as a sanity test of
// our lex:: api calls
#define WHICH_LEXER_TYPE 0
//// Lexer definition
enum tokenids {
LCARET = lex::min_token_id + 10,
RCARET,
BSLASH,
LBRACE,
RBRACE,
LPAREN,
RPAREN,
EQUALS,
USCORE,
ALPHA,
NUM,
EOL,
BLANK,
IDANY
};
#define TOKEN_CASE(X) \
case X: return #X
const char *token_id_string(size_t id) {
switch (id) {
TOKEN_CASE(LCARET);
TOKEN_CASE(RCARET);
TOKEN_CASE(BSLASH);
TOKEN_CASE(LBRACE);
TOKEN_CASE(RBRACE);
TOKEN_CASE(LPAREN);
TOKEN_CASE(RPAREN);
TOKEN_CASE(EQUALS);
TOKEN_CASE(USCORE);
TOKEN_CASE(ALPHA);
TOKEN_CASE(NUM);
TOKEN_CASE(EOL);
TOKEN_CASE(BLANK);
TOKEN_CASE(IDANY);
default:
return "Unknown token";
}
}
template <typename Lexer> struct lex_basic : lex::lexer<Lexer> {
lex_basic() {
this->self.add
('<', LCARET)
('>', RCARET)
('/', BSLASH)
('{', LBRACE)
('}', RBRACE)
('(', LPAREN)
(')', RPAREN)
('=', EQUALS)
('_', USCORE)
("[A-Za-z]", ALPHA)
("[0-9]", NUM)
('\n', EOL)
("[ \\t\\r]", BLANK)
(".", IDANY);
}
};
typedef std::string::const_iterator str_it;
// the token type needs to know the iterator type of the underlying
// input and the set of used token value types
typedef lex::lexertl::token<str_it, mpl::vector<char>> token_type;
template <typename TokenType> struct token_buffer {
std::vector<TokenType> tokens_;
token_buffer() = default;
bool operator()(TokenType t) {
tokens_.push_back(t);
return true;
}
void print(std::ostream &o) const {
o << "tokens_.size() == " << tokens_.size() << std::endl;
for (size_t i = 0; i < tokens_.size(); ++i) {
const TokenType &t = tokens_[i];
o << "[" << i << "]: -" << token_id_string(t.id()) << "- \"" << t
<< "\" [";
const auto &v = t.value();
if (t.id() == EOL) {
o << "\\n";
} else {
o << v;
}
o << "]" << std::endl;
}
}
};
/***
* Lexers which serve tokens from a buffer
*/
// Two versions of the same thing, one deriving from lex::lexer, one not
template <typename LexerType> class buffer_lexer : public lex_basic<LexerType> {
public:
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator iterator_type;
private:
const buff_type &buff_;
public:
buffer_lexer(const buff_type &b) : lex_basic<LexerType>(), buff_(b) {}
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
};
template <typename Iterator, typename TokenType,
typename Functor = lex::lexertl::functor<
TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
public:
typedef TokenType token_type;
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator iterator_type;
typedef typename boost::detail::iterator_traits<
typename token_type::iterator_type>::value_type char_type;
private:
buff_type buff_;
public:
buffer_lexer_raw() {}
void set_buffer(const buff_type &b) { buff_ = b; }
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
std::size_t add_token(char_type const *state, char_type tokendef,
std::size_t token_id, char_type const *targetstate) {
return 1;
}
void clear(char_type const *state) {}
};
/***
* AST
*/
namespace ast {
typedef std::string Str;
struct BraceExpr;
typedef boost::variant<Str, boost::recursive_wrapper<BraceExpr>> BraceExprArg;
struct BraceExpr {
std::vector<BraceExprArg> args;
};
typedef std::pair<Str, Str> Pair;
struct Body;
typedef boost::variant<Pair, BraceExpr, boost::recursive_wrapper<Body>> Node;
struct Body {
Str key;
std::vector<Node> nodes;
};
} // end namespace ast
BOOST_FUSION_ADAPT_STRUCT(ast::BraceExpr,
(std::vector<ast::BraceExprArg>, args))
BOOST_FUSION_ADAPT_STRUCT(ast::Body,
(ast::Str, key)(std::vector<ast::Node>, nodes))
namespace ast {
// Stream ops
class printer : public boost::static_visitor<> {
std::ostream &ss_;
uint indent_;
std::string indent(uint extra = 0) const {
return std::string(indent_ + extra, ' ');
}
std::string indent_plus_tab() const { return indent(tab_width); }
public:
static constexpr uint tab_width = 4;
explicit printer(std::ostream &s, uint indent = 0)
: ss_(s), indent_(indent) {}
void operator()(const Str &s) const { ss_ << s; }
void operator()(const BraceExpr &b) const {
ss_ << "{";
for (size_t i = 0; i < b.args.size(); ++i) {
if (i) {
ss_ << " ";
}
boost::apply_visitor(*this, b.args[i]);
}
ss_ << "}";
}
void operator()(const Pair &p) const { ss_ << p.first << " = " << p.second; }
void operator()(const Body &b) const {
ss_ << indent() << "<" << b.key << ">\n";
printer p{ss_, indent_ + tab_width};
for (const auto &n : b.nodes) {
ss_ << indent_plus_tab();
boost::apply_visitor(p, n);
ss_ << "\n";
}
ss_ << indent() << "</" << b.key << ">";
}
};
std::ostream &operator<<(std::ostream &ss, const BraceExpr &b) {
printer p{ss};
p(b);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Pair &p) {
printer pr{ss};
pr(p);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Body &b) {
printer p{ss};
p(b);
return ss;
}
// Equality ops
bool operator==(const Pair &p1, const Pair &p2) {
return p1.first == p2.first && p1.second == p2.second;
}
bool operator==(const BraceExpr &b1, const BraceExpr &b2) {
return b1.args == b2.args;
}
bool operator==(const Body &b1, const Body &b2) {
return b1.key == b2.key && b1.nodes == b2.nodes;
}
bool operator!=(const Pair &p1, const Pair &p2) { return !(p1 == p2); }
bool operator!=(const BraceExpr &b1, const BraceExpr &b2) {
return !(b1 == b2);
}
bool operator!=(const Body &b1, const Body &b2) { return !(b1 == b2); }
} // end namespace ast
/***
* Grammar
*/
template <typename Iterator>
struct basic_grammar
: qi::grammar<Iterator, ast::Body(), qi::locals<ast::Str>> {
qi::rule<Iterator, ast::Body(), qi::locals<ast::Str>> body;
qi::rule<Iterator, ast::Node()> node;
qi::rule<Iterator, ast::Pair()> pair;
qi::rule<Iterator, ast::BraceExprArg()> brace_expr_arg;
qi::rule<Iterator, ast::BraceExpr()> brace_expr;
qi::rule<Iterator, ast::Str()> identifier;
qi::rule<Iterator, ast::Str()> str;
qi::rule<Iterator, ast::Str()> open_tag;
qi::rule<Iterator /*, ast::Str()*/> close_tag;
qi::rule<Iterator> lbrace;
qi::rule<Iterator> rbrace;
qi::rule<Iterator> equals;
qi::rule<Iterator> ws;
template <typename TokenDef>
basic_grammar(const TokenDef &tok)
: basic_grammar::base_type(body, "body") {
using namespace qi;
ws %= token(BLANK) | token(EOL);
lbrace %= token(LBRACE);
rbrace %= token(RBRACE);
equals %= token(EQUALS);
identifier %= token(ALPHA) >> *(token(ALPHA) | token(NUM) | token(USCORE));
str %= *(token(LCARET) | token(RCARET) | token(BSLASH) | token(LPAREN) |
token(RPAREN) | token(ALPHA) | token(NUM) | token(USCORE) |
token(EQUALS) | token(BLANK) | token(IDANY));
open_tag %= omit[token(LCARET)] >> identifier >>
omit[token(RCARET)]; // tok.open_tag;
close_tag %= omit[token(LCARET) >> token(BSLASH)] >> identifier >>
omit[token(RCARET)]; // tok.close_tag;
pair = skip(boost::proto::deep_copy(ws))[identifier >> equals >> str];
body = skip(boost::proto::deep_copy(ws))[open_tag >> *node >> close_tag];
node = brace_expr | body | pair;
brace_expr_arg = brace_expr | identifier;
brace_expr =
skip(boost::proto::deep_copy(ws))[lbrace >> *brace_expr_arg >> rbrace];
}
};
/***
* Usage / Tests
*/
// use actor_lexer<> here if your token definitions have semantic
// actions
typedef lex::lexertl::lexer<token_type> lexer_type;
// this is the iterator exposed by the lexer, we use this for parsing
typedef lexer_type::iterator_type iterator_type;
token_buffer<token_type> test_lexer(const std::string &input,
bool silent = false) {
str_it s = input.begin();
str_it end = input.end();
// create a lexer instance
lex_basic<lexer_type> lex;
token_buffer<token_type> buff;
if (!lex::tokenize(s, end, lex, [&](token_type t) { return buff(t); })) {
if (!silent) {
std::cout << "\nTokenizing failed!" << std::endl;
}
} else {
if (!silent) {
std::cout << "\nTokenizing succeeded!" << std::endl;
}
}
if (!silent) {
buff.print(std::cout);
}
return buff;
}
void test_grammar(const std::string &input) {
lex_basic<lexer_type> lex;
basic_grammar<iterator_type> gram{lex};
ast::Body tree;
{
str_it s = input.begin();
str_it end = input.end();
if (!lex::tokenize_and_parse(s, end, lex, gram, tree)) {
std::cout << "\nParsing failed!" << std::endl;
} else {
std::cout << "\nParsing succeeded!" << std::endl;
}
std::cout << tree << std::endl;
}
// Now try to do it in two steps, with buffered lexer
auto buff = test_lexer(input, true); // get buffer, silence output
#if WHICH_LEXER_TYPE == 1
buffer_lexer<lexer_type> blex{buff.tokens_};
#else
#if WHICH_LEXER_TYPE == 2
buffer_lexer_raw<str_it, token_type> blex;
blex.set_buffer(buff.tokens_);
#else
lex_basic<lexer_type> blex;
#endif
#endif
basic_grammar<iterator_type> bgram{blex};
ast::Body tree2;
{
#if (WHICH_LEXER_TYPE == 1) || (WHICH_LEXER_TYPE == 2)
auto it = blex.begin();
#else
str_it s = input.begin();
str_it end = input.end();
auto it = blex.begin(s, end);
#endif
auto fin = blex.end();
if (!qi::parse(it, fin, bgram, tree2)) {
std::cout << "\nBuffered parsing failed!" << std::endl;
} else {
std::cout << "\nBuffered parsing succeeded!" << std::endl;
}
}
std::cout << tree2 << std::endl;
if (tree != tree2) {
std::cout << "\nRegular parsing vs. buffered parsing mismatch!"
<< std::endl;
}
}
int main() {
std::string input{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"</asdf>\n"};
test_lexer(input);
// Use lexer and grammar at once as demonstrated in tutorials
std::string input2 = "<asdf></asdf>";
test_grammar(input2);
test_grammar(input);
std::string input3{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"<jkl>\n"
"baz = gaz\n"
"{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}\n"
"</jkl>\n"
"</asdf>\n"};
test_grammar(input3);
return 0;
}