c++ - Spirit-Qi: How can I write a nonterminal parser? -


i want write parser (as qi extension) can used via my_parser(p1, p2, ...) p1, p2, ... qi parser expressions.

actually, want implement best_matchparser works qi alternative, selects not first matching rule rule 'explains' of input.

given 2 rules simple_id = +(qi::alpha) , complex_id = simple_id >> *(qi::string("::") > simple_id) select complex_id on input willy::anton. , not produce intermediate attributes while doing so. these benefits payed runtime because lookahead parsing required.

in view there several use cases such parser construct. benchmark(p1, ...) supporting parser optimization, 1 example. provide that, once know how that.

this parser non-terminal. tried (hard), can not find answers problem within qi. guess is, c++ mechanisms tightly integrated withc qi, there no understandable entry point, @ least me.

it quite easy implement want introducing operator. append current solution below. compiles expected incomplete.

a qi operator gets fusion::vector/sequence (whatever) ready made , acts on it. there seem no library offerings solve problem. make_nary_composite expects args being compiled elements.

i tried lot, failed, don't want bore that.

a workaround can imagine provide stateful operator ,, make p1, ... legal qi expression. implement unary_parser best_match or directive process expression. , 2 modes: getting length of input current (successful) parser consumes , parsing selected 1 phase 1. wrapper run comma parser first in mode 1 , in mode 2. might ugly, work.

the operator implementation p1 |= p2 |= ... expensive regarding runtime n > 2. happy around that.

minimum (but nevertheless reasonable) overhead have best_match(p1, ...). possible @ all?

for not experienced boost::fusion added questions code (how tos regarding fusion).

it feels wrong, press nary non-terminal parser unary parser scheme. poor understanding seems way done. i'd highly appreciate me out of this.

my environment: boost 1.61, msvc 2015 upd 2, target win32console.

progress:

i think, i'm getting surely. happy see error_invalid_expression. compile failed because of following proto expression:

boost::proto::exprns_::expr<     boost::proto::tagns_::tag::function,boost::proto::argsns_::list5<         const boost::proto::exprns_::expr<             boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<                 mxc::qitoo::tag::best_match             >         ,0> &         ,const boost::spirit::qi::rule<             iterator_type,std::string (void),             boost::spirit::standard::space_type,             boost::spirit::unused_type,boost::spirit::unused_type> &         ,const boost::spirit::qi::rule<             iterator_type,std::string (void),             boost::spirit::standard::space_type,             boost::spirit::unused_type,             boost::spirit::unused_type> &         , const boost::spirit::qi::rule<             iterator_type,std::string (void),             boost::spirit::standard::space_type,             boost::spirit::unused_type,             boost::spirit::unused_type> &         ,const boost::spirit::qi::rule<             iterator_type,std::string (void),             boost::spirit::standard::space_type,             boost::spirit::unused_type,             boost::spirit::unused_type> &         >     ,5> 

this expression describes function-style usage of best_match parser. test rule looks like

qitoo::best_match(id, qualified_id, id, qualified_id) 

where id , qualified_id rules mentioned above. want. error occurs because expression not have member parse. okay. digging deeper found qi's meta-compiler support unary, binary , nary (what means variadic) parsers. not support function style syntax. qi seems utilize unary, binary , nary operators only. , allthough nary type supported, more or less obsolete, because qi operators binary max.

end progress edit

#include <boost/spirit/home/qi.hpp>  namespace boost {     namespace spirit     {         ///////////////////////////////////////////////////////////////////////////         // enablers         ///////////////////////////////////////////////////////////////////////////         template <>         struct use_operator<qi::domain, proto::tag::bitwise_or_assign> // enables |=             : mpl::true_ {};          template <>         struct flatten_tree<qi::domain, proto::tag::bitwise_or_assign> // flattens |=             : mpl::true_ {};     } }  namespace mxc {     namespace qitoo {          namespace spirit = boost::spirit;         namespace qi = spirit::qi;         namespace fusion = boost::fusion;          template <typename elements>         struct best_match_parser : qi::nary_parser<mxc::qitoo::best_match_parser<elements>>         {             // 1 lookahead parse of each qi expression find out rule matches              // of input              template <typename iterator, typename context, typename skipper>             struct best_function             {                 best_function(iterator& first, iterator const& last, context& context,                     skipper const& skipper, int& best, int& idx, int& size)                     : first(first), last(last), context(context), skipper(skipper), best(best),                     idx(idx), size(size) {};                  template <typename component>                 void operator()(component& component) const                 {                     iterator f = first;                     if (component.parse(f, last, context, skipper, qi::unused)) {                         // please have on how transport best, idx , size                         // parser context                         //                         // guess, nasty hack , done better                         // advice highliy appreciated                          int l = f - first;                         if (l > size) {                             size = l;                             best = idx++;                         }                         idx++;                     }                 }              private:                 int& best;                 int& idx;                 int& size;                  iterator&       first;                 iterator const& last;                 context&        context;                 skipper const&  skipper;             };              template <typename context, typename iterator>             struct attribute             {                 // put element attributes in tuple                 typedef typename spirit::traits::build_attribute_sequence <                     elements, context, spirit::traits::alternative_attribute_transform                     , iterator, qi::domain                 >::type all_attributes;                  // ok, make variant on attribute sequence. note                 // build_variant makes sure 1) attributes in variant                 // unique 2) puts unused attribute, if there any,                 // front , 3) collapses single element variants, variant<t>                 // t.                 typedef typename                     spirit::traits::build_variant<all_attributes>::type                     type;             };              best_match_parser(elements const& elements_) : elements(elements_), size(0), idx(0), best(-1) {}               template <typename iterator, typename context, typename skipper, typename attribute>             bool parse(iterator& first, iterator const& last                 , context& context, skipper const& skipper                 , attribute& attr_) const             {                 best_function<iterator, context, skipper>  f(first, last, context, skipper, best, idx, size);                  // find out parser matches of input                 fusion::for_each(elements, f);                  // best >= 0 if @ least 1 parser successful                 if (best >= 0) {                     // have best parser identified, how access it?                     // understand next line can not work, i'm looking                      // --> auto v = fusion::at<boost::mpl::int_<best>>(elements);                 };                  return false;             }              template <typename context>             qi::info what(context& context) const             {                 qi::info result("best_match");                 fusion::for_each(elements,                     spirit::detail::what_function<context>(result, context));                  return result;             }              elements elements;             mutable int  best;             mutable int  idx;             mutable int  size;          };     } }  namespace boost {     namespace spirit {         namespace qi {              ///////////////////////////////////////////////////////////////////////////             // parser generators: make_xxx function (objects)             ///////////////////////////////////////////////////////////////////////////             template <typename elements, typename modifiers>             struct make_composite<proto::tag::bitwise_or_assign, elements, modifiers>                 : make_nary_composite < elements, mxc::qitoo::best_match_parser >             {};         }          namespace traits {             ///////////////////////////////////////////////////////////////////////////             template <typename elements>             struct has_semantic_action<mxc::qitoo::best_match_parser<elements> >                 : nary_has_semantic_action<elements> {};              ///////////////////////////////////////////////////////////////////////////             template <typename elements, typename attribute, typename context                 , typename iterator>                 struct handles_container<mxc::qitoo::best_match_parser<elements>, attribute, context                 , iterator>                 : nary_handles_container<elements, attribute, context, iterator> {};         }     } } namespace qi = boost::spirit::qi; namespace qitoo = mxc::qitoo;  using iterator_type = std::string::const_iterator; using result_type = std::string;  template<typename parser> void parse(const std::string message, const std::string& input, const std::string& rule, const parser& parser) {     iterator_type iter = input.begin(), end = input.end();      std::vector<result_type> parsed_result;      std::cout << "-------------------------\n";     std::cout << message << "\n";     std::cout << "rule: " << rule << std::endl;     std::cout << "parsing: \"" << input << "\"\n";      bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);     if (result)     {         std::cout << "parser succeeded.\n";         std::cout << "parsed " << parsed_result.size() << " elements:";         (const auto& str : parsed_result)             std::cout << "[" << str << "]";         std::cout << std::endl;     }     else     {         std::cout << "parser failed" << std::endl;     }     if (iter == end) {         std::cout << "eoi reached." << std::endl;     }     else {         std::cout << "eoi not reached. unparsed: \"" << std::string(iter, end) << "\"" << std::endl;     }     std::cout << "-------------------------\n";  }   int main() {     namespace qi = boost::spirit::qi;      qi::rule < iterator_type, std::string(), qi::space_type>         id = (qi::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'));      qi::rule < iterator_type, std::string(), qi::space_type>         qualified_id = id >> *(qi::string("::") > id);       namespace qitoo = mxc::qitoo;     namespace qi = boost::spirit::qi;      parse("best match operator, select second rule"         , "willy::anton::lutz"         , "id |= qualified_id"         , id |= qualified_id); } 

your example

i don't see how sample requires of this. reorder branches, realize short branch special case of qualified case n=1: live on coliru¹ (or using x3 version if prefer).

the generic case

now, having mentioned x3, has capacity make live easier!

here's think wanted, in general case:

namespace parser {      template <typename... parsers>     struct longest_parser : x3::parser_base {         longest_parser(parsers... sub) : _alternatives {sub...} { }          template <typename it, typename ctx, typename other, typename attr>         bool parse(it& f, l, ctx& ctx, other const& other, attr& attr) const {             auto const saved = f;              //// exclude pre-skip length comparisons, pre-skip here:             // x3::skip_over(f, l, ctx);             auto seq = std::make_index_sequence<sizeof...(parsers)>();              auto best = select_best(f, l, ctx, seq);             //std::cout << "longest match @ index #" << best << "\n";              bool ok = dispatch(f, l, ctx, other, attr, best, seq);              if (!ok)                 f = saved;              return ok;         }        private:         template <typename it, typename ctx, typename p>         size_t length_of(it f, l, ctx ctx, p const& p) const {             boost::iterator_range<it> matched;             return x3::raw[p].parse(f, l, ctx, x3::unused, matched)? boost::size(matched) : 0;         }          template <typename it, typename ctx, size_t... i>             size_t select_best(it f, l, ctx& ctx, std::index_sequence<i...>) const {                 std::array<size_t, sizeof...(i)> lengths { length_of(f, l, ctx, std::get<i>(_alternatives))... };                 return std::distance(lengths.begin(), std::max_element(lengths.begin(), lengths.end()));             }          template <typename it, typename ctx, typename other, typename attr, size_t... i>         bool dispatch(it& f, l, ctx& ctx, other const& other, attr& attr, size_t targetidx, std::index_sequence<i...>) const {             //return (real_parse<i>(f, l, ctx, other, attr, targetidx) || ...);             std::array<bool, sizeof...(i)> b = { real_parse<i>(f, l, ctx, other, attr, targetidx)... };              return std::accumulate(b.begin(), b.end(), false, std::logical_or<bool>());         }          template <size_t idx, typename it, typename ctx, typename other, typename attr>         bool real_parse(it& f, l, ctx& ctx, other const& other, attr& attr, size_t targetidx) const {             if (targetidx != idx)                 return false;              return std::get<idx>(_alternatives).parse(f, l, ctx, other, attr);         }          std::tuple<parsers...> _alternatives;     };      template <typename... ps>         longest_parser<ps...> longest(ps... p) { return {x3::as_parser(p)...}; } } 

note fold expression use in dispatch if compiler supports (coliru does, edit see!).

note subtle choice regarding skippable (probably whitespace); if it's not significant length comparisons, uncomment pre-skip.

live demo

live on coliru

#include <boost/spirit/home/x3.hpp> #include <type_traits> #include <iostream> #include <numeric>  namespace x3 = boost::spirit::x3;  namespace std {     template <typename t> // easy debug printing; hack     static std::ostream& operator<<(std::ostream& os, std::vector<t> const& v) {         (auto& el : v) std::cout << '[' << el << ']';         return os;     } }  using string_vec  = std::vector<std::string>; using result_type = boost::variant<std::string, double, string_vec>;  template <typename parser> void parse(const std::string message, const std::string &input, const std::string &rule, const parser &parser) {     auto iter = input.begin(), end = input.end();      std::cout << "-------------------------\n";     std::cout << message << "\n";     std::cout << "rule:     " << rule  << "\n";     std::cout << "parsing: '" << input << "'\n";      result_type parsed_result;     bool result = phrase_parse(iter, end, parser, x3::space, parsed_result);      if (result) {         std::cout << "parsed " << parsed_result << "\n";     } else {         std::cout << "parser failed\n";     }     if (iter != end)         std::cout << "eoi not reached. unparsed: '" << std::string(iter, end) << "'\n"; }  namespace parser {      template <typename... parsers>     struct longest_parser : x3::parser_base {         longest_parser(parsers... sub) : _alternatives {sub...} { }          template <typename it, typename ctx, typename other, typename attr>         bool parse(it& f, l, ctx& ctx, other const& other, attr& attr) const {             auto const saved = f;              //// exclude pre-skip length comparisons, pre-skip here:             // x3::skip_over(f, l, ctx);             auto seq = std::make_index_sequence<sizeof...(parsers)>();              auto best = select_best(f, l, ctx, seq);             //std::cout << "longest match @ index #" << best << "\n";              bool ok = dispatch(f, l, ctx, other, attr, best, seq);              if (!ok)                 f = saved;              return ok;         }        private:         template <typename it, typename ctx, typename p>         size_t length_of(it f, l, ctx ctx, p const& p) const {             boost::iterator_range<it> matched;             return x3::raw[p].parse(f, l, ctx, x3::unused, matched)? boost::size(matched) : 0;         }          template <typename it, typename ctx, size_t... i>             size_t select_best(it f, l, ctx& ctx, std::index_sequence<i...>) const {                 std::array<size_t, sizeof...(i)> lengths { length_of(f, l, ctx, std::get<i>(_alternatives))... };                 return std::distance(lengths.begin(), std::max_element(lengths.begin(), lengths.end()));             }          template <typename it, typename ctx, typename other, typename attr, size_t... i>         bool dispatch(it& f, l, ctx& ctx, other const& other, attr& attr, size_t targetidx, std::index_sequence<i...>) const {             //return (real_parse<i>(f, l, ctx, other, attr, targetidx) || ...);             std::array<bool, sizeof...(i)> b = { real_parse<i>(f, l, ctx, other, attr, targetidx)... };              return std::accumulate(b.begin(), b.end(), false, std::logical_or<bool>());         }          template <size_t idx, typename it, typename ctx, typename other, typename attr>         bool real_parse(it& f, l, ctx& ctx, other const& other, attr& attr, size_t targetidx) const {             if (targetidx != idx)                 return false;              return std::get<idx>(_alternatives).parse(f, l, ctx, other, attr);         }          std::tuple<parsers...> _alternatives;     };      template <typename... ps>         longest_parser<ps...> longest(ps... p) { return {x3::as_parser(p)...}; } }  int main() {     auto id        = x3::rule<void, std::string> {} = x3::lexeme [ x3::char_("a-za-z_") >> *x3::char_("a-za-z0-9_") ];     auto qualified = x3::rule<void, string_vec>  {} = id % "::";  #define test_case(label, input, rule) parse(label, input, #rule, rule)     test_case("unqualified"                , "willy"                , parser::longest(id, x3::int_, x3::double_));     test_case("unqualified whitespace", " willy \t"            , parser::longest(id, x3::int_, x3::double_));     test_case("integral or number"         , "123.78::anton::lutz"  , parser::longest(id, x3::int_, x3::double_));     test_case("qualified"                  , "willy anton::lutz"    , parser::longest(id, x3::int_, x3::double_));     test_case("qualified whitespace"  , "willy \tanton::lutz"  , parser::longest(id, x3::int_, x3::double_));      test_case("unqualified"                , "willy"                , parser::longest(id, x3::int_, x3::double_, qualified));     test_case("unqualified whitespace", " willy \t"            , parser::longest(id, x3::int_, x3::double_, qualified));     test_case("integral or number"         , "123.78::anton::lutz"  , parser::longest(id, x3::int_, x3::double_, qualified));     test_case("qualified"                  , "willy::anton::lutz"   , parser::longest(id, x3::int_, x3::double_, qualified));     test_case("qualified whitespace"  , "willy ::\tanton::lutz", parser::longest(id, x3::int_, x3::double_, qualified));      test_case("unqualified"                , "willy"                , parser::longest(x3::int_, x3::double_, qualified));     test_case("unqualified whitespace", " willy \t"            , parser::longest(x3::int_, x3::double_, qualified));     test_case("integral or number"         , "123.78::anton::lutz"  , parser::longest(x3::int_, x3::double_, qualified));     test_case("qualified"                  , "willy::anton::lutz"   , parser::longest(x3::int_, x3::double_, qualified));     test_case("qualified whitespace"  , "willy ::\tanton::lutz", parser::longest(x3::int_, x3::double_, qualified)); } 

prints

------------------------- unqualified rule:     parser::longest(id, x3::int_, x3::double_) parsing: 'willy' parsed willy ------------------------- unqualified whitespace rule:     parser::longest(id, x3::int_, x3::double_) parsing: ' willy    ' parsed willy ------------------------- integral or number rule:     parser::longest(id, x3::int_, x3::double_) parsing: '123.78::anton::lutz' parsed 123.78 eoi not reached. unparsed: '::anton::lutz' ------------------------- qualified rule:     parser::longest(id, x3::int_, x3::double_) parsing: 'willy anton::lutz' parsed willy eoi not reached. unparsed: 'anton::lutz' ------------------------- qualified whitespace rule:     parser::longest(id, x3::int_, x3::double_) parsing: 'willy     anton::lutz' parsed willy eoi not reached. unparsed: 'anton::lutz' ------------------------- unqualified rule:     parser::longest(id, x3::int_, x3::double_, qualified) parsing: 'willy' parsed willy ------------------------- unqualified whitespace rule:     parser::longest(id, x3::int_, x3::double_, qualified) parsing: ' willy    ' parsed willy ------------------------- integral or number rule:     parser::longest(id, x3::int_, x3::double_, qualified) parsing: '123.78::anton::lutz' parsed 123.78 eoi not reached. unparsed: '::anton::lutz' ------------------------- qualified rule:     parser::longest(id, x3::int_, x3::double_, qualified) parsing: 'willy::anton::lutz' parsed [willy][anton][lutz] ------------------------- qualified whitespace rule:     parser::longest(id, x3::int_, x3::double_, qualified) parsing: 'willy ::  anton::lutz' parsed [willy][anton][lutz] ------------------------- unqualified rule:     parser::longest(x3::int_, x3::double_, qualified) parsing: 'willy' parsed [willy] ------------------------- unqualified whitespace rule:     parser::longest(x3::int_, x3::double_, qualified) parsing: ' willy    ' parsed [willy] ------------------------- integral or number rule:     parser::longest(x3::int_, x3::double_, qualified) parsing: '123.78::anton::lutz' parsed 123.78 eoi not reached. unparsed: '::anton::lutz' ------------------------- qualified rule:     parser::longest(x3::int_, x3::double_, qualified) parsing: 'willy::anton::lutz' parsed [willy][anton][lutz] ------------------------- qualified whitespace rule:     parser::longest(x3::int_, x3::double_, qualified) parsing: 'willy ::  anton::lutz' parsed [willy][anton][lutz] 

note different results depending on parser expressions in alternatives.