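Here is a minimal C++ flex lexer for this problem. The key to non-greedy matching is start conditions, which are described in the flex manual and elsewhere.
A start condition is just another state of the lexer. When a non-greedy match is needed, there is some pattern that must terminate the match at its first occurrence.
In general, regardless of state, if you are looking for a target string or pattern, you only need to make sure that no other, more general pattern can match a longer stretch of input that contains the target pattern.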
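Start conditions help when the target pattern is conditional and needs to be enabled after some earlier match. You turn on a start condition to enable matching of the target pattern, and you turn it off by resetting the state to 0 or INITIAL, or by switching to yet another state for further conditional matching.
States are switched with BEGIN. There is also a state stack, used through yy_push_state and yy_pop_state (available with %option stack).
The flex manual has many examples of start conditions.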
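Here are the flex rules that show non-greedy matching with flex start conditions. The lexer matches from the first occurrence of dog on a line up to the first occurrence of cat after it. Matching is case insensitive.
The full file is posted at the end. For those unfamiliar with flex, note that many of the lines begin with whitespace. This is not accidental; flex requires it.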
%%
 /* flex rules section */
 string match;
dog {
    // found a dog, change state to HAVE_DOG to start looking for a cat
    BEGIN(HAVE_DOG);
    // save the found dog
    match = yytext;
}
 /* save and keep going till cat is found */
<HAVE_DOG>. match += yytext;
<HAVE_DOG>cat {
    // save the found cat
    match += yytext;
    // output the matched dog and cat
    cout << match << "\n";
    // ignore rest of line
    BEGIN(SKIP_LINE);
}
 /* no cat on this line, reset state */
<HAVE_DOG>\n BEGIN(0);
 /* rules to ignore rest of the line then reset state */
<SKIP_LINE>{
    .*
    \n BEGIN(0);
}
 /* nothing to do yet */
.|\n
Here is some test input
$ cat dogcat.in.txt
Dog Ca Cat Cc Cat
dog Ca cat Cc cat
dOg Ca cAt Cc cAt
DOG CA CAT CC CAT
cat dog dog cat cat
dog kitten cat dog cat
dig cat dog can dog cut
dig dug dog cow cat cat
doc dogcat catdog
dog dog dog
cat cat cat
Build
flex -o dogcat.flex.cpp dogcat.flex.l && g++ -o dogcat dogcat.flex.cpp
Run
$ ./dogcat < dogcat.in.txt
Dog Ca Cat
dog Ca cat
dOg Ca cAt
DOG CA CAT
dog dog cat
dog kitten cat
dog cow cat
dogcat
The full flex file
/* dogcat.flex.l */
/*
    Build with:
    flex -o dogcat.flex.cpp dogcat.flex.l && g++ -o dogcat dogcat.flex.cpp
*/
/*
    A minimal C++ flex lexer that shows nongreedy matching with flex
    start conditions.
    Matches the first occurrence of dog on a line till the first
    occurrence of cat.
    Matching is case insensitive.
*/
/* C++ lexer using yyFlexLexer in FlexLexer.h */
%option c++
/* case-insensitive patterns */
%option case-insensitive
/* generate main function for executable */
%option main
/* all input must be matched, no echo by default */
%option nodefault
/* debug output with lexer.set_debug(1) */
%option debug
/* start condition means dog was matched */
%x HAVE_DOG
/* start condition means to ignore remaining line */
%x SKIP_LINE
%{
#include <string>
#include <iostream>
// C++ flex lexer class
// needed because header itself has no guard
#ifndef yyFlexLexerOnce
# include <FlexLexer.h>
#endif
using namespace std;
namespace {
    // the C++ lexer class from flex
    yyFlexLexer lexer;
    // main generated by flex still calls free yylex function even for C++ lexer
    int yylex() {
        return lexer.yylex();
    }
}
%}
%%
 /* flex rules section */
 string match;
dog {
    // found a dog, change state to HAVE_DOG to start looking for a cat
    BEGIN(HAVE_DOG);
    // save the found dog
    match = yytext;
}
 /* save and keep going till cat is found */
<HAVE_DOG>. match += yytext;
<HAVE_DOG>cat {
    // save the found cat
    match += yytext;
    // output the matched dog and cat
    cout << match << "\n";
    // ignore rest of line
    BEGIN(SKIP_LINE);
}
 /* no cat on this line, reset state */
<HAVE_DOG>\n BEGIN(0);
 /* rules to ignore rest of the line then reset state */
<SKIP_LINE>{
    .*
    \n BEGIN(0);
}
 /* nothing to do yet */
.|\n