I'm using C# regular expressions for my project in compiler design.

I'm working with a lexical analyzer and I have to tokenize the code depending on the rules I have set.

I defined my string as [\".*?\"] and double quote as [\"].

When I input "Hi" it is read as STRING TOKEN.

But when I input " \" ", it yields STRING for " \" and DOUBLE-QUOTE for ".

I want it to be read as STRING TOKEN.

In other words, I want to correctly parse strings containing escaped double quotes.
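The behaviour described above can be reproduced with a minimal sketch. The class and method names here are hypothetical, and the pattern is assumed to be the lazy `".*?"` from the question with the square brackets read as delimiters rather than a character class:

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyQuoteDemo
{
    // The lazy pattern from the question: a quote, as few characters as possible, a quote.
    public static string FirstLazyMatch(string input) =>
        Regex.Match(input, "\".*?\"").Value;

    static void Main()
    {
        // The text being scanned is: " \" "
        // (quote, space, backslash, quote, space, quote)
        string input = "\" \\\" \"";

        // The lazy quantifier stops at the first quote it can reach,
        // which is the escaped one, leaving the final quote unmatched.
        Console.WriteLine(FirstLazyMatch(input)); // " \"
    }
}
```

The regex engine has no notion of the backslash as an escape here, so the trailing quote is left over for the DOUBLE-QUOTE rule.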

2 Answers

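I believe the pattern you want is:

"(?:[^"\\]|\\.)*"

This matches, between the quotes, any character that is neither a quote nor a backslash, or a backslash followed by any character (such as the backslash-quote pair). The backslash is excluded from the character class so that it can only be consumed together with the character it escapes. For example:

var input = @"1 2 3 ""Hello \""Word\""!""";
var match = Regex.Match(input, @"""(?:[^""\\]|\\.)*""");

Console.WriteLine(match.Value); // "Hello \"Word\"!"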
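As a sketch of how an escape-aware pattern behaves when one line holds several string literals, the following uses `"(?:[^"\\]|\\.)*"`, which assumes a backslash escapes whatever single character follows it. The class name, method name, and sample input are hypothetical and not part of the original lexer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class StringTokenDemo
{
    // Opening quote, then any run of either
    //   - a character that is neither a quote nor a backslash, or
    //   - a backslash followed by any single character,
    // then the closing quote.
    static readonly Regex StringToken = new Regex(@"""(?:[^""\\]|\\.)*""");

    // Returns every string literal found in the input, in order.
    public static string[] FindStrings(string input) =>
        StringToken.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        // The text being scanned is: print "a \" b" + "c"
        string input = @"print ""a \"" b"" + ""c""";

        foreach (string s in FindStrings(input))
            Console.WriteLine(s);
        // "a \" b"
        // "c"
    }
}
```

Because the backslash can only be consumed as part of an escape pair, each match ends at the first unescaped quote, so the two literals stay separate tokens.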
answered 2013-07-10T04:09:03.537

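Try printing the ASCII code of each character in your input. Depending on whether the input comes from the command line, through a GUI, or from a file, the backslash will have a different effect.

Your reluctant (lazy) matcher may be treating the \ as a character in its own right rather than as a modifier of the ".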
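One way to run that check is sketched below. The class and method names are hypothetical, and the input is assumed to arrive on standard input; the same helper could be pointed at text read from a file or a GUI control:

```csharp
using System;
using System.Linq;

static class CharDump
{
    // Renders each character of the input as its numeric code, space-separated.
    // For ASCII characters this is the ASCII code (34 is ", 92 is \).
    public static string Describe(string input) =>
        string.Join(" ", input.Select(c => ((int)c).ToString()));

    static void Main()
    {
        // Read one line exactly as the program receives it.
        string line = Console.ReadLine() ?? "";
        Console.WriteLine(Describe(line));
    }
}
```

If a 92 appears before a 34, the backslash reached the lexer as a literal character, and the pattern has to account for it.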

answered 2013-07-10T04:13:37.097