c# - Regex to match NC comments in a string with mixed C# code

Question

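I have a text file that mixes NC code and C# code. The C# code starts with "<#" and ends with "#>". Now I need a regex that finds all NC comments. One problem is that NC comments start with ";", so I have trouble distinguishing an NC comment from a ";" in the C# code.

Is it possible to achieve this with just one regex?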

; 1. NC-Comment
FUNCT_A;
FUNCT_B;

<# // C#-Code
int temp = 42;
string var = "hello";   // C#-Comment
#>

FUNCT_C ; 2. Comment

<# // C#-Code
for(int i = 0; i <10; i++)
{
    Console.WriteLine(i.ToString());
}
#>  

; 3. Comment
FUNCT_D;

The result of the regex should be {1. NC-Comment, 2. Comment, 3. Comment}

I have played around with the following regexes:

1.) (;(.*?)\r?\n) --> Finds all NC-Comments but also C#-Code as comment
2.) (#>.*?<#)|(#>.*) --> Finds all NC-Code except the first NC-Code fragment
3.) #>.+?(?=<#) --> Finds all NC-Code except the first and last NC-Code fragment

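One solution might be to push every "<#" onto a stack and pop one entry from that stack for every "#>". If the stack is empty, the current text is NC code. Next I would have to find out whether that text is an NC comment.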
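Since the blocks in the sample are never nested, the stack idea collapses to "skip everything between <# and #>". A minimal single-regex sketch of that, under stated assumptions: the class name NcCommentRegex, the method Find and the group name "comment" are made up for illustration, every "<#" has a matching "#>", and "#>" never appears inside a C# string or comment.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NcCommentRegex
{
    // Alternative 1 consumes a whole <# ... #> block, so a ';' inside it is never examined.
    // Alternative 2 matches a ';' outside a block and captures the non-empty rest of the line.
    private static readonly Regex Pattern = new Regex(
        @"<#[\s\S]*?#>|;[ \t]*(?<comment>\S[^\r\n]*)");

    public static List<string> Find(string text)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            Group g = m.Groups["comment"];
            if (g.Success) // not set when the match was a C# block
                result.Add(g.Value.TrimEnd());
        }
        return result;
    }
}
```

The alternation order matters: because a block is matched as one unit, the regex engine resumes scanning after "#>" and never looks at the semicolons inside. An unterminated "<#" is not matched by the first alternative, so semicolons after it would be reported as comments.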

score 1 · Accepted Answer

I would rather not use a regex:

// Requires: using System.Collections.Generic; using System.IO;
public static List<string> GetNCComments(Stream stream)
{
    using (StreamReader sr = new StreamReader(stream))
    {
        List<string> result = new List<string>();
        bool inCS = false; // are we in C# code?
        int c;
        while ((c = sr.Read()) != -1)
        {
            if (inCS)
            {
                switch ((char)c)
                {
                    case '#':
                        if (sr.Peek() == '>') // end of C# block
                        {
                            sr.Read();
                            inCS = false;
                        }
                        break;
                    case '/':
                        if (sr.Peek() == '/') // a C# comment
                            sr.ReadLine(); // skip the whole comment
                        break;
                }
            }
            else
            {
                switch ((char)c)
                {
                    case '<':
                        if (sr.Peek() == '#') // start of C# block
                        {
                            sr.Read();
                            inCS = true;
                        }
                        break;
                    case ';': // NC comment
                        string comment = sr.ReadLine(); // rest of the line after ';'
                        if (!string.IsNullOrWhiteSpace(comment))
                            result.Add(comment.Trim()); // drop the space after ';'
                        break;
                }
            }
        }
        return result;
    }
}

Usage:

var comments = GetNCComments(new FileStream(filePath, FileMode.Open, FileAccess.Read));

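The code is simple and self-explanatory. It also handles C# line comments (//), but not C# strings. That is, if a "#>" appears inside a C# comment, it works correctly. If the same "#>" appears inside a C# string, however, it does not work: the "#>" is wrongly treated as the end of the C# block. Handling that case is easy as well.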
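One way the string case could be handled is sketched below. This is an illustration, not a complete C# lexer: the names NcScanner and SkipString are hypothetical, the method takes a TextReader instead of a Stream so it can be tried on a string, and only regular "..." literals with backslash escapes are skipped. Verbatim strings (@"..."), character literals such as '"' and /* */ comments are assumed not to occur.

```csharp
using System.Collections.Generic;
using System.IO;

public static class NcScanner
{
    public static List<string> GetNCComments(TextReader reader)
    {
        var result = new List<string>();
        bool inCS = false; // are we inside a <# ... #> block?
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (!inCS)
            {
                if (c == '<' && reader.Peek() == '#') // start of C# block
                {
                    reader.Read();
                    inCS = true;
                }
                else if (c == ';') // NC comment: rest of the line
                {
                    string comment = reader.ReadLine();
                    if (!string.IsNullOrWhiteSpace(comment))
                        result.Add(comment.Trim());
                }
            }
            else if (c == '#' && reader.Peek() == '>') // end of C# block
            {
                reader.Read();
                inCS = false;
            }
            else if (c == '/' && reader.Peek() == '/') // C# line comment
            {
                reader.ReadLine();
            }
            else if (c == '"') // C# string literal: ignore its content
            {
                SkipString(reader);
            }
        }
        return result;
    }

    // Skips a regular "..." literal; the opening quote has already been read.
    private static void SkipString(TextReader reader)
    {
        int c;
        while ((c = reader.Read()) != -1 && c != '"')
        {
            if (c == '\\')
                reader.Read(); // skip the escaped character, e.g. \"
        }
    }
}
```

It can be called with a StringReader for a quick check, or with new StreamReader(stream) to keep the original file-based usage.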
