c# - Find the index of a blank line in a string

Question

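Suppose I have a string that holds the contents of a text file, carriage returns and tabs included. How do I find the index of the first blank line (including lines that contain only whitespace) in that string?

What I've tried:

I have a working function that uses a pile of ugly code to find the index of the blank line. There must be a more elegant/readable way than this.

For clarity, the function below returns the part of the string from a supplied "title" up to the index of the first blank line after that title. It is given in full because most of it is devoted to finding that index, and to head off any "why in the world do you need the index of a blank line" questions. It should also counter the XY problem, if that is what is happening here.

(Seems to work; not all edge cases tested yet) Code: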

// Get subsection indicated by supplied title from supplied section
private static string GetSubSectionText(string section, string subSectionTitle)
{
    int indexSubSectionBgn = section.IndexOf(subSectionTitle);
    if (indexSubSectionBgn == -1)
        return String.Empty;

    int indexSubSectionEnd = section.Length;

    // Find first blank line after found sub-section
    bool blankLineFound = false;
    int lineStartIndex = indexSubSectionBgn; // start at the title, not at the top of the section
    int lineEndIndex = 0;
    do
    {
        string temp;
        lineEndIndex = section.IndexOf(Environment.NewLine, lineStartIndex);

        if (lineEndIndex == -1)
            temp = section.Substring(lineStartIndex);
        else
            temp = section.Substring(lineStartIndex, (lineEndIndex - lineStartIndex));

        temp = temp.Trim();
        if (temp.Length == 0)
        {
            if (lineEndIndex == -1)
                indexSubSectionEnd = section.Length;
            else
                indexSubSectionEnd = lineEndIndex;

            blankLineFound = true;
        }
        else
        {
            // Skip the whole newline sequence (two characters for "\r\n")
            lineStartIndex = lineEndIndex + Environment.NewLine.Length;
        }
    } while (!blankLineFound && (lineEndIndex != -1));

    if (blankLineFound)
        // Substring takes a length, not an end index
        return section.Substring(indexSubSectionBgn, indexSubSectionEnd - indexSubSectionBgn);
    else
        return null;
}

Follow-up edit:

Result (based heavily on Konstantin's answer):

// Get subsection indicated by supplied title from supplied section
private static string GetSubSectionText(string section, string subSectionTitle)
{
    string[] lines = section.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
    int subsectStart = 0;
    int subsectEnd = lines.Length;

    // Find subsection start
    for (int i = 0; i < lines.Length; i++)
    {
        if (lines[i].Trim() == subSectionTitle)
        {
            subsectStart = i;
            break;
        }
    }

    // Find subsection end (ie, first blank line)
    for (int i = subsectStart; i < lines.Length; i++)
    {
        if (lines[i].Trim().Length == 0)
        {
            subsectEnd = i;
            break;
        }
    }

    return string.Join(Environment.NewLine, lines, subsectStart, subsectEnd - subsectStart);
}

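The main differences between this result and Konstantin's answer are the framework version (I am on .NET 2.0, which does not support string[].Take) and the use of Environment.NewLine instead of a hard-coded '\n'. It is much prettier and more readable than the original pass. Thanks, everyone!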
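For readers unfamiliar with it, the four-argument string.Join overload used above is what stands in for lines.Take(...) on .NET 2.0. A minimal sketch, where JoinDemo and JoinRange are hypothetical names introduced only for illustration:

```csharp
using System;

static class JoinDemo
{
    // Joins lines[start] .. lines[end - 1] with the platform newline, no LINQ needed.
    // The last two arguments of string.Join are a start index and a COUNT,
    // which is why the count is computed as end - start.
    public static string JoinRange(string[] lines, int start, int end)
    {
        return string.Join(Environment.NewLine, lines, start, end - start);
    }
}
```

Passing an end index where the count is expected is an easy mistake to make; it is the same trap as Substring(startIndex, length).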

score 4 · Accepted Answer

Have you tried using the String.Split method:

using System;
using System.Linq; // for Take

string s = "safsadfd\r\ndfgfdg\r\n\r\ndfgfgg";
string[] lines = s.Split('\n'); // note: each line keeps its trailing '\r'
int i;
for (i = 0; i < lines.Length; i++)
{
    if (string.IsNullOrWhiteSpace(lines[i])) // also treats a leftover "\r" as blank
    //if (lines[i].Length == 0)          //or maybe this suits better..
    //if (lines[i].Equals(string.Empty)) //or this
    {
        Console.WriteLine(i);
        break;
    }
}
Console.WriteLine(string.Join("\n",lines.Take(i)));

Edit: in response to the OP's edit.
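One detail worth spelling out: splitting on '\n' alone leaves a '\r' at the end of every line of a Windows-style string, so a plain Length == 0 check would miss the blank line. A minimal sketch that splits on both conventions and avoids LINQ, assuming a hypothetical helper named FirstBlankLineNumber:

```csharp
using System;

static class SplitDemo
{
    // Returns the zero-based LINE NUMBER (not the character index) of the
    // first empty or whitespace-only line, or -1 if there is none.
    public static int FirstBlankLineNumber(string text)
    {
        // "\r\n" is listed first so a Windows newline is consumed as one separator.
        string[] lines = text.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].Trim().Length == 0)
                return i;
        }
        return -1;
    }
}
```

Note that a string ending in a newline produces a trailing empty element, which this sketch reports as a blank line.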

score 3 · Accepted Answer

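By "blank line", do you mean a line that contains only whitespace? If so, yes, you should use a regular expression; the syntax you are looking for is @"(?<=\r?\n)[ \t]*(\r?\n|$)".

(?<=…) denotes a lookbehind: the pattern inside it must occur immediately before the text you are looking for.
\r?\n denotes a newline, supporting both the Unix and Windows conventions.
(?<=\r?\n) is therefore a lookbehind for the preceding newline.
[ \t]* denotes zero or more spaces or tabs; these match the content (if any) of your blank line.
(\r?\n|$) denotes a newline or the end of the file.

Example: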

string source = "Line 1\r\nLine 2\r\n   \r\nLine 4\r\n";
Match firstBlankLineMatch = Regex.Match(source, @"(?<=\r?\n)[ \t]*(\r?\n|$)");
int firstBlankLineIndex = 
    firstBlankLineMatch.Success ? firstBlankLineMatch.Index : -1;
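Because the lookbehind requires a preceding newline, the pattern above cannot match a blank line that is the very first line of the string. A possible variant, offered as a sketch and not as part of the original answer, anchors on line starts with RegexOptions.Multiline instead; BlankLineRegex and IndexOfFirstBlankLine are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

static class BlankLineRegex
{
    // Returns the character index where the first blank line starts, or -1.
    public static int IndexOfFirstBlankLine(string source)
    {
        // With RegexOptions.Multiline, ^ matches at the start of each line and
        // $ matches just before '\n' (or at the end of the input), so the '\r'
        // of a Windows line ending has to be consumed explicitly with \r?.
        Match m = Regex.Match(source, @"^[ \t]*\r?$", RegexOptions.Multiline);
        return m.Success ? m.Index : -1;
    }
}
```

How a trailing newline at the very end of the input is treated is an edge case worth testing against your own data before relying on either pattern.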

score 2 · Accepted Answer

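Just for fun: it looks like you are fine with allocating a new string once per line. In that case you can write an iterator that lazily evaluates the string and returns each line. For example: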

IEnumerable<string> BreakIntoLines(string theWholeThing)
{
    int startIndex = 0;
    for(;;)
    {
        int newlineIndex = theWholeThing.IndexOf(Environment.NewLine, startIndex);
        if(newlineIndex == -1) //Didn't find a newline
        {
            //Return the end part of the string and finish
            yield return theWholeThing.Substring(startIndex);
            yield break;
        }
        else //Found a newline
        {
            //Remember to pick up the newline character(s) too!
            int endIndex = newlineIndex + Environment.NewLine.Length;
            //Return where we're at up to and including the newline
            yield return theWholeThing.Substring(startIndex, endIndex - startIndex);
            startIndex = endIndex;
        }
    }
}

You can then wrap that iterator in another one that returns only the lines you care about and discards the rest.

IEnumerable<string> GetSubsectionLines(string theWholeThing, string subsectionTitle)
{
    bool foundSubsectionTitle = false;
    foreach(var line in BreakIntoLines(theWholeThing))
    {
        if(line.Contains(subsectionTitle))
        {
            foundSubsectionTitle = true; //Start capturing
        }

        if(foundSubsectionTitle)
        {
            yield return line;
        } //Implicit "else" - Just discard the line if we haven't found the subsection title yet

        //Only a blank line after the title ends the subsection
        if(foundSubsectionTitle && String.IsNullOrWhiteSpace(line))
        {
            //This will stop iterating after returning the empty line, if there is one
            yield break;
        }
    }
}

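Now, this approach (along with some of the others posted) does not do exactly what your original code does. For example, if the text in subsectionTitle happens to span a line break, it will not be found. We will assume the spec is written so that this is not allowed. This code also makes a copy of every line it returns, as your original code did, so that is probably fine.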

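The only benefit of this over string.Split is that once the subsection has been returned, the rest of the string is never evaluated. For most reasonably sized strings you probably won't care. Any "performance gain" is probably nonexistent. If you really cared about performance, you wouldn't be copying each line in the first place!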
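To illustrate that last point, here is a minimal sketch of a scan that copies nothing and returns the character index directly. It is not taken from any of the answers; BlankLineFinder and IndexOfFirstBlankLine are hypothetical names, and treating an empty string (or the empty tail after a final newline) as "no blank line" is an assumption you may want to change:

```csharp
using System;

static class BlankLineFinder
{
    // Returns the index of the first character of the first blank line
    // (empty or whitespace-only), or -1 if there is none.
    public static int IndexOfFirstBlankLine(string text)
    {
        int lineStart = 0;       // index where the current line begins
        bool lineIsBlank = true; // no non-whitespace character seen on this line yet
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\n')
            {
                if (lineIsBlank)
                    return lineStart;
                lineStart = i + 1;
                lineIsBlank = true;
            }
            else if (!char.IsWhiteSpace(c)) // '\r', ' ' and '\t' all count as whitespace
            {
                lineIsBlank = false;
            }
        }
        // Last line without a terminating newline; the empty tail after a
        // final newline is deliberately not reported as a blank line.
        return (lineIsBlank && lineStart < text.Length) ? lineStart : -1;
    }
}
```

Splitting on '\n' and letting '\r' fall under char.IsWhiteSpace lets the same loop handle both Unix and Windows line endings.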

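The other thing you get (which may actually be valuable) is code reuse. If you are writing a program that parses documents, being able to operate on individual lines may be helpful.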
