c# - I'm trying to use IndexOf and Substring to extract text from a file, but the variable index is always -1. What's wrong?

Question

I have an HTML file that contains some strings, for example:

"http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150000&cultuur=en-GB&continent=europa","http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150300&cultuur=en-GB&continent=europa","http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150600&cultuur=en-GB&continent=europa"

I want to extract each one, first: http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150000&cultuur=en-GB&continent=europa

Then the next one: http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150300&cultuur=en-GB&continent=europa

This is the code I'm using:

In the constructor I do:

f = File.ReadAllText(localFilename + "test.html");
retrivingText1();


private void retrivingText1()
{
    string startTag = "http://www.niederschlagsradar.de/images.aspx";//"<Translation>";
    string endTag = "continent=europa";//"</Translation>";
    int startTagWidth = startTag.Length;
    int endTagWidth = endTag.Length;
    index = 0;
    w = new StreamWriter(@"d:\retrivedText1.txt");
    while (true)
    {
        index = f.IndexOf(startTag, index);
        if (index == -1)
        {
            break;
        }
        // else more to do - index is now positioned at the first character of startTag
        int start = index + startTagWidth;
        index = f.LastIndexOf(endTag, start + 1);
        if (index == -1)
        {
            break;
        }
        // found the endTag
        string g = f.Substring(start, index - start + endTagWidth).Trim(); // trim leading and trailing spaces from the found text
        w.WriteLine(g);
        // break so you don't have an endless loop
        break;
    }
    w.Close();
}

I know that for extracting from an HTML file it's better to use HtmlAgilityPack or a regular expression, but this time I want to try IndexOf and Substring.

When I put a breakpoint on the line:

int start = index + startTagWidth;

start = 2950

On the next line, index = -1

score 2 · Accepted Answer

I prefer Don's answer, but if you really want to use IndexOf, it's easier if you prime the loop and do something like this:

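private void button3_Click(object sender, EventArgs e)
{
    string f = "\"http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150000&cultuur=en-GB&continent=europa\",\"http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150300&cultuur=en-GB&continent=europa\",\"http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150600&cultuur=en-GB&continent=europa\"";

    string startTag = "http://www.niederschlagsradar.de/images.aspx";//"<Translation>";
    string endTag = "continent=europa";//"</Translation>";

    // Prime the loop: find the first start tag.
    int startIndex = f.IndexOf(startTag);

    // IndexOf returns -1 when nothing is found; 0 is a valid match, so test >= 0.
    while (startIndex >= 0)
    {
        // Search forward for the end tag, beginning at the current start tag.
        int endIndex = f.IndexOf(endTag, startIndex);
        if (endIndex < 0)
        {
            // No closing tag for this start tag: stop. This also prevents an endless loop.
            break;
        }

        // Position just past the end tag.
        int position = endIndex + endTag.Length;

        // Parse out what you want, e.g. the whole URL from the start tag through the end tag.
        string url = f.Substring(startIndex, position - startIndex);
        Console.WriteLine(url);

        // Look for the next start tag after the text already consumed.
        startIndex = f.IndexOf(startTag, position);
    }
}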

score 1 · Accepted Answer

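On the page you referenced, I couldn't find the lines of text you're looking for...

I think, as you also considered, that a regular expression would be the better choice: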

http:\/\/www\.niederschlagsradar\.de\/images\.aspx\?jaar=-6&type=europa\.precip&datum=\d{12}&cultuur=en-GB&continent=europa

That gives you all the references you need for further processing.
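As a rough illustration of how that pattern could be applied from C#, here is a minimal sketch using `Regex.Matches`. The class name `RadarUrlRegexSketch` and the helper `ExtractUrls` are made up for this example and do not come from the question. The escaped slashes (`\/`) are dropped because .NET does not require them; otherwise the pattern is the one above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class RadarUrlRegexSketch
{
    // Same pattern as in the answer, minus the unnecessary "\/" escapes.
    private static readonly Regex UrlPattern = new Regex(
        @"http://www\.niederschlagsradar\.de/images\.aspx\?jaar=-6&type=europa\.precip&datum=\d{12}&cultuur=en-GB&continent=europa");

    // Returns every matching URL in the order it appears in the input.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in UrlPattern.Matches(html))
        {
            urls.Add(match.Value);
        }
        return urls;
    }
}
```

Assuming `f` holds the file contents as in the question, `RadarUrlRegexSketch.ExtractUrls(f)` would return one string per radar image URL, which can then be written out with the existing `StreamWriter`.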

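Edit

If you do want to use IndexOf and Substring: you are using LastIndexOf the wrong way. LastIndexOf searches backward, from the given start position toward the beginning of the string. Called with start + 1, it only examines the text up to just past the start tag, never reaches the end tag that follows the URL, and so returns -1.

(See the String.LastIndexOf documentation.)

Try using just IndexOf instead.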
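To make the difference concrete, here is a small sketch that runs both searches from the same position. The class `LastIndexOfSketch` and its three methods are hypothetical names introduced for this example, and the ordinal comparison is an assumption (the question's code uses the default overloads).

```csharp
using System;

// Hypothetical demo class, named for illustration only.
public static class LastIndexOfSketch
{
    // What the question's code does: search BACKWARD from just past the start tag.
    public static int FindEndTagBackward(string text, string startTag, string endTag)
    {
        int index = text.IndexOf(startTag, StringComparison.Ordinal);
        if (index == -1) return -1;
        int start = index + startTag.Length;
        // Only text before this position is examined, so an end tag
        // that follows the start tag can never be found here.
        return text.LastIndexOf(endTag, start + 1, StringComparison.Ordinal);
    }

    // The suggested fix: search FORWARD from just past the start tag.
    public static int FindEndTagForward(string text, string startTag, string endTag)
    {
        int index = text.IndexOf(startTag, StringComparison.Ordinal);
        if (index == -1) return -1;
        int start = index + startTag.Length;
        return text.IndexOf(endTag, start, StringComparison.Ordinal);
    }

    // Returns the first span from the start tag through the end tag, or null if none.
    public static string ExtractFirst(string text, string startTag, string endTag)
    {
        int index = text.IndexOf(startTag, StringComparison.Ordinal);
        if (index == -1) return null;
        int end = text.IndexOf(endTag, index + startTag.Length, StringComparison.Ordinal);
        if (end == -1) return null;
        // Start the substring at index (not past the start tag) to keep the whole URL.
        return text.Substring(index, end + endTag.Length - index);
    }
}
```

With the question's tags and a string holding a single quoted URL, the backward search returns -1 while the forward search returns the position of `continent=europa`. Note that the sketch slices from the start tag itself, since the goal stated in the question is the complete URL.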

score 0 · Accepted Answer

Given your sample file, I would expect something like this to be enough:

String[] sa = f.Split(',');
foreach (String s in sa)
{
    // Trim the surrounding quotes from this entry (s), not from the whole file (f).
    String strToWrite = s.Trim('\"');
    //write your string
}
