I have an HTML file that contains strings such as:
"http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150000&cultuur=en-GB&continent=europa","http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150300&cultuur=en-GB&continent=europa","http://www.niederschlagsradar.de/images.aspx?jaar=-6&type=europa.precip&datum=201309150600&cultuur=en-GB&continent=europa"
This is the code I am using.
In the constructor I do:
f = File.ReadAllText(localFilename + "test.html");
retrivingText1();
private void retrivingText1()
{
    string startTag = "http://www.niederschlagsradar.de/images.aspx";//"<Translation>";
    string endTag = "continent=europa";//"</Translation>";
    int startTagWidth = startTag.Length;
    int endTagWidth = endTag.Length;
    index = 0;
    w = new StreamWriter(@"d:\retrivedText1.txt");
    while (true)
    {
        index = f.IndexOf(startTag, index);
        if (index == -1)
        {
            break;
        }
        // else more to do - index is now positioned at the first character of startTag
        int start = index + startTagWidth;
        index = f.LastIndexOf(endTag, start + 1);
        if (index == -1)
        {
            break;
        }
        // found the endTag
        string g = f.Substring(start, index - start + endTagWidth).Trim(); // Trim the found text so leading and trailing spaces are removed.
        w.WriteLine(g);
        // break so you don't have an endless loop
        break;
    }
    w.Close();
}
I know that HtmlAgilityPack or a regular expression is a better way to extract things from an HTML file, but this time I want to try IndexOf and Substring.
When I put a breakpoint on the line:
int start = index + startTagWidth;
start = 2950
and after the next line executes, index = -1
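One likely explanation for the -1: `LastIndexOf(value, startIndex)` searches backward from `startIndex` toward the beginning of the string, so with `start = 2950` it only looks at the text before the first URL, where `continent=europa` has probably not appeared yet. Below is a minimal sketch of a forward search using `IndexOf` for both tags. The class name `UrlExtractor` and the method `ExtractAll` are hypothetical names made up for illustration, and the sketch assumes the whole URL (start tag included) is wanted in the output, which differs from the original `Substring` call that skips the start tag.

```csharp
using System;
using System.Collections.Generic;

static class UrlExtractor
{
    // Hypothetical helper: collects every substring that begins with startTag
    // and ends with the first endTag found after it (both tags included).
    public static List<string> ExtractAll(string text, string startTag, string endTag)
    {
        var results = new List<string>();
        int index = 0;
        while (true)
        {
            // Search forward for the next start tag.
            int start = text.IndexOf(startTag, index, StringComparison.Ordinal);
            if (start == -1)
            {
                break;
            }
            // Search forward (not backward) for the end tag that follows it.
            int end = text.IndexOf(endTag, start + startTag.Length, StringComparison.Ordinal);
            if (end == -1)
            {
                break;
            }
            int stop = end + endTag.Length;
            results.Add(text.Substring(start, stop - start));
            // Continue after this match, so the loop advances and needs no extra break.
            index = stop;
        }
        return results;
    }
}
```

With something like this, `retrivingText1` could call `UrlExtractor.ExtractAll(f, startTag, endTag)` and write each returned string with `w.WriteLine`. Because `index` moves past every match, the loop ends on its own once no further start tag is found.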