3

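I think this is possible with jQuery, but any ASP.NET server-side code would also suit my case.

With jQuery I can load the page into, say, a div and filter the div to get the <title> tag, but for heavy pages I think it is wasteful to read the whole content first and only then read the title tag... or maybe there is a very simple solution? Either way, I could not find anything about this on the internet. Thanks.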

4

7 Answers

2

cjjer almost got it right.

First, change the regex to: <title>(?<Content>.*?)?</title>

Second, you need to create a match object first (just in case your URI does not have a title).

Match tMatch = new Regex(@"<title>(?<Content>.*?)?</title>").Match(new System.Net.WebClient().DownloadString(url));

// Match() never returns null; Success is false when the page has no <title>.
if (tMatch.Success) {
    //  yay.
    title = tMatch.Groups["Content"].Value;
}
answered 2009-03-01T10:32:23.457
2

Okay, thanks to cjjer and Boo, I've read more about regex and the code below is finally working for me.

Dim qq As New System.Net.WebClient
Dim theuri As New Uri(TextBox1.Text)
Dim res As String = qq.DownloadString(theuri)
' Singleline lets "." match newlines; IgnoreCase also accepts <TITLE>.
Dim re As Regex = New Regex("<title\b[^>]*>(.*?)</title>", RegexOptions.Singleline Or RegexOptions.IgnoreCase)
Dim ma As Match = re.Match(res)

' Match() never returns Nothing, so checking Success is enough.
If ma.Success Then
    Response.Write(ma.Groups(1).Value)
Else
    Response.Write("error")
End If

But the problem remains: this code downloads the whole page and searches through it, which on heavy websites takes more than 2 or 3 seconds to complete. As far as I know it seems to be the only way :| Are there any suggestions to refine this code?
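
One possible refinement, sketched in C# under some assumptions: read the response in small chunks and stop as soon as a complete title element has been seen, instead of calling DownloadString. The names `TitleReader`, `ReadTitle` and the 64 K character cap are made up for illustration; the regex is the one from the code above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class TitleReader
{
    static readonly Regex TitleRegex = new Regex(
        @"<title\b[^>]*>(.*?)</title>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Reads from `reader` in 1024-character chunks and returns as soon as a
    // complete <title> element has been seen. Gives up after roughly
    // maxChars characters and returns null when no title was found.
    public static string ReadTitle(TextReader reader, int maxChars = 65536)
    {
        var seen = new StringBuilder();
        var buffer = new char[1024];
        int read;
        while (seen.Length < maxChars &&
               (read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            seen.Append(buffer, 0, read);
            // Re-scanning the accumulated text is quadratic, but the cap keeps it small.
            Match m = TitleRegex.Match(seen.ToString());
            if (m.Success)
                return WebUtility.HtmlDecode(m.Groups[1].Value).Trim();
        }
        return null;
    }
}
```

To use it against a live page, wrap the response stream in a StreamReader (which assumes UTF-8 unless told otherwise) and pass that to `TitleReader.ReadTitle`. Your code then never reads past the title, though how much the runtime still transfers when the response is closed early is worth measuring.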

answered 2009-03-02T12:18:15.363
1

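The title usually appears within the first few hundred bytes, so you could try a range request for the first 1 KiB or so, try to parse it (with an error-tolerant parser, since some closing tags will be missing), and fall back to loading the whole page if that fails.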
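
A rough C# sketch of that idea, not a definitive implementation. It uses HttpWebRequest.AddRange for the 1 KiB request and, as a swapped-in simplification, a regex instead of an error-tolerant HTML parser (the regex only needs the title element itself to be complete). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class RangedTitle
{
    // Returns the title text, or null if `html` has no complete <title> element.
    public static string TryParseTitle(string html)
    {
        Match m = Regex.Match(html, @"<title\b[^>]*>(.*?)</title>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value).Trim() : null;
    }

    // Downloads the body; when rangeBytes is set, asks for that many leading bytes only.
    static string Download(string url, int? rangeBytes)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        if (rangeBytes.HasValue)
            request.AddRange(0, rangeBytes.Value - 1); // 1024 -> "Range: bytes=0-1023"
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    public static string GetTitle(string url)
    {
        // First attempt: only the first 1 KiB.
        string title = TryParseTitle(Download(url, 1024));
        // Fallback: the whole page, if the prefix held no complete title.
        return title ?? TryParseTitle(Download(url, null));
    }
}
```

A server that ignores the Range header simply answers with the full page, so the first attempt still works but saves nothing. Error handling (timeouts, non-200 responses, character sets other than UTF-8) is left out to keep the sketch short.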

answered 2009-03-01T10:11:12.570
0

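Loading any other web page into your own page just to read its title is a security risk... You should do this with a server-side script (ASP.NET, PHP, ...) and then output the title to your web page. Add some kind of caching as well, because otherwise it could end up fetching the title on every request.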
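
The caching part could look roughly like the sketch below. `TitleCache` is a hypothetical helper, and the fetch function passed to it stands in for whatever server-side code actually downloads and parses the title.

```csharp
using System;
using System.Collections.Concurrent;

public class TitleCache
{
    // url -> (title, time it was fetched)
    readonly ConcurrentDictionary<string, Tuple<string, DateTime>> entries =
        new ConcurrentDictionary<string, Tuple<string, DateTime>>();
    readonly Func<string, string> fetchTitle;
    readonly TimeSpan lifetime;

    public TitleCache(Func<string, string> fetchTitle, TimeSpan lifetime)
    {
        this.fetchTitle = fetchTitle;
        this.lifetime = lifetime;
    }

    public string Get(string url)
    {
        Tuple<string, DateTime> entry;
        if (entries.TryGetValue(url, out entry) &&
            DateTime.UtcNow - entry.Item2 < lifetime)
            return entry.Item1; // still fresh: no network call

        // Missing or expired: fetch once and remember the result.
        string title = fetchTitle(url);
        entries[url] = Tuple.Create(title, DateTime.UtcNow);
        return title;
    }
}
```

Two requests that miss at the same moment may both fetch, which is usually acceptable here. In an ASP.NET application the built-in cache (HttpRuntime.Cache) would be a natural alternative to a hand-rolled dictionary.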

answered 2009-03-01T09:23:40.937
0

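There is no simple, clean way to retrieve the title of an external page. You can use a WebClient on the server side and parse the response.

However, it may be worth reviewing the requirement: is it really necessary, and how much extra traffic and latency will it produce? Also consider that you may be generating load on an external site that has no idea all you want is the title, and building the page may be quite expensive for it.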

answered 2009-03-01T09:33:21.867
0
string title = Regex.Match(new System.Net.WebClient().DownloadString(url), @"<title>(.*?)</title>").Groups[1].Value;

Try this. I'm not sure it works.

answered 2009-03-01T09:46:11.150
0

I'm not sure whether all servers support this.
See if this helps.


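char[] data = new char[300];
System.Net.HttpWebRequest wr = (HttpWebRequest)WebRequest.Create("http://www.yahoo.com");
// Ask for bytes 0-299 only; a server without range support sends the whole page.
wr.AddRange("bytes", 0, 299);
HttpWebResponse wre = (HttpWebResponse)wr.GetResponse();
StreamReader sr = new StreamReader(wre.GetResponseStream());
// ReadBlock keeps reading until the buffer is full or the stream ends.
int read = sr.ReadBlock(data, 0, data.Length);
Console.WriteLine(new string(data, 0, read));
sr.Close();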

EDIT: Try a network monitoring tool to find out what text the servers send back. I used Fiddler to see the output and wrote it to the console.

EDIT2: I am assuming the title is at the beginning of the page.

answered 2009-03-01T10:15:55.197