2

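I need to be able to tell whether a link (URL) points to an XML file (RSS feed) or a regular HTML file just by looking at the headers or something similar, without downloading it.

Any good advice for me there? :)

Thanks! Roy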

4

7 Answers

11

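You can just do a HEAD request instead of a full POST/GET.

That will give you the headers for the page, which should include the content type. From that you should be able to tell whether it's text/html or XML.

There's a good example here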

answered 2009-05-25T10:01:47.073
5

Following up on Eoin Campbell's response, here's a code snippet that should do exactly that using the System.Net functionality:

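// WebRequest is not IDisposable, so only the response goes in a using block.
var request = System.Net.WebRequest.Create(
    "http://tempuri.org/pathToFile");
request.Method = "HEAD";

using (var response = request.GetResponse())
{
    // The header may carry parameters such as "; charset=utf-8",
    // so compare only the media type itself.
    var mediaType = (response.ContentType ?? "")
        .Split(';')[0].Trim().ToLowerInvariant();

    switch (mediaType)
    {
        case "text/xml":
            // ...
            break;
        case "text/html":
            // ...
            break;
    }
}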

Of course, this assumes that the web server publishes the content (MIME) type and does so correctly. But since you stated that you want a bandwidth-efficient way of doing this, I assume you don't want to download all the markup and analyse that! To be honest, the content type is usually set correctly in any case.

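answered 2009-05-25T10:05:00.123
2

You can use the Content-Type header, and to save bandwidth you can ask the web server to give you only a specified part of the document. If the server includes an Accept-Ranges: bytes header in its response, you can send Range: bytes=0-10 to download only the first eleven bytes (or even try downloading nothing at all).

Also look into the HEAD verb instead of GET.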
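
The Range idea above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name RangeProbe, its two methods, and the example URL are hypothetical, and it assumes the server honours the Range header. A server that ignores it will simply start sending the whole body, in which case the code still stops reading after the requested number of bytes.

```csharp
using System;
using System.Net;

public static class RangeProbe
{
    // Builds a GET request asking only for bytes 0 .. count-1.
    // Nothing is sent over the network at this point.
    public static HttpWebRequest BuildRequest(string url, int count)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AddRange(0, count - 1); // adds "Range: bytes=0-<count-1>"
        return request;
    }

    // Sends the request and reads at most `count` bytes of the body.
    public static byte[] ReadPrefix(string url, int count)
    {
        var request = BuildRequest(url, count);
        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            var buffer = new byte[count];
            int total = 0, read;
            while (total < count &&
                   (read = stream.Read(buffer, total, count - total)) > 0)
            {
                total += read;
            }
            Array.Resize(ref buffer, total); // the body may be shorter than requested
            return buffer;
        }
    }
}
```

For example, RangeProbe.ReadPrefix("http://example.com/feed", 11) would request the same Range: bytes=0-10 span mentioned above; the URL here is a placeholder.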

answered 2009-05-25T10:01:00.850
1

Check the headers in your HttpWebResponse object. The Content-Type header should read text/xml for an XML/RSS document and text/html for a standard web page.
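
In practice the Content-Type value often carries a parameter (for example text/html; charset=utf-8), and feeds are frequently served as application/rss+xml or application/xml rather than text/xml, so an exact string comparison can miss them. A small helper along these lines might make the check more forgiving. The names DocumentKind and ContentTypeClassifier are made up for this sketch, and treating every "+xml" type as XML (apart from application/xhtml+xml, counted as HTML here) is an assumption you may want to adjust.

```csharp
using System;

public enum DocumentKind { Xml, Html, Unknown }

public static class ContentTypeClassifier
{
    // Classifies a raw Content-Type header value,
    // e.g. "text/html; charset=utf-8".
    public static DocumentKind Classify(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return DocumentKind.Unknown;

        // Drop parameters (charset, boundary, ...) and normalise case.
        string mediaType = contentType.Split(';')[0].Trim().ToLowerInvariant();

        // Assumption: XHTML pages count as regular web pages.
        if (mediaType == "text/html" || mediaType == "application/xhtml+xml")
            return DocumentKind.Html;

        // text/xml, application/xml, and "+xml" types such as application/rss+xml.
        if (mediaType == "text/xml" ||
            mediaType == "application/xml" ||
            mediaType.EndsWith("+xml", StringComparison.Ordinal))
            return DocumentKind.Xml;

        return DocumentKind.Unknown;
    }
}
```

You would pass it response.ContentType from the HttpWebResponse; anything classified as Unknown (application/octet-stream, for instance) still needs a fallback.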

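answered 2009-05-25T10:06:33.053
0

You can't work out what file type it is just by looking at the URL.

I suggest you either check the MIME type of the document you request, or read the first line and hope the author included a Doctype.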
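
The "read the first line" fallback could look something like the sketch below. It is a rough heuristic under stated assumptions, not a parser: the class name MarkupSniffer and the particular prefixes it checks are choices made for this illustration. It will, for example, report an XHTML page that begins with an XML declaration as "xml".

```csharp
using System;

public static class MarkupSniffer
{
    // Returns "xml", "html", or "unknown" based on the first
    // characters of a document.
    public static string Sniff(string prefix)
    {
        if (prefix == null)
            return "unknown";

        // Skip a byte order mark and leading whitespace.
        string head = prefix.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        if (head.StartsWith("<!DOCTYPE html", StringComparison.OrdinalIgnoreCase) ||
            head.StartsWith("<html", StringComparison.OrdinalIgnoreCase))
            return "html";

        // An XML declaration, or the root element of an RSS or Atom feed.
        if (head.StartsWith("<?xml", StringComparison.Ordinal) ||
            head.StartsWith("<rss", StringComparison.Ordinal) ||
            head.StartsWith("<feed", StringComparison.Ordinal))
            return "xml";

        return "unknown";
    }
}
```

The prefix would come from the first few dozen characters of the response body, so this only makes sense once you have decided to download at least part of the document.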

answered 2009-05-25T10:01:40.490
0

Generally speaking, this is impossible. This is because it is possible (though unhelpful) to serve either HTML or XML files as application/octet-stream. Also, as noted by others, there are multiple valid XML MIME types. However, a HEAD request followed by a content type check could work sometimes:

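WebRequest req = WebRequest.Create(url);
req.Method = "HEAD"; // must be set before GetResponse() sends the request
String contentType;
using (WebResponse resp = req.GetResponse())
  contentType = resp.ContentType;

// StartsWith tolerates a trailing parameter such as "; charset=utf-8"
if(contentType.StartsWith("text/xml"))
  getXML(url);
else if(contentType.StartsWith("text/html"))
  getHTML(url);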

But if you're going to process it somehow either way, you can do:

WebRequest req = WebRequest.Create(url);
using (WebResponse resp = req.GetResponse())
{
  String contentType = resp.ContentType;

  // StartsWith tolerates a trailing parameter such as "; charset=utf-8"
  if(contentType.StartsWith("text/xml"))
    processXML(resp.GetResponseStream());
  else if(contentType.StartsWith("text/html"))
    processHTML(resp.GetResponseStream());
  else
  {
    // process error condition
  }
}

Keep in mind, files are downloaded on an as-needed basis. So just asking for the response object does not cause the whole file to be downloaded.

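answered 2009-05-25T10:11:50.143
-3

Just read it in with a "text" reader. Then decide which type fits best, for example by looking for a few tags that come to mind ;) Then pass it to your actual reader.

Or is that too simple?

answered 2009-05-25T10:00:06.420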