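I need to be able to tell whether a link (URL) points to an XML file (an RSS feed) or a regular HTML file just by looking at the headers or something similar, without downloading the file.
Any good advice for me out there? :)
Thanks! Roy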
Following up on Eoin Campbell's response, here is a code snippet that should do exactly that using the System.Net functionality:
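// WebRequest does not implement IDisposable, so only the response goes in a using block.
var request = System.Net.HttpWebRequest.Create("http://tempuri.org/pathToFile");
request.Method = "HEAD"; // ask for the headers only, not the body
using (var response = request.GetResponse())
{
    // The header may carry parameters, e.g. "text/html; charset=utf-8",
    // so compare only the media type in front of the first ';'.
    var mediaType = (response.ContentType ?? "").Split(';')[0].Trim().ToLowerInvariant();
    switch (mediaType)
    {
        case "text/xml":
            // ...
            break;
        case "text/html":
            // ...
            break;
    }
}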
Of course, this assumes that the web server publishes the content (MIME) type and does so correctly. But since you stated that you want a bandwidth-efficient way of doing this, I assume you don't want to download all the markup and analyse it! To be honest, the content type is usually set correctly in any case.
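In practice the Content-Type value often carries a charset parameter, and feeds are commonly served under more than one XML media type. A minimal sketch of a more tolerant check follows; the class name, the enum, and the decision to count application/xhtml+xml as HTML are my own assumptions, not something from the answer above.

```csharp
using System;

// Hypothetical helper: classifies a raw Content-Type header value.
public static class ContentTypeClassifier
{
    public enum DocumentKind { Xml, Html, Unknown }

    public static DocumentKind Classify(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType))
            return DocumentKind.Unknown;

        // Drop parameters such as "; charset=utf-8" and normalize case.
        string mediaType = contentType.Split(';')[0].Trim().ToLowerInvariant();

        // Assumption: XHTML is treated as a regular web page, not as a feed.
        if (mediaType == "text/html" || mediaType == "application/xhtml+xml")
            return DocumentKind.Html;

        // text/xml, application/xml, and any "+xml" type
        // (application/rss+xml, application/atom+xml, ...).
        if (mediaType == "text/xml" || mediaType == "application/xml"
            || mediaType.EndsWith("+xml", StringComparison.Ordinal))
            return DocumentKind.Xml;

        return DocumentKind.Unknown;
    }
}
```

You would call it as ContentTypeClassifier.Classify(response.ContentType) and treat Unknown as "could not tell from the headers".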
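You can use the Content-Type header, and to save bandwidth you can ask the web server to send you only a specified part of the document. If the server includes the Accept-Ranges: bytes header in its response, you can use Range: bytes=0-9 to download only the first ten bytes (or even try to download nothing at all).
Also look into the HEAD verb instead of GET.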
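A rough sketch of the Range idea with the same System.Net classes used elsewhere in this thread. The class and method names are made up for illustration, and it assumes the body is text that can be decoded as UTF-8. Byte ranges are inclusive, so the first N bytes are bytes=0-(N-1). A server that ignores Range will answer with the full body, which is why the read loop stops after byteCount bytes anyway.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: fetches only the first bytes of a resource.
public static class RangeProbe
{
    public static string ReadPrefix(string url, int byteCount)
    {
        if (byteCount <= 0)
            throw new ArgumentOutOfRangeException("byteCount");

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Sends "Range: bytes=0-(byteCount-1)"; both ends are inclusive.
        request.AddRange(0, byteCount - 1);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            var buffer = new byte[byteCount];
            int total = 0;
            int read;
            // Read at most byteCount bytes, even if the server sent more.
            while (total < byteCount &&
                   (read = stream.Read(buffer, total, byteCount - total)) > 0)
            {
                total += read;
            }
            // Assumption: the prefix is decodable as UTF-8.
            return Encoding.UTF8.GetString(buffer, 0, total);
        }
    }
}
```

Something like RangeProbe.ReadPrefix(url, 64) is usually enough to see an XML declaration or a doctype.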
Check the headers in your HttpWebResponse object. The Content-Type header should read text/xml for an XML/RSS document and text/html for a standard web page.
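You can't find out what type of file it is just by looking at the URL.
I suggest you try checking the MIME type of the document you request, or read the first line and hope the author included a doctype.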
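As a sketch of the "read the first line" fallback, the function below guesses from the first characters of a document. The names and the list of markers are my own assumptions, and it is only a heuristic: an XHTML page that starts with an XML declaration would be reported as "xml".

```csharp
using System;

// Hypothetical helper: guesses the markup type from the start of a document.
public static class MarkupSniffer
{
    // Returns "xml", "html", or "unknown".
    public static string Sniff(string prefix)
    {
        if (string.IsNullOrEmpty(prefix))
            return "unknown";

        // Skip a byte-order mark and leading whitespace.
        string text = prefix.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        if (text.StartsWith("<!DOCTYPE html", StringComparison.OrdinalIgnoreCase) ||
            text.StartsWith("<html", StringComparison.OrdinalIgnoreCase))
            return "html";

        if (text.StartsWith("<?xml", StringComparison.Ordinal) ||
            text.StartsWith("<rss", StringComparison.OrdinalIgnoreCase) ||
            text.StartsWith("<feed", StringComparison.OrdinalIgnoreCase))
            return "xml";

        return "unknown";
    }
}
```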
Generally speaking, this is impossible. That is because it is possible (though unhelpful) to serve either HTML or XML files as application/octet-stream. Also, as noted by others, there are multiple valid XML MIME types. However, a HEAD request followed by a content type check could work sometimes:
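WebRequest req = WebRequest.Create(url);
req.Method = "HEAD"; // must be set before the request is sent
using (WebResponse resp = req.GetResponse())
{
    // Note: an exact comparison misses values like "text/html; charset=utf-8".
    String contentType = resp.ContentType;
    if (contentType == "text/xml")
        getXML(url);
    else if (contentType == "text/html")
        getHTML(url);
}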
But if you're going to process it somehow either way, you can do:
WebRequest req = WebRequest.Create(url);
using (WebResponse resp = req.GetResponse())
{
    String contentType = resp.ContentType;
    if (contentType == "text/xml")
        processXML(resp.GetResponseStream());
    else if (contentType == "text/html")
        processHTML(resp.GetResponseStream());
    else
    {
        // process error condition
    }
}
Keep in mind, files are downloaded on an as-needed basis. So just asking for the response object does not cause the whole file to be downloaded.
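Just read it in a "text" reader. Then decide which type is the best fit, for example by looking for a few tags that come to mind ;) and then feed it into your actual reader.
Or is that too simple?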