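I have saved a web page's source code (an option available in every browser). Now I want to capture everything between double quotes that starts with http://.
How can I do this?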
Viewed 277 times
2 Answers
1
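Use the HTML Agility Pack:
using System;
using System.Linq;
using HtmlAgilityPack;

string path = ...
var doc = new HtmlDocument();
doc.Load(path);
// Collect every attribute value, on any element, that starts with "http://".
var links =
    from e in doc.DocumentNode.Descendants()
    from a in e.Attributes
    where a.Value.StartsWith("http://", StringComparison.Ordinal)
    select a.Value;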
(Note that this returns only links found in HTML attributes, not links that appear in plain text.)
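
To make the caveat above concrete, here is a minimal sketch that runs the same query against an in-memory string instead of a file. The class name, method name, and sample markup are hypothetical; it assumes the HtmlAgilityPack NuGet package is referenced and uses its `LoadHtml` method in place of `Load(path)`.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class AttributeLinkDemo
{
    // Hypothetical helper: same query as the answer, in method syntax.
    public static string[] AttributeLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string rather than a file path
        return doc.DocumentNode.Descendants()
            .SelectMany(e => e.Attributes)
            .Where(a => a.Value.StartsWith("http://", StringComparison.Ordinal))
            .Select(a => a.Value)
            .ToArray();
    }

    public static void Main()
    {
        // Made-up sample: one URL in plain text, one in an href attribute.
        string html =
            "<p>Plain text http://example.com/text " +
            "<a href=\"http://example.com/attr\">link</a></p>";

        foreach (string link in AttributeLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

With this input the query should yield only the `href` value, because the other URL sits in a text node and text nodes carry no attributes.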
answered 2013-04-28T16:16:48.057
0
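Use a regular expression:
Imports System.Text.RegularExpressions
' Match a double quote, then http:// and the shortest run up to the next double quote.
Dim mc As MatchCollection = Regex.Matches(html, """(http://.+?)""", RegexOptions.IgnoreCase)
For Each m As Match In mc
    ' Groups(1) holds the URL without the surrounding quotes.
    Console.WriteLine(m.Groups(1).Value)
Next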
Sample output when html = this page's source code:
http://cdn.sstatic.net/stackoverflow/img/favicon.ico
http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png
http://cdn.sstatic.net/js/stub.js?v=181da36f6419
http://cdn.sstatic.net/stackoverflow/all.css?v=0f0c93534e2b
http://stackoverflow.com/questions/16264292/extract-all-values-between-double-quotes-from-a-webpages-source-code
http://www.gravatar.com/avatar/91d33760d2823fa7cf5c95b41a16fada?s=32&d=identicon&r=PG\
http://stackoverflow.com/users/2264365/ajakblackgoat
http://stackexchange.com
http://chat.stackoverflow.com
... etc
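
For readers working in C#, the same idea might be sketched as follows. This is a variation rather than a line-for-line port: it uses `[^"]+` instead of `.+?`, so a match can never run past the closing quote. The class name, method name, and sample HTML are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedUrlExtractor
{
    // A double quote, then http:// plus one or more non-quote characters
    // (captured as group 1), then the closing double quote.
    private static readonly Regex QuotedHttp =
        new Regex("\"(http://[^\"]+)\"", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns each captured URL without its quotes.
    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in QuotedHttp.Matches(html))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Made-up sample input; in practice html would hold the saved page source.
        string html =
            "<a href=\"http://example.com/a\">x</a> " +
            "<img src=\"/local.png\"> " +
            "<script src=\"http://example.com/b.js\"></script>";

        foreach (string url in Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Like the VB.NET version, this only sees double-quoted values that begin with http://, so single-quoted attributes and https:// links are skipped unless the pattern is widened.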
answered 2013-04-28T16:14:12.460