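I'm trying to use a regular expression to get the links to the 10 websites that Google lists on the first results page when you search for something. I'm very new to regular expressions and have been having a lot of trouble getting this to work: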

    MatchCollection links = Regex.Matches(indexPage, @"<h3 class=""r""><a href=""\s*(.+?)\s*"" class=l", RegexOptions.Multiline);

Once I have the links in the collection, I add them to a list here:

    foreach (Match link in links) {
        string result = link.Groups[1].Value;
        results.Add(result);
    }

It isn't finding any links. Any help would be greatly appreciated.
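One possible reason the pattern above finds nothing is that it is very literal. It requires the exact text `<h3 class="r"><a href="` followed, after the link, by an unquoted ` class=l`, so any difference in quoting, spacing or attribute order in the downloaded page makes every match fail. (`RegexOptions.Multiline` only changes how `^` and `$` behave, and the pattern uses neither.) Below is a minimal C# sketch of a more tolerant pattern. The sample markup in `indexPage` is hypothetical and is not Google's real result HTML, which has changed over time, so check the page you actually receive before relying on any pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample markup (an assumption, not Google's real HTML):
// one result with double quotes, one with single quotes and attributes reordered.
string indexPage =
    "<h3 class=\"r\"><a href=\"http://example.com/a\" class=\"l\">A</a></h3>" +
    "<h3 class='r'><a class=l href='http://example.org/b'>B</a></h3>";

// Matches an <h3> whose class is r (quoted or not), then captures the href of
// the <a> tag that follows it (apart from whitespace). Group 1 is the opening
// quote character, and \1 requires the same quote to close the value.
var pattern = new Regex(
    @"<h3[^>]*\bclass=[""']?r\b[""']?[^>]*>\s*<a[^>]*?\bhref=([""'])(?<url>.*?)\1",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

var results = new List<string>();
foreach (Match link in pattern.Matches(indexPage))
{
    // Read the named group instead of relying on group numbering.
    results.Add(link.Groups["url"].Value);
}

foreach (string url in results)
    Console.WriteLine(url);
```

Against this sample, the question's original pattern matches nothing, because neither result contains the exact `" class=l` sequence it requires, while the sketch returns both URLs. For real-world scraping, an HTML parser is usually more robust than any regular expression.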


This matches all URLs:

    "#^((?#
    the scheme:
    )(?:https?://)(?#
    second level domains and beyond:
    )(?:[\S]+\.)+((?#
    top level domains:
    )MUSEUM|TRAVEL|AERO|ARPA|ASIA|EDU|GOV|MIL|MOBI|(?#
    )COOP|INFO|NAME|BIZ|CAT|COM|INT|JOBS|NET|ORG|PRO|TEL|(?#
    )A[CDEFGILMNOQRSTUWXZ]|B[ABDEFGHIJLMNORSTVWYZ]|(?#
    )C[ACDFGHIKLMNORUVXYZ]|D[EJKMOZ]|(?#
    )E[CEGHRSTU]|F[IJKMOR]|G[ABDEFGHILMNPQRSTUWY]|(?#
    )H[KMNRTU]|I[DELMNOQRST]|J[EMOP]|(?#
    )K[EGHIMNPRWYZ]|L[ABCIKRSTUVY]|M[ACDEFGHKLMNOPQRSTUVWXYZ]|(?#
    )N[ACEFGILOPRUZ]|OM|P[AEFGHKLMNRSTWY]|QA|R[EOSUW]|(?#
    )S[ABCDEGHIJKLMNORTUVYZ]|T[CDFGHJKLMNOPRTVWZ]|(?#
    )U[AGKMSYZ]|V[ACEGINU]|W[FS]|Y[ETU]|Z[AMW])(?#
    the path, can be there or not:
    )(/[a-z0-9\._/~%\-\+&\#\?!=\(\)@]*)?)$#i"
answered 2012-12-11T16:52:45.997
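The pattern in this answer is written in PHP/PCRE style: the `#` characters are delimiters and the trailing `i` is a case-insensitivity flag, and .NET understands neither. In C# you would pass only the text between the delimiters to `Regex` and use `RegexOptions.IgnoreCase` instead. The `(?#...)` inline comments are supported by .NET as written. Because the pattern is anchored with `^` and `$`, it tests whether an entire string is a URL rather than finding URLs inside a larger page. The sketch below shows this with an abbreviated top-level-domain list, which is an assumption made to keep the example short; the answer's full list can be substituted.

```csharp
using System;
using System.Text.RegularExpressions;

// Same structure as the answer's pattern, minus the PHP '#' delimiters and
// the (?#...) comments. The TLD alternation is ABBREVIATED for illustration.
// The trailing 'i' flag becomes RegexOptions.IgnoreCase.
var urlPattern = new Regex(
    @"^(?:https?://)(?:[\S]+\.)+(?:COM|NET|ORG|EDU|GOV|INFO|UK|DE)(/[a-z0-9\._/~%\-\+&\#\?!=\(\)@]*)?$",
    RegexOptions.IgnoreCase);

string[] candidates =
{
    "http://www.example.com/path?x=1", // scheme, host, TLD and path
    "https://sub.example.org",         // no path (the path group is optional)
    "ftp://example.com",               // scheme not allowed by https?://
    "see http://example.com",          // text before the URL is rejected by ^
};

foreach (string s in candidates)
    Console.WriteLine($"{s} -> {urlPattern.IsMatch(s)}");
```

With this pattern, the first two candidates match, while `ftp://example.com` (unsupported scheme) and `see http://example.com` (text before the URL, ruled out by `^`) do not.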