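Hello, I need a regular expression that retrieves all links pointing to the local domain, not to external sites. So far I have the following, but it only returns external pages: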

<%
function getPage(strURL)
    dim strBody, objXML, status

    set objXML = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    objXML.Open "GET", strURL, False
    'objXML.setRequestHeader "User-Agent", "ddd" '===  falsify the agent
    'objXML.setRequestHeader "Content-Type", "text/html; Charset:ISO-8859-1"
    'objXML.setRequestHeader "Content-Type", "text/html; Charset:UTF-8"
    objXML.Send
    status = objXML.status

    'Report anything other than a successful response
    if err.number <> 0 or status <> 200 then
        if status = 404 then
            Response.Write "[EFERROR]Page does not exist (404)."
        elseif status = 401 then
            Response.Write "[EFERROR]Access denied (401)."
        elseif status >= 500 and status < 600 then
            Response.Write "[EFERROR]500 Internal Server Error on remote site."
        else
            Response.Write "[EFERROR]Server is down or does not exist."
        end if
    end if
    strBody = objXML.responseText

    set objXML = nothing
    getPage = strBody

    'First, create a reg exp object
    Dim objRegExp
    Set objRegExp = New RegExp

    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "<a\s+href=""http://(.*?)"">\s*((\n|.)+?)\s*</a>"

    'Display all of the matches
    Dim objMatch
    For Each objMatch in objRegExp.Execute(strBody)
        Response.Write("http://" & objMatch.SubMatches(0) & "<br>")
    Next
end function

getPage("http://www.google.com")
%>

Thank you.

1 Answer

This may be stating the obvious, but if you are searching for links on "localdomain.com", isn't it just

objRegExp.Pattern = "<a\s+href=""http://(.*?)localdomain\.com"">\s*((\n|.)+?)\s*</a>"

?

Edit: the regex pattern could perhaps use the passed-in URL this way:

objRegExp.Pattern = "<a\s+href=""" & strURL & "(.*?)"">\s*((\n|.)+?)\s*</a>"

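The retrieved matches then also need strURL prepended. Since strURL already starts with "http://", the prefix should not be added a second time:

For Each objMatch in objRegExp.Execute(strBody)
  Response.Write(strURL & objMatch.SubMatches(0) & "<br>")
Next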
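
One caveat with building the pattern from strURL: characters such as "." are regex metacharacters, so the URL should probably be escaped before it is concatenated into the pattern. The following is only a minimal sketch of that idea, meant to sit inside an ASP <% %> block. The names EscapeForRegExp and WriteLocalLinks are hypothetical helpers invented for this illustration, and the sketch assumes that links are written as absolute URLs, with href as the first attribute and double-quoted values.

```vbscript
' Hypothetical helper (not in the original code): put a backslash in front
' of every regex metacharacter so the text is matched literally.
Function EscapeForRegExp(strText)
    Dim objEsc
    Set objEsc = New RegExp
    objEsc.Global = True
    objEsc.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
    EscapeForRegExp = objEsc.Replace(strText, "\$1")
    Set objEsc = Nothing
End Function

' Hypothetical helper: write every link in strBody whose href starts
' with strURL, for example "http://www.localdomain.com".
Sub WriteLocalLinks(strBody, strURL)
    Dim objRegExp, objMatch
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    ' [^""]* stops at the closing quote, so a match never runs past
    ' the end of a single href value.
    objRegExp.Pattern = "<a\s+href=""" & EscapeForRegExp(strURL) & "([^""]*)"""

    For Each objMatch In objRegExp.Execute(strBody)
        ' SubMatches(0) is the part of the link after strURL
        Response.Write Server.HTMLEncode(strURL & objMatch.SubMatches(0)) & "<br>"
    Next
    Set objRegExp = Nothing
End Sub
```

Under these assumptions, calling WriteLocalLinks strBody, strURL in place of the regex section of getPage should list only links under the requested URL. Relative links such as href="/about" also point to the local domain but are not matched by this pattern, so they would need separate handling.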
Answered 2012-09-06T11:47:19.833