I get the following output from an ajax request:

<div style='font-size:12px; font-weight:bold; line-height:17px;'>These results were cached from March 10, 2021, 1:11 pm PST to conserve server resources. <br/>If you are diagnosing a certificate installation problem,
                        you can get uncached results by <a href="" id="captchaLink">clicking here</a>.</div><table class='checker_messages'><tr><td class='passed'>&nbsp;</td><td><h3>www.Google.com resolves to 172.217.11.36</h3><</h3></td><tr><tr><td class='passed'>&nbsp;</td><td><h3>The certificate should be trusted by all major web browsers (all the correct intermediate certificates are installed).</h3></td><tr><tr><td class='passed'>&nbsp;</td><td><h3><table class=""><tr><td>The certificate will expire in <span id="cert_expiration_days">62</span> days. </td>
                                                        <td style="padding-left:10px;"><a href="" class="btn btn-blue" id="reminderButton">Remind me</a></td></tr></table><input type="hidden" id="cert_valid_to" value="1620822887" /></h3></td><tr><tr><td class='passed'>&nbsp;</td><td><h3>The hostname (www.Google.com) is correctly listed in the certificate.</h3></td><tr></table><table class='checker_certs'><tr><td class='cert'><img src='/assets/templates/sslshopper/images/sslchecker/certificate_good_server.png' height='128' width='128' /></td><td><b>Common name:</b> www.google.com<br/><b>SANs:</b> www.google.com<br/><b>Organization:</b> Google LLC<br/><b>Location:</b> Mountain View, California,   US<br/><b>Valid</b> from February 17, 2021 to May 12, 2021<br/><b>Serial Number:</b> 46638b76e6854ad205000000008779ef<br/><b>Signature Algorithm:</b> sha256WithRSAEncryption<br/><b>Issuer:</b> GTS CA 1O1<td></tr><tr><td class='chain'><img src='/assets/templates/sslshopper/images/sslchecker/arrow_down.png' height='48' width='48' /></td><td>&nbsp;</td></tr><tr><td class='cert'><img src='/assets/templates/sslshopper/images/sslchecker/certificate_good_chain.png' height='128' width='128' /></td><td><b>Common name:</b> GTS CA 1O1<br/><b>Organization:</b> Google Trust Services<br/><b>Location:</b>  US<br/><b>Valid</b> from June 14, 2017 to December 14, 2021<br/><b>Serial Number:</b> 01e3b49aa18d8aa981256950b8<br/><b>Signature Algorithm:</b> sha256WithRSAEncryption<br/><b>Issuer:</b> GlobalSign<td></tr></table><input type='hidden' id='reminderCertID' value='58366913' /><input type='hidden' id='expirationDate' value='1620822887' /><input type='hidden' id='clean_hostname' value='www.Google.com' />

When I try to parse the `td` elements with goquery using the following snippet:

    doc, err := goquery.NewDocumentFromReader(strings.NewReader(pageContent))
    if err != nil {
        panic(err)
    }
    doc.Find("td").Each(func(i int, s *goquery.Selection) {
        fmt.Printf("%s\n", s.Text())
    })

Output:

www.Google.com resolves to 172.217.11.36
 
Server Type:  gws

 
The certificate should be trusted by all major web browsers (all the correct intermediate certificates are installed).
 
The certificate will expire in 62 days. 
                                                        Remind me
The certificate will expire in 62 days. 
Remind me
 
The hostname (www.Google.com) is correctly listed in the certificate.

Common name: www.google.comSANs: www.google.comOrganization: Google LLCLocation: Mountain View, California,   USValid from February 17, 2021 to May 12, 2021Serial Number: 46638b76e6854ad205000000008779efSignature Algorithm: sha256WithRSAEncryptionIssuer: GTS CA 1O1


 

Common name: GTS CA 1O1Organization: Google Trust ServicesLocation:  USValid from June 14, 2017 to December 14, 2021Serial Number: 01e3b49aa18d8aa981256950b8Signature Algorithm: sha256WithRSAEncryptionIssuer: GlobalSign

When I select the `b` tag instead of `td`, I get the following output:

Common name:
SANs:
Organization:
Location:
Valid
Serial Number:
Signature Algorithm:
Issuer:
Common name:
Organization:
Location:
Valid
Serial Number:
Signature Algorithm:
Issuer:

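The output I am trying to get is just `Organization: Google LLC`. I only recently started using StackOverflow and am new to golang, so I am not familiar with the environment; if I have made a mistake, please let me know.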

1 Answer

I was able to get the right output by adding a couple of string replacements.

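    // Drop every closing </b>, then turn each <br/> into </b>, so that each
    // <b> element wraps both the label and its value.
    res1 := strings.ReplaceAll(pageContent, "</b>", "")
    res2 := strings.ReplaceAll(res1, "<br/>", "</b>")
    doc, err := goquery.NewDocumentFromReader(strings.NewReader(res2))
    if err != nil {
        panic(err)
    }
    doc.Find("b").Each(func(i int, s *goquery.Selection) {
        // Print the first child of each <b>, which is its leading text node.
        fmt.Println(s.Nodes[0].FirstChild.Data)
    })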

Output:

Common name: www.google.com
SANs: www.google.com
Organization: Google LLC
Location: Mountain View, California,   US
Valid from February 17, 2021 to May 12, 2021
Serial Number: 46638b76e6854ad205000000008779ef
Signature Algorithm: sha256WithRSAEncryption
Issuer: GTS CA 1O1
Common name: GTS CA 1O1
Organization: Google Trust Services
Location:  US
Valid from June 14, 2017 to December 14, 2021
Serial Number: 01e3b49aa18d8aa981256950b8
Signature Algorithm: sha256WithRSAEncryption
Issuer: GlobalSign
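
The two `ReplaceAll` calls work because, in the original markup, each value is a bare text node sitting between `</b>` and `<br/>`. A minimal sketch of the rewrite on a shortened fragment follows; the `fragment` string is an abbreviated stand-in for `pageContent`, and `rewrap` is a hypothetical helper name used only for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// rewrap is a hypothetical helper that applies the same two replacements
// as above: drop every closing </b>, then turn each <br/> into </b>,
// so that label and value end up inside the same <b> element.
func rewrap(html string) string {
	withoutClose := strings.ReplaceAll(html, "</b>", "")
	return strings.ReplaceAll(withoutClose, "<br/>", "</b>")
}

func main() {
	// Abbreviated stand-in for pageContent.
	fragment := "<b>Organization:</b> Google LLC<br/><b>Location:</b> US<br/>"
	fmt.Println(rewrap(fragment))
	// prints: <b>Organization: Google LLC</b><b>Location: US</b>
}
```

Note that this rewrites every `<br/>` in the page, not only those inside the certificate cells. The last field in each cell (`Issuer:`) has no trailing `<br/>`, so its `<b>` stays unclosed and the result relies on the HTML parser's error recovery.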

But now I only want the `Organization: Google LLC` line.
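
One way to narrow the output to a single line is to keep only the first entry that starts with the wanted label. The sketch below assumes the text of each `<b>` node has already been collected into a slice of strings; `findField` is a hypothetical helper, not part of goquery:

```go
package main

import (
	"fmt"
	"strings"
)

// findField is a hypothetical helper: it returns the first line that
// starts with the given label, and false if no line matches.
func findField(lines []string, label string) (string, bool) {
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, label) {
			return trimmed, true
		}
	}
	return "", false
}

func main() {
	// A few of the lines printed above, standing in for the collected node text.
	lines := []string{
		"Common name: www.google.com",
		"Organization: Google LLC",
		"Organization: Google Trust Services",
	}
	if line, ok := findField(lines, "Organization:"); ok {
		fmt.Println(line) // prints: Organization: Google LLC
	}
}
```

Returning the first match matters here because the page lists two certificates, each with its own `Organization:` field. The same `strings.HasPrefix` check could also go directly inside the goquery callback; goquery's `EachWithBreak` stops iterating once the callback returns `false`.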

answered 2021-03-30T12:37:04.000