go - Scrape only a certain <div> using gocolly

Question

I'm trying to build a web scraper with gocolly. I only want to scrape the <div> element with the id dailyText on https://wol.jw.org/en/wol/h/r1/lp-e. How can I do this?

score 0 · Accepted Answer

Thanks to xarantolus for the answer.
This worked for me (if the domain allows me to use it, that is).

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    cly := colly.NewCollector(
        // AllowedDomains expects bare host names, without the scheme.
        colly.AllowedDomains("yourpage.site"),
    )
    // Match only the <div> whose id is dailyText and print its text.
    cly.OnHTML("div#dailyText", func(e *colly.HTMLElement) {
        fmt.Printf("Text found: %q\n", e.Text)
    })
    cly.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL.String())
    })
    // Visit returns an error, not the page content.
    if err := cly.Visit("https://yourpage.site"); err != nil {
        log.Fatal(err)
    }
}
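
One detail worth checking in the snippet above is the value passed to colly.AllowedDomains: as far as I can tell, colly compares it against the host of each request URL, so it should be a bare host name rather than a full URL. The sketch below uses only the standard library to derive that host from the page URL in the question. The helper name allowedHost is hypothetical and not part of colly.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedHost is a hypothetical helper: it returns the bare host name of
// rawURL, the form assumed here to be what colly.AllowedDomains expects.
func allowedHost(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// A URL without a scheme parses as a path, leaving the host empty.
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in %q", rawURL)
	}
	return u.Hostname(), nil
}

func main() {
	host, err := allowedHost("https://wol.jw.org/en/wol/h/r1/lp-e")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(host) // prints "wol.jw.org"
}
```

The returned string could then be passed to colly.AllowedDomains, while the full URL goes to Visit.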
