
I'm trying to scrape a product's productId, but I can't get it. Please help.

HTML code

<span class="info">
 <button data-product='{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}'>

When I try

h.ChildAttr("span.info>button", "data-product")

the result is {"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}

When I try

h.ChildAttr("span.info>button", "productId")

I get no result. How can I get this data with colly?


1 Answer


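The attribute value is a raw string. Here that string happens to be JSON, and productId is a key inside it, not an HTML attribute of the button, so asking for a "productId" attribute returns nothing. You need to read data-product and parse the JSON to get the value.

For example: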

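package main

import (
    "encoding/json"
    "log"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    c.OnHTML(`body`, func(e *colly.HTMLElement) {
        // data-product holds the whole JSON document as a single string.
        text := e.ChildAttr("span.info>button", "data-product")

        // Decode the JSON so individual keys can be looked up.
        var result map[string]interface{}
        err := json.Unmarshal([]byte(text), &result)
        if err != nil {
            log.Println(err)
            return
        }
        log.Println(result["productId"])
    })

    // Replace the placeholder with the page to scrape.
    if err := c.Visit("[some url]"); err != nil {
        log.Fatal(err)
    }
}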

Output

2021/10/21 14:23:24 which I want to scrape
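
If you only need a few known fields, decoding into a struct instead of a map avoids type assertions on interface{} values. The following is a minimal sketch of that variant, not part of the original answer: the product type and the parseProduct helper are hypothetical names, and the attribute string is hard-coded so the example runs without colly or a network request. In a real scraper you would pass the value returned by e.ChildAttr.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// product mirrors only the fields of interest in the data-product JSON.
// The type name and the choice of fields are illustrative assumptions;
// keys that are not listed here are ignored by json.Unmarshal.
type product struct {
	ProductID   string `json:"productId"`
	ProductName string `json:"productName"`
	Price       string `json:"price"`
}

// parseProduct is a hypothetical helper that decodes the raw attribute
// string (as returned by e.ChildAttr) into a product value.
func parseProduct(attr string) (product, error) {
	var p product
	if err := json.Unmarshal([]byte(attr), &p); err != nil {
		return product{}, fmt.Errorf("decode data-product: %w", err)
	}
	return p, nil
}

func main() {
	// Sample value copied from the question's data-product attribute.
	attr := `{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}`

	p, err := parseProduct(attr)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(p.ProductID)
}
```

Inside the OnHTML callback, the call would be parseProduct(e.ChildAttr("span.info>button", "data-product")), with the error handled the same way as in the map-based version.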
Answered 2021-10-21T12:25:46.880