I'm using the x-ray module to scrape data from HTML like the snippet below. How can I scrape a definition list like this with x-ray?

<div class="content">
    <h1>Title</h1>
    <div class="productDetails">
        <dl>
            <dt>1</dt>
            <dd>A</dd>
            <dt>2</dt>
            <dd>B</dd>
            <dt>3</dt>
            <dd>C</dd>
        </dl>
    </div>
</div>

I'd like to get the following result:

[
  {
    "productTitle": "Title",
    "productDetails": [
      {
        "attrName": "1",
        "attrDesc": "A"
      },
      {
        "attrName": "2",
        "attrDesc": "B"
      },
      {
        "attrName": "3",
        "attrDesc": "C"
      }
    ]
  }
]

This is the code I'm using to scrape the content:

    x(url, '.content', [
      {
        productTitle: 'h1',
        productDetails_1: x('div.productDetails', [{ attrName: ['dl dt'], attrDesc: ['dl dd | remove_whitespace'] }]),
        productDetails_2: x('div.productDetails dl', ['dt', 'dd']),
      }
    ])

But this is the result I get:

[
  {
    "productTitle": "Title",
    "productDetails_1": [
      {
        "attrName": [
          "1",
          "2",
          "3"
        ],
        "attrDesc": [
          "A",
          "B",
          "C"
        ]
      }
    ],
    "productDetails_2": [
      "1",
      "2",
      "3"
    ]
  }
]
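
As a stopgap, the parallel arrays under `productDetails_1` could be zipped into pairs after the scrape has finished. The following is only a minimal sketch of that post-processing step, assuming every `dt` has exactly one matching `dd` in the same order; `zipDetails` is a hypothetical helper name, not part of x-ray, and the `scraped` value is just the result shown above pasted in by hand.

```javascript
// Hypothetical helper (not part of x-ray): pairs each <dt> text
// with the <dd> text found at the same index.
function zipDetails(names, descs) {
  // Assumption: both arrays have the same length and matching order.
  return names.map((attrName, i) => ({ attrName, attrDesc: descs[i] }));
}

// Hand-copied from the productDetails_1 part of the result above.
const scraped = [
  {
    productTitle: 'Title',
    productDetails_1: [
      { attrName: ['1', '2', '3'], attrDesc: ['A', 'B', 'C'] },
    ],
  },
];

// Rebuild each item with the paired-up details.
const reshaped = scraped.map((item) => ({
  productTitle: item.productTitle,
  productDetails: zipDetails(
    item.productDetails_1[0].attrName,
    item.productDetails_1[0].attrDesc
  ),
}));

console.log(JSON.stringify(reshaped, null, 2));
```

This only reshapes data that has already been scraped; it would silently produce `undefined` descriptions if a `dt` had no matching `dd`.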

How can I get the data in the structure I described above?
