I am using the x-ray module to scrape data from HTML like the snippet below. How can I scrape a definition list like this with x-ray?
<div class="content">
<h1>Title</h1>
<div class="productDetails">
<dl>
<dt>1</dt>
<dd>A</dd>
<dt>2</dt>
<dd>B</dd>
<dt>3</dt>
<dd>C</dd>
</dl>
</div>
</div>
I would like to get the following result:
[
{
"productTitle": "Title",
"productDetails": [
{
"attrName": "1",
"attrDesc": "A"
},
{
"attrName": "2",
"attrDesc": "B"
},
{
"attrName": "3",
"attrDesc": "C"
}
]
}
]
This is the code I am using to scrape the content:
x(url, '.content', [
{
productTitle: 'h1',
productDetails_1: x('div.productDetails', [{attrName: ['dl dt'], attrDesc: ['dl dd | remove_whitespace']}]),
productDetails_2: x('div.productDetails dl', ['dt', 'dd']),
}
])
But this is the result I get:
[
{
"productTitle": "Title",
"productDetails_1": [
{
"attrName": [
"1",
"2",
"3"
],
"attrDesc": [
"A",
"B",
"C"
]
}
],
"productDetails_2": [
"1",
"2",
"3"
]
}
]
How can I get the data in the structure I described above?
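One possible workaround, offered only as a sketch: leave the scrape as it is and pair the two parallel arrays afterwards in plain JavaScript. This assumes every dt is followed by exactly one dd, so the attrName and attrDesc arrays line up by index. The helpers zipDetails and reshape are hypothetical names introduced here, not part of x-ray, and the scraped constant simply repeats the relevant part of the output shown above.

```javascript
// Hypothetical helper (not part of x-ray): pairs two parallel arrays by index.
// Assumes each <dt> has exactly one matching <dd>; extra items are dropped.
function zipDetails(names, descs) {
  const length = Math.min(names.length, descs.length);
  const pairs = [];
  for (let i = 0; i < length; i++) {
    pairs.push({ attrName: names[i], attrDesc: descs[i] });
  }
  return pairs;
}

// Hypothetical helper: converts the shape returned by the scrape above
// into the desired { productTitle, productDetails: [...] } shape.
function reshape(items) {
  return items.map((item) => {
    const first = (item.productDetails_1 && item.productDetails_1[0]) || {};
    return {
      productTitle: item.productTitle,
      productDetails: zipDetails(first.attrName || [], first.attrDesc || []),
    };
  });
}

// Sample input copied from the scrape result shown in the question.
const scraped = [
  {
    productTitle: 'Title',
    productDetails_1: [
      { attrName: ['1', '2', '3'], attrDesc: ['A', 'B', 'C'] },
    ],
  },
];

const result = reshape(scraped);
console.log(JSON.stringify(result, null, 2));
```

In practice reshape would be called on the array that x-ray passes to its callback. If a dt can have several dd elements, or none, index-based pairing will misalign and a different approach is needed.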