python - Storing data after scraping and parsing

Question

I have an HTML file that I parse with Beautiful Soup 4. This is the part I'm interested in:

[
 <td>Name :</td>,   <td>xyz</td>, 
 <td>Mobile :</td>, <td>180-14587962</td>, 
 <td>Company:</td>, <td>abc Comp</td>, 
 <td>Name :</td>,   <td>  </td>, 
 <td>Mobile :</td>, <td>  </td>, 
 <td>Company:</td>, <td>  </td>, 
 <td>Name :</td>,   <td>  </td>, 
 <td>Mobile :</td>, <td>  </td> 
]

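I only need to extract the Name and Mobile values separately (they are at the same level in the parse tree). How can I do this? I have tried the soup.find_next_siblings method, but I could not store the data in the format I need: two separate lists, one for Name and one for Mobile.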

score 0 · Accepted Answer

This is how I solved it:

list_name = []
list_mobile = []
for tag in soup.find_all('td'):  # soup is the BeautifulSoup object from the question
    label = tag.get_text(strip=True)  # drops surrounding spaces and newlines
    if label == 'Name :':
        inter = tag.find_next_sibling('td')  # the value cell that follows the label
        list_name.append(inter.get_text(strip=True))
    elif label == 'Mobile :':
        inter = tag.find_next_sibling('td')
        list_mobile.append(inter.get_text(strip=True))

This loops over all td tags looking for "Name :" or "Mobile :" and appends the next td tag (which holds the value) to the matching list.
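If you also want the other fields (such as Company) without writing one branch per label, a possible variation is to pair the cells two at a time and group the values by label. This is a minimal sketch, assuming the td cells strictly alternate label, value, label, value; the helper name collect_fields is hypothetical, and the html string just reproduces the fragment from the question.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Sample input: the same <td> cells shown in the question.
html = """
<td>Name :</td>,   <td>xyz</td>,
<td>Mobile :</td>, <td>180-14587962</td>,
<td>Company:</td>, <td>abc Comp</td>,
<td>Name :</td>,   <td>  </td>,
<td>Mobile :</td>, <td>  </td>,
<td>Company:</td>, <td>  </td>,
<td>Name :</td>,   <td>  </td>,
<td>Mobile :</td>, <td>  </td>
"""


def collect_fields(soup):
    """Hypothetical helper: group every value cell under the label cell before it."""
    cells = [td.get_text(strip=True) for td in soup.find_all("td")]
    fields = defaultdict(list)
    # Assumes cells alternate label, value, label, value, ...
    for label, value in zip(cells[::2], cells[1::2]):
        key = label.rstrip(":").strip()  # "Name :" -> "Name", "Company:" -> "Company"
        fields[key].append(value)
    return fields


soup = BeautifulSoup(html, "html.parser")
fields = collect_fields(soup)
list_name = fields["Name"]
list_mobile = fields["Mobile"]
print(list_name)
print(list_mobile)
```

With the sample above this prints ['xyz', '', ''] and ['180-14587962', '', '']; empty cells are kept as empty strings so the two lists stay aligned record by record.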

score 0 · Accepted Answer

You can use something like this:

from bs4 import BeautifulSoup
html = """
 <td>Name :</td>,   <td>xyz</td>, 
 <td>Mobile :</td>, <td>180-14587962</td>, 
 <td>Company:</td>, <td>abc Comp</td>, 
 <td>Name :</td>,   <td>  </td>, 
 <td>Mobile :</td>, <td>  </td>, 
 <td>Company:</td>, <td>  </td>, 
 <td>Name :</td>,   <td>  </td>, 
 <td>Mobile :</td>, <td>  </td> 
"""
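soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml install needed
x = soup.find_all("td")
print(x[1])  # value cell after the first "Name :"
print(x[3])  # value cell after the first "Mobile :"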

Output (stdout):

<td>xyz</td>
<td>180-14587962</td>

Demo:

http://ideone.com/xDzeni
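Indexing with x[1] and x[3] only returns the first record. One way to extend the same index-based idea to every record, assuming each record occupies the same six cells (Name, Mobile and Company label/value pairs), is to slice with a stride of 6: Name values then sit at indices 1, 7, 13 and Mobile values at 3, 9, 15. This is a sketch under that fixed-layout assumption; RECORD_WIDTH is a name introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Sample input: the same <td> cells shown in the question.
html = """
<td>Name :</td>,   <td>xyz</td>,
<td>Mobile :</td>, <td>180-14587962</td>,
<td>Company:</td>, <td>abc Comp</td>,
<td>Name :</td>,   <td>  </td>,
<td>Mobile :</td>, <td>  </td>,
<td>Company:</td>, <td>  </td>,
<td>Name :</td>,   <td>  </td>,
<td>Mobile :</td>, <td>  </td>
"""

soup = BeautifulSoup(html, "html.parser")
x = soup.find_all("td")

# Assumption: every record is 6 cells wide, so the Name value is at offset 1
# and the Mobile value at offset 3 within each record. The last record lacks
# the Company pair, but its Name/Mobile cells still land on these offsets.
RECORD_WIDTH = 6
list_name = [td.get_text(strip=True) for td in x[1::RECORD_WIDTH]]
list_mobile = [td.get_text(strip=True) for td in x[3::RECORD_WIDTH]]
print(list_name)
print(list_mobile)
```

This breaks as soon as a record skips or reorders a field; matching on the label text, as in the other answer, does not depend on positions.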
