selenium-webdriver - Selenium + Scrapy output does not match the page source

Question

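I am trying to scrape listing data from "zingat.com". An example listing is "http://zingat.com//en/didim-akbuk-de-mustakil-girisli-havuzlu-3-1-daire-4032010i". I tried doing this with scrapy + selenium, but the XPath output does not match the page source, especially in the "Features" section of the page.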

    from scrapy.selector import Selector
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("headless")
    #options.add_argument("--remote-debugging-port=9222")
    desired_capabilities = options.to_capabilities()
    driver = webdriver.Chrome(desired_capabilities=desired_capabilities)

    current_url = 'http://zingat.com//en/didim-akbuk-de-mustakil-girisli-havuzlu-3-1-daire-4032010i'
    #print(current_url)
    driver.get(current_url)
    driver.implicitly_wait(10)
    selenium_response_text = driver.page_source
    #time.sleep(10)
    selector = Selector(text=selenium_response_text)
    feature_list_names = selector.xpath('//strong[@class="col-md-6"]/text()').getall()
    feature_list_data = selector.xpath('//span[@class="col-md-6"]/text()').getall()

I am not getting all the feature names and data correctly:

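['\n Listing No\n ', 'Net square meters', 'Gross square meters', 'Number of rooms', 'Number of bathrooms', 'Number of floors in building', 'Heating type', 'Unit type', 'Floor', '\n Video Home Tour\n ', '\n ', 'Building age', 'Pet Friendly Paws Houses', 'Furnishing status', 'Usage status', 'Property status', 'Maintenance fee', 'Rental income', 'From whom', 'Title deed status', 'Eligible for bank loan']

['4032010', '125', '150m²', 'None']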

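In other words, the names in the list differ from the source code, and the data list is missing about half of its entries. I tried adding an implicit wait, but it did not help. I would appreciate any suggestions or, ideally, a direct solution or a pointer in the right direction.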

Thanks for reading, too.
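
One possible cause of the short data list, offered as an assumption because the live page was not inspected: `//span[@class="col-md-6"]/text()` only matches spans whose class attribute is exactly `col-md-6`, and `text()` returns only direct text nodes, so values wrapped in a child tag or carrying an extra class would be skipped. The sketch below uses hypothetical markup (not copied from zingat.com, with made-up values) and the standard-library `xml.etree.ElementTree` instead of Scrapy's `Selector`, so that it runs without a browser. It reads each row as a name/value pair and collapses whitespace, which also removes strings like `'\n Listing No\n '`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: shaped like a Bootstrap name/value list, with some
# values nested in child tags or carrying an extra class. Values are made up.
SAMPLE_HTML = """
<ul>
  <li><strong class="col-md-6">
        Listing No
      </strong><span class="col-md-6">4032010</span></li>
  <li><strong class="col-md-6">Heating Type</strong><span class="col-md-6"><a href="#">Combi</a></span></li>
  <li><strong class="col-md-6">Furnished</strong><span class="col-md-6 text-right"><b>No</b></span></li>
</ul>
"""


def clean_text(element):
    """Join every text node under element (nested tags included), collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


def has_class(element, name):
    """True if name is one of the space-separated classes on element."""
    return name in element.get("class", "").split()


def naive_values(markup):
    """Roughly mimic //span[@class="col-md-6"]/text(): exact class, leading direct text only."""
    root = ET.fromstring(markup.strip())
    return [s.text for s in root.iter("span")
            if s.get("class") == "col-md-6" and s.text and s.text.strip()]


def extract_features(markup):
    """Pair each feature name with its value, one row at a time."""
    root = ET.fromstring(markup.strip())
    features = {}
    for row in root.iter("li"):
        name = next((e for e in row.iter("strong") if has_class(e, "col-md-6")), None)
        value = next((e for e in row.iter("span") if has_class(e, "col-md-6")), None)
        if name is None or value is None:
            continue  # skip rows that lack either half of the pair
        features[clean_text(name)] = clean_text(value)
    return features


naive = naive_values(SAMPLE_HTML)
features = extract_features(SAMPLE_HTML)
print(naive)
print(features)
```

With Scrapy's `Selector`, the same idea would likely be expressed by looping over the row elements and calling `row.xpath('normalize-space(.//span[contains(@class, "col-md-6")])').get()` on each one; the row element itself is a guess here. If the values are filled in by JavaScript after load, an explicit wait (`WebDriverWait` with an expected condition) before reading `driver.page_source` may also be needed, since an implicit wait only affects element lookups.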
