问题
我想投资网站的特定部分(汽车销售平台)。
坦率地说,CSS 对我来说太混乱了,无法自己找出问题所在。
#### scraping the website www.otomoto.pl with used cars #####
baseURL_otomoto = "https://www.otomoto.pl/osobowe/?page="
i <- 1
for ( i in 1:7000 )
{
link = paste0(baseURL_otomoto,i)
out = read_html(link)
print(i)
print(link)
### building year
build_year = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[1]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
mileage = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[2]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
volume = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[3]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
fuel_type = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[4]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
price = html_nodes(out, xpath = '//div[@class="offer-item__price"]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
link = html_nodes(out, xpath = '//div[@class="offer-item__title"]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
offer_details = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
任何猜测可能是这种行为的原因是什么?
PS#1。
如何将分析网站上可用的报价中的所有 build_type、mileage 和fuel_type 数据作为data.frame 一次性获取?使用类 (xpath = '//div[@class=...) 在我的情况下不起作用
PS#2。
我想使用 fi 获得实际报价的详细信息
gear_type = html_nodes(out, xpath = '//*[@id="parameters"]/ul[1]/li[10]/div') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
论据
- in ul[a] 用于 in (1:2) &
- in li[b] 代表 b in (1:12)
不幸的是,虽然这个概念失败了,因为结果数据框是空的。任何猜测为什么?