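While learning Rust, I am trying to build a simple web scraper. My goal is to scrape https://news.ycombinator.com/ and get each story's title, hyperlink, vote count, and username. I am using the external crates reqwest and scraper for this, and I wrote a program that extracts the story links from that site.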
Cargo.toml
[package]
name = "stackoverflow_scraper"
version = "0.1.0"
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
scraper = "0.12.0"
reqwest = "0.11.2"
tokio = { version = "1", features = ["full"] }
futures = "0.3.13"
src/main.rs
use scraper::{Html, Selector};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://news.ycombinator.com/";
    // Download the front page as a String.
    let html = reqwest::get(url).await?.text().await?;
    // The response is a complete HTML page, so parse it as a document.
    let document = Html::parse_document(&html);
    // Each story title on the front page is an <a class="storylink">.
    let selector = Selector::parse("a.storylink").unwrap();
    for element in document.select(&selector) {
        println!("{:?}", element.value().attr("href").unwrap());
        // todo println!("Title");
        // todo println!("Votes");
        // todo println!("User");
    }
    Ok(())
}
How can I get the corresponding title, vote count, and username for each link?
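A minimal sketch of the missing step, under stated assumptions. In the front-page markup of that time, the title is the text of the `a.storylink` element itself (for example `element.text().collect::<String>()`), while the votes and username live in the following table row, inside `td.subtext`, as `span.score` (text such as "123 points") and `a.hnuser`. Hacker News can change this markup, so treat these selectors as assumptions. With scraper you could collect the (title, href) pairs from `a.storylink`, separately call `select` on each `td.subtext` for `span.score` and `a.hnuser`, and then combine the two lists in page order. The code below covers only that combining step and uses just the standard library. `Story`, `parse_points`, and `pair_rows` are hypothetical names, and the code assumes exactly one `td.subtext` per story so the two lists line up.

```rust
/// One Hacker News story with the four fields from the question.
/// Hypothetical type; votes and user are optional because
/// (assumption) job postings show neither a score nor a user.
#[derive(Debug, PartialEq)]
pub struct Story {
    pub title: String,
    pub href: String,
    pub votes: Option<u32>,
    pub user: Option<String>,
}

/// Turns score text such as "123 points" or "1 point" into a number.
/// Returns None if the text is empty or does not start with a number.
pub fn parse_points(text: &str) -> Option<u32> {
    text.split_whitespace().next()?.parse().ok()
}

/// Combines the title rows and the subtext rows in page order.
/// `titles` holds (title, href); `subtexts` holds (score text, username).
/// Assumption: one subtext row per story, so the i-th entries belong together.
pub fn pair_rows(
    titles: Vec<(String, String)>,
    subtexts: Vec<(Option<String>, Option<String>)>,
) -> Vec<Story> {
    titles
        .into_iter()
        .zip(subtexts)
        .map(|((title, href), (score, user))| Story {
            title,
            href,
            // Missing or unparsable score text becomes None.
            votes: score.as_deref().and_then(parse_points),
            user,
        })
        .collect()
}
```

Zipping by position keeps the sketch short; a sturdier variant would start from each `tr.athing` row and read its next sibling row, so that one missing subtext row cannot shift every later pairing.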