I'm using Rust to download a large amount of stock market data, roughly 50,000 GET requests per cycle. To speed the process up, I've been able to use multithreading. So far, my code looks like this:
```rust
// Instantiate a channel so threads can send data to the main thread
let (s, r) = channel();
// Vector to store all threads created
let mut threads = Vec::new();
// Iterate through every security in the universe
for security in universe {
    // Clone the sender
    let thread_send = s.clone();
    // Create a thread with a closure that makes 5 GET requests for the current security
    let t = thread::spawn(move || {
        // Download the 5 price vectors and send everything in a tuple to the main thread
        let price_vectors = download_security(&security);
        let tuple = (
            security,
            price_vectors.0,
            price_vectors.1,
            price_vectors.2,
            price_vectors.3,
            price_vectors.4,
        );
        thread_send.send(tuple).unwrap();
    });
    // PAUSE THE MAIN THREAD BECAUSE OF THE ERROR I'M GETTING
    thread::sleep(Duration::from_millis(20));
    // Add the new thread to the threads vector
    threads.push(t);
}
drop(s);
// Join all the threads together so the main thread waits for their completion
for t in threads {
    t.join().unwrap();
}
```
The `download_security()` function called by each thread simply makes 5 GET requests to download price data (minutely, hourly, daily, weekly, and monthly data). I'm making these requests with the `ureq` crate. The relevant part of `download_security()` looks like this:
```rust
// Call minutely data and let thread sleep for arbitrary amount of time
let minute_text = ureq::get(&minute_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call hourly data and let thread sleep for arbitrary amount of time
let hour_text = ureq::get(&hour_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call daily data and let thread sleep for arbitrary amount of time
let day_text = ureq::get(&day_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call weekly data and let thread sleep for arbitrary amount of time
let week_text = ureq::get(&week_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
// Call monthly data and let thread sleep for arbitrary amount of time
let month_text = ureq::get(&month_link).call().unwrap().into_string().unwrap();
thread::sleep(Duration::from_millis(1000));
```
Now, the reason I put my threads to sleep in this code is that whenever I make too many HTTP requests too quickly, I seem to hit this strange error:
```
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Transport(Transport { kind: Dns, message: None, url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.polygon.io")), port: None, path: "/v2/aggs/ticker/SPHB/range/1/minute/2021-05-22/2021-10-22", query: Some("adjusted=true&sort=asc&limit=200&apiKey=wo_oZg8qxLYzwo3owc6mQ1EIOp7yCr0g"), fragment: None }), source: Some(Custom { kind: Uncategorized, error: "failed to lookup address information: nodename nor servname provided, or not known" }) })', src/main.rs:243:54
```
When I increase how long the main thread sleeps after spawning each child thread, or how long each child thread sleeps after its 5 GET requests, the number of these errors goes down. When the sleeps are too short, I see this error printed for over 90% of the securities I try to download. When the sleeps are longer, everything works perfectly, except the whole process takes far too long. This is frustrating because I need the process to be as fast as possible, ideally under 1 minute for all 10,000 securities.
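The only throttling approach I've found that works, in effect, is bounding how much runs at once. A minimal sketch of that idea, processing the universe in fixed-size batches (here `process_in_batches` and the dummy `work` closure are hypothetical, standing in for my real download code):

```rust
use std::thread;
use std::time::Duration;

// Process the universe in fixed-size batches: spawn up to `batch` threads,
// join them all, then move on to the next batch. This caps how many DNS
// lookups/connections are in flight at once, without per-thread sleeps.
// `work` is a stand-in for download_security(); it returns a completion count.
fn process_in_batches(universe: Vec<String>, batch: usize, work: fn(&str) -> usize) -> usize {
    let mut done = 0;
    for chunk in universe.chunks(batch) {
        let mut handles = Vec::new();
        for sec in chunk {
            let sec = sec.clone();
            handles.push(thread::spawn(move || work(&sec)));
        }
        for h in handles {
            done += h.join().unwrap();
        }
    }
    done
}

fn main() {
    let universe: Vec<String> = (0..10).map(|i| format!("S{}", i)).collect();
    // Dummy work: simulate a tiny download and report 1 completed security.
    let n = process_in_batches(universe, 3, |_s| {
        thread::sleep(Duration::from_millis(1));
        1
    });
    assert_eq!(n, 10);
    println!("processed {} securities in batches of 3", n);
}
```

This keeps error counts down for me in the same way the sleeps do, but it still feels like I'm working around whatever the underlying limit is rather than fixing it.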
I'm running macOS Big Sur on an M1 Mac Mini. Is there some fundamental limit in my operating system on how many GET requests I can make per second?
Any help would be greatly appreciated.
Thanks!