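I'm currently using crawler4j as my web crawler of choice while I try to teach myself how web crawlers work. I've started a crawl, and I expected it to quickly write the crawled data to the crawlStorageFolder (/data/crawl/root) shown below: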
    import edu.uci.ics.crawler4j.crawler.CrawlConfig;

    public class Controller {
        public static void main(String[] args) throws Exception {
            /*
             * crawlStorageFolder is a folder where intermediate crawl data is
             * stored.
             */
            String crawlStorageFolder = "/data/crawl/root";

            /*
             * numberOfCrawlers shows the number of concurrent threads that should
             * be initiated for crawling.
             */
            int numberOfCrawlers = 7;

            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(crawlStorageFolder);
            // (the rest of the controller setup is not shown)
        }
    }
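The problem is that the only things I can find in the crawlStorageFolder location are two .lck files and one .jdb file, which I assume is where the data is stored, but I can't open any of them. Could anyone help me understand how to access the data, so that I can load it into a database and eventually display it on my website? Any help would be greatly appreciated.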
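
For context, and as far as I understand it: the comment in the snippet says crawlStorageFolder holds intermediate crawl data, and the .jdb and .lck files look like Berkeley DB Java Edition log and lock files that crawler4j keeps for its own bookkeeping, so they are probably not meant to be opened directly. The page content itself is normally handed to the visit(Page) method of a WebCrawler subclass, and that is where it can be saved. Below is a minimal sketch of such a save step using only the standard library. PageStore and the file name pages.tsv are hypothetical names made up for illustration and are not part of crawler4j; a real setup would likely write to a database instead of a file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical helper (not part of crawler4j): appends one
 * tab-separated "url<TAB>text" line per crawled page to a file.
 */
final class PageStore {
    private final Path outputFile;

    PageStore(Path outputFile) {
        this.outputFile = outputFile;
    }

    /**
     * Appends a single line for the given page. Synchronized because
     * several crawler threads may call it at the same time.
     */
    synchronized void save(String url, String text) throws IOException {
        // Collapse tabs and line breaks so each page stays on exactly one line.
        String flattened = text.replaceAll("\\s+", " ").trim();
        String line = url + "\t" + flattened + System.lineSeparator();
        Files.write(outputFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; in a real crawl the values would come from visit(Page).
        PageStore store = new PageStore(Paths.get("pages.tsv"));
        store.save("http://example.com/", "Example Domain\nThis is sample text.");
    }
}
```

Inside a crawler, the call would presumably sit in visit(Page page): check that page.getParseData() is an HtmlParseData, then pass page.getWebURL().getURL() and the parse data's getText() to save(). The resulting file (or an equivalent database table) is then something a website can read, independent of the files in crawlStorageFolder.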