I need to crawl a website whose content is paginated like this:
www.website.com/link/page_1.html
www.website.com/link/page_2.html
www.website.com/link/page_3.html
...
The scraped content goes directly into the database through item pipelines.
It is easy to tell Django something like:
if the item already exists, do not insert it; otherwise insert it.
But is there any way to scrape only the links that have been added since the last scrape?
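To be concrete, my dedup logic is roughly the following. This is a framework-free sketch: the `seen` set stands in for a database uniqueness check (in the real pipeline it would be something like Django's `get_or_create` on a unique URL field), and the item keys are just what my spider happens to yield.

```python
class DedupPipeline:
    """Sketch of an 'insert only if new' Scrapy-style pipeline."""

    def __init__(self):
        self.seen = set()           # stands in for the database's unique index
        self.inserted = []          # stands in for rows written to the DB

    def process_item(self, item):
        key = item["url"]           # assumes each item carries a unique URL
        if key in self.seen:
            return None             # duplicate: skip the insert
        self.seen.add(key)
        self.inserted.append(item)  # new item: "insert" it
        return item


pipeline = DedupPipeline()
pipeline.process_item({"url": "/link/page_1.html", "title": "a"})
pipeline.process_item({"url": "/link/page_1.html", "title": "a"})  # duplicate, skipped
pipeline.process_item({"url": "/link/page_2.html", "title": "b"})
```

This keeps duplicates out of the database, but it still re-downloads every page on every run, which is what I want to avoid.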
For example, when website.com publishes new items, the old ones shift down a page:
the items on /link/page_1.html move to /link/page_2.html,
and the new items populate /link/page_1.html.
At that point, how do I tell Scrapy to scrape only the items added since the last scrape?
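What I imagine is something like the following behavior. This is only a sketch of the logic, not real Scrapy code: since the site lists items newest-first starting on page 1, the crawl could stop at the first item it has already seen (in Scrapy I assume this would mean not following the next-page link, or raising something like `CloseSpider`).

```python
def scrape_new(pages, known):
    """Walk pages in order (newest items first) and collect items
    until the first already-seen item, then stop: everything after
    it was already scraped on the previous run."""
    new_items = []
    for page in pages:
        for item in page:
            if item in known:
                return new_items  # rest of the site is old content
            new_items.append(item)
    return new_items


# page_1 now holds the new item "c"; "b" and "a" were scraped last run
pages = [["c", "b"], ["a"]]
known = {"a", "b"}
new = scrape_new(pages, known)
```

Is this early-stop approach the right way to do incremental scraping with Scrapy, or is there a built-in mechanism for it?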