As you have probably realized from the title, I am using Scrapy and XPath to extract data. To keep the spider generic (so I do not have to edit it often), I supply the XPath expressions to the spider from a file, and I am able to extract the data in the required format.
Now I want to verify that each supplied XPath expression is still valid against the web page specified in the spider (if the page's HTML has changed, my XPath will no longer match). I want to run this check before the spider starts.
How do I test the correctness of my XPaths? Is there any way to do this kind of truth testing? Please help.
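For context, a purely syntactic check is possible with lxml (which Scrapy uses under the hood): compiling an expression with `lxml.etree.XPath` raises `XPathSyntaxError` if the expression is malformed. A minimal sketch, where the sample expressions are hypothetical stand-ins for the ones loaded from file:

```python
from lxml import etree

def is_valid_xpath(expr):
    """Return True if `expr` compiles as XPath 1.0, False otherwise."""
    try:
        etree.XPath(expr)
        return True
    except etree.XPathSyntaxError:
        return False

# Hypothetical expressions standing in for the ones read from file
print(is_valid_xpath("//div[@class='price']/text()"))  # well-formed
print(is_valid_xpath("//div[@class='price'"))          # unbalanced bracket
```

Note this only proves the expression parses; it cannot tell you whether the expression still matches anything after the page's HTML has changed.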
import json

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["file:///<filepath>.html"]

    def __init__(self):
        self.mt = ""

    def parse(self, response):
        respDta = dict()
        it_lst = []
        dtData = response.selector.xpath(gx.spcPth[0])
        for ra in dtData:
            commodityObj = ra.xpath(gx.spcPth[1])
            extracted = commodityObj.extract()  # avoid shadowing the built-in `list`
            cmdNme = extracted[0].replace(u'\xa0', u' ')
            cmdNme = cmdNme.replace("Header text: ", '')
            self.populate_item(response, respDta, cmdNme, it_lst, extracted[0])
        respDta["mt"] = self.mt
        jsonString = json.dumps(respDta, default=lambda o: o.__dict__)
        return jsonString
gx.spcPth
gx.spcPth comes from another function that supplies the XPath expressions, and it is used in many places throughout the rest of the code. I need to check each of these XPath expressions before the spider starts, wherever they are used.