2

I'm developing a webapp that will need to download the html form a website and then iterate through the code and try to find a specific but ever changing value (in our case it will be the price for the product).

For this, I was thinking about asking the user (upon installation and setup) to provide the system with a few lines of html from the page (that has the price) and then from then on, every time we need to fetch the price we would try to search for those lines and find the price.

Now, I believe this is a horrible and slow way of doing this and since there are no rules and the html can be totally different from one website to another (even the same website might change) I couldn't find a better way.

One improvement that I thought about was to iterate through the first time and record the line at which we find the code. Once found, the subsequent times we would then start from a few lines before the expected location and start the search. Any Thoughts on how I can improve on this?

I posted this question on https://cstheory.stackexchange.com/ but they commented that it's not on topic and that I should post it here.

I have the code for the above and if needed I can post it, I'm simply thinking that there must be a better, faster way of doing this.

4

1 回答 1

1

这实际上是我最近尝试的一个项目(使用 BeautifulSoup 和 Python)。对我有用的解决方案是锻炼针对包含我正在寻找的值的元素的 CSS 选择器(可以映射到 jQuery 选择器)。在我的情况下,我能够将完整的文档缩小到包含我正在寻找的元素的元素,但是如果你不能准确地得到你在哪里,你可以将它与一些额外的乳酸类测试结合起来,看看它是否看起来像价格(通过正则表达式)或测试它旁边的内容。

于 2014-02-17T17:54:33.423 回答