Is there a way to collect data from sub-links with Web Harvest?

Here is the XML fragment I am using:

<loop item="item" index="i">
            <list><var name="products"/></list>
            <body>
                <xquery>
                    <xq-param name="item"><var name="item"/></xq-param>
                    <xq-expression><![CDATA[
                           declare variable $item as node() external; 
                            for $i in $item//div[1]/p/a[@trace='auction'][1]
                            let $url := data($i/@href) 

How can I fetch data from this new URL, which is now held in $url?

Please help me. Thanks.

2 Answers

You didn't provide the complete code, so I couldn't run it to check, but this should get you on your way:

<config>
    <loop item="item" index="i">
            <list><var name="products"/></list>
            <body>          
                <var-def name="new_url">
                <xquery>
                    <xq-param name="item"><var name="item"/></xq-param>
                    <xq-expression><![CDATA[
                           declare variable $item as node() external; 
                            for $i in $item//div[1]/p/a[@trace='auction'][1]
                            let $url := data($i/@href) 
                                return $url
                    ]]></xq-expression>
                </xquery>
                </var-def>

                <!-- now your new url is saved in webharvest variable new_url and you are free to run a 
                new webharvest http request using it -->

                <var-def name="new_page_content">
                    <!-- html-to-xml cleans the raw HTML so that xpath can query it -->
                    <html-to-xml>
                        <http url="${new_url}"/>
                    </html-to-xml>
                </var-def>

                <!-- now the content of the new page has been downloaded and saved in new variable 
                new_page_content and you are free to query it further should you want to -->

                <var-def name="contact">
                    <xpath expression="//a[contains(., 'contact')]/@href">
                        <var name="new_page_content"/>
                    </xpath>
                </var-def>
            </body>
    </loop>             
</config>
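
If the `href` values on the listing page are relative, the `http` call above may fail, because it needs an absolute URL. Here is a minimal sketch of one way to handle that. It assumes a variable `pageUrl` (hypothetical, not part of the question) that holds the address of the page the `products` list was scraped from. `sys.fullUrl` is Web-Harvest's built-in helper for resolving a link against a page URL. The `try` wrapper is only a suggestion, so that a single dead link does not abort the whole loop:

```xml
<!-- Sketch only: place inside the loop <body>, after new_url has been defined.
     pageUrl is a hypothetical variable holding the listing page's address. -->
<var-def name="abs_url">
    <template>${sys.fullUrl(pageUrl.toString(), new_url.toString())}</template>
</var-def>

<var-def name="new_page_content">
    <try>
        <body>
            <html-to-xml>
                <http url="${abs_url}"/>
            </html-to-xml>
        </body>
        <catch>
            <!-- Fall back to an empty document so later xpath calls still run -->
            <![CDATA[<html/>]]>
        </catch>
    </try>
</var-def>
```

With this in place, later `xpath` queries against `new_page_content` return nothing for pages that could not be downloaded instead of stopping the run.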
answered 2014-06-19T14:47:24.310

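You just need to create another variable to hold this information. I put together an example to make it easy to follow. Take a look:

Script: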

<?xml version="1.0" encoding="UTF-8"?>
<config>
    <var-def name="MainSite">http://www.appszoom.com/android_games/arcade_and_action</var-def>
        <loop item="titles" index="i">
        <list>
            <xpath expression="//li[@class='app captureLinkBox']/div/div/span/a">
                <html-to-xml>
                    <http url="${MainSite}"></http>
                </html-to-xml>
            </xpath>
        </list>
        <body>
            <var-def name="titleURL">
                    <xpath expression="data(/a/@href)">
                        <var name="titles"/>
                    </xpath>
            </var-def>
            <file action="append" path="D:\navin.xml">
                <xquery>
                    <xq-param name="titles"><template>${titles}</template></xq-param>
                    <xq-param name="titleURLContent">
                        <html-to-xml>
                            <http url="${titleURL}"></http>
                        </html-to-xml>
                    </xq-param>
                        <xq-expression>
                            <![CDATA[
                            declare variable $titles as node() external;
                            declare variable $titleURLContent as node() external;
                            <game>
                                <title>{$titles/a/text()}</title>
                                <downloads>{$titleURLContent//*[@id="left-bar"]/p[2]/span/text()}</downloads>
                            </game>
                            ]]>
                        </xq-expression>
                </xquery>
            </file>
        </body>
    </loop>
</config>

Output:

<game>
   <title>Clash of Clans</title>
   <downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
   <title>DEER HUNTER 2014</title>
   <downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
   <title>Subway Surfers</title>
   <downloads>100,000,000 - 500,000,000</downloads>
</game>
<game>
   <title>RoboCop™</title>
   <downloads>5,000,000 - 10,000,000</downloads>
</game>
<game>
   <title>DragonFlight for Kakao</title>
   <downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
   <title>Castle Clash</title>
   <downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
   <title>Sonic Dash</title>
   <downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
   <title>Injustice: Gods Among Us</title>
   <downloads>1,000,000 - 5,000,000</downloads>
</game>
<game>
   <title>Banana Kong</title>
   <downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
   <title>Temple Run 2</title>
   <downloads>100,000,000 - 500,000,000</downloads>
</game>
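
Because each iteration appends a bare `<game>` element, the resulting file has several top-level elements and is not a well-formed XML document on its own. One possible fix is to write an opening tag before the loop and a closing tag after it. This sketch assumes the same output path is used throughout; the `games` root name is my own choice for illustration, not something from the script:

```xml
<config>
    <!-- Start the file fresh with an opening root tag (the name "games" is an assumption) -->
    <file action="write" path="D:\navin.xml"><![CDATA[<games>]]></file>

    <!-- The loop from the script above goes here unchanged;
         it keeps appending one game element per iteration to the same path. -->

    <!-- Close the root element once every game has been appended -->
    <file action="append" path="D:\navin.xml"><![CDATA[</games>]]></file>
</config>
```

Using `action="write"` for the first call also clears results left over from a previous run, which `append` alone would keep.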
answered 2014-03-03T06:20:28.973