
For some reason I can't extract information from AWIS results (which contain Alexa data).

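I have a bunch of XML files containing AWIS data, and I would like to extract pieces of information from them, e.g. the Rank or the PageViews for the 3-month period.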

The two (conflicting) namespaces are somewhat confusing, and my XPath expressions don't work as expected. (Not even a simple //aws:Rank/text() works.)

It would be great if someone could help me get unstuck.

Currently I'm using the jdom library, but I wouldn't mind using something else. Here is a small example that doesn't work as expected:

Document doc = new SAXBuilder().build(file);
XPath xpath = XPath.newInstance("//aws:Rank");
xpath.addNamespace("aws", "http://awis.amazonaws.com/doc/2005-07-11/");
Element rank = (Element) xpath.selectSingleNode(doc);

Though I would prefer to use javax.xml...

Here is a sample XML document:

<?xml version="1.0"?>
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:OperationRequest>
<aws:RequestId>XXXX-XXXX-XXXX-XXXX-XXXX</aws:RequestId>
</aws:OperationRequest>
<aws:UrlInfoResult>
<aws:Alexa>

  <aws:ContactInfo>
    <aws:DataUrl type="canonical">ahparis.com</aws:DataUrl>
    <aws:PhoneNumbers>
      <aws:PhoneNumber>+33 140289796</aws:PhoneNumber>
    </aws:PhoneNumbers>
    <aws:OwnerName>John Fay</aws:OwnerName>
    <aws:Email>hostmaster@superbregistrar.net</aws:Email>
    <aws:PhysicalAddress>
      <aws:Streets>
        <aws:Street>22 rue Saint Sauveur</aws:Street>
      </aws:Streets>
      <aws:City>Paris 75002,</aws:City>
      <aws:Country>FRANCE</aws:Country>
    </aws:PhysicalAddress>
    <aws:CompanyStockTicker/>
  </aws:ContactInfo>
  <aws:ContentData>
    <aws:DataUrl type="canonical">ahparis.com</aws:DataUrl>
    <aws:SiteData>
      <aws:Title>Ah Paris</aws:Title>
      <aws:Description>Short term apartment rentals. Search engine, descriptions, photos, rates.</aws:Description>
      <aws:OnlineSince>26-Feb-2003</aws:OnlineSince>
    </aws:SiteData>
    <aws:Keywords>
      <aws:Keyword>Français</aws:Keyword>
      <aws:Keyword>Ile-de-France</aws:Keyword>
    </aws:Keywords>
    <aws:OwnedDomains>
      <aws:OwnedDomain>
        <aws:Domain>paris-tournament.org</aws:Domain>
        <aws:Title>paris-tournament.org</aws:Title>
      </aws:OwnedDomain>
    </aws:OwnedDomains>
  </aws:ContentData>
  <aws:TrafficData>
    <aws:DataUrl type="canonical">ahparis.com</aws:DataUrl>
    <aws:Rank>2547606</aws:Rank>
    <aws:RankByCountry/>
    <aws:RankByCity/>
    <aws:UsageStatistics>
      <aws:UsageStatistic>
        <aws:TimeRange>
          <aws:Months>3</aws:Months>
        </aws:TimeRange>
        <aws:Rank>
          <aws:Value>2547606</aws:Value>
          <aws:Delta>-658661</aws:Delta>
        </aws:Rank>
        <aws:Reach>
          <aws:Rank>
            <aws:Value>2964984</aws:Value>
            <aws:Delta>-152875</aws:Delta>
          </aws:Rank>
          <aws:PerMillion>
            <aws:Value>0.28</aws:Value>
            <aws:Delta>-10.64%</aws:Delta>
          </aws:PerMillion>
        </aws:Reach>
        <aws:PageViews>
          <aws:PerMillion>
            <aws:Value>0.01</aws:Value>
            <aws:Delta>+100%</aws:Delta>
          </aws:PerMillion>
          <aws:Rank>
            <aws:Value>2143379</aws:Value>
            <aws:Delta>-1628449</aws:Delta>
          </aws:Rank>
          <aws:PerUser>
            <aws:Value>4.0</aws:Value>
            <aws:Delta>+120%</aws:Delta>
          </aws:PerUser>
        </aws:PageViews>
      </aws:UsageStatistic>
      <aws:UsageStatistic>
        <aws:TimeRange>
          <aws:Months>1</aws:Months>
        </aws:TimeRange>
        <aws:Rank>
          <aws:Value>1430628</aws:Value>
          <aws:Delta>-3224794</aws:Delta>
        </aws:Rank>
        <aws:Reach>
          <aws:Rank>
            <aws:Value>1656655</aws:Value>
            <aws:Delta>-5103474</aws:Delta>
          </aws:Rank>
          <aws:PerMillion>
            <aws:Value>0.5</aws:Value>
            <aws:Delta>+500%</aws:Delta>
          </aws:PerMillion>
        </aws:Reach>
        <aws:PageViews>
          <aws:PerMillion>
            <aws:Value>0.02</aws:Value>
            <aws:Delta>+100%</aws:Delta>
          </aws:PerMillion>
          <aws:Rank>
            <aws:Value>1279227</aws:Value>
            <aws:Delta>-859817</aws:Delta>
          </aws:Rank>
          <aws:PerUser>
            <aws:Value>4</aws:Value>
            <aws:Delta>-63.11%</aws:Delta>
          </aws:PerUser>
        </aws:PageViews>
      </aws:UsageStatistic>
      <aws:UsageStatistic>
        <aws:TimeRange>
          <aws:Days>7</aws:Days>
        </aws:TimeRange>
        <aws:Rank>
          <aws:Value>1927968</aws:Value>
          <aws:Delta>+757770</aws:Delta>
        </aws:Rank>
        <aws:Reach>
          <aws:Rank>
            <aws:Value>2942088</aws:Value>
            <aws:Delta>+1612570</aws:Delta>
          </aws:Rank>
          <aws:PerMillion>
            <aws:Value>0.3</aws:Value>
            <aws:Delta>-64.64%</aws:Delta>
          </aws:PerMillion>
        </aws:Reach>
        <aws:PageViews>
          <aws:PerMillion>
            <aws:Value>0.05</aws:Value>
            <aws:Delta>+80%</aws:Delta>
          </aws:PerMillion>
          <aws:Rank>
            <aws:Value>708394</aws:Value>
            <aws:Delta>-413955</aws:Delta>
          </aws:Rank>
          <aws:PerUser>
            <aws:Value>10</aws:Value>
            <aws:Delta>+400%</aws:Delta>
          </aws:PerUser>
        </aws:PageViews>
      </aws:UsageStatistic>
    </aws:UsageStatistics>
    <aws:ContributingSubdomains>
      <aws:ContributingSubdomain>
        <aws:DataUrl>ahparis.com</aws:DataUrl>
        <aws:TimeRange>
          <aws:Months>1</aws:Months>
        </aws:TimeRange>
        <aws:Reach>
          <aws:Percentage>100.00%</aws:Percentage>
        </aws:Reach>
        <aws:PageViews>
          <aws:Percentage>100.00%</aws:Percentage>
          <aws:PerUser>4</aws:PerUser>
        </aws:PageViews>
      </aws:ContributingSubdomain>
    </aws:ContributingSubdomains>
  </aws:TrafficData>
</aws:Alexa>
</aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse>

3 Answers


There is a typo in your code. You have:

xpath.addNamespace("aws", "http://aws.amazonaws.com/doc/2005-07-11/");

but you should have:

xpath.addNamespace("aws", "http://awis.amazonaws.com/doc/2005-07-11/");

(Note the change from aws to awis.)

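Also, you should really be using JDOM 2.0.5 and the new XPath API introduced in JDOM 2. The JDOM 2.x releases have noticeably better handling of namespaces, and they use generics for the resulting content. See the changes in XPath handling in JDOM 2.x.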

answered 2014-12-04T13:12:58.117

I tried this input with XSLT, using the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:alex="http://alexa.amazonaws.com/doc/2005-10-05/"
    xmlns:awis="http://awis.amazonaws.com/doc/2005-07-11"
    version="1.0">

    <xsl:output omit-xml-declaration="yes"/>

    <xsl:template match="/">
        <xsl:value-of select="//awis:Rank/text()"/>
    </xsl:template>

</xsl:stylesheet>

With it, I got the following output:

2547606

I think you have to register the two namespaces under different prefixes and then use those prefixes in your XPath.
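For what it's worth, a stylesheet like the one above can also be run from Java with the JDK's built-in javax.xml.transform API. The following is only a sketch under a few assumptions: the class name XsltRankDemo is made up for the example, and the XML string is a heavily trimmed stand-in for the real AWIS response, not the full document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltRankDemo {
    // Trimmed-down stand-in for the sample document (illustration only).
    static final String XML =
        "<aws:UrlInfoResponse xmlns:aws='http://alexa.amazonaws.com/doc/2005-10-05/'>"
      + "<aws:Response xmlns:aws='http://awis.amazonaws.com/doc/2005-07-11'>"
      + "<aws:TrafficData><aws:Rank>2547606</aws:Rank>"
      + "<aws:UsageStatistic><aws:Rank><aws:Value>1430628</aws:Value></aws:Rank></aws:UsageStatistic>"
      + "</aws:TrafficData></aws:Response></aws:UrlInfoResponse>";

    // Same idea as the stylesheet above: the awis prefix is bound to the
    // URI exactly as it appears in the document (no trailing slash).
    static final String XSLT =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:awis='http://awis.amazonaws.com/doc/2005-07-11' version='1.0'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><xsl:value-of select='//awis:Rank/text()'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the sample and returns the trimmed text output. */
    static String firstRank() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstRank());
    }
}
```

Note that the prefix used in the stylesheet (awis) does not have to match the prefix used in the document (aws); only the namespace URI has to be identical.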

answered 2014-12-04T11:01:34.750

It looks like a typo in the namespace URI. Your code has

xpath.addNamespace("aws", "http://awis.amazonaws.com/doc/2005-07-11/");

(with a trailing slash) but the document has

xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"

(without the trailing slash).
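To see why the slash matters: namespace URIs are compared as plain strings, so the two spellings name two different namespaces. Here is a small sketch using only the JDK's javax.xml.xpath that binds the aws prefix to whatever URI you pass in. The class name NamespaceSlashDemo and the trimmed-down XML string are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceSlashDemo {
    // Trimmed-down stand-in for the sample document (illustration only).
    static final String XML =
        "<aws:UrlInfoResponse xmlns:aws='http://alexa.amazonaws.com/doc/2005-10-05/'>"
      + "<aws:Response xmlns:aws='http://awis.amazonaws.com/doc/2005-07-11'>"
      + "<aws:TrafficData><aws:Rank>2547606</aws:Rank></aws:TrafficData>"
      + "</aws:Response></aws:UrlInfoResponse>";

    /** Evaluates //aws:Rank with the prefix "aws" bound to the given URI. */
    static String rank(final String namespaceUri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default; needed for prefixed steps
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "aws".equals(prefix) ? namespaceUri : XMLConstants.NULL_NS_URI;
                }
                // The reverse lookups are not needed for this example.
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            // Returns the empty string when nothing matches.
            return xpath.evaluate("//aws:Rank", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + rank("http://awis.amazonaws.com/doc/2005-07-11") + "]");
        System.out.println("[" + rank("http://awis.amazonaws.com/doc/2005-07-11/") + "]");
    }
}
```

With the URI spelled exactly as in the document the expression finds the Rank element; with the extra slash it matches nothing and evaluate returns an empty string.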

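Though I would prefer to use javax.xml...

Namespace handling is a real pain in javax.xml.xpath, because the Java class library provides no default implementation of the NamespaceContext interface. You have to implement it yourself or use a third-party implementation (I usually go for SimpleNamespaceContext from Spring). If you're doing a lot of XPath work, I'd recommend looking at Saxon 9 (the HE edition is free) and using its s9api, since it supports the much more powerful XPath 2.0 language.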
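As a rough sketch of the do-it-yourself route with nothing but the JDK: the example below registers the two namespaces under two different prefixes and pulls out the page-views rank for the 3-month period. MapNamespaceContext is a hypothetical helper written for this example (it is not the Spring class), AwisXPathDemo is a made-up class name, and the XML string is a trimmed stand-in for the real response.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AwisXPathDemo {
    // Trimmed-down stand-in for the sample document (illustration only).
    static final String XML =
        "<aws:UrlInfoResponse xmlns:aws='http://alexa.amazonaws.com/doc/2005-10-05/'>"
      + "<aws:Response xmlns:aws='http://awis.amazonaws.com/doc/2005-07-11'>"
      + "<aws:UsageStatistics>"
      + "<aws:UsageStatistic><aws:TimeRange><aws:Months>3</aws:Months></aws:TimeRange>"
      + "<aws:PageViews><aws:Rank><aws:Value>2143379</aws:Value></aws:Rank></aws:PageViews>"
      + "</aws:UsageStatistic>"
      + "<aws:UsageStatistic><aws:TimeRange><aws:Months>1</aws:Months></aws:TimeRange>"
      + "<aws:PageViews><aws:Rank><aws:Value>1279227</aws:Value></aws:Rank></aws:PageViews>"
      + "</aws:UsageStatistic>"
      + "</aws:UsageStatistics>"
      + "<aws:ResponseStatus xmlns:aws='http://alexa.amazonaws.com/doc/2005-10-05/'>"
      + "<aws:StatusCode>Success</aws:StatusCode></aws:ResponseStatus>"
      + "</aws:Response></aws:UrlInfoResponse>";

    /** Hypothetical minimal NamespaceContext backed by a prefix-to-URI map. */
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> uriByPrefix = new HashMap<>();

        MapNamespaceContext bind(String prefix, String uri) {
            uriByPrefix.put(prefix, uri);
            return this;
        }
        @Override public String getNamespaceURI(String prefix) {
            return uriByPrefix.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }
        // The reverse lookups are not needed for this example.
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    }

    /** Evaluates an XPath expression against the sample and returns its string value. */
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default; needed for prefixed steps
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext()
                .bind("awis", "http://awis.amazonaws.com/doc/2005-07-11")
                .bind("alex", "http://alexa.amazonaws.com/doc/2005-10-05/"));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(
            "//awis:UsageStatistic[awis:TimeRange/awis:Months = 3]"
          + "/awis:PageViews/awis:Rank/awis:Value"));
        System.out.println(evaluate("//alex:StatusCode"));
    }
}
```

The predicate on awis:Months is what picks the 3-month block. The prefixes awis and alex are arbitrary names chosen in the code; they do not need to match the aws prefix that the document uses for both namespaces.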

answered 2014-12-04T15:23:51.230