I'm new to PHP. I want to write code that finds the id specified in the HTML below, i.e. 1123. Can anyone give me some ideas?

<span class="miniprofile-container /companies/1123?miniprofile="
      data-tracking="NUS_CMPY_FOL-nhre"
      data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&amp;fc=2">
    <strong>
        <a href="http://www.linkedin.com/nus-trk?trkact=viewCompanyProfile&pk=biz-overview-public&pp=1&poster=&uid=5674666402166894592&ut=NUS_UNIU_FOLLOW_CMPY&r=&f=0&url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fcompany%2F1123%3Ftrk%3DNUS_CMPY_FOL-nhre&urlhash=7qbc">
        Bank of America
        </a>
    </strong>
</span> has a new Project Manager

Note: I don't need the content inside the span. I need the id that appears in the span's class name.

I tried the following:

$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML($html);
$xmlElements = simplexml_import_dom($dom);
$id = $xmlElements->xpath("//span [@class='miniprofile-container /companies/$data_id?miniprofile=']");

...but I don't know how to proceed from here.

2 Answers

Depending on what you need, you could do:

$matches = array();
preg_match('|<span class="miniprofile-container /companies/(\d+)\?miniprofile|', $html, $matches);
print_r($matches);

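This is a very simple regex, but it can serve as a first suggestion. If you want to go through DOMDocument or SimpleXML, you shouldn't mix the two the way your example does. Which approach do you prefer? Then we can narrow it down.

//Edit: this is pretty much what @fireeyedboy said, but here is what I was just fiddling with: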

<?php
$html = <<<EOD
<html><head></head>
<body>
<span class="miniprofile-container /companies/1123?miniprofile="
      data-tracking="NUS_CMPY_FOL-nhre"
      data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&amp;fc=2">
    <strong>
        <a href="#">
        Bank of America
        </a>
    </strong>
</span> has a new Project Manager

</body>
</html>
EOD;

$domDocument = new DOMDocument('1.0', 'UTF-8');
$domDocument->recover = TRUE;
$domDocument->loadHTML($html);

$xPath = new DOMXPath($domDocument);
$relevantElements = $xPath->query('//span[contains(@class, "miniprofile-container")]');
$foundId = NULL;
foreach($relevantElements as $match) {
    $pregMatches = array();
    if (preg_match('|/companies/(\d+)\?miniprofile|', $match->getAttribute('class'), $pregMatches)) {
        if (isset($pregMatches[1])) {
            $foundId = $pregMatches[1];
            break;
        }
    }
}

echo $foundId;

?>
answered 2012-11-15T10:04:22.080

This should do what you're after:

$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );

/*
 * the following XPath query finds the class attribute of every span element
 * whose class attribute contains the strings " miniprofile-container " and " /companies/"
 */
$nodes = $xpath->query( "//span[contains(concat(' ', @class, ' '), ' miniprofile-container ') and contains(concat(' ', @class, ' '), ' /companies/')]/@class" );
foreach( $nodes as $node )
{
    // extract the number found between "/companies/" and "?miniprofile" in the node's nodeValue
    // only dump the id when the pattern actually matched, so $matches[ 1 ] is always defined
    if( preg_match( '#/companies/(\d+)\?miniprofile#', $node->nodeValue, $matches ) )
    {
        var_dump( $matches[ 1 ] );
    }
}
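
As a variation on the snippet above, here is a minimal sketch that wraps the same XPath-plus-regex idea in a function and collects every id instead of dumping it. The function name `extractCompanyIds` and the trimmed sample markup are hypothetical, and `libxml_use_internal_errors()` is swapped in for the `@` suppression; treat it as one possible arrangement rather than the definitive one.

```php
<?php
// Hypothetical helper: returns every company id found in the class
// attribute of span elements carrying the "miniprofile-container" token.
function extractCompanyIds(string $html): array
{
    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = "//span[contains(concat(' ', @class, ' '), ' miniprofile-container ')]/@class";

    $ids = array();
    foreach ($xpath->query($query) as $attr) {
        // Keep only class values that really contain "/companies/<digits>?miniprofile".
        if (preg_match('#/companies/(\d+)\?miniprofile#', $attr->nodeValue, $m)) {
            $ids[] = $m[1];
        }
    }
    return $ids;
}

// Assumed sample input, shortened from the markup in the question.
$sample = '<span class="miniprofile-container /companies/1123?miniprofile="><a href="#">Bank of America</a></span>';
print_r(extractCompanyIds($sample));
```

Returning an array makes it easy to handle pages with several such spans; take the first element if you only expect one.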
answered 2012-11-15T10:20:22.197