This question is a follow-up to my previous question:

Check for a tag and get the value inside the tag using PHP

I have text like this:

<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.

Using the code from the answer to my previous question, with PREG_OFFSET_CAPTURE added, as follows:

function get_text_between_tags($string, $tagname) {
    $pattern = "/<$tagname\b[^>]*>(.*?)<\/$tagname>/is";
    preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
    if(!empty($matches[1]))
        return $matches[1];
    return array();
}

I get this output:

Array (
[0] => Array ( [0] => Head of Pekalongan Regency [1] => 14 )
[1] => Array ( [0] => Rector of IPB [1] => 131 )
[2] => Array ( [0] => officials of IPB [1] => 222 ) )

14, 131, and 222 are the character offsets at which the pattern matched. Can I get the word index instead? I mean output like this:

Array (
[0] => Array ( [0] => Head of Pekalongan Regency [1] => 0 )
[1] => Array ( [0] => Rector of IPB [1] => 15 )
[2] => Array ( [0] => officials of IPB [1] => 27 ) )

Is there another way besides PREG_OFFSET_CAPTURE, or does this need more code? I have no idea. Thanks for any help. :)

1 Answer

This will work, but it still needs some finishing work:

<?php

$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

$result = getExploded($raw,'<ORGANIZATION>','</ORGANIZATION>');

echo '<pre>';
print_r($result);
echo '</pre>';

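// Split on the opening tag, then cut each piece at the closing tag.
// Entry 0 holds whatever precedes the first opening tag (empty here).
function getExploded($data, $tagStart, $tagEnd) {
    $result = array();
    $tmpData = explode($tagStart,$data);
    $wordCount = 0; // space-separated words seen so far
    foreach($tmpData as $k => $v) {
        $tmp = explode($tagEnd,$v);
        $result[$k][0] = $tmp[0];    // text inside the tag
        $result[$k][1] = $wordCount; // index of its first word
        // -1 discounts the trailing empty element explode() produces
        $wordCount = $wordCount + (count(explode(' ',$v)) - 1);
    }
    return $result;
}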

?>

The result is:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 0
        )

    [1] => Array
        (
            [0] => Head of Pekalongan Regency
            [1] => 0
        )

    [2] => Array
        (
            [0] => Rector of IPB
            [1] => 16
        )

    [3] => Array
        (
            [0] => officials of IPB
            [1] => 28
        )

    )
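
If you would rather keep the preg_match_all approach from the question, the character offset can probably be converted to a word index by counting the whitespace-separated tokens that come before it. The following is only a minimal sketch under some assumptions: get_word_indexes is a hypothetical name, a "word" is any run of non-whitespace characters (so a standalone "," counts as one), and tags are removed with strip_tags before counting. On the sample text it should give 0, 16 and 28, the same indexes as in the output above.

```php
<?php

// Hypothetical helper (not from the original posts): returns each tagged
// phrase together with the zero-based index of its first word.
function get_word_indexes($string, $tagname) {
    $tag = preg_quote($tagname, '/');
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/is";
    preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);

    $result = array();
    foreach ($matches[1] as $match) {
        list($text, $offset) = $match;
        // Everything before the match, with tags stripped so that they
        // are not counted as part of any word.
        $before = strip_tags(substr($string, 0, $offset));
        // Assumption: words are separated by whitespace only.
        $words = preg_split('/\s+/', $before, -1, PREG_SPLIT_NO_EMPTY);
        $result[] = array($text, count($words));
    }
    return $result;
}

$raw = '<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.';

print_r(get_word_indexes($raw, 'ORGANIZATION'));
```

Unlike getExploded, this does not produce an empty first entry. If punctuation such as the standalone "," should not count as a word, the split pattern would need adjusting.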
answered 2013-05-10T02:03:48.547