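As the title suggests, I have a question about parsing an XML tag that may have several attributes (or none at all), and I'm looking for advice on how to accomplish this. But first, I think some background is in order.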

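I'm working on a PHP-based AIML interpreter script called Program O, and I'm migrating its code from string-replacement functions (str_replace, preg_replace, etc.) to PHP's built-in SimpleXML functions. So far, almost all of the parsing functions I've written for the various AIML tags are finished and working well, but one tag in particular is kicking my backside: the CONDITION tag.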

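According to the AIML tag reference, the tag has three different "forms": one with both the NAME and (VALUE|CONTAINS|EXISTS) attributes, called "multi-condition"; one with only the NAME attribute, called "single-name list-condition"; and a last form, called "list-condition", which is just the CONDITION tag with no attributes at all. The AIML tag reference linked above contains examples of all three forms, but with a lot of words in between, so I'll repeat them here, together with the surrounding AIML code: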

FORM: multi-condition tag:

<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition name="gender" value="female"> attractive.</condition>
    <condition name="gender" value="male"> handsome.</condition>
  </template>
</category>

FORM: list-condition tag:

<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition>
      <li name="gender" value="female"> attractive.</li>
      <li name="gender" value="male"> handsome.</li>
    </condition>
  </template>
</category>

FORM: single-name list-condition tag:

<category>
  <pattern>I AM BLOND</pattern>
  <template>You sound very
    <condition name="gender">
      <li value="female"> attractive.</li>
      <li value="male"> handsome.</li>
    </condition>
  </template>
</category> 
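
As a minimal sketch, the three forms above can be told apart purely by which attributes are present on the CONDITION element. The helper name `classify_condition` is hypothetical (it is not part of Program O), and the form names it returns are the ones used in this question:

```php
<?php
// Hypothetical helper: decide which of the three CONDITION forms an element uses,
// based only on the attributes that are present.
function classify_condition(SimpleXMLElement $condition)
{
    $hasName = isset($condition['name']);
    $hasTest = isset($condition['value'])
        || isset($condition['contains'])
        || isset($condition['exists']);

    if ($hasName && $hasTest) {
        return 'multi-condition';            // NAME plus VALUE, CONTAINS or EXISTS
    }
    if ($hasName) {
        return 'single-name list-condition'; // NAME only; the LI children carry VALUE
    }
    return 'list-condition';                 // no usable attributes on CONDITION itself
}

// A template containing one condition of each form.
$template = simplexml_load_string(
    '<template>'
    . '<condition name="gender" value="female"> attractive.</condition>'
    . '<condition><li name="gender" value="male"> handsome.</li></condition>'
    . '<condition name="gender"><li value="male"> handsome.</li></condition>'
    . '</template>'
);

foreach ($template->condition as $condition) {
    echo classify_condition($condition), "\n";
}
// prints:
// multi-condition
// list-condition
// single-name list-condition
```

Note that this sketch treats a CONDITION with a VALUE but no NAME as a plain list-condition; whether that is the right fallback is an assumption.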

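In previous versions of the script I'm working on, only the "list-condition" form of the CONDITION tag was used. While that is the most commonly used form, it isn't the only one in use, so I need to be able to handle the other two forms as well. So my question is: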

How can I do this in an efficient way?

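I already have working code that parses the list-condition form of the CONDITION tag, and preliminary testing looks promising: it throws no errors and seems to produce the desired responses (but only for the list-condition form; the other two forms fail with errors, for obvious reasons). The function is as follows: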

function parse_condition_tag($convoArr, $element, $parentName, $level)
{
  runDebug(__FILE__, __FUNCTION__, __LINE__, 'Starting function and setting timestamp.', 2);
  $response = array();
  $attrName = $element['name'];
  if (!empty ($attrName))
  {
    $attrName = ($attrName == '*') ? $convoArr['star'][1] : $attrName;
    $search = $convoArr['client_properties'][$attrName];
    $path = ($search != 'undefined') ? "//li[@value=\"$search\"]" : '//li[not(@*)]';
    $choice = $element->xpath($path);
    $children = $choice[0]->children();
    if (!empty ($children))
    {
      $response = parseTemplateRecursive($convoArr, $children, $level + 1);
    }
    else
    {
      $response[] = (string) $choice[0];
    }
    $response_string = implode_recursive(' ', $response, __FILE__, __FUNCTION__, __LINE__);
    runDebug(__FILE__, __FUNCTION__, __LINE__, "Returning '$response_string' and exiting function.", 4);
    return $response_string;
  }
  trigger_error('Parsing of the CONDITION tag failed! XML = ' . $element->asXML());
}

I'm still relatively new to the SimpleXML functions, so I may well be missing something obvious. In fact, I hope that's the case. :)

EDIT: Adding the function I ended up with, as promised in one of the comments:

  /*
   * function parse_condition_tag
   * Acts as a de-facto if/else structure, selecting a specific output, based on certain criteria
   * @param [array] $convoArr    - The conversation array (a container for a number of necessary variables)
   * @param [object] $element    - The current XML element being parsed
   * @param [string] $parentName - The parent tag (if applicable)
   * @param [int] $level         - The current recursion level
   * @return [string] $response_string
   */

 function parse_condition_tag($convoArr, $element, $parentName, $level)
 {
   runDebug(__FILE__, __FUNCTION__, __LINE__, 'Starting function and setting timestamp.', 2);
   global $error_response;
   $response = array();
   $attrName = $element['name'];
   $attributes = (array)$element->attributes();
   $attributesArray = (isset($attributes['@attributes'])) ? $attributes['@attributes'] : array();
   runDebug(__FILE__, __FUNCTION__, __LINE__, 'Element attributes:' . print_r($attributesArray, true), 1);
   $attribute_count = count($attributesArray);
   runDebug(__FILE__, __FUNCTION__, __LINE__, "Element attribute count = $attribute_count", 1);
   if ($attribute_count == 0) // Bare condition tag
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with no attributes. XML = ' . $element->asXML(), 2);
     $liNamePath = 'li[@name]';
     $condition_xPath = '';
     $exclude = array();
     $choices = $element->xpath($liNamePath);
     foreach ($choices as $choice)
     {
       $choice_name = (string)$choice['name'];
       if (in_array($choice_name, $exclude)) continue;
       $exclude[] = $choice_name;
       runDebug(__FILE__, __FUNCTION__, __LINE__, 'Client properties = ' . print_r($convoArr['client_properties'], true), 2);
       $choice_value = get_client_property($convoArr, $choice_name);
       $condition_xPath .= "li[@name=\"$choice_name\"][@value=\"$choice_value\"]|";
     }
     $condition_xPath .= 'li[not(@*)]';
     runDebug(__FILE__, __FUNCTION__, __LINE__, "xpath search = $condition_xPath", 4);
     $pick_search = $element->xpath($condition_xPath);
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Pick array = ' . print_r($pick_search, true), 2);
     $pick_count = count($pick_search);
     runDebug(__FILE__, __FUNCTION__, __LINE__, "Pick count = $pick_count.", 2);
     $pick = (!empty($pick_search)) ? $pick_search[0] : ''; // avoid an undefined offset when nothing matches
   }
   elseif (array_key_exists('value', $attributesArray) or array_key_exists('contains', $attributesArray) or array_key_exists('exists', $attributesArray)) // condition tag with either VALUE, CONTAINS or EXISTS attributes
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with 2 attributes.', 2);
     $condition_name = (string)$element['name'];
     $test_value = get_client_property($convoArr, $condition_name);
     switch (true)
     {
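       case (isset($element['value'])):
         $condition_value = (string)$element['value'];
         break;
       case (isset($element['contains'])): // NOTE: compared for equality below; substring matching is not implemented here
         $condition_value = (string)$element['contains'];
         break;
       case (isset($element['exists'])): // NOTE: also compared for equality below
         $condition_value = (string)$element['exists'];
         break;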
       default:
         runDebug(__FILE__, __FUNCTION__, __LINE__, 'Something went wrong with parsing the CONDITION tag. Returning the error response.', 1);
         return $error_response;
     }
     $pick = ($condition_value == $test_value) ? $element : '';
   }
   elseif (array_key_exists('name', $attributesArray)) // this ~SHOULD~ just trigger if the NAME value is present, and ~NOT~ NAME and (VALUE|CONTAINS|EXISTS)
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Parsing a CONDITION tag with only the NAME attribute.', 2);
     $condition_name = (string)$element['name'];
     $test_value = get_client_property($convoArr, $condition_name);
     $path = "li[@value=\"$test_value\"]|li[not(@*)]";
     runDebug(__FILE__, __FUNCTION__, __LINE__, "search string = $path", 4);
     $choice = $element->xpath($path);
     $pick = (!empty($choice)) ? $choice[0] : ''; // avoid an undefined offset when nothing matches
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'Found a match. Pick = ' . print_r($choice, true), 4);
   }
   else // nothing matches
   {
     runDebug(__FILE__, __FUNCTION__, __LINE__, 'No matches found. Returning default error response.', 1);
     return $error_response;
   }
   $children = (is_object($pick)) ? $pick->children() : null;
   if (!empty ($children))
   {
     $response = parseTemplateRecursive($convoArr, $children, $level + 1);
   }
   else
   {
     $response[] = (string) $pick;
   }
   $response_string = implode_recursive(' ', $response);
   return $response_string;
 }

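I suspect there may be a better, more elegant way of doing this (story of my life, really), but the above works as expected. Any suggestions for improvement will be gladly accepted and carefully considered.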
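
The selection step for the single-name form can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper `pick_li` and a sample CONDITION element with a default (attribute-less) LI placed last; it uses the same union XPath as the function above:

```php
<?php
// Sample CONDITION element; the final <li> without attributes acts as the default.
$xml = <<<XML
<condition name="gender">
  <li value="female"> attractive.</li>
  <li value="male"> handsome.</li>
  <li> interesting.</li>
</condition>
XML;

$element = simplexml_load_string($xml);

// Hypothetical helper: return the text of the LI whose VALUE matches $testValue,
// falling back to the attribute-less LI when there is no match.
function pick_li(SimpleXMLElement $element, $testValue)
{
    // Relative path, so only LI children of this CONDITION are searched.
    $matches = $element->xpath("li[@value=\"$testValue\"]|li[not(@*)]");

    // xpath() returns false on error and an empty array when nothing matches.
    return empty($matches) ? '' : trim((string) $matches[0]);
}

echo pick_li($element, 'male'), "\n";    // prints "handsome."
echo pick_li($element, 'unknown'), "\n"; // prints "interesting."
```

Because the value is interpolated directly into the XPath expression, a value containing a double quote would break the query; this sketch assumes values without quotes.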

1 Answer

Note that I'm not using SimpleXML, because imho DOMDocument is better and more powerful. Both DOMDocument and DOMXPath are available as of PHP 5.

I created a simple parser class that parses the supplied document for the different styles of conditions:

class AIMLParser
{
    public function parse($data)
    {
        $internalErrors = libxml_use_internal_errors(true);

        $dom = new DOMDocument();
        $dom->loadHTML($data);
        $xpath = new DOMXPath($dom);

        $templates = array();

        foreach($xpath->query('//template') as $templateNode) {
            $template = array(
                'text' => $templateNode->firstChild->nodeValue, // note: this expects the first child node to always be the text node
                'conditions' => array(),
            );

            foreach ($templateNode->getElementsByTagName('condition') as $condition) {
                if ($condition->hasAttribute('name') && $condition->hasAttribute('value')) {
                    $template['conditions'] = $this->parseConditionsWithoutChildren($template['conditions'], $condition);
                } elseif ($condition->hasAttribute('name')) {
                    $template['conditions'] = $this->parseConditionsWithNameAttribute($template['conditions'], $condition);
                } else {
                    $template['conditions'] = $this->parseConditionsWithoutAttributes($template['conditions'], $condition);
                }
            }

            $templates[] = $template;
        }

        libxml_use_internal_errors($internalErrors);

        return $templates;
    }

    private function parseConditionsWithoutChildren(array $conditions, DOMNode $condition)
    {
        if (!array_key_exists($condition->getAttribute('name'), $conditions)) {
            $conditions[$condition->getAttribute('name')] = array();
        }

        $conditions[$condition->getAttribute('name')][$condition->getAttribute('value')] = $condition->nodeValue;

        return $conditions;
    }

    private function parseConditionsWithNameAttribute(array $conditions, DOMNode $condition)
    {
        if (!array_key_exists($condition->getAttribute('name'), $conditions)) {
            $conditions[$condition->getAttribute('name')] = array();
        }

        foreach ($condition->getElementsByTagName('li') as $listItem) {
            $conditions[$condition->getAttribute('name')][$listItem->getAttribute('value')] = $listItem->nodeValue;
        }

        return $conditions;
    }

    private function parseConditionsWithoutAttributes(array $conditions, DOMNode $condition)
    {
        foreach ($condition->getElementsByTagName('li') as $listItem) {
            if (!array_key_exists($listItem->getAttribute('name'), $conditions)) {
                $conditions[$listItem->getAttribute('name')] = array();
            }

            $conditions[$listItem->getAttribute('name')][$listItem->getAttribute('value')] = $listItem->nodeValue;
        }

        return $conditions;
    }
}

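What it does is search the document for template nodes and iterate over them. For each template node it works out which style of condition is used and, based on that, selects the right parsing function. After iterating over all templates, it returns a parsed array containing all the information you need (I think).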

To parse a document, you can do:

$parser = new AIMLParser();
$templates = $parser->parse($someVariableWithTheContentOfTheDocument);
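
Each entry of the returned array has the shape `array('text' => ..., 'conditions' => array(name => array(value => text)))`, as built by the class above. As a rough sketch of how such an entry could be turned into a response, the hypothetical helper `pick_condition_text` below (not part of the class) looks up each condition name in an assumed array of client properties and appends the matching text:

```php
<?php
// Hypothetical helper: build a response from one parsed template entry.
// $clientProperties is an assumed name => value map for the current user.
function pick_condition_text(array $template, array $clientProperties)
{
    $text = rtrim($template['text']);

    foreach ($template['conditions'] as $name => $choices) {
        $value = isset($clientProperties[$name]) ? $clientProperties[$name] : '';

        if (array_key_exists($value, $choices)) {
            $text .= ' ' . trim($choices[$value]);
        } elseif (array_key_exists('', $choices)) {
            // An LI without a VALUE attribute ends up under the '' key
            // and is treated here as the default choice.
            $text .= ' ' . trim($choices['']);
        }
    }

    return $text;
}

// Hand-written entry in the shape the parser produces for the examples in the question.
$template = array(
    'text' => "You sound very\n    ",
    'conditions' => array(
        'gender' => array('female' => ' attractive.', 'male' => ' handsome.'),
    ),
);

echo pick_condition_text($template, array('gender' => 'male')), "\n";
// prints "You sound very handsome."
```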

Demo: http://codepad.viper-7.com/JPuBaE

Answered 2013-02-23T16:00:03.743