介绍
您描述的问题不能用常规语言表示,因为常规语言无法平衡括号。然而,多年来,大多数正则表达式实现都添加了一些特性,允许解析比正则语言更复杂的语言。特别是,这个问题可以通过 .NET 的平衡匹配或PCRE 的递归表达式来解决(感谢@Gumbo,在评论中指出了这一点)。
但是,仅仅因为您可以做某事并不意味着您应该做某事。将正则表达式用于此类任务的问题在于,随着您扩展元语言,修改正则表达式将变得更加困难。而解析器往往更具可塑性且易于扩展。
因此,您可能可以构建一系列正则表达式来涵盖输入的非病态情况,但是当您可以编写一个解析器时,为什么还要尝试呢?它们易于维护,速度极快(比正则表达式快),易于扩展,而且启动起来很有趣。
本来我错过了这个问题是在寻找PHP的解决方案,所以我用JavaScript写了它。我将它翻译成 PHP,并在文章末尾留下了原始的 JavaScript 解决方案。
PHP解决方案
function parse( $s ) {
// we will always have a "current context". the current context is the array we're
// currently operating in. when we start, this is simply an empty array. as new
// arrays are created, this context will change.
$context = array();
// since we have to keep track of how deep our context is, we keep a context stack
$contextStack = array(&$context);
// this accumulates the name of the current array
$name = '';
for( $i=0; $i<strlen($s); $i++ ) {
switch( $s[$i] ) {
case ',':
// if the last array hasn't been added to the current context
// (as will be the case for arrays lacking parens), we add it now
if( $name!='' && !array_key_exists( $name, $context ) )
$context[$name] = array();
// reset name accumulator
$name = '';
break;
case '(':
// we are entering a subcontext
// save a new array in the current context; this will become our new context
$context[$name] = array();
// switch context and add to context stack
$context = &$context[$name];
$contextStack[] = &$context;
// reset name accumulator
$name = '';
break;
case ')':
// we are exiting a context
// if we haven't saved this array in the current context, do so now
if( $name!='' && !array_key_exists( $name, $context ) )
$context[$name] = array();
// we can't just assign $context the return value of array_pop because
// that does not return a reference
array_pop($contextStack);
if( count($contextStack) == 0 ) throw new Exception( 'Unmatched parenthesis' );
$context = &$contextStack[count($contextStack)-1];
// reset name accumulator
$name = '';
break;
default:
// this is part of the field name
$name .= $s[$i];
}
}
// add any trailing arrays to the context (this will cover the case
// where our input ends in an array without parents)
if( $name!='' && !array_key_exists( $name, $context ) )
$context[$name] = array();
if( count( $contextStack ) != 1 ) throw new Exception( 'Unmatched parenthesis' );
return array_pop( $contextStack );
}
原创 JavaScript 解决方案
function parse(s) {
var root = { parent: null, children: [] };
var field = { parent: root, name: '', start_idx: 0, children: [] };
root.children.push( field );
for( var i=0; i<s.length; i++ ) {
switch( s[i] ) {
case ',':
// if this field didn't have any children, we have to set its text
if( !field.children.length )
field.text = s.substr( field.start_idx, i - field.start_idx + 1 );
// start a new field; create new field and change context
var newfield = { parent: field.parent, name: '', start_idx: i, children:[] };
field.parent.children.push(newfield);
field = newfield;
break;
case '(':
// start of a subfield; create subfield and change context
var subfield = { parent: field, name: '', start_idx: i, children:[] };
field.children.push(subfield);
field = subfield;
break;
case ')':
// end of a subfield; fill out subfield details and change context
if( !field.parent ) throw new Error( 'Unmatched parenthesis!' );
field.text = s.substr( field.start_idx, i - field.start_idx + 1 );
if( field.text==='()' ) {
// empty subfield; pop this subfield so it doesn't clutter the parent
field.parent.children.pop();
}
field = field.parent;
break;
default:
// this is part of the field name
field.name += s[i];
field.name = field.name.trim();
}
}
return root;
}
现在我们已经有了你的语言的解析树,我们可以很容易地创建一些递归代码来发出你的 PHP:
function formatphp_namedarray(arr,indent,lastchild) {
var php = indent + arr.name + ' => array(';
if( arr.children.length ) {
if( arr.children.length===1 && arr.children[0].length===0 ) {
php += arr.children[0].name;
} else {
php += '\n';
indent += '\t';
for( var i=0; i<arr.children.length; i++ )
php += formatphp_namedarray(arr.children[i],indent,i===arr.children.length-1);
indent = indent.replace(/\t$/,'');
php += indent;
}
}
php += (lastchild?')':'),') + '\n';
return php;
}
function formatphp(t) {
var php = 'array(\n';
for( var i=0; i<t.children.length; i++ )
php += formatphp_namedarray( t.children[i], '\t', i===t.children.length-1 );
php += ')'
return php;
}
在这里看到这一切:http: //jsfiddle.net/6bguY/1/