这与嵌套函数的潜力相去甚远,Regex
再加上这是一种正在修改的结构化语言,因此词法分析器/解析器方案更合适。
这是处理这种性质的事物的系统示例
首先,我们定义一些可以位于输入中的东西(要修改的表达式)
public interface ISourcePart
{
/// <summary>
/// Gets the string representation of the kind of thing we're working with
/// </summary>
string Kind { get; }
/// <summary>
/// Gets the position this information is found at in the original source
/// </summary>
int Position { get; }
/// <summary>
/// Gets a representation of this data as Token objects
/// </summary>
/// <returns>An array of Token objects representing the data</returns>
Token[] AsTokens();
}
接下来,我们将为房屋标记(源文本的可识别部分)定义一个构造
public class Token : ISourcePart
{
public int Position { get; set; }
public Token[] AsTokens()
{
return new[] {this};
}
public string Kind { get; set; }
/// <summary>
/// Gets or sets the value of the token
/// </summary>
public string Value { get; set; }
/// <summary>
/// Creates a new Token
/// </summary>
/// <param name="kind">The kind (name) of the token</param>
/// <param name="match">The Match the token is to be generated from</param>
/// <param name="index">The offset from the beginning of the file the index of the match is relative to</param>
/// <returns>The newly created token</returns>
public static Token Create(string kind, Match match, int index)
{
return new Token
{
Position = match.Index + index,
Kind = kind,
Value = match.Value
};
}
/// <summary>
/// Creates a new Token
/// </summary>
/// <param name="kind">The kind (name) of the token</param>
/// <param name="value">The value to assign to the token</param>
/// <param name="position">The absolute position in the source file the value is located at</param>
/// <returns>The newly created token</returns>
public static Token Create(string kind, string value, int position)
{
return new Token
{
Kind = kind,
Value = value,
Position = position
};
}
}
在此示例中,我们将使用Regex
es 来查找我们的令牌(以下 - 摘自我的演示项目中的 Program.cs)。
/// <summary>
/// Breaks an input string into recognizable tokens
/// </summary>
/// <param name="source">The input string to break up</param>
/// <returns>The set of tokens located within the string</returns>
static IEnumerable<Token> Tokenize(string source)
{
var tokens = new List<Token>();
var sourceParts = new[] { new KeyValuePair<string, int>(source, 0) };
tokens.AddRange(Tokenize(OpenParen, "\\(", ref sourceParts));
tokens.AddRange(Tokenize(CloseParen, "\\)", ref sourceParts));
tokens.AddRange(Tokenize(Semi, ";", ref sourceParts));
tokens.AddRange(Tokenize(Operator, "[\\^\\\\*\\+\\-/]", ref sourceParts));
tokens.AddRange(Tokenize(Literal, "\\w+", ref sourceParts));
return tokens.OrderBy(x => x.Position);
}
如您所见,我已经为左括号和右括号、分号、基本数学运算符以及字母和数字定义了模式。
该Tokenize
方法定义如下(再次来自我的演示项目中的 Program.cs)
/// <summary>
/// Performs tokenization of a collection of non-tokenized data parts with a specific pattern
/// </summary>
/// <param name="tokenKind">The name to give the located tokens</param>
/// <param name="pattern">The pattern to use to match the tokens</param>
/// <param name="untokenizedParts">The portions of the input that have yet to be tokenized (organized as text vs. position in source)</param>
/// <returns>The set of tokens matching the given pattern located in the untokenized portions of the input, <paramref name="untokenizedParts"/> is updated as a result of this call</returns>
static IEnumerable<Token> Tokenize(string tokenKind, string pattern, ref KeyValuePair<string, int>[] untokenizedParts)
{
//Do a bit of setup
var resultParts = new List<KeyValuePair<string, int>>();
var resultTokens = new List<Token>();
var regex = new Regex(pattern);
//Look through all of our currently untokenized data
foreach (var part in untokenizedParts)
{
//Find all of our available matches
var matches = regex.Matches(part.Key).OfType<Match>().ToList();
//If we don't have any, keep the data as untokenized and move to the next chunk
if (matches.Count == 0)
{
resultParts.Add(part);
continue;
}
//Store the untokenized data in a working copy and save the absolute index it reported itself at in the source file
var workingPart = part.Key;
var index = part.Value;
//Look through each of the matches that were found within this untokenized segment
foreach (var match in matches)
{
//Calculate the effective start of the match within the working copy of the data
var effectiveStart = match.Index - (part.Key.Length - workingPart.Length);
resultTokens.Add(Token.Create(tokenKind, match, part.Value));
//If we didn't match at the beginning, save off the first portion to the set of untokenized data we'll give back
if (effectiveStart > 0)
{
var value = workingPart.Substring(0, effectiveStart);
resultParts.Add(new KeyValuePair<string, int>(value, index));
}
//Get rid of the portion of the working copy we've already used
if (match.Index + match.Length < part.Key.Length)
{
workingPart = workingPart.Substring(effectiveStart + match.Length);
}
else
{
workingPart = string.Empty;
}
//Update the current absolute index in the source file we're reporting to be at
index += effectiveStart + match.Length;
}
//If we've got remaining data in the working copy, add it back to the untokenized data
if (!string.IsNullOrEmpty(workingPart))
{
resultParts.Add(new KeyValuePair<string, int>(workingPart, index));
}
}
//Update the untokenized data to contain what we couldn't process with this pattern
untokenizedParts = resultParts.ToArray();
//Return the tokens we were able to extract
return resultTokens;
}
现在我们已经有了处理标记化数据的方法和类型,我们需要能够识别更大含义的片段,比如对简单函数(如sin(x)
)、复杂函数(如Function1(a1;a2;a3)
)、基本数学运算(如+
, -
, *
, 等)等等。我们将制作一个简单的解析器来处理它;首先,我们将为解析节点定义匹配条件。
public class ParseNodeDefinition
{
/// <summary>
/// The set of parse node definitions that could be transitioned to from this one
/// </summary>
private readonly IList<ParseNodeDefinition> _nextNodeOptions;
/// <summary>
/// Creates a new ParseNodeDefinition
/// </summary>
private ParseNodeDefinition()
{
_nextNodeOptions = new List<ParseNodeDefinition>();
}
/// <summary>
/// Gets whether or not this definition is an acceptable ending point for the parse tree
/// </summary>
public bool IsValidEnd { get; private set; }
/// <summary>
/// Gets the name an item must have for it to be matched by this definition
/// </summary>
public string MatchItemsNamed { get; private set; }
/// <summary>
/// Gets the set of parse node definitions that could be transitioned to from this one
/// </summary>
public IEnumerable<ParseNodeDefinition> NextNodeOptions
{
get { return _nextNodeOptions; }
}
/// <summary>
/// Gets or sets the tag that will be associated with the data if matched
/// </summary>
public string Tag { get; set; }
/// <summary>
/// Creates a new ParseNodeDefinition matching items with the specified name/kind.
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>A ParseNodeDefinition capable of matching items of the given name</returns>
public static ParseNodeDefinition Create(string matchItemsNamed, string tag, bool isValidEnd)
{
return new ParseNodeDefinition { MatchItemsNamed = matchItemsNamed, Tag = tag, IsValidEnd = isValidEnd };
}
public ParseNodeDefinition AddOption(string matchItemsNamed)
{
return AddOption(matchItemsNamed, string.Empty, false);
}
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag)
{
return AddOption(matchItemsNamed, tag, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag, bool isValidEnd)
{
var node = Create(matchItemsNamed, tag, isValidEnd);
_nextNodeOptions.Add(node);
return node;
}
public ParseNodeDefinition AddOption(string matchItemsNamed, bool isValidEnd)
{
return AddOption(matchItemsNamed, string.Empty, isValidEnd);
}
/// <summary>
/// Links the given node as an option for a state to follow this one in the parse tree this node is a part of
/// </summary>
/// <param name="next">The node to add as an option</param>
public void LinkTo(ParseNodeDefinition next)
{
_nextNodeOptions.Add(next);
}
}
这将让我们按名称匹配单个元素(无论是ParseTree
稍后定义的)还是 aToken
因为它们都实现了ISourcePart
接口。接下来我们将创建一个ParseTreeDefinition
允许我们指定ParseNodeDefinitions
匹配序列的。
public class ParseTreeDefinition
{
/// <summary>
/// The set of parse node definitions that constitute an initial match to the parse tree
/// </summary>
private readonly IList<ParseNodeDefinition> _initialNodeOptions;
/// <summary>
/// Creates a new ParseTreeDefinition
/// </summary>
/// <param name="name">The name to give to parse trees generated from full matches</param>
public ParseTreeDefinition(string name)
{
_initialNodeOptions = new List<ParseNodeDefinition>();
Name = name;
}
/// <summary>
/// Gets the set of parse node definitions that constitute an initial match to the parse tree
/// </summary>
public IEnumerable<ParseNodeDefinition> InitialNodeOptions { get { return _initialNodeOptions; } }
/// <summary>
/// Gets the name of the ParseTreeDefinition
/// </summary>
public string Name { get; private set; }
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed)
{
return AddOption(matchItemsNamed, string.Empty, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag)
{
return AddOption(matchItemsNamed, tag, false);
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="tag">The tag to associate with matched items</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, string tag, bool isValidEnd)
{
var node = ParseNodeDefinition.Create(matchItemsNamed, tag, isValidEnd);
_initialNodeOptions.Add(node);
return node;
}
/// <summary>
/// Adds an option for a named node to follow this one in the parse tree the node is a part of
/// </summary>
/// <param name="matchItemsNamed">The name of the item to be matched</param>
/// <param name="isValidEnd">Whether or not the element is a valid end to the parse tree</param>
/// <returns>The ParseNodeDefinition that has been added</returns>
public ParseNodeDefinition AddOption(string matchItemsNamed, bool isValidEnd)
{
return AddOption(matchItemsNamed, string.Empty, isValidEnd);
}
/// <summary>
/// Attempts to follow a particular branch in the parse tree from a given starting point in a set of source parts
/// </summary>
/// <param name="parts">The set of source parts to attempt to match in</param>
/// <param name="startIndex">The position to start the matching attempt at</param>
/// <param name="required">The definition that must be matched for the branch to be followed</param>
/// <param name="nodes">The set of nodes that have been matched so far</param>
/// <returns>true if the branch was followed to completion, false otherwise</returns>
private static bool FollowBranch(IList<ISourcePart> parts, int startIndex, ParseNodeDefinition required, ICollection<ParseNode> nodes)
{
if (parts[startIndex].Kind != required.MatchItemsNamed)
{
return false;
}
nodes.Add(new ParseNode(parts[startIndex], required.Tag));
return parts.Count > (startIndex + 1) && required.NextNodeOptions.Any(x => FollowBranch(parts, startIndex + 1, x, nodes)) || required.IsValidEnd;
}
/// <summary>
/// Attempt to match the parse tree definition against a set of source parts
/// </summary>
/// <param name="parts">The source parts to match against</param>
/// <returns>true if the parse tree was matched, false otherwise. parts is updated by this method to consolidate matched nodes into a ParseTree</returns>
public bool Parse(ref IList<ISourcePart> parts)
{
var partsCopy = parts.ToList();
for (var i = 0; i < parts.Count; ++i)
{
var tree = new List<ParseNode>();
if (InitialNodeOptions.Any(x => FollowBranch(partsCopy, i, x, tree)))
{
partsCopy.RemoveRange(i, tree.Count);
partsCopy.Insert(i, new ParseTree(Name, tree.ToArray(), tree[0].Position));
parts = partsCopy;
return true;
}
}
return false;
}
}
当然,如果没有一些地方来存储我们迄今为止定义的匹配器的结果,这些对我们没有多大好处,所以让我们定义ParseTree
aParseNode
是一个ParseTree
简单的ParseNode
对象集合,它ParseNode
是围绕 aParseTree
或Token
(或更多)的包装器一般任何ISourcePart
)。
public class ParseTree : ISourcePart
{
/// <summary>
/// Creates a new ParseTree
/// </summary>
/// <param name="kind">The kind (name) of tree this is</param>
/// <param name="nodes">The nodes the tree matches</param>
/// <param name="position">The position in the source file this tree is located at</param>
public ParseTree(string kind, IEnumerable<ISourcePart> nodes, int position)
{
Kind = kind;
ParseNodes = nodes.ToList();
Position = position;
}
public string Kind { get; private set; }
public int Position { get; private set; }
/// <summary>
/// Gets the nodes that make up this parse tree
/// </summary>
public IList<ISourcePart> ParseNodes { get; internal set; }
public Token[] AsTokens()
{
return ParseNodes.SelectMany(x => x.AsTokens()).ToArray();
}
}
public class ParseNode : ISourcePart
{
/// <summary>
/// Creates a new ParseNode
/// </summary>
/// <param name="sourcePart">The data that was matched to create this node</param>
/// <param name="tag">The tag data (if any) associated with the node</param>
public ParseNode(ISourcePart sourcePart, string tag)
{
SourcePart = sourcePart;
Tag = tag;
}
public string Kind { get { return SourcePart.Kind; } }
/// <summary>
/// Gets the tag associated with the matched data
/// </summary>
public string Tag { get; private set; }
/// <summary>
/// Gets the data that was matched to create this node
/// </summary>
public ISourcePart SourcePart { get; private set; }
public int Position { get { return SourcePart.Position; } }
public Token[] AsTokens()
{
return SourcePart.AsTokens();
}
}
这就是我们需要的构造,所以我们将开始配置我们的解析树定义。从这里开始的代码来自我演示中的 Program.cs。
正如您可能已经注意到,在上面关于为每个标记声明模式的块中,有一些值被引用但未定义,在这里它们是。
private const string CloseParen = "CloseParen";
private const string ComplexFunctionCall = "ComplexFunctionCall";
private const string FunctionCallStart = "FunctionCallStart";
private const string Literal = "Literal";
private const string OpenParen = "OpenParen";
private const string Operator = "Operator";
private const string ParenthesisRequiredElement = "ParenthesisRequiredElement";
private const string ParenthesizedItem = "ParenthesizedItem";
private const string Semi = "Semi";
private const string SimpleFunctionCall = "SimpleFunctionCall";
让我们首先定义一个匹配字面量 ( \w+
pattern) 的模式,后跟开括号;我们将使用它来匹配诸如sin(
or之类的东西Function1(
。
static ParseTreeDefinition CreateFunctionCallStartTree()
{
var tree = new ParseTreeDefinition(FunctionCallStart);
var name = tree.AddOption(Literal);
name.AddOption(OpenParen, true);
return tree;
}
真的不是很多,设置一棵树,为第一个匹配的内容添加一个选项作为文字,添加一个作为左括号匹配的下一个选项,并说它可以结束解析树。现在对于一个稍微复杂一点的二元数学运算(想不出任何需要包含的一元运算)
static ParseTreeDefinition CreateBinaryOperationResultTree()
{
var tree = new ParseTreeDefinition(Literal);
var parenthesizedItem = tree.AddOption(ParenthesizedItem);
var literal = tree.AddOption(Literal);
var simpleCall = tree.AddOption(SimpleFunctionCall);
var complexCall = tree.AddOption(ComplexFunctionCall);
var @operator = parenthesizedItem.AddOption(Operator);
literal.LinkTo(@operator);
simpleCall.LinkTo(@operator);
complexCall.LinkTo(@operator);
@operator.AddOption(ParenthesizedItem, true);
@operator.AddOption(Literal, true);
@operator.AddOption(SimpleFunctionCall, true);
@operator.AddOption(ComplexFunctionCall, true);
return tree;
}
这里我们说解析树可以以带括号的项目(like (1/2)
)、文字(like a5
or 3
)、简单的调用(like sin(4)
)或复杂的调用(likeFunction1(a1;a2;a3)
)。本质上,我们刚刚定义了左侧操作数的选项。接下来,我们说带括号的项目后面必须跟一个运算符(从开头声明的模式中的数学运算符之一),为了方便起见,我们将说左侧操作数的所有其他选项可以进展到相同的状态(有操作员)。接下来,操作员也必须有右手边,所以我们给它一组重复的选项以继续前进。请注意,它们与左侧操作数的定义不同,它们设置了能够终止解析树的标志。请注意,解析树被命名为 Literal 以避免必须指定另一种元素来匹配所有地方。接下来,带括号的项目:
static ParseTreeDefinition CreateParenthesizedItemTree()
{
var tree = new ParseTreeDefinition(ParenthesizedItem);
var openParen = tree.AddOption(OpenParen);
var nestedSimpleCall = openParen.AddOption(SimpleFunctionCall);
var nestedComplexCall = openParen.AddOption(ComplexFunctionCall);
var arg = openParen.AddOption(Literal);
var parenthesizedItem = openParen.AddOption(ParenthesizedItem);
var closeParen = nestedSimpleCall.AddOption(CloseParen, true);
arg.LinkTo(closeParen);
parenthesizedItem.LinkTo(closeParen);
nestedComplexCall.LinkTo(closeParen);
return tree;
}
这个很好,很容易,从一个括号开始,几乎任何东西都跟着它,然后用另一个括号来结束它。简单的调用(如sin(x)
)
static ParseTreeDefinition CreateSimpleFunctionCallTree()
{
var tree = new ParseTreeDefinition(SimpleFunctionCall);
var openParen = tree.AddOption(FunctionCallStart);
var nestedItem = openParen.AddOption(ParenthesizedItem);
var nestedSimpleCall = openParen.AddOption(SimpleFunctionCall);
var nestedComplexCall = openParen.AddOption(ComplexFunctionCall);
var arg = openParen.AddOption(Literal);
var parenthesizedItem = openParen.AddOption(ParenthesizedItem);
var closeParen = nestedSimpleCall.AddOption(CloseParen, true);
arg.LinkTo(closeParen);
nestedItem.LinkTo(closeParen);
parenthesizedItem.LinkTo(closeParen);
nestedComplexCall.LinkTo(closeParen);
return tree;
}
复杂的调用(如Function1(a1;a2;a3)
)
static ParseTreeDefinition CreateComplexFunctionCallTree()
{
var tree = new ParseTreeDefinition(ComplexFunctionCall);
var openParen = tree.AddOption(FunctionCallStart);
var arg = openParen.AddOption(Literal, ParenthesisRequiredElement);
var simpleCall = openParen.AddOption(SimpleFunctionCall, ParenthesisRequiredElement);
var complexCall = openParen.AddOption(ComplexFunctionCall, ParenthesisRequiredElement);
var nested = openParen.AddOption(ParenthesizedItem);
var semi = arg.AddOption(Semi);
simpleCall.LinkTo(semi);
complexCall.LinkTo(semi);
nested.LinkTo(semi);
var arg2 = semi.AddOption(Literal, ParenthesisRequiredElement);
var simpleCall2 = semi.AddOption(SimpleFunctionCall, ParenthesisRequiredElement);
var complexCall2 = semi.AddOption(ComplexFunctionCall, ParenthesisRequiredElement);
var nested2 = semi.AddOption(ParenthesizedItem);
arg2.LinkTo(semi);
simpleCall2.LinkTo(semi);
complexCall2.LinkTo(semi);
nested2.LinkTo(semi);
var closeParen = arg2.AddOption(CloseParen, true);
arg2.LinkTo(closeParen);
simpleCall2.LinkTo(closeParen);
complexCall2.LinkTo(closeParen);
return tree;
}
这就是我们需要的所有树,所以让我们看看运行这一切的代码
static void Main()
{
//The input string
const string input = @"Function1((a1^5-4)/2;1/sin(a2);a3;a4)+Function2(a1;a2;1/a3)";
//Locate the recognizable tokens within the source
IList<ISourcePart> tokens = Tokenize(input).Cast<ISourcePart>().ToList();
//Create the parse trees we'll need to be able to recognize the different parts of the input
var functionCallStartTree = CreateFunctionCallStartTree();
var parenthethesizedItemTree = CreateParenthesizedItemTree();
var simpleFunctionCallTree = CreateSimpleFunctionCallTree();
var complexFunctionCallTree = CreateComplexFunctionCallTree();
var binaryOpTree = CreateBinaryOperationResultTree();
//Parse until we can't parse anymore
while (functionCallStartTree.Parse(ref tokens) || binaryOpTree.Parse(ref tokens) || parenthethesizedItemTree.Parse(ref tokens) || simpleFunctionCallTree.Parse(ref tokens) || complexFunctionCallTree.Parse(ref tokens))
{ }
//Run our post processing to fix the parenthesis in the input
FixParenthesis(ref tokens);
//Collapse our parse tree(s) back to a string
var values = tokens.OrderBy(x => x.Position).SelectMany(x => x.AsTokens()).Select(x => x.Value);
//Print out our results and wait
Console.WriteLine(string.Join(string.Empty, values));
Console.ReadLine();
}
我们唯一需要定义的是如何实际包装“复杂”调用的参数列表中的元素。这是由FixParenthesis
方法处理的。
private static void FixParenthesis(ref IList<ISourcePart> items)
{
//Iterate through the set we're examining
for (var i = 0; i < items.Count; ++i)
{
var parseNode = items[i] as ParseNode;
//If we've got a parse node...
if (parseNode != null)
{
var nodeTree = parseNode.SourcePart as ParseTree;
//If the parse node represents a parse tree...
if (nodeTree != null)
{
//Fix parenthesis within the tree
var nodes = nodeTree.ParseNodes;
FixParenthesis(ref nodes);
nodeTree.ParseNodes = nodes;
}
//If this parse node required parenthesis, replace the subtree and add them
if (parseNode.Tag == ParenthesisRequiredElement)
{
var nodeContents = parseNode.AsTokens();
var combined = string.Join(string.Empty, nodeContents.OrderBy(x => x.Position).Select(x => x.Value));
items[i] = Token.Create(parseNode.Kind, string.Format("({0})", combined), parseNode.Position);
}
continue;
}
var parseTree = items[i] as ParseTree;
//If we've got a parse tree...
if (parseTree != null)
{
//Fix parenthesis within the tree
var nodes = parseTree.ParseNodes;
FixParenthesis(ref nodes);
parseTree.ParseNodes = nodes;
}
}
}
无论如何,我希望这有助于或至少提供了一个有趣的消遣。