3

我正在使用一种以类似 Lisp 的 S-Expression 字符串形式提供数据的服务。这些数据到达的速度又快又厚,我想尽可能快地翻阅它,最好是直接在字节流上(它只是单字节字符),没有任何回溯。这些字符串可能很长,我不希望为整个消息分配字符串的 GC 搅动。

我当前的实现使用带有语法的 CoCo/R,但它有一些问题。由于回溯,它将整个流分配给一个字符串。如果必须更改我的代码的用户,这也有点繁琐。我宁愿有一个纯 C# 解决方案。CoCo/R 也不允许重用解析器/扫描器对象,所以我必须为每条消息重新创建它们。

从概念上讲,数据流可以被认为是一系列 S-Expressions:

(item 1 apple)(item 2 banana)(item 3 chainsaw)

解析这个序列将创建三个对象。每个对象的类型可以由列表中的第一个值来确定,在上面的例子中是“item”。传入流的模式/语法是众所周知的。

在我开始编码之前,我想知道是否有库已经这样做了。我敢肯定我不是第一个遇到这个问题的人。


编辑

这是我想要的更多细节,因为我认为最初的问题可能有点含糊。

给定一些 SExpressions,例如:

(Hear 12.3 HelloWorld)
(HJ LAJ1 -0.42)
(FRP lf (pos 2.3 1.7 0.4))

我想要一个与此等效的对象列表:

{
    new HearPerceptorState(12.3, "HelloWorld"),
    new HingeJointState("LAJ1", -0.42),
    new ForceResistancePerceptorState("lf", new Polar(2.3, 1.7, 0.4))
}

我正在处理的实际数据集是来自 RoboCup 3D 模拟足球联赛中机器人模型的感知器列表。我可能还需要反序列化另一组具有更复杂结构的相关数据

4

6 回答 6

4

在我看来,解析生成器不需要解析仅由列表、数字和符号组成的简单 S 表达式。一个手写的递归下降解析器可能更简单,至少同样快。一般模式看起来像这样(在 java 中,c# 应该非常相似):

Object readDatum(PushbackReader in) {
    int ch = in.read();
    return readDatum(in, ch);
}
Object readDatum(PushbackReader in, int ch) {
    if (ch == '(')) {
        return readList(in, ch);
    } else if (isNumber(ch)) {
        return readNumber(in, ch);
    } else if (isSymbolStart(ch)) {
        return readSymbol(in, ch);
    } else {
        error(ch);
    }
}
List readList(PushbackReader in, int lookAhead) {
    if (ch != '(') {
        error(ch);
    }
    List result = new List();
    while (true) {
        int ch = in.read();
        if (ch == ')') {
            break;
        } else if (isWhiteSpace(ch)) {
            skipWhiteSpace(in);
        } else {
            result.append(readDatum(in, ch);
        }
    }
    return result;
}
String readSymbol(PushbackReader in, int ch) {
    StringBuilder result = new StringBuilder();
    result.append((char)ch);
    while (true) {
       int ch2 = in.read();
       if (isSymbol(ch2)) {
           result.append((char)ch2);
       } else if (isWhiteSpace(ch2) || ch2 == ')') {
           in.unread(ch2);
           break;
       } else if (ch2 == -1) {
           break;
       } else {
           error(ch2);
       }
    }
    return result.toString();
}
于 2010-06-24T13:21:05.357 回答
1

我使用OMeta#在 C# 中编写了一个 S-Expression 解析器。它可以解析您在示例中给出的 S-Expressions 类型,您只需向解析器添加十进制数。

该代码在 github 上以SExpression.NET的形式提供,相关文章可在此处获得。作为替代方案,我建议查看同样使用 OMeta# 编写的适用于 .NET的YaYAML YAML 解析器。

于 2013-02-28T21:48:54.693 回答
0

Drew, perhaps you should add some context to the question, otherwise this answer will make no sense to other users, but try this:

CHARACTERS

    letter = 'A'..'Z' + 'a'..'z' .
    digit = "0123456789" .
    messageChar = '\u0020'..'\u007e' - ' ' - '(' - ')'  .

TOKENS

    double = ['-'] digit { digit } [ '.' digit { digit } ] .
    ident = letter { letter | digit | '_' } .
    message = messageChar { messageChar } CONTEXT (")") .

Oh, I have to point out that '\u0020' is the unicode SPACE, which you are subsequently removing with "- ' '". Oh, and you can use CONTEXT (')') if you don't need more than one character lookahead.

FWIW: CONTEXT does not consume the enclosed sequence, you must still consume it in your production.

EDIT:

Ok, this seems to work. Really, I mean it this time :)

CHARACTERS
    letter = 'A'..'Z' + 'a'..'z' .
    digit = "0123456789" .
//    messageChar = '\u0020'..'\u007e' - ' ' - '(' - ')'  .

TOKENS

    double = ['-'] digit { digit } [ '.' digit { digit } ] .
    ident = letter { letter | digit | '_' } .
//    message = letter { messageChar } CONTEXT (')') .

// MessageText<out string m> = message               (. m = t.val; .)
// .

HearExpr<out HeardMessage message> = (. TimeSpan time; Angle direction = Angle.NaN; string messageText; .)
    "(hear" 
        TimeSpan<out time>
        ( "self" | AngleInDegrees<out direction> )
// MessageText<out messageText>    // REMOVED    
{ ANY } (. messageText = t.val; .) // MOD
    ')' (. message = new HeardMessage(time, direction, new Message(messageText)); .)
    .
于 2010-06-24T05:10:28.363 回答
0

Consider using Ragel. It's a state machine compiler and produces reasonably fast code.

It may not be apparent from the home page, but Ragel does have C# support. Here's a trivial example of how to use it in C#

于 2010-06-16T08:45:13.900 回答
0

看看gplexgppg

或者,您可以简单地将 S 表达式转换为 XML,然后让 .NET 完成其余工作。

于 2010-06-16T08:58:26.323 回答
0

这是一个相对简单(并且希望易于扩展)的解决方案:

public delegate object Acceptor(Token token, string match);

public class Symbol
{
    public Symbol(string id) { Id = id ?? Guid.NewGuid().ToString("P"); }
    public override string ToString() => Id;
    public string Id { get; private set; }
}

public class Token : Symbol
{
    internal Token(string id) : base(id) { }
    public Token(string pattern, Acceptor acceptor) : base(pattern) { Regex = new Regex(string.Format("^({0})", !string.IsNullOrEmpty(Pattern = pattern) ? Pattern : ".*"), RegexOptions.Compiled); ValueOf = acceptor; }
    public string Pattern { get; private set; }
    public Regex Regex { get; private set; }
    public Acceptor ValueOf { get; private set; }
}

public class SExpressionSyntax
{
    private readonly Token Space = Token("\\s+", Echo);
    private readonly Token Open = Token("\\(", Echo);
    private readonly Token Close = Token("\\)", Echo);
    private readonly Token Quote = Token("\\'", Echo);
    private Token comment;

    private static Exception Error(string message, params object[] arguments) => new Exception(string.Format(message, arguments));

    private static object Echo(Token token, string match) => new Token(token.Id);

    private static object Quoting(Token token, string match) => NewSymbol(token, match);

    private Tuple<Token, string, object> Read(ref string input)
    {
        if (!string.IsNullOrEmpty(input))
        {
            var found = null as Match;
            var sofar = input;
            var tuple = Lexicon.FirstOrDefault(current => (found = current.Item2.Regex.Match(sofar)).Success && (found.Length > 0));
            var token = tuple != null ? tuple.Item2 : null;
            var match = token != null ? found.Value : null;
            input = match != null ? input.Substring(match.Length) : input;
            return token != null ? Tuple.Create(token, match, token.ValueOf(token, match)) : null;
        }
        return null;
    }

    private Tuple<Token, string, object> Next(ref string input)
    {
        Tuple<Token, string, object> read;
        while (((read = Read(ref input)) != null) && ((read.Item1 == Comment) || (read.Item1 == Space))) ;
        return read;
    }

    public object Parse(ref string input, Tuple<Token, string, object> next)
    {
        var value = null as object;
        if (next != null)
        {
            var token = next.Item1;
            if (token == Open)
            {
                var list = new List<object>();
                while (((next = Next(ref input)) != null) && (next.Item1 != Close))
                {
                    list.Add(Parse(ref input, next));
                }
                if (next == null)
                {
                    throw Error("unexpected EOF");
                }
                value = list.ToArray();
            }
            else if (token == Quote)
            {
                var quote = next.Item3;
                next = Next(ref input);
                value = new[] { quote, Parse(ref input, next) };
            }
            else
            {
                value = next.Item3;
            }
        }
        else
        {
            throw Error("unexpected EOF");
        }
        return value;
    }

    protected Token TokenOf(Acceptor acceptor)
    {
        var found = Lexicon.FirstOrDefault(pair => pair.Item2.ValueOf == acceptor);
        var token = found != null ? found.Item2 : null;
        if ((token == null) && (acceptor != Commenting))
        {
            throw Error("missing required token definition: {0}", acceptor.Method.Name);
        }
        return token;
    }

    protected IList<Tuple<string, Token>> Lexicon { get; private set; }

    protected Token Comment { get { return comment = comment ?? TokenOf(Commenting); } }

    public static Token Token(string pattern, Acceptor acceptor) => new Token(pattern, acceptor);

    public static object Commenting(Token token, string match) => Echo(token, match);

    public static object NewSymbol(Token token, string match) => new Symbol(match);

    public static Symbol Symbol(object value) => value as Symbol;

    public static string Moniker(object value) => Symbol(value) != null ? Symbol(value).Id : null;

    public static string ToString(object value)
    {
        return
            value is object[] ?
            (
                ((object[])value).Length > 0 ?
                ((object[])value).Aggregate(new StringBuilder("("), (result, obj) => result.AppendFormat(" {0}", ToString(obj))).Append(" )").ToString()
                :
                "( )"
            )
            :
            (value != null ? (value is string ? string.Concat('"', (string)value, '"') : (value is bool ? value.ToString().ToLower() : value.ToString())).Replace("\\\r\n", "\r\n").Replace("\\\n", "\n").Replace("\\t", "\t").Replace("\\n", "\n").Replace("\\r", "\r").Replace("\\\"", "\"") : null) ?? "(null)";
    }

    public SExpressionSyntax()
    {
        Lexicon = new List<Tuple<string, Token>>();
        Include(Space, Open, Close, Quote);
    }

    public SExpressionSyntax Include(params Token[] tokens)
    {
        foreach (var token in tokens)
        {
            Lexicon.Add(new Tuple<string, Token>(token.Id, token));
        }
        return this;
    }

    public object Parse(string input)
    {
        var next = Next(ref input);
        var value = Parse(ref input, next);
        if ((next = Next(ref input)) != null)
        {
            throw Error("unexpected ", next.Item1);
        }
        return value;
    }
}

public class CustomSExpressionSyntax : SExpressionSyntax
{
    public CustomSExpressionSyntax()
        : base()
    {
        Include
        (
            // "//" comments
            Token("\\/\\/.*", SExpressionSyntax.Commenting),

            // Obvious
            Token("false", (token, match) => false),
            Token("true", (token, match) => true),
            Token("null", (token, match) => null),
            Token("\\-?[0-9]+\\.[0-9]+", (token, match) => double.Parse(match)),
            Token("\\-?[0-9]+", (token, match) => int.Parse(match)),

            // String literals
            Token("\\\"(\\\\\\n|\\\\t|\\\\n|\\\\r|\\\\\\\"|[^\\\"])*\\\"", (token, match) => match.Substring(1, match.Length - 2)),

            // Identifiers
            Token("[_A-Za-z][_0-9A-Za-z]*", NewSymbol)
        );
    }
}

public class Node { }

public class HearPerceptorState : Node
{
    public string Ident { get; set; }
    public double Value { get; set; }
}

public class HingeJointState : Node
{
    public string Ident { get; set; }
    public double Value { get; set; }
}

public class Polar : Tuple<double, double, double>
{
    public Polar(double a, double b, double c) : base(a, b, c) { }
}

public class ForceResistancePerceptorState : Node
{
    public string Ident { get; set; }
    public Polar Polar { get; set; }
}

public class Test
{
    public static void Main()
    {
        var input = @"
            (
                (Hear 12.3 HelloWorld)
                (HJ LAJ1 -0.42)
                (FRP lf (pos 2.3 1.7 0.4))
            )
        ";

        // visit DRY helpers
        Func<object, object[]> asRecord = value => (object[])value;
        Func<object, Symbol> symbol = value => SExpressionSyntax.Symbol(value);
        Func<object, string> identifier = value => symbol(value).Id;

        // the SExpr visit, proper
        Func<object[], Node[]> visitAll = null;
        Func<object[], Node> visitHear = null;
        Func<object[], Node> visitHJ = null;
        Func<object[], Node> visitFRP = null;

        visitAll =
            all =>
                all.
                Select
                (
                    item =>
                        symbol(asRecord(item)[0]).Id != "Hear" ?
                        (
                            symbol(asRecord(item)[0]).Id != "HJ" ?
                            visitFRP(asRecord(item))
                            :
                            visitHJ(asRecord(item))
                        )
                        :
                        visitHear(asRecord(item))
                ).
                ToArray();

        visitHear =
            item =>
                new HearPerceptorState { Value = (double)asRecord(item)[1], Ident = identifier(asRecord(item)[2]) };

        visitHJ =
            item =>
                new HingeJointState { Ident = identifier(asRecord(item)[1]), Value = (double)asRecord(item)[2] };

        visitFRP =
            item =>
                new ForceResistancePerceptorState
                {
                    Ident = identifier(asRecord(item)[1]),
                    Polar =
                        new Polar
                        (
                            (double)asRecord(asRecord(item)[2])[1],
                            (double)asRecord(asRecord(item)[2])[2],
                            (double)asRecord(asRecord(item)[2])[3]
                        )
                };

        var syntax = new CustomSExpressionSyntax();

        var sexpr = syntax.Parse(input);

        var nodes = visitAll(asRecord(sexpr));

        Console.WriteLine("SO_3051254");
        Console.WriteLine();
        Console.WriteLine(nodes.Length == 3);
        Console.WriteLine(nodes[0] is HearPerceptorState);
        Console.WriteLine(nodes[1] is HingeJointState);
        Console.WriteLine(nodes[2] is ForceResistancePerceptorState);
    }
}

可在此处测试:

https://repl.it/CnLC/1

'HTH,

于 2016-08-15T10:50:08.947 回答