
I have the following file that I need to parse:

    --TestFile
    Start ASDF123
    Name "John"
    Address "#6,US"
    end ASDF123

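Lines beginning with -- are treated as comment lines. The file starts with "Start" and ends with "end". The string after "Start" is the UserID, followed by the Name and Address, whose values are enclosed in double quotes.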

I need to parse the file and write the parsed data to an XML file.

So the generated file would look like this:

    <ASDF123>
      <Name Value="John" />
      <Address Value="#6,US" />
    </ASDF123>

Currently I am using pattern matching (regular expressions) to parse the file above. Here is my sample code.

    /// <summary>
    /// To Store the row data from the file
    /// </summary>
    List<String> MyList = new List<String>();

    String strName = "";
    String strAddress = "";
    String strInfo = "";

Method: ReadFile

    /// <summary>
    /// To read the file into a List
    /// </summary>
    private void ReadFile()
    {
        // "using" disposes the reader even if ReadLine throws
        using (StreamReader Reader = new StreamReader(Path.Combine(Application.StartupPath, "TestFile.txt")))
        {
            while (!Reader.EndOfStream)
            {
                MyList.Add(Reader.ReadLine());
            }
        }
    }

Method: FormateRowData

    /// <summary>
    /// To remove comments 
    /// </summary>
    private void FormateRowData()
    {
        MyList = MyList.Where(X => X != "").Where(X => X.StartsWith("--")==false ).ToList();
    }

Method: ParseData

    /// <summary>
    /// To Parse the data from the List
    /// </summary>
    private void ParseData()
    {
        Match l_mMatch;
        Regex RegData = new Regex("start[ \t\r\n]*(?<Data>[a-z0-9]*)", RegexOptions.IgnoreCase);
        Regex RegName = new Regex("name [ \t\r\n]*\"(?<Name>[a-z]*)\"", RegexOptions.IgnoreCase);
        Regex RegAddress = new Regex("address [ \t\r\n]*\"(?<Address>[a-z0-9 #,]*)\"", RegexOptions.IgnoreCase);
        for (int Index = 0; Index < MyList.Count; Index++)
        {
            l_mMatch = RegData.Match(MyList[Index]);
            if (l_mMatch.Success)
                strInfo = l_mMatch.Groups["Data"].Value;
            l_mMatch = RegName.Match(MyList[Index]);
            if (l_mMatch.Success)
                strName = l_mMatch.Groups["Name"].Value;
            l_mMatch = RegAddress.Match(MyList[Index]);
            if (l_mMatch.Success)
                strAddress = l_mMatch.Groups["Address"].Value;
        }
    }

Method: WriteFile

    /// <summary>
    /// To write parsed information into file.
    /// </summary>
    private void WriteFile()
    {
        XDocument XD = new XDocument(
                           new XElement(strInfo,
                                         new XElement("Name",
                                             new XAttribute("Value", strName)),
                                         new XElement("Address",
                                             new XAttribute("Value", strAddress))));
        XD.Save(Application.StartupPath + "\\File.xml");
    }

I have heard about parser generators.

Please help me write a parser using lex and yacc. The reason is that my existing parser (pattern matching) is not flexible, and moreover I don't think it is the right way to do it.

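How do I use a parser generator? (I have read Code Project example one and Code Project example two, but I am still not familiar with it.) Please suggest some parser generators that output a C# parser.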


2 Answers


Gardens Point LEX (GPLEX) and Gardens Point Parser Generator (GPPG) are strongly influenced by LEX and YACC, and they output C# code.

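Your grammar is simple, and I think your current approach is fine, but wanting to learn the "real" way is commendable. :-) So here is my suggestion for the grammar (production rules only; this is far from a complete example. In an actual GPPG file, each ... would be replaced by C# code that builds the syntax tree, and you would need token declarations etc.; read the GPPG examples in the documentation. You also need a GPLEX file that describes the tokens):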

    /* Your input file is a list of "top level elements" */
    TopLevel :
        TopLevel TopLevelElement { ... }
        | /* (empty) */
        ;

    /* A top level element is either a comment or a block.
       The COMMENT token must be described in the GPLEX file as
       any line that starts with -- . */
    TopLevelElement :
        Block { ... }
        | COMMENT { ... }
        ;

    /* A block starts with the token START (which, in the GPLEX file,
       is defined as the string "Start"), continues with some identifier
       (the block name), then has a list of elements, and finally the token
       END followed by an identifier. If you want to validate that the
       END identifier is the same as the START identifier, you can do that
       in the C# code that analyses the syntax tree built by GPPG.
       The token Identifier is also defined with a regular expression in GPLEX. */
    Block :
        START Identifier BlockElementList END Identifier { ... }
        ;

    BlockElementList :
        BlockElementList BlockElement { ... }
        | /* empty */
        ;

    /* yacc-style grammars have no (A | B) grouping, so each keyword gets its own alternative */
    BlockElement :
        NAME QuotedString { ... }
        | ADDRESS QuotedString { ... }
        ;
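For comparison, the same production rules can also be followed by hand. Below is a minimal recursive-descent sketch in C#, with one method per grammar rule. It is not GPPG output, and the names `BlockParser`, `TokenKind`, `Tokenize` and `Parse` are hypothetical. Assumptions: keywords are case-insensitive, comment lines are dropped by the lexer (so the COMMENT alternative of TopLevelElement never reaches the parser), identifiers must be valid XML element names, and each block becomes its own `XElement`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Token kinds named after the terminals in the grammar (COMMENT is handled by the lexer).
public enum TokenKind { Start, End, Name, Address, Identifier, QuotedString }

// Hypothetical hand-written recursive-descent parser (NOT generated by GPPG).
public sealed class BlockParser
{
    // A quoted string, a bare word, or a stray (unterminated) quote.
    static readonly Regex Piece = new Regex(@"""(?<str>[^""]*)""|(?<word>[^\s""]+)|(?<bad>"")");

    readonly List<(TokenKind Kind, string Text)> _tokens;
    int _pos;

    BlockParser(List<(TokenKind Kind, string Text)> tokens) { _tokens = tokens; }

    // Lexer: works line by line; lines starting with "--" are dropped here.
    public static List<(TokenKind Kind, string Text)> Tokenize(string input)
    {
        var tokens = new List<(TokenKind Kind, string Text)>();
        foreach (string rawLine in input.Split('\n'))
        {
            string line = rawLine.Trim();   // also removes a trailing '\r'
            if (line.Length == 0 || line.StartsWith("--", StringComparison.Ordinal)) continue;
            foreach (Match m in Piece.Matches(line))
            {
                if (m.Groups["bad"].Success)
                    throw new FormatException("Unterminated string in: " + line);
                tokens.Add(m.Groups["str"].Success
                    ? (TokenKind.QuotedString, m.Groups["str"].Value)
                    : Classify(m.Groups["word"].Value));
            }
        }
        return tokens;
    }

    static (TokenKind Kind, string Text) Classify(string word)
    {
        switch (word.ToLowerInvariant())
        {
            case "start": return (TokenKind.Start, word);
            case "end": return (TokenKind.End, word);
            case "name": return (TokenKind.Name, word);
            case "address": return (TokenKind.Address, word);
        }
        // Identifiers become XML element names, so require a letter or underscore first.
        if (!Regex.IsMatch(word, @"^[A-Za-z_][A-Za-z0-9_]*$"))
            throw new FormatException("Invalid identifier: " + word);
        return (TokenKind.Identifier, word);
    }

    // TopLevel : TopLevel TopLevelElement | /* empty */   (comments already removed)
    public static List<XElement> Parse(string input)
    {
        var parser = new BlockParser(Tokenize(input));
        var blocks = new List<XElement>();
        while (parser._pos < parser._tokens.Count)
            blocks.Add(parser.ParseBlock());
        return blocks;
    }

    TokenKind? Peek() => _pos < _tokens.Count ? _tokens[_pos].Kind : (TokenKind?)null;

    string Expect(TokenKind kind)
    {
        if (Peek() != kind)
            throw new FormatException($"Expected {kind} at token #{_pos}");
        return _tokens[_pos++].Text;
    }

    // Block : START Identifier BlockElementList END Identifier
    XElement ParseBlock()
    {
        Expect(TokenKind.Start);
        string id = Expect(TokenKind.Identifier);
        var block = new XElement(id);

        // BlockElementList : BlockElementList BlockElement | /* empty */
        // BlockElement     : NAME QuotedString | ADDRESS QuotedString
        while (Peek() == TokenKind.Name || Peek() == TokenKind.Address)
        {
            string field = _tokens[_pos++].Kind == TokenKind.Name ? "Name" : "Address";
            block.Add(new XElement(field, new XAttribute("Value", Expect(TokenKind.QuotedString))));
        }

        Expect(TokenKind.End);
        string endId = Expect(TokenKind.Identifier);
        // The check the grammar leaves to C# code: the END id must equal the START id.
        if (endId != id)
            throw new FormatException($"Block '{id}' is closed by 'end {endId}'");
        return block;
    }
}
```

With the sample file from the question, `BlockParser.Parse(text)[0]` should be an `ASDF123` element with `Name` and `Address` children carrying the `Value` attributes; wrapping it in `new XDocument(...)` and calling `Save` would write the XML file. A mismatched `end` identifier throws a `FormatException` instead of being silently accepted.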
answered 2011-03-12T12:16:30.870

First you have to define the grammar for the parser (the yacc part).

It would look something like this:

    file : record file
         | record
         ;

    record : START IDENTIFIER recordContent END IDENTIFIER { /* action: check that the two identifiers match */ }
           ;

    recordContent : recordContent field
                  | field
                  ;   /* can be made more detailed if you require a fixed field order */

    field : NAME STRING
          | ADDRESS STRING
          ;

The lexical analysis will be done by lex. I guess your regular expressions will be useful for defining the tokens.
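To illustrate how regular expressions like the ones in the question could serve as token definitions, here is a rough C# sketch of a lex-style scanner built from a single .NET `Regex`, with one named group per token rule. The class `LexStyleScanner`, the method `Scan` and the token kind strings are hypothetical. One caveat: lex picks the longest match, whereas .NET alternation takes the first alternative that matches, so the rule order matters here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical lex-style scanner: one named group per "lex rule".
public static class LexStyleScanner
{
    // \G anchors each match at the current position, so no input is skipped silently.
    // Rule order matters: .NET alternation is first-match, not longest-match like lex.
    static readonly Regex Rules = new Regex(
        @"\G(?:" +
        @"(?<COMMENT>^--[^\r\n]*)" +                     // comment: "--" in column 0 to end of line
        @"|(?<WS>\s+)" +                                 // whitespace and newlines: ignored
        @"|(?<KEYWORD>(?i:start|end|name|address)\b)" +  // case-insensitive keywords
        @"|(?<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)" +      // e.g. the user id ASDF123
        @"|(?<STRING>""[^""\r\n]*"")" +                  // double-quoted value on one line
        @")",
        RegexOptions.Multiline);                         // lets ^ match after every newline

    public static List<(string Kind, string Text)> Scan(string input)
    {
        var tokens = new List<(string Kind, string Text)>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = Rules.Match(input, pos);
            if (!m.Success)
                throw new FormatException($"Unexpected character '{input[pos]}' at offset {pos}");
            pos += m.Length;   // every rule consumes at least one character

            if (m.Groups["WS"].Success || m.Groups["COMMENT"].Success)
                continue;      // like a lex rule with an empty action
            if (m.Groups["KEYWORD"].Success)
                tokens.Add((m.Value.ToUpperInvariant(), m.Value));   // START, END, NAME, ADDRESS
            else if (m.Groups["IDENTIFIER"].Success)
                tokens.Add(("IDENTIFIER", m.Value));
            else
                tokens.Add(("STRING", m.Value.Substring(1, m.Value.Length - 2))); // strip the quotes
        }
        return tokens;
    }
}
```

For the sample file this should yield START, IDENTIFIER, NAME, STRING, ADDRESS, STRING, END, IDENTIFIER, which is the token stream a yacc-style grammar like the one above would consume.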

My answer is just a draft. I suggest you look online for a more complete tutorial on lex/yacc (or flex/bison), and come back here if you have more focused questions.

I also don't know whether there is a C# implementation that would let you stay in managed code. You may have to import unmanaged C/C++ code.

answered 2011-03-12T12:32:28.183