c# - Is this text file parsable or unparsable?

Question

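Lest I suffer a bout of tedium-induced tremors (somewhere between delirium tremens and carpal tunnel syndrome), I need to find a way to automatically parse a large file of SQL statements along with their parameter values.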

I have a file containing a large number of SQL statements in the following format:

select Animal#, RacketThreshold, PeakOil as Oil
from OilAnimalPlatypus2
where OilAnimalPlatypusID = :ID
  and Animal# = :Animal
  and TelecasterAccessType = 'D'
UNION
select Animal, RacketThreshold, PeakOil as Oil
from OilRequestPlatypus
where PlatypusID = :ID
  and Animal = :Animal
order by RacketThreshold

--> :ID(VARCHAR[0])=<NULL> 
:Animal(INTEGER)=2

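...i.e., a multi-line SQL statement, followed by a blank line, followed by "-->" (two dashes and an arrow) with the parameter names, data types, and arguments, followed by the same thing over and over again, ad nauseam (except that some SQL statements have parameters and some don't).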

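From this great gob, I want to create a separate string for each unique query (many of them are identical, although they are usually passed different argument values for their parameters). If possible, I'd also like to track all the argument values passed to a particular query: for example, if the first time it's called "1" is passed for a particular parameter, the next time "42", the next time "3.14", and so on, I want the collection 1, 42, 3.14 for that arg name.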
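One way to hold what is described here is a nested dictionary: query text, then parameter name, then the set of distinct values seen. The sketch below is a hypothetical helper written as a C# 9+ top-level program (`argsByQuery` and `Record` are illustrative names, not from any answer); it assumes values are compared as raw strings, so "3.14" and "3.140" would count as different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical structure: query text -> parameter name -> distinct argument values seen
var argsByQuery = new Dictionary<string, Dictionary<string, SortedSet<string>>>();

// Record one argument value passed to one parameter of one query
void Record(string query, string paramName, string value)
{
    if (!argsByQuery.TryGetValue(query, out var byParam))
    {
        byParam = new Dictionary<string, SortedSet<string>>();
        argsByQuery[query] = byParam;
    }
    if (!byParam.TryGetValue(paramName, out var values))
    {
        values = new SortedSet<string>(StringComparer.Ordinal);
        byParam[paramName] = values;
    }
    values.Add(value); // a set, so a repeated value is stored only once
}

const string q = "select ... where ID = :ID and Animal = :Animal";
Record(q, ":Animal", "1");
Record(q, ":Animal", "42");
Record(q, ":Animal", "3.14");
Record(q, ":Animal", "42");

foreach (var (param, seen) in argsByQuery[q])
    Console.WriteLine($"{param}: {string.Join(", ", seen)}");
```

A `SortedSet` keeps each value once and enumerates in ordinal string order, so this prints `:Animal: 1, 3.14, 42`. Use a `List<string>` instead if every call (duplicates included) should be kept.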

There are more than 400 queries, and I hate the thought of doing all this "by hand", especially the comparisons to find matching queries.

Update

OK, after adding this code to use Jon's code:

private void buttonOpenAndParseSQLMonFile_Click(object sender, EventArgs e)
{
    var queriesAndArgs = (Dictionary<string, List<string>>)ParseFile("SQLMonTraceLog.txt");
    foreach(var pair in queriesAndArgs)
    {
        richTextBoxParsedResults.AppendText(pair.Key);
        richTextBoxParsedResults.AppendText(Environment.NewLine);
        foreach (String s in pair.Value)
        {
            richTextBoxParsedResults.AppendText(s);
            richTextBoxParsedResults.AppendText(Environment.NewLine);
        }
        richTextBoxParsedResults.AppendText(Environment.NewLine);
    }
}

...I get these kinds of results in my richTextBox:

select ABCID from ABCWorker where lower(loginid) = lower(user) 


select r.roleid from abcrole r, abcworker w where lower(w.loginid)=lower(user)   and r.abcid=w.abcid   and r.status='A'


select Tier#, BenGrimm, PeakRate as Ratefrom RageAnimalGreenBayPackers2 where RageAnimalGreenBayPackersID = :ID and Tier# = :Tier and FlyingVAccessType = 'D' UNION select Tier, BenGrimm, PeakRate as Rate from CaliforniaCondorGreenBayPackers where GreenBayPackersID = :ID and Tier = :Tier order by BenGrimm 
-->   :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=4 


select Tier#, BenGrimm, PeakRate as Rate from RageAnimalGreenBayPackers2 where RageAnimalGreenBayPackersID = :ID and Tier# = :Tier and FlyingVAccessType = 'D' UNION select Tier, BenGrimm, PeakRate as Rate from CaliforniaCondorGreenBayPackers where GreenBayPackersID = :ID and Tier = :Tier order by BenGrimm 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=2 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=5 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=2 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=3 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=4 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=2 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=3 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=4 
(etc.)

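...So this is illuminating, but I've found it's not what I need, and it depends on my lame manual tweaking of the file. So I think I need to step back and parse the file as it's actually given to me, with incrementing numbers separating each "interesting" event: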

. . .
6       11:30:46  SQL Execute: select ABCID
from ABCWorker
where lower(loginid) = lower(user)
7       11:30:46  SQL Prepare: select r.roleid from abcrole r, abcworker w where lower(w.loginid)=lower(user)   and     
r.abcid=w.abcid   and r.status='A'
8       11:30:46  SQL Execute: select r.roleid from abcrole r, abcworker w where lower(w.loginid)=lower(user)   and     
r.abcid=w.abcid   and r.status='A'
9       11:30:46  SQL Execute: select Tier#, BenGrimm, PeakRate as Rate
from RageAnimalGreenBayPackers2
where RageAnimalGreenBayPackersID = :ID
  and Tier# = :Tier
  and FlyingVAccessType = 'D'
UNION
select Tier, BenGrimm, PeakRate as Rate
from CaliforniaCondorGreenBayPackers
where GreenBayPackersID = :ID
  and Tier = :Tier
order by BenGrimm
10      11:30:46  :ID(VARCHAR[0])=<NULL> 
:Tier(INTEGER)=1
11      11:30:46  SQL Execute: select Tier#, BenGrimm, PeakRate as Rate
from RageAnimalGreenBayPackers2
where RageAnimalGreenBayPackersID = :ID
  and Tier# = :Tier
  and FlyingVAccessType = 'D'
UNION
select Tier, BenGrimm, PeakRate as Rate
from CaliforniaCondorGreenBayPackers
where GreenBayPackersID = :ID
  and Tier = :Tier
order by BenGrimm
12      11:30:46  :ID(VARCHAR[0])=<NULL> 
:Tier(INTEGER)=2
. . .
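Because every event in this raw trace starts with a sequence number and a timestamp, the file can be split on those lines instead of on blank lines. A minimal sketch, assuming that only event-starting lines match "number, whitespace, hh:mm:ss, whitespace" and that continuation lines of a statement never do; `SplitEvents` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// An event line starts with: sequence number, whitespace, hh:mm:ss, whitespace, event text.
// ASSUMPTION: continuation lines of a multi-line statement never look like this.
var eventStart = new Regex(@"^(\d+)\s+(\d{1,2}:\d{2}:\d{2})\s+(.*)$");

// Hypothetical helper: group raw trace lines into (sequence number, time, text) events
List<(int Seq, string Time, string Text)> SplitEvents(IEnumerable<string> lines)
{
    var result = new List<(int Seq, string Time, string Text)>();
    bool inEvent = false;
    int seq = 0;
    string time = "";
    var text = new StringBuilder();

    foreach (string line in lines)
    {
        Match m = eventStart.Match(line);
        if (m.Success)
        {
            // A new event begins: store the previous one first
            if (inEvent) result.Add((seq, time, text.ToString().TrimEnd()));
            inEvent = true;
            seq = int.Parse(m.Groups[1].Value);
            time = m.Groups[2].Value;
            text.Clear().Append(m.Groups[3].Value);
        }
        else if (inEvent)
        {
            text.AppendLine().Append(line); // continuation line of the current event
        }
        // lines before the first event (such as ". . .") are ignored
    }
    if (inEvent) result.Add((seq, time, text.ToString().TrimEnd()));
    return result;
}

string[] sample =
{
    "9       11:30:46  SQL Execute: select Tier#, BenGrimm, PeakRate as Rate",
    "from RageAnimalGreenBayPackers2",
    "where RageAnimalGreenBayPackersID = :ID",
    "10      11:30:46  :ID(VARCHAR[0])=<NULL> ",
    ":Tier(INTEGER)=1",
};

var events = SplitEvents(sample);
foreach (var ev in events)
    Console.WriteLine($"#{ev.Seq} at {ev.Time}: {ev.Text.Replace(Environment.NewLine, " | ")}");
```

In the excerpt above, an event whose text starts with `SQL Execute:` is a statement, and an event whose text starts with `:` holds the arguments for the statement executed just before it; `SQL Prepare:` events can simply be skipped.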

score 1 · Accepted Answer

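What you really need is a lexer. Check out ANTLR - http://www.antlr.org/

You'll need to define your "grammar", i.e., what each element of your language (in this case, your SQL file) looks like. ANTLR then processes your file and spits out results according to your grammar definition.

It's just a tokenizing and parsing process.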
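ANTLR generates a lexer and parser from a grammar; for a format this small, a regular expression can play the lexer's role for the parameter lines. The sketch below is that lighter-weight substitute, not ANTLR: it assumes each argument looks like `:Name(TYPE)=value` and that values contain no whitespace. `Tokenize` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One match per argument of the form ":Name(TYPE)=value".
// ASSUMPTION: a value runs up to the next whitespace, so values never contain spaces.
var paramToken = new Regex(@":(?<name>\w+)\((?<type>[^)]*)\)=(?<value>\S*)");

// Hypothetical helper: turn a parameter block into (name, type, value) tokens
List<(string Name, string Type, string Value)> Tokenize(string paramText)
{
    var result = new List<(string Name, string Type, string Value)>();
    foreach (Match m in paramToken.Matches(paramText))
        result.Add((m.Groups["name"].Value, m.Groups["type"].Value, m.Groups["value"].Value));
    return result;
}

foreach (var (name, type, value) in Tokenize("-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=4"))
    Console.WriteLine($"{name} | {type} | {value}");
// ID | VARCHAR[0] | <NULL>
// Tier | INTEGER | 4
```

If the real log turns out to be less regular than this (quoted values containing spaces, for example), that is the point where a proper grammar starts to pay off.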

score 1 · Accepted Answer

Here's a concrete example of what I meant in my comment: you can do this quite simply by using a StreamReader and collecting each block into a list, e.g.:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

List<String> statementBlocks = new List<String>();

StringBuilder blockCollector = new StringBuilder();

//read the file a line at a time; "using" closes the reader even if something throws
using (StreamReader file = new StreamReader("C:\\temp\\annoying_text_file.sql"))
{
  string line;
  while((line = file.ReadLine()) != null)
  {
    //If the line has content, then we append it to our string builder
    if(!String.IsNullOrWhiteSpace(line)) //String.IsNullOrWhiteSpace is new in .NET 4; it also treats whitespace-only lines as blank
    {
        blockCollector.AppendLine(line);
    }
    else if(blockCollector.Length > 0)
    {
         //we've hit a blank line - dump the block to the list and reinitialize the StringBuilder
         statementBlocks.Add(blockCollector.ToString());
         blockCollector = new StringBuilder();
    }
  }
}

//Don't lose the last block if the file doesn't end with a blank line
if(blockCollector.Length > 0)
{
  statementBlocks.Add(blockCollector.ToString());
}

foreach(string statementBlock in statementBlocks)
{
  if(statementBlock.StartsWith("-->"))
  {
    //Code to split out the arguments; each one starts with ':', so you can just string.Split this block:
    //string[] paramsAndValues = statementBlock.Replace("-->", String.Empty).Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
    //then each string in here is paramName(DataType)=Value, which is also splittable.
  }
  else
  {
    //Do whatever you want with this valid block (including writing it to another file!)
    //To keep only the unique ones, store each block in a list and check whether it already exists each time; if it does, skip it.
    //Since you know the next block will be a parameter block, you can collect the parameters here too.
  }
}

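I can't check that this compiles right now, but it should give you a rough idea of one possible way to do what you want.

It assumes that the only blank lines are the ones between statement blocks.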
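The comment about keeping only the unique blocks and collecting the parameter block that follows each one can be sketched as a second pass over the list of blocks. `GroupBlocks` is a hypothetical helper; it assumes a parameter block always directly follows the query it belongs to, as in the question's sample, and the SQL in the example is simplified.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pair each SQL block with the "-->" block (if any) that directly follows it.
// Keys are the trimmed SQL text, so identical queries collapse into one entry.
Dictionary<string, List<string>> GroupBlocks(IList<string> blocks)
{
    var byQuery = new Dictionary<string, List<string>>();
    for (int i = 0; i < blocks.Count; i++)
    {
        string block = blocks[i].Trim();
        if (block.Length == 0 || block.StartsWith("-->", StringComparison.Ordinal))
            continue; // a parameter block with no query in front of it is skipped

        if (!byQuery.TryGetValue(block, out var argsList))
        {
            argsList = new List<string>();
            byQuery[block] = argsList;
        }
        // Look ahead: a following "-->" block holds this execution's arguments
        if (i + 1 < blocks.Count && blocks[i + 1].TrimStart().StartsWith("-->", StringComparison.Ordinal))
        {
            argsList.Add(blocks[i + 1].Trim());
            i++; // consume the parameter block
        }
    }
    return byQuery;
}

var sampleBlocks = new List<string>
{
    "select Tier from T where ID = :ID\n",
    "--> :ID(VARCHAR[0])=<NULL>\n:Tier(INTEGER)=1\n",
    "select Tier from T where ID = :ID\n",
    "--> :ID(VARCHAR[0])=<NULL>\n:Tier(INTEGER)=4\n",
    "select ABCID from ABCWorker\n",
};

foreach (var pair in GroupBlocks(sampleBlocks))
    Console.WriteLine($"{pair.Key}: {pair.Value.Count} argument set(s)");
```

Here the two identical `select Tier ...` blocks collapse into one key with two argument sets, and `select ABCID ...` gets an empty list.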

score 1 · Accepted Answer

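Assuming your queries are separated from each other by a blank line, you could try the following to parse your file. The code reads through the file until the end. Each call to parseQuery reads line by line until it finds a blank line, appending the lines together as your query. It then checks the next line: if it isn't the start of an args block, it saves the query without args and starts over, assuming it's at the start of another query. If the line is the start of an args block, the code reads until it hits another blank line, saves the query along with its args, and returns. The while(parseQuery) loop ensures the whole file is parsed.

In the end, the code outputs a dictionary with each query string as a key and a list of strings holding the different args supplied for it. Error checking is omitted for simplicity; in a real scenario you'd probably want to add handling for things like a nonexistent file.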

static IDictionary<string, List<string>> ParseFile(string path)
{
    Dictionary<string, List<string>> queries = new Dictionary<string, List<string>>();
    using (var reader = File.OpenText(path))
    {
        while (parseQuery(reader, queries)) { }
    }
    return queries;
}

private static bool parseQuery(StreamReader reader, Dictionary<string, List<string>> queries)
{
    StringBuilder sbQuery = new StringBuilder();
    StringBuilder sbArgs = new StringBuilder();
    // Read in query
    bool moreLines = ParseBlock(reader, sbQuery);
    if (moreLines)
    {
        while (moreLines)
        {
            string line = reader.ReadLine();
            // Check for the beginning of an args block.
            if (line != null && line.StartsWith("-->"))
            {
                // Read in args
                sbArgs.Append(line.Trim() + " "); // trailing space keeps it apart from the next args line
                moreLines = ParseBlock(reader, sbArgs);
                break;
            }
            // If this is not an args block, it is a new query
            // Save the last query and start over
            else
            {
                AddQuery(queries, sbQuery.ToString(), sbArgs.ToString());
                sbQuery = new StringBuilder();
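                if (!string.IsNullOrWhiteSpace(line))
                {
                    // Keep the first line of the next query, followed by a space as ParseBlock does;
                    // without the space it would be glued to the next line ("Rate" + "from" -> "Ratefrom")
                    sbQuery.Append(line.Trim() + " ");
                }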
                moreLines = ParseBlock(reader, sbQuery);
            }
        }
    }
    AddQuery(queries, sbQuery.ToString(), sbArgs.ToString());
    return moreLines;
}

private static bool ParseBlock(StreamReader reader, StringBuilder builder)
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        line = line.Trim();
        if (string.IsNullOrWhiteSpace(line)) break;

        builder.Append(line + " ");
    }
    return line != null;
}

private static void AddQuery(Dictionary<string, List<string>> queries, string query, string args)
{
    if (query.Length > 0)
    {
        List<string> lstParams;
        if (!queries.TryGetValue(query, out lstParams))
        {
            lstParams = new List<string>();
        }
        lstParams.Add(args);
        queries[query] = lstParams;
    }
}
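Identical statements only become one dictionary key if their text matches exactly, so a query that was logged with different line breaks or spacing would still show up as a separate entry. A minimal sketch of a post-processing step for the dictionary returned by `ParseFile`, assuming that collapsing whitespace never changes a query's meaning (it would also collapse whitespace inside string literals); `NormalizeSql` and `MergeByNormalizedQuery` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Collapse every run of whitespace to a single space so that the same statement
// logged with different line breaks or indentation produces the same key.
string NormalizeSql(string sql) => Regex.Replace(sql, @"\s+", " ").Trim();

// Hypothetical helper: merge query -> argument-list entries whose keys normalize to the same text
Dictionary<string, List<string>> MergeByNormalizedQuery(IDictionary<string, List<string>> queries)
{
    var merged = new Dictionary<string, List<string>>();
    foreach (var pair in queries)
    {
        string key = NormalizeSql(pair.Key);
        if (!merged.TryGetValue(key, out var argLists))
        {
            argLists = new List<string>();
            merged[key] = argLists;
        }
        argLists.AddRange(pair.Value);
    }
    return merged;
}

var parsed = new Dictionary<string, List<string>>
{
    ["select Tier from T\nwhere ID = :ID "] = new List<string> { "--> :Tier(INTEGER)=1" },
    ["select Tier  from T where ID = :ID"] = new List<string> { "--> :Tier(INTEGER)=4" },
};

foreach (var entry in MergeByNormalizedQuery(parsed))
    Console.WriteLine($"{entry.Key} -> {string.Join("; ", entry.Value)}");
```

Both spellings end up under `select Tier from T where ID = :ID` with two argument sets. Keeping the normalization in one small function makes it easy to tighten later (for example, to leave quoted literals untouched).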
