
I'm new to both XML and C#. I'm trying to find an efficient way to parse a given XML file and retrieve the relevant numerical values based on the "proj_title" value (heat_run or any other possible value). For example, I'd like to calculate the duration of a particular test run (proj_end value minus proj_start value).

ex.xml:

<proj ID="2">
      <proj_title>heat_run</proj_title>
      <proj_start>100</proj_start>
      <proj_end>200</proj_end>
</proj>

... We can't search by proj ID, since this value is not fixed from test run to test run. The file above is huge: ~8 MB, with ~2000 tags named proj_title. Is there an efficient way to first find all tags with proj_title="heat_run", and then retrieve the proj_start and proj_end values for that proj_title using C#?

Here's my current C# code:

using System.Xml;

public class parser
{
    public static void Main()
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load("ex.xml");

        // ~2000 tags with the name proj_title.
        // Is there a more efficient way to look only for proj_title="heat_run"?
        XmlNodeList heat_run_nodes = xmlDoc.GetElementsByTagName("proj_title");
    }
}

3 Answers


By modern standards, 8 MB really isn't very large. Personally, I'd use LINQ to XML:

XDocument doc = XDocument.Load("ex.xml");
var projects = doc.Descendants("proj_title")
                  .Where(x => (string) x == "heat_run")
                  .Select(x => x.Parent) // Just for simplicity
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });

foreach (var project in projects)
{
    Console.WriteLine("Start: {0}; End: {1}", project.Start, project.End);
}

(Obviously, adapt this to your own requirements; it's not clear from the question exactly what you need to do.)

Alternative query:

var projects = doc.Descendants("proj")
                  .Where(x => (string) x.Element("proj_title") == "heat_run")
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });
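
Since the question asks for the duration of a run, the alternative query above could be adapted along these lines. This is only a sketch: the `DurationSketch` class, the `Durations` helper, and the `<projects>` wrapper element are hypothetical, because the question shows only a single `<proj>` element and not the rest of the file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DurationSketch
{
    // Hypothetical sample data: the <projects> wrapper and the second
    // <proj> entry are assumptions made only for this illustration.
    public const string SampleXml = @"<projects>
  <proj ID=""2"">
    <proj_title>heat_run</proj_title>
    <proj_start>100</proj_start>
    <proj_end>200</proj_end>
  </proj>
  <proj ID=""3"">
    <proj_title>cold_run</proj_title>
    <proj_start>50</proj_start>
    <proj_end>75</proj_end>
  </proj>
</projects>";

    // Returns proj_end - proj_start for every <proj> whose title matches.
    public static List<int> Durations(XDocument doc, string title)
    {
        return doc.Descendants("proj")
                  .Where(p => (string) p.Element("proj_title") == title)
                  .Select(p => (int) p.Element("proj_end") - (int) p.Element("proj_start"))
                  .ToList();
    }

    public static void Main()
    {
        // For the real file, XDocument.Load("ex.xml") would replace Parse.
        XDocument doc = XDocument.Parse(SampleXml);
        foreach (int duration in Durations(doc, "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration); // prints "Duration: 100"
        }
    }
}
```

Passing the title as a parameter covers the "any other possible value" case from the question without changing the query.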

Answered 2013-06-03T16:55:19.927

You can use XPath to find all matching nodes, for example:

XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='heat_run']");

matches will contain all proj nodes that match the criteria. Learn more about XPath: http://www.w3schools.com/xsl/xpath_syntax.asp

MSDN documentation on SelectNodes
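
As a rough sketch of how the XPath approach might be carried through to the duration the question asks about: the `XPathSketch` class, the `Durations` helper, and the `<projects>` wrapper element below are hypothetical. The expression uses `//proj` so that `proj` elements are found at any depth; a bare `proj` evaluated from the document node would match only a root element with that name.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathSketch
{
    // Hypothetical sample data; the <projects> wrapper is an assumption.
    public const string SampleXml =
        "<projects>" +
        "<proj ID=\"2\"><proj_title>heat_run</proj_title>" +
        "<proj_start>100</proj_start><proj_end>200</proj_end></proj>" +
        "<proj ID=\"3\"><proj_title>cold_run</proj_title>" +
        "<proj_start>50</proj_start><proj_end>75</proj_end></proj>" +
        "</projects>";

    // Returns proj_end - proj_start for every matching <proj>.
    // Assumes the title contains no single quote, since it is spliced
    // directly into the XPath string literal.
    public static List<int> Durations(XmlDocument xmlDoc, string title)
    {
        var result = new List<int>();
        XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='" + title + "']");
        foreach (XmlNode proj in matches)
        {
            int start = XmlConvert.ToInt32(proj.SelectSingleNode("proj_start").InnerText);
            int end = XmlConvert.ToInt32(proj.SelectSingleNode("proj_end").InnerText);
            result.Add(end - start);
        }
        return result;
    }

    public static void Main()
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(SampleXml); // for the real file: xmlDoc.Load("ex.xml")
        foreach (int duration in Durations(xmlDoc, "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration); // prints "Duration: 100"
        }
    }
}
```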

Answered 2013-06-03T16:55:28.200

Use XDocument and the LINQ API. http://msdn.microsoft.com/en-us/library/bb387098.aspx

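If, after trying that, the performance isn't what you expect, you'll have to look for a SAX parser. A SAX parser doesn't load the whole document into memory and then try to apply an XPath expression to everything in memory. It works in an event-driven fashion instead, which in some cases can be faster and uses far less memory.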

There are probably SAX parsers for .NET out there; I haven't used them with .NET myself, but I have used them with C++.
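
For what it's worth, .NET ships `System.Xml.XmlReader`, a forward-only reader that also avoids loading the whole document. It is a pull-style API rather than a callback-based SAX parser, so the sketch below is a swapped-in technique and not the SAX approach itself. The `StreamingSketch` class, the `Durations` helper, and the `<projects>` wrapper element are hypothetical, and the code assumes each value arrives as a single text node.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingSketch
{
    // Hypothetical sample data; the <projects> wrapper is an assumption.
    public const string SampleXml =
        "<projects>" +
        "<proj ID=\"2\"><proj_title>heat_run</proj_title>" +
        "<proj_start>100</proj_start><proj_end>200</proj_end></proj>" +
        "<proj ID=\"3\"><proj_title>cold_run</proj_title>" +
        "<proj_start>50</proj_start><proj_end>75</proj_end></proj>" +
        "</projects>";

    // Streams through the input once, keeping only the fields of the
    // <proj> element currently being read.
    public static List<int> Durations(TextReader input, string title)
    {
        var result = new List<int>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            string currentElement = null;
            string currentTitle = null;
            int start = 0, end = 0;

            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        currentElement = reader.Name;
                        if (currentElement == "proj")
                        {
                            // A new project begins: reset the collected fields.
                            currentTitle = null;
                            start = 0;
                            end = 0;
                        }
                        break;

                    case XmlNodeType.Text:
                        if (currentElement == "proj_title") currentTitle = reader.Value;
                        else if (currentElement == "proj_start") start = XmlConvert.ToInt32(reader.Value);
                        else if (currentElement == "proj_end") end = XmlConvert.ToInt32(reader.Value);
                        break;

                    case XmlNodeType.EndElement:
                        if (reader.Name == "proj" && currentTitle == title)
                        {
                            result.Add(end - start);
                        }
                        currentElement = null;
                        break;
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // For the real file, new StreamReader("ex.xml") would replace StringReader.
        foreach (int duration in Durations(new StringReader(SampleXml), "heat_run"))
        {
            Console.WriteLine("Duration: {0}", duration); // prints "Duration: 100"
        }
    }
}
```

For an 8 MB file the in-memory approaches are likely to be fine; a streaming reader mainly pays off when the document is too large to hold comfortably in memory.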

Answered 2013-06-03T16:54:35.267