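I'm building a tool in C# that walks through a large directory of files and extracts certain information. The directory is organized by language (LCID), so I want to use multithreading to go through it: one thread per language folder.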


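I set up a thread inside the loop that gets the LCID folders, but I got the following error: "No overload for 'HBscan' matches delegate 'System.Threading.ThreadStart'". Based on what I read online, I then put my method in a class so that I could pass parameters. Now there is no error, but the code doesn't traverse the files correctly: it is leaving files out of the scan.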


using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;
using System.Xml;

public static void Main(string[] args)
{
    //change rootDirectory variable to point to directory which you wish to scan through
    string rootDirectory = @"C:\sample";
    DirectoryInfo dir = new DirectoryInfo(rootDirectory);

    //get the LCIDs from the folders
    string[] filePaths = Directory.GetDirectories(rootDirectory);
    for (int i = 0; i < filePaths.Length; i++)
    {
        string LCID = filePaths[i].Split('\\').Last();

        HBScanner scanner = new HBScanner(new DirectoryInfo(filePaths[i]));
        Thread t1 = new Thread(new ThreadStart(scanner.HBscan));
        t1.Start();
    }

    Console.WriteLine("Scanning through files...");
}

public class HBScanner
{
    private DirectoryInfo DirectoryToScan { get; set; }

    public HBScanner(DirectoryInfo startDir)
    {
        DirectoryToScan = startDir;
    }

    //parameterless overload so the method matches the ThreadStart delegate
    public void HBscan()
    {
        HBscan(DirectoryToScan);
    }

    public static void HBscan(DirectoryInfo directoryToScan)
    {
        //create an array of files using FileInfo object
        FileInfo[] files;
        //get all files for the current directory
        files = directoryToScan.GetFiles("*.*");
        string asset = "";
        string lcid = "";

        //iterate through the directory and get file details
        foreach (FileInfo file in files)
        {
            String name = file.Name;
            DateTime lastModified = file.LastWriteTime;
            String path = file.FullName;

            //first check the file name for asset id using regular expression
            Regex regEx = new Regex(@"([A-Z][A-Z][0-9]{8,10})\.");
            asset = regEx.Match(file.Name).Groups[1].Value.ToString();

            //get LCID from the file path using regular expression
            Regex LCIDregEx = new Regex(@"sample\\(\d{4,5})");
            lcid = LCIDregEx.Match(file.FullName).Groups[1].Value.ToString();

            //if it can't find it from filename, it looks into xml
            if (file.Extension == ".xml" && asset == "")
            {
                System.Diagnostics.Debug.WriteLine("File is an .XML");
                System.Diagnostics.Debug.WriteLine("file.FullName is: " + file.FullName);
                XmlDocument xmlDoc = new XmlDocument();
                //load XML file in
                xmlDoc.Load(file.FullName);

                //check for <assetid> element
                XmlNode assetIDNode = xmlDoc.GetElementsByTagName("assetid")[0];
                //check for <Asset> element
                XmlNode AssetIdNodeWithAttribute = xmlDoc.GetElementsByTagName("Asset")[0];

                //if there is an <assetid> element
                if (assetIDNode != null)
                {
                    asset = assetIDNode.InnerText;
                }
                else if (AssetIdNodeWithAttribute != null) //if there is an <asset> element, see if it has an AssetID attribute
                {
                    //get the attribute
                    asset = AssetIdNodeWithAttribute.Attributes["AssetId"].Value;

                    if (AssetIdNodeWithAttribute.Attributes != null)
                    {
                        var attributeTest = AssetIdNodeWithAttribute.Attributes["AssetId"];
                        if (attributeTest != null)
                        {
                            asset = attributeTest.Value;
                        }
                    }
                }
            }

            Item newFile = new Item
            {
                AssetID = asset,
                LCID = lcid,
                LastModifiedDate = lastModified,
                Path = path,
                FileName = name
            };
        }

        //get sub-folders for the current directory
        DirectoryInfo[] dirs = directoryToScan.GetDirectories("*.*");
        foreach (DirectoryInfo dir in dirs)
        {
            HBscan(dir);
        }
    }
}
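For context on the compile error quoted above: ThreadStart is a delegate for a method with no parameters and no return value, so a method that takes an argument (like the static HBscan(DirectoryInfo)) cannot be handed to new Thread(...) directly. The minimal sketch below shows two standard ways around that. ThreadStartSketch and its Scan method are hypothetical stand-ins, not the question's code.

```csharp
using System;
using System.Threading;

public static class ThreadStartSketch
{
    // Hypothetical stand-in for HBscan(DirectoryInfo): because it takes a parameter,
    // it cannot be converted to ThreadStart (void, no parameters).
    static string Scan(string folder) => "scanned " + folder;

    // Option 1: a parameterless lambda matches ThreadStart and captures the argument.
    public static string RunWithLambda(string folder)
    {
        string result = null;
        var t = new Thread(() => { result = Scan(folder); });
        t.Start();
        t.Join(); // wait for the thread so 'result' has been written
        return result;
    }

    // Option 2: ParameterizedThreadStart receives a single object, supplied to Start().
    public static string RunWithParameter(string folder)
    {
        string result = null;
        var t = new Thread(new ParameterizedThreadStart(state => { result = Scan((string)state); }));
        t.Start(folder);
        t.Join();
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(RunWithLambda("1033"));    // scanned 1033
        Console.WriteLine(RunWithParameter("1036")); // scanned 1036
    }
}
```

The lambda form is usually preferred because it keeps the argument strongly typed; ParameterizedThreadStart forces a cast from object.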

4 Answers



This code creates one scanner per thread and runs its HBscan method.

public static void Main(string[] args)
{
    //change rootDirectory variable to point to directory which you wish to scan through
    string rootDirectory = @"C:\sample";
    DirectoryInfo dir = new DirectoryInfo(rootDirectory);

    //get the LCIDs from the folders
    string[] filePaths = Directory.GetDirectories(rootDirectory);
    for (int i = 0; i < filePaths.Length; i++)
    {
        string LCID = filePaths[i].Split('\\').Last();

        //copy the path into a local: the lambda must not capture the loop variable i,
        //which keeps changing after the thread has been created
        string folder = filePaths[i];
        Thread t1 = new Thread(() => new HBScanner(new DirectoryInfo(folder)).HBscan());
        t1.Start();
    }

    Console.WriteLine("Scanning through files...");
}

//HBScanner class: unchanged from the question
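A minimal, self-contained sketch of the thread-per-folder pattern, with folder names standing in for real directories (ThreadPerFolderSketch and RunAll are hypothetical names, and adding the name to a bag stands in for the real scan). Two details matter when the thread is started from a lambda inside a for loop: the lambda should capture a per-iteration local rather than i itself, because all iterations share one i and a thread may read it only after the loop has moved on; and calling Join on each thread is what tells the caller the scan has actually finished.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ThreadPerFolderSketch
{
    // Starts one thread per folder and waits for all of them to finish.
    public static string[] RunAll(string[] folders)
    {
        var visited = new ConcurrentBag<string>();
        var threads = new List<Thread>();
        for (int i = 0; i < folders.Length; i++)
        {
            // Per-iteration copy. Capturing i directly would let each thread read
            // whatever value i holds when it runs (possibly folders.Length).
            string folder = folders[i];
            var t = new Thread(() => visited.Add(folder)); // stand-in for the real scan
            threads.Add(t);
            t.Start();
        }
        foreach (Thread t in threads)
            t.Join(); // block until every folder has been processed

        string[] result = visited.ToArray();
        Array.Sort(result, StringComparer.Ordinal);
        return result;
    }

    public static void Main()
    {
        string[] done = RunAll(new[] { "1033", "1036", "1041" });
        Console.WriteLine(string.Join(",", done)); // 1033,1036,1041
    }
}
```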
answered 2013-01-25T11:06:41.127

If you are on .NET 4.0, you can use the TPL; with Parallel.For/Parallel.ForEach it is fairly easy to process several items at the same time.

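I only came across it a few days ago, and it has been fun to work with. It speeds up your work by running it on multiple threads across different cores, which can give you much better performance. Of course, in your case the gain may be limited, because the work involves a lot of disk I/O.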
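A minimal sketch of the Parallel.ForEach approach, assuming .NET 4.0 or later and C# 6 or later (for string interpolation). ParallelForEachSketch, ScanAll, BuildSampleTree and the temp-folder layout are hypothetical; counting the files under each LCID folder stands in for the question's per-file checks. Unlike starting Thread objects by hand, Parallel.ForEach returns only after every iteration has completed, and the scheduler decides how many folders are processed at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Counts the files under each top-level (LCID) folder of 'root', one folder per iteration.
    public static ConcurrentDictionary<string, int> ScanAll(string root)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(Directory.GetDirectories(root), folder =>
        {
            int n = Directory.GetFiles(folder, "*", SearchOption.AllDirectories).Length;
            counts[Path.GetFileName(folder)] = n; // safe: ConcurrentDictionary handles concurrent writes
        });
        return counts; // Parallel.ForEach has already waited for every iteration
    }

    // Hypothetical sample layout: root/<lcid>/a.txt and root/<lcid>/sub/b.txt
    public static string BuildSampleTree()
    {
        string root = Path.Combine(Path.GetTempPath(), "lcid-sample-" + Guid.NewGuid().ToString("N"));
        foreach (string lcid in new[] { "1033", "1036", "1041" })
        {
            Directory.CreateDirectory(Path.Combine(root, lcid, "sub"));
            File.WriteAllText(Path.Combine(root, lcid, "a.txt"), "x");
            File.WriteAllText(Path.Combine(root, lcid, "sub", "b.txt"), "y");
        }
        return root;
    }

    public static void Main()
    {
        string root = BuildSampleTree();
        foreach (var entry in ScanAll(root).OrderBy(p => p.Key))
            Console.WriteLine($"{entry.Key}: {entry.Value}"); // prints "1033: 2", "1036: 2", "1041: 2"
        Directory.Delete(root, recursive: true);
    }
}
```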


answered 2013-01-25T11:11:17.227


public static void Main(string[] args)
{
    const string rootDirectory = @"C:\sample";

    Directory.EnumerateDirectories(rootDirectory)
        .AsParallel()
        .ForAll(f => HBScanner.HBscan(new DirectoryInfo(f)));
}

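After all, the only reason you get the LCID in the loop body is to write it to the console. If you want to keep the console output, you can do it like this: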

public static void Main(string[] args)
{
    const string rootDirectory = @"C:\sample";

    Console.WriteLine("Scanning through files...");

    Directory.EnumerateDirectories(rootDirectory)
        .AsParallel()
        .ForAll(f =>
            {
                var lcid = f.Split('\\').Last();
                Console.WriteLine(lcid);

                HBScanner.HBscan(new DirectoryInfo(f));
            });
}

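Note that EnumerateDirectories should be preferred over GetDirectories because it is lazily evaluated, so your processing can start as soon as the first directory is found. You don't have to wait for all the directories to be loaded into a list.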
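A runnable sketch of the PLINQ version with Directory.EnumerateDirectories, building a small throwaway tree under the temp folder. PlinqScanSketch, CountXmlPerFolder, BuildSampleTree and the layout are hypothetical, and counting .xml files stands in for HBScanner.HBscan. EnumerateDirectories returns an IEnumerable&lt;string&gt; that yields paths as they are found, whereas GetDirectories builds the complete string[] before returning; ForAll blocks until every element has been processed.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;

public static class PlinqScanSketch
{
    // Streams the top-level folders lazily into PLINQ; each folder is handled on a pool thread.
    public static ConcurrentDictionary<string, int> CountXmlPerFolder(string root)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Directory.EnumerateDirectories(root)      // produced on demand, not as an array
            .AsParallel()
            .ForAll(folder => counts[Path.GetFileName(folder)] =
                Directory.EnumerateFiles(folder, "*.xml", SearchOption.AllDirectories).Count());
        return counts;                             // ForAll has returned: all folders are done
    }

    // Hypothetical sample layout: root/<lcid>/a.xml, root/<lcid>/sub/b.xml, root/<lcid>/c.txt
    public static string BuildSampleTree()
    {
        string root = Path.Combine(Path.GetTempPath(), "plinq-sample-" + Guid.NewGuid().ToString("N"));
        foreach (string lcid in new[] { "1033", "1036" })
        {
            Directory.CreateDirectory(Path.Combine(root, lcid, "sub"));
            File.WriteAllText(Path.Combine(root, lcid, "a.xml"), "<x/>");
            File.WriteAllText(Path.Combine(root, lcid, "sub", "b.xml"), "<y/>");
            File.WriteAllText(Path.Combine(root, lcid, "c.txt"), "z");
        }
        return root;
    }

    public static void Main()
    {
        string root = BuildSampleTree();
        foreach (var entry in CountXmlPerFolder(root).OrderBy(p => p.Key))
            Console.WriteLine($"{entry.Key}: {entry.Value}"); // prints "1033: 2", then "1036: 2"
        Directory.Delete(root, recursive: true);
    }
}
```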

answered 2013-01-25T11:30:14.423

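Your task can be improved considerably by using BlockingCollection: http://msdn.microsoft.com/en-us/library/dd267312.aspx

The overall structure is this: you create one thread (or do it on the main thread) that enumerates the files and adds them to the BlockingCollection. Simply enumerating files should be fairly fast, so this thread should finish well before the worker threads.

Then you create a number of tasks (as many as Environment.ProcessorCount is a good choice). These tasks should work like the first example in the docs (collection.Take()). Each task should check one individual file at a time.

So one thread finds file names and puts them into the BlockingCollection, while the other threads check file contents in parallel. This gives you better parallelism, because creating one thread per folder can distribute the work unevenly (you don't know how many files are in each folder, right?).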
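A minimal sketch of that producer/consumer layout, assuming .NET 4.5 or later (for Task.Run). ProducerConsumerSketch, CountMatches, BuildSampleTree and the sample layout are hypothetical, and the isMatch callback stands in for the real per-file regex/XML checks. Instead of the explicit Take() loop from the docs' first example, the consumers use GetConsumingEnumerable(); both keep removing items until CompleteAdding has been called and the collection is empty, but the enumerable form needs no IsCompleted check or exception handling.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // One producer enumerates file paths; Environment.ProcessorCount consumers check them.
    public static int CountMatches(string root, Func<string, bool> isMatch)
    {
        int matches = 0;
        using (var queue = new BlockingCollection<string>())
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    foreach (string path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
                        queue.Add(path);
                }
                finally
                {
                    queue.CompleteAdding(); // tells the consumers no more items are coming
                }
            });

            Task[] consumers = Enumerable.Range(0, Environment.ProcessorCount)
                .Select(_ => Task.Run(() =>
                {
                    // Ends once CompleteAdding has been called and the queue is empty.
                    foreach (string path in queue.GetConsumingEnumerable())
                        if (isMatch(path))
                            Interlocked.Increment(ref matches);
                }))
                .ToArray();

            // Wait for every task before the queue is disposed.
            Task.WaitAll(consumers.Concat(new[] { producer }).ToArray());
        }
        return matches;
    }

    // Hypothetical sample layout: root/<lcid>/a.xml and root/<lcid>/b.txt for three LCIDs.
    public static string BuildSampleTree()
    {
        string root = Path.Combine(Path.GetTempPath(), "bc-sample-" + Guid.NewGuid().ToString("N"));
        foreach (string lcid in new[] { "1033", "1036", "1041" })
        {
            Directory.CreateDirectory(Path.Combine(root, lcid));
            File.WriteAllText(Path.Combine(root, lcid, "a.xml"), "<x/>");
            File.WriteAllText(Path.Combine(root, lcid, "b.txt"), "y");
        }
        return root;
    }

    public static void Main()
    {
        string root = BuildSampleTree();
        int xmlFiles = CountMatches(root, p => p.EndsWith(".xml", StringComparison.OrdinalIgnoreCase));
        Console.WriteLine(xmlFiles); // 3
        Directory.Delete(root, recursive: true);
    }
}
```

Because every consumer pulls the next file as soon as it is free, a folder with thousands of files no longer ties up a single thread while the others sit idle.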

answered 2013-01-25T11:24:06.527