
(I didn't realize long posts were frowned upon)

My program reads a directory of files whose names look like the following:

BkUpSalesReportJan2011(txt).zip
BkUpSalesReportJan2011(pdf).zip
BkUpSalesReportJan2011(doc).zip
BkUpSalesReportFeb2011(txt).zip
BkUpSalesReportMar2011(doc).zip
BkUpSalesReportMar2011(pdf).zip
...and so on, for a few hundred more files.

I want to keep only one copy of each report, chosen by file type in order of priority: keep the PDF and delete the other copies; if there is no PDF, keep the DOC; and only if neither exists, keep the TXT.

What is the best way to implement the sorting and deleting using Visual C# and Windows Forms?


1 Answer


You can use a Regex to parse the data out of the file names, and LINQ to get the duplicate or the distinct records.

POCO:

public class FileData
{
    public string Original { get; set; }
    public string Type { get; set; }
    public string Name { get; set; }

    public int Weight { get { return GetWeight(Type); } }

    private static int GetWeight(string option)
    {
        // This will put the files in order by pdf, doc, txt, etc
        switch(option)
        {
            case "pdf":
                return 0;
            case "doc":
                return 1;
            case "txt":
                return 2;
            default:
                return 3;
        }
    }
}

You will need a weight function, because by default OrderBy sorts alphabetically. This way you can specify which file types are more important.
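For illustration: sorting the type strings alphabetically gives doc, pdf, txt, so ordering by `Type` directly would keep the DOC rather than the PDF. Below is a minimal sketch of a table-driven alternative to `GetWeight`, assuming the same pdf > doc > txt priority. The `TypePriority` class and its members are hypothetical names, and the case-insensitive comparison (so `PDF` ranks the same as `pdf`) is an addition that the answer does not have.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class TypePriority
{
    // Earlier entries win: pdf, then doc, then txt.
    public static readonly string[] Priority = { "pdf", "doc", "txt" };

    // Lower weight means higher priority. Matching ignores case, and
    // unknown types get Priority.Length so they sort after all known ones.
    public static int Weight(string type)
    {
        int index = Array.FindIndex(Priority,
            p => string.Equals(p, type, StringComparison.OrdinalIgnoreCase));
        return index >= 0 ? index : Priority.Length;
    }

    // Sorts file types by priority instead of alphabetically.
    public static string[] ByPriority(string[] types)
    {
        return types.OrderBy(t => Weight(t)).ToArray();
    }
}
```

With this layout, supporting another file type only means editing the `Priority` array instead of adding a `case` to the `switch`.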

Code:

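// Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
// You can substitute this array with Directory.GetFiles,
// e.g. var files = Directory.GetFiles("Path/To/Files", "*.zip");
// (GetFiles returns full paths; they still group correctly because
// every file in one directory shares the same path prefix.)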
var files = new []
{
    "BkUpSalesReportJan2011(txt).zip",
    "BkUpSalesReportJan2011(pdf).zip",
    "BkUpSalesReportJan2011(doc).zip",
    "BkUpSalesReportFeb2011(txt).zip",
    "BkUpSalesReportMar2011(doc).zip",
    "BkUpSalesReportMar2011(pdf).zip"
};

var pattern = @"^(?<FileName>.+)\((?<FileType>\w+)\)\.zip$";
// ^ Anchor the match at the start of the string
// (?<FileName>.+) Match the first part in a named group
// \( Match the opening parenthesis before the file type
// (?<FileType>\w+) Match the txt/pdf/doc/whatever in a named group
// \) Match the closing parenthesis
// \.zip$ Match a period followed by "zip" at the end of the string

var matchedFiles = files.Select(f => Regex.Match(f, pattern))
                        .Where(m => m.Success)
                        .Select(m => new FileData
                        {
                            Type = m.Groups["FileType"].Value,
                            Name = m.Groups["FileName"].Value,
                            Original = m.Value
                        })
                        .ToList();

// Group all the files by name, e.g. BkUpSalesReportJan2011
// Order each group by weight and take the first (highest-priority) record
// Take its original file name: these are the files to keep
var distinct = matchedFiles.GroupBy(f => f.Name)
                           .Select(g => g.OrderBy(f => f.Weight).First())
                           .Select(f => f.Original);

// Group all the files by name, e.g. BkUpSalesReportJan2011
// Order each group by weight and skip the first (highest-priority) record
// The result is still a nested IEnumerable, so flatten it with SelectMany
// Take the original file names: these are the duplicates to delete
var duplicates = matchedFiles.GroupBy(f => f.Name)
                             .Select(g => g.OrderBy(f => f.Weight).Skip(1))
                             .SelectMany(g => g)
                             .Select(f => f.Original);
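The answer stops once it has the list of duplicates, while the question also asks how to delete them. The sketch below is one way to finish the job, assuming all reports sit in a single directory and that deleting the lower-priority copies outright is acceptable. `DuplicateReports`, `FindDuplicates`, `DeleteDuplicates` and the `dryRun` parameter are hypothetical names. It reuses the answer's pattern, anchored with `^`/`$`, matched case-insensitively and applied to `Path.GetFileName` so the full paths returned by `Directory.GetFiles` work. It also groups only once, calling `SelectMany` with `Skip(1)` directly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DuplicateReports
{
    // The answer's pattern, anchored to the whole file name; IgnoreCase also accepts ".ZIP".
    private static readonly Regex Pattern = new Regex(
        @"^(?<FileName>.+)\((?<FileType>\w+)\)\.zip$", RegexOptions.IgnoreCase);

    // Earlier entries win: pdf, then doc, then txt; unknown types sort last.
    private static readonly string[] Priority = { "pdf", "doc", "txt" };

    private static int Weight(string type)
    {
        int index = Array.FindIndex(Priority,
            p => string.Equals(p, type, StringComparison.OrdinalIgnoreCase));
        return index >= 0 ? index : Priority.Length;
    }

    // Returns every matching path except the highest-priority one per report name.
    // Paths whose file name does not fully match the pattern are ignored.
    public static List<string> FindDuplicates(IEnumerable<string> paths)
    {
        return paths
            .Select(p => new { FullPath = p, Parsed = Pattern.Match(Path.GetFileName(p)) })
            .Where(x => x.Parsed.Success)
            .GroupBy(x => x.Parsed.Groups["FileName"].Value)
            .SelectMany(g => g.OrderBy(x => Weight(x.Parsed.Groups["FileType"].Value))
                              .Skip(1))
            .Select(x => x.FullPath)
            .ToList();
    }

    // Finds the duplicates in one directory. With dryRun = true nothing is deleted;
    // either way the list of (to be) deleted paths is returned.
    public static List<string> DeleteDuplicates(string directory, bool dryRun)
    {
        // The regex re-checks every name, so extra hits from "*.zip" are skipped.
        List<string> toDelete = FindDuplicates(Directory.GetFiles(directory, "*.zip"));
        if (!dryRun)
        {
            foreach (string path in toDelete)
            {
                File.Delete(path);
            }
        }
        return toDelete;
    }
}
```

In a Windows Forms app, one option is to call `DeleteDuplicates(folder, true)` first, show the returned list to the user (for example in a `ListBox`), and call it again with `dryRun: false` only after they confirm. Here `folder` could come from a `FolderBrowserDialog`'s `SelectedPath`.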

See also:

Directory.GetFiles

answered 2013-08-07T14:00:45.437