c# - Writing to multiple files in C# without constantly closing/reopening streams. StreamWriter?

Question

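I'm trying to read a table from an Access database and then sort the data in that table out into a number of text files. The catch is that the filename to write to depends on values in each record. This is officially my first C# application, so you can consider me "green". I should also mention that I'm only working off an Access database until I can get the code hammered out; ultimately it will pull from a SQL Server with millions of records.

I have the code working now, but the problem is that there are a ton of file open/close operations. I want to open each file for writing only once, because these files are written to a network drive. This is essentially a glue app running on a server, so there are some other restrictions too: I can't save to a local drive and then copy to the network, I can't sort the query prior to pulling, and I can't adversely affect server resources while running.

Probably the best way to do this is with a hash table: check whether the file has already been opened, and if not, open it and save the file handle in the hash table. Then close them all at once when finished. However, I cannot find an example of how to use multiple StreamWriter objects simultaneously.

I expected to find the answer to this relatively easily, but I can't seem to find a solution. My suspicion is that StreamWriter is the wrong class to be using for this.

The closest previous question I could find is from a CodeProject page. On that page they say that the practice of keeping file handles open is bad and should be avoided, but the page doesn't explain why, nor does it offer example alternatives. There is a suggestion to load the entire data set into memory and then operate on it, but that's not an option for me because there will be too much data in the tables.

Here's what I have so far.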

String strConnection;
String strQuery;
String strPunchFileNameTemplate;

// Define our Variables
strConnection = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=ClockData.accdb";
strQuery = @"SELECT * FROM ClockPunches";   
strPunchFileNameTemplate = @"C:\PUNCHES\%READER%.TXT";      

// OleDbConnection implements the IDisposable interface, so we must scope out its usage.
// Set up Connection to our data source
using (OleDbConnection ConnObj = new OleDbConnection(strConnection))    {

    // Create a Command with our Query String
    OleDbCommand CmdObj = new OleDbCommand(strQuery,ConnObj);

    // Open our Connection
    ConnObj.Open();

    // OleDbDataReader implements the IDisposable interface, so we must scope out its usage.
    // Execute our Reader
    using (OleDbDataReader ReaderObj = CmdObj.ExecuteReader(CommandBehavior.KeyInfo))   {

        // Load the source table's schema into memory (a DataTable object)
        DataTable TableObj = ReaderObj.GetSchemaTable();

        // Parse through each record in the Reader Object
        while(ReaderObj.Read()) {

            // Extract PunchTime, CardNumber, and Device to separate variables
            DateTime dtTime = ReaderObj.GetDateTime(ReaderObj.GetOrdinal("PunchTime"));
            Int16 intID = ReaderObj.GetInt16(ReaderObj.GetOrdinal("CardNumber"));
            String strReader = ReaderObj.GetString(ReaderObj.GetOrdinal("Device"));

            // Translate the device name into a designated filename (external function)
            strReader = GetDeviceFileName(strReader);

            // Put our dynamic filename into the path template
            String pathStr = strPunchFileNameTemplate.Replace("%READER%",strReader);

            // Check to see if the file exists.  New files need an import Header
            Boolean FileExistedBool = File.Exists(pathStr);

            // StreamWriter implements the IDisposable interface, so we must scope out its usage.
            // Create a Text File for each Device, Append if it exists
            using (StreamWriter outSR = new StreamWriter(pathStr, true))    {

                // Write our Header if required
                if (FileExistedBool == false)   {
                    outSR.WriteLine("EXAMPLE FILE HEADER");
                }

                // Set up our string we wish to write to the file
                String outputStr = dtTime.ToString("MM-dd-yyyy HH:mm:ss") + " " + intID.ToString("000000");

                // Write the String
                outSR.WriteLine(outputStr);

                // End of StreamWriter Scope - should automatically close
            }
        }
        // End of OleDbDataReader Scope - should automatically close
    }
    // End of OleDbConnection Scope - should automatically close
}

score 2 · Accepted Answer

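This is quite an interesting problem you have gotten yourself into.

The problem with caching the file handles is that a huge number of open handles can drain system resources, making both the program and Windows perform badly.

If the number of devices in your database is not too high (fewer than 100), I think it would be safe to cache the handles.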
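One way to keep the number of cached handles under a fixed ceiling is sketched below. It is only an illustration under stated assumptions: the class name `BoundedWriterCache`, the `maxOpen` parameter, and the "close everything when the cap is reached" policy are all invented here, and header handling is left out for brevity. Files are opened in append mode, so a writer that was closed can be reopened later without losing data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: caches open StreamWriters, but never more than maxOpen at once.
sealed class BoundedWriterCache : IDisposable
{
    private readonly int maxOpen;
    private readonly Dictionary<string, StreamWriter> writers =
        new Dictionary<string, StreamWriter>();

    public BoundedWriterCache(int maxOpen)
    {
        if (maxOpen < 1) throw new ArgumentOutOfRangeException("maxOpen");
        this.maxOpen = maxOpen;
    }

    // Number of writers currently held open.
    public int OpenCount { get { return writers.Count; } }

    public void WriteLine(string path, string line)
    {
        StreamWriter writer;
        if (!writers.TryGetValue(path, out writer))
        {
            // Assumed policy: when the cap is reached, close all writers and start over.
            if (writers.Count >= maxOpen) CloseAll();
            writer = new StreamWriter(path, true); // true = append
            writers.Add(path, writer);
        }
        writer.WriteLine(line);
    }

    private void CloseAll()
    {
        foreach (StreamWriter writer in writers.Values)
            writer.Dispose(); // flushes buffered text and releases the handle
        writers.Clear();
    }

    public void Dispose()
    {
        CloseAll();
    }
}
```

Wrapping the cache in a `using` block around the reader loop would close every remaining writer once the loop ends.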

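Alternatively, you could cache a million records, distribute them to the different devices, save some of them, and then read some more records.

You could place the records in a Dictionary like this: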

class PunchInfo
{  
    public PunchInfo(DateTime time, int id)
    {
        Id = id;
        Time = time;
    }
    public DateTime Time;
    public int Id;
}

Dictionary<string, List<PunchInfo>> Devices = new Dictionary<string, List<PunchInfo>>();
int Count = 0;
const int Limit = 1000000;
const int LowerLimit = 90 * Limit / 100;
void SaveRecord(string device, int id, DateTime time)
{
   PunchInfo info = new PunchInfo(time, id);
   List<PunchInfo> list;
   if (!Devices.TryGetValue(device, out list))
   {
      list = new List<PunchInfo>();
      Devices.Add(device, list);
   }
   list.Add(info);
   Count++;
   if (Count >= Limit)
   {
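       // Pick devices to flush until the buffered record count drops below LowerLimit
       List<string> writeDevices = new List<string>();
       foreach(KeyValuePair<string, List<PunchInfo>> item in Devices)
       {
           writeDevices.Add(item.Key);
           Count -= item.Value.Count;
           if (Count < LowerLimit) break;
       }

       // Flush the selected devices; SaveDevices (not shown) writes one device's records to its file
       foreach(string writeDevice in writeDevices)
       {
          List<PunchInfo> writeList = Devices[writeDevice];
          Devices.Remove(writeDevice);
          SaveDevices(writeDevice, writeList);
       }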
    }
}

// Call once after the last record has been read to write whatever is still buffered
void SaveAllDevices()
{
    foreach(KeyValuePair<string, List<PunchInfo>> item in Devices)
        SaveDevices(item.Key, item.Value);
    Devices.Clear();
    Count = 0;
}

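This way you avoid both constantly opening and closing files and keeping a huge number of files open.

One million records take up about 20 MB of memory, so you could easily raise the limit to 10 million records without problems.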
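The code above calls `SaveDevices` without showing it. A minimal sketch of what it might look like follows, assuming the path template, header line, and record format from the question. The names `PunchFileWriter` and `PathTemplate` are invented for this illustration, and the invariant culture is used so the output does not depend on the server's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

class PunchInfo
{
    public PunchInfo(DateTime time, int id)
    {
        Id = id;
        Time = time;
    }
    public DateTime Time;
    public int Id;
}

static class PunchFileWriter
{
    // Assumption: same template as in the question; change it to suit your environment.
    public static string PathTemplate = @"C:\PUNCHES\%READER%.TXT";

    // Hypothetical SaveDevices: opens the device's file once and appends the whole batch.
    public static void SaveDevices(string device, List<PunchInfo> list)
    {
        string path = PathTemplate.Replace("%READER%", device);
        bool existed = File.Exists(path);

        using (StreamWriter writer = new StreamWriter(path, true)) // true = append
        {
            // New files need the import header, as in the question.
            if (!existed)
                writer.WriteLine("EXAMPLE FILE HEADER");

            foreach (PunchInfo info in list)
            {
                writer.WriteLine(
                    info.Time.ToString("MM-dd-yyyy HH:mm:ss", CultureInfo.InvariantCulture)
                    + " "
                    + info.Id.ToString("000000", CultureInfo.InvariantCulture));
            }
        }
    }
}
```

Each call costs one open/close per device per batch, rather than one per record.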

score 1 · Accepted Answer

You need to set up an array of writers. Here is an example of how to do it.

using System.IO;

namespace example
{
    class Program
    {
        public static StreamWriter[] writer = new StreamWriter[3];

        static void Main(string[] args)
        {
            // Open one writer per output file
            writer[0] = new StreamWriter("YourFile1.txt");
            writer[1] = new StreamWriter("YourFile2.txt");
            writer[2] = new StreamWriter("YourFile3.txt");

            writer[0].WriteLine("Line in YourFile1.");
            writer[1].WriteLine("Line in YourFile2.");
            writer[2].WriteLine("Line in YourFile3.");

            // Close every writer so buffered text is flushed to disk
            writer[0].Close();
            writer[1].Close();
            writer[2].Close();
        }
    }
}
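When the file names are only known at run time, as in the question, the same idea can be sketched with a dictionary keyed by path instead of a fixed-size array. This is a rough illustration, not a tested production class: `WriterCache` and its members are invented names, and the case-insensitive key comparison assumes Windows-style paths.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keeps one open StreamWriter per file path until disposed.
sealed class WriterCache : IDisposable
{
    // Assumption: paths are compared case-insensitively, as on Windows file systems.
    private readonly Dictionary<string, StreamWriter> writers =
        new Dictionary<string, StreamWriter>(StringComparer.OrdinalIgnoreCase);

    // Number of writers currently held open.
    public int OpenCount { get { return writers.Count; } }

    // Returns the cached writer for path, opening it in append mode on first use.
    // The header is written only when the file did not exist before.
    public StreamWriter Get(string path, string header)
    {
        StreamWriter writer;
        if (!writers.TryGetValue(path, out writer))
        {
            bool existed = File.Exists(path);
            writer = new StreamWriter(path, true); // true = append
            if (!existed)
                writer.WriteLine(header);
            writers.Add(path, writer);
        }
        return writer;
    }

    // Flushes and closes every writer at once.
    public void Dispose()
    {
        foreach (StreamWriter writer in writers.Values)
            writer.Dispose();
        writers.Clear();
    }
}
```

Inside the question's reader loop this would be used as `cache.Get(pathStr, "EXAMPLE FILE HEADER").WriteLine(outputStr);`, with the cache created in a `using` block around the loop so that every file is closed exactly once at the end.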

score 0 · Accepted Answer

May I suggest keeping your data in memory and writing to disk only when a certain threshold is reached:

const int MAX_MEMORY_BUFFER = 100000; // To be defined according to your memory limits
String strConnection;
String strQuery;
String strPunchFileNameTemplate;

strConnection = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=ClockData.accdb";
strQuery = @"SELECT * FROM ClockPunches";   
strPunchFileNameTemplate = @"C:\PUNCHES\%READER%.TXT";      

Dictionary<string, StringBuilder> data = new Dictionary<string, StringBuilder>();

using (OleDbConnection ConnObj = new OleDbConnection(strConnection))    
{
    OleDbCommand CmdObj = new OleDbCommand(strQuery,ConnObj);
    ConnObj.Open();

    using (OleDbDataReader ReaderObj = CmdObj.ExecuteReader(CommandBehavior.KeyInfo))   
    {
        while(ReaderObj.Read()) 
        {
            DateTime dtTime = ReaderObj.GetDateTime(ReaderObj.GetOrdinal("PunchTime"));
            Int16 intID = ReaderObj.GetInt16(ReaderObj.GetOrdinal("CardNumber"));
            String strReader = ReaderObj.GetString(ReaderObj.GetOrdinal("Device"));

            strReader = GetDeviceFileName(strReader);

            // Get the buffer for this device, creating it (with the header) on first use
            StringBuilder sb;
            if (!data.TryGetValue(strReader, out sb))
            {
                sb = new StringBuilder("EXAMPLE FILE HEADER\r\n");
                data.Add(strReader, sb);
            }

            String outputStr = dtTime.ToString("MM-dd-yyyy HH:mm:ss") + " " + intID.ToString("000000");
            sb.AppendLine(outputStr);
            if (sb.Length > MAX_MEMORY_BUFFER)
            {
                String pathStr = strPunchFileNameTemplate.Replace("%READER%",strReader);
                using (StreamWriter sw = new StreamWriter(pathStr, true)) // Append mode
                {
                    // Write the buffer and set its length to zero.
                    // Write (not WriteLine): the buffer already ends with a line break.
                    sw.Write(sb.ToString());
                    sb.Length = 0;
                }
            }
        }
    }

    // Write all the data remaining in memory
    foreach (KeyValuePair<string, StringBuilder> info in data)
    {
        if (info.Value.Length > 0)
        {
            String pathStr = strPunchFileNameTemplate.Replace("%READER%",info.Key);
            using (StreamWriter sw = new StreamWriter(pathStr, true)) // Append mode
            {
                // Write (not WriteLine): the buffer already ends with a line break
                sw.Write(info.Value.ToString());
            }
        }
    }
}

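This code needs to be tested, but I wanted to give you the general idea. This way you can balance your IO operations: increasing the memory buffer lowers the number of IO operations, and vice versa. Of course, you now also need to consider the memory available to hold the data.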

score 0 · Accepted Answer

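If a single process held 100 or 1000 file handles open over a long period of time, this used to be considered problematic. But times have changed, and this is no longer a problem. So if the situation calls for it, do it.

In a process that analyzes the data in such files, I may have 100, 1000, or even 5000 files open, and this goes on for hours. I measured on Windows whether file read/write performance drops. It does not. Thanks to the memory resources now available on modern machines, having 5000 file descriptors in memory on the OS side no longer causes any problems. The OS keeps them sorted (I guess), so the lookup of these descriptors is log(n), and nothing measurable happens.

Keeping these handles (file descriptor structures) open is surely far better than filling memory with data and then flushing it to disk file by file.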
