I am developing an application that has to load data from CSV files into database tables. The problem is that I don’t have CSV files; I have flat text files that first need to be converted to CSV. An additional problem is that the application is used by several customers with different systems, so the flat text files come in different layouts.

What I want is an application that loads “rules” from a special file and applies them to the flat text file in order to generate the CSV file. The application that converts flat files to CSV would stay the same; only the set of rules would differ.

How can I achieve this? What is the best practice you recommend?

2 Answers

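It depends on the complexity of the rules. If the only things that differ between inputs are the column names and the separator used, this is quite easy. If you also want to be able to parse completely different formats (such as XML), that is a different story.

I would implement a base class for a 'record' reader that reads records from a file and outputs them to a dataset or to CSV. You can then implement child classes, each of which reads a different source format.

If you like, you can add specific rules for those formats, so you could build a generic XMLReader that descends from BaseReader but allows configurable column names. But I would start with a bunch of hard-coded readers until it is clearer which dialects of those formats you are likely to encounter.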

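Edit: On request, an example of how it could look.

Note that this example is far from ideal! It reads a custom format, transfers it into one specific table structure and saves that as a CSV file. You may want to split it up further so that you can reuse the code for different table structures. The field defs in particular are something you may want to be able to set in a descendant class or a factory class. For the sake of simplicity I took a more rigid approach and put rather too much intelligence in a single base class.

The base class has the logic needed to create an in-memory dataset (I used a TClientDataSet). It can 'migrate' a file. In practice, this means it reads, validates and exports the file.

The reading is abstract and must be implemented in a child class. It should read the data into the in-memory dataset. This lets you do all the necessary validation in the client dataset: you can enforce field types and sizes in a way that is independent of the database or file format, and perform any additional checks if needed.

Validation and writing are done using the data in the dataset. From the moment the source file has been parsed into the dataset, no knowledge of the source file format is needed any more.

Declaration (don't forget to add DB and DBClient to the uses clause; the code below also needs SysUtils, Classes and IniFiles):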

type
  TBaseMigrator = class
  protected
    // Protected (not private) so that descendants in other units can fill it.
    FData: TClientDataSet;
    function CSVEscape(Str: string): string;
    procedure ReadFile(AFileName: string); virtual; abstract;
    procedure ValidateData;
    procedure SaveData(AFileName: string);
  public
    constructor Create; virtual;
    destructor Destroy; override;

    procedure MigrateFile(ASourceFileName, ADestFileName: string); virtual;
  end;

Implementation:

{ TBaseMigrator }

constructor TBaseMigrator.Create;
begin
  inherited Create;
  FData := TClientDataSet.Create(nil);
  FData.FieldDefs.Add('ID', ftString, 20, True);
  FData.FieldDefs.Add('Name', ftString, 60, True);
  FData.FieldDefs.Add('Phone', ftString, 15, False);
  // Etc
end;

function TBaseMigrator.CSVEscape(Str: string): string;
begin
  // Escape the string to a CSV-safe format;
  // Todo: Check if this is sufficient!
  // Escape the input (Str): double any embedded quotes, then wrap in quotes.
  Result := '"' + StringReplace(Str, '"', '""', [rfReplaceAll]) + '"';
end;

destructor TBaseMigrator.Destroy;
begin
  FData.Free;
  inherited;
end;

procedure TBaseMigrator.MigrateFile(ASourceFileName, ADestFileName: string);
begin
  // Read the file. Descendant classes need to override this method.
  ReadFile(ASourceFileName);

  // Validation. Implemented in base class.
  ValidateData;

  // Saving/exporting. For now implemented in base class.
  SaveData(ADestFileName);
end;

procedure TBaseMigrator.SaveData(AFileName: string);
var
  Output: TFileStream;
  Writer: TStreamWriter;
  FieldIndex: Integer;
begin
  Output := TFileStream.Create(AFileName, fmCreate);
  Writer := TStreamWriter.Create(Output);
  try

    // Write the CSV headers based on the fields in the dataset
    for FieldIndex := 0 to FData.FieldCount - 1 do
    begin
      if FieldIndex > 0 then
        Writer.Write(',');
      // Column headers are escaped, but this may not be needed, since
      // they likely don't contain quotes, commas or line breaks.
      Writer.Write(CSVEscape(FData.Fields[FieldIndex].FieldName));
    end;
    Writer.WriteLine;

    // Write each row
    FData.First;
    while not FData.Eof do
    begin

      for FieldIndex := 0 to FData.FieldCount - 1 do
      begin
        if FieldIndex > 0 then
          Writer.Write(',');
        // Escape each value
        Writer.Write(CSVEscape(FData.Fields[FieldIndex].AsString));
      end;
      Writer.WriteLine;

      FData.Next
    end;

  finally
    Writer.Free;
    Output.Free;
  end;
end;

procedure TBaseMigrator.ValidateData;
begin
  FData.First;
  while not FData.Eof do
  begin
    // Validate the current row of FData
    FData.Next
  end;
end;

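An example child class: TIniFileReader, which reads the sections of an ini file as if they were database records. As you can see, you only need to implement the logic that reads the file.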

type
  TIniFileReader = class(TBaseMigrator)
  public
    procedure ReadFile(AFileName: string); override;
  end;

{ TIniFileReader }

procedure TIniFileReader.ReadFile(AFileName: string);
var
  Source: TMemIniFile;
  IDs: TStringList;
  ID: string;
  i: Integer;
begin
  // Initialize an in-memory dataset.
  FData.Close; // Be able to migrate multiple files with one instance.
  FData.CreateDataSet;

  // Parsing a weird custom format, where each section in an inifile is a
  // row. Section name is the key, section contains the other fields.
  Source := TMemIniFile.Create(AFileName);
  IDs := TStringList.Create;
  try
    Source.ReadSections(IDs);

    for i := 0 to IDs.Count - 1 do
    begin
      // The section name is the key/ID.
      ID := IDs[i];

      // Append a row.
      FData.Append;

      // Read the values.
      FData['ID'] := ID;
      FData['Name'] := Source.ReadString(ID, 'Name', '');
      // Names don't need to match. The field 'Telephone' in this proprietary
      // format maps to 'Phone' in your CSV output.
      // Later, you can make this customizable (configurable) if you need to,
      // but it's unlikely that you encounter two different inifile-based
      // formats, so it's a waste to implement that until you need it.
      FData['Phone'] := Source.ReadString(ID, 'Telephone', '');

      FData.Post;
    end;

  finally
    IDs.Free;
    Source.Free;
  end;
end;
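
For the fixed-layout flat text files from the question, a descendant could take its column positions from a rules file instead of hard-coding them. The following is only a minimal, self-contained sketch of that idea and is not part of the example above: the rule syntax `Name=Start,Width`, the record `TFieldRule` and the functions `ParseRule` and `ExtractField` are all assumptions made for illustration.

```pascal
program FixedWidthRulesSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical rule: the field occupies Width characters,
  // starting at column Start (1-based).
  TFieldRule = record
    Name: string;
    Start: Integer;
    Width: Integer;
  end;

// Parses one rule in the assumed form "Name=Start,Width", e.g. "Phone=25,8".
function ParseRule(const Line: string): TFieldRule;
var
  EqPos, CommaPos: Integer;
begin
  EqPos := Pos('=', Line);
  CommaPos := Pos(',', Line);
  if (EqPos < 2) or (CommaPos < EqPos + 2) then
    raise Exception.CreateFmt('Invalid rule: "%s"', [Line]);
  Result.Name := Trim(Copy(Line, 1, EqPos - 1));
  Result.Start := StrToInt(Trim(Copy(Line, EqPos + 1, CommaPos - EqPos - 1)));
  Result.Width := StrToInt(Trim(Copy(Line, CommaPos + 1, MaxInt)));
end;

// Cuts the field out of one fixed-width line and trims the padding.
function ExtractField(const Line: string; const Rule: TFieldRule): string;
begin
  Result := Trim(Copy(Line, Rule.Start, Rule.Width));
end;

var
  RuleLines: TStringList;
  Rules: array of TFieldRule;
  SampleLine: string;
  i: Integer;
begin
  // In practice the rules would come from a per-customer file
  // (RuleLines.LoadFromFile); they are inlined to keep the sketch runnable.
  RuleLines := TStringList.Create;
  try
    RuleLines.Add('ID=1,4');
    RuleLines.Add('Name=5,20');
    RuleLines.Add('Phone=25,8');
    SetLength(Rules, RuleLines.Count);
    for i := 0 to RuleLines.Count - 1 do
      Rules[i] := ParseRule(RuleLines[i]);
  finally
    RuleLines.Free;
  end;

  // Build a padded sample record of 4 + 20 + 8 characters.
  SampleLine := Format('%-4s%-20s%-8s', ['0001', 'John Smith', '555-0100']);

  // Print each extracted field as Name=Value.
  for i := 0 to High(Rules) do
    Writeln(Rules[i].Name, '=', ExtractField(SampleLine, Rules[i]));
end.
```

In a descendant of TBaseMigrator, ReadFile would presumably load the rules once, then call something like ExtractField for every line of the source file and assign each result to the field of the same name in FData. Validation and CSV export would stay in the base class.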
answered 2012-09-17T14:05:47.300

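This is very similar to the problem that "screen scrapers" face. If end users are meant to be able to use this, I would avoid regular expressions (except as an internal implementation detail, if needed) and would not expose raw regular expression editing to end users.

Instead, I would let them load a sample of their data file and build their rules visually, in a drag-and-drop style.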

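  1. Click a "Match text" button, then click and drag to select a rectangular region on the screen. Optionally allow the region to shift up or down, or left or right, by some amount if the format is imprecise or not repeatable. Limit how far it may move outside the original box.

  2. Click a "Grab text" button, then click and drag over a rectangular or non-rectangular (flowing) region on the screen. Name the output field and give it a type (integer, string[x], etc.). Limits similar to those in step 1 apply.

  3. Click Save and the template rule is written to disk. Load a different file and see whether the rule still applies.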
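
Behind such an editor, a saved template could boil down to a "match" region that anchors the rule and a "grab" region that is read relative to it. The sketch below only illustrates that idea under stated assumptions: regions are one row high, only vertical shifting is handled, and the names `TRegion`, `MakeRegion`, `RegionText` and `FindMatch` are hypothetical.

```pascal
program TemplateRuleSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical region: one row high, with 1-based row and column.
  TRegion = record
    Row, Col, Width: Integer;
  end;

function MakeRegion(ARow, ACol, AWidth: Integer): TRegion;
begin
  Result.Row := ARow;
  Result.Col := ACol;
  Result.Width := AWidth;
end;

// Returns the text inside the region, or '' when the row does not exist.
function RegionText(Page: TStrings; const R: TRegion): string;
begin
  if (R.Row >= 1) and (R.Row <= Page.Count) then
    Result := Copy(Page[R.Row - 1], R.Col, R.Width)
  else
    Result := '';
end;

// "Match text" rule: looks for Expected in the region, letting the region
// move at most MaxShift rows up or down. RowShift receives the offset found.
function FindMatch(Page: TStrings; const R: TRegion; const Expected: string;
  MaxShift: Integer; out RowShift: Integer): Boolean;
var
  Delta: Integer;
begin
  Result := False;
  RowShift := 0;
  for Delta := -MaxShift to MaxShift do
    if RegionText(Page, MakeRegion(R.Row + Delta, R.Col, R.Width)) = Expected then
    begin
      RowShift := Delta;
      Result := True;
      Exit;
    end;
end;

var
  Page: TStringList;
  Shift: Integer;
begin
  Page := TStringList.Create;
  try
    // The sample used to build the rule had the label on row 2;
    // in this file it sits on row 3.
    Page.Add('ACME Corp');
    Page.Add('');
    Page.Add('Invoice: 12345');

    if FindMatch(Page, MakeRegion(2, 1, 8), 'Invoice:', 1, Shift) then
      // "Grab text" rule: the region right of the label, moved by the same shift.
      Writeln('InvoiceNo=', RegionText(Page, MakeRegion(2 + Shift, 10, 5)))
    else
      Writeln('Template does not apply to this file');
  finally
    Page.Free;
  end;
end.
```

Keeping MaxShift small corresponds to the limit on how far a region may move outside the original box. A template that fails to match tells you the file needs a different rule set.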

Related Wikipedia topic.

answered 2012-09-17T18:05:59.153