perl - Compare two CSV files and show only the difference

Question

I have two CSV files:

File1.csv

Time, Object_Name, Carrier_Name, Frequency, Longname

2013-08-05 00:00, Alpha, Aircel, 917.86, Aircel_Bhopal

2013-08-05 00:00, Alpha, Aircel, 915.13, Aircel_Indore

File2.csv

Time, Object_Name, Carrier_Name, Frequency, Longname

2013-08-05 00:00, Alpha, Aircel, 917.86, Aircel_Bhopal

2013-08-05 00:00, Alpha, Aircel, 815.13, Aircel_Indore

These are only sample inputs; the real files contain many more headers and values, so I cannot hard-code them.

In the output I want to keep the first two columns and the last column as they are, since they never change; the comparison should be done on the remaining columns and values.

Expected output:

Time, Object_Name, Frequency, Longname

2013-08-05 00:00, 815.13, Aircel_Indore

How can I do this?

score 0 · Accepted Answer

Answering @IlmariKaronen's questions would clarify the problem much better, but meanwhile I made some assumptions and took a crack at the problem - mainly because I needed an excuse to learn a bit of Text::CSV.

Here's the code:

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;
use Array::Compare;
use feature 'say';

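open my $in_file, '<', 'infile.csv' or die "Cannot open infile.csv: $!";
open my $exp_file, '<', 'expectedfile.csv' or die "Cannot open expectedfile.csv: $!";

open my $out_diff_file, '>', 'differences.csv' or die "Cannot open differences.csv: $!";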

my $text_csv = Text::CSV->new({ allow_whitespace => 1, auto_diag => 1 });

my $line = readline($in_file);
my $exp_line = readline($exp_file);
die 'Different column headers' unless $line eq $exp_line;
$text_csv->parse($line);
my @headers = $text_csv->fields();

my %all_differing_indices;

# array of arrays holding the "expected" row for each differing line;
# only columns that differ from the input have values, others are empty
my @all_differing_rows;

my $array_comparer = Array::Compare->new(DefFull => 1);
while (defined($line = readline($in_file))) {
    $exp_line = readline($exp_file);
    last unless defined $exp_line;    # stop if the expected file is shorter
    if ($line ne $exp_line) {
        $text_csv->parse($line);
        my @in_fields = $text_csv->fields();
        $text_csv->parse($exp_line);
        my @exp_fields = $text_csv->fields();

        my @differing_indices = $array_comparer->compare([@in_fields], [@exp_fields]);
        @all_differing_indices{@differing_indices} = (1) x scalar(@differing_indices);
        my @output_row = ('') x scalar(@exp_fields);
        @output_row[0, 1, @differing_indices, $#exp_fields] = @exp_fields[0, 1, @differing_indices, $#exp_fields];
        push @all_differing_rows, [@output_row];
    }
}

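# keys() returns indices in no particular order, so sort them numerically
# to keep the output columns in the same order as in the input
my @columns_needed = (0, 1, (sort { $a <=> $b } keys %all_differing_indices), $#headers);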

$text_csv->combine(@headers[@columns_needed]);
say $out_diff_file $text_csv->string();
for my $row_aref (@all_differing_rows) {
    $text_csv->combine(@{$row_aref}[@columns_needed]);   
    say $out_diff_file $text_csv->string();
}

It works for the File1 and File2 given in the question and produces the Expected output (except that the Object_Name 'Alpha' is present in the data line - I'm assuming that's a typo in the question).

Time,Object_Name,Frequency,Longname
"2013-08-05 00:00",Alpha,815.13,Aircel_Indore

score 0 · Accepted Answer

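I wrote a script for this using some very powerful Linux tools. The link is here...

Linux / Unix - Compare two CSV files. This project is about comparing two CSV files.

Suppose csvFile1.csv has XX columns and csvFile2.csv has YY columns.

The script I wrote compares one (key) column of csvFile1.csv with another (key) column of csvFile2.csv. Each value (row in the key column) of csvFile1.csv is compared with each value of csvFile2.csv.

If csvFile1.csv has 1,500 rows and csvFile2.csv has 15,000, the total number of combinations (comparisons) is 22,500,000. This makes it a very useful way to build, for example, an availability report script that compares an internal product database with an external (supplier) product database.

Packages used: csvcut (cut columns), csvdiff (compare two CSV files), ssconvert (convert xlsx to csv), iconv, curlftpfs, zip, unzip, ntpd, proFTPD

You can find more information (plus a sample script) on my official blog: http://damian1baran.blogspot.sk/2014/01/linux-unix-compare-two-csv-files.html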
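
If only exact matches on the key column matter, the pairwise comparison described above can be replaced by a lookup table. The following is a minimal sketch of that idea, not the script from the blog post: the function name match_keys and the column positions are hypothetical, and it assumes plain comma-separated fields with no quoted commas.

```shell
#!/bin/bash

# Hypothetical helper (not taken from the linked script): print the rows of
# the second file whose key column also appears in the key column of the
# first file.
# Usage: match_keys FILE1 KEYCOL1 FILE2 KEYCOL2
match_keys() {
    awk -F',' -v k1="$2" -v k2="$4" '
        # First file: remember every key value.
        NR == FNR { seen[$k1] = 1; next }
        # Second file: print rows whose key was seen in the first file.
        ($k2 in seen)
    ' "$1" "$3"
}
```

For example, `match_keys csvFile1.csv 1 csvFile2.csv 3` would print the rows of csvFile2.csv whose third column also occurs in the first column of csvFile1.csv. Each file is read once, so the work grows with the total number of rows rather than with their product.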

score 0 · Accepted Answer

If you are not tied to Perl, here is a solution using AWK:

 #!/bin/bash

 awk -v FS="," '

 function filter_columns()
 {
     return sprintf("%s, %s, %s, %s", $1, $2, $(NF-1), $NF);
 }

 NF !=0 && NR == FNR {
    if (NR == 1) {
            print filter_columns();
    } else {
            memory[line++] = filter_columns();
    }
 } NF != 0 && NR != FNR {
    if (FNR == 1) {
            line = 0;
    } else {
            new_line = filter_columns();
            if (new_line != memory[line++]) {
                    print new_line;
            }
    }
 }' File1.csv File2.csv

This outputs:

Time,  Object_Name,  Frequency,  Longname
2013-08-05 00:00,  Alpha,  815.13,  Aircel_Indore

Here is the explanation:

#!/bin/bash

# FS = "," makes awk split each line in fields using
# the comma as separator
awk -v FS="," '

# This function selects the columns you want. NF is the
# number of fields, so $NF is the content of the last
# column and $(NF-1) that of the second-to-last.
function filter_columns()
{
     return sprintf("%s, %s, %s, %s", $1, $2, $(NF-1), $NF);
}

# This block processes only the first file; that is the purpose
# of the condition NR == FNR. The condition NF != 0 skips the
# empty lines you have in your file. The block prints the header
# and then saves all the other lines in the array memory.
NF !=0 && NR == FNR {
    if (NR == 1) {
            print filter_columns();
    } else {
            memory[line++] = filter_columns();
    }
}
# This block processes only the second file (NR != FNR).
# Since the header has already been printed, it skips the first
# line of the second file (FNR == 1). The block compares each line
# against the one saved in the array memory (the corresponding
# line in the first file) and prints only the lines
# that do not match.
NF != 0 && NR != FNR {
    if (FNR == 1) {
            line = 0;
    } else {
            new_line = filter_columns();
            if (new_line != memory[line++]) {
                    print new_line;
            }
    }
}' File1.csv File2.csv
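
The script above only looks at the second-to-last column. If any of the middle columns can change, a variant along the following lines could report every differing cell instead. This is a sketch under assumptions: cell_diff is a hypothetical name, both files are assumed to have the same header and the same rows in the same order, and fields are split on a comma followed by optional spaces (no quoted commas). The output format (row number, column name, old value, new value) is an illustrative choice, not the one asked for in the question.

```shell
#!/bin/bash

# Hypothetical helper: for two row-aligned CSV files with the same header,
# print "row,column_name,old_value,new_value" for every cell that differs.
# Usage: cell_diff FILE1 FILE2
cell_diff() {
    awk -F', *' '
        NF == 0 { next }                     # skip blank lines in both files
        NR == FNR {                          # first file: store header and rows
            n1++
            if (n1 == 1) { for (i = 1; i <= NF; i++) name[i] = $i }
            else         { for (i = 1; i <= NF; i++) old[n1, i] = $i }
            next
        }
        { n2++ }                             # second file: count non-blank lines
        n2 > 1 {                             # skip the header of the second file
            for (i = 1; i <= NF; i++)
                # appending "" forces a string comparison
                if (($i "") != (old[n2, i] ""))
                    printf "%d,%s,%s,%s\n", n2 - 1, name[i], old[n2, i], $i
        }
    ' "$1" "$2"
}
```

Called as `cell_diff File1.csv File2.csv`, it prints one line per changed cell, with data rows numbered from 1.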

score 0 · Accepted Answer

Please see the links below; they contain some sample scripts:

http://bytes.com/topic/perl/answers/647889-compare-two-csv-files-using-perl
Perl: Compare two CSV files and print out the differences
http://www.perlmonks.org/?node_id=705049
