excel - Best way to process lists of email addresses

Question

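I have 3 text files (A, B and C), each with several hundred email addresses. I want to merge list A and list B into a single file, ignoring differences in case and whitespace. Then I want to remove from the new list every email that appears in list C, again ignoring differences in case and whitespace.

My programming language of choice is normally C++, but it seems poorly suited to this task. Is there a scripting language that could do this (and similar tasks) in relatively few lines?

Or is there already software (free or commercial) that would let me do this? Is it possible to do it in Excel, for example?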

score 3 · Accepted Answer

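The quickest way probably doesn't require coding at all. You could import files A and B into Excel on one worksheet, then (if necessary) filter the resulting list of addresses to remove any duplicates.

The next step is to import file C into a second worksheet. In a third worksheet you'd do a VLOOKUP to pick out all the addresses in your first list and remove them if they are in your "List C".

The VLOOKUP would look something like this: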

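=IF(ISNA(VLOOKUP(email_address_cell, Sheet2!email_duplicates_list, 1, FALSE)), email_address_cell, "")

The ISNA check catches the "value not available" error, which is what VLOOKUP returns when an address is not in list C; in that case the cell shows the address, and otherwise it shows a blank. From there you just need to remove the blanks and you have your final list.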

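All that said, you could still write a VBA macro to do the same thing, and perhaps clean up the list along the way, depending on what you need. Hope that helps!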

score 3 · Accepted Answer

Since Excel was mentioned, you can also do this kind of thing with Jet and VBScript.

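' Treat the folder c:\Docs\ as a database; each text file is a table with no header row.
Set cn = CreateObject("ADODB.Connection")
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Docs\;" _
& "Extended Properties=""text;HDR=No;FMT=Delimited"";"

cn.Open strCon

' Create New.txt from the addresses in A that are not in C (UCase makes the comparison case-insensitive).
strSQL = "SELECT F1 Into New.txt From EmailsA.txt " _
    & "WHERE UCase(F1) Not IN (SELECT UCase(F1) From EmailsC.txt)"
cn.Execute strSQL

' Append the addresses in B that are not in C. Addresses present in both A and B are not de-duplicated.
strSQL = "INSERT INTO New.txt ( F1 ) SELECT F1 FROM EmailsB.txt " _
    & "WHERE UCase(F1) Not IN (SELECT UCase(F1) From EmailsC.txt)"
cn.Execute strSQL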
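
For comparison, here is a rough sketch of the same NOT IN query idea in Python with the standard sqlite3 module instead of Jet. The function name a_plus_b_minus_c, the table names ab and c, and the sample addresses are all made up for illustration; this version also groups on the normalized address so that A and B are de-duplicated, which the queries above do not do.

```python
import sqlite3

def a_plus_b_minus_c(list_a, list_b, list_c):
    # Hypothetical helper: load the lists into an in-memory database.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ab (email TEXT)")
    con.execute("CREATE TABLE c (email TEXT)")
    con.executemany("INSERT INTO ab VALUES (?)", [(e,) for e in list_a + list_b])
    con.executemany("INSERT INTO c VALUES (?)", [(e,) for e in list_c])
    # UPPER makes the comparison case-insensitive (ASCII only in SQLite);
    # TRIM drops leading and trailing spaces. GROUP BY keeps one spelling
    # per normalized address.
    rows = con.execute(
        "SELECT MIN(TRIM(email)) FROM ab "
        "WHERE UPPER(TRIM(email)) NOT IN (SELECT UPPER(TRIM(email)) FROM c) "
        "GROUP BY UPPER(TRIM(email)) ORDER BY 1"
    ).fetchall()
    con.close()
    return [row[0] for row in rows]

# Sample data, invented for this sketch.
print(a_plus_b_minus_c(
    ["Dave@Example.com", "erin@example.com "],
    ["DAVE@example.com", "frank@example.com"],
    [" Erin@Example.COM"],
))
```

Which spelling survives for a duplicated address is arbitrary here (the smallest by byte order); pick a different rule if the original capitalization matters.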

score 2 · Accepted Answer

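For the kind of text processing you describe, either Perl or Python would be ideal.

You can use an associative array (in this case, an array indexed by strings) to store the email addresses in a list.

Use the lowercased, whitespace-stripped email address as the key and the real email address as the value.

Then it's a matter of reading in and storing the first file, reading in and storing the second file (which overwrites any email address that has the same key), then reading in the third file and using its keys to delete entries from the list.

What's left is the list you want (A + B - C).

Pseudocode here: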

set list to empty
foreach line in file one:
    key = unwhitespace(tolowercase(line))
    list{key} = line
foreach line in file two:
    key = unwhitespace(tolowercase(line))
    list{key} = line
foreach line in file three:
    key = unwhitespace(tolowercase(line))
    if exists(list{key})
        delete list{key}
foreach key in list:
    print list{key}
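
One way the pseudocode above might look in Python, as a minimal sketch. The names normalize and merge_and_subtract and the sample addresses are assumptions made for illustration, and the lists are passed in directly rather than read from files.

```python
def normalize(address):
    # Comparison key: lowercase with all whitespace removed.
    return "".join(address.split()).lower()

def merge_and_subtract(list_a, list_b, list_c):
    # Maps normalized key -> address as it was originally written.
    merged = {}
    for line in list(list_a) + list(list_b):
        key = normalize(line)
        if key:  # skip blank lines
            merged[key] = line.strip()  # a later duplicate overwrites an earlier one
    for line in list_c:
        merged.pop(normalize(line), None)  # no error if the key is absent
    return list(merged.values())

# Sample data, invented for this sketch.
result = merge_and_subtract(
    ["Alice@Example.com", "bob@example.com "],
    ["ALICE@example.com", "carol@example.com"],
    [" Bob@Example.COM"],
)
print(result)
```

Because the dictionary stores the original spelling as the value, the output keeps each address's capitalization instead of forcing it to lowercase.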

score 1 · Accepted Answer

Sadly this answer probably won't help you, but if in fact you were using Unix (Linux for example) you could do something like:

cat filea >> fileb # append file a to file b

sort fileb | uniq > newFile # newFile now contains a merger of file a and file b, with sorted and unique email addresses

The above could all be done on one line as follows: cat filea fileb | sort | uniq > newFile

Now you're left with simply removing common emails. Some variation of "diff" should be helpful there such as perhaps: diff newFile fileC > finalFile

diff will give you a list of differences between the two files, with lines that appear only in "newFile" (the merger of A & B) prefixed by "<" and lines that appear only in fileC prefixed by ">", so the "<" lines in "finalFile" are the emails that are NOT in fileC. For that to work, fileC has to be sorted the same way as newFile. Options to the various tools allow you to ignore whitespace and case. I'd have to play with it a bit to get it exactly right but the above is the general idea.

I used to have an extra box running Linux for the sole purpose of doing stuff like the above which is a hassle under Windoze but a breeze under Unix type operating systems. When my hardware died I never got around to building another Linux box.

I believe the MKS toolkit for Windoze probably has all of the above utilities.

score 1 · Accepted Answer

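I think the answers above cover the technical how-to question; the only other thing to consider is how many times you will have to perform the task. If this is a one-off and you are more comfortable with Excel, start there. If you know you will perform this task at least twice, or more, then writing a script or executable is your best option.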

score 1 · Accepted Answer

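In Python, something like this:

Note that this writes lowercased emails to the final output. If that's not acceptable, a dictionary-based solution would be necessary.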

def read_file(filename):
    # Yield each non-blank line with surrounding whitespace removed.
    with open(filename, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line

def write_file(filename, lines):
    # Write one address per line.
    with open(filename, "w") as f:
        for line in lines:
            f.write(line + "\n")

# Lowercase every address so that comparisons ignore case.
set_a = {line.lower() for line in read_file("file_a.txt")}
set_b = {line.lower() for line in read_file("file_b.txt")}
set_c = {line.lower() for line in read_file("file_c.txt")}

# Calculate (a + b) - c
write_file("result.txt", set_a.union(set_b).difference(set_c))

score -1 · Accepted Answer


Excel can do it, as described above. The best-suited programming language is Perl.

answered 2008-11-09T23:51:38.290
