I have an Outlook account with ~20k emails, and I need to extract phone numbers from them.

An example of an email would be:

From: Amy Schwartz <amy@blahdyblah.com>

Dear Anatoliy, 
I want you to do blahdy blahdy blah.

Amy Schwartz
(347) 555-1212 <---- I want this
Blahdy Blah Company

The idea is to go through every email, match the last phone number with a regex, and export a list in the following format:

  • Name: Name from the "From" field
  • Email: Email from the "From" field
  • Phone: The last phone number matched in the email text
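
To make the target format concrete, here is a rough sketch of the record I'm after for the sample email above. The helper names (parseFrom, toRecord) and the simplified phone pattern are placeholders of my own, not a solution that handles every number format:

```javascript
// Hypothetical helper: split a From header such as
// "Amy Schwartz <amy@blahdyblah.com>" into a name and an address.
function parseFrom(header) {
  const m = /^\s*(?:From:\s*)?(.*?)\s*<([^<>\s]+)>\s*$/.exec(header);
  if (!m) {
    // No angle brackets: treat the whole value as the address.
    return { name: '', email: header.replace(/^\s*From:\s*/, '').trim() };
  }
  return { name: m[1].replace(/^"|"$/g, ''), email: m[2] };
}

// Simplified placeholder pattern (an assumption): covers forms like
// (347) 555-1212, 347-555-1212 and 347.555.1212, nothing more.
const SIMPLE_PHONE = /\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g;

// Build one output record; the phone is the LAST match in the body.
function toRecord(fromHeader, body) {
  const from = parseFrom(fromHeader);
  const phones = body.match(SIMPLE_PHONE) || [];
  return {
    name: from.name,
    email: from.email,
    phone: phones.length ? phones[phones.length - 1] : ''
  };
}

const body = [
  'Dear Anatoliy,',
  'I want you to do blahdy blahdy blah.',
  '',
  'Amy Schwartz',
  '(347) 555-1212',
  'Blahdy Blah Company'
].join('\n');

console.log(toRecord('From: Amy Schwartz <amy@blahdyblah.com>', body));
```

For the sample this yields the name "Amy Schwartz", the email "amy@blahdyblah.com" and the phone "(347) 555-1212".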

Do you have any ideas on how to go about doing this?

UPDATE: I didn't find any prebuilt solutions, but I'm hacking together my own using codeTwo Outlook Express, which can export any email field (body, HTML body, from, from name) to CSV. It's a little slow (3 seconds per message on my i7 iMac running a Win7 VM), but it works :) From there I will probably just put the data in a database and do some regex magic. I'll post my process once I'm done.

1 Answer

Figured it out. It's quite easy if you know how to write a Node.js script (though I'm sure you could write it in Bash as well).

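1) Use the Outlook export plugin to export all emails to CSV. Make sure the email address is the first column, the name is the second column, and the body (plain text) is the third column.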

2) Write the following Node.js script in the same directory as your CSV of emails:

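var fs = require('fs');
var csv = require('csv'); // uses the csv package's older chained API

// North American phone number. The g flag makes match() return every hit.
var PHONE = /(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})/g;

// Wrap a value in quotes, doubling embedded quotes so the CSV stays valid.
function quote(s) { return '"' + String(s).replace(/"/g, '""') + '"'; }

csv()
    .from.stream(fs.createReadStream(__dirname+'/data.csv'))
    .to.path(__dirname+'/out.csv')
    .transform( function(row){
      // Columns: 0 = email, 1 = name, 2 = body text.
      var matches = (row[2] || '').match(PHONE);
      // Keep the last number, which is normally the one in the signature.
      var phone = matches ? matches[matches.length - 1] : '';
      return [row[0], row[1], phone].map(quote).join(',') + '\n';
    })
    .on('error', function(error){
      console.log(error.message);
    });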
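
Before running this on the full export, it may be worth checking which number formats the pattern actually accepts. The following is a small standalone sketch, not part of the script: findPhones is a hypothetical helper and the sample strings are made up for illustration.

```javascript
// The pattern from the script above, here with the g flag so that
// match() returns every phone number in the text.
const PHONE = /(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})/g;

// Hypothetical helper: every match in a piece of text, or [] if none.
function findPhones(text) {
  return text.match(PHONE) || [];
}

// Made-up sample inputs covering a few common spellings.
const samples = [
  '(347) 555-1212',
  '347-555-1212',
  '+1 347.555.1212',
  '555-1212',
  'Office 212-555-0100, cell (347) 555-1212'
];

samples.forEach(function (s) {
  console.log(JSON.stringify(s), '->', findPhones(s));
});
```

The seven-digit sample comes back empty because the pattern requires an area code, and the last sample returns both numbers, so taking the final array element gives the one closest to the end of the message.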

Then run it with node script.js.

That's it! It runs super fast (about 20 seconds for 20k emails).
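
Since the script relies entirely on the column order from step 1, a quick check on a few parsed rows can catch a wrong export early. This is only a sketch: checkRow is a hypothetical helper, and the email test is deliberately loose rather than a full address validator.

```javascript
// Hypothetical sanity check for one parsed CSV row: [email, name, body].
// Returns a list of problems; an empty list means the row looks plausible.
function checkRow(row) {
  const problems = [];
  if (!Array.isArray(row) || row.length < 3) {
    problems.push('expected at least 3 columns');
    return problems;
  }
  // Loose check only: one "@" with text on both sides and no spaces.
  if (!/^[^\s@]+@[^\s@]+$/.test(String(row[0]).trim())) {
    problems.push('column 1 does not look like an email address');
  }
  if (String(row[1]).indexOf('@') !== -1) {
    problems.push('column 2 looks like an email address, expected a name');
  }
  return problems;
}

// Correct order: no problems reported.
console.log(checkRow(['amy@blahdyblah.com', 'Amy Schwartz', 'I want you to do blahdy blahdy blah.']));
// Name and email swapped: both checks fire.
console.log(checkRow(['Amy Schwartz', 'amy@blahdyblah.com', 'I want you to do blahdy blahdy blah.']));
```

A header row, if the export includes one, would also be flagged by the first check, so it is simplest to run this on data rows only.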

Let me know if you have any suggestions (or feel free to package it into a downloadable executable).

Answered 2013-03-22T01:37:38.830