I can connect to the IMAP mail server easily enough:
use Mail::IMAPClient;
use MIME::Base64;
use MIME::Parser;
my $imap = Mail::IMAPClient->new(
Server => '192.168.2.2',
User => 'xxxxxx',
Password => 'yyyyyy',
Ssl => 1,
Uid => 1,
) or die "IMAP connection error: $@\n";
my $folders = $imap->folders
or die "List folders error: ", $imap->LastError, "\n";
print "Folders: @$folders\n";
my $sfolder = "INBOX.2012";
$imap->select( $sfolder )
or die "Select '$sfolder' error: ", $imap->LastError, "\n";
my @msgs = $imap->messages or die "Could not list messages: $@\n";
However, the text and HTML I want are not easy to extract, because the messages carry a mix of header values like these:
Content-Transfer-Encoding:base64
Content-Type:text/html; charset=utf-8
Content-Transfer-Encoding:base64
Content-Type:text/html; charset=utf-8
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----------=_4F0F4830.7079357A"
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----=_Part_4487195_1184536749.1326753403034"
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/alternative; boundary=--boundary_164442_d184e417-739f-46d6-824a-6ea1846e79de
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----=_Part_3882878_23916831.1326509484032"
Multipart
Content-Transfer-Encoding:
I tried this, but it only handles a small subset of the encodings:
# Fetch each header once; either may be missing
my $encoding = $imap->get_header($msg, "Content-Transfer-Encoding") // '';
my $type     = $imap->get_header($msg, "Content-Type") // '';
if ($encoding =~ /base64/i) {
    print "\nMatch base64";
    if ($type =~ m/text/i) {
        # Only text bodies are decoded and kept
        push(@mail, decode_base64($imap->body_string($msg)));
    }
    elsif ($type =~ m/image/i) {
        print "\nImage detected";
    }
    elsif ($type =~ m/application/i) {
        print "\nApplication detected";
    }
}
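To make the per-encoding branching concrete, here is a minimal sketch of a dispatcher that undoes the transfer encoding of a single body. The helper name decode_body is hypothetical (it is not part of Mail::IMAPClient), and it assumes the body passed in belongs to one non-multipart part; it does nothing about multipart messages or character sets.

```perl
use strict;
use warnings;
use MIME::Base64      qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper: undo the Content-Transfer-Encoding of one body.
sub decode_body {
    my ($encoding, $raw) = @_;
    $encoding = lc($encoding // '');
    $encoding =~ s/^\s+|\s+$//g;    # header values often carry stray whitespace

    return decode_base64($raw) if $encoding eq 'base64';
    return decode_qp($raw)     if $encoding eq 'quoted-printable';

    # 7bit, 8bit, binary, or a missing header: the body is already literal
    return $raw;
}

print decode_body('base64',           'SGVsbG8='),   "\n";
print decode_body('Quoted-Printable', 'a=3Db'),      "\n";
print decode_body('7bit',             'plain text'), "\n";
```

Both MIME::Base64 and MIME::QuotedPrint ship with Perl, so this adds no dependencies, but it still leaves the multipart problem unsolved.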
There are 7bit and 8bit variants and other encodings that carry the HTML or text I want for later use. I can already decode base64 with decode_base64(). The worst ones to decode are the multipart messages. I feel like I am reinventing the wheel, and there must be a library or module that can do all the heavy lifting for me. Other content types such as .jpg, .gif, and .pdf should simply be ignored. Each multipart email contains at least one part that I am interested in, but many that are useless to me.
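One possible direction, sketched here under assumptions rather than as a confirmed solution: MIME::Parser (already loaded above, from the MIME-tools distribution) can parse a whole raw message and undo the transfer encoding of each part. The sample message below is made up for illustration; with a live connection the raw text would presumably come from something like $imap->message_string($msg). The helper name text_bodies is hypothetical.

```perl
use strict;
use warnings;
use MIME::Parser;

# Made-up sample message: one base64 text part and one image part.
my $raw = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="b1"',
    '',
    '--b1',
    'Content-Type: text/plain; charset=utf-8',
    'Content-Transfer-Encoding: base64',
    '',
    'SGVsbG8=',
    '--b1',
    'Content-Type: image/gif',
    'Content-Transfer-Encoding: base64',
    '',
    'R0lGODlh',
    '--b1--',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, not in files

my $entity = $parser->parse_data($raw);

# Hypothetical helper: collect decoded text/plain and text/html bodies,
# descending into nested multipart entities and skipping everything else.
sub text_bodies {
    my ($ent) = @_;
    my @children = $ent->parts;
    return map { text_bodies($_) } @children if @children;

    my $type = lc $ent->effective_type;
    return () unless $type eq 'text/plain' || $type eq 'text/html';
    my $body = $ent->bodyhandle or return ();
    return { type => $type, content => $body->as_string };
}

print "$_->{type}: $_->{content}\n" for text_bodies($entity);
```

The content returned this way is still raw bytes in the part's declared charset, so a further decoding step would be needed before treating it as character data.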
After further research, I found that the Mail::IMAPClient::BodyStructure object below holds some of the information I need, but I don't know how to extract it efficiently.
Dumping:$VAR1 = bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_002_BC64_7D688C1F.A2FF9BE0'
},
'bodyextra' => undef,
'_top' => 1,
'bodydisp' => 'NIL',
'_id' => 'HEAD',
'bodysubtype' => 'mixed',
'PartsIndex' => {
'1.3' => bless( {
'bodyparms' => 'NIL',
'bodyid' => '<d9e26cc0-019c-4ac0-9b1e-9c9ac8424f52>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.3',
'bodysubtype' => 'jpeg',
'_prefix' => '1.3',
'bodysize' => '4808',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.1' => bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_000_36AE_880DDD08.0A776E35'
},
'bodyextra' => undef,
'bodydisp' => 'NIL',
'_id' => '1.1',
'bodysubtype' => 'alternative',
'_prefix' => '1.1',
'bodytype' => 'MULTIPART',
'bodystructure' => [
bless( {
'bodyparms' => {
'charset' => 'utf-8'
},
'bodyextra' => 'NIL',
'bodyid' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.1.1',
'bodysubtype' => 'PLAIN',
'_prefix' => '1.1.1',
'bodysize' => '1971',
'bodytype' => 'TEXT',
'bodyMD5' => 'NIL',
'textlines' => '74',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'quoted-printable'
}, 'Mail::IMAPClient::BodyStructure' ),
bless( {
'bodyparms' => {
'charset' => 'utf-8'
},
'bodyextra' => 'NIL',
'bodyid' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.1.2',
'bodysubtype' => 'HTML',
'_prefix' => '1.1.2',
'bodysize' => '23364',
'bodytype' => 'TEXT',
'bodyMD5' => 'NIL',
'textlines' => '331',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'quoted-printable'
}, 'Mail::IMAPClient::BodyStructure' )
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' ),
'1' => bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_001_EA96_2BF8DEDE.32622D51'
},
'bodyextra' => undef,
'bodydisp' => 'NIL',
'_id' => 1,
'bodysubtype' => 'related',
'_prefix' => 1,
'bodytype' => 'MULTIPART',
'bodystructure' => [
$VAR1->{'PartsIndex'}{'1.1'},
bless( {
'bodyparms' => 'NIL',
'bodyid' => '<5dff39db-e81c-4410-be75-8662564fd328>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.2',
'bodysubtype' => 'jpeg',
'_prefix' => '1.2',
'bodysize' => '14406',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
$VAR1->{'PartsIndex'}{'1.3'},
bless( {
'bodyparms' => 'NIL',
'bodyid' => '<717f2ef4-f795-4d1c-87cc-283c9b0a59b0>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.4',
'bodysubtype' => 'gif',
'_prefix' => '1.4',
'bodysize' => '2912',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' )
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.2' => $VAR1->{'PartsIndex'}{'1'}{'bodystructure'}[1],
'1.1.2' => $VAR1->{'PartsIndex'}{'1.1'}{'bodystructure'}[1],
'2' => bless( {
'bodyparms' => {
'name' => 'BKD-7361945220.pdf'
},
'bodyid' => 'NIL',
'bodyextra' => 'NIL',
'bodydisp' => {
'attachment' => {
'filename' => 'BKD-7361945220.pdf'
}
},
'_id' => 2,
'bodysubtype' => 'octetstream',
'_prefix' => 2,
'bodysize' => '47540',
'bodytype' => 'application',
'bodyMD5' => 'NIL',
'bodystructure' => [],
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.4' => $VAR1->{'PartsIndex'}{'1'}{'bodystructure'}[3],
'1.1.1' => $VAR1->{'PartsIndex'}{'1.1'}{'bodystructure'}[0]
},
'_prefix' => 'HEAD',
'PartsList' => [
1,
'1.1',
'1.1.1',
'1.1.2',
'1.2',
'1.3',
'1.4',
2
],
'bodytype' => 'MULTIPART',
'bodystructure' => [
$VAR1->{'PartsIndex'}{'1'},
$VAR1->{'PartsIndex'}{'2'}
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' );
As you can see, none of the values is guaranteed to be present on every part in the PartsIndex, and some of the parts are nested.
The fields of interest for each PartsIndex item are bodytype, bodysubtype, and bodyenc.
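As a rough sketch of how those three fields could be pulled out of the nested structure, the code below walks a hand-copied excerpt of the dump as plain hashes, so it runs without an IMAP connection. The function name text_parts is hypothetical, and it reads the hash keys directly only because they are visible in the dump; the real object may offer accessor methods that are a safer choice.

```perl
use strict;
use warnings;

# Hand-copied excerpt of the dumped structure, reduced to the fields of interest.
my $structure = {
    _id => 'HEAD', bodytype => 'MULTIPART', bodysubtype => 'mixed',
    bodystructure => [
        { _id => '1', bodytype => 'MULTIPART', bodysubtype => 'related',
          bodystructure => [
              { _id => '1.1', bodytype => 'MULTIPART', bodysubtype => 'alternative',
                bodystructure => [
                    { _id => '1.1.1', bodytype => 'TEXT', bodysubtype => 'PLAIN',
                      bodyenc => 'quoted-printable' },
                    { _id => '1.1.2', bodytype => 'TEXT', bodysubtype => 'HTML',
                      bodyenc => 'quoted-printable' },
                ] },
              { _id => '1.2', bodytype => 'image', bodysubtype => 'jpeg',
                bodyenc => 'base64' },
          ] },
        { _id => '2', bodytype => 'application', bodysubtype => 'octetstream',
          bodyenc => 'base64', bodystructure => [] },
    ],
};

# Hypothetical helper: recursively collect the text parts, tolerating
# missing keys and the mixed upper/lower case seen in the dump.
sub text_parts {
    my ($node) = @_;
    my @found;
    if (lc($node->{bodytype} // '') eq 'text') {
        push @found, {
            id       => $node->{_id},
            subtype  => lc($node->{bodysubtype} // ''),
            encoding => lc($node->{bodyenc} // '7bit'),
        };
    }
    push @found, map { text_parts($_) } @{ $node->{bodystructure} || [] };
    return @found;
}

printf "%s text/%s %s\n", @{$_}{qw(id subtype encoding)}
    for text_parts($structure);
```

The part numbers collected this way (1.1.1 and 1.1.2 here) could then presumably be fetched individually, for example with $imap->bodypart_string($msg, $id), and decoded according to the recorded encoding.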