0

I have extracted data from one source to .txt file. Source is some sort of address book and I used macro recorder for extraction. Now I have several files which are formated exactly in next way (example on 4 contacts):

Abbrucharbeiten
ATR Armbruster
Werkstr. 28
78727 Oberndorf  
Tel. 0175 7441784
Fax 07423 6280
Abbrucharbeiten
Jensen & Sohn, Karl
Schallenberg 6A
25587 Münsterdorf
Tel. 04821 82538
Fax 04821 83381
Abbrucharbeiten
Kiwitt, R.
Auf der Heide 54
48282 Emsdetten
Tel. 02572 88559
Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau
Josef Grabmeier GmbH
Reitgesing 1
85560 Ebersberg
Tel. 08092 24701-0
Fax 08092 24701-24

1st row is always field(name) of bussines 2nd row is always name of company/firm 3rd row is always street adress 4th row is always Zip code and Place and then 5th row and next couple of rows (sometimes are two rows sometimes more) are eithar Tel. or Fax.

I want to format it so it would be something like excel sheet like:

Branche:    Name:     Address:   Place:    contact1:   contact2:
1st row     2nd row   3rd row    4th row   5th row     6th row.....

Now the main problem is I have over 500.000 contacts and my main problems are last fields which aren't always the same number... I don't wan't to do it manually, please help me...

4

1 回答 1

2

既不python也不visual basic但不应该很难翻译成这些语言。这是perl.

perl -lne '
        ## Print header. Either the header and data will be separated with pipes.
        ## Contacts(contact1, contact2, etc) are not included because at this 
        ## moment I can not know how many there will be. It could be done but script
        ## would be far more complex.
        BEGIN { 
                push @header, q|Branche:|, q|Name:|, q|Address:|, q|Place:|;
                printf qq|%s\n|, join q{|}, @header;
        }

        ## Save information for each contact. At least six lines. Over that only
        ## if lines begins with strings "Tel" or "Fax".
        if ( $line < 6 || m/\A(?i)tel|fax/ ) {
                push @contact_info, $_;
                ++$line;

                ## Not skip the printing of last contact.
                next unless eof;
        }

        ## Print info of contact, initialize data structures and repeat process
        ## for the next one.
        printf qq|%s\n|, join q{|}, @contact_info;

        $line = 0;
        undef @contact_info;

        push @contact_info, $_;
        ++$line;

' infile

它是单行的(我知道它似乎没有,但你可以去掉注释并删除换行符来获取它),所以直接从你的 shell 运行它。它产生:

Branche:|Name:|Address:|Place:
Abbrucharbeiten|ATR Armbruster|Werkstr. 28|78727 Oberndorf  |Tel. 0175 7441784|Fax 07423 6280
Abbrucharbeiten|Jensen & Sohn, Karl|Schallenberg 6A|25587 Münsterdorf|Tel. 04821 82538|Fax 04821 83381
Abbrucharbeiten|Kiwitt, R.|Auf der Heide 54|48282 Emsdetten|Tel. 02572 88559|Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau|Josef Grabmeier GmbH|Reitgesing 1|85560 Ebersberg|Tel. 08092 24701-0|Fax 08092 24701-24

考虑到我没有打印完整的标题并且字段用管道分隔。我认为在 Excel 中导入它没有问题。

于 2013-05-29T09:15:21.593 回答