I wrote a bash script to split a file. The file looks like this:

@<TRIPOS>MOLECULE
ZINC32514653
....
....

@<TRIPOS>MOLECULE
ZINC982347645
....
....

Here is the script I wrote:

#!/bin/bash
# split the file into pieces named xx*.mol2
csplit -b %d.mol2 ./Zincpharmer_ligprep_1.mol2 '/@<TRIPOS>MOLECULE/' '{*}'
# rename each piece after its 2nd line, which is the ZINC###### ID
for filename in ./xx*.mol2; do
    # read the 2nd line of the piece directly (no echo/pipe needed)
    newFilename=$(sed -n 2p "$filename")
    if [ ! -e "./$newFilename.mol2" ]; then
        mv -i "$filename" "./$newFilename.mol2"
    else
        # name already taken: try _2, _3, ... until a free name is found
        num=2
        while [ -e "./${newFilename}_$num.mol2" ]; do
            num=$((num+1))
        done
        mv "$filename" "./${newFilename}_$num.mol2"
    fi
done

I have two questions:

1) Is there a way to use csplit's prefix option so that the prefix is taken from the line after the separator?

2) The first file created by csplit, xx00, is empty because the separator is on the first line of the input. How can I avoid this?

The expected output would be files named ZINC32514653.mol2 and ZINC982347645.mol2 and, in case there are two entries with the same ZINC### ID, ZINC982347645_2.mol2.

2 Answers

All you need to know is available from the csplit man page.

To change the prefix:

-f, --prefix=PREFIX
       use PREFIX instead of 'xx'

To exclude empty files:

-z, --elide-empty-files
       remove empty output files
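
Put together, the two options might be used as in the following minimal sketch. It assumes GNU csplit; the function name split_mol2 and the prefix mol_ are illustrative choices, not anything from the question. Note that -f takes a fixed string, so naming each piece after its ZINC ID would still need the rename loop from the question.

```shell
#!/bin/bash
# Sketch: split a mol2 file at every "@<TRIPOS>MOLECULE" line.
# split_mol2 and the "mol_" prefix are hypothetical names used for illustration.
split_mol2() {
    local input=$1
    # -s: do not print the size of each piece
    # -z: remove empty output files (the piece before the first separator)
    # -f: use "mol_" instead of the default "xx" prefix
    # -b: printf-style suffix, giving e.g. mol_000.mol2
    csplit -s -z -f mol_ -b '%03d.mol2' "$input" '/@<TRIPOS>MOLECULE/' '{*}'
}
```

Calling split_mol2 ./Zincpharmer_ligprep_1.mol2 writes the pieces into the current directory.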
answered 2016-07-28T12:01:45.720

Naming the output files after the line that follows the separator can't be done with csplit alone. I recommend something along the lines of:

awk '/@<TRIPOS>MOLECULE/ { hdr = $0; getline name; file = name ".mol2"; print hdr "\n" name > file; next } { print > file }' ./Zincpharmer_ligprep_1.mol2
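
A somewhat fuller sketch of the same awk idea follows. It writes the separator and name lines into each piece, adds the .mol2 extension, and numbers repeated names as NAME_2.mol2, NAME_3.mol2 and so on, as the question asks. The function name split_by_name is hypothetical, and the sketch assumes the name line contains nothing that is invalid in a file name. Files with the same names left over from an earlier run are overwritten.

```shell
#!/bin/bash
# Sketch: split on "@<TRIPOS>MOLECULE" and name each piece after the next line.
# split_by_name is a hypothetical helper name used for illustration.
split_by_name() {
    awk '
        /@<TRIPOS>MOLECULE/ {
            header = $0
            if ((getline name) <= 0) exit   # separator without a name line: stop
            if (out != "") close(out)       # keep the number of open files low
            count[name]++
            # first occurrence: NAME.mol2; repeats: NAME_2.mol2, NAME_3.mol2, ...
            out = (count[name] == 1) ? (name ".mol2") : (name "_" count[name] ".mol2")
            print header > out              # keep the separator line in the piece
            print name > out
            next
        }
        out != "" { print > out }           # lines before the first separator are skipped
    ' "$1"
}
```

Run it as split_by_name ./Zincpharmer_ligprep_1.mol2 from the directory where the pieces should be created.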
answered 2016-07-28T13:02:59.733