
I have a problem with my Bash script that splits all the files in a directory into groups of roughly 1 GB each.

I have a script that looks like this:

#!/bin/bash
path=$1
unset i
echo $path start
fpath=`pwd`"/files"
find "$path" -type f>files
max=`wc -l $fpath | awk '{printf $1}'`
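# first loop: store the name and size (as reported by du) of every file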
while read file; do
    files[i]=$file
    size[i]=$(du -s $file | awk '{printf $1}')
    ((i++))
    echo -ne $i/$max'\r'
done < `pwd`"/files"
echo -ne '\n'
echo 'sizes and filenames done'
unset weight index groupid
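# second loop: fill groups, writing one out each time the size limit is exceeded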
for item in  ${!files[*]}; do
    weight=$((weight+${size[$item]})) 
    group[index]=${files[$item]}
    ((index++))
    if [ $weight -gt "$((2**30))" ]; then
        ((groupid++))
        for filename in "${group[@]}"
        do 
            echo $filename
        done >euenv.part"$groupid"
        unset group index weight
     fi
done
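# write the last, partially filled group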
((groupid++))
for filename in "${group[@]}"
do 
    echo $filename
done >euenv.part"$groupid"
echo 'done'

It works, but it is very slow. Can anyone give me some advice on how to make it faster? Thanks.


1 Answer


Below are a few suggestions. I have not implemented them myself, so I cannot say how much of a performance improvement each one will give, but I hope they point you in the right direction; a rough sketch combining all three follows the list.

  • The first loop can be avoided if, in the second loop, you compute each file's size on the fly instead of reading it from the size array, i.e. replace

weight=$((weight+${size[$item]}))

with:

size=$(du -s ${files[$item]} | awk '{printf $1}')
weight=$((weight+size))

  • The temporary file files can be avoided if you replace

for item in ${!files[*]}; do

with

find "$path" -type f | while read file; do

and replace every ${files[$item]} with ${file}.

  • Calling du for every file can be avoided if, instead of

find "$path" -type f

you use

find "$path" -type f -ls

and extract the size and name columns from its output.
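Putting the three suggestions together, a rough, untested sketch of the whole script might look like the one below. It assumes GNU find, where the seventh column of the -ls output is the file size in bytes, and filenames without whitespace; it also feeds the loop through process substitution instead of a plain pipe, so that the group variables are still visible after the loop, where the last partial group is written out (printf writes each group in a single call instead of an inner echo loop).

#!/bin/bash
# One find pass supplies both size and name: no temporary file, no per-file du.
path=$1
unset weight index groupid
while read -r _ _ _ _ _ _ size _ _ _ file; do   # column 7 = size in bytes, last column = name
    weight=$((weight + size))
    group[index]=$file
    ((index++))
    if [ "$weight" -gt "$((2**30))" ]; then     # group has reached 1 GiB
        ((groupid++))
        printf '%s\n' "${group[@]}" > euenv.part"$groupid"
        unset group index weight
    fi
done < <(find "$path" -type f -ls)
# write the last, partially filled group, if any
if [ "${#group[@]}" -gt 0 ]; then
    ((groupid++))
    printf '%s\n' "${group[@]}" > euenv.part"$groupid"
fi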

answered 2013-05-08T16:22:52.283