
Say I have 50 folders, each containing a varying number of file pairs that serve as input to a command-line tool.

#for f in ./*shuf; do #lists all the directories
    #FILES=${f}/*.fastq #to get all the fastq files in the directory

    FILES="./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq"

What I need to do is split the files into their respective pairs (one r and one f file per name), so that a single pair looks like this:

echo $PAIR

./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq

I will then use each pair as input, which has to be in this format

 (`basename ${PAIR%_*}; $PAIR`):
 C115_7.121017_1 ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq

Then I want to loop over all the pairs.

I tried this:

IFS=' ' read -ra ADDR <<< "$FILES"
echo "${ADDR[ ]}"

But I get the error ${ADDR[ ]}: bad substitution. Could you also explain the approach? I really want to learn how it works.

Edit:

To clarify:

This is roughly the output I am looking for:

 IFS=' ' read -ra ADDR <<< "$FILES"
 pairs="${ADDR[@]}"
 for afile in ${pairs}; do bfile=${afile%_*}; echo ${bfile}_r.fastq ${bfile}_f.fastq; done

But without the duplicates:

./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq

2 Answers

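shopt -s nullglob  # unmatched globs expand to nothing instead of themselves

KEYS=()            # sample names, in first-seen order
declare -A MAP=()  # sample name -> space-separated list of its files

for D in ./*shuf; do
    for F in "$D"/*.fastq; do
        # Strip the directory, then the trailing _f.fastq / _r.fastq.
        KEY=${F##*/} KEY=${KEY%_*}
        [[ -z ${MAP[$KEY]} ]] && KEYS+=("$KEY")
        MAP[$KEY]+=" $F"
    done
    # Print one line per pair: the name followed by its files.
    for KEY in "${KEYS[@]}"; do
        echo "${KEY}${MAP[$KEY]}"
    done
    # Reset for the next directory.
    KEYS=()
    MAP=()
done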

Or, collecting across all directories first and printing once at the end:

shopt -s nullglob

KEYS=()
declare -A MAP=()

for D in ./*shuf; do
    for F in "$D"/*.fastq; do
        KEY=${F##*/} KEY=${KEY%_*}
        [[ -z ${MAP[$KEY]} ]] && KEYS+=("$KEY")
        MAP[$KEY]+=" $F"
    done
done

for KEY in "${KEYS[@]}"; do
    echo "${KEY}${MAP[$KEY]}"
done

You need Bash 4.0 or newer, since associative arrays (declare -A) are not available before that. Good luck.
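
As for the error in the question: a subscript that is only whitespace, as in ${ADDR[ ]}, is what triggers the reported bad substitution; the subscript should be a numeric index, @, or *. Below is a minimal sketch of splitting the FILES string into an array and consuming it two elements at a time. It assumes the forward and reverse files alternate in the list, as they do in the sample FILES string. The function name pair_up and the short sample paths are hypothetical, used only for illustration.

```bash
#!/usr/bin/env bash

# pair_up "space separated file list"
# Hypothetical helper: prints "name forward reverse" for each consecutive pair.
# Assumes the list alternates _f / _r files and that paths contain no spaces.
pair_up() {
    local -a addr
    local i f r name
    # Split the string on spaces into an indexed array.
    IFS=' ' read -ra addr <<< "$1"
    # "${addr[@]}" expands to every element; "${#addr[@]}" is the element count.
    for ((i = 0; i + 1 < ${#addr[@]}; i += 2)); do
        f=${addr[i]}
        r=${addr[i+1]}
        name=${f##*/}      # drop the directory part
        name=${name%_*}    # drop the trailing _f.fastq
        printf '%s %s %s\n' "$name" "$f" "$r"
    done
}

# Sample data standing in for the FILES string from the question.
pair_up "dir/a_f.fastq dir/a_r.fastq dir/b_f.fastq dir/b_r.fastq"
```

Stepping through the array by index avoids the duplicates in the question's loop, which visited every file and so printed each pair twice.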

answered 2014-07-03T17:55:36.263
for f in *shuf; do
  files=( "$f"/*.fastq ) # an array of files, NOT a string
  for file in "${files[@]}"; do # expands each element into a separate parameter
    # write output; note that this is DANGEROUS because it's newline-terminating
    # ...filenames which can potentially themselves contain newlines.
    printf '%s %s\n' "$(basename "${file%_*}")" "$file"
  done
done
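
The loop above prints one line per file. To get one line per pair in the "name forward reverse" format the question asks for, one possible variation is to glob only the _f files and derive the matching _r name from each. This is a sketch under the assumption that every pair follows the NAME_f.fastq / NAME_r.fastq naming shown in the question; list_pairs is a hypothetical function name.

```bash
#!/usr/bin/env bash
shopt -s nullglob   # a glob with no matches expands to nothing

# list_pairs DIR
# Hypothetical helper: prints "name forward reverse" for each complete pair in DIR.
list_pairs() {
    local dir=$1 fwd rev name
    for fwd in "$dir"/*_f.fastq; do
        rev=${fwd%_f.fastq}_r.fastq    # derive the reverse file's path
        [[ -e $rev ]] || continue      # skip forward files with no mate
        name=${fwd##*/}                # drop the directory part
        name=${name%_f.fastq}          # drop the suffix
        # Newline-terminated output: same caveat as above for odd filenames.
        printf '%s %s %s\n' "$name" "$fwd" "$rev"
    done
}

for d in ./*shuf; do
    list_pairs "$d"
done
```

Because each forward file is visited exactly once, no pair is printed twice, and no Bash 4 features are needed.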
answered 2014-07-03T17:39:52.330