6

I am getting started with Snakemake and I have a very basic question which I couldnt find the answer in snakemake tutorial.

I want to create a single rule snakefile to download multiple files in linux one by one. The 'expand' can not be used in the output because the files need to be downloaded one by one and wildcards can not be used because it is the target rule.

The only way comes to my mind is something like this which doesnt work properly. I can not figure out how to send the downloaded items to specific directory with specific names such as 'downloaded_files.dwn' using {output} to be used in later steps:

links=[link1,link2,link3,....]
rule download:    
output: 
    "outdir/{downloaded_file}.dwn"
params: 
    shellCallFile='callscript',
run: 
    callString=''
    for item in links:
        callString+='wget str(item) -O '+{output}+'\n'
    call('echo "' + callString + '\n" >> ' + params.shellCallFile, shell=True)
    call(callString, shell=True)

I appreciate any hint on how this should be solved and which part of snakemake I didnt understand well.

4

1 回答 1

5

这是一个可以帮助您解决问题的注释示例:

# Create some way of associating output files with links
# The output file names will be built from the keys: "chain_{key}.gz"
# One could probably directly use output file names as keys 
links = {
    "1" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/chain_1.gz", "outdir/chain_2.gz", "outdir/chain_3.gz"]
        # Note that we don't need to use {output} in the "run" or "shell" part.
        # This list will be used if we later add rules
        # that use the files generated by the present rule.
        expand("outdir/chain_{n}.gz", n=links.keys())
    run:
        # The sort is there to ensure the files are in the 1, 2, 3 order.
        # We could use an OrderedDict if we wanted an arbitrary order.
        for link_num in sorted(links.keys()):
            shell("wget {link} -O outdir/chain_{n}.gz".format(link=links[link_num], n=link_num))

这是另一种方法,它为下载的文件使用任意名称并使用output(虽然有点人为):

links = [
    ("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
    ("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
    ("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz")]


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/foo_chain.gz", "outdir/bar_chain.gz", "outdir/baz_chain.gz"]
        ["outdir/{f}".format(f=filename) for (filename, _) in links]
    run:
        for i in range(len(links)):
            # output is a list, so we can access its items by index
            shell("wget {link} -O {chain_file}".format(
                link=links[i][1], chain_file=output[i]))
        # using a direct loop over the pairs (filename, link)
        # could be considered "cleaner"
        # for (filename, link) in links:
        #     shell("wget {link} -0 outdir/{filename}".format(
        #         link=link, filename=filename))

可以使用以下方式并行完成三个下载的示例snakemake -j 3

# To use os.path.join,
# which is more robust than manually writing the separator.
import os

# Association between output files and source links
links = {
    "foo_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "bar_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "baz_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


# Make this association accessible via a function of wildcards
def chainfile2link(wildcards):
    return links[wildcards.chainfile]


# First rule will drive the rest of the workflow
rule all:
    input:
        # expand generates the list of the final files we want
        expand(os.path.join("outdir", "{chainfile}"), chainfile=links.keys())


rule download:
    output:
        # We inform snakemake what this rule will generate
        os.path.join("outdir", "{chainfile}")
    params:
        # using a function of wildcards in params
        link = chainfile2link,
    shell:
        """
        wget {params.link} -O {output}
        """
于 2017-06-16T13:08:57.373 回答