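I'm using a bash script (below) on a remote server (which so far I connect to via ssh) to run a Python script that downloads a large number of PDF files in a loop. The download locations come from a text file of URLs.

I'd like to move the files from the remote server to my local computer as they are downloaded, and then delete them from the remote server. Is there a way to extend my bash script to do this? Or is there another way to accomplish this task?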

 while read -r line; do python python_script.py -l "$line"; done < pdfURLs.txt


1 Answer


[Edited to reflect the fact that the original poster can't scp into his local computer from the server; I assume it's behind NAT or something of the sort]

[Edit 2: I'm keeping the current tunnel-based answer, for reference; but, since the original poster is unable to ssh back into his local machine, I'll assume something else is blocking the tunnel. See the suggestion at the end].

OK, you'll need to open a tunnel between the server and your home computer. To do that, ssh from your local computer (I assume it's Unix-based; you mentioned it's a Mac, so that's fine) into the server with this command:

ssh -R 10022:localhost:22 your_server_address

In brief, this will forward the server's port 10022 (it's a high (> 1024) port, so it's likely to be available) to your local computer's port 22 (which is where ssh usually listens). That is, once you've done that, if you ssh into the server's 10022 port, you're actually sshing into your local computer. If you want to test it, from the server, do:

ssh -p 10022 localhost

Log in with your local computer's username and password, and you should see its shell prompt. If you do this test, remember to log out afterwards, so as not to confuse yourself.
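If you'd rather check the tunnel without an interactive login, something along these lines might work when run on the server. It is only a sketch: `tunnel_up` and `SSH_CMD` are hypothetical names introduced here (the latter just lets you substitute a stub for a dry run), and because `BatchMode=yes` refuses to prompt, it assumes key-based authentication is already set up and that your username is the same on both machines.

```shell
# Hypothetical helper, meant to be run on the server.
# SSH_CMD is an assumption added for illustration; it defaults to the real ssh.
SSH_CMD=${SSH_CMD:-ssh}

tunnel_up() {
    # -p 10022           : the forwarded port opened by "ssh -R 10022:localhost:22"
    # BatchMode=yes      : fail instead of asking for a password
    # ConnectTimeout=5   : give up after five seconds
    # "true"             : remote command that does nothing and exits with status 0
    $SSH_CMD -p 10022 -o BatchMode=yes -o ConnectTimeout=5 localhost true
}

# Example use (commented out so that sourcing this file has no side effects):
# if tunnel_up; then echo "tunnel OK"; else echo "tunnel down"; fi
```

The function's exit status is simply that of ssh, so it can be used directly in an `if` or before starting the download loop.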

Once you've opened the tunnel, keep that connection open. You may use it to run the bash command line that downloads the PDF etc, but that's not necessary.

Then, try the following command-line:

while read -r line; do python python_script.py -l "$line"; scp -P 10022 *.pdf localhost:path/to/put/files/ && rm *.pdf; done < pdfURLs.txt

A few things to keep in mind:

  • This waits until scp has finished, and only then will the python script download the next PDF. You mentioned that this is effectively what you wanted, so that the PDF files don't stay on the server for long.
  • This copies all PDF files from the current directory to your local computer (and then erases them), so preferably run this from a previously empty directory.
  • I assume you can scp without having to type a password (using shared key authentication, for instance), otherwise it might get a bit annoying, having to retype your password all the time.
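As a rough sketch of the points above, the copy-and-delete step could be wrapped in a small function that removes each PDF only after its own copy succeeded, and that does nothing when the directory contains no PDF. The names `ship_pdfs`, `COPY_CMD` and `DEST` are hypothetical and introduced only for illustration; the defaults mirror the scp command and the placeholder path used in the loop above.

```shell
# Hypothetical helper, meant to be run on the server, in the download directory.
# COPY_CMD and DEST are assumptions; adjust them to your own setup.
COPY_CMD=${COPY_CMD:-"scp -P 10022"}
DEST=${DEST:-"localhost:path/to/put/files/"}

ship_pdfs() {
    for f in *.pdf; do
        # If no PDF exists, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # COPY_CMD is deliberately unquoted so "scp -P 10022" splits into words.
        if $COPY_CMD "$f" "$DEST"; then
            rm -- "$f"                      # delete only after a successful copy
        else
            echo "transfer failed, keeping $f" >&2
            return 1
        fi
    done
}

# Example use (commented out so that sourcing this file has no side effects):
# while read -r line; do
#     python python_script.py -l "$line"
#     ship_pdfs || break
# done < pdfURLs.txt
```

Stopping the loop on the first failed transfer is a design choice; you may prefer to continue and retry the leftover files later.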

That should do it.

[Edited to add this alternative, for when the tunnel doesn't work]

If that fails, I can only assume something else is blocking your ssh/scp from the server to your local machine. In that case, you may try something different. From your local machine, do:

while read -r line; do ssh -n server_address "cd tmp_download_directory && rm -f *.pdf && python python_script.py -l '$line'" && scp server_address:tmp_download_directory/*.pdf /local/path/to/put/files/; done < pdfURLs.txt; ssh server_address "rm -f tmp_download_directory/*.pdf"

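(The "-n" switch to ssh is necessary so that ssh does not read from the loop's standard input; without it, the subsequent lines of pdfURLs.txt would be fed into the ssh session instead of into "read".)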
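To see why this matters, here is a small self-contained illustration that needs no server. It uses `cat` as a stand-in for ssh (an assumption for demonstration only): like ssh without "-n", it reads whatever is left on standard input, while redirecting its input from /dev/null imitates what "-n" does.

```shell
# Three placeholder "URLs", one per line.
urls=$(printf 'url1\nurl2\nurl3\n')

# Without protection: cat (standing in for ssh) swallows the remaining lines,
# so the loop body runs only once.
greedy=$(printf '%s\n' "$urls" | while read -r line; do
    echo "got $line"
    cat >/dev/null
done)

# With stdin taken from /dev/null (what "ssh -n" does): every line is processed.
polite=$(printf '%s\n' "$urls" | while read -r line; do
    echo "got $line"
    cat </dev/null >/dev/null
done)

echo "without redirect:"; echo "$greedy"
echo "with redirect:";    echo "$polite"
```

The first loop reports only the first line, the second reports all three.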

answered 2010-06-17T15:48:54.767