bash - Need the file name from a URL that is mostly junk apart from the name (advanced Bash)

Question

http://romhustler.net/file/54654/RFloRzkzYjBxeUpmSXhmczJndVZvVXViV3d2bjExMUcwRmdhQzltaU5UUTJOVFE2TVRrM0xqZzNMakV4TXk0eU16WTZNVE01TXpnME1UZ3pPRHBtYVc1aGJGOWtiM2R1Ykcy<5aF _

http://romhustler.net/rom/ps2/final-fantasy-x-usa <-- parent URL

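If you copy and paste this URL, you will see that the browser recognizes the file name. How can I get a bash script to do the same?

I need to wget the first URL, but since this will be used for more than 100 items, I can't copy and paste each URL by hand.

I currently have a menu set up for all the files. I just don't know how to batch-download each file individually, because the file URLs don't follow any matching pattern.

My working menu: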

    # Raw game list grabber
    w3m http://romhustler.net/roms/ps2 | grep -E "/5" > rawmenu.txt

    # Split the list into files (games00, games01, ...) of 10 lines each.
    # -d gives the output files numeric suffixes instead of alphabetic ones.
    split -l 10 -d rawmenu.txt games

    # s/ /_/g    - replaces spaces with underscores
    # s/__.*//g  - removes everything from the first double underscore onward
    select opt in \
        $(cat games0$num | sed -e 's/ /_/g' -e 's/__.*//g') \
        "Next" \
        "Quit"
    do
        if [[ "$opt" =~ "${lines[0]}" ]]
        then
            ### Here the URL needs to be grabbed ###

This has to be done in Bash. Is that possible?

score 0 · Accepted Answer

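romhustler.net appears to use some JavaScript on its full download page to hide the final download link for a few seconds after the page loads, presumably to prevent this kind of web scraping.

However, if they used a direct link to, say, a ZIP file, we could do something like this: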

    # Use curl to fetch the listing page and grep to extract the path of each ROM page
    curl -s http://romhustler.net/roms/ps2 | grep -Eo "rom/ps2/[a-zA-Z0-9_-]+" > rawmenu.txt

    # Loop over those paths and extract the full download link from each ROM page
    while read -r LINK
    do
        # Extract the first download link (note the "/" added before $LINK,
        # since the paths in rawmenu.txt have no leading slash)
        FULLDOWNLOAD=$(curl -s "http://romhustler.net/$LINK" | grep -Eo "/download/[0-9]+/[a-zA-Z0-9]+" | head -n 1)
        # Skip pages where no link was found; otherwise download the file
        [ -n "$FULLDOWNLOAD" ] || continue
        wget "http://romhustler.net$FULLDOWNLOAD"
    done < "rawmenu.txt"
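
On the original question of getting the name the browser shows: browsers normally take it from the `Content-Disposition` response header, so a script can read the same header. The sketch below assumes the server actually sends that header for the download URL (not verified for romhustler.net). `filename_from_headers` is a hypothetical helper name, and the URL in the comments is only a placeholder.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original answer): read raw HTTP
# response headers on stdin and print the file name found in the last
# Content-Disposition header, or nothing if there is none.
filename_from_headers() {
    tr -d '\r' |                                  # strip CRs from CRLF line endings
        grep -i '^content-disposition:' |         # header names are case-insensitive
        tail -n 1 |                               # after redirects, keep the final response
        sed -n -E 's/.*filename="?([^";]+)"?.*/\1/p'
}

# Possible use (placeholder URL, assumed for illustration):
#   url="http://example.com/file/12345/abcdef"
#   name=$(curl -sIL "$url" | filename_from_headers)
#   wget -O "${name:-download.bin}" "$url"
```

If parsing the header by hand is not needed, `wget --content-disposition "$url"` or `curl -OJ "$url"` ask the tools to apply the server-supplied name themselves. Some servers answer `HEAD` requests differently from `GET`, so the header check may need adjusting.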
