image - Make an array of the links contained in a big string

Question

I have a big string (the HTML code of a web page).

The problem is how to parse out the links to the images.

I want to build an array containing all the links to images on that web page.

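I know how to do this in Java, but I don't know how to parse a string and do string manipulation in the shell. I know there are lots of tricks, and I suppose this is easy to do.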

In the end I want to get something like this:

    #!/bin/bash

    # Fetch the page. $(...) captures the whole page; `read` would stop at the first newline.
    BIG_STRING=$(curl -s some_web_page_with_links_to_images.com)

    # Parse the big string and fill the LINKS array somehow (.jpg and .png only).
    # After parsing, LINKS should look like this:
    LINKS=("www.asd.com/asd1.jpg" "www.asd.com/asd.jpg" "www.asd.com/asd2123.jpg")

    # I need the parsing step that fills LINKS with the links from the web page

    # Get the length of the array
    tLen=${#LINKS[@]}

    for (( i=0; i<tLen; i++ )); do
      echo "${LINKS[$i]}"
    done

Thanks, your answers saved me days of frustration.

score 1 · Accepted Answer

Why not start with the right tool? Parsing HTML is hard, especially with sed. If you have the mojo tool from the Mojolicious project, you can do this:

    mojo get http://example.com a attr href

Then just check whether each line ends in jpg, png, or whatever extension you need.
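To turn that output into the LINKS array from the question, the "check each line" step can be done with grep and a read loop. This is a minimal sketch assuming bash (for arrays and process substitution). The sample URLs and the helper name `filter_image_links` are hypothetical; in real use you would pipe the output of `mojo get ... a attr href` into the helper instead. If the images are embedded in the page rather than linked, they appear in `<img src="...">` tags, so a selector like `img attr src` is probably what you want.

```shell
#!/bin/bash
# Hypothetical helper: keep only lines (URLs) ending in .jpg or .png, case-insensitive.
filter_image_links() {
  grep -Ei '\.(jpg|png)$'
}

# Hypothetical stand-in for the output of: mojo get http://example.com a attr href
sample_hrefs='http://example.com/index.html
http://example.com/a.jpg
http://example.com/b.PNG
http://example.com/page.php'

# Read the filtered lines into the LINKS array, one element per line
LINKS=()
while IFS= read -r link; do
  LINKS+=("$link")
done < <(printf '%s\n' "$sample_hrefs" | filter_image_links)

for link in "${LINKS[@]}"; do
  echo "$link"
done
```

A while-read loop is used instead of `mapfile` so the sketch also runs on bash 3.2, which lacks `mapfile` (it was added in bash 4).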

score 0 · Accepted Answer

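It's hard to give more than an approximation. Let's assume that all the interesting links are in href="" attributes, that there is at most one href attribute per line, and that each link fits on a single line (I'm actually not sure whether line breaks are allowed in URLs).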

Suppose your source file is called test.html.

Under these assumptions, the following should print all the links:

    sed -n 's/.*\<href="\([^"]*\)".*/\1/p' test.html

To understand how it works, you should know what regular expressions are and have read a tutorial on sed (in particular, how the s (substitute) command works).
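To get from that sed output to the LINKS array in the question, one option is to filter the printed links with grep and read them into the array. This is a sketch under the same assumptions as the answer (at most one href="..." per line). It also assumes GNU sed, because `\<` (start of word) is a GNU extension. The HTML is a hypothetical sample written to a temporary file so the example runs on its own.

```shell
#!/bin/bash
# Hypothetical sample page, one href per line, written to a temp file
html_file=$(mktemp)
cat > "$html_file" <<'EOF'
<a href="www.asd.com/asd1.jpg">first</a>
<p>no link on this line</p>
<a href="www.asd.com/page.html">a page, not an image</a>
<a href="www.asd.com/asd2123.png">second</a>
EOF

# sed prints the href value of each matching line; grep keeps only .jpg/.png
LINKS=()
while IFS= read -r link; do
  LINKS+=("$link")
done < <(sed -n 's/.*\<href="\([^"]*\)".*/\1/p' "$html_file" | grep -Ei '\.(jpg|png)$')
rm -f "$html_file"

# Same loop as in the question
tLen=${#LINKS[@]}
for (( i=0; i<tLen; i++ )); do
  echo "${LINKS[$i]}"
done
```

With the page already in a variable, as in the question, you could feed it in with `printf '%s\n' "$BIG_STRING" | sed -n ...` instead of reading a file.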
