html - Passing HTML source code to a bash script and manipulating it

Question

I apologize if a question like this has already been answered, but I'm too new to bash scripting to tell whether it has.

I want to pass a web page's HTML source to a script so that it can modify or scrape the page's HTML markup. One example of what I tried:

cat webpage.htm | ./dosomething

The code for dosomething is as follows:

#!/bin/bash

export LC_ALL='C'

echo "testing"
echo $1 #this is the part where I'd like to be able to access the html that I've passed into the script
echo "still testing"
sed 's/<[^>]*>//g' < $1 #trying to strip the html tags of the webpage that I've passed in

When cat didn't work, I tried:

./dosomething < webpage.htm

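My script didn't work that way either. The script needs to read the HTML from standard input and modify it before writing the modified HTML to standard output; I can't pass the web page as an actual argument, like this: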

./dosomething webpage.htm

score 1 · Accepted Answer

If you want to strip the HTML tags from a web page, command-line browsers have already solved that. Have a look at lynx's -dump option:

lynx -dump http://www.subir.com/lynx.html

elinks has a similar option; I'm not so sure about w3c.
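Since the question requires reading the page from standard input rather than from a URL, lynx's -stdin option may fit better. The sketch below assumes lynx is installed; -dump, -stdin, -force_html and -nolist are lynx options, while the function name render_html and the sed fallback for machines without lynx are illustrative additions, not part of the answer.

```bash
#!/bin/bash
# Hypothetical lynx-based variant of dosomething: render the HTML
# arriving on stdin as plain text on stdout.

render_html() {
  if command -v lynx >/dev/null 2>&1; then
    # -stdin:      read the document from standard input instead of a URL
    # -force_html: interpret that input as HTML
    # -nolist:     omit the numbered list of links appended to dumps
    lynx -dump -stdin -force_html -nolist
  else
    # Assumed fallback when lynx is missing: naive regex tag stripping.
    echo "lynx not found, falling back to sed" >&2
    sed 's/<[^>]*>//g'
  fi
}

# Run only when executed directly, not when the file is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  render_html
fi
```

Saved as dosomething and made executable, it is called exactly as in the question: ./dosomething < webpage.htm or cat webpage.htm | ./dosomething. Because lynx renders the markup instead of pattern-matching it, entities such as &amp; come out decoded.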

score 0 · Accepted Answer

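Since the source already arrives on the script's standard input, the commands inside the script inherit that input, so you don't redirect input there again: remove the < $1. (With no arguments, $1 is empty, and bash fails that line with an "ambiguous redirect" error.)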
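A minimal sketch of dosomething with the redirection removed, so that sed reads the page the script received on standard input and writes the result to standard output. Two changes go beyond the answer and are assumptions: the "testing" messages go to standard error so they are not mixed into the HTML output, and the work is wrapped in a hypothetical strip_tags function so the file can also be sourced. The regex is the question's own and only handles simple markup: a tag split across lines, or a > inside an attribute value, is not stripped correctly.

```bash
#!/bin/bash
# dosomething: read HTML on stdin, write it with tags stripped to stdout.
export LC_ALL='C'

# Hypothetical helper: no "< $1" here, so sed inherits the caller's
# stdin (the piped or redirected web page) and reads it directly.
strip_tags() {
  sed 's/<[^>]*>//g'
}

# Run only when executed directly, not when the file is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  echo "testing" >&2        # diagnostics to stderr, not into the HTML
  strip_tags
  echo "still testing" >&2
fi
```

If the HTML is needed in a variable, which is what the echo $1 line was reaching for, html=$(cat) reads all of standard input into one (command substitution drops the trailing newlines).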

And good luck with the brave undertaking of processing HTML in bash.
