3

如果您已经使用过 Xidel,您通常需要定位具有特定类别的节点。为了更容易做到这一点,我想创建has-class("class")一个函数作为表达式的别名:
contains(concat(" ", normalize-space(@class), " "), " class ")

例子:

$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'

e-xidel.sh 包含以下代码:

#!/bin/bash

echo -e "$(tput setaf 2) Checking... $(tput sgr0)"

path=$1
expression=$2

# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'

xoutput=$(xidel $path --printed-node-format=html --output-declaration= -e "$expression")

echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
4

2 回答 2

3

您可以使用sed(GNU 版本,不能保证它会与其他实现一起使用)来满足您的需求:

sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'

解释:

  • s/pattern/substitution/gsubstitution:用字符串替换匹配模式的部分;g 用于替换行的所有部分的标志(全局替换)
  • has-class("\([^)]\+\)"): 以has-class("包含除右括号 ( [^)]) 以外的任何字符开头并以 . 结尾的部分")。内部部分周围的转义括号捕获子部分并将其与 alias 相关联\1,因为它是第一个创建的捕获组。
  • contains(concat(" ", normalize-space(@class), " "), " \1 "): 用这个文本替换机器化的部分;\1将被相关捕获组的内容扩展。

您的脚本将是:

#!/bin/bash

function expand-has-class() {
    echo "$1" |
    sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

echo -e "$(tput setaf 2) Checking... $(tput sgr0)"

path=$1
expression="$(expand-has-class "$2")"

# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'

xoutput=$(xidel $path --printed-node-format=html --output-declaration= -e "$expression")

echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
于 2019-02-18T13:56:14.900 回答
1

contains(concat(" ", normalize-space(@class), " "), " class ")

例子:

$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'

这是没有意义的。

contains(concat(" ",normalize-space("wp-image")," ")," wp-image ")

将与

contains("wp-image","wp-image")

如果你真的想要一个布尔值作为输出,同时将class属性的值与文字字符串进行比较,那么这个......

xidel -s example.com -e '//article/p//img/@class="wp-image"'

...将返回truefalse

Ifwp-imageclass属性值的子字符串:

xidel -s example.com -e '//article/p//img/contains(@class,"wp-image")'
于 2019-08-19T23:20:48.253 回答