
I'm new to webharvest and am using it to fetch article data from a website with the following statement:

let $text := data($doc//div[@id="articleBody"])

This is the data I get from the statement above:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

Notable current and former residents of Pittstown include:

My question is: can I use the configuration to remove everything after "Notable people"? If so, please let me know how. Thanks.

Edit: desired output:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

1 Answer


You just need to change your let statement to:

let $text := substring-before(data($doc//div[@id="articleBody"]), 'Notable people')

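This keeps only the text before the first occurrence of "Notable people". Note that substring-before() drops the marker itself; to keep the heading, as in your desired output, wrap the call as concat(substring-before(...), 'Notable people'). Also, if "Notable people" does not occur in the text at all, substring-before() returns an empty string.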
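The string logic can be checked outside webharvest. The Python sketch below mirrors the semantics of XPath's substring-before() (text before the first match, or an empty string when the marker is absent) and the concat() variant that keeps the heading. The helper names substring_before and trim_after_heading are hypothetical and are not part of webharvest; this is an illustration under those assumptions, not the answer's actual configuration.

```python
def substring_before(text: str, marker: str) -> str:
    """Mirror XPath fn:substring-before.

    Returns the part of `text` before the first occurrence of `marker`,
    or "" when `marker` does not occur. An empty marker also yields "",
    matching the XPath definition.
    """
    idx = text.find(marker)
    return text[:idx] if idx >= 0 else ""


def trim_after_heading(text: str, heading: str = "Notable people",
                       keep_heading: bool = True) -> str:
    """Drop everything after `heading`; optionally keep the heading itself."""
    before = substring_before(text, heading)
    if keep_heading and heading and heading in text:
        # Same as concat(substring-before($t, $h), $h) when $h occurs in $t.
        return before + heading
    return before


if __name__ == "__main__":
    article = (
        "The Refine Spa (Furman's Mill) was built as a stone grist mill along "
        "the on a tributary of Capoolong Creek by Moore Furman, quartermaster "
        "general of George Washington's army\n\n"
        "Notable people\n\n"
        "Notable current and former residents of Pittstown include:"
    )
    print(trim_after_heading(article))
```

One design choice differs from a plain concat(): the sketch appends the heading only when it was actually found, whereas concat(substring-before(...), 'Notable people') would return just "Notable people" for a page that lacks that section.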

answered 2013-09-16T12:38:26.447