google-sheets - Scraping a byline from a web page with ImportXML in Google Sheets

Question

I'm trying to extract the author's name from articles. I'm currently using =IMPORTXML(G2,"//*[@class='author-details']")

When I do this, it creates four cells underneath, which include the word "By" that I can't get rid of.

I'm very new to coding. What am I doing wrong?

Example sheet: https://docs.google.com/spreadsheets/d/1Mi1D5G1-_gNsQwVQ6I_ealDqcWixKA2p-hFqJpjlGt4/edit?usp=sharing

score 0 · Accepted Answer

You can use:

=index(IMPORTXML(G2,"//*[@class='author-details']"),1,2)

This shows only the cell in the first row and second column of the returned content, which is the information you are after.
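To make the INDEX(…, 1, 2) step concrete, here is a small Python sketch of how a 1-based row/column lookup picks one cell out of a grid. It is only a model of the idea, not Sheets code: the helper name `sheets_index` and the sample grid (a placeholder first cell plus a byline in the second) are hypothetical stand-ins for whatever IMPORTXML actually returns for your page.

```python
def sheets_index(grid, row, column):
    """Model INDEX(range, row, column) for positive positions only.

    Sheets counts rows and columns from 1, so (1, 2) means
    "first row, second column"; Python lists count from 0.
    """
    return grid[row - 1][column - 1]


# Hypothetical stand-in for an IMPORTXML result: one row where the
# second cell holds the byline text.
imported = [["(first cell)", "By Jane Doe @janedoe Example Times"]]

print(sheets_index(imported, 1, 2))
```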

Edit:

Also, since you specifically want the author's name: if all names follow the format "By FIRST LAST @TwitterHandle Affiliation", you can use this to get just the name:

=trim(split(right(index(IMPORTXML(G2,"//*[@class='author-details']"),1,2),len(index(IMPORTXML(G2,"//*[@class='author-details']"),1,2))-3),"@",true,true))

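It may look like voodoo, but paste it in and it works. It removes the first 3 characters ("By" plus the following space), splits the text at the "@" sign, and keeps only the text to its left, which is the name.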
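Here is the same chain written out in Python, one step per line, so you can see what RIGHT/LEN, SPLIT and TRIM each contribute. It is a sketch that mirrors the formula's logic, not something Sheets runs. The function names and the sample byline are made up, and `sheets_trim` only approximates TRIM (it drops leading/trailing spaces and collapses repeated spaces).

```python
def sheets_trim(text: str) -> str:
    # Approximates TRIM: strip outer spaces and collapse runs of spaces.
    return " ".join(part for part in text.split(" ") if part)


def name_from_byline(byline: str) -> str:
    # RIGHT(text, LEN(text) - 3): keep everything after the first
    # 3 characters, i.e. drop "By ".
    rest = byline[3:]
    # SPLIT(rest, "@", TRUE, TRUE): break at each "@", dropping empty pieces.
    pieces = [piece for piece in rest.split("@") if piece]
    # Keep only the piece to the left of the first "@" (the name), trimmed.
    return sheets_trim(pieces[0]) if pieces else ""


# Hypothetical byline in the "By FIRST LAST @TwitterHandle Affiliation" format.
print(name_from_byline("By Jane Doe @janedoe Example Times"))
```

If a byline doesn't start with exactly "By ", the fixed 3-character cut will clip the name, so the formula (and this sketch) only holds for the format described above.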
