The goal is to extract the title and tags from a web page.

I am using IMPORTDATA, and I would like all the results in a single row, like this:

[webpage] [title] [1st tag] [2nd tag] [3rd tag] [4th tag] ... [last tag]

I got stuck halfway through in Google Sheets:

  • First tab, Extracted: I extracted the necessary rows from the large data set.

    =query({array_constrain(IMPORTDATA(A1),6375,10)},"WHERE (Col1 CONTAINS 'btn btn-secondary' AND Col1 CONTAINS 'href') or (Col1 CONTAINS 'meta property' AND Col1 CONTAINS 'og:title')")
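  • Second tab, with REGEXEXTRACT: this extracts the text I need, but only for the first row (and only the tags; the title is still missing because it is spread across several columns...)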

    =REGEXEXTRACT(query({array_constrain(IMPORTDATA(A1),6375,10)},"WHERE (Col1 CONTAINS 'btn btn-secondary' AND Col1 CONTAINS 'href')"),"\>(.+)\

I don't know how to go any further :( Any help is appreciated!

1 Answer

=ARRAYFORMULA({REGEXREPLACE(TEXTJOIN(", ",1,
 QUERY(ARRAY_CONSTRAIN(SUBSTITUTE(IMPORTDATA(A2),"""",""),1000,15),
 "where Col1 contains '<meta property=og:title content='")),
 "<meta property=og:title content=| />",""),
 TRANSPOSE(REGEXEXTRACT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(
 ARRAY_CONSTRAIN(SUBSTITUTE(IMPORTDATA(A2),"""",""),8000,3)),,50000)),
 "where Col1 contains '<a class=btn btn-secondary'"),"\>(.*)+\<"))})
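
For anyone who wants to check the logic outside Sheets, here is a rough Python sketch of the same idea: strip the double quotes, take the `og:title` content as the title, then collect the link text of every `btn btn-secondary` anchor into one row. The sample markup and the function name `title_and_tags` are made up for illustration; the real page source may be laid out differently, so treat this as an approximation rather than an exact equivalent of the formula.

```python
import re

# Hypothetical sample of the page source; the real page's markup may differ.
SAMPLE_HTML = '''<meta property="og:title" content="Example title" />
<a class="btn btn-secondary" href="/tag/one">one</a>
<a class="btn btn-secondary" href="/tag/two">two</a>
<a class="btn" href="/other">skip me</a>'''


def title_and_tags(html):
    """Return [title, tag1, tag2, ...] as a single row (a plain list)."""
    # Drop double quotes first, as SUBSTITUTE(..., """", "") does in the formula.
    text = html.replace('"', "")
    row = []
    # Title: the content attribute of the og:title meta element.
    title = re.search(r"<meta property=og:title content=(.*?)\s*/>", text)
    if title:
        row.append(title.group(1))
    # Tags: the link text of every anchor whose class is "btn btn-secondary".
    row.extend(re.findall(r"<a class=btn btn-secondary[^>]*>(.*?)<", text))
    return row


print(title_and_tags(SAMPLE_HTML))
```

The lazy `(.*?)` groups stop at the first closing delimiter, so each anchor yields only its own text even when several anchors sit on one line.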

Demo spreadsheet

Answered 2019-03-10T18:05:32.787