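I am trying to retrieve specific CPC codes and assignees with SQL from the Google public patents data. I am searching for the assignee "VOLKSWAGEN" and the cpc.code "H01M8".

But I get this error:

No matching signature for operator = for argument types: ARRAY<STRUCT<name STRING, country_code STRING>>, STRING. Supported signature: ANY = ANY at [15:3]

Code: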

SELECT
  publication_number application_number,
  family_id,
  publication_date,
  filing_date,
  priority_date,
  priority_claim,
  ipc,
  cpc.code,
  inventor,
  assignee_harmonized,
FROM
  `patents-public-data.patents.publications`
WHERE
  assignee_harmonized = "VOLKSWAGEN" AND cpc.code = "H01M8"
LIMIT
  1000

I am also interested in searching for multiple assignees, for example:

in ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")

I only recently started using SQL and cannot see the error :/

Thanks a lot for your help!

2 Answers

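In Google BigQuery, UNNEST is required to access the elements of an ARRAY. Both assignee_harmonized and cpc are arrays of structs, which is why comparing them directly to a string with = fails. This is described here:

https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays

The following query worked for me.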

SELECT
  publication_number application_number,
  family_id,
  publication_date,
  filing_date,
  priority_date,
  priority_claim,
  ipc,
  cpc__u.code,
  inventor,
  assignee_harmonized,
FROM
  `patents-public-data.patents.publications`,
  UNNEST(assignee_harmonized) AS assignee_harmonized__u,
  UNNEST(cpc) AS cpc__u
WHERE
  assignee_harmonized__u.name = "VOLKSWAGEN AG"
  AND cpc__u.code LIKE "H01M8%"
LIMIT
  1000

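Here are the changes I made to produce results:

  1. UNNEST(assignee_harmonized) AS assignee_harmonized__u, to access assignee_harmonized__u.name.
  2. UNNEST(cpc) AS cpc__u, to access cpc__u.code.
  3. assignee_harmonized__u.name = "VOLKSWAGEN AG", because "VOLKSWAGEN" returns no results.
  4. cpc__u.code LIKE "H01M8%", because "H01M8" returns no results. An example value is H01M8/10.

This returns the following:

Query complete (2.3 sec elapsed, 29.2 GB processed)

If you want to filter on multiple assignee names, IN works as shown below. However, it needs exact matches, e.g. VOLKSWAGEN AG or AUDI AG, so each name in the list has to be spelled exactly as it is stored.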
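
The comma joins with UNNEST in the query above produce one output row per matching assignee and CPC code pair, so a publication with several H01M8 codes appears several times. If one row per publication is preferred, a possible variant moves each UNNEST into an EXISTS subquery. This is only a sketch: the aliases a and c are my own choice, and the shortened column list is just for illustration.

```sql
-- Sketch: filter on the array fields with EXISTS instead of a cross join,
-- so each publication appears at most once in the result.
SELECT
  publication_number,
  family_id,
  filing_date
FROM
  `patents-public-data.patents.publications`
WHERE
  -- True if at least one harmonized assignee has exactly this name.
  EXISTS (
    SELECT 1
    FROM UNNEST(assignee_harmonized) AS a
    WHERE a.name = "VOLKSWAGEN AG"
  )
  -- True if at least one CPC code starts with H01M8.
  AND EXISTS (
    SELECT 1
    FROM UNNEST(cpc) AS c
    WHERE c.code LIKE "H01M8%"
  )
LIMIT
  1000
```

The trade-off is that the matching code and name are no longer available as separate columns; the arrays cpc and assignee_harmonized can still be selected whole.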

assignee_harmonized__u.name IN ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")

If you want LIKE-style pattern matching against multiple strings, you can try REGEXP_CONTAINS:

https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains
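
As a rough sketch of how that could look for a few of the companies from the question: the regular expression below is an assumption about how the names are stored (company name first, followed by a legal form such as AG), so it should be checked against the actual values before relying on it. The output aliases assignee_name and cpc_code are my own.

```sql
-- Sketch: match several assignee name prefixes with one regular expression.
SELECT
  publication_number,
  assignee_harmonized__u.name AS assignee_name,
  cpc__u.code AS cpc_code
FROM
  `patents-public-data.patents.publications`,
  UNNEST(assignee_harmonized) AS assignee_harmonized__u,
  UNNEST(cpc) AS cpc__u
WHERE
  -- ^ anchors the match at the start of the name; \b requires a word boundary
  -- after the company name, so "AUDI AG" matches but a name that starts
  -- with "AUDIO" does not.
  REGEXP_CONTAINS(
    assignee_harmonized__u.name,
    r"^(VOLKSWAGEN|PORSCHE|AUDI|SCANIA)\b")
  AND cpc__u.code LIKE "H01M8%"
LIMIT
  1000
```

Further names can be added to the alternation with |. A pattern like this is looser than IN, so it may also pick up unrelated assignees whose names begin with the same word.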

Answered 2021-07-10T11:18:42.540
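Thank you very much. I have now written this code to filter on multiple companies. Is it possible to get all the matching "cpc__u.code" values of a publication in a single cell per row, with the codes separated by ","? I would like to do the same for assignee_harmonized__u.name!

Do you think the companies will be filtered correctly by this query and the "IN" operator?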

SELECT
  publication_number application_number,
  family_id,
  publication_date,
  filing_date,
  priority_date,
  priority_claim,
  cpc__u.code,
  inventor,
  assignee_harmonized,
  assignee
FROM
  `patents-public-data.patents.publications`,
  UNNEST(assignee_harmonized) AS assignee_harmonized__u,
  UNNEST(cpc) AS cpc__u
WHERE
  assignee_harmonized__u.name in ("VOLKSWAGEN", "PORSCHE", "AUDI", "SCANIA", "SKODA", "MAZDA", "TOYOTA", "HONDA", "BOSCH", "KYOCERA", "PANASONIC", "TOTO", "NISSAN", "LG FUEL CELL SYSTEMS", "SONY", "HYUNDAI", "SUZUKI", "PLUG POWER", "SFC ENERGY", "BALLARD", "KIA MOTORS", "SIEMENS", "KAWASAKI", "BAYERISCHE MOTORENWERKE", "HYDROGENICS", "POWERCELL SWEDEN", "ELRINGKLINGER", "PROTON MOTOR")
  AND cpc__u.code LIKE "H01M8%"
LIMIT
  100000
Answered 2021-07-10T12:57:05.720
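
One possible way to get the codes and names into a single cell per publication is a scalar subquery with STRING_AGG over the unnested array. The sketch below rests on a few assumptions: the output column names cpc_codes and assignee_names and the aliases a and c are made up for illustration, and the IN list only contains the two exact name forms confirmed in the first answer ("VOLKSWAGEN AG", "AUDI AG"). The short names from the question, such as "VOLKSWAGEN", would need to be replaced by the exact stored names in the same way, because IN only matches whole strings.

```sql
-- Sketch: one row per publication, with matching CPC codes and all
-- harmonized assignee names each collapsed into a comma-separated string.
SELECT
  publication_number,
  family_id,
  filing_date,
  -- All H01M8 codes of this publication, joined with ", ".
  (SELECT STRING_AGG(c.code, ", ")
   FROM UNNEST(cpc) AS c
   WHERE c.code LIKE "H01M8%") AS cpc_codes,
  -- All harmonized assignee names of this publication, joined with ", ".
  (SELECT STRING_AGG(a.name, ", ")
   FROM UNNEST(assignee_harmonized) AS a) AS assignee_names
FROM
  `patents-public-data.patents.publications`
WHERE
  -- Keep publications with at least one assignee from the list ...
  EXISTS (
    SELECT 1
    FROM UNNEST(assignee_harmonized) AS a
    WHERE a.name IN ("VOLKSWAGEN AG", "AUDI AG")
  )
  -- ... and at least one CPC code that starts with H01M8.
  AND EXISTS (
    SELECT 1
    FROM UNNEST(cpc) AS c
    WHERE c.code LIKE "H01M8%"
  )
LIMIT
  1000
```

Because the arrays are only unnested inside subqueries, the outer query is not multiplied into one row per code and assignee combination.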