I want to collect citation data by application_number. The actual application number is CN 201510747352.

SELECT c.application_number AS Pub, COUNT(p.publication_number) AS CitedBy 
     FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c 
     WHERE c.application_number IN ('CN-201510747352-A') 
     GROUP BY c.application_number

But it doesn't work. The URL below is the patent's page. Can anyone help? https://patents.google.com/patent/CN105233911B/zh?oq=CN201510747352.8

1 Answer

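My guess is that a patent can be cited only once it has the status of Application. So instead of the initial number CN-201510747352-A, you should use the publication number it received when it reached that status. You also need to apply COUNT(DISTINCT ...) and account for the -A, -B, etc. suffix on the citing publication numbers, which is why the query below uses the REGEXP_EXTRACT function.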

#standardSQL
SELECT 
  c.publication_number AS Pub, 
  COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p, 
UNNEST(citation) AS c 
WHERE c.publication_number LIKE ('CN-105233911%') 
GROUP BY c.publication_number  

Result

Row Pub             CitedBy  
1   CN-105233911-A  10   
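
As a rough illustration of why the query wraps the citing publication number in REGEXP_EXTRACT before counting, the Python sketch below applies the same pattern with Python's `re` module. It assumes `re` behaves like BigQuery's regex engine for this simple pattern. The citing numbers (`XX-...`) are made-up placeholders, not real query results.

```python
import re

# Same pattern as in the query: capture everything before the final
# "-<kind code>" suffix (e.g. "-A", "-B", "-A1").
KIND_CODE_PATTERN = re.compile(r'(.+-.+)-')

def strip_kind_code(publication_number):
    """Return the number without its kind-code suffix, or None if no match."""
    match = KIND_CODE_PATTERN.search(publication_number)
    return match.group(1) if match else None

# Hypothetical citing publications: the first two are the same patent
# published under two kind codes, so they should be counted once.
citing = ['XX-1000001-A', 'XX-1000001-B', 'XX-1000002-A']

cited_by_count = len({strip_kind_code(n) for n in citing})
print(strip_kind_code('CN-105233911-A'))  # CN-105233911
print(cited_by_count)                     # 2
```

Without the suffix stripping, the same citing patent would be counted once per kind code, which would inflate the total.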

... how can I do this if I only have the application number?

#standardSQL
SELECT 
  c.publication_number AS Pub, 
  COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p, 
UNNEST(citation) AS c 
WHERE c.publication_number IN (
  SELECT publication_number 
  FROM `patents-public-data.patents.publications`
  WHERE application_number IN ('CN-201510747352-A') 
)
GROUP BY c.publication_number 
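
The subquery first maps the application number to its publication numbers, and the outer query then looks for citations of those publications. A minimal Python sketch of that two-step lookup on in-memory rows follows. The row layout and the `XX-...` numbers are hypothetical; only the CN numbers come from the question. Kind-code stripping is left out for brevity.

```python
# Hypothetical, simplified rows standing in for the publications table.
# Each row: publication number, application number, cited publication numbers.
rows = [
    {'publication_number': 'CN-105233911-A',
     'application_number': 'CN-201510747352-A',
     'citation': []},
    {'publication_number': 'XX-1000001-A',
     'application_number': 'XX-0000001-A',
     'citation': ['CN-105233911-A']},
    {'publication_number': 'XX-1000002-A',
     'application_number': 'XX-0000002-A',
     'citation': ['CN-105233911-A', 'XX-1000001-A']},
]

def cited_by(rows, application_numbers):
    """Map each matching publication number to the set of publications citing it."""
    # Step 1 (the subquery): application numbers -> publication numbers.
    targets = {r['publication_number'] for r in rows
               if r['application_number'] in application_numbers}
    # Step 2 (the outer query): collect citing publications per target.
    result = {}
    for r in rows:
        for cited in r['citation']:
            if cited in targets:
                result.setdefault(cited, set()).add(r['publication_number'])
    return result

counts = {pub: len(citers)
          for pub, citers in cited_by(rows, {'CN-201510747352-A'}).items()}
print(counts)  # {'CN-105233911-A': 2}
```

As in the SQL version, a publication that nobody cites does not appear in the output at all.
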
Answered 2018-11-27T06:39:05.970