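My guess is that a patent can only be cited once its status is Application. So instead of the initial number CN-201510747352, once the status is Application you should use the app/pub number. Also, a distinct count alone is not enough: you need to discount the -A, -B, etc. suffix as well, which is why the query below uses the REGEXP_EXTRACT function.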
#standardSQL
SELECT
  c.publication_number AS Pub,
  -- Strip the trailing kind code (-A, -B, ...) so each citing patent is counted once
  COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p,
  UNNEST(citation) AS c
WHERE c.publication_number LIKE 'CN-105233911%'
GROUP BY c.publication_number
Result:
Row  Pub             CitedByCount
1    CN-105233911-A  10
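To see what the REGEXP_EXTRACT pattern does on its own, here is a minimal sketch that runs it against a few literal strings instead of the patents table. The sample values are hypothetical and only chosen to match the country-number-kind shape of a publication number; they are not taken from the dataset.

```sql
#standardSQL
-- Hypothetical sample values, used only to illustrate the pattern.
SELECT
  pub,
  -- Capture everything before the last hyphen-separated part (the kind code)
  REGEXP_EXTRACT(pub, r'(.+-.+)-') AS base_number
FROM UNNEST([
  'CN-105233911-A',
  'CN-105233911-B',
  'US-1234567-B2'
]) AS pub
```

The two CN rows should both reduce to CN-105233911, so COUNT(DISTINCT ...) treats them as a single citing patent rather than two.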
...how could I achieve this if I only have the application number?
#standardSQL
SELECT
  c.publication_number AS Pub,
  COUNT(DISTINCT REGEXP_EXTRACT(p.publication_number, r'(.+-.+)-')) AS CitedByCount
FROM `patents-public-data.patents.publications` AS p,
  UNNEST(citation) AS c
WHERE c.publication_number IN (
  -- Look up the publication number(s) that belong to the given application
  SELECT publication_number
  FROM `patents-public-data.patents.publications`
  WHERE application_number IN ('CN-201510747352-A')
)
GROUP BY c.publication_number