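I want to plot a bar chart showing how a Google Scholar author's h-index changes from year to year. To compute this, I need the number of citations each paper received in each year, and from that I can calculate the h-index for each year.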
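To make the target computation concrete, here is a minimal sketch of the h-index-per-year step, assuming the per-paper, per-year citation counts have already been collected. The input shape (a dict mapping paper title to `{year: citations}`), the function names, and the sample data are all hypothetical. It also assumes that the h-index for a year is based on cumulative citations up to and including that year.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_by_year(citations_per_paper):
    """citations_per_paper: {paper title: {year: citations received that year}}."""
    years = sorted({y for per_year in citations_per_paper.values() for y in per_year})
    result = {}
    for year in years:
        # Cumulative citations of every paper up to and including this year
        totals = [
            sum(count for y, count in per_year.items() if y <= year)
            for per_year in citations_per_paper.values()
        ]
        result[year] = h_index(totals)
    return result


# Hypothetical sample data, not taken from any real profile
sample = {
    "Paper A": {2001: 3, 2002: 2},
    "Paper B": {2001: 1, 2002: 1, 2003: 4},
    "Paper C": {2002: 1, 2003: 2},
}
print(h_index_by_year(sample))  # {2001: 1, 2002: 2, 2003: 3}
```

The resulting dict can be passed straight to a bar chart, with the years on the x-axis and the h-index on the y-axis.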
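I managed to get the chart on the author profile page. Taking Einstein's Google Scholar profile as an example, https://scholar.google.com/citations?user=qc6CJjYAAAAJ&hl=en, I can get the citations-per-year chart on the right, but that is not what I need. What I really want is the chart that appears when you click on a paper: the total citations of that paper broken down by year. I am using the BeautifulSoup and selenium packages in Python. My biggest difficulty right now is that, if you look at the HTML of an author page, the details of each paper are hidden. How can I click on each paper and access its citations-by-year chart?
This is what I did for the chart on the right: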
    import urllib.request

    import pandas as pd
    from bs4 import BeautifulSoup

    def get_citation_by_year(url):
        # Fetch the author profile page and parse it
        html = urllib.request.urlopen(url).read()
        s = BeautifulSoup(html, 'lxml')
        # print(s.title.text)  # whose Google Scholar profile is this?
        # Year labels and per-year citation counts from the chart on the right
        years = [int(i.text) for i in s.find_all('span', {'class': 'gsc_g_t'})]
        citation_number = [int(i.text) for i in s.find_all('span', {'class': 'gsc_g_al'})]
        df = pd.DataFrame({'Year': years, 'Cited_By': citation_number})
        return df
And this clicks the "Show more" button to display the maximum number of articles:
    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    def get_citation_byarticle_byyear(url):
        # url is the Google Scholar profile page of a specific author
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        # Requires ChromeDriver: http://chromedriver.chromium.org/downloads
        # (Selenium 4 style; older versions took chrome_options= and executable_path=)
        driver = webdriver.Chrome(service=Service(r"/Users/upcrown/Desktop/chromedriver"), options=chrome_options)
        driver.implicitly_wait(30)
        driver.get(url)
        # Click "Show more" to load more articles
        show_more_button = driver.find_element(By.XPATH, '//*[@id="gsc_bpf_more"]')
        show_more_button.click()
        time.sleep(5)
        # Selenium hands the page source to Beautiful Soup
        s = BeautifulSoup(driver.page_source, "html.parser")
        driver.quit()
        # Publication year of each paper (kept as strings because some are '')
        year = [i.text for i in s.find_all('span', {'class': 'gsc_a_h gsc_a_hc gs_ibl'})]
        # Paper titles
        paper = [i.text for i in s.find_all('a', {'class': 'gsc_a_at'})]
        # Total citation count of each paper (also strings)
        citations = [i.text for i in s.find_all('a', {'class': 'gsc_a_ac gs_ibl'})]
        return year, paper, citations
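For the per-paper chart itself, one possible direction is sketched below. It is not a confirmed solution. It rests on several assumptions about Google Scholar's markup that the code above does not verify: that each title link (`gsc_a_at`) carries an `href` pointing to the paper's detail page, that the bars of the chart on that page are `a` elements with class `gsc_oci_g_a` whose `href` contains `as_yhi=YEAR`, and that each bar wraps a `span` with class `gsc_oci_g_al` holding the count. The function names and the sample HTML are hypothetical, and the parsing is shown on static HTML so it can be tried offline.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def paper_links(profile_html, base_url="https://scholar.google.com"):
    """Map paper title -> detail-page URL (assumes title links carry an href)."""
    s = BeautifulSoup(profile_html, "html.parser")
    return {
        a.text: urljoin(base_url, a["href"])
        for a in s.find_all("a", class_="gsc_a_at")
        if a.get("href")
    }


def parse_paper_chart(detail_html):
    """Return {year: citations} from one paper's detail-page HTML."""
    s = BeautifulSoup(detail_html, "html.parser")
    per_year = {}
    # Assumed markup: <a class="gsc_oci_g_a" href="...as_yhi=YEAR">
    #                   <span class="gsc_oci_g_al">COUNT</span></a>
    for bar in s.find_all("a", class_="gsc_oci_g_a"):
        year_match = re.search(r"as_yhi=(\d{4})", bar.get("href", ""))
        label = bar.find("span", class_="gsc_oci_g_al")
        if year_match and label and label.text.strip().isdigit():
            per_year[int(year_match.group(1))] = int(label.text)
    return per_year


# Hypothetical HTML fragments that imitate the assumed markup
sample_profile = '<a class="gsc_a_at" href="/citations?view_op=view_citation&amp;user=X">Some paper</a>'
sample_detail = """
<span class="gsc_oci_g_t">2019</span><span class="gsc_oci_g_t">2020</span>
<a class="gsc_oci_g_a" href="/scholar?cites=1&amp;as_ylo=2019&amp;as_yhi=2019"><span class="gsc_oci_g_al">4</span></a>
<a class="gsc_oci_g_a" href="/scholar?cites=1&amp;as_ylo=2021&amp;as_yhi=2021"><span class="gsc_oci_g_al">7</span></a>
"""
print(paper_links(sample_profile))
print(parse_paper_chart(sample_detail))  # {2019: 4, 2021: 7}
```

Reading the year from each bar's link rather than zipping the year labels with the counts avoids misalignment when a year with no citations has a label but no bar. With the selenium driver from above, each detail URL could be loaded with `driver.get(...)` and `driver.page_source` passed to `parse_paper_chart`.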
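Other tools I have tried: the R "scholar" package (no citation counts per paper per year, only citation counts per year); the Windows application Publish or Perish (same problem); the Scopus API (it does not have the complete list of an author's articles that Google Scholar has).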