ValueError: Length of passed values is 4, index implies 3

import requests
from bs4 import BeautifulSoup
import pandas as pd

url_wiki = 'https://pt.wikipedia.org/wiki/Lista_dos_distritos_de_S%C3%A3o_Paulo_por_popula%C3%A7%C3%A3o'
response = requests.get(url_wiki)
soup = BeautifulSoup(response.text, "html.parser")
table = soup.find('table', {'class': 'wikitable sortable'}).tbody
rows = table.find_all('tr')
columns = [v.text.replace('\n', '') for v in rows[0].find_all('th')]
print(columns)
# columns has 3 elements!!
df = pd.DataFrame(columns=columns)

#now have to populate the table
for i in range (1,len(rows)): #find skipping the first row, search in all rows
    tds=rows[i].find_all('td')
    #inspect ...rowspan 2 td tags, otherwise 3 td tags
    if len(tds) == 2:
        values = [tds[0].text.replace('\n', ''), tds[1].text.replace('\n', ''), tds[2].text.replace('\n', '')]
    else:
        values=[td.text.replace('\n', '') for td in tds]
    #print(values)

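Everything looks fine up to this point.

When I run the line below, I get the error. As the comment above notes, the index has 3 elements.

Uncommenting the #print(values) line above also shows clearly that the table has 3 columns.

What am I missing here?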

    df=df.append(pd.Series(values, index=columns), ignore_index=True)
    df

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-210-2fa24a66fa93> in <module>
----> 1 df=df.append(pd.Series(values, index=columns), ignore_index=True)

e:\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    319                 try:
    320                     if len(index) != len(data):
--> 321                         raise ValueError(
    322                             f"Length of passed values is {len(data)}, "
    323                             f"index implies {len(index)}."

ValueError: Length of passed values is 4, index implies 3.

1 Answer

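Thanks @Prune and @goalie1998. I realized what the problem was and solved it. There is a "bug" in the original table: the last row, which is empty, has a hidden extra column, so it yields 4 values instead of 3. I solved it by subtracting 1 here: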

for i in range(1, len(rows) - 1):  # skip the first row and stop before the last (empty) row
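Subtracting 1 works here because the odd row happens to be the last one. If the position of such a row is not known in advance, one alternative is to compare each row's cell count against the header width and set aside anything that does not match. The following is only a sketch under that assumption: `split_rows` is a hypothetical helper, and the column names and sample rows are made up for illustration rather than taken from the Wikipedia table.

```python
def split_rows(rows, n_columns):
    """Separate rows whose cell count matches the header from rows that do not."""
    good, bad = [], []
    for position, cells in enumerate(rows, start=1):
        # Same clean-up as in the question: drop newline characters
        values = [cell.replace('\n', '') for cell in cells]
        if len(values) == n_columns:
            good.append(values)
        else:
            # Keep the position so the odd row can be inspected later
            bad.append((position, values))
    return good, bad


# Hypothetical header and made-up rows; the last row mimics the
# trailing empty row with a hidden fourth cell
columns = ['Position', 'District', 'Population']
sample_rows = [
    ['1\n', 'District A\n', '100\n'],
    ['2\n', 'District B\n', '90\n'],
    ['', '', '', '\n'],
]

good, bad = split_rows(sample_rows, len(columns))
print(good)
print(bad)
```

In the question's code, each inner list would be built from `[td.text for td in tds]`. The `good` list can then be passed to `pd.DataFrame(good, columns=columns)` in one go, which also avoids `DataFrame.append`, a method that was deprecated in pandas 1.4 and removed in pandas 2.0.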

Cheers

Answered 2021-01-27T23:50:39.713