python-2.7 - KeyError Pandas Dataframe (encoding index)

Question

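I am running the code below. It creates several DataFrames that use, as their index, a column from another DataFrame that holds a list of conference names.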

    df_conf = pd.read_sql("select distinct Conference from publications where year>=1991 and length(conference)>1 order by conference", db)

    for index, row in df_conf.iterrows():
        row[0]=row[0].encode("utf-8")

    df2= pd.DataFrame(index=df_conf['Conference'], columns=['Citation1991','Citation1992'])

    df2 = df2.fillna(0)
    df_if= pd.DataFrame(index=df_conf['Conference'], columns=['IF1994','IF1995'])

    df_if = df_if.fillna(0)

    df_pubs=pd.read_sql("select Conference, Year, count(*) as totalPubs from publications where year>=1991 group by conference, year", db)

    for index, row in df_pubs.iterrows():
        row[0]=row[0].encode("utf-8")

    df_pubs= df_pubs.pivot(index='Conference', columns='Year', values='totalPubs')
    df_pubs.fillna(0)

    for index, row in df2.iterrows():
        df_if.ix[index,'IF1994'] = df2.ix[index,'Citation1992'] / (df_pubs.ix[index,1992]+df_pubs.ix[index,1993])

The last line keeps giving me the following error:

KeyError: 'Analyse dynamischer Systeme in Medizin, Biologie und \xc3\x96kologie'

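I'm not sure what I'm doing wrong. I tried encoding the index, but that doesn't work. I even tried .at, and it still doesn't work.

I know it has to do with encoding, because it always stops at an index containing non-ASCII characters.

I am using Python 2.7.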

score 1 · Accepted Answer

I think the issue with this:

for index, row in df_conf.iterrows():
    row[0]=row[0].encode("utf-8")

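is that it may or may not work: iterrows can hand back a copy of each row rather than a view, so the assignment to row may never reach the DataFrame. I'm surprised it didn't raise a warning.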

Besides that, it's much quicker to use the vectorised str method encode on the Series:

df_conf['col_name'] = df_conf['col_name'].str.encode('utf-8')

If needed, you can also encode the index in a similar fashion:

df.index = df.index.str.encode('utf-8')
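To illustrate the point about iterrows, here is a minimal sketch. The DataFrame and its values are made up for the example (a stand-in for something shaped like df_pubs), and it only covers the mixed-dtype case, where each row comes back as a copy; a frame with a single dtype may behave differently, which is why the loop "may or may not work".

```python
import pandas as pd

# Hypothetical stand-in for df_pubs: one text column plus numeric columns.
df = pd.DataFrame({
    "Conference": [u"\xd6kologie", u"VLDB"],
    "Year": [1992, 1993],
    "totalPubs": [3, 5],
})

# With mixed column types, iterrows() builds a new object row each time,
# so this assignment changes the temporary row, not df.
for index, row in df.iterrows():
    row["Conference"] = row["Conference"].encode("utf-8")

# The column still holds the original unicode text.
unchanged = df["Conference"].tolist() == [u"\xd6kologie", u"VLDB"]

# Vectorised alternative: encode the column and assign it back explicitly.
df["Conference"] = df["Conference"].str.encode("utf-8")
encoded = df["Conference"].tolist()
```

If this is what happens in the question, one index would end up encoded and the other not, which would be consistent with the KeyError on the non-ASCII label.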

score 0 · Accepted Answer

Does it happen on this line in the last part of the code?

df_if.ix[index,'IF1994'] = df2.ix[index,'Citation1992'] / (df_pubs.ix[index,1992]+df_pubs.ix[index,1993])

If so, try

df_if.ix[index,u'IF1994'] = df2.ix[index,u'Citation1992'] / (df_pubs.ix[index,1992]+df_pubs.ix[index,1993])

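It will work. DataFrame indexing with UTF-8 strings behaves in strange ways, even when the script is declared with "# -*- coding: utf8 -*-". When you index DataFrame columns with UTF-8 strings, just put a "u" prefix in front of the string literal.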
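As a rough sketch of why the type of the key matters, the snippet below uses a plain dict as a stand-in for an index (an assumption made to keep the example small; it is not how pandas stores labels). The label is the one from the error message, and the value 42 is arbitrary.

```python
# -*- coding: utf-8 -*-

# Unicode label, as it would typically come back from the database.
label = u"Analyse dynamischer Systeme in Medizin, Biologie und \xd6kologie"
lookup = {label: 42}

# Encoding produces a byte string ending in \xc3\x96kologie.
encoded = label.encode("utf-8")

# The byte string and the unicode text are different keys.
try:
    lookup[encoded]
    found = True
except KeyError:
    found = False

# Decoding back to unicode gives a key that matches again.
decoded_value = lookup[encoded.decode("utf-8")]
```

The lookup with the encoded key fails while the decoded one succeeds, so the label used for indexing and the labels stored in the index need to be the same type.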
