python - Unidecode a text column from Postgres in Python

Question

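I'm new to Python. I want to fetch a column "user_name" from a PostgreSQL database and remove all accents from the names. Postgres used to have a function called unaccent, but it doesn't seem to work now, so I turned to Python.

So far, I have: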

from sqlalchemy import create_engine
from pandas import DataFrame
import unidecode
engine_gear = create_engine('XYZABC')
connection = engine_gear.connect()
member = 1
result = connection.execute("select user_name from user") 
df = DataFrame(result.fetchall())
df.columns = result.keys()
connection.close()
df['n'] = df['user_name'].apply(unidecode)

When I run this code, I get the following error:

Traceback (most recent call last):
File "C:/Users/s/PycharmProjects/test/name_matching_test.py", line 20, in <module>
df['n'] = df['user_name'].apply(unidecode)
File "C:\Python\lib\site-packages\pandas\core\series.py", line 2355, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas\_libs\src\inference.pyx", line 1574, in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66645)
TypeError: 'module' object is not callable

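At first I thought I should convert the user_name column to strings, so I used df['user_name'].astype('str'). But I still get the same error after doing that.

Any help or guidance would be appreciated.

Data sample: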

user_name
Linda
Alonso

TestUser1
Arjang "RJ"
XI(DAPHNE)
Ajuah-AJ
Anthony "Tony"
Joseph-Patrick
Zoë 
André

score 0 · Accepted Answer

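You have two small problems. First, "unidecode" in your code is a module, and what you want is the unidecode function from that module. Second, you need to apply it to each element rather than to the Series/column as a whole, so: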

df = df.applymap(unidecode.unidecode)  # applymap returns a new DataFrame, so keep the result
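
If installing unidecode is not an option, a rough alternative can be sketched with the standard library's unicodedata module. This is a swapped-in technique, not what unidecode does internally: it only drops combining accent marks after NFKD normalization, so it will not transliterate characters that have no decomposition (for example "ø") or non-Latin scripts. The helper name strip_accents is hypothetical, and passing non-string values through unchanged is an assumption made to cope with the empty row in the data sample.

```python
import unicodedata


def strip_accents(value):
    """Return value with combining accent marks removed; non-strings pass through."""
    if not isinstance(value, str):
        return value  # e.g. None for a NULL user_name
    # NFKD splits "ë" into "e" plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", value)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# A few values taken from the data sample in the question.
names = ["Linda", None, 'Arjang "RJ"', "Zoë ", "André"]
cleaned = [strip_accents(name) for name in names]
```

With pandas, the same function could be passed to df['user_name'].apply(strip_accents).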

score 0 · Accepted Answer

Try something like this:

df[col]=df[col].str.decode('utf8')

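I query a KDB database, so I'm not sure how this behaves with Postgres, but in my case strings are always returned as 'bytes', and sometimes I need the 'latin-1' decoder instead of UTF-8 (Spanish and French names, in my case). What I do is run a function after every query that loops over each column stored as 'object' and decodes it. Something like this: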

def cleanup_datatypes(df, decoder='latin-1'):
    """
    kdb returns all strings as bytes; decode them into readable strings.
    The default is latin-1, which covers French, but UTF-8 works as well.
    """
    for col in df.columns:
        if df[col].dtype == object:
            df[col] = df[col].str.decode(decoder)
    return df
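
When it is not known in advance which of the two encodings a value uses, one possible approach is to try UTF-8 first and fall back to latin-1. The sketch below works on single values with plain Python; the helper name decode_value and the order of the encodings are assumptions for illustration, not part of the function above.

```python
def decode_value(value, encodings=("utf-8", "latin-1")):
    """Decode bytes by trying each encoding in order; leave other values alone."""
    if not isinstance(value, bytes):
        return value  # already a str, or None
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if no listed encoding succeeds (latin-1 itself accepts any bytes).
    return value.decode("utf-8", errors="replace")


utf8_name = "André".encode("utf-8")      # b'Andr\xc3\xa9'
latin1_name = "André".encode("latin-1")  # b'Andr\xe9'
decoded = [decode_value(utf8_name), decode_value(latin1_name), decode_value("Zoë")]
```

UTF-8 goes first because latin-1 never raises: decoding UTF-8 bytes as latin-1 succeeds silently and yields garbled text such as "AndrÃ©".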
