python - dfply: Mutating string column: TypeError

Question

My pandas DataFrame contains a column "file" whose values are file-path strings. I am trying to use dfply to mutate this column like this:

resultstatsDF.reset_index() >> mutate(dirfile = os.path.join(os.path.basename(os.path.dirname(X.file)),os.path.basename(X.file)))

but I get the error

TypeError: __index__ returned non-int (type Call)

What did I do wrong? How do I do it right?

score 4 · Accepted Answer

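Since my question has been upvoted, I guess it is still of interest to some people. Having learned quite a bit of Python in the meantime, let me answer it myself; maybe it will help other users.

First, let's import the required packages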

import pandas as pd
from dfply import *
from os.path import basename, dirname, join

and create the required pandas DataFrame

resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png', '/home/user/that/file2.png']})

which is

                        file
0  /home/user/this/file1.png
1  /home/user/that/file2.png

We see that we still get an error (although it has changed due to the ongoing development of dfply):

resultstatsDF.reset_index() >> \
mutate(dirfile = join(basename(dirname(X.file)), basename(X.file)))

TypeError: __index__ returned non-int (type Intention)

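The reason is that mutate works on whole Series, whereas the os.path functions work on a single string, so we need a function that is applied element by element. For this we can use pandas.Series.apply, which calls a given function on every element of a Series. We still need a custom function to apply to each element of the file column. Putting it all together, we end up with the code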

def extract_last_dir_plus_filename(series_element):
    return join(basename(dirname(series_element)), basename(series_element))

resultstatsDF.reset_index() >> \
mutate(dirfile = X.file.apply(extract_last_dir_plus_filename))

which outputs

   index                       file         dirfile
0      0  /home/user/this/file1.png  this/file1.png
1      1  /home/user/that/file2.png  that/file2.png
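Because the helper is an ordinary function on one string, it can be checked without pandas or dfply. The following is only a minimal sketch: it imports from posixpath (the POSIX flavour of os.path) instead of os.path, an assumption made here so that the separator is '/' on every platform, and it uses a plain list in place of the file column.

```python
from posixpath import basename, dirname, join  # assumption: POSIX-style paths

def extract_last_dir_plus_filename(series_element):
    # Keep only the last directory and the file name of one path string.
    return join(basename(dirname(series_element)), basename(series_element))

paths = ['/home/user/this/file1.png', '/home/user/that/file2.png']

# Call the helper once per element, which is what Series.apply does per row.
dirfiles = [extract_last_dir_plus_filename(p) for p in paths]
print(dirfiles)  # ['this/file1.png', 'that/file2.png']
```

The list comprehension plays the role of apply: the path functions only ever see one string at a time.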

To do this without dfply's mutate, we can alternatively write

resultstatsDF['dirfile'] = resultstatsDF.file.apply(extract_last_dir_plus_filename)
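If a custom function is not wanted, pandas' vectorized string methods may be another option. This is a sketch that is not part of the approach above, and it assumes that every path uses '/' as its separator and has at least one parent directory; it can behave differently from the os.path version on unusual paths.

```python
import pandas as pd

resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png', '/home/user/that/file2.png']})

# Split each path on '/', keep the last two components, and join them again.
resultstatsDF['dirfile'] = resultstatsDF['file'].str.split('/').str[-2:].str.join('/')
print(resultstatsDF['dirfile'].tolist())
```

Here Series.str.split produces a list per row, .str[-2:] slices each list, and Series.str.join glues the remaining parts back together.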
