python - Setting pandas conditionals by row, Python 2.7

Question

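(I really hate naming these questions...)

So I'm about 90% of the way through the pandas learning process, but I have one more thing to figure out. Let me give an example (the actual original file is a comma-delimited CSV with many more rows):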

 Name    Price    Rating    URL                Notes1       Notes2            Notes3
 Foo     $450     9         a.com/x            NaN          NaN               NaN
 Bar     $99      5         see over           www.b.com    Hilarious         Nifty
 John    $551     2         www.c.com          Pretty       NaN               NaN
 Jane    $999     8         See Over in Notes  Funky        http://www.d.com  Groovy

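The URL column can say a lot of different things, but the problem entries all include "see over", and they don't consistently indicate which column to the right contains the website.

I'd like to do a couple of things here: first, move the website from whichever Notes column holds it into URL; second, collapse all the Notes columns into one column with a line break between the notes. So this (NaN removed, because pandas makes me do that in order to use the columns in df.loc):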

 Name    Price    Rating    URL                Notes1       
 Foo     $450     9         a.com/x            
 Bar     $99      5         www.b.com          Hilarious
                                               Nifty
 John    $551     2         www.c.com          Pretty
 Jane    $999     8         http://www.d.com   Funky
                                               Groovy

I got partway there by doing this:

 df['URL'] = df['URL'].fillna('')
 df['Notes1'] = df['Notes1'].fillna('')
 df['Notes2'] = df['Notes2'].fillna('')
 df['Notes3'] = df['Notes3'].fillna('')
 to_move = df['URL'].str.lower().str.contains('see over')
 df.loc[to_move, 'URL'] = df['Notes1']

I don't know how to find the Notes column that contains www or .com. If, for example, I try to use my method above as a condition, e.g.:

 if df['Notes1'].str.lower().str.contains('www'):
    df.loc[to_move, 'URL'] = df['Notes1']

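I get back ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). But adding .any() or .all() has the obvious flaw that neither gives me what I'm looking for: with .any(), for example, every row that meets the to_move requirement in URL gets whatever is in Notes1. I need the check to happen row by row. For similar reasons, I can't even get started on collapsing the Notes columns (and I also don't know how to check for cells that are non-null but hold an empty string, which is a problem I've created for myself at this point).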
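A minimal sketch of why the `if` fails and how a combined mask keeps the check per row, assuming a cut-down version of the sample table. The names `has_url` and `mask` are illustrative and not from the original code.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the sample data (only the columns needed here)
df = pd.DataFrame({
    "URL": ["a.com/x", "see over", "www.c.com", "See Over in Notes"],
    "Notes1": [np.nan, "www.b.com", "Pretty", "Funky"],
})

# Each of these holds one boolean per row, not a single True/False
to_move = df["URL"].str.lower().str.contains("see over")
has_url = df["Notes1"].fillna("").str.contains("www", regex=False)

# `if` needs a single truth value, so passing a whole Series raises ValueError
try:
    if has_url:
        pass
except ValueError as err:
    print("ValueError:", err)

# Combining the masks element-wise with & keeps the test row by row
mask = to_move & has_url
df.loc[mask, "URL"] = df.loc[mask, "Notes1"]
print(df)
```

Only the row where both conditions hold is updated; the "See Over in Notes" row is left alone here because its Notes1 value is not a URL.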

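As it stands, I know I also have to move Notes2 into Notes1, Notes3 into Notes2, and '' into Notes3 when the first condition is met, because I don't want leftover URLs in the Notes columns. I'm sure pandas has an easier route than what I'm doing, because it's pandas, and whenever I try to do anything with pandas I find out it can be done in one line instead of my 20...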
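The shift described above could be sketched as follows, assuming the NaN values have already been replaced with '' and that the URL sits in Notes1. The two-row frame and the `mask` name are illustrative only.

```python
import pandas as pd

# Two illustrative rows, with NaN already replaced by ''
df = pd.DataFrame({
    "URL": ["a.com/x", "see over"],
    "Notes1": ["", "www.b.com"],
    "Notes2": ["", "Hilarious"],
    "Notes3": ["", "Nifty"],
})

# Rows whose URL says "see over" and whose Notes1 looks like a website
mask = (df["URL"].str.lower().str.contains("see over")
        & df["Notes1"].str.contains("www", regex=False))

# Copy the URL across first, then shift the remaining notes one column left.
# .to_numpy() drops the column labels so pandas does not align Notes2 to Notes2.
df.loc[mask, "URL"] = df.loc[mask, "Notes1"]
df.loc[mask, ["Notes1", "Notes2"]] = df.loc[mask, ["Notes2", "Notes3"]].to_numpy()
df.loc[mask, "Notes3"] = ""
print(df)
```

The same pattern would have to be repeated for a URL found in Notes2 or Notes3, which is why collapsing the notes in a single pass tends to be simpler.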

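(PS, I don't care if empty Notes2 and Notes3 columns are left behind, because I won't be using them in the CSV import in my next step, though I can always stand to learn more than I need.)

Update: So I came up with a poor, verbose solution by working through my non-pandas Python logic one step at a time. I came up with this (same first five lines as above, minus the df.loc line):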

url_in1 = df['Notes1'].str.contains(r'\.com')
url_in2 = df['Notes2'].str.contains(r'\.com')
to_move = df['URL'].str.lower().str.contains('see over')
to_move1 = to_move & url_in1 
to_move2 = to_move & url_in2
df.loc[to_move1, 'URL'] = df.loc[url_in1, 'Notes1']
df.loc[url_in1, 'Notes1'] = df['Notes2']
df.loc[url_in1, 'Notes2'] = ''
df.loc[to_move2, 'URL'] = df.loc[url_in2, 'Notes2']
df.loc[url_in2, 'Notes2'] = ''

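(The row moves and to_move are repeated in the actual code.) I know there has to be a more efficient way... This also doesn't collapse the Notes columns, but that should be easy using the same method, except that I still don't know a good way to find the empty strings.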
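One way to find the empty strings and collapse the notes, sketched on an illustrative three-row frame in which NaN has already been replaced with ''. The `note_cols`, `is_empty` and `Notes` names are assumptions made for this example.

```python
import pandas as pd

# Illustrative notes columns, with NaN already replaced by ''
df = pd.DataFrame({
    "Notes1": ["", "Hilarious", "Pretty"],
    "Notes2": ["", "Nifty", ""],
    "Notes3": ["", "", ""],
})
note_cols = ["Notes1", "Notes2", "Notes3"]

# Comparing against '' gives a boolean mask marking the empty cells
is_empty = df[note_cols] == ""
print(is_empty)

# Join the non-empty notes of each row with a line break
df["Notes"] = df[note_cols].apply(
    lambda row: "\n".join(s for s in row if s != ""), axis=1
)
print(df["Notes"].tolist())
```

The row-wise apply is not the fastest option on a large file, but it keeps the "skip the empty cells" rule in one visible place.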

score 1 · Accepted Answer

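I'm still learning pandas, so some parts of this code may not be that elegant, but the general idea is: take all the Notes columns, find all the URLs in them, combine them with the URL column, and then concatenate the remaining notes into the Notes1 column: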

import pandas as pd
import numpy as np

# Return the first non-null value in a row, or NaN if there is none
def geturl(s):
    try:
        return next(e for e in s if not pd.isnull(e))
    except StopIteration:
        return np.nan

df = pd.read_csv("d:/temp/data2.txt")

# Keep only the Notes columns
dfnotes = df[[e for e in df.columns if 'Notes' in e]]

#       Notes1            Notes2  Notes3
# 0        NaN               NaN     NaN
# 1  www.b.com         Hilarious   Nifty
# 2     Pretty               NaN     NaN
# 3      Funky  http://www.d.com  Groovy

# Flag the cells that look like a URL (contain ".com")
dfurls = dfnotes.apply(lambda x: x.str.contains(r'\.com'), axis=1)
dfurls = dfurls.fillna(False).astype(bool)

#   Notes1 Notes2 Notes3
# 0  False  False  False
# 1   True  False  False
# 2  False  False  False
# 3  False   True  False

# First URL found in each row's notes, NaN when there is none
turl = dfnotes[dfurls].apply(geturl, axis=1)

# Use the URL found in the notes if there is one, else keep the original URL
df['URL'] = np.where(turl.isnull(), df['URL'], turl)
# Join the remaining (non-URL, non-null) notes with a space
df['Notes1'] = dfnotes[~dfurls].apply(lambda x: ' '.join(x.dropna()), axis=1)

del df['Notes2']
del df['Notes3']

df
#    Name Price  Rating               URL           Notes1
# 0   Foo  $450       9           a.com/x                 
# 1   Bar   $99       5         www.b.com  Hilarious Nifty
# 2  John  $551       2         www.c.com           Pretty
# 3  Jane  $999       8  http://www.d.com     Funky Groovy
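For anyone who wants to try this without a local file, here is a self-contained variant of the same idea. It is only a sketch: it assumes the sample rows from the question as inline CSV, treats any note containing ".com" as a URL (a rough heuristic), replaces the URL only where it says "see over", and joins the remaining notes with a line break as the question asked. The `split_row` helper and the `found_url` and `Notes` column names are made up for this example.

```python
import io
import pandas as pd

# Sample rows from the question, inlined so the example runs on its own
csv_text = """Name,Price,Rating,URL,Notes1,Notes2,Notes3
Foo,$450,9,a.com/x,,,
Bar,$99,5,see over,www.b.com,Hilarious,Nifty
John,$551,2,www.c.com,Pretty,,
Jane,$999,8,See Over in Notes,Funky,http://www.d.com,Groovy
"""
df = pd.read_csv(io.StringIO(csv_text))

note_cols = [c for c in df.columns if c.startswith("Notes")]
notes = df[note_cols].fillna("")

def split_row(row):
    """Split one row of notes into the first URL-like value and the rest."""
    values = [v for v in row if v != ""]
    urls = [v for v in values if ".com" in v]
    rest = [v for v in values if ".com" not in v]
    return pd.Series({
        "found_url": urls[0] if urls else "",
        "Notes": "\n".join(rest),
    })

parts = notes.apply(split_row, axis=1)

# Replace the URL only where it says "see over" and a URL was found
needs_url = (df["URL"].str.lower().str.contains("see over")
             & (parts["found_url"] != ""))
df.loc[needs_url, "URL"] = parts.loc[needs_url, "found_url"]

# Swap the three Notes columns for the single collapsed one
df["Notes"] = parts["Notes"]
df = df.drop(columns=note_cols)
print(df)
```

Unlike the code above, this leaves a URL that does not say "see over" untouched even if a note also contains ".com"; whether that is the right rule depends on the real data.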
