python-2.7 - Single-quote replacement and handling of null integers in pandas/Python 2.7

Question

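I'm new to Pandas/Python, and I've had to write some clumsy code. I'd appreciate any advice on how you would do this and how to speed it up (I'll be running this on gigabytes of data).

So, I'm using pandas/python for some ETL work. Row-wise calculations are performed, so I need the columns as numeric types during the process (that part is omitted here). I need to output some fields as arrays and get rid of the single quotes, the nan's, and the ".0"s.

First question: is there a way to vectorize these if/else statements, like ifelse in R? Second, there must be a better way to remove the ".0". There seem to be major issues with how pandas/numpy handle nulls in numeric types.

Lastly, .replace on the DataFrame doesn't seem to work for single quotes. Am I missing something? Here is the sample code; please let me know if you have any questions about it: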

import numpy as np
import pandas as pd

# have some nulls and need it in integers
d = {'one' : [1.0, 2.0, 3.0, 4.0],'two' : [4.0, 3.0, np.nan, 1.0]}
dat = pd.DataFrame(d)

# make functions to get rid of the ".0" and necessarily converting to strings
def removeforval(val):
    if str(val)[-2:] == ".0":
        val = str(val)[:len(str(val))-2]
    else:
        val = str(val)
    return val
def removeforcol(col):
    col = col.apply(removeforval)
    return col
dat = dat.apply(removeforcol,axis=0)
# remove the nan's
dat = dat.replace('nan','')

# need some fields in arrays on a postgres database
quoted  = ['{' + str(tuple(x))[1:-1] + '}'  for x in dat.to_records(index=False)]
print "Before single quote removal"
print quoted

# try to replace single quotes using DataFrame's replace
quoted_df = pd.DataFrame(quoted).replace('\'','')
quoted_df = quoted_df.replace('\'','')
print "DataFrame does not seem to work"
print quoted_df

# use a loop
for item in range(len(quoted)):
    quoted[item] = quoted[item].replace('\'','')
print "This Works"
print quoted

Thanks!

score 1 · Accepted Answer

You realize it's quite odd to build a string exactly like this. It isn't valid Python at all. What are you doing with it? Why are you stringifying it?

Edit

In [144]: list([ "{%s , %s}" % tup[1:] for tup in df.replace(np.nan,0).astype(int).replace(0,'').itertuples() ])
Out[144]: ['{1 , 4}', '{2 , 3}', '{3 , }', '{4 , 1}']
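
One caveat with the one-liner above: replace(0, '') also blanks any genuine zeros in the data. Below is a minimal sketch of an alternative, assuming the same two-column frame as in the question. The helper name to_int_text is made up for illustration. It uses np.where as a rough analogue of R's ifelse, and joins the columns directly so that no tuple repr (and therefore no single quotes) ever appears. The last part shows what is likely behind the .replace puzzle: without regex=True, DataFrame.replace only matches whole cell values, not substrings. Whether an empty slot such as '{3, }' is acceptable depends on the target; for a Postgres array you may need NULL instead, which is not checked here.

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                    'two': [4.0, 3.0, np.nan, 1.0]})

def to_int_text(col):
    # Hypothetical helper: integer strings for real values, '' for nulls.
    # fillna(0) only exists to make the int cast possible; the null mask
    # from the original column decides what is blanked, so real zeros survive.
    as_text = col.fillna(0).astype(int).astype(str)
    return pd.Series(np.where(col.isnull(), '', as_text), index=col.index)

text = dat.apply(to_int_text)

# Join each row's strings directly instead of going through str(tuple(...)).
arrays = ['{' + ', '.join(row) + '}' for row in text.values]
# arrays == ['{1, 4}', '{2, 3}', '{3, }', '{4, 1}']

# Substring replacement on a DataFrame needs regex=True; without it,
# replace("'", '') only changes cells that are exactly "'".
quoted_df = pd.DataFrame(["{'1', '4'}", "{'3', ''}"])
unquoted_df = quoted_df.replace("'", '', regex=True)
# unquoted_df[0].tolist() == ['{1, 4}', '{3, }']
```

For a single column, Series.str.replace does the same substring replacement and may read more clearly.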
