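I'm trying to build a very simple feature mapper in Python 3. I have two goals: get the best performance and learn how to write Python :)
Here is my code, which does not work: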
import pandas as pd
source = pd.DataFrame({'Country' : ['USA', 'USA', 'Russia','USA'],
                       'City' : ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})
# trim column value selecting first two symbols
def s_trim(x):
    return x[:2]
# make new column from two selecting first two symbols from each
def s_trim_concat(x,y):
    return '%s-%s' % (x[:2],y[:2])
features = [
    ('trim',['Country'],s_trim),
    ('trim1',['Country','City'],s_trim_concat),
    ('trim2',['City','Country'],s_trim_concat)
]
for feature_name, columns, func in features:
    source[feature_name] = source[columns].apply(func, axis=1)
print(source)
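For reference, a small check of what `apply(func, axis=1)` hands to the function. With `axis=1`, pandas passes each row as a single `Series`, not as separate scalar arguments, so a two-parameter function such as `s_trim_concat(x, y)` fails with a `TypeError`. The `probe` function and the `received` list are hypothetical names used only for this illustration.

```python
import pandas as pd

source = pd.DataFrame({'Country': ['USA', 'Russia'],
                       'City': ['New-York', 'Sankt-Petersburg']})

received = []

def probe(arg):
    # Record the type of whatever apply() passes in (hypothetical helper).
    received.append(type(arg).__name__)
    return 0

# With axis=1 the function is called once per row with a single argument.
source[['Country', 'City']].apply(probe, axis=1)
print(received)

def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])

# The two-parameter function receives only one argument (the row Series).
try:
    source[['Country', 'City']].apply(s_trim_concat, axis=1)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```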
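Update: the code works now, but I had to make the functions more complicated, so I'm still looking for a good solution that allows simple functions without any type handling inside them: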
import pandas as pd
source = pd.DataFrame({'Country' : ['USA', 'USA', 'Russia','USA'],
                       'City' : ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})
# trim column value selecting first two symbols
def s_trim(x):
    return x.str[:2]
# make new column from two selecting first two symbols from each
def s_trim_concat(row):
    # row is a Series with string labels, so access its values by position explicitly
    x = row.iloc[0]
    y = row.iloc[1]
    return '%s-%s' % (x[:2],y[:2])
features = [
    ('trim',['Country'],s_trim),
    ('trim1',['Country','City'],s_trim_concat),
    ('trim2',['City','Country'],s_trim_concat)
]
for feature_name, columns, func in features:
    if len(columns) == 1:
        source[feature_name] = source[columns].apply(func)
    else:
        source[feature_name] = source[columns].apply(func, axis=1)
print(source)
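One possible direction, as a minimal sketch rather than a definitive answer: keep the functions purely scalar (plain strings in, plain string out) and let the loop unpack each row's values into positional arguments. The helper name `add_features` is hypothetical. It zips the selected columns and calls `func(*values)` per row, so the same loop handles one column or several. This is not benchmarked here; whether it beats `apply(axis=1)` on real data is an assumption to verify.

```python
import pandas as pd

source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})

# Simple scalar functions: they only ever see plain strings.
def s_trim(x):
    return x[:2]

def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])

features = [
    ('trim', ['Country'], s_trim),
    ('trim1', ['Country', 'City'], s_trim_concat),
    ('trim2', ['City', 'Country'], s_trim_concat),
]

def add_features(df, features):
    """Hypothetical helper: add one new column per (name, columns, func) entry."""
    for feature_name, columns, func in features:
        # zip the selected columns so each tuple holds one row's scalar values
        rows = zip(*(df[col] for col in columns))
        # unpack the tuple into positional arguments of the scalar function
        df[feature_name] = [func(*values) for values in rows]
    return df

result = add_features(source.copy(), features)
print(result)
```

Because the functions never touch pandas objects, they can also be tested on their own, for example `s_trim_concat('USA', 'New-York')`.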