
I'm trying to build the simplest possible feature mapper in python3. Two goals: get the best performance and learn how to write Python :)

Here is my code, which does not work:

import pandas as pd
source = pd.DataFrame({'Country' : ['USA', 'USA', 'Russia','USA'], 
                  'City' : ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})

#trim column value selecting first two symbols
def s_trim(x):
    return x[:2]

#make new column from two selecting first two symbols from each
def s_trim_concat(x,y):
    return '%s-%s' % (x[:2],y[:2])

features = [
    ('trim',['Country'],s_trim),
    ('trim1',['Country','City'],s_trim_concat),
    ('trim2',['City','Country'],s_trim_concat)
    ]

for feature_name, columns, func in features:
    source[feature_name] = source[columns].apply(func, axis=1)

print(source)

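Update: the code works now, but I had to complicate the functions, so I'm still looking for a good solution that allows simple functions with no type handling inside them: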

import pandas as pd
source = pd.DataFrame({'Country' : ['USA', 'USA', 'Russia','USA'], 
                  'City' : ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})

#trim column value selecting first two symbols
def s_trim(x):
    return x.str[:2]

#make new column from two selecting first two symbols from each
def s_trim_concat(row):
    x = row.iloc[0]
    y = row.iloc[1]
    return '%s-%s' % (x[:2],y[:2])

features = [
    ('trim',['Country'],s_trim),
    ('trim1',['Country','City'],s_trim_concat),
    ('trim2',['City','Country'],s_trim_concat)
    ]

for feature_name, columns, func in features:
    if len(columns) == 1:
        source[feature_name] = source[columns].apply(func)
    else:
        source[feature_name] = source[columns].apply(func, axis=1)
print(source)
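
One way to keep both functions as plain string functions is to move the row handling into a small wrapper. This is only a sketch: `add_feature` is a hypothetical helper name, not a pandas API, and it assumes every listed column holds strings.

```python
import pandas as pd

source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})

# plain string functions, no pandas types inside
def s_trim(x):
    return x[:2]

def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])

# hypothetical helper: zip the selected columns together and
# call func once per row with one positional argument per column
def add_feature(df, name, columns, func):
    df[name] = [func(*values) for values in zip(*(df[c] for c in columns))]

features = [
    ('trim', ['Country'], s_trim),
    ('trim1', ['Country', 'City'], s_trim_concat),
    ('trim2', ['City', 'Country'], s_trim_concat),
]

for feature_name, columns, func in features:
    add_feature(source, feature_name, columns, func)

print(source)
```

The loop no longer needs to branch on the number of columns, because the unpacking works the same for one column or several.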

2 Answers


I think the problem is that you are passing a list to s_trim_concat instead of two separate arguments.
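
To make that concrete, here is a small sketch on a shortened two-row version of the frame (the shortened data is my own, only for illustration). With `axis=1`, `apply` hands each row to the function as a single Series, so a two-parameter function fails; unpacking the row supplies the values as separate arguments.

```python
import pandas as pd

source = pd.DataFrame({'Country': ['USA', 'Russia'],
                       'City': ['New-York', 'Sankt-Petersburg']})

def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])

# apply(axis=1) calls the function with ONE argument (the row as a Series),
# so the second parameter y is never supplied
try:
    source[['Country', 'City']].apply(s_trim_concat, axis=1)
    failed = False
except TypeError:
    failed = True

# unpacking the row passes its values as separate positional arguments
result = source[['Country', 'City']].apply(lambda row: s_trim_concat(*row), axis=1)

print(failed)
print(list(result))
```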

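Could you provide an example of what the final output should look like here? First, I need to clarify: which key should the value returned from s_trim_concat be associated with?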

Update

Try this:

import pandas as pd
source = pd.DataFrame({'Country' : ['USA', 'USA', 'Russia','USA'], 
                  'City' : ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})

#trim column value selecting first two symbols
def s_trim(x):
    return x[:2]

#make new column from two selecting first two symbols from each
def s_trim_concat(x,y):
    return '%s-%s' % (x[:2],y[:2])

features = [
    ('trim',['Country'],s_trim),
    ('trim1',['Country','City'],s_trim_concat),
    ('trim2',['City','Country'],s_trim_concat)
    ]

for feature_name, columns, func in features:
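    # the apply() builtin was removed in Python 3; unpack each row's values instead
    source[feature_name] = source[columns].apply(lambda row: func(*row), axis=1)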

print(source)
answered 2013-07-21T11:25:28.537

I may have found a solution:

import pandas as pd
source = pd.DataFrame({'Country' : ['USA', 'USA', 'Russia','USA'], 
                  'City' : ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})

#trim column value selecting first two symbols
def s_trim(x):
    return x.str[:2]

#make new column from two selecting first two symbols from each
def s_trim_concat(x,y):
    return '%s-%s' % (x[:2],y[:2])

features = [
    ('trim',['Country'],s_trim),
    ('trim1',['Country','City'],s_trim_concat),
    ('trim2',['City','Country'],s_trim_concat)
    ]

for feature_name, columns, func in features:
    source[feature_name] = source[columns].apply(
        func if len(columns) == 1 
        else lambda x: func(x.iloc[0], x.iloc[1]), axis=1)
print(source)
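
Since performance was one of the stated goals, a possible alternative is to skip the row-wise `apply` and let each function work on whole columns through the `.str` accessor. This is a sketch, not a measured comparison: the `_vec` function names are made up for illustration, and the trade-off is that the functions now take Series rather than plain strings.

```python
import pandas as pd

source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})

# hypothetical column-wise variants: each argument is a whole Series
def s_trim_vec(x):
    return x.str[:2]

def s_trim_concat_vec(x, y):
    return x.str[:2] + '-' + y.str[:2]

features = [
    ('trim', ['Country'], s_trim_vec),
    ('trim1', ['Country', 'City'], s_trim_concat_vec),
    ('trim2', ['City', 'Country'], s_trim_concat_vec),
]

for feature_name, columns, func in features:
    # pass each selected column as its own positional argument
    source[feature_name] = func(*(source[c] for c in columns))

print(source)
```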
answered 2013-07-21T12:55:41.813