
This is my first question on Stack Overflow.

I have two DataFrames of different sizes: df1 (266808 rows) and df2 (201 rows).

[Screenshot: df1]

[Screenshot: df2]

For each class interval in df2['Class_interval'], I want to count how many values in df1['WS_140m'] fall inside that interval and store the count in df2['count'].
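Stated precisely, an interval (a, b] should receive the number of values v with a < v <= b. A minimal sketch of that computation with `pd.cut` follows. It assumes (not confirmed by the question) that `Class_interval` holds non-overlapping `pd.Interval` objects, with NaN allowed for rows that have no interval; the helper name `count_per_interval` and the small sample data are hypothetical stand-ins for the real 266808/201-row frames.

```python
import numpy as np
import pandas as pd

def count_per_interval(values: pd.Series, intervals: pd.Series) -> pd.Series:
    """Count how many entries of `values` fall into each interval (hypothetical helper).

    Assumes `intervals` holds non-overlapping pd.Interval objects; rows whose
    interval is missing (NaN) get a count of 0.
    """
    valid = intervals.dropna()
    bins = pd.IntervalIndex(valid)
    # With an IntervalIndex as `bins`, pd.cut assigns each value to the bin that
    # contains it; values outside every bin get category code -1.
    codes = pd.cut(values, bins=bins).cat.codes.to_numpy()
    hits = np.bincount(codes[codes >= 0], minlength=len(bins))
    counts = pd.Series(0, index=intervals.index)
    counts.loc[valid.index] = hits
    return counts

# Small stand-in data (hypothetical sample, not the real data set).
df1 = pd.DataFrame({"WS_140m": [5.10, 5.16, 5.98, 5.58, 4.81]})
df2 = pd.DataFrame({
    "count": 0,
    "Class_interval": [
        np.nan,
        pd.Interval(0.05, 0.15, closed="right"),
        pd.Interval(0.15, 0.25, closed="right"),
        pd.Interval(0.25, 0.35, closed="right"),
        pd.Interval(3.95, 5.15, closed="right"),
    ],
})
df2["count"] = count_per_interval(df1["WS_140m"], df2["Class_interval"])
print(df2)
```

Counting via the category codes keeps the result in the same order as the intervals, so it can be written straight back into the matching rows of df2.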

I have tried:

1)

df2['count']=pd.cut(x=df1['WS_140m'], bins=df2['Class_interval'])

2)

df2['count'] = df1['WS_140m'].groupby(df1['Class_interval'])

3)

for anum in df1['WS_140m']:
    if anum in df2['Class_interval']:
        df2['count'] = df2['count'] + 1
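In attempt 3, `anum in df2['Class_interval']` checks whether `anum` is an index label of the Series, not whether it lies inside one of the intervals, and `df2['count'] + 1` adds 1 to every row rather than only to the matching one. A corrected version of the same loop idea, sketched below, iterates over the intervals (201) instead of the values (266808) and counts the matching values with a vectorised comparison. It assumes each `Class_interval` entry is a `pd.Interval` or NaN; the helper name `count_with_loop` is hypothetical.

```python
import numpy as np
import pandas as pd

def count_with_loop(values: pd.Series, intervals: pd.Series) -> list:
    """Return one count per entry of `intervals` (hypothetical helper)."""
    counts = []
    for iv in intervals:
        if not isinstance(iv, pd.Interval):  # e.g. a NaN placeholder row
            counts.append(0)
            continue
        # Respect the interval's own open/closed ends, e.g. (a, b] for closed="right".
        above = values >= iv.left if iv.closed_left else values > iv.left
        below = values <= iv.right if iv.closed_right else values < iv.right
        counts.append(int((above & below).sum()))
    return counts

# Hypothetical sample data.
df1 = pd.DataFrame({"WS_140m": [5.10, 5.16, 5.98, 5.58, 4.81]})
df2 = pd.DataFrame({"Class_interval": [np.nan, pd.Interval(0.05, 0.15), pd.Interval(3.95, 5.15)]})
df2["count"] = count_with_loop(df1["WS_140m"], df2["Class_interval"])
```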

Any guidance would be appreciated.


2 Answers


Try something like:


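import pandas as pd

def in_class_interval(value, interval):
    # Boolean mask of the entries of `value` inside `interval`; example body,
    # assuming a right-closed pd.Interval such as (0.05, 0.15] (NaN matches nothing).
    if not isinstance(interval, pd.Interval):
        return pd.Series(False, index=value.index)
    return (value > interval.left) & (value <= interval.right)

def in_class_interval_closure(interval):
    return lambda x: in_class_interval(x, interval)

# len() counts matching rows (DataFrame.size would count rows * columns).
df2['count'] = df2['Class_interval'].apply(
    lambda x: len(df1[in_class_interval_closure(x)(df1['WS_140m'])]))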

Define your function in_class_interval(value, interval) so that it returns a boolean (a boolean mask when value is a Series).
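If `Class_interval` holds strings such as `(0.05,0.15]` (the form shown in the second answer's df2) rather than interval objects, one possible `in_class_interval` parses the brackets to decide which ends are closed. This is a sketch under that assumption, with hypothetical sample data; it plugs into the closure approach above unchanged.

```python
import numpy as np
import pandas as pd

def in_class_interval(value, interval):
    """Boolean mask: which entries of `value` lie inside `interval`.

    Assumes `interval` is a string like "(0.05,0.15]" or "[0.05,0.15)";
    anything else (e.g. NaN) matches nothing.
    """
    if not isinstance(interval, str):
        return pd.Series(False, index=value.index)
    text = interval.strip()
    left, right = (float(part) for part in text[1:-1].split(","))
    # "[" / "]" mean a closed end, "(" / ")" an open end.
    above = value >= left if text[0] == "[" else value > left
    below = value <= right if text[-1] == "]" else value < right
    return above & below

def in_class_interval_closure(interval):
    return lambda x: in_class_interval(x, interval)

# Hypothetical sample data.
df1 = pd.DataFrame({"WS_140m": [5.10, 5.16, 5.98, 5.58, 4.81]})
df2 = pd.DataFrame({"Class_interval": [np.nan, "(0.05,0.15]", "(3.95,5.15]"]})
df2["count"] = df2["Class_interval"].apply(
    lambda x: len(df1[in_class_interval_closure(x)(df1["WS_140m"])])
)
```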

answered 2020-04-13T08:27:12.067

I think something like this will do it:

In [330]: df1
Out[330]:
   WS_140m
0     5.10
1     5.16
2     5.98
3     5.58
4     4.81

In [445]: df2
Out[445]:
   count Class_interval
0      0            NaN
1      0    (0.05,0.15]
2      0    (0.15,0.25]
3      0    (0.25,0.35]
4      0    (3.95,5.15]

In [446]: df2.Class_interval = df2.Class_interval.str.replace(']', ')', regex=False)

In [451]: from ast import literal_eval
In [449]: for i, v in df2.Class_interval.items():
     ...:     if pd.notnull(v):
     ...:         df2.at[i, 'Class_interval'] = literal_eval(df2.Class_interval[i])

In [342]: df2['falls_in_range'] = df1.WS_140m.between(df2.Class_interval.str[0], df2.Class_interval.str[1])

You can then increment the count wherever falls_in_range is True, like this:

In [360]: df2['count'] = df2.loc[df2.index[df2['falls_in_range'] == True].tolist()]['count'] +1

In [361]: df2
Out[361]:
   count Class_interval  falls_in_range
0    NaN            NaN           False
1    NaN   (0.05, 0.15)           False
2    NaN   (0.15, 0.25)           False
3    NaN   (0.25, 0.35)           False
4    1.0   (3.95, 5.15)            True
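Note that `df1.WS_140m.between(...)` in `In [342]` aligns df1 and df2 on their index, so it compares row i of df1 only with interval i of df2; that is why only row 4 is True above, and with the full 266808-row df1 and 201-row df2 the indexes would not line up. To count every `WS_140m` value against every interval, one option (a different technique from this answer, sketched here) is to sort the values once and use `np.searchsorted` with the `(left, right)` tuples produced by the `literal_eval` step. It assumes the intervals are left-open and right-closed, `(left, right]`, as in the original strings; the helper name `count_in_tuples` is hypothetical.

```python
import numpy as np
import pandas as pd

def count_in_tuples(values, intervals):
    """Count values v with left < v <= right for each (left, right) tuple.

    Hypothetical helper; non-tuple entries (e.g. NaN) get a count of 0.
    """
    sorted_vals = np.sort(np.asarray(values, dtype=float))
    counts = []
    for iv in intervals:
        if not isinstance(iv, tuple):
            counts.append(0)
            continue
        left, right = iv
        # (number of values <= right) - (number of values <= left)
        n_le_right = np.searchsorted(sorted_vals, right, side="right")
        n_le_left = np.searchsorted(sorted_vals, left, side="right")
        counts.append(int(n_le_right - n_le_left))
    return counts

# Hypothetical sample data mirroring the frames above after literal_eval.
df1 = pd.DataFrame({"WS_140m": [5.10, 5.16, 5.98, 5.58, 4.81]})
df2 = pd.DataFrame({"Class_interval": pd.Series(
    [np.nan, (0.05, 0.15), (0.15, 0.25), (0.25, 0.35), (3.95, 5.15)], dtype=object)})
df2["count"] = count_in_tuples(df1["WS_140m"], df2["Class_interval"])
```

Each interval costs two binary searches on the sorted values, so the 201 intervals are handled without scanning all 266808 values per interval.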
answered 2020-04-13T08:35:11.420