
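Hello, I am using pandas to import data from two Excel files; a sample of the data in one of the files is shown below. Basically, I am trying to find the timestamps that appear in both files and then sort the data in, for example, the "Power" column for those shared timestamps into bins. In this example the bins are 0-50, 50-100 and so on, in steps of 50 up to, for example, 1000.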

Location    UnitName  Timestamp            Power         Windspeed    Yaw
Bull Creek  F10       01/11/2014 00:00:00  7,563641548   3,957911002  280,5478821
Bull Creek  F10       01/11/2014 00:20:00  60,73444748   4,24157236   280,4075012
Bull Creek  F10       01/11/2014 00:30:00  63,15441132   4,241089859  280,3903809
Bull Creek  F10       01/11/2014 00:40:00  59,09280396   4,38904965   280,4152527
Bull Creek  F10       01/11/2014 00:50:00  69,26197052   4,374599175  280,3750916
Bull Creek  F10       01/11/2014 01:00:00  101,0624237   5,343887005  280,5173035
Bull Creek  F10       01/11/2014 01:10:00  122,7936935   5,183885235  280,4681702
Bull Creek  F10       01/11/2014 01:20:00  86,57110596   5,046733923  280,3834534
Bull Creek  F10       01/11/2014 01:40:00  16,74042702   3,024427626  280,1408386
Bull Creek  F10       01/11/2014 01:50:00  12,5870142    2,931351769  280,1185913
Bull Creek  F10       01/11/2014 02:00:00  -1,029753685  3,116549245  279,9686279
Bull Creek  F10       01/11/2014 02:10:00  13,35998058   3,448055706  279,8687134
Bull Creek  F10       01/11/2014 02:20:00  17,42461395   2,943588415  280,1383057
Bull Creek  F10       01/11/2014 02:30:00  -9,614940643  2,744164819  280,6514893
Bull Creek  F10       01/11/2014 02:50:00  -11,01966286  3,554833538  283,1451416
Bull Creek  F10       01/11/2014 03:00:00  -4,383010387  4,279259377  283,3281555

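I would like to know whether there is a smarter way to do this than what I have come up with so far, since the bin size and the maximum value may change. Here is the code I have; it works, but it is not very smart.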

import pandas as pd

fileREF = 'FilterDataREF.xlsx'

dataREF = pd.read_excel(fileREF, sheet_name='Sheet1')

filePCU = 'FilterDataPCU.xlsx'

dataPCU = pd.read_excel(filePCU, sheet_name='Ark1')

dateREF = dataREF['Timestamp']
datePCU = dataPCU['Timestamp']


n = 50
PowerLim = 1500
nBins = PowerLim // n
bins = range(0, PowerLim+1, n)

for i in range(len(dataREF)):
    for j in range(len(dataPCU)):
        # Keep only rows with the same timestamp and positive power in both files
        if (dataREF['Timestamp'][i] == dataPCU['Timestamp'][j]
                and dataREF['Power'][i] > 0 and dataPCU['Power'][j] > 0):
            data_common = [dataREF.loc[i], dataPCU.loc[j]]

            data_power = [data_common[0]['Power'], data_common[1]['Power']]
            power_dif = data_common[1]['Power'] - data_common[0]['Power']

            power_REF = data_power[0]
            power_PCU = data_power[1]

            # Hard-coded bins: one line per 50-wide interval
            bin1 = power_REF[power_REF < 50]
            bin2 = power_REF[power_REF > 50 and power_REF < 100]
            bin3 = power_REF[power_REF > 100 and power_REF < 150]

1 Answer


You can use the pd.cut function:

edges = list(range(0, int(data_common['power_REF'].max()) + 51, 50))
data_common['bin'] = pd.cut(data_common['power_REF'], bins=edges, labels=edges[:-1])
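
For completeness, here is a minimal sketch of how the whole task (matching timestamps, filtering positive power, binning) might look without the nested loops. It assumes both frames have `Timestamp` and `Power` columns as in the question. The helper name `bin_common_power`, the suffixes `_REF` / `_PCU`, and the small in-memory frames are illustrative only: the REF values are rounded from the sample above, and the PCU values are made up because the question does not show that file.

```python
import pandas as pd

# Stand-ins for the two Excel sheets. REF values are rounded from the sample
# in the question; PCU values are invented purely for illustration.
dataREF = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2014-11-01 00:00", "2014-11-01 00:20",
                                 "2014-11-01 01:00", "2014-11-01 02:00"]),
    "Power": [7.56, 60.73, 101.06, -1.03],
})
dataPCU = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2014-11-01 00:00", "2014-11-01 00:20",
                                 "2014-11-01 01:00", "2014-11-01 01:10"]),
    "Power": [12.10, 55.40, 122.79, 80.00],
})


def bin_common_power(ref, pcu, width=50, power_lim=1500):
    """Hypothetical helper: join on Timestamp, then bin the REF power."""
    # An inner merge keeps only timestamps present in both frames
    common = ref.merge(pcu, on="Timestamp", suffixes=("_REF", "_PCU"))
    # Keep rows where both power readings are positive
    mask = (common["Power_REF"] > 0) & (common["Power_PCU"] > 0)
    common = common[mask].copy()
    # Bin edges 0, width, 2*width, ..., power_lim
    edges = list(range(0, power_lim + 1, width))
    # right=False gives half-open bins [0, 50), [50, 100), ...
    # Each bin is labelled with its lower edge
    common["bin"] = pd.cut(common["Power_REF"], bins=edges,
                           labels=edges[:-1], right=False)
    common["power_dif"] = common["Power_PCU"] - common["Power_REF"]
    return common


common = bin_common_power(dataREF, dataPCU)
print(common[["Timestamp", "Power_REF", "Power_PCU", "bin", "power_dif"]])
```

With this layout, changing the bin size or the maximum only means passing different `width` and `power_lim` values, and the per-bin data can then be aggregated with `common.groupby("bin")`.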
Answered 2018-01-23T12:32:11.747