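I have a tricky groupby problem that I need help with.
I have drivers' names, and over time each driver has driven several cars. Every time a driver starts a car and drives it, I capture the cycles and hours that the car transmits remotely.
What I want to do is use grouping to see when a driver gets a new car. I'm using Car_Cycles and Car_Hours to watch for a reset (a new car): for each driver, hours and cycles increase until there is a new car, at which point they reset. I want to number each car as its own sequence, but logically the only way to identify a car is through that cycle/hour reset.
I'm currently doing this with a for loop and if statements over the DataFrame, and it takes hours to process. I have several hundred thousand rows, each with about 20 columns.
My data comes from sensors over a moderately reliable connection, so I want to filter with the following rule: a new group is valid only if Car_Hours and Car_Cycles are both lower than the last row of the previous group for 2 consecutive rows. Requiring both outputs to drop, and requiring the drop to hold for two rows, is enough to filter out all the bad data.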
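The two-row reset rule can be expressed without a loop by comparing each row with its neighbours inside the same driver. A minimal sketch, assuming `file` is a pandas DataFrame with numeric `Car_Hours`/`Car_Cycles` columns (cast with `astype` first if they are strings) and each driver's rows already in time order; the function name `add_car_group` is hypothetical. `groupby('name').shift()` fetches the previous/next row of the same driver, a row is flagged only if both values drop and the next row also stays below the pre-reset row, and a per-driver cumulative sum of the flags numbers the cars.

```python
import pandas as pd


def add_car_group(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a Car_Group column numbering each driver's cars.

    Row i starts a new car only if, within the same driver, Car_Hours and
    Car_Cycles at row i are both lower than at row i-1 AND the same holds for
    row i+1 compared with row i-1 (the reset must last two rows).
    """
    g = df.groupby('name', sort=False)
    prev_h = g['Car_Hours'].shift(1)    # previous row of the same driver
    prev_c = g['Car_Cycles'].shift(1)
    next_h = g['Car_Hours'].shift(-1)   # next row of the same driver
    next_c = g['Car_Cycles'].shift(-1)

    # Shifted values are NaN at a driver's first/last row; comparisons with NaN
    # are False, so those rows can never start a new car.
    drop_now = (df['Car_Hours'] < prev_h) & (df['Car_Cycles'] < prev_c)
    drop_next = (next_h < prev_h) & (next_c < prev_c)
    new_car = (drop_now & drop_next).astype(int)

    out = df.copy()
    # Running count of confirmed resets per driver, numbering starts at 1.
    out['Car_Group'] = new_car.groupby(df['name'], sort=False).cumsum() + 1
    return out
```

Usage would be `file = add_car_group(file)`. On the sample data in this post it reproduces the Car_Group column shown (1/2/3 for jan, 1/2 for pete, 1 for mike), and a single-row glitch where the values dip for one row and then recover does not start a new group.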
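I'd greatly appreciate it if someone could show me a fast way to solve the Car_Group problem without my clunky for loop and if statements (the Car_Group column in the sample data below is the result I'm after).
Also, for the truly adventurous: I've included my original for loop and if statements below. Note that I do some other analysis/tracking within each group to look at other behavior of the cars. If you dare to look at that code and show me an efficient pandas replacement, even better.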
name Car_Hours Car_Cycles Car_Group DeltaH
jan 101 404 1 55
jan 102 405 1 55
jan 103 406 1 56
jan 104 410 1 55
jan 105 411 1 56
jan 0 10 2 55
jan 1 12 2 58
jan 2 14 2 57
jan 3 20 2 59
jan 4 26 2 55
jan 10 36 2 56
jan 15 42 2 57
jan 27 56 2 57
jan 100 61 2 58
jan 500 68 2 58
jan 2 4 3 56
jan 3 15 3 57
pete 190 21 1 54
pete 211 29 1 58
pete 212 38 1 55
pete 304 43 1 56
pete 14 20 2 57
pete 15 27 2 57
pete 36 38 2 58
pete 103 47 2 55
mike 1500 2001 1 55
mike 1512 2006 1 59
mike 1513 2012 1 58
mike 1515 2016 1 57
mike 1516 2020 1 55
mike 1517 2024 1 57
..............
```python
# `file` is a pandas DataFrame with a default RangeIndex (0..n-1); rows are in
# time order and each driver's rows are contiguous.
for i in range(len(file)):
    if i == 0:
        # Initialise all counters on the first row
        DeltaH_limit = 57
        car_thresholds = 0
        car_threshold_counts = 0
        car_change_true = 0
        car_change_index_loc = i
        total_person_thresholds = 0
        person_alert_count = 0
        person_car_count = 1
        person_car_change_count = 0
        total_fleet_thresholds = 0
        fleet_alert_count = 0
        fleet_car_count = 1
        fleet_car_change_count = 0
        if float(file['DeltaH'][i]) >= DeltaH_limit:
            car_threshold_counts += 1
            car_thresholds += 1
            total_person_thresholds += 1
            total_fleet_thresholds += 1
    elif i == 1:
        if float(file['DeltaH'][i]) >= DeltaH_limit:
            car_threshold_counts += 1
            car_thresholds += 1
            total_person_thresholds += 1
            total_fleet_thresholds += 1
    elif i > 1:
        if file['name'][i] == file['name'][i-1]:  # same person?
            # Count DeltaH exceedances; a non-exceeding row breaks the run
            if float(file['DeltaH'][i]) >= DeltaH_limit:
                car_threshold_counts += 1
                car_thresholds += 1
                total_person_thresholds += 1
                total_fleet_thresholds += 1
            else:
                car_threshold_counts = 0
            # Third consecutive exceedance raises an alert (bumped to 4 so it fires once per run)
            if car_threshold_counts == 3:
                car_threshold_counts += 1
                person_alert_count += 1
                fleet_alert_count += 1
            # Car change? Compare cycles and hours to look for a reset that holds
            # for two consecutive rows (rows i and i+1 both below row i-1)
            if i+1 < len(file):
                if file['name'][i] == file['name'][i+1] == file['name'][i-1]:
                    if int(file['Car_Cycles'][i]) < int(file['Car_Cycles'][i-1]) and int(file['Car_Hours'][i]) < int(file['Car_Hours'][i-1]):
                        if int(file['Car_Cycles'][i+1]) < int(file['Car_Cycles'][i-1]) and int(file['Car_Hours'][i+1]) < int(file['Car_Hours'][i-1]):
                            car_thresholds = 0
                            car_change_true = 1
                            car_threshold_counts = 0
                            old_pump_first_flight = car_change_index_loc
                            car_change_index_loc = i
                            old_pump_last_flight = i-1
                            person_car_count += 1
                            person_car_change_count += 1
                            fleet_car_count += 1
                            fleet_car_change_count += 1
                            print(i, ' working hard!')
                        else:
                            car_change_true = 0
                    else:
                        car_change_true = 0
                else:
                    car_change_true = 0
            else:
                car_change_true = 0
        else:  # different person, so also a new car: reset per-car and per-person counters
            car_thresholds = 0
            car_threshold_counts = 0
            car_change_index_loc = i
            car_change_true = 0
            total_person_thresholds = 0
            person_alert_count = 0
            person_car_count = 1
            person_car_change_count = 0
            if float(file['DeltaH'][i]) >= DeltaH_limit:
                car_threshold_counts += 1
                car_thresholds += 1
                total_person_thresholds += 1
                total_fleet_thresholds += 1
    # Write this row's running totals back to the DataFrame
    file.loc[i, 'car_thresholds'] = car_thresholds
    file.loc[i, 'car_threshold_counts'] = car_threshold_counts
    file.loc[i, 'car_change_true'] = car_change_true
    file.loc[i, 'car_change_index_loc'] = car_change_index_loc
    file.loc[i, 'total_person_thresholds'] = total_person_thresholds
    file.loc[i, 'person_alert_count'] = person_alert_count
    file.loc[i, 'person_car_count'] = person_car_count
    file.loc[i, 'person_car_change_count'] = person_car_change_count
    file.loc[i, 'Total_Fleet_Thresholds'] = total_fleet_thresholds
    file.loc[i, 'Fleet_Alert_Count'] = fleet_alert_count
    file.loc[i, 'fleet_car_count'] = fleet_car_count
    file.loc[i, 'fleet_car_change_count'] = fleet_car_change_count
```
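Most of the other counters in the loop are running totals inside a group (per car, per driver, or over the whole fleet), so once Car_Group exists they can be written as grouped cumulative sums instead of being assigned cell by cell with `file.loc`. This is a sketch under assumptions, not a drop-in replacement: the function name `add_threshold_tracking` is hypothetical, it expects `name`, `DeltaH` and `Car_Group` columns with rows in time order per driver and a default RangeIndex, and it uses the loop's limit of 57. It intentionally simplifies a few of the loop's quirks: an exceedance on the first row of a new car counts toward that car (the loop resets `car_thresholds` right after counting it), `car_threshold_counts` is the plain length of the current run (the loop bumps it by one when the run reaches 3), and `fleet_car_count` and the `old_pump_*` variables are not reproduced.

```python
import pandas as pd

DELTAH_LIMIT = 57  # same value as DeltaH_limit in the loop


def add_threshold_tracking(df: pd.DataFrame, limit: float = DELTAH_LIMIT) -> pd.DataFrame:
    """Vectorized sketch of the per-car, per-driver and fleet counters.

    Assumes df has 'name', 'DeltaH' and 'Car_Group' columns, rows are
    time-ordered within each driver, and the index is a default RangeIndex.
    """
    out = df.copy()
    exceed = out['DeltaH'].astype(float) >= limit
    ex = exceed.astype(int)
    car_keys = [out['name'], out['Car_Group']]

    # Running count of exceedances within each car, each driver, and the fleet.
    out['car_thresholds'] = ex.groupby(car_keys, sort=False).cumsum()
    out['total_person_thresholds'] = ex.groupby(out['name'], sort=False).cumsum()
    out['Total_Fleet_Thresholds'] = ex.cumsum()

    # Every non-exceeding row starts a new run id (within a car); the cumsum of
    # exceedances inside (car, run id) is then the length of the current run.
    run_id = (~exceed).astype(int).groupby(car_keys, sort=False).cumsum().rename('run_id')
    out['car_threshold_counts'] = ex.groupby(car_keys + [run_id], sort=False).cumsum()

    # One alert per run, raised on the row where the run reaches 3.
    alert = (out['car_threshold_counts'] == 3).astype(int)
    out['person_alert_count'] = alert.groupby(out['name'], sort=False).cumsum()
    out['Fleet_Alert_Count'] = alert.cumsum()

    # Car bookkeeping derived from Car_Group.
    first_row = ex.groupby(car_keys, sort=False).cumcount() == 0
    out['car_change_true'] = (first_row & (out['Car_Group'] > 1)).astype(int)
    out['car_change_index_loc'] = (
        out.index.to_series().groupby(car_keys, sort=False).transform('min')
    )
    out['person_car_count'] = out['Car_Group']
    out['person_car_change_count'] = out['Car_Group'] - 1
    out['fleet_car_change_count'] = out['car_change_true'].cumsum()
    return out
```

Each assignment is a single pass over whole columns, so this approach should scale to a few hundred thousand rows far better than a Python loop that writes individual cells.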