
I have a DataFrame with two columns, counter and History, as shown below:

counter  History
1        Log Type: customer chat
         chat history:
            xxxxxxxxx
            xxxxxxx
            xxxxxxxxxxxxxxx
            May 10 2020 23:34:57 +GMT 05:30
            --------------------------------------------
            log type: Phone call
            issue type: xxxxxx
            issue:
             qqqqqqqqqqqq
             qqqqqqqqqqqqqqqqqqqqqqq
             qqqqqqqqqqqqqqq
             May 11 2020 08:54:54 + GMT 05:30
             ----------------------------------------------
             log type: phone call
             issue:
              eeeeeeeeeeeeee
              eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
              eeeeeeee
              eeeeeeeeeee
              eeeeeeeeeeee
              eeeeeeeeeeeeeeeeeee
              May 11 2020 14:58:54 + GMT 05:30
            ----------------------------------
2           Log Type: Phone call
            issue:
            xxxxxxxxx
            xxxxxxx
            xxxxxxxxxxxxxxx
            May 10 2020 23:34:57 +GMT 05:30
            --------------------------------------------
            log type: Phone call
            issue type: xxxxxx
            issue:
             qqqqqqqqqqqq
             qqqqqqqqqqqqqqqqqqqqqqq
             qqqqqqqqqqqqqqq
             May 11 2020 08:54:54 + GMT 05:30
             ----------------------------------------------
             log type: phone call
             issue:
               eeeeeeeeeeeeee
               eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
               eeeeeeee
               eeeeeeeeeee
               eeeeeeeeeeee
               eeeeeeeeeeeeeeeeeee
             May 12 2020 14:58:54 + GMT 05:30
            ----------------------------------------------

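Now I want to add a check: for the entries whose log type is Phone call, count the unique date stamps. The time is not needed, so if two date stamps fall on the same date they should count as 1. The required output is as follows: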
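A minimal sketch of the requested count that avoids dateparser entirely. It assumes every log entry ends with a stamp in the `Mon DD YYYY HH:MM:SS` form shown above (three-letter English month) and that entries are separated by runs of dashes. Because only the calendar date is kept, time zones never enter the comparison. `DATE_RE` and `count_phone_call_dates` are hypothetical names, not part of the original code.

```python
import re
from datetime import datetime

# Matches stamps such as "May 11 2020 08:54:54 + GMT 05:30" (assumed format);
# only month, day and year are captured because the time is not needed.
DATE_RE = re.compile(r"([A-Z][a-z]{2}) (\d{1,2}) (\d{4}) \d{2}:\d{2}:\d{2}")


def count_phone_call_dates(history: str) -> int:
    """Count distinct calendar dates among the 'Phone call' log entries."""
    dates = set()
    # Entries are separated by runs of four or more dashes.
    for entry in re.split(r"-{4,}", history):
        # Case-insensitive check: the data mixes "Phone call" and "phone call".
        if "log type: phone call" not in entry.lower():
            continue
        for mon, day, year in DATE_RE.findall(entry):
            stamp = datetime.strptime(f"{mon} {day} {year}", "%b %d %Y")
            dates.add(stamp.date())
    return len(dates)
```

With pandas it could be applied row by row, e.g. `df2['count'] = df2['History'].apply(count_phone_call_dates)`, which removes the need for the index bookkeeping and the merge.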

counter History                                                 count
0        Log Type: customer chat                                 1
         chat history:
            xxxxxxxxx
            xxxxxxx
            xxxxxxxxxxxxxxx
            May 10 2020 23:34:57 +GMT 05:30
            --------------------------------------------
            log type: Phone call
            issue type: xxxxxx
            issue:
             qqqqqqqqqqqq
             qqqqqqqqqqqqqqqqqqqqqqq
             qqqqqqqqqqqqqqq
             May 11 2020 08:54:54 + GMT 05:30
             ----------------------------------------------
             log type: phone call
             issue:
              eeeeeeeeeeeeee
              eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
              eeeeeeee
              eeeeeeeeeee
              eeeeeeeeeeee
              eeeeeeeeeeeeeeeeeee
              May 11 2020 14:58:54 + GMT 05:30
            ----------------------------------
1           Log Type: Phone call                                3
            issue:
            xxxxxxxxx
            xxxxxxx
            xxxxxxxxxxxxxxx
            May 10 2020 23:34:57 +GMT 05:30
            --------------------------------------------
            log type: Phone call
            issue type: xxxxxx
            issue:
             qqqqqqqqqqqq
             qqqqqqqqqqqqqqqqqqqqqqq
             qqqqqqqqqqqqqqq
             May 11 2020 08:54:54 + GMT 05:30
             ----------------------------------------------
             log type: phone call
             issue:
               eeeeeeeeeeeeee
               eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
               eeeeeeee
               eeeeeeeeeee
               eeeeeeeeeeee
               eeeeeeeeeeeeeeeeeee
             May 12 2020 14:58:54 + GMT 05:30
            ----------------------------------------------

The code I am using is:

import numpy as np
import pandas as pd
from dateparser.search import search_dates

def extract_sentence(text, word):
    # keep only the "----"-separated log entries that mention `word`;
    # compare in lower case because the data has both "Phone call" and "phone call"
    return ".".join(s for s in text.split("----") if word.lower() in s.lower())

df2.reset_index(inplace=True)  # gives df2 a 0..n-1 index so df2['History'][i] works
lst_3 = []
ind_1 = []
for i in range(len(df2)):
    matches = search_dates(extract_sentence(df2['History'][i], 'Phone call'))
    ind_1.append(i)
    if matches is None:
        # search_dates returns None when it finds no dates
        lst_3.append(0)
        continue
    # keep only the YYYY-MM-DD part of each parsed datetime
    # (a comprehension, so the row index `i` is no longer overwritten)
    lst_6 = [str(dt)[0:10] for _, dt in matches]
    res = [d for d in lst_6 if d[:4] in ('2015', '2016', '2017', '2018', '2019', '2020')]
    # count unique dates, not unique (text, datetime) matches
    lst_3.append(len(set(res)))

df3 = pd.DataFrame({'Index1': ind_1, 'Count3': lst_3})

# df2 already has a fresh index, so a second reset_index is not needed
df2['Index1'] = np.arange(len(df2))

df4 = pd.merge(df2, df3[['Index1', 'Count3']], on='Index1', how='left')
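In the loop above, the string slicing of each parsed datetime and the year filter could be collapsed into one small helper. This sketch assumes the input is a `search_dates`-style result (a list of `(matched_text, datetime)` tuples, or `None` when nothing was found); `unique_dates` is a hypothetical name and the 2015 to 2020 bounds simply mirror the original year filter. Calling `.date()` drops both the time and any `tzinfo`, so naive and aware datetimes end up as plain, comparable `date` objects.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def unique_dates(
    matches: Optional[List[Tuple[str, datetime]]],
    min_year: int = 2015,
    max_year: int = 2020,
) -> int:
    """Count distinct calendar dates in a search_dates-style result."""
    if not matches:  # None (no dates found) or an empty list
        return 0
    # .date() keeps the local calendar date and discards time and tzinfo.
    days = {dt.date() for _, dt in matches if min_year <= dt.year <= max_year}
    return len(days)


# Example: two stamps on May 11 (one tz-aware, one naive), one on May 12,
# and one outside the year range.
IST = timezone(timedelta(hours=5, minutes=30))  # the "+GMT 05:30" offset
example = [
    ("May 11 2020 08:54:54 + GMT 05:30", datetime(2020, 5, 11, 8, 54, 54, tzinfo=IST)),
    ("May 11 2020 14:58:54", datetime(2020, 5, 11, 14, 58, 54)),
    ("May 12 2020", datetime(2020, 5, 12)),
    ("1999", datetime(1999, 1, 1)),
]
print(unique_dates(example))  # 2
```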

The error I get when running it is:

TypeError: can't compare offset-naive and offset-aware datetimes
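The full traceback is not shown, so it is unclear which call performs the comparison (it may happen inside dateparser). The snippet below only reproduces the general cause: ordering comparisons between a datetime without `tzinfo` and one with it raise this `TypeError`, while `==` between them does not raise in Python 3 and simply returns `False`. It also shows two common workarounds; the `IST` offset is an assumption based on the "+GMT 05:30" stamps in the sample.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # "+GMT 05:30" in the sample data
naive = datetime(2020, 5, 11, 8, 54, 54)               # no tzinfo
aware = datetime(2020, 5, 11, 14, 58, 54, tzinfo=IST)  # has tzinfo

# Ordering (<, >, sorted, max, ...) between naive and aware datetimes raises.
try:
    _ = naive < aware
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Workaround 1: compare calendar dates only; the question ignores the time anyway.
same_day = naive.date() == aware.date()
print(same_day)  # True

# Workaround 2: drop tzinfo so both are naive. Only safe when every stamp
# uses the same offset, as the "+GMT 05:30" stamps in the sample do.
ordered = naive < aware.replace(tzinfo=None)
print(ordered)  # True
```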

Any help with this would be appreciated.
