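I'm new to pandas and don't know how to approach this problem. I'm analyzing ticket flow through a help desk system. The raw data looks like this (there are more columns, and tickets sometimes span multiple days):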

    TicketNo SvcGroup           CreatedAt                   ClosedAt
0    4237941     Unix 2013-07-28 03:55:00 2013-07-28 11:01:37.346438
1    4238041  Windows 2013-07-28 04:59:00 2013-07-28 18:25:02.193182
2    4238051  Windows 2013-07-28 05:09:00 2013-07-28 23:11:12.003673
3    4238291  Windows 2013-07-28 05:10:00 2013-07-28 05:32:51.547251
4    4238321     Unix 2013-07-28 01:15:00        2013-07-28 10:09:20
5    4238331     Unix 2013-07-28 01:53:00 2013-07-28 17:42:56.192088
6    4238561  Windows 2013-07-28 02:03:00 2013-07-28 06:34:09.455042
7    4238691  Windows 2013-07-28 02:03:00 2013-07-28 20:54:47.306731
8    4238811  Windows 2013-07-28 03:23:00 2013-07-28 13:15:20.823505
9    4238851  Windows 2013-07-28 04:16:00 2013-07-28 23:51:55.561463
10   4239011     Unix 2013-07-28 04:26:00 2013-07-28 09:27:06.275342
11   4239041  Windows 2013-07-28 04:38:00 2013-07-28 07:55:34.416621
12   4239131     Unix 2013-07-28 08:15:00 2013-07-28 08:46:42.380739
13   4239141  Windows 2013-07-28 01:08:00 2013-07-28 15:37:12.266341

I'd like to view the data by hour to see how tickets flow through the help desk by shift, so an intermediate step might look like this:

                        Opened  Open  Closed  CarryFwd
TicketNo SvcGroup Hour
4237941  Unix     3          1     1       0         1
                  4          0     1       0         1
                  5          0     1       0         1
                  6          0     1       0         1
                  7          0     1       0         1
                  8          0     1       0         1
                  9          0     1       0         1
                  10         0     1       0         1
                  11         0     1       1         0
4239041  Windows  4          1     1       0         1
                  5          0     1       0         1
                  6          0     1       0         1
                  7          0     1       1         0

And the final result would look like this (grouped from the above):

               Opened  Closed  CarryFwd
SvcGroup Hour
Unix     3          6       7        47
         4          7      10        44
         5          1       6        39
         6         11       2        48
         7          7       3        52
         8          5       5        52
         9          5      11        46
Windows  3          6       7        22
         4          3      10        15
         5          5       2        18
         6          6       2        22
         7         11      11        22
         8          2       4        20
         9          0       2        18   

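Note: this is broken down by hour, but I may want to view it by day, week, and so on. Once I have the table above, I can tell whether a service group is making progress, falling behind, etc.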

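Any ideas on how to approach this? The part I really can't figure out is how to take the CreatedAt to ClosedAt duration and break it down into discrete time intervals (hours, etc.)...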

Any guidance is much appreciated. Thanks.

2 Answers

Here's another way...

Create a function that takes a row and builds the corresponding DataFrame:

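import numpy as np
import pandas as pd

def sparse_opened_closed(row):
    # Hours of the day the ticket was open, inclusive (assumes it opens and closes on the same day)
    opened_hour, closed_hour = row['CreatedAt'].hour, row['ClosedAt'].hour
    hours = np.arange(opened_hour, closed_hour + 1)
    index = pd.MultiIndex.from_tuples([(row['TicketNo'], row['SvcGroup'], h) for h in hours])
    # Opened is flagged in the first hour, Closed in the last
    opened, closed = np.zeros_like(hours), np.zeros_like(hours)
    opened[0], closed[-1] = 1, 1
    # Open in every hour; carried forward in every hour except the last
    is_open, carry = np.ones_like(hours), np.ones_like(hours)
    carry[-1] = 0
    return pd.DataFrame({'Opened': opened, 'Open': is_open, 'Closed': closed, 'CarryFwd': carry}, index=index)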

This could certainly be made more efficient.
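
As an aside, one possible way to avoid building a small DataFrame per ticket is sketched below. It is not the method used in this answer: it assumes a pandas version that has `DataFrame.explode` with `ignore_index`, and the sample frame and the names `hourly_long` and `Bucket` are made up for illustration. Because it buckets on full timestamps rather than on `.hour`, it should also cope with tickets that span several days.

```python
import pandas as pd

# Hypothetical two-ticket sample using the question's column names.
df = pd.DataFrame({
    "TicketNo": [4237941, 4239041],
    "SvcGroup": ["Unix", "Windows"],
    "CreatedAt": pd.to_datetime(["2013-07-28 03:55:00", "2013-07-28 04:38:00"]),
    "ClosedAt": pd.to_datetime(["2013-07-28 11:01:37", "2013-07-28 07:55:34"]),
})

def hourly_long(df):
    """Return one row per (ticket, hourly bucket) with the four flag columns."""
    out = df[["TicketNo", "SvcGroup"]].copy()
    # Truncate both timestamps to the start of their hour.
    out["First"] = df["CreatedAt"].dt.floor("h")
    out["Last"] = df["ClosedAt"].dt.floor("h")
    # One list of hourly bucket timestamps per ticket (both ends inclusive).
    out["Bucket"] = [
        list(pd.date_range(a, b, freq="h"))
        for a, b in zip(out["First"], out["Last"])
    ]
    # Expand each list into one row per bucket.
    out = out.explode("Bucket", ignore_index=True)
    out["Bucket"] = pd.to_datetime(out["Bucket"])
    out["Opened"] = (out["Bucket"] == out["First"]).astype(int)
    out["Open"] = 1
    out["Closed"] = (out["Bucket"] == out["Last"]).astype(int)
    out["CarryFwd"] = 1 - out["Closed"]
    return out.drop(columns=["First", "Last"])

long_df = hourly_long(df)
print(long_df)
```

The list comprehension still loops over tickets once, so this is only a partial speed-up, but the flag columns are computed in bulk. The rest of this answer continues with `sparse_opened_closed`.
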

Now iterate over each row and concatenate:

In [11]: pd.concat(sparse_opened_closed(row) for _, row in df.iterrows()).head(10)
Out[11]:
                    CarryFwd  Closed  Open  Opened
4237941 Unix    3          1       0     1       1
                4          1       0     1       0
                5          1       0     1       0
                6          1       0     1       0
                7          1       0     1       0
                8          1       0     1       0
                9          1       0     1       0
                10         1       0     1       0
                11         0       1     1       0
4238041 Windows 4          1       0     1       1
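
To get from this long frame to the per-group summary the question asks for, a groupby over the service group and hour levels followed by a sum is probably enough. The sketch below uses a small hand-built frame with hypothetical values, and it names the index levels for readability; the concatenated frame above has unnamed levels, so there you would pass `level=[1, 2]` instead.

```python
import pandas as pd

# Hypothetical long frame in the same shape as the output above:
# index levels are (ticket, service group, hour).
index = pd.MultiIndex.from_tuples(
    [
        (4237941, "Unix", 3), (4237941, "Unix", 4),
        (4238321, "Unix", 3), (4238321, "Unix", 4),
        (4239041, "Windows", 4),
    ],
    names=["TicketNo", "SvcGroup", "Hour"],
)
long_df = pd.DataFrame(
    {
        "Opened":   [1, 0, 0, 0, 1],
        "Open":     [1, 1, 1, 1, 1],
        "Closed":   [0, 0, 0, 1, 1],
        "CarryFwd": [1, 1, 1, 0, 0],
    },
    index=index,
)

# Sum the per-ticket flags within each (service group, hour) cell.
summary = (
    long_df.groupby(level=["SvcGroup", "Hour"])[["Opened", "Closed", "CarryFwd"]].sum()
)
print(summary)
```

Since every flag is 0 or 1 per ticket, each sum is a count of tickets opened, closed, or carried forward in that hour.
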
answered 2013-09-10T07:25:10.523

This is only a partial answer.

Read in your data, noting that the 2 date/time columns have to be combined:

In [75]: df = read_csv(StringIO(data),sep='\s+',skiprows=1,parse_dates=[[3,4],[5,6]],header=None)

In [76]: df.columns = ['created','closed','idx','num','typ']

In [77]: df
Out[77]: 
               created                     closed  idx      num      typ
0  2013-07-28 03:55:00 2013-07-28 11:01:37.346438    0  4237941     Unix
1  2013-07-28 04:59:00 2013-07-28 18:25:02.193182    1  4238041  Windows
2  2013-07-28 05:09:00 2013-07-28 23:11:12.003673    2  4238051  Windows
3  2013-07-28 05:10:00 2013-07-28 05:32:51.547251    3  4238291  Windows
4  2013-07-28 01:15:00        2013-07-28 10:09:20    4  4238321     Unix
5  2013-07-28 01:53:00 2013-07-28 17:42:56.192088    5  4238331     Unix
6  2013-07-28 02:03:00 2013-07-28 06:34:09.455042    6  4238561  Windows
7  2013-07-28 02:03:00 2013-07-28 20:54:47.306731    7  4238691  Windows
8  2013-07-28 03:23:00 2013-07-28 13:15:20.823505    8  4238811  Windows
9  2013-07-28 04:16:00 2013-07-28 23:51:55.561463    9  4238851  Windows
10 2013-07-28 04:26:00 2013-07-28 09:27:06.275342   10  4239011     Unix
11 2013-07-28 04:38:00 2013-07-28 07:55:34.416621   11  4239041  Windows
12 2013-07-28 08:15:00 2013-07-28 08:46:42.380739   12  4239131     Unix
13 2013-07-28 01:08:00 2013-07-28 15:37:12.266341   13  4239141  Windows

In [78]: df.dtypes
Out[78]: 
created    datetime64[ns]
closed     datetime64[ns]
idx                 int64
num                 int64
typ                object
dtype: object

For each ticket, put a 1 in every hour of the range it covers (created through closed). Fill the NaNs with 0.

In [82]: m = df.apply(lambda x: Series(1,index=np.arange(x['created'].hour,x['closed'].hour+1)),axis=1).fillna(0)

In [81]: m
Out[81]: 
    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23
0    0   0   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0
1    0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0
2    0   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
3    0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4    1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
5    1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0
6    0   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
7    0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0
8    0   0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0
9    0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
10   0   0   0   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
11   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
12   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
13   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0

Join it to the original dataset and set the index:

In [83]: y = df[['num','typ']].join(m).set_index(['num','typ'])

In [84]: y
Out[84]: 
                 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23
num     typ                                                                                                
4237941 Unix      0   0   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0
4238041 Windows   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0
4238051 Windows   0   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
4238291 Windows   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4238321 Unix      1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
4238331 Unix      1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0
4238561 Windows   0   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4238691 Windows   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0
4238811 Windows   0   0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0
4238851 Windows   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
4239011 Unix      0   0   0   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4239041 Windows   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4239131 Unix      0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4239141 Windows   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0

At this point you can do your calculations.

Opened/Closed are straightforward edge detection. CarryFwd is just m.where(m==1)
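
The edge detection is not spelled out in this answer, so here is a minimal sketch of one way it might be done. The small matrix is hypothetical, and the choice to treat CarryFwd as "open but not closed in this hour" follows the intermediate table in the question rather than anything stated here.

```python
import pandas as pd

# Hypothetical 0/1 occupancy matrix: one row per ticket, one column per hour.
m = pd.DataFrame(
    [[0, 1, 1, 1, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1]],
    columns=[1, 2, 3, 4, 5],
)

# Rising edge: open in this hour but not in the previous one.
prev_hour = m.shift(1, axis=1, fill_value=0)
opened = ((m == 1) & (prev_hour == 0)).astype(int)

# Falling edge: open in this hour but not in the next one.
next_hour = m.shift(-1, axis=1, fill_value=0)
closed = ((m == 1) & (next_hour == 0)).astype(int)

# Carried forward: open in this hour and not closed in it.
carry_fwd = m - closed

print(opened.sum())     # tickets opened per hour
print(closed.sum())     # tickets closed per hour
print(carry_fwd.sum())  # tickets carried into the next hour
```

One caveat: a ticket still open in the last column is flagged as closed there, because the matrix has no later hour to compare against. If tickets can stay open past the window, that boundary case would need separate handling.
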

answered 2013-09-08T16:33:08.997