I have a txt file that looks like this:

0.065998       81   
0.319601      81   
0.539613      81  
0.768445      81  
1.671893      81  
1.785064      81  
1.881242      954  
1.921503      193  
1.921605      188  
1.943166      81  
2.122283      63  
2.127669      83  
2.444705      81  

The first column is the packet arrival time in seconds, and the second is the packet size in bytes.

I need to get the average value of bytes in each second. For example, in the first second I have only packets of size 81, so the average bitrate is 81*8 = 648 bit/s. Then I need to plot a graph with time in seconds on the x axis and the average bitrate in each second on the y axis.

So far I have only managed to load my data as arrays:

import numpy as np

d = np.genfromtxt('data.txt')

x = d[:, 0]  # packet arrival times
y = d[:, 1]  # packet sizes in bytes

print(x)
print(y * 8)  # packet sizes in bits

I'm new to Python, so any help where to start would be much appreciated!

Here is the resulting script:

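import matplotlib.pyplot as plt
import numpy as np

x, y = np.loadtxt('data.txt', unpack=True)  # arrival times, packet sizes (bytes)
bins = np.arange(60 + 1)  # one-second bin edges: 0, 1, ..., 60
totals, edges = np.histogram(x, weights=y, bins=bins)  # total bytes per bin
counts, edges = np.histogram(x, bins=bins)  # number of packets per bin

print(counts)
print(totals * 0.008 / counts)  # bytes -> kbit, divided by packet count; empty bins give nan

plt.plot(totals * 0.008 / counts, 'r')
plt.xlabel('time, s')
plt.ylabel('kbit/s')
plt.grid(True)
plt.xlim(0.0, 60.0)
plt.show()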

The script reads a .txt file containing packet arrival times and packet sizes (in bytes) and plots the average bitrate per second over a time period. It is used to monitor incoming/outgoing server traffic.

3 Answers

Your data is already sorted by time, so I would probably just use itertools.groupby for this:

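from itertools import groupby

with open('data.txt') as d:
    # Lazily parse each line into [arrival_time, size_in_bytes]
    data = ([float(x) for x in line.split()] for line in d)
    # Group consecutive packets by the whole second in which they arrived
    for i_time, packet_info in groupby(data, key=lambda x: int(x[0])):
        print(i_time, sum(x[1] for x in packet_info))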

The output is:

0 324.0
1 1578.0
2 227.0
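The totals above are in bytes. To get closer to the bitrate the question asks about, the same grouping can be multiplied by 8. This is only a minimal sketch: the sample rows from the question are inlined instead of reading data.txt, and the names `rows` and `bits_per_second` are made up for this illustration.

```python
from itertools import groupby

# Sample rows copied from the question: (arrival time in s, size in bytes).
rows = [
    (0.065998, 81), (0.319601, 81), (0.539613, 81), (0.768445, 81),
    (1.671893, 81), (1.785064, 81), (1.881242, 954), (1.921503, 193),
    (1.921605, 188), (1.943166, 81), (2.122283, 63), (2.127669, 83),
    (2.444705, 81),
]


def bits_per_second(rows):
    """Return {second: total bits that arrived during that second}.

    Assumes the rows are already sorted by arrival time, because
    groupby only merges consecutive items that share a key.
    """
    result = {}
    for second, group in groupby(rows, key=lambda r: int(r[0])):
        result[second] = 8 * sum(size for _, size in group)
    return result


rates = bits_per_second(rows)
for second, bits in rates.items():
    print(second, bits)
# prints:
# 0 2592
# 1 12624
# 2 1816
```

Seconds in which no packet arrived do not appear in the result at all, which matters if the values are later plotted against time.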
answered 2013-02-01T18:17:52.567

If you want to use numpy, you can use numpy.histogram:

>>> import numpy as np
>>> x, y = np.loadtxt('data.txt', unpack=True)
>>> bins = np.arange(10+1)
>>> totals, edges = np.histogram(x, weights=y, bins=bins)
>>> totals
array([  324.,  1578.,   227.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.])

This gives the total in each bin, which you can divide by the bin width to get an approximate instantaneous rate:

>>> totals/np.diff(bins)
array([  324.,  1578.,   227.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.])

(OK, since the bin widths are all one, that isn't very interesting.)
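Dividing by the bin width starts to matter once the bins are not one second wide. As a rough sketch, here is the same calculation with half-second bins; the sample data from the question is inlined rather than loaded from data.txt, and the 0.5 s width and the name `rate` are arbitrary choices for the illustration.

```python
import numpy as np

# Sample data copied from the question.
x = np.array([0.065998, 0.319601, 0.539613, 0.768445, 1.671893,
              1.785064, 1.881242, 1.921503, 1.921605, 1.943166,
              2.122283, 2.127669, 2.444705])          # arrival times, s
y = np.array([81, 81, 81, 81, 81, 81, 954, 193, 188, 81, 63, 83, 81],
             dtype=float)                             # packet sizes, bytes

width = 0.5                                           # bin width in seconds (assumed)
bins = np.arange(0.0, 3.0 + width, width)             # edges 0.0, 0.5, ..., 3.0
totals, edges = np.histogram(x, weights=y, bins=bins) # bytes per half-second bin
rate = totals / np.diff(bins)                         # bytes per second in each bin

print(totals)
print(rate)
```

With half-second bins each total is divided by 0.5, so every rate is twice the byte count of its bin.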

[Update]

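I'm not sure I understand your follow-up comment that you need the average packet size in each second. I don't see that mentioned anywhere in your question, but I'm known for missing the obvious... :-/ In any case, if you want the number of packets in each time bin, you simply don't need to set the weights (the default is 1):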

>>> counts, edges = np.histogram(x, bins=bins)
>>> counts
array([4, 6, 3, 0, 0, 0, 0, 0, 0, 0])

where counts is the number of packets that arrived in each bin.
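If the average packet size per second is what is wanted, the two histograms can be divided by each other. The sketch below does that while guarding against empty bins, where a plain division would produce nan. The data is inlined from the question, the five-second range is arbitrary, and reporting 0 for empty bins is an assumption of this example, not something the question specifies.

```python
import numpy as np

# Sample data copied from the question.
x = np.array([0.065998, 0.319601, 0.539613, 0.768445, 1.671893,
              1.785064, 1.881242, 1.921503, 1.921605, 1.943166,
              2.122283, 2.127669, 2.444705])          # arrival times, s
y = np.array([81, 81, 81, 81, 81, 81, 954, 193, 188, 81, 63, 83, 81],
             dtype=float)                             # packet sizes, bytes

bins = np.arange(5 + 1)                               # one-second edges 0..5
totals, _ = np.histogram(x, weights=y, bins=bins)     # bytes per second
counts, _ = np.histogram(x, bins=bins)                # packets per second

# Divide only where at least one packet arrived; other bins keep the 0
# that `out` was initialised with (an assumed convention for empty bins).
mean_size = np.divide(totals, counts,
                      out=np.zeros_like(totals), where=counts > 0)

print(counts)
print(mean_size)        # average bytes per packet in each second
print(mean_size * 8)    # the same in bits
```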

answered 2013-02-01T18:38:11.673
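Since the arrival times are irregular, I would suggest quantizing them to whole seconds and then aggregating the total bytes of all arrivals within a given second. Once that is done, plotting and other analysis become much easier.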
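One possible way to do this quantization, sketched with numpy: take the floor of each arrival time to get a whole-second index, then sum the sizes per index with np.bincount. The data is inlined from the question, and the variable names are illustrative only.

```python
import numpy as np

# Sample data copied from the question.
x = np.array([0.065998, 0.319601, 0.539613, 0.768445, 1.671893,
              1.785064, 1.881242, 1.921503, 1.921605, 1.943166,
              2.122283, 2.127669, 2.444705])          # arrival times, s
y = np.array([81, 81, 81, 81, 81, 81, 954, 193, 188, 81, 63, 83, 81],
             dtype=float)                             # packet sizes, bytes

# Quantize: 1.92 s -> second 1, 2.44 s -> second 2, and so on.
seconds = np.floor(x).astype(int)

# Aggregate: entry i is the sum of the sizes of all packets in second i.
bytes_per_second = np.bincount(seconds, weights=y)
bits_per_second = 8 * bytes_per_second

print(bytes_per_second)
print(bits_per_second)
```

np.bincount requires non-negative integer indices, so this assumes the timestamps start at or after zero. Seconds with no arrivals inside the covered range come out as 0.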

answered 2013-02-01T18:20:21.990