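I have a small Python script that I need to modify because the format of the metrics file has changed slightly. I don't know Python at all and have made an honest attempt to fix it myself. The changes make sense to me, but apparently there is still one problem with the script. Everything else is working. The script looks like this: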

import sys
import datetime

##########################################################################

now = datetime.datetime.now();
logFile = now.strftime("%Y%m%d")+'.QE-Metric.log';

underlyingParse = True;
strParse = "UNDERLYING_TICK";
if (len(sys.argv) == 2):
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
elif (len(sys.argv) == 3):
    logFile = sys.argv[2];
    if sys.argv[1] == '2':
        strParse = "ORDER_SHOOT";
        underlyingParse = False;
else:
    print 'Incorrect number of arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>'
    sys.exit()

##########################################################################

# Read the deployment file
FIput = open(logFile, 'r');
FOput = open('ParsedMetrics.txt', 'w');

##########################################################################

def ParseMetrics( file_lines ):

    ii = 0
    tokens = []; 
    for ii in range(len(file_lines)):

        line = file_lines[ii].strip()

        if (line.find(strParse) != -1):

             tokens = line.split(",");
             currentTime = float(tokens[2])

             if (underlyingParse == True and ii != 0):
                 newIndex = ii-1
                 prevLine = file_lines[newIndex].strip()
                 while (prevLine.find("ORDER_SHOOT") != -1 and newIndex > -1):
                     newIndex -= 1;
                     tokens = prevLine.split(",");
                     currentTime -= float(tokens[2]);
                     prevLine = file_lines[newIndex].strip();

             if currentTime > 0:
                 FOput.write(str(currentTime) + '\n')

##########################################################################

file_lines = FIput.readlines()
ParseMetrics( file_lines );

print 'Metrics parsed and written to ParsedMetrics.txt'

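Everything works fine except for the logic that is supposed to iterate backwards over the preceding lines to add up the ORDER_SHOOT numbers since the last UNDERLYING_TICK event (starting at the code: if (underlyingParse == True and ii != 0): ...) and then subtract that total from the current UNDERLYING_TICK event line being processed. This is what a typical line in the file being parsed looks like: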

08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89

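Basically, I'm only interested in the last data element (1499.89), which is the time in microseconds. I know it must be something stupid. I just need another pair of eyes. Thanks!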
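
For reference, the sample line above can be split into its parts as follows. This is only a minimal sketch: the helper name parse_metric_line is made up for illustration, and it assumes every line has exactly the three comma-separated fields shown in the sample.

```python
def parse_metric_line(line):
    """Split one log line into (timestamp, event, count, duration).

    Assumes the layout of the sample line: a timestamp ending in ':',
    then the event name, an integer, and a float, separated by commas.
    """
    head, count, duration = [part.strip() for part in line.split(",")]
    # The timestamp itself contains ':' characters, so split on the last one
    timestamp, event = [part.strip() for part in head.rsplit(":", 1)]
    return timestamp, event, int(count), float(duration)


sample = "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89"
timestamp, event, count, duration = parse_metric_line(sample)
```

Note that splitting on "," alone leaves the event name attached to the timestamp, which is why the script searches each line for the event name instead of comparing a token.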

2 Answers

It's not clear what is wrong with your output, because you didn't show your output and we can't really understand your input.

I am assuming the following:

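  1. Lines are formatted as "absolutetime: TYPE, positiveinteger, float_time_duration_in_ms", where the last item is the amount of time the thing took.
  2. Lines are sorted by "absolutetime". As a result, the ORDER_SHOOTs that belong to an UNDERLYING_TICK are always on the lines since the last UNDERLYING_TICK (or the beginning of the file), and only on those lines. If this assumption is not true, then you need to sort the file first. You can do this with a separate program (e.g. by piping the output through sort), or use the bisect module to store the lines in sorted order and easily extract the relevant ones.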
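
If the file does turn out to be unsorted, the bisect idea could look roughly like this. It is a sketch under one loud assumption: every line starts with a fixed-width HH:MM:SS.ffffff timestamp, so plain string comparison matches chronological order within a single day. The function names and the sample lines are hypothetical.

```python
import bisect


def sort_log_lines(lines):
    """Return the non-blank lines ordered by their leading timestamp.

    Assumption: lines start with a fixed-width timestamp, so comparing
    the raw strings is the same as comparing the times.
    """
    ordered = []
    for line in lines:
        line = line.strip()
        if line:
            # insort keeps `ordered` sorted after every insertion
            bisect.insort(ordered, line)
    return ordered


def lines_between(ordered, start, end):
    """Return the lines whose timestamp falls in [start, end)."""
    lo = bisect.bisect_left(ordered, start)
    hi = bisect.bisect_left(ordered, end)
    return ordered[lo:hi]


# Made-up sample lines, deliberately out of order
raw = [
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",
    "08:40:01.500000(+10): ORDER_SHOOT, 12, 100.5",
    "",
    "08:40:03.000000(+5): ORDER_SHOOT, 13, 50.0",
]
ordered = sort_log_lines(raw)
window = lines_between(ordered, "08:40:02", "08:40:03")
```

For a file that is read once and processed once, calling sorted() on the lines would do the same job; bisect is mainly worthwhile when lines keep arriving and ranges are extracted repeatedly.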

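If both of these assumptions hold, take a look at the following script. (Untested, because I don't have a large input sample or an output sample to compare against.)

This is a more Pythonic style that is easier to read and understand, doesn't use global variables as function parameters, and should be more efficient because it doesn't iterate backwards over lines or load the entire file into memory to parse it.

It also demonstrates how to use the argparse module for command-line parsing. This isn't required, but if you have a lot of command-line Python scripts you should get familiar with it.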

import sys

VALIDTYPES = ['UNDERLYING_TICK','ORDER_SHOOT']

def parseLine(line):
    # format of `tokens`:
    # 0 = absolute timestamp
    # 1 = event type
    # 2 = ???
    # 3 = timedelta (microseconds)
    tokens = [t.strip(':, \t') for t in line.strip().split()]
    # skip blank or short lines and event types we don't handle
    if len(tokens) < 4 or tokens[1] not in VALIDTYPES:
        return None
    tokens[2] = int(tokens[2])
    tokens[3] = float(tokens[3])
    return tuple(tokens)

def parseMetrics(lines, parsetype):
    """Yield timedelta for each line of specified type

    If parsetype is 'UNDERLYING_TICK', subtract previous ORDER_SHOOT 
    timedeltas from the current UNDERLYING_TICK delta before yielding
    """
    order_shoots_between_ticks = []
    for line in lines:
        tokens = parseLine(line)
        if tokens is None:
            continue # go home early
        if parsetype=='UNDERLYING_TICK':
            if tokens[1]=='ORDER_SHOOT':
                order_shoots_between_ticks.append(tokens)
            elif tokens[1]=='UNDERLYING_TICK':
                adjustedtick = tokens[3] - sum(t[3] for t in order_shoots_between_ticks)
                order_shoots_between_ticks = []
                yield adjustedtick
        elif parsetype==tokens[1]:
            yield tokens[3]

def parseFile(instream, outstream, parsetype):
    printablelines = ("{0:f}\n".format(time) for time in parseMetrics(instream, parsetype))
    outstream.writelines(printablelines)

def main(argv):
    import argparse, datetime
    parser = argparse.ArgumentParser(description='Output timedeltas from a QE-Metric log file')
    parser.add_argument('mode', type=int, choices=range(1, len(VALIDTYPES)+1),
        help="the types to parse. Valid values are: 1 (Underlying), 2 (OrderShoot)")
    # positional arguments are made optional with nargs='?', not required=False
    parser.add_argument('infile', nargs='?',
        default='{}.QE-Metric.log'.format(datetime.datetime.now().strftime('%Y%m%d')),
        help="the input file. Defaults to today's file: YYYYMMDD.QE-Metric.log. Use - for stdin.")
    parser.add_argument('outfile', nargs='?',
        default='ParsedMetrics.txt',
        help="the output file. Defaults to ParsedMetrics.txt. Use - for stdout.")
    parser.add_argument('--verbose', '-v', action='store_true')
    args = parser.parse_args(argv)

    args.mode = VALIDTYPES[args.mode-1]

    # open in text mode so the lines are strings on Python 2 and Python 3
    if args.infile=='-':
        instream = sys.stdin
    else:
        instream = open(args.infile, 'r')

    if args.outfile=='-':
        outstream = sys.stdout
    else:
        outstream = open(args.outfile, 'w')

    parseFile(instream, outstream, args.mode)

    instream.close()
    outstream.close()

    if args.verbose:
        sys.stderr.write('Metrics parsed and written to {0}\n'.format(args.outfile))



if __name__=='__main__':
    main(sys.argv[1:])
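
The subtraction logic can be checked on a handful of lines without a log file. The sketch below is a compact, self-contained stand-in for the generator approach above, not the script itself; the function name adjusted_ticks and the sample lines are made up for illustration.

```python
def adjusted_ticks(lines):
    """Yield each UNDERLYING_TICK duration minus the ORDER_SHOOT
    durations seen since the previous tick (or the start of input)."""
    pending = 0.0
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        duration = float(fields[2])
        if fields[0].endswith("ORDER_SHOOT"):
            pending += duration
        elif fields[0].endswith("UNDERLYING_TICK"):
            yield duration - pending
            pending = 0.0  # start a fresh total for the next tick


# Hypothetical sample data, not taken from a real log
sample = [
    "08:40:01.000000(+10): ORDER_SHOOT, 1, 100.0",
    "08:40:01.500000(+12): ORDER_SHOOT, 2, 50.25",
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1500.5",
    "08:40:03.000000(+5): UNDERLYING_TICK, 1378, 200.0",
]
results = list(adjusted_ticks(sample))
```

The first tick has both ORDER_SHOOT durations subtracted from it, while the second tick is passed through unchanged because nothing was shot since the previous tick.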
answered 2012-07-16T16:37:38.363

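So, if the command-line option is 2, the function creates an output file in which every line contains only the "time" portion of the lines in the input file that contain the "order_shoot" token?

And if the command-line option is 1, the function creates an output file with one line for each line in the input file that contains the "underlying_tick" token, except that the number you want is the underlying_tick time value minus all of the order_shoot time values that occurred after the preceding underlying_tick (or since the start of the file, if this is the first one)?

If that is correct, and all of the lines are unique (no duplicates), then I would suggest the following rewritten script: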

#### Imports unchanged.

import sys 
import datetime 

#### Changing the error checking to be a little simpler.
#### If the number of args is wrong, or the "mode" arg is
#### not a valid option, it will print the error message
#### and exit.

if len(sys.argv) not in (2,3) or sys.argv[1] not in ('1','2'):
    print('Incorrect arguments. Usage: <exec> <mode (1) Underlying (2) OrderShoot> <FileName (optional)>')
    sys.exit()

#### Current time, used below to build the default input file name.

now = datetime.datetime.now()

#### Using ternary logic to set the input file to either
#### the files specified in argv[2] (if it exists), or to
#### the default previously specified in the original code.

FIput = open((sys.argv[2] if len(sys.argv)==3 
                          else now.strftime("%Y%m%d")+'.QE-Metric.log'), 'r');

#### Output file not changed.

FOput = open('ParsedMetrics.txt', 'w');

#### START RE-WRITTEN FUNCTION

def ParseMetrics(file_lines,mode): 

#### The function now takes two params - the lines from the 
#### input file, and the 'mode' - whichever the user selected
#### at run-time. As you can see from the call down below, this
#### is taken straight from argv[1]. 

    if mode == '1':

#### So if we're doing underlying_tick mode, we want to find each tick,
#### then for each tick, sum the preceding order_shoots since the last
#### tick (or start of file for the first tick).

        ticks = [file_lines.index(line) for line in file_lines \
                                        if 'UNDERLYING_TICK' in line]

#### The above list comprehension iterates over file_lines, and creates
#### a list of the indexes to file_lines elements that contain ticks.
#### 
#### Then the following loop iterates over ticks, and for each tick,
#### subtracts the sum of all times for order_shoots that occur between
#### the previous tick (or the start of the file) and this tick, from
#### the time value of the tick itself. Then that value is written to
#### the outfile.

        prev_tick = -1
        for tick in ticks:
            sub_time = float(file_lines[tick].split(",")[2]) - \
                       sum([float(line.split(",")[2])
                            for line in file_lines[prev_tick + 1:tick]
                            if "ORDER_SHOOT" in line])
            FOput.write(str(sub_time) + '\n')
            prev_tick = tick

#### if the mode is 2, then it just runs through file_lines and
#### outputs all of the order_shoot time values.

    if mode == '2':
        for line in file_lines:
            if 'ORDER_SHOOT' in line:
                FOput.write(str(float(line.split(",")[2])) + '\n')

#### END OF REWRITTEN FUNCTION

#### As you can see immediately below, we pass sys.argv[1] for the
#### mode argument of the ParseMetrics function.

ParseMetrics(FIput.readlines(),sys.argv[1])

print('Metrics parsed and written to ParsedMetrics.txt')

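That should do the trick. The main issue is that if you have any line with "UNDERLYING_TICK" that is exactly identical to any other such line, this won't work, because file_lines.index(line) always returns the position of the first matching line. Different logic would need to be applied to get the correct indexes.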
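
One way around the duplicate-line limitation is enumerate(), which hands back each line's own position instead of searching for it. A minimal sketch, with tick_indexes as a hypothetical helper name and made-up sample lines:

```python
def tick_indexes(file_lines):
    """Return the position of every UNDERLYING_TICK line.

    enumerate() yields each line's own position, so two identical
    lines still get distinct indexes (list.index() would return the
    first match for both).
    """
    return [i for i, line in enumerate(file_lines)
            if 'UNDERLYING_TICK' in line]


# Made-up lines; the first and last are deliberately identical
file_lines = [
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",
    "08:40:02.500000(+3): ORDER_SHOOT, 5, 10.0",
    "08:40:02.039387(+26): UNDERLYING_TICK, 1377, 1499.89",
]
by_enumerate = tick_indexes(file_lines)
by_index = [file_lines.index(line) for line in file_lines
            if 'UNDERLYING_TICK' in line]
```

Here by_index points at the first line twice, while by_enumerate keeps the two ticks apart. It also avoids rescanning the list for every tick.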

I'm sure there are ways to make this better, but this was my first idea.

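It's also worth noting that I added a lot of inline line breaks to the source code above for readability, but you may want to pull them out if you use it as written.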

answered 2012-07-16T17:02:35.510