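I'm pulling logs through a Web API. So far, the logs come back in the following format (each of the 3 events below starts with <attack_headlines version="1.0.1"> and ends with </attack_headlines>). My question is: what is the best way to loop over the lines and concatenate them so that each resulting event looks like the expected output shown further down?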

Current output

<attack_headlines version="1.0.1">
  <attack_headline>
    <site_id>1</site_id>
    <category>V2luZG93cyBEaXJlY3RvcmllcyBhbmQgRmlsZXM=</category>
    <subcategory>SUlTIEhlbHA=</subcategory>
    <client_ip>172.17.1.126</client_ip>
    <date>1363735940</date>
    <gmt_diff>0</gmt_diff>
    <reference_id>6D13-DE3D-9539-8980</reference_id>
  </attack_headline>
</attack_headlines>
<attack_headlines version="1.0.1">
  <attack_headline>
    <site_id>1</site_id>
    <category>V2luZG93cyBEaXJlY3RvcmllcyBhbmQgRmlsZXM=</category>
    <subcategory>SUlTIEhlbHA=</subcategory>
    <client_ip>172.17.1.136</client_ip>
    <date>1363735971</date>
    <gmt_diff>0</gmt_diff>
    <reference_id>6D13-DE3D-9539-8981</reference_id>
  </attack_headline>
</attack_headlines>
<attack_headlines version="1.0.1">
  <attack_headline>
    <site_id>1</site_id>
    <category>V2luZG93cyBEaXJlY3RvcmllcyBhbmQgRmlsZXM=</category>
    <subcategory>SUlTIEhlbHA=</subcategory>
    <client_ip>172.17.1.156</client_ip>
    <date>1363735975</date>
    <gmt_diff>0</gmt_diff>
    <reference_id>6D13-DE3D-9539-8982</reference_id>
  </attack_headline>
</attack_headlines>

Expected output

<attack_headlines version="1.0.1"><attack_headline><site_id>1</site_id><category>V2luZG93cyBEaXJlY3RvcmllcyBhbmQgRmlsZXM=</category><subcategory>SUlTIEhlbHA=</subcategory><client_ip>172.17.1.156</client_ip><date>1363735975</date><gmt_diff>0</gmt_diff><reference_id>6D13-DE3D-9539-8982</reference_id></attack_headline></attack_headlines>

Thanks in advance!

import base64
import json
import os
import sys
import time
from xml.dom.minidom import parseString

from suds.client import Client
from suds.transport.https import WindowsHttpAuthenticated
from suds.xsd.doctor import ImportDoctor, Import


class Helpers:
    def set_connection(self, conf):
        protocol = conf['protocol']
        hostname = conf['hostname']
        port = conf['port']
        path = conf['path']
        file = conf['file']
        u_name = conf['login']
        passwrd = conf['password']
        auth_type = conf['authType']

        url = '{0}://{1}:{2}/{3}/{4}?wsdl'.format(protocol, hostname, port,
                                                  path, file)

        # SUDS bug fixer (doctor): import the SOAP encoding schema.
        imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
        d = ImportDoctor(imp)
        if auth_type == 'ntlm':
            ntlm = WindowsHttpAuthenticated(username=u_name, password=passwrd)
            client = Client(url, transport=ntlm, doctor=d)
        else:
            client = Client(url, username=u_name, password=passwrd, doctor=d)
        return client

    def read_from_file(self, filename):
        # The with block closes the file even if read() raises.
        try:
            with open(filename, "r") as fo:
                return fo.read()
        except IOError:
            print("##Error opening/reading file {0}".format(filename))
            sys.exit(-1)

    def read_json(self, filename):
        string = self.read_from_file(filename)
        return json.loads(string)

    def get_recent_attacks(self, client):
        epoch_time_now = int(time.time())
        # 'epoch_last' presumably holds the timestamp of the previous run.
        with open('epoch_last', 'r') as epochtimeread:
            epoch_time_last = int(float(epochtimeread.read()))
        print(client.service.get_recent_attacks("", epoch_time_last, epoch_time_now, 1, "", 15))

3 Answers

If this is just one big string object containing newlines, you can simply remove them:

import re
text = re.sub(r'\s*\n\s*', '', text)

To keep the newline that follows the closing </attack_headline> tag, try:

text = re.sub(r'(?<!</attack_headline>)\s*\n\s*', '', text)
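
As a quick check of the two patterns above, here is a minimal sketch run against a shortened sample. The `sample` string is made up for illustration and only mimics the layout in the question; it is not real API output.

```python
import re

# Hypothetical two-event sample that mimics the layout in the question.
sample = (
    '<attack_headlines version="1.0.1">\n'
    '  <attack_headline>\n'
    '    <site_id>1</site_id>\n'
    '  </attack_headline>\n'
    '</attack_headlines>\n'
    '<attack_headlines version="1.0.1">\n'
    '  <attack_headline>\n'
    '    <site_id>2</site_id>\n'
    '  </attack_headline>\n'
    '</attack_headlines>\n'
)

# Pattern 1: drop every newline together with the whitespace around it.
flat = re.sub(r'\s*\n\s*', '', sample)

# Pattern 2: same, but keep a newline that directly follows </attack_headline>.
kept = re.sub(r'(?<!</attack_headline>)\s*\n\s*', '', sample)

print(flat)
for line in kept.splitlines():
    print(line)
```

On this sample the first pattern puts everything on a single line, while the second leaves a break after each inner `</attack_headline>` tag, so the outer `</attack_headlines>` ends up at the start of the following line. If one event per line is the goal, the break probably needs to go after `</attack_headlines>` instead.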
answered 2013-03-20T15:23:52.323

You can use:

oneline = "".join(multiline.split())
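
One caveat worth checking before relying on this: `str.split()` with no arguments splits on every run of whitespace, so the space inside `<attack_headlines version="1.0.1">` disappears as well. The sketch below shows this on a made-up sample and contrasts it with stripping each line instead; `multiline` here is a hypothetical stand-in for the API result.

```python
# Hypothetical single-event sample that mimics the layout in the question.
multiline = (
    '<attack_headlines version="1.0.1">\n'
    '  <attack_headline>\n'
    '    <site_id>1</site_id>\n'
    '  </attack_headline>\n'
    '</attack_headlines>\n'
)

# split() with no arguments also removes the space before the attribute.
squeezed = "".join(multiline.split())

# Stripping line by line only removes indentation and line breaks.
stripped = "".join(line.strip() for line in multiline.splitlines())

print(squeezed)
print(stripped)
```

The first result starts with `<attack_headlinesversion=`, which is no longer well-formed XML; the second keeps the attribute intact.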

Edit 1 (I just saw your edit): I would change your code as follows:

def read_from_file(self, filename):
    # Returns a list of lines with surrounding whitespace removed.
    with open(filename, "r") as fo:
        result = []
        for line in fo:
            result.append(line.strip())
        return result

Edit 2 (I've read your comment on the other answer): you could do this:

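# read_events is an illustrative name; it yields one joined string per event.
def read_events(filename):
    with open(filename, "r") as fo:
        partial = []
        for line in fo:
            clean = line.strip()
            if clean:
                partial.append(clean)
            # A closing </attack_headlines> tag ends the current event.
            if clean == "</attack_headlines>":
                yield "".join(partial)
                partial = []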
answered 2013-03-20T15:27:21.803
import re
# remove all newline whitespace stuff as in answer given before:
text = re.sub(r'\s*\n\s*', '', text)
# break again at desired points:
text = re.sub(r'</attack_headlines>', '</attack_headlines>\n', text)
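
Building on the two substitutions above, a possible next step is to turn the text into a list of one-line events and read fields from each. This is only a sketch: the `text` sample is shortened from the question, the helper names `split_events` and `decode_field` are made up, and the assumption that `category` and `subcategory` hold Base64-encoded UTF-8 is inferred from the sample values, not from any API documentation.

```python
import base64
import re
import xml.etree.ElementTree as ET

# Shortened sample based on the output shown in the question.
text = (
    '<attack_headlines version="1.0.1">\n'
    '  <attack_headline>\n'
    '    <category>V2luZG93cyBEaXJlY3RvcmllcyBhbmQgRmlsZXM=</category>\n'
    '    <subcategory>SUlTIEhlbHA=</subcategory>\n'
    '    <client_ip>172.17.1.126</client_ip>\n'
    '  </attack_headline>\n'
    '</attack_headlines>\n'
    '<attack_headlines version="1.0.1">\n'
    '  <attack_headline>\n'
    '    <category>V2luZG93cyBEaXJlY3RvcmllcyBhbmQgRmlsZXM=</category>\n'
    '    <subcategory>SUlTIEhlbHA=</subcategory>\n'
    '    <client_ip>172.17.1.136</client_ip>\n'
    '  </attack_headline>\n'
    '</attack_headlines>\n'
)


def split_events(raw):
    """Return one single-line string per <attack_headlines> block."""
    flat = re.sub(r'\s*\n\s*', '', raw)
    flat = re.sub(r'</attack_headlines>', '</attack_headlines>\n', flat)
    return flat.splitlines()


def decode_field(value):
    """Decode a field assumed to be Base64-encoded UTF-8 text."""
    return base64.b64decode(value).decode('utf-8')


events = split_events(text)
rows = []
for event in events:
    # Each event is a complete document, so it can be parsed on its own.
    root = ET.fromstring(event)
    for headline in root.iter('attack_headline'):
        rows.append((
            headline.findtext('client_ip'),
            decode_field(headline.findtext('category')),
            decode_field(headline.findtext('subcategory')),
        ))

for row in rows:
    print(row)
```

With the sample values from the question, the two encoded fields decode to "Windows Directories and Files" and "IIS Help". Parsing each event separately sidesteps the fact that the concatenated output has several root elements and would not parse as a single XML document.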
answered 2013-03-20T15:56:04.750