
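A legacy system that I can't change spits out 5 GB a day of mostly useless XML logs and is blowing through my ingestion license. Two classes of verbose errors occur 1000+ times a minute, but every few minutes there is a genuinely interesting entry. I want to drastically shorten the repetitive entries with sed and leave the interesting ones untouched.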

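So I need:
1. A regex to match each of the 2 classes of annoying log entries (e.g. ...'decimal'... and ...'DBNull'... but not the occasional interesting one).
One regex per annoying error class is fine; I can do 2 sed passes.
2. A capture group with the timestamp, so I can replace the long XML line with a concise version that still carries the correct timestamp, so no fidelity is lost.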

I've gotten as far as matching an entry and capturing the created date:

(?:<Log).*?(createdDate="\d{2}\/\d{2}\/\d{4}.\d{2}:\d{2}:\d{2}").*?(?:decimal).*?(<\/Log>)

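This is close, but it suffers from a kind of reverse greediness: the match runs from 'decimal' back to an opening Log statement several entries earlier. I've played with negative lookbehind, but only gave myself a serious headache.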

Sample data

<Log type="ERROR" createdDate="11/09/2015 08:13:14" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:13" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

<Log type="ERROR" createdDate="11/09/2015 08:13:12" > 
 <![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef, ): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
  Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
Parameters:
 [RETURN_VALUE][ReturnValue] Value: [0]
 ---> System.InvalidCastException: Conversion from type 'DBNull' to type 'Long' is not valid.
 ]]></Log> 

 <Log type="ERROR" createdDate="11/09/2015 08:13:11" > 
 <![CDATA[ [129] Services.DService.D.FailedToAddRQ(Exceptionex, RQEntityrQ, RHeaderEntityrHeader, StringPRef, ): FailedToAddRQ()...with parameters [pRef:=123,0,1], [rQ.AffinityCode:=],[Q.thing=thing][rQ.AffinityRQDT:=123],[rHeader.RHeaderIDPK:=123],[rQ.UWriteIDFK:=] 
  Data.DataAccessLayerException: Conversion from type 'DBNull' to type 'Long' is not valid.
  ]]></Log> 

 <Log type="ERROR" createdDate="11/09/2015 08:13:10" > 
 <![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>

<Log type="ERROR" createdDate="11/09/2015 08:13:09" > 
 <![CDATA[ [108] -- much cruft removed-- SerializationException: There was an error deserializing the object of type Common.DataCtract.QResult. The value '' cannot be parsed as the type 'decimal'. ---> System.Xml.XmlException: The value '' cannot be parsed as the type 'decimal'. ---> System.FormatException: Input string was not in a correct format.
  ]]></Log> 

1 Answer


Not sure exactly what you're looking for, but here is an example of how to isolate a <Log...</Log> block and do a replacement on it:

sed '/^<Log /{:a;/<\/Log>/!{N;ba;};s/>.*\(decimal\|DBNull\).*</>\1</}' file.log

Details:

/^<Log / { # condition: a line that starts with "<Log "
    :a;    # define the label "a"
    /<\/Log>/! { # condition: if the line doesn't contain "</Log>"
        N;       # append the next line to the pattern space
        ba;      # go to the label "a"
    };
    s/>.*\(decimal\|DBNull\).*</>\1</ # replace the block
}
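
If the concise version should be built from an explicit timestamp capture group, as the question asks, the same loop can feed a different substitution. This is only a sketch under assumptions: the function name `summarize_log` and the compact output format are made up for illustration, and `\|` in the pattern requires GNU sed.

```shell
# Hypothetical helper: collapse noisy <Log> blocks into one short line each,
# keeping the createdDate value via capture group \1 (GNU sed assumed).
summarize_log() {
  sed '
/^<Log /{
  :a
  /<\/Log>/!{
    N
    ba
  }
  s/^<Log [^>]*createdDate="\([^"]*\)"[^>]*>.*\(decimal\|DBNull\).*<\/Log>.*$/<Log createdDate="\1">\2<\/Log>/
}' "$@"
}

# Demo on two shortened entries modelled on the sample data in the question.
summarize_log <<'EOF'
<Log type="ERROR" createdDate="11/09/2015 08:13:14" >
 <![CDATA[ [108] The value '' cannot be parsed as the type 'decimal'.
  ]]></Log>
<Log type="ERROR" createdDate="11/09/2015 08:13:10" >
 <![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>
EOF
```

Blocks that mention neither 'decimal' nor 'DBNull' do not match the substitution and pass through unchanged. Note that this variant drops the other attributes of the opening tag, such as type.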

(I'm assuming that <Log is always at the start of a line, unlike the records for seconds 10 and 11, which is probably a typo.)
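
If the leading space in those records turns out not to be a typo, the address can be loosened to accept optional whitespace before <Log. A minimal sketch, assuming GNU sed; the function name `shorten_log` is hypothetical, and the script is otherwise the same as above:

```shell
# Hypothetical helper: same loop and substitution as the one-liner above,
# but the address also accepts leading whitespace before "<Log " (GNU sed assumed).
shorten_log() {
  sed '
/^[[:space:]]*<Log /{
  :a
  /<\/Log>/!{
    N
    ba
  }
  s/>.*\(decimal\|DBNull\).*</>\1</
}' "$@"
}

# Demo: one indented noisy entry and one indented interesting entry.
shorten_log <<'EOF'
 <Log type="ERROR" createdDate="11/09/2015 08:13:11" >
 <![CDATA[ [129] Conversion from type 'DBNull' to type 'Long' is not valid.
  ]]></Log>
 <Log type="ERROR" createdDate="11/09/2015 08:13:10" >
 <![CDATA[ [231] An actual interesting log entry with a real error message ]]></Log>
EOF
```

The opening tag, including createdDate and any leading whitespace, is left as it was; only the body between the first > and the closing </Log> is replaced.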

answered 2015-09-13T15:31:00.003