java - Improving a log file parser with multi-line matching criteria

Question

Given a somewhat peculiar log file, represented by the following snippet:

FILE (insert): file=Templates\xyz_EN_0615.pdf key=KEY_EN_AP_PAID
FILE (insert): file=Templates\xyz_DE_0615.pdf key=KEY_DE_STD_PAID
FILE (insert): file=Templates\xyz_DE_0615_free.pdf key=KEY_DE_STD_FREE
FILE (insert): file=Templates\xyz_IT_0615.pdf key=KEY_IT_STD_PAID
FILE (insert): file=Templates\xyz_IT_0615_free.pdf key=KEY_IT_STD_FREE
DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
Jul 31, 2015 5:07:54 PM java.util.prefs.WindowsPreferences <init>
WARNUNG: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
Jul 31, 2015 5:07:55 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNUNG: Using fallback font ArialMT for base font ZapfDingbats
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]
DEBUG: Opening Migration\abc_2_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Reset_1
Field not available: Print
DEBUG: Writing Migration\abc_2_DE-migrated.pdf
PERFORMANCE: [OVERALL completed in 756ms]
DEBUG: Opening Migration\abc_3_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
DEBUG: Writing Migration\abc_3-migrated.pdf
PERFORMANCE: [OVERALL completed in 660ms]
DEBUG: Opening Migration\abc_4.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_5.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_6_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Text6
Field not available: Text7
Field not available: Text8
Field not available: Text9
Field not available: Text10
Field not available: Text11
DEBUG: Writing Migration\abc_6-migrated.pdf
PERFORMANCE: [OVERALL completed in 686ms]
null
%EOF

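To analyse how accurately a run of an automated PDF form field conversion service worked, I need to filter out and count all occurrences of the following 4-tuple: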

DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]

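Any number of other lines may appear between the lines of such a 4-tuple; they can either be skipped or added to a list of invalid log entries. The simple selection criteria are hard-coded in the code below.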
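The 4-tuple criterion described above can be written down as an ordered prefix check. This is only a minimal sketch: the class name `TupleCriteria` and the method `isValidTuple` are hypothetical, and the prefixes are taken from the sample tuple. Note that it is stricter than the code below, which only requires that all four lines match one of the prefixes and that the last one starts with `PERFORMANCE:`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original program.
class TupleCriteria {
    // Expected line prefixes of one 4-tuple, in order (taken from the sample above).
    static final List<String> PREFIXES = Arrays.asList(
            "DEBUG: Opening", "DEBUG: Opening", "DEBUG: Writing", "PERFORMANCE:");

    // True if the given lines form one complete tuple in the expected order.
    static boolean isValidTuple(List<String> lines) {
        if (lines.size() != PREFIXES.size()) {
            return false;
        }
        for (int i = 0; i < PREFIXES.size(); i++) {
            if (!lines.get(i).startsWith(PREFIXES.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Whether the order should be enforced is a design choice; the looser check in the program below accepts any four consecutive matching lines that end with a `PERFORMANCE:` line.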

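The log file should then be split into valid and invalid entries, including their line numbers. Run against the example above, the current program prints: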

Statistics: Valid[tuples]=4 Valid[lines]=16 Invalid[lines]=8 Skipped[lines]=17 Total[lines]=41
----------------------[VALID]----------------------
key=6 value=DEBUG: Opening Migration\abc_1.pdf
key=7 value=DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
key=12 value=DEBUG: Writing Migration\abc_1-migrated.pdf
key=13 value=PERFORMANCE: [OVERALL completed in 2303ms]
key=14 value=DEBUG: Opening Migration\abc_2_DE.pdf
key=15 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=18 value=DEBUG: Writing Migration\abc_2_DE-migrated.pdf
key=19 value=PERFORMANCE: [OVERALL completed in 756ms]
key=20 value=DEBUG: Opening Migration\abc_3_DE.pdf
key=21 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=22 value=DEBUG: Writing Migration\abc_3-migrated.pdf
key=23 value=PERFORMANCE: [OVERALL completed in 660ms]
key=30 value=DEBUG: Opening Migration\abc_6_DE.pdf
key=31 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=38 value=DEBUG: Writing Migration\abc_6-migrated.pdf
key=39 value=PERFORMANCE: [OVERALL completed in 686ms]
----------------------[VALID]----------------------
----------------------[INVALID]----------------------
key=24 value=DEBUG: Opening Migration\abc_4.pdf
key=25 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=26 value=null
key=27 value=DEBUG: Opening Migration\abc_5.pdf
key=28 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=29 value=null
key=40 value=null
key=41 value=%EOF
----------------------[INVALID]----------------------

This is my approach:

import org.testng.annotations.Test;

import java.io.*;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalyseMigrationLog {

    public class RingMap<K, V> extends LinkedHashMap<K, V> {
        private int cacheSize;

        public RingMap(int cacheSize) {
            super(cacheSize);
            this.cacheSize = cacheSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > cacheSize;
        }
    }

    @Test
    public void doAnalysis() throws IOException {
        final String logfile = "./run-simple.log";
        final int ringSize = 4;
        int lc = 0;
        int skipped = 0;
        Long count;
        String line;
        Map<Integer, String> circularFifo = new RingMap<>(ringSize);
        Map<Integer, String> validTuples = new LinkedHashMap<>();
        Map<Integer, String> invalidTuples = new LinkedHashMap<>();

        FileReader     fre = new FileReader(logfile);
        BufferedReader bre = new BufferedReader(fre);
        while ((line = bre.readLine ()) != null) {
            lc++;
            if (line.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") || line.endsWith("<init>")) {
                skipped++;
                continue;
            }
            circularFifo.put(lc, line);
            if (circularFifo.size() < ringSize)
                continue;

            count = circularFifo.values().stream().
                    filter(p -> p.matches("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*")).count();

            // Get the most recently inserted entry (the tail) of the circular fifo
            List<Map.Entry<Integer, String>> entryList = new ArrayList<>(circularFifo.entrySet());
            Map.Entry<Integer, String> lastEntry = entryList.get(entryList.size() - 1);

            if (count == ringSize && lastEntry.getValue().startsWith("PERFORMANCE:")) {
                validTuples.putAll(circularFifo);
                // Remove already pushed entries from invalidTuples list to avoid duplicate entries
                circularFifo.forEach((key, value) -> invalidTuples.remove(key));
                circularFifo.clear();
            } else {
                invalidTuples.putAll(circularFifo);
            }
        }
        // Put in the last entries that didn't fill up the circular fifo anymore.
        invalidTuples.putAll(circularFifo);
        bre.close();
        fre.close();

        System.out.printf("Statistics: Valid[tuples]=%s Valid[lines]=%s Invalid[lines]=%s Skipped[lines]=%s Total[lines]=%s%n",
                validTuples.size()/ringSize, validTuples.size(), invalidTuples.size(), skipped, lc);

        System.out.printf("----------------------[VALID]----------------------%n");
        validTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
        System.out.printf("----------------------[VALID]----------------------%n");

        System.out.printf("----------------------[INVALID]----------------------%n");
        invalidTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
        System.out.printf("----------------------[INVALID]----------------------%n");
    }
}

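The basic trick was to introduce a circular FIFO for this task. It is short, fast, and works fine, but I wonder whether it could be converted more thoroughly to Java 8 features, for example by using NIO2 and appropriate stream techniques. I do not want to use Guava or any other over-engineered library for such a simple task.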

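Right now, I particularly dislike the way the code above fetches the most recently inserted entry of the circular FIFO. How could I extend the inner class and use it along the following lines: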

public class RingMap<K, V> extends LinkedHashMap<K, V> {
    private int cacheSize;

    public RingMap(int cacheSize) {
        super(cacheSize);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > cacheSize;
    }

    //TODO: how exactly would this work?
    public <K, V> Map.Entry<K,V> getLast(LinkedHashMap<K, V> map) {
        Map.Entry<K, V> result = null;
        for (Map.Entry<K, V> kvEntry : map.entrySet()) {
            result = kvEntry;
        }
        return result;
    }
}
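One possible shape for the `getLast` idea, as a minimal sketch rather than a definitive answer: the method drops its own type parameters (which shadow the class's `K` and `V` in the draft above) and iterates over the map's own entries instead of taking a map argument. The class is shown as a top-level, package-private class here purely so that the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded, insertion-ordered map: once more than cacheSize entries are
// present, the eldest one is evicted.
class RingMap<K, V> extends LinkedHashMap<K, V> {
    private final int cacheSize;

    RingMap(int cacheSize) {
        super(cacheSize);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > cacheSize;
    }

    // Returns the most recently inserted entry, or null if the map is empty.
    // Walks all entries in insertion order, so the cost is O(size), which is
    // negligible for a ring of four lines.
    Map.Entry<K, V> getLast() {
        Map.Entry<K, V> last = null;
        for (Map.Entry<K, V> entry : entrySet()) {
            last = entry;
        }
        return last;
    }
}
```

With this in place, the two lines that copy the entry set into an `ArrayList` could be replaced by a single `circularFifo.getLast()` call, provided `circularFifo` is declared as `RingMap<Integer, String>` rather than `Map<Integer, String>`. On Java 21 or later, `LinkedHashMap` already offers `lastEntry()` through `SequencedMap`, which would make the helper unnecessary.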

Next, I would really like to take advantage of NIO2 features, but I do not see how best to integrate them into my solution. Something like the following:

@Test
public void doAnalysisNIO2() throws IOException {
    final String logfile = "./run-simple.log";

    Path path = Paths.get(logfile);
    try (Stream<String> filteredLines = Files.lines(path, StandardCharsets.UTF_8)
            .onClose(() -> System.out.println("Stream has been closed!"))
            .filter(s -> !(s.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") ||
                           s.endsWith("<init>")))) {
        // Do the same thing as in the other code
        filteredLines.forEach((l) -> System.out.printf("line = %s%n", l));
    }
}
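To illustrate how `Files.lines` might be combined with the ring logic, here is a rough sketch under a few assumptions: the class name `MigrationLogAnalysis` and its members are hypothetical, an `ArrayDeque` of (line number, line) pairs stands in for the `RingMap`, and the selection rules are copied from the original loop. Because the matching needs line numbers and a sliding window, the stream is consumed with `forEachOrdered` into a stateful object instead of a purely functional pipeline.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class; all names are illustrative.
class MigrationLogAnalysis {
    static final int RING_SIZE = 4;
    static final Pattern SKIP =
            Pattern.compile("^(FILE \\(insert\\):|WARNUNG|Field not available).*");
    static final Pattern TUPLE_LINE =
            Pattern.compile("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*");

    final Map<Integer, String> valid = new LinkedHashMap<>();
    final Map<Integer, String> invalid = new LinkedHashMap<>();
    int skipped;
    int total;

    // Holds at most RING_SIZE (lineNumber, line) pairs, oldest first.
    private final Deque<Map.Entry<Integer, String>> ring = new ArrayDeque<>(RING_SIZE);

    // Reads the file lazily; try-with-resources closes the underlying file.
    static MigrationLogAnalysis analyse(Path logfile) throws IOException {
        try (Stream<String> lines = Files.lines(logfile, StandardCharsets.UTF_8)) {
            return analyse(lines);
        }
    }

    static MigrationLogAnalysis analyse(Stream<String> lines) {
        MigrationLogAnalysis result = new MigrationLogAnalysis();
        lines.forEachOrdered(result::accept);
        // Whatever is left in the ring never completed a tuple.
        result.ring.forEach(e -> result.invalid.put(e.getKey(), e.getValue()));
        return result;
    }

    private void accept(String line) {
        total++;
        if (SKIP.matcher(line).matches() || line.endsWith("<init>")) {
            skipped++;
            return;
        }
        if (ring.size() == RING_SIZE) {
            ring.removeFirst();                      // evict the oldest line
        }
        ring.addLast(new SimpleImmutableEntry<>(total, line));
        if (ring.size() < RING_SIZE) {
            return;
        }
        boolean allTupleLines =
                ring.stream().allMatch(e -> TUPLE_LINE.matcher(e.getValue()).matches());
        if (allTupleLines && ring.getLast().getValue().startsWith("PERFORMANCE:")) {
            // Complete tuple: move its lines from the invalid to the valid map.
            ring.forEach(e -> {
                valid.put(e.getKey(), e.getValue());
                invalid.remove(e.getKey());
            });
            ring.clear();
        } else {
            ring.forEach(e -> invalid.put(e.getKey(), e.getValue()));
        }
    }
}
```

A call such as `MigrationLogAnalysis.analyse(Paths.get("./run-simple.log"))` would then yield the `valid` and `invalid` maps plus the `skipped` and `total` counters for printing. Taking a `Stream<String>` in the second overload keeps the matching logic testable without touching the file system.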
