Given a somewhat unusual log file, represented by the following excerpt:
FILE (insert): file=Templates\xyz_EN_0615.pdf key=KEY_EN_AP_PAID
FILE (insert): file=Templates\xyz_DE_0615.pdf key=KEY_DE_STD_PAID
FILE (insert): file=Templates\xyz_DE_0615_free.pdf key=KEY_DE_STD_FREE
FILE (insert): file=Templates\xyz_IT_0615.pdf key=KEY_IT_STD_PAID
FILE (insert): file=Templates\xyz_IT_0615_free.pdf key=KEY_IT_STD_FREE
DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
Jul 31, 2015 5:07:54 PM java.util.prefs.WindowsPreferences <init>
WARNUNG: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
Jul 31, 2015 5:07:55 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNUNG: Using fallback font ArialMT for base font ZapfDingbats
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]
DEBUG: Opening Migration\abc_2_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Reset_1
Field not available: Print
DEBUG: Writing Migration\abc_2_DE-migrated.pdf
PERFORMANCE: [OVERALL completed in 756ms]
DEBUG: Opening Migration\abc_3_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
DEBUG: Writing Migration\abc_3-migrated.pdf
PERFORMANCE: [OVERALL completed in 660ms]
DEBUG: Opening Migration\abc_4.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_5.pdf
DEBUG: Opening Templates\xyz_EN_0615_free.pdf
null
DEBUG: Opening Migration\abc_6_DE.pdf
DEBUG: Opening Templates\xyz_DE_0615_free.pdf
Field not available: Text6
Field not available: Text7
Field not available: Text8
Field not available: Text9
Field not available: Text10
Field not available: Text11
DEBUG: Writing Migration\abc_6-migrated.pdf
PERFORMANCE: [OVERALL completed in 686ms]
null
%EOF
To analyse how accurately the automated PDF form field migration service ran, I need to filter and count all occurrences of the following 4-tuple:
DEBUG: Opening Migration\abc_1.pdf
DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
DEBUG: Writing Migration\abc_1-migrated.pdf
PERFORMANCE: [OVERALL completed in 2303ms]
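Any number of lines may appear between these 4-tuples. They can either be skipped or added to a list of invalid log entries. The simple selection criteria are hard-coded in the code below.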
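To make the 4-tuple criterion concrete, here is a minimal sketch of a stricter, order-aware check. It is a hypothetical alternative, not what the code below does: that code only requires all four lines to start with "DEBUG: Opening", "DEBUG: Writing" or "PERFORMANCE:" and the fourth one to start with "PERFORMANCE:". The class name TupleCheck and the prefix list are assumptions derived from the sample tuple above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class TupleCheck {
    // Hypothetical: expected prefix of each line of a valid tuple, in order,
    // taken from the sample tuple in the question.
    static final List<String> EXPECTED_PREFIXES = Arrays.asList(
            "DEBUG: Opening Migration\\",
            "DEBUG: Opening Templates\\",
            "DEBUG: Writing Migration\\",
            "PERFORMANCE: [OVERALL completed in ");

    // True only if there are exactly four lines and line i starts with prefix i.
    static boolean isValidTuple(List<String> lines) {
        return lines.size() == EXPECTED_PREFIXES.size()
                && IntStream.range(0, lines.size())
                        .allMatch(i -> lines.get(i).startsWith(EXPECTED_PREFIXES.get(i)));
    }

    public static void main(String[] args) {
        List<String> tuple = Arrays.asList(
                "DEBUG: Opening Migration\\abc_1.pdf",
                "DEBUG: Opening Templates\\xyz_DE_0615_kostenlos.pdf",
                "DEBUG: Writing Migration\\abc_1-migrated.pdf",
                "PERFORMANCE: [OVERALL completed in 2303ms]");
        System.out.println(isValidTuple(tuple));               // true
        System.out.println(isValidTuple(tuple.subList(1, 4))); // false (only 3 lines)
    }
}
```

A check like this would reject, for example, a window with two consecutive "DEBUG: Opening Migration" lines followed by a template line and a PERFORMANCE line, which the count-based test accepts.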
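The log file should then be split into valid and invalid entries, including their line numbers. Running the current program against the sample above produces: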
Statistics: Valid[tuples]=4 Valid[lines]=16 Invalid[lines]=8 Skipped[lines]=17 Total[lines]=41
----------------------[VALID]----------------------
key=6 value=DEBUG: Opening Migration\abc_1.pdf
key=7 value=DEBUG: Opening Templates\xyz_DE_0615_kostenlos.pdf
key=12 value=DEBUG: Writing Migration\abc_1-migrated.pdf
key=13 value=PERFORMANCE: [OVERALL completed in 2303ms]
key=14 value=DEBUG: Opening Migration\abc_2_DE.pdf
key=15 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=18 value=DEBUG: Writing Migration\abc_2_DE-migrated.pdf
key=19 value=PERFORMANCE: [OVERALL completed in 756ms]
key=20 value=DEBUG: Opening Migration\abc_3_DE.pdf
key=21 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=22 value=DEBUG: Writing Migration\abc_3-migrated.pdf
key=23 value=PERFORMANCE: [OVERALL completed in 660ms]
key=30 value=DEBUG: Opening Migration\abc_6_DE.pdf
key=31 value=DEBUG: Opening Templates\xyz_DE_0615_free.pdf
key=38 value=DEBUG: Writing Migration\abc_6-migrated.pdf
key=39 value=PERFORMANCE: [OVERALL completed in 686ms]
----------------------[VALID]----------------------
----------------------[INVALID]----------------------
key=24 value=DEBUG: Opening Migration\abc_4.pdf
key=25 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=26 value=null
key=27 value=DEBUG: Opening Migration\abc_5.pdf
key=28 value=DEBUG: Opening Templates\xyz_EN_0615_free.pdf
key=29 value=null
key=40 value=null
key=41 value=%EOF
----------------------[INVALID]----------------------
Here is my approach:
import org.testng.annotations.Test;

import java.io.*;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalyseMigrationLog {

    // Fixed-size, insertion-ordered map: once it holds more than cacheSize
    // entries, LinkedHashMap drops the eldest one (a circular FIFO).
    public static class RingMap<K, V> extends LinkedHashMap<K, V> {
        private final int cacheSize;

        public RingMap(int cacheSize) {
            super(cacheSize);
            this.cacheSize = cacheSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > cacheSize;
        }
    }

    @Test
    public void doAnalysis() throws IOException {
        final String logfile = "./run-simple.log";
        final int ringSize = 4;
        int lc = 0;
        int skipped = 0;
        long count;
        String line;
        Map<Integer, String> circularFifo = new RingMap<>(ringSize);
        Map<Integer, String> validTuples = new LinkedHashMap<>();
        Map<Integer, String> invalidTuples = new LinkedHashMap<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader bre = new BufferedReader(new FileReader(logfile))) {
            while ((line = bre.readLine()) != null) {
                lc++;
                // Lines that belong neither to a tuple nor to the invalid list
                if (line.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*") || line.endsWith("<init>")) {
                    skipped++;
                    continue;
                }
                circularFifo.put(lc, line);
                if (circularFifo.size() < ringSize)
                    continue;
                count = circularFifo.values().stream()
                        .filter(p -> p.matches("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*"))
                        .count();
                // Get the most recently inserted (last) entry in the circular FIFO
                List<Map.Entry<Integer, String>> entryList = new ArrayList<>(circularFifo.entrySet());
                Map.Entry<Integer, String> lastEntry = entryList.get(entryList.size() - 1);
                if (count == ringSize && lastEntry.getValue().startsWith("PERFORMANCE:")) {
                    validTuples.putAll(circularFifo);
                    // Remove already pushed entries from invalidTuples to avoid duplicate entries
                    circularFifo.forEach((key, value) -> invalidTuples.remove(key));
                    circularFifo.clear();
                } else {
                    invalidTuples.putAll(circularFifo);
                }
            }
        }
        // Add the last entries that no longer filled up the circular FIFO.
        invalidTuples.putAll(circularFifo);
        System.out.printf("Statistics: Valid[tuples]=%s Valid[lines]=%s Invalid[lines]=%s Skipped[lines]=%s Total[lines]=%s%n",
                validTuples.size() / ringSize, validTuples.size(), invalidTuples.size(), skipped, lc);
        System.out.printf("----------------------[VALID]----------------------%n");
        validTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
        System.out.printf("----------------------[VALID]----------------------%n");
        System.out.printf("----------------------[INVALID]----------------------%n");
        invalidTuples.forEach((key, value) -> System.out.printf("key=%s value=%s%n", key, value));
        System.out.printf("----------------------[INVALID]----------------------%n");
    }
}
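The basic trick is to introduce a circular FIFO for this task. Although the code is short, fast and works well, I wonder whether it could be converted more fully to Java 8 features, for example NIO.2 and proper use of streams. I don't want to use Guava or any other over-engineered library for such a simple task.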
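The circular FIFO relies on a single hook: LinkedHashMap calls removeEldestEntry after every put, and if it returns true, the eldest entry in insertion order is removed. A minimal sketch of just that mechanism, using an anonymous subclass instead of the named RingMap (the helper name ring is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RingDemo {
    // Hypothetical helper: an insertion-ordered map that keeps at most `capacity` entries.
    static <K, V> Map<K, V> ring(int capacity) {
        return new LinkedHashMap<K, V>(capacity) {
            // Called by put(); returning true makes LinkedHashMap drop the eldest entry.
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> window = ring(4);
        for (int lineNo = 1; lineNo <= 5; lineNo++) {
            window.put(lineNo, "line " + lineNo);
            System.out.println(window.keySet());
        }
        // Prints [1], [1, 2], [1, 2, 3], [1, 2, 3, 4] and finally [2, 3, 4, 5]
    }
}
```

In the question's code, the window is additionally cleared after an accepted tuple, so the next window starts empty instead of sliding by one line.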
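In particular, I don't like the way the last (newest) entry of the FIFO is fetched above. How could I extend and use the inner class like this instead: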
public class RingMap<K, V> extends LinkedHashMap<K, V> {
    private int cacheSize;

    public RingMap(int cacheSize) {
        super(cacheSize);
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > cacheSize;
    }

    //TODO: how exactly would this work?
    public <K, V> Map.Entry<K,V> getLast(LinkedHashMap<K, V> map) {
        Map.Entry<K, V> result = null;
        for (Map.Entry<K, V> kvEntry : map.entrySet()) {
            result = kvEntry;
        }
        return result;
    }
}
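A sketch of how getLast could work, assuming the goal is an instance method on RingMap itself. The method in the TODO declares its own <K, V>, which shadow the class's type parameters, and it takes a separate map argument. Neither is needed: the method can walk this map's own entrySet(). Because the ring never holds more than cacheSize entries, the walk takes at most four steps here. The wrapper class RingMapDemo exists only to make the example runnable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RingMapDemo {
    public static class RingMap<K, V> extends LinkedHashMap<K, V> {
        private final int cacheSize;

        public RingMap(int cacheSize) {
            super(cacheSize);
            this.cacheSize = cacheSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > cacheSize;
        }

        // No method-level <K, V>: reuses the class's own type parameters and
        // walks this map's entries. Returns the most recently inserted entry,
        // or null if the map is empty.
        public Map.Entry<K, V> getLast() {
            Map.Entry<K, V> last = null;
            for (Map.Entry<K, V> entry : entrySet()) {
                last = entry;
            }
            return last;
        }
    }

    public static void main(String[] args) {
        RingMap<Integer, String> circularFifo = new RingMap<>(4);
        for (int lineNo = 1; lineNo <= 6; lineNo++) {
            circularFifo.put(lineNo, "line " + lineNo);
        }
        System.out.println(circularFifo.keySet());           // [3, 4, 5, 6]
        System.out.println(circularFifo.getLast().getKey()); // 6
    }
}
```

To call it from doAnalysis, circularFifo would have to be declared as RingMap<Integer, String> rather than Map<Integer, String>. On Java 21 and later, LinkedHashMap implements SequencedMap, whose lastEntry() method does the same thing without a subclass method.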
I would also really like to take advantage of NIO.2, but I don't understand how best to integrate it into my solution. Something like the following:
// Additional imports needed for this method:
// import java.nio.charset.StandardCharsets;
// import java.nio.file.Files;
// import java.nio.file.Path;
// import java.nio.file.Paths;
// import java.util.stream.Stream;
@Test
public void doAnalysisNIO2() throws IOException {
    final String logfile = "./run-simple.log";
    Path path = Paths.get(logfile);
    try (Stream<String> filteredLines = Files.lines(path, StandardCharsets.UTF_8)
            .onClose(() -> System.out.println("Stream has been closed!"))
            .filter(s -> !(s.matches("^(FILE \\(insert\\):|WARNUNG|Field not available).*")
                    || s.endsWith("<init>")))) {
        // Do the same thing as in the other code
        filteredLines.forEach(l -> System.out.printf("line = %s%n", l));
    }
}
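One possible way to integrate NIO.2, offered as a sketch rather than a drop-in replacement. Java 8 streams have no built-in sliding-window operation, and Files.lines does not report line numbers. This version therefore reads the file with Files.readAllLines, assuming the log fits comfortably in memory. It uses an IntStream to number and filter the lines, then keeps the 4-line window as a plain loop over the kept line numbers. Starting the next window only after an accepted tuple mirrors circularFifo.clear() in the original, and tracing the sample log through it gives the same statistics line as shown above. The class name AnalyseMigrationLogNio and the analyse method are hypothetical. UTF-8 is also an assumption: readAllLines fails with an IOException if the log was actually written in another encoding, such as windows-1252.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AnalyseMigrationLogNio {
    static final int RING_SIZE = 4;
    // Same hard-coded selection criteria as in the question.
    static final Pattern SKIP =
            Pattern.compile("^(FILE \\(insert\\):|WARNUNG|Field not available).*");
    static final Pattern TUPLE_LINE =
            Pattern.compile("^(DEBUG: Opening|DEBUG: Writing|PERFORMANCE:).*");

    static boolean isSkipped(String s) {
        return SKIP.matcher(s).matches() || s.endsWith("<init>");
    }

    // Fills valid/invalid (keyed by 1-based line number) and returns the statistics line.
    static String analyse(List<String> all, Map<Integer, String> valid, Map<Integer, String> invalid) {
        // 1-based numbers of all lines that are not skipped.
        List<Integer> kept = IntStream.rangeClosed(1, all.size())
                .filter(n -> !isSkipped(all.get(n - 1)))
                .boxed()
                .collect(Collectors.toList());

        // Slide a RING_SIZE window over the kept lines; after an accepted tuple the
        // next window may only start behind it (like circularFifo.clear()).
        int start = 0;
        for (int end = RING_SIZE; end <= kept.size(); end++) {
            if (end - start < RING_SIZE) {
                continue;
            }
            List<Integer> window = kept.subList(end - RING_SIZE, end);
            boolean accepted = window.stream()
                    .allMatch(n -> TUPLE_LINE.matcher(all.get(n - 1)).matches())
                    && all.get(window.get(RING_SIZE - 1) - 1).startsWith("PERFORMANCE:");
            if (accepted) {
                window.forEach(n -> valid.put(n, all.get(n - 1)));
                start = end;
            }
        }

        // Every kept line that is not part of an accepted tuple is invalid.
        kept.stream()
                .filter(n -> !valid.containsKey(n))
                .forEach(n -> invalid.put(n, all.get(n - 1)));

        return String.format("Statistics: Valid[tuples]=%d Valid[lines]=%d Invalid[lines]=%d"
                        + " Skipped[lines]=%d Total[lines]=%d",
                valid.size() / RING_SIZE, valid.size(), invalid.size(),
                all.size() - kept.size(), all.size());
    }

    public static void main(String[] args) throws IOException {
        List<String> all = Files.readAllLines(
                Paths.get(args.length > 0 ? args[0] : "./run-simple.log"), StandardCharsets.UTF_8);
        Map<Integer, String> valid = new LinkedHashMap<>();
        Map<Integer, String> invalid = new LinkedHashMap<>();
        System.out.println(analyse(all, valid, invalid));
        System.out.println("----------------------[VALID]----------------------");
        valid.forEach((k, v) -> System.out.printf("key=%s value=%s%n", k, v));
        System.out.println("----------------------[INVALID]----------------------");
        invalid.forEach((k, v) -> System.out.printf("key=%s value=%s%n", k, v));
    }
}
```

Separating analyse from the file I/O also makes the logic testable with an in-memory list of lines, without a log file on disk.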