java - Parsing a .txt file (considering performance)

Question

DurationOfRun:5
ThreadSize:10
ExistingRange:1-1000
NewRange:5000-10000
Percentage:55 - AutoRefreshStoreCategories  Data:Previous/30,New/70    UserLogged:true/50,false/50      SleepTime:5000     AttributeGet:1,16,10106,10111       AttributeSet:2060/30,10053/27
Percentage:25 - CrossPromoEditItemRule      Data:Previous/60,New/40    UserLogged:true/50,false/50      SleepTime:4000     AttributeGet:1,10107                AttributeSet:10108/34,10109/25
Percentage:20 - CrossPromoManageRules       Data:Previous/30,New/70    UserLogged:true/50,false/50      SleepTime:2000     AttributeGet:1,10107                AttributeSet:10108/26,10109/21

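I am trying to parse the .txt file above. The first four lines are fixed; the last three can grow, meaning there can be more than three of them. I wrote the code below and it works, but it looks messy. Is there a better way to parse this .txt file and, taking performance into account, what would be the best way to do it?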

private static int noOfThreads;
private static List<Command> commands;
public static int startRange;
public static int endRange;
public static int newStartRange;
public static int newEndRange;
private static BufferedReader br = null;
private static String sCurrentLine = null;
private static List<String> values;
private static String commandName;
private static String percentage;
private static List<String> attributeIDGet;
private static List<String> attributeIDSet;
private static LinkedHashMap<String, Double> dataCriteria;
private static LinkedHashMap<Boolean, Double> userLoggingCriteria;
private static long sleepTimeOfCommand;
private static long durationOfRun;

br = new BufferedReader(new FileReader("S:\\Testing\\PDSTest1.txt"));
values = new ArrayList<String>();

while ((sCurrentLine = br.readLine()) != null) {
    if(sCurrentLine.startsWith("DurationOfRun")) {
        durationOfRun = Long.parseLong(sCurrentLine.split(":")[1]);
    } else if(sCurrentLine.startsWith("ThreadSize")) {
        noOfThreads = Integer.parseInt(sCurrentLine.split(":")[1]);
    } else if(sCurrentLine.startsWith("ExistingRange")) {
        startRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
        endRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
    } else if(sCurrentLine.startsWith("NewRange")) {
        newStartRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
        newEndRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
    } else {
        attributeIDGet =  new ArrayList<String>();
        attributeIDSet =  new ArrayList<String>();
        dataCriteria = new LinkedHashMap<String, Double>();
        userLoggingCriteria = new LinkedHashMap<Boolean, Double>();

        percentage = sCurrentLine.split("-")[0].split(":")[1].trim();
        values = Arrays.asList(sCurrentLine.split("-")[1].trim().split("\\s+"));
        for(String s : values) {
            if(s.startsWith("Data")) {
                String[] data = s.split(":")[1].split(",");
                for (String n : data) {
                    dataCriteria.put(n.split("/")[0], Double.parseDouble(n.split("/")[1]));
                }
                //dataCriteria.put(data.split("/")[0], value)
            } else if(s.startsWith("UserLogged")) {
                String[] userLogged = s.split(":")[1].split(",");
                for (String t : userLogged) {
                    userLoggingCriteria.put(Boolean.parseBoolean(t.split("/")[0]), Double.parseDouble(t.split("/")[1]));
                }
                //userLogged = Boolean.parseBoolean(s.split(":")[1]);
            } else if(s.startsWith("SleepTime")) {
                sleepTimeOfCommand = Long.parseLong(s.split(":")[1]);
            } else if(s.startsWith("AttributeGet")) {
                String[] strGet = s.split(":")[1].split(",");
                for(String q : strGet) attributeIDGet.add(q); 
            } else if(s.startsWith("AttributeSet:")) {
                String[] strSet = s.split(":")[1].split(",");
                for(String p : strSet) attributeIDSet.add(p); 
            } else {
                commandName = s;
            }
        }
        Command command = new Command();
        command.setName(commandName);
        command.setExecutionPercentage(Double.parseDouble(percentage));
        command.setAttributeIDGet(attributeIDGet);
        command.setAttributeIDSet(attributeIDSet);
        command.setDataUsageCriteria(dataCriteria);
        command.setUserLoggingCriteria(userLoggingCriteria);
        command.setSleepTime(sleepTimeOfCommand);
        commands.add(command);
    }
}

score 2 · Accepted Answer

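Well, parsers usually are messy once you get down to their lower layers :-)

However, one possible improvement, at least in terms of code quality, is to recognise that your grammar is layered.

By that I mean every line is an identifying token followed by some properties.

For DurationOfRun, ThreadSize, ExistingRange and NewRange, the properties are relatively simple. Percentage is somewhat more complex, but still manageable.

I would structure the code as (pseudocode):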

def parseFile (fileHandle):
    while (currentLine = fileHandle.getNextLine()) != EOF:
        if currentLine.beginsWith ("DurationOfRun:"):
            processDurationOfRun (currentLine[14:])

        elsif currentLine.beginsWith ("ThreadSize:"):
            processThreadSize (currentLine[11:])

        elsif currentLine.beginsWith ("ExistingRange:"):
            processExistingRange (currentLine[14:])

        elsif currentLine.beginsWith ("NewRange:"):
            processNewRange (currentLine[9:])

        elsif currentLine.beginsWith ("Percentage:"):
            processPercentage (currentLine[11:])

        else
            raise error

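Then, in each processWhatever() function, you parse the rest of the line according to its expected format. That keeps your code small and readable, and easy to change in future without having to navigate a morass :-)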
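For what it's worth, here is a rough Java rendering of that dispatch. It is only a sketch: the class name LayeredConfigParser and its fields are made up for illustration, skipping blank lines is my assumption, and the ranges and Percentage lines are merely stored raw here so that lower layers can parse them later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LayeredConfigParser {
    long durationOfRun;
    int threadSize;
    String existingRange;                                   // raw "from-to", left for a lower layer
    String newRange;                                        // raw "from-to", left for a lower layer
    final List<String> percentageLines = new ArrayList<>(); // raw, left for a lower layer

    // Top layer: look only at the identifying token, then hand off the rest.
    void parseFile(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are harmless
            } else if (line.startsWith("DurationOfRun:")) {
                processDurationOfRun(line.substring("DurationOfRun:".length()));
            } else if (line.startsWith("ThreadSize:")) {
                processThreadSize(line.substring("ThreadSize:".length()));
            } else if (line.startsWith("ExistingRange:")) {
                existingRange = line.substring("ExistingRange:".length()).trim();
            } else if (line.startsWith("NewRange:")) {
                newRange = line.substring("NewRange:".length()).trim();
            } else if (line.startsWith("Percentage:")) {
                percentageLines.add(line.substring("Percentage:".length()).trim());
            } else {
                throw new IllegalArgumentException("Unrecognised line: " + line);
            }
        }
    }

    private void processDurationOfRun(String rest) {
        durationOfRun = Long.parseLong(rest.trim());
    }

    private void processThreadSize(String rest) {
        threadSize = Integer.parseInt(rest.trim());
    }

    // Convenience entry point for in-memory text.
    static LayeredConfigParser parse(String text) {
        LayeredConfigParser parser = new LayeredConfigParser();
        try {
            parser.parseFile(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parser;
    }
}
```

Using "DurationOfRun:".length() instead of a hard-coded offset such as 14 keeps the prefix and the offset from drifting apart when a keyword is renamed.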

For example, processDurationOfRun() simply gets an integer from the remainder of the line:

def processDurationOfRun (line):
    this.durationOfRun = line.parseAsInt()

Similarly, the functions for the two ranges split the string on - and get two integers from the resulting values:

def processExistingRange (line):
    values[] = line.split("-")
    this.existingRangeStart = values[0].parseAsInt()
    this.existingRangeEnd   = values[1].parseAsInt()
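In Java, the range handling could be sketched as a small value class. The name IntRange is hypothetical, and rejecting anything that is not exactly two parts is my own assumption about how strict you want to be.

```java
final class IntRange {
    final int from;
    final int to;

    private IntRange(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // Parses text such as "1-1000" (the part after "ExistingRange:" or "NewRange:").
    static IntRange parse(String text) {
        String[] parts = text.trim().split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected from-to, got: " + text);
        }
        // Integer.parseInt throws NumberFormatException on non-numeric parts.
        return new IntRange(Integer.parseInt(parts[0].trim()),
                            Integer.parseInt(parts[1].trim()));
    }
}
```

Both range lines can share this one method, e.g. IntRange.parse("5000-10000"), so the split happens once per line rather than twice as in the question's code.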

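The processPercentage() function is the tricky one, but it is also easily doable if you layer it as well. Assuming those things always appear in the same order, it consists of:

an integer;
a literal -;
some sort of textual category; and
a series of key:value pairs.

Even the values within those pairs can be parsed by lower levels, splitting first on commas to get sub-values such as Previous/30 and New/70, then splitting each sub-value on the slash to get the individual items. That way, the logical hierarchy is reflected in your code.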
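The layers described above might look something like this in Java. Treat it as a sketch under stated assumptions: PercentageEntry and weights() are names I invented, the input is the text after "Percentage:", and the fields are assumed to be separated by whitespace with no spaces inside a key:value pair.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PercentageEntry {
    final int percentage;
    final String name;
    final Map<String, List<String>> attributes; // key -> comma-separated sub-values

    private PercentageEntry(int percentage, String name, Map<String, List<String>> attributes) {
        this.percentage = percentage;
        this.name = name;
        this.attributes = attributes;
    }

    static PercentageEntry parse(String rest) {
        // Layer 1: "<integer> - <remainder>"
        String[] halves = rest.trim().split("\\s+-\\s+", 2);
        if (halves.length != 2) {
            throw new IllegalArgumentException("Expected '<n> - <name> ...', got: " + rest);
        }
        int percentage = Integer.parseInt(halves[0]);

        // Layer 2: category name, then whitespace-separated key:value pairs
        String[] tokens = halves[1].trim().split("\\s+");
        String name = tokens[0];
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            String[] keyValue = tokens[i].split(":", 2);
            if (keyValue.length != 2) {
                throw new IllegalArgumentException("Expected key:value, got: " + tokens[i]);
            }
            // Layer 3: split the value on commas into sub-values
            attributes.put(keyValue[0], Arrays.asList(keyValue[1].split(",")));
        }
        return new PercentageEntry(percentage, name, attributes);
    }

    // Layer 4: split sub-values such as "Previous/30" on the slash.
    static Map<String, Double> weights(List<String> subValues) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String sub : subValues) {
            String[] parts = sub.split("/", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected item/weight, got: " + sub);
            }
            result.put(parts[0], Double.parseDouble(parts[1]));
        }
        return result;
    }
}
```

Keys without weights, such as AttributeGet, stay as plain lists; only the keys that carry item/weight sub-values (Data, UserLogged, AttributeSet) need to go through weights().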

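Unless you expect to parse this text file many times per second, or unless it is many megabytes in size, I'd be more concerned about the readability and maintainability of the code than about parsing speed.

Mostly gone are the days when we had to wring the last ounce of performance from our code, but we still struggle to fix said code in a timely fashion when bugs are found or enhancements are wanted.

Sometimes it's preferable to optimise for readability.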

score 1 · Accepted Answer

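I wouldn't worry about performance until I was sure there actually was a performance problem. As for the rest of the code, if you aren't going to add any new line types, I wouldn't worry about that either. If you are concerned, though, the factory design pattern can help you separate the selection of the type of processing needed from the actual processing. It makes adding new line types easier without introducing as many opportunities for error.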
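One possible reading of that suggestion, sketched in Java: a small factory picks a handler from the keyword before the colon, and each handler does the actual processing. Settings, LineHandler and LineHandlerFactory are all hypothetical names, and only two line types are registered to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical target object for the parsed values. */
final class Settings {
    long durationOfRun;
    int threadSize;
}

/** One implementation per line type; it receives the text after the colon. */
interface LineHandler {
    void handle(String rest, Settings target);
}

final class LineHandlerFactory {
    private final Map<String, LineHandler> handlers = new HashMap<>();

    // Adding a new line type means one more register() call, nothing else changes.
    LineHandlerFactory register(String keyword, LineHandler handler) {
        handlers.put(keyword, handler);
        return this;
    }

    // Selection only: choose the handler from the keyword before the first ':'.
    LineHandler forLine(String line) {
        int colon = line.indexOf(':');
        String keyword = colon < 0 ? line : line.substring(0, colon);
        LineHandler handler = handlers.get(keyword);
        if (handler == null) {
            throw new IllegalArgumentException("No handler for line type: " + keyword);
        }
        return handler;
    }

    // Selection plus processing for a single line.
    void dispatch(String line, Settings target) {
        LineHandler handler = forLine(line);
        String rest = line.substring(line.indexOf(':') + 1).trim();
        handler.handle(rest, target);
    }

    static LineHandlerFactory defaults() {
        return new LineHandlerFactory()
            .register("DurationOfRun", (rest, s) -> s.durationOfRun = Long.parseLong(rest))
            .register("ThreadSize", (rest, s) -> s.threadSize = Integer.parseInt(rest));
    }
}
```

The read loop then shrinks to calling dispatch(line, settings) for every line, and the if/else chain disappears.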

score 0 · Accepted Answer

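The younger and more convenient class is Scanner. You only need to modify the delimiter, and you can read the data in the desired format (readInt, readLong) in one go, with no need for separate x.parseX calls.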
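To make the delimiter idea concrete, here is a minimal sketch that reads the four header lines from an in-memory string. The class ScannerHeaderSketch and its expect() helper are made up for this illustration. Note that because - is a delimiter here, negative numbers cannot be read this way.

```java
import java.util.Scanner;

final class ScannerHeaderSketch {
    /** Returns {duration, threads, existingFrom, existingTo, newFrom, newTo}. */
    static long[] readHeader(String text) {
        try (Scanner sc = new Scanner(text)) {
            // Whitespace, '/', ',', ':' and '-' all separate tokens;
            // the '+' collapses runs of them so no empty tokens appear.
            sc.useDelimiter("[\\s/,:-]+");
            long[] out = new long[6];
            expect(sc, "DurationOfRun");
            out[0] = sc.nextLong();
            expect(sc, "ThreadSize");
            out[1] = sc.nextInt();
            expect(sc, "ExistingRange");
            out[2] = sc.nextInt();
            out[3] = sc.nextInt();
            expect(sc, "NewRange");
            out[4] = sc.nextInt();
            out[5] = sc.nextInt();
            return out;
        }
    }

    // Consumes one token and checks that it is the keyword we are waiting for.
    private static void expect(Scanner sc, String keyword) {
        String token = sc.next();
        if (!token.equals(keyword)) {
            throw new IllegalStateException(keyword + " expected, found: " + token);
        }
    }
}
```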

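Second: split your code into small, reusable pieces. They make the program readable, and you can easily hide details.

For example, don't hesitate to use a struct-like class for a range. Returning multiple values from a method can be done with these, without boilerplate (getters, setters, constructor).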

import java.util.*;
import java.io.*;

public class ReadSampleFile
{
    // struct like classes:
    class PercentageRow {
        public int percentage;
        public String name;
        public int dataPrevious;
        public int dataNew;
        public int userLoggedTrue;
        public int userLoggedFalse;
        public List<Integer> attributeGet;
        public List<Integer> attributeSet;
    }
    class Range {
        public int from;
        public int to;
    }

    // Reads the keyword `name`, then the int that follows it.
    private int readInt (String name, Scanner sc) {
        String s = sc.next ();
        if (s.startsWith (name)) {
            return sc.nextInt ();
        }
        throw err (name + " expected, found: " + s);
    }

    // Reads the keyword `name`, then the long that follows it.
    private long readLong (String name, Scanner sc) {
        String s = sc.next ();
        if (s.startsWith (name)) {
            return sc.nextLong ();
        }
        throw err (name + " expected, found: " + s);
    }

    // Reads the keyword `name`, then the two ints of the range.
    private Range readRange (String name, Scanner sc) {
        String s = sc.next ();
        if (s.startsWith (name)) {
            Range r = new Range ();
            r.from = sc.nextInt ();
            r.to = sc.nextInt ();
            return r;
        }
        throw err (name + " expected, found: " + s);
    }

    private PercentageRow readPercentageLine (Scanner sc) {
        // reuse above methods
        PercentageRow percentageRow = new PercentageRow ();
        percentageRow.percentage = readInt ("Percentage", sc);
        // ... (the remaining fields must be read here before the next row starts)
        return percentageRow;
    }

    public ReadSampleFile () throws FileNotFoundException
    {
        /* I only read from my sourcefile for convenience.
        So I could scroll up to see what's the next entry.
        Don't do this at home. :) The dummy later ...
        */
        Scanner sc = new Scanner (new File ("./ReadSampleFile.java"));
        // The '+' collapses runs of delimiters (" - ", repeated blanks, "\r\n"),
        // so no empty tokens are produced.
        sc.useDelimiter ("[ \r\n/,:-]+");
        // ... is the comment I had to insert.
        String dummy = sc.nextLine ();
        List <String> values = new ArrayList<String> ();
        if (sc.hasNext ()) {
            // see how nice the data structure is reflected
            // by this code:
            long duration = readLong ("DurationOfRun", sc);
            int noOfThreads = readInt ("ThreadSize", sc);
            Range eRange = readRange ("ExistingRange", sc);
            Range nRange = readRange ("NewRange", sc);
            List <PercentageRow> percentageRows = new ArrayList <PercentageRow> ();
            // including the repetition ...
            while (sc.hasNext ()) {
                percentageRows.add (readPercentageLine (sc));
            }
        }
    }

    public static void main (String args[])  throws FileNotFoundException
    {
        new ReadSampleFile ();
    }

    // Prints the message and returns an exception for the caller to throw,
    // so the compiler can see that the read methods never fall off the end.
    public static IllegalStateException err (String msg)
    {
        System.out.println ("Err:\t" + msg);
        return new IllegalStateException (msg);
    }
}
