java - Reading multiple records from a flat file in Java

Question

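I have a text file dump that I need to convert to a delimited file. The file contains a series of "records" (for lack of a better word) formatted like this: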

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 123456
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 234567
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text

...

My end result should be a flat file of delimited values. Using the records above, we would see:

abc123;7/3/12;the foo is bar;123456;foo bar in multiple lines of text;foo un-barred in multiple lines of text
abc123;7/3/12;the foo is bar;234567;foo bar in multiple lines of text;foo un-barred in multiple lines of text

The code appears below, followed by the problem I'm running into.

    import java.util.*;
    import java.io.*;
    import java.nio.file.*;
    //
    public class ParseOutlookFolderForSE
    {
        public static void main(String args[])
        {
            String user = "";
            String PDLDate = "";
            String name = "";
            String PDLNum = "";
            String problemDesc = "test";
            String resolutionDesc = "test";
            String delim = ";";
            int recordCounter = 0;
            //
            try
            {
                Path file = Paths.get("testfile2.txt");
                FileInputStream fstream = new FileInputStream("testfile2.txt");
               // Get the object of DataInputStream
                /* DataInputStream in = new DataInputStream(fstream);  */
                BufferedReader br = new BufferedReader(new InputStreamReader(fstream));  //Buffered Reader
                String inputLine = null;     //String
                StringBuffer theText = new StringBuffer();  //StringBuffer
// problem: output contains last record ONLY. program is cycling through the entire file, overwriting records until the end.
// add a for loop based on recordCounter
                for(recordCounter=0;recordCounter<10;recordCounter++)
                {
                while((inputLine=br.readLine())!=null)
                {
                    if(inputLine.toLowerCase().startsWith("from:"))
                    {

                /*      recordCounter = recordCounter++;    */  // commented out when I added recordCounter++ to the for loop
                        user = inputLine.trim().substring(5).trim();
                    }
                    else
                    if(inputLine.toLowerCase().startsWith("effective date"))
                    {

                        PDLDate = inputLine.trim().substring(15).trim();
                    }
                    else
                    if(inputLine.toLowerCase().startsWith("to:"))
                    {

                        name = inputLine.trim().substring(3).trim();
                    }
                    else
                    if(inputLine.toLowerCase().startsWith("sir number"))
                    {

                        PDLNum = inputLine.trim().substring(12).trim();
                    }
                }      // close while
                }   // close for loop
                System.out.println(recordCounter + "\n" + user + "\n" + name + "\n" + PDLNum + "\n" + PDLDate + "\n" + problemDesc + "\n" + resolutionDesc);
                System.out.println(recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc);
                String lineForFile = (recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc + System.getProperty("line.separator"));
                System.out.println(lineForFile);
                try
                {
                    BufferedWriter out = new BufferedWriter(new FileWriter("testfileoutput.txt"));
                    out.write(lineForFile);
                    out.close();
                }
                catch (IOException e)
                {
                    System.out.println("Exception ");
                }
            } //close try
            catch (Exception e)
            {
                System.err.println("Error: " + e.getMessage());
            }
        }

    }

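My final output is ONLY the last record. I believe what's happening is that the program reads every line, but only the last record doesn't get overwritten by the next one. Makes sense. So I added a FOR loop, incrementing by 1 when if(inputLine.toLowerCase().startsWith("user:")) matches, and I output the counter variable with my data to verify what's happening.

My FOR loop begins after step 3 in my pseudocode: after the BufferedReader but before my IF statements. I terminate it after I write to the file in step 6. I'm using for(recCounter=0;recCounter<10;recCounter++), and while I get ten records in my output file, they are all instances of the LAST record of the input file, numbered 0-9.

Leaving the for loop in the same place, I modified it to read for(recCounter=0;recCounter<10;) and placed recCounter's increment WITHIN the IF statement, incrementing every time a line starts with User:. In this case I also got ten records in my output file; they were ten instances of the last record in the input file, and all the counters were 0.

EDIT: Given how the file is formatted, the ONLY way to tell one record from the next is a subsequent instance of the word "User:" at the start of a line. Everything from one occurrence up to the NEXT occurrence constitutes a single record.

It appears that I'm either not setting my "recCounter" appropriately, or I'm not interpreting the result of what IS being set as "start a new record".

Does anyone have suggestions for how to read this file as multiple records?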

score 3 · Accepted Answer

Okay, so your pseudocode should go something like this:

declare variables
open file
while not eof
  read input
  if end of set
    format output
    write output
    clear variables
  figure out which variable
  store in correct variable
end-while

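There may be a trick to determining when you are done with one set and can start the next. If a set is supposed to be terminated by a blank line, as your example suggests, then you could just check for the blank line. Otherwise, how do you know? Does a set always start with "User"?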

Also, don't forget to write the last record. You don't want to leave anything unwritten in your buffer/table.
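
The pseudocode above could be sketched in Java roughly as follows. This is only a sketch under stated assumptions: every record starts with a "User:" line (as the question's edit says), the field labels are exactly those in the sample data, and any line without a known label continues the previous field. The names RecordParser, parse, matchLabel and format are made up for illustration. Values that contain the delimiter are not escaped, and a continuation line that happens to start with a label would be misread as a new field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecordParser {
    // Labels copied from the sample in the question; their order is the output order.
    private static final String[] LABELS =
            {"User:", "Date:", "Subject:", "Project:", "Problem:", "Resolution:"};

    /** Turns the input lines into one delimited output line per record. */
    static List<String> parse(List<String> lines, String delim) {
        List<String> out = new ArrayList<>();
        Map<String, StringBuilder> current = null; // fields of the record being built
        String lastLabel = null;                   // field that continuation lines extend
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines are ignored; "User:" marks the boundary
            }
            String label = matchLabel(trimmed);
            if ("User:".equals(label)) {
                if (current != null) {
                    out.add(format(current, delim)); // end of set: format and emit
                }
                current = new HashMap<>();           // clear variables
            }
            if (current == null) {
                continue; // anything before the first "User:" line is skipped
            }
            if (label != null) {
                String value = trimmed.substring(label.length()).trim();
                current.put(label, new StringBuilder(value));
                lastLabel = label;
            } else {
                current.get(lastLabel).append(' ').append(trimmed); // multi-line field
            }
        }
        if (current != null) {
            out.add(format(current, delim)); // the last record has no "User:" after it
        }
        return out;
    }

    /** Returns the label the line starts with (ignoring case), or null if none matches. */
    private static String matchLabel(String line) {
        for (String label : LABELS) {
            if (line.regionMatches(true, 0, label, 0, label.length())) {
                return label;
            }
        }
        return null;
    }

    /** Joins the field values in LABELS order; missing fields become empty strings. */
    private static String format(Map<String, StringBuilder> rec, String delim) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < LABELS.length; i++) {
            if (i > 0) {
                sb.append(delim);
            }
            StringBuilder value = rec.get(LABELS[i]);
            if (value != null) {
                sb.append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "User: abc123 ", "Date: 7/3/12", "Subject: the foo is bar",
                "Project: 123456", "Problem: foo bar in", "multiple lines of text",
                "Resolution: foo un-barred", "",
                "User: abc123 ", "Date: 7/3/12", "Subject: the foo is bar",
                "Project: 234567", "Problem: foo bar", "Resolution: foo un-barred");
        for (String row : parse(sample, ";")) {
            System.out.println(row);
        }
    }
}
```

For a real file, the lines could come from Files.readAllLines(Paths.get("testfile2.txt")), or the same logic could sit inside a BufferedReader.readLine() loop like the one in the question.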

score 1 · Accepted Answer

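From your description, it sounds like the following is happening: you are not actually writing the output strings as you finish them, but instead doing all of the writing at the end. It sounds like you are not saving the output strings outside of the loop, so each time you find a record you overwrite the output string you previously calculated.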

You should check that you are actually writing to the file after each record is found and its output string is created.
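
One way to arrange that is sketched here, as an illustration rather than a drop-in fix: open the writer once before the loop and write each record line as soon as it is complete. The class name RecordWriterDemo and the method writeAll are hypothetical. Note that new FileWriter(name) and Files.newBufferedWriter(path) both truncate an existing file when opened, so reopening the file for every record would again leave only the last one unless append mode is used.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class RecordWriterDemo {
    /** Writes every finished record line to the target, using one writer for the whole run. */
    static void writeAll(Path target, List<String> recordLines) throws IOException {
        // Opened once, before the loop; try-with-resources flushes and closes it at the end.
        try (BufferedWriter out = Files.newBufferedWriter(target)) {
            for (String line : recordLines) {
                out.write(line); // write as soon as the record's line is ready
                out.newLine();   // platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("records", ".txt");
        writeAll(target, Arrays.asList("abc123;7/3/12;first", "abc123;7/3/12;second"));
        System.out.println(Files.readAllLines(target).size() + " lines written to " + target);
    }
}
```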

Unless you post your code, I'm not sure I can help much further.
