3

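I have found this question asked for other languages, but I have not yet found a solution for a Java application.

I have a large .txt file containing millions of records. Each record is \n-delimited. Essentially it is a single column of data from a table. The goal is to read the data from the input file, partition it, and write the partitioned data to new files. For example, a file with 2 million records will become 200 files of 10,000 records each (when the total is not an exact multiple, the last file holds fewer than 10,000 records).

I am successfully reading and partitioning the data. I am successfully creating the first file, and it is named correctly.

The problem is that only 1 file is created and it is empty. The code compiles and runs as is, with no errors or exceptions.

My code is below: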

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.StringWriter;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.stream.Collectors;

    public class ChunkTextFile {

    private static final String inputFilename = "inputFile.txt";

    public static void main(String[] args) {

        BufferedReader reader = null;

        BufferedWriter fileWriter = null;

        BufferedWriter lineWriter = null;

        StringWriter stringWriter = null;

        // Create an ArrayList object to hold the lines of input file

        List<String> lines = new ArrayList<String>();

        try {
            // Creating BufferedReader object to read the input file

            reader = new BufferedReader(new FileReader("src" + "//" + inputFilename));

            // Reading all the lines of input file one by one and adding them into ArrayList
            String currentLine = reader.readLine();

            while (currentLine != null) {
                lines.add(currentLine);

                currentLine = reader.readLine();

            }
            // End of file read.

           //Partition ArrayList into a collection of smaller Lists<String>
            final AtomicInteger counter = new AtomicInteger(0);
            final int size = 10000;

            Collection<List<String>> partitioned = lines.stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size)).values();

            //Printing partitions. Each partition will be written to a file.
            //Testing confirms the partitioning works correctly.
            partitioned.forEach(System.out::println);

            //Iterate through the Collections and create a file for List<String> object.
            //Testing confirms that multiple files are created and properly named.
            Integer count = 0;
            for (List<String> chunks : partitioned) {
                // Prepare new incremented file name.
                String outputFile = "batched_items_file_";
                String txt = ".txt";
                count++;


                String filename = outputFile + count + txt;

                // Write file to directory.
                fileWriter = new BufferedWriter(new FileWriter("src" + "//" + outputFile));
                fileWriter = new BufferedWriter(new FileWriter(filename));

                //Iterate through the List of Strings and write each String to the file.
                //Writing is not successful. Only 1 file is created and it is empty.
                for (String chunk : chunks) {
                    stringWriter = new StringWriter();
                    lineWriter = new BufferedWriter(stringWriter);
                    // Prepare list of strings to be written to new file.
                    // Write each item number to file.
                    lineWriter.write(chunk);
                    lineWriter.flush();
                }
                lineWriter.close(); // <- flush the BufferedWriter

                fileWriter.close();
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Closing the resources
            System.out.println("Finished");

            try {
                if (reader != null) {
                    reader.close();
                }

                if (fileWriter != null) {
                    fileWriter.close();
                }

                if (stringWriter != null) {
                    stringWriter.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    }

Sample input file:

230449
235659
295377
329921
348526
359836
361447
384723
396202
571490

Thanks in advance.


5 Answers

6

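You don't need all those extra writers inside the for loop, and the writer that is supposed to write to the file (fileWriter) is never called. Replace your loop with this: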

for (String chunk : chunks) {
    fileWriter.write(chunk);
}

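Tip: close() flushes the writer for you automatically, so there is no need to call fileWriter.flush(). Just make sure each fileWriter is closed exactly once. A single fileWriter.close() in the finally block only closes the last writer that was opened, so when several files are written, each file's writer must be closed before the next one is created.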
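
As a minimal sketch of the advice above (not the asker's final code): the helper below writes one chunk to one file and lets try-with-resources close the writer, so nothing has to be flushed by hand. The class name ChunkWriterSketch, the method writeChunk, and the temp-file name are made up for illustration, and the newLine() call is an addition so that each record lands on its own line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ChunkWriterSketch {

    // Writes every line of one chunk to the given file, one record per line.
    // try-with-resources closes (and therefore flushes) the writer, even if write() throws.
    static void writeChunk(Path target, List<String> chunk) throws IOException {
        try (BufferedWriter fileWriter = Files.newBufferedWriter(target)) {
            for (String line : chunk) {
                fileWriter.write(line);
                fileWriter.newLine(); // platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data and file name, for illustration only.
        Path target = Files.createTempFile("batched_items_file_", ".txt");
        writeChunk(target, Arrays.asList("230449", "235659", "295377"));
        System.out.println(Files.readAllLines(target));
    }
}
```

Calling writeChunk once per partition inside the outer loop gives every output file its own writer, which is closed before the next file is opened.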

answered 2019-02-22T16:18:38.153
1

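There are several problems with your code. The files are empty because you are not closing the writers. You are even creating redundant writers, as in this sequence: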

fileWriter = new BufferedWriter(new FileWriter("src" + "//" + outputFile));
fileWriter = new BufferedWriter(new FileWriter(filename));

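To handle resources such as readers and writers in the best way, use the try-with-resources statement.

The missing newlines are only a minor issue.

Further, you are unnecessarily reading the entire input file into heap memory, just to be able to perform a questionable Stream operation on it. While it is possible to stream over a file directly, e.g. with Files.lines, grouping with an AtomicInteger is not the intended way of using a Stream anyway. The end result would still hold all input lines in memory, whereas writing the lines to the target file immediately is straightforward.

A simple and efficient solution would be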

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChunkTextFile {

    private static final String inputFilename = "inputFile.txt";

    public static void main(String[] args) {
        final int size = 10000;
        try(BufferedReader reader=Files.newBufferedReader(Paths.get("src", inputFilename))) {
            String line = reader.readLine();
            for(int count = 0; line != null; count++) {
                try(BufferedWriter writer = Files.newBufferedWriter(
                        Paths.get("batched_items_file_" + count + ".txt"))) {
                    for(int i = 0; i < size && line != null; i++) {
                        writer.write(line);
                        writer.newLine();
                        line = reader.readLine();
                    }
                }
            }
        }
        catch(IOException ex) {
            ex.printStackTrace();
        }
    }
}
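
If the lines do have to be held in memory for some other reason, the partitioning can be done by index instead of with a stateful AtomicInteger inside the stream. The following is only a hedged sketch, assuming a random-access list such as ArrayList; PartitionSketch and partition are hypothetical names, and the chunks are subList views of the original list, not copies.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PartitionSketch {

    // Splits the list into consecutive views of at most `size` elements each.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int chunkCount = (items.size() + size - 1) / size; // ceiling division
        return IntStream.range(0, chunkCount)
                .mapToObj(i -> items.subList(i * size, Math.min(items.size(), (i + 1) * size)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 25 items in chunks of 10 give chunk sizes 10, 10 and 5.
        List<Integer> items = IntStream.rangeClosed(1, 25).boxed().collect(Collectors.toList());
        for (List<Integer> chunk : partition(items, 10)) {
            System.out.println(chunk.size());
        }
    }
}
```

The stream here only iterates over chunk indices, so the result does not depend on a side effect in the lambda or on the encounter order of the elements.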
answered 2019-02-22T17:49:06.983
0

A StringWriter is not for writing strings to a file; it is for writing into a string.
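
To illustrate, here is a small sketch of what the lineWriter in the question actually does: everything written through it ends up in the StringWriter's in-memory buffer and never touches a file. The names StringWriterSketch and writeToMemory are invented for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class StringWriterSketch {

    // Writes text through a BufferedWriter that wraps a StringWriter and
    // returns whatever accumulated in the StringWriter's in-memory buffer.
    static String writeToMemory(String text) throws IOException {
        StringWriter stringWriter = new StringWriter();
        try (BufferedWriter lineWriter = new BufferedWriter(stringWriter)) {
            lineWriter.write(text);
        }
        // The text is only retrievable from memory; no file was involved.
        return stringWriter.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeToMemory("230449"));
    }
}
```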

answered 2019-02-22T16:18:23.930
0

You can just use

Path file = Paths.get(filename);
Files.write(file, chunks, Charset.forName("UTF-8"));

Also, you should put count = 0 before the loop; otherwise it will always be 0.

Overall it would look like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class ChunkTextFile {

private static final String inputFilename = "inputFile.txt";

public static void main(String[] args) {

    BufferedReader reader = null;


    // Create an ArrayList object to hold the lines of input file

    List<String> lines = new ArrayList<String>();

    try {
        // Creating BufferedReader object to read the input file

        reader = new BufferedReader(new FileReader(inputFilename));

        // Reading all the lines of input file one by one and adding them into ArrayList
        String currentLine = reader.readLine();

        while (currentLine != null) {
            lines.add(currentLine);

            currentLine = reader.readLine();

        }
        // End of file read.

        //Partition ArrayList into a collection of smaller Lists<String>
        final AtomicInteger counter = new AtomicInteger(0);
        final int size = 10;

        Collection<List<String>> partitioned = lines.stream()
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size)).values();

        //Printing partitions. Each partition will be written to a file.
        //Testing confirms the partitioning works correctly.
        partitioned.forEach(System.out::println);

        //Iterate through the Collections and create a file for List<String> object.
        //Testing confirms the file is created and properly named.
        Integer count = 0;
        for (List<String> chunks : partitioned) {
            // Prepare new incremented file name.
            String outputFile = "batched_items_file_";
            String txt = ".txt";

            count++;

            String filename = outputFile + count + txt;

            Path file = Paths.get(filename);
            Files.write(file, chunks, Charset.forName("UTF-8"));
        }

    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Closing the resources
        System.out.println("Finished");

        try {
            if (reader != null) {
                reader.close();
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
 }
 }
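
One detail worth knowing about this approach: Files.write with a list of lines terminates every line with the platform line separator, so the output keeps the one-record-per-line format of the input without extra code. Below is a small sketch to check that; FilesWriteSketch and writeAndReadBack are illustrative names, and the temp file merely stands in for a real output file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class FilesWriteSketch {

    // Writes the chunk with Files.write and returns the raw file content,
    // so the line terminators added by Files.write stay visible.
    static String writeAndReadBack(List<String> chunk) throws IOException {
        Path file = Files.createTempFile("batched_items_file_", ".txt");
        try {
            Files.write(file, chunk, StandardCharsets.UTF_8);
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        String sep = System.lineSeparator();
        String content = writeAndReadBack(Arrays.asList("230449", "235659"));
        // Each record, including the last one, is followed by a line separator.
        System.out.println(content.equals("230449" + sep + "235659" + sep));
    }
}
```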
answered 2019-02-22T16:37:12.977
0

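I accepted the answer above because it solved my problem, but I want to expand on it for anyone who finds this question and answer. To make the created files match the format of the input file (newline-delimited), I changed my code according to the accepted answer and added System.lineSeparator().

The final solution looks like this.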

fileWriter.write(chunk + System.lineSeparator());

Thanks again for the quick responses.

Here is the working version. I recommend commenting out or removing partitioned.forEach(System.out::println); to improve performance.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class ChunkTextFile {

private static final String inputFilename = "inputFile.txt";

public static void main(String[] args) {

    BufferedReader reader = null;

    BufferedWriter fileWriter = null;


    // Create an ArrayList object to hold the lines of input file

    List<String> lines = new ArrayList<String>();

    try {
        // Creating BufferedReader object to read the input file

        reader = new BufferedReader(new FileReader("src" + "//" + inputFilename));

        // Reading all the lines of input file one by one and adding them into ArrayList
        String currentLine = reader.readLine();

        while (currentLine != null) {
            lines.add(currentLine);

            currentLine = reader.readLine();

        }
        // End of file read.

        final AtomicInteger counter = new AtomicInteger(0);
        final int size = 10000;

        Collection<List<String>> partitioned = lines.stream()
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size)).values();

        //Printing partitions. Each partition will be written to a file.
        //Testing confirms the partitioning works correctly.
        partitioned.forEach(System.out::println);

        //Iterate through the Collections and create a file for List<String> object.
        //Testing confirms the file is created and properly named.
        Integer count = 0;
        for (List<String> chunks : partitioned) {
            // Prepare new incremented file name.
            String outputFile = "batched_items_file_";
            String txt = ".txt";
             count++;

            String filename = outputFile + count + txt;

            // Open the output file for this partition.
            fileWriter = new BufferedWriter(new FileWriter(filename));

            // Iterate through the List of Strings and write each String on its own line.
            for (String chunk : chunks) {
                fileWriter.write(chunk + System.lineSeparator());
            }

            // Close (and thereby flush) this file's writer before the next one is opened;
            // otherwise buffered data for all but the last file may never reach disk.
            fileWriter.close();
        }

    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Closing the resources
        System.out.println("Finished");

        try {
            if (reader != null) {
                reader.close();
            }

            if (fileWriter != null) {
                fileWriter.close();
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
  }
}
answered 2019-02-22T16:37:52.067