
I have a file that I want to read in Java and split into n (user input) output files. This is how I read the file:

int n = 4;
BufferedReader br = new BufferedReader(new FileReader("file.csv"));
try {
    String line = br.readLine();

    while (line != null) {
        line = br.readLine();
    }
} finally {
    br.close();
}

How do I split the file file.csv into n files?

Note: since the file contains about 100k entries, I can't store the file contents in an array and then split it and save it to multiple files.


9 Answers


Since a file can be very large, each split file can be large as well.

Example:

Source file size: 5GB

Number of splits: 5

Destination file size: 1GB each (5 files)

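Even if we had that much memory, there is no way to read such a big split chunk into a single byte-array in one go. Basically, for each split we can read a fixed-size byte-array that we know is feasible in terms of both performance and memory.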
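For illustration, the fixed-size read idea can be isolated in a small helper that copies `numBytes` from the current file position through one reusable buffer, so memory use stays at the buffer size however large a split is. The names `ChunkCopy` and `copyBytes` are hypothetical; the caller picks the buffer size (an 8 KB buffer would match MaxReadBytes).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

public class ChunkCopy {
    // Copies exactly numBytes from raf's current position to out, holding at most
    // buf.length bytes in memory. The caller allocates buf once and reuses it.
    public static void copyBytes(RandomAccessFile raf, OutputStream out, long numBytes, byte[] buf)
            throws IOException {
        long remaining = numBytes;
        while (remaining > 0) {
            int toRead = (int) Math.min(buf.length, remaining);
            int n = raf.read(buf, 0, toRead); // may return fewer bytes than requested
            if (n == -1) {
                throw new EOFException("Source ended " + remaining + " bytes early");
            }
            out.write(buf, 0, n); // write only the bytes actually read
            remaining -= n;
        }
    }
}
```

Reusing one buffer avoids allocating a new array for every read, which a per-call `new byte[...]` would do.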

NumSplits:10 MaxReadBytes:8KB

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileSplitter {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("test.csv", "r");
        long numSplits = 10; // from user input, extract it from args
        long sourceSize = raf.length();
        long bytesPerSplit = sourceSize / numSplits;
        long remainingBytes = sourceSize % numSplits;

        int maxReadBufferSize = 8 * 1024; // 8KB
        for (int destIx = 1; destIx <= numSplits; destIx++) {
            BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + destIx));
            if (bytesPerSplit > maxReadBufferSize) {
                long numReads = bytesPerSplit / maxReadBufferSize;
                long numRemainingRead = bytesPerSplit % maxReadBufferSize;
                for (int i = 0; i < numReads; i++) {
                    readWrite(raf, bw, maxReadBufferSize);
                }
                if (numRemainingRead > 0) {
                    readWrite(raf, bw, numRemainingRead);
                }
            } else {
                readWrite(raf, bw, bytesPerSplit);
            }
            bw.close();
        }
        // The leftover (sourceSize % numSplits) bytes go into one extra file
        if (remainingBytes > 0) {
            BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + (numSplits + 1)));
            readWrite(raf, bw, remainingBytes);
            bw.close();
        }
        raf.close();
    }

    static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
        byte[] buf = new byte[(int) numBytes];
        // readFully keeps reading until buf is full; a plain read() may return fewer bytes,
        // and writing the whole buffer after a short read would copy garbage
        raf.readFully(buf);
        bw.write(buf);
    }
}
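As written, this answer creates an extra file `split.(numSplits+1)` whenever the file size is not divisible by `numSplits`, and it cuts at byte offsets, so a CSV row can straddle two files. If exactly n files are required, one option (a swapped-in technique, not the answer's) is to fold the leftover bytes into the last split and let `FileChannel.transferTo` do the copying. A sketch under those assumptions; `ExactSplit` is a hypothetical name and the `split.N` naming follows the answer:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ExactSplit {
    // Splits source into exactly numSplits files split.1 .. split.N inside outDir.
    // The last file also receives the sourceSize % numSplits leftover bytes.
    public static void split(Path source, Path outDir, int numSplits) throws IOException {
        if (numSplits <= 0) throw new IllegalArgumentException("numSplits must be > 0");
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long sourceSize = in.size();
            long bytesPerSplit = sourceSize / numSplits;
            long position = 0;
            for (int destIx = 1; destIx <= numSplits; destIx++) {
                // The last split runs to the end of the file
                long count = (destIx == numSplits) ? sourceSize - position : bytesPerSplit;
                Path dest = outDir.resolve("split." + destIx);
                try (FileChannel out = FileChannel.open(dest, StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
                    long done = 0;
                    // transferTo may move fewer bytes than requested, so loop until done
                    while (done < count) {
                        long n = in.transferTo(position + done, count - done, out);
                        if (n <= 0) throw new IOException("Unexpected end of source");
                        done += n;
                    }
                }
                position += count;
            }
        }
    }
}
```

Rows can still be cut mid-line with this approach; aligning each boundary to the next newline would need an extra scan of the bytes around each split point.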
answered 2013-10-04T10:37:55.190
import java.io.*;
import java.util.Scanner;

public class Split {
    public static void main(String[] args) {
        String inputfile = "C:/test.txt"; // Source file name
        int nol = 2000;                   // No. of lines to save in each output file
        try {
            // First pass: count the lines in the input file
            int count = 0;
            try (Scanner scanner = new Scanner(new File(inputfile))) {
                while (scanner.hasNextLine()) {
                    scanner.nextLine();
                    count++;
                }
            }
            System.out.println("Lines in the file: " + count);

            // No. of output files = ceil(count / nol)
            int nof = (count + nol - 1) / nol;
            System.out.println("No. of files to be generated: " + nof);

            // Second pass: actual splitting of the file into smaller files
            try (BufferedReader br = new BufferedReader(new FileReader(inputfile))) {
                for (int j = 1; j <= nof; j++) {
                    // Destination file location
                    try (BufferedWriter out = new BufferedWriter(new FileWriter("C:/New Folder/File" + j + ".txt"))) {
                        for (int i = 1; i <= nol; i++) {
                            String strLine = br.readLine();
                            if (strLine == null) {
                                break; // end of input
                            }
                            out.write(strLine);
                            if (i != nol) {
                                out.newLine();
                            }
                        }
                    }
                }
            }
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
answered 2014-07-23T06:47:56.350

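Although this is an old question, for reference I'm listing the code I used to split a large file into pieces of any size. It works with any Java version from 1.4 onward.

The split and join methods look like this: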

import java.io.*;

// Joins FilePath1.sp, FilePath2.sp, ... back into FilePath
public void join(String FilePath) {
    int count = 1;
    try {
        OutputStream outfile = new BufferedOutputStream(new FileOutputStream(new File(FilePath)));
        while (true) {
            File part = new File(FilePath + count + ".sp");
            if (!part.exists()) {
                break; // no more parts
            }
            InputStream infile = new BufferedInputStream(new FileInputStream(part));
            int data = infile.read();
            while (data != -1) {
                outfile.write(data);
                data = infile.read();
            }
            infile.close();
            count++;
        }
        outfile.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

// Splits FilePath into parts of at most splitlen bytes: FilePath1.sp, FilePath2.sp, ...
public void split(String FilePath, long splitlen) {
    int count = 1;
    try {
        InputStream infile = new BufferedInputStream(new FileInputStream(new File(FilePath)));
        int data = infile.read();
        while (data != -1) {
            File part = new File(FilePath + count + ".sp");
            OutputStream outfile = new BufferedOutputStream(new FileOutputStream(part));
            long leng = 0;
            // Copy bytes until this part is full or the input is exhausted
            while (data != -1 && leng < splitlen) {
                outfile.write(data);
                leng++;
                data = infile.read();
            }
            outfile.close();
            count++;
        }
        infile.close(); // release the input file handle
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The complete Java code is available at the link "File Split in Java Program".

answered 2016-05-20T18:54:11.817

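A clean solution that is easy to edit.

This solution involves loading the entire file into memory.

All lines of the file are stored in List<String> rowsOfFile;

Edit maxSizeFile to choose the maximum size of each split file.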

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public void splitFile(File fileToSplit) throws IOException {
  long maxSizeFile = 10000000; // 10 MB
  StringBuilder buffer = new StringBuilder();
  long sizeOfRows = 0;
  int recurrence = 0;
  String fileName;
  List<String> rowsOfFile;

  // Loads every line of the file into memory at once
  rowsOfFile = Files.readAllLines(fileToSplit.toPath(), Charset.defaultCharset());

  for (String row : rowsOfFile) {
      // readAllLines strips line terminators, so add one back
      buffer.append(row).append(System.lineSeparator());
      // Approximate size of the current chunk in UTF-8 bytes
      sizeOfRows += row.getBytes(StandardCharsets.UTF_8).length;
      if (sizeOfRows >= maxSizeFile) {
          fileName = generateFileName(recurrence);
          try (PrintWriter writer = new PrintWriter(new File(fileName))) {
              writer.print(buffer);
          }
          recurrence++;
          sizeOfRows = 0;
          buffer = new StringBuilder();
      }
  }
  // last rows
  if (buffer.length() > 0) {
      fileName = generateFileName(recurrence);
      try (PrintWriter writer = new PrintWriter(new File(fileName))) {
          writer.print(buffer);
      }
  }
  // Deletes the source file once it has been split
  Files.delete(fileToSplit.toPath());
}

Method that generates the file name:

    public String generateFileName(int numFile) {
      String extension = ".txt";
      return "myFile" + numFile + extension;
    }
answered 2019-11-11T16:47:58.093

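There is no need to loop through the file twice. You can estimate the size of each chunk as the source file size divided by the number of chunks required, then stop filling each chunk with data as soon as its size exceeds the estimate.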
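A minimal single-pass sketch of this idea, assuming a UTF-8 text file with one record per line: the target chunk size is `ceil(fileSize / n)`, bytes are counted per line (plus one byte for the terminator, which is only an estimate), and a chunk is closed as soon as it passes the target. The n-th chunk is never closed early, so at most n files are produced; fewer are possible when lines are long. The `EstimatedSplit` class and the `partN.csv` file names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EstimatedSplit {
    // Splits a text file in one pass into at most numSplits files without breaking lines.
    // Returns the number of files written.
    public static int split(Path source, Path outDir, int numSplits) throws IOException {
        if (numSplits <= 0) throw new IllegalArgumentException("numSplits must be > 0");
        Charset cs = StandardCharsets.UTF_8; // assumption: the input is UTF-8
        long sourceSize = Files.size(source);
        long target = (sourceSize + numSplits - 1) / numSplits; // estimated bytes per chunk, rounded up

        int fileIndex = 0;
        long written = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(source, cs)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (out == null) { // open the next chunk only when a line needs it
                    fileIndex++;
                    out = Files.newBufferedWriter(outDir.resolve("part" + fileIndex + ".csv"), cs);
                    written = 0;
                }
                out.write(line);
                out.newLine();
                written += line.getBytes(cs).length + 1; // line bytes plus an estimated 1-byte terminator
                // Close the chunk once it passes the estimate; the last chunk takes the rest
                if (written >= target && fileIndex < numSplits) {
                    out.close();
                    out = null;
                }
            }
        } finally {
            if (out != null) out.close();
        }
        return fileIndex;
    }
}
```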

answered 2013-10-04T09:47:06.103

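Keep a counter of the number of entries. Assume one entry per line.

step1: Initially create a new sub-file and set counter=0;

step2: Increment the counter as you read each entry from the source file into the buffer.

step3: When the counter reaches the limit of entries to be written to each sub-file, flush the buffer contents to the sub-file and close the sub-file.

step4: Go back to step1 as long as there is data left to read in the source file.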
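The steps above map onto a single loop. In this sketch (the `CounterSplit` class and the `subN.csv` file names are hypothetical), a sub-file is opened lazily when an entry arrives, so no empty file is left at the end:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CounterSplit {
    // Writes at most entriesPerFile lines into each sub-file; returns the number of sub-files.
    public static int split(Path source, Path outDir, int entriesPerFile) throws IOException {
        if (entriesPerFile <= 0) throw new IllegalArgumentException("entriesPerFile must be > 0");
        int fileCount = 0;
        int counter = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) { // step4: repeat while data remains
                if (out == null) {                     // step1: new sub-file, counter = 0
                    fileCount++;
                    out = Files.newBufferedWriter(outDir.resolve("sub" + fileCount + ".csv"));
                    counter = 0;
                }
                out.write(line);                       // step2: copy the entry, bump the counter
                out.newLine();
                counter++;
                if (counter == entriesPerFile) {       // step3: limit reached, flush and close
                    out.close();
                    out = null;
                }
            }
        } finally {
            if (out != null) out.close();
        }
        return fileCount;
    }
}
```

To get n files from a user-supplied n, the limit can be computed as `(totalLines + n - 1) / n` after a line-counting pass.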

answered 2013-10-04T09:44:29.313

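Here is one that worked for me; I used it to split a 10 GB file. It also lets you add a header and a footer, which is very useful when splitting document-based formats such as XML and JSON, because each new split file needs its own document wrapper.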

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileSpliter
{
    public static void main(String[] args) throws IOException
    {
        splitTextFiles("D:\\xref.csx", 750000, "", "", null);
    }

    public static void splitTextFiles(String fileName, int maxRows, String header, String footer, String targetDir) throws IOException
    {
        Path bigFile = Paths.get(fileName).toAbsolutePath();
        String name = bigFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ext = dot >= 0 ? name.substring(dot) : "";           // e.g. ".csx"
        String fileNoExt = dot >= 0 ? name.substring(0, dot) : name;

        // Output directory: targetDir if given, otherwise <name>_split next to the source file
        Path newDir = targetDir != null
                ? Paths.get(targetDir)
                : bigFile.getParent().resolve(fileNoExt + "_split");
        Files.createDirectories(newDir);

        int i = 0;        // number of split files created so far
        int lineNum = 0;  // lines written to the current split file
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(bigFile))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                if (writer == null)
                {
                    // Open the next split file only when there is a line to write,
                    // so no empty trailing file is created
                    i++;
                    Path splitFile = newDir.resolve(fileNoExt + "_" + String.format("%02d", i) + ext);
                    System.out.print("new file created '" + splitFile + "'");
                    // Default options (CREATE, TRUNCATE_EXISTING, WRITE) overwrite stale files
                    writer = Files.newBufferedWriter(splitFile);
                    if (header != null && header.length() > 0)
                    {
                        writer.write(header);
                        writer.newLine();
                    }
                }
                writer.write(line);
                writer.newLine();
                lineNum++;

                if (lineNum >= maxRows)
                {
                    closeSplit(writer, footer, lineNum);
                    writer = null;
                    lineNum = 0;
                }
            }
            if (writer != null) // last, partially filled file
            {
                closeSplit(writer, footer, lineNum);
            }
        }

        System.out.println("file '" + name + "' split into " + i + " files");
    }

    // Appends the optional footer, closes the split file and reports its line count
    private static void closeSplit(BufferedWriter writer, String footer, int lineNum) throws IOException
    {
        if (footer != null && footer.length() > 0)
        {
            writer.write(footer);
            writer.newLine();
        }
        writer.close();
        System.out.println(", " + lineNum + " lines written to file");
    }
}
answered 2017-11-10T06:59:16.650

The code below splits a large file into smaller files, each with a fixed number of lines.

    import java.io.*;

    public static void splitByLines(String inputFilePath, String outputFolderPath, long linesPerSplit) {
        long linesWritten = 0;
        int count = 1;

        try {
            File inputFile = new File(inputFilePath);
            InputStream inputFileStream = new BufferedInputStream(new FileInputStream(inputFile));
            BufferedReader reader = new BufferedReader(new InputStreamReader(inputFileStream));

            String line = reader.readLine();

            String fileName = inputFile.getName();
            String outfileName = outputFolderPath + File.separator + fileName;

            while (line != null) {
                File outFile = new File(outfileName + "_" + count + ".split");
                Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outFile)));

                while (line != null && linesWritten < linesPerSplit) {
                    writer.write(line);
                    writer.write(System.lineSeparator()); // readLine() strips the line terminator
                    line = reader.readLine();
                    linesWritten++;
                }

                writer.close();
                linesWritten = 0; // next file
                count++;          // next file number
            }

            reader.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
answered 2017-11-21T06:18:05.393

Split a file into multiple chunks (an in-memory operation); here I split any file into chunks of 500 KB (500000 bytes):

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public static List<ByteArrayOutputStream> splitFile(File f) {
    List<ByteArrayOutputStream> datalist = new ArrayList<>();
    int sizeOfFiles = 500000; // chunk size in bytes
    byte[] buffer = new byte[sizeOfFiles];

    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(f))) {
        int bytesAmount;
        // readNBytes (Java 9+) keeps reading until the buffer is full or EOF,
        // so every chunk except the last is exactly sizeOfFiles bytes
        while ((bytesAmount = bis.readNBytes(buffer, 0, sizeOfFiles)) > 0) {
            ByteArrayOutputStream out = new ByteArrayOutputStream(bytesAmount);
            out.write(buffer, 0, bytesAmount);
            datalist.add(out);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

    return datalist;
}
answered 2020-10-01T14:02:37.353