
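I am trying to read in a technical paper, separate all of its sentences, use a filter to find key terms and phrases in those sentences, and then build my own summary from them.

So far I have two BufferedReaders: one reads a text file containing the paragraphs, the other reads my filter file. Each line is stored in an ArrayList and printed to the console to check that the files are being read correctly.

I would like to know whether I am going about this the right way by using a BufferedReader rather than a Scanner. For now I just want to be able to print out each sentence that ends with a '.' (full stop), '!' (exclamation mark), or '?' (question mark), so that I know the file is being read correctly.

This is my code so far: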

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.*;
import java.io.*;
import java.util.Scanner;


public class Filtering {

    public static void main(String[] args) throws IOException {
        ArrayList<String> lines1 = new ArrayList<String>();
        ArrayList<String> lines2 = new ArrayList<String>();

        try {
            FileInputStream fstream1 = new FileInputStream("paper.txt");
            FileInputStream fstream2 = new FileInputStream("filter2.txt");  
            DataInputStream inStream1 = new  DataInputStream (fstream1);
            DataInputStream inStream2 = new DataInputStream (fstream2);

            BufferedReader br1 = new BufferedReader(
                new InputStreamReader(inStream1));
            BufferedReader br2 = new BufferedReader(
                new InputStreamReader(inStream2));

            String strLine1;
            String strLine2;

            while ((strLine1 = br1.readLine()) != null) {
                lines1.add(strLine1);
            }

            while ((strLine2 = br2.readLine()) != null) {
                lines2.add(strLine2);
            }

            inStream1.close();
            inStream2.close();
        }   
        catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }

        System.out.println(lines1);
        System.out.println(lines2);
    }
}

1 Answer

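  • Reading a file through a BufferedReader is good practice, because it buffers the input instead of accessing the file one byte at a time
  • The DataInputStream is not needed
  • You should specify the character encoding in the InputStreamReader
  • You can accumulate all the lines in a StringBuilder so that the whole text is available through a single reference
  • You may want to look at BreakIterator to split your text into sentences. Have a look at getSentenceInstance().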

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.BreakIterator;

public class Filtering {

    public static void main(String[] args) throws IOException {
        File paperFile = new File("paper.txt");
        File filterFile = new File("filter2.txt");
        // Optionally, pre-size the StringBuilders to roughly the expected
        // length of each file
        StringBuilder paper = new StringBuilder();
        StringBuilder filter2 = new StringBuilder();

        FileInputStream fstream1 = null;
        FileInputStream fstream2 = null;
        try {
            fstream1 = new FileInputStream(paperFile);
            fstream2 = new FileInputStream(filterFile);

            BufferedReader br1 = new BufferedReader(new InputStreamReader(fstream1, "UTF-8"));
            BufferedReader br2 = new BufferedReader(new InputStreamReader(fstream2, "UTF-8"));

            String strLine1;
            String strLine2;

            while ((strLine1 = br1.readLine()) != null) {
                paper.append(strLine1).append('\n');
            }
            while ((strLine2 = br2.readLine()) != null) {
                filter2.append(strLine2).append('\n');
            }

        }

        catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        } finally {
            if (fstream1 != null) {
                fstream1.close();
            }
            if (fstream2 != null) {
                fstream2.close();
            }
        }
        String paperString = paper.toString();
        String filterString = filter2.toString();
        System.out.println(paperString);
        System.out.println(filterString);

        // To break it into sentences
        BreakIterator boundary = BreakIterator.getSentenceInstance();
        boundary.setText(paperString);
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
            System.out.println(paper.substring(start, end));
        }
    }

}
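
To go one step further toward the filtering described in the question, here is a minimal sketch that splits a text into sentences and keeps only those containing at least one filter term. The class name `SentenceFilterSketch`, the helper names `readAll`, `splitSentences` and `keepMatching`, the one-term-per-line filter format, and the case-insensitive substring match are all assumptions made for illustration; they are not from the question. The demo reads from in-memory strings so that it runs without any files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SentenceFilterSketch {

    // Reads everything from the reader into one string, ending each line with '\n'.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            text.append(line).append('\n');
        }
        return text.toString();
    }

    // Splits the text at sentence boundaries; whitespace-only pieces are dropped.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<String>();
        // Assumption: the paper is in English, so an explicit locale is passed.
        BreakIterator boundary = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundary.setText(text);
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE;
                start = end, end = boundary.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Keeps the sentences that contain at least one term, ignoring case.
    static List<String> keepMatching(List<String> sentences, List<String> terms) {
        List<String> kept = new ArrayList<String>();
        for (String sentence : sentences) {
            String lower = sentence.toLowerCase(Locale.ROOT);
            for (String term : terms) {
                String needle = term.trim().toLowerCase(Locale.ROOT);
                if (!needle.isEmpty() && lower.contains(needle)) {
                    kept.add(sentence);
                    break; // one matching term is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for paper.txt and filter2.txt (hypothetical content).
        String paperText = "Buffering reduces the number of reads. Is it faster? It usually is!";
        String filterText = "buffering\nfaster";

        // try-with-resources (Java 7+) closes the readers even if reading fails.
        String paper;
        String filter;
        try (BufferedReader br1 = new BufferedReader(new StringReader(paperText));
             BufferedReader br2 = new BufferedReader(new StringReader(filterText))) {
            paper = readAll(br1);
            filter = readAll(br2);
        }

        List<String> terms = Arrays.asList(filter.split("\n"));
        for (String sentence : keepMatching(splitSentences(paper), terms)) {
            System.out.println(sentence);
        }
    }
}
```

For real files, the two `StringReader`s could be swapped for `new InputStreamReader(new FileInputStream(...), "UTF-8")` as in the code above. A plain substring match is the simplest possible filter; it would also match terms inside longer words, so a word-boundary check may be preferable depending on the filter list.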
answered 2012-05-02T14:53:03.143