java - Java delimiter skips a word

Question

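I am reading a text file and storing the set of unique words from it in an ArrayList (please suggest a better structure if there is one). I am using a Scanner to scan the text file, with the delimiter set to " " (a single space), as shown below: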

    ArrayList <String> allWords = new ArrayList <String> ();
    ArrayList <String> Vocabulary = new ArrayList <String> ();
    int count = 0;

    Scanner fileScanner = null;
    try {
        fileScanner = new Scanner (new File (textFile));

    } catch (FileNotFoundException e) {
        System.out.println (e.getMessage());
        System.exit(1);
    }

    fileScanner.useDelimiter(" ");

    while (fileScanner.hasNext()) {

        allWords.add(fileScanner.next().toLowerCase());

        count++;

        String distinctWord = (fileScanner.next().toLowerCase());
        System.out.println (distinctWord.toString());

        if (!allWords.contains(distinctWord)) {

            Vocabulary.add(distinctWord);

        }
    }

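When I then print the contents of Vocabulary, every other word is skipped. For example, if I have the following text file:

"The quick brown fox jumps over the lazy dog"

what gets printed is "quick fox over lazy", and then I get this error: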

Exception in thread "main" java.util.NoSuchElementException
    at java.util.Scanner.throwFor(Unknown Source)
    at java.util.Scanner.next(Unknown Source)
    at *java filename*.getWords(NaiveBayesTxtClass.java:82)
    at *java filename*.main(NaiveBayesTxtClass.java:22)

Can anyone suggest how to fix this? I have a feeling it has something to do with the fileScanner.useDelimiter and fileScanner.hasNext() statements.

score 5 · Accepted Answer

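You call Scanner#next() twice after checking hasNext() only once, so each pass through the loop consumes two words, and only the second one is printed and considered for Vocabulary.

You call it at (1) and add the result to allWords,
and you call it again at (2) and print that result.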

while (fileScanner.hasNext()) {

    allWords.add(fileScanner.next().toLowerCase()); // **** (1)

    count++;

    String distinctWord = (fileScanner.next().toLowerCase());  // **** (2)
    System.out.println (distinctWord.toString());

    if (!allWords.contains(distinctWord)) {

        Vocabulary.add(distinctWord);

    }
}
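
To make the effect visible, here is a minimal sketch that mimics the loop above on an in-memory string instead of a file. The class name DoubleNextDemo, the method printedWords, and the "<NoSuchElementException>" marker are made up for illustration; the input is the sample sentence from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

class DoubleNextDemo {

    /** Mimics the question's loop: one hasNext() check, two next() calls. */
    static List<String> printedWords(String text) {
        List<String> printed = new ArrayList<>();
        Scanner scanner = new Scanner(text);
        scanner.useDelimiter(" ");
        try {
            while (scanner.hasNext()) {
                scanner.next();                             // (1) consumed, never printed
                printed.add(scanner.next().toLowerCase());  // (2) not guarded by hasNext()
            }
        } catch (NoSuchElementException e) {
            // With an odd number of words, call (2) runs when nothing is left.
            printed.add("<NoSuchElementException>");
        }
        return printed;
    }

    public static void main(String[] args) {
        // prints [quick, fox, over, lazy, <NoSuchElementException>]
        System.out.println(printedWords("The quick brown fox jumps over the lazy dog"));
    }
}
```

The sentence has nine words, so the fifth pass reads "dog" at (1) and then fails at (2), which matches the stack trace in the question.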

Solution: call Scanner#next() once, save the returned String in a variable, then add that variable to a HashSet and print it. For example,

while (fileScanner.hasNext()) {
    String word = fileScanner.next().toLowerCase();
    allWords.add(word); // **** (1)
    count++;
    // String distinctWord = (fileScanner.next().toLowerCase());  // **** (2)
    System.out.println (word);
    vocabularySet.add(word); // a HashSet
}

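As a safe general rule, keep a one-to-one correspondence between calls to Scanner#hasNextXXX() and Scanner#nextXXX(): every nextXXX() call should be guarded by its own hasNextXXX() check.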
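
A self-contained sketch of that rule, again reading from a string rather than a file. The names VocabularyDemo and vocabulary are hypothetical, and two things differ from the original code by assumption: it uses a LinkedHashSet so the words keep their first-seen order, and it keeps Scanner's default whitespace delimiter, which also splits on line breaks (a delimiter of " " would leave newlines inside tokens).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

class VocabularyDemo {

    /** Adds every word to allWords and returns the distinct words in first-seen order. */
    static Set<String> vocabulary(String text, List<String> allWords) {
        Set<String> vocab = new LinkedHashSet<>();
        try (Scanner scanner = new Scanner(text)) {   // default delimiter: any whitespace
            while (scanner.hasNext()) {               // one check ...
                String word = scanner.next().toLowerCase();  // ... one read
                allWords.add(word);
                vocab.add(word);                      // a Set silently ignores duplicates
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<String> allWords = new ArrayList<>();
        Set<String> vocab = vocabulary("The quick brown fox jumps over the lazy dog", allWords);
        System.out.println(allWords.size() + " words, " + vocab.size() + " distinct");
        System.out.println(vocab);
    }
}
```

Because the Set rejects duplicates on its own, the explicit contains() check from the question is no longer needed.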

score 2 · Accepted Answer

Since you also asked about a data structure, you could do the following:

    List<String> allWords = new ArrayList<String>();
    SortedSet<String> Vocabulary = new TreeSet<String>();
    int count = 0;

    Scanner fileScanner = null;
    try {
        fileScanner = new Scanner(new File(textFile));

    } catch (FileNotFoundException e) {
        System.out.println(e.getMessage());
        System.exit(1);
    }

    fileScanner.useDelimiter(" ");

    while (fileScanner.hasNext()) {
        String word = fileScanner.next().toLowerCase();
        allWords.add(word);
        if (Vocabulary.add(word)) {
            System.out.print("+ ");
        }
        System.out.println(word);
    }

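As you can see, the variables are declared by interface (List, SortedSet) and instantiated with concrete classes. This not only lets you swap in a different implementation later, it is especially useful for function parameters.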
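
To illustrate the point about function parameters, here is a small sketch. InterfaceParamsDemo and distinctSorted are hypothetical names; the idea is that a method declared against an interface accepts whichever concrete collection the caller happens to use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class InterfaceParamsDemo {

    /** Accepts any Collection of words and returns the distinct words in sorted order. */
    static SortedSet<String> distinctSorted(Collection<String> words) {
        return new TreeSet<>(words);
    }

    public static void main(String[] args) {
        List<String> arrayBacked = new ArrayList<>(Arrays.asList("the", "quick", "the", "dog"));
        List<String> linked = new LinkedList<>(arrayBacked);

        // The same method works for both implementations.
        System.out.println(distinctSorted(arrayBacked)); // prints [dog, quick, the]
        System.out.println(distinctSorted(linked));      // prints [dog, quick, the]
    }
}
```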
