java - How do I account for the first word of a new paragraph?

Question

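I'm trying to build a program that takes a file and outputs the number of words in it. It works perfectly when everything is in a single paragraph. However, when there are multiple paragraphs, it doesn't count the first word of each new paragraph. For example, if a file reads "My name is John", the program outputs "4 words". But if the file reads "My name is John" with each word in its own paragraph, the program outputs "1 word". I know it must have something to do with my if statement, but I assumed there would be whitespace before a new paragraph that would account for its first word. Here is my code: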

import java.io.*;
public class HelloWorld
{
    public static void main(String[]args)
    {
        try{
            // Open the file that is the first
            // command line parameter
            FileInputStream fstream = new FileInputStream("health.txt");
            // Use DataInputStream to read binary NOT text.
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
            String strLine;

            int word2 =0;
            int word3 =0;
            //Read File Line By Line
            while ((strLine = br.readLine()) != null)   {
                // Print the content on the console
                ;
                int wordLength = strLine.length();
                System.out.println(strLine);
                for(int i = 0 ; i < wordLength -1 ; i++)
                    {
                        Character a = strLine.charAt(i);
                        Character b= strLine.charAt(i + 1);
                        if(a == ' ' && b != '.' &&b != '?' && b != '!' && b != ' ' )
                            {
                                word2++;
                                //doesnt take into account 1st character of new paragraph
                            }
                    }
                word3 = word2 + 1;
            }



            System.out.println("There are " + word3 + " "
                               + "words in your file.");
            //Close the input stream
            br.close();
        }catch (Exception e){//Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }


    }
}

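I've tried adjusting the if statement multiple times, but it doesn't seem to make a difference. Does anyone know where I'm going wrong?

I'm a fairly new user. I asked a similar question a few days ago and was told I was asking too much of other users, so hopefully this narrows my question down. I'm really confused about why it doesn't count the first word of a new paragraph. Let me know if you need more information. Thanks!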

score 1 · Accepted Answer

First, your counting logic is incorrect. Consider:

word3 = word2 + 1;

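Think about what this does. Each time through the loop, when you read a line, you essentially count the words in that line and then reset the total count to word2 + 1. Hint: if you want the total for the whole file, you need to add to word3 on every iteration rather than replace it with the current line's word count.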
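One way to read this hint is to count each line on its own and add the result to a running total. The sketch below assumes the lines have already been read (for example with readLine()). The names RunningTotalSketch, countWordsInLine and countWords are made up for illustration, and the per-line counter is a simple in-word/out-of-word state machine, not the asker's exact condition.

```java
import java.util.Arrays;
import java.util.List;

class RunningTotalSketch {
    // Counts words on one line: a word begins wherever a non-whitespace
    // character appears while we are not already inside a word.
    static int countWordsInLine(String line) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < line.length(); i++) {
            boolean isSpace = Character.isWhitespace(line.charAt(i));
            if (!isSpace && !inWord) {
                count++;
            }
            inWord = !isSpace;
        }
        return count;
    }

    // Adds each line's count to the total instead of overwriting the total.
    static int countWords(List<String> lines) {
        int totalWords = 0;
        for (String line : lines) {
            totalWords += countWordsInLine(line);
        }
        return totalWords;
    }

    public static void main(String[] args) {
        // Hypothetical input: one word per line, as in the question.
        List<String> lines = Arrays.asList("My", "name", "is", "John");
        System.out.println("There are " + countWords(lines) + " words in your file.");
    }
}
```

With one word per line, every line contributes 1 to the total, so the four-line example from the question yields 4 rather than 1.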

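Second, your word-parsing logic is slightly off. Consider the case of an empty line. You will see no words in it, yet you treat the number of words on the line as word2 + 1, which means you incorrectly count a blank line as 1 word. Hint: if the first character on the line is a letter, the line starts with a word.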
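The hint about the first character can be folded into the character loop itself, so that no unconditional + 1 is needed. This is only a sketch under assumptions: WordStartSketch and countWordStarts are hypothetical names, and a "word" is taken to start at any letter or digit that is either the first character of the line or directly follows whitespace.

```java
class WordStartSketch {
    // Counts the positions where a word starts on a single line.
    static int countWordStarts(String line) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            boolean isWordChar = Character.isLetterOrDigit(line.charAt(i));
            boolean atLineStart = (i == 0);
            boolean afterSpace = !atLineStart && Character.isWhitespace(line.charAt(i - 1));
            if (isWordChar && (atLineStart || afterSpace)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWordStarts("My name is John"));
        System.out.println(countWordStarts(""));
    }
}
```

An empty line never enters the loop and therefore contributes 0, and a line that begins with a letter gets its first word counted without any special case afterwards.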

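Your approach is sound, although your implementation is slightly flawed. As an alternative, you may want to consider calling String.split() on each line. The number of elements in the resulting array is the number of words on the line.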
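A minimal sketch of the String.split() alternative, assuming words are separated by runs of whitespace. SplitCountSketch and its method names are invented for this example. The blank-line check matters because splitting an empty string still returns an array with one (empty) element.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SplitCountSketch {
    // Number of whitespace-separated words on one line.
    static int countWordsInLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split("\\s+") would yield one empty element
        }
        return trimmed.split("\\s+").length;
    }

    // Reads the source line by line and sums the per-line counts.
    static int countWords(Reader source) {
        int totalWords = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                totalWords += countWordsInLine(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totalWords;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the file; a FileReader would work the same way.
        System.out.println(countWords(new StringReader("My\nname\n\nis John")));
    }
}
```

Trimming first avoids a leading empty element when a line starts with spaces.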

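By the way, if you use meaningful names for your variables (for example totalWords instead of word3), you improve the readability of your code and make debugging easier.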

score 0 · Accepted Answer

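If your paragraph does not start with whitespace, your if condition will not count its first word. For "My name is John" the program outputs "4 words", but the logic is not correct: you miss the first word and then add one afterwards. Try this: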

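// strLine is the current line returned by br.readLine()
strLine = strLine.trim(); // remove leading and trailing whitespace
// Split on runs of whitespace; a blank line contains no words
String[] words = strLine.isEmpty() ? new String[0] : strLine.split("\\s+");
int numOfWords = words.length;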

score 0 · Accepted Answer

For this kind of thing I personally prefer a plain Scanner with token-based scanning. How about something like this:

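// Needs java.io.File and java.util.Scanner
int words = 0;
Scanner lineScan = new Scanner(new File("fileName.txt"));
while (lineScan.hasNextLine()) {
    // Scan each line separately for whitespace-delimited tokens
    Scanner tokenScan = new Scanner(lineScan.nextLine());
    while (tokenScan.hasNext()) {
        tokenScan.next();
        words++;
    }
}
lineScan.close();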

This iterates over every line in the file. For each line, it iterates over every token (in this case, a word) and increments the word count.
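If the per-line structure is not needed, a single Scanner may be enough, because its default delimiter is whitespace and that includes line breaks. The following is a sketch, not the answerer's code: ScannerCountSketch and countWords are hypothetical names, and a StringReader stands in for the file.

```java
import java.io.StringReader;
import java.util.Scanner;

class ScannerCountSketch {
    // Counts whitespace-delimited tokens across the whole input.
    static int countWords(Readable source) {
        int words = 0;
        try (Scanner scan = new Scanner(source)) {
            while (scan.hasNext()) {
                scan.next(); // consume one token
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Each word on its own line, with a blank line in between.
        System.out.println(countWords(new StringReader("My\nname\n\nis\nJohn\n")));
    }
}
```

Since newlines are treated like any other separator, a word at the start of a new paragraph is counted the same way as a word in the middle of a line.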

score 0 · Accepted Answer

I'm not sure what you mean by "paragraph", but I tried using capital letters as you suggested and it worked fine. I used the Apache Commons IO library:

package Project1;

import java.io.File;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.FileUtils;

public class HelloWorld
{
    public static void main(String[] args)
    {
        try {
            // Read the whole file into a single string
            File f = new File("c:\\TestFile\\test.txt");
            String fileStr = FileUtils.readFileToString(f, StandardCharsets.UTF_8).trim();
            // Split on runs of whitespace so that line breaks also separate words
            String[] tokens = fileStr.isEmpty() ? new String[0] : fileStr.split("\\s+");
            System.out.println("Words in file : " + tokens.length);
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
