java - How to search for a word in a file using Java

Question

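I am writing a Java program to search for a word in a text file that contains a list of dictionary words. As you might expect, the file contains about 300,000 words. I was able to come up with a program that iterates through the words, comparing each one with the input word (the word I am searching for). The problem is that this process takes a long time to find a word, especially if the word starts with one of the last letters of the alphabet, such as x, y, or z. I want something more efficient that can find a word almost instantly. Here is my code: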

import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile
{
public static void main(String[] args) throws IOException
{
    ReadFile rf = new ReadFile();
    rf.searchWord(args[0]);
}

private void searchWord(String token) throws IOException
{
    InputStreamReader reader = new InputStreamReader(
            getClass().getResourceAsStream("sowpods.txt"));
    String line = null;
    // Read a single line from the file. null represents the EOF.
    while((line = readLine(reader)) != null && !line.equals(token))
    {
        System.out.println(line);
    }

    if(line != null && line.equals(token))
    {
        System.out.println(token + " WAS FOUND.");
    }
    else if(line != null && !line.equals(token))
    {
        System.out.println(token + " WAS NOT FOUND.");
    }
    else
    {
        System.out.println(token + " WAS NOT FOUND.");
    }
    reader.close();
}

private String readLine(InputStreamReader reader) throws IOException
{
    // Test whether the end of file has been reached. If so, return null.
    int readChar = reader.read();
    if(readChar == -1)
    {
        return null;
    }
    StringBuffer string = new StringBuffer("");
    // Read until end of file or new line
    while(readChar != -1 && readChar != '\n')
    {
        // Append the read character to the string. Some operating systems
        // such as Microsoft Windows prepend newline character ('\n') with
        // carriage return ('\r'). This is part of the newline character
        // and therefore an exception that should not be appended to the
        // string.
        if(readChar != '\r')
        {
            string.append((char) readChar);
        }
        // Read the next character
        readChar = reader.read();
    }
    return string.toString();
}

}

Also note that I want to use this program in a Java ME environment. Any help would be greatly appreciated. - Jevison7x.

score 1 · Accepted Answer

You could use fgrep (fgrep behavior is enabled by passing -F to grep; see the Linux man page for fgrep):

grep -F -f dictionary.txt inputfile.txt

The dictionary file should contain one word per line.

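I am not sure whether this is still accurate, but the Wikipedia article on grep mentions that fgrep uses the Aho-Corasick algorithm, which builds an automaton from a fixed dictionary for fast string matching.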
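To make the automaton idea concrete, here is a minimal sketch of Aho-Corasick in Java. It is not taken from grep's source; the class name `AhoCorasickSketch` and the method `findAll` are hypothetical, and it assumes Java SE 8+ collections (which are not available on Java ME, where the maps and lists would need to be replaced). Note that this algorithm is aimed at finding many dictionary words inside a text in one pass, which is a somewhat different task from checking whether a single word is in the dictionary.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration class; names are assumptions, not grep's code.
class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                       // longest proper suffix that is also a trie path
        final List<String> outputs = new ArrayList<>();  // dictionary words ending at this state
    }

    private final Node root = new Node();

    AhoCorasickSketch(List<String> words) {
        // Phase 1: insert every dictionary word into a trie.
        for (String w : words) {
            if (w.isEmpty()) continue;
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.outputs.add(w);
        }
        // Phase 2: breadth-first traversal to compute failure links.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node cur = queue.remove();
            for (Map.Entry<Character, Node> e : cur.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = cur.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Inherit matches that end at the failure state.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Returns every dictionary word occurring in 'text', in order of match end position.
    List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Node n = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (n != root && !n.next.containsKey(c)) {
                n = n.fail;
            }
            Node t = n.next.get(c);
            n = (t != null) ? t : root;
            found.addAll(n.outputs);
        }
        return found;
    }

    public static void main(String[] args) {
        AhoCorasickSketch ac = new AhoCorasickSketch(Arrays.asList("he", "she", "his", "hers"));
        System.out.println(ac.findAll("ushers")); // prints [she, he, hers]
    }
}
```

The automaton is built once from the dictionary; after that, each input text is scanned in a single left-to-right pass regardless of how many words the dictionary holds.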

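In any case, you can take a look at Wikipedia's list of string-searching algorithms that work on a finite set of patterns. These are more efficient approaches when searching for words in a dictionary.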
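For the simpler case in the question, where a single word is looked up in the dictionary, a plain binary search may already be enough. It is not one of the multi-pattern algorithms from that list, just a swapped-in alternative. The sketch below assumes the words have been loaded once into a `String[]` that is sorted in `String.compareTo` order (the question suggests the file is alphabetical, but that is an assumption). The class name `SortedWordLookup` and the sample words are made up for illustration. It uses only arrays and `String.compareTo`, so it should be portable to constrained environments.

```java
// Hypothetical helper; class and method names are illustrative only.
class SortedWordLookup {
    // Assumes 'sortedWords' is in ascending String.compareTo order.
    static boolean contains(String[] sortedWords, String token) {
        int lo = 0;
        int hi = sortedWords.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;               // midpoint without int overflow
            int cmp = sortedWords[mid].compareTo(token);
            if (cmp == 0) {
                return true;                         // exact match
            }
            if (cmp < 0) {
                lo = mid + 1;                        // token sorts after the middle word
            } else {
                hi = mid - 1;                        // token sorts before the middle word
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = { "AA", "AAH", "ZYMURGY", "ZZZ" }; // made-up sample data
        System.out.println(contains(words, "ZYMURGY")); // prints true
        System.out.println(contains(words, "JAVA"));    // prints false
    }
}
```

With about 300,000 words, each lookup needs at most around 19 comparisons, so words starting with x, y, or z are found as quickly as any other.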
