
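Today I'm working with a client program that uses Java to build a concordance from a text file. All I need to do is invert the concordance so that the text is recreated from beginning to end. The problem I'm running into is where to start and how to carry out each step. So far, I've tried creating an array of words, iterating over my symbol table, and assigning each key to the array. All I end up with is a list of the words in the concordance. For some reason this problem makes me feel stupid, because it seems like it should have a simple solution. I can't come up with any working idea for how to start rebuilding the story. I've included the source here: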

public class InvertedConcordance {

public static ST<String, SET<Integer>> createConcordance (String[] words) {
    ST<String, SET<Integer>> st = new ST<String, SET<Integer>>();
    for (int i = 0; i < words.length; i++) {
        String s = words[i];
        if (!st.contains(s)) {
            st.put(s, new SET<Integer>());
        }
        SET<Integer> set = st.get(s);
        set.add(i);
    }
    return st;
}
public static String[] invertConcordance (ST<String, SET<Integer>> st) {

 //This is what I have so far
//Here is what I have that doesnt work
for(String key : st.keys())
{
inv[i++] = key;
}
for(int z = 0; z< inv.length; z++)
{
System.out.println(inv[z]);
}


 String[]inv = new String[st.size()];

    return inv;
}
private static void saveWords (String fileName, String[] words) {
    int MAX_LENGTH = 70;
    Out out = new Out (fileName);
    int length = 0;
    for (String word : words) {
        length += word.length ();
        if (length > MAX_LENGTH) {
            out.println ();
            length = word.length ();
        }
        out.print (word);
        out.print (" ");
        length++;
    }
    out.close ();
}
public static void main(String[] args) {
    String fileName = "data/tale.txt";
    In in = new In (fileName);
    String[] words = in.readAll().split("\\s+");

    ST<String, SET<Integer>> st = createConcordance (words);
    StdOut.println("Finished building concordance");

    // write to a file and read back in (to check that serialization works)
    //serialize ("data/concordance-tale.txt", st);
    //st = deserialize ("data/concordance-tale.txt");

    words = invertConcordance (st);
    saveWords ("data/reconstructed-tale.txt", words);
}

}


1 Answer


First of all, why are you using unusual classes like:

  • ST

instead of the built-in Java classes, such as:

  • Map

which would do the job here?
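
As a rough illustration of that suggestion, here is a minimal sketch of the same concordance built with java.util collections only. The class name MapConcordance and the choice of TreeMap and TreeSet are assumptions made for this example; they are not part of the original code, and any Map or Set implementation would work.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class name; a sketch of createConcordance without ST/SET.
class MapConcordance {

    // Maps each word to the sorted set of positions at which it occurs.
    static Map<String, SortedSet<Integer>> createConcordance(String[] words) {
        Map<String, SortedSet<Integer>> concordance = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            // Create the position set the first time a word is seen, then record i.
            concordance.computeIfAbsent(words[i], k -> new TreeSet<>()).add(i);
        }
        return concordance;
    }

    public static void main(String[] args) {
        String[] words = "it was the best of times it was the worst of times".split("\\s+");
        // Prints each word followed by its positions.
        System.out.println(createConcordance(words));
    }
}
```

Here computeIfAbsent replaces the contains/put/get sequence from the question.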

As for your question, your code should not compile at all, because you declare the variable inv after you use it:

public static String[] invertConcordance (ST<String, SET<Integer>> st) {

 //This is what I have so far
//Here is what I have that doesnt work
for(String key : st.keys())
{
inv[i++] = key;
}
for(int z = 0; z< inv.length; z++)
{
System.out.println(inv[z]);
}


 String[]inv = new String[st.size()];

    return inv;
}
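
To show why the attempt yields only a word list even once it compiles, here is a small sketch with the declaration moved above the loop and the missing counter i declared. It uses java.util.Map in place of ST, and the names KeysOnlySketch and listKeys are hypothetical, chosen for this example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class name; illustrates the compile fix, not the inversion.
class KeysOnlySketch {

    // Copies the keys into an array: one slot per distinct word.
    static String[] listKeys(Map<String, Set<Integer>> concordance) {
        String[] inv = new String[concordance.size()]; // declared before first use
        int i = 0;                                     // counter the original never declared
        for (String key : concordance.keySet()) {
            inv[i++] = key;
        }
        return inv;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> concordance = new TreeMap<>();
        concordance.put("b", new TreeSet<>(Arrays.asList(1)));
        concordance.put("a", new TreeSet<>(Arrays.asList(0, 2)));
        // Three words went in ("a b a"), but only the two distinct keys come out.
        System.out.println(Arrays.toString(listKeys(concordance)));
    }
}
```

The array is sized by the number of distinct words and the stored positions are never read, so the original word order cannot be recovered this way.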

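If I understand your idea correctly, the concordance simply maps each word to the set of indices at which it was found. If that interpretation is correct, then the inverse operation would be: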

public static String[] invertConcordance (ST<String, SET<Integer>> st) {

//First - find the largest index in the concordance; the document length is that index plus one
int document_length = 0;
for(String key : st.keys()){
  for(Integer i : st.get(key)){
    if(i>document_length){
      document_length=i;
    }
  }
}    

//Create the document
String[] document = new String[document_length+1];

//Reconstruct
for(String key : st.keys()){
  for(Integer i : st.get(key)){
    document[i] = key;
  }
}

return document;
}
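
The same idea can be checked end to end with a self-contained round trip. This is only a sketch under assumptions: it uses java.util.Map and Set instead of ST and SET, the class name RoundTripSketch is made up for this example, and the maximum index starts at -1 (rather than 0 as above) so that an empty concordance yields an empty array.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class name; builds a concordance and inverts it again.
class RoundTripSketch {

    // Maps each word to the set of 0-based positions at which it occurs.
    static Map<String, Set<Integer>> createConcordance(String[] words) {
        Map<String, Set<Integer>> concordance = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            concordance.computeIfAbsent(words[i], k -> new TreeSet<>()).add(i);
        }
        return concordance;
    }

    // Rebuilds the word array by writing each word back to its positions.
    static String[] invertConcordance(Map<String, Set<Integer>> concordance) {
        int maxIndex = -1; // stays -1 when the concordance is empty
        for (Set<Integer> positions : concordance.values()) {
            for (int i : positions) {
                maxIndex = Math.max(maxIndex, i);
            }
        }
        String[] document = new String[maxIndex + 1];
        for (Map.Entry<String, Set<Integer>> entry : concordance.entrySet()) {
            for (int i : entry.getValue()) {
                document[i] = entry.getKey();
            }
        }
        return document;
    }

    public static void main(String[] args) {
        String[] words = "it was the best of times it was the worst of times".split("\\s+");
        String[] rebuilt = invertConcordance(createConcordance(words));
        System.out.println(Arrays.equals(words, rebuilt));
        System.out.println(String.join(" ", rebuilt));
    }
}
```

Each position appears in exactly one word's set, so every slot of the array is written exactly once.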

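I assumed that the indices are numbered from 0 to the document length minus 1. If they are actually stored from 1 to the document length, change the line:

String[] document = new String[document_length+1];

to:

String[] document = new String[document_length];

and the line:

    document[i] = key;

to:

    document[i-1] = key;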
answered 2013-08-09T06:36:53.413