
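I want to split a document into sentences and store each sentence in its own array, where each array element is one word of that sentence. This is as far as I have been able to get: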

int count =0,len=0;
String sentence[];
String words[][];
sentence = name.split("\\.");
count = sentence.length;

System.out.print("total sentence: " );
System.out.println(count);
int h;  
words = new String[count][]; 

for (h = 0; h < count; h++) {
     String tmp[] = sentence[h].split(" ");
     words[h] = tmp;
     len = len + words[h].length;
     System.out.println("total words: " );
     System.out.print(len); 

     temp = sentence[h].split(delimiter);  

     for(int i = 0; i < temp.length; i++) {
        System.out.print(len);
        System.out.println(temp[i]);
        len++;
     }  
}

2 Answers


I can't follow your code, but here is how to achieve your stated intent in just 3 lines of code:

String document; // read from somewhere

List<List<String>> words = new ArrayList<>();
for (String sentence : document.split("[.?!]\\s*"))
    words.add(Arrays.asList(sentence.split("[ ,;:]+")));

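If you want to convert the Lists to arrays, use List.toArray(), but I don't recommend it. Lists are easier to work with than arrays. For one thing, they grow automatically, which is one reason the code above is so compact.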
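If arrays really are required, the conversion could look roughly like the sketch below. The class name NestedListToArray, the helper toArrays, and the sample document text are made up for illustration; only List.toArray and Arrays.deepToString are standard library calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NestedListToArray {
    // Hypothetical helper: copies each inner list into its own String[] row.
    static String[][] toArrays(List<List<String>> words) {
        String[][] result = new String[words.size()][];
        for (int i = 0; i < words.size(); i++) {
            result[i] = words.get(i).toArray(new String[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input, assumed for illustration only.
        String document = "Hello there, world. How are you?";

        List<List<String>> words = new ArrayList<>();
        for (String sentence : document.split("[.?!]\\s*"))
            words.add(Arrays.asList(sentence.split("[ ,;:]+")));

        // prints [[Hello, there, world], [How, are, you]]
        System.out.println(Arrays.deepToString(toArrays(words)));
    }
}
```

The rows may have different lengths, which is why only the outer dimension is allocated up front.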

Addendum: (most) characters don't need to be escaped inside a character class.
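To illustrate the point about escaping, here is a small sketch comparing an unescaped and an escaped character class. The class and method names are hypothetical, and the sample strings are assumptions. The closing bracket is shown as one of the few characters that still needs a backslash inside a class.

```java
import java.util.Arrays;

class CharClassEscapes {
    // Inside [...] the characters . ? ! are literals, so no backslashes are needed.
    static String[] splitUnescaped(String text) {
        return text.split("[.?!]");
    }

    // Escaping them is allowed and matches exactly the same characters.
    static String[] splitEscaped(String text) {
        return text.split("[\\.\\?\\!]");
    }

    // A closing bracket is an exception: it must be escaped to be a literal.
    static String[] splitOnClosingBracket(String text) {
        return text.split("[\\]]");
    }

    public static void main(String[] args) {
        String text = "One. Two? Three!";
        // Both lines print [One,  Two,  Three] (leading spaces are kept).
        System.out.println(Arrays.toString(splitUnescaped(text)));
        System.out.println(Arrays.toString(splitEscaped(text)));
        // prints [a, b]
        System.out.println(Arrays.toString(splitOnClosingBracket("a]b")));
    }
}
```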

answered 2014-04-06T14:40:22.610

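It appears that your input string is stored in name. I don't understand what the inner for loop is supposed to do: it prints len repeatedly, but doesn't update it!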

String sentences[];
String words[][];

// End punctuation marks are ['.', '?', '!']
sentences = name.split("[\\.\\?\\!]"); 

System.out.println("num of sentences: " + sentences.length);

// Allocate storage for (sentences.length) new arrays of strings
words = new String[sentences.length][];

// For each sentence
for (int h = 0; h < sentences.length; h++) {
  // Remove spaces from beginning and end of sentence (to avoid 0-length words)
  // split by any white space character sequence (caution if using Unicode!)
  words[h] = sentences[h].trim().split("\\s+"); 

  // Print out length of sentence.
  System.out.println("words (in sentence " + (h+1) + "): " + words[h].length);
}
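
On the Unicode caution in the comment above: by default, Java's \s matches only ASCII whitespace, so a no-break space (U+00A0) would not separate words. One possible workaround, sketched here under the assumption that Java 7 or later is available, is the embedded (?U) flag (Pattern.UNICODE_CHARACTER_CLASS). The class name, method names, and sample text are hypothetical.

```java
import java.util.Arrays;

class UnicodeWhitespaceSplit {
    // Default behaviour: \s covers only space, tab, newline and similar ASCII characters.
    static String[] splitAscii(String sentence) {
        return sentence.trim().split("\\s+");
    }

    // With (?U), \s follows the Unicode White_Space property, which includes U+00A0.
    static String[] splitUnicode(String sentence) {
        return sentence.trim().split("(?U)\\s+");
    }

    public static void main(String[] args) {
        // Sample sentence with a no-break space between "Hello" and "big".
        String sentence = "Hello\u00A0big world";
        System.out.println(splitAscii(sentence).length);   // prints 2
        System.out.println(splitUnicode(sentence).length); // prints 3
        System.out.println(Arrays.toString(splitUnicode(sentence))); // prints [Hello, big, world]
    }
}
```

Note that String.trim() only strips characters up to U+0020, so Unicode spaces at the very start or end of a sentence would need separate handling.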
answered 2014-04-06T14:42:36.963