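I have a long list of entries like the one below, and I want to split each line at its first space into a word and its meaning. I am using Apache POI for this, because I have to read a docx file and then extract the data from it.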
abash humiliate, embarrass
abdicate relinquish power or position
aberrant abnormal
abet aid, encourage (typically of crime)
abeyance postponement
aboriginal indigenous
abridge shorten
abstemious moderate
...
So what regular expression would suit my purpose, so that I can display each entry like this:
word :abash
meaning : humiliate, embarrass
...
My code is:
import java.io.FileInputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordFileReader {
    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            FileInputStream fis = new FileInputStream("E:\\important.docx");
            // Extract the whole document as plain text and print it
            XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
            System.out.print(oleTextExtractor.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
--Edit-- Based on the suggested answer, I am now using this:
public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("E:\\Words.docx");
        org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
        //System.out.print(oleTextExtractor.getText());
        Scanner sc = new Scanner(oleTextExtractor.getText());
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            int i = line.indexOf(' ');
            String word = line.substring(0, i);
            String meaning = line.substring(i).trim();
            System.out.println("word " + word);
            System.out.println("meaning " + meaning);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But I get:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(Unknown Source)
at WordFileReader.main(WordFileReader.java:25)
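
A note on the exception: `indexOf(' ')` returns -1 when a line contains no space, and `substring(0, -1)` then throws exactly this error. A likely trigger (an assumption, since the docx itself is not shown) is a blank line, a one-word line, or a line where the separator is a tab rather than a space. Below is a minimal sketch of one way to guard against that, using `String.split` with a limit of 2 instead of `indexOf`. The class name `WordMeaningSplitter`, the helper `splitLine`, and the sample text are hypothetical; in the real program the text would come from `oleTextExtractor.getText()`.

```java
import java.util.Scanner;

public class WordMeaningSplitter {

    /**
     * Splits one line at its first run of whitespace.
     * Returns {word, meaning}, or null when the line has no separator
     * (for example a blank line, a lone "...", or a single word).
     */
    public static String[] splitLine(String line) {
        String trimmed = line.trim();
        // Limit 2 means at most two pieces, so later spaces stay in the meaning.
        String[] parts = trimmed.split("\\s+", 2);
        if (parts.length < 2) {
            return null;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the text extracted from the docx.
        String text = "abash humiliate, embarrass\n"
                + "\n"
                + "abdicate relinquish power or position\n"
                + "...\n";
        Scanner sc = new Scanner(text);
        while (sc.hasNextLine()) {
            String[] parts = splitLine(sc.nextLine());
            if (parts == null) {
                continue; // skip lines that cannot be split
            }
            System.out.println("word : " + parts[0]);
            System.out.println("meaning : " + parts[1]);
        }
        sc.close();
    }
}
```

The pattern `\\s+` matches tabs as well as spaces, which may matter for text extracted from Word. It does not match a non-breaking space (U+00A0), so if the document uses those, the pattern would need to be widened.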