
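I need a good way to extract a specific word (supplied by the user) together with the 7 words on either side of it. For example, given the following text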

text = "The mean distance of the Sun from the Earth is approximately 149.6 million kilometers (1 AU), though the distance varies as the Earth moves from perihelion in January to aphelion in July"

If the user enters the word "Earth", I should be able to extract the following part of the text

mean distance of the Sun from the Earth is approximately 149.6 million kilometers (1 AU)

As you can see, the word "Earth" is surrounded by 7 words on each side. How can I do this in Java?


2 Answers

3

Use ([^ ]+ ?) to match a single word, and ([^ ]+ ?){0,7} to capture up to 7 words on each side of the keyword:

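import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "The mean distance of the Sun from the Earth is approximately 149.6 million kilometers (1 AU), though the distance varies as the Earth moves from perihelion in January to aphelion in July";
String word = "Earth";
int around = 7;
// Up to `around` space-delimited words, the keyword, then up to `around` more words.
// Pattern.quote keeps regex metacharacters in the user's input from being interpreted.
String pattern = "([^ ]+ ?){0," + around + "}" + Pattern.quote(word) + "( ?[^ ]+){0," + around + "}";
Matcher m = Pattern.compile(pattern).matcher(text);
if (m.find()) {
    // Prints the first occurrence together with its surrounding words.
    System.out.println(m.group());
}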
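The pattern above finds only the first occurrence and treats the keyword as a plain substring, so "Earth" would also match inside "Earthquake". The following is a minimal sketch of one way to tighten that, not a definitive implementation: it assumes words are separated by whitespace, adds \b word boundaries around the keyword, and loops over every match. The class name KeywordContext and the method contexts are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordContext {

    // Hypothetical helper: returns every occurrence of `word` together with
    // up to `around` whitespace-delimited words on each side.
    static List<String> contexts(String text, String word, int around) {
        String regex = "(?:\\S+\\s+){0," + around + "}"   // leading words
                + "\\S*\\b" + Pattern.quote(word) + "\\b\\S*" // keyword, plus attached punctuation
                + "(?:\\s+\\S+){0," + around + "}";        // trailing words
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> result = new ArrayList<>();
        // find() resumes after the previous match, so windows never overlap.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The mean distance of the Sun from the Earth is approximately 149.6 million kilometers (1 AU), though the distance varies as the Earth moves from perihelion in January to aphelion in July";
        for (String context : contexts(text, "Earth", 7)) {
            System.out.println(context);
        }
    }
}
```

Because each find() continues where the previous match ended, a later occurrence can end up with fewer leading words than requested if an earlier window already consumed them.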
answered 2012-10-01T00:17:22.237
1
public static void print() throws Exception {

    String s = "The mean distance of the Sun from the Earth is approximately 149.6 million kilometers (1 AU), though the distance varies as the Earth moves from perihelion in January to aphelion in July";
    int presize = 7;   // number of words to keep before the term
    int postsize = 7;  // number of words to keep after the term

    String term = "Earth";
    // Split the text into whitespace-delimited tokens.
    String[] flds = s.split("\\s+");

    // Find the index of the first token that equals the term exactly.
    int idx = 0;
    while (idx < flds.length && !flds[idx].equals(term))
        idx++;

    if (idx == flds.length)
        throw new Exception("Term not found");

    // Clamp the window to the bounds of the token array.
    int start = Math.max(0, idx - presize);
    int end = Math.min(flds.length - 1, idx + postsize);
    for (int i = start; i <= end; i++) {
        System.out.print(flds[i] + " ");
    }
    System.out.println();
}
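The split-based approach compares whole tokens with equals, so a token such as "Earth," (with trailing punctuation) would not be found. A minimal sketch of a reusable variant follows, under some assumptions: tokens are whitespace-delimited, leading and trailing non-word characters are ignored when comparing, and \W uses Java's default ASCII definition, so non-ASCII terms would need a different normalization. The names WordWindow and window are hypothetical.

```java
import java.util.Arrays;
import java.util.Optional;

public class WordWindow {

    // Hypothetical helper: returns the first occurrence of `term` with up to
    // `before` words ahead of it and `after` words behind it, or empty if absent.
    static Optional<String> window(String text, String term, int before, int after) {
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            // Strip surrounding punctuation so "Earth," still compares equal to "Earth".
            String bare = tokens[i].replaceAll("^\\W+|\\W+$", "");
            if (bare.equals(term)) {
                int start = Math.max(0, i - before);
                int end = Math.min(tokens.length, i + after + 1); // exclusive bound
                return Optional.of(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String text = "The mean distance of the Sun from the Earth is approximately 149.6 million kilometers (1 AU), though the distance varies as the Earth moves from perihelion in January to aphelion in July";
        System.out.println(window(text, "Earth", 7, 7).orElse("Term not found"));
    }
}
```

Returning an Optional instead of throwing leaves it to the caller to decide how a missing term should be handled, and String.join avoids the trailing space that printing each token followed by " " would leave.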
answered 2012-10-01T00:13:39.303