
I have a string:

    string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships."

I also have another string, 'string2', which contains only the substrings enclosed in <NOUN> and </NOUN> tags, separated by spaces.

    string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>"

Note that the second string can contain any number of noun-tagged phrases, all taken from 'string1' (for example, if string1 has 3 nouns, string2 will contain the same 3 nouns wrapped in noun tags).
I want to add the tags to 'string1' so that string1 becomes:

    string1 = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships."

I used the following code to do this:

    Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
    Matcher m = p.matcher(string2);
    while (m.find()) {
        string1 = string1.replaceAll(m.group(1), m.group(0));
    }

But it gives me the following output:

    <NOUN><NOUN><NOUN>Sri Lanka</NOUN></NOUN> National Chess Championship</NOUN> this year and represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> at represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.

Can anyone tell me how to do this correctly?
Or, alternatively, how can I get the desired output from the output shown above?


2 Answers


Instead of:

    string1 = string1.replaceAll(m.group(1), m.group(0));

use:

    string1 = string1.replaceAll("(?<!<NOUN>)(" + m.group(1) + ")(?!</NOUN>)", m.group(0));

For more information, see the documentation on lookahead and lookbehind constructs.
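As a rough, self-contained sketch of the lookaround replacement above: the class and method names (`LookaroundTagger`, `tag`) are hypothetical and not part of the answer. The sketch also adds `Pattern.quote` and `Matcher.quoteReplacement`, on the assumption that a noun could contain regex metacharacters or `$`; the original one-liner does not guard against that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundTagger {

    // Wraps every noun listed in taggedNouns wherever it occurs in text,
    // unless the occurrence directly follows <NOUN> or directly precedes </NOUN>.
    static String tag(String text, String taggedNouns) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(taggedNouns);
        while (m.find()) {
            // Quote the noun so it is matched literally (assumption: nouns are plain text).
            String regex = "(?<!<NOUN>)" + Pattern.quote(m.group(1)) + "(?!</NOUN>)";
            // Quote the replacement so '$' and '\' are not treated as group references.
            text = text.replaceAll(regex, Matcher.quoteReplacement(m.group(0)));
        }
        return text;
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

One limitation to keep in mind: the lookarounds only protect a noun that sits at the very start or very end of an already tagged phrase. A noun that appears in the middle of a longer tagged phrase would still be wrapped a second time.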

Answered 2012-08-18T08:57:06.970

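The problem with your example is that "Sri Lanka National Chess Championship" is a noun, and "Sri Lanka", which is part of that string, is also a noun. As a result, your matcher replaces the same stretch of text more than once.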
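The effect can be reproduced in isolation with a shortened input. This is only an illustrative sketch; `NestedTagDemo` and `naiveTag` are hypothetical names, and the loop simply mimics the question's repeated `replaceAll` calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedTagDemo {

    // Naive approach from the question: wrap every occurrence of each noun,
    // even occurrences that an earlier pass has already wrapped.
    static String naiveTag(String text, String... nouns) {
        for (String noun : nouns) {
            String tagged = "<NOUN>" + noun + "</NOUN>";
            text = text.replaceAll(Pattern.quote(noun), Matcher.quoteReplacement(tagged));
        }
        return text;
    }

    public static void main(String[] args) {
        String text = "Sri Lanka National Chess Championship and Sri Lanka";
        // The second noun is a prefix of the first, so the second pass
        // re-tags text inside the phrase tagged by the first pass.
        System.out.println(naiveTag(text,
                "Sri Lanka National Chess Championship",
                "Sri Lanka"));
        // prints: <NOUN><NOUN>Sri Lanka</NOUN> National Chess Championship</NOUN> and <NOUN>Sri Lanka</NOUN>
    }
}
```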

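You can solve this by not replacing fragments of the string that have already been replaced. For each match, I split the string into three parts (before, match-str, after) and keep the pieces in their original order. A Vector is a convenient data structure for this.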

    import java.util.Vector;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Check {

        // Concatenates the parts, in order, back into a single string.
        static String print(Vector<String> parts) {
            StringBuilder str = new StringBuilder();

            for (int i = 0; i < parts.size(); i++) {
                str.append(parts.elementAt(i));
                //System.out.print(i + " : " + parts.elementAt(i) + "\n");
            }

            return str.toString();
        }

        public static void main(String[] args) {
            String string1;
            String string2;
            String expected;

            string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
            string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
            expected = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.";

            Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
            Matcher m = p.matcher(string2);
            Vector<String> parts = new Vector<String>();
            parts.add(string1);

            while (m.find()) {
                for (int i = 0; i < parts.size(); i++) {

                    // skip parts that have already been tagged
                    if (parts.elementAt(i).indexOf("<NOUN>") != -1) {
                        continue;
                    }

                    // search for the noun in this untagged part
                    String cur = parts.elementAt(i);
                    int disp = cur.indexOf(m.group(1));
                    if (disp == -1) {
                        continue;
                    } else {
                        // replace the part with up to three pieces: before, tagged match, after
                        parts.remove(i);
                        Vector<String> newParts = new Vector<String>();

                        if (disp != 0) {
                            newParts.add(cur.substring(0, disp));
                        }

                        newParts.add(m.group(0));

                        if ((disp + m.group(1).length()) != cur.length()) {
                            newParts.add(cur.substring(disp + m.group(1).length()));
                        }

                        // insert at the same index so the order is preserved, including when i == 0
                        parts.addAll(i, newParts);

                        //System.out.print(print(parts) + "\n");
                    }
                }
            }

            string1 = print(parts);
            if (!string1.equals(expected)) {
                System.out.println("Unexpected output !!");
            } else {
                System.out.println("Correct !!");
            }
        }

    }

For clarity, you may want to rename the print method to stringify.
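If string2 lists one tagged entry per occurrence, in the same order the nouns appear in string1 (as in the example, but this is an assumption about the input in general), the same "never touch an already tagged fragment" idea can also be sketched as a single left-to-right pass with a cursor. This is a different technique from the Vector-based code above, and the names `SequentialTagger` and `tag` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequentialTagger {

    // Assumes taggedNouns holds one <NOUN>...</NOUN> entry per occurrence,
    // in the order the nouns appear in text.
    static String tag(String text, String taggedNouns) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(taggedNouns);
        StringBuilder out = new StringBuilder();
        int cursor = 0; // everything before cursor has already been copied to out

        while (m.find()) {
            String noun = m.group(1);
            int at = text.indexOf(noun, cursor);
            if (at < 0) {
                continue; // noun not found after the cursor; skip this entry
            }
            out.append(text, cursor, at); // untouched text before the noun
            out.append(m.group(0));       // the noun together with its tags
            cursor = at + noun.length();  // never look at this stretch again
        }

        out.append(text.substring(cursor)); // remaining untagged tail
        return out.toString();
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Because the cursor only moves forward, text inside an earlier tagged noun is never searched again, so nested tags cannot occur. The trade-off is that the result depends on the ordering assumption stated above.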

Answered 2012-08-18T07:04:49.507