java - Regular expression with & as a delimiter

Question

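I have a long text in which I need to find every piece of text enclosed in a pair of & characters (for example, in the text "&hello&&bye&" I need to find the words "hello" and "bye").

I tried the regular expression ".*&([^&])*&.*", but it doesn't work and I don't know what is wrong with it.

Any help?

Thanks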

score 6 · Accepted Answer

Try this approach:

String data = "&hello&&bye&";
Matcher m = Pattern.compile("&([^&]*)&").matcher(data);
while (m.find())
    System.out.println(m.group(1));

Output:

hello
bye

score 2 · Accepted Answer

No regular expression needed. Just iterate!

boolean started = false;
List<String> list = new ArrayList<>();
int startIndex = 0;
for (int i = 0; i < string.length(); ++i) {
    if (string.charAt(i) != '&')
        continue;
    if (!started) {
        startIndex = i + 1; // opening '&': the content starts right after it
    } else {
        list.add(string.substring(startIndex, i)); // closing '&': content is [startIndex, i)
    }
    started = !started; // toggle exactly once per '&'
}

Or use split!

String[] parts = string.split("&");
for(int i = 1; i < parts.length; i += 2) { // every second
    list.add(parts[i]);
}
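
One caveat with the split approach: String.split drops trailing empty strings, so an unclosed final segment (as in "&hello&&bye") produces the same array as a closed one. A minimal sketch of one way around that, assuming a hypothetical helper named betweenPairs (not part of the original answer): pass a negative limit so trailing empty strings are kept, and collect only the odd-indexed parts that are followed by a closing &.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPairs {
    // Hypothetical helper, for illustration only.
    static List<String> betweenPairs(String s) {
        // limit -1 keeps trailing empty strings, so parts.length == (number of '&') + 1
        String[] parts = s.split("&", -1);
        List<String> result = new ArrayList<>();
        // Odd indices sit right after an opening '&'; requiring i < parts.length - 1
        // guarantees that a closing '&' follows the part.
        for (int i = 1; i < parts.length - 1; i += 2) {
            result.add(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(betweenPairs("&hello&&bye&")); // [hello, bye]
        System.out.println(betweenPairs("&hello&&bye"));  // [hello]
    }
}
```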

score 2 · Accepted Answer

If you don't want to use a regular expression, here is a simple way.

String string = "xyz...."; // the string containing "hello", "bye" etc.

String[] tokens = string.split("&"); // splits the string into an array
                                     // of tokens separated by "&"

for (int i = 0; i < tokens.length; i++) {
    String token = tokens[i];

    if (token.length() > 0) {
        // edge case: the last token only counts if the string ends with '&'
        if (i == tokens.length - 1) {
            if (string.charAt(string.length() - 1) == '&')
                System.out.println(token);
        } else {
            System.out.println(token);
        }
    }
}

score 0 · Accepted Answer

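I would simplify it even further.

Check that the first character is &
Check that the last character is &
Call String.split("&&") on the substring between them

In code: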

if (string.length() < 2)
    throw new IllegalArgumentException(string); // or return an empty array, whatever
if ((string.charAt(0) != '&') || (string.charAt(string.length() - 1) != '&'))
    throw new IllegalArgumentException(string); // handle this, too (throwing is one option)
String inner = string.substring(1, string.length() - 1);
return inner.split("&&");

score 0 · Accepted Answer

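Two problems:

You are repeating the capturing group. This means you will only capture the last letter between the &s in the group.
You will only match the last word, because the .*s gobble up the rest of the string.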
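
Both effects can be seen by running the question's pattern against the sample string. This is only a small sketch; the class name GreedyDemo and the helper lastCapture are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: returns group 1 of the question's pattern,
    // or null if the input does not match (or the group never participated).
    static String lastCapture(String input) {
        Matcher m = Pattern.compile(".*&([^&])*&.*").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The leading .* is greedy, so the group ends up inside the last pair ("bye"),
        // and because the group itself is repeated, only its final iteration is kept.
        System.out.println(lastCapture("&hello&&bye&")); // e
    }
}
```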

Use lookaround instead:

(?<=&)[^&]+(?=&)

Now the entire match will be hello (and bye when you apply the regex a second time), because the surrounding &s are no longer part of the match:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?<=&)[^&]+(?=&)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
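
One difference worth knowing: because lookarounds do not consume the & characters, a closing & can also act as the opening & of the next match. The sketch below compares the lookaround pattern with a consuming pattern, &([^&]*)&, on a hypothetical input "&a&b&c&"; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundVsConsuming {
    // Collects the given group from every match of regex in input.
    static List<String> findAll(String regex, int group, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "&a&b&c&";
        // Lookarounds leave each '&' available for the next match.
        System.out.println(findAll("(?<=&)[^&]+(?=&)", 0, input)); // [a, b, c]
        // A consuming pattern uses up both delimiters, so "b" is skipped.
        System.out.println(findAll("&([^&]*)&", 1, input));        // [a, c]
    }
}
```

For input such as "&hello&&bye&", where consecutive pairs are separated by "&&", both patterns return hello and bye.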

score 0 · Accepted Answer

The surrounding .* are pointless and unproductive. &([^&])*& is enough.

Answered 2013-04-26T20:31:44.203
