1

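I need to retrieve numbers followed by certain units from a given web page, e.g. 10 m, 5 km... These units are the keys of a Map<String, Integer>. keySet() returns a comma-separated list, e.g. ["m", "km"...]. Is there a smart way to turn the set into an alternation, ["m"|"km"|...], so that I can use it in a regular expression, for example: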

"(\\d+)"+ " " +"myMap.keySet()......"
4

3 Answers

1

Join the set with pipes: "(\\d+)\\s*(" + StringUtils.join(myMap.keySet(), "|") + ")"
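On Java 8 or later the same join can be written without an extra library. The following is only a sketch under stated assumptions, not this answer's own code: the class name `UnitRegex` and the method `buildRegex` are made up for illustration, and it additionally wraps each key in `Pattern.quote` so that a unit containing regex metacharacters is matched literally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class UnitRegex {
    // Hypothetical helper: joins the unit keys with "|" inside a capturing group.
    static String buildRegex(Set<String> units) {
        String alternation = units.stream()
                .map(Pattern::quote)               // treat every key as a literal
                .collect(Collectors.joining("|")); // m|km|... (each key quoted)
        return "(\\d+)\\s*(" + alternation + ")";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the printed regex is predictable.
        Map<String, Integer> myMap = new LinkedHashMap<>();
        myMap.put("m", 1);
        myMap.put("km", 2);
        System.out.println(buildRegex(myMap.keySet()));
    }
}
```

If the keys are known to be plain letters, the quoting step can be dropped and `String.join("|", myMap.keySet())` is enough.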

Answered 2013-08-01T15:20:23.937
1

How about

myMap.keySet().toString().replaceAll(",\\s*", "|").replaceAll("^\\[|\\]$", "")
//                       ^                         ^
//                       |                         +remove [ at start and ] at end
//                       +replace `,` and spaces after it with |

instead of

myMap.keySet()

Your code could look like this:

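import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "1km is equal 1000 m, and 1  m is equal 100cm. 1 mango shouldnt be found";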

Map<String, Integer> map = new HashMap<>();
map.put("m", 1);
map.put("km", 2);
map.put("cm", 3);

String regex = "\\d+\\s*("
        + map.keySet().toString()       //will create "[cm, m, km]"
            .replaceAll(",\\s*", "|")   //will change it to "[cm|m|km]"
            .replaceAll("^\\[|\\]$", "")//will change it to "cm|m|km"
        + ")\\b";                       
    // I added \\b - word boundary - to prevent matching `m` if it is at
    // start of some word like in 1 mango where it normally would match
    // (1 m)ango

Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group());
}
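The question's pattern wraps the number in a capturing group, so it may help to see how the number and the unit can be read back separately and the unit looked up in the map. This is a sketch only: the class `UnitExtractor`, the method `extract`, and the "number unit -> value" output format are assumptions for illustration, and `String.join` (Java 8+) stands in for the `toString()`/`replaceAll` chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnitExtractor {
    // Hypothetical helper: collects "number unit -> mapped value" for every match.
    static List<String> extract(String data, Map<String, Integer> map) {
        // group(1) is the number, group(2) is the unit; \\b rejects "1 mango".
        String regex = "(\\d+)\\s*(" + String.join("|", map.keySet()) + ")\\b";
        Matcher m = Pattern.compile(regex).matcher(data);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            String number = m.group(1);
            String unit = m.group(2);
            found.add(number + " " + unit + " -> " + map.get(unit));
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("m", 1);
        map.put("km", 2);
        map.put("cm", 3);
        String data = "1km is equal 1000 m, and 1  m is equal 100cm. 1 mango shouldnt be found";
        extract(data, map).forEach(System.out::println);
    }
}
```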
Answered 2013-08-01T15:28:04.427
0

You can try this:

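String p = "\\d+ (?:";                      // open a non-capturing group for the units
for (String key : yourMap.keySet())
   p += key + "|";                          // append each unit followed by |
p = p.substring(0, p.length() - 1) + ")";   // drop the trailing | and close the group
                                            // (assumes the map is not empty)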
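A possible alternative to trimming the trailing `|` by hand is `java.util.StringJoiner` (Java 8+), which takes the delimiter, prefix, and suffix up front. This is a sketch rather than the answer's method; the class `JoinerRegex` and the method `buildRegex` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

class JoinerRegex {
    // Hypothetical helper: builds "\\d+ (?:key1|key2|...)" from the given keys.
    static String buildRegex(Set<String> keys) {
        // Delimiter "|", prefix "\\d+ (?:", suffix ")".
        StringJoiner joiner = new StringJoiner("|", "\\d+ (?:", ")");
        for (String key : keys) {
            joiner.add(key);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Set<String> keys = new LinkedHashSet<>();
        keys.add("m");
        keys.add("km");
        System.out.println(buildRegex(keys));
    }
}
```

With an empty key set this yields the prefix followed directly by the suffix, which still compiles as a regex, whereas the substring approach would cut into the `(?:` prefix.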
Answered 2013-08-01T15:26:40.497