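I'm developing a Java program that searches a dictionary for words made up of a given set of letters. I was wondering whether it is possible to set up a regular expression that lets you use each character only as often as it appears in a string. For example, with the letters SHARE: hear, hare, sea, and so on are all valid. But see or sarah is invalid, because you only have one e and one a, respectively.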

5 Answers

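Regular expressions are about pattern matching, and there may be no simple pattern for this problem.

If you really, really want a regular expression, these functions will generate one: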

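// Builds a regex with one alternative per ordering of the letters; every letter is optional.
public static String permutation(String str) {
    // The helper prefixes each alternative with "|"; turn the first "|" into "(".
    return "^" + permutation("", str).replaceFirst("\\|", "(") + ")$";
}

private static String permutation(String prefix, String str) {
    int n = str.length();
    if (n == 0) return "|" + prefix;  // all letters placed: emit one finished alternative
    StringBuilder s = new StringBuilder();
    for (int i = 0; i < n; i++) {
        // Place letter i next (as optional), then recurse on the remaining letters.
        s.append(permutation(prefix + str.charAt(i) + "?",
                             str.substring(0, i) + str.substring(i + 1, n)));
    }
    return s.toString();
}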

For "share", it returns:

^(s?h?a?r?e?|s?h?a?e?r?|s?h?r?a?e?|s?h?r?e?a?|s?h?e?a?r?|s?h?e?r?a?|s?a?h?r?e?|s?a?h?e?r?|s?a?r?h?e?|s?a?r?e?h?|s?a?e?h?r?|s?a?e?r?h?|s?r?h?a?e?|s?r?h?e?a?|s?r?a?h?e?|s?r?a?e?h?|s?r?e?h?a?|s?r?e?a?h?|s?e?h?a?r?|s?e?h?r?a?|s?e?a?h?r?|s?e?a?r?h?|s?e?r?h?a?|s?e?r?a?h?|h?s?a?r?e?|h?s?a?e?r?|h?s?r?a?e?|h?s?r?e?a?|h?s?e?a?r?|h?s?e?r?a?|h?a?s?r?e?|h?a?s?e?r?|h?a?r?s?e?|h?a?r?e?s?|h?a?e?s?r?|h?a?e?r?s?|h?r?s?a?e?|h?r?s?e?a?|h?r?a?s?e?|h?r?a?e?s?|h?r?e?s?a?|h?r?e?a?s?|h?e?s?a?r?|h?e?s?r?a?|h?e?a?s?r?|h?e?a?r?s?|h?e?r?s?a?|h?e?r?a?s?|a?s?h?r?e?|a?s?h?e?r?|a?s?r?h?e?|a?s?r?e?h?|a?s?e?h?r?|a?s?e?r?h?|a?h?s?r?e?|a?h?s?e?r?|a?h?r?s?e?|a?h?r?e?s?|a?h?e?s?r?|a?h?e?r?s?|a?r?s?h?e?|a?r?s?e?h?|a?r?h?s?e?|a?r?h?e?s?|a?r?e?s?h?|a?r?e?h?s?|a?e?s?h?r?|a?e?s?r?h?|a?e?h?s?r?|a?e?h?r?s?|a?e?r?s?h?|a?e?r?h?s?|r?s?h?a?e?|r?s?h?e?a?|r?s?a?h?e?|r?s?a?e?h?|r?s?e?h?a?|r?s?e?a?h?|r?h?s?a?e?|r?h?s?e?a?|r?h?a?s?e?|r?h?a?e?s?|r?h?e?s?a?|r?h?e?a?s?|r?a?s?h?e?|r?a?s?e?h?|r?a?h?s?e?|r?a?h?e?s?|r?a?e?s?h?|r?a?e?h?s?|r?e?s?h?a?|r?e?s?a?h?|r?e?h?s?a?|r?e?h?a?s?|r?e?a?s?h?|r?e?a?h?s?|e?s?h?a?r?|e?s?h?r?a?|e?s?a?h?r?|e?s?a?r?h?|e?s?r?h?a?|e?s?r?a?h?|e?h?s?a?r?|e?h?s?r?a?|e?h?a?s?r?|e?h?a?r?s?|e?h?r?s?a?|e?h?r?a?s?|e?a?s?h?r?|e?a?s?r?h?|e?a?h?s?r?|e?a?h?r?s?|e?a?r?s?h?|e?a?r?h?s?|e?r?s?h?a?|e?r?s?a?h?|e?r?h?s?a?|e?r?h?a?s?|e?r?a?s?h?|e?r?a?h?s?)$

Obviously this could be simplified and optimized a lot, but it is still not a good idea.

Edit: a function that produces shorter output:

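public static String permutation(String str) {
    return "^(" + permutation("", str) + ")$";
}

private static String permutation(String prefix, String str) {
    int n = str.length();
    if (n == 0) return prefix;
    StringBuilder s = new StringBuilder();
    for (int i = 0; i < n; i++) {
        String rest = str.substring(0, i) + str.substring(i + 1, n);
        if (i != n - 1) {
            // Every branch except the last wraps its tail in a group.
            s.append(prefix).append(str.charAt(i)).append("?(")
             .append(permutation("", rest)).append(")|");
        } else {
            // Caveat: the last branch is not parenthesized, so any "|" in its tail
            // binds at this level and some orderings are not matched.
            s.append(prefix).append(str.charAt(i)).append('?')
             .append(permutation("", rest));
        }
    }
    return s.toString();
}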

Prints:

^(s?(h?(a?(r?(e?)|e?r?)|r?(a?(e?)|e?a?)|e?a?(r?)|r?a?)|a?(h?(r?(e?)|e?r?)|r?(h?(e?)|e?h?)|e?h?(r?)|r?h?)|r?(h?(a?(e?)|e?a?)|a?(h?(e?)|e?h?)|e?h?(a?)|a?h?)|e?h?(a?(r?)|r?a?)|a?(h?(r?)|r?h?)|r?h?(a?)|a?h?)|h?(s?(a?(r?(e?)|e?r?)|r?(a?(e?)|e?a?)|e?a?(r?)|r?a?)|a?(s?(r?(e?)|e?r?)|r?(s?(e?)|e?s?)|e?s?(r?)|r?s?)|r?(s?(a?(e?)|e?a?)|a?(s?(e?)|e?s?)|e?s?(a?)|a?s?)|e?s?(a?(r?)|r?a?)|a?(s?(r?)|r?s?)|r?s?(a?)|a?s?)|a?(s?(h?(r?(e?)|e?r?)|r?(h?(e?)|e?h?)|e?h?(r?)|r?h?)|h?(s?(r?(e?)|e?r?)|r?(s?(e?)|e?s?)|e?s?(r?)|r?s?)|r?(s?(h?(e?)|e?h?)|h?(s?(e?)|e?s?)|e?s?(h?)|h?s?)|e?s?(h?(r?)|r?h?)|h?(s?(r?)|r?s?)|r?s?(h?)|h?s?)|r?(s?(h?(a?(e?)|e?a?)|a?(h?(e?)|e?h?)|e?h?(a?)|a?h?)|h?(s?(a?(e?)|e?a?)|a?(s?(e?)|e?s?)|e?s?(a?)|a?s?)|a?(s?(h?(e?)|e?h?)|h?(s?(e?)|e?s?)|e?s?(h?)|h?s?)|e?s?(h?(a?)|a?h?)|h?(s?(a?)|a?s?)|a?s?(h?)|h?s?)|e?s?(h?(a?(r?)|r?a?)|a?(h?(r?)|r?h?)|r?h?(a?)|a?h?)|h?(s?(a?(r?)|r?a?)|a?(s?(r?)|r?s?)|r?s?(a?)|a?s?)|a?(s?(h?(r?)|r?h?)|h?(s?(r?)|r?s?)|r?s?(h?)|h?s?)|r?s?(h?(a?)|a?h?)|h?(s?(a?)|a?s?)|a?s?(h?)|h?s?)$
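
One caveat with the shorter generator: the last branch at each level is emitted without parentheses, so the "|" inside its tail binds one level up and some orderings are lost (for the letters abc, the generated pattern does not match cba). Below is a minimal sketch of a variant that wraps every tail in its own group. It assumes plain a-z letters that need no regex escaping, and the class and method names are illustrative rather than part of the original answer.

```java
import java.util.regex.Pattern;

class NestedPermutationRegex {
    // Builds "^(...)$" covering every ordering of the letters, each letter optional.
    static String build(String letters) {
        return "^(" + alternatives(letters) + ")$";
    }

    // One alternative per choice of next letter; every non-empty tail gets its own group.
    private static String alternatives(String letters) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < letters.length(); i++) {
            if (i > 0) sb.append('|');
            String rest = letters.substring(0, i) + letters.substring(i + 1);
            sb.append(letters.charAt(i)).append('?');
            if (!rest.isEmpty()) {
                sb.append('(').append(alternatives(rest)).append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("ab")); // prints ^(a?(b?)|b?(a?))$
        Pattern p = Pattern.compile(build("share"));
        for (String w : new String[] {"hear", "hears", "sea", "see", "sarah"}) {
            System.out.println(w + " -> " + p.matcher(w).matches());
        }
    }
}
```

The pattern still grows factorially with the number of letters, so this only repairs correctness; it does not make the approach practical.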
answered 2013-02-25T19:55:20.937

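Here is one approach:

  1. Iterate over your array of strings to build a Multimap<String, String> (if you use the Guava library) or a HashMap<String, List<String>> (with plain java.util), where the key is the word with its letters sorted and the values are the legal words that sort to that key. This is your preprocessing step, so you only do it once. Once the map exists, subsequent searches are relatively fast; looping through the whole dictionary on every search to match some regex would be much slower than a hash map lookup.
  2. Sort the letters of your search string and generate all subsets of that sorted string.
  3. Iterate over the sorted subsets and look each one up in the HashMap or Multimap. Collect all the values you find and you have your answer.

I think the problem here is that a regex does not fit what you describe, because you would still have to walk the entire dictionary (stored as an array) for every search. If you build the hash map instead (a relatively expensive step, done once), each search only loops over the list of sorted subsets, which is cheap.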
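
The three steps above can be sketched roughly as follows, using only java.util. The class name AnagramIndex and its methods are made up for illustration, and the subset enumeration uses a bitmask, which assumes the search string has fewer than 31 letters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AnagramIndex {
    // Step 1: sorted letters -> all dictionary words with exactly those letters.
    private final Map<String, List<String>> bySortedKey = new HashMap<>();

    AnagramIndex(Collection<String> dictionary) {
        for (String word : dictionary) {
            bySortedKey.computeIfAbsent(sortLetters(word), k -> new ArrayList<>()).add(word);
        }
    }

    static String sortLetters(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Steps 2 and 3: enumerate every non-empty subset of the sorted letters and look it up.
    Set<String> search(String letters) {
        String sorted = sortLetters(letters);
        int n = sorted.length(); // assumed to be below 31 so the bitmask fits in an int
        Set<String> result = new TreeSet<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            StringBuilder key = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) key.append(sorted.charAt(i));
            }
            // A subset taken in order from a sorted string is itself sorted, so it is a valid key.
            List<String> words = bySortedKey.get(key.toString());
            if (words != null) result.addAll(words);
        }
        return result;
    }

    public static void main(String[] args) {
        AnagramIndex index = new AnagramIndex(
                Arrays.asList("hear", "hare", "sea", "see", "sarah", "share"));
        System.out.println(index.search("share")); // prints [hare, hear, sea, share]
    }
}
```

Each search costs about 2^n lookups for n letters, independent of the dictionary size, which is the trade-off this answer relies on.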

answered 2013-02-25T19:54:37.787

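If no letter occurs twice in your letter set, as is the case for share, you can use

^(?!.*([share]).*\\1)[share]+$

This matches any word made up of some or all of the letters in share.

The negative lookahead (?!...) uses .* to look, from any position, for a letter captured by the parenthesized group that occurs again later via the back-reference \\1. If any letter appears more than once, the lookahead blocks the match. (\\1 is written here as it appears in a Java string literal.)

You can extend this principle to handle letter sets in which a letter occurs more than once.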
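
A small sketch of how this could be used from Java, assuming the lookahead form that scans from every position (note the leading .* inside the lookahead; without it only the first character would be checked for repeats). The class name and the matches helper are hypothetical.

```java
import java.util.regex.Pattern;

class DistinctLetterRegex {
    // In a Java string literal, "\\1" is the regex back-reference \1.
    static final Pattern SHARE = Pattern.compile("^(?!.*([share]).*\\1)[share]+$");

    static boolean matches(String word) {
        return SHARE.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"hear", "hare", "sea", "see", "sarah", "hee"}) {
            System.out.println(w + " -> " + matches(w));
        }
    }
}
```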

answered 2013-02-25T19:55:11.583

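OK, here is an example of how to do this. However, you should read these articles on catastrophic backtracking first:

Runaway Regular Expressions: Catastrophic Backtracking

Regex Performance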

^(?!.*s.*s)(?!.*h.*h)(?!.*a.*a)(?!.*r.*r)(?!.*e.*e)(?!.*[^share]).*$

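If you want to allow two of a letter, for example two "s" as in shares, so that a word like sash is accepted, you can do this:

^(?!.*s.*s.*s)(?!.*h.*h)(?!.*a.*a)(?!.*r.*r)(?!.*e.*e)(?!.*[^share]).*$

The idea is that fewer than three "s" in the word is OK.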
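
Writing these lookaheads by hand gets tedious, so here is a hedged sketch that generates the pattern from the available letters: one lookahead per distinct letter forbidding one occurrence more than is available, plus one lookahead forbidding any other character. It assumes lowercase a-z input that needs no escaping; the class name is hypothetical, and it ends in .+ rather than .* so that the empty string is rejected.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

class LetterLimitRegex {
    static Pattern build(String letters) {
        // Count how often each letter is available (TreeMap keeps the output order stable).
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : letters.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }

        StringBuilder regex = new StringBuilder("^");
        StringBuilder allowed = new StringBuilder();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            allowed.append(e.getKey());
            // Forbid count + 1 occurrences: (?!.*x.*x) for one x, (?!.*x.*x.*x) for two, ...
            regex.append("(?!");
            for (int i = 0; i <= e.getValue(); i++) {
                regex.append(".*").append(e.getKey());
            }
            regex.append(')');
        }
        // Forbid any character outside the letter set, anywhere in the word.
        regex.append("(?!.*[^").append(allowed).append("]).+$");
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        System.out.println(build("share").pattern());
        // prints ^(?!.*a.*a)(?!.*e.*e)(?!.*h.*h)(?!.*r.*r)(?!.*s.*s)(?!.*[^aehrs]).+$
        System.out.println(build("shares").matcher("sash").matches());   // prints true
        System.out.println(build("shares").matcher("sashes").matches()); // prints false
    }
}
```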

answered 2013-02-25T19:56:52.597

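One approach that does not use pattern matching, but gets at the root of the problem, is to create an array holding the count of each character in the target word: "deaf" would be the array (1,0,0,1,1,1,0,0,...).

Then, as you iterate over the dictionary, build the same kind of array for each word and subtract it from the target word's array. If the difference array contains any negative value, the word cannot be formed from the letters of the target word.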
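
A minimal sketch of this counting approach, assuming words consist only of the letters a-z (case-insensitive). The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Locale;

class LetterCounts {
    // counts[0] is the number of 'a', counts[1] the number of 'b', and so on.
    static int[] count(String word) {
        int[] counts = new int[26];
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only a-z is supported: " + word);
            }
            counts[c - 'a']++;
        }
        return counts;
    }

    // True if every letter of word is available at least as often in target.
    static boolean canForm(String word, String target) {
        int[] available = count(target);
        int[] needed = count(word);
        for (int i = 0; i < 26; i++) {
            if (available[i] - needed[i] < 0) return false; // negative difference: letter missing
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(Arrays.copyOf(count("deaf"), 8)));
        // prints [1, 0, 0, 1, 1, 1, 0, 0]
        for (String w : new String[] {"hear", "hare", "sea", "see", "sarah"}) {
            System.out.println(w + " -> " + canForm(w, "share"));
        }
    }
}
```

In a real search the target array would be computed once outside the dictionary loop rather than on every call.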

answered 2013-02-25T20:28:04.403