1

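I am looking for a way to compare two strings, but not with a plain equals(). I need a metric that indicates how likely it is that the strings match. For example (the values are uncalculated guesses):

"Car" and "Car" returns 1.0

"Car dog" and "Car" returns 0.5

"Sitting" and "Sit" returns 0.45, etc.

Basically, I want a Java alternative to Python's difflib.SequenceMatcher ( http://docs.python.org/2/library/difflib.html ).

I have already looked at java-diff-utils, but I could not find a way to do this with it...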

4

2 Answers

4

You should take a look at Apache Commons Lang StringUtils, especially its Levenshtein distance.
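StringUtils.getLevenshteinDistance returns a raw edit count rather than a score between 0 and 1, so it needs normalizing before it resembles the metric in the question. The following is a minimal, dependency-free sketch of one way to do that. The class name LevenshteinSimilarity, the method names, and the choice to divide by the longer string's length are assumptions made for illustration; they are not part of Commons Lang.

```java
final class LevenshteinSimilarity {

    // Classic dynamic-programming edit distance, keeping only two rows in memory.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 when the
    // distance equals the length of the longer string.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }
}
```

With this normalization, LevenshteinSimilarity.similarity("Car", "Car") is 1.0, while "Sitting" against "Sit" comes out at about 0.43 (4 edits over 7 characters). Other normalizations are possible and will give different scores.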

answered 2013-05-21T14:49:43.530
2

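To get an approximation of what you are asking for, take the length of the string, use StringUtils.remove to strip your match pattern from it, then divide the difference between the original length and the remaining length by the original length.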

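// Assumes Apache Commons Lang on the classpath (for example org.apache.commons.lang3.StringUtils);
// the test below additionally needs org.junit.Test and org.junit.Assert.
// Returns the fraction of stringToMatch that is covered by occurrences of matchPattern.
public double matchString(final String stringToMatch, final String matchPattern) {

    final int testSize = stringToMatch.length();
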
    if (testSize == 0 && matchPattern.length() == 0) {
        return 1.0;
    } else if (testSize == 0) {
        return 0.0;
    }

    final String remainderString = StringUtils.remove(stringToMatch, matchPattern);
    final int remainderSize = remainderString.length();

    final double result = (double) (testSize - remainderSize) / (double) testSize;

    return result;
}

@Test
public void testMatchString() {

    final double emptyResult = matchString("", "");

    final double delta = 0.01;
    Assert.assertEquals(1.0, emptyResult, delta);

    final double emptyCarResult = matchString("", "Car");
    Assert.assertEquals(0.0, emptyCarResult, delta);

    final double carDogResult = matchString("CarDog", "Car");
    Assert.assertEquals(0.5, carDogResult, delta);

    final double carResult = matchString("Car", "Car");
    Assert.assertEquals(1.0, carResult, delta);

    final double carsCarResult = matchString("Cars", "Car");
    Assert.assertEquals(0.75, carsCarResult, delta);

    final double sittingSitResult = matchString("Sitting", "Sit");
    Assert.assertEquals(0.4286, sittingSitResult, delta);

    // no match, since the 'S' in Sitting is uppercase and the pattern "sit" is lowercase.
    // this can be fixed by lowercasing both stringToMatch and matchPattern
    // in matchString
    final double sittingSit2Result = matchString("Sitting", "sit");
    Assert.assertEquals(0.0, sittingSit2Result, delta);

    // note that the Sit match pattern matches two instances in 'Sitter Sitting'
    final double sittingSit3Result = matchString("Sitter Sitting", "Sit");
    Assert.assertEquals(0.4286, sittingSit3Result, delta);
}
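The comment in the test mentions lowercasing both arguments to make the match case-insensitive. A minimal sketch of that variant follows. To stay self-contained it swaps StringUtils.remove for the JDK's String.replace, and the names CaseInsensitiveMatch and matchStringIgnoreCase are hypothetical, chosen only for this illustration.

```java
import java.util.Locale;

final class CaseInsensitiveMatch {

    // Fraction of stringToMatch covered by occurrences of matchPattern, ignoring case.
    static double matchStringIgnoreCase(String stringToMatch, String matchPattern) {
        // Lowercase first and measure afterwards, because lowercasing can
        // change the length of a few non-ASCII characters.
        String haystack = stringToMatch.toLowerCase(Locale.ROOT);
        String needle = matchPattern.toLowerCase(Locale.ROOT);

        int testSize = haystack.length();
        if (testSize == 0) {
            return needle.isEmpty() ? 1.0 : 0.0;
        }
        if (needle.isEmpty()) {
            return 0.0; // an empty pattern removes nothing
        }

        // String.replace is used here in place of StringUtils.remove (an assumption
        // of this sketch); it deletes every literal occurrence of needle.
        String remainder = haystack.replace(needle, "");
        return (double) (testSize - remainder.length()) / testSize;
    }
}
```

With this variant, matchStringIgnoreCase("Sitting", "sit") gives about 0.43 instead of 0.0. Like the original, it only counts exact occurrences of the whole pattern, so it is not a general similarity measure.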
answered 2013-05-21T15:01:27.587