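My question: what is the fastest way to compare two strings? (Quality also matters, but less than speed.)
I am looking for the most efficient way to compare two strings. Some of the strings I compare can be more than 5,000 characters long, and I am comparing a list of about 80 strings against another list of about 200 strings. It takes a long time, even though I am running it on multiple threads. I am using the StringUtils.getLevenshteinDistance(String s, String t) method from Apache Commons.
My method is shown below. Is there a better way?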
    private void compareMe() {
        List<String> compareStrings = MainController.getInstance().getCompareStrings();
        for (String compare : compareStrings) {
            int levenshteinDistance = StringUtils.getLevenshteinDistance(me, compare);
            if (bestScore > levenshteinDistance
                    && levenshteinDistance > -1) {
                bestScore = levenshteinDistance; // global variable
                bestString = compare;            // global variable
            }
        }
    }
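One direction that might be worth trying is to give up on a candidate as soon as it can no longer beat the best score found so far. The following is only a minimal sketch in plain Java, not the Apache Commons implementation: the class name `ClosestMatch` and the methods `boundedDistance` and `closest` are hypothetical names made up for illustration. It returns -1 when the distance exceeds a limit, which is the convention the `> -1` check in the loop above already seems to anticipate.

```java
import java.util.List;

final class ClosestMatch {

    /** Levenshtein distance between s and t, or -1 if it exceeds maxDistance. */
    static int boundedDistance(String s, String t, int maxDistance) {
        // The difference in length is a lower bound on the distance.
        if (Math.abs(s.length() - t.length()) > maxDistance) {
            return -1;
        }
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Every alignment passes through this row, so the final
            // distance can never be smaller than the row minimum.
            if (rowMin > maxDistance) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int distance = prev[t.length()];
        return distance <= maxDistance ? distance : -1;
    }

    /** Returns the candidate with the smallest distance to me, or null if the list is empty. */
    static String closest(String me, List<String> candidates) {
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            // Only a distance strictly below the current best is interesting.
            int d = boundedDistance(me, candidate, bestScore - 1);
            if (d > -1) {
                bestScore = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

With this shape, a call such as `ClosestMatch.closest(me, compareStrings)` would replace the loop body. How much time it saves depends on how early a good match shows up in the list, so it is a guess rather than a measured improvement.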
Here is an example of two strings that should get a good score:
String 1:
SELECT
CORP_VENDOR_NAME as "Corporate Vendor Name",
CORP_VENDOR_REF_ID as "Reference ID",
MERCHANT_ID as "Merchant ID",
VENDOR_CITY as "City",
VENDOR_STATE as "State",
VENDOR_ZIP as "Zip",
VENDOR_COUNTRY as "Country",
REMIT_VENDOR_NAME as "Remit Name",
REMIT_VENDOR_REF_ID as " Remit Reference ID",
VENDOR_PRI_UNSPSC_CODE as "Primary UNSPSC"
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE VENDOR_REFERENCE_ID in
(SELECT distinct CORP_VENDOR_REF_ID
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE CORP_VENDOR_REF_ID = '${request.corp_vendor_id};')
String 2:
SELECT
CORP_VENDOR_NAME as "Corporate Vendor Name",
CORP_VENDOR_REF_ID as "Reference ID",
MERCHANT_ID as "Merchant ID",
VENDOR_CITY as "City",
VENDOR_STATE as "State",
VENDOR_ZIP as "Zip",
VENDOR_COUNTRY as "Country",
REMIT_VENDOR_NAME as "Remit Name",
REMIT_VENDOR_REF_ID as " Remit Reference ID",
VENDOR_PRI_UNSPSC_CODE as "Primary UNSPSC"
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE VENDOR_REFERENCE_ID in
(SELECT distinct CORP_VENDOR_REF_ID
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE CORP_VENDOR_REF_ID = 'ACQ-169013')
You'll notice that the only difference is the '${request.corp_vendor_id};' at the end of the string. This causes it to get a score of 26 from the LevenshteinDistance method.
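Since the two example strings are identical except for a short stretch near the end, one idea is to strip the common prefix and suffix before running the expensive part. Removing a shared prefix or suffix does not change the Levenshtein distance, so the quadratic work only touches the part that differs. The sketch below is plain Java under that assumption: `TrimmedLevenshtein`, `distance` and `plainDistance` are hypothetical names, and `plainDistance` is a simple two-row implementation standing in for the Apache Commons call.

```java
final class TrimmedLevenshtein {

    /** Levenshtein distance, computed only on the part of the strings that differs. */
    static int distance(String s, String t) {
        int start = 0;
        int sEnd = s.length();
        int tEnd = t.length();
        // Skip the common prefix.
        while (start < sEnd && start < tEnd && s.charAt(start) == t.charAt(start)) {
            start++;
        }
        // Skip the common suffix, without crossing back over the prefix.
        while (sEnd > start && tEnd > start && s.charAt(sEnd - 1) == t.charAt(tEnd - 1)) {
            sEnd--;
            tEnd--;
        }
        return plainDistance(s.substring(start, sEnd), t.substring(start, tEnd));
    }

    /** Textbook two-row dynamic-programming Levenshtein distance. */
    static int plainDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

For the differing tails alone, `TrimmedLevenshtein.distance("= '${request.corp_vendor_id};')", "= 'ACQ-169013')")` compares only `${request.corp_vendor_id};` against `ACQ-169013`. Those two pieces share no characters, so the result is 26, the length of the longer piece. For queries that differ throughout, the trimming would save little.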