I am trying to find the number of frequent items in a dataset. As a first step, I am trying to generate the subsets of each input line.
Input:
coke,cracker,beer
coke,cracker
What I have done so far is:
String[] transaction = value.toString().split(delim);
/*
 * Get subsets
 */
System.out.println("Transaction----" + Arrays.toString(transaction));
Arrays.sort(transaction);
int len = transaction.length;
long numofSubsets = (long) Math.pow(2, transaction.length);
for (long i = 1; i < numofSubsets; i++) {
    String j = String.format("%" + len + "s", Long.toBinaryString(i)).replace(' ', '0');
    String addVal = "";
    for (int l = 0; l < j.length(); l++) {
        if (j.charAt(l) == '0') {
            //do nothing
        }
        else {
            addVal += transaction[l] + delim;
            System.out.println("addval---------- " + addVal);
            addVal = addVal.substring(0, addVal.length() - 1);
        }
    }
}
The output is:
Transaction----[coke, cracker, beer]
addval---------- cracker
addval---------- coke
addval---------- coke
addval---------- coke,cracker
addval---------- beer
addval---------- beer
addval---------- beer,cracker
addval---------- beer
addval---------- beer,coke
addval---------- beer
addval---------- beer,coke
addval---------- beer,coke,cracker
Transaction----[coke, cracker]
addval---------- cracker
addval---------- coke
addval---------- coke
addval---------- coke,cracker
I want the subsets to be:
coke
cracker
beer
coke,cracker
coke,beer
cracker,beer
coke
cracker
coke,cracker
In my output, coke keeps getting repeated.
Am I doing something wrong?
Please advise.
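One possible reading of the output above: the println call sits inside the inner loop, so every partially built subset is printed as soon as an item is appended. That would explain why coke shows up once on its own and again just before coke,cracker. Below is a minimal sketch of the same bitmask idea that records a subset only after the inner loop has finished. The class name SubsetSketch and the helper subsets(...) are hypothetical, not from the code above; the sketch assumes the delimiter is a plain string such as "," with no regex metacharacters, and it keeps the items in input order instead of sorting them.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetSketch {

    // Hypothetical helper: returns every non-empty subset of the items on one line.
    // Assumes delim is a plain string (String.split treats it as a regex).
    static List<String> subsets(String line, String delim) {
        String[] items = line.split(delim);
        int n = items.length;
        if (n > 30) {
            // An int mask cannot enumerate more than 30 items safely.
            throw new IllegalArgumentException("too many items for an int bitmask");
        }
        List<String> result = new ArrayList<>();
        // Each mask from 1 to 2^n - 1 selects one non-empty subset.
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> chosen = new ArrayList<>();
            for (int k = 0; k < n; k++) {
                // Bit k of the mask decides whether items[k] is in this subset.
                if ((mask & (1 << k)) != 0) {
                    chosen.add(items[k]);
                }
            }
            // Record the subset only once it is complete, never mid-loop.
            result.add(String.join(delim, chosen));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : subsets("coke,cracker,beer", ",")) {
            System.out.println(s);
        }
    }
}
```

For a line with n items this yields 2^n - 1 subsets, including the full set itself, and each one appears exactly once. The order follows the bitmask, not subset size, so the results would need sorting by length if singles must come before pairs.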