Since you are going to sort words, I assume all characters ascii values are in the range 0-255. Then you can do a Counting Sort over the words.
The counting sort is going to take the same amount of time as the size of the input word. Reconstruction of the string obtained from counting sort will take O(wordlen). You cannot make this step less than O(wordLen) because you will have to iterate the string at least once ie O(wordLen). There is no predefined order. You cannot make any assumptions about the word without iterating though all the characters in that word. Traditional sorting implementations(ie comparison based ones) will give you O(n * lg n). But non comparison ones give you O(n).
Iterate over all the words of the list and sort them using our counting sort. Keep a map of
sorted words to the list of known words they map. Addition of elements to a list takes constant time. So overall the complexity of the algorithm is O(n * avgWordLength).
Here is a sample implementation
import java.util.ArrayList;
public class ClusterGen {
static String sortWord(String w) {
int freq[] = new int[256];
for (char c : w.toCharArray()) {
freq[c]++;
}
StringBuilder sortedWord = new StringBuilder();
//It is at most O(n)
for (int i = 0; i < freq.length; ++i) {
for (int j = 0; j < freq[i]; ++j) {
sortedWord.append((char)i);
}
}
return sortedWord.toString();
}
static Map<String, List<String>> cluster(List<String> words) {
Map<String, List<String>> allClusters = new HashMap<String, List<String>>();
for (String word : words) {
String sortedWord = sortWord(word);
List<String> cluster = allClusters.get(sortedWord);
if (cluster == null) {
cluster = new ArrayList<String>();
}
cluster.add(word);
allClusters.put(sortedWord, cluster);
}
return allClusters;
}
public static void main(String[] args) {
System.out.println(cluster(Arrays.asList("tea", "eat", "abba", "aabb", "hello")));
System.out.println(cluster(Arrays.asList("moon", "bat", "meal", "tab", "male")));
}
}
Returns
{aabb=[abba, aabb], ehllo=[hello], aet=[tea, eat]}
{abt=[bat, tab], aelm=[meal, male], mnoo=[moon]}