我有一本放在2-3-4 树中的单词词典。单词有不同的长度。使用电话键盘,我需要找到可以响应特定电话号码的所有可能单词。给定键盘:
例如,数字 26678837 可能是单词“COMPUTER”,但也可能是另一个单词。鉴于我的所有单词都在 2-3-4 树中,为了从给定的电话号码中找到所有可能的单词,最好的算法或搜索方法是什么?
我有一本放在2-3-4 树中的单词词典。单词有不同的长度。使用电话键盘,我需要找到可以响应特定电话号码的所有可能单词。给定键盘:
例如,数字 26678837 可能是单词“COMPUTER”,但也可能是另一个单词。鉴于我的所有单词都在 2-3-4 树中,为了从给定的电话号码中找到所有可能的单词,最好的算法或搜索方法是什么?
这里的关键是获取所有不同的字符组合,这些组合可以从与该数字从左到右序列中提供的数字相关的字符中派生出来。显然,提供的数字越大,组合就越多。例如,您在帖子中提供的26678837的数量可以生成11,664 个唯一字符组合(字符串单词)。
2 6 6 7 8 8 3 7
abc mno mno pqrs tuv tuv def pqrs
这 11,664 个生成的字符串单词中的每一个都需要通过一个本身可能包含 170,000 个(或更多)单词并检查(不区分大小写)相等性的字典。这不会是一个过快的过程,因为这相当于近 20 亿次平等检查。在我使用了 10 年的桌面 Windows 机器上,这大约需要 1 分 20 秒。
在任何情况下,下面提供的方法都采用提供的字符串编号,生成不同的字符组合(字符词)并将每个组合与提供的字典(词)文本文件中的词进行比较。阅读附加到该方法的文档:
/**
* Converts the supplied string telephone number (or any string of integer
* numbers) to possible words acquired from within the supplied dictionary
* text file.<br><br>
*
* Specific digits relate to specific alpha characters as they do in a telephone
* keypad, for example:<pre>
*
* 1 2 3 4 5 6 7 8 9 0
* ABC DEF GHI JKL MNO PQRS TUV WXYZ
* </pre>
*
* Digits 1 and 0 do not relate to any alpha characters. By looking at the
* above scale you can see how a number like "26678837" can generate 11,664
* unique character combinations (string words). Each one of these combinations
* needs to be check against the supplied dictionary (or word) text file to see
* if matches a specific dictionary word. If it does then we store that word
* and carry on to the next character combination. If the dictionary (word)
* text file contains 170,000 words and there are 11,664 different character
* combinations then a simple calculation (170,000 x 11,664) determines that
* there will need to be 1,982,880,000 (almost 2 billion) inspections done for
* equality. Because of this, many (sometimes thousands) of Executor Service
* threads are used to speed up the processing task. At the end of the day
* however, task speed completely relies on the speed of the Hard Disk Drive
* (HDD) and too many threads can actually degrade performance. It's a tough
* balance. Modify things as you see fit.
*
*
*
* @param dictionaryFilePath (String) The path and file name of the dictionary
* file to gather possible words from. This dictionary (or word) file must
* contain a single word on each line of the file. There can not be more than
* one word on any given file line. Blank lines within the file are ignored.
* Letter case is always ignored during processing however any words found and
* returned will be in Upper Letter Case.<br><br>
*
* If you don't currently have a dictionary or word text file available then
* you can easily acquire one from the Internet.<br>
*
* @param telephoneNumber (String) The telephone number (or any number) to
* convert to a dictionary word. Any non-digit characters within the supplied
* string are automatically removed so that only digits remain. The longer
* the number string, the longer it takes to process this number to a word.
* There is no guarantee that the number you supply will produce a word from
* within the supplied dictionary (word) file.<br>
*
* @param showProcess (boolean) If boolean 'true' is supplied then the processing
* is displayed within the console window. This however slows things down
* considerably but does give you an idea what is going on. Because larger
* string numbers can take some time to complete processing, a modal 'Please
* Wait' dialog with a progress bar is displayed in the middle of your screen.
* When processing has completed this dialog window is automatically closed.<br>
*
* @return (String[] Array) A String[] Array of the word or words from the
* supplied dictionary (word) file which the supplied number correlates to.
*/
public static String[] telephoneNumberToDictionaryWords(final String dictionaryFilePath,
final String telephoneNumber, final boolean showProcess) {
long timeStart = System.currentTimeMillis();
if (dictionaryFilePath == null || dictionaryFilePath.isEmpty()) {
System.err.println("The argument for dictionaryFilePath can not be null or empty!");
return null;
}
else if (telephoneNumber == null || telephoneNumber.trim().isEmpty()) {
System.err.println("The argument for telephoneNumber can not be null or empty!");
return null;
}
else if (!new File(dictionaryFilePath).exists()) {
System.err.println("File Not Found! ("
+ new File(dictionaryFilePath).getAbsolutePath() + ")");
return null;
}
String digits = String.valueOf(telephoneNumber).replaceAll("[^\\d]", "");
if (digits.isEmpty()) {
return null;
}
List<String> foundList = new ArrayList<>();
// ================== The 'Please Wait' Dialog Window ==================
// Depending on the number supplied, this process can take a while to
// complete therefore put a popup 'Please Wait...' window into play
// here.
final javax.swing.JFrame iframe = new javax.swing.JFrame();
iframe.setAlwaysOnTop(true);
iframe.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
iframe.setLocationRelativeTo(null);
final JDialog workingDialog = new JDialog(iframe);
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.WAIT_CURSOR));
final JPanel p1 = new JPanel(new BorderLayout());
javax.swing.border.Border lineBorder = javax.swing.BorderFactory.createLineBorder(Color.red, 4);
p1.setBorder(lineBorder);
p1.setBackground(Color.BLACK);
p1.setPreferredSize(new java.awt.Dimension(255, 150));
final JLabel label = new JLabel();
label.setHorizontalAlignment(JLabel.CENTER);
label.setVerticalAlignment(JLabel.CENTER);
label.setText("<html><font size='5',color=#14e5eb><b><center>Please Wait..."
+ "</center></b></font><br><center>This can take a little "
+ "while.</center></html>");
label.setForeground(Color.WHITE);
p1.add(label, BorderLayout.CENTER);
final JLabel label2 = new JLabel();
label2.setHorizontalAlignment(JLabel.CENTER);
label2.setVerticalAlignment(JLabel.CENTER);
label2.setText("Hello World");
label2.setForeground(Color.WHITE);
p1.add(label2, BorderLayout.BEFORE_FIRST_LINE);
final javax.swing.JProgressBar progressBar = new javax.swing.JProgressBar();
progressBar.setPreferredSize( new java.awt.Dimension (190, 25));
progressBar.setMaximumSize(new java.awt.Dimension(190,25));
progressBar.setStringPainted(true);
p1.add(progressBar, BorderLayout.SOUTH);
workingDialog.setUndecorated(true);
workingDialog.getContentPane().add(p1);
workingDialog.pack();
workingDialog.setLocationRelativeTo(iframe);
workingDialog.setDefaultCloseOperation(JDialog.DISPOSE_ON_CLOSE);
workingDialog.setModal(true);
// =====================================================================
SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
@Override
protected String doInBackground() throws InterruptedException {
// Get the number of words contained within the supplied dictionary file.
long numberOfDictionaryWords = 0L;
// This will auto-close the stream.
try (java.util.stream.Stream<String> linesStream
= java.nio.file.Files.lines(java.nio.file.Paths.get(dictionaryFilePath))) {
numberOfDictionaryWords = linesStream.count();
}
catch (IOException ex) {
System.err.println(ex);
}
if (showProcess) {
System.out.println("The number of words in dictionary: --> "
+ numberOfDictionaryWords);
}
// Map the alpha characters related to telephone digits
HashMap<Character, String> map = new HashMap<>();
map.put('0', "");
map.put('1', "");
map.put('2', "abc");
map.put('3', "def");
map.put('4', "ghi");
map.put('5', "jkl");
map.put('6', "mno");
map.put('7', "pqrs");
map.put('8', "tuv");
map.put('9', "wxyz");
// Get all the character combinations available for the
// telelphone number (or just number) digits provided.
List<String> phoneCharactersList = new ArrayList<>();
phoneCharactersList.add("");
for (int i = 0; i < digits.length(); i++) {
if (digits.charAt(i) == '0' || digits.charAt(i) == '1') {
continue;
}
ArrayList<String> tempList = new ArrayList<>();
String option = map.get(digits.charAt(i));
for (int j = 0; j < phoneCharactersList.size(); j++) {
for (int p = 0; p < option.length(); p++) {
tempList.add(new StringBuilder(phoneCharactersList.get(j))
.append(option.charAt(p)).toString());
}
}
phoneCharactersList.clear();
phoneCharactersList.addAll(tempList);
}
if (showProcess) {
System.out.println("The number of generated words to process: --> "
+ phoneCharactersList.size());
}
progressBar.setMinimum(0);
progressBar.setMaximum(phoneCharactersList.size());
/* Iterate through all the character combinations now contained
within the 'phoneCharactersList' and compare them to the words
contained within the dictionary file. Store found words into
the foundList List Interface object.
Declare concurrent executor service threads (one for each possible
word to check for in dictionary file). */
java.util.concurrent.ExecutorService executor =
java.util.concurrent.Executors.newFixedThreadPool(phoneCharactersList.size());
for (int i = 0; i < phoneCharactersList.size(); i++) {
String wordToFind = phoneCharactersList.get(i);
label2.setText("<html><center>Processing word: <font color=yellow>"
+ wordToFind + "</font></center></html>");
if (showProcess) {
System.out.println((i + 1) + ") Processing the possible word: " + wordToFind);
}
Runnable threadWorker = new Runnable() {
@Override
public void run() {
try (java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(dictionaryFilePath))) {
String line;
while ((line = br.readLine()) != null) {
line = line.trim();
if (line.isEmpty() || (line.length() != wordToFind.length())) {
continue;
}
if (line.equalsIgnoreCase(wordToFind)) {
foundList.add(wordToFind.toUpperCase());
break;
}
}
}
catch (Exception ex) {
System.err.println(ex);
}
}
};
executor.execute(threadWorker);
progressBar.setValue(i+1);
}
// Shut down Executor Service threads (when they are done)
executor.shutdown();
// Hold rest of code processing until are executors are terminated.
while (!executor.isTerminated()) {
}
return null;
}
@Override
protected void done() {
workingDialog.dispose();
iframe.dispose();
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.DEFAULT_CURSOR));
}
};
worker.execute();
workingDialog.setVisible(true);
try {
worker.get();
}
catch (Exception e1) {
e1.printStackTrace();
}
java.util.Collections.sort(foundList);
long timeEnd = System.currentTimeMillis();
if (showProcess) {
long seconds = (timeEnd - timeStart) / 1000;
String timeSpan = new StringBuffer("").append(String.valueOf(seconds))
.append(" seconds").toString();
if (seconds > 60) {
timeSpan = new StringBuffer("").append(String.valueOf(seconds/60))
.append(" minutes and ").append(String.valueOf(seconds%60))
.append(" seconds").toString();
}
System.out.println(new StringBuffer("It took ").append(timeSpan)
.append(" to process the number: ").append(telephoneNumber)
.toString());
}
return foundList.toArray(new String[foundList.size()]);
}
修改您认为合适的代码,以便利用您的 2-3-4 树、B-Tree 或任何您喜欢的。要使用此方法,您可以这样做:
String phoneNumber = "26678837";
String[] words = telephoneNumberToDictionaryWords("dictionary.txt", phoneNumber, false);
if (words == null || words.length == 0) {
System.out.println("There are no dictionary words available for "
+ "the Phone Number: " + phoneNumber);
}
else {
for (String str : words) {
System.out.println(str);
}
}
在 youtube 上,我观看了一个非常有趣的谷歌采访,其中编码器使用了这个算法。