puzzle - 翻译没有空格的摩尔斯电码

Question

我有一些摩尔斯电码丢失了字母之间的空格，我的挑战是找出信息的内容。到目前为止，由于可能存在大量组合，我有点迷路了。

这是我收到的消息的所有信息。

输出将是英文
总会有一个有意义的翻译
这是示例消息-..-...-...-...-..-.-.-.-.-..-.-.-.-.-.-.-.-.-.-..-...-.
消息不应超过 70 个字符
莫尔斯电码取自较长的流，因此第一组或最后一组可能被切断，因此没有有效的翻译

有没有人有一个聪明的解决方案？

score 8 · Accepted Answer

这不是一个容易的问题，因为正如 ruakh 所建议的，给定消息有许多可行的句子。例如，“JACK AND JILL WENT UP THE HILL”与“JACK AND JILL WALK CHISELED”具有相同的编码。由于这些都是语法句子并且每个单词中的单词都很常见，因此在不深入研究自然语言的情况下如何选择一个或另一个（或任何其他 40141055989476564163599 个与此消息具有相同编码的不同英语单词序列中的任何其他序列）并不明显加工。

无论如何，这里有一个动态编程解决方案来解决寻找最短句子的问题（如果有平局，则使用最少的字符）。它还可以计算与给定消息具有相同编码的句子总数。它需要一个文件中的英语单词词典。

下一个增强应该更好地衡量一个句子的可能性：也许是词频、莫尔斯电码的误报率（例如，“I”是一个常见词，但它经常作为其他莫尔斯电码序列的一部分出现） . 棘手的部分将是制定一个好的分数函数，该函数可以用动态编程计算的方式表示。

MORSE = dict(zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', [
    '.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....',
    '..', '.---', '-.-', '.-..', '--', '-.', '---', '.--.',
    '--.-', '.-.', '...', '-', '..-', '...-', '.--', '-..-',
    '-.--', '--..'
]))

# Read a file containing A-Z only English words, one per line.
WORDS = set(word.strip().upper() for word in open('dict.en').readlines())
# A set of all possible prefixes of English words.
PREFIXES = set(word[:j+1] for word in WORDS for j in xrange(len(word)))

def translate(msg, c_sep=' ', w_sep=' / '):
    """Turn a message (all-caps space-separated words) into morse code."""
    return w_sep.join(c_sep.join(MORSE[c] for c in word)
                      for word in msg.split(' '))

def encode(msg):
    """Turn a message into timing-less morse code."""
    return translate(msg, '', '')

def c_trans(morse):
    """Construct a map of char transitions.

    The return value is a dict, mapping indexes into the morse code stream
    to a dict of possible characters at that location to where they would go
    in the stream. Transitions that lead to dead-ends are omitted.
    """
    result = [{} for i in xrange(len(morse))]
    for i_ in xrange(len(morse)):
        i = len(morse) - i_ - 1
        for c, m in MORSE.iteritems():
            if i + len(m) < len(morse) and not result[i + len(m)]:
                continue
            if morse[i:i+len(m)] != m: continue
            result[i][c] = i + len(m)
    return result

def find_words(ctr, i, prefix=''):
    """Find all legal words starting from position i.

    We generate all possible words starting from position i in the
    morse code stream, assuming we already have the given prefix.
    ctr is a char transition dict, as produced by c_trans.
    """
    if prefix in WORDS:
        yield prefix, i
    if i == len(ctr): return
    for c, j in ctr[i].iteritems():
        if prefix + c in PREFIXES:
            for w, j2 in find_words(ctr, j, prefix + c):
                yield w, j2

def w_trans(ctr):
    """Like c_trans, but produce a word transition map."""
    result = [{} for i in xrange(len(ctr))]
    for i_ in xrange(len(ctr)):
        i = len(ctr) - i_ - 1
        for w, j in find_words(ctr, i):
            if j < len(result) and not result[j]:
                continue
            result[i][w] = j
    return result

def shortest_sentence(wt):
    """Given a word transition map, find the shortest possible sentence.

    We find the sentence that uses the entire morse code stream, and has
    the fewest number of words. If there are multiple sentences that
    satisfy this, we return the one that uses the smallest number of
    characters.
    """
    result = [-1 for _ in xrange(len(wt))] + [0]
    words = [None] * len(wt)
    for i_ in xrange(len(wt)):
        i = len(wt) - i_ - 1
        for w, j in wt[i].iteritems():
            if result[j] == -1: continue
            if result[i] == -1 or result[j] + 1 + len(w) / 30.0 < result[i]:
                result[i] = result[j] + 1 + len(w) / 30.0
                words[i] = w
    i = 0
    result = []
    while i < len(wt):
        result.append(words[i])
        i = wt[i][words[i]]
    return result

def sentence_count(wt):
    result = [0] * len(wt) + [1]
    for i_ in xrange(len(wt)):
        i = len(wt) - i_ - 1
        for j in wt[i].itervalues():
            result[i] += result[j]
    return result[0]

msg = 'JACK AND JILL WENT UP THE HILL'
print sentence_count(w_trans(c_trans(encode(msg))))
print shortest_sentence(w_trans(c_trans(encode(msg))))

score 0 · Accepted Answer

我不知道这是否“聪明”，但我会尝试广度优先搜索（与 BRPocock 的正则表达式理念中隐含的深度优先搜索相反）。假设您的字符串如下所示：

.---.--.-.-.-.--.-...---...-...-..
J   A C   K  A N D  J   I L   L

你从状态开始('', 0)（''到目前为止你已经解码；0作为你在摩尔斯电码字符串中的位置）。从位置零开始，可能的初始字符是. E、.- A、.-- W、.--- J和.---- 1。因此，将状态('E', 1)、('A', 2)、('W', 3)、('J', 4)和('1', 5)推送到您的队列中。出队状态后('E', 1)，您将入队状态('ET', 2),('EM', 3)和('EO', 4)。

现在，您的可能状态队列将很快增长 - { ., -} 都是字母，所有 { .., .-, -., --} 和所有 { ..., ..-, .-., .--, -.., -.-, --., } 都是字母---，所以在每次传递中你的状态数将增加至少三倍——所以你需要有一些用户反馈机制。特别是，您需要某种方式来询问您的用户“这个字符串是否以开头EOS3AIOSF？”，如果用户说“否”，您将需要丢弃状态("EOS3AIOSF", 26)从您的队列中。理想的情况是向用户提供一个 GUI，该 GUI 每隔一段时间会显示所有当前状态并让他/她选择哪些值得继续。（当然，“用户”也将是你。英语缺少代词：如果“你”指的是程序，那么什么代词指的是用户程序员？！）

score 0 · Accepted Answer

维护 3 样东西：到目前为止的单词列表 S、到目前为止的当前单词 W 和当前符号 C。

S 应该只是好词，例如。'快速'
W 应该是一个单词的有效前缀，例如。['兄弟']
C 应该是某个字母的有效前缀，例如。'.-'

现在，给定一个新符号，比如说'-'，我们用它扩展C（在这种情况下，我们得到'.--'）。如果 C 是一个完整的字母（在这种情况下是字母 'W'），我们可以选择将它添加到 W 中，或者通过添加更多符号来进一步扩展字母。如果我们扩展 W，我们可以选择将它添加到 S（如果它是一个有效的词），或者继续进一步扩展它。

这是一个搜索，但大多数路径会很快终止（只要 W 不是您可以停止的任何单词的有效前缀，并且只要 C 不是您可以停止的任何字母的前缀）。

为了提高效率，您可以使用动态编程来避免冗余工作并使用尝试来有效地测试前缀。

代码可能是什么样的？省略测试字符串是否为英文单词的函数“is_word”和测试字符串是否是任何有效单词的开头的函数“is_word_prefix”，如下所示：

morse = {
    '.-': 'A',
    '-...': 'B',
    etc.
}

def is_morse_prefix(C):
    return any(k.startswith(C) for k in morse)

def break_words(input, S, W, C):
    while True:
        if not input:
            if W == C == '':
                yield S
            return
        i, input = input[0], input[1:]
        C += i
        if not is_morse_prefix(C):
            return
        ch = morse.get(C, None)
        if ch is None or not is_word_prefix(W + ch):
            continue
        for result in break_words(input, S, W + ch, ''):
            yield result
        if is_word(W + ch):
            for result in break_words(input, S + ' ' + W + ch, '', ''):
                yield result

for S in break_words('....--', [], '', ''):
    print S

puzzle - 翻译没有空格的摩尔斯电码

3 回答 3

Related

Reference