0

PS如果增补删减权重不同。比有什么算法可以帮助我。

或者,如果添加/删除和替换的权重不同,Wagner-Fischer 算法需要进行哪些修改以最小化编辑距离?

4

3 回答 3

0

到目前为止,我所知道的最理想的是Levenshtein,您也可以查看此出版物。希望能帮助到你 :)

于 2014-10-06T18:12:55.573 回答
0

不知道你知不知道,但由于编辑距离 PD 中的每一行只取决于前一行,所以你可以只保留最后两行。通过这种方式,您可以实现 O(n) 空间复杂度,而不是简单实现中的 O(n^2)。

Python 中的示例(假设替换成本为 2,添加成本为 3,删除成本为 5):

def levenshtein(s1, s2):
    A = [0]*(len(s2)+1)
    B = range(len(s2)+1)
    for i, c1 in enumerate(s1, 1):
        A[0] = i
        for j, c2 in enumerate(s2, 1):
            if c1 == c2:
                A[j] = B[j-1]
            else:
                A[j] = min(B[j-1]+2, A[j-1]+3, B[j]+5)
        A,B = B,A

    return B[len(s2)]

print levenshtein('kitten', 'sitting')
于 2014-10-06T18:39:50.463 回答
0

我用插入、删除和替换(cins、cdel、csub)的权重修改了我的 Wagner Fischer 实现。

#!/bin/bash

set -f

# based on https://github.com/osteslag/Changeset/blob/master/Sources/Changeset.swift
# https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm

declare A="the quick brown fox jumps  over a lazy dog"
declare B="the quick brown fox jumped        lazy dog and a brown and black dog"

declare RET

join(){
  local A="$1"
  RET="$A"
  RET="${RET//[?] [?]/ }"
  RET="${RET//[+] [+]/ }"
  RET="${RET//[-] [-]/ }"
  return 0
}

edits(){
  local  -a    A=("" ${1//[^a-zA-Z0-9]/ })  # remove all punctuation FIXME: could surround with space
  local  -a    B=("" ${2//[^a-zA-Z0-9]/ })
  local -i  rows=${#A[*]}
  local -i  cols=${#B[*]}
  local  -a pROW=("")
  local  -a cROW
  local -ia pdst=(0)
  local -ia cdst=(0)
  local -i  min r c
  local -i  ins cins=1
  local -i  del cdel=1
  local -i  sub csub=10

  # fill first row of insertions

  for((c=1; c<cols; c++)); do         # for each target +1
      pROW[c]="${pROW[c-1]} +\e[42m${B[c]}\e[m+"
    ((pdst[c]=c*cins))
  done

  ((rows==0)) && return 1

  for((r=1; r<rows; r++)); do

    # first column are deletions to get ""

      cROW[0]="${pROW[0]} -\e[41m${A[r]}\e[m-"
    ((cdst[0]=pdst[0]+cdel))

    ((cols>0)) && {

      #  X  0  T1  T2  T3
      #  0  0  i   i   i
      # S1  d  
      # S2  d

      #     c-1 c
      # r-1 SUB DEL
      # r   INS 

      for((c=1; c<cols\; c++)); do
        if [[ "${A[r]}" = "${B[c]}" ]]; then  # source and target match - no operation
            cROW[c]="${pROW[c-1]} ${A[r]}"
          ((cdst[c]=   pdst[c-1]))
        else
          ((ins=cdst[c-1], sub=pdst[c-1], del=pdst[c] ))  # 
          ((min= (del<=ins) ? ((del<=sub)?del:sub) : ins))
            if ((del==min)); then ((cdst[c]=min+cdel)); cROW[c]="${pROW[c  ]} -\e[41m${A[r]}\e[m-"
          elif ((ins==min)); then ((cdst[c]=min+cins)); cROW[c]="${cROW[c-1]} +\e[42m${B[c]}\e[m+"
          else                    ((cdst[c]=min+csub)); cROW[c]="${pROW[c-1]} ?\e[41m${A[r]}\e[m-\e[42m${B[c]}\e[m?"
          fi
          #((cdst[c]=min+1))
        fi
      done
    }
    pROW=("${cROW[@]}")
    pdst=( ${cdst[*]} )
  done

#  printf "%s "    "${pROW[@]}"    ; printf "\n"
#  printf "%s\n\n" "${pROW[@]: -1}"
  join "${pROW[*]: -1}"
  printf "%b\n\n" "$RET"
  return 0
}

edits "s e t t i n g" "k i t t e n"

edits "$A" "$B"
exit 0

输出:

$ ./pc-wf.bash
 -s e- +k i+ t t -i n- +e+ ?g-n?

 the quick brown fox -jumps over- +jumped lazy dog and+ a +brown and+ ?lazy-black? dog
于 2018-11-14T15:04:02.297 回答