vim - Vim, word frequency function and French accents

Question

I recently discovered Vim Tip n° 1531 (Word frequency statistics for a file).

As suggested, I put the following code in my .vimrc:

function! WordFrequency() range
  let all = split(join(getline(a:firstline, a:lastline)), '\A\+')
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key,value] in items(frequencies)
    call append('$', key."\t".value)
  endfor
  sort i
endfunction
command! -range=% WordFrequency <line1>,<line2>call WordFrequency()

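It works fine except for accents and other French specifics (the Latin small ligatures æ and œ, etc.).

What should I add to this function to make it suit my needs?

Thanks in advance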

score 3 · Accepted Answer

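The pattern \A\+ matches any number of consecutive non-alphabetic characters, which unfortunately includes multibyte characters like our beloved çàéô and friends.

This means your text is split at spaces and at multibyte characters.

With \A\+, the phrase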

Rendez-vous après l'apéritif.

gives:

ap      1
apr     1
l       1
Rendez  1
ritif   1
s       1
vous    1
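The split itself can be checked in isolation, without running the whole function. A minimal sketch, assuming a UTF-8 'encoding' and the same sample phrase; it can be typed on the command line or sourced from a scratch file:

```vim
" \A matches any character that is not an ASCII letter, so accented
" letters act as separators just like spaces, hyphens and quotes.
echo split("Rendez-vous après l'apéritif.", '\A\+')
" ['Rendez', 'vous', 'apr', 's', 'l', 'ap', 'ritif']
```

The fragments are the same ones that show up in the frequency table above, before sorting.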

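If you are sure your text doesn't contain fancy spaces, you can replace this pattern with \s\+, which matches only whitespace, though it may be too permissive because punctuation stays attached to the words.

With the \s\+ pattern, the same phrase gives: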

après       1
l'apéritif. 1
Rendez-vous 1

I think that's closer to what you want.

Some customization may be needed to exclude punctuation.
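One possible customization, as a rough sketch rather than a definitive fix: split on whitespace, then trim punctuation from both ends of each item so that hyphens and apostrophes inside a word survive. The variable names are illustrative, and [:punct:] is assumed to cover ASCII punctuation only.

```vim
let phrase = "Rendez-vous après l'apéritif."
let words = split(phrase, '\s\+')
" Strip leading and trailing punctuation from every item; characters
" in the middle of a word (the hyphen, the apostrophe) are left alone.
call map(words, 'substitute(v:val, ''^[[:punct:]]\+\|[[:punct:]]\+$'', "", "g")')
" Drop items that consisted only of punctuation.
call filter(words, 'v:val !=# ""')
echo words
```

For the sample phrase this should leave Rendez-vous, après and l'apéritif, without the trailing period. Typographic punctuation outside ASCII (guillemets, for instance) would need to be added to the pattern by hand.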

score 3 · Accepted Answer

For 8-bit characters, you can try changing the split pattern from \A\+ to [^[:alpha:]]\+.

answered 2011-09-23T11:18:35.707

score 0 · Accepted Answer

function! WordFrequency() range
  " Whitespace and all punctuation characters except dash and single quote
  let wordSeparators = '[[:blank:],.;:!?%#*+^@&/~_|=<>\[\](){}]\+'
  let all = split(join(getline(a:firstline, a:lastline)), wordSeparators)
  "...
endfunction

If all punctuation characters should be word separators, the expression shortens to

let wordSeparators = '[[:blank:][:punct:]]\+'
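For reference, here is how that separator might slot into the function from the question. This is only a sketch: the body is copied from the question's code, and the name WordFrequencyPunct is hypothetical, chosen so that it does not overwrite the original definition.

```vim
" Sketch: word frequency with blanks and punctuation as separators.
" WordFrequencyPunct is a hypothetical name, not part of the Vim tip.
function! WordFrequencyPunct() range
  let wordSeparators = '[[:blank:][:punct:]]\+'
  let all = split(join(getline(a:firstline, a:lastline)), wordSeparators)
  " Count occurrences of each word.
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor
  " Show the result in a scratch buffer, one "word<Tab>count" per line.
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key, value] in items(frequencies)
    call append('$', key . "\t" . value)
  endfor
  sort i
endfunction
command! -range=% WordFrequencyPunct <line1>,<line2>call WordFrequencyPunct()
```

Running :WordFrequencyPunct on a buffer, or on a visual selection, opens the table in a new window. Because accented letters are neither blank nor punctuation, words such as après stay whole, but Rendez-vous and l'apéritif are split at the hyphen and the apostrophe.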
