emacs - Emacs: How do I generate a word list for a document?

Question

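I want to use RefTeX to generate an index for a LaTeX document, following the advice in the RefTeX manual:

"...you may want to start from a word list of the document and remove all words which should not be in the index." (-> collecting phrases for the index phrases file).

Now I am wondering: how do I generate such a word list for my multi-file LaTeX document? I could not find an answer in the Emacs manual or on the web. But Emacs must be able to do this, right?

Thanks for any hints.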

score 1 · Accepted Answer

A quick way to get started (on the command line, not in Emacs):

sed 's/  */\n/g' < myDocument.txt | sort -f | uniq > wordListToEdit.txt
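For a multi-file LaTeX project, the same idea (split into words, sort case-insensitively, drop duplicates) can be sketched in Python. This is only a rough sketch under stated assumptions: the names words_in and word_list are made up for illustration, command names such as \section are stripped with a simple regular expression, and the text inside braces (including environment names like "document") is kept, so the result still needs pruning by hand.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; not part of Emacs or RefTeX.
LATEX_COMMAND = re.compile(r"\\[A-Za-z]+\*?")        # e.g. \section, \textbf
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")    # letters, inner - or ' allowed


def words_in(text):
    """Return the set of words in text, skipping comments and command names."""
    words = set()
    for line in text.splitlines():
        line = re.sub(r"(?<!\\)%.*", "", line)   # drop LaTeX comments (unescaped %)
        line = LATEX_COMMAND.sub(" ", line)      # drop command names, keep arguments
        words.update(WORD.findall(line))
    return words


def word_list(paths):
    """Collect the words of all files, sorted case-insensitively, no duplicates."""
    words = set()
    for path in paths:
        words |= words_in(Path(path).read_text(encoding="utf-8"))
    return sorted(words, key=lambda w: (w.casefold(), w))
```

Calling word_list(["chapter1.tex", "chapter2.tex"]) (file names assumed) and writing the result one word per line gives a file comparable to wordListToEdit.txt above.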

score 0 · Accepted Answer

I found a solution that is independent of Emacs, but it produces a file with all tokens found in the document(s). I marked all the .tex files of my LaTeX project in Emacs Dired and then used

! myshellscript

to run the following script on each of them. You can find more information about NLTK and Python here: http://www.nltk.org/

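#!/usr/bin/env bash
# Show the script name and the file being processed.
echo "$0"
echo "$1"

# Tokenize the file with NLTK and append the token list to ./tok.
# Needs Python 3 with nltk and its tokenizer data (see nltk.download).
python3 -c '
import sys
import nltk

with open(sys.argv[1], encoding="utf-8") as f:
    raw = f.read()
print(nltk.word_tokenize(raw))
' "$1" >> tok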
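The tok file then holds one printed Python list per processed file, including braces, punctuation and LaTeX commands. A possible follow-up step, sketched here with a hypothetical function name (words_from_token_lines) and using only the standard library, is to read those lists back and keep the purely alphabetic tokens as a sorted word list. Note the assumption that str.isalpha is a good enough filter: it also drops hyphenated words and words with apostrophes.

```python
import ast


def words_from_token_lines(lines):
    """Turn printed token lists (one Python list per line) into a sorted word list."""
    words = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue                          # skip blank lines
        for token in ast.literal_eval(line):  # parse the printed list safely
            if token.isalpha():               # keep letters-only tokens
                words.add(token)
    return sorted(words, key=lambda w: (w.casefold(), w))
```

With the file opened as f, "\n".join(words_from_token_lines(f)) gives the list one word per line, ready to be edited down to the index phrases.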
