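I'm trying to read TextGrid files into NLTK, but I'm running into some trouble. I know there is a TextGrid parser (as seen here: http://nltk.googlecode.com/svn/trunk/nltk_contrib/nltk_contrib/textgrid.py ).
Unfortunately, I'm new to NLTK and can't figure out how to use the parser.
Any help would be greatly appreciated.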
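Unfortunately, knowing NLTK wouldn't help here: I took a look at the textgrid source, and although it was written by NLTK's core team, it has nothing in common with the other NLTK "corpus readers". I recommend studying the file header in the source and experimenting a bit; the documentation there should be enough.
To get you started: it looks like you load a TextGrid file by passing an open file pointer to the constructor of the TextGrid class: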
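from nltk_contrib.textgrid import TextGrid  # assumes nltk_contrib is importable

# If the constructor expects the file's contents rather than a file object, pass fp.read() instead.
with open("grid_file.praat") as fp:
    grid = TextGrid(fp)
    for tier in grid:
        print(tier)  # do something with the Tier object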
PS. This isn't a very complete answer, but I couldn't include a code snippet in a comment.
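If the nltk_contrib module turns out to be awkward to use, a standalone fallback is possible. What follows is not the nltk_contrib parser: it is a minimal sketch that assumes the long ("full text") TextGrid format and handles interval tiers only. The names `SAMPLE` and `parse_interval_tiers` are hypothetical, made up for this illustration.

```python
import re

# Tiny hand-written TextGrid in Praat's long text format (illustrative data only).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.3
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.3
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.0
            text = "hello"
        intervals [2]:
            xmin = 1.0
            xmax = 2.3
            text = "world"
'''

TIER_SPLIT = re.compile(r'item \[\d+\]:')
NAME = re.compile(r'name = "([^"]*)"')
INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([\d.]+)\s*'
    r'xmax = ([\d.]+)\s*'
    r'text = "((?:[^"]|"")*)"'  # Praat doubles any quote inside a label
)


def parse_interval_tiers(text):
    """Return {tier name: [(xmin, xmax, label), ...]} for interval tiers."""
    tiers = {}
    # The first chunk is the file header, so skip it.
    for chunk in TIER_SPLIT.split(text)[1:]:
        name_match = NAME.search(chunk)
        if name_match is None:
            continue
        tiers[name_match.group(1)] = [
            (float(xmin), float(xmax), label.replace('""', '"'))
            for xmin, xmax, label in INTERVAL.findall(chunk)
        ]
    return tiers


tiers = parse_interval_tiers(SAMPLE)
words = [label for _, _, label in tiers["words"]]
print(words)
```

The resulting list of labels is plain Python data, so it can be passed on to whatever NLTK tools you need. Point tiers and the short TextGrid format are deliberately left out of this sketch.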
A bit late to the party, but here I go:
You could save the TextGrid object as a JSON file and read it into NLTK using the standard Python libraries, as in this answer.
Praat does not (at this time) include a JSON converter, but [full disclosure] I have been working on a script that should do the job. It's part of a larger plugin I maintain, which can be downloaded from its GitHub repository.
Once you install the plugin, you can use it by running
runScript: preferencesDirectory$ + "/plugin_jjatools/save_as_json.praat",
..."/output/path", "Data stream", "Pretty printed"
That script basically calls a Perl script in the background, which does most of the hard work, so you could also just run the Perl script directly. Even though it is still in development, most object types are currently supported.
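Once the TextGrid has been exported to JSON, reading it back on the Python side could look roughly like this. The JSON layout below (a top-level "tiers" list whose entries carry "name" and "intervals") is an assumption made for illustration, not the plugin's documented output, so adjust the keys to match what the script actually writes. `EXPORTED` and `labels_from_json` are hypothetical names.

```python
import json

# Hypothetical JSON layout; the real export may use different keys.
EXPORTED = '''
{
  "tiers": [
    {"name": "words", "intervals": [
        {"xmin": 0.0, "xmax": 1.0, "text": "hello"},
        {"xmin": 1.0, "xmax": 2.3, "text": "world"},
        {"xmin": 2.3, "xmax": 2.5, "text": ""}
    ]}
  ]
}
'''


def labels_from_json(json_text, tier_name):
    """Return the non-empty interval labels of the named tier."""
    data = json.loads(json_text)
    for tier in data["tiers"]:
        if tier["name"] == tier_name:
            # Skip empty intervals, which usually mark silence.
            return [iv["text"] for iv in tier["intervals"] if iv["text"].strip()]
    raise KeyError(tier_name)


tokens = labels_from_json(EXPORTED, "words")
print(tokens)
```

For a file on disk, `json.load` on the opened file replaces `json.loads`. The list of strings can then be handed to NLTK, for example as `nltk.Text(tokens)`, which accepts a sequence of tokens.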