dictionary - Hunspell, unmunch - dump whole dictionary, encoding error

Question

I'd like to dump Hunspell's pl_PL dictionary.

I found a way to do it: unmunch /usr/share/hunspell/pl_PL.dic /usr/share/hunspell/pl_PL.aff

But there's a problem with the encoding.

Part of the output:

ambasadorowaniom
ambasadorowaniach
ambasadorowa�
ambasadoruj�cy
ambasadoruj�cym

I've also tried filtering the output through iconv, but that didn't solve the problem:

   affix: z�c� 4, strip: �� 2
   affix: z�ce 4, strip: �� 2
   affix: z�cej 5, strip: �� 2
stable 50 num is 470 flag G
parsing line: MAP 8
parsing line: MAP a�
parsing line: MAP c�

How can I solve this problem?

score 2 · Accepted Answer

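In case you still want to know how to solve this (I ran into the same problem tonight), or someone hits it in the future and ends up here: iconv does solve it. The dictionary file appears to be encoded in ISO Latin-2 (ISO-8859-2):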

unmunch pl_PL.dic pl_PL.aff 2>/dev/null | iconv -f iso-8859-2 -t utf-8
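Hard-coding ISO-8859-2 works for this particular pl_PL.aff, but a Hunspell .aff file normally declares its own character set with a `SET` line (for example `SET ISO8859-2` or `SET UTF-8`), so the encoding can be read from the file instead. Below is a minimal sketch in POSIX sh. It assumes the first `SET` line is the one that applies, and that your iconv accepts Hunspell's spelling of the encoding name (glibc should accept aliases such as `ISO8859-2`). `aff_encoding` and `dump_dic` are hypothetical helper names, not Hunspell commands:

```shell
# Hypothetical helper (not part of Hunspell): print the encoding named by the
# first "SET <encoding>" line of a .aff file; fail if there is no such line.
# [[:space:]] also swallows a trailing \r from CRLF-terminated files.
aff_encoding() {
    enc=$(sed -n 's/^SET[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*/\1/p' "$1" | head -n 1)
    [ -n "$enc" ] || return 1
    printf '%s\n' "$enc"
}

# Hypothetical helper: expand a .dic/.aff pair with unmunch, discard its
# diagnostic messages on stderr, and convert the words to UTF-8 using the
# encoding declared in the .aff file.
dump_dic() {
    enc=$(aff_encoding "$2") || { echo "no SET line in $2" >&2; return 1; }
    unmunch "$1" "$2" 2>/dev/null | iconv -f "$enc" -t UTF-8
}
```

Usage: `dump_dic /usr/share/hunspell/pl_PL.dic /usr/share/hunspell/pl_PL.aff > pl_PL.txt`. If the .aff file has no `SET` line, `dump_dic` stops with an error, and you can fall back to naming the encoding explicitly as in the one-liner above.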

score 1 · Accepted Answer

Short version: it's a problem with your console or terminal. Switch to a different one, for example xterm.

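Longer version: that's strange. It should be UTF-8. Are you sure the problem isn't that your console or terminal doesn't support UTF-8? Open the result in any graphical editor that supports UTF-8, and check your locale (LOCALE) settings.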
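Both checks can also be done from the shell before you blame the terminal. This is a minimal sketch that assumes a POSIX shell with `locale` and `iconv` available. `is_valid_utf8` is a hypothetical helper name, not an existing command. `locale charmap` prints the character set the current locale expects. Converting a file from UTF-8 to UTF-8 with iconv fails on the first invalid byte sequence, which tells you whether the dump itself is valid UTF-8:

```shell
# Print the character set the current locale expects (e.g. UTF-8).
locale charmap

# Hypothetical helper: succeed only if the given file is valid UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For example: `unmunch pl_PL.dic pl_PL.aff 2>/dev/null > words.txt; is_valid_utf8 words.txt || echo 'words.txt is not valid UTF-8'`. If the locale reports UTF-8 but the file fails this check, the data needs converting rather than the terminal needing to be changed.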

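Disclaimer: I wanted to help. However, since I can't comment yet (I have only 1 reputation point), ask for clarification, or message the user, I have to include an actual answer in my post so that it isn't deleted.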
