linux - Problem with TXT file encoding in a Prolog program on Linux

Question

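I am working on a university Prolog project, a text analyzer written in SWI-Prolog, that very simply does the following:

It reads a .txt input file containing some text, which I have called dataggare.txt, and puts this text into a list of ASCII character codes.
It performs some operations on this original list of ASCII characters and saves the result in a new file called System.txt.
Finally, it compares the newly modified System.txt file with another file called oracolo.txt (which represents what System.txt should look like if all operations complete successfully). The FMeasure value expresses how similar System.txt is to oracolo.txt, but that is not important right now.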

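The problem appears when I compare my new System.txt file with the oracolo.txt file, and only when I run the program on Linux (if I run it on Windows, I have no problem).

When I execute the following query, I get a series of warnings related to the encoding of the oracolo.txt file: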

[debug]  ?- tagConfronto('dataggare.txt', 'oracolo.txt', FMeasure).
Warning: oracolo.txt:1:422: Illegal UTF-8 continuation
Warning: oracolo.txt:2:77: Illegal UTF-8 continuation
Warning: oracolo.txt:2:129: Illegal UTF-8 continuation
Warning: oracolo.txt:3:31: Illegal UTF-8 continuation
Warning: oracolo.txt:3:71: Illegal UTF-8 continuation
Warning: oracolo.txt:3:199: Illegal UTF-8 start
Warning: oracolo.txt:3:258: Illegal UTF-8 continuation
............
Warning: oracolo.txt:12:222: Illegal UTF-8 continuation
Warning: oracolo.txt:12:563: Illegal UTF-8 continuation
FMeasure = 0.02564102564102564

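The tagConfronto/3 predicate compares the content of the dataggare.txt file with the oracolo.txt file and computes the corresponding FMeasure value.

As you can see, this run finds problems with the encoding of oracolo.txt, which causes me a lot of trouble because it greatly changes the FMeasure value.

I only have this problem when I run the program on Linux, not on Windows (in the second case I get no warnings and a correct FMeasure value).

Some colleagues told me that I might be able to solve this by re-saving the file with a different encoding (I don't know whether it is System.txt or oracolo.txt that I have to save differently, which encoding I should use, or whether there is a different solution).

Any ideas?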

score 2 · Accepted Answer

On Unix,

?- current_prolog_flag(encoding,X).
X = utf8.

while on Windows

?- current_prolog_flag(encoding,X).
X = text.

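Perhaps you should set the same value on both systems when opening the file, using the encoding option of open/4, or change it globally with set_prolog_flag/2. To change a stream that is already open, use set_stream/2.

I'm not sure whether encoding(text) is appropriate; see the documentation page for all supported values.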
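
The three options above could look roughly like the following sketch for SWI-Prolog. The predicate names read_file_codes/3 and read_open_stream_codes/3 are hypothetical helpers, not part of the original program, and iso_latin_1 is only an assumption about how oracolo.txt was saved; the "Illegal UTF-8" warnings show that the file is not valid UTF-8, but not which encoding it actually uses.

```prolog
:- use_module(library(readutil)).   % provides read_stream_to_codes/2

% read_file_codes(+File, +Encoding, -Codes)   (hypothetical helper)
% Open File with an explicit encoding instead of relying on the
% platform-dependent default, then read the whole content as a code list.
read_file_codes(File, Encoding, Codes) :-
    setup_call_cleanup(
        open(File, read, Stream, [encoding(Encoding)]),
        read_stream_to_codes(Stream, Codes),
        close(Stream)).

% read_open_stream_codes(+Stream, +Encoding, -Codes)   (hypothetical helper)
% Change the encoding of a stream that is already open, then read it.
read_open_stream_codes(Stream, Encoding, Codes) :-
    set_stream(Stream, encoding(Encoding)),
    read_stream_to_codes(Stream, Codes).

% Global alternative: change the default encoding for streams that are
% opened afterwards. The value iso_latin_1 is an assumption, not a known fact.
% :- set_prolog_flag(encoding, iso_latin_1).
```

A call such as ?- read_file_codes('oracolo.txt', iso_latin_1, Codes). would then behave the same way on Linux and on Windows, provided the guessed encoding matches the file. Passing the encoding per file keeps the change local, whereas the global flag affects every stream opened later.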
