perl - Problem with Spreadsheet::Read

Question

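In my application we use Spreadsheet::Read to read an Excel file, perform some tasks on its rows, and finally add them to the database.

The file we import is an Excel file (.XLSX). It is actually a glossary that supports different user languages.

The problem I am facing is that some rows/columns contain cells with special characters, and these cells are not decoded correctly.

For example, if I have a Spanish Excel file:

In the Excel sheet => Extracted from the logs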

Información de cuenta => Informaci\x{f3}n de cuenta

Página de consola de administración de curso => P\x{e1}gina de consola de administraci\x{f3}n de curso

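Informaci\x{f3}n de cuenta is what gets added to the DB, and when it is fetched, the UI shows garbage characters.

I tried the following workaround, but it does not work. It is basically a hack on top of Spreadsheet::Read: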

use Text::Iconv;
package Spreadsheet::XLSX;

sub new {
    my $converter = Text::Iconv->new("ASCII","utf-8");
    return __PACKAGE__->SUPER::new(@_, $converter);
}

Can you suggest what is wrong here, or a better solution?

score 2 · Accepted Answer

Spreadsheet::Read returns strings as octets encoded in Latin1. To turn them into Perl character strings, decode them with the Encode module. Read the introduction to the topic of encoding in Perl.

use Encode qw(decode);
use Spreadsheet::Read qw(ReadData);
my $ref = ReadData 'spanish.xls';
my $characters = decode 'Latin-1', $ref->[1]{A1};
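The decode step above only covers the input side. Since the question is about text that ends up garbled in the database and UI, here is a minimal sketch of the full round trip using only the core Encode module. It assumes the cell really arrives as Latin-1 octets, as this answer states, and that the destination expects UTF-8. Both are assumptions to verify against your own setup. The hard-coded `$octets` value is a stand-in for `$ref->[1]{A1}` so the snippet runs without a spreadsheet.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for a cell value (hypothetical; in the real program this
# would be $ref->[1]{A1}). Assumed to be Latin-1 octets.
my $octets = "Informaci\xf3n de cuenta";

# Octets -> Perl character string.
my $characters = decode('Latin-1', $octets);

# Character string -> UTF-8 octets for the output side
# (assumption: the database/UI expects UTF-8).
my $utf8 = encode('UTF-8', $characters);

# $utf8 is already encoded, so write it as raw bytes.
binmode STDOUT;
print $utf8, "\n";
```

The idea is to decode once at the input boundary, work with character strings inside the program, and encode once at the output boundary. If your database driver performs its own encoding, pass it `$characters` instead of `$utf8` so the text is not encoded twice.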
