I have the following test:

use Test::More;
use Lingua::EN::NameCase 'nc';
use utf8;
my $output = Test::Builder->new->todo_output;
binmode $output, ':encoding(UTF-8)';
$output = Test::Builder->new->failure_output;
binmode $output, ':encoding(UTF-8)';

my $name = 'Lintão';
is nc($name), $name, 'nc() should not change a properly namecased name';
diag nc($name);

done_testing;

On Mac OS X with Perl 5.10.1, I get the following output:

nc.t ..
ok 1 - nc() should not change a properly namecased name
1..1
# Lintão
ok
All tests successful.
Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.04 cusr  0.00 csys =  0.07 CPU)
Result: PASS

Unfortunately, the same test on Debian Squeeze, also with Perl 5.10.1, produces the following output:

nc.t ..
not ok 1 - nc() should not change a properly namecased name

#   Failed test 'nc() should not change a properly namecased name'
#   at nc.t line 10.
#          got: 'LintãO'
#     expected: 'Lintão'
# LintãO
1..1
# Looks like you failed 1 test of 1.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests

Test Summary Report
-------------------
nc.t (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=1, Tests=1,  0 wallclock secs ( 0.01 usr  0.00 sys +  0.03 cusr  0.00 csys =  0.04 CPU)
Result: FAIL

The offending line in the nc() subroutine seems to be this:

s{ \b (\w)   }{\u$1}gox ;           # Uppercase first letter of every word.

So somehow the same version of Perl on Debian gets the word boundary wrong. Can anyone help me debug this further?

1 Answer

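The locale on the Linux machine does not consider ã a word character. (Lingua::EN::NameCase has use locale; so it classifies characters according to the current LC_CTYPE setting.) Using perlbrewed perls from 5.8.1 to 5.18.1, I consistently get this output on Ubuntu 12.04 LTS with the en_GB.UTF-8 locale: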

$ perl -Mutf8 -le 'print 0+("ã" =~ /\w/); use locale; print 0+("ã" =~ /\w/)'
1
0
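
To check whether the locale is what changes the result on a particular machine, the quoted substitution can be run outside the module, once without and once with use locale. This is only a rough diagnostic sketch: the helper names ucwords_unicode and ucwords_locale are made up for illustration and are not part of Lingua::EN::NameCase, and what the locale variant returns depends on the Perl version and the LC_CTYPE in effect.

```perl
use strict;
use warnings;
use utf8;
use POSIX qw(setlocale LC_CTYPE);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: the substitution quoted from nc(), compiled
# without "use locale". The /o flag is dropped because the pattern
# interpolates nothing.
sub ucwords_unicode {
    my ($s) = @_;
    $s =~ s{ \b (\w) }{\u$1}gx;
    return $s;
}

# Hypothetical helper: the same substitution compiled under
# "use locale", so \b, \w and \u may follow the LC_CTYPE locale.
sub ucwords_locale {
    use locale;
    my ($s) = @_;
    $s =~ s{ \b (\w) }{\u$1}gx;
    return $s;
}

my $name = 'Lintão';

# Query (do not change) the LC_CTYPE locale the process is running under.
printf "LC_CTYPE locale: %s\n", setlocale(LC_CTYPE) // '(unknown)';
printf "without locale:  %s\n", ucwords_unicode($name);
printf "with use locale: %s\n", ucwords_locale($name);
```

If the last two lines differ, the locale's character classification is the likely cause. Rerunning the script with a different LC_ALL or LC_CTYPE value in the environment should show whether the result follows the locale.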
answered 2013-09-16T13:11:16.663