
I want to write a Perl script that checks whether certain characters in a string are balanced. If they are not balanced, the script should remove them. For example, if a string contains only an opening parenthesis, that character has to be removed. I used the following code, but it doesn't work:

sub checkBalance{
    my $text= $_[0];
    ### Check Balanced Quotes
    my $count = ($text =~ tr/"//);
    if ( $count%2 !=0)
    {
      $text=~ s/"//g;
    }
    ### Check Balanced «»
    if (((($text =~ m#(.*».*)#) && !($text =~ m#(.*«.*)#)) || !(($text =~ m#(.*».*)#) && ($text =~ m#(.*«.*)#))) || (index($text, "«")>index($text, "»")))
    {
      $text=~ s/»//g;
      $text=~ s/«//g;
    }
    return $text;
} 

Why doesn't it work?

The pl file is UTF8. Sample input is:

 می گوید: «یکی از اصول

and expected output is:

 می گوید: یکی از اصول

I tried this code on an English string and it seems to work there, but it does not work for other languages such as Arabic and Persian.


2 Answers


Adding the missing bits:

use utf8;                               # Tell Perl script is encoded using UTF-8.
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';    # Tell Perl terminal expects UTF-8.
use feature qw( say );

sub checkBalance{
   ...
}

my $in = " می گوید: «یکی از اصول";
my $expect = " می گوید: یکی از اصول";
my $got = checkBalance($in);

say $in;
say $expect;
say $got;
say $got eq $expect ? "Got expected output" : "Didn't get expected output.";

I get the correct output:

$ perl x.pl
 می گوید: «یکی از اصول
 می گوید: یکی از اصول
 می گوید: یکی از اصول
Got expected output

I suspect you didn't tell Perl that your source file is encoded using UTF-8. Add use utf8;.
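To illustrate why the pragma matters, here is a minimal sketch (not the asker's code) that compares the guillemet as a decoded character with its UTF-8 byte encoding. The variable names are made up for this example, and Encode is used only to simulate what Perl would see in the source file without use utf8.

```perl
use strict;
use warnings;
use utf8;                       # string literals below are decoded into characters
use Encode qw( encode );

my $chars = "«";                        # one character (U+00AB) because of use utf8
my $bytes = encode('UTF-8', $chars);    # the raw bytes Perl would see without use utf8

my $text = "می گوید: «یکی از اصول";     # decoded text, as in the question

# Search the decoded text for the character, then for the raw byte sequence.
my $char_found = index($text, $chars) >= 0 ? 1 : 0;
my $byte_found = index($text, $bytes) >= 0 ? 1 : 0;

print "length as characters: ", length($chars), "\n";
print "length as bytes:      ", length($bytes), "\n";
print "character found in decoded text: $char_found\n";
print "byte sequence found in decoded text: $byte_found\n";
```

When the pattern and the text agree on being characters, the search succeeds. When one side is raw bytes and the other is decoded text, it does not, which is one plausible way a mismatch could break the matching.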

In the future, please provide a demonstration of the problem. Simply posting your function doesn't demonstrate the problem.

answered 2012-07-02T20:42:30.327

If you don't want to roll your own, you can use Text::Balanced to handle the problem of finding balanced delimiters in text.
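As a rough sketch of what that could look like, the following uses extract_bracketed from Text::Balanced on plain parentheses, which the question mentions as an example. The helper starts_balanced is hypothetical, and whether this function suits the « » pair has not been checked here, so treat it as a starting point only.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

# Hypothetical helper: true if $text begins with a balanced (...) group.
sub starts_balanced {
    my ($text) = @_;
    # In list context the first element is the extracted group (undef on failure).
    my ($extracted) = extract_bracketed($text, '()');
    return defined $extracted && length $extracted ? 1 : 0;
}

# Extract the leading balanced group and the text that follows it.
my ($group, $rest) = extract_bracketed('(a (b) c) tail', '()');

print "group: $group\n";
print "rest:  $rest\n";
print "balanced input:   ", starts_balanced('(a (b) c) tail'), "\n";
print "unbalanced input: ", starts_balanced('(a (b c tail'), "\n";
```

Note that extract_bracketed expects the bracketed group at the start of the text (after optional whitespace), so scanning a whole sentence would need a loop or a different function from the module.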

answered 2012-07-02T20:50:19.547