I want to write a Perl script that checks whether certain characters are balanced in a string and removes them if they are not. For example, if a string contains only an opening parenthesis, that character has to be removed. I used the following code, but it doesn't work...

sub checkBalance {
    my $text = $_[0];
    ### Check balanced quotes: drop every " if their count is odd
    my $count = ($text =~ tr/"//);
    if ($count % 2 != 0) {
        $text =~ s/"//g;
    }
    ### Check balanced «»
    if (((($text =~ m#(.*».*)#) && !($text =~ m#(.*«.*)#)) || !(($text =~ m#(.*».*)#) && ($text =~ m#(.*«.*)#))) || (index($text, "«") > index($text, "»"))) {
        $text =~ s/»//g;
        $text =~ s/«//g;
    }
    return $text;
}

Why doesn't it work?

The .pl file is saved as UTF-8. Sample input:

 می گوید: «یکی از اصول

and expected output is:

 می گوید: یکی از اصول

I tried this code on an English string and it seems to work, but it does not work for strings in other languages such as Arabic and Persian.
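The rule described above (remove « and » unless every » closes an earlier « and no « is left open) can also be written with a depth counter instead of the combined regex and index() test. This is only a minimal sketch: the helper names is_balanced and strip_unbalanced are hypothetical and not part of the original code, and it assumes the script file is saved as UTF-8.

```perl
use utf8;                               # the « and » literals below are characters, not bytes
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';    # encode output as UTF-8

# Hypothetical helper: true when every $close closes an earlier $open
# and no $open is left unclosed at the end of $text.
sub is_balanced {
    my ($text, $open, $close) = @_;
    my $depth = 0;
    for my $ch (split //, $text) {
        if    ($ch eq $open)  { $depth++ }
        elsif ($ch eq $close) { return 0 if --$depth < 0 }
    }
    return $depth == 0;
}

# Hypothetical helper: remove both delimiters when they are not balanced.
sub strip_unbalanced {
    my ($text, $open, $close) = @_;
    $text =~ s/[\Q$open$close\E]//g unless is_balanced($text, $open, $close);
    return $text;
}

# Sample input from the question: the lone « should be removed.
print strip_unbalanced(" می گوید: «یکی از اصول", "«", "»"), "\n";
```

The same two helpers work for any single-character pair, for example strip_unbalanced($text, "(", ")").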


use utf8;                               # Tell Perl script is encoded using UTF-8.
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';    # Tell Perl terminal expects UTF-8.
use feature qw( say );

sub checkBalance { ... }    # body as posted in the question

my $in = " می گوید: «یکی از اصول";
my $expect = " می گوید: یکی از اصول";
my $got = checkBalance($in);

say $in;
say $expect;
say $got;
say $got eq $expect ? "Got expected output" : "Didn't get expected output.";


$ perl x.pl
 می گوید: «یکی از اصول
 می گوید: یکی از اصول
 می گوید: یکی از اصول
Got expected output

I suspect you didn't tell Perl that your source file is encoded using UTF-8. Add use utf8; to the script.
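To illustrate why the pragma matters, here is a small sketch that compares the same « literal with and without use utf8 (the pragma is lexically scoped, so both cases fit in one file). It assumes the file is saved as UTF-8 and that the input has already been decoded into characters; the variable names are made up for this demonstration.

```perl
use strict;
use warnings;

# "\x{AB}" is the single character « as it appears in decoded input.
my $decoded = "\x{AB}";

my ($len_bytes, $hit_bytes, $len_chars, $hit_chars);
{
    no utf8;    # literals are read as raw bytes: « is the two bytes \xC2 \xAB
    $len_bytes = length("«");
    $hit_bytes = $decoded =~ /«/ ? 1 : 0;
}
{
    use utf8;   # literals are decoded: « is one character
    $len_chars = length("«");
    $hit_chars = $decoded =~ /«/ ? 1 : 0;
}

print "without utf8: length=$len_bytes match=$hit_bytes\n";   # length=2 match=0
print "with utf8:    length=$len_chars match=$hit_chars\n";   # length=1 match=1
```

Without the pragma, the pattern is a two-byte sequence that never occurs in the decoded text, so the substitution silently does nothing. ASCII characters such as " and ( are one byte either way, which would explain why the English test appeared to work.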


