regex - How to validate that a string contains only certain letters using Perl and regular expressions

Question

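I'm looking for a Perl regular expression that validates a string containing only the letters ACGT. For example, "AACGGGTTA" should be valid, while "AAYYGGTTA" should be invalid, because the "YY" in the second string is not one of the letters A, C, G, T. I have the following code, but it validates both of the strings above: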

if($userinput =~/[A|C|G|T]/i)
{
    $validEntry = 1;
    print "Valid\n";
}

Thanks

score 5 · Accepted Answer

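Use a character class, and make sure you check the whole string by using the start-of-string anchor \A and the end-of-string anchor \z.

You should also use * or + to say how many characters you want to match: * means "zero or more" and + means "one or more".

So the regular expression below says: "between the start and the end of the string (case-insensitively), there should be only one or more of the following characters: a, c, g, t".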

if($userinput =~ /\A[acgt]+\z/i)
{
    $validEntry = 1;
    print "Valid\n";
}
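A minimal self-contained sketch of the same check, wrapped in a helper so it can be tried on the two strings from the question. The names is_dna and is_dna_or_empty are hypothetical, not from the answer. The second helper uses * instead of + to show the practical difference: with *, the empty string is also accepted.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole string is one or more of the
# letters A, C, G, T (case-insensitive, thanks to /i).
sub is_dna {
    my ($s) = @_;
    return $s =~ /\A[acgt]+\z/i ? 1 : 0;
}

# Same pattern with * ("zero or more"): an empty string also passes.
sub is_dna_or_empty {
    my ($s) = @_;
    return $s =~ /\A[acgt]*\z/i ? 1 : 0;
}

for my $input ('AACGGGTTA', 'aacgggtta', 'AAYYGGTTA', '') {
    printf "%-12s with +: %s, with *: %s\n", "'$input'",
        is_dna($input)          ? 'valid' : 'invalid',
        is_dna_or_empty($input) ? 'valid' : 'invalid';
}
```

Whether an empty input should count as valid is a design decision the question doesn't settle; + rejects it, * accepts it.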

score 5 · Accepted Answer

Using the character-counting tr operator:

if( $userinput !~ tr/ACGT//c )
{
    $validEntry = 1;
    print "Valid\n";
}

tr/characterset// counts how many characters in the string are in characterset; with the /c flag, it counts how many are not in characterset. Using !~ instead of =~ negates the result, so the condition is true when the string contains no characters outside characterset, and false when it contains at least one.
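A small sketch that prints the count tr/ACGT//c returns for a few inputs, next to the answer's !~ test wrapped in a hypothetical helper only_acgt. Two behaviours worth knowing: tr has no case-insensitive flag, so lowercase letters are counted as invalid unless you list them too (e.g. tr/ACGTacgt//c), and the empty string has zero invalid characters, so it passes this check.

```perl
use strict;
use warnings;

# The answer's test in a hypothetical helper: zero characters outside
# A, C, G, T means valid. tr with an empty replacement list only counts;
# it does not modify the string.
sub only_acgt {
    my ($s) = @_;
    return ($s !~ tr/ACGT//c) ? 1 : 0;
}

my @inputs = ('AACGGGTTA', 'AAYYGGTTA', 'acgt', '');
for my $input (@inputs) {
    my $copy = $input;
    my $bad  = ($copy =~ tr/ACGT//c);    # number of non-ACGT characters
    printf "%-12s non-ACGT: %d -> %s\n", "'$input'", $bad,
        only_acgt($input) ? 'valid' : 'invalid';
}
```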

score 4 · Accepted Answer

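Your character class [A|C|G|T] contains |. Inside a character class, | does not mean alternation; it just stands for itself. So the character class includes the | character, which is not what you want.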
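A quick sketch that makes the stray | visible. Both patterns below are anchored so that the only difference between them is the |; the helper names buggy_class and fixed_class are hypothetical.

```perl
use strict;
use warnings;

# [A|C|G|T] is the same set as [ACGT|]: five characters, including |.
sub buggy_class { my ($s) = @_; return $s =~ /\A[A|C|G|T]+\z/ ? 1 : 0 }
sub fixed_class { my ($s) = @_; return $s =~ /\A[ACGT]+\z/    ? 1 : 0 }

for my $input ('ACGT', 'A|C|G|T', '||||') {
    printf "%-8s [A|C|G|T]: %d  [ACGT]: %d\n",
        $input, buggy_class($input), fixed_class($input);
}
```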

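Your pattern is not anchored. The pattern /[ACGT]+/ will match any string that contains one or more of these characters anywhere. Instead, you need to anchor your pattern so that it only matches strings made up entirely of these characters, from beginning to end.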
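A minimal comparison of the unanchored and anchored forms on the question's two strings. The helper names unanchored and anchored are hypothetical; the unanchored one succeeds on "AAYYGGTTA" because it finds "AA" at the start and stops looking.

```perl
use strict;
use warnings;

# Unanchored: succeeds if ANY run of A/C/G/T appears somewhere.
sub unanchored { my ($s) = @_; return $s =~ /[ACGT]+/      ? 1 : 0 }
# Anchored: succeeds only if the WHOLE string is A/C/G/T.
sub anchored   { my ($s) = @_; return $s =~ /\A[ACGT]+\z/ ? 1 : 0 }

for my $input ('AACGGGTTA', 'AAYYGGTTA', 'XYZ') {
    printf "%-10s unanchored: %d  anchored: %d\n",
        $input, unanchored($input), anchored($input);
}
```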

$ can also match just before a newline at the end of the string. To avoid that, use \z to anchor at the very end. \A anchors at the beginning (in this case it makes no difference whether you use it or ^, but using \A provides a nice symmetry).
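A short sketch of the trailing-newline case. It assumes the input was read from STDIN and not chomped; that is an assumption, since the question doesn't show where $userinput comes from.

```perl
use strict;
use warnings;

my $line = "ACGT\n";    # assumption: a line read from STDIN, not chomped

my $with_dollar = $line =~ /^[ACGT]+$/   ? 1 : 0;   # $ matches before the final "\n"
my $with_z      = $line =~ /\A[ACGT]+\z/ ? 1 : 0;   # \z matches only at the very end

print "with \$: $with_dollar, with \\z: $with_z\n";   # prints "with $: 1, with \z: 0"

# If the newline is just line-reading noise, remove it first.
chomp(my $clean = $line);
print "after chomp: ", ($clean =~ /\A[ACGT]+\z/ ? 1 : 0), "\n";
```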

So your check should be written as:

if ($userinput =~ /\A [ACGT]+ \z/ix)
{
    $validEntry = 1;
    print "Valid\n";
}
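The /x flag in the answer's pattern is what allows the spaces around [ACGT]+: under /x, literal whitespace inside the pattern is ignored. A small sketch showing that the spaced and compact forms behave the same, and that /x does not make spaces in the input acceptable.

```perl
use strict;
use warnings;

# Under /x, whitespace in the pattern is ignored; /i makes both
# patterns case-insensitive.
my $spaced  = qr/\A [ACGT]+ \z/ix;
my $compact = qr/\A[ACGT]+\z/i;

for my $input ('AACGGGTTA', 'aacgggtta', 'AAYYGGTTA', 'AC GT') {
    my $m1 = $input =~ $spaced  ? 1 : 0;
    my $m2 = $input =~ $compact ? 1 : 0;
    print "$input: /x pattern $m1, compact pattern $m2\n";
}
```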
